I believe GP means for it to still be connected to 'if this kind of latency is unacceptable to you' - i.e. you can't load/use/unload, you have to keep it in RAM all the time.
In that case it massively increases your memory requirement: not just to the peak the model needs, but to that plus whatever the biggest other use is that will inherently be running concurrently with it.