This model seems to be good when the input context is large compared to OpenAI's; I'd just wait to see if someone productizes it. Most likely it won't be just one paper but many new and old ideas combined.
Being an RNN, there is another trick available: caching a long prompt. Because an RNN's state depends only on the previous step (while a transformer attends over the whole sequence), the entire long context is compressed into a fixed-size hidden state. So you can process your long context once, save that state, and reuse it many times.
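Here's a minimal sketch of that trick in PyTorch, using `nn.GRU` as a stand-in for the real model (the actual architecture and API will differ, and the sizes here are made up):

```python
import torch
import torch.nn as nn

# Toy stand-in for the actual model: any RNN whose entire memory is a
# fixed-size state tensor works the same way.
rnn = nn.GRU(input_size=64, hidden_size=128, batch_first=True)

# The expensive shared context: run it ONCE, keep only the final state.
long_prompt = torch.randn(1, 10_000, 64)
_, cached_state = rnn(long_prompt)

# Later: answer many different queries against the same cached context,
# resuming from the saved state each time instead of re-reading the prompt.
for _ in range(3):
    query = torch.randn(1, 20, 64)  # a short, per-request suffix
    out, _ = rnn(query, cached_state.clone())
```

The nice part is that the cached state is O(1) in the prompt length, whereas a transformer's KV cache grows linearly with it.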