This model seems to be good when the input context is large compared to OpenAI's; I'd just wait to see if someone productizes it. Most likely it won't be just one paper but many new and old ideas combined.
Being an RNN, there is another trick available: caching a long prompt. Because an RNN's state depends only on the previous step (while a transformer attends over the whole sequence), the entire long context is compressed into a fixed-size hidden state. So you can process your long context once, save that state, and reuse it many times.
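Here's a minimal sketch of that trick in PyTorch, using `nn.GRU` as a stand-in for the real model (the actual architecture and API will differ, and the sizes here are made up):

```python
import torch
import torch.nn as nn

# Toy stand-in for the actual model: any RNN whose entire memory is a
# fixed-size state tensor works the same way.
rnn = nn.GRU(input_size=64, hidden_size=128, batch_first=True)

# The expensive shared context: run it ONCE, keep only the final state.
long_prompt = torch.randn(1, 10_000, 64)
_, cached_state = rnn(long_prompt)

# Later: answer many different queries against the same cached context,
# resuming from the saved state each time instead of re-reading the prompt.
for _ in range(3):
    query = torch.randn(1, 20, 64)  # a short, per-request suffix
    out, _ = rnn(query, cached_state.clone())
```

The nice part is that the cached state is O(1) in the prompt length, whereas a transformer's KV cache grows linearly with it.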