
This model seems to be good when the input context is large, in comparison to OpenAI; I would just wait until someone productizes it. Most likely it won't be just one paper but many new and old ideas combined.


Being an RNN, there is another trick available: caching a long prompt. An RNN compresses everything it has seen into a fixed-size state that it carries forward one step at a time, while a transformer re-attends over the whole sequence. So you can process your long context once, save the resulting state, and reuse it many times.
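
To make this concrete, here is a minimal sketch of the state-caching idea using a toy GRU in PyTorch. The model, sizes, and random tensors are illustrative stand-ins, not the actual model under discussion:

    import torch
    import torch.nn as nn

    # Toy recurrent model standing in for an RNN-style LM.
    # Sizes are arbitrary, for illustration only.
    rnn = nn.GRU(input_size=64, hidden_size=128, batch_first=True)

    long_prompt = torch.randn(1, 2048, 64)  # pretend-embedded long context

    # Run the long prompt ONCE; the whole context is compressed into
    # a fixed-size hidden state, no matter how long the prompt is.
    with torch.no_grad():
        _, cached_state = rnn(long_prompt)

    # Reuse the cached state for many continuations without
    # re-processing the 2048-step prompt each time.
    for _ in range(3):
        continuation = torch.randn(1, 16, 64)  # a new query/suffix
        with torch.no_grad():
            out, _ = rnn(continuation, cached_state.clone())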


Yup. This is commonly done in the community for chat models as well, since each reply reuses the same long conversation prefix.



