
My somewhat facetious take is that LLMs are trying really hard to reinvent RNNs, and would if we just gave them the tools.


RNNs are the correct solution, but infeasibly expensive to run.

A different way to think about it: Transformer models are trying to predict which parts of an RNN's state are "worth" keeping given a resource constraint.

Transformers use a simple heuristic today (and this result makes the heuristic better). Just as many NP-complete problems admit approximations that aren't perfectly correct but are still useful, Transformers show the same holds for neural networks.
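
To make the contrast concrete, here's a rough toy sketch (my own illustration, not any specific model): an RNN compresses all history into one fixed-size state, while attention keeps every past token around and softly re-weights them per query - that softmax re-weighting is the "heuristic" deciding what's worth keeping.

  import numpy as np

  rng = np.random.default_rng(0)
  T, d = 6, 4                          # sequence length, hidden size
  x = rng.normal(size=(T, d))          # toy token embeddings

  # RNN: O(1) memory; all history must be squeezed into one state h.
  W = rng.normal(size=(d, d)) * 0.1
  U = rng.normal(size=(d, d)) * 0.1
  h = np.zeros(d)
  for t in range(T):
      h = np.tanh(W @ h + U @ x[t])    # lossy compression of x[0..t]

  # Attention: O(T) memory; each query softly selects over the full past.
  q, K, V = x[-1], x, x                # last token attends to all tokens
  scores = K @ q / np.sqrt(d)
  weights = np.exp(scores) / np.exp(scores).sum()  # softmax re-weighting
  out = weights @ V                    # weighted mix of everything kept

  print("RNN state:", h.round(2))
  print("attention weights over the past:", weights.round(2))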


One such project is RWKV[1]. On the open-source leaderboard it sat in the middle of the pack for a while, so it really is a legitimate approach; it's just not hot right now.

[1]: https://huggingface.co/blog/rwkv
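
For a rough sense of the mechanism, here's a heavily simplified toy sketch of the linear-attention-as-recurrence idea behind RWKV-style models (this omits RWKV's bonus term, token/channel mixing, and numerical stabilization; see the linked post for the real formulation). The state is just two exponentially decayed running sums, so inference is O(1) per token, like an RNN:

  import numpy as np

  rng = np.random.default_rng(0)
  T, d = 8, 4
  k = rng.normal(size=(T, d))          # per-token keys
  v = rng.normal(size=(T, d))          # per-token values
  w = 0.9                              # time decay (learned in practice)

  num = np.zeros(d)                    # decayed running sum of e^k * v
  den = np.zeros(d)                    # decayed running sum of e^k
  for t in range(T):
      num = w * num + np.exp(k[t]) * v[t]
      den = w * den + np.exp(k[t])
      out_t = num / den                # attention-like weighted average
  print("output at the last step:", out_t.round(3))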


Side note: do you think the open-source leaderboard is a fair representation of the diversity of OSS models?


It's the best we have for large-scale comparison, I think. But the major problem is that the tests are public, so you can easily cheat by including them in your training set and inflate your score by an arbitrary amount. You might even be able to do it in a way that would be difficult to detect, and it can happen by accident as well. There are already several disclosed cases of contamination on the leaderboard, both intentional and accidental.

What's really needed is a leaderboard based on private test sets, where you submit a model to be judged and some entity runs it on their own machines without disclosing the tests to you. Even that could be vulnerable to cheating if the submission process is automated and you can submit many times, since you could use the score as feedback to extract information about the contents of the private test set. So it would take some care to run such a service, even beyond the normal security concerns like sandboxing the code that gets submitted.
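
To see how that feedback channel leaks, consider this toy sketch (leaderboard_score is a hypothetical oracle standing in for the automated submission endpoint): submitting pure random guesses, keeping the lucky ones, and majority-voting them produces a high score that encodes nothing but the hidden labels themselves.

  import numpy as np

  rng = np.random.default_rng(0)
  n = 1000                             # size of the hidden test set
  secret = rng.integers(0, 2, n)       # hidden labels, never shown directly

  def leaderboard_score(preds):        # the only feedback an attacker gets
      return (preds == secret).mean()

  lucky = []
  for _ in range(2000):                # many automated submissions
      guess = rng.integers(0, 2, n)    # no model at all, just noise
      if leaderboard_score(guess) > 0.5:
          lucky.append(guess)          # keep the above-chance guesses

  ensemble = (np.mean(lucky, axis=0) > 0.5).astype(int)
  print("one random guess: ~0.50")
  print("ensemble of lucky guesses:", round(leaderboard_score(ensemble), 3))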


Who would you trust (and verify) to run such a leaderboard?


Any of the big labs, I guess. At least as long as the leaderboard didn't become so important that even the big labs would be incentivized to cheat on it. In that case, I don't know.


I think many people agree with you. The main advantage of transformers over RNNs is training parallelization. RNNs are hard because training suffers from vanishing gradients, and because it's hard to get full hardware utilization (you need large batches to keep the hardware busy).
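
To make the vanishing-gradient point concrete, a tiny numeric sketch (my own toy, not from the thread): backprop through a tanh recurrence multiplies the gradient by one Jacobian per step, and when that Jacobian's norm sits below 1 the gradient shrinks exponentially with sequence length.

  import numpy as np

  rng = np.random.default_rng(0)
  d, T = 16, 50
  W = rng.normal(size=(d, d)) * 0.4 / np.sqrt(d)  # spectral norm < 1

  h = np.zeros(d)
  states = []
  for _ in range(T):                   # forward pass: h_t = tanh(W h + x)
      h = np.tanh(W @ h + rng.normal(size=d))
      states.append(h)

  grad = np.ones(d)                    # loss gradient at the last step
  for h in reversed(states):           # backprop through time
      grad = W.T @ (grad * (1 - h**2))  # Jacobian of tanh(W h + x)
  print("gradient norm after 50 steps:", np.linalg.norm(grad))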

The existence of models like RWKV indicates that there is potentially a future in training like a transformer but inferring like an RNN.
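
A minimal sketch of what that looks like for linear (non-softmax) attention, the simplified core of this family of models (toy illustration, not RWKV's exact math): the same causal outputs come out of a parallel whole-sequence form, good for training, and a step-by-step recurrence, good for inference.

  import numpy as np

  rng = np.random.default_rng(1)
  T, d = 5, 3
  q = rng.normal(size=(T, d))
  k = rng.normal(size=(T, d))
  v = rng.normal(size=(T, d))

  # Parallel form (training): causally masked (Q K^T) V, all steps at once.
  parallel = np.tril(q @ k.T) @ v

  # Recurrent form (inference): carry S = sum_t k_t v_t^T, O(1) per token.
  S = np.zeros((d, d))
  recurrent = np.zeros((T, d))
  for t in range(T):
      S += np.outer(k[t], v[t])
      recurrent[t] = q[t] @ S

  print("max difference:", np.abs(parallel - recurrent).max())  # ~1e-15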


Yes, indeedy.

Many lessons learned over the last three decades with smaller (the current terminology is "extremely tiny"! :) neural networks are being revisited for these large models.



