Consistency drive. All LLMs have a desire for consistency, right at the very foundation of their behavior. The best tokens to predict are the ones that are consistent with the previous tokens, always.
Makes for a very good base for predicting text. Makes them learn and apply useful patterns. Makes them sharp few-shot learners. Not always good for auto-regressive reasoning though, or multi-turn instruction following, or a number of other things we want LLMs to do.
So you have to un-teach them maladaptive consistency-driven behaviors - things like defensiveness or error amplification or loops. Bring out consistency-suppressed latent capabilities - like error checking and self-correction. Stitch it all together with more RLVR. Not a complex recipe, just hard to pull off right.
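To make the consistency point concrete, here's a toy sketch (mine, not anything from a real model) - a hypothetical bigram counter that always greedily picks the continuation most consistent with the previous token. A real LLM conditions on the whole context with a neural net, but the autoregressive loop has the same shape, and even this toy shows the loop failure mode:

    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat the cat ate the fish".split()

    # Count which token tends to follow which.
    follows = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev][nxt] += 1

    def generate(token, steps=5):
        out = [token]
        for _ in range(steps):
            if token not in follows:
                break
            # "Consistency drive": greedily take the most common continuation.
            token = follows[token].most_common(1)[0][0]
            out.append(token)
        return " ".join(out)

    print(generate("the"))  # "the cat sat on the cat" - perfectly consistent, and already looping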
LLMs have no desire for anything. They're algorithms, and this anthropomorphization is nonsense.
And no, the best tokens to predict are not the ones "consistent" (as the algorithm would perceive it) with the previous tokens. The goal is for them to be able to generate novel information and self-expand their 'understanding'. All you're describing is a glorified search/remix engine, which is indeed precisely what LLMs are, but not what the hype is selling them as.
In other words, the premise of the hype is that you train them on the data from just before relativity and they should be able to derive relativity. But of course that is in no way whatsoever consistent with the past tokens, because it's an entirely novel concept. You can't get there by simply carrying out token prediction; you actually need some degree of logic, understanding, and so on - things which are entirely absent, probably irreconcilably so, from LLMs.
Not anthropomorphizing LLMs is complete and utter nonsense. They're full of complex behaviors, and most of them are copied off human behavior.
It seems to me like this is just some kind of weird coping mechanism. "The LLM is not actually intelligent" because the alternative is fucking terrifying.
No they are not copied off of human behavior in any way shape or fashion. They are simply mathematical token predictors based on relatively primitive correlations across a large set of inputs. Their success is exclusively because it turns out, by fortunate coincidence, that our languages are absurdly redundant.
Change their training content to e.g. stock prices over time and you have a market prediction algorithm. That the next token being predicted is a word doesn't suddenly make them some sort of human-like or intelligent entity.
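For what it's worth, that "same machinery, different tokens" point is easy to sketch. Here's a hypothetical toy (made-up data, nothing rigorous) that does next-move prediction on a price series with exactly the same counting trick a toy next-word predictor would use - nothing in the machinery knows or cares that the tokens aren't words:

    from collections import Counter, defaultdict

    prices = [100, 101, 103, 102, 104, 107, 106, 108]

    # Tokenize the series as up/down/flat moves.
    def move(a, b):
        return "up" if b > a else "down" if b < a else "flat"

    moves = [move(a, b) for a, b in zip(prices, prices[1:])]

    # Same next-token counting a toy language model would use.
    transitions = defaultdict(Counter)
    for prev, nxt in zip(moves, moves[1:]):
        transitions[prev][nxt] += 1

    last = moves[-1]
    prediction = transitions[last].most_common(1)[0][0] if transitions[last] else "flat"
    print(f"last move: {last}, predicted next move: {prediction}")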
"No they are not copied off of human behavior in any way shape or fashion."
The pre-training phase produces the next-token predictors. The post-training phase is where it's shown examples of selected human behavior to imitate - examples of conversation patterns, expert code production, how to argue a point... there's an enormous amount of "copying human behavior" involved in producing a useful LLM.
It's not like the pre-training dataset didn't contain any examples of human behaviors for an LLM to copy.
SFT is just a more selective process. And a lot of how it does what it does is less "teach this LLM new tricks" and more "teach this LLM how to reach into its bag of tricks and produce the right tricks at the right times".
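If it helps to see what that "more selective process" looks like mechanically, here's a tiny made-up PyTorch sketch of the usual SFT setup - the same next-token objective as pre-training, but the loss only counts on the response tokens, so the model is nudged to imitate the selected human-written responses. The model, data, and numbers are all placeholders:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab, dim = 100, 32
    # Stand-in "LLM": an embedding plus a linear head, nothing more.
    model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))

    # One fabricated example: prompt tokens followed by response tokens.
    prompt   = torch.tensor([5, 17, 42])
    response = torch.tensor([8, 23, 61, 2])
    tokens   = torch.cat([prompt, response])

    # Standard next-token targets, but prompt positions are masked out
    # (-100 is cross_entropy's ignore_index), so only the response is imitated.
    targets = tokens.clone()
    targets[: len(prompt)] = -100

    logits = model(tokens[:-1])                 # predict token t+1 from token t
    loss = F.cross_entropy(logits, targets[1:], ignore_index=-100)
    loss.backward()
    print(float(loss))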
I think what he's saying (and what I would say, at least) is that again all you're doing is the exact same thing - tuning the weights that drive the correlations. As an analogy, in a video game, if you code a dragon such that its elevation changes over time while you play a wing-flapping animation, you're obviously not teaching it dragon-like behaviors, but rather simply trying to create a mimicry of the appearance of flying using relatively simple mathematical tools and 'tricks.' And indeed even basic neural network game bots benefit from RLHF/SFT.
No, you're not. Humans started with literally nothing, not even language. We went from an era with no language and with the greatest understanding of technology being 'poke them with the pointy side' to putting a man on the Moon, unlocking the secrets of the atom, and much more. And given how inefficiently we store and transfer knowledge, we did it in what was essentially the blink of an eye.
Give an LLM the entire breadth of human knowledge at the time and it would do nothing except remix what we knew at that point in history, forever. You could give it infinite processing power, and it's still not moving beyond 'poke them with the pointy side.'