Not sure if anyone who works in the foundation model space and who doesn't directly depend on LLMs 'making it' for VC money would claim differently. It is rather obvious at this point, but some companies are too far in and not cash-rich enough, so they have to keep the LLM dream alive.
> Not sure if anyone who works in the foundation model space and who doesn't directly depend on LLMs 'making it' for VC money would claim differently
This is the problem. The vast majority of people over-hyping LLMs don't have even the most basic understanding of how simple LLMs are at their core (manifold-fitting the semantic space of the internet), and so can't understand why they are necessarily dead ends, theoretically. This really isn't debatable for anyone with a full understanding of the training and basic dynamics of what these models do.
But, practically, it remains to be seen where the dead end with LLMs lies. I think we are clearly approaching plateaus both in academic research and in practice (people forget, or are unaware, how much benchmarks are being gamed as well), but even small practical gains remain game-changers in this space, and much of the progress / tradeoffs we actually care about can't be measured accurately yet (e.g. rapid development vs. "technical debt" from fast but not-understood / weakly-reviewed LLM code).
LLMs are, IMO, indisputably a theoretical dead end, and for that reason a practical dead end too. But we haven't hit that practical dead end yet.
Why are LLMs a theoretical dead-end? I understand the "manifold-fitting the semantic space of the internet", but I don't understand "why they are necessarily dead ends, theoretically."
If I had to steelman a counterargument, I'd handwave about RL and environments creating something greater than the semantic space of the internet, and then highlight the part you mention where we haven't reached a practical dead end. Maybe link out to the Anthropic interp work on models planning in advance, found by poking at activations while working on a rhyming poem.
I should clarify that LLMs trained on the internet are necessarily a dead end, theoretically, because the internet both (1) lacks specialist knowledge and knowledge that cannot be encoded in text / language, and (2) is polluted with knowledge that is not just false but irrelevant for general tasks. LLMs (or rather, transformers and deep models tuned by gradient descent) trained on synthetic data or on more curated / highly specific data where there are actual costs / losses we can properly model (e.g. AlphaFold) could still have tremendous potential. But LLMs in the usual, everyday sense in which people use this label are very limited.
A good example would be trying to make an LLM trained on the entire internet do math proofs. Almost everything in its dataset tells it that the word "orthogonal" means "unrelated to", because this is how it is used colloquially. Only in a tiny fraction of the math forums / resources it digested does it actually mean something about the dot product, so clearly an LLM that does math well only does so by ignoring the majority of the space it is trained on. Similar considerations apply when attempting to use e.g. vision-language models trained on "pop" images to facilitate the analysis of, say, MRI scans or LIDAR data. That we can make some progress in these domains tells us there is some substantial overlap in the semantics, but it is obvious there are limits to this.
There is no reason to believe these (often irrelevant or incorrect) semantics learned from the entire web are going to be helpful for the LLM to produce deeply useful math / MRI analysis / LIDAR interpretation. Broadly, not all semantics useful in one domain are useful in another, and, even more to the point, linguistic semantics have limited relevance to much of what we consider intelligence (which includes visual, auditory, proprioceptive/kinaesthetic, and, arguably, mathematical abstractions). But it could well be that curve-fitting huge amounts of data from the relevant semantic space (e.g. feeding transformers enough Lean / MRI / LIDAR data) is in fact all we need, so that e.g. transformers are "good enough" for achieving most basic AI aims. It just is clearly the case that the internet can't provide all that data for all / most domains.
EDIT: Also, Anthropic's writeups are basically fraud if you actually understand the math. There is no "thinking ahead" or "planning in advance" in any sense: if you head down certain paths due to pre-training, then yes, of course, you can "already see" activations corresponding to future tokens; this is just what curve-fitting in N dimensions looks like, there is nowhere else for the model to go. Actual thinking ahead means things like backtracking / backspace tokens, i.e. actually retracing your path, which current LLMs simply cannot do.
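To make the backtracking point concrete, here is a purely hypothetical decode loop (nothing like this exists in current LLM sampling; the BACKSPACE token and the sample_next_token callback are invented for illustration). The point is that "retracing your path" means the model can actually retract committed output, not just exhibit suggestive activation patterns:

```python
BACKSPACE = "<bksp>"  # hypothetical control token; no real LLM vocabulary has this semantics

def decode_with_backtracking(sample_next_token, prompt, max_steps=100):
    """Toy sketch: the model may emit BACKSPACE to retract the last committed token."""
    output = []
    for _ in range(max_steps):
        token = sample_next_token(prompt, output)  # stand-in for any sampling policy
        if token == BACKSPACE:
            if output:
                output.pop()   # genuinely retrace: previously emitted text is removed
            continue
        if token == "<eos>":
            break
        output.append(token)
    return output
```

Standard autoregressive decoding only ever appends: the context grows monotonically, and a wrong turn can only be papered over by more tokens, never undone.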
> so clearly an LLM that does math well only does so by ignoring the majority of the space it is trained on
There are probably good reasons why LLMs are not the "ultimate solution", but this argument seems wrong. Humans have to ignore the majority of their "training dataset" in tons of situations, and we seem to do it just fine.
It isn't wrong. Just think about how weights are updated via (mini-)batches, and how tokenization works, and you will understand that LLMs can't ignore poisoning / outliers the way humans do. A classic recent example would be this (https://arxiv.org/abs/2510.07192): IMO it happens because the standard (non-robust) loss functions allow for anchor points.
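A minimal sketch of the mechanism, using a toy squared-error regression in numpy rather than an actual LLM (so the setup and numbers are illustrative only, not from the paper above): with a standard averaged loss, a single extreme "anchor" point in a mini-batch dominates the whole gradient update, whereas a robust aggregate over per-example gradients barely moves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y = w * x trained with mean squared error on one mini-batch.
true_w = 2.0
x = rng.normal(size=64)
y = true_w * x

x_p, y_p = x.copy(), y.copy()
x_p[0], y_p[0] = 1.0, 1000.0   # one poisoned "anchor" example in the batch

def per_example_grads(w, x, y):
    # d/dw (w*x - y)^2 = 2 * (w*x - y) * x, per example
    return 2.0 * (w * x - y) * x

w = 0.0
print("mean grad, clean batch:     ", per_example_grads(w, x, y).mean())
print("mean grad, poisoned batch:  ", per_example_grads(w, x_p, y_p).mean())
print("median grad, poisoned batch:", np.median(per_example_grads(w, x_p, y_p)))
# The averaged (non-robust) gradient is dragged far off by the single outlier;
# a robust aggregate like the median is nearly unchanged. SGD on standard losses
# uses the mean, so every example "counts" whether or not it is garbage.
```

Obviously an LLM's cross-entropy loss and tokenized batches are more complicated than this, but the averaging structure is the same.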
I'm not sure about the dead end thing because you may be able to add on to them?
In human terms, LLMs seem similar to talking without thinking, but we can also think as a separate activity from just waffling on.
In AI research terms, DeepMind have done some interesting things with Mind Evolution and AlphaEvolve, the latter being the one that came up with a more efficient matrix multiplication algorithm.
I agree, but to be fair I think there's an open question of just how much more we can get from scaling / tricks. I would assume there's agreement that e.g. continual learning just won't be solved without a radical departure from the current stack. But even with all of the baggage we have right now, if you believe the extrapolations we have ~2 GPT-4->5 sized leaps before everyone has to get out of the pool.
"I'm sure there's a lot of people at Meta, including perhaps Alex, who would like me to not tell the world that LLMs basically are a dead end when it comes to superintelligence" - Yann LeCun
I've been following Yann for years and in my opinion he's been consistently right. He's been saying something like this for a long time, while Elon Musk and others breathlessly broadcast that scaling up would soon get us to AGI and beyond. Mark Zuckerberg bought into Musk's idea. We'll see, but it's increasingly looking like LeCun is right.
More like Yann had a long time to prove out his ideas and he did not deliver; meanwhile the industry passed Meta/Facebook by due to the sort of product-averse, comfortable academic bubble that FAIR lived in. It wasn't Zuckerberg getting swindled; it was giving up on ever seeing Yann deliver anything other than LinkedIn posts and small-scale tests. You do not want to bank on Yann for a big payoff. His ideas may or may not be right (joint predictive architectures, world modeling, etc.), but you'd better not have him at the helm of something you expect to turn a profit on.
Also almost everyone agrees the current architecture and paradigm, where you have a finite context (or a badly compressed one in Mamba / SSM), is not sufficient. That plus lots of other issues. That said scaling has delivered a LOT and it’s hard to argue against demonstrated progress.
As I said in my cousin comment, it depends on how you define AGI and ASI. Claude Opus 4.5 tells me "[Yann LeCun] thinks the phrase AGI should be retired and replaced by 'human-level AI'", which supports my cousin comment.
I don't know; I assume not. But everyone has a product that could easily be profitable; it would just be dumb to do it, because you would lose out to everyone else running at a loss to capture market share. I just mean the guy seems to have an aversion to business sensibility generally. I think he's really in it for the love of the research. He's of course rightly lauded for everything he's done, he's extremely brilliant, and in person (at a distance) very kind and reasonable (something that is very different from his LinkedIn personality, which is basically a daily pissing contest). But I would not give him one cent of investment personally.
> He's been saying something like this for a long time [...] it's increasingly looking like LeCun is right.
No? LLMs are getting smarter and smarter; only three years have passed since ChatGPT was released, and we have models generating whole apps, competently working on complex features, solving math problems at a level reached by only a small percentage of the population, and much more. The progress is constant and the results are stunning. Really, it makes me wonder what sort of denial those who think this has been proven a dead end must be in.
If you call that AGI, as many do, or ASI, then we are not talking about the same thing. I'm talking about conversing with an AI and being unable to tell if it's human or not, in a kind of Turing Plus test. Turing Plus 9 would be 90% of humans unable to tell if it's human or not. We're at Turing Plus 1. I can easily tell Claude Opus 4.5 is a machine by the mistakes it made. It's dumb as a box of rocks. That's how I define AGI and, beyond it, ASI.
We are due for many more optimizations and new deep learning architectures, rather than throwing more compute + RAM + money + GPUs + data at the problem, which you can do only for so long before a bottleneck occurs.
Given that we have seen research from DeepSeek and Google on optimizing parts of the lower layers of deep neural networks, it's clear that a new form of AI needs to be created and I agree that LeCun will be proven right.
Instead of borrowing tens of trillions to scale to a false "AGI".
It's too soon to say anything like that is proven. Sure, AGI hasn't been reached yet. I suspect there's some new trick that's needed. But the work going into LLMs might be part of the eventual solution.
> but it's increasingly looking like LeCun is right.
This is an absolutely crazy statement vis-a-vis reality and the fact that it’s so upvoted is an indictment of the type of wishful thinking that has grown deep roots here.
If you are paying attention to actual research and guarded benchmarks, and understand how benchmarks are being gamed, I would say there is plenty of evidence we are approaching a clear plateau / that Karpathy's march-of-nines thesis is basically correct long-term. Short-term, it remains to be seen how much more we can do with the current tech.
Your best bet would be to look deeply into performance on ARC-AGI fully-private test set performances (e.g. https://arcprize.org/blog/arc-prize-2025-results-analysis), and think carefully about the discrepancies here, or, just to broadly read any academic research on classic benchmarks and note the plateaus on classic datasets.
It is very clear, when you look at academic papers actually targeting problems specific to reasoning / intelligence (e.g. rotation invariance in images, adversarial robustness), that all the big companies are doing is fitting more data / spending more resources on human raters and other things to boost performance on (open) metrics, and that clear, actual gains in genuine intelligence are being made only by milking what we know very well to be a limited approach. I.e. there are trivially basic problems that cannot be solved by curve-fitting models, which makes it clear most current advances are indeed coming from curve (manifold) fitting. It just isn't clear how far we can exploit these current approaches, and in which domains this kind of exploitation is more than good enough.
EDIT: Are people unaware Google Scholar is a thing? It is trivial to find modern AI papers that can be read without access to a research institution. And HuggingFace, for example, collects trending papers (https://huggingface.co/papers/trending), etc.
At present it's only SWEs who are benefiting from a productivity standpoint. I know a lot of people in finance (from accounting to portfolio management) and they scoff at the outputs of LLMs in their day-to-day jobs.
But the bizarre thing is, even though the productivity of SWEs is increasing, I don't believe there will be much happening with regard to layoffs, because there isn't complete trust in LLMs; I don't see this changing either. In which case the LLM producers will need to figure out a way to increase the value of LLMs and get users to pay more.
Are SWEs really experiencing a productivity uplift? When studies attempt to measure the productivity impact of AI in software, the results I have seen are underwhelming compared to the frontier labs' marketing.
And, again, this is ignoring all the technical debt of produced code that is poorly understood, weakly-reviewed, and of questionable quality overall.
I still think this all has serious potential for net benefit, and does now in certain cases. But we need to be clearer about spelling out where that is (webshit, boilerplate, language-to-language translation, etc) and where it maybe isn't (research code, legacy code, large codebases, niche/expert domains).
This Stanford study on developer productivity found zero correlation between developers' assessment of their own productivity and independent measures of their productivity. Any anecdotal evidence from developers on how AI has made them more or less productive is worthless.
Yup, most progress is also confined to SWEs doing webshit / writing boilerplate code. For anything specialized, LLMs are rarely useful, and this is all ignoring the future technical debt of debugging LLM code.
I am hopeful about LLMs for SWE, but the progress is currently contextual.
Even if LLMs could write great code with no human oversight, the world would not change overnight. Human creativity is necessary to figure out what to produce that will yield incremental benefits over what already exists.
The humans who possess such capability stand to win long-term; said humans tend to be those from the humanities and liberal arts.
> I've been following Yann for years and in my opinion he's been consistently right
Lol. This is the complete opposite of reality. You realize LeCun is memed for all his failed assertions about what LLMs cannot do? Look it up. You clearly have not been following closely, at all.
Sure and that is fair. Seldom are extreme viewpoints likely scenarios anyways, but my disagreement with him stems from his unwarranted confidence in his own abilities to predict the future when he's already wrong about LLMs.
He has zero epistemic humility.
We don't know the nature of intelligence. His difficulties in scaling up his research are a testament to this fact. This means we really have no theoretical basis upon which to rest the claim that superintelligence cannot in principle emerge from LLM-adjacent architectures; how can we make such a statement when we don't even know what such a thing looks like?
We could be staring at an imperative definition of superintelligence and not know it, never mind that approximations to such a function could in principle be learned by LLMs (universal approximation theorem). It sounds exceedingly unlikely, but would you rather be comforted by false confidence or be told honestly what our current understanding of the sciences can tell us?
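For reference, the classical result being invoked (Cybenko / Hornik, stated loosely here; note it is a statement about approximation capacity, not about whether gradient descent on finite data actually finds such an approximation):

```latex
% Universal approximation, single hidden layer (informal):
% for any continuous f on a compact set K \subset \mathbb{R}^n and any \varepsilon > 0,
% there exist N and parameters w_i, a_i, b_i such that
\left| \, f(x) \;-\; \sum_{i=1}^{N} w_i \, \sigma\!\left(a_i^{\top} x + b_i\right) \right| \;<\; \varepsilon
\qquad \text{for all } x \in K,
% where \sigma is a fixed non-polynomial activation function.
```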
Human beings are estimated to use roughly 50 to 100W when idle (up to maybe 1000-2000W when exerting ourselves physically), and I think it's fair to say we're generally intelligent.
Something is fundamentally missing with LLMs w.r.t. intelligence per watt. Assuming GPT-4 is around human intelligence, that needs 2-4 H100s, so roughly the same, and that doesn't include the rest of the computer.
That being said, we're willing to brute force our way to a solution to some extent so maybe it doesn't matter, but I say the fact that we don't use that much energy is proof enough we haven't perfected the architecture yet.
At 5 cents or less per kWh these days, 10 kW is 50 cents per hour, well below minimum wage. LLMs aren't AGI, and I'm not convinced we're anywhere close to AGI, but they are useful. That the people deploying them have the same product instincts as Microsoft executives seems to be the core issue.
That being said, in this setup of 2-4 H100s you'll be able to generate with a batch size of somewhere around 128, i.e. it's serving 128 humans and not one. And just like that, the difference in efficiency isn't that high anymore.
Correct - an H100 can do something like 100 tokens per second on a GPT-4-like model, but you'd need to account for regular fine-tuning to accurately compare to a person, hence 4 or so. Of course the whole comparison is inane since computers and humans are obviously so different ha...
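Rough back-of-envelope for the batching point above, with everything an assumption (H100 board power, batch size, and the human figure are ballpark numbers from this thread, not measurements):

```python
# Assumed ballpark figures (from the thread, not measured):
h100_watts = 700          # per-GPU board power
num_gpus = 4
batch_size = 128          # concurrent generation streams served at once
human_idle_watts = 100

total_watts = h100_watts * num_gpus            # ~2800 W for the whole setup
watts_per_stream = total_watts / batch_size    # ~22 W per concurrent "user"

print(f"total draw:   {total_watts} W")
print(f"per stream:   {watts_per_stream:.0f} W")
print(f"human (idle): {human_idle_watts} W")
# Per concurrent stream the draw lands below the human idle figure, which is
# why batching makes the raw efficiency gap look much smaller than 4 H100s vs 1 person.
```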
I don't get the anti-LLM sentiment, because plenty of trends continue to show steady progress with LLMs over time. Sure, you can poke at some dumb things LLMs do as evidence of some fundamental issue, but the frontier capabilities continue to amaze people. I suspect the anti-LLM sentiment comes from people who haven't seriously tried to see for themselves all the things they're capable of. I used to be skeptical, but I've changed my mind quite a bit over the past year, and there are many others who've changed their stance towards LLMs as well.
Or, people who've actually trained and used models in domains where "stuff on the internet" is of no relevance to what you are actually doing realize the profound limitations of what these LLMs actually do. They are amazing, don't get me wrong, but not so amazing in many specific contexts.
It'll steadily continue the same way Moore's law has continued for a while. I don't think people question the general trend in Moore's law, except that it's now nearing the limits of physics. It's a lot harder to make the universal claim that LLMs don't work, whereas claiming something is possible for LLMs only needs some evidence.
LeCun has already been proven wrong countless times over the years regarding his predictions of what LLMs can or cannot do. While LLMs continue to improve, he has yet to produce anything of practical value from his research. The salt is palpable, and he's memed for a reason.