
> Deep learning works, and we will solve the remaining problems. We can say a lot of things about what may happen next, but the main one is that AI is going to get better with scale

I'm not an AI skeptic at all; I use LLMs all the time and find them very useful. But stuff like this makes me very skeptical of the people who are making and selling AI.

It seems like there was a really sweet spot wrt the capabilities AI was able to "unlock" with scale over the last couple of years, but my high-level sense is that each meaningful jump in baseline raw "intelligence" required an exponential increase in scale, in terms of training data and computation, and we've reached the ceiling of "easily available" increases. It's not as easy to pour "as much as it takes" into GPT-5 if it turns out you need more than a Microsoft.



The question is: For a given problem in machine intelligence, what's the expected time-horizon for a 'good' solution?

Over the last, say, five years, a pile of 50+ year problems have been toppled by the deep learning + data + compute combo. This includes language modeling (divorced from reasoning), image generation, audio generation, audio separation, image segmentation, protein folding, and so on.

(Audio separation is particularly close to my heart; the 'cocktail party problem' has been a challenge in audio processing for 100+ years, and we now have great unsupervised separation algorithms (MixIT), which hardly anyone knows about. That's an indicator of how much great stuff is happening right now.)

So, when we look at some of our known 'big' problems in AI/ML, we ask, 'what's the horizon for figuring this out?' Let's look at reasoning...

We know how to do 'reasoning' with GOFAI, and we've got interesting grafts of LLMs+GOFAI for some specific problems (like the game of Diplomacy, or some of the math olympiad solvers).

"LLMs which can reason" is a problem which has only been open for a year or two tops, and which we're already seeing some interesting progress on. Either there's something special about the problem which will make it take another 50+ years to solve, or there's nothing special about it and people will cook up good and increasingly convenient solutions over the next five years or so. (Perhaps a middle ground is 'it works but takes so much compute that we have to wait for new materials science for chip making to catch up.')


> we will solve the remaining problems

This is the part that really gets me. This is a thing that you say to your team, and a thing you say to your investors, but it isn't a thing that you can actually believe with certainty, is it?


you need some amount of irrational definite optimism + knowing things others don't to be a good founder. that kind of reality distortion field is why sam is sam and we are here debating phrases on an orange website.


Related, I tongue-in-cheek believe that something analogous to the actual SCP object for a "reality distortion field" may in fact exist. There is zero good explanation for "Teflon Don" or the North Carolina Lieutenant Governor getting away with all the stuff they do while Al Franken got politically crucified.


The least-magical answer for that is that some people have fundamentally different ways of approaching the world, and certain things will be tolerated by certain sets of supporters.


With enough time it seems a reasonable assertion, but the key part is how much time. It feels like he thinks "any day now" where I think it'll be much longer. This all of course assumes that "the remaining problems" means to achieve human-like intelligence, which is perhaps the wrong problem to be solving in the first place. I'd rather have AI systems that don't have human flaws.


Why not? People believe in all sorts of weird stuff; theirs just happens to be one you don't agree with. Some people believe there are gods up in the sky that will smite them, and go to war with people who believe in a different god that will smite them for different reasons. Some people believe we landed on the Moon, others do not. What matters is what you can convince others to do based on your rationale.


With the next funding round, all of this will be sorted out.


Scaling improvement has never been linear, though. Every next-gen model so far has required at least an order-of-magnitude increase in compute, sometimes several more. So it's not a new revelation, and these companies are aware of it. Microsoft, for instance, is building a $100B data center for a future next-generation model releasing in 2028.

If models genuinely keep making similar leaps each generation, then we're still a few generations away from "more than a Microsoft".
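
A crude way to see how quickly "an order of magnitude per generation" compounds (rough sketch only: the $100B starting point is the data-center figure above, and the ~$3T market-cap line is just an assumed order of magnitude for scale):

    # Hypothetical compounding of per-generation compute cost.
    gen_cost = 100e9            # ~$100B data center for the next generation
    msft_scale = 3e12           # rough order of Microsoft's market cap, for comparison only

    for gen in range(1, 5):
        print(f"gen {gen}: ~${gen_cost:,.0f}")
        if gen_cost > msft_scale:
            print("  ...more than a Microsoft")
        gen_cost *= 10          # "at least an order of magnitude increase in compute"

On those (very hand-wavy) numbers you cross the "more than a Microsoft" line around the third generation out.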


So at what point do the linear increases in capability not justify the exponential compute and data requirements, or when do we run out of resources to throw at it?


I never said I thought the increase in ability was linear either. We're encroaching on phenomena that are genuinely hard to describe or put a number on, but GPT-3 is worlds apart from 2, and it feels like 4 is easily ten times better than the OG 3. I can say improvement lags behind compute somewhat, but that's really it.

That said, it's ultimately up to the people footing the bill, isn't it?


Microsoft seems concerned enough to greatly increase its size:

https://www.bloomberg.com/news/articles/2024-09-20/microsoft...


It's about stuff we don't know yet. Through today's lens, the essay seems absurd. But I think it's hinging on continued discoveries that improve one or all of: learning algorithms, compute efficiency, compute cost, and how well we apply algorithms to real-world problems.

5 years ago, I wouldn't have believed any of what exists today. I saw internal demos that showed 2nd or 3rd grade reading comprehension in 2017 and statements were made about how in the next decade, we will probably reach college level comprehension. We have come so far beyond that in less than half the time. Technology isn't about scaling incrementally and continuing on the same path using the same principles we know today. It's about disruption that felt impossible before - that feels like a constant to me now. Seeing everything I've seen in the last 20 years, it's going to continue to happen. We just can't see it yet.


Yes, you are correct that jumps in intelligence were enabled by exponential increases in scale. That makes me more bullish on AI, not less. It suggests that we can continue exponentially scaling compute like we have done for the past few decades, and also get intelligence improvements from it.


As large as the absolute largest models are today, they are still microscopic compared to our brains. A 1.7T param model would only store an actual total of about 850 GB if fully saturated (4 bits of information per weight estimated for bf16 transformers), a lot less than a human brain with 150T synapses running in full analog precision. We need to scale the current gen of models at least another 10-100x to even reach the human level of complexity, something we'll be able to do in the next two decades.
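
For a back-of-envelope check on those numbers (a rough sketch; the 4-bits-per-weight figure is the estimate above, and applying the same figure to synapses is a loose assumption on my part):

    # Rough information-capacity comparison, not a claim about how brains actually store things.
    params = 1.7e12            # 1.7T-parameter model
    synapses = 150e12          # ~150T synapses in a human brain
    bits_per_unit = 4          # estimated usable bits per weight (and, loosely, per synapse)

    model_bytes = params * bits_per_unit / 8
    brain_bytes = synapses * bits_per_unit / 8
    print(model_bytes / 1e9, "GB")          # ~850 GB
    print(brain_bytes / 1e12, "TB")         # ~75 TB
    print(brain_bytes / model_bytes)        # ~88x, i.e. inside the 10-100x range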

And then there's going beyond just text. Current multimodal models are basically patchwork bullshit: separately trained image/audio-to-text/embedding encoders slapped onto an existing model in the hope that it does something neat. Tokenization and samplers are likewise bullshit that's there to compensate for a lack of training compute. Once we have enough to brute-force it properly with bytes in, bytes out, regardless of data format, the results should be way better.


> 150T synapses running in full analog precision

Analog systems are not known for being very precise - they're noisy, signals get corrupted easily - and that's why we prefer digital ones. As soon as we had the technology, we switched everything we could - audio and video recording, telephone calls, photography - to a digital medium. This makes me wonder if the seemingly extraordinary efficiency of artificial neural networks is simply due to the precision with which they can be trained.


With respect to brain activity, how do we know it's really noise, and not just layers of meaning (or at least purpose) which we don't yet understand?

If straightforward binary signaling were so universally superior, I think the worldwide network of over a quintillion ruthlessly self-replicating nanobots would be using it much more heavily after the last billion years.


Electrical analog systems are, since there are 1001 ways to create or conduct away electricity that add noise; chemical ones might not suffer nearly as much.

There's also an interesting bit I've observed with LLMs: quantization in the 4-8 bit range doesn't reduce performance so much as it reduces consistency. If you generate an answer a bunch of times and take the average, you'll end up with roughly the same result as an fp16/32 model would give every time. In nature, being inconsistent is usually punished with death... so that's probably why it hasn't caught on, even if it is more efficient. Or this is enough of a different abstraction that we can't draw parallels anyway.
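
A minimal sketch of what I mean by "take the average" (hypothetical numbers; the point is just that noisy-but-unbiased samples converge on the full-precision answer):

    import random, statistics

    FP16_ANSWER = 42.0   # stand-in for what the full-precision model would say every time

    def quantized_model():
        # Stand-in for a 4-8 bit model: same expected answer, more run-to-run noise.
        return FP16_ANSWER + random.gauss(0, 3.0)

    samples = [quantized_model() for _ in range(32)]
    print(statistics.mean(samples))   # clusters near 42.0 as the sample count grows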


Or we are just the biological bootloader for further evolution of binary signalling.


Comparing a human brain in these terms makes it incredibly obvious how inefficient the human brain actually is. A 1.7T model can answer questions about practically anything. You say a human brain has 150T params. So what? It struggles to give masterful answers in even 1 domain, let alone dozens/hundreds. We need to stop comparing parameters and synapses as if they actually matter, because AFAIK, they really don't.


Well, once again it turns out that what is hard for people is easy for computers, and vice versa. The things we go to college for six years to learn, they can (relatively) master in a week of pretraining. We are optimized to smartly kill things, eat them, and reproduce; that's what machines will beat us at last, lol. Right now a human expert is still obviously better in depth, but nowhere close in breadth. Probably not for much longer though, at least on the historical time scale.

And granted, a lot of parts of the human brain are dedicated to specific jobs that are entirely nonexistent in a normal LLM (kinematics, vision, sound, touch, taste, smell, autonomic organ control), so the actual bit we should be comparing for just language and reasoning is way smaller. Still, the brain is pretty efficient inference-energy-wise; it's like the ultimate mixture of experts: extreme amounts of sparse storage, and most of it is not computed until needed. The router must be pretty good.


OTOH humans can e.g. walk on two feet and drive a car.


Boston Dynamics and Waymo might not have reached human levels of competency at those two particular tasks, but we've already got robots that are better than drunk/tired/angry humans at them, and they're getting better.


In terms of energy use the human brain is way more efficient than LLMs. It's a completely different hardware model - the brain may have trillions of synapses but they only fire occasionally. I agree you have to compare more on the results than number of synapses etc.

It's something that gives me pause on the idea that we have to build many GW of power stations. It may be possible to get much more energy efficient AI via better algorithms.
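
Rough numbers just to illustrate the gap (the ~20 W brain figure is the usual textbook estimate; the GPU wattage and cluster size are assumed ballpark values, not any specific deployment):

    brain_watts = 20          # typical estimate for a human brain's power draw
    gpu_watts = 700           # ballpark for one modern datacenter accelerator
    cluster_gpus = 10_000     # hypothetical cluster size

    print(gpu_watts * cluster_gpus / 1e6, "MW for the cluster")            # ~7 MW
    print(gpu_watts * cluster_gpus / brain_watts, "brain-equivalents")     # ~350,000 brains' worth of power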


> Comparing a human brain in these terms makes it incredibly obvious how inefficient the human brain actually is

Until you have AGI you can't say this, since until then we don't know how much the different parts cost to replace with AI systems.


> Comparing a human brain in these terms makes it incredibly obvious how inefficient the human brain actually is. A 1.7T model can answer questions about practically anything. You say a human brain has 150T params. So what? It struggles to give masterful answers in even 1 domain, let alone dozens/hundreds. We need to stop comparing parameters and synapses as if they actually matter, because AFAIK, they really don't.

Meanwhile, the best-in-class AI is trained on the output of human intelligence. When AI can learn by itself in the same way humans do (or in even more efficient ways), that's when we can say human intelligence has been surpassed. Until then, it's just tools for humans to use to augment their intelligence.


xAI seems to be able to dump 10-20x more compute into their Grok models each time. Don't see any signs this is slowing down...


> But stuff like this makes me very skeptical of the people who are making and selling AI.

What is there to be skeptical of? OpenAI made their current product using a $10B investment plus a few more they are not disclosing, and now they will start to do it at scale.

Perfectly normal stuff.

By the way, what's the World's GDP again?


Even "we will solve the remaining problems" is... perhaps unduly optimistic.

At a minimum, we could ask for the evidence.


I'm not here to defend sama, but certain things cannot be proven until they arrive - they can only be extrapolated from existing observations and theoretical limits.

Imagine the Uranium Committee of the early '40s, where Szilard and others were babbling about 10 kg of some magical metal briefly exploding with the power of a sun, with the best evidence being some funky trail in an alcohol-vapor cloud chamber.

Maybe sama is right, maybe not, but the absence of evidence is not evidence of absence.


I'm sure you know that people in the AI community have been predicting big things ever since, I don't know, the 1970s? It's only 10 years away again. This time it's for real, right?


Alchemists predicted the transmutation of metals into gold for centuries, and on a sunny day in the 20th century, it arrived (a bit radioactive, but still).

Unless the human brain is made of some sacred substance, the worst-case scenario is that we extrapolate current scanning methods into the future and run the scanned model in silico. I'm not recommending this "just for fun," but the laws of physics don't forbid it.


If you are comparing AI to alchemy, a subject that after thousands of years still isn’t delivering on its promises (even with the assistance of modern technological magic), then surely you can see how that’s something of a self-own.


The transmutation of uranium into plutonium and the synthesis of medically useful isotopes proceed successfully.


That's not alchemy.

When we are successfully turning base metals into gold, hit me up.


I concede.


>Alchemists predicted the transmutation of metals into gold for centuries, and on a sunny day in the 20th century, it arrived (a bit radioactive, but still).

So is Sam Altman the modern day alchemist? Making predictions based on faulty methods and faulty understanding (per your gold example)?

What will happen is that we'll shift the economy around based on inflated tech promises and ruin people's lives. No big deal I guess.


> So is Sam Altman the modern day alchemist?

Alchemists were early scientists who later branched into fields like chemistry, mathematics, and physics (Newton explored alchemy).

Altman leads a team of experts in neural networks, programming, and HW design. While he might be mistaken, dismissing him outright is difficult.


The AI predictions based on Moore's-law-type reasoning by Kurzweil, Moravec, etc. have been pretty accurate and not subject to the "it's always 10 years away" thing.


Oh ok. I thought we were talking about the article (or at least claims that are just as bold):

"Deep learning works, and we will solve the remaining problems."

"It is possible that we will have superintelligence in a few thousand days (!)"


It was more in reply to "people in the AI community have"... which some of them have, but the Moravec-type stuff has been quite accurate.

Technically a few thousand days covers quite a range. 20 thousand is 55 years.

On the Kurzweil graph, extrapolating hardware progress from 1900 through 2000, superintelligence seems to be roughly 2035, depending on how you define things. https://www.researchgate.net/figure/Kurzweils-8-71-chart-of-...


Is GPT-4 not a “big thing in AI”?


It's extremely spicy autocomplete and it burns astonishing amounts of natural resources to reach even that lofty peak


I have little doubt that even when we have superintelligent AI solving science and such problems way beyond humans it will still be dismissed as extra spicy autocomplete.


Right. Certain things cannot be proven until they arrive. Maybe sama is right, maybe not. But his certainty is misleading.


I agree. He's probably been conditioned by experience to speak with confidence until proven wrong ("strong opinions, weakly held"), but I don't like it either. Oh... the lost art of saying, "In my opinion."


Hah, atomic power is a great point of comparison: people in the "atomic age" expected atomic power to be everywhere. Electricity too cheap to meter, cars/planes/appliances all powered by small nuclear reactors... That's without going into the real nonsense like radium toothpaste.

And here we are today where nuclear energy is limited to nuclear weapons, a small set of military vehicles and <10% of the world's electricity production. Not nothing, sure, but nothing like past predictions either.


Last I checked, the giant nuclear fusion reactor in the sky is driving a substantial increase in solar energy.

The toothpaste and similar products were pretty ill advised; Vaseline and uranium glass are still collectible and are seeing a resurgence of new interest: https://old.reddit.com/r/uraniumglass/


"They laughed at Columbus, they laughed at Fulton, they laughed at the Wright brothers. But they also laughed at Bozo the Clown."


You should _always_ be skeptical of what’s said by anyone who is trying to sell something.

Like everything else being sold, the marketing is 95% BS.

LLMs are amazing and wonderful tools, but methinks we're near a plateau in capability. Now investors are pumping for ROI before that becomes evident.

After we reach the plateau in capabilities, the next phase is cutting production and operating costs to maximize margins.

I’m the meantime, expect the marketing to get increasingly cringe until the bubble bursts.


Progress might be logarithmic in compute, but compute (transistors per square inch and transistors per dollar) is growing exponentially with time.

Despite what skeptics have been saying for decades, Moore's Law is alive and well - and we haven't even figured out how to stack wafers in 3 dimensions yet!
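
If capability really does go like the log of compute while compute doubles on a fixed cadence, the two cancel into roughly linear progress over time. A toy illustration (purely schematic; the 2-year doubling period is an assumption):

    import math

    def compute(year, doubling_years=2.0, base=1.0):
        # Exponential growth in available compute over time.
        return base * 2 ** (year / doubling_years)

    def capability(c):
        # "Progress is logarithmic in compute."
        return math.log2(c)

    for year in range(0, 21, 4):
        print(year, capability(compute(year)))
    # Capability climbs by the same amount every 4 years: exponential compute
    # turns logarithmic returns into steady linear gains.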



Sure - we are just beginning to scratch the surface of 3d stacking.


Oh wow! Could you please share what processors are exponentially faster than those of 10 years ago? I'm not seeing any here: https://www.cpubenchmark.net


MacBook Airs have 20 billion+ transistors, compared to 50 million on the Pentium 4 in the early 2000s. Moore's law is about transistor density, not processor speed, which is gated by thermal limits.
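
Working that out (rough endpoints: ~50M transistors around 2002 and ~20B around 2022, so the 20-year span is an assumption on my part):

    import math

    old, new, years = 50e6, 20e9, 20
    doublings = math.log2(new / old)      # ~8.6 doublings over the span
    print(years / doublings)              # ~2.3 years per doubling, close to the classic Moore's-law cadence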


Transistor count has consistently been increasing by about 10% a year over the last decade.



