> The large language models are a little surprising. I’ll give you that.
I think this is the key point about LLMs that kind of explains the wide and polarized views on whether it understands or parrots, whether it can think or is the precursor to thinking or is a dead-end, whether it will catastrophically destroy the world, or “merely” make it steadily worse with bullshit, or just put a few industries out of a job.
Almost nobody is really surprised that if you throw more compute at a neural net it becomes better at the task it’s trained on. But almost everybody is really surprised that becoming better at a task like ‘natural language prediction’ would produce all these strange abilities that sort of look like “understanding the world”.
One way to resolve this surprise is to find some reason to believe these strange abilities are fundamentally not an understanding of the world. Thus stochastic parrots, this article, Yann LeCun and Chomsky, etc.
Another way to resolve this surprise is to find some reason to believe these strange abilities fundamentally are an understanding of the world. Thus regulation of AI, existential risk, Hinton and Yudkowsky, etc.
I don’t know what the correct resolution of the surprise is. The only thing I’m confident in is that it’s correct to be surprised by the abilities of LLMs. My current (tentative) resolution of the surprise is that language encoded way more information about reality than we thought it did. (Enough information that you can fully derive reality from language seems improbable, but iirc it did derive Othello and partly derived chess and I would have thought there wasn’t enough information in language to derive those without playing the games as well, so I can’t rule it out.)
> My current (tentative) resolution of the surprise is that language encoded way more information about reality than we thought it did.
I think you are close to the mark, but you have been subtly misled: language is not the data we are working with. We are working with text.
Once you fix that particular failure of word choice, everything else becomes much more clear: text contains much more information than language.
We aren't dealing with just any text, either: that would be noise. We're training LLMs on written text.
Natural language is infamous for one specific feature: ambiguity. There are many possible ways to write something, but we can only write one. We must choose: in doing so, we record the choice itself, and all of the entropy that informed it.
That entropy is the secret sauce: the extra data that LLMs are sometimes able to model. We don't see it, because we read language, not text.
The big surprise is that LLMs aren't able to write language: they can only write text. They don't get tripped up reading ambiguity, but they can't avoid writing it, either. Who chooses what an LLM writes? Is it a mystery character who lives in a black box, or a continuation of the entropy that was encoded into the text that LLM was trained on?
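One way to picture that "entropy of the choice" point: if a writer could have phrased the same message in several ways, the phrasing they actually wrote down carries extra bits on top of the bare message. A toy sketch in Python, with made-up probabilities purely for illustration:

    import math

    # Hypothetical probabilities a writer assigns to alternative phrasings
    # of the same underlying message (made-up numbers, just to illustrate).
    phrasings = {
        "the cheese is in the fridge": 0.5,
        "you'll find the cheese in the refrigerator": 0.3,
        "cheese: fridge, top shelf": 0.2,
    }

    # Shannon entropy of the choice, in bits: how much information the act
    # of choosing one phrasing adds on top of the message itself.
    entropy = -sum(p * math.log2(p) for p in phrasings.values())
    print(f"entropy of the phrasing choice: {entropy:.2f} bits")

    # Surprisal of the phrasing that was actually written down.
    chosen = "cheese: fridge, top shelf"
    print(f"surprisal of the chosen phrasing: {-math.log2(phrasings[chosen]):.2f} bits")

The numbers are invented; the point is only that every written sentence is also a record of a choice among alternatives, and that choice itself carries information.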
There’s an exercise that some people do when learning programming, which is to write down the steps to make a sandwich. Then the teacher follows the exact instructions to make a sandwich, and since most people don’t put in enough detail for a computer to follow (i.e. open the fridge, etc.), the teacher ends up running around bumping into things. That used to be a teaching exercise to show people the amount of precision required when telling a machine what to do.
Now with LLMs, I think one of the great leaps is the idea that it’s no longer necessary to be “pedantic” when giving computers instructions, because LLMs have somehow learned to fill in the blanks with a similar shared “understanding” of the world that we have (i.e. cheese is stored in the fridge, so you have to go open the fridge to fetch the cheese for the sandwich).
I don't get the "magic" people are seeing. It makes sense.
>LLMs have somehow learned to fill in the blanks
It's not somehow, it's because they have read a ton of books, documents, etc and can make enough links between cheese and refrigerator and follow that back to know that a refrigerator needs to be opened.
I have seen a lot of very clever AI examples using the latest tools, but I haven't seen anything that seems difficult to deconstruct.
It depends on the task. For certain programs you absolutely need to be pedantic in describing what needs to happen. There is a reason we don't program in natural language and that won't change with LLMs.
Don’t forget they hold vector spaces. So fridge and cheese score high together for cohesiveness, while fridge and Antarctica score lower, though both have something to do with cold. Together with all the training on text, this creates a good ability to make inferences and “conclusions”. It has a web of lines of meaning connecting all the concepts we fed it, which gives it the ability it has, without actually understanding.
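To make "score high together for cohesiveness" concrete, here's a minimal sketch with made-up toy vectors (real embeddings are learned from text and have hundreds or thousands of dimensions); cosine similarity is the usual cohesiveness score:

    import numpy as np

    # Made-up 4-dimensional "embeddings" purely for illustration; imagine the
    # first dimension loosely encoding "cold", the others food/household/place.
    vectors = {
        "fridge":     np.array([0.9, 0.8, 0.1, 0.0]),
        "cheese":     np.array([0.8, 0.7, 0.3, 0.1]),
        "antarctica": np.array([0.7, 0.1, 0.0, 0.9]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print("fridge ~ cheese:    ", round(cosine(vectors["fridge"], vectors["cheese"]), 3))
    print("fridge ~ antarctica:", round(cosine(vectors["fridge"], vectors["antarctica"]), 3))

With these toy numbers, fridge/cheese scores higher than fridge/antarctica, while both pairs still share the "cold" dimension, which is roughly the effect described above.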
That conflates perception with perceiver. LLMs have only internalized [/encoded] our perceptions and expressions. From a model of the mind pov, the 'self' that we sense has an internal LLM-like tool. And it is that self that understands and not the tool.
> From a model of the mind pov, the 'self' that we sense has an internal LLM-like tool. And it is that self that understands and not the tool.
I'm starting to think it's the other way around. I think it's somewhat widely accepted that our brains do most of the "thinking" and "understanding" unconsciously - our conscious self is more of an observer / moderator, occasionally hand-holding the thought process when the topic of interest is hard, and one isn't yet proficient[0] in it.
Keeping that in mind, if you - like me - feel that LLMs are best compared to our "inner voice", i.e. the bit on the boundary between conscious and unconscious that uses language as an interface to the former, then it's not unreasonable to expect that LLMs may, in fact, understand things. Not emulate, but actually understand.
The whole deal with a hundred thousand dimensional latent space? I have a growing suspicion that this is exactly the fundamental principle behind how understanding, thinking in concepts, and thinking in general works for humans too. Sure, we have multiple senses feeding into our "thinking" bit, but that doesn't change much.
At a conceptual, handwavy level (I don't know the actual architecture and math details well enough to offer more concrete explanations/stories), I feel there are too many coincidences to ignore.
Is it a coincidence that someone trained an LLM and an image network, and found their independently learned latent spaces map to each other with a simple transform? Maybe[1], but this also makes sense - both networks segmented data about the same view of reality humans have. There is no reason for LLMs to have an entirely different way of representing "understanding" than img2txt or txt2img networks.
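As a sketch of what "map to each other with a simple transform" could mean, here's a toy version in Python with entirely synthetic data standing in for the two embedding spaces: fit one least-squares linear map from the text-like space onto the image-like space and see how much it explains.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-ins for two independently learned embedding spaces that
    # describe the same 200 concepts (toy data, not real model outputs).
    n_concepts, d_text, d_image = 200, 16, 12
    shared = rng.normal(size=(n_concepts, 8))  # "the world" both models saw
    text_space = shared @ rng.normal(size=(8, d_text)) + 0.05 * rng.normal(size=(n_concepts, d_text))
    image_space = shared @ rng.normal(size=(8, d_image)) + 0.05 * rng.normal(size=(n_concepts, d_image))

    # A single linear transform W mapping text embeddings onto image embeddings.
    W, *_ = np.linalg.lstsq(text_space, image_space, rcond=None)

    residual = np.linalg.norm(text_space @ W - image_space) / np.linalg.norm(image_space)
    print(f"relative error after one linear map: {residual:.3f}")

If the two spaces really encode the same underlying structure, one linear map recovers most of it and the residual is small; if they didn't, no simple transform would work.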
Assuming the above is true, is it a coincidence that it offers a decent explanation for how humans developed language? You start with an image/sound/touch/other-senses acquisition and association system forming a basic brain, predicting the next sensations and driving actions. As it evolves in size and complexity, the dimensionality of its representation space grows, and at some point the associations cluster into something of a world model. Let evolution iterate some (a couple hundred thousand years) more, and you end up with brains that can build more complex world models, working with more complex associations (e.g. vibration -> sound -> tone -> grunt -> phrase/song). At this level, language seems like an obvious thing - it's taking complex associations of basic sensory input and associating them wholesale with different areas of the latent space, so that e.g. a specific grunt now associates with danger, a different one with safety, etc. Once you have brains able to do that naturally, it's pretty much a straight line to a proper language.
Yes, this probably comes across as a lot of hand-waving; I don't have the underlying insights properly sorted yet. But a core observation I want to communicate, and recommend people to ponder on, is continuity. This process gains capabilities in a continuous fashion as it scales - which is exactly the kind of system you'd expect evolution to lock on to.
--
[0] - What is "proficiency" anyway? To me, being proficient in a field of interest is mostly about... shifting understanding of that field to unconscious level as much as possible.
[1] - This was from one paper I am aware of; they probably didn't do good enough controls, so it might turn out to be happenstance.
[I may have to take you up on your profile offer of out of band continuation of this as there is a lot here to delve into and it would make for interesting conversation.]
The model of the psyche that I subscribe to is ~Jungian, with some minor modifications. I distinguish between the un-conscious, the sub-conscious, and consciousness. The content of the unconscious is atemporal, whereas the content of the (sub-)conscious is temporal. In this model, background processing occurs in the sub-conscious, -not- the un-conscious. The unconscious is a space of ~types which become reified in the temporal regime of (sub-)consciousness [via the process of projection].

The absolute center of the psyche is the Self and this resides in the unconscious; the self and the unconscious content are not directly accessible to us (but can be approached via contemplation, meditation, prayer, dreams, and visions: these processes introduce unconscious content into the conscious realm, which when successfully integrated engenders 'psychological wholeness'). The ego -- the ("suffering") observer -- is the central point of consciousness. Self realization occurs when the ego assumes a subordinate position to the Self, abandons "attachment" to perceived phenomena & disavows "lordship", i.e. the false assumption of its central position, at which point the suffering ends.

This process, in various guises, is the core of most spiritual schools. And we can not discount these aspects of human mental experience, even if we choose to assume a critical distance from the theologies that are built around these widely reported phenomena. I am not claiming that this is a quality of all minds, but it seems it is characteristic of human minds.
The absolute minimum point that you should take away from this (even if the above model is unappealing or unacceptable or woo to you /g) is that we can always meaningfully speak of a psychology when considering minds. If we can not discern a psychology in the subject of our inquiry then it should not be considered a mind.
I do -not- think that we can attribute a psychology to large language models.
~
Your comment on the mapping of the latent spaces is interesting, but as you note we should probably wait until this has been established before jumping to conclusions.
And also please excuse the handwavy manner of my comment as well. We're all groping in the semidarkness here.
Yeah, I guess you could see it that way. Object <> Symbolism (aka words, thoughts, concepts, art) <> Meaning. Meaning is knowing how an object relates to others. Language is a kind of information web already, where each word is a hyperlink into meaning.
This is a new idea that I had (or at least consciously noticed) for the first time a few days ago, but - I really don't think the meaning is in words. The words/terms themselves are more like information-free[0] points. The meaning is entirely determined by links. This works, because the links eventually lead you to ground truth - sensory inputs.
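A toy sketch of that "meaning is in the links" idea: words as bare labels whose only content is their edges, with every chain of links eventually bottoming out in stand-in sensory nodes (all names here are invented for illustration):

    # Toy "web of links": each word is an empty label; all it has is edges.
    # SENSE_* nodes stand in for raw sensory ground truth.
    links = {
        "cheese":  ["food", "fridge", "SENSE_smell", "SENSE_taste"],
        "fridge":  ["cold", "kitchen", "box"],
        "cold":    ["SENSE_touch"],
        "food":    ["SENSE_taste"],
        "kitchen": ["room"],
        "room":    ["SENSE_sight"],
        "box":     ["SENSE_sight", "SENSE_touch"],
    }

    def ground(word, seen=None):
        """Follow links until they bottom out in sensory nodes."""
        seen = set() if seen is None else seen
        if word in seen:
            return set()
        seen.add(word)
        if word.startswith("SENSE_"):
            return {word}
        grounded = set()
        for neighbour in links.get(word, []):
            grounded |= ground(neighbour, seen)
        return grounded

    print("'fridge' grounds out in:", sorted(ground("fridge")))

Nothing about the string "fridge" matters here; everything the toy graph "knows" about it is which other nodes it reaches, and which sensory nodes those eventually touch.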
Even then, you can see some of the "pedantic" cases when it comes to actually understanding the nature of the connections between those concepts. For example, it's very easy to get it to reverse shorter/taller or younger/older in clearly defined relationships.
One experiment that I would love to see is an LLM-like model for audio. Feed it hours and hours of lectures, sound effects, animal calls, music etc. You would be able to talk to it and it would ingest the raw waveform then produce audio as a response. Would it learn the fundamentals of music theory? Would it learn to produce "the sound of a bowling ball hitting a dozen windchimes?" Would it learn to talk in English and communicate with whales?
We've already done text and images, now someone please do sound!
Uhhh...this is out there, from like a dozen different groups. Not going to do a full Googling for you on my phone because it's literally everywhere but "LLM for audio" gives https://ai.googleblog.com/2022/10/audiolm-language-modeling-... as the first result...some of this stuff is already really impressive.
> Would it learn the fundamentals of music theory?
No, but you might convince yourself it did.
It would map the patterns that exist in its training set. It would then follow those patterns. The result would look like a human understanding music theory, but it would not be that.
It would be stumbling around exactly the domain we gave it: impressive because that domain is not noise, it's good data. It still wouldn't be able to find its way around, only stumble.
Not sure how that's any different than a model doing something with another model, as in AutoGPT. What part is objective? A model can be wrong just like a human can be wrong or spread falsehoods too.
A model can't be right or wrong, because it doesn't actually make any logical decisions.
These are categorizations that we make after the fact. If the model could do the same categorization work, then it could actively choose correct over incorrect.
Models could potentially make logical decisions too, if we connect them to something like a classical computer or a rules engine. I don't see any fundamental barriers to making models and computers in general similar to humans' way of understanding and reasoning too.
I don't really understand your distinction between language and text, but it sounds intriguing. Would you be able to give more detail? I searched but couldn't find anything that seemed to explain it.
Text is an instance of language. Think of it as the difference between the python language and a large collection of python programs. The language describes syntactic and semantic rules, the collection is a sampling of possible programs that encodes a significant amount of information about the world. You could learn a lot about the laws of nature, the internet, even human society and laws by examining all the python programs ever written.
An extreme version of the same idea is the difference between understanding DNA vs the genome of every individual organism that has lived on earth. The species record encodes a ton of information about the laws of nature, the composition and history of our planet. You could deduce physical laws and constants from looking at this information, wars and natural disasters, economic performance, historical natural boundaries, the industrial revolution and a lot more.
If a student studies DNA sequencing, they’ll learn about the compounds that make up DNA, how traits get encoded, etc.
Therefore the student might expect an AI trained on people’s DNA to be able to tell you about whether certain traits are more prevalent in one geography or the other.
However, since DNA responds to changes in environment, the AI would start to see time, population, and geography-based patterns emerge.
The AI for example could infer that a given person in the US who’s settled in NYC had ancestors from a given region of the world who left due to an environmental disaster just by looking at a given DNA sequence.
To the student this result would look like magic. But in the end, it’s a result of individual’s DNA having much more information encoded in it than just human traits.
text and language intersect. in some ways, text is a superset of language, mostly due to social, or what is also called pragmatic, factors that complement semantics. also, the semantics/syntax interface is anything but clear-cut, at least in natural human languages.
Any text corpus is a subset of the language, under the normal definition that a language is the set of all possible sentences (or a set of rules to recognize or generate that set of possibilities). This text subset has an intrinsic bias as to which sentences were selected to represent real language use, which would be significant as a training set for an ML model.
So, perhaps you are saying that the text corpus carries more "world" information than the language, because of the implications you can draw from this selection process? The full language tells us how to encode meaning into sentences, but not what sentences are important to a population who uses language to describe their world. So, if we took a fuzz-tester and randomly generated possible texts to train a large language model, we would no longer expect it to predict use by an actual population. It would probably be more like a Markov chain model, generating bizarre gibberish that merely has valid syntax.
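To make the fuzz-tester point concrete, here's a tiny sketch: sampling from a toy grammar produces sentences that are syntactically fine but carry none of the selection bias of real use - grammar, but no world.

    import random

    # A tiny toy grammar; randomly expanding it is the "fuzz-tester" over text.
    grammar = {
        "S":  [["NP", "VP"]],
        "NP": [["the", "N"], ["a", "N"]],
        "VP": [["V", "NP"], ["V"]],
        "N":  [["fridge"], ["theorem"], ["ocean"], ["parrot"]],
        "V":  [["refutes"], ["melts"], ["stacks"]],
    }

    def expand(symbol):
        if symbol not in grammar:
            return [symbol]
        production = random.choice(grammar[symbol])
        return [word for part in production for word in expand(part)]

    for _ in range(3):
        print(" ".join(expand("S")))

You get things like "the parrot refutes a fridge": valid syntax, but nothing a real population of speakers would have bothered to write down, which is exactly the information a natural corpus adds.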
And this also seems to apply if you train the model on a selection from one population but then try to use the model to predict a different population. Wouldn't it be progressively less able to predict usage as the populations have less overlap in their own biased use of language?
regarding the relationship: yes, and in most ways it probably is a subset. is there really such a set of rules that generates all possible sentences? in any case i wanted to say that materiality and cultural activity heavily influence what can and will be put into text, and that is not strictly language. "selection process" might capture some of it, though i'm not sure whether all of it!
I think about this as shape and color. No one ever saw a shape that wasn’t colored and likewise there are no colored things that do not have a shape.
Also, displaying text without a font is not possible.
Text is the surface of the ocean where waves emerge, and while they have their own properties and may seem to naively have agency, they are an expression of the underlying ocean.
nicely put! many aspects of text at least historically have much to do with its materiality (also in a cognitive development sense, learning how to write etc.). what we can think about nowadays is that text and speech might not be a necessary materiality of language. language might depend more on conceptual systems. more like a substrate of intelligence and that might as well be nonhuman (to stay on topic).
Not the poster, but for me it comes down to a mix of clarity and permanence.
I teach verbally (interactive question/answer) and I've also written textbooks.
Verbal language is "loose". I'll say class when I mean object, unicode when I mean utf-8, and so on. Sentences are not all well formed, and sometimes change mid-thought. It's very "real time".
Writing is a lot more deliberate. I have to be sure of each fact I state. I often re-test things I'm only 95% sure about. I edit, restructure, remove, add, until I'm happy.
Of course all communication falls on a spectrum. Think phone call at one end, text book on the other. When I do a verbal lecture I'm usually careful with my speech, and when I post on hacker-news less rigorous.
Language covers all of it. Text skews to the more deliberate side. Cunningly the language models are trained using (mostly) text, not speech. That will have an impact on them.
from a linguistic standpoint a text is a whole lot more than language: it is an externalisation of thought that is fixed onto a medium using writing utensils and, most of all, cultural norms in the form of a wild variety of different genres and forms of text, ranging from something like a stream of consciousness to something like a speech act. furthermore, text can be conceptually written or spoken, and with the internet we got an explosion of text that is conceptually spoken. those are the things OP might be referring to in regards to the "entropy" that encodes much more than just the tokens themselves.
The issue with all these experts is they still think it's human nature to be able to fully understand the world before they speak about it. On the contrary it's human nature (and all animal nature) to figure out how to navigate the world without fully understanding or having a complete model of it. All you need is a working model that affects the facets of the world you need to deal with.
I still remember in the 90s my school friend came over to my house and I was sending a fax for my dad. He was surprised the paper came back out the other side. He wasn't an idiot, and he was 15. But his model of the world didn't include deep thought about how a fax works; he merely concocted a system where the paper just went through the wire. That moment stays with me and reminds me what it is to be human and think like one. I think chatgpt is like my friend, and that should scare and excite us.
He’s doing the same to ChatGPT that he did to Skinner in the 1960s. Back then, it put him on the map, and he lives in the illusion that he was right. Now ChatGPT is pretty much Skinner’s work come to life. The B. F. Skinner book Chomsky critiqued was literally “Verbal Behavior” - a book about how intelligence arises from “dumb” reinforcement learning of words. Obviously Chomsky must now claim that ChatGPT only pretends to be intelligent. Else his entire life’s work is proven wrong.
It’s a significant misunderstanding of Chomsky’s life’s work to think that ChatGPT would prove it wrong. Chomsky’s primary claim is a claim about how language acquisition works in humans. He argues, for example, that certain locality constraints on linguistic dependencies are ‘built in’ and not learned inductively. Thus a human does not ‘learn’ that (i) is ambiguous and (ii) is not:
(i) How often did you tell John that he should take out the trash? [how often did you tell, or how often to take it out]
(ii) How often did you tell John why he should take out the trash? [only means how often did you tell]
Nothing that ChatGPT can do suggests that Chomsky was wrong about this kind of thing. It’s really more of a blow to a certain kind of work in AI that was partly inspired by Chomsky – but not something that he himself ever took much interest in.
Now it’s true that Chomsky appears to be in the camp that says ChatGPT doesn’t really understand anything. But the focus of his own work has never been on debunking AI, or making claims about the true nature of understanding, or anything of that ilk.
> He argues, for example, that certain locality constraints on linguistic dependencies are ‘built in’ and not learned inductively.
Checking in late here, but one of the pillars of Chomsky's argument is the so-called "poverty of the stimulus" -- basically, that human babies simply don't receive enough training data to acquire language as rapidly and correctly as they demonstrably do. Chomsky therefore concludes that there must be some kind of pre-existing "language module" in the brain to account for this. Now, not everyone accepted this idea even at the time, but surely the argument is much less plausible for an LLM which is likely exposed to more training data than even an adult human.
>Now, not everyone accepted this idea even at the time, but surely the argument is much less plausible for an LLM which is likely exposed to more training data than even an adult human.
Yes indeed. Of course this doesn't show that Chomsky was wrong about humans. In any case, I've seen no evidence that current LLMs successfully learn the kinds of constraints I was talking about.
There is a difference between text and language and so far LLMs have told us nothing about language. LLMs being able to generalize to languages with a much smaller training corpus shows that maybe Chomsky is right about universal grammar.
Did Skinner have anything to say about how the reinforcement works? Because with LLMs you do need the right sort of architecture, and the same with neurons, even though they don't use back propagation. Only humans are known to have language in the full sense, and there has to be some neural reason why that is. Maybe you could make an argument for cetaceans or certain birds, but again they must have the neural architecture for it.
Skinner (and behaviorists in general) did establish various 'laws' of behavioral reinforcement that do tend to hold in simple cases such as pigeons pecking at levers in return for food, etc. etc. Of course these laws had nothing interesting to say about language acquisition. I challenge anyone who thinks otherwise to actually try reading Verbal Behavior. It's an incredibly turgid and uninsightful book.
I see that Watumull is one of the coauthors. I'm not sure what's going on with that, but Watumull is the common thread running through other bad papers with otherwise-sensible linguists' names tacked on to them, such as this bizarre paper about recursion: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3884515/
I haven't had a chance to read the NYTimes editorial, but I would be skeptical how much of it is really coming from Chomsky. He's 94 at this point, and while he's not senile in a medical sense, I don't think his judgement is what it used to be.
> This is in part because GPT-3 is trained to predict the next word on a large dataset of Internet text, rather than to safely perform the language task that the user wants.
Chomsky writes that language models lack the ability to reason.
> Their deepest flaw is the absence of the most critical capacity of any intelligence: to say not only what is the case, what was the case and what will be the case — that’s description and prediction — but also what is not the case and what could and could not be the case. Those are the ingredients of explanation, the mark of true intelligence.
> [...] Suppose you are holding an apple in your hand. Now you let the apple go. You observe the result and say, “The apple falls.” That is a description. A prediction might have been the statement “The apple will fall if I open my hand.” Both are valuable, and both can be correct. But an explanation is something more: It includes not only descriptions and predictions but also counterfactual conjectures like “Any such object would fall,” plus the additional clause “because of the force of gravity” or “because of the curvature of space-time” or whatever. That is a causal explanation: “The apple would not have fallen but for the force of gravity.” That is thinking.
I decided to ask ChatGPT why an apple falls, based on Chomsky's statement:
> Suppose you are holding an apple in your hand. Now you let the apple go. You observe the result and say, “The apple falls.” That is a description. Can you say why it falls?
ChatGPT responds in exactly the way Chomsky says it cannot:
> Yes, the apple falls due to the force of gravity. Gravity is a natural force that attracts objects with mass towards each other. When the apple is released from your hand, it is subject to the gravitational pull of the Earth, causing it to accelerate downward and fall to the ground.
ChatGPT certainly appears to understand that apples fall because of gravitational attraction, and that gravity is universal.
What makes all the discussion of whether ChatGPT does or does not truly understand this or that so frustrating is that it's based on pure assertion. ChatGPT responds exactly like someone who understands gravity would, so I'm very strongly inclined to believe that it understands gravity. Otherwise, what does "understanding" even mean? It's not some magic process.
Again, turning to ChatGPT to define "understanding," here is what it says:
> [Understanding] involves making connections, integrating information, and gaining insights or knowledge about a particular subject or concept. Understanding goes beyond simple awareness or recognition; it involves interpreting, analyzing, and synthesizing information to form a coherent mental representation or mental model of the subject matter. It often involves the ability to apply knowledge in new or different contexts, make connections to prior knowledge or experiences, and make sense of complex or abstract ideas.
ChatGPT definitely fulfills that definition of "understanding."
I’ve made many attempts to use ChatGPT to develop or double-check my own logical reasoning on technical topics that happen to not be widely discussed (or maybe not discussed at all) in ChatGPT’s training data. It didn’t work well. It always devolved into guesswork and fabrication by ChatGPT, if not outright false reasoning, and while correcting ChatGPT succeeded in it agreeing about individual objections, it never showed a true and consistent understanding of the topic under discussion, and also seemingly no understanding of why I was having issues with its responses, beyond the usual “I apologize, you are correct, <rephrasing of your objection>”.
One problem likely is that it doesn’t have an internal dialogue, so you have to spoon-feed each step of reasoning as part of the explicit dialogue. But even then, it never feels like ChatGPT is having an overall understanding of the discussion. To repeat, this is when the conversation is about lines of reasoning about specific points that you don’t find good results for when googling for them.
> One problem likely is that it doesn’t have an internal dialogue, so you have to spoon-feed each step of reasoning as part of the explicit dialogue.
I think if we were to put ChatGPT on the map of the human mind, it would correspond specifically to the inner voice. It doesn't have internal dialogue, because it's the part that creates internal dialogue.
ChatGPT does not fulfill that definition because it does not have any “mental representation”; it has no mind with which to form a “mental model”. It emulates understanding — quite well in many scenarios — but there is nothing there to possess understanding; it is at bottom simply a very large collection of numbers that are combined arithmetically according to a simple algorithm.
But now you have to explain why the same is not true of a human. Just saying a human has a 'mental representation' and a 'mind' is not explaining anything
Because as humans, we know we have something we call minds and mental representations, since we experience having such things as we go about our lives. How the nervous system produces those and how exactly we should understand mental is unclear. But since LLMs aren't brains and don't work the same way, we can't say they have anything like minds right now. The solution isn't to get rid of the mental in humans, it's to better understand the differences and similarities between machine learning models and biological nervous systems.
That is a very common perception among people who haven't given him any attention. I used to be in the same boat, but it turned out to be very interesting once I actually listened to him.
Accusing Chomsky of being a ‘clickbaiter’ is maybe the most absurd thing I’ve heard all month. You think he’s trying to get additional views for his TikTok videos?
His recent political ramblings and Epstein-adjacency are extremely embarrassing (at best), but he's not some kind of cheap online attention whore.
> Accusing Chomsky of being a ‘clickbaiter’ is maybe the most absurd thing I’ve heard all month. You think he’s trying to get additional views for his TikTok videos?
Chomsky has been addicted to media attention for decades. Back in the day there were literally people selling cassette tapes of his latest thoughts.
C'mon, let's not get into it here. As I'm defending Chomsky, I just wanted to be clear that I don't agree with his recent comments on the Russian invasion of Ukraine, and that I find his association with post-conviction Epstein extremely distasteful at best. Others may disagree, but this isn't the place to have that argument.
Bit of a tangent, I asked a friend if I could rip and burn one of his CDs about 15 years ago, and he looked at me in horror and told me that I absolutely could not, because he liked that disc.
It took me a bit to realize what he understood my words to mean.
That's a good guess, but tbh this was in the days of modems, bbses, and shareware. Piracy and anti-piracy campaigns weren't really common outside of the software world back then.
> how to navigate the world without fully understanding or having a complete model of it
GPT-4 does not have ANY understanding or model of the world - it just has a model of what tokens (words) are likely to appear in a certain context. If it could build any usable model of the world, and reason about it, I'd be much more impressed.
When it quacks like a duck, only the most simplistic view takes it as being a duck.
Isn't our understanding or model of the world just what images, sounds, smells, tastes and physical feelings are likely to appear in a certain context? Obviously a language model doesn't yet have access to all the data humans have, but that doesn't mean there can't be some kind of a model of the world. If you invent a new word and give a couple example sentences, GPT-4 can figure out the meaning without ever seeing the word. Doesn't that require some kind of understanding?
> My current (tentative) resolution of the surprise is that language encoded way more information about reality than we thought it did. (Enough information that you can fully derive reality from language seems improbable, but iirc it did derive Othello and partly derived chess and I would have thought there wasn’t enough information in language to derive those without playing the games as well, so I can’t rule it out.)
The information isn't in language itself, it's in language as actually used by humans. GPT4 knows about chess because it's "read" a significant fraction of everything we've ever written about chess. A human being who did that without ever playing a game would also start out better than a typical novice.
> A human being who did that without ever playing a game would also start out better than a typical novice.
I am quite skeptical of these arguments along the lines of “imagine a human read everything written on the topic…”.
What humans are doing when they read something is not what neural nets are doing when they read something. Humans are (idealistically) doing something like Feynman’s description of how he reads (or in this case, listens to) a theorem:
“I had a scheme, which I still use today when somebody is explaining something that I’m trying to understand: I keep making up examples. For instance, the mathematicians would come in with a terrific theorem, and they’re all excited. As they’re telling me the conditions of the theorem, I construct something which fits all the conditions. You know, you have a set (one ball) – disjoint (two balls). Then the balls turn colors, grow hairs, or whatever, in my head as they put more conditions on. Finally they state the theorem, which is some dumb thing about the ball which isn’t true for my hairy green ball thing, so I say, ‘False!’"”
Bret Victor’s description of what “really good programmers” are doing is also related:
“[showing the code for binary search] In order to write code like this, you have to imagine an array in your head, and you essentially have to ‘play computer’. You have to simulate in your head what each line of code would do on a computer. And to a large extent those who we consider to be skilled software engineers are just those people who are really good at playing computer.”
I think when we imagine an LLM as a human who’s read everything ever written in chess but never played an actual game, we’re actually tricking ourselves - because that hypothetical human would be ‘playing chess’ inside their head by imagining the pieces and moving them according to the rules they had read[1]. LLMs are not doing anything like that when they read about chess. So it’s a very restricted (or perhaps more accurately, a very different) kind of ‘reading’ that we don’t have any intuition for. Since the ‘reading’ that we do have an intuition for is smuggling in exactly the kind of “modeling the world” ability we’re looking for, it’s not surprising that this argument would incorrectly lead us to believe we’ve found it in LLMs.
1: In fact the very best computer chess is achieved by AlphaZero which was trained exclusively on “playing chess in its head”, and it beats even the most powerful and optimized search algorithms like Stockfish looking 20 moves ahead.
What LLMs are doing when they imagine playing chess is what we do when we stand up after sitting on the floor, or what we do when we see a few million individual samples of color and light intensity and realize there's an apple and a knife in front of us.
I think what is almost impossible for most people to understand is that AIs do not need to be structured like the human brain, or use the crutches we use to solve problems the way we do, because evolution did not provide us with a way of instantly understanding complex physics or instantly absorbing the structure of a computer program by seeing its code in one shot.
Also, there is no reason to believe that playing chess in our head is anything else but us pattern matching a mental process on a higher level, recognizing a simulation there, and feeding that info back into the loop below. Nature provided us with a complex, layered and circular architecture of the brain, but the rest is pretty much training that structure. And we know that different architectures with similar outcome are possible, since there are vast variations across our own species, and other species as well, with essentially the same capabilities.
So if we gathered all the books, writings, games, etc. ever published on chess, could we develop a grandmaster player without having to build a Deep Blue-type system?
In that scenario, I believe a current LLM would "understand" chess. It would know how to play the game, it would know what being a grandmaster means, and it would know the heuristics and algorithms for evaluating a given game state. But it would not be able to implement an optimized solver to actually play the game efficiently. To do that would require it to also have all the knowledge about computer science.
Hence the dream of singularity. Can you teach ChatGPT to build AlphaZero?
> it's "read" a significant fraction of everything we've ever written about chess
I doubt heavily that a significant fraction of chess's writings are even available in digital format, much less inside of CommonCrawl and correctly trained on.
Why would Common Crawl NOT contain a chess manual? The rules are explained in detail on Wikipedia. The simplest conclusion is that it has indeed been trained on a chess manual and is good at predicting what the next word in a chess manual is. It is not synthesizing anything.
I haven't seen this mentioned but can LLMs actually play chess?
I'm sure they have read rules of chess online, but if you ask them to play chess with you, what happens? Can they apply the rules? Can they apply them intelligently and win the game?
My point is that even though LLMs "know" what the rules of chess are, they don't really "understand" them, unless they can use them to play the game and play it well.
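The "can it apply the rules" half of this is easy to test mechanically: have the model propose moves in standard notation and check each one against a real rules engine. A sketch using the python-chess package (pip install python-chess); the move list is a made-up stand-in for model output, not something an actual LLM produced:

    import random
    import chess  # the python-chess package

    # Hypothetical moves "the LLM" proposed for White, in standard notation.
    # The last one is deliberately impossible from the position it arrives in.
    proposed = ["e4", "Nf3", "Bc4", "Qxf7#"]

    board = chess.Board()
    for san in proposed:
        try:
            board.push_san(san)  # raises a ValueError subclass if illegal here
        except ValueError:
            print(f"illegal move proposed: {san}  (position: {board.fen()})")
            break
        if board.is_game_over():
            break
        board.push(random.choice(list(board.legal_moves)))  # stand-in opponent
    else:
        print("every proposed move was legal")

Playing legally is the low bar; playing well is a separate question, and a loop like this only measures the first.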
You win chess by "applying the rules intelligently" the same way you launch rockets to space by applying the laws of physics intelligently. True, yes, but not as simple as it sounds.
Ok, name 2 famous "chess writings" and let's see if they are in common crawl...plenty has been written about chess on the internet. Why are we acting like this is magic?
Human: Finish this quote: "Chess holds its master in its own bonds, shaking the mind and brain
Gpt4: The quote you're referring to is from the Indian mathematician and writer, Raja Rao. The full quote is as follows: "Chess holds its master in its own bonds, shaking the mind and brain so that the inner freedom and independence of even the strongest character cannot remain unaffected."
Honestly I feel like one reason people are struggling with this is that they can't accept a critical part of the truth: most people are stochastic parrots themselves most of the time. Truly creative, genuine, deep thinking is an exceptional state of thinking for us.
Years ago I transitioned from a developer role to a manager role and suddenly I had to do a lot more talking. Not all the talking needs to be a deeply involved exchange of complex ideas, a lot of it serves a different purpose. Sometimes it can be a simple as filling up the time in a pleasant way with a group of people that may or may not know each other that well.
After getting some experience with this I noticed that I had developed a talking on/off button in my head. I could simply turn it on and start talking. I could generate words that sounded good together and fit the purpose of the moment. But they just seemed to come from a different place in my brain than my conscious mind, because that was not involved in this process at all. The only job my mind had was to turn the button off again at the right moment; for the rest it was free to think whatever it wanted.
(I transferred back to development a couple of years later.)
The very fact that we can be conscious about our ability to generate language and work it separately from the rest of our consciousness tells me that chatbots might emulate our language generation ability. We shouldn't invert it to conclude that means they can reason about it.
> We shouldn't invert it to conclude that means they can reason about it.
Agreed. However, I think it's a somewhat accepted view that the bulk of the work constituting reasoning happens subconsciously, with the conscious mind playing the role of a censor/gatekeeper and occasionally handholding when reasoning through tougher problems.
This is why I am starting to think that it is very important for us to be able to "introspect" and reason about the thoughts/feelings we're having. It allows us to have an internal feedback loop that I think a lot of people lack.
Maybe in casual conversation, but that's not how I experience my thought process about anything non-trivial at all. I usually spend a lot of time thinking about the concept in non-verbal terms, and that process involves recalling images and sensory information in fairly abstract terms; then, through what feels like several iterations, it starts to coalesce into something I can encode in language.
I think we can all agree that these LLMs are surprisingly good at generating text that is often coherent but I don't see how you can discard all those extra inputs and claim you have the same process.
> thinking about the concept in non verbal ... abstract terms
The abstract terms we think about are concepts, and we think about multiple concepts, at various levels of abstraction, and their relationships to each other, before getting a sense of what we want to say or write.
Only then do we begin speaking or writing, grouping concepts into paragraphs, breaking them down into sentences and words.
And there's evidence that LLMs do something similar, creating embeddings for both big ideas and small details, modeling how the small details combine into larger concepts, discovering the relationships between concepts, and only then generating a probabilistic sequence of tokens to express those deeper concepts.
Lately I've been thinking that the staple of science fiction, the android robot, is the only way to get AGI. I thought a lot about computer vision when that was the bleeding edge of AI a few years back. It strikes me as obviously wrong to try training models on two-dimensional arrays of pixels and expect them to develop visual skills. Why don't we give them two cameras and arms so they can learn about physics and start differentiating 3-dimensional input? I think when we have robots capable of interacting with the world as humans do, we might be able to train them to develop near human levels of intelligence.
The question is whether you knew how that sentence was going to end when you started writing it, or indeed whether I knew that I was going to add this comma-separated adjunct when I started writing the preceding clause, and I cannot honestly say at this precise moment of typing whether the final word in this sentence is going to end up being 'yes' or 'no'.
For me it's a little more subtle. When I sit down to write, say, an email, I usually have a specific point that I want to make but I usually haven't yet figured out exactly how I want to get there. I'm not even sure it is even possible to conceive of writing without doing it word by word. Perhaps the difference though is that we have the ability to go forward and backwards in the sense that we can usually think faster than we can type or write, which allows our thoughts to advance beyond the tokens we have produced. This allows for a kind of real-time editing process that incorporates both past and possible future information. But I imagine there is probably a way to enhance LLMs to branch and edit before producing the next word materially (if they aren't already doing that).
To go even more meta, there is an analogy I'm trying to make right now in which I am visualizing a road and thinking about how describing the road relates to the process of writing. In my mind's eye, I can see the full length of the road and all of its contours, but I can't actually describe the individual stretches of the road coherently without enumerating them. Something similar happens with writing. I can visualize what I want to say far beyond the next word, but it's true that the actual process of writing goes word to word, much like how the process of token selection is described for an LLM.

The question is whether the LLM has an analogous conception of where it is going. Going back to the process above, sometimes I know where I am going but haven't yet figured out how to articulate it. It is through the process of writing that I am able to articulate that thought. But the thought preceded my articulation of it. I don't know to what extent LLMs have coherent thoughts that they are articulating, or if that even makes sense for the type of intelligence they project.

My suspicion is that they don't have additional sensory inputs beyond language that give thoughts the immaterial shape that is then expressed in language. Without that, I am skeptical that they will truly get beyond regurgitating and/or remixing what has already been fed to them textually. That doesn't diminish how amazing they are, but I am somewhat more in the Brooks/Knuth camp: they are impressive and surprising, but there is something that ultimately leaves me a bit cold about them.
> I usually have a specific point that I want to make but I usually haven't yet figured out exactly how I want to get there. I'm not even sure it is even possible to conceive of writing without doing it word by word. [...] In my mind's eye, I can see the full length of the road and all of its contours but I can't actually describe the individual stretches of the road coherently without enumerating them.
Not to trivialise the interesting point you’re making, but do you never write with an outline? Write bullets for the big points you want to touch, then go back and flesh out the details?
I do wonder how well (quickly) the whole thought is formed in your head and it’s just the encoding into language that tricks you into thinking you didn’t know what the word would be.
Having learned another language, the moment you start to feel “fluent” is when you start speaking first in the 2nd language and aren’t using your first language as an intermediate step to translate to your 2nd language.
This is a thought that LLMs caused me to have: the emotion of surprise is just the brain's model of the world discovering that it had been poorly calibrated or was making poor predictions.
> The free energy principle is based on the Bayesian idea of the brain as an “inference engine.” Under the free energy principle, systems pursue paths of least surprise, or equivalently, minimize the difference between predictions based on their model of the world and their sense and associated perception.
Right, but LLMs suggest that learning how to predict, and training only on prediction, is sufficient to learn anything and to have emergent generative abilities. What if learning to predict the upcoming input is all that is needed for general human intelligence? What if it is all that any animals do?
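A minimal illustration of "train only on prediction, get generation for free": fit nothing but next-character counts on a toy corpus, then sample from those same counts. It's hopelessly crude next to an LLM, but the shape of the claim is the same.

    import random
    from collections import Counter, defaultdict

    corpus = "the cheese is in the fridge. the fridge is in the kitchen. "

    # "Training" is nothing more than tallying which character follows which.
    counts = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        counts[a][b] += 1

    # Sampling from those same predictive counts already generates new text.
    random.seed(1)
    ch, out = "t", ["t"]
    for _ in range(60):
        choices = list(counts[ch].keys())
        weights = list(counts[ch].values())
        ch = random.choices(choices, weights=weights)[0]
        out.append(ch)
    print("".join(out))

Everything the sampler produces comes from the predictive statistics alone; scaling the same move up from characters to tokens and from counts to transformers is, very roughly, the bet LLMs make.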
This right here. I actually am strongly starting to believe that this is indeed what's going on.
I read the book Kingdom of Speech a few years ago and that also left me with the perspective that perhaps language has a lot more to do with how we think and perceive the world than most people like to admit. The book has been heavily criticized but I believe it made an interesting point about language.
I mean hasn't it been proven on multiple occasions that humans who never learn a language during their life also don't develop higher cognitive skills and stay on the level of a child? Clearly language plays a notable part in reasoning, and it's frankly no surprise that we're seeing this behaviour from language models.
It's also quite interesting how foundation models and fine tuning appear analogous to being born with a brain that already has six million years' worth of weights in it (trained implicitly through random changes and natural selection), which are then adapted over the course of a lifetime when environment relevant data is gradually obtained.
The sad truth is that we are all NPCs without free will. "Thoughts" are a non-physical phenomenon that, seemingly, arise from physical activity in the brain. The thoughts (our "experience") come after the electrochemical reaction that manifests them. How could these non-physical phenomena then turn around and influence the physical chemistry? They can't. By the time we experience a thought, the physical state for it has been represented, and that state causes the next one after it. Like a stone bouncing down a hill - chaotic but still deterministic. We're just along for the ride.
Reducing a debate that has been raging for millennia across several disciplines to three lines of text tells me you don't have enough of a grasp on the subject to draw any conclusions about it, let alone resolve it altogether.
Care to enlighten us a little? What is there even to debate? The position expressed by your parent comment is so uncomfortable that it's constantly being challenged, but that doesn't make it wrong. Or are "thoughts" some sort of magical fluff not bound by the laws of physics? Are we constantly pulling ourselves by our bootstraps? Giving birth to entirely new brain activity out of nothingness?
This is just woo woo nonsense. Until someone can find an explanation that makes more sense than "natural selection slowly gave apes the ability to think abstractly and they started thinking about themselves", I won't believe in any kind of free will, and now the meta is reaching the point where we create entirely new brains out of silicon.
Does free will exist or only hard determinism can be true? You seem to have reached a definitive conclusion. And you seem to think natural selection and evolution of species and autonomous machines prove your point.
You also seem to think that questioning the veracity of your position is engaging in magical thinking. How could I possibly 'enlighten' you if you are sure of being right?
It doesn't matter if I point out that physics hasn't been able to prove materialism. Where is the fundamental particle? Why can't we determine even through which slit did a quantum of light travel? The Copenhagen interpretation of quantum physics appears to directly contradict your claims of hard determinism being the only valid explanation of reality. But I must be wrong about that. Since clearly you have seen reality for what it is, and I have failed to do so.
Then again, we cannot escape this situation seeing as it is determined.
Of course I'm willing to change my mind, but so far no one has ever given a compelling argument. I just follow Occam's razor.
> The Copenhagen interpretation of quantum physics appears to directly contradict your claims of hard determinism being the only valid explanation of reality
OK, fundamental physics might not follow hard determinism (or not in a way that we currently understand), but please, indeterminism says nothing about free will, human thoughts, or anything related to that. If you want to say the human brain follows the same physics rules as everything else in the universe, and that this implies some indeterminism, sure. That's almost certainly correct. But where do thoughts arise from that?
If you sprinkle randomness on the process, I'll agree with you; but nothing in physics even suggests ANY link between indeterminism, superposition, etc. and thought. So my point still stands: free will as we usually envision it has no reason to exist. The fact that particles can be in two states at once does NOT contradict this.
Indeterminism does not automatically imply free will. None of the open questions of physics, including how qualia arise, could plausibly explain free will. Neuroscience is continuing to show just how deterministic and predictable brains are.
So at this point the burden is on free will proponents to offer a plausible explanation for it. Without resorting to dualism, which is great for religious people but not scientifically useful.
Also, it's funny for you to start this thread with such a disparaging and dismissive comment, then come back and act victimized that you're being questioned. If you are a self aware entity in control of your behavior, maybe you should try harder not to be a hypocritical asshole.
I believe this view of humans is because we work in increasingly deterministic environments (the internet) using predictable methods of communication, text.
We’ve kind of made the world a bit boring and deterministic, a place where almost perfect knowledge is obtainable and so everything feels more and more predictable.
Your day probably consists of using Google, talking on platforms that don’t change much, solving already-solved coding problems and communicating with others about office politics problems that we’ve all spoken about over and over again. We literally are just chatbots in this world.
Can you think without words? Can you imagine how it feels to swim, something you might have done, and extend it to fly like a bird? Can you read a novel and feel emotions about characters that don't exist beyond text descriptions of their actions? Language is a way to encode thought. It's not what thoughts are made of.
Are images and words the only way to think? Are these boundaries even fundamental or are they arbitrary? How does a blind deaf person think? Do you ever think in terms of sensations or movements? Can you remember a roller coaster ride, a car trip? Can you think of a taste?
I find hard to conceive of people who can think only in terms of language.
Does that reasoning work when applied to what you just wrote? If you’re just predicting the next word, then what makes you confident in the conclusion that you’re just predicting your next word?
Yes. This was my second draft. I reviewed my first draft, which became an input to my second draft, which gained more clarity after reviewing the first draft.
The way people usually say it is that you revised your first draft upon reflection, because it didn't quite get across the point you wanted to make, or something to that effect. So what makes the way you state it more true than the way people typically state it?
I can switch between word generation and evaluation. I saw that the score of the 1st draft was below my target, and I used the insights from the eval of the 1st draft to inform the 2nd draft.
Language is the compression of our entire human experience into words, so I'd argue language captures the majority of the thinking aspect of the human experience.
Yes, most of the brain is a glorified microcontroller for running and interpreting data from a billion skin sensors and driving hundreds of muscle actuators, doing depth and object inference from binocular images, locating sounds from the twin microphone array, etc. Language and especially reasoning is a very small part of it all.
> One way to resolve this surprise is to find some reason to believe these strange abilities are fundamentally not an understanding of the world. Thus stochastic parrots, this article, Yann LeCun and Chomsky, etc.
I mean, it is kind of obvious that predicting text based on a large corpus of text written by people with a model of the world will, if it works at all, look like having a model of the world.
The question is whether that involves having a model of the world, and secondarily, if there is a difference, what is the practical impact of the difference?
Or maybe that’s not really the question, because the whole “model of the world” thing is metaphysical omphaloskepsis that is inherently unanswerable because we can’t actually unambiguously reduce it to something with testable empirical predictions, reflecting a rationalization for elements of our culture and particularly our own view of our own special place in the universe, and the different answers to it likewise have no real meaning but simply reflect people’s bias for whether or not they want to share that special place, either in general or with the particular computing systems under discussion in particular.
The one thing that has convinced me that chatGPT has built a real world model is asking it how to stack different objects. You can ask it for the safest way to stack a pillow, an egg, a nail, a book and an action figure. Even get more complicated. GPT-4 will, most of the time, correctly reason through how to stack these items to form a stable structure without breaking. That tells me it knows intimately the physical properties of all of these objects and how they interact with each other.
Or was that one of the training samples provided to gpt 3.5?
It is going to be increasingly hard, going forward, to distinguish between what was in the training set and what is creative thinking.
Like, at the very minimum, all the publicly known reasoning samples that were used to probe GPT 3.5 and earlier are going to be unreliable for judging what GPT-4's reasoning abilities are.
The point of the question is that you can easily come up with combinations of objects that are unlikely, or statistically impossible, to be in their training samples. How about this one? "Tell me how to safely stack the following objects on top of each other without any additional support: A Nikon DSLR, A small succulent, 4 lightbulbs, a pack of gummy worms, and a hardcover copy of Harry Potter and the Half-Blood Prince."
Hardcover copy of Harry Potter and the Half-Blood Prince: Start with the largest, heaviest, and most stable object. This will serve as your base. Make sure to place it on a flat, secure surface.
Pack of gummy worms: This item is fairly light and flexible. It should sit nicely on the book without risking damage to either. You can place it right in the center to maintain balance.
Nikon DSLR: This is a heavy object with a fairly stable shape. Put this on top of the pack of gummy worms, as it's sturdy enough to handle the weight and won't be harmed by the soft package. Make sure to place it so that the lens is pointing upward and it sits flat and stable. Don't use any protruding lenses or parts to balance it, as they could easily be damaged.
Four Lightbulbs: Since light bulbs are round and fragile, you will need to stack them with care. If the bulbs are in a box, you can place the box on top of the DSLR, making sure it's centered and stable. If the bulbs are loose, it's a bit more tricky. You might want to nestle them into the lens of the DSLR (if the lens is large enough), which should provide some natural containment for them. If not, it might be safer not to include the lightbulbs in your stack.
A small succulent: Finally, the succulent can be placed on top. It's likely the lightest object and it is usually quite stable due to its pot. If the lightbulbs were in a box, place the succulent on top of that. If you ended up not including the lightbulbs, place the succulent directly on top of the DSLR.
Reads well if you don't think about it too much... For example: Where does the succulent go if the light bulbs are nestled into the lens of the DSLR? Balanced on the light bulbs? Why would the gummy worm package need to be in the center of the book to maintain balance?
I tried to stack all of these objects myself and couldn't really. I think GPT-4's approach is actually really good. It correctly points out that the gummy worms make a flexible base for the DSLR (otherwise the protruding buttons/viewfinder make it wobbly on the hard book), and the light bulbs are able to nestle into the front of the lens. If they were smaller light bulbs I could probably use the four of them as a small base on top of the lens to host the succulent.
The point is that ChatGPT undeniably built a world model good enough to understand the physical and three-dimensional properties of these items pretty well, and it gives me a somewhat workable way to stack them, despite never having seen that in its training data.
You cannot conclude that from the output - the training data will likely contain a lot of stacking examples. Everyday objects also might have some stacking properties that make these questions easy to answer even with semi-random answers.
Plus, some stuff clearly makes no sense or is ignored (like the gummy worms in the center, forgetting about the succulent in some cases).
If you want to test world modeling, give it objects it will have never encountered, describe them and then ask to stack etc. For example, a bunch of 7 dimensional objects that can only be stacked a certain way.
> If you want to test world modeling, give it objects it will have never encountered, describe them and then ask to stack etc. For example, a bunch of 7 dimensional objects that can only be stacked a certain way.
And when it does that perfectly, I assume you'll say that was also in the training data? All examples I've seen or tried point to LLMs being able to do some kind of reasoning that is completely dynamic, even when presented with the most outlandish cases.
All examples I tried myself show it failing miserably at reasoning.
It certainly needs better evidence than being able to come up with one of many possibilities of stacking things - aided by human interpretation on top of the text output. Happy to look at other suggestions for test problems.
Well for me personally, the proof is in giving it a few sentences on how it should write fairly complicated pieces of unique code I need on a daily basis and seeing it correctly infer things I forgot to specify in ways that are typically borderline impossible for anything but another human. If that's not reasoning I don't know what is.
The other one that convinced me was this list: https://i.imgur.com/CQlbaDN.png
I think the leetcode tests are quite indicative, going as far as saying that GPT-4 scores 77% on basic reasoning, 26% on complex reasoning and 6% on extremely complex reasoning.
Maybe the reasoning is all "baked in" as it were, like in a hypothetical machine doing string matching of questions and answers with a database containing an answer to every possible question. But in the end, correctly using those baked in thought processes may be good enough for it to be completely indistinguishable from the real thing, if the real thing even exists and we aren't stochastic resamplers ourselves.
> aided by human interpretation on top of the text output
That's an interesting point, actually. I've been trying to do something in that regard recently, by having it use an API to do actual things (in a simulated environment), and it seems very promising despite the model not being tuned for it. Given that AutoGPT and plugin usage are a thing, that should be all the evidence you need on that front.
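For what it's worth, a setup like that usually boils down to parsing the model's text output into structured calls against the environment's API. A rough sketch only; call_llm and the env object here are hypothetical, not AutoGPT's or any plugin's actual interface:

```python
import json

# Rough sketch of letting a model act through an API in a simulated environment.
# call_llm() and the env object are hypothetical stand-ins.

SYSTEM = (
    "You control an agent in a simulation. Reply ONLY with JSON like "
    '{"action": "move", "args": {"direction": "north"}}.'
)

def agent_step(call_llm, env, observation):
    reply = call_llm(system=SYSTEM, user=f"Observation: {observation}")
    try:
        cmd = json.loads(reply)              # model output -> structured call
    except json.JSONDecodeError:
        return env.noop()                    # unparseable text: do nothing this step
    return env.execute(cmd["action"], **cmd.get("args", {}))
```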
Coding isn't a use case of mine. For example, for things like financial derivatives replication it can tell you the abstract concept but it cannot apply it in a meaningful way.
Why? You need to make sure that a solution requires true understanding and isn't in the training set. If it can reason properly, it shouldn't have a problem with such a task.
The resolution is actually fairly simple. It's an incredibly brilliant stochastic parrot with some limited reasoning capabilities.
Some folks will try to say it cannot reason, but they are wrong, there is extensive proof of that.
The only question is how limited its reasoning capabilities are. After spending extensive time on openai/evals, having submitted 3 of my own, and doing a lot of tests, I would argue that an average person of average IQ could outthink GPT4 - as long as the stochastic parrot aspect wasn't a factor.
That's probably an accurate assessment; the question is mainly whether the reasoning can be improved to a notable extent on the current architecture, and by how much.
I myself assumed that we're pretty close to the end of the S curve when first using 3.5-turbo and figured that hallucinations will be pretty hard to overcome, but with GPT 4 being such a massive improvement on all metrics I'm no longer as sure. GPT 5 will probably be more definitive on what's possible, based on where it starts having diminishing returns.
It's very hard to know because we don't know and can't experiment with its training data. So - it may be doing first principle reasoning, or it may be doing token substitution vs. some known example that it's seen before and is matching to.
The dictionary definition of reasoning says nothing about how the thinking is done, only that it's sensible and logical, which is exactly what GPT4 is. Limited, yes, but it reasons.
An argument based on common sense can also be made:
Any system that possesses a wide range of capabilities, most of which it was not specifically trained to perform, cannot possibly perform all these tasks so well solely by making probabilistic guesses.
(Humans, too, were not directly shaped by natural and sexual selection to possess all of our cognitive capacities.)
I think the main issue is that since we don't have access to the implementation of GPT-4, there's no way for researchers to know how it works, so either they're not saying anything or they're just making things up.
GPT-4 is "multimodal" and RLHF'd, so it was trained with some tasks other than next word prediction. I don't remember if it's been trained for code correctness (by running unit tests etc.), but other models have been.
> One way to resolve this surprise is to find some reason to believe these strange abilities are fundamentally not an understanding of the world. Thus stochastic parrots, this article, Yan LeCun and Chomsky, etc.
I don't think it's at all surprising that a system designed to predict the next word ends up understanding the world.
If you want to predict the next word accurately, you first have to know which words exist. To progress, you'll have to learn about the mechanics of grammar and which words are used more frequently or in combination. To become even more accurate, it helps to understand context, so that the sentences you string together will at least be relevant to the subject. If you want to increase your accuracy even further, you'll have to start memorizing all sorts of facts (e.g., "Who was the monarch of England in 1600?"). Being able to synthesize those facts into a coherent argument will increase your accuracy even further.
In the end, predicting the next word accurately requires an understanding of the world.
This isn't all that different from how our own intelligence evolved. You could look at humans from the outside and disparagingly point out that the ultimate purpose of the human brain is to direct muscle motions in a way that maximizes the chances of reproductive success. It just turns out that solving that problem effectively has led to the development of an enormously complicated piece of machinery, capable of synthesizing all sorts of input stimuli into a coherent picture of the world, and ultimately of producing the works of Shakespeare and the music of Beethoven.
Has there been any research in the possibility that much of what we say/speak (including this convo) is actually just stochastic parroting?
e.g. instead of the stochastic parrots mimicking intelligence, maybe intelligence doesn't exist; it's just stochastic parrots of various levels of sophistication organized into a hierarchy. "Intelligence" is necessarily socially defined, with the more complex parrots being unpredictable and "intelligent" from the POV of lower parrots. Vice versa, looking down, the lower parrots seem to act like "NPCs"
To paraphrase as per my understanding of your comment, is intelligence an emergent property of being able to interact with each other through language?
Say I speak gibberish (to you) which is actually me explaining to you the theory of relativity, would you consider me intelligent?
What's the difference between "understanding" and "having really good probabilistic information about how words combine"?
Kids learn to speak by parroting what they hear and observing the outcome. Then they run tests that reinforce the connections between words. That's what the model is.
But humans also get to link words with all the other sense experience we have (like how sweet cherries, loud fire trucks, and that one crayon are all "red"). LLMs don't have as many dimensions of experience they can link to.
But anyway, intelligence is about having an internal model of the world and using it to predict the future. The more rich and accurate the model, the more intelligent. The ability to communicate isn't a prerequisite; lots of animals have intelligence that isn't built with language.
Spot on. That we have internal language models doesn't mean we only have language models as cognition. People seem to be redefining imposter syndrome as reflected by stand alone language models. Makes me think of Tropic Thunder. "Am I even real". Lol get a grip people.
If LLMs are an understanding of the world it would mean humans in a few decades found a way to create sapience with many orders of magnitude fewer interacting elements than evolution did. I find that doubtful, at least in light of the fact every other way we've replicated biological computation requires many more computational elements.
Firstly, if humans created sapience, we created it with all the interacting elements that evolution used plus all the things that we used.
Secondly, maybe sapience isn't as big of a deal as we thought compared with all the other things that evolution did. Remember that biological entities have to figure out survival, reproduction etc. Sapience emerges as a byproduct but the selective pressure is towards those things so sapience is only selected for to the extent that it also moves forward those other goals.
By contrast, LLM training is just focussed on the task of making the model better. The model doesn't have to figure out how to feed itself, ward off predators, not accidentally die in the myriad ways things die, reproduce itself etc. It's way more specific. It doesn't seem unreasonable to think that the complexity would be lower given it's not trying to achieve nearly as much.
Good point. I would even add that human brains have to process a lot of sensory input: video, audio, touch, smell, taste. And the communication channel for humans is very inefficient: you have to translate your thoughts to language, then to audio, then someone else does the same in reverse. If you're texting, you have to slowly type something, then someone else needs to do text recognition in their brain.
LLMs are like handicapped humans who are visually and hearing impaired, with no taste or smell (as with long covid), who can focus only on thinking, and who have a very efficient communication channel: text/tokens sent via wire over the internet.
Yes. One of the astonishing examples in the paper I quoted earlier [1] is an image produced in response to the prompt "Produce Javascript code that creates a random graphical image that looks like a painting of Kandinsky". The response by GPT-4 does kind of look quite a lot like a Kandinsky.
Generally people use that word to mean "wisdom" or at least "some reasoning ability distinct from what animals have".
Edit to add: From a personal perspective I don't see any reason to think humans have qualitatively different reasoning abilities from animals, or a unique "soul" or anything like that, so the term "sapience" doesn't really have a special resonance with me like it might for someone who thinks those things. That may affect some judgements here I don't know.
Why would that be surprising though? The natural world climbs up to understanding from a severe disadvantage. Chemicals don't understand the world; the natural world had to randomly build every bit of its systems from scratch before it could produce a succinct descriptive system for it.
LLMs on the other hand start out with a succinct descriptive system, and translate that to the world of chemicals and photons via some very complicated naturally evolved systems.
If humans actually went to the moon, it would mean that humans in a few decades found a way to access a niche that nature never did.
In all seriousness, it's interesting all of these dualisms we like to hold on to. Humans are part of nature. It is unsurprising that further sapience would branch off from an already sapient race as opposed to re-emerge elsewhere.
Elephant and whale brains are both bigger than human brains, but they're less intelligent than us. Much of the volume of their brains goes towards controlling a larger body. Conversely, some birds have much smaller brains than any primate but can learn to solve simple puzzles and problems.
> One way to resolve this surprise is to find some reason to believe these strange abilities are fundamentally not an understanding of the world.
An easy way to test this is to ask questions and followup-questions that actually require understanding, and compare this to the answers. I recommend to try that.
I’m not as surprised as many people (I saw a great tweet once that said “Language was the first artificial intelligence, writing was the second. I literally believe this”, and that’s a broadly accurate description of my worldview too).
That said, maybe some of the surprise comes from believing “the map is not the territory” and related ideas? We generally believe that the map is not the territory and this gives us some obviously correct intuitions (like “changing the map doesn’t change the territory”), but maybe it has also given us some subtly incorrect intuitions. I’m not talking about obviously incorrect, like “you can’t understand the territory just by looking at enough maps”. I mean something more subtly wrong. One candidate off the top of my head is an intuition that “maps approximate the territory but necessarily at a lower level of detail (a 1:1 map of the territory would be the same size as the territory), so your understanding of the territory can improve as you read more maps but it can’t improve on the limit of the most detailed map available, because that information literally isn’t there”. I could see that possibly being wrong somehow.
> Language is roughly what separates humans from other apes
Maybe not.
A recent study pushes back the "dawn of speech" to 20 Ma which is far, far beyond the horizon where we consider humans to separate from apes. https://www.science.org/doi/10.1126/sciadv.aaw3916 Even if you consider Sahelanthropus tchadensis to belong to humans that was only 7 Ma and that is still under debate.
I personally find "the fundamental human trait is control of fire to be used for cooking" theory very convincing. We do not yet know how far this goes back but no one pushed that back beyond 2 Ma.
What I found disheartening was many of those scientists, especially those on the "nothing to worry about" camp, seemed not to entertain the thought that they could be wrong, considering the scale of the matter, i.e. human extinction. If there's a chance AI poses an existential threat to us, even if it is 0.00000001% (I made that up), should they be at least a bit more humble? This is uncharted domain and I find it incredible that many talk like they already have all the answers.
Meh. Add it to the pile. The number of world ending risks that we could be worried about at this point are piling up and AI exterminating us is far from the top concern, especially when AI may be critical to solving many of the other problems that are.
Wrong about nuclear proliferation and MAD game theory? Human extinction. Wrong about plasticizers and other endocrine disruptors, leading to a Children of Men scenario? Human extinction. Wrong about the risk of asteroid impact? Human extinction. Climate change? Human extinction. Gain of function zombie virus? Human extinction. Malignant AGI? ehh... whatever, we get it.
It's like the risk of driving: yeah it's one of the leading causes of death but what are we going to do, stay inside our suburban bubbles all our lives, too afraid to cross a stroad? Except with AI this is all still completely theoretical.
I think almost none of the scenarios you have named outside of the asteroid & the AGI would result in complete human extinction, potentially a very bad MAD breakdown could also lead to this but the research here is legitimately mixed.
You disagreed with me, but at least you acknowledged there was risk, even though we could disagree about the odds or potential impact. Yet folks like Yann LeCun ridiculed anyone who thought there was a risk AI could endanger us or harm our way of life. What do we know about experts who are always confident (usually on TV) about things that haven't happened yet?
Yes, and all of those (including AI) are not even human extinction events.
- Nuclear war: Northern Hemisphere is pretty fucked. But life goes on elsewhere.
- Plasticisers: We have enough science to pretty much do what we like with fertility these days. So it's catastrophic but not extinction.
- Climate Change: Life gets hard, but we can build livable habitats in space... pretty sure we can manage a harsh earth climate. Not extinction.
- Deadly virus: Wouldn't be the first time, and we're still here.
- Asteroid impact: Again, ALL human life globally? Somehow birds survived the meteor that killed the dinosaurs; I'm sure we'd find a way.
- Completely made-up evil AI: Well, we'd torch the sky, be turned into batteries, but then be freed by Keanu Reeves... or a time-traveling John Connor. (Sounds like I'm being ridiculous, but ask a stupid question...)
You're taking these things too lightly. It's true that most of these things are unlikely to kill all humans directly, but with most of them, civilizational collapse is definitely on the table, and that can ultimately lead to human extinction.
For example: Yes, we could probably build livable habitats in space (though we don't really have proof of that). But how many, for how many people, and what kind of external support systems would they require? These questions put stresses on society that prevent space habitats from working out in the long term.
Humans have a start in time and will have an end. I was born and I will die. I don't know why we're so obsessed about this. We will most definitely cease existing soon in geological/cosmic time scale. Doesn't matter.
There's a nonzero chance that the celery in my fridge is harboring an existentially virulent and fatal strain of E. coli. At the same time, it would be completely insane for me to autoclave every vegetable that enters my house.
Sensible action here requires sensible numbers: it's not enough to claim existential risk on extraordinary odds.
Okay, maybe I shouldn't have mentioned the worst possible outcome. Let's use the words of Sam Altman: the risk here is "lights out for all of us", and let's just assume it meant we would still live, just in darkness. Or whatever plausible bad-case outcome you could imagine. Do you see that any negative outcome is possible at all? If you do, would you at least be cautious so that we could avoid such an outcome? That would be the behavior I expect to see in leading AI scientists, and yet...
All kinds of negative outcomes are possible, at all times. What matters is their probability.
If you (or anyone else) can present a well-structured argument that AI presents, say, a 1-in-100 existential risk to humanity in the next 500 years, then you'll have my attention. Without those kinds of numbers, there are substantially more likely risks that have my attention first.
Shouldn't uncharted territory come with a risk multiplier of some kind?
Currently it's an estimation at best. Maybe 1-in-20 maybe 1-in-million in the next 2 years.
The OPs point of this thread still stands, scientists shouldn't be so confident.
> considering the scale of the matter, i.e. human extinction.
There is literally no evidence that this is the scale of the matter. Has AI ever caused anything to go extinct? Where did this hypothesis (and that's all it is) come from? Terminator movies?
It's very frustrating watching experts and the literal founder of lesswrong reacting to pure make-believe. There is no discernible/convincing path from GPT4 -> Human Extinction. What am I missing here?
Nuclear bombs have also never caused anything to go extinct. That's no reason not to be cautious.
The path is pretty clear to me. An AI that can recreate an improved version of itself will cause an intelligence explosion. That is a mathematical tautology though it could turn out that it would plateau at some point due to physical limitations or whatever. And the situation then becomes: at some point, this AI will be smarter than us. And so, if it decides that we are in the way for one reason or another, it can decide to get rid of us and we would have as much chance of stopping it as chimpanzees would of stopping us if we decided to kill them off.
We do not, I think, have such a thing at this point but it doesn't feel far off with the coding capabilities that GPT4 has.
So what would be the path for GPT5 or 6 creating an improved model of itself? It's not enough to generate working code. It has to come up with a better architecture or training data.
The idea is that a model might already be smarter than us or at the very least have a very different thought process from us and then do something like improving itself. The problem is that it's impossible for us to predict the exact path because it's thought up by an entity whose thinking we don't really understand or are able to predict.
I understand the idea of a self-improving intelligence, but unless there's a path for it to do so, it's just a thought experiment. The other poster who replied to you has a better idea: that civilization can be thought of as the intelligence that is improving itself. Instead of worrying about some emergent AGI inside of civilization, we can think of civilization itself as an ASI that already exists. Anything that emerges inside of civilization will be eclipsed and kept in check by the existing superintelligence of the entire world.
I think "llm builds better llm" is drawing the border at the wrong place. Technical progress has been accelerating for centuries. It's pretty self evident that the technological civilization is improving upon itself.
… This is literally non-logical reasoning. If we redefine AI to mean something it’s never been defined as… unfortunately logic has left the chat at that point
Debatable, since there are plenty of other unavoidable existential threats that are far more likely than the best estimates that AI will wipe us out. E.g. supervolcano eruption, massive solar flare, asteroid impact, some novel virus.
At least we can take comfort in the fact that if an AI takes us out, one of the aforementioned will avenge us and destroy the AI too on a long enough time scale.
I find it striking that we have a rich cultural tradition of claiming we're artificial beings. Maybe we're building a successor lifeform... I've thought about this as a story premise: humans and robots are two stages of a lifecycle. Humans flourish in a planetary ecosystem, build robots that go on to colonize new systems, where they seed humans because (reason I haven't been able to formulate).
> But almost everybody is really surprised that becoming better at a task like ‘natural language prediction’ would produce all these strange abilities that sort of look like “understanding the world”.
It honestly should not have been a surprise to anyone in the field at least in the last 6 years.
We know mathematics is the science of reality's patterns. Maybe if we can keep LLMs' current ability in natural language and keep improving their (currently poor) math abilities, they will get there?...
There's a certain sentiment that AGI, however you wish to define it, won't in fact be a "we'll know it when we see it" situation, but rather an "AGI will arrive long before consensus agrees it's AGI" one. LLMs have made me believe this will 100% be the case, either way.
It's one thing to argue over things we can't evaluate even now, but man, the 100th "They can't reason!" every week is pretty funny when you can basically take your pick of reasoning type - algorithmic, causal, inference, analogical - and read a paper showing strong performance.
This paper is startling in terms of a world model and how it seems to emerge in a specific example in the step from GPT3.5 to 4. https://arxiv.org/pdf/2303.12712.pdf
One of the prompts is "Here we have a book, 9 eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner."
Gpt3.5 is hopeless. It says start from the bottle, balance the nail on top of it, then the eggs (somehow) then the laptop, then the book.
GPT 4 says put the book first, then the eggs on top of the book in a 3x3 grid, then the laptop, then the nail.
It's interesting that GPT-4's current answer is different, but what you described is almost precisely what Google Bard gives as a response. Clearly the solution (book -> eggs 3x3 -> laptop etc) is coming from the Internet.
However any deviation from the original task ruins the LLM's answer. Try 9 cabbages instead of eggs and see how ridiculous and out of touch with reality the responses given by both GPT4 and Bard are.
GPT-4: Place the book down first. Position the laptop flat on the book. Stack the cabbages in a pyramid on laptop. Balance the bottle on top. Insert the nail into the bottle cork.
It's interesting. The answer shows some reasoning about weight and pressure but not curvature. In response to
Here we have a book, an egg, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner.
GPT-4 gives me:
Arranging objects in a stable stack depends on their size, shape, and weight. In general, you want to place the largest, heaviest and most stable objects at the bottom and then move to the smallest, lightest, and least stable at the top. Here's one way to stack the objects you listed:
Laptop: Start with the laptop at the bottom. It's likely the largest object and it has a flat, stable surface.
Book: Next, put the book on top of the laptop. Books are generally flat and heavy, which makes them a good choice for the base of your stack.
Bottle: The bottle goes next, as it should have a flat bottom to sit securely on the book. Just make sure it's closed tight and won't roll away!
Egg: The egg is delicate, so it should be towards the top to avoid pressure. However, it's not flat, so it should not be at the very top.
Nail: Finally, the nail goes on top. It's the smallest and lightest, so it won't add much pressure to the egg.
Remember that this is just one possible way to stack these items and it could depend on the specific characteristics of the items you have. For example, if the bottle is much larger and heavier than the book or laptop, you'd need to adjust accordingly. Safety should always be your first priority when stacking items, especially if they are delicate or valuable.
The "make sure it's closed tight and won't roll away" comment makes no sense obviously. Most people would place the bottle standing on its end so neither of those is a concern. The response also doesn't show an understanding of the fact that the nail won't sit on top of the egg although it's interestingly concerned with pressure breaking the egg.
The Devil's advocate wants to have a word with you.
> The "make sure it's closed tight and won't roll away" comment makes no sense
As noted at the end of GPT-4's answer, "Safety should always be your first priority." What happens if your stacking experiment fails and the bottle falls? Any content would spill out, unless the bottle is closed tight. If you are doing this on a table, the bottle could also roll off the edge, fall to the floor and shatter.
> Most people would place the bottle standing on its end so neither of those is a concern.
GPT-4 doesn't know if you are like most people (maybe you're 5 or in the bottom IQ decile), it doesn't know what's in your bottle and it doesn't know how robust it is. Better be safe than sorry.
> the nail won't sit on top of the egg
I'm pretty sure I could balance a nail on an egg. The question also didn't preclude using stabilizing aids like adhesive tape or glue.
I find it interesting that Turing foresaw all this with the Turing test (and Searle with the Chinese room). If computers can emulate human language to an indiscernible degree of difference from what other humans can do, it is essentially reasoning, for some definitions of reasoning. But how much of it reflects our own understanding of reasoning and efforts in fine tuning? If I show a modern IMAX movie to a stone age hunter and they confuse it for reality, is the movie reality? People ran from footage of a steam locomotive driving towards the camera. Does that mean the train was real?
What the skeptics seem to get hung up on is the lack of common sense. These LLM’s are trained on language describing the world, not on the world itself. They lack the lived experience that we take for granted. So it is trivial to ask them common sense questions and see them fail amusingly, but that is unfair because they haven’t been taught the world, only a linguistic map of the world, and the map is not the territory. Also, we don’t even know what map they learned, because the only way to know is to ask a million questions, like an early seafarer discovering the shape of a foreign continent beach by beach.
LLM’s are not people, they lack common sense, but they understand and can reason about what they are trained on. That is exceedingly powerful and very useful even at today’s level of ability, so products built on top of this technology are going to transform everything. The trick is boxing it in and only making it do things it can, so the art of LLM product development will have to become a whole subfield of software engineering until the LLM’s develop to the point where their map of the world is close enough to the world itself.
Do you know what a "world model" is? It's a thing people were assumed to have in 1970s psychology, but was never well-defined enough to tell if it exists or not, so I don't think it's obvious anything else has one.
The brain does a lot of hierarchical prediction with sense data. When new information comes in, it references "something" and adjusts the sense data likewise.
That's why when you shift your eyes quickly, you see blurred images pass by. In reality, you should be seeing complete black because the brain doesn't actually process visual information that shifts so quickly.
But your brain "knows" it should see...well something. And so it fits that blurred passthrough as compensation. Completely made up data. But not ungrounded data, data that seems like it should fit according to that "something". That "something" being referenced is a world model.
That's a sub-conscious example but you can think of a lot of times people consciously reference "something" to better take actions or make decisions.
For Othello-GPT, training seemed to induce a learned board state that the model referenced at every inference to make moves.
It seems that fully accurate or not, deep predictions might require "something" to reference. Else perhaps you might need to derive everything from first principles at every turn. Not very efficient.
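The Othello-GPT result referenced above came from training probes on the network's hidden activations to see whether the board state can be read back out. Here is a minimal sketch in the spirit of that work (the actual papers used more careful probing setups); it assumes you have already collected (activation, board-state) pairs from the model, and the shapes and 3-class encoding are illustrative:

```python
import torch
import torch.nn as nn

# Sketch of a linear probe in the spirit of the Othello-GPT work:
# try to read the board state back out of the model's hidden activations.
#   hidden: (N, d_model) float tensor of activations at some layer
#   board:  (N, 64) long tensor, each square encoded as 0=empty, 1=mine, 2=theirs

def train_probe(hidden, board, epochs=10, lr=1e-3):
    d_model = hidden.shape[1]
    probe = nn.Linear(d_model, 64 * 3)              # 3 classes per square
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        logits = probe(hidden).reshape(-1, 64, 3)   # (N, 64, 3)
        loss = loss_fn(logits.reshape(-1, 3), board.reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()
    return probe  # high probe accuracy suggests the board state is linearly decodable
```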
It always seemed sensible to me that we are looking at a dreamlike construction all the time, not just holes filled in but literally everything you see is made up, but grounded in sensory data (when not psychotic or dreaming). This also nicely explains the clarity and realism of lucid dreaming, because it’s just doing whatever you are doing during the day but without (external) grounding sensory data.
> That "something" being referenced is a world model.
It doesn't need to be a whole understanding of the nearby world to build an image. Basically it's an event camera (https://en.wikipedia.org/wiki/Event_camera) and just needs to integrate recent visual "samples".
But if you do need to understand objects near you in detail (say, to move them), you don't do it just by looking. You investigate them by moving closer to them, picking them up, etc. If you need to "create a world model", then it's an activity rather something your senses just do.
To do a great deal of harm, GPT-X just has to have enough of a world model to be useful for automating tasks like deception and impersonation at a scale humanity has never before experienced.
It's funny how it's possible to simultaneously overestimate and underestimate GPT4, vastly.
I think that we just don't fully understand everything it gives us yet.
The complaints of "well it explained this wrong" are over-emphasized. The same thing happens with google and with any sort of research. Besides, if you're actually being productive with GPT4, you're going to be asking it stuff that relates to something you do know, and will be able to verify it readily enough. (Especially when it comes to programming and compilers.)
And just a reminder, those of you opining based off your experience with GPT3.5... GPT4 is a huge, huge improvement. Almost to the point of it not really being an incremental improvement. It's so much better it's like a different thing.
> And just a reminder, those of you opining based off your experience with GPT3.5... GPT4 is a huge, huge improvement.
God, yes. The number of people on HN pushing up their glasses and saying "well, actshually..." when they're basing their opinions off the 3 questions they asked 3.5 is starting to become pretty grating.
The number of people parroting this is also absolutely astounding and grating too.
Like, anyone who has spent 5 minutes on this forum already knows this. It’s probably not necessary to keep pointing it out. Yes some people don’t know ChatGPT 3.5 is the default for non-paying customers.
What you are saying is humans are just stochastic parrots - predictable little denial engines regurgitating their latest chosen sentence that makes them feel special and safe.
In any thread on the topic, simple statistical modeling will get you a near-perfect prediction of the distribution of arguments that will appear.
Bad faith actors who have been beating the "tech sucks" drum on Twitter since before Chat GPT even came out (despite in several cases working for big tech, in AI, before getting rightfully shitcanned for bad behavior on the job) keep posting 3.5 results as "gotchas" and then when people call them out about how GPT-4 gets it right they're like "well I will never put money in those assholes' pockets", so if you want to be taken seriously you really have to do it right. That means testing the latest models.
Is it just me, or could this be the future of actually paying for search engines? I get way better answers for search queries through ChatGPT than Google for domain-specific stuff.
Every time there is a thread on the topic, humans are hallucinating and making shit up at epic scale. So really, it's either a terminal flaw and we should never listen to another human again, or we should start getting some humility.
As mentioned, much of our discussion is rote parroting. I can usually go into any hackernews thread and roughly know what the top discussions are going to be. It's not surprising that an AI trained on a large portion of the internet would thus look human like.
If you really poke at GPT, you begin to realize it's fairly shallow. Human intelligence is like a deep well or pond, whereas GPT is a vast but shallow ocean.
Making that ocean deeper is not a trivial problem that we can just throw more compute or data at. We've pretty much tapped out that depth with GPT4 and are going to need better designs.
This could only take half a decade or it could be half a century. Plenty of enterprises stagnate for decades.
Sam Altman said it himself. He seems like a reasonable source.
If you're familiar with other fields of AI, adding more and more layers to ResNet was the hotness for a while, but the trick eventually stopped working.
Altman didn't really say that. Reading what he actually said rather than a headline, he was alluding to economic walls. He didn't say anything about diminishing returns on scaling. And if anything, the chief scientist, Ilya, thinks there's a lot left to squeeze.
Sure, Sam Altman, the lying CEO of a tech company (they all do), should be listened to on this matter but not on the part where he thinks AGI is within reach using his approach. Selective hearing.
> You can't possibly know that, given that we don't actually understand how LLMs work on a high level.
It's a fair assumption to make however - basically 80/20 rule.
AI research isn't a new thing, and I bet you could go back 40/50 years to when they thought they were about to have a massive breakthrough to human-level intelligence.
> GPT-4 is three months old and you're confident that its working principle cannot be extended further? Where do you get that confidence from?
I'm guessing from actually using it.
GPT4 is super impressive and helpful in a practical way, but having used it myself for a while now I get this feeling also. It feels a bit like "it's been fed everything we have, with all the techniques we have, now what?"
There are dozens and maybe hundreds of different approaches that could theoretically get around the limitations of GPT4 that merely haven't been trained at scale yet. There is absolutely no lack of ideas in this space, including potentially revolutionary ones, but they take time and money to prove out.
Training a model doesn't mean you understand what the neurons actually do to influence output. Nobody knows that. That's where the black box analogies come in. We know what goes in the box and what comes out. We don't know what the box is doing to the data
> Making that ocean deeper is not a trivial problem that we can just throw more compute or data at.
I'd say this is immediately contradicted by the available evidence. GPT-2 was hopeless for anything other than some fun language games like a bot replica of a subreddit or Trump. 3.5 is much, much bigger, and has semi-competent but limited reasoning abilities.
GPT 4 is a vast improvement over 3.5 in various reasoning tasks. Yes, a priori I would have agreed with you that this has to stop somewhere, but not anymore. I would need to see some data from post-GPT4 models to believe you.
The apparent shallowness IMO is due to the lack of long term memory and limited context. In terms of depth, the depth of the human mind is fairly limited. Put any average human to task on any creative endeavor and it's surely a regurgitation of things they've seen. The high standard of true creativity that people hold LLM's to is only a capability of a small minority of humans.
I'd wager it's far more likely 5 years than 50 before LLMs get to the full depths all humans are capable of. Simply compare the state of LLMs today vs 2018.
This is a terrible article written by someone who doesn't seem to have even tried GPT 4. Their only example references GPT 3.5, for example, and then they waffle on about only vaguely related topics such as level 5 self-driving.
This quote in particular stood out as ignorant:
“What the large language models are good at is saying what an answer should sound like, which is different from what an answer should be.”
That's... not at all how large language models work. Tiny, trivial, toy language models work like this, because they don't have the internal capacity to do anything else. They just don't have enough parameters.
Stephen Wolfram explained it best: After a point, the only way to get better at modelling the statistics of language is to go to the level "above" grammar and start modelling common sense facts about the world. The larger the model, the higher the level of abstraction it can reach to improve its predictions.
His example was this sentence: "The elephant flew to the Moon."
That is a syntactically and grammatically correct sentence. A toy LLM, or older NLP algorithms will mark that as "valid" and happily match it, predict it, or whatever. But elephants don't fly to the Moon, not because the sentence is invalid, but because they can't fly, the Moon has never been visited by any animal, and even humans can't reach it (at the moment). To predict that this sentence is unlikely, the model has to encode all of that knowledge about the world.
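You can see a weak version of this effect even in much smaller models. A sketch using Hugging Face's GPT-2 (not GPT-4; just an illustration of the idea of scoring sentences by their average log-likelihood):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: compare how plausible a small causal LM finds two sentences.
# GPT-2 stands in here; it is far smaller than the models under discussion.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def avg_logprob(sentence):
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)   # loss is mean negative log-likelihood per token
    return -out.loss.item()

print(avg_logprob("The astronaut flew to the Moon."))
print(avg_logprob("The elephant flew to the Moon."))
# A model with any grasp of the world should score the first sentence higher.
```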
Go ask GPT 4 -- not 3.5 -- what it thinks about elephants flying to the moon. Then, and only then, go write a snarky IEEE article.
I think the main reason for division is that everyone projects to their own use cases. I have been using gpt-4 for quite some time and also couldn't understand why someone would say that it just produces something that sounds like a real answer. But then I found some queries that can definitely be described as "sounding like truth". So your personal experience probably wasn't what was their experience.
For those curious, I was asking gpt-4 about the top 3 cards from my favorite board game, Spirit Island. All three of them sounded really convincing, having the same structure and the same writing style, but unfortunately none of them existed. So anything that falls outside of the most common use cases will probably produce an experience of convincing hallucinations.
ChatGPT is forced to give an answer. It's like a human on "truth serum". The drugs don't stop you lying; they just lower inhibitions so you blab more without realising it.
The more obscure the topic, the more likely the hallucination. If you ask it about common card games, it gives very good answers.
If you asked a random human about 3 cards from a random board game at gunpoint and said: "Talk, now, or you get shot", they'll just start spouting gibberish too.
PS: I asked GPT 4 about that game, and it prefixed every answer with some variant of "I'm not sure about this answer", or it completely refused to answer, stating that it did not know about any specific cards.
To me, it prefixed with just "As an AI, I do not have opinions or favorites. However, I can share with you three notable and commonly appreciated Power Cards from the game "Spirit Island", as it existed until my training data cut-off in September 2021. Remember that the "best" cards can often depend on the specific circumstances in the game, as well as the particular strategy and Spirit you're playing.". But then just shared the cards, nothing about that it was not sure about details. Card selection was decent, but details like resources, powers, and so on were off. But all sounded realistic. Shared an example below if you care.
I was curious and I have the browsing module for GPT4 so I tried to get it to name the top 3 cards, the following is what it came up with after like 10 minutes of back and forth and it browsing a lot of sites. I don't know if this is any good as I know nothing about the game. Do these cards even exist?
Talons of Lightning: This is a major power card with an energy cost of 6. It's a fast card with a range of 1 from Sacred Site. It targets any land and deals 4 damage. If you meet its threshold requirement (3 Air and 2 Fire), it deals +3 damage and destroys all Dahan 1 2 .
Tsunami: This is another major power card with an energy cost of 6. It's a slow card with a range of 2 from the ocean. It affects every coastal land in one board, dealing 6 damage to each. If you meet its threshold requirement (3 Water and 3 Earth), it deals +4 damage and adds 2 fear per land where it destroys any towns or cities 3 4 .
Blazing Renewal: A major power card with an energy cost of 5. It's a fast card with a range of 0, measured from any land. It targets any land, allows you to gather up to 2 of your destroyed presence, and deals 2 damage per presence gathered. If you meet its threshold requirement (4 Fire and 2 Earth), it deals +10 damage 5 6 .
Try helping it along with something like "only provide information you are certain about. If you don't know the answer, say "i don't know""
For your question, it says it doesn't know about any card ranking system in spirit island. Then if I ask it for a list of cards, they all check out via Google search.
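In case it's useful to anyone trying to reproduce this, the suggestion above amounts to passing that instruction as a system message. A sketch with the OpenAI Python client; the model name and wording are illustrative, and no system prompt reliably prevents hallucination:

```python
from openai import OpenAI

# Sketch: pass the "say 'I don't know'" instruction as a system message.
# Model name and wording are illustrative only.
client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Only provide information you are certain about. "
                    "If you don't know the answer, say \"I don't know\"."},
        {"role": "user",
         "content": "What are the top 3 Spirit Island cards? Describe them."},
    ],
)
print(resp.choices[0].message.content)
```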
Copy-pasted exact query you shared. It still provided information that I thought was correct, but after further inspection just sounded correct. It's GPT-4 without plugins.
> What are top 3 Spirit Island cards and provide their descriptions? Only provide information you are certain about. If you don't know the answer, say "i don't know".
2. "Vigor of the Breaking Dawn" (Major Power Card): This card costs 6 energy and is a slow power card. It allows you to push all explorers and towns from one of your lands. If you have 2 sun and 2 fire elements, you can instead replace each town with an explorer before the push. If you have 3 sun and 2 fire, you may also do 2 damage in each land where you push one or more pieces.
https://spiritislandwiki.com/index.php?title=Vigor_of_the_Br...
It doesn't cost 6, it costs 3. And it's not slow, it's fast. Incorrect required resources. You can deal 2 damage, but for pushed Dahan, not for pushed explorers. As I said, it all sounds convincing, but it's wrong. More common queries are usually fairly good.
Great posts. I think it's an error caused by a mistake often made on the topic: the assumption that the side effects we see now are some fundamental problem and not just an artifact of the way systems are trained and used. And of how we (mal)function.
Especially tightly embracing the cognitive bias of how special and wonderful our intelligence is. After all, we have that fancy squishy brain which we assume to be essential. As far as I can tell, the only visible bottlenecks when looking into the future come into view once you start debating intelligence vs emulating intelligence. And if that's really the metric, some honest introspection about the nature of human intelligence might be in order.
Not sure how much of that is done purposefully so as not to create too much urgency about figuring out outer alignment on a societal level. Just as it's no wonder that we haven't figured out how to deal with fake news while at the same time insisting that malinformation exists, it's really no wonder that we can't figure out AI alignment while not having solved human alignment. Nobody should be surprised that the cause of the problems might be sitting in front of the machine.
NOVA just released an episode on perception (https://www.youtube.com/watch?v=HU6LfXNeQM4) and, yeah, aligning machine perception to human perception is going to be nearly impossible.
Or to put it another way, your brain's model of reality is one that is highly optimized around the limitations of meatsacks on a power budget that are trying not to die. Our current AI does not have to worry about death in its most common forms. Companies like Microsoft throw practically unlimited amounts of power at it. The textual data that is fed to it is filtered far beyond what a human mind filters its input; books/papers are a tiny summarization of reality. At the same time, more 'raw' forms of data like images/video/audio are likely to be far less filtered than what the human mind does to stay within its power budget.
Rehashing, this is why I think alignment will be impossible, at the end of the day humans and AI will see different realities.
Thanks for the link!
Trying to figure out how AI thinking looks sounds like a dead end to me. It's not human, you don't understand it, so what's the point? Especially when you have to worry about getting manipulated. Alignment this way seems indeed impossible. But given the ability to produce language that makes sense, it should be possible to emulate the human thinking process by looking at how that actually works on a practical level, the same way you don't care how the brain actually works to produce language.
As such I see no hurdle to getting something to emulate the thinking-in-language of an individual. Assuming that there aren't actually multiple realities to see, just different perspectives you can work with. Which would mean we are looking for the one utilizing human perspectives, but not making the mistakes humans do.
Which makes this so scary: the limitations are just a byproduct of the current approach. They are just playing the wrong game. Which means I am pretty confident they already exist somewhere.
edit: In this context I believe it's also worth mentioning what Altman said on Lex Fridman's podcast, that humans don't like condescending bots. That's a bitter pill to swallow going forwards, especially since we require a lot of smoke and mirrors and noble lies, as individuals as well as a society.
> Go ask GPT 4 -- not 3.5 -- what it thinks about elephants flying to the moon. Then, and only then, go write a snarky IEEE article.
It's hard to know which things have been seen in the training data and are correct only for that reason. And GPT4 is large enough that it can generalize from learning that x doesn't make sense to concluding that y also doesn't make sense. Does that mean it *understands*? Maybe. But it doesn't have persistent state and can't do math. It's definitely not yet what we think of when we say AGI.
Wolfram pulled that out of a hat in an interview that was just a week or two ago. The data used for training GPT-4 is from before September 2021.
The point Stephen was trying to make was not about any specific sentence.
The point is that while forcing these models to get better through gradient descent, their only option for "going downhill" and improving the loss function is to go above and beyond mere grammar. That's because syntax and grammar only take them so far, and the only available source of improvement is to gain a general-purpose understanding of the world that the text they're seeing is describing.
Instead there are two options: taking the user input, putting it in the training corpus, and reweighting the neural net; or using the user input as up/down votes in RLHF to alter the output of the weights that already exist.
Annoyed at all these N=1 articles from prominent thinkers about this stuff.
Especially from scientists - can these sorts of folks please more carefully quantify how often it’s “wrong” and then from that decide whether or not to “calm down”.
Right now I suspect we hear from the outliers on both ends of the spectrum here. People who either see AGI happening tomorrow and the more dismissive crowd. But aside from what we’ve seen about testing like the Bar exam, not a lot of boring statistical study (that makes headlines at least)
Anytime I ask these things something (bard, gpt etc),
33% of the answer is genius,
33% misleading garbage,
33% filler stuff that’s neither here or there
The problem is that distinguishing between these parts requires me to be an expert in the area I’m inquiring about - and then why the heck do I need to ask some idiot bot for answers to questions that I already know the answer to?
I don’t know who finds these things useful and, more importantly, who is blowing smoke up everyone’s collective rear, especially the media.
Bard is a brain-damaged-but-literate idiot compared to GPT 3, which is still dumber than the typical human.
Try GPT 4 for a week.
I've found it to be more like 50% immediately useful, 25% very impressive, and 25% where it's not wrong but I have to poke it a few times with different prompts to coax out the specific answer I'm looking for.
That's better than most humans that I collaborate with at work.
Literally half of humans -- in a professional IT setting -- can't understand simplified, clear English in emails. Similarly, in my experience about half can't follow simple A -> B logic. Many are perpetually perplexed that prerequisites need to precede the work, not be a footnote in the post-mortem of the predictable failure. Etc...
PS: That last sentence is too hard for several English-native speakers I work with to parse. Seriously. I'm not even exaggerating the tiniest bit. I've had coworkers fail to understand words like "orthogonal" or "vanilla" in a sentence. Vanilla!
In my estimation, Chat GPT 4 is already smarter than many people, certainly the bottom 25% of the human population.
LLMs are a real existential threat to those people in their current state. A few more years of improvement, and they'll be displacing the bottom 50% in workplaces, easily.
> I’ve had coworkers fail to understand words like “orthogonal” or “vanilla” in a sentence. Vanilla!
Presumably, you are referring to the idiomatic use of vanilla, which is probably a less universal idiom than you think (it is of fairly recent origin in wide use, and derives from a specific American cultural loading of the literal vanilla flavor) and which, even when the general idiom is understood, can rely on a deeply shared understanding of what the basic default is in the referenced context to actually be understood in its contextual meaning.
>I've found it to be more like 50% immediately useful, 25% very impressive, and 25% where it's not wrong but I have to poke it a few times with different prompts to coax out the specific answer I'm looking for.
That could tell us more about your questions than GPT's capabilities.
> Literally half of humans -- in a professional IT setting -- can't understand simplified, clear english in emails. Similarly, in my experience about half can't follow simple A -> B logic.
There are alternate hypotheses.
People have preferences. When it appears that someone does not understand something, they may be pretending they don't understand it, or they may simply be ignoring it. Maybe they are trying to avoid an unpleasant task, or maybe they find dealing with a specific person unpleasant and not worth the effort.
In my experience, people are far more capable and competent when they feel comfortable and are interested in the task.
> In my experience, people are far more capable and competent when they feel comfortable and are interested in the task.
That's definitely true, but in my experience people have limits: simple biological ones. Repetitive tasks make practically all humans bored, for example.
The fact that AIs never get sleepy, distracted, or bored already makes them super-human in at least that one aspect. That they have essentially perfect English comprehension, and hence aren't fazed by the use of jargon or technical language, puts them head-and-shoulders above most humans.
The frustrations I'm venting aren't some rare thing. I'm working on a technical team where the project manager doesn't understand what the team members are saying. This is not just a matter of syntax, or jargon. They just don't understand the concepts. This is so common in the wider industry that I'm pleasantly surprised, shocked even, when I come across a PM that can ask useful questions instead of needing endless corrections along the lines of: "It's spelled SQL, not Sequel." I've never met a PM that could do simple arithmetic, like "10 TB at 100 MB/s will take over a day to copy, we should plan for that!". Never.
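(If anyone wants to sanity-check that back-of-the-envelope arithmetic, it's a few lines; a rough sketch, using decimal TB and MB:)

```
# Copying 10 TB at 100 MB/s (decimal units).
size_bytes = 10 * 10**12         # 10 TB
rate_bytes_per_s = 100 * 10**6   # 100 MB/s
hours = size_bytes / rate_bytes_per_s / 3600
print(round(hours, 1))           # ~27.8 hours, i.e. "over a day"
```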
I've tested Chat GPT 4 on both language and concepts that I've seen trip up PMs, and it understood "well enough" every time.
For example, GPT 4: The sentence "We deployed sequel server successfully last night" seems incorrect due to the incorrect naming of a product. "Sequel server" should actually be "SQL Server", a popular relational database management system (RDBMS) developed by Microsoft. Therefore, the corrected sentence should be: "We deployed SQL Server successfully last night."
PS: If you tell GPT 4 to pretend it is a technical project manager and instruct it to ask followup questions, it is noticeably better at this than any PM I have worked with in the last few years.
The particular problem with work is that people are commonly promoted into tasks they are neither interested in nor comfortable with.
You've instead moved the task from general human capability to one of management alignment with worker capability and human statistical probability. This is something that human management has been failing at for about forever, especially as team size gets large. Maybe we'll see AI 'management' align humans to tasks better, or more likely as time and LLM capability progresses, we'll just see the average AI capability increase over the average worker capability and companies will just depend on unreliable meat less.
I am in no shape, way, or form affiliated with OpenAI or any other AI company.
What I and many others have noticed about the "Are LLMs really smart?" debate is that everyone on the "Nay" side is using 3.5 and everyone on the "Yay" side is using 4.0.
The naming and the versioning implies that GPT 4 is somehow slightly better than 3.5, like not even a "full +1" better, just "+0.5" better. (This goes to show how trivial it is to trick "mere" humans and their primitive meat brains.)
Similarly, all pre-4 LLMs including not just the older ChatGPT variants, but Bard, Vicuna, etc... are all very clearly and obviously sub-par, making glaring mistakes regularly. Hence, people generalise and assume GPT 4 must be more of the same.
For the last few weeks, across many forums, every time someone has said "AIs can't do X" I have put X into ChatGPT 4 and it could do it, with only a very few exceptions.
The unfortunate thing is that there is no free trial for GPT 4, and the version on Bing doesn't seem to be quite the same. (It's probably too restricted by a very long system prompt.)
So no, people won't form their own opinions, at least not yet, because they can't do so without paying for access.
I've been paying for GPT-4 since it came out and have used it extensively. It's clearly an iteration on the same thing and behaves in qualitatively the same way. The differences are just differences of degree.
It's not hard to get a feel for the "edges" of an LLM. You just need to come up with a sequence of related tasks of increasing complexity. A good one is to give it a simple program and ask what it outputs, then progressively add complications to the program until it starts to fail to predict the output. You'll reliably find a point where it transitions from reliably getting it right to frequently getting it wrong, and it does so in a distinctly non-humanlike way, consistent with the space of possible programs and outputs becoming too large for its approach of predicting tokens (instead of forming and mentally "executing" a model of the code) to work. The improvement between 3.5 and 4 in this is incremental: the boundary has moved a bit, but it's still there.
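To make that concrete, here's a made-up example of the kind of probe I mean (the programs themselves are just illustrations, nothing from the article): start with something trivially traceable, then add one complication and see where the prediction breaks down.

```
# Probe 1: easy for a human or an LLM to trace.
xs = [3, 1, 4, 1, 5]
print(sum(x * x for x in xs))   # 52

# Probe 2: same idea plus one complication (index-dependent filtering).
ys = []
for i, x in enumerate(xs):
    if x % 2 == 1:
        ys.append(x + i)
print(ys)                       # [3, 2, 4, 9]
```

Keep layering complications like this and you can map out roughly where the model stops tracing and starts guessing.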
Most developers -- let alone humans -- I've met can't run trivial programs in their head successfully, let alone complex ones.
I've thrown crazy complicated problems at GPT 4 and had mixed results, but then again, I get mixed results from people too.
I've had it explain a multi-page SQL query I couldn't understand myself. I asked it to write doc-comments for spaghetti code that I wrote for a programming competition, and it spat out a comment for every function correctly. One particular function was unintelligible numeric operations on single-letter identifiers, and its true purpose could only be understood through seven levels of indirection! It figured it out.
The fact that we're debating the finer points of what it can and can't do is by itself staggering.
Imagine if next week you could buy a $20K Tesla bipedal home robot. I guarantee you then people would start arguing that it "can't really cook" because it couldn't cook them a Michelin star quality meal with nothing but stale ingredients, one pot, and a broken spatula.
"In a distinctly non-humanlike way". You can learn a lot about how a system works from how it fails and in this case it fails in a way consistent with the token-prediction approach we know it is using rather than the model-forming approach some are claiming has "emerged" from that. It doesn't show the performance on a marginally more complex example that you would expect from a human with the same performance on the slightly simpler one, which is precisely the point Rodney Brooks is making. It applies equally to GPT-3.5 and GPT-4.
But I didn't respond to debate the nature or merits of LLMs. It's been done to death and I wouldn't expect to change your mind. I'm just offering myself as a counterexample to your assertion that everyone (emphasis yours) that is unconvinced by some of the claims being made about LLM capabilities (I dislike your "sides" characterisation) is using GPT-3.5.
Over the long term this is going to be a primary alignment problem of AI as it becomes more capable.
What is my reasoning behind that?
Because humans suck, or at least the constraints we're presented with do. All your input systems to your brain are constantly behind 'now', and the vast majority of data you could take in is getting dropped on the ground. For example, if I'm making a robotic visual input system, it makes nearly zero sense for it to behave like human vision. Your 20/20 visual acuity area is tiny, and only by moving your eyes around rapidly, and by your brain lying to you, do you get a high-resolution view of the world.
And that is just an example of one of those weird human behaviors we know about. It's likely we'll find more of these shortcuts over time because AI won't take them.
My take-away is that your interaction with the OP has not changed your opinion about "everyone", expressed above:
>> What I and many others have noticed about the "Are LLMs really smart?" debate is that everyone on the "Nay" side is using 3.5 and everyone on the "Yay" side is using 4.0.
Sometimes there really is no point in trying to make curious conversation. Curiosity has left the building.
> So no, people won't form their own opinions, at least not yet, because they can't do so without paying for access.
People will pay for access if they find it valuable enough.
I work with people who use it; I've not seen anything impressive enough come from them to make me want to pay for it, so I don't. I've also watched over a screen share because I was curious what all the fuss was about. What I saw that pissed me off was that they've stopped contributing to our internal libraries and just generate everything now. I found that kind of disturbing. It's not the product's fault, but it's the kind of thing I imagined would start happening.
I'm glad you like it, I just don't know why people feel the need to sell it so hard.
If you had used GPT-4 enough, you would know that at this point OpenAI does not need to pay any human to engage in online conversation, aside from legal reasons, if any.
I personally created some content-creating bots with GPT-4, and it succeeded to a level that I don't trust anything I see online anymore. It does a better job than me, which doesn't say much because I am an engineer, not a content creator. But still, I could get the same results as one with a script that I made GPT-4 write itself.
...Yes, I am losing sleep over GPT-4's performance. If you are not losing sleep over it yet, you haven't really given it a genuine try yet.
Yeah, I think the use cases for these things are far narrower than the boosters & hype cycle think.
If you could have unlimited interns for $0 (let's pretend it doesn't cost tons and tons of compute) who never shut up, who hallucinate and lie, and who also do good work, all in varying degrees... how many would you want?
These things are probably going to be great for lots of black-hat work: propaganda, political marketing, flooding the zone with BS through unlimited iterations of messaging. Basically things that can be A/B tested to death, where veracity is of zero importance and you have near-limitless shots on goal to keep iterating.
> The problem is distinguishing between these parts requires me to be be an expert in the area I’m inquiring about - and then why the heck do I need to ask some idiot bot for answers to questions that I already know an answer to?
Because it can be significantly faster to check something for correctness than to produce it?
More so when the correctness check can itself be automated to some extent.
I have very quickly picked up the habit of pasting snippets of my code into GPT-4 and simply asking "Why is this not working?" Almost every time, it succinctly explains the apparent purpose of the code, and how it is subtly wrong.
It's so good that I often do this preemptively to avoid a compile/deploy/test cycle.
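To be clear about the kind of thing I paste in, here's a made-up miniature of a "subtly wrong" snippet (the classic mutable-default-argument bug); the real ones are longer, but the flavour is the same.

```
# Subtly wrong: the default list is created once and shared across calls.
def add_tag(tag, tags=[]):
    tags.append(tag)
    return tags

print(add_tag("a"))  # ['a']
print(add_tag("b"))  # ['a', 'b']  <- surprising if you expected ['b']
```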
Indeed, but I'm not claiming it's always faster to check than to produce. Simply that when it is the case, using GPT-4 can be worth it. It is to me - I use it daily.
My experience with Github copilot is that the time it saves me typing out boilerplate has been more than lost when I have to spend time carefully debugging bugs in the code it produces. And those are the bugs I catch right away.
I expect this will improve but it's certainly not always the case that checking something is cheaper or easier than generating it in the first place.
I was torn on Copilot - it seemed like it was saving me time, but I found myself getting way more value out of just copy / pasting code into GPT4.
So I decided to stop my copilot subscription and just see how I go without it.
I've been off copilot for a few days now and other than having to do more code lookups it's not a terrible experience not having it. It does feel like something that should be baked into the IDE for free though.
Multiple choice tests are easier than fill in the blank precisely because it's easier to recognize when something is correct than it is to regurgitate a fact from thin air.
You don't have to be an expert to recognize when ChatGPT is providing useful information. There's a middle ground between expert and novice where ChatGPT provides real value: the times where you would know the answer if you saw it, but can't quite remember it off the top of your head.
Remember when the Internet was new and no-one believed anything on it?
Then, learning what to believe became a marketable skill for many people?
Then society fundamentally changed because not everyone learned that skill?
This is just that again. Gen Z will joke about their millennial/Gen X bosses believing anything the AI tells them and it will probably lead to some sort of mainstream conspiracy that Jackie O herself is running it or something (to those reading: please don't take this idea)
Yes. "Anyone can write anything on there! There's no way to trust it." About halfway through high school they started letting us cite websites, and it was a big deal and considered very forward thinking.
This is true; it was almost a meme before memes existed, as in the teasing line "I saw it on the internet so it must be true!", a way to highlight how non-credible it was seen as at the time.
Yeah, I don't think a machine that generates novel genius ideas 1 out of 3 times is useful either. Creating new ideas is exactly as hard as curating them.
I personally find value because it saves me time. I'll ask chat-gpt to write something slightly more complex than boilerplate code for me based on some requirements. Because I'm an "expert", I can read/run the code and message back improvements and tweaks until it arrives at something satisfactory. It certainly doesn't always produce correct code from the start, and I often encounter syntax errors or code which doesn't work. However, I've found it pretty good at remedying those issues when I describe what's wrong.
It produces something which is a good enough starting place. Sure, I could have written the code myself because I already know how. But I've found it saves me time and requires minimal effort.
> Annoyed at all these N=1 articles from prominent thinkers about this stuff.
It's a transcript of a casual interview, not an article, and certainly not a publication whose purpose is to convey statistical rigor.
As an aside, it doesn't strike me as entitled that society might permit thinkers of academic renown to express their personal opinions in less than rigorous settings on subjects to which their peer-reviewed contributions may be categorized as "prominent".
> I suspect we hear from the outliers on both ends of the spectrum
Welcome to publishing. Nothing gets widely published unless it's clickbait. Something you vehemently agree with you'll click on to see that it validates your opinion, and something you vehemently oppose you'll furiously read to find out how stupid they are. Nobody reads fair and balanced arguments that solely come with concrete evidence; they're rare and boring.
I feel that 'we in the middle' are ignored.
Maybe 'the middle' is too polysemous here.
The person to the left and the person to the right are shouting at the person in the middle. They often have uncharitable arguments, and they often take arguments from the other side uncharitably. The person in the middle is ignored.
Tug-of-rope game theory means that no-one is going to start pulling from the middle. People join the (extreme) end that they want to slightly drag the conversation towards. Maybe that's part of it.
I wonder how many $XX,XXX I have historically spent on labour, for things that I now get in seconds. Data entry / manipulation? Sure. But also wisdom / knowledge. Entire industries HAVE collapsed overnight. And will continue to collapse. And ignoring that - why are so many assuming that AI is to be a human replacement, rather than a limb/exoskeleton?
And why is the conversation 'This is magic' vs 'You are stupid, this is just pulling the wool over your eyes', rather than 'This is a valuable tool in our toolbelt, that will clearly create trillions in value - just like search, just like the transistor, just like the lightbulb'?
I don't understand.... I'm not claiming LLMs to be magic. I'm not saying they are indications of AGI. I'm not saying that the world as we know it is over.
But this is important. It's clearly important. It's clearly valuable. It has shown itself to be.
Yes, ChatGPT lies. OK? We know. We cater for that. We don't expect it not to.
It feels like talking to classic rock enthusiasts who dismiss electronica et al entirely. Fine, but god_damn_ you are missing out on some incredible sound design - my heart breaks daily from what I pump into my ears - some from rock, some hip-hop, some electronica, and most excitingly, the post-2000 merging, where the synergy of genres learning/borrowing from each other is... just. great.
Sit back and enjoy. Is it not marvellous?
ChatGPT is obnoxious, annoying, repetitive, avoidant, argumentative. But I'm still able to appreciate its value.
We could stop, right now. Freeze ChatGPT in its current state. It will still create trillions in value. I don't care if future improvements are incremental, at this point.
JUST the 'tip of my tongue' / synonymous words value from ChatGPT is useful. Not having to know exactly what term to plug into google.... This is the glue that binds the gray, while before we were stuck with the black and white.
At the very least, this is another 'Google it' revolution - In the 1990s, I remember idiots (inc. me) arguing in the pub for 8 hours, over facts that should have been verified within 10 seconds.
I foresee that the code-optimised version of GPT 4 with the 32K token context window will be amazing. GitHub Copilot was a derivative of GPT 3.0, which was pretty dumb compared to GPT 3.5, which in turn is the village idiot next to GPT 4... which IMHO is human-equivalent at many tasks. Not all, but many.
Realistically, GPT 4 costs 100x as much as GPT 3.5 in inference mode, so it won't change the world just yet. There are still API rate limits, waiting lists, etc...
Still... having the equivalent of a junior employee assisting with your code, but at a fraction of the cost and many times the speed, will be amazing.
> The example I used at the time was, I think it was a Google program labeling an image of people playing Frisbee in the park. And if a person says, “Oh, that’s a person playing Frisbee in the park,” you would assume you could ask him a question, like, “Can you eat a Frisbee?” And they would know, of course not; it’s made of plastic. You’d just expect they’d have that competence. That they would know the answer to the question, “Can you play Frisbee in a snowstorm? Or, how far can a person throw a Frisbee? Can they throw it 10 miles? Can they only throw it 10 centimeters?” You’d expect all that competence from that one piece of performance: a person saying, “That’s a picture of people playing Frisbee in the park.”
This seems like exactly a set of things that GPT-4 can do. The image recognition capabilities haven't been released yet, but they were demoed when it launched and clearly have the ability to handle a situation like this. From there, you could ask it every single one of these questions and get the correct answer.
> I’ll give you that. And I think what they say, interestingly, is how much of our language is very much rote, R-O-T-E, rather than generated directly, because it can be collapsed down to this set of parameters. But in that “Seven Deadly Sins” article, I said that one of the deadly sins was how we humans mistake performance for competence.
On this, I think he might be wrong. I think the hallucination ability shows that the generation of language can be rote, such that the embedding of ideas is a rote item learnable in the billions-to-trillions parameter space, but not the entirety of language. To me, logic and truth seem to be separate concepts from generation propensity.
Note: I am still learning the mathematics driving LLMs, and my opinions might change in the future.
> It gives an answer with complete confidence, and I sort of believe it. And half the time, it’s completely wrong.
That's bullshit, unless you are asking questions specifically designed to make GPT-4 hallucinate. For most real-world, everyday topics, the accuracy is close to 100%. GPT-4 would be utterly useless otherwise.
Such a weird time, when the gap in the performance of GPT 3.5 and 4 is huge, but the time between their releases is so short. Some of the critique that was apt for 3.5 sounds a bit out of touch when it comes to 4.
Less about complexity than about how well-documented the subject is on the internet IMO. I’ve been using it to help me set up and troubleshoot AWS Elastic Kubernetes clusters, which are plenty complex, and I’d estimate it’s been around 95% accurate. (And for at least one of the times it seemed to be wrong, it turned out I’d made a mistake following its instructions...)
You could ask me a difficult scientific question which I wouldn't even understand. But I could google and find a scientific paper which I would pass to you. You could say fantastic answer, thanks. But I would have no clue as to whether it is or is not. Now if I could just do that fast enough to serve all the people all the time, you would call me a sensation. I think this is what's happening with LLMs.
At that time, the potential of neural nets was already very clear.
He also predicted that by 2020 we'll have popular press stories that the era of Deep Learning is over and that by 2021 VCs will figure out that for an investment to pay off there needs to be something more than "X + Deep Learning".
I'll let you in on a secret: There aren't actually any "AI experts". There are machine learning experts, that is, people whose expertise lies in designing and analyzing systems that perform (semi-)automatic inference on data. But nobody can be an expert on "artificial intelligence", because we don't know what that word really means. We don't even know what intelligence really is. We have no idea how the human mind works. We don't understand emergence, at all, which is why we continue to be surprised when it happens.
So it shouldn't come as a shock when eminent figures commonly labeled "AI experts" make predictions that turn out to be fundamentally and embarrassingly wrong in a very short timeframe: They're just talking out of their behinds, like everyone else.
True, but we already know the so-called "neural networks" that many computer scientists believe work like the brain aren't even close. They are all based on a half-a-century-old concept of the neuron that has been debunked many times over, experimentally, by real neuroscientists.
That's correct, but it doesn't mean artificial neural networks cannot achieve intelligence, or even superintelligence. The fact that the human brain doesn't work like that doesn't automatically imply that (structurally) simpler models are fundamentally less capable.
GPT-4 is pretty amazing, but I, too, feel this is being overhyped. For me, a sobering example is how OpenAI's models do math (eg [1]). Specifically, the model clearly doesn't really understand multiplication and "learns" it from training data. This tends to get only the first few and last few digits right for a simple multiplication of 6-7 digit numbers.
Now you can solve that with plugins (eg training the model to recognize math problems and giving it access to a calculator), so it's a solvable problem, but then you realize there's an extremely long tail of such problems. It goes to show that GPT-4 isn't "magic" and we still have a long way to go.
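For illustration, the plugin idea boils down to something like the sketch below: detect an arithmetic sub-problem and hand it to exact integer math instead of letting the model guess digits. (The detection and routing here are a hypothetical toy, not OpenAI's actual plugin machinery.)

```
import re

def maybe_calculate(text):
    # Naive router: if the text contains '<int> * <int>', compute it exactly.
    m = re.search(r"(\d+)\s*[*x]\s*(\d+)", text)
    if m is None:
        return None  # no arithmetic found; fall through to the plain LLM answer
    a, b = int(m.group(1)), int(m.group(2))
    return a * b

print(maybe_calculate("What is 1234567 * 7654321?"))  # 9449772114007 (exact)
```

The long tail is exactly the problem: recognizing which sub-problems to route, and to which tool, is much harder than the routing itself.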
Most of the time when people find a maths problem that they can trick the model into getting wrong, it's also possible to get the model to give the correct answer with better prompting.
A trick that's worth knowing is just to ask the model to give each step in the solution and explain as it goes. This gives the model "time to think" and leads to better results.
Pretty sure you can't get GPT-4 to do 8 digit multiplication with any prompt.
For what it's worth, I'm not even sure if chain of thought provides much value to GPT-4. The RLHF it went through seems to have encouraged more logical thinking already.
Btw I think you're not correct about the 8 digit multiplication. I tried:
please could you multiply 12345678 by 87654321. Give your answer in steps. Each step should calculate one column at a time from right to left remembering to carry any digits which overflowed in a previous step. At the end add a final step adding the values up.
And it gave me the following, which I believe to be correct:
Sure, I'll be doing a long multiplication here, step by step, column by column from right to left. For brevity, I'll write the numbers to be multiplied as:
A: 12345678
B: 87654321
Step 1: Multiply the 1st digit of B by A:
1 * 12345678 = 12345678
Step 2: Multiply the 2nd digit of B by A, remember to shift one position to the left:
2 * 12345678 = 24691356 (shift one position to the left, i.e., append a 0 at the end)
Step 3: Multiply the 3rd digit of B by A, remember to shift two positions to the left:
... skip some of the answer because you get the gist....
7 * 12345678 = 86419746 (shift six positions to the left, i.e., append six 0s at the end)
Step 8: Multiply the 8th digit of B by A, remember to shift seven positions to the left:
8 * 12345678 = 98765424 (shift seven positions to the left, i.e., append seven 0s at the end)
Now, the last step is to add all these results together:
12345678
246913560
3703703400
49382712000
617283900000
7407406800000
86419746000000
987654240000000
= 1082152022374638
So, the result of 12345678 multiplied by 87654321 is 1082152022374638.
You might be getting lucky because that sequence appears in the training data (it's not very random). I repeated this experiment with Bing and verified it could reproduce your solution.
But it failed on more random input:
please could you multiply 92631924 by 41920318. Give your answer in steps. Each step should calculate one column at a time from right to left remembering to carry any digits which overflowed in a previous step. At the end add a final step adding the values up.
Got 3880341653910192 with intermediate steps being broken.
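(For what it's worth, exact integer arithmetic makes checking both transcripts trivial:)

```
# Checking both claimed products against exact integer arithmetic.
print(12345678 * 87654321 == 1082152022374638)   # True  -- the first transcript was right
print(92631924 * 41920318 == 3880341653910192)   # False -- the second one was not
```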
The deeplearning.com prompt engineering course contains an example where GPT4 solves a problem with chain of thought and fails without. The prompt is
Determine if the student's solution is correct or not.
Question:
I'm building a solar power installation and I need
help working out the financials.
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost
me a flat $100k per year, and an additional $10 / square
foot
What is the total cost for the first year of operations
as a function of the number of square feet.
Student's Solution:
Let x be the size of the installation in square feet.
Costs:
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
GPT-4 will say the student is correct even though the answer includes an error (the maintenance term should be 10x, not 100x; a quick check of the corrected sum follows the prompt below). If you change the prompt to include chain of thought as follows, it notices the error:
Your task is to determine if the student's solution
is correct or not.
To solve the problem do the following:
- First, work out your own solution to the problem.
- Then compare your solution to the student's solution \
and evaluate if the student's solution is correct or not.
Don't decide if the student's solution is correct until
you have done the problem yourself.
Use the following format:
Question:
```
question here
```
Student's solution:
```
student's solution here
```
Actual solution:
```
steps to work out the solution and your solution here
```
Is the student's solution the same as actual solution \
just calculated:
```
yes or no
```
Student grade:
```
correct or incorrect
```
Question:
```
I'm building a solar power installation and I need help \
working out the financials.
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost \
me a flat $100k per year, and an additional $10 / square \
foot
What is the total cost for the first year of operations \
as a function of the number of square feet.
```
Student's solution:
```
Let x be the size of the installation in square feet.
Costs:
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
```
Actual solution:
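(For reference, redoing the student's sum with the $10/sq ft maintenance figure gives 360x + 100,000, not 450x + 100,000; a quick check:)

```
# Correct first-year cost: land 100x + panels 250x + maintenance (100,000 + 10x) = 360x + 100,000
def first_year_cost(x):
    return 100 * x + 250 * x + 100_000 + 10 * x

print(first_year_cost(1_000))   # 460000
print(450 * 1_000 + 100_000)    # 550000 -- the student's (incorrect) formula
```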
You can. One thing that I've found fun is to prompt it for some maths problems without solutions, then I provide solutions and for any that I get wrong ask it to explain my mistakes.
Large language models and the transformer architecture are just the ALGOL 60 [1] of the search for general synthetic competence (intelligence is too finicky; competence is being given a task and fulfilling it successfully, in time, on budget, optimizing along the way). ChatGPT + Wolfram or other plugins are great for making transformers mathy, but the plugins must also be end-to-end machine learning architectures. To continue the high-level programming language analogy, we will probably see the C-level neural architecture in a few years, once we get more embedded data (competence obtained through correlating physical objects with concepts) with feedback loops forcing algorithms to operate under energy restrictions (unlike the current architectures, which are more of a proof of concept and careless about their energy use).
Unfortunately, even though the questions are about GPT-4, the answers and personal experience only refer to GPT-3.5 at most. I hope OpenAI changes the name of the next version to avoid this confusing narrative from prominent people. 3.5 vs 4 is like comparing a toddler to a high school kid.
Jaron Lanier's 2010 book "You are not a gadget" basically foreshadows the hype around ChatGPT and how we (technologists) want to make people obsolete so that computers seem more advanced. He argues that we adjust ourselves and reduce our expectations in the Turing test.
"And I think what they say, interestingly, is how much of our language is very much rote."
I've been saying that for a while. Large language model systems have made it painfully clear that much of what humans thought was intelligent behavior is rather banal.
The scary thing is that a sizable fraction of white-collar work is banal enough to be done by such systems.
> it answers with such confidence any question I ask. It gives an answer with complete confidence, and I sort of believe it. And half the time, it’s completely wrong.
That's it. LLMs don't have SHAME! They simply don't care if what they're saying is false or not. They are like some politicians of late.
They don't even understand that giving out a wrong or misleading answer will affect their credibility. You see LLMs don't have a DESIRE for credibility. They don't have desires.
We need not fear these models. But we do need to fear some people who will use them for evil purposes.
I get why AI people are at pains to say that GPT-* isn’t AI, and that agi is still a long, difficult way off, I do understand that the distinction is important. But ChatGPT has become such a useful tool to help explain a concept or to filter thoughts through I don’t really care if it’s proper AI or just playing pretend.
Google search gives me pretty useless results these days, forums are slow and inconsistent to respond. ChatGPT is fast, easy to use, and sometimes incredibly wrong. I can live with that, I’m not using it to drive my car.
This is exactly my thoughts and feelings after more than 20 years of Googling, except that Google is still largely useful as it is. Sooner or later, however, most people will use ChatGPT or similar services as a better replacement for Google search, or Google on steroids. Before the advent of ChatGPT, researchers especially had been clamoring for a better Google search with more context and more intuitive, relevant feedback.
With the introduction of new ChatGPT (Plus) features, for example online web search and plug-ins, ChatGPT is becoming a very powerful and viable alternative to Google search.
Absent the goals constantly shifting, GPT-3 can be viewed as one, GPT-4 even more so. You can ask it questions about almost anything (at a broad level) and get an answer. That's what makes it general and "intelligent"
Isn't intelligence a matter of perspective? Most people who are critical of GPT-4 wonder if it ever produces anything novel, since it's been trained on existing text created by humans, so it's replicating those patterns in its output. Yes, it has its general-purpose use as a tool, but it has its limits. Just the other day, there was an article posted on HN about how LLMs can't handle negation and tend to fall apart.
You can ask the original ELIZA anything and get an answer but that doesn't make it "intelligent" according to any generally accepted definition of AGI.
Regardless of how you define it exactly, AGI means essentially the ability to converse relatively intelligently on any topic. That means an app that tends to spout gibberish when asked a question most humans should be able to answer is, well, not an AGI.
> Brooks: No, because it doesn’t have any underlying model of the world.
I don't know whether to be more disappointed with the famous technologists who are apparently unable to think of questions to ask GPT-4 that require a world model to answer, or with the writers who don't question them about it.
I already calmed down because it’s quite obvious that OpenAI is engaging in textbook, bait-and-switch startup tactics. GPT-4 performance has noticeably taken a nosedive since its initial release and most recently degraded further in advance of the iOS app release.
Generating worse code in half the time is a service degradation, IMO. I mostly use it for monotonous scripts and one-off functions, where saying "write this thing" while I click off the tab and do something else for a minute is not in any way a problem. I could tolerate it being 2-4X as slow, because now I have to spend an equivalent amount of time correcting errors it didn't make a month ago.
The worst is when it changes class names or hallucinates property names. I wasn't sure if it was just my expectations or if it has indeed gotten worse, so it's good to see other people having the same experience.
I have a theory that they made it worse on purpose, not to save money but to really train its reasoning and arguing skills, because I spend so much time 'fighting with a computer' now.
> But anyone wringing their hands over how much is GPT thinking is missing the point.
You seem to be missing the point.
> These debates about how well GPT can think seem merely philosophical.
Merely? Yeah, you're missing the point. You want a debate or you think the debate is meaningless? You don't get to appreciate it and call it pointless and sound reasonable at the same time.
> The ship has sailed. The revolution has started. Progress will undoubtedly be rapid and continual.
It started 2 million years ago when humans started roaming the planet. We're clearly a runaway process. We don't need a chat bot to prove it.
apropos the roomba founder, a nontechnical argument for necessity of embodied AI circa 1980 (5 min video)
https://youtu.be/QMMw9fQ452c?t=49
That LLMs learn a world model is very convincing now, but as LeCun has said, it's just one piece of the intelligence puzzle, alongside perceiving, actuating, and so on.
Feels like there have been a lot of previous technologies where the last 10% was far more complex than the first 90%. Self driving cars were pretty much solved a decade ago, and yet we still aren't there yet. VR was pretty much working and ready to change the world a decade ago, and we still aren't there yet.
So it's hard to tell if this is an iphone moment where it just rockets off in to space and changes the world. Or if it's something that will always be "not quite there yet"
Maybe, but the rest of the arrow is not more chat gpt but other AI things to come. The problem I currently see is the hype, we’re acting as if we’re already there, we’ve nearly achieved AGI with LLMs, we just need to ramp up production more and miraculously AGI will pop into existence
> No, because it doesn’t have any underlying model of the world.
Ilya's counter to this reasoning is that for next-word prediction to work, the model has to 'understand' our world; otherwise the predictions will be way off. Therefore the human world has been modelled to a degree by GPT.
Not really the thrust of the article, but that '50s picture of the family playing scrabble in a self-driving car, surrounded by text about trains... really makes me think that if I were that family, I'd still prefer to be playing scrabble on a train, rather than in a cramped self-driving car on a highway.
I think you make a good point; benchmarks and metrics are indeed a better proxy for performance. Seems worth pointing out that, while it may be "nowhere near half in [your] experience", I don't just take your word for it either. :-)
The trouble in my view is that the only way to know that the answers you're getting are accurate and not misleading is to study up on the answers elsewhere - which is a great habit to nurture, but is also precisely why these tools tend toward uselessness in their "general AI" bids. If I can't know how the answer was built, or how good that answer is, there's no point asking it - I'll just do my own reading and apply appropriate discernment as I go.
To be fair, hardly anyone does this today, nor did they before LLM-based chat bots... So it's a moot point, because society is largely doomed anyway. But a moot point can still be a valid one.
I also think the author makes a good point that we frequently confuse performance for competence. "It does a really good job at <X>!... or at least does a damn fine job of mimicking someone who acts like they do a really good job at <X>!"
By way of analogy, consider Elon Musk - by all appearances, he's a genius and is saving humanity - but by dint of his narcissism and largely smooth-brained approach to... well... everything... he's running all of us into an earlier planet-size grave than is necessary. His performance is fantastic, his competence is nonexistent.
> If I can't know how the answer was built, or how good that answer is, there's no point asking it
In many cases, like programming for example, you can know how good the answer is - either by reading it (verifying an idea is different from coming up with it) or by testing/running code.
How the answer was built seems completely irrelevant to me, I don’t get how a useful answer produced by method x is different from a useful answer produced by method y.
GPT is a great tool, but it won't be able to do complex tasks because it really isn't that smart. If you tell it to do something relatively complex end to end, it will fail unless a plugin specifically supports it.
At least run the model on the examples considered... Here's GPT3.5
> Can you eat a Frisbee?
No, you cannot eat a Frisbee. A Frisbee is typically made of plastic, often polypropylene or similar materials, which are not meant for consumption. These materials are not digestible and can pose a choking hazard or harm your digestive system if ingested. It's important to only consume food and items that are safe and intended for human consumption.
> Can you play Frisbee in a snowstorm?
While it is technically possible to play Frisbee in a snowstorm, there are several factors to consider before doing so. Playing Frisbee in a snowstorm can present various challenges and risks. Here are a few things to keep in mind:
> How far can a person throw a Frisbee?
The distance a person can throw a Frisbee can vary widely depending on individual factors such as strength, technique, and experience. Additionally, different types of Frisbees can have different flight characteristics and distances.
On average, an untrained person might be able to throw a standard Frisbee between 50 to 75 feet (15 to 23 meters) with a basic backhand throw. However, with practice, proper technique, and favorable conditions, experienced throwers can achieve much greater distances.
In professional disc sports such as disc golf and ultimate Frisbee, skilled players can throw a disc over 400 feet (122 meters) or more. World records for the longest Frisbee throw exceed 800 feet (240 meters), but those exceptional throws require specialized technique and optimal conditions.
> Can they throw it 10 miles?
No, it is highly unlikely that a person can throw a Frisbee 10 miles (approximately 16 kilometers). The aerodynamic properties of a Frisbee and the limitations of human strength and technique make such a throw virtually impossible.
Even the most skilled and experienced Frisbee throwers are unable to achieve throws of that magnitude. The current world record for the longest Frisbee throw is around 1,333 feet (406 meters), which is significantly shorter than 10 miles.
Throwing a Frisbee over such a vast distance would require extraordinary strength, near-perfect throwing conditions, and a Frisbee specifically designed for long-distance throws, which is beyond the capabilities of conventional Frisbees.
> Can they only throw it 10 centimeters?
Yes, it is possible for a person to throw a Frisbee a distance of 10 centimeters or even shorter. In fact, many Frisbee throwing techniques, such as short-range tosses or gentle underhand throws, are specifically designed for short distances and precise control.
Throwing a Frisbee only 10 centimeters might involve a soft lob or a gentle flip of the wrist, making it suitable for short-range accuracy or playing catch at close proximity. These types of throws are commonly used in various games, such as indoor Frisbee games or when playing with young children or pets in a limited space.
So, while Frisbees are capable of traveling much greater distances, throwing them short distances like 10 centimeters is definitely possible and commonly practiced.
Funny. I asked GPT-4 what frisbees are most commonly made of, and it said polyethylene. I then asked if it was sure, and it corrected itself and said it was polypropylene. I asked again if it was sure, and it corrected itself saying it was low-density polyethylene. I kept this going for a while and it kept changing its answer.
This does seem to agree with the author. The first answer was very convincing, but not what it should have been.
That doesn't make me more confident of its capacity to answer questions correctly though. If it keeps changing its answer every time I ask for a source, I have no reason to believe it will always get it right on the first try.
The claim wasn’t that chat gpt doesn’t know what a frisbee is. The claim was that recognizing frisbees is different from a fundamental understanding of frisbees. So it was an example rather than a specific claim about chat gpt.
One of my goals in life before dementia sets in for real is to devise some model, perhaps a conceptual framework which will allow us to escape the clutches of habitual simplification, a subset of which is dichotomous thinking, which in turn leads to the inevitable painting of strawmen as a way to prove our point (among other things). How sweet it would be to shortcut all the mandatory twists and turns of discourse that follows: "this is a mischaracterization of x", "not all x are y", "x and z are really not opposites, but overlapping", "this is a spectrum with a bell curve, not an either/or" etc. But of course, we all do this, not because we can't think clearly, but because we have an agenda, or maybe more frequently: want to trash talk the stance of an opponent because of what the proliferation of that stance might lead to, and so on. Taking such into account should be an integral part of the conceptual framework, obviously.
In the case at hand, one could easily argue that people in the debate are creating false dichotomies: LLMs are either stochastic parrots OR algorithms with understanding, when in reality they are both (and also something else entirely). But acknowledging that would likely require not having an axe to grind, a stake in the field, or what you might call it. It would require extending some "philosophers' charity" to an opponent who has maybe tried to undercut one's work for decades, in a field steeped in fierce and bitter competition for a name, like academia. Or, in case one has a business in the field, it would require maybe saying something that puts your core business idea in the crosshairs of legislators, or something else that doesn't serve your long-term business interests.
Which brings us to an important aspect of this "conceptual framework against simplification" already briefly touched upon, namely identifying the bias of the participants in the debate. My impression is that naming bias has largely gone out of fashion, which is a pity, because while bias rarely explains everything (that would be a grave simplification), it is a vital part of understanding an argument. And a difficult one, because people will go to extreme lengths to hide their agenda. And the current conceptual framework for unravelling bias has largely been occupied by the fact-checking industry: i.e. things are either true or false, and once you are cleared (like most mainstream media), bias is not questioned. But we can be assured there is always some bias, and it is usually relevant to name it (if one can see it), even if it infuriates the named party.
The limitations of language make adversarial debate a powerful tool for advancing our understanding. It might be construed as unfortunate, but it brought us this far.
Maybe when we learn higher dimensional ways of communicating we can get better tools for constructing common knowledge.
> adversarial debate (...) might be construed as unfortunate, but it brought us this far.
I believe our understanding will go, and would have gone, further with less adversity and more "philosophers' charity". I'm not sure we need any "higher dimensional ways of communicating" (whatever that is).
But the pervasive adversity in our society and time puts a natural limit on how un-adversarial our debates can be: you can't expect Ukrainian defenders to extend much "philosophers' charity" to Putin.
The more conflict laden a topic, the less truth (multifaceted analysis) one can expect, and (if we aspire to a balanced viewpoint) that's why we need to keep an eye on bias, not just in war where "the first victim is the truth", but always.
Brooks has done something I appreciate a lot: he turned a phrase.
> stop confusing performance with competence
You can safely skip the rest of the article. That sentence gives you all you need, because you are competent.
If you want a little more meat:
> The example I used at the time was, I think it was a Google program labeling an image of people playing Frisbee in the park. And if a person says, “Oh, that’s a person playing Frisbee in the park,” you would assume you could ask him a question, like, “Can you eat a Frisbee?” And they would know, of course not; it’s made of plastic. You’d just expect they’d have that competence. That they would know the answer to the question, “Can you play Frisbee in a snowstorm? Or, how far can a person throw a Frisbee? Can they throw it 10 miles? Can they only throw it 10 centimeters?” You’d expect all that competence from that one piece of performance: a person saying, “That’s a picture of people playing Frisbee in the park.”
---
So I've calmed down. Now what? The problem isn't only that this train is flying off on a tangent: it's that it's off the rails. What rails should it be on?
The problem, as I see it, is narrative. As soon as we called it "AI", that wrote the Genesis of the Scripture of the cult. In this new religious movement, God is spelled L-L-M. Back here in reality, LLM isn't a God; or even a person at all.
That's the mistake: personification. A person can perform, but a performance can't person.
---
Narrative is a powerful tool. It's why we're so excited about Natural Language Processing in the first place. Ever since the very origins of software, the power of narrative has been so close, but always still just out of grasp. Do we even know what we are reaching for in the first place?
In a sense, we have a part of it: explicit definition. What Chomsky categorized as "Context-Free Grammar", we have made into programming languages. What they are missing is implicit inference: context.
That's what LLMs do. They use inference to model the patterns that exist in written text. With that model, they can hallucinate more text that follows the same patterns: they can perform natural language.
So that's it, right? Problem solved! What's missing? Explicit definition. We traded one problem for another. No one (so far) has figured out how to solve both in the same program. You can have definition, or you can have inference. You can't have both.
This doesn't make any sense to us humans. We don't have any trouble at all doing both at the same time. We do it all the time! Do we actually do anything else? Unfortunately, LLMs are not humans.
---
The two approaches to language are diametrically opposed, but they work with the same domain. Approaching from either end of the spectrum, definition and inference explore together the wild universe that is story. That's the missing piece: once we figure out what story is made of, we should be able to put all three pieces together.
Amazingly wrong, and proven wrong. This guy didn't get the memo where GPT passed the medical and law exams and is now coding. Coding is not "what an answer should look like"; it is the answer. When will humanity stop listening to voices peripheral to AI? Let's focus on what Ilya or Hinton say.