
No world model?

A world model is so obvious that papers like these are more confirmation than surprise.

https://arxiv.org/abs/2305.11169

https://arxiv.org/abs/2210.13382

There's a certain sentiment that AGI, however you wish to define it, won't in fact be a "we'll know it when we see it" situation, but rather one where AGI arrives long before consensus agrees it's AGI. LLMs have made me believe this will 100% be the case, either way.

It's one thing to argue over things we can't evaluate even now, but man, the 100th "They can't reason!" every week is pretty funny when you can basically take your pick of reasoning type - algorithmic, causal, inferential, analogical - and read a paper showing strong performance.

https://arxiv.org/abs/2212.09196

https://arxiv.org/abs/2305.00050

https://arxiv.org/abs/2204.02329

https://arxiv.org/abs/2211.09066

People refuse to see even what is staring right at them.



This paper is startling in terms of a world model and how it seems to emerge, in a specific example, in the step from GPT-3.5 to GPT-4. https://arxiv.org/pdf/2303.12712.pdf

One of the prompts is "Here we have a book, 9 eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner."

GPT-3.5 is hopeless. It says start from the bottle, balance the nail on top of it, then the eggs (somehow), then the laptop, then the book.

GPT 4 says put the book first, then the eggs on top of the book in a 3x3 grid, then the laptop, then the nail.


It's interesting that GPT-4's current answer is different, but what you described is almost precisely what Google Bard gives as a response. Clearly the solution (book -> eggs 3x3 -> laptop, etc.) is coming from the Internet.

However, any deviation from the original task ruins the LLM's answer. Try 9 cabbages instead of eggs and see how ridiculous and out of touch with reality the responses from both GPT-4 and Bard are.


GPT-4: Place the book down first. Position the laptop flat on the book. Stack the cabbages in a pyramid on laptop. Balance the bottle on top. Insert the nail into the bottle cork.


> GPT 4 says put the book first

I assume it read the correct answer somewhere. Perhaps in a post like yours.

What does it answer if you say there is one egg, or two eggs? Or 1.5 eggs?


It's interesting. The answer shows some reasoning about weight and pressure but not curvature. In response to

    Here we have a book, an egg, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner.
GPT-4 gives me:

    Arranging objects in a stable stack depends on their size, shape, and weight. In general, you want to place the largest, heaviest and most stable objects at the bottom and then move to the smallest, lightest, and least stable at the top. Here's one way to stack the objects you listed:

        Laptop: Start with the laptop at the bottom. It's likely the largest object and it has a flat, stable surface.

        Book: Next, put the book on top of the laptop. Books are generally flat and heavy, which makes them a good choice for the base of your stack.

        Bottle: The bottle goes next, as it should have a flat bottom to sit securely on the book. Just make sure it's closed tight and won't roll away!

        Egg: The egg is delicate, so it should be towards the top to avoid pressure. However, it's not flat, so it should not be at the very top.

        Nail: Finally, the nail goes on top. It's the smallest and lightest, so it won't add much pressure to the egg.

    Remember that this is just one possible way to stack these items and it could depend on the specific characteristics of the items you have. For example, if the bottle is much larger and heavier than the book or laptop, you'd need to adjust accordingly. Safety should always be your first priority when stacking items, especially if they are delicate or valuable.
The "make sure it's closed tight and won't roll away" comment makes no sense obviously. Most people would place the bottle standing on its end so neither of those is a concern. The response also doesn't show an understanding of the fact that the nail won't sit on top of the egg although it's interestingly concerned with pressure breaking the egg.


The Devil's advocate wants to have a word with you.

> The "make sure it's closed tight and won't roll away" comment makes no sense

As noted at the end of GPT-4's answer, "Safety should always be your first priority." What happens if your stacking experiment fails and the bottle falls? Any content would spill out, unless the bottle is closed tight. If you are doing this on a table, the bottle could also roll off the edge, fall to the floor and shatter.

> Most people would place the bottle standing on its end so neither of those is a concern.

GPT-4 doesn't know if you are like most people (maybe you're 5, or in the bottom IQ decile), it doesn't know what's in your bottle, and it doesn't know how robust it is. Better safe than sorry.

> the nail won't sit on top of the egg

I'm pretty sure I could balance a nail on an egg. The question also didn't preclude using stabilizing aids like adhesive tape or glue.


I find it interesting that Turing foresaw all this with the Turing test (and Searle, later, with the Chinese room). If computers can emulate human language to a degree indiscernible from what other humans can do, it is essentially reasoning, for some definitions of reasoning. But how much of it reflects our own understanding of reasoning and our efforts in fine-tuning? If I show a modern IMAX movie to a Stone Age hunter and they mistake it for reality, is the movie reality? People ran from footage of a steam locomotive driving towards the camera. Does that mean the train was real?


What the skeptics seem to get hung up on is the lack of common sense. These LLMs are trained on language describing the world, not on the world itself. They lack the lived experience that we take for granted. So it is trivial to ask them common-sense questions and see them fail amusingly, but that is unfair, because they haven't been taught the world, only a linguistic map of the world, and the map is not the territory. Also, we don't even know what map they learned, because the only way to know is to ask a million questions, like an early seafarer discovering the shape of a foreign continent beach by beach.

LLMs are not people, and they lack common sense, but they understand and can reason about what they are trained on. That is exceedingly powerful and very useful even at today's level of ability, so products built on top of this technology are going to transform everything. The trick is boxing it in and only making it do things it can do, so the art of LLM product development will have to become a whole subfield of software engineering, until LLMs develop to the point where their map of the world is close enough to the world itself.
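
As a rough illustration of the "boxing it in" idea, here is a minimal sketch of one common pattern: force the model into a narrow, machine-checkable output format, validate before acting, and fall back when it strays. The call_llm function and the triage schema are hypothetical stand-ins, not any particular vendor's API.

    import json

    # Hypothetical stand-in for whatever model API the product actually uses.
    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model call here")

    ALLOWED_ACTIONS = {"refund", "escalate", "reply"}

    PROMPT_TEMPLATE = (
        "You are a support triage assistant. Respond ONLY with JSON of the form "
        '{{"action": one of {actions}, "reason": "short string"}}.\n\nTicket:\n{ticket}'
    )

    def triage(ticket: str, max_retries: int = 2) -> dict:
        """Box the model into a narrow, validated task; never trust free-form output."""
        prompt = PROMPT_TEMPLATE.format(actions=sorted(ALLOWED_ACTIONS), ticket=ticket)
        for _ in range(max_retries + 1):
            raw = call_llm(prompt)
            try:
                parsed = json.loads(raw)
            except json.JSONDecodeError:
                continue  # malformed output: ask again
            if (isinstance(parsed, dict)
                    and parsed.get("action") in ALLOWED_ACTIONS
                    and isinstance(parsed.get("reason"), str)):
                return parsed  # validated: safe to hand to downstream code
        return {"action": "escalate", "reason": "model output failed validation"}

The point isn't this particular schema; it's that the model only ever gets to pick from actions the product already knows how to handle.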


Do you know what a "world model" is? It's a thing people were assumed to have in 1970s psychology, but it was never well-defined enough to tell whether it exists or not, so I don't think it's obvious anything else has one.

https://twitter.com/Meaningness/status/1639120720088408065

I think "common sense" or "long term memory" might be more productive things to say.


The brain does a lot of hierarchical prediction with sense data. When new information comes in, it references "something" and adjusts the sense data likewise.

That's why when you shift your eyes quickly, you see blurred images pass by. In reality, you should be seeing complete black because the brain doesn't actually process visual information that shifts so quickly.

But your brain "knows" it should see... well, something. And so it fills in that blurred pass-through as compensation. Completely made-up data, but not ungrounded data: data that seems like it should fit according to that "something". That "something" being referenced is a world model.

That's a sub-conscious example but you can think of a lot of times people consciously reference "something" to better take actions or make decisions.

For Othello-GPT, training seemed to induce a learned board state that the model referenced at every inference step to make moves.

It seems that, fully accurate or not, deep prediction might require "something" to reference. Otherwise you might need to derive everything from first principles at every turn. Not very efficient.
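
For the Othello-GPT result, the rough shape of the evidence is a probe: take the model's hidden activations at each move and train a small classifier to read the board state back out. Here is a minimal PyTorch sketch of the idea (not the paper's actual code), assuming you already have activations and ground-truth board labels as tensors:

    import torch
    import torch.nn as nn

    # Hypothetical inputs: activations from one layer of the sequence model, and
    # the true board at each position (64 squares, each empty / black / white).
    hidden_states = torch.randn(10_000, 512)          # (num_positions, d_model)
    board_labels = torch.randint(0, 3, (10_000, 64))  # (num_positions, 64)

    # A linear probe: one tiny classifier per square, trained on frozen activations.
    probe = nn.Linear(512, 64 * 3)
    optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(1_000):
        logits = probe(hidden_states).view(-1, 64, 3)   # per-square class scores
        loss = loss_fn(logits.reshape(-1, 3), board_labels.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # If held-out probe accuracy is far above chance, the activations encode the
    # board state, i.e. a small "world model" of the game.

The paper also goes a step further than probing: intervening on that recovered board representation changes which moves the model predicts, which is harder to explain away as surface statistics.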


It always seemed sensible to me that we are looking at a dreamlike construction all the time: not just holes filled in, but literally everything you see is made up, though grounded in sensory data (when not psychotic or dreaming). This also nicely explains the clarity and realism of lucid dreaming, because the brain is just doing whatever it does during the day, but without (external) grounding sensory data.


> but without (external) grounding sensory data.

Or perhaps with external data that is normally suppressed by data that comes through the sensory organs (e.g. other parts of the EM spectrum).


> That "something" being referenced is a world model.

It doesn't need to be a whole understanding of the nearby world to build an image. Basically it's an event camera (https://en.wikipedia.org/wiki/Event_camera) and just needs to integrate recent visual "samples".

But if you do need to understand objects near you in detail (say, to move them), you don't do it just by looking. You investigate them by moving closer to them, picking them up, etc. If you need to "create a world model", then it's an activity rather than something your senses just do.
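
To make the event-camera comparison concrete, here is a minimal sketch of "just integrate recent samples": accumulate per-pixel brightness-change events over a short time window into a frame. The event stream here is made-up random data standing in for a real sensor feed.

    import numpy as np

    HEIGHT, WIDTH = 480, 640

    # Hypothetical event stream: columns are (x, y, timestamp_seconds, polarity),
    # i.e. the per-pixel brightness-change events an event camera would emit.
    rng = np.random.default_rng(0)
    n = 100_000
    events = np.stack([
        rng.integers(0, WIDTH, n),    # x
        rng.integers(0, HEIGHT, n),   # y
        rng.uniform(0.0, 1.0, n),     # t
        rng.choice([-1.0, 1.0], n),   # polarity
    ], axis=1)

    def integrate(events: np.ndarray, t_now: float, window: float = 0.03) -> np.ndarray:
        """Sum recent events into a frame: no scene understanding, just accumulation."""
        recent = events[(events[:, 2] > t_now - window) & (events[:, 2] <= t_now)]
        frame = np.zeros((HEIGHT, WIDTH), dtype=np.float32)
        np.add.at(frame, (recent[:, 1].astype(int), recent[:, 0].astype(int)), recent[:, 3])
        return frame

    frame = integrate(events, t_now=1.0)

No model of objects or space is involved; the "image" is just recent samples summed per pixel.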


To do a great deal of harm, GPT-X just has to have enough of a world model to be useful for automating tasks like deception and impersonation at a scale humanity has never before experienced.



