
https://i.imgur.com/feEiiZA.png

I tried to stack all of these objects myself and couldn't really. I think GPT-4's approach is actually really good. It correctly points out that the gummy worms make a flexible base for the DSLR (otherwise the protruding buttons/viewfinder make it wobbly on the hard book), and the light bulbs are able to nestle into the front of the lens. If they were smaller light bulbs I could probably use the four of them as a small base on top of the lens to host the succulent.



You might also use the light bulbs as the base (especially if they're in a box). They're pretty sturdy and can hold a book.


The point is that ChatGPT undeniably built a world model good enough to understand the physical, three-dimensional properties of these items pretty well, and it gives me a somewhat workable way to stack them, despite never having seen this exact scenario in its training data.


You cannot conclude that from the output - the training data will likely contain a lot of examples of stacking things. Everyday objects also tend to have stacking properties that make these questions easy to answer even with semi-random answers.

Plus, some of it clearly makes no sense or gets ignored (like the gummy worms in the center, or forgetting about the succulent in some cases).

If you want to test world modeling, give it objects it will never have encountered, describe them, and then ask it to stack them, and so on. For example, a bunch of 7-dimensional objects that can only be stacked a certain way.
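To make that concrete, here is a rough sketch in ordinary 3-D terms rather than 7 dimensions (the object names, attributes, and constraint checker below are made up purely for illustration):

    import random

    # Sketch of the proposed test: invent objects with made-up names and
    # randomized attributes, then ask the model for a stacking order that
    # satisfies explicit constraints. The exact instance cannot appear
    # verbatim in any training set.

    def make_object(name, rng):
        return {
            "name": name,
            "width_cm": rng.randint(5, 60),
            "weight_kg": round(rng.uniform(0.1, 10.0), 1),
            "max_load_kg": round(rng.uniform(0.0, 20.0), 1),
            "top_is_flat": rng.random() < 0.7,
        }

    def valid_stack(order):
        # Bottom to top: nothing may sit on a rounded top, and each object
        # must bear the total weight stacked above it.
        for i, obj in enumerate(order):
            above = order[i + 1:]
            if above and not obj["top_is_flat"]:
                return False
            if sum(o["weight_kg"] for o in above) > obj["max_load_kg"]:
                return False
        return True

    def make_prompt(objects):
        lines = ["Stack these invented objects into a single stable tower.",
                 "List them bottom to top and justify each placement."]
        for o in objects:
            lines.append(
                f"- {o['name']}: {o['width_cm']} cm wide, {o['weight_kg']} kg, "
                f"holds up to {o['max_load_kg']} kg, "
                f"{'flat' if o['top_is_flat'] else 'rounded'} top")
        return "\n".join(lines)

    rng = random.Random(7)
    objects = [make_object(n, rng) for n in ("florb", "quintel", "drazic", "mubo")]
    print(make_prompt(objects))
    # The model's reply can then be parsed into an ordering and checked
    # mechanically with valid_stack(), instead of relying on human
    # interpretation of free-form text.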


> If you want to test world modeling, give it objects it will never have encountered, describe them, and then ask it to stack them, and so on. For example, a bunch of 7-dimensional objects that can only be stacked a certain way.

And when it does that perfectly, I assume you'll say that was also in the training data? All examples I've seen or tried point to LLMs being able to do some kind of reasoning that is completely dynamic, even when presented with the most outlandish cases.


All examples I tried myself show it failing miserably at reasoning.

It certainly needs better evidence than coming up with one of the many possible ways of stacking things - aided by human interpretation on top of the text output. Happy to look at other suggestions for test problems.


Well, for me personally, the proof is in giving it a few sentences on how it should write the fairly complicated, unique pieces of code I need on a daily basis, and seeing it correctly infer things I forgot to specify in ways that would be borderline impossible for anything but another human. If that's not reasoning, I don't know what is.

The other one that convinced me was this list: https://i.imgur.com/CQlbaDN.png I think the LeetCode tests are quite indicative, going as far as saying that GPT-4 scores 77% on basic reasoning, 26% on complex reasoning, and 6% on extremely complex reasoning.

Maybe the reasoning is all "baked in" as it were, like in a hypothetical machine doing string matching of questions and answers with a database containing an answer to every possible question. But in the end, correctly using those baked in thought processes may be good enough for it to be completely indistinguishable from the real thing, if the real thing even exists and we aren't stochastic resamplers ourselves.

> aided by human interpretation on top of the text output

That's an interesting point, actually. I've been trying something along those lines recently: having it use an API to do actual things in a simulated environment, and it seems very promising despite the model not being tuned for it. But given that AutoGPT and plugin usage are a thing, that should be all the evidence you need on that front.

Google also did this with their old PaLM model, which is vastly inferior even to GPT-3.5: https://www.youtube.com/watch?v=j6O_uePUKKI
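To sketch the kind of setup I mean (call_model() and SimulatedRoom are just placeholders for illustration, not the actual model API or environment I'm using):

    import json

    # Minimal tool-use loop: the model emits one JSON "action" per turn, a
    # simulated environment executes it and returns an observation, and the
    # transcript grows until the model declares it is done.

    class SimulatedRoom:
        def __init__(self):
            self.stack = []

        def execute(self, action):
            if action.get("op") == "place":
                self.stack.append(action["object"])
                return {"ok": True, "stack": list(self.stack)}
            return {"ok": False, "error": "unknown op"}

    def call_model(transcript):
        # Placeholder for an actual LLM call; should return the next action
        # as a JSON string, e.g. '{"op": "place", "object": "book"}'.
        raise NotImplementedError

    def run_episode(goal, max_steps=10):
        env = SimulatedRoom()
        transcript = [f"Goal: {goal}. Respond with one JSON action per turn."]
        for _ in range(max_steps):
            reply = call_model("\n".join(transcript))
            action = json.loads(reply)
            if action.get("op") == "done":
                break
            observation = env.execute(action)
            transcript.append(f"Action: {reply}")
            transcript.append(f"Observation: {json.dumps(observation)}")
        return env.stack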


Coding isn't a use case of mine. For things like financial derivatives replication, for example, it can explain the abstract concept but it cannot apply it in a meaningful way.


> For example, a bunch of 7-dimensional objects that can only be stacked a certain way.

That's a ridiculous example.


Why? You need to make sure that a solution requires true understanding and isn't in the training set. If it can reason properly, it shouldn't have trouble with such a problem.


How well do humans reason about 7-dimensional objects?

I'm already impressed if a computer can reason flexibly about 3-dimensional objects.


Humans with the right mathematical tooling do OK.


What percentage of humans have that mathematical tooling?

The fact that people are even raising these sorts of obscure tests shows just how far AI has advanced.


Please tell me how you would pose a question about a bunch of seven-dimensional objects that can only be stacked in a certain way.



