> I'm not sure if you read the entirety of my comment?
I did, and I tried my best to avoid imposing preconceived notions while reading. You seem to be equating "being able to predict the next symbol in a sequence" with "possessing a deep causal understanding of the real-world processes that generated that sequence", and if that's an inaccurate way to characterize your beliefs I welcome that feedback.
Before you judge my lack of faith too harshly, I am a fan of LLMs, and I find it super-interesting that this kind of anthropomorphism shows up even among technical people who understand the mechanics of how LLMs work. I just don't know that it bodes well for how this boom ends.
> You seem to be equating "being able to predict the next symbol in a sequence" with "possessing a deep causal understanding of the real-world processes that generated that sequence"
More or less, but to be more specific I would say that increasingly accurately predicting the next symbols in a massive set of diverse sequences, which explain a huge diversity of real world events described in sequential order, requires increasingly accurate models of the underlying processes of said events. When constrained by that diversity of data and a limited model size, the model must eventually become something of a general world model.
I am not understanding why you would see that as anthropomorphism- I see it as quite the opposite. I would expect something non-human that can accurately predict outcomes across a huge diversity of real-world situations, based purely on some type of model that spontaneously develops through optimization, to do so in an extremely alien and non-human way that is likely incomprehensible in structure to us. Having an extremely alien but accurate way of predictively modeling events that is not subject to human limitations and biases would be, I think, incredibly useful for escaping the limitations of human thought processes, even if it replaces them with different ones.
I am using modeling/predicting accurately in a way synonymous with understanding, but I could see people objecting to the word 'understanding' as itself anthropomorphic... although I disagree. It would require a philosophical debate on what it means to understand something I suppose, but my overall point still stands without using that word at all.
> specific I would say that increasingly accurately predicting the next symbols in a massive set of diverse sequences, which explain a huge diversity of real world events described in sequential order, requires increasingly accurate models of the underlying processes of said events
But it doesn’t - it’s a statistical model using training data, not a physical or physics model, which you seem to be equating it to (correct me if I am misunderstanding)
And in response to the other portion you present, an LLM fundamentally can’t be alien because it’s trained on human-produced output. In a way, it’s a model of the worst parts of human output - garbage in, garbage out, as they say - since it’s trained on the corpus of the internet.
> But it doesn’t - it’s a statistical model using training data, not a physical or physics model, which you seem to be equating it to (correct me if I am misunderstanding)
All learning and understanding is fundamentally statistical in nature- probability theory is the mathematical formalization of the process of learning from real-world information, i.e. reasoning under uncertainty [1].
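To make that concrete, here is a toy sketch of what I mean by learning-as-probability (entirely my own illustration, with made-up numbers): inferring a coin's bias by Bayesian updating, where every observation reshapes a probability distribution over hypotheses.

```python
import numpy as np

# Toy Bayesian updating: "learning" expressed as updating a probability
# distribution over hypotheses as data arrives (the sense in which Jaynes
# frames reasoning under uncertainty). Purely illustrative numbers.
thetas = np.linspace(0.01, 0.99, 99)            # candidate coin biases
posterior = np.ones_like(thetas) / len(thetas)  # uniform prior: no prior knowledge

flips = [1, 1, 0, 1, 1, 1, 0, 1]                # 1 = heads, 0 = tails

for flip in flips:
    likelihood = thetas if flip == 1 else (1 - thetas)
    posterior = posterior * likelihood          # Bayes' rule, unnormalized
    posterior /= posterior.sum()                # renormalize after each observation

print("most probable bias:", round(float(thetas[np.argmax(posterior)]), 2))
```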
The model assembles 'organically' under a stochastic optimization process- and as a result it is largely inscrutable and not rationally designed- not entirely unlike how biological systems evolve (although also still quite different). The fact that it is statistical and uses training data is just a surface-level fact about how a computer was set up to allow the model to form, and tells you absolutely nothing about how it is internally structured to represent the patterns in the data.

When your training data contains, for example, descriptions of physical situations and their resulting outcomes, the model must develop at least some simple heuristic ability to approximate the physical processes generating those outcomes- and in the limit of increasing accuracy, that becomes an increasingly sophisticated and accurate representation of the real process. It does not matter whether the input is text or images, any more than it matters to a human who understands physics whether they are speaking or writing about it- the internal model that lets it accurately predict the underlying processes behind specific text describing those events is what I am talking about here, and deep learning easily abstracts away the mundane I/O.
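For what it's worth, the 'surface level' point can be made concrete with a toy version of the training objective (my own sketch, with a made-up stand-in predictor): the loss only scores predicted probabilities against the observed next symbol, and says nothing about how the predictor represents the data internally.

```python
import numpy as np

vocab = ["the", "ball", "falls", "floats"]

def toy_model(context):
    # Hypothetical stand-in for any next-token predictor. A real LLM is a
    # large neural network; this one just encodes a single learned
    # regularity ("ball" tends to be followed by "falls").
    probs = np.full(len(vocab), 0.1)
    if context[-1] == "ball":
        probs[vocab.index("falls")] = 0.7
    return probs / probs.sum()

def next_token_loss(tokens):
    # Average negative log-likelihood of each observed next token: this is
    # all the training setup ever "sees" about the model.
    losses = [-np.log(toy_model(tokens[:t])[vocab.index(tokens[t])])
              for t in range(1, len(tokens))]
    return float(np.mean(losses))

print(next_token_loss(["the", "ball", "falls"]))
```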
An LLM is an alien intelligence because the types of structures it generates for modeling reality are radically different from those in human brains, and the way it solves problems and reasons is radically different- as is quite apparent when you pose it a series of novel problems and see what kind of solutions it comes up with. The fact that it is trained on data provided by humans doesn't change the fact that it is not itself anything like a human brain. As such it will always have different strengths, weaknesses, and abilities from humans- and the ability to interact with a non-human intelligence to get a radically non-human perspective for creative problem solving is, IMO, the biggest opportunity they present. This is something they are already very good at, as opposed to being used as an 'oracle' for answering questions about known facts- which is what people mostly want to use them for, but which they are quite poor at.
[1] Probability Theory: The Logic of Science by E.T. Jaynes
> I would say that increasingly accurately predicting the next symbols in a massive set of diverse sequences, which explain a huge diversity of real world events described in sequential order, requires increasingly accurate models of the underlying processes of said events.
I disagree. Understanding things is more than just being able to predict their behaviour.
Flat Earthers can still come up with a pretty good idea of where (direction relative to the vantage point) and when the Sun will appear to rise tomorrow.
> Flat Earthers can still come up with a pretty good idea of where (direction relative to the vantage point) and when the Sun will appear to rise tomorrow.
Understanding is having a mechanistic model of reality- but all models are wrong to varying degrees. The Flat Earther model is actually quite a good one for someone human-sized on a massive sphere- it is locally accurate enough that it works for most practical purposes. I doubt most humans could come up with something so accurate on their own from direct observation- even the fact that the local area is approximately flat in the abstract is far from obvious with hills, etc.
A more common belief nowadays is that the earth is approximately a sphere, but very few people are aware that it actually bulges at the equator and is flatter at the poles. Does that mean all people who think the earth is a sphere are therefore fundamentally lacking the mental capacity to understand concepts or to accurately model reality? Moreover, people are mostly accepting this spherical model on faith; they are not reasoning out their own understanding from data or anything like that.
I think it's very important to distinguish between something that fundamentally can only repeat its input patterns in a stochastic way, like a Hidden Markov Model, and something that can build even quite oversimplified and incorrect models that it can still sometimes use to extrapolate correctly to situations not exactly like those it was trained on. Many people seem to think LLMs are the former, but they are provably not- we can fabricate new scenarios not in the training data, like simple physics experiments that require tracking the location and movement of objects, and they can handle these correctly- something that requires at least a simple physical model, although still a far simpler one than what even a flat earther has. I think being able to tell that a new joke is funny, what it means, and why it is funny is also an example of having a general model, at an abstract level, of what types of things humans find funny.
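To illustrate the first category (a sketch of my own, using a plain Markov chain rather than a full HMM, but the limitation is the same): a model like this can only re-emit word transitions it has literally seen, so it has no way to extrapolate to a genuinely new situation.

```python
import random
from collections import defaultdict

# Toy first-order Markov text generator: it records every observed
# word-to-next-word transition and can only ever replay those transitions.
# (The corpus here is made up for this sketch.)
corpus = "the ball falls down the hill and the ball rolls on".split()

transitions = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    transitions[a].append(b)

def generate(start, length=8):
    word, out = start, [start]
    for _ in range(length):
        if word not in transitions:      # dead end: no observed continuation
            break
        word = random.choice(transitions[word])
        out.append(word)
    return " ".join(out)

print(generate("the"))
```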