
Why have they still not released a paper, aside from a press release? I have to admit I still don't know how auspicious it is that running Google hardware for three days apiece was able to find half-page-long solutions, given that the promise has always been to solve the Riemann hypothesis with the click of a button. But of course I do recognize that it's a big achievement relative to previous work in automatic theorem proving.


I don't know why so few people realize this, but by solving any of the problems at all, the system's performance is superhuman for most reasonable definitions of human.

Talking about things like solving the Riemann hypothesis in so many years assumes a little too much about the difficulty of problems that we can't even begin to conceive of a solution for. A better question is what can happen when everybody has access to above average reasoning. Our society is structured around avoiding confronting people with difficult questions, except when they are intended to get the answer wrong.


Well, multiplying two large numbers instantly is a superhuman feat that a calculator can do. I would hope we are going for a higher bar, like “useful”. Let’s see if this can provide proofs of novel results.


The ability to multiply two numbers superhumanly has already transformed society.


That superhuman capability of "multiplying two large numbers instantly" has transformed human society like not even plumbing has. I really can't see how you could not consider this useful.


It's worth emphasizing that it's been possible for years to use an automatic theorem prover to prove novel results. The whole problem is to get novel interesting results.


We know that any theorem that is provable at all (in the chosen foundation of mathematics) can be found by patiently enumerating all possible proofs. So, in order to evaluate AlphaProof's achievements, we'd need to know how much of a shortcut AlphaProof achieved. A good proxy for that would be the total energy usage for training and running AlphaProof. A moderate proxy for that would be the number of GPUs / TPUs that were run for 3 days. If it's somebody's laptop, it would be super impressive. If it's 1000s of TPUs, then less so.


> We know that any theorem that is provable at all (in the chosen foundation of mathematics) can be found by patiently enumerating all possible proofs.

Which computer science theorem is this from?


It's a direct consequence of the format of a proof. They're finite-length sequences of a finite alphabet of symbols, so of course they're enumerable (the same algorithm you use to count works to enumerate them).

If you want a computer to be able to tell that it found a correct proof once it enumerates it, you need a bit more than that (really just the existence of automated proof checkers is enough for that).
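To make the idea concrete, here's a minimal sketch in Python; check_proof is a hypothetical stand-in for a real automated proof checker (say, Lean's kernel), not anything AlphaProof actually does:

    from itertools import count, product

    ALPHABET = ["0", "1"]  # any finite alphabet of proof symbols works

    def find_proof(theorem, check_proof):
        # Enumerate every finite string in order of increasing length
        # (breadth-first) and check each one. This halts if and only if
        # some string encodes a valid proof of the theorem.
        for length in count(1):
            for symbols in product(ALPHABET, repeat=length):
                candidate = "".join(symbols)
                if check_proof(theorem, candidate):
                    return candidate

Hopeless in practice, of course, but it's why provability only requires the checking step to be decidable.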


It’s just an obvious statement. If a proof exists, you will eventually get to it.


Only if we take AC, I guess?


No, this has nothing to do with choice.


I guess it is tautological from the definition of "provable". A theorem is provable, by definition, if there is a finite sequence of well-formed formulas that has the theorem as its consequence (https://en.wikipedia.org/wiki/Theorem, paragraph "Theorem in logic").


Not sure it’s a tautology. It’s not obvious that, for an arbitrary formal system, the theorems are recursively enumerable, i.e. that some procedure will eventually reach every theorem derivable via the axioms and transformation rules. For example, if you perform a depth-first traversal, you will not reach all theorems.

Hilbert’s program was a (failed) attempt to determine, loosely speaking, whether there was a process or procedure that could discover all mathematical truths. Any theorem depends on the formal system you start with, but the deeper implicit question is: where do the axioms come from and can we discover all of them (answer: “unknown” and “no”)?


It's "obvious" in the sense that it's a trivial corollary of the completeness theorem (so it wouldn't be true for second order logic, for example).

Hilbert's program failed without contradicting what GP wrote, because the language of FOL theorems is only recursively enumerable, not decidable. It's obvious that something is true once you've found a proof, but if you haven't found a proof yet, is the theorem wrong, or do you simply have to wait a little longer?


Yeah, but breadth-first immediately gives you the solution for any finite alphabet. So in that sense it is trivial.
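A toy illustration of the difference, with strings over {"a", "b"} standing in for proofs (my sketch, not anyone's actual prover):

    from collections import deque

    # Depth-first descends the all-"a" branch forever,
    # so it never reaches any string containing "b":
    def dfs(prefix=""):
        yield prefix
        for ch in "ab":
            yield from dfs(prefix + ch)  # recurses on "a", "aa", "aaa", ...

    # Breadth-first reaches every finite string after finitely many steps:
    def bfs():
        queue = deque([""])
        while queue:
            s = queue.popleft()
            yield s
            queue.extend(s + ch for ch in "ab")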


The shortcut vs. enumeration is definitely enormous, right? Just take two to the power of (average Shannon entropy × proof length): searching that space would probably outlast the heat death (or whatever death) of the universe on all of human compute. (I'm assuming; I didn't bother to check, but it seems intuitively true by a wide margin.)
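To put rough numbers on it (mine, purely illustrative): assume ~6 bits of entropy per character and a 2,000-character formal proof.

    import math

    bits_per_char = 6      # assumed average Shannon entropy per symbol
    proof_length = 2000    # assumed proof length in characters
    log10_candidates = bits_per_char * proof_length * math.log10(2)
    print(f"~10^{log10_candidates:.0f} candidate strings")  # ~10^3612

For scale, there are roughly 10^80 atoms in the observable universe, so brute enumeration at that length is far beyond any conceivable amount of compute.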


> A better question is what can happen when everybody has access to above average reasoning. Our society is structured around avoiding confronting people with difficult questions, except when they are intended to get the answer wrong.

What does this have to do with a hypothetical automatic theorem prover?


Logic is pretty much absent from our culture and daily life, but that could be due to its limited supply.


Being logical in social life is pretty much completely different from being logical in a mathematical argument, especially in a formal theorem proving environment. (Just try to write any kind of cultural proposition in a formal language!)


That's the way things are now, but this regime came about when proving things took intense concentration and specialized skills that very few people had. Contrast going to look something up in a library with googling something during a conversation.


Google stopped publishing interesting AI work after OpenAI took away their AI lead, mostly using tech that Google pioneered but never monetised, like transformers.

I imagine they are under pressure not to make this mistake again.


Hinton was never in the running for a Fields Medal, since he never made a single contribution to the field of mathematics. His work is about empirical discoveries in CS.


An incomprehensibly silly publicity stunt!


Tao is famous for being the world's most singular and inspiring genius. It's kind of a meme position* that most people accept because they think everyone else accepts it, but for people of a certain inclination, it makes anything he might say into a Pronouncement to read depths into. If you think someone is possibly the greatest mathematician of all time, you'd be interested in everything they think!

(* I think one could very legitimately view him as the top researcher in harmonic analysis in the world - he is a great mathematician - but it's not clear to me how people go from that to Epochal Genius and his extreme celebrity status across STEM)


> If arguably the person with the highest IQ currently living, is impressed but still not fully satisfied that a computer doesn’t give Nobel prize winning mathematical reasoning

No offense, but every part of this characterization is really unserious! He says "the model proposed the same strategy that was already identified in the most recent work on the problem (and which I restated in the blog post), but did not offer any creative variants of that strategy." That's very different from what you're suggesting.

The way you're talking, it sounds like it'd be actually impossible for him to meaningfully say anything negative about AI. Presumably, if he was directly critical of it, it would only be because his standards as the world's smartest genius must simply be too high!

In reality, he's very optimistic about it in the future but doesn't find it useful now except for basic coding and bibliography formatting. It's fascinating to see how this very concrete and easily understood sentiment is routinely warped by the Tao-as-IQ-genius mythos.


I’m not sure I understand your complaint.

Are you arguing that I’m making an appeal to authority fallacy?


> ‘Able to make the same creative mathematical leaps as Terence Tao’ seems like a pretty high bar to be setting for AI.

There's no need to try to infer this kind of high bar, because what he says is actually very specific and concrete: "Here the result was mildly disappointing ... Essentially the model proposed the same strategy that was already identified in the most recent work on the problem (and which I restated in the blog post), but did not offer any creative variants of that strategy." Crucially the blog post in question was part of his input to ChatGPT.

Otherwise, he's been clear that while he anticipates a future where it is more useful, at present he only uses AI/ChatGPT for bibliography formatting and for writing out simple "Hello World" style code. (He is, after all, a mathematician and not a coder.) I've seen various claims online that he's using ChatGPT all the time to help with his research and, beyond the coding usage, that just seems to not be true.

(However, it's fair to say that "able to help Terence Tao with research" is actually a high bar.)


Daniel Litt, an algebraic geometer, said on Twitter: "Pretty impressed by o1-preview! Still not having much luck asking it to do any interesting math but it seems much more reliable with simple things; I can actually imagine it being a net time-saver at this point with some non-mathematical tasks."

Any other takes by mathematicians out there?


Do note that Terry has access to the full o1. o1-preview is, well, a preview.


Does it matter that, like so much in the Math for AI sphere, core details seem to be totally bungled? e.g. see various comments in this thread https://x.com/waltonstevenj/status/1834327595862950207


How will you know if its answers are correct or not?


Because I'm verifying everything by hand, as is the whole point of studying pure mathematics.


How can you verify a proof though? Pure math isn't really about computations, and it can be very hard to spot subtle errors in a proof that an LLM might introduce, especially since they seem better at sounding convincing rather than being right.


The same way I verify my own proofs of textbook exercises: very cautiously. Subtle errors are a feature of the problem domain, not a new phenomenon.


By using Lean, a proof assistant and a functional programming language.

Here's @tao on Mathstodon saying he's learning it.

https://mathstodon.xyz/@tao/111206761117553482


To code proofs in Lean, you have to understand the proof very well. It doesn't seem very feasible for someone learning the material for the first time.


That's not true at all.

You can literally learn how to write proofs using Lean: https://djvelleman.github.io/HTPIwL/
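For a taste, a first Lean 4 proof can be as short as this (my example, not one from the book):

    theorem and_swap (p q : Prop) (h : p ∧ q) : q ∧ p :=
      ⟨h.right, h.left⟩

The book builds up to exactly this kind of exercise step by step.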


The examples in this book are extraordinarily simple, and cover material that many proof assistants were designed to be extremely good at expressing. I wouldn't be surprised if an LLM could automate the exercises in this book completely.

Writing nontrivial proofs in a theorem prover is a different beast. In my experience (as someone who writes mechanized mathematical proofs for a living) you need to not only know the proof very well beforehand, but you also need to know the design considerations for all of the steps you are going to use beforehand, and you also need to think about all of the ways your proof is going to be used beforehand. Getting these wrong frequently means redoing a ton of work, because design errors in proof systems are subtle and can remain latent for a long time.


> think about all of the ways your proof is going to be used beforehand

What do you mean by that? I don't know much about theorem provers, but my POV would be that a proof is used to verify a statement. What other uses are there one should consider?


The issue is: there are lots of ways to write down a statement.

One common example is whether you internalize or externalize a property of a data structure: e.g., represent it with a dependent type, or as a property about a non-dependent type. This comes with design tradeoffs: some lemmas might expect internalized representations only; some rewrites might only be usable (e.g., no horrifying dependent-type errors) with externalized representations. For math in particular, which involves rich hierarchies of data structures, your choice about internalization can impact what structures from your mathematical library you can use, or the level of fragile type-coercion magic that needs to happen behind the scenes.
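To make that tradeoff concrete, here's a toy Lean 4 sketch; the names are mine, not from Mathlib:

    -- Internalized: the length invariant lives in the type itself.
    abbrev Vec (α : Type) (n : Nat) : Type := { l : List α // l.length = n }

    -- Externalized: a plain List with the invariant as a separate hypothesis.
    -- Lemmas in this form apply to any list, but callers must carry `h` around.
    theorem ne_nil_of_length_one {α : Type} (l : List α) (h : l.length = 1) :
        l ≠ [] := by
      intro hnil
      rw [hnil] at h
      simp at h

A rewrite stated for plain lists won't fire directly on a Vec's underlying list without unwrapping the subtype, which is where the coercion magic (and the fragility) comes in.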


The premise is to have the LLM put up something that might be true, then have Lean tell you whether it is true. If you trust Lean, you don't need to understand the proof yourself to trust it.


The issue is that a hypothetical answer from an LLM is not even remotely easy to put directly into Lean. You might ask the LLM to give you an answer together with a Lean formalization, but this kind of 'autoformalization' is at present not at all reliable.


Tao says that isn't the case for all of it: on the massive collaborative projects he's run, many non-mathematicians did sections of them. He says someone who understands the proof well needs to do the initial proof sketch and the key parts, but lots of parts of the proof can be worked on by non-mathematicians.


If Tao says he's interested in something being coded in Lean, there are literal teams of people who will throw themselves at him. Those projects are very well organized from the top down by people who know what they're doing, so it's no surprise that they are able to create some space for people who don't understand the whole scope.

This is also the case for other top-profile mathematicians like Peter Scholze. Good luck to someone who wants to put ChatGPT answers to random hypotheticals into Lean to see if they're right; I don't think they'll have so easy a time of it.


Are you questioning the entire premise of pure mathematics?


Good luck! That can be pretty hard to do when you're at the learning stage, and I would think doubly so given the LLM style where everything 'looks' very convincing.


To what extent is the training and structure of AlphaProof tailored specifically to IMO-type problems, which typically have short solutions using combinations of a small handful of specific techniques?

(It's not my main point, but it's always worth remembering - even aside from any AI context - that many top mathematicians can't do IMO-type problems, and many top IMO medalists turn out to be unable to solve actual problems in research mathematics. IMO problems are generally regarded as somewhat niche.)


The last statement is largely correct (though I don't know about the IMO medalists who are unable to solve actual problems; most mathematicians can't solve most open problems either). But I kind of disagree with the assessment of IMO problems: the search space is huge, and if it were as you say, it would be easy to search.


No, I don't mean that the search space is small. I just mean that there are special techniques which are highly relevant for IMO-type problems. It'd be interesting to know how important that knowledge was for the design and training of AlphaProof.

In other words, how does AlphaProof fare on mathematical problems which aren't in the IMO style? (As such exceptions comprise most mathematical problems)


Probably less well? They rely heavily on the dataset of existing problems.

