I don't mean to be cynical, but I don't think these points matter as much as you think, at least not in practice. The hardest part of a proof is working out the intermediate steps; joining them up is often trivial, even for a student. So even if it works out a few good steps or finds an effective theorem to apply, and does so only every one in a hundred prompts, the time savings can be significant.
I should know, I've been using LLM thinking models to help brainstorm ideas for stickier proofs. It's been more successful at discovering esoteric entry points than I would like to admit.
I should know, I've been using LLM thinking models to help brainstorm ideas for stickier proofs. It's been more successful at discovering esoteric entry points than I would like to admit.