Reasoning was supposed to be that for "Open" AI; that's why they go to such lengths to hide the reasoning output. Look how that turned out.
Right now, in my opinion, OpenAI actually has a useful deep research feature that I've found nobody else matches. But there is no moat to be seen there.
If you've seen DeepSeek R1's <think> output, you'll understand why OpenAI hides their own. It can be pretty "unsafe" relative to their squeaky-clean public image.
I was looking at this the other day. I'm pretty sure OpenAI runs the internal reasoning through another model that sanitizes it, which also makes it less useful for training other models on.
I might be mistaken, but wasn't the reasoning originally fully hidden? Or maybe it was just far more aggressively sanitized. I agree that today's reasoning output seems higher quality than it did originally.
This is the commoditization of models. There's nothing special about the new models beyond performing better on the benchmarks.
They are all interchangeable. This is great for users, as it increases price pressure.