I regularly have the opposite experience: o3 is almost unusable, and Gemini 2.5 Pro is reliably great. Claude Opus 4 is a close second.
o3 is so bad it makes me wonder if I'm being served a different model. My o3 responses are so truncated and simplified as to be useless. Maybe my problems aren't a good fit for it, but whatever the cause, o3's output isn't useful.
I have this distinct feeling that when o3 can't solve a problem, it intentionally tries to trick me by cleverly hiding its mistakes. But I could be imagining it.
Are you using a tool other than ChatGPT? If so, check the full prompt that's actually being sent. An unsuitable built-in prompt can kneecap the model.
Tools with slightly unsuitable built-in prompts or context sometimes lead models to say weird stuff out of the blue, rather than it being a "baked-in" behavior of the model itself. I've seen this happen with both Gemini 2.5 Pro and o3. One way to check is to intercept the outgoing request, as in the sketch below.
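If you want to verify what a wrapper is actually sending, one option is to attach a request hook to the underlying HTTP client and dump each payload before it goes out. A minimal sketch, assuming the OpenAI Python SDK (v1) over httpx; the model name and message here are just placeholders:

    import json
    import httpx
    from openai import OpenAI

    def log_request(request: httpx.Request) -> None:
        # Dump the outgoing payload so you can read the exact prompt,
        # system message, and parameters hitting the API.
        try:
            print(f"--> {request.method} {request.url}")
            print(json.dumps(json.loads(request.content), indent=2))
        except Exception:
            pass  # skip streaming or non-JSON bodies

    client = OpenAI(  # reads OPENAI_API_KEY from the environment
        http_client=httpx.Client(event_hooks={"request": [log_request]}),
    )

    resp = client.chat.completions.create(
        model="o3",  # placeholder: use whatever model your tool targets
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.choices[0].message.content)

Comparing that dump against a plain ChatGPT session makes it fairly easy to see whether the tool's system prompt or parameters are what's degrading the output.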
Are you using o3 in the official ChatGPT app or via the API? I use it in the app and it performs very well; it's my go-to model for general-purpose LLM use.