I'm impressed by this one. I tried it on audio transcription with timestamps and...

jillesvangurp · 2025-03-26T10:23:59 1742984639

Have you considered that they must be training on images of pelicans riding bicycles at this point ;-). At least given how often that comes up in your reviews, a smart LLM engineer might put their fingers on the scales a bit and optimize for the things that come up a lot in reviews of their work.

redox99 · 2025-03-26T02:16:07 1742955367

Claude's pelican is way better than Gemini's

jonomacd · 2025-03-26T07:48:40 1742975320

I'm not so sure. I've run it a bunch of times. It makes a great pelican.

Personally I'm convinced this model is the best out there right now.

https://www.reddit.com/r/Bard/comments/1jjobaz/pelican_on_a_...

fao_ · 2025-03-26T14:55:01 1743000901

I think a competent 5-year-old could make a better pelican on a bicycle than that. Which to me feels like the hallmark of AI.

I mean, hell, I have drawings of leaves from when I was eight, and they are botanically accurate enough to still be used for plant identification, which itself is a very difficult task that people study for decades. I don't see why this is interesting or noteworthy; call me a neo-luddite if you must.

ashenke · 2025-03-26T14:57:24 1743001044

The complexity is that it's not a drawing: it's SVG. So it's code that must, in the end, display a pelican, which is one step further.
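
To make that concrete, here is a minimal sketch of what "a drawing as code" means. The shapes and coordinates are invented for illustration (this is not any model's actual output): the model has to emit markup like this and get the geometry right without ever seeing the result.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written shapes: a crude bird sitting above two wheels.
# All coordinates are made up for illustration only.
SVG = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 120">
  <circle cx="50" cy="90" r="25" fill="none" stroke="black"/>
  <circle cx="150" cy="90" r="25" fill="none" stroke="black"/>
  <path d="M50 90 L100 50 L150 90" fill="none" stroke="black"/>
  <ellipse cx="100" cy="40" rx="25" ry="15" fill="white" stroke="black"/>
  <polygon points="125,35 165,42 125,45" fill="orange"/>
</svg>"""


def shape_tags(svg_text):
    """Parse the SVG and return the tag names of its direct children."""
    root = ET.fromstring(svg_text)
    # ElementTree prefixes tags with "{namespace}", so strip that part.
    return [child.tag.split("}")[1] for child in root]


if __name__ == "__main__":
    # Lists the primitives the "drawing" is built from.
    print(shape_tags(SVG))
```

Saving the string to a `.svg` file and opening it in a browser shows the picture; every wheel, frame tube and beak is a number the author had to choose.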

ggeorgovassilis · 2025-03-26T13:59:11 1742997551

I've been following your blog for a while now, great stuff!

kridsdale3 · 2025-03-25T20:56:06 1742936166

I just tried your trademark benchmark on the new 4o Image Output, though it's not the same test:

https://imgur.com/a/xuPn8Yq

jonomacd · 2025-03-26T07:46:47 1742975207

And the same thing with Gemini 2.0 Flash native image output.

https://imgur.com/a/V4YAkX5

It's sort of irrelevant though as the test is about SVGs.

Unroasted6154 · 2025-03-25T21:21:54 1742937714

Was that an actual SVG?

simonw · 2025-03-25T21:36:16 1742938576

No, that's GPT-4o native image output.

sebzim4500 · 2025-03-25T22:27:36 1742941656

I wonder how far away we are from models which, given this prompt, generate that image in the first step in their chain-of-thought and then use it as a reference to generate SVG code.

It could be useful for much more than just silly benchmarks; there's a reason why physics students are taught to draw a diagram before attempting a problem.

simonw · 2025-03-25T22:46:50 1742942810

Someone managed to get ChatGPT to render the image using GPT-4o, then save that image to a Code Interpreter container and run Python code with OpenCV to trace the edges and produce an SVG: https://bsky.app/profile/btucker.net/post/3lla7extk5c2u

qingcharles · 2025-03-26T16:31:56 1743006716

Does this match the rules of your test, or is it cheating? :)