
I'm impressed by this one. I tried it on audio transcription with timestamps and speaker identification (over a 10 minute MP3) and drawing bounding boxes around creatures in a complex photograph and it did extremely well on both of those.

Plus it drew me a very decent pelican riding a bicycle.

Notes here: https://simonwillison.net/2025/Mar/25/gemini/



Have you considered that they must be training on images of pelicans riding bicycles at this point ;-). At least given how often that comes up in your reviews, a smart LLM engineer might put their fingers on the scales a bit and optimize for the things that come up a lot in reviews of their work.


Claude's pelican is way better than Gemini's


I'm not so sure. I've run it a bunch of times. It makes a great pelican.

Personally I'm convinced this model is the best out there right now.

https://www.reddit.com/r/Bard/comments/1jjobaz/pelican_on_a_...


I think a competent 5-year-old could make a better pelican on a bicycle than that. Which to me feels like the hallmark of AI.

I mean, hell, I have drawings from when I was eight of leaves and they are botanically-accurate enough to still be used for plant identification, which itself is a very difficult task that people study decades for. I don't see why this is interesting or noteworthy, call me a neo-luddite if you must.


The complexity is that it's not a drawing: it's SVG. It's code that must, in the end, display a pelican, so it's one step further.
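To make that concrete, here's a minimal sketch in Python of what "drawing by writing code" looks like. The function name, shapes and coordinates are all made up for illustration; this is not any model's actual output. The point is that the markup has to be written without ever seeing the rendered picture.

```python
import xml.etree.ElementTree as ET


def toy_pelican_svg() -> str:
    """Return a tiny hand-written SVG; every shape and coordinate is illustrative."""
    shapes = [
        '<circle cx="60" cy="150" r="30" fill="none" stroke="black"/>',   # rear wheel
        '<circle cx="160" cy="150" r="30" fill="none" stroke="black"/>',  # front wheel
        '<line x1="60" y1="150" x2="160" y2="150" stroke="black"/>',      # frame
        '<ellipse cx="105" cy="95" rx="35" ry="25" fill="white" stroke="black"/>',  # body
        '<circle cx="140" cy="60" r="12" fill="white" stroke="black"/>',  # head
        '<polygon points="150,58 195,68 150,70" fill="orange"/>',         # beak
    ]
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 220 200">'
        + "".join(shapes)
        + "</svg>"
    )


svg = toy_pelican_svg()
ET.fromstring(svg)  # raises ParseError if the markup is not well-formed XML
```

Even this toy version needs the wheels, body and beak to line up purely from numbers, which is roughly why the test is harder than it looks.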


I've been following your blog for a while now, great stuff!


I just tried your trademark benchmark on the new 4o Image Output, though it's not the same test:

https://imgur.com/a/xuPn8Yq


And the same thing with Gemini 2.0 Flash native image output.

https://imgur.com/a/V4YAkX5

It's sort of irrelevant though as the test is about SVGs.


Was that an actual SVG?


No, that's GPT-4o native image output.


I wonder how far away we are from models which, given this prompt, generate that image in the first step in their chain-of-thought and then use it as a reference to generate SVG code.

It could be useful for much more than just silly benchmarks; there's a reason why physics students are taught to draw a diagram before attempting a problem.


Someone managed to get ChatGPT to render the image using GPT-4o, then save that image to a Code Interpreter container and run Python code with OpenCV to trace the edges and produce an SVG: https://bsky.app/profile/btucker.net/post/3lla7extk5c2u
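I haven't seen the exact code from that post, so this is only a rough sketch of the last step: turning traced outlines into SVG path data. It assumes the outlines are already available as lists of (x, y) points, and `contours_to_svg` is a hypothetical helper name, not something from the linked post or from OpenCV.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[int, int]


def contours_to_svg(contours: Iterable[Sequence[Point]], width: int, height: int) -> str:
    """Turn traced outlines into one closed SVG <path> per outline."""
    paths = []
    for contour in contours:
        if len(contour) < 2:
            continue  # a single point cannot form a visible outline
        first, *rest = contour
        # "M" moves to the first point, "L" draws a line to each next one, "Z" closes the shape.
        d = f"M {first[0]} {first[1]} " + " ".join(f"L {x} {y}" for x, y in rest) + " Z"
        paths.append(f'<path d="{d}" fill="none" stroke="black"/>')
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}">'
        + "".join(paths)
        + "</svg>"
    )
```

With OpenCV the outlines would typically come from `cv2.findContours`, and each contour array would need flattening first, e.g. `c.reshape(-1, 2).tolist()`. That part is my assumption about how such a pipeline would be wired up.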


Does this match the rules of your test, or is it cheating? :)



