Hacker News

>I think OpenAI may have taken the medal for best available open weight model back from the Chinese AI labs

I have a bunch of scripts that use tool calling. Qwen3-32B handles everything flawlessly at 60 tok/sec. gpt-oss-120b breaks in some cases and runs at a mere 35 tok/sec (it doesn't fit on the GPU).
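For context, the kind of tool-calling script I mean can be sketched roughly like this. It assumes the model is served through an OpenAI-compatible endpoint (as llama.cpp's llama-server provides); the `get_weather` tool, its stub implementation, and the sample assistant message are hypothetical placeholders, not my actual scripts.

```python
import json

# Tool schema in the OpenAI function-calling format, sent along with the
# chat request so the model knows which tools it may call.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    # Stub implementation for illustration only.
    return f"sunny in {city}"

DISPATCH = {"get_weather": get_weather}

def run_tool_calls(message: dict) -> list[str]:
    """Execute every tool call in an assistant message and return results.

    This is where models differ in practice: some emit well-formed
    JSON arguments every time, others occasionally produce arguments
    that fail to parse, which is what 'breaks in some cases' looks like.
    """
    results = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])  # raises on malformed JSON
        results.append(DISPATCH[fn["name"]](**args))
    return results

# Example assistant message, shaped like a /v1/chat/completions response.
assistant_msg = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
    }],
}
print(run_tool_calls(assistant_msg))  # ['sunny in Oslo']
```

The failure mode is almost always in the model's side of this loop: malformed argument JSON or a wrong tool name, which `json.loads` or the dispatch lookup then surfaces as an exception.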

But I hope there's still some ironing out to do in llama.cpp and in the quants. So far it feels lackluster compared to Qwen3-32B and GLM-4.5-Air.


