>I think OpenAI may have taken the medal for best available open weight model back from the Chinese AI labs
I have a bunch of scripts that use tool calling. Qwen-3-32B handles everything flawlessly at 60 tok/sec. Gpt-oss-120B breaks in some cases and runs at a mere 35 tok/sec (it doesn't fit on the GPU).
That said, I hope there's still some ironing out to do in llama.cpp and in the quants. So far it feels lackluster compared to Qwen3-32B and GLM-4.5-Air.
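For context, a minimal sketch of the kind of tool-calling request such scripts send. llama.cpp's `llama-server` exposes an OpenAI-compatible `/v1/chat/completions` endpoint that accepts a `tools` array; the tool name, model name, and port here are illustrative assumptions, not the commenter's actual setup.

```python
import json

# One tool definition in the OpenAI-compatible function-calling schema
# that llama-server accepts on /v1/chat/completions.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Return current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def build_request(model: str, user_msg: str) -> dict:
    """Build the JSON body for a tool-calling chat completion."""
    return {
        "model": model,  # depends on how the server was launched
        "messages": [{"role": "user", "content": user_msg}],
        "tools": TOOLS,
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

body = build_request("Qwen3-32B", "What's the weather in Oslo?")
print(json.dumps(body, indent=2))
# POST this to http://localhost:8080/v1/chat/completions
# (llama-server's default port); a model that handles tool calling
# correctly responds with a tool_calls entry naming get_weather.
```

Whether the model reliably emits well-formed `tool_calls` in the response is exactly where the quality differences described above show up.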