Hacker News

As someone who got in early on the Ryzen AI 395+, is there any added value to the DGX Spark besides having CUDA (compared to ROCm/Vulkan)? I feel Nvidia fumbled the marketing, either making it sound like an inference miracle or a dev toolkit (and then not enough to differentiate it from the superior AGX Thor).

I am curious where you find its main value, how it would fit within your tooling, and what its use cases are compared to other hardware.

From the inference benchmarks I've seen, an M3 Ultra always comes out on top.



The M3 Ultra has a slow GPU and no hardware FP4 support, so its prompt processing (prefill) is going to be slow, practically unusable at 100k+ context sizes. For token generation, which is memory bound, the M3 Ultra will be much faster, but who wants to wait 15 minutes for the context to process? The Spark will be much faster at prompt processing, giving you a much better time to first token, but then ~3x slower (273 vs 800 GB/s) in token generation throughput. You need to decide which matters more to you. Strix Halo is IMO the worst of both worlds at the moment: the weakest specs on both dimensions and the least mature software stack.
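The decode-side tradeoff above can be sketched with back-of-envelope arithmetic: memory-bound token generation is roughly bounded by memory bandwidth divided by the bytes read per token (approximately the active model weights). The model size and quantization below are illustrative assumptions, not benchmarks of any specific setup.

```python
# Back-of-envelope decode throughput: token generation is memory bound,
# so tokens/s is roughly memory bandwidth / bytes read per token
# (approximately the active model weights). Illustrative numbers only.

def decode_tokens_per_s(bandwidth_gb_s: float, active_params_b: float,
                        bytes_per_param: float) -> float:
    """Rough upper bound on memory-bound token generation rate."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Hypothetical 70B dense model at 4-bit (~0.5 bytes/param):
spark = decode_tokens_per_s(273, 70, 0.5)   # DGX Spark, 273 GB/s
m3u   = decode_tokens_per_s(800, 70, 0.5)   # M3 Ultra, 800 GB/s

print(f"Spark ~{spark:.1f} tok/s, M3 Ultra ~{m3u:.1f} tok/s "
      f"({m3u/spark:.1f}x)")
```

The ~3x gap tracks the 800 vs 273 GB/s bandwidth ratio, which is why decode speed follows bandwidth regardless of compute.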


This is 100% the truth, and I am really puzzled to see people push Strix Halo so hard for local inference. For about $1200 more you can build a DDR5 + 5090 machine that will crush a Strix Halo on just about every MoE model (equal decode and 10-20x faster prefill for the large ones, and huge gaps for any MoE that fits in 32GB of VRAM). I'd have a lot more confidence reselling a 5090 in the future than a Strix Halo machine, too.
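The "fits in 32GB VRAM" condition above amounts to a simple size check: quantized weight footprint plus some working overhead versus the card's memory. A minimal sketch, where the model sizes and the fixed overhead budget are illustrative guesses rather than measurements for any real model:

```python
# Rough check of whether a model's quantized weights fit in GPU VRAM.
# The overhead figure (KV cache, activations) and example model sizes
# are illustrative assumptions, not measurements.

def weights_gb(total_params_b: float, bits_per_param: float) -> float:
    """Weight footprint in GB for a parameter count given in billions."""
    return total_params_b * bits_per_param / 8  # 1e9 params cancels 1e9 bytes/GB

def fits_in_vram(total_params_b: float, bits_per_param: float,
                 vram_gb: float = 32, overhead_gb: float = 4) -> bool:
    """True if weights plus a fixed overhead budget fit in VRAM."""
    return weights_gb(total_params_b, bits_per_param) + overhead_gb <= vram_gb

# Hypothetical examples at 4-bit quantization:
print(fits_in_vram(30, 4))  # 15 GB weights + 4 GB overhead -> True
print(fits_in_vram(70, 4))  # 35 GB weights alone exceed 32 GB -> False
```

Models that fail this check spill onto DDR5, which is where the 5090 + CPU-offload setup still wins on prefill while matching Strix Halo on decode.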




