
About what I expected. The Jetson series had the same issues, mostly, at a smaller scale: Deviate from the anointed versions of YOLO, and nothing runs without a lot of hacking. Being beholden to CUDA is both a blessing and a curse, but what I really fear is how long it will take for this to become an unsupported golden brick.

Also, the other reviews I’ve seen point out that inference speed is slower than a 5090 (or on par with a 4090 with some tailwind), so the big difference here (other than core counts) is the large chunk of “unified” memory. Still seems like a tricky investment in an age where a Mac will outlive everything else you care to put on a desk and AMD has semi-viable APUs with equivalent memory architectures (even if ROCm is… well… not all there yet).

Curious to compare this with cloud-based GPU costs, or (if you really want on-prem and fully private) the returns from a more conventional rig.
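For the cloud comparison, a minimal break-even sketch in Python (every number here is an illustrative assumption, not a quote for any actual box or provider):

    # Break-even: up-front on-prem box vs. renting a cloud GPU hourly.
    # Both figures below are assumed placeholders, not real prices.
    box_cost = 4000.0   # assumed one-time cost of the box, USD
    cloud_rate = 1.50   # assumed on-demand GPU rate, USD/hour

    hours = box_cost / cloud_rate
    print(f"Break-even after {hours:.0f} GPU-hours "
          f"(~{hours / 8 / 260:.1f} years at 8h every weekday)")

At those assumed rates you'd need well over a year of daily use before the on-prem box pays for itself, ignoring power, depreciation, and how fast the hardware goes stale.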



> Also, the other reviews I’ve seen point out that inference speed is slower than a 5090 (or on par with a 4090 with some tailwind), so the big difference here (other than core counts) is the large chunk of “unified” memory.

It's not comparable to 4090 inference speed; it's significantly slower, partly because of the lack of MXFP4 models out there. Even compared to a Ryzen AI 395 (ROCm / Vulkan) on gpt-oss-120B mxfp4, the DGX somehow manages to lose on token generation (prompt processing is faster, though).
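If you want to reproduce that kind of comparison, a minimal sketch using the llama-cpp-python bindings (the model filename is a hypothetical placeholder, and this crude timing lumps prompt processing in with generation, so keep the prompt short):

    # Rough tokens/sec check with llama-cpp-python (sketch, not a rig).
    import time
    from llama_cpp import Llama

    llm = Llama(
        model_path="gpt-oss-120b-mxfp4.gguf",  # hypothetical local GGUF path
        n_gpu_layers=-1,  # offload all layers to the GPU/APU
        n_ctx=4096,
    )

    start = time.time()
    out = llm("Say hello.", max_tokens=256)
    elapsed = time.time() - start

    n = out["usage"]["completion_tokens"]
    print(f"{n} tokens in {elapsed:.1f}s -> {n / elapsed:.1f} tok/s")

llama.cpp's own llama-bench reports prefill and generation separately, which is the cleaner way to split the two numbers people quote in these threads.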

> Still seems like a tricky investment in an age where a Mac will outlive everything else you care to put on a desk and AMD has semi-viable APUs with equivalent memory architectures (even if ROCm is… well… not all there yet).

ROCm (v7) on APUs has actually come a long way, mostly thanks to community effort; it's quite competitive and much more mature now. It's still not totally user-friendly, but it no longer breaks between updates (I know the bar is low, but that was the status a year ago). So in comparison, a Strix Halo box offers a lot of value for the money if you need a cheap, compact inference machine.
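A quick sanity check that the ROCm stack is actually being picked up (a minimal sketch; note that the ROCm build of PyTorch reuses the torch.cuda namespace for its HIP backend):

    # Verify a ROCm PyTorch build sees the APU (sketch, not a benchmark).
    # On ROCm builds, torch.cuda.* is the HIP backend, not actual CUDA.
    import torch

    print("HIP version:", torch.version.hip)  # None on CUDA-only builds
    print("Device visible:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("Device name:", torch.cuda.get_device_name(0))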

Haven't tested fine-tuning / training yet, but in theory it's supported. Not to forget that the APU is extremely performant for "normal" tasks (Threadripper level) compared to the CPU of the DGX Spark.


Yeah, good point on the FP4. I'm seeing people complain about INT8 as well, which ought to "just work", but everyone who has one (not many) is wary of wandering off the happy path.


This thing is dramatically slower than a 4090 both in prefill and decode. And I do mean DRAMATICALLY.

I have no immediate numbers for prefill, but the memory bandwidth is ~4x greater on a 4090, which will lead to roughly 4x faster decode.
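The back-of-the-envelope version, assuming decode is purely memory-bandwidth-bound (published figures: ~1008 GB/s for the 4090, ~273 GB/s for the Spark; the model size is an illustrative assumption chosen to fit in 24 GB):

    # Decode estimate: each generated token streams all active weights
    # through memory once, so tok/s ~= bandwidth / bytes_per_token.
    def decode_tok_s(bandwidth_gb_s, active_params_b, bytes_per_param):
        bytes_per_token = active_params_b * 1e9 * bytes_per_param
        return bandwidth_gb_s * 1e9 / bytes_per_token

    # Illustrative: dense ~20B model at 4-bit (~0.5 bytes/param, ~10 GB).
    for name, bw in [("RTX 4090", 1008), ("DGX Spark", 273)]:
        print(f"{name}: ~{decode_tok_s(bw, 20, 0.5):.0f} tok/s upper bound")
    # -> ~101 vs ~27 tok/s: the same ~3.7x gap as the bandwidth ratio.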


This is kind of an embedded 5070 with a massive amount of relatively slow memory; don't expect miracles.


No need to put unified in scare quotes.


Given the likelihood that you are bound by the ~4x lower memory bandwidth this implies, at least for decode, I think they are warranted.



