I think, quite frankly, AMD ceded the AI accelerator market to Nvidia the day they decided not to offer consistent compute-API support throughout their product range. ROCm support is limited to a select few flagship products, and even some recent-generation flagships are unsupported.

The future users of your highest-end HPC accelerators start with affordable hardware they can develop and test against on their desktop. No sane developer jumps headfirst into your most expensive product (with the highest power/cooling infrastructure requirements) just to test compatibility and get familiar with your APIs.

Nvidia understands this, and you are able to run the same basic algorithms across everything from mid-range desktop GPUs to high-end HPC accelerators (performance varies, of course). Intel somewhat understands this, although it has had major missteps in this area (e.g. limited AVX-512 support on desktop processors, and a poor developer story for Xeon Phi).



I wonder how hard it is to just sell a GPU and say it's CUDA compatible. Google built their own toolchain over PTX, AMD could do the same, and have CUDA compatibility if they wanted. I think the difference here just might be that Google's still buying A100s, while AMD wouldn't be.

HIP/ROCm should absolutely be supported across all AMD hardware to drive adoption; instead it seems to barely register, like OpenACC or Vulkan compute. Intel might have better luck with oneAPI.


For PTX:

For sm_70 onwards (the first architecture with tensor cores), NVIDIA made the task significantly harder.

Those newer architectures use a separate program counter per thread/lane, notably to support C++ atomics across threads in the same warp without deadlocks.

This doesn't match the semantics present on AMD GPUs.

For HIP/ROCm:

I think they need an abstraction layer that can produce a single slice of binary code usable across multiple generations.

This is compounded by the fact that, on the AMD side, different dies need different binary slices, so the 6800 XT and the 6700 XT run different code. And for RDNA2, ROCm only supports Navi21 cards, not the other ones...
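To make the per-die slices concrete, here's a sketch of what bundling them looks like with hipcc today, assuming a placeholder source file `saxpy.hip` (the gfx names are the standard ISA identifiers for the RDNA2 dies):

```shell
# Sketch: build one fat binary that embeds a separate code object ("slice")
# for each RDNA2 die family; each --offload-arch flag adds one ISA target.
#   gfx1030 = Navi21 (6800 / 6800 XT / 6900 XT)
#   gfx1031 = Navi22 (6700 XT)
#   gfx1032 = Navi23 (6600 / 6600 XT)
hipcc --offload-arch=gfx1030 --offload-arch=gfx1031 --offload-arch=gfx1032 \
      saxpy.hip -o saxpy
```

Without a forward-compatible intermediate representation, every new die means another slice in the bundle, and anything not on the list simply has no code to run.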

For oneAPI, OpenCL SPIR-V fulfills that role.


Reports are that the 6800 XT runs ROCm pretty well now. I don't have the hardware myself, but it seems like it took AMD a few years to get things sorted over to RDNA/RDNA2.


Navi21 (corresponding to the 6800/6800 XT/6900 XT consumer cards) is supported, but the 6700 XT and below are not.


I've been talking with some of the folks on the ROCm compiler team about this. It seems that each Navi 2x processor was assigned a unique architecture number just in case an incompatibility was discovered. Nobody I talked to knew of any actual incompatibilities, though nobody had done any comprehensive testing either.

You can tell HSA to pretend your GPU is Navi 21 by setting an environment variable:

    export HSA_OVERRIDE_GFX_VERSION=10.3.0
This is not a configuration that has gone through any QA testing, so I couldn't in good conscience recommend buying a GPU to use in that way. However, if you already have a 6000 series desktop GPU and you always wanted to play around with PyTorch... maybe set that variable and give it a try.
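If you do want to give it a try, a minimal smoke test might look like this (assuming a ROCm build of PyTorch, which reports the GPU through the `torch.cuda` API; setting the variable inline keeps the QA-untested override scoped to a single process instead of your whole shell):

```shell
# Scope the override to one process rather than exporting it shell-wide,
# then check whether PyTorch can see the GPU at all.
HSA_OVERRIDE_GFX_VERSION=10.3.0 python3 -c \
  'import torch; print(torch.cuda.is_available())'
```

If that prints True, the runtime accepted the override and you can move on to running a small model.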


Yeah that's the workaround that some people use.

But you see the catch, right? People buy hardware to have support from the manufacturer. The no-QA part is very, very bad. :/

Nobody wants to be the one troubleshooting issues all the time, and that alone can make an NVIDIA GPU worth buying over an AMD one.

Hopefully this gets fixed in the future.

and maybe some very big past mistakes too. See the G4ad instance on AWS. That runs on the Navi12 ASIC, which never got (proper) ROCm support. Wouldn't it be awesome if an AWS instance were widely available for people to test their software with ROCm? The hardware is already there...


For another anecdote: it works excellently on my 6900 XT.


They could still participate in this space if they first bring out a 32 GB card that has ROCm support.


The Radeon Pro W6800 has 32 GB of memory and is officially supported by ROCm.

https://www.amd.com/en/products/professional-graphics/amd-ra...


Noted! But I'm not sure if I should get that as a gaming card. The Radeon VII was more explicitly dual-use.


Ah. The W6800 has a very different set of features and performance characteristics. The Radeon VII is a better choice than the W6800 for some workloads, so it's not a clear upgrade for someone like yourself.


Ah, good to know. Yeah, that's why I'm holding out hope for the RX 7950 XT.

The goal is a card that is primarily capable of gaming (VR) but can also pull double duty for training and running moderately large networks. I think AMD systematically underestimates the importance of that niche.



