I think, quite frankly, AMD ceded the AI accelerator market to Nvidia the day they decided not to offer consistent compute-API support throughout their product range. ROCm support is limited to a select few flagship products, and even some recent-generation flagships are unsupported.

The future users of your highest-end HPC accelerators start with affordable hardware they can develop and test against on their desktop. No sane developer jumps headfirst into your most expensive product (with the highest power/cooling infrastructure requirements) just to test compatibility and get familiar with your APIs.

Nvidia understands this, and you are able to run the same basic algorithms across everything from mid-range desktop GPUs to high-end HPC accelerators (performance varies, of course). Intel somewhat understands this, although it has had major missteps in this area (e.g. limited AVX-512 support on desktop processors, and a poor developer story for Xeon Phi).



I wonder how hard it is to just sell a GPU and say it's CUDA compatible. Google built their own toolchain over PTX, AMD could do the same, and have CUDA compatibility if they wanted. I think the difference here just might be that Google's still buying A100s, while AMD wouldn't be.

HIP/ROCm should absolutely be supported across all AMD hardware to drive adoption; instead it seems to barely register, like OpenACC or Vulkan compute. Intel might have better luck with oneAPI.


For PTX:

For sm_70 onwards (the first architecture with tensor cores), NVIDIA made the task significantly harder.

Those newer architectures use a separate program counter per thread/lane, notably to support C++ atomics across threads in the same warp without deadlocks.

This doesn't match the semantics present on AMD GPUs.

For HIP/ROCm:

I think they need an abstraction layer that can produce a single slice of binary code usable across multiple generations.

This is compounded by the fact that, on the AMD side, different dies need different binary slices, so the 6800 XT and the 6700 XT run different code. And for RDNA2, ROCm only supports Navi21 cards, not the other ones...
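To make the per-die slices concrete, here's a sketch of what bundling them looks like with hipcc today, assuming a placeholder source file `saxpy.hip` (the gfx names are the standard ISA identifiers for the RDNA2 dies):

```shell
# Sketch: build one fat binary that embeds a separate code object ("slice")
# for each RDNA2 die family; each --offload-arch flag adds one ISA target.
#   gfx1030 = Navi21 (6800 / 6800 XT / 6900 XT)
#   gfx1031 = Navi22 (6700 XT)
#   gfx1032 = Navi23 (6600 / 6600 XT)
hipcc --offload-arch=gfx1030 --offload-arch=gfx1031 --offload-arch=gfx1032 \
      saxpy.hip -o saxpy
```

Without a forward-compatible intermediate representation, every new die means another slice in the bundle, and anything not on the list simply has no code to run.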

For oneAPI, OpenCL SPIR-V fulfills that role.


Reports are that the 6800 XT runs ROCm pretty well now. I don't have the hardware myself, but it seems like it took AMD a few years to get things sorted over to RDNA/RDNA2.


Navi21 (corresponding to the 6800/6800 XT/6900 XT consumer cards) is supported, but the 6700 XT and below are not.


I've been talking with some of the folks on the ROCm compiler team about this. It seems that each Navi 2x processor was assigned a unique architecture number just in case an incompatibility was discovered. Nobody I talked to knew of any actual incompatibilities, though nobody had done any comprehensive testing either.

You can tell HSA to pretend your GPU is Navi 21 by setting an environment variable:

    export HSA_OVERRIDE_GFX_VERSION=10.3.0
This is not a configuration that has gone through any QA testing, so I couldn't in good conscience recommend buying a GPU to use in that way. However, if you already have a 6000 series desktop GPU and you always wanted to play around with PyTorch... maybe set that variable and give it a try.
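If you do want to give it a try, a minimal smoke test might look like this (assuming a ROCm build of PyTorch, which reports the GPU through the `torch.cuda` API; setting the variable inline keeps the QA-untested override scoped to a single process instead of your whole shell):

```shell
# Scope the override to one process rather than exporting it shell-wide,
# then check whether PyTorch can see the GPU at all.
HSA_OVERRIDE_GFX_VERSION=10.3.0 python3 -c \
  'import torch; print(torch.cuda.is_available())'
```

If that prints True, the runtime accepted the override and you can move on to running a small model.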


Yeah that's the workaround that some people use.

But you see the catch, right? People buy hardware to have support from the manufacturer. The no-QA part is very, very bad. :/

Nobody wants to be the one troubleshooting issues all the time, and that alone can make an NVIDIA GPU worth buying over an AMD one.

Hopefully this gets fixed in the future.

and maybe some very big past mistakes too. See the G4ad instance on AWS. That runs on the Navi12 ASIC, which never got (proper) ROCm support. Wouldn't it be awesome if an AWS instance were widely available for people to test their software with ROCm? The hardware is already there...


For another anecdote: it works excellently on my 6900 XT.


They could still participate in this space if they first bring out a 32 GB card that has ROCm support.


The Radeon Pro W6800 has 32 GB of memory and is officially supported by ROCm.

https://www.amd.com/en/products/professional-graphics/amd-ra...


Noted! But I'm not sure if I should get that as a gaming card. The Radeon VII was more explicitly dual-use.


Ah. The W6800 has a very different set of features and performance characteristics. The Radeon VII is a better choice than the W6800 for some workloads, so it's not a clear upgrade for someone like yourself.


Ah, good to know. Yeah, that's why I'm holding out hope for the RX 7950 XT.

The goal is a card that is primarily capable of gaming (VR) but can also pull double duty for training and running moderately large networks. I think AMD systematically underestimates the importance of that niche.



