
GPUs are very complex, and Nvidia pioneers nearly everything related to them: a rich software stack, the most sophisticated Tensor Cores, and bleeding-edge features like 8-bit floating point (FP8) support, with FP4 in the works next. This matters because halving the data size roughly doubles the achievable flops (see the Hopper specs [1]).
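Back-of-the-envelope sketch of why smaller formats help: at a fixed memory bandwidth, halving the element size doubles how many values you can stream per second. The bandwidth figure below is an approximate H100 SXM number used purely for illustration, not a datasheet quote.

```python
# Illustrative arithmetic: fixed bandwidth, varying element size.
# ~3350 GB/s is roughly H100 SXM HBM3 bandwidth (assumed, not exact).
bandwidth_gbs = 3350

rates = {}
for name, bytes_per_elem in [("FP16", 2), ("FP8", 1)]:
    # elements streamed per second = bytes/s divided by bytes per element
    rates[name] = bandwidth_gbs * 1e9 / bytes_per_elem
    print(f"{name}: {rates[name]:.2e} elements/s")

print("speedup:", rates["FP8"] / rates["FP16"])  # exactly 2x
```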

The compute is so powerful that it creates bottlenecks in data loading. That's why they have SXM, NVLink, and, since Hopper, asynchronous bulk data movement via the Tensor Memory Accelerator (TMA).
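The core idea behind async data movement is overlapping the fetch of the next tile with compute on the current one (double buffering). Here is a minimal sketch of that pattern in plain Python with a helper thread standing in for the copy engine; `load` and `compute` are hypothetical callables, not any real API.

```python
import threading

def process(tiles, load, compute):
    """Double-buffering sketch: prefetch tile i+1 while computing on tile i."""
    buf = load(tiles[0])          # fill the first buffer synchronously
    results = []
    for i in range(len(tiles)):
        nxt, t = {}, None
        if i + 1 < len(tiles):
            # kick off the next load "asynchronously" (thread = copy engine)
            t = threading.Thread(target=lambda: nxt.update(b=load(tiles[i + 1])))
            t.start()
        results.append(compute(buf))  # compute overlaps with the load above
        if t:
            t.join()                  # wait for the prefetch to land
            buf = nxt["b"]
    return results
```

On real hardware the "thread" is a dedicated copy unit, so the overlap costs no compute cycles; the structure of the loop is the same.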

It's so advanced that ML software hasn't caught up yet, and it's not trivial to tile and schedule properly at these levels (see FlashAttention [2]).
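To make the tiling point concrete, here is a NumPy sketch of the FlashAttention-style trick: computing softmax(QKᵀ/√d)·V one tile of K/V at a time with an online softmax, so the full attention matrix never has to be materialized. This is a toy illustration of the algorithm's structure, not the real kernel.

```python
import numpy as np

def attention_tiled(Q, K, V, tile=16):
    """Tiled attention with an online (running) softmax, FlashAttention-style."""
    n, d = Q.shape
    O = np.zeros((n, d))        # unnormalized output accumulator
    m = np.full(n, -np.inf)     # running row-wise max (for stability)
    l = np.zeros(n)             # running softmax denominator
    for j in range(0, K.shape[0], tile):
        Kj, Vj = K[j:j + tile], V[j:j + tile]
        S = Q @ Kj.T / np.sqrt(d)               # scores for this tile only
        m_new = np.maximum(m, S.max(axis=1))    # updated row max
        P = np.exp(S - m_new[:, None])
        scale = np.exp(m - m_new)               # rescale old partial sums
        l = l * scale + P.sum(axis=1)
        O = O * scale[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]                        # normalize once at the end
```

The scheduling problem the comment alludes to is exactly here: choosing tile sizes so each `Kj`/`Vj` block fits in fast on-chip memory while the Tensor Cores stay busy.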

I wouldn't be surprised if they stayed on Hopper for a while and just cranked up the bandwidth and bundled more GPUs together. They've already released the H100 NVL, which is basically two H100s, and the H200 with faster high-bandwidth memory (HBM3e).

AMD and Intel are way behind, with nothing even remotely close on their roadmaps.

[1] https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor...

[2] https://crfm.stanford.edu/2023/07/17/flash2.html
