
Or you can swap to DRAM and stream over PCIe. I do that all the time and it works fine, especially since you rarely need to load all 70 GB into memory at once.

If you're a serious (read: enterprise) customer, you can buy InfiniBand-enabled cards and get duplex bandwidth faster than the M2 Max's entire memory bus. 'Unified memory' isn't even a bullet point on their spec sheet; it means nothing to their customers when they have CUDA primitives that do the same thing faster at a larger scale.



> and get duplex bandwidth faster than the M2 Max's entire memory bus.

Doesn't seem like it. The Wikipedia article for the M2 Max says:

    ... with up to 400 GB/s memory bandwidth.
https://en.wikipedia.org/wiki/Apple_silicon#Apple_M2_Max

Looking at the current "Nvidia networking" product page, it lists ConnectX-7 adapters with 400Gb/s total bandwidth:

https://www.nvidia.com/en-us/networking/infiniband-adapters/

So, accounting for the bit -> byte unit difference in those figures it seems like having eight ConnectX-7 adapters would roughly match up.
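For anyone following along, that bit -> byte conversion as a back-of-the-envelope sketch in Python (the 400s are the figures quoted from the spec sheets above; nothing else is assumed):

```python
# Gb/s -> GB/s: divide by 8 bits per byte.
GBITS_PER_ADAPTER = 400   # ConnectX-7 total bandwidth, Gb/s
M2_MAX_GBYTES = 400       # M2 Max memory bandwidth, GB/s

gbytes_per_adapter = GBITS_PER_ADAPTER / 8          # 50 GB/s per adapter
adapters_to_match = M2_MAX_GBYTES / gbytes_per_adapter

print(gbytes_per_adapter, adapters_to_match)        # 50.0 8.0
```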

However, the data sheet for those adapters seems to say those 400Gb/s adapters are for PCIe Gen5 x32 slots (not the standard x16 ones).

    Host interface: PCIe Gen5, up to x32 lanes
Eight slots of x32 lanes doesn't quite seem possible, even with the very latest generation AMD EPYC (9004 series) processors:

https://www.amd.com/system/files/documents/epyc-9004-series-...

Those have 128 PCIe Gen5 lanes per CPU. Dual socket systems seem to expand that out a bit, allowing up to 160 usable PCIe lanes in a server.

https://www.servethehome.com/pcie-lanes-and-bandwidth-increa...

So, at least on paper it doesn't seem possible for a single server with a bunch of PCIe InfiniBand links to actually match the bandwidth of the M2 Max memory bus. Maybe 3/4 of it though, which isn't terrible. :)
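The lane arithmetic behind that, sketched in Python (lane counts are the ones quoted above; this ignores whatever you could squeeze out of leftover lanes with narrower links):

```python
LANES_PER_ADAPTER = 32        # ConnectX-7 data sheet: PCIe Gen5, up to x32
ADAPTERS_FOR_FULL_MATCH = 8   # needed to reach 400 GB/s aggregate
USABLE_LANES = 160            # dual-socket EPYC 9004, per the linked article

lanes_needed = LANES_PER_ADAPTER * ADAPTERS_FOR_FULL_MATCH   # 256
full_width_adapters = USABLE_LANES // LANES_PER_ADAPTER      # 5 fit

print(lanes_needed > USABLE_LANES)   # True: 256 lanes don't fit in 160
```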


InfiniBand is intended to be run in parallel. You could handily exceed the M2 Ultra bandwidth if you ran 4 channels of IB, but rarely do people need more bandwidth than what PCIe offers, at least for AI.


> You could handily exceed the M2 Ultra bandwidth if you ran 4 channels of IB ...

Hmmm, that doesn't seem to be the case?

Those adapters are 400Gb/s "total bandwidth" each. Not 400Gb/s "each way". And you can't get 8 of those adapters into a server (with x32 links anyway).

Where's my calculation going wrong? :)


> And you can't get 8 of those adapters into a server (with x32 links anyway).

Nvidia's DGX A100 system promises over 3.2Tb/s across 10x Mellanox ConnectX-6 cards. That's their old EPYC system too; with the Grace Superchip you can get 900GB/s of card interconnect and 1TB/s of memory bandwidth. Either one could exceed the bandwidth of the M2 Max's memory.

The "each way" shtick is worth contesting, and ultimately comes down to how you use Nvidia's CUDA primitives. Agree to disagree on that - however I think my point still stands. "Unified memory architecture" is an anachronism at that scale, rendered obsolete by a unified address space and a fast enough interconnect.


Cool. Yeah that's an interesting point about the DGX A100 system. I'd forgotten about those. :)

Looking at the data sheet, it seems to have a maximum of 8 single port Nvidia ConnectX-7 adapters:

https://resources.nvidia.com/en-us-dgx-systems/ai-enterprise...

Bearing in mind the unit conversion (Tb/s vs GB/s), 3.2Tb/s matches the M2 Max's 400GB/s memory bandwidth.
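Same conversion as before, spelled out (3.2 Tb/s is the figure quoted from the data sheet):

```python
DGX_FABRIC_TBITS = 3.2                 # Tb/s across the ConnectX adapters
gbytes = DGX_FABRIC_TBITS * 1000 / 8   # Tb/s -> Gb/s -> GB/s

print(gbytes)   # 400.0, same as the M2 Max memory bandwidth
```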

---

Their newly announced Grace and/or Grace Hopper "Superchip" does seem interesting. Haven't (yet) seen how it's supposed to connect to other infrastructure though.

Their whitepaper talks about "OEM-defined I/O" but doesn't (in my skimming thus far) indicate what the upper bounds are.

May look further later on, but we're pretty far into the weeds already. ;)

---

Further along in the whitepaper, it says the "NVLink Switch System" in them communicates with the network at 900GB/s "total bandwidth". If that's indeed the case, then yep, they're beating the M2 Max's memory bandwidth (400GB/s).

That even beats the M2 Ultra's memory bandwidth (800GB/s):

https://en.wikipedia.org/wiki/Apple_silicon#Apple_M2_Ultra



