They are "cheap hacks" compared to a packet-switched network that runs at that speed. This is also another reason why Google can't sell TPUs to other people: nobody would put up with managing this sort of switching network for something they bought. The equivalent for NVIDIA is to use HDR/NDR InfiniBand, which lets you run a multi-tenant cluster a lot more efficiently, with no practical loss of performance (the only cost is marginally higher latency).


I don't think you can directly compare a reconfigurable optical switch with a packet-switched network. A packet-switched network receives packets electrically, sends them through a processor, and then outputs the results on another port. The optical switch is a device that creates static paths between endpoints, which can then be changed dynamically later.

It also has the advantage that when they move to multi-wavelength, its performance will greatly exceed that of electrical packet-switching networks.


I disagree with you about comparing the two. I have some experience from trading firms that convinced me they are not actually that different (for context, trading firms use a lot of layer 1 switching, and hacks that take place between layers 1 and 2).

If you think of packets like snakes going through a network, a layer 1 switching network creates tunnels for the snakes that you choose ahead of time (and can reconfigure whenever you want). A packet-switched network creates tunnels that are chosen by the snakes. If you run a packet-switched network, you can do everything you do with a layer 1 switched network by simply restricting which peers you send data to. On a hardware level, you need to convert from optics to electricity to do this, but you don't strictly need to do any buffering (the large buffers on Ethernet switches are there because of Ethernet, not because of packet switching). Low-latency switches don't buffer unless they need to, and basically just read the header as it's coming in to choose a route for the packet.
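To make the snake analogy concrete, here is a toy Python sketch. It is only a model of the forwarding decision, not of any real switch: the class names, port numbers, and host names (`CircuitSwitch`, `PacketSwitch`, `send_restricted`, hosts A to D) are all made up for illustration, and it ignores timing, buffering, and the optical/electrical conversion entirely.

```python
class CircuitSwitch:
    """Layer 1 switch: the operator picks the tunnels ahead of time."""

    def __init__(self):
        self.cross_connect = {}  # input port -> output port

    def configure(self, in_port, out_port):
        # Reconfigurable whenever the operator wants.
        self.cross_connect[in_port] = out_port

    def forward(self, in_port, payload):
        # The data itself has no say in where it goes.
        return self.cross_connect[in_port], payload


class PacketSwitch:
    """Packet switch: the header on each packet picks the tunnel."""

    def __init__(self, routes):
        self.routes = dict(routes)  # destination name -> output port

    def forward(self, packet):
        dest, payload = packet
        # Cut-through style: only the header is needed to choose a route.
        return self.routes[dest], payload


def send_restricted(switch, allowed_peer, in_port, payload):
    """Emulate a circuit by only ever addressing one fixed peer per port."""
    return switch.forward((allowed_peer[in_port], payload))


# Hypothetical hosts A, B, C, D attached to ports 0, 1, 2, 3.
circuit = CircuitSwitch()
circuit.configure(0, 2)  # A -> C
circuit.configure(1, 3)  # B -> D

packets = PacketSwitch({"A": 0, "B": 1, "C": 2, "D": 3})
allowed_peer = {0: "C", 1: "D"}  # same topology, enforced by the senders
```

With the senders restricted this way, `send_restricted(packets, allowed_peer, 0, "hello")` and `circuit.forward(0, "hello")` both deliver the payload to port 2. The packet switch can additionally reach any other port without reconfiguration, which is the extra flexibility being argued for.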

EDR Infiniband networks could certainly handle TPU v4 levels of bandwidth in a packet-switched fashion (at the time when TPU v4 was being built and deployed), particularly when the packets are doing something as tame as going around a torus. It also gives you the flexibility to do other things, though.

Packet switching certainly raises the complexity of the system, but I assume that sometime around TPU v6 or v7, Google will rediscover it for inter-TPU links.



