Nowadays a modern radio telescope, let's take MeerKAT in South Africa, can produce something like 10 TB of data per minute. The total bandwidth out of the site is orders of magnitude less, because it sits in the middle of nowhere in the desert. The same is true of pretty much every radio telescope: the technology improves over time and we capture orders of magnitude more data, but the internet connections to these remote locations stay very slow.
So we have to do most of the processing on site.
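To make the gap concrete, here is a back-of-envelope sketch. The ~10 TB/minute figure comes from above; the 10 Gbit/s site uplink is purely an assumption, chosen just to show the ratio.

```python
# Back-of-envelope check on the data-rate gap. The ~10 TB/minute figure is
# from the text; the 10 Gbit/s uplink is an assumed number for illustration.
TB = 1e12  # bytes

instrument_bps = 10 * TB * 8 / 60   # ~10 TB per minute, expressed in bits/s
uplink_bps = 10e9                   # assumed long-haul link out of the site

print(f"instrument output: {instrument_bps / 1e9:,.0f} Gbit/s")    # ~1,333 Gbit/s
print(f"site uplink:       {uplink_bps / 1e9:,.0f} Gbit/s")        # 10 Gbit/s
print(f"gap:               ~{instrument_bps / uplink_bps:,.0f}x")  # ~133x
```

Even with a generous uplink, you are a couple of orders of magnitude short of shipping the raw data off site.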
AI training is similar but not quite so extreme - when you try to use a bunch of consumer machines, bandwidth becomes the limiting factor.
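A rough sketch of why that is. The model size, gradient precision, and upload speed below are all assumptions picked to illustrate the scale, not numbers from the text.

```python
# Rough sketch of the per-step sync cost over a consumer link. All figures
# here (7B params, fp16 gradients, 100 Mbit/s upload) are assumed examples.
params = 7e9            # hypothetical 7B-parameter model
bytes_per_grad = 2      # fp16 gradients
uplink_bps = 100e6      # assumed 100 Mbit/s consumer upload

sync_bytes = params * bytes_per_grad          # one full gradient exchange
sync_seconds = sync_bytes * 8 / uplink_bps

print(f"per-step gradient payload: {sync_bytes / 1e9:.0f} GB")     # 14 GB
print(f"time to ship it upstream:  ~{sync_seconds / 60:.0f} min")  # ~19 min
```

Real systems shrink this with gradient compression or by syncing less often, but the baseline number is what makes consumer links the bottleneck.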
If bandwidth weren't a factor, the first rearchitecture that I think would make sense is to do most of the processing in a single datacenter somewhere. Having one datacenter in West Virginia, one in South Africa, one in New Mexico, and one in the California mountains introduces a lot of overhead. The systems naturally drift in configuration over time, and nothing is quite as "write once, run everywhere" as you'd like.