But those people usually have more system RAM than VRAM.
At those scales, most people become bandwidth and compute constrained using CPU inference instead of multiple GPUs. In those cases, an MOE with a low number of active parameters is the fastest.