x86's age and complexity put it at a significant disadvantage. Both the cache coherency model and the instruction set take a lot of overhead to implement at speed.
A lot of the most commonly run software out there doesn't use the large, complex instructions offered on x86, so a bunch of pristine silicon goes to waste. Use the space taken by AVX-512 etc. to make more, simpler cores, and you get more performance for the same price, or lower cost for the same performance. Simpler cores are easier to clock higher with less voltage, and less likely to have defects that would pull down yields.
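To put rough numbers on the yield point, here's a minimal sketch using the textbook Poisson yield model (yield = exp(-area * defect density)). Every figure in it is a made-up assumption for illustration (the defect density, both core sizes, the area budget), and `poisson_yield` and `good_cores` are just hypothetical helper names, not anything from a real tool. It also only counts defect-free cores and says nothing about how fast each core is.

```python
import math

# All numbers below are hypothetical, chosen only to illustrate the trend.
DEFECTS_PER_MM2 = 0.001   # assumed defect density (0.1 defects per cm^2)
BIG_CORE_MM2 = 8.0        # assumed area of one big core with wide vector units
SMALL_CORE_MM2 = 2.0      # assumed area of one simple core
AREA_BUDGET_MM2 = 64.0    # assumed silicon area available for cores


def poisson_yield(area_mm2, defects_per_mm2):
    """Probability that a block of the given area has zero defects."""
    return math.exp(-area_mm2 * defects_per_mm2)


def good_cores(budget_mm2, core_mm2, defects_per_mm2):
    """Return (cores that fit in the budget, expected defect-free cores)."""
    cores = int(budget_mm2 // core_mm2)
    return cores, cores * poisson_yield(core_mm2, defects_per_mm2)


for name, area in (("big", BIG_CORE_MM2), ("small", SMALL_CORE_MM2)):
    cores, expected = good_cores(AREA_BUDGET_MM2, area, DEFECTS_PER_MM2)
    per_core = poisson_yield(area, DEFECTS_PER_MM2)
    print(f"{name}: {cores} cores, per-core yield {per_core:.4f}, "
          f"expected good cores {expected:.2f}")
```

Under this model a smaller core is always more likely to come out defect-free, and a defect that does land only costs you a small core instead of a big one. Whether the extra cores actually turn into more performance depends on the workload, which this sketch doesn't model.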
The big vector units aren't the problem though. They're a consequence of the big complicated schedulers that most x86 cores are designed with. As long as the core has to be huge anyway, you might as well spend some space on more powerful math units.
It's possible to design an x86 chip that puts much more priority on throughput per square centimeter of die, with many more simple cores working together, but I have no idea how it would work out.