Paravirtualization is relevant if the CPU does not support full virtualization. Xen was originally designed to use PV because it was very difficult to make x86 VMs fast: unmodified guest OSes required slow emulation. See section 2 of https://www.cl.cam.ac.uk/research/srg/netos/papers/2003-xens...
You may be confusing the concept of hardware-assisted virtualization with full virtualization. Moreover, the famous Xen paper you've linked to never claimed that x86 didn't support full virtualization; it did, thanks to VMware.
VMware didn't do full virtualization - it couldn't, because x86 did not support it at that time! There were privileged instructions that did not cause a trap, and so could not be virtualized using the normal trap-to-monitor technique. VMware used dynamic translation to JIT the guest kernel so that it did not execute those privileged instructions, and so that it would run much faster than a simple emulator. This is explained in the Xen paper and also in https://inst.eecs.berkeley.edu//~cs252/sp17/papers/vmware.pd...
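To make the non-trapping instruction problem concrete, here is a toy Python sketch. It is not how VMware implemented anything: the two-instruction "ISA", the `Monitor` class, and the `run_deprivileged` and `translate` helpers are all made up for illustration. It assumes `cli` faults when run deprivileged while `popf` silently drops its interrupt-flag update, which is roughly the behaviour that broke trap-to-monitor on pre-VT x86.

```python
# Toy model only: every name here is hypothetical, not VMware's or Xen's design.
PRIVILEGED_TRAPS = {"cli"}    # faults when executed deprivileged
SENSITIVE_NO_TRAP = {"popf"}  # silently drops the IF update instead of faulting


class Monitor:
    """Tracks the guest's virtual interrupt flag (IF)."""

    def __init__(self):
        self.virtual_if = True
        self.traps = 0

    def emulate(self, insn):
        if insn == "cli":
            self.virtual_if = False
        elif insn == "popf":
            # Toy assumption: this popf restores flags with IF set.
            self.virtual_if = True


def run_deprivileged(code, monitor):
    """Run guest code on 'hardware' that traps only some instructions."""
    for insn in code:
        if insn in PRIVILEGED_TRAPS:
            monitor.traps += 1          # hardware fault -> monitor emulates
            monitor.emulate(insn)
        elif isinstance(insn, tuple) and insn[0] == "call_monitor":
            monitor.emulate(insn[1])    # explicit call inserted by translation
        # Anything else runs natively; a raw popf's IF update is lost.


def translate(code):
    """Rewrite non-trapping sensitive instructions into monitor calls."""
    return [("call_monitor", i) if i in SENSITIVE_NO_TRAP else i for i in code]


guest = ["cli", "add", "popf"]  # guest expects interrupts re-enabled at the end

plain = Monitor()
run_deprivileged(guest, plain)

translated = Monitor()
run_deprivileged(translate(guest), translated)

print(plain.virtual_if, translated.virtual_if)  # False True
```

With pure trap-to-monitor the monitor never sees the `popf`, so its view of the guest's interrupt flag goes stale; after translation the same guest code ends in the state the guest expects, and the `add` still runs untouched.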
> VMware [10] and Connectix [8] both virtualize commodity PC hardware, allowing multiple operating systems to run on a single host. All of these examples implement a full virtualization of (at least a subset of) the underlying hardware, rather than paravirtualizing and presenting a modified interface to the guest OS.
And no, I haven't forgotten about binary translation. As you mention, it was only used to replace privileged instructions; it was not a full-blown CPU emulator.
VMware VMs still ran native CPU instructions, and the overhead incurred by translation is a whole different matter unrelated to my original point.