It instantiates multiple logical PCI adaptors for a single physical adaptor. The logical adaptors can then be mapped into VMs, which can directly program a hardware-virtualized view of the graphics card. Intel has the same feature in their graphics and networking chips.
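On Linux you can poke at those logical adaptors through sysfs. A minimal sketch, assuming a Linux host and an SR-IOV-capable card; the PCI address is a made-up placeholder, substitute your own from lspci:

    # List the virtual functions a physical PCI device exposes via SR-IOV.
    from pathlib import Path

    pf = Path("/sys/bus/pci/devices/0000:03:00.0")  # hypothetical physical function
    print("VFs enabled:", (pf / "sriov_numvfs").read_text().strip())
    print("VFs supported:", (pf / "sriov_totalvfs").read_text().strip())
    for vf in sorted(pf.glob("virtfn*")):
        # each virtfn* entry is a symlink to a logical (virtual) PCI device
        print(vf.name, "->", vf.resolve().name)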
Same as any hypervisor/virtual machine setup. Sharing resources. You can build 1 big server with 1 big GPU and have multiple people doing multiple things on it at once, or one person using all the resources for a single intensive load.
However, I was under the impression - at least on Linux - that I could run multiple workloads in parallel on the same GPU without having to resort to vGPU.
You can, but only directly under that OS. If you wanted to run, say, a Windows VM to run a game that doesn't work in Wine, you'd need some way to give a virtual GPU to the virtual machine. (As it is now, the only way you'd be able to do this is to have a separate GPU that's dedicated to the VM and pass that through entirely.)
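For what it's worth, you can see the "many host processes, one GPU" case directly. A small sketch, assuming an NVIDIA card with nvidia-smi on the PATH:

    # Show every process currently using the GPU on the host; several can
    # share it at once without any vGPU involvement.
    import subprocess

    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name", "--format=csv"],
        capture_output=True, text=True, check=True,
    )
    print(out.stdout)  # one row per compute process on the GPU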
In addition to the answer by skykooler, virtual GPUs also allow you to set hard resource limits (e.g., amount of L2 cache, number of streaming multiprocessors), so different workloads do not interfere with each other.
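Even outside of vGPU you can get a rough version of SM capping on a bare-metal NVIDIA card via CUDA MPS. A sketch, assuming an MPS control daemon is already running; "train.py" is a hypothetical placeholder for your own workload:

    # Cap one workload to roughly half of the SMs using CUDA MPS.
    import os, subprocess

    env = dict(os.environ, CUDA_MPS_ACTIVE_THREAD_PERCENTAGE="50")
    subprocess.run(["python", "train.py"], env=env, check=True)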
What you're saying is true, but it's generally using either the API remoting or device emulation methods mentioned on that wiki page. In those cases, the VM does not see your actual GPU device, but an emulated device provided by the VM software. I'm running Windows within Parallels on a Mac, and here[2] is a screenshot showing the different devices each sees.
In the general case, the multiplexing is all software based. The guest VM talks to an emulated GPU, the virtualized device driver passes those calls to the hypervisor/host, which then generates equivalent calls on to the GPU, and the results travel back up the chain. So while you're still ultimately using the GPU, the software-based indirection introduces a performance penalty and a potential bottleneck. And you're also limited to the cross-section of capabilities exposed by your virtualized GPU driver, hypervisor, and the driver being used by that hypervisor (or host OS, for Type 2 hypervisors). The table under API remoting shows just how varied 3D acceleration support is across different hypervisors.
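Purely illustrative toy model of that chain, just to make the layering concrete (no real GPU calls anywhere):

    # Each layer only wraps the next one; the extra hops are where the
    # latency and the "lowest common denominator" feature set come from.
    class PhysicalGPU:
        def draw(self):
            return "rendered on hardware"

    class HostDriver:
        def __init__(self, gpu): self.gpu = gpu
        def draw(self): return self.gpu.draw()

    class EmulatedGPU:
        # what the guest sees; translates guest calls into host driver calls
        def __init__(self, host_driver): self.host_driver = host_driver
        def draw(self): return self.host_driver.draw()

    guest_view = EmulatedGPU(HostDriver(PhysicalGPU()))
    print(guest_view.draw())  # the call still reaches the GPU, but via two hops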
As an alternative to that, you can use fixed passthrough to directly expose your physical GPU to the VM. This lets you tap into the full capabilities of the GPU (or other PCI device), and achieves near-native performance. The graphics calls you make in the VM now go directly to the GPU, cutting out the game of telephone that emulated devices play. Assuming, of course, your video card drivers aren't actively trying to block you from running within a VM[3].
The problem is that when a device is assigned to a guest VM in this manner, that VM gets exclusive access to it. Even the host OS can't use it while it's assigned to the guest.
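This is what the vfio-pci handoff looks like on Linux. A sketch only (root required, vfio-pci module loaded); the PCI address and vendor/device IDs are made-up placeholders, get yours from lspci -nn:

    # Detach a GPU from its host driver and hand it to vfio-pci so a VM can
    # take exclusive ownership. While bound to vfio-pci the host can't use it.
    from pathlib import Path

    bdf = "0000:03:00.0"          # hypothetical GPU address
    vendor_device = "10de 1b80"   # hypothetical vendor/device ID pair

    dev = Path("/sys/bus/pci/devices") / bdf
    if (dev / "driver").exists():
        (dev / "driver" / "unbind").write_text(bdf)   # detach the host driver
    Path("/sys/bus/pci/drivers/vfio-pci/new_id").write_text(vendor_device)  # let vfio-pci claim it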
This article is about the fourth option – mediated passthrough. The vGPU functionality enables the graphics card to expose itself as multiple logical interfaces. Every VM gets its own logical interface to the GPU and sends calls directly to the physical GPU like it does in normal passthrough mode, while the hardware handles the multiplexing instead of the host/hypervisor worrying about it. That gives you the best of both worlds.
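On Linux this surfaces through the mediated-device (mdev) interface, which is what NVIDIA vGPU and Intel GVT-g plug into. A sketch, assuming a driver with mdev support; the parent PCI address is a made-up placeholder and the create step needs root:

    # List the vGPU "types" a card advertises and create one logical GPU
    # instance, which can then be handed to a VM.
    import uuid
    from pathlib import Path

    parent = Path("/sys/class/mdev_bus/0000:03:00.0")  # hypothetical parent GPU
    types_dir = parent / "mdev_supported_types"
    for t in sorted(types_dir.iterdir()):
        name = (t / "name").read_text().strip() if (t / "name").exists() else t.name
        avail = (t / "available_instances").read_text().strip()
        print(t.name, name, "available:", avail)

    # create one logical GPU of the first advertised type
    first = sorted(types_dir.iterdir())[0]
    (first / "create").write_text(str(uuid.uuid4()))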
> To maybe ask a better way: will this practically help me train my DNN faster?
Probably not. It will only help you if you previously needed to train it on a CPU because you were in a VM, but this seems unlikely. It will not speed up your existing GPU in any way compared to simply using it bare-metal right now.
> Or if I'm a cloud vendor, will this allow me to deploy cheaper GPU for my users?
Yes. This ports a feature from the XXXX$-range of GPUs to the XXX$-range of GPUs. Since their performance is similar or nearly so, you can save a lot of money this way. It will also lower the entry cost to the market (i.e. a hypervisor host could now be sub-1k$, if you go for cheap parts).
On the other hand, a business selling GPU time to customers will probably not want to rely on a hack (especially since there's a good chance it violates NVidia's license), so unless you're building your own HW, your bill will probably not drop. But if you're an ML startup or a hobbyist, you can now cheap out on/actually afford this kind of setup.
https://www.nvidia.com/en-us/data-center/virtual-solutions/
TL;DR: seems to be something useful for deploying GPUs in the cloud, but I may not have understood fully.