It instantiates multiple logical PCI adaptors for a single physical adaptor. The logical adaptors can then be mapped into VMs, which can directly program a hardware-virtualized view of the graphics card. Intel has the same feature in their graphics and networking chips.
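On Linux you can poke at those logical adaptors through sysfs. A minimal sketch, assuming a Linux host and an SR-IOV-capable card; the PCI address is a made-up placeholder, substitute your own from lspci:

    # List the virtual functions a physical PCI device exposes via SR-IOV.
    from pathlib import Path

    pf = Path("/sys/bus/pci/devices/0000:03:00.0")  # hypothetical physical function
    print("VFs enabled:", (pf / "sriov_numvfs").read_text().strip())
    print("VFs supported:", (pf / "sriov_totalvfs").read_text().strip())
    for vf in sorted(pf.glob("virtfn*")):
        # each virtfn* entry is a symlink to a logical (virtual) PCI device
        print(vf.name, "->", vf.resolve().name)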
Same as any hypervisor/virtual machine setup. Sharing resources. You can build 1 big server with 1 big GPU and have multiple people doing multiple things on it at once, or one person using all the resources for a single intensive load.
However, I was under the impression - at least on Linux - that I could run multiple workloads in parallel on the same GPU without having to resort to vGPU.
You can, but only directly under that OS. If you wanted to run, say, a Windows VM to run a game that doesn't work in Wine, you'd need some way to give a virtual GPU to the virtual machine. (As it is now, the only way you'd be able to do this is to have a separate GPU that's dedicated to the VM and pass that through entirely.)
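For what it's worth, you can see the "many host processes, one GPU" case directly. A small sketch, assuming an NVIDIA card with nvidia-smi on the PATH:

    # Show every process currently using the GPU on the host; several can
    # share it at once without any vGPU involvement.
    import subprocess

    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name", "--format=csv"],
        capture_output=True, text=True, check=True,
    )
    print(out.stdout)  # one row per compute process on the GPU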
In addition to the answer by skykooler, virtual GPUs also allow you to set hard resource limits (e.g., amount of L2 cache, number of streaming multiprocessors), so different workloads do not interfere with each other.
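Even outside of vGPU you can get a rough version of SM capping on a bare-metal NVIDIA card via CUDA MPS. A sketch, assuming an MPS control daemon is already running; "train.py" is a hypothetical placeholder for your own workload:

    # Cap one workload to roughly half of the SMs using CUDA MPS.
    import os, subprocess

    env = dict(os.environ, CUDA_MPS_ACTIVE_THREAD_PERCENTAGE="50")
    subprocess.run(["python", "train.py"], env=env, check=True)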
What you're saying is true, but it's generally using either the API remoting or device emulation methods mentioned on that wiki page. In those cases, the VM does not see your actual GPU device, but an emulated device provided by the VM software. I'm running Windows within Parallels on a Mac, and here[2] is a screenshot showing the different devices each sees.
In the general case, the multiplexing is all software based. The guest VM talks to an emulated GPU, the virtualized device driver passes those calls to the hypervisor/host, which then generates equivalent calls on to the GPU, and the results travel back up the chain. So while you're still ultimately using the GPU, the software-based indirection introduces a performance penalty and a potential bottleneck. And you're also limited to the cross-section of capabilities exposed by your virtualized GPU driver, hypervisor, and the driver being used by that hypervisor (or host OS, for Type 2 hypervisors). The table under API remoting shows just how varied 3D acceleration support is across different hypervisors.
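Purely illustrative toy model of that chain, just to make the layering concrete (no real GPU calls anywhere):

    # Each layer only wraps the next one; the extra hops are where the
    # latency and the "lowest common denominator" feature set come from.
    class PhysicalGPU:
        def draw(self):
            return "rendered on hardware"

    class HostDriver:
        def __init__(self, gpu): self.gpu = gpu
        def draw(self): return self.gpu.draw()

    class EmulatedGPU:
        # what the guest sees; translates guest calls into host driver calls
        def __init__(self, host_driver): self.host_driver = host_driver
        def draw(self): return self.host_driver.draw()

    guest_view = EmulatedGPU(HostDriver(PhysicalGPU()))
    print(guest_view.draw())  # the call still reaches the GPU, but via two hops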
As an alternative to that, you can use fixed passthrough to directly expose your physical GPU to the VM. This lets you tap into the full capabilities of the GPU (or other PCI device), and achieves near-native performance. The graphics calls you make in the VM now go directly to the GPU, cutting out the game of telephone that emulated devices play. Assuming, of course, your video card drivers aren't actively trying to block you from running within a VM[3].
The problem is that when a device is assigned to a guest VM in this manner, that VM gets exclusive access to it. Even the host OS can't use it while it's assigned to the guest.
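This is what the vfio-pci handoff looks like on Linux. A sketch only (root required, vfio-pci module loaded); the PCI address and vendor/device IDs are made-up placeholders, get yours from lspci -nn:

    # Detach a GPU from its host driver and hand it to vfio-pci so a VM can
    # take exclusive ownership. While bound to vfio-pci the host can't use it.
    from pathlib import Path

    bdf = "0000:03:00.0"          # hypothetical GPU address
    vendor_device = "10de 1b80"   # hypothetical vendor/device ID pair

    dev = Path("/sys/bus/pci/devices") / bdf
    if (dev / "driver").exists():
        (dev / "driver" / "unbind").write_text(bdf)   # detach the host driver
    Path("/sys/bus/pci/drivers/vfio-pci/new_id").write_text(vendor_device)  # let vfio-pci claim it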
This article is about the fourth option – mediated passthrough. The vGPU functionality enables the graphics card to expose itself as multiple logical interfaces. Every VM gets its own logical interface to the GPU and sends calls directly to the physical GPU like it does in normal passthrough mode, while the hardware handles the multiplexing instead of the host/hypervisor worrying about it. That gives you the best of both worlds.
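On Linux this surfaces through the mediated-device (mdev) interface, which is what NVIDIA vGPU and Intel GVT-g plug into. A sketch, assuming a driver with mdev support; the parent PCI address is a made-up placeholder and the create step needs root:

    # List the vGPU "types" a card advertises and create one logical GPU
    # instance, which can then be handed to a VM.
    import uuid
    from pathlib import Path

    parent = Path("/sys/class/mdev_bus/0000:03:00.0")  # hypothetical parent GPU
    types_dir = parent / "mdev_supported_types"
    for t in sorted(types_dir.iterdir()):
        name = (t / "name").read_text().strip() if (t / "name").exists() else t.name
        avail = (t / "available_instances").read_text().strip()
        print(t.name, name, "available:", avail)

    # create one logical GPU of the first advertised type
    first = sorted(types_dir.iterdir())[0]
    (first / "create").write_text(str(uuid.uuid4()))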
> To maybe ask a better way: will this practically help me train my DNN faster?
Probably not. It will only help you if you previously needed to train it on a CPU because you were in a VM, but this seems unlikely. It will not speed up your existing GPU in any way compared to simply using it bare-metal right now.
> Or if I'm a cloud vendor, will this allow me to deploy cheaper GPU for my users?
Yes. This ports a feature from the XXXX$-range of GPUs to the XXX$-range of GPUs. Since their performance is similar or nearly so, you can save a lot of money this way. It will also lower the entry cost to the market (i.e. a hypervisor host could now be sub-1k$, if you go for cheap parts).
On the other hand, a business selling GPU time to customers will probably not want to rely on a hack (especially since there's a good chance it violates NVidia's license), so unless you're building your own HW, your bill will probably not drop. But if you're an ML startup or a hobbyist, you can now cheap out on/actually afford this kind of setup.
https://www.nvidia.com/en-us/data-center/virtual-solutions/
TL;DR: seems to be something useful for deploying GPUs in the cloud, but I may not have understood fully.