Full Dedicated GPGPU Virtualization with Mediated Pass-Through

Abstract

Over the last few years, the transition from traditional graphics-centric GPUs to general-purpose GPUs (GPGPUs) has been embraced by many applications, both those requiring high graphical fidelity and computation-intensive workloads that fit the GPU's SIMD (Single Instruction, Multiple Data) architecture well. This adoption has stimulated the development of GPU cloud services, which share high-performance GPU resources among multiple tenants. Cloud providers thus face the challenge of virtualizing fast-evolving modern GPGPU devices driven by increasingly sophisticated graphics and computing APIs.

While modern CPUs allow virtual machines to use devices such as a GPU directly (PCI pass-through) via an input/output memory management unit (IOMMU) and thereby achieve near-native GPU performance, this solution assumes exclusive access and does not allow any form of multiplexing. Time-division multiplexing, a common approach for accelerators, could be applied; however, saving and restoring the full GPU hardware state greatly degrades GPU performance. Another approach to GPU virtualization forwards high-level API calls to the host graphics stack (API forwarding). This enables round-robin multiplexing controlled by the host graphics driver, but introduces overhead for the often highly frequent API calls due to hypervisor intervention. The most recent solutions introduce mediated pass-through, which grants direct access to performance-critical GPU resources while still allowing multiplexing; the existing implementations, however, are limited to vendor-specific GPUs or particular APIs.
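To make the distinction concrete, the following minimal C sketch (not taken from the presentation; the register layout, offsets, and names are hypothetical) illustrates the core idea of mediated pass-through: accesses to privileged control registers are trapped and emulated by a mediator, while performance-critical regions such as command buffers are mapped directly into the guest.

/*
 * Hedged sketch: which guest MMIO accesses a mediated pass-through layer
 * might trap. All offsets and names are invented for illustration.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define GPU_REG_PRIV_BASE   0x0000u   /* privileged control registers (mediated)      */
#define GPU_REG_PRIV_SIZE   0x1000u
#define GPU_CMDBUF_BASE     0x10000u  /* command buffers (mapped directly, no traps)   */
#define GPU_CMDBUF_SIZE     0x40000u

/* Privileged registers are emulated by the mediator (causing a VM exit);
 * everything else is backed by a direct guest mapping. */
static bool needs_mediation(uint32_t mmio_offset)
{
    return mmio_offset < GPU_REG_PRIV_BASE + GPU_REG_PRIV_SIZE;
}

int main(void)
{
    const uint32_t probes[] = { 0x0040u, GPU_CMDBUF_BASE + 0x100u };
    for (size_t i = 0; i < sizeof probes / sizeof probes[0]; ++i)
        printf("offset 0x%05x -> %s\n", (unsigned)probes[i],
               needs_mediation(probes[i]) ? "trap and emulate (VM exit)"
                                          : "direct guest mapping (no VM exit)");
    return 0;
}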

In this presentation we propose a concept that allows full virtualization of dedicated modern GPGPUs for arbitrary graphics and computing APIs. The concept is based on mediating memory allocation as well as command buffer creation at the driver level, which enables mapping these resources directly into the address space of the virtual machine. Subsequent accesses to these resources then require no hypervisor intervention, which greatly improves access times to these frequently used GPU resources by reducing the number of VM exits.
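The following sketch, again with types and function names invented purely for illustration, outlines the flow described above: only buffer allocation and command-buffer creation pass through the mediator, and once the buffer is mapped into the guest, the guest driver emits commands without further VM exits.

/*
 * Hedged sketch of the mediated allocation / direct-access split.
 * Nothing here is the presented implementation; it only mirrors the
 * described control flow with placeholder data structures.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    uint64_t  gpu_addr;   /* address in GPU memory, chosen by the mediator */
    uint32_t *guest_va;   /* mapping visible inside the guest              */
    size_t    size;
} cmd_buffer;

/* Mediated slow path: the (hypothetical) mediator validates the request,
 * reserves GPU memory and maps it into the guest's address space.
 * This is the only step that requires hypervisor intervention. */
static cmd_buffer mediator_alloc_cmd_buffer(size_t size)
{
    cmd_buffer buf;
    buf.size     = size;
    buf.gpu_addr = 0x100000;          /* placeholder GPU address          */
    buf.guest_va = calloc(1, size);   /* stands in for the guest mapping  */
    return buf;
}

/* Fast path: the guest driver writes command words directly into the
 * mapped buffer; no VM exit is involved. */
static void guest_emit_command(cmd_buffer *buf, size_t slot, uint32_t opcode)
{
    buf->guest_va[slot] = opcode;
}

int main(void)
{
    cmd_buffer buf = mediator_alloc_cmd_buffer(4096); /* one mediated call  */
    guest_emit_command(&buf, 0, 0xDEADBEEF);          /* many direct writes */
    guest_emit_command(&buf, 1, 0xCAFEBABE);
    printf("first command word: 0x%08X\n", (unsigned)buf.guest_va[0]);
    free(buf.guest_va);
    return 0;
}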