Title: IOMMU-assisted Memory Management: Sharing Virtual-Memory Objects with PCIe Devices in the Linux Kernel
Author: Kenny Albes
Affiliation: Fachgebiet System- und Rechnerarchitektur, Leibniz Universität Hannover
Abstract
To reduce the load on the CPU and enable efficient data transfers, many devices, such as network cards and SSDs, access main memory autonomously via Direct Memory Access (DMA). In modern systems, this access is managed by an I/O Memory Management Unit (IOMMU), which provides memory virtualization and isolation for external devices. To share data between devices and processes, the corresponding mappings must be created and kept synchronized in both the MMU and the IOMMU. Existing methods, such as Shared Virtual Memory (SVM) and bounce buffers, are not suitable for every use case or scale poorly with object size.
Morsels are one approach to sharing memory efficiently between processes. Traditional memory management via paging divides memory into individual 4 KiB pages, which incurs a high management overhead for large amounts of memory. Morsels reduce this overhead by treating page-table subtrees as indivisible virtual-memory objects that can be shared between processes simply by mounting them.
This work extends the morsel concept to the IOMMU and demonstrates it with a prototype implementation for the AMD IOMMU in the Linux kernel. To this end, a morsel's page tables are shared between the MMU and the IOMMU, which is possible because the two use compatible page-table formats. The evaluation of the prototype shows that morsels can map and unmap memory on the IOMMU several orders of magnitude faster than existing methods. Large memory objects benefit in particular, since morsel operations run in constant time regardless of object size. In the evaluated scenarios, this reduces kernel time from 20 percent to well under 5 percent.
Because morsels minimize page-table modifications, they also scale better in parallel scenarios. Compared to conventional DMA buffers in Linux, morsels enable new use cases, since they can be mapped on the IOMMU without first being mapped into a process address space.
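To make the constant-time claim concrete, the following Python sketch models a simplified, x86-64-style four-level radix page table (512 entries per table, 4 KiB pages). It is a toy under stated assumptions, not the prototype's implementation: all names (`Table`, `AddressSpace`, `mount`, `unmount`, `make_morsel`) and the addresses are hypothetical, and the model omits permission bits, the actual AMD IOMMU entry format, and TLB/IOTLB invalidation. It contrasts mapping a 2 MiB buffer page by page with mounting a prebuilt page-table subtree, and shows one subtree linked into both a CPU-side and an IOMMU-side table.

```python
# Hypothetical toy model of morsel-style page-table sharing (not the prototype).
ENTRIES = 512                      # entries per table, as in x86-64 4-level paging
PAGE_SHIFT = 12                    # 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1
LEVELS = 4                         # level 3 = top-level table, level 0 = leaf table


class Table:
    """One page-table node; each slot is None, a child Table, or a leaf PA."""

    def __init__(self):
        self.entries = [None] * ENTRIES


def index(va, level):
    """Table index that `va` selects at `level`."""
    return (va >> (PAGE_SHIFT + 9 * level)) & (ENTRIES - 1)


class AddressSpace:
    """Stands in for a process page table (MMU) or an IOMMU domain."""

    def __init__(self):
        self.root = Table()
        self.writes = 0            # page-table entries written so far

    def _walk(self, va, stop, alloc):
        """Descend to the table at level `stop`; optionally allocate missing tables."""
        t = self.root
        for level in range(LEVELS - 1, stop, -1):
            i = index(va, level)
            if t.entries[i] is None:
                if not alloc:
                    return None
                t.entries[i] = Table()
                self.writes += 1
            t = t.entries[i]
        return t

    def map_page(self, va, pa):
        """Conventional paging: one leaf entry per 4 KiB page."""
        self._walk(va, 0, alloc=True).entries[index(va, 0)] = pa
        self.writes += 1

    def mount(self, va, subtree, level):
        """Link a prebuilt subtree whose root table sits at `level` with a
        single entry write in its parent table, regardless of its size."""
        if va % (1 << (PAGE_SHIFT + 9 * (level + 1))):
            raise ValueError("va must be aligned to the subtree's span")
        self._walk(va, level + 1, alloc=True).entries[index(va, level + 1)] = subtree
        self.writes += 1

    def unmount(self, va, level):
        """Detach a mounted subtree with a single entry write."""
        parent = self._walk(va, level + 1, alloc=False)
        if parent is not None:
            parent.entries[index(va, level + 1)] = None
            self.writes += 1

    def translate(self, va):
        """Return the physical address for `va`, or None if unmapped."""
        leaf = self._walk(va, 0, alloc=False)
        pa = None if leaf is None else leaf.entries[index(va, 0)]
        return None if pa is None else pa | (va & PAGE_MASK)


def make_morsel(phys_base):
    """Build a 2 MiB subtree: one leaf table mapping 512 contiguous frames."""
    leaf = Table()
    for i in range(ENTRIES):
        leaf.entries[i] = phys_base + (i << PAGE_SHIFT)
    return leaf


VA, PA = 0x1000_0000, 0x4000_0000  # hypothetical 2 MiB-aligned addresses

conventional = AddressSpace()      # map the buffer page by page
for i in range(ENTRIES):
    conventional.map_page(VA + (i << PAGE_SHIFT), PA + (i << PAGE_SHIFT))

morsel = make_morsel(PA)           # build once, then mount in two tables
cpu, iommu = AddressSpace(), AddressSpace()
cpu.mount(VA, morsel, level=0)
iommu.mount(VA, morsel, level=0)
print(conventional.writes, cpu.writes, iommu.writes)  # 515 3 3

morsel.entries[1] = 0x5000_0000    # one update is visible to both tables
print(hex(cpu.translate(VA + 0x1234)), hex(iommu.translate(VA + 0x1234)))

cpu.unmount(VA, level=0)           # the IOMMU mapping is unaffected
print(cpu.translate(VA), hex(iommu.translate(VA)))  # None 0x40000000
```

In this model, page-by-page mapping costs one leaf write per 4 KiB page plus any missing intermediate tables, whereas a mount or unmount costs one link write plus any missing upper-level tables, independent of the subtree's size. For a subtree rooted one level higher (covering 1 GiB), page-by-page mapping would need 262,144 leaf writes, while mounting still needs a single link write. Because both tables reference the same subtree object, an update to a leaf entry is seen by both translators, which loosely mirrors sharing a morsel's page tables between the MMU and the IOMMU.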