GPU Resource Sharing in Interactive User Sessions

Felix Grzelka
Hasso-Plattner-Institut

Abstract

Developing deep learning models is a trial-and-error process. Exploratory programming is therefore often used for data preparation, prototyping, and testing models. Most of the time is spent editing and reasoning about code. While most preprocessing can be executed on the CPU, GPU acceleration is preferable even for small experiments to reduce the response time by orders of magnitude. State-of-the-art deep-learning frameworks treat the GPU as dedicated. They tend to over-allocate memory and keep it until the program exits. From a data-center operator's perspective, GPUs are an expensive resource and thus are often shared among different developers on a coarse timescale, giving them exclusive FIFO access for a limited time frame. Editing phases in exploratory programming thus limit the overall throughput of users' compute requests. To overcome this problem, we propose to isolate and automatically manage GPU-intensive sections of exploratory programming sessions. This enables us to interleave the editing and execution phases of different users. We model the underlying queuing problem using a continuous-time Markov chain (CTMC). Using this model, we compare existing exclusive systems with our proposed solution, investigating the impact of (i) the number of users, (ii) the average job length, and (iii) the job arrival rate on (a) throughput, (b) average and worst-case waiting time, (c) average job completion time, and (d) utilization. We provide a prototypical implementation in the form of a plugin for the popular interactive computing environment Jupyter.
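The intuition behind the queuing comparison can be illustrated with a toy discrete-event simulation. This is not the paper's CTMC analysis; the `simulate` helper, its parameters, and the exponential edit/job distributions are hypothetical choices made only for illustration. Each user alternates between an editing phase and a GPU job; under exclusive access the GPU is held for the whole edit-plus-run cycle, whereas under interleaving it is held only while a job actually runs.

```python
import random

def simulate(n_users, mean_edit, mean_job, n_cycles, exclusive, seed=0):
    """Toy comparison of exclusive vs. interleaved GPU access.

    Each user repeatedly edits (exponential think time) and then runs
    one GPU job (exponential service time). Returns GPU utilization.
    Parameters and model are illustrative assumptions, not the paper's.
    """
    rng = random.Random(seed)
    if exclusive:
        # Exclusive FIFO slots: a user holds the GPU for the entire
        # edit+run cycle, so editing time counts as GPU idle-while-held.
        makespan = busy = 0.0
        for _ in range(n_users * n_cycles):
            edit = rng.expovariate(1 / mean_edit)
            job = rng.expovariate(1 / mean_job)
            makespan += edit + job
            busy += job
        return busy / makespan
    # Interleaved: edits of different users overlap in wall-clock time;
    # only the GPU jobs serialize, in order of submission.
    next_ready = [rng.expovariate(1 / mean_edit) for _ in range(n_users)]
    jobs_left = [n_cycles] * n_users
    gpu_free = busy = 0.0
    while any(jobs_left):
        # Serve the job that was submitted earliest (FIFO).
        u = min((i for i in range(n_users) if jobs_left[i]),
                key=lambda i: next_ready[i])
        start = max(next_ready[u], gpu_free)
        job = rng.expovariate(1 / mean_job)
        gpu_free = start + job
        busy += job
        jobs_left[u] -= 1
        # The user starts the next editing phase after the job returns.
        next_ready[u] = gpu_free + rng.expovariate(1 / mean_edit)
    return busy / gpu_free
```

With editing dominating the cycle (say a 60 s mean edit versus a 10 s mean job), the exclusive policy leaves the GPU idle most of the time it is held, while interleaving several users recovers much of that idle capacity.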