SHF: Small: Enabling Efficient Context Switching and Effective Latency Hiding in GPUs
Project runs from 08/01/2016 to 07/31/2019
This project investigates novel ways to enable efficient preemption and effective latency hiding in single-instruction multiple-thread (SIMT) processors such as graphics processing units (GPUs). With the advent of the big data era, the demand for data processing keeps growing. Owing to their high computational throughput and high memory bandwidth, GPUs have been deployed widely, from smartphones and cloud servers to supercomputers. Although virtualization has been introduced to let GPUs serve as a shared resource, significant hurdles remain. First, because of the high number of concurrent threads, GPUs have a large context size. Consequently, state-of-the-art GPUs resort to techniques such as draining, which lets the actively running threads finish before a context switch; this may incur significant delay and fail to meet the required quality of service (QoS). Second, applications commonly fail to fully utilize the computational resources and reach peak performance, for two fundamental reasons. (a) Each thread requires a non-trivial amount of resources, so only a limited number of threads can run concurrently even when the application itself has abundant thread-level parallelism; without a sufficiently large number of threads, the latency-hiding capability of fine-grain multithreading is severely impaired. (b) Long-latency operations, off-chip memory accesses in particular, require a very large number of concurrent threads to hide their latency, yet the on-chip resources cannot accommodate that many threads.
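The tension between per-thread resource demand and latency hiding can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: every hardware number in it (register file size, per-thread register usage, memory latency, instructions between stalls) is an assumption chosen for the example, not the specification of any particular GPU or of the systems studied in this project.

```python
# Illustrative sketch of the two bottlenecks described above.
# All hardware numbers are assumptions, not real GPU specifications.

REGISTERS_PER_SM = 65536        # assumed 32-bit registers per streaming multiprocessor
REGS_PER_THREAD = 96            # assumed per-thread register demand of a kernel
THREADS_PER_WARP = 32
MEMORY_LATENCY_CYCLES = 400     # assumed off-chip memory access latency
INSNS_BETWEEN_ACCESSES = 4      # assumed independent instructions a warp issues
                                # between consecutive memory stalls

# (a) Per-thread resource demand caps concurrency, no matter how much
# thread-level parallelism the application exposes.
max_resident_threads = REGISTERS_PER_SM // REGS_PER_THREAD

# (b) To keep issue slots busy while a warp waits on memory, the scheduler
# needs roughly latency / work-between-stalls other warps ready to run.
warps_needed = MEMORY_LATENCY_CYCLES // INSNS_BETWEEN_ACCESSES
threads_needed = warps_needed * THREADS_PER_WARP

print(f"threads the register file can hold: {max_resident_threads}")   # 682
print(f"threads needed to hide the latency: {threads_needed}")         # 3200
# Under these assumptions, the threads required for latency hiding (3200)
# far exceed what on-chip resources can hold (682) -- the gap this
# project aims to close.
```

Under these assumed numbers the demand for concurrency outstrips the supply by nearly 5x, which is precisely the mismatch motivating problem (b) above.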