CSR: Small: Middleware Technologies for Multi-Accelerator Clusters
Project runs from 06/15/2018 to 05/31/2021
Computing systems are increasingly heterogeneous, leveraging many-core processors and reconfigurable accelerators alongside general-purpose CPUs. While GPUs have been part of supercomputers for several years, more recently there has been growing interest in adding FPGAs to data centers and high-performance computing clusters. A popular example is Microsoft’s Configurable Cloud, a cloud-scale FPGA-accelerated system (consisting of over 5,000 servers) that originated from Microsoft’s Project Catapult. Meanwhile, to facilitate the adoption of FPGAs, there has been a push toward increasing their programmability through programming models – like OpenCL – originally intended for multi- and many-core architectures. For example, both Xilinx and Intel provide their own OpenCL-to-FPGA development toolchains and runtime systems. This opens the way to the transparent use of heterogeneous devices on single computers and clusters. These software stacks, however, are meant for the use of FPGAs in a dedicated environment. In addition, given this architectural variety, it becomes difficult for end users to select the device best suited to their applications.
In this project, we aim to design and develop middleware technologies enabling the transparent and efficient use of diverse accelerator devices, including FPGAs and GPUs, on computing systems. This research builds on our previous work on the design of scheduling and virtualization techniques for heterogeneous GPU clusters, adding support for FPGAs and for the efficient combined use of these two accelerator devices. Specifically, this work will target the following issues: first, the design of scheduling techniques for mapping parallel applications onto heterogeneous devices, possibly by transparently distributing them across multiple accelerators; second, the design of a performance model for predicting the suitability of different compute kernels to different accelerators; third, the design of memory unification techniques to provide a simplified view of the underlying distributed memory system; fourth, the study of opportunities to increase FPGA utilization by space- and time-sharing these devices across applications.
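To make the interplay of the first two thrusts concrete, the following is a minimal illustrative sketch (not the project's actual middleware) of a scheduler that uses a simple roofline-style performance model to map each compute kernel to the accelerator with the lowest predicted runtime. All device figures and kernel characteristics below are assumed values chosen for illustration only.

```python
# Illustrative sketch: performance-model-driven kernel-to-accelerator mapping.
# Device numbers and kernel profiles are hypothetical, not measured values.

from dataclasses import dataclass

@dataclass
class Kernel:
    name: str
    flops: float        # total floating-point operations
    bytes_moved: float  # total off-chip data movement (bytes)

@dataclass
class Device:
    name: str
    peak_flops: float   # peak compute throughput (FLOP/s), assumed
    bandwidth: float    # peak memory bandwidth (B/s), assumed

def predicted_runtime(kernel: Kernel, device: Device) -> float:
    """Roofline-style estimate: runtime is bounded by whichever of
    compute or memory traffic takes longer on this device."""
    return max(kernel.flops / device.peak_flops,
               kernel.bytes_moved / device.bandwidth)

def schedule(kernels, devices):
    """Greedily map each kernel to the device with the lowest prediction."""
    return {k.name: min(devices, key=lambda d: predicted_runtime(k, d)).name
            for k in kernels}

devices = [
    Device("gpu",  peak_flops=10e12, bandwidth=900e9),
    Device("fpga", peak_flops=1e12,  bandwidth=1.2e12),  # e.g. HBM-equipped FPGA
]
kernels = [
    Kernel("dense_matmul", flops=2e12, bytes_moved=24e9),  # compute-bound
    Kernel("stream_copy",  flops=1e9,  bytes_moved=8e9),   # memory-bound
]

mapping = schedule(kernels, devices)
print(mapping)  # compute-bound work lands on the GPU, streaming work on the FPGA
```

In the project itself, the performance model would be far richer (accounting for reconfiguration cost, data-transfer overheads, and sharing), but the sketch captures the core idea: a per-device prediction drives transparent placement.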