Cloud computing platforms (e.g., Amazon EC2 and Microsoft Azure) have been widely adopted by users and organizations due to their high availability and scalable computing resources. Using virtualization technology, VM or container instances can be constructed on bare-metal hosts in a cloud so that users can run their systems and applications whenever they need computational resources. This has significantly increased the flexibility of resource provisioning in clouds compared to traditional resource management approaches. Cloud computing has also gained momentum in HPC communities, which raises a broad challenge: how to design and build efficient HPC clouds with modern networking technologies and virtualization capabilities on heterogeneous HPC clusters?
Through the convergence of HPC and cloud computing, users can obtain desirable features such as ease of system management, fast deployment, and resource sharing. However, many HPC applications running in the cloud still suffer from low performance, specifically the degraded I/O performance of virtualized I/O devices. Single Root I/O Virtualization (SR-IOV), a hardware-based I/O virtualization standard, has been proposed to address this problem and can achieve near-native I/O performance. However, SR-IOV lacks locality-aware communication support, so communication between co-located VMs or containers cannot leverage shared-memory-based communication mechanisms. To deliver high performance to end HPC applications in the HPC cloud, we present a high-performance locality-aware and NUMA-aware MPI library over SR-IOV enabled InfiniBand clusters, which dynamically detects locality information in VM, container, and even nested virtualization environments and coordinates data movement accordingly. The proposed design improves the performance of NAS by up to 43% over the default SR-IOV based scheme across 32 VMs, while incurring less than 9% overhead compared with native performance. We also evaluate the performance of Singularity, one of the most attractive container technologies for building HPC clouds, in various aspects including processor architectures, advanced interconnects, memory access modes, and virtualization overhead. Singularity shows very little overhead for running MPI-based HPC applications.
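As a simplified illustration of the locality-aware channel selection described above (not the actual library implementation), the following C/MPI sketch classifies each peer as co-located or remote and selects a shared-memory or SR-IOV channel accordingly. The helper read_host_id(), the channel_t enum, and the fallback to MPI_Get_processor_name() are illustrative placeholders; a real design would obtain a physical-host identifier from, e.g., an IVShmem control region so that VMs on the same host can recognize each other.

```c
/* Illustrative sketch: per-peer locality detection to choose between a
 * shared-memory (IVShmem-backed) channel for co-located peers and the
 * SR-IOV virtual function channel otherwise. Names are hypothetical. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HOST_ID_LEN MPI_MAX_PROCESSOR_NAME

typedef enum { CHANNEL_SHMEM, CHANNEL_SRIOV } channel_t;

/* Hypothetical helper: obtain an identifier of the underlying physical host.
 * Here we fall back to the processor name, which only captures same-guest
 * locality; a real design would expose a host-level identifier to the guest. */
static void read_host_id(char *buf)
{
    int len = 0;
    memset(buf, 0, HOST_ID_LEN);
    MPI_Get_processor_name(buf, &len);
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char my_id[HOST_ID_LEN];
    read_host_id(my_id);

    /* Exchange host identifiers so every rank can classify every peer. */
    char *all_ids = malloc((size_t)size * HOST_ID_LEN);
    MPI_Allgather(my_id, HOST_ID_LEN, MPI_CHAR,
                  all_ids, HOST_ID_LEN, MPI_CHAR, MPI_COMM_WORLD);

    channel_t *channel = malloc((size_t)size * sizeof(channel_t));
    for (int peer = 0; peer < size; peer++) {
        int co_located = (strncmp(my_id, all_ids + (size_t)peer * HOST_ID_LEN,
                                  HOST_ID_LEN) == 0);
        channel[peer] = co_located ? CHANNEL_SHMEM : CHANNEL_SRIOV;
    }

    if (rank == 0)
        printf("rank 0 -> rank %d uses %s channel\n", size - 1,
               channel[size - 1] == CHANNEL_SHMEM ? "shared-memory" : "SR-IOV");

    free(all_ids);
    free(channel);
    MPI_Finalize();
    return 0;
}
```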
SR-IOV provides efficient sharing of high-speed interconnect resources and achieves near-native I/O performance. However, SR-IOV based virtual networks prevent VM migration, an essential virtualization capability for high flexibility and availability. Although several initial solutions have been proposed in the literature, these approaches still carry many restrictions, such as depending on specific network adapters and/or hypervisors, which limits their applicability in HPC environments. In this thesis, we propose a high-performance, hypervisor-independent and InfiniBand driver-independent VM migration framework for MPI applications on SR-IOV enabled InfiniBand clusters, which not only achieves fast VM migration but also maintains high performance for MPI applications during migration in the HPC cloud. The evaluation results indicate that our proposed design can completely hide the migration overhead by overlapping computation with migration.
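The stub-based sketch below (hypothetical, not the framework's code) outlines the kind of coordination sequence that enables such overlap: MPI channels are suspended and the SR-IOV virtual function detached before the hypervisor migrates the VM, communication-free computation continues on a separate thread in the meantime, and channels are re-established afterwards. All functions are placeholders for runtime and hypervisor actions.

```c
/* Minimal sketch of computation/migration overlap; all actions are stubs. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static volatile int migration_done = 0;

/* Stubs standing in for runtime/hypervisor actions. */
static void suspend_mpi_channels(void)  { puts("suspend IB channels, drain in-flight messages"); }
static void detach_sriov_vf(void)       { puts("hot-detach SR-IOV virtual function"); }
static void live_migrate_vm(void)       { puts("hypervisor performs live migration"); sleep(1); }
static void reattach_sriov_vf(void)     { puts("hot-attach VF on destination host"); }
static void resume_mpi_channels(void)   { puts("re-establish IB channels, resume communication"); }

/* Computation that needs no communication proceeds during migration. */
static void *compute_phase(void *arg)
{
    (void)arg;
    while (!migration_done)
        ;  /* placeholder for local computation kernels */
    puts("computation phase continued throughout migration");
    return NULL;
}

int main(void)
{
    pthread_t worker;
    pthread_create(&worker, NULL, compute_phase, NULL);

    suspend_mpi_channels();
    detach_sriov_vf();
    live_migrate_vm();
    reattach_sriov_vf();
    resume_mpi_channels();

    migration_done = 1;
    pthread_join(worker, NULL);
    return 0;
}
```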
In addition, resource management and scheduling systems such as Slurm and PBS are widely used in modern HPC clusters. To build efficient HPC clouds, critical HPC resources, such as SR-IOV enabled virtual devices and inter-VM shared memory (IVShmem) devices, need to be properly enabled and isolated among VMs. We thus propose a novel framework, Slurm-V, which extends Slurm with virtualization-oriented capabilities to efficiently run multiple concurrent MPI jobs on HPC clusters. The proposed Slurm-V framework shows good scalability and efficiently runs concurrent MPI jobs on SR-IOV enabled InfiniBand clusters. To the best of our knowledge, Slurm-V is the first attempt to extend Slurm to support running concurrent MPI jobs with isolated SR-IOV and IVShmem resources.
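One possible hook point for such per-job setup and teardown, sketched below purely for illustration, is a Slurm SPANK plugin whose job prolog attaches an SR-IOV VF and an IVShmem device and whose epilog releases them. The attach/release helpers are hypothetical stubs; this is not the actual Slurm-V implementation, only an example of how Slurm can be extended at job boundaries.

```c
/* Hedged sketch of a Slurm SPANK plugin for per-job virtualization setup.
 * Helper functions are hypothetical stubs. */
#include <slurm/spank.h>

SPANK_PLUGIN(slurm_v_sketch, 1);

/* Hypothetical helpers: a real design would call into libvirt/OpenStack or
 * hotplug scripts to attach devices to the job's VMs. */
static int attach_sriov_vf(void)   { slurm_info("slurm_v_sketch: attach SR-IOV VF");   return 0; }
static int attach_ivshmem(void)    { slurm_info("slurm_v_sketch: attach IVShmem");     return 0; }
static int release_resources(void) { slurm_info("slurm_v_sketch: release VF/IVShmem"); return 0; }

/* Runs on each compute node before the job's tasks are launched. */
int slurm_spank_job_prolog(spank_t sp, int ac, char **av)
{
    (void)sp; (void)ac; (void)av;
    if (attach_sriov_vf() != 0 || attach_ivshmem() != 0)
        return ESPANK_ERROR;
    return ESPANK_SUCCESS;
}

/* Runs on each compute node after the job completes. */
int slurm_spank_job_epilog(spank_t sp, int ac, char **av)
{
    (void)sp; (void)ac; (void)av;
    release_resources();
    return ESPANK_SUCCESS;
}
```

Such a plugin would be built as a shared object and registered in Slurm's plugstack.conf; the Slurm-V framework itself explores more general virtualization-oriented extensions.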
On heterogeneous HPC clusters, GPUs have achieved significant success in accelerating parallel applications. In addition to highly optimized computation kernels on GPUs, the cost of data movement on GPU clusters plays a critical role in delivering high performance for end applications. Our studies show a significant demand for high-performance, cloud-aware GPU-to-GPU communication schemes that deliver near-native performance on clouds. We propose C-GDR, a high-performance Cloud-aware GPUDirect communication scheme for RDMA networks. It allows the communication runtime to detect process locality, GPU residency, NUMA architecture information, and communication patterns to enable intelligent and dynamic selection of the best communication and data-movement schemes on GPU-enabled clouds. Our evaluations show that C-GDR outperforms the default scheme by up to 26% on HPC applications.
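The sketch below illustrates, with invented enum names and thresholds, the kind of decision logic such a runtime could apply: co-located GPUs with peer access use an IPC path, NUMA-friendly placements with small or medium messages use GPUDirect RDMA, and the remaining cases fall back to host-staged pipelining. It is not the C-GDR implementation.

```c
/* Illustrative scheme-selection sketch; enum names and thresholds are
 * invented for the example only. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef enum {
    SCHEME_CUDA_IPC,        /* peer-to-peer copy between co-located GPUs    */
    SCHEME_GDR_RDMA,        /* GPUDirect RDMA over the (SR-IOV) HCA         */
    SCHEME_HOST_STAGED      /* pipeline through host (NUMA-aware) buffers   */
} scheme_t;

typedef struct {
    bool   same_host;       /* peers co-located on the same physical host?  */
    bool   gpu_peer_access; /* can the two GPUs access each other's memory? */
    bool   hca_near_gpu;    /* HCA and GPU on the same NUMA node/PCIe root? */
    size_t msg_size;        /* message size in bytes                        */
} path_info_t;

static scheme_t select_scheme(const path_info_t *p)
{
    const size_t gdr_cutoff = 128 * 1024;   /* illustrative threshold */

    if (p->same_host && p->gpu_peer_access)
        return SCHEME_CUDA_IPC;             /* avoid the network entirely    */
    if (p->hca_near_gpu && p->msg_size <= gdr_cutoff)
        return SCHEME_GDR_RDMA;             /* small/medium: direct GPU RDMA */
    return SCHEME_HOST_STAGED;              /* large or NUMA-unfriendly path */
}

int main(void)
{
    path_info_t p = { .same_host = false, .gpu_peer_access = false,
                      .hca_near_gpu = true, .msg_size = 64 * 1024 };
    printf("selected scheme: %d\n", select_scheme(&p));
    return 0;
}
```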