Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects

Potluri, Sreeram

Abstract Details

2014, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Accelerators (such as NVIDIA GPUs) and coprocessors (such as the Intel MIC/Xeon Phi) are fueling the growth of next-generation ultra-scale systems that offer high compute density and high performance per watt. However, these many-core architectures make systems heterogeneous, introducing multiple levels of parallelism and varying computation/communication costs at each level. Application developers likewise use a hierarchy of programming models to extract maximum performance from these heterogeneous systems. Models such as CUDA, OpenCL, and LEO are used to express parallelism across accelerator or coprocessor cores, while higher-level programming models such as MPI and OpenSHMEM are used to express parallelism across a cluster. The presence of multiple programming models, their runtimes, and the varying communication performance at different levels of the system hierarchy have hindered applications from achieving peak performance on these systems.

Modern interconnects such as InfiniBand enable asynchronous communication progress through RDMA, freeing the cores to do useful computation. MPI and PGAS models offer one-sided communication primitives that extract maximum performance from these high-performance networks, minimize process synchronization overheads, and enable better computation/communication overlap. However, there is limited literature to guide scientists in taking advantage of these one-sided communication semantics in high-end applications, especially on heterogeneous clusters.

In our work, we present MVAPICH2-GPU, an enhanced model that uses MPI for data movement from both CPU and GPU memories in a unified manner. We also extend the OpenSHMEM PGAS model to support such unified communication. These models considerably simplify data movement in MPI and OpenSHMEM applications running on GPU clusters. We propose designs in the MPI and OpenSHMEM runtimes that optimize data movement on GPU clusters using state-of-the-art GPU technologies such as CUDA IPC and GPUDirect RDMA. Further, we introduce PRISM, a proxy-based multi-channel framework that enables an optimized MPI library for communication on clusters with Intel Xeon Phi coprocessors.

We evaluate our designs using micro-benchmarks, application kernels, and end-applications. We present the redesign of a petascale seismic modeling code to demonstrate the use of one-sided semantics in end-applications and their impact on performance. Finally, we demonstrate the benefits of using one-sided semantics on heterogeneous clusters.
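To make the unified communication model concrete, here is a minimal sketch of a CUDA-aware MPI exchange in the style that MVAPICH2-GPU enables: a device pointer is handed directly to MPI, and the runtime performs the transfer internally (via pipelined copies, CUDA IPC, or GPUDirect RDMA) with no explicit host staging copy. The buffer size, ranks, and tag below are illustrative assumptions, not taken from the dissertation.

    /*
     * Minimal sketch, assuming a CUDA-aware MPI library such as
     * MVAPICH2-GPU: the device pointer is passed directly to MPI.
     */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n = 1 << 20;                 /* 1M floats, arbitrary */
        float *d_buf;
        cudaMalloc((void **)&d_buf, n * sizeof(float));

        if (rank == 0) {
            /* No cudaMemcpy to a host staging buffer is needed: the
             * CUDA-aware runtime recognizes the device pointer. */
            MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }

Similarly, a hedged sketch of the one-sided semantics discussed above, using only standard OpenSHMEM calls: the initiating PE writes into a remote symmetric buffer without the target process participating, which is what allows communication to proceed asynchronously and overlap with computation. The GPU-memory extension summarized in the abstract would additionally allow such buffers to reside in device memory; this sketch stays in host memory.

    /*
     * One-sided put with standard OpenSHMEM: PE 0 writes into PE 1's
     * symmetric buffer without any action by PE 1.
     */
    #include <shmem.h>
    #include <string.h>

    static char buf[4096];   /* symmetric: same object on every PE */

    int main(void)
    {
        shmem_init();

        if (shmem_my_pe() == 0 && shmem_n_pes() > 1) {
            char msg[4096];
            memset(msg, 'x', sizeof(msg));
            shmem_putmem(buf, msg, sizeof(msg), 1); /* one-sided write */
            shmem_quiet();                          /* remote completion */
        }

        shmem_barrier_all(); /* keep PE 1 alive until the put lands */
        shmem_finalize();
        return 0;
    }

Both sketches would typically be launched with two or more processes (e.g., mpirun -np 2 for the MPI case, oshrun -np 2 for OpenSHMEM); the GPU-resident path in the first sketch additionally requires a CUDA-aware MPI build.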
Dhabaleswar K. Panda (Advisor)
Ponnuswamy Sadayappan (Committee Member)
Radu Teodorescu (Committee Member)
Karen Tomko (Committee Member)
209 p.

Recommended Citations


  • Potluri, S. (2014). Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1397797221

    APA Style (7th edition)

  • Potluri, Sreeram. Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects. 2014. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1397797221.

    MLA Style (8th edition)

  • Potluri, Sreeram. "Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects." Doctoral dissertation, Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1397797221

    Chicago Manual of Style (17th edition)