Designing Efficient MPI and UPC Runtime for Multicore Clusters with InfiniBand, Accelerators and Co-Processors

Abstract

2013, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
High End Computing (HEC) has grown dramatically over the past decades. Emerging multi-core systems, heterogeneous architectures, and interconnects introduce both challenges and opportunities for improving the performance of communication middleware and applications. The increasing number of processor cores and co-processors results not only in heavy contention on communication resources, but also in far more complicated communication patterns. The Message Passing Interface (MPI) has been the dominant parallel programming model for HPC applications over the past two decades, and it has been very successful for regular, iterative parallel algorithms with well-defined communication patterns. For irregular applications with dynamic communication patterns, however, the Partitioned Global Address Space (PGAS) programming model provides a more flexible way to express parallelism. Different variations and combinations of these programming models present new challenges in designing optimized runtimes, in terms of efficient sharing of networking resources, efficient work-stealing techniques for balancing computational load across threads/processes, and so on. Middleware plays a key role in delivering the benefits of new hardware techniques and in supporting the new requirements of applications and programming models.

This dissertation studies several critical contention problems in existing runtimes that support popular parallel programming models (MPI and UPC) on emerging multi-core/many-core systems. We start with the shared-memory contention problem within an existing MPI runtime. We then explore the node-level network throughput congestion issue in the Unified Parallel C (UPC) runtime. We propose and implement lock-free multi-threaded runtimes with multi-endpoint support for MPI/OpenMP and for UPC. Building on the multi-endpoint design, we further explore how to enhance MPI/OpenMP applications with transparent support for collective operations and minimal modifications for point-to-point operations. Finally, we extend our multi-endpoint research to GPU and MIC architectures for UPC and explore their performance characteristics.

Software developed as part of this dissertation is available in MVAPICH2 and MVAPICH2-X. MVAPICH2 is a popular open-source implementation of MPI over InfiniBand, used by hundreds of top computing sites around the world. MVAPICH2-X, based on the MVAPICH2 stack, supports hybrid MPI and UPC programming models on InfiniBand clusters.
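To make the contention problem concrete, below is a minimal MPI+OpenMP hybrid sketch (illustrative only; it is not code from the dissertation or from MVAPICH2). When a process is initialized with MPI_THREAD_MULTIPLE, every OpenMP thread issues communication through the process's single shared MPI endpoint, so the library must serialize access to shared communication state internally; a multi-endpoint runtime of the kind studied here instead gives each thread its own endpoint, removing that serialization.

    /* Minimal MPI+OpenMP hybrid sketch of the shared-endpoint contention
     * pattern.  Illustrative example only, not the dissertation's runtime.
     * Build (typical):  mpicc -fopenmp contention.c -o contention
     */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, size;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (provided < MPI_THREAD_MULTIPLE) {
            if (rank == 0)
                fprintf(stderr, "MPI_THREAD_MULTIPLE not supported\n");
            MPI_Finalize();
            return 1;
        }

        /* Every thread exchanges a message with the peer process.  All of
         * these calls go through the process's single endpoint, which is
         * exactly the shared resource that per-thread (multi-endpoint)
         * designs avoid contending for. */
        int peer = (rank + 1) % size;
        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            int buf = tid;
            /* Tag by thread id so messages match up between the same
             * thread pair on each process. */
            MPI_Sendrecv_replace(&buf, 1, MPI_INT, peer, tid, peer, tid,
                                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }

With distinct tags per thread the message matching is unambiguous, but throughput still degrades as thread count grows because all threads funnel through one endpoint; this is the node-level bottleneck that motivates the multi-endpoint designs described above.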
Dhabaleswar K. Panda (Advisor)
P. Sadayappan (Committee Member)
Radu Teodorescu (Committee Member)
192 p.

Recommended Citations

APA Style (7th edition):
  • Luo, M. (2013). Designing Efficient MPI and UPC Runtime for Multicore Clusters with InfiniBand, Accelerators and Co-Processors [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1374197706

MLA Style (8th edition):
  • Luo, Miao. Designing Efficient MPI and UPC Runtime for Multicore Clusters with InfiniBand, Accelerators and Co-Processors. 2013. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1374197706.

Chicago Manual of Style (17th edition):
  • Luo, Miao. "Designing Efficient MPI and UPC Runtime for Multicore Clusters with InfiniBand, Accelerators and Co-Processors." Doctoral dissertation, Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1374197706