Scalable and High Performance Collective Communication for Next Generation Multicore Infiniband Clusters

Mamidala, Amith Rajith

2008, Doctor of Philosophy, Ohio State University, Computer and Information Science.

High Performance Computing is enabling rapid innovations spanning several key areas, ranging from science, technology and manufacturing disciplines to entertainment and financial markets. One computing paradigm contributing significantly to the outreach of such capabilities is Cluster Computing. Cluster computing involves the use of multiple commodity PCs interconnected by a network to provide the required computational resource in a cost-effective manner. Recently, commodity clusters have been rapidly transforming into capability-class machines, with several of them featuring in the Top 10 list of supercomputers. The two primary drivers of this trend are: a) the advent of Multicore technology and b) the performance and scalability of InfiniBand, an open-standard interconnection network. These two factors are ushering in an era of ultra-scale InfiniBand Multicore clusters comprising tens of thousands of compute cores.

The Message Passing Interface (MPI) is the most popular model for programming parallel applications. In this model, communication occurs via explicit exchange of data messages. MPI provides a plethora of communication primitives, of which the Collective primitives are especially significant. These are extensively used in a variety of scientific and engineering applications (for example, to compute fast Fourier transforms and to multiply large matrices). It is imperative that these collectives be designed efficiently to ensure good performance and scalability. MPI collectives pose several challenges and requirements in terms of guaranteeing data reliability, enabling efficient and scalable data transfers, and providing mechanisms for tolerating process skew. Moreover, the characteristics of the underlying network and multicore systems directly impact the behavior of the collective operations and need to be taken into consideration when optimizing performance and resource usage.
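As an illustration of what a collective primitive does, the following is a minimal sketch that simulates an allreduce (every process contributes a value and every process receives the combined result) using the well-known recursive-doubling pattern. It is a single-process simulation written for this page, not MPI code and not taken from the dissertation; the function name `allreduce_recursive_doubling` and the restriction to a power-of-two number of ranks are assumptions made to keep the example short.

```python
def allreduce_recursive_doubling(values, op=lambda a, b: a + b):
    """Simulate an allreduce over len(values) ranks.

    values[r] is the contribution of rank r. Returns a pair
    (results, rounds): results[r] is what rank r holds at the end,
    rounds is the number of pairwise exchange steps performed.
    Assumption: the rank count is a power of two and op is
    commutative and associative (e.g. addition).
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("rank count must be a non-zero power of two")

    current = list(values)
    distance = 1
    rounds = 0
    while distance < p:
        # In each round, rank r exchanges its partial result with
        # partner r XOR distance, and both combine the two values.
        current = [op(current[r], current[r ^ distance]) for r in range(p)]
        distance *= 2
        rounds += 1
    return current, rounds


if __name__ == "__main__":
    results, rounds = allreduce_recursive_doubling([1, 2, 3, 4, 5, 6, 7, 8])
    print(results, rounds)
```

With eight ranks the exchange completes in three rounds (log2 of the rank count), which hints at why the choice of algorithm matters for scalability as clusters grow.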

In this dissertation, we take on these challenges to design a Scalable and High Performance Collective Communication subsystem for MPI over InfiniBand Multicore clusters. The central theme of our approach is to gain an in-depth understanding of the capabilities of the underlying network and system architecture and to leverage these to provide optimal design alternatives. Specifically, the dissertation describes novel communication protocols and algorithms utilizing a) InfiniBand's hardware Multicast and RDMA capabilities and b) the system's shared memory to meet the stated requirements and challenges. The collective optimizations discussed in the dissertation also take into account the different transport methods of InfiniBand and the architectural attributes of Multicore systems. The designs proposed in the dissertation have been incorporated into the open source MVAPICH software, which is used by more than 680 organizations worldwide. It is deployed in several cluster installations and is currently used by the world's third-fastest supercomputer.
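To make the idea of exploiting shared memory on multicore nodes concrete, here is a minimal sketch of a generic two-level (hierarchical) allreduce: cores on the same node first combine their values locally, one leader per node takes part in the inter-node step, and the result is then shared back within each node. This is a common pattern offered only as an illustration under stated assumptions; it is a single-process simulation, the function name `hierarchical_allreduce` is hypothetical, and it does not reproduce the protocols proposed in the dissertation or the MVAPICH implementation.

```python
def hierarchical_allreduce(node_values):
    """Simulate a two-level sum-allreduce.

    node_values is a list with one inner list per node; each inner
    list holds one value per core on that node. Returns a structure
    of the same shape in which every core holds the global sum.
    """
    # Step 1 (intra-node): cores on a node combine their values.
    # On real hardware this step could go through shared memory.
    node_partials = [sum(core_values) for core_values in node_values]

    # Step 2 (inter-node): only one leader per node communicates
    # over the network, so the network sees one value per node.
    global_sum = sum(node_partials)

    # Step 3 (intra-node): each leader shares the result with the
    # other cores on its node.
    return [[global_sum] * len(core_values) for core_values in node_values]


if __name__ == "__main__":
    print(hierarchical_allreduce([[1, 2], [3, 4], [5, 6]]))
```

The point of the split is that only one participant per node takes part in the network step, regardless of how many cores each node has.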

Dhabaleswar K Panda, PhD (Advisor)
P Sadayappan (Committee Member)
Feng Qin (Committee Member)
163 p.

Recommended Citations

  • Mamidala, A. R. (2008). Scalable and High Performance Collective Communication for Next Generation Multicore Infiniband Clusters [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1211426632

    APA Style (7th edition)

  • Mamidala, Amith Rajith. Scalable and High Performance Collective Communication for Next Generation Multicore Infiniband Clusters. 2008. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1211426632.

    MLA Style (8th edition)

  • Mamidala, Amith Rajith. "Scalable and High Performance Collective Communication for Next Generation Multicore Infiniband Clusters." Doctoral dissertation, Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=osu1211426632

    Chicago Manual of Style (17th edition)