Scientific computing has seen unprecedented growth in recent years, fueled by the advent
of high-performance interconnects and the emergence of multi-core processors. To
complement growing cluster sizes, researchers have developed a variety of parallel
programming models that harness the power of larger clusters. Popular parallel programming models range from
traditional message passing and shared memory models to newer partitioned global
address space models. MPI, the de facto programming model for distributed-memory
machines, was extended in MPI-2 to support two new programming paradigms: the
dynamic process management interface and the remote memory access interface.
The MPI-2 dynamic process management interface gives MPI applications
the flexibility to alter the scale of a job dynamically by spawning
new processes, making way for a master/slave paradigm in MPI.
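As a minimal illustration of this paradigm (a sketch using only standard MPI-2 calls, not code from the thesis), the master below launches workers at run time with MPI_Comm_spawn and hands one a task over the resulting intercommunicator; the executable name "worker" and the worker count are hypothetical placeholders:

    /* Master-side sketch of MPI-2 dynamic process management: spawn
     * workers at run time and talk to them over an intercommunicator.
     * The executable name "worker" is a hypothetical placeholder. */
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        MPI_Comm workers;                 /* intercommunicator to slaves */
        int errcodes[4];
        int rank, task = 42;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Grow the job: launch 4 new worker processes. */
        MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &workers, errcodes);

        /* Master/slave hand-off: send a task descriptor to worker 0.
         * In an intercommunicator, rank 0 names a process in the
         * remote (worker) group. */
        if (rank == 0)
            MPI_Send(&task, 1, MPI_INT, 0, 0, workers);

        MPI_Comm_disconnect(&workers);
        MPI_Finalize();
        return 0;
    }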
The MPI-2 remote memory access interface gives applications the illusion of
globally accessible memory. In this thesis, we study these two
MPI-2 programming interfaces and propose optimized designs for them.
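A minimal sketch of this one-sided model, again using only standard MPI-2 calls rather than any thesis-specific code: rank 0 deposits a value directly into rank 1's exposed window, with no matching receive on rank 1 (run with at least 2 ranks):

    /* Sketch of the MPI-2 one-sided interface: each process exposes a
     * buffer through a window; rank 0 writes into rank 1's memory with
     * MPI_Put. */
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, value = 42, buf = 0;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Expose 'buf' for remote access by all ranks. */
        MPI_Win_create(&buf, sizeof(buf), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);               /* open access epoch        */
        if (rank == 0)
            MPI_Put(&value, 1, MPI_INT,      /* origin buffer            */
                    1, 0, 1, MPI_INT, win);  /* target rank 1, disp 0    */
        MPI_Win_fence(0, win);               /* close epoch: put visible */

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }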
We design a low-overhead dynamic process management interface based on a connectionless
transport and demonstrate its effectiveness using benchmarks. We also address the design of
the remote memory access interface on onloaded InfiniBand using a DMA copy offload; our
design provides computation-copy overlap and minimal cache pollution.
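To make the overlap claim concrete, the following hedged sketch shows the application-level pattern such a design targets, using only standard MPI-2 calls (the buffer sizes are illustrative; run with at least 2 ranks): the put is issued, independent computation proceeds while the copy is in flight, and the transfer need only be complete at the closing fence:

    /* Application-level view of computation-copy overlap: the origin
     * issues an MPI_Put and keeps computing on independent data while
     * the copy proceeds (e.g., via a DMA engine); the transfer must
     * only be complete at the closing fence. */
    #include <mpi.h>

    #define COUNT 1024

    int main(int argc, char *argv[])
    {
        static double src[COUNT], dst[COUNT], other[COUNT];
        int rank;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Win_create(dst, sizeof(dst), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);
        if (rank == 0)
            MPI_Put(src, COUNT, MPI_DOUBLE, 1, 0, COUNT, MPI_DOUBLE, win);

        /* Work touching neither RMA buffer overlaps the in-flight copy. */
        for (int i = 0; i < COUNT; i++)
            other[i] = other[i] * 2.0 + 1.0;

        MPI_Win_fence(0, win);               /* put completes by here */

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }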
The proposed designs are implemented and evaluated on InfiniBand, a modern interconnect that provides
a rich set of features. The designs developed as part of this thesis are available
in MVAPICH2, a popular open-source implementation of MPI over InfiniBand used by over
900 organizations.