High End Computing (HEC) has been growing dramatically over the past decade. Both High
Performance Computing (HPC) and Cloud Computing clusters are becoming increasingly homogeneous
in their hardware platforms. Meanwhile, both domains face a common challenge: the IO subsystem
has become a performance bottleneck.
This dissertation aims to design an efficient IO middleware that can largely mitigate the
aforementioned IO bottleneck. We first design a write-aggregation scheme to reduce IO overhead
during checkpoint/restart (C/R) activities. Building on that scheme, we design a hierarchical data
staging framework that substantially reduces the time cost of C/R. To tackle the performance issue in Process Migration, we
propose a new protocol, Pipelined Process Migration with RDMA (PPMR), that fully pipelines
data writing, data transfer, and data read operations during different phases of a migration cycle.
Additionally, we explore how to integrate Solid State Disks (SSDs) into the IO stack to leverage their
superior performance. We extend the SSD Flash Translation Layer to provide a new IO primitive
called Atomic Write, which significantly improves database performance by removing the
overhead that a database otherwise incurs to guarantee atomic completion of a group of discrete
IO requests. We also propose an SSD-Assisted Hybrid Memory that extends RAM with SSD, making a
large memory space available as an efficient data caching layer for datacenters.
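
As a rough illustration of the write-aggregation idea mentioned above, the following minimal sketch coalesces many small writes into fewer large ones before they reach the storage layer. It is not the scheme developed in this dissertation: the class name WriteAggregator, the threshold parameter, and the flush policy are all hypothetical choices made only for this example.

```python
import io


class WriteAggregator:
    """Coalesce many small writes into fewer large writes (illustrative only)."""

    def __init__(self, sink, threshold=4 * 1024 * 1024):
        self.sink = sink            # any object exposing write(bytes)
        self.threshold = threshold  # hypothetical flush threshold, in bytes
        self.chunks = []            # pending small writes, in arrival order
        self.buffered = 0           # number of bytes currently pending
        self.flushes = 0            # number of large writes issued to the sink

    def write(self, data):
        # Buffer the small write instead of issuing it immediately.
        self.chunks.append(bytes(data))
        self.buffered += len(data)
        if self.buffered >= self.threshold:
            self.flush()
        return len(data)

    def flush(self):
        # Issue all pending data to the sink as one large write.
        if self.buffered == 0:
            return
        self.sink.write(b"".join(self.chunks))
        self.chunks.clear()
        self.buffered = 0
        self.flushes += 1


if __name__ == "__main__":
    sink = io.BytesIO()
    agg = WriteAggregator(sink, threshold=1024)
    for _ in range(100):
        agg.write(b"x" * 64)  # 100 small writes of 64 bytes each
    agg.flush()               # drain whatever is still buffered
    print(agg.flushes, len(sink.getvalue()))
```

In this toy setting the sink receives far fewer write calls than the application issues. A real checkpointing path would also have to handle ordering across processes, durability, and failure during a flush, none of which this sketch attempts.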