CPU-GPU clusters have emerged as a dominant HPC platform, with three of the four fastest supercomputers in the world falling into this category. These environments are popular because of their cost-effectiveness and energy efficiency. The need to exploit both the CPU and the GPU on each node of such platforms has renewed interest in heterogeneous computing [14]. Implementing such a heterogeneous system on a cluster remains a challenge.
At the same time, FREERIDE, a map-reduce-like framework, can be used to develop data-intensive applications efficiently on clusters and multi-core systems, owing to its simplicity and robustness.
In this thesis, we develop a heterogeneous implementation of a Monte Carlo simulation application on a CPU-GPU cluster using FREERIDE, a map-reduce-like framework based on generalized reduction. We show through experiments that this approach supports scalable and efficient implementations of data-intensive applications on a heterogeneous cluster of many-core GPUs and CPUs. Our contributions are twofold: 1) we develop a heterogeneous version of the Monte Carlo application for a distributed environment using the FREERIDE APIs; 2) we present a new approach to load balancing between the CPU and the GPU on a node that makes better use of the computing power of both.
We evaluate our heterogeneous implementation on a cluster. We show almost linear speedup on this cluster relative to execution with 1 CPU core, 1 GPU core, and a combination of 1 CPU core and 1 GPU core, respectively. With the new load balancing technique, our application also achieves a 20% improvement from using CPUs and GPUs simultaneously, compared with the best performance achieved using only one type of resource in the cluster.
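
To make the idea of dividing Monte Carlo work between a CPU and a GPU concrete, the following is a minimal sketch, not the load balancing scheme developed in this thesis. It assumes a static split proportional to measured per-device throughput, and it uses a Monte Carlo estimate of pi as a stand-in application. All names (`split_by_throughput`, `count_hits`, `estimate_pi`) and the rate parameters are hypothetical, and both "devices" are simulated by the same CPU routine. Each device performs a local reduction (a hit count), and the partial results are then combined in a global reduction, in the spirit of generalized reduction.

```python
import random


def split_by_throughput(total_samples, cpu_rate, gpu_rate):
    """Split samples in proportion to measured rates (assumed samples/sec)."""
    if cpu_rate < 0 or gpu_rate < 0 or cpu_rate + gpu_rate == 0:
        raise ValueError("rates must be non-negative and not both zero")
    cpu_samples = int(total_samples * cpu_rate / (cpu_rate + gpu_rate))
    # The GPU takes the remainder so that no sample is lost to rounding.
    return cpu_samples, total_samples - cpu_samples


def count_hits(num_samples, seed):
    """Local reduction: count random points that fall inside the unit quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total_samples, cpu_rate, gpu_rate):
    """Estimate pi with the work divided between two (simulated) devices."""
    cpu_n, gpu_n = split_by_throughput(total_samples, cpu_rate, gpu_rate)
    # Stand-ins for device-specific kernels; a real system would run these concurrently.
    cpu_hits = count_hits(cpu_n, seed=1)
    gpu_hits = count_hits(gpu_n, seed=2)
    # Global reduction: combine the partial hit counts from both devices.
    return 4.0 * (cpu_hits + gpu_hits) / total_samples


if __name__ == "__main__":
    # Assumed rates: the GPU processes samples four times as fast as the CPU.
    print(split_by_throughput(1000, 1.0, 4.0))
    print(estimate_pi(100_000, 1.0, 4.0))
```

In this sketch a device that is four times as fast receives four times as many samples, so both would finish at roughly the same time if the measured rates were accurate. A dynamic scheme that hands out chunks on demand is a common alternative when rates vary during execution.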