Locality Optimizations for Regular and Irregular Applications

Rajbhandari, Samyam

2016, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
The fastest supercomputer in the world as of July 2016 is the Sunway TaihuLight, which achieves a staggering 93 petaflops. This performance comes from massive parallelism. Today's supercomputers and compute clusters have tens of thousands of distributed-memory nodes, each consisting of several shared-memory multi-core or many-core processors. Scaling on these massively parallel systems is not an easy task. A major performance and scalability bottleneck is limited data movement bandwidth, which can be orders of magnitude lower than computation bandwidth. Developing applications that scale on these systems therefore requires minimizing the volume of data moved at each level of the memory hierarchy using locality optimization techniques. Locality optimization aims to reduce data movement between slow and fast memory by rescheduling or remapping the original computation so that data, once in fast memory, is reused there, thereby avoiding subsequent transfers of the same data from slow memory. This dissertation explores multiple aspects of locality optimization for enhancing the scalability and performance of various regular and irregular applications in massively parallel computing environments. It develops distributed algorithms, lower bound techniques, and compiler and runtime frameworks for optimizing Tensor Contractions, the Four-Index Transform, Convolutional Neural Networks (CNNs), and Recursive Tree Traversal on k-d trees. Each of these application domains is limited in performance and scalability primarily by data movement costs at a particular level of the memory hierarchy. Specifically, on a massively parallel system, distributed Tensor Contractions can have limited scalability due to the cost of communication between distributed-memory nodes.
The Four-Index Transform, on the other hand, can be limited in the size of the largest problem that can be completed in a reasonable amount of time due to the cost of transferring data from disk to memory. On a multi-core CPU, the state-of-the-art approach for training CNNs can have limited scalability and performance due to relatively large data movement between the L3 and L2 caches. Similarly, Recursive Tree Traversal programs on k-d trees are limited in performance and scalability by memory and cache bandwidth. This thesis develops solutions primarily for reducing these data movement costs. The solutions developed in this thesis improve the performance and scalability of the above computations, resulting in overall speedups ranging from 4x to more than 10x over the state of the art on the target systems.
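As a generic illustration of the rescheduling idea described in the abstract (reuse data while it is in fast memory), the sketch below applies loop tiling, a standard textbook locality optimization, to matrix multiplication. It is not one of the dissertation's algorithms; the function names and the `tile` parameter are hypothetical. The tiled version performs exactly the same multiply-adds in a different order, so that a `tile`-sized block of `B` is revisited for every row of `A` in the current row block while it would still fit in a fast memory of suitable size.

```python
def matmul_naive(A, B):
    """Reference i-j-k matrix product of A (n x m) and B (m x p)."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_tiled(A, B, tile=32):
    """Hypothetical tiled product: same arithmetic, reordered into blocks.

    Loops ii/kk/jj walk over tile x tile blocks; the inner loops work
    entirely within one block of A, B and C, so those blocks are reused
    many times before the computation moves on (the locality idea).
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # Process one block; min() handles ragged edge blocks.
                for i in range(ii, min(ii + tile, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + tile, m)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + tile, p)):
                            Ci[j] += a * Bk[j]
    return C
```

For a fixed output element `C[i][j]`, both versions accumulate over `k` in ascending order, so with integer inputs they return identical results. In pure Python, interpreter overhead dominates, so the sketch shows the access pattern rather than a measurable cache speedup. The same reordering in a compiled language is where tiling typically pays off.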
P. Sadayappan (Advisor)
273 p.

Recommended Citations

  • Rajbhandari, S. (2016). Locality Optimizations for Regular and Irregular Applications [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1469033289

    APA Style (7th edition)

  • Rajbhandari, Samyam. Locality Optimizations for Regular and Irregular Applications. 2016. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1469033289.

    MLA Style (8th edition)

  • Rajbhandari, Samyam. "Locality Optimizations for Regular and Irregular Applications." Doctoral dissertation, Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1469033289

    Chicago Manual of Style (17th edition)