Files
dissertation.pdf (2.25 MB)
Performance Optimization of Memory-Bound Programs on Data Parallel Accelerators
Author Info
Sedaghati Mokhtari, Naseraddin
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=osu1452255686
Year and Degree
2016, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Abstract
High-performance applications depend on high utilization of memory bandwidth and computing resources, and data parallel accelerators have proven very effective at providing both when needed. However, memory-bound programs push the limits of system bandwidth, leaving computing resources under-utilized and making execution energy-inefficient. The objective of this research is to investigate opportunities on data parallel accelerators (i.e., SIMD units and GPUs) and to design solutions that improve the performance of three classes of memory-bound applications: stencil computation, sparse matrix-vector multiplication (SpMV), and graph analytics.

This research first focuses on the performance bottlenecks of stencil computations on short-vector SIMD ISAs and presents StVEC, a hardware-based solution that extends the vector ISA to improve data movement and bandwidth utilization. StVEC extends the standard addressing mode of vector floating-point instructions in contemporary vector ISAs (e.g., SSE, AVX, VMX). A code generation approach is designed and implemented to help a vectorizing compiler generate code for processors with StVEC extensions. Using both an optimistic and a pessimistic emulation of the proposed StVEC instructions, the proposed solution is shown to be effective on top of SSE- and AVX-capable processors. To analyze hardware overhead, parts of the proposed design are synthesized using a 45nm CMOS library and shown to have minimal impact on processor cycle time.

As the second class of memory-bound programs, this research focuses on SpMV on GPUs and shows that no sparse matrix representation is consistently superior: the best representation depends on the matrix sparsity pattern. This part considers four standard sparse representations (i.e., CSR, ELL, COO, and a hybrid ELL-COO) and studies the correlations between SpMV performance and sparsity features.
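To make the four representations named above concrete, here is a minimal CPU-side Python sketch (not the dissertation's GPU kernels) of SpMV over one matrix stored as COO, CSR, ELL, and hybrid ELL-COO. All function names are hypothetical. The `ell_padding` helper illustrates one reason the best format depends on the sparsity pattern: ELL pads every row to the length of the longest row, so a single dense row inflates storage and memory traffic for the whole matrix, which is what the hybrid format works around.

```python
from typing import List, Tuple

Dense = List[List[float]]
COO = Tuple[List[int], List[int], List[float]]   # (row indices, column indices, values)
CSR = Tuple[List[int], List[int], List[float]]   # (row pointers, column indices, values)
ELL = Tuple[List[List[int]], List[List[float]]]  # (padded column indices, padded values)


def to_coo(a: Dense) -> COO:
    """COO: one (row, col, value) triple per nonzero."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(a):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals


def to_csr(a: Dense) -> CSR:
    """CSR: entries of row i live at positions row_ptr[i] .. row_ptr[i+1]-1."""
    row_ptr, col_idx, vals = [0], [], []
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(vals))
    return row_ptr, col_idx, vals


def to_hyb(a: Dense, k: int) -> Tuple[ELL, COO]:
    """Hybrid ELL-COO: the first k nonzeros of each row go into a width-k ELL
    part (padded with column 0, value 0.0); any overflow goes into COO."""
    ell_cols, ell_vals = [], []
    coo_r, coo_c, coo_v = [], [], []
    for i, row in enumerate(a):
        nz = [(j, v) for j, v in enumerate(row) if v != 0]
        head, tail = nz[:k], nz[k:]
        pad = k - len(head)
        ell_cols.append([j for j, _ in head] + [0] * pad)
        ell_vals.append([v for _, v in head] + [0.0] * pad)
        for j, v in tail:
            coo_r.append(i)
            coo_c.append(j)
            coo_v.append(v)
    return (ell_cols, ell_vals), (coo_r, coo_c, coo_v)


def to_ell(a: Dense) -> ELL:
    """Pure ELL: every row padded to the nonzero count of the longest row."""
    width = max(sum(1 for v in row if v != 0) for row in a)
    ell, _empty_coo = to_hyb(a, width)
    return ell


def spmv_coo(n_rows: int, coo: COO, x: List[float]) -> List[float]:
    y = [0.0] * n_rows
    for i, j, v in zip(*coo):
        y[i] += v * x[j]   # several nonzeros may update the same y[i]
    return y


def spmv_csr(csr: CSR, x: List[float]) -> List[float]:
    row_ptr, col_idx, vals = csr
    return [sum(vals[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]


def spmv_ell(ell: ELL, x: List[float]) -> List[float]:
    cols, vals = ell
    return [sum(v * x[j] for j, v in zip(c, r)) for c, r in zip(cols, vals)]


def spmv_hyb(hyb: Tuple[ELL, COO], x: List[float]) -> List[float]:
    ell, coo = hyb
    y = spmv_ell(ell, x)
    for i, j, v in zip(*coo):   # add the overflow entries stored in COO
        y[i] += v * x[j]
    return y


def ell_padding(ell: ELL) -> int:
    """Number of padded (wasted) slots stored in an ELL structure."""
    _cols, vals = ell
    slots = sum(len(r) for r in vals)
    real = sum(1 for r in vals for v in r if v != 0)
    return slots - real
```

For a 4×4 matrix such as `[[1,0,0,0],[0,2,0,0],[3,4,5,6],[0,0,0,7]]`, all four routines produce the same result, but pure ELL stores 16 slots for 7 nonzeros (9 of them padding), whereas `to_hyb(a, 1)` stores no padding and moves the three overflow entries of the dense row into COO. GPU implementations typically lay ELL arrays out column-major so that consecutive threads read consecutive addresses; this sketch uses row-major lists for readability.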
The research then uses machine learning techniques to automatically select the best sparse representation for a given matrix. Pertinent sparsity features are extensively characterized for around 700 sparse matrices, together with their SpMV performance under different sparse representations. Applying learning to this rich dataset yields a decision model that automatically selects the best representation for a given sparse matrix on a given target GPU. Experimental results on three GPUs demonstrate that the approach is very effective in selecting the best representation.

The last part is dedicated to characterizing the performance of graph processing systems on GPUs. It focuses on a vertex-centric graph programming framework (Virtual Warp Centric, VWC) and characterizes the performance bottlenecks that arise when running different graph primitives. The analysis shows how sensitive the VWC parameter (the virtual warp size) is to the input graph and highlights the importance of selecting the correct warp size to avoid performance penalties. The study also applies machine learning techniques to the input dataset to predict the best VWC configuration for a given graph, showing that simple machine learning models can improve performance and reduce auto-tuning time for graph algorithms on GPUs.
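The sensitivity of VWC to the virtual warp size can be illustrated with a deliberately simplified cost model, which is an assumption made for illustration and not the dissertation's methodology. In the virtual-warp-centric style, a 32-lane physical warp is split into 32/W virtual warps of width W, and each virtual warp processes one vertex's adjacency list W neighbors at a time. Because the lanes of a physical warp execute in lockstep, the warp stays busy until its slowest virtual warp finishes. The names `total_steps`, `lane_utilization`, and `best_width` are hypothetical.

```python
from typing import Sequence

WARP = 32                       # physical warp width on current NVIDIA GPUs
WIDTHS = (1, 2, 4, 8, 16, 32)   # candidate virtual warp sizes (divisors of WARP)


def total_steps(degrees: Sequence[int], w: int) -> int:
    """Hypothetical cost model: vertices are assigned in order, WARP // w per
    physical warp, one vertex per virtual warp of width w. A virtual warp
    visits w neighbors per step, and a physical warp stays busy until its
    slowest virtual warp finishes. Returns the steps summed over all warps."""
    per_warp = WARP // w
    steps = 0
    for start in range(0, len(degrees), per_warp):
        group = degrees[start:start + per_warp]
        steps += max((d + w - 1) // w for d in group)   # ceil(d / w)
    return steps


def lane_utilization(degrees: Sequence[int], w: int) -> float:
    """Fraction of lane-steps doing useful neighbor work under the model."""
    steps = total_steps(degrees, w)
    return sum(degrees) / (WARP * steps) if steps else 0.0


def best_width(degrees: Sequence[int]) -> int:
    """Exhaustive search over WIDTHS; min() returns the first (smallest) width on ties."""
    return min(WIDTHS, key=lambda w: total_steps(degrees, w))


if __name__ == "__main__":
    uniform = [2] * 64            # many low-degree vertices
    skewed = [64] + [1] * 31      # one hub vertex plus low-degree vertices
    for name, deg in (("uniform", uniform), ("skewed", skewed)):
        w = best_width(deg)
        print(name, w, round(lane_utilization(deg, w), 3))
    # prints:
    # uniform 1 1.0
    # skewed 8 0.198
```

Under this model, the low-degree graph favors narrow virtual warps (a width of 32 would leave 30 of 32 lanes idle for every 2-neighbor vertex), while in the skewed graph narrow widths serialize the hub's 64 neighbors, so an intermediate width of 8 minimizes the step count. Real kernels also depend on memory coalescing and occupancy, which this model ignores. The exhaustive `best_width` search stands in for auto-tuning; a learned model, as the abstract describes, would instead predict the width from features of the input graph.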
Committee
Ponnuswamy Sadayappan (Advisor)
Louis-Noel Pouchet (Committee Member)
Mircea-Radu Teodorescu (Committee Member)
Atanas Ivanov Rountev (Committee Member)
Pages
166 p.
Subject Headings
Computer Engineering; Computer Science; Engineering
Keywords
Stencil Computation, GPU, CUDA, SpMV, Graph Processing, Performance Analysis, SIMD
Recommended Citations
APA Style (7th edition)
Sedaghati Mokhtari, N. (2016). Performance Optimization of Memory-Bound Programs on Data Parallel Accelerators [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1452255686
MLA Style (8th edition)
Sedaghati Mokhtari, Naseraddin. Performance Optimization of Memory-Bound Programs on Data Parallel Accelerators. 2016. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1452255686.
Chicago Manual of Style (17th edition)
Sedaghati Mokhtari, Naseraddin. "Performance Optimization of Memory-Bound Programs on Data Parallel Accelerators." Doctoral dissertation, Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1452255686
Document number:
osu1452255686
Copyright Info
© 2016, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.