Performance Optimization of Memory-Bound Programs on Data Parallel Accelerators

Sedaghati Mokhtari, Naseraddin

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=osu1452255686

Year and Degree

2016, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.

Abstract

High-performance applications depend on high utilization of memory bandwidth and computing resources, and data parallel accelerators have proven very effective at providing both. However, memory-bound programs push the limits of system bandwidth, leaving computing resources under-utilized and making execution energy-inefficient. The objective of this research is to investigate opportunities on data parallel accelerators (i.e., SIMD units and GPUs) and to design solutions that improve the performance of three classes of memory-bound applications: stencil computations, sparse matrix-vector multiplication (SpMV), and graph analytics.

This research first focuses on the performance bottlenecks of stencil computations on short-vector SIMD ISAs and presents StVEC, a hardware-based solution that extends the vector ISA to improve data movement and bandwidth utilization. StVEC includes an extension to the standard addressing mode of vector floating-point instructions in contemporary vector ISAs (e.g., SSE, AVX, VMX). A code generation approach is designed and implemented to help a vectorizing compiler generate code for processors with StVEC extensions. Using both an optimistic and a pessimistic emulation of the proposed StVEC instructions, the solution is shown to be effective on top of SSE- and AVX-capable processors. To analyze hardware overhead, parts of the proposed design are synthesized using a 45 nm CMOS library and shown to have minimal impact on processor cycle time.

For the second class of memory-bound programs, this research focuses on SpMV on GPUs and shows that no sparse matrix representation is consistently superior; the best representation depends on the sparsity pattern of the matrix. This part considers four standard sparse representations (i.e., CSR, ELL, COO, and a hybrid ELL-COO) and studies the correlations between SpMV performance and sparsity features.
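As a rough illustration of why the choice of representation can depend on the sparsity pattern, the sketch below stores one small matrix in CSR and in ELL, runs SpMV on both, and computes two simple sparsity features. The function names, the example matrix, and the features `mean_row_nnz` and `ell_padding_ratio` are assumptions made for this illustration and are not taken from the dissertation; actual GPU kernels would be written in CUDA rather than plain Python.

```python
# Illustrative sketch only: CSR and ELL SpMV in plain Python, plus two
# hypothetical sparsity features. Assumes the matrix has at least one nonzero.

def dense_to_csr(dense):
    """Return (values, col_idx, row_ptr) for a dense row-major matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_to_ell(values, col_idx, row_ptr):
    """Pad every row to the length of the longest row (ELL layout)."""
    n_rows = len(row_ptr) - 1
    width = max(row_ptr[i + 1] - row_ptr[i] for i in range(n_rows))
    ell_vals = [[0.0] * width for _ in range(n_rows)]
    ell_cols = [[0] * width for _ in range(n_rows)]
    for i in range(n_rows):
        for k, p in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            ell_vals[i][k] = values[p]
            ell_cols[i][k] = col_idx[p]
    return ell_vals, ell_cols

def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x, visiting only the stored nonzeros of each row."""
    return [
        sum(values[p] * x[col_idx[p]] for p in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]

def spmv_ell(ell_vals, ell_cols, x):
    """y = A @ x; padded slots hold value 0.0, so they contribute nothing."""
    return [
        sum(v * x[c] for v, c in zip(vals, cols))
        for vals, cols in zip(ell_vals, ell_cols)
    ]

def sparsity_features(row_ptr):
    """Two example features: mean nonzeros per row and wasted ELL storage."""
    n_rows = len(row_ptr) - 1
    nnz = row_ptr[-1]
    max_row_nnz = max(row_ptr[i + 1] - row_ptr[i] for i in range(n_rows))
    return {
        "mean_row_nnz": nnz / n_rows,
        "max_row_nnz": max_row_nnz,
        "ell_padding_ratio": 1 - nnz / (n_rows * max_row_nnz),
    }

# One long row forces ELL to pad the three short rows.
A = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 5.0, 6.0],
    [0.0, 0.0, 7.0, 0.0],
]
x = [1.0, 1.0, 1.0, 1.0]
csr = dense_to_csr(A)
ell = csr_to_ell(*csr)
y_csr = spmv_csr(*csr, x)
y_ell = spmv_ell(*ell, x)
features = sparsity_features(csr[2])
```

Both layouts produce the same result vector, but here more than half of the ELL slots are padding. Skew of this kind in row lengths is one example of a pattern-dependent property that could favor one layout over another.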
The research then uses machine learning techniques to automatically select the best sparse representation for a given matrix. Pertinent sparsity features are characterized extensively on around 700 sparse matrices, together with their SpMV performance under the different sparse representations. Learning on this dataset yields a decision model that automatically selects the best representation for a given sparse matrix on a given target GPU. Experimental results on three GPUs demonstrate that the approach is very effective in selecting the best representation.

The last part characterizes the performance of graph processing systems on GPUs. It focuses on a vertex-centric graph programming framework (Virtual Warp Centric, VWC) and characterizes performance bottlenecks when running different graph primitives. The analysis shows how sensitive the VWC parameter is to the input graph and underscores the importance of selecting the correct warp size in order to avoid performance penalties. The study also applies machine learning techniques to the input dataset in order to predict the best VWC configuration for a given graph. It shows that simple machine learning models can improve performance and reduce auto-tuning time for graph algorithms on GPUs.
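The sensitivity of the virtual warp size to the input graph can be illustrated with a toy cost model. The sketch assumes that each virtual warp of `vw_size` lanes processes one vertex's neighbor list, and that the virtual warps packed into one 32-thread physical warp advance in lockstep. The function names and the cost model itself are assumptions for illustration; they ignore memory coalescing and scheduling effects and do not reproduce the dissertation's analysis or its learned predictor.

```python
# Toy model only: counts lockstep SIMD steps under a simplified view of
# virtual-warp-centric execution. Not a performance predictor.
import math

PHYSICAL_WARP = 32  # threads per warp on NVIDIA GPUs

def lockstep_steps(degrees, vw_size):
    """Steps needed when each virtual warp of vw_size lanes handles one vertex.

    A physical warp holds PHYSICAL_WARP // vw_size virtual warps, and it is
    busy until its slowest virtual warp has scanned its neighbor list.
    """
    per_warp = PHYSICAL_WARP // vw_size
    steps = 0
    for start in range(0, len(degrees), per_warp):
        group = degrees[start:start + per_warp]
        steps += max(math.ceil(d / vw_size) for d in group)
    return steps

def lane_utilization(degrees, vw_size):
    """Fraction of issued lane slots that process a real neighbor."""
    useful = sum(degrees)
    issued = sum(math.ceil(d / vw_size) * vw_size for d in degrees)
    return useful / issued if issued else 0.0

def best_vw_size(degrees, candidates=(1, 2, 4, 8, 16, 32)):
    """Exhaustive search over candidate sizes under the toy cost model."""
    return min(candidates, key=lambda w: lockstep_steps(degrees, w))

uniform = [1] * 64            # every vertex has a single neighbor
skewed = [1] * 31 + [320]     # one hub vertex among low-degree vertices

best_uniform = best_vw_size(uniform)
best_skewed = best_vw_size(skewed)
```

Under this model the uniform low-degree graph favors the smallest virtual warp, while the graph with a single hub favors a much larger one. A learned predictor of the kind the abstract describes would replace the exhaustive search in `best_vw_size` with a model evaluated on graph features.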

Committee

Ponnuswamy Sadayappan (Advisor)
Louis-Noel Pouchet (Committee Member)
Mircea-Radu Teodorescu (Committee Member)
Atanas Ivanov Rountev (Committee Member)

Pages

166 p.

Subject Headings

Computer Engineering; Computer Science; Engineering

Keywords

Stencil Computation, GPU, CUDA, SpMV, Graph Processing, Performance Analysis, SIMD

Recommended Citations

APA Style (7th edition)

Sedaghati Mokhtari, N. (2016). Performance Optimization of Memory-Bound Programs on Data Parallel Accelerators [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1452255686

MLA Style (8th edition)

Sedaghati Mokhtari, Naseraddin. Performance Optimization of Memory-Bound Programs on Data Parallel Accelerators. 2016. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1452255686.

Chicago Manual of Style (17th edition)

Sedaghati Mokhtari, Naseraddin. "Performance Optimization of Memory-Bound Programs on Data Parallel Accelerators." Doctoral dissertation, Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1452255686

Document number:

osu1452255686

Download Count:

1,433
