Architecture-aware Algorithm Design of Sparse Tensor/Matrix Primitives for GPUs


2019, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Sparse matrix/tensor operations are a common computational motif across a wide spectrum of domains: numerical linear algebra, graph analytics, machine learning, health care, etc. Sparse kernels play a key role in numerous machine learning algorithms, and the rising popularity of this domain increases the significance of primitives like SpMV (Sparse Matrix-Vector Multiplication), SDDMM (Sampled Dense-Dense Matrix Multiplication), and MF/TF (Sparse Matrix/Tensor Factorization). These primitives are data-parallel and highly suitable for GPU-like architectures that provide massive parallelism. Real-world matrices and tensors are large-scale, with millions of data points, which is sufficient to utilize all the cores of a GPU. Yet a data-parallel algorithm can become the bottleneck of an application and perform well below the upper bound of the roofline model. Common reasons are frequent irregular global memory accesses, low data reuse, and imbalanced work distribution. Conversely, efficient utilization of the GPU memory hierarchy, reduced thread communication, increased data locality, and an even workload distribution can provide ample opportunities for significant performance improvement. The challenge lies in applying these techniques across applications and achieving consistent performance in spite of the irregularity of the input matrices or tensors. In this work, we systematically identify the performance bottlenecks of important sparse algorithms and provide optimized, high-performing solutions. At the beginning of this dissertation, we explore the application of cost-effective ML techniques to the format-selection and performance-modeling problems in the SpMV domain. By identifying a small set of sparse matrix features to use in training the ML models, we are able to select the best storage format as well as predict the execution time of an SpMV kernel.
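For readers unfamiliar with the SpMV primitive discussed above, the following is a minimal serial reference in Python for a matrix stored in CSR (Compressed Sparse Row), a common storage format; the function name and variable names are illustrative, not taken from the dissertation. On a GPU, each row (or a warp per row) would be processed in parallel, which is where the irregular access and load-imbalance issues described above arise.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for A in CSR form.

    CSR stores nonzeros row by row: entries of row i live at positions
    row_ptr[i] .. row_ptr[i+1]-1 of vals, with column indices in col_idx.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]  # gather from x: irregular access
        y[i] = acc
    return y


# Example: A = [[1, 0, 2],
#               [0, 3, 0]],  x = [1, 1, 1]
y = spmv_csr([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

Note that the gather `x[col_idx[k]]` is the source of the irregular global memory accesses mentioned above: its locality depends entirely on the sparsity pattern.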
Next, we optimize the SDDMM kernel, a key bottleneck in factor-analysis and topic-modeling algorithms such as ALS, LDA, and GaP. The performance constraints are addressed by exploiting data reuse and increasing parallelism through virtual warping, multi-level tiling, and effective use of on-chip (shared) memory. The remaining works concern the optimization of factorization techniques for sparse matrices and tensors on GPUs. For matrix factorization, we optimize cyclic coordinate descent (CCD++), the state-of-the-art factorization method, devising an efficient GPU implementation through kernel fusion, tiling, and binning. Next, we extend the optimization of the factorization problem to higher-order data: tensors. MTTKRP (Matricized Tensor Times Khatri-Rao Product) is a key bottleneck of one of the most common tensor factorization techniques, CPD (CANDECOMP/PARAFAC decomposition). We develop new storage-efficient tensor representations, B-CSF and HB-CSF, that enable high-performance, load-balanced execution of MTTKRP on GPUs. However, for a tensor with d modes, CPD requires a sequence of d tensor computations. To guarantee efficient memory access with respect to different modes, many storage formats keep d distinct representations of the tensor, despite the d-fold space overhead. Hence, we devise MM-CSF, a compact mixed-mode representation that achieves better performance than existing solutions while occupying a small fraction of the space.
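To make the MTTKRP operation above concrete, here is a minimal serial sketch in Python for a 3-way sparse tensor in simple coordinate (COO) form, computing the mode-0 product M[i, :] += X[i, j, k] * (B[j, :] * C[k, :]). This is only a reference definition under illustrative names; it is not the B-CSF/HB-CSF/MM-CSF implementation developed in the dissertation, whose point is precisely to reorganize this computation for load-balanced GPU execution.

```python
def mttkrp_mode0(coords, vals, B, C, n_i, rank):
    """Mode-0 MTTKRP for a 3-way COO tensor.

    coords: list of (i, j, k) index triples of the nonzeros
    vals:   matching nonzero values X[i, j, k]
    B, C:   dense factor matrices (lists of rows, each of length `rank`)
    Returns the n_i x rank output matrix M.
    """
    M = [[0.0] * rank for _ in range(n_i)]
    for (i, j, k), v in zip(coords, vals):
        for r in range(rank):
            # Khatri-Rao structure: elementwise product of the B and C rows
            M[i][r] += v * B[j][r] * C[k][r]
    return M


# Tiny example: two nonzeros, rank-2 factors
M = mttkrp_mode0(
    coords=[(0, 0, 0), (1, 1, 0)],
    vals=[2.0, 3.0],
    B=[[1.0, 2.0], [4.0, 5.0]],
    C=[[1.0, 2.0]],
    n_i=2,
    rank=2,
)
```

Since nonzeros sharing the same index i all accumulate into the same output row, a naive one-thread-per-nonzero GPU mapping needs atomics or segmented reductions; this is the load-balance/locality tension the CSF-based formats address.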
P. (Saday) Sadayappan (Advisor)
Atanas Rountev (Committee Member)
Radu Teodorescu (Committee Member)
197 p.

Recommended Citations

  • Nisa, I. (2019). Architecture-aware Algorithm Design of Sparse Tensor/Matrix Primitives for GPUs [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1563183611985656

    APA Style (7th edition)

  • Nisa, Israt. Architecture-aware Algorithm Design of Sparse Tensor/Matrix Primitives for GPUs. 2019. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1563183611985656.

    MLA Style (8th edition)

  • Nisa, Israt. "Architecture-aware Algorithm Design of Sparse Tensor/Matrix Primitives for GPUs." Doctoral dissertation, Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1563183611985656

    Chicago Manual of Style (17th edition)