Architecture-aware Algorithm Design of Sparse Tensor/Matrix Primitives for GPUs


2019, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Sparse matrix/tensor operations are a common computational motif across a wide spectrum of domains: numerical linear algebra, graph analytics, machine learning, health care, etc. Sparse kernels play a key role in numerous machine learning algorithms, and the rising popularity of this domain increases the significance of primitives like SpMV (Sparse Matrix-Vector Multiplication), SDDMM (Sampled Dense-Dense Matrix Multiplication), and MF/TF (Sparse Matrix/Tensor Factorization). These primitives are data-parallel and highly suitable for GPU-like architectures that provide massive parallelism. Real-world matrices and tensors are large-scale, with millions of data points, which is sufficient to utilize all the cores of a GPU. Yet a data-parallel algorithm can become the bottleneck of an application and perform well below the upper bound of the roofline model. Common reasons are frequent irregular global memory accesses, low data reuse, and imbalanced work distribution. Conversely, efficient utilization of the GPU memory hierarchy, reduced thread communication, increased data locality, and an even workload distribution can provide ample opportunities for significant performance improvement. The challenge lies in applying these techniques across applications and achieving consistent performance in spite of the irregularity of the input matrices or tensors. In this work, we systematically identify the performance bottlenecks of important sparse algorithms and provide optimized, high-performing solutions. At the beginning of this dissertation, we explore the application of cost-effective ML techniques to the format-selection and performance-modeling problems in the SpMV domain. By identifying a small set of sparse matrix features to use in training the ML models, we are able to select the best storage format as well as predict the execution time of an SpMV kernel.
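For readers unfamiliar with the SpMV primitive discussed above, the following is a minimal serial reference in Python for a matrix stored in CSR (Compressed Sparse Row), a common storage format; the function name and variable names are illustrative, not taken from the dissertation. On a GPU, each row (or a warp per row) would be processed in parallel, which is where the irregular access and load-imbalance issues described above arise.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for A in CSR form.

    CSR stores nonzeros row by row: entries of row i live at positions
    row_ptr[i] .. row_ptr[i+1]-1 of vals, with column indices in col_idx.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]  # gather from x: irregular access
        y[i] = acc
    return y


# Example: A = [[1, 0, 2],
#               [0, 3, 0]],  x = [1, 1, 1]
y = spmv_csr([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

Note that the gather `x[col_idx[k]]` is the source of the irregular global memory accesses mentioned above: its locality depends entirely on the sparsity pattern.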
Next, we optimize the SDDMM kernel, a key bottleneck in factor-analysis and topic-modeling algorithms such as ALS, LDA, and GaP. The performance constraints are addressed by exploiting data reuse and increasing parallelism through virtual warping, multi-level tiling, and effective use of on-chip (shared) memory. The remaining works concern the optimization of factorization techniques for sparse matrices and tensors on GPUs. For matrix factorization, we optimize cyclic coordinate descent (CCD++), the state-of-the-art factorization method, devising an efficient GPU implementation through kernel fusion, tiling, and binning. Next, we extend the optimization of the factorization problem to higher-order data: tensors. MTTKRP (Matricized Tensor Times Khatri-Rao Product) is a key bottleneck of one of the most common tensor factorization techniques, CPD (CANDECOMP/PARAFAC decomposition). We develop new storage-efficient tensor representations, B-CSF and HB-CSF, that enable high-performance, load-balanced execution of MTTKRP on GPUs. However, for a tensor with d modes, CPD requires a sequence of d tensor computations. To guarantee efficient memory access with respect to different modes, many storage formats keep d distinct representations of the tensor, despite the d-fold space overhead. Hence, we devise MM-CSF, a compact mixed-mode representation that achieves better performance than existing solutions while occupying a small fraction of the space.
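To make the MTTKRP operation above concrete, here is a minimal serial sketch in Python for a 3-way sparse tensor in simple coordinate (COO) form, computing the mode-0 product M[i, :] += X[i, j, k] * (B[j, :] * C[k, :]). This is only a reference definition under illustrative names; it is not the B-CSF/HB-CSF/MM-CSF implementation developed in the dissertation, whose point is precisely to reorganize this computation for load-balanced GPU execution.

```python
def mttkrp_mode0(coords, vals, B, C, n_i, rank):
    """Mode-0 MTTKRP for a 3-way COO tensor.

    coords: list of (i, j, k) index triples of the nonzeros
    vals:   matching nonzero values X[i, j, k]
    B, C:   dense factor matrices (lists of rows, each of length `rank`)
    Returns the n_i x rank output matrix M.
    """
    M = [[0.0] * rank for _ in range(n_i)]
    for (i, j, k), v in zip(coords, vals):
        for r in range(rank):
            # Khatri-Rao structure: elementwise product of the B and C rows
            M[i][r] += v * B[j][r] * C[k][r]
    return M


# Tiny example: two nonzeros, rank-2 factors
M = mttkrp_mode0(
    coords=[(0, 0, 0), (1, 1, 0)],
    vals=[2.0, 3.0],
    B=[[1.0, 2.0], [4.0, 5.0]],
    C=[[1.0, 2.0]],
    n_i=2,
    rank=2,
)
```

Since nonzeros sharing the same index i all accumulate into the same output row, a naive one-thread-per-nonzero GPU mapping needs atomics or segmented reductions; this is the load-balance/locality tension the CSF-based formats address.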
P. (Saday) Sadayappan (Advisor)
Atanas Rountev (Committee Member)
Radu Teodorescu (Committee Member)
197 p.

Recommended Citations

  • Nisa, I. (2019). Architecture-aware Algorithm Design of Sparse Tensor/Matrix Primitives for GPUs [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1563183611985656

    APA Style (7th edition)

  • Nisa, Israt. Architecture-aware Algorithm Design of Sparse Tensor/Matrix Primitives for GPUs. 2019. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1563183611985656.

    MLA Style (8th edition)

  • Nisa, Israt. "Architecture-aware Algorithm Design of Sparse Tensor/Matrix Primitives for GPUs." Doctoral dissertation, Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1563183611985656

    Chicago Manual of Style (17th edition)