
Optimal Loop Unrolling for GPGPU Programs

Sreenivasa Murthy, Giridhar

2009, Master of Science, Ohio State University, Computer Science and Engineering.

Graphics Processing Units (GPUs) are massively parallel, many-core processors with tremendous computational power and very high memory bandwidth. GPUs are primarily designed for accelerating 3D graphics applications on modern computer systems and are therefore specialized for highly data-parallel, compute-intensive problems, unlike general-purpose CPUs. In recent times, there has been significant interest in finding ways to accelerate general-purpose (non-graphics), data-parallel computations using the high processing power of GPUs. General-Purpose Programming on GPUs (GPGPU) was earlier considered difficult because the only available techniques for programming GPUs were graphics-specific programming models such as OpenGL and DirectX. However, with the advent of GPGPU programming models such as NVIDIA's CUDA and the new standard OpenCL, GPGPU has become mainstream.

Optimizations performed by the compiler play a very important role in improving the performance of computer programs. While compiler optimizations for CPUs have been researched for many decades, the arrival of GPGPU, with its differences in architecture and programming model, has brought with it many new opportunities for compiler optimizations. One such classical optimization is loop unrolling. Loop unrolling has proven to be a relatively inexpensive and beneficial optimization for CPU programs. However, current GPGPU compilers perform little to no loop unrolling.
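As a minimal illustration of the transformation itself (not code from the thesis), the sketch below shows a reduction loop and a hand-unrolled version with an unroll factor of 4. The function names `sum_rolled` and `sum_unrolled4` are hypothetical, and plain C++ is used so the example runs on any host. The unrolled body executes four additions per loop-condition check, and a remainder loop handles trip counts that are not a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>

// Baseline: one addition and one loop-condition check per element.
float sum_rolled(const float* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        s += a[i];
    }
    return s;
}

// Unrolled by a factor of 4: four additions per loop-condition check.
// The additions happen in the same order as in sum_rolled.
float sum_unrolled4(const float* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    float s = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    // Remainder loop for the last n % 4 elements.
    for (; i < n; ++i) {
        s += a[i];
    }
    return s;
}
```

In CUDA source, the same effect is usually requested rather than written by hand, by placing `#pragma unroll` (optionally followed by a factor) immediately before the loop.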

In this thesis, we attempt to understand the impact of loop unrolling on GPGPU programs and, using this understanding, develop a semi-automatic, compile-time approach for identifying optimal unroll factors for suitable loops in GPGPU programs. In addition, we propose techniques for reducing the number of unroll factors evaluated, based on the characteristics of the program being compiled and the device being compiled for. We use these techniques to evaluate the effect of loop unrolling on a range of GPGPU programs and show that we correctly identify the optimal unroll factors, and that the optimized versions achieve speedups of up to nearly 1.5x relative to the unoptimized versions.
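The abstract does not say which program and device characteristics are used to prune the search, so the following is only a sketch of the general idea under assumed rules, not the thesis's method. It assumes two hypothetical pruning criteria: the unroll factor must divide the loop trip count evenly, and a linear register estimate (`base_regs + regs_per_copy * factor`) must stay within a per-thread register budget `max_regs`. The surviving candidates are then ranked by a caller-supplied cost function, which stands in for whatever compile-time estimate or measurement is available.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical pruning step: keep only factors that divide the trip count
// (no remainder loop) and whose estimated register use fits the budget.
// The linear register model is an assumption made for illustration.
std::vector<int> candidate_factors(int trip_count, int base_regs,
                                   int regs_per_copy, int max_regs) {
    assert(trip_count > 0);
    std::vector<int> out;
    for (int f = 1; f <= trip_count; ++f) {
        if (trip_count % f != 0) continue;
        if (base_regs + regs_per_copy * f > max_regs) continue;
        out.push_back(f);
    }
    return out;
}

// Evaluate each remaining candidate with the supplied cost function and
// return the factor with the lowest cost (the first one wins on ties).
int best_factor(const std::vector<int>& candidates,
                const std::function<double(int)>& cost) {
    assert(!candidates.empty());
    int best = candidates.front();
    double best_cost = cost(best);
    for (int f : candidates) {
        double c = cost(f);
        if (c < best_cost) {
            best_cost = c;
            best = f;
        }
    }
    return best;
}
```

For a trip count of 16 with `base_regs = 10`, `regs_per_copy = 4`, and `max_regs = 32`, the pruning step leaves the factors 1, 2, and 4, so only three variants need to be evaluated instead of sixteen.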

Ponnuswamy Sadayappan, PhD (Advisor)
Atanas Rountev, PhD (Committee Member)
70 p.

Recommended Citations

Citations

  • Sreenivasa Murthy, G. (2009). Optimal Loop Unrolling for GPGPU Programs [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1253131903

    APA Style (7th edition)

  • Sreenivasa Murthy, Giridhar. Optimal Loop Unrolling for GPGPU Programs. 2009. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1253131903.

    MLA Style (8th edition)

  • Sreenivasa Murthy, Giridhar. "Optimal Loop Unrolling for GPGPU Programs." Master's thesis, Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1253131903

    Chicago Manual of Style (17th edition)