Automatic Parallelization of Loops with Data Dependent Control Flow and Array Access Patterns

Ravishankar, Mahesh

Keyword Search

School Logo

dissertation.pdf (2.48 MB)

Automatic Parallelization of Loops with Data Dependent Control Flow and Array Access Patterns

Author Info

Ravishankar, Mahesh

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=osu1400085733

Year and Degree

2014, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.

Abstract

With the era of increasing clock speeds coming to an end, parallel computing architectures have now become main-stream. Due to the wide range of architectures available today that can be used to exploit parallelism, ranging from multicore CPUs, to GPGPUs, to distributed memory machines; adapting applications for efficient execution on all these architectures poses a significant challenge. Scientific computing applications exhibit significant coarse-grained parallelism. Domain experts have exploited this to target distributed memory machines through the use of Message Passing Interface (MPI) libraries. Many such applications have been shown to scale to hundreds of thousands of processors. While powerful, programming in MPI is tedious and error prone, with significant portion of the parallelized application dedicated to managing communication and synchronization between the processes. Developing compiler techniques that can automatically generate parallel distributed memory versions of such codes is challenging due to the presence of data-dependent control flow and data access patterns. In this thesis we develop compiler algorithms for automatic parallelization of a class of computations common to many scientific applications. Under the inspector/executor paradigm, the generated code partitions and executes the computation in load-balanced manner, while reducing the communication costs. This approach is further enhanced by developing a framework capable of expressing both affine and non-affine parts of the code. This enables the use of polyhedral compilation tools to analyze parts of the computation which can be completely characterized statically. The effectiveness of the developed approach is demonstrated on several benchmarks and real-world applications. Image processing applications on the other hand exhibit significant fine-grained parallelism and are well suited for architectures with Single-Instruction Multiple Data (SIMD) processing units like the vector processing units on CPUs or special graphics processing units like the NVIDIA Kepler GPU kepler. Such architectures can be targeted either through hardware specific intrinsics (like SSE/AVX to target vector units), or through language support for writing device-specific code (like CUDA to target NVIDIA GPUs). State of the art compilers like ICC, Pluto, etc. can automatically implement architecture specific optimizations for programs written in low-level languages like C/C++. While effective, these approaches still require significant effort from the application developers. Programs written this way are less portable and difficult to maintain. Domain Specific Languages offer a convenient abstraction, allowing programmers to specify the computation at a high-level, while still allowing the compiler to use the semantics of operations to generate efficient code to target multiple architectures. In this thesis, we develop Forma, a DSL that provides a convenient syntax to specify many common image processing pipelines in a succinct manner. The compiler backend can generate code to target both multicore CPUs with SIMD units and NVIDIA GPUs, while making use of device specific features like texture units on GPUs. These ease of programming in Forma is demonstrated by porting complex image processing pipelines like Laplacian Pyramids and Exposure Fusion. The performance of the generated code is compared against a state of the art DSL for image processing, Halide.

Committee

P Sadayappan, Prof. (Advisor)
Atanas Rountev, Prof. (Committee Member)
Gagan Agrawal, Prof. (Committee Member)

Pages

197 p.

Subject Headings

Computer Engineering

Keywords

Distributed Memory, Inspector-Executor, DSL, Image Processing, Irregular Computations, Compiler

Ravishankar, M. (2014). Automatic Parallelization of Loops with Data Dependent Control Flow and Array Access Patterns [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1400085733
APA Style (7th edition)
Ravishankar, Mahesh. Automatic Parallelization of Loops with Data Dependent Control Flow and Array Access Patterns. 2014. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1400085733.
MLA Style (8th edition)
Ravishankar, Mahesh. "Automatic Parallelization of Loops with Data Dependent Control Flow and Array Access Patterns." Doctoral dissertation, Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1400085733
Chicago Manual of Style (17th edition)

Document number:

osu1400085733

Download Count:

485

Copyright Info

Global Search Box

Files

File List

ETD Abstract Container

Abstract Header

Automatic Parallelization of Loops with Data Dependent Control Flow and Array Access Patterns

Abstract Details

Recommended Citations

Citations

Abstract Footer

Global Footer

Ohio Department of Higher Education

State Government Links

Education Links

Global Search Box

Files

File List

ETD Abstract Container

Abstract Header

Automatic Parallelization of Loops with Data Dependent Control Flow and Array Access Patterns

Abstract Details

Recommended CitationsRefworksEndNoteRISMendeley

Citations

Abstract Footer

Global Footer

Ohio Department of Higher Education

State Government Links

Education Links

Recommended Citations