Performance Optimization of Stencil Computations on Modern SIMD Architectures

Henretty, Thomas Steel

2014, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Performance of scientific computing codes on modern high-performance computing (HPC) systems has, in some cases, not achieved a significant percentage of the system’s peak performance. Three of the fundamental causes of this lack of efficiency are (1) less than optimal utilization of the short-vector SIMD units found in nearly all modern HPC systems, (2) less than optimal utilization of the memory hierarchy, and (3) less than optimal utilization of all computing cores available in a system. Codes that overcome one or more of these limitations are generally very complex, and their implementation requires both an expert programmer and a substantial amount of time. In this work, a class of scientific computing codes known as stencil computations is examined and shown to exhibit a fundamental algorithmic limitation that interferes with the generation of optimal SIMD code. A data layout transformation (DLT) to overcome this limitation is described, and comprehensive results for cache-resident problem sizes are presented. It is shown that this DLT can significantly increase the performance of stencil computations on modern SIMD architectures.
While substantial performance gains can be realized using the DLT for small problem sizes, larger problem sizes require the application of spatial and temporal loop tiling techniques to relieve pressure on the memory subsystem and exploit all available multicore parallelism. Two closely related tiling techniques, nested and hybrid split tiling, are developed and shown to exhibit high performance across a variety of modern multicore SIMD architectures and stencil benchmarks.
Combining SIMD, memory hierarchy, and parallelism optimizations for stencil computations leads to code that is very complex and difficult for scientists and even seasoned programmers to implement. Further, these optimizations are difficult to integrate into a general-purpose compiler, as there is no existing framework for reliably identifying and representing stencil computations in a general-purpose language such as C. These problems are resolved with the creation of the Stencil Domain Specific Language (SDSL). This language uses data structures and concepts specific to stencil computations to retain fundamental information about the stencil throughout the compilation process. Preserving the details of a stencil computation enables the automated generation of complex, highly optimized code for multiple parallel vector architectures from a simple specification in SDSL.
P Sadayappan, PhD (Advisor)
Atanas Rountev, PhD (Committee Member)
Radu Teodorescu, PhD (Committee Member)
176 p.

Recommended Citations

  • Henretty, T. S. (2014). Performance Optimization of Stencil Computations on Modern SIMD Architectures [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1408937226

    APA Style (7th edition)

  • Henretty, Thomas. Performance Optimization of Stencil Computations on Modern SIMD Architectures. 2014. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1408937226.

    MLA Style (8th edition)

  • Henretty, Thomas. "Performance Optimization of Stencil Computations on Modern SIMD Architectures." Doctoral dissertation, Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1408937226

    Chicago Manual of Style (17th edition)