Skip to Main Content
 

Global Search Box

 
 
 
 

Files

ETD Abstract Container

Abstract Header

Vectorization and Register Reuse in High Performance Computing

Stock, Kevin Alan

Abstract Details

2014, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Automatic vectorization is crucial for attaining maximum performance on modern processors in a performance portable fashion. There is significant room for improvement over techniques in existing compilers, as demonstrated in this work, through careful synthesis of vectorized code and application of aggressive performance optimizations, especially in the domains of tensor contractions and stencils. Tensor contractions are a class of computation used extensively in computational chemistry which are essentially higher-dimension matrix multiplication. Vector code synthesis for tensor contractions has been explored, utilizing a novel in-register transpose for efficiently accessing tensors with SIMD vectors along any dimension. Aggressive use of unroll-and-jam and loop permutation optimizations is able to increase register reuse and amortize the cost of expensive vector operations. Stencils are a critical part of many scientific codes including PDE solvers and image processing. This work presents a framework for reordering operations in regular loop computations, such as stencils, by utilizing the associativity and commutativity of operations. The freedom to reorder computations involving associative operators has been widely recognized and exploited in designing parallel algorithms and to a more limited extent in optimizing compilers. The reordering of operations enables substantially improved register reuse through reduced register pressure, resulting in up to 4x performance improvement for existing benchmarks. The multidimensional retiming formalism characterizes the space of valid implementations in conjunction with other transformations from the vector code synthesis used for tensor contractions. Given the transformations used to optimize high performance software, there are many parameters which create a space of possible variants from which the best, or nearly best, candidate must be found. In addition to auto-tuning over the space of synthesized vector codes, two models have been developed for selecting a high performance variant at compile time. The first model is based on estimating the total number of SIMD instructions executed and demonstrated and for a set of benchmark kernels achieved a mean performance of 73% of the best observed performance across multiple architectures and compilers. The second model used features extracted from the assembly code generated by the compiler which were then processed by machine learning algorithms. By considering a linear combination of the most successful models, a mean performance of 89% of the best observed was achieved in the same tests as the first model. This research has shown that integration of tile code generation techniques with existing automatic tiling and parallelizing algorithms based on the polyhedral model can produce significant performance improvements on modern architectures such as Intel's Many Integrated Core. Additionally, these techniques have been applied in production software to automatically produce high performance code on multiple architectures for the dominant computational kernel in MADNESS.
P Sadayappan (Advisor)
Atanas Rountev (Committee Member)
Srinivasan Parthasarathy (Committee Member)
172 p.

Recommended Citations

Citations

  • Stock, K. A. (2014). Vectorization and Register Reuse in High Performance Computing [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1408969385

    APA Style (7th edition)

  • Stock, Kevin. Vectorization and Register Reuse in High Performance Computing. 2014. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1408969385.

    MLA Style (8th edition)

  • Stock, Kevin. "Vectorization and Register Reuse in High Performance Computing." Doctoral dissertation, Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1408969385

    Chicago Manual of Style (17th edition)