Optimizing array processing on complex I/O stacks using indices and data summarization

2021, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Increasingly, the ability of human beings to understand the universe and ourselves depends on our ability to obtain and process data. With an explosion of data being generated every day, efficiently storing and querying such data, which is usually multidimensional and can be represented using an array data model, is increasingly vital. Meanwhile, as more and more powerful CPUs and accelerators are added to systems, most modern computing systems contain an increasingly complex I/O stack, ranging from traditional disk-based file systems to heterogeneous accelerators with individual memory spaces. Accessing such a complex I/O stack efficiently in array processing is essential for utilizing the enormous computational power of modern computational platforms. One key to achieving such efficiency is identifying where the data is being generated or stored, and choosing appropriate representation and processing strategies accordingly. This dissertation focuses on optimizing array processing in such complex I/O stacks by studying two fundamental questions: what data representation should be used, and where the data should be stored and processed. The two basic scenarios of scientific data analytics are considered one by one. The first half of the dissertation tackles the problem of efficiently processing array data post-hoc and presents a compact array storage for disk-based data that integrates lossless value-based indexing. Such integrated indices improve the performance of value-based filtering operations by orders of magnitude without sacrificing storage size or accuracy. The dissertation then demonstrates how complex queries such as equal and similarity array joins can also be performed on this novel storage. The second half of the dissertation focuses on in-situ processing of data generated by simulations on accelerators, without storing the generated data.
The system generates an improved bitmap representation on the GPU to reduce the bandwidth bottleneck between host and accelerators while allowing fast processing of a set of complex queries, such as contrast set mining, on both the host and the accelerators. As the abundance of data representation and processing options provides a myriad of choices for in-situ array processing, this dissertation then presents a detailed study of how such choices affect analytic performance, and applies a cost modeling methodology to predict the optimal placement and representation for a given analytical workload.
Rajiv Ramnath (Advisor)
Gagan Agrawal (Advisor)
Jason Blevins (Other)
Yang Wang (Committee Member)
Srinivasan Parthasarathy (Committee Member)
192 p.

Recommended Citations

  • Xing, H. (2021). Optimizing array processing on complex I/O stacks using indices and data summarization [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1629474552932903

    APA Style (7th edition)

  • Xing, Haoyuan. Optimizing array processing on complex I/O stacks using indices and data summarization. 2021. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1629474552932903.

    MLA Style (8th edition)

  • Xing, Haoyuan. "Optimizing array processing on complex I/O stacks using indices and data summarization." Doctoral dissertation, Ohio State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=osu1629474552932903

    Chicago Manual of Style (17th edition)