Optimizing array processing on complex I/O stacks using indices and data summarization

Xing, Haoyuan

Xing-Dissertation.pdf (1.81 MB)

Author Info

Xing, Haoyuan

ORCID® Identifier

http://orcid.org/0000-0001-5444-0704

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=osu1629474552932903

Year and Degree

2021, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.

Abstract

Increasingly, our ability to understand the universe and ourselves depends on our ability to obtain and process data. With an explosion of data being generated every day, efficiently storing and querying such data, which is usually multidimensional and can be represented using an array data model, is increasingly vital. Meanwhile, as more and more powerful CPUs and accelerators are added to computing systems, most modern systems contain an increasingly complex I/O stack, ranging from traditional disk-based file systems to heterogeneous accelerators with their own memory spaces. Accessing such a complex I/O stack efficiently during array processing is essential to exploit the enormous computational power of modern platforms. One key to achieving this efficiency is identifying where the data is generated or stored, and choosing appropriate representation and processing strategies accordingly. This dissertation focuses on optimizing array processing in such complex I/O stacks by studying two fundamental questions: what data representation should be used, and where the data should be stored and processed. The two basic scenarios of scientific data analytics are considered in turn. The first half of the dissertation tackles the problem of efficiently processing array data post hoc. It presents a compact array storage format for disk-based data that integrates lossless value-based indexing. These integrated indices improve the performance of value-based filtering by orders of magnitude without sacrificing storage size or accuracy. The dissertation then demonstrates how complex queries, such as equality and similarity array joins, can also be performed on this storage.

The second half of the dissertation focuses on data generated in situ by simulations running on accelerators, without storing the generated data. The proposed system generates an improved bitmap representation on the GPU, which reduces the bandwidth bottleneck between the host and the accelerators while allowing fast processing of a set of complex queries, such as contrast set mining, on both the host and the accelerators. Because the abundance of data representation and processing options offers a myriad of choices for in-situ array processing, the dissertation then presents a detailed study of how these choices affect analytic performance, and applies a cost-modeling methodology to predict the optimal placement and representation for a given analytical workload.
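The abstract does not describe the index layout. To illustrate why value-based indexing can speed up filtering without losing accuracy, here is a minimal binned bitmap index in Python. It is a generic technique, not the dissertation's storage format; the bin edges, the function names `build_bitmap_index` and `range_query`, and the use of Python integers as bitsets are assumptions for illustration. Bins that fall entirely inside the query range are answered from their bitmaps alone, and only the edge bins are checked against the raw values, which keeps the answer exact.

```python
import bisect
from typing import Dict, List


def build_bitmap_index(values: List[float], bin_edges: List[float]) -> Dict[int, int]:
    """Map each bin i to a bitmap (a Python int) with bit p set when values[p] is in bin i.

    Bin i covers [bin_edges[i], bin_edges[i + 1]). bin_edges must be sorted and are
    assumed (hypothetically) to cover the data range; values outside it are not indexed.
    """
    index: Dict[int, int] = {}
    for pos, v in enumerate(values):
        i = bisect.bisect_right(bin_edges, v) - 1
        if 0 <= i < len(bin_edges) - 1:
            index[i] = index.get(i, 0) | (1 << pos)
    return index


def range_query(index: Dict[int, int], bin_edges: List[float],
                values: List[float], lo: float, hi: float) -> List[int]:
    """Return the positions p with lo <= values[p] < hi, using the bitmap index."""
    result = 0
    for i, bitmap in index.items():
        b_lo, b_hi = bin_edges[i], bin_edges[i + 1]
        if b_hi <= lo or b_lo >= hi:
            continue  # bin cannot contain any match
        if lo <= b_lo and b_hi <= hi:
            result |= bitmap  # bin fully inside the query: no raw-data access needed
        else:
            # Edge bin: check each candidate against the raw value so the answer is exact.
            pos = 0
            while bitmap:
                if bitmap & 1 and lo <= values[pos] < hi:
                    result |= 1 << pos
                bitmap >>= 1
                pos += 1
    return [p for p in range(len(values)) if (result >> p) & 1]


values = [0.5, 1.5, 2.5, 3.5, 2.0, 0.1]
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
idx = build_bitmap_index(values, edges)
print(range_query(idx, edges, values, 1.0, 2.6))  # [1, 2, 4]
```

In this toy query, bin [1, 2) is taken directly from its bitmap, bin [2, 3) needs a candidate check, and the other two bins are skipped without touching the data.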
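The abstract states that a cost model predicts the best placement and representation for a workload but does not give its form. The Python sketch below assumes a deliberately simple additive model: processing time at the chosen location, plus host-GPU transfer time for the encoded data when the analysis runs on the host. The names `estimate_cost` and `predict_best`, the two placements, and all numbers are hypothetical.

```python
from typing import Dict, Tuple

Option = Tuple[str, str]  # (representation, placement); hypothetical labels


def estimate_cost(raw_bytes: float, link_bytes_per_s: float,
                  size_ratio: Dict[str, float],
                  throughput: Dict[Option, float], option: Option) -> float:
    """Estimated seconds to process raw_bytes under one option (hypothetical model).

    Processing time is raw_bytes / throughput[option]; if the analysis runs on the
    host, the encoded data (raw_bytes * size_ratio[rep]) must also cross the link.
    """
    rep, place = option
    seconds = raw_bytes / throughput[option]
    if place == "host":
        seconds += raw_bytes * size_ratio[rep] / link_bytes_per_s
    return seconds


def predict_best(raw_bytes: float, link_bytes_per_s: float,
                 size_ratio: Dict[str, float],
                 throughput: Dict[Option, float]) -> Option:
    """Pick the (representation, placement) pair with the lowest estimated cost."""
    return min(throughput, key=lambda opt: estimate_cost(
        raw_bytes, link_bytes_per_s, size_ratio, throughput, opt))


# Illustrative numbers only; real parameters would have to be measured.
size_ratio = {"raw": 1.0, "bitmap": 0.2}
throughput = {("raw", "gpu"): 5e9, ("raw", "host"): 2e9,
              ("bitmap", "gpu"): 4e9, ("bitmap", "host"): 8e9}
print(predict_best(1e9, 1e10, size_ratio, throughput))  # ('bitmap', 'host')
```

Exhaustive enumeration is adequate here because the option space (representations times placements) is small; with a much slower link, the same model would instead favor keeping raw data on the GPU.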

Committee

Rajiv Ramnath (Advisor)
Gagan Agrawal (Advisor)
Jason Blevins (Other)
Yang Wang (Committee Member)
Srinivasan Parthasarathy (Committee Member)

Pages

192 p.

Subject Headings

Computer Engineering; Computer Science

Keywords

array, array storage, adaptive processing, bitmap, complex I/O stack, compression, databases, data summarization, GPU, high-performance computing, heterogeneous processing, index, join, scientific data

Recommended Citations

APA Style (7th edition)
Xing, H. (2021). Optimizing array processing on complex I/O stacks using indices and data summarization [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1629474552932903

MLA Style (8th edition)
Xing, Haoyuan. Optimizing array processing on complex I/O stacks using indices and data summarization. 2021. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1629474552932903.

Chicago Manual of Style (17th edition)
Xing, Haoyuan. "Optimizing array processing on complex I/O stacks using indices and data summarization." Doctoral dissertation, Ohio State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=osu1629474552932903

Document number:

osu1629474552932903

Download Count:

189

Ohio Department of Higher Education
