Skip to Main Content
 

Global Search Box

 
 
 
 

ETD Abstract Container

Abstract Header

High Performance Iterative Processing in Relational Database Management Systems

Floratos, Sofoklis

Abstract Details

2020, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Increasingly more iterative and recursive query tasks are processed in data management systems, such as graph-structured data analytics, demanding fast response time. We have identified several critical issues that hinder high performance iterative data processing. First, the existing iterative SQL model has limitations to express certain important classes of iterations, such as aggregation-based tasks. PageRank is a representative example of this class. Second, even the existing model is extended; its implementation in database systems is non-trivial. Third, an iterative processing can be nested, such as processing of subqueries, demanding high computing power and memory space. The existing solution of un-nesting queries requires in most cases a customized approach to handle nested loop execution. For this reason, un-nesting algorithms have not been merged to mainstream database systems. In this dissertation, we address the above mentioned issues by extending the iteration execution model with effective implementations in database systems, and by developing parallel algorithms for nested loop execution of subqueries with effective implementation on GPU systems. All the experiment results show the effectiveness of our algorithms development efforts. Existing CTE-based recursive SQL and its implementation ineffectively respond to this intensive query processing with two major drawbacks. First, its iteration execution model is based on implicit set-oriented terminating conditions that cannot express aggregation-based tasks, such as PageRank. Second, its synchronous execution model cannot perform asynchronous computing to further accelerate execution in parallel. To address these two issues, we have designed and implemented SQLoop, a framework that extends the semantics of current SQL standard in order to accommodate iterative SQL queries. SQLoop interfaces between users and different database engines with two powerful components. First, it provides a uniform SQL expression for users to access any database engine so that they do not need to write database dependent SQL or move datasets from a target engine to process in their own sites. Second, SQLoop automatically parallelizes iterative queries that contain certain aggregate functions in both synchronous and asynchronous ways. More specifically, SQLoop is able to take advantage of intermediate results generated between different iterations and to prioritize the execution of partitions that accelerate the query processing. We have tested and evaluated SQLoop by using three popular database engines with real-world datasets and queries, and shown its effectiveness and high performance. Furthermore, we discuss how iterative SQL queries can be incorporated into a RDBMS without major intrusion to the system. The implementation is based on a functional rewrite that translates iterative queries to other existing SQL operators in the same way as SQLoop. However, the rewrite takes place inside the system and the iterative query is now a single query plan tree. Thus, it can be optimized, parallelized and executed by the engine without major modifications to the code base. Finally, iterative SQL queries is not the only structure that requires efficient iterative evaluation in relational database management systems. Nested SQL queries are an important part of SQL that allows users to express complex use-cases by using the output of a sub-query as an input to another. However, the baseline execution method is often expensive as it requires the execution of the nested part multiple times. Researchers have proposed various algorithms and techniques that un-nest subqueries to improve performance. Since this is a customized approach and needs high algorithmic and engineering efforts, it is largely not an open feature in most existing database systems. Our approach is general-purpose and GPU-acceleration based, aiming for both high performance and minimum development cost. We look into the major differences between nested and un-nested query structures to identify their merits and limits for GPU processing. Furthermore, we focus on the nested approach that is algorithmically simple and rich in parallels, in relatively low space complexity, and general purpose in program structure. We create a new code generation framework that best fits GPU for the nested method. We also make several critical system optimizations including massive parallel scanning with indexing, effective vectorization to optimize join operations, exploiting cache locality for loops and efficient GPU memory management. We have implemented the proposed solutions in NestGPU, a GPU-based column-store database system that is GPU device independent. We have extensively evaluated and tested the system to show the effectiveness of our proposed methods.
Xiaodong Zhang (Advisor)
Srinivasan Parthasarathy (Committee Member)
Yang Wang (Committee Member)

Recommended Citations

Citations

  • Floratos, S. (2020). High Performance Iterative Processing in Relational Database Management Systems [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1605909940057503

    APA Style (7th edition)

  • Floratos, Sofoklis. High Performance Iterative Processing in Relational Database Management Systems. 2020. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1605909940057503.

    MLA Style (8th edition)

  • Floratos, Sofoklis. "High Performance Iterative Processing in Relational Database Management Systems." Doctoral dissertation, Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1605909940057503

    Chicago Manual of Style (17th edition)