Performance Characterization and Improvements of SQL-On-Hadoop Systems

Kulkarni, Kunal Vikas

2016, Master of Science, Ohio State University, Computer Science and Engineering.
Impala and Hive bring SQL technologies to Hadoop systems, enabling users to run analytics queries against data stored in HDFS and Apache HBase without requiring data movement or transformation. In this work we characterize BigDataBench SQL workloads in Impala as I/O, communication, or compute intensive. We perform detailed profiling and analysis of query execution in Impala to understand the performance of SQL queries. From this analysis we observe that the performance of Inner Join queries in Impala can be improved, since the existing Join implementation is blocking. This work implements a non-blocking Join in which reading the right-side table of the Join and building its hash table are overlapped with construction of the left-side table data. Experimental results show that the non-blocking Join implementation improves the execution of Join queries by 9-12%. Next, a scalability study of Impala is performed to evaluate how well Impala scales out for different SQL queries as the number of compute nodes increases. We observe that the default Inner Join SQL query does not scale well, since Impala does a broadcast Join by default. We change the default Inner Join in Impala to a partitioned/shuffle Join, and the results show that it scales linearly. We then evaluate Hive SQL queries running on top of Triple-H, an RDMA (Remote Direct Memory Access) based HDFS that is optimized for HDFS writes. We design new write-intensive SQL benchmark queries, and the experimental results show that Triple-H brings a 45% benefit to write-intensive queries and a 25% benefit to a read-intensive query in Hive. In another scheme we evaluate querying HBase tables in Hive running on top of Triple-H, and we see a 20-33% benefit for write-intensive queries and a 15% benefit for a read-intensive query. With these results we demonstrate improvements to SQL queries on Hadoop systems.
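The overlap idea behind the non-blocking Join can be illustrated with a small sketch. This is not the thesis code and not Impala's implementation (Impala's backend is written in C++); it is a toy hash join in Python, under the assumption that rows are plain dictionaries, and every name in it (`build_hash_table`, `scan_left`, `overlapped_inner_join`) is hypothetical. It only shows the control flow: the right-side hash table is built while the left-side rows are being prepared, instead of one step waiting for the other.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_hash_table(right_rows, key):
    """Build side: index the right-side rows by their join key."""
    table = defaultdict(list)
    for row in right_rows:
        table[row[key]].append(row)
    return table


def scan_left(left_rows):
    """Probe side: materialize left-side rows (stands in for a table scan)."""
    return list(left_rows)


def overlapped_inner_join(left_rows, right_rows, left_key, right_key):
    """Inner join where the right-side build overlaps the left-side scan.

    Hypothetical illustration only: both tasks are submitted at once, and
    probing starts after both have finished.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        build_future = pool.submit(build_hash_table, right_rows, right_key)
        left_future = pool.submit(scan_left, left_rows)
        table = build_future.result()
        left = left_future.result()

    # Probe phase: emit one merged row per matching (left, right) pair.
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in table.get(l_row[left_key], [])
    ]


if __name__ == "__main__":
    orders = [
        {"order_id": 1, "cust": "a"},
        {"order_id": 2, "cust": "b"},
        {"order_id": 3, "cust": "z"},  # no matching customer, so dropped
    ]
    customers = [
        {"cust_id": "a", "name": "Ann"},
        {"cust_id": "b", "name": "Bob"},
    ]
    for row in overlapped_inner_join(orders, customers, "cust", "cust_id"):
        print(row)
```

In CPython the two threads do not run CPU-bound work truly in parallel, so this sketch shows the structure of the overlap rather than its performance effect; the 9-12% figure above comes from the thesis's measurements in Impala, not from code like this.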
Dhabaleswar Panda, Dr (Advisor)
P Sadayappan, Dr (Committee Member)
Xiaoyi Lu, Dr (Committee Member)
46 p.

Recommended Citations

  • Kulkarni, K. V. (2016). Performance Characterization and Improvements of SQL-On-Hadoop Systems [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1469147027

    APA Style (7th edition)

  • Kulkarni, Kunal. Performance Characterization and Improvements of SQL-On-Hadoop Systems. 2016. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1469147027.

    MLA Style (8th edition)

  • Kulkarni, Kunal. "Performance Characterization and Improvements of SQL-On-Hadoop Systems." Master's thesis, Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1469147027

    Chicago Manual of Style (17th edition)