Correlated Sample Synopsis on Big Data

2018, Master of Computing and Information Systems, Youngstown State University, Department of Computer Science and Information Systems.
Correlated Sample Synopsis (CS2) has proven valuable for query estimation on centralized databases but has yet to be tested on big data. As the volume of accumulated data grows rapidly, scalable query estimation and approximate query processing are becoming necessary for large databases. Query estimates based on a Simple Random Sample Without Replacement (SRSWOR) have extremely high relative errors for join queries. Existing methods, such as Join Synopses, work well only for foreign-key joins, and their sample size can grow dramatically as the dataset gets larger. This research aims to show that CS2 can reduce query response times, give precise join query estimates, and minimize storage costs on big data. In addition, this research extends the correlated sampling techniques and estimation methods of CS2 to big data environments in which no index is present. Extensive experiments with large TPC-H datasets in Apache Hive show that CS2 produces fast and accurate query estimates on big data.
Feng Yu, PhD (Advisor)
John Sullins, PhD (Committee Member)
Yong Zhang, PhD (Committee Member)
44 p.
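The abstract contrasts SRSWOR-based join estimation, which has high relative error, with correlated sampling. The Python sketch below compares the two on a toy two-table equi-join. It is a minimal illustration under stated assumptions, not the thesis's CS2 implementation. It assumes the basic correlated-sampling idea: draw an SRSWOR sample from a source relation R, keep every tuple of S that joins with a sampled R tuple, and scale the result by |R| / n_R. Tuples are reduced to their join keys. The function names and toy data are hypothetical. CS2's multi-table join paths, its other estimators, and the Apache Hive setting are not modeled.

```python
import random
from collections import Counter


def exact_join_size(r, s):
    """Exact size of the equi-join of R and S, where each tuple is just its join key."""
    s_counts = Counter(s)
    return sum(s_counts[key] for key in r)


def srswor_join_estimate(r, s, fraction, rng):
    """Join independent SRSWOR samples of R and S, then scale up by both sampling ratios."""
    n_r = max(1, int(len(r) * fraction))
    n_s = max(1, int(len(s) * fraction))
    r_sample = rng.sample(r, n_r)
    s_sample = rng.sample(s, n_s)
    scale = (len(r) / n_r) * (len(s) / n_s)
    return exact_join_size(r_sample, s_sample) * scale


def correlated_join_estimate(r, s, fraction, rng):
    """Sample only the source relation R, keep every S tuple that joins with a
    sampled R tuple (the correlated tuples), and scale up by |R| / n_R."""
    n_r = max(1, int(len(r) * fraction))
    r_sample = rng.sample(r, n_r)
    sampled_keys = set(r_sample)
    correlated_s = [key for key in s if key in sampled_keys]
    return exact_join_size(r_sample, correlated_s) * (len(r) / n_r)


def mean_relative_error(estimator, r, s, fraction=0.01, trials=200, seed=42):
    """Average |estimate - true| / true over repeated, seeded trials."""
    rng = random.Random(seed)
    true_size = exact_join_size(r, s)
    errors = [abs(estimator(r, s, fraction, rng) - true_size) / true_size
              for _ in range(trials)]
    return sum(errors) / trials


if __name__ == "__main__":
    # Hypothetical toy data: each key in R appears 10 times;
    # key k in S appears (k % 20) + 1 times.
    R = [i % 1000 for i in range(10_000)]
    S = [k for k in range(1000) for _ in range(k % 20 + 1)]
    print("exact join size:", exact_join_size(R, S))  # 105000
    for name, est in [("SRSWOR", srswor_join_estimate),
                      ("correlated", correlated_join_estimate)]:
        print(f"{name:>10} mean relative error: {mean_relative_error(est, R, S):.3f}")
```

With a 1% sampling fraction on this toy data, the independent samples form about 10,500 tuple pairs, and only about 10 of them are expected to match. The scaled-up count therefore swings widely from trial to trial. The correlated sample keeps every join partner of each sampled R key, so its only remaining variance comes from which R tuples were drawn.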

Recommended Citations

  • Wilson, D. S. (2018). Correlated Sample Synopsis on Big Data [Master's thesis, Youngstown State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1544264480082086

    APA Style (7th edition)

  • Wilson, David. Correlated Sample Synopsis on Big Data. 2018. Youngstown State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ysu1544264480082086.

    MLA Style (8th edition)

  • Wilson, David. "Correlated Sample Synopsis on Big Data." Master's thesis, Youngstown State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1544264480082086

    Chicago Manual of Style (17th edition)