Correlated Sample Synopsis on Big Data
Author Info
Wilson, David S
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=ysu1544264480082086
Year and Degree
2018, Master of Computing and Information Systems, Youngstown State University, Department of Computer Science and Information Systems.
Abstract
Correlated Sample Synopsis (CS2) has proven valuable for query estimation on centralized databases but has yet to be tested on big data. With the total volume of data growing at an alarming rate, scalable query estimation and approximate query processing are becoming necessary for large databases. Query estimations based on a Simple Random Sample Without Replacement (SRSWOR) return results with extremely high relative errors for join queries. Existing methods, such as Join Synopses, work well only with foreign key joins, and their sample size can grow dramatically as the dataset gets larger. This research aims to show that, on big data, CS2 can reduce query response times, give precise join query estimations, and minimize storage costs. In addition, it extends the correlated sampling techniques and estimation methods of CS2 to big data environments where no index is present. Extensive experiments with large TPC-H datasets in Apache Hive show that CS2 produces fast and accurate query estimations on big data.
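The contrast the abstract draws between SRSWOR-based join estimation and correlated sampling can be illustrated with a small, self-contained sketch. This is not the CS2 implementation or estimator from the thesis. The function names, the toy "orders"/"lineitems" data, and the scaling rules are assumptions chosen for illustration. Independent SRSWOR samples each table separately and scales the joined sample by (N_R·N_S)/(n_R·n_S). The correlated variant samples only a source table R and then keeps every S tuple that joins with a sampled R tuple, so each sampled R tuple contributes its full match count before scaling by N_R/n_R.

```python
import random
from collections import Counter
from typing import Hashable, Sequence


def exact_join_size(r_keys: Sequence[Hashable], s_keys: Sequence[Hashable]) -> int:
    """True equi-join size: number of (r, s) pairs with equal join keys."""
    s_counts = Counter(s_keys)
    return sum(s_counts[k] for k in r_keys)


def srswor_join_estimate(r_keys, s_keys, n_r, n_s, rng):
    """Hypothetical baseline: independent SRSWOR of each table, join the two
    samples, and scale by (N_R * N_S) / (n_r * n_s)."""
    r_sample = rng.sample(list(r_keys), n_r)
    s_sample = rng.sample(list(s_keys), n_s)
    scale = (len(r_keys) * len(s_keys)) / (n_r * n_s)
    return exact_join_size(r_sample, s_sample) * scale


def correlated_join_estimate(r_keys, s_keys, n_r, rng):
    """Hypothetical correlated-sampling sketch: SRSWOR on source table R only,
    then keep ALL S tuples whose key matches a sampled R tuple, and scale the
    joined result by N_R / n_r."""
    r_sample = rng.sample(list(r_keys), n_r)
    sampled_keys = set(r_sample)
    # Correlated tuples: every S tuple that joins with some sampled R tuple.
    correlated_s = [k for k in s_keys if k in sampled_keys]
    return exact_join_size(r_sample, correlated_s) * len(r_keys) / n_r


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: 1,000 order keys, each matched by 4 lineitem rows.
    orders = list(range(1000))
    lineitems = [k for k in orders for _ in range(4)]
    print("exact:", exact_join_size(orders, lineitems))
    print("independent SRSWOR:", srswor_join_estimate(orders, lineitems, 50, 200, rng))
    print("correlated:", correlated_join_estimate(orders, lineitems, 50, rng))
```

On this toy data every order has exactly four matching lineitems. The correlated estimate is therefore exactly 4000.0 for any sample of orders. The independent-SRSWOR estimate is unbiased but often far off, because few sampled lineitems happen to match the sampled orders. This is only a sketch of the intuition and does not reproduce the thesis's Hive-based experiments.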
Committee
Feng Yu, PhD (Advisor)
John Sullins, PhD (Committee Member)
Yong Zhang, PhD (Committee Member)
Pages
44 p.
Subject Headings
Computer Science
Keywords
CS2; Big Data; Simple Random Sample Without Replacement; Join Synopses
Recommended Citations
APA Style (7th edition)
Wilson, D. S. (2018). Correlated Sample Synopsis on Big Data [Master's thesis, Youngstown State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1544264480082086
MLA Style (8th edition)
Wilson, David. Correlated Sample Synopsis on Big Data. 2018. Youngstown State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ysu1544264480082086.
Chicago Manual of Style (17th edition)
Wilson, David. "Correlated Sample Synopsis on Big Data." Master's thesis, Youngstown State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1544264480082086
Document number:
ysu1544264480082086
Copyright Info
© 2018, all rights reserved.
This open access ETD is published by Youngstown State University and OhioLINK.