Evaluating Query Estimation Errors Using Bootstrap Sampling

2021, Master of Computing and Information Systems, Youngstown State University, Department of Computer Science and Information Systems.
Big data embodies a massive amount of knowledge, and many businesses now rely on mining it to forecast the viability of future business operations. Mining big data can require a large investment of time, and sampling is one of the most preferred ways to reduce that cost. When queries are answered from a sample, evaluating the quality of the query estimates (e.g., the query prediction error) is crucial for meaningful results. The main method used in the past to solve this problem is based on bootstrap sampling, but existing work typically makes strong dataset assumptions that may not apply to real-world datasets. This research aims to evaluate query estimation errors using the bootstrap sampling method. Of the different kinds of bootstrap methods, we use non-parametric bootstrap sampling to calculate the error distribution of the queries that we chose, and then calculate confidence intervals to determine the hit ratio. Although bootstrap sampling is one of the main approaches for finding the error in statistical estimates, it is computationally expensive on large data. To address this problem, we test both memory and disk as storage for optimizing bootstrap sampling. Furthermore, we test two different total numbers of bootstrap samples (B = 2000 and B = 200) to see whether bootstrap computation can be reduced while keeping the results reliable. In the experiments, we use three data sizes (100 MB, 1 GB, and 10 GB) and three sampling ratios (0.1%, 0.5%, and 1%) to analyze data generated with the TPC-H benchmark in terms of accuracy and performance. The results demonstrate that the hit ratios are very high even with a 0.1% sampling ratio, and that the optimization strategies reduced the bootstrap sample computation time adequately.
Feng Yu, PhD (Advisor)
John R. Sullins, PhD (Committee Member)
Yong Zhang, PhD (Committee Member)
39 p.
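
The procedure summarized in the abstract (draw a small sample, resample it B times with replacement, build a confidence interval, and check how often the interval covers the true query answer) can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes an AVG-style query over a list of numbers, a percentile confidence interval, and that "hit ratio" means the fraction of intervals containing the true answer; the function names `bootstrap_ci` and `hit_ratio` and all parameter values are hypothetical.

```python
import random
from statistics import fmean


def bootstrap_ci(sample, estimator, num_resamples=200, confidence=0.95, rng=None):
    """Percentile bootstrap confidence interval for estimator(sample).

    Non-parametric: each resample draws len(sample) items from the
    sample itself, with replacement, and no distribution is assumed.
    """
    rng = rng or random.Random()
    n = len(sample)
    estimates = sorted(
        estimator(rng.choices(sample, k=n)) for _ in range(num_resamples)
    )
    alpha = 1.0 - confidence
    lo_idx = int((alpha / 2) * (num_resamples - 1))
    hi_idx = int((1 - alpha / 2) * (num_resamples - 1))
    return estimates[lo_idx], estimates[hi_idx]


def hit_ratio(population, estimator, sampling_ratio, trials, num_resamples, seed=0):
    """Fraction of trials whose bootstrap interval contains the true answer.

    Assumption: this is how "hit ratio" is defined; the abstract does
    not spell out the definition.
    """
    rng = random.Random(seed)
    truth = estimator(population)  # exact answer on the full data
    n = max(1, int(len(population) * sampling_ratio))
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, n)  # sample without replacement
        lo, hi = bootstrap_ci(sample, estimator, num_resamples, rng=rng)
        hits += lo <= truth <= hi
    return hits / trials


if __name__ == "__main__":
    # Synthetic stand-in for a numeric column; not TPC-H data.
    data_rng = random.Random(42)
    column = [data_rng.uniform(0, 100) for _ in range(20_000)]
    ratio = hit_ratio(column, fmean, sampling_ratio=0.01,
                      trials=20, num_resamples=200)
    print(f"hit ratio at 1% sampling, B=200: {ratio:.2f}")
```

The cost of this sketch grows with both the sample size and the number of resamples B, which is why lowering B (for example from 2000 to 200, as tested in the thesis) is a natural lever for reducing computation, provided the intervals remain reliable.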

Recommended Citations

  • Cal, S. (2021). Evaluating Query Estimation Errors Using Bootstrap Sampling [Master's thesis, Youngstown State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1627358871966099

    APA Style (7th edition)

  • Cal, Semih. Evaluating Query Estimation Errors Using Bootstrap Sampling. 2021. Youngstown State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ysu1627358871966099.

    MLA Style (8th edition)

  • Cal, Semih. "Evaluating Query Estimation Errors Using Bootstrap Sampling." Master's thesis, Youngstown State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1627358871966099

    Chicago Manual of Style (17th edition)