Judgment Post-Stratification with Machine Learning Techniques: Adjusting for Missing Data in Surveys and Data Mining

2013, Doctor of Philosophy, Ohio State University, Statistics.
Missing data is found in every type of data collection. How to deal with it has long been discussed in the survey sampling literature, but it has received little research attention for the huge data sets common in the data mining setting. In this dissertation, we combine ideas from the survey sampling and data mining literatures to develop methods for handling missing data in both contexts. Judgment Post-Stratification (JPS) is a data analysis method, motivated by ranked set sampling (RSS), that uses judgment ranking for post-stratification. This dissertation briefly introduces RSS and JPS, then connects the JPS method with machine learning (ML) techniques in two ways. The first is to use ML techniques to build a ranking function, thereby solving the judgment ranking problem. The second is to compare the estimates from the JPS method with those from well-known ML techniques and to provide efficiency measurements for the JPS method. Through simulation studies, we investigate the effect of set size, that is, the number of units ranked at one time. We also consider possible extensions of JPS, such as proportional proration. To our knowledge, we provide the first systematic study, using simulated data, of how three types of missing data influence various ML techniques. Finally, two real-life examples demonstrate the application of the JPS method to real-world problems.
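
To make the post-stratification step concrete, the following is a minimal sketch and not code from the dissertation. It assumes the commonly used JPS mean estimator (the average of the non-empty rank-stratum means) and uses a noisy covariate `x` as a stand-in for the ranking function; in the setting described above, that ranking could instead come from an ML model's predictions. The names `judgment_rank` and `jps_estimate`, and all simulated values, are hypothetical.

```python
import random
from statistics import mean


def judgment_rank(x_measured, x_comparison):
    """Rank (1 = smallest) of the measured unit within its set, judged by x."""
    return 1 + sum(1 for x in x_comparison if x < x_measured)


def jps_estimate(values, ranks, set_size):
    """Average the means of the non-empty rank strata (assumed JPS estimator)."""
    strata = {h: [] for h in range(1, set_size + 1)}
    for y, r in zip(values, ranks):
        strata[r].append(y)
    return mean(mean(ys) for ys in strata.values() if ys)


# Toy illustration with made-up data: y is the response, x a noisy proxy.
rng = random.Random(0)
set_size, n = 3, 30
values, ranks = [], []
for _ in range(n):
    y = rng.gauss(10.0, 2.0)
    x = y + rng.gauss(0.0, 1.0)
    # Comparison units: only x is observed; y is never measured for them.
    others = [rng.gauss(10.0, 2.0) + rng.gauss(0.0, 1.0)
              for _ in range(set_size - 1)]
    values.append(y)
    ranks.append(judgment_rank(x, others))

print("sample mean :", mean(values))
print("JPS estimate:", jps_estimate(values, ranks, set_size))
```

Here `set_size` plays the role of the set size whose effect the abstract says is studied by simulation: each measured unit is ranked among `set_size` units, and that rank defines its post-stratum.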
Elizabeth Stasny (Committee Co-Chair)
Tao Shi (Committee Co-Chair)
Omer Ozturk (Committee Member)
Aleix Martinez (Committee Member)

Recommended Citations

  • Chen, T. (2013). Judgment Post-Stratification with Machine Learning Techniques: Adjusting for Missing Data in Surveys and Data Mining [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1374213636

    APA Style (7th edition)

  • Chen, Tian. Judgment Post-Stratification with Machine Learning Techniques: Adjusting for Missing Data in Surveys and Data Mining. 2013. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1374213636.

    MLA Style (8th edition)

  • Chen, Tian. "Judgment Post-Stratification with Machine Learning Techniques: Adjusting for Missing Data in Surveys and Data Mining." Doctoral dissertation, Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1374213636

    Chicago Manual of Style (17th edition)