
Data Mining of Medical Datasets with Missing Attributes from Different Sources

Sajja, Sunitha


2010, Master of Science in Mathematics, Youngstown State University, Department of Mathematics and Statistics.
Two major problems in data mining are 1) dealing with missing values in the datasets used for knowledge discovery and 2) using one dataset as a predictor of other datasets. We explore these problems using four heart disease datasets from the UCI Machine Learning Repository, each collected from a different source and each with a different pattern of missing values. Each dataset contains 13 attributes and one class attribute that denotes the presence or absence of heart disease. Missing values were handled in three ways: first, by replacing them with the attribute mean (numeric attributes) or mode (nominal attributes); second, by removing the attributes that contain missing values; and third, by removing the records with more than 60 percent of their values missing and then filling the remaining missing values. We also experimented with several classification techniques on these medical datasets, including decision trees, Naive Bayes, and MultiLayerPerceptron, using the RapidMiner and Weka tools. The consistency of the datasets was assessed by combining them into a single dataset and comparing its classification error with the classification errors of the individual datasets. The results show that when few values are missing, mean and mode replacement works well. When many values are missing, removing the instances with more than 60 percent of their values missing, filling the remaining missing values, and applying additional preprocessing steps works better. Using one dataset as a predictor of another produced moderate accuracy.
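The three missing-value strategies described in the abstract can be sketched in plain Python. This is a minimal illustration, not the thesis's code (the thesis used RapidMiner and Weka). It assumes each record is a list of numbers with `None` marking a missing value, a parallel `numeric` flag list saying which attributes are numeric (mean) and which are nominal (mode), and a 60 percent cutoff applied as "drop if strictly more than 60 percent missing". The function names and the sample records are hypothetical.

```python
from statistics import mean, mode
from typing import List, Optional, Sequence, Tuple

Record = List[Optional[float]]  # None marks a missing value


def impute_mean_mode(records: Sequence[Record], numeric: Sequence[bool]) -> List[Record]:
    """Strategy 1: fill each gap with the column mean (numeric attribute)
    or the column mode (nominal attribute), using only observed values."""
    fills = []
    for j, is_numeric in enumerate(numeric):
        observed = [r[j] for r in records if r[j] is not None]
        if not observed:
            fills.append(None)  # nothing to impute from; leave the gap
        else:
            fills.append(mean(observed) if is_numeric else mode(observed))
    return [[fills[j] if v is None else v for j, v in enumerate(r)] for r in records]


def drop_incomplete_attributes(records: Sequence[Record]) -> Tuple[List[Record], List[int]]:
    """Strategy 2: keep only the attributes (columns) with no missing values.
    Also returns the indices of the kept columns."""
    keep = [j for j in range(len(records[0])) if all(r[j] is not None for r in records)]
    return [[r[j] for j in keep] for r in records], keep


def drop_sparse_records_then_impute(records, numeric, max_missing=0.6):
    """Strategy 3: drop records with more than `max_missing` of their values
    missing, then fill the remaining gaps with strategy 1."""
    kept = [r for r in records if sum(v is None for v in r) / len(r) <= max_missing]
    return impute_mean_mode(kept, numeric)


# Hypothetical toy data: 4 attributes, the last record is 75% missing.
records = [
    [1.0, 0.0, 5.0, 1.0],
    [3.0, None, 7.0, 0.0],
    [None, 0.0, 6.0, 1.0],
    [None, None, None, 1.0],
]
numeric = [True, False, True, False]

print(drop_sparse_records_then_impute(records, numeric))
# [[1.0, 0.0, 5.0, 1.0], [3.0, 0.0, 7.0, 0.0], [2.0, 0.0, 6.0, 1.0]]
```

Note that strategy 2 can discard most of the data when gaps are spread across many attributes (here only the last column survives), which is one reason a record-level cutoff followed by imputation can be preferable when many values are missing.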
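Using one dataset as a predictor of another amounts to training a classifier on one source and measuring its error on each of the others. The sketch below shows that evaluation loop with a small from-scratch categorical Naive Bayes classifier using add-one (Laplace) smoothing. It is an illustrative stand-in, not the Weka or RapidMiner classifiers used in the thesis; the site names, attribute values, and function names are all hypothetical, and the attributes are assumed to be nominal with missing values already filled.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Tuple[Tuple[str, ...], str]  # (nominal attribute values, class label)


def train_nb(rows: List[Row]):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values seen
    for features, label in rows:
        for j, v in enumerate(features):
            value_counts[(j, label)][v] += 1
            values_seen[j].add(v)
    return class_counts, value_counts, values_seen


def predict_nb(model, features) -> str:
    """Return the class with the highest smoothed log-posterior."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)
        for j, v in enumerate(features):
            k = len(values_seen[j]) + 1  # +1 reserves probability for unseen values
            score += math.log((value_counts[(j, label)][v] + 1) / (n_c + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def error_rate(model, rows: List[Row]) -> float:
    wrong = sum(predict_nb(model, f) != y for f, y in rows)
    return wrong / len(rows)


def cross_dataset_errors(datasets: Dict[str, List[Row]]) -> Dict[Tuple[str, str], float]:
    """Train on each dataset and report the error on every other dataset."""
    errors = {}
    for train_name, train_rows in datasets.items():
        model = train_nb(train_rows)
        for test_name, test_rows in datasets.items():
            if test_name != train_name:
                errors[(train_name, test_name)] = error_rate(model, test_rows)
    return errors


# Hypothetical toy sources with two nominal attributes each.
site_1 = [(("typical", "high"), "present"), (("typical", "high"), "present"),
          (("none", "low"), "absent"), (("none", "low"), "absent")]
site_2 = [(("typical", "high"), "present"), (("none", "low"), "absent"),
          (("none", "low"), "present")]

errors = cross_dataset_errors({"site_1": site_1, "site_2": site_2})
print(errors)
# {('site_1', 'site_2'): 0.3333333333333333, ('site_2', 'site_1'): 0.5}
```

The resulting train/test error matrix is asymmetric in general, since each source has its own class balance and attribute distributions; comparing it with within-source error is one way to judge how consistent the sources are.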
John Sullins, PhD (Advisor)
Alina Lazar, PhD (Committee Member)
Jamal Tartir, PhD (Committee Member)
29 p.

Recommended Citations


    APA Style (7th edition)

  • Sajja, S. (2010). Data Mining of Medical Datasets with Missing Attributes from Different Sources [Master's thesis, Youngstown State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1300298263

    MLA Style (8th edition)

  • Sajja, Sunitha. Data Mining of Medical Datasets with Missing Attributes from Different Sources. 2010. Youngstown State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ysu1300298263.

    Chicago Manual of Style (17th edition)

  • Sajja, Sunitha. "Data Mining of Medical Datasets with Missing Attributes from Different Sources." Master's thesis, Youngstown State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1300298263