Files
ysu1300298263.pdf (742.09 KB)
Data Mining of Medical Datasets with Missing Attributes from Different Sources
Author Info
Sajja, Sunitha
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=ysu1300298263
Abstract Details
Year and Degree
2010, Master of Science in Mathematics, Youngstown State University, Department of Mathematics and Statistics.
Abstract
Two major problems in data mining are 1) dealing with missing values in the datasets used for knowledge discovery, and 2) using one dataset as a predictor of other datasets. We explore these problems using four datasets from the UCI Machine Learning Repository, which come from four different sources and have different missing values. Each dataset contains 13 attributes and one class attribute that denotes the presence or absence of heart disease. Missing values were handled in three ways: first, by the standard mean and mode method; second, by removing the attributes that contain missing values; and third, by removing the records in which more than 60 percent of the values are missing and filling in the remaining missing values. We also experimented with different classification techniques, including decision tree, Naive Bayes, and MultiLayerPerceptron, on the medical datasets using the RapidMiner and Weka tools. The consistency of the datasets was assessed by combining them and comparing the results on the combined dataset with the classification error on the individual datasets. The results show that when few values are missing, the mean and mode method works well. When more values are missing, removing the instances with more than 60 percent of their values missing and filling in the remaining missing values, along with other preprocessing steps, works better. Using one dataset as a predictor of another dataset produced moderate accuracy.
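The third strategy in the abstract (drop records with more than 60 percent of values missing, then fill the rest by mean and mode) can be sketched as follows. This is a minimal illustration, not the thesis's RapidMiner or Weka workflow: the function names, the use of `None` as the missing-value marker, the choice of which columns count as numeric, and the toy records are all assumptions made for the example.

```python
from statistics import mean, mode

MISSING = None  # assumption: missing values are encoded as None


def drop_sparse_records(rows, max_missing_fraction=0.6):
    """Keep records whose fraction of missing values does not exceed the threshold."""
    kept = []
    for row in rows:
        missing = sum(1 for value in row if value is MISSING)
        if missing / len(row) <= max_missing_fraction:
            kept.append(row)
    return kept


def impute_mean_mode(rows, numeric_columns):
    """Fill missing values with the column mean (numeric) or column mode (categorical)."""
    fill = []
    for col in range(len(rows[0])):
        observed = [row[col] for row in rows if row[col] is not MISSING]
        if not observed:
            fill.append(MISSING)  # no observed value to learn from; leave missing
        elif col in numeric_columns:
            fill.append(mean(observed))
        else:
            fill.append(mode(observed))
    return [
        [fill[col] if value is MISSING else value for col, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy records: (age, sex, cholesterol, chest pain type, class label).
rows = [
    [63, "male", 233, "typical", 1],
    [None, "male", None, "typical", 0],
    [54, None, 250, "atypical", 1],
    [None, None, None, None, 0],  # 80% missing, so this record is dropped
]

cleaned = impute_mean_mode(drop_sparse_records(rows), numeric_columns={0, 2})
for record in cleaned:
    print(record)
```

Filtering before imputing means that the column means and modes are computed only from records that carry enough information, which is one plausible reading of the ordering described in the abstract.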
Committee
John Sullins, PhD (Advisor)
Alina Lazar, PhD (Committee Member)
Jamal Tartir, PhD (Committee Member)
Pages
29 p.
Subject Headings
Computer Science
Keywords
data mining; missing attributes; data classification; outliers
Recommended Citations
APA Style (7th edition)
Sajja, S. (2010). Data Mining of Medical Datasets with Missing Attributes from Different Sources [Master's thesis, Youngstown State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1300298263
MLA Style (8th edition)
Sajja, Sunitha. Data Mining of Medical Datasets with Missing Attributes from Different Sources. 2010. Youngstown State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ysu1300298263.
Chicago Manual of Style (17th edition)
Sajja, Sunitha. "Data Mining of Medical Datasets with Missing Attributes from Different Sources." Master's thesis, Youngstown State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1300298263
Document number: ysu1300298263
Download Count: 1,782
© 2010, all rights reserved.
This open access ETD is published by Youngstown State University and OhioLINK.