Files
ucin1330024393.pdf (2.97 MB)
Data Quality Assessment Methodology for Improved Prognostics Modeling
Author Info
Chen, Yan
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=ucin1330024393
Abstract Details
Year and Degree
2012, PhD, University of Cincinnati, Engineering and Applied Science: Mechanical Engineering.
Abstract
Interest in Prognostics and Health Management (PHM) techniques is growing in the automotive, renewable energy, and petrochemical industries. Considerable time and effort are spent acquiring large amounts of data from which information about system behavior is expected to be extracted. However, many data quality issues hinder the conversion of data to information, for example signal noise caused by hardware errors and disturbances, redundant or incomplete features, and outlier instances introduced during data preparation. Data sets with these issues not only waste time and money but also paralyze further PHM development. Although a large number of data mining techniques have been developed to cope with similar issues in clinical research, image processing, and other areas, the PHM field has few systematic methods to guarantee that the collected data will be sufficient to model multiple system failure modes or their degradation mechanisms. This motivates the search for a systematic data quality evaluation and improvement methodology that builds on data mining techniques. The goal of this dissertation is to establish methods to evaluate and improve the quality of the training data used for system health diagnostic modeling. Inspired by spectral graph clustering techniques, a set of methods is proposed to evaluate training data quality and to improve it by filtering out outlier instances and refining the feature selection process. In the proposed quality evaluation method, the inherent cluster structures of the data are first revealed. Then, because these structures are ideally to be used as data models of system behavior, their fitness as independent clusters and their separation from one another are quantitatively measured by a set of selected metrics.
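The idea of scoring how well each cluster holds together and how well it is separated from the others can be illustrated with a small sketch. The abstract does not name the metrics the dissertation selects, so the silhouette coefficient is used here purely as a stand-in; the function names, the toy points, and the labels are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_scores(points, labels):
    """Per-instance silhouette values in [-1, 1].

    Stand-in metric (an assumption, not the dissertation's own choice):
    a = mean distance to the other members of the same cluster (compactness),
    b = smallest mean distance to the members of any other cluster (separation).
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return scores


def mean(values):
    return sum(values) / len(values)


# Hypothetical toy data: two well-separated groups of three instances.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good_labels = [0, 0, 0, 1, 1, 1]   # labels that match the structure
bad_labels = [0, 1, 0, 1, 0, 1]    # labels that cut across it

good = silhouette_scores(points, good_labels)
bad = silhouette_scores(points, bad_labels)
print("mean silhouette, matching labels:", round(mean(good), 3))
print("mean silhouette, mixed labels:   ", round(mean(bad), 3))
```

A high mean score suggests compact, well-separated clusters, while a low or negative score suggests that the labeled structure does not fit the data.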
To improve data quality, two methods are proposed. First, a filtering method detects outliers by analyzing two graphical objects constructed over the data instances. Local Outlier Factors (LOFs) are also calculated for the discovered outlier candidates to quantify and rank their outlier-ness. Second, a feature-ranking-based optimization method selects the optimal feature set for the best data structure formulation. All proposed data improvement methods use the concept of the graph Laplacian, for example the data filtering method based on nonlinear Laplacian embedding and the Ratio-Laplacian score for feature ranking. Besides typical data mining test data sets, two experimental data sets from real applications provided by IMS member companies were used to validate the performance of the proposed methods. Some popular methods are also compared with the proposed methods in terms of performance and accuracy. The study shows that, when handling nonlinear factors, the proposed methods have competitive advantages over Principal Component Analysis in terms of space embedding and over Information Gain as a feature ranking criterion.
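The LOF ranking step mentioned above can be sketched as follows. This is a minimal illustration of the standard Local Outlier Factor definition, under several assumptions: it uses plain Euclidean distances rather than the graph objects of the dissertation, it scores every instance rather than only pre-selected outlier candidates, and the function names, the neighborhood size `k`, and the toy points are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_outlier_factors(points, k=3):
    """Standard LOF for every point; values well above 1 suggest outliers."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    dist = [[euclidean(p, q) for q in points] for p in points]

    # k-distance and k-neighborhood (ties at the k-distance are included).
    k_dist, neighbors = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]
        k_dist.append(kd)
        neighbors.append([j for j in order if dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(math.inf if mean_reach == 0 else 1.0 / mean_reach)

    # LOF: mean ratio of the neighbors' densities to the point's own density.
    lof = []
    for i in range(n):
        if math.isinf(lrd[i]):
            lof.append(1.0)  # simplification for exact duplicates
            continue
        lof.append(sum(lrd[j] / lrd[i] for j in neighbors[i]) / len(neighbors[i]))
    return lof


# Hypothetical toy data: a tight group of five instances and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (8.0, 8.0)]
lof = local_outlier_factors(points, k=3)

# Rank instances from most to least outlying.
ranked = sorted(range(len(points)), key=lambda i: lof[i], reverse=True)
for i in ranked:
    print(points[i], round(lof[i], 3))
```

Instances inside the dense group receive scores close to 1, while the distant point receives a much larger score and therefore heads the ranking.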
Committee
Jay Lee, PhD (Committee Chair)
Radu Pavel, PhD (Committee Member)
Hongdao Huang, PhD (Committee Member)
Manish Kumar, PhD (Committee Member)
Pages
105 p.
Subject Headings
Industrial Engineering
Keywords
Prognostics and Health Management; Data Quality; Laplacian Eigenmap; diagnostic modeling; Feature Ranking; Outlier Detection
Recommended Citations
APA Style (7th edition)
Chen, Y. (2012). Data Quality Assessment Methodology for Improved Prognostics Modeling [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1330024393
MLA Style (8th edition)
Chen, Yan. Data Quality Assessment Methodology for Improved Prognostics Modeling. 2012. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1330024393.
Chicago Manual of Style (17th edition)
Chen, Yan. "Data Quality Assessment Methodology for Improved Prognostics Modeling." Doctoral dissertation, University of Cincinnati, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1330024393
Document number:
ucin1330024393
Download Count:
866
Copyright Info
© 2012, all rights reserved.
This open access ETD is published by University of Cincinnati and OhioLINK.