43050.pdf (6.45 MB)
A Novel Data Imbalance Methodology Using a Class Ordered Synthetic Oversampling Technique
Author Info
Pahren, Laura
ORCID® Identifier
http://orcid.org/0000-0001-9648-9629
Permalink: http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659533502197337
Abstract Details
Year and Degree
2022, PhD, University of Cincinnati, Engineering and Applied Science: Mechanical Engineering.
Abstract
Data quantity is a paramount issue in many industries, as data collection can be constrained by several limiting factors. Another common issue lies in data quality, specifically data imbalance, which is a topic of interest because it can introduce bias into the models themselves. For many applications, imbalanced data sets, in which there is a clear disproportion in the distribution between the classes, are unavoidable. Researchers have addressed the data imbalance issue with many approaches. However, some unique applications in both manufacturing and medicine have provided an opportunity for a new methodology that leverages existing resampling techniques together with class information that has not previously been utilized. The class information in these applications lies in the ordinal class outputs: there is a clear natural order to the classes, but not necessarily a linear relationship between them (which would otherwise have allowed for regression-based methodologies). Further, these applications have data availability challenges: either data cannot be forcibly generated, or inducing failure is possible but producing a particular degree of failure is extremely difficult. The scope of this work is to present a designed, systematic methodology to address severe data imbalance in ordinal output classification problems. These applications have led to the proposed methodology for sparse minority class datasets, which covers two different data types: waveform data and image data. The core novelty of the methodology is a new oversampling technique, the Class Ordered Synthetic Minority Oversampling Technique, in which synthetic data are generated to better define the boundaries between classes, thereby increasing performance on out-of-sample data. For each data type, waveform and image, a proposed validation of the synthetically generated data points is also presented.
This methodology and approach are demonstrated in four case studies: defining intracranial pressure (ICP) burden with respect to patient outcomes, translating qualitative electrocorticography (ECoG) characteristics into quantitative features, increasing the accuracy of manufacturing quality inspections, and a multi-component manufacturing quality inspection. The latter two use image data rather than waveform data.
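For readers unfamiliar with synthetic oversampling, the sketch below illustrates the general idea behind SMOTE-style resampling: new minority-class points are created by interpolating between an existing minority sample and one of its nearest minority-class neighbours. This is only the standard interpolation step, not the Class Ordered variant proposed in the dissertation, whose details the abstract does not give. The function name `smote_like_oversample`, its parameters, and the toy data are hypothetical and chosen purely for illustration.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Generic SMOTE-style interpolation (illustrative; not the thesis method).

    X_min : array of shape (n_samples, n_features) holding one minority class.
    n_new : number of synthetic points to generate.
    k     : number of nearest minority-class neighbours to sample from.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class.
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is never its own neighbour

    # Indices of the k nearest neighbours of every minority sample.
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    pick = rng.integers(0, k, size=n_new)
    nbr = neighbours[base, pick]

    # Place the synthetic point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


# Toy minority class: the four corners of the unit square (assumed data).
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like_oversample(X_min, n_new=5)
print(synthetic.shape)
```

Because every synthetic point is a convex combination of two real minority samples, it always lies on the segment between them. An ordinal-aware method would presumably also use the class ordering when choosing where to place new points, but how it does so is left to the full dissertation.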
Committee
Jay Lee, Ph.D. (Committee Member)
Jay Kim, Ph.D. (Committee Member)
Paul Kawka, Ph.D. (Committee Member)
Xiaodong Jia, Ph.D. (Committee Member)
Pages
188 p.
Subject Headings
Mechanical Engineering
Keywords
Class Imbalance; Classification; Data Resampling; Oversampling
Recommended Citations
Pahren, L. (2022). A Novel Data Imbalance Methodology Using a Class Ordered Synthetic Oversampling Technique [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659533502197337
APA Style (7th edition)
Pahren, Laura. A Novel Data Imbalance Methodology Using a Class Ordered Synthetic Oversampling Technique. 2022. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659533502197337.
MLA Style (8th edition)
Pahren, Laura. "A Novel Data Imbalance Methodology Using a Class Ordered Synthetic Oversampling Technique." Doctoral dissertation, University of Cincinnati, 2022. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659533502197337
Chicago Manual of Style (17th edition)
Document number: ucin1659533502197337
Download Count: 206
Copyright Info
© 2022, all rights reserved.
This open access ETD is published by University of Cincinnati and OhioLINK.