A Novel Data Imbalance Methodology Using a Class Ordered Synthetic Oversampling Technique

2022, PhD, University of Cincinnati, Engineering and Applied Science: Mechanical Engineering.
Data quantity can be a paramount issue in many industries, as data collection can be constrained by several factors. Another common issue lies in data quality, specifically data imbalance, which is of interest because it can introduce bias into the models themselves. For many applications, imbalanced data sets are unavoidable: there is a clear disproportion in the distribution between the classes. Researchers have addressed the data imbalance issue with many approaches. However, some unique applications in both manufacturing and medicine have provided an opportunity for a new methodology that combines existing resampling techniques with class information that has not previously been utilized. The class information in these applications lies in the ordinal class outputs, where there is a clear natural order to the classes but not necessarily a linear relationship between them (which would otherwise have allowed for regression-based methodologies). Further, these applications have data availability challenges: data cannot be forcibly generated, or inducing failure is possible but producing a given degree of failure is extremely difficult. The scope of this work is to present a designed, systematic methodology to address severe data imbalance in ordinal output classification problems. Examining these applications has led to the proposed methodology for sparse minority class datasets of two different data types: waveform data and image data. The core novelty of this methodology is a new oversampling technique, the Class Ordered Synthetic Minority Oversampling Technique, in which synthetic data are generated to better define the boundaries between classes, thereby increasing performance on out-of-sample data. For each data type, waveform and image, a proposed validation of the synthetically generated data points is also presented.
This methodology and approach are demonstrated in four case studies: defining intracranial pressure (ICP) burden with respect to patient outcomes, translating qualitative electrocorticography (ECoG) characteristics into quantitative features, increasing the accuracy of manufacturing quality inspections, and a multi-component manufacturing quality inspection. The latter two use image data rather than waveform data.
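The abstract does not describe how the Class Ordered Synthetic Minority Oversampling Technique generates its samples, so the sketch below shows only the generic interpolation idea behind SMOTE-style oversampling, which the abstract names as an existing resampling family. It is not the author's method and does not use class order. The function name `smote_like_oversample` and its parameters `n_new`, `k`, and `seed` are hypothetical, chosen for illustration.

```python
import numpy as np


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Generic SMOTE-style interpolation (illustrative only, not the thesis method).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbors.
    """
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbors than other samples
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances; a sample is never its own neighbor.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]

    # Pick a base sample and one of its neighbors for every synthetic point.
    base = rng.integers(0, n, size=n_new)
    pick = rng.integers(0, k, size=n_new)
    partner = neighbors[base, pick]

    # Interpolate a random fraction of the way from base to partner.
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[partner] - minority[base])


if __name__ == "__main__":
    # Toy minority class: four points at the corners of the unit square.
    corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(smote_like_oversample(corners, n_new=5))
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region those samples already span. A class-ordered variant would presumably also take the neighboring ordinal classes into account, but that detail is an assumption and is not stated in the abstract.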
Jay Lee, Ph.D. (Committee Member)
Jay Kim, Ph.D. (Committee Member)
Paul Kawka, Ph.D. (Committee Member)
Xiaodong Jia, Ph.D. (Committee Member)
188 p.

Recommended Citations

  • Pahren, L. (2022). A Novel Data Imbalance Methodology Using a Class Ordered Synthetic Oversampling Technique [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659533502197337

    APA Style (7th edition)

  • Pahren, Laura. A Novel Data Imbalance Methodology Using a Class Ordered Synthetic Oversampling Technique. 2022. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659533502197337.

    MLA Style (8th edition)

  • Pahren, Laura. "A Novel Data Imbalance Methodology Using a Class Ordered Synthetic Oversampling Technique." Doctoral dissertation, University of Cincinnati, 2022. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659533502197337

    Chicago Manual of Style (17th edition)