A Novel Data Imbalance Methodology Using a Class Ordered Synthetic Oversampling Technique

Pahren, Laura

43050.pdf (6.45 MB)

Author Info

Pahren, Laura

ORCID® Identifier

http://orcid.org/0000-0001-9648-9629

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659533502197337

Year and Degree

2022, PhD, University of Cincinnati, Engineering and Applied Science: Mechanical Engineering.

Abstract

Data quantity is a paramount issue in many industries, as data collection can be constrained by several limiting factors. Another common issue lies in data quality, specifically data imbalance, which can introduce bias into the models themselves. For many applications, imbalanced data sets, in which the classes are clearly disproportionately distributed, are also unavoidable. Researchers have aimed to address the data imbalance issue with many approaches. However, some unique applications in both manufacturing and medicine have provided an opportunity for a new methodology that combines existing resampling techniques with class information that has not previously been utilized. The class information in these applications lies in the ordinal class outputs, where there is a clear natural order to the classes, but not necessarily a linear relationship between these classes (which would have otherwise allowed for regression-based methodologies). Further, these applications have data availability challenges: either data cannot be generated on demand, or inducing failure is possible but producing a particular degree of failure is extremely difficult. This work presents a systematic methodology to address severe data imbalance in ordinal-output classification problems. Examining these applications led to the proposed methodology for sparse minority-class datasets, which covers two data types: waveform data and image data. The core novelty of the methodology is a new oversampling technique, the Class Ordered Synthetic Minority Oversampling Technique, in which synthetic data are generated to better define the boundaries between classes, thereby increasing performance on out-of-sample data. For both waveform and image data, a validation approach for the synthetically generated data points is also proposed.
This methodology and approach are demonstrated in four case studies: defining intracranial pressure (ICP) burden with respect to patient outcomes, translating qualitative electrocorticography (ECoG) characteristics into quantitative features, increasing the accuracy of manufacturing quality inspections, and a multi-component manufacturing quality inspection. The latter two use image data rather than waveform data.

Committee

Jay Lee, Ph.D. (Committee Member)
Jay Kim, Ph.D. (Committee Member)
Paul Kawka, Ph.D. (Committee Member)
Xiaodong Jia, Ph.D. (Committee Member)

Pages

188 p.

Subject Headings

Mechanical Engineering

Keywords

Class Imbalance; Classification; Data Resampling; Oversampling

APA Style (7th edition)
Pahren, L. (2022). A Novel Data Imbalance Methodology Using a Class Ordered Synthetic Oversampling Technique [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659533502197337
MLA Style (8th edition)
Pahren, Laura. A Novel Data Imbalance Methodology Using a Class Ordered Synthetic Oversampling Technique. 2022. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659533502197337.
Chicago Manual of Style (17th edition)
Pahren, Laura. "A Novel Data Imbalance Methodology Using a Class Ordered Synthetic Oversampling Technique." Doctoral dissertation, University of Cincinnati, 2022. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659533502197337

Document number:

ucin1659533502197337

Download Count:

206
