Unsupervised Anomaly Detection in Numerical Datasets

Joshi, Vineet

2015, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
Anomaly detection is an important problem in data mining with diverse applications such as financial fraud detection, medical diagnosis, and computer systems intrusion detection. Anomalies are data points that are substantially different from the rest of the population. They generally carry valuable information about the underlying system, so the analyst is interested in detecting them accurately and efficiently and then taking appropriate action in response. In some scenarios, timely and accurate detection has tremendous impact: early detection of spurious credit card transactions can prevent financial damage to the card holder as well as to the banking institution that issued the card. Similarly, abnormal readings from a sensor monitoring an industrial plant can help detect system faults and avert damage. These applications have led to an interest in efficient methods for detecting anomalies, and anomaly detection continues to be an active research area within data mining.
In this dissertation we investigate various aspects of the anomaly detection problem. To determine the anomalies in a dataset, a concrete definition of anomalous behavior is required. There is no single universally applicable definition, because each definition presents one perspective on anomalous behavior that may not apply across diverse datasets. In this work we investigate a new definition of anomalous behavior. We compare it with an existing definition of outlier-ness and demonstrate the effectiveness of the new definition. We further present a refinement of the outlier-ness metric associated with the new definition. We discovered that the metric initially proposed can be altered to yield a new metric that accentuates the difference between the outlier-ness scores of strong outliers and those of non-anomalous data points. We compare this updated metric with the metric we first presented, and also with an established metric of outlier-ness.
As the number of attributes increases, the distances from a point to its nearest and farthest neighbors tend to converge, a phenomenon known as distance concentration. Thus the anomalies reported by most definitions of anomalous behavior tend to lose meaning as the number of attributes grows. It has been suggested that in such datasets the anomalies are located in smaller subspaces of the attributes. Hence, anomalies should be searched for in subspaces of the attributes instead of in the complete attribute space. However, the number of possible subspaces grows combinatorially with the number of attributes, which makes an exhaustive search through all of them infeasible. In this dissertation, after presenting a novel definition of anomalous behavior, we present an efficient method of exploring the possible subspaces arising from the attributes of a dataset.
The subspaces of attributes in any dataset can be arranged in a lattice. The anomalous behavior of data points as we traverse this lattice conveys meaningful information about the structure of the data. In the fourth problem that we address, we present a method that identifies the different subspaces in the lattice in which the same data point displays anomalous behavior. The method also computes the contiguous regions of the subspace lattice in which that data point demonstrates anomalous behavior.
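The two obstacles named in the abstract, distance concentration and the combinatorial number of attribute subspaces, can be illustrated with a small sketch. This is not the dissertation's method: the function names (`relative_contrast`, `count_subspaces`, `lattice_level`), the uniform random data, and the choice of Euclidean distance are all assumptions made purely for illustration.

```python
import math
import random
from itertools import combinations


def relative_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    query point to n_points uniform random points in the unit hypercube.
    (Illustrative assumption: uniform synthetic data.)"""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = [
        math.dist(query, [rng.random() for _ in range(n_dims)])
        for _ in range(n_points)
    ]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


def count_subspaces(n_attrs: int) -> int:
    """Number of non-empty attribute subsets of n_attrs attributes."""
    return 2 ** n_attrs - 1


def lattice_level(attrs, k):
    """All k-attribute subspaces, i.e. one level of the subset lattice."""
    return list(combinations(attrs, k))


if __name__ == "__main__":
    # Contrast shrinks as dimensionality grows; subspace count explodes.
    for d in (2, 10, 100, 1000):
        print(d, relative_contrast(500, d), count_subspaces(d))
```

Under these assumptions, the relative contrast between the farthest and nearest point should fall sharply as the number of dimensions grows, while the subspace count doubles with every added attribute. That combination is why a subspace search needs a pruning or traversal strategy over the lattice rather than exhaustive enumeration.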
Raj Bhatnagar, Ph.D. (Committee Chair)
Prabir Bhattacharya, Ph.D. (Committee Member)
Karen Davis, Ph.D. (Committee Member)
Anil Jegga, D.V.M. M.Res. (Committee Member)
Mario Medvedovic, Ph.D. (Committee Member)
143 p.

Recommended Citations

  • Joshi, V. (2015). Unsupervised Anomaly Detection in Numerical Datasets [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1427799744

    APA Style (7th edition)

  • Joshi, Vineet. Unsupervised Anomaly Detection in Numerical Datasets. 2015. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1427799744.

    MLA Style (8th edition)

  • Joshi, Vineet. "Unsupervised Anomaly Detection in Numerical Datasets." Doctoral dissertation, University of Cincinnati, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1427799744

    Chicago Manual of Style (17th edition)