Machine Learning for Host-based Misuse and Anomaly Detection in UNIX Environment

2017, Master of Science, University of Toledo, Engineering (Computer Science).
This thesis presents three studies of host-based intrusion detection systems that apply different pre-processing techniques and classifiers to the ADFA-LD dataset. ADFA-LD contains thousands of system call traces collected in the UNIX environment under seven conditions: normal operation and six types of attack.

The first study develops and applies a frequency-based misuse intrusion detection system built on ensemble classification. The raw ADFA-LD system call traces are preprocessed with N-gram feature extraction to generate fixed-size patterns for training and testing, whose attributes are N-grams for N in the range 1 to 10. To generate a signature for each class and to reduce dimensionality, we filtered the features in two steps: first selecting the most frequent unique attributes, then picking the most frequent features regardless of uniqueness. The five-random-neighbor SMOTE algorithm is used to balance the classes in terms of pattern counts. The classifier is a majority-voting ensemble whose base classifiers are naive Bayes, support vector machine, PART, decision tree, and random forest, as implemented in the Weka machine-learning framework. The proposed misuse detection system demonstrated very high performance in detecting attacks.

In the second study, the misuse detection system extracts features from the ADFA-LD system call traces using principal component analysis (PCA). Preprocessing the traces with PCA generates fixed-size patterns, called Eigentraces, for both training and testing. Eigentraces serve as templates for known normal and attack class traces. The resulting feature vectors are classified with the k-nearest-neighbor algorithm. A simulation study was conducted to evaluate the performance of the proposed system. The system demonstrated very high performance both in detecting attacks and in predicting the attack type among the six attack classes, and as such appears very promising.

In the third study, we modeled a host-based anomaly detection system within the framework of one-class classification using the ADFA-LD dataset. Pre-processing and feature extraction applied windowing to the system call trace data, followed by the PCA-based Eigentraces technique. The target (normal) class probability function is modeled by two separate machine learners: a radial basis function neural network and a random forest. The normal class density function is estimated using Bayes' theorem. A simulation study showed that the proposed intrusion detection system detects both anomalies and normal activities accurately.
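The frequency-based N-gram representation described for the first study can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the toy traces, and the system call numbers are hypothetical, and the sketch applies only a single "keep the most frequent N-grams" filter rather than the two-step filtering the abstract describes.

```python
from collections import Counter
from typing import List, Sequence, Tuple

NGram = Tuple[int, ...]


def extract_ngrams(trace: Sequence[int], n: int) -> List[NGram]:
    """Return every contiguous n-gram of system call numbers in a trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def ngram_counts(trace: Sequence[int], max_n: int) -> Counter:
    """Count all n-grams of a trace for n = 1 .. max_n."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        counts.update(extract_ngrams(trace, n))
    return counts


def select_vocabulary(traces: Sequence[Sequence[int]],
                      max_n: int, top_k: int) -> List[NGram]:
    """Keep the top_k most frequent n-grams across all traces.

    Ties are broken by the n-gram itself so the result is deterministic.
    """
    total: Counter = Counter()
    for trace in traces:
        total.update(ngram_counts(trace, max_n))
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def to_feature_vector(trace: Sequence[int],
                      vocabulary: Sequence[NGram], max_n: int) -> List[int]:
    """Map a variable-length trace to a fixed-size vector of n-gram counts."""
    counts = ngram_counts(trace, max_n)
    return [counts[gram] for gram in vocabulary]


# Hypothetical toy traces; real ADFA-LD traces are far longer.
traces = [[6, 6, 63, 6, 42], [6, 63, 6, 42, 120]]
vocabulary = select_vocabulary(traces, max_n=2, top_k=3)
features = [to_feature_vector(t, vocabulary, max_n=2) for t in traces]
```

Every trace, whatever its length, is mapped to a vector with one count per retained N-gram, which is what allows fixed-size patterns to be passed to standard classifiers.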
Gursel Serpen (Committee Chair)
Henry Ledgard (Committee Member)
Ahmad Y. Javaid (Committee Member)
116 p.

Recommended Citations

  • Aghaei, E. (2017). Machine Learning for Host-based Misuse and Anomaly Detection in UNIX Environment [Master's thesis, University of Toledo]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1493255965690437

    APA Style (7th edition)

  • Aghaei, Ehsan. Machine Learning for Host-based Misuse and Anomaly Detection in UNIX Environment. 2017. University of Toledo, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=toledo1493255965690437.

    MLA Style (8th edition)

  • Aghaei, Ehsan. "Machine Learning for Host-based Misuse and Anomaly Detection in UNIX Environment." Master's thesis, University of Toledo, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1493255965690437

    Chicago Manual of Style (17th edition)