Energy Distance Correlation with Extended Bayesian Information Criteria for feature selection in high dimensional models

Ocloo, Isaac Xoese

2021, Doctor of Philosophy (Ph.D.), Bowling Green State University, Statistics.
In this research, we investigate the sequential lasso method for feature selection in sparse high-dimensional linear models, recently proposed by Luo and Chen (2014). We propose a new method that replaces the ordinary correlation in Luo and Chen's algorithm with the energy distance correlation of Székely et al. (2007). We continue to use the extended Bayesian Information Criterion (EBIC) as the stopping criterion of the algorithm. The advantage of energy distance correlation is that it detects both linear and non-linear association between two variables, whereas the ordinary correlation detects only the linear part of the association. As a result, the new method appears to be more powerful than Luo and Chen's method for feature selection. This is demonstrated by simulation studies and illustrated by two real-life examples. We also show that the proposed algorithm is selection consistent. In the first part of our research, we use simulations to examine the model size selected by Adaptive Lasso and SCAD after the data are first screened with the distance-correlation-based sure screening method of Li et al. (2012). We observe that the average selected model size is quite high. In the second part, we describe the new sequential variable selection method, which we call energy distance correlation with extended Bayesian Information Criteria (Edc+EBIC). At each stage of the sequential procedure, we select the predictor variable whose energy distance correlation with the response is largest. The maximization is carried out so that, once a variable has been selected in a previous stage, its contribution to the response is removed and it cannot be selected again. The active set of selected variables is updated each time a variable is selected, and the EBIC of the updated set is calculated.
The process stops when the EBIC of the current active set is greater than the EBIC of the previous active set. We compare the performance of Edc+EBIC with sequential Lasso, Adaptive Lasso, SCAD and SIS+SCAD. We observe that, on average, our proposed method has a positive discovery rate close to 100%, a low false discovery rate, and an average model size in line with what is expected under our simulation set-up.
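The sequential procedure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: (i) distance correlation is the sample V-statistic form based on double-centred distance matrices; (ii) "removing the contribution" of selected variables is taken to mean using the least-squares residual of the response on the active set, with selected variables also excluded explicitly; (iii) EBIC is taken in the Gaussian linear-model form n log(RSS/n) + k log n + 2γ log C(p, k), where the value of γ (`gamma`, default 1.0 here) is an assumption. The function names `distance_correlation`, `residual`, `ebic` and `edc_ebic` are hypothetical.

```python
import math

import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) of two 1-D samples."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Pairwise distance matrices a_kl = |x_k - x_l| and b_kl = |y_k - y_l|.
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)
    # Double-centre each matrix: subtract row and column means, add the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()  # squared distance covariance
    denom = math.sqrt((A * A).mean() * (B * B).mean())  # product of distance variances
    if denom <= 0.0:  # a constant sample has zero distance variance
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


def residual(X, y, active):
    """Least-squares residual of y on the active columns (assumed removal step)."""
    if not active:
        return y.copy()
    beta, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
    return y - X[:, active] @ beta


def ebic(X, y, active, gamma=1.0):
    """Assumed Gaussian EBIC: n log(RSS/n) + k log n + 2 gamma log C(p, k)."""
    n, p = X.shape
    k = len(active)
    rss = float(np.sum(residual(X, y, active) ** 2))
    log_comb = math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)
    return n * math.log(max(rss / n, 1e-300)) + k * math.log(n) + 2.0 * gamma * log_comb


def edc_ebic(X, y, gamma=1.0, max_steps=None):
    """Hypothetical Edc+EBIC sketch; returns selected column indices in order."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = X - X.mean(axis=0)  # centre so that no intercept column is needed
    y = y - y.mean()
    n, p = X.shape
    if max_steps is None:
        max_steps = min(n - 1, p)
    active = []
    current = ebic(X, y, active, gamma)
    while len(active) < max_steps:
        # Remove the contribution of already-selected variables from the response.
        r = residual(X, y, active)
        # Exclude selected variables explicitly: a residual that is linearly
        # orthogonal to them can still have non-zero distance correlation.
        candidates = [j for j in range(p) if j not in active]
        scores = [distance_correlation(X[:, j], r) for j in candidates]
        trial = active + [candidates[int(np.argmax(scores))]]
        new = ebic(X, y, trial, gamma)
        if new > current:  # stop once EBIC increases
            break
        active, current = trial, new
    return active


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    y = 3 * X[:, 0] - 2 * X[:, 5] + rng.standard_normal(200)
    print("selected:", edc_ebic(X, y))
    x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    print("dCor(x, x^2):", distance_correlation(x, x ** 2))
```

The last line illustrates the motivation given in the abstract. For the symmetric sample x = -2, ..., 2, the Pearson correlation between x and x² is exactly zero, but their distance correlation is about 0.52, so the dependence is still detected. The EBIC comparison inside the loop mirrors the stopping rule above: a candidate is admitted only if it does not increase the EBIC of the active set.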
Hanfeng Chen, Dr (Advisor)
Yuning Fu, Dr (Committee Member)
Wei Ning, Dr (Committee Member)
Maria Rizzo, Dr (Committee Member)
71 p.

Recommended Citations

APA Style (7th edition)

  • Ocloo, I. X. (2021). Energy Distance Correlation with Extended Bayesian Information Criteria for feature selection in high dimensional models [Doctoral dissertation, Bowling Green State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1625238661031258

MLA Style (8th edition)

  • Ocloo, Isaac. Energy Distance Correlation with Extended Bayesian Information Criteria for feature selection in high dimensional models. 2021. Bowling Green State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1625238661031258.

Chicago Manual of Style (17th edition)

  • Ocloo, Isaac. "Energy Distance Correlation with Extended Bayesian Information Criteria for feature selection in high dimensional models." Doctoral dissertation, Bowling Green State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1625238661031258