Variable selection plays an important role in high-dimensional data analysis, where a large number of variables are available as potential predictors of a response of interest. It typically arises at two stages of statistical modeling, screening and formal model building, and the two stages have different goals. Screening aims to filter out irrelevant variables before model building. Model building then seeks a formal description of the functional relation between the screened variables and the response. Accordingly, a proper comparison of variable selection methods calls for evaluation criteria that reflect these distinct goals: accuracy in the ranking order of variables for screening, and prediction accuracy for formal modeling.
Comparisons of screening and selection methods in the literature have often failed to distinguish these two aspects, which can lead to misleading conclusions. In this dissertation, we present comprehensive numerical studies comparing three commonly used screening and selection procedures in the regression setting: correlation screening (also known as sure independence screening), forward selection, and LASSO. By clearly differentiating the two aspects of variable selection, we identify the situations in which the performance of the three approaches differs and offer guidelines for choosing a method in practice. Furthermore, we discuss connections to related comparisons in the recent literature to clarify their differing findings and conclusions.
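As a minimal illustration of the screening side of this distinction, the Python sketch below ranks predictors by their absolute marginal Pearson correlation with the response, which is the ranking step of correlation screening, and keeps the top `d`. To score a ranking, it uses the smallest number of top-ranked variables needed to include every truly relevant variable. This minimum-model-size criterion is an assumed example of a ranking-accuracy measure, not necessarily the one used in this dissertation, and the function names (`correlation_ranking`, `screen`, `min_model_size`) are hypothetical.

```python
import math

def pearson_corr(x, y):
    """Sample Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def correlation_ranking(X, y):
    """Column indices of X (a list of rows), sorted by decreasing |corr(X_j, y)|."""
    p = len(X[0])
    scores = [abs(pearson_corr([row[j] for row in X], y)) for j in range(p)]
    return sorted(range(p), key=lambda j: -scores[j])

def screen(X, y, d):
    """Correlation screening: keep the d top-ranked predictors."""
    return correlation_ranking(X, y)[:d]

def min_model_size(ranking, relevant):
    """Assumed ranking criterion: smallest d whose top-d set contains every relevant index."""
    return max(ranking.index(j) for j in relevant) + 1

# Toy data: column 0 equals y, column 1 has correlation 0.8 with y, column 2 is constant.
y = [1, 2, 3, 4, 5]
X = [[1, 2, 3], [2, 1, 3], [3, 4, 3], [4, 3, 3], [5, 5, 3]]
ranking = correlation_ranking(X, y)
print(ranking)                          # [0, 1, 2]
print(screen(X, y, 2))                  # [0, 1]
print(min_model_size(ranking, {0, 1}))  # 2
```

The ranking uses only marginal associations. Forward selection and LASSO, by contrast, take the other predictors into account when deciding which variables enter the model, so their selected sets can differ from a correlation-based ranking even on the same data.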
We also conduct similar studies in the classification setting, comparing the counterparts of LASSO and correlation screening, namely $L_{1}$-penalized logistic regression and the two-sample t-test, respectively. Initial results from exploratory analyses offer some insight into the scenarios in which each method is preferred. Finally, we discuss possible extensions, future work, and the differences between the regression and classification settings.
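The screening counterpart in classification can be sketched analogously: each predictor is scored by the absolute two-sample t statistic comparing its values in the two classes, and predictors are ranked by that score. This minimal Python sketch assumes binary labels coded 0/1 and at least two observations per class. It uses the Welch (unequal-variance) form of the statistic, which is an assumption; the pooled-variance form is an equally plausible reading of "two-sample t-test". The function names (`welch_t`, `t_test_ranking`) are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(g0, g1):
    """Welch two-sample t statistic (unequal variances); each group needs >= 2 values."""
    se = math.sqrt(variance(g0) / len(g0) + variance(g1) / len(g1))
    # Guard against two constant groups, where the standard error is zero.
    return (mean(g1) - mean(g0)) / se if se > 0 else 0.0

def t_test_ranking(X, labels):
    """Column indices of X (a list of rows), sorted by decreasing |t| between classes 0 and 1."""
    p = len(X[0])
    scores = []
    for j in range(p):
        g0 = [row[j] for row, c in zip(X, labels) if c == 0]
        g1 = [row[j] for row, c in zip(X, labels) if c == 1]
        scores.append(abs(welch_t(g0, g1)))
    return sorted(range(p), key=lambda j: -scores[j])

# Toy data: column 0 separates the classes sharply, column 1 barely does.
labels = [0, 0, 1, 1]
X = [[0, 0], [1, 5], [10, 1], [11, 6]]
print(t_test_ranking(X, labels))  # [0, 1]
```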