Files
Liu, Yang 2.pdf (786.71 KB)
Improving the Accuracy of Variable Selection Using the Whole Solution Path
Author Info
Liu, Yang
Permalink: http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1435858170
Abstract Details
Year and Degree
2015, Doctor of Philosophy (Ph.D.), Bowling Green State University, Statistics.
Abstract
The performance of penalized least squares approaches depends heavily on the choice of the tuning parameter; however, statisticians have not reached a consensus on the criterion for choosing it. Moreover, penalized least squares estimation that is based on a single value of the tuning parameter suffers from several drawbacks. Traditional selection criteria such as AIC, BIC, and CV tend to choose a tuning parameter that admits excessive variables, which results in an over-fitted model. In contrast, many other criteria, such as the extended BIC, favor an over-sparse model and may run the risk of dropping some relevant variables. In this dissertation, a novel approach to feature selection based on the whole solution paths is proposed, which significantly improves the selection accuracy. The key idea is to partition the variables into a relevant set and an irrelevant set at each tuning parameter, and then select the variables that have been classified as relevant for at least one tuning parameter. The approach is named Selection by Partitioning the Solution Paths (SPSP). Compared with existing feature selection approaches, the proposed SPSP algorithm allows feature selection with a wide class of penalty functions, including Lasso, ridge and other strictly convex penalties. Based on the proposed SPSP procedure, a new type of score is presented to rank the importance of the variables in the model. The scores, denoted Area-out-of-zero-region Importance Scores (AIS), are defined by the areas between the solution paths and the boundary of the partitions over the whole solution paths. By applying the proposed scores in stepwise selection, the false positive error of the selection is remarkably reduced. The asymptotic properties of the proposed SPSP estimator have been established.
It is shown that the SPSP estimator is selection consistent when the original estimator is either estimation consistent or selection consistent. In particular, the SPSP approach applied to the Lasso is proved to be consistent over the whole solution paths under the irrepresentable condition. Additionally, a number of simulation studies are conducted to illustrate the performance of the proposed approaches. Comparisons between the SPSP algorithm and the existing selection criteria on the Lasso, the adaptive Lasso, the SCAD and the MCP are provided. The results show that the proposed method generally outperforms the existing variable selection methods. Finally, two real data examples of identifying the informative variables in the Boston housing data and the glioblastoma gene expression data are given. Compared with the models selected by other existing approaches, the models selected by the SPSP procedure are much simpler with relatively smaller model errors.
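The selection idea described in the abstract (partition the variables at each tuning parameter, then keep every variable that was ever classified as relevant) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the abstract does not state the partition rule, so the sketch assumes a hypothetical one (cut at the largest gap between adjacent sorted absolute coefficients), and the importance score is a crude unit-spacing sum standing in for the area defining AIS. The names `partition_at_lambda`, `spsp_select`, and `toy_path` are illustrative only, and the coefficient path is hand-made rather than fitted.

```python
import numpy as np


def partition_at_lambda(beta, min_gap=1e-8):
    """Split variables into relevant / irrelevant at one tuning parameter.

    ASSUMED rule (not taken from the abstract): sort the absolute
    coefficients, find the largest gap between adjacent values, and call
    everything above that gap relevant. Returns (mask, boundary).
    """
    mags = np.abs(beta)
    # Prepend 0 so that "all nonzero variables are relevant" is a possible cut.
    padded = np.concatenate(([0.0], np.sort(mags)))
    gaps = np.diff(padded)
    k = int(np.argmax(gaps))
    if gaps[k] <= min_gap:
        # No meaningful gap (e.g. all coefficients are zero): nothing relevant.
        return np.zeros(mags.shape, dtype=bool), float(mags.max())
    boundary = float(padded[k])  # largest magnitude on the irrelevant side
    return mags > boundary, boundary


def spsp_select(path):
    """Union of the per-lambda relevant sets over a whole solution path.

    `path` has shape (n_lambdas, n_variables). The second return value is an
    ASSUMED stand-in for an area-based importance score: the summed excess of
    |beta| over the partition boundary, with unit spacing between lambdas.
    """
    path = np.asarray(path, dtype=float)
    selected = np.zeros(path.shape[1], dtype=bool)
    score = np.zeros(path.shape[1])
    for beta in path:
        mask, boundary = partition_at_lambda(beta)
        selected |= mask  # relevant for at least one tuning parameter
        score += np.where(mask, np.abs(beta) - boundary, 0.0)
    return selected, score


# Hand-made toy path: rows go from a large penalty to a small one.
toy_path = np.array([
    [0.0, 0.0, 0.00, 0.00],
    [1.5, 0.0, 0.05, 0.00],
    [2.0, 1.2, 0.10, 0.02],
])
selected, ais = spsp_select(toy_path)
print(selected, ais)
```

In this toy path the first two variables separate clearly from the small coefficients of the last two, so only they enter the selected set. In practice the path would come from a penalized regression solver evaluated on a grid of tuning parameters.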
Committee
Hanfeng Chen (Committee Chair)
Peng Wang (Advisor)
James Albert (Committee Member)
Jonathan Bostic (Other)
Pages
110 p.
Subject Headings
Statistics
Keywords
variable selection; high dimensional data; SPSP; AIS
Recommended Citations
Citations
APA Style (7th edition): Liu, Y. (2015). Improving the Accuracy of Variable Selection Using the Whole Solution Path [Doctoral dissertation, Bowling Green State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1435858170
MLA Style (8th edition): Liu, Yang. Improving the Accuracy of Variable Selection Using the Whole Solution Path. 2015. Bowling Green State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1435858170.
Chicago Manual of Style (17th edition): Liu, Yang. "Improving the Accuracy of Variable Selection Using the Whole Solution Path." Doctoral dissertation, Bowling Green State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1435858170
Document number: bgsu1435858170
Download Count: 1,078
Copyright Info
© 2015, all rights reserved.
This open access ETD is published by Bowling Green State University and OhioLINK.