Files
Thesis.pdf (1.21 MB)
Nonconvex Optimization in Machine Learning: Convergence, Landscape, and Generalization
Author Info
Zhou, Yi
ORCID® Identifier
http://orcid.org/0000-0002-3982-9145
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=osu1533554879269658
Abstract Details
Year and Degree
2018, Doctor of Philosophy, Ohio State University, Electrical and Computer Engineering.
Abstract
The past decade has witnessed the success of machine learning (ML) in solving many challenging tasks such as image classification and social network prediction. These ML techniques can achieve performance that even surpasses that of human beings. However, due to the ubiquitous nonconvexity of ML models, there is still a lack of theoretical understanding of how ML methods work, and developing such understanding has become one of the main focuses of the ML community. In this dissertation, we provide theoretical understanding of nonconvex optimization in machine learning from three aspects, namely, 1) convergence guarantees of various optimization algorithms for solving generic nonconvex problems, 2) landscape properties of neural network loss functions, and 3) generalization error of stochastic algorithms in nonconvex machine learning.
First, we establish convergence of optimization algorithms in nonconvex machine learning under the Kurdyka-Łojasiewicz (KŁ) property of the objective function. Two algorithms are investigated: the proximal gradient algorithm with momentum and the cubic-regularized Newton algorithm. In different parameter regimes of the KŁ property, we show that the former converges to a first-order stationary point and the latter converges to a second-order stationary point, with corresponding asymptotic convergence rates. We then propose an implementation of the proximal gradient algorithm on a stale synchronous parallel system for solving large-scale optimization problems. Under both the model-parallelism and data-parallelism settings, we establish convergence of such asynchronous parallel algorithms to a stationary point of nonconvex problems.
Second, we study the landscape properties of neural network loss functions. Specifically, we provide a full characterization of the critical points as well as the global minimizers for linear neural networks and show that the corresponding loss functions have no spurious local minima. We also show that nonlinear neural networks with the ReLU activation function do have spurious local minima.
Lastly, we explore the generalization properties of the stochastic gradient descent (SGD) algorithm in nonconvex optimization. Under both the unregularized and regularized settings, we establish generalization error bounds for SGD in terms of the on-average variance of the stochastic gradients. These results lead to improved generalization bounds for SGD and can explain the effect of random labels on generalization performance observed in experiments.
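The abstract names the proximal gradient algorithm with momentum as one of the methods analyzed. As a rough, generic illustration of that update structure only (not the dissertation's exact algorithm, step-size choice, or KŁ parameter regime), the sketch below applies a FISTA-style accelerated proximal gradient step to the convex lasso problem, where the proximal operator is simple soft-thresholding; the function names and problem instance are illustrative choices, not taken from the thesis.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (elementwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad_momentum(A, b, lam, step, iters=500):
    """FISTA-style proximal gradient with momentum for the lasso:
    min_x 0.5 * ||A x - b||^2 + lam * ||x||_1.
    `step` should be at most 1 / ||A||^2 (inverse smoothness constant)."""
    x = np.zeros(A.shape[1])
    y = x.copy()          # extrapolated (momentum) iterate
    t = 1.0
    for _ in range(iters):
        grad = A.T @ (A @ y - b)                 # gradient of the smooth part at y
        x_new = soft_threshold(y - step * grad, step * lam)  # proximal step
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)        # momentum extrapolation
        x, t = x_new, t_new
    return x
```

For example, with `A` the identity the lasso solution is exactly the soft-thresholded data, so `prox_grad_momentum(np.eye(3), np.array([3.0, 0.1, -2.0]), lam=1.0, step=1.0)` converges to `[2, 0, -1]`. The dissertation's contribution concerns such momentum schemes on *nonconvex* objectives, where convergence to stationary points is established under the KŁ property rather than via convexity.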
Committee
Yingbin Liang (Advisor)
Abhishek Gupta (Committee Member)
Philip Schniter (Committee Member)
Pages
178 p.
Subject Headings
Computer Science; Electrical Engineering
Keywords
machine learning; nonconvex optimization; generalization; neural networks
Recommended Citations
APA Style (7th edition)
Zhou, Y. (2018). Nonconvex Optimization in Machine Learning: Convergence, Landscape, and Generalization [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1533554879269658

MLA Style (8th edition)
Zhou, Yi. Nonconvex Optimization in Machine Learning: Convergence, Landscape, and Generalization. 2018. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1533554879269658.

Chicago Manual of Style (17th edition)
Zhou, Yi. "Nonconvex Optimization in Machine Learning: Convergence, Landscape, and Generalization." Doctoral dissertation, Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1533554879269658
Document number:
osu1533554879269658
Download Count:
1,117
Copyright Info
© 2018, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.