Nonconvex Optimization in Machine Learning: Convergence, Landscape, and Generalization

Abstract Details

2018, Doctor of Philosophy, Ohio State University, Electrical and Computer Engineering.
The past decade has witnessed the success of machine learning (ML) in solving many challenging tasks such as image classification and social network prediction. These ML techniques can achieve performance that even surpasses human capability. However, due to the ubiquitous nonconvexity of ML models, a theoretical understanding of how ML methods work is still lacking, and developing such an understanding has become a main focus of the ML community. In this dissertation, we provide theoretical understandings of nonconvex optimization in machine learning from three aspects, namely, 1) convergence guarantees of various optimization algorithms for solving generic nonconvex problems, 2) landscape properties of neural network loss functions, and 3) generalization error of stochastic algorithms in nonconvex machine learning. First, we establish the convergence of optimization algorithms in nonconvex machine learning under the Kurdyka-Lojasiewicz (KL) property of the objective function. Two algorithms are investigated: the proximal gradient algorithm with momentum and the cubic-regularized Newton's algorithm. In different parameter regimes of the KL property, we show that the former algorithm converges to a first-order stationary point and the latter converges to a second-order stationary point, with the corresponding asymptotic convergence rates. Then, we propose an implementation of the proximal gradient algorithm on a stale synchronous parallel system for solving large-scale optimization problems. Under both the model-parallelism and data-parallelism settings, we establish the convergence of such asynchronous parallel algorithms to a stationary point of nonconvex problems. Second, we study the landscape properties of neural network loss functions. Specifically, we provide a full characterization of the critical points as well as the global minimizers of linear neural networks and show that the corresponding loss functions have no spurious local minima. We also show that nonlinear neural networks with the ReLU activation function do have spurious local minima. Lastly, we explore the generalization properties of the stochastic gradient descent (SGD) algorithm in nonconvex optimization. Under both the unregularized and regularized settings, we establish generalization error bounds for SGD in terms of the on-average variance of the stochastic gradients. These results lead to improved generalization bounds for SGD and can explain the effect of random labels on generalization performance observed in experiments.
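
For reference, the KL property mentioned in the abstract is often stated as a gradient-type inequality of the following form; this is a commonly used textbook parameterization, not necessarily the exact one adopted in the dissertation, and the exponent θ is what distinguishes the "parameter regimes" that determine the convergence rates.

```latex
% Kurdyka-Lojasiewicz (KL) inequality, standard form (assumption: the
% dissertation's parameterization may differ). For a proper lower
% semicontinuous f and a critical point \bar{x}, there exist C > 0,
% \theta \in [0,1), \eta > 0, and a neighborhood U of \bar{x} such that
% for all x \in U with f(\bar{x}) < f(x) < f(\bar{x}) + \eta:
\[
  \operatorname{dist}\bigl(0, \partial f(x)\bigr) \;\ge\; C\,\bigl(f(x) - f(\bar{x})\bigr)^{\theta}.
\]
```

The first algorithm studied is the proximal gradient algorithm with momentum for composite nonconvex problems. The following is a minimal illustrative sketch of a generic proximal gradient step with heavy-ball-style momentum, assuming a smooth part f (accessed through its gradient) and a proximable regularizer g; the soft-thresholding prox, the sparse least-squares example, and the step-size and momentum values are assumptions made here for illustration, not details taken from the dissertation, whose exact update rule and KL-based analysis are not reproduced.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding); used here as an
    example of a simple nonsmooth regularizer g (illustrative assumption)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_momentum(grad_f, prox_g, x0, step=0.1, beta=0.9, iters=500):
    """Generic proximal gradient iteration with heavy-ball momentum.

    grad_f : callable returning the gradient of the smooth part f
    prox_g : callable (v, t) -> prox of the nonsmooth part g with step t
    Sketch only; the dissertation's specific variant, parameter choices, and
    convergence guarantees under the KL property are not reproduced here.
    """
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        # extrapolation (momentum) step
        y = x + beta * (x - x_prev)
        # forward (gradient) step on the smooth part, then backward (proximal) step
        x_prev, x = x, prox_g(y - step * grad_f(y), step)
    return x

# Toy example: sparse least squares, f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1
rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((50, 20)), rng.standard_normal(50), 0.1
grad_f = lambda x: A.T @ (A @ x - b)
x_hat = proximal_gradient_momentum(grad_f, lambda v, t: prox_l1(v, lam * t),
                                   np.zeros(20),
                                   step=1.0 / np.linalg.norm(A, 2) ** 2)
```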
Yingbin Liang (Advisor)
Abhishek Gupta (Committee Member)
Philip Schniter (Committee Member)
178 p.

Recommended Citations

  • Zhou, Y. (2018). Nonconvex Optimization in Machine Learning: Convergence, Landscape, and Generalization [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1533554879269658

    APA Style (7th edition)

  • Zhou, Yi. Nonconvex Optimization in Machine Learning: Convergence, Landscape, and Generalization. 2018. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1533554879269658.

    MLA Style (8th edition)

  • Zhou, Yi. "Nonconvex Optimization in Machine Learning: Convergence, Landscape, and Generalization." Doctoral dissertation, Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1533554879269658

    Chicago Manual of Style (17th edition)