Nonconvex Optimization in Machine Learning: Convergence, Landscape, and Generalization

Abstract Details

2018, Doctor of Philosophy, Ohio State University, Electrical and Computer Engineering.
The past decade has witnessed the success of machine learning (ML) in solving many challenging tasks such as image classification and social network prediction. These ML techniques can achieve performance that even surpasses human capability. However, due to the ubiquitous nonconvexity of ML models, a theoretical understanding of how ML methods work is still lacking, and developing such an understanding has become a main focus of the ML community. In this dissertation, we provide theoretical understandings of nonconvex optimization in machine learning from three aspects, namely, 1) convergence guarantees of various optimization algorithms for solving generic nonconvex problems, 2) landscape properties of neural network loss functions, and 3) generalization error of stochastic algorithms in nonconvex machine learning. First, we establish the convergence of optimization algorithms in nonconvex machine learning under the Kurdyka-Lojasiewicz (KL) property of the objective function. Two algorithms are investigated: the proximal gradient algorithm with momentum and the cubic-regularized Newton's algorithm. In different parameter regimes of the KL property, we show that the former algorithm converges to a first-order stationary point and the latter converges to a second-order stationary point, with the corresponding asymptotic convergence rates. Then, we propose an implementation of the proximal gradient algorithm on a stale synchronous parallel system for solving large-scale optimization problems. Under both the model-parallelism and data-parallelism settings, we establish the convergence of such asynchronous parallel algorithms to a stationary point of nonconvex problems. Second, we study the landscape properties of neural network loss functions. Specifically, we provide a full characterization of the critical points as well as the global minimizers of linear neural networks and show that the corresponding loss functions have no spurious local minima. We also show that nonlinear neural networks with the ReLU activation function do have spurious local minima. Lastly, we explore the generalization properties of the stochastic gradient descent (SGD) algorithm in nonconvex optimization. Under both the unregularized and regularized settings, we establish generalization error bounds for SGD in terms of the on-average variance of the stochastic gradients. These results lead to improved generalization bounds for SGD and can explain the effect of random labels on generalization performance observed in experiments.
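
For reference, the KL property mentioned in the abstract is often stated as a gradient-type inequality of the following form; this is a commonly used textbook parameterization, not necessarily the exact one adopted in the dissertation, and the exponent θ is what distinguishes the "parameter regimes" that determine the convergence rates.

```latex
% Kurdyka-Lojasiewicz (KL) inequality, standard form (assumption: the
% dissertation's parameterization may differ). For a proper lower
% semicontinuous f and a critical point \bar{x}, there exist C > 0,
% \theta \in [0,1), \eta > 0, and a neighborhood U of \bar{x} such that
% for all x \in U with f(\bar{x}) < f(x) < f(\bar{x}) + \eta:
\[
  \operatorname{dist}\bigl(0, \partial f(x)\bigr) \;\ge\; C\,\bigl(f(x) - f(\bar{x})\bigr)^{\theta}.
\]
```

The first algorithm studied is the proximal gradient algorithm with momentum for composite nonconvex problems. The following is a minimal illustrative sketch of a generic proximal gradient step with heavy-ball-style momentum, assuming a smooth part f (accessed through its gradient) and a proximable regularizer g; the soft-thresholding prox, the sparse least-squares example, and the step-size and momentum values are assumptions made here for illustration, not details taken from the dissertation, whose exact update rule and KL-based analysis are not reproduced.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding); used here as an
    example of a simple nonsmooth regularizer g (illustrative assumption)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_momentum(grad_f, prox_g, x0, step=0.1, beta=0.9, iters=500):
    """Generic proximal gradient iteration with heavy-ball momentum.

    grad_f : callable returning the gradient of the smooth part f
    prox_g : callable (v, t) -> prox of the nonsmooth part g with step t
    Sketch only; the dissertation's specific variant, parameter choices, and
    convergence guarantees under the KL property are not reproduced here.
    """
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        # extrapolation (momentum) step
        y = x + beta * (x - x_prev)
        # forward (gradient) step on the smooth part, then backward (proximal) step
        x_prev, x = x, prox_g(y - step * grad_f(y), step)
    return x

# Toy example: sparse least squares, f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1
rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((50, 20)), rng.standard_normal(50), 0.1
grad_f = lambda x: A.T @ (A @ x - b)
x_hat = proximal_gradient_momentum(grad_f, lambda v, t: prox_l1(v, lam * t),
                                   np.zeros(20),
                                   step=1.0 / np.linalg.norm(A, 2) ** 2)
```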
Yingbin Liang (Advisor)
Abhishek Gupta (Committee Member)
Philip Schniter (Committee Member)
178 p.

Recommended Citations

  • Zhou, Y. (2018). Nonconvex Optimization in Machine Learning: Convergence, Landscape, and Generalization [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1533554879269658

    APA Style (7th edition)

  • Zhou, Yi. Nonconvex Optimization in Machine Learning: Convergence, Landscape, and Generalization. 2018. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1533554879269658.

    MLA Style (8th edition)

  • Zhou, Yi. "Nonconvex Optimization in Machine Learning: Convergence, Landscape, and Generalization." Doctoral dissertation, Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1533554879269658

    Chicago Manual of Style (17th edition)