Skip to Main Content
 

Global Search Box

 
 
 
 

Files

ETD Abstract Container

Abstract Header

Geometric Methods for Robust Data Analysis in High Dimension

Anderson, Joseph T

Abstract Details

2017, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Data-driven applications are growing. Machine learning and data analysis now finds both scientific and industrial application in biology, chemistry, geology, medicine, and physics. These applications rely on large quantities of data gathered from automated sensors and user input. Furthermore, the dimensionality of many datasets is extreme: more details are being gathered about single user interactions or sensor readings. All of these applications encounter problems with a common theme: use observed data to make inferences about the world. Our work obtains the first provably efficient algorithms for Independent Component Analysis (ICA) in the presence of heavy-tailed data. The main tool in this result is the centroid body (a well-known topic in convex geometry), along with optimization and random walks for sampling from a convex body. This is the first algorithmic use of the centroid body and it is of independent theoretical interest, since it effectively replaces the estimation of covariance from samples, and is more generally accessible. We demonstrate that ICA is itself a powerful geometric primitive. That is, having access to an efficient algorithm for ICA enables us to efficiently solve other important problems in machine learning. The first such reduction is a solution to the open problem of efficiently learning the intersection of n + 1 halfspaces in Rn, posed in [43]. This reduction relies on a non-linear transformation of samples from such an intersection of halfspaces (i.e. a simplex) to samples which are approximately from a linearly transformed product distribution. Through this transformation of samples, which can be done efficiently, one can then use an ICA algorithm to recover the vertices of the intersection of halfspaces. Finally, we again use ICA as an algorithmic primitive to construct an efficient solution to the widely-studied problem of learning the parameters of a Gaussian mixture model. Our algorithm again transforms samples from a Gaussian mixture model into samples which fit into the ICA model and, when processed by an ICA algorithm, result in recovery of the mixture parameters. Our algorithm is effective even when the number of Gaussians in the mixture grows with the ambient dimension, even polynomially in the dimension. In addition to the efficient parameter estimation, we also obtain a complexity lower bound for a low-dimension Gaussian mixture model.
Luis Rademacher, PhD (Advisor)
Anastasios Sidiropoulos, PhD (Committee Chair)
Mikhail Belkin, PhD (Committee Member)
Facundo Mémoli, PhD (Committee Member)
178 p.

Recommended Citations

Citations

  • Anderson, J. T. (2017). Geometric Methods for Robust Data Analysis in High Dimension [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1488372786126891

    APA Style (7th edition)

  • Anderson, Joseph. Geometric Methods for Robust Data Analysis in High Dimension. 2017. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1488372786126891.

    MLA Style (8th edition)

  • Anderson, Joseph. "Geometric Methods for Robust Data Analysis in High Dimension." Doctoral dissertation, Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1488372786126891

    Chicago Manual of Style (17th edition)