Files
Thesis_final_submitted.pdf (768.67 KB)
Generalized Principal Component Analysis: Dimensionality Reduction through the Projection of Natural Parameters
Author Info
Landgraf, Andrew J
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=osu1437610558
Year and Degree
2015, Doctor of Philosophy, Ohio State University, Statistics.
Abstract
Principal component analysis (PCA) is very useful for a wide variety of data analysis tasks, but its implicit connection to the Gaussian distribution can be undesirable for discrete data such as binary and multi-category responses or counts. Exponential family PCA is a popular alternative for dimensionality reduction of discrete data. It is motivated as an extension of ordinary PCA by means of a matrix factorization, akin to the singular value decomposition, that maximizes the exponential family log-likelihood. We propose a new formulation of generalized PCA which extends Pearson's mean squared error optimality motivation for PCA to members of the exponential family. In contrast to the existing approach of matrix factorizations for exponential family data, our generalized PCA provides low-rank estimates of the natural parameters by projecting the saturated model parameters. Due to this difference, the number of parameters does not grow with the number of observations and the principal component scores on new data can be computed with simple matrix multiplication. When the data are binary, we derive explicit solutions of the new generalized PCA (or logistic PCA) for data matrices of special structure and provide a computationally efficient algorithm for the principal component loadings in general. We also formulate a convex relaxation of the original optimization problem, whose solution might be more effective for prediction, and derive an accelerated gradient descent algorithm. The method and algorithms for binary data are extended to other distributions, including Poisson and multinomial, and the scope of the new formulation for generalized PCA is further extended to incorporate weights, missing data, and variable normalization. These extensions enhance the utility of the proposed method for a variety of tasks such as collaborative filtering and visualization.
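In symbols, the projection idea described above can be sketched as follows. The notation is introduced here for illustration and is an assumption, not a quotation from the abstract: X is the n-by-d data matrix, \tilde{\Theta} holds the natural parameters of the saturated model, \mu is a d-vector of main effects, U is a d-by-k matrix of loadings with orthonormal columns, and D is the exponential family deviance.

```latex
\begin{aligned}
\hat{\Theta} &= \mathbf{1}_n \mu^{\top} + (\tilde{\Theta} - \mathbf{1}_n \mu^{\top})\, U U^{\top}, \\
(\hat{\mu}, \hat{U}) &= \operatorname*{arg\,min}_{\mu \in \mathbb{R}^{d},\; U^{\top} U = I_k} D(X; \hat{\Theta}), \\
\text{scores for a new observation } x^{*} &: \; U^{\top}\bigl(\tilde{\theta}(x^{*}) - \mu\bigr).
\end{aligned}
```

Under this reading, the only free parameters are \mu and U, whose sizes depend on d and k but not on n, and the scores for a new observation are a single matrix-vector product.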
Through simulation experiments, we compare our formulation of generalized PCA to ordinary PCA and the previous formulation to demonstrate its benefits on both binary and count datasets. In addition, two datasets are analyzed. In the binary medical diagnoses data, we show that the new logistic PCA is better able to explain and predict the probabilities than standard PCA, and is able to do so with many fewer parameters than the previous formulation. On a dataset consisting of users' song listening counts, we show that generalized PCA gives better visualization of the loadings than standard PCA and improves the prediction accuracy in a recommendation task.
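For binary data, the projection formulation summarized in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration rather than the thesis's own implementation: the function names, the finite constant m that stands in for the infinite saturated natural parameters, the choice to hold the main effects fixed at the column logits, and the simple majorization-minimization update used for the loadings.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def bernoulli_deviance(X, Theta):
    # -2 * Bernoulli log-likelihood; logaddexp(0, t) is a stable log(1 + exp(t)).
    return 2.0 * np.sum(np.logaddexp(0.0, Theta) - X * Theta)

def fit_logistic_pca(X, k, m=4.0, n_iter=50):
    """Hypothetical sketch: loadings U (d x k) for projected natural parameters."""
    # Saturated natural parameters are +/- infinity for binary data;
    # +/- m is an assumed finite stand-in.
    Theta_sat = m * (2.0 * X - 1.0)
    # Assumption: main effects fixed at the logit of the column means.
    p = np.clip(X.mean(axis=0), 1e-3, 1.0 - 1e-3)
    mu = np.log(p / (1.0 - p))
    Tc = Theta_sat - mu
    # Initialize with the top-k right singular vectors of the centered parameters.
    _, _, Vt = np.linalg.svd(Tc, full_matrices=False)
    U = Vt[:k].T
    history = []
    for _ in range(n_iter):
        Theta = mu + Tc @ U @ U.T
        history.append(bernoulli_deviance(X, Theta))
        # Quadratic majorizer of the deviance around the current Theta.
        Z = Theta + 4.0 * (X - sigmoid(Theta))
        A = Tc.T @ (Z - mu)
        M = A + A.T - Tc.T @ Tc
        # eigh returns eigenvalues in ascending order; keep the top k eigenvectors.
        _, vecs = np.linalg.eigh(M)
        U = vecs[:, -k:]
    history.append(bernoulli_deviance(X, mu + Tc @ U @ U.T))
    return mu, U, history

def pc_scores(X_new, mu, U, m=4.0):
    # Scores on new data need only a matrix multiplication.
    return (m * (2.0 * X_new - 1.0) - mu) @ U

# Synthetic demonstration data (not from the thesis).
rng = np.random.default_rng(0)
true_scores = rng.normal(size=(200, 2))
true_loadings = rng.normal(size=(10, 2))
X = (rng.random((200, 10)) < sigmoid(true_scores @ true_loadings.T)).astype(float)

mu, U, history = fit_logistic_pca(X, k=2)
scores_new = pc_scores(X[:5], mu, U)
```

The fitted quantities are a length-d vector and a d-by-k matrix, so their size does not depend on the number of rows in X, and `pc_scores` applies to unseen rows without refitting.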
Committee
Yoonkyung Lee (Advisor)
Vincent Vu (Committee Member)
Yunzhang Zhu (Committee Chair)
Pages
116 p.
Subject Headings
Statistics
Keywords
Binary data; Count data; Dimensionality reduction; Exponential family; Logistic PCA; Principal component analysis
Recommended Citations
APA Style (7th edition)
Landgraf, A. J. (2015). Generalized Principal Component Analysis: Dimensionality Reduction through the Projection of Natural Parameters [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1437610558
MLA Style (8th edition)
Landgraf, Andrew. Generalized Principal Component Analysis: Dimensionality Reduction through the Projection of Natural Parameters. 2015. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1437610558.
Chicago Manual of Style (17th edition)
Landgraf, Andrew. "Generalized Principal Component Analysis: Dimensionality Reduction through the Projection of Natural Parameters." Doctoral dissertation, Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1437610558
Document number:
osu1437610558
Download Count:
2,304
Copyright Info
© 2015, some rights reserved.
Generalized Principal Component Analysis: Dimensionality Reduction through the Projection of Natural Parameters by Andrew J Landgraf is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at etd.ohiolink.edu.
This open access ETD is published by The Ohio State University and OhioLINK.