Empirical Hierarchical Modeling and Predictive Inference for Big, Spatial, Discrete, and Continuous Data

Sengupta, Aritra

2012, Doctor of Philosophy, Ohio State University, Statistics.

This dissertation comprises an introductory chapter and three stand-alone chapters. The three main chapters are tied together by a common theme: empirical hierarchical spatial-statistical modeling of non-Gaussian datasets. Such non-Gaussian datasets arise in a variety of disciplines, for example, in health studies, econometrics, ecological studies, and remote sensing of the Earth by satellites, and they are often very-large-to-massive. When analyzing "big data," traditional spatial statistical methods are computationally intensive and sometimes not feasible, even in supercomputing environments. In addition, these datasets are often observed over extensive spatial domains, which makes the assumption of spatial stationarity unrealistic.

In this dissertation, we address these issues by using dimension-reduction techniques based on the Spatial Random Effects (SRE) model. We consider a hierarchical spatial statistical model consisting of a conditional exponential-family model for the observed data (which we call the data model), and an underlying (hidden) geostatistical process for some transformation of the (conditional) mean of the data model. Within the hierarchical model, dimension reduction is achieved by modeling the geostatistical process as a linear combination of a fixed number of basis functions, which results in substantial computational speed-ups. These models do not rely on specifying a spatial weights matrix, and no assumptions of homogeneity, stationarity, or isotropy are made.
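
The computational benefit of a fixed number of basis functions can be sketched in a few lines of Python. This is a minimal illustration, not the dissertation's implementation: the bisquare basis, the one-dimensional domain, the added independent variance term sigma2, and the function names bisquare_basis and woodbury_solve are all assumptions made for the example. It shows that a linear system involving the n x n covariance S K S' + sigma2 I can be solved using only r x r matrices, via the Sherman-Morrison-Woodbury identity.

```python
import numpy as np

def bisquare_basis(locs, centers, radius):
    """Return the n x r matrix of bisquare basis functions at 1-D locations.

    The bisquare form is an illustrative choice, not taken from the source.
    """
    d = np.abs(locs[:, None] - centers[None, :])
    return np.where(d < radius, (1.0 - (d / radius) ** 2) ** 2, 0.0)

def woodbury_solve(S, K, sigma2, y):
    """Solve (S K S^T + sigma2 * I) x = y without forming an n x n matrix.

    Uses the Sherman-Morrison-Woodbury identity, so the only matrices that
    are inverted or factorized have size r x r.
    """
    inner = np.linalg.inv(K) + S.T @ S / sigma2          # r x r
    return y / sigma2 - S @ np.linalg.solve(inner, S.T @ y) / sigma2**2

rng = np.random.default_rng(0)
n, r = 500, 12                       # n data locations, r basis functions
locs = np.sort(rng.uniform(0.0, 1.0, n))
centers = np.linspace(0.0, 1.0, r)
S = bisquare_basis(locs, centers, radius=0.25)

A = rng.normal(size=(r, r))
K = A @ A.T + r * np.eye(r)          # arbitrary positive-definite r x r matrix
sigma2 = 0.5                         # assumed independent variance term
y = rng.normal(size=n)

x_fast = woodbury_solve(S, K, sigma2, y)
# Direct n x n solve, included only to check the low-rank shortcut.
x_direct = np.linalg.solve(S @ K @ S.T + sigma2 * np.eye(n), y)
```

Note that K is left unstructured apart from being positive-definite, which is consistent with making no stationarity or isotropy assumption; the cost of woodbury_solve grows linearly in n for fixed r.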

Another focus of the research presented in this dissertation is to properly account for spatial heterogeneity that often exists in these datasets. For example, with county-level health data, the population at risk is different for different counties and is typically a source of heterogeneity. This type of heterogeneity, whenever it exists, needs to be incorporated into the hierarchical model. We address this through the use of an offset term and by properly weighting the SRE model (e.g., Chapter 2), and through data-model specifications (e.g., Chapter 3).

Following the introductory chapter, in Chapter 2 we consider spatial data in the form of counts. We consider a Poisson data model for the counts, and develop maximum likelihood (ML) estimates for the unknown parameters using an expectation-maximization (EM) algorithm. We illustrate the hierarchical nature of our approach to the spatial modeling of counts through the analysis of a spatial dataset of Sudden Infant Death Syndrome (SIDS) counts for the counties of North Carolina.
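
To make the role of the offset concrete, the following Python sketch simulates counts from a Poisson data model whose log mean is a population offset plus a basis-function expansion. It is a toy simulation under stated assumptions, not the Chapter 2 analysis and not the EM algorithm: the Gaussian-bump basis, the baseline rate beta0, the population range, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 100, 5                        # n areal units, r basis functions
locs = np.linspace(0.0, 1.0, n)      # stand-in for unit centroids (1-D)
centers = np.linspace(0.0, 1.0, r)

# Gaussian-bump basis functions (illustrative choice, not from the source).
S = np.exp(-0.5 * ((locs[:, None] - centers[None, :]) / 0.2) ** 2)

eta = rng.normal(scale=0.3, size=r)  # random effects on the r basis functions
beta0 = np.log(2e-3)                 # hypothetical baseline log rate
log_rate = beta0 + S @ eta           # hidden process on the log scale

# Population at risk differs by unit; it enters the log mean as an offset.
population = rng.integers(500, 50_000, size=n)
mean_counts = population * np.exp(log_rate)

# Data model: conditionally independent Poisson counts given the process.
counts = rng.poisson(mean_counts)
```

Because the offset log(population) is added on the log scale, two units with the same underlying rate but different populations at risk have different expected counts, which is the kind of heterogeneity the offset is meant to absorb.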

Then, in Chapter 3, we extend the empirical hierarchical modeling framework of Chapter 2, which was developed for counts, to the exponential family of distributions. The data model is a conditionally independent exponential-family model. A process model is specified for some transformation of the (conditional) mean of the data model. We present the EM algorithm to obtain ML estimates of the unknown parameters in the empirical-hierarchical-modeling framework introduced in this chapter. The methodological results are illustrated and compared to some other approaches using a simulation study. We then apply our methodology to analyze a remote sensing dataset on aerosol optical depth (AOD).
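
One common way to write a hierarchical model of this kind is sketched below. The notation is assumed here for illustration and is not quoted from the dissertation: Z denotes the data, Y the hidden process, g a link function (for example, a log link for counts), S(s) a vector of r basis functions, and xi an optional fine-scale term.

```latex
\begin{align*}
\text{Data model:}\quad
  & Z(s_i) \mid Y(\cdot) \ \overset{\text{ind.}}{\sim}\ \mathrm{EF}\big(\mu(s_i)\big),
    \qquad i = 1, \dots, n, \\
\text{Link:}\quad
  & g\big(\mu(s)\big) = Y(s), \\
\text{Process model:}\quad
  & Y(s) = x(s)^\top \beta + S(s)^\top \eta + \xi(s),
    \qquad \eta \sim \mathrm{N}_r(0, K).
\end{align*}
```

Here EF denotes a member of the exponential family with conditional mean mu(s_i), and the unknown parameters (such as beta and K in this notation) are the quantities an EM algorithm would estimate.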

Finally, in Chapter 4, we use the methodology developed in Chapter 3 to analyze a remote sensing dataset on clouds. We analyze the cloud data from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) instrument, which is on board the Terra satellite that was launched in December 1999. Clouds play an important role in climate studies, and hence an accurate quantification of the spatial distribution of clouds is necessary. In this chapter, we build a spatial statistical model for the underlying clear-sky-probability (or conversely, the cloud-probability) process, and we quantify the uncertainty in our predictions. We consider a hierarchical statistical model for analyzing the cloud data, where we postulate a hidden process for the probability of clear sky using a transformed SRE model. The advantages of the SRE model are considerable: it can represent many types of spatial behavior, it permits fast computations when datasets are very large, and it has attractive change-of-support properties.

Noel Cressie, PhD (Advisor)
Radu Herbei, PhD (Committee Member)
Desheng Liu, PhD (Committee Member)
176 p.

Recommended Citations

  • Sengupta, A. (2012). Empirical Hierarchical Modeling and Predictive Inference for Big, Spatial, Discrete, and Continuous Data [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1350660056

    APA Style (7th edition)

  • Sengupta, Aritra. Empirical Hierarchical Modeling and Predictive Inference for Big, Spatial, Discrete, and Continuous Data. 2012. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1350660056.

    MLA Style (8th edition)

  • Sengupta, Aritra. "Empirical Hierarchical Modeling and Predictive Inference for Big, Spatial, Discrete, and Continuous Data." Doctoral dissertation, Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1350660056

    Chicago Manual of Style (17th edition)