Geometric and Statistical Summaries for Big Data Visualization

Chaudhuri, Abon

2013, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
The visualization and data analysis paradigm is adapting quickly to keep pace with the rapid growth in computing power and data size. Modern scientific simulations run at massive scale and produce huge datasets, which domain experts must analyze and visualize to continue innovating. With large-scale data, it is important to identify and extract the informative regions at an early stage so that subsequent analysis algorithms, which are usually memory- and compute-intensive, can focus only on those regions. Transforming the raw data into a compact yet meaningful representation also helps maintain interactivity when querying and visualizing analysis results. In this dissertation, we propose a novel, general-purpose framework for exploring large-scale data. We propose to use importance-based data summaries, which can substitute for the raw data to answer queries and drive visual exploration. Since the definition of importance depends on the nature of the data and the task at hand, we propose to use suitable statistical and geometric measures, or a combination of such measures, to quantify importance and perform data reduction on scalar and vector field data.
Our research demonstrates two instances of the proposed framework. The first instance applies to a large number of streamlines computed from vector fields. We make visual exploration of such data much easier than navigating a cluttered 3D visualization of the raw data. Here, we introduce a fractal dimension-based metric called the box counting ratio, which quantifies the geometric complexity of streamlines (or parts of streamlines) by their space-filling capacity. We use this metric to extract, organize, and visualize streamlines of varying density and complexity hidden in a large number of streamlines. The complex regions extracted from the streamlines serve as the data summaries in this case.
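To make the space-filling idea concrete, the following is a minimal sketch of a generic two-scale box-counting estimate, not the dissertation's exact definition of the box counting ratio, which the abstract does not spell out. The function names, the box sizes, and the two test curves are all hypothetical. The sketch counts how many grid boxes a sampled curve occupies at a coarse and a fine box size, then compares the two counts; the curve must be sampled densely relative to the fine box size for the counts to be meaningful.

```python
import math

def occupied_boxes(points, box_size):
    """Count the distinct grid cells of edge length box_size that contain a point."""
    return len({tuple(math.floor(c / box_size) for c in p) for p in points})

def box_counting_estimate(points, coarse, fine):
    """Two-scale box-counting dimension estimate (an illustrative assumption):
    log(N_fine / N_coarse) / log(coarse / fine)."""
    n_coarse = occupied_boxes(points, coarse)
    n_fine = occupied_boxes(points, fine)
    return math.log(n_fine / n_coarse) / math.log(coarse / fine)

# Hypothetical test curves, densely sampled inside the unit cube.
# A straight streamline along x.
line = [(i / 4096, 0.5, 0.5) for i in range(4096)]
# A streamline that sweeps back and forth until it covers a whole plane.
sheet = [(i / 64, j / 64, 0.5) for i in range(64) for j in range(64)]

print(box_counting_estimate(line, 0.25, 0.125))   # 1.0
print(box_counting_estimate(sheet, 0.25, 0.125))  # 2.0
```

The straight line scores 1 and the plane-filling curve scores 2, so a higher value marks a streamline segment that fills more of the space around it.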
We organize and present the extracted regions on an interactive 2D information space, which allows users to select streamlines and visualize them in the original spatial domain. We also extend this framework to support exploration using an ensemble of measures, including the box counting ratio. We support our claims with detailed case studies using combustion and climate simulation datasets.
As the second instance, we use our framework to speed up query-driven exploration of volume data. Our approach speeds up range query response by using distribution-based data summaries instead of repeatedly scanning sub-domains of the raw data. Our work is mainly concerned with the range distribution query, which returns the distribution of the data values within an axis-aligned query region. Because the response time of such queries grows with the data and query size, maintaining interactivity is challenging. Our research makes it possible to answer a distribution query for an arbitrary region in constant time, regardless of data and query size. We adapt an integral image-based data structure to reduce the computation, I/O, and communication costs of answering queries, and propose a similarity-based indexing technique to reduce the storage cost of the data structure. Our scheme exploits the similarity among nearby regions in the data and, hence, among their respective distributions. We demonstrate the benefits of our technique for many visualization applications that directly or indirectly require distributions.
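The constant-time range distribution query can be illustrated with a minimal integral histogram sketch. It assumes a small 2D grid for brevity, whereas the dissertation targets volume data, and it does not reproduce the author's actual data structure or the similarity-based indexing. The function names, the bin layout, and the sample grid are hypothetical. Each grid corner stores the histogram of everything above and to the left of it, so the histogram of any axis-aligned region comes from four lookups per bin, independent of the region size.

```python
def build_integral_histogram(grid, num_bins, lo, hi):
    """Return ih where ih[r][c][b] counts bin-b values in rows [0, r), cols [0, c)."""
    rows, cols = len(grid), len(grid[0])
    width = (hi - lo) / num_bins
    ih = [[[0] * num_bins for _ in range(cols + 1)] for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            # Clamp so that a value equal to hi falls in the last bin.
            b = min(int((grid[r][c] - lo) / width), num_bins - 1)
            cell = ih[r + 1][c + 1]
            for k in range(num_bins):
                # Inclusion-exclusion over the three neighboring prefix histograms.
                cell[k] = ih[r][c + 1][k] + ih[r + 1][c][k] - ih[r][c][k]
            cell[b] += 1
    return ih

def range_distribution(ih, r0, r1, c0, c1):
    """Histogram of the half-open region rows [r0, r1), cols [c0, c1)."""
    return [a - b - c + d
            for a, b, c, d in zip(ih[r1][c1], ih[r0][c1], ih[r1][c0], ih[r0][c0])]

# Hypothetical 4x4 scalar field with values in [0, 4), one bin per unit interval.
grid = [
    [0, 1, 2, 3],
    [1, 2, 3, 0],
    [2, 3, 0, 1],
    [3, 0, 1, 2],
]
ih = build_integral_histogram(grid, num_bins=4, lo=0, hi=4)

print(range_distribution(ih, 1, 3, 1, 3))  # [1, 0, 1, 2]
print(range_distribution(ih, 0, 4, 0, 4))  # [4, 4, 4, 4]
```

The query cost depends only on the number of bins, but the structure stores one full histogram per grid corner, which is the storage cost that the similarity-based indexing mentioned above is meant to reduce.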
Han-Wei Shen (Advisor)
Roger Crawfis (Committee Member)
Rephael Wenger (Committee Member)
Tom Peterka (Committee Member)
152 p.

Recommended Citations

  • Chaudhuri, A. (2013). Geometric and Statistical Summaries for Big Data Visualization [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1382235351

    APA Style (7th edition)

  • Chaudhuri, Abon. Geometric and Statistical Summaries for Big Data Visualization. 2013. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1382235351.

    MLA Style (8th edition)

  • Chaudhuri, Abon. "Geometric and Statistical Summaries for Big Data Visualization." Doctoral dissertation, Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1382235351

    Chicago Manual of Style (17th edition)