Distribution-based Summarization for Large Scale Simulation Data Visualization and Analysis

2019, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
The advent of high-performance supercomputers enables scientists to perform extreme-scale simulations that generate millions of cells and thousands of time steps. By exploring and analyzing the simulation outputs, scientists can gain a deeper understanding of the modeled phenomena. When the simulation output is small, the common practice is simply to move the data to the machines that perform post-hoc analysis. However, as the data grows, the limited bandwidth and capacity of the networking and storage devices that connect the supercomputers to the analysis machines become a major bottleneck. Visualizing and analyzing large-scale simulation datasets therefore poses significant challenges. This dissertation addresses the big data challenge by proposing distribution-based in-situ techniques. These techniques use the same supercomputer resources to analyze the raw data and generate compact data proxies, which use distributions to statistically summarize the raw data. Only the compact data proxies are moved to the post-analysis machine, which overcomes the bottleneck. Because the distribution-based data representation preserves the statistical properties of the data, it has the potential to facilitate flexible post-hoc data analysis and enable uncertainty quantification.
We first focus on the problem of volume rendering large data on resource-limited post-analysis machines. To address the limited I/O bandwidth and storage space, distributions are used to summarize the data. When visualizing the data, importance sampling is used to draw a small number of samples and minimize the demand for computational power. The error of the proxies is quantified and visually presented to scientists through uncertainty animation. We also address the problem of reducing the error that arises when approximating spatial information in distribution-based representations; this error can lower visualization quality and hinder data exploration. The basic distribution-based approach is augmented by our proposed spatial distribution, which is represented by a three-dimensional Gaussian Mixture Model (GMM). The new representation not only improves visualization quality but can also be used with various visualization techniques, such as volume rendering, uncertain isosurfaces, and salient feature exploration. Next, a technique is developed to handle large-scale time-varying datasets. This representation stores the time-varying datasets at a lower temporal resolution and exploits temporal coherence to reconstruct the data at non-sampled time steps. Each pixel ray of a view at a non-sampled time step is decoupled into a value distribution and the samples' location information. Our representation exploits data coherence to recover the samples' location information while storing less data. In addition, similar value distributions from multiple rays are represented by a single distribution to save further storage. Finally, a statistics-based super-resolution technique is proposed to solve the big data problem caused by a huge parameter space. Simulation runs with a few parameter samples output full-resolution data, which is used to create the prior knowledge. Data from the rest of the simulation runs in the parameter space is statistically down-sampled in situ to a compact representation to reduce the data size. These compact data representations can be reconstructed to high resolution by combining them with the prior knowledge for data analysis.
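To make the idea of a distribution-based data proxy concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes a regular 3D scalar volume, fixed-size cubic blocks, and a plain normalized histogram per block (the abstract's spatial GMM is not modeled here). The function names `summarize_blocks` and `sample_block`, the block size, and the bin count are all hypothetical choices made for illustration.

```python
import numpy as np


def summarize_blocks(volume, block=8, bins=16, value_range=(0.0, 1.0)):
    """Summarize each block**3 sub-volume by a normalized value histogram.

    Assumption: the raw cell values are replaced by one histogram per
    block, so the exact position of each value inside a block is lost.
    """
    nx, ny, nz = (s // block for s in volume.shape)
    # Crop to a whole number of blocks, then gather each block's cells
    # into the last axis: shape (nx, ny, nz, block**3).
    v = volume[: nx * block, : ny * block, : nz * block]
    v = v.reshape(nx, block, ny, block, nz, block).transpose(0, 2, 4, 1, 3, 5)
    v = v.reshape(nx, ny, nz, -1)

    edges = np.linspace(value_range[0], value_range[1], bins + 1)
    # Bin index of every cell; clipping puts out-of-range values and the
    # top edge into the first or last bin.
    idx = np.clip(np.digitize(v, edges) - 1, 0, bins - 1)

    hist = np.zeros((nx, ny, nz, bins))
    for b in range(bins):
        hist[..., b] = (idx == b).sum(axis=-1)
    hist /= hist.sum(axis=-1, keepdims=True)
    return hist, edges


def sample_block(hist, edges, i, j, k, n, rng):
    """Draw n values from the histogram stored for block (i, j, k)."""
    p = hist[i, j, k]
    chosen = rng.choice(len(p), size=n, p=p)
    # Assumption: values are uniformly distributed inside a bin.
    return rng.uniform(edges[chosen], edges[chosen + 1])
```

With these illustrative settings, each block of 8 x 8 x 8 = 512 cells is stored as 16 histogram weights. Post-hoc analysis then works from samples such as `sample_block(hist, edges, 0, 0, 0, 100, np.random.default_rng(0))` rather than from the raw cells, and the loss of within-block positions is the kind of spatial error that the abstract's spatial distribution is meant to reduce.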
Han-Wei Shen (Advisor)
170 p.

Recommended Citations

  • Wang, K.-C. (2019). Distribution-based Summarization for Large Scale Simulation Data Visualization and Analysis [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1555452764885977

    APA Style (7th edition)

  • Wang, Ko-Chih. Distribution-based Summarization for Large Scale Simulation Data Visualization and Analysis. 2019. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1555452764885977.

    MLA Style (8th edition)

  • Wang, Ko-Chih. "Distribution-based Summarization for Large Scale Simulation Data Visualization and Analysis." Doctoral dissertation, Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1555452764885977

    Chicago Manual of Style (17th edition)