High-dimensional data is everywhere, in daily life and in all sectors of society: text, images, audio, video, and more. Analyzing such rich data and understanding its behavioral and structural aspects yields valuable information and insights that ultimately support decision making. Visual analytics significantly enhances the analysis of high-dimensional data by integrating computational tools with interactive techniques and visual representations to enable human-information discourse. Nevertheless, high dimensionality and large scale pose critical challenges for data analysis and exploration. In this dissertation, we propose a set of visual analytics approaches to promote the understanding of such data. Specifically, the approaches aim to advance visual analytics capabilities in clutter reduction, dimension management, categorical data visualization, and stream data visualization.
We propose the density distribution map and tile-based parallel coordinates to allow users to investigate the relationships between dimensions. These tools are designed to reduce visual clutter and to highlight data patterns, trends, and anomalies. In addition, they are equipped with interactive features for manipulating the visualization results. An extensive case study on mutual fund performance demonstrates the effectiveness of the proposed methods.
Categorical data, which contains variables whose values are drawn from a set of discrete categories, is very common. Its discrete nature often confounds the direct application of existing multidimensional visualization techniques. We propose to use entropy-related measures to enhance the visualization of categorical data. The entropy information is employed to guide ordering and filtering in parallel sets and scatter plot matrix visualizations. Furthermore, a novel TabularCluster visualization is proposed to depict cluster characteristics and to support effective examination and comparison.
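As a rough illustration of how entropy can guide dimension ordering, the sketch below computes the Shannon entropy of each categorical dimension and sorts the dimensions by it. This is only an assumption for illustration: the abstract does not state which entropy-related measures are used or in which direction dimensions are ordered, and the function names and sample records are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the category frequencies in `values`."""
    counts = Counter(values)
    n = len(values)
    # p * log2(1/p) summed over categories; an empty input yields 0.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def order_dimensions_by_entropy(records, dimensions):
    """Return dimensions sorted by ascending entropy, plus the entropy of each.

    Ascending order is an illustrative choice, not the dissertation's rule.
    """
    entropies = {
        d: shannon_entropy([r[d] for r in records]) for d in dimensions
    }
    ordered = sorted(dimensions, key=lambda d: entropies[d])
    return ordered, entropies


# Hypothetical sample records with two categorical dimensions.
records = [
    {"class": "first", "sex": "m"},
    {"class": "second", "sex": "m"},
    {"class": "third", "sex": "f"},
    {"class": "crew", "sex": "m"},
]
ordered, entropies = order_dimensions_by_entropy(records, ["class", "sex"])
```

A dimension whose values are spread evenly over many categories has high entropy, while a dimension dominated by one category has low entropy, so a threshold on the same scores could also serve as a simple filter.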
We propose STREAMIT, an interactive visualization system that enables users to explore text streams without prior knowledge. STREAMIT supports interactive exploration with increased scalability. First, keyword importance is adjustable on the fly to achieve desirable clustering effects for varying interests. Second, topic modeling is used to represent documents with higher-level semantic meaning. Third, document clusters are created on the 2D layout to promote better understanding. Case studies and real-world applications are presented to demonstrate the effectiveness of STREAMIT.