Visualization and Unsupervised Pattern Recognition in Multidimensional Data Using a New Heuristic for Linear Data Ordering

2016, Doctor of Philosophy (Ph.D.), Bowling Green State University, Statistics.
In data-driven applications, understanding the structural relationships in the data can greatly facilitate data analysis and decision making. Many tools, such as multidimensional scaling and hierarchical clustering, have been developed and used for this purpose. Seriation is another such method. Given a sample of n objects and the corresponding dissimilarity matrix, seriation aims to produce a linear ordering of the objects. The ordering is then used to draw a heat map of the reordered dissimilarity matrix, which helps reveal the structure of the data. Good orderings should reflect the underlying data structure and yield heat maps that are easy to read and allow a clear interpretation of that structure. Since the pioneering work of F. Petrie in 1899, a substantial number of seriation methods have been developed. Which methods consistently produce good orderings? Some authors have compared different seriation methods, but the number of methods compared and the number of datasets used have been relatively small. This dissertation conducts an evaluation study of the potential of 35 existing seriation methods and one novel method to reveal the structure of data. An initial assessment is conducted for all 36 methods across six datasets with relatively simple structure. A further assessment is conducted for the most successful methods using another six datasets with more complex structure. The results show that some seriation methods consistently produce orderings that are more helpful for understanding and visualizing the structure of data, that some methods should be used only when their particular features are called for, and that even the better methods should be used with caution.
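As a rough illustration of the seriation setup described above, the sketch below takes a dissimilarity matrix, builds a linear ordering with a simple nearest-neighbour heuristic for a short Hamiltonian path (a TSP-style criterion: the sum of dissimilarities between adjacent objects), and reorders the matrix as one would before drawing a heat map. It is a minimal sketch under stated assumptions, not the dissertation's tpTSP method and not necessarily identical to any method evaluated there; the function names `path_length`, `greedy_seriation`, and `reorder` are hypothetical, and the toy data are made up for illustration.

```python
from typing import List, Sequence

# Hypothetical helpers for illustration only; this is a plain nearest-neighbour
# heuristic, not the tpTSP method introduced in the dissertation.

def path_length(d: Sequence[Sequence[float]], order: Sequence[int]) -> float:
    """Sum of dissimilarities between consecutive objects in the ordering
    (the Hamiltonian path length that TSP-style seriation tries to minimize)."""
    return sum(d[order[i]][order[i + 1]] for i in range(len(order) - 1))


def greedy_seriation(d: Sequence[Sequence[float]]) -> List[int]:
    """Try every starting object, repeatedly append the closest unvisited
    object, and keep the ordering with the shortest total path length."""
    n = len(d)
    best_order: List[int] = []
    best_len = float("inf")
    for start in range(n):
        order = [start]
        unvisited = set(range(n)) - {start}
        while unvisited:
            last = order[-1]
            # Ties are broken by the smaller object index for determinism.
            nxt = min(unvisited, key=lambda j: (d[last][j], j))
            order.append(nxt)
            unvisited.remove(nxt)
        length = path_length(d, order)
        if length < best_len:
            best_order, best_len = order, length
    return best_order


def reorder(d: Sequence[Sequence[float]], order: Sequence[int]) -> List[List[float]]:
    """Permute rows and columns of the dissimilarity matrix by the ordering;
    this reordered matrix is what a seriation heat map would display."""
    return [[d[i][j] for j in order] for i in order]


# Toy data (made up): objects are points on a line, listed in scrambled order;
# the dissimilarity between two objects is their absolute distance.
positions = [0, 5, 1, 6, 2]
d = [[abs(a - b) for b in positions] for a in positions]

order = greedy_seriation(d)
print(order)                           # [0, 2, 4, 1, 3]
print(path_length(d, list(range(5))))  # 18 for the original (scrambled) order
print(path_length(d, order))           # 6 after seriation
for row in reorder(d, order):
    print(row)
```

On this toy example the recovered ordering follows the points along the line, so small dissimilarities line up next to the diagonal of the reordered matrix. The reversed ordering is equally valid, since seriation orderings are generally defined only up to reversal. A greedy heuristic like this can get stuck in poor orderings on harder data, which is one reason dedicated TSP solvers and other seriation criteria exist.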
This dissertation introduces a new seriation method, called tree-penalized TSP (tpTSP), which compares favorably with the other methods considered. Hybrid in nature, the method combines the strengths of two popular types of seriation methods, TSP (traveling salesperson problem) and OLO (optimal leaf ordering), while avoiding their key pitfalls. The datasets used for the performance evaluation and the R code for the new method are posted on GitHub.
Craig Zirbel (Advisor)
Haowen Xi (Committee Member)
Hanfeng Chen (Committee Member)
Maria Rizzo (Committee Member)
Junfeng Shang (Committee Member)
250 p.

Recommended Citations

APA Style (7th edition)

  • Aliyev, D. A. (2016). Visualization and Unsupervised Pattern Recognition in Multidimensional Data Using a New Heuristic for Linear Data Ordering [Doctoral dissertation, Bowling Green State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1479420043962505

MLA Style (8th edition)

  • Aliyev, Denis. Visualization and Unsupervised Pattern Recognition in Multidimensional Data Using a New Heuristic for Linear Data Ordering. 2016. Bowling Green State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1479420043962505.

Chicago Manual of Style (17th edition)

  • Aliyev, Denis. "Visualization and Unsupervised Pattern Recognition in Multidimensional Data Using a New Heuristic for Linear Data Ordering." Doctoral dissertation, Bowling Green State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1479420043962505