A Tree-based Framework for Difference Summarization

2012, MS, Kent State University, College of Arts and Sciences / Department of Computer Science.
Understanding the differences between two datasets is a fundamental data mining question and is important across many real-world scientific applications. In this paper, we propose a tree-based framework that provides a parsimonious explanation of the difference between two distributions, based on a rigorous two-sample statistical test. We develop two efficient approaches. The first is a dynamic programming approach that finds a minimal number of data subsets that describe the difference between the two datasets. The second is a greedy approach that approximates the dynamic programming approach. We employ Friedman's well-known MST (minimum spanning tree) statistic for the two-sample test in our summarization tree construction, and develop novel techniques to speed up its computation. We performed a detailed experimental evaluation on both real and synthetic datasets and demonstrated the effectiveness of our tree-summarization approach.
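For readers unfamiliar with the MST-based two-sample test mentioned in the abstract, the following is a minimal sketch of the general idea, not the thesis's implementation. It pools both samples, builds a Euclidean minimum spanning tree, and counts the tree edges that join points from different samples; unusually few such edges suggest the two distributions differ. The function names (`mst_edges`, `cross_edge_count`, `mst_two_sample_test`) are hypothetical, and the p-value is obtained by a simple label permutation, which is an assumption made here for brevity and not the speed-up technique developed in the thesis.

```python
import math
import random


def mst_edges(points):
    """Return the edges (i, j) of a Euclidean MST using Prim's algorithm, O(n^2)."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known distance from the tree to each vertex
    best_from = [-1] * n         # tree vertex that realizes that distance
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # Relax distances of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def cross_edge_count(edges, labels):
    """Count MST edges whose endpoints carry different sample labels."""
    return sum(1 for i, j in edges if labels[i] != labels[j])


def mst_two_sample_test(sample_a, sample_b, n_permutations=500, seed=0):
    """Return (observed cross-edge count, permutation p-value).

    Assumption: the p-value comes from shuffling labels on the fixed MST,
    a simplification chosen for this sketch.
    """
    points = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    edges = mst_edges(points)  # the tree does not depend on the labels
    observed = cross_edge_count(edges, labels)

    rng = random.Random(seed)
    shuffled = labels[:]
    at_most = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Few cross edges indicate separation, so the test is one-sided (lower tail).
        if cross_edge_count(edges, shuffled) <= observed:
            at_most += 1
    p_value = (at_most + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Two well-separated 3x3 grids: the MST needs only one bridging edge.
    a = [(x, y) for x in range(3) for y in range(3)]
    b = [(x + 100, y + 100) for x in range(3) for y in range(3)]
    print(mst_two_sample_test(a, b))
```

In a summarization tree of the kind the abstract describes, a test like this would presumably be applied to the data falling in each candidate subset, so that only subsets where the two samples differ significantly are reported.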
Ruoming Jin (Advisor)
Yuri Breitbart (Advisor)
Feodor Dragan (Committee Member)
Peyravi Hassan (Committee Member)

Recommended Citations

  • Li, R. (2012). A Tree-based Framework for Difference Summarization [Master's thesis, Kent State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=kent1334277940

    APA Style (7th edition)

  • Li, Rong. A Tree-based Framework for Difference Summarization. 2012. Kent State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=kent1334277940.

    MLA Style (8th edition)

  • Li, Rong. "A Tree-based Framework for Difference Summarization." Master's thesis, Kent State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=kent1334277940

    Chicago Manual of Style (17th edition)