Files
kent1334277940.pdf (396.42 KB)
A Tree-based Framework for Difference Summarization
Author Info
Li, Rong
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=kent1334277940
Year and Degree
2012, MS, Kent State University, College of Arts and Sciences / Department of Computer Science.
Abstract
Understanding the differences between two datasets is a fundamental data mining question and is important across many real-world scientific applications. In this thesis, we propose a tree-based framework that provides a parsimonious explanation of the difference between two distributions, based on a rigorous two-sample statistical test. We develop two efficient approaches. The first is a dynamic programming approach that finds a minimal number of data subsets describing the difference between the two datasets. The second is a greedy approach that approximates the dynamic programming approach. We employ the well-known Friedman-Rafsky MST (minimal spanning tree) statistic for the two-sample tests in our summarization tree construction, and develop novel techniques to speed up its computation. We performed a detailed experimental evaluation on both real and synthetic datasets and demonstrated the effectiveness of our tree-summarization approach.
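For readers unfamiliar with the MST-based two-sample test mentioned in the abstract, the following is a minimal sketch of the textbook Friedman-Rafsky runs statistic. It is not the thesis's implementation and does not reproduce its speed-up techniques. The function names (`mst_edges`, `count_runs`, `friedman_rafsky`), the use of Euclidean distance, and the permutation-based p-value are all assumptions made for illustration.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph (O(n^2)).

    Returns the n - 1 edges of a minimal spanning tree as index pairs.
    """
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link from the tree to each node
    best_from = [-1] * n         # tree endpoint of that cheapest link
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=best_dist.__getitem__)
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def count_runs(edges, labels):
    """Runs statistic R = 1 + number of MST edges joining different samples."""
    return 1 + sum(labels[u] != labels[v] for u, v in edges)


def friedman_rafsky(sample_a, sample_b, n_perm=999, seed=0):
    """Return (R, permutation p-value); a small R suggests different distributions.

    The MST depends only on the pooled points, so it is built once and only
    the sample labels are shuffled (an assumption of this sketch, chosen to
    avoid the asymptotic normal approximation).
    """
    points = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    edges = mst_edges(points)
    observed = count_runs(edges, labels)
    rng = random.Random(seed)
    shuffled = labels[:]
    at_most = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_runs(edges, shuffled) <= observed:
            at_most += 1
    p_value = (at_most + 1) / (n_perm + 1)
    return observed, p_value
```

Calling `friedman_rafsky(a, b)` on two lists of coordinate tuples returns the runs count and a one-sided p-value. When the two samples come from the same distribution, their points interleave in the tree and the runs count is large; when they differ, few tree edges cross between samples and the count is small.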
Committee
Ruoming Jin (Advisor)
Yuri Breitbart (Advisor)
Feodor Dragan (Committee Member)
Hassan Peyravi (Committee Member)
Keywords
Chi-square test; Friedman-Rafsky test; Kolmogorov-Smirnov test; difference summarization; minimal spanning tree; two-sample test
Recommended Citations
APA Style (7th edition)
Li, R. (2012). A Tree-based Framework for Difference Summarization [Master's thesis, Kent State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=kent1334277940
MLA Style (8th edition)
Li, Rong. A Tree-based Framework for Difference Summarization. 2012. Kent State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=kent1334277940.
Chicago Manual of Style (17th edition)
Li, Rong. "A Tree-based Framework for Difference Summarization." Master's thesis, Kent State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=kent1334277940
Document number:
kent1334277940
Download Count:
526
Copyright Info
© 2012, all rights reserved.
This open access ETD is published by Kent State University and OhioLINK.