Information and Representation Tradeoffs in Document Classification

2022, Master of Sciences, Case Western Reserve University, EECS - Computer and Information Sciences.
Significant prior work has proposed using topics as well as words in document classification, and many complex models have been developed that mix different representations of words and topics. But how much do these different representations actually contribute to accuracy in document classification? We categorize existing document classification approaches along two axes: a syntactic/semantic/both axis that considers what kind of information the model uses, and a word/topic/both axis that considers how that information is used. We evaluate these approaches under a uniform methodology to determine which classes of models are most effective for document classification. Surprisingly, our results show that, averaged across many datasets, overall classification performance differs little between classes of models, and few methods outperform, or produce sparser models than, a basic word-based document classifier.
Soumya Ray (Committee Chair)
Mehmet Koyuturk (Committee Member)
Michael Lewicki (Committee Member)

Recommended Citations

  • Jin, T. (2022). Information and Representation Tradeoffs in Document Classification [Master's thesis, Case Western Reserve University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=case1649340330508341

    APA Style (7th edition)

  • Jin, Timothy. Information and Representation Tradeoffs in Document Classification. 2022. Case Western Reserve University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=case1649340330508341.

    MLA Style (8th edition)

  • Jin, Timothy. "Information and Representation Tradeoffs in Document Classification." Master's thesis, Case Western Reserve University, 2022. http://rave.ohiolink.edu/etdc/view?acc_num=case1649340330508341

    Chicago Manual of Style (17th edition)