State-of-Mind Classification From Unstructured Texts Using Statistical Features and Lexical Network Features

2019, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
Text classification is a widely studied research problem, motivated by the need to process the exponentially growing number of digital documents. Over time, specific types of features and classifiers have shown persistently good performance on different textual data domains, and have become widely used. This dissertation focuses on the classification of texts based on state-of-mind using data from two domains: suicidal ideation and political affiliation. Various approaches are explored, including the standard one using word statistics as features in combination with supervised machine learning methods, as well as one grounded in theories of human cognition -- specifically, conceptual association and spreading activation. An approach is proposed to capture a shared state-of-mind in the form of a lexical associative network using word associations in a given corpus. To test this, a novel semi-supervised classifier called excess weight density (EWD) is proposed that computes how well the thoughts in a given text fit the trained lexical networks of a particular state-of-mind. Experiments conducted on nineteen corpora show that this method outperforms the k-nearest neighbors algorithm. The lexical networks are also used to generate features that are used alongside statistical features in supervised classifiers. Supervised classification performance is tested over several feature combinations using nine different methods, including random forests, support vector machines, various feed-forward neural networks, and a convolutional neural network (CNN) with different embedding layer initializations. The results offer several insights into text classification, such as the importance of working with heterogeneous feature spaces. Further, the features that are most important for supervised classification are analyzed, and the results show interesting trends, such as the success of lexical network features in capturing contextual and interpretable information.
Next, ensemble approaches are evaluated and found to improve the results. Finally, a longitudinal study is conducted to assess the changes in the political state-of-mind in the U.S. Congress from 1981 to 2016, showing results that are of interest from technical and historical viewpoints. Overall, the work in this dissertation represents a systematic evaluation of the methods and choices available for state-of-mind classification in diverse domains, and leads to useful recommendations for such tasks. The methods studied -- including lexical networks and spreading activation -- can also be used for tasks beyond text classification, including text summarization, novelty detection, and text generation.
John Pestian, Ph.D. (Committee Chair)
Raj Bhatnagar, Ph.D. (Committee Member)
Ali Minai, Ph.D. (Committee Member)
Carla Purdy, Ph.D. (Committee Member)
Daniel Santel, Ph.D. (Committee Member)
377 p.

Recommended Citations

  • Bayram, U. (2019). State-of-Mind Classification From Unstructured Texts Using Statistical Features and Lexical Network Features [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563274174606657

    APA Style (7th edition)

  • Bayram, Ulya. State-of-Mind Classification From Unstructured Texts Using Statistical Features and Lexical Network Features. 2019. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563274174606657.

    MLA Style (8th edition)

  • Bayram, Ulya. "State-of-Mind Classification From Unstructured Texts Using Statistical Features and Lexical Network Features." Doctoral dissertation, University of Cincinnati, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563274174606657

    Chicago Manual of Style (17th edition)