Files
34187.pdf (17.84 MB)
State-of-Mind Classification From Unstructured Texts Using Statistical Features and Lexical Network Features
Author Info
Bayram, Ulya
ORCID® Identifier
http://orcid.org/0000-0002-8150-4053
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563274174606657
Abstract Details
Year and Degree
2019, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
Abstract
Text classification is a widely studied research problem, motivated by the need to process the exponentially growing number of digital documents. Over time, specific types of features and classifiers have shown persistently good performance on different textual data domains and have become widely used. This dissertation focuses on the classification of texts based on state-of-mind using data from two domains: suicidal ideation and political affiliation. Various approaches are explored, including the standard one that uses word statistics as features in combination with supervised machine learning methods, as well as one grounded in theories of human cognition -- specifically, conceptual association and spreading activation. An approach is proposed to capture a shared state-of-mind in the form of a lexical associative network built from word associations in a given corpus. To test this, a novel semi-supervised classifier called excess weight density (EWD) is proposed that computes how well the thoughts in a given text fit the trained lexical networks of a particular state-of-mind. Experiments conducted on nineteen corpora show that this method outperforms the k-nearest neighbors algorithm. The lexical networks are also used to generate features that are used alongside statistical features in supervised classifiers. Supervised classification performance is tested over several feature combinations using nine different methods, including random forests, support vector machines, various feed-forward neural networks, and a convolutional neural network (CNN) with different embedding layer initializations. The results offer several insights into text classification, such as the importance of working with heterogeneous feature spaces. Further, the features that are most important for supervised classification are analyzed, and the results show interesting trends, such as the success of lexical network features in capturing contextual and interpretable information.
Next, ensemble approaches are evaluated and found to improve the results. Finally, a longitudinal study assesses the changes in the political state-of-mind in the U.S. Congress from 1981 to 2016, showing results that are of interest from both technical and historical viewpoints. Overall, the work in this dissertation represents a systematic evaluation of the methods and choices available for state-of-mind classification in diverse domains, and leads to useful recommendations for such tasks. The methods studied -- including lexical networks and spreading activation -- can also be used for tasks beyond text classification, including text summarization, novelty detection, and text generation.
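As a rough illustration of the lexical associative network idea described in the abstract, the following Python sketch builds one word co-occurrence network per class and assigns a text to the class whose network gives the text's word pairs the highest average edge weight. This is not the dissertation's EWD classifier, whose definition is not given in this record. The function names (`tokenize`, `build_network`, `fit_score`, `classify`), the document-level co-occurrence window, the averaging score, and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, strip basic punctuation (a simplification)."""
    stripped = (w.strip(PUNCT) for w in text.lower().split())
    return [w for w in stripped if w]


def build_network(documents):
    """Hypothetical helper: weight each unordered word pair by the
    fraction of documents in which both words appear."""
    edges = Counter()
    for doc in documents:
        words = sorted(set(tokenize(doc)))
        edges.update(combinations(words, 2))
    n = max(len(documents), 1)
    return {pair: count / n for pair, count in edges.items()}


def fit_score(text, network):
    """Mean edge weight over all word pairs in the text; absent edges count as 0."""
    words = sorted(set(tokenize(text)))
    pairs = list(combinations(words, 2))
    if not pairs:
        return 0.0
    return sum(network.get(p, 0.0) for p in pairs) / len(pairs)


def classify(text, networks):
    """Return the label whose network yields the highest fit score."""
    return max(networks, key=lambda label: fit_score(text, networks[label]))


# Invented toy corpora, used only to exercise the functions above.
networks = {
    "A": build_network(["lower taxes help business", "business needs lower taxes"]),
    "B": build_network(["expand public healthcare", "public healthcare helps families"]),
}
label = classify("lower taxes for business", networks)
```

In this toy setup the query shares its word pairs only with the class "A" network, so `classify` picks "A". A real system would need a principled association measure and a decision rule such as the one the dissertation proposes.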
Committee
John Pestian, Ph.D. (Committee Chair)
Raj Bhatnagar, Ph.D. (Committee Member)
Ali Minai, Ph.D. (Committee Member)
Carla Purdy, Ph.D. (Committee Member)
Daniel Santel, Ph.D. (Committee Member)
Pages
377 p.
Subject Headings
Computer Science
Keywords
Machine Learning; Text Classification; Suicide; Party affiliation
Recommended Citations
APA Style (7th edition)
Bayram, U. (2019). State-of-Mind Classification From Unstructured Texts Using Statistical Features and Lexical Network Features [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563274174606657
MLA Style (8th edition)
Bayram, Ulya. State-of-Mind Classification From Unstructured Texts Using Statistical Features and Lexical Network Features. 2019. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563274174606657.
Chicago Manual of Style (17th edition)
Bayram, Ulya. "State-of-Mind Classification From Unstructured Texts Using Statistical Features and Lexical Network Features." Doctoral dissertation, University of Cincinnati, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563274174606657
Document number:
ucin1563274174606657
Download Count:
165
Copyright Info
© 2019, all rights reserved.
This open access ETD is published by University of Cincinnati and OhioLINK.