Analysis and Modeling of the Structure of Semantic Dynamics in Texts

2017, MS, University of Cincinnati, Engineering and Applied Science: Electrical Engineering.
The analysis of texts has recently become a very active and important area of research because of the exponential growth in electronic documents and the need for their automatic analysis. In particular, it is important to extract meaning from documents for purposes such as classification, interpretation, and summarization. The standard approach has been to identify keywords and extract topics. The most widely used topic extraction methods, such as latent Dirichlet allocation (LDA), use a bag-of-words approach, where each word in the document is assumed to be chosen independently. Alternatively, there are Markov models – often based on n-grams – that look at the transition probabilities between words or groups of words in an attempt to extract temporal semantic patterns. However, the structure of thought is more complicated than either of these two models assumes. The research presented in this thesis is based on a framework that models thinking as a hierarchical itinerant dynamical process. In this view, a document (or speech) is a sequence of variable-length semantic blocks, each representing a single coherent thought, with transitions between blocks and intervening gaps of low semantic content. Importantly, the model looks at semantic coherence at the word level, sentence level, and block level. Analyzing this semantic structure for individual documents and large corpora of documents is useful for several reasons: 1) It helps identify general patterns and parameters of human thinking; 2) It allows the writing style of documents to be characterized; and 3) It potentially provides an automated way to uncover the deeper ideas underlying the document, including the structure of its argument. This thesis focuses on the first aspect.
Using several corpora of research papers from the International Joint Conference on Neural Networks over multiple years, the thesis examines the characteristics of semantic blocks and their transition statistics, building towards a hierarchical Markovian view of semantic composition. In particular, this is done by looking at the probability that successive sentences belong to the same block, to a different block, or to a gap. The analysis shows a remarkable consistency in these parameters across corpora separated by many years and involving different authors. This suggests the existence of certain standard temporal patterns in thinking. The results indicate that the mean duration of a single coherent thought is about 6 sentences, albeit with significant variation. While blocks of different ideas can follow each other without gaps, such sequences are typically not very long. The probabilities of transitioning from one block to the next depend slightly on a block’s position in this sequence, but are very consistent across corpora. The results are validated by using three different models of idea coherence, all of which lead to similar statistics.
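
The sentence-level statistics described above can be illustrated with a minimal Python sketch. It assumes each sentence has already been assigned either a block identifier or a gap marker. The abstract does not say how that assignment is made, so the label encoding, the `GAP` marker, and the function names `transition_stats` and `mean_block_length` are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from itertools import groupby

GAP = None  # hypothetical marker for a sentence of low semantic content


def transition_stats(labels):
    """Estimate what follows a sentence that lies inside a block.

    `labels` holds one entry per sentence: a block id, or GAP.
    Returns the relative frequency of the next sentence being in the
    same block, in a different block, or in a gap.
    """
    counts = Counter()
    for cur, nxt in zip(labels, labels[1:]):
        if cur is GAP:
            continue  # condition only on sentences inside a block
        if nxt is GAP:
            counts["gap"] += 1
        elif nxt == cur:
            counts["same_block"] += 1
        else:
            counts["different_block"] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def mean_block_length(labels):
    """Mean number of consecutive sentences per block, ignoring gaps."""
    runs = [len(list(group)) for key, group in groupby(labels) if key is not GAP]
    return sum(runs) / len(runs) if runs else 0.0


# Toy labelling (made up for illustration): three blocks and one gap sentence.
toy_labels = [0, 0, 0, 1, 1, GAP, 2, 2, 2, 2]
print(transition_stats(toy_labels))
print(mean_block_length(toy_labels))
```

Applied to a real corpus, the same counts could be pooled over all documents to compare transition probabilities and block lengths between corpora. This sketch covers only the counting step, not the coherence models used to find the blocks.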
Ali Minai, Ph.D. (Committee Chair)
Raj Bhatnagar, Ph.D. (Committee Member)
Carla Purdy, Ph.D. (Committee Member)
75 p.

Recommended Citations

  • Ren, Z. (2017). Analysis and Modeling of the Structure of Semantic Dynamics in Texts [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1512045439740177

    APA Style (7th edition)

  • Ren, Zhaowei. Analysis and Modeling of the Structure of Semantic Dynamics in Texts. 2017. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1512045439740177.

    MLA Style (8th edition)

  • Ren, Zhaowei. "Analysis and Modeling of the Structure of Semantic Dynamics in Texts." Master's thesis, University of Cincinnati, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1512045439740177

    Chicago Manual of Style (17th edition)