Identifying Patterns of Epistemic Organization through Network-Based Analysis of Text Corpora

Ghanem, Amer G.

2015, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
The growth of on-line textual content has exploded in recent years, creating truly massive text corpora. As the quantity of text available on-line increases, professionals from different industries such as marketing and politics are realizing the importance of extracting useful information and insights from this treasure trove of data. It is also clear, however, that doing so requires methods that go beyond those developed for classical data processing or even natural language processing. In particular, there is a great need for efficient methods that can make sense of the semantic content of this data and allow new knowledge to be inferred from it. The research in this dissertation describes a new method for identifying latent structures (topics) in texts through the application of community extraction techniques on associative networks of words. Since humans represent knowledge in terms of associations, it is asserted that deriving topics from associative networks represents a more cognitively meaningful approach than using purely statistical patterns. The topic identification method proposed in this thesis is called Topic Extraction through Partitioning of Lexical Associative Networks (TExPLAN). It begins by constructing an associative network of words in which the strength of the association between two words indicates the frequency of their co-occurrence in documents. Once the word network is constructed, the algorithm proceeds in two stages. In the first stage, the word network is partitioned using a community extraction method to obtain disjoint seed topics. The second stage of TExPLAN uses the connectivity of words across the boundaries of seed topics to assign a relevance measure to each word in each topic, thus generating a set of topics where each one covers all the words in the vocabulary, as is the case with LDA.
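The pipeline described above (co-occurrence network, seed topics, per-word relevance) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the abstract does not name the community extraction method or the relevance formula, so the sketch assumes a simple stand-in for each. Seed topics are taken as connected components over edges that meet a hypothetical `min_weight` threshold, and the relevance of a word to a topic is assumed to be the total weight of its edges into that topic's seed words. All function names and the toy documents are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def build_network(docs):
    """Edge weight = number of documents in which both words occur."""
    weights = defaultdict(int)
    for doc in docs:
        for u, v in combinations(sorted(set(doc)), 2):
            weights[(u, v)] += 1
    return dict(weights)


def seed_topics(weights, min_weight):
    """Stage 1 stand-in (assumption): disjoint seed topics are the connected
    components of the network restricted to edges with weight >= min_weight."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (u, v), w in weights.items():
        root_u, root_v = find(u), find(v)
        if w >= min_weight:
            parent[root_u] = root_v
    groups = defaultdict(set)
    for word in list(parent):
        groups[find(word)].add(word)
    return sorted(groups.values(), key=lambda g: sorted(g))


def relevance(weights, seeds):
    """Stage 2 stand-in (assumption): score every vocabulary word in every
    topic by the total weight of its edges to that topic's seed words."""
    vocab = sorted({word for edge in weights for word in edge})
    scores = []
    for seed in seeds:
        topic = {}
        for word in vocab:
            topic[word] = sum(
                w for (u, v), w in weights.items()
                if (u == word and v in seed) or (v == word and u in seed)
            )
        scores.append(topic)
    return scores


# Toy corpus (made up for illustration): each document is a list of words.
docs = [
    ["graph", "network", "community"],
    ["graph", "network", "community"],
    ["topic", "word", "corpus"],
    ["topic", "word", "corpus"],
    ["network", "topic"],
]
weights = build_network(docs)
seeds = seed_topics(weights, min_weight=2)
scores = relevance(weights, seeds)
```

In this toy example the word "network" belongs to the first seed topic but still receives a nonzero score in the second one through its single co-occurrence with "topic", which is the sense in which every topic ends up covering the whole vocabulary.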
The topics extracted by TExPLAN are used to define an epistemic metric space in which epistemic entities such as words, texts, documents, collections of documents, etc. can be embedded and compared. Once the dimensions are defined, the entities are visualized in two-dimensional space using multidimensional scaling. Because of its generality, the different types of entities can be analyzed jointly in the epistemic space. For this part of the thesis, we demonstrate the capabilities of the approach by applying it to the DBLP dataset, identifying similar conferences based on their locations in the epistemic space and deriving areas of interest associated with each conference. We are also able to analyze the epistemic diversity of conferences and determine which ones tend to attract more diverse authors and publications. Another part of the analysis focuses on authors and their participation in conferences. We define prominent status and answer questions about authors that have this status. We also look at the different ways an author can become prominent, and tie that to their epistemic diversity. Finally, we look at prominent authors who tend to publish documents that are relatively far from the mainstream of the conference in which they were published, and identify authors who may potentially become prominent in the future.
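As a rough illustration of the embedding and visualization step, the sketch below places a few entities in a topic space and projects them to two dimensions with classical multidimensional scaling. It is a sketch under stated assumptions, not the thesis's procedure: the abstract does not specify the distance measure or the MDS variant, so Euclidean distance and classical (Torgerson) MDS are assumed here, and the entity names and topic-relevance vectors are made up.

```python
import numpy as np


def classical_mds(dist, dims=2):
    """Classical MDS: double-center the squared distances, then use the
    leading eigenvectors of the resulting Gram matrix as coordinates."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dims]      # indices of the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical entities, each described by its relevance to three topics.
entities = {
    "conf_A": [0.8, 0.1, 0.1],
    "conf_B": [0.7, 0.2, 0.1],
    "conf_C": [0.1, 0.1, 0.8],
}
names = list(entities)
vectors = np.array([entities[name] for name in names])

# Pairwise Euclidean distances in topic space (assumed metric).
dist = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=-1)

coords = classical_mds(dist, dims=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

Because words, documents, and conferences can all be described by the same kind of topic vector, the same function could place different entity types in one plot, which matches the joint analysis the abstract describes.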
Ali Minai, Ph.D. (Committee Chair)
Raj Bhatnagar, Ph.D. (Committee Member)
Karen Davis, Ph.D. (Committee Member)
Carla Purdy, Ph.D. (Committee Member)
James Uber, Ph.D. (Committee Member)
286 p.

Recommended Citations

  • Ghanem, A. G. (2015). Identifying Patterns of Epistemic Organization through Network-Based Analysis of Text Corpora [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1448274706

    APA Style (7th edition)

  • Ghanem, Amer. Identifying Patterns of Epistemic Organization through Network-Based Analysis of Text Corpora. 2015. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1448274706.

    MLA Style (8th edition)

  • Ghanem, Amer. "Identifying Patterns of Epistemic Organization through Network-Based Analysis of Text Corpora." Doctoral dissertation, University of Cincinnati, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1448274706

    Chicago Manual of Style (17th edition)