IONA: Intelligent Online News Analysis

2018, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
The analysis of news content has been a central focus for media scholars, political scientists, sociologists and historians. Traditionally, it has been performed using relatively small news archives collected over limited periods. However, in the last 20 years, following the creation of the World-Wide Web, a dramatic change has occurred in the reporting and dissemination of news. One result of this change is that an ever-growing archive of news is readily available as electronic text. This, in turn, is making it possible to analyze news on a large scale using methods developed in the fields of Web Intelligence, Data Mining and Machine Learning. The issues that news content analysis tries to address include identification of salient topics; summarization of stories; extraction of opinions; and characterization of news reports in terms of content, sentiment, bias, etc. These are also the motivating issues for this research. This dissertation describes a framework called IONA: Intelligent Online News Analysis. It is intended as a tool to accomplish four goals: 1) Extracting and visualizing important stories from real-time news streams; 2) Characterizing and comparing the cognitive/epistemic organization of all news in different media sources over the same time period; 3) Comparing the structure of specific stories from different media sources to characterize similarities, differences, and possible biases; and 4) Doing comparative analysis of how specific stories and the news streams from different media sources evolve over time in order to characterize the dynamics of news from each source. The IONA approach represents an innovative combination of methods from natural language processing, semantic analysis and complex networks. The identification of topics uses a novel algorithm that integrates Latent Dirichlet Allocation (LDA) with tagging using n-grams.
The resulting topics are used to extract coherent sets of news reports from large corpora of very short news stories, each of which is then used to construct a summary in the form of a weighted semantic network. In addition to elucidating the global semantic structure of the story, this network also enables the identification of its core components and meaningful sub-units (motifs), which can then be used in the characterization of style, bias, sentiment, etc. The complexity of stories is characterized both by a graphical analysis of the extracted semantic networks and by a graph-based statistical analysis of the identified story sets. This analysis builds on deep theories of semantic cognition and creativity. IONA differs from other methods in current use in several ways: 1) It uses very large corpora with a large number of very short news stories, which presents unique challenges and opportunities for analysis; 2) It does not rely on any prior tagging of news stories, though such tagging could be used to improve performance further; 3) It relies on unsupervised learning rather than supervised or semi-supervised learning, which makes it more broadly applicable independent of specific domains; 4) It goes beyond the simple bag-of-words approach to include some structural and semantic information in the analysis; and 5) It represents a single integrated platform that ranges from data collection to highly abstract functions such as structural and temporal representation of extracted information. News plays a critical role in shaping opinions and thus driving events that affect billions of people. Deep analysis of news in terms of content, influence and bias is, therefore, important not only for media analysts but also for policy-makers, political and geopolitical thinkers, intelligence operations, corporate entities and, of course, the general public. IONA is expected to make a significant contribution in this regard.
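
As a rough illustration of the "weighted semantic network" idea mentioned in the abstract, the following is a minimal sketch and not the dissertation's method. It assumes that an edge weight is simply the number of stories in which two words co-occur; the abstract does not say how IONA weights its edges. The stopword list, the sample stories, and the function names (`tokenize`, `cooccurrence_network`, `node_strength`) are all hypothetical.

```python
from collections import Counter
from itertools import combinations
import re

# Tiny illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "for", "is", "are"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cooccurrence_network(stories):
    """Map each word pair to the number of stories containing both words.

    Pairs are stored as alphabetically sorted tuples, so every undirected
    edge has exactly one key.
    """
    edges = Counter()
    for story in stories:
        vocab = sorted(set(tokenize(story)))
        edges.update(combinations(vocab, 2))
    return edges


def node_strength(edges):
    """Sum the weights of all edges touching each word."""
    strength = Counter()
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
    return strength


# Hypothetical short news stories, used only to exercise the sketch.
stories = [
    "Senate votes on the budget bill",
    "Budget bill stalls in the Senate",
    "Storm closes schools in the city",
]
network = cooccurrence_network(stories)
strength = node_strength(network)
```

In this toy corpus the words with the highest strength ("bill", "budget", "senate") form the densely connected part of the network, which loosely mirrors the notion of a story's core components; identifying motifs would need a proper graph library and is left out here.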
Ali Minai, Ph.D. (Committee Chair)
Raj Bhatnagar, Ph.D. (Committee Member)
Karen Davis, Ph.D. (Committee Member)
Carla Purdy, Ph.D. (Committee Member)
Anca Ralescu, Ph.D. (Committee Member)
193 p.

Recommended Citations

  • Doumit, S. S. (2018). IONA: Intelligent Online News Analysis [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1544001149153864

    APA Style (7th edition)

  • Doumit, Sarjoun. IONA: Intelligent Online News Analysis. 2018. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1544001149153864.

    MLA Style (8th edition)

  • Doumit, Sarjoun. "IONA: Intelligent Online News Analysis." Doctoral dissertation, University of Cincinnati, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1544001149153864

    Chicago Manual of Style (17th edition)