Hierarchical Text Topic Modeling with Applications in Social Media-Enabled Cyber Maintenance Decision Analysis and Quality Hypothesis Generation


2017, Doctor of Philosophy, Ohio State University, Industrial and Systems Engineering.
Many decision problems are set in changing environments. For example, the optimal investment in cyber maintenance depends on whether there is evidence of an unusual vulnerability, such as “Heartbleed”, that is causing an especially high rate of incidents. Decision models therefore need timely information so that they can be updated and optimal policies generated for each decision period. Social media provides a streaming source of relevant information, but that information must be efficiently transformed into numbers to enable the needed updates. This dissertation first explores the use of social media as an observation source for timely decision-making. To efficiently generate the observations for Bayesian updates, the dissertation proposes a novel computational method, called K-means Latent Dirichlet Allocation (KLDA), for fitting an existing clustering model. The method is illustrated using a cyber security problem related to changing maintenance policies during periods of elevated risk. The dissertation also studies four text corpora with 100 replications and shows that KLDA is associated with significantly reduced computational times and more consistent model accuracy compared with collapsed Gibbs sampling.
Because social media is becoming more popular, researchers have begun applying text analytics models and tools to extract information from social media platforms. Many of these text analytics models are based on Latent Dirichlet Allocation (LDA), but they are often poor estimators of topic proportions for emerging topics. Therefore, the second part of the dissertation proposes a visual summarizing technique based on topic models, a point system, and Twitter feeds to support passive summarizing and sensemaking. The associated “importance score” point system is intended to mitigate this weakness of topic models. The proposed method is called the TWitter Importance Score Topic (TWIST) summarizing method. TWIST uses the topic proportions estimated for tweets and assigns importance points in order to present trending topics. It generates a chart showing the important and trending topics discussed over a given time period. The dissertation illustrates the methodology using two field case studies in cyber security.
Finally, the dissertation proposes a general framework to teach engineers and practitioners how to work with text data. As an extension of Exploratory Data Analysis (EDA) in quality improvement problems, Exploratory Text Data Analysis (ETDA) takes text as the input data, with the goal of extracting useful information from the text to explore potential problems and causal effects. This part of the dissertation presents a practical framework for ETDA in quality improvement projects with four major steps: pre-processing text data, text data processing and display, salient feature identification, and salient feature interpretation. Case studies are presented alongside the major steps, and the steps are discussed using the visualization techniques available in ETDA.
Theodore Allen (Advisor)
Steven MacEachern (Committee Member)
Cathy Xia (Committee Member)
Nena Couch (Other)
126 p.

Recommended Citations


  • SUI, Z. (2017). Hierarchical Text Topic Modeling with Applications in Social Media-Enabled Cyber Maintenance Decision Analysis and Quality Hypothesis Generation [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1499446404436637

    APA Style (7th edition)

  • SUI, ZHENHUAN. Hierarchical Text Topic Modeling with Applications in Social Media-Enabled Cyber Maintenance Decision Analysis and Quality Hypothesis Generation. 2017. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1499446404436637.

    MLA Style (8th edition)

  • SUI, ZHENHUAN. "Hierarchical Text Topic Modeling with Applications in Social Media-Enabled Cyber Maintenance Decision Analysis and Quality Hypothesis Generation." Doctoral dissertation, Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1499446404436637

    Chicago Manual of Style (17th edition)