An Integrated Framework of Text and Visual Analytics to Facilitate Information Retrieval towards Biomedical Literature

2018, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Digitized scientific literature, as a special type of text article, is considered a valuable knowledge repository in widespread academic and practical settings. Biomedical literature in particular has played an important role in supporting evidence-based medicine and promoting quality healthcare. Given an information need such as a patient problem, information retrieval for biomedical literature focuses on identifying highly relevant articles to support up-to-date knowledge synthesis and reliable decision making. In particular, high recall, high precision, and human involvement are expected of rigorous information retrieval in healthcare. Although these critical information needs demand high effectiveness and efficiency, information overload from the large and heterogeneous body of biomedical literature makes both difficult to achieve. In this dissertation, we propose an integrated and generalizable framework of text and visual analytics to facilitate the significant domain application of biomedical literature retrieval. We focus on the unmet and most challenging aspect of identifying highly relevant articles from a text corpus, which is typically an article collection obtained via exhaustive literature search. We convert extensive biomedical articles into effective representations that encode underlying article meanings and indicate article relevance, and we promote advantageous visualizations to exploit and explore these representations so that humans can take part not only in task accomplishment but also in knowledge discovery.
We first implement text analytics to generate machine-understandable article features and representations, and improve their effectiveness with multiple knowledge and computational resources. Considering the special format of biomedical literature, we start by investigating the fundamental lexical feature space, which consists of diverse article elements, and examine the usefulness of these elements in predicting article relevance. We then proceed to semantic analysis of article titles and abstracts, the most informative elements. We develop an ontology-based semantic method that exploits gold-standard domain knowledge in UMLS ontologies, and build a concept modelling process to represent articles with optimized and enriched UMLS concepts. We also embrace the unprecedented computational power of neural networks and develop a corpus-based semantic method with a neural document embedding model. This model is trained on multiple tasks not only to capture context semantics but also to integrate task specifications with minimal supervision. The effectiveness of our approaches is demonstrated through a downstream application of active article recommendation. Our approaches are also affordable and generalizable in the biomedical and clinical community.
We then implement visual analytics to exploit and explore the established article representations in a human-involving manner. We start with a proof of concept showing that 2D visualizations of article representations (similarities) can reveal visual patterns that are beneficial to biomedical literature retrieval. To promote effective visualizations, we implement multiple visualization schemes, including sparsified article networks and article maps, and propose a new network sparsification scheme that preserves important article relationships and results in favorable 2D embeddings (placements) of articles. Furthermore, we expose the visualizations to real-world settings where scalability, interpretability, and interactivity are expected. To this end, we propose a visual analytics system that is built upon effective visualizations and equipped with interactive features to meet the visual analytics mantra. The system is also extensible to assist in visual evaluation and interpretation of text analytics results, such as semantic article representations learned with neural embedding models, to overcome their black-box nature and gain insights into underlying mechanisms such as semantic properties. We use experiments and use cases to demonstrate the usefulness of our visualizations and the visual analytics system in expediting biomedical literature retrieval.
Alan Ritter, Ph.D. (Advisor)
Po-Yin Yen, Ph.D. (Advisor)
Raghu Machiraju, Ph.D. (Committee Member)
231 p.

Recommended Citations

  • Ji, X. (2018). An Integrated Framework of Text and Visual Analytics to Facilitate Information Retrieval towards Biomedical Literature [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524199589980214

    APA Style (7th edition)

  • Ji, Xiaonan. An Integrated Framework of Text and Visual Analytics to Facilitate Information Retrieval towards Biomedical Literature. 2018. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1524199589980214.

    MLA Style (8th edition)

  • Ji, Xiaonan. "An Integrated Framework of Text and Visual Analytics to Facilitate Information Retrieval towards Biomedical Literature." Doctoral dissertation, Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524199589980214

    Chicago Manual of Style (17th edition)