Ontology-guided Health Information Extraction, Organization, and Exploration

2014, Doctor of Philosophy, Case Western Reserve University, EECS - Computer and Information Sciences.
Electronic information in unstructured or semi-structured form has been generated steadily in health and healthcare for decades, and its volume has grown explosively since the recent adoption of electronic health records (EHRs). This textual information includes clinical notes recorded in hospitals and health-related information on the web, and it contains an extraordinary amount of underutilized biomedical knowledge. However, the proliferation of such data presents a myriad of challenges for information retrieval and access. Manually reviewing protected clinical documents to find patient cohorts of interest is time-consuming and cumbersome. Consumers, meanwhile, are overwhelmed by the ever-growing volume of publicly available health information on the Internet: traditional keyword-based search engines such as Google can return hundreds of thousands of links, of which only a few may be relevant. Effective querying and exploration of both protected and public health data therefore require new approaches to information extraction, organization, and exploration. This dissertation proposes an ontology-guided approach to health information extraction, organization, and exploration. The approach extracts key information from textual data, organizes it in structured formats, and provides interfaces for searching and exploring it effectively. It is applied to two independent but related domains: (1) extracting complex epilepsy phenotypes from narrative clinical discharge summaries to support effective querying of patient cohorts; and (2) organizing consumer health questions in NetWellness, an online non-profit community service providing high-quality health information, based on biomedical concepts extracted from those questions, to support effective consumer health information retrieval and exploration.
For (1), a prototype Epilepsy Data Extraction and Annotation (EpiDEA) system is developed for processing discharge summaries; it automatically extracts each patient's sex, age, epileptogenic zone, etiology, EEG pattern, current antiepileptic medication, and past antiepileptic medication. In addition, a system called Phenotype Extraction in Epilepsy (PEEP) is developed to extract complex epilepsy phenotypes and correlated anatomical locations from narrative discharge summaries and to store them as structured information. Both EpiDEA and PEEP use the Epilepsy and Seizure Ontology (EpSO) as the primary knowledge source for regular expression-based epilepsy named entity recognition. A parametric and dynamic faceted search interface (PaDyF) is developed for querying the extracted epilepsy data; PaDyF combines the benefits of faceted search, database queries, and ontological attributes and structures for exploring clinical patient data. Evaluations against manually created reference standards show that EpiDEA achieves an overall precision of 0.936, recall of 0.840, and F1-measure of 0.885. PEEP achieves a precision of 0.924, recall of 0.931, and F1-measure of 0.927 for extracting epilepsy phenotypes, and a precision of 0.852, recall of 0.859, and F1-measure of 0.856 for extracting correlated phenotypes and anatomical locations. These results demonstrate that EpiDEA is effective in extracting basic phenotypic characteristics and that PEEP is effective in extracting complex epilepsy phenotypes and correlated anatomical locations.

For (2), key biomedical concepts are extracted from health questions in NetWellness and used to categorize the questions into multiple topics. A new multi-topic assignment method is introduced that combines Formal Concept Analysis (FCA) with semantic annotation using the Unified Medical Language System (UMLS).
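The abstract does not spell out how FCA is combined with UMLS annotation, so the following is only a hedged illustration of the FCA side. It assumes questions are the objects and their assigned topics are the attributes of a formal context, and it enumerates formal concepts with the two standard derivation operators. The question IDs, topic names, and helper functions (`common_attributes`, `common_objects`, `formal_concepts`) are hypothetical, and the brute-force enumeration (exponential in the number of topics) is chosen for clarity, not taken from the dissertation.

```python
from itertools import combinations

# Hypothetical formal context: each question (object) maps to the set of
# topics (attributes) it was annotated with. Illustrative data only.
context = {
    "q1": {"diabetes", "diet"},
    "q2": {"diabetes", "medication"},
    "q3": {"diabetes", "diet", "exercise"},
    "q4": {"exercise"},
}


def common_attributes(objects, context, all_attrs):
    """A' : the attributes shared by every object in `objects`.

    By convention, the empty object set yields all attributes.
    """
    attrs = set(all_attrs)
    for obj in objects:
        attrs &= context[obj]
    return attrs


def common_objects(attrs, context):
    """B' : the objects that carry every attribute in `attrs`."""
    return {obj for obj, obj_attrs in context.items() if attrs <= obj_attrs}


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) by brute force.

    For every attribute subset B, (B', B'') is a formal concept; collecting
    them in a set removes duplicates. Exponential in the number of attributes.
    """
    all_attrs = set().union(*context.values())
    concepts = set()
    for k in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), k):
            extent = common_objects(set(subset), context)
            intent = common_attributes(extent, context, all_attrs)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    # Conjunctive filtering: questions tagged with both topics.
    print(sorted(common_objects({"diabetes", "diet"}, context)))  # ['q1', 'q3']
    # List concepts from most general (largest extent) to most specific.
    for extent, intent in sorted(formal_concepts(context), key=lambda c: -len(c[0])):
        print(sorted(extent), sorted(intent))
```

The operator `common_objects` returns the questions that carry every selected topic, which is one way to think about narrowing a collection by several topics at once; the source does not say whether its interface is implemented this way.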
Evaluation against a manually created reference standard shows that the multi-topic assignment method attains an example-based precision of 0.849, recall of 0.774, and F1-measure of 0.782. A novel Conjunctive Exploratory Navigation Interface (CENI) is developed for exploring NetWellness health questions; it presents health topics as dynamic, searchable menus that complement keyword-based search. The effectiveness of CENI is assessed through a comparative search-interface evaluation using crowdsourcing on Amazon Mechanical Turk (AMT), a new and valuable method for collecting user evaluation data. Compared with mainstream search modalities, CENI is favored by a nearly two-to-one margin over Google and other search methods.
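The multi-topic assignment figures are reported as example-based scores. Assuming the standard example-based definitions (each score computed per question, then averaged over questions), they can be sketched as below. The function name, the toy topic labels, and the treatment of empty label sets are assumptions for illustration, not the dissertation's evaluation code.

```python
def example_based_scores(gold, predicted):
    """Example-based precision, recall, and F1 for multi-label (multi-topic) assignment.

    `gold` and `predicted` are equal-length lists of label sets, one pair per
    example. Each score is computed per example, then averaged. As an assumed
    convention, an empty predicted (or gold) set contributes 0 to precision
    (or recall), and an example with both sets empty contributes 0 to F1.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and the same length")
    p_sum = r_sum = f_sum = 0.0
    for y, z in zip(gold, predicted):
        overlap = len(y & z)
        p_sum += overlap / len(z) if z else 0.0
        r_sum += overlap / len(y) if y else 0.0
        # Per-example F1 = 2|Y & Z| / (|Y| + |Z|).
        f_sum += 2 * overlap / (len(y) + len(z)) if (y or z) else 0.0
    n = len(gold)
    return p_sum / n, r_sum / n, f_sum / n


if __name__ == "__main__":
    # Hypothetical gold-standard and predicted topics for two questions.
    gold = [{"diabetes", "diet"}, {"epilepsy"}]
    predicted = [{"diabetes"}, {"epilepsy", "sleep"}]
    p, r, f = example_based_scores(gold, predicted)
    print(round(p, 3), round(r, 3), round(f, 3))  # 0.75 0.75 0.667
```

In this toy case the averaged F1 (0.667) falls below the harmonic mean of the averaged precision and recall (0.75), because F1 is averaged per example. That behavior is consistent with the reported example-based F1 of 0.782 sitting below the harmonic mean of 0.849 and 0.774 (about 0.810), although the abstract does not state the exact formulas used.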
Guo-Qiang Zhang (Advisor)
169 p.

Recommended Citations


  • APA Style (7th edition): Cui, L. (2014). Ontology-guided Health Information Extraction, Organization, and Exploration [Doctoral dissertation, Case Western Reserve University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=case1401709795

  • MLA Style (8th edition): Cui, Licong. Ontology-guided Health Information Extraction, Organization, and Exploration. 2014. Case Western Reserve University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=case1401709795.

  • Chicago Manual of Style (17th edition): Cui, Licong. "Ontology-guided Health Information Extraction, Organization, and Exploration." Doctoral dissertation, Case Western Reserve University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=case1401709795