Non-Lattice Based Ontology Quality Assurance

2019, Doctor of Philosophy, Case Western Reserve University, EECS - Computer and Information Sciences.
Biomedical ontologies and standardized terminologies play an important role in healthcare information management, extraction, and data integration. The quality of an ontology affects its usability. One quality issue is nonconformance to the lattice property, a generally applicable ontology design principle. Non-lattice structures are often indicative of anomalies in ontological systems and, as such, represent possible areas of focus for subsequent quality assurance work. Quality assurance of ontologies is an indispensable part of the terminology development cycle. This dissertation presents a non-lattice based ontology quality assurance workflow, along with the approaches, algorithms, and applications involved. The general steps of non-lattice based ontology quality assurance are: (1) extracting non-lattice fragments; (2) detecting potential defects and proposing remediation suggestions; and (3) reviewing and validating the suggested remediations. For (1), a general MapReduce pipeline called MaPLE (MapReduce Pipeline for Lattice-based Evaluation) is developed for extracting non-lattice fragments in large partially ordered sets. Using MaPLE on a 30-node local Hadoop cloud, we systematically extracted non-lattice fragments in 8 SNOMED CT versions from 2009 to 2014, with an average total computing time of less than 3 hours per version. Compared with previous work, which took about 3 months, MaPLE makes it feasible not only to perform exhaustive structural analysis of large ontological hierarchies but also to systematically track structural changes between versions. Our change analysis showed that the average change rates of the non-lattice pairs are up to 38.6 times higher than the change rates of the background structure (concept nodes). For (2), two methods, NEO and Spark-MCA, are proposed.
NEO is a systematic structural approach that embeds FMA fragments into SNOMED CT's Body Structure hierarchy in order to understand the structural disparity between the subsumption relationships of the two: the is-a relation in FMA has a tree structure, whereas the corresponding relation in Body Structure is not even a lattice. Equivalent concepts in FMA and SNOMED CT are identified using UMLS mappings. These equivalent concepts are used as seeds to generate FMA fragments and embed them into the corresponding SNOMED CT non-lattice fragments. After 8,428 equivalent concepts were identified between the more than 30,000 concepts in Body Structure and the more than 83,000 concepts in FMA using UMLS concept mapping, 2,117 (27%) shared is-a relations were found. Among Body Structure's 90,465 non-lattice fragments, 65,968 (73%) contained one or more is-a relations that are in SNOMED CT but not in FMA, even though the source and target concepts of those relations have equivalents in FMA. This suggests that SNOMED CT may be more liberal in classifying a relation as is-a, a potential explanation for the fragments not conforming to the lattice property. Spark-MCA is a scalable approach for evaluating the semantic completeness of large ontologies such as SNOMED CT. SNOMED CT content is formulated as an FCA-based formal context, in which SNOMED CT concepts are used for extents and their attributes are used for intents. Spark-MCA was applied to the 201403 US edition of SNOMED CT to exhaustively compute all formal concepts and subconcept relationships, which took about 2 hours with 96 processors on an Amazon Web Services cluster. A total of 799,868 formal concepts were found, 500,583 of which are not contained in the 201403 release.
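To make the formal-context idea concrete, the following is a minimal single-machine sketch of formal concept analysis on a toy context. It is not the Spark-MCA algorithm, which is distributed; it uses a naive intersection-closure over object attribute sets instead. The function name `formal_concepts`, the three example objects, and their attribute strings are all hypothetical and are not taken from SNOMED CT.

```python
def formal_concepts(context):
    """Return all formal concepts of a small formal context.

    context maps each object name to the set of attributes it has.
    Each concept is an (extent, intent) pair of frozensets.
    Naive illustration only; not the distributed Spark-MCA method.
    """
    # The intent of the empty extent is the full attribute set.
    all_attrs = frozenset().union(*context.values()) if context else frozenset()
    intents = {all_attrs}
    # Every intent is an intersection of object attribute sets,
    # so close the collection under intersection with each object.
    for attrs in context.values():
        intents |= {intent & attrs for intent in intents}
    concepts = set()
    for intent in intents:
        # The extent is every object that carries all attributes of the intent.
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        concepts.add((extent, intent))
    return concepts


# Hypothetical toy context (names are illustrative, not real SNOMED CT content).
toy_context = {
    "femur fracture": {"site=femur", "morphology=fracture"},
    "tibia fracture": {"site=tibia", "morphology=fracture"},
    "femur infection": {"site=femur", "morphology=infection"},
}

if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(toy_context), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

In this toy context, the pair whose intent is only `morphology=fracture` groups the two fracture objects but does not correspond to any named object. Under the assumptions above, that is the kind of formal concept that could be read as a candidate for a missing concept.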
These concepts were compared with the cumulative addition of 22,687 concepts from 5 "delta" files spanning the 201403 release to the 201609 release. A total of 3,231 concepts matched between those suggested by FCA and those cumulatively added by the SNOMED CT Editorial Panel. This result provides evidence that the Spark-MCA approach could be helpful for enhancing the semantic completeness of SNOMED CT. For (3), a feature-rich, web-based interactive graph-visualization engine called WINS is presented to support non-lattice based quality assurance work on SNOMED CT. A facet-based interface is designed for easily querying desired non-lattice subgraphs. MongoDB is used to handle the large sets of concepts, relationships, and subgraphs and the complex query requirements. An interactive visualization interface is built with D3.js. A total of 14 versions of the SNOMED CT US edition, from the March 2012 version to the September 2018 version, with about 170,000 subgraphs in each version, were extracted and imported into WINS. Two non-lattice based ontology quality assurance (OQA) studies are also described to demonstrate the important role of WINS in analyzing and reviewing non-lattice subgraphs.
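The lattice property that the whole workflow rests on can be illustrated with a small brute-force sketch. It assumes the commonly used definition that a pair of concepts is a non-lattice pair when it has more than one maximal common descendant; the abstract does not spell this definition out, so treat it as an assumption. The helper names `descendants` and `non_lattice_pairs` and the toy hierarchy are hypothetical, and the quadratic pairwise loop is only workable for tiny inputs, unlike the MapReduce pipeline described above.

```python
from itertools import combinations


def descendants(children, node):
    """Return all strict descendants of node using an iterative DFS."""
    seen = set()
    stack = list(children.get(node, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(children.get(cur, ()))
    return seen


def non_lattice_pairs(children):
    """Map each non-lattice pair to its set of maximal common descendants.

    children maps a concept to its direct is-a children (an acyclic hierarchy).
    Assumed definition: a pair is non-lattice if it has more than one
    maximal common descendant.
    """
    nodes = set(children) | {c for kids in children.values() for c in kids}
    desc = {n: descendants(children, n) for n in nodes}
    result = {}
    for a, b in combinations(sorted(nodes), 2):
        # Comparable pairs always have a unique greatest lower bound.
        if a in desc[b] or b in desc[a]:
            continue
        common = desc[a] & desc[b]
        # Keep only common descendants that lie below no other common descendant.
        maximal = {c for c in common if not any(c in desc[d] for d in common)}
        if len(maximal) > 1:
            result[(a, b)] = maximal
    return result


# Hypothetical toy hierarchy: A and B share two incomparable children, C and D.
toy_hierarchy = {"root": ["A", "B"], "A": ["C", "D"], "B": ["C", "D"]}

if __name__ == "__main__":
    print(non_lattice_pairs(toy_hierarchy))
```

In the toy hierarchy, the pair (A, B) is reported because C and D are both maximal common descendants. A non-lattice fragment in the sense of step (1) would then be built around such a pair and its maximal common descendants.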
Guo-Qiang Zhang (Advisor)
Kenneth Loparo (Committee Chair)
Xu Rong (Committee Member)
Li Pan (Committee Member)
145 p.

Recommended Citations

  • Zhu, W. (2019). Non-Lattice Based Ontology Quality Assurance [Doctoral dissertation, Case Western Reserve University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=case1558509364811856

    APA Style (7th edition)

  • Zhu, Wei. Non-Lattice Based Ontology Quality Assurance. 2019. Case Western Reserve University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=case1558509364811856.

    MLA Style (8th edition)

  • Zhu, Wei. "Non-Lattice Based Ontology Quality Assurance." Doctoral dissertation, Case Western Reserve University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=case1558509364811856

    Chicago Manual of Style (17th edition)