Semantic Provenance: Modeling, Querying, and Application in Scientific Discovery

Sahoo, Satya Sanket

2010, Doctor of Philosophy (PhD), Wright State University, Computer Science and Engineering PhD.

Provenance metadata, which describes the history or lineage of an entity, is essential for ensuring data quality, verifying the correctness of process execution, and computing trust values. Traditionally, provenance management issues have been dealt with in the context of workflow or relational database systems. However, existing provenance systems are inadequate to address the requirements of an emerging set of applications in the new eScience or Cyberinfrastructure paradigm and the Semantic Web. Provenance in these applications incorporates complex domain semantics on a large scale with a variety of uses, including accurate interpretation by software agents, trustworthy data integration, reproducibility, attribution for commercial or legal applications, and trust computation. In this dissertation, we introduce the notion of “semantic provenance” to address these requirements for eScience and Semantic Web applications.

In addition, we describe a framework for managing semantic provenance that addresses three issues: (a) provenance representation, (b) query and analysis, and (c) scalable implementation. First, we introduce a foundational model of provenance called Provenir to serve as an upper-level reference ontology to facilitate provenance interoperability. Second, we define a classification scheme for provenance queries based on the query characteristics and use this scheme to define a set of specialized provenance query operators. Third, we describe the implementation of a highly scalable query engine to support the provenance query operators, which uses a new class of materialized views based on the Provenir ontology, called Materialized Provenance Views (MPV), for query optimization.

We also define a novel provenance tracking approach called Provenance Context Entity (PaCE) for the Resource Description Framework (RDF) model used in Semantic Web applications. PaCE, defined in terms of the Provenir ontology, is a more effective and scalable approach to RDF provenance tracking than the currently used RDF reification vocabulary. Finally, we describe the application of the semantic provenance framework in biomedical and oceanography research projects.
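To make the comparison with RDF reification concrete, the following minimal sketch contrasts the standard reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) with a context-specific-entity style of provenance tracking. It is only an illustration under stated assumptions: the ex: names, the ex:derivedFrom property, and the contextualize function are hypothetical, and the second function is a loose rendering of the general idea of provenance-aware entities, not the PaCE definition given in the dissertation.

```python
# Triples are plain (subject, predicate, object) tuples.
# All "ex:" identifiers are hypothetical and used for illustration only.

def reify(triple, statement_id, source):
    """Standard RDF reification: describe one triple through an rdf:Statement node."""
    s, p, o = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
        # Hypothetical provenance property attached to the statement node.
        (statement_id, "ex:derivedFrom", source),
    ]

def contextualize(triple, source):
    """Assumed sketch: mint source-specific entities and link each to its source."""
    s, p, o = triple
    suffix = source.split(":")[-1]
    s_ctx = f"{s}_{suffix}"
    o_ctx = f"{o}_{suffix}"
    return [
        (s_ctx, p, o_ctx),
        (s_ctx, "ex:derivedFrom", source),
        (o_ctx, "ex:derivedFrom", source),
    ]

fact = ("ex:geneA", "ex:interactsWith", "ex:geneB")
reified = reify(fact, "ex:stmt1", "ex:PubMed_123")
contextual = contextualize(fact, "ex:PubMed_123")
print(len(reified), len(contextual))
```

In this toy setting the reified form needs a separate statement node and four bookkeeping triples before any provenance is attached, while the context-style form keeps the assertion itself as an ordinary triple. Whether such savings hold at scale depends on the data and is the subject of the dissertation's evaluation, not of this sketch.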

Amit Sheth, PhD (Advisor)
Krishnaprasad Thirunarayan, PhD (Committee Member)
Michael Raymer, PhD (Committee Member)
Nicholas Reo, PhD (Committee Member)
Olivier Bodenreider, PhD (Committee Member)
William York, PhD (Committee Member)
130 p.

Recommended Citations

  • Sahoo, S. S. (2010). Semantic Provenance: Modeling, Querying, and Application in Scientific Discovery [Doctoral dissertation, Wright State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=wright1282847715

    APA Style (7th edition)

  • Sahoo, Satya. Semantic Provenance: Modeling, Querying, and Application in Scientific Discovery. 2010. Wright State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=wright1282847715.

    MLA Style (8th edition)

  • Sahoo, Satya. "Semantic Provenance: Modeling, Querying, and Application in Scientific Discovery." Doctoral dissertation, Wright State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=wright1282847715

    Chicago Manual of Style (17th edition)