A Provenance-based Approach Towards Impact Assessment of Schema Changes in a Data Warehouse Environment

2017, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
Graph-based solutions have recently received significant attention for two reasons: (1) their ability to capture relationships as first-class elements in interconnected domains, and (2) their inherent resemblance to real-world scenarios such as social networks. In our work, we investigate the use of graph databases for two schema management challenges: (a) impact assessment of schema evolution in a data warehouse environment, and (b) schema mapping and integration. We leverage the explicit capture of relationships to address these challenges.
Data warehouses are schema-rich, multi-layered environments consisting of many inter-related artifacts. The artifacts include operational, reconciled, and warehouse schemas connected by ETL mappings, and a set of queries expressed against the schemas. If a user seeks to make a schema change at any level in the architecture, he or she may not be aware of the other artifacts potentially impacted by the change. Previous approaches to data warehouse schema evolution have focused on proposing algorithms for automatically propagating the impact of a change to the related artifacts. While these contributions ease the user's task by providing a programmatic way of adapting related artifacts, they do not provide support for detailing the potential impact. We focus on defining and implementing a graph-based model for impact assessment and explanation. Impact assessment involves identifying the artifacts that depend on the evolved artifact, either directly or transitively. The consequences of the change are revealed before the change is actually propagated. Our work also allows changes to all schema artifacts in a multi-layered data warehouse architecture, thus addressing multiple evolution problems under one framework. Existing contributions are restricted in that they do not address changes to all schema components in the warehouse architecture.
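The notion of direct versus transitive dependence can be illustrated with a small sketch. This is not the dissertation's model or implementation; it is a minimal breadth-first traversal over a hypothetical dependency map, and every artifact name in it (`operational.customer.email`, `etl.load_customer`, and so on) is an assumption made up for illustration.

```python
from collections import deque

# Hypothetical dependency edges (illustrative only):
# artifact -> artifacts that directly depend on it.
DEPENDENTS = {
    "operational.customer.email": ["etl.load_customer"],
    "etl.load_customer": ["warehouse.dim_customer.email"],
    "warehouse.dim_customer.email": ["query.monthly_report"],
    "operational.customer.phone": [],
}


def assess_impact(changed, dependents):
    """Return {artifact: hop_count} for every artifact reachable from `changed`.

    A hop count of 1 means a direct dependent; anything larger is transitive.
    """
    distance = {}
    queue = deque([(changed, 0)])
    while queue:
        node, hops = queue.popleft()
        for dep in dependents.get(node, []):
            # Skip artifacts already seen, and the changed artifact itself.
            if dep not in distance and dep != changed:
                distance[dep] = hops + 1
                queue.append((dep, hops + 1))
    return distance


impact = assess_impact("operational.customer.email", DEPENDENTS)
direct = sorted(a for a, h in impact.items() if h == 1)
transitive = sorted(a for a, h in impact.items() if h > 1)
print("direct:", direct)
print("transitive:", transitive)
```

Because the traversal only reads the graph, the impacted set can be reported to the user before any change is propagated, which is the ordering the abstract describes.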
We leverage provenance to facilitate the user's understanding of the identified impact. Along with presenting a list of artifacts that will potentially be impacted by the change, we provide a complete trace of how the evolved artifact and the impacted artifacts are related to each other. The schemas in our work follow the relational paradigm, ETL workflows are described using a leading commercial business tool, Pentaho, and the queries are expressed using SQL. We present our framework, illustrate the supporting conceptual model, detail the modeling challenges, and demonstrate the viability of our approach using a case study.
In the second domain of interest of our work, schema mapping and integration, we describe a system that supports schema integration based on graph databases. Our work first looks at leveraging a graph-based solution for schema mapping. Specifically, we illustrate how schemas expressed in relational and RDF models can be transformed to a property graph to provide an information-preserving, NoSQL-compliant standardization model for schemas expressed in heterogeneous models. We further extend the work by contributing a schema merging algorithm for property graphs. We consider some concrete examples from the literature to highlight how our framework supports integration over property graphs. We illustrate a modular framework that can be further extended and optimized to incorporate different schema mapping and merging algorithms.
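As a rough illustration of what transforming a relational schema to a property graph can look like, the sketch below turns tables and columns into labeled nodes and foreign keys into edges. The mapping rules, the node labels (`Table`, `Column`), the edge types (`HAS_COLUMN`, `REFERENCES`), and the sample schema are all assumptions chosen for this example; the dissertation's actual transformation may differ.

```python
# Hypothetical relational schema (illustrative only): table -> column names.
TABLES = {
    "customer": ["id", "name"],
    "orders": ["id", "customer_id", "total"],
}
# Foreign keys as (source table, source column, target table, target column).
FOREIGN_KEYS = [("orders", "customer_id", "customer", "id")]


def to_property_graph(tables, foreign_keys):
    """Map a relational schema to (nodes, edges) of a simple property graph.

    nodes: {node_id: properties}
    edges: list of (source_id, edge_type, target_id, properties)
    """
    nodes = {}
    edges = []
    for table, columns in tables.items():
        nodes[table] = {"label": "Table", "name": table}
        for col in columns:
            col_id = f"{table}.{col}"
            nodes[col_id] = {"label": "Column", "name": col}
            edges.append((table, "HAS_COLUMN", col_id, {}))
    for src_table, src_col, dst_table, dst_col in foreign_keys:
        # A foreign key becomes an explicit relationship between columns.
        edges.append(
            (f"{src_table}.{src_col}", "REFERENCES", f"{dst_table}.{dst_col}", {})
        )
    return nodes, edges


nodes, edges = to_property_graph(TABLES, FOREIGN_KEYS)
print(len(nodes), "nodes;", len(edges), "edges")
```

Keeping every table, column, and foreign key as its own node or edge is one simple way to avoid losing schema information in the mapping, at the cost of a larger graph.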
Karen Davis, Ph.D. (Committee Chair)
Raj Bhatnagar, Ph.D. (Committee Member)
Hsiang-Li Chiang, Ph.D. (Committee Member)
Ali Minai, Ph.D. (Committee Member)
Carla Purdy, Ph.D. (Committee Member)
388 p.

Recommended Citations

  • Aggarwal, D. (2017). A Provenance-based Approach Towards Impact Assessment of Schema Changes in a Data Warehouse Environment [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1510061814494495

    APA Style (7th edition)

  • Aggarwal, Dippy. A Provenance-based Approach Towards Impact Assessment of Schema Changes in a Data Warehouse Environment. 2017. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1510061814494495.

    MLA Style (8th edition)

  • Aggarwal, Dippy. "A Provenance-based Approach Towards Impact Assessment of Schema Changes in a Data Warehouse Environment." Doctoral dissertation, University of Cincinnati, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1510061814494495

    Chicago Manual of Style (17th edition)