Files
27973.pdf (7.42 MB)
A Provenance-based Approach Towards Impact Assessment of Schema Changes in a Data Warehouse Environment
Author Info
Aggarwal, Dippy
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=ucin1510061814494495
Abstract Details
Year and Degree
2017, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
Abstract
Graph-based solutions have recently received significant attention for two reasons: (1) their ability to capture relationships as first-class elements in interconnected domains, and (2) their inherent resemblance to real-world scenarios such as social networks. In our work, we investigate the use of graph databases for two schema management challenges: (a) impact assessment of schema evolution in a data warehouse environment, and (b) schema mapping and integration. We leverage the explicit capture of relationships to address these challenges. A data warehouse is a schema-rich, multi-layered environment consisting of many inter-related artifacts. The artifacts include operational, reconciled, and warehouse schemas connected by ETL mappings, and a set of queries expressed against the schemas. If a user seeks to make a schema change at any level in the architecture, he or she may not be aware of the other artifacts potentially impacted by the change. Previous approaches to data warehouse schema evolution have focused on proposing algorithms for propagating the impact of the change to the related artifacts in an automated manner. While these contributions ease the task for the user by providing a programmatic way of adapting related artifacts, they do not provide support for detailing the potential impact. We focus on defining and implementing a graph-based model for impact assessment and explanation. Impact assessment involves identifying the artifacts that depend on the evolved artifact either directly or transitively. The consequences of the change are revealed before the change is actually propagated. Our work also allows changes to all schema artifacts in a multi-layered data warehouse architecture, thus addressing multiple evolution problems under one framework. Existing contributions are restricted in that they do not address changes to all schema components in the warehouse architecture.
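To make the idea of direct and transitive dependence concrete, the following is a minimal sketch, not the dissertation's implementation. It assumes a hypothetical in-memory dependency map (`DEPENDENTS`) with invented artifact names, and uses a breadth-first traversal to collect every artifact reachable from a changed one, recording the path that reaches it so the result can be explained rather than only listed.

```python
from collections import deque

# Hypothetical dependency edges: artifact -> artifacts that depend on it.
# All names are invented for illustration only.
DEPENDENTS = {
    "ops.customer.name": ["etl.load_customer"],
    "etl.load_customer": ["dw.dim_customer.name"],
    "dw.dim_customer.name": ["query.top_customers"],
    "ops.order.total": ["etl.load_sales"],
}


def assess_impact(changed, dependents):
    """Map each impacted artifact to the path leading to it from `changed`."""
    traces = {}
    queue = deque([[changed]])
    while queue:
        path = queue.popleft()
        for dep in dependents.get(path[-1], []):
            # Skip artifacts already reached, and guard against cycles.
            if dep not in traces and dep != changed:
                traces[dep] = path + [dep]
                queue.append(traces[dep])
    return traces


impact = assess_impact("ops.customer.name", DEPENDENTS)
for artifact, trace in impact.items():
    print(artifact, "<-", " -> ".join(trace))
```

Nothing is modified during the traversal, which mirrors the goal of revealing consequences before a change is propagated. In a graph database the same question would typically be posed as a path query rather than hand-written traversal code.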
We leverage provenance to facilitate the user's understanding of the identified impact. Along with presenting a list of artifacts that will potentially be impacted by the change, we provide a complete trace of how the evolved artifact and the impacted artifacts are related to each other. The schemas in our work follow the relational paradigm, ETL workflows are described using a leading commercial business tool, Pentaho, and the queries are expressed using SQL. We present our framework, illustrate the supporting conceptual model, detail the modeling challenges, and demonstrate the viability of our approach using a case study.
In the second domain of interest, schema mapping and integration, we describe a system that supports schema integration based on graph databases. Our work first looks at leveraging a graph-based solution for schema mapping. Specifically, we illustrate how schemas expressed in relational and RDF models can be transformed to a property graph to provide an information-preserving, NoSQL-compliant standardization model for schemas expressed in heterogeneous models. We further extend the work by contributing a schema merging algorithm for property graphs. We consider some concrete examples from the literature to highlight how our framework supports integration over property graphs. We illustrate a modular framework that can be further extended and optimized to incorporate different schema mapping and merging algorithms.
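As a rough illustration of what transforming a relational schema to a property graph can look like, here is a minimal sketch under stated assumptions. The schema dictionary, the node labels (`Table`, `Column`), and the edge types (`HAS_COLUMN`, `REFERENCES`) are all hypothetical choices made for this example; the dissertation's actual mapping rules are not reproduced here.

```python
# Hypothetical relational schema description, invented for illustration.
SCHEMA = {
    "customer": {
        "columns": {"id": "INT", "name": "TEXT"},
        "pk": ["id"],
        "fks": {},
    },
    "orders": {
        "columns": {"id": "INT", "customer_id": "INT"},
        "pk": ["id"],
        "fks": {"customer_id": ("customer", "id")},
    },
}


def to_property_graph(schema):
    """Return (nodes, edges): nodes carry properties, edges are typed triples."""
    nodes, edges = {}, []
    for table, spec in schema.items():
        nodes[table] = {"label": "Table", "name": table}
        for col, dtype in spec["columns"].items():
            col_id = f"{table}.{col}"
            nodes[col_id] = {
                "label": "Column",
                "name": col,
                "type": dtype,
                "primary_key": col in spec["pk"],
            }
            edges.append((table, "HAS_COLUMN", col_id))
    # Foreign keys become explicit edges once every column node exists.
    for table, spec in schema.items():
        for col, (ref_table, ref_col) in spec["fks"].items():
            edges.append((f"{table}.{col}", "REFERENCES", f"{ref_table}.{ref_col}"))
    return nodes, edges


nodes, edges = to_property_graph(SCHEMA)
```

Keeping data types and key constraints as node properties is one way to aim for an information-preserving mapping: the original relational definition can, in principle, be rebuilt from the graph.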
Committee
Karen Davis, Ph.D. (Committee Chair)
Raj Bhatnagar, Ph.D. (Committee Member)
Hsiang-Li Chiang, Ph.D. (Committee Member)
Ali Minai, Ph.D. (Committee Member)
Carla Purdy, Ph.D. (Committee Member)
Pages
388 p.
Subject Headings
Computer Science
Keywords
data warehouses; Impact assessment; provenance; schema evolution; pentaho; etl
Recommended Citations
Aggarwal, D. (2017). A Provenance-based Approach Towards Impact Assessment of Schema Changes in a Data Warehouse Environment [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1510061814494495
APA Style (7th edition)
Aggarwal, Dippy. A Provenance-based Approach Towards Impact Assessment of Schema Changes in a Data Warehouse Environment. 2017. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1510061814494495.
MLA Style (8th edition)
Aggarwal, Dippy. "A Provenance-based Approach Towards Impact Assessment of Schema Changes in a Data Warehouse Environment." Doctoral dissertation, University of Cincinnati, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1510061814494495
Chicago Manual of Style (17th edition)
Document number: ucin1510061814494495
Download Count: 7,433
Copyright Info
© 2017, all rights reserved.
This open access ETD is published by University of Cincinnati and OhioLINK.