Ontology based Querying and Integration of Heterogeneous Flat Files

Dinakar, Rohit

2010, Master of Science, Ohio State University, Computer Science and Engineering.

In scientific domains, most of the measurements collected during observation periods are stored in flat files. In many cases, especially when scientists from different fields come together to draw comprehensive conclusions, file formats vary from one group to another. Integrating, querying, and retrieving data from such heterogeneous files is challenging, and semantic interoperability is essential to harmonize these datasets.

In this thesis, we describe an ontology-based system that parses, summarizes, represents, and integrates heterogeneous data stored as flat files. The testbed dataset comes from the Episodic Events Great Lakes Experiment (EEGLE) project, which collected over 500 MB of data in more than 1,500 objects. Existing work on querying hydrological data relies on relational databases and provides no way to query within the flat files themselves. Efficient approaches are therefore needed that avoid the overhead of relational databases while retaining the flexibility and ease of querying they offer.

We develop an intuitive, ontology-based approach to integrate and query the semi-structured data in the flat files. Data crawled from the flat files is represented in XML using resource scripts, which give it a structure and schema. We then use the Protégé-OWL editor to create ontologies, with rules, that semantically represent the observed data. The ontologies are mapped to the XML data to generate records similar to relational database records. Finally, these mapped records can be queried through a custom-built interface to obtain the desired results.
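The crawl-to-XML and mapping steps can be sketched in miniature. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes hypothetical whitespace-delimited flat files whose first line is a header, uses a hand-written synonym dictionary (`CONCEPT_MAP`, hypothetical) as a stand-in for the ontology mapping, and relies on Python's standard `xml.etree.ElementTree` to emit one `<record>` element per data row.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the ontology: maps the column names used by
# different groups' files onto one shared concept name.
CONCEPT_MAP = {
    "temp": "WaterTemperature",
    "water_temp_c": "WaterTemperature",
    "depth": "Depth",
    "depth_m": "Depth",
    "stn": "Station",
    "station_id": "Station",
}


def flat_file_to_xml(text: str, source: str) -> ET.Element:
    """Parse a whitespace-delimited flat file (header line + rows) into XML.

    Column names are normalized through CONCEPT_MAP so that records from
    heterogeneous files share one schema; unknown columns keep their name.
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    concepts = [CONCEPT_MAP.get(col.lower(), col) for col in header]
    root = ET.Element("dataset", source=source)
    for row in lines[1:]:
        record = ET.SubElement(root, "record")
        for concept, value in zip(concepts, row.split()):
            ET.SubElement(record, concept).text = value
    return root


def xml_to_records(root: ET.Element) -> list[dict[str, str]]:
    """Flatten <record> elements into dicts, similar to relational rows."""
    return [{child.tag: child.text for child in rec} for rec in root.iter("record")]


if __name__ == "__main__":
    # Two hypothetical files with different column names and orders.
    file_a = "stn temp depth\nS1 4.2 10\nS2 5.0 20\n"
    file_b = "station_id depth_m water_temp_c\nS3 15 4.8\n"
    tree_a = flat_file_to_xml(file_a, "a")
    print(ET.tostring(tree_a, encoding="unicode"))
    records = xml_to_records(tree_a) + xml_to_records(flat_file_to_xml(file_b, "b"))
    for r in records:
        print(r)
```

A real OWL ontology would carry class hierarchies and rules rather than a flat synonym table; the dictionary only shows why a shared vocabulary lets rows from differently formatted files line up under one schema.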

The system currently supports simple column queries, range queries, and similarity queries. Keyword-based semantic queries are also supported through the Protégé-OWL editor. We selected tools that let us integrate and represent the data semantically, since our aim is to provide as much semantic support as possible through ontologies. Because the data is domain-specific, the robustness of the system can only be judged by how well it supports both conventional and semantic queries. The ontologies can be semantically enriched to support more complex queries.
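The three non-semantic query types can be illustrated over records shaped like relational rows. The sketch below is illustrative only: `column_query`, `range_query`, and `similarity_query` are hypothetical names, records are assumed to be plain dictionaries of strings (as a mapping step like the one described might produce), and "similarity" is assumed to mean nearest by absolute numeric distance, which may not match the thesis's own definition.

```python
from collections.abc import Iterable

Record = dict[str, str]


def column_query(records: Iterable[Record], columns: list[str]) -> list[Record]:
    """Project each record onto the requested columns (missing ones skipped)."""
    return [{c: r[c] for c in columns if c in r} for r in records]


def range_query(records: Iterable[Record], column: str, lo: float, hi: float) -> list[Record]:
    """Return records whose numeric value in `column` lies in [lo, hi]."""
    result = []
    for r in records:
        try:
            value = float(r[column])
        except (KeyError, ValueError):
            continue  # record lacks the column or the value is not numeric
        if lo <= value <= hi:
            result.append(r)
    return result


def similarity_query(records: Iterable[Record], column: str, target: float, k: int) -> list[Record]:
    """Return the k records whose value in `column` is closest to `target`.

    Assumption: similarity is absolute numeric distance on one column.
    """
    scored = []
    for r in records:
        try:
            scored.append((abs(float(r[column]) - target), r))
        except (KeyError, ValueError):
            continue  # skip records that cannot be scored
    # Sort on distance only, so dicts are never compared with each other.
    scored.sort(key=lambda pair: pair[0])
    return [r for _, r in scored[:k]]


if __name__ == "__main__":
    rows = [
        {"Station": "S1", "WaterTemperature": "4.2", "Depth": "10"},
        {"Station": "S2", "WaterTemperature": "5.0", "Depth": "20"},
        {"Station": "S3", "WaterTemperature": "4.8", "Depth": "15"},
    ]
    print(column_query(rows, ["Station", "Depth"]))
    print(range_query(rows, "Depth", 12, 25))
    print(similarity_query(rows, "WaterTemperature", 4.7, 2))
```

Records that lack a column or hold a non-numeric value are skipped rather than raising, which suits heterogeneous files where not every source reports every measurement.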

Gagan Agrawal (Advisor)
Hakan Ferhatosmanoglu (Committee Member)
62 p.

Recommended Citations

  • APA Style (7th edition): Dinakar, R. (2010). Ontology based Querying and Integration of Heterogeneous Flat Files [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1285063965

  • MLA Style (8th edition): Dinakar, Rohit. Ontology based Querying and Integration of Heterogeneous Flat Files. 2010. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1285063965.

  • Chicago Manual of Style (17th edition): Dinakar, Rohit. "Ontology based Querying and Integration of Heterogeneous Flat Files." Master's thesis, Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1285063965