Files
osu1171388702.pdf (828.44 KB)
Supporting on-the-fly data integration for bioinformatics
Author Info
Zhang, Xuan
Permalink: http://rave.ohiolink.edu/etdc/view?acc_num=osu1171388702
Abstract Details
Year and Degree
2007, Doctor of Philosophy, Ohio State University, Computer and Information Science.
Abstract
The use of computational tools and online knowledge bases has changed the way biologists conduct their research, and the fusion of biology and information science is expected to continue. Data integration is one of the challenges faced by bioinformatics. To build an integration system for modern biological research, three problems have to be solved. First, a large number of existing data sources have to be incorporated, and newly discovered data sources should be usable right away. Second, the variety of biological data formats and access methods has to be addressed. Finally, the system has to be able to understand the rich and often fuzzy semantics of biological data. Motivated by these challenges, a system and a set of tools have been implemented to support on-the-fly integration of biological data. Metadata descriptors of the underlying data sources form the backbone of the system. Data mining tools have been developed to help users write the descriptors semi-automatically. Using an automatic code generation approach, we have developed several tools for bioinformatics integration needs. An automatic data wrapper generation tool transforms data between heterogeneous data sources. Another code generation system creates programs that answer projection, selection, cross product, and join queries over flat-file data. Real bioinformatics requests have been used to test our system and tools. These case studies show that our approach can reduce the human effort involved in an information integration system. Specifically, it makes the following contributions:
1) Data mining tools allow new data sources to be understood with ease and integrated into the system on-the-fly.
2) Changes in data format are localized by the metadata descriptors, so system maintenance cost is low.
3) Users interact with our system through high-level declarative interfaces, which reduces programming effort.
4) Our tools process data directly from flat files and require no database support. Data parsing and processing are done implicitly.
5) Request analysis and request execution are separated, so our tools can be used in a data grid environment.
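As an illustration of the kind of program such a code generator might emit, the following is a minimal Python sketch of selection, projection, cross product, and join evaluated directly over flat-file text, with field names supplied by a descriptor. It is not taken from the dissertation: the descriptor layout, the function names, and the sample records are all assumptions made for this example.

```python
import io

# Hypothetical metadata descriptors: field names and delimiter per flat file.
DESCRIPTORS = {
    "genes": {"fields": ["gene_id", "symbol", "organism"], "delimiter": "\t"},
    "proteins": {"fields": ["protein_id", "gene_id", "length"], "delimiter": "|"},
}


def parse(source, handle):
    """Yield one dict per non-empty line, naming fields from the descriptor."""
    desc = DESCRIPTORS[source]
    for line in handle:
        line = line.rstrip("\n")
        if line:
            yield dict(zip(desc["fields"], line.split(desc["delimiter"])))


def select(rows, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return (row for row in rows if predicate(row))


def project(rows, fields):
    """Projection: keep only the named fields of each row."""
    return ({f: row[f] for f in fields} for row in rows)


def cross_product(left, right):
    """Cross product: pair every left row with every right row."""
    right = list(right)  # materialized because it is scanned once per left row
    return ({**l, **r} for l in left for r in right)


def join(left, right, key):
    """Equi-join on a shared field, using an in-memory hash index on the right."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return ({**l, **r} for l in left for r in index.get(l[key], []))


# Illustrative sample records standing in for two flat files on disk.
GENES = "g1\tBRCA1\thuman\ng2\tTP53\thuman\ng3\tAct5C\tfly\n"
PROTEINS = "p1|g1|1863\np2|g2|393\np3|g9|100\n"


def run_query():
    """Human genes joined with their proteins, projected to three fields."""
    genes = select(parse("genes", io.StringIO(GENES)),
                   lambda r: r["organism"] == "human")
    proteins = parse("proteins", io.StringIO(PROTEINS))
    joined = join(genes, proteins, "gene_id")
    return list(project(joined, ["symbol", "protein_id", "length"]))
```

In this sketch a change in file format (a different delimiter or field order) only touches the descriptor entry, not the query code, which mirrors contribution 2 in spirit. All values are read as strings; type conversion is left out for brevity.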
Committee
Gagan Agrawal (Advisor)
Subject Headings
Computer Science
Keywords
information integration; bioinformatics
Recommended Citations
Zhang, X. (2007). Supporting on-the-fly data integration for bioinformatics [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1171388702
APA Style (7th edition)
Zhang, Xuan. Supporting on-the-fly data integration for bioinformatics. 2007. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1171388702.
MLA Style (8th edition)
Zhang, Xuan. "Supporting on-the-fly data integration for bioinformatics." Doctoral dissertation, Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=osu1171388702
Chicago Manual of Style (17th edition)
Document number: osu1171388702
Download Count: 928
Copyright Info
© 2007, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.