Skip to Main Content
 

Global Search Box

 
 
 
 

ETD Abstract Container

Abstract Header

Supporting on-the-fly data integration for bioinformatics

Abstract Details

2007, Doctor of Philosophy, Ohio State University, Computer and Information Science.
The use of computational tools and on-line data knowledgebases has changed the way the biologists conduct their research. The fusion of biology and information science is expected to continue. Data integration is one of the challenges faced by bioinformatics. In order to build an integration system for modern biological research, three problems have to be solved. A large number of existing data sources have to be incorporated and when new data sources are discovered, they should be utilized right away. The variety of the biological data formats and access methods have to be addressed. Finally, the system has to be able to understand the rich and often fuzzy semantic of biological data. Motivated by the above challenges, a system and a set of tools have been implemented to support on-the-fly integration of biological data. Metadata about the underlying data sources are the backbone of the system. Data mining tools have been developed to help users to write the descriptors semi-automatically. With automatic code generation approach, we have developed several tools for bioinformatics integration needs. An automatic data wrapper generation tool is able to transform data between heterogeneous data sources. Another code generation system can create programs to answer projection, selection, cross product and join queries from flat file data. Real bioinformatics requests have been used to test our system and tools. These case studies show that our approach can reduce the human efforts involved in an information integration system. Specifically, it makes the following contributions. 1) Data mining tools allow new data sources to be understood with ease and integrated to the system on-the-fly. 2) Changes in data format are localized by using the metadata descriptors. System maintenance cost is low. 3) Users interact with our system through high-level declarative interfaces. Programming efforts are reduced. 4) Our tools process data directly from flat files and requires no database support. Data parsing and processing are done implicitly. 5) Request analysis and request execution are separated and our tools can be used in a data grid environment.
Gagan Agrawal (Advisor)

Recommended Citations

Citations

  • Zhang, X. (2007). Supporting on-the-fly data integration for bioinformatics [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1171388702

    APA Style (7th edition)

  • Zhang, Xuan. Supporting on-the-fly data integration for bioinformatics. 2007. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1171388702.

    MLA Style (8th edition)

  • Zhang, Xuan. "Supporting on-the-fly data integration for bioinformatics." Doctoral dissertation, Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=osu1171388702

    Chicago Manual of Style (17th edition)