Skip to Main Content
 

Global Search Box

 
 
 
 

ETD Abstract Container

Abstract Header

Efficient Algorithms for Data Mining with Federated Databases

Young, Barrington R. St. A.

Abstract Details

2007, PhD, University of Cincinnati, Engineering : Computer Science.
The internet era and high speed networks have ushered in the capabilities to have ready access to large amounts of geographically distributed data. Individuals, businesses, and governments recognize the value of this available resource to those who can transform the data into information. These databases, though valuable as individual entities, become significantly more valuable when they function as parts of a federated database and their data can be aggregated for collective mining or computations. This requires data mining algorithms to shift their focus from working with single databases to efficiently working with federated databases. Any such methodology must address issues relating to the security and privacy of data, cost of communication among the database nodes and the overlap of attributes among the databases. Most of the algorithms designed for mining single databases are not easily adaptable for federated databases. We present a new perspective for mining vertically partitioned databases that balances the information loss of perturbation methods and those of secure multi-party or encryption methods. We achieve scalable and secure computation on a general vertical partitioning of an implicit global database among the explicit federated components. We use an implicit aggregation to efficiently compute first and second order statistical quantities, and inter-tuple distance based clustering for data in the federated databases. We demonstrate these by computing the covariance matrix and the k-nearest neighbor and other clustering for databases. Our computation of the covariance matrix has worst case message complexity of n(2σ)k−1 divided by k, where n is the number of tuples on the learner and each of the participating databases finds on average tuples per local query. The k-nearest neighbors can be computed with message complexity in the worst case of σk−1. The information exchanged between the nodes is minimized and consists of summaries derived from multiple tuples of the databases. Privacy of individual data tuples is preserved. Security measures can be applied to the reduced shared summaries. Our algorithms are more capable than the other research in this area in that they can handle vertical partitions of an implicit global database with an arbitrary overlap in their attribute sets.
Dr. Raj Bhatnagar (Advisor)
135 p.

Recommended Citations

Citations

  • Young, B. R. S. A. (2007). Efficient Algorithms for Data Mining with Federated Databases [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1179332091

    APA Style (7th edition)

  • Young, Barrington. Efficient Algorithms for Data Mining with Federated Databases. 2007. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1179332091.

    MLA Style (8th edition)

  • Young, Barrington. "Efficient Algorithms for Data Mining with Federated Databases." Doctoral dissertation, University of Cincinnati, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1179332091

    Chicago Manual of Style (17th edition)