Skip to Main Content
 

Global Search Box

 
 
 
 

ETD Abstract Container

Abstract Header

APPROXIMATE N-NEAREST NEIGHBOR CLUSTERING ON DISTRIBUTED DATABASES USING ITERATIVE REFINEMENT

CALENDER, CHRISTOPHER R

Abstract Details

2004, MS, University of Cincinnati, Engineering : Computer Science.
There are often times where an application needs to find the N nearest neighbors. Imagine the scene of an accident and someone needs to know all of the nearest fire and police departments. It may not be enough to only know one station; they may need to know the five nearest stations, or even more. That is where finding the N nearest neighbors is the perfect algorithm. Finding the N nearest neighbors is not a new concept, but doing this on distributed databases while reducing the traditional communication costs is new. There are existing ways to accomplish this by using a centralized method in which each individual database site transmits all of its data to one central machine, a learner machine, that then performs the necessary calculations. However, communication costs are very expensive, much more expensive than computational costs, and that is why there is a need for an algorithm that can perform finding the N nearest neighbors using distributed databases while minimizing communication costs. In this approach, more power is given to the individual database sites, and less to the central learning machine. What has been attempted is to allow the individual database sites to perform calculations on their data in hopes of gaining some kind of summary information that will allow a minimal amount of information to be transmitted to the learner machine rather than every point. And in fact, that is exactly what this newly proposed algorithm does. It has the individual database sites perform many calculations and then transmit only the relevant information to the learner machine. The learner machine also does computations and gathers information from all of the individual database sites. Once it has gathered some global information, it then sends out the important, relevant global information to each of the database sites. Then, in turn, each individual database site can return only the points that it feels will be relevant. This drastically reduces the amount of communication and, as one will see, works very well using real-world examples.
Dr. Raj Bhatnagar (Advisor)
118 p.

Recommended Citations

Citations

  • CALENDER, C. R. (2004). APPROXIMATE N-NEAREST NEIGHBOR CLUSTERING ON DISTRIBUTED DATABASES USING ITERATIVE REFINEMENT [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1092929952

    APA Style (7th edition)

  • CALENDER, CHRISTOPHER. APPROXIMATE N-NEAREST NEIGHBOR CLUSTERING ON DISTRIBUTED DATABASES USING ITERATIVE REFINEMENT. 2004. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1092929952.

    MLA Style (8th edition)

  • CALENDER, CHRISTOPHER. "APPROXIMATE N-NEAREST NEIGHBOR CLUSTERING ON DISTRIBUTED DATABASES USING ITERATIVE REFINEMENT." Master's thesis, University of Cincinnati, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1092929952

    Chicago Manual of Style (17th edition)