Skip to Main Content
 

Global Search Box

 
 
 
 

Files

ETD Abstract Container

Abstract Header

Scalable Clustering of Modern Networks

Satuluri, Venu M.

Abstract Details

2012, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.

Graphs (or networks) provide a simple yet powerful model for capturing the interactions or relationships between the entities in many different domains. This dissertation focusses on a fundamental analytical or data management primitive related to graphs - the discovery of natural groups or clusters from a graph, or graph clustering in short. Due to the representational versatility of graphs, this problem has varied applications - the discovery of protein complexes from protein interaction networks, data partitioning for distributed computing, community discovery in online social networks, general data clustering (via similarity-weighted graphs), image segmentation, optimizing transistor layout for VLSI design and so on.

The increasing scale, complexity and diversity of modern (graph) data has rendered classical solutions for this task, such as spectral clustering, either very slow, incomplete or inaccurate. I address this situation in this thesis by approaching the graph clustering problem from two directions - in one direction, I develop algorithms for the core clustering task; in the second direction, I build intelligent pre-processing strategies that transform the graph to enable fast and accurate clustering subsequently. Such a two-pronged strategy allows us to concentrate on the novel aspects of new variants of the clustering problem, while exploiting existing solutions where we can. In the direction of algorithms for the core clustering task, my contribution to the state-of-the-art is a novel algorithm, MLR-MCL, based on the multi-level simulation of stochastic flows. MLR-MCL is a significantly enhanced version of a popular existing algorithm called MCL, and is fast, accurate, noise-tolerant and allows easy adjustment of the balance in the output cluster sizes. In the direction of intelligent pre-processing strategies, I have made three important contributions: (i) Graph sparsification algorithms that clarify the cluster structure and thereby speed-up subsequent clustering, by upto 50x; (ii) Algorithms that symmetrize directed graphs into similarity-weighted, undirected graphs suitable for subsequent clustering; (iii) Novel hashing-based algorithms that efficiently convert general non-graph data into similarity-weighted graphs suitable for clustering.

Srinivasan Parthasarathy (Advisor)
Gagan Agrawal (Committee Member)
Eric Fosler-Lussier (Committee Member)
181 p.

Recommended Citations

Citations

  • Satuluri, V. M. (2012). Scalable Clustering of Modern Networks [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1332477695

    APA Style (7th edition)

  • Satuluri, Venu. Scalable Clustering of Modern Networks. 2012. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1332477695.

    MLA Style (8th edition)

  • Satuluri, Venu. "Scalable Clustering of Modern Networks." Doctoral dissertation, Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1332477695

    Chicago Manual of Style (17th edition)