Unravel the Geometry and Topology behind Noisy Networks

Tian, Minghao

Keyword Search

School Logo

minghao_PhD_Dissertation.pdf (2.66 MB)

Unravel the Geometry and Topology behind Noisy Networks

Author Info

Tian, Minghao

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=osu1608757024561299

Year and Degree

2020, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.

Abstract

Graphs and network data are ubiquitous across a wide spectrum of scientific and application domains. Often in practice, an input graph can be considered as an observed snapshot of a (potentially continuous) hidden domain or process. Subsequent analysis, processing, and inferences are then performed on this observed graph. In this dissertation, we advocate the perspective that an observed graph is often a noisy version of some discretized 1-skeleton of a hidden domain, and specifically we propose the following network model $\widehat{G}_{\mathcal{X}}$ (called ER-perturbed random geometric graphs or noisy random geometric graphs): we assume that there is a true graph $G^*_{\mathcal{X}}$ which is a certain proximity graph for points sampled from a hidden domain $\mathcal{X}$; while the observed graph $\widehat{G}_{\mathcal{X}}$ is an Erd\H{o}s--R\'enyi type perturbed version of $G^*_{\mathcal{X}}$. We aim to analyze two aspects of this type of model --- geometry and topology. The geometric problem we aim to solve is to recover the metric structure of the hidden domain $\mathcal{X}$ from the observed graph $\widehat{G}_{\mathcal{X}}$, which is orthogonal to the usual studies of network models (which often focuses on characterizing / predicting behaviors and properties of real-world networks). Two methods are proposed in this dissertation to recover the metric structure of $\mathcal{X}$ from $\widehat{G}_{\mathcal{X}}$: Jaccard-filtering process based on the Jaccard (similarity) index, and edge-clique-filtering process based on the edge clique number (a local version of clique number). We rigorously prove that by using these two simple filtering processes, with high probability, one can recover the shortest-path metric of $G^*_{\mathcal{X}}$ (which reflects the metric of $\mathcal{X}$) within a constant multiplicative factor. The topological problem we study is to estimate the clique number of noisy random geometric graphs $\widehat{G}_{\mathcal{X}}$. Clique number is one of the most important graph / topological quantities in both network analysis and graph theory. With the help of our newly proposed approach called ``well-separated clique-partitions family'', which is utilized to decouple the mingled randomness of $\widehat{G}_{\mathcal{X}}$ generated from both geometry (geometric graph $G^*_{\mathcal{X}}$) and Erd\H{o}s--R\'enyi perturbation, we give asymptotic tight bounds of the clique number of $\widehat{G}_{\mathcal{X}}$ under a variety of parameter settings (different regimes). Finally, we notice that many popular graph featurization / vectorization methods widely used in network comparison and classification are based on local substructures. However, as far as we know, the theoretical understanding of their topology is rather limited. We initiate the study of local topology of random networks by analyzing the statistical properties of a topological summary called \emph{\textbf{d}istance-to-root-induced \textbf{ex}tended \textbf{per}sistence diagram (\text{dexPer} diagram)} of local rooted subgraphs in Erd\H{o}s--R\'enyi random graphs. Specifically, we show a limit theorem and a convergence result for \text{dexPer} diagrams associated to the 1-ring rooted subgraphs in Erd\H{o}s--R\'enyi random graphs. Inspired by the theoretical results, we propose our local-persistence-based distribution-wise featurization for network datasets. We showcase the strength of our approach in two tasks: network comparison / clustering and network classification. For clustering task, our empirical study on synthetic dataset shows that our approach can always group graphs generated from the same generative model correctly, and form more meaningful high-level clusters than the state-of-the-art methods do. For network classification task, we apply our simple approach on a collection of modern real-world networks, where its performance is comparable to other state-of-the-art methods.

Committee

Yusu Wang (Advisor)
Srinivasan Parthasarathy (Advisor)
Pooya Hatami (Committee Member)
David Sivakoff (Committee Member)

Pages

268 p.

Subject Headings

Computer Science

Keywords

Random Graph Theory; Topological Data Analysis

Tian, M. (2020). Unravel the Geometry and Topology behind Noisy Networks [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1608757024561299
APA Style (7th edition)
Tian, Minghao. Unravel the Geometry and Topology behind Noisy Networks. 2020. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1608757024561299.
MLA Style (8th edition)
Tian, Minghao. "Unravel the Geometry and Topology behind Noisy Networks." Doctoral dissertation, Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1608757024561299
Chicago Manual of Style (17th edition)

Document number:

osu1608757024561299

Download Count:

146

Copyright Info

Global Search Box

Files

File List

ETD Abstract Container

Abstract Header

Unravel the Geometry and Topology behind Noisy Networks

Abstract Details

Recommended Citations

Citations

Abstract Footer

Global Footer

Ohio Department of Higher Education

State Government Links

Education Links

Global Search Box

Files

File List

ETD Abstract Container

Abstract Header

Unravel the Geometry and Topology behind Noisy Networks

Abstract Details

Recommended CitationsRefworksEndNoteRISMendeley

Citations

Abstract Footer

Global Footer

Ohio Department of Higher Education

State Government Links

Education Links

Recommended Citations