Systems-level characterization of complex human diseases remains one of the biggest challenges of the post-genomic era. Information useful for a mechanistic understanding of diseases comes from different types of “-omic” data, including genomic sequences, gene expression, and molecular interactions. Genome-wide association studies (GWAS) compare genomic sequences from healthy and affected populations to identify genetic variants that are potentially associated with diseases. Monitoring of gene expression, on the other hand, enables identification of genes that are dysregulated in the development and progression of diseases. However, since complex diseases arise from the interplay among multiple interacting factors, analyses of individual variants in isolation provide limited insights. In this context, data on molecular networks, including protein-protein interactions (PPI), provide a useful resource for uncovering the disease association of multiple molecules in the context of their biological function and interactions. In this thesis, we develop algorithms that integrate different -omic data types to provide systems-level insights into complex diseases.
We first address the problem of identifying genetic interactions among multiple functionally related variants. For this purpose, we develop algorithms to identify groups of single nucleotide polymorphisms (SNPs) that (i) are associated with the same gene and (ii) exhibit more significant association with the disease when considered together. To achieve this, we represent the “genotype” of a gene as a combination of a subset of SNPs within its region of interest and develop algorithms to identify the subset of SNPs that best describes the genotypic variation in the patient population. Subsequently, we focus on the problem of disease gene prioritization. We propose a novel algorithm, VAVIEN, that utilizes the topological similarity of proteins in the human PPI network to prioritize candidate disease genes that reside in linkage intervals potentially associated with the disease. Finally, we incorporate mRNA expression data into our studies and propose a set-cover-based algorithm, referred to as COBALT, that identifies class-specific, coordinately dysregulated subnetworks of genes associated with the phenotype of interest. We show through comprehensive experimental studies that the proposed algorithms are effective in generating novel insights into the systems biology of complex diseases.