Skip to Main Content
 

Global Search Box

 
 
 
 

Files

ETD Abstract Container

Abstract Header

Scalable Map-Reduce Algorithms for Mining Formal Concepts and Graph Substructures

Abstract Details

2018, PhD, University of Cincinnati, Engineering and Applied Science: Computer Science and Engineering.
With evolution in distributed processing environment, many new algorithms have been pro- posed in order to overcome efficiency and scalability issues. Most of the proposed algorithms try to solve these issues by distributing data across multiple nodes in cluster and process the data using existing sequential approaches. Even though the distribution and processing of data using multiple node cluster has eliminated system resource constraints of sequential approaches but failed to address other major issues such as load balancing, duplicate data processing as well as effective partitioning of large data into independent smaller chunks for maximum scalability and efficiency. In the proposed algorithms for Formal Concept Analysis, Graph Processing and Real-Valued Bicluster generation, we have leveraged the power of multiple node cluster to eliminate re- source constraints. We have also addressed the issues of load balancing, duplicate data processing and data partitioning very efficiently. Because of efficient load balancing and data partitioning along with elimination of duplicate data processing, we are able to process same volume of data as of existing algorithms in either same or lesser time using around 1/10 th of the resources. In order to solve Formal Concept Analysis problem in distributed environment, unlike existing iterative approaches we have first generated Sufficient Set using single iteration of map-reduce. We have also demonstrated that the generated Sufficient Set is 2-hop projection of data and contains all the information needed to enumerate entire lattice. This 2-hop projection of data enabled our approach in effective load balancing and data partitioning. Since the generated sufficient set is much smaller in size compared to the original input data, it is processed on stand alone machine to selectively enumerate parts of lattice as well as entire lattice as per the requirement. In second and third problem of processing large graphs (with millions of nodes and edges) for Triangles and Maximal Cliques enumeration, we encountered the problems of hub formation (nodes with very high degree of connectivity), very large and skewed 2-hop projection (data partitioning) and duplicate data processing. In order to eliminate duplicate data processing, we triangulated the 2-hop projected data thus completely eliminating enumeration of already generated triangle. For maximal clique enumeration, we have taken triangles as seed to build maximal clique. In process of maximal clique enumeration, we further limit the redundancy in processing by dropping all the seed triangles which are included in at least one enumerated maximal clique. By performing this step, we have ensured that all the maximal cliques are enumerated and same maximal clique is not enumerated twice from 2 different seed triangle. Real Valued Biclustering is extensively used in Bio-informatics and gene expression analysis. Our algorithm for solving this problem is one of the first approach employing distributed environment. Data projection technique used in this approach is similar in nature to that of maximal clique problem. Finally we have used CHARM’s algorithm in mapper phase of our approach to process these smaller chunks of data independently to generate real valued biclusters.
Raj Bhatnagar, Ph.D. (Committee Chair)
Gowtham Atluri, Ph.D. (Committee Member)
Yizong Cheng, Ph.D. (Committee Member)
Anil Jegga, B.V.Sc (Committee Member)
Ali| Minai, Ph.D. (Committee Member)
172 p.

Recommended Citations

Citations

  • Kumar, L. (2018). Scalable Map-Reduce Algorithms for Mining Formal Concepts and Graph Substructures [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1543996580926452

    APA Style (7th edition)

  • Kumar, Lalit. Scalable Map-Reduce Algorithms for Mining Formal Concepts and Graph Substructures. 2018. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1543996580926452.

    MLA Style (8th edition)

  • Kumar, Lalit. "Scalable Map-Reduce Algorithms for Mining Formal Concepts and Graph Substructures." Doctoral dissertation, University of Cincinnati, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1543996580926452

    Chicago Manual of Style (17th edition)