With an explosion in database sizes, there is an increasing need for mining relevant information from them. Subspace clustering has been applied in various fields for discovering patterns, and many such
algorithms have been investigated for finding interesting biclusters from binary-valued datasets. Mining biclusters from real-valued datasets has gained significant importance in many of the recently emerging
applications. The algorithms devised for mining such biclusters generally minimize an objective function, and hence the biclusters generated by each algorithm vary depending on the objective function used.
Due to the inherent size and density of the datasets, the algorithms generate a very large number of biclusters, making it difficult to select the useful ones from among them. To overcome this problem, it
is important to design strategies that summarize these biclusters into a few representatives of the main ideas embedded in the dataset. The objective of this thesis is to use statistical properties of the generated
biclusters to identify a small set of representative biclusters that summarizes the large number of biclusters produced.
To achieve this objective, similarity measures based on mutual information and standard deviation are used to identify similar biclusters. These measures
quantify the information shared (that is, the similarity) between two biclusters, which helps in identifying candidate biclusters that could be merged. The algorithm has been applied to one synthetic and two real-world datasets, and the results are presented. The information content and the variance in a bicluster are analyzed as the biclusters are progressively merged. The methodologies proposed in this thesis are compared to a baseline method to verify the quality of the resulting biclusters and to validate that our approach performs well.