The dissertation investigates how to characterize software at various levels of abstraction (e.g., method, class, or system). Stereotypes are a powerful semantic mechanism in UML and represent generalizations that reflect an intrinsic or atomic behavior of a method or a class. An empirical study of a number of open source systems forms the basis for a set of emergent stereotypes of the software abstractions at the various levels. A mechanism to automatically reverse engineer these stereotypes from existing systems is presented, along with a means to re-document classes and methods with their corresponding stereotypes. The automatic identification of class stereotypes is based on the distribution of method stereotypes within a class. Entire systems can also be characterized by their distribution of method stereotypes. This work is further extended to the characterization of changes in software during evolution. Commits are automatically classified and evolution patterns of method stereotypes are uncovered to help developers gain a high-level perspective of the design over a system’s evolution.
The research contributions of this work include a taxonomic description of object-oriented method stereotypes and class stereotypes. Further contributions include leveraging the method stereotype extraction approach to implement tools for source code re-documentation, the identification of descriptors for software systems and their classification, the development of a tool for reverse engineering class stereotypes, and the implementation of a tool for the semantic categorization of commits. The final contribution is the evaluation of the approach through empirical studies on historical data for a wide range of open source object-oriented C++ software systems, which can serve as a benchmark for further investigations and studies.
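As a rough illustration of how a class stereotype might be derived from the distribution of method stereotypes, the following minimal Python sketch computes relative frequencies of method stereotype labels and maps a dominant label to a class label. The label names (`accessor`, `mutator`, `data-provider`, `commander`), the 0.5 threshold, and both function names are hypothetical and chosen only for illustration; they are not the taxonomy or the identification rules defined in the dissertation.

```python
from collections import Counter


def stereotype_distribution(method_stereotypes):
    """Return the relative frequency of each method stereotype label."""
    counts = Counter(method_stereotypes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def classify_class(distribution, threshold=0.5):
    """Assign a class label from a method stereotype distribution.

    The mapping and threshold below are illustrative assumptions,
    not the rules used in the dissertation.
    """
    if not distribution:
        return "unclassified"
    # Pick the method stereotype with the largest share of the class.
    label, share = max(distribution.items(), key=lambda item: item[1])
    if share <= threshold:
        return "mixed"  # no single stereotype dominates
    hypothetical_mapping = {
        "accessor": "data-provider",
        "mutator": "commander",
    }
    return hypothetical_mapping.get(label, "unclassified")


# Example: a class with three accessor methods and one mutator method.
methods = ["accessor", "accessor", "accessor", "mutator"]
dist = stereotype_distribution(methods)
print(dist, classify_class(dist))
```

The same distribution function could be applied to all methods of a system rather than a single class, which is one way to read the system-level characterization described above.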