Heterogeneous data are multidimensional data whose attributes belong to different domains (for example, a mix of numeric and categorical attributes). Processing such data has become an important problem in data mining. Because of this heterogeneity, however, measuring the similarity between two heterogeneous data objects has proven rather difficult.
Many similarity measures exist for homogeneous data. Each is designed for a single data type and is built on particular properties of that type, so in principle it should not be applied to other kinds of data.
This thesis addresses the issues that arise when evaluating proximity between heterogeneous data objects. It focuses on one particular probability-based method and discusses its suitability for this task.