Extended Multidimensional Conceptual Spaces in Document Classification

2008, MS, University of Cincinnati, Engineering : Computer Science.
Content-based retrieval, which entails knowledge representation, can essentially be described as assessing the similarity between objects that have primitives as their building blocks. This thesis presents a novel approach to document classification that uses an Extended Multidimensional Conceptual Spaces (EMCS) framework to address problem areas requiring similarity assessment. As a typical problem domain, articles on breast, brain, and colon cancers were obtained from PubMed (an online service of the U.S. National Library of Medicine) as the search results of binary-based queries. Since not all terms carry equal discriminatory information, document preprocessing was used to identify the terms that do, and only those were treated as primitives. The salience weight associated with each term was then assessed statistically by computing its normalized term frequency and inverse document frequency. The products of these frequencies were taken as the ideal features for their concept category. Example documents were also preprocessed and expressed in terms of the normalized frequency values of each of their stemmed terms. The ideal normalized frequencies of the selected features were then compared with every term of each example document using the degree-of-difference measure. This computation essentially transformed the term frequencies into degrees of membership and fuzzy-set cardinalities. The fuzzy sets from the example documents were aggregated using the cardinality of fuzzy sets and transformed into a co-occurrence matrix with respect to the three concepts. Similarly, fuzzy sets representing test documents were transformed into co-occurrence matrices with respect to the concepts. Finally, similarity was assessed using a Frobenius distance measure between a core concept matrix and a test matrix. The experiment demonstrated the feasibility of the framework for document classification.
Anca Ralescu, Dr. (Committee Chair)
Dan Ralescu, Dr. (Committee Member)
Chia-Yung HAN, Dr. (Committee Member)
89 p.
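
To illustrate two of the steps the abstract names, the sketch below computes normalized term frequencies and inverse document frequencies for tokenized documents, then assigns a test matrix to the nearest concept matrix by Frobenius distance. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the log(N/df) form of the inverse document frequency, and the toy matrices are all assumptions. The degree-of-difference measure and the fuzzy-set aggregation are left out because the abstract does not define them.

```python
import math
from collections import Counter


def normalized_tf(tokens):
    """Term counts divided by document length (assumed normalization)."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def idf(docs):
    """Inverse document frequency, assumed here to be log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def frobenius_distance(a, b):
    """Frobenius norm of (a - b) for two equal-shaped matrices (lists of rows)."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


def classify(test_matrix, concept_matrices):
    """Return the label of the concept matrix closest to test_matrix."""
    return min(
        concept_matrices,
        key=lambda label: frobenius_distance(test_matrix, concept_matrices[label]),
    )


if __name__ == "__main__":
    # Hypothetical stemmed documents, one per concept.
    docs = [
        ["tumor", "breast", "cell"],
        ["tumor", "brain", "cell"],
        ["tumor", "colon"],
    ]
    weights = idf(docs)
    tf = normalized_tf(docs[0])
    # TF-IDF product per term; a term found in every document gets weight 0.
    print({term: tf[term] * weights[term] for term in tf})

    # Hypothetical 2x2 matrices standing in for concept co-occurrence matrices.
    concepts = {
        "breast": [[1.0, 0.0], [0.0, 0.0]],
        "brain": [[0.0, 0.0], [0.0, 1.0]],
    }
    print(classify([[0.9, 0.1], [0.0, 0.1]], concepts))
```

Taking the minimum Frobenius distance treats each concept as a single prototype matrix, which matches the abstract's description of comparing a core concept matrix with a test matrix.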

Recommended Citations

  • Hadish, M. (2008). Extended Multidimensional Conceptual Spaces in Document Classification [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1227158181

    APA Style (7th edition)

  • Hadish, Mulugeta. Extended Multidimensional Conceptual Spaces in Document Classification. 2008. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1227158181.

    MLA Style (8th edition)

  • Hadish, Mulugeta. "Extended Multidimensional Conceptual Spaces in Document Classification." Master's thesis, University of Cincinnati, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1227158181

    Chicago Manual of Style (17th edition)