Files
ucin1227158181.pdf (1006.39 KB)
Extended Multidimensional Conceptual Spaces in Document Classification
Author Info
Hadish, Mulugeta
Permalink: http://rave.ohiolink.edu/etdc/view?acc_num=ucin1227158181
Year and Degree
2008, MS, University of Cincinnati, Engineering : Computer Science.
Abstract
Content-based retrieval, which entails knowledge representation, can essentially be described as assessing the similarity between objects that are built from primitives. This thesis presents a novel approach to document classification that uses an Extended Multidimensional Conceptual Spaces (EMCS) framework to address problem areas requiring similarity assessment. As a typical problem domain, articles on breast, brain, and colon cancers were obtained from PubMed (an online service of the U.S. National Library of Medicine) as the results of binary-based queries. Since not all terms carry equal discriminatory information, document pre-processing was used to identify the terms that do, and only those terms were treated as primitives. Salience weights for each term were then assessed statistically by computing normalized term frequencies and inverse document frequencies. The products of these frequencies were taken as the ideal features for each concept category. Example documents were also preprocessed and expressed in terms of the normalized frequency values of each of their stemmed terms. The ideal normalized frequencies of the selected features were then compared with every term of each example document using a degree-of-difference measure. This computation essentially transformed the term frequencies into degrees of membership and cardinality fuzzy sets. The fuzzy sets from the example documents were aggregated using the cardinality of fuzzy sets and transformed into co-occurrence matrices with respect to the three concepts. Similarly, fuzzy sets representing test documents were transformed into co-occurrence matrices with respect to the concepts. Finally, similarity was assessed using a Frobenius distance measure between a core concept matrix and a test matrix. The experiment demonstrated the feasibility of the framework for document classification.
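The first and last stages named in the abstract (term weighting and Frobenius-distance comparison) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the choice to normalize term frequency by the highest count in the document, the logarithmic form of the inverse document frequency, and the matrix values in the usage example are all assumptions. The degree-of-difference measure and the fuzzy-set cardinality aggregation are not specified in the abstract and are not reproduced here.

```python
import math
from collections import Counter


def normalized_tf(tokens):
    """Term counts divided by the highest count in the document (assumed normalization)."""
    counts = Counter(tokens)
    peak = max(counts.values())
    return {term: n / peak for term, n in counts.items()}


def idf(term, corpus):
    """log(N / df), where df is the number of documents (token sets) containing the term."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0


def frobenius_distance(a, b):
    """Square root of the sum of squared element-wise differences of two equal-shaped matrices."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


def classify(test_matrix, concept_matrices):
    """Return the name of the concept whose matrix is closest to the test matrix."""
    return min(
        concept_matrices,
        key=lambda name: frobenius_distance(test_matrix, concept_matrices[name]),
    )


# Hypothetical 2x2 co-occurrence matrices, for illustration only.
concepts = {
    "breast": [[1.0, 0.2], [0.2, 1.0]],
    "brain": [[0.1, 0.9], [0.9, 0.1]],
}
test_doc = [[0.9, 0.3], [0.3, 0.8]]
label = classify(test_doc, concepts)
```

Under these assumed values the test matrix lies closer to the "breast" matrix, so `classify` picks that label. A real run would build the matrices from the aggregated fuzzy sets described in the abstract.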
Committee
Anca Ralescu, Dr. (Committee Chair)
Dan Ralescu, Dr. (Committee Member)
Chia-Yung HAN, Dr. (Committee Member)
Pages
89 p.
Subject Headings
Computer Science
Keywords
Conceptual spaces; Document classification; ranking; knowledge representation; Fuzzy set; Fuzzy sets cardinality; Fuzzy sets cardinality aggregation document
Recommended Citations
Hadish, M. (2008). Extended Multidimensional Conceptual Spaces in Document Classification [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1227158181
APA Style (7th edition)
Hadish, Mulugeta. Extended Multidimensional Conceptual Spaces in Document Classification. 2008. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1227158181.
MLA Style (8th edition)
Hadish, Mulugeta. "Extended Multidimensional Conceptual Spaces in Document Classification." Master's thesis, University of Cincinnati, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1227158181
Chicago Manual of Style (17th edition)
Document number: ucin1227158181
Download Count: 740
Copyright Info
© 2008, all rights reserved.
This open access ETD is published by University of Cincinnati and OhioLINK.