Taxonomy Extraction from Wikipedia

2011, Master of Science (MS), Ohio University, Computer Science (Engineering and Technology).

Much work has been done on extracting taxonomies from Wikipedia. A review of this work shows that previous evaluations do not give a clear sense of the systems' recall. To enable a more thorough evaluation, we design an algorithm that generates category subgraphs rooted at 10 selected categories from near the top of Wikipedia. The subgraphs preserve the distribution of the original descendant categories and articles as much as possible: each is built from random paths in the Wikipedia category graph that start at the root category and end at a random descendant article. In each of the 10 subgraphs, every node other than the root is manually annotated for is-a and instance-of relations with respect to both its parent and the root node. The newly created datasets enable a more consistent evaluation of taxonomy mining systems.
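
The path-based sampling described above can be illustrated with a minimal sketch. The abstract does not specify how the next node is chosen at each step, so the uniform choice among subcategories and articles, the cycle check, the depth limit, and the names `random_path` and `sample_subgraph` are all assumptions made for illustration, not the thesis's actual algorithm.

```python
import random

def random_path(subcats, articles, root, rng, max_depth=20):
    """Walk down from root until an article is reached.

    subcats:  dict mapping a category to its list of subcategories
    articles: dict mapping a category to its list of articles
    Returns the path [root, ..., article], or None if the walk hits a
    dead end or exceeds max_depth (both limits are assumptions).
    """
    path, seen, node = [root], {root}, root
    for _ in range(max_depth):
        # Candidate next steps: unvisited subcategories and any article.
        options = [("cat", c) for c in subcats.get(node, []) if c not in seen]
        options += [("art", a) for a in articles.get(node, [])]
        if not options:
            return None
        kind, nxt = rng.choice(options)  # uniform choice: an assumption
        path.append(nxt)
        if kind == "art":
            return path
        seen.add(nxt)
        node = nxt
    return None

def sample_subgraph(subcats, articles, root, num_paths, seed=0):
    """Union the edges of num_paths random root-to-article paths.

    Returns a set of (child, parent) pairs.
    """
    rng = random.Random(seed)
    edges = set()
    for _ in range(num_paths):
        path = random_path(subcats, articles, root, rng)
        if path:
            edges.update(zip(path[1:], path[:-1]))
    return edges
```

Because every sampled path ends at an article, densely populated branches of the category graph are visited more often than sparse ones, which is one plausible way to approximate the original distribution of descendants.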

We also propose a set of relation extraction systems designed for two major types of relations: flat relations (node-to-root relations) and hierarchical relations (node-to-parent relations). The taxonomic relation extraction systems are trained and evaluated on the new datasets, exploiting the structure of Wikipedia through a rich set of features. The evaluation on the new datasets gives a clear sense of both the systems' recall and precision. In a 10-fold cross-validation experiment, we obtain a precision of 89.5% and a recall of 88.2% on the task of flat relation extraction. A similar evaluation for hierarchical relation extraction yields a precision of 91.9% at a recall of 95.0%.
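
For readers unfamiliar with the evaluation setup, the following sketch shows one way precision and recall could be computed under 10-fold cross-validation. It is not the thesis's implementation: the contiguous fold split, the pooling of predictions across folds before scoring, and the hypothetical `train_fn` callback (which takes training examples and labels and returns a classifier function) are assumptions.

```python
def k_fold_indices(n, k=10):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def precision_recall(gold, pred):
    """Precision and recall of boolean predictions against boolean gold labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def cross_validate(examples, labels, train_fn, k=10):
    """Train on k-1 folds, predict the held-out fold, and score pooled predictions."""
    gold, pred = [], []
    for fold in k_fold_indices(len(examples), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(examples) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        classify = train_fn(train_x, train_y)  # hypothetical trainer callback
        for i in fold:
            gold.append(labels[i])
            pred.append(classify(examples[i]))
    return precision_recall(gold, pred)
```

Here a positive label would mean that the relation (for example, is-a between a node and the root) holds. Averaging per-fold scores instead of pooling is an equally reasonable design and may give slightly different numbers.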

Razvan Bunescu, Dr. (Advisor)
Cynthia Marling, Dr. (Committee Member)
Jundong Liu, Dr. (Committee Member)
Wei Lin, Dr. (Committee Member)

Recommended Citations

  • Chen, M. (2011). Taxonomy Extraction from Wikipedia [Master's thesis, Ohio University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1320342712

    APA Style (7th edition)

  • Chen, Mike. Taxonomy Extraction from Wikipedia. 2011. Ohio University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1320342712.

    MLA Style (8th edition)

  • Chen, Mike. "Taxonomy Extraction from Wikipedia." Master's thesis, Ohio University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1320342712

    Chicago Manual of Style (17th edition)