Hybrid Methods for Acquisition of Lexical Information: the Case for Verbs

2008, Doctor of Philosophy, Ohio State University, Linguistics.

Improved automatic text understanding requires detailed linguistic information about the words that make up the text. Particularly crucial is knowledge about predicates, typically verbs, which communicate both the event being expressed and how participants are related to the event. Although the field of natural language processing (NLP) has yet to develop a clear consensus on guidelines for building a verb lexicon suitable for NLP applications, class-based construction of verb lexicons (e.g. Levin verb classification) has proved beneficial to a wide range of NLP tasks in combating the pervasive problem of data sparsity. Such broad-coverage dictionaries and ontologies are difficult and costly to create and maintain by hand; it is therefore desirable to learn them from distributional data, such as can be obtained from unlabeled text corpora. To this end, this thesis primarily addresses the following three questions:

First, deriving Levin-style verb classifications from text corpora helps avoid the expensive hand-coding of such information, but appropriate features must be identified and demonstrated to be effective. One of our primary goals is to assess the linguistic conditions which are crucial for lexical classification of verbs. In particular, we experiment with different ways of mixing syntactic and lexical information for improved verb classification. The results show that both syntactic and lexical information are useful in automatic verb classification.

Second, Levin verb classification provides a systematic account of verb polysemy. We propose a class-based method for disambiguating Levin verbs using only untagged data. The basic working hypothesis is that verbs in the same Levin class tend to share their subcategorization patterns as well as neighboring words. In practice, information about unambiguous verbs is used to disambiguate ambiguous ones. The results suggest that this class-based method can be used in the absence of hand-tagged data.
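The working hypothesis above can be illustrated with a minimal sketch. It is not the thesis's actual model: the cosine scoring, the feature encoding (`frame:` and `word:` prefixes), the function names, and the class labels and verb-to-class assignments are all assumptions made for illustration, and the labels are placeholders rather than real Levin classes. The idea shown is only that context profiles collected from unambiguous verbs can be used to pick a class for an occurrence of an ambiguous verb.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_class_profiles(unambiguous_verb_class, observations):
    """Pool context features of unambiguous verbs into one profile per class.

    observations: iterable of (verb, features) pairs from untagged text.
    """
    profiles = {}
    for verb, features in observations:
        cls = unambiguous_verb_class.get(verb)
        if cls is not None:
            profiles.setdefault(cls, Counter()).update(features)
    return profiles


def disambiguate(features, candidate_classes, profiles):
    """Pick the candidate class whose profile best matches this occurrence."""
    context = Counter(features)
    return max(
        candidate_classes,
        key=lambda c: cosine(context, profiles.get(c, Counter())),
    )


# Toy data: placeholder class labels, hypothetical feature names.
unambiguous = {"stroll": "motion", "jog": "motion",
               "phone": "communication", "email": "communication"}
observations = [
    ("stroll", ["frame:PP", "word:park"]),
    ("jog", ["frame:PP", "word:street"]),
    ("phone", ["frame:NP", "word:office"]),
    ("email", ["frame:NP-PP", "word:manager"]),
]
profiles = build_class_profiles(unambiguous, observations)
chosen = disambiguate(["frame:PP", "word:park"],
                      ["motion", "communication"], profiles)
```

No hand-tagged occurrences are needed in this sketch: the only supervision is the list of verbs that belong to a single class.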

Last, automatically created verb classifications are likely to deviate from manually created ones, so it is of great importance to understand whether automatically acquired verb classifications can benefit the wider NLP community. We propose to integrate verb class information, automatically learned from text corpora, into a particular parsing task, PP-attachment disambiguation. The results indicate that automatically acquired verb class information helps improve the performance of PP-attachment disambiguation models by alleviating the severity of the problem of data sparsity.
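One way verb classes can ease data sparsity in PP-attachment is by backing off from verb-specific counts to class-level counts when a verb has not been seen with a given preposition. The sketch below assumes that backoff scheme purely for illustration; the abstract does not say how the thesis integrates class information, and the function names, the "V"/"N" labels (verb versus noun attachment), and the toy classes are hypothetical.

```python
from collections import Counter


def train(examples, verb_to_class):
    """Count attachment decisions per verb and per verb class.

    examples: iterable of (verb, preposition, label), label in {"V", "N"}.
    """
    verb_counts, class_counts = Counter(), Counter()
    for verb, prep, label in examples:
        verb_counts[(verb, prep, label)] += 1
        cls = verb_to_class.get(verb)
        if cls is not None:
            class_counts[(cls, prep, label)] += 1
    return verb_counts, class_counts


def predict(verb, prep, verb_counts, class_counts, verb_to_class, default="N"):
    """Use verb-level evidence if any exists, else fall back to the class."""
    levels = ((verb_counts, verb), (class_counts, verb_to_class.get(verb)))
    for counts, key in levels:
        v = counts[(key, prep, "V")]
        n = counts[(key, prep, "N")]
        if v + n > 0:
            return "V" if v >= n else "N"
    return default


# Toy data: "c1" and "c2" are placeholder class labels.
verb_to_class = {"put": "c1", "place": "c1", "see": "c2"}
examples = [("put", "on", "V"), ("put", "on", "V"), ("see", "with", "N")]
verb_counts, class_counts = train(examples, verb_to_class)

# "place" never occurs in training, but shares class "c1" with "put".
seen_via_class = predict("place", "on", verb_counts, class_counts, verb_to_class)
```

Here the unseen verb "place" inherits the attachment preference of its classmate "put", which is the sparsity benefit in miniature.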

Chris Brew (Advisor)
Eric Fosler-Lussier (Committee Member)
Mike White (Committee Member)
189 p.

Recommended Citations

  • Jianguo, L. (2008). Hybrid Methods for Acquisition of Lexical Information: the Case for Verbs [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1228259857

    APA Style (7th edition)

  • Jianguo, Li. Hybrid Methods for Acquisition of Lexical Information: the Case for Verbs. 2008. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1228259857.

    MLA Style (8th edition)

  • Jianguo, Li. "Hybrid Methods for Acquisition of Lexical Information: the Case for Verbs." Doctoral dissertation, Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=osu1228259857

    Chicago Manual of Style (17th edition)