Linguistically Motivated Features for CCG Realization Ranking

Rajkumar, Rajakrishnan P.

2012, Doctor of Philosophy, Ohio State University, Linguistics.

Natural Language Generation (NLG) is the process of generating natural language text from an input consisting of a communicative goal and a database or knowledge base. Informally, the architecture of a standard NLG system consists of the following modules (Reiter and Dale, 2000): content determination, sentence planning (or microplanning), and surface realization. This thesis is about designing novel, linguistically motivated features for surface realization (the final NLG module mentioned above), the process by which text is created from an abstract representation of language according to the rules of syntax and morphology. Surface realization primarily involves three interrelated problems: constituent ordering, inflection and agreement, and function word insertion. To address these problems, most state-of-the-art realization ranking models (Velldal and Oepen, 2005; White and Rajkumar, 2009) employ features based on very basic insights from linguistic theory (for example, POS tags and rules derived from parse trees). More sophisticated insights from linguistic theory have not been widely perceived as necessary for increased system performance, with very basic insights providing the most gains (similar to the situation Johnson (2009) describes in the context of natural language parsing).

In contrast, our goal is to design features motivated by insights from theoretical linguistics and also based on cognitively plausible accounts of language comprehension discussed in the linguistics literature, so that the realization ranking model can better approximate human judgements of fluency and acceptability. We show that the minimal dependency length theory (Gibson, 1998; Temperley, 2007) helps with the constituent ordering problem in surface realization. For the problem of generating correctly inflected word forms, we demonstrate that a machine learning-based approach is well suited to encoding insights from the theoretical linguistics literature on English agreement (Kathol, 1999; Pollard and Sag, 1994). This approach leads to improvements over a competitive baseline model containing n-gram and parsing features (of the kind described in Johnson, 2009). Finally, we demonstrate empirically that the uniform information density principle discussed by Jaeger (2010) contributes to that-complementizer choice in the context of surface realization.
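
As a rough illustration of the quantity behind dependency length minimization, the sketch below sums the linear distance between each word and its syntactic head for two orderings of the same content. It is not the feature set used in the thesis: the example sentence, the hand-assigned head indices, and the names `Token` and `total_dependency_length` are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (word, index of its head word); the root word has head None.
# Head indices below are hand-assigned for illustration, not parser output.
Token = Tuple[str, Optional[int]]


def total_dependency_length(tokens: List[Token]) -> int:
    """Sum of |position - head position| over all non-root words."""
    return sum(
        abs(i - head)
        for i, (_, head) in enumerate(tokens)
        if head is not None
    )


# "John threw out the old trash"
particle_first: List[Token] = [
    ("John", 1), ("threw", None), ("out", 1),
    ("the", 5), ("old", 5), ("trash", 1),
]

# "John threw the old trash out"
particle_last: List[Token] = [
    ("John", 1), ("threw", None), ("the", 4),
    ("old", 4), ("trash", 1), ("out", 1),
]

candidates = {"particle_first": particle_first, "particle_last": particle_last}

# Pick the ordering with the smaller total dependency length.
preferred = min(
    candidates,
    key=lambda name: total_dependency_length(candidates[name]),
)

for name, tokens in candidates.items():
    print(name, total_dependency_length(tokens))
print("preferred:", preferred)
```

In a ranking model, a score of this kind would typically be one feature among many (alongside n-gram and syntactic features) rather than a hard selection rule as in this toy comparison.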

Michael White (Advisor)
Peter Culicover (Committee Member)
William Schuler (Committee Member)
167 p.

Recommended Citations

  • Rajkumar, R. P. (2012). Linguistically Motivated Features for CCG Realization Ranking [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1339673754

    APA Style (7th edition)

  • Rajkumar, Rajakrishnan. Linguistically Motivated Features for CCG Realization Ranking. 2012. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1339673754.

    MLA Style (8th edition)

  • Rajkumar, Rajakrishnan. "Linguistically Motivated Features for CCG Realization Ranking." Doctoral dissertation, Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1339673754

    Chicago Manual of Style (17th edition)