Erron: A Phrase-Based Machine Translation Approach to Customized Spelling Correction

Hovermale, DJ

2011, Doctor of Philosophy, Ohio State University, Linguistics.
Spellcheckers such as ASPELL and Microsoft Word 2007 were designed to correct the spelling errors of native writers of English (NWEs). While they are widely used and fairly effective at this task, they do not perform well when used to correct the spelling errors in English text written by Japanese writers of English as a foreign language (JWEFLs). The first contribution of this thesis is a comprehensive analysis of the English spelling errors of JWEFLs, with the goal of finding differences from NWE spelling errors that would cause such poor spellchecker performance. In addition to describing the patterns of characteristic JWEFL errors, I also provide hypotheses for why each error is observed. I find that JWEFL errors are 2-3 times more likely than NWE spelling errors to have a mistake in the first letter and 4-5 times more likely to contain more than one mistake. These facts make it very difficult for widely used English spellcheckers to correct JWEFL errors, because they are designed to exploit patterns in NWE spelling errors which simply are not present in Japanese learner text. I assert, however, that while JWEFL spelling errors do not have the same patterns as NWE errors, they do display patterns that can be exploited to make a custom spellchecker for JWEFL users.
This thesis describes the creation of a spellchecker customized for JWEFLs. Traditionally, spellcheckers have used edit distance to determine which words to include in a list of suggestions for the user. At its core, this method transforms the word on a letter-by-letter basis, using statistics and heuristics collected by examining large corpora of English spelling errors to determine a cost for transforming the misspelling into each word in the dictionary (word list). This does not work particularly well for JWEFL text, because JWEFLs often substitute strings of varying length as a single unit. Costing such a substitution letter by letter yields a very high total, so the correctly spelled word often does not have a low enough final cost to be included in the suggestion list. Recent spellchecking methods for English, however, have seen better results using generic string-to-string edits, which allow multiple letters to be transformed at the same time at a reduced cost (Brill and Moore, 2000; Toutanova and Moore, 2002). Boyd (2008) implements these methods for JWEFL text and reports success.
The second major contribution of this doctoral thesis is the novel use of tools from phrase-based machine translation to create a spellchecker customized for JWEFLs. Early efforts in statistical machine translation (Brown, 1993) translated sentences on a word-by-word basis. Competitive translation systems now employ a phrase-based method, which attempts to translate on a phrase-by-phrase basis. I use Moses (Koehn, 2007), a state-of-the-art phrase-based machine translation system, to create a spellchecker, Erron, which is customized to correct the spelling errors of JWEFL text. The system has two components: a letter-based model, Erron_LTR, and a pronunciation-based model, Erron_PRON. Erron_LTR outperforms ASPELL and Microsoft Word 2007 by roughly 30% in the 1-best case, and Erron_PRON outperforms both ASPELL and MS Word 2007 by roughly 20%. In addition, Erron_PRON has nearly 20% better performance than a state-of-the-art pronunciation-based spellchecker. The two models are combined into a single model, Erron_CMB, which performs 9% better than the combined model from Boyd (2008).
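The contrast the abstract draws between letter-by-letter edit distance and generic string-to-string edits can be illustrated with a minimal sketch. This is not the thesis's implementation (Erron is built on Moses), and it is not the Brill and Moore model either; it only shows why charging for a multi-letter substitution as one unit lowers the cost of the intended word. The function names, the example misspelling "gurasu" for "glass", and the chunk costs are all assumptions invented for illustration.

```python
def levenshtein(src, tgt):
    """Classic letter-by-letter edit distance with unit costs."""
    prev = list(range(len(tgt) + 1))
    for i, s in enumerate(src, 1):
        cur = [i]
        for j, t in enumerate(tgt, 1):
            cur.append(min(prev[j] + 1,              # delete s
                           cur[j - 1] + 1,           # insert t
                           prev[j - 1] + (s != t)))  # match or substitute
        prev = cur
    return prev[-1]


def chunk_distance(src, tgt, chunk_costs, max_len=3):
    """Edit distance that also allows string-to-string edits.

    chunk_costs maps (source_chunk, target_chunk) to a cost. In a real
    system these costs would be estimated from error data; here they are
    hypothetical values supplied by the caller.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            base = d[i][j]
            if base == inf:
                continue
            # Ordinary single-letter operations at unit cost.
            if i < n:
                d[i + 1][j] = min(d[i + 1][j], base + 1)
            if j < m:
                d[i][j + 1] = min(d[i][j + 1], base + 1)
            if i < n and j < m:
                step = base + (src[i] != tgt[j])
                d[i + 1][j + 1] = min(d[i + 1][j + 1], step)
            # Multi-letter chunks treated as a single edit.
            for a in range(max_len + 1):
                for b in range(max_len + 1):
                    if (a, b) == (0, 0) or i + a > n or j + b > m:
                        continue
                    cost = chunk_costs.get((src[i:i + a], tgt[j:j + b]))
                    if cost is not None:
                        d[i + a][j + b] = min(d[i + a][j + b], base + cost)
    return d[n][m]


# Hypothetical costs for two invented substitution patterns.
EXAMPLE_COSTS = {("ur", "l"): 0.4, ("su", "ss"): 0.4}

letter_cost = levenshtein("gurasu", "glass")
chunk_cost = chunk_distance("gurasu", "glass", EXAMPLE_COSTS)
print(letter_cost, chunk_cost)
```

With these invented costs, the letter-by-letter distance is 3 while the chunk-aware distance is 0.8, so under a fixed cost threshold the intended word is more likely to survive into the suggestion list.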
Chris Brew, Ph.D. (Advisor)
Mike White, Ph.D. (Committee Member)
William Schuler, Ph.D. (Advisor)
Eric Fosler-Lussier, Ph.D. (Committee Member)
135 p.

Recommended Citations

  • Hovermale, D. (2011). Erron: A Phrase-Based Machine Translation Approach to Customized Spelling Correction [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1322667721

    APA Style (7th edition)

  • Hovermale, DJ. Erron: A Phrase-Based Machine Translation Approach to Customized Spelling Correction. 2011. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1322667721.

    MLA Style (8th edition)

  • Hovermale, DJ. "Erron: A Phrase-Based Machine Translation Approach to Customized Spelling Correction." Doctoral dissertation, Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1322667721

    Chicago Manual of Style (17th edition)