Preserving subsegmental variation in modeling word segmentation (or, the raising of baby Mondegreen)

Rytting, Christopher Anton

2007, Doctor of Philosophy, Ohio State University, Linguistics.

Many computational models have been developed to show how infants break utterances into words before they have built a vocabulary, a problem known as the word segmentation task. However, these models have been tested on relatively few languages, with little attention paid to how different phonological structures may affect the relative effectiveness of particular statistical heuristics. Moreover, even for English, these models generally take transcriptions rather than speech as input, and so largely ignore the subsegmental variation naturally found in the speech signal. A model using transcriptional input therefore rests on unrealistic assumptions that may overestimate its effectiveness relative to how it would perform on more variable input, such as that found in speech.

This dissertation addresses the first of these two issues by comparing the performance of two classes of distribution-based statistical cues on a corpus of Modern Greek, a language whose phonotactic structure differs significantly from that of English. It shows how these differences change the relative effectiveness of the two classes of cues compared to their performance in English.

To address the second issue, this dissertation proposes an improved representation of the input that preserves the subsegmental variation inherently present in natural speech while remaining similar enough to previous models to allow straightforward, meaningful comparisons of performance. The proposed input representation uses an automatic phone classifier to replace the transcription-based phone labels in a corpus of English child-directed speech with real-valued phone probability vectors. These vectors then serve as input to a previously proposed connectionist model of word segmentation, in place of the invariant, transcription-based binary input vectors used in the original model.
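The contrast between the two input representations can be illustrated with a small sketch. This is not the dissertation's implementation: the toy phone inventory, the one-hot encoding of the binary vectors, the classifier scores, and the softmax normalization are all assumptions made for illustration only.

```python
import math

# Hypothetical toy phone inventory; the real inventory is not given in the abstract.
PHONES = ["b", "ey", "iy", "d", "ao", "g"]


def binary_vector(phone):
    """Transcription-based input: one unit on, all others off (assumed one-hot)."""
    return [1.0 if p == phone else 0.0 for p in PHONES]


def probability_vector(scores):
    """Real-valued input: normalize classifier scores into probabilities.

    A softmax is assumed here; the abstract does not say how the
    phone classifier produces its probabilities.
    """
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# The same token of "b", first as an idealized label, then as a noisy estimate.
transcribed = binary_vector("b")
made_up_scores = [2.0, 0.1, 0.0, 1.5, -1.0, 0.3]  # invented values, not real data
probabilistic = probability_vector(made_up_scores)

for phone, hard, soft in zip(PHONES, transcribed, probabilistic):
    print(f"{phone:>2}  binary={hard:.0f}  probability={soft:.3f}")
```

In the binary vector, every token of a given phone looks identical to the model. In the probability vector, some mass is spread over confusable phones (here the invented scores place "d" close to "b"), so two tokens of the same phone need not produce the same input.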

The performance of the connectionist model as reimplemented here suggests that real-valued inputs present a harder learning task than idealized inputs. In other words, the subsegmental variation hinders the model more than it helps. This may help explain why English-learning infants soon gravitate toward other, potentially more salient cues, such as lexical stress. However, the model still performs above chance even with very noisy input, consistent with studies showing that children can learn from distributional segmental cues alone.

Christopher Brew (Advisor)
217 p.

Recommended Citations

  • Rytting, C. A. (2007). Preserving subsegmental variation in modeling word segmentation (or, the raising of baby Mondegreen) [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1167698589

    APA Style (7th edition)

  • Rytting, Christopher. Preserving subsegmental variation in modeling word segmentation (or, the raising of baby Mondegreen). 2007. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1167698589.

    MLA Style (8th edition)

  • Rytting, Christopher. "Preserving subsegmental variation in modeling word segmentation (or, the raising of baby Mondegreen)." Doctoral dissertation, Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=osu1167698589

    Chicago Manual of Style (17th edition)