Building a prosodically sensitive diphone database for a Korean text-to-speech synthesis system

Yoon, Kyuchul

2005, Doctor of Philosophy, Ohio State University, Linguistics.
This dissertation describes the design and evaluation of a prosodically sensitive concatenative text-to-speech (TTS) synthesis system for Korean within the Festival TTS framework (Taylor et al., 1998). The primary task is to build a synthesis system that can test the idea that a speech segment is affected by its prosodic context and is subject to continuous allophonic and categorical allomorphic variation. The primary task comprises three subtasks. The first subtask is to model the allomorphic variation of Korean and to investigate the validity of using hand-written, linguistically motivated morphophonological rules in the form of grapheme-to-phoneme (GTP) conversion rules. The evaluation of the implemented GTP module showed that taking advantage of linguistic knowledge could greatly reduce the amount of training material required by any machine-learning approach and that error analysis becomes more informative and straightforward. The second subtask is to model positionally conditioned allophonic variation and to motivate segmental correlates of prosodic categories with a view to designing a prosodically sensitive diphone database. From a corpus of prosodically labeled read speech, we created a prosodically sensitive diphone database, selecting four different prosodic versions of the same diphone. The last subtask is to build a model of Korean prosody, i.e., a model of phrasing, fundamental frequency contour, and duration, using a corpus that has been morpho-syntactically parsed and prosodically labeled following the K-ToBI labeling conventions (Jun, 1993, 1998, 2000). Only the model of phrasing was implemented; trained on a set of morphosyntactic and textual distance features, it predicts the location of accentual and intonational phrase breaks. The results of these subtasks were incorporated into the TTS system, and the naturalness of the system's output was evaluated.
A listening experiment conducted with eighty native speakers of Korean, using stimuli synthesized by the TTS system, showed that listeners preferred stimuli composed of prosodically appropriate diphones. We interpret this as evidence for the idea that prosodically conditioned allophonic variation is a perceptible marker of the segmental encoding of prosodic domains.
Mary Beckman (Advisor)
291 p.

Recommended Citations

  • Yoon, K. (2005). Building a prosodically sensitive diphone database for a Korean text-to-speech synthesis system [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1119010941

    APA Style (7th edition)

  • Yoon, Kyuchul. Building a prosodically sensitive diphone database for a Korean text-to-speech synthesis system. 2005. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1119010941.

    MLA Style (8th edition)

  • Yoon, Kyuchul. "Building a prosodically sensitive diphone database for a Korean text-to-speech synthesis system." Doctoral dissertation, Ohio State University, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=osu1119010941

    Chicago Manual of Style (17th edition)