Text-Based Speech Video Synthesis from a Single Face Image


2019, Master of Science, Ohio State University, Electrical and Computer Engineering.
Speech video synthesis is the task of generating talking characters that look realistic to human evaluators. Most previous studies relied on animation models and required speech audio as input. Recent advances in Generative Adversarial Networks (GANs) have made it possible to modify images of real people. Instead of using audio signals, which already carry temporal information, this work uses text directly. Our system has three modules. First, an encoder-decoder Recurrent Neural Network (RNN) trained with a new scheme translates the text into Action Unit (AU) activation intensities and 3D head movements. Second, a conditional GAN synthesizes new images whose facial configuration corresponds to the AU activations. Third, 3D-rotated images with the corresponding head movements are generated to improve the visualization.
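The three-module pipeline in the abstract can be sketched as plain data flow. Everything below is an illustrative assumption, not the thesis code: the number of AUs, the frames-per-token rate, and the placeholder module bodies (random intensities instead of a trained RNN, a dictionary instead of a GAN-generated frame) are all hypothetical.

```python
# Hypothetical sketch of the three-module pipeline: text -> AU/pose,
# conditional synthesis per frame, then 3D rotation of each frame.
import random

NUM_AUS = 17          # assumed number of Action Units tracked
HEAD_POSE_DIMS = 3    # assumed 3D head rotation (yaw, pitch, roll)

def text_to_motion(tokens, frames_per_token=5):
    """Module 1 (stand-in for the encoder-decoder RNN): map a token
    sequence to per-frame AU intensities and head-pose angles."""
    n_frames = len(tokens) * frames_per_token
    aus = [[random.uniform(0.0, 5.0) for _ in range(NUM_AUS)]
           for _ in range(n_frames)]
    poses = [[random.uniform(-15.0, 15.0) for _ in range(HEAD_POSE_DIMS)]
             for _ in range(n_frames)]
    return aus, poses

def synthesize_frame(face_image, au_vector):
    """Module 2 (stand-in for the conditional GAN generator): produce a
    new frame conditioned on the single source face and an AU vector."""
    return {"source": face_image, "aus": au_vector}

def rotate_frame(frame, pose):
    """Module 3 (stand-in for the 3D rotation step): attach the head
    movement to the synthesized frame."""
    frame["pose"] = pose
    return frame

def synthesize_video(face_image, tokens):
    """Chain the three modules: one output frame per motion step."""
    aus, poses = text_to_motion(tokens)
    return [rotate_frame(synthesize_frame(face_image, a), p)
            for a, p in zip(aus, poses)]

video = synthesize_video("face.png", "hello world".split())
print(len(video))  # 2 tokens * 5 frames/token = 10 frames
```

The point of the sketch is the interface: only a single face image and a token sequence go in, and temporal structure comes entirely from the predicted AU/pose sequence rather than from an audio track.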
Aleix Martinez (Advisor)
Yingbin Liang (Committee Member)
35 p.

Recommended Citations

  • Zheng, Y. (2019). Text-Based Speech Video Synthesis from a Single Face Image [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1572168353691788

    APA Style (7th edition)

  • Zheng, Yilin. Text-Based Speech Video Synthesis from a Single Face Image. 2019. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1572168353691788.

    MLA Style (8th edition)

  • Zheng, Yilin. "Text-Based Speech Video Synthesis from a Single Face Image." Master's thesis, Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1572168353691788

    Chicago Manual of Style (17th edition)