Text-Based Speech Video Synthesis from a Single Face Image


2019, Master of Science, Ohio State University, Electrical and Computer Engineering.
Speech video synthesis is the task of generating talking characters that look realistic to human evaluators. Most previous studies relied on animation models and required speech audio as input. Recent advances in Generative Adversarial Networks (GANs) have made it possible to modify images of real people. Instead of using audio signals, which already carry temporal information, this work uses text directly. Our system has three modules. First, an encoder-decoder Recurrent Neural Network (RNN) trained with a new scheme translates the text into Action Unit (AU) activation intensities and 3D head movements. Second, a conditional GAN synthesizes new images whose facial configuration corresponds to the AU activations. Third, 3D-rotated images with the corresponding head movements are generated to improve the visualization.
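The three-module pipeline in the abstract can be sketched as plain data flow. Everything below is an illustrative assumption, not the thesis code: the number of AUs, the frames-per-token rate, and the placeholder module bodies (random intensities instead of a trained RNN, a dictionary instead of a GAN-generated frame) are all hypothetical.

```python
# Hypothetical sketch of the three-module pipeline: text -> AU/pose,
# conditional synthesis per frame, then 3D rotation of each frame.
import random

NUM_AUS = 17          # assumed number of Action Units tracked
HEAD_POSE_DIMS = 3    # assumed 3D head rotation (yaw, pitch, roll)

def text_to_motion(tokens, frames_per_token=5):
    """Module 1 (stand-in for the encoder-decoder RNN): map a token
    sequence to per-frame AU intensities and head-pose angles."""
    n_frames = len(tokens) * frames_per_token
    aus = [[random.uniform(0.0, 5.0) for _ in range(NUM_AUS)]
           for _ in range(n_frames)]
    poses = [[random.uniform(-15.0, 15.0) for _ in range(HEAD_POSE_DIMS)]
             for _ in range(n_frames)]
    return aus, poses

def synthesize_frame(face_image, au_vector):
    """Module 2 (stand-in for the conditional GAN generator): produce a
    new frame conditioned on the single source face and an AU vector."""
    return {"source": face_image, "aus": au_vector}

def rotate_frame(frame, pose):
    """Module 3 (stand-in for the 3D rotation step): attach the head
    movement to the synthesized frame."""
    frame["pose"] = pose
    return frame

def synthesize_video(face_image, tokens):
    """Chain the three modules: one output frame per motion step."""
    aus, poses = text_to_motion(tokens)
    return [rotate_frame(synthesize_frame(face_image, a), p)
            for a, p in zip(aus, poses)]

video = synthesize_video("face.png", "hello world".split())
print(len(video))  # 2 tokens * 5 frames/token = 10 frames
```

The point of the sketch is the interface: only a single face image and a token sequence go in, and temporal structure comes entirely from the predicted AU/pose sequence rather than from an audio track.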
Aleix Martinez (Advisor)
Yingbin Liang (Committee Member)
35 p.

Recommended Citations

  • Zheng, Y. (2019). Text-Based Speech Video Synthesis from a Single Face Image [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1572168353691788

    APA Style (7th edition)

  • Zheng, Yilin. Text-Based Speech Video Synthesis from a Single Face Image. 2019. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1572168353691788.

    MLA Style (8th edition)

  • Zheng, Yilin. "Text-Based Speech Video Synthesis from a Single Face Image." Master's thesis, Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1572168353691788

    Chicago Manual of Style (17th edition)