Files
MS_Thesis.pdf (27.33 MB)
Text-Based Speech Video Synthesis from a Single Face Image
Author Info
Zheng, Yilin
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=osu1572168353691788
Year and Degree
2019, Master of Science, Ohio State University, Electrical and Computer Engineering.
Abstract
Speech video synthesis is the task of generating talking characters that look realistic to human evaluators. Most previous studies used animation models and required speech audio as input. Recent advances in Generative Adversarial Networks (GANs) have made it possible to modify images of real people. In this work, instead of audio signals, which already carry temporal information, we use text directly as input. Our system has three modules. First, an encoder-decoder Recurrent Neural Network (RNN), trained with a new training scheme, maps the text to Action Unit (AU) activation intensities and 3D head movements. Second, a conditional GAN synthesizes new images whose facial configuration corresponds to the AU activations. Third, 3D-rotated images reflecting the corresponding head movements are generated to improve the visualization.
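The first module can be pictured as a sequence-to-sequence mapping from characters to per-frame facial motion. Below is a minimal, untrained sketch in plain Python that shows only the data flow: the encoder folds the text into a hidden state, and the decoder unrolls it into one AU/head-pose vector per video frame. It assumes a character-level vocabulary, vanilla (Elman) RNN cells, 12 AUs with intensities normalized to (0, 1), and head pose as (yaw, pitch, roll) in radians. All of these, along with the class name `TextToFacialMotion` and the `num_frames` argument, are hypothetical choices for illustration, not the thesis's architecture, and the sketch does not reproduce its training scheme.

```python
import math
import random

# Hypothetical sizes: not taken from the thesis.
NUM_AUS = 12     # Action Units predicted per frame (assumption)
POSE_DIMS = 3    # head rotation as (yaw, pitch, roll) in radians (assumption)
HIDDEN = 16      # hidden-state size of encoder and decoder (assumption)


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(w_in, w_h, x, h):
    """One Elman RNN step: h' = tanh(W_in x + W_h h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_h, h))]


class TextToFacialMotion:
    """Hypothetical untrained encoder-decoder RNN: text -> per-frame AUs and head pose."""

    def __init__(self, vocab, seed=0):
        rng = random.Random(seed)
        self.vocab = {c: i for i, c in enumerate(vocab)}
        out = NUM_AUS + POSE_DIMS
        self.enc_in = rand_matrix(HIDDEN, len(vocab), rng)
        self.enc_h = rand_matrix(HIDDEN, HIDDEN, rng)
        self.dec_in = rand_matrix(HIDDEN, out, rng)
        self.dec_h = rand_matrix(HIDDEN, HIDDEN, rng)
        self.head = rand_matrix(out, HIDDEN, rng)

    def one_hot(self, ch):
        vec = [0.0] * len(self.vocab)
        vec[self.vocab[ch]] = 1.0
        return vec

    def __call__(self, text, num_frames):
        # Encoder: summarize the whole character sequence in one hidden state.
        h = [0.0] * HIDDEN
        for ch in text:
            h = rnn_step(self.enc_in, self.enc_h, self.one_hot(ch), h)
        # Decoder: unroll num_frames steps, feeding back the previous frame's output.
        prev = [0.0] * (NUM_AUS + POSE_DIMS)
        frames = []
        for _ in range(num_frames):
            h = rnn_step(self.dec_in, self.dec_h, prev, h)
            y = matvec(self.head, h)
            # Sigmoid keeps AU intensities in (0, 1).
            aus = [1.0 / (1.0 + math.exp(-a)) for a in y[:NUM_AUS]]
            # Scaled tanh keeps each rotation angle within +/- pi/6 (30 degrees).
            pose = [math.pi / 6 * math.tanh(p) for p in y[NUM_AUS:]]
            frames.append((aus, pose))
            prev = aus + pose
        return frames


if __name__ == "__main__":
    model = TextToFacialMotion("abcdefghijklmnopqrstuvwxyz ")
    frames = model("hello world", num_frames=25)
    aus, pose = frames[0]
    print(len(frames), len(aus), len(pose))  # 25 12 3
```

Text carries no timing of its own, so the sketch leaves the output length as an explicit `num_frames` argument. This abstract does not state how the thesis determines the duration of the generated video.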
Committee
Aleix Martinez (Advisor)
Yingbin Liang (Committee Member)
Pages
35 p.
Subject Headings
Computer Engineering; Computer Science
Keywords
Face Image Synthesis, Generative Adversarial Network, Recurrent Neural Network
Recommended Citations
APA Style (7th edition)
Zheng, Y. (2019). Text-Based Speech Video Synthesis from a Single Face Image [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1572168353691788
MLA Style (8th edition)
Zheng, Yilin. Text-Based Speech Video Synthesis from a Single Face Image. 2019. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1572168353691788.
Chicago Manual of Style (17th edition)
Zheng, Yilin. "Text-Based Speech Video Synthesis from a Single Face Image." Master's thesis, Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1572168353691788
Document number: osu1572168353691788
Download Count: 292
Copyright Info
© 2019, some rights reserved.
Text-Based Speech Video Synthesis from a Single Face Image by Yilin Zheng is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. Based on a work at etd.ohiolink.edu.
This open access ETD is published by The Ohio State University and OhioLINK.