Files
dissertation-final.pdf (1.58 MB)
Mitigation of Data Scarcity Issues for Semantic Classification in a Virtual Patient Dialogue Agent
Author Info
Stiff, Adam
ORCID® Identifier: http://orcid.org/0000-0002-5158-8508
Permalink: http://rave.ohiolink.edu/etdc/view?acc_num=osu1591007163243306
Year and Degree
2020, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Abstract
We introduce a virtual patient question-answering dialogue system, used for training medical students to interview real patients, which presents many unique opportunities for research in linguistics, speech, and dialogue. Among the most challenging research topics at this point in the system’s development are issues relating to scarcity of training data. We address three main problems. The first challenge is that many questions are very rarely asked of the virtual patient, which leaves little data to learn adequate models of these questions. We validate one approach to this problem, which is to combine a statistical question classification model with a rule-based system, by deploying it in an experiment with live users. Additional work further improves rare question performance by utilizing a recurrent neural network model with a multi-headed self-attention mechanism. We contribute an analysis of the reasons for this improved performance, highlighting specialization and overlapping concerns in independent components of the model. Another data scarcity problem for the virtual patient project is the challenge of adequately characterizing questions that are deemed out-of-scope. By definition, these types of questions are infinite, so this problem is particularly challenging. We contribute a characterization of the problem as it manifests in our domain, as well as a baseline approach to handling the issue, and an analysis of the corresponding improvement in performance. Finally, we contribute a method for improving performance of domain-specific tasks such as ours, which use off-the-shelf speech recognition as inputs, when no in-domain speech data is available. This method augments text training data for the downstream task with inferred phonetic representations, to make the downstream task tolerant of speech recognition errors. We also see performance improvements from sampling simulated errors to replace the text inputs during training. 
Future enhancements to the spoken dialogue capabilities of the virtual patient are also considered.
Committee
Eric Fosler-Lussier, PhD (Advisor)
Michael White, PhD (Committee Member)
Yu Su, PhD (Committee Member)
Pages
167 p.
Subject Headings
Artificial Intelligence; Computer Science; Educational Software; Linguistics
Keywords
question-answering; dialogue agent; spoken dialogue; virtual patient; standardized patient; semantic classification; data scarcity; data sparsity; neural network; machine learning
Recommended Citations
Stiff, A. (2020). Mitigation of Data Scarcity Issues for Semantic Classification in a Virtual Patient Dialogue Agent [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1591007163243306
APA Style (7th edition)
Stiff, Adam. Mitigation of Data Scarcity Issues for Semantic Classification in a Virtual Patient Dialogue Agent. 2020. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1591007163243306.
MLA Style (8th edition)
Stiff, Adam. "Mitigation of Data Scarcity Issues for Semantic Classification in a Virtual Patient Dialogue Agent." Doctoral dissertation, Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1591007163243306
Chicago Manual of Style (17th edition)
Document number: osu1591007163243306
Download Count: 208
Copyright Info
© 2020, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.