Mitigation of Data Scarcity Issues for Semantic Classification in a Virtual Patient Dialogue Agent

2020, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
We introduce a virtual patient question-answering dialogue system, used to train medical students to interview real patients, which presents many unique opportunities for research in linguistics, speech, and dialogue. Among the most challenging research topics at this point in the system’s development are issues relating to scarcity of training data. We address three main problems.
The first problem is that many questions are very rarely asked of the virtual patient, which leaves too little data to learn adequate models of these questions. We validate one approach to this problem, combining a statistical question classification model with a rule-based system, by deploying it in an experiment with live users. Additional work further improves rare question performance by utilizing a recurrent neural network model with a multi-headed self-attention mechanism. We contribute an analysis of the reasons for this improved performance, highlighting specialization and overlapping concerns in independent components of the model.
The second problem is the challenge of adequately characterizing questions that are deemed out-of-scope. By definition, the space of such questions is unbounded, which makes this problem particularly challenging. We contribute a characterization of the problem as it manifests in our domain, as well as a baseline approach to handling the issue, and an analysis of the corresponding improvement in performance.
Finally, we contribute a method for improving performance on domain-specific tasks such as ours, which take the output of off-the-shelf speech recognition as input, when no in-domain speech data is available. This method augments text training data for the downstream task with inferred phonetic representations, making the downstream task tolerant of speech recognition errors. We also see performance improvements from sampling simulated errors to replace the text inputs during training. Future enhancements to the spoken dialogue capabilities of the virtual patient are also considered.
Eric Fosler-Lussier, PhD (Advisor)
Michael White, PhD (Committee Member)
Yu Su, PhD (Committee Member)
167 p.

Recommended Citations

  • Stiff, A. (2020). Mitigation of Data Scarcity Issues for Semantic Classification in a Virtual Patient Dialogue Agent [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1591007163243306

    APA Style (7th edition)

  • Stiff, Adam. Mitigation of Data Scarcity Issues for Semantic Classification in a Virtual Patient Dialogue Agent. 2020. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1591007163243306.

    MLA Style (8th edition)

  • Stiff, Adam. "Mitigation of Data Scarcity Issues for Semantic Classification in a Virtual Patient Dialogue Agent." Doctoral dissertation, Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1591007163243306

    Chicago Manual of Style (17th edition)