Novel Architectures for Human Voice and Environmental Sound Recognition using Machine Learning Algorithms

2018, Master of Science, University of Toledo, Electrical Engineering.
Real-time voice recognition and environmental sound detection play an important role in security, home control systems, robotics, and speech forensics. Their advantages and potential demand in these industries motivated this work. Voice recognition and environmental sound detection are challenging because sound signals are highly variable, and environmental noise makes recognition even more difficult. Various methods and architectures have been introduced for both voice and sound recognition to date. However, because of limitations in these architectures, we propose two different architectures, one for voice recognition and one for background sound detection, that aim to overcome the limitations of previously proposed architectures. For environmental sound detection, we present a real-time method in which features are extracted using standard signal processing techniques and classification is performed by standard ML-based classifiers. The extracted features are time-domain features such as ZCR and STE and frequency-domain features such as SC, SR, and SF. Pitch is determined using the Average Magnitude Difference Function (AMDF). For classification, we use robust and accurate ML techniques such as SVM, RF, and DNN. Similarly, for voice recognition, we present a novel pipelined real-time end-to-end voice recognition architecture that enhances recognition performance by exploiting the advantages of GF and CNN. This architecture was developed to provide a voice-user interface and to aid in voice-based authentication and integration with an existing NLP system. Gaining secure access to existing NLP systems was also one of the primary goals. In this work, we first identify challenges related to real-time voice recognition and highlight up-to-date research in the field.
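The time-domain features and the AMDF pitch estimate named in the abstract can be illustrated with a minimal sketch. It assumes that ZCR stands for zero-crossing rate and STE for short-time energy, and it is not the thesis implementation: the function names, the 50 to 500 Hz pitch search range, and the choice of the global AMDF minimum are all illustrative assumptions. The frequency-domain features (SC, SR, SF) are omitted.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Assumes the frame holds at least two samples.
    """
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def amdf_pitch(frame, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate pitch in Hz as the lag that minimizes the AMDF.

    The search range (f_min, f_max) is a hypothetical default.
    Taking the global minimum is the simplest choice and can
    produce octave errors on real recordings.
    """
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    if lag_max < lag_min:
        raise ValueError("frame too short for the requested pitch range")

    best_lag, best_value = lag_min, float("inf")
    for lag in range(lag_min, lag_max + 1):
        n = len(frame) - lag
        # Average magnitude difference between the frame and its shifted copy.
        value = sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n
        if value < best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag
```

For a 100 Hz sine sampled at 8 kHz, the AMDF minimum within a 60 to 400 Hz search range falls at a lag of 80 samples, which corresponds to 100 Hz. The search range should stay narrower than one octave below the expected pitch, because the AMDF is also near zero at integer multiples of the true period.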
Further, we analyze the functional requirements of a voice recognition system and introduce mechanisms that can address these requirements through our novel architecture. Subsequently, we discuss the effect of different mechanisms such as CNN, GF, and statistical parameters in feature extraction. For classification, standard classifiers such as SVM, RF, and DNN are investigated. To verify the validity and effectiveness of the proposed architecture, we compared different parameters, including accuracy, sensitivity, and specificity, with those of the standard AlexNet architecture.
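The three evaluation parameters named above follow directly from confusion-matrix counts. The sketch below shows the standard binary definitions. The function name and the label encoding are hypothetical, and the abstract does not say how the thesis handled the multi-class case, so this is only an illustration of the metrics themselves.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity for a binary task.

    Assumes both classes occur in y_true; otherwise one of the
    ratios below would divide by zero.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # correctly detected positive
            else:
                fp += 1  # false alarm
        else:
            if truth == positive:
                fn += 1  # missed positive
            else:
                tn += 1  # correctly rejected negative

    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }
```

Sensitivity and specificity are reported alongside accuracy because accuracy alone can look high on imbalanced data even when one class is rarely recognized.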
Vijay Devabhaktuni (Committee Chair)
Ahmad Javaid (Committee Co-Chair)
Richard Molyet (Committee Member)
88 p.

Recommended Citations

  • Dhakal, P. (2018). Novel Architectures for Human Voice and Environmental Sound Recognition using Machine Learning Algorithms [Master's thesis, University of Toledo]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1531349806743278

    APA Style (7th edition)

  • Dhakal, Parashar. Novel Architectures for Human Voice and Environmental Sound Recognition using Machine Learning Algorithms. 2018. University of Toledo, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=toledo1531349806743278.

    MLA Style (8th edition)

  • Dhakal, Parashar. "Novel Architectures for Human Voice and Environmental Sound Recognition using Machine Learning Algorithms." Master's thesis, University of Toledo, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1531349806743278

    Chicago Manual of Style (17th edition)