Labhesh thesis 1.9.20.pdf (2 MB)
AN ATTENTION BASED DEEP NEURAL NETWORK FOR VISUAL QUESTION ANSWERING SYSTEM
Author Info
Popli, Labhesh
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=csu1579015180507068
Abstract Details
Year and Degree
2019, Master of Science in Software Engineering, Cleveland State University, Washkewicz College of Engineering.
Abstract
With advances in internet computing and the great success of social media websites, the internet has been flooded with a huge number of digital images. Searching for appropriate images directly through search engines and the web is increasingly common. However, automatically finding images relevant to the content of a textual query remains a very challenging task. Visual Question Answering (VQA) has emerged as a significant multidisciplinary research problem. It combines methodologies from different areas such as natural language processing, image recognition, and knowledge representation. The main challenges in developing a VQA system are making the solution scalable and handling two kinds of features at once: the objects in the image and the natural-language question. Prior work has built VQA models that extract and combine image features using a Convolutional Neural Network (CNN) and textual features using a Recurrent Neural Network (RNN). This thesis explores methodologies for building a VQA system that can automatically identify and answer a question about an image presented to it. The system uses a deep Residual Network (ResNet), an advanced CNN model, for image identification. It uses a Long Short-Term Memory (LSTM) network, an advanced form of RNN, for Natural Language Processing (NLP) to analyze a user-provided question. Finally, the image features and the question features are combined to determine an attention area. The deep residual network then focuses on identifying the objects in that area of the image to produce a textual answer.
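The abstract says that the ResNet image features and the LSTM question features are combined to pick an attention area, but it does not give the attention formula. The following is a minimal pure-Python sketch under one assumed choice: additive (tanh) soft attention in the style of stacked attention networks. Each image region is scored against the question, and the scores are normalized with a softmax. The weighted sum of region features is then added to the question vector to score candidate answers. The function names, the weight matrices `w_img`, `w_q`, `w_att`, `w_ans`, and the sum-based fusion are all hypothetical illustrations, not the thesis's implementation. In a real system the regions would come from a ResNet feature map, the question vector from an LSTM, and the weights would be learned.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def question_guided_attention(regions: List[Vector], question: Vector,
                              w_img: Matrix, w_q: Matrix,
                              w_att: Vector) -> Tuple[Vector, Vector]:
    """Additive soft attention over image regions (hypothetical sketch).

    regions:  K region feature vectors of size d (e.g. cells of a CNN feature map)
    question: question encoding of size d (e.g. a final LSTM hidden state)
    w_img, w_q: k x d projection matrices (assumed learned weights)
    w_att:    length-k scoring vector (assumed learned weight)
    Returns (attention weights over the K regions, attended image vector of size d).
    """
    q_proj = matvec(w_q, question)
    scores = []
    for r in regions:
        r_proj = matvec(w_img, r)
        # Joint hidden layer: tanh(W_img r + W_q q)
        hidden = [math.tanh(a + b) for a, b in zip(r_proj, q_proj)]
        scores.append(sum(w * h for w, h in zip(w_att, hidden)))
    alpha = softmax(scores)  # one weight per region, summing to 1
    d = len(regions[0])
    attended = [sum(alpha[k] * regions[k][i] for k in range(len(regions)))
                for i in range(d)]
    return alpha, attended


def answer_probabilities(attended: Vector, question: Vector,
                         w_ans: Matrix) -> Vector:
    """Fuse attended image and question by element-wise sum (an assumption),
    then map to a softmax distribution over a fixed answer vocabulary."""
    fused = [a + q for a, q in zip(attended, question)]
    return softmax(matvec(w_ans, fused))
```

For example, with three 2-dimensional regions `[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]`, question `[1.0, 0.0]`, identity projections, and `w_att = [1.0, 1.0]`, the second region receives the largest weight. The attended vector is a convex combination of the region features, so the question only changes *where* the model looks and not the feature space it reads from.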
When evaluated on the well-known, challenging COCO and VQA 1.0 datasets, the system achieved an accuracy of 59%, a 12% increase over a baseline model without the attention-based technique. The results also show performance comparable to other existing state-of-the-art attention-based approaches in the literature. The quality and accuracy of the method used in this research are compared and analyzed.
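The abstract reports 59% accuracy on VQA 1.0 but does not say how accuracy was computed. The sketch below assumes the standard VQA 1.0 accuracy measure, in which each question has 10 human answers and a predicted answer scores min(number of matching human answers / 3, 1). It uses this commonly cited simplified form. The official evaluation script also normalizes answer strings more thoroughly and averages over subsets of 9 of the 10 annotators, and both steps are omitted here. The function names and the lowercase/strip normalization are hypothetical.

```python
from typing import List


def vqa_accuracy(predicted: str, human_answers: List[str]) -> float:
    """Simplified VQA accuracy for one question: full credit if at least
    3 annotators gave the predicted answer, partial credit otherwise."""
    norm = predicted.strip().lower()  # minimal normalization (assumption)
    matches = sum(1 for a in human_answers if a.strip().lower() == norm)
    return min(matches / 3.0, 1.0)


def mean_accuracy(predictions: List[str],
                  all_human_answers: List[List[str]]) -> float:
    """Average the per-question accuracy over a dataset split."""
    scores = [vqa_accuracy(p, h) for p, h in zip(predictions, all_human_answers)]
    return sum(scores) / len(scores)
```

Under this metric an answer that only two annotators gave earns 2/3 credit. A reported figure such as 59% is therefore an average of partial credits, not a strict exact-match rate.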
Committee
Sunnie Chung (Advisor)
Wenbing Zhao (Committee Member)
Yongjian Fu (Committee Member)
Subject Headings
Artificial Intelligence; Computer Engineering; Computer Science; Scientific Imaging
Recommended Citations
APA Style (7th edition)
Popli, L. (2019). AN ATTENTION BASED DEEP NEURAL NETWORK FOR VISUAL QUESTION ANSWERING SYSTEM [Master's thesis, Cleveland State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=csu1579015180507068
MLA Style (8th edition)
Popli, Labhesh. AN ATTENTION BASED DEEP NEURAL NETWORK FOR VISUAL QUESTION ANSWERING SYSTEM. 2019. Cleveland State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=csu1579015180507068.
Chicago Manual of Style (17th edition)
Popli, Labhesh. "AN ATTENTION BASED DEEP NEURAL NETWORK FOR VISUAL QUESTION ANSWERING SYSTEM." Master's thesis, Cleveland State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=csu1579015180507068
Document number:
csu1579015180507068
Download Count:
633
Copyright Info
© 2019, all rights reserved.
This open access ETD is published by Cleveland State University and OhioLINK.