AN ATTENTION BASED DEEP NEURAL NETWORK FOR VISUAL QUESTION ANSWERING SYSTEM

2019, Master of Science in Software Engineering, Cleveland State University, Washkewicz College of Engineering.
With advances in internet computing and the great success of social media websites, the internet has been flooded with an enormous number of digital images. Searching for appropriate images directly through search engines and the web has become commonplace; however, automatically finding images relevant to a textual query remains a very challenging task. Visual Question Answering (VQA) has emerged as a significant multidisciplinary research problem, combining methodologies from areas such as natural language processing, image recognition, and knowledge representation. The main challenges in developing such a VQA system are dealing with the scalability of the solution and handling features of the objects in an image and of questions in natural language simultaneously. Prior work has developed VQA models by extracting and combining image features using a Convolutional Neural Network (CNN) and textual features using a Recurrent Neural Network (RNN). This thesis explores methodologies to build a Visual Question Answering (VQA) system that can automatically identify and answer a question about an image presented to it. The VQA system uses a deep Residual Network (ResNet), an advanced Convolutional Neural Network (CNN) model, for image identification, and a Long Short-Term Memory (LSTM) network, an advanced form of Recurrent Neural Network (RNN) for Natural Language Processing (NLP), to analyze the user-provided question. Finally, the features from the image and the question are combined to indicate an attention area of the image on which the deep residual network focuses to identify objects and produce a textual answer. When evaluated on the well-known, challenging COCO and VQA 1.0 datasets, the system produced an accuracy of 59%, a 12% increase over a baseline model without the attention-based technique, and the results show performance comparable to other existing state-of-the-art attention-based approaches in the literature. The quality and the accuracy of the method used in this research are compared and analyzed.
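
A minimal sketch of the architecture described above is given below, assuming PyTorch and torchvision. The layer sizes, answer vocabulary size, single attention glimpse, and ResNet-152 backbone are illustrative assumptions, not the exact configuration used in the thesis.

```python
# Sketch of an attention-based VQA model: ResNet image features + LSTM
# question encoding, fused through a soft attention map over image regions.
# All dimensions below are assumed for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet152


class AttentionVQA(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=1024,
                 num_answers=1000, feat_dim=2048):
        super().__init__()
        # Image encoder: ResNet with the classification head removed,
        # keeping the final 7x7 grid of 2048-d region features.
        cnn = resnet152(weights=None)
        self.cnn = nn.Sequential(*list(cnn.children())[:-2])
        for p in self.cnn.parameters():
            p.requires_grad = False  # use the CNN as a fixed feature extractor

        # Question encoder: word embeddings followed by an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

        # Attention: score each image region against the question encoding.
        self.att_img = nn.Linear(feat_dim, hidden_dim)
        self.att_q = nn.Linear(hidden_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)

        # Classifier over a fixed answer vocabulary.
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim + hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, image, question):
        # image: (B, 3, 224, 224); question: (B, T) of token ids
        feats = self.cnn(image)                          # (B, 2048, 7, 7)
        B, C, H, W = feats.shape
        feats = feats.view(B, C, H * W).transpose(1, 2)  # (B, 49, 2048)

        _, (h, _) = self.lstm(self.embed(question))      # h: (1, B, hidden)
        q = h.squeeze(0)                                 # (B, hidden)

        # Attention weights over the 49 image regions.
        scores = self.att_score(torch.tanh(
            self.att_img(feats) + self.att_q(q).unsqueeze(1)))  # (B, 49, 1)
        alpha = F.softmax(scores, dim=1)
        attended = (alpha * feats).sum(dim=1)            # (B, 2048)

        # Fuse attended image features with the question and predict an answer.
        return self.classifier(torch.cat([attended, q], dim=1))


if __name__ == "__main__":
    model = AttentionVQA()
    img = torch.randn(2, 3, 224, 224)
    qs = torch.randint(1, 10000, (2, 14))
    print(model(img, qs).shape)  # torch.Size([2, 1000])
```

In this style of model, the softmax over the spatial regions acts as the attention map: regions whose features align with the question encoding contribute most to the fused representation that the answer classifier sees.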
Sunnie Chung (Advisor)
Wenbing Zhao (Committee Member)
Yongjian Fu (Committee Member)

Recommended Citations

  • Popli, L. (2019). AN ATTENTION BASED DEEP NEURAL NETWORK FOR VISUAL QUESTION ANSWERING SYSTEM [Master's thesis, Cleveland State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=csu1579015180507068

    APA Style (7th edition)

  • Popli, Labhesh. AN ATTENTION BASED DEEP NEURAL NETWORK FOR VISUAL QUESTION ANSWERING SYSTEM. 2019. Cleveland State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=csu1579015180507068.

    MLA Style (8th edition)

  • Popli, Labhesh. "AN ATTENTION BASED DEEP NEURAL NETWORK FOR VISUAL QUESTION ANSWERING SYSTEM." Master's thesis, Cleveland State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=csu1579015180507068

    Chicago Manual of Style (17th edition)