Binary Recurrent Unit: Using FPGA Hardware to Accelerate Inference in Long Short-Term Memory Neural Networks

2018, Master of Science (M.S.), University of Dayton, Electrical Engineering.
Long Short-Term Memory (LSTM) is a powerful neural network algorithm that has been shown to provide state-of-the-art performance in various sequence learning tasks, including natural language processing, video classification, and speech recognition. Once an LSTM model has been trained on a dataset, its utility comes from its ability to infer information from new, unseen data. Because of the complexity of LSTM models, this so-called inference stage can demand significant computing power and memory resources in order to keep up with a real-time workload. Many approaches have been taken to accelerate inference, from offloading computations to a GPU or other specialized hardware, to compressing model parameters so as to reduce both the number of computations and the memory footprint required. This work takes a two-pronged approach to accelerating LSTM inference. First, a model compression scheme called binarization is identified, which both reduces the storage size of model parameters and simplifies computations. This technique is applied to training LSTMs for two separate sequence learning tasks, and it is shown to provide prediction performance comparable to that of the uncompressed models. Then, a digital processor architecture, called the Binary Recurrent Unit (BRU), is proposed to accelerate inference for binarized LSTM models. Specifically targeted at FPGA implementation, this accelerator takes advantage of binary model weights and on-chip memory resources to parallelize LSTM inference computations. The BRU architecture is implemented and tested on a Xilinx Z7020 device clocked at 200 MHz. BRU inference computation time is evaluated against CPU and GPU inference implementations; BRU is shown to outperform the CPU by as much as 39X and the GPU by as much as 3.8X.
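
To make the binarization idea concrete, the following is a minimal sketch of one inference step of an LSTM cell with sign-binarized weights, in the style of common weight-binarization schemes (weights constrained to +1/-1). All names, shapes, and the gate ordering here are illustrative assumptions for exposition; they do not reflect the thesis's actual BRU implementation.

import numpy as np

def binarize(w):
    """Map real-valued trained weights to +1/-1 (sign binarization)."""
    return np.where(w >= 0, 1.0, -1.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, Wb, Ub, b):
    """One LSTM time step with binarized weight matrices.

    Wb: (4H, D) binarized input weights, Ub: (4H, H) binarized
    recurrent weights, b: (4H,) real-valued biases.
    Assumed gate order: input, forget, cell candidate, output.
    """
    # With +/-1 weights, every multiply in these matrix-vector
    # products degenerates to an addition or a subtraction.
    z = Wb @ x + Ub @ h + b
    H = h.shape[0]
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2*H])        # forget gate
    g = np.tanh(z[2*H:3*H])      # cell candidate
    o = sigmoid(z[3*H:4*H])      # output gate
    c_new = f * c + i * g        # updated cell state
    h_new = o * np.tanh(c_new)   # updated hidden state
    return h_new, c_new

# Example: binarize (hypothetical) trained weights, then run one step.
D, H = 16, 32
rng = np.random.default_rng(0)
Wb = binarize(rng.standard_normal((4*H, D)))
Ub = binarize(rng.standard_normal((4*H, H)))
b = np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.standard_normal(D), h, c, Wb, Ub, b)

Because each binarized weight occupies a single bit and each multiply reduces to an add or subtract, a binarized model is far cheaper to store and compute, which is what allows an FPGA design such as BRU to hold model parameters in on-chip memory and parallelize the gate computations.
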
Tarek Taha, PhD (Advisor)
Eric Balster, PhD (Committee Member)
Vijayan Asari, PhD (Committee Member)
107 p.

Recommended Citations

  • Mealey, T. C. (2018). Binary Recurrent Unit: Using FPGA Hardware to Accelerate Inference in Long Short-Term Memory Neural Networks [Master's thesis, University of Dayton]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1524402925375566

    APA Style (7th edition)

  • Mealey, Thomas. Binary Recurrent Unit: Using FPGA Hardware to Accelerate Inference in Long Short-Term Memory Neural Networks. 2018. University of Dayton, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=dayton1524402925375566.

    MLA Style (8th edition)

  • Mealey, Thomas. "Binary Recurrent Unit: Using FPGA Hardware to Accelerate Inference in Long Short-Term Memory Neural Networks." Master's thesis, University of Dayton, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1524402925375566

    Chicago Manual of Style (17th edition)