Learning Sparse Recurrent Neural Networks in Language Modeling


2014, Master of Science, Ohio State University, Computer Science and Engineering.
In the context of statistical language modeling, we explored the task of learning an Elman network with sparse weight matrices, as a pilot study towards learning a sparsely connected fully recurrent neural network, which could be useful in many settings. We also explored how efficient and scalable this approach can be in practice. In particular, we addressed the following tasks: (1) We adapted the Iterative Hard Thresholding (IHT) algorithm into Backpropagation Through Time (BPTT) learning. (2) To accelerate convergence of the IHT algorithm, we designed a scheme for expanding the network by replicating existing hidden neurons, so that training can start from a small, dense network that has already been learned. (3) We implemented this algorithm on GPU. With small minibatch sizes and large network sizes (e.g., 2000 hidden neurons), it achieves a 160-fold speedup over the RNNLM toolkit on CPU. With larger minibatch sizes there could be another 10-fold speedup, though the convergence rate becomes an issue in such cases and further effort is needed to address this problem. (4) Lacking a theoretical convergence guarantee for the IHT algorithm in our problem setting, we carried out an empirical study showing that learning a sparse network does give competitive perplexity in language modeling. In particular, we showed that a sparse network learned in this way can outperform a dense network when the number of effective parameters is kept the same. (5) We gathered performance metrics comparing the computational efficiency of the matrix operations of interest in both sparse and dense settings. The results suggest that, for network sizes we can train in reasonable time at the moment, it is hard for sparse matrices to run faster unless the networks are allowed to be very sparse. Thus, for research purposes we may want to focus on dense matrices, while for engineering purposes a more flexible matrix design that leverages both dense and sparse matrices might be necessary.
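For illustration only (not taken from the thesis): a minimal NumPy sketch of the hard-thresholding projection that IHT-style training applies after each gradient step, i.e., keep the k largest-magnitude weights and zero the rest. The matrix size, the sparsity level k, the learning rate, and the placeholder gradient are all assumptions made for the example; in the actual work the gradient would come from BPTT on the recurrent network.

```python
import numpy as np

def hard_threshold(W, k):
    """Keep the k largest-magnitude entries of W, zero the rest.

    This is the projection step of Iterative Hard Thresholding (IHT).
    If several entries tie at the threshold magnitude, slightly more
    than k entries may be kept.
    """
    flat = np.abs(W).ravel()
    if k >= flat.size:
        return W
    # Magnitude of the k-th largest entry.
    thresh = np.partition(flat, -k)[-k]
    return np.where(np.abs(W) >= thresh, W, 0.0)

# Hypothetical training loop: after each gradient update to the recurrent
# weight matrix W_hh, project back onto the set of k-sparse matrices.
rng = np.random.default_rng(0)
W_hh = rng.standard_normal((100, 100)) * 0.1   # recurrent weights (assumed size)
k = 1000                                       # target number of nonzeros (assumed)
lr = 0.1                                       # learning rate (assumed)

for step in range(5):
    grad = rng.standard_normal(W_hh.shape) * 0.01  # stand-in for the BPTT gradient
    W_hh -= lr * grad                              # gradient descent update
    W_hh = hard_threshold(W_hh, k)                 # IHT projection to k nonzeros

print("nonzeros after projection:", np.count_nonzero(W_hh))
```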
Eric Fosler-Lussier, Dr. (Advisor)
Mikhail Belkin, Dr. (Committee Member)
85 p.

Recommended Citations


  • Shao, Y. (2014). Learning Sparse Recurrent Neural Networks in Language Modeling [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1398942373

    APA Style (7th edition)

  • Shao, Yuanlong. Learning Sparse Recurrent Neural Networks in Language Modeling. 2014. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1398942373.

    MLA Style (8th edition)

  • Shao, Yuanlong. "Learning Sparse Recurrent Neural Networks in Language Modeling." Master's thesis, Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1398942373

    Chicago Manual of Style (17th edition)