TensorFlow implementations of recently proposed LSTM (RNN) cell variants. To test the cells, language modeling is conducted on the Penn Treebank (PTB) corpus, downloaded from Tomas Mikolov's webpage.
Available cells (a combined usage sketch follows the list):
- Highway State Gating
Ron Shoham and Haim Permuter. "Highway State Gating for Recurrent Highway Networks: Improving Information Flow Through Time." arXiv preprint (2018).
- Usage
```python
from lstm_cell import CustomRNNCell

cell = CustomRNNCell(highway_state_gate=True, recurrent_highway=True)
```
- Hyper Networks
Ha, David, Andrew Dai, and Quoc V. Le. "Hypernetworks." Proceedings of International Conference on Learning Representations (ICLR) 2017.
- Usage
```python
from lstm_cell import HyperLSTMCell

cell = HyperLSTMCell()
```
- Recurrent Highway Network
Zilly, Julian Georg, et al. "Recurrent Highway Networks." International Conference on Machine Learning (ICML) 2017.
- Usage
```python
from lstm_cell import CustomRNNCell

cell = CustomRNNCell(recurrent_highway=True, recurrent_depth=4)
```
- Key-Value-Predict Attention
Daniluk, Michał, et al. "Frustratingly short attention spans in neural language modeling." Proceedings of International Conference on Learning Representations (ICLR) 2017.
- Usage
```python
from lstm_cell import KVPAttentionWrapper, CustomLSTMCell

cell = CustomLSTMCell()
attention_layer = KVPAttentionWrapper(cell)
```
- Vanilla LSTM
- Usage
```python
from lstm_cell import CustomLSTMCell

cell = CustomLSTMCell()
```
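Any of the cells above plugs into TensorFlow's standard RNN machinery in the usual way. The sketch below assumes the constructors behave as in the snippets above; the placeholder shape is illustrative, not a fixed part of the API.

```python
import tensorflow as tf
from lstm_cell import CustomRNNCell

# Token embeddings: [batch, time, embedding]; the sizes are illustrative.
inputs = tf.placeholder(tf.float32, [None, 35, 650])

# Build a Recurrent Highway cell as in the snippet above and unroll it
# over the time dimension with TensorFlow's standard dynamic RNN.
cell = CustomRNNCell(recurrent_highway=True, recurrent_depth=4)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
```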
Each cell supports the following regularization techniques:
- Variational Dropout (per-sample): Gal, Yarin, and Zoubin Ghahramani. "A theoretically grounded application of dropout in recurrent neural networks." Advances in Neural Information Processing Systems (NIPS) 2016.
- Recurrent Dropout: Semeniuta, Stanislau, Aliaksei Severyn, and Erhardt Barth. "Recurrent dropout without memory loss." arXiv preprint arXiv:1603.05118 (2016).
- Layer Normalization: Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." arXiv preprint arXiv:1607.06450 (2016).
The usage is the same as for the standard TensorFlow LSTM cell.
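As a side note on the first technique: variational dropout draws a single Bernoulli mask per sequence and reuses it at every time step, whereas ordinary dropout resamples the mask at each step. Below is a minimal sketch; the helper name and shapes are hypothetical, not part of this repository's API.

```python
import tensorflow as tf

def variational_dropout_mask(batch_size, num_units, keep_prob):
    # Draw one Bernoulli(keep_prob) mask per sample; the same mask is then
    # applied to the recurrent state at every time step of the sequence,
    # unlike ordinary dropout, which resamples the mask each step.
    mask = tf.floor(keep_prob + tf.random_uniform([batch_size, num_units]))
    # Inverted-dropout scaling keeps the expected activation unchanged.
    return mask / keep_prob

# Inside a cell's step function, the recurrent state would be gated as:
#   h = h * mask
```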
To do:
- Neural architecture search
- Minimal RNN
- Character-level language model
- Layer normalization does not improve performance; fix it.
Setup:

```
git clone https://github.com/asahi417/LSTMCell
cd LSTMCell
pip install -r requirements.txt
wget http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
tar xvzf simple-examples.tgz
```
One can train a language model with an arbitrary cell:
```
python train.py -m [lstm type]
```

where `[lstm type]` is one of `lstm`, `rhn`, `hypernets`, `kvp`, `hsg`.
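For example, to train the Recurrent Highway Network variant (assuming the remaining hyperparameters keep their defaults):

```
python train.py -m rhn
```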
The large-sized model of Zaremba, Wojciech, Ilya Sutskever, and Oriol Vinyals. "Recurrent neural network regularization." arXiv preprint arXiv:1409.2329 (2014) is employed as the baseline model.
- This code is tested with Python 3 and TensorFlow 1.3.0.