
keitakurita / better_lstm_pytorch

An LSTM in PyTorch with best practices (weight dropout, forget bias, etc.) built-in. Fully compatible with PyTorch LSTM.

License: MIT License

Python 100.00%
pytorch deep-learning

better_lstm_pytorch's Introduction

Better LSTM PyTorch

An LSTM that incorporates best practices, designed to be fully compatible with the PyTorch LSTM API. Implements the following best practices:

- Weight dropout
- Variational dropout in input and output layers
- Forget bias initialization to 1

These best practices are based on the following papers:

- A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
- Regularizing and Optimizing LSTM Language Models
- An Empirical Exploration of Recurrent Network Architectures (http://proceedings.mlr.press/v37/jozefowicz15.pdf)
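As a point of reference, here is a minimal sketch (not the package's own code) of the third practice, forget-bias initialization, applied to a plain torch.nn.LSTM. PyTorch concatenates each layer's gate biases as [input | forget | cell | output], so the forget-gate slice is the second quarter of every bias vector:

import torch.nn as nn

# Sketch only, not better_lstm's implementation: set the forget-gate bias of a
# stock nn.LSTM to 1. The biases are packed as [b_ii | b_if | b_ig | b_io].
lstm = nn.LSTM(input_size=100, hidden_size=20)
for name, param in lstm.named_parameters():
    if "bias" in name:
        n = param.size(0)
        param.data[n // 4: n // 2].fill_(1.0)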

This code is heavily based on the code from this repository; most of the credit for this work goes to its authors. (All I have done is update the code for PyTorch 1.0 and repackage it.)

Installation

Install via pip.

$ pip install .

Requires PyTorch version 1.0 or higher.

Usage

>>> from better_lstm import LSTM
>>> lstm = LSTM(100, 20, dropoutw=0.2)
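Since the module is designed as a drop-in replacement for torch.nn.LSTM, a forward pass should look the same as with the stock module. A quick example (shapes assume the default batch_first=False):

>>> import torch
>>> x = torch.randn(30, 8, 100)   # (seq_len, batch, input_size)
>>> output, (h_n, c_n) = lstm(x)
>>> output.shape
torch.Size([30, 8, 20])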

better_lstm_pytorch's People

Contributors

keitakurita


better_lstm_pytorch's Issues

Very helpful.

Thank you!

I am putting this to use in my work which takes a novel-ish approach to text classification based on LSTM. I'm seeing a pronounced improvement as it appears to 'tame' the model, making it less prone to overfitting and less sensitive to hyperparameters like batch size. A sure and steady march upward.

Quick question: how close does better_lstm get us to AWD-LSTM? It appears at first blush to cover a good deal of the same ground...

a little problem when batch_first=False.

To fix the error and support batch_first=False, change:

if is_packed:
    x, batch_sizes = x
    max_batch_size = int(batch_sizes[0])
else:
    batch_sizes = None
    max_batch_size = x.size(0)

to:

if is_packed:
    x, batch_sizes = x
    if self.batch_first:
        max_batch_size = int(batch_sizes[0])
    else:
        max_batch_size = int(batch_sizes[1])
else:
    batch_sizes = None
    if self.batch_first:
        max_batch_size = x.size(0)
    else:
        max_batch_size = x.size(1)
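For context (an illustration, not code from the repo), the batch_sizes field of a PackedSequence counts how many sequences are active at each time step, so its first entry gives the batch size regardless of how the original padded tensor was laid out:

import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Illustration only: batch_sizes[0] of a PackedSequence is the number of
# sequences, i.e. the (maximum) batch size.
x = torch.randn(5, 3, 100)                 # (seq_len, batch, features)
packed = pack_padded_sequence(x, lengths=[5, 4, 2])
print(packed.batch_sizes)                  # tensor([3, 3, 2, 2, 1])
print(int(packed.batch_sizes[0]))          # 3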

Variational Dropout Implementation

Hi, thanks for sharing. I found that in your model.py you construct self.indrop and self.outdrop only once. Would that cause the dropout masks to stay unchanged across batches, not only across time steps?
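For readers following this thread, here is a minimal sketch of how variational ("locked") dropout is typically implemented, assuming input of shape (seq_len, batch, features). Constructing the module once is not a problem by itself, as long as a fresh mask is sampled on every forward call (i.e. per batch) and only held fixed across time steps:

import torch
import torch.nn as nn

# Sketch only, not necessarily the repo's implementation: one mask per
# (batch, feature) pair, re-sampled each forward call and broadcast over time.
class VariationalDropout(nn.Module):
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):
        if not self.training or self.p == 0.0:
            return x
        mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(1 - self.p)
        return x * mask / (1 - self.p)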

RNN module weights are not part of single contiguous chunk of memory (GPU)

Hi, first of all, thanks, this is very useful. I was able to test it on CPU and it stabilized the network a lot. My problem is that I get the following error when I try to run it on GPU. It continues to run but prints this warning many times. I've tried adding self.lstm.flatten_parameters() just before the call, but it didn't help. I'm testing this on a really simple example with one LSTM and one linear layer.

UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().

A question about _drop_weights()

I notice that the model drops weight_hh in forward() and never recovers the original weights. The dropped units stay at 0 and are never restored, so after enough forward passes every element of the weight matrix would become 0.
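To make the concern concrete, here is a small self-contained demo (not the repo's code): if dropout is applied to the already-dropped matrix on every forward pass, the zeros compound; the usual remedy is to keep a pristine "raw" copy of weight_hh and re-derive the dropped weights from it each time.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
w_raw = torch.ones(64, 64)               # stand-in for a pristine weight_hh
w = w_raw.clone()

# Compounding in place: zeros accumulate and the matrix drifts toward all-zero.
for _ in range(50):
    w = F.dropout(w, p=0.2, training=True)
print((w == 0).float().mean().item())    # close to 1.0

# Re-deriving from the untouched copy keeps the zero fraction near p.
for _ in range(50):
    w = F.dropout(w_raw, p=0.2, training=True)
print((w == 0).float().mean().item())    # roughly 0.2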
