
keitakurita / better_lstm_pytorch

An LSTM in PyTorch with best practices (weight dropout, forget bias, etc.) built-in. Fully compatible with PyTorch LSTM.

License: MIT License

Python 100.00%
pytorch deep-learning

better_lstm_pytorch's Introduction

Better LSTM PyTorch

An LSTM that incorporates best practices, designed to be fully compatible with the PyTorch LSTM API. Implements the following best practices:

- Weight dropout
- Variational dropout in input and output layers
- Forget bias initialization to 1

These best practices are based on the following papers:

- A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
- Regularizing and Optimizing LSTM Language Models
- An Empirical Exploration of Recurrent Network Architectures (http://proceedings.mlr.press/v37/jozefowicz15.pdf)
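As a point of reference, here is a minimal sketch (not the package's own code) of the third practice, forget-bias initialization, applied to a plain torch.nn.LSTM. PyTorch concatenates each layer's gate biases as [input | forget | cell | output], so the forget-gate slice is the second quarter of every bias vector:

import torch.nn as nn

# Sketch only, not better_lstm's implementation: set the forget-gate bias of a
# stock nn.LSTM to 1. The biases are packed as [b_ii | b_if | b_ig | b_io].
lstm = nn.LSTM(input_size=100, hidden_size=20)
for name, param in lstm.named_parameters():
    if "bias" in name:
        n = param.size(0)
        param.data[n // 4: n // 2].fill_(1.0)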

This code is heavily based on the code from this repository; most of the credit for this work goes to its authors. (All I have done is update the code for PyTorch 1.0 and repackage it.)

Installation

Install via pip.

$ pip install .

Requires PyTorch version 1.0 or higher.

Usage

>>> from better_lstm import LSTM
>>> lstm = LSTM(100, 20, dropoutw=0.2)
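Since the module is designed as a drop-in replacement for torch.nn.LSTM, a forward pass should look the same as with the stock module. A quick example (shapes assume the default batch_first=False):

>>> import torch
>>> x = torch.randn(30, 8, 100)   # (seq_len, batch, input_size)
>>> output, (h_n, c_n) = lstm(x)
>>> output.shape
torch.Size([30, 8, 20])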

better_lstm_pytorch's People

Contributors

keitakurita


better_lstm_pytorch's Issues

Very helpful.

Thank you!

I am putting this to use in my work which takes a novel-ish approach to text classification based on LSTM. I'm seeing a pronounced improvement as it appears to 'tame' the model, making it less prone to overfitting and less sensitive to hyperparameters like batch size. A sure and steady march upward.

Quick question: how close does better_lstm get us to AWD-LSTM? It appears at first blush to cover a good deal of the same ground...

a little problem when batch_first=False.

To fix the error and support batch_first=False, change:

if is_packed:
    x, batch_sizes = x
    max_batch_size = int(batch_sizes[0])
else:
    batch_sizes = None
    max_batch_size = x.size(0)

to:

if is_packed:
    x, batch_sizes = x
    if self.batch_first:
        max_batch_size = int(batch_sizes[0])
    else:
        max_batch_size = int(batch_sizes[1])
else:
    batch_sizes = None
    if self.batch_first:
        max_batch_size = x.size(0)
    else:
        max_batch_size = x.size(1)
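For context (an illustration, not code from the repo), the batch_sizes field of a PackedSequence counts how many sequences are active at each time step, so its first entry gives the batch size regardless of how the original padded tensor was laid out:

import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Illustration only: batch_sizes[0] of a PackedSequence is the number of
# sequences, i.e. the (maximum) batch size.
x = torch.randn(5, 3, 100)                 # (seq_len, batch, features)
packed = pack_padded_sequence(x, lengths=[5, 4, 2])
print(packed.batch_sizes)                  # tensor([3, 3, 2, 2, 1])
print(int(packed.batch_sizes[0]))          # 3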

Variational Dropout Implementation

Hi, thanks for sharing. I found that in your model.py you construct self.indrop and self.outdrop only once. Would that cause the dropout masks to stay unchanged across batches, not only across time steps?
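For readers following this thread, here is a minimal sketch of how variational ("locked") dropout is typically implemented, assuming input of shape (seq_len, batch, features). Constructing the module once is not a problem by itself, as long as a fresh mask is sampled on every forward call (i.e. per batch) and only held fixed across time steps:

import torch
import torch.nn as nn

# Sketch only, not necessarily the repo's implementation: one mask per
# (batch, feature) pair, re-sampled each forward call and broadcast over time.
class VariationalDropout(nn.Module):
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):
        if not self.training or self.p == 0.0:
            return x
        mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(1 - self.p)
        return x * mask / (1 - self.p)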

RNN module weights are not part of single contiguous chunk of memory (GPU)

Hi, first of all, thanks, this is very useful. I was able to test it on CPU and it stabilized the network a lot. My problem is that I get the following error when I try to run it on GPU. It continues to run but prints this warning many times. I've tried adding self.lstm.flatten_parameters() just before the call, but it didn't help. I'm testing this on a really simple example with one LSTM and one linear layer.

UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().

A question about _drop_weights()

I notice that the model drops weight_hh in forward() and never recovers the original weights. The dropped units stay at 0 and are never restored, so after enough forward passes every element of the weight matrix would become 0.
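To make the concern concrete, here is a small self-contained demo (not the repo's code): if dropout is applied to the already-dropped matrix on every forward pass, the zeros compound; the usual remedy is to keep a pristine "raw" copy of weight_hh and re-derive the dropped weights from it each time.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
w_raw = torch.ones(64, 64)               # stand-in for a pristine weight_hh
w = w_raw.clone()

# Compounding in place: zeros accumulate and the matrix drifts toward all-zero.
for _ in range(50):
    w = F.dropout(w, p=0.2, training=True)
print((w == 0).float().mean().item())    # close to 1.0

# Re-deriving from the untouched copy keeps the zero fraction near p.
for _ in range(50):
    w = F.dropout(w_raw, p=0.2, training=True)
print((w == 0).float().mean().item())    # roughly 0.2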
