ndrplz / convlstm_pytorch

Implementation of Convolutional LSTM in PyTorch.

License: MIT License

Python 100.00%

convlstm_pytorch's Introduction

ConvLSTM_pytorch

This file contains the implementation of Convolutional LSTM in PyTorch made by me and DavideA.

We started from this implementation and heavily refactored it and added features to match our needs.

Please note that in this repository we implement the following dynamics:

which is a bit different from the one in the original paper.
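For reference, the code fragments quoted in the issues below (a four-way split into i, f, o, g and c_next = f * c_cur + i * g) suggest an update of the following form. This is a reconstruction, assuming standard sigmoid/tanh gate activations and a convolution over the stacked input and previous hidden state; the weight symbols are illustrative.

```latex
\begin{aligned}
i_t &= \sigma(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f) \\
o_t &= \sigma(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o) \\
g_t &= \tanh(W_{xg} * X_t + W_{hg} * H_{t-1} + b_g) \\
C_t &= f_t \circ C_{t-1} + i_t \circ g_t \\
H_t &= o_t \circ \tanh(C_t)
\end{aligned}
```

Here * denotes convolution and \circ the Hadamard product. Under these assumptions the gates do not depend on the cell state, that is, there are no peephole terms.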

How to Use

The ConvLSTM module derives from nn.Module so it can be used as any other PyTorch module.

The ConvLSTM class supports an arbitrary number of layers. For each layer you can specify the hidden dimension (that is, the number of channels) and the kernel size. If there are several layers but only a single value is provided, that value is replicated for all the layers. For example, in the following snippet the hidden dimension is given per layer, while a single kernel size is shared by all three layers.

Example usage:

model = ConvLSTM(input_dim=channels,
                 hidden_dim=[64, 64, 128],
                 kernel_size=(3, 3),
                 num_layers=3,
                 batch_first=True,
                 bias=True,
                 return_all_layers=False)
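
The replication rule described above can be sketched as a small helper. This is a minimal illustration: the function name extend_for_multilayer and its error handling are assumptions, not the repository's confirmed API.

```python
def extend_for_multilayer(param, num_layers):
    """Give every layer its own copy of a setting.

    Hypothetical helper written for illustration only.
    A list is taken as "one value per layer"; anything else
    (an int, a tuple such as (3, 3)) is replicated num_layers times.
    """
    if not isinstance(param, list):
        param = [param] * num_layers
    if len(param) != num_layers:
        raise ValueError("expected one value per layer")
    return param


hidden_dims = extend_for_multilayer([64, 64, 128], num_layers=3)
kernel_sizes = extend_for_multilayer((3, 3), num_layers=3)
print(hidden_dims)   # [64, 64, 128]
print(kernel_sizes)  # [(3, 3), (3, 3), (3, 3)]
```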

TODO (in progress...)

Comment code
Add docs
Add example usage on a toy problem
Implement stateful mechanism
...

Disclaimer

This is still a work in progress and is far from being perfect: if you find any bug please don't hesitate to open an issue.

convlstm_pytorch's Issues

Understanding the algo.

Hello there, I am really interested in your work, but I am still trying to understand the implementation. Can you explain this: you have used a stacked version of x_(t) and h_(t-1) for the convolution according to this:

but in the implementation you used h_cur and c_cur rather than h_(t-1)?

I would really appreciate a quick answer.

Does this implementation support GPU acceleration, like CUDA?

summary[m_key]["input_shape"] = list(input[0].size()) with empty input

When I build my model with ConvLSTM, in the function def forward(self, x), the line
summary[m_key]["input_shape"] = list(input[0].size()) in torchsummary.py
receives an empty tuple as input, so the program fails with '{IndexError}tuple index out of range'.

Any example using the implement like Moving MNIST?

First, thanks for implementing ConvLSTM, which is useful for video prediction. I like to use PyTorch and found your implementation. Have you succeeded in training it on the toy Moving MNIST dataset?

Question: Shouldn't each layer have multiple cells?

I noticed in the code that the input sequence updates the same cell iteratively before the hidden state is passed to the next layer. My understanding of LSTMs is that there can be multiple cells per layer? These cells then act as a sliding window over time, where cell 0 takes input from t0, cell 1 takes input from t1 and hidden state from cell 0, and so on. Can I clarify about this?
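
The two pictures are usually the same thing: the chain of cells in an unrolled LSTM diagram is one cell, with one set of weights, applied at successive time steps. A toy sketch of that equivalence follows; the scalar recurrence is a stand-in for the ConvLSTM cell and purely illustrative.

```python
def cell(x_t, h_prev, weight=0.5):
    """Toy recurrent cell: a single set of weights, reused at every step."""
    return weight * h_prev + x_t


sequence = [1.0, 2.0, 3.0]

# Loop form: the same cell is updated iteratively over the sequence.
h = 0.0
for x_t in sequence:
    h = cell(x_t, h)

# Unrolled form: "cell 0", "cell 1", "cell 2" as drawn in diagrams.
h0 = cell(sequence[0], 0.0)
h1 = cell(sequence[1], h0)
h2 = cell(sequence[2], h1)

print(h == h2)  # True
```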

computation graph broken

I am trying to train a simple model:

model = seq2seq(input_dim=1,
             hidden_dim=[64],
             num_layers=1,
             kernel_size=(3, 3),
             batch_first=True,
             bias=True,
             return_all_layers=False).to(device)

but the computational graph is broken somewhere and I can't seem to find the place.

ReadMe example missing a comma

model = ConvLSTM(input_dim=channels,
hidden_dim=[64, 64, 128],
kernel_size=(3, 3),
num_layers=3,
batch_first=True[missing comma here]
bias=True,
return_all_layers=False)

Will variable-length sequences work with this code? For example, if I use packed padded sequences as input to this ConvLSTM, will it work correctly?

multi-gpu use

Hi,

I just wanted to know whether this implementation would work well in a DataParallel model and whether any adaptation is needed.

Thanks

Does the stateful implementation work similarly to LSTM statefulness?

hidden state raises NotImplementedError()

Hey,
I am trying to integrate your convLSTM cell into an existing model I have.
I did this in the following way:

    def forward(self,x): 
        out = my_model(x)
        out = out.unsqueeze(1) #make sure dimensions fit
        out, self.hidden = self.convlstm(out, self.hidden)
        return out[-1].squeeze(1)

self.hidden is None in the first run, but not None in the second, leading to:

    # Implement stateful ConvLSTM
    if hidden_state is not None:
        raise NotImplementedError()

which is in your ConvLSTM module (line 141, 142)

would this be implemented just by:

    # Implement stateful ConvLSTM
    if hidden_state is not None:
        hidden_state=hidden_state

or am I misunderstanding something?
What is supposed to happen here?

Thanks for any help :-D

kernel_size = (2, 2) causes dimension issues

This code snippet, taken from the docstring and changed slightly, fails for me:

import torch
from CONVLSTM_Implementation import ConvLSTM
x = torch.rand((32, 10, 64, 128, 128))
convlstm = ConvLSTM(64, 16, (2, 2), 1, True, True, False)
_, last_states = convlstm(x)

with the following Error:

c_next = f * c_cur + i * g
RuntimeError: The size of tensor a (129) must match the size of tensor b (128) at non-singleton dimension 3

But I don't really know why. I guess I will use a different kernel_size for now.
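
One likely cause, assuming the cell pads each convolution with kernel_size // 2 (an assumption that fits the 129-versus-128 mismatch above, but is not confirmed here): that padding only preserves the spatial size for odd kernels. With an even kernel the convolution output grows by one pixel, so it no longer matches c_cur. The sketch below uses the output-size formula from the torch.nn.Conv2d documentation.

```python
def conv_output_size(size, kernel, padding, stride=1, dilation=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


for kernel in (2, 3, 4, 5):
    padding = kernel // 2  # assumed padding rule, see the note above
    out = conv_output_size(128, kernel, padding)
    print(f"kernel={kernel} padding={padding} output={out}")
# kernel=2 padding=1 output=129
# kernel=3 padding=1 output=128
# kernel=4 padding=2 output=129
# kernel=5 padding=2 output=128
```

Under that assumption, odd kernel sizes such as (3, 3) or (5, 5) keep the hidden and cell states at the input resolution.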

Missing Hadamard Products in Forward pass

First off thank you for the implementation. Are Hadamard products in the paper missing? Please see the image below and paper that I am referring to for clarification.

Image for equations
Paper

Lstm, generate .pkl. RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

https://discuss.pytorch.org/t/lstm-generate-pkl-runtimeerror-cudnn-error-cudnn-status-mapping-error/181139?u=xukw123123

PLEASE

why out_channels=4 * self.hidden_dim?

RuntimeError

When I added the code below, I got an error: "RuntimeError: expected stride to be a single integer value or a list of 3 values to match the convolution dimensions, but got stride=[1, 1]". Can anyone tell me why this happens?

def main():
    model = ConvLSTM(input_dim=3,
                     hidden_dim=[64, 64, 128, 128],
                     kernel_size=(3, 3, 3),
                     num_layers=4,
                     batch_first=True,
                     bias=True,
                     return_all_layers=False
                     )

    print(model)

    x = torch.randn((32, 10, 64, 128, 128))

    now_states, last_states = model(x)


if __name__ == "__main__":
    main()
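
Two things in the snippet above look inconsistent with a 2D ConvLSTM. This assumes the cell is built on a 2D convolution (the stride=[1, 1] in the error message points that way) and that batch_first=True means input of shape (b, t, c, h, w), as the forward docstring quoted further down states. First, kernel_size has three entries; second, the input tensor has 64 channels while input_dim=3. A hypothetical pre-flight check, not part of the repository, that would flag both:

```python
def find_config_problems(input_dim, kernel_size, input_shape):
    """Hypothetical pre-flight check; not part of the repository.

    Assumes a 2D convolution inside the cell and input laid out as
    (batch, time, channels, height, width).
    """
    problems = []
    if len(kernel_size) != 2:
        problems.append(
            f"kernel_size {kernel_size} has {len(kernel_size)} entries, expected 2"
        )
    channels = input_shape[2]
    if channels != input_dim:
        problems.append(
            f"input has {channels} channels but input_dim={input_dim}"
        )
    return problems


# Values taken from the snippet in this issue.
reported = find_config_problems(3, (3, 3, 3), (32, 10, 64, 128, 128))
for problem in reported:
    print(problem)

# One consistent alternative: 2D kernel, input_dim equal to the channel axis.
fixed = find_config_problems(64, (3, 3), (32, 10, 64, 128, 128))
print(fixed)  # []
```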

torch split into the four gates

Hey I was not sure how the logic in line 49 works

cc_i, cc_f, cc_o, cc_g = torch.split(combined_conv, self.hidden_dim, dim=1)

Since each of the 4 gates has operations of weights with inputs, how is the order of split determined? Why not something like cc_g, cc_f, cc_i, cc_go = torch.split(combined_conv, self.hidden_dim, dim=1)? I am a bit confused how the LSTM equations in https://pytorch.org/docs/stable/nn.html#torch.nn.LSTMCell are implemented here.

Thanks in advance.

output shape problem

Sorry, I don't really understand ConvLSTM.
When I use the Keras layer ConvLSTM2D(filters = 128, kernel_size=(3, 3), padding='same', return_sequences = False, go_backwards = True,kernel_initializer = 'he_normal' ) with an input of shape (batch,2,h,w,channel), where I guess 2 is the time dimension, the output is (batch,h,w,128).
With your code I don't get the same shape. Can you help me get the same output shape as the Keras ConvLSTM? Thanks.

split_size_or_sections

Sorry, my English is not very good. Why is split_size_or_sections set to self.hidden_dim? There are only 4 variables to receive the result, so I think self.hidden_dim should be changed to 4.

cc_i, cc_f, cc_o, cc_g = torch.split(combined_conv, self.hidden_dim, dim=1)
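
A note on the semantics: when torch.split is given an integer, that integer is the size of each chunk, not the number of chunks. Since the convolution emits 4 * self.hidden_dim channels (see the out_channels question above), chunks of size hidden_dim give exactly four pieces. The sketch below uses a plain-Python stand-in for torch.split along the channel dimension, so the helper split_by_size is illustrative and not a PyTorch function.

```python
def split_by_size(channels, split_size):
    """Plain-Python stand-in for torch.split(tensor, split_size, dim=1).

    Returns consecutive chunks that each hold split_size items.
    """
    return [channels[i:i + split_size]
            for i in range(0, len(channels), split_size)]


hidden_dim = 16
# Channel indices of the combined convolution output: 4 * hidden_dim of them.
combined_conv_channels = list(range(4 * hidden_dim))

cc_i, cc_f, cc_o, cc_g = split_by_size(combined_conv_channels, hidden_dim)
print(len(cc_i), len(cc_f), len(cc_o), len(cc_g))  # 16 16 16 16

# With split_size=4 there are 16 chunks of 4 channels each,
# so unpacking into four names would fail.
num_chunks_with_4 = len(split_by_size(combined_conv_channels, 4))
print(num_chunks_with_4)  # 16
```

Which quarter is called i, f, o or g is a naming convention: the convolution weights are learned, so each slice takes on the role its activation and position in the update assign to it.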

What is the difference between hidden_state and hidden_dim?

I saw that in the code, hidden_state is not implemented:

    def forward(self, input_tensor, hidden_state=None):
        """

        Parameters
        ----------
        input_tensor: todo
            5-D Tensor either of shape (t, b, c, h, w) or (b, t, c, h, w)
        hidden_state: todo
            None. todo implement stateful

meanwhile, hidden_dim is given.
What is the difference between those two variables?

Doesn't it have bi-directional?

As stated in the title, doesn't it support bi-directional multi-layer LSTM?

"RuntimeError: Jacobian mismatch for output 0 with respect to input 0"

If you run the toy-data example in the script, you will get the above error in PyTorch 0.40. I am not sure whether earlier versions cause this issue. How can you make sure this is working?

Run training on 3-channels video frames

Hello!
Are there any examples of how to run training on a sequence of 3-channel frames?

Various time step of input samples

I am not sure how to handle input samples of varying length. I think padding is not okay for my problem. Is there any better way to solve it?

Valid padding and custom strides

How can we implement options for valid padding and custom strides? Where are changes required?

How to add the dropout into the convlstm

Sorry, but I've added a dropout layer after the convolution, and the GPU memory usage explodes. How could I solve this?

A question

I have a question regarding your implementation.
As I understand it, the original convolutional LSTM formulation is as follows:

But in your implementation, you used only one convolution layer. I don't understand how these two correspond to each other, because in the formulation c is only used in the Hadamard product and not in convolutions, but here c and h are both used in convolutions.
In fact, all weights are shared across all 4 formulas, although there are 11 weights in the original formula.

Reproducing Moving Mnist with ConvLSTM

Hello, I am trying to reproduce the results of the paper on Moving MNIST.
I developed my own implementation but it didn't converge. Now I will try with your proposed implementation.

I was wondering, have you managed to use it on any dataset like Moving MNIST?

Does it converge?

By the way, I am not 100% sure how you implement the input-to-state transitions and the final state-to-output transition. Do you do it via 2D convs (within a timestep) or 3D convs (over the whole sequence)?

Any ideas?