lucidrains / lumiere-pytorch

211.0 211.0 7.0 1.17 MB

Implementation of Lumiere, SOTA text-to-video generation from Google Deepmind, in Pytorch

License: MIT License

Python 100.00%

artificial-intelligence deep-learning denoising-diffusion text-to-video

lumiere-pytorch's People

Forkers

umcai f-amerehi lcsouzamenezes goswamig tengfei86 unanan gerardragbir

lumiere-pytorch's Issues

Is this complete?

This looks amazing. I am yet to give it a try, but is this complete or still a work in progress? Would love to share feedback once I set this up and try it out.

Thanks

Incorrect time_dim for intermediate temporal layers

I have been working through your code trying to get it working, and I believe I found an issue when you set the time_dim for the temporal layers here:

def set_time_dim_(
    klasses: Tuple[Type[Module]],
    model: Module,
    time_dim: int
):
    for model in model.modules():
        if isinstance(model, klasses):
            model.time_dim = time_dim

You are setting the same time_dim for all of the layers, but the size of the temporal dimension is cut in half after each step in the UNet. Because of this, the model crashes when trying to reshape/rearrange the tensors for intermediate layers, for instance here (and maybe in other places as well):

if is_video:
    batch_size = x.shape[0]
    x = rearrange(x, 'b c t h w -> b h w t c')
else:
    assert exists(batch_size) or exists(self.time_dim)

    rearrange_kwargs = dict(b = batch_size, t = self.time_dim)
    x = rearrange(x, '(b t) c h w -> b h w t c', **compact_values(rearrange_kwargs))

I am working on my own workaround in the same set_time_dim_ function, but thought I would report it in case it is helpful.
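To make the idea concrete, here is a minimal sketch of one possible workaround, not the repository's actual fix. It assumes the temporal length is halved exactly once per UNet stage and that the temporal layers can be grouped by stage. `time_dims_per_stage`, `set_time_dim_per_stage_` and `TemporalLayerStub` are hypothetical names used only for illustration; the stub stands in for the real temporal modules so the example runs without the library.

```python
from typing import List, Sequence


def time_dims_per_stage(time_dim: int, num_downsamples: int) -> List[int]:
    """Temporal length at each UNet depth, ASSUMING every downsample halves it."""
    dims = [time_dim]
    for _ in range(num_downsamples):
        if dims[-1] % 2 != 0:
            raise ValueError(f"time dim {dims[-1]} cannot be halved evenly")
        dims.append(dims[-1] // 2)
    return dims


class TemporalLayerStub:
    """Hypothetical stand-in for a temporal module that reads `self.time_dim`."""

    def __init__(self) -> None:
        self.time_dim = None


def set_time_dim_per_stage_(
    stages: Sequence[Sequence[TemporalLayerStub]],
    time_dim: int
) -> None:
    """Assign each stage its own time_dim instead of one shared value."""
    dims = time_dims_per_stage(time_dim, len(stages) - 1)
    for layers, stage_time_dim in zip(stages, dims):
        for layer in layers:
            layer.time_dim = stage_time_dim


# Three stages of temporal layers, starting from 8 frames (assumed grouping).
stages = [
    [TemporalLayerStub(), TemporalLayerStub()],
    [TemporalLayerStub()],
    [TemporalLayerStub()],
]
set_time_dim_per_stage_(stages, time_dim=8)
per_stage = [[layer.time_dim for layer in layers] for layers in stages]
```

In the real model the grouping of layers by stage would have to come from the UNet structure itself, which this sketch does not attempt to inspect.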

What is the difference between the two `time`s?

1. `time` at the third dimension of the input tensor `video`:

https://github.com/lucidrains/lumiere-pytorch/blob/main/lumiere_pytorch/lumiere.py#L569

2. `time` as the input parameter passed into `KarrasUnet`'s method `forward`:

https://github.com/lucidrains/lumiere-pytorch/blob/main/README.md#usage
https://github.com/lucidrains/denoising-diffusion-pytorch/blob/main/denoising_diffusion_pytorch/karras_unet.py#L560
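For context, a small sketch of how I currently read the two, offered as an assumption rather than a confirmed answer: the `t` axis of the `video` tensor (`b c t h w`) counts frames, while the `time` argument to `forward` looks like the diffusion time (noise level), one value per batch sample. If frames are folded into the batch as `(b t)`, each sample's diffusion time would then need to be repeated once per frame. `repeat_time_for_frames` is a hypothetical helper written in plain Python so the shapes are easy to follow.

```python
from typing import List, Tuple

# Hypothetical sizes, chosen only for illustration.
batch, channels, frames, height, width = 2, 3, 4, 32, 32

# (1) `time` as a tensor axis: the third dimension of 'b c t h w' is the frame count.
video_shape: Tuple[int, ...] = (batch, channels, frames, height, width)

# (2) `time` as a forward argument: assumed to be one diffusion time per sample.
diffusion_time: List[float] = [0.1, 0.9]


def repeat_time_for_frames(times: List[float], num_frames: int) -> List[float]:
    """Mirror of a 'b -> (b t)' repeat: every frame of a sample shares its diffusion time."""
    return [t for t in times for _ in range(num_frames)]


# After folding frames into the batch, there are batch * frames images,
# and each one needs the diffusion time of the sample it came from.
folded_batch = video_shape[0] * video_shape[2]
folded_time = repeat_time_for_frames(diffusion_time, frames)
```

Under this reading the two never mix: one indexes frames within a clip, the other says how noisy the whole clip is.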
