lucidrains / lumiere-pytorch
Implementation of Lumiere, SOTA text-to-video generation from Google DeepMind, in PyTorch
License: MIT License
This looks amazing. I have yet to try it, but is this complete or still a work in progress? I would love to share feedback once I have set it up and tried it out.
Thanks
I have been working through your code to get it running, and I believe I found an issue in how time_dim is set for the temporal layers here:
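from typing import Tuple, Type
from torch.nn import Module

def set_time_dim_(
    klasses: Tuple[Type[Module], ...],
    model: Module,
    time_dim: int
):
    # walk every submodule and stamp the same time_dim onto each matching layer
    for module in model.modules():
        if isinstance(module, klasses):
            module.time_dim = time_dim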
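A toy run shows what the function does in practice. This is a minimal sketch, not code from the repository: ToyTemporalConv is a hypothetical stand-in for a temporal layer, and the three instances stand in for three UNet levels that would see 16, 8 and 4 frames if each downsample halves the time axis.

```python
from typing import Tuple, Type

import torch.nn as nn


class ToyTemporalConv(nn.Module):
    # hypothetical stand-in for a temporal layer; it only carries a time_dim attribute
    def __init__(self):
        super().__init__()
        self.time_dim = None


def set_time_dim_(klasses: Tuple[Type[nn.Module], ...], model: nn.Module, time_dim: int):
    # same logic as the function quoted above: one value for every matching module
    for module in model.modules():
        if isinstance(module, klasses):
            module.time_dim = time_dim


# three temporal layers standing in for three UNet levels (16, 8 and 4 frames after halving)
unet_like = nn.Sequential(ToyTemporalConv(), ToyTemporalConv(), ToyTemporalConv())
set_time_dim_((ToyTemporalConv,), unet_like, 16)

print([layer.time_dim for layer in unet_like])  # [16, 16, 16]
```

Only the first level actually sees 16 frames; the deeper two would need 8 and 4.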
You are setting the same time_dim for all layers, but the size of the temporal dimension is halved at each downsampling stage of the UNet. Because of this, the model crashes when it tries to reshape/rearrange the tensors in the intermediate layers, for instance here (and maybe elsewhere as well):
if is_video:
    batch_size = x.shape[0]
    x = rearrange(x, 'b c t h w -> b h w t c')
else:
    assert exists(batch_size) or exists(self.time_dim)
    rearrange_kwargs = dict(b = batch_size, t = self.time_dim)
    x = rearrange(x, '(b t) c h w -> b h w t c', **compact_values(rearrange_kwargs))
I am working on my own workaround in the same set_time_dim function, but thought I would report it in case it helps.
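One possible direction for a workaround, sketched under assumptions: every temporal downsample halves the frame count exactly, so the time_dim a layer needs is num_frames // 2**level for its UNet depth. frames_at_each_level and unfold_time are hypothetical helpers, not part of lumiere-pytorch; unfold_time does with plain torch reshape/permute what the '(b t) c h w -> b h w t c' rearrange call does, raising a ValueError where einops would raise its own error.

```python
import torch


def frames_at_each_level(num_frames: int, num_temporal_downsamples: int) -> list[int]:
    # assumption: every temporal downsample halves the frame count exactly
    sizes = [num_frames]
    for _ in range(num_temporal_downsamples):
        assert sizes[-1] % 2 == 0, "frame count must stay divisible by 2"
        sizes.append(sizes[-1] // 2)
    return sizes


def unfold_time(x: torch.Tensor, batch_size: int, time_dim: int) -> torch.Tensor:
    # torch equivalent of rearrange(x, '(b t) c h w -> b h w t c', b = batch_size, t = time_dim)
    bt, c, h, w = x.shape
    if bt != batch_size * time_dim:
        raise ValueError(f"leading dim {bt} != batch_size * time_dim = {batch_size * time_dim}")
    return x.reshape(batch_size, time_dim, c, h, w).permute(0, 3, 4, 1, 2)


batch_size, num_frames = 2, 16
sizes = frames_at_each_level(num_frames, 2)  # [16, 8, 4]

# feature map at the first downsampled level: 8 frames per video, folded into the batch
x = torch.randn(batch_size * sizes[1], 32, 8, 8)

try:
    unfold_time(x, batch_size, num_frames)  # same time_dim everywhere: shape mismatch
except ValueError as err:
    print(err)

y = unfold_time(x, batch_size, sizes[1])  # per-level time_dim: works
print(y.shape)  # torch.Size([2, 8, 8, 8, 32])
```

When batch_size is not passed, the rearrange call above only receives t (assuming compact_values drops the None entry), so a wrong but divisible time_dim may not crash at all: 16 folded frames from two 8-frame videos would be read as a single 16-frame video.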
time as the third dimension of the input tensor video: https://github.com/lucidrains/lumiere-pytorch/blob/main/lumiere_pytorch/lumiere.py#L569
time as the input parameter passed to KarrasUnet's forward method: https://github.com/lucidrains/lumiere-pytorch/blob/main/README.md#usage
https://github.com/lucidrains/denoising-diffusion-pytorch/blob/main/denoising_diffusion_pytorch/karras_unet.py#L560
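Reading the two links together, "time" seems to be used for two different things: the frame axis of the video tensor and the diffusion time argument of the UNet's forward. The sketch below is only an illustration in plain torch, assuming a (batch, channels, time, height, width) layout and one diffusion time value per video; how the library itself broadcasts the diffusion time across frames is not shown here and may differ.

```python
import torch

batch, channels, frames, height, width = 2, 3, 8, 32, 32

# assumed video layout: (batch, channels, time, height, width)
video = torch.randn(batch, channels, frames, height, width)
num_frames = video.shape[2]  # the temporal axis, i.e. the third dimension

# assumed diffusion time: one scalar per video in the batch, shape (batch,)
diffusion_time = torch.rand(batch)

# fold frames into the batch so 2D (spatial) layers can treat each frame as an image;
# the folded index is video_index * num_frames + frame_index
frames_as_images = video.permute(0, 2, 1, 3, 4).reshape(batch * num_frames, channels, height, width)

# repeat each video's diffusion time once per frame so it lines up with the folded batch
time_per_frame = diffusion_time.repeat_interleave(num_frames)

print(frames_as_images.shape, time_per_frame.shape)
```

Keeping the two meanings apart makes it clearer which one set_time_dim_ is about: the frame count, not the diffusion time.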