Comments (4)
Thank you for the detailed response!
"I think you could try applying our unified mask modeling; then VDT is capable of performing zero-shot extrapolation across any spatial-temporal dimension."
To clarify, the unified mask approach must be applied during training to enable the various tasks (including zero-shot extrapolation) at inference, correct? Do you plan to release details on how you modulated the spatial-temporal mask (e.g., the frame-dropout probability)?
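The thread does not say how VDT samples its masks during training. As a minimal sketch only, assuming a per-sample random choice among a few conditioning tasks with made-up uniform probabilities (`TASK_PROBS` and `sample_training_mask` are hypothetical names, not the VDT code), a training-time mask sampler could look like this. Drawing the number of conditioning frames `k` at random, instead of fixing it, is one plausible way a single model could learn to handle different amounts of context.

```python
import numpy as np

# Hypothetical task mix and probabilities: the thread does not state what VDT uses.
TASK_PROBS = {
    "unconditional": 0.25,  # no clean tokens: plain generation
    "prediction": 0.25,     # the first k frames are clean
    "interpolation": 0.25,  # the first and last frames are clean
    "spatial": 0.25,        # one random box is clean in every frame
}

def sample_training_mask(rng, t, h, w):
    """Sample a task and a (t, h, w) boolean mask; True marks a clean conditioning token.

    Hypothetical helper for illustration, not the VDT implementation. Requires t >= 2.
    """
    tasks = list(TASK_PROBS)
    task = tasks[rng.choice(len(tasks), p=list(TASK_PROBS.values()))]
    mask = np.zeros((t, h, w), dtype=bool)
    if task == "prediction":
        k = rng.integers(1, t)  # number of conditioning frames, drawn from 1 .. t-1
        mask[:k] = True
    elif task == "interpolation":
        mask[[0, t - 1]] = True
    elif task == "spatial":
        bh, bw = rng.integers(1, h + 1), rng.integers(1, w + 1)  # box height and width
        y0, x0 = rng.integers(0, h - bh + 1), rng.integers(0, w - bw + 1)  # box corner
        mask[:, y0:y0 + bh, x0:x0 + bw] = True
    return task, mask

rng = np.random.default_rng(0)
task, mask = sample_training_mask(rng, t=16, h=8, w=8)
```

Tokens marked `True` would be fed to the model clean, and the loss would typically be computed only on the remaining, noised tokens.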
from vdt.
Thank you for your interest in VDT. The drop in performance you observed when fewer frames are used as the condition arises because the released model was trained with a fixed number of conditioning frames (8). We have also found that the model can be extended to condition on more than 8 frames, as shown in Appendix Figure 8. The term "any length" was somewhat ambiguous: it refers specifically to any number of conditioning frames greater than 8. We have revised the text to clarify this point.
As for your second question, "Would the model need to be trained with only 1 conditioning frame?": I think you could try applying our unified mask modeling; with it, VDT is capable of performing zero-shot extrapolation across any spatial-temporal dimension.
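To make the idea concrete, here is a minimal sketch of generic mask-based conditioning at inference time, under stated assumptions: the mask is applied to VAE latents of shape `(T, C, H, W)` (this thread leaves open whether VDT masks latents or pixels), a value of 1 marks a clean conditioning token, and `frame_mask` and `mix_condition` are hypothetical helpers rather than the VDT API. Switching tasks changes only the mask, not the model, which is the sense in which one trained network could serve prediction from a single frame or interpolation between two.

```python
import numpy as np

def frame_mask(t, h, w, cond_frames):
    """Build a (t, 1, h, w) float mask: 1.0 on the given conditioning frames, 0.0 elsewhere.

    The singleton channel axis lets the mask broadcast over latent channels.
    Hypothetical helper, not part of the VDT code base.
    """
    mask = np.zeros((t, 1, h, w), dtype=np.float32)
    mask[list(cond_frames)] = 1.0
    return mask

def mix_condition(x_cond, x_noisy, mask):
    """Keep clean conditioning tokens where mask == 1 and noisy tokens elsewhere."""
    return mask * x_cond + (1.0 - mask) * x_noisy

# Example: 16 frames of 4-channel latents on a 32x32 grid (assumed sizes).
rng = np.random.default_rng(0)
t, c, h, w = 16, 4, 32, 32
x_cond = rng.standard_normal((t, c, h, w)).astype(np.float32)   # stand-in for encoded clip latents
x_noisy = rng.standard_normal((t, c, h, w)).astype(np.float32)  # stand-in for noised latents

pred_mask = frame_mask(t, h, w, cond_frames=[0])           # predict 15 frames from 1
interp_mask = frame_mask(t, h, w, cond_frames=[0, t - 1])  # fill in frames between first and last

x_pred = mix_condition(x_cond, x_noisy, pred_mask)
x_interp = mix_condition(x_cond, x_noisy, interp_mask)
```

In a sampling loop, one common choice is to call `mix_condition` again after every denoising step so the conditioning frames stay clean throughout.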
Additionally, can you please clarify if the unified mask modeling is applied in the image or latent (VAE) space? It seems your notation suggests it is applied in the image domain (
Hi everyone, my apologies for the late reply; I was quite busy and couldn't get to it earlier. I've now updated the mask modeling, and the update contains the necessary code. Have fun!