snap-research / mocogan-hd

[ICLR 2021 Spotlight] A Good Image Generator Is What You Need for High-Resolution Video Synthesis

License: Other

Python 83.82% C++ 1.33% Cuda 8.76% Shell 6.09%
deep-learning gan image-generation video-generation

mocogan-hd's People

Contributors

alanspike


mocogan-hd's Issues

Incorrect link for the image generator checkpoint on FaceForensics

Hi! Thank you for the project and the codebase! I noticed that for some datasets, links to the pretrained models do not work: e.g. the image generator link on FaceForensics leads to https://github.com/snap-research/MoCoGAN-HD/blob/main/pretrained_models/faceforensics-fid10.9920-snapshot-008765.pt, which does not exist (same for (Anime, VoxCeleb) and (AFHQ, VoxCeleb) cross-domain image generators). Could you please provide a link for the pretrained image generator on FaceForensics?

Augmentation for training?

Hello,

From issue #5 (specifically, the comment quoted below), I understand that DiffAugment is applied when training on the UCF-101 dataset.

Is DiffAugment also applied for the FaceForensics dataset? Like UCF-101, which has only a small number of samples per class, FaceForensics has only 704 training videos, which seems like too little data to train a GAN without augmentation.

Hi @sihyun-yu, have you tried to use the augmentation from this work?

The FID was calculated during training from StyleGAN2.

Originally posted by @alanspike in #5 (comment)
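
For reference, by DiffAugment I mean the usage from the data-efficient-gans repository, i.e. applying the same differentiable augmentation to real and generated frames right before the discriminator. A minimal sketch of what I have in mind (it assumes the DiffAugment_pytorch module from that repository and is not the code of this repo):

import torch.nn.functional as F
from DiffAugment_pytorch import DiffAugment  # from the data-efficient-gans repo

policy = 'color,translation,cutout'

def d_logits(discriminator, frames):
    # frames: (N, C, H, W); the same differentiable augmentation is applied
    # to both real and fake batches before they reach the discriminator.
    return discriminator(DiffAugment(frames, policy=policy))

# d_loss = F.softplus(d_logits(D, fake)).mean() + F.softplus(-d_logits(D, real)).mean()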

Thanks,

Inference issue using pre-trained models

Hi,
Great Work!
I was using the pretrained models for inference on the SkyTimelapse and UCF-101 datasets. However, in both cases gray videos are generated. I have not made any changes to the code, and there are no errors or warnings. Did you face a similar issue?

video-gen_19_5_noise.mp4
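
Could the problem be in the tensor-to-frame conversion? A gray output can appear when values in [-1, 1] are written out without rescaling. A minimal check (just a sketch, not the repository's code; fake_video stands for the generator output):

import numpy as np

def to_uint8_frames(fake_video):
    # fake_video: tensor of shape (T, C, H, W), values assumed to be in [-1, 1]
    video = fake_video.detach().cpu().numpy()
    video = np.clip((video + 1.0) / 2.0, 0.0, 1.0) * 255.0   # [-1, 1] -> [0, 255]
    return video.astype(np.uint8).transpose(0, 2, 3, 1)      # (T, H, W, C)

# If the per-frame std printed here is near zero, the problem is upstream of the video writer:
# print(to_uint8_frames(fake_video).std(axis=(1, 2, 3)))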

Question about the way you finetune the generator

Dear authors,

I want to ask how you fine-tune the generator. Taking FaceForensics as an example: did you use all cropped frames as the fine-tuning dataset, or only a few frames per identity?

Thanks a lot.

Question about Inception score evaluation

Hello
Thank you for your great work! I read the paper carefully.

I would like to know in detail how the Inception Score on UCF-101 is calculated.
I read that you follow the TGAN paper for evaluating the Inception Score and use a C3D network to obtain the predictions.

Which weights did you use for the C3D network? Did you train it from scratch?
If not, could you point me to the C3D weights and explain how you use the network?

Specifically, in this paper the generated UCF-101 videos are 224x224, but the pretrained C3D network at https://github.com/rezoo/tgan2/releases/download/v1.0/conv3d_deepnetA_ucf.npz was not trained at that resolution. How did you resize and normalize the frames?
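
My current guess at the preprocessing is the standard C3D recipe sketched below; the 112x112 crop and the per-channel mean are assumptions taken from the usual C3D setup, not something confirmed for your evaluation:

import numpy as np
from PIL import Image

C3D_MEAN = np.array([90.0, 98.0, 102.0], dtype=np.float32)  # assumed BGR mean, placeholder values

def preprocess_clip(frames_224):
    # frames_224: list of 16 RGB uint8 frames of shape (224, 224, 3)
    out = []
    for f in frames_224:
        f = np.asarray(Image.fromarray(f).resize((128, 128), Image.BILINEAR), np.float32)
        f = f[8:120, 8:120, ::-1]                 # center-crop to 112x112, RGB -> BGR
        out.append(f - C3D_MEAN)                  # subtract the (assumed) dataset mean
    return np.stack(out).transpose(3, 0, 1, 2)    # (C, T, H, W), as C3D expects

Is this roughly what you did, or did you feed the 224x224 frames directly?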

I would be very grateful if you could reply.
Thanks.

Cannot run pca_stats.py

Hi,

I was previously able to run get_stats_pca.py using the pretrained image generator models you provided. While installing a few more packages into my conda environment, the script stopped running altogether.

I have tried uninstalling and reinstalling the conda environment using the requirements.txt provided in the repository. This is my command:

python get_stats_pca.py --batchSize 4000 --save_pca_path pca_stats/ucf_101 --pca_iterations 250 --latent_dimension 512 --img_g_weights pretrained_checkpoints/ucf-256-fid41.6761-snapshot-006935.pt --style_gan_size 256 --gpu 0

The process just hangs forever: GPU memory usage goes from 0 MB to 3 MB and nothing else happens. I don't know what I could have done wrong; it was working before. As an additional step, I also set up the repository from scratch.
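
For what it's worth, a quick environment sanity check (generic PyTorch, unrelated to the repository's code) would be:

import torch

print(torch.__version__, torch.version.cuda)
print('cuda available:', torch.cuda.is_available())

# A tiny op on the GPU; if this also hangs, the culprit is the environment
# (driver / CUDA mismatch) rather than get_stats_pca.py itself.
x = torch.randn(1024, 1024, device='cuda')
print('matmul ok:', (x @ x).sum().item())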

Any idea what might have happened?

Did you cut first seconds of the FaceForensics dataset?

Hi! FaceForensics contains "video starting" artifacts during the first ~0.5 seconds of many of its videos (see the gif), which might introduce corresponding artifacts during training. Did you remove them?
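
(By removing them I mean something along the lines of the sketch below, which drops the first 0.5 seconds of every clip with ffmpeg; the paths are placeholders.)

import subprocess
from pathlib import Path

SRC = Path('faceforensics/raw')       # placeholder input directory
DST = Path('faceforensics/trimmed')   # placeholder output directory
DST.mkdir(parents=True, exist_ok=True)

for video in SRC.glob('*.mp4'):
    # -ss 0.5 before -i seeks past the first 0.5 s; re-encode for an exact cut, drop audio.
    subprocess.run(['ffmpeg', '-y', '-ss', '0.5', '-i', str(video),
                    '-c:v', 'libx264', '-an', str(DST / video.name)], check=True)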

Here are random samples from FFS, cut to the first 0.5 seconds:

real_ffs_128_unstable_1s

Also, did you account for them in any way when computing FVD?

Question about the FVD evaluation

Hi,

First of all, thank you for your great work!

From the paper, I understand that the FVD is computed from 2048 videos at 128x128 resolution on the UCF-101 dataset.

To evaluate your model on UCF-101, I randomly sampled 2048 real clips (each consisting of 16 consecutive frames) and resized them to 128x128.
Then I computed the FVD between the sampled real and fake videos, roughly as in the sketch below.
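
Concretely, the real-clip sampling looks roughly like this (simplified; load_video_frames is a placeholder for however the UCF-101 frames are read):

import random
import numpy as np
from PIL import Image

NUM_CLIPS, CLIP_LEN, RES = 2048, 16, 128

def sample_real_clips(video_paths, load_video_frames):
    # load_video_frames(path) -> list of RGB uint8 frames (placeholder helper)
    clips = []
    for path in random.choices(video_paths, k=NUM_CLIPS):
        frames = load_video_frames(path)
        start = random.randint(0, len(frames) - CLIP_LEN)   # random 16-frame window
        clip = [np.asarray(Image.fromarray(f).resize((RES, RES), Image.BILINEAR))
                for f in frames[start:start + CLIP_LEN]]
        clips.append(np.stack(clip))
    return np.stack(clips)  # (2048, 16, 128, 128, 3), fed to the FVD (I3D) embedder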

As a result, I got 625.87, which is a little lower than the distance you reported.
I suspect either that my real video samples are built differently from your implementation, or that FVD fluctuates considerably due to the randomness of the sampling.

Could you share the detailed FVD evaluation process for the UCF-101 and FaceForensics datasets?

Thanks,

Question about the cross-domain video discriminator

Hi, thanks for your great work!

I have a question about the cross-domain video discriminator.

According to your paper, the model can learn to synthesize video content from one dataset A (such as Anime-Face) while taking the motion from another dataset B (such as VoxCeleb). In this setting, I would expect the video discriminator to first learn to separate anime content from real-person content rather than to distinguish meaningful motions. How do you ensure that the video discriminator remains helpful during training in this mode?

README should be updated

Some commands in the README do not match the current scripts (see the attached screenshot).

They should be run like this:
sh script/ffhq-vox/run_evaluate_1024.sh and sh script/ffhq-vox/run_get_stats_pca_1024.sh

Usage of UCF-101 dataset

Hi, thank you for sharing the code of your elegant work!

I have a question about the experimental setup for the UCF-101 experiments.
Did you use the "train" split of UCF-101 or the whole dataset without a split?

Thank you in advance!

Sincerely,
Sihyun

How to train on a custom dataset?

I have a custom dataset of face videos from the How2Sign dataset, already in the format required by this repository. What are the steps for training on a custom dataset?

Hyperparameters to train StyleGANv2 on UCF-101

Hello! Thanks again for providing the implementation.
I am trying to train an "unconditional" image generator from scratch on the UCF-101 dataset using StyleGANv2, as you suggested.
Did you use specific hyperparameters to train such a model and reach the reported FID?
If so, could you share those hyperparameters?

Thanks in advance!

Sincerely,
Sihyun

Did you use any truncation or curation for the released samples?

Hi! Could you please tell me whether you used any truncation for the content or motion codes, or curated the samples, for these generations: https://github.com/snap-research/MoCoGAN-HD#faceforensics-1 ? I used your pretrained checkpoint, PCA stats, and the pretrained G to generate samples with --n_frames_G=32 and without spatial noise, and the results feel lower quality than the ones shown in your README.md. Here are the samples I got (sorry for the external link; GitHub for some reason refuses to upload the gif even though it is under 10 MB):

https://i.imgur.com/1QRibnD.mp4

For example, the motion diversity is not great, i.e. the heads do not "speak". Could you tell me why there is such a difference?
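
For reference, by truncation I mean the usual StyleGAN-style trick of pulling latent codes toward their mean before synthesis, roughly as in this sketch (psi and the variable names are placeholders, not options of this repository):

import torch

def truncate(w, w_mean, psi=0.7):
    # psi = 1.0 disables truncation; smaller psi trades diversity for fidelity.
    return w_mean + psi * (w - w_mean)

# e.g. w_mean estimated once from many mapped samples:
# with torch.no_grad():
#     w_mean = mapping(torch.randn(10000, 512)).mean(dim=0, keepdim=True)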
