This project forked from snap-research/mocogan-hd


[ICLR 2021 Spotlight] A Good Image Generator Is What You Need for High-Resolution Video Synthesis

License: Other


MoCoGAN-HD

(AFHQ, VoxCeleb)

PyTorch implementation of our method for high-resolution (e.g. 1024x1024) and cross-domain video synthesis.
A Good Image Generator Is What You Need for High-Resolution Video Synthesis
Yu Tian1, Jian Ren2, Menglei Chai2, Kyle Olszewski2, Xi Peng3, Dimitris N. Metaxas1, Sergey Tulyakov2
1Rutgers University, 2Snap Inc., 3University of Delaware
In ICLR 2021, Spotlight.

Pre-trained Image Generator & Video Datasets

In-domain Video Synthesis

UCF-101: image generator, video data, motion generator
FaceForensics: image generator, video data, motion generator
Sky-Timelapse: image generator, video data, motion generator

Cross-domain Video Synthesis

(FFHQ, VoxCeleb): FFHQ image generator, VoxCeleb, motion generator
(AFHQ, VoxCeleb): AFHQ image generator, VoxCeleb, motion generator
(Anime, VoxCeleb): Anime image generator, VoxCeleb, motion generator
(FFHQ-1024, VoxCeleb): FFHQ-1024 image generator, VoxCeleb, motion generator
(LSUN-Church, TLVDB): LSUN-Church image generator, TLVDB

The calculated PCA statistics are saved here.

Training

Organise the video dataset as follows:

Video dataset
|-- video1
    |-- img_0000.png
    |-- img_0001.png
    |-- img_0002.png
    |-- ...
|-- video2
    |-- img_0000.png
    |-- img_0001.png
    |-- img_0002.png
    |-- ...
|-- video3
    |-- img_0000.png
    |-- img_0001.png
    |-- img_0002.png
    |-- ...
|-- ...
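Before training, it can help to confirm that a dataset follows the layout above. The following is a minimal sketch, not part of this repository: the helper names `list_video_frames` and `find_problems` are hypothetical, and the check assumes frames are named `img_XXXX.png` and numbered from 0 without gaps, as in the tree shown.

```python
import re
from pathlib import Path

# Matches frame names such as img_0000.png, img_0001.png, ...
FRAME_PATTERN = re.compile(r"^img_(\d+)\.png$")


def list_video_frames(dataroot):
    """Map each video folder name to its frame paths, sorted by frame index."""
    videos = {}
    for video_dir in sorted(Path(dataroot).iterdir()):
        if not video_dir.is_dir():
            continue
        frames = []
        for path in video_dir.iterdir():
            match = FRAME_PATTERN.match(path.name)
            if match:
                frames.append((int(match.group(1)), path))
        frames.sort(key=lambda item: item[0])
        videos[video_dir.name] = [path for _, path in frames]
    return videos


def find_problems(videos):
    """Report folders with no frames or with gaps in the frame numbering."""
    problems = []
    for name, frames in videos.items():
        if not frames:
            problems.append(f"{name}: no img_XXXX.png frames found")
            continue
        indices = [int(FRAME_PATTERN.match(p.name).group(1)) for p in frames]
        if indices != list(range(len(indices))):
            problems.append(f"{name}: frame indices are not contiguous from 0")
    return problems
```

Calling `find_problems(list_video_frames("/path/to/ucf_101"))` returns an empty list when every video folder matches the expected layout.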

In-domain Video Synthesis

UCF-101

Collect the PCA components from a pre-trained image generator.

python get_stats_pca.py --batchSize 4000 \
  --save_pca_path pca_stats/ucf_101 \
  --pca_iterations 250 \
  --latent_dimension 512 \
  --img_g_weights /path/to/ucf_101_image_generator \
  --style_gan_size 256 \
  --gpu 0
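For intuition, the idea of collecting PCA components from latent samples can be sketched with NumPy. This is an illustration under stated assumptions, not the code in get_stats_pca.py: the random `latents` array stands in for codes sampled from the image generator, `pca_basis` is a hypothetical helper, and the sketch uses a single SVD rather than whatever iterative scheme `--pca_iterations` controls.

```python
import numpy as np


def pca_basis(latents, n_components):
    """Return (mean, components, variances) of the latent samples via SVD."""
    mean = latents.mean(axis=0)
    centered = latents - mean
    # Rows of vt are orthonormal principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2 / (len(latents) - 1)
    return mean, vt[:n_components], variances[:n_components]


rng = np.random.default_rng(0)
# Stand-in for latent codes sampled from the image generator (assumption):
# 2000 samples in an 8-dimensional space with unequal variance per axis.
scales = np.array([5.0, 3.0, 1.0, 0.5, 0.1, 0.1, 0.1, 0.1])
latents = rng.standard_normal((2000, 8)) * scales

mean, components, variances = pca_basis(latents, n_components=4)
print(components.shape)  # one row per principal direction
```

In the real setting the latent dimension would be 512 (`--latent_dimension`), and the saved statistics would presumably include the mean and the basis so that training can reuse them.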

Train the model

python -W ignore train.py --name ucf_101 \
  --time_step 2 \
  --lr 0.0001 \
  --save_pca_path pca_stats/ucf_101 \
  --latent_dimension 512 \
  --dataroot /path/to/ucf_101 \
  --checkpoints_dir checkpoints/ucf_101 \
  --img_g_weights /path/to/ucf_101_image_generator \
  --multiprocessing_distributed --world_size 1 --rank 0 \
  --batchSize 16 \
  --workers 8 \
  --style_gan_size 256 \
  --total_epoch 100

Inference

# Replace the_epoch_for_testing with the checkpoint epoch to load (must be >= 0).
python -W ignore evaluate.py \
  --save_pca_path pca_stats/ucf_101 \
  --latent_dimension 512 \
  --style_gan_size 256 \
  --img_g_weights /path/to/ucf_101_image_generator \
  --load_pretrain_path /path/to/checkpoints \
  --load_pretrain_epoch the_epoch_for_testing \
  --results results/ucf_101 \
  --num_test_videos 10
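If you launch inference from Python rather than the shell, the same command can be assembled programmatically. The sketch below is a hypothetical wrapper, not part of this repository: `build_evaluate_command` is an illustrative name, and it uses only the flags shown in the command above while checking that the epoch is a non-negative integer.

```python
import shlex


def build_evaluate_command(epoch, pca_path, img_g_weights, checkpoints, results,
                           latent_dimension=512, style_gan_size=256,
                           num_test_videos=10):
    """Assemble the evaluate.py argument list using the flags from this README."""
    if not isinstance(epoch, int) or epoch < 0:
        raise ValueError("load_pretrain_epoch must be an integer >= 0")
    return [
        "python", "-W", "ignore", "evaluate.py",
        "--save_pca_path", pca_path,
        "--latent_dimension", str(latent_dimension),
        "--style_gan_size", str(style_gan_size),
        "--img_g_weights", img_g_weights,
        "--load_pretrain_path", checkpoints,
        "--load_pretrain_epoch", str(epoch),
        "--results", results,
        "--num_test_videos", str(num_test_videos),
    ]


command = build_evaluate_command(
    epoch=0,
    pca_path="pca_stats/ucf_101",
    img_g_weights="/path/to/ucf_101_image_generator",
    checkpoints="/path/to/checkpoints",
    results="results/ucf_101",
)
# Print the shell-quoted command; pass `command` to subprocess.run to execute it.
print(shlex.join(command))
```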

FaceForensics

Collect the PCA components from a pre-trained image generator.

sh script/faceforensics/run_get_stats_pca.sh

Train the model

sh script/faceforensics/run_train.sh

Inference

sh script/faceforensics/run_evaluate.sh

Sky-Timelapse

Collect the PCA components from a pre-trained image generator.

sh script/sky_timelapse/run_get_stats_pca.sh

Train the model

sh script/sky_timelapse/run_train.sh

Inference

sh script/sky_timelapse/run_evaluate.sh

Cross-domain Video Synthesis

(FFHQ, VoxCeleb)

Collect the PCA components from a pre-trained image generator.

python get_stats_pca.py --batchSize 4000 \
  --save_pca_path pca_stats/ffhq_256 \
  --pca_iterations 250 \
  --latent_dimension 512 \
  --img_g_weights /path/to/ffhq_image_generator \
  --style_gan_size 256 \
  --gpu 0

Train the model

python -W ignore train.py --name ffhq_256-voxel \
  --time_step 2 \
  --lr 0.0001 \
  --save_pca_path pca_stats/ffhq_256 \
  --latent_dimension 512 \
  --dataroot /path/to/voxel_dataset \
  --checkpoints_dir checkpoints \
  --img_g_weights /path/to/ffhq_image_generator \
  --multiprocessing_distributed --world_size 1 --rank 0 \
  --batchSize 16 \
  --workers 8 \
  --style_gan_size 256 \
  --total_epoch 25 \
  --cross_domain

Inference

# Replace the_epoch_for_testing with the checkpoint epoch to load (must be >= 0).
python -W ignore evaluate.py \
  --save_pca_path pca_stats/ffhq_256 \
  --latent_dimension 512 \
  --style_gan_size 256 \
  --img_g_weights /path/to/ffhq_image_generator \
  --load_pretrain_path /path/to/checkpoints \
  --load_pretrain_epoch the_epoch_for_testing \
  --results results/ffhq_256 \
  --num_test_videos 10

(FFHQ-1024, VoxCeleb)

Collect the PCA components from a pre-trained image generator.

sh script/ffhq-vox/run_get_stats_pca_1024.sh

Train the model

sh script/ffhq-vox/run_train_1024.sh

Inference

sh script/ffhq-vox/run_evaluate_1024.sh

(AFHQ, VoxCeleb)

Collect the PCA components from a pre-trained image generator.

sh script/afhq-vox/run_get_stats_pca.sh

Train the model

sh script/afhq-vox/run_train.sh

Inference

sh script/afhq-vox/run_evaluate.sh

(Anime, VoxCeleb)

Collect the PCA components from a pre-trained image generator.

sh script/anime-vox/run_get_stats_pca.sh

Train the model

sh script/anime-vox/run_train.sh

Inference

sh script/anime-vox/run_evaluate.sh

(LSUN-Church, TLVDB)

Collect the PCA components from a pre-trained image generator.

sh script/lsun_church-tlvdb/run_get_stats_pca.sh

Train the model

sh script/lsun_church-tlvdb/run_train.sh

Inference

sh script/lsun_church-tlvdb/run_evaluate.sh

Fine-tuning

If you wish to resume interrupted training or fine-tune a pre-trained model, run the following (using UCF-101 as an example):

python -W ignore train.py --name ucf_101 \
  --time_step 2 \
  --lr 0.0001 \
  --save_pca_path pca_stats/ucf_101 \
  --latent_dimension 512 \
  --dataroot /path/to/ucf_101 \
  --checkpoints_dir checkpoints \
  --img_g_weights /path/to/ucf_101_image_generator \
  --multiprocessing_distributed --world_size 1 --rank 0 \
  --batchSize 16 \
  --workers 8 \
  --style_gan_size 256 \
  --total_epoch 100 \
  --load_pretrain_path /path/to/checkpoints \
  --load_pretrain_epoch 0

Training Control With Options

--w_residual: step size of the motion residual. Default: 0.2; we recommend values <= 0.5.
--n_pca: number of PCA bases used in the motion residual calculation. Default: 384 (out of the 512 dimensions of the StyleGAN2 w space); we recommend values >= 256.
--q_len: size of the queue that stores the logits used in the contrastive loss. Default: 4,096.
--video_frame_size: spatial size of the video frames used for training. All synthesized video clips are down-sampled to this size before being fed to the video discriminator. Default: 128; a larger size may lead to better motion modeling.
--cross_domain: enables cross-domain video synthesis. Default: False.
--w_match: weight of the feature matching loss. Default: 1.0; a larger value improves content matching.
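To make the roles of `--w_residual` and `--n_pca` more concrete, here is one plausible reading of a motion residual step, written in plain Python. It is a sketch under assumptions, not the training code: the function name `apply_motion_residual` is hypothetical, and the update rule (previous latent plus a scaled combination of PCA directions) is inferred from the option descriptions above.

```python
def apply_motion_residual(w_prev, coefficients, pca_components, w_residual=0.2):
    """Move a latent code along the leading PCA directions.

    w_prev:         latent code of the previous frame (list of floats)
    coefficients:   one scalar per PCA direction used (n_pca values)
    pca_components: PCA directions, each the same length as w_prev
    w_residual:     step size applied to the residual
    """
    w_next = list(w_prev)
    # zip stops at the shorter input, so only len(coefficients) directions are used.
    for coeff, direction in zip(coefficients, pca_components):
        for i, value in enumerate(direction):
            w_next[i] += w_residual * coeff * value
    return w_next


# Toy example: a 3-dimensional latent space with two PCA directions.
components = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w_next = apply_motion_residual([0.0, 0.0, 0.0], [2.0, -1.0], components, w_residual=0.5)
print(w_next)  # [1.0, -0.5, 0.0]
```

Under this reading, a smaller `w_residual` keeps consecutive frames closer in latent space, and a larger `n_pca` lets the motion use more directions of the latent space.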

Long Sequence Generation

LSTM Unrolling

During inference, you can generate long sequences by LSTM unrolling with --n_frames_G:

python -W ignore evaluate.py  \
  --save_pca_path pca_stats/ffhq_256 \
  --latent_dimension 512 \
  --style_gan_size 256 \
  --img_g_weights /path/to/ffhq_image_generator \
  --load_pretrain_path /path/to/checkpoints \
  --load_pretrain_epoch 0 \
  --n_frames_G 32
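Unrolling means running the recurrent cell for more steps than were used in training, carrying the state forward at each step. The sketch below shows the idea in plain Python with a toy cell. It is an assumption-laden illustration: `toy_cell` stands in for the LSTM, `unroll` is a hypothetical helper, and feeding each output back as the next input is chosen for simplicity; it is not stated here how the motion generator forms its per-step input.

```python
import math


def toy_cell(x, state):
    """Stand-in for an LSTM cell (assumption): returns (output, new_state)."""
    new_state = [math.tanh(0.5 * s + 0.5 * xi) for s, xi in zip(state, x)]
    return new_state, new_state


def unroll(cell, first_input, initial_state, n_frames):
    """Run the cell n_frames times, feeding each output back as the next input."""
    outputs, x, state = [], first_input, initial_state
    for _ in range(n_frames):
        x, state = cell(x, state)
        outputs.append(x)
    return outputs


frames_16 = unroll(toy_cell, [1.0, -1.0], [0.0, 0.0], n_frames=16)
frames_32 = unroll(toy_cell, [1.0, -1.0], [0.0, 0.0], n_frames=32)
print(len(frames_16), len(frames_32))  # 16 32
```

Because the cell is deterministic here, the first 16 steps of the longer rollout match the shorter one; the longer sequence simply continues from the same state.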

Interpolation

During inference, you can also generate long sequences by interpolation with --interpolation:

python -W ignore evaluate.py  \
  --save_pca_path pca_stats/ffhq_256 \
  --latent_dimension 512 \
  --style_gan_size 256 \
  --img_g_weights /path/to/ffhq_image_generator \
  --load_pretrain_path /path/to/checkpoints \
  --load_pretrain_epoch 0 \
  --interpolation
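One common way to lengthen a sequence by interpolation is to insert linearly blended latent codes between consecutive frames. The sketch below illustrates that idea in plain Python. Whether evaluate.py interpolates in exactly this way is not stated here, and `interpolate_latents` is a hypothetical helper name.

```python
def interpolate_latents(latents, steps_between):
    """Insert `steps_between` linearly blended codes between consecutive latents."""
    if steps_between < 0:
        raise ValueError("steps_between must be >= 0")
    if not latents:
        return []
    result = []
    for start, end in zip(latents, latents[1:]):
        for k in range(steps_between + 1):
            t = k / (steps_between + 1)  # t = 0 reproduces `start`
            result.append([(1 - t) * a + t * b for a, b in zip(start, end)])
    result.append(list(latents[-1]))  # keep the final code unchanged
    return result


# Two latent codes with one blended code inserted between them.
sequence = interpolate_latents([[0.0, 0.0], [1.0, 2.0]], steps_between=1)
print(sequence)  # [[0.0, 0.0], [0.5, 1.0], [1.0, 2.0]]
```

With N input codes and k inserted codes per gap, the output has (N - 1) * (k + 1) + 1 codes, each of which would then be decoded by the image generator into a frame.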

Examples of Generated Videos

UCF-101

FaceForensics

Sky Timelapse

(FFHQ, VoxCeleb)

(FFHQ-1024, VoxCeleb)

(Anime, VoxCeleb)

(LSUN-Church, TLVDB)

Citation

If you use the code for your work, please cite our paper.

@inproceedings{
tian2021a,
title={A Good Image Generator Is What You Need for High-Resolution Video Synthesis},
author={Yu Tian and Jian Ren and Menglei Chai and Kyle Olszewski and Xi Peng and Dimitris N. Metaxas and Sergey Tulyakov},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=6puCSjH3hwA}
}

Acknowledgments

This code borrows from the StyleGAN2 image generator, the BigGAN discriminator, and the PatchGAN discriminator.
