Latent Video Transformer

Code for the paper Latent Video Transformer.

Preparation

The training routine code is based on detectron2.

Run this command after cloning the repository.

python setup.py build develop

Inference on the pretrained model

Download the pretrained model here: https://yadi.sk/d/8QjrPTcxznrqNg

To run inference, use the command:

CUDA_VISIBLE_DEVICES=<gpus> python scripts/generate_videos.py --video-dir ./example --config-file configs/vt/DSFVT.yaml MODEL.GENERATOR.WEIGHTS pretrained/DSFVT/netG/model_final.pth OUTPUT_DIR ./example/sample

It takes the following parameters:

  • video-dir: folder containing the priming frames
  • config-file: config file for the specific type of LVT model
  • any other parameters inside the config file that you would like to change, given as KEY VALUE pairs at the end of the command
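
The trailing KEY VALUE pairs in the command above (for example MODEL.GENERATOR.WEIGHTS and OUTPUT_DIR) follow the detectron2 convention of overriding config entries from the command line. The sketch below only illustrates the idea on a plain nested dict; apply_overrides is a hypothetical helper and not part of this repository or of detectron2, and the starting config values are made up for the example.

```python
from ast import literal_eval


def apply_overrides(config, overrides):
    """Apply a flat [KEY, VALUE, KEY, VALUE, ...] list to a nested dict in place."""
    if len(overrides) % 2 != 0:
        raise ValueError("overrides must be given as KEY VALUE pairs")
    for key, raw in zip(overrides[0::2], overrides[1::2]):
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            node = node[name]  # KeyError if the section does not exist
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        try:
            value = literal_eval(raw)  # "64" becomes the integer 64
        except (ValueError, SyntaxError):
            value = raw  # plain strings such as file paths stay as they are
        node[leaf] = value
    return config


# Illustrative starting config (assumed values, not the contents of DSFVT.yaml).
config = {"MODEL": {"GENERATOR": {"WEIGHTS": ""}}, "OUTPUT_DIR": "./output"}
apply_overrides(config, [
    "MODEL.GENERATOR.WEIGHTS", "pretrained/DSFVT/netG/model_final.pth",
    "OUTPUT_DIR", "./example/sample",
])
print(config["OUTPUT_DIR"])
```

Keys use dots to address nested sections, so any entry of the config file can be changed without editing the file itself.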

Datasets

BAIR

Download the dataset:

wget http://rail.eecs.berkeley.edu/datasets/bair_robot_pushing_dataset_v0.tar -P ./bair
tar -xvf ./bair/bair_robot_pushing_dataset_v0.tar -C ./bair

Preprocess the dataset:

python ./scripts/convert_bair.py --data_dir ./bair

Kinetics-600

The Kinetics-600 dataset is distributed as a set of links to YouTube videos.

Download the links:

mkdir ./kinetics/
wget https://storage.googleapis.com/deepmind-media/Datasets/kinetics600.tar.gz -P ./kinetics/
tar -xvf ./kinetics/kinetics600.tar.gz -C ./kinetics/
rm ./kinetics/kinetics600.tar.gz

Download data from YouTube:

python ./scripts/download_kinetics.py ./kinetics/kinetics600/train.csv ./kinetics/kinetics600/train_vid --trim --num-jobs 1
python ./scripts/download_kinetics.py ./kinetics/kinetics600/test.csv ./kinetics/kinetics600/test_vid --trim --num-jobs 1

Note that YouTube can block you from downloading videos, so avoid downloading many videos simultaneously (the commands above use --num-jobs 1).

Preprocessing of videos includes:

  1. Trimming videos to the specified 10-second range
  2. Converting videos to PNG files
  3. Center-cropping each image

python ./scripts/convert_kinetics.py --video_dir ./kinetics/kinetics600/train --output_dir ./kinetics/kinetics600/train_frames --num_jobs 5 --img_size 64
python ./scripts/convert_kinetics.py --video_dir ./kinetics/kinetics600/test --output_dir ./kinetics/kinetics600/test_frames --num_jobs 5 --img_size 64

The preprocessing script stores the images in the train_frames and test_frames folders.
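
As an illustration of the center-crop step, the sketch below computes the crop box for a frame. It assumes the crop is the largest centered square, which would then be resized to the --img_size value; convert_kinetics.py may do this differently, and center_crop_box is a hypothetical name.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


# A 340x256 frame keeps its full height and loses 42 columns on each side.
print(center_crop_box(340, 256))  # (42, 0, 298, 256)
```

The box uses the (left, top, right, bottom) order that most image libraries expect for cropping.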

VQVAE

Training

To train the VQVAE, run the following command. To modify parameters, change them in the config file configs/vqvae/PR-DVQVAE2.yaml.

CUDA_VISIBLE_DEVICES=<gpus> python tools/train_net.py --config-file configs/vqvae/PR-DVQVAE2.yaml --num-gpus <number of gpus> OUTPUT_DIR experiments/PR-DVQVAE2

Codes sampling

After training the VQVAE, run code extraction on the training data:

CUDA_VISIBLE_DEVICES=<gpus> python tools/train_net.py --eval-only --config-file configs/vqvae/PR-DVQVAE2.yaml OUTPUT_DIR experiments/PR-DVQVAE2 TEST.EVALUATORS "CodesExtractor" DATASETS.TEST "kinetics_train_seq"
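
Conceptually, code extraction replaces each encoder output vector with the index of its nearest codebook entry, which is the standard VQ-VAE quantization step. The sketch below shows that lookup in plain Python on toy 2-D vectors; it is not the repository's CodesExtractor, and the codebook and encoder outputs are made-up values.

```python
def quantize(vectors, codebook):
    """Return, for each vector, the index of the nearest codebook entry."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]


# Toy codebook with three entries and three encoder outputs (assumed values).
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
encoder_outputs = [(0.1, -0.2), (0.9, 1.2), (0.2, 1.9)]
codes = quantize(encoder_outputs, codebook)
print(codes)  # [0, 1, 2]
```

The resulting integer codes are the discrete sequences that the Latent Transformer is later trained on.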

Train Latent Transformer

The Latent Transformer is trained on codes extracted with the VQVAE, so train it only after VQVAE training and code extraction have finished.

Note that the config file should specify the datasets of latent codes:

DATASETS:
  TRAIN: ("prdvqvae_train",)
  TEST: ("prdvqvae_test",)

To specify the path to the codes, modify the file vidgen/data/datasets/builtin.py:

register_latents("prdvqvae_train", "datasets/prdvqvae2/inference/bair_train_seq")
register_latents("prdvqvae_test", "datasets/prdvqvae2/inference/bair_test_seq")

register_kinetics_latents("kdvqvae_train", "datasets/K-DVQVAE/inference/kinetics_train_seq")
register_kinetics_latents("kdvqvae_test", "datasets/K-DVQVAE/inference/kinetics_test_seq")
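
The names listed under DATASETS in the config and the paths registered in builtin.py are connected by name. The sketch below pictures that link as a simple name-to-path registry; register_latents_sketch and resolve_datasets are hypothetical stand-ins, and the real register_latents may store more than a path.

```python
# Hypothetical stand-in for the registry behind register_latents;
# the real function in vidgen/data/datasets/builtin.py may work differently.
_LATENT_ROOTS = {}


def register_latents_sketch(name, root):
    """Map a dataset name used in the config to a directory of extracted codes."""
    if name in _LATENT_ROOTS:
        raise ValueError(f"dataset '{name}' is already registered")
    _LATENT_ROOTS[name] = root


def resolve_datasets(names):
    """Turn the names listed under DATASETS.TRAIN or DATASETS.TEST into paths."""
    missing = [n for n in names if n not in _LATENT_ROOTS]
    if missing:
        raise KeyError(f"unregistered datasets: {missing}")
    return [_LATENT_ROOTS[n] for n in names]


register_latents_sketch("prdvqvae_train", "datasets/prdvqvae2/inference/bair_train_seq")
register_latents_sketch("prdvqvae_test", "datasets/prdvqvae2/inference/bair_test_seq")

print(resolve_datasets(("prdvqvae_train",)))
```

A name used in the config but never registered fails early, which is the usual symptom of a mismatch between the config file and builtin.py.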

Then start training:

CUDA_VISIBLE_DEVICES=<gpus> python tools/train_net.py --config-file configs/vt/DSFVT.yaml --num-gpus 1 OUTPUT_DIR experiments/vt/DSFVT
