ATM-VFI: Exploiting Attention-to-Motion via Transformer for Versatile Video Frame Interpolation

This repository presents ATM-VFI, a versatile video frame interpolation (VFI) method that uses the Attention-to-Motion (ATM) module to formulate motion estimation intuitively.

Paper: (Under review)
Video demo: YouTube

Architecture Overview

Attention-to-Motion

Dependencies

We provide the dependencies in requirements.txt.

Example Usage

import torch
import cv2
from network.network_base import Network # or use network.network_lite 
from demo_2x import load_model_checkpoint, inference_2frame

# initialize model
model = Network()
load_model_checkpoint(model, 'path_to_checkpoint')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device).eval()

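# prepare data and run inference
img0 = cv2.imread("path_to_frame0")
img1 = cv2.imread("path_to_frame1")
# cv2.imread returns None (no exception) if a file cannot be read
assert img0 is not None and img1 is not None, "failed to read input frames"
# cv2.imread returns BGR images, hence isBGR=True; see inference_2frame() in demo_2x.py for details
pred = inference_2frame(img0, img1, model, isBGR=True)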

Demo Script

For 2x interpolation, run one of the commands below. Add the --global_off flag to disable global motion estimation.

Input: 2 frames

python3 demo_2x.py --model_type <select base or lite> --ckpt <path to model checkpoint> --frame0 <path to frame 0> --frame1 <path to frame 1> --out <path to output frame>

Input: mp4 video

python3 demo_2x.py --model_type <select base or lite> --ckpt <path to model checkpoint> --video <path to .mp4 file>

Add the --combine_video flag to combine the original input video with the processed video.
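To make the frame-rate arithmetic concrete: 2x interpolation inserts one synthesized frame between every pair of consecutive input frames, so N input frames become 2N - 1 output frames and the frame rate doubles (for example, 24 fps to 48 fps). The sketch below illustrates only this interleaving. The names `interpolate_2x` and `average` are hypothetical, and the averaging stand-in is neither the ATM-VFI model nor the actual logic of demo_2x.py.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def interpolate_2x(frames: Sequence[Frame],
                   midpoint: Callable[[Frame, Frame], Frame]) -> List[Frame]:
    """Insert one synthesized frame between each pair of consecutive frames."""
    if not frames:
        return []
    out: List[Frame] = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        out.append(midpoint(prev, nxt))  # synthesized in-between frame
        out.append(nxt)                  # original frame
    return out


def average(a: float, b: float) -> float:
    # Hypothetical stand-in for a learned interpolator (assumption for illustration).
    return (a + b) / 2


if __name__ == "__main__":
    frames = [0.0, 10.0, 20.0]                 # three toy "frames"
    doubled = interpolate_2x(frames, average)
    print(len(frames), "->", len(doubled))     # 2 * 3 - 1 = 5 frames
```

In a real pipeline, `midpoint` would wrap a call such as `inference_2frame(img0, img1, model, isBGR=True)` from the Example Usage section.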

Example: 2x interpolation comparison (24 fps vs. 48 fps)

output_interpolation_combine_resize2.mov

Pretrained checkpoints

We will release the checkpoints after the final paper decision.

Version	Link	Param (M)
Base	TBA	51.56
Lite	TBA	11.98
Pct	TBA	51.56

Evaluation

We evaluate our method using the benchmark scripts provided by RIFE, EMA-VFI and AMT for consistency.
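Frame interpolation benchmarks of this kind typically report PSNR (and often SSIM) between the predicted and ground-truth middle frame. As a rough illustration of the metric only, here is a minimal PSNR sketch. It assumes 8-bit pixel values flattened into plain sequences, the function name `psnr` is hypothetical, and the actual benchmark scripts may compute the metric differently (for example, on normalized tensors).

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float],
         max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    # Every pixel is off by exactly 1, so the MSE is 1.
    print(psnr([10, 20, 30], [11, 21, 31]))
```

Higher values indicate a prediction closer to the ground truth; identical inputs give infinity.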

Vimeo90K

cd benchmark
python3 test_vimeo90k.py --path <path to Vimeo90K dataset folder> --ckpt <path to model checkpoint>

UCF101

cd benchmark
python3 test_ucf101.py --path <path to UCF101 dataset folder> --ckpt <path to model checkpoint>

SNU-FILM

cd benchmark
python3 test_snufilm.py --path <path to SNU-FILM dataset txt> --img_data_path <path to SNU-FILM dataset image folder> --ckpt <path to model checkpoint>

Xiph

cd benchmark
python3 test_xiph.py --root <path to Xiph dataset folder> --ckpt <path to model checkpoint>

Training/Fine-tuning

The training procedure described in our paper has four phases: the first two use train.py and trainer.py, and the last two use finetune.py and finetune_trainer.py.

Phase 1: run train.py with the dataset argument set to vimeo90k. The other training hyperparameters (batch size, learning rate, number of epochs, etc.) can be set as you wish. Reminder: make sure to uncomment model.global_motion = False.
Phase 2: run train.py with the dataset argument set to X4k, set the variable isLoadCheckpoint to True, and change param to the Phase 1 checkpoint. Reminder: make sure to uncomment model.global_motion = True and model.__freeze_local_motion__().
Phase 3: run finetune.py and change param to the Phase 2 checkpoint. For further tweaking, please refer to the source code.
Phase 4: run finetune.py and change param to the Phase 3 checkpoint.
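The four phases above form a chain in which each phase starts from the previous phase's checkpoint. The sketch below restates that chain as data, which can be handy when scripting the runs. The `Phase` class, its field names, and `check_chain` are hypothetical helpers and are not part of this repository. The dataset is left as None where the text above does not name one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Phase:
    number: int
    script: str
    dataset: Optional[str]     # None where the README does not name a dataset
    init_from: Optional[int]   # phase whose checkpoint is loaded via `param`


PHASES: List[Phase] = [
    Phase(1, "train.py", "vimeo90k", None),
    Phase(2, "train.py", "X4k", 1),
    Phase(3, "finetune.py", None, 2),
    Phase(4, "finetune.py", None, 3),
]


def check_chain(phases: List[Phase]) -> bool:
    """True if phase 1 starts from scratch and every later phase loads its predecessor."""
    if not phases or phases[0].init_from is not None:
        return False
    return all(p.init_from == p.number - 1 for p in phases[1:])


if __name__ == "__main__":
    for p in PHASES:
        source = "scratch" if p.init_from is None else f"phase {p.init_from} checkpoint"
        print(f"Phase {p.number}: {p.script} (from {source})")
    print(check_chain(PHASES))
```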

Citation

TBA

Acknowledgement

Thanks to EMA-VFI, AMT, RIFE, XVFI, vgg_perceptual_loss, and GMFlow for releasing their source code.
