ATM-VFI: Exploiting Attention-to-Motion via Transformer for Versatile Video Frame Interpolation

This repository presents ATM-VFI, a versatile video frame interpolation (VFI) method that uses the Attention-to-Motion (ATM) module to formulate motion estimation intuitively.

  • Paper: (Under review)
  • Video demo: YouTube

Architecture Overview

[Figure: ATM-VFI architecture overview]

Attention-to-Motion

[Figures: Attention-to-Motion module]

Dependencies

The dependencies are listed in requirements.txt and can be installed with pip install -r requirements.txt.

Example Usage

import cv2
import torch

from network.network_base import Network  # or import Network from network.network_lite
from demo_2x import load_model_checkpoint, inference_2frame

# Initialize the model and load the checkpoint
model = Network()
load_model_checkpoint(model, 'path_to_checkpoint')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device).eval()

# Read the two input frames; cv2.imread returns BGR arrays, or None if a file cannot be read
img0 = cv2.imread("path_to_frame0")
img1 = cv2.imread("path_to_frame1")
assert img0 is not None and img1 is not None, "failed to read an input frame"

# Run inference; see inference_2frame() in demo_2x.py for details
pred = inference_2frame(img0, img1, model, isBGR=True)

Demo Script

For 2x interpolation, run one of the commands below. Add the --global_off flag to disable global motion estimation.

  • input: 2 frames
    python3 demo_2x.py --model_type <select base or lite> --ckpt <path to model checkpoint> --frame0 <path to frame 0> --frame1 <path to frame 1> --out <path to output frame>
    
  • input: mp4 video
    python3 demo_2x.py --model_type <select base or lite> --ckpt <path to model checkpoint> --video <path to .mp4 file>
    

    Add the --combine_video flag to combine the original input video and the processed video.

Example: 2x interpolation comparison (24 fps vs. 48 fps)

output_interpolation_combine_resize2.mov
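
To make the frame-rate doubling concrete, here is a minimal sketch of how 2x interpolation interleaves synthesized frames with the originals. It is not the code in demo_2x.py: interpolate_2x and blend are hypothetical names, and blend is a plain per-pixel average standing in for the model. In this repository, the role of interpolate_pair would presumably be played by a call such as inference_2frame(img0, img1, model, isBGR=True).

```python
def interpolate_2x(frames, interpolate_pair):
    """Insert one synthesized frame between every pair of consecutive frames."""
    if not frames:
        return []
    output = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        output.append(interpolate_pair(prev, nxt))  # synthesized middle frame
        output.append(nxt)                          # original frame
    return output


def blend(a, b):
    """Stand-in interpolator: per-pixel average. NOT the ATM-VFI model."""
    return [(x + y) / 2 for x, y in zip(a, b)]


# Three toy "frames" with two pixels each
frames = [[0, 0], [10, 20], [20, 40]]
doubled = interpolate_2x(frames, blend)
print(len(frames), "->", len(doubled))
```

N input frames yield 2N - 1 output frames, so playing the result at twice the original frame rate (for example 48 fps instead of 24 fps) keeps the clip duration essentially unchanged.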

Pretrained checkpoints

We will release the checkpoints after the final paper decision.

| Version | Link | Param (M) |
|---------|------|-----------|
| Base    | TBA  | 51.56     |
| Lite    | TBA  | 11.98     |
| Pct     | TBA  | 51.56     |
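
The Param (M) column gives the parameter count in millions. As a rough sketch of how such a figure is obtained, the hypothetical helper below sums the element counts of a list of tensor shapes. The shapes are made up for illustration and are not ATM-VFI layers; with PyTorch, they would come from the model itself, as noted in the comment.

```python
from math import prod


def count_parameters_m(shapes):
    """Return the total number of elements across the given tensor shapes, in millions."""
    return sum(prod(shape) for shape in shapes) / 1e6


# With PyTorch, the shapes would be collected from a model, e.g.:
#   shapes = [tuple(p.shape) for p in model.parameters()]
# Hypothetical shapes for illustration only (not the ATM-VFI layers):
example_shapes = [(64, 3, 3, 3), (64,), (128, 64, 3, 3), (128,)]
print(f"{count_parameters_m(example_shapes):.4f} M")
```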

Evaluation

We evaluate our method using the benchmark scripts provided by RIFE, EMA-VFI and AMT for consistency.

  • Vimeo90K
    cd benchmark
    python3 test_vimeo90k.py --path <path to Vimeo90K dataset folder> --ckpt <path to model checkpoint>
    
  • UCF101
    cd benchmark
    python3 test_ucf101.py --path <path to UCF101 dataset folder> --ckpt <path to model checkpoint>
    
  • SNU-FILM
    cd benchmark
    python3 test_snufilm.py --path <path to SNU-FILM dataset txt> --img_data_path <path to SNU-FILM dataset image folder> --ckpt <path to model checkpoint>
    
  • Xiph
    cd benchmark
    python3 test_xiph.py --root <path to Xiph dataset folder> --ckpt <path to model checkpoint>
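
This README does not say which metrics the benchmark scripts report. PSNR is a common choice for frame interpolation benchmarks, so the sketch below assumes it and shows how a per-frame score and a dataset average could be computed. The function names are hypothetical, and the flat pixel lists stand in for real image arrays.

```python
import math


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized flat pixel sequences."""
    if len(pred) != len(target):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_psnr(pairs, max_value=255.0):
    """Average PSNR over (prediction, ground truth) pairs, as a dataset-level score."""
    scores = [psnr(p, t, max_value) for p, t in pairs]
    return sum(scores) / len(scores)


# Toy example: one pixel out of four is off by 5 intensity levels
target = [0, 64, 128, 255]
pred = [0, 64, 128, 250]
print(f"{psnr(pred, target):.2f} dB")
```

Higher values mean the interpolated frame is closer to the ground-truth frame; identical frames give an infinite PSNR.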
    

Training/Fine-tuning

The training procedure described in our paper has four phases. The first two use train.py and trainer.py, while the last two use finetune.py and finetune_trainer.py.

  • Phase 1: run train.py with the dataset argument set to vimeo90k. The other training hyperparameters (batch size, learning rate, number of epochs, etc.) can be set freely. Reminder: make sure to uncomment model.global_motion = False.
  • Phase 2: run train.py with the dataset argument set to X4k, set the variable isLoadCheckpoint to True, and point param to the Phase 1 checkpoint. Reminder: make sure to uncomment model.global_motion = True and model.__freeze_local_motion__().
  • Phase 3: run finetune.py with param pointing to the Phase 2 checkpoint. For further tweaking, please refer to the source code.
  • Phase 4: run finetune.py with param pointing to the Phase 3 checkpoint.
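
As a reading aid, the four phases can be written down as a small table in code. This is only a sketch of the schedule described above, not part of the repository. The Phase class and check_chain helper are hypothetical, and fields are left as None wherever this README does not specify a value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Phase:
    number: int
    script: str                    # entry point named in the README
    init_from: Optional[int]       # phase whose checkpoint `param` points to
    dataset: Optional[str]         # None where the README names no dataset
    global_motion: Optional[bool]  # None where the README does not say


PHASES = [
    Phase(1, "train.py", None, "vimeo90k", False),
    Phase(2, "train.py", 1, "X4k", True),  # also uncomments model.__freeze_local_motion__()
    Phase(3, "finetune.py", 2, None, None),
    Phase(4, "finetune.py", 3, None, None),
]


def check_chain(phases):
    """Verify that every phase after the first starts from the previous phase's checkpoint."""
    for prev, cur in zip(phases, phases[1:]):
        if cur.init_from != prev.number:
            raise ValueError(
                f"Phase {cur.number} should load the Phase {prev.number} checkpoint"
            )
    return True


check_chain(PHASES)
for p in PHASES:
    start = "no checkpoint named" if p.init_from is None else f"Phase {p.init_from} checkpoint"
    print(f"Phase {p.number}: {p.script}, starts from: {start}")
```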

Citation

TBA

Acknowledgement

Thanks to EMA-VFI, AMT, RIFE, XVFI, vgg_perceptual_loss, GMFlow for releasing their source code.
