This project is forked from hitachinsk/fgt.

[ECCV 2022] Flow-Guided Transformer for Video Inpainting

Home Page: https://hitachinsk.github.io/publication/2022-10-01-Flow-Guided-Transformer-for-Video-Inpainting

License: MIT License

[Paper] / [ECVA] / [Demo] / [Project page] / [Talk] / [Poster] / [Intro]

This repository contains the implementation of the following paper:

Flow-Guided Transformer for Video Inpainting
Kaidong Zhang, Jingjing Fu and Dong Liu
European Conference on Computer Vision (ECCV), 2022

Is it too tedious to run the code on a local server? Feel free to try our online demo!

  • Colab demo: Open In Colab

⭐ News

  • 2023.04.24: FGT was featured by VALSE, a premier forum for Chinese CV researchers; see the paper report.
  • 2023.01.25: We release FGT++, the journal extension of FGT. In this paper, we reformulate the research motivation and propose further methods for exploiting the guidance of completed optical flows in transformer-based video inpainting, including a flow-guided feature propagation module and a newly designed temporal deformable MHSA in the temporal transformer block. We also explore supervision from the frequency domain for video inpainting. FGT++ achieves greatly improved performance compared with FGT and existing video inpainting baselines.

Overview

We propose a flow-guided transformer that leverages the motion discrepancy exposed by optical flows to instruct attention retrieval in the transformer for high-fidelity video inpainting. More specifically, we design a novel flow completion network that completes the corrupted flows by exploiting the relevant flow features in a local temporal window. With the completed flows, we propagate content across video frames and adopt the flow-guided transformer to synthesize the remaining corrupted regions. We decouple the transformers along the temporal and spatial dimensions, so that we can easily integrate the locally relevant completed flows to instruct spatial attention only. Furthermore, we design a flow-reweight module to precisely control the impact of the completed flows on each spatial transformer. For efficiency, we introduce a window partition strategy to both the spatial and temporal transformers. In the spatial transformer in particular, we design a dual-perspective spatial MHSA, which integrates global tokens into the window-based attention. Extensive experiments demonstrate the effectiveness of the proposed method both qualitatively and quantitatively.
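To give a rough feel for why window partitioning helps efficiency, the sketch below counts query-key pairs for one attention layer over a grid of tokens. It is not taken from the FGT code: the function name, the grid size, the window size, and the number of global tokens are all hypothetical, and it assumes a simplified model in which each token attends to its own non-overlapping window plus a fixed set of shared global tokens.

```python
def attention_pairs(height, width, window=None, global_tokens=0):
    """Count query-key pairs for one attention layer over a height x width token grid.

    window=None means every token attends to every token (global attention).
    Otherwise the grid is split into non-overlapping window x window tiles and
    each token attends to its own tile plus `global_tokens` shared tokens
    (a simplifying assumption, not the exact FGT formulation).
    """
    tokens = height * width
    if window is None:
        return tokens * tokens
    if height % window or width % window:
        raise ValueError("grid must be divisible by the window size")
    keys_per_query = window * window + global_tokens
    return tokens * keys_per_query


# Hypothetical sizes, chosen only for illustration.
full = attention_pairs(60, 108)
windowed = attention_pairs(60, 108, window=12)
with_global = attention_pairs(60, 108, window=12, global_tokens=45)
print(full, windowed, with_global)
```

Under these assumptions, the windowed cost grows linearly with the number of tokens rather than quadratically, and adding a small number of global tokens restores some long-range context at modest extra cost.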

Todo list

  • Update the real-world results.
  • Add Huggingface demo

Prerequisites

  • Linux (we tested our code on Ubuntu 20.04)
  • Anaconda
  • Python 3.6.8
  • PyTorch 1.10.1

To get started, first clone the repo:

git clone https://github.com/hitachinsk/FGT.git

Then, please run the following commands:

conda create -n FGT python=3.6.8
conda activate FGT
pip install -r requirements.txt
pip install imageio-ffmpeg
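After installation, it can be handy to confirm that the active environment matches the versions listed in the prerequisites. The following is a minimal sketch, not part of the repo; the function name `check_environment` is hypothetical, and it only compares against the Python and PyTorch versions stated above.

```python
import importlib
import sys

# Versions copied from the prerequisites list; treat them as the tested setup.
TESTED_PYTHON = (3, 6, 8)
TESTED_TORCH = "1.10.1"


def check_environment(python_version=None, torch_version=None):
    """Return a list of mismatch messages; an empty list means the setup matches."""
    if python_version is None:
        python_version = tuple(sys.version_info[:3])
    if torch_version is None:
        try:
            torch_version = str(importlib.import_module("torch").__version__)
        except ImportError:
            torch_version = "not installed"

    problems = []
    if tuple(python_version) != TESTED_PYTHON:
        problems.append("Python {} found, tested with {}".format(
            ".".join(str(p) for p in python_version),
            ".".join(str(p) for p in TESTED_PYTHON)))
    # PyTorch wheels may append a local build tag such as "+cu113".
    if torch_version.split("+")[0] != TESTED_TORCH:
        problems.append("PyTorch {} found, tested with {}".format(
            torch_version, TESTED_TORCH))
    return problems
```

Calling `check_environment()` with no arguments inspects the running interpreter. A non-empty result does not necessarily mean the code will fail, only that the setup differs from the tested one.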

Quick start

You can try our online demos: Open In Colab

To run the code on a local server, follow the instructions below.

  1. Download the [pre-trained models], and the [data].
    • You can also get access to our models at Hugging Face Spaces
  2. Put the downloaded zip files in the root directory of this project
  3. Run bash prepare_data.sh to unzip the files
  4. Run the object removal demo
cd tool
python video_inpainting.py --path ../data/frames/schoolgirls \
--path_mask ../data/masks/schoolgirls \
--outroot ../data/results/schoolgirls

If everything works, you will find a result.mp4 file in data/results/schoolgirls.

We also provide other video sequences for you to test. All you need to do is change the input and output paths. Have fun!
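Switching sequences by hand means editing three paths at once, so a small helper can reduce mistakes. The sketch below is not part of the repo: the function names are hypothetical, and it assumes the other sequences follow the same `frames/<name>`, `masks/<name>`, and `results/<name>` layout as the schoolgirls example. It uses only the flags shown in the command above.

```python
import subprocess
import sys
from pathlib import PurePosixPath


def object_removal_command(sequence, data_root="../data"):
    """Build the argument list for the object removal demo of one sequence."""
    root = PurePosixPath(data_root)
    return [
        sys.executable, "video_inpainting.py",
        "--path", str(root / "frames" / sequence),
        "--path_mask", str(root / "masks" / sequence),
        "--outroot", str(root / "results" / sequence),
    ]


def run_object_removal(sequence, tool_dir="tool"):
    """Run the demo from the tool directory; raises if the script fails."""
    subprocess.run(object_removal_command(sequence), cwd=tool_dir, check=True)


# Print the command for the sequence used in the quick start.
print(" ".join(object_removal_command("schoolgirls")[1:]))
```

From the repo root, `run_object_removal("schoolgirls")` would then be equivalent to the command in step 4, provided the data has been unzipped.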

Watermark removal

cd tool
python video_inpainting.py --opt configs/watermark_removal.yaml \
--mode watermark_removal \
--path "Path to the video frames" \
--path_mask "Path to the video masks" \
--outroot "Path to save the results"

Video extrapolation (outpainting)

cd tool
python video_inpainting.py --opt configs/video_extrapolation.yaml \
--mode video_extrapolation \
--path "Path to the video frames" \
--outroot "Path to save the results"

Training

Our code follows a two-stage training process. In the first stage, we train the flow completion network (LAFC); in the second, we train the flow-guided transformer model (FGT). We assume you are at the root directory of this repo.

First, please prepare the dataset for training.

  1. Download Youtube-VOS 2018 from the official link.
  2. Unzip the Youtube-VOS files to /myData/youtubevos_frames
  3. Extract the flows from Youtube-VOS with RAFT (or another flow extraction method) and put the extracted files in /myData/youtubevos_flows
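Before starting a long training run, it may help to confirm that every video has extracted flows. The sketch below is a hypothetical check, not part of the repo. It assumes that both directories contain one sub-folder per video with matching names, which the steps above do not specify, so adapt it to the layout your flow extraction actually produces.

```python
from pathlib import Path


def missing_flow_folders(frames_root, flows_root):
    """Return the names of video folders that have frames but no flow folder.

    Assumes one sub-folder per video under both roots (an assumption about
    the layout, not something the repo guarantees).
    """
    frames_root, flows_root = Path(frames_root), Path(flows_root)
    videos = {p.name for p in frames_root.iterdir() if p.is_dir()}
    flows = {p.name for p in flows_root.iterdir() if p.is_dir()}
    return sorted(videos - flows)
```

For the paths used above, the call would be `missing_flow_folders("/myData/youtubevos_frames", "/myData/youtubevos_flows")`; an empty list means every video folder has a counterpart.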

Second, train the LAFC model

cd LAFC
python train.py

Third, train the FGT model. Since LAFC takes multiple flows as input, using it directly would make the I/O cost of training FGT very high. Therefore, we adopt a 2D version of LAFC (with all P3D convolutions replaced by 2D convolutions) to guide the training of FGT. We provide a pretrained model in LAFC/flowCheckpoint. If you want to train this model yourself, run the command below.

python train.py --opt config/train_single.yaml

To train the FGT model, run the command below.

cd ../FGT
python train.py
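The order of the stages and their working directories can be summarized in a short script. This is only a sketch: the function names are hypothetical, it assumes the commands are launched from the repo root, and it reuses exactly the commands shown above without adding any flags.

```python
import os
import subprocess


def training_plan(train_2d_lafc=False):
    """Return (working directory, command) pairs in the order described above."""
    plan = [("LAFC", ["python", "train.py"])]
    if train_2d_lafc:
        # Optional: a pretrained 2D LAFC is already provided in LAFC/flowCheckpoint.
        plan.append(("LAFC", ["python", "train.py", "--opt", "config/train_single.yaml"]))
    plan.append(("FGT", ["python", "train.py"]))
    return plan


def run_training(repo_root=".", train_2d_lafc=False):
    """Run each stage in turn and stop at the first failure."""
    for folder, command in training_plan(train_2d_lafc):
        subprocess.run(command, cwd=os.path.join(repo_root, folder), check=True)
```

Keeping the plan as data makes it easy to print or log the commands before committing to a long run.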

License

This work is licensed under the MIT license. See the LICENSE file for details.

Citation

If our work inspires your research or some parts of the code are useful for your work, please cite our papers:

@inproceedings{zhang2022flow,
  title={Flow-Guided Transformer for Video Inpainting},
  author={Zhang, Kaidong and Fu, Jingjing and Liu, Dong},
  booktitle={European Conference on Computer Vision},
  pages={74--90},
  year={2022},
  organization={Springer}
}
@misc{https://doi.org/10.48550/arxiv.2301.10048,
  doi = {10.48550/ARXIV.2301.10048},
  url = {https://arxiv.org/abs/2301.10048},
  author = {Zhang, Kaidong and Peng, Jialun and Fu, Jingjing and Liu, Dong},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Exploiting Optical Flow Guidance for Transformer-Based Video Inpainting},
  publisher = {arXiv},
  year = {2023},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Our other video inpainting paper (ISVI):

@InProceedings{Zhang_2022_CVPR,
    author    = {Zhang, Kaidong and Fu, Jingjing and Liu, Dong},
    title     = {Inertia-Guided Flow Completion and Style Fusion for Video Inpainting},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {5982-5991}
}

Contact

If you have any questions, please contact us via

Acknowledgement

Some parts of this repo are based on FGVC and Fuseformer. We also adopt RAFT for flow estimation.
