zzh-tech / bit

[CVPR2023] Blur Interpolation Transformer for Real-World Motion from Blur

Home Page: https://zzh-tech.github.io/BiT/

License: MIT License

Languages: Python 98.83%, Shell 1.17%
Topics: beam-splitter, computer-vision, cvpr, cvpr2023, dataset, deblurring, deep-learning, image-enhancement, image-restoration, image-to-video

bit's Introduction

BiT

by Zhihang Zhong, Mingdeng Cao, Xiang Ji, Yinqiang Zheng, and Imari Sato

👉 Project website

Please leave a ⭐ if you like this project!

TL;DR:

Our proposed method, BiT, is a powerful transformer-based technique for arbitrary-factor blur interpolation that achieves state-of-the-art performance.

In addition, we present the first real-world dataset for benchmarking blur interpolation methods.

Preparation

Download data

Please download the synthesized Adobe240 dataset from its original repo.

Our real-world dataset RBI can be downloaded from here.

Download checkpoints

Please download the corresponding checkpoints from here.

P.S., *_rbi_adobe240_pretrain denotes models pretrained on Adobe240 and fine-tuned on RBI.
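Based on the paths used by the test commands below, the unpacked checkpoints are expected to sit under ./checkpoints/, one folder per model, each containing a config and a checkpoint file. This layout is inferred from the commands in this README, not verified against the download:

checkpoints/
├── bit_adobe240/
│   ├── cfg.yaml
│   └── latest.ckpt
├── bit++_adobe240/
│   ├── cfg.yaml
│   └── latest.ckpt
├── bit_rbi/
│   ├── cfg.yaml
│   └── latest.ckpt
└── bit++_rbi/
    ├── cfg.yaml
    └── latest.ckpt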

Conda environment installation:

conda create -n BiT python=3.8
conda activate BiT
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.txt
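To sanity-check the installation, a generic PyTorch check (not part of this repo) can be run:

# should print 1.12.1+cu116 and True on a CUDA-capable machine
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"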

Train

Train on Adobe240

Train BiT on Adobe240 (BiT+ is the same as BiT but with more training epochs):

python -m torch.distributed.launch --nproc_per_node=8 train_bit.py --config ./configs/bit_adobe240.yaml

Train BiT++ on Adobe240 (P.S., this requires loading a pretrained BiT checkpoint; please set the checkpoint path in the config file, i.e., "./configs/bit++_adobe240.yaml"):

python -m torch.distributed.launch --nproc_per_node=8 train_bit.py --config ./configs/bit++_adobe240.yaml

Train on RBI

Train BiT on RBI:

python -m torch.distributed.launch --nproc_per_node=8 train_bit.py --config ./configs/bit_rbi.yaml

Train BiT++ on RBI (P.S., this requires loading a pretrained BiT checkpoint; please set the checkpoint path in the config file, i.e., "./configs/bit++_rbi.yaml"):

python -m torch.distributed.launch --nproc_per_node=8 train_bit.py --config ./configs/bit++_rbi.yaml
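All of the training commands above assume 8 GPUs. With fewer GPUs, adjust --nproc_per_node accordingly; for example, a single-GPU run would look like the sketch below (note that the batch size and learning rate in the config may need retuning when the GPU count changes):

# single-GPU variant of the Adobe240 training command
python -m torch.distributed.launch --nproc_per_node=1 train_bit.py --config ./configs/bit_adobe240.yaml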

Test

Test on Adobe240

Test BiT++ on Adobe240:

CUDA_VISIBLE_DEVICES=0 ./tools/test/test_bit_adobe240.sh ./checkpoints/bit++_adobe240/cfg.yaml ./checkpoints/bit++_adobe240/latest.ckpt ./results/bit++_adobe240/ /home/zhong/Dataset/Adobe_240fps_dataset/Adobe_240fps_blur/

[Optional] Test BiT on Adobe240:

CUDA_VISIBLE_DEVICES=0 ./tools/test/test_bit_adobe240.sh ./checkpoints/bit_adobe240/cfg.yaml ./checkpoints/bit_adobe240/latest.ckpt ./results/bit_adobe240/ /home/zhong/Dataset/Adobe_240fps_dataset/Adobe_240fps_blur/
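For reference, the positional arguments of test_bit_adobe240.sh appear to be, in order (inferred from the commands above, not from reading the script):

# 1. model config:        ./checkpoints/bit_adobe240/cfg.yaml
# 2. model checkpoint:    ./checkpoints/bit_adobe240/latest.ckpt
# 3. output directory:    ./results/bit_adobe240/
# 4. Adobe240 blur root:  replace /home/zhong/Dataset/... with your own dataset path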

Test on RBI

Test BiT++ on RBI:

CUDA_VISIBLE_DEVICES=0 ./tools/test/test_bit_rbi.sh ./checkpoints/bit++_rbi/cfg.yaml ./checkpoints/bit++_rbi/latest.ckpt ./results/bit++_rbi/

[Optional] Test BiT on RBI:

CUDA_VISIBLE_DEVICES=0 ./tools/test/test_bit_rbi.sh ./checkpoints/bit_rbi/cfg.yaml ./checkpoints/bit_rbi/latest.ckpt ./results/bit_rbi/
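Note that test_bit_rbi.sh takes only three arguments; unlike the Adobe240 script, no dataset root is passed on the command line, so the RBI data location is presumably read from cfg.yaml (an assumption based on the argument lists; if testing fails, check the paths inside the config file):

# 1. model config:        ./checkpoints/bit_rbi/cfg.yaml
# 2. model checkpoint:    ./checkpoints/bit_rbi/latest.ckpt
# 3. output directory:    ./results/bit_rbi/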

Inference

Inference with BiT++:

sh ./tools/inference/inference.sh ./checkpoints/bit++_adobe240/cfg.yaml ./checkpoints/bit++_adobe240/latest.ckpt ./demo/00777.png ./demo/00785.png ./demo/00793.png ./demo/bit++_results/ 45

[Optional] Inference with BiT:

sh ./tools/inference/inference.sh ./checkpoints/bit_adobe240/cfg.yaml ./checkpoints/bit_adobe240/latest.ckpt ./demo/00777.png ./demo/00785.png ./demo/00793.png ./demo/bit_results/ 45
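The arguments to inference.sh appear to be, in order: the model config, the checkpoint, three consecutive blurry frames (here 00777.png, 00785.png, 00793.png), the output directory, and a trailing integer (45 here) that presumably controls how many sharp frames are synthesized, in line with the arbitrary-factor interpolation described in the paper. The meaning of the last argument is an assumption; check tools/inference/inference.sh to confirm:

# sh ./tools/inference/inference.sh \
#    <cfg.yaml> <checkpoint.ckpt> \
#    <prev_blur.png> <mid_blur.png> <next_blur.png> \
#    <output_dir> <N>    # N: assumed number of output frames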

Citation

If you find this repository useful, please consider citing:

@inproceedings{zhong2023blur,
  title={Blur Interpolation Transformer for Real-World Motion from Blur},
  author={Zhong, Zhihang and Cao, Mingdeng and Ji, Xiang and Zheng, Yinqiang and Sato, Imari},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5713--5723},
  year={2023}
}
@inproceedings{zhong2022animation,
  title={Animation from blur: Multi-modal blur decomposition with motion guidance},
  author={Zhong, Zhihang and Sun, Xiao and Wu, Zhirong and Zheng, Yinqiang and Lin, Stephen and Sato, Imari},
  booktitle={Computer Vision--ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part XIX},
  pages={599--615},
  year={2022},
  organization={Springer}
}


bit's Issues

A minor issue with the config .yaml files

Hello, I am trying to run the example code you provided and noticed a potential issue with the .yaml config files: it seems that all of them contain absolute paths instead of relative paths.

I believe this might not be intended and wanted to bring it to your attention. Thank you for your work, and I'm excited to try it out!

Channel Similarity

Hi there, I read your paper and was intrigued by Figure 6b. I was wondering how you visualised this; is there any code to do it?
Thanks

Issue when I try to train BiT++ on Colab

I am trying to train your model on Google Colab using the following command:
!python -m torch.distributed.launch --nproc_per_node=1 train_bit.py --config ./configs/bit++_rbi.yaml
But I get the following error (most probably related to the GPU available on Colab):

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 7212) of binary: /usr/local/bin/python
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/usr/local/lib/python3.10/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/usr/local/lib/python3.10/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/usr/local/lib/python3.10/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train_bit.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-06-19_09:52:32
  host      : aea723e3180b
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 7212)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

As I don't know the backend settings needed to run this code with the GPU, can you please guide me on how to solve this error?

Questions about the dataset

First of all, congratulations on such a great public work, but I have some questions to ask you:

I would like to use your dataset to train a single-image-input, single-image-output deblurring model. I would like to know how to find the sharp image corresponding to each blurred image.

How to deblur a video?

Thanks for your excellent work.
I have found that the demo is capable of deblurring images using a set of three images.
However, I am wondering if you could offer some guidance on how to deblur a video instead.
Would it be appropriate to deblur each frame using information from the previous and next frames?
I appreciate your help with this matter.

Request for a practical Pre-BiT++ model trained on perceptual loss

I'm a big fan of the practical use of video frame interpolation AI models to make watching movies, TV series and other video content as close to real life as possible. I also believe that for an even better representation of real life, the next step towards even better realism is to use joint video deblurring and frame interpolation AI models, due to the fact that almost all footage recorded at 24fps contains around 20.8ms of motion blur in each frame.

I am hugely grateful to you for the RBI dataset with real motion blur, as this will finally make it possible to develop models that will perform well with real video footage. Thank you also for the information about the Pre-BiT++ model, which is trained on Adobe240 and then on RBI in order to get even better results.

Only a practical Pre-BiT++ model trained on perceptual loss is missing to make it perfect. Why is it so important to train on perceptual loss to use models in practice? I described it in detail in the introduction to the rankings here: https://github.com/AIVFI/Video-Frame-Interpolation-Rankings-and-Video-Deblurring-Rankings

In short: training on perceptual loss recovers more fine details, which is more pleasing to the human eye. This is particularly important for models such as BiT, where all video frames are replaced by new frames, unlike video frame interpolation models, where the original frames are preserved. In addition, BiT, by removing motion blur and producing clear, sharp output frames, will further benefit from the ability to recover fine details through training on perceptual loss.

So here is my big request to you to train a practical Pre-BiT++ model on perceptual loss. Unfortunately I am not a programmer myself and have no knowledge or skills in this area. These rankings of mine above are the pinnacle of my abilities and a way to connect with those who do model development on a daily basis. In this way, I want to help enthusiasts like me to find the best model for practical applications. I believe that a practical Pre-BiT++ trained on perceptual loss may be the best model for practical use, hence my request.

I also think that such a model would also attract even more attention to your repository, which is also important to me, as I want to see more models trained on RBI dataset with real motion blur in the future.

At the moment, of the 3 most popular frame interpolation methods on GitHub (https://github.com/search?o=desc&q=Frame+Interpolation&s=stars&type=Repositories):

7.9k stars - DAIN (CVPR 2019)
3.4k stars - RIFE (ECCV 2022)
2.1k stars - FILM (ECCV 2022)

the developers of as many as two of them, RIFE and FILM, have provided additional practical models that, although they do not reach PSNR and SSIM as high as these methods' primary models, offer much better perceptual quality.

Thus, I believe that a practical Pre-BiT++ model trained on perceptual loss can gain very wide interest not only from researchers but also from a wide range of enthusiasts for restoring realism to movies, TV series and other video footage.

Questions about Pre-BiT++

Many thanks for the RBI dataset with real motion blur. In my opinion, this is a real revolution! It will finally be possible to train Joint Video Deblurring and Frame Interpolation models on a dataset with real motion blur. Also thanks for developing the BiT models and making them available for download.

I am creating Video Frame Interpolation Rankings and Video Deblurring Rankings on GitHub, where each ranking includes only the single best model for each method.

I now intend to add rankings based on the RBI dataset, and I make no secret that these will be the most important rankings in my repository. According to your paper, the best results were achieved by the Pre-BiT++ model. I have a couple of questions in relation to this:

  1. Did I understand correctly that the Pre-BiT++ model is the model pretrained on Adobe240 and then fine-tuned on RBI?

  2. Does Pre-BiT++ also achieve better visual results than BiT++ (RBI) on the RBI dataset? That is, does Pre-BiT++ avoid introducing artifacts such as those shown in Figure 5 of your paper for BiT++ (Adobe240)?

  3. If you were to apply your method to a real movie to be judged by human vision, would you choose Pre-BiT++ instead of BiT++ (RBI)?
