
Aggregating Long-term Sharp Features via Hybrid Transformers for Video Deblurring

Under Review


Prerequisites

  • Python >= 3.6, PyTorch >= 1.1.0
  • Requirements: opencv-python, numpy, matplotlib, imageio, scikit-image, tqdm
  • Platforms: Ubuntu 20.04, CUDA 10.2, 4× Tesla V100 (16GB)

Datasets

GOPRO_Random (Original): To satisfy our assumption that sharp frames exist in a blurry video, we generate non-consecutively blurry frames by averaging a random number of adjacent sharp frames, where the number is chosen uniformly from 1 to 15. A generated frame Bi is labeled sharp (label 1) if fewer than 5 frames were averaged, and blurry (label 0) otherwise. Note that we randomly make 50% of the frames in a video blurry while the other 50% remain sharp, without constraining that there must be 2 sharp frames in every 7 consecutive frames.

REDS_Random (Original): We generate non-consecutively blurry frames in the same way as for GOPRO. However, when the frame rate is not high enough, simply averaging frames may produce unnatural spikes or steps in the blur trajectory, especially at high resolution and with fast motion. We therefore employ FLAVR to recursively interpolate frames, raising the frame rate to a virtual 960 fps, which lets us synthesize frames with different degrees of blur: the number of averaged frames is chosen uniformly from 3 to 39, and a generated frame Bi is labeled sharp (label 1) if fewer than 17 frames were averaged, and blurry (label 0) otherwise. A minimal sketch of this synthesis appears below.
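
For reference, here is a minimal sketch of the frame-averaging synthesis described above. The function name, boundary handling, and RNG usage are our own illustration, not the released data-generation code; only the parameter values come from the text (GOPRO: min_avg=1, max_avg=15, sharp_thresh=5; REDS: min_avg=3, max_avg=39, sharp_thresh=17).

import numpy as np

# Hypothetical sketch of the non-consecutive blur synthesis; not the released code.
# GOPRO: min_avg=1, max_avg=15, sharp_thresh=5; REDS: min_avg=3, max_avg=39, sharp_thresh=17.
def synthesize_frame(frames, center, min_avg=1, max_avg=15, sharp_thresh=5, rng=None):
    """Average a random number of frames around `center`; return (frame, label)."""
    rng = rng or np.random.default_rng()
    n = int(rng.integers(min_avg, max_avg + 1))   # number of sharp frames to average
    start = max(0, center - n // 2)               # boundary handling simplified here
    window = np.stack(frames[start:start + n]).astype(np.float64)
    blurry = window.mean(axis=0).astype(frames[0].dtype)
    label = 1 if n < sharp_thresh else 0          # sharp iff few frames were averaged
    return blurry, label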

Dataset Organization Format

|--dataset
    |--blur
        |--video 1
            |--frame 1
            |--frame 2
                :
        |--video 2
            :
        |--video n
    |--gt
        |--video 1
            |--frame 1
            |--frame 2
                :
        |--video 2
            :
        |--video n
    |--Event
        |--video 1
            |--frame 1
            |--frame 2
                :
        |--video 2
            :
        |--video n
    |--label
        |--video 1
        |--video 2
            :
        |--video n
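
For clarity, a minimal sketch that walks this layout and pairs corresponding files. The directory names follow the tree above, but the pairing-by-filename convention is our assumption, not the released dataloader.

import os

# Hypothetical sketch assuming blur/, gt/ and Event/ hold identically named
# frames per video, with one label file per video under label/.
def list_samples(root):
    samples = []
    for video in sorted(os.listdir(os.path.join(root, "blur"))):
        for frame in sorted(os.listdir(os.path.join(root, "blur", video))):
            samples.append({
                "blur": os.path.join(root, "blur", video, frame),
                "gt": os.path.join(root, "gt", video, frame),
                "event": os.path.join(root, "Event", video, frame),
                "label": os.path.join(root, "label", video),
            })
    return samples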

BSD Dataset: ESTRNN provides a real-world video blur dataset captured with a beam-splitter system of two synchronized cameras. By controlling the exposure time and exposure intensity during shooting, the system obtains a pair of sharp and blurry videos in a single shot. Blurry/sharp sequences were collected for three blur-intensity settings (sharp exposure time – blurry exposure time): 1ms–8ms, 2ms–16ms and 3ms–24ms. The test set for each setting has 20 video sequences of 150 frames. We use these test sets to evaluate generalization ability.

CED Dataset: Scheerlinck et al. presented the first Color Event Dataset (CED), captured with the color event camera ColorDAVIS346 and containing 50 minutes of footage with both color frames and events. We again employ FLAVR to interpolate frames and generate blurry frames in the same way as for REDS. We randomly split the CED sequences into training, validation and testing sets, and report comparisons against state-of-the-art models retrained with the same setting in the extension experiments.

RBE Dataset: Pan et al. presented a real blurry event dataset, where each sequence is captured with the DAVIS under different conditions, such as indoor and outdoor scenery, low lighting, and different motion patterns (e.g., camera shake, object motion) that naturally introduce motion blur into the APS intensity images. No ground-truth data is available for this dataset, so we use it only for qualitative comparison.

Download

Please download the testing and training datasets from BaiduYun[password:f94f]. Our STGTN models trained on the non-consecutively blurry GOPRO, REDS, and event datasets can be downloaded Here[password:8qpa]. Our results on all datasets can be downloaded Here[password:9jzx].

(i) If you have downloaded the pretrained models, please put the STGTN model in './experiment'.

(ii) If you have downloaded the datasets, please put them in './dataset'.

Getting Started

1) Testing

cd code

For testing w/o event data:

python inference_swin_hsa_nfs.py --default_data XXXX

For testing w/ event data:

On a synthetic event dataset:

python inference_swin_hsa_nfs_event.py

On a real event dataset:

python inference_swin_hsa_nfs_event_real.py

Results

The metric (PSNR/SSIM) calculation code is available Here.
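
For reference, a minimal PSNR/SSIM sketch using scikit-image and imageio (both listed in the requirements); the linked scripts may differ in details such as color space or border cropping.

import imageio
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Minimal metric sketch; the repository's own scripts may crop borders or
# convert color spaces before measuring.
def evaluate(restored_path, gt_path):
    restored = imageio.imread(restored_path)
    gt = imageio.imread(gt_path)
    psnr = peak_signal_noise_ratio(gt, restored, data_range=255)
    # channel_axis=-1 on recent scikit-image; older versions use multichannel=True.
    ssim = structural_similarity(gt, restored, channel_axis=-1, data_range=255)
    return psnr, ssim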

[Results figures]

Results on RBE dataset:

[Results figure]

Ablation

Effectiveness of NSFs:

[Ablation figures]

2) Training

Without event data:

python main_swint_hsa_nsf.py --template SWINT_HSA_NSF  # use SWINT_HSA_NSF_REDS for the REDS dataset

With event data:

python main_swint_hsa_nsf_event.py --template SWINT_HSA_NSF_EVENT_GREY  # use SWINT_HSA_NSF_EVENT for color events

Please check the paths to your datasets before training.

Cite

If you use any part of our code, or if STGTN and the non-consecutively blurry datasets are useful for your research, please consider citing:

@article{ren2023aggregating,
  title={Aggregating Long-term Sharp Features via Hybrid Transformers for Video Deblurring},
  author={Ren, Dongwei and Shang, Wei and Yang, Yi and Zuo, Wangmeng},
  journal={arXiv preprint arXiv:2309.07054},
  year={2023}
}


