
jingyunliang / vrt

1.3K stars · 17 watchers · 123 forks · 13.05 MB

VRT: A Video Restoration Transformer (official repository)

Home Page: https://arxiv.org/abs/2201.12288

License: Other

Python 100.00%
transformer video-restoration low-level-vision vision-transformer video-super-resolution video-deblurring video-denoising video-sr super-resolution sr

vrt's Introduction

Jingyun Liang

Email / Homepage / Google Scholar / Github

I am currently a PhD Student at Computer Vision Lab, ETH Zürich, Switzerland. I am co-supervised by Prof. Luc Van Gool and Prof. Radu Timofte. I also work closely with Dr. Kai Zhang. I mainly focus on low-level vision research, especially on image and video restoration, such as

  • image/video super-resolution (SR)
  • image/video deblurring
  • image/video denoising
  • ...

🚀 News

  • 2022-10-04: Our new paper RVRT (NeurIPS 2022) achieves SOTA video restoration results with balanced model size, memory and runtime.
  • 2022-08-30: See our papers on real-world image denoising (SCUNet) and video denoising (ReViD).
  • 2022-07-30: Three papers accepted by ECCV 2022: EFNet (event-based image deblurring, oral), DATSR (reference image SR) and DAVSR (video SR).
  • 2022-01-28: Our new paper VRT outperforms previous video SR / deblurring / denoising / frame interpolation / space-time video SR methods by up to 2.16 dB. 😍
  • 2021-10-20: SwinIR was awarded the best paper prize at ICCV-AIM 2021.
  • 2021-08-01: Three papers (HCFlow, MANet and BSRGAN) accepted by ICCV 2021.
  • 2021-03-29: One paper (FKP) accepted by CVPR 2021.

🌱 Repositories

  • real-world video denoising: Practical Real Video Denoising with Realistic Degradation Model
  • event-based image deblurring: Event-based Fusion for Motion Deblurring with Cross-modal Attention, ECCV2022
  • reference image SR: Reference-based Image Super-Resolution with Deformable Attention Transformer, ECCV2022
  • interpretable video restoration: Towards Interpretable Video Super-Resolution via Alternating Optimization, ECCV2022
  • transformer-based video restoration: Recurrent Video Restoration Transformer with Guided Deformable Attention
  • transformer-based video restoration: VRT: A Video Restoration Transformer
  • transformer-based image restoration: SwinIR: Image Restoration Using Swin Transformer
  • real-world image denoising: Practical Blind Denoising via Swin-Conv-UNet and Data Synthesis
  • real-world image SR: Designing a Practical Degradation Model for Deep Blind Image Super-Resolution, ICCV2021
  • blind image SR: Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution, ICCV2021
  • blind image SR: Flow-based Kernel Prior with Application to Blind Super-Resolution, CVPR2021
  • normalizing flow-based image SR and image rescaling: Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling, ICCV2021
  • image/video restoration: Image/Video Restoration Toolbox

vrt's People

Contributors

jingyunliang


vrt's Issues

CUDNN_STATUS_INTERNAL_ERROR with my own set

Hello,
I'm able to run VRT with the provided test sets, but when I tried it with my own set it doesn't work; I get this error:

(py38) H:\git\VRT>python main_test_vrt.py --task 008_VRT_videodenoising_DAVIS --sigma 10 --folder_lq testsets/mine --tile 8 160 180 --tile_overlap 2 20 20
h:\Anaconda3\envs\py38\lib\site-packages\torch\functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:2228.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
H:\git\VRT\models\network_vrt.py:716: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  dim_t = temperature ** (2 * (dim_t // 2) / num_pos_feats)
loading model from ./model_zoo/vrt/model_zoo/vrt/008_VRT_videodenoising_DAVIS.pth
using dataset from testsets/mine
Traceback (most recent call last):
  File "main_test_vrt.py", line 346, in <module>
    main()
  File "main_test_vrt.py", line 72, in main
    output = test_video(lq, model, args)
  File "main_test_vrt.py", line 257, in test_video
    out_clip = test_clip(lq_clip, model, args)
  File "main_test_vrt.py", line 308, in test_clip
    out_patch = model(in_patch).detach().cpu()
  File "h:\Anaconda3\envs\py38\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "H:\git\VRT\models\network_vrt.py", line 1382, in forward
    flows_backward, flows_forward = self.get_flows(x)
  File "H:\git\VRT\models\network_vrt.py", line 1413, in get_flows
    flows_backward, flows_forward = self.get_flow_2frames(x)
  File "H:\git\VRT\models\network_vrt.py", line 1436, in get_flow_2frames
    flows_backward = self.spynet(x_1, x_2)
  File "h:\Anaconda3\envs\py38\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "H:\git\VRT\models\network_vrt.py", line 438, in forward
    flow_list = self.process(ref, supp, w, h, w_floor, h_floor)
  File "H:\git\VRT\models\network_vrt.py", line 412, in process
    flow = self.basic_module[level](torch.cat([
  File "h:\Anaconda3\envs\py38\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "H:\git\VRT\models\network_vrt.py", line 356, in forward
    return self.basic_module(tensor_input)
  File "h:\Anaconda3\envs\py38\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "h:\Anaconda3\envs\py38\lib\site-packages\torch\nn\modules\container.py", line 141, in forward
    input = module(input)
  File "h:\Anaconda3\envs\py38\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "h:\Anaconda3\envs\py38\lib\site-packages\torch\nn\modules\conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "h:\Anaconda3\envs\py38\lib\site-packages\torch\nn\modules\conv.py", line 443, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

I suppose this is due to the size of the images and/or the tile settings. I don't really understand how to choose that setting, as there are three values. I think the second and third are the height and width of the tile, but what is the first one? It can't be the tile count, since that could be calculated, so what is it?
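In main_test_vrt.py the three --tile values are [num_frames, height, width]: the first value is the temporal tile size, i.e. how many frames are processed per clip at a time, and lowering any of the three reduces memory use. A minimal sketch of how such a triple is typically consumed (illustrative names, not VRT's actual implementation):

    import torch

    def iter_tiles(video, tile, overlap):
        """video: [B, T, C, H, W]; tile=(frames, height, width); overlap likewise."""
        b, t, c, h, w = video.shape
        td, th, tw = tile
        od, oh, ow = overlap
        for t0 in range(0, max(t - od, 1), td - od):          # temporal clips
            for y0 in range(0, max(h - oh, 1), th - oh):      # spatial rows
                for x0 in range(0, max(w - ow, 1), tw - ow):  # spatial columns
                    yield video[:, t0:t0 + td, :, y0:y0 + th, x0:x0 + tw]

    # --tile 8 160 180 --tile_overlap 2 20 20 maps to:
    clips = list(iter_tiles(torch.zeros(1, 24, 3, 320, 360), (8, 160, 180), (2, 20, 20)))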

Change upscale arg for VideoSR

I am running the VRT demo code and would like to change the upscale ratio to 2x instead of the default of 4 on the 001_VRT_videosr_bi_REDS_6frames task. I am using my own test dataset containing 1080x1920 image files. I'm running python main_test_vrt.py --task 001_VRT_videosr_bi_REDS_6frames --folder_lq testsets/uploaded --tile 40 128 128 --tile_overlap 2 20 20...

I noticed that the upscale is being set in main_test_vrt.py. Simply changing upscale=2 doesn't work because of model mismatch errors during model load.
Is there a pretrained model that will do a 2x upscale or are there a set of parameters that I can pass when initializing the model for the 2x upscale to work?

    # define model
    if args.task == '001_VRT_videosr_bi_REDS_6frames':
        model = net(upscale=4, img_size=[6,64,64], window_size=[6,8,8], depths=[8,8,8,8,8,8,8, 4,4,4,4, 4,4],
                    indep_reconsts=[11,12], embed_dims=[120,120,120,120,120,120,120, 180,180,180,180, 180,180],
                    num_heads=[6,6,6,6,6,6,6, 6,6,6,6, 6,6], pa_frames=2, deformable_groups=12)
        datasets = ['REDS4']
        args.scale = 4
        args.window_size = [6,8,8]
        args.nonblind_denoising = False

Thanks!
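No official 2x checkpoint appears to be provided for this task. One hypothetical route, sketched here untested and assuming only the upsampling head depends on the scale, is to instantiate a 2x network, load the 4x weights non-strictly, and fine-tune the layers that fail to load:

    import torch
    from models.network_vrt import VRT as net

    # Hypothetical sketch: same hyper-parameters as the 4x config, but upscale=2.
    model = net(upscale=2, img_size=[6,64,64], window_size=[6,8,8],
                depths=[8,8,8,8,8,8,8, 4,4,4,4, 4,4],
                indep_reconsts=[11,12], embed_dims=[120,120,120,120,120,120,120, 180,180,180,180, 180,180],
                num_heads=[6,6,6,6,6,6,6, 6,6,6,6, 6,6], pa_frames=2, deformable_groups=12)

    ckpt = torch.load('model_zoo/vrt/001_VRT_videosr_bi_REDS_6frames.pth', map_location='cpu')
    state = ckpt.get('params', ckpt)               # KAIR checkpoints wrap weights in 'params'
    missing, unexpected = model.load_state_dict(state, strict=False)
    print('layers needing fine-tuning:', missing)  # typically the upsampling head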

Testing fails in network_vrt.py @ get_flow_4frames, flows_forward[0].shape[1]

Hi, I've been trying to use this code in combination with cszn/KAIR for training a VRT model on my own data with a custom dataloader I wrote. Unfortunately, I'm running into an error in the testing phase in get_flow_4frames, because the shape of flows_forward[0] is torch.Size([1, 0, 2, 64, 64]).

The X input into forward is: torch.Size([1, 1, 3, 64, 64])
The X input into get_flows is: torch.Size([1, 1, 3, 64, 64])
The X input into get_flow_2frames: torch.Size([1, 1, 3, 64, 64])
The forward_flows[0] is as previously specified: torch.Size([1, 0, 2, 64, 64])

    def get_flow_4frames(self, flows_forward, flows_backward):
        '''Get flow between t and t+2 from (t,t+1) and (t+1,t+2).'''

        # backward
        d = flows_forward[0].shape[1]
        flows_backward2 = []
        for flows in flows_backward:
            flow_list = []
            for i in range(d - 1, 0, -1):
                flow_n1 = flows[:, i - 1, :, :, :]  # flow from i+1 to i
                flow_n2 = flows[:, i, :, :, :]  # flow from i+2 to i+1
                flow_list.insert(0, flow_n1 + flow_warp(flow_n2, flow_n1.permute(0, 2, 3, 1)))  # flow from i+2 to i
            if len(flow_list) != 0:
                flows_backward2.append(torch.stack(flow_list, 1))

The training is working without any issues.

Is this the anticipated behavior within the code or is there something regarding the test settings that I'm missing?
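The empty dimension follows directly from the clip length: flows are estimated between consecutive frame pairs, so a clip of T frames yields T - 1 flows, and a single-frame test clip (T=1, as in the shapes above) yields zero. A tiny illustration, not VRT code:

    import torch

    def pairwise_flow_count(x):
        """x: [B, T, C, H, W] clip -> one (placeholder) flow per consecutive frame pair."""
        b, t, c, h, w = x.shape
        return torch.zeros(b, t - 1, 2, h, w)

    print(pairwise_flow_count(torch.zeros(1, 1, 3, 64, 64)).shape)
    # torch.Size([1, 0, 2, 64, 64]) -- matches the reported shape

With d = 0, the loop for i in range(d - 1, 0, -1) never executes and flow_list stays empty, so test clips presumably need at least two frames.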

Deblurring Task

I have a basic question about memory management.
When I tried the deblurring task on my dataset, I got this error:

tcmalloc: large alloc 4556390400 bytes == 0x7fdce32c8000 @ 0x7fdf59255b6b 0x7fdf59275379 0x7fde8e8b750e 0x7fde8e8a97c2 0x7fdec8c642d8 0x7fdec8c64de7 0x7fdec9375928 0x7fdec8d08b07 0x7fdec8d09415 0x7fdec93dee82 0x7fdec92b1a2d 0x7fdec8d142e1 0x7fdec94dc622 0x7fdec8fed79a 0x7fdec8d101da 0x7fdec94dd5b2 0x7fdec90b17c2 0x7fdeca0d677a 0x7fdeca0d6d65 0x7fdec90fd35d 0x7fdf43d5f2e0 0x593835 0x548c51 0x5127f1 0x549e0e 0x593fce 0x548ae9 0x51566f 0x4bc98a 0x533274 0x4d3969
tcmalloc: large alloc 4556390400 bytes == 0x5e4e2000 @ 0x7fdf59255b6b 0x7fdf59275379 0x7fde8e8b750e 0x7fde8e8a97c2 0x7fdec881b10f 0x7fdec881ba51 0x7fdec881baa4 0x7fdec8cee9ae 0x7fdec93dbaea 0x7fdec919d38c 0x7fdec93c262f 0x7fdec91d81e0 0x7fdf43fa7371 0x7fdec8cf2cf0 0x7fdec9573994 0x7fdec8fc2467 0x7fdec93c1995 0x7fdec9005a59 0x7fdf43cf35e1 0x593784 0x548c51 0x51566f 0x593dd7 0x5118f8 0x593dd7 0x5118f8 0x549576 0x604173 0x5f5506 0x5f8c6c 0x5f9206
/usr/local/lib/python3.7/dist-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
I reduced the tile size to 128x128 but nothing changed. Could you give any suggestions?

Task 007 raise EinopsError

I have been getting this error:
[screenshot: EinopsError]
when I ran task 007 on the blur_bicubic dataset. I have also tried other datasets; the error was still raised. These are my arguments:
[screenshot: arguments]
Could you please provide a solution?

Problem when testing

Traceback (most recent call last):
  File "/home/wgp/anaconda3/envs/VRT/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 990, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/wgp/anaconda3/envs/VRT/lib/python3.8/multiprocessing/queues.py", line 107, in get
    if not self._poll(timeout):
  File "/home/wgp/anaconda3/envs/VRT/lib/python3.8/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/home/wgp/anaconda3/envs/VRT/lib/python3.8/multiprocessing/connection.py", line 424, in _poll
    r = wait([self], timeout)
  File "/home/wgp/anaconda3/envs/VRT/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/home/wgp/anaconda3/envs/VRT/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/home/wgp/anaconda3/envs/VRT/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 112301) is killed by signal: Killed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "main_test_vrt.py", line 376, in <module>
    main()
  File "main_test_vrt.py", line 82, in main
    for idx, batch in enumerate(test_loader):
  File "/home/wgp/anaconda3/envs/VRT/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/wgp/anaconda3/envs/VRT/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1186, in _next_data
    idx, data = self._get_data()
  File "/home/wgp/anaconda3/envs/VRT/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1152, in _get_data
    success, data = self._try_get_data()
  File "/home/wgp/anaconda3/envs/VRT/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1003, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 112301) exited unexpectedly

Request training setting recommendation of ×4 VSR

If I only have 2 or 4 RTX 3090s and want to train a model for ×4 VSR, how can I set the training parameters effectively? That is, no OOM, no large performance drop, and moderate training time.

For example, there are two parameters that use checkpointing to save CUDA memory, use_checkpoint_attn and use_checkpoint_ffn; which one influences training time/memory consumption the most?

Looking forward to your reply, thank you.
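For context, a minimal sketch of what such flags generally do (generic torch.utils.checkpoint usage, not VRT's exact wiring): a checkpointed submodule discards its intermediate activations during the forward pass and recomputes them during backward, trading extra compute time for lower memory.

    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint

    class Block(nn.Module):
        """Illustrative attention + FFN block wrapper, not VRT's actual class."""
        def __init__(self, attn, ffn, use_checkpoint_attn=False, use_checkpoint_ffn=False):
            super().__init__()
            self.attn, self.ffn = attn, ffn
            self.use_checkpoint_attn = use_checkpoint_attn
            self.use_checkpoint_ffn = use_checkpoint_ffn

        def forward(self, x):
            # checkpointed calls recompute activations in backward instead of storing them
            x = checkpoint(self.attn, x) if self.use_checkpoint_attn else self.attn(x)
            x = checkpoint(self.ffn, x) if self.use_checkpoint_ffn else self.ffn(x)
            return x

Since attention activations usually dominate memory in window-attention blocks, use_checkpoint_attn is likely the heavier lever for both the memory savings and the accompanying slowdown.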

Colab IndexError: list index out of range

Hello.

I keep getting this error in Colab no matter what I try: "IndexError: list index out of range"
I'm only trying to denoise five 1920x1080 images. Are you able to do this successfully in the Colab notebook?
Any suggestions on what I might be doing wrong? Thanks!

Problem of "use_checkpoint_attn".

I'm trying to reimplement the training part, but I encounter the following problem when I set the param 'use_checkpoint_attn' in self.residual_group2 to True. Could you suggest a solution?
[screenshot of error]

Pre-trained SpyNet

Hi,

Thanks for sharing the impressive paper and the code. I am curious about the SpyNet during training. Did you use a pre-trained SpyNet and then enable gradients to fine-tune the weights for your tasks? I am not sure if it is possible to train a SpyNet from scratch together with the other modules in the network.

Many thanks!

Best regards

How to run inference on larger frames e.g. 360p?

Hello! Thanks for the great work with VRT. Do you have any tips or recommendations on how to run your evaluation code on our own higher-resolution frames? From my tests, anything above 180p runs OOM on a K80 (12 GB) and a T4 (16 GB), regardless of the tile size I use, for all models (REDS, Vimeo, etc.). Do you have any advice? Thanks!

Test own data?

Hi, I'm very impressed by the results. Is there a way to test my own data with the models?

Thank you.

Torch.distributed.elastic.multiprocessing.api.SignalException: Process XXXX got signal :1

Hello, thank you for the code.
I hit an error when I train with 005_train_vrt_videodeblurring_dvd.json:

Fix keys: ['spynet', 'deform'] for the first 20000 iters.
Fix keys: ['spynet', 'deform'] for the first 20000 iters.
22-09-01 02:31:11.512 : <epoch: 0, iter: 400, lr:4.000e-04> G_loss: 7.544e-02
22-09-01 02:48:36.264 : <epoch: 0, iter: 600, lr:4.000e-04> G_loss: 1.637e-02
22-09-01 03:06:01.631 : <epoch: 0, iter: 800, lr:4.000e-04> G_loss: 7.941e-02
WARNING:torch.distributed.elastic.agent.server.api:Received 1 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2704351 closing signal SIGHUP
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2704352 closing signal SIGHUP
Traceback (most recent call last):
  File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/run.py", line 755, in run
    )(*cmd_args)
  File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent
    result = agent.run()
  File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper
    result = f(*args, **kwargs)
  File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run
    result = self._invoke_run(role)
  File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run
    time.sleep(monitor_interval)
  File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler
    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 2704341 got signal: 1

I am using python=3.7.13 and pytorch=1.12.1.

Exporting to onnx model problem

When I tried to export the VRT model to ONNX, the process was stopped by this error:

Traceback (most recent call last):
  File "main_test_vrt.py", line 424, in <module>
    main()
  File "main_test_vrt.py", line 71, in main
    torch.onnx.export(model,  # model being run
  File "/home/lxp/anaconda3/envs/hh_vrt/lib/python3.8/site-packages/torch/onnx/__init__.py", line 350, in export
    return utils.export(
  File "/home/lxp/anaconda3/envs/hh_vrt/lib/python3.8/site-packages/torch/onnx/utils.py", line 163, in export
    _export(
  File "/home/lxp/anaconda3/envs/hh_vrt/lib/python3.8/site-packages/torch/onnx/utils.py", line 1074, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/home/lxp/anaconda3/envs/hh_vrt/lib/python3.8/site-packages/torch/onnx/utils.py", line 731, in _model_to_graph
    graph = _optimize_graph(
  File "/home/lxp/anaconda3/envs/hh_vrt/lib/python3.8/site-packages/torch/onnx/utils.py", line 308, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "/home/lxp/anaconda3/envs/hh_vrt/lib/python3.8/site-packages/torch/onnx/__init__.py", line 416, in _run_symbolic_function
    return utils._run_symbolic_function(*args, **kwargs)
  File "/home/lxp/anaconda3/envs/hh_vrt/lib/python3.8/site-packages/torch/onnx/utils.py", line 1406, in _run_symbolic_function
    return symbolic_fn(g, *inputs, **attrs)
  File "/home/lxp/anaconda3/envs/hh_vrt/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py", line 232, in wrapper
    return fn(g, *args, **kwargs)
  File "/home/lxp/anaconda3/envs/hh_vrt/lib/python3.8/site-packages/torch/onnx/symbolic_opset11.py", line 244, in pixel_shuffle
    return symbolic_helper._unimplemented("pixel_shuffle", "only support 4d input")
  File "/home/lxp/anaconda3/envs/hh_vrt/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py", line 440, in _unimplemented
    _onnx_unsupported(f"{op}, {msg}")
  File "/home/lxp/anaconda3/envs/hh_vrt/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py", line 444, in _onnx_unsupported
    raise RuntimeError(
RuntimeError: Unsupported: ONNX export of operator pixel_shuffle, only support 4d input. Please feel free to request support or submit a pull request on PyTorch GitHub.

From the code I know PyTorch 1.8.1 or later is needed to support 5D PixelShuffle; however, ONNX export only supports 4D input. So how can I fix this problem?
Any help would be appreciated!
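One possible workaround, sketched here as an untested idea rather than an official fix: fold the temporal dimension into the batch dimension so pixel_shuffle only ever sees a 4D tensor, which the ONNX exporter does support.

    import torch
    import torch.nn.functional as F

    def pixel_shuffle_5d(x: torch.Tensor, r: int) -> torch.Tensor:
        """pixel_shuffle for [B, T, C, H, W] tensors, expressed via the 4D op
        so that ONNX export (which only supports 4D pixel_shuffle) can trace it."""
        b, t, c, h, w = x.shape
        x = x.reshape(b * t, c, h, w)   # fold time into batch
        x = F.pixel_shuffle(x, r)       # 4D op, ONNX-exportable
        return x.reshape(b, t, c // r ** 2, h * r, w * r)

The 5D pixel_shuffle calls in network_vrt.py would then be replaced with this helper before re-running the export.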

Regarding license term

Hi, I would like to ask a question about licensing.

The VRT training scripts in KAIR are under the MIT license, while this project is under a non-commercial license.

If I use KAIR to train VRT and then use the resulting model, which license applies?

Inference Taking Forever

I am trying to deblur a 150-frame video on a machine with two NVIDIA RTX A5000 GPUs, using the GoPro deblur model with a reduced tile value, but the operation is taking forever. How can I solve this? Is an NVIDIA RTX A5000 enough for inference?

Few questions about paper 😸

According to the paper, "The runtime is 2.2s per frame on 1280×720 blurred videos". What GPU did you use to measure the runtime?
I also have a question about model size: did you try smaller model variants (popular in modern transformers, something like VRT-S / VRT-L with different parameter counts), or is the architecture limited and doesn't converge with custom sizes?

And of course, congrats on the cool paper 📦

Test in clips leads to block artifact

Thanks for sharing your impressive work!

I have a problem regarding the video deblurring task. I noticed there are block artifacts in the GoPro test results you released (e.g. 00000097_005_VRT_GoPro.png and 00000098_005_VRT_GoPro.png in the GOPR0410_11_00 folder of 006_VideoDeblur_VRT_6frames_GoPro). The artifact is much worse when I run the pretrained model on my own data.

I think this might be caused by testing frames in clips, so that no information flows between different clips. Do you have any ideas about this?

Thanks for the help.
[attached frames: 00000097_005_VRT_GoPro, 00000098_005_VRT_GoPro]

Log Files from Training

Hello,

Thank you for your awesome code!

I am hoping you might open-source the log files you have from training. Maybe the training and validation loss as a function of epoch (and/or batch) with an estimate of the runtime?

deform_conv2d() takes from 3 to 7 positional arguments but 8 were given

Command: python main_test_vrt.py --task 008_VRT_videodenoising_DAVIS --sigma 1 --folder_lq testsets/Set8 --folder_gt testsets/Set8 --tile 12 128 128 --tile_overlap 2 20 20

This fails with: deform_conv2d() takes from 3 to 7 positional arguments but 8 were given

[screenshot]

What may be the reason? Thanks!

Colab inference problem

Hi,
The Colab demo crashes (^C exits) when running the inference tab with self-provided images/video.
I have tried a .png sequence and an .mp4 video; neither worked.
The included (gt) samples work fine.

Checkpoint of Real-World SR in VRT

Thank you for sharing this repository! Nice work!
Do you plan to publish a checkpoint for real-world super-resolution, similar to the one for SwinIR?

VRT 2x upscale

I was wondering if the authors have any suggestions for fine-tuning the VRT model to do a 2x upscale instead of a 4x upscale. I removed some layers from the Upsample module to support a 2x upscale; however, the forward/backward pass consumes too much VRAM. Which layers do you suggest removing from the model to reduce its complexity while still achieving good results for a 2x upscale?

Currently, I have tried 2x upscale training with 1 GPU, batch size = 1, low-quality frame crop size = 64x64, and high-quality frame crop size = 128x128. The maximum VRAM usage in the forward/backward pass is 23GB.
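For reference, a hedged sketch of a single-stage 2x reconstruction head (an illustrative module, not VRT's exact Upsample): each conv + PixelShuffle(2) stage doubles the spatial size, so a 4x head chains two such stages and a 2x head keeps only one, which is the natural layer to drop.

    import torch
    import torch.nn as nn

    class Upsample2x(nn.Sequential):
        """One 2x stage: the conv expands channels by 4 and PixelShuffle(2)
        rearranges them into a 2x larger spatial grid."""
        def __init__(self, num_feat):
            super().__init__(nn.Conv2d(num_feat, 4 * num_feat, 3, 1, 1),
                             nn.PixelShuffle(2))

    x = torch.randn(2, 64, 32, 32)  # [B*T, C, H, W] frame features
    print(Upsample2x(64)(x).shape)  # torch.Size([2, 64, 64, 64])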

I tested video SR with SwinIR and VRT, but SwinIR performs better. Is that normal?

VRT testing command

CUDA_VISIBLE_DEVICES=9 \
python main_test_vrt.py --task 002_VRT_videosr_bi_REDS_16frames \
                        --folder_lq /home/liao/cjj/dataset/test/LR \
                        --folder_gt /home/liao/cjj/dataset/test/GT \
                        --tile 10 128 128 \
                        --tile_overlap 2 20 20

SwinIR model: 001_classicalSR_DIV2K_s48w8_SwinIR-M_x4

Video for testing: https://cowtransfer.com/s/1739646a86874e

Result

SwinIR
         1        2        3        4        Average
PSNR     26.9603  31.9831  33.0922  33.2781  31.32843
SSIM     0.7353   0.9022   0.8842   0.9233   0.86125

VRT
         1        2        3        4        Average
PSNR     26.7961  31.7153  30.7655  34.3461  30.90575
SSIM     0.7272   0.8931   0.8724   0.9385   0.8578

Train on own dataset

Hi, I'm trying to test the model on my own dataset, but I keep running into the same issue:

/usr/local/lib/python3.7/dist-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2894.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/content/VRT/models/network_vrt.py:716: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
dim_t = temperature ** (2 * (dim_t // 2) / num_pos_feats)
loading model from ./model_zoo/vrt/model_zoo/vrt/001_VRT_videosr_bi_REDS_6frames.pth
using dataset from testsets/uploaded
tcmalloc: large alloc 4187750400 bytes == 0x4d22e000 @ 0x7fcae3624b6b 0x7fcae3644379 0x7fca5f668d57 0x7fca5f656bc3 0x7fca8954c39f 0x7fca8954cd10 0x7fca8954cd64 0x7fca89a5dfff 0x7fca8a2ce89b 0x7fca8a01a223 0x7fca8a2a99bf 0x7fca8a057bf7 0x7fcab15aac30 0x7fca89a653b4 0x7fca8a51c515 0x7fca89dc49ce 0x7fca8a2a81a5 0x7fca89e0a372 0x7fcab128bad7 0x593784 0x548c51 0x51566f 0x593dd7 0x5118f8 0x593dd7 0x5118f8 0x549576 0x604173 0x5f5506 0x5f8c6c 0x5f9206
/usr/local/lib/python3.7/dist-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")

Any idea where the issue is coming from?

Colab path error

Hello !

I wanted to try VRT in the proposed Colab notebook with one of my own test videos. I changed nothing in the notebook and used the embedded upload function; the frames are generated without problem, but I then get this error when running the "default" inference command provided:
!python main_test_vrt.py --task 001_VRT_videosr_bi_REDS_6frames --folder_lq testsets/uploaded --tile 6 128 128 --tile_overlap 2 20 20

[...]
loading model from ./model_zoo/vrt/model_zoo/vrt/001_VRT_videosr_bi_REDS_6frames.pth
using dataset from testsets/uploaded/
Traceback (most recent call last):
  File "main_test_vrt.py", line 346, in <module>
    main()
  File "main_test_vrt.py", line 51, in main
    'sigma':args.sigma, 'num_frame':-1, 'cache_data': False})
  File "/content/VRT/data/dataset_video_test.py", line 349, in __init__
    img_paths_lq = sorted(list(utils_video.scandir(subfolder_lq, full_path=True)))
  File "/content/VRT/utils/utils_video.py", line 37, in _scandir
    for entry in os.scandir(dir_path):
NotADirectoryError: [Errno 20] Not a directory: 'testsets/uploaded/BI_deinterlaced_1000frames.mp4'

I don't understand why, so it's hard to fix by myself. Any help would be appreciated!

why not patch?

Why don't you treat a patch as a token for the embedding, instead of using the channel dimension as the embedding dimension?

Same error, solution didn't work: RuntimeError expected input... to have 28 channels, but got 27 channels instead

I ran into the same error as #14 and verified that self.nonblind_denoising was set to True here, but I still get the error:

line 585, in _conv_forward
    return F.conv3d(
RuntimeError: Given groups=1, weight of size [96, 28, 1, 3, 3], expected input[1, 27, 40, 128, 128] to have 28 channels, but got 27 channels instead

This is using the dataset VRT/testsets/REDS4/sharp_bicubic via the call python main_test_vrt.py --task 008_VRT_videodenoising_DAVIS --folder_lq testsets/REDS4/sharp_bicubic --tile 40 128 128 --tile_overlap 2 20 20. I ultimately want to run this on my own folder of PNGs from a video.
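A hedged reading of the arithmetic in the error, inferred from the shapes rather than a verified trace of the repo: the denoising model's first convolution expects 28 input channels, i.e. 27 from the stacked RGB frames plus one constant noise-level map that is normally appended to the input as a fourth per-frame channel derived from --sigma. Roughly:

    import torch

    # Hedged sketch of the missing step: append a constant noise-level map
    # as a 4th channel to every frame before the video enters the network.
    sigma = 10                                # --sigma, on the 0-255 scale
    lq = torch.rand(1, 32, 3, 128, 128)       # [B, T, C, H, W] video
    noise_map = torch.full_like(lq[:, :, :1], sigma / 255.)
    lq = torch.cat([lq, noise_map], dim=2)    # -> [1, 32, 4, 128, 128]

Since the command above passes no --sigma, the input plausibly ends up one channel short.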

Memory consumption while training

Hi, congrats on this cool work!
I'm trying to train your model, but I only have 2 A100 GPUs, so memory is limited. I wonder how much memory is needed to train models like "003_VRT_videosr_bi_Vimeo_7frames.pth" and "006_VRT_videodeblurring_GoPro.pth"?

What's the resolution of the videos when evaluating on the GoPro dataset in your paper? And any advice on resolution when doing video deblurring?

In Table 2 of your paper, you report your model's results on the GoPro dataset. At what resolution did you get those results?

Also, can you give some advice about the resolution to use for video deblurring?

From issue #12, it seems this model is hard to run at high resolution, and 180p is blurry in practice. How can we apply this model at a higher resolution (e.g. 720p)? Is it possible to train the model at a lower resolution and run inference at a higher resolution?

gopro_train_problems

When I use distributed training, I get this error: AttributeError: 'DistributedDataParallel' object has no attribute '_set_static_graph'. How can I solve this?

Huggingface Spaces

Hi, would you be interested in sharing a web demo on Huggingface Spaces for VRT?

It would make this model more accessible as it would allow people to try out the model directly from the browser. Some other recent machine learning model repos have set up Spaces for easy access:

github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/akhaliq/BLIP

github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore

Spaces is completely free, and I am happy to help set up a Gradio Space. Here are some getting started instructions if you'd prefer to do it yourself: https://huggingface.co/blog/gradio-spaces

About the deblurring tasks

Hello, I am running the code on Google Colab, but I am having trouble running it on the deblurring tasks, namely these three:

005, video deblurring trained and tested on DVD

006, video deblurring trained and tested on GoPro

007, video deblurring trained on REDS, tested on REDS4

The error is as follows.

/usr/local/lib/python3.7/dist-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
^C

Could you please tell me how to solve this? Thanks a lot.

Your Results in New Super-Resolution Benchmarks

Hello,

MSU Graphics & Media Lab Video Group has recently launched two new Super-Resolution Benchmarks.

Your method achieved 16th place in the Video Upscalers Benchmark: Quality Enhancement in the 'LPIPS Animation 2x' category and 5th place in the Super-Resolution for Video Compression Benchmark in the 'VVC compression' category. We congratulate you on your result and look forward to your future work!

We would be grateful for your feedback on our work.

RuntimeError expected input... to have 28 channels, but got 27 channels instead

I am getting this error on my own test data (with task 008_VRT_videodenoising_DAVIS)

RuntimeError: Given groups=1, weight of size [96, 28, 1, 3, 3], expected input[1, 27, 32, 128, 128] to have 28 channels, but got 27 channels instead

Full stack:
File "C:\Dev\VRT\models\network_vrt.py", line 1395, in forward x = self.conv_first(x.transpose(1, 2)) File "C:\tools\miniconda3\envs\pt\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl return forward_call(*input, **kwargs) File "C:\tools\miniconda3\envs\pt\lib\site-packages\torch\nn\modules\conv.py", line 590, in forward return self._conv_forward(input, self.weight, self.bias) File "C:\tools\miniconda3\envs\pt\lib\site-packages\torch\nn\modules\conv.py", line 585, in _conv_forward return F.conv3d( RuntimeError: Given groups=1, weight of size [96, 28, 1, 3, 3], expected input[1, 27, 32, 128, 128] to have 28 channels, but got 27 channels instead
