zj-binxia / diffir

This project is the official implementation of 'Diffir: Efficient diffusion model for image restoration', ICCV2023

Python 26.71% Shell 0.51% Jupyter Notebook 72.73% Dockerfile 0.02% MATLAB 0.02%

deblurring diffusion-model inpainting super-resolution

diffir's Issues

Why can't Z and D be used directly to modulate the backbone, instead of adding separate diffusion and denoising processes?

I think that if you dropped the diffusion and denoising and instead used the KD constraint from your KDSR, or a simpler linear + L1 constraint, the effect might be about the same. How much does the diffusion in DiffIR actually contribute? Thanks!

RealSR has no folder basicsr

The inference script inference_diffif.py requires a folder named basicsr.

Only Test in SRGAN

Because of limited computing resources, I only want to use the pre-trained model and test it on my private dataset. I put the model in ./experiment and modified test_DiffIRS2_GAN_x4.yml, but when I run test.sh, it fails with:
"Traceback (most recent call last):
File "D:\DL_ws\DiffIR-SRGAN\test.py", line 15, in <module>
test_pipeline(root_path)
File "E:\Develop\Anaconda\envs\dl_envs\Lib\site-packages\basicsr\test.py", line 40, in test_pipeline
model.validation(test_loader, current_iter=opt['name'], tb_logger=None, save_img=opt['val']['save_img'])
File "E:\Develop\Anaconda\envs\dl_envs\Lib\site-packages\basicsr\models\base_model.py", line 48, in validation
self.nondist_validation(dataloader, current_iter, tb_logger, save_img)
File "D:\DL_ws\DiffIR-SRGAN\DiffIR\models\DiffIR_GAN_S2_model.py", line 74, in nondist_validation
super(DiffIRGANS2Model, self).nondist_validation(dataloader, current_iter, tb_logger, save_img)
File "E:\Develop\Anaconda\envs\dl_envs\Lib\site-packages\basicsr\models\sr_model.py", line 156, in nondist_validation
self.feed_data(val_data)
File "E:\Develop\Anaconda\envs\dl_envs\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "D:\DL_ws\DiffIR-SRGAN\DiffIR\models\DiffIR_GAN_S2_model.py", line 65, in feed_data
self.gt = data['gt'].to(self.device)
~~~~^^^^^^
KeyError: 'gt'
"
But doesn't this test just use SingleImageDataset, with LQ as the only input? How can I input ST properly?
Looking forward to your reply.
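
The traceback suggests that feed_data reads data['gt'] unconditionally, while a batch from SingleImageDataset carries only the LQ image. One possible guard is sketched below. It assumes the batch is a dict keyed 'lq' and optionally 'gt'; the helper name split_batch and the FakeTensor stand-in are hypothetical and are not part of DiffIR or BasicSR.

```python
class FakeTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        # Like torch.Tensor.to, return an object placed on the target device.
        return FakeTensor(device)


def split_batch(data, device):
    """Return (lq, gt), with gt set to None for LQ-only test sets."""
    lq = data["lq"].to(device)
    # Only read 'gt' when the dataset actually provides it.
    gt = data["gt"].to(device) if "gt" in data else None
    return lq, gt


lq, gt = split_batch({"lq": FakeTensor()}, "cuda")
print(lq.device, gt)
```

With a guard like this, metrics that need ground truth would also have to be disabled in the test config, since there is nothing to compare against.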

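Deblur stage-1 training results

Hello, when training for deblurring on the GoPro dataset, roughly what PSNR should be reached on the GoPro validation set once the first stage (S1) has finished training?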

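How to improve sharpness

I want to use this algorithm to enhance the picture quality of films. The stability of the model's output images is already acceptable, but the sharpness never seems sufficient and the picture looks hazy. Are there any parameters that could improve this? I am also trying GAN-based training, but it does not seem to improve sharpness noticeably either.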

Question about space of IPR

Thank you for your open-source code which contributes to the community!

I have a question about the IPRs. The DIRformer you proposed in your paper can be considered an auto-encoder that takes IPRs and LR images as latent embeddings. During inference, the IPRs are actually generated by your DM conditioned on LR images.

However, it is a consensus that the latent spaces of auto-encoders are discrete. But IPRs (as one part of latent embeddings) are generated from DMs, meaning that the IPRs will be continuous. In order to address the problem of discrete latent spaces, a common method is to adopt a KL divergence of latent embeddings so that the latent embeddings can be continuous, as VAEs have done. However, it seems that you did not constrain the space of IPR. So, how can the generated IPRs from DMs, which are continuous, be leveraged as latent embeddings of the DIRformer whose latent space is discrete? How does it work?

Looking forward to your reply! Thank you again for your excellent work.

x1 model

Hi, could you release the x1 Real-SR model? Thanks.

Error when training on a single GPU

Traceback (most recent call last):
File "DiffIR/train.py", line 15, in <module>
train_pipeline(root_path)
File "/DiffIR/DiffIR-RealSR/DiffIR/train_pipeline.py", line 185, in train_pipeline
model.optimize_parameters(current_iter)
File "/DiffIR/DiffIR-RealSR/DiffIR/models/DiffIR_S2_model.py", line 408, in optimize_parameters
_, pred_IPR_list = self.net_g.module.diffusion(self.lq,S1_IPR[0])
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1207, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DiffIRS2' object has no attribute 'module'

The module attribute only exists when training on multiple GPUs. Is multi-GPU training required?
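
For context, PyTorch's DataParallel and DistributedDataParallel wrappers expose the real network as .module, while a bare network on a single GPU has no such attribute. A minimal sketch of an unwrapping helper follows; bare_model, Net and Wrapper are hypothetical names used for illustration and are not DiffIR code.

```python
def bare_model(net):
    """Return the wrapped network if `net` is a DataParallel-style wrapper."""
    # Wrappers keep the real network in `.module`; a plain network does not
    # have that attribute, so fall back to the object itself.
    return getattr(net, "module", net)


class Net:  # hypothetical stand-in for DiffIRS2
    def diffusion(self, lq, ipr):
        return "called diffusion"


class Wrapper:  # hypothetical stand-in for DistributedDataParallel
    def __init__(self, module):
        self.module = module


single = Net()
multi = Wrapper(Net())
print(bare_model(single).diffusion(None, None))
print(bare_model(multi).diffusion(None, None))
```

Replacing direct self.net_g.module accesses with such a helper would let the same call work in both settings. This assumes the network has no submodule of its own named module.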

ValueError: `Dataloader` returned 0 length. Please make sure that it returns at least 1 batch

Hello, sorry to bother you. This error occurs when I train DiffIR-inpainting on my own dataset. I have not found the cause, but I have checked the following:

The file paths and formats are correct;
The content of the dataset is fine;
I am not sure whether the size of the dataset is the problem.
Any advice would be much appreciated. Thanks!
[saicinpainting.training.trainers.baseS1][INFO] - BaseInpaintingTrainingModule init done
[root][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
[INFO] - Make val dataloader default from /home/liu/ZZB/DH_2700//Val/Val_GT
[__main__][CRITICAL] - Training failed due to Dataloader returned 0 length. Please make sure that it returns at least 1 batch:
Traceback (most recent call last):
File "bin/train.py", line 63, in main
trainer.fit(training_model)
File "/home/liu/anaconda3/envs/DIFFIR/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
self.dispatch()
File "/home/liu/anaconda3/envs/DIFFIR/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 546, in dispatch
self.accelerator.start_training(self)
File "/home/liu/anaconda3/envs/DIFFIR/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/liu/anaconda3/envs/DIFFIR/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 114, in start_training
self._results = trainer.run_train()
File "/home/liu/anaconda3/envs/DIFFIR/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 607, in run_train
self.run_sanity_check(self.lightning_module)
File "/home/liu/anaconda3/envs/DIFFIR/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 854, in run_sanity_check
self.reset_val_dataloader(ref_model)
File "/home/liu/anaconda3/envs/DIFFIR/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py", line 364, in reset_val_dataloader
self.num_val_batches, self.val_dataloaders = self._reset_eval_dataloader(model, 'val')
File "/home/liu/anaconda3/envs/DIFFIR/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py", line 325, in _reset_eval_dataloader
num_batches = len(dataloader) if has_len(dataloader) else float('inf')
File "/home/liu/anaconda3/envs/DIFFIR/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py", line 33, in has_len
raise ValueError('Dataloader returned 0 length. Please make sure that it returns at least 1 batch')
ValueError: Dataloader returned 0 length. Please make sure that it returns at least 1 batch
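
This ValueError is raised when the validation dataloader reports a length of 0. That happens when the dataset finds no samples, or when drop_last is set and there are fewer samples than one batch. The sketch below is a quick sanity check. The suffix set is an assumption, because the file pattern the DiffIR-inpainting dataset class actually looks for is not shown in this thread.

```python
import math
import tempfile
from pathlib import Path

# Assumed image suffixes; adjust to whatever the dataset class globs for.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_images(folder):
    """Count files under `folder` (recursively) with an image-like suffix."""
    return sum(1 for p in Path(folder).rglob("*") if p.suffix.lower() in IMAGE_SUFFIXES)


def num_batches(num_samples, batch_size, drop_last=False):
    """Length a PyTorch DataLoader reports for a map-style dataset."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


with tempfile.TemporaryDirectory() as val_dir:
    (Path(val_dir) / "0001.png").touch()
    (Path(val_dir) / "notes.txt").touch()  # ignored: not an image suffix
    n = count_images(val_dir)
    print(n, num_batches(n, batch_size=4), num_batches(n, batch_size=4, drop_last=True))
```

Pointing count_images at the Val_GT folder from the log would show whether any files match at all. If the count is non-zero, the naming convention expected by the dataset class is the next thing to check.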

'DiffIRGANS2Model' object has no attribute 'model_Es1'

Thank you for your excellent work!

When I set the upsampling factor to 2 and loaded the pre-trained model, I ran into the following problem.
Traceback (most recent call last):
File "DiffIR/train.py", line 15, in <module>
train_pipeline(root_path)
File "/DiffIR/DiffIR-SRGAN/DiffIR/train_pipeline.py", line 185, in train_pipeline
model.optimize_parameters(current_iter)
File "/DiffIR/DiffIR-SRGAN/DiffIR/models/DiffIR_GAN_S2_model.py", line 116, in optimize_parameters
_, S1_IPR = self.model_Es1(self.lq,self.gt)
I would appreciate it if you could kindly give me some advice about this.

Thanks in advance.

I found the cpen is "degrade aware"

I found that CPEN's degradation awareness is very strong.

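Are the basicsr files missing?

Thanks to the author for sharing. After looking through the code, it seems the basicsr files are missing. Also, will the model files for the other tasks be uploaded?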

not found ./experiments folder

I want to use your pre-trained model directly. The README says "Download the pre-trained [model](https://drive.google.com/drive/folders/1JWYaP9VVPX_Mh2w1Vezn74hck-oWSyMh?usp=drive_link) and place it in ./experiments/", but there is no ./experiments folder.

Control/Keep PSNR/SSIM values

My model yields very poor PSNR and SSIM values whenever I use the diffusion model, even though it recovers more detail and the results look better visually.

Question about leveraging "DDPM w/o variance" instead of DDIM.

Thank you for your contribution to the community.

I have a question about the sampler you employed in your work. You mentioned that you used the original DDPM sampler but set the variance to 0 to achieve better performance. However, such a setting is not theoretically equal to the original sampler. DDIM [1] is a commonly used diffusion sampler w/o variance and it can be proven that it has the same optimization target as DDPM. So, why do you choose to employ DDPM w/o variance instead of DDIM? Does this choice have any theoretical reasons? Or just an empirical choice?

Thank you again for your excellent work. Looking forward to your reply.

[1] Song, Jiaming, Chenlin Meng, and Stefano Ermon. "Denoising Diffusion Implicit Models", ICLR 2021.
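
To make the distinction in this question concrete, the sketch below compares one DDPM reverse step with the noise term dropped against one deterministic DDIM step (eta = 0), using the standard noise-prediction formulas on scalars. The numeric values are arbitrary, and nothing here is taken from the DiffIR code.

```python
import math


def ddpm_mean_step(x_t, eps, alpha_t, alpha_bar_t):
    """DDPM reverse step with the noise term dropped (posterior mean only)."""
    return (x_t - (1 - alpha_t) / math.sqrt(1 - alpha_bar_t) * eps) / math.sqrt(alpha_t)


def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    """Deterministic DDIM step (eta = 0)."""
    x0_hat = (x_t - math.sqrt(1 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
    return math.sqrt(alpha_bar_prev) * x0_hat + math.sqrt(1 - alpha_bar_prev) * eps


# Arbitrary scalar values, chosen so that alpha_bar_t = alpha_bar_prev * alpha_t.
x_t, eps = 1.0, 0.5
alpha_bar_prev, alpha_t = 0.8, 0.9
alpha_bar_t = alpha_bar_prev * alpha_t

print("DDPM mean:", ddpm_mean_step(x_t, eps, alpha_t, alpha_bar_t))
print("DDIM     :", ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev))
```

For the same predicted noise, the two updates generally give different intermediate values. They coincide at the final step, where alpha_bar_prev equals 1.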

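Question about configuring test data

If I want to test or train on other datasets, for example data in LMDB format, how should I modify the dataset parameter settings?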

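Is the channel response of the degradation prior related to the degradation type?

For example, does the degradation prior of a stronger degradation have larger values in certain channels? Alternatively, would manually specifying the degradation prior at inference time have any effect?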

Data Class Loading Issue

Has anyone encountered this problem? It seems that the dataset class cannot be registered when loading.

Traceback (most recent call last):
File "DiffIR/train.py", line 15, in <module>
train_pipeline(root_path)
File "/mnt/sdc/bdc/DiffIR-master/DiffIR-demotionblur/DiffIR/train_pipeline.py", line 121, in train_pipeline
result = create_train_val_dataloader(opt, logger)
File "/mnt/sdc/bdc/DiffIR-master/DiffIR-demotionblur/DiffIR/train_pipeline.py", line 36, in create_train_val_dataloader
train_set = build_dataset(dataset_opt)
File "/home/bdc/anaconda3/envs/pytorch/lib/python3.7/site-packages/basicsr/data/__init__.py", line 34, in build_dataset
dataset = DATASET_REGISTRY.get(dataset_opt['type'])(dataset_opt)
File "/home/bdc/anaconda3/envs/pytorch/lib/python3.7/site-packages/basicsr/utils/registry.py", line 71, in get
raise KeyError(f"No object named '{name}' found in '{self._name}' registry!")
KeyError: "No object named 'DeblurPairedDataset' found in 'dataset' registry!"
Name DeblurPairedDataset is not found, use name: DeblurPairedDataset_basicsr!
Traceback (most recent call last):
File "DiffIR/train.py", line 15, in <module>
train_pipeline(root_path)
File "/mnt/sdc/bdc/DiffIR-master/DiffIR-demotionblur/DiffIR/train_pipeline.py", line 121, in train_pipeline
result = create_train_val_dataloader(opt, logger)
File "/mnt/sdc/bdc/DiffIR-master/DiffIR-demotionblur/DiffIR/train_pipeline.py", line 36, in create_train_val_dataloader
train_set = build_dataset(dataset_opt)
File "/home/bdc/anaconda3/envs/pytorch/lib/python3.7/site-packages/basicsr/data/__init__.py", line 34, in build_dataset
dataset = DATASET_REGISTRY.get(dataset_opt['type'])(dataset_opt)
File "/home/bdc/anaconda3/envs/pytorch/lib/python3.7/site-packages/basicsr/utils/registry.py", line 71, in get
raise KeyError(f"No object named '{name}' found in '{self._name}' registry!")
KeyError: "No object named 'DeblurPairedDataset' found in 'dataset' registry!"
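
A registry lookup like this fails when the file defining the dataset class was never imported, because the registering decorator only runs at import time. The sketch below illustrates that mechanism with a tiny stand-in registry. It is a simplified illustration, not BasicSR's actual registry code, and the dataset class body is hypothetical.

```python
class Registry:
    """Tiny stand-in for a name -> class registry (not BasicSR's real code)."""

    def __init__(self, name):
        self._name = name
        self._objs = {}

    def register(self, cls):
        self._objs[cls.__name__] = cls
        return cls

    def get(self, name):
        if name not in self._objs:
            raise KeyError(f"No object named '{name}' found in '{self._name}' registry!")
        return self._objs[name]


DATASET_REGISTRY = Registry("dataset")

try:
    DATASET_REGISTRY.get("DeblurPairedDataset")  # looked up before it is defined
except KeyError as err:
    print("before import:", err)


# In a real project this class lives in its own file. The decorator only runs
# when that file is imported, and that is what adds the class to the registry.
@DATASET_REGISTRY.register
class DeblurPairedDataset:
    def __init__(self, opt):
        self.opt = opt


dataset = DATASET_REGISTRY.get("DeblurPairedDataset")({"type": "DeblurPairedDataset"})
print("after import:", type(dataset).__name__)
```

If this is the cause, checking that the project's data package is imported before build_dataset is called (and that the import itself does not fail silently) would be a reasonable first step.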

really great work!

Hello, I have read your work and was deeply inspired by it. Really great work! I am looking forward to the open-source release.

During the deblur reproduction, the trainS1 does not converge.

Hello, due to limited hardware resources, I trained the deblur network in the S1 stage using two 3090 GPUs. To avoid running out of memory, I reduced the batch_size_pergpu to 2. However, during the training process, the PSNR only reached around 26 for the first 20,000 iterations, but then the loss sharply increased and the PSNR dropped to around 5. Could you please help me understand the possible reasons for this issue? I would greatly appreciate your response!

Why not normalize the output of the cpen?

Would it be more stable to restrict the degradation prior to a sphere?
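
For reference, restricting the prior to a sphere would amount to L2-normalizing the CPEN output vector. A minimal sketch follows, using a hypothetical two-dimensional prior. Whether this would help DiffIR is not answered in this thread.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale `vec` to unit L2 norm; `eps` avoids division by zero."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


prior = [3.0, 4.0]  # hypothetical 2-D prior vector
unit = l2_normalize(prior)
print(unit, math.sqrt(sum(v * v for v in unit)))
```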

Training costs of DiffIR

Hi, great work!

I'm wondering what the training costs of DiffIR (days, number of GPUs, GPU type) are for the evaluated tasks (inpainting, super-resolution, deblurring).

No requirements.txt

Hi,

In DiffIR-SRGAN, there is no requirements.txt. Please check it.

Thanks!

ModuleNotFoundError: No module named 'DiffIR'

$ sh test.sh
Traceback (most recent call last):
  File "C:\DiffIR-master\DiffIR-RealSR\DiffIR\test.py", line 5, in <module>
    import DiffIR.archs
ModuleNotFoundError: No module named 'DiffIR'
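
When a script is run as python DiffIR/test.py, Python puts the script's own directory on sys.path, not its parent, so import DiffIR.archs can only succeed if the parent folder is added some other way (PYTHONPATH, an installed package, or code like the sketch below). The helper name add_project_root is hypothetical, and the directory layout is taken from the traceback above.

```python
import sys
from pathlib import Path


def add_project_root(script_path):
    """Put the directory that contains the `DiffIR` package on sys.path."""
    # e.g. .../DiffIR-RealSR/DiffIR/test.py -> .../DiffIR-RealSR
    root = Path(script_path).resolve().parent.parent
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


root = add_project_root("DiffIR-RealSR/DiffIR/test.py")
print(root.name, str(root) in sys.path)
```

Inside test.py itself, the call would be add_project_root(__file__), placed before the import DiffIR.archs line.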

Are the proposed DGTA and DGFN different from Restormer's MDTA and GDFN?

Where is the dynamic behavior reflected?

Questions about memory consumption.

Thank you for your work.

I have a few questions about your paper (also relevant to DiffI2I) concerning the memory consumption of Stage 1 of your solution(s).

What kind of GPUs, and how many, did you use in your experiments?
How many GB of GPU memory do you use at each step (per GPU, if you distribute batches among several)?
In the paper, you mention that the input of DIRformer for super-resolution is 64x64 (and thus as many tokens, thanks to OverlapPatchEmbed, which seems reasonable), whereas the input is 256x256 for inpainting. Doesn't the memory consumption explode with that many transformer tokens?

Thanks in advance.
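
As a rough worked example of the concern raised here, the sketch below computes the size of a single N x N attention map, assuming one token per pixel, fp32 values and plain spatial self-attention. These are assumptions for illustration only. Designs that attend over channels (as Restormer's MDTA does) or within windows never materialize such a map, and the thread does not state which case applies to DIRformer.

```python
def attention_map_bytes(height, width, bytes_per_value=4):
    """Bytes for one N x N attention map over N = height * width tokens."""
    tokens = height * width
    return tokens, tokens * tokens * bytes_per_value


for size in (64, 256):
    tokens, nbytes = attention_map_bytes(size, size)
    print(f"{size}x{size}: {tokens} tokens, {nbytes / 2**20:.0f} MiB per head")
```

Going from 64x64 to 256x256 multiplies the token count by 16 and the size of a spatial attention map by 256, which is why the choice of attention design matters so much at this resolution.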

Question about the deblur pretrained checkpoints

On which dataset were deblur checkpoints trained?

FileNotFoundError: [Errno 2] No such file or directory: '/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/models/lpips_models/vgg.pth'

Hello, thank you very much for your contribution! I encountered this issue during debugging and am unsure how to resolve it. I kindly request your guidance. Wishing you a joyful life and successful endeavors!
[__main__][CRITICAL] - Training failed due to [Errno 2] No such file or directory: '/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/models/lpips_models/vgg.pth':
Traceback (most recent call last):
File "bin/train.py", line 50, in main
training_model = make_training_model(config)
File "/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/saicinpainting/training/trainers/__init__.py", line 25, in make_training_model
return cls(config, **kwargs)
File "/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/saicinpainting/training/trainers/defaultS1.py", line 32, in __init__
super().__init__(*args, **kwargs)
File "/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/saicinpainting/training/trainers/baseS1.py", line 79, in __init__
self.val_evaluator = make_evaluator(**self.config.evaluator)
File "/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/saicinpainting/evaluation/__init__.py", line 16, in make_evaluator
metrics['lpips'] = LPIPSScore()
File "/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/saicinpainting/evaluation/losses/base_loss.py", line 109, in __init__
self.score = PerceptualLoss(model=model, net=net, model_path=model_path,
File "/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/saicinpainting/evaluation/losses/lpips.py", line 26, in __init__
self.model.initialize(model=model, net=net, use_gpu=use_gpu, colorspace=colorspace,
File "/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/saicinpainting/evaluation/losses/lpips.py", line 294, in initialize
self.net.load_state_dict(torch.load(model_path, **kw), strict=False)
File "/home/liu/anaconda3/envs/DIFFIR/lib/python3.8/site-packages/torch/serialization.py", line 986, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/liu/anaconda3/envs/DIFFIR/lib/python3.8/site-packages/torch/serialization.py", line 435, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/liu/anaconda3/envs/DIFFIR/lib/python3.8/site-packages/torch/serialization.py", line 416, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/models/lpips_models/vgg.pth'
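
The error means the LPIPS weight file is simply not present at the path the evaluator expects. A small pre-flight check, sketched below, reports such gaps before training starts. The helper name missing_files is hypothetical, and the relative path is taken from the error message.

```python
from pathlib import Path


def missing_files(paths):
    """Return the subset of `paths` that do not exist as files."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


# Path from the error message, relative to the DiffIR-inpainting folder.
required = ["models/lpips_models/vgg.pth"]
missing = missing_files(required)
if missing:
    print("Missing weight files:", missing)
```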

question about the test code

Thank you for your nice work on IR. I came across a problem when testing on the CelebA dataset:

[2023-10-31 16:09:00,318][__main__][CRITICAL] - Prediction failed due to [Errno 2] No such file or directory: '/mnt/bn/xiabinpaint/ICCV-Inpainting/code-final/DiffIR-inpainting-final/celeba-ta.pth':
Traceback (most recent call last):
File "/mnt/DiffIR/DiffIR-inpainting/predict.py", line 57, in main
model = load_checkpoint(train_config, checkpoint_path, strict=False, map_location='cpu')
File "/mnt/DiffIR/DiffIR-inpainting/saicinpainting/training/trainers/__init__.py", line 29, in load_checkpoint
model: torch.nn.Module = make_training_model(train_config)
File "/mnt/DiffIR/DiffIR-inpainting/saicinpainting/training/trainers/__init__.py", line 25, in make_training_model
return cls(config, **kwargs)
File "/mnt/DiffIR/DiffIR-inpainting/saicinpainting/training/trainers/defaultS2.py", line 34, in __init__
super().__init__(*args, **kwargs)
File "/mnt/DiffIR/DiffIR-inpainting/saicinpainting/training/trainers/baseS2.py", line 81, in __init__
load_network(self.generator,generatorS2_path,strict=False)
File "/mnt/DiffIR/DiffIR-inpainting/saicinpainting/training/trainers/baseS2.py", line 31, in load_network
load_net = torch.load(load_path)
File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 791, in load
with _open_file_like(f, 'rb') as opened_file:
File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 271, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 252, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/bn/xiabinpaint/ICCV-Inpainting/code-final/DiffIR-inpainting-final/celeba-ta.pth' @Zj-BinXia

The provided model is not compatible with the code

I am trying to test DiffIR-SRGAN. I set up the model as instructed in the README file, but encountered the following error. Could you please check the model compatibility? Thanks!

......
2023-09-06 08:03:51,010 INFO: Loading DiffIRS2 model from experiments/SISR-DiffIRS2-GAN.pth, with param key: [params_ema].
2023-09-06 08:03:51,364 INFO: Network [DiffIRS1] is created.
2023-09-06 08:03:53,088 INFO: Loading DiffIRS1 model from experiments/SISR-DiffIRS1.pth, with param key: [params_ema].
Traceback (most recent call last):
  File "DiffIR/test.py", line 11, in <module>
    test_pipeline(root_path)
  File "/cluster/....../my_envs/diffir/lib64/python3.8/site-packages/basicsr/test.py", line 35, in test_pipeline
    model = build_model(opt)
  File "/cluster/....../my_envs/diffir/lib64/python3.8/site-packages/basicsr/models/__init__.py", line 26, in build_model
    model = MODEL_REGISTRY.get(opt['model_type'])(opt)
  File "/cluster/....../DiffIR/DiffIR-SRGAN/DiffIR/models/DiffIR_GAN_S2_model.py", line 43, in __init__
    self.model_Eta = self.net_g_S1.module.E
  File "/cluster/apps/nss/gcc-6.3.0/python_gpu/3.8.5/torch/nn/modules/module.py", line 778, in __getattr__
    raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
torch.nn.modules.module.ModuleAttributeError: 'DiffIRS1' object has no attribute 'module'

Also, requirements.txt specifies torch>=1.7, but it seems that nn.PixelUnshuffle is only available for torch>=1.8.

Best regards.
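
The version concern in this report can be checked up front. The sketch below parses a torch-style version string and compares it against the 1.8 threshold mentioned above. The helper names are hypothetical, and the threshold is taken from this report.

```python
import re


def version_tuple(version):
    """Parse '1.7.1+cu110' -> (1, 7, 1); ignores local/pre-release suffixes."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def supports_pixel_unshuffle(torch_version):
    # Per the report above, nn.PixelUnshuffle needs torch >= 1.8.
    return version_tuple(torch_version) >= (1, 8, 0)


for v in ("1.7.1+cu110", "1.8.0", "2.1.0"):
    print(v, supports_pixel_unshuffle(v))
```

With PyTorch installed, passing torch.__version__ to supports_pixel_unshuffle gives the answer for the current environment.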

Finetuning the RealSR pretrained model

I want to fine-tune the pretrained model to get better performance on my private data, but the current fine-tuning program does not suit my single GPU because it uses the distributed training mode, and I find it hard to resolve the problems on my own. How can I adjust the code to run fine-tuning on a single GPU? I would very much like to use DiffIR in my research area.

Question about x0-prediction and noise-prediction

Thank you so much for your contribution to the community! Your excellent work really benefits my research.

I have a question about the diffusion model you employed in your paper. You mentioned that your model was built to "predict noise" in Sec. 4.2, page 4, and trained with the L1 loss on Z (Eqn. 13 in your paper). However, the code implementation is actually "x0 prediction" (https://github.com/Zj-BinXia/DiffIR/blob/3d8d677a77588666618c139b1693634fe5eb638f/DiffIR-RealSR/ldm/ddpm.py#L43C16-L43C16). Did I misunderstand anything?

Really looking forward to your reply. Thank you again.
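
As background to this question, the two parameterizations are linked by the standard forward-diffusion identity x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, where a is the cumulative alpha at step t. Given x_t, either quantity determines the other, as the scalar sketch below shows. The values are arbitrary and the code is not taken from ddpm.py.

```python
import math


def q_sample(x0, eps, alpha_bar_t):
    """Forward diffusion: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return math.sqrt(alpha_bar_t) * x0 + math.sqrt(1 - alpha_bar_t) * eps


def x0_from_eps(x_t, eps, alpha_bar_t):
    """Recover x0 from x_t and a noise estimate."""
    return (x_t - math.sqrt(1 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)


def eps_from_x0(x_t, x0, alpha_bar_t):
    """Recover the noise from x_t and an x0 estimate."""
    return (x_t - math.sqrt(alpha_bar_t) * x0) / math.sqrt(1 - alpha_bar_t)


x0, eps, a = 0.3, -1.2, 0.6  # arbitrary scalars
x_t = q_sample(x0, eps, a)
print(x0_from_eps(x_t, eps, a), eps_from_x0(x_t, x0, a))
```

The two targets are interchangeable at sampling time, although they weight the training loss differently across timesteps.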

When will DiffI2I be released?

Does this project work only on Linux devices?

I'm trying to use the motion deblurring code on Windows 11, but it is not working. I tried running the code under WSL2 and under Ubuntu 22.04 (in an Oracle virtual machine), but I'm still facing several issues. The current main issue is related to importing DiffIR in other files, as shown below:

from DiffIR.train_pipeline import train_pipeline

ModuleNotFoundError: No module named 'DiffIR'
[2024-01-10 14:41:03,177] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 992) of binary: /home/yaman/anaconda3/bin/python3
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/yaman/anaconda3/lib/python3.11/site-packages/torch/distributed/launch.py", line 196, in <module>
main()
File "/home/yaman/anaconda3/lib/python3.11/site-packages/torch/distributed/launch.py", line 192, in main
launch(args)
File "/home/yaman/anaconda3/lib/python3.11/site-packages/torch/distributed/launch.py", line 177, in launch
run(args)
File "/home/yaman/anaconda3/lib/python3.11/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/home/yaman/anaconda3/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yaman/anaconda3/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

DiffIR/train.py FAILED

Please note that I ran the bash pip.sh command before running either the trainS1.sh or the trainS2.sh script.

Could this be related to using WSL or a VM-based Ubuntu?