Zj-BinXia / DiffIR
This project is the official implementation of 'Diffir: Efficient diffusion model for image restoration', ICCV2023
Hi, I'm wondering whether these patch sizes refer to the transformer patch or the data-augmentation (crop) patch. What patch size should we choose if we want to train on 256*256 data?
Where can dynamics be reflected?
$ sh test.sh
Traceback (most recent call last):
File "C:\DiffIR-master\DiffIR-RealSR\DiffIR\test.py", line 5, in <module>
import DiffIR.archs
ModuleNotFoundError: No module named 'DiffIR'
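One possible cause, offered as an assumption rather than a confirmed diagnosis: `test.py` sits inside the `DiffIR` package folder, so Python adds that folder (not its parent) to `sys.path`, and `import DiffIR.archs` cannot be resolved. A minimal sketch of a workaround follows; `add_repo_root` is a hypothetical helper and is not part of the repository.

```python
import sys
from pathlib import Path


def add_repo_root(script_path: str) -> str:
    """Put the directory that contains the package folder on sys.path.

    Assumes the layout <repo_root>/DiffIR/test.py, so the repo root is
    two levels above the script file.
    """
    repo_root = str(Path(script_path).resolve().parent.parent)
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root
```

Calling `add_repo_root(__file__)` at the top of the script, before `import DiffIR.archs`, would make the package importable under that layout assumption. Running the script from the repository root with that root on `PYTHONPATH` should have the same effect.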
Has anyone encountered this problem? It seems that the dataset class cannot be registered when loading.
Traceback (most recent call last):
File "DiffIR/train.py", line 15, in <module>
train_pipeline(root_path)
File "/mnt/sdc/bdc/DiffIR-master/DiffIR-demotionblur/DiffIR/train_pipeline.py", line 121, in train_pipeline
result = create_train_val_dataloader(opt, logger)
File "/mnt/sdc/bdc/DiffIR-master/DiffIR-demotionblur/DiffIR/train_pipeline.py", line 36, in create_train_val_dataloader
train_set = build_dataset(dataset_opt)
File "/home/bdc/anaconda3/envs/pytorch/lib/python3.7/site-packages/basicsr/data/__init__.py", line 34, in build_dataset
dataset = DATASET_REGISTRY.get(dataset_opt['type'])(dataset_opt)
File "/home/bdc/anaconda3/envs/pytorch/lib/python3.7/site-packages/basicsr/utils/registry.py", line 71, in get
raise KeyError(f"No object named '{name}' found in '{self._name}' registry!")
KeyError: "No object named 'DeblurPairedDataset' found in 'dataset' registry!"
Name DeblurPairedDataset is not found, use name: DeblurPairedDataset_basicsr!
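For example, does the degradation prior of a stronger degradation have larger values in certain channels? Alternatively, would manually specifying the degradation prior at inference time have any effect?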
I want to try finetuning the pretrained model to get better performance on my private data, but my single GPU is not suited to the current finetuning program because of its distributed training mode. It's hard for me to solve these problems on my own. How can I adjust the code to run the finetuning program? I would very much like to use DiffIR in my research area.
I'm trying to run the motion deblurring code on Windows 11, but it is not working. I tried running the code under WSL2 and under Ubuntu 22.04 (in an Oracle virtual machine), but I'm still facing several issues. The main issue at the moment is related to importing DiffIR in other files; please see the following:
from DiffIR.train_pipeline import train_pipeline
DiffIR/train.py FAILED
Please note that I ran the ((bash pip.sh)) command before running one of the trainS1.sh or trainS2.sh scripts.
Is it related somehow to using WSL or VM based Ubuntu?
Because of limited computing resources, I want to use only the pre-trained model and test it on my private dataset. I put the model in ./experiment and modified test_DiffIRS2_GAN_x4.yml, but when I run test.sh, it reports:
"Traceback (most recent call last):
File "D:\DL_ws\DiffIR-SRGAN\test.py", line 15, in <module>
test_pipeline(root_path)
File "E:\Develop\Anaconda\envs\dl_envs\Lib\site-packages\basicsr\test.py", line 40, in test_pipeline
model.validation(test_loader, current_iter=opt['name'], tb_logger=None, save_img=opt['val']['save_img'])
File "E:\Develop\Anaconda\envs\dl_envs\Lib\site-packages\basicsr\models\base_model.py", line 48, in validation
self.nondist_validation(dataloader, current_iter, tb_logger, save_img)
File "D:\DL_ws\DiffIR-SRGAN\DiffIR\models\DiffIR_GAN_S2_model.py", line 74, in nondist_validation
super(DiffIRGANS2Model, self).nondist_validation(dataloader, current_iter, tb_logger, save_img)
File "E:\Develop\Anaconda\envs\dl_envs\Lib\site-packages\basicsr\models\sr_model.py", line 156, in nondist_validation
self.feed_data(val_data)
File "E:\Develop\Anaconda\envs\dl_envs\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "D:\DL_ws\DiffIR-SRGAN\DiffIR\models\DiffIR_GAN_S2_model.py", line 65, in feed_data
self.gt = data['gt'].to(self.device)
~~~~^^^^^^
KeyError: 'gt'
"
But shouldn't it just use SingleImageDataset, with LQ as the only input? How can I input ST properly?
Looking forward to reply.
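The traceback above shows `feed_data` indexing `data['gt']` unconditionally, while a dataset that yields only LQ images provides no such key. A minimal sketch of the usual guard pattern is below. `split_batch` is a hypothetical helper written for illustration; whether the rest of the S2 model's validation can run without a GT image is not confirmed here.

```python
from typing import Any, Dict, Optional, Tuple


def split_batch(data: Dict[str, Any]) -> Tuple[Any, Optional[Any]]:
    """Return (lq, gt) from a dataloader batch; gt is None when absent."""
    lq = data["lq"]        # low-quality input, expected in every batch
    gt = data.get("gt")    # only paired datasets provide this key
    return lq, gt
```

With this pattern, a caller would skip GT-dependent steps (for example, PSNR computation) whenever the second element is `None`.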
Hello, when training deblurring on the GoPro dataset, roughly what PSNR is reached on the GoPro validation set after the first stage (S1) of training is finished?
Thank you for your work.
I have a few questions about your paper (also related to DiffI2I) related to the memory consumption of the Stage 1 of your solution(s).
What kind/How many GPUS did you use in your experiments?
How many GB of GPU memory do you use at each step (per GPU if you distribute batches among several ones).
In the paper, you mention that the input of DIRformer for super resolution is 64x64 (and thus, as many tokens thanks to OverlapPatchEmbed, which seems reasonable) whereas the input is 256x256 for inpainting. Isn't the memory consumption 'exploding' with that many transformer tokens?
Thanks in advance.
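The scale of the concern can be put into numbers. The sketch below assumes one token per pixel (as the question describes for OverlapPatchEmbed) and, as a further assumption that this thread does not confirm, a full token-to-token attention map; if DIRformer uses a different attention layout, only the token counts apply.

```python
def token_count(height: int, width: int) -> int:
    """One token per pixel, assuming the patch embedding keeps resolution."""
    return height * width


sr_tokens = token_count(64, 64)          # super-resolution input
inpaint_tokens = token_count(256, 256)   # inpainting input
token_ratio = inpaint_tokens // sr_tokens

# Entries in a full token-to-token attention map grow with tokens squared.
spatial_attn_ratio = token_ratio ** 2

print(sr_tokens, inpaint_tokens, token_ratio, spatial_attn_ratio)
# prints: 4096 65536 16 256
```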
After entering the training command, the terminal shows ModuleNotFoundError: No module named 'ldm'
I want to use your pre-trained model directly. The README says: Download the pre-trained [model](https://drive.google.com/drive/folders/1JWYaP9VVPX_Mh2w1Vezn74hck-oWSyMh?usp=drive_link) and place it in ./experiments/
but there is no ./experiments folder.
Hello, thank you very much for your contribution! I encountered this issue during debugging and am unsure how to resolve it. I kindly request your guidance. Wishing you a joyful life and successful endeavors!
[main][CRITICAL] - Training failed due to [Errno 2] No such file or directory: '/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/models/lpips_models/vgg.pth':
Traceback (most recent call last):
File "bin/train.py", line 50, in main
training_model = make_training_model(config)
File "/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/saicinpainting/training/trainers/__init__.py", line 25, in make_training_model
return cls(config, **kwargs)
File "/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/saicinpainting/training/trainers/defaultS1.py", line 32, in __init__
super().__init__(*args, **kwargs)
File "/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/saicinpainting/training/trainers/baseS1.py", line 79, in __init__
self.val_evaluator = make_evaluator(**self.config.evaluator)
File "/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/saicinpainting/evaluation/__init__.py", line 16, in make_evaluator
metrics['lpips'] = LPIPSScore()
File "/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/saicinpainting/evaluation/losses/base_loss.py", line 109, in __init__
self.score = PerceptualLoss(model=model, net=net, model_path=model_path,
File "/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/saicinpainting/evaluation/losses/lpips.py", line 26, in __init__
self.model.initialize(model=model, net=net, use_gpu=use_gpu, colorspace=colorspace,
File "/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/saicinpainting/evaluation/losses/lpips.py", line 294, in initialize
self.net.load_state_dict(torch.load(model_path, **kw), strict=False)
File "/home/liu/anaconda3/envs/DIFFIR/lib/python3.8/site-packages/torch/serialization.py", line 986, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/liu/anaconda3/envs/DIFFIR/lib/python3.8/site-packages/torch/serialization.py", line 435, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/liu/anaconda3/envs/DIFFIR/lib/python3.8/site-packages/torch/serialization.py", line 416, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/home/liu/ZZB/DiffIR-master/DiffIR-inpainting/models/lpips_models/vgg.pth'
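Errors of this kind surface late, after the trainer has already been partly constructed. A small pre-flight check can report missing weight files before training starts. The sketch below is illustrative only: `missing_files` is a hypothetical helper, and where `vgg.pth` should be obtained from is not stated in this thread.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(paths: Iterable[str]) -> List[str]:
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]
```

For example, `missing_files(["models/lpips_models/vgg.pth"])` returns a non-empty list when the LPIPS weights have not been placed under the project directory.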
If I want to change the sampling output, I should change the validation function right? But, I cannot find it, would you plz help me out? Thanks
Thank you for your excellent work.
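Why is 4x downsampling needed after the GT and LQ images are concatenated?
I want to use this algorithm for film picture-quality enhancement. The stability of the model's output images is now acceptable, but the sharpness still feels insufficient and the picture looks hazy. Are there any parameters that could improve this? I am also trying GAN-based training, but it does not seem to improve sharpness noticeably either.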
Is it more stable to restrict the degenerate prior to the sphere?
Hello, I'm sorry to disturb your life. In DiffIR-inpainting with your own dataset, this error occurs. After trying, the problem was not found, and the following problems have been ruled out:
Dataloader returned 0 length. Please make sure that it returns at least 1 batch
Hi, great work!
I'm wondering what the training costs (days, number of GPUs, GPU type) of DiffIR are for the evaluated tasks (inpainting, super-resolution, deblurring)?
Thank you for your nice work on IR. I came across a problem when testing on the CelebA dataset:
[2023-10-31 16:09:00,318][main][CRITICAL] - Prediction failed due to [Errno 2] No such file or directory: '/mnt/bn/xiabinpaint/ICCV-Inpainting/code-final/DiffIR-inpainting-final/celeba-ta.pth':
Traceback (most recent call last):
File "/mnt/DiffIR/DiffIR-inpainting/predict.py", line 57, in main
model = load_checkpoint(train_config, checkpoint_path, strict=False, map_location='cpu')
File "/mnt/DiffIR/DiffIR-inpainting/saicinpainting/training/trainers/__init__.py", line 29, in load_checkpoint
model: torch.nn.Module = make_training_model(train_config)
File "/mnt/DiffIR/DiffIR-inpainting/saicinpainting/training/trainers/__init__.py", line 25, in make_training_model
return cls(config, **kwargs)
File "/mnt/DiffIR/DiffIR-inpainting/saicinpainting/training/trainers/defaultS2.py", line 34, in __init__
super().__init__(*args, **kwargs)
File "/mnt/DiffIR/DiffIR-inpainting/saicinpainting/training/trainers/baseS2.py", line 81, in __init__
load_network(self.generator,generatorS2_path,strict=False)
File "/mnt/DiffIR/DiffIR-inpainting/saicinpainting/training/trainers/baseS2.py", line 31, in load_network
load_net = torch.load(load_path)
File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 791, in load
with _open_file_like(f, 'rb') as opened_file:
File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 271, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 252, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/bn/xiabinpaint/ICCV-Inpainting/code-final/DiffIR-inpainting-final/celeba-ta.pth' @Zj-BinXia
My model produces very poor PSNR and SSIM values whenever I use the diffusion model, even though it recovers more detail and the results look better visually.
I am trying to test DiffIR-SRGAN. I set up the model as instructed in the README file but encountered the following error. Could you please check the model compatibility? Thanks!
......
2023-09-06 08:03:51,010 INFO: Loading DiffIRS2 model from experiments/SISR-DiffIRS2-GAN.pth, with param key: [params_ema].
2023-09-06 08:03:51,364 INFO: Network [DiffIRS1] is created.
2023-09-06 08:03:53,088 INFO: Loading DiffIRS1 model from experiments/SISR-DiffIRS1.pth, with param key: [params_ema].
Traceback (most recent call last):
File "DiffIR/test.py", line 11, in <module>
test_pipeline(root_path)
File "/cluster/....../my_envs/diffir/lib64/python3.8/site-packages/basicsr/test.py", line 35, in test_pipeline
model = build_model(opt)
File "/cluster/....../my_envs/diffir/lib64/python3.8/site-packages/basicsr/models/__init__.py", line 26, in build_model
model = MODEL_REGISTRY.get(opt['model_type'])(opt)
File "/cluster/....../DiffIR/DiffIR-SRGAN/DiffIR/models/DiffIR_GAN_S2_model.py", line 43, in __init__
self.model_Eta = self.net_g_S1.module.E
File "/cluster/apps/nss/gcc-6.3.0/python_gpu/3.8.5/torch/nn/modules/module.py", line 778, in __getattr__
raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
torch.nn.modules.module.ModuleAttributeError: 'DiffIRS1' object has no attribute 'module'
Also, requirements.txt specifies torch>=1.7, but it seems that nn.PixelUnshuffle is only available for torch>=1.8, see this.
Best regards.
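A version guard makes this kind of mismatch fail early with a clear message. The sketch below takes the 1.8 threshold from the report above; `major_minor` and `has_pixel_unshuffle` are hypothetical helper names, not functions from the repository or from PyTorch.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse '1.7.1+cu110' or '1.13.1' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_pixel_unshuffle(torch_version: str) -> bool:
    """True when the version is at least 1.8, per the report above."""
    return major_minor(torch_version) >= (1, 8)
```

In practice one would pass `torch.__version__`. Checking `hasattr(torch.nn, "PixelUnshuffle")` is a more direct alternative that avoids version parsing altogether.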
Thanks for your nice work!
In the process of studying your paper, I have some doubts and I hope you can help me clarify them.
I am confused about "cancel using DM in DiffIRs2-V3 to obtain DiffIRs2-V1" in subsection 6.(2). Does it mean using another type of network in place of the reverse process, or directly using the conditional vector D generated by CPENs2 as the IPR estimate?
For inpainting model, there's a 4 channels input, mask will be one of them and the output is 3 channels data. I'm wondering does this mask contribute to the network, or I can delete it and make input for 4 channels as well. Best regards, Thanks.
Hi, I just want to ask how much memory needed to train the model, thanks a lot!
Thank you for your excellent work!
When setting the upsampling factor to 2 and loading the pre-trained model, I ran into the following problem.
Traceback (most recent call last):
File "DiffIR/train.py", line 15, in <module>
train_pipeline(root_path)
File "/DiffIR/DiffIR-SRGAN/DiffIR/train_pipeline.py", line 185, in train_pipeline
model.optimize_parameters(current_iter)
File "/DiffIR/DiffIR-SRGAN/DiffIR/models/DiffIR_GAN_S2_model.py", line 116, in optimize_parameters
_, S1_IPR = self.model_Es1(self.lq,self.gt)
I would appreciate it if you could kindly give me some advice about this.
Thanks in advance.
Thank you for your open-source code which contributes to the community!
I have a question about the IPRs. The DIRformer you proposed in your paper can be considered an auto-encoder that takes IPRs and LR images as latent embeddings. During inference, the IPRs are actually generated by your DM conditioned on LR images.
However, it is a consensus that the latent spaces of auto-encoders are discrete. But IPRs (as one part of latent embeddings) are generated from DMs, meaning that the IPRs will be continuous. In order to address the problem of discrete latent spaces, a common method is to adopt a KL divergence of latent embeddings so that the latent embeddings can be continuous, as VAEs have done. However, it seems that you did not constrain the space of IPR. So, how can the generated IPRs from DMs, which are continuous, be leveraged as latent embeddings of the DIRformer whose latent space is discrete? How does it work?
Looking forward to your reply! Thank you again for your excellent work.
Hi, I tried to reproduce your motion deblur training setting on a single V100 GPU (32GB), but the GPU ran out of memory. Checking previous issues, I saw that you trained on 8 V100 GPUs. Is my setting wrong? Looking forward to your reply!
Here is my environment:
_libgcc_mutex 0.1 main defaults
_openmp_mutex 5.1 1_gnu defaults
absl-py 1.4.0 pypi_0 pypi
addict 2.4.0 pypi_0 pypi
asttokens 2.2.1 pypi_0 pypi
backcall 0.2.0 pypi_0 pypi
basicsr 1.4.2 pypi_0 pypi
boto3 1.17.112 pypi_0 pypi
botocore 1.20.112 pypi_0 pypi
ca-certificates 2023.01.10 h06a4308_0 defaults
cachetools 5.3.1 pypi_0 pypi
certifi 2023.7.22 pypi_0 pypi
charset-normalizer 3.2.0 pypi_0 pypi
click 8.1.6 pypi_0 pypi
cmake 3.27.0 pypi_0 pypi
contourpy 1.1.0 pypi_0 pypi
cycler 0.11.0 pypi_0 pypi
cython 0.29.30 pypi_0 pypi
debugpy 1.6.7 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
einops 0.7.0 pypi_0 pypi
entrypoints 0.4 pypi_0 pypi
executing 1.2.0 pypi_0 pypi
facexlib 0.3.0 pypi_0 pypi
filelock 3.12.2 pypi_0 pypi
filterpy 1.4.5 pypi_0 pypi
flask 2.0.1 pypi_0 pypi
fonttools 4.41.1 pypi_0 pypi
future 1.0.0 pypi_0 pypi
fvcore 0.1.5.post20221221 pypi_0 pypi
gfpgan 1.3.8 pypi_0 pypi
google-auth 2.22.0 pypi_0 pypi
google-auth-oauthlib 1.0.0 pypi_0 pypi
grpcio 1.56.2 pypi_0 pypi
gunicorn 20.1.0 pypi_0 pypi
idna 3.4 pypi_0 pypi
imageio 2.34.0 pypi_0 pypi
importlib-metadata 6.8.0 pypi_0 pypi
iopath 0.1.10 pypi_0 pypi
ipykernel 6.7.0 pypi_0 pypi
ipython 8.14.0 pypi_0 pypi
itsdangerous 2.1.2 pypi_0 pypi
jedi 0.18.2 pypi_0 pypi
jinja2 3.0.1 pypi_0 pypi
jmespath 0.10.0 pypi_0 pypi
joblib 1.3.1 pypi_0 pypi
jupyter-client 7.4.9 pypi_0 pypi
jupyter-core 5.3.0 pypi_0 pypi
kiwisolver 1.4.4 pypi_0 pypi
lazy-import 0.2.2 pypi_0 pypi
lazy-loader 0.3 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1 defaults
libffi 3.3 he6710b0_2 defaults
libgcc-ng 11.2.0 h1234567_1 defaults
libgomp 11.2.0 h1234567_1 defaults
libstdcxx-ng 11.2.0 h1234567_1 defaults
lit 16.0.6 pypi_0 pypi
llvmlite 0.42.0 pypi_0 pypi
lmdb 1.4.1 pypi_0 pypi
lxml 4.9.1 pypi_0 pypi
ma-tensorboard 1.0.0 pypi_0 pypi
markdown 3.4.4 pypi_0 pypi
markupsafe 2.1.3 pypi_0 pypi
matplotlib 3.6.1 pypi_0 pypi
matplotlib-inline 0.1.6 pypi_0 pypi
mmcv 1.7.1 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
natsort 8.4.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0 defaults
nest-asyncio 1.5.6 pypi_0 pypi
networkx 2.6.3 pypi_0 pypi
ninja 1.11.1.1 pypi_0 pypi
numba 0.59.1 pypi_0 pypi
numpy 1.21.5 pypi_0 pypi
nvidia-cublas-cu11 11.10.3.66 pypi_0 pypi
nvidia-cuda-nvrtc-cu11 11.7.99 pypi_0 pypi
nvidia-cuda-runtime-cu11 11.7.99 pypi_0 pypi
nvidia-cudnn-cu11 8.5.0.96 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
opencv-python 4.7.0.72 pypi_0 pypi
openssl 1.1.1t h7f8727e_0 defaults
packaging 23.1 pypi_0 pypi
pandas 1.5.1 pypi_0 pypi
parso 0.8.3 pypi_0 pypi
pathlib2 2.3.7.post1 pypi_0 pypi
pexpect 4.8.0 pypi_0 pypi
pickleshare 0.7.5 pypi_0 pypi
pillow 9.3.0 pypi_0 pypi
pip 22.0.4 pypi_0 pypi
platformdirs 3.5.1 pypi_0 pypi
portalocker 2.8.2 pypi_0 pypi
prompt-toolkit 3.0.38 pypi_0 pypi
protobuf 3.20.1 pypi_0 pypi
psutil 5.9.4 pypi_0 pypi
ptflops 0.7.2.2 pypi_0 pypi
ptyprocess 0.7.0 pypi_0 pypi
pure-eval 0.2.2 pypi_0 pypi
pyasn1 0.5.0 pypi_0 pypi
pyasn1-modules 0.3.0 pypi_0 pypi
pygments 2.15.1 pypi_0 pypi
pyparsing 3.1.0 pypi_0 pypi
pytest-runner 5.3.0 pypi_0 pypi
python 3.9.11 h12debd9_2 defaults
python-dateutil 2.8.2 pypi_0 pypi
pytz 2023.3 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
pyzmq 25.1.0 pypi_0 pypi
readline 8.2 h5eee18b_0 defaults
realesrgan 0.2.5.0 dev_0
requests 2.28.2 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
rsa 4.9 pypi_0 pypi
s3transfer 0.4.2 pypi_0 pypi
scikit-image 0.22.0 pypi_0 pypi
scikit-learn 1.1.3 pypi_0 pypi
scipy 1.7.3 pypi_0 pypi
setuptools 67.8.0 py39h06a4308_0 defaults
six 1.16.0 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0 defaults
stack-data 0.6.2 pypi_0 pypi
sympy 1.12 pypi_0 pypi
tb-nightly 2.17.0a20240401 pypi_0 pypi
tensorboard 2.12.2 pypi_0 pypi
tensorboard-data-server 0.7.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
termcolor 2.4.0 pypi_0 pypi
threadpoolctl 3.2.0 pypi_0 pypi
tifffile 2024.2.12 pypi_0 pypi
tk 8.6.12 h1ccaba5_0 defaults
tomli 2.0.1 pypi_0 pypi
torch 1.13.1 pypi_0 pypi
torchaudio 0.13.1 pypi_0 pypi
torchvision 0.14.1 pypi_0 pypi
tornado 6.3.2 pypi_0 pypi
tqdm 4.65.0 pypi_0 pypi
traitlets 5.9.0 pypi_0 pypi
triton 2.0.0 pypi_0 pypi
typing-extensions 4.6.3 pypi_0 pypi
tzdata 2023c h04d1e81_0 defaults
urllib3 1.26.16 pypi_0 pypi
wcwidth 0.2.6 pypi_0 pypi
werkzeug 2.3.6 pypi_0 pypi
wheel 0.38.4 py39h06a4308_0 defaults
xz 5.4.2 h5eee18b_0 defaults
yacs 0.1.8 pypi_0 pypi
yapf 0.40.1 pypi_0 pypi
zipp 3.16.2 pypi_0 pypi
zlib 1.2.13 h5eee18b_0 defaults
I think that if, instead of diffusion and denoising, you used the KD constraint from your KDSR or a simpler linear + L1 constraint, the effect might be about the same. How much does diffusion actually contribute in DiffIR? Thanks!
Hi,
In DiffIR-SRGAN, there is no requirements.txt. Please check it.
Thanks!
The teacher's input includes the GT,
and the student's input is only the LR.
No stage one; online distillation.
hi, can u release the x1 real-sr model? thanks
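Thanks to the author for sharing. Looking through the code, is the basicsr folder missing? Also, will the model files for the other tasks be uploaded?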
Hello, due to limited hardware resources, I trained the deblur network in the S1 stage using two 3090 GPUs. To avoid running out of memory, I reduced the batch_size_pergpu to 2. However, during the training process, the PSNR only reached around 26 for the first 20,000 iterations, but then the loss sharply increased and the PSNR dropped to around 5. Could you please help me understand the possible reasons for this issue? I would greatly appreciate your response!
RT
Thank you for your contribution to the community.
I have a question about the sampler you employed in your work. You mentioned that you used the original DDPM sampler but set the variance to 0 to achieve better performance. However, such a setting is not theoretically equal to the original sampler. DDIM [1] is a commonly used diffusion sampler w/o variance and it can be proven that it has the same optimization target as DDPM. So, why do you choose to employ DDPM w/o variance instead of DDIM? Does this choice have any theoretical reasons? Or just an empirical choice?
Thank you again for your excellent work. Looking forward to your reply.
[1] Song, Jiaming, Chenlin Meng, and Stefano Ermon. "Denoising Diffusion Implicit Models", ICLR 2021.
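For reference, the two update rules that the question compares can be written side by side. The notation follows the DDPM and DDIM papers ($x_t$, $\alpha_t$, $\bar{\alpha}_t$, $\epsilon_\theta$) rather than DiffIR's own symbols, which is an assumption made here for readability.

```latex
% DDPM reverse step; the question refers to setting \sigma_t = 0
x_{t-1} = \frac{1}{\sqrt{\alpha_t}}
          \left( x_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t) \right)
          + \sigma_t z, \qquad z \sim \mathcal{N}(0, I)

% DDIM deterministic step (\eta = 0)
\hat{x}_0 = \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}},
\qquad
x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\,\hat{x}_0 + \sqrt{1-\bar{\alpha}_{t-1}}\,\epsilon_\theta(x_t, t)
```

With $\sigma_t = 0$ the first rule keeps the DDPM posterior mean unchanged, while the second re-derives both coefficients from $\bar{\alpha}_{t-1}$, so the two are in general different updates.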
Hello, I have read your work and was deeply inspired by it, really great work! I am anticipating the open-sourcing
I want to try finetuning the pretrained inpainting model to get better performance on my private data, but I have no idea how to load the pretrained model. I would be very grateful if you could provide some guidance.
On which dataset were deblur checkpoints trained?
Halfway through RealSR training, validation starts at iteration 5000 and then the following error is raised:
[E ProcessGroupNCCL.cpp:566] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1803936 milliseconds before timing out.
Traceback (most recent call last):
File "DiffIR/train.py", line 15, in <module>
train_pipeline(root_path)
File "/data/sjq/SR/DiffIR/DiffIR-RealSR/DiffIR/train_pipeline.py", line 185, in train_pipeline
model.optimize_parameters(current_iter)
File "/data/sjq/SR/DiffIR/DiffIR-RealSR/DiffIR/models/DiffIR_S1_model.py", line 329, in optimize_parameters
self.log_dict = self.reduce_loss_dict(loss_dict)
File "/home/ikun/miniconda3/envs/diffir/lib/python3.8/site-packages/basicsr/models/base_model.py", line 371, in reduce_loss_dict
torch.distributed.reduce(losses, dst=0)
File "/home/ikun/miniconda3/envs/diffir/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1383, in reduce
work = default_pg.reduce([tensor], opts)
RuntimeError: NCCL communicator was aborted on rank 1.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 994987) of binary: /home/ikun/miniconda3/envs/diffir/bin/python3
I am training with two 4090 GPUs. Could the timeout be caused by insufficient GPU memory?
Thank you so much for your contribution to the community! Your excellent work really benefits my research.
I have a question about the diffusion model you employed in your paper. You mentioned that your model was built to "predict noise" in Sec. 4.2, page 4, and trained with the L1 loss on Z (Eqn. 13 in your paper). However, the code implementation actually uses "x0 prediction" (https://github.com/Zj-BinXia/DiffIR/blob/3d8d677a77588666618c139b1693634fe5eb638f/DiffIR-RealSR/ldm/ddpm.py#L43C16-L43C16). Did I misunderstand anything?
Really looking forward to your reply. Thank you again.
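Under the standard forward process $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$, a noise prediction and an x0 prediction determine each other once $x_t$ and $\bar{\alpha}_t$ are known. The scalar sketch below illustrates that conversion only. It is not the repository's code, and the function names are hypothetical.

```python
import math


def x0_from_eps(x_t: float, eps: float, alpha_bar: float) -> float:
    """Recover the clean sample implied by a noise prediction."""
    return (x_t - math.sqrt(1.0 - alpha_bar) * eps) / math.sqrt(alpha_bar)


def eps_from_x0(x_t: float, x0: float, alpha_bar: float) -> float:
    """Recover the noise implied by a clean-sample prediction."""
    return (x_t - math.sqrt(alpha_bar) * x0) / math.sqrt(1.0 - alpha_bar)
```

The two parameterizations are therefore interchangeable at sampling time, although a loss on one is not the same training objective as a loss on the other, since the errors are weighted differently across timesteps.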
File "DiffIR/train.py", line 12, in <module>
train_pipeline(root_path)
File "D:\project\Super_resolution\DiffIR-SRGAN\DiffIR\train_pipeline.py", line 126, in train_pipeline
model = build_model(opt)
File "D:\Users\15822\Anaconda3\envs\pytorch\lib\site-packages\basicsr\models\__init__.py", line 26, in build_model
model = MODEL_REGISTRY.get(opt['model_type'])(opt)
File "D:\Users\15822\Anaconda3\envs\pytorch\lib\site-packages\basicsr\utils\registry.py", line 71, in get
raise KeyError(f"No object named '{name}' found in '{self._name}' registry!")
KeyError: "No object named 'DiffIRS1Model' found in 'model' registry!"
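Decorator-based registries only know about classes whose defining module has already been imported. One plausible reading of this error, which is an assumption and not a confirmed diagnosis, is that the module defining `DiffIRS1Model` was never imported before the lookup. The toy registry below illustrates the mechanism; it is a stand-in written for this example, not BasicSR's implementation.

```python
class Registry:
    """Tiny name-to-class registry (illustrative, not BasicSR's code)."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._objects = {}

    def register(self, cls):
        # Used as a decorator; runs only when the defining module is imported.
        self._objects[cls.__name__] = cls
        return cls

    def get(self, name: str):
        if name not in self._objects:
            raise KeyError(f"No object named '{name}' found in '{self._name}' registry!")
        return self._objects[name]


MODEL_REGISTRY = Registry("model")

try:
    MODEL_REGISTRY.get("DiffIRS1Model")   # nothing registered yet
    found_before = True
except KeyError:
    found_before = False


@MODEL_REGISTRY.register                  # stands in for importing the model module
class DiffIRS1Model:
    pass


found_after = MODEL_REGISTRY.get("DiffIRS1Model") is DiffIRS1Model
```

Here the lookup fails before the decorated class definition runs and succeeds afterwards, so it may be worth checking that the package containing the model file imports without errors.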
Traceback (most recent call last):
File "DiffIR/train.py", line 15, in <module>
train_pipeline(root_path)
File "/DiffIR/DiffIR-RealSR/DiffIR/train_pipeline.py", line 185, in train_pipeline
model.optimize_parameters(current_iter)
File "/DiffIR/DiffIR-RealSR/DiffIR/models/DiffIR_S2_model.py", line 408, in optimize_parameters
_, pred_IPR_list = self.net_g.module.diffusion(self.lq,S1_IPR[0])
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1207, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DiffIRS2' object has no attribute 'module'
The module attribute exists when training on multiple GPUs. Is multi-GPU training required?
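Wrappers such as `DistributedDataParallel` expose the underlying network as `.module`, whereas a bare network has no such attribute, which matches the error above. A minimal sketch of an unwrapping helper follows. `unwrap` is a hypothetical name, and the two small classes merely stand in for a plain network and a wrapper so that the example runs without PyTorch.

```python
def unwrap(net):
    """Return net.module for wrapped models, or net itself otherwise."""
    return getattr(net, "module", net)


class Bare:                      # stands in for a plain nn.Module
    diffusion = "diffusion-fn"


class Wrapped:                   # stands in for a DataParallel-style wrapper
    def __init__(self, module):
        self.module = module
```

Code written as `unwrap(self.net_g).diffusion(...)` would then behave the same on one GPU and on several. The sketch assumes the bare network has no submodule that is itself named `module`.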
the inference inference_diffif.py needs a basic folder named basicsr