
algolzw / daclip-uir


[ICLR 2024] Controlling Vision-Language Models for Universal Image Restoration. 5th place in the NTIRE 2024 Restore Any Image Model in the Wild Challenge.

Home Page: https://algolzw.github.io/daclip-uir

License: MIT License

Shell 0.28% Python 99.67% Makefile 0.05%
diffusion-models image-restoration prompt vision-language face-inpainting image-deblurring image-dehazing image-denoising image-deraining image-desnowing

daclip-uir's Issues

Cannot install on WSL2

On Windows WSL2, I followed these instructions: https://github.com/Algolzw/daclip-uir#install
The first three steps work as expected:

python3 -m venv .env
source .env/bin/activate
pip install -U pip

When I run the fourth step, pip install -r requirements.txt, I get this error:
ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'

How to use latest robust model

Hi, thank you very much for your research; it is very interesting.
I was wondering how to use the latest model, as the old code doesn't work with it:

checkpoint = 'pretrained/daclip_ViT-B-32_mix.pt'
model, preprocess = open_clip.create_model_from_pretrained('daclip_ViT-B-32', pretrained=checkpoint)

It cannot load the model; an error is raised.
Thank you!
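For anyone hitting the same failure, a hedged diagnostic sketch (not an official fix) that can help narrow down why the mix checkpoint won't load: inspect the checkpoint contents directly. The path is the one from the snippet above; everything else is generic PyTorch.

# Hedged diagnostic: check whether the checkpoint is a plain state_dict or a
# wrapper dict, and whether the parameter names/shapes look like what the
# 'daclip_ViT-B-32' config expects.
import torch

ckpt = torch.load("pretrained/daclip_ViT-B-32_mix.pt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt

# Print a handful of parameter names and shapes; unexpected prefixes such as
# "module." or missing "visual.*" keys usually explain why loading fails.
for name, tensor in list(state_dict.items())[:10]:
    print(name, tuple(getattr(tensor, "shape", ())))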

how to use cpu/mps?

Hey there! I'm on an Apple M2 Mac.
How can I use the MPS/CPU device only?

When I run the example, I get this error:

raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
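A minimal sketch of one way to run the README example on Apple silicon, assuming the README's checkpoint path and the repo's bundled open_clip fork (which registers the 'daclip_ViT-B-32' config); any hard-coded .cuda() calls in the scripts would also need to become .to(device).

import torch
import open_clip

# Prefer MPS on Apple silicon, otherwise fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

checkpoint = "pretrained/daclip_ViT-B-32.pt"  # path is an assumption
model, preprocess = open_clip.create_model_from_pretrained(
    "daclip_ViT-B-32", pretrained=checkpoint)
model = model.to(device).eval()

# Input tensors must be moved to the same device before calling the model.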

Dependencies / Requirements / Windows

Hi... First of all, fantastic work and impressive results! The Google Colab Gradio demo is very easy to use, thank you!

Since I'm on Windows, I'd be very grateful to know whether you have any plans to slim down the dependencies a bit. It's easy enough to install the CUDA toolkit separately, drop triton, and remove a few version pins, but a leaner requirements file would be very welcome!

Not able to train daclip on my dataset

This is outstanding work!!
When I use 256×256×3 images to train DA-CLIP, the following error occurs:

File "main.py", line 495, in
main(sys.argv[1:])
File "main.py", line 423, in main
train_one_epoch(model, data, loss, epoch, optimizer, scaler, scheduler, dist_model, args, tb_writer=writer)
File "/data_160TB/2022/panxudong/code/daclip-uir-main/da-clip/src/training/train.py", line 106, in train_one_epoch
losses = loss(**model_out, output_dict=True)
File "/data_160TB/2022/panxudong/.conda/envs/py8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/data_160TB/2022/panxudong/code/daclip-uir-main/da-clip/src/open_clip/loss.py", line 190, in forward
clip_loss = super().forward(image_features, text_features, logit_scale)
File "/data_160TB/2022/panxudong/code/daclip-uir-main/da-clip/src/open_clip/loss.py", line 122, in forward
logits_per_image, logits_per_text = self.get_logits(image_features, text_features, logit_scale)
File "/data_160TB/2022/panxudong/code/daclip-uir-main/da-clip/src/open_clip/loss.py", line 115, in get_logits
logits_per_image = logit_scale * image_features @ text_features.T
RuntimeError: The size of tensor a (4) must match the size of tensor b (512) at non-singleton dimension 1
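For reference, a small self-contained sketch of the shape contract behind the failing line: both feature tensors must be [batch, embed_dim] with the same embed_dim (512 for ViT-B-32), so the first thing to check is the shape of each entry in model_out, since one side here has size 4 where the other has 512. The sizes below are illustrative.

# Illustrative only: the CLIP loss computes
#   logits_per_image = logit_scale * image_features @ text_features.T
# which assumes both feature tensors share the embedding dimension.
import torch

logit_scale = torch.tensor(100.0)
image_features = torch.randn(8, 512)   # [batch, embed_dim]
text_features = torch.randn(8, 512)    # [batch, embed_dim]

logits_per_image = logit_scale * image_features @ text_features.T
print(logits_per_image.shape)          # torch.Size([8, 8])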

size mismatch between the provided pretrained model and the current model

raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(

RuntimeError: Error(s) in loading state_dict for ConditionalUNet:
size mismatch for downs.3.3.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for ups.0.0.mlp.1.weight: copying a param with shape torch.Size([1024, 256]) from checkpoint, the shape in current model is torch.Size([512, 256]).
size mismatch for ups.0.0.mlp.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for ups.0.0.block1.proj.weight: copying a param with shape torch.Size([512, 768, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
size mismatch for ups.0.0.block2.proj.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for ups.0.0.res_conv.weight: copying a param with shape torch.Size([512, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
size mismatch for ups.0.1.mlp.1.weight: copying a param with shape torch.Size([1024, 256]) from checkpoint, the shape in current model is torch.Size([512, 256]).
size mismatch for ups.0.1.mlp.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for ups.0.1.block1.proj.weight: copying a param with shape torch.Size([512, 768, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
size mismatch for ups.0.1.block2.proj.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for ups.0.1.res_conv.weight: copying a param with shape torch.Size([512, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
size mismatch for ups.0.2.fn.fn.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for ups.0.2.fn.fn.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for ups.0.2.fn.fn.proj_in.weight: copying a param with shape torch.Size([512, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 1, 1]).
size mismatch for ups.0.2.fn.fn.proj_in.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for ups.0.2.fn.fn.transformer_blocks.0.attn1.to_q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
size mismatch for ups.0.2.fn.fn.transformer_blocks.0.attn1.to_k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
size mismatch for ups.0.2.fn.fn.transformer_blocks.0.attn1.to_v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
size mismatch for ups.0.2.fn.fn.transformer_blocks.0.attn1.to_out.0.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
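A hedged helper sketch (not an official fix) that makes this kind of mismatch explicit by comparing the checkpoint against the model you actually constructed; the model construction and checkpoint path in the usage comment are placeholders for your own setup.

import torch
from torch import nn

def report_shape_mismatches(model: nn.Module, ckpt_path: str) -> None:
    # Compare parameter shapes between a checkpoint and a constructed model.
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
    model_state = model.state_dict()
    for key, tensor in state_dict.items():
        if key in model_state and model_state[key].shape != tensor.shape:
            print(key, tuple(tensor.shape), "->", tuple(model_state[key].shape))

# Usage (placeholders): report_shape_mismatches(my_conditional_unet, "path/to/model.pth")
# Many 512-vs-256 mismatches, as in the log above, usually mean the channel
# configuration in the YAML differs from the one used for the released checkpoint.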

Questions about combining NAFNet with DA-CLIP, and about the paper

Hello, I have read the paper and have some questions about how NAFNet is combined with DA-CLIP, and about the paper's content:
1. Following the answers in earlier issues, I found codes/config/deraining/models/modules/DenoisingNAFNet_arch.py, but I don't see where the ConditionalNAFNet class uses image_context or degra_context, so I still don't understand how they are combined.
2. For the experiments on specific restoration tasks in the paper, did you use the prompt embedding module, or only cross-attention?

Thank you very much!

Error when generating the csv file

(daclip) root@autodl-container-fe1f429743-4d498b12:~# python /root/autodl-fs/daclip-uir-main/scripts/generate_captions.py
Loading caption model blip-large...
Loading CLIP model ViT-L-14/openai...
Traceback (most recent call last):
File "/root/autodl-fs/daclip-uir-main/scripts/generate_captions.py", line 72, in
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai",
File "/root/miniconda3/envs/daclip/lib/python3.10/site-packages/clip_interrogator/clip_interrogator.py", line 72, in init
self.load_clip_model()
File "/root/miniconda3/envs/daclip/lib/python3.10/site-packages/clip_interrogator/clip_interrogator.py", line 106, in load_clip_model
self.clip_model, _, self.clip_preprocess = open_clip.create_model_and_transforms(
File "/root/miniconda3/envs/daclip/lib/python3.10/site-packages/open_clip/factory.py", line 384, in create_model_and_transforms
model = create_model(
File "/root/miniconda3/envs/daclip/lib/python3.10/site-packages/open_clip/factory.py", line 290, in create_model
load_checkpoint(model, checkpoint_path)
File "/root/miniconda3/envs/daclip/lib/python3.10/site-packages/open_clip/factory.py", line 161, in load_checkpoint
incompatible_keys = model.load_state_dict(state_dict, strict=strict)
File "/root/miniconda3/envs/daclip/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CLIP:
(This error occurs while generating the csv file.)

Cannot Get Decent Results in Unseen Images

Hi,

I am testing unseen images with the pretrained model daclip_ViT-B-32.pt, and it seems the model is not working well; there are no or only very slight changes in the images. As an example, here are the original image and the output image, respectively.
[original and restored example images were attached]

Is there a problem with the pretrained model, or is this normal behavior on unseen images due to the limitations of its training dataset?

Thanks.

Dimension of the image embedding produced by the CLIP model

Hello, as far as I understand, the embedding obtained from the CLIP encoder has shape [1, 512]. How do you turn it into a (c, h, w) shaped tensor and fuse it into the diffusion model? Thank you for your answer.
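For context, a hedged illustration (the layer sizes are assumptions, not the repo's modules) of how a flat [1, 512] embedding is commonly injected without ever reshaping it to (c, h, w): it becomes the key/value context of a cross-attention layer over the flattened feature map, mirroring the x = attn(x, context=image_context) pattern quoted in a later issue.

import torch
from torch import nn

class CrossAttnCondition(nn.Module):
    def __init__(self, channels: int, context_dim: int = 512, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, kdim=context_dim,
                                          vdim=context_dim, batch_first=True)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)      # [B, H*W, C]: one query per pixel
        ctx = context.unsqueeze(1)            # [B, 1, 512]: a single context token
        out, _ = self.attn(q, ctx, ctx)       # every pixel attends to the embedding
        return (q + out).transpose(1, 2).reshape(b, c, h, w)

feat = torch.randn(2, 256, 16, 16)            # a diffusion-UNet feature map
clip_emb = torch.randn(2, 512)                # [B, 512] CLIP image embedding
print(CrossAttnCondition(256)(feat, clip_emb).shape)   # torch.Size([2, 256, 16, 16])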

How to use trained parameters for testing?

I am using Ubuntu 22.04

I want to train this network on my own images.
After training (e.g. running train.py), I get two files, latest_EMA.pth and latest_G.pth. How do I use them for testing? I cannot figure out how to load these two files from test.py. Could you please give a short tutorial on loading the parameters? Thank you!
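A hedged sketch of loading such a checkpoint by hand (build_restoration_model is a placeholder for however you construct the network; the file names are the ones mentioned above). The options YAML that test.py reads normally also has a pretrained-model path entry, and pointing that at latest_G.pth or latest_EMA.pth is the more idiomatic route.

import torch

model = build_restoration_model(opt)                        # placeholder constructor
state = torch.load("latest_EMA.pth", map_location="cpu")    # or latest_G.pth
state = state.get("state_dict", state) if isinstance(state, dict) else state
model.load_state_dict(state)
model.eval()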

A question about the paper

Thank you for your excellent work. I have a question about the content of your paper: what does "gradient flow" (in Fig. 2a) mean, and what does it do? I can't find its meaning explained elsewhere in the paper.

About uncompleted data

The uncompleted dataset contains 30,000 images. Which of them are used for training and which for testing? Is there a detailed list of IDs?

About generalizing to unseen degradation type?

Thanks for your impressive work! In the appendix, the degradation types shown in Table 5 and Figure 17 closely resemble the trained degradations. Have you considered applying the DA-CLIP model to a broader range of tasks to test its generalization?

GPU requirements

Thanks for sharing the code of this excellent work.

To train and test this model, what is the minimum level of GPU required? Can it run on a card like the 3090? Thank you!

Not able to test on an image

I get this error when I run the code given in the README file for testing on an image:

Traceback (most recent call last):
File "run.py", line 6, in
model, preprocess = open_clip.create_model_from_pretrained('daclip_ViT-B-32', pretrained=checkpoint)
File "/home/vinayak/.local/lib/python3.8/site-packages/open_clip/factory.py", line 437, in create_model_from_pretrained
model = create_model(
File "/home/vinayak/.local/lib/python3.8/site-packages/open_clip/factory.py", line 215, in create_model
raise RuntimeError(f'Model config for {model_name} not found.')
RuntimeError: Model config for daclip_ViT-B-32 not found.

Please let us know what needs to be done.
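A hedged diagnostic sketch: 'daclip_ViT-B-32' is not part of the stock open_clip package, so this error usually means Python is importing the pip-installed open_clip instead of the modified one bundled in the repo's da-clip directory.

import open_clip

print(open_clip.__file__)        # which open_clip installation is being used?
print([m for m in open_clip.list_models() if "daclip" in m.lower()])

# If the list is empty, make sure the repo's da-clip package is installed (or on
# sys.path) ahead of the standard open_clip distribution.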

bugs in daclip-uir/universal-image-restoration/config/wild-ir/test.py ?

change

# clip_model, _preprocess = clip.load("ViT-B/32", device=device)
if opt['path']['daclip'] is not None:
    clip_model, preprocess = open_clip.create_model_from_pretrained('daclip_ViT-B-32', pretrained=opt['path']['daclip'])
else:
    clip_model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32', pretrained='laion2b_s34b_b79k')
tokenizer = open_clip.get_tokenizer('ViT-B-32')
clip_model = clip_model.to(device)

to

# clip_model, _preprocess = clip.load("ViT-B/32", device=device)
if opt['path']['daclip'] is not None:
    clip_model, preprocess = open_clip.create_model_from_pretrained('daclip_ViT-L-14', pretrained=opt['path']['daclip'])
else:
    clip_model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32', pretrained='laion2b_s34b_b79k')
tokenizer = open_clip.get_tokenizer('ViT-L-14')
clip_model = clip_model.to(device)
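As an aside, a hedged refactor sketch (the clip_name option key is hypothetical, not an existing option) showing how the hard-coded names could be read from the options file so that wild-ir (ViT-L-14) and the default setup (ViT-B-32) share one test.py:

clip_name = opt['path'].get('clip_name', 'ViT-B-32')   # hypothetical option key
if opt['path']['daclip'] is not None:
    clip_model, preprocess = open_clip.create_model_from_pretrained(
        f'daclip_{clip_name}', pretrained=opt['path']['daclip'])
else:
    clip_model, _, preprocess = open_clip.create_model_and_transforms(
        'ViT-B-32', pretrained='laion2b_s34b_b79k')
tokenizer = open_clip.get_tokenizer(clip_name)
clip_model = clip_model.to(device)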

Degra_loss during training for customized datasets

Hi! First of all, many thanks for your outstanding and inspiring work!

When I tried to train DA-CLIP on my own datasets, the Contrastive_loss decreased normally and converged to ~0.15. However, the Degra_loss decreased only during the first two epochs and then stayed around ~5.7, which is relatively large compared to the Contrastive_loss.

Did this also happen during your training runs? I would appreciate it if you could share your loss curves so that I can compare them with mine.

task_name

Hello! What does task_name mean, and how is it obtained, in print(f"Task: {task_name}: {index} - {text_probs[0][index]}")?
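A hedged sketch of what that print is doing: text_probs is a softmax over similarity scores between the image's degradation embedding and a set of degradation prompts, index selects the most likely one, and task_name (defined elsewhere, presumably the known degradation of the example image) is printed alongside for comparison. The features below are random stand-ins and the prompt list is only roughly the repo's.

import torch

degradations = ['motion-blurry', 'hazy', 'jpeg-compressed', 'low-light', 'noisy',
                'raindrop', 'rainy', 'shadowed', 'snowy', 'uncompleted']

degra_features = torch.nn.functional.normalize(torch.randn(1, 512), dim=-1)
text_features = torch.nn.functional.normalize(torch.randn(len(degradations), 512), dim=-1)

text_probs = (100.0 * degra_features @ text_features.T).softmax(dim=-1)   # [1, 10]
index = torch.argmax(text_probs[0]).item()

task_name = "rainy"   # assumption: the ground-truth degradation of the test image
print(f"Task: {task_name}: {index} - {text_probs[0][index]}")
print("predicted degradation:", degradations[index])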

About training

Thanks to the authors for this excellent work. I ran into a problem while reproducing it: during the validation stage of training, the GPU runs out of memory. Can this be avoided by adjusting some parameters?
[screenshot attached]

Some questions about your paper

Hi, sorry for the disturbance again. A few points were still ambiguous to me after reading your paper:

  1. Do you train the CLIP controller and the restoration model separately, or at the same time?
  2. I saw you introduce a learnable prompt at this line, which is smart. However, I notice you incorporate prompt_embedding via t = t + prompt_embedding; why do you inject the degradation type into the time step instead of using cross-attention, e.g. x = attn(x, context=image_context)? (A small illustration of this time-step style follows below.)
  3. Since NAFNet has no time step, how did you integrate prompt_embedding into NAFNet?

These are things I couldn't find answered in your paper (or perhaps I missed them); sorry for interrupting you. Thank you for your great work.
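A hedged illustration of the t = t + prompt_embedding style from point 2 (dimensions are made up; this is not the repo's code). The cross-attention alternative is sketched under the earlier issue about the CLIP embedding dimension.

import torch
from torch import nn

time_embed_dim = 256
prompt_proj = nn.Linear(512, time_embed_dim)   # CLIP-sized embedding -> time-embed dim

t_emb = torch.randn(2, time_embed_dim)         # the usual diffusion time embedding
degra_emb = torch.randn(2, 512)                # degradation embedding from DA-CLIP

t_cond = t_emb + prompt_proj(degra_emb)        # the "t = t + prompt_embedding" step
print(t_cond.shape)                            # torch.Size([2, 256])

# Because the conditioned time embedding feeds each residual block's MLP, the
# degradation information reaches every block without extra attention layers.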

snowy train set selection.

Hello, in order to reproduce the training process, I would like to ask how you selected the 1,872 training images and 601 test images from the 100k snowy pictures. It would be great if you could provide a sample list. Thank you again for sharing your excellent research results.

best wishes

Testing an image

Hi, how should I open the URL address? The output is:
(DA-clip) root@autodl-container-4d6411b93c-2cb1d3b0:~/autodl-tmp/daclip-uir-main/daclip-uir-main/universal-image-restoration/config/daclip-sde# python app.py
export CUDA_VISIBLE_DEVICES=0
OrderedDict([('name', 'Test'), ('mode', 'LQGT'), ('dataroot_GT', 'datasets/universal/deg_type/GT'), ('dataroot_LQ', 'datasets/universal/deg_type/LQ')])
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
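A hedged note, assuming app.py builds a Gradio interface object (called demo below as a placeholder): enabling share=True in launch() prints a public link you can open from your own browser, which is the usual workaround when http://127.0.0.1:7860 only exists inside the remote container.

demo.launch(share=True)   # prints a public *.gradio.live URL

# Alternatively, keep share=False and forward the port to your own machine:
#   ssh -L 7860:127.0.0.1:7860 <user>@<remote-host>
# then open http://127.0.0.1:7860 in your local browser.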

Rain100H dataset

Hi, would you please check the link for the Rain100H dataset? I checked it, and the link appears to be broken.

Hosting models on Hugging Face Hub

Hi there! Thanks for sharing the code of DA-CLIP!

Would you be interested in hosting the models on the Hugging Face Hub (hf.co/models)? The current model weights are on Google Drive, which is hard for users to discover externally. By hosting the models on the Hub, you can document them with model cards, get download stats, and enable programmatic access so users don't have to download the weights manually. That would also simplify training and evaluation. Here is a guide in case you're interested.

Cannot Generate Clean Captions With BLIP

AssertionError: /content/drive/MyDrive/datasets/universal/train/uncompleted/GT is not a valid directory
There doesn't seem to be a GT directory in the dataset you provided; what should I do?

How do you integrate DA-CLIP into NAFNet?

Hi, I really appreciate your open-source work; it gives us great insights.

I noticed that page 7 of your paper mentions: "Moreover, we further integrate DA-CLIP into an MSE-based network NAFNet (Chen et al., 2022) as a variant of our method." May I know how you implemented this, since there are no cross-attention modules in NAFNet?

Thank you so much, and I wish you a bright future on your research path.

Seems to be working well only on test images

I'm using Google Colab with Gradio to test the project, and it seems to do almost nothing on my custom images (I tried different degradations in my examples); it only adds some noise. It works very well only on the test images provided with the project. Is there possibly a mistake in the app.py script, or anywhere else in the pipeline, when testing with custom images? Or is the pretrained model just that limited?

How can I train with my own grayscale dataset? 🥲

My dataset consists of (1, 256, 256) grayscale images. When I train DA-CLIP on it, the following error is reported: 🥲🥲🥲

I changed the in_channels of conv1 from 3 to 1 at the path shown in the attached screenshots.

As a result, the following error is reported:
RuntimeError: Error(s) in loading state_dict for CLIP: size mismatch for visual.conv1.weight: copying a param with shape torch.Size([768, 3, 32, 32]) from checkpoint, the shape in current model is torch.Size([768, 1, 32, 32]).

How can I solve this?
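A hedged workaround sketch (an assumption, not the authors' recommendation): adapt the pretrained 3-channel patch-embedding weights to 1 channel by summing over the RGB dimension before loading. The checkpoint path is illustrative, and model stands for the modified CLIP with conv1 in_channels=1.

import torch

state_dict = torch.load("pretrained/daclip_ViT-B-32.pt", map_location="cpu")
state_dict = state_dict.get("state_dict", state_dict) if isinstance(state_dict, dict) else state_dict

w = state_dict["visual.conv1.weight"]                           # [768, 3, 32, 32], per the error above
state_dict["visual.conv1.weight"] = w.sum(dim=1, keepdim=True)  # -> [768, 1, 32, 32]

missing, unexpected = model.load_state_dict(state_dict, strict=False)  # model: placeholder
print("missing:", missing, "unexpected:", unexpected)

# A simpler alternative is to keep conv1 at 3 channels and replicate the
# grayscale input to 3 channels before feeding it to the model.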

about Snowy dataset

Downloading from the link for the Snowy train dataset, there are 50,000 images under the gt folder. Where can I get the subset of 1,872 images used for training?

Question about Degradation Type Classification

Thanks for the awesome work! It's really cool.
I was trying to figure out how to do the classification as described, but I couldn't find any specific details in the paper. Could you help me out with some info or point me to where I can learn more about it?
Thank you very much for your help!
