Second-order Attention Network for Single Image Super-resolution (CVPR-2019)
Hi Everyone,
I am getting an AssertionError: Invalid device id when I set --n_GPUs to 2 (args.n_GPUs = 2). Can anyone help me?
The error occurs at this line (line 29) in model/__init__.py:
self.model = nn.DataParallel(self.model, range(args.n_GPUs))
I tried nvidia-smi and it shows 2 GPUs; torch.cuda.device_count() also returns 2.
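For anyone hitting the same assertion: a sketch of the check behind it (the helper name and messages are mine, not from the repo). nn.DataParallel raises "Invalid device id" when an id in device_ids is not visible to PyTorch, which can happen even though nvidia-smi shows 2 GPUs, e.g. when CUDA_VISIBLE_DEVICES hides one of them.

```python
def valid_device_ids(n_gpus, visible):
    """Return device ids [0 .. n_gpus-1] if they are all visible to PyTorch."""
    if n_gpus > visible:
        raise ValueError(
            f"requested {n_gpus} GPUs but only {visible} are visible; "
            "check CUDA_VISIBLE_DEVICES"
        )
    return list(range(n_gpus))

# In the repo this would guard line 29 of model/__init__.py, roughly:
#   ids = valid_device_ids(args.n_GPUs, torch.cuda.device_count())
#   self.model = nn.DataParallel(self.model, device_ids=ids)
```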
Hi, I've read your paper and code and found it very interesting.
One question, if SAN uses channel-wise feature extraction, is it not effective on grayscale images?
I tried to test it myself, but simply changing the code in option.py didn't work. If this model is also effective on grayscale images, could you tell me how to change the code/settings for training? If not, it's fine :)
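For what it's worth, a common workaround (not from this repo) is to replicate the single channel to three before feeding the RGB model, so the channel-wise attention still sees its expected input shape:

```python
import numpy as np

def gray_to_rgb(img):
    """Replicate an H x W grayscale image into an H x W x 3 array."""
    if img.ndim == 2:
        img = np.stack([img, img, img], axis=-1)
    return img
```

Changing --n_colors to 1 instead would require retraining, since the pretrained weights expect 3 input channels.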
Hello, I ran into some problems while reproducing the paper. First, the learning-rate schedule: the paper says the learning rate is halved every 200 epochs, but the code decays it by a factor of 0.6 every 50 epochs. Running the code as-is gives about 37.7; after changing the schedule I get about 37.9.
Second, in the paper the LSRAG has no conv layer after SOCA, but the code adds one. When I removed it, the first-epoch result was around 8.x, so I stopped the run. What is the purpose of this extra conv layer?
Finally, I also added the conv at the end of SSRG. After about 1200 epochs the result is 37.9, still some distance from the paper. Did I set a parameter wrong, or does the model need changing somewhere? I would really appreciate a reply.
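For concreteness, the two decay schedules being compared can be sketched as below (the initial learning rate of 1e-4 is my assumption; it is the usual default in this codebase family):

```python
def lr_paper(epoch, lr0=1e-4):
    # paper: halve the learning rate every 200 epochs
    return lr0 * 0.5 ** (epoch // 200)

def lr_code(epoch, lr0=1e-4):
    # code: multiply the learning rate by 0.6 every 50 epochs
    return lr0 * 0.6 ** (epoch // 50)

# By epoch 400 the code schedule has decayed to 0.6**8 ~ 0.017x of lr0,
# while the paper schedule is still at 0.25x, i.e. the code trains with
# a much smaller learning rate late in training.
```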
Thanks for your great work. From the paper I see that SAN+ improves performance by a large margin, but I don't know how to evaluate that. Hopefully you can share the tips. Thanks a lot.
Can you share your code? I am a student, asking just for study.
Hi,
Congratulations on getting some great results with SAN. I was wondering if you could put the pretrained models somewhere other than Baidu's network. I wanted to try out the model on some of my personal benchmarks, but Baidu's bloatware won't let me download them in Germany. Could you perhaps put the model on Google Drive, a model zoo, or Git Large File Storage?
It would be much appreciated given that the model is pretty big and expensive to train from scratch.
Hello,
I've been searching the codebase for a validation routine, and it seems the current framework is not using a separate validation set. Are you using the test set as the validation set?
Thanks,
Kwang
Hi, thanks for your wonderful work and for open-sourcing it.
Could you please tell me how long you trained the model, and which kind of GPU and how many GPUs you used?
Best regards
I modified MPNCOV.py based on solution #29, but I still get an out-of-memory error.
RuntimeError: CUDA out of memory. Tried to allocate 9.80 GiB (GPU 0; 14.76 GiB total capacity; 9.94 GiB already allocated; 3.89 GiB free; 160.79 MiB cached).
Is there any way to solve this issue?
Hello,
I followed the steps in the README, set up the DIV2K dataset, and set '--dir_data' to the HR and LR image path.
But when I try to train the model, a dataset directory seems to be missing: dir_data/benchmark/Set5/.
Please tell me what the benchmark folder and Set5 should look like.
Thanks
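In case it helps, a small checker for the EDSR-style layout this code family expects; the exact subfolder names (HR, LR_bicubic/X2, ...) are my assumption from related repos, not confirmed by the SAN docs:

```python
import os

def missing_benchmark_dirs(dir_data, dataset="Set5", scale=2):
    """List the expected benchmark subdirectories that are absent."""
    needed = [
        os.path.join(dir_data, "benchmark", dataset, "HR"),
        os.path.join(dir_data, "benchmark", dataset,
                     "LR_bicubic", f"X{scale}"),
    ]
    return [d for d in needed if not os.path.isdir(d)]
```

The benchmark sets (Set5, Set14, B100, Urban100) are distributed separately from DIV2K, which is why following only the two README steps leaves the folder missing.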
Hello, I get an error when training the model: FileNotFoundError: [WinError 3] The system cannot find the path specified: 'dataset\benchmark\Set5\HR'
The training instructions you give have only these two steps:
Download the DIV2K dataset (900 HR images) from the link DIV2K.
Set '--dir_data' as the HR and LR image path.
I downloaded DIV2K and set dir_data as required, but the benchmark data is still missing. Where can I download the benchmark datasets?
Could you share the test results on the Manga109 dataset?
Good job, I think.
Looking forward to your pre-trained models.
Hi,
Do you plan to add an open-source license to your project, such as MIT or Apache-2.0?
It would make it usable in other open-source projects (which is generally the case when we put some code on github). In this case, I would like to test it and reference to it in my own project focusing on open source image restoration: https://github.com/titsitits/open-image-restoration
Best regards,
Mickaël Tits
I cannot download the pre-trained weights because they are hosted on Baidu.
Baidu requires a Chinese telephone number, so people outside China cannot try SAN.
Do you plan to host the pre-trained model somewhere else?
Hi, thanks for your work.
I cannot find any information about 'LSRAG' in the code of 'san.py'.
In the code, 'SOCA' is not at the tail of NLRG; there is still one conv layer following it. Additionally, SAN consists of several NLRGs (n_resgroups).
In the paper, SAN has just one NLRG, which consists of several LSRAGs. So I think the 'NLRG' in the code is actually the 'LSRAG' in the paper. Is that right?
And if so, why is SOCA followed by a conv layer? In the paper, SOCA is at the tail of LSRAG.
I would like to know the reason for this difference. Looking forward to your reply.
It seems impossible to run this model on a single 1080 Ti GPU under the default settings in the code.
According to option.py, it should be fine to use --n_GPUs; however, it didn't work.
"RuntimeError: CUDA out of memory. Tried to allocate 324.00 MiB (GPU 0; 10.73 GiB total capacity; 9.08 GiB already allocated; 290.31 MiB free; 612.32 MiB cached)"
Training on a 2080 Ti GPU under the same settings as the training demo (--n_resgroups 20 --n_resblocks 10).
Please kindly let me know how to deal with this, if possible.
My environment is PyTorch 1.6 with CUDA 10.2 on Ubuntu.
I replaced _update_worker_pids with _set_worker_pids in code/dataloader.py, and then I got this error:
"ValueError: _set_worker_pids should be called only once for each _BaseDataLoaderIter."
Help me.
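The rename itself can be bridged with a small compatibility shim rather than editing the call sites (a sketch; the function name is mine). Note the separate problem the ValueError points at: in newer PyTorch the hook may be called only once per loader iterator, so the repo's custom MSDataLoader must also avoid calling it a second time.

```python
def resolve_worker_pids_hook(torch_c):
    """Fetch the worker-pids hook under its old or new private name."""
    for name in ("_update_worker_pids", "_set_worker_pids"):
        if hasattr(torch_c, name):
            return getattr(torch_c, name)
    raise ImportError("no worker-pids hook found in torch._C")

# Usage in code/dataloader.py would be roughly:
#   from torch import _C
#   _update_worker_pids = resolve_worker_pids_hook(_C)
```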
Hello! Could you provide the trained weights of the BDx3 model? I need them for a qualitative comparison.
Traceback (most recent call last):
File "main.py", line 19, in <module>
t.train()
File "/opt/data/private/SAN-master/TrainCode/trainer.py", line 51, in train
sr = self.model(lr, idx_scale)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/opt/data/private/SAN-master/TrainCode/model/__init__.py", line 58, in forward
return self.model(x)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/opt/data/private/SAN-master/TrainCode/model/san.py", line 515, in forward
x = self.sub_mean(x)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 301, in forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDNN_STATUS_MAPPING_ERROR
How can I solve it?
Why CUDA out of memory? Tried to allocate 8.38 GiB (GPU 0; 10.92 GiB total capacity; 8.69 GiB already allocated; 1.22 GiB free; 33.00 MiB cached)
Hi,
I'm unable to access Baidu from my country (Brazil). For some reason, when I try to download the pretrained model, it gives me a Linux client for Baidu's network rather than a direct link to the actual file.
Could you guys upload this file elsewhere, such as Google Drive?
Or is there any other way I can get it?
Thanks in advance.
I am training SAN on my own data. The loss is dropping, but the eval PSNR is dropping too.
What is the problem?
[gaoqing_592 x2] PSNR: 25.203 (Best: 29.220 @epoch 1)
Total time: 765.24s
axis shape: (10,)
axis: [ 1 2 3 4 5 6 7 8 9 10]
self.log[:, i].numpy() shape: (10,)
self.log[:, i].numpy(): [11.964835 10.344067 9.891278 9.340872 9.287792 8.970273 8.765504 8.653118 8.545278 8.612093]
At line 84 of SAN\TrainCode\model\MPNCOV\python\MPNCOV.py there is a variable der_sacleTrace.
I cannot find where this variable is defined, and I don't understand what it means. Could the author explain? Thank you.
Hi, guys.
The inaccessibility of the pre-trained models for SAN is quite a problem. Thankfully, a friend of mine in China was kind enough to download the zip file and send it to me. I have uploaded both the zip file and the extracted .pt files to Google Drive.
Feel free to download the models.
Regards
Hi daitao, thanks for your wonderful work on SAN. It gives us many ideas.
I have a question about the reduction value in the SAN code.
In option.py it is set to 16. However, when Nonlocal_CA is used in san.py (around line 500), it is set to 8.
Should it be 16 everywhere in the code, as the paper says?
Best regards.
The TrainCode folder does not contain a code folder.
I am observing that the operation at line 24 of the MPNCOV.py file under TestCode consumes a huge amount of memory to create the tensors and run the addition between them. For a 250x100 image, it consumes almost 6.98 GB of memory. Is there any way to reduce the memory consumption?
RuntimeError: CUDA out of memory. Tried to allocate 5.05 GiB (GPU 0; 15.75 GiB total capacity; 10.42 GiB already allocated; 3.68 GiB free; 50.61 MiB cached)
Training on a V100 GPU under the same settings as the training demo (--n_resgroups 20 --n_resblocks 10).
Please kindly let me know how to deal with this, if possible.
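The reported sizes are consistent with covariance pooling building M x M matrices over the M = H*W spatial positions; assuming that is what the line in question does (my inference, not verified against the file), the arithmetic for the 250x100 case works out almost exactly:

```python
h, w = 250, 100
M = h * w                       # 25,000 spatial positions
mat_gib = M * M * 4 / 2**30     # one float32 M x M matrix, in GiB
total_gib = 3 * mat_gib         # two operands plus the result of the addition
print(round(mat_gib, 2), round(total_gib, 2))  # → 2.33 6.98
```

That matches the ~6.98 GB figure reported above. The practical mitigations are the --chop flag (forward the image in patches) or testing at a smaller resolution, since the cost grows with (H*W)^2.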
Could someone who has run this successfully please share the PyTorch version they used?
I think a super-resolved 48x48 patch at scale x8 is 384x384, but it's written as 392x392 in the README.md.
Is it just a simple typo, or is there a reason?
Thank you.
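For reference, the patch-size arithmetic behind this question: an s-times SR model maps an n x n LR patch to an (s*n) x (s*n) HR patch, so a typo in the README seems likely:

```python
lr_patch, scale = 48, 8
hr_patch = lr_patch * scale
print(hr_patch)  # → 384, not the 392 written in the README
```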
When I run the code to test, I get this error:
./SAN/TestCode/code/trainer.py, line 113 in test
self.ckp.log[-1, idx_scale] = eval_acc / len(self.loader_test)
ZeroDivisionError: division by zero.
I have no idea why len(self.loader_test) is zero.
How can I fix this error?
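A test loader of length zero usually means the dataset resolved to an empty file list (wrong --testpath/--testset, or a benchmark folder laid out differently than expected). A defensive version of the averaging at that line might look like this sketch (the function name and message are mine):

```python
def mean_eval(eval_acc, n_batches):
    """Average accumulated PSNR, failing loudly on an empty test loader."""
    if n_batches == 0:
        raise RuntimeError(
            "test loader is empty: check --testpath/--testset and the "
            "benchmark directory layout"
        )
    return eval_acc / n_batches
```

This turns the opaque ZeroDivisionError into a message that points at the actual misconfiguration.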
Really good job!
Just a quick question: did you compare your results with WDSR and ESRGAN, for both PSNR and perceptual loss?
Thanks!
Hello,
Thank you so much for the excellent contribution! I am attempting to upscale a standard 720p image using SAN. I am using the 3x model and have tried the following GPUs.
The model runs out of memory (OOM) on all of them. I'm at a loss: how should I proceed in order to run inference?
Thank you.
1. SOCA looks like a much stronger version of CA. Could you run an experiment replacing all the CA modules in RCAN with SOCA, to show that SOCA is stronger than CA?
2. Both the large group (SSRG) and the small group (LSRAG) are stacks, but their structures differ slightly. Could you explain why they are designed differently?
(1) The large group ends with a 3x3 convolution, while the small group does not.
(2) The small group ends with SOCA, while the large group begins and ends with RL-NL.
(3) The large group has a residual connection to its input, while the small group does not.
3. Why is the residual gamma in SSRG a single shared parameter, rather than a different gamma for each small group?
4. Why do the non-local modules before and after SSRG share the same weights, instead of being two separate modules?
Hi @daitao ,
Excellent job!
Have you ever tried testing with 1920x1080 images, e.g. upscaling 1920x1080 by x2 to 3840x2160?
Thanks!
I found there is no 8x pre-trained model in the pretrained_model.zip file; I wonder when it will be released.
Hi, @daitao
Sorry to disturb you, but I think you have made an error in your SAN paper.
RCAN has fewer parameters than SAN: RCAN contains 15.44M while SAN contains 15.7M.
(This value is reported by RCAN's author.)
Because of this wrong conclusion, subsequent work has repeated the same mistake, which has completely deviated from the correct track.
Hi,
In the paper, it is stated that 8 LR colour patches of size 48x48 are used for training. However, in the default settings, the mini-batch size is 16. What settings need to be used to match the results in the paper?
When I reduced the batch size to 10 due to GPU memory limitations, the PSNR on Set5 x2 was about 37.8 dB and stopped improving after 590 epochs. This deviates from the results in the paper.
I am aware that a few questions have been asked about the batch size, but I couldn't find an answer to it. Does anyone have any information about this? Thanks.
Hello, thank you for your wonderful work.
As a beginner, I find that TrainCode and TestCode are mostly similar; at first glance they look the same. Could you please tell me the main difference?
Specs:
OS: Ubuntu 18.04
PyTorch: 1.3.1
Python: 3.6.9
Command:
python3 main.py --model san --data_test MyImage --save save_name --scale 4 --n_resgroups 20 --n_resblocks 10 --n_feats 64 --reset --chop --save_results --test_only --testpath 'your path' --testset Set5 --pre_train ../model/SAN_BIX4.pt
Output:
File "main.py", line 4, in <module>
import data
File "<some_directory_containing_SAN>/SAN/TestCode/code/data/__init__.py", line 3, in <module>
from dataloader import MSDataLoader
File "<some_directory_containing_SAN>/SAN/TestCode/code/dataloader.py", line 10, in <module>
from torch._C import _set_worker_signal_handlers, _update_worker_pids, \
ImportError: cannot import name '_update_worker_pids'
Problem
Upon running the specified command, the provided output is returned. I'm confused as to how everyone else has been able to run the network.
Any help would be sincerely appreciated.
Hi, Dai Tao.
I ran the code in TrainSAN_script.sh and the README (both for scale=4) to train the model, but it only achieved a PSNR of about 31.67 on Set5 (it doesn't rise after about 700 epochs). What went wrong, or what did I miss?
Thanks.