yulunzhang / rcan

PyTorch code for our ECCV 2018 paper "Image Super-Resolution Using Very Deep Residual Channel Attention Networks"

MATLAB 8.08% Shell 2.59% Python 88.81% Dockerfile 0.52%

rcan's Issues

Custom input preprocessed

Good afternoon!

Thanks for sharing the source code and experimental results.
What changes to the source code are needed so that the output image has the same size as the input image?
In that case, could I train and test your network for inpainting?

Could you kindly provide the learning curves?

Thank you for your great work!
As I don't have enough GPUs to train the model for 1000 epochs, I train for far fewer epochs. I want to check whether my training has reached a plateau, so it would help to compare against your learning curves.
Would you kindly provide the learning curves (loss vs. epochs, and test PSNR on DIV2K validation vs. epochs, which the released code generates automatically) for training your RCAN model? If not all of them, x2 BI is enough for me. Thank you very much! :)

How can I get X8 LR bicubic images?

Hello, how can I get X8 LR bicubic images, since the DIV2K dataset only contains x2, x3 and x4 LR images?
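
One possible approach, offered as a sketch rather than the authors' pipeline: downsample the HR images yourself with bicubic interpolation. The example below uses Pillow; the helper name `make_lr_bicubic` and the stand-in image size are illustrative assumptions. Pillow's bicubic resize is not guaranteed to match MATLAB's `imresize`, so PSNR values may differ slightly from results obtained with MATLAB-generated LR images.

```python
from PIL import Image


def make_lr_bicubic(hr: Image.Image, scale: int = 8) -> Image.Image:
    """Crop HR so both sides are divisible by `scale`, then bicubic-downsample."""
    w, h = hr.size
    w, h = w - w % scale, h - h % scale
    hr = hr.crop((0, 0, w, h))
    return hr.resize((w // scale, h // scale), Image.BICUBIC)


# Stand-in for a real HR image (hypothetical size); use Image.open(path) in practice.
hr = Image.new("RGB", (2040, 1356))
lr = make_lr_bicubic(hr, scale=8)
print(lr.size)
```

The crop keeps the HR and LR grids aligned, so an x8 output has exactly the cropped HR size.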

About the test

Hi, sorry to bother you. Which file did you use to get the final results, model_best.pt or model_latest.pt? When I test the images, the quantitative results of model_best are better than those of the other one. I look forward to your reply, thanks.

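How many days does it take to train for 1000 epochs?

Hello, and thank you for providing the source code.
I am trying to retrain the x4 model, but a single epoch takes a long time on one 1080 Ti. Roughly how long does retraining for 1000 epochs take? Thanks.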

one question about the activation function

Thank you for your impressive work and for sharing the code so quickly. I have read your RDN and RCAN papers, and I am a little confused about your choice of activation function. Why did you always choose ReLU rather than RReLU or PReLU? ReLU may produce many dead neurons, right?

Resuming training

Hi,
I had a technical problem during training and it got stuck after 800 epochs.
Is there a way to restart with the exact same parameters and have it continue plotting the log?

I tried '--resume -1' which loads the model but not the parameters and doesn't continue the log.
Is there an automatic way to do this?

Thank you

PSNR not improving by using pre-trained models

The output I get after running the test script is this:
(deeplearning) administrator@administrator-System-Product-Name:~/Desktop/Projects/RCAN/RCAN_TestCode/code$ python main.py --data_test MyImage --scale 3 --model RCAN --n_resgroups 10 --n_resblocks 20 --n_feats 64 --pre_train ../model/RCAN_BIX3.pt --test_only --save_results --chop --save 'RCAN' --testpath /home/administrator/Desktop/Projects/RCAN/RCAN_TestCode/LR/LRBI --degradation BD --testset Set5
Making model...
Use DIV2K mean (0.4488, 0.4371, 0.4040)
Loading model from ../model/RCAN_BIX3.pt

Evaluation:
100%|█████████████████████████████████████████████| 5/5 [00:02<00:00, 1.72it/s]
[MyImage x3] PSNR: 0.000 (Best: 0.000 @epoch 1)
Total time: 2.90s, ave time: 0.58s

I have downloaded the pre-trained models. I have used Set5 data which is present in LR folder.

Training with gray images

Hi, Thank you for sharing the code. Great work!

How can I train a model with grayscale training images? Thanks.

Out of memory error

I have tried many combinations of learning rate and decay, but I still get the same error.

)
Preparing loss function:
1.000 * L1
[Epoch 1] Learning rate: 1.00e-6
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "main.py", line 19, in <module>
    t.train()
  File "/home/administrator/Desktop/Projects/RCAN/RCAN_TrainCode/code/trainer.py", line 51, in train
    sr = self.model(lr, idx_scale)
  File "/home/administrator/anaconda2/envs/deeplearning/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/administrator/Desktop/Projects/RCAN/RCAN_TrainCode/code/model/__init__.py", line 54, in forward
    return self.model(x)
  File "/home/administrator/anaconda2/envs/deeplearning/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/administrator/Desktop/Projects/RCAN/RCAN_TrainCode/code/model/rcan.py", line 110, in forward
    res = self.body(x)
  File "/home/administrator/anaconda2/envs/deeplearning/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/administrator/anaconda2/envs/deeplearning/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/administrator/anaconda2/envs/deeplearning/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/administrator/Desktop/Projects/RCAN/RCAN_TrainCode/code/model/rcan.py", line 62, in forward
    res = self.body(x)
  File "/home/administrator/anaconda2/envs/deeplearning/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/administrator/anaconda2/envs/deeplearning/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/administrator/anaconda2/envs/deeplearning/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/administrator/Desktop/Projects/RCAN/RCAN_TrainCode/code/model/rcan.py", line 44, in forward
    res = self.body(x)
  File "/home/administrator/anaconda2/envs/deeplearning/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/administrator/anaconda2/envs/deeplearning/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/administrator/anaconda2/envs/deeplearning/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/administrator/anaconda2/envs/deeplearning/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/generic/THCStorage.cu:58

RuntimeError: cuda runtime error (2) : out of memory

[Epoch 3]       Learning rate: 1.00e-4
[1200/12000]    [L1: 7.5750]    141.7+3.7s
[2400/12000]    [L1: 7.6471]    142.1+0.0s
[3600/12000]    [L1: 7.6028]    142.2+0.0s
[4800/12000]    [L1: 7.6049]    145.2+0.0s
[6000/12000]    [L1: 616.0927]  143.3+0.0s
Skip this batch 510! (Loss: 11261144.0)
THCudaCheck FAIL file=c:\programdata\miniconda3\conda-bld\pytorch_1524543037166\work\aten\src\thc\generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "main.py", line 20, in <module>
    t.train()
  File "C:\Users\motor\RCAN-master\RCAN_TrainCode\code\trainer.py", line 51, in train
    sr = self.model(lr, idx_scale)
  File "C:\Users\motor\Anaconda3\envs\TENSORFLOW\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\motor\RCAN-master\RCAN_TrainCode\code\model\__init__.py", line 54, in forward
    return self.model(x)
  File "C:\Users\motor\Anaconda3\envs\TENSORFLOW\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\motor\RCAN-master\RCAN_TrainCode\code\model\rcan.py", line 110, in forward
    res = self.body(x)
  File "C:\Users\motor\Anaconda3\envs\TENSORFLOW\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\motor\Anaconda3\envs\TENSORFLOW\lib\site-packages\torch\nn\modules\container.py", line 91, in forward
    input = module(input)
  File "C:\Users\motor\Anaconda3\envs\TENSORFLOW\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\motor\RCAN-master\RCAN_TrainCode\code\model\rcan.py", line 62, in forward
    res = self.body(x)
  File "C:\Users\motor\Anaconda3\envs\TENSORFLOW\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\motor\Anaconda3\envs\TENSORFLOW\lib\site-packages\torch\nn\modules\container.py", line 91, in forward
    input = module(input)
  File "C:\Users\motor\Anaconda3\envs\TENSORFLOW\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\motor\RCAN-master\RCAN_TrainCode\code\model\rcan.py", line 44, in forward
    res = self.body(x)
  File "C:\Users\motor\Anaconda3\envs\TENSORFLOW\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\motor\Anaconda3\envs\TENSORFLOW\lib\site-packages\torch\nn\modules\container.py", line 91, in forward
    input = module(input)
  File "C:\Users\motor\Anaconda3\envs\TENSORFLOW\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\motor\RCAN-master\RCAN_TrainCode\code\model\rcan.py", line 25, in forward
    return x * y
RuntimeError: cuda runtime error (2) : out of memory at c:\programdata\miniconda3\conda-bld\pytorch_1524543037166\work\aten\src\thc\generic/THCStorage.cu:58

command: python main.py --model RCAN --save RCAN_BDX2_G10R20P48 --scale 2 --n_resgroups 10 --n_resblocks 20 --n_feats 64 --reset --chop --save_results --print_model --patch_size 96 --dir_data (my dir) --pre_train ../experiment/RCAN_BDX2_G10R20P48/model/model_latest.pt --ext sep --batch_size 12

Environment: Windows 10, GTX 1060 6 GB, Anaconda3, Python 3.6, PyTorch 0.4.0, with LR *.npy files generated from JPEG images for JPEG artifact reduction training

Issues: Because of limited VRAM, I reduced --batch_size to 12 (even 8). However, after some training (3 to 10 epochs), I get an error saying that I don't have enough memory. Do you not have this problem in a Linux environment?

Thank you!

Training time

Hi Yulun,
Thank you for sharing your excellent work.

While answering this issue, you mentioned that your model takes 70 s on a Titan Xp to train for 100 iterations. I am using batch size 80 and 8 GPUs with exactly your code and otherwise the same parameters, but training takes much longer. Could you please tell me what the problem could be?

Run model on a single test image

Hi,
Thank you for your research!

Which script can I use to apply the pretrained model to a single image instead of the full Set5 test set?

about nn.AdaptiveAvgPool2d

You use nn.AdaptiveAvgPool2d(1) when defining CALayer in rcan.py. As far as I know, with a pool size of 1 this is no different from AvgPool2d(1) and leaves the input unchanged, so I would like to know what it does in your network. Thank you!
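
A short check may help here. In `nn.AdaptiveAvgPool2d(1)` the argument is the output size, so each channel is averaged down to a single 1x1 value (global average pooling). In `nn.AvgPool2d(1)` the argument is the kernel size, so the input passes through unchanged. The tensor shape below is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

# Arbitrary N x C x H x W input for illustration.
x = torch.arange(2 * 3 * 4 * 4, dtype=torch.float32).reshape(2, 3, 4, 4)

global_pool = nn.AdaptiveAvgPool2d(1)  # argument = output size (1x1)
kernel_one_pool = nn.AvgPool2d(1)      # argument = kernel size (1x1)

g = global_pool(x)
k = kernel_one_pool(x)

print(g.shape)  # one value per channel
print(k.shape)  # same spatial size as the input
print(torch.allclose(g, x.mean(dim=(2, 3), keepdim=True)))
print(torch.equal(k, x))
```

So the adaptive version produces the per-channel statistic that the channel attention weights are computed from.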

How do you process the input?

Good afternoon.
Thank you for sharing your excellent work and codes.
But when I run the code, I cannot find the input normalization function, and the values fed into the network are not in [-0.5, 0.5]. I am confused and wonder whether my configuration is wrong.
Looking forward to your reply.
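
A possible explanation, not confirmed by the authors in this thread: the test log prints "Use DIV2K mean (0.4488, 0.4371, 0.4040)" and the printed model starts with a `sub_mean` layer and ends with an `add_mean` layer, which suggests the mean is subtracted inside the network rather than in the data loader. The sketch below shows that idea with NumPy. The value `RGB_RANGE = 255` and the function names are assumptions for illustration.

```python
import numpy as np

DIV2K_MEAN = np.array([0.4488, 0.4371, 0.4040])  # from the test log
RGB_RANGE = 255.0  # assumption: pixel values are kept in [0, 255]


def sub_mean(img: np.ndarray) -> np.ndarray:
    """Subtract the per-channel dataset mean from an H x W x 3 image."""
    return img - RGB_RANGE * DIV2K_MEAN


def add_mean(img: np.ndarray) -> np.ndarray:
    """Undo sub_mean on the network output."""
    return img + RGB_RANGE * DIV2K_MEAN


img = np.random.default_rng(0).uniform(0, 255, size=(4, 4, 3))
shifted = sub_mean(img)
print(shifted.min(), shifted.max())
print(np.allclose(add_mean(shifted), img))
```

Under these assumptions the values seen by the first convolution are roughly centered on zero but are not scaled to [-0.5, 0.5].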

Training your dataset

Hi,

I have a folder (e.g. MyImages) of images, and I want to train the network on it.
The input images have the same size as the output images, because the input images were custom scaled.
Can you give, as an example, the necessary steps to train your network on my set of images?

Kind regards,
Ion

Hello, how did you draw the network architecture figures in your paper?

The figures I draw with Visio come out at very low quality.

Dropbox link give 404

Thanks for sharing your great work :)
I wanted to try to run it with the pre-trained model, but the dropbox link on RCAN/RCAN_TrainCode/experiment/model/Readme.md is returning a 404 status code

MyImage.__getitem__: hr always returns -1

CUDA_VISIBLE_DEVICES=0 python main.py --data_test MyImage --scale 4 --model RCAN --n_resgroups 10 --n_resblocks 20 --n_feats 64 --pre_train ../model/RCAN_BIX4.pt --test_only --save_results --chop --save 'RCAN' --testpath ~/RCAN/RCAN_TestCode/datasets --testset Urban100
I ran this command to test on the Urban100 dataset, but the PSNR is 0 because hr is always -1 and lr is the original image from Urban100. Can anyone explain?

myimage.py
def __getitem__(self, idx):
    filename = os.path.split(self.filelist[idx])[-1]
    filename, _ = os.path.splitext(filename)
    lr = misc.imread(self.filelist[idx])
    lr = common.set_channel([lr], self.args.n_colors)[0]
    return common.np2Tensor([lr], self.args.rgb_range)[0], -1, filename
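
Since the snippet above returns -1 in place of an HR image, no PSNR can be computed inside that loader. One workaround is to save the SR results and compute PSNR separately against the ground-truth images. The sketch below uses the standard definition, PSNR = 10 log10(MAX^2 / MSE). It is an assumption-laden illustration: the function name is hypothetical, and it does not crop borders or convert to the Y channel, which published SR benchmarks often do.

```python
import numpy as np


def psnr(sr: np.ndarray, hr: np.ndarray, rgb_range: float = 255.0) -> float:
    """Plain PSNR between two images of identical shape."""
    diff = sr.astype(np.float64) - hr.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(rgb_range ** 2 / mse))


# Toy data: an HR image and an "SR" result that differs in one pixel.
hr = np.full((8, 8, 3), 100, dtype=np.uint8)
sr = hr.copy()
sr[0, 0, 0] = 110
print(psnr(sr, hr))
```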

Results for small image SR

When I ran the script

python main.py --data_test MyImage --scale 4 --model RCAN --n_resgroups 10 --n_resblocks 20 --n_feats 64 --pre_train ../model/RCAN_BIX4.pt --test_only --save_results --chop --save 'RCAN' --testpath ../LR/LRBI --testset Set5

the results are very good as expected.

However, when I try my own input frame (80*45):

The 4x results become blurry:

I wonder if this is a normal output image for a small input (80*45)? Is this because the model is trained with input kind of > 128 * 128 or some other reason?

FYI, I put the input frames in the ../LR/LRBI folder, run the same script with only changing --testset to the new folder

about training problem

Hello, are you using the DIV2K dataset for training? Is there a pretrained model for inference?
I just want to check the results.

what is the difference between the SE Module and your CA Module

Hi,
What is the difference between your proposed CA Module and SE Module?
Is there a modification on SE Module for low-level visual tasks?
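
For reference while comparing the two, here is a sketch of the CA layer reconstructed from the model printout in this repository's issues (AdaptiveAvgPool2d(1), then Conv2d 64 to 4, ReLU, Conv2d 4 to 64, Sigmoid, with the result multiplied onto the input). The default `reduction=16` is inferred from the 64 to 4 channel counts and is an assumption. A typical SE block uses fully connected layers where this uses 1x1 convolutions; on a 1x1 feature map the two are mathematically equivalent. This is not a statement of the authors' view on the difference.

```python
import torch
import torch.nn as nn


class CALayer(nn.Module):
    """Channel attention sketch based on the printed module structure."""

    def __init__(self, channels: int = 64, reduction: int = 16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # global per-channel average
        self.conv_du = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),  # per-channel weights in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.avg_pool(x)
        y = self.conv_du(y)
        return x * y  # rescale each channel of x


layer = CALayer()
x = torch.randn(1, 64, 12, 12)
out = layer(x)
weights = layer.conv_du(layer.avg_pool(x))
print(out.shape, weights.shape)
```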

about resuming training

Hi, sorry to bother you. I have some problems with resuming training. First, my training stopped unexpectedly with the message 'EOFError: Ran out of input', and I don't know why. Second, I want to resume training. I tried the setting '--load RCAN_BIX2_G10R20P48 --resume -1 --n_GPUs 2' and got the following error:

Preparing loss function:
1.000 * L1
Traceback (most recent call last):
  File "/home/img/Desktop/sxd/RCAN/RCAN_TrainCode/code/main.py", line 16, in <module>
    loss = loss.Loss(args, checkpoint) if not args.test_only else None
  File "/home/img/Desktop/sxd/RCAN/RCAN_TrainCode/code/loss/__init__.py", line 67, in __init__
    if args.load != '.': self.load(ckp.dir, cpu=args.cpu)
  File "/home/img/Desktop/sxd/RCAN/RCAN_TrainCode/code/loss/__init__.py", line 140, in load
    for l in self.loss_module:
TypeError: 'DataParallel' object is not iterable

Process finished with exit code 1

Training a BD model

Hi, I want to train a BD model from scratch. How can I get the LR data?
I found the BD operation in Prepare_TestData_HR_LR.m. Is this the same code you used to obtain the LR data? And if I generate the LR data with BD, it seems I should name the folder "DIV2K_LR_bicubic" for training to work.

problems in training

Hello, thanks for sharing the source code and experimental results.
I want to train a model on my own dataset, but I ran into a problem during training. I would greatly appreciate any suggestions.

The problem is described below:
Preparing loss function:
1.000 * L1
[Epoch 1] Learning rate: 1.00e-4
Traceback (most recent call last):
  File "main.py", line 19, in <module>
    t.train()
  File "/home/weihq/superresolution1/RCAN-master/RCAN-master/RCAN_TrainCode/code/trainer.py", line 45, in train
    for batch, (lr, hr, _, idx_scale) in enumerate(self.loader_train):
  File "/home/weihq/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 286, in __next__
    return self._process_next_batch(batch)
  File "/home/weihq/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
  File "/home/weihq/superresolution1/RCAN-master/RCAN-master/RCAN_TrainCode/code/dataloader.py", line 47, in _ms_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/weihq/superresolution1/RCAN-master/RCAN-master/RCAN_TrainCode/code/dataloader.py", line 47, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/weihq/superresolution1/RCAN-master/RCAN-master/RCAN_TrainCode/code/data/srdata.py", line 90, in __getitem__
    lr, hr = self._get_patch(lr, hr)
  File "/home/weihq/superresolution1/RCAN-master/RCAN-master/RCAN_TrainCode/code/data/srdata.py", line 126, in _get_patch
    lr, hr, patch_size, scale, multi_scale=multi_scale
  File "/home/weihq/superresolution1/RCAN-master/RCAN-master/RCAN_TrainCode/code/data/common.py", line 22, in get_patch
    img_in = img_in[iy:iy + ip, ix:ix + ip, :]
IndexError: too many indices for array

unbelievable work

hi @yulunzhang
just want to thank you, for your stunning work, amazing Super-Resolution result, from my heart thank you man.

Training how to choose # of training set and # of validation set

Wonderful work! I got a question, if I want to train my own dataset, like there are 3000 images in the training set in total, the question is how to set the # of training images and # of validation images? For example, in your provided code, for the div2k dataset, --n_train = 800, and --n_val = 5. Is there any underlying reason to choose those two numbers? Thanks

saving model error

Hi,

while running your code, there is a problem with the saving model. It gives an error.

The error arises in the test section of the trainer when saving the model.

Please, can you look into the problem?

Regards,
Saeed

No module named 'data.database'

When I try to run main.py,
an error occurs: No module named 'data.Database'
What should I do to fix it?
Thank you.

blurry SR image

The SR image generated by the released model is very blurry. What is the reason?

Is it the model, or the input?

AttributeError

Hello, I am very interested in your research, but when I run the main.py script I get an error about a missing "module" attribute. How can I solve this problem?

Traceback (most recent call last):

File "", line 1, in
runfile('/home/renxue/RCAN/RCAN_TrainCode/code/main.py', wdir='/home/renxue/RCAN/RCAN_TrainCode/code')

File "/home/renxue/anaconda3/envs/3d-AAE/lib/python3.6/site-packages/spyder_kernels/customize/spydercustomize.py", line 827, in runfile
execfile(filename, namespace)

File "/home/renxue/anaconda3/envs/3d-AAE/lib/python3.6/site-packages/spyder_kernels/customize/spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "/home/renxue/RCAN/RCAN_TrainCode/code/main.py", line 17, in <module>
model = model.Model(args, checkpoint)

File "/home/renxue/RCAN/RCAN_TrainCode/code/model/__init__.py", line 35, in __init__
cpu=args.cpu

File "/home/renxue/RCAN/RCAN_TrainCode/code/model/__init__.py", line 102, in load
self.get_model().load_state_dict(

File "/home/renxue/RCAN/RCAN_TrainCode/code/model/__init__.py", line 61, in get_model
return self.model.module

File "/home/renxue/anaconda3/envs/3d-AAE/lib/python3.6/site-packages/torch/nn/modules/module.py", line 535, in __getattr__
type(self).__name__, name))

AttributeError: 'RCAN' object has no attribute 'module'

I look forward to your reply!
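
For context on this kind of error: `.module` exists only on a model wrapped in `nn.DataParallel`, so code that always reads `.module` fails on a plain model, and code that iterates over the wrapper instead of the wrapped module fails the other way round. A common defensive pattern, shown here as a sketch and not as the repository's actual fix, is to unwrap only when the wrapper is present. The helper name `unwrap` is hypothetical.

```python
import torch.nn as nn


def unwrap(model: nn.Module) -> nn.Module:
    """Return the underlying module whether or not it is wrapped in DataParallel."""
    return model.module if isinstance(model, nn.DataParallel) else model


plain = nn.Linear(4, 4)
wrapped = nn.DataParallel(plain)

print(unwrap(plain) is plain)
print(unwrap(wrapped) is plain)
```

With such a helper, loading a state dict or looping over sub-modules can go through `unwrap(model)` regardless of how many GPUs are in use.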

issue about the decrease of learning rate

Hello, the paper says: "The initial learning rate is set to 10^-4 and then decreases to half every 2 × 10^5 iterations of back-propagation." However, I do not see this change during training. Do I need to decrease the learning rate myself? Thanks.
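
The schedule quoted from the paper (start at 10^-4, halve every 2 × 10^5 iterations) can be written as a simple step function. This is a sketch of the formula only. How the released code maps iterations to epochs is not shown here, and the function and parameter names are illustrative.

```python
def learning_rate(iteration: int, base_lr: float = 1e-4, halve_every: int = 200_000) -> float:
    """Step decay: halve the learning rate after every `halve_every` iterations."""
    return base_lr * 0.5 ** (iteration // halve_every)


for it in (0, 199_999, 200_000, 400_000):
    print(it, learning_rate(it))
```

If a run stops well before 2 × 10^5 iterations, the printed learning rate would be expected to stay at its initial value under this schedule.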

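Hello, how do I add code for multi-GPU training?

Hello. This code is quite tightly integrated, so it is not clear to me where to add the DataParallel declaration for multi-GPU training.
Also, after adding it, do any parameters in option need to be added or changed?
I am a student and new to deep learning, so I would appreciate your guidance!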

Source code availability

Will the source code be available soon?

Are there two errors in the comparison images in this paper?

In Fig. 5, the last two images appear to be in error. The other images are clearly flipped vertically.

is there a way to continue training ?

Hi! I recently read your paper and code, and I want to ask you a question.
Is there an argument to continue training after 300 epochs if I feel that is not enough? If not, how can I do that? Thanks! The --load option seems to do the job, but are the checkpoints retained, and with the correct numbering?

test with own images

Thank you for your wonderful work !
Recently I have trained a model with my own datasets, but when I test it ,there seems to be something wrong I can't understand. The error is described as below:

(base) luomeilu@Ubuntu:~/CNN/RCAN-master/RCAN-master/RCAN_TestCode/code$ python main.py --data_test MyImage --scale 4 --model RCAN --n_resgroups 10 --n_resblocks 20 --n_feats 64 --pre_train ../model/RCAN_BIX4.pt --test_only --save_results --chop --save 'RCAN' --testpath ../LR/LRBI --testset Set5
Traceback (most recent call last):
  File "main.py", line 7, in <module>
    from option import args
  File "/home/luomeilu/CNN/RCAN-master/RCAN-master/RCAN_TestCode/code/option.py", line 19, in <module>
    help='random seed')
  File "/home/luomeilu/anaconda3/lib/python3.6/argparse.py", line 1338, in add_argument
    action = action_class(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'defaut'

Hope you can give me some suggestions, thanks a lot！
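
The last line of the traceback points at the keyword `defaut`, which looks like a misspelling of argparse's `default` in a local copy of option.py. The sketch below reproduces the error and shows the corrected call. The flag name `--seed` and the value `1` are assumptions; only the help text 'random seed' comes from the traceback.

```python
import argparse

# Correct spelling: 'default'.
parser = argparse.ArgumentParser()
parser.add_argument('--seed', type=int, default=1, help='random seed')
args = parser.parse_args([])
print(args.seed)

# Misspelled keyword: argparse rejects it with a TypeError.
error_message = None
try:
    argparse.ArgumentParser().add_argument('--seed', type=int, defaut=1, help='random seed')
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```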

about cropping the dataset: the mean is different

Hello, I cut the images into 48*48 patches, but I found that the mean is on the order of 10^-2. The dataset I made is not the same as yours. I can't find the details of the input processing. Can you point me to them? Thanks very much.

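Question about the network structure

After the stacked RCABs in each residual group, and after the stacked residual groups, you add a convolution layer before the residual addition. What is the purpose of this convolution layer, and can it be removed?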

2 GPU error when training

I ran this command to use 2 GPUs for training:

CUDA_VISIBLE_DEVICES=0,1 python3 main.py --n_GPUs 2 --dir_data /root/dataset/super-resolution --model RCAN --save RCAN_BIX2_G10R20P48 --scale 2 --n_resgroups 10 --n_resblocks 20 --n_feats 64  --reset --chop --save_results --print_model --patch_size 96 2>&1 | tee $LOG

but got the following error:

Unexpected end of /proc/mounts line `overlay / overlay rw,relatime,lowerdir=/data2/docker/overlay2/l/76VGRCNKB4276UVDYJIQ4K44VI:/data2/docker/overlay2/l/MM3UKJSDI6OMZYJEQHBG5K5EBU:/data2/docker/overlay2/l/3TUQTOAGEKBLNX7DPFOKXKXUD5:/data2/docker/overlay2/l/5ZHVFRGKBYJ5MGWORLVPCB67H4:/data2/docker/overlay2/l/MGTNS2XZPIFDXQLJDPBWMZHSFF:/data2/docker/overlay2/l/NBUTJL2W2ZFDXG2JAE3Y6V4M3Z:/data2/docker/overlay2/l/WZ4AKFUGVNF4YJNSHH5XQEZVAV:/data2/docker/overlay2/l/W5VI2B4IEWSZLIUN7VC2PP3LD4:/data2/docker/overlay2/l/JBVVURDZXDPD7SAEKMXLQGX2YS:/dat'
Unexpected end of /proc/mounts line `a2/docker/overlay2/l/2ISST5GDKCNKQHI3D6LITRSPPC:/data2/docker/overlay2/l/QA7MQGMCVTSS4DQ4SS7QOEGADY:/data2/docker/overlay2/l/24BA5LASJSQBJYYNQONNE7DFOA:/data2/docker/overlay2/l/RHLGBBVVMXFSFDL666UIIDLCU6:/data2/docker/overlay2/l/ZJYKOHO5XHWZVLIG3OOX4SMJMW:/data2/docker/overlay2/l/X3VORDWXFDU2Q4IZGWZE24GOF7,upperdir=/data2/docker/overlay2/581f3545fee5eef1ebdd17aea4f9e4d4b922a18a608972a6115f2bbeec32b019/diff,workdir=/data2/docker/overlay2/581f3545fee5eef1ebdd17aea4f9e4d4b922a18a608972a6115f2bbeec32b019/work '
Unexpected end of /proc/mounts line `0 0

Could you please help me to find the reason?

Training epochs

Hi, I recently read your paper, and want to ask you a question.
The paper does not state the total number of training epochs, so I looked in your code. option.py sets '--epochs 1000'. Does that mean all the models were trained for 1000 epochs?

Questions about using multiple gpu

Hi, @yulunzhang
First of all, thank you for open-sourcing your code; the reconstruction results are impressive. I have read the source code of both the EDSR project and your project. To use multiple GPUs I run the command CUDA_VISIBLE_DEVICES=0,1,2 python main.py --model RCAN --save RCAN_BIX2_G10R20P48 --scale 2 --n_resgroups 10 --n_resblocks 20 --n_feats 64 --reset --chop --save_results --print_model --patch_size 96 --ext sep_reset --n_GPUs 3, and the code runs fine. However, when I monitor GPU and memory usage with watch -n 0.1 nvidia-smi, I see something unexpected. In PyTorch, multi-GPU training copies the model to each GPU and splits each batch evenly across them, so memory usage should be the same on every GPU. In practice, the memory usage decreases from one GPU to the next. Could you explain why this happens? Perhaps I am overlooking a detail. Thank you.

Y channel training error with "--n_colors=1"

Hi, I am trying to train on DIV2K with the Y channel using the following script, but I get an error. First I converted DIV2K_train_HR and DIV2K_train_LR_bicubic from RGB to the Y channel, renamed DIV2K to DIV2K_y, and correspondingly deleted + '/DIV2K' in div2k.py so that the training path is right.

CUDA_VISIBLE_DEVICES=1 python3 main.py --n_GPUs 1
 --dir_data /root/dataset/super-resolution/DIV2K_y 
--model RCAN --save RCAN_BIX2_G10R20P48 
--scale 2 --n_resgroups 10 --n_resblocks 20 --n_feats 64  
--reset --chop --save_results --print_model --patch_size 96 
--n_colors=1 --batch_size=48 --n_threads=8 >&1 | tee $LOG

Then I set --n_colors=1, but the following error occurs:

Traceback (most recent call last):
  File "main.py", line 19, in <module>
    t.train()
  File "/root/kindlehe/project/pytorch/RCAN-master/RCAN_TrainCode/code/trainer.py", line 45, in train
    for batch, (lr, hr, _, idx_scale) in enumerate(self.loader_train):
  File "/usr/local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 336, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
  File "/root/kindlehe/project/pytorch/RCAN-master/RCAN_TrainCode/code/dataloader.py", line 47, in _ms_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/root/kindlehe/project/pytorch/RCAN-master/RCAN_TrainCode/code/dataloader.py", line 47, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/root/kindlehe/project/pytorch/RCAN-master/RCAN_TrainCode/code/data/srdata.py", line 90, in __getitem__
    lr, hr = self._get_patch(lr, hr)
  File "/root/kindlehe/project/pytorch/RCAN-master/RCAN_TrainCode/code/data/srdata.py", line 126, in _get_patch
    lr, hr, patch_size, scale, multi_scale=multi_scale
  File "/root/kindlehe/project/pytorch/RCAN-master/RCAN_TrainCode/code/data/common.py", line 23, in get_patch
    img_tar = img_tar[ty:ty + tp, tx:tx + tp, :]
IndexError: too many indices for array

Could you please give some advice about y channel training?
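
A likely cause, stated as an assumption since the data loading code is not shown in full: single-channel images are read as 2-D arrays (H x W), while the cropping line in the traceback indexes three dimensions. Adding an explicit channel axis before cropping avoids the IndexError. The helper name `ensure_hwc` is hypothetical, and where to call it in the data pipeline depends on the code version in use.

```python
import numpy as np


def ensure_hwc(img: np.ndarray) -> np.ndarray:
    """Give 2-D grayscale arrays an explicit channel axis: (H, W) -> (H, W, 1)."""
    if img.ndim == 2:
        img = np.expand_dims(img, axis=2)
    return img


y = np.zeros((96, 96), dtype=np.uint8)  # stand-in for a Y-channel image

# Three indices on a 2-D array reproduce the error from the traceback.
try:
    y[0:48, 0:48, :]
except IndexError as exc:
    print("2-D array:", exc)

patch = ensure_hwc(y)[0:48, 0:48, :]
print(patch.shape)
```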

about the training process

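Question about multi-channel input

Hello, I would like to ask how you handle input images with more than the three RGB channels. When using VGG I tried six channels, but it failed with "ValueError: 'arr' does not have a suitable array shape for any mode." How did you work around the channel limitation of scipy.misc.imsave?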

Fix for Incompatible version with pytorch 1.0

For anyone who hits an error with PyTorch 1.0 saying that _worker_manager_loop is not found: the same issue occurs in the proSR repository. You can find it here: https://github.com/fperazzi/proSR/issues/31 . That issue says to change _worker_manager_loop to _pin_memory_loop to fix the problem.

Some additional changes are needed after changing _worker_manager_loop to _pin_memory_loop.

Go to code/dataloader.py and change:
self.worker_result_queue = multiprocessing.SimpleQueue()
to
self.worker_result_queue = multiprocessing.Queue()

then change:
self.worker_manager_thread = threading.Thread(
    target=_worker_manager_loop,
    args=(self.worker_result_queue, self.data_queue, self.done_event, self.pin_memory, maybe_device_id))
self.worker_manager_thread.daemon = True
self.worker_manager_thread.start()
to
self.pin_memory_thread = threading.Thread(
    target=_pin_memory_loop,
    args=(self.worker_result_queue, self.data_queue, maybe_device_id, self.done_event))
self.pin_memory_thread.daemon = True
self.pin_memory_thread.start()

train myimage

Thank you for your work,
I have a problem: while training on my data, I always get the error "RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 96 and 68 in dimension 2 at /pytorch/aten/src/TH/generic/THTensorMath.c:3586". I use patch_size=96. To simplify debugging, n_resblocks and n_resgroups are decreased to 5 and 2. The rest of the network structure is unchanged.

The HR data is from Middlebury dataset, LR data is bicubic-sampled. (total 60 images)
This problem puzzled me for a long time. I would appreciate it if you could give me a reply.

Thanks.
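
One plausible cause, offered as a guess and not a confirmed diagnosis: if an image is smaller than the requested patch along one side, array slicing silently returns a smaller patch, and batching patches of different sizes then fails with a size mismatch like "Got 96 and 68". The sketch below shows the slicing behaviour and a simple size check. The helper name `can_crop` and the 68-pixel example height are illustrative.

```python
import numpy as np


def can_crop(img: np.ndarray, patch_size: int) -> bool:
    """True if a patch_size x patch_size crop fits inside the image."""
    h, w = img.shape[:2]
    return h >= patch_size and w >= patch_size


small = np.zeros((68, 120, 3))    # height is below the patch size
patch = small[0:96, 0:96, :]      # slicing past the edge truncates silently
print(patch.shape)

print(can_crop(small, 96))
print(can_crop(np.zeros((96, 96, 3)), 96))
```

Filtering out or padding images that fail such a check, or lowering the patch size, would keep every patch in a batch the same shape.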

<Making model...
RCAN(
(sub_mean): MeanShift(3, 3, kernel_size=(1, 1), stride=(1, 1))
(add_mean): MeanShift(3, 3, kernel_size=(1, 1), stride=(1, 1))
(head): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(body): Sequential(
(0): ResidualGroup(
(body): Sequential(
(0): RCAB(
(body): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): CALayer(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(conv_du): Sequential(
(0): Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(4, 64, kernel_size=(1, 1), stride=(1, 1))
(3): Sigmoid()
)
)
)
)
(1): RCAB(
(body): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): CALayer(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(conv_du): Sequential(
(0): Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(4, 64, kernel_size=(1, 1), stride=(1, 1))
(3): Sigmoid()
)
)
)
)
(2): RCAB(
(body): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): CALayer(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(conv_du): Sequential(
(0): Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(4, 64, kernel_size=(1, 1), stride=(1, 1))
(3): Sigmoid()
)
)
)
)
(3): RCAB(
(body): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): CALayer(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(conv_du): Sequential(
(0): Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(4, 64, kernel_size=(1, 1), stride=(1, 1))
(3): Sigmoid()
)
)
)
)
(4): RCAB(
(body): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): CALayer(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(conv_du): Sequential(
(0): Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(4, 64, kernel_size=(1, 1), stride=(1, 1))
(3): Sigmoid()
)
)
)
)
(5): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
)
(1): ResidualGroup(
(body): Sequential(
(0): RCAB(
(body): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): CALayer(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(conv_du): Sequential(
(0): Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(4, 64, kernel_size=(1, 1), stride=(1, 1))
(3): Sigmoid()
)
)
)
)
(1): RCAB(
(body): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): CALayer(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(conv_du): Sequential(
(0): Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(4, 64, kernel_size=(1, 1), stride=(1, 1))
(3): Sigmoid()
)
)
)
)
(2): RCAB(
(body): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): CALayer(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(conv_du): Sequential(
(0): Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(4, 64, kernel_size=(1, 1), stride=(1, 1))
(3): Sigmoid()
)
)
)
)
(3): RCAB(
(body): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): CALayer(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(conv_du): Sequential(
(0): Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(4, 64, kernel_size=(1, 1), stride=(1, 1))
(3): Sigmoid()
)
)
)
)
(4): RCAB(
(body): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): CALayer(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(conv_du): Sequential(
(0): Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(4, 64, kernel_size=(1, 1), stride=(1, 1))
(3): Sigmoid()
)
)
)
)
(5): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(tail): Sequential(
(0): Upsampler(
(0): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): PixelShuffle(upscale_factor=2)
)
(1): Conv2d(64, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
)
Preparing loss function:
1.000 * L1
[Epoch 1] Learning rate: 1.00e-4
[16/1980] [L1: 56.1840] 0.1+1.1s
[32/1980] [L1: 58.0140] 0.1+0.1s
[48/1980] [L1: 58.6069] 0.1+0.1s
[64/1980] [L1: 57.1261] 0.1+0.5s
[80/1980] [L1: 55.2015] 0.1+0.1s
Traceback (most recent call last):
File "main.py", line 20, in <module>
t.train()
File "/media/ybl/0A9AD66165F33762/CODE/RCAN-master/RCAN_TrainCode/code/trainer.py", line 47, in train
for batch, (lr, hr, _, idx_scale) in enumerate(self.loader_train):
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 286, in __next__
return self._process_next_batch(batch)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
File "/media/ybl/0A9AD66165F33762/CODE/RCAN-master/RCAN_TrainCode/code/dataloader.py", line 47, in _ms_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 138, in default_collate
return [default_collate(samples) for samples in transposed]
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
return [default_collate(samples) for samples in transposed]
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 115, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 96 and 68 in dimension 2 at /pytorch/aten/src/TH/generic/THTensorMath.c:3586
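One possible reading of the traceback above, which the log itself does not confirm: `default_collate` calls `torch.stack`, which requires every sample in a batch to have the same shape, and one sample came back with a spatial size of 68 instead of 96. That can happen if a crop is taken with array slicing near the edge of an image that is smaller than expected, for example an HR image that is not exactly `scale` times its LR counterpart, because slicing past the end silently returns a shorter patch. The sketch below is not code from this repository. The names `check_pair` and `cropped_length`, and the sizes in the usage lines, are hypothetical and only illustrate how to screen image pairs before training.

```python
def cropped_length(image_len, start, patch_len):
    """Length actually returned by a slice [start:start + patch_len]
    on an axis of size image_len (slicing truncates instead of failing)."""
    return max(0, min(image_len, start + patch_len) - start)


def check_pair(lr_hw, hr_hw, hr_patch, scale):
    """Return a list of problems that would prevent fixed-size crops.

    lr_hw, hr_hw: (height, width) of the LR and HR images (assumed inputs).
    hr_patch:     HR patch size in pixels (assumed to be divisible by scale).
    scale:        super-resolution factor.
    """
    lr_h, lr_w = lr_hw
    hr_h, hr_w = hr_hw
    lr_patch = hr_patch // scale
    problems = []
    if lr_h < lr_patch or lr_w < lr_patch:
        problems.append("LR image is smaller than the LR patch")
    if hr_h < lr_h * scale or hr_w < lr_w * scale:
        problems.append("HR image is smaller than scale x LR image")
    return problems


# Hypothetical sizes, chosen only to show how a 68-pixel patch could arise.
print(check_pair((100, 100), (200, 200), hr_patch=96, scale=2))
print(check_pair((100, 100), (200, 164), hr_patch=96, scale=2))
print(cropped_length(image_len=164, start=96, patch_len=96))
```

Running a check like this over every LR/HR pair in the training set would list the files to fix or exclude, if this is indeed the cause.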