tengshaofeng / ResidualAttentionNetwork-pytorch

A PyTorch implementation of the Residual Attention Network. This code is based on two projects from
What versions of torch, torchvision, and Python are required? Can anyone explain?
Thanks for your work! I have a question about the expression of mixed attention. Is conv->relu->conv->sigmoid able to represent it?
Hello, how can I output the attention maps?
I ran your code and hit the error below:
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm). (message repeated four times)
Traceback (most recent call last):
  File "train_pre.py", line 52, in <module>
    for i, (images, labels) in enumerate(train_loader):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 275, in __next__
    idx, batch = self._get_batch()
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 254, in _get_batch
    return self.data_queue.get()
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 343, in get
    res = self._reader.recv_bytes()
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 175, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 50) is killed by signal: Bus error.
The environment is Python 3.5, TensorFlow 1.0.1, and PyTorch 0.3.1. I have searched for solutions, and I think this may be caused by version conflicts.
Can you tell us your environment, or offer any other suggestions? Thanks.
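This bus error typically means the DataLoader's worker processes ran out of shared memory (common inside Docker containers with a small default /dev/shm). A minimal sketch of the usual workaround, using synthetic stand-in data rather than the repo's CIFAR pipeline:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the CIFAR loader in train_pre.py (hypothetical data).
dataset = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))

# Workaround: num_workers=0 loads data in the main process, so no
# shared-memory segments are needed at all.
train_loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=0)

for images, labels in train_loader:
    pass  # training step would go here
```

Alternatively, keep num_workers > 0 and enlarge shared memory, e.g. start the container with a larger --shm-size.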
I followed the instructions and ran CUDA_VISIBLE_DEVICES=0 python train.py, but I get:
TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:
What is wrong with this code?
Topic closed.
Hi,
Can we implement the same network for 3D data by replacing the 2D layers with their 3D counterparts? What do you advise?
I want to use this code with another dataset. Which parameter ensures that my new data is used with the model trained on CIFAR?
And do you have any advice if the input dimensions are larger than CIFAR's, e.g. 100*100?
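A 3D variant is not part of this repository, but one common approach is to swap every 2D layer for its 3D counterpart. A hypothetical sketch, loosely mirroring the pre-activation ResidualBlock in basic_layers.py:

```python
import torch
import torch.nn as nn

class ResidualBlock3D(nn.Module):
    """Hypothetical 3D analogue of the repo's ResidualBlock:
    every 2D layer is replaced by its 3D counterpart."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.bn1 = nn.BatchNorm3d(in_ch)
        self.conv1 = nn.Conv3d(in_ch, out_ch // 4, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm3d(out_ch // 4)
        self.conv2 = nn.Conv3d(out_ch // 4, out_ch // 4, kernel_size=3,
                               padding=1, bias=False)
        self.bn3 = nn.BatchNorm3d(out_ch // 4)
        self.conv3 = nn.Conv3d(out_ch // 4, out_ch, kernel_size=1, bias=False)
        # 1x1x1 projection shortcut when the channel count changes
        self.shortcut = (nn.Conv3d(in_ch, out_ch, kernel_size=1, bias=False)
                         if in_ch != out_ch else nn.Identity())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.conv2(self.relu(self.bn2(out)))
        out = self.conv3(self.relu(self.bn3(out)))
        return out + self.shortcut(x)

x = torch.randn(1, 16, 8, 32, 32)  # (N, C, D, H, W)
y = ResidualBlock3D(16, 32)(x)
```

The pooling/upsampling pairs in the attention modules would need the same treatment (MaxPool3d, trilinear interpolation).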
The code has a stage 0 that doesn't exist in the paper.
I think the number of parameters of the CIFAR-10 residual attention network is incorrect; I find it is much larger than the number in the paper.
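To check a parameter count against the paper, the standard recipe is to sum p.numel() over the trainable parameters. A small sketch (the demo module is just a stand-in; the repo's model would be passed the same way):

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Number of trainable parameters in a module."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Stand-in example: a single conv layer.
demo = nn.Conv2d(3, 16, kernel_size=3)
print(count_params(demo))  # 3*16*3*3 weights + 16 biases = 448
```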
Hi,
Is model_92_sgd.pkl pretrained on CIFAR-10? Is there a pretrained model for ImageNet? Thanks.
When testing, the model does not need gradients, and this line caused an out-of-memory error for me.
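Wrapping evaluation in torch.no_grad() is the usual fix: no autograd graph is built, so activations are freed immediately. A minimal sketch with a stand-in model:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for the trained attention model
model.eval()

inputs = torch.randn(4, 10)
with torch.no_grad():      # no graph is recorded, so memory stays flat
    outputs = model(inputs)
```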
When I run this code with Python 3.6, I get an error:
File "/home/user/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 33, in __init__
    out_channels, in_channels // groups, *kernel_size))
TypeError: torch.FloatTensor constructor received an invalid combination of arguments - got (float, int, int, int), but expected one of:
I am trying to use this code with a new dataset. I changed the class names and added a dataset class that yields a 448*448 image per iteration, with a list of labels matching the class-name list. I am using from model.residual_attention_network import ResidualAttentionModel_448input as.....
And I am getting this error :
Traceback (most recent call last):
  File "train.py", line 83, in <module>
    model = ResidualAttentionModel().cuda()
  File "/home/jayant/Documents/Marsh_Ann/ResidualAttentionNetwork-pytorch-master/model/residual_attention_network.py", line 24, in __init__
    self.residual_block0 = ResidualBlock(64, 128)
  File "/home/jayant/Documents/Marsh_Ann/ResidualAttentionNetwork-pytorch-master/model/basic_layers.py", line 20, in __init__
    self.bn3 = nn.BatchNorm2d(output_channels/4)
  File "/home/jayant/anaconda3/envs/saltmarsh/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 21, in __init__
    self.weight = Parameter(torch.Tensor(num_features))
TypeError: new(): data must be a sequence (got float)
@tengshaofeng Do you have an intuition about what I am doing wrong? I can also share my dataset class. Its __getitem__ method returns a 448*448 image.
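The traceback points at output_channels/4, which is a float under Python 3, and recent PyTorch versions reject float channel counts. The usual fix is floor division in basic_layers.py:

```python
import torch.nn as nn

output_channels = 128

# Broken on Python 3 + recent PyTorch: 128 / 4 == 32.0, a float,
# which BatchNorm2d (and Conv2d) reject as a channel count.
# bn3 = nn.BatchNorm2d(output_channels / 4)   # TypeError

# Fix: floor division keeps the channel count an int.
bn3 = nn.BatchNorm2d(output_channels // 4)
```

The same change applies to every `/4` in the Conv2d and BatchNorm2d constructors of basic_layers.py.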
Hi @tengshaofeng,
Do you know if this model can handle multi-label datasets like NUS-WIDE? Any idea how to do it? Thank you.
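Multi-label support is not in this repository, but the common adaptation is to keep the backbone and swap softmax cross-entropy for a per-label sigmoid via BCEWithLogitsLoss. A sketch with random stand-in logits and multi-hot targets:

```python
import torch
import torch.nn as nn

num_labels = 81  # NUS-WIDE has 81 concepts
logits = torch.randn(4, num_labels)                     # one logit per label
targets = torch.randint(0, 2, (4, num_labels)).float()  # multi-hot labels

criterion = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy per label
loss = criterion(logits, targets)

# At inference, threshold per-label probabilities instead of taking argmax.
preds = (torch.sigmoid(logits) > 0.5).int()
```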
File "/home//ResidualAttentionNetwork-pytorch-master/Residual-Attention-Network/model/attention_module.py", line 249, in forward
    out_interp3 = self.interpolation3(out_softmax3) + out_softmax2
RuntimeError: The size of tensor a (14) must match the size of tensor b (2) at non-singleton dimension 3
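This size mismatch usually means the input resolution doesn't match the model variant: the attention modules downsample to fixed intermediate sizes, so a 224-input model fed a different resolution trips exactly this kind of error. Assuming that is the cause here, resizing the input first avoids it:

```python
import torch
import torch.nn.functional as F

# Hypothetical input at the wrong resolution for a 224-input model.
x = torch.randn(1, 3, 100, 100)

# Resize to the resolution the model variant was built for (normally this
# would be done once in the dataset transform, e.g. transforms.Resize).
x = F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)
```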
Epoch [32/300], Iter [100/254] Loss: 0.2530
Epoch [32/300], Iter [200/254] Loss: 0.1421
the epoch takes time: 40.39500594139099
evaluate test set:
Accuracy of the model on the test images: 87 %
Accuracy of the model on the test images: 0.8785185185185185
Accuracy of plane : 0 %
Accuracy of car : 0 %
Accuracy of bird : 1 %
Accuracy of cat : 0 %
Accuracy of deer : 0 %
Accuracy of dog : 3 %
Accuracy of frog : 0 %
Accuracy of horse : 0 %
Accuracy of ship : 0 %
Accuracy of truck : 1 %
Why is the per-class test accuracy so low, while each class still gets its own accuracy line? Is there a problem with my software versions? I am using Python 3.5 and PyTorch 1.1. Also, why is the best accuracy never printed? Thanks!
Hi, I am confused about the term softmax_blocks. Shouldn't it be soft mask blocks, as in the paper? I also checked the ResidualBlock class, and it does not contain normalization layers.
Hello, I studied your code carefully and found that the paper gives different formulas for mixed attention, channel attention, and spatial attention, but I don't see a formal representation of f(x_i, c) in your code. I have just started learning about deep networks. How do I modify the network if I want to express the different attention types? Thank you!
Thank you for sharing your code!
Can you provide the best pretrained model?
I have a question about the soft attention mask. I have implemented residual attention blocks for a specific domain (faces). How does the attention mask come to focus on specific regions of the face, such as the forehead?
Traceback (most recent call last):
  File "train.py", line 93, in <module>
    model = ResidualAttentionModel().cuda()
  File "/home/ResidualAttentionNetwork-pytorch-master/Residual-Attention-Network/model/residual_attention_network.py", line 136, in __init__
    self.residual_block1 = ResidualBlock(64, 256)
  File "/home/ResidualAttentionNetwork-pytorch-master/Residual-Attention-Network/model/basic_layers.py", line 16, in __init__
    self.conv1 = nn.Conv2d(input_channels, output_channels/4, 1, 1, bias = False)
  File "/home//.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 412, in __init__
    False, _pair(0), groups, bias, padding_mode)
  File "/home//.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 78, in __init__
    out_channels, in_channels // groups, *kernel_size))
TypeError: new() received an invalid combination of arguments - got (float, int, int, int), but expected one of:
Has anyone trained the residual attention network on ImageNet?
The paper doesn't give the batch size for ImageNet training, so I used the common setting batch_size=256 / lr=0.1, but my result (top-1 acc: 77.64) is much lower than the paper's (top-1 acc: 78.24)! More details of the hyperparameters are listed below. The epoch settings are converted from the iteration counts mentioned in the paper: with a batch size of 256 there are about 5k iterations per epoch, so the learning rate should decay at 200k/5k=40, 400k/5k=80, and 500k/5k=100 epochs, and training should stop at 530k/5k=106 epochs.
The learning rate is divided by 10 at 200k, 400k, 500k iterations. We terminate training at 530k iterations.
args.epochs = 106
args.batch_size = 256
### data transform: RandomResizeCrop(224)/HorizontalFlip(0.5)/ChangeLight(AlexNet color augmentation)/Normalize() are used in training
args.autoaugment = False
args.colorjitter = False
args.change_light = True # standard color augmentation from AlexNet
### optimizer
args.optimizer = 'SGD'
args.lr = 0.1
args.momentum = 0.9
args.weigh_decay_apply_on_all = True # TODO: weight decay apply on which params
args.weight_decay = 1e-4
args.nesterov = True
### criterion
args.labelsmooth = 0
### lr scheduler
args.scheduler = 'uneven_multistep'
args.lr_decay_rate = 0.1
args.lr_milestone = [40, 80, 100]
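Under the rounded 5k-iterations-per-epoch arithmetic above, this schedule maps directly onto PyTorch's MultiStepLR; a sketch with a stand-in parameter list:

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

iters_per_epoch = 5000  # ~1.28M ImageNet images / batch size 256, rounded
milestones = [200_000 // iters_per_epoch,
              400_000 // iters_per_epoch,
              500_000 // iters_per_epoch]  # -> [40, 80, 100], as above

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for model.parameters()
optimizer = SGD(params, lr=0.1, momentum=0.9, weight_decay=1e-4, nesterov=True)
scheduler = MultiStepLR(optimizer, milestones=milestones, gamma=0.1)
# Call scheduler.step() once per epoch after optimizer.step().
```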
Hi @tengshaofeng, thanks! But I have a question: in attention_module.py, the input size of AttentionModule_stage0 is 112*112, while the input size of AttentionModule_stage1 is 56*56. Is a maxpool layer used in between? I don't think it's mentioned in the paper.
I have a trained residual attention model, and I want to visualize the masks shown in Figure 1. Any idea how the authors do that? @tengshaofeng If you have already done it, can you share the code to actually visualize the attention masks?
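The repo doesn't ship visualization code, but one common approach is a forward hook that captures the sigmoid output of a mask branch, which can then be upsampled and overlaid on the input image. A hypothetical sketch with a stand-in mask layer:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in: any module whose output is the soft mask
# (in this repo, the final sigmoid of each AttentionModule's mask branch).
mask_layer = nn.Sequential(nn.Conv2d(3, 1, kernel_size=1), nn.Sigmoid())

captured = {}
def save_mask(module, inputs, output):
    captured["mask"] = output.detach()  # stash the mask during forward()

handle = mask_layer.register_forward_hook(save_mask)
_ = mask_layer(torch.randn(1, 3, 56, 56))
handle.remove()

mask = captured["mask"][0, 0]  # (H, W), values in [0, 1], ready to overlay
```

To reproduce a Figure-1-style image, upsample the mask to the input resolution (e.g. F.interpolate) and alpha-blend it over the image with matplotlib.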
Excuse me, your code has been a big help in my research, but when I run train.py the following error appears. Do you know how to fix it? Thank you!
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
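This is the spawn-start-method issue (typical on Windows) that the message describes: code that creates DataLoader worker processes must sit under the __main__ guard. A sketch of the fixed train.py structure, with synthetic stand-in data:

```python
# The fix is structural: everything that spawns DataLoader workers
# must run under the __main__ guard, exactly as the error message says.
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Hypothetical stand-in data; in train.py this is the CIFAR loader.
    dataset = TensorDataset(torch.randn(32, 3, 32, 32),
                            torch.randint(0, 10, (32,)))
    loader = DataLoader(dataset, batch_size=8, num_workers=2)
    batches = 0
    for images, labels in loader:
        batches += 1  # training step would go here
    return batches

if __name__ == "__main__":
    main()
```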
E TypeError: new() received an invalid combination of arguments - got (float, int, int, int), but expected one of:
E * (torch.device device)
E * (torch.Storage storage)
E * (Tensor other)
E * (tuple of ints size, torch.device device)
E * (object data, torch.device device)
How can I fix it? Thanks.
Hello, thank you for your code!
But I have a question about it. The addition below doesn't seem to appear in the paper, where the soft mask branch only has an addition at the skip connection. Could you help me resolve this question?
out_interp = self.interpolation1(out_middle_2r_blocks) + out_down_residual_blocks1
TypeError: new() received an invalid combination of arguments - got (float, int, int, int), but expected one of:
Hello, if I want to use my own dataset, is there a model pretrained on ImageNet?
Can you tell me whether your training and testing accuracies always tracked each other? I am implementing a smaller, modified version of the network you coded, and my test accuracy seems to have stagnated at 81%.
Also, I think you have coded a different architecture: you add the output of the pool layer as well as the output of the pool+conv layer to the upsampled input, while the paper's architecture only adds the pool+conv output to the upsampled layer. Is that making all the difference?
out_interp2 = self.interpolation2(out_up_residual_blocks1) + out_trunk
test code:
# print('Accuracy of the model on the test images:', correct.item()/total)
# print(correct.item())
# print(total)
# for i in range(10):
# print('%s :Accuracy of %5s : %2d %%' % (
# datetime.now(),classes[i], class_correct[i].item() / class_total[i]))
# print(class_correct[i].item())
# print(class_total[i])
# return correct / total
out:
D:\Microsoft Visual Studio\Shared\Anaconda3_64\envs\xk\lib\site-packages\torch\nn\modules\upsampling.py:129: UserWarning: nn.UpsamplingBilinear2d is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.{} is deprecated. Use nn.functional.interpolate instead.".format(self.name))
2020-03-31 15:32:25.001979 :Accuracy of the model on the test images: 95 %
Accuracy of the model on the test images: 0.954
9540
10000
2020-03-31 15:32:25.002979 :Accuracy of plane : 0 %
194
1000.0
2020-03-31 15:32:25.002979 :Accuracy of car : 0 %
206
1000.0
2020-03-31 15:32:25.002979 :Accuracy of bird : 0 %
169
1000.0
2020-03-31 15:32:25.002979 :Accuracy of cat : 0 %
136
1000.0
2020-03-31 15:32:25.002979 :Accuracy of deer : 0 %
187
1000.0
2020-03-31 15:32:25.002979 :Accuracy of dog : 0 %
159
1000.0
2020-03-31 15:32:25.003980 :Accuracy of frog : 0 %
204
1000.0
2020-03-31 15:32:25.003980 :Accuracy of horse : 0 %
197
1000.0
2020-03-31 15:32:25.003980 :Accuracy of ship : 0 %
205
1000.0
2020-03-31 15:32:25.003980 :Accuracy of truck : 0 %
203
1000.0
If I don't add '.item()', the output becomes:
D:\Microsoft Visual Studio\Shared\Anaconda3_64\envs\xk\lib\site-packages\torch\nn\modules\upsampling.py:129: UserWarning: nn.UpsamplingBilinear2d is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.{} is deprecated. Use nn.functional.interpolate instead.".format(self.name))
2020-03-31 15:38:02.784257 :Accuracy of the model on the test images: 95 %
Accuracy of the model on the test images: tensor(0, device='cuda:0')
9540
10000
2020-03-31 15:38:02.785258 :Accuracy of plane : 0 %
tensor(194, device='cuda:0', dtype=torch.uint8)
1000.0
2020-03-31 15:38:02.786259 :Accuracy of car : 0 %
tensor(206, device='cuda:0', dtype=torch.uint8)
1000.0
2020-03-31 15:38:02.786259 :Accuracy of bird : 0 %
tensor(169, device='cuda:0', dtype=torch.uint8)
1000.0
2020-03-31 15:38:02.787259 :Accuracy of cat : 0 %
tensor(136, device='cuda:0', dtype=torch.uint8)
1000.0
2020-03-31 15:38:02.787259 :Accuracy of deer : 0 %
tensor(187, device='cuda:0', dtype=torch.uint8)
1000.0
2020-03-31 15:38:02.788261 :Accuracy of dog : 0 %
tensor(159, device='cuda:0', dtype=torch.uint8)
1000.0
2020-03-31 15:38:02.789261 :Accuracy of frog : 0 %
tensor(204, device='cuda:0', dtype=torch.uint8)
1000.0
2020-03-31 15:38:02.789261 :Accuracy of horse : 0 %
tensor(197, device='cuda:0', dtype=torch.uint8)
1000.0
2020-03-31 15:38:02.789261 :Accuracy of ship : 0 %
tensor(205, device='cuda:0', dtype=torch.uint8)
1000.0
2020-03-31 15:38:02.790264 :Accuracy of truck : 0 %
tensor(203, device='cuda:0', dtype=torch.uint8)
1000.0
I hope to get your help. Thanks.
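Two things appear to be going on in the commented-out test code above (inferred from the numbers, so treat this as a diagnosis, not a statement about intent): the fraction correct/total is formatted with %2d without multiplying by 100, so every class prints "0 %"; and the per-class counters look like uint8 tensors, which wrap at 256 (the per-class counts 194, 206, ... sum to exactly 9540 once 768 is added back to each, e.g. 962 % 256 == 194). A small sketch of the fix:

```python
import torch

classes = ['plane', 'car']

# Per-sample correctness from a (stand-in) batch of predictions.
preds = torch.tensor([0, 0, 1, 1])
labels = torch.tensor([0, 1, 1, 1])
correct_per_sample = (preds == labels)

# Fix 1: accumulate counts as Python ints (or .long() tensors), never
# uint8 -- a uint8 counter wraps at 256, so 962 correct prints as 194.
class_correct = [0] * len(classes)
class_total = [0] * len(classes)
for c, ok in zip(labels.tolist(), correct_per_sample.tolist()):
    class_correct[c] += int(ok)
    class_total[c] += 1

# Fix 2: scale the fraction to a percentage before %d formatting.
for i, name in enumerate(classes):
    pct = 100.0 * class_correct[i] / class_total[i]
    print('Accuracy of %5s : %2d %%' % (name, pct))
```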
I find that the number of parameters is different from