jeong-tae / racnn-pytorch
This is a third party implementation of RA-CNN in pytorch.
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "trainer2.py", line 305, in <module>
    train()
  File "trainer2.py", line 94, in train
    logits, cc, aa= net(images)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/Desktop/ww/RACNN-pytorch-master/models/RACNN.py", line 55, in forward
    conv5_4_A = self.b2.features[:-1](scaledA_x)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/batchnorm.py", line 49, in forward
    self.training or not self.track_running_stats, self.momentum, self.eps)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 1194, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f10b8d49dd8>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 349, in __del__
    self._shutdown_workers()
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
    self.worker_result_queue.get()
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 345, in get
    return ForkingPickler.loads(res)
  File "/usr/local/lib/python3.5/dist-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
Hi @jeong-tae,
I followed your model for training. The number of pre-APN training iterations is 2000, and the cls loss and the rank loss look strange, as shown.
Accuracy has also been very low, no more than 1. What is the reason for this?
I know nothing about PyTorch, so I read your code as if it were TensorFlow, and I found a strange loss calculation in model/loss.py. You compute the loss as
F.cross_entropy(preds[i], labels)
Is this right? I guess it should maybe be F.cross_entropy(preds[i], labels[i])?
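On the indexing question: F.cross_entropy takes a batch of logits of shape (batch, classes) and a vector of class indices of shape (batch,), and returns the mean negative log-softmax of the target class. Assuming preds is a list with one logits batch per scale (an assumption about model/loss.py, which is not shown here), every scale is scored against the same labels vector, so labels[i] would pick a single image's label instead. The sketch below mimics this in plain Python; cross_entropy is a hypothetical stand-in for F.cross_entropy and the numbers are made up.

```python
import math


def cross_entropy(logits, labels):
    """Mean negative log-softmax of the target class over a batch."""
    total = 0.0
    for row, target in zip(logits, labels):
        log_sum_exp = math.log(sum(math.exp(v) for v in row))
        total += log_sum_exp - row[target]
    return total / len(labels)


# Hypothetical batch: 2 images, 3 classes, 3 scales.
labels = [0, 2]  # one class index per image, shared by all scales
preds = [
    [[2.0, 0.1, 0.1], [0.1, 0.2, 3.0]],  # scale 1 logits, shape (batch, classes)
    [[1.5, 0.3, 0.2], [0.0, 0.1, 2.0]],  # scale 2 logits
    [[1.0, 0.5, 0.5], [0.2, 0.2, 1.0]],  # scale 3 logits
]

# i indexes the scale, not the image, so labels is passed whole each time.
losses = [cross_entropy(preds[i], labels) for i in range(len(preds))]
total_loss = sum(losses)
```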
When I run trainer.py, there is no problem during pre_apn_epoch, but at test time it runs out of memory, as in the figure. I am using a GT2010 Ti and have no idea what the problem is. Is there tensor memory that is not being released?
Hello @jeong-tae,
I am trying to use your RACNN-pytorch code to train on my own dataset. I want to save the trained model, but I have no idea how to do this. Can you give me some suggestions for saving the model and running prediction? Thanks.
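A common PyTorch pattern for this is to save the model's state_dict with torch.save and restore it with load_state_dict before predicting. The sketch below is a minimal illustration under stated assumptions: a small nn.Linear stands in for the RACNN network, and the checkpoint file name is hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn

net = nn.Linear(4, 3)  # hypothetical stand-in for the trained RACNN model

# Save only the learned parameters (hypothetical file name).
ckpt_path = os.path.join(tempfile.mkdtemp(), "racnn_checkpoint.pth")
torch.save(net.state_dict(), ckpt_path)

# Later: rebuild the same architecture and load the parameters back.
restored = nn.Linear(4, 3)
restored.load_state_dict(torch.load(ckpt_path))
restored.eval()  # switch layers such as batch norm to inference behaviour

# Predict without building an autograd graph.
with torch.no_grad():
    logits = restored(torch.randn(2, 4))
    predicted = logits.argmax(dim=1)
```

With the real network, the same two calls would wrap the model object that the trainer builds, and the restored model must be constructed with the same architecture before load_state_dict is called.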
Hello, in Line 228 of ./trainer.py
response_map = F.upsample(response_map, size = [resize, resize])
should perhaps be
before_upsample = Variable(response_map.unsqueeze(0))
response_map = F.upsample(before_upsample, size = [resize, resize])
response_map = response_map.data.squeeze()
I also have a question. I have no problem when I run your code on a single GPU, but I get a "cuda error: out of memory" when I run it on multiple GPUs. Do you have the same problem, or do you know the reason?
[image][image]
I fixed batch_size=1 and now hit the following issue. Can you tell me how to solve it?
[*] pre_apn_epoch[13], || pre_apn_iter 19980 || pre_apn_loss: 0.1223 || Timer: 0.1521sec
[*] Swtich optimize parameters to Class
Traceback (most recent call last):
  File "/home/alex/.local/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 3265, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in <module>
    runfile('/home/alex/Datas/code/RACNN-pytorch/trainer.py', wdir='/home/alex/Datas/code/RACNN-pytorch')
  File "/usr/local/pycharm-2018.2.4/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
  File "/usr/local/pycharm-2018.2.4/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/alex/Datas/code/RACNN-pytorch/trainer.py", line 311, in <module>
    train()
  File "/home/alex/Datas/code/RACNN-pytorch/trainer.py", line 136, in train
    test(testloader, iteration)
  File "/home/alex/Datas/code/RACNN-pytorch/trainer.py", line 292, in test
    test_apn_losses = torch.stack(test_apn_losses).mean()
TypeError: expected Tensor as element 0 in argument 0, but got list
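The TypeError says that torch.stack received a list whose first element is itself a list rather than a tensor. One plausible cause, assumed here and not confirmed from trainer.py, is that each test step appends a list of per-scale losses. Under that assumption, flattening the nested list before stacking avoids the error. The loss values below are made up.

```python
import torch

# Hypothetical: each test step appended a list of per-scale losses.
test_apn_losses = [
    [torch.tensor(0.12), torch.tensor(0.30)],
    [torch.tensor(0.10), torch.tensor(0.28)],
]

# Stacking the nested list reproduces the TypeError from the traceback.
try:
    torch.stack(test_apn_losses)
    stack_failed = False
except TypeError:
    stack_failed = True

# Flatten to a plain list of 0-dim tensors, then stack and average.
flat_losses = [loss for step in test_apn_losses for loss in step]
mean_loss = torch.stack(flat_losses).mean()
```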
Thanks for sharing the code. It helped me understand the APN; I had been confused about how the authors crop the attention region.
In the backward code of the APN, I found that you use a fixed value for in_size (if my understanding of the code is right). Do you backpropagate the gradient to a fixed location? If it is fixed, why did you do that? If not, how do you backpropagate the gradient to the attention location?
Thanks in advance.
def backward(self, grad_output):
    images, ret_tensor = self.saved_variables[0], self.saved_variables[1]
    in_size = 224
    ret = torch.Tensor(grad_output.size(0), 3).zero_()
    norm = -(grad_output * grad_output).sum(dim=1)
    x = torch.stack([torch.arange(0, in_size)] * in_size).t()
    y = x.t()
    long_size = (in_size/3*2)
    short_size = (in_size/3)
    mx = (x >= long_size).float() - (x < short_size).float()
    my = (y >= long_size).float() - (y < short_size).float()
    ml = (((x<short_size)+(x>=long_size)+(y<short_size)+(y>=long_size)) > 0).float()*2 - 1
    mx_batch = torch.stack([mx.float()] * grad_output.size(0))
    my_batch = torch.stack([my.float()] * grad_output.size(0))
    ml_batch = torch.stack([ml.float()] * grad_output.size(0))
    if isinstance(grad_output, torch.cuda.FloatTensor):
        mx_batch = mx_batch.cuda()
        my_batch = my_batch.cuda()
        ml_batch = ml_batch.cuda()
        ret = ret.cuda()
    ret[:, 0] = (norm * mx_batch).sum(dim=1).sum(dim=1)
    ret[:, 1] = (norm * my_batch).sum(dim=1).sum(dim=1)
    ret[:, 2] = (norm * ml_batch).sum(dim=1).sum(dim=1)
    return None, ret
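One way to read the quoted backward() is that in_size only sets the size of the mask grid, while mx, my and ml are fixed sign patterns that never look at the predicted tx, ty, tl. The sketch below rebuilds those three masks in plain Python for a tiny, hypothetical in_size of 6 (the quoted code uses 224 and torch tensors). build_masks is a name made up for this illustration.

```python
def build_masks(in_size):
    """Rebuild mx, my, ml from the quoted backward() as nested lists."""
    short_size = in_size / 3
    long_size = in_size / 3 * 2
    mx, my, ml = [], [], []
    for i in range(in_size):      # after .t(), x[i][j] == i (row index)
        row_x, row_y, row_l = [], [], []
        for j in range(in_size):  # y[i][j] == j (column index)
            row_x.append(float(i >= long_size) - float(i < short_size))
            row_y.append(float(j >= long_size) - float(j < short_size))
            in_outer_band = (i < short_size or i >= long_size
                             or j < short_size or j >= long_size)
            row_l.append(1.0 if in_outer_band else -1.0)
        mx.append(row_x)
        my.append(row_y)
        ml.append(row_l)
    return mx, my, ml


mx, my, ml = build_masks(6)
# mx: -1 in the first third of rows, 0 in the middle, +1 in the last third.
# my: the same pattern along columns.
# ml: -1 in the central block, +1 everywhere else.
```

Under this reading, the gradient returned for each of tx, ty and tl is the sum of norm weighted by a mask that is identical for every sample, which is the behaviour the question is asking about.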
Has anyone else met the same problem?
The cub_loader uses train_test_split.txt, classes.txt, images.txt and image_class_labels.txt. Could you please tell me what each file looks like and what it contains?
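For reference, in the public CUB_200_2011 release each of these is a plain-text file with one space-separated record per line: images.txt maps an image id to a relative image path, image_class_labels.txt maps an image id to a 1-based class id, train_test_split.txt maps an image id to 1 (training) or 0 (test), and classes.txt maps a class id to a class name. A minimal sketch of joining them follows. The sample rows are illustrative stand-ins rather than copies of the real files, and parse_pairs is a hypothetical helper, not the repository's loader.

```python
def parse_pairs(text):
    """Parse '<id> <value>' lines into a dict keyed by integer id."""
    pairs = {}
    for line in text.strip().splitlines():
        key, value = line.split(" ", 1)
        pairs[int(key)] = value
    return pairs


# Illustrative file contents (made-up image names).
images_txt = """1 001.Black_footed_Albatross/example_0001.jpg
2 001.Black_footed_Albatross/example_0002.jpg"""
image_class_labels_txt = "1 1\n2 1"
train_test_split_txt = "1 0\n2 1"
classes_txt = "1 001.Black_footed_Albatross"

paths = parse_pairs(images_txt)
labels = parse_pairs(image_class_labels_txt)
is_train = parse_pairs(train_test_split_txt)
class_names = parse_pairs(classes_txt)

# Build (path, zero-based label) pairs for the training split.
train_samples = [
    (paths[i], int(labels[i]) - 1)
    for i in sorted(paths)
    if is_train[i] == "1"
]
```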
Hello, I'm interested in RACNN too, and I'm writing a baseline in PyTorch. However, the highest accuracy I get is 73.18%. Have you reproduced the performance reported in the paper? Waiting for your reply. Thank you!
I found an error at line 75 of RACNN.py when running train.py:
h = lambda x: 1 / (1 + torch.exp(-10 * x))
RuntimeError: _exp_out is not implemented for type torch.cuda.LongTensor
It looks like torch.exp does not support a LongTensor.
Does anybody know why?
Thanks
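The error message indicates that the tensor reaching torch.exp has an integer dtype. A likely source, assumed here, is the coordinate grid: torch.arange called with integer arguments returns an integer tensor in PyTorch 0.4 and later. A minimal sketch of the usual workaround is to cast the grid to float before applying the sigmoid-like h; the offset of 100 is an arbitrary example value.

```python
import torch

in_size = 224
h = lambda x: 1 / (1 + torch.exp(-10 * x))

x_long = torch.arange(0, in_size)           # integer dtype
x_float = torch.arange(0, in_size).float()  # cast before any exp()

# Soft step that switches from 0 to 1 around coordinate 100.
mask = h(x_float - 100.0)
```

Newer PyTorch releases may promote integer inputs automatically, so the explicit cast mainly matters for the versions shown in the tracebacks above.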
Hi, if I use the code at Line 236 of ./trainer.py
response_map = F.interpolate(response_map.unsqueeze(0), size = [resize, resize])
it reports a CUDA memory error.
When I change it to
before_upsample = Variable(response_map.unsqueeze(0))
response_map = F.upsample(before_upsample, size = [resize, resize])
response_map = response_map.data.squeeze()
and
def train():
    net.train()
    with torch.no_grad():
it works, but I don't know if it's right. Also, should Lines 125 and 185 of ./trainer.py be as follows?
logits, _, _, _ = net(images)
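A note on the torch.no_grad() part: it stops autograd from recording the graph, which lowers memory use but also means nothing inside the block can be backpropagated. The usual split is to wrap only evaluation in no_grad and leave the training step outside it. The sketch below illustrates that with a small nn.Linear as a hypothetical stand-in for the RACNN model.

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 2)  # hypothetical stand-in for the RACNN model
images = torch.randn(3, 4)

# Evaluation: no graph is recorded, so outputs carry no gradient history.
net.eval()
with torch.no_grad():
    eval_logits = net(images)

# Training: the graph is required so that backward() can compute gradients.
net.train()
train_logits = net(images)
loss = train_logits.sum()
loss.backward()
```

If the whole body of train() sits inside no_grad, the memory error may disappear simply because no gradients are being computed, in which case the parameters would not be updated.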
Hi, very nice code, but it seems that the APN doesn't work.
There are some problems in AttentionCropFunction,
and I changed it as below:
tx, ty, tl = locs[i][0], locs[i][1], locs[i][2]
# tx = tx if tx > (in_size/3) else in_size/3
# tx = tx if (in_size/3*2) < tx else (in_size/3*2)
# ty = ty if ty > (in_size/3) else in_size/3
# ty = ty if (in_size/3*2) < ty else (in_size/3*2)
# tl = tl if tl > (in_size/3) else in_size/3
## this should generate a more reasonable anchor here
tl = tl if tl > (in_size/3) else in_size/3
tx = tx if tx > tl else tl
tx = tx if tx < in_size-tl else in_size-tl
ty = ty if ty > tl else tl
ty = ty if ty < in_size-tl else in_size-tl
w_off = int(tx-tl) if (tx-tl) > 0 else 0
h_off = int(ty-tl) if (ty-tl) > 0 else 0
w_end = int(tx+tl) if (tx+tl) < in_size else in_size
h_end = int(ty+tl) if (ty+tl) < in_size else in_size
mk = (h(x-w_off) - h(x-w_end)) * (h(y-h_off) - h(y-h_end))
xatt = images[i] * mk
# xatt_cropped = xatt[:, h_off : h_end, w_off : w_end]
## axis h,w here seems to be reversed wrongly?
xatt_cropped = xatt[:, w_off: w_end, h_off: h_end]
Hi, I'm interested in RACNN and your code. How are your experimental results now? Does it work well now?
Traceback (most recent call last):
  File "trainer.py", line 306, in <module>
    train()
  File "trainer.py", line 57, in train
    trainset = CUB200_loader(os.getcwd() + 'G:/RACNN-pytorch-master/RACNN-pytorch-master/data/CUB_200_2011/images', split = 'train')
TypeError: 'module' object is not callable
I need help
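This TypeError usually means that a name bound to a module is being called as if it were a class, for example `import some_module` followed by `some_module(...)`. The sketch below reproduces the same error with the standard-library datetime module, where the module and the class share a name, and also shows why `os.getcwd() + 'G:/...'` is unlikely to yield a usable path. The exact import the repository needs depends on its file layout, which is not shown here, so treat this as an analogy rather than the fix.

```python
import datetime
import os

# Calling the module itself reproduces the error from the traceback.
try:
    datetime(2018, 1, 1)
    message = ""
except TypeError as exc:
    message = str(exc)

# Importing the class out of the module makes the same call valid.
from datetime import datetime as DateTime
stamp = DateTime(2018, 1, 1)

# String concatenation glues the working directory onto an absolute path.
absolute = "G:/RACNN-pytorch-master/RACNN-pytorch-master/data/CUB_200_2011/images"
glued = os.getcwd() + absolute

# Either pass the absolute path on its own, or join relative parts to the cwd.
relative = os.path.join(os.getcwd(), "data", "CUB_200_2011", "images")
```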
Hi,
Can you please explain the use of average pooling over the conv5_4 feature map? I couldn't find a reference to it in the paper. @jeong-tae @jtlee90
Thanks
Hello, in Line 88 of ./models/RACNN.py
tx = tx if (in_size/3*2) < tx else (in_size/3*2)
should perhaps be
tx = tx if (in_size/3*2) > tx else (in_size/3*2)
and the same for ty?
By the way, would you mind briefly explaining Line 115 of ./models/RACNN.py, in backward()? I am confused by it. Thank you!
norm = -(grad_output * grad_output).sum(dim=1)
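The difference between the two comparisons in the Line 88 question can be checked numerically. The sketch below pairs each version with the lower-bound line quoted elsewhere on this page (`tx = tx if tx > (in_size/3) else in_size/3`) and uses in_size = 224 as in the quoted backward(). The function names are made up for this illustration.

```python
in_size = 224
lo = in_size / 3
hi = in_size / 3 * 2


def clamp_original(tx):
    tx = tx if tx > lo else lo  # raise small values to lo
    tx = tx if hi < tx else hi  # keeps tx only when it already exceeds hi
    return tx


def clamp_proposed(tx):
    tx = tx if tx > lo else lo  # raise small values to lo
    tx = tx if hi > tx else hi  # cap large values at hi
    return tx


samples = [10.0, 100.0, 200.0]
original = [clamp_original(t) for t in samples]
proposed = [clamp_proposed(t) for t in samples]
```

With the original comparison every result is at least hi, so a centre such as 100 is pushed out to the upper bound, while the proposed comparison keeps values inside the interval from lo to hi.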
After line 183 in trainer.py, did you forget this line of code: "logits, _, _ = net(images)"?
Hi, thank you for sharing. When I follow your steps, a problem occurs; I tried my best to fix it, but nothing helped. My problem is:
ImportError: No module named 'visual'. Could you please help me? Thank you.
[*] RACNN forward test...
Traceback (most recent call last):
  File "/home/dl2/Songly/RACNN-pytorch-master/models/RACNN.py", line 158, in <module>
    logits, conv5s, attens = net(x)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dl2/Songly/RACNN-pytorch-master/models/RACNN.py", line 49, in forward
    scaledA_x = self.crop_resize(x, atten1 * 448)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dl2/Songly/RACNN-pytorch-master/models/RACNN.py", line 152, in forward
    return AttentionCropFunction.apply(images, locs)
  File "/home/dl2/Songly/RACNN-pytorch-master/models/RACNN.py", line 99, in forward
    mk = (h(x-w_off) - h(x-w_end)) * (h(y-h_off) - h(y-h_end))
  File "/home/dl2/Songly/RACNN-pytorch-master/models/RACNN.py", line 76, in <lambda>
    h = lambda x: 1 / (1 + torch.exp(-10 * x))
RuntimeError: "exp" not implemented for 'torch.LongTensor'
Hi, I'm interested in RACNN and your code, but I'm not familiar with PyTorch. How can I save tx, ty and tl and use them to get the images that are the inputs of the second and third channels? Thank you!
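One possible approach, sketched under assumptions: treat (tx, ty) as the crop centre and tl as the half side length in pixels, as the w_off/h_off/w_end/h_end lines quoted earlier on this page do, store the values per image as JSON, and turn them back into crop boxes later. The image name, the numbers and the crop_box helper are hypothetical, and which of tx/ty runs along rows depends on the model code.

```python
import json


def crop_box(tx, ty, tl, in_size):
    """Return (left, top, right, bottom) for the square centred at (tx, ty)
    with half side tl, clipped to the image bounds."""
    left = max(int(tx - tl), 0)
    top = max(int(ty - tl), 0)
    right = min(int(tx + tl), in_size)
    bottom = min(int(ty + tl), in_size)
    return left, top, right, bottom


# Hypothetical APN outputs in pixels for one image: [tx, ty, tl].
locs = {"img_0001.jpg": [224.0, 200.0, 150.0]}

# Serialise the locations, then restore them and rebuild the crop boxes.
serialised = json.dumps(locs)
restored = json.loads(serialised)
boxes = {name: crop_box(*loc, in_size=448) for name, loc in restored.items()}
```

The resulting box can then be passed to whatever image library is in use to cut out the region and resize it to the network's input size.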
Hi @jeong-tae,
I looked at your code, and at the model and original paper of the source racnn-caffe. I have a question about the final loss function. The original racnn-caffe model concatenates pool5, pool5_A and pool5_A_A (pow1, pow2, pow3) at the end, and then calculates accuracy1+2+3 through the fc layer. Figure:
I read your code and the original paper about this Lcls. You calculate Lcls1, Lcls2 and Lcls3 separately, and then add them up. Figure:
I don't think these two mean the same thing. If I want to use scale1+2+3 like the source racnn-caffe, how do I do it?