Comments (20)
Edit: I believe basic Python 2.7 vs Python 3 compatibility issues cause the problem, since this code was written for Python 3 and not Python 2.7.
Adding the line `from __future__ import division` in box_utils.py, prior_box.py and detection.py gets rid of the above error, and some other errors.
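For anyone wondering why that import matters, here is a minimal sketch. Under Python 2.7, `/` between two integers truncates, while Python 3 (or Python 2.7 with the `__future__` import) does true division. The snippet runs on Python 3 and uses `//` to imitate the Python 2.7 behaviour; `image_size` and `feature_map` are illustrative values I picked, not names taken from the repo.

```python
# On Python 2.7 the line "from __future__ import division" at the top of a
# module makes "/" behave as it does below on Python 3.

image_size = 300   # illustrative value (assumption)
feature_map = 38   # illustrative value (assumption)

# What Python 2.7 computes for int / int without the import.
step_py2 = image_size // feature_map

# What Python 3, or Python 2.7 with the import, computes.
step_true = image_size / feature_map

# A ratio below one collapses to zero under integer division.
ratio_py2 = 1 // 2
ratio_true = 1 / 2
```

A ratio silently becoming 0 is the kind of thing that can later show up as a division by zero or a log of zero, so this would be consistent with the errors disappearing once the import is added.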
from ssd.pytorch.
Yes I've successfully trained several models. For some reason I cannot reproduce this error on my machine. Did you make sure your repo is up to date with the current master branch?
from ssd.pytorch.
I am using Python 3 and have not tested it with 2.7, so that is the only thing I can think of at the moment if your local repo is up to date. I will add the lack of 2.7 support to the README if that's the issue.
from ssd.pytorch.
I didn't build from the source but installed pytorch from pip. I have also made some changes to adapt your code to python 2.7 (link star expression)
I checked the latest master branch and found that https://github.com/pytorch/pytorch/blob/master/torch/autograd/variable.py#L317-L320 still only supports scalar division. In the case of your code "x/=norm.expand_as(x)", it is clearly an element-wise division. But I don't understand how the python version can affect this.
from ssd.pytorch.
BTW, could you please give me a rough time estimation for running one epoch ( with machine specs)?
from ssd.pytorch.
Yeah I agree I don't understand how it is working on my computer if that's the case. I'll look into it more after my classes today, sorry I don't have an answer right this second. As for the time estimate, it takes ~1.4 seconds to run a batch of size 32 forward and backward, but I'm not in my lab right now so I can't remember the exact time per epoch. Will get back to you on all of this right after class.
from ssd.pytorch.
And that's on a single Tesla K80 ^
from ssd.pytorch.
I think if you update to the latest version of Pytorch you will see that element-wise division with .div_() is supported. I do remember that it was originally not supported, but they added it not too long ago. When I run something as simple as:
x = torch.Tensor([1,2,3,4,5,6])
y = torch.Tensor([2,2,2,2,2,2])
x/=y
the correct result is returned. With a batch size of 32, on 1 Tesla K80, it takes me ~ 109 sec. per epoch.
from ssd.pytorch.
As I mentioned in the previous post, in the latest github pytorch source code (master branch), it still shows:
def div_(self, other):
    if not isinstance(other, Variable) and not torch.is_tensor(other):
        return DivConstant(other, inplace=True)(self)
    raise RuntimeError("div_ only supports scalar multiplication")
I still don't understand how it works for your case. But I will try to update my pytorch to the latest version.
Thanks a lot.
from ssd.pytorch.
Yeah, I apologize for the lack of a better answer, but since I cannot reproduce it I am closing the issue for now. Let me know if updating PyTorch fixes the issue; I will try to figure out more myself in the meantime.
from ssd.pytorch.
Ah, figured it out. That line in the source code is referring to Variables, so it is just saying Variables cannot be divided by Tensors, but Variables can be divided by other Variables of the same size (which is the case here) and Tensors can be divided by other Tensors of the same size.
torch/csrc/generic/methods/TensorMath.cwrap line 1038 has what looks like the place that bridges the Python and C for the tensor `div_` definition, and it's implied in torch/tensor.py 378: `return self.div_(other)`, even though it doesn't seem like `self.div_` is defined.
So again, not sure what the exact source of the problem is in your case, but my best bet is your version of PyTorch. Hopefully that helps.
from ssd.pytorch.
Also, update on training time: it takes approx. 37.5 sec. per epoch with a gtx1060 and batch size of 16, which is what I am currently using (ran out of money to afford the K80 EC2 instance :P).
from ssd.pytorch.
Thanks a lot. I will definitely update my Pytorch.
Regarding the training time, it only takes 37.5 sec for one epoch? (I suppose you were training using VOC2007 with about 10000 images, right?) I have tried training an mxnet SSD implementation which takes about 270 sec for one epoch using both VOC2007 and VOC2012 data on my Titan X GPU. Does this mean this PyTorch SSD is even faster than the mxnet implementation? That doesn't seem right.
from ssd.pytorch.
Yeah, that's my bad. Disregard that number, it's late here. Training on purely the training set (2501 images) from VOC07, it takes on average ~140 sec. per epoch on a single GTX 1060... So yeah, the previous number was off by a lot. I would be curious to see how it compares on a Titan X though.
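A quick back-of-the-envelope check of these numbers. This assumes the 140 sec. figure was measured with the batch size of 16 mentioned above, and that the earlier K80 timing (~1.4 sec. per batch of 32) was on the same 2501-image set; neither is stated explicitly, so treat the results as rough. The helper names are made up for this example.

```python
import math

def batches_per_epoch(num_images, batch_size):
    """Iterations needed to see every image once (last batch may be partial)."""
    return math.ceil(num_images / batch_size)

def seconds_per_batch(epoch_seconds, num_images, batch_size):
    """Average time per iteration implied by a measured epoch time."""
    return epoch_seconds / batches_per_epoch(num_images, batch_size)

# GTX 1060: 2501 images, batch size 16 (assumed), ~140 sec. per epoch.
gtx1060_batches = batches_per_epoch(2501, 16)
gtx1060_sec_per_batch = seconds_per_batch(140, 2501, 16)

# K80: ~1.4 sec. per batch of 32, same image count (assumed).
k80_epoch_estimate = batches_per_epoch(2501, 32) * 1.4
```

Under those assumptions the GTX 1060 works out to 157 batches at roughly 0.89 sec. each, and the K80 estimate comes to about 110.6 sec. per epoch, which lines up with the ~109 sec. reported earlier.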
from ssd.pytorch.
one more question ;-)
I am wondering how you got the fc-reduced VGG-16 weights?
from ssd.pytorch.
Hahah of course... I converted them to Chainer and then from Chainer to PyTorch. I also was able to convert them to Torch and then from Torch to PyTorch, but the specific weight file I supply was one that took the Chainer route.
from ssd.pytorch.
hi, i just updated the pytorch to the latest version (0.1.11_5) and had the train.py run.
luckily, i didn't get the div_ error.
but this time, I got "RuntimeError: cuda runtime error (59) : device-side assert triggered at /b/wheel/pytorch-src/torch/lib/T1" at "conf = labels[best_truth_idx] + 1 " from the box_utils.py
any idea about this? it seems like something to do with tensor add
from ssd.pytorch.
Is this on the first feed forward or were you able to get through some iterations? The only time that line has ever been an issue was a while back when I had an explicit 'background' label in the VOC labelmap and it just became an index out of range issue for softmax. But I'm currently training as I type this and can't think of what could be causing that. Have you pulled the most recent update of master? Or maybe you're on a different branch?
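If the cause is an out-of-range class index, a check on the CPU side before the loss is computed tends to give a clearer message than the CUDA assert. The sketch below is a hypothetical helper (`check_labels` is not part of the repo) that works on plain Python lists. It assumes ground-truth labels are 0-based and get shifted by one so that index 0 is reserved for background, as the `+ 1` on the quoted line suggests.

```python
def check_labels(labels, num_classes):
    """Raise ValueError if any shifted label falls outside 1..num_classes-1.

    labels: iterable of 0-based ground-truth class indices (assumption).
    num_classes: total number of classes, including background at index 0.
    """
    for i, label in enumerate(labels):
        shifted = int(label) + 1  # index 0 is reserved for background
        if not 0 < shifted < num_classes:
            raise ValueError(
                "label %d at position %d maps to class %d, outside 1..%d"
                % (label, i, shifted, num_classes - 1)
            )
    return True
```

For example, with 21 classes (20 object classes plus background), `check_labels([0, 5, 19], 21)` passes, while a label of 20 raises, which is what would happen if the labelmap contained one entry too many.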
from ssd.pytorch.
I faced this issue as well, with PyTorch version ( 0.1.12_4 ) which is very recent.
I fixed it by changing the `forward()` function in L2Norm.py as follows:
def forward(self, x):
    norm = x.pow(2).sum(1).sqrt()+self.eps
    norm_stretch = norm.expand_as(x)
    x = x / norm_stretch
    out = self.weight.unsqueeze(0).unsqueeze(2).unsqueeze(3).expand_as(x) * x
    return out
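To make the arithmetic concrete, here is a pure-Python sketch of what this `forward()` computes at a single spatial location: the channel vector is divided by its L2 norm (plus `eps`) and then scaled by a per-channel weight. The function name, the default `eps` and the sample values are illustrative assumptions, not code from the repo.

```python
import math

def l2norm_location(channels, weights, eps=1e-10):
    """L2-normalise one pixel's channel vector, then scale each channel."""
    # Same quantity as x.pow(2).sum(1).sqrt() + eps, for one location.
    norm = math.sqrt(sum(c * c for c in channels)) + eps
    # Same as weight * (x / norm), channel by channel.
    return [w * c / norm for c, w in zip(channels, weights)]

# A two-channel pixel with norm 5 and a weight of 20 on both channels.
out = l2norm_location([3.0, 4.0], [20.0, 20.0])
```

Note that on PyTorch releases where `sum(1)` drops the reduced dimension, the tensor version would likely need `keepdim=True` for `expand_as(x)` to work.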
I am then facing an issue in box_utils.py:
THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THC/generic/THCTensorMath.cu line=226 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "train_cars.py", line 232, in <module>
    train()
  File "train_cars.py", line 184, in train
    loss_l, loss_c = criterion(out, targets)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mshah/code/ssd.pytorch/layers/modules/multibox_loss.py", line 70, in forward
    match(self.threshold,truths,defaults,self.variance,labels,loc_t,conf_t,idx)
  File "/home/mshah/code/ssd.pytorch/layers/box_utils.py", line 107, in match
    loc = encode(matches, priors, variances)
  File "/home/mshah/code/ssd.pytorch/layers/box_utils.py", line 133, in encode
    return torch.cat([g_cxcy, g_wh], 1) # [num_priors,4]
RuntimeError: cuda runtime error (59) : device-side assert triggered at /b/wheel/pytorch-src/torch/lib/THC/generic/THCTensorMath.cu:226
from ssd.pytorch.
Great suggestion @superhans. Adding `from __future__ import division` to most of the files gets rid of any `nan` or `inf` in the loss for Python 2.7.
from ssd.pytorch.