abeardear / pytorch-yolo-v1

an experiment for yolo-v1, including training and testing.

License: MIT License

Python 100.00%

yolov1 pytorch object-detection experiment

pytorch-yolo-v1's Introduction

pytorch YOLO-v1

Chinese blog post

This is an experimental repository and is not exactly the same as the original paper. Our performance on the VOC07 test set is 0.665 mAP at 57 FPS on a 1080 Ti.

I wrote this code for the purpose of learning. In yoloLoss.py, I wrote only the forward pass; with the autograd mechanism, the backward pass is done automatically.

For the convenience of using a PyTorch pretrained model, our backbone network is ResNet50. We add an extra block to increase the receptive field and drop the fully connected layer.

Efficiency has not been optimized. It may be possible to make it faster; I don't know.

Trained on VOC2012 + VOC2007

Model	Backbone	mAP@VOC2007test	FPS
Our ResNet_YOLO	ResNet50	66.5%	57
YOLO	darknet19?	63.4%	45
YOLO VGG-16	VGG-16	66.4%	21

1. Dependency

pytorch 0.2.0_2
opencv
visdom
tqdm

2. Prepare

Download the voc2012train dataset
Download the voc2007test dataset
Put all images in one folder. I have provided the txt annotation file. ~~3. Convert the xml annotations to a txt file. To use dataset.py, put xml_2_txt.py in the same folder as the VOC dataset, or change the Annotations path in xml_2_txt.py.~~

3. Train

Run python train.py

Be careful: 1. Change the image file path. 2. I recommend that you install visdom and run it.

4. Evaluation

Run python eval_voc.py

Be careful: 1. Change the image file path.

5. Result

Our mAP on the VOC2007 test set is about 0.665. Some results are shown below; you can see more in the testimg folder.

pytorch-yolo-v1's Issues

prediction problem

When I tried to predict, this error occurred:

load model...
predicting...
predict.py:143: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
img = Variable(img[None,:,:,:],volatile=True)
D:\miniconda\lib\site-packages\torch\nn\functional.py:1332: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
File "predict.py", line 174, in <module>
result = predict_gpu(model,image_name)
File "predict.py", line 148, in predict_gpu
boxes,cls_indexs,probs = decoder(pred)
File "predict.py", line 89, in decoder
cls_indexs = torch.cat(cls_indexs,0) #(n,)
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated

Please guide me.
Thank you.

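Question: about the encoder part for the target

It seems that the box coordinates are encoded as positions relative to the grid cell that contains the box center.

But then how can this target be fed directly into the IoU computation?

Found a bug in yoloLoss.py.

Hello, I found a bug in yoloLoss.py and have submitted a pull request. I hope you can review it.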

Some pictures listed in the annotation txt files (voc2007.txt + voc2012.txt) are not in the image folder (2007trainval + 2012trainval)

E.g., 009963.jpg is listed in voc2007.txt as a training image, but it is actually in the voc2007test folder as a test image. So the dataloader (__getitem__) cannot load the picture, which causes the error 'NoneType' object has no attribute 'shape' while training the net.
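One way to catch this before training is to check every entry of the list file against the image folder. The sketch below is only an illustration: `split_by_existence` is a hypothetical helper, and it assumes each annotation line starts with the image file name followed by whitespace-separated fields.

```python
import os


def split_by_existence(list_lines, image_dir):
    """Split annotation lines into (found, missing) by whether the image exists.

    Assumption: the first whitespace-separated token of each line is the
    image file name, relative to image_dir.
    """
    found, missing = [], []
    for line in list_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        path = os.path.join(image_dir, parts[0])
        if os.path.isfile(path):
            found.append(line)
        else:
            missing.append(line)
    return found, missing
```

Writing only the `found` lines back to a new list file, or copying the `missing` images into the folder, should avoid the `None` image being handed to the dataset code.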

The training dataset is not exactly the same as in the original YOLO v1

Hi, I found that your training dataset consists of VOC2007 train/val + VOC2012 train/val, which is usually called VOC+. In VOC+, the 2012 train/val set has about 11k images, but yours has 17k, so your total training set is 22k (2007: 5k + 2012: 17k). Doesn't the original YOLO v1 use VOC+ with about 17k images in total (2007: 5.x k + 2012: 11.x k)?

I don't know what the problem is

Thank you for your good code.
I ran into this problem:

File "train.py", line 123, in <module>
for i,(images,target) in enumerate(train_loader):
File "D:\miniconda\lib\site-packages\torch\utils\data\dataloader.py", line 615, in __next__
batch = self.collate_fn([self.dataset[i] for i in indices])
File "D:\miniconda\lib\site-packages\torch\utils\data\dataloader.py", line 615, in <listcomp>
batch = self.collate_fn([self.dataset[i] for i in indices])
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\dataset.py", line 70, in __getitem__
img, boxes = self.random_flip(img, boxes)
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\dataset.py", line 257, in random_flip
im_lr = np.fliplr(im).copy()
File "C:\Users\vcvis\AppData\Roaming\Python\Python36\site-packages\numpy\lib\twodim_base.py", line 95, in fliplr
raise ValueError("Input must be >= 2-d.")
ValueError: Input must be >= 2-d.

What should I do?

Please guide me.
Thank you.

key errors: unexpected keys in state_dict

https://github.com/xiongzihua/pytorch-YOLO-v1/blob/0e5776a15e63f6d811c61a1b08f382bc41cff8c0/predict.py#L146

Hi, zihua:

After training, I ran predict.py and got an error saying that the keys are mismatched. Could you help me solve this bug? Thanks.

The error information:

RuntimeError: Error(s) in loading state_dict for VGG:
Missing key(s) in state_dict: "features.0.weight", "features.0.bias"...
Unexpected key(s) in state_dict: "module.features.0.weight", "module.features.0.bias"...
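The `module.` prefix on every unexpected key is what `torch.nn.DataParallel` adds when the state_dict of a wrapped model is saved, while the model being loaded here is not wrapped. A minimal sketch of one workaround is to rename the keys before loading; `strip_module_prefix` is a hypothetical helper name, and it works on any mapping, so it needs no PyTorch import.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        # Only strip the prefix when it is actually at the start of the key.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned
```

The cleaned mapping can then be passed to `model.load_state_dict(...)`. Wrapping the model in `torch.nn.DataParallel` before loading is the other common option.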

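Why was sigmoid added to the code?

In net.py and resnet.py, why is x = torch.sigmoid(x) added at the end of the forward function? The returned x should be compared with the label values to compute the loss, which is then backpropagated. Applying sigmoid amounts to a normalization, which clearly does not match the label values, right?

Does the NMS in predict.py run over all classes together?

From the code, the boxes of all classes go through NMS together. Why not run it separately for each class?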
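To make the difference concrete, here is a small pure-Python sketch of greedy NMS next to a per-class variant. It is not the code from predict.py; the function names, the `(x1, y1, x2, y2)` box format and the 0.5 threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold=0.5):
    """Greedy class-agnostic NMS; returns kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep


def per_class_nms(boxes, scores, labels, threshold=0.5):
    """Run NMS separately for each class label; returns sorted kept indices."""
    keep = []
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        kept = nms([boxes[i] for i in idx], [scores[i] for i in idx], threshold)
        keep.extend(idx[k] for k in kept)
    return sorted(keep)
```

With class-agnostic NMS, two heavily overlapping boxes of different classes suppress each other; with the per-class variant both survive. Which behaviour is preferable depends on the data.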

IndexError: invalid index of a 0-dim tensor

I ran into this problem when using predict.py. The error is located at

        i = order[0]
        keep.append(i)

After checking the code, I found that the key is to change the squeeze() call:

        # before
        ids = (ovr<=threshold).nonzero().squeeze()
        # after
        ids = (ovr<=threshold).nonzero().squeeze(-1)

This change solves the problem: when the nonzero() result has shape (1,1), squeeze(-1) returns a tensor of shape (1,), whereas squeeze() returns a 0-dim tensor.

ImportError: cannot import name 'queue' from 'torch._six' (/home/liqi/.local/lib/python3.8/site-packages/torch/_six.py)

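A questionable point about contain_loss, one of the components of yoloLoss

contain_loss = F.mse_loss(box_pred_response[:,4],box_target_response_iou[:,4],size_average=False)
box_pred_response[:,4] is the predicted confidence score of the box with the larger IoU,
box_target_response_iou[:,4] is the IoU value.
What does computing a loss from these two mean? I hope the author can clear this up.
I think this statement would be more appropriate in the following form:
contain_loss = F.mse_loss(box_pred_response[:,4],box_target_response[:,4],size_average=False)
This is just my personal view; I would still appreciate help from the author and from fellow programmers.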

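Loss curve plot

Could you share how your loss values change during training? I would like to compare them with mine. I am fine-tuning with resnet; why does my loss change so little?

Possible bug in the encode method of the dataset module

Hello:
I have a question about your encode code: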

        for i in range(cxcy.size()[0]):
            cxcy_sample = cxcy[i]
            ij = (cxcy_sample/cell_size).ceil()-1 #
            target[int(ij[1]),int(ij[0]),4] = 1
            target[int(ij[1]),int(ij[0]),9] = 1
            target[int(ij[1]),int(ij[0]),int(labels[i])+9] = 1
            xy = ij*cell_size # relative coordinates of the top-left corner of the matched grid cell
            delta_xy = (cxcy_sample -xy)/cell_size
            target[int(ij[1]),int(ij[0]),2:4] = wh[i]
            target[int(ij[1]),int(ij[0]),:2] = delta_xy
            target[int(ij[1]),int(ij[0]),7:9] = wh[i]
            target[int(ij[1]),int(ij[0]),5:7] = delta_xy

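I have no questions about the assignment of the class probabilities, but I do have some about the bbox assignment, and I hope you can answer them.

Consider the following bbox label layout:
it is likewise a 7 * 7 * 30 target

x1, y1, w1, h1, c1, x2, y2, w2, h2, c2 are the values of target[row, col, :10],
target[row, col, 10:] holds the class probabilities,

x1, y1 are the coordinates of the bbox center, w1, h1 are the width and height of the bbox, c1 is the confidence score from the paper, x2 and so on are the label of the second bbox, and so forth.

I see that you assign the same label to the first bbox and the second bbox, and the two confidence scores are the same as well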

            target[int(ij[1]),int(ij[0]),2:4] = wh[i]
            target[int(ij[1]),int(ij[0]),:2] = delta_xy
            target[int(ij[1]),int(ij[0]),7:9] = wh[i]
            target[int(ij[1]),int(ij[0]),5:7] = delta_xy

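My question is:

Suppose there are two objects whose bbox centers both fall into the same cell. Why are the labels and confidence scores of the two bboxes still set to the same values? Shouldn't one object's bbox label go to x1,y1,w1,h1 and the other object's bbox go to x2,y2,w2,h2?
Also, when a cell contains only one bbox center, why should c1 and c2 both be 1, with x1=x2, y1=y2, w1=w2, h1=h2?

Many thanks!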
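The quoted loop can be traced for a single box in plain Python. The sketch below mirrors the arithmetic of the snippet (cell index from `ceil(...) - 1`, offset relative to the cell's top-left corner, both box slots filled with the same values); `encode_single_box` is a hypothetical name, and `grid_num=7` and `num_classes=20` are assumptions. The class index `label + 9` is copied from the snippet and suggests 1-based labels.

```python
import math


def encode_single_box(cx, cy, w, h, label, grid_num=7, num_classes=20):
    """Encode one normalized box (center cx, cy and size w, h in [0, 1]).

    Returns (row, col, vec), where vec is the 30-value target of that cell.
    """
    # Cell index, following ij = (cxcy / cell_size).ceil() - 1 in the snippet.
    col = math.ceil(cx * grid_num) - 1
    row = math.ceil(cy * grid_num) - 1
    # Offset of the center from the cell's top-left corner, in cell units.
    dx = cx * grid_num - col
    dy = cy * grid_num - row
    vec = [0.0] * (10 + num_classes)
    vec[0:4] = [dx, dy, w, h]   # first box slot
    vec[4] = 1.0                # c1
    vec[5:9] = [dx, dy, w, h]   # second box slot, same values
    vec[9] = 1.0                # c2
    vec[label + 9] = 1.0        # class entry, label assumed to start at 1
    return row, col, vec
```

Because every assignment in the quoted loop targets the same `target[row, col]` entry, a second object whose center falls into the same cell would overwrite the box values of the first one under this encoding.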

target grid num is 14

I found that the target grid num is 14, which caused an error while training, so I changed it to 7.
What's more, the loss stops decreasing once it reaches 4.xx.
I use it for people detection, and mAP is only 0.08.

What is the directory layout of the dataset?

I would like to ask about the directory layout of the dataset, and what does this code mean?

    if isinstance(list_file, list):
        # Cat multiple list files together.
        # This is especially useful for voc07/voc12 combination.
        tmp_file = '/tmp/listfile.txt'
        os.system('cat %s > %s' % (' '.join(list_file), tmp_file))
        list_file = tmp_file
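The quoted snippet merges several annotation list files into one temporary file so that the rest of the loader only has to read a single list. It relies on the `cat` command and on a `/tmp` directory, neither of which exists on a stock Windows setup. A portable sketch using only the standard library could look like this; `concat_list_files` is a hypothetical helper, not part of the repository.

```python
import os
import tempfile


def concat_list_files(list_files):
    """Concatenate annotation list files into one temp file and return its path."""
    fd, tmp_path = tempfile.mkstemp(suffix=".txt", prefix="listfile_")
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        for path in list_files:
            with open(path, "r", encoding="utf-8") as src:
                text = src.read()
            out.write(text)
            # Unlike plain `cat`, make sure files do not run into each other.
            if text and not text.endswith("\n"):
                out.write("\n")
    return tmp_path
```

The returned path could be assigned to `list_file` in place of the hard-coded `/tmp/listfile.txt`. The caller is responsible for deleting the temporary file afterwards.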

About BatchNormalization

Hi, Thank you for your reproducible code about Yolov1.

I was wondering about the structure of your resnet_yolo.py

def forward(self, x):
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.maxpool(x)

    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    x = self.layer4(x)
    x = self.layer5(x)
    # x = self.avgpool(x)
    # x = x.view(x.size(0), -1)
    # x = self.fc(x)
    x = self.conv_end(x)
    x = self.bn_end(x)
    x = F.sigmoid(x) # normalize to 0-1
    # x = x.view(-1,7,7,30)
    x = x.permute(0,2,3,1) #(-1,7,7,30)

Why is there a 'self.bn_end(x)' at the end of the network?
Is it for faster convergence, and is it critical for the performance?

After training for 5 epochs, the x2 of the predicted bbox is actually smaller than x1

ValueError: Input must be >= 2-d.

I tried to run train.py without CUDA, but got this error

pre-train model

Do you have a pre-trained model(by yourself)?

Running eval_voc: Expected 4-dimensional input for 4-dimensional weight 64 3 7 7, but got 3-dimensional input of size [3, 448, 448] instead

Expected 4-dimensional input for 4-dimensional weight 64 3 7 7, but got 3-dimensional input of size [3, 448, 448] instead
After I added the line marked below, even more problems appeared. Thanks a lot!

    def predict_gpu(model,image_name,root_path=''):

        result = []
        image = cv2.imread(root_path+image_name)
        h,w,_ = image.shape
        img = cv2.resize(image,(448,448))
        img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
        mean = (123,117,104)#RGB
        img = img - np.array(mean,dtype=np.float32)

        transform = transforms.Compose([transforms.ToTensor(),])
        img = transform(img)
        img = img.cuda()

        img = torch.unsqueeze(img, dim=0)  # the line I added

        pred = model(img) #1x7x7x30
        pred = pred.cpu()
        boxes,cls_indexs,probs =  decoder(pred)

        for i,box in enumerate(boxes):
            x1 = int(box[0]*w)
            x2 = int(box[2]*w)
            y1 = int(box[1]*h)
            y2 = int(box[3]*h)
            cls_index = cls_indexs[i]
            cls_index = int(cls_index) # convert LongTensor to int
            prob = probs[i]
            prob = float(prob)
            result.append([(x1,y1),(x2,y2),VOC_CLASSES[cls_index],image_name,prob])
        return result

The coefficients in the loss function seem to be off

(self.l_coord * loc_loss + 2 * contain_loss + not_contain_loss + self.l_noobj * nooobj_loss + class_loss) / N

Why is the coefficient in front of contain_loss 2?
not_contain_loss should have self.l_noobj as its coefficient
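For comparison, the loss as written in the original YOLO paper is reproduced below. It is recalled from the paper, not derived from this repository's yoloLoss.py, so the mapping between its terms and the variable names in the line above is left open. Here $S$ is the grid size, $B$ the number of boxes per cell, $\mathbb{1}_{ij}^{\text{obj}}$ marks the box $j$ in cell $i$ that is responsible for an object, and the paper sets $\lambda_{\text{coord}} = 5$ and $\lambda_{\text{noobj}} = 0.5$.

```latex
\begin{aligned}
L ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
        \mathbb{1}_{ij}^{\text{obj}}
        \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
     &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
        \mathbb{1}_{ij}^{\text{obj}}
        \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
             + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
     &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
        \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
     &+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
        \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
     &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
        \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

In the paper the confidence term for responsible boxes carries a weight of 1, and $\lambda_{\text{noobj}}$ is the only weight applied to confidence errors of boxes without an object.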

eval error

C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master>python eval_voc.py
---prepare target---
---start test---
0%| | 0/4951 [00:00<?, ?it/s]C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py:143: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
img = Variable(img[None,:,:,:],volatile=True)
D:\miniconda\lib\site-packages\torch\nn\functional.py:1332: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
0%| | 4/4951 [00:01<55:21, 1.49it/s]
Traceback (most recent call last):
File "eval_voc.py", line 186, in <module>
result = predict_gpu(model,image_path,root_path='./VOCdevkit/VOC2012/JPEGImages/') #result[[left_up,right_bottom,class_name,image_path],]
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py", line 148, in predict_gpu
boxes,cls_indexs,probs = decoder(pred)
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py", line 90, in decoder
keep = nms(boxes,probs)
File "C:\Users\vcvis\Desktop\pytorch-YOLO-v1-master\predict.py", line 107, in nms
i = order[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

please help

Is there a problem with this operation?

box1_xyxy[:, :2] = box1[:, :2] / 14. - 0.5 * box1[:, 2:4]
box1_xyxy[:, 2:4] = box1[:, :2] / 14. + 0.5 * box1[:, 2:4]
box2 = box_target[i].view(-1, 5)
box2_xyxy = Variable(torch.FloatTensor(box2.size()))
box2_xyxy[:, :2] = box2[:, :2] / 14. - 0.5 * box2[:, 2:4]
box2_xyxy[:, 2:4] = box2[:, :2] / 14. + 0.5 * box2[:, 2:4]

The predicted xywh here should all be in [0, 1], so dividing by 14 seems meaningless, right?
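One possible reading, based on the encode snippet quoted in another issue on this page: x and y are offsets inside a grid cell measured in cell units, while w and h are fractions of the whole image. Dividing the offset by the number of cells per side puts it in the same image-relative units as w and h, and `0.5 * wh` is half the box size, which turns a center into two corners. The sketch below restates the quoted lines for one box under that assumption; `offset_wh_to_xyxy` is a hypothetical name.

```python
def offset_wh_to_xyxy(box, grid_num=14):
    """Convert (dx, dy, w, h) to (x1, y1, x2, y2) in image-relative units.

    dx, dy: center offset inside the cell, in cell units (assumed).
    w, h:   box size as a fraction of the image (assumed).
    The result is relative to the cell's top-left corner, because the
    cell origin is not added.
    """
    dx, dy, w, h = box
    cx, cy = dx / grid_num, dy / grid_num
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)
```

Leaving out the cell origin shifts both boxes by the same amount, and IoU does not change under a common translation, so this would not affect the IoU between a prediction and a target that belong to the same cell.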

question : Yolo v1 confidence and probability class map

Hello,

I am trying to understand YOLO v1 in detail, but I have some questions about the confidence and the class probability. The confidence is equal to: ground truth label * IOU(pred, truth). To find the ground truth label, we just need an array of size 7x7 and set a cell to 1 if the center of an object in the dataset is inside that cell. But I have some doubts about computing the IOU. Do you compute the IOU only when the center of the predicted box and the center of the object (truth) are INSIDE the same cell?
I also have a question regarding P(class|object): if there is no object in a cell, or there are multiple objects, which label do you return during the training step?

Thank you for the help!

IndexError: list index out of range

    num_faces = int(splited[1])
IndexError: list index out of range

I changed file_root and test_root in train.py,
then ran train.py, and the error occurred.

I also want to ask what the role of dataset.py is.
Thanks for your nice work!

Why is the confidence of the results so low, around 0.2 or 0.3?

choose the best iou box

box1_xyxy[:,:2] = box1[:,:2]/14. -0.5*box1[:,2:4]

What does that code mean? Why divide by 14 and multiply by 0.5?
Thank you.

predict.py error

Error when I run eval_voc.py

Traceback (most recent call last):
File "eval_voc.py", line 164, in <module>
result = predict_gpu(model,image_path,root_path='E:/yolov1/pytorch_yolov1/data/VOCdevkit/VOC2007/JPEGImages/') #result[[left_up,right_bottom,class_name,image_path],]
File "E:\yolov1\yolores\predict.py", line 126, in predict_gpu
boxes,cls_indexs,probs = decoder(pred)
File "E:\yolov1\yolores\predict.py", line 46, in decoder
if mask[i,j,b] == 1:
IndexError: index 7 is out of bounds for dimension 1 with size 7

After I ran eval_voc.py, it reported the error above. How can I solve it?

about performance

I trained the network with the original code, using vgg16 as the backbone. After 50 epochs (original parameters) mAP reached 44.6%, and after 70 epochs it reached 49.8%. The numbers themselves look reasonable, but I don't know why there is still a large gap from the performance in your README.

(50 epochs, left: training loss, right: val loss)

(70 epochs, left: training loss, right: val loss)

about grid_num=14

I don't understand why grid_num is 14 in the code, and why there is a division by 14 at yoloLoss.py line 88.

Can't find the listfile.txt

Excuse me, in yolodataset there is a string variable tmp_file set to '/tmp/listfile.txt'.
Could you explain how to use it? Thanks.

During training the loss dropped to 1.6, but predict detects nothing. What is going on?

The output is actually 14×14, but the comments all say 7×7.
eval_voc is even stranger:
---start evaluate---
---class aeroplane ap 0.0---
---class bicycle ap -1---
---map -0.5---

Why did you use a sigmoid activation in the last layer instead of a linear activation?

Line 61 in net.py

about dataset.py line 94

Hello,
Below is the source code. I added comments, and my question is in the comments.

boxes /= torch.Tensor([w,h,w,h]).expand_as(boxes)
img = self.BGR2RGB(img)
img = self.subMean(img,self.mean) 

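# The network requires a fixed input size, so the image is resized here, but after this,
#  shouldn't the bbox values be adjusted as well?
#  At this point img and bboxes no longer match (I checked self.encoder specifically, and it has no such operation either)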
img = cv2.resize(img,(self.image_size,self.image_size))
target = self.encoder(boxes,labels)
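A point that may be relevant here: the first quoted line divides the boxes by `[w, h, w, h]`, so from then on they are fractions of the image size rather than pixel values. Fractions of that kind do not change when the image is resized, even if the aspect ratio changes. The small check below illustrates this with made-up numbers; the helper names are hypothetical.

```python
def normalize_box(box, width, height):
    """Express a pixel box (x1, y1, x2, y2) as fractions of the image size."""
    x1, y1, x2, y2 = box
    return (x1 / width, y1 / height, x2 / width, y2 / height)


def resize_box(box, old_size, new_size):
    """Scale a pixel box the same way the image is scaled by a resize."""
    (old_w, old_h), (new_w, new_h) = old_size, new_size
    sx, sy = new_w / old_w, new_h / old_h
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


# A box in a 500x375 image, normalized before the resize ...
box = (50.0, 100.0, 250.0, 300.0)
before = normalize_box(box, 500, 375)
# ... and the same box after resizing the image to 448x448.
after = normalize_box(resize_box(box, (500, 375), (448, 448)), 448, 448)
same = all(abs(a - b) < 1e-9 for a, b in zip(before, after))
```

Here `same` evaluates to true, which is why normalized boxes would not need a further adjustment after `cv2.resize`. This only holds for a plain resize; cropping or padding would require updating the boxes.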

The network predicts absolute value of xy instead of offset to the grid cell as specified in paper. Why is it so?

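With your resnet50 as the backbone, an input image of (3,448,448) does not give an output of shape (7,7,30)!

Question about the performance of the vgg16_bn version

Background
Pretrained vgg16_bn

After 110 epochs my mAP is only 52% and is hard to improve further, but with the pretrained resnet50, 50 epochs already gives 67%. I don't know where the problem is.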

help

Can you give me the trained file: best.pth?

Could you provide the pretrained model file? Thanks!

Could you provide the pretrained model file? Thanks!
Email: [email protected]

I don't know what is wrong

I use PyTorch 1.0.
I encountered some warnings and errors.
I don't know whether they are important. Maybe the logic became wrong when I tried to correct them.
Here are the warnings and errors:

1 UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
I replaced F.sigmoid() with torch.sigmoid() in resnet_yolo.py and net.py.
2 UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
I replaced size_average=False with reduction='sum' in yoloLoss.py.
3 IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
I replaced loss.data[0] with loss.item() in train.py.
4 UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead. images = Variable(images,volatile=True)
I just changed it to images = images.detach().
I don't know if that is right.

My result is bad.
Can anyone tell me why?
Thanks

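Cannot get any bbox

After training the model, I tried to predict bboxes. But when predicting, that is, when running predict.py to load the trained model best.pth, I cannot get any bbox. Looking at the code, I found that at the line mask1 = contain > 0.1 # greater than the threshold, mask1[:,:,0] is all 0, as shown below: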

tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=torch.uint8)

Where might my problem be?

best.pth

It would be even better if best.pth could be provided!