microsoft / singleshotpose

This research project implements a real-time object detection and pose estimation method as described in the paper, Tekin et al. "Real-Time Seamless Single Shot 6D Object Pose Prediction", CVPR 2018. (https://arxiv.org/abs/1711.08848).

License: MIT License

Python 84.31% Jupyter Notebook 15.69%

singleshotpose's Introduction

SINGLESHOTPOSE

This is the development version of the code for the following paper:

Bugra Tekin, Sudipta N. Sinha and Pascal Fua, "Real-Time Seamless Single Shot 6D Object Pose Prediction", CVPR 2018.

The original repository with the codebase for the above paper can be found at the following link.

Introduction

We propose a single-shot approach for simultaneously detecting an object in an RGB image and predicting its 6D pose without requiring multiple stages or having to examine multiple hypotheses. The key component of our method is a new CNN architecture inspired by the YOLO network design that directly predicts the 2D image locations of the projected vertices of the object's 3D bounding box. The object's 6D pose is then estimated using a PnP algorithm. Paper, arXiv
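
As a rough illustration of the final step (a sketch for this document, not the repository's own implementation), the 6D pose can be recovered from the predicted 2D control points with OpenCV's generic PnP solver; the function and variable names below are illustrative:

import numpy as np
import cv2

def estimate_pose(corners_3d, corners_2d, K):
    # corners_3d: (9, 3) object-frame control points (centroid + 8 box corners), in meters
    # corners_2d: (9, 2) predicted image locations of those points, in pixels
    # K: (3, 3) camera intrinsic matrix
    _, rvec, tvec = cv2.solvePnP(corners_3d.astype(np.float64),
                                 corners_2d.astype(np.float64),
                                 K.astype(np.float64), None,
                                 flags=cv2.SOLVEPNP_ITERATIVE)
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 rotation matrix
    return R, tvec               # estimated 6D pose: rotation and translation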

SingleShotPose

Citation

If you use this code, please cite the following

@inproceedings{tekin18,
      TITLE = {{Real-Time Seamless Single Shot 6D Object Pose Prediction}},
      AUTHOR = {Tekin, Bugra and Sinha, Sudipta N. and Fua, Pascal},
      BOOKTITLE = {CVPR},
      YEAR = {2018}
}

License

SingleShotPose is released under the MIT License (refer to the LICENSE file for details).

Environment and dependencies

The code is tested on Windows with CUDA v8 and cuDNN v5.1. The implementation is based on PyTorch 0.4.1 and tested on Python 3.6. The code requires the following dependencies, which can be installed with conda or pip: numpy, scipy, PIL, opencv-python. For an earlier version that is compatible with PyTorch 0.3.1 and tested on Python 2.7, please see the py2 folder.
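
A typical setup might look like the following (a sketch; PIL is provided by the Pillow package on current installs, and the PyTorch 0.4.1 build matching your CUDA version is best obtained from pytorch.org):

pip install numpy scipy Pillow opencv-python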

Downloading and preparing the data

Inside the main code directory, run the following commands to download and extract (1) the preprocessed LINEMOD dataset, (2) trained models for the LINEMOD dataset, (3) the trained model for the OCCLUSION dataset, and (4) background images from the VOC2012 dataset, respectively.

wget -O LINEMOD.tar --no-check-certificate "https://onedrive.live.com/download?cid=05750EBEE1537631&resid=5750EBEE1537631%21135&authkey=AJRHFmZbcjXxTmI"
wget -O backup.tar --no-check-certificate "https://onedrive.live.com/download?cid=0C78B7DE6C569D7B&resid=C78B7DE6C569D7B%21191&authkey=AP183o4PlczZR78"
wget -O multi_obj_pose_estimation/backup_multi.tar --no-check-certificate  "https://onedrive.live.com/download?cid=05750EBEE1537631&resid=5750EBEE1537631%21136&authkey=AFQv01OSbvhGnoM"
wget https://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar
wget https://pjreddie.com/media/files/darknet19_448.conv.23 -P cfg/
tar xf LINEMOD.tar
tar xf backup.tar
tar xf multi_obj_pose_estimation/backup_multi.tar -C multi_obj_pose_estimation/
tar xf VOCtrainval_11-May-2012.tar

Alternatively, you can go directly to the links above and manually download and extract the files into the corresponding directories. The whole download process might take a while (~60 minutes). Please also be aware that access to OneDrive might be limited in some countries.

Training the model

To train the model, run:

python train.py --datacfg [path_to_data_config_file] --modelcfg [path_to_model_config_file] --initweightfile [path_to_initialization_weights] --pretrain_num_epochs [number_of_epochs to pretrain]

e.g.

python train.py --datacfg cfg/ape.data --modelcfg cfg/yolo-pose.cfg --initweightfile cfg/darknet19_448.conv.23 --pretrain_num_epochs 15

if you would like to start from ImageNet initialized weights, or

python train.py --datacfg cfg/ape.data --modelcfg cfg/yolo-pose.cfg --initweightfile backup/duck/init.weights

if you would like to start with a model already pretrained on LINEMOD, for faster convergence.

[datacfg] contains information about the training/test splits, 3D object models and camera parameters.

[modelcfg] contains information about the network structure.

[initweightfile] contains the initialization weights. The file darknet19_448.conv.23 in the cfg/ folder contains the network weights pretrained on ImageNet, while the weights "backup/[OBJECT_NAME]/init.weights" are pretrained on LINEMOD for faster convergence. We found it effective to first pretrain the model without confidence estimation and then fine-tune the network with confidence estimation as well; "init.weights" contains the weights of these pretrained networks. You can also still train the network from the cruder ImageNet initialization, but this usually results in a slower and sometimes slightly worse convergence.
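
For reference, a data configuration file looks roughly like the following (these entries are quoted from cfg/ape.data as it appears elsewhere in this page; the camera parameters (fx, fy, u0, v0, width, height) mentioned in the tips section further below are set in the same file):

train = LINEMOD/ape/train.txt
valid = LINEMOD/ape/test.txt
backup = backup/ape
mesh = LINEMOD/ape/ape.ply
tr_range = LINEMOD/ape/training_range.txt
name = ape
diam = 0.103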

At the start of the training you will see an output like this:

layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32
    1 max          2 x 2 / 2   416 x 416 x  32   ->   208 x 208 x  32
    2 conv     64  3 x 3 / 1   208 x 208 x  32   ->   208 x 208 x  64
    3 max          2 x 2 / 2   208 x 208 x  64   ->   104 x 104 x  64
    ...
   30 conv     20  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x  20
   31 detection

This defines the network structure. During training, the best network model is saved into the "model.weights" file. To train networks for other objects, just change the object name when calling the train function, e.g., "python train.py --datacfg cfg/duck.data --modelcfg cfg/yolo-pose.cfg --initweightfile backup/duck/init.weights". If you come across GPU memory errors while training, try lowering the batch size, to for example 16 or 8, to fit into memory. The open-source version of the code has undergone substantial refactoring, and some models had to be retrained. The retrained models that we provide do not differ much from the initial results (sometimes slightly worse and sometimes slightly better).

Testing the model

To test the model, run:

python valid.py --datacfg [path_to_data_config_file] --modelcfg [path_to_model_config_file] --weightfile [path_to_trained_model_weights]

e.g.

python valid.py --datacfg cfg/ape.data --modelcfg cfg/yolo-pose.cfg --weightfile backup/ape/model_backup.weights

You could also use valid.ipynb to test the model and visualize the results.

Multi-object pose estimation on the OCCLUSION dataset

Inside the multi_obj_pose_estimation/ folder:

Testing:

python valid_multi.py cfgfile weightfile

e.g.

python valid_multi.py cfg/yolo-pose-multi.cfg backup_multi/model_backup.weights

Training:

python train_multi.py datafile cfgfile weightfile

e.g.,

python train_multi.py cfg/occlusion.data cfg/yolo-pose-multi.cfg backup_multi/init.weights

Label files

Our label files consist of 21 ground-truth values. We predict 9 points corresponding to the centroid and the corners of the 3D object model. Additionally, we predict the class in each cell. That makes 9x2+1 = 19 numbers. In multi-object training, we assign whichever anchor box has the most similar size to the current object as the one responsible for predicting the 2D coordinates of that object. To encode the size of the objects, we have 2 additional numbers for the range in the x and y dimensions. Therefore, we have 9x2+1+2 = 21 numbers.

Respectively, 21 numbers correspond to the following: 1st number: class label, 2nd number: x0 (x-coordinate of the centroid), 3rd number: y0 (y-coordinate of the centroid), 4th number: x1 (x-coordinate of the first corner), 5th number: y1 (y-coordinate of the first corner), ..., 18th number: x8 (x-coordinate of the eighth corner), 19th number: y8 (y-coordinate of the eighth corner), 20th number: x range, 21st number: y range.

The coordinates are normalized by the image width and height: x / image_width and y / image_height. This is useful to have similar output ranges for the coordinate regression and object classification tasks.
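
As an illustration (a sketch written for this document, not code from the repository), a single 21-number label line can be parsed and the points mapped back to pixel coordinates as follows:

def parse_label_line(line, image_width, image_height):
    # One line = class label, 9 normalized (x, y) pairs (centroid first, then the
    # 8 corners of the 3D bounding box), and the normalized x and y ranges.
    values = [float(v) for v in line.split()]
    class_id = int(values[0])
    points = [(values[1 + 2*i] * image_width, values[2 + 2*i] * image_height)
              for i in range(9)]
    x_range = values[19] * image_width    # object extent in x
    y_range = values[20] * image_height   # object extent in y
    return class_id, points, x_range, y_range

For the provided LINEMOD images, image_width and image_height are 640 and 480.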

Tips for training on your own dataset

We train and test our models on the LINEMOD dataset using the same train/test splits as the BB8 method to validate our approach. If you would like to train a model on your own dataset, you can create the same folder structure as the provided LINEMOD dataset and adjust the paths in the cfg/[OBJECT].data, [DATASET]/[OBJECT]/train.txt and [DATASET]/[OBJECT]/test.txt files. The folder for each object should contain the following (a sketch of the resulting layout follows the list):

(1) a folder containing image files,
(2) a folder containing label files (Please refer to this link for a detailed explanation on how to create labels. You could also find third-party ObjectDatasetTools toolbox useful to create ground-truth labels for 6D object pose estimation),
(3) a text file containing the filenames for training images (train.txt),
(4) a text file containing the filenames for test images (test.txt),
(5) a .ply file containing the 3D object model (the object model is given in meters),
(6) optionally, a folder containing segmentation masks (if you want to change the background of your training images to be more robust to diverse backgrounds, this is essential for better generalization).
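
For orientation, the layout for a new object would mirror the provided LINEMOD folders, roughly as below (MYDATASET and MYOBJECT are placeholders; the folder names follow the LINEMOD convention used elsewhere on this page):

MYDATASET/MYOBJECT/
├── JPEGImages/          # (1) image files
├── labels/              # (2) one 21-number label file per image
├── mask/                # (6) optional segmentation masks
├── train.txt            # (3) filenames of the training images
├── test.txt             # (4) filenames of the test images
├── training_range.txt   # referenced by the tr_range entry in the data config
└── MYOBJECT.ply         # (5) the 3D object model, in meters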

Please also make sure to adjust the following values in the data and model configuration files according to your needs:

  • You should change the "diam" value in the data configuration file to the diameter of the object model at hand.
  • Depending on the size and variability of your training data, the learning rate schedule (steps, scales, max_epochs parameters in the yolo-pose.cfg file) and some data augmentation parameters (jitter, hue, saturation, exposure parameters in dataset.py) might also need to be adjusted for a better convergence on your dataset.
  • For multiple-object pose estimation, you should also pre-compute anchor values using the procedure described in Section 3.2 of the paper and specify them in the model configuration file (yolo-pose-multi.cfg). Please also make sure to use the correct number of classes and specify it in yolo-pose-multi.cfg.
  • You should further change the image size and camera parameters (fx, fy, u0, v0, width, height) in the data configuration files to the ones specific to your dataset (a sketch of the resulting intrinsic matrix follows this list).
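
For reference, the code represents the intrinsics as a 3x3 matrix in the usual pinhole form; the sketch below is a parameterized variant of the get_camera_intrinsic function in utils.py, which hard-codes the LINEMOD values shown here as defaults:

import numpy as np

def make_camera_intrinsic(fx=572.4114, fy=573.5704, u0=325.2611, v0=242.0489):
    # K = [[fx, 0, u0], [0, fy, v0], [0, 0, 1]]
    K = np.zeros((3, 3), dtype='float64')
    K[0, 0], K[0, 2] = fx, u0
    K[1, 1], K[1, 2] = fy, v0
    K[2, 2] = 1.0
    return K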

While creating a training dataset, sampling a large number of viewpoints/distances and modeling a large variability of illumination/occlusion/background settings is important for increasing the generalization ability of the approach on your dataset. If you would like to adjust some model & loss parameters (e.g. the weighting factor for different loss terms) for your own purposes, you can do so in the model configuration file (yolo-pose.cfg).

Acknowledgments

The code is written by Bugra Tekin and is built on the YOLOv2 implementation of the GitHub user @marvis.

Contact

For any questions or bug reports, please contact Bugra Tekin.

singleshotpose's People

Contributors

btekin, microsoft-github-policy-service[bot], microsoftopensource, msftgits, snsinha


singleshotpose's Issues

About Label Files

#37
@btekin, @eyildiz-ugoe. Hi, you said we can project the 3D keypoints to 2D by using the compute_projection function you provided. However, do I have to use proj_2d_gt = compute_projection(corners3D, Rt_gt, internal_calibration) instead of proj_2d_gt = compute_projection(Vertices, Rt_gt, internal_calibration), since the label is the 2D projection of the 3D bounding box's eight corners? By the way, how do I get the 2D projection of the object's centroid (0,0,0)?

Thank you

some questions about the paper

Hi, Mr. Tekin

I am reading your paper (Real-Time Seamless Single Shot 6D Object Pose Prediction) and I have some questions. This is my first time dealing with an object pose estimation problem. In your paper, you mentioned 9 control points, but YOLO can only regress the 2D bounding box of an object and its class. Previously I was working on SSD; I am not very familiar with YOLO, but I think YOLO and SSD are similar. Here are the questions:

  1. How do you predict the projections of the 3D bounding box corners in the image using YOLO, since YOLO can only regress the 2D bounding box (the object's location in the image) and its class?

  2. You selected 9 control points (8 corner points and 1 center point). How do you get these 9 points, and the 3D ground-truth points?

  3. What data is known? For example, the dimensions of the objects, the depth, the camera intrinsics, the point cloud data?

IOError: [Errno 2] No such file or directory: 'LINEMOD/ape/train.txt'

I have downloaded all the required materials. When I try to train the model (python train.py cfg/ape.data cfg/yolo-pose.cfg /data1/edison/realtime_seamless_6D_data_and_model/dataset/backup/ape/init.weights), I get this error:

File "train.py", line 289, in
nsamples = file_lines(trainlist)
File "/home/edison/realtime_seamless_yolo_6D/singleshotpose/utils.py", line 1006, in file_lines
thefile = open(thefilepath, 'rb')
IOError: [Errno 2] No such file or directory: 'LINEMOD/ape/train.txt'

could you please tell me how to solve this?? Thank you

Preprocessing LINEMOD Dataset Query

Hello Authors,

Thank you so much for making the code available in public domain. It is very well written and intuitive.

I am sure you might have got the data from http://far.in.tum.de/Main/StefanHinterstoisser
Now, how did you:

  1. Create the "masks" of the original data.

  2. Come up with the labels (8 coordinates and the centroid) of the object in the images, since the objects are not always at the center of the image. This would have been fairly easy if you had used Blender on the mesh file to create synthetic data, but how did you achieve the same thing on the original images?

Please do help understand these concepts.

Thanks.

Visualization

Thank you for publishing this awesome work, I was wondering if there is any "easy" way to show the results, like having boxes around the tested images as done in the paper?

I could not find it and it was not mentioned in the readme; maybe this is a feature request if it is not available at the moment.

Issue 24 Reopened: Preprocessing LINEMOD Dataset Query

Hey @btekin ,

Apologies for missing out on your quick response on the Issue, I had some queries on the same but the issue was closed by you so I took the liberty of opening a new issue with the same name, hope you don't mind.

Just for confirmation: the forward projection in part 2 of my question is clear, but the position of those forward-projected points in the 2D images is obtained after we have the mask, right? Since the exact coordinates of the object center in the dataset are not given and the images are not centered on the object.

In short, in 2. the labels are the forward projection of the 3D bounding box, but how we obtain their coordinates on the 2D image (not w.r.t. each other), given that the images are not object-centered, is still confusing.

Say this is my mesh :

snapshot_ape00

This is my fwd projection of 3D coordinates into two 2D(I have the 8 2D coordinates too):

0001

But how does this help with my training image label (the 8 coordinates in the following image), where I am not aware which part of the image the ape object is actually in?

000064

Exception NameError: "global name 'FileNotFoundError' is not defined"

Hi, running the following command on Ubuntu 16.04:

python train.py cfg/ape.data cfg/yolo-pose.cfg backup/ape/init.weights

and getting:

/home/user/.local/lib/python2.7/site-packages/torch/cuda/__init__.py:114: UserWarning: 
    Found GPU0 TITAN V which requires CUDA_VERSION >= 9000 for
     optimal performance and fast startup time, but your PyTorch was compiled
     with CUDA_VERSION 8000. Please install the correct PyTorch binary
     using instructions from http://pytorch.org
    
  warnings.warn(incorrect_binary_warn % (d, name, 9000, CUDA_VERSION))
2018-07-11 13:05:37 epoch 0, processed 0 samples, lr 0.000100
Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f5b7db5b8d0>> ignored
Traceback (most recent call last):
  File "train.py", line 399, in <module>
    niter = train(epoch)
  File "train.py", line 89, in train
    output = model(data)
  File "/home/user/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/workspace/singleshotpose/darknet.py", line 91, in forward
    x = self.models[ind](x)
  File "/home/user/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/.local/lib/python2.7/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/user/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/.local/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDNN_STATUS_EXECUTION_FAILED

Any thoughts?

AttributeError: 'str' object has no attribute 'read'

edison@amax2:~/data1_edison/singleshotpose$ python train.py cfg/wangwang.data cfg/yolo-pose-pre.cfg cfg/darknet19_448.conv.23

2018-10-08 14:44:10 epoch 0, processed 0 samples, lr 0.000100
Traceback (most recent call last):
  File "train.py", line 411, in <module>
    niter = train(epoch)
  File "train.py", line 70, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/home/edison/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "/home/edison/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
AttributeError: Traceback (most recent call last):
  File "/home/edison/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/data1/edison/singleshotpose/dataset.py", line 75, in __getitem__
    img, label = load_data_detection(imgpath, self.shape, jitter, hue, saturation, exposure, bgpath)
  File "/data1/edison/singleshotpose/image.py", line 178, in load_data_detection
    img = Image.open(imgpath).convert('RGB')
  File "/home/edison/anaconda2/lib/python2.7/site-packages/PIL/Image.py", line 2549, in open
    fp = io.BytesIO(fp.read())
AttributeError: 'str' object has no attribute 'read'
image

Can you please help me? Thank you

Kind of datasets the approach would work

I was wondering about the application domain and dataset type of the approach. Suppose that our scenes heavily contain metal objects (3D models and 2D images are known to us, of course) which may be a bit shiny or similar to each other (after all, they would all have a metallic texture).

In such cases, would the approach actually perform within the acceptable limits?

Running out of memory during training

First off, thanks for making your code available.

When running the training example I run into the following issue:

...
12808: nGT 32, recall 32, proposals 76, loss: x 0.398651, y 0.639343, conf 0.127928, total 1.165922
12834: nGT 26, recall 26, proposals 53, loss: x 0.760645, y 1.274382, conf 0.700860, total 2.735886
2018-09-12 17:44:04 epoch 69, processed 12834 samples, lr 0.001000
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1525796793591/work/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 396, in <module>
    niter = train(epoch)
  File "train.py", line 86, in train
    output = model(data)
  File "/home/foo/miniconda2/envs/deeplearning/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/foo/projects/pose-detection/detectors/singleshotpose/darknet.py", line 91, in forward
    x = self.models[ind](x)
  File "/home/foo/miniconda2/envs/deeplearning/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/foo/miniconda2/envs/deeplearning/lib/python2.7/site-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/home/foo/miniconda2/envs/deeplearning/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/foo/miniconda2/envs/deeplearning/lib/python2.7/site-packages/torch/nn/modules/batchnorm.py", line 37, in forward
    self.training, self.momentum, self.eps)
  File "/home/foo/miniconda2/envs/deeplearning/lib/python2.7/site-packages/torch/nn/functional.py", line 1013, in batch_norm
    return f(input, weight, bias)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1525796793591/work/torch/lib/THC/generic/THCStorage.cu:58

How much memory is required for that particular example? I am using a GTX 1070 with 8 GB of memory, CUDA 8.0 and cuDNN 7.1.3, 8 GB of RAM and PyTorch 0.3.1. Thanks.

Preparing label file for our own training data

I am going to try out some images that contain a holepuncher which is identical to the one in LINEMOD. However, for this to work, I need to have its label file as the requirements state:

(2) a folder containing label files (labels should be created using the same output representation explained above),

I am a bit confused here. How and why do I have to have its label file in the first place? I am only trying to test the detection here. I mean, if I need to create a label file to detect an object, and if I need to enter 21 values in that file, I pretty much do all the work anyway; what is left for the code to do? Moreover, the output representation talks about "prediction of values", so that also kind of confuses me. What is there for me to enter if the system "predicts"?

I guess I am a bit lost here. Could someone enlighten me a bit?

Training on your own dataset

One other impressive thing which could really help people reproduce the results would be a guide for training the network from scratch on their own dataset. My question would then be the following:

Would this mean the users who want to give it a try have to have:

  1. The JPEG training images (rgb) and the ground truth (binary) masks of the objects, organized as neatly as the objects now in the dataset such as ape, benchvise, cat, etc. (similar folder structure obviously)

  2. The necessary lists for training and testing (train.txt, test.txt)

  3. The 3D model of the object (.ply)

Would they need anything else to get the system running on their own training dataset? I see something called "training range" which I am not sure what it is for, but I think you can answer the question better than I can speculate, hence the reason why I opened this issue.

Could you perhaps write a small guide for us so that we could utilize the work on our own dataset? To see how it would perform in other users' own domains is something one has to consider, naturally. This would also make the paper cited by way more people if they could just try and see how it performs on their own dataset.

IndexError: list index out of range

edison@amax2:~/data1_edison/singleshotpose$ python train.py
Traceback (most recent call last):
File "train.py", line 280, in
datacfg = sys.argv[1]
IndexError: list index out of range

can you please help me?
thank you

question about dataset

What do these values mean? For example,

0 0.413748 0.375966 0.450491 0.435113 0.451697 0.337961 0.381431 0.435781 0.378852 0.337811 0.447729 0.390781 0.448722 0.297217 0.382641 0.391062 0.380281 0.296740 0.072845 0.139041

(in /singleshotpose/LINEMOD/ape/labels/)

can you please help me? thank you

AssertionError

Hi @btekin, I am trying to train and test my own dataset. I only train and test one object, a toy bear. However, after training, when I want to validate my object I get this problem. Could you please help me? Thank you.

john@laptop:~/john/singleshotpose$ python valid.py cfg/bear_toy.data cfg/yolo-pose-pre.cfg backup/bear_toy/model_backup.weights

Traceback (most recent call last):
  File "valid.py", line 275, in <module>
    valid(datacfg, cfgfile, weightfile, outfile)
  File "valid.py", line 139, in valid
    all_boxes = get_region_boxes(output, conf_thresh, num_classes)
  File "/john/singleshotpose/utils.py", line 338, in get_region_boxes
    assert(output.size(1) == (19+num_classes)*anchor_dim)
AssertionError

I have printed the values of (output, conf_thresh, num_classes):

output:
(0 ,0 ,.,.) =
-2.6282e-02 -1.1638e-01 3.1015e-02 ... 1.1922e-01 1.1074e-01 5.2097e-03
-3.1200e-02 -1.2207e-01 6.0752e-02 ... 4.6943e-02 3.4072e-02 8.3957e-04
1.1783e-01 -1.5114e-02 2.2526e-02 ... -4.3052e-02 -1.4134e-02 8.0588e-03
... ⋱ ...
-3.6536e-02 -9.4302e-02 -1.2103e-01 ... 6.7788e-03 -2.7891e-02 -2.3125e-02
5.8908e-02 -2.4154e-02 1.8790e-02 ... -3.0124e-02 -3.6338e-02 -9.2290e-02
4.6499e-03 3.7421e-02 6.2782e-02 ... -6.0601e-02 -8.3703e-02 -4.8863e-02

(0 ,1 ,.,.) =
3.7348e-02 8.0601e-02 -3.8682e-02 ... -7.8329e-02 -1.2706e-01 -6.6510e-02
-1.0373e-01 6.7127e-02 6.3087e-02 ... 4.4907e-02 -2.6325e-02 -5.6617e-02
-1.2971e-01 -1.4979e-01 -5.8169e-02 ... -4.4630e-02 -1.5308e-01 -5.1643e-02
... ⋱ ...
-1.4815e-01 -1.3714e-01 -1.5018e-01 ... -1.1391e-01 -2.0022e-01 -4.1368e-02
-1.6764e-01 -1.2512e-01 -1.2874e-01 ... -1.3810e-01 -1.0333e-01 -1.0093e-01
-1.5435e-01 -1.9643e-01 -1.6086e-01 ... -1.5192e-01 -1.5799e-01 -1.1227e-01

(0 ,2 ,.,.) =
5.4995e-02 5.9248e-03 -2.5956e-02 ... 1.0725e-02 1.7036e-02 -5.8850e-02
-7.3472e-03 5.0274e-02 -6.4523e-02 ... -2.2691e-02 6.2444e-03 -3.6810e-02
1.9026e-01 5.2537e-02 -1.3431e-01 ... -1.1706e-01 -6.8829e-02 -7.9105e-02
... ⋱ ...
2.5623e-02 2.2308e-02 -2.5726e-02 ... 6.3350e-02 5.5525e-02 2.5616e-02
9.1978e-02 -3.2332e-02 -3.5101e-02 ... 1.7200e-02 -1.2062e-02 4.8496e-03
-3.6505e-02 2.4037e-03 -6.2010e-02 ... -1.2425e-02 -2.2164e-02 -4.8250e-02
...

(0 ,29,.,.) =
-2.7069e-02 -1.6055e-02 1.1301e-01 ... -1.2177e-03 -1.2537e-01 -6.5511e-02
1.3091e-01 -1.8412e-02 -1.1215e-01 ... 6.4691e-03 -5.1697e-02 -7.6036e-02
7.9545e-02 -4.6112e-02 -1.7834e-01 ... -6.0202e-02 -7.8994e-02 -1.2372e-01
... ⋱ ...
-6.9567e-02 -7.1485e-02 -1.7435e-02 ... -7.7636e-02 -4.5202e-02 -7.5089e-02
-4.6567e-02 -6.2728e-02 -1.2260e-01 ... -4.8473e-02 -9.4372e-02 -7.8866e-02
-4.5883e-02 -9.8076e-02 -7.5239e-02 ... -1.1135e-01 -5.4546e-02 -9.3099e-02

(0 ,30,.,.) =
-9.6636e-02 -1.0497e-01 -7.5623e-02 ... -2.4792e-02 -8.5384e-02 -1.0472e-01
-1.0639e-01 -5.6438e-02 2.2714e-03 ... 1.6224e-02 9.1031e-03 -8.7191e-02
-2.1391e-01 2.7976e-02 -1.3651e-02 ... 3.0248e-02 -4.5505e-02 -1.2442e-01
... ⋱ ...
-1.6899e-01 -7.7362e-03 -3.5663e-02 ... -1.1511e-01 -1.3709e-01 -1.4391e-01
-1.0154e-01 -5.0749e-02 -2.1263e-02 ... -1.2721e-01 -9.8940e-02 -1.3418e-01
-1.5037e-01 -6.7391e-02 -5.3581e-02 ... -1.5120e-01 -1.7679e-01 -1.8124e-01

(0 ,31,.,.) =
-1.5346e-01 -1.3244e-01 -2.4529e-01 ... -1.0987e-01 -5.2348e-02 -2.3235e-02
-2.1193e-01 -3.1680e-01 -3.1731e-01 ... -2.5150e-03 -1.0657e-01 -7.3463e-02
-2.1910e-01 -4.3107e-01 -4.4805e-01 ... -1.1789e-02 2.2698e-02 -1.2666e-01
... ⋱ ...
-8.0657e-03 -4.5893e-02 -4.8971e-02 ... -1.9136e-04 -3.9249e-02 -1.1822e-01
-3.3907e-03 -3.4883e-02 -4.4246e-02 ... 4.3262e-02 3.0650e-02 -8.7762e-02
-5.0029e-02 -8.6026e-02 -6.3097e-02 ... -2.8686e-02 -1.4475e-02 -9.8525e-02
[torch.cuda.FloatTensor of size 1x32x17x17 (GPU 0)]

conf_thresh:
0.1

num_classes:
1

IOError: [Errno 2] No such file or directory: '/cvlabdata1/home/btekin/ope/yolo6D/LINEMOD/benchvise/JPEGImages/000493.jpg'

Hi, when I try to train the multi-object model (python train_multi.py cfg/occlusion.data cfg/yolo-pose-multi.cfg backup_multi/init.weights) I get the following error. Could you please help me? Thanks

2018-07-16 15:10:06 epoch 0, processed 0 samples, lr 0.000100
Traceback (most recent call last):
  File "train_multi.py", line 408, in <module>
    niter = train(epoch)
  File "train_multi.py", line 72, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/home/edison/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "/home/edison/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IOError: Traceback (most recent call last):
  File "/home/edison/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/data1/edison/singleshotpose/multi_obj_pose_estimation/dataset_multi.py", line 68, in __getitem__
    img, label = load_data_detection(imgpath, self.shape, jitter, hue, saturation, exposure, bgpath)
  File "/data1/edison/singleshotpose/multi_obj_pose_estimation/image_multi.py", line 446, in load_data_detection
    total_masked_img, label, total_mask = augment_objects(imgpath, objname, add_objs, shape, jitter, hue, saturation, exposure)
  File "/data1/edison/singleshotpose/multi_obj_pose_estimation/image_multi.py", line 376, in augment_objects
    img = Image.open(imgpath).convert('RGB')
  File "/home/edison/anaconda2/lib/python2.7/site-packages/PIL/Image.py", line 2543, in open
    fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: '/cvlabdata1/home/btekin/ope/yolo6D/LINEMOD/benchvise/JPEGImages/000493.jpg'

RuntimeError: The size of tensor a (13) must match the size of tensor b (5408) at non-singleton dimension 3

I get this problem when I try to train the model (python train.py cfg/ape.data cfg/yolo-pose.cfg backup/ape/init.weights). My environment: Python 2.7.14, CUDA 8.0, cuDNN 5.1, PyTorch 0.4.0. Could you please give me some help? Thank you very much.

Traceback (most recent call last):
  File "train.py", line 397, in <module>
    niter = train(epoch)
  File "train.py", line 91, in train
    loss = region_loss(output, target)
  File "/home/edison/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/data1/edison/singleshotpose/region_loss.py", line 205, in forward
    pred_corners[0] = (x0.data + grid_x) / nW
RuntimeError: The size of tensor a (13) must match the size of tensor b (5408) at non-singleton dimension 3

models released with code

Are the models released in the backup folder not the same as the ones used for the CVPR'18 paper? I am getting different results.

Experimenting with a real scene

I would like to try a real scene in which I can detect and fit my holepuncher (which is exactly the same holepuncher in LINEMOD). So, since the network is already trained to work with LINEMOD, this should work out of the box.

1

Do I need to create the very same experimental environment, though? Like ArUco markers and similar objects such as ape, cat, benchvise and the others? Or can I just take a picture of the same holepuncher with my (calibrated) camera and test the image?

what is diam in cfg file?

train = LINEMOD/ape/train.txt
valid = LINEMOD/ape/test.txt
backup = backup/ape
mesh = LINEMOD/ape/ape.ply
tr_range = LINEMOD/ape/training_range.txt
name = ape
diam = 0.103

For example, this diam = 0.103: what is the diam?

Multi-Obj Visualization

Running valid_multi.py with the visualization code taken from the notebook results in images that are somehow weird, as seen below:

output_0_
output_1_
output_2_

The code snippets I used are taken from the notebook file (I just save them instead of displaying them):

# Images
img = data[0, :, :, :]
img = img.numpy().squeeze()
img = np.transpose(img, (1, 2, 0))

# Visualize
fig = plt.figure()
plt.xlim((0, 640))
plt.ylim((0, 480))
plt.imshow(scipy.misc.imresize(img, (480, 640)))
# Projections
for edge in edges_corners:
    plt.plot(proj_corners_gt[edge, 0], proj_corners_gt[edge, 1], color='g', linewidth=3.0)
    plt.plot(proj_corners_pr[edge, 0], proj_corners_pr[edge, 1], color='b', linewidth=3.0)
plt.gca().invert_yaxis()
#plt.show()

plt.savefig(outfile + '/output_' + str(count) + '_.png', bbox_inches='tight')
fig.canvas.draw()
count = count + 1

Is there a way to visualize the multi-object setup as it is seen in the paper supplement?

IOError: [Errno 2] No such file or directory: '/cvlabdata1/home/btekin/ope/yolo6D/LINEMOD/benchvise/JPEGImages/000627.jpg'

When I try to do multi-object pose estimation on the OCCLUSION dataset and train the model (python train_multi.py cfg/occlusion.data cfg/yolo-pose-multi.cfg backup_multi/init.weights), I get this error. Could you please help me? Thank you. I have put the dataset into the code folder.

Traceback (most recent call last):
  File "train_multi.py", line 408, in <module>
    niter = train(epoch)
  File "train_multi.py", line 72, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/home/edison/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "/home/edison/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IOError: Traceback (most recent call last):
  File "/home/edison/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/data1/edison/singleshotpose/multi_obj_pose_estimation/dataset_multi.py", line 68, in __getitem__
    img, label = load_data_detection(imgpath, self.shape, jitter, hue, saturation, exposure, bgpath)
  File "/data1/edison/singleshotpose/multi_obj_pose_estimation/image_multi.py", line 446, in load_data_detection
    total_masked_img, label, total_mask = augment_objects(imgpath, objname, add_objs, shape, jitter, hue, saturation, exposure)
  File "/data1/edison/singleshotpose/multi_obj_pose_estimation/image_multi.py", line 376, in augment_objects
    img = Image.open(imgpath).convert('RGB')
  File "/home/edison/anaconda2/lib/python2.7/site-packages/PIL/Image.py", line 2543, in open
    fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: '/cvlabdata1/home/btekin/ope/yolo6D/LINEMOD/benchvise/JPEGImages/000627.jpg'

RuntimeError: The expanded size of the tensor (3) must match the existing size (864) at non-singleton dimension 3 and IOError: [Errno 2] No such file or directory

Thank you very much for your previous help. I have changed all the directories for the objects in the 'cfg' files. By the way, I did not find a "phone" folder in the LINEMOD dataset file you provided. Now I try to train the network again (python train.py cfg/ape.data cfg/yolo-pose.cfg /data1/edison/realtime_seamless_6D_data_and_model/dataset/backup/ape/init.weights) and it shows the error below. Could you please give me some help?

Traceback (most recent call last):
  File "train.py", line 336, in <module>
    model.load_weights_until_last(weightfile)
  File "/home/edison/realtime_seamless_yolo_6D/singleshotpose/darknet.py", line 315, in load_weights_until_last
    start = load_conv_bn(buf, start, model[0], model[1])
  File "/home/edison/realtime_seamless_yolo_6D/singleshotpose/cfg.py", line 175, in load_conv_bn
    conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w])); start = start + num_w
RuntimeError: The expanded size of the tensor (3) must match the existing size (864) at non-singleton dimension 3

IOError: [Errno 2] No such file or directory: '/cvlabdata1/home/btekin/ope/yolo6D/LINEMOD/benchvise/JPEGImages/001129.jpg'

I am trying to run the multi-object pose estimation script. I noticed that the error I got appears in multiple solved issues. The solution proposed was to download the latest version of the scripts; however, that did not help in my case.

Traceback (most recent call last):
  File "train_multi.py", line 408, in <module>
    niter = train(epoch)
  File "train_multi.py", line 72, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IOError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/sspe/singleshotpose/multi_obj_pose_estimation/dataset_multi.py", line 68, in __getitem__
    img, label = load_data_detection(imgpath, self.shape, jitter, hue, saturation, exposure, bgpath)
  File "/sspe/singleshotpose/multi_obj_pose_estimation/image_multi.py", line 446, in load_data_detection
    total_masked_img, label, total_mask = augment_objects(imgpath, objname, add_objs, shape, jitter, hue, saturation, exposure)
  File "/sspe/singleshotpose/multi_obj_pose_estimation/image_multi.py", line 376, in augment_objects
    img = Image.open(imgpath).convert('RGB')
  File "/usr/local/lib/python2.7/dist-packages/PIL/Image.py", line 2580, in open
    fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: '/cvlabdata1/home/btekin/ope/yolo6D/LINEMOD/benchvise/JPEGImages/001129.jpg'

I have placed the files in the recommended method. It looks like this:

I have no name!@db80008d79f2:/sspe/singleshotpose$ ls
LICENSE      MeshPly.py   VOCtrainval_11-May-2012.tar  cfg.py	   gdown.pl		      train.py	   valid.py
LICENSE.txt  MeshPly.pyc  backup		       cfg.pyc	   image.py		      utils.py
LINEMOD      README.md	  backup.tar		       darknet.py  multi_obj_pose_estimation  utils.pyc
LINEMOD.tar  VOCdevkit	  cfg			       dataset.py  region_loss.py	      valid.ipynb

Any idea what else could be causing this issue? Thanks in advance.

RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'other'

Hi,

I am trying to run the multi-obj-pose-estimation, and I do have the latest code, yet I face the following:

user@user:~/workspace/singleshotpose/multi_obj_pose_estimation$ python valid_multi.py cfg/yolo-pose-multi.cfg backup_multi/model_backup.weights
2018-07-12 15:13:26 Testing ape...
Traceback (most recent call last):
  File "valid_multi.py", line 170, in <module>
    valid(datacfg, cfgfile, weightfile, conf_th)
  File "valid_multi.py", line 126, in valid
    iou           = bbox_iou(bb2d_gt, bb2d_pr)
  File "/home/user/workspace/singleshotpose/utils.py", line 159, in bbox_iou
    mx = min(box1[0]-box1[2]/2.0, box2[0]-box2[2]/2.0)
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'other'

In case any of the parameters below matter, here are some:

OS: Ubuntu 16.04
GPU: NVidia GTX 1060ti 6GB
Batch Size: 8

something for multi-gpu training

Hi, it's amazing work, but I ran into some trouble when using a big batch size for multi-GPU training. I changed gpus = '0,1,2,3', but the memory is only used on the first GPU, so I got an OUT OF MEMORY error. Can anyone give me some advice for multi-GPU training? Thanks

Question: Training performance over multiple training sessions.

Hi. I'm just looking for some feedback on your experience with variability in trained model quality between training sessions.

I'm progressively trying to retrace some important steps:

  1. validating the paper results with the trained models you provided
  2. training the models with the initial weights you provided and trying to achieve the same performance metrics as in the paper.
  3. training the initial weights, then training new models with the new initial weights, and finally verifying the paper results.

An important disclaimer is that I had to apply the modifications in #30 in order to run things.

Step 1 seems to be working just fine. The difference between the results and the paper are within 5% give or take.

For Step 2, things are looking a little bit worse. My experiment was done training the ape object, which has a low ADD score compared to the other objects. With your models I manage to achieve an ADD score of roughly 28%, with my trained model it dropped significantly to 21%. I'm training the model once more to get a feeling for the performance variability between training sessions but decided to ask if this is normal and expected.

Thanks

How to label my own dataset

Thanks for providing this amazing work!

I am trying to make a dataset which follows the format you introduced in the README; however, I am wondering whether there are any tools that can help me with making the .ply files and label files.

Thank you again anyway.

IOError: [Errno 2] No such file or directory:

I just wanted to let you know that running the following

python valid_multi.py cfg/yolo-pose-multi.cfg backup_multi/model_backup.weights

results with

Traceback (most recent call last):
  File "train_multi.py", line 408, in <module>
    niter = train(epoch)
  File "train_multi.py", line 72, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/home/user/.local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 188, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/user/workspace/singleshotpose/multi_obj_pose_estimation/dataset_multi.py", line 68, in __getitem__
    img, label = load_data_detection(imgpath, self.shape, jitter, hue, saturation, exposure, bgpath)
  File "/home/user/workspace/singleshotpose/multi_obj_pose_estimation/image_multi.py", line 446, in load_data_detection
    total_masked_img, label, total_mask = augment_objects(imgpath, objname, add_objs, shape, jitter, hue, saturation, exposure)
  File "/home/user/workspace/singleshotpose/multi_obj_pose_estimation/image_multi.py", line 376, in augment_objects
    img = Image.open(imgpath).convert('RGB')
  File "/home/user/.local/lib/python2.7/site-packages/PIL/Image.py", line 2580, in open
    fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: '/cvlabdata1/home/btekin/ope/yolo6D/LINEMOD/benchvise/JPEGImages/000078.jpg'
user@frankfurt:~/workspace/singleshotpose/multi_obj_pose_estimation$ gedit train_multi.py 

since the path is not set relative to the repository. One may want to change this in the next release. Perhaps a setup script which sets the dataset paths would be nice.

Replacing the Calibration Data

As I am trying to reproduce the results with the same holepuncher (of LINEMOD) but with a different camera, I need to change the calibration matrix in the code, which is in the file named utils.py:

def get_camera_intrinsic():
    K = np.zeros((3, 3), dtype='float64')
    K[0, 0], K[0, 2] = 572.4114, 325.2611
    K[1, 1], K[1, 2] = 573.5704, 242.0489
    K[2, 2] = 1.
    return K

Normally, matrix K (intrinsic) should be composed of the following:

[fx, 0, 0]
[s, fy, 0]
[cx, cy, 1]

where:

[cx, cy] — Optical center (the principal point), in pixels.
(fx,fy) — Focal length in pixels. 
s — Skew coefficient, which is non-zero if the image axes are not perpendicular.

Given that, what is the structure of the K matrix in the code, since it is constructed slightly differently, as such:

>>> K
array([[572.4114,   0.    , 325.2611],
       [  0.    , 573.5704, 242.0489],
       [  0.    ,   0.    ,   1.    ]]) 

I need to know which element is what since I will replace the values with my own intrinsic values.

RuntimeError: cuda runtime error (2) : out of memory

So finally I was able to get rid of all the PyTorch-version-related problems, and at least I can see the network structure when I run the code now. However, for some strange reason I always end up with the error below:

THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 381, in <module>
    model = model.cuda() # model = torch.nn.DataParallel(model, device_ids=[0]).cuda() # Multiple GPU parallelism
  File "/home/user/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 216, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/user/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
    module._apply(fn)
  File "/home/user/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
    module._apply(fn)
  File "/home/user/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 146, in _apply
    module._apply(fn)
  File "/home/user/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 152, in _apply
    param.data = fn(param.data)
  File "/home/user/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 216, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/user/.local/lib/python2.7/site-packages/torch/_utils.py", line 69, in _cuda
    return new_type(self.size()).copy_(self, async)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58

I've tried various GPUs such as a Titan V, 1060ti and 1080ti, with no luck. It can't be the case that I need more than 11 GB of memory to run the training session, right? Validation is OK, it runs. Training, however, does not run no matter which card I try. Should one perhaps decrease the batch size? Which config file should one edit then, yolo-pose-pre.cfg or yolo-pose.cfg? Or both? Before I change any of these I wanted to ask if that's actually the reason.

why set the code: loss_cls = 0

Hello, why is the class loss set to 0, while the cross-entropy loss is commented out?

#####loss_cls = self.class_scale * nn.CrossEntropyLoss(size_average=False)(cls, tcls)
loss_cls = 0

What's the difference between LINEMOD and OCCLUSION dataset?

The paper states that the architecture was tested on the LINEMOD and OCCLUSION datasets. I was able to download the LINEMOD dataset; however, I was unable to find OCCLUSION to see what the differences are.

It looks like they are basically the same with different annotations? Can somebody please give me some hints about that?

Thanks in advance

multi scale training

Currently the training code only works with the size 416 x 416 and never changes it (maybe this is the reason I am not able to reproduce your results by training it again).
What I see from the code is that it should change the scales based on the number of seen examples in listDataset, but that never happens. Did I miss something?

RuntimeError: The size of tensor a (13) must match the size of tensor b (5070) at non-singleton dimension 3

I want to train the model, but when I run python train.py cfg/ape.data cfg/yolo-pose.cfg backup/ape/init.weights, it fails as follows.
Does anyone get the same errors?

layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
5 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64
6 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
7 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
8 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
9 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
10 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
11 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
12 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
13 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
14 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
15 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
16 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
17 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
18 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
19 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
20 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
21 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
22 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
23 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
24 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
25 route 16
26 conv 64 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 64
27 reorg / 2 26 x 26 x 64 -> 13 x 13 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 13 x 13 x1280 -> 13 x 13 x1024
30 conv 20 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 20
31 detection
2018-09-11 13:56:55 epoch 0, processed 0 samples, lr 0.000100
/home/yong/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py:1006: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
  File "train.py", line 401, in <module>
    niter = train(epoch)
  File "train.py", line 96, in train
    loss = region_loss(output, target)
  File "/home/yong/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yong/paperWorkspace/01.singleshotpose/singleshotpose-master/region_loss.py", line 205, in forward
    pred_corners[0] = (x0.data + grid_x) / nW
RuntimeError: The size of tensor a (13) must match the size of tensor b (5070) at non-singleton dimension 3

0% Accuracy for 0.02 vx and 5 cm 5 degree metrics

I have created my own training dataset and set it up in exactly the same way as the included LINEMOD objects. I am able to successfully train to a reasonably high degree of accuracy (82%), but only for the 5-pixel 2D projection metric. The other metrics are always 0%:

2018-10-10 16:55:40    Mean corner error is 5.215204
2018-10-10 16:55:40    Acc using 5 px 2D Projection = 81.28%
2018-10-10 16:55:40    Acc using 0.02 vx 3D Transformation = 0.00%
2018-10-10 16:55:40    Acc using 5 cm 5 degree metric = 0.00%
2018-10-10 16:55:40    Translation error: 0.813218, angle error: 2.663330

However, if I change my cfg file to point to any one of the LINEMOD objects' .ply files instead of the .ply for the actual object I'm training with, all 3 metrics shoot up to the 95% range. How is this possible? Is there a problem with my .ply file? When I run the validation script, the bounding box predictions for my object appear to be extremely accurate, so I don't understand how the accuracy can be so low for the 3D metrics.

training problem

edison@amax2:~/data1_edison/singleshotpose$ python train.py cfg/wangwang.data cfg/yolo-pose-pre.cfg cfg/darknet19_448.conv.23 cp backup/wangwang/model.weights backup/wangwang/init.weights

/data1/edison/singleshotpose/cfg.py:175: UserWarning: src is not broadcastable to dst, but they have the same number of elements. Falling back to deprecated pointwise behavior.
conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w])); start = start + num_w
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
5 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64
6 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
7 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
8 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
9 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
10 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
11 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
12 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
13 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
14 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
15 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
16 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
17 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
18 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
19 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
20 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
21 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
22 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
23 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
24 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
25 route 16
26 conv 64 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 64
27 reorg / 2 26 x 26 x 64 -> 13 x 13 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 13 x 13 x1280 -> 13 x 13 x1024
30 conv 32 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 32
31 detection
2018-10-06 01:11:23 epoch 0, processed 0 samples, lr 0.000100
Traceback (most recent call last):
  File "train.py", line 396, in <module>
    niter = train(epoch)
  File "train.py", line 70, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/home/edison/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "/home/edison/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IOError: Traceback (most recent call last):
  File "/home/edison/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/data1/edison/singleshotpose/dataset.py", line 75, in __getitem__
    img, label = load_data_detection(imgpath, self.shape, jitter, hue, saturation, exposure, bgpath)
  File "/data1/edison/singleshotpose/image.py", line 175, in load_data_detection
    img = Image.open(imgpath).convert('RGB')
  File "/home/edison/anaconda2/lib/python2.7/site-packages/PIL/Image.py", line 2543, in open
    fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: '/data1/edison/singleshotpose/chen_lab_data/wangwang/JPEGImages/000026.jpg'

But I checked the files; I do have the images. Please help me, thank you.

some questions about paper

Hello, sir, I am sorry to bother you again, but I have a few questions about the paper.

  1. Accurately speaking, the whole process does two things: object detection and object 6D pose estimation. As for the object detection part, I don't quite understand how it is done. What method is used to detect the object in the paper? Is it based on the confidence value? How are the four detection points representing objects in Figure 1(c) obtained?
  2. Regarding the output part of the network, the input image is divided into an S*S grid by the convolutional network, and finally a D-dimensional target vector is output. Does each cell correspond to a D-dimensional vector? Or does only the cell containing point information correspond to a D-dimensional vector?
  3. Regarding the confidence function c(x), D_T(x) represents the distance between the projected point and the ground-truth point. How do you get it? From depth information?
  4. Regarding the class probability, there is not much explanation in the paper. Could you please explain it briefly?

What is the "diam" parameter in the datafiles?

I understand this is probably the model diameter for purposes of 6D pose estimation metric, but what are the units? Is it scaled somehow? Is it the diameter along a particular axis?

NameError: global name 'accious' is not defined

After I trained the model I get the following result and error. Can you please help me? Thanks

2018-07-09 17:19:38 Mean corner error is 7.615080
2018-07-09 17:19:38 Acc using 5 px 2D Projection = 68.57%
2018-07-09 17:19:38 Acc using 0.0103 vx 3D Transformation = 6.48%
2018-07-09 17:19:38 Acc using 5 cm 5 degree metric = 9.62%

Traceback (most recent call last):
  File "train.py", line 400, in <module>
    test(epoch, niter)
  File "train.py", line 268, in test
    logging(' Acc using iou metric = {:.2f}%'.format(accious))
NameError: global name 'accious' is not defined

Input and output

Hi, sir, I am sorry to bother you, but I have a few simple questions. From the paper, I learned that the input is a picture or a set of RGB images, and the output is the 6D pose (this article uses 21 numbers). I want to ask: what should I do to process an RGB image as input and use the network (your trained model) to get the output (the 6D pose of the object)?
Looking forward to your reply!
Looking forward to your reply!

how to get the labels for own datasets (21 numbers)

@PeterZheFu @btekin
I am still confused by this part: for the label files in LINEMOD, how do you get the 18 coordinates?

Currently, I have the camera intrinsics, the mesh models and the dimensions of my object (length, width, height). I intuitively set the object centroid to (0,0,0), so I can also get the coordinates of the 8 corners. (For example, for a Coke can: L = 6.2 cm, W = 6.2 cm, H = 10.2 cm. If the centroid is (0,0,0), the corners should be P1 = (3.1, 3.1, 5.1), P2 = (3.1, 3.1, -5.1), and so on.) Is this intuitive method correct?

If it is correct, how do I project these 8 points onto the 2D image? By the way, how do I set the x, y range? Please give me some help, thank you very much.
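
For context, projecting object-frame 3D points (the centroid and the 8 corners) into the image uses the standard pinhole model with the camera intrinsics K and the object pose [R|t]; the sketch below illustrates roughly what the compute_projection function in utils.py does (the function and variable names here are illustrative, not the repository's API):

import numpy as np

def project_points(points_3d, Rt, K):
    # points_3d: (N, 3) object-frame points in meters (e.g. centroid + 8 corners)
    # Rt: (3, 4) pose matrix [R|t], K: (3, 3) camera intrinsics
    points_h = np.hstack([points_3d, np.ones((points_3d.shape[0], 1))])  # (N, 4) homogeneous
    cam = np.dot(Rt, points_h.T)      # (3, N) points in the camera frame
    pix = np.dot(K, cam)              # (3, N) homogeneous pixel coordinates
    return (pix[:2] / pix[2]).T       # perspective division -> (N, 2) pixel coordinates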

RuntimeError: The expanded size of the tensor (3) must match the existing size (864) at non-singleton dimension 3

Hi,

Running the following on Ubuntu 16.04.:

python train.py cfg/ape.data cfg/yolo-pose.cfg backup/ape/init.weights

results with:

RuntimeError: The expanded size of the tensor (3) must match the existing size (864) at non-singleton dimension 3

The solution is to edit line 175 in cfg.py to:

conv_model.weight.data.copy_(torch.from_numpy(buf[start:start+num_w]).view(conv_model.weight.data.shape)); start = start + num_w

Other dataset

The dataset you provided has mesh .ply files containing vertex information. If I want to extract 3D proposals from the KITTI or Cityscapes datasets, can I use the code?

IOError: [Errno 2] No such file or directory:

Hi, Mr. Tekin @btekin
Thanks to your previous help, I have prepared all the data for my own object's training (JPEG images, labels, train.txt, test.txt, mesh .ply). However, I get an error that says I do not have a mask for my object, and I did not even create a mask folder. Your explanation said the mask is optional, so how do I resolve this error?
please help me, thank you

Best wishes

image

edison@amax2:/data1/edison/singleshotpose$ python train.py cfg/wangwang.data cfg/yolo-pose-pre.cfg cfg/darknet19_448.conv.23

2018-10-08 03:39:03 epoch 0, processed 0 samples, lr 0.000100
Traceback (most recent call last):
  File "train.py", line 411, in <module>
    niter = train(epoch)
  File "train.py", line 70, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/home/edison/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "/home/edison/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IOError: Traceback (most recent call last):
  File "/home/edison/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/data1/edison/singleshotpose/dataset.py", line 75, in __getitem__
    img, label = load_data_detection(imgpath, self.shape, jitter, hue, saturation, exposure, bgpath)
  File "/data1/edison/singleshotpose/image.py", line 176, in load_data_detection
    mask = Image.open(maskpath).convert('RGB')
  File "/home/edison/anaconda2/lib/python2.7/site-packages/PIL/Image.py", line 2543, in open
    fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: '/data1/edison/singleshotpose/chen_lab_data/wangwang/mask/0008.png'

Enabling Visual Output

Is there a way to produce at least some visual output, just like Figure 3 in the paper?

1

Right now the output is only textual, which is OK but since the paper tackles essentially a "vision" problem, it would be nice to have visual output next to the textual one.

Or perhaps you could describe a quick hack which would enable some visualization? I mean, since you captured the images in the paper, you already have the module somewhere in the code; it is just not used, I guess.

P.S.: The work looks great, and we would like to really invest our time in utilizing it. If you can help people in this regard, I believe you can get a lot of citations in the near future :)
