skhadem / 3D-BoundingBox
PyTorch implementation for 3D Bounding Box Estimation Using Deep Learning and Geometry
License: MIT License
I want to know: if we have ground-truth 2D bounding box annotations, to what level can the performance of this model be improved? Can someone give me an idea?
Hello!
I have a question about the derivation:
In the given material (http://ywpkwon.github.io/pdf/bbox3d-study.pdf), what does K mean?
I think it is the intrinsic matrix, but why is its shape 3x4?
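My current guess (please correct me if I am wrong) is that the 3x4 matrix is not the bare intrinsics but the full projection matrix, i.e. the 3x3 intrinsics already multiplied with [R | t], as in KITTI's P2. A minimal sketch with made-up but KITTI-like numbers:

import numpy as np

# Assumption: the 3x4 "K" is really P = K_3x3 [R | t], mapping homogeneous 3D
# camera-frame points directly to pixels. The values below are only illustrative.
P = np.array([[721.5,   0.0, 609.6, 44.9],
              [  0.0, 721.5, 172.9,  0.2],
              [  0.0,   0.0,   1.0,  0.003]])

X = np.array([1.84, 1.47, 8.41, 1.0])   # homogeneous 3D point (x, y, z, 1)
u, v, w = P @ X
print(u / w, v / w)                      # pixel coordinates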
Thank you very much!
I have my own dataset, whose format is like Pascal VOC, with labeled images only; I do not have the calibration files. How can I train on my own dataset?
Is there another link we can use to access the pretrained weights? Thanks!
When I run this code:
my_vgg = vgg.vgg19_bn(pretrained=True)
model = Model(features=my_vgg.features, bins=2).cuda()
I get the error below:
model = Model(features=my_vgg.features, bins=2).cuda()
TypeError: 'module' object is not callable
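A likely cause (an assumption, since the import statements are not shown): Model here refers to the module torch_lib/Model.py rather than the class defined inside it, and calling a module raises exactly this error. A minimal sketch of the fix:

# import torch_lib.Model as Model        # Model is a module -> not callable
from torch_lib.Model import Model        # Model is the class -> callable
from torchvision.models import vgg

my_vgg = vgg.vgg19_bn(pretrained=True)
model = Model(features=my_vgg.features, bins=2).cuda()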
I want to know how KITTI generated their calib files, because when I generate one myself (with the OpenCV chessboard method) the syntax is totally different from theirs and I get no useful results. Could you kindly tell me how I can produce a calib file in the same syntax as KITTI's? Thanks.
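For what it's worth, the KITTI object-detection calib files are plain text with one row-major matrix per line, and this repo presumably only needs the P2 line. A hedged sketch of turning an OpenCV chessboard calibration into that syntax, assuming a single already-rectified camera so the extra translation column can stay zero:

import numpy as np

fx, fy, cx, cy = 1000.0, 1000.0, 640.0, 360.0   # placeholders: take these from your cv2.calibrateCamera output
camera_matrix = np.array([[fx, 0.0, cx],
                          [0.0, fy, cy],
                          [0.0, 0.0, 1.0]])

P2 = np.hstack([camera_matrix, np.zeros((3, 1))])   # 3x4, no stereo offset

with open("calib.txt", "w") as f:
    f.write("P2: " + " ".join(f"{v:.6e}" for v in P2.flatten()) + "\n")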
Does anybody know if this method could be used for vehicle 3D detection in residential scenes?
In that case, the images come from surveillance cameras, which are mounted about 3-4 meters above the ground.
Thus, the roll angle of the cameras is definitely not 0.
If I directly use the pretrained model and the original camera calibration files, the predictions are really bad in my scenes.
So how could I change the camera calibration files to make them suitable for image datasets from surveillance cameras?
Or is that simply not possible?
Hi ... I am currently working on data generated using Unity. I am not able to get a proper 3D bounding box: my 2D bounding box output is correct, but the 3D box is way off. My image size is 1024 x 1024. What changes do I need to make in order to map the 3D box onto my data?
Hello,
Thanks for this great repo! Could you please add a licence file, to clarify how it can be used?
Thanks
Once a 3D bounding box is plotted, it is drawn on the original image (named 'img' in your code). However, the next cropped image is then cropped from that same 'img', which already contains one or more 3D bounding boxes (i.e. crops with partial 3D bounding box plots are fed into the model for training or testing). So I changed the code to plot_img = np.copy(truth_img) and drew the 3D bounding box on plot_img rather than on 'img'.
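In code, the change described above is roughly (using the names from this post):

import numpy as np

plot_img = np.copy(truth_img)    # draw on a copy of the clean image
plot_3d_box(plot_img, cam_to_img, orient, dimensions, location)   # instead of drawing on 'img'
# later crops are then still taken from the untouched 'img'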
Hi,
I am attempting to use this method to train on my own dataset which I have generated in Unity using the Unity Perception Package, therefore this requires quite a few modifications of the Dataset class. Unity will generate the ground truth and provide me with the following:
X,Y,Z position of the 3D bounding box center wrt. the camera
Object dimensions
Object rotation wrt. global coordinate frame
2D bounding box coordinates within the image
Camera intrinsic matrix
In the corresponding paper, the three angles of interest are Theta Ray, Theta L, and Theta. I believe I understand what these are and the correspondence between them:
Theta ray is the ray angle of the object center (calculated as the angle between the camera principal point and 3D bounding box center).
Theta L is the local orientation i.e. orientation of object wrt. to the camera.
Theta is the global orientation of the object.
Theta = Theta Ray + Theta L
However, looking in the Dataset class, there are references to three different angles: Alpha, Ry, and theta_ray. As far as I understand it, Alpha is equivalent to Theta L (as this is what you are regressing), Ry is equivalent to Theta (the global orientation), and theta_ray is self-explanatory.
As far as I am aware, theta_ray is calculated using the position of the 2D bounding box within the image, the model predicts Alpha, and using the correspondence between these we can find the global orientation of the object.
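In code, my understanding looks roughly like this (the numbers are made up, and the intrinsics handling is simplified to a single focal length and principal point):

import numpy as np

def theta_ray_from_box(box_center_x, fx, cx):
    # angle between the optical axis and the ray through the 2D box centre
    return np.arctan2(box_center_x - cx, fx)

theta_ray = theta_ray_from_box(box_center_x=850.0, fx=721.5, cx=609.6)
alpha = -0.2                    # local orientation (Theta L), what the network regresses
ry = alpha + theta_ray          # global orientation, Theta = Theta L + Theta Ray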
I would just like to confirm that all this is correct, as I have been having a hard time understanding this.
Your feedback is greatly appreciated :)
Hello, does this work OK now?
Hi there,
I would like to transfer the PyTorch model to a Caffe model myself.
The cfg file, which stores the network structure, needs to be built.
However, the model we get after training is divided into three parts: dimension, orientation, and confidence,
so I am a bit confused about how to write the cfg file on my own.
I would appreciate it if you could provide a solution or suggestion.
Best regards.
I was unable to understand the formation of the constraints in the calculation of the translation vector in the Math.py file. Could you give me a hint about it?
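My current (possibly wrong) reading of the paper, for reference: each edge of the 2D box is assumed to touch the projection of one of the 3D box corners, and each such edge gives one equation that is linear in the unknown translation T, so stacking the four edges yields an over-determined linear system. A rough sketch, with K as the 3x3 intrinsics (this is my own reconstruction, not the exact code in Math.py):

import numpy as np

def constraint_row(K, R, X_c, value, axis):
    # axis=0 for a left/right (x) edge at pixel `value`, axis=1 for a top/bottom (y) edge.
    # From value * [K (R X_c + T)]_z = [K (R X_c + T)]_axis, which is linear in T:
    M = K @ R
    A_row = K[axis] - value * K[2]              # coefficients of T
    b_val = (value * M[2] - M[axis]) @ X_c      # constants from the rotated corner
    return A_row, b_val

# Stack one row per edge (x_min, y_min, x_max, y_max) into A and b, then
# T, *rest = np.linalg.lstsq(A, b, rcond=None)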
The predictions are way off from the ground truth when the object is truncated, i.e. only part of the object is within the image boundary. Estimating the position uses the four edges of the 2D bounding box, but when there is truncation the bounding box does not cover all of the object, only the part that is within the image.
Is there a way to overcome this problem? Or is this method simply just not suitable for cases where truncation occurs?
Hi
I have a question. In the paper 3D Bounding Box Estimation Using Deep Learning and Geometry, there is an assumption that the 3D bounding box fits tightly into the 2D detection window, which requires each side of the 2D bounding box to be touched by the projection of at least one of the 3D box corners. I have tested your code, and it seems that you have not considered that. Could we please discuss this?
Hello @skhadem, thank you so much for this implementation. I would like to ask how to convert the results back into KITTI format. I plan to reproduce the paper's results. Could you give me a hint about which values should be put in the KITTI format?
I also have a problem understanding the label, as described in the development kit below.
#Values Name Description
----------------------------------------------------------------------------
1 type Describes the type of object: 'Car', 'Van', 'Truck',
'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram',
'Misc' or 'DontCare'
1 truncated Float from 0 (non-truncated) to 1 (truncated), where
truncated refers to the object leaving image boundaries
1 occluded Integer (0,1,2,3) indicating occlusion state:
0 = fully visible, 1 = partly occluded
2 = largely occluded, 3 = unknown
1 alpha Observation angle of object, ranging [-pi..pi]
4 bbox 2D bounding box of object in the image (0-based index):
contains left, top, right, bottom pixel coordinates
3 dimensions 3D object dimensions: height, width, length (in meters)
3 location 3D object location x,y,z in camera coordinates (in meters)
1 rotation_y Rotation ry around Y-axis in camera coordinates [-pi..pi]
1 score Only for results: Float, indicating confidence in
detection, needed for p/r curves, higher is better.
However, when I look at a sample label, say 000000.txt, the content is as follows.
Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01
As we can see, here we only have 15 values instead of the 16 values in the description. For evaluation, is it necessary to provide the score at the end?
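For reference, my current attempt at writing a result line, based purely on the description above (so the choice of -1 for the unknown truncated/occluded fields is my own assumption):

def to_kitti_result_line(cls, alpha, bbox, dim, loc, ry, score):
    # bbox = (left, top, right, bottom), dim = (h, w, l), loc = (x, y, z)
    values = [-1, -1, alpha, *bbox, *dim, *loc, ry, score]
    return cls + " " + " ".join(f"{v:.2f}" for v in values)

print(to_kitti_result_line("Pedestrian", -0.20, (712.40, 143.00, 810.73, 307.92),
                           (1.89, 0.48, 1.20), (1.84, 1.47, 8.41), 0.01, 0.90))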
Thank you so much
$ python Run.py
output:
Traceback (most recent call last):
File "Run.py", line 201, in <module>
main()
File "Run.py", line 137, in main
detections = yolo.detect(yolo_img)
File "code/3D-BoundingBox-master/yolo/yolo.py", line 34, in detect
ln = [ln[i[0] - 1] for i in self.net.getUnconnectedOutLayers()]
File "code/3D-BoundingBox-master/yolo/yolo.py", line 34, in <listcomp>
ln = [ln[i[0] - 1] for i in self.net.getUnconnectedOutLayers()]
IndexError: invalid index to scalar variable.
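A likely cause (an assumption based on the traceback): newer OpenCV releases return getUnconnectedOutLayers() as a flat 1-D array of ints, so i[0] indexes into a scalar. Flattening first works on both old and new versions:

ln = self.net.getLayerNames()
ln = [ln[i - 1] for i in self.net.getUnconnectedOutLayers().flatten()]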
Hi
Thanks for your great work. I am now trying to replace the backbone of the second stage with ResNet, since ResNet usually performs better on the same task. However, after I replace the VGG with ResNet, the results are terrible. I wonder if you have done the same thing; could you please tell me whether this idea is useful? Thank you!
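For reference, the swap I tried looks roughly like the sketch below, so you can see whether I am doing something obviously wrong. Note that torchvision ResNets have no .features attribute, and if the fully connected heads in Model assume a 512*7*7 feature map from VGG, the first Linear layer of each head has to match the new backbone as well:

import torch.nn as nn
from torchvision.models import resnet
from torch_lib.Model import Model    # path as used elsewhere in the repo (my assumption)

backbone = resnet.resnet34(pretrained=True)
features = nn.Sequential(*list(backbone.children())[:-2])   # drop avgpool and fc
# resnet34 also ends in 512 channels with a 7x7 map for 224x224 crops, so the
# 512*7*7 head input may be reusable; resnet50 would need 2048*7*7 instead.
model = Model(features=features, bins=2).cuda()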
Best regards
I think the YOLO implementation here is CPU-based, while the PyTorch part runs on CUDA. How can I use CUDA for the YOLO detections as well?
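One option, assuming the YOLO detector here goes through cv2.dnn and that your OpenCV build was compiled with CUDA support (otherwise these calls silently fall back to CPU):

import cv2

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")   # placeholder file names
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)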
If I want to do 3D detection on an image (without a label file), what should I do?
Hi
Thanks for your great work, it helped me a lot. However, there is some code that I don't understand, for example: dim += averages.get_item(label['Class']). Could you please explain it? Thank you very much!
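My own guess at what that line does (an assumption, not confirmed by the author): the network regresses a residual relative to the per-class average dimensions, so the dataset subtracts the class mean during training and this line adds it back at inference. Roughly:

class_avg_hwl = {"Car": [1.53, 1.63, 3.88]}                 # illustrative numbers only
dim_residual = [0.04, -0.02, 0.10]                          # what the network predicts
dim = [r + m for r, m in zip(dim_residual, class_avg_hwl["Car"])]   # actual h, w, l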
Hi folks,
I have observed this part of the source code:
"""
det.type in self.classes and det.score > self.score_thres):
intrinsics = ros_intrinsics(self.camera_info.P)
input_tensor,theta_ray = preprocessing(image,det,intrinsics)
[orient, conf, dim] = self.model(input_tensor) #Apply the model to get the estimation
orient = orient.cpu().data.numpy()[0, :, :]
conf = conf.cpu().data.numpy()[0, :]
dim = dim.cpu().data.numpy()[0, :]
# print("Conf:{}".format(conf))
dim += self.averages.get_item(det.type)
argmax = np.argmax(conf)
orient = orient[argmax, :]
cos = orient[0]
sin = orient[1]
alpha = np.arctan2(sin, cos)
alpha += self.angle_bins[argmax]
alpha -= np.pi
"""
But that conf is a pair of two numbers, which is used to determine the best orientation bin, like this:
"""
Conf:[ 6.3896847 -6.5501723]
Conf:[ 6.496025 -6.7066655]
Conf:[ 5.410366 -5.5474744]
Conf:[ 7.092432 -7.3124714]
Conf:[ 9.061753 -9.251386]
Conf:[ 7.587371 -7.831802]
Conf:[ 2.149212 -2.1235662]
Conf:[-0.84504336 0.89392436]
Conf:[ 4.436549 -4.5268965]
Conf:[ 1.2938225 -1.4327605]
"""
How can I get the score of the final 3D Bounding Box? (0 to 1 value, like in every 2D or 3D object detector)
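My interpretation so far (please correct me): the two values are unnormalised logits, one per orientation bin, so a softmax turns them into a 0-1 probability over bins. That is a confidence in the chosen angle bin, though, not an objectness score, so for a detector-style score I would still rely on (or multiply in) the 2D detector's confidence:

import numpy as np

conf = np.array([6.3896847, -6.5501723])     # one of the printed pairs above
probs = np.exp(conf - conf.max())
probs /= probs.sum()
print(probs[np.argmax(conf)])                # in [0, 1], close to 1.0 for this pair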
Thanks in advance.
Feeding in the same image multiple times produces different orientation estimations each time. Does anyone know why this would be the case?
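One common cause worth checking (an assumption, I have not traced this repo): the network is left in training mode, so any Dropout or BatchNorm layers behave stochastically at inference time. A minimal sketch:

import torch

model.eval()                 # fix Dropout / BatchNorm to inference behaviour
with torch.no_grad():        # no gradients needed at inference
    orient, conf, dim = model(input_tensor)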
Hi,
I'm wondering how you construct your dataset from KITTI, especially the "Location" keyword in the label. I don't quite understand how the following lines of code from torch_lib/Dataset.py work:
Location = [line[11], line[12], line[13]] # x, y, z
Location[1] -= Dimension[0] / 2 # bring the KITTI center up to the middle of the object
Why does the y component of "Location" use the first component of "Dimension"?
This also appears in library/Math.py in the calc_location() function:
# using a different coord system
dx = dimension[2] / 2
dy = dimension[0] / 2
dz = dimension[1] / 2
Why are you switching the coordinate system, and can you please tell me how you parse the raw location information from the KITTI dataset?
P.S. I noticed this because I tried to read the ground truth labels directly from the generated dataset and plot them using your plot_3d_box(img, cam_to_img, orient, dimensions, location) function. However, there is a serious offset in location, especially in the y coordinate. Can you please tell me how to read the ground truth location information from the generated dataset and plot it correctly?
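For reference, my reading of the label fields, based on the KITTI devkit readme rather than on this repo (so treat the conventions below as assumptions): dimensions are stored as (h, w, l), location is the bottom centre of the box in camera coordinates, and y points down, which is why subtracting h/2 from y moves the point to the geometric centre:

fields = line.split()                                                   # one KITTI label line
Dimension = [float(fields[8]), float(fields[9]), float(fields[10])]     # h, w, l
Location  = [float(fields[11]), float(fields[12]), float(fields[13])]   # x, y, z (bottom centre)
Location[1] -= Dimension[0] / 2                                         # bottom centre -> box centre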
Thank you so much~
yolo
Using previous model epoch_90.pkl
/home/zjut/anaconda3/envs/pytorch_GPU/lib/python3.7/site-packages/torchvision/models/_utils.py:209: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "
/home/zjut/anaconda3/envs/pytorch_GPU/lib/python3.7/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=VGG19_BN_Weights.IMAGENET1K_V1`. You can also use `weights=VGG19_BN_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
Traceback (most recent call last):
File "Run.py", line 203, in
main()
File "Run.py", line 137, in main
detections = yolo.detect(yolo_img)
File "/home/zjut/code/3D-BoundingBox/yolo/yolo.py", line 31, in detect
(H,W) = image.shape[:2]
ValueError: not enough values to unpack (expected 2, got 0)
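One thing I would check (an assumption, since the traceback only shows the unpack failing): whether the image was actually loaded before being handed to yolo.detect. A minimal sketch, with img_path as a hypothetical path variable:

import cv2

yolo_img = cv2.imread(img_path)
assert yolo_img is not None and yolo_img.size > 0, f"could not read {img_path}"
(H, W) = yolo_img.shape[:2]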
Hi
I am just wondering whether the training process can run on normal images, such as those taken with a regular camera, instead of images from a velodyne or stereo setup?