skhadem / 3d-boundingbox

PyTorch implementation for 3D Bounding Box Estimation Using Deep Learning and Geometry

License: MIT License

Python 44.60% CMake 0.12% C++ 36.49% MATLAB 17.68% Shell 1.12%
3d computer-vision localization pytorch

3d-boundingbox's People

Contributors

dangpnh2, fuenwang, skhadem


3d-boundingbox's Issues

If we have a ground truth 2D box

I want to know: if we have ground truth 2D bounding box annotations, to what level can the performance of this model be improved? Can someone give me an idea?

How to train on custom datasets

I have my own dataset whose format is like Pascal VOC, with labeled images only; I do not have a calibration file. How can I train on my own dataset?

How to generate a calibration file for my own dataset?

I want to know how KITTI generated their calib files, because when I generate one (with the OpenCV chessboard method) its syntax is completely different from theirs and I get no useful results. Could you tell me how to get a calib file in the same syntax as KITTI's? Thanks.
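For reference, a KITTI object-detection calib file is plain text, and the line this kind of pipeline typically reads is "P2:", the 3x4 projection matrix of the left color camera written row-major. If you calibrated with the OpenCV chessboard method you already have the 3x3 intrinsic matrix K; for a single camera with no stereo baseline, P = K [I | 0]. A minimal sketch (the matrix values and file name are placeholders, and the exact key this repo expects may differ):

    import numpy as np

    # 3x3 intrinsic matrix from cv2.calibrateCamera (placeholder values)
    K = np.array([[721.5,   0.0, 609.6],
                  [  0.0, 721.5, 172.8],
                  [  0.0,   0.0,   1.0]])

    # single camera, no stereo baseline: P = K [I | 0]
    P2 = np.hstack([K, np.zeros((3, 1))])

    with open("calib.txt", "w") as f:  # placeholder file name
        f.write("P2: " + " ".join(f"{v:.12e}" for v in P2.flatten()) + "\n")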

Could this be implemented using images from a surveillance camera?

Does anybody know if this method could be used for 3D vehicle detection in residential scenes?
In that case, the images come from surveillance cameras mounted about 3-4 meters above the ground,
so the roll angle of the cameras is definitely not 0.
If I directly use the pretrained model and the original camera calibration files, the predictions are really bad in my scenes.
How could I change the camera calibration files to adapt the method to surveillance-camera image datasets? Or is it simply not capable of that?

Camera Calibration Parameters

Hi, I am currently working with data generated in Unity 3D. My 2D bounding box output is correct, but the 3D box is completely off. My image size is 1024 x 1024. What changes do I need to make in order to map the 3D box onto my data?

3D bbox plotting and original image crop issue

Once a 3D bounding box is plotted, it is drawn on the original image (named 'img' in your code). However, the next crop is then taken from that same 'img', which already carries one or more 3D bounding boxes, i.e. crops with partial 3D box drawings are fed into the model for training or testing. So I changed the code to plot_img = np.copy(truth_img) and drew the 3D bounding boxes on plot_img rather than on 'img'.
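For anyone hitting the same problem, a minimal sketch of that fix (variable names follow the issue, the path is a placeholder): draw the 3D boxes on a copy and keep the original untouched for cropping.

    import cv2
    import numpy as np

    truth_img = cv2.imread("000000.png")   # original frame (placeholder path)
    plot_img = np.copy(truth_img)          # 3D boxes get drawn on this copy only
    # plot_3d_box(plot_img, cam_to_img, orient, dimensions, location)

    # crops fed to the network always come from the untouched original
    xmin, ymin, xmax, ymax = 100, 100, 200, 200   # placeholder 2D box
    crop = truth_img[ymin:ymax, xmin:xmax]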

An explanation of the angles used in the paper and in the code

Hi,

I am attempting to use this method to train on my own dataset which I have generated in Unity using the Unity Perception Package, therefore this requires quite a few modifications of the Dataset class. Unity will generate the ground truth and provide me with the following:

X,Y,Z position of the 3D bounding box center wrt. the camera
Object dimensions
Object rotation wrt. global coordinate frame
2D bounding box coordinates within the image
Camera intrinsic matrix

In the corresponding paper, the three angles of interest are Theta Ray, Theta L, and Theta. I believe I understand what these are and the correspondence between them:

Theta ray is the ray angle of the object center (calculated as the angle between the camera principal point and 3D bounding box center).
Theta L is the local orientation i.e. orientation of object wrt. to the camera.
Theta is the global orientation of the object.
Theta = Theta Ray + Theta L

However, looking in the Dataset class, there are references to three different angles: Alpha, Ry, and theta_ray. As far as I understand it, Alpha is equivalent to Theta L (as this is what you are regressing), Ry is equivalent to Theta (the global orientation), and theta_ray is self-explanatory.

As far as I am aware, theta_ray is calculated from the position of the 2D bounding box within the image, the model predicts Alpha, and using the correspondence between these we can find the global orientation of the object.

I would just like to confirm that all this is correct, as I have been having a hard time understanding this.
Your feedback is greatly appreciated :)
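For reference, a minimal sketch of those relationships as described above (the intrinsics, 2D box, and alpha value are placeholders; sign conventions in the actual Dataset class may differ):

    import numpy as np

    fx, cx = 721.5, 609.6                 # focal length and principal point x (placeholders)
    box_left, box_right = 500.0, 700.0    # 2D bounding-box x extent in pixels (placeholder)

    # theta_ray: angle between the optical axis and the ray through the 2D box center
    u_center = (box_left + box_right) / 2.0
    theta_ray = np.arctan2(u_center - cx, fx)

    # alpha (Theta_L, the local orientation) is what the network regresses;
    # the global orientation follows from Theta = Theta_L + Theta_ray
    alpha = -0.2                          # placeholder network output
    rot_y = alpha + theta_ray

    print(theta_ray, rot_y)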

Build a config file for the PyTorch training model

Hi there,
I would like to transfer the PyTorch model to a caffemodel myself.
The cfg file, which stores the network structure, needs to be built.
However, the model we get after training is divided into three parts: dimension, orientation, and confidence,
so I am a bit confused about how to write the cfg file on my own.
I would appreciate it if you could provide a solution or suggestion.
Best regards.

Constraints in Math.py

I was unable to understand how the constraints are formed in the calculation of the translation vector in the Math.py file. Could you please give me a hint?
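For reference, one way to read that construction, sketched under the paper's tight-fit assumption: each 2D box side gives one linear equation in the translation T. For a 3D corner X in object coordinates and, say, the constraint u = x_min, the projection equation K(RX + T) ~ (u, v, 1) turns into (K_row1 - x_min * K_row3) . (RX + T) = 0, which is linear in T; four such equations are solved by least squares. The corner-to-side assignment below is a single fixed guess, whereas Math.py searches over many corner combinations:

    import numpy as np

    def solve_translation(K, R, corners, box_2d):
        """Least-squares translation from the four 2D box-side constraints.

        K       : 3x3 intrinsic matrix
        R       : 3x3 object rotation (global yaw)
        corners : four 3D corners in object coordinates, one per constraint,
                  ordered (x_min, y_min, x_max, y_max)
        box_2d  : (x_min, y_min, x_max, y_max) in pixels
        """
        rows = [0, 1, 0, 1]                 # u-constraints use K row 0, v-constraints row 1
        A = np.zeros((4, 3))
        b = np.zeros(4)
        for i, (X, r, val) in enumerate(zip(corners, rows, box_2d)):
            coeff = K[r] - val * K[2]       # (K_row_r - val * K_row_3)
            A[i] = coeff                    # coefficient of T
            b[i] = -coeff @ (R @ np.asarray(X, dtype=float))
        T, *_ = np.linalg.lstsq(A, b, rcond=None)
        return T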

How to handle truncation?

The predictions are way off from the ground truth when the object is truncated, i.e. only part of the object is within the image boundary. Estimating the position uses the four bounding box corners, but when there is truncation the bounding box does not cover all of the object, only the part that is within the image.

Is there a way to overcome this problem? Or is this method simply just not suitable for cases where truncation occurs?

assumption in the paper

Hi
I have a question. In the paper "3D Bounding Box Estimation Using Deep Learning and Geometry", there is an assumption that the 3D bounding box fits tightly into the 2D detection window, which requires each side of the 2D bounding box to be touched by the projection of at least one of the 3D box corners. I have tested your code and it seems you have not considered that. Could you please discuss it?

Convert to KITTI Format for Evaluation

Hello @skhadem, thank you so much for this implementation. I would like to ask how to convert the results back into KITTI format, as I plan to reproduce the paper's results. Could you give me a hint which values should be written for the KITTI format?
I also have a problem understanding the label, as described in the development kit below.

#Values    Name      Description
----------------------------------------------------------------------------
   1    type         Describes the type of object: 'Car', 'Van', 'Truck',
                     'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram',
                     'Misc' or 'DontCare'
   1    truncated    Float from 0 (non-truncated) to 1 (truncated), where
                     truncated refers to the object leaving image boundaries
   1    occluded     Integer (0,1,2,3) indicating occlusion state:
                     0 = fully visible, 1 = partly occluded
                     2 = largely occluded, 3 = unknown
   1    alpha        Observation angle of object, ranging [-pi..pi]
   4    bbox         2D bounding box of object in the image (0-based index):
                     contains left, top, right, bottom pixel coordinates
   3    dimensions   3D object dimensions: height, width, length (in meters)
   3    location     3D object location x,y,z in camera coordinates (in meters)
   1    rotation_y   Rotation ry around Y-axis in camera coordinates [-pi..pi]
   1    score        Only for results: Float, indicating confidence in
                     detection, needed for p/r curves, higher is better.

However, when I look at a sample label, say 000000.txt, its content is:
Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01
As we can see, there are only 15 values instead of the 16 in the description. For evaluation, is it necessary to provide the score as the last value?

Thank you so much
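For reference: the ground-truth label files ship with 15 values (no score), while result files submitted for evaluation carry 16, with the detection confidence appended as the last value. A minimal sketch of formatting one result line in that order (all values are placeholders):

    def kitti_result_line(obj_type, truncated, occluded, alpha, bbox, dims, loc, rot_y, score):
        """One detection in KITTI result-file order (16 values, score last)."""
        return " ".join([obj_type, f"{truncated:.2f}", str(occluded), f"{alpha:.2f}"] +
                        [f"{v:.2f}" for v in (*bbox,    # left, top, right, bottom (pixels)
                                              *dims,    # height, width, length (m)
                                              *loc,     # x, y, z in camera coordinates (m)
                                              rot_y, score)])

    # placeholder detection reusing the sample label values above, with a 0.95 score appended
    print(kitti_result_line("Pedestrian", 0.00, 0, -0.20,
                            (712.40, 143.00, 810.73, 307.92),
                            (1.89, 0.48, 1.20), (1.84, 1.47, 8.41),
                            0.01, 0.95))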

error: IndexError: invalid index to scalar variable.

$ python Run.py

output:

Traceback (most recent call last):
  File "Run.py", line 201, in <module>
    main()
  File "Run.py", line 137, in main
    detections = yolo.detect(yolo_img)
  File "code/3D-BoundingBox-master/yolo/yolo.py", line 34, in detect
    ln = [ln[i[0] - 1] for i in self.net.getUnconnectedOutLayers()]
  File "code/3D-BoundingBox-master/yolo/yolo.py", line 34, in <listcomp>
    ln = [ln[i[0] - 1] for i in self.net.getUnconnectedOutLayers()]
IndexError: invalid index to scalar variable.
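This particular IndexError usually comes from newer OpenCV versions, where net.getUnconnectedOutLayers() returns a flat array of ints instead of an array of one-element arrays, so i[0] ends up indexing a scalar. A version-tolerant sketch of that lookup (not the repo's exact code):

    import numpy as np

    def output_layer_names(net):
        """YOLO output layer names, tolerant of both old and new OpenCV return shapes."""
        names = net.getLayerNames()
        ids = np.asarray(net.getUnconnectedOutLayers()).flatten()   # [[i], ...] or [i, ...]
        return [names[i - 1] for i in ids]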

replace vgg with resnet

Hi
Thanks for your great work. I am now trying to replace the backbone of the second stage with a ResNet, since ResNet usually performs better on the same task. However, after I replace the VGG with a ResNet, the results are terrible. I wonder whether you have tried the same thing, and could you please tell me whether this idea is useful? Thank you!

Best regards
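Not a definitive answer, but one thing worth checking when swapping backbones: the orientation/confidence/dimension heads in this kind of setup are sized for VGG's 512 x 7 x 7 feature map, while a ResNet trunk ends in 2048 channels, so the flattened feature size feeding the first Linear layer of each head has to change too. A minimal sketch, assuming a recent torchvision, resnet50, and 224 x 224 input crops:

    import torch.nn as nn
    from torchvision import models

    class ResNetTrunk(nn.Module):
        """Backbone producing a (N, 2048, 7, 7) feature map for 224x224 crops."""
        def __init__(self):
            super().__init__()
            resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
            self.features = nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool/fc

        def forward(self, x):
            return self.features(x)

    # the heads must then flatten 2048*7*7 instead of 512*7*7, e.g. for orientation:
    feat_dim = 2048 * 7 * 7
    orientation_head = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(True),
                                     nn.Dropout(), nn.Linear(256, 2 * 2))  # 2 bins x (cos, sin)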

YOLO does not use GPU

I think the YOLO implementation here is CPU-based, while the PyTorch part is CUDA-based. How can I use CUDA for the YOLO detections as well?
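For reference, the YOLO part here goes through OpenCV's dnn module rather than PyTorch, so it stays on the CPU unless you request the CUDA backend, which in turn requires an OpenCV build compiled with CUDA support. A minimal sketch (config/weights paths are placeholders):

    import cv2

    net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")  # placeholder paths
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)  # or DNN_TARGET_CUDA_FP16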

Some code I don't understand

Hi
Thanks for your great work, it has helped me a lot. However, there is some code that I don't understand, for example: dim += averages.get_item(label['Class']). Could you please explain it? Thank you very much!
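For reference, one reading of that line: the dimension head regresses a residual relative to the per-class average dimensions, so the class average has to be added back to recover absolute height/width/length. A minimal sketch of the idea (the numbers are placeholders):

    import numpy as np

    class_average = np.array([1.53, 1.63, 3.88])         # mean (h, w, l) for 'Car', placeholder
    predicted_residual = np.array([0.05, -0.02, 0.10])   # what the dimension head outputs

    # same role as: dim += averages.get_item(label['Class'])
    dim = predicted_residual + class_average
    print(dim)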

How to get the score (confidence) of a 3D Bounding Box?

Hi folks,

I have observed this part of the source code:

"""
det.type in self.classes and det.score > self.score_thres):

            intrinsics = ros_intrinsics(self.camera_info.P)
            input_tensor,theta_ray = preprocessing(image,det,intrinsics)
            [orient, conf, dim] = self.model(input_tensor) #Apply the model to get the estimation
            orient = orient.cpu().data.numpy()[0, :, :]
            conf = conf.cpu().data.numpy()[0, :]
            dim = dim.cpu().data.numpy()[0, :]
            # print("Conf:{}".format(conf))
            dim += self.averages.get_item(det.type)

            argmax = np.argmax(conf)
            orient = orient[argmax, :]
            cos = orient[0]
            sin = orient[1]
            alpha = np.arctan2(sin, cos)
            alpha += self.angle_bins[argmax]
            alpha -= np.pi

"""

But that conf is a tuple of two numbers, which is used to determine the best orientation, like this:

"""
Conf:[ 6.3896847 -6.5501723]
Conf:[ 6.496025 -6.7066655]
Conf:[ 5.410366 -5.5474744]
Conf:[ 7.092432 -7.3124714]
Conf:[ 9.061753 -9.251386]
Conf:[ 7.587371 -7.831802]
Conf:[ 2.149212 -2.1235662]
Conf:[-0.84504336 0.89392436]
Conf:[ 4.436549 -4.5268965]
Conf:[ 1.2938225 -1.4327605]
"""

How can I get the score of the final 3D Bounding Box? (0 to 1 value, like in every 2D or 3D object detector)

Thanks in advance.
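One reading of this, not an authoritative one: conf holds one raw logit per orientation bin and is only used to pick the bin, so it is not an overall detection confidence; the 0 to 1 score usually reported with this pipeline is the 2D detector's det.score. If you still want a 0 to 1 number for the chosen bin, a softmax over the logits is one option; combining it with the 2D score, as sketched below, is purely an assumption:

    import numpy as np

    conf = np.array([6.39, -6.55])   # raw per-bin logits from the model (placeholder)
    det_score_2d = 0.87              # 2D detector confidence (placeholder)

    probs = np.exp(conf - conf.max())
    probs /= probs.sum()             # softmax over orientation bins
    bin_conf = probs.max()           # confidence of the selected bin, in [0, 1]

    score_3d = det_score_2d * bin_conf   # one possible combined score (assumption)
    print(bin_conf, score_3d)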

Dataset label "Location" property

Hi,

I'm wondering how you construct your dataset from KITTI, especially the "Location" keyword in the label. I don't quite understand how the following lines of code from torch_lib/Dataset.py work:

    Location = [line[11], line[12], line[13]] # x, y, z
    Location[1] -= Dimension[0] / 2 # bring the KITTI center up to the middle of the object

Why does the y component of "Location" correspond to the x component of "Dimension"?
This also appears in library/Math.py, in the calc_location() function:

# using a different coord system
dx = dimension[2] / 2
dy = dimension[0] / 2
dz = dimension[1] / 2

Why are you switching the coordinate system, and can you please tell me how you parse the raw location data from the KITTI dataset?

P.S. I noticed this because I tried to read the ground truth labels directly from the generated dataset and plot them using your plot_3d_box(img, cam_to_img, orient, dimensions, location) function. However, there is a serious offset in the location, especially in the y coordinate. Can you please tell me how to read the ground truth location from the generated dataset and plot it correctly?

Thank you so much~
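For reference, KITTI's location field is the bottom-center of the 3D box and dimensions are ordered (height, width, length), so Dimension[0] is the object's height; since y points downward in camera coordinates, subtracting height/2 from y moves the reference point from the bottom face to the box center. A minimal sketch of reading one ground-truth line that way (field indices follow the devkit layout quoted above):

    def parse_kitti_label(line):
        """Return (class, dims (h, w, l), location at the box center) from one KITTI label line."""
        f = line.split()
        dims = [float(f[8]), float(f[9]), float(f[10])]    # height, width, length
        loc = [float(f[11]), float(f[12]), float(f[13])]   # x, y, z (bottom-center)
        loc[1] -= dims[0] / 2                              # y points down: lift to the box center
        return f[0], dims, loc

    print(parse_kitti_label(
        "Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 "
        "1.89 0.48 1.20 1.84 1.47 8.41 0.01"))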

Error: not enough values to unpack in yolo.py

yolo
Using previous model epoch_90.pkl
/home/zjut/anaconda3/envs/pytorch_GPU/lib/python3.7/site-packages/torchvision/models/_utils.py:209: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "
/home/zjut/anaconda3/envs/pytorch_GPU/lib/python3.7/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing weights=VGG19_BN_Weights.IMAGENET1K_V1. You can also use weights=VGG19_BN_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Traceback (most recent call last):
  File "Run.py", line 203, in <module>
    main()
  File "Run.py", line 137, in main
    detections = yolo.detect(yolo_img)
  File "/home/zjut/code/3D-BoundingBox/yolo/yolo.py", line 31, in detect
    (H,W) = image.shape[:2]
ValueError: not enough values to unpack (expected 2, got 0)
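Not certain this is the cause, but "expected 2, got 0" means the array reaching image.shape[:2] has no dimensions at all, which usually points to the image not being loaded or converted correctly (for example a wrong image path). A minimal sanity check before calling the detector (the path is a placeholder):

    import cv2

    yolo_img = cv2.imread("Kitti/testing/image_2/000001.png")   # placeholder path
    if yolo_img is None or yolo_img.size == 0:
        raise FileNotFoundError("image failed to load; check the image directory path")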

regular image and camera

Hi
I'm just wondering whether the training process can run on normal images, such as ones taken with a regular camera, instead of Velodyne or stereo data?
