
3d_detection's Introduction

3D_detection

This work is inspired by image-to-3d-bbox (https://github.com/experiencor/image-to-3d-bbox), which is an implementation of the paper "3D Bounding Box Estimation Using Deep Learning and Geometry" (https://arxiv.org/abs/1612.00496).

Instead of using KITTI's 3-D ground truth, I mainly make two supplements:
1. Compute the 3-D box center from the 2-D box and the network's output.
2. Compute theta_ray from the 2-D box center.
Besides, I made some changes to the code structure.
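Of these, theta_ray is the angle between the camera's optical axis and the ray through the 2-D box center. A minimal sketch of that geometry under a pinhole model (not the repo's exact code; fx and cx stand for the focal length and principal point in pixels, e.g. P2[0, 0] and P2[0, 2] from a KITTI calib file):

```python
import numpy as np

def compute_theta_ray(box_2D, fx, cx):
    """Angle between the optical axis and the ray through the 2-D box center.

    box_2D: [xmin, ymin, xmax, ymax]; fx, cx: focal length and principal
    point in pixels (e.g. P2[0, 0] and P2[0, 2] from a KITTI calib file).
    """
    u_center = (box_2D[0] + box_2D[2]) / 2.0
    # Pinhole model: pixel u maps to normalized x = (u - cx) / fx, so the
    # ray's angle about the camera's vertical axis is the arctangent of that.
    return np.arctan2(u_center - cx, fx)
```

A box centered on the principal point gives theta_ray = 0; boxes to the right of it give positive angles.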

At present, there are still several problems to be solved, for example:
1. The number of situations is 256 in this work, whereas it is 64 in the paper.
2. During detection, I use the objects' truncated and occluded levels from KITTI's label files to decide whether to generate a 3-D box, whereas it would be more reasonable to predict these with the trained neural network.

This is just a raw version; you are welcome to share your ideas to improve it!

Results on KITTI:
000254.jpg
000074.jpg
000154.jpg

Usage:

If you want to train, after fixing the paths in train.py, just run:

python3 train.py

In this way, you can get your own weights file; alternatively, you can download the pretrained weights from https://pan.cstcloud.cn/web/share.html?hash=7dct49xER5w
For detection, after fixing the paths in detection.py, just run:

python3 detection.py

3d_detection's People

Contributors

cersar


3d_detection's Issues

Where can I get the test dataset?

I have tried to run detection.py, but I found that there are no testing labels or calib.txt in the KITTI datasets. Can you give me a link to download them?

How long does training take?

I am training on an Azure VM with the same number of images that you provided in the dataset folder. However, each epoch shows an ETA of 11 hours. Is there any way to reduce the training time?

There is an error when I try the project

When I download your pretrained weights file and run detection.py, there is an error: "Dimension 1 in both shapes must be equal, but are 4 and 12. Shapes are [256,4] and [256,12]. for 'Assign_99' (op: 'Assign') with input shapes: [256,4], [256,12]." What should I do about this error? Thanks a lot!

Please check the way the anchors are computed

Thanks for your great work, it's been very helpful to understand 3D Detection!

One difference I found between your work and the original paper is the way the anchors are computed in def compute_anchors.
According to the paper, we need to find the residual correction angle that must be applied to the center of the bin. But it seems that you're computing the angle from each left/right boundary to the ground-truth angle. Shouldn't we take the angle from the center of the bin to the ground-truth angle?

Please share your opinions, @cersar thanks in advance!
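For reference, a minimal sketch of the paper's formulation, where the residual is measured from each bin center rather than from the bin boundaries (the even bin layout over [0, 2*pi) is an assumption for illustration, not taken from this repo's code):

```python
import numpy as np

def bin_residuals(angle, num_bins=2):
    """Residual correction angle from each bin center, per the MultiBin idea.

    angle is assumed to lie in [0, 2*pi); bins are evenly spaced.
    """
    bin_width = 2.0 * np.pi / num_bins
    centers = np.arange(num_bins) * bin_width  # bin centers
    # Wrap each difference into [-pi, pi) so every residual is the
    # smallest signed angle from the bin center to the ground truth.
    residuals = (angle - centers + np.pi) % (2.0 * np.pi) - np.pi
    return residuals
```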

What's the way to calculate the 'new_alpha'?

First, thanks for your amazing work!
I wonder, in the code:

            # line[3] is KITTI's alpha, the observation angle in [-pi, pi];
            # shift it by pi/2 ...
            new_alpha = float(line[3]) + np.pi / 2.
            if new_alpha < 0:
                new_alpha = new_alpha + 2. * np.pi
            # ... and wrap the result into [0, 2*pi)
            new_alpha = new_alpha - int(new_alpha / (2. * np.pi)) * (2. * np.pi)

Can you please explain how this calculation has been made?

Not able to detect new images

When I pass a new image that is not from the KITTI dataset, an error occurs saying it cannot find the corresponding label file.
Why do we need a label file for testing purposes? Kindly help me out, sir.

compile environment

Could you tell me the environment? Which versions of Keras, TensorFlow, and Python? Thanks!

Regarding the usage of calib.txt file data

Hi,
Thank you so much for sharing this work with us, it is remarkable.
I have a question regarding the calibration file's data.

  1. As I read the README file, they recommend using the P_rect_xx data. I am wondering why your work only uses P2; is it because the data was collected via the second camera?
  2. If I want to perform training/testing on a different camera, how should I modify the P2 data? I am not sure about the meaning of the elements of this matrix. Could you please give some suggestions based on your experience and provide a more detailed description of each element?
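For background, and not as an official answer: P2 in KITTI's calib.txt is the 3x4 projection matrix of the left color camera (camera 2), which is the camera the 2-D labels refer to; a point in that camera's coordinates is projected as below. The matrix values here are illustrative, not taken from a real calib file:

```python
import numpy as np

def project_to_image(point_3D, P):
    """Project a 3-D point in camera coordinates with a 3x4 matrix P.

    P packs the intrinsics [fx 0 cx; 0 fy cy; 0 0 1] plus a small
    baseline translation, following KITTI's calib.txt row layout.
    """
    point_h = np.append(point_3D, 1.0)   # homogeneous coordinates
    u, v, w = P @ point_h                # image-plane projection
    return np.array([u / w, v / w])      # perspective divide

# Illustrative P2-like matrix (focal 700 px, principal point (600, 180)):
P2 = np.array([[700.0, 0.0, 600.0, 0.0],
               [0.0, 700.0, 180.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])
```

A point straight ahead on the optical axis lands on the principal point, which is a quick sanity check for any replacement matrix.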

I am looking forward to your reply soon! :)

Kind Regards

Can you explain post_processing.py?

First, thanks for your great work @cersar. It's really worth following compared to other works.

I've been trying to understand your intentions and most of the code, and it's very readable and easy to follow.
One thing I have difficulty understanding is compute_center() in post_processing.py.
Can you please add comments on the core lines explaining what each one does?

I really need your help, looking forward to getting your reply.
Thanks in advance!
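In the meantime, the core of compute_center() is an over-determined linear system W x = y solved by least squares (the repo's solve_least_squre uses an SVD, as a traceback in another issue shows). The underlying technique can be sketched as follows, with an illustrative function name:

```python
import numpy as np

def solve_least_squares(W, y):
    """Least-squares solution of W x = y via the SVD pseudo-inverse.

    W: (m, n) with m >= n; minimizes ||W x - y||_2, which is how the
    3-D box center is recovered from the 2-D box constraints.
    """
    W = np.asarray(W, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    U, sigma, VT = np.linalg.svd(W, full_matrices=False)
    # x = V * diag(1/sigma) * U^T * y
    return VT.T @ ((U.T @ y) / sigma)
```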

Much larger distance error compared to the paper

Thanks for the great work @cersar; it's been so helpful for understanding the principles of 3-D vision processing.
While I've been working with your project, I've noticed that the 3-D location error is much larger than the paper reports. The x and y distance errors are relatively fine, but the z-coordinate error (forward direction) is quite different from the original paper. The paper shows about 1 m of error at 10-20 m distance and 2 m at 20-30 m, but I usually get between 2 m and 10 m of error from your code.

The result of the paper:
screenshot

The result of mine:

file:  000127.png
box_2D:  [591.44 175.51 657.28 239.12]
center:  [ 0.27  1.18 15.25]
center_gt:  [ 0.38  1.7  20.03]
dimensions:  [1.02 1.47 2.6 ]
dimensions_gt:  [1.61 1.66 3.2 ]

Could you look into this issue?

Overall, the orientation and dimension estimates seem great, and the 3-D to 2-D projection also looks right in the image, but when I check the bird's-eye view, which includes the distance information, I see a huge error between the ground truth and the estimated value.

detection.py No loop matching

Hi, thanks for your work.
When I run detection.py, I get the error below; could you give any advice?
Traceback (most recent call last):
  File "/home/3D_detection-master/detection.py", line 83, in <module>
    points2D = gen_3D_box(yaw, dims, cam_to_img, box_2D)
  File "/home/3D_detection-master/util/post_processing.py", line 112, in gen_3D_box
    center = compute_center(points3D, rot_M, cam_to_img, box_2D, inds)
  File "/home/3D_detection-master/util/post_processing.py", line 75, in compute_center
    result = solve_least_squre(W, y)
  File "/home/3D_detection-master/util/post_processing.py", line 28, in solve_least_squre
    U, Sigma, VT = np.linalg.svd(W)
  File "/home/.local/lib/python3.6/site-packages/numpy/linalg/linalg.py", line 1612, in svd
    u, s, vh = gufunc(a, signature=signature, extobj=extobj)
TypeError: No loop matching the specified signature and casting
was found for ufunc svd_n_f
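For anyone hitting this: that NumPy TypeError typically means the array passed to np.linalg.svd has a non-numeric dtype (often object, e.g. when W was assembled from mixed Python lists), and casting to float before the decomposition is the usual fix. A minimal reproduction under that assumption:

```python
import numpy as np

# An object-dtype array (e.g. built from mixed Python lists) makes
# np.linalg.svd raise "No loop matching the specified signature ...".
W = np.array([[1, 2], [3, 4]], dtype=object)
try:
    np.linalg.svd(W)
    svd_failed = False
except TypeError:
    svd_failed = True

# Casting to a floating dtype restores the expected behavior.
U, sigma, VT = np.linalg.svd(W.astype(np.float64))
```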
