simple_shot's Introduction

End-to-end Pseudo-LiDAR for Image-Based 3D Object Detection

This paper has been accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020.

by Rui Qian*, Divyansh Garg*, Yan Wang*, Yurong You*, Serge Belongie, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger and Wei-Lun Chao

Citation

@inproceedings{qian2020end,
  title={End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection},
  author={Qian, Rui and Garg, Divyansh and Wang, Yan and You, Yurong and Belongie, Serge and Hariharan, Bharath and Campbell, Mark and Weinberger, Kilian Q and Chao, Wei-Lun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5881--5890},
  year={2020}
}

Abstract

Reliable and accurate 3D object detection is a necessity for safe autonomous driving. Although LiDAR sensors can provide accurate 3D point cloud estimates of the environment, they are also prohibitively expensive for many settings. Recently, the introduction of pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stereo cameras. PL combines state-of-the-art deep neural networks for 3D depth estimation with those for 3D object detection by converting 2D depth map outputs to 3D point cloud inputs. However, so far these two networks have to be trained separately. In this paper, we introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire PL pipeline to be trained end-to-end. The resulting framework is compatible with most state-of-the-art networks for both tasks and in combination with PointRCNN improves over PL consistently across all benchmarks, yielding the highest entry on the KITTI image-based 3D object detection leaderboard at the time of submission.
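The conversion PL performs, from a 2D depth map to a 3D point cloud, is a pinhole-camera back-projection. The following is a minimal pure-Python sketch of that step, assuming known camera intrinsics (focal lengths f_u, f_v and principal point c_u, c_v). The function name and toy values are hypothetical, not taken from this repo, and the camera-to-LiDAR coordinate transform a KITTI pipeline would also apply is omitted.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_map_to_point_cloud(
    depth: List[List[float]],
    f_u: float,
    f_v: float,
    c_u: float,
    c_v: float,
) -> List[Point]:
    """Back-project a dense depth map into 3D camera coordinates.

    Each pixel (u, v) with depth z maps to
        x = (u - c_u) * z / f_u
        y = (v - c_v) * z / f_v
        z = z
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):      # v indexes image rows
        for u, z in enumerate(row):      # u indexes image columns
            if z <= 0:
                continue                 # no valid depth estimate here
            x = (u - c_u) * z / f_u
            y = (v - c_v) * z / f_v
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Toy 2x3 depth map with made-up (hypothetical) intrinsics.
    toy_depth = [
        [10.0, 10.0, 0.0],
        [20.0, 20.0, 20.0],
    ]
    cloud = depth_map_to_point_cloud(toy_depth, f_u=100.0, f_v=100.0, c_u=1.0, c_v=0.0)
    for p in cloud:
        print(p)
```

Because every output coordinate is a simple function of the predicted depth z, gradients on the points can in principle flow back to the depth network; the CoR modules address what happens after this step, when the detector re-represents the points.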

Contents

Root
    | PIXOR
    | PointRCNN

We provide end-to-end modifications of a point-cloud-based detector (PointRCNN) and a voxel-based detector (PIXOR).

The PIXOR folder contains the implementation of quantization as described in Section 3.1 of the paper, along with our own implementation of PIXOR.
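A quantization step is needed for a voxel-based detector such as PIXOR, and hard binning (assigning each point to exactly one cell) is piecewise constant in the point coordinates, so its gradient with respect to those coordinates is zero almost everywhere. The sketch below contrasts a hard bird's-eye-view occupancy grid with a Gaussian-weighted soft one. It is an illustrative assumption, not the paper's exact formulation: the 2D grid, the kernel exp(-||p - c||^2 / sigma^2), sigma, and the function names are all hypothetical, and a practical version would run in an autodiff framework and only touch cells near each point.

```python
import math
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Grid = List[List[float]]


def hard_bev_occupancy(points: Sequence[Point2D], grid_size: int, cell: float) -> Grid:
    """Hard binning: a cell is 1.0 if any point falls inside it, else 0.0.
    Piecewise constant in the point coordinates, so the gradient is zero
    almost everywhere."""
    occ = [[0.0] * grid_size for _ in range(grid_size)]
    for x, y in points:
        i, j = int(x // cell), int(y // cell)
        if 0 <= i < grid_size and 0 <= j < grid_size:
            occ[i][j] = 1.0
    return occ


def soft_bev_occupancy(
    points: Sequence[Point2D], grid_size: int, cell: float, sigma: float
) -> Grid:
    """Soft binning (illustrative): each cell sums exp(-||p - c||^2 / sigma^2)
    over all points p, where c is the cell centre. Smooth in every p."""
    occ = [[0.0] * grid_size for _ in range(grid_size)]
    for i in range(grid_size):
        for j in range(grid_size):
            cx, cy = (i + 0.5) * cell, (j + 0.5) * cell
            occ[i][j] = sum(
                math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / sigma**2)
                for x, y in points
            )
    return occ


if __name__ == "__main__":
    pts = [(0.5, 0.5)]
    moved = [(0.6, 0.5)]  # nudge the point without leaving its cell
    # The hard grid ignores the nudge; the soft grid responds smoothly.
    print(hard_bev_occupancy(pts, 2, 1.0) == hard_bev_occupancy(moved, 2, 1.0))  # True
    print(soft_bev_occupancy(pts, 2, 1.0, 0.5)[0][0], soft_bev_occupancy(moved, 2, 1.0, 0.5)[0][0])
```

The soft grid trades exact counts for smoothness: a small sigma approaches hard binning, while a larger sigma spreads each point's influence (and its gradient signal) over more cells.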

The PointRCNN folder contains the implementation of subsampling as described in Section 3.2 of the paper. It builds on Shaoshuai Shi's PointRCNN codebase.
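Unlike quantization, subsampling keeps a subset of the pseudo-LiDAR points themselves, so each kept point's coordinates stay a direct function of its pixel's predicted depth. The following is a minimal sketch, assuming uniform random selection of a fixed number of points; the helper names, the seed handling, and the sampling strategy are illustrative assumptions, and the repo may select points differently. The detail worth noting is that the selected indices are returned, so every kept point can be traced back to the pixel and depth estimate it came from.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def subsample_indices(n_total: int, n_keep: int, seed: int = 0) -> List[int]:
    """Pick n_keep distinct indices uniformly at random, in ascending order.
    If there are not more than n_keep points, keep all of them."""
    if n_total <= n_keep:
        return list(range(n_total))
    rng = random.Random(seed)  # local RNG so the selection is reproducible
    return sorted(rng.sample(range(n_total), n_keep))


def subsample(points: Sequence[T], n_keep: int, seed: int = 0) -> Tuple[List[T], List[int]]:
    """Return the kept points together with their original indices, so each
    kept point can be mapped back to the pixel whose depth produced it."""
    idx = subsample_indices(len(points), n_keep, seed)
    return [points[i] for i in idx], idx
```

For example, `subsample(cloud, 4)` on a ten-point cloud returns four points and the four positions they came from; in an autodiff framework the same index list would be used to gather from the point tensor, which keeps the operation differentiable with respect to the kept points.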

Data Preparation

This repo uses the KITTI dataset. Please download it and prepare the data in the same way as in Pseudo-LiDAR++; refer to its README for details.

Training and evaluation

Please refer to each subfolder for details.

Questions

This repo is currently maintained by Rui Qian and Yurong You. Please feel free to ask any questions.

You can reach us by opening an issue or by email: [email protected], [email protected]

simple_shot's People

Contributors

mileyan

simple_shot's Issues

Some questions about ResNet.

This is really nice work! In our own experiments, we tried using a ResNet as the backbone, just as you did. However, it did not improve performance as we expected and even performed worse than Conv-4. Are there any tricks you used when training your model with a ResNet? Thank you very much.

The idea of SimpleShot is almost the same as that of ProtoNet

ProtoNet also uses the mean of each support class as its center and trains the network with Euclidean distances. Is the only difference between SimpleShot and ProtoNet that SimpleShot applies normalization and centering to the feature vectors before computing the Euclidean distance?
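The nearest-centroid rule discussed in this issue can be sketched in a few lines: average the (transformed) support features of each class and assign the query to the closest centroid by Euclidean distance. The sketch below uses the three feature transforms that appear in the result tables on this page (UN = unnormalized, L2N = L2-normalized, CL2N = centered, then L2-normalized). Function names are hypothetical, and the centering mean is an explicit argument; which features it should be computed from is left to the caller.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def transform(x: Sequence[float], mode: str, mean: Optional[Sequence[float]] = None) -> Vector:
    """UN: unchanged; L2N: L2-normalize; CL2N: subtract `mean`, then L2-normalize."""
    v = list(x)
    if mode == "CL2N":
        if mean is None:
            raise ValueError("CL2N needs a centering mean")
        v = [a - m for a, m in zip(v, mean)]
    if mode in ("L2N", "CL2N"):
        norm = math.sqrt(sum(a * a for a in v)) or 1.0  # guard against a zero vector
        v = [a / norm for a in v]
    return v


def nearest_centroid(
    support: Dict[str, List[Sequence[float]]],
    query: Sequence[float],
    mode: str = "UN",
    mean: Optional[Sequence[float]] = None,
) -> Optional[str]:
    """Return the label whose mean transformed support feature is closest
    (Euclidean distance) to the transformed query."""
    q = transform(query, mode, mean)
    best_label, best_dist = None, math.inf
    for label, feats in support.items():
        vs = [transform(f, mode, mean) for f in feats]
        centroid = [sum(col) / len(vs) for col in zip(*vs)]
        dist = math.dist(q, centroid)  # Python 3.8+
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

For instance, with support features `{"a": [[11.0, 10.0]], "b": [[10.0, 11.0]]}` and a centering mean of `[10.0, 10.0]`, CL2N assigns the query `[12.0, 9.0]` to `"a"` and `[9.0, 12.0]` to `"b"`. The classifier itself is the same in all three modes; only the feature transform changes.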

Inconsistent --enlarge setting between validation and test

For tieredImageNet, I noticed that during the validation step of training, args.enlarge is False/None because the training configs do not include '--enlarge'. However, the evaluation/testing command lines shown in the README do include '--enlarge'. Is this a typo, or is it intentional?

Thanks!

ResNet-50 performance

Thank you for your great work!

I am trying to rerun the models and check their performance. However, I noticed that the models with the ResNet-50 backbone perform significantly worse than expected. Below you can find the code output. Do you think this result is expected?

Best

Meta Test: LAST
Feature      UN               L2N              CL2N
GVP_1Shot    0.5127 (0.0020)  0.5303 (0.0020)  0.5161 (0.0020)
GVP_5Shot    0.7143 (0.0018)  0.7467 (0.0016)  0.7165 (0.0018)

Performance discrepancy when training from scratch w/ PyTorch 1.4

Hi,
Thanks for the nice work and for sharing the code.

I tried replicating the results of the paper (training from scratch), but with no luck.
I followed the README instructions and tried PyTorch 1.4 with CUDA toolkit 10.0.
With ResNet-10 and ResNet-18 on miniImageNet, my results differ from the reported ones by 2% to 3% (absolute).

Thanks in advance for the help!

Nice work and some questions

Congratulations on this very effective work. I have a few questions and would appreciate your advice when convenient.

  1. Why subtract only the mean of the gallery? Have you tried subtracting the mean of the queries, or of the gallery and queries combined?
  2. Why does CL2N not perform better than L2N on meta-iNat? Have you analyzed the underlying reasons?

Thanks!

The introduction part is so weird

First of all, thanks for the interesting work. The experimental results are still competitive with current SOTA methods.
However, the introduction confuses me and makes me doubt the results. For example, in paragraph 3, you mention 'Prior studies suggest that using meta-learning outperforms "vanilla" nearest neighbor classification [26, 30]'. In fact, references 26 and 30 are exactly the methods that use meta-learning, as they tackle the FSL problem within a meta-learning setting. Also, in paragraph 4, you mention that your method achieves SOTA performance without using meta-learning, which is the strangest part, because you are already using meta-learning when you use the N-way K-shot setting.

Is it possible to use simple_shot on CPU only?

Unfortunately, the evaluation command fails on my MacBook:

python ./src/train.py -c ./configs/mini/softmax/conv4.config --evaluate --enlarge

Traceback (most recent call last):
  File "./src/train.py", line 554, in <module>
    main()
  File "./src/train.py", line 51, in main
    model = torch.nn.DataParallel(model).cuda()
  File "/Users/seb/.virtualenvs/kiss/lib/python3.7/site-packages/torch/nn/modules/module.py", line 305, in cuda
[...]
  File "/Users/seb/.virtualenvs/kiss/lib/python3.7/site-packages/torch/cuda/__init__.py", line 95, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
