stream3dppe's Introduction

StreamPETR with 3dppe Extension

Introduction

This repository is an implementation of StreamPETR with 3dppe.


Getting Started

  1. Prepare the nuScenes dataset and generate the 2D annotations and temporal information for training and evaluation (see StreamPETR).

  2. Conda env

conda create -n xxx python=3.8 -y
conda activate xxx
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

pip install flash-attn==0.2.2  # not compatible with Tesla V100

pip install mmcv-full==1.6.0
pip install mmdet==2.28.2
pip install mmsegmentation==0.30.0
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v1.0.0rc6 
pip install -v -e .

Note: make sure that numba 0.53.0 and numpy 1.23.5 are installed
(if not, reinstall numba==0.53.0).
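
The version check in the note above can be scripted. The following is a minimal sketch, not part of the repository; the helper names `installed_versions` and `find_mismatches` are hypothetical, and the expected versions are simply the ones stated in the note.

```python
from importlib import metadata

# Versions taken from the note above.
EXPECTED = {"numba": "0.53.0", "numpy": "1.23.5"}


def installed_versions(names):
    """Return {package: version or None} for the given package names."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None  # package is not installed
    return found


def find_mismatches(installed, expected=EXPECTED):
    """Return {package: (installed, expected)} for every version that differs."""
    return {
        name: (installed.get(name), want)
        for name, want in expected.items()
        if installed.get(name) != want
    }


mismatches = find_mismatches(installed_versions(EXPECTED))
for name, (have, want) in mismatches.items():
    print(f"{name}: found {have}, expected {want}")
```

If nothing is printed, both packages match the versions listed in the note.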

Directory layout (output of tree -d -L 1):

.
├── ckpts
├── data
├── mmdetection3d
├── projects
└── tools
  3. Train & Infer
tools/dist_train.sh [-config] [-num_gpus]
tools/dist_test.sh [-config] [-model] [-num_gpus] --eval bbox

Results on nuScenes Val Set

| Model | Setting | Pretrain | Lr Schd | Training Time | NDS | mAP | Config | Download |
|---|---|---|---|---|---|---|---|---|
| StreamPETR | V2-99-900q-800x320 | FCOS3D | 24ep | 13h | 57.1 | 48.3 | config | model/log |
| Stream3dppe | V2-99-900q-800x320 | FCOS3D | 24ep | 16h | 58.45/58.45 | 49.95/50.04 | config | (model1,model2)/(log1,log2) |
| Stream3dppe_gt_detph | V2-99-900q-800x320 | FCOS3D | 24ep | 22h | 61.7 | 55.3 | config | model/log |
| StreamPETR | V2-99-900q-1600x640 | FCOS3D | 24ep | | | | | |
| Stream3DPPE | V2-99-900q-1600x640 | FCOS3D | 24ep | | | | | |

Note: Stream3dppe is trained on 4 x RTX 3090 with batch size 4, while Stream3dppe_gt_detph is trained on 4 x RTX 2080Ti with batch size 2.

For more results, please refer to https://github.com/drilistbox/3DPPE.


Acknowledgement

Many thanks to the authors of PETR and StreamPETR.


Citation

If you find this project useful for your research, please consider citing:

@article{shu20233DPPE,
  title={3DPPE: 3D Point Positional Encoding for Multi-Camera 3D Object Detection Transformers},
  author={Shu, Changyong and Deng, Jiajun and Yu, Fisher and Liu, Yifan},
  journal={arXiv preprint arXiv:2211.14710},
  year={2023}
}

stream3dppe's Issues

About Large Resolution

Hi, have you run experiments at a larger resolution, such as 1408*512? I always get a NaN loss at around 10000 iterations. Should I modify any hyper-parameters for depth supervision?

About '3D Point PE Encoder'

It seems share_pe_encoder does not work: self.query_embedding is redefined afterwards, outside the if/else block (line 168 vs. lines 190-194).

    if self.share_pe_encoder:
        position_encoder = nn.Sequential(
            nn.Linear(self.embed_dims*3//2, self.embed_dims),
            nn.ReLU(),
            nn.Linear(self.embed_dims, self.embed_dims),
        )
        if self.with_position:
            self.position_encoder = position_encoder
        self.query_embedding = position_encoder
    else:
        if self.with_position:
            # self.position_dim = 3 * self.depth_num # D*3 3:(x, y, z)
            self.position_encoder = nn.Sequential(
                nn.Linear(self.embed_dims*3//2, self.embed_dims),
                nn.ReLU(),
                nn.Linear(self.embed_dims, self.embed_dims),
            )
        self.query_embedding = nn.Sequential(
            nn.Linear(self.embed_dims*3//2, self.embed_dims),
            nn.ReLU(),
            nn.Linear(self.embed_dims, self.embed_dims),
        )
    if self.init_query is not None:
        ref_points = torch.from_numpy(np.load(self.init_query))
    else:
        ref_points = None
    self.reference_points = nn.Embedding(self.num_query, 3, _weight=ref_points)
    if self.num_propagated > 0:
        self.pseudo_reference_points = nn.Embedding(self.num_propagated, 3)
    self.query_embedding = nn.Sequential(
        nn.Linear(self.embed_dims*3//2, self.embed_dims),
        nn.ReLU(),
        nn.Linear(self.embed_dims, self.embed_dims),
    )
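
The effect described in this issue can be reproduced in isolation. The sketch below is not the repository's head; `ToyHead`, `make_encoder`, `is_shared` and the `rebind_afterwards` flag are hypothetical names used only to show that a later, unconditional assignment to `query_embedding` replaces the shared module.

```python
import torch.nn as nn


def make_encoder(embed_dims: int) -> nn.Sequential:
    # Same MLP shape as in the quoted snippet.
    return nn.Sequential(
        nn.Linear(embed_dims * 3 // 2, embed_dims),
        nn.ReLU(),
        nn.Linear(embed_dims, embed_dims),
    )


class ToyHead(nn.Module):
    """Hypothetical stand-in for the head; only the PE layers are modelled."""

    def __init__(self, embed_dims: int = 8, share_pe_encoder: bool = True,
                 rebind_afterwards: bool = True):
        super().__init__()
        if share_pe_encoder:
            shared = make_encoder(embed_dims)
            self.position_encoder = shared
            self.query_embedding = shared
        else:
            self.position_encoder = make_encoder(embed_dims)
            self.query_embedding = make_encoder(embed_dims)
        if rebind_afterwards:
            # Mirrors the unconditional assignment at the end of the snippet:
            # the attribute now points at a fresh, unshared module.
            self.query_embedding = make_encoder(embed_dims)


def is_shared(head: ToyHead) -> bool:
    # True only if both attributes refer to the very same module object.
    return head.query_embedding is head.position_encoder


print(is_shared(ToyHead(rebind_afterwards=True)))   # False
print(is_shared(ToyHead(rebind_afterwards=False)))  # True
```

Under these assumptions, sharing only survives when the final assignment is skipped, for example by guarding it with the same flag.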

The difference between the models

Thank you for making the code public.
I have a question regarding the difference between the models used (StreamPETR, Stream3dppe, Stream3dppe_gt_detph).
As far as I understand, in StreamPETR the 3D PE just represents a camera ray for each pixel.

  1. Does Stream3dppe use depth estimated online from the mono image?
  2. Does Stream3dppe_gt_detph use preprocessed depth information? If yes, are there any changes that have to be made to the dataset?
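
To make the distinction in this question concrete, here is a minimal sketch, not taken from the repository, of the geometry behind the two kinds of positional encoding: a ray-style encoding considers several candidate depths along a pixel's camera ray, while a point-style encoding uses a single depth per pixel. The intrinsics `K`, the pixel coordinates and the depth values are made-up numbers, and `pixel_to_camera_points` is a hypothetical helper. Where the single depth comes from (an online estimate or preprocessed data) is exactly what the question asks and is not settled here.

```python
import numpy as np


def pixel_to_camera_points(u, v, depths, K):
    """Back-project pixel (u, v) to camera-frame 3D points at the given depths."""
    pixel = np.array([u, v, 1.0])
    ray = np.linalg.inv(K) @ pixel  # ray direction scaled so that z == 1
    depths = np.asarray(depths, dtype=float)
    return depths[:, None] * ray[None, :]


# Hypothetical pinhole intrinsics for an 800x320 image.
K = np.array([[1000.0, 0.0, 400.0],
              [0.0, 1000.0, 160.0],
              [0.0, 0.0, 1.0]])

# Ray-style: several candidate depths along the same pixel ray.
ray_points = pixel_to_camera_points(500, 160, np.linspace(1.0, 61.0, 4), K)

# Point-style: one depth value, hence one 3D point for the pixel.
point = pixel_to_camera_points(500, 160, [20.0], K)

print(ray_points.shape)  # (4, 3)
print(point)
```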

Performance is unstable.

When I use batch_size=2, num_gpus=8 for training stream3dppe, the performance is very low (~55 NDS), and when I use SyncBN the performance is still low (~57 NDS).
When I use batch_size=4, num_gpus=4, I can reproduce the result (58.37 NDS).
I am confused. Why does the performance depend so heavily on samples_per_gpu?
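
For reference, the arithmetic behind the two runs can be written out as below. This is only a sketch and assumes that batch_size in this report means samples_per_gpu; `effective_batch_size` is a hypothetical helper. Both runs then have the same total batch size and differ only in the per-GPU batch, which is the batch that BatchNorm layers see when SyncBN is not used. Whether that explains the gap is not established here.

```python
def effective_batch_size(samples_per_gpu: int, num_gpus: int) -> int:
    """Total number of samples per optimizer step across all GPUs."""
    return samples_per_gpu * num_gpus


# (samples_per_gpu, num_gpus) for the two runs described above.
runs = {"8 GPUs x 2": (2, 8), "4 GPUs x 4": (4, 4)}

for name, (samples_per_gpu, num_gpus) in runs.items():
    total = effective_batch_size(samples_per_gpu, num_gpus)
    print(f"{name}: total batch {total}, per-GPU batch {samples_per_gpu}")
```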

About no context network.

Hi, thanks for sharing your work!
Can you tell me why you do not use a context network, as 3dppe does in PETR? Was it abandoned due to poor performance? If so, what do you think is the main reason for the poor performance?
Looking forward to your reply, thanks!
