
3d-multi-object-tracker's Introduction

Hi! I'm a PhD candidate at Xiamen University! Welcome to my GitHub page 👋.


  • (2022/11 - 2023/3) VirConv ranks 1st on KITTI 3D/2D/BEV detection benchmark
  • (2022/11 - 2023/1) VirConvTrack ranks 1st on KITTI tracking benchmark
  • (2022/5 - 2022/11) TED ranks 1st on KITTI 3D detection benchmark
  • (2022/9 - 2022/11) CasTrack ranks 1st on KITTI tracking benchmark
  • (2021/11 - 2022/5) CasA attains SOTA on KITTI 3D detection benchmark

3d-multi-object-tracker's People

Contributors

hailanyi


3d-multi-object-tracker's Issues

How can I use other detection networks?

Hi @hailanyi, you have provided the point-rcnn, second-iou and pv-rcnn detections. However, I want to use another detector, so I trained a different detection network with OpenPCDet and obtained .txt files, but they do not match the format of the detections above.

Specifically, the detections you provide follow the KITTI tracking dataset format, whereas the results I get from OpenPCDet follow the KITTI detection dataset format (no per-scene split, and the frames are not contiguous). How can I use other detection networks and get results in the format you provide?

Looking forward to your reply, thanks!
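
One way to bridge the two formats, sketched below, is to regroup the per-frame KITTI-detection-format files into one file per tracking sequence and prepend the frame index to every line. The mapping file from detection sample index to (sequence, frame) and the exact output layout are assumptions for illustration, not the repo's confirmed input format:

    import os
    from collections import defaultdict

    def regroup_detections(det_dir, mapping_file, out_dir):
        """mapping_file (hypothetical): one line per sample, "sample_idx sequence frame",
        e.g. "000123 0004 000056"; build it from your own KITTI tracking split."""
        os.makedirs(out_dir, exist_ok=True)
        per_seq = defaultdict(list)
        with open(mapping_file) as f:
            for line in f:
                sample_idx, seq, frame = line.split()
                det_file = os.path.join(det_dir, sample_idx + ".txt")
                if not os.path.isfile(det_file):
                    continue
                with open(det_file) as df:
                    for det in df:
                        det = det.strip()
                        if det:
                            # prepend the frame index so every line carries its frame
                            per_seq[seq].append(f"{int(frame)} {det}")
        for seq, lines in per_seq.items():
            with open(os.path.join(out_dir, seq + ".txt"), "w") as out:
                out.write("\n".join(lines) + "\n")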

License?

Hello @hailanyi

Amazing work with this project! (and the visualizer + CasA)

I noticed you didn't provide any license.
I'd like to feature some of your work in a course I'm building about 3D computer vision. Is it OK if I do so?

Thank you!
Jeremy

evaluation on testing data

Your great work has helped me a lot, but I don't know how to run your code on the test data. It would be great if you could write a guide when it's convenient for you.

visualization

Hello, I would like to ask how you produce the visualization of the tracking results, for example something similar to demo.gif. Thanks!
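
For a rough idea of how such a visualization can be produced (a generic matplotlib bird's-eye-view sketch, not the repo's actual visualizer), each tracked box can be drawn as a rotated rectangle colored by track ID, one image per frame, with the frames assembled into a GIF afterwards:

    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.patches import Polygon

    def draw_bev_boxes(boxes, track_ids, ax=None):
        """boxes: (N, 5) array of [x, y, length, width, yaw] in lidar BEV coordinates
        (assumed layout -- adapt it to whatever the tracker outputs)."""
        ax = ax or plt.gca()
        for (x, y, l, w, yaw), tid in zip(boxes, track_ids):
            # axis-aligned corners, rotated by yaw and translated to (x, y)
            corners = np.array([[l/2, w/2], [l/2, -w/2], [-l/2, -w/2], [-l/2, w/2]])
            rot = np.array([[np.cos(yaw), -np.sin(yaw)], [np.sin(yaw), np.cos(yaw)]])
            corners = corners @ rot.T + np.array([x, y])
            color = plt.cm.tab20(int(tid) % 20)
            ax.add_patch(Polygon(corners, closed=True, fill=False, edgecolor=color))
            ax.text(x, y, str(tid), color=color, fontsize=8)
        ax.set_aspect("equal")
        ax.autoscale_view()

    # Example frame with two tracked boxes
    draw_bev_boxes(np.array([[10.0, 2.0, 4.5, 1.8, 0.3],
                             [20.0, -3.0, 4.2, 1.7, -0.1]]), [1, 2])
    plt.show()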

Unnormalized detection scores: hard to reproduce reasonable results

Hi @hailanyi! Thank you for sharing such a great method for 3D MOT! However, I noticed that the scores of the detection results you provide are unnormalized. I attempted to replicate your results using the same detectors (such as CasA, PVRCNN, etc.) trained by myself, and I set "OUTPUT_RAW_SCORE=True" to save them in the same .txt format.
Despite this, my MOT results are significantly worse than those achieved with your provided detections. It seems the experimental results are not reproducible with the same setup unless your method can handle normalized detection scores as input. 😢
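
If only normalized scores are available, one possible workaround is to map them back through the inverse sigmoid before writing the .txt files. This assumes the provided raw scores really are pre-sigmoid logits, which is an assumption rather than something confirmed by the author:

    import numpy as np

    def to_raw_score(p, eps=1e-6):
        # inverse sigmoid (logit); only meaningful if the raw scores are logits
        p = np.clip(p, eps, 1.0 - eps)
        return np.log(p / (1.0 - p))

    print(to_raw_score(np.array([0.1, 0.5, 0.9])))  # ~[-2.197, 0.0, 2.197]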

How to deal with the results output by the OpenPCDet framework?

{'name': array(['Car', 'Car', 'Cyclist', 'Car', 'Car', 'Car', 'Cyclist',
'Pedestrian', 'Pedestrian', 'Cyclist'], dtype='<U10'), 'truncated': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]), 'occluded': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]), 'alpha': array([-4.515572 , -4.082801 , -8.211994 , -4.2358727, -6.37723 ,
-4.212622 , -7.653681 , -7.828194 , -3.5104022, -7.7533026],
dtype=float32), 'bbox': array([[ 390.46756, 182.59967, 420.16034, 202.18625],
[ 93.72412, 207.40984, 192.874 , 245.46086],
[ 676.26434, 162.27713, 691.895 , 190.43402],
[ 207.97661, 185.72922, 272.2654 , 217.085 ],
[ 771.51385, 172.55157, 896.3316 , 212.16136],
[ 217.41855, 188.42061, 271.71057, 213.47649],
[ 338.7873 , 181.46661, 348.29086, 203.7352 ],
[ 953.0622 , 152.42534, 984.1048 , 240.13443],
[1197.7362 , 145.42426, 1241. , 231.93535],
[ 214.69026, 177.82062, 222.83852, 203.2955 ]], dtype=float32), 'dimensions': array([[3.8072727 , 1.4813445 , 1.6003532 ],
[3.52273 , 1.3748183 , 1.8758411 ],
[1.6750405 , 1.752761 , 0.42281026],
[3.6912992 , 1.555141 , 1.5810864 ],
[4.483073 , 1.4696931 , 1.6517544 ],
[3.673051 , 1.4601415 , 1.5243837 ],
[1.5081377 , 1.6859876 , 0.39957428],
[0.8074811 , 1.77992 , 0.5690844 ],
[0.60090435, 1.7681558 , 0.5539663 ],
[1.5440738 , 1.6429834 , 0.31740588]], dtype=float32), 'location': array([[-16.576267 , 2.297428 , 58.48187 ],
[-20.094868 , 2.954785 , 31.181479 ],
[ 4.6577272, 1.0940528, 45.77933 ],
[-20.028645 , 2.286567 , 39.15602 ],
[ 8.54361 , 1.4581716, 28.02376 ],
[-23.375515 , 2.4971836, 46.22146 ],
[-20.705027 , 2.3634393, 56.004333 ],
[ 7.4708343, 1.3650522, 15.131747 ],
[ 12.740014 , 1.2072334, 15.104981 ],
[-25.844282 , 1.9757288, 47.61044 ]], dtype=float32), 'rotation_y': array([-4.790928 , -4.6516724, -8.111552 , -4.706223 , -6.0847206,
-4.678815 , -8.006577 , -7.3778024, -2.8195329, -8.248504 ],
dtype=float32), 'score': array([0.62366086, 0.5092147 , 0.45878437, 0.4128577 , 0.40351897,
0.3037016 , 0.15691371, 0.12470149, 0.10985158, 0.10319255],
dtype=float32), 'boxes_lidar': array([[ 58.771694 , 16.60492 , -0.842473 , 3.8072727 ,
1.6003532 , 1.4813445 , 3.2201314 ],
[ 31.478838 , 20.126875 , -1.8011765 , 3.52273 ,
1.8758411 , 1.3748183 , 3.0808759 ],
[ 46.06226 , -4.6421843 , 0.13941461, 1.6750405 ,
0.42281026, 1.752761 , 6.540756 ],
[ 39.445976 , 20.054586 , -0.960226 , 3.6912992 ,
1.5810864 , 1.555141 , 3.1354265 ],
[ 28.312376 , -8.526211 , -0.5928153 , 4.483073 ,
1.6517544 , 1.4696931 , 4.513924 ],
[ 46.512444 , 23.404373 , -1.1091216 , 3.673051 ,
1.5243837 , 1.4601415 , 3.1080184 ],
[ 56.294018 , 20.733837 , -0.788435 , 1.5081377 ,
0.39957428, 1.6859876 , 6.4357805 ],
[ 15.419843 , -7.4560823 , -0.46799874, 0.8074811 ,
0.5690844 , 1.77992 , 5.807006 ],
[ 15.392667 , -12.72664 , -0.37202018, 0.60090435,
0.5539663 , 1.7681558 , 1.2487365 ],
[ 47.89532 , 25.867668 , -0.45570797, 1.5440738 ,
0.31740588, 1.6429834 , 6.6777077 ]], dtype=float32), 'frame_id': '000001'}

The output is shown above. I hope to get your help, thank you.
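
A minimal sketch of turning such a result dict into KITTI-style label lines is below. The dimension ordering (KITTI labels expect h, w, l) and whether the tracker wants raw or normalized scores are assumptions that should be checked against the detection files the author provides:

    def dump_kitti_label_lines(result_dict, out_path):
        """Write one OpenPCDet result dict (like the one printed above) as
        KITTI-style label lines, one object per line."""
        with open(out_path, "w") as f:
            for i, name in enumerate(result_dict["name"]):
                bbox = result_dict["bbox"][i]
                dims = result_dict["dimensions"][i]   # verify (l, h, w) vs (h, w, l)
                loc = result_dict["location"][i]
                f.write(f"{name} 0 0 {result_dict['alpha'][i]:.4f} "
                        f"{bbox[0]:.2f} {bbox[1]:.2f} {bbox[2]:.2f} {bbox[3]:.2f} "
                        f"{dims[0]:.3f} {dims[1]:.3f} {dims[2]:.3f} "
                        f"{loc[0]:.3f} {loc[1]:.3f} {loc[2]:.3f} "
                        f"{result_dict['rotation_y'][i]:.4f} "
                        f"{result_dict['score'][i]:.4f}\n")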

a request

I'm glad the author wrote "A guide to submit results to KITTI test". That guide covers the 3D method. Now I want to know how to submit 2D results to KITTI. Can you help me?

Is this method online MOT?

Hi, thanks for your interesting work!
In your code, before the results are saved, the post-processing seems to filter out all trajectories whose score is lower than config.avg_score. In my understanding, this step makes your method unable to run online.
Looking forward to your reply.
Thanks.
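
As a concrete illustration of why that step is offline (this paraphrases the behavior described in the question, not the repo's exact code): the average score of a trajectory is only known once the trajectory has ended, so the filter has to run as a post-processing pass:

    def filter_tracks_by_avg_score(tracks, avg_score_thresh):
        """tracks: dict mapping track_id -> list of per-frame confidence scores.
        Keeps only tracks whose mean score reaches the threshold; this requires
        the full track history, hence offline post-processing."""
        return {tid: scores for tid, scores in tracks.items()
                if sum(scores) / max(len(scores), 1) >= avg_score_thresh}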

trackeval.utils.TrackEvalException: no seqmap found: evaluate_tracking.seqmap.val

I prepared the data and ran kitti_3DMOT.py according to readme.md, and the following error was thrown. Did I miss something?

Traceback (most recent call last):
  File "/home/stardust/yrb/3D-Multi-Object-Tracker/kitti_3DMOT.py", line 167, in <module>
    tracking_val_seq(args)
  File "/home/stardust/yrb/3D-Multi-Object-Tracker/kitti_3DMOT.py", line 160, in tracking_val_seq
    eval_kitti()
  File "/home/stardust/yrb/3D-Multi-Object-Tracker/evaluation_HOTA/scripts/run_kitti.py", line 80, in eval_kitti
    dataset_list = [trackeval.datasets.Kitti2DBox(dataset_config)]
  File "/home/stardust/yrb/3D-Multi-Object-Tracker/evaluation_HOTA/trackeval/datasets/kitti_2d_box.py", line 71, in __init__
    raise TrackEvalException('no seqmap found: ' + os.path.basename(seqmap_file))
trackeval.utils.TrackEvalException: no seqmap found: evaluate_tracking.seqmap.val

On line 70, the missing seqmap_file was pointing to /home/stardust/yrb/3D-Multi-Object-Tracker/evaluation_HOTA/../evaluation/data/tracking/evaluate_tracking.seqmap.val
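
If the seqmap file is simply missing, one workaround is to generate it yourself. The sketch below assumes the KITTI tracking devkit seqmap layout (sequence name, the literal word "empty", first frame, frame count) and derives the frame count from the label files; both assumptions are worth verifying against the files shipped under evaluation/data/tracking:

    import os

    def write_seqmap(seq_ids, data_root, out_path):
        """seq_ids: your own validation split, e.g. ["0001", "0006", ...] (assumption).
        data_root: KITTI tracking root containing label_02/<seq>.txt files."""
        with open(out_path, "w") as f:
            for seq in seq_ids:
                label_file = os.path.join(data_root, "label_02", seq + ".txt")
                with open(label_file) as lf:
                    # frame count = last frame index + 1 (first column of each label line)
                    n_frames = max(int(line.split()[0]) for line in lf if line.strip()) + 1
                # devkit-style line: "<seq> empty <first_frame> <n_frames>"
                f.write(f"{seq} empty {0:06d} {n_frames:06d}\n")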

Cannot understand the idea of weight decay

The paper states that previously missed objects should have a larger search space so that they can be matched more easily.

I think this means those objects should have smaller association costs. But in your code:

(1) In the state prediction step, the association weights (prediction scores) of previously missed objects are DECAYED LESS (1:15) than those of previously updated objects. This means that previously missed objects will have larger association costs and are harder to match.

(2) In the state update step, detection confidences are ADDED to the association weights (prediction scores). This means that objects with higher detection scores will have larger association costs. In addition, the weights of previously missed objects are ADDED LESS (1:10) than those of previously updated objects. This is the opposite of the weighting strategy in (1).

Your explanation of these two parts will be much appreciated. : )
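
To make the question concrete, here is a small paraphrase of the two rules exactly as described above (the 1:15 and 1:10 ratios are taken from this issue's reading of the code, not verified against the repo):

    def predict_confidence(conf, missed, decay=0.1):
        # (1) prediction step: a previously missed track is decayed less
        # (here 1/15 of the decay applied to previously updated tracks)
        return conf - (decay / 15.0 if missed else decay)

    def update_confidence(conf, det_score, missed):
        # (2) update step: the detection score is added, but a previously missed
        # track receives a smaller share of it (here 1/10)
        return conf + (det_score / 10.0 if missed else det_score)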

Predict 5 next trajectories

Hi @hailanyi, thanks for your great work.

  • I would like to try this work with these features:
  1. Estimate the next 5 trajectory points of each object to find its future path
  2. Calculate the velocity of each object
    Could you point me to the relevant keywords in your code? Thank you. (A generic sketch of both is given below.)
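
A generic constant-velocity sketch of both requests follows; the state layout is an assumption and has to be mapped onto whatever state the tracker's Kalman filter actually keeps:

    import numpy as np

    def predict_future(state, dt=0.1, steps=5):
        """state: [x, y, z, vx, vy, vz] (assumed layout).
        Returns the next `steps` predicted positions and the current speed."""
        F = np.eye(6)
        F[0, 3] = F[1, 4] = F[2, 5] = dt          # x_{k+1} = x_k + v_k * dt
        preds, s = [], np.asarray(state, dtype=float)
        for _ in range(steps):
            s = F @ s
            preds.append(s[:3].copy())
        speed = float(np.linalg.norm(np.asarray(state, dtype=float)[3:6]))  # m/s if meters
        return np.stack(preds), speed

    future_xyz, speed = predict_future([10.0, 2.0, -0.8, 5.0, 0.2, 0.0])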

Tracking speed for custom dataset

Thanks for your great work!

I tried to apply your work to my dataset, but the tracking speed seems too fast, reaching 689 FPS. I don't know how to adjust the parameters to address this. If you could answer my question, I would greatly appreciate it.

VirConv detections

Could you provide a download link for the VirConv model trained on the KITTI 3D detection 'trainval' set? The VirConv model zoo only includes the model trained on the 'train' set.

I have read the VirConv paper and code. The data preprocessing for the VirConv detector seems quite complicated. Could you also provide a simple guide or demo code on how to generate the input data and detection results for the KITTI tracking dataset?

appearance cost features

Hey,
Thanks for open-sourcing this awesome work. In your paper, you discuss using object appearance features in the appearance cost. Can you explain how you obtain those features?

  Thanks

Training

Can you tell me how to train the tracker?

Why is the source code different from the published paper?

  1. In your paper, three costs (appearance cost, motion cost and geometry cost) are used. But in the source code, it seems that only position differences are used:
    dis = (all_detections[...,0:3]-all_predictions[...,0:3])**2
    Does this mean the other costs are actually not so important for tracking?

  2. The prediction confidence is not updated as described in the paper (Equation 19). Why?

So I wonder whether I can get the same performance described in the README with this code. If not, could you find some time to update the code?

Thank you for your good method, and I am looking forward to your reply.
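
For readers comparing paper and code, a generic NumPy version of the kind of position-only cost the quoted line computes (illustrative only, not the repo's exact code) looks like this:

    import numpy as np

    def position_cost(detections, predictions):
        """Squared-distance cost between N detections and M predicted tracks,
        using only the first three state entries (x, y, z)."""
        d = detections[:, None, 0:3] - predictions[None, :, 0:3]   # (N, M, 3)
        return (d ** 2).sum(axis=-1)                               # (N, M) cost matrix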

The detection scores you provide are not normalized?

I find that the detection scores you use are not normalized (not in the range [0, 1]). From which layer or where did you extract those scores? Are they the logits before the softmax?

Car -1 -1 -8.0728 299.5648 166.8018 457.8472 299.3745 2.0278 1.8372 4.4899 -4.5203 1.9349 13.4348 -8.3923 6.8229
Cyclist -1 -1 -8.3277 743.3280 157.8975 945.0159 374.0000 1.7928 0.7382 1.7908 1.6935 1.6931 5.7445 -8.0567 6.3583
Pedestrian -1 -1 -8.9000 1086.0386 169.0511 1223.0736 325.3153 1.7003 0.7000 1.0835 6.3164 1.6586 8.4887 -8.2776 3.9818
Car -1 -1 -2.0002 0.0000 198.3879 22.4742 238.5690 1.4643 1.6507 3.9074 -26.4072 2.5741 29.8174 -2.7207 -0.4804
Car -1 -1 -7.6264 284.9278 190.2203 347.9479 230.2715 1.5499 1.6856 4.1383 -12.9598 2.3717 31.9821 -8.0090 -0.7022
Car -1 -1 -4.6372 136.7996 189.1622 176.2292 219.5773 1.5594 1.6092 4.0767 -26.0653 2.5465 41.5085 -5.1953 -1.3563
Car -1 -1 -2.5441 0.0000 195.0718 83.9934 227.2724 1.4837 1.6807 3.8615 -28.5618 2.6095 35.5869 -3.2169 -2.0098
Car -1 -1 -4.4626 347.0251 177.7920 392.8701 208.4988 1.6766 1.6258 3.8540 -13.9505 1.9773 42.0110 -4.7817 -2.1874

pose.txt

After reading your article on 3D MOT and your code, I have a question: how did you generate the pose file?

Some questions about the evaluation of experiment results in your paper, especially related to AB3DMOT

Thanks for your great work, but I still have a few questions about the experiment part of the paper.

  1. In Table II, the results are obtained on the KITTI training data. How did you get the AB3DMOT results? Did you evaluate them yourself by running the open-source AB3DMOT code?

  2. If the answer to question 1 is yes, did you still project the 3D tracking results onto the 2D image plane to get the evaluation result?

  3. For every tracked object, its bounding-box IoU with the ground truth must be above a threshold to count as a successful match. What IoU threshold was used in the experiments in Table II? Was it 0.5?

  4. Is the MOTA result in Table II the average over all 21 sequences of the training dataset, or only over some of those 21 sequences?

  5. If the answer to question 2 is yes, have you evaluated the 3D MOTA with different 3D IoU thresholds (0.25, 0.5, 0.7) on the training sequences, as in the AB3DMOT paper, and compared against the AB3DMOT results reported there?

I am looking forward to your reply, thank you very much!

The pose file is obtained from the oxts files provided by the KITTI tracking dataset using the official MATLAB tools.

    The pose file is obtained from the oxts files provided by the KITTI tracking dataset using the official MATLAB tools.

Originally posted by @hailanyi in #11 (comment)

Thank you for your great work! I have a question about the pose.txt you provided:

Using the official MATLAB tools directly only seems to give the pose in the IMU/GPS coordinate system, so did you convert it to the lidar coordinate system? I found that your code seems to use only the extrinsics between lidar and camera, i.e. velo_to_cam, and not the extrinsics between lidar and IMU.

So, is the pose.txt you provided in the lidar coordinate system or in the IMU/GPS coordinate system?
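
For reference, if the pose were in the IMU/GPS frame, converting it to a lidar-frame pose would look roughly like the sketch below. It assumes Tr_imu_velo maps IMU coordinates to velodyne coordinates, as in the KITTI tracking calib files; both conventions are worth double-checking:

    import numpy as np

    def imu_pose_to_lidar_pose(T_w_imu, Tr_imu_velo):
        """T_w_imu: 4x4 pose of the IMU in the world frame (from the oxts/devkit tools).
        Tr_imu_velo: 4x4 extrinsic with p_velo = Tr_imu_velo @ p_imu (assumption).
        Then the lidar pose in the world frame is T_w_velo = T_w_imu @ inv(Tr_imu_velo)."""
        return T_w_imu @ np.linalg.inv(Tr_imu_velo)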

Can we get a pre-trained model of PC3T?

Hi,
I am doing a project on wireless sensing and want to use PC3T as a reference (ground truth). How could we get a pre-trained model for a first try on our own dataset?

No way to get the testing labels

Hi, I can't find any link to download the testing labels, at least not on KITTI's official website. Could you please provide a link to the testing labels? Thanks a lot =。=

Config files for voxel_net

Hi! I'm trying to evaluate my custom dataset (same format as KITTI) with voxel_net detections generated with OpenPCDet, but for some reason none of my metrics exceed 1%, and most of them are 0. I don't know whether the problem is the configuration of the yaml files or some underlying problem with my dataset.

I read the PC3T paper, and voxel_net was one of the evaluated detectors, so I was wondering whether it is possible to get the config file used in the paper.

Thank you!

Tracking speed

Thanks for your great work.
I have a question: at what FPS does your tracker run?

Test Detections of PointRCNN

Could you please tell me where to obtain the test-set detections of PointRCNN, since you use these detections in Table 1 of your paper? Also, could you let me know how to obtain test-set detections for pv-rcnn and second-iou?

Using tracker to track boxes from nuscenes

Hi, I have a question about the possibility of tracking boxes from the nuScenes dataset.
If I give the tracker boxes from nuScenes detected by TED that are in global coordinates, it does not track any boxes.
Does the coordinate frame of the boxes matter, given that KITTI boxes are officially in camera coordinates?
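
The coordinate frame does matter in practice; a common first step with nuScenes is to move global-frame boxes into the ego/lidar frame before handing them to a KITTI-style pipeline. The sketch below follows the nuscenes-devkit's get_sample_data logic and assumes the usual ego_pose and calibrated_sensor records:

    import copy
    import numpy as np
    from pyquaternion import Quaternion
    from nuscenes.utils.data_classes import Box

    def global_box_to_lidar(box: Box, ego_pose: dict, cs_record: dict) -> Box:
        """Transform a global-frame nuScenes Box into the lidar frame.
        ego_pose / cs_record are nuScenes table records with 'translation'
        and 'rotation' fields."""
        box = copy.deepcopy(box)
        # global -> ego vehicle frame
        box.translate(-np.array(ego_pose["translation"]))
        box.rotate(Quaternion(ego_pose["rotation"]).inverse)
        # ego vehicle -> lidar sensor frame
        box.translate(-np.array(cs_record["translation"]))
        box.rotate(Quaternion(cs_record["rotation"]).inverse)
        return box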
