
slidr's Issues

training loss

Hi,

Thank you for your previous replies. I have reproduced your results with the provided MinkUNet checkpoint. Now I am trying to pretrain my own MinkUNet checkpoints.

What should the training loss be, roughly, at the end of pretraining?

Cannot reproduce the PV-RCNN object detection results. Voxel size difference?

Hi authors,
I tried to use your provided checkpoint to evaluate object detection performance, but I got a worse result with PV-RCNN than the results claimed in your GitHub (even worse than training from scratch).

Here I use the official code in OpenPCDet, where the voxel size is [0.5, 0.5, 1] (your pretraining voxel size is [1, 1, 2]).

Could you also send me the object detection code? ([email protected])

Question about reproducing experiment

Hi, I have reproduced your experiment with two 3090 GPUs. However, my results on nuScenes with 1% labeled data are much lower than yours. Here are my detailed results. Could you tell me the detailed configuration of your experiments?
Per class IoU:
barrier - 0.000
bicycle - 0.000
bus - 0.073
car - 0.709
construction_vehicle - 0.011
motorcycle - 0.000
pedestrian - 0.424
traffic_cone - 0.000
trailer - 0.222
truck - 0.311
driveable_surface - 0.928
other_flat - 0.162
sidewalk - 0.540
terrain - 0.593
manmade - 0.802
vegetation - 0.822

mIoU: 0.34969189763069153
fwIoU: 0.7560390830039978

Setting of the fraction of the training labels

Could you kindly share the settings used to finetune the object detection models with a fraction of the training labels, e.g., 1%, 5%, or 10%, on the nuScenes dataset? It would help me a lot.
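
For anyone else looking for this: one simple way to finetune on a fraction of labels, independent of whatever settings the authors used, is to subsample the list of training frames before building the detection dataset. A minimal sketch; the `train_infos` list and the frame count are illustrative placeholders, not the repository's actual data structures.

    import random

    random.seed(0)
    # Placeholder for the list of training samples (e.g. OpenPCDet "infos").
    # The nuScenes train split has roughly 28k keyframes.
    train_infos = [f"sample_{i}" for i in range(28130)]
    fraction = 0.01
    subset = random.sample(train_infos, max(1, int(len(train_infos) * fraction)))
    print(len(subset))  # ~281 frames for the 1% setting

Fixing the random seed when drawing the subset keeps the chosen frames identical across pretraining methods, which is what makes the comparison fair.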

The training loss fluctuates around 7 but does not decrease

I pretrained the model with the superpixel-driven InfoNCE loss, but neither the training loss nor the validation loss decreases; both fluctuate around 7 (e.g., 7.105, 7.25, 7.15). Is this normal for self-supervised learning? Has anyone else encountered the same issue? I hope to get some clues from you, thanks!
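
For context, an InfoNCE loss that stays flat near its chance-level value usually means the embeddings are not yet separating positives from negatives: with K candidates and uniform predictions the loss starts around ln(K). A quick sanity check (the candidate count of ~1100 below is only an illustrative assumption, not the repository's actual batch statistics):

    import math

    # InfoNCE over K candidates sits at roughly ln(K) when predictions are uniform.
    num_candidates = 1100  # hypothetical number of superpixel/pixel pairs per batch
    chance_level_loss = math.log(num_candidates)
    print(chance_level_loss)  # ~7.00

So a loss stuck near 7 may simply correspond to chance-level predictions for a batch of that size, which would suggest the pretraining is not learning rather than this being the expected converged value.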

Random initialization in linear probing

Hi,
I want to compare your method in linear probing with random initialization.
Regarding random initialization in linear probing, is it correct that all I have to do is train the backbone and classification head with "lr=0.05, lr_head=null, freeze_layers=True, epoch_num=100"?
That doesn't make sense to me, because the backbone weights will not change, and with lr_head=null the classification head weights will also not change, right?
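
For reference, the usual shape of a linear probe in PyTorch is: freeze the (randomly initialized or pretrained) backbone and optimize only the classification head. This is a generic sketch, not the repository's downstream.py logic; the module sizes, names, and learning rate are illustrative.

    import torch

    backbone = torch.nn.Linear(96, 96)   # stand-in for the 3D backbone
    head = torch.nn.Linear(96, 16)       # stand-in for the per-point classifier

    for p in backbone.parameters():
        p.requires_grad = False          # corresponds to "freeze_layers=True"

    # Only the head's parameters receive updates.
    optimizer = torch.optim.SGD(head.parameters(), lr=0.05)

    x = torch.randn(8, 96)
    target = torch.randint(0, 16, (8,))
    logits = head(backbone(x))
    loss = torch.nn.functional.cross_entropy(logits, target)
    loss.backward()
    optimizer.step()

So in a linear probe the backbone is indeed fixed; the head must still get a non-zero learning rate, otherwise nothing is trained at all.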

The calculation method of mAP in object detection experiment

Hi, I'm quite interested in your work on SLidR and I am trying to reproduce your detection experiment using OpenPCDet. But I'm confused about the method you use for calculating mAP, such as the threshold settings, or which outputs of the test program are used for the calculation. I would appreciate it if you could kindly answer my questions. Thanks!

I wonder whether there is a bug around line https://github.com/valeoai/SLidR/blob/main/pretrain/dataloader_nuscenes_spconv.py#L321 (you should not comment out this line), because your object detection code base uses 4 features for object detection.

Originally posted by @mu-cai in #3 (comment)

Downstream.py bug: RuntimeError: grad can be implicitly created only for scalar outputs

Hi,
I am currently reproducing the downstream (semantic segmentation on nuScenes) results from your provided checkpoints, and I am hitting this runtime error: grad can be implicitly created only for scalar outputs.

Have you seen this error?

The command used is: python downstream.py --cfg_file="config/semseg_nuscenes.yaml" --pretraining_path="provide_model/minkunet_slidr_1gpu.pt"

The full error message is in the attached file:
error.txt
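
For what it's worth, this PyTorch error is raised whenever backward() is called on a tensor with more than one element; reducing the loss to a scalar first is the usual fix. A minimal, repository-independent reproduction and fix:

    import torch

    pred = torch.randn(4, 3, requires_grad=True)
    target = torch.randn(4, 3)

    loss = torch.nn.functional.mse_loss(pred, target, reduction="none")  # shape (4, 3)
    # loss.backward()  # raises: grad can be implicitly created only for scalar outputs
    loss.mean().backward()  # reducing to a scalar fixes it

If the error appears only in the downstream script, it may be that the per-point loss is being returned without a reduction somewhere along the path to the Lightning training step; that is a guess, not a confirmed diagnosis of the repository code.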

problem with the randomness of the detection experiment

I'm quite confused about your setup for the detection experiment because, in my experience, even when 'train.py' is given a fixed random seed, the trained models are evaluated differently in the test procedure, producing a different mAP value each time. This added randomness undermines the fairness of comparisons between different models. So I'm interested in how you control this randomness to obtain the results shown in your paper. Can you kindly tell me how you handle it? Thanks a lot!
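
As a reference point, the standard way to pin down most PyTorch randomness is to seed every generator and force deterministic cuDNN kernels. This is a generic sketch, not necessarily what the authors did for the paper numbers:

    import random
    import numpy as np
    import torch

    def set_seed(seed: int = 42) -> None:
        # Seed every RNG that typically affects training and evaluation.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Trade speed for reproducibility in cuDNN.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

    set_seed(42)

Even with all of this, data-loader worker ordering and a few CUDA ops remain nondeterministic, so some run-to-run variation in mAP is normal; averaging over several seeds is the usual way to report a fair comparison.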

SLIC instead of 2D segmentation

Hi,

Would you mind sharing your thoughts on using SLIC to generate superpixels instead of a 2D image segmenter? I feel that 2D segmenters are more mature than the SLIC algorithm and might produce more reasonable superpixel segments.
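
For readers unfamiliar with SLIC: it is an unsupervised, clustering-based superpixel algorithm available in scikit-image, so it needs no image labels at all. A minimal usage example; the image is a random stand-in and the n_segments/compactness values are illustrative, not necessarily the repository's settings:

    import numpy as np
    from skimage.segmentation import slic

    # Random stand-in for a camera image; in practice this would be a nuScenes image.
    image = np.random.rand(900, 1600, 3)

    # SLIC clusters pixels by color and position into roughly n_segments superpixels.
    segments = slic(image, n_segments=150, compactness=10, start_label=0)
    print(segments.shape, segments.max() + 1)  # label map and number of superpixels found

The trade-off raised in the question is real: SLIC needs no 2D annotations or pretrained segmenter, while a mature 2D segmenter would produce semantically cleaner regions at the cost of extra supervision.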

One confusing conv layer in Res16UnetBase

In pretrain/model/res16unet.py, Res16UNetBase has a particular conv layer with a 2x2x2 kernel size that confused me (the SLidR paper says "we use 3x3x3 kernels for all sparse convolutions").
As mentioned in your code, I checked the PointContrast repository and found the same code snippet in their source code...
But neither MinkUNet nor the original U-Net uses this 2x2x2 conv layer; I was wondering whether this modification improves performance.

Screenshot from 2023-01-26 11-43-23

problems with semantic segmentation

Hello, I think SLidR is excellent work, both in its idea and as a project. But while building on it I ran into a problem. I use the nuScenes v1.0-mini dataset; the first two steps, (1) pre-computing the superpixels and (2) pre-training a 3D backbone, run normally, but when I run the third step:
python downstream.py --cfg_file="config/semseg_nuscenes.yaml" --pretraining_path="output/pretrain/[...]/model.pt"
a ValueError occurred: num_samples should be a positive integer value, but got num_samples=0
The traceback (most recent call last) is here:
File "/home/sjwang/study/SLidR/downstream.py", line 79, in
main()
File "/home/sjwang/study/SLidR/downstream.py", line 59, in main
trainer.fit(module, dm)
File "/home/sjwang/miniconda3/envs/SLIDR/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 553, in fit
self._run(model)
File "/home/sjwang/miniconda3/envs/SLIDR/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 918, in _run
self._dispatch()
File "/home/sjwang/miniconda3/envs/SLIDR/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 986, in _dispatch
self.accelerator.start_training(self)
File "/home/sjwang/miniconda3/envs/SLIDR/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/sjwang/miniconda3/envs/SLIDR/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
self._results = trainer.run_stage()
File "/home/sjwang/miniconda3/envs/SLIDR/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 996, in run_stage
return self._run_train()
File "/home/sjwang/miniconda3/envs/SLIDR/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1040, in _run_train
self.reset_train_val_dataloaders(model)
File "/home/sjwang/miniconda3/envs/SLIDR/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py", line 468, in reset_train_val_dataloaders
self.reset_train_dataloader(model)
File "/home/sjwang/miniconda3/envs/SLIDR/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py", line 247, in reset_train_dataloader
self.train_dataloader = self.request_dataloader(model, "train")
File "/home/sjwang/miniconda3/envs/SLIDR/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py", line 480, in request_dataloader
dataloader = getattr(model, f"{stage}_dataloader")()
File "/home/sjwang/study/SLidR/downstream/lightning_datamodule.py", line 53, in train_dataloader
return DataLoader(
File "/home/sjwang/miniconda3/envs/SLIDR/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 266, in init
sampler = RandomSampler(dataset, generator=generator) # type: ignore
File "/home/sjwang/miniconda3/envs/SLIDR/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 103, in init
raise ValueError("num_samples should be a positive integer "

I don't know whether this is because of the dataset. Without changing the dataset, I only changed this part of the code:
nusc = NuScenes(
version="v1.0-mini", dataroot=nuscenes_path, verbose=False
)
I have no idea what to do next. Could you tell me how to continue? Thank you!
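
A plausible cause, purely as a guess: with v1.0-mini the downstream train split configured in the datamodule may not match any of the mini scenes, so the DataLoader is built on an empty dataset, and that is exactly what raises num_samples=0. A minimal, repository-independent reproduction of the same error:

    from torch.utils.data import DataLoader

    # An empty dataset triggers the same ValueError, because DataLoader builds a
    # RandomSampler at construction time when shuffle=True.
    empty_dataset = []  # stands in for a scene split that matched nothing in v1.0-mini
    try:
        loader = DataLoader(empty_dataset, batch_size=4, shuffle=True)
    except ValueError as e:
        print(e)  # num_samples should be a positive integer value, but got num_samples=0

If that is the cause, the fix would be to point the downstream train/val splits at scenes that actually exist in v1.0-mini rather than at the full trainval split.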

Nuscenes dataset selection for downstream task of semantic segmentation

Hi, I am trying to understand how the fine-tuned model is evaluated. From what I understand, the config["training"] parameter is set to "parametrize", so in downstream.py the first part, fine-tuning, uses the 600 scenes of the nuScenes training set divided by "dataset-skip-ratio", and evaluates the model's validation loss on the 100-scene custom split. In the second part of downstream, after fine-tuning the model, because the "training" parameter is still set to "parametrize", the phase is set to "verifying", which again uses the 100-scene custom split for evaluation. If we want to evaluate on the 150 scenes provided by nuScenes, we have to switch config["training"] to "validate", which re-runs the fine-tuning with 700 scenes (all training scenes in nuScenes) divided by skip_ratio.
My question is: isn't this switching going to change the downstream model? Or is setting config["training"] = "parametrize" just meant to find the best hyperparameters for the model, while config["training"] = "validate" is used to collect the results presented in the paper?

Thank you very much.

About training time

Hi, thanks for the code. This is not an issue.

When I pretrain the MinkUNet with the provided config, it takes 4 hours to train one epoch on 1 V100 GPU. That is 200 hours for the whole pretraining. Is this right? I notice that the code supports DDP training. Is it OK to train with more GPUs? Will it cause a performance drop?

Object detection experiments

Could you kindly share, via email, the code used to finetune object detection models from OpenPCDet? I'm specifically interested in PointRCNN with a Minkowski-UNet backbone.

Originally posted by @CSautier in #3 (comment)

PPKT pretrain model

Could you kindly share the pretrained weights of PPKT on nuScenes? I'm trying to compare the methods on more downstream tasks. Thanks.

RuntimeError: merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered

This problem occurs when I run pretrain.py.
I have tried a lot of things but do not know how to fix it.
Could you help me?

I run with 1 GPU, Ubuntu 18.04, cuDNN 8, CUDA 11.1, and the other requirements are the same as in requirements.txt.

Training: -1it [00:00, ?it/s]
Training: 0%| | 0/7033 [00:00<00:00, 22671.91it/s]
Epoch 0: 0%| | 0/7033 [00:00<00:01, 3584.88it/s] Traceback (most recent call last):
File "pretrain.py", line 61, in
main()
File "pretrain.py", line 57, in main
trainer.fit(module, dm)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 553, in fit
self._run(model)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 918, in _run
self._dispatch()
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 986, in _dispatch
self.accelerator.start_training(self)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
self.training_type_plugin.start_training(trainer)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
self._results = trainer.run_stage()
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 996, in run_stage
return self._run_train()
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1045, in _run_train
self.fit_loop.run()
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 111, in run
self.advance(*args, **kwargs)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 200, in advance
epoch_output = self.epoch_loop.run(train_dataloader)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 111, in run
self.advance(*args, **kwargs)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 131, in advance
batch_output = self.batch_loop.run(batch, self.iteration_count, self._dataloader_idx)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 100, in run
super().run(batch, batch_idx, dataloader_idx)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 111, in run
self.advance(*args, **kwargs)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 147, in advance
result = self._run_optimization(batch_idx, split_batch, opt_idx, optimizer)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 201, in _run_optimization
self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 402, in _optimizer_step
using_lbfgs=is_lbfgs,
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py", line 1593, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/core/optimizer.py", line 209, in step
self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/core/optimizer.py", line 129, in __optimizer_step
trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 296, in optimizer_step
self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 303, in run_optimizer_step
self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 226, in optimizer_step
optimizer.step(closure=lambda_closure, **kwargs)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
return wrapped(*args, **kwargs)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/torch/optim/optimizer.py", line 89, in wrapper
return func(*args, **kwargs)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/torch/optim/sgd.py", line 87, in step
loss = closure()
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 235, in _training_step_and_backward_closure
result = self.training_step_and_backward(split_batch, batch_idx, opt_idx, optimizer, hiddens)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 533, in training_step_and_backward
result = self._training_step(split_batch, batch_idx, opt_idx, hiddens)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 306, in _training_step
training_step_output = self.trainer.accelerator.training_step(step_kwargs)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 193, in training_step
return self.training_type_plugin.training_step(*step_kwargs.values())
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 386, in training_step
return self.model(*args, **kwargs)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/anaconda3/envs/SLidR/lib/python3.7/site-packages/pytorch_lightning/overrides/base.py", line 82, in forward
output = self.module.training_step(*inputs, **kwargs)
File "/user/SLidR/pretrain/lightning_trainer.py", line 62, in training_step
for loss in self.losses
File "/user/SLidR/pretrain/lightning_trainer.py", line 62, in
for loss in self.losses
File "/user/SLidR/pretrain/lightning_trainer.py", line 124, in loss_superpixels_average
k = one_hot_P @ output_points[batch["pairing_points"]]
RuntimeError: merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered

Epoch 0: 0%| | 0/7033 [00:37<74:08:49, 37.95s/it]
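
One common cause of cudaErrorIllegalAddress on a line like `output_points[batch["pairing_points"]]` is an out-of-range index tensor; and because CUDA errors surface asynchronously, the line reported in the traceback can also be misleading (running with the environment variable CUDA_LAUNCH_BLOCKING=1 makes the true failing op show up). A generic debugging check, not a confirmed diagnosis of this repository:

    import torch

    def check_pairing_indices(output_points: torch.Tensor, pairing_points: torch.Tensor) -> None:
        # Move the indices to CPU so the check itself cannot trigger an async CUDA error.
        idx = pairing_points.detach().cpu()
        n = output_points.shape[0]
        assert idx.numel() > 0, "pairing_points is empty"
        assert idx.min() >= 0 and idx.max() < n, (
            f"pairing_points out of range: min={idx.min()}, max={idx.max()}, "
            f"but output_points has {n} rows"
        )

If the assertion fires, the mismatch may come from the pairing indices computed in the dataloader not lining up with the points actually kept after voxelization/quantization.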

About the number of superpixels

Hello, I would like to ask why the number of superpixels in the VoxelNet config you provided is 30. How many superpixels should I set if I use a pillar-based 3D backbone network?

Can't pickle local object 'PretrainDataModule.train_dataloader.<locals>.<lambda>'

Dear Author,
Thank you for your great work. I am trying to reproduce it on my server; however, I cannot run pretrain.py successfully.
Do you know how to avoid this issue?

python pretrain.py --cfg config/slidr_voxelnet.yaml

  File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/conda/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/conda/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/opt/conda/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/conda/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/conda/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/conda/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'PretrainDataModule.train_dataloader.<locals>.<lambda>'
Exception ignored in: <function tqdm.__del__ at 0x15131159ef70>
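
For what it's worth, this error typically appears when a DataLoader is created with num_workers > 0 under the multiprocessing "spawn" start method and its collate_fn (or worker_init_fn) is a lambda defined inside a method: lambdas cannot be pickled. The usual workarounds are to replace the lambda with a module-level function or functools.partial, or to set num_workers=0. A generic sketch on a plain DataLoader, not the repository's PretrainDataModule (the function name and option below are illustrative):

    from functools import partial
    from torch.utils.data import DataLoader

    def collate_with_config(batch, some_option=True):
        # Module-level functions are picklable, unlike lambdas defined inside a method.
        return batch

    if __name__ == "__main__":
        dataset = list(range(16))  # stand-in dataset
        loader = DataLoader(
            dataset,
            batch_size=4,
            num_workers=2,  # workers require a picklable collate_fn under "spawn"
            collate_fn=partial(collate_with_config, some_option=True),
        )
        for batch in loader:
            print(batch)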

help: cannot reproduce segmentation results

Thank you for releasing this code. The idea is awesome! However, there may be something wrong when I try to reproduce the nuScenes segmentation result. Could you give me some help? I followed your tutorial from the very beginning and used the provided pretrained model, but got only 0.360 mIoU [it is supposed to be 0.383] when using 1% of the annotated training scans with 100 epochs. The detailed results are shown in the attached figure. The environment matches the required versions.

Training is stuck for pretraining

I found an error around line 200 in transforms.py. For some samples, it never exits the loop, which makes training hang; in those cases len_indexes = sum_indexes = 0.
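
Without knowing the exact code at that line, a common defensive pattern for such rejection-sampling loops is to bound the number of retries and fall back (for example, skip the augmentation) when nothing valid is found. A generic sketch; the function names and fallback behavior are assumptions, not the repository's actual transforms.py:

    import numpy as np

    def sample_indices_with_retry(mask_fn, points: np.ndarray, max_attempts: int = 50):
        # Retry a random selection until it keeps at least one point, but never loop forever.
        for _ in range(max_attempts):
            keep = mask_fn(points)  # hypothetical random crop / drop mask
            if keep.sum() > 0:
                return np.where(keep)[0]
        # Fallback: give up on the augmentation instead of hanging the dataloader.
        return np.arange(len(points))

    # Example usage with a random 10% keep probability:
    pts = np.random.rand(1000, 3)
    idx = sample_indices_with_retry(lambda p: np.random.rand(len(p)) < 0.1, pts)
    print(len(idx))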
