python3 scenerf/scripts/train.py \
--bs=1 --n_gpus=1 --enable_log=True \
--preprocess_root=/home/trainer/Datasets/preprocess \
--root=/home/trainer/Datasets/Kitti/ \
--logdir=./kitti/logs \
--n_gaussians=4 --n_pts_per_gaussian=8 --max_epochs=50 --exp_prefix=Train

root@devbox:/home/trainer/scenerf# ./train.sh
Global seed set to 42
Using cache found in /root/.cache/torch/hub/rwightman_gen-efficientnet-pytorch_master
Loading base model ()...Done.
Removing last two layers (global_pool & classifier).
Building Encoder-Decoder model..Done.
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
Global seed set to 42
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All DDP processes registered. Starting ddp with 1 processes
----------------------------------------------------------------------------------------------------
00 5 23
01 4 11
02 7 21
03 10 20
04 7 8
05 2 22
06 7 23
07 2 22
09 7 20
10 5 20
Preprocess time: --- 3.7724807262420654 seconds ---
08 2 23
Preprocess time: --- 0.8777365684509277 seconds ---
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
  | Name              | Type               | Params
---------------------------------------------------------
0 | spherical_mapping | SphericalMapping   | 0
1 | net_rgb           | UNet2DSphere       | 231 M
2 | pe                | PositionalEncoding | 0
3 | mlp               | ResnetFC           | 5.4 M
4 | mlp_gaussian      | ResnetFC           | 5.4 M
5 | ray_som           | RaySOM             | 0
---------------------------------------------------------
242 M     Trainable params
0         Non-trainable params
242 M     Total params
970.275   Total estimated model params size (MB)
Validation sanity check:   0%|          | 0/2 [00:00<?, ?it/s]
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1120, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/lib/python3.8/queue.py", line 179, in get
    self.not_empty.wait(remaining)
  File "/usr/lib/python3.8/threading.py", line 306, in wait
    gotit = waiter.acquire(True, timeout)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 6354) is killed by signal: Segmentation fault.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "scenerf/scripts/train.py", line 161, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "scenerf/scripts/train.py", line 156, in main
    trainer.fit(model, data_module)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 552, in fit
    self._run(model)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 922, in _run
    self._dispatch()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 990, in _dispatch
    self.accelerator.start_training(self)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
    self._results = trainer.run_stage()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1000, in run_stage
    return self._run_train()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1035, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1122, in _run_sanity_check
    self._evaluation_loop.run()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/base.py", line 111, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 94, in advance
    batch_idx, batch = next(dataloader_iter)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1316, in _next_data
    idx, data = self._get_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1272, in _get_data
    success, data = self._try_get_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1133, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 6354) exited unexpectedly
ERROR: Unexpected segmentation fault encountered in worker.
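
For what it's worth, this crash pattern is usually not specific to SceneRF: DataLoader workers dying with a segmentation fault most often means the worker processes ran out of shared memory, which is common inside Docker containers whose /dev/shm defaults to 64 MB. If the container is the culprit, relaunching it with a larger shared-memory segment (e.g. docker run --shm-size=8g ..., or --ipc=host) typically resolves it without any code changes. Otherwise, two generic PyTorch-side workarounds are sketched below. This is a minimal standalone sketch, not SceneRF code; TensorDataset merely stands in for the real KITTI dataset.

    import torch
    import torch.multiprocessing
    from torch.utils.data import DataLoader, TensorDataset

    # Workaround 1: pass batches between processes via the filesystem instead
    # of shared-memory file descriptors; helps when /dev/shm is small.
    torch.multiprocessing.set_sharing_strategy("file_system")

    dataset = TensorDataset(torch.randn(8, 3))  # placeholder for the real dataset

    # Workaround 2: num_workers=0 loads data in the main process, which avoids
    # worker segfaults entirely (at the cost of dataloading throughput).
    loader = DataLoader(dataset, batch_size=1, num_workers=0)

    for (batch,) in loader:
        print(batch.shape)

If running with num_workers=0 makes the segfault disappear, that points at the worker/shared-memory path rather than the model code.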