Hi there, first of all, thank you very much for your work. When I try to use your backbone to train a segmentation model, I run into a CUBLAS_STATUS_INTERNAL_ERROR:
2023-03-10 22:05:40,534 - mmseg - INFO - workflow: [('train', 1)], max: 160000 iters
2023-03-10 22:05:40,534 - mmseg - INFO - Checkpoints will be saved to mmsegmentation/work_dirs/internimage_base_512 by HardDiskBackend.
2023-03-10 22:05:46,860 - mmseg - INFO - Iter [20/160000] lr: 7.600e-07, eta: 13:43:03, time: 0.309, data_time: 0.014, memory: 6998, decode.loss_ce: nan, decode.acc_seg: 7.1505, aux.loss_ce: nan, aux.acc_seg: 7.1649, loss: nan
Traceback (most recent call last):
  File "mmsegmentation/train.py", line 162, in <module>
    train_segmentor(model, datasets, cfg, distributed=False, validate=True,
  File "mmsegmentation/mmseg/apis/train.py", line 194, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/mmcv/runner/iter_based_runner.py", line 138, in run
    iter_runner(iter_loaders[i], **kwargs)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/mmcv/runner/iter_based_runner.py", line 62, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "mmsegmentation/mmseg/models/segmentors/base.py", line 138, in train_step
    losses = self(**data_batch)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
    return old_func(*args, **kwargs)
  File "mmsegmentation/mmseg/models/segmentors/base.py", line 108, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 139, in forward_train
    x = self.extract_feat(img)
  File "mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 65, in extract_feat
    x = self.backbone(img)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mmsegmentation/mmseg/models/backbones/intern_image.py", line 479, in forward
    x, x_ = level(x, return_wo_downsample=True)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mmsegmentation/mmseg/models/backbones/intern_image.py", line 316, in forward
    x = blk(x)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mmsegmentation/mmseg/models/backbones/intern_image.py", line 252, in forward
    x = _inner_forward(x)
  File "mmsegmentation/mmseg/models/backbones/intern_image.py", line 242, in _inner_forward
    x = x + self.drop_path(self.gamma1 * self.norm1(self.dcn(x)))
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mmsegmentation/ops_dcnv3/modules/dcnv3.py", line 276, in forward
    x = self.output_proj(x)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File ".conda/envs/mmlab/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
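
One thing I noticed in the log above: all losses are already nan at iteration 20, before the cuBLAS error is raised. I can't say whether the two are related. As a rough sketch, this is how a training log can be scanned for iterations with a non-finite loss. The helper name `non_finite_losses` and both regexes are my own assumptions based on the single log line shown here, not part of mmseg or mmcv:

```python
import math
import re

# Assumed log format, taken from the one "Iter [...]" line in this report.
ITER_FIELD = re.compile(r"Iter \[(\d+)/\d+\]")
# "name: value" pairs whose name contains "loss", e.g. "decode.loss_ce: nan".
LOSS_FIELD = re.compile(
    r"([\w.]*loss[\w.]*):\s*([-+]?(?:nan|inf|\d+(?:\.\d+)?(?:e[-+]?\d+)?))",
    re.IGNORECASE,
)


def non_finite_losses(line):
    """Return (iteration, [loss names]) if any loss on the line is NaN/inf, else None."""
    it = ITER_FIELD.search(line)
    if it is None:
        return None  # not an iteration log line
    bad = [
        name
        for name, value in LOSS_FIELD.findall(line)
        if not math.isfinite(float(value))
    ]
    return (int(it.group(1)), bad) if bad else None


sample = (
    "2023-03-10 22:05:46,860 - mmseg - INFO - Iter [20/160000] lr: 7.600e-07, "
    "eta: 13:43:03, time: 0.309, data_time: 0.014, memory: 6998, "
    "decode.loss_ce: nan, decode.acc_seg: 7.1505, aux.loss_ce: nan, "
    "aux.acc_seg: 7.1649, loss: nan"
)
print(non_finite_losses(sample))
# → (20, ['decode.loss_ce', 'aux.loss_ce', 'loss'])
```

Running this over each line of the log file would show the first iteration at which a loss stops being finite.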
My environment (`conda list`):
# Name                  Version     Build                        Channel
addict                  2.4.0       pypi_0                       pypi
blas                    1.0         mkl
brotlipy                0.7.0       py39h27cfd23_1003
bzip2                   1.0.8       h7b6447c_0
ca-certificates         2022.4.26   h06a4308_0
certifi                 2022.6.15   py39h06a4308_0
cffi                    1.15.0      py39hd667e15_1
charset-normalizer      2.0.12      pypi_0                       pypi
click                   7.1.2       pypi_0                       pypi
colorama                0.4.5       pypi_0                       pypi
cryptography            37.0.1      py39h9ce1e76_0
cudatoolkit             11.3.1      h2bc3f7f_2
cycler                  0.11.0      pypi_0                       pypi
dcnv3                   1.0         pypi_0                       pypi
ffmpeg                  4.3         hf484d3e_0                   pytorch
filelock                3.9.0       pypi_0                       pypi
fonttools               4.33.3      pypi_0                       pypi
freetype                2.11.0      h70c0345_0
giflib                  5.2.1       h7b6447c_0
gmp                     6.2.1       h295c915_3
gnutls                  3.6.15      he1e5248_0
huggingface-hub         0.13.1      pypi_0                       pypi
idna                    3.3         pyhd3eb1b0_0
importlib-metadata      4.11.4      pypi_0                       pypi
intel-openmp            2021.4.0    h06a4308_3561
jpeg                    9e          h7f8727e_0
kiwisolver              1.4.3       pypi_0                       pypi
lame                    3.100       h7b6447c_0
lcms2                   2.12        h3be6417_0
ld_impl_linux-64        2.38        h1181459_1
libffi                  3.3         he6710b0_2
libgcc-ng               11.2.0      h1234567_1
libiconv                1.16        h7f8727e_2
libidn2                 2.3.2       h7f8727e_0
libpng                  1.6.37      hbc83047_0
libstdcxx-ng            11.2.0      h1234567_1
libtasn1                4.16.0      h27cfd23_0
libtiff                 4.2.0       h2818925_1
libunistring            0.9.10      h27cfd23_0
libuv                   1.40.0      h7b6447c_0
libwebp                 1.2.2       h55f646e_0
libwebp-base            1.2.2       h7f8727e_0
lz4-c                   1.9.3       h295c915_1
markdown                3.3.7       pypi_0                       pypi
matplotlib              3.5.2       pypi_0                       pypi
mkl                     2021.4.0    h06a4308_640
mkl-service             2.4.0       py39h7f8727e_0
mkl_fft                 1.3.1       py39hd3c417c_0
mkl_random              1.2.2       py39h51133e4_0
mmcls                   0.23.1      pypi_0                       pypi
mmcv-full               1.5.3       pypi_0                       pypi
mmdet                   2.28.1      pypi_0                       pypi
mmsegmentation          0.25.0      dev_0                        <develop>
model-index             0.1.11      pypi_0                       pypi
ncurses                 6.3         h7f8727e_2
nettle                  3.7.3       hbbd107a_1
numpy                   1.23.0      pypi_0                       pypi
numpy-base              1.22.3      py39hf524024_0
opencv-python           4.6.0.66    pypi_0                       pypi
openh264                2.1.1       h4ff587b_0
openmim                 0.1.6       pypi_0                       pypi
openssl                 1.1.1o      h7f8727e_0
ordered-set             4.1.0       pypi_0                       pypi
packaging               21.3        pypi_0                       pypi
pandas                  1.4.3       pypi_0                       pypi
pillow                  9.1.1       pypi_0                       pypi
pip                     21.2.4      py39h06a4308_0
prettytable             3.3.0       pypi_0                       pypi
pycocotools             2.0.6       pypi_0                       pypi
pycparser               2.21        pyhd3eb1b0_0
pyopenssl               22.0.0      pyhd3eb1b0_0
pyparsing               3.0.9       pypi_0                       pypi
pysocks                 1.7.1       py39h06a4308_0
python                  3.9.12      h12debd9_1
python-dateutil         2.8.2       pypi_0                       pypi
pytorch                 1.11.0      py3.9_cuda11.3_cudnn8.2.0_0  pytorch
pytorch-mutex           1.0         cuda                         pytorch
pytz                    2022.1      pypi_0                       pypi
pyyaml                  6.0         pypi_0                       pypi
readline                8.1.2       h7f8727e_1
requests                2.28.0      pypi_0                       pypi
scipy                   1.10.1      pypi_0                       pypi
setuptools              61.2.0      py39h06a4308_0
six                     1.16.0      pyhd3eb1b0_1
sqlite                  3.38.5      hc218d9a_0
tabulate                0.8.10      pypi_0                       pypi
termcolor               2.2.0       pypi_0                       pypi
terminaltables          3.1.10      pypi_0                       pypi
timm                    0.6.11      pypi_0                       pypi
tk                      8.6.12      h1ccaba5_0
torchaudio              0.11.0      py39_cu113                   pytorch
torchvision             0.12.0      py39_cu113                   pytorch
tqdm                    4.65.0      pypi_0                       pypi
typing-extensions       4.2.0       pypi_0                       pypi
typing_extensions       4.1.1       pyh06a4308_0
tzdata                  2022a       hda174b7_0
urllib3                 1.26.9      py39h06a4308_0
wcwidth                 0.2.5       pypi_0                       pypi
wheel                   0.37.1      pyhd3eb1b0_0
xz                      5.2.5       h7f8727e_1
yacs                    0.1.8       pypi_0                       pypi
yapf                    0.32.0      pypi_0                       pypi
zipp                    3.8.0       pypi_0                       pypi
zlib                    1.2.12      h7f8727e_2
zstd                    1.5.2       ha4553b6_0