Hi Yoshitomo, My machine has 2 TitanV + Torch 1.7.1 + Cuda11.0 + Tor

RuntimeError: CUDA error: device-side assert triggered (issue in yoshitomo-matsubara/torchdistill)

Comments (8)

yoshitomo-matsubara commented on May 23, 2024

@PotatoThanh
That makes sense. I probably forgot to add a couple of commands to rename and delete files or something.
Thank you for pointing that out! I'll update the instructions.

yoshitomo-matsubara commented on May 23, 2024

Actually, I double-checked on Ubuntu 18 that tar -xvf ILSVRC2012_img_train.tar does not produce a folder ILSVRC2012_img_train/, but I found some typos in the commands for processing the validation dataset instead.
So the initial commands should be fine for the training dataset.

yoshitomo-matsubara commented on May 23, 2024

Hi @PotatoThanh

I just tried to reproduce the error, but example/image_classification.py is running well on multiple GPUs with configs/sample/ilsvrc2012/single_stage/kd/alexnet_from_resnet152.yaml so far.

Could you provide 1) OS info, 2) Python version, and 3) torchdistill version as well?
Also, if you have made any changes to the code and/or the yaml config file, please share them here too.
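
For anyone answering a request like this, the details can be collected in one go. The following is a minimal sketch and not part of torchdistill: the helper name `environment_report` is hypothetical, and it assumes Python 3.8+ (for `importlib.metadata`) and that the packages were installed under the distribution names `torch`, `torchvision`, and `torchdistill`.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("torch", "torchvision", "torchdistill")):
    """Collect OS, Python, and installed package versions into a dict."""
    report = {"os": platform.platform(), "python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

The output can be pasted into the issue as-is. CUDA and driver versions are not covered by this sketch and would still need to be reported separately.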

Thank you

PotatoThanh commented on May 23, 2024

Hi @yoshitomo-matsubara,

I am using Ubuntu=20.04, TorchDistill=0.1.4, NvidiaDriver=450.102.04, Torch=1.7.1, Cuda=11.0, TorchVision=0.8.2.

I did not modify anything in your code or the yaml files. I am trying to reproduce your results on ImageNet.

Thank you!

yoshitomo-matsubara commented on May 23, 2024

@PotatoThanh
And which Python version are you using? Your environment is more or less the same as mine, so it should be fine as long as you're using Python 3.6 - 3.8 and you follow this instruction for the ImageNet dataset.

Besides, if you'd like to reproduce the results reported in my paper, please follow the instructions under configs/official/.

As noted here, none of the config files under configs/sample/ are tuned; they are used mostly for debugging purposes.

PotatoThanh commented on May 23, 2024

Thank you @yoshitomo-matsubara,

Yes, I am using Python 3.8.5 and followed your instructions for ImageNet. I ran a yaml file in configs/sample/. Let me try configs/official/ and see.

yoshitomo-matsubara commented on May 23, 2024

Thank you for providing the info @PotatoThanh

I'm assuming you're using the latest version in this repo (currently 627abd5) for image_classification.py.

If you still face the same error, please make sure that your ImageNet folder contains exactly 1000 subfolders. The following error message implies that the targets sometimes contain at least one class index outside the range 0 - 999:

/opt/conda/conda-bld/pytorch_1607370172916/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [25,0,0] Assertion t >= 0 && t < n_classes failed.
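
A quick way to run that folder check is sketched below. It is only a sketch, not part of torchdistill: the function names `check_class_dirs` and `unexpected_dirs` are hypothetical, and it assumes the training classes are stored as folders named by WordNet ID (`n` followed by eight digits). The demo builds a tiny temporary tree so it runs without the dataset.

```python
import os
import tempfile
from pathlib import Path


def check_class_dirs(root, expected=1000):
    """Return (ok, names): whether root holds exactly `expected` sub-directories, and their names."""
    names = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    return len(names) == expected, names


def unexpected_dirs(names):
    """Return names that do not look like WordNet IDs ('n' followed by 8 digits)."""
    return [n for n in names if not (len(n) == 9 and n[0] == "n" and n[1:].isdigit())]


# Demo on a temporary tree: two class folders plus one stray folder.
with tempfile.TemporaryDirectory() as root:
    for name in ("n01440764", "n01443537", "ILSVRC2012_img_train"):
        os.mkdir(os.path.join(root, name))
    ok, names = check_class_dirs(root, expected=2)
    stray = unexpected_dirs(names)

print(ok)     # False
print(stray)  # ['ILSVRC2012_img_train']
```

For a real copy of the dataset, calling `check_class_dirs("./resource/dataset/ilsvrc2012/train")` with the default `expected=1000` would perform the same check.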

PotatoThanh commented on May 23, 2024

Thank you @yoshitomo-matsubara,

I found the problem. When I preprocess the ImageNet using

mkdir ./resource/dataset/ilsvrc2012/{train,val} -p
mv ILSVRC2012_img_train.tar ./resource/dataset/ilsvrc2012/train/
cd ./resource/dataset/ilsvrc2012/train/
tar -xvf ILSVRC2012_img_train.tar
for f in *.tar; do
  d=$(basename "$f" .tar)
  mkdir "$d"
  (cd "$d" && tar xf "../$f")
done
rm -r *.tar

There is a folder named ILSVRC2012_img_train under ./resource/dataset/ilsvrc2012/train/. Therefore, when the code loads the data, it raises the error.
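
Why one extra folder is enough to trigger the assertion can be illustrated as follows. This is a hedged sketch: it assumes the dataset is read with a loader that, like torchvision's `ImageFolder`, maps sorted sub-directory names to labels 0..N-1 (the thread does not state which loader the config uses), and the helper names `class_to_idx` and `demo` are hypothetical. One plausible origin of the stray folder, assuming the outer archive is still in the directory when the loop runs, is that `*.tar` also matches ILSVRC2012_img_train.tar itself.

```python
import os
import tempfile


def class_to_idx(root):
    """Map sorted sub-directory names to 0..N-1 (assumed ImageFolder-style labelling)."""
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    return {name: i for i, name in enumerate(classes)}


def demo(extra_dir=None, n_classes=3):
    """Build a temporary tree with n_classes class folders and an optional stray folder."""
    with tempfile.TemporaryDirectory() as root:
        for i in range(n_classes):
            os.mkdir(os.path.join(root, f"n{i:08d}"))
        if extra_dir:
            os.mkdir(os.path.join(root, extra_dir))
        return class_to_idx(root)


clean = demo()
broken = demo("ILSVRC2012_img_train")

print(max(clean.values()))             # 2
print(max(broken.values()))            # 3
print(broken["ILSVRC2012_img_train"])  # 0
```

In the demo, the stray folder sorts before the `n...` folders and shifts every real class up by one, so the largest label equals the number of real classes. Scaled to ImageNet, that would give a label of 1000 for a model with 1000 outputs, which is the kind of target the `t < n_classes` assertion rejects.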
