Hi, thanks for releasing your code!
OS: Ubuntu 16.04
CUDA: 8.0.44
GPU: TITAN X Pascal (11.2 GB memory) × 4
I intend to train the model on PASCAL VOC 2012, so I run

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 python train_autodeeplab.py --backbone resnet --lr 0.007 --workers 4 --epochs 40 --batch_size 1 --eval_interval 1 --dataset pascal
```

and get the error output shown below.
```
Namespace(arch_lr=0.003, arch_weight_decay=0.001, backbone='resnet', base_size=320, batch_size=1, checkname='deeplab-resnet', crop_size=320, cuda=True, dataset='pascal', epochs=40, eval_interval=1, freeze_bn=False, ft=False, gpu_ids=0, loss_type='ce', lr=0.007, lr_scheduler='cos', momentum=0.9, nesterov=False, no_cuda=False, no_val=False, out_stride=16, resize=512, resume=None, seed=1, start_epoch=0, sync_bn=True, test_batch_size=1, use_balanced_weights=False, use_sbd=False, weight_decay=0.0003, workers=4)
Number of images in train: 1464
Number of images in val: 1449
cuda finished
Using cos LR Scheduler!
Starting Epoch: 0 Total Epoches: 40
  0%|          | 0/1464 [00:00<?, ?it/s]
=>Epoches 0, learning rate = 0.0070, previous best = 0.0000
/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")
/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/functional.py:1961: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
```
```
Traceback (most recent call last):
  File "train_autodeeplab.py", line 324, in <module>
    main()
  File "train_autodeeplab.py", line 317, in main
    trainer.training(epoch)
  File "train_autodeeplab.py", line 116, in training
    output = self.model(image)
  File "/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/auto_deeplab.py", line 214, in forward
    level4_new_2 = self.cells[count] (self.level_4[-2], self.level_8[-1], weight_cells)
  File "/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 68, in forward
    s = sum(self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states) if h is not None)
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 68, in <genexpr>
    s = sum(self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states) if h is not None)
  File "/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 22, in forward
    return sum(w * op(x) for w, op in zip(weights, self._ops))
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 22, in <genexpr>
    return sum(w * op(x) for w, op in zip(weights, self._ops))
RuntimeError: CUDA error: out of memory
```
I suspected I was failing to use multiple GPUs, so I even changed the model setup to

```python
self.model = torch.nn.DataParallel(self.model, device_ids=[0, 1, 2, 3])
```

but the same error appears.
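For reference, here is a minimal, runnable sketch of the pattern I tried (a toy `nn.Sequential` stands in for the real AutoDeeplab model). One thing I noticed while isolating it: `DataParallel` distributes memory by splitting each *batch* across `device_ids`, so with `--batch_size 1` there is nothing to split and the whole forward pass still lands on GPU 0.

```python
import torch
import torch.nn as nn

# Toy model standing in for AutoDeeplab (the real network is far larger).
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())

# Wrap in DataParallel only when several GPUs are visible. DataParallel
# scatters each input batch along dim 0 across device_ids, so a batch of
# size 1 cannot be spread over multiple GPUs.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model, device_ids=list(range(torch.cuda.device_count())))
    model = model.cuda()

# Batch of 2 so DataParallel would actually have something to scatter.
out = model(torch.randn(2, 3, 16, 16))
print(tuple(out.shape))  # (2, 8, 16, 16)
```

(On a CPU-only machine the wrapper branch is skipped and the toy model runs unwrapped, which is why the sketch stays runnable anywhere.)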
What can I do to resolve this? Thanks in advance!