Detailed error description:
Traceback (most recent call last):
File "crnn_main.py", line 193, in <module>
training()
File "crnn_main.py", line 110, in training
cost = trainBatch(crnn, criterion, optimizer, train_iter)
File "crnn_main.py", line 96, in trainBatch
cost = criterion(preds, text, preds_size, length) / batch_size
File "/home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/warpctc_pytorch-0.1-py3.5-linux-x86_64.egg/warpctc_pytorch/__init__.py", line 82, in forward
self.length_average, self.blank)
File "/home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/warpctc_pytorch-0.1-py3.5-linux-x86_64.egg/warpctc_pytorch/__init__.py", line 32, in forward
blank)
File "/home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/torch/utils/ffi/__init__.py", line 202, in safe_call
result = torch._C._safe_call(*args, **kwargs)
torch.FatalError: CUDA error: out of memory (allocate at /pytorch/aten/src/THC/THCCachingAllocator.cpp:510)
frame #0: THCudaMalloc + 0x79 (0x7f50f7b32e99 in /home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/torch/lib/libcaffe2_gpu.so)
frame #1: gpu_ctc + 0x134 (0x7f50f61f92a4 in /home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/warpctc_pytorch-0.1-py3.5-linux-x86_64.egg/warpctc_pytorch/_warp_ctc/__warp_ctc.cpython-35m-x86_64-linux-gnu.so)
frame #2: + 0x1ad2 (0x7f50f61f8ad2 in /home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/warpctc_pytorch-0.1-py3.5-linux-x86_64.egg/warpctc_pytorch/_warp_ctc/__warp_ctc.cpython-35m-x86_64-linux-gnu.so)
frame #5: THPModule_safeCall(_object*, _object*, _object*) + 0x4c (0x7f511e7a67cc in /home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/torch/_C.cpython-35m-x86_64-linux-gnu.so)
frame #8: python() [0x5401ef]
frame #11: python() [0x4ec358]
frame #14: THPFunction_apply(_object*, _object*) + 0x38f (0x7f511eb9383f in /home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/torch/_C.cpython-35m-x86_64-linux-gnu.so)
frame #18: python() [0x4ec3f7]
frame #22: python() [0x4ec2e3]
frame #24: python() [0x4fbfce]
frame #26: python() [0x574db6]
frame #31: python() [0x53fc97]
frame #33: python() [0x60cb42]
frame #38: __libc_start_main + 0xf0 (0x7f513430a830 in /lib/x86_64-linux-gnu/libc.so.6)
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f50ec9151d0>>
Traceback (most recent call last):
File "/home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 399, in __del__
self._shutdown_workers()
File "/home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 378, in _shutdown_workers
self.worker_result_queue.get()
File "/usr/lib/python3.5/multiprocessing/queues.py", line 345, in get
return ForkingPickler.loads(res)
File "/home/ubuntu/suraj/TrainModel/venv/lib/python3.5/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.5/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 487, in Client
c = SocketClient(address)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 614, in SocketClient
s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
I am using:
cuda: 8.0
python: 3.5
pytorch: 0.4.1
I am getting this error while training on CUDA; the same code runs fine on CPU.
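Since the fatal error is a CUDA out-of-memory raised inside the warp-ctc loss, a common first step is to shrink the batch size until one training step fits on the device. A rough back-of-envelope sketch of that sizing (the memory figures below are illustrative assumptions, not numbers measured from crnn_main.py):

```python
def max_batch_size(free_bytes, bytes_per_sample):
    """Largest batch that fits in the remaining GPU memory.

    free_bytes       -- memory reported free on the device (e.g. via nvidia-smi)
    bytes_per_sample -- peak activation + gradient cost of one sample,
                        measured empirically by running with batch_size=1
    """
    if bytes_per_sample <= 0:
        raise ValueError("bytes_per_sample must be positive")
    return max(free_bytes // bytes_per_sample, 0)


# Hypothetical numbers: ~7.5 GiB free, ~120 MiB peak cost per sample.
free = int(7.5 * 1024 ** 3)
per_sample = 120 * 1024 ** 2
print(max_batch_size(free, per_sample))  # → 64
```

In practice it is also worth checking `nvidia-smi` for other processes already holding GPU memory before training starts; the trailing `ConnectionRefusedError` from the DataLoader workers is only a secondary symptom of the shutdown after the OOM crash, not an independent bug.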