I have:
    ubuntu@ubuntu~/MegaDepth$ nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2017 NVIDIA Corporation
    Built on Fri_Sep__1_21:08:03_CDT_2017
    Cuda compilation tools, release 9.0, V9.0.176
CUDA works well with TensorFlow and other sample cases.
I have a dual-graphics setup (NVIDIA and Intel):
    ubuntu@ubuntu:~/MegaDepth$ nvidia-smi
    Fri Jun 29 17:06:21 2018
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 860M    Off  | 00000000:01:00.0 Off |                  N/A |
    | N/A   47C    P8    N/A /  N/A |   1347MiB /  4046MiB |      1%      Default |
    +-------------------------------+----------------------+----------------------+
The system runs Ubuntu 18.04. Torch and torchvision are installed for Python 2.7 and work in sample projects.
However, running the demo fails:
    ubuntu@ubuntu:~/MegaDepth$ sudo python2 demo.py
    [sudo] password for ubuntu:
    ------------ Options -------------
    batchSize: 1
    beta1: 0.5
    checkpoints_dir: ./checkpoints/
    continue_train: False
    display_freq: 100
    display_id: 1
    display_winsize: 256
    fineSize: 256
    gpu_ids: [0, 1]
    identity: 0.0
    input_nc: 3
    isTrain: True
    lambda_A: 10.0
    lambda_B: 10.0
    loadSize: 286
    lr: 0.0002
    max_dataset_size: inf
    model: pix2pix
    nThreads: 2
    name: test_local
    ndf: 64
    ngf: 64
    niter: 100
    niter_decay: 100
    no_flip: False
    no_html: False
    no_lsgan: False
    norm: instance
    output_nc: 3
    phase: train
    pool_size: 50
    print_freq: 100
    save_epoch_freq: 5
    save_latest_freq: 5000
    serial_batches: False
    use_dropout: False
    which_epoch: latest
    which_model_netG: unet_256
    -------------- End ----------------
    ===========================================LOADING Hourglass NETWORK====================================================
    Traceback (most recent call last):
      File "demo.py", line 15, in <module>
        model = create_model(opt)
      File "/home/ubuntu/MegaDepth/models/models.py", line 5, in create_model
        model = HGModel(opt)
      File "/home/ubuntu/MegaDepth/models/HG_model.py", line 18, in __init__
        model= torch.nn.parallel.DataParallel(model, device_ids = [0,1])
      File "/home/ubuntu/.local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 102, in __init__
        _check_balance(self.device_ids)
      File "/home/ubuntu/.local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 17, in _check_balance
        dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
      File "/home/ubuntu/.local/lib/python2.7/site-packages/torch/cuda/__init__.py", line 292, in get_device_properties
        raise AssertionError("Invalid device id")
    AssertionError: Invalid device id
I get the same error if I explicitly set --gpu_ids=0 and/or the CUDA_VISIBLE_DEVICES=0 environment variable:
    ubuntu@ubuntu-Lenovo-Y50-70:~/MegaDepth$ sudo CUDA_VISIBLE_DEVICES=0 python demo.py --gpu_ids=0
    ------------ Options -------------
    batchSize: 1
    beta1: 0.5
    checkpoints_dir: ./checkpoints/
    continue_train: False
    display_freq: 100
    display_id: 1
    display_winsize: 256
    fineSize: 256
    gpu_ids: [0]
    identity: 0.0
    input_nc: 3
    isTrain: True
    lambda_A: 10.0
    lambda_B: 10.0
    loadSize: 286
    lr: 0.0002
    max_dataset_size: inf
    model: pix2pix
    nThreads: 2
    name: test_local
    ndf: 64
    ngf: 64
    niter: 100
    niter_decay: 100
    no_flip: False
    no_html: False
    no_lsgan: False
    norm: instance
    output_nc: 3
    phase: train
    pool_size: 50
    print_freq: 100
    save_epoch_freq: 5
    save_latest_freq: 5000
    serial_batches: False
    use_dropout: False
    which_epoch: latest
    which_model_netG: unet_256
    -------------- End ----------------
    ===========================================LOADING Hourglass NETWORK====================================================
    Traceback (most recent call last):
      File "demo.py", line 15, in <module>
        model = create_model(opt)
      File "/home/ubuntu/MegaDepth/models/models.py", line 5, in create_model
        model = HGModel(opt)
      File "/home/ubuntu/MegaDepth/models/HG_model.py", line 18, in __init__
        model= torch.nn.parallel.DataParallel(model, device_ids = [0,1])
      File "/home/ubuntu/.local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 102, in __init__
        _check_balance(self.device_ids)
      File "/home/ubuntu/.local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 17, in _check_balance
        dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
      File "/home/ubuntu/.local/lib/python2.7/site-packages/torch/cuda/__init__.py", line 292, in get_device_properties
        raise AssertionError("Invalid device id")
    AssertionError: Invalid device id
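One detail in the traceback may be relevant: line 18 of HG_model.py passes device_ids = [0,1] to DataParallel even when --gpu_ids=0 is given, while nvidia-smi lists only GPU 0. The following is a minimal sketch, not the project's code, of how a requested list could be restricted to devices that actually exist. The helper name usable_device_ids and the fallback count of 1 are assumptions made for illustration; torch.cuda.device_count() is the real PyTorch call.

```python
def usable_device_ids(requested_ids, device_count):
    """Keep only the requested ids that refer to an existing CUDA device."""
    return [i for i in requested_ids if 0 <= i < device_count]


try:
    import torch
    # Number of CUDA devices visible to this process.
    device_count = torch.cuda.device_count()
except ImportError:
    # Assumption for illustration: torch is unavailable here, so fall back
    # to the single GPU that nvidia-smi reports on this machine.
    device_count = 1

# The list hard-coded in the traceback is [0, 1]; filter it before use.
device_ids = usable_device_ids([0, 1], device_count)
print(device_ids)
```

The filtered list could then be passed as device_ids to torch.nn.parallel.DataParallel instead of the fixed [0,1], though whether that suits the project is for its authors to decide.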
If I set --gpu_ids or the CUDA_VISIBLE_DEVICES environment variable to anything other than 0 on the command line, I get:
    ubuntu@ubuntu-Lenovo-Y50-70:~/MegaDepth$ sudo CUDA_VISIBLE_DEVICES=1,2,3 python2 demo.py --gpu_ids=1
    ------------ Options -------------
    batchSize: 1
    beta1: 0.5
    checkpoints_dir: ./checkpoints/
    continue_train: False
    display_freq: 100
    display_id: 1
    display_winsize: 256
    fineSize: 256
    gpu_ids: [1]
    identity: 0.0
    input_nc: 3
    isTrain: True
    lambda_A: 10.0
    lambda_B: 10.0
    loadSize: 286
    lr: 0.0002
    max_dataset_size: inf
    model: pix2pix
    nThreads: 2
    name: test_local
    ndf: 64
    ngf: 64
    niter: 100
    niter_decay: 100
    no_flip: False
    no_html: False
    no_lsgan: False
    norm: instance
    output_nc: 3
    phase: train
    pool_size: 50
    print_freq: 100
    save_epoch_freq: 5
    save_latest_freq: 5000
    serial_batches: False
    use_dropout: False
    which_epoch: latest
    which_model_netG: unet_256
    -------------- End ----------------
    ===========================================LOADING Hourglass NETWORK====================================================
    ./checkpoints/test_local/best_vanila_net_G.pth
    THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=70 error=38 : no CUDA-capable device is detected
    Traceback (most recent call last):
      File "demo.py", line 15, in <module>
        model = create_model(opt)
      File "/home/ubuntu/MegaDepth/models/models.py", line 5, in create_model
        model = HGModel(opt)
      File "/home/ubuntu/MegaDepth/models/HG_model.py", line 21, in __init__
        self.netG = model.cuda()
      File "/home/ubuntu/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 249, in cuda
        return self._apply(lambda t: t.cuda(device))
      File "/home/ubuntu/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 176, in _apply
        module._apply(fn)
      File "/home/ubuntu/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 176, in _apply
        module._apply(fn)
      File "/home/ubuntu/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 182, in _apply
        param.data = fn(param.data)
      File "/home/ubuntu/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 249, in <lambda>
        return self._apply(lambda t: t.cuda(device))
    RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:70
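For context on error 38: as far as I understand the CUDA documentation, CUDA_VISIBLE_DEVICES lists physical device indices, enumeration stops at the first invalid entry, and the surviving devices are renumbered from 0 inside the process. The sketch below models that rule in plain Python for integer indices only (UUID entries and duplicate entries are ignored for simplicity). The function name visible_devices is hypothetical, and this is a model of the rule rather than anything CUDA or PyTorch provides.

```python
def visible_devices(cuda_visible_devices, physical_count):
    """Return the physical GPU indices a process would see, in order.

    cuda_visible_devices: the variable's value, or None if it is unset.
    physical_count: number of GPUs installed (1 on this machine).
    """
    if cuda_visible_devices is None:
        # Variable unset: every physical device is visible.
        return list(range(physical_count))
    visible = []
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_count:
            # An invalid entry hides itself and everything after it.
            break
        visible.append(int(token))
    return visible


# With one physical GPU, "1,2,3" starts with an invalid index,
# so nothing is visible to the process.
print(visible_devices("1,2,3", 1))
# "0" exposes the single GPU, which the process sees as device 0.
print(visible_devices("0", 1))
```

Under this model, CUDA_VISIBLE_DEVICES=1,2,3 on a machine with a single GPU leaves no visible devices, which would be consistent with the "no CUDA-capable device is detected" message.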
If I set the GPU id to anything other than 0 directly from Python, I get:
    >>> torch.cuda.set_device(1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/ubuntu/.local/lib/python2.7/site-packages/torch/cuda/__init__.py", line 262, in set_device
        torch._C._cuda_setDevice(device)
    RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:32
I also tried all of these combinations with Python 3.6, and none of them work either.