When I use train_df.py to train, it outputs: /bin/sh: 1: nvcc: not found Trace

train ERROR about df-net HOT 4 CLOSED

boyob commented on May 23, 2024

train ERROR

from df-net.

Comments (4)

Yuliang-Zou commented on May 23, 2024

It seems that you did not specify your CUDA path. If you run which nvcc, it will probably print nothing in your case.
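
The same check can be done from Python before the op compilation is attempted. This is only a minimal sketch: the helper name find_nvcc and the fallback directory /usr/local/cuda/bin are assumptions for illustration, not part of DF-Net, and the CUDA install location may differ on your machine.

```python
import os
import shutil


def find_nvcc(extra_dirs=("/usr/local/cuda/bin",)):
    """Return a path to nvcc, or None if it cannot be found.

    extra_dirs is a hypothetical list of fallback locations to try
    when nvcc is not on PATH; adjust it to your CUDA install.
    """
    # shutil.which mirrors what `which nvcc` does in the shell.
    found = shutil.which("nvcc")
    if found:
        return found
    for directory in extra_dirs:
        candidate = os.path.join(directory, "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    nvcc_path = find_nvcc()
    if nvcc_path is None:
        print("nvcc not found; add your CUDA bin directory to PATH")
    else:
        print("nvcc found at", nvcc_path)
```

If nvcc is only found through the fallback list, its directory still has to be added to PATH, because the build step invokes nvcc by name through the shell.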

ReekiLee commented on May 23, 2024

Hi @boyob @Yuliang-Zou,
Have you solved this error?
I am running into a similar problem. When I run python test_kitti_depth.py --dataset_dir=./dataset --output_dir=./prediction --ckpt_file=./pretrained --split="test"
I get an error like this:

backward_warp_op.cu.cc:5:54: fatal error: tensorflow/core/framework/register_types.h: No such file or directory
compilation terminated.
Traceback (most recent call last):
File "/root/DF-Net/core/UnFlow/src/e2eflow/ops.py", line 53, in <module>
op_lib = tf.load_op_library(lib_path)
File "/usr/local/python3.7.5/lib/python3.7/site-packages/tensorflow_core/python/framework/load_library.py", line 61, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: ./backward_warp_op.so: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "test_kitti_depth.py", line 6, in <module>
from core import DFLearner
File "/root/DF-Net/core/__init__.py", line 2, in <module>
from .DFLearner import DFLearner
File "/root/DF-Net/core/DFLearner.py", line 14, in <module>
from .UnFlow import flownet
File "/root/DF-Net/core/UnFlow/__init__.py", line 1, in <module>
from .src import flownet
File "/root/DF-Net/core/UnFlow/src/__init__.py", line 1, in <module>
from .e2eflow import flownet
File "/root/DF-Net/core/UnFlow/src/e2eflow/__init__.py", line 1, in <module>
from .core import flownet
File "/root/DF-Net/core/UnFlow/src/e2eflow/core/__init__.py", line 1, in <module>
from .flownet import flownet
File "/root/DF-Net/core/UnFlow/src/e2eflow/core/flownet.py", line 5, in <module>
from ..ops import correlation
File "/root/DF-Net/core/UnFlow/src/e2eflow/ops.py", line 55, in <module>
compile(n)
File "/root/DF-Net/core/UnFlow/src/e2eflow/ops.py", line 35, in compile
subprocess.check_output(nvcc_cmd, shell=True)
File "/usr/local/python3.7.5/lib/python3.7/subprocess.py", line 411, in check_output
**kwargs).stdout
File "/usr/local/python3.7.5/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'nvcc -std=c++11 -c -gencode=arch=compute_30,code=sm_30 -o backward_warp_op.cu.o backward_warp_op.cu.cc -I /usr/local/cuda-10.0/include -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC' returned non-zero exit status 1.

My environment is TF 1.15.0 + CUDA 10.0 + g++ 5.4.0.
Here are lines 31-41 of my ops.py:

    cuda_lib64_path_arg = "-L /usr/local/cuda-10.0/lib64"
    nvcc_cmd = "nvcc -std=c++11 -c -gencode=arch=compute_30,code=sm_30 -o {} -I /usr/local/cuda-10.0/include -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC"
    nvcc_cmd = nvcc_cmd.format(" ".join([fn_cu_o, fn_cu_cc]),
                               tf_inc)
    subprocess.check_output(nvcc_cmd, shell=True)

    gcc_cmd = "{} -std=c++11 -shared -o {} -I {} -fPIC -lcudart -D GOOGLE_CUDA=1 {}"
    gcc_cmd = gcc_cmd.format('g++ 5.4.0',
                             " ".join([fn_so, fn_cu_o, fn_cc]),
                             tf_inc,
                             cuda_lib64_path_arg)

I've been stuck on this problem for several days and need to solve it urgently. Could you please give me some advice?
Thank you in advance.
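
One thing that stands out in the ops.py snippet: the nvcc_cmd template contains a single {} placeholder, yet format() receives two arguments. Python's str.format silently ignores unused positional arguments, so tf_inc would be dropped from the command. That is consistent with the logged nvcc command, which has no TensorFlow include path, and with the missing tensorflow/core/framework/register_types.h header. The sketch below reproduces the behavior. The helper build_nvcc_cmd and the tf_inc value are hypothetical stand-ins, and restoring a second -I {} slot is a guess at a fix, not something confirmed in this thread.

```python
def build_nvcc_cmd(template, fn_cu_o, fn_cu_cc, tf_inc):
    """Fill the template the same way the posted ops.py lines do."""
    return template.format(" ".join([fn_cu_o, fn_cu_cc]), tf_inc)


# Template as posted: one placeholder only, so tf_inc is never used.
posted = ("nvcc -std=c++11 -c -gencode=arch=compute_30,code=sm_30 -o {} "
          "-I /usr/local/cuda-10.0/include -D GOOGLE_CUDA=1 "
          "-x cu -Xcompiler -fPIC")

# Assumed fix: keep a second placeholder for the TensorFlow include path.
two_slots = ("nvcc -std=c++11 -c -gencode=arch=compute_30,code=sm_30 -o {} "
             "-I {} -I /usr/local/cuda-10.0/include -D GOOGLE_CUDA=1 "
             "-x cu -Xcompiler -fPIC")

# Hypothetical value; in practice this comes from the TensorFlow install.
tf_inc = "/path/to/tensorflow/include"

cmd_posted = build_nvcc_cmd(posted, "backward_warp_op.cu.o",
                            "backward_warp_op.cu.cc", tf_inc)
cmd_fixed = build_nvcc_cmd(two_slots, "backward_warp_op.cu.o",
                           "backward_warp_op.cu.cc", tf_inc)

print(tf_inc in cmd_posted)  # False: the include path was silently dropped
print(tf_inc in cmd_fixed)   # True
```

Separately, passing 'g++ 5.4.0' as the compiler name would likely make g++ treat 5.4.0 as an input file once the nvcc step succeeds; plain 'g++' (or a versioned binary name, if one is installed) is probably what was intended.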

Yuliang-Zou commented on May 23, 2024

The code was developed using tf-1.2.0 version. Please switch to the same version to use it.

ReekiLee commented on May 23, 2024

> The code was developed using tf-1.2.0 version. Please switch to the same version to use it.

Glad to receive your reply, but my project requires me to train this model with TF 1.15 + CUDA 10. The error seems to come from compiling UnFlow, so I'm going to try running UnFlow on its own first.
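
For anyone who has to stay on a newer TensorFlow 1.x release, the custom-op build flags can be taken from TensorFlow itself through tf.sysconfig.get_compile_flags() and tf.sysconfig.get_link_flags() instead of being hard-coded. The sketch below only assembles the two commands as argument lists. The function build_op_commands, its parameters, and the default CUDA path are assumptions for illustration, and whether the UnFlow ops actually compile against TF 1.15 with these flags is not confirmed in this thread.

```python
def build_op_commands(base, compile_flags, link_flags,
                      cuda_lib64="/usr/local/cuda-10.0/lib64", cxx="g++"):
    """Return (nvcc_cmd, gxx_cmd) as argument lists for one custom op.

    base is the op file prefix, e.g. "backward_warp_op".
    compile_flags and link_flags are meant to come from
    tf.sysconfig.get_compile_flags() and tf.sysconfig.get_link_flags().
    """
    fn_cu_cc = base + ".cu.cc"
    fn_cu_o = base + ".cu.o"
    fn_cc = base + ".cc"
    fn_so = base + ".so"

    # Step 1: compile the CUDA kernel to an object file.
    nvcc_cmd = (["nvcc", "-std=c++11", "-c", "-o", fn_cu_o, fn_cu_cc]
                + list(compile_flags)
                + ["-D", "GOOGLE_CUDA=1", "-x", "cu", "-Xcompiler", "-fPIC"])

    # Step 2: compile the C++ op and link everything into a shared library.
    gxx_cmd = ([cxx, "-std=c++11", "-shared", "-o", fn_so, fn_cc, fn_cu_o]
               + list(compile_flags)
               + ["-fPIC", "-D", "GOOGLE_CUDA=1",
                  "-L" + cuda_lib64, "-lcudart"]
               + list(link_flags))
    return nvcc_cmd, gxx_cmd
```

With TensorFlow installed, the call would look like build_op_commands("backward_warp_op", tf.sysconfig.get_compile_flags(), tf.sysconfig.get_link_flags()), and each returned list can be passed to subprocess.check_call without shell=True.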
