Comments (8)

Yuliang-Zou commented on May 23, 2024

Hi @lhoangan, can you tell me what system configuration you are using (e.g., Python version, GPU)? Also, did you modify the input arguments?

from df-net.

lhoangan commented on May 23, 2024

Hi Yuliang-Zou, thanks for the quick reply. I'm using Linux Mint 17.2, CUDA 8, and one TITAN X, and I installed TensorFlow with this command:
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp27-none-linux_x86_64.whl

I have changed num_gpus from 4 to 1.

Yuliang-Zou commented on May 23, 2024

Hi @lhoangan, it seems that you are using Python 2. Could you try Python 3 instead? Also, I ran the code on Ubuntu, so I am not sure whether the OS matters.

BTW, you might also need to change the batch size to 1; otherwise it will not fit on the GPU.

lhoangan commented on May 23, 2024

Yes, I can try that. Is there a preferred version of Python 3, like 3.4 or 3.5?
BTW, it seems that the PIL package requires Python 2.7. I think I got an error about PIL being missing before.

Yuliang-Zou commented on May 23, 2024

I used Python 3.6. I think you can install pypng with pip and it should be fine.

lhoangan commented on May 23, 2024

I've switched to Python 3.6:
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp36-cp36m-linux_x86_64.whl
but the problem still persists.
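
As an aside on the two install commands in this thread: the wheel filename encodes the interpreter it targets (cp27 for Python 2.7, cp36 for Python 3.6). A minimal sketch of reading that tag is below. The helper names `wheel_python_tag` and `interpreter_tag` are hypothetical, and the sketch assumes the standard wheel filename layout; it is not part of DF-Net or pip.

```python
import sys


def wheel_python_tag(filename: str) -> str:
    """Return the python tag of a wheel filename.

    Assumes the standard layout:
    name-version[-build]-pythontag-abitag-platformtag.whl
    """
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    # The last three hyphen-separated fields are python, abi, and platform tags.
    return stem.split("-")[-3]


def interpreter_tag(version_info=sys.version_info) -> str:
    """Return the CPython tag (e.g. 'cp36') for a version tuple."""
    return f"cp{version_info[0]}{version_info[1]}"


py2_wheel = "tensorflow_gpu-1.2.0-cp27-none-linux_x86_64.whl"
py3_wheel = "tensorflow_gpu-1.2.0-cp36-cp36m-linux_x86_64.whl"

print(wheel_python_tag(py2_wheel))   # cp27
print(wheel_python_tag(py3_wheel))   # cp36
print(interpreter_tag((3, 6)))       # cp36
```

Comparing `wheel_python_tag(...)` with `interpreter_tag()` in the active environment is one quick way to confirm the wheel matches the interpreter before installing.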

Here's the full error message, in case it helps:

{'alpha_image_loss': 0.85,
 'batch_size': 1,
 'beta1': 0.9,
 'checkpoint_dir': './checkpoint',
 'ckpt_dp': 'pretrained/cs_5frame_pre',
 'ckpt_flow': 'pretrained/unflowc_pre',
 'ckpt_pose': None,
 'continue_train': True,
 'cross_consistency': 0.5,
 'dataset_dir': 'raw',
 'depth_consistency': 0.2,
 'fix_pose': False,
 'flow_consistency': 0.2,
 'flow_smooth_weight': 3.0,
 'img_height': 320,
 'img_width': 1152,
 'learning_rate': 0.0001,
 'max_steps': 100000,
 'num_gpus': 1,
 'save_latest_freq': 5000,
 'scale_normalize': False,
 'seq_length': 5,
 'smooth_weight': 3.0,
 'summary_freq': 100}
WARNING:tensorflow:From /home/hale/TrimBot/projects/DF-Net/core/DFLearner.py:475: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
2018-09-16 00:02:46.514853: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-09-16 00:02:46.514890: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-09-16 00:02:46.514899: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-09-16 00:02:46.514907: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-09-16 00:02:46.514915: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2018-09-16 00:02:46.928058: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:03:00.0
Total memory: 11.91GiB
Free memory: 10.53GiB
2018-09-16 00:02:46.928149: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2018-09-16 00:02:46.928167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2018-09-16 00:02:46.928188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
Traceback (most recent call last):
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1117, in _run_fn
    self._extend_graph()
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1166, in _extend_graph
    self._session, graph_def.SerializeToString(), status)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'Correlation' with these attrs.  Registered devices: [CPU,GPU], Registered kernels:
  <no registered kernels>

	 [[Node: flow_prediction/flownet_c_3/Correlation_1 = Correlation[kernel_size=1, max_displacement=20, pad=20, stride_1=1, stride_2=2, _device="/device:GPU:0"](flow_prediction/flownet_c_features_3/conv3_1/leaky_relu/Maximum, flow_prediction/flownet_c_features_3/conv3/leaky_relu/Maximum)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_df.py", line 53, in <module>
    tf.app.run()
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train_df.py", line 50, in main
    learner.train(FLAGS)
  File "/home/hale/TrimBot/projects/DF-Net/core/DFLearner.py", line 489, in train
    with sv.managed_session(config=config) as sess:
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 953, in managed_session
    start_standard_services=start_standard_services)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 708, in prepare_or_wait_for_session
    init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 279, in prepare_session
    sess.run(init_op, feed_dict=init_feed_dict)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'Correlation' with these attrs.  Registered devices: [CPU,GPU], Registered kernels:
  <no registered kernels>

	 [[Node: flow_prediction/flownet_c_3/Correlation_1 = Correlation[kernel_size=1, max_displacement=20, pad=20, stride_1=1, stride_2=2, _device="/device:GPU:0"](flow_prediction/flownet_c_features_3/conv3_1/leaky_relu/Maximum, flow_prediction/flownet_c_features_3/conv3/leaky_relu/Maximum)]]

Caused by op 'flow_prediction/flownet_c_3/Correlation_1', defined at:
  File "train_df.py", line 53, in <module>
    tf.app.run()
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train_df.py", line 50, in main
    learner.train(FLAGS)
  File "/home/hale/TrimBot/projects/DF-Net/core/DFLearner.py", line 467, in train
    self.build_train_graph()
  File "/home/hale/TrimBot/projects/DF-Net/core/DFLearner.py", line 81, in build_train_graph
    losses, grads = self.single_tower_operation(optim, tgt_image_dp_splits[i], src_image_stack_dp_splits[i], tgt_image_flow_splits[i], src_image_stack_flow_splits[i], tgt_image_splits[i], src_image_stack_splits[i], intrinsics_splits[i], model_idx=i)
  File "/home/hale/TrimBot/projects/DF-Net/core/DFLearner.py", line 148, in single_tower_operation
    tgt2src, src2tgt = flownet(tgt_image_flow, src_image_stack_flow[:,:,:,3*i:3*(i+1)], flownet_spec='C', backward_flow=True, reuse=reuse_variables)
  File "/home/hale/TrimBot/projects/DF-Net/core/UnFlow/src/e2eflow/core/flownet.py", line 86, in flownet
    scoped_block(reuse)
  File "/home/hale/TrimBot/projects/DF-Net/core/UnFlow/src/e2eflow/core/flownet.py", line 49, in scoped_block
    channel_mult=channel_mult)
  File "/home/hale/TrimBot/projects/DF-Net/core/UnFlow/src/e2eflow/core/flownet.py", line 231, in flownet_c
    pad=20, kernel_size=1, max_displacement=20, stride_1=1, stride_2=2)
  File "/home/hale/TrimBot/projects/DF-Net/core/UnFlow/src/e2eflow/ops.py", line 64, in correlation
    return _correlation_module.correlation(first, second, **kwargs)[0]
  File "<string>", line 49, in correlation
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'Correlation' with these attrs.  Registered devices: [CPU,GPU], Registered kernels:
  <no registered kernels>

	 [[Node: flow_prediction/flownet_c_3/Correlation_1 = Correlation[kernel_size=1, max_displacement=20, pad=20, stride_1=1, stride_2=2, _device="/device:GPU:0"](flow_prediction/flownet_c_features_3/conv3_1/leaky_relu/Maximum, flow_prediction/flownet_c_features_3/conv3/leaky_relu/Maximum)]]

Yuliang-Zou commented on May 23, 2024

Try modifying this line: change the GPU architecture from sm_30 to sm_52. Then delete all previously generated .so files and re-run the code.

If possible, please also post the output of the compilation.
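
For context, the sm_XX value follows from the GPU's compute capability, which the log above reports as "major: 5 minor: 2" for the GTX TITAN X. A minimal sketch of that mapping is below. The helper names `parse_capability` and `arch_flag` are hypothetical, used only for illustration.

```python
import re


def parse_capability(log_line: str):
    """Extract (major, minor) compute capability from a TensorFlow device log line."""
    match = re.search(r"major:\s*(\d+)\s+minor:\s*(\d+)", log_line)
    if match is None:
        raise ValueError("no compute capability found in line")
    return int(match.group(1)), int(match.group(2))


def arch_flag(major: int, minor: int) -> str:
    """Build the sm_XX architecture name for a compute capability."""
    return f"sm_{major}{minor}"


# Line copied from the log posted in this thread.
line = "major: 5 minor: 2 memoryClockRate (GHz) 1.076"
print(arch_flag(*parse_capability(line)))  # sm_52
```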

lhoangan commented on May 23, 2024

Hi @Yuliang-Zou, I found the error. Earlier I had removed -D GOOGLE_CUDA=1 from that same line while trying to solve another error, which turned out to be the wrong call. I put it back, and it gets past that error now (even with sm_30 or Python 2). My bad, sorry for wasting your time.
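
A note for anyone hitting the same "<no registered kernels>" message: TensorFlow custom GPU ops are conventionally compiled with -D GOOGLE_CUDA=1, and the GPU kernel code is typically guarded by that macro. Dropping the flag would presumably compile the op without its kernels, which matches the fix reported here. The sketch below checks a compile command for the define. The command string and the `defined_macros` helper are hypothetical illustrations, not the actual line from the repository.

```python
import shlex


def defined_macros(command: str) -> set:
    """Collect macros passed as '-D NAME' or '-DNAME' in a compiler command."""
    tokens = shlex.split(command)
    macros = set()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "-D" and i + 1 < len(tokens):
            # Separated form: -D NAME=VALUE
            macros.add(tokens[i + 1])
            i += 2
            continue
        if token.startswith("-D") and len(token) > 2:
            # Joined form: -DNAME=VALUE
            macros.add(token[2:])
        i += 1
    return macros


# Hypothetical nvcc command, for illustration only.
good = ("nvcc -std=c++11 -c -o op.cu.o op.cu.cc "
        "-D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_52")
bad = good.replace("-D GOOGLE_CUDA=1 ", "")

print("GOOGLE_CUDA=1" in defined_macros(good))  # True
print("GOOGLE_CUDA=1" in defined_macros(bad))   # False
```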
