Hi, I am trying to run training with the code while this error bounc

InvalidArgumentError: No OpKernel was registered to support Op 'Correlation' (df-net issue, closed)

vt-vl-lab commented on May 23, 2024

InvalidArgumentError: No OpKernel was registered to support Op 'Correlation'

from df-net.

Comments (8)

Yuliang-Zou commented on May 23, 2024

Hi @lhoangan, can you tell me what system configuration you are using (e.g., Python version, GPU)? Also, did you modify the input arguments?
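
For anyone answering the same question in a report, a minimal sketch that collects the basic details with the Python standard library only. The function name `describe_environment` is made up for this illustration; the GPU model, CUDA version, and TensorFlow version are not covered here and would still need to be added by hand.

```python
import platform
import sys


def describe_environment():
    """Collect basic interpreter and OS details that are useful in a bug report."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```

The `executable` entry is included because it shows which environment the interpreter comes from, which matters when several Python installations coexist.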

lhoangan commented on May 23, 2024

Hi Yuliang-Zou, thanks for the quick reply. I'm using Linux Mint 17.2, CUDA 8, and one Titan X. I installed TensorFlow with this command:
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp27-none-linux_x86_64.whl

I have changed num_gpus from 4 to 1.

Yuliang-Zou commented on May 23, 2024

Hi @lhoangan, it seems that you are using Python 2. Could you try Python 3 instead? Also, I ran the code on Ubuntu, so I am not sure whether the OS matters.

BTW, you might also need to change the batch size to 1; otherwise, the model will not fit into GPU memory.

lhoangan commented on May 23, 2024

Yes, I can try that. Is there a preferred version of Python 3, such as 3.4 or 3.5?
BTW, it seems that the PIL package requires Python 2.7; I think I got an error earlier about that package being missing.

Yuliang-Zou commented on May 23, 2024

I used Python 3.6. I think you can install pypng with pip and it should be fine.

lhoangan commented on May 23, 2024

I've switched to Python 3.6:
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp36-cp36m-linux_x86_64.whl
but the problem persists.

Here's the full error message, in case it helps:

{'alpha_image_loss': 0.85,
 'batch_size': 1,
 'beta1': 0.9,
 'checkpoint_dir': './checkpoint',
 'ckpt_dp': 'pretrained/cs_5frame_pre',
 'ckpt_flow': 'pretrained/unflowc_pre',
 'ckpt_pose': None,
 'continue_train': True,
 'cross_consistency': 0.5,
 'dataset_dir': 'raw',
 'depth_consistency': 0.2,
 'fix_pose': False,
 'flow_consistency': 0.2,
 'flow_smooth_weight': 3.0,
 'img_height': 320,
 'img_width': 1152,
 'learning_rate': 0.0001,
 'max_steps': 100000,
 'num_gpus': 1,
 'save_latest_freq': 5000,
 'scale_normalize': False,
 'seq_length': 5,
 'smooth_weight': 3.0,
 'summary_freq': 100}
WARNING:tensorflow:From /home/hale/TrimBot/projects/DF-Net/core/DFLearner.py:475: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
2018-09-16 00:02:46.514853: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-09-16 00:02:46.514890: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-09-16 00:02:46.514899: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-09-16 00:02:46.514907: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-09-16 00:02:46.514915: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2018-09-16 00:02:46.928058: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:03:00.0
Total memory: 11.91GiB
Free memory: 10.53GiB
2018-09-16 00:02:46.928149: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2018-09-16 00:02:46.928167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2018-09-16 00:02:46.928188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:03:00.0)
Traceback (most recent call last):
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1117, in _run_fn
    self._extend_graph()
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1166, in _extend_graph
    self._session, graph_def.SerializeToString(), status)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'Correlation' with these attrs.  Registered devices: [CPU,GPU], Registered kernels:
  <no registered kernels>

	 [[Node: flow_prediction/flownet_c_3/Correlation_1 = Correlation[kernel_size=1, max_displacement=20, pad=20, stride_1=1, stride_2=2, _device="/device:GPU:0"](flow_prediction/flownet_c_features_3/conv3_1/leaky_relu/Maximum, flow_prediction/flownet_c_features_3/conv3/leaky_relu/Maximum)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_df.py", line 53, in <module>
    tf.app.run()
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train_df.py", line 50, in main
    learner.train(FLAGS)
  File "/home/hale/TrimBot/projects/DF-Net/core/DFLearner.py", line 489, in train
    with sv.managed_session(config=config) as sess:
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 953, in managed_session
    start_standard_services=start_standard_services)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 708, in prepare_or_wait_for_session
    init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 279, in prepare_session
    sess.run(init_op, feed_dict=init_feed_dict)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'Correlation' with these attrs.  Registered devices: [CPU,GPU], Registered kernels:
  <no registered kernels>

	 [[Node: flow_prediction/flownet_c_3/Correlation_1 = Correlation[kernel_size=1, max_displacement=20, pad=20, stride_1=1, stride_2=2, _device="/device:GPU:0"](flow_prediction/flownet_c_features_3/conv3_1/leaky_relu/Maximum, flow_prediction/flownet_c_features_3/conv3/leaky_relu/Maximum)]]

Caused by op 'flow_prediction/flownet_c_3/Correlation_1', defined at:
  File "train_df.py", line 53, in <module>
    tf.app.run()
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train_df.py", line 50, in main
    learner.train(FLAGS)
  File "/home/hale/TrimBot/projects/DF-Net/core/DFLearner.py", line 467, in train
    self.build_train_graph()
  File "/home/hale/TrimBot/projects/DF-Net/core/DFLearner.py", line 81, in build_train_graph
    losses, grads = self.single_tower_operation(optim, tgt_image_dp_splits[i], src_image_stack_dp_splits[i], tgt_image_flow_splits[i], src_image_stack_flow_splits[i], tgt_image_splits[i], src_image_stack_splits[i], intrinsics_splits[i], model_idx=i)
  File "/home/hale/TrimBot/projects/DF-Net/core/DFLearner.py", line 148, in single_tower_operation
    tgt2src, src2tgt = flownet(tgt_image_flow, src_image_stack_flow[:,:,:,3*i:3*(i+1)], flownet_spec='C', backward_flow=True, reuse=reuse_variables)
  File "/home/hale/TrimBot/projects/DF-Net/core/UnFlow/src/e2eflow/core/flownet.py", line 86, in flownet
    scoped_block(reuse)
  File "/home/hale/TrimBot/projects/DF-Net/core/UnFlow/src/e2eflow/core/flownet.py", line 49, in scoped_block
    channel_mult=channel_mult)
  File "/home/hale/TrimBot/projects/DF-Net/core/UnFlow/src/e2eflow/core/flownet.py", line 231, in flownet_c
    pad=20, kernel_size=1, max_displacement=20, stride_1=1, stride_2=2)
  File "/home/hale/TrimBot/projects/DF-Net/core/UnFlow/src/e2eflow/ops.py", line 64, in correlation
    return _correlation_module.correlation(first, second, **kwargs)[0]
  File "<string>", line 49, in correlation
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/hale/anaconda2/envs/tf1.2p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'Correlation' with these attrs.  Registered devices: [CPU,GPU], Registered kernels:
  <no registered kernels>

	 [[Node: flow_prediction/flownet_c_3/Correlation_1 = Correlation[kernel_size=1, max_displacement=20, pad=20, stride_1=1, stride_2=2, _device="/device:GPU:0"](flow_prediction/flownet_c_features_3/conv3_1/leaky_relu/Maximum, flow_prediction/flownet_c_features_3/conv3/leaky_relu/Maximum)]]
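
The telling part of the trace above is `<no registered kernels>`: the `Correlation` op itself is known to TensorFlow, but the message lists no kernel for it on any device. As a rough sketch (not part of DF-Net or TensorFlow), the helper below pulls those facts out of such a message; the function name `summarize_missing_kernel` and the regular expressions are assumptions written against the exact wording quoted in this log.

```python
import re

# Patterns written against the wording of the error quoted above.
_OP_RE = re.compile(r"No OpKernel was registered to support Op '([^']+)'")
_DEVICES_RE = re.compile(r"Registered devices: \[([^\]]*)\]")


def summarize_missing_kernel(message):
    """Return (op_name, devices, has_kernels), or None if the text does not match."""
    op_match = _OP_RE.search(message)
    if op_match is None:
        return None
    dev_match = _DEVICES_RE.search(message)
    raw_devices = dev_match.group(1).split(",") if dev_match else []
    devices = [d.strip() for d in raw_devices if d.strip()]
    # The literal marker below is what the log prints when the kernel list is empty.
    has_kernels = "<no registered kernels>" not in message
    return op_match.group(1), devices, has_kernels


example = (
    "No OpKernel was registered to support Op 'Correlation' with these attrs.  "
    "Registered devices: [CPU,GPU], Registered kernels:\n"
    "  <no registered kernels>\n"
)
print(summarize_missing_kernel(example))  # ('Correlation', ['CPU', 'GPU'], False)
```

An empty kernel list with both devices present points at how the custom op library was compiled rather than at the Python version or the GPU.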

Yuliang-Zou commented on May 23, 2024

Try modifying this line: change the architecture flag from sm_30 to sm_52. Then delete all previously generated .so files and re-run the code.

If possible, please also post the compilation output.
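
A small sketch of the two steps suggested above, under stated assumptions: `arch_flag` simply joins the compute capability digits reported in the log (major: 5, minor: 2), and `find_shared_objects` only lists `.so` files under a directory you pass in, so they can be reviewed before deleting anything. Both function names are made up for this illustration, and the directory where the ops are actually built is not specified here.

```python
import os


def arch_flag(major, minor):
    """Build an nvcc architecture name such as 'sm_52' from a compute capability."""
    return "sm_{}{}".format(major, minor)


def find_shared_objects(root):
    """List every .so file under root so stale builds can be reviewed before deletion."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".so"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


print(arch_flag(5, 2))  # sm_52, matching "major: 5 minor: 2" in the log
```

Listing first and deleting by hand is a deliberate choice: a recursive delete pointed at the wrong directory could remove shared libraries that belong to other packages.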

lhoangan commented on May 23, 2024

Hi @Yuliang-Zou, I found the error. Earlier, while trying to solve another error, I had removed -D GOOGLE_CUDA=1 from the same line, which turned out to be the wrong call. I put it back, and that error is gone now (even with sm_30 or Python 2). My bad, sorry for wasting your time.
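
For context, GPU kernel code in TensorFlow custom ops is commonly wrapped in `#if GOOGLE_CUDA`, so compiling without that define can produce a library that declares the op but contains no kernel for it. Whether this project's sources follow exactly that pattern is an assumption here, though it would be consistent with the `<no registered kernels>` message. The sketch below checks a compiler command string for the define; the function name and both example command lines are hypothetical and are not the project's real build command.

```python
import shlex


def defines_google_cuda(command):
    """Return True if the compiler command line defines GOOGLE_CUDA=1."""
    tokens = shlex.split(command)
    for index, token in enumerate(tokens):
        # Accept the joined form: -DGOOGLE_CUDA=1
        if token == "-DGOOGLE_CUDA=1":
            return True
        # Accept the split form used in this thread: -D GOOGLE_CUDA=1
        if token == "-D" and tokens[index + 1:index + 2] == ["GOOGLE_CUDA=1"]:
            return True
    return False


# Hypothetical command lines for illustration only.
with_define = "nvcc -c op.cu.cc -o op.cu.o -D GOOGLE_CUDA=1 -arch=sm_52"
without_define = "nvcc -c op.cu.cc -o op.cu.o -arch=sm_52"
print(defines_google_cuda(with_define), defines_google_cuda(without_define))  # True False
```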
