benchmark-overfeat fails about ck-tensorflow HOT 10 CLOSED

ctuning commented on May 28, 2024

benchmark-overfeat fails

from ck-tensorflow.

Comments (10)

psyhtest commented on May 28, 2024

It looks like I have a temporary problem with CUDA on this machine. I will reinstall the driver and try again.

But here are three points to consider for now to make the program more robust to such failures:

If the benchmark can only be executed with a CUDA version of TF, the CPU versions shouldn't be shown. If so, it should be easy to add such a restriction (will help here).
The benchmark should be updated, as otherwise it could stop working at any moment:

WARNING:tensorflow:From /home/anton/CK_REPOS/ck-tensorflow/dataset/benchmark-overfeat/benchmark-overfeat.py:204: 
initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Use `tf.global_variables_initializer` instead.

(By the way, today is the 8th of March - The International Women's Day:).)

There's probably nothing we can do when downloading prebuilt libraries, but when we compile from the sources we should enable vector instructions depending on the target CPU support:

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
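
Regarding the deprecation warning in the second point, a minimal sketch of one way to stay compatible with both old and new TensorFlow releases is to look up the initializer by name and fall back to the deprecated one. `pick_initializer` is a hypothetical helper, not part of TensorFlow or of this repository, and the `SimpleNamespace` objects are stand-ins for the real module so that the snippet runs without TensorFlow installed.

```python
from types import SimpleNamespace


def pick_initializer(tf_module):
    """Return the variable-initializer factory available in `tf_module`.

    Prefers `global_variables_initializer` and falls back to the
    deprecated `initialize_all_variables` when the new name is missing.
    """
    init = getattr(tf_module, "global_variables_initializer", None)
    if init is None:
        init = tf_module.initialize_all_variables
    return init


# Stand-ins for an old and a new TensorFlow module (illustration only).
old_tf = SimpleNamespace(initialize_all_variables=lambda: "old-init-op")
new_tf = SimpleNamespace(
    initialize_all_variables=lambda: "old-init-op",
    global_variables_initializer=lambda: "new-init-op",
)

print(pick_initializer(old_tf)())  # old-init-op
print(pick_initializer(new_tf)())  # new-init-op
```

In the benchmark script this would presumably be used as `init = pick_initializer(tf)()` followed by `sess.run(init)`.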


psyhtest commented on May 28, 2024

With the CUDA driver back in action, the benchmark fails with a different error:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:02:00.0)
2017-03-08 13:05:59.217026: step 10, duration = 0.007
2017-03-08 13:05:59.284678: step 20, duration = 0.007
2017-03-08 13:05:59.311790: Forward across 25 steps, 0.006 +/- 0.001 sec / batch
Traceback (most recent call last):
  File "/home/anton/CK_REPOS/ck-tensorflow/dataset/benchmark-overfeat/benchmark-overfeat.py", line 241, in <module>
    tf.app.run()
  File "/home/anton/CK_TOOLS/tensorflow-prebuilt-cuda-1.0.0-compiler.cuda-8.0.61-lib.cudnn-api-5.1.5-linux-64/lib/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/anton/CK_REPOS/ck-tensorflow/dataset/benchmark-overfeat/benchmark-overfeat.py", line 237, in main
    run_benchmark()
  File "/home/anton/CK_REPOS/ck-tensorflow/dataset/benchmark-overfeat/benchmark-overfeat.py", line 226, in run_benchmark
    objective = loss(last_layer, labels)
  File "/home/anton/CK_REPOS/ck-tensorflow/dataset/benchmark-overfeat/benchmark-overfeat.py", line 105, in loss
    concated = tf.concat(1, [indices, labels])
  File "/home/anton/CK_TOOLS/tensorflow-prebuilt-cuda-1.0.0-compiler.cuda-8.0.61-lib.cudnn-api-5.1.5-linux-64/lib/tensorflow/python/ops/array_ops.py", line 1048, in concat
    ).assert_is_compatible_with(tensor_shape.scalar())
  File "/home/anton/CK_TOOLS/tensorflow-prebuilt-cuda-1.0.0-compiler.cuda-8.0.61-lib.cudnn-api-5.1.5-linux-64/lib/tensorflow/python/framework/tensor_shape.py", line 756, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (2, 32, 1) and () are incompatible

Execution time: 86.313 sec.

Any ideas?


fanranGit commented on May 28, 2024

@psyhtest, after 5616900 this error should no longer occur with TF 1.0.0+. Please check.
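
For context, the `ValueError` in the traceback above is consistent with the change of argument order of `tf.concat` in TensorFlow 1.0: earlier releases took `tf.concat(concat_dim, values)`, while 1.0 and later take `tf.concat(values, axis)`, so a list passed in the old position ends up where a scalar axis is expected. A minimal sketch of a version-aware wrapper follows. `concat_args` is a hypothetical helper, not part of TensorFlow or of this repository, and the actual change in 5616900 may differ.

```python
def concat_args(tf_version, axis, values):
    """Order the positional arguments of tf.concat for a given TF version.

    TF < 1.0 expects (concat_dim, values); TF >= 1.0 expects (values, axis).
    """
    major = int(tf_version.split(".")[0])
    if major >= 1:
        return (values, axis)
    return (axis, values)


# Placeholder strings stand in for the real tensors.
print(concat_args("1.0.0", 1, ["indices", "labels"]))
print(concat_args("0.12.0-rc0", 1, ["indices", "labels"]))
```

With a real TensorFlow module imported as `tf`, the call would look like `tf.concat(*concat_args(tf.__version__, 1, [indices, labels]))`.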


psyhtest commented on May 28, 2024

@fanranGit, thanks!

After your update, benchmark-overfeat behaves similarly to benchmark-googlenet (issue #4):

fails on the CPU due to:

InvalidArgumentError (see above for traceback): CPU BiasOp only supports NHWC.
         [[Node: conv1/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](conv1/Conv2D, conv1/biases/read)]]

works on the GPU but only if launched with LD_LIBRARY_PATH pointing to the CUDA RT and cuDNN:

$ ck run program:tensorflow --env.LD_LIBRARY_PATH=/usr/local/cuda-8.0.61/lib64:/usr/local/cudnn-5.1/lib64
...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:02:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:02:00.0)
2017-03-16 10:07:43.316209: step 10, duration = 0.006
2017-03-16 10:07:43.380252: step 20, duration = 0.006
2017-03-16 10:07:43.405881: Forward across 25 steps, 0.006 +/- 0.001 sec / batch
2017-03-16 10:07:44.139324: step 10, duration = 0.019
2017-03-16 10:07:44.329957: step 20, duration = 0.019
2017-03-16 10:07:44.405089: Forward-backward across 25 steps, 0.018 +/- 0.004 sec / batch

Execution time: 5.079 sec.
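
The CPU failure above comes from a graph built with `data_format="NCHW"` being placed on a device whose bias op only accepts NHWC. One possible workaround, sketched here under the assumption that the benchmark can choose its layout when the graph is constructed, is to select the format per device and permute the tensor shapes accordingly. `pick_data_format` and `to_layout` are hypothetical helpers, and the example shape is illustrative only.

```python
def pick_data_format(use_gpu):
    """Choose a layout: NCHW on the GPU, NHWC on the CPU."""
    return "NCHW" if use_gpu else "NHWC"


def to_layout(shape_nchw, data_format):
    """Convert an (N, C, H, W) shape tuple to the requested layout."""
    n, c, h, w = shape_nchw
    if data_format == "NCHW":
        return (n, c, h, w)
    if data_format == "NHWC":
        return (n, h, w, c)
    raise ValueError("unsupported data_format: %r" % (data_format,))


# Illustrative input shape; the real benchmark may use different sizes.
print(to_layout((32, 3, 231, 231), pick_data_format(use_gpu=False)))  # (32, 231, 231, 3)
print(to_layout((32, 3, 231, 231), pick_data_format(use_gpu=True)))   # (32, 3, 231, 231)
```

The chosen string can then be passed as the `data_format` argument of ops such as `tf.nn.conv2d` and `tf.nn.bias_add`.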


psyhtest commented on May 28, 2024

The changes that resolved issue #4 have also resolved this one.

I've opened a new issue #8 to build TF with CPU vector instruction support.
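
As a rough illustration of what such a build could key on, the sketch below maps the CPU feature flags reported by x86 Linux in `/proc/cpuinfo` to the corresponding GCC options for the instruction sets named in the warnings earlier in this thread. `copts_from_cpuinfo` is a hypothetical helper and the sample text is made up; how the options are wired into the CK build is an assumption left open here.

```python
# (cpuinfo flag, GCC option) pairs for the instruction sets in the warnings.
# Linux reports SSE3 under the name "pni".
FLAG_TO_COPT = [
    ("pni", "-msse3"),
    ("sse4_1", "-msse4.1"),
    ("sse4_2", "-msse4.2"),
    ("avx", "-mavx"),
    ("avx2", "-mavx2"),
    ("fma", "-mfma"),
]


def copts_from_cpuinfo(cpuinfo_text):
    """Return GCC options for the vector extensions listed in cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            return [copt for flag, copt in FLAG_TO_COPT if flag in flags]
    return []


# Made-up sample; on a real machine, read the text from /proc/cpuinfo.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 pni sse4_1 sse4_2 avx\n"
print(copts_from_cpuinfo(sample))  # ['-msse3', '-msse4.1', '-msse4.2', '-mavx']
```

Each option could then be forwarded to Bazel as `--copt=<option>`. When building on the target machine itself, `-march=native` is a simpler alternative.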


prabirsinha commented on May 28, 2024

Can someone tell me why I am getting this error?
File "C:\Users\Prabir Sinha\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 756, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (2, 1) and () are incompatible.

Please help, I am stuck here.


prabirsinha commented on May 28, 2024

What is the fix for this issue?

File "C:\Users\Prabir Sinha\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1001, in concat
).assert_is_compatible_with(tensor_shape.scalar())
File "C:\Users\Prabir Sinha\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 756, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (2, 1) and () are incompatible

Please provide some hints on how to fix this problem.


psyhtest commented on May 28, 2024

@prabirsinha Care to explain what you are trying to do, on which system and with which version of TensorFlow?


prabirsinha commented on May 28, 2024

Hi Anton,

I am trying to run the tutorial program from https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/06_CIFAR-10.ipynb, which is a CNN for image classification on the CIFAR-10 image dataset, using the GPU version of TensorFlow '0.12.0-rc0' (Python 3.5.2). When I execute the code in the Python shell, I get the error attached below and am unable to proceed further. Let me know if you have a solution or any pointers.

Thanks,
Prabir


Traceback (most recent call last):
  File "C:\MSRUS_MTECH2015\Individual_Dissertation\tensorflow_programs\TensorFlow-Tutorials-master\TensorFlow-Tutorials-master\cifar10_summaries.py", line 190, in <module>
    _, loss = create_network(training=True)
  File "C:\MSRUS_MTECH2015\Individual_Dissertation\tensorflow_programs\TensorFlow-Tutorials-master\TensorFlow-Tutorials-master\cifar10_summaries.py", line 184, in create_network
    y_pred, loss = main_network(images=images, training=training)
  File "C:\MSRUS_MTECH2015\Individual_Dissertation\tensorflow_programs\TensorFlow-Tutorials-master\TensorFlow-Tutorials-master\cifar10_summaries.py", line 165, in main_network
    max_pool(kernel=2, stride=2).\
  File "C:\Users\Prabir Sinha\AppData\Local\Programs\Python\Python35\lib\site-packages\prettytensor\pretty_tensor_class.py", line 1972, in method
    result = func(non_seq_layer, *args, **kwargs)
  File "C:\Users\Prabir Sinha\AppData\Local\Programs\Python\Python35\lib\site-packages\prettytensor\pretty_tensor_methods.py", line 217, in flatten
    return reshape(input_layer, [DIM_SAME, -1])
  File "C:\Users\Prabir Sinha\AppData\Local\Programs\Python\Python35\lib\site-packages\prettytensor\pretty_tensor_class.py", line 1972, in method
    result = func(non_seq_layer, *args, **kwargs)
  File "C:\Users\Prabir Sinha\AppData\Local\Programs\Python\Python35\lib\site-packages\prettytensor\pretty_tensor_methods.py", line 196, in reshape
    reshape_tensor = tf.concat(reshape_tensor, 0)
  File "C:\Users\Prabir Sinha\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1003, in concat
    ).assert_is_compatible_with(tensor_shape.scalar())
  File "C:\Users\Prabir Sinha\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 756, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (2, 1) and () are incompatible


psyhtest commented on May 28, 2024

@prabirsinha This is the repository for CK-TensorFlow, which manages TensorFlow with the Collective Knowledge framework for AI/SW/HW co-design and optimisation.

I'm afraid we cannot provide any guidance regarding TensorFlow itself, as we are not its developers. However, if you have any questions about Collective Knowledge, we will be happy to help.

