Comments (10)

psyhtest commented on May 28, 2024

It looks like I have a temporary problem with CUDA on this machine. Will reinstall the driver and try again.

But here are three points to consider for now to make the program more robust to such failures:

  • If the benchmark can only be executed with a CUDA version of TF, the CPU versions shouldn't be shown. If so, adding such a restriction should be easy (will help here).

  • The benchmark should be updated, as otherwise it can stop working at any moment:

WARNING:tensorflow:From /home/anton/CK_REPOS/ck-tensorflow/dataset/benchmark-overfeat/benchmark-overfeat.py:204: 
initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Use `tf.global_variables_initializer` instead.

(By the way, today is the 8th of March - The International Women's Day:).)

  • There's probably nothing we can do when downloading prebuilt libraries, but when we compile from the sources we should enable vector instructions depending on the target CPU support:
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
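
On the second point, a minimal sketch of a version-tolerant initializer lookup, assuming the script may still need to run on TensorFlow builds that predate `tf.global_variables_initializer`. The helper name `variables_initializer_op` is hypothetical, and `tf_module` stands for the imported `tensorflow` module:

```python
def variables_initializer_op(tf_module):
    """Return the op that initializes all variables.

    Prefers tf.global_variables_initializer (the replacement named in the
    deprecation warning) and falls back to tf.initialize_all_variables on
    older builds that lack it.
    """
    new_api = getattr(tf_module, "global_variables_initializer", None)
    if new_api is not None:
        return new_api()
    # Assumption: older builds still expose the deprecated function.
    return tf_module.initialize_all_variables()
```

It could be used as, for example, `sess.run(variables_initializer_op(tf))`, where `sess` is an existing session. If only TF 1.0+ needs to be supported, calling `tf.global_variables_initializer()` directly is simpler.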

from ck-tensorflow.

psyhtest commented on May 28, 2024

With the CUDA driver back in action, the benchmark fails with a different error:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:02:00.0)
2017-03-08 13:05:59.217026: step 10, duration = 0.007
2017-03-08 13:05:59.284678: step 20, duration = 0.007
2017-03-08 13:05:59.311790: Forward across 25 steps, 0.006 +/- 0.001 sec / batch
Traceback (most recent call last):
  File "/home/anton/CK_REPOS/ck-tensorflow/dataset/benchmark-overfeat/benchmark-overfeat.py", line 241, in <module>
    tf.app.run()
  File "/home/anton/CK_TOOLS/tensorflow-prebuilt-cuda-1.0.0-compiler.cuda-8.0.61-lib.cudnn-api-5.1.5-linux-64/lib/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/anton/CK_REPOS/ck-tensorflow/dataset/benchmark-overfeat/benchmark-overfeat.py", line 237, in main
    run_benchmark()
  File "/home/anton/CK_REPOS/ck-tensorflow/dataset/benchmark-overfeat/benchmark-overfeat.py", line 226, in run_benchmark
    objective = loss(last_layer, labels)
  File "/home/anton/CK_REPOS/ck-tensorflow/dataset/benchmark-overfeat/benchmark-overfeat.py", line 105, in loss
    concated = tf.concat(1, [indices, labels])
  File "/home/anton/CK_TOOLS/tensorflow-prebuilt-cuda-1.0.0-compiler.cuda-8.0.61-lib.cudnn-api-5.1.5-linux-64/lib/tensorflow/python/ops/array_ops.py", line 1048, in concat
    ).assert_is_compatible_with(tensor_shape.scalar())
  File "/home/anton/CK_TOOLS/tensorflow-prebuilt-cuda-1.0.0-compiler.cuda-8.0.61-lib.cudnn-api-5.1.5-linux-64/lib/tensorflow/python/framework/tensor_shape.py", line 756, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (2, 32, 1) and () are incompatible

Execution time: 86.313 sec.

Any ideas?

fanranGit commented on May 28, 2024

@psyhtest, after 5616900 this error should no longer occur with TF 1.0.0+. Please check.
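
For context, the `ValueError` in the traceback above matches a TensorFlow 1.0 API change: `tf.concat` takes `(values, axis)` from 1.0 onwards, whereas 0.x releases took `(concat_dim, values)`, so the old call passes the tensor list where the axis is expected. A minimal sketch of a wrapper that tolerates both orders follows. `concat_compat` is a hypothetical name, not taken from the commit, and `tf_module` stands for the imported `tensorflow` module:

```python
def concat_compat(tf_module, values, axis):
    """Concatenate `values` along `axis` on both 0.x and 1.x TensorFlow."""
    major = int(tf_module.__version__.split(".")[0])
    if major >= 1:
        # TensorFlow 1.0+: tf.concat(values, axis)
        return tf_module.concat(values, axis)
    # TensorFlow 0.x: tf.concat(concat_dim, values)
    return tf_module.concat(axis, values)
```

With such a helper, the failing line in the traceback could read `concated = concat_compat(tf, [indices, labels], 1)`. When only 1.x is targeted, `tf.concat([indices, labels], 1)` is enough.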

psyhtest commented on May 28, 2024

@fanranGit, thanks!

After your update, benchmark-overfeat behaves similarly to benchmark-googlenet (issue #4):

  • fails on the CPU due to:
InvalidArgumentError (see above for traceback): CPU BiasOp only supports NHWC.
         [[Node: conv1/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](conv1/Conv2D, conv1/biases/read)]]
  • works on the GPU but only if launched with LD_LIBRARY_PATH pointing to the CUDA RT and cuDNN:
$ ck run program:tensorflow --env.LD_LIBRARY_PATH=/usr/local/cuda-8.0.61/lib64:/usr/local/cudnn-5.1/lib64
...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:02:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:02:00.0)
2017-03-16 10:07:43.316209: step 10, duration = 0.006
2017-03-16 10:07:43.380252: step 20, duration = 0.006
2017-03-16 10:07:43.405881: Forward across 25 steps, 0.006 +/- 0.001 sec / batch
2017-03-16 10:07:44.139324: step 10, duration = 0.019
2017-03-16 10:07:44.329957: step 20, duration = 0.019
2017-03-16 10:07:44.405089: Forward-backward across 25 steps, 0.018 +/- 0.004 sec / batch

Execution time: 5.079 sec.
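
The CPU failure above comes from a `BiasAdd` node built with `data_format="NCHW"`, which the error says the CPU kernel does not support. A minimal sketch of choosing the layout from the target device; the helper names are hypothetical, and the device strings follow the `/cpu:0` and `/gpu:0` forms seen in the logs:

```python
def pick_data_format(device):
    """Return 'NCHW' for GPU devices and 'NHWC' otherwise."""
    return "NCHW" if "gpu" in device.lower() else "NHWC"


def input_shape(batch, channels, height, width, data_format):
    """Order the four dimensions to match the chosen layout."""
    if data_format == "NCHW":
        return [batch, channels, height, width]
    if data_format == "NHWC":
        return [batch, height, width, channels]
    raise ValueError("unsupported data_format: %r" % (data_format,))
```

The resulting string would then be passed as the `data_format` argument of ops such as `tf.nn.conv2d` and `tf.nn.bias_add`, both of which accept it.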

psyhtest commented on May 28, 2024

The changes that resolved issue #4 have also resolved this one.

I've opened a new issue #8 to build TF with CPU vector instruction support.
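
As a possible starting point for that issue, a minimal sketch that maps the CPU flags reported by Linux in `/proc/cpuinfo` to the GCC options for the six instruction sets named in the warnings in the first comment. The function name and the idea of passing the result to Bazel via `--copt` are assumptions, not something the CK package is confirmed to do:

```python
# /proc/cpuinfo flag -> GCC option (Linux reports SSE3 as "pni").
CPU_FLAG_TO_COPT = [
    ("pni", "-msse3"),
    ("sse4_1", "-msse4.1"),
    ("sse4_2", "-msse4.2"),
    ("avx", "-mavx"),
    ("avx2", "-mavx2"),
    ("fma", "-mfma"),
]


def copts_from_cpuinfo(cpuinfo_text):
    """Return Bazel --copt arguments for the supported vector extensions."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
            break  # each core has its own "flags" line; the first is used
    return ["--copt=" + opt for name, opt in CPU_FLAG_TO_COPT if name in flags]
```

On a Linux host, the input would be the contents of `/proc/cpuinfo`. When the build machine is also the target machine, a simpler alternative is `-march=native`, which lets GCC enable whatever the host CPU supports.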

prabirsinha commented on May 28, 2024

Can someone tell me why I am getting this error?

File "C:\Users\Prabir Sinha\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 756, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (2, 1) and () are incompatible.

Please help, I am stuck here.

prabirsinha commented on May 28, 2024

What is the fix for this issue?

File "C:\Users\Prabir Sinha\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1001, in concat
).assert_is_compatible_with(tensor_shape.scalar())
File "C:\Users\Prabir Sinha\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 756, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (2, 1) and () are incompatible

Please provide some hint to fix this problem.

psyhtest commented on May 28, 2024

@prabirsinha Care to explain what you are trying to do, on which system and with which version of TensorFlow?

prabirsinha commented on May 28, 2024

psyhtest commented on May 28, 2024

@prabirsinha This is the repository for CK-TensorFlow, which manages TensorFlow with the Collective Knowledge framework for AI/SW/HW co-design and optimisation.

I'm afraid we cannot provide any guidance regarding TensorFlow, as we are not its developers. However, if you have any questions about Collective Knowledge, we will be happy to help.
