
Comments (6)

714586886 commented on May 24, 2024

I have the same problem! Can you tell me how to solve it? @jeffreynghm

from tffrcnn.

jeffreynghm commented on May 24, 2024

@714586886 No solution yet... I am sure it is not a TensorFlow problem, since I tried running other programs and they work well.


714586886 commented on May 24, 2024

@CharlesShang We have run into this error, and I have spent 2 days on it. Can you help me?
I changed TensorFlow from 0.11.0 to 1.0.0, and it still does not work.


BStudent commented on May 24, 2024

Bad install...
I updated something from the CUDA toolkit and didn't realize that every component you don't install (explicitly check off) gets uninstalled rather than left alone.

UPDATED:
When I ignore all the messages cascading out of Python and look at the original code trace, this seems pretty self-explanatory (maybe that's too strong a word):

So, for some reason, a BLAS component is failing to allocate what appears to be a very small array, which it does not fail to allocate at other times...

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 950, pci bus id: 0000:01:00.0)
Initialized
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support

Traceback (most recent call last):
  File "C:\Users\BStudent\Anaconda4_3_1\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1022, in _do_call
    return fn(*args)
  File "C:\Users\BStudent\Anaconda4_3_1\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1004, in _run_fn
    status, run_metadata)
  File "C:\Users\BStudent\Anaconda4_3_1\envs\tensorflow-gpu\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\BStudent\Anaconda4_3_1\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(500, 2), b.shape=(2, 64), m=500, n=64, k=2
	 [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_1_0/7, Variable/read)]]
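To put "very small" in numbers: this SGEMM multiplies float32 (DT_FLOAT) matrices with m=500, n=64, k=2, so all three operands together take roughly 129 KiB. The short Python sketch below just does that arithmetic; the function name is hypothetical, and it counts only the A, B and C buffers, not any cuBLAS workspace. Note also that the E line above reports the failure while *creating the cuBLAS handle*, a step that does not depend on these operand sizes.

```python
def gemm_footprint_bytes(m, n, k, bytes_per_elem=4):
    """Bytes held by the A (m x k), B (k x n) and C (m x n) operands of a GEMM.

    Hypothetical helper. bytes_per_elem=4 matches float32 (DT_FLOAT), the
    type SGEMM works on. Library workspace memory is not included.
    """
    elements = m * k + k * n + m * n
    return elements * bytes_per_elem


# Shapes from the failing MatMul: a.shape=(500, 2), b.shape=(2, 64)
size = gemm_footprint_bytes(m=500, n=64, k=2)
print(f"{size} bytes (~{size / 1024:.0f} KiB)")  # 132512 bytes (~129 KiB)
```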

ORIGINAL:
I have been running the ToyGANs example code which can be found online (and required running the upgrade-to-TF-v1.0 script).
I have gotten it to run several times in a row with high performance, and then the problem suddenly starts happening. The one factor that might be correlated is multiple processes trying to hit the GPU via TensorFlow at the same time, which only happens by accident in my world.
The problem does not affect CUDA demo suite programs like nbody.
The problem also "survives" shutting down all Python processes on the system and restarting them.
So it's maaayyybe possible that it's a cuDNN problem, if one of those DLLs is somehow stateful. Restarting seemed to fix it earlier, but I had made a lot of other changes as well.


JaneTakanashi commented on May 24, 2024

Use nvidia-smi to see whether the GPU is overloaded, and use 'kill -9 <pid>' to kill the offending process.
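A minimal Python sketch of that nvidia-smi check, assuming nvidia-smi is on the PATH and that your driver's version supports the `--query-compute-apps` CSV query. The helpers `parse_compute_apps`, `list_gpu_processes` and `other_gpu_processes` are hypothetical names, and some drivers may print a non-numeric placeholder instead of a memory figure, which the parser maps to None. The sketch only lists processes; killing one is still left to 'kill -9 <pid>'.

```python
import csv
import io
import os
import subprocess

# Hypothetical helpers: a sketch, not part of TensorFlow or nvidia-smi itself.
QUERY = ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"]


def parse_compute_apps(csv_text):
    """Turn nvidia-smi CSV rows into (pid, name, used_mib) tuples.

    used_mib is None when the driver prints a non-numeric placeholder.
    """
    apps = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        pid = int(row[0].strip())
        name = ",".join(row[1:-1]).strip()  # tolerate commas in the name
        used = row[-1].strip()
        apps.append((pid, name, int(used) if used.isdigit() else None))
    return apps


def list_gpu_processes():
    """Run nvidia-smi and parse its list of compute processes."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)


def other_gpu_processes(apps, own_pid=None):
    """Drop this process's own entry, leaving possible competitors for the GPU."""
    if own_pid is None:
        own_pid = os.getpid()
    return [app for app in apps if app[0] != own_pid]
```

Calling `other_gpu_processes(list_gpu_processes())` at the start of a training script would flag the accidental concurrent TensorFlow runs described earlier in the thread before the new session claims the GPU.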


BStudent commented on May 24, 2024

Thanks, Jane.
I actually wound up solving the problem by applying a more global strategy of never doing TensorFlow on Windows, and using Ubuntu instead. It can be a tremendous headache to install the CUDA dev kit and the correct NVIDIA drivers on Ubuntu (or any Linux system), but once you do, there are far fewer mysterious problems with TF than on Windows.
Windows users are probably better off using MATLAB or, more recently, CNTK instead of TensorFlow; if they insist on TensorFlow, they should probably use the R version, which is not officially supported by Google but is very well maintained by the RStudio people.
