Comments (6)
I have the same problem! Can you tell me how to solve it? @jeffreynghm
from tffrcnn.
@714586886 No solution yet. I am sure TensorFlow itself is not the problem, since I tried running other programs and they work well.
@CharlesShang We hit this error too, and I have spent 2 days on it. Can you help me?
I changed TensorFlow from 0.11.0 to 1.0.0, but it still does not work.
Bad install ...
I updated something from the CUDA toolkit and didn't realize that everything you don't install (explicitly check off) gets uninstalled rather than left alone.
UPDATED:
When I ignore all the messages cascading out of python and look at the original code trace:
This seems pretty self-explanatory (maybe that's too strong a word):
So, for some reason, a BLAS component is failing on allocating what appears to be a very small array, which it does not fail to allocate at other times ...
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 950, pci bus id: 0000:01:00.0)
Initialized
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "C:\Users\BStudent\Anaconda4_3_1\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1022, in _do_call
    return fn(*args)
  File "C:\Users\BStudent\Anaconda4_3_1\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1004, in _run_fn
    status, run_metadata)
  File "C:\Users\BStudent\Anaconda4_3_1\envs\tensorflow-gpu\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\BStudent\Anaconda4_3_1\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(500, 2), b.shape=(2, 64), m=500, n=64, k=2
  [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_1_0/7, Variable/read)]]
ORIGINAL:
I have been running the ToyGANs example code which can be found online (and required running the upgrade-to-TF-v1.0 script).
I have gotten it to run several times in a row with high performance, and then the problem suddenly starts happening. The one problem that might be correlated is when multiple processes are trying to hit the GPU via tensorflow at the same time, which only happens by accident in my world.
The problem does not affect CUDA demo suite procedures like nbody.
The problem also "survives" shutting down all python processes on the system and restarting them.
So it's maaayyybe possible that it's a CUDNN problem, if one of those DLLs is somehow stateful. Restarting seemed to fix it earlier, but I had made a lot of other changes as well.
Use nvidia-smi to check whether the GPU is overloaded, and use 'kill -9 <pid>' to kill the process that is holding it.
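A rough sketch of that check in Python, in case it helps anyone script it. It assumes nvidia-smi is on the PATH; the helper names parse_compute_apps and list_gpu_processes are made up for illustration, and on some Windows driver setups the memory column may come back as "[N/A]", which is reported here as None.

```python
import subprocess

# nvidia-smi query for the processes currently using the GPU.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Turn 'pid, name, used_MiB' lines into (pid, name, used_mib) tuples."""
    apps = []
    for line in csv_text.splitlines():
        if line.count(",") < 2:
            continue  # blank or unexpected line
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        pid, name, mem = pid.strip(), name.strip(), mem.strip()
        if not pid.isdigit():
            continue
        # Memory may be unavailable (e.g. "[N/A]"); report it as None.
        used_mib = int(mem) if mem.isdigit() else None
        apps.append((int(pid), name, used_mib))
    return apps


def list_gpu_processes():
    """Run nvidia-smi and return the parsed list of GPU processes."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Calling list_gpu_processes() returns the PIDs to look at before deciding which one to kill. A leftover python process in that list would be consistent with the "multiple processes hitting the GPU" situation described earlier in the thread.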
Thanks, Jane.
I actually wound up solving the problem by applying a more global strategy of never doing TensorFlow in Windows, and using Ubuntu instead. It can be a tremendous headache to install the CUDA dev kit and the correct Nvidia drivers on Ubuntu (or any Linux system), but once you do, there are far fewer mysterious problems with TF than on Windows.
Windows users are probably better off using Matlab or, recently, CNTK instead of TensorFlow. If they insist on TensorFlow, they should probably use the R version, which is not officially supported by Google but is very well maintained by the RStudio people.