Comments (9)
To avoid OOM errors you should reduce the batch size (the -b option) when running medaka_consensus.
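For reference, a minimal invocation with a reduced batch size might look like this (a sketch; the input, draft, and output names here are placeholders to substitute with your own files):

```shell
# Hypothetical file names; substitute your own basecalls and draft assembly.
medaka_consensus -i basecalls.fastq -d draft_assembly.fasta -o medaka_out \
    -t 8 -b 100
```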
from medaka.
If I halve the batch size with -b 100, I get the same thing, but it fails allocating a tensor of 768M instead...
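Some back-of-envelope arithmetic on that failed allocation (a sketch; float32 is an assumption, since the error does not report the dtype or shape):

```python
# Estimate what a 768 MiB allocation implies per batch item at batch size 100.
# Assumes float32 (4 bytes/element); the actual dtype/shape are not reported.
BYTES_PER_FLOAT32 = 4
tensor_bytes = 768 * 1024 ** 2               # the 768 MiB tensor that failed
elements = tensor_bytes // BYTES_PER_FLOAT32
per_item = elements // 100                   # batch size 100
print(f"total elements: {elements:,}")       # 201,326,592
print(f"elements per batch item: {per_item:,}")  # 2,013,265
```

If the failing allocation scales linearly with batch size, halving -b should roughly halve this tensor, which is consistent with a different (smaller) allocation failing rather than the run succeeding.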
If absolutely nothing else is running on the GPU, I must admit I am at a loss here; a 16 GB GPU should easily handle a batch size of 100. Can you watch the output of nvidia-smi before, during, and after medaka_consensus runs and report what you observe?
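One convenient way to capture that (a sketch using standard nvidia-smi query options; run it in a second shell while medaka_consensus is running):

```shell
# Poll GPU memory and utilisation once per second, in CSV form.
nvidia-smi --query-gpu=timestamp,memory.used,memory.total,utilization.gpu \
    --format=csv -l 1
```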
Before:
Tue May 28 17:44:18 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:00:04.0 Off | 0 |
| N/A 42C P0 40W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
During:
It ramps up to ~397 MiB for several seconds, then jumps to 15,345 MiB and quickly from there to 15,731 MiB, where it sat for several seconds before crashing.
Tue May 28 17:44:34 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:00:04.0 Off | 0 |
| N/A 45C P0 57W / 300W | 15731MiB / 16130MiB | 86% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 7890 C ...me/ubuntu/.conda/envs/medaka/bin/python 15721MiB |
+-----------------------------------------------------------------------------+
After:
Tue May 28 17:44:52 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:00:04.0 Off | 0 |
| N/A 42C P0 40W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I notice from above you are using CUDA 10.1.
Which version of tensorflow are you using, and how did you build or obtain it? The tensorflow version in the requirements.txt file is pinned at 1.12.2, and the binary for that version available on PyPI is built against CUDA 9.
medaka is untested with tensorflow versions other than the binary version available from PyPI.
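A quick way to confirm which builds are actually in play (a sketch; assumes the medaka conda environment is active):

```shell
# TensorFlow version as installed in the active environment
python -c "import tensorflow as tf; print(tf.__version__)"
# CUDA toolkit compiler version, if a toolkit is installed at all
nvcc --version
# Driver version and driver-side CUDA version, as reported by nvidia-smi
nvidia-smi | head -n 4
```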
So I noticed this: I'm using the CUDA 9.0 toolkit and compiler, and tensorflow was installed through the medaka build chain, so it is 1.12.2. I'm not sure whether the 10.1 listed is just the driver version, separate from the toolkit and nvcc, or whether 10.1 snuck in somewhere. I'm testing on a clean Google Cloud instance running Ubuntu 16.04 with a single V100 attached, and I never explicitly installed CUDA 10.1, so I'm not sure if or how my versions could have gotten mixed up.
@txje Did you resolve this issue? It might be useful for other users to know your resolution.
No, I have had to stick with a smaller batch; 50 works. I can get it to run as expected on some other systems, but I haven't been able to resolve it on this installation.
Thanks for the feedback. It seems that the memory use varies with factors beyond our control. We've had some communication with Nvidia on this matter, and there are changes coming in tensorflow which will lower memory use for RNNs as used in medaka.
I will close this issue, as short of reducing the batch size there is not much we can advise.
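Not a medaka option, but as a general TF 1.x session setting (the version medaka pins), the allocator can be asked to grow GPU memory on demand instead of reserving nearly all of the device up front; this sometimes changes where an OOM surfaces. A sketch, requiring TensorFlow 1.x on a CUDA-enabled machine:

```python
# TF 1.x only: let the GPU allocator grow on demand rather than
# pre-allocating almost the whole device at session creation.
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```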