Comments (10)
Processing the short regions efficiently has often been a thorny issue. We have changed how these are handled in v0.6.4, which is now available on GitHub and PyPI (a conda package is waiting on the resolution of bioconda/bioconda-recipes#14433).
from medaka.
Unfortunately, the v0.6.4 update seems to have slowed things down severely (I was running v0.6.2 previously).
The same runs that were running relatively quickly (though still taking a disproportionate amount of time in the short regions and eventually failing) are now moving extremely slowly:
[23:04:31 - PWorker] 100.0% Done (747.7/747.7 Mbases) in 22537.3s
[04:34:34 - PWorker] All done, 530 remainder regions.
[04:34:34 - Predict] Processing 530 short region(s).
[04:34:34 - ModelLoad] Building model (steps, features, classes): (None, 10, 5)
[04:34:34 - ModelLoad] With cudnn: False
[04:34:35 - ModelLoad] Loading weights from /mnt/grid/martienssen/hpc/home/data/eernst/src/medaka-env/lib/python3.6/site-packages/medaka/data/r941_flip235_model.hdf5
[04:34:35 - PWorker] Running inference for 0.0M draft bases.
[04:34:35 - Sampler] Initializing sampler for consensus of region ctg1:2996042-3000000.
[04:34:36 - Feature] Processed ctg1:2996042.0-2999999.1 (median depth 8.0)
[04:34:36 - Sampler] Took 0.66s to make features.
[04:34:36 - Sampler] Pileup for ctg1:2996042.0-2999999.1 is of width 4854
[09:38:42 - PWorker] All done, 0 remainder regions.
[09:38:42 - PWorker] Running inference for 0.0M draft bases.
[09:38:42 - Sampler] Initializing sampler for consensus of region ctg2:4213693-4215290.
[09:38:43 - Feature] Processed ctg2:4213693.0-4215289.0 (median depth 1.0)
[09:38:43 - Sampler] Took 0.20s to make features.
[09:38:43 - Sampler] Pileup for ctg2:4213693.0-4215289.0 is of width 1744
It appears to be taking about 5 hours for each short region:
$ grep -i process 03-medaka/logs/medaka-consensus.log
[...]
[04:34:34 - Predict] Processing 530 short region(s).
[04:34:36 - Feature] Processed ctg1:2996042.0-2999999.1 (median depth 8.0)
[09:38:43 - Feature] Processed ctg2:4213693.0-4215289.0 (median depth 1.0)
[15:11:00 - Feature] Processed ctg29:4128474.0-4130248.0 (median depth 2.0)
[21:29:19 - Feature] Processed ctg29:4133104.0-4135866.0 (median depth 2.0)
[04:07:00 - Feature] Processed ctg79:3945261.0-3945992.0 (median depth 1.0)
[09:07:24 - Feature] Processed ctg101:1131000.0-1132157.0 (median depth 1.0)
[14:03:16 - Feature] Processed ctg129:2999092.0-2999999.0 (median depth 8.0)
[20:13:54 - Feature] Processed ctg140:1242445.0-1251787.0 (median depth 3.0)
[01:14:01 - Feature] Processed ctg140:1251913.0-1255851.0 (median depth 1.0)
[06:08:58 - Feature] Processed ctg149:3267277.0-3268391.0 (median depth 1.0)
[11:04:10 - Feature] Processed ctg162:999000.0-1001270.0 (median depth 53.0)
[16:12:36 - Feature] Processed ctg165:2840290.0-2840425.0 (median depth 5.0)
from medaka.
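The per-region estimate in the comment above can be checked directly from the log. The following is a minimal sketch, not part of medaka: it assumes the `[HH:MM:SS - Feature] Processed <region>` line format shown in the excerpt, and, because the log carries no dates, it assumes a negative gap means the clock passed midnight exactly once. The function name `region_gaps` is made up for this illustration.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
# [04:34:36 - Feature] Processed ctg1:2996042.0-2999999.1 (median depth 8.0)
LINE_RE = re.compile(r"\[(\d{2}:\d{2}:\d{2}) - Feature\] Processed (\S+)")


def region_gaps(lines):
    """Return (region, seconds since the previous 'Processed' line) pairs."""
    gaps = []
    previous = None
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        if previous is not None:
            delta = stamp - previous
            # Assumption: the log has no date, so a negative gap is taken
            # to mean the clock passed midnight exactly once.
            if delta < timedelta(0):
                delta += timedelta(days=1)
            gaps.append((match.group(2), delta.total_seconds()))
        previous = stamp
    return gaps


# Five consecutive lines copied from the grep output above.
log = """\
[04:34:36 - Feature] Processed ctg1:2996042.0-2999999.1 (median depth 8.0)
[09:38:43 - Feature] Processed ctg2:4213693.0-4215289.0 (median depth 1.0)
[15:11:00 - Feature] Processed ctg29:4128474.0-4130248.0 (median depth 2.0)
[21:29:19 - Feature] Processed ctg29:4133104.0-4135866.0 (median depth 2.0)
[04:07:00 - Feature] Processed ctg79:3945261.0-3945992.0 (median depth 1.0)
""".splitlines()

gaps = region_gaps(log)
for region, seconds in gaps:
    print(f"{region}: {seconds / 3600:.2f} h since previous region")
```

On these five lines the gaps range from roughly 5.1 to 6.6 hours, consistent with the estimate above.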
I've managed to replicate this; in my case the program seems to be in an uninterruptible sleep state. Running medaka consensus only on the region that made the full calculation hang completes successfully.
We will continue to debug this, first by finding an example that doesn't take two hours before hanging!
from medaka.
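For anyone wanting to confirm the "uninterruptible sleep" observation above on their own run, here is a minimal, Linux-only sketch. It relies on the `State:` line that the kernel writes to `/proc/<pid>/status`, where the letter `D` denotes uninterruptible sleep. The helper names and the sample text are made up for illustration and are not medaka output.

```python
from pathlib import Path

# Single-letter process states reported by the Linux kernel.
STATE_NAMES = {
    "R": "running",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "T": "stopped",
    "Z": "zombie",
}


def parse_state(status_text):
    """Extract the state letter from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            return line.split()[1]
    raise ValueError("no 'State:' line found")


def process_state(pid):
    """Read the state letter of a live process (Linux only)."""
    return parse_state(Path(f"/proc/{pid}/status").read_text())


# Hypothetical excerpt of a status file, used so the sketch runs anywhere.
sample = "Name:\tmedaka\nState:\tD (disk sleep)\nPid:\t12345\n"
letter = parse_state(sample)
print(letter, STATE_NAMES.get(letter, "other"))
```

On a real system, `process_state(pid)` with the PID of the stalled job gives the same letter that `ps` shows in its state column.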
We have identified the cause of the slowdown (some unnecessary verification of the output file), and will have a bugfix release ASAP.
from medaka.
Great timing! I have just run into this with my sample run. It also seems to be using just one thread at this short-region processing stage. I am looking forward to the fix!
Thanks!
from medaka.
We will have a new release later today.
from medaka.
medaka v0.6.5 is now available on GitHub and PyPI; a bioconda package should follow shortly.
from medaka.
I still saw this bug. Killing the job and re-running only this particular region fixed it, and it ran through.
Just wanted to let you know that it still occurs in v0.6.5.
Dominik
from medaka.
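The workaround described above (re-running only the region that stalled) can be sketched as a small command builder. This is a hedged illustration: the file names are hypothetical, the region string is taken from the log earlier in the thread, and the `--regions` option is assumed to be available in the installed version, so check `medaka consensus --help` first. Any other options used in the original run, such as the model, would need to be carried over as well.

```python
import shlex


def single_region_command(bam, output, region):
    """Build an argument list that runs consensus on one region only."""
    # Assumption: `--regions` is accepted by the installed medaka version.
    return ["medaka", "consensus", bam, output, "--regions", region]


# Hypothetical input and output file names; region copied from the log above.
cmd = single_region_command(
    "calls_to_draft.bam", "ctg1_short_region.hdf", "ctg1:2996042-3000000"
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.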
On all four of my datasets v0.6.5 now runs successfully in a reasonable amount of time. Thanks!
from medaka.
> I've managed to replicate this; in my case the program seems to be in an uninterruptible sleep state. Running medaka consensus only on the region that made the full calculation hang completes successfully. We will continue to debug this, first by finding an example that doesn't take two hours before hanging!
Hi,
I am running the latest conda installation of medaka 0.10.1, and I am encountering a similar error. The process seems to go to sleep randomly during the short region processing stage. It doesn't always occur within the same region/contig. Running medaka consensus on the contig in question works fine.
It does report back some messages concerning tensorflow. Not sure what to make of them.
0: [08:58:40 - Sampler] Region contig_10079:4499.0-6515.0 (2341 positions) is smaller than inference chunk length 10000, quarantining.
0: [08:58:40 - Sampler] Region contig_10079:7002.0-16161.0 (9899 positions) is smaller than inference chunk length 10000, quarantining.
0: [08:58:40 - Sampler] Region contig_10079:17079.0-19253.0 (2250 positions) is smaller than inference chunk length 10000, quarantining.
0: [08:58:40 - Sampler] Region contig_10079:19342.0-26003.0 (6823 positions) is smaller than inference chunk length 10000, quarantining.
0: [08:58:40 - Sampler] Region contig_10079:26516.0-29471.0 (3113 positions) is smaller than inference chunk length 10000, quarantining.
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11886 thread 798 bound to OS proc set 1
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11887 thread 799 bound to OS proc set 2
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11888 thread 800 bound to OS proc set 3
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11885 thread 797 bound to OS proc set 39
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11889 thread 801 bound to OS proc set 4
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11890 thread 802 bound to OS proc set 5
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11891 thread 803 bound to OS proc set 6
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11892 thread 804 bound to OS proc set 7
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11893 thread 805 bound to OS proc set 8
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11894 thread 806 bound to OS proc set 9
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11895 thread 807 bound to OS proc set 10
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11897 thread 809 bound to OS proc set 12
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11896 thread 808 bound to OS proc set 11
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11898 thread 810 bound to OS proc set 13
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11899 thread 811 bound to OS proc set 14
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11900 thread 812 bound to OS proc set 15
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11902 thread 814 bound to OS proc set 17
0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11901 thread 813 bound to OS proc set 16
Valentin
from medaka.
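To see which regions end up in the short-region stage, the "quarantining" messages in the log above can be collected with a small parser. This is a minimal sketch that assumes the message wording shown in the excerpt; the function name `quarantined_regions` is made up for this illustration, and lines that do not match (such as the OMP messages) are skipped.

```python
import re

# Matches the Sampler message shown in the log excerpt above, e.g.
# "Region contig_10079:4499.0-6515.0 (2341 positions) is smaller than
#  inference chunk length 10000, quarantining."
QUARANTINE_RE = re.compile(
    r"Region (\S+) \((\d+) positions\) is smaller than "
    r"inference chunk length (\d+), quarantining"
)


def quarantined_regions(lines):
    """Return (region, positions, chunk_length) for each quarantined region."""
    found = []
    for line in lines:
        match = QUARANTINE_RE.search(line)
        if match:
            region, positions, chunk_len = match.groups()
            found.append((region, int(positions), int(chunk_len)))
    return found


# Three lines copied from the log above; the OMP line is ignored.
log = [
    "0: [08:58:40 - Sampler] Region contig_10079:4499.0-6515.0 (2341 positions) "
    "is smaller than inference chunk length 10000, quarantining.",
    "0: [08:58:40 - Sampler] Region contig_10079:7002.0-16161.0 (9899 positions) "
    "is smaller than inference chunk length 10000, quarantining.",
    "0: OMP: Info #250: KMP_AFFINITY: pid 8985 tid 11886 thread 798 "
    "bound to OS proc set 1",
]

regions = quarantined_regions(log)
for region, positions, chunk_len in regions:
    print(f"{region}: {positions} positions, {chunk_len - positions} short of a chunk")
```

Comparing this list with the last region named before a stall may help narrow down where the process goes to sleep.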