nanoporetech / medaka
Sequence correction provided by ONT Research
Home Page: https://nanoporetech.com
License: Other
Hi,
I am trying to run the 0.6.0-alpha.1 version of medaka to look at variant calling. I installed from source but upon running I get the following error:
Running medaka consensus /home/dct7/medaka/scripts/test.reads.sorted.bam
======================================
Traceback (most recent call last):
File "/home/dct7/medaka/venv/bin/medaka", line 11, in <module>
load_entry_point('medaka==0.6.0a1', 'console_scripts', 'medaka')()
File "/home/dct7/medaka/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/home/dct7/medaka/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2793, in load_entry_point
return ep.load()
File "/home/dct7/medaka/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2411, in load
return self.resolve()
File "/home/dct7/medaka/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2417, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/home/dct7/medaka/venv/lib/python3.6/site-packages/medaka-0.6.0a1-py3.6.egg/medaka/medaka.py", line 11, in <module>
from medaka.inference import train, predict
File "/home/dct7/medaka/venv/lib/python3.6/site-packages/medaka-0.6.0a1-py3.6.egg/medaka/inference.py", line 22, in <module>
from medaka.features import SampleGenerator
File "/home/dct7/medaka/venv/lib/python3.6/site-packages/medaka-0.6.0a1-py3.6.egg/medaka/features.py", line 23, in <module>
import libmedaka
ModuleNotFoundError: No module named 'libmedaka'
Failed to run medaka consensus.
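The traceback ends at `import libmedaka`, the compiled extension module that `medaka/features.py` imports. A quick way to check whether the interpreter behind the `medaka` entry point can locate that module at all (as opposed to the pure-Python `medaka` package) is a small diagnostic like the one below. It is a minimal stdlib sketch; the only module names it checks are the two that appear in the traceback.

```python
import importlib.util
import sys


def report_modules(names):
    """Return {module_name: origin path or None} for the current interpreter."""
    found = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        # find_spec returns None when the module cannot be located on sys.path
        found[name] = spec.origin if spec is not None else None
    return found


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name, origin in report_modules(["medaka", "libmedaka"]).items():
        print(f"{name}: {origin or 'NOT FOUND'}")
```

Run it with the same interpreter as the entry point (presumably `/home/dct7/medaka/venv/bin/python` here). If `libmedaka` is reported as NOT FOUND while `medaka` is found, the C extension was probably not built or installed into that environment.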
Any suggestions?
Many thanks,
Damien
Hi,
I would like to test the medaka_variant pipeline to do SNP detection and haplotyping.
We did basecalling of nanopore data using guppy version 2.2.3 and the model dna_r9.4.1_450bps_flipflop_prom.cfg (json file: template_r9.4.1_450bps_large_flipflop_prom.jsn).
There are three possible base-call model options in medaka_variant ('r941_trans', 'r941_flip213', 'r941_flip235'), and I am unsure which one (r941_flip213 or r941_flip235) best suits our data.
Could you tell me which one is preferred for the data described?
Thank you,
Michel
Enhancement request: I'm keen to give Medaka a try, but I was wondering whether ONT could provide a pre-trained model to use. Perhaps a model trained on the same datasets ONT uses to train basecallers? Or maybe separate models based on organism type: bacterial, human, etc.
I saw the medaka/test/data/test_model.hdf5 file and assumed that was just for running the tests, but please let me know if it is appropriate to use that model more generally.
Ryan
However, I got an error when running:
medaka_variant -r GRCh38_full_analysis_set_plus_decoy_hla.fa -b rel5-guppy-0.3.0-chunk10k.sorted.bam -m r94 -R chr22
The reference and BAM files were downloaded from
https://github.com/nanopore-wgs-consortium/NA12878/blob/master/nanopore-human-genome/rel5.md
[15:11:34 - Feature] Processed chr22:43999000.0-45000000.0 (median depth 34.0)
[15:11:34 - Sampler] Took 3.93s to make features.
[15:11:53 - PWorker] 46.8% Done (23.8/50.9 Mbases) in 532.3s
[15:11:54 - Sampler] Initializing sampler for consensus or region chr22:44999000-46000000.
python3.6: sam.c:1550: resolve_cigar2: Assertion `k < c->n_cigar' failed.
python/Python-3.6.3/bin/medaka_variant: line 75: 24999 Aborted (core dumped) medaka consensus ${BAM} ${PROBS} --model ${MODEL} --batch_size ${BATCH_SIZE} ${REGIONS} --threads ${THREADS} ${EXTRAOPTS}
From the error message above, it seems there are some bad CIGAR strings in the aligned BAM? This problem happened on other chromosomes as well. Please let me know if I should open a separate ticket for this. Thank you!
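If a malformed CIGAR is the suspect, one generic consistency check (independent of medaka) follows from the SAM specification: the lengths of the query-consuming operations M, I, S, = and X must add up to the length of SEQ. The minimal sketch below applies that check to SAM text, for example piped from `samtools view`. It does not reproduce htslib's own `resolve_cigar2` logic, so a clean result does not rule out every cause of that assertion.

```python
import re
import sys

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")


def cigar_query_length(cigar):
    """Sum the lengths of CIGAR operations that consume query bases."""
    ops = CIGAR_OP.findall(cigar)
    # Reject strings containing anything other than well-formed <len><op> pairs
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(n) for n, op in ops if op in QUERY_CONSUMING)


def suspicious_records(sam_lines):
    """Yield (read_name, reason) for records whose CIGAR disagrees with SEQ."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        qname, cigar, seq = fields[0], fields[5], fields[9]
        if cigar == "*" or seq == "*":
            continue  # unmapped record, or SEQ not stored
        try:
            qlen = cigar_query_length(cigar)
        except ValueError as err:
            yield qname, str(err)
            continue
        if qlen != len(seq):
            yield qname, f"CIGAR consumes {qlen} bases but SEQ has {len(seq)}"


if __name__ == "__main__":
    # e.g. samtools view rel5-guppy-0.3.0-chunk10k.sorted.bam chr22 | python check_cigar.py
    for name, reason in suspicious_records(sys.stdin):
        print(name, reason, sep="\t")
```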
Originally posted by @sharon558 in #25 (comment)
I am running medaka on nanopore draft assemblies from Flye that were polished with racon four times, and medaka fails at the consensus stage. The basecalls were made with guppy version 2.3.5. Medaka was installed into its own conda environment.
Checking program versions
Program Version Required Pass
bgzip 1.9 1.9 True
minimap2 2.11 2.11 True
samtools 1.9 1.9 True
tabix 1.9 1.9 True
Aligning basecalls to draft
-P option is deprecated
Found minimap files.
[M::main::1.814*1.00] loaded/built the index for 5496 target sequence(s)
[M::mm_mapopt_update::2.293*1.00] mid_occ = 52
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 5496
[M::mm_idx_stat::2.685*1.00] distinct minimizers: 20489295 (72.38% are singletons); average occurrences: 1.722; average spacing: 5.361
[M::worker_pipeline::51.221*14.92] mapped 79682 sequences
[M::worker_pipeline::83.219*19.17] mapped 76831 sequences
[M::worker_pipeline::107.127*21.37] mapped 75735 sequences
[M::worker_pipeline::130.271*22.80] mapped 77551 sequences
[M::worker_pipeline::156.134*23.91] mapped 76292 sequences
[M::worker_pipeline::181.773*24.29] mapped 75865 sequences
[M::worker_pipeline::211.141*24.11] mapped 76293 sequences
[M::worker_pipeline::236.326*24.68] mapped 75231 sequences
[M::worker_pipeline::260.724*25.09] mapped 77703 sequences
[M::worker_pipeline::287.333*25.15] mapped 76443 sequences
[M::worker_pipeline::314.211*24.76] mapped 75857 sequences
[M::worker_pipeline::339.408*25.12] mapped 78103 sequences
[M::worker_pipeline::355.932*25.30] mapped 83441 sequences
[M::worker_pipeline::374.894*25.32] mapped 84073 sequences
[M::worker_pipeline::400.614*25.06] mapped 78833 sequences
[M::worker_pipeline::428.461*25.36] mapped 72657 sequences
[M::worker_pipeline::452.870*25.56] mapped 72243 sequences
[M::worker_pipeline::480.282*25.62] mapped 73771 sequences
[M::worker_pipeline::508.286*25.35] mapped 71908 sequences
[M::worker_pipeline::536.047*25.57] mapped 71866 sequences
[M::worker_pipeline::560.878*25.73] mapped 72110 sequences
[M::worker_pipeline::587.958*25.77] mapped 71773 sequences
[M::worker_pipeline::609.060*25.68] mapped 78327 sequences
[M::worker_pipeline::613.150*25.58] mapped 55049 sequences
[M::main] Version: 2.11-r797
[M::main] CMD: minimap2 -x map-ont --MD -t 32 -a /home/ubuntu/stan/nanopore/flye_assemblies/D3S/D3S.flye.assembly.racon4.fasta.mmi /home/ubuntu/stan/nanopore/raw_data/D3S/D3S_cat.fasta
[M::main] Real time: 613.456 sec; CPU: 15683.244 sec
[bam_sort_core] merging from 0 files and 32 in-memory blocks...
Running medaka consensus
Using TensorFlow backend.
/home/ubuntu/miniconda3/envs/medaka/bin/medaka_consensus: line 88: 3068 Floating point exception(core dumped) medaka consensus ${CALLS2DRAFT}.bam ${CONSENSUSPROBS} --model ${MODEL} --batch_size ${BATCH_SIZE} --threads ${THREADS}
Failed to run medaka consensus.
This is what's in the output directory:
total 2128508
-rw-rw-r-- 1 ubuntu ubuntu 2178192425 Apr 25 14:30 calls_to_draft.bam
-rw-rw-r-- 1 ubuntu ubuntu 1392768 Apr 25 14:30 calls_to_draft.bam.bai
Does anyone know where I'm going wrong?
Hi,
When running medaka on our assembly, the stitching step seems to run into exceptions:
[03:18:31 - Stitch] Processing ctg1641.
[03:18:34 - Stitch] Processing ctg1642.
[03:18:42 - Stitch] There is no overlap betwen ctg1642:32609.1-37277.1643 and ctg1642:37276.3367-41429.0
[03:18:43 - Stitch] ctg1642:37276.644-39589.1 ends before ctg1642:37276.3367-41429.0, skipping.
ctg127:259136.66-261946.5049 and ctg127:261945.13050-264482.0 do not overlap
ctg148:271139.1918-271157.3508 and ctg148:271156.11509-273932.0 do not overlap
ctg149:1355570.47-1356768.7744 and ctg149:1356767.15745-1362547.0 do not overlap
ctg1642:32609.1-37277.1643 and ctg1642:37276.3367-41429.0 do not overlap
Traceback (most recent call last):
File "/local/genome/packages/anaconda3/latest/bin/medaka", line 11, in <module>
sys.exit(main())
File "/mnt/users/tinagr/.local/lib/python3.6/site-packages/medaka/medaka.py", line 213, in main
args.func(args)
File "/mnt/users/tinagr/.local/lib/python3.6/site-packages/medaka/stitch.py", line 80, in stitch
joined = stitch_from_probs(args.inputs, regions=args.regions)
File "/mnt/users/tinagr/.local/lib/python3.6/site-packages/medaka/stitch.py", line 71, in stitch_from_probs
logger.info(msg.format(s1.name, s2.name))
AttributeError: 'NoneType' object has no attribute 'name'
Failed to stitch consensus chunks.
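The chunk names in these messages appear to encode positions as `contig:startMajor.startMinor-endMajor.endMinor`, where the minor index counts inserted columns after a reference position. Under that assumption (not confirmed by the log itself), the sketch below parses a reported pair and counts how many major (reference) positions both chunks cover, treating ends as inclusive. For each pair listed above the result is 2, i.e. the chunks meet at a two-position boundary. This does not reproduce medaka's own overlap test; it only makes the boundaries easier to compare.

```python
import re

# ASSUMED name format: contig:startMajor.startMinor-endMajor.endMinor
CHUNK = re.compile(
    r"^(?P<ctg>.+):(?P<s>\d+)\.(?P<smin>\d+)-(?P<e>\d+)\.(?P<emin>\d+)$")


def parse_chunk(name):
    """Return (contig, (start_major, start_minor), (end_major, end_minor))."""
    m = CHUNK.match(name)
    if m is None:
        raise ValueError(f"unrecognised chunk name: {name!r}")
    return (m["ctg"],
            (int(m["s"]), int(m["smin"])),
            (int(m["e"]), int(m["emin"])))


def shared_major_positions(name1, name2):
    """Count reference (major) positions covered by both chunks, ends inclusive."""
    ctg1, start1, end1 = parse_chunk(name1)
    ctg2, start2, end2 = parse_chunk(name2)
    if ctg1 != ctg2:
        return 0
    lo = max(start1[0], start2[0])
    hi = min(end1[0], end2[0])
    return max(0, hi - lo + 1)


if __name__ == "__main__":
    pairs = [
        ("ctg127:259136.66-261946.5049", "ctg127:261945.13050-264482.0"),
        ("ctg148:271139.1918-271157.3508", "ctg148:271156.11509-273932.0"),
        ("ctg149:1355570.47-1356768.7744", "ctg149:1356767.15745-1362547.0"),
        ("ctg1642:32609.1-37277.1643", "ctg1642:37276.3367-41429.0"),
    ]
    for a, b in pairs:
        print(a, b, shared_major_positions(a, b))
```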
I thought it's similar to #20, but we are already using medaka 0.5.0.
Command used:
DRAFT=wtdbg2-medakaPolish_p19-k0-L10k-AS3-s0p05-e3.ctg.fa
BASECALLS=1-fastp-guppyNewVer-Recall-bases/Cod.recalled.4kb.q7.50h.fastq.gz
OUTDIR=medaka_trial_j
medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${OUTDIR} -t 25 -m 'r941_flip'
Thank you,
Michel
Hi,
I am trying to clone the repository, and I am being asked for a username and password.
My username and password lead to an "Authentication failed" response.
Cheers,
Vineeth
Title says it all :)
Regards
Søren
I've noticed more than once now that medaka_variant produces an unsorted VCF file, which then crashes the downstream whatshap phase step with the following error:
Traceback (most recent call last):
File "/home/wdecoster/anaconda3/envs/medaka/bin/whatshap", line 12, in <module>
sys.exit(main())
File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/whatshap/__main__.py", line 83, in main
module.main(args)
File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/whatshap/phase.py", line 1114, in main
run_whatshap(**vars(args))
File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/whatshap/phase.py", line 663, in run_whatshap
for variant_table in vcf_reader:
File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/whatshap/vcf.py", line 329, in __iter__
yield self._process_single_chromosome(chromosome, records)
File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/whatshap/vcf.py", line 379, in _process_single_chromosome
raise VcfNotSortedError('VCF not ordered: {}:{} appears before {}:{}'.format(chromosome, prev_position+1, chromosome, pos+1))
whatshap.vcf.VcfNotSortedError: VCF not ordered: chr4:155958777 appears before chr4:155957327
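Until the output order is fixed upstream, one workaround is to sort the VCF before running whatshap. `bcftools sort` is the usual tool for this; the stdlib-only sketch below shows the same idea. It keeps header lines first and orders records by contig (in `##contig` header order, falling back to order of first appearance) and then by POS. It reads the whole file into memory, which is an assumption that suits per-sample variant files.

```python
import re
import sys

CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def sort_vcf_lines(lines):
    """Return VCF lines with headers first, then records sorted by (contig, POS)."""
    headers, records = [], []
    contig_rank = {}
    for line in lines:
        if line.startswith("#"):
            headers.append(line)
            m = CONTIG_ID.match(line)
            if m and m.group(1) not in contig_rank:
                contig_rank[m.group(1)] = len(contig_rank)
        elif line.strip():
            records.append(line)
    # Contigs missing from the header are ranked by first appearance
    for line in records:
        contig_rank.setdefault(line.split("\t", 1)[0], len(contig_rank))

    def key(line):
        chrom, pos = line.split("\t", 2)[:2]
        return contig_rank[chrom], int(pos)

    # sorted() is stable, so records at the same position keep their input order
    return headers + sorted(records, key=key)


if __name__ == "__main__":
    # e.g. python sort_vcf.py < unsorted.vcf > sorted.vcf
    sys.stdout.writelines(sort_vcf_lines(sys.stdin))
```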
Cheers,
Wouter
The process ran until
...
[03:05:22 - Stitch] Processing ctg254.
[03:05:29 - Stitch] Processing ctg255.
[03:05:36 - Stitch] Processing ctg256.
[03:05:47 - Stitch] Processing ctg257.
[03:05:59 - Stitch] Processing ctg258.
Traceback (most recent call last):
File "/usr/local/bioinfo/src/Medaka/medaka-1.4.3/venv/bin/medaka", line 11, in
load_entry_point('medaka==0.4.3', 'console_scripts', 'medaka')()
File "/usr/local/bioinfo/src/Medaka/medaka-1.4.3/venv/lib/python3.6/site-packages/medaka-0.4.3-py3.6-linux-x86_64.egg/medaka/medaka.py", line 170, in main
args.func(args)
File "/usr/local/bioinfo/src/Medaka/medaka-1.4.3/venv/lib/python3.6/site-packages/medaka-0.4.3-py3.6-linux-x86_64.egg/medaka/stitch.py", line 81, in stitch
joined = stitch_from_probs(args.inputs, regions=args.regions, model_yml=args.model_yml)
File "/usr/local/bioinfo/src/Medaka/medaka-1.4.3/venv/lib/python3.6/site-packages/medaka-0.4.3-py3.6-linux-x86_64.egg/medaka/stitch.py", line 62, in stitch_from_probs
end_1_ind, start_2_ind = get_sample_overlap(s1, s2)
File "/usr/local/bioinfo/src/Medaka/medaka-1.4.3/venv/lib/python3.6/site-packages/medaka-0.4.3-py3.6-linux-x86_64.egg/medaka/common.py", line 487, in get_sample_overlap
assert len(pos1_ovl) == len(pos2_ovl)
AssertionError
Originally posted by @chklopp in #16 (comment)
Hi,
I just wanted to check whether the models released as part of guppy v3.0.3 are compatible with the medaka models, specifically the r941_flip235 model. As of 3.0.3, all of the r9.4.1 models are flip-flop models; are they the same as those in medaka? The reason I ask is that I noticed a very slight drop in accuracy (99.87 -> 99.85!) when re-basecalling the same data with v3.0.3.
Thanks.
Damien
Hi,
I got medaka running and successfully trained it on one of our datasets. I stopped the training at epoch 108 because acc was no longer improving (it had reached 1.0000, and val_acc was at 0.9996).
On trying to generate the consensus, I end up with this error:
Traceback (most recent call last):
File "/home/ngs/medaka/venv/bin/medaka", line 11, in
load_entry_point('medaka==0.1.0', 'console_scripts', 'medaka')()
File "/home/ngs/medaka/venv/lib/python3.5/site-packages/medaka-0.1.0-py3.5.egg/medaka/medaka.py", line 69, in main
args.func(args)
File "/home/ngs/medaka/venv/lib/python3.5/site-packages/medaka-0.1.0-py3.5.egg/medaka/inference.py", line 372, in predict
model, encoding = load_model_hdf(args.model, args.encoding, need_encoding=True)
File "/home/ngs/medaka/venv/lib/python3.5/site-packages/medaka-0.1.0-py3.5.egg/medaka/inference.py", line 127, in load_model_hdf
raise KeyError("Could not find label encodings in the model, please provide an encoding json")
KeyError: 'Could not find label encodings in the model, please provide an encoding json'
Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x7f7d0b991c50>>
Traceback (most recent call last):
File "/home/ngs/medaka/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 696, in __del__
TypeError: 'NoneType' object is not callable
Since the alphabet considered is A, T, G, C, * and the draft reference I use contains an N, could this be the problem?
Cheers and TIA,
Vineeth
I tested medaka (v0.6.0a3) on chr3 of the NA12878 sample, downloaded from the following link:
https://github.com/nanopore-wgs-consortium/NA12878/blob/master/nanopore-human-genome
cmd:
medaka_variant -r GRCh38_full_analysis_set_plus_decoy_hla.fa -b rel5-guppy-0.3.0-chunk10k.sorted.bam -m r94 -R chr3
Errors:
[13:59:00 - PWorker] 100.0% Done (198.5/198.5 Mbases) in 4397.1s
[13:59:06 - PWorker] All done
[13:59:06 - Predict] Finished processing all regions.
[13:59:21 - DataIndex] Loaded sample-index from 1/1 (100.00%) of feature files.
[13:59:25 - SNPs] Processing chr3.
Traceback (most recent call last):
File "python/Python-3.6.3/bin/medaka", line 11, in
sys.exit(main())
File "python/Python-3.6.3/lib/python3.6/site-packages/medaka/medaka.py", line 257, in main
args.func(args)
File "python/Python-3.6.3/lib/python3.6/site-packages/medaka/stitch.py", line 347, in snps
find_snps(args.inputs, args.ref_fasta, args.output, regions=args.regions, threshold=args.threshold, ref_vcf=args.ref_vcf)
File "python/Python-3.6.3/lib/python3.6/site-packages/medaka/stitch.py", line 178, in find_snps
ref_seq_encoded = np.fromiter((label_encoding[ref_seq[i]] for i in major_pos), int, count=len(major_pos))
File "python/Python-3.6.3/lib/python3.6/site-packages/medaka/stitch.py", line 178, in
ref_seq_encoded = np.fromiter((label_encoding[ref_seq[i]] for i in major_pos), int, count=len(major_pos))
KeyError: 'B'
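The `KeyError: 'B'` is raised while looking up a reference base in `label_encoding`, which suggests the reference contains a character outside the encoded alphabet ('B' is the IUPAC ambiguity code for C/G/T). A minimal stdlib sketch that lists, per sequence, which characters other than A, C, G and T occur in a FASTA file (and how often) could help confirm this before rerunning. The exact set of characters medaka accepts is not stated here, so the `allowed` default is an assumption and N is reported too.

```python
import sys
from collections import Counter


def unexpected_bases(fasta_lines, allowed="ACGT"):
    """Map sequence name -> Counter of characters not in `allowed` (case-insensitive)."""
    allowed = set(allowed.upper())
    result = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]  # first word of the header
            continue
        bad = Counter(c for c in line.upper() if c not in allowed)
        if bad:
            result.setdefault(name, Counter()).update(bad)
    return result


if __name__ == "__main__":
    # e.g. samtools faidx GRCh38_full_analysis_set_plus_decoy_hla.fa chr3 | python scan_ref.py
    for seq_name, counts in unexpected_bases(sys.stdin).items():
        print(seq_name, dict(counts))
```

A per-character Python loop is slow on a whole genome, so extracting only the region of interest with `samtools faidx` first keeps it manageable.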
The output and error files can be found in the attached zip folder.
Thank you!
Hi,
bcftools complains about missing definitions in the VCF header:
[W::vcf_parse] INFO 'pos2' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'q2' is not defined in the header, assuming Type=String
[W::vcf_parse_format] FORMAT 'GT' is not defined in the header, assuming Type=String
[W::vcf_parse_format] FORMAT 'GQ' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'pos1' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'q1' is not defined in the header, assuming Type=String
[E::bcf_write] Unchecked error (2), exiting
I'll write a script to fix the headers of 3000 VCF files before merging the results, so this is not an urgent problem, but it might be good to address in the next release.
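A header-fixing script could insert the missing definitions just before the `#CHROM` line, as in the minimal sketch below. The GT and GQ lines follow the VCF specification (GQ is Integer there; if medaka writes non-integer GQ values it would need to be declared Float instead). The meaning of pos1, q1, pos2 and q2 is not documented here, so the sketch declares them with Number=1 and Type=String, matching what bcftools already assumes; both are assumptions to check against the medaka source.

```python
import sys

# GT and GQ follow the VCF specification. pos1/q1/pos2/q2: Number and Type are
# ASSUMED (String mirrors the bcftools fallback); their real types are unknown.
MISSING_HEADERS = [
    '##INFO=<ID=pos1,Number=1,Type=String,Description="pos1 (type assumed)">',
    '##INFO=<ID=q1,Number=1,Type=String,Description="q1 (type assumed)">',
    '##INFO=<ID=pos2,Number=1,Type=String,Description="pos2 (type assumed)">',
    '##INFO=<ID=q2,Number=1,Type=String,Description="q2 (type assumed)">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">',
]


def header_key(line):
    """Return ('INFO', 'q1') for '##INFO=<ID=q1,...>', else None."""
    if not line.startswith("##") or "=<ID=" not in line:
        return None
    kind, rest = line[2:].split("=<ID=", 1)
    return kind, rest.split(",", 1)[0].rstrip(">\n")


def add_missing_headers(lines, extra=MISSING_HEADERS):
    """Insert any definitions from `extra` not already present, before #CHROM."""
    present, out = set(), []
    for line in lines:
        key = header_key(line)
        if key:
            present.add(key)
        if line.startswith("#CHROM"):
            out.extend(h + "\n" for h in extra if header_key(h) not in present)
        out.append(line)
    return out


if __name__ == "__main__":
    # e.g. python fix_header.py < in.vcf > fixed.vcf
    sys.stdout.writelines(add_missing_headers(sys.stdin))
```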
Thanks!
I'm trying to run medaka_variant on an Nvidia DGX box (8x Tesla GPUs). I have a few questions:
gpustat:
Tue Apr 2 13:04:00 2019 410.78
[0] Tesla V100-SXM2-32GB | 35'C, 0 % | 30932 / 32480 MB | koneill(30921M)
[1] Tesla V100-SXM2-32GB | 35'C, 0 % | 502 / 32480 MB | koneill(491M)
[2] Tesla V100-SXM2-32GB | 34'C, 0 % | 502 / 32480 MB | koneill(491M)
[3] Tesla V100-SXM2-32GB | 34'C, 0 % | 502 / 32480 MB | koneill(491M)
[4] Tesla V100-SXM2-32GB | 36'C, 0 % | 502 / 32480 MB | koneill(491M)
[5] Tesla V100-SXM2-32GB | 37'C, 0 % | 502 / 32480 MB | koneill(491M)
[6] Tesla V100-SXM2-32GB | 38'C, 0 % | 502 / 32480 MB | koneill(491M)
[7] Tesla V100-SXM2-32GB | 38'C, 0 % | 502 / 32480 MB | koneill(491M)
I'm also testing it on CPUs, and it seems to be using ~70-80 CPUs. However, the servers I'm running it on are shared, and it would be best if I could control CPU usage so as to be a considerate co-user of these resources. Is there a parameter that can be passed to medaka_variant to limit the maximum number of CPUs/threads?
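The log further down shows `-t 4` turning into "Setting tensorflow threads to 4", but that does not necessarily cap every thread pool in the process. As an OS-level fallback on Linux, a launcher can pin the child process to a subset of CPUs and set the common OpenMP/BLAS thread-count variables. This is a generic sketch, not a medaka option, and whether TensorFlow honours those environment variables depends on how it was built; the CPU pinning applies regardless.

```python
import os
import subprocess
import sys

# Common thread-count variables; whether a library honours them depends on its build.
THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def run_pinned(cmd, n_cpus, **run_kwargs):
    """Run cmd restricted to n_cpus CPUs (Linux only; uses sched_setaffinity)."""
    cpus = set(sorted(os.sched_getaffinity(0))[:n_cpus])  # first n_cpus allowed CPUs
    env = dict(os.environ, **{var: str(n_cpus) for var in THREAD_VARS})

    def pin():
        # Runs in the child just before exec; the affinity mask is inherited
        # by every thread and subprocess the command starts.
        os.sched_setaffinity(0, cpus)

    return subprocess.run(cmd, env=env, preexec_fn=pin, check=True, **run_kwargs)


if __name__ == "__main__":
    # usage: python run_pinned.py N command [args...]
    # e.g.   python run_pinned.py 8 medaka_variant -r ref.fa -b reads.bam -t 8
    #        (ref.fa and reads.bam are placeholder names)
    run_pinned(sys.argv[2:], int(sys.argv[1]))
```

`taskset -c 0-7 medaka_variant ...` from util-linux applies the same kind of pinning without Python.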
For the GPU run, it seems to have gotten stuck. It ran for 14 hours to do the "long regions", then had a 4:45 hour gap before processing the short regions (with no logging during this time). It's now another four hours later after doing the short regions, and it's been sitting in a single CPU thread, not using the GPU or other CPUs for that entire time.
Is this normal/expected behaviour?
Would it be possible to get more granular logging during these long delays?
Would it be possible to parallelise better during these times?
Logs below (snipped)
medaka_variant -r GRCh37-lite.fa \
-b promethion_NA19240.bam.dup.bam \
-o medaka_variant_gpu \
-t 4
+ medaka_variant -r GRCh37-lite.fa -b promethion_NA19240.bam.dup.bam -o medaka_variant_gpu -t 4
Checking program versions
Program Version Required Pass
bgzip 1.9 1.9 True
minimap2 2.11 2.11 True
samtools 1.9 1.9 True
tabix 1.9 1.9 True
======================================
Running medaka consensus /projects/koneill_prj/promethion/promethion_NA19240.bam.dup.bam
======================================
Using TensorFlow backend.
[15:11:30 - Predict] Processing region(s): 1:0-249250621 2:0-243199373 3:0-198022430 4:0-191154276 5:0-180915260 6:0-171115067 7:0-159138663 8:0-146364022 9:0-141213431 10:0-135534747 11:0-135006516 12:0-133851895 13:0-115169878 14:0-107349540 15:0-102531392 16:0-90354753 17:0-81195210 18:0-78077248 19:0-59128983 20:0-63025520 21:0-48129895 22:0-51304566 X:0-155270560 Y:0-59373566 MT:0-16569 GL000207.1:0-4262 GL000226.1:0-15008 GL000229.1:0-19913 GL000231.1:0-27386 GL000210.1:0-27682 GL000239.1:0-33824 GL000235.1:0-34474 GL000201.1:0-36148 GL000247.1:0-36422 GL000245.1:0-36651 GL000197.1:0-37175 GL000203.1:0-37498 GL000246.1:0-38154 GL000249.1:0-38502 GL000196.1:0-38914 GL000248.1:0-39786 GL000244.1:0-39929 GL000238.1:0-39939 GL000202.1:0-40103 GL000234.1:0-40531 GL000232.1:0-40652 GL000206.1:0-41001 GL000240.1:0-41933 GL000236.1:0-41934 GL000241.1:0-42152 GL000243.1:0-43341 GL000242.1:0-43523 GL000230.1:0-43691 GL000237.1:0-45867 GL000233.1:0-45941 GL000204.1:0-81310 GL000198.1:0-90085 GL000208.1:0-92689 GL000191.1:0-106433 GL000227.1:0-128374 GL000228.1:0-129120 GL000214.1:0-137718 GL000221.1:0-155397 GL000209.1:0-159169 GL000218.1:0-161147 GL000220.1:0-161802 GL000213.1:0-164239 GL000211.1:0-166566 GL000199.1:0-169874 GL000217.1:0-172149 GL000216.1:0-172294 GL000215.1:0-172545 GL000205.1:0-174588 GL000219.1:0-179198 GL000224.1:0-179693 GL000223.1:0-180455 GL000195.1:0-182896 GL000212.1:0-186858 GL000222.1:0-186861 GL000200.1:0-187035 GL000193.1:0-189789 GL000194.1:0-191469 GL000225.1:0-211173 GL000192.1:0-547496
[15:11:30 - Predict] Setting tensorflow threads to 4.
[15:11:36 - Predict] Found 3171 long and 2 short regions.
[15:11:36 - Predict] Processing long regions.
[15:11:36 - ModelLoad] Building model (steps, features, classes): (10000, 10, 5)
[15:11:37 - ModelLoad] Loading weights from /projects/koneill_prj/conda/envs/medaka_gpu/lib/python3.6/site-packages/medaka/data/r941_trans_model.hdf5
[15:11:38 - PWorker] Running inference for 3104.9M draft bases.
[15:11:38 - Sampler] Initializing sampler for consensus or region 1:0-1000000.
[15:11:39 - Feature] Pileup counts do not span requested region, requested 1:0-1000000, received 10000-999999.
[15:11:40 - Feature] Processed 1:10000.0-1000000.1 (median depth 64.0)
[15:11:40 - Sampler] Took 2.13s to make features.
<<<snip>>>
[05:05:09 - Sampler] Initializing sampler for consensus or region GL000225.1:0-211173.
[05:05:13 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50015.4s
[05:05:18 - Feature] Processed GL000225.1:0.0-211173.0 (median depth 709.0)
[05:05:18 - Sampler] Took 8.14s to make features.
[05:05:18 - Sampler] Initializing sampler for consensus or region GL000192.1:0-547496.
[05:05:19 - Feature] Pileup counts do not span requested region, requested GL000192.1:0-547496, received 3095-547495.
[05:05:19 - Feature] Processed GL000192.1:3095.0-547496.0 (median depth 59.0)
[05:05:19 - Sampler] Took 1.37s to make features.
[05:05:25 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50027.6s
[05:05:42 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50044.4s
[05:05:55 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50057.1s
[05:06:10 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50072.5s
[05:06:22 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50084.6s
[05:06:38 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50100.7s
[05:06:50 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50112.8s
[05:07:03 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50125.6s
[05:07:19 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50141.0s
[05:07:30 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50152.9s
[05:07:44 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50166.1s
[05:07:57 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50179.3s
[09:43:38 - PWorker] All done
[09:43:38 - Predict] Processing short regions
[09:43:38 - ModelLoad] Building model (steps, features, classes): (None, 10, 5)
[09:43:39 - ModelLoad] Loading weights from /projects/koneill_prj/conda/envs/medaka_gpu/lib/python3.6/site-packages/medaka/data/r941_trans_model.hdf5
[09:43:39 - PWorker] Running inference for 0.0M draft bases.
[09:43:40 - Sampler] Initializing sampler for consensus or region 11:134999000-135006516.
[09:43:40 - Feature] Pileup-feature is zero-length for 11:134999000-135006516 indicating no reads in this region.
[09:43:40 - Sampler] Took 0.12s to make features.
[09:43:41 - PWorker] All done
[09:43:41 - PWorker] Running inference for 0.0M draft bases.
[09:43:41 - Sampler] Initializing sampler for consensus or region GL000207.1:0-4262.
[09:43:41 - Feature] Pileup counts do not span requested region, requested GL000207.1:0-4262, received 0-4254.
[09:43:41 - Feature] Processed GL000207.1:0.0-4255.0 (median depth 18.0)
[09:43:41 - Sampler] Took 0.09s to make features.
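Until more granular logging exists, the silent periods can at least be located from the existing log. The minimal sketch below reads the `[HH:MM:SS - ...]` timestamps, handles the rollover past midnight (this run started at 15:11 and finished the next morning), and reports silences longer than a threshold. On the excerpt above it flags the 05:07:57 -> 09:43:38 gap of 16541 s (about 4 h 36 min) before "All done". The rollover rule is an assumption: a drop of more than 12 hours is treated as a new day, so a single silence of a full day or more would be miscounted.

```python
import re
import sys

STAMP = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2}) - ")


def find_gaps(log_lines, min_gap_s=600):
    """Yield (seconds, previous_line, next_line) for silences >= min_gap_s."""
    day_offset = 0
    prev_t = prev_line = None
    for line in log_lines:
        m = STAMP.match(line)
        if not m:
            continue  # tracebacks, tool banners, snip markers, etc.
        h, mnt, s = map(int, m.groups())
        t = h * 3600 + mnt * 60 + s + day_offset
        # A large backwards jump means the clock passed midnight
        if prev_t is not None and prev_t - t > 12 * 3600:
            day_offset += 86400
            t += 86400
        if prev_t is not None and t - prev_t >= min_gap_s:
            yield t - prev_t, prev_line.rstrip(), line.rstrip()
        prev_t, prev_line = t, line


if __name__ == "__main__":
    # e.g. python find_gaps.py < medaka_variant.log
    for secs, before, after in find_gaps(sys.stdin):
        print(f"{secs / 3600:.2f} h of silence between:\n  {before}\n  {after}")
```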
Hi,
We are regularly using medaka now. Are you planning to build a Conda package?
Cheers,
Felix
Hello,
I would like to call variants on reads realigned to an assembly and, at the same time, polish the assembly. I see that medaka_variant produces an HDF file (round_0_hap_mixed_probs.hdf) similar to the one used in medaka_consensus. Is it advisable to reuse that file, so that one could go directly to stitching with round_0_hap_mixed_probs.hdf?
Or are different mapping parameters used to produce consensus_probs.hdf?
Thanks,
Michel
I'm currently testing the medaka variant pipeline and seem to have encountered an error. I've copied the tail of the stderr below...
[23:36:46 - Sampler] Initializing sampler for consensus or region Consensus_Consensus_Consensus_ctg586:0-59347.
[23:36:48 - Feature] Pileup counts do not span requested region, requested Consensus_Consensus_Consensus_ctg586:0-59347, received 0-59345.
[23:36:48 - Feature] Processed Consensus_Consensus_Consensus_ctg586:0.0-59346.0 (median depth 42.0)
[23:36:48 - Sampler] Took 1.21s to make features.
[23:36:48 - Sampler] Initializing sampler for consensus or region Consensus_Consensus_Consensus_ctg587:0-57289.
[23:36:48 - Feature] Could not process sample with bam_to_sample_c, using python code instead.
(index 0 is out of bounds for axis 0 with size 0).
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/nhartwic/.conda/envs/new_medaka/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/nhartwic/.conda/envs/new_medaka/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/home/nhartwic/.conda/envs/new_medaka/lib/python3.6/site-packages/medaka/common.py", line 446, in gen_to_queue
for item in generator:
File "/home/nhartwic/.conda/envs/new_medaka/lib/python3.6/site-packages/medaka/common.py", line 477, in grouper
batch.append(next(gen))
File "/home/nhartwic/.conda/envs/new_medaka/lib/python3.6/site-packages/medaka/inference.py", line 478, in sample_gen
yield from data_gen.samples
File "/home/nhartwic/.conda/envs/new_medaka/lib/python3.6/site-packages/medaka/features.py", line 607, in samples
self._fill_features()
File "/home/nhartwic/.conda/envs/new_medaka/lib/python3.6/site-packages/medaka/features.py", line 592, in _fill_features
self.bam, self.region, self.rle_ref, self.read_fraction)
File "/home/nhartwic/.conda/envs/new_medaka/lib/python3.6/site-packages/medaka/features.py", line 316, in bam_to_sample
raise NotImplementedError("Filtering alignments by tag is not supported in python code.")
NotImplementedError: Filtering alignments by tag is not supported in python code.
[23:38:02 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 221342.7s
[23:39:47 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 221447.4s
[23:41:34 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 221554.1s
[23:43:20 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 221660.9s
[23:45:07 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 221767.1s
[23:46:58 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 221878.1s
[23:48:47 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 221987.6s
[23:50:36 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 222096.1s
[23:52:23 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 222203.6s
[23:54:10 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 222310.1s
[23:55:55 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 222415.4s
The medaka variant process still seems to be running, though it hasn't produced any output in over 12 hours and seems to be hung.
Are there any plans to support filtering alignments by tag in the Python code? What causes medaka to fall back to the Python implementation for this contig? Is it just length? Is there any way I could modify my dataset so that medaka variant runs?
Hello,
Thank you for the recent update! The speed does seem to have improved, and I understand the program takes time; however, I am wondering if there is anything I can do to process my metagenomes faster. There is a 24-hour limit on our HPC cluster, and the processes aren't finishing before they are killed. The short region processing still seems to be what slows things down, and only one thread is being used even though I specified all the available threads (24) on a ~700 GB RAM node. Is it possible for me to make use of more threads at this point?
The command I am currently using is:
medaka_consensus -i $NPFASTQ -d draft_raconNP.fa -o medaka -t 24 -m r941_flip213
And the timing:
[14:52:29 - Predict] Processing 28934 long region(s) with batching.
...
[19:37:00 - Predict] Processing 11274 short region(s).
...
[14:11:07 - Sampler] Pileup for tig00053666:0.0-7357.0 is of width 8279 ### short region processing still going the next day
I am testing it on a draft assembly (metagenome) that is 700 Mbp in size (all contigs >4 kbp, mean length 24 kbp), and this is our smallest assembly; the others are 1-3 Gbp. Is there something I can do, as in issue #39, such as subsetting the assemblies and recombining the split assembly at the end?
If the process is killed by the 24-hour limit, can medaka pick up where it left off with the short regions if I restart the command?
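One way to act on the subsetting idea is to split the draft into several FASTA files of roughly equal total length, run each through medaka_consensus as a separate job (each still given the full set of basecalls), and concatenate the polished outputs afterwards. The minimal sketch below does the split with a greedy heuristic: longest contig first, always into the currently smallest bin. The output naming scheme `draft_part_<k>.fa` is hypothetical, and the whole draft is held in memory, which is fine for a 700 Mbp assembly on a large-memory node.

```python
import heapq
import sys


def read_fasta(lines):
    """Yield (header_line, sequence) pairs from FASTA text."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def balance_contigs(records, n_bins):
    """Assign records to n_bins lists with similar total sequence length."""
    bins = [[] for _ in range(n_bins)]
    heap = [(0, i) for i in range(n_bins)]  # (total length so far, bin index)
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, i = heapq.heappop(heap)  # currently smallest bin
        bins[i].append((header, seq))
        heapq.heappush(heap, (total + len(seq), i))
    return bins


def write_bins(bins, prefix="draft_part"):
    """Write each bin to <prefix>_<k>.fa (hypothetical naming), 80 bases per line."""
    for k, contigs in enumerate(bins):
        with open(f"{prefix}_{k}.fa", "w") as fh:
            for header, seq in contigs:
                fh.write(header + "\n")
                for j in range(0, len(seq), 80):
                    fh.write(seq[j:j + 80] + "\n")


if __name__ == "__main__":
    # e.g. python split_draft.py draft_raconNP.fa 4
    with open(sys.argv[1]) as fh:
        write_bins(balance_contigs(read_fasta(fh), int(sys.argv[2])))
```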
Thank you for your help!
I'm consistently getting an OOM error from TensorFlow with the most recent version of medaka and r941_min_high. I'm running it on a dedicated Tesla V100 (16 GB); there's definitely, absolutely nothing else running on it. Of note, it usually happens after 5-10 chunks have already been processed. I've had the same issue with all 4 samples we've tried since upgrading, and never had it with the previous version and r941_flip.
[23:10:10 - Predict] Setting tensorflow threads to 8.
[23:14:04 - Predict] Processing 838 long region(s) with batching.
[23:14:04 - ModelLoad] Building model (steps, features, classes): (10000, 10, 5)
[23:14:04 - ModelLoad] With cudnn: True
[23:14:08 - ModelLoad] Loading weights from /home/ubuntu/.conda/envs/medaka/lib/python3.6/site-packages/medaka-0.7.1-py3.6-linux-x86_64.egg/medaka/data/r941_min_high_model.hdf5
[23:14:08 - PWorker] Running inference for 163.8M draft bases.
[23:14:08 - Sampler] Initializing sampler for consensus of region utg000001l:0-1000000.
[23:14:11 - Feature] Processed utg000001l:0.0-999999.0 (median depth 35.0)
[23:14:11 - Sampler] Took 2.51s to make features.
...
[23:14:30 - Sampler] Pileup for utg000001l:7999000.0-8999999.0 is of width 1849572
[23:14:30 - Sampler] Initializing sampler for consensus of region utg000001l:8999000-10000000.
2019-05-24 23:14:31.331254: E tensorflow/stream_executor/cuda/cuda_dnn.cc:82] OOM when allocating tensor with shape[1536000000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
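For scale, the failed allocation is large on its own: TensorFlow's `float` is 4-byte float32, so a tensor with 1,536,000,000 elements needs 1,536,000,000 × 4 = 6,144,000,000 bytes, about 6.1 GB (5.7 GiB). That is over a third of the card's 16 GB, on top of the model weights and any tensors already resident. The arithmetic as a small helper:

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=4):
    """Memory needed for a dense tensor; 4 bytes per element for float32."""
    return prod(shape) * bytes_per_element


size = tensor_bytes([1536000000])
print(f"{size:,} bytes = {size / 1e9:.2f} GB = {size / 2**30:.2f} GiB")
# prints: 6,144,000,000 bytes = 6.14 GB = 5.72 GiB
```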
Hi,
I am getting the following error with v1.4.3. Also, to use the flip-flop basecaller from Guppy, the model should be selected with -m r941_flip, not -m r94_flip as suggested.
medaka_consensus -i Run12_all_guppy_v2.2.2.fastq -d Run12_guppy_contigs.fasta -o Run12
Aligning basecalls to draft
Found minimap files.
open: No such file or directory
[bam_sort_core] fail to open file calls_to_draft.bam
[M::main::0.003*1.04] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*1.02] mid_occ = 3
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.003*1.15] distinct minimizers: 1620 (99.94% are singletons); average occurrences: 1.001; average spacing: 5.290
[samopen] SAM header is present: 1 sequences.
open: No such file or directory
[bam_index_build2] fail to open the BAM file.
Running medaka consensus
Using TensorFlow backend.
[E::hts_open_format] Failed to open file calls_to_draft.bam
Traceback (most recent call last):
File "/home/dct7/medaka/bin/medaka", line 11, in <module>
sys.exit(main())
File "/home/dct7/medaka/lib/python3.6/site-packages/medaka/medaka.py", line 170, in main
args.func(args)
File "/home/dct7/medaka/lib/python3.6/site-packages/medaka/inference.py", line 381, in predict
args.regions = get_regions(args.bam, region_strs=args.regions)
File "/home/dct7/medaka/lib/python3.6/site-packages/medaka/common.py", line 218, in get_regions
with pysam.AlignmentFile(bam) as bam_fh:
File "pysam/libcalignmentfile.pyx", line 736, in pysam.libcalignmentfile.AlignmentFile.__cinit__
File "pysam/libcalignmentfile.pyx", line 935, in pysam.libcalignmentfile.AlignmentFile._open
FileNotFoundError: [Errno 2] could not open alignment file
calls_to_draft.bam: No such file or directory
Any suggestions?
Thanks,
Damien
Hi,
With medaka 0.6.2, HDF creation runs through the large regions fairly quickly, but as soon as it starts processing the short regions, it becomes extremely slow. Although all sequences should have been processed by then and the HDF file has almost reached completion, it has now been running for days...
It is stalling on many contigs in the 1-5 kbp range with low depth coverage.
Is there a way to terminate HDF creation and rescue the unfinished HDF file, so as to continue with stitching of the large contigs?
It also looks like it switched from all-CPU usage to single-CPU usage for the "Processing short regions" step.
I never came across such issues in previous versions of medaka.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
44047 prom 20 0 79.883g 0.063t 95740 R 100.3 17.1 77535:50 medaka
[02:32:46 - Sampler] Initializing sampler for consensus or region ctg9:7.0-10099823.0:10999000-11149746.
[02:32:47 - Feature] Processed ctg9:7.0-10099823.0:10999000.0-11149746.0 (median depth 55.0)
[02:32:47 - Sampler] Took 1.11s to make features.
[02:33:58 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57004.8s
[02:35:11 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57077.5s
[02:36:24 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57150.7s
[02:37:36 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57222.3s
[02:38:49 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57295.1s
[02:40:02 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57368.2s
[02:41:15 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57441.5s
[02:42:27 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57513.3s
[02:43:40 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57586.2s
[02:44:53 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57659.5s
[02:46:06 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57732.1s
[02:47:18 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57804.4s
[02:57:37 - PWorker] All done
[02:57:37 - Predict] Processing short regions
[02:57:37 - ModelLoad] Building model (steps, features, classes): (None, 10, 5)
[02:57:37 - ModelLoad] With cudnn: False
[02:57:37 - ModelLoad] Loading weights from /home/prom/.conda/envs/medaka/lib/python3.6/site-packages/medaka/data/r941_flip235_model.hdf5
[02:57:38 - PWorker] Running inference for 0.0M draft bases.
[02:57:38 - Sampler] Initializing sampler for consensus or region ctg1027:1.0-9521.0:0-9160.
[02:57:38 - Feature] Pileup counts do not span requested region, requested ctg1027:1.0-9521.0:0-9160, received 745-8244.
[02:57:38 - Feature] Processed ctg1027:1.0-9521.0:745.0-8245.0 (median depth 3.0)
[02:57:38 - Sampler] Took 0.02s to make features.
[03:08:26 - PWorker] All done
[03:08:26 - PWorker] Running inference for 0.0M draft bases.
[03:08:26 - Sampler] Initializing sampler for consensus or region ctg1036:1.0-13004.0:0-8879.
[03:08:26 - Feature] Pileup-feature is zero-length for ctg1036:1.0-13004.0:0-8879 indicating no reads in this region.
[03:08:26 - Sampler] Took 0.03s to make features.
[03:08:27 - PWorker] All done
[03:08:27 - PWorker] Running inference for 0.0M draft bases.
[03:08:27 - Sampler] Initializing sampler for consensus or region ctg1045:23.0-8454.0:0-8489.
[03:08:27 - Feature] Processed ctg1045:23.0-8454.0:0.0-8489.0 (median depth 17.0)
[03:08:27 - Sampler] Took 0.05s to make features.
[03:18:56 - PWorker] All done
[03:18:56 - PWorker] Running inference for 0.0M draft bases.
[03:18:56 - Sampler] Initializing sampler for consensus or region ctg1133:1.0-8117.0:0-8415.
[03:18:56 - Feature] Processed ctg1133:1.0-8117.0:0.0-8415.0 (median depth 10.0)
[03:18:56 - Sampler] Took 0.04s to make features.
[03:29:43 - PWorker] All done
[03:29:43 - PWorker] Running inference for 0.0M draft bases.
[03:29:43 - Sampler] Initializing sampler for consensus or region ctg1143:885.0-10511.0:0-9562.
[03:29:43 - Feature] Pileup counts do not span requested region, requested ctg1143:885.0-10511.0:0-9562, received 651-9561.
[03:29:43 - Feature] Processed ctg1143:885.0-10511.0:651.0-9562.0 (median depth 2.0)
[03:29:43 - Sampler] Took 0.04s to make features.
[03:40:31 - PWorker] All done
[03:40:31 - PWorker] Running inference for 0.0M draft bases.
[03:40:31 - Sampler] Initializing sampler for consensus or region ctg1147:174.0-3204.0:0-3063.
[03:40:31 - Feature] Pileup counts do not span requested region, requested ctg1147:174.0-3204.0:0-3063, received 0-3048.
[03:40:31 - Feature] Processed ctg1147:174.0-3204.0:0.0-3049.0 (median depth 3.0)
[03:40:31 - Sampler] Took 0.03s to make features.
[03:51:04 - PWorker] All done
[03:51:05 - PWorker] Running inference for 0.0M draft bases.
[03:51:05 - Sampler] Initializing sampler for consensus or region ctg1148:15.0-8458.0:0-8555.
[03:51:05 - Feature] Pileup counts do not span requested region, requested ctg1148:15.0-8458.0:0-8555, received 2667-7312.
[03:51:05 - Feature] Processed ctg1148:15.0-8458.0:2667.0-7313.0 (median depth 10.0)
[03:51:05 - Sampler] Took 0.05s to make features.
[04:01:45 - PWorker] All done
[04:01:45 - PWorker] Running inference for 0.0M draft bases.
[04:01:45 - Sampler] Initializing sampler for consensus or region ctg1162:5722.0-11740.0:0-6230.
[04:01:45 - Feature] Processed ctg1162:5722.0-11740.0:0.0-6230.0 (median depth 146.0)
[04:01:45 - Sampler] Took 0.11s to make features.
[04:12:27 - PWorker] All done
[04:12:27 - PWorker] Running inference for 0.0M draft bases.
[04:12:27 - Sampler] Initializing sampler for consensus or region ctg1171:1.0-3927.0:0-3682.
[04:12:27 - Feature] Processed ctg1171:1.0-3927.0:0.0-3682.0 (median depth 24.0)
[04:12:27 - Sampler] Took 0.04s to make features.
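Reading the timestamps above, every short region that has reads takes roughly 10.5 to 11 minutes (ctg1027 at median depth 3 takes 10m48s, ctg1162 at median depth 146 takes 10m42s), while the zero-length ctg1036 finishes in one second. That looks like a fixed per-region cost rather than one driven by the amount of data, although the log alone cannot say where the time goes. A small stdlib sketch for pulling those durations out of a pasted log; the regular expressions assume the "[HH:MM:SS - Component] message" layout seen here and are not part of medaka.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, Optional

# Matches e.g. "[03:08:27 - Sampler] Initializing sampler for consensus or region ctg1045:23.0-8454.0:0-8489."
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2}) - (\w+)\] (.*)$")
# Captures the region name after "region ", dropping a trailing full stop.
REGION_RE = re.compile(r"region (\S+?)\.?$")


def region_durations(log_text: str) -> Dict[str, float]:
    """Seconds from each 'Initializing sampler' line to the next 'All done' line."""
    durations: Dict[str, float] = {}
    current: Optional[str] = None
    started: Optional[datetime] = None
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        stamp, component, message = m.groups()
        t = datetime.strptime(stamp, "%H:%M:%S")
        if component == "Sampler" and message.startswith("Initializing sampler"):
            rm = REGION_RE.search(message)
            current, started = (rm.group(1) if rm else message), t
        elif component == "PWorker" and message.startswith("All done") and current and started:
            delta = t - started
            if delta < timedelta(0):  # the log crossed midnight
                delta += timedelta(days=1)
            durations[current] = delta.total_seconds()
            current = None
    return durations
```

Feeding it the block above and sorting by duration makes the roughly constant cost of non-empty short regions easy to see.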
Hi there,
I'm trying to get medaka working using an RTX 2080 Ti card that needs CUDA 10.0. I can run tensorflow and keras in the medaka virtualenv, so that part is working. The conda install uses the wrong libraries for CUDA 10 (and fails looking for libcublas.9.0), so I had to install it from git. For that I also had to install pyyaml using pip. However, now when medaka runs, it throws an error (file attached):
yaml.constructor.ConstructorError: while constructing a Python instance
expected a class, but found <class 'builtin_function_or_method'>
in "<unicode string>", line 3, column 5:
- !!python/object/apply:numpy.core ...
Any advice on how I can fix this?
Thanks,
Ben
medaka.log
I would like to perform polishing using medaka on "short" references, which are approximately 1400 bp long. Medaka breaks when I attempt to do this and the error seems to be connected to the length of my references/sequences:
The reference data set in this case only has one sequence, and it seems to be skipped here.
Would it be possible to change the < min 10000 setting without breaking downstream processing?
Thanks for developing and making medaka available to us!
Best regards
Søren
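The question refers to a "< min 10000" length check, and medaka's own log in another thread on this page reports a split of this kind ("Found 1557 long and 256 short regions."). As a rough illustration only, here is how a fixed length cutoff sorts references into long and short sets; the cutoff value is taken from the question, and both the helper name and the exact comparison are assumptions, not medaka's code.

```python
from typing import Dict, List, Tuple

MIN_LONG = 10_000  # assumed cutoff, taken from the "< min 10000" check quoted above


def partition_regions(lengths: Dict[str, int], min_long: int = MIN_LONG) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split reference names into (long, short) by length."""
    long_regions = [name for name, n in lengths.items() if n >= min_long]
    short_regions = [name for name, n in lengths.items() if n < min_long]
    return long_regions, short_regions
```

With a single ~1400 bp reference, everything lands in the short set, so any step that only consumes the long set would see nothing to do.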
Hi,
I encountered the following error in the TrimOverlap step, using medaka v0.7.0-alpha.1:
Traceback (most recent call last):
File "/home/wdecoster/anaconda3/envs/medaka/bin/medaka", line 10, in <module>
sys.exit(main())
File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/medaka.py", line 350, in main
args.func(args)
File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/variant.py", line 472, in variants_from_hdf
vcf_writer.write_variants(variants, sort=True)
File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/vcf.py", line 272, in write_variants
variants = medaka.common.loose_version_sort(variants, key=lambda v: '{}-{}'.format(v.chrom, v.pos))
File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/common.py", line 708, in loose_version_sort
it = list(it)
File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/variant.py", line 206, in decode_variants
for s, _ in yield_trimmed_consensus_chunks(samples):
File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/variant.py", line 69, in yield_trimmed_consensus_chunks
raise RuntimeError('Unexpected sample relationship {} between {} and {}'.format(repr(rel), s1.name, s2.name))
RuntimeError: Unexpected sample relationship <Relationship.reverse_overlap: 'The end of s2 overlaps the start of s1.'> between chr1:198506700.10723-198510472.2 and chr1:198506700.1723-198506883.0
Failed to call variants from consensus chunks.
Please let me know if there is a file which I can share to help you debug this issue. The bam file I'm using is 80x coverage of the human genome...
Cheers,
Wouter
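The two sample names in this error look like "chrom:major.minor-major.minor", where the part after the dot appears to be an insertion (minor) index rather than a decimal fraction. That reading is an assumption based on the "0.0"-style names medaka prints. Compared as integer tuples, 198506700.10723 sorts after 198506700.1723, so the second chunk starts before the first and its end falls inside the first, which matches the reported reverse_overlap; compared as floats, the order of the two starts flips. A stdlib sketch of that parsing:

```python
from typing import NamedTuple, Tuple

Pos = Tuple[int, int]  # (major, minor): the assumed meaning of "major.minor"


class SampleSpan(NamedTuple):
    chrom: str
    start: Pos
    end: Pos


def parse_pos(text: str) -> Pos:
    """'198506700.10723' -> (198506700, 10723); a missing minor part becomes 0."""
    major, _, minor = text.partition(".")
    return int(major), int(minor or 0)


def parse_name(name: str) -> SampleSpan:
    """Parse a name such as 'chr1:198506700.10723-198510472.2' into integer tuples."""
    chrom, _, span = name.rpartition(":")
    start, _, end = span.partition("-")
    return SampleSpan(chrom, parse_pos(start), parse_pos(end))
```

Tuples compare element by element, so the minor index only matters when the major positions are equal, which is exactly the situation in this error.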
Hello,
I'm trying to run medaka on my assembly. It seems to work until the stitch step, but then fails with the following error:
Running medaka stitch
[20:25:55 - DataIndex] Loaded sample-index from 1/1 (100.00%) of feature files.
[20:25:56 - Stitch] Processing ctg1.
Traceback (most recent call last):
File "/ceph/users/lstevens/.conda/envs/medaka_env/bin/medaka", line 11, in <module>
sys.exit(main())
File "/ceph/users/lstevens/.conda/envs/medaka_env/lib/python3.6/site-packages/medaka/medaka.py", line 350, in main
args.func(args)
File "/ceph/users/lstevens/.conda/envs/medaka_env/lib/python3.6/site-packages/medaka/stitch.py", line 85, in stitch
joined = stitch_from_probs(args.inputs, regions=args.regions)
File "/ceph/users/lstevens/.conda/envs/medaka_env/lib/python3.6/site-packages/medaka/stitch.py", line 66, in stitch_from_probs
end_1_ind, start_2_ind = medaka.common.Sample.overlap_indices(s1, s2)
File "/ceph/users/lstevens/.conda/envs/medaka_env/lib/python3.6/site-packages/medaka/common.py", line 256, in overlap_indices
raise OverlapException(msg.format(s1.name, s2.name, repr(rel)))
medaka.common.OverlapException: Cannot overlap samples ctg1:0.0-260.0 and ctg1:266.0-3840.0 with relationhip <Relationship.forward_gapped: 's2 follows s1 with a gab inbetween.'>
Failed to stitch consensus chunks.
The other steps appear to have completed properly (i.e. there is no error output, and the files calls_to_draft.bam, calls_to_draft.bam.bai and consensus_probs.hdf exist and are not empty).
I'm running medaka with the following command:
medaka_consensus -i [gzipped_read_fastq] -d [raconpolished_fasta] -o medaka -t 32
The assembly is from wtdbg2 and polished 4x with racon using the command suggested in your README.
Medaka version is 0.7.0 and installed using conda.
Any ideas what might be wrong?
Thanks for your help,
Lewis
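In this error the first chunk ends at 260.0 and the next starts at 266.0, so a few positions are covered by neither chunk and there is nothing to stitch across. A simplified sketch of classifying two chunks on the same contig, assuming half-open intervals on major positions only; the labels echo the ones in the error messages, but the logic is an illustrative stand-in, not medaka's Relationship implementation.

```python
from typing import Tuple

Interval = Tuple[int, int]  # half-open [start, end) on major positions (assumption)


def relationship(s1: Interval, s2: Interval) -> str:
    """Illustrative classification of two chunks on the same contig."""
    if s2[0] < s1[0]:
        # Classify with the earlier-starting chunk first, then mark as reverse.
        return "reverse_" + relationship(s2, s1).split("_", 1)[1]
    start1, end1 = s1
    start2, end2 = s2
    if end2 <= end1:
        return "forward_contained"  # s2 lies entirely inside s1
    if start2 < end1:
        return "forward_overlap"    # shared positions: stitchable
    if start2 == end1:
        return "forward_abutted"    # touching, but no shared positions
    return "forward_gapped"         # uncovered positions between the chunks
```

Under these assumptions, a stitcher that needs shared positions to choose a join point can only handle the overlap case; a gapped pair leaves it with nothing to join.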
Hi, I have been running medaka on a GPU machine, but it looks like the consensus call gets stuck in a loop after a while.
[23:00:04 - PWorker] All done, 0 remainder regions.
[23:00:04 - PWorker] Running inference for 0.0M draft bases.
[23:00:04 - Sampler] Initializing sampler for consensus of region 4684:48-3642.
[23:00:04 - Feature] Processed 4684:48.0-3641.0 (median depth 4.0)
[23:00:04 - Sampler] Took 0.13s to make features.
[23:00:04 - Sampler] Pileup for 4684:48.0-3641.0 is of width 4324
[23:02:19 - PWorker] All done, 0 remainder regions.
[23:02:19 - PWorker] Running inference for 0.0M draft bases.
[23:02:19 - Sampler] Initializing sampler for consensus of region 6506:749-8932.
[23:02:19 - Feature] Processed 6506:749.0-8931.0 (median depth 1.0)
[23:02:19 - Sampler] Took 0.13s to make features.
[23:02:19 - Sampler] Pileup for 6506:749.0-8931.0 is of width 8875
[23:04:34 - PWorker] All done, 0 remainder regions.
[23:04:34 - PWorker] Running inference for 0.0M draft bases.
[23:04:34 - Sampler] Initializing sampler for consensus of region 2556:0-7245.
[23:04:34 - Feature] Processed 2556:0.0-7244.0 (median depth 3.0)
[23:04:34 - Sampler] Took 0.12s to make features.
[23:04:34 - Sampler] Pileup for 2556:0.0-7244.0 is of width 8008
Also, you guys reported that polishing a human genome takes ~5h. I have tried almost every possible parameter setting to make it finish in under 10h, but the stitch step alone takes more than 5 hours. Am I missing something here?
I am trying to polish a 1 Gb genome using 70X ONT reads.
NPROC=$(nproc)
BASECALLS=reads.fastq.gz
DRAFT=assembly.fa
OUTDIR=medaka_consensus
medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${OUTDIR} -t ${NPROC} -m r94
medaka performs the genome indexing and alignment. The bam and bai files are created in OUTDIR.
Then I get the following error
Creating features
Using TensorFlow backend.
None
Traceback (most recent call last):
File "/usr/local/bioinfo/src/Medaka/medaka-0.3.0/venv/bin/hp_compress", line 11, in
load_entry_point('medaka==0.3.0', 'console_scripts', 'hp_compress')()
File "/usr/local/bioinfo/src/Medaka/medaka-0.3.0/venv/lib/python3.6/site-packages/medaka-0.3.0-py3.6.egg/medaka/compress.py", line 650, in main
args.func(args)
File "/usr/local/bioinfo/src/Medaka/medaka-0.3.0/venv/lib/python3.6/site-packages/medaka-0.3.0-py3.6.egg/medaka/compress.py", line 617, in choose_feature_func
features(args)
File "/usr/local/bioinfo/src/Medaka/medaka-0.3.0/venv/lib/python3.6/site-packages/medaka-0.3.0-py3.6.egg/medaka/compress.py", line 365, in features
opt_str = '\n'.join(['{}: {}'.format(k,v) for k, v in fe_kwargs.items()])
AttributeError: 'NoneType' object has no attribute 'items'
How can I find what went wrong?
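The bare "None" printed just before the traceback, together with "'NoneType' object has no attribute 'items'", shows that fe_kwargs was None when the options were formatted; why it was never populated cannot be told from this log. A minimal sketch of the failing line with an explicit None check; format_options is a hypothetical stand-in, not medaka's function.

```python
from typing import Any, Mapping, Optional


def format_options(fe_kwargs: Optional[Mapping[str, Any]]) -> str:
    """Render feature-encoder options one per line, tolerating None."""
    if fe_kwargs is None:
        # The line in the traceback calls fe_kwargs.items() unconditionally,
        # which raises AttributeError when no options were loaded.
        return "(no feature options found)"
    return "\n".join("{}: {}".format(k, v) for k, v in fe_kwargs.items())
```

The check only turns the crash into a readable message; the useful follow-up is to find out why no options were loaded, for example by checking what was passed with -m.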
I removed TensorFlow and installed the GPU version but when running medaka_consensus I get a crash (below).
I installed medaka (0.6.0-py36h2b5150b_0) and tensorflow-gpu (1.12.0-h0d30ee6_0) using miniconda in a py3 conda env, but I do not dare to install CUDA software for fear of interfering with the original ONT software on the GridION.
Is there a way to run medaka with GPU on my GridION (2x GeForce GTX 1080 Ti), or should I use only CPUs?
Thanks
...
Running medaka consensus
Using TensorFlow backend.
[13:01:54 - Predict] Processing region(s): tig00000001:0-1469063 tig00000003:0-25183 tig00000005:0-1053078 tig00000009:0-1042324 tig00000012:0-24609 tig00000014:0-988579 tig00000016:0-875005 tig00000019:0-1047544 tig00000022:0-873730 tig00000027:0-920396 tig00000031:0-676909 tig00000035:0-748747 tig00000037:0-764060 tig00000039:0-209276 tig00000042:0-653884 tig00000058:0-692998 tig00000059:0-772142 tig00000062:0-816770 tig00000066:0-553488 tig00000067:0-580323 tig00000068:0-530610 tig00000077:0-409975 tig00000079:0-417209 tig00000080:0-429052 tig00000082:0-357446 tig00000093:0-308749 tig00000095:0-36675 tig00000098:0-265607 tig00000102:0-200009 tig00000105:0-261715 tig00000109:0-104085 tig00000112:0-36029 tig00000114:0-40744 tig00000117:0-27693 tig00000119:0-179847 tig00000150:0-12418 tig00000156:0-23170 tig00000159:0-16860 tig00000161:0-1825 tig00000170:0-3380 tig00000172:0-58317 tig00000191:0-1836 tig00000202:0-15093 tig00000210:0-9564 tig00000213:0-16630 tig00000276:0-3889 tig00000286:0-560358 tig00000287:0-68667 tig00000288:0-462709 tig00000289:0-13786 tig00000290:0-581156 tig00000291:0-208241 tig00007300:0-12563 tig00007301:0-1285605 tig00007302:0-954263 tig00007303:0-16559 tig00007304:0-795945 tig00007305:0-35880 tig00007306:0-30213 tig00007307:0-38796 tig00007308:0-6443 tig00007309:0-4305 tig00007310:0-24086 tig00007311:0-20001
/opt/miniconda3/envs/py3/lib/python3.6/site-packages/medaka/datastore.py:131: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
return {g: yaml.load(self.fh[g][()]) for g in groups if g in self.fh}
[13:01:54 - Predict] Setting tensorflow threads to 8.
Traceback (most recent call last):
File "/opt/miniconda3/envs/py3/bin/medaka", line 11, in <module>
sys.exit(main())
File "/opt/miniconda3/envs/py3/lib/python3.6/site-packages/medaka/medaka.py", line 261, in main
args.func(args)
File "/opt/miniconda3/envs/py3/lib/python3.6/site-packages/medaka/inference.py", line 535, in predict
inter_op_parallelism_threads=args.threads)
File "/opt/miniconda3/envs/py3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1551, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/opt/miniconda3/envs/py3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 676, in __init__
self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
Failed to run medaka consensus.
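"CUDA driver version is insufficient for CUDA runtime version" means the installed NVIDIA driver is older than the CUDA runtime this tensorflow-gpu build expects. If updating the driver on the GridION is not an option, hiding all GPUs with the standard CUDA_VISIBLE_DEVICES environment variable is a common way to make TensorFlow fall back to CPU without uninstalling tensorflow-gpu; this has not been verified on a GridION. A sketch, assuming medaka is launched as a subprocess; the helper names are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Optional


def cpu_only_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with every CUDA device hidden."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = ""  # empty string: no GPUs visible to CUDA
    return env


def run_cpu_only(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run a command (e.g. medaka_consensus ...) with GPUs hidden."""
    return subprocess.run(cmd, env=cpu_only_env(), check=True)
```

Usage would look like run_cpu_only(["medaka_consensus", "-i", "reads.fastq", "-d", "draft.fasta", "-o", "out", "-t", "8"]) with placeholder file names; from the shell, the equivalent is prefixing the command with CUDA_VISIBLE_DEVICES="".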
During execution of make install, if the additional argument -j is added to make (like make -j 4 install), the build fails because of:
/usr/bin/ld: htslib-1.9/libhts.a(hfile_s3.o): in function `s3_sign':
hfile_s3.c:(.text+0x306): undefined reference to `EVP_sha1'
/usr/bin/ld: hfile_s3.c:(.text+0x32a): undefined reference to `HMAC'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:144: samtools] Error 1
Everything works when -j is no greater than 2.
I would like to test flappie basecalled data with medaka, but I guess I need a new model for medaka since the error profile has changed. Would it be possible for you to train and provide such a model?
Hi, I would like to polish genome assemblies with PromethION data, but I am unsure whether the r94 model is suitable, since a different ligation kit and flowcell are used than for MinION and GridION.
Do you have other models or would r94 just work fine?
Thank you,
Michel
Hi,
I am currently testing medaka 0.6.2 on GPUs using the PromethION machine with its Tesla V100 GPU.
For testing, I subset the original file to a single contig to polish, working only with data from that contig: ctg55_21516
hdf production seemed to run through, but an error was thrown after it reached the last contig:
Checking program versions
Program Version Required Pass
bgzip 1.9 1.9 True
minimap2 2.11 2.11 True
samtools 1.9 1.9 True
tabix 1.9 1.9 True
Warning: Output ctg already exists, may use old results.
Not aligning basecalls to draft, calls_to_draft.bam exists.
Running medaka consensus
Using TensorFlow backend.
[14:25:43 - Predict] Processing region(s): ctg1000_107:0-12339 ctg1002_24:0-14544 ctg1004.....
[14:25:43 - Predict] Setting tensorflow threads to 1.
[14:25:43 - Predict] Found 1557 long and 256 short regions.
[14:25:43 - Predict] Processing long regions.
[14:25:43 - ModelLoad] Building model (steps, features, classes): (10000, 10, 5)
[14:25:43 - ModelLoad] With cudnn: True
[14:25:43 - ModelLoad] Loading weights from /home/prom/.conda/envs/medaka/lib/python3.6/site-packages/medaka/data/r941_flip235_model.hdf5
[14:25:43 - PWorker] Running inference for 673.1M draft bases.
[14:25:43 - Sampler] Initializing sampler for consensus or region ctg1000_107:0-12339.
[14:25:43 - Feature] Pileup-feature is zero-length for ctg1000_107:0-12339 indicating no reads in this region.
[14:25:43 - Sampler] Took 0.01s to make features.....
2019-04-09 13:39:55.779570: E tensorflow/stream_executor/cuda/cuda_dnn.cc:82] OOM when allocating tensor with shape[1536000000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "/home/prom/.conda/envs/medaka/bin/medaka", line 11, in <module>
sys.exit(main())
File "/home/prom/.conda/envs/medaka/lib/python3.6/site-packages/medaka/medaka.py", line 261, in main
args.func(args)
File "/home/prom/.conda/envs/medaka/lib/python3.6/site-packages/medaka/inference.py", line 560, in predict
tag_name=args.tag_name, tag_value=args.tag_value, tag_keep_missing=args.tag_keep_missing
File "/home/prom/.conda/envs/medaka/lib/python3.6/site-packages/medaka/inference.py", line 489, in run_prediction
class_probs = model.predict_on_batch(x_data)
File "/home/prom/.conda/envs/medaka/lib/python3.6/site-packages/keras/engine/training.py", line 1274, in predict_on_batch
outputs = self.predict_function(ins)
File "/home/prom/.conda/envs/medaka/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/home/prom/.conda/envs/medaka/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/home/prom/.conda/envs/medaka/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
run_metadata_ptr)
File "/home/prom/.conda/envs/medaka/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 3, 0, 0 , [num_layers, input_size, num_units, dir_count, seq_length, batch_size]: [1, 256, 128, 1, 10000, 200]
[[{{node bidirectional_2/CudnnRNN}} = CudnnRNN[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="gru", seed=87654321, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bidirectional_2/transpose, bidirectional_2/ExpandDims_1, bidirectional_2/Const, bidirectional_2/concat)]]
Failed to run medaka consensus.
I wonder whether it's related to the subsetting, or whether there is some issue with the tensorflow-gpu installation, as "tensorflow.python.framework.errors_impl.InternalError: Failed to call ThenRnnForward with model config:" seems to indicate?
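The OOM line is worth a quick back-of-the-envelope check: a float32 tensor of shape [1536000000] needs 6,144,000,000 bytes (about 5.7 GiB) on its own. The cuDNN config line lists seq_length 10000 and batch_size 200, and 1536000000 = 10000 x 200 x 768, which suggests this buffer grows linearly with the batch size. That linear scaling is an inference from the numbers, not something the log states. A stdlib sketch of the arithmetic:

```python
BYTES_PER_FLOAT32 = 4


def tensor_bytes(n_elements: int, bytes_per_element: int = BYTES_PER_FLOAT32) -> int:
    """Memory needed for a dense tensor with n_elements entries."""
    return n_elements * bytes_per_element


def scaled_elements(n_elements: int, old_batch: int, new_batch: int) -> int:
    """Assumes (unverified) that the tensor size is proportional to the batch size."""
    return n_elements * new_batch // old_batch


OOM_ELEMENTS = 1_536_000_000  # shape[1536000000] from the OOM message
SEQ_LEN, BATCH = 10_000, 200  # seq_length and batch_size from the ThenRnnForward config
per_step_per_sample = OOM_ELEMENTS // (SEQ_LEN * BATCH)  # 768 under the linear assumption
```

Under that assumption, dropping the batch from 200 to 50 would shrink this single allocation from about 6.1 GB to about 1.5 GB.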
Stdout for ctg55:
[13:39:26 - Sampler] Initializing sampler for consensus or region ctg54_1:999000-2000000.
[13:39:26 - Feature] Pileup-feature is zero-length for ctg54_1:999000-2000000 indicating no reads in this region.
[13:39:26 - Sampler] Took 0.02s to make features.
[13:39:26 - Sampler] Initializing sampler for consensus or region ctg54_1:1999000-2207334.
[13:39:26 - Feature] Pileup-feature is zero-length for ctg54_1:1999000-2207334 indicating no reads in this region.
[13:39:26 - Sampler] Took 0.00s to make features.
[13:39:26 - Sampler] Initializing sampler for consensus or region ctg55_21516:0-1000000.
[13:39:32 - Feature] Processed ctg55_21516:0.0-1000000.3 (median depth 59.0)
[13:39:32 - Sampler] Took 5.60s to make features.
[13:39:32 - Sampler] Initializing sampler for consensus or region ctg55_21516:999000-2000000.
[13:39:37 - Feature] Processed ctg55_21516:999000.0-2000000.1 (median depth 56.0)
[13:39:37 - Sampler] Took 5.21s to make features.
[13:39:37 - Sampler] Initializing sampler for consensus or region ctg55_21516:1999000-3000000.
[13:39:39 - Feature] Pileup counts do not span requested region, requested ctg55_21516:1999000-3000000, received 1999000-2207333.
[13:39:39 - Feature] Processed ctg55_21516:1999000.0-2207334.0 (median depth 57.0)
[13:39:39 - Sampler] Took 1.56s to make features.
[13:39:39 - Sampler] Initializing sampler for consensus or region ctg55_21516:2999000-4000000.
[13:39:39 - Feature] Pileup-feature is zero-length for ctg55_21516:2999000-4000000 indicating no reads in this region.
[13:39:39 - Sampler] Took 0.37s to make features.
[13:39:39 - Sampler] Initializing sampler for consensus or region ctg55_21516:3999000-5000000.
[13:39:40 - Feature] Pileup-feature is zero-length for ctg55_21516:3999000-5000000 indicating no reads in this region.
[13:39:40 - Sampler] Took 0.34s to make features.
[13:39:40 - Sampler] Initializing sampler for consensus or region ctg55_21516:4999000-6000000.
[13:39:40 - Feature] Pileup-feature is zero-length for ctg55_21516:4999000-6000000 indicating no reads in this region.
[13:39:40 - Sampler] Took 0.43s to make features.
[13:39:40 - Sampler] Initializing sampler for consensus or region ctg55_21516:5999000-7000000.
[13:39:41 - Feature] Pileup-feature is zero-length for ctg55_21516:5999000-7000000 indicating no reads in this region.
[13:39:41 - Sampler] Took 0.42s to make features.
[13:39:41 - Sampler] Initializing sampler for consensus or region ctg55_21516:6999000-8000000.
[13:39:41 - Feature] Pileup-feature is zero-length for ctg55_21516:6999000-8000000 indicating no reads in this region.
[13:39:41 - Sampler] Took 0.40s to make features.
[13:39:41 - Sampler] Initializing sampler for consensus or region ctg55_21516:7999000-9000000.
[13:39:41 - Feature] Pileup-feature is zero-length for ctg55_21516:7999000-9000000 indicating no reads in this region.
[13:39:41 - Sampler] Took 0.17s to make features.
[13:39:41 - Sampler] Initializing sampler for consensus or region ctg55_21516:8999000-10000000.
[13:39:41 - Feature] Pileup-feature is zero-length for ctg55_21516:8999000-10000000 indicating no reads in this region.
[13:39:41 - Sampler] Took 0.02s to make features.
[13:39:41 - Sampler] Initializing sampler for consensus or region ctg55_21516:9999000-11000000.
[13:39:41 - Feature] Pileup-feature is zero-length for ctg55_21516:9999000-11000000 indicating no reads in this region.
[13:39:41 - Sampler] Took 0.01s to make features.
[13:39:41 - Sampler] Initializing sampler for consensus or region ctg55_21516:10999000-12000000.
[13:39:41 - Feature] Pileup-feature is zero-length for ctg55_21516:10999000-12000000 indicating no reads in this region.
[13:39:41 - Sampler] Took 0.02s to make features.
[13:39:41 - Sampler] Initializing sampler for consensus or region ctg55_21516:11999000-13000000.
[13:39:42 - Feature] Pileup-feature is zero-length for ctg55_21516:11999000-13000000 indicating no reads in this region.
[13:39:42 - Sampler] Took 0.02s to make features.
[13:39:42 - Sampler] Initializing sampler for consensus or region ctg55_21516:12999000-14000000.
[13:39:42 - Feature] Pileup-feature is zero-length for ctg55_21516:12999000-14000000 indicating no reads in this region.
[13:39:42 - Sampler] Took 0.03s to make features.
[13:39:42 - Sampler] Initializing sampler for consensus or region ctg55_21516:13999000-14035920.
[13:39:42 - Feature] Pileup-feature is zero-length for ctg55_21516:13999000-14035920 indicating no reads in this region.
[13:39:42 - Sampler] Took 0.01s to make features.
[13:39:42 - Sampler] Initializing sampler for consensus or region ctg57_1:0-1000000.
[13:39:42 - Feature] Pileup-feature is zero-length for ctg57_1:0-1000000 indicating no reads in this region.
[13:39:42 - Sampler] Took 0.02s to make features.
[13:39:42 - Sampler] Initializing sampler for consensus or region ctg57_1:999000-2000000.
[13:39:42 - Feature] Pileup-feature is zero-length for ctg57_1:999000-2000000 indicating no reads in this region.
[13:39:42 - Sampler] Took 0.01s to make features.
[13:39:42 - Sampler] Initializing sampler for consensus or region ctg57_1:1999000-3000000.
[13:39:42 - Feature] Pileup-feature is zero-length for ctg57_1:1999000-3000000 indicating no reads in this region.
[13:39:42 - Sampler] Took 0.02s to make features.
[13:39:42 - Sampler] Initializing sampler for consensus or region ctg57_1:2999000-4000000.
[13:39:42 - Feature] Pileup-feature is zero-length for ctg57_1:2999000-4000000 indicating no reads in this region.
[13:39:42 - Sampler] Took 0.03s to make features.
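The region names in this log follow a regular pattern: windows that end on multiples of 1,000,000 and start 1000 bases before the previous window ends (0-1000000, 999000-2000000, 1999000-3000000, ...), with the last window truncated at the contig end (13999000-14035920). A sketch that reproduces those boundaries; the chunk and overlap sizes are read off the log, and the function is illustrative, not medaka's own region splitter.

```python
from typing import List, Tuple


def chunk_region(start: int, end: int, chunk: int = 1_000_000, overlap: int = 1_000) -> List[Tuple[int, int]]:
    """Split [start, end) into windows like those printed in the log.

    Each window ends at the next multiple of `chunk` (or at `end`) and starts
    `overlap` bases before the previous window's end.
    """
    windows = []
    for boundary in range(start, end, chunk):
        lo = max(start, boundary - overlap)
        hi = min(boundary + chunk, end)
        windows.append((lo, hi))
    return windows
```

For ctg55_21516 this gives the same 15 windows as the log. Reads stop at position 2,207,333 ("received 1999000-2207333"), so only the first three windows have a pileup, and every later one is reported as zero-length.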
Hi,
Maybe related to #17
I am running medaka from a Singularity container. I specified 10 threads for Medaka but it seems to just use most of the cores available on the node (28):
Dominik
Hi
I have been running medaka with the new flip-flop model on a system with 80 threads, using the -t option to specify 60 threads. It appears that medaka frequently ignores this option and uses all of the threads.
I have attached a screendump of the CPU usage at 8000% (80 threads), even though it was supposed to max out at 6000% (60 threads).
I hope you find a fix for this.
Best regards
Rasmus
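If medaka, or the TensorFlow and NumPy libraries underneath it, starts more worker threads than -t asks for, one blunt but dependable workaround on Linux is to restrict which CPUs the process may run on; child processes inherit the restriction. From the shell this is "taskset -c 0-59 medaka_consensus ...". The same idea from Python, as a sketch; pin_to_first_n_cpus is a hypothetical helper, and os.sched_setaffinity is only available on some platforms (Linux included).

```python
import os
from typing import Optional, Set


def pin_to_first_n_cpus(n: int) -> Optional[Set[int]]:
    """Restrict this process (and children started later) to n of its allowed CPUs.

    Returns the CPU set applied, or None where CPU affinity is unsupported.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = sorted(os.sched_getaffinity(0))
    chosen = set(allowed[:n])
    os.sched_setaffinity(0, chosen)  # pid 0 means the calling process
    return chosen
```

This caps the cores used rather than the number of threads created: extra threads still exist but have to share the pinned cores.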
Hi,
I have a somewhat unrelated question. Is there a chance to (re-)train guppy and also tailor medaka towards the retrained guppy model?
Cheers,
F
I'm attempting to run medaka on an unpolished assembly and seem to have run into a problem. I don't receive any error message, and the tool is still "running" but doesn't seem to be doing anything. I've posted the last message medaka produced below. The timestamp is from over 12 hours ago.
[06:29:02 - Predict] Processing short regions
[06:29:02 - ModelLoad] Building model (steps, features, classes): (None, 10, 5)
[06:29:04 - ModelLoad] Loading weights from /local/ifs2_projdata/9043/projects/MGX/localenv/anaconda/envs/medaka/lib/python3.6/site-packages/medaka-0.6.0a2-py3.6-linux-x86_64.egg/medaka/data/medaka_model.hdf5
[06:29:05 - PWorker] Running inference for 0.0M draft bases.
[06:29:05 - Sampler] Initializing sampler for consensus or region Consensus_Consensus_Consensus_ctg865:0-7228.
[06:29:05 - Feature] Pileup-feature is zero-length for Consensus_Consensus_Consensus_ctg865:0-7228 indicating no reads in this region.
[06:29:05 - Sampler] Took 0.13s to make features.
I am trying to follow the Walkthrough on my machine to train a consensus network using the example data and commands given here: https://nanoporetech.github.io/medaka/
I am encountering the following error when I try to run 'hp_compress':
$ hp_compress features ${CALLS2DRAFT}.bam ${TRAINFEATURES} -T ${TRUTH2DRAFT}.bam -t ${NUM_THREADS} -r ${REFNAME}:-${TRAINEND} --batch_size ${BATCHSIZE} --read_fraction ${FRACTION} --chunk_len 1000 --chunk_ovlp 0 -m ${MODEL_FEAT_OPT} --max_label_len 1
Using TensorFlow backend.
{'consensus_as_ref': False, 'is_compressed': False, 'log_min': None, 'max_hp_len': 1, 'normalise': 'total', 'ref_mode': None, 'with_depth': False}
[17:00:31 - root] FeatureEncoder options:
consensus_as_ref: False
is_compressed: False
log_min: None
max_hp_len: 1
normalise: total
ref_mode: None
with_depth: False
[17:00:31 - root] Got regions:
utg000001c:0-3762624
[17:00:32 - root] Processed utg000001c:3620001.0-3630000.0 (median depth 13.0)
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:476: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
yield a[slicee]
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:481: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
yield a[slicee]
[17:00:33 - root] Processed utg000001c:40001.0-50000.0 (median depth 15.0)
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:476: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
yield a[slicee]
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:481: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
yield a[slicee]
[17:00:33 - root] Processed utg000001c:3160001.0-3170000.0 (median depth 18.0)
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:476: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
yield a[slicee]
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:481: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
yield a[slicee]
[17:00:34 - root] Processed utg000001c:2110001.0-2120000.0 (median depth 32.0)
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:476: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
yield a[slicee]
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:481: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
yield a[slicee]
[17:00:34 - root] Processed utg000001c:1150001.0-1160000.0 (median depth 40.0)
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:476: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
yield a[slicee]
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:481: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
yield a[slicee]
[17:00:34 - root] Processed utg000001c:3660001.0-3670000.1 (median depth 19.0)
[17:00:35 - root] Processed utg000001c:3730001.0-3740000.1 (median depth 45.0)
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:476: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
yield a[slicee]
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:481: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
yield a[slicee]
[17:00:35 - root] Processed utg000001c:380001.0-390000.0 (median depth 12.0)
[17:00:36 - root] Processed utg000001c:2440001.0-2450000.1 (median depth 43.0)
[17:00:36 - root] Processed utg000001c:220001.0-230000.1 (median depth 26.0)
[17:00:36 - root] Processed utg000001c:2230001.0-2240000.1 (median depth 73.0)
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:476: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
yield a[slicee]
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:481: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
yield a[slicee]
[17:00:37 - root] Processed utg000001c:320001.0-330000.0 (median depth 57.0)
[17:00:37 - root] Processed utg000001c:36198.0-40000.1 (median depth 47.0)
[17:00:38 - root] Processed utg000001c:1220001.0-1230000.0 (median depth 9.0)
[17:00:38 - root] Processed utg000001c:620001.0-630000.1 (median depth 48.0)
[17:00:38 - root] Processed utg000001c:2660001.0-2670000.1 (median depth 100.0)
[17:00:38 - root] Processed utg000001c:2420001.0-2428111.0 (median depth 28.0)
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:476: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
yield a[slicee]
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:481: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
yield a[slicee]
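The repeated FutureWarning above comes from `yield a[slicee]` in medaka's `common.py`, where a list of per-axis slices is used to index a NumPy array. As a minimal sketch of what the warning asks for, assuming `slicee` is a plain Python list of `slice` objects (the array and slice values here are hypothetical, not taken from medaka), converting the list to a tuple gives the same multidimensional selection without the deprecation:

```python
import numpy as np

# Hypothetical 3x4 array standing in for the data medaka slices.
a = np.arange(12).reshape(3, 4)

# Hypothetical per-axis slices, built as a list (the deprecated form).
slicee = [slice(0, 2), slice(1, 3)]

# a[slicee] is what triggers the FutureWarning; newer NumPy releases no
# longer accept a list here. Wrapping it in tuple() selects rows 0-1 and
# columns 1-2, which is the intended multidimensional slice.
sub = a[tuple(slicee)]
print(sub)
```

With a tuple, NumPy treats each element as the index for one axis, which is exactly the behaviour the list form used to get implicitly.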
[17:00:38 - root] Processed utg000001c:2428112.0-2430000.0 (median depth 8.0)
[17:00:39 - root] Processed utg000001c:1460001.0-1470000.1 (median depth 58.0)
[17:00:40 - root] Processed utg000001c:1600001.0-1610000.0 (median depth 28.0)
[17:00:41 - root] Processed utg000001c:3050001.0-3060000.2 (median depth 92.0)
[17:00:41 - root] Processed utg000001c:1800001.0-1810000.0 (median depth 18.0)
[17:00:42 - root] Processed utg000001c:3450001.0-3460000.1 (median depth 44.0)
[17:00:42 - root] Processed utg000001c:2980001.0-2990000.0 (median depth 55.0)
[17:00:42 - root] Processed utg000001c:3000001.0-3010000.0 (median depth 76.0)
[17:00:43 - root] Processed utg000001c:3330001.0-3340000.1 (median depth 80.0)
[17:00:43 - root] Processed utg000001c:1550001.0-1560000.0 (median depth 23.0)
[17:00:44 - root] Processed utg000001c:3520001.0-3530000.1 (median depth 82.0)
[17:00:44 - root] Processed utg000001c:3750001.0-3760000.0 (median depth 63.0)
[17:00:44 - root] Processed utg000001c:1610001.0-1620000.2 (median depth 32.0)
[17:00:45 - root] Processed utg000001c:3190001.0-3200000.0 (median depth 36.0)
[17:00:46 - root] Processed utg000001c:2450001.0-2460000.0 (median depth 42.0)
[17:00:46 - root] Processed utg000001c:480001.0-490000.1 (median depth 71.0)
[17:00:46 - root] Processed utg000001c:660001.0-670000.3 (median depth 58.0)
[17:00:47 - root] Processed utg000001c:1990001.0-2000000.0 (median depth 40.0)
[17:00:47 - root] Processed utg000001c:2460001.0-2470000.1 (median depth 38.0)
[17:00:48 - root] Processed utg000001c:3130001.0-3140000.1 (median depth 69.0)
[17:00:48 - root] Processed utg000001c:1330001.0-1331747.1 (median depth 78.0)
[17:00:48 - root] Processed utg000001c:300001.0-310000.0 (median depth 46.0)
[17:00:49 - root] Processed utg000001c:2280001.0-2290000.0 (median depth 29.0)
[17:00:49 - root] Processed utg000001c:1670001.0-1680000.0 (median depth 74.0)
[17:00:49 - root] Filtering removed all alignments of truth to ref from Region(ref_name='utg000001c', start=20000, end=30000).
[17:00:49 - root] Processed utg000001c:890001.0-900000.0 (median depth 25.0)
[17:00:51 - root] Processed utg000001c:60001.0-70000.0 (median depth 22.0)
[17:00:51 - root] Processed utg000001c:350001.0-360000.0 (median depth 35.0)
[17:00:51 - root] Processed utg000001c:3170001.0-3180000.1 (median depth 72.0)
[17:00:52 - root] Processed utg000001c:1331748.0-1340000.0 (median depth 75.0)
[17:00:52 - root] Processed utg000001c:3340001.0-3344122.2 (median depth 47.0)
[17:00:53 - root] Processed utg000001c:2590001.0-2600000.1 (median depth 93.0)
[17:00:53 - root] Processed utg000001c:1570001.0-1580000.1 (median depth 16.0)
[17:00:54 - root] Processed utg000001c:1940001.0-1950000.0 (median depth 88.0)
[17:00:54 - root] Processed utg000001c:1830035.0-1840000.0 (median depth 67.0)
[17:00:54 - root] Processed utg000001c:3460001.0-3470000.0 (median depth 41.0)
[17:00:55 - root] Processed utg000001c:610001.0-620000.0 (median depth 20.0)
[17:00:55 - root] Processed utg000001c:3344123.0-3350000.0 (median depth 47.0)
[17:00:55 - root] Processed utg000001c:1050001.0-1060000.2 (median depth 51.0)
[17:00:55 - root] Processed utg000001c:2970001.0-2980000.0 (median depth 85.0)
[17:00:56 - root] Processed utg000001c:3040001.0-3045139.0 (median depth 39.0)
[17:00:56 - root] Processed utg000001c:970001.0-980000.0 (median depth 6.0)
[17:00:57 - root] Processed utg000001c:2010001.0-2020000.1 (median depth 26.0)
[17:00:57 - root] Processed utg000001c:2840001.0-2845841.0 (median depth 45.0)
[17:00:57 - root] Processed utg000001c:720001.0-730000.1 (median depth 38.0)
[17:00:57 - root] Processed utg000001c:1680001.0-1690000.0 (median depth 45.0)
[17:00:58 - root] Processed utg000001c:2640001.0-2646413.0 (median depth 35.0)
[17:00:58 - root] Processed utg000001c:3045140.0-3050000.1 (median depth 54.0)
[17:00:58 - root] Processed utg000001c:2646414.0-2650000.0 (median depth 22.0)
[17:00:59 - root] Processed utg000001c:2845842.0-2850000.1 (median depth 60.0)
[17:00:59 - root] Processed utg000001c:1390001.0-1400000.0 (median depth 81.0)
[17:01:00 - root] Processed utg000001c:3010001.0-3020000.0 (median depth 67.0)
[17:01:00 - root] Processed utg000001c:450001.0-460000.0 (median depth 31.0)
[17:01:00 - root] Processed utg000001c:1560001.0-1570000.2 (median depth 51.0)
[17:01:01 - root] Processed utg000001c:3260001.0-3270000.0 (median depth 51.0)
[17:01:01 - root] Processed utg000001c:1030001.0-1032827.0 (median depth 14.0)
[17:01:01 - root] Processed utg000001c:2610001.0-2620000.0 (median depth 37.0)
[17:01:01 - root] Processed utg000001c:570001.0-580000.0 (median depth 68.0)
[17:01:01 - root] Processed utg000001c:1580001.0-1590000.4 (median depth 17.0)
[17:01:02 - root] Processed utg000001c:2700001.0-2710000.1 (median depth 45.0)
[17:01:03 - root] Processed utg000001c:1032828.0-1040000.4 (median depth 34.0)
[17:01:03 - root] Processed utg000001c:1540001.0-1550000.0 (median depth 18.0)
[17:01:03 - root] Processed utg000001c:290001.0-300000.0 (median depth 23.0)
[17:01:03 - root] Processed utg000001c:2690001.0-2700000.1 (median depth 76.0)
[17:01:05 - root] Processed utg000001c:3640001.0-3643166.0 (median depth 39.0)
[17:01:05 - root] Processed utg000001c:1280001.0-1290000.2 (median depth 58.0)
[17:01:05 - root] Processed utg000001c:1110001.0-1120000.0 (median depth 15.0)
[17:01:05 - root] Processed utg000001c:3110001.0-3120000.1 (median depth 43.0)
[17:01:05 - root] Processed utg000001c:990001.0-1000000.1 (median depth 68.0)
[17:01:05 - root] Processed utg000001c:3530001.0-3540000.0 (median depth 24.0)
[17:01:06 - root] Processed utg000001c:2000001.0-2010000.0 (median depth 45.0)
[17:01:06 - root] Processed utg000001c:460001.0-470000.0 (median depth 17.0)
[17:01:06 - root] Processed utg000001c:1650001.0-1660000.1 (median depth 14.0)
[17:01:06 - root] Processed utg000001c:2730001.0-2740000.0 (median depth 43.0)
[17:01:07 - root] Processed utg000001c:70001.0-80000.2 (median depth 14.0)
[17:01:07 - root] Processed utg000001c:3643167.0-3650000.1 (median depth 50.0)
[17:01:07 - root] Processed utg000001c:110001.0-120000.0 (median depth 33.0)
[17:01:08 - root] Processed utg000001c:2360001.0-2370000.0 (median depth 28.0)
[17:01:09 - root] Processed utg000001c:2120001.0-2129059.0 (median depth 43.0)
[17:01:10 - root] Processed utg000001c:180001.0-190000.1 (median depth 57.0)
[17:01:10 - root] Processed utg000001c:2370001.0-2380000.1 (median depth 73.0)
[17:01:10 - root] Processed utg000001c:2190001.0-2200000.0 (median depth 22.0)
[17:01:11 - root] Processed utg000001c:590001.0-600000.1 (median depth 14.0)
[17:01:11 - root] Processed utg000001c:560001.0-570000.2 (median depth 63.0)
[17:01:11 - root] Filtering removed all alignments of truth to ref from Region(ref_name='utg000001c', start=0, end=10000).
[17:01:11 - root] Processed utg000001c:1250001.0-1260000.0 (median depth 61.0)
[17:01:11 - root] Processed utg000001c:2040001.0-2050000.1 (median depth 18.0)
[17:01:12 - root] Processed utg000001c:2670001.0-2680000.0 (median depth 65.0)
[17:01:12 - root] Processed utg000001c:1200001.0-1210000.0 (median depth 11.0)
[17:01:12 - root] Processed utg000001c:2070001.0-2080000.0 (median depth 29.0)
[17:01:13 - root] Processed utg000001c:3410001.0-3420000.2 (median depth 82.0)
[17:01:13 - root] Filtering removed all alignments of truth to ref from Region(ref_name='utg000001c', start=2560000, end=2570000).
[17:01:14 - root] Processed utg000001c:50001.0-60000.1 (median depth 43.0)
[17:01:14 - root] Processed utg000001c:1410001.0-1420000.0 (median depth 27.0)
[17:01:14 - root] Filtering removed all alignments of truth to ref from Region(ref_name='utg000001c', start=10000, end=20000).
[17:01:15 - root] Processed utg000001c:260001.0-270000.2 (median depth 9.0)
[17:01:15 - root] Processed utg000001c:780001.0-790000.2 (median depth 53.0)
[17:01:16 - root] Processed utg000001c:740001.0-750000.3 (median depth 58.0)
[17:01:16 - root] Processed utg000001c:2860001.0-2870000.1 (median depth 82.0)
[17:01:16 - root] Processed utg000001c:2300001.0-2310000.0 (median depth 26.0)
[17:01:16 - root] Processed utg000001c:3140001.0-3144796.2 (median depth 83.0)
[17:01:17 - root] Processed utg000001c:1230001.0-1232100.1 (median depth 54.0)
[17:01:17 - root] Processed utg000001c:3490001.0-3500000.1 (median depth 77.0)
[17:01:17 - root] Processed utg000001c:540001.0-550000.0 (median depth 8.0)
[17:01:17 - root] Processed utg000001c:90001.0-100000.2 (median depth 77.0)
[17:01:18 - root] Processed utg000001c:3144797.0-3150000.1 (median depth 47.0)
[17:01:19 - root] Processed utg000001c:2490001.0-2500000.0 (median depth 33.0)
[17:01:19 - root] Processed utg000001c:3740001.0-3742853.1 (median depth 74.0)
[17:01:20 - root] Processed utg000001c:1730367.0-1740000.3 (median depth 67.0)
[17:01:20 - root] Processed utg000001c:1232101.0-1240000.0 (median depth 55.0)
[17:01:20 - root] Processed utg000001c:1340001.0-1350000.5 (median depth 58.0)
[17:01:20 - root] Processed utg000001c:680001.0-690000.1 (median depth 45.0)
[17:01:21 - root] Processed utg000001c:2630001.0-2640000.3 (median depth 54.0)
[17:01:21 - root] Processed utg000001c:1530001.0-1531081.2 (median depth 66.0)
[17:01:21 - root] Processed utg000001c:130001.0-135808.0 (median depth 72.0)
[17:01:22 - root] Processed utg000001c:270001.0-280000.2 (median depth 15.0)
[17:01:22 - root] Processed utg000001c:1531082.0-1540000.0 (median depth 11.0)
[17:01:22 - root] Processed utg000001c:520001.0-530000.0 (median depth 37.0)
[17:01:23 - root] Processed utg000001c:135809.0-140000.3 (median depth 37.0)
[17:01:23 - root] Processed utg000001c:3742855.0-3750000.2 (median depth 77.0)
[17:01:23 - root] Processed utg000001c:3550001.0-3560000.1 (median depth 15.0)
[17:01:24 - root] Processed utg000001c:3710001.0-3720000.3 (median depth 76.0)
[17:01:24 - root] Processed utg000001c:360001.0-370000.1 (median depth 68.0)
[17:01:25 - root] Processed utg000001c:950001.0-960000.0 (median depth 25.0)
[17:01:25 - root] Processed utg000001c:1500001.0-1510000.0 (median depth 19.0)
[17:01:25 - root] Processed utg000001c:1100001.0-1110000.0 (median depth 26.0)
[17:01:26 - root] Processed utg000001c:2790001.0-2800000.0 (median depth 54.0)
[17:01:26 - root] Processed utg000001c:700001.0-710000.1 (median depth 17.0)
[17:01:27 - root] Processed utg000001c:1130001.0-1132473.1 (median depth 70.0)
[17:01:27 - root] Processed utg000001c:2810001.0-2820000.0 (median depth 69.0)
[17:01:27 - root] Processed utg000001c:3630001.0-3640000.2 (median depth 99.0)
[17:01:28 - root] Processed utg000001c:3380001.0-3390000.0 (median depth 25.0)
[17:01:29 - root] Processed utg000001c:600001.0-610000.1 (median depth 59.0)
[17:01:29 - root] Processed utg000001c:2310001.0-2320000.0 (median depth 75.0)
[17:01:30 - root] Processed utg000001c:1132474.0-1140000.1 (median depth 57.0)
[17:01:30 - root] Processed utg000001c:3480001.0-3490000.2 (median depth 71.0)
[17:01:30 - root] Processed utg000001c:420001.0-430000.0 (median depth 31.0)
[17:01:30 - root] Processed utg000001c:940001.0-950000.1 (median depth 38.0)
[17:01:30 - root] Processed utg000001c:3430001.0-3440000.1 (median depth 71.0)
[17:01:31 - root] Processed utg000001c:1970001.0-1980000.0 (median depth 28.0)
[17:01:31 - root] Processed utg000001c:3650001.0-3660000.0 (median depth 10.0)
[17:01:32 - root] Processed utg000001c:1190001.0-1200000.1 (median depth 16.0)
[17:01:32 - root] Processed utg000001c:550001.0-560000.0 (median depth 23.0)
[17:01:32 - root] Processed utg000001c:1070001.0-1080000.1 (median depth 35.0)
[17:01:33 - root] Processed utg000001c:2500001.0-2510000.1 (median depth 18.0)
[17:01:33 - root] Processed utg000001c:2400001.0-2410000.0 (median depth 70.0)
[17:01:34 - root] Processed utg000001c:1910001.0-1920000.0 (median depth 64.0)
[17:01:35 - root] Processed utg000001c:3670001.0-3680000.5 (median depth 58.0)
[17:01:35 - root] Processed utg000001c:2600001.0-2610000.1 (median depth 36.0)
[17:01:35 - root] Processed utg000001c:1400001.0-1410000.3 (median depth 49.0)
[17:01:36 - root] Processed utg000001c:1180001.0-1190000.2 (median depth 68.0)
[17:01:36 - root] Processed utg000001c:3420001.0-3430000.0 (median depth 39.0)
[17:01:37 - root] Processed utg000001c:410001.0-420000.1 (median depth 26.0)
[17:01:38 - root] Processed utg000001c:3270001.0-3280000.0 (median depth 86.0)
[17:01:38 - root] Processed utg000001c:1820001.0-1830000.0 (median depth 74.0)
[17:01:38 - root] Processed utg000001c:830001.0-833529.0 (median depth 15.0)
[17:01:38 - root] Processed utg000001c:500001.0-510000.0 (median depth 44.0)
[17:01:39 - root] Processed utg000001c:2960001.0-2970000.2 (median depth 71.0)
[17:01:39 - root] Processed utg000001c:710001.0-720000.0 (median depth 9.0)
[17:01:39 - root] Processed utg000001c:1090001.0-1100000.1 (median depth 61.0)
[17:01:39 - root] Processed utg000001c:2210001.0-2220000.0 (median depth 49.0)
[17:01:40 - root] Processed utg000001c:2520001.0-2527743.2 (median depth 65.0)
[17:01:40 - root] Processed utg000001c:790001.0-800000.0 (median depth 9.0)
[17:01:40 - root] Processed utg000001c:2510001.0-2520000.0 (median depth 18.0)
[17:01:40 - root] Processed utg000001c:1470001.0-1480000.0 (median depth 35.0)
[17:01:41 - root] Processed utg000001c:833530.0-840000.2 (median depth 70.0)
[17:01:43 - root] Processed utg000001c:2581818.0-2590000.0 (median depth 43.0)
[17:01:44 - root] Processed utg000001c:3070001.0-3080000.1 (median depth 67.0)
[17:01:44 - root] Processed utg000001c:1310001.0-1320000.0 (median depth 74.0)
[17:01:44 - root] Processed utg000001c:840001.0-850000.3 (median depth 58.0)
[17:01:45 - root] Processed utg000001c:1270001.0-1280000.1 (median depth 65.0)
[17:01:45 - root] Processed utg000001c:3080001.0-3090000.1 (median depth 89.0)
[17:01:45 - root] Processed utg000001c:1740001.0-1750000.4 (median depth 77.0)
[17:01:46 - root] Processed utg000001c:310001.0-320000.0 (median depth 41.0)
[17:01:46 - root] Processed utg000001c:1210001.0-1220000.0 (median depth 18.0)
[17:01:46 - root] Processed utg000001c:1000001.0-1010000.0 (median depth 19.0)
[17:01:47 - root] Processed utg000001c:170001.0-180000.0 (median depth 27.0)
[17:01:47 - root] Processed utg000001c:2200001.0-2210000.4 (median depth 82.0)
[17:01:48 - root] Processed utg000001c:2100001.0-2110000.1 (median depth 28.0)
[17:01:48 - root] Processed utg000001c:3700001.0-3710000.1 (median depth 33.0)
[17:01:49 - root] Processed utg000001c:2910001.0-2920000.0 (median depth 39.0)
[17:01:49 - root] Processed utg000001c:2920001.0-2930000.1 (median depth 58.0)
[17:01:49 - root] Processed utg000001c:3470001.0-3480000.2 (median depth 70.0)
[17:01:50 - root] Processed utg000001c:1490001.0-1500000.1 (median depth 11.0)
[17:01:50 - root] Processed utg000001c:3300001.0-3310000.1 (median depth 48.0)
[17:01:50 - root] Processed utg000001c:1770001.0-1780000.0 (median depth 11.0)
[17:01:51 - root] Processed utg000001c:750001.0-760000.0 (median depth 10.0)
[17:01:51 - root] Processed utg000001c:1790001.0-1800000.2 (median depth 62.0)
[17:01:52 - root] Processed utg000001c:860001.0-870000.1 (median depth 55.0)
[17:01:52 - root] Processed utg000001c:2890001.0-2900000.2 (median depth 84.0)
[17:01:52 - root] Processed utg000001c:1860001.0-1870000.1 (median depth 56.0)
[17:01:53 - root] Processed utg000001c:2930001.0-2940000.0 (median depth 28.0)
[17:01:53 - root] Processed utg000001c:2900001.0-2910000.0 (median depth 56.0)
[17:01:55 - root] Processed utg000001c:1850001.0-1860000.1 (median depth 20.0)
[17:01:55 - root] Processed utg000001c:2180001.0-2190000.2 (median depth 52.0)
[17:01:55 - root] Processed utg000001c:2770001.0-2780000.0 (median depth 45.0)
[17:01:55 - root] Processed utg000001c:2150001.0-2160000.0 (median depth 62.0)
[17:01:55 - root] Filtering removed all alignments of truth to ref from Region(ref_name='utg000001c', start=2550000, end=2560000).
[17:01:55 - root] Processed utg000001c:2780001.0-2790000.1 (median depth 46.0)
[17:01:56 - root] Processed utg000001c:1320001.0-1330000.0 (median depth 39.0)
[17:01:56 - root] Processed utg000001c:1920001.0-1929745.0 (median depth 56.0)
[17:01:56 - root] Processed utg000001c:1170001.0-1180000.0 (median depth 57.0)
[17:01:56 - root] Processed utg000001c:3200001.0-3210000.0 (median depth 16.0)
[17:01:56 - root] Processed utg000001c:240001.0-250000.0 (median depth 21.0)
[17:01:57 - root] Processed utg000001c:2060001.0-2070000.2 (median depth 15.0)
[17:01:57 - root] Processed utg000001c:3390001.0-3400000.0 (median depth 13.0)
[17:01:58 - root] Processed utg000001c:630001.0-634255.1 (median depth 53.0)
[17:01:58 - root] Processed utg000001c:800001.0-810000.0 (median depth 53.0)
[17:01:59 - root] Processed utg000001c:3560001.0-3570000.0 (median depth 37.0)
[17:02:00 - root] Processed utg000001c:1840001.0-1850000.2 (median depth 61.0)
[17:02:00 - root] Processed utg000001c:3060001.0-3070000.0 (median depth 72.0)
[17:02:00 - root] Processed utg000001c:1380001.0-1390000.0 (median depth 17.0)
[17:02:00 - root] Processed utg000001c:2620001.0-2630000.4 (median depth 48.0)
[17:02:00 - root] Processed utg000001c:3570001.0-3580000.0 (median depth 43.0)
[17:02:00 - root] Filtering removed all alignments of truth to ref from Region(ref_name='utg000001c', start=2540000, end=2550000).
[17:02:01 - root] Processed utg000001c:1780001.0-1790000.0 (median depth 21.0)
[17:02:01 - root] Processed utg000001c:634256.0-640000.1 (median depth 69.0)
[17:02:01 - root] Processed utg000001c:510001.0-520000.0 (median depth 11.0)
[17:02:02 - root] Processed utg000001c:1660001.0-1670000.1 (median depth 62.0)
[17:02:02 - root] Processed utg000001c:3240001.0-3244456.2 (median depth 63.0)
[17:02:03 - root] Processed utg000001c:2320001.0-2328420.0 (median depth 59.0)
[17:02:03 - root] Processed utg000001c:3680001.0-3690000.0 (median depth 34.0)
[17:02:04 - root] Processed utg000001c:3244457.0-3250000.0 (median depth 29.0)
[17:02:04 - root] Processed utg000001c:2328421.0-2330000.2 (median depth 56.0)
[17:02:05 - root] Processed utg000001c:900001.0-910000.1 (median depth 54.0)
[17:02:05 - root] Processed utg000001c:650001.0-660000.1 (median depth 43.0)
[17:02:05 - root] Processed utg000001c:1710001.0-1720000.2 (median depth 67.0)
[17:02:05 - root] Processed utg000001c:3370001.0-3380000.2 (median depth 55.0)
[17:02:06 - root] Processed utg000001c:1960001.0-1970000.1 (median depth 88.0)
[17:02:07 - root] Processed utg000001c:3250001.0-3260000.0 (median depth 25.0)
[17:02:07 - root] Processed utg000001c:2470001.0-2480000.0 (median depth 11.0)
[17:02:07 - root] Processed utg000001c:160001.0-170000.0 (median depth 29.0)
[17:02:07 - root] Processed utg000001c:2850001.0-2860000.0 (median depth 49.0)
[17:02:08 - root] Processed utg000001c:3610001.0-3620000.0 (median depth 59.0)
[17:02:08 - root] Processed utg000001c:3580001.0-3590000.1 (median depth 15.0)
[17:02:09 - root] Processed utg000001c:820001.0-830000.0 (median depth 62.0)
[17:02:09 - root] Processed utg000001c:3280001.0-3290000.2 (median depth 65.0)
[17:02:10 - root] Processed utg000001c:1950001.0-1960000.0 (median depth 30.0)
[17:02:10 - root] Processed utg000001c:1510001.0-1520000.0 (median depth 18.0)
[17:02:10 - root] Processed utg000001c:3020001.0-3030000.0 (median depth 76.0)
[17:02:10 - root] Processed utg000001c:190001.0-200000.3 (median depth 47.0)
[17:02:10 - root] Processed utg000001c:980001.0-990000.1 (median depth 23.0)
[17:02:11 - root] Processed utg000001c:2350001.0-2360000.0 (median depth 18.0)
[17:02:11 - root] Processed utg000001c:1450001.0-1460000.0 (median depth 66.0)
[17:02:12 - root] Processed utg000001c:2260001.0-2270000.0 (median depth 26.0)
[17:02:12 - root] Processed utg000001c:2710001.0-2720000.0 (median depth 64.0)
[17:02:12 - root] Processed utg000001c:530001.0-534557.0 (median depth 57.0)
[17:02:12 - root] Processed utg000001c:3030001.0-3040000.0 (median depth 45.0)
[17:02:13 - root] Processed utg000001c:534558.0-540000.0 (median depth 10.0)
[17:02:13 - root] Processed utg000001c:2820001.0-2830000.1 (median depth 40.0)
[17:02:14 - root] Processed utg000001c:1360001.0-1370000.0 (median depth 24.0)
[17:02:14 - root] Processed utg000001c:80001.0-90000.0 (median depth 39.0)
[17:02:15 - root] Processed utg000001c:1140001.0-1150000.1 (median depth 52.0)
[17:02:16 - root] Processed utg000001c:1480001.0-1490000.1 (median depth 75.0)
[17:02:16 - root] Processed utg000001c:1370001.0-1380000.2 (median depth 45.0)
[17:02:16 - root] Processed utg000001c:2680001.0-2690000.1 (median depth 57.0)
[17:02:16 - root] Processed utg000001c:580001.0-590000.0 (median depth 31.0)
[17:02:16 - root] Processed utg000001c:2390001.0-2400000.1 (median depth 41.0)
[17:02:17 - root] Processed utg000001c:1870001.0-1880000.0 (median depth 32.0)
[17:02:17 - root] Processed utg000001c:440001.0-450000.0 (median depth 20.0)
[17:02:17 - root] Processed utg000001c:2330001.0-2340000.0 (median depth 68.0)
[17:02:18 - root] Processed utg000001c:2760001.0-2770000.0 (median depth 10.0)
[17:02:18 - root] Processed utg000001c:2250001.0-2260000.0 (median depth 15.0)
[17:02:19 - root] Processed utg000001c:2140001.0-2150000.0 (median depth 12.0)
[17:02:19 - root] Processed utg000001c:470001.0-480000.0 (median depth 39.0)
[17:02:19 - root] Processed utg000001c:1880001.0-1890000.0 (median depth 35.0)
[17:02:20 - root] Processed utg000001c:880001.0-890000.0 (median depth 15.0)
[17:02:21 - root] Processed utg000001c:1690001.0-1700000.2 (median depth 21.0)
[17:02:21 - root] Processed utg000001c:3230001.0-3240000.0 (median depth 68.0)
[17:02:21 - root] Processed utg000001c:3540001.0-3543504.1 (median depth 36.0)
[17:02:22 - root] Processed utg000001c:1440001.0-1450000.2 (median depth 75.0)
[17:02:22 - root] Processed utg000001c:3760001.0-3762623.1 (median depth 20.0)
[17:02:22 - root] Processed utg000001c:1890001.0-1900000.2 (median depth 81.0)
[17:02:23 - root] Processed utg000001c:1620001.0-1630000.1 (median depth 53.0)
[17:02:23 - root] Processed utg000001c:3180001.0-3190000.1 (median depth 72.0)
[17:02:24 - root] Processed utg000001c:1420001.0-1430000.0 (median depth 92.0)
[17:02:24 - root] Processed utg000001c:2430001.0-2440000.0 (median depth 36.0)
[17:02:24 - root] Processed utg000001c:3543505.0-3550000.1 (median depth 69.0)
[17:02:25 - root] Processed utg000001c:2750001.0-2760000.4 (median depth 50.0)
[17:02:25 - root] Processed utg000001c:850001.0-860000.0 (median depth 69.0)
[17:02:25 - root] Processed utg000001c:910001.0-920000.0 (median depth 41.0)
[17:02:27 - root] Processed utg000001c:370001.0-380000.0 (median depth 13.0)
[17:02:27 - root] Processed utg000001c:1010001.0-1020000.0 (median depth 19.0)
[17:02:27 - root] Processed utg000001c:1930001.0-1940000.1 (median depth 61.0)
[17:02:27 - root] Processed utg000001c:2160001.0-2170000.0 (median depth 40.0)
[17:02:28 - root] Processed utg000001c:2080001.0-2090000.0 (median depth 55.0)
[17:02:29 - root] Processed utg000001c:920001.0-930000.0 (median depth 21.0)
[17:02:29 - root] Processed utg000001c:1630747.0-1640000.1 (median depth 64.0)
[17:02:29 - root] Processed utg000001c:330001.0-335222.0 (median depth 45.0)
[17:02:29 - root] Processed utg000001c:640001.0-650000.0 (median depth 28.0)
[17:02:29 - root] Processed utg000001c:335223.0-340000.0 (median depth 7.0)
[17:02:30 - root] Processed utg000001c:1300001.0-1310000.1 (median depth 102.0)
[17:02:30 - root] Processed utg000001c:150001.0-160000.0 (median depth 20.0)
[17:02:31 - root] Processed utg000001c:3320001.0-3330000.1 (median depth 79.0)
[17:02:31 - root] Processed utg000001c:3590001.0-3600000.0 (median depth 33.0)
[17:02:31 - root] Processed utg000001c:1020001.0-1030000.0 (median depth 21.0)
[17:02:31 - root] Processed utg000001c:2480001.0-2490000.1 (median depth 26.0)
[17:02:32 - root] Processed utg000001c:2340001.0-2350000.1 (median depth 75.0)
[17:02:33 - root] Processed utg000001c:140001.0-150000.0 (median depth 4.0)
[17:02:33 - root] Processed utg000001c:390001.0-400000.2 (median depth 43.0)
[17:02:34 - root] Processed utg000001c:1290001.0-1300000.1 (median depth 72.0)
[17:02:34 - root] Processed utg000001c:1040001.0-1050000.1 (median depth 33.0)
[17:02:35 - root] Processed utg000001c:2880001.0-2890000.1 (median depth 21.0)
[17:02:35 - root] Processed utg000001c:3210001.0-3220000.1 (median depth 60.0)
[17:02:35 - root] Processed utg000001c:2220001.0-2228768.0 (median depth 14.0)
[17:02:35 - root] Processed utg000001c:690001.0-700000.1 (median depth 59.0)
[17:02:35 - root] Processed utg000001c:2228769.0-2230000.1 (median depth 38.0)
[17:02:36 - root] Processed utg000001c:3360001.0-3370000.1 (median depth 76.0)
[17:02:37 - root] Processed utg000001c:960001.0-970000.1 (median depth 58.0)
[17:02:37 - root] Filtering removed all alignments of truth to ref from Region(ref_name='utg000001c', start=2530000, end=2540000).
[17:02:37 - root] Processed utg000001c:2130001.0-2140000.0 (median depth 94.0)
[17:02:38 - root] Processed utg000001c:870001.0-880000.2 (median depth 68.0)
[17:02:39 - root] Processed utg000001c:2950001.0-2960000.0 (median depth 45.0)
[17:02:39 - root] Processed utg000001c:1060001.0-1070000.0 (median depth 46.0)
[17:02:40 - root] Processed utg000001c:2940001.0-2945532.2 (median depth 78.0)
[17:02:40 - root] Processed utg000001c:670001.0-680000.0 (median depth 23.0)
[17:02:40 - root] Processed utg000001c:1520001.0-1530000.0 (median depth 21.0)
[17:02:41 - root] Filtering removed all alignments of truth to ref from Region(ref_name='utg000001c', start=2570000, end=2580000).
[17:02:41 - root] Processed utg000001c:1240001.0-1250000.1 (median depth 88.0)
[17:02:41 - root] Processed utg000001c:3500001.0-3510000.1 (median depth 97.0)
[17:02:41 - root] Processed utg000001c:1590001.0-1600000.2 (median depth 102.0)
[17:02:42 - root] Processed utg000001c:2945533.0-2950000.0 (median depth 44.0)
[17:02:42 - root] Processed utg000001c:1980001.0-1990000.2 (median depth 75.0)
[17:02:42 - root] Processed utg000001c:930001.0-933194.0 (median depth 17.0)
[17:02:43 - root] Processed utg000001c:3600001.0-3610000.0 (median depth 25.0)
[17:02:43 - root] Processed utg000001c:2800001.0-2810000.0 (median depth 5.0)
[17:02:43 - root] Processed utg000001c:2090001.0-2100000.0 (median depth 23.0)
[17:02:43 - root] Processed utg000001c:933195.0-940000.3 (median depth 25.0)
[17:02:44 - root] Processed utg000001c:430001.0-434861.0 (median depth 59.0)
[17:02:44 - root] Processed utg000001c:770001.0-780000.0 (median depth 64.0)
[17:02:44 - root] Processed utg000001c:434862.0-440000.0 (median depth 6.0)
[17:02:44 - root] Processed utg000001c:3150001.0-3160000.1 (median depth 54.0)
[17:02:45 - root] Processed utg000001c:3290001.0-3300000.2 (median depth 44.0)
[17:02:46 - root] Processed utg000001c:2870001.0-2880000.0 (median depth 26.0)
[17:02:46 - root] Processed utg000001c:2020001.0-2029389.0 (median depth 16.0)
[17:02:46 - root] Processed utg000001c:200001.0-210000.2 (median depth 35.0)
[17:02:46 - root] Processed utg000001c:3120001.0-3130000.0 (median depth 17.0)
[17:02:47 - root] Processed utg000001c:2270001.0-2280000.1 (median depth 9.0)
[17:02:47 - root] Processed utg000001c:1120001.0-1130000.0 (median depth 55.0)
[17:02:47 - root] Processed utg000001c:490001.0-500000.0 (median depth 21.0)
[17:02:48 - root] Processed utg000001c:3690001.0-3700000.0 (median depth 55.0)
[17:02:48 - root] Processed utg000001c:1700001.0-1710000.1 (median depth 84.0)
[17:02:48 - root] Processed utg000001c:3720001.0-3730000.0 (median depth 53.0)
[17:02:48 - root] Processed utg000001c:230001.0-235528.0 (median depth 50.0)
[17:02:50 - root] Processed utg000001c:235529.0-240000.0 (median depth 37.0)
[17:02:50 - root] Processed utg000001c:730001.0-733907.0 (median depth 55.0)
[17:02:50 - root] Processed utg000001c:733908.0-740000.0 (median depth 10.0)
[17:02:50 - root] Processed utg000001c:280001.0-290000.0 (median depth 27.0)
[17:02:50 - root] Processed utg000001c:1720001.0-1730000.0 (median depth 26.0)
[17:02:51 - root] Processed utg000001c:1080001.0-1090000.2 (median depth 20.0)
[17:02:52 - root] Processed utg000001c:2740001.0-2746118.0 (median depth 22.0)
[17:02:52 - root] Processed utg000001c:3400001.0-3410000.0 (median depth 70.0)
[17:02:52 - root] Processed utg000001c:2170001.0-2180000.0 (median depth 81.0)
[17:02:53 - root] Processed utg000001c:1350001.0-1360000.0 (median depth 100.0)
[17:02:53 - root] Processed utg000001c:3510001.0-3520000.0 (median depth 13.0)
[17:02:53 - root] Processed utg000001c:1430001.0-1431388.6 (median depth 20.0)
[17:02:53 - root] Processed utg000001c:2240001.0-2250000.1 (median depth 90.0)
[17:02:54 - root] Processed utg000001c:2746119.0-2750000.1 (median depth 64.0)
[17:02:54 - root] Processed utg000001c:760001.0-770000.0 (median depth 22.0)
[17:02:54 - root] Processed utg000001c:3310001.0-3320000.1 (median depth 51.0)
[17:02:54 - root] Processed utg000001c:1810001.0-1820000.0 (median depth 23.0)
[17:02:55 - root] Processed utg000001c:2290001.0-2300000.1 (median depth 22.0)
[17:02:55 - root] Processed utg000001c:1431389.0-1440000.3 (median depth 31.0)
[17:02:56 - root] Processed utg000001c:2380001.0-2390000.1 (median depth 14.0)
[17:02:56 - root] Processed utg000001c:1160001.0-1170000.0 (median depth 25.0)
[17:02:57 - root] Processed utg000001c:3440001.0-3443822.2 (median depth 62.0)
[17:02:57 - root] Processed utg000001c:3090001.0-3100000.0 (median depth 44.0)
[17:02:57 - root] Processed utg000001c:2650001.0-2660000.0 (median depth 109.0)
[17:02:58 - root] Processed utg000001c:3443823.0-3450000.0 (median depth 12.0)
[17:02:58 - root] Processed utg000001c:810001.0-820000.1 (median depth 36.0)
[17:02:59 - root] Processed utg000001c:2030001.0-2040000.0 (median depth 60.0)
[17:02:59 - root] Processed utg000001c:120001.0-130000.0 (median depth 15.0)
[17:02:59 - root] Processed utg000001c:2990001.0-3000000.1 (median depth 81.0)
[17:03:00 - root] Processed utg000001c:1900001.0-1910000.2 (median depth 74.0)
[17:03:00 - root] Processed utg000001c:100001.0-110000.0 (median depth 19.0)
[17:03:00 - root] Processed utg000001c:2410001.0-2420000.0 (median depth 58.0)
[17:03:00 - root] Processed utg000001c:1640001.0-1650000.0 (median depth 15.0)
[17:03:01 - root] Processed utg000001c:1760001.0-1770000.0 (median depth 38.0)
[17:03:02 - root] Processed utg000001c:2050001.0-2060000.0 (median depth 12.0)
[17:03:02 - root] Processed utg000001c:3100001.0-3110000.0 (median depth 44.0)
[17:03:03 - root] Processed utg000001c:210001.0-220000.4 (median depth 75.0)
[17:03:03 - root] Processed utg000001c:400001.0-410000.1 (median depth 80.0)
[17:03:04 - root] Processed utg000001c:250001.0-260000.0 (median depth 64.0)
[17:03:05 - root] Processed utg000001c:3350001.0-3360000.0 (median depth 18.0)
[17:03:06 - root] Processed utg000001c:1260001.0-1270000.0 (median depth 75.0)
[17:03:06 - root] Processed utg000001c:3220001.0-3230000.0 (median depth 82.0)
[17:03:06 - root] Processed utg000001c:2720001.0-2730000.0 (median depth 58.0)
[17:03:06 - root] Processed utg000001c:2830001.0-2840000.0 (median depth 64.0)
[17:03:07 - root] Processed utg000001c:1750001.0-1760000.1 (median depth 79.0)
[17:03:07 - root] Processed utg000001c:340001.0-350000.0 (median depth 68.0)
Traceback (most recent call last):
File "/home/ziels/virtual-envs/medaka/bin/hp_compress", line 11, in <module>
sys.exit(main())
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/compress.py", line 650, in main
args.func(args)
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/compress.py", line 615, in choose_feature_func
training_batches(args)
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/compress.py", line 514, in training_batches
write_yaml_data(fname, to_save)
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py", line 263, in write_yaml_data
hdf[group] = yaml.dump(d)
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/yaml/__init__.py", line 217, in dump
return dump_all([data], stream, Dumper=Dumper, **kwds)
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/yaml/__init__.py", line 196, in dump_all
dumper.represent(data)
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/yaml/representer.py", line 26, in represent
node = self.represent_data(data)
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/yaml/representer.py", line 57, in represent_data
node = self.yaml_representers[None](self, data)
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/yaml/representer.py", line 229, in represent_undefined
raise RepresenterError("cannot represent an object", data)
yaml.representer.RepresenterError: ('cannot represent an object', Counter({4: 2748771, 1: 969842, 2: 954640, 0: 936965, 3: 933782}))
I am able to run medaka_consensus with the walkthrough dataset, so I believe that medaka is installed correctly. Do you have any ideas what could be causing the YAML-associated error above?
Thanks!
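The traceback ends in PyYAML's represent_undefined with the Counter itself as the data, which means the dumper used by `yaml.dump(d)` in medaka's write_yaml_data has no representer registered for collections.Counter (a dict subclass). One plausible workaround, assuming you can patch what gets passed to the dumper locally, is to convert such objects to plain built-in types first. This is a minimal sketch, not medaka's fix; `to_plain` and the `label_counts` key are hypothetical.

```python
from collections import Counter
from collections.abc import Mapping


def to_plain(obj):
    """Recursively turn dict subclasses (e.g. Counter) into plain dicts and
    tuples into lists, so a dumper that only knows built-in types can cope.
    Keys are left untouched so that they stay hashable."""
    if isinstance(obj, Mapping):
        return {k: to_plain(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(v) for v in obj]
    return obj


# Hypothetical data shaped like the Counter in the traceback above.
to_save = {"label_counts": Counter({4: 2748771, 1: 969842, 0: 936965})}
plain = to_plain(to_save)
print(type(plain["label_counts"]).__name__)  # dict
```

The converted structure can then be handed to `yaml.dump` or `yaml.safe_dump` in place of the original.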
Hi, I installed medaka into /python/Python-3.6.3/bin using:
pip install medaka
I wanted to use medaka_variant to call variants, but I didn't find it in the installed directory:
ls medaka*
medaka medaka_consensus medaka_counts medaka_data_path medaka_version_report
Did I miss something?
Thanks in advance!
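A quick way to confirm which medaka entry points the active environment actually exposes is to look them up on PATH. This minimal sketch only reports what is installed; it does not tell you whether a given pip release is supposed to ship medaka_variant. The `EXPECTED` list and `find_tools` helper are illustrative.

```python
import shutil

# Names from the directory listing above, plus the script that seems missing.
EXPECTED = ["medaka", "medaka_consensus", "medaka_variant"]


def find_tools(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(EXPECTED).items():
        print("{}: {}".format(name, path or "NOT FOUND on PATH"))
```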
Dear developer,
I'm using medaka train with the epochs option set to 10, but I noticed that the software never finishes: it has been doing nothing for a day. Additionally, the medaka fix command does not exist.
Cheers
Luigi
Hi,
I am trying to use a specified model by following the commands in the Walkthrough. In the last step, the medaka_consensus command failed with the following error message:
RuntimeError: Filepath for '--model' argument does not exist and is not a known model ID (training/model.best.val.hdf5)
Failed to run medaka consensus.
To use the model, run medaka_consensus as for the default model (specifying the model using the -m option):
cd ${WALKTHROUGH}
source ${MEDAKA}
CONSENSUS=consensus_trained
MODEL=${TRAINNAME}/model.best.val.hdf5
medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${CONSENSUS} -t ${NPROC} -m ${MODEL}
Full message below:
(medaka) [JH@bip7 medaka_walkthrough]$ cd ${WALKTHROUGH}
(medaka) [JH@bip7 medaka_walkthrough]$ source ${MEDAKA}
(medaka) [JH@bip7 medaka_walkthrough]$ CONSENSUS=consensus_trained
(medaka) [JH@bip7 medaka_walkthrough]$ MODEL=${TRAINNAME}/model.best.val.hdf5
(medaka) [JH@bip7 medaka_walkthrough]$ medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${CONSENSUS} -t ${NPROC} -m ${MODEL}
Checking program versions
Program Version Required Pass
bgzip 1.9 1.9 True
minimap2 2.11 2.11 True
samtools 1.9 1.9 True
tabix 1.9 1.9 True
Warning: Output consensus_trained already exists, may use old results.
Not aligning basecalls to draft, calls_to_draft.bam exists.
Running medaka consensus
Traceback (most recent call last):
File "/b_disk/JH/medaka_walkthrough/medaka/venv/lib/python3.6/site-packages/medaka-0.6.2-py3.6-linux-x86_64.egg/medaka/medaka.py", line 37, in __call__
val = model_dict[val]
KeyError: 'training/model.best.val.hdf5'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/b_disk/JH/medaka_walkthrough/medaka/venv/bin/medaka", line 11, in <module>
load_entry_point('medaka==0.6.2', 'console_scripts', 'medaka')()
File "/b_disk/JH/medaka_walkthrough/medaka/venv/lib/python3.6/site-packages/medaka-0.6.2-py3.6-linux-x86_64.egg/medaka/medaka.py", line 249, in main
args = parser.parse_args()
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1730, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1762, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1950, in _parse_known_args
positionals_end_index = consume_positionals(start_index)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1927, in consume_positionals
take_action(action, args)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1836, in take_action
action(self, namespace, argument_values, option_string)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1133, in __call__
subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1762, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1968, in _parse_known_args
start_index = consume_optional(start_index)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1908, in consume_optional
take_action(action, args, option_string)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1836, in take_action
action(self, namespace, argument_values, option_string)
File "/b_disk/JH/medaka_walkthrough/medaka/venv/lib/python3.6/site-packages/medaka-0.6.2-py3.6-linux-x86_64.egg/medaka/medaka.py", line 41, in __call__
self.dest, val)
RuntimeError: Filepath for '--model' argument does not exist and is not a known model ID (training/model.best.val.hdf5)
Failed to run medaka consensus.
So, as a test, I tried other model files under medaka/medaka/data/.
With r941_flip235_model.hdf5 and r941_trans_model.hdf5 the run finishes successfully:
medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${CONSENSUS} -t ${NPROC} -m medaka/medaka/data/r941_flip235_model.hdf5
medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${CONSENSUS} -t ${NPROC} -m medaka/medaka/data/r941_trans_model.hdf5
With r941_213_model.hdf5, the same error message appears:
medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o training_213 -t ${NPROC} -m medaka/medaka/data/r941_213_model.hdf5
Error message:
RuntimeError: Filepath for '--model' argument does not exist and is not a known model ID (/b_disk/JH/medaka_walkthrough/medaka/medaka/data/r941_213_model.hdf5)
Failed to run medaka consensus.
I want to use my own data to train a model, but following the instructions in the Walkthrough gives the error above. How can I solve this problem?
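Judging only from the traceback (a `model_dict[val]` lookup raising KeyError, followed by the RuntimeError), the `--model` value seems to be accepted either as an existing file path or as a known model ID. The sketch below is a hypothetical reconstruction of that check, not medaka's actual code; `resolve_model` is an illustrative name. The practical point it shows is that a relative path such as `training/model.best.val.hdf5` is checked against the current working directory, so the file must exist there (i.e. training must have written it) when medaka_consensus runs.

```python
import os


def resolve_model(value, model_dict):
    """Hypothetical reconstruction inferred from the traceback above:
    an existing path is used as-is, otherwise value is treated as a model ID."""
    if os.path.exists(value):
        return value
    try:
        return model_dict[value]
    except KeyError:
        raise RuntimeError(
            "Filepath for '--model' argument does not exist and is not a "
            "known model ID ({})".format(value)
        )


if __name__ == "__main__":
    candidate = "training/model.best.val.hdf5"
    # Relative paths are resolved against the current working directory.
    print(os.path.abspath(candidate), os.path.exists(candidate))
```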
While following the instructions, I also found a small problem: there is no --max_label_len parameter in medaka features, but the training instructions use it. Removing --max_label_len from the command should not affect the final result.
cd ${WALKTHROUGH}
source ${MEDAKA}
REFNAME=utg000001c
TRAINEND=3762624
TRAINFEATURES=train_features.hdf
FRACTION="0.1 1"
BATCHSIZE=200
MODEL_FEAT_OPT=medaka/medaka/data/medaka_model.hdf5
medaka features ${CALLS2DRAFT}.bam ${TRAINFEATURES} --truth ${TRUTH2DRAFT}.bam --threads ${NPROC} --region ${REFNAME}:-${TRAINEND} --batch_size ${BATCHSIZE} --read_fraction ${FRACTION} --chunk_len 1000 --chunk_ovlp 0 --model ${MODEL_FEAT_OPT} --max_label_len 1
Thank you.
Commit 2f896a7
I am getting the following error while polishing a 550Mb genome with about 60x coverage. The assembly has 1169 contigs with an N50 of about 3Mb. My gut feeling is that loading of the feature.hdf fails silently: the mapping BAM file is 65G, but the resulting feature.hdf is only 104Mb.
[17:14:55 - medaka.compress] Skipping sample contig_1034:1.0-1072.0 which has 1435 columns < min 10000.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/lib/python3.5/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/compress.py", line 247, in bam_to_sample
logging.info('Processed {} (median depth {})'.format(encode_sample_name(sample), np.median(depth_array)))
File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/common.py", line 171, in encode_sample_name
p['major'][0] + 1, p['minor'][0],
IndexError: index 0 is out of bounds for axis 0 with size 0
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/bin/hp_compress", line 11, in <module>
load_entry_point('medaka==0.3.0', 'console_scripts', 'hp_compress')()
File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/compress.py", line 650, in main
args.func(args)
File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/compress.py", line 617, in choose_feature_func
features(args)
File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/compress.py", line 404, in features
overlap=args.chunk_ovlp)
File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/common.py", line 215, in write_samples_to_hdf
for s in samples:
File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/compress.py", line 330, in alphabet_filter
for s in sample_gen:
File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/common.py", line 514, in <genexpr>
return (c for s in samples for c in chunk_sample(s, chunk_len=chunk_len, overlap=overlap))
File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/compress.py", line 345, in min_positions_filter
for s in sample_gen:
File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/compress.py", line 398, in <genexpr>
samples = (s for g in sample_gens for s in g)
File "/lib/python3.5/multiprocessing/pool.py", line 731, in next
raise value
IndexError: index 0 is out of bounds for axis 0 with size 0
Medaka started with the following after mapping:
Using TensorFlow backend.
{'max_hp_len': 1, 'is_compressed': False, 'ref_mode': None, 'with_depth': False, 'consensus_as_ref': False, 'log_min': None, 'normalise': 'total'}
[17:11:09 - root] FeatureEncoder options:
max_hp_len: 1
is_compressed: False
ref_mode: None
with_depth: False
consensus_as_ref: False
log_min: None
normalise: total
[17:11:09 - root] Creating consensus features.
Any ideas where to start debugging?
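The IndexError is raised by `p['major'][0]` inside encode_sample_name, which means a sample with zero positions reached the logging call; one guess (unconfirmed) is a region with no usable alignments. The sketch below mirrors the name format seen in the log (`contig_1034:1.0-1072.0`, 1-based major position) and skips empty samples instead of crashing. It uses plain `(major, minor)` tuples rather than medaka's numpy arrays, and the end-position format and both helper names are assumptions.

```python
import logging


def encode_sample_name(ref_name, positions):
    """Format 'ref:start.minor-end.minor' with a 1-based major position.

    positions is a list of (major, minor) tuples with a 0-based major.
    Returns None for an empty sample instead of raising IndexError.
    """
    if not positions:
        return None
    (s_maj, s_min), (e_maj, e_min) = positions[0], positions[-1]
    return "{}:{}.{}-{}.{}".format(ref_name, s_maj + 1, s_min, e_maj + 1, e_min)


def nonempty_samples(samples):
    """Yield (ref_name, positions) pairs, logging and skipping empty ones."""
    for ref_name, positions in samples:
        if not positions:
            logging.warning("Skipping empty sample on %s", ref_name)
            continue
        yield ref_name, positions
```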
After polishing with medaka, it appears that I have fewer sequences than I put in. I assume that medaka drops some of them for some reason, e.g. because they do not get polished.
Any chance that you could include an option to keep all sequences, similar to the "--include-unpolished" option in racon?
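Until such an option exists, a post-processing step can append any draft contig that is missing from the polished FASTA. This minimal sketch assumes polished records keep the same names as the draft contigs (check this for your medaka version); `read_fasta` and `merge_unpolished` are hypothetical helpers, not part of medaka or racon.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of name -> sequence (insertion-ordered).
    The name is the first whitespace-separated word of the header."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def merge_unpolished(draft_text, polished_text):
    """Return FASTA text with every polished record, followed by any draft
    record whose name does not appear in the polished output."""
    merged = read_fasta(polished_text)
    for name, seq in read_fasta(draft_text).items():
        merged.setdefault(name, seq)
    return "".join(">{}\n{}\n".format(n, s) for n, s in merged.items())
```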
Dear medaka developers,
I ran medaka to polish a 2.7 Gb genome with about 20x nanopore data, after Racon polishing.
Stitching has now been running for 1.5 days with 32 threads. Is this an expected runtime? It seems very slow for the last step.
I am a bit worried, as I have larger datasets to polish in the future: a 3 Gb genome with about 60x nanopore data.
Could you tell me how to improve medaka's performance to speed up the process?
Racon presents another major runtime bottleneck, but I can parallelize it by chopping up the genome. Does something similar work for medaka?
As medaka creates the alignments itself, I am not sure whether a single minimap2 BAM could be fed into separate directories for subsequences of the genome.
Any tips would be appreciated,
Michel
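One common way to split work like this is to keep a single whole-genome BAM and hand each job a list of region strings. Elsewhere in this thread, medaka_variant is run in 1Mb windows with -r and the stitch code takes a `regions` argument, but whether your version of `medaka consensus` accepts the same region syntax, and how the per-job outputs are merged, is an assumption to verify. The sketch reads contig lengths from a samtools `.fai` index and produces non-overlapping 1-based `name:start-end` windows, dealt round-robin across jobs; all helper names are hypothetical.

```python
def parse_fai(text):
    """Read contig lengths from a samtools .fai index (name<TAB>length<TAB>...)."""
    lengths = {}
    for line in text.splitlines():
        if line.strip():
            fields = line.split("\t")
            lengths[fields[0]] = int(fields[1])
    return lengths


def region_windows(contig_lengths, window=1_000_000):
    """Split each contig into non-overlapping, 1-based inclusive windows."""
    regions = []
    for name, length in contig_lengths.items():
        for start in range(1, length + 1, window):
            end = min(start + window - 1, length)
            regions.append("{}:{}-{}".format(name, start, end))
    return regions


def round_robin(items, n_jobs):
    """Deal items into n_jobs lists so each job gets a similar count."""
    return [items[i::n_jobs] for i in range(n_jobs)]
```

Round-robin dealing balances the number of windows per job; since almost every window is the same size, that roughly balances the work too.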
Hello,
We are getting an error at the stitch.py step of medaka 0.7.0 when trying to run on the CHM13 sample.
These are the two commands we are running:
medaka consensus \
--model r941_flip213 \
--threads 64 \
<path_to>/CHM13.shasta.racon4x.hg38_chrX.bam \
<path_to>/CHM13_medaka_consensus_prob.hdf 2>&1 | tee <path_to>/consensus.log
Which ends successfully. And we run:
medaka stitch \
<path_to>/CHM13_medaka_consensus_prob.hdf \
<path_to>/CHM13_shasta_racon_medaka_consensus.fasta
The Log:
[14:52:39 - DataIndex] Loaded sample-index from 1/1 (100.00%) of feature files.
[14:52:40 - Stitch] Processing 1022.
[14:52:40 - Stitch] Processing 1096.
[14:52:40 - Stitch] Processing 138.
[14:52:47 - Stitch] Processing 156.
[14:53:12 - Stitch] Processing 1562.
[14:53:12 - Stitch] Processing 1564.
[14:53:12 - Stitch] Processing 1622.
[14:53:13 - Stitch] Processing 164.
[14:54:24 - Stitch] Processing 180.
[14:54:39 - Stitch] Processing 280.
[14:54:41 - Stitch] Processing 342.
[14:54:41 - Stitch] Processing 358.
[14:54:41 - Stitch] Processing 36.
[14:57:42 - Stitch] Processing 360.
[14:57:43 - Stitch] Processing 40.
[15:01:08 - Stitch] Processing 44.
Traceback (most recent call last):
File "/home/kishwar/software/medaka/venv/bin/medaka", line 11, in <module>
load_entry_point('medaka==0.7.0', 'console_scripts', 'medaka')()
File "/home/kishwar/software/medaka/venv/lib/python3.6/site-packages/medaka-0.7.0-py3.6-linux-x86_64.egg/medaka/medaka.py", line 350, in main
args.func(args)
File "/home/kishwar/software/medaka/venv/lib/python3.6/site-packages/medaka-0.7.0-py3.6-linux-x86_64.egg/medaka/stitch.py", line 85, in stitch
joined = stitch_from_probs(args.inputs, regions=args.regions)
File "/home/kishwar/software/medaka/venv/lib/python3.6/site-packages/medaka-0.7.0-py3.6-linux-x86_64.egg/medaka/stitch.py", line 66, in stitch_from_probs
end_1_ind, start_2_ind = medaka.common.Sample.overlap_indices(s1, s2)
File "/home/kishwar/software/medaka/venv/lib/python3.6/site-packages/medaka-0.7.0-py3.6-linux-x86_64.egg/medaka/common.py", line 256, in overlap_indices
raise OverlapException(msg.format(s1.name, s2.name, repr(rel)))
medaka.common.OverlapException: Cannot overlap samples 44:54257964.0-54264156.0 and 44:54264162.0-54265439.6 with relationhip <Relationship.forward_gapped: 's2 follows s1 with a gab inbetween.'>
Things we have tried:
I have made these two files available for you if you want to look into this.
wget https://storage.googleapis.com/kishwar-helen/medaka_error_issue/CHM13.shasta.racon4x.hg38_chrX.bam
wget https://storage.googleapis.com/kishwar-helen/medaka_error_issue/CHM13.shasta.racon_4x.hg38_chrX.fa
Please let me know if you can help in this regard.
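The exception says the first sample ends at 54264156.0 and the next starts at 54264162.0, so reference positions 54264157 to 54264161 (five positions) are not covered by any sample, and stitch_from_probs refuses to join across that gap. The sketch below classifies two sample names the same way, to help locate such gaps before stitching. Only `forward_gapped` is taken from the error message; the other member names, the enum values, and the adjacency rule are hypothetical simplifications.

```python
from enum import Enum


class Relationship(Enum):
    # forward_gapped comes from the error message; other names are illustrative.
    overlapping = "s2 starts at or before the end of s1"
    forward_abutted = "s2 starts immediately after s1"
    forward_gapped = "s2 follows s1 with a gap in between"


def parse_position(text):
    """'54264156.0' -> (54264156, 0), i.e. (major, minor)."""
    major, minor = text.split(".")
    return int(major), int(minor)


def parse_name(name):
    """'44:54257964.0-54264156.0' -> ('44', (54257964, 0), (54264156, 0))."""
    ref, span = name.rsplit(":", 1)
    start, end = span.split("-")
    return ref, parse_position(start), parse_position(end)


def relationship(name1, name2):
    ref1, _, end1 = parse_name(name1)
    ref2, start2, _ = parse_name(name2)
    if ref1 != ref2:
        raise ValueError("samples lie on different references")
    if start2 <= end1:
        return Relationship.overlapping
    # Adjacent: next insertion column, or the next major position at minor 0.
    if start2 in ((end1[0], end1[1] + 1), (end1[0] + 1, 0)):
        return Relationship.forward_abutted
    return Relationship.forward_gapped
```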
I am using a Conda env.
Traceback (most recent call last):
File "medaka.py", line 38, in __call__
val = model_dict[val]
KeyError: 'r94'
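The KeyError comes from the `model_dict[val]` lookup, so `r94` is not a model ID in the installed version; IDs used elsewhere in this thread look like `r941_flip213`. A small helper can list candidate IDs that start with what was typed. The list of known IDs below is taken from names appearing in this thread and is only an assumption; the real set depends on your medaka version.

```python
def suggest_models(value, known_ids):
    """Return known model IDs starting with value; if none match, return all."""
    matches = [m for m in known_ids if m.startswith(value)]
    return sorted(matches) if matches else sorted(known_ids)


# Hypothetical ID list, assembled from names mentioned in this thread.
KNOWN = ["r941_flip213", "r941_flip235", "r941_trans"]

if __name__ == "__main__":
    print(suggest_models("r94", KNOWN))
```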
Hello,
I am having trouble installing Medaka. I tried conda, pip (as recommended), and building from source, but the following errors occur:
Nadas-MBP:~ nadakubikova$ conda install -c bioconda medaka
Solving environment: failed
PackagesNotFoundError: The following packages are not available from current channels:
Current channels:
To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.
Nadas-MBP:~ nadakubikova$ pip install medaka
Collecting medaka
Could not find a version that satisfies the requirement medaka (from versions: )
No matching distribution found for medaka
Nadas-MBP:~ nadakubikova$ virtualenv medaka --python=python3 --prompt "(medaka) "
Running virtualenv with interpreter /Users/nadakubikova/anaconda3/bin/python3
Using base prefix '/Users/nadakubikova/anaconda3'
/Library/Python/2.7/site-packages/virtualenv.py:1041: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
Overwriting /Users/nadakubikova/medaka/lib/python3.7/orig-prefix.txt with new content
New python executable in /Users/nadakubikova/medaka/bin/python3
Not overwriting existing python script /Users/nadakubikova/medaka/bin/python (you must use /Users/nadakubikova/medaka/bin/python3)
Installing setuptools, pip, wheel...done.
Nadas-MBP:~ nadakubikova$ . medaka/bin/activate
(medaka) Nadas-MBP:~ nadakubikova$ pip install medaka
Collecting medaka
Could not find a version that satisfies the requirement medaka (from versions: )
No matching distribution found for medaka
Both conda and pip are up to date. Any thoughts?
Thanks,
Nada
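An empty `(from versions: )` means pip found no medaka distribution it could install for this interpreter and platform; the virtualenv output shows Python 3.7 from Anaconda on macOS. Whether medaka published packages for that combination at the time is not something this thread confirms. The sketch below just collects the interpreter details pip uses when choosing a distribution, which is useful to include in a report; `interpreter_report` is a hypothetical helper.

```python
import platform
import struct
import sys


def interpreter_report():
    """Collect interpreter details relevant to pip's compatibility checks:
    Python version, implementation, OS, machine architecture, pointer size."""
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
        "bits": struct.calcsize("P") * 8,
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print("{}: {}".format(key, value))
```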
A medaka_variant process running under version v0.7.0-alpha.1 just crashed; the error message is below.
Let me know if there are any (intermediate) files you would like. This is NA19240 at 20x coverage, full genome. I will restart it, but parallelized in windows of 1Mb using the -r option, so I'm not sure whether I'll reproduce this issue. I don't know if this bug is still present in the most recent version, so feel free to close this if you believe it is already solved.
======================================
Running medaka variant with threshold 1
======================================
[09:50:44 - DataIndex] Loaded sample-index from 1/1 (100.00%) of feature files.
Traceback (most recent call last):
File "/home/wdecoster/anaconda3/envs/medaka/bin/medaka", line 10, in <module>
sys.exit(main())
File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/medaka.py", line 350, in main
args.func(args)
File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/variant.py", line 460, in variants_from_hdf
decoder = decoder_cls(index.meta, ref_vcf=args.ref_vcf)
File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/variant.py", line 160, in __init__
self.feature_row_names = [fmt_feat(x) for x in meta['medaka_feature_decoding']]
File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/variant.py", line 160, in <listcomp>
self.feature_row_names = [fmt_feat(x) for x in meta['medaka_feature_decoding']]
File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/variant.py", line 159, in <lambda>
fmt_feat = lambda x: '{}{}{}'.format(x[0], 'rev' if x[1] else 'fwd', x[3] * (x[2] if x[2] is not None else '-'))
IndexError: tuple index out of range
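The failing lambda on line 159 of variant.py indexes `x[3]`, so at least one entry of `meta['medaka_feature_decoding']` has fewer than four fields. One possible cause (unconfirmed) is that the consensus hdf was written with a feature encoding different from the one the variant decoder expects. This sketch reproduces the lambda's formatting as a function and fails with a clearer message on short entries; it is illustrative, not medaka's code.

```python
def fmt_feat(x):
    """Format one feature-decoding entry like the lambda in the traceback:
    first field, strand ('rev'/'fwd'), then the base (or '-') repeated x[3] times.
    Raises a descriptive ValueError when the entry has fewer than four fields."""
    if len(x) < 4:
        raise ValueError(
            "feature decoding entry {!r} has {} fields, expected 4; the hdf may "
            "have been written with a different feature encoding".format(x, len(x))
        )
    base = x[2] if x[2] is not None else "-"
    return "{}{}{}".format(x[0], "rev" if x[1] else "fwd", x[3] * base)
```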
By default, TensorFlow allocates all available GPUs on the machine but computes only on the first one, so the remaining GPUs sit idle.
Maybe allow specifying the GPU device, like guppy's -x "cuda:0"?
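Until medaka has such an option, the standard CUDA_VISIBLE_DEVICES environment variable limits which GPUs CUDA-based libraries such as TensorFlow can see (in a shell: `CUDA_VISIBLE_DEVICES=1 medaka_consensus ...`). A minimal sketch for launching from Python; `env_for_gpu` and `run_on_gpu` are hypothetical helpers.

```python
import os
import subprocess


def env_for_gpu(device_ids):
    """Copy the current environment, exposing only the given GPU indices
    to CUDA-based libraries via CUDA_VISIBLE_DEVICES."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in device_ids)
    return env


def run_on_gpu(cmd, device_ids):
    """Run a command (list of arguments) restricted to the given GPUs."""
    return subprocess.run(cmd, env=env_for_gpu(device_ids), check=True)
```

For example, `run_on_gpu(["medaka_consensus", "-i", "basecalls.fa", "-d", "draft.fa", "-o", "out"], [1])` would start a run that only sees the second GPU; the file names here are placeholders.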
I am using medaka_variant for PromethION data of a human genome and notice that after the medaka_consensus step the program hangs. It has reached 100%, but hasn't moved on to the next stage (calling medaka snp). Is this the same issue as #42, or should I just be more patient?
medaka consensus is now using one process and about 50GB of RAM (according to htop), and hasn't modified its hdf file round_0_hap_mixed_probs.hdf in the last 4 days.