Sequence correction provided by ONT Research

Home Page: https://nanoporetech.com

License: Mozilla Public License 2.0

medaka's Introduction

Oxford Nanopore Technologies logo

Medaka

install with bioconda

medaka is a tool to create consensus sequences and variant calls from nanopore sequencing data. This task is performed using neural networks applied to a pileup of individual sequencing reads against a draft assembly. It provides state-of-the-art results, outperforming sequence-graph based methods and signal-based methods, whilst also being faster.

© 2018- Oxford Nanopore Technologies Ltd.

Features

  • Requires only basecalled data (.fasta or .fastq).
  • Improved accuracy over graph-based methods (e.g. Racon).
  • 50X faster than Nanopolish (and can run on GPUs).
  • Includes extras for implementing and training bespoke correction networks.
  • Works on Linux and MacOS.
  • Open source (Mozilla Public License 2.0).

For creating draft assemblies we recommend Flye.

Installation

Medaka can be installed in one of several ways.

Installation with pip

Official binary releases of medaka are available on PyPI and can be installed using pip:

pip install medaka

On Linux platforms this will install a precompiled binary; on MacOS (and other) platforms it will fetch and compile a source distribution.

We recommend using medaka within a virtual environment, viz.:

virtualenv medaka --python=python3 --prompt "(medaka) "
. medaka/bin/activate
pip install --upgrade pip
pip install medaka

Using this method requires the user to provide several binaries and place them within the PATH:

  • samtools, bgzip and tabix
  • minimap2

samtools/bgzip/tabix version 1.14 and minimap2 version 2.17 are recommended, as these are the versions used in the development of medaka. (Newer versions are almost certainly fine.)
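As a quick sanity check that these are visible, something like the following can be run (plain shell, no medaka-specific machinery):

# report any of the required tools missing from PATH
for tool in samtools bgzip tabix minimap2; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool not found in PATH"
done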

Installation with conda

The bioconda medaka packages are no longer supported by Oxford Nanopore Technologies.

For those who prefer the conda package manager, medaka nevertheless remains available via the bioconda channel:

conda create -n medaka -c conda-forge -c bioconda medaka
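followed by activating the new environment before use:

conda activate medaka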

The bioconda releases lag behind the source code and PyPI releases.

Installation from source

This method is useful for macOS M1 devices, as it will assist in building dependencies which would fail with the other methods above.

Medaka can be installed from its source quite easily on most systems.

Before installing medaka it may be required to install some prerequisite libraries, best installed by a package manager. On Ubuntu these are:

bzip2 g++ zlib1g-dev libbz2-dev liblzma-dev libffi-dev libncurses5-dev
libcurl4-gnutls-dev libssl-dev curl make cmake wget python3-all-dev
python-virtualenv

In addition, it is required to install and set up git LFS before cloning the repository.

A Makefile is provided to fetch, compile and install all direct dependencies into a Python virtual environment. To set up the environment run:

# Note: certain files are stored in git-lfs, https://git-lfs.github.com/,
#       which must therefore be installed first.
git clone https://github.com/nanoporetech/medaka.git
cd medaka
make install
. ./venv/bin/activate

Using this method, both samtools and minimap2 are built from source and need not be provided by the user.

Using a GPU

Since version 1.1.0 medaka uses Tensorflow 2; prior versions used Tensorflow 1. For medaka 1.1.0 and higher, installation from source or using pip can make immediate use of GPUs. However, note that the tensorflow package is compiled against specific versions of the NVIDIA CUDA and cuDNN libraries; users are directed to the tensorflow installation pages for further information. cuDNN can be obtained from the cuDNN Archive, whilst CUDA can be obtained from the CUDA Toolkit Archive.
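A quick way to confirm that the installed tensorflow build can see a GPU is the following check using the standard Tensorflow 2 API (output varies by system):

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"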

For medaka prior to version 1.1.0, it is necessary to install the tensorflow-gpu package to enable the use of GPU resources. Using the source code from GitHub, a working GPU-powered medaka can be configured with:

# Note: certain files are stored in git-lfs, https://git-lfs.github.com/,
#       which must therefore be installed first.
git clone https://github.com/nanoporetech/medaka.git
cd medaka
sed -i 's/tensorflow/tensorflow-gpu/' requirements.txt
make install

GPU Usage notes

Depending on your GPU, medaka may show out-of-memory errors when running. To avoid these, the inference batch size can be reduced from the default value by setting the -b option when running medaka_consensus. A value of -b 100 is suitable for 11 GB GPUs.
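For example, a run of the whole workflow with a reduced batch size might look like the following (file names illustrative):

medaka_consensus -i basecalls.fastq -d draft_assm.fasta -o medaka_consensus \
    -t 8 -m r941_min_high_g303 -b 100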

For users with RTX-series GPUs it may additionally be required to set an environment variable to have medaka run without failure:

export TF_FORCE_GPU_ALLOW_GROWTH=true

In this situation a further reduction in batch size may be required.

Using Docker

The source code repository contains a Dockerfile which can be used to create a GPU-compatible Docker container image with the appropriate CUDA and cuDNN library versions for running medaka. The image is built on top of images provided by NVIDIA, designed to run with the NVIDIA Container Toolkit. With the toolkit set up on your host computer, the following command can be used to run the latest version of medaka:

docker run --rm --gpus 0 ontresearch/medaka:latest medaka --help

(The --gpus option can be amended as appropriate for your environment). Versioned tags are also available.
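For example, to pin the container to a specific device, the standard Docker device syntax can be used:

docker run --rm --gpus '"device=1"' ontresearch/medaka:latest medaka --help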

Usage

medaka can be run using its default settings through the medaka_consensus program. An assembly in .fasta format and basecalls in .fasta or .fastq format are required. The program uses both samtools and minimap2. If medaka has been installed using the from-source method these will be present within the medaka environment; otherwise they will need to be provided by the user.

source ${MEDAKA}  # i.e. medaka/venv/bin/activate
NPROC=$(nproc)
BASECALLS=basecalls.fa
DRAFT=draft_assm/assm_final.fa
OUTDIR=medaka_consensus
medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${OUTDIR} -t ${NPROC} -m r941_min_high_g303

The variables BASECALLS, DRAFT, and OUTDIR in the above should be set appropriately. For the selection of the model (-m r941_min_high_g303 in the example above) see the Model section following.

When medaka_consensus has finished running, the consensus will be saved to ${OUTDIR}/consensus.fasta.

Bacterial (ploidy-1) variant calling

Variant calling for monoploid samples is enabled through the medaka_haploid_variant workflow:

medaka_haploid_variant -i <reads.fastq> -r <ref.fasta>

which requires the reads as a .fasta or .fastq and a reference sequence as a .fasta file.
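A fuller invocation might look like the following sketch, assuming the -o (output directory), -t (threads) and -m (model) options mirror those of medaka_consensus:

medaka_haploid_variant -i basecalls.fastq -r reference.fasta \
    -o medaka_variant -t 8 -m r941_min_high_g303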

Diploid variant calling

The diploid variant calling workflow medaka_variant that was historically implemented within the medaka package has been surpassed in accuracy and compute performance by other methods; it has therefore been deprecated. Our current recommendation for performing this task is to use Clair3, either directly or through the Oxford Nanopore Technologies provided Nextflow implementation available through EPI2ME Labs.

Models

For best results it is important to specify the correct model, -m in the above, according to the basecaller used. Allowed values can be found by running medaka tools list_models.
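For example:

medaka tools list_models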

Recent basecallers

Recent basecaller versions annotate their output with their model version. In such cases medaka can inspect the files and attempt to select an appropriate model for itself. This typically works best in the case of BAM output from basecallers. It will also work for FASTQ input, provided the FASTQ has been created from basecaller output using:

samtools fastq -T '*' dorado.bam | gzip -c > dorado.fastq.gz

The command medaka consensus will attempt to automatically determine a correct model by inspecting its BAM input file. The helper scripts medaka_consensus and medaka_haploid_variant will make similar attempts from their FASTQ input.
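For example, with input prepared as above, the -m option can simply be omitted and the helper script left to resolve the model (a sketch; file names illustrative):

medaka_consensus -i dorado.fastq.gz -d draft_assm.fasta -o medaka_consensus -t 8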

To inspect files for yourself, the command:

medaka tools resolve_model --auto_model <consensus/variant> <input.bam/input.fastq>

will print the model that automatic model selection will use.

For older basecallers and when automatic selection is unsuccessful

If the name of the basecaller model used is known, but has been lost from the input files, the basecaller model can be provided to medaka directly. It must however be appended with either :consensus or :variant according to whether the user wishes to use the consensus or variant calling medaka model. For example:

medaka consensus input.bam output.hdf \
    --model [email protected]:variant

will use the medaka variant calling model appropriate for use with the basecaller model named [email protected].

Medaka models are named to indicate i) the pore type, ii) the sequencing device (MinION or PromethION), iii) the basecaller variant, and iv) the basecaller version, with the format:

{pore}_{device}_{caller variant}_{caller version}

For example the model named r941_min_fast_g303 should be used with data from MinION (or GridION) R9.4.1 flowcells using the fast Guppy basecaller version 3.0.3. By contrast the model r941_prom_hac_g303 should be used with PromethION data and the high accuracy basecaller (termed "hac" in Guppy configuration files). Where a version of Guppy has been used without an exactly corresponding medaka model, the medaka model with the highest version equal to or less than the Guppy version should be selected.

Improving parallelism

The medaka_consensus program is good for simple datasets, but is perhaps not optimal for running large datasets at scale. A higher level of parallelism can be achieved by running the component steps of medaka_consensus independently. The program performs three tasks:

  1. alignment of reads to input assembly (via mini_align which is a thin veil over minimap2)
  2. running of consensus algorithm across assembly regions (medaka consensus, note no underscore!)
  3. aggregation of the results of 2. to create consensus sequences (medaka stitch)

The three steps are discrete, and can be split apart and run independently. In most cases, Step 2. is the bottleneck and can be trivially parallelized. The medaka consensus program can be supplied a --regions argument which will restrict its action to particular assembly sequences from the .bam file output in Step 1. Therefore individual jobs can be run for batches of assembly sequences simultaneously. In the final step, medaka stitch can take as input one or more of the .hdf files output by Step 2.

So in summary something like this is possible:

# align reads to assembly
mini_align -i basecalls.fasta -r assembly.fasta -P -m \
    -p calls_to_draft.bam -t <threads>
# run lots of jobs like this, change model as appropriate
mkdir results
medaka consensus calls_to_draft.bam results/contigs1-4.hdf \
    --model r941_min_fast_g303 --batch 200 --threads 8 \
    --region contig1 contig2 contig3 contig4
...
# wait for jobs, then collate results
medaka stitch results/*.hdf polished.assembly.fasta

It is not recommended to specify a value of --threads greater than 2 for medaka consensus, since the compute scaling efficiency is poor beyond this. Note also that medaka consensus may be seen to use resources equivalent to <threads> + 4, as an additional 4 threads are used for reading and preparing input data.
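As a concrete sketch of the batching described above (how contigs are grouped and jobs are submitted will depend on your compute environment; names here are illustrative):

# index the assembly to obtain contig names
samtools faidx assembly.fasta
mkdir -p results
i=0
# group contigs into batches of four and launch one medaka job per batch
while read -r batch; do
    i=$((i + 1))
    # ${batch} is unquoted so each contig name becomes a separate --region argument
    medaka consensus calls_to_draft.bam results/batch${i}.hdf \
        --model r941_min_fast_g303 --batch 200 --threads 2 \
        --region ${batch} &
done < <(cut -f1 assembly.fasta.fai | paste - - - -)
wait
# collate the per-batch results into the polished assembly
medaka stitch results/*.hdf polished.assembly.fasta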

Origin of the draft sequence

Medaka has been trained to correct draft sequences output from the Flye assembler.

Processing a draft sequence from alternative sources (e.g. the output of canu or wtdbg2) may lead to different results.

Historical correction models in medaka were trained to correct draft sequences output from the canu assembler with racon applied either once, or four times iteratively. For contemporary models this is not the case and medaka should be used directly on the output of Flye.

Acknowledgements

We thank Joanna Pineda and Jared Simpson for providing htslib code samples which greatly aided development of the optimised feature generation code, and for testing the version 0.4 release candidates.

We thank Devin Drown for working through use of medaka with his RTX 2080 GPU.

Help

Licence and Copyright

© 2018- Oxford Nanopore Technologies Ltd.

medaka is distributed under the terms of the Mozilla Public License 2.0.

Research Release

Research releases are provided as technology demonstrators to provide early access to features or stimulate Community development of tools. Support for this software will be minimal and is only provided directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. However, much as we would like to rectify every issue and piece of feedback users may have, the developers may have limited resources for support of this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.

medaka's People

Contributors

aineniamh asaont ccoulombe cjw85 dbrocklebank ftostevin-ont halfphoton hb-nanopore jchorl julibeg mwykes philres samstudio8 sarahjeeeze sry002 tkonopka

medaka's Issues

Processing Short Regions Apparent Error

I'm attempting to run medaka on an unpolished assembly and I seem to have encountered an error. I don't receive any error message and the tool is currently "running" but doesn't seem to be doing anything. I've posted the last message medaka created below. The time stamp is from over 12 hours ago.

[06:29:02 - Predict] Processing short regions
[06:29:02 - ModelLoad] Building model (steps, features, classes): (None, 10, 5)
[06:29:04 - ModelLoad] Loading weights from /local/ifs2_projdata/9043/projects/MGX/localenv/anaconda/envs/medaka/lib/python3.6/site-packages/medaka-0.6.0a2-py3.6-linux-x86_64.egg/medaka/data/medaka_model.hdf5
[06:29:05 - PWorker] Running inference for 0.0M draft bases.
[06:29:05 - Sampler] Initializing sampler for consensus or region Consensus_Consensus_Consensus_ctg865:0-7228.
[06:29:05 - Feature] Pileup-feature is zero-length for Consensus_Consensus_Consensus_ctg865:0-7228 indicating no reads in this region.
[06:29:05 - Sampler] Took 0.13s to make features.

Advice for medaka_consensus speed increases for metagenomes

Hello,

Thank you for the recent update! The speed does seem to have improved, and I understand the program does take time; however, I am wondering if there is anything I can do to process my metagenomes faster. There is a 24hr limit on our HPC cluster, and the processes aren't finishing before they are killed. It still seems to be the short region processing slowing things down, and only one thread is being used even though I specified to use all the available threads (24) on a ~700 GB RAM node. Is it possible for me to make use of more threads at this point?

The command I am currently using is:

medaka_consensus -i $NPFASTQ -d draft_raconNP.fa -o medaka -t 24 -m r941_flip213

And the timing:

[14:52:29 - Predict] Processing 28934 long region(s) with batching.
...
[19:37:00 - Predict] Processing 11274 short region(s).
...
[14:11:07 - Sampler] Pileup for tig00053666:0.0-7357.0 is of width 8279 ### short read processing still going the next day

I am testing it on a draft assembly (metagenome) that is 700 Mbp in size (all contigs >4 kbp, mean length 24kb), but this is our smallest assembly - the others are between 1-3 Gbp. Is there something I can do, such as in issue #39, like subsetting the assemblies, and recombining the split assembly at the end?

If the process is killed due to the 24 hr limit, can medaka pick up where it left off with the short reads if I restart the command?

Thank you for your help!

IndexError: index 0 is out of bounds for axis 0 with size 0

Commit 2f896a7

I am getting the following error while polishing a 550 Mb genome with about 60x coverage. The assembly has 1169 contigs with an N50 of about 3 Mb. My gut feeling is that the loading of the feature.hdf fails silently. The mapping bam file is 65 GB and the resulting feature.hdf only 104 MB.

[17:14:55 - medaka.compress] Skipping sample contig_1034:1.0-1072.0 which has 1435 columns < min 10000.
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/compress.py", line 247, in bam_to_sample
    logging.info('Processed {} (median depth {})'.format(encode_sample_name(sample), np.median(depth_array)))
  File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/common.py", line 171, in encode_sample_name
    p['major'][0] + 1, p['minor'][0],
IndexError: index 0 is out of bounds for axis 0 with size 0
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/bin/hp_compress", line 11, in <module>
    load_entry_point('medaka==0.3.0', 'console_scripts', 'hp_compress')()
  File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/compress.py", line 650, in main
    args.func(args)
  File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/compress.py", line 617, in choose_feature_func
    features(args)
  File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/compress.py", line 404, in features
    overlap=args.chunk_ovlp)
  File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/common.py", line 215, in write_samples_to_hdf
    for s in samples:
  File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/compress.py", line 330, in alphabet_filter
    for s in sample_gen:
  File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/common.py", line 514, in <genexpr>
    return (c for s in samples for c in chunk_sample(s, chunk_len=chunk_len, overlap=overlap))
  File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/compress.py", line 345, in min_positions_filter
    for s in sample_gen:
  File "/lib/python3.5/site-packages/medaka-0.3.0-py3.5.egg/medaka/compress.py", line 398, in <genexpr>
    samples = (s for g in sample_gens for s in g)
  File "/lib/python3.5/multiprocessing/pool.py", line 731, in next
    raise value
IndexError: index 0 is out of bounds for axis 0 with size 0

Medaka started with the following after mapping:

Using TensorFlow backend.
{'max_hp_len': 1, 'is_compressed': False, 'ref_mode': None, 'with_depth': False, 'consensus_as_ref': False, 'log_min': None, 'normalise': 'total'}
[17:11:09 - root] FeatureEncoder options: 
max_hp_len: 1
is_compressed: False
ref_mode: None
with_depth: False
consensus_as_ref: False
log_min: None
normalise: total
[17:11:09 - root] Creating consensus features.

Any ideas where to start debugging?

stuck in a loop

Hi, I have been running medaka on a GPU machine, but it looks like the consensus calling gets stuck in a loop after a while.

[23:00:04 - PWorker] All done, 0 remainder regions.
[23:00:04 - PWorker] Running inference for 0.0M draft bases.
[23:00:04 - Sampler] Initializing sampler for consensus of region 4684:48-3642.
[23:00:04 - Feature] Processed 4684:48.0-3641.0 (median depth 4.0)
[23:00:04 - Sampler] Took 0.13s to make features.
[23:00:04 - Sampler] Pileup for 4684:48.0-3641.0 is of width 4324
[23:02:19 - PWorker] All done, 0 remainder regions.
[23:02:19 - PWorker] Running inference for 0.0M draft bases.
[23:02:19 - Sampler] Initializing sampler for consensus of region 6506:749-8932.
[23:02:19 - Feature] Processed 6506:749.0-8931.0 (median depth 1.0)
[23:02:19 - Sampler] Took 0.13s to make features.
[23:02:19 - Sampler] Pileup for 6506:749.0-8931.0 is of width 8875
[23:04:34 - PWorker] All done, 0 remainder regions.
[23:04:34 - PWorker] Running inference for 0.0M draft bases.
[23:04:34 - Sampler] Initializing sampler for consensus of region 2556:0-7245.
[23:04:34 - Feature] Processed 2556:0.0-7244.0 (median depth 3.0)
[23:04:34 - Sampler] Took 0.12s to make features.
[23:04:34 - Sampler] Pileup for 2556:0.0-7244.0 is of width 8008

Also, you guys reported that polishing a human genome takes ~5h. I have tried almost all possible parameters to make things finish under 10h, but the stitch itself takes more than 5 hours. Am I missing something here?

make install fails if -j is too big

During execution of make install, if the additional argument -j is added to make (like make -j 4 install) the build fails because of:

/usr/bin/ld: htslib-1.9/libhts.a(hfile_s3.o): in function `s3_sign':           
hfile_s3.c:(.text+0x306): undefined reference to `EVP_sha1'                       
/usr/bin/ld: hfile_s3.c:(.text+0x32a): undefined reference to `HMAC'                                
collect2: error: ld returned 1 exit status                               
make[1]: *** [Makefile:144: samtools] Error 1

Everything works when -j is no greater than 2.

medaka_variant hanging

I am using medaka_variant for PromethION data of a human genome and notice that after the medaka_consensus step the program starts hanging. It has reached 100%, but hasn't moved to the next stage (calling medaka snp). Is this the same issue as #42? Or should I just be more patient?
medaka consensus is now using one process, about 50GB of RAM (according to htop) and hasn't edited its hdf file round_0_hap_mixed_probs.hdf in the last 4 days.

Error with v1.4.3.

Hi,

I am getting the following error with v1.4.3. Also, to use the flip-flop basecaller from Guppy, the model should be called using -m r941_flip, not -m r94_flip as suggested.

medaka_consensus -i Run12_all_guppy_v2.2.2.fastq -d Run12_guppy_contigs.fasta -o Run12
Aligning basecalls to draft
Found minimap files.
open: No such file or directory
[bam_sort_core] fail to open file calls_to_draft.bam
[M::main::0.003*1.04] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*1.02] mid_occ = 3
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.003*1.15] distinct minimizers: 1620 (99.94% are singletons); average occurrences: 1.001; average spacing: 5.290
[samopen] SAM header is present: 1 sequences.
open: No such file or directory
[bam_index_build2] fail to open the BAM file.
Running medaka consensus
Using TensorFlow backend.
[E::hts_open_format] Failed to open file calls_to_draft.bam
Traceback (most recent call last):
  File "/home/dct7/medaka/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "/home/dct7/medaka/lib/python3.6/site-packages/medaka/medaka.py", line 170, in main
    args.func(args)
  File "/home/dct7/medaka/lib/python3.6/site-packages/medaka/inference.py", line 381, in predict
    args.regions = get_regions(args.bam, region_strs=args.regions)
  File "/home/dct7/medaka/lib/python3.6/site-packages/medaka/common.py", line 218, in get_regions
    with pysam.AlignmentFile(bam) as bam_fh:
  File "pysam/libcalignmentfile.pyx", line 736, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 935, in pysam.libcalignmentfile.AlignmentFile._open
FileNotFoundError: [Errno 2] could not open alignment file calls_to_draft.bam: No such file or directory

Any suggestions?

Thanks,
Damien

Error in stitch.py?

  • Tested medaka (v0.6.0a3) on chr3 of the NA12878 sample downloaded from the following link:
    https://github.com/nanopore-wgs-consortium/NA12878/blob/master/nanopore-human-genome

  • cmd:
    medaka_variant -r GRCh38_full_analysis_set_plus_decoy_hla.fa -b rel5-guppy-0.3.0-chunk10k.sorted.bam -m r94 -R chr3

  • Errors:
    [13:59:00 - PWorker] 100.0% Done (198.5/198.5 Mbases) in 4397.1s
    [13:59:06 - PWorker] All done
    [13:59:06 - Predict] Finished processing all regions.
    [13:59:21 - DataIndex] Loaded sample-index from 1/1 (100.00%) of feature files.
    [13:59:25 - SNPs] Processing chr3.
    Traceback (most recent call last):
    File "python/Python-3.6.3/bin/medaka", line 11, in
    sys.exit(main())
    File "python/Python-3.6.3/lib/python3.6/site-packages/medaka/medaka.py", line 257, in main
    args.func(args)
    File "python/Python-3.6.3/lib/python3.6/site-packages/medaka/stitch.py", line 347, in snps
    find_snps(args.inputs, args.ref_fasta, args.output, regions=args.regions, threshold=args.threshold, ref_vcf=args.ref_vcf)
    File "python/Python-3.6.3/lib/python3.6/site-packages/medaka/stitch.py", line 178, in find_snps
    ref_seq_encoded = np.fromiter((label_encoding[ref_seq[i]] for i in major_pos), int, count=len(major_pos))
    File "python/Python-3.6.3/lib/python3.6/site-packages/medaka/stitch.py", line 178, in
    ref_seq_encoded = np.fromiter((label_encoding[ref_seq[i]] for i in major_pos), int, count=len(major_pos))
    KeyError: 'B'

  • The output and error files can be found in the attached zip folder.

chr3.zip

Thank you!

Speeding up medaka_variant / weird behaviour

I'm trying running Medaka Variant on an Nvidia DGX box (8xTesla GPUs). I have a few queries:

  1. Is there any way to get it to make better use of the GPUs (and to more finely control CUDA usage)? Based on gpustat, it's deployed processes to all 8, but is only really using the first one. Right now, it's not doing anything, but at peak it never seemed to use more than 25% of the processing power of the first GPU.

gpustat:

                 Tue Apr  2 13:04:00 2019  410.78
[0] Tesla V100-SXM2-32GB | 35'C,   0 % | 30932 / 32480 MB | koneill(30921M)
[1] Tesla V100-SXM2-32GB | 35'C,   0 % |   502 / 32480 MB | koneill(491M)
[2] Tesla V100-SXM2-32GB | 34'C,   0 % |   502 / 32480 MB | koneill(491M)
[3] Tesla V100-SXM2-32GB | 34'C,   0 % |   502 / 32480 MB | koneill(491M)
[4] Tesla V100-SXM2-32GB | 36'C,   0 % |   502 / 32480 MB | koneill(491M)
[5] Tesla V100-SXM2-32GB | 37'C,   0 % |   502 / 32480 MB | koneill(491M)
[6] Tesla V100-SXM2-32GB | 38'C,   0 % |   502 / 32480 MB | koneill(491M)
[7] Tesla V100-SXM2-32GB | 38'C,   0 % |   502 / 32480 MB | koneill(491M)

  2. I'm also testing it on CPUs, and it seems to be using ~70-80 CPUs. However, the servers I'm running it on are shared, and it would be best if I could control CPU usage so as to be a considerate co-user of these resources. Is there a parameter that can be passed to medaka_variant to limit the maximum number of CPUs/threads?

  3. For the GPU run, it seems to have gotten stuck. It ran for 14 hours to do the "long regions", then had a 4:45 hour gap before processing the short regions (with no logging during this time). It's now another four hours later after doing the short regions, and it's been sitting in a single CPU thread, not using the GPU or other CPUs for that entire time.

Is this normal/expected behaviour?

Would it be possible to get more granular logging during these long delays?

Would it be possible to parallelise better during these times?

Logs below (snipped)

medaka_variant -r GRCh37-lite.fa \
        -b promethion_NA19240.bam.dup.bam \
        -o medaka_variant_gpu \
        -t 4
+ medaka_variant -r GRCh37-lite.fa -b promethion_NA19240.bam.dup.bam -o medaka_variant_gpu -t 4
Checking program versions
Program    Version    Required   Pass     
bgzip      1.9        1.9        True     
minimap2   2.11       2.11       True     
samtools   1.9        1.9        True     
tabix      1.9        1.9        True     

======================================
Running medaka consensus /projects/koneill_prj/promethion/promethion_NA19240.bam.dup.bam 
======================================

Using TensorFlow backend.
[15:11:30 - Predict] Processing region(s): 1:0-249250621 2:0-243199373 3:0-198022430 4:0-191154276 5:0-180915260 6:0-171115067 7:0-159138663 8:0-146364022 9:0-141213431 10:0-135534747 11:0-135006516 12:0-133851895 13:0-115169878 14:0-107349540 15:0-102531392 16:0-90354753 17:0-81195210 18:0-78077248 19:0-59128983 20:0-63025520 21:0-48129895 22:0-51304566 X:0-155270560 Y:0-59373566 MT:0-16569 GL000207.1:0-4262 GL000226.1:0-15008 GL000229.1:0-19913 GL000231.1:0-27386 GL000210.1:0-27682 GL000239.1:0-33824 GL000235.1:0-34474 GL000201.1:0-36148 GL000247.1:0-36422 GL000245.1:0-36651 GL000197.1:0-37175 GL000203.1:0-37498 GL000246.1:0-38154 GL000249.1:0-38502 GL000196.1:0-38914 GL000248.1:0-39786 GL000244.1:0-39929 GL000238.1:0-39939 GL000202.1:0-40103 GL000234.1:0-40531 GL000232.1:0-40652 GL000206.1:0-41001 GL000240.1:0-41933 GL000236.1:0-41934 GL000241.1:0-42152 GL000243.1:0-43341 GL000242.1:0-43523 GL000230.1:0-43691 GL000237.1:0-45867 GL000233.1:0-45941 GL000204.1:0-81310 GL000198.1:0-90085 GL000208.1:0-92689 GL000191.1:0-106433 GL000227.1:0-128374 GL000228.1:0-129120 GL000214.1:0-137718 GL000221.1:0-155397 GL000209.1:0-159169 GL000218.1:0-161147 GL000220.1:0-161802 GL000213.1:0-164239 GL000211.1:0-166566 GL000199.1:0-169874 GL000217.1:0-172149 GL000216.1:0-172294 GL000215.1:0-172545 GL000205.1:0-174588 GL000219.1:0-179198 GL000224.1:0-179693 GL000223.1:0-180455 GL000195.1:0-182896 GL000212.1:0-186858 GL000222.1:0-186861 GL000200.1:0-187035 GL000193.1:0-189789 GL000194.1:0-191469 GL000225.1:0-211173 GL000192.1:0-547496
[15:11:30 - Predict] Setting tensorflow threads to 4.
[15:11:36 - Predict] Found 3171 long and 2 short regions.
[15:11:36 - Predict] Processing long regions.
[15:11:36 - ModelLoad] Building model (steps, features, classes): (10000, 10, 5)
[15:11:37 - ModelLoad] Loading weights from /projects/koneill_prj/conda/envs/medaka_gpu/lib/python3.6/site-packages/medaka/data/r941_trans_model.hdf5
[15:11:38 - PWorker] Running inference for 3104.9M draft bases.
[15:11:38 - Sampler] Initializing sampler for consensus or region 1:0-1000000.
[15:11:39 - Feature] Pileup counts do not span requested region, requested 1:0-1000000, received 10000-999999.
[15:11:40 - Feature] Processed 1:10000.0-1000000.1 (median depth 64.0)
[15:11:40 - Sampler] Took 2.13s to make features.

<<<snip>>>

[05:05:09 - Sampler] Initializing sampler for consensus or region GL000225.1:0-211173.
[05:05:13 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50015.4s
[05:05:18 - Feature] Processed GL000225.1:0.0-211173.0 (median depth 709.0)
[05:05:18 - Sampler] Took 8.14s to make features.
[05:05:18 - Sampler] Initializing sampler for consensus or region GL000192.1:0-547496.
[05:05:19 - Feature] Pileup counts do not span requested region, requested GL000192.1:0-547496, received 3095-547495.
[05:05:19 - Feature] Processed GL000192.1:3095.0-547496.0 (median depth 59.0)
[05:05:19 - Sampler] Took 1.37s to make features.
[05:05:25 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50027.6s
[05:05:42 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50044.4s
[05:05:55 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50057.1s
[05:06:10 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50072.5s
[05:06:22 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50084.6s
[05:06:38 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50100.7s
[05:06:50 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50112.8s
[05:07:03 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50125.6s
[05:07:19 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50141.0s
[05:07:30 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50152.9s
[05:07:44 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50166.1s
[05:07:57 - PWorker] 100.0% Done (3104.9/3104.9 Mbases) in 50179.3s
[09:43:38 - PWorker] All done
[09:43:38 - Predict] Processing short regions
[09:43:38 - ModelLoad] Building model (steps, features, classes): (None, 10, 5)
[09:43:39 - ModelLoad] Loading weights from /projects/koneill_prj/conda/envs/medaka_gpu/lib/python3.6/site-packages/medaka/data/r941_trans_model.hdf5
[09:43:39 - PWorker] Running inference for 0.0M draft bases.
[09:43:40 - Sampler] Initializing sampler for consensus or region 11:134999000-135006516.
[09:43:40 - Feature] Pileup-feature is zero-length for 11:134999000-135006516 indicating no reads in this region.
[09:43:40 - Sampler] Took 0.12s to make features.
[09:43:41 - PWorker] All done
[09:43:41 - PWorker] Running inference for 0.0M draft bases.
[09:43:41 - Sampler] Initializing sampler for consensus or region GL000207.1:0-4262.
[09:43:41 - Feature] Pileup counts do not span requested region, requested GL000207.1:0-4262, received 0-4254.
[09:43:41 - Feature] Processed GL000207.1:0.0-4255.0 (median depth 18.0)
[09:43:41 - Sampler] Took 0.09s to make features.

medaka "Processing short regions" super slow

Hi,
medaka 0.6.2 hdf creation runs through the large regions fairly quickly, but as soon as it starts processing the short regions it becomes super slow. Although all sequences should be processed by then and the hdf file has almost reached completion, it has been running for days now...

It's stalling on many of the contigs in the 1-5 kbp range with low depth coverage.
Is there a way to terminate hdf creation and rescue the unfinished hdf file to continue with stitching of the large contigs?

It also looks like it switched from all-CPU usage to single-CPU usage for the short region processing.

I never came across such issues in previous versions of medaka.

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
44047 prom      20   0 79.883g 0.063t  95740 R 100.3 17.1  77535:50 medaka
[02:32:46 - Sampler] Initializing sampler for consensus or region ctg9:7.0-10099823.0:10999000-11149746.
[02:32:47 - Feature] Processed ctg9:7.0-10099823.0:10999000.0-11149746.0 (median depth 55.0)
[02:32:47 - Sampler] Took 1.11s to make features.
[02:33:58 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57004.8s
[02:35:11 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57077.5s
[02:36:24 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57150.7s
[02:37:36 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57222.3s
[02:38:49 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57295.1s
[02:40:02 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57368.2s
[02:41:15 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57441.5s
[02:42:27 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57513.3s
[02:43:40 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57586.2s
[02:44:53 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57659.5s
[02:46:06 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57732.1s
[02:47:18 - PWorker] 100.0% Done (682.5/682.5 Mbases) in 57804.4s
[02:57:37 - PWorker] All done
[02:57:37 - Predict] Processing short regions
[02:57:37 - ModelLoad] Building model (steps, features, classes): (None, 10, 5)
[02:57:37 - ModelLoad] With cudnn: False
[02:57:37 - ModelLoad] Loading weights from /home/prom/.conda/envs/medaka/lib/python3.6/site-packages/medaka/data/r941_flip235_model.hdf5
[02:57:38 - PWorker] Running inference for 0.0M draft bases.
[02:57:38 - Sampler] Initializing sampler for consensus or region ctg1027:1.0-9521.0:0-9160.
[02:57:38 - Feature] Pileup counts do not span requested region, requested ctg1027:1.0-9521.0:0-9160, received 745-8244.
[02:57:38 - Feature] Processed ctg1027:1.0-9521.0:745.0-8245.0 (median depth 3.0)
[02:57:38 - Sampler] Took 0.02s to make features.
[03:08:26 - PWorker] All done
[03:08:26 - PWorker] Running inference for 0.0M draft bases.
[03:08:26 - Sampler] Initializing sampler for consensus or region ctg1036:1.0-13004.0:0-8879.
[03:08:26 - Feature] Pileup-feature is zero-length for ctg1036:1.0-13004.0:0-8879 indicating no reads in this region.
[03:08:26 - Sampler] Took 0.03s to make features.
[03:08:27 - PWorker] All done
[03:08:27 - PWorker] Running inference for 0.0M draft bases.
[03:08:27 - Sampler] Initializing sampler for consensus or region ctg1045:23.0-8454.0:0-8489.
[03:08:27 - Feature] Processed ctg1045:23.0-8454.0:0.0-8489.0 (median depth 17.0)
[03:08:27 - Sampler] Took 0.05s to make features.
[03:18:56 - PWorker] All done
[03:18:56 - PWorker] Running inference for 0.0M draft bases.
[03:18:56 - Sampler] Initializing sampler for consensus or region ctg1133:1.0-8117.0:0-8415.
[03:18:56 - Feature] Processed ctg1133:1.0-8117.0:0.0-8415.0 (median depth 10.0)
[03:18:56 - Sampler] Took 0.04s to make features.
[03:29:43 - PWorker] All done
[03:29:43 - PWorker] Running inference for 0.0M draft bases.
[03:29:43 - Sampler] Initializing sampler for consensus or region ctg1143:885.0-10511.0:0-9562.
[03:29:43 - Feature] Pileup counts do not span requested region, requested ctg1143:885.0-10511.0:0-9562, received 651-9561.
[03:29:43 - Feature] Processed ctg1143:885.0-10511.0:651.0-9562.0 (median depth 2.0)
[03:29:43 - Sampler] Took 0.04s to make features.
[03:40:31 - PWorker] All done
[03:40:31 - PWorker] Running inference for 0.0M draft bases.
[03:40:31 - Sampler] Initializing sampler for consensus or region ctg1147:174.0-3204.0:0-3063.
[03:40:31 - Feature] Pileup counts do not span requested region, requested ctg1147:174.0-3204.0:0-3063, received 0-3048.
[03:40:31 - Feature] Processed ctg1147:174.0-3204.0:0.0-3049.0 (median depth 3.0)
[03:40:31 - Sampler] Took 0.03s to make features.
[03:51:04 - PWorker] All done
[03:51:05 - PWorker] Running inference for 0.0M draft bases.
[03:51:05 - Sampler] Initializing sampler for consensus or region ctg1148:15.0-8458.0:0-8555.
[03:51:05 - Feature] Pileup counts do not span requested region, requested ctg1148:15.0-8458.0:0-8555, received 2667-7312.
[03:51:05 - Feature] Processed ctg1148:15.0-8458.0:2667.0-7313.0 (median depth 10.0)
[03:51:05 - Sampler] Took 0.05s to make features.
[04:01:45 - PWorker] All done
[04:01:45 - PWorker] Running inference for 0.0M draft bases.
[04:01:45 - Sampler] Initializing sampler for consensus or region ctg1162:5722.0-11740.0:0-6230.
[04:01:45 - Feature] Processed ctg1162:5722.0-11740.0:0.0-6230.0 (median depth 146.0)
[04:01:45 - Sampler] Took 0.11s to make features.
[04:12:27 - PWorker] All done
[04:12:27 - PWorker] Running inference for 0.0M draft bases.
[04:12:27 - Sampler] Initializing sampler for consensus or region ctg1171:1.0-3927.0:0-3682.
[04:12:27 - Feature] Processed ctg1171:1.0-3927.0:0.0-3682.0 (median depth 24.0)
[04:12:27 - Sampler] Took 0.04s to make features.

medaka ignores threads option

Hi
I have been running medaka with the new flip flop model on a system with 80 threads but used the -t option to specify 60 threads. It appears that medaka frequently ignores this option and uses all of the threads.

I have attached a screendump of the CPU usage at 8000% (80 threads) even though it was supposed to be maxing out at 6000% (60 threads):
medaka_uses_morecpus_thanprovided.png

I hope you find a fix for this.

Best regards
Rasmus

Definitions missing in vcf header

Hi,

bcftools complains about missing definitions in the vcf header:

[W::vcf_parse] INFO 'pos2' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'q2' is not defined in the header, assuming Type=String
[W::vcf_parse_format] FORMAT 'GT' is not defined in the header, assuming Type=String
[W::vcf_parse_format] FORMAT 'GQ' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'pos1' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'q1' is not defined in the header, assuming Type=String
[E::bcf_write] Unchecked error (2), exiting

I'll write a script to modify 3000 vcf files before being able to merge the results, so not an urgent problem, but might be good for the next release.

Thanks!

KeyError: r94

I am using a Conda env.

Traceback (most recent call last):
File "medaka.py", line 38, in call
val = model_dict[val]
KeyError: 'r94'

Custom guppy model

Hi,

I have a sort of unrelated question. Is there a chance to (re-)train guppy and also tailor medaka towards the retrained guppy model?

Cheers,
F

Error in training a consensus network

I am trying to conduct the Walkthrough on my machine to train a consensus network using the example data and commands given here: https://nanoporetech.github.io/medaka/

I am encountering the following error when I try to run 'hp_compress':
$ hp_compress features ${CALLS2DRAFT}.bam ${TRAINFEATURES} -T ${TRUTH2DRAFT}.bam -t ${NUM_THREADS} -r ${REFNAME}:-${TRAINEND} --batch_size ${BATCHSIZE} --read_fraction ${FRACTION} --chunk_len 1000 --chunk_ovlp 0 -m ${MODEL_FEAT_OPT} --max_label_len 1

Using TensorFlow backend.
{'consensus_as_ref': False, 'is_compressed': False, 'log_min': None, 'max_hp_len': 1, 'normalise': 'total', 'ref_mode': None, 'with_depth': False}
[17:00:31 - root] FeatureEncoder options:
consensus_as_ref: False
is_compressed: False
log_min: None
max_hp_len: 1
normalise: total
ref_mode: None
with_depth: False
[17:00:31 - root] Got regions:
utg000001c:0-3762624
[17:00:32 - root] Processed utg000001c:3620001.0-3630000.0 (median depth 13.0)
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:476: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
yield a[slicee]
/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py:481: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
yield a[slicee]
[17:00:33 - root] Processed utg000001c:40001.0-50000.0 (median depth 15.0)
[17:00:33 - root] Processed utg000001c:3160001.0-3170000.0 (median depth 18.0)
[17:00:34 - root] Processed utg000001c:2110001.0-2120000.0 (median depth 32.0)
[17:00:34 - root] Processed utg000001c:1150001.0-1160000.0 (median depth 40.0)
[17:00:34 - root] Processed utg000001c:3660001.0-3670000.1 (median depth 19.0)
[17:00:35 - root] Processed utg000001c:3730001.0-3740000.1 (median depth 45.0)
[17:00:35 - root] Processed utg000001c:380001.0-390000.0 (median depth 12.0)
[17:00:36 - root] Processed utg000001c:2440001.0-2450000.1 (median depth 43.0)
[17:00:36 - root] Processed utg000001c:220001.0-230000.1 (median depth 26.0)
[17:00:36 - root] Processed utg000001c:2230001.0-2240000.1 (median depth 73.0)
[17:00:37 - root] Processed utg000001c:320001.0-330000.0 (median depth 57.0)
[17:00:37 - root] Processed utg000001c:36198.0-40000.1 (median depth 47.0)
[17:00:38 - root] Processed utg000001c:1220001.0-1230000.0 (median depth 9.0)
[17:00:38 - root] Processed utg000001c:620001.0-630000.1 (median depth 48.0)
[17:00:38 - root] Processed utg000001c:2660001.0-2670000.1 (median depth 100.0)
[17:00:38 - root] Processed utg000001c:2420001.0-2428111.0 (median depth 28.0)
[17:00:38 - root] Processed utg000001c:2428112.0-2430000.0 (median depth 8.0)
[17:00:39 - root] Processed utg000001c:1460001.0-1470000.1 (median depth 58.0)
[17:00:40 - root] Processed utg000001c:1600001.0-1610000.0 (median depth 28.0)
[17:00:41 - root] Processed utg000001c:3050001.0-3060000.2 (median depth 92.0)
[17:00:41 - root] Processed utg000001c:1800001.0-1810000.0 (median depth 18.0)
[17:00:42 - root] Processed utg000001c:3450001.0-3460000.1 (median depth 44.0)
[17:00:42 - root] Processed utg000001c:2980001.0-2990000.0 (median depth 55.0)
[17:00:42 - root] Processed utg000001c:3000001.0-3010000.0 (median depth 76.0)
[17:00:43 - root] Processed utg000001c:3330001.0-3340000.1 (median depth 80.0)
[17:00:43 - root] Processed utg000001c:1550001.0-1560000.0 (median depth 23.0)
[17:00:44 - root] Processed utg000001c:3520001.0-3530000.1 (median depth 82.0)
[17:00:44 - root] Processed utg000001c:3750001.0-3760000.0 (median depth 63.0)
[17:00:44 - root] Processed utg000001c:1610001.0-1620000.2 (median depth 32.0)
[17:00:45 - root] Processed utg000001c:3190001.0-3200000.0 (median depth 36.0)
[17:00:46 - root] Processed utg000001c:2450001.0-2460000.0 (median depth 42.0)
[17:00:46 - root] Processed utg000001c:480001.0-490000.1 (median depth 71.0)
[17:00:46 - root] Processed utg000001c:660001.0-670000.3 (median depth 58.0)
[17:00:47 - root] Processed utg000001c:1990001.0-2000000.0 (median depth 40.0)
[17:00:47 - root] Processed utg000001c:2460001.0-2470000.1 (median depth 38.0)
[17:00:48 - root] Processed utg000001c:3130001.0-3140000.1 (median depth 69.0)
[17:00:48 - root] Processed utg000001c:1330001.0-1331747.1 (median depth 78.0)
[17:00:48 - root] Processed utg000001c:300001.0-310000.0 (median depth 46.0)
[17:00:49 - root] Processed utg000001c:2280001.0-2290000.0 (median depth 29.0)
[17:00:49 - root] Processed utg000001c:1670001.0-1680000.0 (median depth 74.0)
[17:00:49 - root] Filtering removed all alignments of truth to ref from Region(ref_name='utg000001c', start=20000, end=30000).
[17:00:49 - root] Processed utg000001c:890001.0-900000.0 (median depth 25.0)
[17:00:51 - root] Processed utg000001c:60001.0-70000.0 (median depth 22.0)
[17:00:51 - root] Processed utg000001c:350001.0-360000.0 (median depth 35.0)
[17:00:51 - root] Processed utg000001c:3170001.0-3180000.1 (median depth 72.0)
[17:00:52 - root] Processed utg000001c:1331748.0-1340000.0 (median depth 75.0)
[17:00:52 - root] Processed utg000001c:3340001.0-3344122.2 (median depth 47.0)
[17:00:53 - root] Processed utg000001c:2590001.0-2600000.1 (median depth 93.0)
[17:00:53 - root] Processed utg000001c:1570001.0-1580000.1 (median depth 16.0)
[17:00:54 - root] Processed utg000001c:1940001.0-1950000.0 (median depth 88.0)
[17:00:54 - root] Processed utg000001c:1830035.0-1840000.0 (median depth 67.0)
[17:00:54 - root] Processed utg000001c:3460001.0-3470000.0 (median depth 41.0)
[17:00:55 - root] Processed utg000001c:610001.0-620000.0 (median depth 20.0)
[17:00:55 - root] Processed utg000001c:3344123.0-3350000.0 (median depth 47.0)
[17:00:55 - root] Processed utg000001c:1050001.0-1060000.2 (median depth 51.0)
[17:00:55 - root] Processed utg000001c:2970001.0-2980000.0 (median depth 85.0)
[17:00:56 - root] Processed utg000001c:3040001.0-3045139.0 (median depth 39.0)
[17:00:56 - root] Processed utg000001c:970001.0-980000.0 (median depth 6.0)
[17:00:57 - root] Processed utg000001c:2010001.0-2020000.1 (median depth 26.0)
[17:00:57 - root] Processed utg000001c:2840001.0-2845841.0 (median depth 45.0)
[17:00:57 - root] Processed utg000001c:720001.0-730000.1 (median depth 38.0)
[17:00:57 - root] Processed utg000001c:1680001.0-1690000.0 (median depth 45.0)
[17:00:58 - root] Processed utg000001c:2640001.0-2646413.0 (median depth 35.0)
[17:00:58 - root] Processed utg000001c:3045140.0-3050000.1 (median depth 54.0)
[17:00:58 - root] Processed utg000001c:2646414.0-2650000.0 (median depth 22.0)
[17:00:59 - root] Processed utg000001c:2845842.0-2850000.1 (median depth 60.0)
[17:00:59 - root] Processed utg000001c:1390001.0-1400000.0 (median depth 81.0)
[17:01:00 - root] Processed utg000001c:3010001.0-3020000.0 (median depth 67.0)
[17:01:00 - root] Processed utg000001c:450001.0-460000.0 (median depth 31.0)
[17:01:00 - root] Processed utg000001c:1560001.0-1570000.2 (median depth 51.0)
[17:01:01 - root] Processed utg000001c:3260001.0-3270000.0 (median depth 51.0)
[17:01:01 - root] Processed utg000001c:1030001.0-1032827.0 (median depth 14.0)
[17:01:01 - root] Processed utg000001c:2610001.0-2620000.0 (median depth 37.0)
[17:01:01 - root] Processed utg000001c:570001.0-580000.0 (median depth 68.0)
[17:01:01 - root] Processed utg000001c:1580001.0-1590000.4 (median depth 17.0)
[17:01:02 - root] Processed utg000001c:2700001.0-2710000.1 (median depth 45.0)
[17:01:03 - root] Processed utg000001c:1032828.0-1040000.4 (median depth 34.0)
[17:01:03 - root] Processed utg000001c:1540001.0-1550000.0 (median depth 18.0)
[17:01:03 - root] Processed utg000001c:290001.0-300000.0 (median depth 23.0)
[17:01:03 - root] Processed utg000001c:2690001.0-2700000.1 (median depth 76.0)
[17:01:05 - root] Processed utg000001c:3640001.0-3643166.0 (median depth 39.0)
[17:01:05 - root] Processed utg000001c:1280001.0-1290000.2 (median depth 58.0)
[17:01:05 - root] Processed utg000001c:1110001.0-1120000.0 (median depth 15.0)
[17:01:05 - root] Processed utg000001c:3110001.0-3120000.1 (median depth 43.0)
[17:01:05 - root] Processed utg000001c:990001.0-1000000.1 (median depth 68.0)
[17:01:05 - root] Processed utg000001c:3530001.0-3540000.0 (median depth 24.0)
[17:01:06 - root] Processed utg000001c:2000001.0-2010000.0 (median depth 45.0)
[17:01:06 - root] Processed utg000001c:460001.0-470000.0 (median depth 17.0)
[17:01:06 - root] Processed utg000001c:1650001.0-1660000.1 (median depth 14.0)
[17:01:06 - root] Processed utg000001c:2730001.0-2740000.0 (median depth 43.0)
[17:01:07 - root] Processed utg000001c:70001.0-80000.2 (median depth 14.0)
[17:01:07 - root] Processed utg000001c:3643167.0-3650000.1 (median depth 50.0)
[17:01:07 - root] Processed utg000001c:110001.0-120000.0 (median depth 33.0)
[17:01:08 - root] Processed utg000001c:2360001.0-2370000.0 (median depth 28.0)
[17:01:09 - root] Processed utg000001c:2120001.0-2129059.0 (median depth 43.0)
[17:01:10 - root] Processed utg000001c:180001.0-190000.1 (median depth 57.0)
[17:01:10 - root] Processed utg000001c:2370001.0-2380000.1 (median depth 73.0)
[17:01:10 - root] Processed utg000001c:2190001.0-2200000.0 (median depth 22.0)
[17:01:11 - root] Processed utg000001c:590001.0-600000.1 (median depth 14.0)
[17:01:11 - root] Processed utg000001c:560001.0-570000.2 (median depth 63.0)
[17:01:11 - root] Filtering removed all alignments of truth to ref from Region(ref_name='utg000001c', start=0, end=10000).
[17:01:11 - root] Processed utg000001c:1250001.0-1260000.0 (median depth 61.0)
[17:01:11 - root] Processed utg000001c:2040001.0-2050000.1 (median depth 18.0)
[17:01:12 - root] Processed utg000001c:2670001.0-2680000.0 (median depth 65.0)
[17:01:12 - root] Processed utg000001c:1200001.0-1210000.0 (median depth 11.0)
[17:01:12 - root] Processed utg000001c:2070001.0-2080000.0 (median depth 29.0)
[17:01:13 - root] Processed utg000001c:3410001.0-3420000.2 (median depth 82.0)
[17:01:13 - root] Filtering removed all alignments of truth to ref from Region(ref_name='utg000001c', start=2560000, end=2570000).
[17:01:14 - root] Processed utg000001c:50001.0-60000.1 (median depth 43.0)
[17:01:14 - root] Processed utg000001c:1410001.0-1420000.0 (median depth 27.0)
[17:01:14 - root] Filtering removed all alignments of truth to ref from Region(ref_name='utg000001c', start=10000, end=20000).
[17:01:15 - root] Processed utg000001c:260001.0-270000.2 (median depth 9.0)
[17:01:15 - root] Processed utg000001c:780001.0-790000.2 (median depth 53.0)
[17:01:16 - root] Processed utg000001c:740001.0-750000.3 (median depth 58.0)
[17:01:16 - root] Processed utg000001c:2860001.0-2870000.1 (median depth 82.0)
[17:01:16 - root] Processed utg000001c:2300001.0-2310000.0 (median depth 26.0)
[17:01:16 - root] Processed utg000001c:3140001.0-3144796.2 (median depth 83.0)
[17:01:17 - root] Processed utg000001c:1230001.0-1232100.1 (median depth 54.0)
[17:01:17 - root] Processed utg000001c:3490001.0-3500000.1 (median depth 77.0)
[17:01:17 - root] Processed utg000001c:540001.0-550000.0 (median depth 8.0)
[17:01:17 - root] Processed utg000001c:90001.0-100000.2 (median depth 77.0)
[17:01:18 - root] Processed utg000001c:3144797.0-3150000.1 (median depth 47.0)
[17:01:19 - root] Processed utg000001c:2490001.0-2500000.0 (median depth 33.0)
[17:01:19 - root] Processed utg000001c:3740001.0-3742853.1 (median depth 74.0)
[17:01:20 - root] Processed utg000001c:1730367.0-1740000.3 (median depth 67.0)
[17:01:20 - root] Processed utg000001c:1232101.0-1240000.0 (median depth 55.0)
[17:01:20 - root] Processed utg000001c:1340001.0-1350000.5 (median depth 58.0)
[17:01:20 - root] Processed utg000001c:680001.0-690000.1 (median depth 45.0)
[17:01:21 - root] Processed utg000001c:2630001.0-2640000.3 (median depth 54.0)
[17:01:21 - root] Processed utg000001c:1530001.0-1531081.2 (median depth 66.0)
[17:01:21 - root] Processed utg000001c:130001.0-135808.0 (median depth 72.0)
[17:01:22 - root] Processed utg000001c:270001.0-280000.2 (median depth 15.0)
[17:01:22 - root] Processed utg000001c:1531082.0-1540000.0 (median depth 11.0)
[17:01:22 - root] Processed utg000001c:520001.0-530000.0 (median depth 37.0)
[17:01:23 - root] Processed utg000001c:135809.0-140000.3 (median depth 37.0)
[17:01:23 - root] Processed utg000001c:3742855.0-3750000.2 (median depth 77.0)
[17:01:23 - root] Processed utg000001c:3550001.0-3560000.1 (median depth 15.0)
[17:01:24 - root] Processed utg000001c:3710001.0-3720000.3 (median depth 76.0)
[17:01:24 - root] Processed utg000001c:360001.0-370000.1 (median depth 68.0)
[17:01:25 - root] Processed utg000001c:950001.0-960000.0 (median depth 25.0)
[17:01:25 - root] Processed utg000001c:1500001.0-1510000.0 (median depth 19.0)
[17:01:25 - root] Processed utg000001c:1100001.0-1110000.0 (median depth 26.0)
[17:01:26 - root] Processed utg000001c:2790001.0-2800000.0 (median depth 54.0)
[17:01:26 - root] Processed utg000001c:700001.0-710000.1 (median depth 17.0)
[17:01:27 - root] Processed utg000001c:1130001.0-1132473.1 (median depth 70.0)
[17:01:27 - root] Processed utg000001c:2810001.0-2820000.0 (median depth 69.0)
[17:01:27 - root] Processed utg000001c:3630001.0-3640000.2 (median depth 99.0)
[17:01:28 - root] Processed utg000001c:3380001.0-3390000.0 (median depth 25.0)
[17:01:29 - root] Processed utg000001c:600001.0-610000.1 (median depth 59.0)
[17:01:29 - root] Processed utg000001c:2310001.0-2320000.0 (median depth 75.0)
[17:01:30 - root] Processed utg000001c:1132474.0-1140000.1 (median depth 57.0)
[17:01:30 - root] Processed utg000001c:3480001.0-3490000.2 (median depth 71.0)
[17:01:30 - root] Processed utg000001c:420001.0-430000.0 (median depth 31.0)
[17:01:30 - root] Processed utg000001c:940001.0-950000.1 (median depth 38.0)
[17:01:30 - root] Processed utg000001c:3430001.0-3440000.1 (median depth 71.0)
[17:01:31 - root] Processed utg000001c:1970001.0-1980000.0 (median depth 28.0)
[17:01:31 - root] Processed utg000001c:3650001.0-3660000.0 (median depth 10.0)
[17:01:32 - root] Processed utg000001c:1190001.0-1200000.1 (median depth 16.0)
[17:01:32 - root] Processed utg000001c:550001.0-560000.0 (median depth 23.0)
[17:01:32 - root] Processed utg000001c:1070001.0-1080000.1 (median depth 35.0)
[17:01:33 - root] Processed utg000001c:2500001.0-2510000.1 (median depth 18.0)
[17:01:33 - root] Processed utg000001c:2400001.0-2410000.0 (median depth 70.0)
[17:01:34 - root] Processed utg000001c:1910001.0-1920000.0 (median depth 64.0)
[17:01:35 - root] Processed utg000001c:3670001.0-3680000.5 (median depth 58.0)
[17:01:35 - root] Processed utg000001c:2600001.0-2610000.1 (median depth 36.0)
[17:01:35 - root] Processed utg000001c:1400001.0-1410000.3 (median depth 49.0)
[17:01:36 - root] Processed utg000001c:1180001.0-1190000.2 (median depth 68.0)
[17:01:36 - root] Processed utg000001c:3420001.0-3430000.0 (median depth 39.0)
[17:01:37 - root] Processed utg000001c:410001.0-420000.1 (median depth 26.0)
[17:01:38 - root] Processed utg000001c:3270001.0-3280000.0 (median depth 86.0)
[17:01:38 - root] Processed utg000001c:1820001.0-1830000.0 (median depth 74.0)
[17:01:38 - root] Processed utg000001c:830001.0-833529.0 (median depth 15.0)
[17:01:38 - root] Processed utg000001c:500001.0-510000.0 (median depth 44.0)
[17:01:39 - root] Processed utg000001c:2960001.0-2970000.2 (median depth 71.0)
[17:01:39 - root] Processed utg000001c:710001.0-720000.0 (median depth 9.0)
[17:01:39 - root] Processed utg000001c:1090001.0-1100000.1 (median depth 61.0)
[17:01:39 - root] Processed utg000001c:2210001.0-2220000.0 (median depth 49.0)
[17:01:40 - root] Processed utg000001c:2520001.0-2527743.2 (median depth 65.0)
[17:01:40 - root] Processed utg000001c:790001.0-800000.0 (median depth 9.0)
[17:01:40 - root] Processed utg000001c:2510001.0-2520000.0 (median depth 18.0)
[17:01:40 - root] Processed utg000001c:1470001.0-1480000.0 (median depth 35.0)
[17:01:41 - root] Processed utg000001c:833530.0-840000.2 (median depth 70.0)
[17:01:43 - root] Processed utg000001c:2581818.0-2590000.0 (median depth 43.0)
[17:01:44 - root] Processed utg000001c:3070001.0-3080000.1 (median depth 67.0)
[17:01:44 - root] Processed utg000001c:1310001.0-1320000.0 (median depth 74.0)
[17:01:44 - root] Processed utg000001c:840001.0-850000.3 (median depth 58.0)
[17:01:45 - root] Processed utg000001c:1270001.0-1280000.1 (median depth 65.0)
[17:01:45 - root] Processed utg000001c:3080001.0-3090000.1 (median depth 89.0)
[17:01:45 - root] Processed utg000001c:1740001.0-1750000.4 (median depth 77.0)
[17:01:46 - root] Processed utg000001c:310001.0-320000.0 (median depth 41.0)
[17:01:46 - root] Processed utg000001c:1210001.0-1220000.0 (median depth 18.0)
[17:01:46 - root] Processed utg000001c:1000001.0-1010000.0 (median depth 19.0)
[17:01:47 - root] Processed utg000001c:170001.0-180000.0 (median depth 27.0)
[17:01:47 - root] Processed utg000001c:2200001.0-2210000.4 (median depth 82.0)
[17:01:48 - root] Processed utg000001c:2100001.0-2110000.1 (median depth 28.0)
[17:01:48 - root] Processed utg000001c:3700001.0-3710000.1 (median depth 33.0)
[17:01:49 - root] Processed utg000001c:2910001.0-2920000.0 (median depth 39.0)
[17:01:49 - root] Processed utg000001c:2920001.0-2930000.1 (median depth 58.0)
[17:01:49 - root] Processed utg000001c:3470001.0-3480000.2 (median depth 70.0)
[17:01:50 - root] Processed utg000001c:1490001.0-1500000.1 (median depth 11.0)
[17:01:50 - root] Processed utg000001c:3300001.0-3310000.1 (median depth 48.0)
[17:01:50 - root] Processed utg000001c:1770001.0-1780000.0 (median depth 11.0)
[17:01:51 - root] Processed utg000001c:750001.0-760000.0 (median depth 10.0)
[17:01:51 - root] Processed utg000001c:1790001.0-1800000.2 (median depth 62.0)
[17:01:52 - root] Processed utg000001c:860001.0-870000.1 (median depth 55.0)
[17:01:52 - root] Processed utg000001c:2890001.0-2900000.2 (median depth 84.0)
[17:01:52 - root] Processed utg000001c:1860001.0-1870000.1 (median depth 56.0)
[17:01:53 - root] Processed utg000001c:2930001.0-2940000.0 (median depth 28.0)
[17:01:53 - root] Processed utg000001c:2900001.0-2910000.0 (median depth 56.0)
[17:01:55 - root] Processed utg000001c:1850001.0-1860000.1 (median depth 20.0)
[17:01:55 - root] Processed utg000001c:2180001.0-2190000.2 (median depth 52.0)
[17:01:55 - root] Processed utg000001c:2770001.0-2780000.0 (median depth 45.0)
[17:01:55 - root] Processed utg000001c:2150001.0-2160000.0 (median depth 62.0)
[17:01:55 - root] Filtering removed all alignments of truth to ref from Region(ref_name='utg000001c', start=2550000, end=2560000).
[17:01:55 - root] Processed utg000001c:2780001.0-2790000.1 (median depth 46.0)
[17:01:56 - root] Processed utg000001c:1320001.0-1330000.0 (median depth 39.0)
[17:01:56 - root] Processed utg000001c:1920001.0-1929745.0 (median depth 56.0)
[17:01:56 - root] Processed utg000001c:1170001.0-1180000.0 (median depth 57.0)
[17:01:56 - root] Processed utg000001c:3200001.0-3210000.0 (median depth 16.0)
[17:01:56 - root] Processed utg000001c:240001.0-250000.0 (median depth 21.0)
[17:01:57 - root] Processed utg000001c:2060001.0-2070000.2 (median depth 15.0)
[17:01:57 - root] Processed utg000001c:3390001.0-3400000.0 (median depth 13.0)
[17:01:58 - root] Processed utg000001c:630001.0-634255.1 (median depth 53.0)
[17:01:58 - root] Processed utg000001c:800001.0-810000.0 (median depth 53.0)
[17:01:59 - root] Processed utg000001c:3560001.0-3570000.0 (median depth 37.0)
[17:02:00 - root] Processed utg000001c:1840001.0-1850000.2 (median depth 61.0)
[17:02:00 - root] Processed utg000001c:3060001.0-3070000.0 (median depth 72.0)
[17:02:00 - root] Processed utg000001c:1380001.0-1390000.0 (median depth 17.0)
[17:02:00 - root] Processed utg000001c:2620001.0-2630000.4 (median depth 48.0)
[17:02:00 - root] Processed utg000001c:3570001.0-3580000.0 (median depth 43.0)
[17:02:00 - root] Filtering removed all alignments of truth to ref from Region(ref_name='utg000001c', start=2540000, end=2550000).
[17:02:01 - root] Processed utg000001c:1780001.0-1790000.0 (median depth 21.0)
[17:02:01 - root] Processed utg000001c:634256.0-640000.1 (median depth 69.0)
[17:02:01 - root] Processed utg000001c:510001.0-520000.0 (median depth 11.0)
[17:02:02 - root] Processed utg000001c:1660001.0-1670000.1 (median depth 62.0)
[17:02:02 - root] Processed utg000001c:3240001.0-3244456.2 (median depth 63.0)
[17:02:03 - root] Processed utg000001c:2320001.0-2328420.0 (median depth 59.0)
[17:02:03 - root] Processed utg000001c:3680001.0-3690000.0 (median depth 34.0)
[17:02:04 - root] Processed utg000001c:3244457.0-3250000.0 (median depth 29.0)
[17:02:04 - root] Processed utg000001c:2328421.0-2330000.2 (median depth 56.0)
[17:02:05 - root] Processed utg000001c:900001.0-910000.1 (median depth 54.0)
[17:02:05 - root] Processed utg000001c:650001.0-660000.1 (median depth 43.0)
[17:02:05 - root] Processed utg000001c:1710001.0-1720000.2 (median depth 67.0)
[17:02:05 - root] Processed utg000001c:3370001.0-3380000.2 (median depth 55.0)
[17:02:06 - root] Processed utg000001c:1960001.0-1970000.1 (median depth 88.0)
[17:02:07 - root] Processed utg000001c:3250001.0-3260000.0 (median depth 25.0)
[17:02:07 - root] Processed utg000001c:2470001.0-2480000.0 (median depth 11.0)
[17:02:07 - root] Processed utg000001c:160001.0-170000.0 (median depth 29.0)
[17:02:07 - root] Processed utg000001c:2850001.0-2860000.0 (median depth 49.0)
[17:02:08 - root] Processed utg000001c:3610001.0-3620000.0 (median depth 59.0)
[17:02:08 - root] Processed utg000001c:3580001.0-3590000.1 (median depth 15.0)
[17:02:09 - root] Processed utg000001c:820001.0-830000.0 (median depth 62.0)
[17:02:09 - root] Processed utg000001c:3280001.0-3290000.2 (median depth 65.0)
[17:02:10 - root] Processed utg000001c:1950001.0-1960000.0 (median depth 30.0)
[17:02:10 - root] Processed utg000001c:1510001.0-1520000.0 (median depth 18.0)
[17:02:10 - root] Processed utg000001c:3020001.0-3030000.0 (median depth 76.0)
[17:02:10 - root] Processed utg000001c:190001.0-200000.3 (median depth 47.0)
[17:02:10 - root] Processed utg000001c:980001.0-990000.1 (median depth 23.0)
[17:02:11 - root] Processed utg000001c:2350001.0-2360000.0 (median depth 18.0)
[17:02:11 - root] Processed utg000001c:1450001.0-1460000.0 (median depth 66.0)
[17:02:12 - root] Processed utg000001c:2260001.0-2270000.0 (median depth 26.0)
[17:02:12 - root] Processed utg000001c:2710001.0-2720000.0 (median depth 64.0)
[17:02:12 - root] Processed utg000001c:530001.0-534557.0 (median depth 57.0)
[17:02:12 - root] Processed utg000001c:3030001.0-3040000.0 (median depth 45.0)
[17:02:13 - root] Processed utg000001c:534558.0-540000.0 (median depth 10.0)
[17:02:13 - root] Processed utg000001c:2820001.0-2830000.1 (median depth 40.0)
[17:02:14 - root] Processed utg000001c:1360001.0-1370000.0 (median depth 24.0)
[17:02:14 - root] Processed utg000001c:80001.0-90000.0 (median depth 39.0)
[17:02:15 - root] Processed utg000001c:1140001.0-1150000.1 (median depth 52.0)
[17:02:16 - root] Processed utg000001c:1480001.0-1490000.1 (median depth 75.0)
[17:02:16 - root] Processed utg000001c:1370001.0-1380000.2 (median depth 45.0)
[17:02:16 - root] Processed utg000001c:2680001.0-2690000.1 (median depth 57.0)
[17:02:16 - root] Processed utg000001c:580001.0-590000.0 (median depth 31.0)
[17:02:16 - root] Processed utg000001c:2390001.0-2400000.1 (median depth 41.0)
[17:02:17 - root] Processed utg000001c:1870001.0-1880000.0 (median depth 32.0)
[17:02:17 - root] Processed utg000001c:440001.0-450000.0 (median depth 20.0)
[17:02:17 - root] Processed utg000001c:2330001.0-2340000.0 (median depth 68.0)
[17:02:18 - root] Processed utg000001c:2760001.0-2770000.0 (median depth 10.0)
[17:02:18 - root] Processed utg000001c:2250001.0-2260000.0 (median depth 15.0)
[17:02:19 - root] Processed utg000001c:2140001.0-2150000.0 (median depth 12.0)
[17:02:19 - root] Processed utg000001c:470001.0-480000.0 (median depth 39.0)
[17:02:19 - root] Processed utg000001c:1880001.0-1890000.0 (median depth 35.0)
[17:02:20 - root] Processed utg000001c:880001.0-890000.0 (median depth 15.0)
[17:02:21 - root] Processed utg000001c:1690001.0-1700000.2 (median depth 21.0)
[17:02:21 - root] Processed utg000001c:3230001.0-3240000.0 (median depth 68.0)
[17:02:21 - root] Processed utg000001c:3540001.0-3543504.1 (median depth 36.0)
[17:02:22 - root] Processed utg000001c:1440001.0-1450000.2 (median depth 75.0)
[17:02:22 - root] Processed utg000001c:3760001.0-3762623.1 (median depth 20.0)
[17:02:22 - root] Processed utg000001c:1890001.0-1900000.2 (median depth 81.0)
[17:02:23 - root] Processed utg000001c:1620001.0-1630000.1 (median depth 53.0)
[17:02:23 - root] Processed utg000001c:3180001.0-3190000.1 (median depth 72.0)
[17:02:24 - root] Processed utg000001c:1420001.0-1430000.0 (median depth 92.0)
[17:02:24 - root] Processed utg000001c:2430001.0-2440000.0 (median depth 36.0)
[17:02:24 - root] Processed utg000001c:3543505.0-3550000.1 (median depth 69.0)
[17:02:25 - root] Processed utg000001c:2750001.0-2760000.4 (median depth 50.0)
[17:02:25 - root] Processed utg000001c:850001.0-860000.0 (median depth 69.0)
[17:02:25 - root] Processed utg000001c:910001.0-920000.0 (median depth 41.0)
[17:02:27 - root] Processed utg000001c:370001.0-380000.0 (median depth 13.0)
[17:02:27 - root] Processed utg000001c:1010001.0-1020000.0 (median depth 19.0)
[17:02:27 - root] Processed utg000001c:1930001.0-1940000.1 (median depth 61.0)
[17:02:27 - root] Processed utg000001c:2160001.0-2170000.0 (median depth 40.0)
[17:02:28 - root] Processed utg000001c:2080001.0-2090000.0 (median depth 55.0)
[17:02:29 - root] Processed utg000001c:920001.0-930000.0 (median depth 21.0)
[17:02:29 - root] Processed utg000001c:1630747.0-1640000.1 (median depth 64.0)
[17:02:29 - root] Processed utg000001c:330001.0-335222.0 (median depth 45.0)
[17:02:29 - root] Processed utg000001c:640001.0-650000.0 (median depth 28.0)
[17:02:29 - root] Processed utg000001c:335223.0-340000.0 (median depth 7.0)
[17:02:30 - root] Processed utg000001c:1300001.0-1310000.1 (median depth 102.0)
[17:02:30 - root] Processed utg000001c:150001.0-160000.0 (median depth 20.0)
[17:02:31 - root] Processed utg000001c:3320001.0-3330000.1 (median depth 79.0)
[17:02:31 - root] Processed utg000001c:3590001.0-3600000.0 (median depth 33.0)
[17:02:31 - root] Processed utg000001c:1020001.0-1030000.0 (median depth 21.0)
[17:02:31 - root] Processed utg000001c:2480001.0-2490000.1 (median depth 26.0)
[17:02:32 - root] Processed utg000001c:2340001.0-2350000.1 (median depth 75.0)
[17:02:33 - root] Processed utg000001c:140001.0-150000.0 (median depth 4.0)
[17:02:33 - root] Processed utg000001c:390001.0-400000.2 (median depth 43.0)
[17:02:34 - root] Processed utg000001c:1290001.0-1300000.1 (median depth 72.0)
[17:02:34 - root] Processed utg000001c:1040001.0-1050000.1 (median depth 33.0)
[17:02:35 - root] Processed utg000001c:2880001.0-2890000.1 (median depth 21.0)
[17:02:35 - root] Processed utg000001c:3210001.0-3220000.1 (median depth 60.0)
[17:02:35 - root] Processed utg000001c:2220001.0-2228768.0 (median depth 14.0)
[17:02:35 - root] Processed utg000001c:690001.0-700000.1 (median depth 59.0)
[17:02:35 - root] Processed utg000001c:2228769.0-2230000.1 (median depth 38.0)
[17:02:36 - root] Processed utg000001c:3360001.0-3370000.1 (median depth 76.0)
[17:02:37 - root] Processed utg000001c:960001.0-970000.1 (median depth 58.0)
[17:02:37 - root] Filtering removed all alignments of truth to ref from Region(ref_name='utg000001c', start=2530000, end=2540000).
[17:02:37 - root] Processed utg000001c:2130001.0-2140000.0 (median depth 94.0)
[17:02:38 - root] Processed utg000001c:870001.0-880000.2 (median depth 68.0)
[17:02:39 - root] Processed utg000001c:2950001.0-2960000.0 (median depth 45.0)
[17:02:39 - root] Processed utg000001c:1060001.0-1070000.0 (median depth 46.0)
[17:02:40 - root] Processed utg000001c:2940001.0-2945532.2 (median depth 78.0)
[17:02:40 - root] Processed utg000001c:670001.0-680000.0 (median depth 23.0)
[17:02:40 - root] Processed utg000001c:1520001.0-1530000.0 (median depth 21.0)
[17:02:41 - root] Filtering removed all alignments of truth to ref from Region(ref_name='utg000001c', start=2570000, end=2580000).
[17:02:41 - root] Processed utg000001c:1240001.0-1250000.1 (median depth 88.0)
[17:02:41 - root] Processed utg000001c:3500001.0-3510000.1 (median depth 97.0)
[17:02:41 - root] Processed utg000001c:1590001.0-1600000.2 (median depth 102.0)
[17:02:42 - root] Processed utg000001c:2945533.0-2950000.0 (median depth 44.0)
[17:02:42 - root] Processed utg000001c:1980001.0-1990000.2 (median depth 75.0)
[17:02:42 - root] Processed utg000001c:930001.0-933194.0 (median depth 17.0)
[17:02:43 - root] Processed utg000001c:3600001.0-3610000.0 (median depth 25.0)
[17:02:43 - root] Processed utg000001c:2800001.0-2810000.0 (median depth 5.0)
[17:02:43 - root] Processed utg000001c:2090001.0-2100000.0 (median depth 23.0)
[17:02:43 - root] Processed utg000001c:933195.0-940000.3 (median depth 25.0)
[17:02:44 - root] Processed utg000001c:430001.0-434861.0 (median depth 59.0)
[17:02:44 - root] Processed utg000001c:770001.0-780000.0 (median depth 64.0)
[17:02:44 - root] Processed utg000001c:434862.0-440000.0 (median depth 6.0)
[17:02:44 - root] Processed utg000001c:3150001.0-3160000.1 (median depth 54.0)
[17:02:45 - root] Processed utg000001c:3290001.0-3300000.2 (median depth 44.0)
[17:02:46 - root] Processed utg000001c:2870001.0-2880000.0 (median depth 26.0)
[17:02:46 - root] Processed utg000001c:2020001.0-2029389.0 (median depth 16.0)
[17:02:46 - root] Processed utg000001c:200001.0-210000.2 (median depth 35.0)
[17:02:46 - root] Processed utg000001c:3120001.0-3130000.0 (median depth 17.0)
[17:02:47 - root] Processed utg000001c:2270001.0-2280000.1 (median depth 9.0)
[17:02:47 - root] Processed utg000001c:1120001.0-1130000.0 (median depth 55.0)
[17:02:47 - root] Processed utg000001c:490001.0-500000.0 (median depth 21.0)
[17:02:48 - root] Processed utg000001c:3690001.0-3700000.0 (median depth 55.0)
[17:02:48 - root] Processed utg000001c:1700001.0-1710000.1 (median depth 84.0)
[17:02:48 - root] Processed utg000001c:3720001.0-3730000.0 (median depth 53.0)
[17:02:48 - root] Processed utg000001c:230001.0-235528.0 (median depth 50.0)
[17:02:50 - root] Processed utg000001c:235529.0-240000.0 (median depth 37.0)
[17:02:50 - root] Processed utg000001c:730001.0-733907.0 (median depth 55.0)
[17:02:50 - root] Processed utg000001c:733908.0-740000.0 (median depth 10.0)
[17:02:50 - root] Processed utg000001c:280001.0-290000.0 (median depth 27.0)
[17:02:50 - root] Processed utg000001c:1720001.0-1730000.0 (median depth 26.0)
[17:02:51 - root] Processed utg000001c:1080001.0-1090000.2 (median depth 20.0)
[17:02:52 - root] Processed utg000001c:2740001.0-2746118.0 (median depth 22.0)
[17:02:52 - root] Processed utg000001c:3400001.0-3410000.0 (median depth 70.0)
[17:02:52 - root] Processed utg000001c:2170001.0-2180000.0 (median depth 81.0)
[17:02:53 - root] Processed utg000001c:1350001.0-1360000.0 (median depth 100.0)
[17:02:53 - root] Processed utg000001c:3510001.0-3520000.0 (median depth 13.0)
[17:02:53 - root] Processed utg000001c:1430001.0-1431388.6 (median depth 20.0)
[17:02:53 - root] Processed utg000001c:2240001.0-2250000.1 (median depth 90.0)
[17:02:54 - root] Processed utg000001c:2746119.0-2750000.1 (median depth 64.0)
[17:02:54 - root] Processed utg000001c:760001.0-770000.0 (median depth 22.0)
[17:02:54 - root] Processed utg000001c:3310001.0-3320000.1 (median depth 51.0)
[17:02:54 - root] Processed utg000001c:1810001.0-1820000.0 (median depth 23.0)
[17:02:55 - root] Processed utg000001c:2290001.0-2300000.1 (median depth 22.0)
[17:02:55 - root] Processed utg000001c:1431389.0-1440000.3 (median depth 31.0)
[17:02:56 - root] Processed utg000001c:2380001.0-2390000.1 (median depth 14.0)
[17:02:56 - root] Processed utg000001c:1160001.0-1170000.0 (median depth 25.0)
[17:02:57 - root] Processed utg000001c:3440001.0-3443822.2 (median depth 62.0)
[17:02:57 - root] Processed utg000001c:3090001.0-3100000.0 (median depth 44.0)
[17:02:57 - root] Processed utg000001c:2650001.0-2660000.0 (median depth 109.0)
[17:02:58 - root] Processed utg000001c:3443823.0-3450000.0 (median depth 12.0)
[17:02:58 - root] Processed utg000001c:810001.0-820000.1 (median depth 36.0)
[17:02:59 - root] Processed utg000001c:2030001.0-2040000.0 (median depth 60.0)
[17:02:59 - root] Processed utg000001c:120001.0-130000.0 (median depth 15.0)
[17:02:59 - root] Processed utg000001c:2990001.0-3000000.1 (median depth 81.0)
[17:03:00 - root] Processed utg000001c:1900001.0-1910000.2 (median depth 74.0)
[17:03:00 - root] Processed utg000001c:100001.0-110000.0 (median depth 19.0)
[17:03:00 - root] Processed utg000001c:2410001.0-2420000.0 (median depth 58.0)
[17:03:00 - root] Processed utg000001c:1640001.0-1650000.0 (median depth 15.0)
[17:03:01 - root] Processed utg000001c:1760001.0-1770000.0 (median depth 38.0)
[17:03:02 - root] Processed utg000001c:2050001.0-2060000.0 (median depth 12.0)
[17:03:02 - root] Processed utg000001c:3100001.0-3110000.0 (median depth 44.0)
[17:03:03 - root] Processed utg000001c:210001.0-220000.4 (median depth 75.0)
[17:03:03 - root] Processed utg000001c:400001.0-410000.1 (median depth 80.0)
[17:03:04 - root] Processed utg000001c:250001.0-260000.0 (median depth 64.0)
[17:03:05 - root] Processed utg000001c:3350001.0-3360000.0 (median depth 18.0)
[17:03:06 - root] Processed utg000001c:1260001.0-1270000.0 (median depth 75.0)
[17:03:06 - root] Processed utg000001c:3220001.0-3230000.0 (median depth 82.0)
[17:03:06 - root] Processed utg000001c:2720001.0-2730000.0 (median depth 58.0)
[17:03:06 - root] Processed utg000001c:2830001.0-2840000.0 (median depth 64.0)
[17:03:07 - root] Processed utg000001c:1750001.0-1760000.1 (median depth 79.0)
[17:03:07 - root] Processed utg000001c:340001.0-350000.0 (median depth 68.0)
Traceback (most recent call last):
File "/home/ziels/virtual-envs/medaka/bin/hp_compress", line 11, in
sys.exit(main())
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/compress.py", line 650, in main
args.func(args)
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/compress.py", line 615, in choose_feature_func
training_batches(args)
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/compress.py", line 514, in training_batches
write_yaml_data(fname, to_save)
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/medaka/common.py", line 263, in write_yaml_data
hdf[group] = yaml.dump(d)
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/yaml/init.py", line 217, in dump
return dump_all([data], stream, Dumper=Dumper, **kwds)
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/yaml/init.py", line 196, in dump_all
dumper.represent(data)
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/yaml/representer.py", line 26, in represent
node = self.represent_data(data)
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/yaml/representer.py", line 57, in represent_data
node = self.yaml_representers[None](self, data)
File "/home/ziels/virtual-envs/medaka/lib/python3.6/site-packages/yaml/representer.py", line 229, in represent_undefined
raise RepresenterError("cannot represent an object", data)
yaml.representer.RepresenterError: ('cannot represent an object', Counter({4: 2748771, 1: 969842, 2: 954640, 0: 936965, 3: 933782}))

I am able to run medaka_consensus on the walkthrough dataset without problems, so I believe medaka is installed correctly. Do you have any ideas about what could be causing the yaml-associated error above?

Thanks!
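
For context, PyYAML's safe representers have no handler for collections.Counter, which is exactly the object named in the traceback above. A minimal sketch of the failure and of the usual dict-conversion workaround (a generic illustration, not medaka's own fix):

python3 - <<'EOF'
from collections import Counter
import yaml

counts = Counter({4: 2748771, 1: 969842})
try:
    yaml.safe_dump(counts)                  # no representer registered for Counter
except yaml.representer.RepresenterError as err:
    print(err)                              # "cannot represent an object", as in the traceback above
print(yaml.safe_dump(dict(counts)))         # a plain dict serialises without complaint
EOF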

medaka polishing and flappie data

I would like to test flappie-basecalled data with medaka, but I expect a new model is needed since the error profile has changed. Would it be possible for you to train and provide such a model?

AssertionError thrown during stitching

The process ran until

...
[03:05:22 - Stitch] Processing ctg254.
[03:05:29 - Stitch] Processing ctg255.
[03:05:36 - Stitch] Processing ctg256.
[03:05:47 - Stitch] Processing ctg257.
[03:05:59 - Stitch] Processing ctg258.
Traceback (most recent call last):
File "/usr/local/bioinfo/src/Medaka/medaka-1.4.3/venv/bin/medaka", line 11, in
load_entry_point('medaka==0.4.3', 'console_scripts', 'medaka')()
File "/usr/local/bioinfo/src/Medaka/medaka-1.4.3/venv/lib/python3.6/site-packages/medaka-0.4.3-py3.6-linux-x86_64.egg/medaka/medaka.py", line 170, in main
args.func(args)
File "/usr/local/bioinfo/src/Medaka/medaka-1.4.3/venv/lib/python3.6/site-packages/medaka-0.4.3-py3.6-linux-x86_64.egg/medaka/stitch.py", line 81, in stitch
joined = stitch_from_probs(args.inputs, regions=args.regions, model_yml=args.model_yml)
File "/usr/local/bioinfo/src/Medaka/medaka-1.4.3/venv/lib/python3.6/site-packages/medaka-0.4.3-py3.6-linux-x86_64.egg/medaka/stitch.py", line 62, in stitch_from_probs
end_1_ind, start_2_ind = get_sample_overlap(s1, s2)
File "/usr/local/bioinfo/src/Medaka/medaka-1.4.3/venv/lib/python3.6/site-packages/medaka-0.4.3-py3.6-linux-x86_64.egg/medaka/common.py", line 487, in get_sample_overlap
assert len(pos1_ovl) == len(pos2_ovl)
AssertionError

Originally posted by @chklopp in #16 (comment)

resolve_cigar2: Assertion `k < c->n_cigar' failed.

However, I got an error when running:
medaka_variant -r GRCh38_full_analysis_set_plus_decoy_hla.fa -b rel5-guppy-0.3.0-chunk10k.sorted.bam -m r94 -R chr22

The reference file and bam file are downloaded from
https://github.com/nanopore-wgs-consortium/NA12878/blob/master/nanopore-human-genome/rel5.md

[15:11:34 - Feature] Processed chr22:43999000.0-45000000.0 (median depth 34.0)
[15:11:34 - Sampler] Took 3.93s to make features.
[15:11:53 - PWorker] 46.8% Done (23.8/50.9 Mbases) in 532.3s
[15:11:54 - Sampler] Initializing sampler for consensus or region chr22:44999000-46000000.
python3.6: sam.c:1550: resolve_cigar2: Assertion `k < c->n_cigar' failed.
python/Python-3.6.3/bin/medaka_variant: line 75: 24999 Aborted (core dumped) medaka consensus ${BAM} ${PROBS} --model ${MODEL} --batch_size ${BATCH_SIZE} ${REGIONS} --threads ${THREADS} ${EXTRAOPTS}

From the error message above, it seems there are some bad CIGAR strings in the aligned BAM? This problem happened with other chromosomes as well. Please let me know if I should open a separate ticket on this. Thank you!

Originally posted by @sharon558 in #25 (comment)
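
Before suspecting individual CIGAR records, a generic first sanity check (a sketch, not a medaka-specific diagnosis) is to confirm the BAM itself is intact and fully decodable over the failing region:

samtools quickcheck -v rel5-guppy-0.3.0-chunk10k.sorted.bam && echo "BAM passes basic integrity checks"
# Force htslib to decode every record in the region that crashed:
samtools view rel5-guppy-0.3.0-chunk10k.sorted.bam chr22 > /dev/null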

Installation Problem

Hello,
I am having trouble with the installation of Medaka. I tried conda, pip (as recommended), and building from source, but the following errors occur:

Nadas-MBP:~ nadakubikova$ conda install -c bioconda medaka
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  • medaka
  • intervaltree[version='>=3.0.0']
  • medaka
  • whatshap==0.18

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

Nadas-MBP:~ nadakubikova$ pip install medaka
Collecting medaka
Could not find a version that satisfies the requirement medaka (from versions: )
No matching distribution found for medaka
Nadas-MBP:~ nadakubikova$ virtualenv medaka --python=python3 --prompt "(medaka) "
Running virtualenv with interpreter /Users/nadakubikova/anaconda3/bin/python3
Using base prefix '/Users/nadakubikova/anaconda3'
/Library/Python/2.7/site-packages/virtualenv.py:1041: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
Overwriting /Users/nadakubikova/medaka/lib/python3.7/orig-prefix.txt with new content
New python executable in /Users/nadakubikova/medaka/bin/python3
Not overwriting existing python script /Users/nadakubikova/medaka/bin/python (you must use /Users/nadakubikova/medaka/bin/python3)
Installing setuptools, pip, wheel...done.
Nadas-MBP:~ nadakubikova$ . medaka/bin/activate
(medaka) Nadas-MBP:~ nadakubikova$ pip install medaka
Collecting medaka
Could not find a version that satisfies the requirement medaka (from versions: )
No matching distribution found for medaka

Both conda and pip are up to date. Any thoughts?

Thanks,
Nada
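
For what it's worth, a PackagesNotFoundError of this kind usually means the bioconda and conda-forge channels are not configured. A sketch of the standard bioconda channel setup (assuming a medaka build exists for your platform, which is not guaranteed on macOS):

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda create -n medaka medaka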

medaka_variant model for flipflop data

Hi,

I would like to test the medaka_variant pipeline to do SNP detection and haplotyping.
We did basecalling of nanopore data using guppy version 2.2.3 and the model dna_r9.4.1_450bps_flipflop_prom.cfg (json file: template_r9.4.1_450bps_large_flipflop_prom.jsn).

There are three possible options for basecall models in medaka_variant ('r941_trans', 'r941_flip213', 'r941_flip235') and I am unsure which one (r941_flip213 or r941_flip235) best suits our data.
Could you tell me which is preferentially used for the data described?

Thank you,
Michel

Conda release

Hi,

We are regularly using medaka now. Are you planning to build a Conda package?

Cheers,
Felix

Support multi-GPU or allow to specify GPU

By default TensorFlow allocates memory on all available GPUs on the machine, but only computes with the first GPU; the rest remain idle.
Maybe allow specifying the GPU device, like guppy's -x "cuda:0"?
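
Until such an option exists, a common workaround is to restrict which devices TensorFlow can see via the standard CUDA environment variable (a generic mechanism, not a medaka flag; the input file names below are placeholders):

CUDA_VISIBLE_DEVICES=0 medaka_consensus -i reads.fastq.gz -d draft.fa -o medaka_out -t 8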

"Failed to run medaka consensus"

I am running nanopore draft assemblies from Flye, polished through racon four times, and medaka fails at the consensus stage. The basecalls were made by guppy version 2.3.5. Medaka was installed into its own conda environment.

Checking program versions
Program    Version    Required   Pass
bgzip      1.9        1.9        True
minimap2   2.11       2.11       True
samtools   1.9        1.9        True
tabix      1.9        1.9        True
Aligning basecalls to draft
-P option is deprecated
Found minimap files.
[M::main::1.814*1.00] loaded/built the index for 5496 target sequence(s)
[M::mm_mapopt_update::2.293*1.00] mid_occ = 52
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 5496
[M::mm_idx_stat::2.685*1.00] distinct minimizers: 20489295 (72.38% are singletons); average occurrences: 1.722; average spacing: 5.361
[M::worker_pipeline::51.221*14.92] mapped 79682 sequences
[M::worker_pipeline::83.219*19.17] mapped 76831 sequences
[M::worker_pipeline::107.127*21.37] mapped 75735 sequences
[M::worker_pipeline::130.271*22.80] mapped 77551 sequences
[M::worker_pipeline::156.134*23.91] mapped 76292 sequences
[M::worker_pipeline::181.773*24.29] mapped 75865 sequences
[M::worker_pipeline::211.141*24.11] mapped 76293 sequences
[M::worker_pipeline::236.326*24.68] mapped 75231 sequences
[M::worker_pipeline::260.724*25.09] mapped 77703 sequences
[M::worker_pipeline::287.333*25.15] mapped 76443 sequences
[M::worker_pipeline::314.211*24.76] mapped 75857 sequences
[M::worker_pipeline::339.408*25.12] mapped 78103 sequences
[M::worker_pipeline::355.932*25.30] mapped 83441 sequences
[M::worker_pipeline::374.894*25.32] mapped 84073 sequences
[M::worker_pipeline::400.614*25.06] mapped 78833 sequences
[M::worker_pipeline::428.461*25.36] mapped 72657 sequences
[M::worker_pipeline::452.870*25.56] mapped 72243 sequences
[M::worker_pipeline::480.282*25.62] mapped 73771 sequences
[M::worker_pipeline::508.286*25.35] mapped 71908 sequences
[M::worker_pipeline::536.047*25.57] mapped 71866 sequences
[M::worker_pipeline::560.878*25.73] mapped 72110 sequences
[M::worker_pipeline::587.958*25.77] mapped 71773 sequences
[M::worker_pipeline::609.060*25.68] mapped 78327 sequences
[M::worker_pipeline::613.150*25.58] mapped 55049 sequences
[M::main] Version: 2.11-r797
[M::main] CMD: minimap2 -x map-ont --MD -t 32 -a /home/ubuntu/stan/nanopore/flye_assemblies/D3S/D3S.flye.assembly.racon4.fasta.mmi /home/ubuntu/stan/nanopore/raw_data/D3S/D3S_cat.fasta
[M::main] Real time: 613.456 sec; CPU: 15683.244 sec
[bam_sort_core] merging from 0 files and 32 in-memory blocks...
Running medaka consensus
Using TensorFlow backend.
/home/ubuntu/miniconda3/envs/medaka/bin/medaka_consensus: line 88:  3068 Floating point exception(core dumped) medaka consensus ${CALLS2DRAFT}.bam ${CONSENSUSPROBS} --model ${MODEL} --batch_size ${BATCH_SIZE} --threads ${THREADS}
Failed to run medaka consensus.

This is what's in the output directory:

total 2128508
-rw-rw-r-- 1 ubuntu ubuntu 2178192425 Apr 25 14:30 calls_to_draft.bam
-rw-rw-r-- 1 ubuntu ubuntu    1392768 Apr 25 14:30 calls_to_draft.bam.bai

Anyone know where I'm going wrong?

OOM in tensorflow

I'm consistently getting an OOM error from tensorflow with the most recent version of medaka and r941_min_high. I'm running it now on a dedicated Tesla V100 (16GB) - there's definitely, absolutely nothing else running on it. Of note, it usually happens after 5-10 chunks have already been processed. I've had the same issue with all 4 samples we've tried since upgrading, and never had this issue with the previous version and r941_flip.

[23:10:10 - Predict] Setting tensorflow threads to 8.
[23:14:04 - Predict] Processing 838 long region(s) with batching.
[23:14:04 - ModelLoad] Building model (steps, features, classes): (10000, 10, 5)
[23:14:04 - ModelLoad] With cudnn: True
[23:14:08 - ModelLoad] Loading weights from /home/ubuntu/.conda/envs/medaka/lib/python3.6/site-packages/medaka-0.7.1-py3.6-linux-x86_64.egg/medaka/data/r941_min_high_model.hdf5
[23:14:08 - PWorker] Running inference for 163.8M draft bases.
[23:14:08 - Sampler] Initializing sampler for consensus of region utg000001l:0-1000000.
[23:14:11 - Feature] Processed utg000001l:0.0-999999.0 (median depth 35.0)
[23:14:11 - Sampler] Took 2.51s to make features.
...
[23:14:30 - Sampler] Pileup for utg000001l:7999000.0-8999999.0 is of width 1849572
[23:14:30 - Sampler] Initializing sampler for consensus of region utg000001l:8999000-10000000.
2019-05-24 23:14:31.331254: E tensorflow/stream_executor/cuda/cuda_dnn.cc:82] OOM when allocating tensor with shape[1536000000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
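
A hedged mitigation sketch: peak GPU memory scales with the batch size and the pileup chunk length. The --batch_size flag appears in the medaka consensus invocations quoted elsewhere in this document; --chunk_len is assumed to be available here as it is for medaka features, and the values below are illustrative starting points rather than tuned recommendations:

medaka consensus calls_to_draft.bam consensus_probs.hdf \
    --model r941_min_high --batch_size 50 --chunk_len 5000 --threads 8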

medaka model for promethION data

Hi, I would like to polish genome assemblies with PromethION data but am unsure whether the r94 model is suitable, since a different ligation kit and flow cell are used than for MinION and GridION.

Do you have other models or would r94 just work fine?

Thank you,
Michel

Issue with yaml library

Hi there,

I'm trying to get medaka working using an RTX 2080 Ti card that needs CUDA 10.0. I can run tensorflow and keras in the medaka virtualenv, so that part of it is working. The conda install uses the wrong libraries for CUDA 10 (and fails looking for libcublas.9.0), so I had to install from git. For that I also had to install pyyaml using pip. However, now when medaka runs, it throws an error (file attached):

yaml.constructor.ConstructorError: while constructing a Python instance
expected a class, but found <class 'builtin_function_or_method'>
  in "<unicode string>", line 3, column 5:
      - !!python/object/apply:numpy.core ... 

Any advice on how I can fix this?
Thanks,
Ben
medaka.log

Medaka Variant apparent Error

I'm currently testing the medaka variant pipeline and seem to have encountered an error. I've copied the tail of the stderr below...

[23:36:46 - Sampler] Initializing sampler for consensus or region Consensus_Consensus_Consensus_ctg586:0-59347.
[23:36:48 - Feature] Pileup counts do not span requested region, requested Consensus_Consensus_Consensus_ctg586:0-59347, received 0-59345.
[23:36:48 - Feature] Processed Consensus_Consensus_Consensus_ctg586:0.0-59346.0 (median depth 42.0)
[23:36:48 - Sampler] Took 1.21s to make features.
[23:36:48 - Sampler] Initializing sampler for consensus or region Consensus_Consensus_Consensus_ctg587:0-57289.
[23:36:48 - Feature] Could not process sample with bam_to_sample_c, using python code instead.
(index 0 is out of bounds for axis 0 with size 0).
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/nhartwic/.conda/envs/new_medaka/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/nhartwic/.conda/envs/new_medaka/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/home/nhartwic/.conda/envs/new_medaka/lib/python3.6/site-packages/medaka/common.py", line 446, in gen_to_queue
for item in generator:
File "/home/nhartwic/.conda/envs/new_medaka/lib/python3.6/site-packages/medaka/common.py", line 477, in grouper
batch.append(next(gen))
File "/home/nhartwic/.conda/envs/new_medaka/lib/python3.6/site-packages/medaka/inference.py", line 478, in sample_gen
yield from data_gen.samples
File "/home/nhartwic/.conda/envs/new_medaka/lib/python3.6/site-packages/medaka/features.py", line 607, in samples
self._fill_features()
File "/home/nhartwic/.conda/envs/new_medaka/lib/python3.6/site-packages/medaka/features.py", line 592, in _fill_features
self.bam, self.region, self.rle_ref, self.read_fraction)
File "/home/nhartwic/.conda/envs/new_medaka/lib/python3.6/site-packages/medaka/features.py", line 316, in bam_to_sample
raise NotImplementedError("Filtering alignments by tag is not supported in python code.")
NotImplementedError: Filtering alignments by tag is not supported in python code.

[23:38:02 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 221342.7s
[23:39:47 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 221447.4s
[23:41:34 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 221554.1s
[23:43:20 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 221660.9s
[23:45:07 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 221767.1s
[23:46:58 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 221878.1s
[23:48:47 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 221987.6s
[23:50:36 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 222096.1s
[23:52:23 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 222203.6s
[23:54:10 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 222310.1s
[23:55:55 - PWorker] 100.0% Done (2809.2/2809.2 Mbases) in 222415.4s

The medaka variant process still seems to be running, though it hasn't produced any output in over 12 hours and seems to be hung.

Any plans to support filtering alignments by tag in python code? What causes medaka to fall back to the python implementation for this contig? Is it just length? Is there any way I could modify my data set to run medaka variant?

TrimOverlap: RuntimeError: Unexpected sample relationship

Hi,

I encountered the following error in the TrimOverlap step, using medaka v0.7.0-alpha.1:

Traceback (most recent call last):
  File "/home/wdecoster/anaconda3/envs/medaka/bin/medaka", line 10, in <module>
    sys.exit(main())
  File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/medaka.py", line 350, in main
    args.func(args)
  File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/variant.py", line 472, in variants_from_hdf
    vcf_writer.write_variants(variants, sort=True)
  File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/vcf.py", line 272, in write_variants
    variants = medaka.common.loose_version_sort(variants, key=lambda v: '{}-{}'.format(v.chrom, v.pos))
  File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/common.py", line 708, in loose_version_sort
    it = list(it)
  File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/variant.py", line 206, in decode_variants
    for s, _ in yield_trimmed_consensus_chunks(samples):
  File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/variant.py", line 69, in yield_trimmed_consensus_chunks
    raise RuntimeError('Unexpected sample relationship {} between {} and {}'.format(repr(rel), s1.name, s2.name))
RuntimeError: Unexpected sample relationship <Relationship.reverse_overlap: 'The end of s2 overlaps the start of s1.'> between chr1:198506700.10723-198510472.2 and chr1:198506700.1723-198506883.0
Failed to call variants from consensus chunks.

Please let me know if there is a file which I can share to help you debug this issue. The bam file I'm using is 80x coverage of the human genome...

Cheers,
Wouter

v0.7.0-alpha.1: IndexError tuple index out of range

A medaka_variant process which was running under version v0.7.0-alpha.1 just crashed, error message below.
Let me know if there are any (intermediate) files you would like. This is NA19240 at 20x coverage, full genome. I will restart it but parallelized in windows of 1 Mb using the -r option, so I'm not sure if I'll reproduce this issue. I don't know if this bug will still be present in the most recent version, so feel free to close this if you believe it is already solved.

======================================
Running medaka variant with threshold 1
======================================

[09:50:44 - DataIndex] Loaded sample-index from 1/1 (100.00%) of feature files.
Traceback (most recent call last):
  File "/home/wdecoster/anaconda3/envs/medaka/bin/medaka", line 10, in <module>
    sys.exit(main())
  File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/medaka.py", line 350, in main
    args.func(args)
  File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/variant.py", line 460, in variants_from_hdf
    decoder = decoder_cls(index.meta, ref_vcf=args.ref_vcf)
  File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/variant.py", line 160, in __init__
    self.feature_row_names = [fmt_feat(x) for x in meta['medaka_feature_decoding']]
  File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/variant.py", line 160, in <listcomp>
    self.feature_row_names = [fmt_feat(x) for x in meta['medaka_feature_decoding']]
  File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/medaka/variant.py", line 159, in <lambda>
    fmt_feat = lambda x: '{}{}{}'.format(x[0], 'rev' if x[1] else 'fwd', x[3] * (x[2] if x[2] is not None else '-'))
IndexError: tuple index out of range

program interruption after alignment

I am trying to polish a 1 Gb genome using 70X of ONT reads.

NPROC=$(nproc)
BASECALLS=reads.fastq.gz
DRAFT=assembly.fa
OUTDIR=medaka_consensus
medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${OUTDIR} -t ${NPROC} -m r94

medaka performs the genome indexing and alignment. The bam and bai files are created in OUTDIR.
Then I get the following error:

Creating features
Using TensorFlow backend.
None
Traceback (most recent call last):
File "/usr/local/bioinfo/src/Medaka/medaka-0.3.0/venv/bin/hp_compress", line 11, in
load_entry_point('medaka==0.3.0', 'console_scripts', 'hp_compress')()
File "/usr/local/bioinfo/src/Medaka/medaka-0.3.0/venv/lib/python3.6/site-packages/medaka-0.3.0-py3.6.egg/medaka/compress.py", line 650, in main
args.func(args)
File "/usr/local/bioinfo/src/Medaka/medaka-0.3.0/venv/lib/python3.6/site-packages/medaka-0.3.0-py3.6.egg/medaka/compress.py", line 617, in choose_feature_func
features(args)
File "/usr/local/bioinfo/src/Medaka/medaka-0.3.0/venv/lib/python3.6/site-packages/medaka-0.3.0-py3.6.egg/medaka/compress.py", line 365, in features
opt_str = '\n'.join(['{}: {}'.format(k,v) for k, v in fe_kwargs.items()])
AttributeError: 'NoneType' object has no attribute 'items'

How can I find what went wrong?

Username / password

Hi,

I am trying to clone the repository, and I am being asked for a username and password.
My username and password lead to an "Authentication failed" response.

Cheers,
Vineeth

Medaka uses way too many threads

Hi,
Maybe related to #17
I am running medaka from a Singularity container. I specified 10 threads for medaka, but it seems to use most of the cores available on the node (28), as shown in the attached screenshot.

Dominik
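
A hedged sketch of a possible mitigation, assuming the surplus threads come from OpenMP/BLAS inside numpy and TensorFlow rather than from medaka itself (the file names are placeholders; these environment variables pass into a Singularity container normally):

export OMP_NUM_THREADS=10
export OPENBLAS_NUM_THREADS=1
medaka_consensus -i reads.fastq.gz -d draft.fa -o medaka_out -t 10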

using hdf from medaka_variant for polishing

Hello,

I would like to call variants from reads realigned to an assembly and at the same time polish the assembly. I see that medaka_variant produces an hdf file (round_0_hap_mixed_probs.hdf) similar to the one used in medaka_consensus. Is it advisable to use the same hdf, so one could go directly to stitching with round_0_hap_mixed_probs.hdf?
Or are different mapping parameters used to produce consensus_probs.hdf?

Thanks,
Michel
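
Mechanically, medaka stitch accepts any probabilities HDF, so the call itself would look like the sketch below (the output name is a placeholder); whether the haplotype-mixed probabilities are appropriate for polishing is exactly the open question here:

medaka stitch round_0_hap_mixed_probs.hdf polished_assembly.fasta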

Medaka stitch fails

Hello,

I'm trying to run medaka on my assembly. It seems to work until the stitch step, but then fails with the following error:

Running medaka stitch
[20:25:55 - DataIndex] Loaded sample-index from 1/1 (100.00%) of feature files.
[20:25:56 - Stitch] Processing ctg1.
Traceback (most recent call last):
  File "/ceph/users/lstevens/.conda/envs/medaka_env/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "/ceph/users/lstevens/.conda/envs/medaka_env/lib/python3.6/site-packages/medaka/medaka.py", line 350, in main
    args.func(args)
  File "/ceph/users/lstevens/.conda/envs/medaka_env/lib/python3.6/site-packages/medaka/stitch.py", line 85, in stitch
    joined = stitch_from_probs(args.inputs, regions=args.regions)
  File "/ceph/users/lstevens/.conda/envs/medaka_env/lib/python3.6/site-packages/medaka/stitch.py", line 66, in stitch_from_probs
    end_1_ind, start_2_ind = medaka.common.Sample.overlap_indices(s1, s2)
  File "/ceph/users/lstevens/.conda/envs/medaka_env/lib/python3.6/site-packages/medaka/common.py", line 256, in overlap_indices
    raise OverlapException(msg.format(s1.name, s2.name, repr(rel)))
medaka.common.OverlapException: Cannot overlap samples ctg1:0.0-260.0 and ctg1:266.0-3840.0 with relationhip <Relationship.forward_gapped: 's2 follows s1 with a gab inbetween.'>
Failed to stitch consensus chunks.

The other steps appear to have completed properly (i.e. there is no error output, and the files calls_to_draft.bam, calls_to_draft.bam.bai, and consensus_probs.hdf exist and are not empty).

I'm running medaka with the following command:

medaka_consensus -i [gzipped_read_fastq] -d [raconpolished_fasta] -o medaka -t 32

The assembly is from wtdbg2 and polished 4x with racon using the command suggested in your README.

Medaka version is 0.7.0 and installed using conda.

Any ideas what might be wrong?

Thanks for your help,

Lewis

Unable to use the specified model

Hi,

I am trying to use my trained model by following the commands in the Walkthrough. In the last step, the medaka_consensus command fails with the following error message.

RuntimeError: Filepath for '--model' argument does not exist and is not a known model ID (training/model.best.val.hdf5)
Failed to run medaka consensus.

To use the model, I run medaka_consensus as for the default model, specifying the model using the -m option:

cd ${WALKTHROUGH}
source ${MEDAKA}
CONSENSUS=consensus_trained
MODEL=${TRAINNAME}/model.best.val.hdf5
medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${CONSENSUS} -t ${NPROC} -m ${MODEL}

full message below.

(medaka) [JH@bip7 medaka_walkthrough]$ cd ${WALKTHROUGH}
(medaka) [JH@bip7 medaka_walkthrough]$ source ${MEDAKA}
(medaka) [JH@bip7 medaka_walkthrough]$ CONSENSUS=consensus_trained
(medaka) [JH@bip7 medaka_walkthrough]$ MODEL=${TRAINNAME}/model.best.val.hdf5
(medaka) [JH@bip7 medaka_walkthrough]$ medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${CONSENSUS} -t ${NPROC} -m ${MODEL}
Checking program versions
Program Version Required Pass
bgzip 1.9 1.9 True
minimap2 2.11 2.11 True
samtools 1.9 1.9 True
tabix 1.9 1.9 True
Warning: Output consensus_trained already exists, may use old results.
Not aligning basecalls to draft, calls_to_draft.bam exists.
Running medaka consensus
Traceback (most recent call last):
File "/b_disk/JH/medaka_walkthrough/medaka/venv/lib/python3.6/site-packages/medaka-0.6.2-py3.6-linux-x86_64.egg/medaka/medaka.py", line 37, in call
val = model_dict[val]
KeyError: 'training/model.best.val.hdf5'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/b_disk/JH/medaka_walkthrough/medaka/venv/bin/medaka", line 11, in
load_entry_point('medaka==0.6.2', 'console_scripts', 'medaka')()
File "/b_disk/JH/medaka_walkthrough/medaka/venv/lib/python3.6/site-packages/medaka-0.6.2-py3.6-linux-x86_64.egg/medaka/medaka.py", line 249, in main
args = parser.parse_args()
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1730, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1762, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1950, in _parse_known_args
positionals_end_index = consume_positionals(start_index)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1927, in consume_positionals
take_action(action, args)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1836, in take_action
action(self, namespace, argument_values, option_string)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1133, in call
subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1762, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1968, in _parse_known_args
start_index = consume_optional(start_index)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1908, in consume_optional
take_action(action, args, option_string)
File "/home/JH/miniconda3/lib/python3.6/argparse.py", line 1836, in take_action
action(self, namespace, argument_values, option_string)
File "/b_disk/JH/medaka_walkthrough/medaka/venv/lib/python3.6/site-packages/medaka-0.6.2-py3.6-linux-x86_64.egg/medaka/medaka.py", line 41, in call
self.dest, val)
RuntimeError: Filepath for '--model' argument does not exist and is not a known model ID (training/model.best.val.hdf5)
Failed to run medaka consensus.

So I tried the other model files under medaka/medaka/data/ as a test.

When I changed to r941_flip235_model.hdf5 or r941_trans_model.hdf5, the run finished successfully.

medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${CONSENSUS} -t ${NPROC} -m medaka/medaka/data/r941_flip235_model.hdf5

medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${CONSENSUS} -t ${NPROC} -m medaka/medaka/data/r941_trans_model.hdf5

When I changed to r941_213_model.hdf5, the same error message appeared.

medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o training_213 -t ${NPROC} -m medaka/medaka/data/r941_213_model.hdf5

error message

RuntimeError: Filepath for '--model' argument does not exist and is not a known model ID (/b_disk/JH/medaka_walkthrough/medaka/medaka/data/r941_213_model.hdf5)
Failed to run medaka consensus.

I want to train the model on my own data, but following the instructions in the Walkthrough I get the error message above. I want to know how to solve this problem.
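
One thing worth ruling out (an assumption about the cause, not a confirmed fix): the error says the value is treated as neither an existing file nor a known model ID, which can happen when a relative path is resolved from a different working directory. Passing an absolute path makes the location unambiguous:

MODEL=$(readlink -f ${TRAINNAME}/model.best.val.hdf5)
medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${CONSENSUS} -t ${NPROC} -m ${MODEL}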

While following the instructions, I also found a small problem: there is no --max_label_len parameter for features, but it is used in train. Removing the --max_label_len parameter from the command should not affect the final result.

cd ${WALKTHROUGH}
source ${MEDAKA}
REFNAME=utg000001c
TRAINEND=3762624
TRAINFEATURES=train_features.hdf
FRACTION="0.1 1"
BATCHSIZE=200
MODEL_FEAT_OPT=medaka/medaka/data/medaka_model.hdf5
medaka features ${CALLS2DRAFT}.bam ${TRAINFEATURES} --truth ${TRUTH2DRAFT}.bam --threads ${NPROC} --region ${REFNAME}:-${TRAINEND} --batch_size ${BATCHSIZE} --read_fraction ${FRACTION} --chunk_len 1000 --chunk_ovlp 0 --model ${MODEL_FEAT_OPT} --max_label_len 1

Thank you.

0.6.0-alpha.1 version ModuleNotFoundError

Hi,

I am trying to run the 0.6.0-alpha.1 version of medaka to look at variant calling. I installed from source but upon running I get the following error:

Running medaka consensus /home/dct7/medaka/scripts/test.reads.sorted.bam
======================================

Traceback (most recent call last):
  File "/home/dct7/medaka/venv/bin/medaka", line 11, in <module>
    load_entry_point('medaka==0.6.0a1', 'console_scripts', 'medaka')()
  File "/home/dct7/medaka/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/home/dct7/medaka/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2793, in load_entry_point
    return ep.load()
  File "/home/dct7/medaka/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2411, in load
    return self.resolve()
  File "/home/dct7/medaka/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2417, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/dct7/medaka/venv/lib/python3.6/site-packages/medaka-0.6.0a1-py3.6.egg/medaka/medaka.py", line 11, in <module>
    from medaka.inference import train, predict
  File "/home/dct7/medaka/venv/lib/python3.6/site-packages/medaka-0.6.0a1-py3.6.egg/medaka/inference.py", line 22, in <module>
    from medaka.features import SampleGenerator
  File "/home/dct7/medaka/venv/lib/python3.6/site-packages/medaka-0.6.0a1-py3.6.egg/medaka/features.py", line 23, in <module>
    import libmedaka
ModuleNotFoundError: No module named 'libmedaka'
Failed to run medaka consensus.

Any suggestions?

Many thanks,
Damien

Polishing "short" references

I would like to perform polishing using medaka on "short" references, which are approximately 1400 bp long. Medaka breaks when I attempt to do this and the error seems to be connected to the length of my references/sequences:


...
[13:20:48 - root] Creating consensus features.
[13:20:48 - root] Got regions:
NODE_1_length_1395_cov_24.495757:0-1395
[13:20:48 - root] Got read fraction None
[13:20:48 - root] Writing samples to features.hdf
[13:20:48 - root] Processed NODE_1_length_1395_cov_24.495757:1.0-1394.0 (median depth 86.0)
[13:20:48 - medaka.compress] Skipping sample NODE_1_length_1395_cov_24.495757:1.0-1394.0 which has 3412 columns < min 10000.
[13:20:48 - root] Label counts:
...

The reference data set in this case only has one sequence, and it seems to be skipped here.

Would it be possible to change the '< min 10000' setting without breaking downstream processing?

Thanks for developing and making medaka available to us!

Best regards
Søren
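
A hedged workaround sketch: the "min 10000" threshold is tied to the feature chunk length, so requesting smaller chunks may let a ~1400 bp reference through. The --chunk_len/--chunk_ovlp flags are taken from the features command shown elsewhere in this document and are assumed to apply here; whether downstream steps tolerate such small chunks is untested:

medaka consensus calls_to_draft.bam consensus_probs.hdf \
    --model r941_min_high --chunk_len 500 --chunk_ovlp 100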

medaka train never finish

Dear developer,
I'm using medaka train with the epochs option set to 10, but I noticed that the software never finishes. It has been doing nothing for a day. Additionally, the medaka fix command does not exist.

Cheers
Luigi

Medaka consensus Error: medaka.common.OverlapException

Hello,

We are getting an error at the stitch.py step of medaka 0.7.0, trying to run on a CHM13 sample.

These are the two commands we are running:

medaka consensus \
--model r941_flip213 \
--threads 64 \
<path_to>/CHM13.shasta.racon4x.hg38_chrX.bam \
<path_to>/CHM13_medaka_consensus_prob.hdf 2>&1 | tee <path_to>/consensus.log

This ends successfully. Then we run:

medaka stitch \
<path_to>/CHM13_medaka_consensus_prob.hdf \ 
<path_to>/CHM13_shasta_racon_medaka_consensus.fasta

The Log:

[14:52:39 - DataIndex] Loaded sample-index from 1/1 (100.00%) of feature files.
[14:52:40 - Stitch] Processing 1022.
[14:52:40 - Stitch] Processing 1096.
[14:52:40 - Stitch] Processing 138.
[14:52:47 - Stitch] Processing 156.
[14:53:12 - Stitch] Processing 1562.
[14:53:12 - Stitch] Processing 1564.
[14:53:12 - Stitch] Processing 1622.
[14:53:13 - Stitch] Processing 164.
[14:54:24 - Stitch] Processing 180.
[14:54:39 - Stitch] Processing 280.
[14:54:41 - Stitch] Processing 342.
[14:54:41 - Stitch] Processing 358.
[14:54:41 - Stitch] Processing 36.
[14:57:42 - Stitch] Processing 360.
[14:57:43 - Stitch] Processing 40.
[15:01:08 - Stitch] Processing 44.
Traceback (most recent call last):
  File "/home/kishwar/software/medaka/venv/bin/medaka", line 11, in <module>
    load_entry_point('medaka==0.7.0', 'console_scripts', 'medaka')()
  File "/home/kishwar/software/medaka/venv/lib/python3.6/site-packages/medaka-0.7.0-py3.6-linux-x86_64.egg/medaka/medaka.py", line 350, in main
    args.func(args)
  File "/home/kishwar/software/medaka/venv/lib/python3.6/site-packages/medaka-0.7.0-py3.6-linux-x86_64.egg/medaka/stitch.py", line 85, in stitch
    joined = stitch_from_probs(args.inputs, regions=args.regions)
  File "/home/kishwar/software/medaka/venv/lib/python3.6/site-packages/medaka-0.7.0-py3.6-linux-x86_64.egg/medaka/stitch.py", line 66, in stitch_from_probs
    end_1_ind, start_2_ind = medaka.common.Sample.overlap_indices(s1, s2)
  File "/home/kishwar/software/medaka/venv/lib/python3.6/site-packages/medaka-0.7.0-py3.6-linux-x86_64.egg/medaka/common.py", line 256, in overlap_indices
    raise OverlapException(msg.format(s1.name, s2.name, repr(rel)))
medaka.common.OverlapException: Cannot overlap samples 44:54257964.0-54264156.0 and 44:54264162.0-54265439.6 with relationhip <Relationship.forward_gapped: 's2 follows s1 with a gab inbetween.'>

Things we have tried:

  • Different builds (on different machines)
  • Different thread counts
  • Different models
  • Whole genome execution of CHM13

I have made these two files available in case you want to look into this:

wget https://storage.googleapis.com/kishwar-helen/medaka_error_issue/CHM13.shasta.racon4x.hg38_chrX.bam
wget https://storage.googleapis.com/kishwar-helen/medaka_error_issue/CHM13.shasta.racon_4x.hg38_chrX.fa

Please let me know if you can help in this regard.
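One hedged way to make progress, grounded only in the --regions argument visible in the traceback (args.regions), is to stitch contigs one at a time so the failing sample pair can be isolated and the remaining contigs recovered:

medaka stitch <path_to>/CHM13_medaka_consensus_prob.hdf contig44.fasta --regions 44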

Running medaka on GridION, tensorflow-gpu not compatible with GPU

I removed TensorFlow and installed the GPU version, but when running medaka_consensus I get a crash (below).

I installed medaka (0.6.0-py36h2b5150b_0) and tensorflow-gpu (1.12.0-h0d30ee6_0) using miniconda in a py3 conda env, but I do not dare install CUDA software for fear of interfering with the original ONT software on the GridION.

Is there a way to run medaka with a GPU on my GridION (2× GeForce GTX 1080 Ti), or should I use CPUs only? (See the sketch after the log below.)

Thanks

...
Running medaka consensus
Using TensorFlow backend.
[13:01:54 - Predict] Processing region(s): tig00000001:0-1469063 tig00000003:0-25183 tig00000005:0-1053078 tig00000009:0-1042324 tig00000012:0-24609 tig00000014:0-988579 tig00000016:0-875005 tig00000019:0-1047544 tig00000022:0-873730 tig00000027:0-920396 tig00000031:0-676909 tig00000035:0-748747 tig00000037:0-764060 tig00000039:0-209276 tig00000042:0-653884 tig00000058:0-692998 tig00000059:0-772142 tig00000062:0-816770 tig00000066:0-553488 tig00000067:0-580323 tig00000068:0-530610 tig00000077:0-409975 tig00000079:0-417209 tig00000080:0-429052 tig00000082:0-357446 tig00000093:0-308749 tig00000095:0-36675 tig00000098:0-265607 tig00000102:0-200009 tig00000105:0-261715 tig00000109:0-104085 tig00000112:0-36029 tig00000114:0-40744 tig00000117:0-27693 tig00000119:0-179847 tig00000150:0-12418 tig00000156:0-23170 tig00000159:0-16860 tig00000161:0-1825 tig00000170:0-3380 tig00000172:0-58317 tig00000191:0-1836 tig00000202:0-15093 tig00000210:0-9564 tig00000213:0-16630 tig00000276:0-3889 tig00000286:0-560358 tig00000287:0-68667 tig00000288:0-462709 tig00000289:0-13786 tig00000290:0-581156 tig00000291:0-208241 tig00007300:0-12563 tig00007301:0-1285605 tig00007302:0-954263 tig00007303:0-16559 tig00007304:0-795945 tig00007305:0-35880 tig00007306:0-30213 tig00007307:0-38796 tig00007308:0-6443 tig00007309:0-4305 tig00007310:0-24086 tig00007311:0-20001
/opt/miniconda3/envs/py3/lib/python3.6/site-packages/medaka/datastore.py:131: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  return {g: yaml.load(self.fh[g][()]) for g in groups if g in self.fh}
[13:01:54 - Predict] Setting tensorflow threads to 8.
Traceback (most recent call last):
  File "/opt/miniconda3/envs/py3/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "/opt/miniconda3/envs/py3/lib/python3.6/site-packages/medaka/medaka.py", line 261, in main
    args.func(args)
  File "/opt/miniconda3/envs/py3/lib/python3.6/site-packages/medaka/inference.py", line 535, in predict
    inter_op_parallelism_threads=args.threads)
  File "/opt/miniconda3/envs/py3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1551, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/opt/miniconda3/envs/py3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 676, in __init__
    self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
Failed to run medaka consensus.
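The last line is the key one: the NVIDIA kernel driver on the machine is older than the CUDA runtime bundled with tensorflow-gpu 1.12 (which, as an assumption here, targets CUDA 9.0 and so needs driver >= 384.81). Two hedged steps that avoid touching the GridION's ONT software stack:

# report the installed driver version and visible GPUs
nvidia-smi
# fall back to CPU-only execution without reinstalling anything
CUDA_VISIBLE_DEVICES="" medaka_consensus -i reads.fastq -d draft.fa -o medaka_out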

medaka GPU

Hi,

I am currently testing medaka 0.6.2 on GPUs using the PromethION machine with its Tesla V100 cards.

For testing I subset the original file, working only with data from a single contig to polish: ctg55_21516.

HDF production seemed to run through, but an error was thrown after it reached the last contig:

Checking program versions
Program    Version    Required   Pass
bgzip      1.9        1.9        True
minimap2   2.11       2.11       True
samtools   1.9        1.9        True
tabix      1.9        1.9        True
Warning: Output ctg already exists, may use old results.
Not aligning basecalls to draft, calls_to_draft.bam exists.
Running medaka consensus
Using TensorFlow backend.
[14:25:43 - Predict] Processing region(s): ctg1000_107:0-12339 ctg1002_24:0-14544 ctg1004.....
[14:25:43 - Predict] Setting tensorflow threads to 1.
[14:25:43 - Predict] Found 1557 long and 256 short regions.
[14:25:43 - Predict] Processing long regions.
[14:25:43 - ModelLoad] Building model (steps, features, classes): (10000, 10, 5)
[14:25:43 - ModelLoad] With cudnn: True
[14:25:43 - ModelLoad] Loading weights from /home/prom/.conda/envs/medaka/lib/python3.6/site-packages/medaka/data/r941_flip235_model.hdf5
[14:25:43 - PWorker] Running inference for 673.1M draft bases.
[14:25:43 - Sampler] Initializing sampler for consensus or region ctg1000_107:0-12339.
[14:25:43 - Feature] Pileup-feature is zero-length for ctg1000_107:0-12339 indicating no reads in this region.
[14:25:43 - Sampler] Took 0.01s to make features.....


2019-04-09 13:39:55.779570: E tensorflow/stream_executor/cuda/cuda_dnn.cc:82] OOM when allocating tensor with shape[1536000000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/home/prom/.conda/envs/medaka/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "/home/prom/.conda/envs/medaka/lib/python3.6/site-packages/medaka/medaka.py", line 261, in main
    args.func(args)
  File "/home/prom/.conda/envs/medaka/lib/python3.6/site-packages/medaka/inference.py", line 560, in predict
    tag_name=args.tag_name, tag_value=args.tag_value, tag_keep_missing=args.tag_keep_missing
  File "/home/prom/.conda/envs/medaka/lib/python3.6/site-packages/medaka/inference.py", line 489, in run_prediction
    class_probs = model.predict_on_batch(x_data)
  File "/home/prom/.conda/envs/medaka/lib/python3.6/site-packages/keras/engine/training.py", line 1274, in predict_on_batch
    outputs = self.predict_function(ins)
  File "/home/prom/.conda/envs/medaka/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/home/prom/.conda/envs/medaka/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/home/prom/.conda/envs/medaka/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/home/prom/.conda/envs/medaka/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 3, 0, 0 , [num_layers, input_size, num_units, dir_count, seq_length, batch_size]: [1, 256, 128, 1, 10000, 200]
         [[{{node bidirectional_2/CudnnRNN}} = CudnnRNN[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="gru", seed=87654321, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bidirectional_2/transpose, bidirectional_2/ExpandDims_1, bidirectional_2/Const, bidirectional_2/concat)]]
Failed to run medaka consensus.

I wonder whether this is related to the subsetting, or whether there is some issue with the tensorflow-gpu installation, as tensorflow.python.framework.errors_impl.InternalError: Failed to call ThenRnnForward with model config: seems to indicate? (A possible mitigation is sketched after the stdout below.)

Stdout for ctg55:

[13:39:26 - Sampler] Initializing sampler for consensus or region ctg54_1:999000-2000000.
[13:39:26 - Feature] Pileup-feature is zero-length for ctg54_1:999000-2000000 indicating no reads in this region.
[13:39:26 - Sampler] Took 0.02s to make features.
[13:39:26 - Sampler] Initializing sampler for consensus or region ctg54_1:1999000-2207334.
[13:39:26 - Feature] Pileup-feature is zero-length for ctg54_1:1999000-2207334 indicating no reads in this region.
[13:39:26 - Sampler] Took 0.00s to make features.
[13:39:26 - Sampler] Initializing sampler for consensus or region ctg55_21516:0-1000000.
[13:39:32 - Feature] Processed ctg55_21516:0.0-1000000.3 (median depth 59.0)
[13:39:32 - Sampler] Took 5.60s to make features.
[13:39:32 - Sampler] Initializing sampler for consensus or region ctg55_21516:999000-2000000.
[13:39:37 - Feature] Processed ctg55_21516:999000.0-2000000.1 (median depth 56.0)
[13:39:37 - Sampler] Took 5.21s to make features.
[13:39:37 - Sampler] Initializing sampler for consensus or region ctg55_21516:1999000-3000000.
[13:39:39 - Feature] Pileup counts do not span requested region, requested ctg55_21516:1999000-3000000, received 1999000-2207333.
[13:39:39 - Feature] Processed ctg55_21516:1999000.0-2207334.0 (median depth 57.0)
[13:39:39 - Sampler] Took 1.56s to make features.
[13:39:39 - Sampler] Initializing sampler for consensus or region ctg55_21516:2999000-4000000.
[13:39:39 - Feature] Pileup-feature is zero-length for ctg55_21516:2999000-4000000 indicating no reads in this region.
[13:39:39 - Sampler] Took 0.37s to make features.
[13:39:39 - Sampler] Initializing sampler for consensus or region ctg55_21516:3999000-5000000.
[13:39:40 - Feature] Pileup-feature is zero-length for ctg55_21516:3999000-5000000 indicating no reads in this region.
[13:39:40 - Sampler] Took 0.34s to make features.
[13:39:40 - Sampler] Initializing sampler for consensus or region ctg55_21516:4999000-6000000.
[13:39:40 - Feature] Pileup-feature is zero-length for ctg55_21516:4999000-6000000 indicating no reads in this region.
[13:39:40 - Sampler] Took 0.43s to make features.
[13:39:40 - Sampler] Initializing sampler for consensus or region ctg55_21516:5999000-7000000.
[13:39:41 - Feature] Pileup-feature is zero-length for ctg55_21516:5999000-7000000 indicating no reads in this region.
[13:39:41 - Sampler] Took 0.42s to make features.
[13:39:41 - Sampler] Initializing sampler for consensus or region ctg55_21516:6999000-8000000.
[13:39:41 - Feature] Pileup-feature is zero-length for ctg55_21516:6999000-8000000 indicating no reads in this region.
[13:39:41 - Sampler] Took 0.40s to make features.
[13:39:41 - Sampler] Initializing sampler for consensus or region ctg55_21516:7999000-9000000.
[13:39:41 - Feature] Pileup-feature is zero-length for ctg55_21516:7999000-9000000 indicating no reads in this region.
[13:39:41 - Sampler] Took 0.17s to make features.
[13:39:41 - Sampler] Initializing sampler for consensus or region ctg55_21516:8999000-10000000.
[13:39:41 - Feature] Pileup-feature is zero-length for ctg55_21516:8999000-10000000 indicating no reads in this region.
[13:39:41 - Sampler] Took 0.02s to make features.
[13:39:41 - Sampler] Initializing sampler for consensus or region ctg55_21516:9999000-11000000.
[13:39:41 - Feature] Pileup-feature is zero-length for ctg55_21516:9999000-11000000 indicating no reads in this region.
[13:39:41 - Sampler] Took 0.01s to make features.
[13:39:41 - Sampler] Initializing sampler for consensus or region ctg55_21516:10999000-12000000.
[13:39:41 - Feature] Pileup-feature is zero-length for ctg55_21516:10999000-12000000 indicating no reads in this region.
[13:39:41 - Sampler] Took 0.02s to make features.
[13:39:41 - Sampler] Initializing sampler for consensus or region ctg55_21516:11999000-13000000.
[13:39:42 - Feature] Pileup-feature is zero-length for ctg55_21516:11999000-13000000 indicating no reads in this region.
[13:39:42 - Sampler] Took 0.02s to make features.
[13:39:42 - Sampler] Initializing sampler for consensus or region ctg55_21516:12999000-14000000.
[13:39:42 - Feature] Pileup-feature is zero-length for ctg55_21516:12999000-14000000 indicating no reads in this region.
[13:39:42 - Sampler] Took 0.03s to make features.
[13:39:42 - Sampler] Initializing sampler for consensus or region ctg55_21516:13999000-14035920.
[13:39:42 - Feature] Pileup-feature is zero-length for ctg55_21516:13999000-14035920 indicating no reads in this region.
[13:39:42 - Sampler] Took 0.01s to make features.
[13:39:42 - Sampler] Initializing sampler for consensus or region ctg57_1:0-1000000.
[13:39:42 - Feature] Pileup-feature is zero-length for ctg57_1:0-1000000 indicating no reads in this region.
[13:39:42 - Sampler] Took 0.02s to make features.
[13:39:42 - Sampler] Initializing sampler for consensus or region ctg57_1:999000-2000000.
[13:39:42 - Feature] Pileup-feature is zero-length for ctg57_1:999000-2000000 indicating no reads in this region.
[13:39:42 - Sampler] Took 0.01s to make features.
[13:39:42 - Sampler] Initializing sampler for consensus or region ctg57_1:1999000-3000000.
[13:39:42 - Feature] Pileup-feature is zero-length for ctg57_1:1999000-3000000 indicating no reads in this region.
[13:39:42 - Sampler] Took 0.02s to make features.
[13:39:42 - Sampler] Initializing sampler for consensus or region ctg57_1:2999000-4000000.
[13:39:42 - Feature] Pileup-feature is zero-length for ctg57_1:2999000-4000000 indicating no reads in this region.
[13:39:42 - Sampler] Took 0.03s to make features.
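The OOM allocation (a float tensor of shape [1536000000], roughly 6 GB) together with the reported config [..., seq_length, batch_size]: [1, 256, 128, 1, 10000, 200] suggests the inference batch simply does not fit in GPU memory, rather than a broken tensorflow-gpu installation. A hedged first step is to reduce the batch size; the -b option of medaka_consensus (and --batch_size of medaka consensus) is assumed here to control it:

# halve the inference batch size to reduce GPU memory pressure
medaka_consensus -i reads.fastq -d draft.fa -o medaka_out -b 100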

medaka to output unpolished sequences

After polishing with medaka it appears that I have fewer sequences than I put in. I assume that medaka discards some of them for some reason, e.g. because they do not get polished.

Any chance you could include an option to keep all sequences, similar to the --include-unpolished option in racon?
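Until such an option exists, a hedged shell workaround is to append draft contigs that are missing from medaka's output (assumes samtools is on the PATH and that the illustrative file names match yours):

# collect sequence names present in each file
grep '^>' consensus.fasta | sed 's/^>//; s/ .*//' | sort > polished_ids.txt
grep '^>' draft.fasta | sed 's/^>//; s/ .*//' | sort > draft_ids.txt
# names only present in the draft, i.e. contigs medaka dropped
comm -13 polished_ids.txt draft_ids.txt > missing_ids.txt
# extract those contigs from the draft and append them unpolished
xargs samtools faidx draft.fasta < missing_ids.txt >> consensus.fasta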

medaka_consensus compatible with guppy v3.0.3

Hi,
Just wanted to check whether the basecalling models released as part of guppy v3.0.3 are compatible with the medaka models, specifically the r941_flip235 model. As of 3.0.3 all of the r9.4.1 models are flip-flop models, so are they the same as those medaka was trained against? The reason I ask is that I noticed a very slight drop in accuracy (99.87 -> 99.85!) when re-basecalling the same data with v3.0.3.

Thanks.
Damien

medaka variant produces unsorted vcf, crashing whatshap

I've noticed more than once now that medaka_variant produces an unsorted VCF file, which then crashes the downstream whatshap phase step with the following error:

Traceback (most recent call last):
  File "/home/wdecoster/anaconda3/envs/medaka/bin/whatshap", line 12, in <module>
    sys.exit(main())
  File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/whatshap/__main__.py", line 83, in main
    module.main(args)
  File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/whatshap/phase.py", line 1114, in main
    run_whatshap(**vars(args))
  File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/whatshap/phase.py", line 663, in run_whatshap
    for variant_table in vcf_reader:
  File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/whatshap/vcf.py", line 329, in __iter__
    yield self._process_single_chromosome(chromosome, records)
  File "/home/wdecoster/anaconda3/envs/medaka/lib/python3.6/site-packages/whatshap/vcf.py", line 379, in _process_single_chromosome
    raise VcfNotSortedError('VCF not ordered: {}:{} appears before {}:{}'.format(chromosome, prev_position+1, chromosome, pos+1))
whatshap.vcf.VcfNotSortedError: VCF not ordered: chr4:155958777 appears before chr4:155957327

Cheers,
Wouter
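As a stop-gap until the generating code is fixed, sorting the VCF before phasing should unblock whatshap; the file names here are illustrative and bcftools is assumed to be available:

bcftools sort unsorted.vcf -o sorted.vcf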

Label encoding

Hi,

I got medaka running and successfully trained it on one of our datasets. I stopped the training at epoch 108 as acc was no longer improving (it had reached 1.0000, and val_acc was at 0.9996).

On trying to generate the consensus, I end up with this error:

Traceback (most recent call last):
  File "/home/ngs/medaka/venv/bin/medaka", line 11, in <module>
    load_entry_point('medaka==0.1.0', 'console_scripts', 'medaka')()
  File "/home/ngs/medaka/venv/lib/python3.5/site-packages/medaka-0.1.0-py3.5.egg/medaka/medaka.py", line 69, in main
    args.func(args)
  File "/home/ngs/medaka/venv/lib/python3.5/site-packages/medaka-0.1.0-py3.5.egg/medaka/inference.py", line 372, in predict
    model, encoding = load_model_hdf(args.model, args.encoding, need_encoding=True)
  File "/home/ngs/medaka/venv/lib/python3.5/site-packages/medaka-0.1.0-py3.5.egg/medaka/inference.py", line 127, in load_model_hdf
    raise KeyError("Could not find label encodings in the model, please provide an encoding json")
KeyError: 'Could not find label encodings in the model, please provide an encoding json'
Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x7f7d0b991c50>>
Traceback (most recent call last):
  File "/home/ngs/medaka/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 696, in __del__
TypeError: 'NoneType' object is not callable

Since I see that the alphabet considered is A,T,G,C,* and the draft reference I use has an N in it, could this be the problem?

Cheers and TIA,
Vineeth
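To test the N hypothesis, a quick hedged check for characters outside A/C/G/T in the draft (assumes an uncompressed FASTA; the file name is illustrative):

# count sequence lines containing characters outside A, C, G, T
grep -v '^>' draft.fasta | grep -c '[^ACGTacgt]'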

medaka stitch runtime

Dear medaka developers,

I am running medaka on a 2.7 Gb genome, polishing with about 20× nanopore data after Racon polishing.

Stitching has now been running for 1.5 days. I gave it 32 threads to work with. Is this an expected runtime? It seems very slow for the last step.

I am a bit worried, as I have larger datasets to polish in the future: a 3 Gb genome with about 60× nanopore data.
Could you tell me how to improve performance with medaka to speed up the process?
Racon presents another major runtime bottleneck, but I can parallelise it by chopping up the genome. Does something similar work for medaka? (See the sketch after this message.)
As medaka creates the alignments itself, I am not sure whether a plain minimap2 BAM could be fed in separately for subsequences of the genome.
Any tips would be appreciated,
Michel
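Regarding parallelisation: medaka's inference step can be restricted to named contigs, so a hedged DIY split (flag names assumed from medaka consensus --help; stitch accepts several feature files) looks like:

# run inference per contig, in parallel, against the same alignment file
medaka consensus calls_to_draft.bam ctg1.hdf --model r941_flip235 --regions ctg1 &
medaka consensus calls_to_draft.bam ctg2.hdf --model r941_flip235 --regions ctg2 &
wait
# stitch all feature files back into a single polished assembly
medaka stitch ctg1.hdf ctg2.hdf polished.fasta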

Failed to stitch consensus chunks.

Hi,

When running medaka, the stitching step seems to run into some kind of exception with the assembly we have:

[03:18:31 - Stitch] Processing ctg1641.
[03:18:34 - Stitch] Processing ctg1642.
[03:18:42 - Stitch] There is no overlap betwen ctg1642:32609.1-37277.1643 and ctg1642:37276.3367-41429.0
[03:18:43 - Stitch] ctg1642:37276.644-39589.1 ends before ctg1642:37276.3367-41429.0, skipping.
ctg127:259136.66-261946.5049 and ctg127:261945.13050-264482.0 do not overlap
ctg148:271139.1918-271157.3508 and ctg148:271156.11509-273932.0 do not overlap
ctg149:1355570.47-1356768.7744 and ctg149:1356767.15745-1362547.0 do not overlap
ctg1642:32609.1-37277.1643 and ctg1642:37276.3367-41429.0 do not overlap
Traceback (most recent call last):
 File "/local/genome/packages/anaconda3/latest/bin/medaka", line 11, in <module>
   sys.exit(main())
 File "/mnt/users/tinagr/.local/lib/python3.6/site-packages/medaka/medaka.py", line 213, in main
   args.func(args)
 File "/mnt/users/tinagr/.local/lib/python3.6/site-packages/medaka/stitch.py", line 80, in stitch
   joined = stitch_from_probs(args.inputs, regions=args.regions)
 File "/mnt/users/tinagr/.local/lib/python3.6/site-packages/medaka/stitch.py", line 71, in stitch_from_probs
   logger.info(msg.format(s1.name, s2.name))
AttributeError: 'NoneType' object has no attribute 'name'
Failed to stitch consensus chunks.

I thought it's similar to #20, but we are already using medaka version 0.5.0.

command used:

DRAFT=wtdbg2-medakaPolish_p19-k0-L10k-AS3-s0p05-e3.ctg.fa
BASECALLS=1-fastp-guppyNewVer-Recall-bases/Cod.recalled.4kb.q7.50h.fastq.gz
OUTDIR=medaka_trial_j

medaka_consensus -i ${BASECALLS} -d ${DRAFT} -o ${OUTDIR} -t 25 -m 'r941_flip'

Thank you,
Michel

medaka_variant is not available

Hi, I installed medaka into /python/Python-3.6.3/bin using:
pip install medaka

I wanted to use medaka_variant to call variants, but I didn't find it in the installed directory:

ls medaka*
medaka medaka_consensus medaka_counts medaka_data_path medaka_version_report

Did I miss something?
Thanks in advance!
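medaka_variant was introduced after the early releases, so one hedged explanation is simply that the installed PyPI version predates it; upgrading and re-checking is cheap:

pip install --upgrade medaka
ls /python/Python-3.6.3/bin/medaka*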

Pre-built model?

Enhancement request: I'm keen to give Medaka a try, but I was wondering if ONT could provide a pre-trained model to use? Perhaps a model trained on the same datasets ONT uses to train basecallers? Or maybe separate models based on organism type: bacterial, human, etc.

I saw the medaka/test/data/test_model.hdf5 file and assumed that was just for running the tests. But let me know if it is appropriate to use that model more generally.

Ryan
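Recent medaka releases do bundle pre-trained models and select one by default; a hedged way to see what ships with an install (the list_models subcommand is assumed from current releases) is:

medaka tools list_models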
