algolab / malvirus

A fast and accurate tool for genotyping haploid individuals (such as SARS-CoV-2)

Home Page: https://algolab.github.io/MALVIRUS

License: GNU General Public License v3.0

Python 11.97% Dockerfile 1.05% Shell 0.18% HTML 0.34% JavaScript 43.97% CSS 1.55% C 9.18% Less 0.41% Makefile 0.23% C++ 31.11%
genotyping genotyping-by-sequencing alignment-free sars-cov-2 virology kmer

malvirus's Introduction

MALVIRUS

MALVIRUS is a fast and accurate tool for genotyping haploid individuals that requires neither assembling the reads nor mapping them to a reference genome. It is tailored to work with virological data (including but not limited to SARS-CoV-2) and can genotype an individual directly from sequencing data in minutes.

MALVIRUS is divided into two logically distinct steps: the creation of a variant catalog from a set of assemblies and the genotype calling. The first step is based on mafft [1] and snp-sites [2], whereas the second step is based on KMC [3], MALVA [4], and SnpEff [5].

The variant catalog can be built once and reused for genotyping multiple individuals.

Please see the help directory for additional details.

MALVIRUS is distributed as a Docker image and is publicly available on GitHub and Docker Hub under the terms of the GNU General Public License version 3 or later. MALVIRUS was mainly developed and tested under Ubuntu GNU/Linux version 18.04 but works wherever Docker is available.

Citation

MALVIRUS: an integrated web application for viral variant calling
Simone Ciccolella, Luca Denti, Paola Bonizzoni, Gianluca Della Vedova, Yuri Pirola, Marco Previtali
bioRxiv 2020.05.05.076992; doi: 10.1101/2020.05.05.076992

References

[1] Katoh, Kazutaka, and Daron M. Standley. 2013. “MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability.” Molecular Biology and Evolution 30 (4): 772–80. doi:10.1093/molbev/mst010.

[2] Page, Andrew J., Ben Taylor, Aidan J. Delaney, Jorge Soares, Torsten Seemann, Jacqueline A. Keane, and Simon R. Harris. 2016. “SNP-Sites: Rapid Efficient Extraction of SNPs from Multi-FASTA Alignments.” Microbial Genomics 2 (4). doi:10.1099/mgen.0.000056.

[3] Kokot, Marek, Maciej Dlugosz, and Sebastian Deorowicz. 2017. “KMC 3: counting and manipulating k-mer statistics.” Bioinformatics 33 (17): 2759–61. doi:10.1093/bioinformatics/btx304.

[4] Denti, Luca, Marco Previtali, Giulia Bernardini, Alexander Schönhuth, and Paola Bonizzoni. 2019. “MALVA: Genotyping by Mapping-Free Allele Detection of Known Variants.” iScience 18: 20–27. doi:10.1016/j.isci.2019.07.011.

[5] Cingolani, Pablo, et al. 2012. “A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.” Fly 6 (2): 80–92. doi:10.4161/fly.19695.

malvirus's People

Contributors

dependabot[bot], gdv, ldenti, mpre, yp

malvirus's Issues

Dockerfile new version

With commit e45a729 I fixed issues #1 and #2, though perhaps not in the best way. I summarize the changes here. If you disagree, we can find another way:

  1. I forgot that, during my various tests, I had changed the Miniconda version to "latest". The latest is for py37, so it works. @yp, was there a particular reason you had pinned the version?
    https://github.com/ldenti/malva_covid_service/blob/e45a729664b53c66dd4b83707e4228b24c5f1691/Dockerfile#L12

  2. I removed snakemake from the environment and added its installation to the Dockerfile:
    https://github.com/ldenti/malva_covid_service/blob/e45a729664b53c66dd4b83707e4228b24c5f1691/Dockerfile#L28-L29

  3. I install all the programs that are not available in conda (format_vcf.py, snp-sites, and malva) in /software

  4. malva depends on sdsl-lite, which depends on cmake. I install cmake with apt
    https://github.com/ldenti/malva_covid_service/blob/e45a729664b53c66dd4b83707e4228b24c5f1691/Dockerfile#L53-L54
    Once malva 1.3.0 (i.e., the version with haploid mode) is on conda, it is enough to uncomment the line in the environment and delete the corresponding lines from the Dockerfile

Keep track of the various jobs

We need to decide how to structure the JSON logs of the various tools.

  • We certainly need to add an ID for each job, so that we have a history and can access a specific job.
  • How should the JSON files be split? One per job? One per tool? A single one for everything?
  • Anything else that comes to mind
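
One possible answer to the questions above, sketched in Python: one log.json per job, with each tool's outcome stored under its own key. The ID scheme (timestamp plus a random UUID) and every function name here (`new_job_id`, `create_job`, `record_step`) are hypothetical and chosen only for illustration; this is not the layout the project necessarily adopted.

```python
import json
import tempfile
import uuid
from datetime import datetime
from pathlib import Path


def new_job_id(now=None):
    """Return a sortable, unique job ID: timestamp plus a random UUID."""
    now = now or datetime.now()
    return f"{now:%Y%m%d-%H%M%S}_{uuid.uuid4()}"


def create_job(jobs_root, description):
    """Create <jobs_root>/<job_id>/log.json and return the job ID."""
    job_id = new_job_id()
    job_dir = Path(jobs_root) / job_id
    job_dir.mkdir(parents=True)
    record = {
        "id": job_id,
        "description": description,
        "log": {"status": "Pending", "steps": {}},
    }
    (job_dir / "log.json").write_text(json.dumps(record, indent=2))
    return job_id


def record_step(jobs_root, job_id, tool, result, return_code):
    """Store the outcome of one tool under log.steps.<tool> in the job's JSON."""
    path = Path(jobs_root) / job_id / "log.json"
    record = json.loads(path.read_text())
    record["log"]["steps"][tool] = {"result": result, "return_code": return_code}
    record["log"]["status"] = result  # overall status follows the latest step
    path.write_text(json.dumps(record, indent=2))
    return record


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        jid = create_job(root, "demo")
        print(record_step(root, jid, "mafft", "Completed", 0)["log"])
```

Keeping a single file per job makes the job history a plain directory listing, and the per-tool keys keep each tool's details separate without multiplying files.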

Detecting the sample type for KMC

For KMC, the sample must be specified with -f<a/q/m/bam> according to its type.
Is this done automatically, or does it come from the interface?
Would it be feasible to build a type auto-detector (to be incorporated into the Snakefile)?
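
A minimal sketch of such an auto-detector, assuming the type can be guessed from the first bytes of the file (after gzip/BGZF decompression when needed). The function names are hypothetical. The rule used to choose between -fa and -fm (sequences on a single line versus sequences spanning several lines) is an assumption about how KMC distinguishes the two and should be checked against the KMC documentation.

```python
import gzip
import os
import tempfile


def open_maybe_gzip(path):
    """Open a file in binary mode, transparently decompressing gzip/BGZF."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    return gzip.open(path, "rb") if magic == b"\x1f\x8b" else open(path, "rb")


def kmc_format_flag(path):
    """Guess the KMC input-format flag (-fa, -fq, -fm or -fbam) from content."""
    with open_maybe_gzip(path) as fh:
        head = fh.read(4)
        if head == b"BAM\x01":        # BAM magic, visible after decompression
            return "-fbam"
        if head.startswith(b"@"):     # FASTQ records start with '@'
            return "-fq"
        if head.startswith(b">"):     # FASTA header
            fh.seek(0)
            seq_lines = 0
            for line in fh:
                if line.startswith(b">"):
                    seq_lines = 0     # new record: reset the line counter
                elif line.strip():
                    seq_lines += 1
                    if seq_lines > 1:
                        # Assumption: multi-line sequences need -fm
                        return "-fm"
            return "-fa"
    raise ValueError(f"cannot determine sample type of {path!r}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "reads.fq")
        with open(sample, "wb") as fh:
            fh.write(b"@r1\nACGT\n+\nIIII\n")
        print(kmc_format_flag(sample))
```

A function like this could be called from a Snakefile rule to fill in the flag, with the interface still allowed to override the guess.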

Help! Cannot create new reference vcf file?

Hello,

I'm trying to create a new reference VCF file from GISAID COVID-19 sequences, but MALVIRUS fails every time on these sequences. I have 24,497 sequences, and I get this error message from log.json:

{"alias":"20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af","description":"europe","filename":"/jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/europe_2020_07_20_19-2.fasta","gtf":"/jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/sars-cov-2.gff","id":"20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af","log":{"last_time":"2020-08-13 11:11:18","status":"Failed","steps":{"mafft":{"command":"mafft --thread 4 --auto --keeplength --addfragments /jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/europe_2020_07_20_19-2.fasta /jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/reference.fasta > /jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/mafft/multi_alignment.unfilled.msa 2> /jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/mafft/mafft.log","config":{"cores":4,"gtf":"/jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/sars-cov-2.gff","multifa":"/jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/europe_2020_07_20_19-2.fasta","reference":"/jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/reference.fasta","workdir":"/jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af"},"input":{"fa":"/jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/reference.fasta","mfa":"/jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/europe_2020_07_20_19-2.fasta"},"log":"/opt/conda/envs/malva-env/bin/mafft: line 2745: 897 Killed "$prefix/addsingle" -Q 100 $legacygapopt -W $tuplesize -O $outnum $addsinglearg $addarg $add2ndhalfarg -C $numthreads $memopt $weightopt $treeinopt $treeoutopt $distoutopt $seqtype $model -f "-"$gop -h $aof $param_fft $localparam $algopt $treealg $scoreoutarg < infile > /dev/null 2>> "$progressfile"\n","output":{"msa":"/jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/mafft/multi_alignment.unfilled.msa"},"params":{},"result":"Failed","return_code":1,"time":"2020-08-13 
11:11:18"}}},"params":{"cores":"4"},"reference":"/jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/reference.fasta","snakemake":"Building DAG of jobs...\nUsing shell: /bin/bash\nProvided cores: 4\nRules claiming more threads will be scaled down.\nJob counts:\n\tcount\tjobs\n\t1\tfill_msa\n\t1\tindex_reference\n\t1\tmulti_align\n\t1\trun\n\t1\tvcf_add_freqs\n\t1\tvcf_build\n\t1\tvcf_clean_header\n\t7\n\n[Thu Aug 13 11:10:42 2020]\nrule multi_align:\n input: /jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/reference.fasta, /jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/europe_2020_07_20_19-2.fasta\n output: /jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/mafft/multi_alignment.unfilled.msa\n log: /jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/mafft/mafft.log, /jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/mafft/mafft.json\n jobid: 6\n threads: 4\n\n\u001b[33mJob counts:\n\tcount\tjobs\n\t1\tmulti_align\n\t1\u001b[0m\n\u001b[32m[Thu Aug 13 11:11:18 2020]\u001b[0m\n\u001b[31mError in rule multi_align:\u001b[0m\n\u001b[31m jobid: 0\u001b[0m\n\u001b[31m output: /jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/mafft/multi_alignment.unfilled.msa\u001b[0m\n\u001b[31m log: /jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/mafft/mafft.log, /jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/mafft/mafft.json (check log file(s) for error message)\u001b[0m\n\u001b[31m\u001b[0m\n\u001b[31mRuleException:\nCalledProcessError in line 102 of /snakemake/Snakefile.vcf:\nCommand 'mafft --thread 4 --auto --keeplength --addfragments /jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/europe_2020_07_20_19-2.fasta /jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/reference.fasta > /jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/mafft/multi_alignment.unfilled.msa 2> 
/jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/mafft/mafft.log' returned non-zero exit status 1.\n File "/snakemake/Snakefile.vcf", line 112, in __rule_multi_align\n File "/snakemake/Snakefile.vcf", line 102, in __rule_multi_align\n File "/opt/conda/envs/malva-env/lib/python3.7/subprocess.py", line 411, in check_output\n File "/opt/conda/envs/malva-env/lib/python3.7/subprocess.py", line 512, in run\n File "/opt/conda/envs/malva-env/lib/python3.7/concurrent/futures/thread.py", line 57, in run\u001b[0m\n\u001b[31mExiting because a job execution failed. Look above for error message\u001b[0m\nShutting down, this might take some time.\nExiting because a job execution failed. Look above for error message\nComplete log: /jobs/vcf/20200813-111040_3c253ebf-57bc-4783-95f1-d861b32e84af/.snakemake/log/2020-08-13T111041.968450.snakemake.log\n","submission_time":1597317041}

I'm new to this area and I'm trying to understand. Can anybody help me with this error message?

Thanks
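
For reading a log like the one above, here is a minimal Python sketch that lists the failed steps of a job record. It assumes the record has already been loaded into a dict shaped like the log.json shown (log → steps → tool → result / return_code / log); the function name `failed_steps` is hypothetical. The hint text is an assumption as well: a shell message containing "Killed" means the process received SIGKILL, which on Linux and in Docker is often, but not always, caused by running out of memory.

```python
import json


def failed_steps(job):
    """Return (step, return_code, hint) for every failed step in a job record."""
    report = []
    for name, step in job.get("log", {}).get("steps", {}).items():
        if step.get("result") != "Failed":
            continue
        text = step.get("log", "")
        # Assumption: "Killed" in the step log points at SIGKILL, often out-of-memory.
        if "Killed" in text:
            hint = "process was killed, possibly by the out-of-memory killer"
        else:
            hint = "see the step log"
        report.append((name, step.get("return_code"), hint))
    return report


if __name__ == "__main__":
    # Trimmed-down record with the same shape as the log above.
    job = {"log": {"status": "Failed", "steps": {"mafft": {
        "result": "Failed", "return_code": 1,
        "log": "mafft: line 2745: 897 Killed ..."}}}}
    print(json.dumps(failed_steps(job), indent=2))
```

If the hint applies here, giving the Docker container more memory or aligning fewer sequences at once would be the first things to try.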

Conda installation

Hi,

Can I use conda to install this tool? I am not familiar with Docker. I downloaded it and tried to follow the instructions, but with no luck, maybe because I do not know how to adjust the port. I saw conda mentioned in a previous issue, but I don't understand Italian.
