louis-mg / metadbgwas

Beginning of the expansion of DBGWAS to a metagenomic scale.

License: Other

Shell 1.00% CMake 0.14% C++ 14.74% C 1.47% HTML 1.01% CSS 0.76% Less 2.88% SCSS 2.92% JavaScript 62.86% R 12.20% Dockerfile 0.02%
bioinformatics de-bruijn-graphs metagenomics

metadbgwas's Introduction

Metadbgwas

Motivation

This tool expands the work of DBGWAS (Jaillard et al., 2018) and brings it to the metagenomic scale. It finds variants significantly associated with a given phenotype of interest and outputs its findings in a web page. That web page shows the selected unitigs with their surrounding graph component, and a user-provided annotation can be used.

Overview

You can find the internship report for this project in the report/ folder.

Schematic of metadbgwas

Output

Here is an example of the output of Metadbgwas: significant components are shown as previews, with their annotation if one was provided, and clicking on a component opens a page with its full information and an interactive graph.

Output example

Requirements

Installation

  1. Use the following command to download the repository:
    git clone --recursive https://github.com/Louis-MG/Metadbgwas.git
  2. Complete the installation:
    cd Metadbgwas
    sh install.sh

Usage

	* General
NOTE: paths should be absolute.
--files <path> path to the directory containing the read files.
--output <path> path to the output folder. Defaults to ./
--threads <int> number of threads to use. !! Defaults to 4 !!
--verbose <int> level of verbosity, 0 or 1. Defaults to 1; 0 is equivalent to --quiet.
--clean removes intermediary files to save storage space.
--skip1 skips the Lighter correction step. Corrected files are expected to be in the output folder.
--skip2 skips the Lighter and Bcalm2 steps. Corrected files and the unitigs folder are expected to be in the output folder.
--skip3 skips the Lighter, Bcalm2 and REINDEER steps. Corrected files, the unitigs folder and the matrix folder are expected to be in the output folder.

        * Lighter
NOTE: if your dataset contains bacterial genomes of very different sizes, it is better to use the --k option and provide the pick-rate (noted alpha).
--K <int> <int> kmer length and approximate genome size (in bases). Recommended is 17 G.
        or
--k <int> <int> <float> kmer length, genome size (in bases), and alpha (probability of sampling a kmer). Recommended is 17 G alpha. alpha is best chosen as 7/coverage.
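
The 7/coverage rule for alpha can be sketched as follows. This is a minimal illustration, not part of Metadbgwas: the function names are hypothetical, coverage is assumed to be the total number of sequenced bases divided by the genome size, and capping alpha at 1 is an assumption made here because alpha is a probability.

```python
def estimate_coverage(total_bases: int, genome_size: int) -> float:
    """Average sequencing depth: sequenced bases divided by genome size."""
    if total_bases <= 0 or genome_size <= 0:
        raise ValueError("total_bases and genome_size must be positive")
    return total_bases / genome_size


def lighter_alpha(coverage: float) -> float:
    """Sampling rate following the 7/coverage rule, capped at 1 (assumption)."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return min(1.0, 7.0 / coverage)


def lighter_k_args(kmer: int, genome_size: int, coverage: float) -> list[str]:
    """Build the three values expected after --k: kmer length, genome size, alpha."""
    alpha = lighter_alpha(coverage)
    return ["--k", str(kmer), str(genome_size), f"{alpha:.3g}"]


if __name__ == "__main__":
    # Hypothetical dataset: 420 Mb of reads for a 6 Mb genome.
    coverage = estimate_coverage(total_bases=420_000_000, genome_size=6_000_000)
    print(" ".join(lighter_k_args(17, 6_000_000, coverage)))
```

The printed values can be appended to the metadbgwas.sh command line in place of the --K option.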

	* Bcalm2
--kmer <int> kmer length used to build the unitigs. Defaults to 31.
--abundance-min <int> minimum number of occurrences of a kmer to keep it in the union DBG. Defaults to 5; it is highly recommended to change it to the 2.5% quantile of the Poisson distribution with lambda = coverage.
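
The recommended --abundance-min value can be computed with a few lines of standard-library Python. This is a sketch under the assumption that "2.5% quantile" means the smallest count k whose cumulative Poisson probability reaches 0.025; the function names are hypothetical and not part of Metadbgwas.

```python
import math


def poisson_quantile(p: float, lam: float) -> int:
    """Smallest k such that P(X <= k) >= p for X ~ Poisson(lam)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if lam <= 0:
        raise ValueError("lam must be positive")
    cdf = 0.0
    # Upper bound on the search, far in the right tail of the distribution.
    limit = int(lam + 40 * math.sqrt(lam) + 100)
    for k in range(limit + 1):
        # The probability mass is computed in log space to avoid underflow
        # of exp(-lam) when the coverage is large.
        log_pmf = -lam + k * math.log(lam) - math.lgamma(k + 1)
        cdf += math.exp(log_pmf)
        if cdf >= p:
            return k
    return limit


def suggested_abundance_min(coverage: float) -> int:
    """2.5% quantile of a Poisson distribution with lambda = coverage."""
    return poisson_quantile(0.025, coverage)


if __name__ == "__main__":
    print(suggested_abundance_min(30))
```

In R, `qpois(0.025, coverage)` gives the same quantity.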

	* Bifrost
Bifrost uses kmer, threads, and output parameters. No others need to be specified.

        * DBGWAS
--strains A text file describing the strains containing 3 columns: 1) ID of the strain; 2) Phenotype (a real number or NA); 3) Path to a multi-fasta file containing the sequences of the strain. This fil>
--newick Optional path to a newick tree file. If (and only if) a newick tree file is provided, the lineage effect analysis is computed and PCs figures are generated.
--nc-db A comma-separated list of Fasta files containing annotations in a nucleotide alphabet format (e.g.: --nc-db path/to/file_1.fa,path/to/file_2.fa,etc). You can customize these files to work better with DBGWAS (see https://gitlab.com/leoisl/dbgwas/tree/master#customizing-annotation-databases).
--pt-db A comma-separated list of Fasta files containing annotations in a protein alphabet format (e.g.: --pt-db path/to/file_1.fa,path/to/file_2.fa,etc). You can customize these files to work better with DBGWAS (see https://gitlab.com/leoisl/dbgwas/tree/master#customizing-annotation-databases).
--threshold maximum phenotype value that is still considered to be 0.
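
The three-column --strains file and the effect of --threshold can be checked before a run with a small script. This is a sketch, not the tool's own parser: it assumes the file is tab-separated and starts with a header line, and it assumes that phenotype values above the threshold are treated as 1. The strain IDs and paths in the example are made up.

```python
import csv
import io
from typing import Optional


def read_strains(handle) -> list[tuple[str, Optional[float], str]]:
    """Parse (ID, phenotype, path) rows; the phenotype 'NA' becomes None."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header line (assumed to be present)
    strains = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # ignore blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        strain_id, phenotype, path = row
        value = None if phenotype == "NA" else float(phenotype)
        strains.append((strain_id, value, path))
    return strains


def binarize(value: Optional[float], threshold: float) -> Optional[int]:
    """Values <= threshold map to 0, larger values to 1 (assumption), NA stays None."""
    if value is None:
        return None
    return 0 if value <= threshold else 1


if __name__ == "__main__":
    example = io.StringIO(
        "ID\tPhenotype\tPath\n"
        "s1\t0\t/data/s1.fasta\n"
        "s2\t2.5\t/data/s2.fasta\n"
        "s3\tNA\t/data/s3.fasta\n"
    )
    for strain_id, value, path in read_strains(example):
        print(strain_id, binarize(value, threshold=0.0), path)
```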

        * Miscellaneous
--license prints the license text in standard output.
--help displays help.

Example

bash metadbgwas.sh --files ./input --output ./output --K 17 6000000 --strains ./strains --threads 4
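
Since the note in the Usage section asks for absolute paths, a small wrapper can resolve them before calling the script. This is a hypothetical helper, not shipped with Metadbgwas; it only uses options listed in the Usage section and assumes metadbgwas.sh is in the current directory.

```python
import os
import shlex


def build_command(script: str, files: str, output: str, strains: str,
                  kmer: int = 17, genome_size: int = 6_000_000,
                  threads: int = 4) -> list[str]:
    """Return the metadbgwas.sh argument list with data paths made absolute."""
    return [
        "bash", script,
        "--files", os.path.abspath(files),
        "--output", os.path.abspath(output),
        "--K", str(kmer), str(genome_size),
        "--strains", os.path.abspath(strains),
        "--threads", str(threads),
    ]


if __name__ == "__main__":
    command = build_command("metadbgwas.sh", "./input", "./output", "./strains")
    # Print the command so it can be inspected or pasted into a terminal.
    print(shlex.join(command))
```

The returned list can also be passed directly to `subprocess.run(command, check=True)`.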

Docker

An image is hosted on Docker Hub. You can also build it locally using the Dockerfile located in /docker. You might have to add sudo if you didn't run the post-installation steps of Docker.

docker pull 007ptar007/metadbgwas:latest
docker run -v 'path/to/input/folder:/input' 007ptar007/metadbgwas:latest --files ./input --strains ./input/strains --threads 40 --output ./output --K 17 G

Singularity

You can also run the Docker image with Singularity:

singularity pull docker://007ptar007/metadbgwas
singularity run -H /path/to/input metadbgwas_latest.sif --files ./input --strains ./input/strains --threads 40 --output ./output --K 17 G

Output

The output folder contains:

  • the corrected fasta files.
  • the unitigs folder with the bcalm2 output, sample-wise and dataset-wise.
  • step1, step2 and step3, which contain internal files of the modified DBGWAS.
  • visualisation, which contains the visualisation files.
  • command_line.txt, with the parameters used for the execution.

How to reference:

Please cite this tool as:

Metadbgwas, Louis-Mael Gueguen, 2022.

Issues:

You can post issues in the issue section of the GitHub repository. You can also email me at lm<dot>gueguen<at>orange<dot>fr. I will do my best to resolve them.

License

The work is available under the zlib license.

metadbgwas's People

Contributors

fgindraud, louis-mg

metadbgwas's Issues

Can't write cytoscape output

After all calculations, the textual and graphical Cytoscape output is not produced. The error occurs after the files for the first component of the graph have been rendered. The error is:

Building Cytoscape graph and textual output...
terminate called after throwing an instance of 'boost::filesystem::filesystem_error'
  what():  boost::filesystem::copy_file: File exists: "/home/lmgueguen/test/Metadbgwas/tools/src/../../csjs/lib/xml/DBGWAS_cytoscape_style.xml", "./output/visualisations/components/lib/xml/DBGWAS_cytoscape_style.xml"

It happens between lines 372 and 604 in generate_output.cpp.

visualisation is empty

After all the calculations, the files in textualOutput are empty (just the template), except the first one, comp_nodes_1.tsv, which has only one node.

visualisation contains only templates as well. This might be due to:

  • improper parsing of the gfa
  • bad creation of the graph

Bifrost not built

It does not look like Bifrost gets built along with the other programs, at least not in my build.

Metadbgwas/metadbgwas.sh: line 334: /home/user/Metadbgwas/bifrost/build/src/Bifrost: No such file or directory
Metadbgwas/metadbgwas.sh: line 336: /home/user/Metadbgwas/bifrost/build/src/Bifrost: No such file or directory

I double checked and the src directory and program file are not made in the build directory.
I would appreciate any help you could provide!
