metagentools / gbintk

🌟🧬 GraphBin-Tk: assembly graph-based metagenomic binning toolkit

Home Page: https://gbintk.readthedocs.io/en/latest/

License: GNU General Public License v3.0

Language: Python 100.00%

Topics: contigs, metagenomics, metagenomic-binning, assembly-graph, bioinformatics

gbintk's Introduction

GraphBin-Tk: assembly graph-based metagenomic binning toolkit

GraphBin-Tk combines the assembly graph-based metagenomic binning and bin-refinement tools GraphBin, GraphBin2 and MetaCoAG, along with additional functionality to visualise and evaluate results, into one comprehensive toolkit.

For detailed instructions on installation and usage, please refer to the documentation hosted at Read the Docs (https://gbintk.readthedocs.io/en/latest/).

NEW: GraphBin-Tk is now available on bioconda and PyPI.

Installing GraphBin-Tk

Using conda

You can install GraphBin-Tk from the bioconda distribution. conda is available through Anaconda or Miniconda, and mamba can be used in place of conda.

# add channels
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

# create conda environment
conda create -n gbintk

# activate conda environment
conda activate gbintk

# install gbintk
conda install -c bioconda gbintk

# check gbintk installation
gbintk --help

Using pip

You can install GraphBin-Tk using pip from the PyPI distribution.

# install gbintk
pip install gbintk

# check gbintk installation
gbintk --help

For development

Please follow the steps below to install gbintk using flit for development.

# clone repository
git clone https://github.com/metagentools/gbintk.git

# move to gbintk directory
cd gbintk

# create and activate conda env
conda env create -f environment.yml
conda activate gbintk

# install using flit
flit install -s --python `which python`

# test installation
gbintk --help

Available subcommands in GraphBin-Tk

Run gbintk --help or gbintk -h to display the help message for GraphBin-Tk.

Usage: gbintk [OPTIONS] COMMAND [ARGS]...

  gbintk (GraphBin-Tk): Assembly graph-based metagenomic binning toolkit

Options:
  -v, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  graphbin   GraphBin: Refined Binning of Metagenomic Contigs using...
  graphbin2  GraphBin2: Refined and Overlapped Binning of Metagenomic...
  metacoag   MetaCoAG: Binning Metagenomic Contigs via Composition,...
  prepare    Format the initial binning result from an existing binning tool
  visualise  Visualise binning and refinement results
  evaluate   Evaluate the binning results given a ground truth

Citation

If you use GraphBin-Tk in your work, please cite the relevant tools.

GraphBin

Vijini Mallawaarachchi, Anuradha Wickramarachchi, Yu Lin. GraphBin: Refined binning of metagenomic contigs using assembly graphs. Bioinformatics, Volume 36, Issue 11, June 2020, Pages 3307–3313, DOI: https://doi.org/10.1093/bioinformatics/btaa180

GraphBin2

Vijini G. Mallawaarachchi, Anuradha S. Wickramarachchi, and Yu Lin. GraphBin2: Refined and Overlapped Binning of Metagenomic Contigs Using Assembly Graphs. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 8:1-8:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020). DOI: https://doi.org/10.4230/LIPIcs.WABI.2020.8

Mallawaarachchi, V.G., Wickramarachchi, A.S. & Lin, Y. Improving metagenomic binning results with overlapped bins using assembly graphs. Algorithms Mol Biol 16, 3 (2021). DOI: https://doi.org/10.1186/s13015-021-00185-6

MetaCoAG

Mallawaarachchi, V., Lin, Y. (2022). MetaCoAG: Binning Metagenomic Contigs via Composition, Coverage and Assembly Graphs. In: Pe'er, I. (eds) Research in Computational Molecular Biology. RECOMB 2022. Lecture Notes in Computer Science, vol 13278. Springer, Cham. DOI: https://doi.org/10.1007/978-3-031-04749-7_5

Vijini Mallawaarachchi and Yu Lin. Accurate Binning of Metagenomic Contigs Using Composition, Coverage, and Assembly Graphs. Journal of Computational Biology 2022 29:12, 1357-1376. DOI: https://doi.org/10.1089/cmb.2022.0262

Funding

GraphBin-Tk is funded by an Essential Open Source Software for Science Grant from the Chan Zuckerberg Initiative.

gbintk's People

Contributors

anuradhawick, gavinhuttley, katherinecaley, rmcar17, vini2, yapenglang

gbintk's Issues

TST: check that visualising works

Look into using Pillow in this test:

  • check that the output format is correct
  • check that the file is consistent with the declared output
  • generate both png and jpeg
  • update the pyproject.toml to include Pillow in the testing dependencies
  • pin the Pillow version
  • find out whether Pillow can extract other information from the file that is useful for testing (e.g. pixels, size in bytes)
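
The issue proposes Pillow for these checks. As a dependency-free stand-in for the first two checks only, the sketch below identifies PNG and JPEG files from their leading signature bytes using the standard library. It does not use Pillow, and the names sniff_image_format and matches_declared_type are hypothetical helpers, not part of gbintk.

```python
# Leading bytes that identify each format (PNG: 8-byte signature; JPEG: SOI marker).
_SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
}


def sniff_image_format(path):
    """Return 'png' or 'jpeg' based on the file's first bytes, else None."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    for name, signature in _SIGNATURES.items():
        if header.startswith(signature):
            return name
    return None


def matches_declared_type(path, imgtype):
    """Check that the file content agrees with the declared --imgtype value."""
    declared = imgtype.lower()
    if declared == "jpg":  # treat 'jpg' as an alias of 'jpeg' (an assumption)
        declared = "jpeg"
    return sniff_image_format(path) == declared
```

A signature check only confirms the container type; richer properties such as pixel dimensions would still need an image library like Pillow.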

File name: test_visualise.py
Function name: test_generate_images

Inputs: file names in the tests/data/5G_metaSPAdes directory.

  • initial: metacoag_res.csv
  • final: graphbin_res.csv
  • graph: assembly_graph_with_scaffolds.gfa
  • contigs: contigs.fasta
  • paths: contigs.paths
  • imgtype: png

How to invoke:

gbintk visualise --assembler spades --initial {initial} --final {final} --graph {graph} --contigs {contigs} --paths {paths} --imgtype {imgtype} --output {outpath}
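
One way test_generate_images might assemble and launch this command is sketched below. The flags and file names come from the invocation and input list above; the helper names visualise_args and run_visualise, and the injectable runner parameter, are assumptions made for illustration.

```python
import subprocess
from pathlib import Path

DATA_DIR = Path("tests/data/5G_metaSPAdes")


def visualise_args(outpath, imgtype="png", data_dir=DATA_DIR):
    """Assemble the `gbintk visualise` arguments from the inputs listed above."""
    return [
        "gbintk", "visualise",
        "--assembler", "spades",
        "--initial", str(data_dir / "metacoag_res.csv"),
        "--final", str(data_dir / "graphbin_res.csv"),
        "--graph", str(data_dir / "assembly_graph_with_scaffolds.gfa"),
        "--contigs", str(data_dir / "contigs.fasta"),
        "--paths", str(data_dir / "contigs.paths"),
        "--imgtype", imgtype,
        "--output", str(outpath),
    ]


def run_visualise(outpath, imgtype="png", runner=subprocess.run):
    """Run the command; `runner` is injectable so the helper works without gbintk."""
    return runner(visualise_args(outpath, imgtype), capture_output=True, text=True)
```

With the default runner, a test could then assert that the returned object's returncode is 0 before inspecting the output directory.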

TST: add tests for MetaCoAG

Add tests for MetaCoAG in gbintk.

MetaCoAG on spades

File name: test_cli_metacoag.py
Function name: test_metacoag_spades_run

Inputs: file names in the tests/data/5G_metaSPAdes directory.

  • graph: assembly_graph_with_scaffolds.gfa
  • contigs: contigs.fasta
  • paths: contigs.paths
  • abundance: coverm_mean_coverage.tsv
  • outpath: tmp_dir

How to invoke:

gbintk metacoag --assembler spades --graph {graph} --contigs {contigs} --paths {paths} --abundance {abundance} --output {outpath}

MetaCoAG on megahit

File name: test_cli_metacoag.py
Function name: test_metacoag_megahit_run

Inputs: file names in the tests/data/5G_MEGAHIT directory.

  • graph: final.gfa
  • contigs: final.contigs.fa
  • abundance: abundance.tsv
  • outpath: tmp_dir

How to invoke:

gbintk metacoag --assembler megahit --graph {graph} --contigs {contigs} --abundance {abundance} --output {outpath}

MetaCoAG on flye

File name: test_cli_metacoag.py
Function name: test_metacoag_flye_run

Inputs: file names in the tests/data/1Y3B_Flye directory.

  • graph: assembly_graph.gfa
  • contigs: assembly.fasta
  • paths: assembly_info.txt
  • abundance: abundance.tsv
  • outpath: tmp_dir

How to invoke:

gbintk metacoag --assembler flye --graph {graph} --contigs {contigs} --paths {paths} --abundance {abundance} --output {outpath}
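
The three MetaCoAG cases differ only in their data directory, file names, and whether a paths file is passed (MEGAHIT has none). A minimal sketch of keeping them in one table is shown below; METACOAG_CASES and metacoag_args are hypothetical names, and the values are copied from the input lists above.

```python
from pathlib import Path

# Inputs per assembler, as listed above; MEGAHIT has no paths file.
METACOAG_CASES = {
    "spades": {
        "data_dir": "tests/data/5G_metaSPAdes",
        "graph": "assembly_graph_with_scaffolds.gfa",
        "contigs": "contigs.fasta",
        "paths": "contigs.paths",
        "abundance": "coverm_mean_coverage.tsv",
    },
    "megahit": {
        "data_dir": "tests/data/5G_MEGAHIT",
        "graph": "final.gfa",
        "contigs": "final.contigs.fa",
        "paths": None,
        "abundance": "abundance.tsv",
    },
    "flye": {
        "data_dir": "tests/data/1Y3B_Flye",
        "graph": "assembly_graph.gfa",
        "contigs": "assembly.fasta",
        "paths": "assembly_info.txt",
        "abundance": "abundance.tsv",
    },
}


def metacoag_args(assembler, outpath):
    """Build the `gbintk metacoag` argument list, skipping inputs set to None."""
    case = METACOAG_CASES[assembler]
    data_dir = Path(case["data_dir"])
    args = ["gbintk", "metacoag", "--assembler", assembler]
    for flag in ("graph", "contigs", "paths", "abundance"):
        if case[flag] is not None:
            args += [f"--{flag}", str(data_dir / case[flag])]
    return args + ["--output", str(outpath)]
```

Keeping the optional input as None lets a single helper serve all three test functions without a separate code path for MEGAHIT.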

TST: add tests for GraphBin

Add tests for GraphBin in gbintk.

  • test spades input
  • test megahit input
  • test flye input

GraphBin on spades

File name: test_cli_graphbin.py
Function name: test_graphbin_spades_run

Inputs: file names in the tests/data/5G_metaSPAdes directory.

  • graph: assembly_graph_with_scaffolds.gfa
  • contigs: contigs.fasta
  • paths: contigs.paths
  • binned: initial_contig_bins.csv
  • outpath: tmp_dir

How to invoke:

gbintk graphbin --assembler spades --graph {graph} --contigs {contigs} --paths {paths} --binned {binned} --output {outpath}

GraphBin on megahit

File name: test_cli_graphbin.py
Function name: test_graphbin_megahit_run

Inputs: file names in the tests/data/5G_MEGAHIT directory.

  • graph: final.gfa
  • contigs: final.contigs.fa
  • binned: initial_contig_bins.csv
  • outpath: tmp_dir

How to invoke:

gbintk graphbin --assembler megahit --graph {graph} --contigs {contigs} --binned {binned} --output {outpath}

GraphBin on flye

File name: test_cli_graphbin.py
Function name: test_graphbin_flye_run

Inputs: file names in the tests/data/1Y3B_Flye directory.

  • graph: assembly_graph.gfa
  • contigs: assembly.fasta
  • paths: assembly_info.txt
  • binned: initial_contig_bins.csv
  • outpath: tmp_dir

How to invoke:

gbintk graphbin --assembler flye --graph {graph} --contigs {contigs} --paths {paths} --binned {binned} --output {outpath}
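
Because each GraphBin test depends on specific files being present, a pre-flight check can make a missing fixture fail with a clear message rather than an opaque CLI error. The sketch below assumes the directory layout given in the input lists above; GRAPHBIN_INPUTS and missing_inputs are hypothetical names.

```python
from pathlib import Path

# Expected input files for the three GraphBin tests, taken from the lists above.
GRAPHBIN_INPUTS = {
    "tests/data/5G_metaSPAdes": [
        "assembly_graph_with_scaffolds.gfa",
        "contigs.fasta",
        "contigs.paths",
        "initial_contig_bins.csv",
    ],
    "tests/data/5G_MEGAHIT": [
        "final.gfa",
        "final.contigs.fa",
        "initial_contig_bins.csv",
    ],
    "tests/data/1Y3B_Flye": [
        "assembly_graph.gfa",
        "assembly.fasta",
        "assembly_info.txt",
        "initial_contig_bins.csv",
    ],
}


def missing_inputs(root="."):
    """Return the expected input files that do not exist under `root`."""
    root = Path(root)
    missing = []
    for data_dir, names in GRAPHBIN_INPUTS.items():
        for name in names:
            candidate = root / data_dir / name
            if not candidate.is_file():
                missing.append(candidate.relative_to(root).as_posix())
    return missing
```

A test module could call missing_inputs() once at the start and fail early if the returned list is not empty.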

TST: add tests for GraphBin2

Add tests for GraphBin2 in gbintk.

  • test spades input
  • test megahit input
  • test flye input

GraphBin2 on spades

File name: test_cli_graphbin2.py
Function name: test_graphbin2_spades_run

Inputs: file names in the tests/data/5G_metaSPAdes directory.

  • graph: assembly_graph_with_scaffolds.gfa
  • contigs: contigs.fasta
  • paths: contigs.paths
  • binned: initial_contig_bins.csv
  • abundance: abundance.abund
  • outpath: tmp_dir

How to invoke:

gbintk graphbin2 --assembler spades --graph {graph} --contigs {contigs} --paths {paths} --binned {binned} --abundance {abundance} --output {outpath}

GraphBin2 on megahit

File name: test_cli_graphbin2.py
Function name: test_graphbin2_megahit_run

Inputs: file names in the tests/data/5G_MEGAHIT directory.

  • graph: final.gfa
  • contigs: final.contigs.fa
  • binned: initial_contig_bins.csv
  • abundance: abundance.tsv
  • outpath: tmp_dir

How to invoke:

gbintk graphbin2 --assembler megahit --graph {graph} --contigs {contigs} --binned {binned} --abundance {abundance} --output {outpath}

GraphBin2 on flye

File name: test_cli_graphbin2.py
Function name: test_graphbin2_flye_run

Inputs: file names in the tests/data/1Y3B_Flye directory.

  • graph: assembly_graph.gfa
  • contigs: assembly.fasta
  • paths: assembly_info.txt
  • binned: initial_contig_bins.csv
  • abundance: abundance.tsv
  • outpath: tmp_dir

How to invoke:

gbintk graphbin2 --assembler flye --graph {graph} --contigs {contigs} --paths {paths} --binned {binned} --abundance {abundance} --output {outpath}

TST: check that formatting binning results works

Add test cases for prepare subcommand.

  • test megahit results
  • test remaining results (e.g., spades)

MEGAHIT

File name: test_prepare.py
Function name: test_prepare_megahit_res

Inputs: file/folder names in the tests/data/5G_MEGAHIT directory.

  • resfolder: initial_bins
  • output: tmp_dir

How to invoke:

gbintk prepare --assembler megahit --resfolder {resfolder} --output {output}

Other assemblers (e.g., SPAdes)

File name: test_prepare.py
Function name: test_prepare_spades_res

Inputs: file/folder names in the tests/data/5G_metaSPAdes directory.

  • resfolder: initial_bins
  • output: tmp_dir

How to invoke:

gbintk prepare --assembler spades --resfolder {resfolder} --output {output}
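
The issue does not say which files prepare writes, so a first version of these tests may only be able to assert that something was written to the output directory. A minimal sketch of such a check follows; written_files is a hypothetical helper, and no output file names are assumed.

```python
from pathlib import Path


def written_files(outpath):
    """Return the relative paths of all regular files under `outpath`, sorted."""
    outpath = Path(outpath)
    return sorted(
        p.relative_to(outpath).as_posix()
        for p in outpath.rglob("*")
        if p.is_file()
    )
```

After running the prepare command against tmp_dir, a test could assert that written_files(tmp_dir) is not empty, and be tightened later once the expected file names are known.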
