metagentools / graphbin

✨🧬 Refined binning of metagenomic contigs using assembly graphs

Home Page: https://graphbin.readthedocs.io/en/stable/

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
metagenomics binning contigs assembly-graph label-propagation

graphbin's Introduction

GraphBin: Refined Binning of Metagenomic Contigs using Assembly Graphs

GraphBin is a metagenomic contig bin refinement tool for NGS data. It uses the contig connectivity information in the assembly graph, together with the binning result of an existing binning tool and a label propagation algorithm, to correct mis-binned contigs and to predict labels for contigs that were discarded because they are too short.
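The refinement idea can be illustrated with a small, simplified sketch. This is not GraphBin's implementation: the function name `propagate_labels`, the adjacency-dictionary input and the majority-vote update rule are assumptions made purely for illustration. It shows how a contig that the initial binning tool left unlabelled can inherit a label from its neighbours in the assembly graph.

```python
from collections import Counter


def propagate_labels(graph, labels, max_iters=100):
    """Fill in missing contig labels by majority vote over graph neighbours.

    graph  : dict mapping each contig to a list of neighbouring contigs
    labels : dict mapping already-binned contigs to a bin label
    Returns a new dict; contigs with no labelled neighbours stay unlabelled.
    """
    result = dict(labels)
    for _ in range(max_iters):
        updates = {}
        for contig, neighbours in graph.items():
            if contig in result:
                continue  # keep labels that are already assigned
            votes = Counter(result[n] for n in neighbours if n in result)
            if votes:
                # sorted() makes tie-breaking deterministic
                updates[contig] = max(sorted(votes), key=lambda b: votes[b])
        if not updates:
            break  # nothing changed, so the labelling has converged
        result.update(updates)
    return result


# Hypothetical toy graph: c3 and c4 were not binned, c5 is isolated
graph = {"c1": ["c2"], "c2": ["c1", "c3"], "c3": ["c2", "c4"], "c4": ["c3"], "c5": []}
initial = {"c1": "bin_1", "c2": "bin_1"}
refined = propagate_labels(graph, initial)
print(refined)
```

In this toy example `c3` and `c4` pick up `bin_1` through the graph, while the isolated contig `c5` stays unlabelled because it has no neighbours to vote.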

For detailed instructions on installation, usage and visualisation, please refer to the documentation hosted at Read the Docs.

Dependencies

GraphBin requires Python 3. The following dependencies are required to run GraphBin and its support scripts.

Installing GraphBin

Using Conda

You can install GraphBin from the Bioconda channel. This requires conda, which is included in both Anaconda and Miniconda.

# add channels
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

# create conda environment
conda create -n graphbin

# activate conda environment
conda activate graphbin

# install graphbin
conda install -c bioconda graphbin

# check graphbin installation
graphbin -h

Using pip

You can install GraphBin using pip from the PyPI distribution.

pip install graphbin

For development purposes, please clone the repository and install via flit.

# clone repository to your local machine
git clone https://github.com/metagentools/GraphBin.git

# go to repo directory
cd GraphBin

# install flit
pip install flit

# install graphbin via flit
flit install -s --python `which python`

Example Usage

# SPAdes version
graphbin --assembler spades --graph /path/to/graph_file.gfa --contigs /path/to/contigs.fasta --paths /path/to/paths_file.paths --binned /path/to/binning_result.csv --output /path/to/output_folder

# SGA version
graphbin --assembler sga --graph /path/to/graph_file.asqg --contigs /path/to/contigs.fa --binned /path/to/binning_result.csv --output /path/to/output_folder

# MEGAHIT version
graphbin --assembler megahit --graph /path/to/graph_file.gfa --contigs /path/to/contigs.fa --binned /path/to/binning_result.csv --output /path/to/output_folder
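When GraphBin is run on many samples, it can help to assemble the command line programmatically. The sketch below uses only the flags shown in the examples above. The helper name `build_graphbin_command` is hypothetical, and the check that only SPAdes runs need `--paths` is an assumption drawn from those examples rather than from the GraphBin documentation.

```python
import shlex


def build_graphbin_command(assembler, graph, contigs, binned, output, paths=None):
    """Return the GraphBin command line as a list of arguments."""
    # Assumption: only the SPAdes example above passes --paths
    if assembler == "spades" and paths is None:
        raise ValueError("SPAdes runs need the contigs.paths file (--paths)")
    cmd = ["graphbin", "--assembler", assembler, "--graph", graph,
           "--contigs", contigs]
    if paths is not None:
        cmd += ["--paths", paths]
    cmd += ["--binned", binned, "--output", output]
    return cmd


cmd = build_graphbin_command(
    "spades", "graph.gfa", "contigs.fasta", "bins.csv", "out",
    paths="contigs.paths",
)
print(shlex.join(cmd))  # quoted form, safe to paste into a shell
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)` without going through a shell.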

Visualization of the Assembly Graph of ESC+metaSPAdes Test Dataset

[Figures: Initial Assembly Graph; TAXAassign Labelling; Original MaxBin Labelling with 2 Mis-binned Contigs; Refined Labels; Final Labelling of GraphBin]

Citation

If you use GraphBin in your work, please cite it as:

Vijini Mallawaarachchi, Anuradha Wickramarachchi, Yu Lin. GraphBin: Refined binning of metagenomic contigs using assembly graphs. Bioinformatics, Volume 36, Issue 11, June 2020, Pages 3307–3313, DOI: https://doi.org/10.1093/bioinformatics/btaa180

@article{10.1093/bioinformatics/btaa180,
    author = {Mallawaarachchi, Vijini and Wickramarachchi, Anuradha and Lin, Yu},
    title = "{GraphBin: refined binning of metagenomic contigs using assembly graphs}",
    journal = {Bioinformatics},
    volume = {36},
    number = {11},
    pages = {3307-3313},
    year = {2020},
    month = {03},
    abstract = "{The field of metagenomics has provided valuable insights into the structure, diversity and ecology within microbial communities. One key step in metagenomics analysis is to assemble reads into longer contigs which are then binned into groups of contigs that belong to different species present in the metagenomic sample. Binning of contigs plays an important role in metagenomics and most available binning algorithms bin contigs using genomic features such as oligonucleotide/k-mer composition and contig coverage. As metagenomic contigs are derived from the assembly process, they are output from the underlying assembly graph which contains valuable connectivity information between contigs that can be used for binning. We propose GraphBin, a new binning method that makes use of the assembly graph and applies a label propagation algorithm to refine the binning result of existing tools. We show that GraphBin can make use of the assembly graphs constructed from both the de Bruijn graph and the overlap-layout-consensus approach. Moreover, we demonstrate improved experimental results from GraphBin in terms of identifying mis-binned contigs and binning of contigs discarded by existing binning tools. To the best of our knowledge, this is the first time that the information from the assembly graph has been used in a tool for the binning of metagenomic contigs. The source code of GraphBin is available at https://github.com/Vini2/[email protected] or [email protected] data are available at Bioinformatics online.}",
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btaa180},
    url = {https://doi.org/10.1093/bioinformatics/btaa180},
    eprint = {https://academic.oup.com/bioinformatics/article-pdf/36/11/3307/33329097/btaa180.pdf},
}

Funding

GraphBin is funded by an Essential Open Source Software for Science Grant from the Chan Zuckerberg Initiative.

graphbin's People

Contributors

alienzj, anuradhawick, dependabot[bot], gavinhuttley, telatin, vini2

graphbin's Issues

TypeError & Exception with spades/MaxBin2 files

Hi GraphBin group, I was able to run GraphBin and get output that looks correct for the majority of my files. I have a subset of 8 of my 45 files that are all getting the same errors. I have double checked the content of these files, which seem to be fine. I'm copying the code & output below, would you let me know if there are any workarounds you might suggest? Thanks!

python ${path}/GraphBin/graphbin.py --assembler spades --graph ${spades_dir}/${name}/assembly_graph_with_scaffolds.gfa --paths ${spades_dir}/${name}/contigs.paths --binned ${path}/graphbin/inputs/CSVs/${name}_initial_contig_bins.csv --output ${outdir}/${name}

2020-07-23 13:12:22,545 - INFO - Welcome to GraphBin: Refined Binning of Metagenomic Contigs using Assembly Graphs.
2020-07-23 13:12:22,547 - INFO - This version of GraphBin makes use of the assembly graph produced by SPAdes which is based on the de Bruijn graph approach.
2020-07-23 13:12:22,547 - INFO - Input arguments:
2020-07-23 13:12:22,547 - INFO - Assembly graph file: ${path}/PM2-C1D1/assembly_graph_with_scaffolds.gfa
2020-07-23 13:12:22,547 - INFO - Contig paths file: ${path}/PM2-C1D1/contigs.paths
2020-07-23 13:12:22,547 - INFO - Existing binning output file: ${path}/inputs/CSVs/PM2-C1D1_initial_contig_bins.csv
2020-07-23 13:12:22,547 - INFO - Final binning output file: ${path}/gb_bins/PM2-C1D1/
2020-07-23 13:12:22,547 - INFO - Maximum number of iterations: 100
2020-07-23 13:12:22,547 - INFO - Difference threshold: 0.1
2020-07-23 13:12:22,547 - INFO - GraphBin started
2020-07-23 13:12:22,567 - INFO - Number of bins available in the initial binning result: 14
2020-07-23 13:12:22,567 - INFO - Constructing the assembly graph
2020-07-23 13:12:23,228 - INFO - Total number of contigs available: 60554
2020-07-23 13:12:28,439 - INFO - Total number of edges in the assembly graph: 7011
2020-07-23 13:12:28,439 - INFO - Obtaining the initial binning result
2020-07-23 13:12:28,452 - INFO - Determining ambiguous vertices
2020-07-23 13:12:28,936 - INFO - Removing labels of ambiguous vertices
2020-07-23 13:12:28,988 - INFO - Obtaining the refined binning result
2020-07-23 13:12:28,988 - INFO - Deteremining vertices which are not isolated and not in components without any labels
2020-07-23 13:12:36,162 - INFO - Number of non-isolated contigs: 4678
Traceback (most recent call last):
File "${path}/GraphBin/src/labelpropagation/labelprop.py", line 113, in process_data_line
for edge in edges:
TypeError: 'int' object is not iterable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "${path}/GraphBin/src/graphbin_SPAdes.py", line 497, in
lp.load_data_from_mem(data)
File "${path}/GraphBin/src/labelpropagation/labelprop.py", line 99, in load_data_from_mem
self.process_data_line(line)
File "${path}/apps/GraphBin/src/labelpropagation/labelprop.py", line 121, in process_data_line
raise Exception("Coundn't parse vertex from line")
Exception: Coundn't parse vertex from line

SPAdes-MaxBin2 bins with renamed contigs

Hello,

I ran SPAdes assemblies and, before binning with MaxBin2, I renamed the assembly contigs with simple deflines (e.g. >c_0000001, >c_0000002, etc.). All the bins thus have the new, simpler contig names. To run GraphBin, I replaced all the contig names in the original SPAdes 'contigs.paths' file with the corresponding renamed deflines. The bin mapping file also uses the new contig names.

I've modified all the input files to use the renamed contig deflines, but GraphBin still seems to think the contigs.paths file does not exist. Does it require contigs to have the standard SPAdes name format when the assembler is set with --assembler spades? My full command is below.

graphbin --assembler spades --contigs contigs-renamed.fasta --graph assembly_graph_with_scaffolds.gfa --paths contigs-renamed.paths --binned MaxBin2_graphbin_map.csv --output graphbin

thanks,
Nastassia
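For readers attempting the same renaming, a consistent rewrite of the paths file could look like the sketch below. It assumes the usual SPAdes layout, in which each contig name appears on its own line (with a trailing `'` for the reverse-complement entry) followed by lines of segment paths. The function name `rename_paths_lines` and the example names are hypothetical, and the sketch does not establish whether GraphBin accepts non-SPAdes contig names.

```python
def rename_paths_lines(lines, name_map):
    """Rename contig header lines of a contigs.paths file.

    lines    : iterable of lines from the original paths file
    name_map : dict mapping original contig names to new names
    Path lines (e.g. "1+,2-") are returned unchanged.
    """
    renamed = []
    for line in lines:
        text = line.rstrip("\n")
        is_reverse = text.endswith("'")
        name = text[:-1] if is_reverse else text
        if name in name_map:
            # keep the reverse-complement marker on the new name
            text = name_map[name] + ("'" if is_reverse else "")
        renamed.append(text)
    return renamed


original = [
    "NODE_1_length_500_cov_3.2",
    "1+,2-",
    "NODE_1_length_500_cov_3.2'",
    "2+,1-",
]
new_lines = rename_paths_lines(original, {"NODE_1_length_500_cov_3.2": "c_0000001"})
print("\n".join(new_lines))
```

Applying the same `name_map` to the FASTA deflines and the binning CSV keeps all three inputs in agreement.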

Confused with binning output file

Hello, thanks for the exciting tool.
I would like to try out the tool but I am not sure about the requested binning file.

I tried using the prepResult.py script, but I suspect the output is wrong. As input, I used the SPAdes output folder (metasample1/metaSpades). I ran it as follows:
python prepResult.py --binned 'metasample1/metaSpades' --assembler SPAdes --output 'metasample1/metaSpades/z_graphbin'

The following message was sent to stdout:
Formatting initial binning results

Writing initial binning results to output file

Formatted initial binning results can be found at /metasample1/metaSpades/z_graphbin/initial_contig_bins.csv

Bin IDs and corresponding names of fasta files can be found at metasample1/metaSpades/z_graphbin/bin_ids.csv

Thank you for using prepResult for GraphBin!

The file bin_ids.csv has this:
before_rr.fasta,1
contigs.fasta,2
first_pe_contigs.fasta,3
scaffolds.fasta,4

While the file initial_contig_bins.csv has this:
NODE_1,1
NODE_2,1
NODE_3,1
...
NODE_452809,4
NODE_452810,4

If I understood correctly, does this mean that all contigs belong to 4 bins?
Also, if this is correct which .gfa file should I use as input? SPAdes produces assembly_graph_after_simplification.gfa, assembly_graph_with_scaffolds.gfa, and strain_graph.gfa. I tried using all of them with contigs.paths and initial_contig_bins.csv
and obtained the same error:
ERROR - Please make sure that you have provided the correct assembler type and the correct path to the binning result file in the correct format.

Sorry for the long post. I am new to metagenomics and trying to catch up.
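The `bin_ids.csv` shown in this post lists the SPAdes output files themselves as bins, which suggests that every FASTA file in the `--binned` folder was treated as one bin. The sketch below illustrates that idea for a folder that holds one FASTA file per bin from a binning tool. It is not the prepResult.py implementation: the function names, the accepted extensions and the use of the full first header token as contig ID are all assumptions.

```python
import csv
import tempfile
from pathlib import Path

FASTA_EXTENSIONS = {".fasta", ".fa", ".fna"}  # assumed set of bin file extensions


def contig_bin_rows(bin_dir):
    """Return (contig_id, bin_number) pairs, one bin per FASTA file in bin_dir."""
    bin_files = sorted(
        p for p in Path(bin_dir).iterdir() if p.suffix in FASTA_EXTENSIONS
    )
    rows = []
    for bin_number, bin_file in enumerate(bin_files, start=1):
        with open(bin_file) as handle:
            for line in handle:
                if line.startswith(">"):
                    # contig ID = header text up to the first whitespace
                    rows.append((line[1:].split()[0], bin_number))
    return rows


def write_rows(rows, out_path):
    """Write the pairs as a two-column CSV without a header line."""
    with open(out_path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


# Tiny demonstration with two made-up bins
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "bin_a.fasta").write_text(">NODE_1\nACGT\n>NODE_2\nGGCC\n")
    Path(tmp, "bin_b.fasta").write_text(">NODE_3\nTTAA\n")
    rows = contig_bin_rows(tmp)
    write_rows(rows, Path(tmp, "initial_contig_bins.csv"))
    csv_text = Path(tmp, "initial_contig_bins.csv").read_text()
print(csv_text)
```

Pointing such a routine at an assembler output folder instead of a folder of bins would reproduce the pattern in the post, where assembly files such as `contigs.fasta` and `scaffolds.fasta` each become a "bin".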

ERROR - Please make sure that you have provided the correct assembler type and the correct path to the binning result file in the correct format.

(graphbin_env) jespinozlt2-osx:GraphBin jespinoz$ python graphbin --graph ~/assembly_graph_with_scaffolds.gfa --binned ~/scaffolds_to_bins.csv --output graphin_output --paths ~/scaffolds.paths --assembler "spades"
2021-03-27 13:24:53,962 - INFO - Welcome to GraphBin: Refined Binning of Metagenomic Contigs using Assembly Graphs.
2021-03-27 13:24:53,962 - INFO - This version of GraphBin makes use of the assembly graph produced by SPAdes which is based on the de Bruijn graph approach.
2021-03-27 13:24:53,962 - INFO - Input arguments:
2021-03-27 13:24:53,962 - INFO - Assembly graph file: /Users/jespinoz/assembly_graph_with_scaffolds.gfa
2021-03-27 13:24:53,962 - INFO - Contig paths file: /Users/jespinoz/scaffolds.paths
2021-03-27 13:24:53,962 - INFO - Existing binning output file: /Users/jespinoz/binning/scaffolds_to_bins.csv
2021-03-27 13:24:53,962 - INFO - Final binning output file: graphin_output/
2021-03-27 13:24:53,962 - INFO - Maximum number of iterations: 100
2021-03-27 13:24:53,962 - INFO - Difference threshold: 0.1
2021-03-27 13:24:53,962 - INFO - GraphBin started
2021-03-27 13:24:53,964 - INFO - Number of bins available in the initial binning result: 2
2021-03-27 13:24:53,964 - INFO - Constructing the assembly graph
2021-03-27 13:24:54,173 - INFO - Total number of contigs available: 25728
2021-03-27 13:24:59,473 - INFO - Total number of edges in the assembly graph: 1373
2021-03-27 13:24:59,473 - INFO - Obtaining the initial binning result
2021-03-27 13:24:59,473 - ERROR - Please make sure that you have provided the correct assembler type and the correct path to the binning result file in the correct format.
2021-03-27 13:24:59,473 - INFO - Exiting GraphBin... Bye...!

I can't figure out what is going wrong with my files. I used metaspades and MaxBin2 for my binning.

Here is my version:

(graphbin_env) jespinozlt2-osx:GraphBin jespinoz$ python graphbin --version
GraphBin version 1.3

Also, "SPAdes" isn't an accepted argument.

Any help would be greatly appreciated.

files.zip

graphbin_SPAdes.py path and ModuleNotFoundError: No module named 'igraph'

Hi, I recently heard about your tool and am hoping it can improve some of my binning results.

I cloned the version on github today (July 17 2020) and installed following the recommendations on the wiki. I have encountered a couple of errors that may be easy to resolve but wanted to share.

1 - I think the path to the assembler-specific scripts may be missing a forward slash. I got the following Errno 2:

python ${path}/apps/GraphBin/graphbin.py --assembler spades --graph assembly_graph_with_scaffolds.gfa --paths contigs.paths --binned ../initial_contig_bins.csv --output ../../gb_bins/
python: can't open file '${path}/apps/GraphBinsrc/graphbin_SPAdes.py': [Errno 2] No such file or directory

Looking in the graphbin.py script, I added a forward slash in the SPAdes section so it points to: ${path}/apps/GraphBin/src/graphbin_SPAdes.py

which worked.

2 - then I got a missing module error:

python ${path}//apps/GraphBin/graphbin.py --assembler spades --graph assembly_graph_with_scaffolds.gfa --paths contigs.paths --binned ../initial_contig_bins.csv --output ../../gb_bins/
Traceback (most recent call last):
File "${path}//apps/GraphBin/src/graphbin_SPAdes.py", line 24, in
from igraph import *
ModuleNotFoundError: No module named 'igraph'

I'm working on a cluster and was able to install igraph locally and get GraphBin to run but wanted to share this in case others have these issues.
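Both problems in this report can be illustrated in a few lines. The sketch below is not taken from graphbin.py: the base directory and the helper names are hypothetical. It contrasts plain string concatenation, which drops the separator as in the Errno 2 message above, with `os.path.join`, and shows a way to test for a module before importing it.

```python
import importlib.util
import os


def script_path(base_dir, *parts):
    """Join path components so that separators are never dropped."""
    return os.path.join(base_dir, *parts)


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


base = "/opt/apps/GraphBin"  # hypothetical install location
broken = base + "src/graphbin_SPAdes.py"  # separator lost: ".../GraphBinsrc/..."
fixed = script_path(base, "src", "graphbin_SPAdes.py")
print(broken)
print(fixed)

if not is_installed("igraph"):
    print("igraph is missing; install it before running GraphBin")
```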

Question about the score

Hello, I want to ask how ARI is calculated in GraphBin, because the number of contigs labelled by different binning tools is different. For example, MetaBAT has very high precision, but the number of contigs it bins is very small. How do you account for the different numbers of contigs binned by different tools when calculating ARI? If ARI is calculated only over the contigs that were actually placed into bins, the ARI of MetaBAT should be very high, shouldn't it?

Another question: when will MetaCoAG be officially released? Can I cite your method in my paper? It's a good tool.

Thank you very much!
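To make the question concrete, the sketch below computes the adjusted Rand index over only those contigs that have both a ground-truth label and a predicted bin. Restricting to this common set is one possible convention and an assumption here; this thread does not state which convention the GraphBin evaluation used. The function name and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """ARI over the contigs present in both dicts (contig -> label)."""
    common = set(truth) & set(predicted)
    n = len(common)
    if n < 2:
        raise ValueError("need at least two commonly labelled contigs")
    cells = Counter((truth[c], predicted[c]) for c in common)
    rows = Counter(truth[c] for c in common)
    cols = Counter(predicted[c] for c in common)
    sum_cells = sum(comb(v, 2) for v in cells.values())
    sum_rows = sum(comb(v, 2) for v in rows.values())
    sum_cols = sum(comb(v, 2) for v in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster on both sides
    return (sum_cells - expected) / (max_index - expected)


truth = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
perfect = {"a": 1, "b": 1, "c": 2, "d": 2}
partial = {"a": 1, "b": 1, "c": 2}  # contig "d" was left unbinned
print(adjusted_rand_index(truth, perfect))
print(adjusted_rand_index(truth, partial))
```

Under this convention a tool that bins only a few contigs, all correctly, still reaches an ARI of 1, which is why the number of binned contigs has to be reported alongside the score.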

using pytest fixtures for cleaning up test output directories

import pytest


@pytest.fixture(scope="session")
def tmp_dir(tmpdir_factory):
    # one temporary directory shared by the whole test session
    return tmpdir_factory.mktemp("sqlitedb")


@pytest.fixture(autouse=True)
def workingdir(tmp_dir, monkeypatch):
    # this sets the working directory for all tests in this module
    # to the tmp dir
    monkeypatch.chdir(tmp_dir)


def test_assert_something(tmp_dir):
    # this will be running within workingdir auto-magically thanks to pytest
    # run commands so that they write output to tmp_dir
    with open("output.txt", "w") as out:  # relative path resolves inside tmp_dir
        out.write("done")
    assert tmp_dir.join("output.txt").check(file=1)

How to run fastg2gfa?

Hello,

A long-time user of GraphBin recommended this program to me, and I'm excited to use it. Yesterday, I was able to install the software successfully using the instructions on Github (the ones on readthedocs page didn't work out), but since then, I'm having a couple of issues running GraphBin.

My main issue deals with the fastg2gfa script. I have questions about this.

1.1: My install of the parent software, gfaview is failing. After git clone and make, I get the following error

$ make
make: Warning: File `gfa.c' has modification time 17 s in the future
gcc -c -g -Wall -Wc++-compat -O2  -I. gfa.c -o gfa.o
gfa.c: In function ‘gfa_print’:
gfa.c:534:17: warning: variable ‘len’ set but not used [-Wunused-but-set-variable]
    int max = 0, len;
                 ^
gfa.c:564:17: warning: variable ‘len’ set but not used [-Wunused-but-set-variable]
    int max = 0, len;
                 ^
gcc -c -g -Wall -Wc++-compat -O2  -I. gfaview.c -o gfaview.o
gcc -g -Wall -Wc++-compat -O2 gfa.o gfaview.o -o gfaview -lz
make: warning:  Clock skew detected.  Your build may be incomplete.

How do I fix this, please?

1.2 Even if I got gfaview to compile properly, how do I run a script that is in the misc directory of this program?

Any help troubleshooting this will be much appreciated. Thank you very much.

Running flye assemblies and getting error wanting contigs.paths file for spades

Hello,

I am running graphbin v 1.7.1 and python version 3.10.13

First, I was unable to get the python support scripts to work so I renamed all of my files to make naming consistent and made the csv file with custom scripts.

When I tried to run the following code:

graphbin --contigs /home/ejunkins/LS01_hifi_coveragebin/LS01_001_assembly_renamed_edges.fasta --binned /home/ejunkins/LS01_hifi_coveragebin/bins/001/contignames/edges/LS01_001_all_edges_graphbin.csv --graph /home/ejunkins/jgi_assemblygraphs/NGXTG/flye/assembly_graph.gfa --output /home/ejunkins/LS01_hifi_coveragebin/bins/001/graphbin_out --prefix graphbin_metabat2_LS01_001_bin_with_cov --assembler flye

I get this error:
2024-02-12 12:28:23,452 - ERROR - Please make sure to provide the path to the contigs.paths file.
2024-02-12 12:28:23,453 - INFO - Exiting GraphBin... Bye...!

My understanding was that this was only for spades assemblies...

GraphBin won't work with MEGAHIT graph

Hi!

I'm testing GraphBin with my data and I'm unable to use it with a MEGAHIT graph.

2020-07-01 19:14:50,429 - INFO - Welcome to GraphBin: Refined Binning of Metagenomic Contigs using Assembly Graphs.
2020-07-01 19:14:50,429 - INFO - This version of GraphBin makes use of the assembly graph produced by MEGAHIT which is based on the de Bruijn graph approach.
2020-07-01 19:14:50,429 - INFO - Assembly graph file: ../assembly/assembly.graph.gfa
2020-07-01 19:14:50,429 - INFO - Existing binning output file: ../metabat_bins.csv
2020-07-01 19:14:50,429 - INFO - Final binning output file: ../graphbin_result/
2020-07-01 19:14:50,430 - INFO - Maximum number of iterations: 100
2020-07-01 19:14:50,430 - INFO - Difference threshold: 0.1
2020-07-01 19:14:50,430 - INFO - GraphBin started
2020-07-01 19:14:50,464 - INFO - Number of bins available in the initial binning result: 26
2020-07-01 19:14:50,464 - INFO - Constructing the assembly graph
2020-07-01 19:14:59,047 - INFO - Total number of contigs available: 0
2020-07-01 19:14:59,177 - INFO - Total number of edges in the assembly graph: 0
2020-07-01 19:14:59,178 - INFO - Obtaining the initial binning result
2020-07-01 19:14:59,179 - ERROR - Please make sure that you have provided the correct assembler type and the correct path to the binning result file in the correct format.
2020-07-01 19:14:59,179 - INFO - Exiting GraphBin... Bye...!

The graph was generated with megahit_toolkit contig2fastg, however the format is different from the examples in this repository:

>NODE_1_length_302_cov_2.0000_ID_1;
GTGGACCTCTCAGCGGTCATTCACGAAGAAACCCAGGATGACCTCCATCGCCGCCGACGGCGTTTCGTACGCACGCCAGCAGTCGGATTTCGATCTGTACCGCCGTGGAAGCACGTGGTACCTGGTGGAGAACGGCGTCTGGTTCCGCTCCGATTCGTGGAAGGGCCCTTTCGTGTCGATCCGCGCGAAGGATGTTCCGAGGGCCATCTGGAGCATCCCGCCGGCCTACCGACGCCACTGGGTTCCAGCCGTTCGCTAGACGAGCGGGGTCCCTGGGCGCCGGGGCTGTATAGCGCCTCGGG
>NODE_1_length_302_cov_2.0000_ID_1';
CCCGAGGCGCTATACAGCCCCGGCGCCCAGGGACCCCGCTCGTCTAGCGAACGGCTGGAACCCAGTGGCGTCGGTAGGCCGGCGGGATGCTCCAGATGGCCCTCGGAACATCCTTCGCGCGGATCGACACGAAAGGGCCCTTCCACGAATCGGAGCGGAACCAGACGCCGTTCTCCACCAGGTACCACGTGCTTCCACGGCGGTACAGATCGAAATCCGACTGCTGGCGTGCGTACGAAACGCCGTCGGCGGCGATGGAGGTCATCCTGGGTTTCTTCGTGAATGACCGCTGAGAGGTCCAC
>NODE_2_length_305_cov_1.0000_ID_3;
GTGCCGCCGCCGCCGAAGAAGATGCCACTGACTACGGCGTTCCAGCCGCTGATTCGGGCCAGCTTGAACCGCACGTTTCCGGCCACGTTCCAGATCAGGTATGCACTCTTCCGTGGCAATAAAGCGCATGGCGCGCAAAGCCCGAACTTTATGCAGCGAGTTTCCCTCTTTAATCAGCTCCCCTAAATTTTCTTGCAGGGCCGTCCGTTCTTGGGCAATTTTTGCGGGCGGAATTTGGCCGGCGGCTTCGGCGGCTATAAAGAGCGCGCGCCGGCGCGCCAGCTTTTCGCCGTCGCTTTTTAGGC
>NODE_2_length_305_cov_1.0000_ID_3';
GCCTAAAAAGCGACGGCGAAAAGCTGGCGCGCCGGCGCGCGCTCTTTATAGCCGCCGAAGCCGCCGGCCAAATTCCGCCCGCAAAAATTGCCCAAGAACGGACGGCCCTGCAAGAAAATTTAGGGGAGCTGATTAAAGAGGGAAACTCGCTGCATAAAGTTCGGGCTTTGCGCGCCATGCGCTTTATTGCCACGGAAGAGTGCATACCTGATCTGGAACGTGGCCGGAAACGTGCGGTTCAAGCTGGCCCGAATCAGCGGCTGGAACGCCGTAGTCAGTGGCATCTTCTTCGGCGGCGGCGGCAC
(...)
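The excerpt above consists of `>` header lines followed by sequence, which is FASTA/FASTG-style text, whereas GFA files are tab-delimited with a record-type letter (such as `H`, `S` or `L`) at the start of each line. A quick format check along those lines could look like the sketch below. The function name `guess_graph_format` is hypothetical, and the check only inspects the first non-empty line.

```python
GFA_RECORD_TYPES = {"H", "S", "L", "C", "P"}  # GFA 1 record-type letters


def guess_graph_format(lines):
    """Guess 'fastg', 'gfa' or 'unknown' from the first non-empty line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            return "fastg"  # FASTA/FASTG-style header line
        if line.split("\t")[0] in GFA_RECORD_TYPES:
            return "gfa"
        return "unknown"
    return "unknown"


fastg_like = [">NODE_1_length_302_cov_2.0000_ID_1;", "GTGGACCTCTCAGCGG"]
gfa_like = ["H\tVN:Z:1.0", "S\t1\tACGT", "L\t1\t+\t2\t-\t0M"]
print(guess_graph_format(fastg_like))
print(guess_graph_format(gfa_like))
```

A file that is reported as `fastg` would need to be converted to GFA before being passed to `--graph`.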

Starting from a failed point

Hello,

Is there a way to restart GraphBin from a checkpoint if something fails? I had a script running for 4 days that failed due to a node issue, and I'd like not to have to wait that long again.

Please add this feature if it currently does not exist. It would be very helpful.

Thank you,
Taruna

project refactor to improve portability

Suggest the following structural changes to enable distribution via PyPI and also for Windows users

src/
   graphbin/
      __init__.py  # this should be your current graphbin file
      utils/
         ... # all files currently under graphbin_utils
      support/
         ... # all files currently under support
tests/
   data/  # test_data dir renamed to here
   ... test scripts
pyproject.toml  # replace setup.py with this, hook into scripts

useful helper function for testing cli apps

import subprocess

def exec_command(cmnd, stdout=subprocess.PIPE, stderr=subprocess.PIPE):
    """executes shell command and returns stdout if completes exit code 0

    Parameters
    ----------

    cmnd : str
      shell command to be executed
    stdout, stderr : streams
      Default value (PIPE) intercepts process output, setting to None
      blocks this."""
    proc = subprocess.Popen(cmnd, shell=True, stdout=stdout, stderr=stderr)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f"FAILED: {cmnd}\n{err}")
    return out.decode("utf8") if out is not None else None

Then you can write test functions as

import pytest

def test_some_command():
    cmd = "graphbin <args>"
    exec_command(cmd)

Can GraphBin be used with a co-binning approach?

I've used MEGAHIT to assemble samples individually, and then ran vamb in order to bin them all together.

I was wondering whether GraphBin can cope with, or be used to refine, this type of input. There is one binning input, so that should be all right as long as I make sure the contig names are the same. For the contigs file, I can concatenate the individual contigs so there is one input file. But I'm confused about the assembly graph file. I guess I could concatenate all necessary fastg files, taking care to have only one begin line and one end line, and then convert to gfa. But in that case, should the fastg file include one or multiple 'assembly name' lines? Do you have any idea?

Kind regards,

Laura

Suggestion for prepResult.py

Hi!

I've noticed that prepResult.py doesn't support .fna files, an extension that is pretty common for bins. It'd be cool if support for this extension was added.

Also, I noticed that subprocess is not being imported into the script, causing a NameError: name 'subprocess' is not defined error.

About the weights between two contigs

Hello!
Thanks for your research. I found that this research does not consider the weight between two contigs. I wonder whether the weight between two contigs has an effect on the final clustering result. Also, can weighted connections between two contigs be generated with the script in this article?
Thanks!

Conda release

Hi,

thank you for providing this package. I'm excited to use it.

Would you consider adding it to a Conda channel such as Bioconda? I believe this would improve the installation process and make the package more accessible for users.

I'd be glad to help drafting a recipe so the package can be added to Bioconda, if you agree.

Best wishes,
Vini

Threads option for GraphBin

Hi!

I was wondering if there is a threads option for GraphBin. The help page for the command does not mention any such option, and I wanted to know whether the tool picks up that information automatically.

Also, if the tool is single-threaded, is it possible to explore a multi-threaded version of the tool for future updates?
