Thanks for including the prepResult.py to preprocess everything.
I'll give it a try. So it truncates all of the contig identifiers?
In the future, would it be possible to make this a bit more seamless to integrate into existing pipelines? For example, I'm running MetaBAT2 and MaxBin2, and then I feed everything into DAS_Tool, which takes in [scaffold]\t[bin] TSV files with the identifiers unmodified (created using https://github.com/cmks/DAS_Tool/blob/master/src/Scaffolds2Bin_to_Fasta.sh).
If GraphBin could take in unmodified identifiers and do the conversion in the backend automatically, it would be much more practical to use in a complicated pipeline where similar versions of the required files are already generated. In an ideal scenario, it would be awesome to have DAS_Tool take in the pre/post GraphBin binning results to calculate a consensus for everything w/ as few steps as possible.
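A minimal sketch of the conversion requested above, assuming a two-column, tab-separated [scaffold]\t[bin] file like the ones DAS_Tool consumes. Bin labels (e.g. maxbin.001) are renumbered from 1 in order of first appearance, and contig identifiers are left unmodified. The function name scaffolds2bin_to_graphbin is hypothetical; this is not how GraphBin or prepResult.py implement it.

```python
import csv
import io


def scaffolds2bin_to_graphbin(tsv_text):
    """Convert '[scaffold]\\t[bin]' lines into 'contig,bin_number' CSV text.

    Hypothetical helper: bin labels are mapped to integers starting from 1
    in order of first appearance; contig identifiers are kept unchanged.
    Returns (csv_text, label_to_number).
    """
    label_to_number = {}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in tsv_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        contig, label = line.split("\t")  # expects exactly two columns
        if label not in label_to_number:
            label_to_number[label] = len(label_to_number) + 1
        writer.writerow([contig, label_to_number[label]])
    return out.getvalue(), label_to_number


tsv = (
    "NODE_14_length_9160_cov_4.404833\tmaxbin.001\n"
    "NODE_15_length_7837_cov_3.578386\tmaxbin.002\n"
    "NODE_16_length_7509_cov_4.024014\tmaxbin.001\n"
)
csv_text, mapping = scaffolds2bin_to_graphbin(tsv)
print(csv_text, end="")
print(mapping)
```

Keeping the returned label-to-number mapping makes it possible to translate the refined numeric bins back to the original bin labels before handing them to a tool such as DAS_Tool.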
from graphbin.
Dear @jolespin,
Thank you for posting the issue with the relevant data. I went through your files and found that the binning result file scaffolds_to_bins.csv is not in the format GraphBin requires as input. As the documentation on input format outlines, the binning output file should contain comma-separated values (contig_identifier, bin_number) for each contig. contig_identifier should follow the specified pattern, and bin_number should be a number without any letters or other characters.
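The format rules above can be checked before running GraphBin. Below is a minimal sketch, assuming the file has exactly two comma-separated fields per line and no header; it also flags bin numbers below 1, since the thread notes that bin numbering should start from 1. The function check_binning_csv is hypothetical and only approximates GraphBin's own input parsing.

```python
import csv
import io


def check_binning_csv(csv_text):
    """Return a list of problems found in 'contig_identifier,bin_number' text.

    Hypothetical pre-flight check, not part of GraphBin.
    """
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != 2:
            problems.append(f"line {line_no}: expected 2 fields, got {len(row)}")
            continue
        contig, bin_number = row
        if not bin_number.isdecimal():
            # catches header rows such as 'contig_id,cluster_id'
            # and alphanumeric labels such as 'bin.3'
            problems.append(f"line {line_no}: bin number {bin_number!r} is not a plain number")
        elif int(bin_number) < 1:
            problems.append(f"line {line_no}: bin numbers should start from 1")
    return problems


for problem in check_binning_csv("contig_id,cluster_id\nNODE_1,0\nNODE_2,bin.3\n"):
    print(problem)
```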
GraphBin provides a support script named prepresult.py to generate files in this format once the initial binning output folder is provided. You can refer to support/README.md for more details.
Let me know if this does not solve the issue.
I also fixed the --assembler argument, so you can now provide SPAdes as well.
Thank you!
Thanks for getting back to me. I just tried it with the bins relabeled and got the same error.
(graphbin_env) jespinozlt2-osx:GraphBin jespinoz$ head /Users/jespinoz/Downloads/files/scaffolds_to_bins.relabeled.csv
NODE_14_length_9160_cov_4.404833,0
NODE_15_length_7837_cov_3.578386,0
NODE_16_length_7509_cov_4.024014,0
NODE_17_length_7205_cov_3.142238,0
NODE_18_length_6867_cov_2.992073,0
NODE_19_length_6076_cov_3.252782,0
NODE_20_length_5936_cov_3.975514,0
NODE_21_length_5583_cov_3.496020,0
NODE_22_length_5539_cov_3.360686,0
NODE_24_length_5345_cov_3.093384,0
Most binning programs output alphanumeric bin labels, so it seems a bit more difficult to integrate this into existing pipelines. Does prepresult.py format the graph and contigs/scaffolds as well?
As the documentation on input format outlines, the (contig_identifier, bin_number) pairs should be in the form
NODE_1,1
NODE_2,1
NODE_3,1
NODE_4,2
NODE_5,2
...
Note that the numbering of bins should start from 1. prepresult.py handles all these cases and outputs the binning result as shown. You can first run the prepresult.py script as
python prepResult.py --binned /path/to/folder_with_binning_result --assembler spades --output /path/to/output_folder
Since MaxBin2 writes each bin to a separate .fasta file, you can provide the path to the folder containing these .fasta files to the --binned option.
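To illustrate the kind of conversion prepResult.py performs for per-bin FASTA output, here is a minimal sketch. It is not the script's actual code: it assumes one bin per FASTA file, numbers bins from 1 in sorted filename order, takes each contig ID from its header up to the first whitespace, and ignores files whose extension is not in the (assumed) extension list. Both function names are hypothetical.

```python
from pathlib import Path


def bins_from_fasta_folder(folder, extensions=(".fasta", ".fa", ".fna")):
    """Map each contig to a bin number, one bin per FASTA file in `folder`.

    Hypothetical helper: bins are numbered from 1 in sorted filename order;
    contig IDs come from FASTA headers, up to the first whitespace.
    """
    rows = []
    fasta_files = sorted(
        p for p in Path(folder).iterdir() if p.is_file() and p.suffix in extensions
    )
    for bin_number, path in enumerate(fasta_files, start=1):
        with path.open() as handle:
            for line in handle:
                if line.startswith(">"):
                    fields = line[1:].split()
                    if fields:  # skip empty headers
                        rows.append((fields[0], bin_number))
    return rows


def write_graphbin_csv(rows, out_path):
    """Write (contig, bin_number) pairs as 'contig,bin_number' lines, no header."""
    with open(out_path, "w") as out:
        for contig, bin_number in rows:
            out.write(f"{contig},{bin_number}\n")
```

Numbering by sorted filename keeps the bin numbers stable across runs; MaxBin2-style names such as out.001.fasta and out.002.fasta sort in bin order.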
After getting the formatted binning result, you can input it to GraphBin.
Let me know if it still doesn't work.
That would be a great option. I will modify GraphBin to support a more generic convention for inputs and outputs, such as the one you mentioned ([scaffold]\t[bin]), where scaffolds/contigs and bins keep their original identifiers, and release a new version of GraphBin.
Thank you very much for the suggestion! I will keep this issue open until I fix it. In the meantime, please let me know if it still doesn't work.
Thank you!
Hello, I have a similar issue. GraphBin reports that my binning file (attached) contains two bins. However, I can confirm that my binning file has only one bin number.
```
2021-04-14 17:44:53,372 - INFO - Welcome to GraphBin: Refined Binning of Metagenomic Contigs using Assembly Graphs.
2021-04-14 17:44:53,375 - INFO - This version of GraphBin makes use of the assembly graph produced by SPAdes which is based on the de Bruijn graph approach.
2021-04-14 17:44:53,376 - INFO - Input arguments:
2021-04-14 17:44:53,377 - INFO - Assembly graph file: /home/users/s/saini7/scratch/MS2/Anvio2/Spades_13_15p5_Map_GBII/assembly_graph_after_simplification.gfa
2021-04-14 17:44:53,378 - INFO - Contig paths file: /home/users/s/saini7/scratch/MS2/Anvio2/Spades_13_15p5_Map_GBII/contigs.paths
2021-04-14 17:44:53,379 - INFO - Existing binning output file: /home/users/s/saini7/scratch/MS2/Anvio2/conc2/spa_concoct2/clustering_merged_mod.csv
2021-04-14 17:44:53,381 - INFO - Final binning output file: ./
2021-04-14 17:44:53,382 - INFO - Maximum number of iterations: 100
2021-04-14 17:44:53,383 - INFO - Difference threshold: 0.1
2021-04-14 17:44:53,384 - INFO - GraphBin started
2021-04-14 17:44:53,393 - INFO - Number of bins available in the initial binning result: 2
2021-04-14 17:44:53,395 - INFO - Constructing the assembly graph
2021-04-14 17:44:53,700 - INFO - Total number of contigs available: 2655
2021-04-14 17:44:57,181 - INFO - Total number of edges in the assembly graph: 10554
2021-04-14 17:44:57,184 - INFO - Obtaining the initial binning result
2021-04-14 17:44:57,188 - ERROR - Please make sure that you have provided the correct assembler type and the correct path to the binning result file in the correct format.
2021-04-14 17:44:57,188 - INFO - Exiting GraphBin... Bye...!
```
Hello @JSSaini,
Your binning file has a header line, contig_id,cluster_id, which should not be included. This is why GraphBin counts cluster_id and 43 as 2 bins. I also noticed that your contig IDs are in the form c_00000000000*. These contig IDs should be in the form NODE_*, as found in your contigs.paths file. You can double-check that as well.
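Both problems can be caught with a quick check. The sketch below is an assumption-laden illustration, not GraphBin code: it collects contig names from lines of contigs.paths that start with NODE_ (stripping the trailing apostrophe on reverse-complement entries), drops a first line whose bin column is not a number, and lists binning IDs that do not appear among those names. It compares against the full names as written in contigs.paths; the examples earlier in this thread use shortened NODE_<n> IDs, so adjust the comparison to the GraphBin version in use. Both function names are hypothetical.

```python
def contig_names_from_paths(paths_text):
    """Collect contig names from SPAdes contigs.paths text.

    Assumes name lines start with 'NODE_'; reverse-complement names end
    with an apostrophe, which is stripped so both strands map to one name.
    """
    names = set()
    for line in paths_text.splitlines():
        if line.startswith("NODE_"):
            names.add(line.strip().rstrip("'"))
    return names


def unknown_contigs(binning_text, known_names):
    """Return contig IDs from 'contig,bin' text that are not in known_names.

    A first line whose bin column is not a number (e.g. 'contig_id,cluster_id')
    is treated as a header and skipped.
    """
    lines = [line for line in binning_text.splitlines() if line.strip()]
    if lines and not lines[0].split(",")[-1].strip().isdecimal():
        lines = lines[1:]  # drop header row
    ids = [line.split(",")[0] for line in lines]
    return [contig for contig in ids if contig not in known_names]
```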
GraphBin provides a support script named prepresult.py to generate the binning file once the initial binning output folder is provided. You can refer to support/README.md for more details.
Let me know if this does not solve the issue.
Thank you!
I still get the same error, even though this time I made sure I provided the appropriate binning file. Have you tried this tool with CONCOCT binning results? Thank you.
conc_graph.txt
```
2021-04-16 13:27:36,494 - INFO - Welcome to GraphBin: Refined Binning of Metagenomic Contigs using Assembly Graphs.
2021-04-16 13:27:36,591 - INFO - This version of GraphBin makes use of the assembly graph produced by SPAdes which is based on the de Bruijn graph approach.
2021-04-16 13:27:36,593 - INFO - Input arguments:
2021-04-16 13:27:36,595 - INFO - Assembly graph file: /home/users/s/saini7/scratch/MS2/Anvio2/Spades_13_15mw_Map_GBII/assembly_graph_with_scaffolds.gfa
2021-04-16 13:27:36,596 - INFO - Contig paths file: /home/users/s/saini7/scratch/MS2/Anvio2/Spades_13_15mw_Map_GBII/contigs.paths
2021-04-16 13:27:36,597 - INFO - Existing binning output file: /home/users/s/saini7/scratch/MS2/Anvio2/conc2/spa_concoct2/conc_graph.csv
2021-04-16 13:27:36,599 - INFO - Final binning output file: ./genomes/
2021-04-16 13:27:36,600 - INFO - Maximum number of iterations: 100
2021-04-16 13:27:36,602 - INFO - Difference threshold: 0.1
2021-04-16 13:27:36,603 - INFO - GraphBin started
2021-04-16 13:27:36,609 - INFO - Number of bins available in the initial binning result: 1
2021-04-16 13:27:36,610 - INFO - Constructing the assembly graph
2021-04-16 13:27:36,852 - INFO - Total number of contigs available: 2845
2021-04-16 13:27:37,584 - INFO - Total number of edges in the assembly graph: 11581
2021-04-16 13:27:37,585 - INFO - Obtaining the initial binning result
2021-04-16 13:27:37,589 - ERROR - Please make sure that you have provided the correct assembler type and the correct path to the binning result file in the correct format.
2021-04-16 13:27:37,590 - INFO - Exiting GraphBin... Bye...!
```
Hello @JSSaini,
Sorry for the late reply. I fixed some errors in the indexing of the input bins. Can you please run git pull to get the latest GraphBin code and check your results? It should work fine now.
Let me know if this does not solve the issue.
Thank you!
PS: I haven't published a new release with this fix yet, as I plan to add some more fixes. I will publish a new release including all the fixes ASAP.
I have added a new release of GraphBin with fixes to this issue. Please give it a try when you have time.
https://github.com/Vini2/GraphBin/releases/tag/v1.4
Thank you very much for the input.
"Fixed naming of SPAdes contigs to have original contig identifiers and updated prepResult.py to reflect this fix." 🙏🏽 thank you! This will make it much easier to implement into existing pipelines
Closing issue after fixing.