
ERROR - Please make sure that you have provided the correct assembler type and the correct path to the binning result file in the correct format.

metagentools commented on September 24, 2024
ERROR - Please make sure that you have provided the correct assembler type and the correct path to the binning result file in the correct format.

from graphbin.

Comments (12)

jolespin commented on September 24, 2024

Thanks for including the prepResult.py to preprocess everything.
I'll give it a try. So it truncates all of the contig identifiers?

In the future would it be possible to make this a bit more seamless to integrate into existing pipelines? For example, I'm running metabat2 and maxbin2 then I feed everything into DAS_Tool which takes in [scaffold]\t[bin] tsv files with the identifiers unmodified (created using https://github.com/cmks/DAS_Tool/blob/master/src/Scaffolds2Bin_to_Fasta.sh).

If graphbin could take in unmodified identifiers and do the conversion in the backend automatically it would make it much more practical to use in a complicated pipeline where similar versions to the required files are already generated. In an ideal scenario, it would be awesome to have DAS_Tool take in the pre/post graphbin binning for it to calculate a consensus for everything w/ as few steps as possible.

Vini2 commented on September 24, 2024

Dear @jolespin,

Thank you for posting the issue with the relevant data. I went through your files and found that the binning results file scaffolds_to_bins.csv is not in the format required as input for GraphBin. As the documentation on input format outlines, the binning output file should have comma separated values (contig_identifier, bin_number) for each contig. contig_identifier should follow the pattern specified and bin_number should be a number without any letters or other characters.

GraphBin provides a support script named prepResult.py that generates files in this format from the initial binning output folder. You can refer to support/README.md for more details.

Let me know if this won't solve the issue.

I also fixed the --assembler argument and now you can provide SPAdes as well.

Thank you!

jolespin commented on September 24, 2024

Thanks for getting back to me. I just tried it with the bins relabeled and got the same error.

(graphbin_env) jespinozlt2-osx:GraphBin jespinoz$ head /Users/jespinoz/Downloads/files/scaffolds_to_bins.relabeled.csv
NODE_14_length_9160_cov_4.404833,0
NODE_15_length_7837_cov_3.578386,0
NODE_16_length_7509_cov_4.024014,0
NODE_17_length_7205_cov_3.142238,0
NODE_18_length_6867_cov_2.992073,0
NODE_19_length_6076_cov_3.252782,0
NODE_20_length_5936_cov_3.975514,0
NODE_21_length_5583_cov_3.496020,0
NODE_22_length_5539_cov_3.360686,0
NODE_24_length_5345_cov_3.093384,0

Most of the binning programs output alphanumeric strings so it seems a bit more difficult to integrate this into existing pipelines. Does prepresult.py format the graph and contigs/scaffolds as well?

Vini2 commented on September 24, 2024

As the documentation on input format outlines, the (contig_identifier, bin_number) pairs should be in the form

NODE_1,1
NODE_2,1
NODE_3,1
NODE_4,2
NODE_5,2
...

Note that the numbering of bins should start from 1
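For readers who already have a [scaffold]\t[bin] file from another pipeline step, the format described above can be approximated with a few lines of Python. This is only a minimal sketch, not the prepResult.py script: the function name `to_graphbin_rows` and the bin labels in the example (`maxbin.001`, `metabat.7`) are hypothetical, and it assumes SPAdes-style identifiers that begin with `NODE_<number>`.

```python
import csv
import io
import re

# Matches the leading "NODE_<number>" part of a SPAdes contig name.
NODE_PREFIX = re.compile(r"^(NODE_\d+)")


def to_graphbin_rows(tsv_text):
    """Convert scaffold<TAB>bin lines into (NODE_n, bin_number) pairs."""
    bin_numbers = {}  # original bin label -> integer, in order of first appearance
    rows = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        scaffold, bin_label = row
        match = NODE_PREFIX.match(scaffold)
        if match is None:
            raise ValueError(f"not a SPAdes-style identifier: {scaffold!r}")
        # Bin numbering starts from 1, as noted above.
        number = bin_numbers.setdefault(bin_label, len(bin_numbers) + 1)
        rows.append((match.group(1), number))
    return rows


# Hypothetical input: real contig names from this thread, made-up bin labels.
example = (
    "NODE_14_length_9160_cov_4.404833\tmaxbin.001\n"
    "NODE_15_length_7837_cov_3.578386\tmaxbin.001\n"
    "NODE_16_length_7509_cov_4.024014\tmetabat.7\n"
)
for contig, number in to_graphbin_rows(example):
    print(f"{contig},{number}")
```

For the three example lines this prints `NODE_14,1`, `NODE_15,1` and `NODE_16,2`. Bin numbers are assigned in order of first appearance, so the mapping back to the original bin labels would need to be saved separately if it matters downstream.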

prepResult.py handles all these cases and outputs the binning result as shown. You can first run the script as follows:

python prepResult.py --binned /path/to/folder_with_binning_result --assembler spades --output /path/to/output_folder

Since MaxBin2 outputs the binning result to .fasta files for each bin, you can provide the path containing these .fasta files to the --binned option.
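To make the idea of one .fasta file per bin concrete, here is a rough sketch of how (contig, bin) pairs could be collected from such a folder. It is not the code of prepResult.py; the function name `bins_from_fasta_folder` is hypothetical, and it assumes every bin file ends in `.fasta` and that the contig identifier is the first token of each header line.

```python
from pathlib import Path


def bins_from_fasta_folder(folder):
    """Return (contig_id, bin_number) pairs from a folder of per-bin .fasta files."""
    pairs = []
    # Sort the files so that bin numbers are reproducible between runs.
    fasta_files = sorted(Path(folder).glob("*.fasta"))
    for bin_number, fasta in enumerate(fasta_files, start=1):
        with fasta.open() as handle:
            for line in handle:
                if not line.startswith(">"):
                    continue  # sequence line, not a header
                fields = line[1:].split()
                if fields:
                    # The contig ID is the first whitespace-delimited token.
                    pairs.append((fields[0], bin_number))
    return pairs
```

Calling `bins_from_fasta_folder("/path/to/folder_with_binning_result")` would return pairs with the full contig names and bin numbers starting from 1; shortening the names to the form shown above would be a separate step.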

After getting the formatted binning result, you can input it to GraphBin.

Let me know if it still doesn't work.

Vini2 commented on September 24, 2024

That would be a great option. I will modify GraphBin to support a more generic input and output convention like the one you mentioned, [scaffold]\t[bin], where scaffolds/contigs and bins keep their original identifiers, and I will release it as a new version of GraphBin.

Thank you very much for the suggestion! I will keep this issue open until I fix it. In the meantime, please let me know if it still doesn't work.

Thank you!

JSSaini commented on September 24, 2024

Hello, I have a similar issue. I can see that it is reporting my binning file (file attached) as containing two bins. However, I can confirm that my binning file has just one bin number. Please see the file attached.

clustering_merged_mod.zip

2021-04-14 17:44:53,372 - INFO - Welcome to GraphBin: Refined Binning of Metagenomic Contigs using Assembly Graphs.
2021-04-14 17:44:53,375 - INFO - This version of GraphBin makes use of the assembly graph produced by SPAdes which is based on the de Bruijn graph approach.
2021-04-14 17:44:53,376 - INFO - Input arguments:
2021-04-14 17:44:53,377 - INFO - Assembly graph file: /home/users/s/saini7/scratch/MS2/Anvio2/Spades_13_15p5_Map_GBII/assembly_graph_after_simplification.gfa
2021-04-14 17:44:53,378 - INFO - Contig paths file: /home/users/s/saini7/scratch/MS2/Anvio2/Spades_13_15p5_Map_GBII/contigs.paths
2021-04-14 17:44:53,379 - INFO - Existing binning output file: /home/users/s/saini7/scratch/MS2/Anvio2/conc2/spa_concoct2/clustering_merged_mod.csv
2021-04-14 17:44:53,381 - INFO - Final binning output file: ./
2021-04-14 17:44:53,382 - INFO - Maximum number of iterations: 100
2021-04-14 17:44:53,383 - INFO - Difference threshold: 0.1
2021-04-14 17:44:53,384 - INFO - GraphBin started
2021-04-14 17:44:53,393 - INFO - Number of bins available in the initial binning result: 2
2021-04-14 17:44:53,395 - INFO - Constructing the assembly graph
2021-04-14 17:44:53,700 - INFO - Total number of contigs available: 2655
2021-04-14 17:44:57,181 - INFO - Total number of edges in the assembly graph: 10554
2021-04-14 17:44:57,184 - INFO - Obtaining the initial binning result
2021-04-14 17:44:57,188 - ERROR - Please make sure that you have provided the correct assembler type and the correct path to the binning result file in the correct format.
2021-04-14 17:44:57,188 - INFO - Exiting GraphBin... Bye...!

Vini2 commented on September 24, 2024

Hello @JSSaini,

Your binning file has a header, contig_id,cluster_id, which should not be included. This is why GraphBin counts cluster_id and 43 as 2 bins. I also noticed that your contig ids are in the form c_00000000000*. These contig ids should be in the form NODE_*, as found in your contigs.paths file. You can double check that as well.
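Both problems described here can be spotted before running GraphBin with a quick check of the binning file. The following is a minimal sketch and not GraphBin's own parsing code: the function name `diagnose_binning_csv` is hypothetical, and the sample rows are made up to mimic the header and the `c_...` identifier shape mentioned above.

```python
import csv
import io


def diagnose_binning_csv(text, expected_prefix="NODE_"):
    """Return the distinct bin labels and a list of format problems."""
    bin_labels = set()
    problems = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            problems.append(f"line {line_no}: expected 2 columns, got {len(row)}")
            continue
        contig, label = row
        bin_labels.add(label)
        if not label.isdigit():
            # A header row such as contig_id,cluster_id ends up here.
            problems.append(f"line {line_no}: bin {label!r} is not a number")
        if not contig.startswith(expected_prefix):
            problems.append(f"line {line_no}: contig {contig!r} lacks {expected_prefix!r}")
    return bin_labels, problems


# Made-up sample with a header row and non-NODE_ contig ids.
sample = "contig_id,cluster_id\nc_000000000001,43\nc_000000000002,43\n"
labels, problems = diagnose_binning_csv(sample)
print(sorted(labels))
for problem in problems:
    print(problem)
```

With this sample the set of labels contains both `cluster_id` and `43`, which illustrates how a header row can be counted as an extra bin, and every row is flagged for a contig id that does not start with `NODE_`.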

GraphBin provides a support script named prepResult.py that generates the binning file from the initial binning output folder. You can refer to support/README.md for more details.

Let me know if this won't solve the issue.

Thank you!

JSSaini commented on September 24, 2024

Still the same error. This time I made sure I provided the appropriate binning file. Have you tried this tool with CONCOCT binning results? Thank you.
conc_graph.txt

2021-04-16 13:27:36,494 - INFO - Welcome to GraphBin: Refined Binning of Metagenomic Contigs using Assembly Graphs.
2021-04-16 13:27:36,591 - INFO - This version of GraphBin makes use of the assembly graph produced by SPAdes which is based on the de Bruijn graph approach.
2021-04-16 13:27:36,593 - INFO - Input arguments:
2021-04-16 13:27:36,595 - INFO - Assembly graph file: /home/users/s/saini7/scratch/MS2/Anvio2/Spades_13_15mw_Map_GBII/assembly_graph_with_scaffolds.gfa
2021-04-16 13:27:36,596 - INFO - Contig paths file: /home/users/s/saini7/scratch/MS2/Anvio2/Spades_13_15mw_Map_GBII/contigs.paths
2021-04-16 13:27:36,597 - INFO - Existing binning output file: /home/users/s/saini7/scratch/MS2/Anvio2/conc2/spa_concoct2/conc_graph.csv
2021-04-16 13:27:36,599 - INFO - Final binning output file: ./genomes/
2021-04-16 13:27:36,600 - INFO - Maximum number of iterations: 100
2021-04-16 13:27:36,602 - INFO - Difference threshold: 0.1
2021-04-16 13:27:36,603 - INFO - GraphBin started
2021-04-16 13:27:36,609 - INFO - Number of bins available in the initial binning result: 1
2021-04-16 13:27:36,610 - INFO - Constructing the assembly graph
2021-04-16 13:27:36,852 - INFO - Total number of contigs available: 2845
2021-04-16 13:27:37,584 - INFO - Total number of edges in the assembly graph: 11581
2021-04-16 13:27:37,585 - INFO - Obtaining the initial binning result
2021-04-16 13:27:37,589 - ERROR - Please make sure that you have provided the correct assembler type and the correct path to the binning result file in the correct format.
2021-04-16 13:27:37,590 - INFO - Exiting GraphBin... Bye...!

Vini2 commented on September 24, 2024

Hello @JSSaini,

Sorry for getting back to you late. I fixed some errors in the indexing of the input bins. Can you please pull the latest GraphBin code with git and check your results? It should work fine now.

Let me know if this won't solve the issue.

Thank you!

PS: I haven't added a new release yet with the fix as I plan to add some more fixes. I will add a new release including all the fixes ASAP.

Vini2 commented on September 24, 2024

Hello @jolespin and @JSSaini,

I have added a new release of GraphBin with fixes to this issue. Please give it a try when you have time.

https://github.com/Vini2/GraphBin/releases/tag/v1.4

Thank you very much for the input.

jolespin commented on September 24, 2024

"Fixed naming of SPAdes contigs to have original contig identifiers and updated prepResult.py to reflect this fix." 🙏🏽 thank you! This will make it much easier to implement into existing pipelines

Vini2 commented on September 24, 2024

Closing issue after fixing.
