
Comments (9)

apcamargo commented on July 16, 2024

Thank you! I just executed it and it seems to be working fine

from graphbin.

apcamargo commented on July 16, 2024

No problems! Actually, it was quite obvious that there was no way of GraphBin associating the nodes with the contig names. I could have noticed that before.

One suggestion: add an option for the user to provide the regular FASTA as well as the FASTG, so that GraphBin can associate the two naming schemes. This way the user can run GraphBin on existing bins and wouldn't have to re-bin their metagenomes using the FASTG file.


Vini2 commented on July 16, 2024

Hello!

Thank you for raising this issue. The MEGAHIT version of GraphBin works only with assembly graphs in the standard GFA format. You must convert the fastg graph to gfa using fastg2gfa. We have mentioned this in our documentation.

Let us know if it still does not work.

In future, we will incorporate the functionality to directly input the MEGAHIT assembly to GraphBin.


apcamargo commented on July 16, 2024

Thanks for adding this to the documentation!

I'm now getting a different error.

2020-07-02 07:53:27,873 - INFO - Welcome to GraphBin: Refined Binning of Metagenomic Contigs using Assembly Graphs.
2020-07-02 07:53:27,873 - INFO - This version of GraphBin makes use of the assembly graph produced by MEGAHIT which is based on the de Bruijn graph approach.
2020-07-02 07:53:27,873 - INFO - Assembly graph file: ../assembly/hydrothermal_vent_lau_basin.graph.gfa
2020-07-02 07:53:27,873 - INFO - Existing binning output file: ../metabat_bins.csv
2020-07-02 07:53:27,873 - INFO - Final binning output file: ../graphbin_result/
2020-07-02 07:53:27,873 - INFO - Maximum number of iterations: 100
2020-07-02 07:53:27,874 - INFO - Difference threshold: 0.1
2020-07-02 07:53:27,874 - INFO - GraphBin started
2020-07-02 07:53:27,926 - INFO - Number of bins available in the initial binning result: 343
2020-07-02 07:53:27,927 - INFO - Constructing the assembly graph
2020-07-02 07:53:49,322 - INFO - Total number of contigs available: 2163552
2020-07-02 07:53:53,866 - INFO - Total number of edges in the assembly graph: 979926
2020-07-02 07:53:53,866 - INFO - Obtaining the initial binning result
2020-07-02 07:53:53,913 - ERROR - Please make sure that you have provided the correct assembler type and the correct path to the binning result file in the correct format.
2020-07-02 07:53:53,913 - INFO - Exiting GraphBin... Bye...!

The assembler type and the binning result file are correct. Is this caused by a difference between the naming schemes in the assembly (e.g., k141_2064713) and gfa (e.g., NODE_1_length_769_cov_1.0000_ID_1) files? I've noticed that the assemblies in the example data use the NODE naming scheme, but that's not MEGAHIT's default.


Vini2 commented on July 16, 2024

The contig IDs in MEGAHIT's assembly graph are used as they are (in the form NODE_1_length_301_cov_1.0000_ID_1).

As mentioned in the README file, each line of the initial binning result should have the form (NODE_num,Bin_num):

NODE_1,1
NODE_2,2
NODE_3,1
...

Bin IDs start from 1.

Can you check whether the format of the binning result is like this? You can prepare the initial binning result using the script prepResult.py (tested on MaxBin2, MetaBAT2, MetaWatt or BusyBee).
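
One way to run such a check is sketched below. This is only an illustration of the format described above, not part of GraphBin: the function name `check_binning_result` and the `NODE_<number>` pattern are assumptions made for this example.

```python
import csv
import io
import re

# Assumed pattern for the contig ID column (NODE_<number>), following the
# format shown above. This pattern is an illustration, not GraphBin's own check.
NODE_ID = re.compile(r"NODE_[0-9]+")
BIN_ID = re.compile(r"[0-9]+")


def check_binning_result(lines):
    """Return a list of (line_number, message) problems found in a binning result."""
    problems = []
    for line_no, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != 2:
            problems.append((line_no, "expected exactly two comma-separated fields"))
            continue
        contig_id, bin_id = row[0].strip(), row[1].strip()
        if not NODE_ID.fullmatch(contig_id):
            problems.append((line_no, f"contig ID {contig_id!r} is not of the form NODE_<number>"))
        if not BIN_ID.fullmatch(bin_id) or int(bin_id) < 1:
            problems.append((line_no, f"bin ID {bin_id!r} is not an integer >= 1"))
    return problems


# Hypothetical input: two valid rows, one k141-style ID, one bin ID of 0.
sample = io.StringIO("NODE_1,1\nNODE_2,2\nk141_2064713,3\nNODE_4,0\n")
for line_no, message in check_binning_result(sample):
    print(f"line {line_no}: {message}")
```

Passing an open file handle instead of the `io.StringIO` sample would check a real binning result file in the same way.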

Let us know if it still does not work.


apcamargo commented on July 16, 2024

I executed MetaBAT using the output FASTA file from MEGAHIT, which has the k141_* naming scheme. Because the documentation says "Contigs are named according to their original identifier and the numbering of bins starts from 1", I thought I should keep this naming scheme in the binning result file.

I prepared the binning result file manually because prepResult.py threw an error:

Contig naming does not match with the assembler type provided. Please make sure to provide the correct assembler type.

Now I understand that this was due to the contig name scheme.

So, GraphBin expects the binning to be performed with the fastg file instead of the regular FASTA?


Vini2 commented on July 16, 2024

You have understood correctly. GraphBin expects the initial binning to be performed on the sequences in the fastg file, not on the final fasta file with the k<>_ naming scheme.

We apologise for the confusing pipeline. I will keep the issue open and let you know as soon as it is fixed.

Thank you very much for raising this issue!


Vini2 commented on July 16, 2024

Hello @apcamargo,

I have fixed the confusion over contig IDs in the MEGAHIT version. It now takes the contigs file as an additional input and uses it to map the k<>_* IDs to the NODE_* IDs in the .gfa assembly graph file. I have also updated the prepResult.py script to support the k<>_* naming scheme.

Commit ID: dc6288c0729d85fd131c7c9a62b4160ec9de9423.
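
For readers who want to see what such a mapping can look like, here is a minimal sketch. It is not taken from the commit above. It assumes that each segment sequence in the .gfa file is identical to the corresponding contig sequence in the contigs file, and all function names are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (contig_id, sequence) pairs from a FASTA file handle."""
    contig_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if contig_id is not None:
                yield contig_id, "".join(chunks)
            # The ID is the first whitespace-delimited token after ">".
            header = line[1:].split()
            contig_id, chunks = (header[0] if header else ""), []
        elif line:
            chunks.append(line)
    if contig_id is not None:
        yield contig_id, "".join(chunks)


def read_gfa_segments(handle):
    """Yield (segment_name, sequence) pairs from the S lines of a GFA file."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            yield fields[1], fields[2]


def map_contig_ids(fasta_handle, gfa_handle):
    """Map FASTA contig IDs to GFA segment names by exact sequence identity.

    Assumption: matching sequences are identical strings. Reverse complements
    and duplicated sequences are not handled in this sketch.
    """
    name_by_sequence = {seq.upper(): name for name, seq in read_gfa_segments(gfa_handle)}
    mapping = {}
    for contig_id, seq in read_fasta(fasta_handle):
        name = name_by_sequence.get(seq.upper())
        if name is not None:
            mapping[contig_id] = name
    return mapping


# Hypothetical toy data in the two naming schemes discussed in this thread.
fasta = io.StringIO(">k141_2 flag=1 len=8\nACGTACGT\n>k141_7 flag=1 len=8\nTTTTGGGG\n")
gfa = io.StringIO(
    "S\tNODE_1_length_8_cov_1.0000_ID_1\tACGTACGT\n"
    "S\tNODE_2_length_8_cov_1.0000_ID_3\tTTTTGGGG\n"
)
print(map_contig_ids(fasta, gfa))
```

With a mapping like this, a binning result keyed by k<>_* IDs could be rewritten in terms of the NODE_* IDs used in the graph.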

Thank you!


Vini2 commented on July 16, 2024

Closing this issue now that it has been fixed.

