
Groot parser implementation (hamronization, CLOSED)

pha4ge commented on August 28, 2024
Groot parser implementation

from hamronization.

Comments (5)

fmaguire commented on August 28, 2024

I figure that because the output depends on what data you create the db with, we can't parse too tightly for groot.

gene_name, gene_symbol, and reference_accession are mandatory, but as we can't guarantee input formatting we should probably just sling the same thing into gene_symbol, gene_name and reference_accession.

3003470 is the ARO accession and I think the other numbers are related to indexed locations in the variation graph and clusters.


cimendes commented on August 28, 2024

Maybe @will-rowe can be of assistance here? :)


will-rowe commented on August 28, 2024

Heya. This looks like a great and much needed project! Groot hasn't received much love recently. What do you need? Sounds like @fmaguire is right though - as users can change the input DB, it is going to be hard to write a generic parser? Happy to make updates to groot if needed


cimendes commented on August 28, 2024

Hey @will-rowe! Thanks for joining the discussion! Could you please clarify what is in groot's report? Maybe adding some headers to the tsv file would be a nice addition. :P I'm having quite a bit of trouble mapping it to our AMR spec. (Warning: this is very much WIP!). Both gene symbol and gene name are mandatory fields, and having duplicated information there feels a little bit to me like "cheating". :P I would like to avoid that if possible.


will-rowe commented on August 28, 2024

Sorry @cimendes - dropped the ball here.

The report is a 4-column tsv with the following columns, in order:

  • ARG name
  • mapped read count
  • ARG reference length
  • CIGAR to describe reference coverage

I thought this was in the docs but I can't find it - sorry! Will add it. The ARG name is just lifted from whatever input was used for indexing. So in your linked example, that is just the header from the CARD-3.0.4 multifasta.
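
For anyone wiring this into a parser, here is a minimal sketch of reading that four-column report in Python. It assumes the file is tab-separated with no header row, and that the CIGAR uses standard operations where `M` marks covered reference bases. The names `GrootHit`, `parse_groot_report` and `covered_fraction` are hypothetical and are not part of groot or hAMRonization, and the example row is made up.

```python
import csv
import io
import re
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class GrootHit:
    arg_name: str          # header lifted from the indexed multifasta
    read_count: int        # number of mapped reads
    reference_length: int  # length of the ARG reference
    cigar: str             # CIGAR describing reference coverage


def parse_groot_report(lines: Iterable[str]) -> Iterator[GrootHit]:
    """Yield one GrootHit per non-empty row of a headerless 4-column TSV."""
    # QUOTE_NONE: fasta headers may contain quote characters, keep them literal
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError(f"expected 4 columns, got {len(row)}: {row!r}")
        name, count, length, cigar = row
        yield GrootHit(name, int(count), int(length), cigar)


def covered_fraction(hit: GrootHit) -> float:
    """Fraction of the reference covered, counting only 'M' operations.

    Treating 'M' as the only "covered" operation is an assumption.
    """
    ops = re.findall(r"(\d+)([A-Z=])", hit.cigar)
    matched = sum(int(n) for n, op in ops if op == "M")
    if hit.reference_length == 0:
        return 0.0
    return matched / hit.reference_length


# Made-up example row, not real groot output
example = io.StringIO("some_arg_header\t12\t100\t90M10D\n")
hits = list(parse_groot_report(example))
```

Since the ARG name is whatever header was used at indexing time, the sketch keeps it as an opaque string rather than trying to split it into fields.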

This does need improving, and groot seems to still be going strong, so I need to work on this. Open to suggestions on how, though. One way to do it could be to have a flag provided to the report subcommand which you can use to sanitise the report based on a database (CARD/resfinder). So it could look up the multifasta header against your AMR spec for CARD/resfinder. Is there a way to do this already? This also means that if a user didn't use CARD/resfinder, it would fall back to the old behaviour of just using the multifasta header. I'd update the report format to have consistent fields regardless though (possibly just duplicating gene symbol and gene name if sanitisation wasn't possible/requested).
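
The lookup-with-fallback idea could look roughly like the sketch below. Everything here is an assumption for illustration: the function name `sanitise_name`, the shape of the lookup table, and its single made-up entry. A real table would have to be built from the CARD or resfinder release that was actually used for indexing.

```python
from typing import Dict, Optional

# Hypothetical lookup: multifasta header -> harmonised fields.
# The entry below is made up purely for illustration.
EXAMPLE_LOOKUP: Dict[str, Dict[str, str]] = {
    "example_header_1": {
        "gene_symbol": "exA",
        "gene_name": "example resistance gene A",
        "reference_accession": "EX000001",
    },
}


def sanitise_name(
    header: str,
    lookup: Optional[Dict[str, Dict[str, str]]] = None,
) -> Dict[str, str]:
    """Return the three mandatory fields for one multifasta header.

    If a lookup table is given and knows the header, use its values.
    Otherwise fall back to the old behaviour: reuse the raw header
    for every field so the report always has consistent columns.
    """
    if lookup is not None and header in lookup:
        return dict(lookup[header])  # copy so callers can't mutate the table
    return {
        "gene_symbol": header,
        "gene_name": header,
        "reference_accession": header,
    }


known = sanitise_name("example_header_1", EXAMPLE_LOOKUP)
unknown = sanitise_name("custom_db_header", EXAMPLE_LOOKUP)
```

With this shape, a user who indexed a custom database still gets the same three fields back, just with the header duplicated across them.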

