Comments (5)
I figure that because it's dependent on what data you create the db with, we can't parse too tightly for groot. `gene_name`, `gene_symbol` and `reference_accession` are mandatory, but as we can't guarantee input formatting we should probably just sling the same thing into `gene_symbol`, `gene_name` and `reference_accession`.
`3003470` is the ARO accession, and I think the other numbers are related to indexed locations in the variation graph and clusters.
from hamronization.
Maybe @will-rowe can be of assistance here? :)
Heya. This looks like a great and much needed project! Groot hasn't received much love recently. What do you need? Sounds like @fmaguire is right though - as users can change the input DB, it is going to be hard to write a generic parser? Happy to make updates to groot if needed
Hey @will-rowe! Thanks for joining the discussion! Could you please clarify what is in groot's report? Maybe adding some headers to the tsv file would be a nice addition.. :P I've had quite a bit of trouble mapping it to our AMR spec (warning: this is very much WIP!). Both `gene symbol` and `gene name` are mandatory fields, and having duplicated information there feels a little bit like "cheating" to me. :P I would like to avoid that if possible.
Sorry @cimendes - dropped the ball here.
The report is a 4-column tsv (no header row) where you have:
- ARG name
- mapped read count
- ARG reference length
- CIGAR to describe reference coverage
I thought this was in the docs but I can't find it - sorry! Will add it. The ARG name is just lifted from whatever input was used for indexing. So in your linked example, that is just the header from the CARD-3.0.4 multifasta.
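For anyone writing a parser against the four columns listed above, here is a minimal sketch in Python. It assumes the report is a headerless, tab-separated file with exactly those four columns in that order; the `GrootHit` and `parse_groot_report` names and the sample row are made up for illustration and are not part of groot or hAMRonization.

```python
import csv
import io
from typing import Iterator, NamedTuple


class GrootHit(NamedTuple):
    """One row of the report (field names are hypothetical)."""
    arg_name: str          # header lifted from the indexed multifasta
    read_count: int        # mapped read count
    reference_length: int  # ARG reference length
    cigar: str             # CIGAR describing reference coverage


def parse_groot_report(handle) -> Iterator[GrootHit]:
    """Yield one GrootHit per non-empty row of a headerless 4-column TSV."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError(f"expected 4 columns, got {len(row)}: {row!r}")
        name, count, length, cigar = row
        yield GrootHit(name, int(count), int(length), cigar)


# Made-up row for illustration only; not taken from a real groot run.
example = io.StringIO("exampleGene.1\t25\t861\t861M\n")
hits = list(parse_groot_report(example))
```

The ARG name is kept as an opaque string here, since it is whatever header the indexed database happened to use.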
This does need improving, and groot seems to still be going strong, so I need to work on this. Open to suggestions on how, though. One way to do it could be to have a flag provided to the report subcommand which you can use to sanitise the report based on a database (CARD/resfinder). So it could look up the multifasta header against your AMR spec for CARD/resfinder. Is there a way to do this already? This also means that if a user didn't use CARD/resfinder, it would fall back to the old behaviour of just using the multifasta header. I'd update the report format to have consistent fields regardless, though (possibly just duplicating `gene symbol` and `gene name` if sanitisation wasn't possible/requested).
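The lookup-with-fallback idea could be sketched roughly as below. This is only an illustration under assumptions: the `to_spec_fields` function and the `lookup` table are hypothetical, the table contents are invented, and no such flag or mapping exists in groot today. The three output keys are the mandatory fields named earlier in this thread.

```python
from typing import Dict, Mapping


def to_spec_fields(header: str,
                   lookup: Mapping[str, Dict[str, str]]) -> Dict[str, str]:
    """Map a multifasta header to the three mandatory fields.

    `lookup` is a hypothetical per-database table (e.g. one built for CARD
    or resfinder). When the header is not in it, fall back to the old
    behaviour and reuse the raw header for every field.
    """
    known = lookup.get(header)
    if known is not None:
        return {
            "gene_symbol": known["gene_symbol"],
            "gene_name": known["gene_name"],
            "reference_accession": known["reference_accession"],
        }
    # Fallback: sanitisation not possible, so duplicate the header.
    return {
        "gene_symbol": header,
        "gene_name": header,
        "reference_accession": header,
    }


# Invented entries for illustration; not real database records.
demo_lookup = {
    "headerA": {
        "gene_symbol": "symA",
        "gene_name": "example gene A",
        "reference_accession": "ACC0001",
    }
}
sanitised = to_spec_fields("headerA", demo_lookup)
fallback = to_spec_fields("headerB", demo_lookup)
```

Either way the output always carries the same three keys, so downstream code does not need to know whether sanitisation happened.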