oss-lab / metqy

Repository for R package MetQy (read related publication here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6247936/)

License: Other

R 99.10% TeX 0.90%
r-package kegg metabolism data-mining

metqy's People

Contributors

asmvernon, osoyer

metqy's Issues

Information request regarding file upload

Dear Andreas,
I am trying to use MetQy following the manual provided, but I am a little lost because I am not very good at using R. The software is properly installed on my computer; I followed the examples and they produce results (except that some figures, e.g. the sunburst, are missing their text).
Basically, I have some genomes annotated using KEGG and I would simply like to run “query_genomes_to_modules” with the “user-specified gene sets”.
My input file has a header and is organized as you suggest in the example:

ID ORG_ID ORGANISM KOs ECs
T09999 aaa A K00013;K00014;K00018;… “empty field”

Tabs separate the fields (ID ORG_ID ORGANISM KOs ECs) in the header and in the first line. Is this correct? I do not have EC numbers (empty field), only KEGG IDs for genes.
Could you please provide some minimal command lines to do the following:
1- Import the file into R so that it is usable by your software;
2- Calculate the module completion fraction (mcf) for all the modules;
3- Export to a text file the mcf values obtained for all the pathways.
Thanks a lot in advance for your help.
Sincerely
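
The three steps requested above can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the maintainers' recommended workflow: the file contents are the two example lines from the post, the commented-out MetQy call (argument names `GENOME_ID_COL`, `GENES_COL`, `splitBy` and the `$MATRIX` element) is an assumption to be checked against `?query_genomes_to_modules`, and the `mcf` matrix is a made-up stand-in so that the import and export steps run without the package.

```r
# Hypothetical two-line input file matching the layout described in the post.
input_file <- tempfile(fileext = ".tsv")
writeLines(c(
  "ID\tORG_ID\tORGANISM\tKOs\tECs",
  "T09999\taaa\tA\tK00013;K00014;K00018\t"
), input_file)

# 1. Import: read every column as character so the empty ECs field stays "".
genomes <- read.delim(input_file, header = TRUE, sep = "\t",
                      colClasses = "character", stringsAsFactors = FALSE)

# 2. Calculate mcf. ASSUMED call, verify the argument names in the manual:
# res <- MetQy::query_genomes_to_modules(genomes,
#                                        GENOME_ID_COL = "ID",
#                                        GENES_COL = "KOs",
#                                        splitBy = "[;]")
# mcf <- res$MATRIX
# Stand-in with placeholder values (genomes in rows, modules in columns):
mcf <- matrix(c(1, 0.5), nrow = 1,
              dimnames = list(genomes$ID, c("M00001", "M00002")))

# 3. Export: tab-separated text file, row names kept as the first column.
out_file <- tempfile(fileext = ".tsv")
write.table(mcf, out_file, sep = "\t", quote = FALSE, col.names = NA)
```

Reading with `colClasses = "character"` avoids the empty ECs column being converted to logical `NA` values.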

how to import the blastkoala results to MetQy

Hi, I have some bacterial genomes (Vibrio vulnificus strains). They were assembled using SPAdes; I then used the resulting contigs to generate a .faa file with Prokka, and used that file to generate a list of KOs in BlastKOALA.

My question is: is it possible to analyze these data with MetQy? Does anyone have a pipeline, or an example for bacterial genomes?

Thanks so much
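
One possible way to bridge the two tools is to collapse the BlastKOALA assignments into the one-row-per-genome layout used in the first issue on this page (ID, ORG_ID, ORGANISM, KOs, ECs). The sketch below assumes the BlastKOALA download is a two-column, tab-separated list of gene ID and K number, with unannotated genes having no second column; the gene IDs, KO assignments, and genome labels are hypothetical.

```r
# Hypothetical BlastKOALA-style lines: gene ID, then (optionally) a K number.
koala_lines <- c(
  "gene_0001\tK00013",
  "gene_0002",
  "gene_0003\tK00014",
  "gene_0004\tK00013"
)

# Split each line on the tab and keep the K number when one is present.
fields <- strsplit(koala_lines, "\t", fixed = TRUE)
kos <- vapply(fields,
              function(x) if (length(x) >= 2) x[2] else NA_character_,
              character(1))

# Drop unannotated genes and duplicates, then join with ";".
kos <- sort(unique(kos[!is.na(kos) & nzchar(kos)]))

# One row per genome; the ID/ORG_ID/ORGANISM labels are placeholders.
genome_row <- data.frame(
  ID = "strain_1",
  ORG_ID = "vv1",
  ORGANISM = "Vibrio vulnificus strain 1",
  KOs = paste(kos, collapse = ";"),
  ECs = "",
  stringsAsFactors = FALSE
)
```

Rows built this way for several strains can be combined with `rbind()` into a single data frame.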

Sharing user question about mcf calculation

Original message

I have a couple of questions, and I was wondering if you could provide some clues. I couldn't find detailed information about the module completeness fraction calculation. Do you have any specification available? I browsed your code but got lost. For instance, what happens when a module is defined as (say) K0001+K0002 and only one of the genes is found: mcf = 0.5 or mcf = 0? I have doubts about how to interpret the index in different scenarios. I have a case where I can find either one or both genes in all my genomes, but I still get an mcf = 0 for some of them. I was expecting 0.5 for those which only have one of them. How is this fraction calculated?
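
One reading that is consistent with the observed mcf = 0 can be sketched as follows. In KEGG module notation, "+" joins the components of a complex, "," separates alternatives, and a space separates blocks (steps). If a block counts only when it is fully satisfied, then K0001+K0002 with one gene present is a single incomplete block. The functions `block_complete` and `mcf_flat` are hypothetical helpers that handle only flat definitions without parentheses; they are not MetQy's implementation.

```r
# A block is complete if at least one comma-separated alternative has
# all of its "+"-joined components present.
block_complete <- function(block, present) {
  alternatives <- strsplit(block, ",", fixed = TRUE)[[1]]
  any(vapply(alternatives, function(alt) {
    all(strsplit(alt, "+", fixed = TRUE)[[1]] %in% present)
  }, logical(1)))
}

# Fraction of space-separated blocks that are complete
# (flat definitions only, no parentheses).
mcf_flat <- function(definition, present) {
  blocks <- strsplit(definition, " ", fixed = TRUE)[[1]]
  mean(vapply(blocks, block_complete, logical(1), present = present))
}

mcf_flat("K0001+K0002", present = "K0001")  # one block, incomplete: 0
mcf_flat("K0001 K0002", present = "K0001")  # two blocks, one complete: 0.5
mcf_flat("K0001,K0002", present = "K0001")  # one block, alternative met: 1
```

Under this reading, 0.5 would be expected only if the two genes formed separate blocks rather than one complex.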

Manual curation of used KEGG database

Dear Andreas,
I would like to use your tool to analyse the metabolic potential of MAGs. I have tested it already and found out that a few modules are missing. Unfortunately, I do not have access to the KEGG ftp but I wonder if I could make manual modifications (adding modules) to the existing KEGG database file that is used in the package.
Do you think that is possible?

Many thanks
Julia

KEGG module completeness estimation is wrong

I have come across a problem in the way this package evaluates the completeness of KEGG pathway modules, given a number of KOs or something comparable. Specifically, this refers to the function query_missingGenes_from_module.R, but it might also be present in some of the other KEGG-module related functions.

The issue arises from how the function splits modules into blocks, based on spaces:

### SEARCH BLOCKS ----
  block_defs  <- strsplit(DEFINITION[index], split = " ")[[1]]
  nBlocks     <- length(block_defs)

This problem does not arise with every module; it only occurs in more complicated modules which have "nested" blocks (for lack of a better word). For example, module 2 (https://www.genome.jp/kegg-bin/show_module?M00002) leads to this issue. The function in this package comes up with 6 blocks (nBlocks), while there are only 5 blocks, according to my understanding of KEGG module definitions. This also matches the 5 blocks assumed in KEGG Mapper. It must have something to do with the above chunk ignoring the presence of "(" or ")", i.e. when spaces occur WITHIN a block instead of separating blocks.

This problem should lead to a systematic underestimation of the completeness of pathways, when it inflates the number of blocks in a module. Adjusting the way the blocks are split to only split actual blocks (spaces outside of any "(" or ")"; something that can be done with regex I guess) should solve this issue.
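
The suggested fix can be sketched without a regex by tracking parenthesis depth and splitting only on spaces at depth zero. `split_top_level` is a hypothetical helper, not part of the package, and the definition string is an assumption meant to resemble M00002; check it against the KEGG page before relying on it.

```r
# Split a module definition on spaces that lie outside all parentheses.
split_top_level <- function(definition) {
  chars <- strsplit(definition, "")[[1]]
  # Nesting depth at each character position.
  depth <- cumsum(chars == "(") - cumsum(chars == ")")
  cut <- which(chars == " " & depth == 0)
  starts <- c(1, cut + 1)
  ends <- c(cut - 1, length(chars))
  blocks <- substring(definition, starts, ends)
  blocks[nzchar(blocks)]
}

# Assumed definition with one nested block that contains a space.
def <- "K01803 ((K00134,K00150) K00927,K11389) (K01834,K15633,K15634,K15635) K01689 (K00873,K12406)"

naive_blocks <- strsplit(def, split = " ")[[1]]  # what the quoted chunk does
top_blocks <- split_top_level(def)

length(naive_blocks)  # 6
length(top_blocks)    # 5
```

The nested block `((K00134,K00150) K00927,K11389)` stays in one piece, so it would still need its own evaluation of the inner space-separated sequence.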

using query_missingGenes_from_module on multiple modules

Dear Andreas,
I am trying to use your script query_missingGenes_from_module to spot missing genes in a module, but I need to run it repeatedly over multiple modules (from a provided list) and then write the results to a file. When I run it on a single module I have no problem, but when I put my list in a vector and loop over its elements, using each one as my "ID_MODULE", I cannot manage to format the output correctly.

Any hints? Thank you in advance, and for providing this amazing resource.
Sincerely,
Arianna
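
A common pattern for this kind of loop is to build one small data frame per module and bind them at the end, so that the output has a fixed set of columns regardless of how many genes are missing. In the sketch below, `find_missing` is a hypothetical stand-in that returns made-up placeholder values; it would be replaced by the real query_missingGenes_from_module call, whose arguments and return structure are not shown here and should be taken from its help page.

```r
# Stand-in for the real query; returns a character vector of missing KOs.
# The values are made-up placeholders, not real module content.
find_missing <- function(module_id) {
  lookup <- list(
    M00001 = c("K00844", "K01810"),
    M00002 = character(0),
    M00003 = "K01596"
  )
  lookup[[module_id]]
}

module_ids <- c("M00001", "M00002", "M00003")

# One row per module: the missing genes are collapsed into a single field.
rows <- lapply(module_ids, function(id) {
  missing <- find_missing(id)
  data.frame(
    MODULE_ID = id,
    N_MISSING = length(missing),
    MISSING = paste(missing, collapse = ";"),
    stringsAsFactors = FALSE
  )
})
result <- do.call(rbind, rows)

# Write a tab-separated table with one line per module.
out_file <- tempfile(fileext = ".tsv")
write.table(result, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Collapsing the missing genes with ";" keeps modules with zero, one, or many missing genes in the same tabular layout.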
