oss-lab / metqy

Repository for R package MetQy (read related publication here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6247936/)

License: Other

Languages: R 99.10%, TeX 0.90%

Topics: r-package, kegg, metabolism, data-mining

metqy's Issues

Information request regarding file upload

Dear Andreas,
I am trying to use MetQy following the manual provided, but I am a little lost because I am not very experienced with R. The software is properly installed on my computer, and the examples I followed produce results (except that some figures, e.g. the sunburst, are missing their text).
Basically, I have some genomes annotated using KEGG and I would simply like to run “query_genomes_to_modules” with the “user-specified gene sets”.
My input file has a header and is organized as you suggest in the example:

ID ORG_ID ORGANISM KOs ECs
T09999 aaa A K00013;K00014;K00018;… “empty field”

Tabs separate the fields (ID, ORG_ID, ORGANISM, KOs, ECs) in both the header and the first line. Is this correct? I do not have EC numbers (that field is empty), only KEGG IDs for genes.
Could you please provide some minimal commands to do the following:
1- Import the file into R so that it can be used by your software;
2- Calculate the module completion fraction (mcf) for all the modules;
3- Export the mcf values obtained for all the pathways to a text file.
Thanks a lot in advance for your help.
Sincerely
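
The three steps requested above can be sketched in base R as follows. This is a minimal sketch, not a confirmed recipe: the KO lists and the mcf values are made-up placeholders, and the commented MetQy call uses argument names (GENOME_ID_COL, GENES_COL) and an output element (MATRIX) that are assumptions to be checked against ?query_genomes_to_modules.

```r
# Toy input mirroring the layout in the question: tab-separated, with a header.
# The KO lists are made-up placeholders, not real annotations.
input_lines <- c(
  "ID\tORG_ID\tORGANISM\tKOs\tECs",
  "T09999\taaa\tA\tK00013;K00014;K00018\t",
  "T09998\tbbb\tB\tK00013;K00018\t"
)
input_file <- tempfile(fileext = ".tsv")
writeLines(input_lines, input_file)

# 1. Import: read every column as character so the empty ECs field stays "".
genomes <- read.delim(input_file, header = TRUE, sep = "\t",
                      colClasses = "character")

# 2. Module completion fraction: this is where the MetQy call would go.
#    Argument names and the MATRIX element are assumptions, not verified here.
# library(MetQy)
# res <- query_genomes_to_modules(genomes, GENOME_ID_COL = "ID", GENES_COL = "KOs")
# mcf <- res$MATRIX

# Stand-in matrix (genomes x modules) with invented values,
# so that the export step can run without MetQy installed.
mcf <- matrix(c(1, 0.5, 0.25, 0), nrow = 2,
              dimnames = list(genomes$ID, c("M00001", "M00002")))

# 3. Export: tab-separated text; col.names = NA leaves a blank header cell
#    above the row names so the columns line up when the file is reopened.
out_file <- tempfile(fileext = ".tsv")
write.table(mcf, file = out_file, sep = "\t", quote = FALSE, col.names = NA)
```

Reading with colClasses = "character" matters here because an entirely empty ECs column would otherwise be imported as logical NA rather than as empty strings.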

how to import the blastkoala results to MetQy

Hi, I have some bacterial genomes (Vibrio vulnificus strains). They were assembled using SPAdes, then I used the resulting contigs to generate a file.faa with Prokka, and then I used that file to generate a list of KOs in BlastKOALA.

My question is: is it possible to analyze these data with MetQy? Does anyone have a pipeline, or an example for bacterial genomes?

Thanks so much
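
One possible way to turn a per-gene KO assignment table into the semicolon-separated KO string used in the input layout from the first issue is sketched below in base R. It assumes a two-column, tab-separated table (gene ID, then KO, empty when no KO was assigned); the gene IDs, the genome identifier "strain_1", and the file layout are illustrative assumptions, not taken from the MetQy documentation.

```r
# Toy stand-in for a KO assignment table: gene ID, then KO (empty if unassigned).
# Gene IDs are hypothetical; the two-column layout is an assumption.
koala_lines <- c(
  "gene_0001\tK00013",
  "gene_0002\t",
  "gene_0003\tK00014",
  "gene_0004\tK00013"
)
koala_file <- tempfile(fileext = ".txt")
writeLines(koala_lines, koala_file)

koala <- read.delim(koala_file, header = FALSE, sep = "\t",
                    col.names = c("gene", "KO"),
                    colClasses = "character")

# Keep each KO once and drop genes without an assignment.
kos <- sort(unique(koala$KO[!is.na(koala$KO) & nzchar(koala$KO)]))

# One row per genome, with the KOs collapsed into a single ";"-separated field.
genome_table <- data.frame(
  ID = "strain_1",                     # hypothetical genome identifier
  KOs = paste(kos, collapse = ";"),
  stringsAsFactors = FALSE
)
```

Repeating this per strain and combining the rows with rbind() would give one table covering all genomes.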

Sharing user question about mcf calculation

Original message

I have a couple of questions, and I was wondering if you could provide some clues. I couldn't find detailed information about how the module completeness fraction is calculated. Do you have a specification available? I looked through your code but got lost. For instance, what happens when a module is defined as (say) K0001+K0002 and only one of the genes is found: is mcf = 0.5 or mcf = 0? I am unsure how to interpret the index in different scenarios. I have a case where I find either one or both genes in all my genomes, but I still get mcf = 0 for some of them. I was expecting 0.5 for those that have only one of them. How is this fraction calculated?

Preparation of input file.

Hello.
I have metagenome FASTQ files.
How can I convert them so that they can be analyzed?

thanks.

Kohei

Manual curation of used KEGG database

Dear Andreas,
I would like to use your tool to analyse the metabolic potential of MAGs. I have tested it already and found out that a few modules are missing. Unfortunately, I do not have access to the KEGG ftp but I wonder if I could make manual modifications (adding modules) to the existing KEGG database file that is used in the package.
Do you think that is possible?

Many thanks
Julia

KEGG module completeness estimation is wrong

I have come across a problem in the way this package evaluates the completeness of KEGG pathway modules, given a number of KOs or something comparable. Specifically, this refers to the function query_missingGenes_from_module.R, but it might also be present in some of the other KEGG-module related functions.

The issue arises from how the function splits modules into blocks, based on spaces:

### SEARCH BLOCKS ----
  block_defs  <- strsplit(DEFINITION[index], split = " ")[[1]]
  nBlocks     <- length(block_defs)

This problem does not arise with every module; it only occurs in more complicated modules which have "nested" blocks (for lack of a better word). For example, module 2 (https://www.genome.jp/kegg-bin/show_module?M00002) leads to this issue. The function in this package comes up with 6 blocks (nBlocks), while there are only 5 blocks according to my understanding of KEGG module definitions. This also matches the 5 blocks assumed in KEGGmapper. It must have something to do with the above chunk ignoring the presence of "(" or ")", i.e. when spaces occur WITHIN a block instead of separating blocks.

This problem should lead to a systematic underestimation of the completeness of pathways, when it inflates the number of blocks in a module. Adjusting the way the blocks are split to only split actual blocks (spaces outside of any "(" or ")"; something that can be done with regex I guess) should solve this issue.
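
The proposed adjustment can be illustrated with a small base R sketch that splits only on spaces at parenthesis depth zero. It uses a character scan rather than a regex, the function name split_top_level is hypothetical, and the definition string is a made-up example, not the real definition of M00002 or the package's actual fix.

```r
# Split a module definition into blocks at spaces that lie outside any "(...)".
# Hypothetical helper for illustration; not part of MetQy.
split_top_level <- function(definition) {
  chars <- strsplit(definition, "")[[1]]
  # Parenthesis depth at each character: opens so far minus closes so far.
  depth <- cumsum(chars == "(") - cumsum(chars == ")")
  # A space separates blocks only when no parenthesis is open at that position.
  is_sep <- chars == " " & depth == 0
  # Number the blocks by counting the separators seen so far.
  block_id <- cumsum(is_sep)
  blocks <- tapply(chars[!is_sep], block_id[!is_sep], paste, collapse = "")
  unname(as.character(blocks))
}

# Made-up definition with a space nested inside a parenthesised block.
definition <- "K00001 (K00002,K00003 K00004) K00005"

naive_blocks <- strsplit(definition, split = " ")[[1]]  # splits inside "(...)" too
top_blocks   <- split_top_level(definition)             # respects parentheses

length(naive_blocks)  # 4
length(top_blocks)    # 3
```

On this toy string the plain space split reports one block more than the depth-aware split, which is the kind of inflation described above.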

using query_missingGenes_from_module on multiple modules

Dear Andreas,
I am trying to use your script query_missingGenes_from_module to spot the genes missing from a module, but I need to run it repeatedly over multiple modules (from a list I provide) and then write the results to a file. When I run it on one module only I have no problem, but when I put my list in a vector and use a loop to pass each element of the vector as my "ID_MODULE", I cannot manage to format the output correctly.

Any hints? Thank you in advance, and thank you for providing this amazing resource.
Sincerely,
Arianna
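
A common base R pattern for this kind of task is to collect one small data frame per module with lapply(), bind them with do.call(rbind, ...), and write the combined table once. The sketch below shows only that pattern: find_missing_genes is a hypothetical stand-in, not query_missingGenes_from_module, and the module contents and genome annotation are invented toy data.

```r
# Hypothetical stand-in for the per-module call; it is NOT MetQy's function.
# It returns one row per module: the module ID and its missing KOs joined by ";".
find_missing_genes <- function(genome_kos, module_id, module_kos) {
  missing <- setdiff(module_kos, genome_kos)
  data.frame(MODULE_ID = module_id,
             N_MISSING = length(missing),
             MISSING = paste(missing, collapse = ";"),
             stringsAsFactors = FALSE)
}

# Toy data: made-up module contents and genome annotation.
modules <- list(M_a = c("K00001", "K00002"),
                M_b = c("K00003", "K00004", "K00005"))
genome_kos <- c("K00001", "K00003")

# One data frame per module, each with the same columns.
results <- lapply(names(modules), function(id) {
  find_missing_genes(genome_kos, id, modules[[id]])
})

# Stack the per-module rows into a single table and write it out once.
combined <- do.call(rbind, results)
out_file <- tempfile(fileext = ".tsv")
write.table(combined, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Keeping every per-module result in the same column layout is what makes the final rbind() and the single write.table() call straightforward.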
