Comments (13)
Mycoplasma genitalium supposedly has one of the smallest genomes currently known, and even it has about 470 genes...
I guess this is not a problem in CarveMe, so I will close this issue. My suggestion is that you ignore these genomes :)
from carveme.
Hi Francisco,
Can you run CarveMe in verbose mode (the -v option) to get a better idea of what is happening?
Thanks for the quick response! Sure, running right now with verbose mode.
Also I just updated the entire matplotlib package and some of the warnings did go away.
Will post update soon.
Ok I see the problem now:
Running diamond...
diamond blastx -d /c3se/users/zorrilla/Hebbe/.conda/envs/concoct_env/lib/python2.7/site-packages/carveme/data/input/bigg_proteins.dmnd -q bins_organized/ERR260174_6.fa -o bins_organized/ERR260174_6.tsv --more-sensitive --top 10
Loading universe model...
Scoring reactions...
The input genome did not match sufficient genes/reactions in the database.
Would you recommend I just trash these genomes? Or is there some way I can still get models from them?
Thanks!
You said you built 3000 genomes from a metagenomics dataset. Are these metagenome-assembled genomes (MAGs)? Do you have a way to evaluate their quality?
Yeah, basically de novo assembled genomes from metagenomics data. I used CONCOCT for binning and then CheckM for evaluating completeness and contamination.
This evaluation actually filtered out like 90% of bins, so I was expecting the remaining ones to be good quality. I used cutoffs of 80% completeness and 10% contamination for filtering, but maybe those are too lenient?
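For reference, applying those cutoffs programmatically is straightforward. A minimal sketch, assuming a tab-separated CheckM summary whose header row names the "Completeness" and "Contamination" columns and whose first column is the bin id (the column names and file layout are assumptions; check your own CheckM output):

```shell
# Sketch: keep bins passing the 80% completeness / 10% contamination
# cutoffs discussed above. Assumes a tab-separated CheckM summary with
# a header row; prints the id (first column) of each passing bin.
filter_bins() {
    awk -F'\t' '
        NR == 1 {
            # locate the relevant columns by name (assumed spellings)
            for (i = 1; i <= NF; i++) {
                if ($i == "Completeness")  comp = i
                if ($i == "Contamination") cont = i
            }
            next
        }
        $comp >= 80 && $cont <= 10 { print $1 }
    ' "$1"
}
```

Usage would be something like `filter_bins checkm_summary.tsv > good_bins.txt` (file names hypothetical).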
I have never built MAGs myself, so I don't have any experience in this. I would say that with 80% completeness you should get a good enough genome for reconstruction.
Either these genomes have too few metabolic genes, or the organisms are too distantly related to those in the BiGG database to find any homology matching.
Can you check the number of genes in these genomes?
I see. Do you have a rough estimate of what the minimum number of genes should be?
I checked a number of my genomes that failed to get carved, and it looks like they have between 30 and 40 genes... probably too low?
Kind of suspicious that these genomes managed to get past the 80% completeness cutoff...
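Since the bins here are passed to carve --dna as FASTA files of gene sequences, a rough gene count is just the number of headers. A quick sketch (the 100-gene threshold is purely illustrative, not an official CarveMe limit):

```shell
# Sketch: count sequences (predicted genes) in a bin's FASTA by
# counting header lines, and flag bins below an illustrative threshold.
count_genes() {
    grep -c '^>' "$1"
}

min_genes=100   # illustrative cutoff, not an official CarveMe limit
has_enough_genes() {
    [ "$(count_genes "$1")" -ge "$min_genes" ]
}
```

Then e.g. `has_enough_genes bins_organized/ERR260174_6.fa || echo "skipping"` before submitting the carve job.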
Ok, thanks for the help!
One last question: I just noticed a genome with 201 genes that failed to get carved.
The log shows:
[Mon Feb 11 15:21:18 2019] Building DAG of jobs...
[Mon Feb 11 15:21:18 2019] Using shell: /usr/bin/bash
[Mon Feb 11 15:21:18 2019] Provided cores: 16
[Mon Feb 11 15:21:18 2019] Rules claiming more threads will be scaled down.
[Mon Feb 11 15:21:18 2019] Job counts:
[Mon Feb 11 15:21:18 2019] count jobs
[Mon Feb 11 15:21:18 2019] 1 carveme
[Mon Feb 11 15:21:18 2019] 1
[Mon Feb 11 15:21:18 2019] rule carveme:
[Mon Feb 11 15:21:18 2019] input: bins_organized/ERR260185_45.fa
[Mon Feb 11 15:21:18 2019] output: carvemeOut/ERR260185_45.xml
[Mon Feb 11 15:21:18 2019] jobid: 0
[Mon Feb 11 15:21:18 2019] wildcards: fasta=ERR260185_45
/c3se/users/zorrilla/Hebbe/.conda/envs/concoct_env/lib/python2.7/site-packages/pandas/core/groupby/groupby.py:4315: FutureWarning: arrays to stack must be passed as a "sequence" type such as list or tuple. Support for non-sequence iterables such as generators is deprecated as of NumPy 1.16 and will raise an error in the future.
stacked_values = np.vstack(map(np.asarray, values))
Running diamond...
diamond blastx -d /c3se/users/zorrilla/Hebbe/.conda/envs/concoct_env/lib/python2.7/site-packages/carveme/data/input/bigg_proteins.dmnd -q bins_organized/ERR260185_45.fa -o bins_organized/ERR260185_45.tsv --more-sensitive --top 10
Loading universe model...
Scoring reactions...
Reconstructing a single model
[Mon Feb 11 16:11:54 2019] Error in rule carveme:
[Mon Feb 11 16:11:54 2019] jobid: 0
[Mon Feb 11 16:11:54 2019] output: carvemeOut/ERR260185_45.xml
[Mon Feb 11 16:11:54 2019] RuleException:
[Mon Feb 11 16:11:54 2019] CalledProcessError in line 29 of /c3se/NOBACKUP/groups/c3-c3se605-17-8/projects_francisco/binning/metabolismPipe/Snakefile:
[Mon Feb 11 16:11:54 2019] Command ' set -euo pipefail;
[Mon Feb 11 16:11:54 2019] set +u;source activate concoct_env;set -u
[Mon Feb 11 16:11:54 2019] carve -v --dna bins_organized/ERR260185_45.fa -o carvemeOut/ERR260185_45.xml ' died with <Signals.SIGKILL: 9>.
[Mon Feb 11 16:11:54 2019] File "/c3se/NOBACKUP/groups/c3-c3se605-17-8/projects_francisco/binning/metabolismPipe/Snakefile", line 29, in __rule_carveme
[Mon Feb 11 16:11:54 2019] File "/c3se/users/zorrilla/Hebbe/.conda/envs/snakemake_env/lib/python3.6/concurrent/futures/thread.py", line 56, in run
[Mon Feb 11 16:11:54 2019] Shutting down, this might take some time.
[Mon Feb 11 16:11:54 2019] Exiting because a job execution failed. Look above for error message
[Mon Feb 11 16:11:54 2019] Complete log: /c3se/NOBACKUP/groups/c3-c3se605-17-8/projects_francisco/binning/metabolismPipe/.snakemake/log/2019-02-11T152118.628448.snakemake.log
slurmstepd: error: Detected 1 oom-kill event(s) in step 76760.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
This time I assigned 16 cores with a total of around 45 GB of RAM, but it is still giving me an out-of-memory kill event.
Any idea how much memory I should be allocating?
Here is a plot showing resource utilization:
It looks like something is using up a lot of swap memory right before the job gets killed...
How many reconstructions do you do in a single job?
I created this database with 5587 models using 4GB per job and only 1 reconstruction per job. They all completed successfully:
https://github.com/cdanielmachado/embl_gems
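If memory is the bottleneck, one way to mirror that setup (one reconstruction per job with a known budget) is to request memory explicitly at submission time. A sketch assuming SLURM and Snakemake's cluster mode; the 8G figure, job count, and time limit are illustrative, not recommendations from CarveMe:

```shell
snakemake --jobs 50 \
    --cluster "sbatch --mem=8G --cpus-per-task=1 --time=12:00:00"
```

With an explicit --mem, the scheduler kills only jobs that genuinely exceed the budget instead of whatever the cgroup default allows.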
Hmmm, interesting. I was doing 1 reconstruction per job as well, not using the recursive option, just because it's easier to spread the jobs using snakemake cluster jobs this way:
rule carveme:
    input:
        "bins_organized/{fasta}.fa"
    output:
        "carvemeOut/{fasta}.xml"
    shell:
        """
        set +u;source activate concoct_env;set -u
        carve -v --dna {input} -o {output}
        """
Not sure why my jobs are demanding so much more memory...
In any case, I will probably just set aside the last 150 genomes that are giving me trouble and just work with the 2800 that I have already generated.
This may be outside the scope of your experience, but have you noticed that carved models seem to fail or skip more than half of the memote tests? Not sure if this is generally the case, or maybe I just had bad MAGs?
Typical test results:
======== 81 failed, 43 passed, 21 skipped, 3 warnings in 18.38 seconds =========
I tried a very early version of memote, so I don't know how it is performing now.
But if you want to use memote, I would suggest using the option --flavor fbc2 when building models. CarveMe outputs the legacy COBRA format by default (which memote does not like).
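Concretely, that would look like the following, reusing the paths from the log above (the flag spelling is as suggested here; worth confirming against carve -h for your installed version):

```shell
carve -v --dna bins_organized/ERR260185_45.fa \
    -o carvemeOut/ERR260185_45.xml --flavor fbc2
```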
Ah that is good to know!
Thanks for the tips and discussion.