
Comments (13)

cdanielmachado avatar cdanielmachado commented on June 12, 2024 1

Mycoplasma genitalium supposedly has one of the smallest genomes currently known, and it has 470 genes...

I guess this is not a problem in CarveMe, so I will close this issue. My suggestion is that you ignore these genomes :)

from carveme.

cdanielmachado avatar cdanielmachado commented on June 12, 2024

Hi Francisco,

Can you run CarveMe in verbose mode (-v option) to get a better idea of what is happening?


franciscozorrilla avatar franciscozorrilla commented on June 12, 2024

Thanks for the quick response! Sure, running right now with verbose mode.
Also I just updated the entire matplotlib package and some of the warnings did go away.
Will post update soon.


franciscozorrilla avatar franciscozorrilla commented on June 12, 2024

Ok I see the problem now:

Running diamond...
diamond blastx -d /c3se/users/zorrilla/Hebbe/.conda/envs/concoct_env/lib/python2.7/site-packages/carveme/data/input/bigg_proteins.dmnd -q bins_organized/ERR260174_6.fa -o bins_organized/ERR260174_6.tsv --more-sensitive --top 10
Loading universe model...
Scoring reactions...
The input genome did not match sufficient genes/reactions in the database.

Would you recommend I just trash these genomes? Or is there some way I can still get models from them?

Thanks!


cdanielmachado avatar cdanielmachado commented on June 12, 2024

You said you built 3000 genomes from a metagenomics dataset. Are these metagenome-assembled genomes (MAGs)? Do you have a way to evaluate their quality?


franciscozorrilla avatar franciscozorrilla commented on June 12, 2024

Yeah, basically de novo assembled genomes from metagenomics data. I used CONCOCT for binning and then CheckM for evaluating completeness and contamination.
This evaluation actually filtered out like 90% of bins, so I was expecting the remaining ones to be good quality. I used cutoffs of 80% completeness and 10% contamination for filtering, but maybe those are too lenient?
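The filtering step described above can be sketched as follows. This is only an illustration: the `BinQuality` record and `filter_bins` function are hypothetical names, the example values are made up, and the code does not read real CheckM output. Only the 80% and 10% cutoffs come from the comment.

```python
from typing import List, NamedTuple


class BinQuality(NamedTuple):
    """Hypothetical per-bin quality record (not a real CheckM structure)."""
    name: str
    completeness: float   # percent, 0-100
    contamination: float  # percent


def filter_bins(bins: List[BinQuality],
                min_completeness: float = 80.0,
                max_contamination: float = 10.0) -> List[BinQuality]:
    """Keep bins that meet both cutoffs (defaults taken from the thread)."""
    return [
        b for b in bins
        if b.completeness >= min_completeness
        and b.contamination <= max_contamination
    ]


# Made-up example values, for illustration only.
bins = [
    BinQuality("bin_a", 95.2, 1.3),   # passes both cutoffs
    BinQuality("bin_b", 81.0, 12.5),  # too contaminated
    BinQuality("bin_c", 62.4, 0.8),   # too incomplete
]
kept = filter_bins(bins)
print([b.name for b in kept])
```

Tightening the cutoffs is then a matter of passing different values, e.g. `filter_bins(bins, min_completeness=90.0, max_contamination=5.0)`.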


cdanielmachado avatar cdanielmachado commented on June 12, 2024

I have never built MAGs myself, so I don't have any experience in this. I would say that with 80% completeness you should get a good enough genome for reconstruction.

Either these genomes have too few metabolic genes, or the organisms are too distantly related to those in the BiGG database for the homology search to find any matches.

Can you check how many genes these genomes have?


franciscozorrilla avatar franciscozorrilla commented on June 12, 2024

I see. Do you have a rough estimate of what the minimum number of genes should be?
I checked a number of my genomes that failed to get carved, and they seem to have between 30 and 40 genes. Probably too low?
It's kind of suspicious that these genomes managed to get past the 80% completeness cutoff.
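One quick way to get such a count is to count the header lines in the FASTA file. The sketch below is a minimal, assumed approach (the function name `count_fasta_records` is hypothetical). Note that it counts records: the result equals the number of genes only if the file holds one gene per record. For a file of assembled contigs it counts contigs instead.

```python
import io
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


# Tiny in-memory example; a real file would be passed as an open handle,
# e.g. `with open(path) as handle: count_fasta_records(handle)`.
example = io.StringIO(">gene_1\nATGAAA\n>gene_2\nATGCCC\nTTT\n")
n_records = count_fasta_records(example)
print(n_records)
```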


franciscozorrilla avatar franciscozorrilla commented on June 12, 2024

Ok, thanks for the help!

One last question: I just noticed a genome with 201 genes that failed to get carved.
The log shows:

[Mon Feb 11 15:21:18 2019] Building DAG of jobs...
[Mon Feb 11 15:21:18 2019] Using shell: /usr/bin/bash
[Mon Feb 11 15:21:18 2019] Provided cores: 16
[Mon Feb 11 15:21:18 2019] Rules claiming more threads will be scaled down.
[Mon Feb 11 15:21:18 2019] Job counts:
[Mon Feb 11 15:21:18 2019] 	count	jobs
[Mon Feb 11 15:21:18 2019] 	1	carveme
[Mon Feb 11 15:21:18 2019] 	1

[Mon Feb 11 15:21:18 2019] rule carveme:
[Mon Feb 11 15:21:18 2019]     input: bins_organized/ERR260185_45.fa
[Mon Feb 11 15:21:18 2019]     output: carvemeOut/ERR260185_45.xml
[Mon Feb 11 15:21:18 2019]     jobid: 0
[Mon Feb 11 15:21:18 2019]     wildcards: fasta=ERR260185_45

/c3se/users/zorrilla/Hebbe/.conda/envs/concoct_env/lib/python2.7/site-packages/pandas/core/groupby/groupby.py:4315: FutureWarning: arrays to stack must be passed as a "sequence" type such as list or tuple. Support for non-sequence iterables such as generators is deprecated as of NumPy 1.16 and will raise an error in the future.
  stacked_values = np.vstack(map(np.asarray, values))
Running diamond...
diamond blastx -d /c3se/users/zorrilla/Hebbe/.conda/envs/concoct_env/lib/python2.7/site-packages/carveme/data/input/bigg_proteins.dmnd -q bins_organized/ERR260185_45.fa -o bins_organized/ERR260185_45.tsv --more-sensitive --top 10
Loading universe model...
Scoring reactions...
Reconstructing a single model
[Mon Feb 11 16:11:54 2019] Error in rule carveme:
[Mon Feb 11 16:11:54 2019]     jobid: 0
[Mon Feb 11 16:11:54 2019]     output: carvemeOut/ERR260185_45.xml

[Mon Feb 11 16:11:54 2019] RuleException:
[Mon Feb 11 16:11:54 2019] CalledProcessError in line 29 of /c3se/NOBACKUP/groups/c3-c3se605-17-8/projects_francisco/binning/metabolismPipe/Snakefile:
[Mon Feb 11 16:11:54 2019] Command ' set -euo pipefail;  
[Mon Feb 11 16:11:54 2019]         set +u;source activate concoct_env;set -u
[Mon Feb 11 16:11:54 2019]         carve -v --dna bins_organized/ERR260185_45.fa -o carvemeOut/ERR260185_45.xml ' died with <Signals.SIGKILL: 9>.
[Mon Feb 11 16:11:54 2019]   File "/c3se/NOBACKUP/groups/c3-c3se605-17-8/projects_francisco/binning/metabolismPipe/Snakefile", line 29, in __rule_carveme
[Mon Feb 11 16:11:54 2019]   File "/c3se/users/zorrilla/Hebbe/.conda/envs/snakemake_env/lib/python3.6/concurrent/futures/thread.py", line 56, in run
[Mon Feb 11 16:11:54 2019] Shutting down, this might take some time.
[Mon Feb 11 16:11:54 2019] Exiting because a job execution failed. Look above for error message
[Mon Feb 11 16:11:54 2019] Complete log: /c3se/NOBACKUP/groups/c3-c3se605-17-8/projects_francisco/binning/metabolismPipe/.snakemake/log/2019-02-11T152118.628448.snakemake.log
slurmstepd: error: Detected 1 oom-kill event(s) in step 76760.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

This time I assigned 16 cores with a total of around 45 GB of RAM, but I am still getting an out-of-memory kill event.

Any idea how much memory I should be allocating?

Here is a plot showing resource utilization:

[Image: resource utilization plot]

It looks like something is using up a lot of swap memory right before the job gets killed.
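When many jobs fail, it can help to scan the logs automatically for the two markers visible in the log above: the `SIGKILL` reported by Snakemake and the `oom-kill event` reported by `slurmstepd`. The sketch below is a hypothetical helper (`find_kill_markers` is not part of Snakemake or SLURM) that only does plain text matching on those strings.

```python
import re
from typing import Dict

# Patterns copied from the messages in the log above.
MARKERS = {
    "sigkill": re.compile(r"Signals\.SIGKILL"),
    "oom_kill": re.compile(r"oom-kill event"),
}


def find_kill_markers(log_text: str) -> Dict[str, bool]:
    """Report which of the known kill markers appear in a job log."""
    return {name: bool(p.search(log_text)) for name, p in MARKERS.items()}


log_text = (
    "carve -v --dna bins_organized/ERR260185_45.fa "
    "-o carvemeOut/ERR260185_45.xml ' died with <Signals.SIGKILL: 9>.\n"
    "slurmstepd: error: Detected 1 oom-kill event(s) in step 76760.batch cgroup.\n"
)
markers = find_kill_markers(log_text)
print(markers)
```

A `SIGKILL` on its own does not prove the job ran out of memory. The two markers together, as in this log, point to the cgroup out-of-memory handler.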


cdanielmachado avatar cdanielmachado commented on June 12, 2024

How many reconstructions do you do in a single job?

I created this database with 5587 models using 4GB per job and only 1 reconstruction per job. They all completed successfully:

https://github.com/cdanielmachado/embl_gems


franciscozorrilla avatar franciscozorrilla commented on June 12, 2024

Hmmm, interesting. I was doing 1 reconstruction per job as well, without the recursive option, just because it's easier to distribute the jobs across the cluster with Snakemake this way.

rule carveme:
    input:
        "bins_organized/{fasta}.fa"
    output:
        "carvemeOut/{fasta}.xml"
    shell:
        """
        set +u;source activate concoct_env;set -u
        carve -v --dna {input} -o {output}
        """

Not sure why my jobs are demanding so much more memory.

In any case, I will probably just set aside the last 150 genomes that are giving me trouble and just work with the 2800 that I have already generated.

This may be outside the scope of your experience, but have you noticed that carved models seem to fail or skip more than half of the memote tests? I'm not sure whether this is generally the case in your experience as well, or whether I just had bad MAGs.

Typical test results:


======== 81 failed, 43 passed, 21 skipped, 3 warnings in 18.38 seconds =========
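To put a number on "more than half", the summary line can be parsed and the share of failed or skipped tests computed. This is a minimal sketch: `parse_test_summary` is a hypothetical helper, and it assumes the summary keeps the layout shown above.

```python
import re
from typing import Dict


def parse_test_summary(line: str) -> Dict[str, int]:
    """Extract the counts from a summary line like the one shown above."""
    pairs = re.findall(r"(\d+) (failed|passed|skipped|warnings)", line)
    return {label: int(number) for number, label in pairs}


summary = "======== 81 failed, 43 passed, 21 skipped, 3 warnings in 18.38 seconds ========="
counts = parse_test_summary(summary)

total = counts["failed"] + counts["passed"] + counts["skipped"]
not_passed = counts["failed"] + counts["skipped"]
share_not_passed = round(100 * not_passed / total, 1)
print(counts)
print(share_not_passed)  # percent of tests that failed or were skipped
```

For this run, 102 of 145 tests (about 70%) failed or were skipped.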


cdanielmachado avatar cdanielmachado commented on June 12, 2024

I only tried a very early version of memote, so I don't know how it performs now.

But if you want to use memote, I would suggest you use the option --flavor fbc2 when building models. CarveMe outputs the legacy cobra format by default (which memote does not like).
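As a rough sketch, the `carve` call used earlier in this thread could be assembled with that option appended. The `build_carve_command` helper is hypothetical, and the `--flavor fbc2` spelling is taken directly from the comment above. It may differ between CarveMe versions, so check `carve --help` for your installation.

```python
import shlex
from typing import List


def build_carve_command(fasta: str, output: str, fbc2: bool = True) -> List[str]:
    """Assemble the carve command line used in this thread.

    The --flavor fbc2 option is taken from the comment above and is an
    assumption about the installed CarveMe version.
    """
    cmd = ["carve", "-v", "--dna", fasta, "-o", output]
    if fbc2:
        cmd += ["--flavor", "fbc2"]
    return cmd


cmd = build_carve_command("bins_organized/ERR260174_6.fa",
                          "carvemeOut/ERR260174_6.xml")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with unusual file names.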


franciscozorrilla avatar franciscozorrilla commented on June 12, 2024

Ah that is good to know!
Thanks for the tips and discussion.
