Comments (19)

apcamargo commented on July 29, 2024

Good to know that you didn't have any problems again, @asierFernandezP. If you are interested in using the --conservative flag, it might be worth taking a quick look at the documentation here: https://portal.nersc.gov/genomad/post_classification_filtering.html#default-parameters-and-presets

I'll close this issue for now.

from genomad.

apcamargo commented on July 29, 2024

Hi @MonicaSteffi

There are two things that could be causing this:

  • You are using a version of MMseqs2 that is incompatible with geNomad. If you installed via conda/mamba that shouldn't be the case. Just to be sure, can you check which versions of geNomad and MMseqs2 you have installed?
  • Your machine is running out of memory. In that case, increasing the number of splits should solve the issue (try 12 or 16). Are you running geNomad on a server or a personal computer? Do you know how much memory the machine has available?

Also, geNomad saves a log of MMseqs2 execution at lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/mmseqs2.log. Can you paste it here?
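
The checks requested above can also be scripted. The following is a minimal sketch, not part of geNomad: the helper names `tool_version` and `total_memory_gib` are hypothetical, it assumes that `mmseqs version` and `genomad --version` print a version string, and the memory lookup relies on POSIX `sysconf` names that are available on Linux.

```python
import os
import shutil
import subprocess


def tool_version(command):
    """Return the first line a version command prints, or None if the tool is not on PATH."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True)
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


def total_memory_gib():
    """Total physical memory of the machine in GiB (POSIX sysconf, e.g. Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30


if __name__ == "__main__":
    # Assumed version commands; adjust if your installation differs.
    print("MMseqs2:", tool_version(["mmseqs", "version"]))
    print("geNomad:", tool_version(["genomad", "--version"]))
    print(f"Total memory: {total_memory_gib():.1f} GiB")
```

Note that inside a batch job the scheduler may cap memory below the machine total, so the job's own memory request is the number that matters when choosing the number of splits.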

MonicaSteffi commented on July 29, 2024

Hi @apcamargo,

Thank you for the reply.
MMseqs2 version: 14.7e284
geNomad version: 1.4.0

> Your machine is running out of memory. In that case, increasing the number of splits should solve the issue (try 12 or 16). Are you running geNomad on a server or a personal computer? Do you know how much memory the machine has available?

I also tried with more memory, but I still get the same error.

Here is the MMseqs2 log file:

`createdb lim1_1_genomad_output/contigs_annotate/contigs_proteins.faa lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/query_db/query_db

MMseqs Version: 14.7e284
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 1
Offset of numeric ids 0
Compressed 0
Verbosity 3

Converting sequences
[
Time for merging to query_db_h: 0h 0m 0s 14ms
Time for merging to query_db: 0h 0m 0s 13ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 52ms
search lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/query_db/query_db /dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/genomad_db/genomad_db lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/search_db/search_db lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/tmp --threads 56 -s 4.2 --cov-mode 1 -c 0.2 -e 0.001 --split 8 --split-mode 0

MMseqs Version: 14.7e284
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0.2
Coverage mode 1
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 56
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 4.2
k-mer length 5
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 300
Split database 8
Split mode 0
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.1
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Gap pseudo count 10
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false

Failed to execute lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/tmp/8896424563579662339/searchtargetprofile.sh with error 13.`

apcamargo commented on July 29, 2024

This seems to be happening because the filesystem where you are writing the results doesn't allow scripts to be executed (see soedinglab/MMseqs2#534). During its execution, MMseqs2 generates and runs a couple of scripts, which fail because of this limitation.

I can try to add an option to geNomad that allows the MMseqs2 directory to be written to a separate location. In the meantime, can you try writing the results somewhere else (e.g. your home directory)?
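
One way to test whether a candidate output directory permits script execution is to write a tiny script there and try to run it. This is a minimal sketch under stated assumptions: the function name `can_execute_scripts` is hypothetical (it is not part of geNomad or MMseqs2), and it assumes a POSIX system with `/bin/sh`. The "error 13" in the log most likely corresponds to errno EACCES (permission denied), which Python surfaces as `PermissionError`.

```python
import os
import stat
import subprocess
import tempfile


def can_execute_scripts(directory):
    """Return True if a shell script written to `directory` can be executed."""
    fd, path = tempfile.mkstemp(suffix=".sh", dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("#!/bin/sh\nexit 0\n")
        # Mark the script as executable for the current user.
        os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
        try:
            return subprocess.run([path]).returncode == 0
        except PermissionError:
            # Raised when the filesystem refuses execution (e.g. a noexec mount).
            return False
    finally:
        os.remove(path)
```

Calling `can_execute_scripts("lim1_1_genomad_output")` on an existing output directory would return False on a filesystem with this restriction, in which case a different output location is worth trying.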

deminatanja commented on July 29, 2024

Hi,
I also have a memory issue. I get OSError: [Errno 28] No space left on device while running geNomad on an HPC cluster with over 2 TB of disk space and a 300 GB reservation for the batch job. I have been using --split 8. What can I try to solve the memory issue?
By the way, I also tried running geNomad on NMDC EDGE and got a memory limit error there too, although the input file was much smaller, and there one can't configure much when submitting a job.
Best regards,
Tatiana

apcamargo commented on July 29, 2024

Hi @deminatanja

> I also have a memory issue. I get OSError: [Errno 28] No space left on device while running geNomad on an HPC cluster with over 2 TB of disk space and a 300 GB reservation for the batch job. I have been using --split 8. What can I try to solve the memory issue?

I don't think this is a memory issue. "No space left on device" means you don't have enough storage space. Have you checked your disk usage?
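
A disk-usage check can be done from Python as well. Errno 28 (ENOSPC) can be raised when a filesystem runs out of free blocks or out of free inodes, so the sketch below reports both. The function name `storage_report` is hypothetical, and `os.statvfs` is only available on Unix-like systems. The path to check is the directory that the failing write targets, which the traceback alone does not reveal.

```python
import os
import shutil


def storage_report(path):
    """Summarize free space and free inodes for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    stats = os.statvfs(path)
    return {
        "total_gib": usage.total / 2**30,
        "free_gib": usage.free / 2**30,
        "inodes_total": stats.f_files,
        # Inodes available to unprivileged users.
        "inodes_free": stats.f_favail,
    }
```

Running `storage_report(".")` from the output directory, and again for any temporary directory the job uses, shows whether either location is close to full.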

> By the way, I also tried running geNomad on NMDC EDGE and got a memory limit error there too, although the input file was much smaller, and there one can't configure much when submitting a job.

Can you send me the log?

deminatanja commented on July 29, 2024

The device has over 2 TB of storage available, so that should be enough...

Here is the log from NMDC EDGE run:

Generate WDL and inputs json
submit workflow to cromwell
Cromwell job status: Running
Cromwell job status: Failed
viral.gn
Traceback (most recent call last):
  File "/opt/conda/bin/genomad", line 10, in <module>
    sys.exit(cli())
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/rich_click/rich_group.py", line 21, in main
    rv = super().main(*args, standalone_mode=False, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/genomad/cli.py", line 1023, in end_to_end
    ctx.invoke(
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/genomad/cli.py", line 338, in annotate
    genomad.annotate.main(
  File "/opt/conda/lib/python3.9/site-packages/genomad/modules/annotate.py", line 178, in main
    prodigal_obj.run_parallel_prodigal(threads)
  File "/opt/conda/lib/python3.9/site-packages/genomad/prodigal.py", line 92, in run_parallel_prodigal
    self._append_prodigal_fasta(current_file_path, protid_start)
  File "/opt/conda/lib/python3.9/site-packages/genomad/prodigal.py", line 42, in _append_prodigal_fasta
    match.group(1)
AttributeError: 'NoneType' object has no attribute 'group'
slurmstepd: error: Detected 17 oom-kill event(s) in StepId=20795329.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.

deminatanja commented on July 29, 2024

Here is also a full log from the HPC run:

[22:14:27] Executing genomad annotate.
Traceback (most recent call last):
  File "/projappl/project_2006548/genomad/bin/genomad", line 10, in <module>
    sys.exit(cli())
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
    rv = super().main(*args, standalone_mode=False, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/genomad/cli.py", line 1208, in end_to_end
    ctx.invoke(
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/genomad/cli.py", line 425, in annotate
    genomad.annotate.main(
  File "/opt/conda/lib/python3.10/site-packages/genomad/modules/annotate.py", line 178, in main
    prodigal_obj.run_parallel_prodigal(threads)
  File "/opt/conda/lib/python3.10/site-packages/genomad/prodigal.py", line 78, in run_parallel_prodigal
    current_file.write(line)
OSError: [Errno 28] No space left on device

apcamargo commented on July 29, 2024

Ok. These issues seem to be distinct.

The error you got on your HPC system is most likely not memory-related. It is failing during the prodigal-gv execution step (which uses very little memory) while writing a file. It does seem that, for some reason, the process is failing because you are out of storage. How big is the input (in number of sequences and average sequence length)?

There seems to be a problem with memory in NMDC EDGE. I'll try to get this solved as quickly as possible.

deminatanja commented on July 29, 2024

Here are some statistics about the input file:

contigs 1275686
contigs (>= 0 bp) 4189226
contigs (>= 1000 bp) 376034
contigs (>= 5000 bp) 19849
contigs (>= 10000 bp) 4814
contigs (>= 25000 bp) 584
contigs (>= 50000 bp) 115
Largest contig 189326
Total length (>= 1000 bp) 802707590
Total length (>= 5000 bp) 184943840
Total length (>= 10000 bp) 84208373
Total length (>= 25000 bp) 24221139
Total length (>= 50000 bp) 8424764
N50 1172
N75 714
L50 284205
L75 678615

apcamargo commented on July 29, 2024

Ok. The input is pretty big, so maybe you are running out of storage when writing the outputs? What's the output of df -h?

Another option is to just split your input and run geNomad in batches to avoid this sort of problem.
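
A minimal sketch of the batching idea, using only the standard library: it writes consecutive groups of FASTA records to numbered files so that each file can be processed in its own geNomad run. The function name `split_fasta`, the `batch_NNNN.fna` naming scheme, and the batch size are all hypothetical choices for illustration.

```python
import os


def split_fasta(fasta_path, out_dir, records_per_batch):
    """Write consecutive batches of FASTA records to numbered files; return their paths."""
    if records_per_batch < 1:
        raise ValueError("records_per_batch must be at least 1")
    os.makedirs(out_dir, exist_ok=True)
    batch_paths = []
    out = None
    n_records = 0
    try:
        with open(fasta_path) as fin:
            for line in fin:
                if line.startswith(">"):
                    # Start a new batch file every `records_per_batch` headers.
                    if n_records % records_per_batch == 0:
                        if out is not None:
                            out.close()
                        path = os.path.join(out_dir, f"batch_{len(batch_paths) + 1:04d}.fna")
                        out = open(path, "w")
                        batch_paths.append(path)
                    n_records += 1
                # Lines before the first header are skipped.
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return batch_paths
```

Each returned path can then be used as the input of a separate run, with a separate output directory per batch. Splitting by record count is the simplest choice; for inputs with very uneven contig lengths, splitting by cumulative sequence length may balance the batches better.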

MonicaSteffi commented on July 29, 2024

Hi @apcamargo
Thank you. It worked when I changed the output directory.

Regards
Monica

apcamargo commented on July 29, 2024

No problems :)

MonicaSteffi commented on July 29, 2024

> No problems :)

How do I get GTF or GFF3 annotation files that can be visualized in other software?

deminatanja commented on July 29, 2024

> Ok. The input is pretty big, so maybe you are running out of storage when writing the outputs? What's the output of df -h?
>
> Another option is to just split your input and run geNomad in batches to avoid this sort of problem.

The disk resources are (used/total): 351G/3.0T of space and 3.8M/10M files.

I was testing geNomad with a smaller input file, but ran into another error. Please see a separate issue opened here.

apcamargo commented on July 29, 2024

@MonicaSteffi You can use the script below to convert geNomad's tabular gene file to a GFF:

chmod +x convert_tabular_to_gff.py
# ./convert_tabular_to_gff.py [INPUT] [OUTPUT]
./convert_tabular_to_gff.py genomad_output/metagenome_summary/metagenome_plasmid_genes.tsv plasmid.gff

Outputting GFF files is pretty useful. I might make geNomad output GFF files by default in a future update.

convert_tabular_to_gff.py
#!/usr/bin/env python3

import sys
from collections import namedtuple

Row = namedtuple(
    "Row",
    [
        "gene",
        "start",
        "end",
        "length",
        "strand",
        "gc_content",
        "genetic_code",
        "rbs_motif",
        "marker",
        "evalue",
        "bitscore",
        "uscg",
        "plasmid_hallmark",
        "virus_hallmark",
        "taxid",
        "taxname",
        "annotation_conjscan",
        "annotation_amr",
        "annotation_accessions",
        "annotation_description",
    ],
)

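# Input and output paths come from the command line.
input_path = sys.argv[1]
output_path = sys.argv[2]

with open(input_path) as fin, open(output_path, "w") as fout:
    # Skip the header line of the tabular gene file.
    next(fin)
    for line in fin:
        row = Row(*line.strip("\n").split("\t"))
        # Column 1: sequence ID, i.e. the gene name minus its trailing "_<number>".
        fout.write(f"{row.gene.rsplit('_', 1)[0]}\t")
        # Columns 2-6: source (undefined), feature type, start, end, score (undefined).
        fout.write(f".\tCDS\t{row.start}\t{row.end}\t.\t")
        # Columns 7-8: strand ("1" is forward, anything else reverse) and phase (undefined).
        fout.write("+\t.\t" if row.strand == "1" else "-\t.\t")
        # Column 9: attributes. Semicolons inside the free-text values are replaced
        # with commas because ";" separates attributes in GFF.
        fout.write(f"ID={row.gene};Name={row.gene};")
        fout.write(f"gc_content={row.gc_content};genetic_code={row.genetic_code};")
        fout.write(f"rbs_motif={row.rbs_motif};marker={row.marker};")
        fout.write(f"uscg={row.uscg};plasmid_hallmark={row.plasmid_hallmark};")
        fout.write(f"virus_hallmark={row.virus_hallmark};taxid={row.taxid};")
        fout.write(f"taxname={row.taxname};annotation_conjscan={row.annotation_conjscan};")
        fout.write(f"annotation_amr={row.annotation_amr};")
        fout.write(f"annotation_accessions={row.annotation_accessions.replace(';', ',')};")
        fout.write(f"annotation_description={row.annotation_description.replace(';', ',')}\n")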

asierFernandezP commented on July 29, 2024

Hi,

I am also getting the same error (non-zero exit status 1) since I updated to the latest version of geNomad. I tried changing the output directory, but I still get the same error. Here is the content of the log file:

[10:06:18] Executing genomad annotate.
[10:06:18] Creating the /home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate directory.
[15:19:05] Proteins predicted with prodigal-gv were written to all_predicted_viral_contigs_proteins.faa.
Traceback (most recent call last):
  File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/genomad/mmseqs2.py", line 131, in run_mmseqs2
    subprocess.run(command, stdout=fout, stderr=fout, check=True)
  File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mmseqs', 'createdb', PosixPath('/home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate/all_predicted_viral_contigs_proteins.faa'), PosixPath('/home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate/all_predicted_viral_contigs_mmseqs2/query_db/query_db')]' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/umcg-afernandez/.conda/envs/genomad/bin/genomad", line 10, in <module>
    sys.exit(cli())
  File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
    rv = super().main(*args, standalone_mode=False, **kwargs)
  File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 1208, in end_to_end
    ctx.invoke(
  File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 425, in annotate
    genomad.annotate.main(
  File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/genomad/modules/annotate.py", line 202, in main
    mmseqs2_obj.run_mmseqs2(threads, sensitivity, evalue, splits)
  File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/genomad/mmseqs2.py", line 134, in run_mmseqs2
    raise Exception(f"'{command_str}' failed.") from e
Exception: 'mmseqs createdb /home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate/all_predicted_viral_contigs_proteins.faa /home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate/all_predicted_viral_contigs_mmseqs2/query_db/query_db' failed.

apcamargo commented on July 29, 2024

Hi @asierFernandezP

Can you share the contents of /home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate/all_predicted_viral_contigs_mmseqs2/mmseqs2.log? Did it work in the previous version?

asierFernandezP commented on July 29, 2024

Hi,

Thanks for the quick answer! I am sorry, I deleted this file, but after rerunning it 4 times (without any changes) it worked. This problem only appeared after reinstalling geNomad using conda (24-02-2023) in order to get the --conservative option, which was not present in the previous version. But, as I said, it worked after a few tries without any modifications. I will let you know if I see this problem again.
