Comments (19)
Good to know that you didn't have any problems again, @asierFernandezP. If you are interested in using the --conservative flag, it might be worth taking a quick look here: https://portal.nersc.gov/genomad/post_classification_filtering.html#default-parameters-and-presets
I'll close this issue for now.
from genomad.
There are things that could be causing this:
- You are using a version of MMseqs2 that is incompatible with geNomad. If you installed via conda/mamba that shouldn't be the case. Just to be sure, can you check which versions of geNomad and MMseqs2 you have installed?
- Your machine is running out of memory. In that case, increasing the number of splits should solve the issue (try 12 or 16). Are you running geNomad on a server or a personal computer? Do you know how much memory the machine has available?
Also, geNomad saves a log of the MMseqs2 execution at lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/mmseqs2.log. Can you paste it here?
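The checks asked for above (installed versions and available memory) could be gathered with a short script. This is a minimal sketch, not part of geNomad: the helper names `tool_version` and `total_memory_gib` are hypothetical, `mmseqs version` is the MMseqs2 subcommand that prints its version, and it assumes that `genomad --version` is accepted by your installation.

```python
import os
import shutil
import subprocess


def tool_version(command):
    """Return the first line a tool prints for its version, or None if it is not on PATH."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


def total_memory_gib():
    """Return physical memory in GiB where these sysconf names exist, else None."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None
    return page_size * page_count / 2**30


if __name__ == "__main__":
    # "genomad --version" is an assumption about the installed CLI.
    print("MMseqs2:", tool_version(["mmseqs", "version"]))
    print("geNomad:", tool_version(["genomad", "--version"]))
    print("Total memory (GiB):", total_memory_gib())
```

On a cluster, the memory reported here is that of the node running the script, which may differ from the memory reserved for the batch job.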
from genomad.
Hi @apcamargo ,
Thank you for the reply.
MMseqs Version: 14.7e284
genomad version is 1.4.0
> - Your machine is running out of memory. In that case, increasing the number of splits should solve the issue (try 12 or 16). Are you running geNomad on a server or a personal computer? Do you know how much memory the machine has available?
I also tried with more memory, but I still get the same error.
and the mmseq logfile:
`createdb lim1_1_genomad_output/contigs_annotate/contigs_proteins.faa lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/query_db/query_db
MMseqs Version: 14.7e284
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 1
Offset of numeric ids 0
Compressed 0
Verbosity 3
Converting sequences
[
Time for merging to query_db_h: 0h 0m 0s 14ms
Time for merging to query_db: 0h 0m 0s 13ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 52ms
search lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/query_db/query_db /dss/dssfs02/lwp-dss-0001/u7b03/u7b03-dss-0000/ra78zut/genomad_db/genomad_db lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/search_db/search_db lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/tmp --threads 56 -s 4.2 --cov-mode 1 -c 0.2 -e 0.001 --split 8 --split-mode 0
MMseqs Version: 14.7e284
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0.2
Coverage mode 1
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 56
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 4.2
k-mer length 5
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 300
Split database 8
Split mode 0
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.1
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Gap pseudo count 10
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false
Failed to execute lim1_1_genomad_output/contigs_annotate/contigs_mmseqs2/tmp/8896424563579662339/searchtargetprofile.sh with error 13.`
from genomad.
This seems to happen because the filesystem where you are writing the results doesn't allow scripts to be executed (see soedinglab/MMseqs2#534). During its execution, MMseqs2 generates and runs a couple of scripts, which fail because of this limitation.
I can try to add an option to geNomad that allows the MMseqs2 directory to be written to a separate location. In the meantime, can you try writing the results somewhere else (e.g. your home directory)?
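Whether a given directory allows scripts to run can be tested before starting a long job. The sketch below is only an illustration under that assumption: the function name `can_execute_scripts` is hypothetical, and it simply writes a tiny shell script into the directory and tries to run it. The "error 13" in the log above matches errno 13 (EACCES, permission denied), which is what a filesystem mounted with `noexec` typically produces.

```python
import os
import stat
import subprocess
import tempfile


def can_execute_scripts(directory):
    """Return True if a small shell script created in `directory` can be run."""
    fd, path = tempfile.mkstemp(suffix=".sh", dir=directory)
    try:
        # Write and close the script before running it.
        with os.fdopen(fd, "w") as handle:
            handle.write("#!/bin/sh\nexit 0\n")
        os.chmod(path, stat.S_IRWXU)
        try:
            return subprocess.run([path], check=False).returncode == 0
        except OSError:
            # Includes PermissionError (errno 13), e.g. on noexec mounts.
            return False
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(can_execute_scripts("."))
```

Running it once against the intended output directory and once against the home directory should show which location is usable.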
from genomad.
Hi,
I also have the memory issue. I get the error OSError: [Errno 28] No space left on device while running geNomad on an HPC with over 2 TB of disk storage and a 300 GB reservation for the batch job. I have been using --split 8. What can I try to solve the memory issue?
Btw, I also tried running geNomad at NMDC EDGE and there has also been a memory limit error, although the input file was much smaller, and there, one can't choose much when submitting a job.
Best regards,
Tatiana
from genomad.
Hi @deminatanja
> I also have the memory issue. I have the error of OSError: [Errno 28] No space left on device, while running geNomad at HPC with over 2 Tb memory on the disk and 300 Gb reservation for the batch job run. I have been using --split 8. What can be tried to solve the memory issue?
I don't think this is a memory issue. "No space left on device" means you don't have enough storage space. Have you checked your disk usage?
> Btw, I also tried running geNomad at NMDC EDGE and there has also been a memory limit error, although the input file was much smaller, and there, one can't choose much when submitting a job.
Can you send me the log?
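The disk usage check mentioned above can also be done from Python. This is a minimal sketch with a hypothetical helper name, `storage_report`. It reports free inodes as well as free bytes, because Errno 28 can also be raised when a filesystem runs out of inodes. It is an assumption that the output directory is the filesystem that filled up, and per-user or per-project quotas on a cluster may not be visible through these calls.

```python
import os
import shutil


def storage_report(path):
    """Return free and total space (GiB) and, where available, free inodes for `path`."""
    usage = shutil.disk_usage(path)
    report = {
        "free_gib": usage.free / 2**30,
        "total_gib": usage.total / 2**30,
    }
    if hasattr(os, "statvfs"):  # not available on Windows
        report["free_inodes"] = os.statvfs(path).f_favail
    return report


if __name__ == "__main__":
    # Check the directory where geNomad writes its output.
    print(storage_report("."))
```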
from genomad.
The device has over 2 TB of storage available, so that should be enough...
Here is the log from NMDC EDGE run:
Generate WDL and inputs json
submit workflow to cromwell
Cromwell job status: Running
Cromwell job status: Failed
viral.gn
Traceback (most recent call last):
File "/opt/conda/bin/genomad", line 10, in <module>
sys.exit(cli())
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/genomad/cli.py", line 1023, in end_to_end
ctx.invoke(
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/genomad/cli.py", line 338, in annotate
genomad.annotate.main(
File "/opt/conda/lib/python3.9/site-packages/genomad/modules/annotate.py", line 178, in main
prodigal_obj.run_parallel_prodigal(threads)
File "/opt/conda/lib/python3.9/site-packages/genomad/prodigal.py", line 92, in run_parallel_prodigal
self._append_prodigal_fasta(current_file_path, protid_start)
File "/opt/conda/lib/python3.9/site-packages/genomad/prodigal.py", line 42, in _append_prodigal_fasta
match.group(1)
AttributeError: 'NoneType' object has no attribute 'group'
slurmstepd: error: Detected 17 oom-kill event(s) in StepId=20795329.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.
from genomad.
Here is also a full log from the HPC run:
[22:14:27] Executing genomad annotate.
Traceback (most recent call last):
File "/projappl/project_2006548/genomad/bin/genomad", line 10, in <module>
sys.exit(cli())
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/genomad/cli.py", line 1208, in end_to_end
ctx.invoke(
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/genomad/cli.py", line 425, in annotate
genomad.annotate.main(
File "/opt/conda/lib/python3.10/site-packages/genomad/modules/annotate.py", line 178, in main
prodigal_obj.run_parallel_prodigal(threads)
File "/opt/conda/lib/python3.10/site-packages/genomad/prodigal.py", line 78, in run_parallel_prodigal
current_file.write(line)
OSError: [Errno 28] No space left on device
from genomad.
Ok. These issues seem to be distinct.
The error you got on your HPC is most likely not related to memory. It is failing while writing a file during the prodigal-gv execution step, which uses very little memory. It does seem that, for some reason, the process is being killed because you are out of storage. How big is the input (in number of sequences and average sequence length)?
There seems to be a problem with memory in NMDC EDGE. I'll try to get this solved as quickly as possible.
from genomad.
Here are some statistics about the input file:
contigs 1275686
contigs (>= 0 bp) 4189226
contigs (>= 1000 bp) 376034
contigs (>= 5000 bp) 19849
contigs (>= 10000 bp) 4814
contigs (>= 25000 bp) 584
contigs (>= 50000 bp) 115
Largest contig 189326
Total length (>= 1000 bp) 802707590
Total length (>= 5000 bp) 184943840
Total length (>= 10000 bp) 84208373
Total length (>= 25000 bp) 24221139
Total length (>= 50000 bp) 8424764
N50 1172
N75 714
L50 284205
L75 678615
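For readers who want to reproduce figures like these from a FASTA file, the sketch below shows one common way to compute the number of contigs, their mean length, and N50/L50. The function names are hypothetical, and the tool that produced the table above may apply different length filters, so the values need not match exactly.

```python
def contig_lengths(fasta_path):
    """Return the sequence length of every record in a FASTA file."""
    lengths = []
    current = None
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line.strip())
    if current is not None:
        lengths.append(current)
    return lengths


def n50_l50(lengths):
    """Return (N50, L50): the contig length and contig count at which the
    cumulative length, taken from the longest contig down, reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must not be empty")
```

With the lengths in hand, `len(lengths)` and `sum(lengths) / len(lengths)` give the number of sequences and the average sequence length.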
from genomad.
Ok. The input is pretty big, so maybe you are running out of storage when writing the outputs? What's the output of df -h?
Another option is to just split your input and run geNomad in batches to avoid this sort of problem.
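Splitting the input into batches, as suggested above, could look like the following. It is a minimal sketch rather than a geNomad feature: the function name `split_fasta`, the batch file naming, and the batch size are all illustrative choices.

```python
from pathlib import Path


def split_fasta(fasta_path, output_dir, records_per_batch):
    """Write the records of a FASTA file into numbered batch files and return their paths."""
    if records_per_batch < 1:
        raise ValueError("records_per_batch must be at least 1")
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    batch_paths = []
    out = None
    records_in_batch = 0
    try:
        with open(fasta_path) as fin:
            for line in fin:
                if line.startswith(">"):
                    # Start a new batch file at the first record or when the current one is full.
                    if out is None or records_in_batch == records_per_batch:
                        if out is not None:
                            out.close()
                        batch_path = output_dir / f"batch_{len(batch_paths) + 1:04d}.fna"
                        out = open(batch_path, "w")
                        batch_paths.append(batch_path)
                        records_in_batch = 0
                    records_in_batch += 1
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return batch_paths
```

Each batch file can then be passed to geNomad as a separate run with its own output directory.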
from genomad.
Hi @apcamargo
Thank you. It worked when I changed the output directory
Regards
Monica
from genomad.
No problems :)
from genomad.
> No problems :)
How do I get GTF or GFF3 annotation files that can be visualized in other software?
from genomad.
> Ok. The input is pretty big, so maybe you are running out of storage when writing the outputs? What's the output of df -h?
> Another option is to just split your input and run geNomad in batches to avoid this sort of problem.
The disk usage is (used/total): 351G/3.0T, 3.8M/10M files.
I was testing geNomad with a smaller input file, but ran into another error. Please see a separate issue opened here.
from genomad.
@MonicaSteffi You can use the script below to convert geNomad's tabular gene file to a GFF:
chmod +x convert_tabular_to_gff.py
# ./convert_tabular_to_gff.py [INPUT] [OUTPUT]
./convert_tabular_to_gff.py genomad_output/metagenome_summary/metagenome_plasmid_genes.tsv plasmid.gff
Outputting GFF files is pretty useful. I might make geNomad output GFF files by default in a future update.
convert_tabular_to_gff.py
#!/usr/bin/env python3
import sys
from collections import namedtuple

# Columns of geNomad's tabular gene file, in order.
Row = namedtuple(
    "Row",
    [
        "gene",
        "start",
        "end",
        "length",
        "strand",
        "gc_content",
        "genetic_code",
        "rbs_motif",
        "marker",
        "evalue",
        "bitscore",
        "uscg",
        "plasmid_hallmark",
        "virus_hallmark",
        "taxid",
        "taxname",
        "annotation_conjscan",
        "annotation_amr",
        "annotation_accessions",
        "annotation_description",
    ],
)

input_path = sys.argv[1]
output_path = sys.argv[2]

with open(input_path) as fin, open(output_path, "w") as fout:
    next(fin)  # skip the header line
    for line in fin:
        row = Row(*line.strip("\n").split("\t"))
        # The sequence name is the gene name without its trailing "_<index>".
        fout.write(f"{row.gene.rsplit('_', 1)[0]}\t")
        fout.write(f".\tCDS\t{row.start}\t{row.end}\t.\t")
        fout.write("+\t.\t" if row.strand == "1" else "-\t.\t")
        fout.write(f"ID={row.gene};Name={row.gene};")
        fout.write(f"gc_content={row.gc_content};genetic_code={row.genetic_code};")
        fout.write(f"rbs_motif={row.rbs_motif};marker={row.marker};")
        fout.write(f"uscg={row.uscg};plasmid_hallmark={row.plasmid_hallmark};")
        fout.write(f"virus_hallmark={row.virus_hallmark};taxid={row.taxid};")
        fout.write(f"taxname={row.taxname};annotation_conjscan={row.annotation_conjscan};")
        fout.write(f"annotation_amr={row.annotation_amr};")
        # ";" separates GFF attributes, so replace it inside values.
        fout.write(f"annotation_accessions={row.annotation_accessions.replace(';', ',')};")
        fout.write(f"annotation_description={row.annotation_description.replace(';', ',')}\n")
from genomad.
Hi,
I am also getting the same error (non-zero exit status 1) since I updated to the latest version of geNomad. I tried changing the output directory, but I still get the same error. Here is the content of the log file:
[10:06:18] Executing genomad annotate.
[10:06:18] Creating the /home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate directory.
[15:19:05] Proteins predicted with prodigal-gv were written to all_predicted_viral_contigs_proteins.faa.
Traceback (most recent call last):
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/genomad/mmseqs2.py", line 131, in run_mmseqs2
subprocess.run(command, stdout=fout, stderr=fout, check=True)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mmseqs', 'createdb', PosixPath('/home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate/all_predicted_viral_contigs_proteins.faa'), PosixPath('/home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate/all_predicted_viral_contigs_mmseqs2/query_db/query_db')]' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/umcg-afernandez/.conda/envs/genomad/bin/genomad", line 10, in <module>
sys.exit(cli())
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 1208, in end_to_end
ctx.invoke(
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 425, in annotate
genomad.annotate.main(
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/genomad/modules/annotate.py", line 202, in main
mmseqs2_obj.run_mmseqs2(threads, sensitivity, evalue, splits)
File "/home/umcg-afernandez/.conda/envs/genomad/lib/python3.10/site-packages/genomad/mmseqs2.py", line 134, in run_mmseqs2
raise Exception(f"'{command_str}' failed.") from e
Exception: 'mmseqs createdb /home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate/all_predicted_viral_contigs_proteins.faa /home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate/all_predicted_viral_contigs_mmseqs2/query_db/query_db' failed.
from genomad.
Can you share the contents of /home/umcg-afernandez/cons_geNomad/all_predicted_viral_contigs_annotate/all_predicted_viral_contigs_mmseqs2/mmseqs2.log? Did it work in the previous version?
from genomad.
Hi,
Thanks for the quick answer! I am sorry, I deleted this file, but after rerunning it 4 times (without any changes) it worked. This problem only appeared after reinstalling geNomad using conda (24-02-2023) in order to get the --conservative option, which was not present in the previous version. But, as I said, it worked after a few attempts without any modifications. I will let you know if I see this problem again.
from genomad.