Dear Authors and Experts ARMOR, I am a microbiologist, I am learning

As you predicted, there are still errors. <p dir="auto"

Hi, there seem to be two issues here: <code class="notrans

You still need to check the .Rout file (<code class="

Error when using bacterial fa and gtf file in ARMOR,about csoneson/armor

Comments (26)

csoneson commented on May 27, 2024 2

Thanks for the report. ARMOR is really intended for analysis of experiments where you have an annotation obtained from either Gencode or Ensembl. If you specify 'Ensembl' as the annotation source, tximeta will try to generate an EnsDb object (or find one from AnnotationHub). I don't have experience with RAST annotations, so I don't know how they differ from Ensembl/Gencode ones and whether they can be reconciled. You could try to set 'annotation' in the config file to e.g. 'other', in which case tximeta should not try to create the EnsDb object. However, you should be aware that ARMOR is not really tested for these situations.

from armor.

csoneson commented on May 27, 2024 1

As you predicted, there are still errors.

Please be more precise - what type of error? The same as before, or a different one?

Regarding the settings for STAR, you need to check the size of the genome that you provide (the fasta file), and derive the appropriate settings from there (according to the post I linked to above). Once you have the settings, you can add them to the additional_star_index parameter in the ARMOR config file. For example:

additional_star_index: "--genomeSAindexNbases 9"

if your genome is 1M bases (see the linked post for how to get the proper value for your genome).

from armor.

csoneson commented on May 27, 2024 1

This is an argument of the STAR alignment step, so you need to add it to additional_star_align: "" in the same way as you did above with additional_star_index: "" to modify the indexing step.

from armor.

csoneson commented on May 27, 2024

Hi,
there seem to be two issues here:

STAR segfaults during mapping
tximeta fails when reading the Salmon output

For the first issue, I wonder if the problem is that you have a very small genome. See for example alexdobin/STAR#573 (comment) - you need to set some additional parameters when indexing small genomes. In ARMOR, you can do this in the config file, by changing the additional_star_index parameter.

For the second issue, can you have a look at the file /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/tximeta_se.Rout - it should indicate what the problem is.

from armor.

luongphekidz07 commented on May 27, 2024

Dear Professor,

I would like to send you the error messages: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/tximeta_se.Rout
R version 4.0.0 (2020-04-24) -- "Arbor Day"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-conda_cos6-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

args <- (commandArgs(trailingOnly = TRUE))
for (i in seq_len(length(args))) {

```
eval(parse(text = args[[i]]))
```
}

suppressPackageStartupMessages({

```
library(dplyr)
```
```
library(tximport)
```
```
library(tximeta)
```
```
library(SingleCellExperiment)
```
})

print(salmondir)
[1] "/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/salmon"
print(json)
[1] "/home/leo/ARMOR/E109_E453/E109_E453_saslmonindex.sidx.json"
print(metafile)
[1] "/home/leo/ARMOR/E109_E453/metadata_E109_E453.txt"
print(outrds)
[1] "/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds"
print(annotation)
[1] "Ensembl"
print(organism)
[1] "Ecoli"

Load json linkedTxome

loadLinkedTxome(json)
saving linkedTxome in bfc (first time)

Read metadata

metadata <- read.delim(metafile, header = TRUE, as.is = TRUE, sep = "\t")

List Salmon directories

salmonfiles <- paste0(salmondir, "/", metadata$names, "/quant.sf")
names(salmonfiles) <- metadata$names

Add file column to metadata and import annotated abundances

In transcript level

coldata <- cbind(metadata, files = salmonfiles, stringsAsFactors = FALSE)
st <- tximeta::tximeta(coldata)
importing quantifications
reading in files with read_tsv
1 2
found matching linked transcriptome:
[ Ensembl - Ecoli NA - release 1 ]
useHub=TRUE: checking for EnsDb via 'AnnotationHub'
using temporary cache /tmp/RtmpXow0vE/BiocFileCache
snapshotDate(): 2020-04-27
did not find matching EnsDb via 'AnnotationHub'
building EnsDb with 'ensembldb' package
Importing GTF file ... OK
Processing metadata ... OK
Processing genes ...
Attribute availability:
o gene_id ... OK
o gene_name ... Nope
o entrezid ... Nope
o gene_biotype ... Nope
OK
Processing transcripts ...
Attribute availability:
o transcript_id ... OK
o gene_id ... OK
o source ... OK
I can't find type=='CDS'! The resulting database will lack CDS information!
Error in FUN(X[[i]], ...) :
Column 'tx_cds_seq_start' is not of type integer!
Calls: -> getTxDb
Execution halted

Currently, I constructed a complete bacterial genome my own, I do not know how to create the gtf file for my strain to make it fit with ARMOR. I am using the RAST server for the annotation process but RAST gtf output file was different from the reference gtf file in the ARMOR reference directory. I tried to fix part of it, but it seems difficult to me to make a good gtf file and txome file. When you have time, could you give me some instructions on how to create a gtf file and txome file to fit ARMOR input?

Thank you very much for your precious timen and valuable help,

Kind regards,
Le Phuong.

from armor.

luongphekidz07 commented on May 27, 2024

Dear Expert,

Thank you very much for your suggestions, I tried to change the settings to "other". As you predicted, there are still errors. I would like to describe the situation in prokaryotic annotation pipeline. Currently, there are multiple tools for prokaryotic annotations such as:

prokka (https://github.com/tseemann/prokka; rapid prokaryotic genome annotation);
dfast (https://github.com/nigyta/dfast_core;DDBJ Fast Annotation and Submission Tool);
pgap (https://github.com/ncbi/pgap;NCBI Prokaryotic Genome Annotation Pipeline);
RAST (https://rast.nmpdr.org/; Rapid annotation using subsystem technology);
For the tool from 1 to 3, they only produce gff, gb file (even the NCBI based tool). Currently, I can only find RAST tool will make the output as gtf file. I attached the gtf file from RAST in this comment for your reference.(The RAST did not use ENSEMBL database as eukaryotic, this tool used European Molecular Biology Laboratory (EMBL) database)
There are many similar columns as compared to reference files provided by ARMOR, I tried to modify the RAST gtf file to make it fit with ARMOR but it seems complicated. I will try more. It would be helpful if you can give me some instructions on how to make a correct gtf file

Additionally, as you mentioned above, there are issues related to STAR and ximeta. The bacteria genome usually ranges from 4 Mbps to 5 Mbps (for Escherichia coli and Klebsiella pneumoniae). But there are plasmid genome inside bacteria as well which usually ranges from 2kbp to 200kbp. I would like ask how I can change the default settings inside ARMOR, currently I do not know how to change those settings according to your suggestions. I hope that I can apply ARMOR in my research,

Thank you very much for your precious time and kind instructions,

Kind regards,
Le Phuong.

Escherichia_coli_O157_H7_str_Sakai_complete.zip

from armor.

luongphekidz07 commented on May 27, 2024

Dear Expert,

Here is the messages that I received when I changed to "other":
(snakemake) leo@leo:~/ARMOR$ snakemake --use-conda --core 10
<\Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 10
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 DRIMSeq
1 all
1 edgeR
1 multiqc
1 shiny
1 tximeta
6

[Wed Jul 1 21:55:47 2020]
rule tximeta:
input: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/pkginstall_state.txt, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/salmon/E109_4_1/quant.sf, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/salmon/E453_4_1/quant.sf, /home/leo/ARMOR/E109_E453/metadata_E109_E453.txt, /home/leo/ARMOR/E109_E453/E109_E453_saslmonindex.sidx/versionInfo.json, /home/leo/ARMOR/E109_E453/E109_E453_saslmonindex.sidx.json, scripts/run_tximeta.R
output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds
log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/tximeta_se.Rout
jobid: 19
benchmark: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/benchmarks/tximeta_se.txt

[Wed Jul 1 21:55:47 2020]
rule multiqc:
input: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC/E109_4_1_R1_fastqc.zip, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC/E453_4_1_R1_fastqc.zip, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC/E109_4_1_R2_fastqc.zip, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC/E453_4_1_R2_fastqc.zip, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/salmon/E109_4_1/quant.sf, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/salmon/E453_4_1/quant.sf, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R1_val_1.fq.gz, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E453_4_1_R1_val_1.fq.gz, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R2_val_2.fq.gz, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E453_4_1_R2_val_2.fq.gz, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC/E109_4_1_R1_val_1_fastqc.zip, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC/E453_4_1_R1_val_1_fastqc.zip, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC/E109_4_1_R2_val_2_fastqc.zip, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC/E453_4_1_R2_val_2_fastqc.zip
output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/MultiQC/multiqc_report.html
log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/multiqc.log
jobid: 1
benchmark: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/benchmarks/multiqc.txt

Activating conda environment: /home/leo/ARMOR/.snakemake/conda/10da7374
Activating conda environment: /home/leo/ARMOR/.snakemake/conda/9f6359b5
/home/leo/ARMOR/.snakemake/conda/10da7374/lib/python3.6/site-packages/multiqc/utils/config.py:45: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
configs = yaml.load(f)
/home/leo/ARMOR/.snakemake/conda/10da7374/lib/python3.6/site-packages/multiqc/utils/config.py:51: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
sp = yaml.load(f)
/home/leo/ARMOR/.snakemake/conda/10da7374/lib/python3.6/site-packages/multiqc/utils/config.py:45: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
configs = yaml.load(f)
/home/leo/ARMOR/.snakemake/conda/10da7374/lib/python3.6/site-packages/multiqc/utils/config.py:51: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
sp = yaml.load(f)
[WARNING] multiqc : MultiQC Version v1.9 now available!
[INFO ] multiqc : This is MultiQC v1.7 (1b88491)
[INFO ] multiqc : Template : default
[INFO ] multiqc : Searching '/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC'
[INFO ] multiqc : Searching '/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/salmon'
[INFO ] multiqc : Searching '/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed'
Searching 58 files.. [####################################] 100%
[INFO ] salmon : Found 2 meta reports
[INFO ] salmon : Found 2 fragment length distributions
[INFO ] cutadapt : Found 4 reports
[INFO ] fastqc : Found 8 reports
[INFO ] multiqc : Compressing plot data
[INFO ] multiqc : Report : ../ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/MultiQC/multiqc_report.html
[INFO ] multiqc : Data : ../ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/MultiQC/multiqc_data
[INFO ] multiqc : MultiQC complete
[Wed Jul 1 21:55:55 2020]
Finished job 1.
1 of 6 steps (17%) done
[Wed Jul 1 21:59:02 2020]
Error in rule tximeta:
jobid: 19
output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds
log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/tximeta_se.Rout (check log file(s) for error message)
conda-env: /home/leo/ARMOR/.snakemake/conda/9f6359b5
shell:
R CMD BATCH --no-restore --no-save "--args salmondir='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/salmon' json='/home/leo/ARMOR/E109_E453/E109_E453_saslmonindex.sidx.json' metafile='/home/leo/ARMOR/E109_E453/metadata_E109_E453.txt' outrds='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds' annotation='Other' organism='Ecoli'" scripts/run_tximeta.R /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/tximeta_se.Rout
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Error! The Snakemake workflow aborted.
Complete log: /home/leo/ARMOR/.snakemake/log/2020-07-01T215547.504384.snakemake.log>

I rerun the with --genomeSAindexNbases 10, (I calculated with 4.4Mbps genome size: log2(4,400,000)/2-1=10.03). But there are still errors:

<Activating conda environment: /home/leo/ARMOR/.snakemake/conda/10da7374
Jul 02 00:30:37 ..... started STAR run
Jul 02 00:30:37 ..... loading genome
Jul 02 00:30:38 ..... started mapping
/bin/bash: line 1: 32212 Segmentation fault (core dumped) STAR --genomeDir /home/leo/ARMOR/E109_E453/E109_E453_starindex.idx --readFilesIn /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R1_val_1.fq.gz /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R2_val_2.fq.gz --runThreadN 10 --outFileNamePrefix /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1_ --outSAMtype BAM SortedByCoordinate --readFilesCommand gunzip -c
[Thu Jul 2 00:30:39 2020]
Error in rule starPE:
jobid: 22
output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1_Aligned.sortedByCoord.out.bam
log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log (check log file(s) for error message)
conda-env: /home/leo/ARMOR/.snakemake/conda/10da7374
shell:
echo 'STAR version:
' > /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log; STAR --version >> /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log; STAR --genomeDir /home/leo/ARMOR/E109_E453/E109_E453_starindex.idx --readFilesIn /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R1_val_1.fq.gz /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R2_val_2.fq.gz --runThreadN 10 --outFileNamePrefix /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1_ --outSAMtype BAM SortedByCoordinate --readFilesCommand gunzip -c
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job starPE since they might be corrupted:
/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1_Aligned.sortedByCoord.out.bam
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Error! The Snakemake workflow aborted.
Complete log: /home/leo/ARMOR/.snakemake/log/2020-07-02T001832.033298.snakemake.log
(snakemake) leo@leo:~/ARMOR$>

I would like to receive your advice on these issues,

Thank you very much for your precious time and kind help,

Kind regards,
Le Phuong.

from armor.

csoneson commented on May 27, 2024

You still need to check the .Rout file (/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/tximeta_se.Rout) for the tximeta error, the output above just says that something went wrong in this step.

For the STAR error, was the index recreated? Snakemake does not automatically rerun a rule if only the parameters of the call changed (in this case, the input files did not change). Otherwise, try to delete the STAR index and run it again.

from armor.

luongphekidz07 commented on May 27, 2024

Dear Expert,

Thank you very much for your instructions, I deleted the STAR index and rerun. This time it showed fatal error due to lack of memory

<Activating conda environment: /home/leo/ARMOR/.snakemake/conda/10da7374
Jul 02 02:15:50 ..... started STAR run
Jul 02 02:15:50 ..... loading genome
Jul 02 02:15:51 ..... started mapping
Jul 02 02:16:50 ..... finished mapping
Jul 02 02:16:50 ..... started sorting BAM

EXITING because of fatal ERROR: not enough memory for BAM sorting:
SOLUTION: re-run STAR with at least --limitBAMsortRAM 1188211575
Jul 02 02:16:50 ...... FATAL ERROR, exiting
[Thu Jul 2 02:16:50 2020]
Error in rule starPE:
jobid: 22
output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1_Aligned.sortedByCoord.out.bam
log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log (check log file(s) for error message)
conda-env: /home/leo/ARMOR/.snakemake/conda/10da7374
shell:
echo 'STAR version:
' > /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log; STAR --version >> /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log; STAR --genomeDir /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/E109_E453_starindex.idx --readFilesIn /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R1_val_1.fq.gz /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R2_val_2.fq.gz --runThreadN 16 --outFileNamePrefix /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1_ --outSAMtype BAM SortedByCoordinate --readFilesCommand gunzip -c
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

tried to find on the parameter on the config file, but I can not find this parameter. I would like to ask how I can edit the parameter "--limitBAMsortRAM 1188211575".

Thank you very much for your precious time and your kind help,

Kind regards,
Le Phuong.

from armor.

luongphekidz07 commented on May 27, 2024

Dear Expert,
Thank you so much for your precious time and kind help,
It is late in Korea now around 3AM. I will rerun the pipeline again and will let you know the results tomorrow,

Thank you so much,
Kind regards,
Le Phuong

from armor.

luongphekidz07 commented on May 27, 2024

Dear Expert,

I would like to update the errors after adding the addtional_star_align:"--limitBAMsortRAM 1288211575"

<Activating conda environment: /home/leo/ARMOR/.snakemake/conda/10da7374
Jul 02 09:12:02 ..... started STAR run
Jul 02 09:12:02 ..... loading genome
Jul 02 09:12:03 ..... started mapping

BAMoutput.cpp:27:BAMoutput: exiting because of OUTPUT FILE error: could not create output file /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1__STARtmp//BAMsort/20/16
SOLUTION: check that the path exists and you have write permission for this file. Also check ulimit -n and increase it to allow more open files.

Jul 02 09:12:04 ...... FATAL ERROR, exiting
[Thu Jul 2 09:12:04 2020]
Error in rule starPE:
jobid: 22
output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1_Aligned.sortedByCoord.out.bam
log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log (check log file(s) for error message)
conda-env: /home/leo/ARMOR/.snakemake/conda/10da7374
shell:
echo 'STAR version:
' > /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log; STAR --version >> /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log; STAR --genomeDir /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/E109_E453_starindex.idx --readFilesIn /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R1_val_1.fq.gz /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R2_val_2.fq.gz --runThreadN 40 --outFileNamePrefix /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1_ --outSAMtype BAM SortedByCoordinate --readFilesCommand gunzip -c --limitBAMsortRAM 1288211575
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

I tried to use several commands such as:

sudo chmod -R o+rw /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1__STARtmp/BAMsort/20/
sudo chmod +rwx /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1__STARtmp/BAMsort/20/
chmod ugo+rwx /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1__STARtmp/BAMsort/20/

After each command, I rerun again, but there are no changes in the error messages. When you have time, I would like to receive your instructions on solving the problem,

Thank you very much for your precious time and kind help,

Kind regards,
Le Phuong.

from armor.

csoneson commented on May 27, 2024

It may be related to this: alexdobin/STAR#528. Try to change the ulimit or reduce the number of threads as suggested there. These are not really ARMOR errors (they would happen even if you ran STAR directly, outside of ARMOR), so I would suggest also checking the issues previously reported (and solved) for STAR.

from armor.

luongphekidz07 commented on May 27, 2024

Dear Expert,
I understood that I have to change the ulimit parameter. This may be my naive question, I would like to ask whether it is possible to change the parameter of ulimit in the config file of ARMOR. Or is it possible to change the parameter in the ARMOR snakemake file.

Thank you very much for your kind help and your patience,

Kind regards,
Le Phuong.

from armor.

csoneson commented on May 27, 2024

I have not tried this, but I suppose you could add a line in the beginning of the shell part of the appropriate rule in the Snakefile, setting the ulimit to the desired value for the current session. There is no parameter for changing this in the config file, since this is more of a system setting than directly related to the workflow.

from armor.

luongphekidz07 commented on May 27, 2024

Dear Expert,

Thank you so much for your suggestions. I am not expert in bioinformatic field, I do not know how to add the shell part in the snakefile, but I will try furthermore. According to your advice, I reduced the threads from 40 to 10 and the problem were solved.

However, now I have other error messages:

<Searching 70 files.. [############################--------] 78%[Thu Jul 2 17:28:45 2020]
Finished job 26.
6 of 13 steps (46%) done

[Thu Jul 2 17:28:45 2020]
rule edgeR:
input: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/pkginstall_state.txt, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds, scripts/run_render.R, scripts/edgeR_dge.Rmd
output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.html, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.rds
log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dge_edgeR.Rout
jobid: 25
benchmark: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/benchmarks/run_dge_edgeR.txt

Searching 70 files.. [####################################] 100%
[INFO ] salmon : Found 2 meta reports
[INFO ] salmon : Found 2 fragment length distributions
[INFO ] star : Found 2 reports
[INFO ] cutadapt : Found 4 reports
Activating conda environment: /home/leo/ARMOR/.snakemake/conda/9f6359b5
[INFO ] fastqc : Found 8 reports
[INFO ] multiqc : Compressing plot data
[INFO ] multiqc : Report : ../ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/MultiQC/multiqc_report.html
[INFO ] multiqc : Data : ../ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/MultiQC/multiqc_data
[INFO ] multiqc : MultiQC complete
[Thu Jul 2 17:28:47 2020]
Finished job 1.
7 of 13 steps (54%) done
[Thu Jul 2 17:28:49 2020]
Error in rule edgeR:
jobid: 25
output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.html, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.rds
log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dge_edgeR.Rout (check log file(s) for error message)
conda-env: /home/leo/ARMOR/.snakemake/conda/9f6359b5
shell:
R CMD BATCH --no-restore --no-save "--args se='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds' organism='Ecoli' design='~0+celline' contrast='celline1-celline2,celline2-celline1' genesets='H,C5' rmdtemplate='scripts/edgeR_dge.Rmd' outputdir='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR' outputfile='edgeR_dge.html'" scripts/run_render.R /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dge_edgeR.Rout
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Thu Jul 2 17:29:26 2020]
Finished job 19.
8 of 13 steps (62%) done
[Thu Jul 2 17:29:39 2020]
Finished job 18.
9 of 13 steps (69%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Error! The Snakemake workflow aborted.
Complete log: /home/leo/ARMOR/.snakemake/log/2020-07-02T172518.020533.snakemake.log>

I checked the file /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dge_edgeR.Rout, and it showed this message:

Arguments that are only used for some of the reports

if (exists("organism")) {

```
print(organism)
```
} else {
```
organism <- NULL
```
}
[1] "Ecoli"

if (exists("design")) {

```
print(design)
```
} else {
```
design <- NULL
```
}
[1] "~0+celline"

if (exists("contrast")) {

contrast <- strsplit(gsub(" ","",contrast), ",")[[1]]

```
print(contrast)
```
} else {
```
contrast <- NULL
```
}
[1] "celline1-celline2" "celline2-celline1"

if (exists("genesets")) {

genesets <- strsplit(gsub(" ","",genesets), ",")[[1]]

```
print(genesets)
```
} else {
```
genesets <- NULL
```
}
[1] "H" "C5"

if (exists("gtffile")) {

```
print(gtffile)
```
} else {
```
gtffile <- NULL
```
}

if (exists("ncores")) {

```
ncores <- as.numeric(ncores)
```
```
if(is.na(ncores))
```
```
    ncores <- 1
```
```
print(ncores)
```
} else {
```
ncores <- 1
```
}

if (exists("bigwigdir")) {

```
bigwigdir <- normalizePath(bigwigdir)
```
```
print(bigwigdir)
```
} else {
```
bigwigdir <- NULL
```
}

source("scripts/generate_report.R")

generateReport(se = se, organism = organism, gtffile = gtffile,

           contrast = contrast, design = design, genesets = genesets,

           bigwigdir = bigwigdir, rmdTemplate = rmdtemplate,

           outputDir = outputdir, outputFile = outputfile, ncores = ncores,

           forceOverwrite = TRUE, showCode = TRUE)

Error in generateReport(se = se, organism = organism, gtffile = gtffile, :
organism must be one of the organisms listed in msigdbr::msigdbr_show_species()
Execution halted>

I would like to to ask what I should do to solve the problem,

Thank you very much for your valuble time and kind help,

Kind regards,
Le Phuong.

from armor.

csoneson commented on May 27, 2024

Error in generateReport(se = se, organism = organism, gtffile = gtffile, :
organism must be one of the organisms listed in msigdbr::msigdbr_show_species()

This tells you that you can't perform gene set analysis for E.coli, since it's not included among the supported species. You need to set this to False:

ARMOR/config.yaml

Line 127 in 1b88491

run_camera: True

from armor.

luongphekidz07 commented on May 27, 2024

Dear Expert,

Thank you very much for your instructions. I changed "run_camera: False" and rerun. Here is the message error:
<22 of 28 steps (79%) done
[Thu Jul 2 21:13:14 2020]
Error in rule edgeR:
jobid: 25
output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.html, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.rds
log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dge_edgeR.Rout (check log file(s) for error message)
conda-env: /home/leo/ARMOR/.snakemake/conda/9f6359b5
shell:
R CMD BATCH --no-restore --no-save "--args se='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds' organism='Ecoli' design='~0+celline' contrast='celline1-celline2,celline2-celline1' rmdtemplate='scripts/edgeR_dge.Rmd' outputdir='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR' outputfile='edgeR_dge.html'" scripts/run_render.R /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dge_edgeR.Rout
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Thu Jul 2 21:13:42 2020]
Finished job 19.
23 of 28 steps (82%) done
[Thu Jul 2 21:13:58 2020]
Finished job 18.
24 of 28 steps (86%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Error! The Snakemake workflow aborted.
Complete log: /home/leo/ARMOR/.snakemake/log/2020-07-02T204357.942352.snakemake.log>

Here is the Rout error message:

<> source("scripts/generate_report.R")

generateReport(se = se, organism = organism, gtffile = gtffile,

           contrast = contrast, design = design, genesets = genesets,

           bigwigdir = bigwigdir, rmdTemplate = rmdtemplate,

           outputDir = outputdir, outputFile = outputfile, ncores = ncores,

           forceOverwrite = TRUE, showCode = TRUE)

Quitting from lines 232-234 (edgeR_dge.Rmd)
Error in eval(ej, envir = levelsenv) : object 'celline1' not found
Calls: generateReport ... eval -> as.data.frame -> makeContrasts -> eval -> eval
Execution halted>

I would like to ask what is the "celline1". In my case, is it my control bacterial strain?
I would like to send you the metadata.tsv:
names type celline treatment
E109_4_1 PE 1 None
E453_4_1 PE 1 Yes
I just included one set of data. Is it the correct way to make metadata.tsv file? I would like to receive your instructions

Thank you very much for your patience and your kind help,

Kind regards,
Le Phuong.

from armor.

csoneson commented on May 27, 2024

Yes, metadata.tsv should be a tab-delimited text file. You need to specify the design and contrasts appropriately in the configuration file. These are specified in the standard format used by (e.g.) edgeR - if you are uncertain about what that means I would suggest looking into the extensive edgeR user guide (there are lots of great examples of different models). ARMOR can do some checks of the specified parameters (including the design and contrasts) via the checkinputs rule (see https://github.com/csoneson/ARMOR/wiki/Setting-up-the-DGE-analysis).

from armor.

luongphekidz07 commented on May 27, 2024

Dear Expert,

Thank you very much for your instructions and your recommendations. I am still reading the instructions from edgeR wiki. I make the metadata file similar to edgE examples with tab delimited style. Then, I used the command line to check inputs, but it still has the errors: can not find cellline1. Does it mean that I have to change cellline1 to "None" and cellline2 to "Yes"? I am sorry this question looks stupid, but I cannot find the reason

Here is the error message

<Activating conda environment: /home/leo/ARMOR/.snakemake/conda/9f6359b5
[Fri Jul 3 03:23:08 2020]
Error in rule checkinputs:
jobid: 0
output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/check_input.txt
log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/check_input.Rout (check log file(s) for error message)
conda-env: /home/leo/ARMOR/.snakemake/conda/9f6359b5
shell:
R CMD BATCH --no-restore --no-save "--args metafile='/home/leo/ARMOR/E109_E453/metadata_E109_E453.txt' design='~0+celline' contrast='celline1-celline2,celline2-celline1' outFile='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/check_input.txt' gtf='/home/leo/ARMOR/E109_E453/E109.gtf' genome='/home/leo/ARMOR/E109_E453/E109.fa' fastqdir='/home/leo/ECOPAIR_RNAseq_rawdata' fqsuffix='fq' fqext1='R1' fqext2='R2' txome='/home/leo/ARMOR/E109_E453/E109_gene.fa' run_camera='False' organism='Ecoli' annotation='Other'" scripts/check_input.R /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/check_input.Rout;
cat /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/check_input.txt

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Error! The Snakemake workflow aborted.
Complete log: /home/leo/ARMOR/.snakemake/log/2020-07-03T032306.257087.snakemake.log>

Here is the error of the run_dge_edgeR.Rout:

source("scripts/generate_report.R")

generateReport(se = se, organism = organism, gtffile = gtffile,

           contrast = contrast, design = design, genesets = genesets,

           bigwigdir = bigwigdir, rmdTemplate = rmdtemplate,

           outputDir = outputdir, outputFile = outputfile, ncores = ncores,

           forceOverwrite = TRUE, showCode = TRUE)

I think it is a very last problem that I am facing now.

Thank you very much for your patience and your kind help,
Kind regards,
Le Phuong.

from armor.

csoneson commented on May 27, 2024

Ok, so I'm going to guess that the problem is that the cell line column is interpreted as a numeric rather than a categorical variable. Could you change to e.g. C1/C2 instead of 1/2, and then do cellineC1-cellineC2?

from armor.

luongphekidz07 commented on May 27, 2024

Dear Expert,

I have changed to C1/C2 in the metadata.tsv. I also changed cellineC1, cellineC2 in the yaml file. I rerun the whole process. However, there is another error:

I checked the run_dge_edgeR.Rout file:

< if (exists("bigwigdir")) {

```
bigwigdir <- normalizePath(bigwigdir)
```
```
print(bigwigdir)
```
} else {
```
bigwigdir <- NULL
```
}

source("scripts/generate_report.R")

generateReport(se = se, organism = organism, gtffile = gtffile,

           contrast = contrast, design = design, genesets = genesets,

           bigwigdir = bigwigdir, rmdTemplate = rmdtemplate,

           outputDir = outputdir, outputFile = outputfile, ncores = ncores,

           forceOverwrite = TRUE, showCode = TRUE)

Quitting from lines 301-356 (edgeR_dge.Rmd)
Error: Can't subset columns that don't exist.
✖ Column gene_name doesn't exist.
Backtrace:
█

├─global::generateReport(...)
│ ├─base::do.call("render", args = args)
│ ├─rmarkdown::render(...)
│ │ └─knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
│ │ └─knitr:::process_file(text, output)
│ │ ├─base::withCallingHandlers(...)
│ │ ├─knitr:::process_group(group)
│ │ └─knitr:::process_group.block(group)
│ │ └─knitr:::call_block(x)
│ │ └─knitr:::block_exec(params)
│ │ ├─knitr:::in_dir(...)
│ │ └─knitr:::evaluate(...)
│ │ └─evaluate::evaluate(...)
│ │ └─evaluate:::evaluate_call(...)
│ │ ├─evaluate:::timing_fn(...)
│ │ ├─base:::handle(...)
│ │
Execution halted

According to your suggestion, the reason is truely due to the numeric rather than the categorical variable. This time, the problem is due to the empty gene name column. I need to spend time to think how to make the gene name column.

Thank you very much for your valuable time, patience and kind help,
Kind regards,
Le Phuong.

from armor.

csoneson commented on May 27, 2024

Yes, the edgeR script assumes that the data object has an annotation column gene_name (which it will have if you are using a Gencode or Ensembl annotation). In your case, you could create one e.g. by adding a line here:

ARMOR/scripts/run_tximeta.R

Line 48 in 1b88491

defining what you want the gene_name to be. For example

rowData(sg)$gene_name <- rowData(sg)$gene_id

depending on which columns you have in your object.

from armor.

luongphekidz07 commented on May 27, 2024

Dear Expert,

Thank you very much for your instructions. I added "rowData(sg)$gene_name <- rowData(sg)$gene_id" in line 48. And rerun again, then it showed another message:

<Activating conda environment: /home/leo/ARMOR/.snakemake/conda/9f6359b5
[Sat Jul 4 01:05:00 2020]
Finished job 43.
2 of 5 steps (40%) done

[Sat Jul 4 01:05:00 2020]
rule DRIMSeq:
input: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/pkginstall_state.txt, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.rds, scripts/run_render.R, scripts/DRIMSeq_dtu.Rmd
output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/DRIMSeq_dtu.html, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/DRIMSeq_dtu.rds
log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dtu_drimseq.Rout
jobid: 36
benchmark: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/benchmarks/run_dtu_drimseq.txt
threads: 10

Activating conda environment: /home/leo/ARMOR/.snakemake/conda/9f6359b5
[Sat Jul 4 01:05:13 2020]
Error in rule DRIMSeq:
jobid: 36
output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/DRIMSeq_dtu.html, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/DRIMSeq_dtu.rds
log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dtu_drimseq.Rout (check log file(s) for error message)
conda-env: /home/leo/ARMOR/.snakemake/conda/9f6359b5
shell:
R CMD BATCH --no-restore --no-save "--args se='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.rds' design='~0+celline' contrast='cellineC1-cellineC2,cellineC2-cellineC1' ncores='10' rmdtemplate='scripts/DRIMSeq_dtu.Rmd' outputdir='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR' outputfile='DRIMSeq_dtu.html'" scripts/run_render.R /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dtu_drimseq.Rout
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Error! The Snakemake workflow aborted.>

I checked the file: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dtu_drimseq.Rout. And then it showed the message:

<> if (exists("bigwigdir")) {

```
bigwigdir <- normalizePath(bigwigdir)
```
```
print(bigwigdir)
```
} else {
```
bigwigdir <- NULL
```
}

source("scripts/generate_report.R")

generateReport(se = se, organism = organism, gtffile = gtffile,

           contrast = contrast, design = design, genesets = genesets,

           bigwigdir = bigwigdir, rmdTemplate = rmdtemplate,

           outputDir = outputdir, outputFile = outputfile, ncores = ncores,

           forceOverwrite = TRUE, showCode = TRUE)

Quitting from lines 111-114 (DRIMSeq_dtu.Rmd)
Error in dmDS_filter(counts = x@counts, min_samps_gene_expr = min_samps_gene_expr, :
!No genes left after filtering!
Calls: generateReport ... eval -> dmFilter -> dmFilter -> .local -> dmDS_filter
Execution halted>

Here are the lines 111-114:

d <- dmFilter(d, min_samps_gene_expr = 3, min_samps_feature_expr = 3,
              min_gene_expr = 10, min_feature_expr = 5)
plotData(d)

When you have time, I would like to receive your instructions on the next step,

Thank you very much for your valuable time and kind help,

Kind regards,
Le Phuong.

from armor.

csoneson commented on May 27, 2024

Error in dmDS_filter(counts = x@counts, min_samps_gene_expr = min_samps_gene_expr, :
!No genes left after filtering!

This tells you what the error is - there are no genes left after the filtering that you have applied. This is not an ARMOR error - you may need to adjust the DRIMSeq filtering parameters to fit your data set. Please keep in mind that you should not just run the pipeline blindly - it is important that you understand what the different steps do in order to interpret the output properly.

from armor.

csoneson commented on May 27, 2024

I think we have solved the ARMOR-related problems here, so I'm closing this issue. Feel free to reopen if that's not the case.

from armor.

luongphekidz07 commented on May 27, 2024

Dear Expert,
Thank you so much for your valuable time and your kind help. Your advice and instructions helped a me a lot,

Kind regards,
Le Phuong.

from armor.

Error when using bacterial fa and gtf file in ARMOR about armor HOT 26 CLOSED

Comments (26)

Load json linkedTxome

Read metadata

List Salmon directories

Add file column to metadata and import annotated abundances

In transcript level

Arguments that are only used for some of the reports

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent