snakemake-workflows / rna-seq-star-deseq2

RNA-seq workflow using STAR and DESeq2

License: MIT License

Python 71.54% R 28.46%

snakemake sciworkflows reproducibility gene-expression-analysis deseq2

rna-seq-star-deseq2's Introduction

Snakemake workflow: rna-seq-star-deseq2

This workflow performs a differential gene expression analysis with STAR and DESeq2.

Usage

The usage of this workflow is described in the Snakemake Workflow Catalog.

If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and its DOI (see above).

rna-seq-star-deseq2's Issues

trim.smk file name

Good afternoon, thanks for the workflow!

I'm running into an issue with running the pipeline with trimming enabled. I'm getting the following error:

Missing input files for rule align:
results/trimmed/21_02764_01_S1_lane1_R2.fastq.gz
results/trimmed/21_02764_01_S1_lane1_R1.fastq.gz

After reading through the snake files, I think it is because the output from trim.smk has a slightly different name than expected by align.smk. For example:

trim.smk, line 29: fastq1="results/trimmed/{sample}-{unit}_R1.fastq.gz",
common.smk, line 98: "results/trimmed/{sample}_{unit}_{group}.fastq.gz",

Note that the wildcards in the first line are separated by a hyphen, while those in the second line are separated by underscores.

Could this be causing the 'missing input file' error?

Thanks again!

Matt.
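
The suspected mismatch can be checked outside Snakemake by filling both patterns with the same wildcard values. This is only a minimal sketch: the two pattern strings are copied from the issue, while the wildcard values are an assumption inferred from the file names in the error message.

```python
# Patterns quoted in the issue (trim.smk line 29 and common.smk line 98).
TRIM_OUTPUT = "results/trimmed/{sample}-{unit}_R1.fastq.gz"
ALIGN_INPUT = "results/trimmed/{sample}_{unit}_{group}.fastq.gz"

# Assumed wildcard values, inferred from the file names in the error message.
wildcards = {"sample": "21_02764_01_S1", "unit": "lane1"}

# Path that the trimming rule would write.
produced = TRIM_OUTPUT.format(**wildcards)
# Path that the alignment rule would ask for.
requested = ALIGN_INPUT.format(group="R1", **wildcards)

print(produced)
print(requested)
print("match:", produced == requested)
```

If the two strings differ, no rule produces the file that align requests, which would be consistent with the "Missing input files" message.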

`Align` step fails with `could not open input file /geneInfo.tab`

I have deployed the latest version of this workflow (v2.1.2) and only adjusted the config files. I use the ENSEMBL GRCm38 mouse reference genome (release 102).
Unfortunately, the mapping step fails with the following error:

Transcriptome.cpp:18:Transcriptome: exiting because of *INPUT FILE* error: could not open input file /geneInfo.tab
Solution: check that the file exists and you have read permission for this file
          SOLUTION: utilize --sjdbGTFfile /path/to/annotations.gtf option at the genome generation step or mapping step

Jun 06 19:42:37 ...... FATAL ERROR, exiting

This seems to be the same error as reported here.
Thank you for looking into this!

How to add quality control step (fastq_screen) to pipeline

Hello,

I've been trying to expand this pipeline as a means of learning how to use snakemake and I'm starting to understand it better, but I've been struggling with adding a quality control step (e.g., fastqc or fastq_screen). From my understanding, the problem is that the results of fastq_screen are "stand-alone" (i.e., they aren't intermediate files necessary for generating the final output in this case). I'm not entirely sure how to get snakemake to recognize that it needs to run the fastq_screen rule below on my fastq files (pre-trimmed in this case). I think it'll require adding something like the qc/{sample}-{unit}.fastq_screen.txt results to the all rule, but I can't get the wildcards to work correctly. Some help would be greatly appreciated! Thank you so much!

My Snakefile is:

import pandas as pd  # needed for pd.read_table and pd.isnull below

shell.executable("bash")

configfile: "config.yaml"
samples = pd.read_table(config["samples"], index_col="sample")
units = pd.read_table(config["units"], index_col=["sample", "unit"], dtype=str)
units.index = units.index.set_levels([i.astype(str) for i in units.index.levels])  # enforce str in index


def is_single_end(sample, unit):
    return pd.isnull(units.loc[(sample, unit), "fq2"])

rule all:
    input:
        expand("results/diffexp/{contrast}.diffexp.tsv",
               contrast=config["diffexp"]["contrasts"]),
        "results/pca.svg",

def get_fastq(wildcards):
    return units.loc[(wildcards.sample, wildcards.unit), ["fq1", "fq2"]].dropna()

rule fastq_screen:
    input:
        get_fastq
    output:
        txt="qc/{sample}-{unit}.fastq_screen.txt",
        png="qc/{sample}-{unit}.fastq_screen.png"
    params:
        fastq_screen_config="{}".format(config["params"]["fastq_screen"]),
        subset=100000,
        aligner='bowtie2'
    threads: 8
    wrapper:
        "0.19.3/bio/fastq_screen"

include: "rules/trim.smk"
include: "rules/align.smk"
include: "rules/diffexp.smk"

When I run snakemake -np it doesn't seem to recognize that it needs to run fastq_screen, even if I try to use the -R option. How can I write the all rule so that it will also look for the fastq_screen results?

Job counts:
	count	jobs
	8	align
	1	all
	1	count_matrix
	8	cutadapt
	1	deseq2
	1	deseq2_init
	1	pca
	21
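
One common way to make Snakemake schedule a stand-alone rule is to request its outputs explicitly in rule all. The sketch below builds such a list of target paths in plain Python from (sample, unit) pairs. The helper name fastq_screen_targets and the example pairs are hypothetical; in the Snakefile above the pairs would presumably come from units.index.

```python
def fastq_screen_targets(sample_unit_pairs):
    """Return one fastq_screen report path per (sample, unit) pair."""
    pattern = "qc/{sample}-{unit}.fastq_screen.txt"
    return [pattern.format(sample=s, unit=u) for s, u in sample_unit_pairs]


# Hypothetical pairs standing in for the rows of units.tsv.
pairs = [("A", "lane1"), ("A", "lane2"), ("B", "lane1")]
targets = fastq_screen_targets(pairs)
print(targets)
```

The resulting list could then be added to the input of rule all next to the existing expand(...) call. Iterating over the actual pairs avoids requesting sample and unit combinations that do not exist in units.tsv.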

Multiqc error: module 'collections' has no attribute 'Mapping'

Hello,

I recently followed the instructions on the Snakemake workflow catalog to deploy this workflow. When running the workflow on a small test dataset, all rules up until multiqc ran without error and provided reasonable output. The multiqc rule failed, and inspecting the log revealed that the cause of the failure was an AttributeError: "module 'collections' has no attribute 'Mapping'".

I ran the workflow in a conda environment setup as the workflow catalog described, and ran the workflow using snakemake --cores all --use-conda --cache with the necessary environment variable set to specify the cache directory. I can provide additional information if anyone thinks it is a problem due to an error on my end configuring the pipeline, but judging by this issue report on the multiqc github, it seems like a problem that could be resolved by updating the multiqc wrapper that the workflow uses (seems to currently load multiqc 1.10.1, and the problem was fixed in version 1.11.0). Also, the conda environment that the multiqc rule runs in seems to be loading Python version 3.11.0, but according to this stack overflow post if it were to load a version older than 3.10.0, that could also solve the problem.

Again, happy to provide any additional information, and thank you all for a great tool that I am excited to start using!

Isaac
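
For background on the error message: the abstract base classes such as Mapping live in collections.abc, and the old aliases in the collections module were removed in Python 3.10. The following minimal sketch, which is independent of MultiQC, shows the difference on whatever interpreter runs it.

```python
import collections
import collections.abc
import sys

# The supported location of the abstract base class.
print(isinstance({}, collections.abc.Mapping))

# The legacy alias collections.Mapping only exists on interpreters older than 3.10.
has_legacy_alias = hasattr(collections, "Mapping")
print(sys.version_info[:2], has_legacy_alias)
```

Code that still refers to collections.Mapping therefore fails on Python 3.10 and newer, which fits the Python 3.11.0 environment described above.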

read_distribution.py

In the qc.smk file, many Python scripts are needed, but I can't find them in the folder, for example read_distribution.py:

rule rseqc_readdis:
    input:
        bam="results/star/{sample}-{unit}/Aligned.sortedByCoord.out.bam",
        bed="results/qc/rseqc/annotation.bed",
    output:
        "results/qc/rseqc/{sample}-{unit}.readdistribution.txt",
    priority: 1
    log:
        "logs/rseqc/rseqc_readdis/{sample}-{unit}.log",
    conda:
        "../envs/rseqc.yaml"
    shell:
        "read_distribution.py -r {input.bed} -i {input.bam} > {output} 2> {log}"

`count-matrix.py` isn't for stranded RNA-seq protocols

The count-matrix.py script uses column 1 (second column) to get the counts, which is for unstranded RNA-seq protocols (STAR manual, section "Counting number of reads per gene"). I'm testing the pipeline with data from a single-end stranded protocol and to my understanding, the 3rd column should be used, which is the equivalent to htseq-count --stranded yes. Other popular stranded protocols should use the fourth column. Are you interested in accepting a change to this script to accept user input to choose the count column/s? If so, how would you do it (config file, others)?
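
To illustrate the column choice, here is a minimal sketch that reads a ReadsPerGene.out.tab table and picks the count column from a strandedness setting. It is not the workflow's count-matrix.py. The function name read_counts, the setting values ("no", "yes", "reverse", mirroring htseq-count) and the example table are assumptions made for illustration.

```python
import csv
import io

# Column index in ReadsPerGene.out.tab for each strandedness setting
# (index 0 is the gene ID). The keys mirror htseq-count's --stranded values.
COUNT_COLUMN = {"no": 1, "yes": 2, "reverse": 3}


def read_counts(handle, strandedness="no"):
    """Return {gene_id: count} from a STAR ReadsPerGene table."""
    column = COUNT_COLUMN[strandedness]
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if row[0].startswith("N_"):  # skip the summary rows at the top
            continue
        counts[row[0]] = int(row[column])
    return counts


# Made-up example table, for illustration only.
example = (
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t3\t20\t4\n"
    "N_ambiguous\t1\t0\t1\n"
    "geneA\t100\t2\t98\n"
    "geneB\t50\t49\t1\n"
)
print(read_counts(io.StringIO(example), "reverse"))
```

A single strandedness value per unit, for example in the config file or in units.tsv, would be enough to drive such a lookup.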

SyntaxError in line 14 of Snakefile

Thanks for creating this package.
I've followed instructions here through to step 4, at which point when I try to run the workflow I get the following:

$snakemake --cores all -n
SyntaxError in line 14 of /path/to/workflow/Snakefile:
invalid syntax (Snakefile, line 14)

Line 14 being the config: line in the following context

...
module rna_seq_star_deseq2:
    snakefile:
        /path/to/workflow/Snakefile
    config:
        config
...

I can't see anything wrong with my config file. As far as I'm aware, the packages from snakemake/deploy are all there. I also reverted all of my project-specific entries in the config files so that they are identical to the original template again. Do you have any idea where this is coming from?

base_level and level_of_interest in example config.yaml

In the diffexp section of the example config.yaml the two variables base_level and level_of_interest are set to A, B, and C, which are the sample names defined in samples.tsv.

However, shouldn't the levels refer to the treatment_# column of samples.tsv, i.e. 'treated' or 'untreated' in the example files (as in the .test/configs)?

Confusion about the trimming for single-end data

Hello,

I am new to snakemake and very eager to learn more.

I am trying out this workflow with data from GEO to learn more about how to use snakemake properly, and I'm a little stuck on the trimming step. There are two rules, one for paired-end trimming and one for single-end trimming. I have single-end data, but only the paired-end trimming rule is getting called. My units.tsv file looks like this:

sample	unit	fq1
GSM945742	SRR504812	SRR504812.fastq
GSM945741	SRR504810	SRR504810.fastq
GSM945741	SRR504811	SRR504811.fastq

and the results of snakemake -np for one of the fastq files show that it's calling the paired-end rule. My question is, how can I make the workflow call the appropriate rule here?

rule cutadapt_pe:
    input: SRR504811.fastq
    output: trimmed/GSM945741-SRR504811.1.fastq.gz, trimmed/GSM945741-SRR504811.2.fastq.gz, trimmed/GSM945741-SRR504811.qc.txt
    log: logs/cutadapt/GSM945741-SRR504811.log
    jobid: 19
    wildcards: sample=GSM945741, unit=SRR504811

Thank you, I'm sorry if this is a dumb question
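
Whether a unit counts as single-end typically depends on a missing fq2 value. The units.tsv above has no fq2 column at all, which a plain lookup of that column may not handle. The standard-library sketch below treats both an absent column and an empty cell as single-end. The function name mirrors is_single_end, but this is an illustrative reimplementation, not the workflow's code.

```python
import csv
import io


def is_single_end(row):
    """True if the unit has no second FASTQ (column absent or cell empty)."""
    return not (row.get("fq2") or "").strip()


# The units.tsv content quoted in the issue.
units_tsv = (
    "sample\tunit\tfq1\n"
    "GSM945742\tSRR504812\tSRR504812.fastq\n"
    "GSM945741\tSRR504810\tSRR504810.fastq\n"
    "GSM945741\tSRR504811\tSRR504811.fastq\n"
)
rows = list(csv.DictReader(io.StringIO(units_tsv), delimiter="\t"))
for row in rows:
    print(row["sample"], row["unit"], is_single_end(row))
```

Adding an empty fq2 column to units.tsv is another option worth trying, since it gives the lookup a column to inspect.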

Failed to run with cutadapt

Hello,

When I try to trim my fastq files after setting "True" in the config file, I get an error:

Traceback (most recent call last):
  File "/XXXXX/workflow/.snakemake/scripts/tmp1j2yk7de.wrapper.py", line 27, in <module>
    shell(
  File "/XXXXX/mambaforge/envs/snakemake_7.25.0/lib/python3.11/site-packages/snakemake/shell.py", line 300, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail;  cutadapt --cores 8 AGATCGGAAGAGCACACGTCTGAACTCCAGTCA  -o results/trimmed/HIPPVG_CD4V_lane1_R1.fastq.gz -p results/trimmed/HIPPVG_CD4V_lane1_R2.fastq.gz pipe/cutadapt/HIPPVG_CD4V/lane1.fq1.fastq.gz pipe/cutadapt/HIPPVG_CD4V/lane1.fq2.fastq.gz > results/trimmed/HIPPVG_CD4V_lane1.paired.qc.txt  2> logs/cutadapt/HIPPVG_CD4V_lane1.log' returned non-zero exit status 2.

I'm trying to trace back the error and I've looked at the job list.

[Fri Feb  9 20:09:00 2024]
    rule cutadapt_pe:
        input: pipe/cutadapt/HIPPVG_CD4V/lane1.fq1.fastq.gz, pipe/cutadapt/HIPPVG_CD4V/lane1.fq2.fastq.gz
        output: results/trimmed/HIPPVG_CD4V_lane1_R1.fastq.gz, results/trimmed/HIPPVG_CD4V_lane1_R2.fastq.gz, results/trimmed/HIPPVG_CD4V_lane1.paired.qc.txt
        log: logs/cutadapt/HIPPVG_CD4V_lane1.log
        jobid: 501
        reason: Missing output files: results/trimmed/HIPPVG_CD4V_lane1_R2.fastq.gz, results/trimmed/HIPPVG_CD4V_lane1_R1.fastq.gz; Input files updated by another job: pipe/cutadapt/HIPPVG_CD4V/lane1.fq1.fastq.gz, pipe/cutadapt/HIPPVG_CD4V/lane1.fq2.fastq.gz
        wildcards: sample=HIPPVG_CD4V, unit=lane1
        threads: 8
        resources: tmpdir=/tmp

    [Fri Feb  9 20:09:00 2024]
    rule cutadapt_pipe:
        input: XXXXXXXX/HIPPVG_CD4V.cleaned.R1.fastq.gz
        output: pipe/cutadapt/HIPPVG_CD4V/lane1.fq1.fastq.gz (pipe)
        log: logs/pipe-fastqs/catadapt/HIPPVG_CD4V_lane1.fq1.fastq.gz.log
        jobid: 502
        reason: Missing output files: pipe/cutadapt/HIPPVG_CD4V/lane1.fq1.fastq.gz
        wildcards: sample=HIPPVG_CD4V, unit=lane1, fq=fq1, ext=fastq.gz
        threads: 0
        resources: tmpdir=/tmp


    [Fri Feb  9 20:09:00 2024]
    rule cutadapt_pipe:
        input: XXXXXXXX/HIPPVG_CD4V.cleaned.R2.fastq.gz
        output: pipe/cutadapt/HIPPVG_CD4V/lane1.fq2.fastq.gz (pipe)
        log: logs/pipe-fastqs/catadapt/HIPPVG_CD4V_lane1.fq2.fastq.gz.log
        jobid: 503
        reason: Missing output files: pipe/cutadapt/HIPPVG_CD4V/lane1.fq2.fastq.gz
        wildcards: sample=HIPPVG_CD4V, unit=lane1, fq=fq2, ext=fastq.gz
        threads: 0
        resources: tmpdir=/tmp

It seems the error can be traced back to the rule cutadapt_pipe. The rule should copy fq1 and fq2 to a temporary folder pipe/cutadapt/{sample}, but the copies fail.

The shell command is cat {input} > {output} 2> {log}, but if the output needs to be in a nonexistent folder (pipe/cutadapt/{sample}), it cannot work.

I couldn't find a line in the workflow that creates these folders. I tried changing the shell command to "mkdir -p pipe/cutadapt/{wildcards.sample} && cat {input} > {output} 2> {log}", but it still doesn't work, and my snakemake knowledge is not good enough to go further.

My workaround is to run cutadapt manually on my 256 files beforehand and use the workflow without trimming, but I would rather have a working solution.

Regards

correct installation of workflow with R

Hello, thanks so much for making this repository.

I started out with following the installation steps linked in the readme, but noticed that if I follow steps 1-3 there, I get a Conda environment that does not contain any distribution of R.

I see R packages in the workflow/envs directory here on GitHub and was wondering how to do the installation so that environment files such as biomart.yaml, deseq2.yaml etc. are set up properly.

Missing input files for rule rseqc_gtf2bed

When I try to run the pipeline, I get the error:

MissingInputException in line 3 of /home/yixin/Desktop/snakemake_workflow/rna-seq-star-deseq2-1.0.0/rules/qc.smk:
Missing input files for rule rseqc_gtf2bed:
~/Desktop/project_data/Ensembl/index/star/dmel/annotation.gtf

However, I checked the file and it is indeed there.

head ~/Desktop/project_data/Ensembl/index/star/dmel/annotation.gtf
#!genome-build BDGP6.22
#!genome-version BDGP6.22
#!genome-build-accession GCA_000001215.4
3R	FlyBase	gene	567076	2532932	.	+	.	gene_id "FBgn0267431"; gene_name "Myo81F"; gene_source "FlyBase"; gene_biotype "protein_coding";
3R	FlyBase	transcript	567076	2532932	.	+	.	gene_id "FBgn0267431"; transcript_id "FBtr0392909"; gene_name "Myo81F"; gene_source "FlyBase"; gene_biotype "protein_coding"; transcript_name "Myo81F-RB"; transcript_source "FlyBase"; transcript_biotype "protein_coding";
3R	FlyBase	exon	567076	567268	.	+	.	gene_id "FBgn0267431"; transcript_id "FBtr0392909"; exon_number "1"; gene_name "Myo81F"; gene_source "FlyBase"; gene_biotype "protein_coding"; transcript_name "Myo81F-RB"; transcript_source "FlyBase"; transcript_biotype "protein_coding"; exon_id "FBtr0392909-E1";
3R	FlyBase	exon	835376	835491	.	+	.	gene_id "FBgn0267431"; transcript_id "FBtr0392909"; exon_number "2"; gene_name "Myo81F"; gene_source "FlyBase"; gene_biotype "protein_coding"; transcript_name "Myo81F-RB"; transcript_source "FlyBase"; transcript_biotype "protein_coding"; exon_id "FBtr0392909-E2";
3R	FlyBase	CDS	835378	835491	.	+	0	gene_id "FBgn0267431"; transcript_id "FBtr0392909"; exon_number "2"; gene_name "Myo81F"; gene_source "FlyBase"; gene_biotype "protein_coding"; transcript_name "Myo81F-RB"; transcript_source "FlyBase"; transcript_biotype "protein_coding"; protein_id "FBpp0352251";
3R	FlyBase	start_codon	835378	835380	.	+	0	gene_id "FBgn0267431"; transcript_id "FBtr0392909"; exon_number "2"; gene_name "Myo81F"; gene_source "FlyBase"; gene_biotype "protein_coding"; transcript_name "Myo81F-RB"; transcript_source "FlyBase"; transcript_biotype "protein_coding";
3R	FlyBase	exon	869486	869548	.	+	.	gene_id "FBgn0267431"; transcript_id "FBtr0392909"; exon_number "3"; gene_name "Myo81F"; gene_source "FlyBase"; gene_biotype "protein_coding"; transcript_name "Myo81F-RB"; transcript_source "FlyBase"; transcript_biotype "protein_coding"; exon_id "FBtr0392909-E3";

Any hints would be highly appreciated.
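
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the shell expands a leading ~ to the home directory (which is why head finds the file), whereas Python's file functions take the path literally. A path read from a config file would then point to a directory that is actually named ~. The sketch below shows the difference with os.path only.

```python
import os.path

# Path as quoted in the error message.
raw = "~/Desktop/project_data/Ensembl/index/star/dmel/annotation.gtf"

# Taken literally, "~" is just a directory name relative to the working directory.
print(os.path.exists(raw))

# expanduser replaces the leading "~" with the current user's home directory.
expanded = os.path.expanduser(raw)
print(expanded)
```

Writing the absolute path in the config file is the simplest way to rule this out.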

More a discussion point: Should trimming be made optional

I argue that yes, it should be optional. Trimming costs time, and one often gets better (or similarly good) results by using STAR directly without trimming.

Happy to make suggestions on how to facilitate this. I have done it locally in my workflow by including different rule sets according to a flag in the config file.

Seb

count-matrix.py

On line 12, the groupby() operation could reorder columns, causing problems later when constructing the DESeqDataSet object.
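
The reordering can be reproduced with a small pandas example. This sketch does not reproduce line 12 of count-matrix.py; it only demonstrates the general behaviour that groupby() sorts the group keys unless sort=False is passed. The count values and sample names are made up.

```python
import pandas as pd

# Made-up counts: two units of sample "B" followed by one unit of sample "A".
counts = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["gene1", "gene2"],
    columns=["B", "B", "A"],
)

# Default behaviour: group keys are sorted, so "A" moves in front of "B".
merged_sorted = counts.T.groupby(level=0).sum().T
# sort=False keeps the groups in order of first appearance.
merged_ordered = counts.T.groupby(level=0, sort=False).sum().T

print(list(merged_sorted.columns))
print(list(merged_ordered.columns))
```

If the column order of the count matrix must match the row order of the sample sheet, reindexing the merged columns by the sample sheet's index is a more explicit safeguard than relying on either ordering.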

Ubuntu 18.04.2 LTS - raise sp.CalledProcessError(retcode, cmd)

When I did the dry-run, the pipeline seemed to work, but when I ran it, I got some errors related to STAR. The error is as follows:

Traceback (most recent call last):
  File "/home/yixin/Desktop/snakemake_workflow/rna-seq-star-deseq2-1.0.0/.snakemake/scripts/tmpqyit_18m.wrapper.py", line 32, in <module>
    "STAR "
  File "/home/yixin/.local/lib/python3.6/site-packages/snakemake/shell.py", line 149, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail; STAR --quantMode GeneCounts --sjdbGTFfile /home/yixin/Desktop/project_data/Ensembl/index/star/dmel/annotation.gtf --runThreadN 4 --genomeDir /home/yixin/Desktop/project_data/Ensembl/index/star/dmel --readFilesIn trimmed/Dmel_control_1-rep2.fastq.gz --readFilesCommand zcat --outSAMtype BAM Unsorted --outFileNamePrefix star/Dmel_control_1-rep2/ --outStd Log > logs/star/Dmel_control_1-rep2.log 2>&1' returned non-zero exit status 105.

Any hint will be highly appreciated.

Data availability?

Hi! I'm using this as a test workflow for an orchestration tool, and I'm not familiar with the workflow itself and am looking for some dummy data so this doesn't happen:

Building DAG of jobs...
MissingInputException in rule align  in line 1 of https://github.com/snakemake-workflows/rna-seq-star-deseq2/raw/v1.2.0/workflow/rules/align.smk:
Missing input files for rule align:
    output: results/star/A-lane1/Aligned.sortedByCoord.out.bam, results/star/A-lane1/ReadsPerGene.out.tab
    wildcards: sample=A, unit=lane1
    affected files:
        A.2.fq.gz
        A.1.fq.gz

Is this readily available, and if not, is there another workflow in the catalog with data that is (maybe beyond the basic Snakemake getting started workflow?) Thank you!

Rule count_matrix job submission fails when used with --cluster parameter

Rule count_matrix in diffexp.smk works as expected when run locally, but it fails when used on an LSF cluster with --cluster "bsub -o cluster.log". Job submission never happens in this case, and my preliminary debugging shows that the failure occurs while parsing params; commenting out the params section led to successful job submission.

Snakemake ver: 5.2.0

Below is the output I get:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cluster nodes: 10
Job counts:
        count   jobs
        1       all
        1       count_matrix
        1       deseq2
        1       deseq2_init
        1       pca
        5

rule count_matrix:
    input: star/A-lane1/ReadsPerGene.out.tab, star/B-lane1/ReadsPerGene.out.tab
    output: counts/all.tsv
    jobid: 4

    Traceback (most recent call last):
File "yoyoyo/.virtualenvs/rna-seq-star-deseq2-ueAqQq5F/lib/python3.6/site-packages/snakemake/__init__.py", line 541, in snakemake
    report=report)
File "yoyoyo/.virtualenvs/rna-seq-star-deseq2-ueAqQq5F/lib/python3.6/site-packages/snakemake/workflow.py", line 653, in execute
    success = scheduler.schedule()
File "yoyoyo/.virtualenvs/rna-seq-star-deseq2-ueAqQq5F/lib/python3.6/site-packages/snakemake/scheduler.py", line 286, in schedule
    self.run(job)
File "yoyoyo/.virtualenvs/rna-seq-star-deseq2-ueAqQq5F/lib/python3.6/site-packages/snakemake/scheduler.py", line 302, in run
    error_callback=self._error)
File "yoyoyo/.virtualenvs/rna-seq-star-deseq2-ueAqQq5F/lib/python3.6/site-packages/snakemake/executors.py", line 642, in run
    jobfailed=jobfailed)
File "yoyoyo/.virtualenvs/rna-seq-star-deseq2-ueAqQq5F/lib/python3.6/site-packages/snakemake/executors.py", line 534, in write_jobscript
    **kwargs)
File "yoyoyo/.virtualenvs/rna-seq-star-deseq2-ueAqQq5F/lib/python3.6/site-packages/snakemake/executors.py", line 514, in format_job
    cluster=self.cluster_params(job))),
File "/python-3.6.0/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
File "/python-3.6.0/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
File "/python-3.6.0/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
File "/python-3.6.0/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'DataFrame' is not JSON serializable

snakemake version 5.9.1 raised an error with get_deseq2_threads called as a pointer

this line:

rna-seq-star-deseq2/rules/diffexp.smk

Line 75 in a83f301

threads: get_deseq2_threads

raised this error with snakemake version 5.9.1:

int() argument must be a string, a bytes-like object or a number, not 'function'

The error is not present with version 5.5.1.

Replacing it with threads: get_deseq2_threads() silences the error.

deseq2-init.R returned non-zero exit status 1.

[Mon Oct 15 15:42:46 2018]
rule deseq2_init:
    input: counts/all.tsv
    output: deseq2/all.rds
    log: logs/deseq2/init.log
    jobid: 3

Activating conda environment: /media/leochong/backup4/rna-seq-star-deseq2/.snakemake/conda/28f98caa
[Mon Oct 15 15:43:01 2018]
Error in rule deseq2_init:
    jobid: 3
    output: deseq2/all.rds
    log: logs/deseq2/init.log
    conda-env: /media/leochong/backup4/rna-seq-star-deseq2/.snakemake/conda/28f98caa

RuleException:
CalledProcessError in line 33 of /media/leochong/backup4/rna-seq-star-deseq2/rules/diffexp.smk:
Command 'source activate /media/leochong/backup4/rna-seq-star-deseq2/.snakemake/conda/28f98caa; set -euo pipefail;  Rscript /media/leochong/backup4/rna-seq-star-deseq2/.snakemake/scripts/tmpju9bqzqy.deseq2-init.R ' returned non-zero exit status 1.
  File "/media/leochong/backup4/rna-seq-star-deseq2/rules/diffexp.smk", line 33, in __rule_deseq2_init
  File "/home/leochong/miniconda3/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

Any input on why the pipeline is failing?

fq1 is not None (rule align)

Dear Snakemake lovers,
I am giving a first try at the rna-seq-star-deseq2 pipeline with my own samples.

Traceback (most recent call last):
  File "/home/mgalland/rna-seq-star-deseq2/.snakemake/scripts/afy96dg4.wrapper.py", line 18, in <module>
    assert fq1 is not None, "input-> fq1 is a required input parameter"
AssertionError: input-> fq1 is a required input parameter

When I look at the trimmed fastq file (fq1), it contains reads and amounts to 1.8Mb. So I don't really know what's going on there.

Would there be a possibility to add test samples in order to run the pipeline fresh out of the box?

Thank you
Marc

rseqc version 3.0.0

I just realized that rseqc is now at version 3.0.0, supposedly for Python 3, but otherwise I think there are no big changes. Do you think it would be beneficial to update the rseqc env?

Seb

Custom genome as reference

It appears that only genomes listed on Ensembl [1] are supported.
If the species of interest is not listed, is it possible to use a custom fasta/gff combination instead?

[1] ftp://ftp.ensembl.org/pub

Issue running site-packages

Hello,
Thank you for the useful workflow.
I think I have created the snakemake environment as described and deployed your workflow.
But it returns an error when running, supposedly due to unavailable packages.

(/CONDAS/users/bbidon/snakemake) [bbidon@gknwwd2 covid]$ snakemake 
Traceback (most recent call last):
  File "/CONDAS/users/bbidon/snakemake/lib/python3.12/site-packages/snakemake/cli.py", line 1933, in args_to_api
    dag_api = workflow_api.dag(
              ^^^^^^^^^^^^^^^^^
  File "/CONDAS/users/bbidon/snakemake/lib/python3.12/site-packages/snakemake/api.py", line 328, in dag
    return DAGApi(
           ^^^^^^^
  File "<string>", line 6, in __init__
  File "/CONDAS/users/bbidon/snakemake/lib/python3.12/site-packages/snakemake/api.py", line 438, in __post_init__
    self.workflow_api._workflow.dag_settings = self.dag_settings
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/CONDAS/users/bbidon/snakemake/lib/python3.12/site-packages/snakemake/api.py", line 385, in _workflow
    workflow.include(
  File "/CONDAS/users/bbidon/snakemake/lib/python3.12/site-packages/snakemake/workflow.py", line 1386, in include
    exec(compile(code, snakefile.get_path_or_uri(), "exec"), self.globals)
  File "/SCRATCH-BIRD/users/bbidon/ntu/covid/workflow/Snakefile", line 24, in <module>
  File "/CONDAS/users/bbidon/snakemake/lib/python3.12/site-packages/snakemake/workflow.py", line 2125, in decorate
    module.use_rules(
  File "/CONDAS/users/bbidon/snakemake/lib/python3.12/site-packages/snakemake/modules.py", line 104, in use_rules
    self.workflow.include(snakefile, overwrite_default_target=True)
  File "/CONDAS/users/bbidon/snakemake/lib/python3.12/site-packages/snakemake/workflow.py", line 1386, in include
    exec(compile(code, snakefile.get_path_or_uri(), "exec"), self.globals)
  File "https://raw.githubusercontent.com/snakemake-workflows/rna-seq-star-deseq2/v2.1.0/workflow/Snakefile", line 31, in <module>
    RerunTrigger,
  File "/CONDAS/users/bbidon/snakemake/lib/python3.12/site-packages/snakemake/workflow.py", line 1386, in include
    exec(compile(code, snakefile.get_path_or_uri(), "exec"), self.globals)
  File "https://raw.githubusercontent.com/snakemake-workflows/rna-seq-star-deseq2/v2.1.0/workflow/rules/common.smk", line 7, in <module>
    import hashlib
          ^^^^^^^^^
  File "/CONDAS/users/bbidon/snakemake/lib/python3.12/site-packages/snakemake/remote/__init__.py", line 8, in __init__
    raise NotImplementedError(
NotImplementedError: Remote providers have been replaced by Snakemake storage plugins. Please use the corresponding storage plugin instead (snakemake-storage-plugin-*).
(/CONDAS/users/bbidon/snakemake) [bbidon@gknwwd2 covid]$ ls -lsha

Do you have any idea about the origin of the problem?
Thank you,
Baptiste

Edit by @dlaehnemann: turn into multi-line column

Preferred way to do several independent runs

Hey @johanneskoester

This is a question, not an issue.

I am trying to understand what is currently the best way to use this workflow for independent analyses.
Do you clone the repo anew for each new analysis?
Do you run it in a subfolder like the .test one?

In my other workflows I define a base/result dir variable in the config.yaml that the Snakefile will use, where all results/logs/benchmarks for one run will be collected. Thus, doing a new analysis is as simple as creating a new config.yaml and changing the base/result dir variable in it.

Just wondering how you use this workflow in the best way.

Cheers,
Seb

Problem when defining SRA accession instead of fq1/fq2 hardfiles

Hi,

Firstly, thanks so much for the workflow, it's mostly working fine.

However, I'm receiving the following error when providing SRA input instead of fq1/fq2 (hard-files):

InputFunctionException in line 1 of https://github.com/snakemake-workflows/rna-seq-star-deseq2/raw/v1.2.0/workflow/rules/align.smk:
Error:
KeyError: 'fq1'
Wildcards:
sample=SRR5344335
unit=lane1
Traceback:
File "https://github.com/snakemake-workflows/rna-seq-star-deseq2/raw/v1.2.0/workflow/rules/common.smk", line 109, in get_fq

This seems to correspond to the following code:

u = units.loc[(wildcards.sample, wildcards.unit), ["fq1", "fq2"]].dropna()
if pd.isna(u["fq1"]):

where the first line is dropping the columns fq1/fq2 as they are empty, and line 2 is then false, so the accession is not defined/throws an error.
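
The behaviour described here can be reproduced in isolation. The sketch below is a minimal example with a made-up units table, not the workflow's get_fq function; the column name sra is an assumption. It shows that dropna() removes the empty fq1 entry, so a later lookup of that label fails, while testing for missing values before dropping them works.

```python
import pandas as pd

# Made-up units table with an SRA accession and empty fq1/fq2 cells.
units = pd.DataFrame(
    {
        "sample": ["SRR5344335"],
        "unit": ["lane1"],
        "sra": ["SRR5344335"],  # assumed column name
        "fq1": [None],
        "fq2": [None],
    }
).set_index(["sample", "unit"])

row = units.loc[("SRR5344335", "lane1"), ["fq1", "fq2"]]

# dropna() removes the empty entries, so the "fq1" label no longer exists.
dropped = row.dropna()
print("fq1" in dropped.index)

# Testing for missing values before dropping them avoids the KeyError.
uses_sra = pd.isna(row["fq1"])
print(uses_sra)
```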

Here are the two input files. Working fine with fq1/fq2 input, but not with SRR input.

Thanks!
Sam

rule rseqc_gtf2bed failing with test dataset

Hi,

Thanks for all your work creating snakemake and putting together this pipeline!

I am running into an issue when trying to run the pipeline with the test data, particularly with the rseqc_gtf2bed rule. I am using snakemake version 5.7.0 (on a cluster). The error I get is pretty complicated and I am having trouble interpreting it. I think the first few messages about the JSON Schema are unrelated. The output of the command snakemake --use-conda rseqc_gtf2bed is shown below:

[user]$ snakemake --use-conda rseqc_gtf2bed
No validator found for JSON Schema version identifier 'http://json-schema.org/draft-06/schema#'
Defaulting to validator for JSON Schema version 'http://json-schema.org/draft-04/schema#'
Note that schema file may not be validated correctly.
No validator found for JSON Schema version identifier 'http://json-schema.org/draft-06/schema#'
Defaulting to validator for JSON Schema version 'http://json-schema.org/draft-04/schema#'
Note that schema file may not be validated correctly.
No validator found for JSON Schema version identifier 'http://json-schema.org/draft-06/schema#'
Defaulting to validator for JSON Schema version 'http://json-schema.org/draft-04/schema#'
Note that schema file may not be validated correctly.
Building DAG of jobs...
Creating conda environment envs/gffutils.yaml...
Downloading and installing remote packages.
Environment for envs/gffutils.yaml created (location: .snakemake/conda/a2417d57)
Using shell: /usr/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 rseqc_gtf2bed
1

[Tue Oct 15 14:25:51 2019]
rule rseqc_gtf2bed:
input: .test/data/ref/annotation.chr21.gtf
output: qc/rseqc/annotation.bed, qc/rseqc/annotation.db
log: logs/rseqc_gtf2bed.log
jobid: 0

Activating conda environment: /data/project/rna-seq-star-deseq2-public-pipeline/.snakemake/conda/a2417d57
Traceback (most recent call last):
File "/data/project/rna-seq-star-deseq2-public-pipeline/.snakemake/scripts/tmp1qu6f74g.gtf2bed.py", line 3, in <module>
import sys; sys.path.extend(["/share/apps/rc/software/snakemake/5.7.0-foss-2018b-Python-3.6.6/lib/python3.6/site-packages", "/data/project/rna-seq-star-deseq2-public-pipeline/scripts"]); import pickle; snakemake = pickle.loads(b'\x80\x03csnakemake.script\nSnakemake\nq\x00)\x81q\x01}q\x02(X\x05\x00\x00\x00inputq\x03csnakemake.io\nInputFiles\nq\x04)\x81q\x05X#\x00\x00\x00.test/data/ref/annotation.chr21.gtfq\x06a}q\x07X\x06\x00\x00\x00_namesq\x08}q\tsbX\x06\x00\x00\x00outputq\ncsnakemake.io\nOutputFiles\nq\x0b)\x81q\x0c(X\x17\x00\x00\x00qc/rseqc/annotation.bedq\rX\x16\x00\x00\x00qc/rseqc/annotation.dbq\x0ee}q\x0f(h\x08}q\x10(X\x03\x00\x00\x00bedq\x11K\x00N\x86q\x12X\x02\x00\x00\x00dbq\x13K\x01N\x86q\x14uh\x11h\rh\x13h\x0eubX\x06\x00\x00\x00paramsq\x15csnakemake.io\nParams\nq\x16)\x81q\x17}q\x18h\x08}q\x19sbX\t\x00\x00\x00wildcardsq\x1acsnakemake.io\nWildcards\nq\x1b)\x81q\x1c}q\x1dh\x08}q\x1esbX\x07\x00\x00\x00threadsq\x1fK\x01X\t\x00\x00\x00resourcesq csnakemake.io\nResources\nq!)\x81q"(K\x01K\x01e}q#(h\x08}q$(X\x06\x00\x00\x00_coresq%K\x00N\x86q&X\x06\x00\x00\x00_nodesq'K\x01N\x86q(uh%K\x01h'K\x01ubX\x03\x00\x00\x00logq)csnakemake.io\nLog\nq*)\x81q+X\x16\x00\x00\x00logs/rseqc_gtf2bed.logq,a}q-h\x08}q.sbX\x06\x00\x00\x00configq/}q0(X\x07\x00\x00\x00samplesq1X\x0b\x00\x00\x00samples.tsvq2X\x05\x00\x00\x00unitsq3X\t\x00\x00\x00units.tsvq4X\x08\x00\x00\x00trimmingq5}q6(X\x04\x00\x00\x00skipq7\x89X\x07\x00\x00\x00adapterq8X\x16\x00\x00\x00ACGGATCGATCGATCGATCGATq9uX\x03\x00\x00\x00refq:}q;(X\x05\x00\x00\x00indexq<X\x14\x00\x00\x00.test/data/ref/indexq=X\n\x00\x00\x00annotationq>X#\x00\x00\x00.test/data/ref/annotation.chr21.gtfq?uX\x03\x00\x00\x00pcaq@}qAX\x06\x00\x00\x00labelsqB]qCX\t\x00\x00\x00conditionqDasX\x07\x00\x00\x00diffexpqE}qFX\t\x00\x00\x00contrastsqG}qHX\x14\x00\x00\x00treated-vs-untreatedqI]qJ(X\x07\x00\x00\x00treatedqKX\t\x00\x00\x00untreatedqLessX\x06\x00\x00\x00paramsqM}qN(X\x04\x00\x00\x00starqOX\x00\x00\x00\x00qPX\x0b\x00\x00\x00cutadapt-seqQhPX\x0b\x00\
x00\x00cutadapt-peqRhPuuX\x04\x00\x00\x00ruleqSX\r\x00\x00\x00rseqc_gtf2bedqTX\x0f\x00\x00\x00bench_iterationqUNX\t\x00\x00\x00scriptdirqVXL\x00\x00\x00/data/project/rna-seq-star-deseq2-public-pipeline/scriptsqWub.'); from snakemake.logging import logger; logger.printshellcmds = False; real_file = file; file = '/data/project/rna-seq-star-deseq2-public-pipeline/scripts/gtf2bed.py';
File "/share/apps/rc/software/snakemake/5.7.0-foss-2018b-Python-3.6.6/lib/python3.6/site-packages/snakemake/__init__.py", line 21, in <module>
from snakemake.workflow import Workflow
File "/share/apps/rc/software/snakemake/5.7.0-foss-2018b-Python-3.6.6/lib/python3.6/site-packages/snakemake/workflow.py", line 30, in <module>
from snakemake.dag import DAG
File "/share/apps/rc/software/snakemake/5.7.0-foss-2018b-Python-3.6.6/lib/python3.6/site-packages/snakemake/dag.py", line 1533
return f"#{hex_r:0>2X}{hex_g:0>2X}{hex_b:0>2X}"
^
SyntaxError: invalid syntax
[Tue Oct 15 14:25:53 2019]
Error in rule rseqc_gtf2bed:
jobid: 0
output: qc/rseqc/annotation.bed, qc/rseqc/annotation.db
log: logs/rseqc_gtf2bed.log (check log file(s) for error message)
conda-env: /data/project/rna-seq-star-deseq2-public-pipeline/.snakemake/conda/a2417d57

RuleException:
CalledProcessError in line 14 of /data/project/rna-seq-star-deseq2-public-pipeline/rules/qc.smk:
Command 'source /share/apps/rc/software/Anaconda3/5.3.1/bin/activate '/data/project/rna-seq-star-deseq2-public-pipeline/.snakemake/conda/a2417d57'; set -euo pipefail; python /data/project/rna-seq-star-deseq2-public-pipeline/.snakemake/scripts/tmp1qu6f74g.gtf2bed.py' returned non-zero exit status 1.
File "/data/project/rna-seq-star-deseq2-public-pipeline/rules/qc.smk", line 14, in __rule_rseqc_gtf2bed
File "/share/apps/rc/software/Python/3.6.6-foss-2018b/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /data/project/rna-seq-star-deseq2-public-pipeline/.snakemake/log/2019-10-15T142444.388725.snakemake.log

I apologize in advance if this is due to an error on my part but I welcome any insight you may have.

Thanks,
Sam
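
Editor's note: the SyntaxError above points at an f-string inside snakemake's own dag.py. F-strings require Python 3.6 or newer, so one plausible reading (an assumption, not a confirmed diagnosis) is that the interpreter inside the rule's conda environment is older than the one snakemake was installed for. A minimal sketch of a guard a script could run before importing anything else; the name `require_python` is hypothetical and not part of snakemake:

```python
import sys


def require_python(minimum, version_info=None):
    """Raise a readable error if the interpreter is older than `minimum`.

    `minimum` is a tuple such as (3, 6). `version_info` defaults to the
    running interpreter; it is a parameter only so the check can be tested.
    Written without f-strings so that old interpreters can still parse it.
    """
    current = tuple((version_info or sys.version_info)[:2])
    required = tuple(minimum[:2])
    if current < required:
        raise RuntimeError(
            "Python {}.{} found, but at least {}.{} is required".format(
                current[0], current[1], required[0], required[1]
            )
        )
    return current
```

Calling `require_python((3, 6))` at the top of a script would turn the opaque SyntaxError into an explicit message about the interpreter version.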

config.yaml defaults

Good afternoon, thanks for the workflow!

In the default config.yaml on github, there is an example option like so:

config.yaml, line 37: star: "--outSAMtype BAM Unsorted"

However, if left in the config file it causes an error because then the STAR step gets two "--outSAMtype" flags. I realise defaults are meant to be changed, but I don't think this particular example can ever be used (because the flag is already hard-coded into the workflow).

Thanks again!

Matt.
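
Editor's note: the clash described above can be detected before STAR is launched. The sketch below is only an illustration; `conflicting_flags` is a hypothetical helper, and the hard-coded argument string is an assumption based on the STAR command quoted in another issue on this page.

```python
import shlex


def conflicting_flags(hard_coded, extra):
    """Return the long options (--name) that occur in both argument strings."""

    def long_options(arguments):
        # shlex.split honours shell-style quoting inside the argument string
        return {tok for tok in shlex.split(arguments) if tok.startswith("--")}

    return sorted(long_options(hard_coded) & long_options(extra))


# Assumed hard-coded part of the STAR call, and the example from config.yaml
hard_coded = "--outSAMtype BAM SortedByCoordinate --quantMode GeneCounts"
extra = "--outSAMtype BAM Unsorted"
print(conflicting_flags(hard_coded, extra))
```

A check like this could raise an error at workflow start instead of letting the job fail inside STAR.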

Adjusting output based on multiple types of input (single end vs paired end)

Hello, sorry, I have another basic question. I am trying to run fastqc before trimming on both single-end and paired-end data (to determine the extent of adapters in the pre-trimmed fastq files), but I'm having trouble getting the fastqc output to change depending on whether the data is paired-end or single-end. The get_fastq function also returns a pair of fastq files, and I can't figure out how to make fastqc run on each file separately, or how to format the output of the qc rules to expect either paired-end or single-end data. For example, this is my Snakefile, in which I have attempted to code both rules separately:

import pandas as pd
shell.executable("bash")

configfile: "config.yaml"
samples = pd.read_table(config["samples"], index_col="sample")
units = pd.read_table(config["units"], index_col=["sample", "unit"], dtype=str)
units.index = units.index.set_levels([i.astype(str) for i in units.index.levels])  # enforce str in index

def is_single_end(sample, unit):
    return pd.isnull(units.loc[(sample, unit), "fq2"])

rule qc_se:
    input:
        expand("qc/{unit.sample}-{unit.unit}_fastqc.html", unit=units.reset_index().itertuples()),

rule qc_pe:
    input:
        expand("qc/{unit.sample}-{unit.unit}_{group}_fastqc.html", group=[1,2], unit=units.reset_index().itertuples()),

include: "rules/fastqc.smk"

This is my fastqc rule:

def get_fastq(wildcards):
    return units.loc[(wildcards.sample, wildcards.unit), ["fq1", "fq2"]].dropna()

rule fastqc:
    input:
        get_fastq
    output:
        html="qc/{sample}-{unit}_fastqc.html",
        zip="qc/{sample}-{unit}_fastqc.zip"
    params: ""
    wrapper:
        "0.19.3/bio/fastqc"

rule fastqc_pe:
    input:
        get_fastq
    output:
        html="qc/{sample}-{unit}_{group}_fastqc.html",
        zip="qc/{sample}-{unit}_{group}_fastqc.zip",
    params: ""
    wrapper:
        "0.19.3/bio/fastqc"

I cannot figure out how to iterate through the list of inputs so that fastqc runs on each file individually, and I am not sure how to tell the qc rule to expect paired-end rather than single-end data when there is a second fastq file. It would be nice to have one qc rule that can handle both cases.

I have to force rule qc_pe, or else the paired-end rule does not run. When I run snakemake -np -R qc_pe, I get the output below:

rule fastqc_se:
    input: SRR5282942_1.fastq.gz, SRR5282942_2.fastq.gz
    output: qc/WT-SRR5282942_fastqc.zip, qc/WT-SRR5282942_fastqc.html
    jobid: 2
    wildcards: sample=WT, unit=SRR5282942


rule fastqc_pe:
    input: SRR5282942_1.fastq.gz, SRR5282942_2.fastq.gz
    output: qc/WT-SRR5282942_2_fastqc.zip, qc/WT-SRR5282942_2_fastqc.html
    jobid: 4
    wildcards: sample=WT, unit=SRR5282942, group=2


rule fastqc_pe:
    input: SRR5282942_1.fastq.gz, SRR5282942_2.fastq.gz
    output: qc/WT-SRR5282942_1_fastqc.zip, qc/WT-SRR5282942_1_fastqc.html
    jobid: 6
    wildcards: sample=WT, unit=SRR5282942, group=1


rule fastqc_pe:
    input: SRR5282943_1.fastq.gz, SRR5282942_2.fastq.gz
    output: qc/KO-SRR5282943_2_fastqc.zip, qc/KO-SRR5282943_2_fastqc.html
    jobid: 7
    wildcards: sample=KO, unit=SRR5282943, group=2


rule fastqc_se:
    input: SRR5282943_1.fastq.gz, SRR5282942_2.fastq.gz
    output: qc/KO-SRR5282943_fastqc.zip, qc/KO-SRR5282943_fastqc.html
    jobid: 3
    wildcards: sample=KO, unit=SRR5282943


rule fastqc_pe:
    input: SRR5282943_1.fastq.gz, SRR5282942_2.fastq.gz
    output: qc/KO-SRR5282943_1_fastqc.zip, qc/KO-SRR5282943_1_fastqc.html
    jobid: 5
    wildcards: sample=KO, unit=SRR5282943, group=1

localrule qc_se:
    input: qc/WT-SRR5282942_fastqc.html, qc/KO-SRR5282943_fastqc.html
    jobid: 0


localrule qc_pe:
    input: qc/WT-SRR5282942_1_fastqc.html, qc/WT-SRR5282942_2_fastqc.html, qc/KO-SRR5282943_1_fastqc.html, qc/KO-SRR5282943_2_fastqc.html
    jobid: 1

Job counts:
        count   jobs
        4       fastqc_pe
        2       fastqc_se
        1       qc_pe
        1       qc_se
        8

Thank you so much for the help. I am inspired by your workflow system and eager to learn the ins and outs.
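
Editor's note: one way to get a single set of targets for both cases is to decide per unit whether a second fastq file exists. The following is a minimal sketch in plain Python, not the workflow's own code: `fastqc_targets` and `fastq_for_group` are hypothetical helpers, and units are represented as dicts (with `fq2` set to None for single-end data) instead of the pandas table used in the question.

```python
def fastqc_targets(units):
    """Build the FastQC report paths expected for each sequencing unit.

    `units` is a list of dicts with the keys "sample", "unit", "fq1" and
    "fq2"; "fq2" is None for single-end units.
    """
    targets = []
    for unit in units:
        prefix = "qc/{}-{}".format(unit["sample"], unit["unit"])
        if unit["fq2"] is None:
            # single-end: one report per unit
            targets.append(prefix + "_fastqc.html")
        else:
            # paired-end: one report per mate
            targets.extend(
                "{}_{}_fastqc.html".format(prefix, group) for group in (1, 2)
            )
    return targets


def fastq_for_group(unit, group=None):
    """Pick the single FASTQ file that one FastQC job should read."""
    if group is None:
        return unit["fq1"]
    return unit["fq{}".format(group)]
```

A single target rule could then request `fastqc_targets(...)`, and each fastqc rule's input function could return exactly one file. Note that in Snakemake the pattern qc/{sample}-{unit}_fastqc.html can also match the paired-end file names (with unit taken as SRR5282942_1), so a wildcard constraint on unit or group is probably needed as well.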

Error in code

Hi. I got the following error when I ran this pipeline. Can you please look at the code, as this appears to be a code error?

Check the "--outSAMtype" option: the second occurrence (not the first).

subprocess.CalledProcessError: Command 'set -euo pipefail;  STAR --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --sjdbGTFfile resources/genome.gtf --outSAMtype BAM Unsorted --runThreadN 10 --genomeDir resources/star_genome --readFilesIn /lustre7/home/bhimbiswa/Domestication_genes/Old/Matsumoto_2021/DRR238202.fastq   --outFileNamePrefix results/star/DRR238202-1/ --outStd Log  > logs/star/DRR238202-1.log 2>&1' returned non-zero exit status 102.

Can you please help in troubleshooting this error?
These are the logs of the snakemake run and of STAR.

DRR238202-1.log

snakemake.txt

deseq2 error in SummarizedExperiment and GenomicRanges

Hey,

I tried running the snakemake pipeline provided within this repository, but ran into an issue with the deseq init job.

Error in validObject(.Object) : 
  invalid class “SummarizedExperiment” object: undefined class for slot "NAMES" ("characterORNULL")
Calls: DESeqDataSetFromMatrix ... new_SummarizedExperiment -> new -> initialize -> initialize -> validObject

This error appears to be caused by mismatched versions of SummarizedExperiment.
SummarizedExperiment 1.4.0 was installed by conda for this rule's environment.
I tried specifying another version of SummarizedExperiment in the deseq2.yaml environment file, but then got a different error, this time in GenomicRanges (1.30.0):

Using SummarizedExperiment 1.8.0:

Error in checkSlotAssignment(object, name, value) : 
  assignment of an object of class “GRangesList” is not valid for slot ‘rowRanges’ in an object of class “DESeqDataSet”; is(value, "GenomicRangesORGRangesList") is not TRUE
Calls: DESeqDataSetFromMatrix ... initialize -> as<- -> asMethod -> slot<- -> checkSlotAssignment

Which versions of these packages should be used/are you using? Perhaps this should be noted in the deseq2.yaml file to avoid these issues?

my sessionInfo():

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: 
LAPACK: 

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    methods   stats     graphics  grDevices utils    
[8] datasets  base     

other attached packages:
 [1] DESeq2_1.16.1              SummarizedExperiment_1.8.0
 [3] DelayedArray_0.4.1         matrixStats_0.52.2        
 [5] Biobase_2.38.0             GenomicRanges_1.30.0      
 [7] GenomeInfoDb_1.14.0        IRanges_2.12.0            
 [9] S4Vectors_0.16.0           BiocGenerics_0.24.0       

loaded via a namespace (and not attached):
 [1] genefilter_1.60.0       locfit_1.5-9.1          splines_3.4.1          
 [4] lattice_0.20-34         colorspace_1.3-2        htmltools_0.3.6        
 [7] base64enc_0.1-3         blob_1.1.0              survival_2.40-1        
[10] XML_3.98-1.6            rlang_0.1.2             DBI_0.6-1              
[13] foreign_0.8-67          BiocParallel_1.6.6      bit64_0.9-5            
[16] RColorBrewer_1.1-2      GenomeInfoDbData_0.99.1 plyr_1.8.4             
[19] stringr_1.2.0           zlibbioc_1.24.0         munsell_0.4.3          
[22] gtable_0.2.0            htmlwidgets_0.9         memoise_1.1.0          
[25] latticeExtra_0.6-28     knitr_1.16              geneplotter_1.56.0     
[28] AnnotationDbi_1.40.0    htmlTable_1.9           Rcpp_0.12.13           
[31] acepack_1.4.1           xtable_1.8-2            scales_0.4.1           
[34] backports_1.0.5         checkmate_1.8.2         Hmisc_4.0-3            
[37] annotate_1.56.0         XVector_0.18.0          bit_1.1-12             
[40] gridExtra_2.2.1         ggplot2_2.2.0           digest_0.6.12          
[43] stringi_1.1.2           grid_3.4.1              tools_3.4.1            
[46] bitops_1.0-6            magrittr_1.5            RSQLite_2.0            
[49] lazyeval_0.2.0          RCurl_1.95-4.8          tibble_1.3.3           
[52] Formula_1.2-1           cluster_2.0.6           Matrix_1.2-7.1         
[55] data.table_1.10.4       rpart_4.1-10            nnet_7.3-12            
[58] compiler_3.4.1
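
Editor's note: when comparing an environment against a set of pinned versions, it can help to turn the sessionInfo() text into a lookup table. This is a rough sketch under the assumption that packages appear as name_version tokens, as in the listing above; `parse_session_packages` and `version_mismatches` are hypothetical helpers, and any pins passed to them are the caller's own, not a recommendation from the workflow authors.

```python
import re

# name, underscore, then a version with at least two numeric components
PACKAGE_TOKEN = re.compile(r"\b([A-Za-z][A-Za-z0-9.]*)_(\d+(?:[.\-]\d+)+)")


def parse_session_packages(text):
    """Map package name to version for every name_version token in `text`."""
    return {name: version for name, version in PACKAGE_TOKEN.findall(text)}


def version_mismatches(installed, expected):
    """Return {name: (installed_version, expected_version)} where they differ."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in expected.items()
        if installed.get(name) != wanted
    }
```

Feeding the sessionInfo() output into `parse_session_packages` and comparing it with the versions listed in an environment file would show at a glance which packages differ.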

annotation.bed file size zero

After the QC process was added, I noticed that annotation.bed (from the gtf2bed module) is newly included.
However, my pipeline fails at the QC step.

I noticed that the annotation.bed file has zero size, while annotation.db (a temporary file) does not.

Which GTF file am I using? It was downloaded from Illumina iGenomes (Rat).

Thank you.
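
Editor's note: one possible cause, stated here as an assumption and not a diagnosis, is that the conversion step selects features of one particular type (for example transcript) and the GTF file contains none of them, which would leave the BED file empty even though the database was built. A quick way to see which feature types a GTF file contains; `count_feature_types` is a hypothetical helper:

```python
from collections import Counter


def count_feature_types(gtf_lines):
    """Count the values of the third GTF column (the feature type)."""
    counts = Counter()
    for line in gtf_lines:
        # skip comment and blank lines
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts
```

It accepts any iterable of lines, so an open file handle for the GTF can be passed directly. If the expected feature type is missing from the result, the annotation itself would be the place to look.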

specify snakemake version during installation?

Instructions for installation currently start with

mamba create -c conda-forge -c bioconda --name snakemake snakemake snakedeploy

However, it seems like with the latest version of snakemake (8.2.3), running snakemake leads to the error

NotImplementedError: Remote providers have been replaced by Snakemake storage plugins. Please use the corresponding storage plugin instead (snakemake-storage-plugin-*).

Condition in samples.tsv is too specific

For a generic RNA-Seq pipeline, 'condition' is too specific as a label for grouping samples.

It may be that the user wants to make a comparison between sample groups from different genotypes, different timepoints, rather than different conditions or treatments. 'Group' would be a more generic label than 'Condition'.

DESeq2 Plots Incorrectly Displayed

The MA and PCA plots produced by DESeq2 lack any text labels. These labels are replaced by empty rectangles. The same issue occurs with JPEG and SVG files (I only included the JPEG because GitHub prevents SVG uploads).

The issue occurs whether I'm using the deseq2.yaml environment provided in this repo or an updated environment using newer versions of the software packages.

Pipeline failing with cutadapt

Hello,

When I set trimming to true in the config file, I get the following error:

cutadapt: error: unrecognized arguments: pipe/cutadapt/1/lane1.fq1.fastq.gz pipe/cutadapt/1/lane1.fq2.fastq.gz

In looking through log files, it appears this error occurs with the following command (which is called by the cutadapt_pe recipe):

subprocess.CalledProcessError: Command 'set -euo pipefail; cutadapt --cores 8 nan -o results/trimmed/1_lane2_R1.fastq.gz -p results/trimmed/1_lane2_R2.fastq.gz pipe/cutadapt/1/lane2.fq1.fastq.gz pipe/cutadapt/1/lane2.fq2.fastq.gz > results/trimmed/1_lane2.paired.qc.txt 2> logs/cutadapt/1_lane2.log' returned non-zero exit status 2.

I haven't been able to figure out why this "unrecognized arguments" error occurs, since the command looks fine to me. Any help with this would be appreciated!
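
Editor's note: the literal `nan` in the cutadapt command sits where adapter options would normally go. That suggests, as an assumption, that a missing value (for example an empty table cell, which pandas represents as NaN) was formatted straight into the command line. A minimal sketch of a sanitizing step; `clean_adapters` is a hypothetical helper, not part of the workflow:

```python
import math


def clean_adapters(value):
    """Return adapter arguments as a string, treating missing values as empty."""
    if value is None:
        return ""
    # a float NaN is how a missing table cell typically arrives
    if isinstance(value, float) and math.isnan(value):
        return ""
    text = str(value).strip()
    # also catch a NaN that was already converted to the string "nan"
    return "" if text.lower() == "nan" else text
```

Passing the adapter value through a function like this before building the command would yield an empty string in place of `nan`; whether trimming without adapters is what you want is a separate question.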

Pipeline is delisted in Snakemake workflow catalog

Hi,

I am not sure what happened, but the pipeline has been delisted from https://snakemake.github.io/snakemake-workflow-catalog/. This is very unfortunate, as this repo points to the manual at https://snakemake.github.io/snakemake-workflow-catalog/?usage=snakemake-workflows%2Frna-seq-star-deseq2, which is broken as a consequence.

AttributeError: module 'sqlite3' has no attribute 'OptimizedUnicode'

daler/gffutils#221

You need to change the dependency version of gffutils to 0.12.
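
Editor's note: the AttributeError arises because newer Python releases no longer provide `sqlite3.OptimizedUnicode`. Upgrading gffutils, as suggested above, is the proper fix. Purely to illustrate the kind of compatibility shim involved (this is a sketch, not the code used by gffutils), a fallback to plain `str` looks like this:

```python
import sqlite3


def unicode_text_factory():
    """Return sqlite3.OptimizedUnicode when it exists, otherwise plain str."""
    return getattr(sqlite3, "OptimizedUnicode", str)


# Usage: text columns come back as str either way
connection = sqlite3.connect(":memory:")
connection.text_factory = unicode_text_factory()
value = connection.execute("SELECT 'chr21'").fetchone()[0]
connection.close()
```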

snakedeploy fails to deploy workflow

I am getting TypeError: copytree() got an unexpected keyword argument 'dirs_exist_ok' when I try to deploy the rna-seq-star-deseq2 workflow as shown below:

$ snakedeploy deploy-workflow https://github.com/snakemake-workflows/rna-seq-star-deseq2 . --tag v1.1.2 --force --verbose
Writing Snakefile with module definition...
Obtaining source repository...
Cloning into '.'...
remote: Enumerating objects: 548, done.
remote: Counting objects: 100% (79/79), done.
remote: Compressing objects: 100% (64/64), done.
remote: Total 548 (delta 22), reused 44 (delta 12), pack-reused 469
Receiving objects: 100% (548/548), 16.90 MiB | 0 bytes/s, done.
Resolving deltas: 100% (265/265), done.
Checking connectivity... done.
Writing template configuration...
Traceback (most recent call last):
  File "/home/kodalivk/.conda/envs/snakemake/bin/snakedeploy", line 10, in <module>
    sys.exit(main())
  File "/home/kodalivk/.conda/envs/snakemake/lib/python3.6/site-packages/snakedeploy/client.py", line 155, in main
    force=args.force,
  File "/home/kodalivk/.conda/envs/snakemake/lib/python3.6/site-packages/snakedeploy/deploy.py", line 62, in deploy
    shutil.copytree(config_dir, dest_config, dirs_exist_ok=force)
TypeError: copytree() got an unexpected keyword argument 'dirs_exist_ok'

I do not have mamba so I used conda to install snakemake and snakedeploy as follows:

$ conda create -c bioconda -c conda-forge --name snakemake snakemake snakedeploy
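
Editor's note: the `dirs_exist_ok` argument of `shutil.copytree` was added in Python 3.8, and the paths in the traceback above show a python3.6 environment, so the installed interpreter is most likely too old for this snakedeploy version. A small sketch for checking this from inside the environment; `copytree_supports_dirs_exist_ok` is a hypothetical helper:

```python
import inspect
import shutil
import sys


def copytree_supports_dirs_exist_ok():
    """Report whether this interpreter's shutil.copytree accepts dirs_exist_ok."""
    return "dirs_exist_ok" in inspect.signature(shutil.copytree).parameters


print(sys.version_info[:2], copytree_supports_dirs_exist_ok())
```

If this prints False, creating the environment with a newer Python should avoid the TypeError.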
