nf-core / cutandrun

Analysis pipeline for CUT&RUN and CUT&TAG experiments that includes QC, support for spike-ins, IgG controls, peak calling and downstream analysis.

Home Page: https://nf-co.re/cutandrun

License: MIT License

HTML 0.79% Python 14.39% Nextflow 72.54% Groovy 10.31% Dockerfile 0.05% Awk 0.59% Perl 1.32%

nf-core nextflow workflow pipeline cutandrun-seq cutandrun cutandtag-seq cutandtag

cutandrun's Introduction

[![GitHub Actions CI Status](https://github.com/nf-core/cutandrun/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/cutandrun/actions?query=workflow%3A%22nf-core+CI%22) [![GitHub Actions Linting Status](https://github.com/nf-core/cutandrun/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/cutandrun/actions?query=workflow%3A%22nf-core+linting%22) [![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?logo=Amazon%20AWS)](https://nf-co.re/cutandrun/results) [![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.5653535-1073c8)](https://doi.org/10.5281/zenodo.5653535)

Introduction

nf-core/cutandrun is a best-practice bioinformatic analysis pipeline for CUT&RUN, CUT&Tag, and TIPseq experimental protocols that were developed to study protein-DNA interactions and epigenomic profiling.

CUT&RUN

Meers, M. P., Bryson, T. D., Henikoff, J. G., & Henikoff, S. (2019). Improved CUT&RUN chromatin profiling tools. eLife, 8. https://doi.org/10.7554/eLife.46314

CUT&Tag

Kaya-Okur, H. S., Wu, S. J., Codomo, C. A., Pledger, E. S., Bryson, T. D., Henikoff, J. G., Ahmad, K., & Henikoff, S. (2019). CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nature Communications, 10(1), 1930. https://doi.org/10.1038/s41467-019-09982-5

TIPseq

Bartlett, D. A., Dileep, V., Handa, T., Ohkawa, Y., Kimura, H., Henikoff, S., & Gilbert, D. M. (2021). High-throughput single-cell epigenomic profiling by targeted insertion of promoters (TIP-seq). Journal of Cell Biology, 220(12), e202103078. https://doi.org/10.1083/jcb.202103078

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a portable, reproducible manner. It is capable of using containerisation and package management making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules.

The pipeline has been developed with continuous integration (CI) and test driven development (TDD) at its core. nf-core code and module linting as well as a battery of over 100 unit and integration tests run on pull request to the main repository and on release of the pipeline. On official release, automated CI tests run the pipeline on a full-sized dataset on AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website.

Pipeline summary

Check input files
Merge re-sequenced FastQ files (cat)
Read QC (FastQC)
Adapter and quality trimming (Trim Galore!)
Alignment to both target and spike-in genomes (Bowtie 2)
Filter on quality, sort and index alignments (samtools)
Duplicate read marking (picard)
Create bedGraph files (bedtools)
Create bigWig coverage files (bedGraphToBigWig)
Peak calling (SEACR, MACS2)
Consensus peak merging and reporting (bedtools)
Library complexity (preseq)
Fragment-based quality control (deepTools)
Peak-based quality control (bedtools, custom python)
Heatmap peak analysis (deepTools)
Genome browser session (IGV)
Present all QC in web-based report (MultiQC)

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

group,replicate,fastq_1,fastq_2,control
h3k27me3,1,h3k27me3_rep1_r1.fastq.gz,h3k27me3_rep1_r2.fastq.gz,igg_ctrl
h3k27me3,2,h3k27me3_rep2_r1.fastq.gz,h3k27me3_rep2_r2.fastq.gz,igg_ctrl
igg_ctrl,1,igg_rep1_r1.fastq.gz,igg_rep1_r2.fastq.gz,
igg_ctrl,2,igg_rep2_r1.fastq.gz,igg_rep2_r2.fastq.gz,

Each row represents a pair of fastq files (paired end).
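As a rough illustration of the layout above, the following Python sketch checks a samplesheet for the five columns shown and confirms that every non-empty `control` value names a group present in the sheet. The function name `check_samplesheet` and the specific rules are assumptions made for this example; the pipeline performs its own validation, which may differ.

```python
import csv
import io

# Column names taken from the example samplesheet above.
REQUIRED_COLUMNS = ["group", "replicate", "fastq_1", "fastq_2", "control"]


def check_samplesheet(handle):
    """Return a list of problems found in a samplesheet (empty if none).

    Illustrative checks only; not the pipeline's own validation logic.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    # Normalise cells: short rows yield None, extra cells land under key None.
    rows = [
        {k: (v or "").strip() for k, v in row.items() if k is not None}
        for row in reader
    ]
    groups = {row["group"] for row in rows}

    problems = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        if not row["replicate"].isdigit():
            problems.append(f"line {line_no}: replicate is not an integer")
        if not row["fastq_1"]:
            problems.append(f"line {line_no}: fastq_1 is empty")
        control = row["control"]
        if control and control not in groups:
            problems.append(f"line {line_no}: unknown control group '{control}'")
    return problems


EXAMPLE = """\
group,replicate,fastq_1,fastq_2,control
h3k27me3,1,h3k27me3_rep1_r1.fastq.gz,h3k27me3_rep1_r2.fastq.gz,igg_ctrl
igg_ctrl,1,igg_rep1_r1.fastq.gz,igg_rep1_r2.fastq.gz,
"""

if __name__ == "__main__":
    print(check_samplesheet(io.StringIO(EXAMPLE)) or "samplesheet looks consistent")
```

Running a check like this before launching can catch a misspelled control group early, without waiting for the pipeline to start.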

Now, you can run the pipeline using:

nextflow run nf-core/cutandrun \
    -profile <docker/singularity/.../institute> \
    --input samplesheet.csv \
    --peakcaller 'seacr,MACS2' \
    --genome GRCh38 \
    --outdir <OUTDIR>

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
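One way to follow this advice is to keep the parameters in a JSON file and pass it with -params-file. The Python sketch below renders such a file from a dictionary. The parameter names mirror the command above, while the `results` output directory and the `nf-params.json` file name are placeholders chosen for this example.

```python
import json

# Parameters mirroring the example command; "results" is a placeholder path.
PARAMS = {
    "input": "samplesheet.csv",
    "peakcaller": "seacr,MACS2",
    "genome": "GRCh38",
    "outdir": "results",
}


def render_params(values):
    """Serialise pipeline parameters as JSON text for use with -params-file."""
    return json.dumps(values, indent=4) + "\n"


if __name__ == "__main__":
    with open("nf-params.json", "w") as handle:
        handle.write(render_params(PARAMS))
    # Then launch with, for example:
    #   nextflow run nf-core/cutandrun -profile <profile> -params-file nf-params.json
```

Keeping parameters in a file also makes a run easier to repeat or to attach to a bug report.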

Pipeline output

To see the results of a test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/cutandrun was originally written by Chris Cheshire (@chris-cheshire) and Charlotte West (@charlotte-west) from Luscombe Lab at The Francis Crick Institute, London, UK.

The pipeline structure and parts of the downstream analysis were adapted from the original CUT&Tag analysis protocol from the Henikoff Lab. The removal of duplicates arising from linear amplification (also known as T7 duplicates) in the TIPseq protocol was implemented as described in the original TIPseq paper.

We thank Harshil Patel (@drpatelh) and everyone in the Luscombe Lab (@luslab) for their extensive assistance in the development of this pipeline.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #cutandrun channel (you can join with this invite).

Citations

If you use nf-core/cutandrun for your analysis, please cite it using the following doi: 10.5281/zenodo.5653535

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

cutandrun's Issues

Review documentation

Add option for SEACR "relaxed" mode

Description of feature

If it helps, I think I can prepare a PR with some guidance.

Would you prefer something like --seacr_mode [stringent|relaxed] with a stringent default, or a boolean switch like --seacr_relaxed?

Or maybe add a new mode, SEACR_RELAXED, to --peakcaller? This would allow running both and choosing one for further analysis.

Add additional options for peak callers

Is your feature request related to a problem? Please describe

SEACR has limited parameters to play with to adjust peak calling. It tends to be overly sensitive in our hands, and it has no good metrics for downstream peak filtering.

Describe the solution you'd like

Additional options for peakcallers (MACS, MACS2, etc) would be appreciated and allow for consensus and thresholding approaches to deal with false positives downstream. Being able to run more than one peakcaller would be an added bonus.

Upgrade Nextflow minimum version to 21.04.3

Missing chromosome sizes in genome.fa.sizes

Description of the bug

Dear nf-core team,

When running the nf-core/cutandrun pipeline via Singularity, I run into an error during the UCSC_BEDCLIP process.
The error states: "Chromosome chr17 isn't in genome.fa.sizes line 94596 of igg_R1.sorted.bedGraph: chr17:75974-76048". Checking the genome.fa.sizes file in the working directory confirms this: it only contains chrM and chr1-16.

Thanks in advance for your support.

Best wishes,

Joschka

Command used and terminal output

export NXF_HOME=/omics/groups/OE0219/internal/AML712_ipsc/210130_ACT_seq_nf_processing/14D7
nextflow run nf-core/cutandrun \
    -resume -profile singularity \
    -c cutandrun.config \
    -r 1.1 \
    --input samplesheet.csv \
    --genome hg19 \
    --outdir /omics/groups/OE0219/internal/AML712_ipsc/210130_ACT_seq_nf_processing/14D7 \
    --email [email protected] 

Error executing process > 'NFCORE_CUTANDRUN:CUTANDRUN:UCSC_BEDCLIP (igg_R1)'

Caused by:
  Process `NFCORE_CUTANDRUN:CUTANDRUN:UCSC_BEDCLIP (igg_R1)` terminated with an error exit status (255)

Command executed:

  bedClip \
      igg_R1.sorted.bedGraph \
      genome.fa.sizes \
      igg_R1.clipped.bedGraph
  
  cat <<-END_VERSIONS > versions.yml
  UCSC_BEDCLIP:
      ucsc: $(echo 377)
  END_VERSIONS

Command exit status:
  255

Command output:
  (empty)

Command error:
  WARNING: While bind mounting '/omics/groups/OE0219/internal/AML712_ipsc/210130_ACT_seq_nf_processing/14D7:/omics/groups/OE0219/internal/AML712_ipsc/210130_ACT_seq_nf_processing/14D7': destination is already in the mount point list
  WARNING: While bind mounting '/omics/groups/OE0219/internal/AML712_ipsc/210130_ACT_seq_nf_processing/14D7/work/5d/b324fb0c20d7b1ba4ec31634e6bfd2:/omics/groups/OE0219/internal/AML712_ipsc/210130_ACT_seq_nf_processing/14D7/work/5d/b324fb0c20d7b1ba4ec31634e6bfd2': destination is already in the mount point list
  Chromosome chr17 isn't in genome.fa.sizes line 94596 of igg_R1.sorted.bedGraph: chr17:75974-76048

Work dir:
  /omics/groups/OE0219/internal/AML712_ipsc/210130_ACT_seq_nf_processing/14D7/work/5d/b324fb0c20d7b1ba4ec31634e6bfd2

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`


Pulling Singularity image docker://luslab/cutandrun-dev-plot-consensus-peaks:latest [cache /omics/groups/OE0219/internal/Joschka/singularity/cache/luslab-cutandrun-dev-plot-consensus-peaks-latest.img]
-[nf-core/cutandrun] Sent summary e-mail to [email protected] (sendmail)-
-[nf-core/cutandrun] Pipeline completed with errors-
WARN: Killing pending tasks (4)

Relevant files

chrM 16571
chr1 249250621
chr2 243199373
chr3 198022430
chr4 191154276
chr5 180915260
chr6 171115067
chr7 159138663
chr8 146364022
chr9 141213431
chr10 135534747
chr11 135006516
chr12 133851895
chr13 115169878
chr14 107349540
chr15 102531392
chr16 52015733
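
A mismatch like this can be confirmed independently of bedClip by listing the chromosome names that appear in the bedGraph but not in the sizes file. The Python sketch below does that. The helper names `read_sizes` and `missing_chromosomes` are invented for this example, and the inputs are passed as lists of lines so the logic can be tried without the real files. As a side observation, the chr16 length listed above (52015733) is shorter than the roughly 90 Mb usually reported for hg19 chr16, which might point to a truncated genome.fa, though that is only a guess.

```python
def read_sizes(lines):
    """Parse 'name size' lines (as in genome.fa.sizes) into a dict."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def missing_chromosomes(bedgraph_lines, sizes):
    """Return chromosome names used in a bedGraph but absent from sizes.

    Names are reported once each, in the order they are first seen.
    """
    missing = []
    for line in bedgraph_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank lines and header/comment lines
        chrom = line.split()[0]
        if chrom not in sizes and chrom not in missing:
            missing.append(chrom)
    return missing


if __name__ == "__main__":
    sizes = read_sizes(["chr1 249250621", "chr16 52015733"])
    bedgraph = ["chr1\t100\t200\t3", "chr17\t75974\t76048\t1"]
    print(missing_chromosomes(bedgraph, sizes))  # prints ['chr17']
```

With real data, the two arguments would come from open file handles for genome.fa.sizes and the sorted bedGraph.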

System information

N E X T F L O W ~ version 21.10.6
Launching nf-core/cutandrun [grave_davinci] - revision: c30a37f [1.1]

Core Nextflow options
revision : 1.1
runName : grave_davinci
containerEngine : singularity
launchDir : /omics/groups/OE0219/internal/AML712_ipsc/210130_ACT_seq_nf_processing/14D7
workDir : /omics/groups/OE0219/internal/AML712_ipsc/210130_ACT_seq_nf_processing/14D7/work
projectDir : /omics/groups/OE0219/internal/AML712_ipsc/210130_ACT_seq_nf_processing/14D7/assets/nf-core/cutandrun
userName : heyj
profile : singularity
configFiles : /omics/groups/OE0219/internal/AML712_ipsc/210130_ACT_seq_nf_processing/14D7/assets/nf-core/cutandrun/nextflow.config, /omics/groups/OE0219/internal/AML712_ipsc/210130_ACT_seq_nf_processing/14D7/cutandrun.config

Input/output options
input : samplesheet.csv
outdir : /omics/groups/OE0219/internal/AML712_ipsc/210130_ACT_seq_nf_processing/14D7
email :

Reference data options
genome : hg19
bowtie2 : s3://ngi-igenomes/igenomes//Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/
gtf : s3://ngi-igenomes/igenomes//Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf
blacklist : /omics/groups/OE0219/internal/AML712_ipsc/210130_ACT_seq_nf_processing/14D7/assets/nf-core/cutandrun/assets/blacklists/hg19-blacklist.bed
spikein_bowtie2 : s3://ngi-igenomes/igenomes//Escherichia_coli_K_12_MG1655/NCBI/2001-10-15/Sequence/Bowtie2Index/
spikein_fasta : s3://ngi-igenomes/igenomes//Escherichia_coli_K_12_MG1655/NCBI/2001-10-15/Sequence/WholeGenomeFasta/genome.fa
fasta : s3://ngi-igenomes/igenomes//Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa
igenomes_base : s3://ngi-igenomes/igenomes/

Pipeline Options
gene_bed : s3://ngi-igenomes/igenomes//Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.bed

Institutional config options
config_profile_description: ODCF cluster.
config_profile_contact : Joschka Hey
config_profile_url : https://www.dkfz.de/

Input/output options
input : samplesheet.csv
outdir : /omics/groups/OE0219/internal/AML712_ipsc/210130_ACT_seq_nf_processing/14D7
email : [email protected]

Max job request options
max_cpus : 128
max_memory : 150 GB
max_time : 2d

How to skip the spike-in

Hello, I'm a beginner with the nf-core framework.
My experiment has no spike-in control, but I want to use cutandrun to analyse my data. I found no option to skip the spike-in step in the pipeline.

Implement new nf-core module system

Description of feature

Internal task

pipeline fail in NFCORE_CUTANDRUN:CUTANDRUN:CONSENSUS_PEAKS:PLOT_CONSENSUS_PEAKS

Description of the bug

I use cutandrun v2.0. It works well with the test files. When I use other data, it fails at NFCORE_CUTANDRUN:CUTANDRUN:CONSENSUS_PEAKS:PLOT_CONSENSUS_PEAKS.

Command used and terminal output

nextflow run nf-core/cutandrun -r 2.0 -profile singularity -params-file nf-params.json

the nf-params.json:
{
    "input": "test.csv",
    "outdir": "cuttag",
    "genome": "hg38",
    "bowtie2": "\/home\/rjz\/Zhitai\/Genomes\/Homo_sapiens\/UCSC\/hg38\/Sequence\/Bowtie2Index\/",
    "gtf": "\/home\/rjz\/Zhitai\/Genomes\/Homo_sapiens\/UCSC\/hg38\/annotation\/hg38.ncbiRefSeq.gtf",
    "spikein_bowtie2": "\/home\/rjz\/Zhitai\/Genomes\/Escherichia_coli_K_12_DH10B\/Ensembl\/EB1\/Sequence\/Bowtie2Index\/",
    "spikein_fasta": "\/home\/rjz\/Zhitai\/Genomes\/Escherichia_coli_K_12_DH10B\/Ensembl\/EB1\/Sequence\/WholeGenomeFasta\/genome.fa",
    "fasta": "\/home\/rjz\/Zhitai\/Genomes\/Homo_sapiens\/UCSC\/hg38\/Sequence\/WholeGenomeFasta\/genome.fa",
    "skip_trimming": true,
    "skip_removeduplicates": true,
    "peakcaller": "macs2",
    "max_cpus":12,
    "max_memory":"14GB"
}

Relevant files

Error information:
Error executing process > 'NFCORE_CUTANDRUN:CUTANDRUN:CONSENSUS_PEAKS:PLOT_CONSENSUS_PEAKS'

Caused by:
Process NFCORE_CUTANDRUN:CUTANDRUN:CONSENSUS_PEAKS:PLOT_CONSENSUS_PEAKS terminated with an error exit status (1)

Command executed:

consensus_peaks.py \
--peaks "*.peaks.bed" \
--outpath .

cat <<-END_VERSIONS > versions.yml
"NFCORE_CUTANDRUN:CUTANDRUN:CONSENSUS_PEAKS:PLOT_CONSENSUS_PEAKS":
python: $(python --version | grep -E -o "([0-9]{1,}.)+[0-9]{1,}")
numpy: $(python -c 'import numpy; print(numpy.__version__)')
pandas: $(python -c 'import pandas; print(pandas.__version__)')
upsetplot: $(python -c 'import upsetplot; print(upsetplot.__version__)')
END_VERSIONS

Command exit status:
1

Command output:
(empty)

Command error:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'count'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/rjz/.nextflow/assets/nf-core/cutandrun/bin/consensus_peaks.py", line 75, in <module>
peak_counts = upsetplot.from_memberships(cat_list, data = df_i['count'])
File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 3505, in __getitem__
indexer = self.columns.get_loc(key)
File "/usr/local/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
raise KeyError(key) from err
KeyError: 'count'

Work dir:
/mnt/data/cuttag/work/66/9943b42cf673625faa38e8b513510e

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

Failed to invoke workflow.onComplete event handler

-- Check script '/home/rjz/.nextflow/assets/nf-core/cutandrun/./workflows/cutandrun.nf' at line: 967 or see '.nextflow.log' file for more details

System information

nextflow version 22.04.0.5697
Hardware:desktop
Container engine:Singularity
singularity version:singularity-ce version 3.10.0-focal
OS: ubuntu 20LTS
Version of nf-core/cutandrun: 2.0

Bump python versions to 3.9.10

Description of feature

Internal task

Error executing process > 'NFCORE_CUTANDRUN:CUTANDRUN:CALCULATE_FRIP'

Hi,
I am getting an error running the pipeline using the docker profile. Below I paste pipeline info 👍
Run Name: gigantic_ekeblad

####################################################

nf-core/cutandrun execution completed unsuccessfully!

####################################################
The exit status of the task that caused the workflow execution to fail was: 1.
The full error message was:

Error executing process > 'NFCORE_CUTANDRUN:CUTANDRUN:CALCULATE_FRIP (ABE24-N1X_R1)'

Caused by:
Process NFCORE_CUTANDRUN:CUTANDRUN:CALCULATE_FRIP (ABE24-N1X_R1) terminated with an error exit status (1)

Command executed:

frip.py \
--bams "*.bam" \
--peaks "*.bed" \
--threads 12 \
--outpath .

python --version | grep -E -o "([0-9]{1,}.)+[0-9]{1,}" > python.version.txt

Command exit status:
1

Command output:
Calculating ABE24-N1X_R1.target.markdup.bam using ABE24-N1X_R1.peaks.bed.stringent.bed

Command error:
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-vkw9ufgl because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Traceback (most recent call last):
File "/home/jmartinezv/.nextflow/assets/nf-core/cutandrun/bin/frip.py", line 44, in <module>
reads_at_peaks = cr.run()
File "/opt/conda/envs/packages/lib/python3.8/site-packages/deeptools/countReadsPerBin.py", line 356, in run
imap_res = mapReduce.mapReduce([],
File "/opt/conda/envs/packages/lib/python3.8/site-packages/deeptools/mapReduce.py", line 85, in mapReduce
bed_interval_tree = GTF(bedFile, defaultGroup=defaultGroup, transcriptID=transcriptID, exonID=exonID, transcript_id_designator=transcript_id_designator, keepExons=keepExons)
File "/opt/conda/envs/packages/lib/python3.8/site-packages/deeptoolsintervals/parse.py", line 591, in __init__
ftype = self.inferType(fp, line, labelColumn)
File "/opt/conda/envs/packages/lib/python3.8/site-packages/deeptoolsintervals/parse.py", line 166, in inferType
raise RuntimeError('{0} does not seem to be a recognized file type!'.format(self.filename))
RuntimeError: ABE24-N1X_R1.peaks.bed.stringent.bed does not seem to be a recognized file type!

Work dir:
/local/ljmartinezv/IreneF_Cut_Run/work/53/ddece62198fb63faeb9c87bd4c48cd

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

The workflow was completed at 2022-01-04T14:55:17.532437+01:00 (duration: 5h 9m 26s)

The command used to launch the workflow was as follows:

nextflow run nf-core/cutandrun --input samplesheet_N1X.csv --genome hg19 --spikein_bowtie2 's3://ngi-igenomes/igenomes/Saccharomyces_cerevisiae/NCBI/build3.1/Sequence/Bowtie2Index/' --spikein_fasta 's3://ngi-igenomes/igenomes/Saccharomyces_cerevisiae/NCBI/build3.1/Sequence/Bowtie2Index/genome.fa' --skip_trimming --max_memory 60GB --outdir /local/ljmartinezv/IreneF_Cut_Run/results_N1X -profile docker

Pipeline Configuration:

revision: master
runName: gigantic_ekeblad
containerEngine: docker
launchDir: /local/ljmartinezv/IreneF_Cut_Run
workDir: /local/ljmartinezv/IreneF_Cut_Run/work
projectDir: /home/jmartinezv/.nextflow/assets/nf-core/cutandrun
userName: jmartinezv
profile: docker
configFiles: /home/jmartinezv/.nextflow/assets/nf-core/cutandrun/nextflow.config
input: samplesheet_N1X.csv
outdir: /local/ljmartinezv/IreneF_Cut_Run/results_N1X
genome: hg19
bowtie2: s3://ngi-igenomes/igenomes//Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/
gtf: s3://ngi-igenomes/igenomes//Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf
blacklist: /home/jmartinezv/.nextflow/assets/nf-core/cutandrun/assets/blacklists/hg19-blacklist.bed
spikein_bowtie2: s3://ngi-igenomes/igenomes/Saccharomyces_cerevisiae/NCBI/build3.1/Sequence/Bowtie2Index/
spikein_fasta: s3://ngi-igenomes/igenomes/Saccharomyces_cerevisiae/NCBI/build3.1/Sequence/Bowtie2Index/genome.fa
fasta: s3://ngi-igenomes/igenomes//Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa
igenomes_base: s3://ngi-igenomes/igenomes/
gene_bed: s3://ngi-igenomes/igenomes//Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.bed
skip_trimming: true
max_memory: 60GB
Date Started: 2022-01-04T09:45:51.189377+01:00
Date Completed: 2022-01-04T14:55:17.532437+01:00
Pipeline script file path: /home/jmartinezv/.nextflow/assets/nf-core/cutandrun/main.nf
Pipeline script hash ID: 8e8fba88a8cec012928bfc5c32179c0e
Pipeline repository Git URL: https://github.com/nf-core/cutandrun
Pipeline repository Git Commit: 81ede33
Pipeline Git branch/tag: master
Nextflow Version: 21.04.3
Nextflow Build: 5560
Nextflow Compile Timestamp: 21-07-2021 15:09 UTC

The most striking thing is that it works just fine on another set of samples.
Thanks very much in advance,
Best.
Jaime.
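
The report does not establish why the peaks file was rejected. One possibility, stated here only as an assumption, is that the file for this sample is empty or not tab-delimited, which would also explain why other samples run fine. The Python sketch below summarises a BED-like file so this can be checked by hand. The function names `describe_bed` and `looks_usable` are invented for this example and are not part of deepTools.

```python
def describe_bed(lines):
    """Return (number of records, set of tab-separated column counts)."""
    n_records = 0
    column_counts = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header/comment lines
        n_records += 1
        column_counts.add(len(line.split("\t")))
    return n_records, column_counts


def looks_usable(lines):
    """True if there is at least one record and every record has 3+ columns."""
    n_records, column_counts = describe_bed(lines)
    return n_records > 0 and min(column_counts) >= 3


if __name__ == "__main__":
    # Hypothetical six-column record, used only to exercise the functions.
    example = ["chr1\t10\t20\t5.0\t2.0\tchr1:12-18\n"]
    print(describe_bed(example), looks_usable(example))
```

If the check fails for ABE24-N1X_R1.peaks.bed.stringent.bed in the work directory, the problem probably lies in peak calling for that sample rather than in the FRiP step itself.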

Remove Pipeline warnings

Description of the bug

WARN: There's no process matching config selector: NFCORE_CUTANDRUN:CUTANDRUN:DEEPTOOLS_COMPUTEMATRIX_GENE
WARN: There's no process matching config selector: NFCORE_CUTANDRUN:CUTANDRUN:DEEPTOOLS_COMPUTEMATRIX_PEAKS
WARN: There's no process matching config selector: NFCORE_CUTANDRUN:CUTANDRUN:DEEPTOOLS_PLOTHEATMAP_PEAKS
WARN: There's no process matching config selector: .*PREPARE_GENOME:GUNZIP_.*
WARN: There's no process matching config selector: NFCORE_CUTANDRUN:CUTANDRUN:PREPARE_GENOME:UNTAR_.*

Command used and terminal output

No response

Relevant files

No response

System information

No response

Improve FRIP performance

Description of the bug

FRiP calculation often crashes the pipeline. An option to skip it is also needed.

Command used and terminal output

No response

Relevant files

No response

System information

No response

Reporting error in seaborn violin plot generation when running multisample no groups

  Namespace(bin_frag='*bin500.awk.bed', func=<function gen_png at 0x7f98d967aa60>, log='log.txt', meta='meta_table_ctrl.csv', meta_ctrl='meta_table_ctrl.csv', output='.', raw_frag='*.frag_len.txt', seacr_bed='*bed.*.bed')

Command error:
  2022-01-06 08:54:32,287:reporting:INFO - CUT&RUN Python Reporting
  2022-01-06 08:54:32,288:gen_img:INFO - Generating plots to output folder
  Traceback (most recent call last):
    File "/camp/home/cheshic/.nextflow/assets/nf-core/cutandrun/bin/reporting.py", line 72, in <module>
      parsed_args.func(parsed_args)
    File "/camp/home/cheshic/.nextflow/assets/nf-core/cutandrun/bin/reporting.py", line 42, in gen_png
      report_gen.generate_cutandrun_reports(output_path)
    File "/camp/lab/luscomben/home/users/cheshic/nfcache/.nextflow/assets/nf-core/cutandrun/bin/lib/reports.py", line 63, in generate_cutandrun_reports
      plots, data, mqc_frag_hist = self.generate_reports()
    File "/camp/lab/luscomben/home/users/cheshic/nfcache/.nextflow/assets/nf-core/cutandrun/bin/lib/reports.py", line 317, in generate_reports
      plot7b, data7b = self.peak_widths()
    File "/camp/lab/luscomben/home/users/cheshic/nfcache/.nextflow/assets/nf-core/cutandrun/bin/lib/reports.py", line 544, in peak_widths
      ax = sns.violinplot(data=self.seacr_beds, x="group", y="peak_width", hue="replicate", palette = "viridis")
    File "/opt/conda/envs/reporting/lib/python3.8/site-packages/seaborn/_decorators.py", line 46, in inner_f
      return f(**kwargs)
    File "/opt/conda/envs/reporting/lib/python3.8/site-packages/seaborn/categorical.py", line 2397, in violinplot
      plotter = _ViolinPlotter(x, y, hue, data, order, hue_order,
    File "/opt/conda/envs/reporting/lib/python3.8/site-packages/seaborn/categorical.py", line 524, in __init__
      self.estimate_densities(bw, cut, scale, scale_hue, gridsize)
    File "/opt/conda/envs/reporting/lib/python3.8/site-packages/seaborn/categorical.py", line 639, in estimate_densities
      kde, bw_used = self.fit_kde(kde_data, bw)
    File "/opt/conda/envs/reporting/lib/python3.8/site-packages/seaborn/categorical.py", line 675, in fit_kde
      kde = stats.gaussian_kde(x, bw)
    File "/opt/conda/envs/reporting/lib/python3.8/site-packages/scipy/stats/kde.py", line 206, in __init__
      self.set_bandwidth(bw_method=bw_method)
    File "/opt/conda/envs/reporting/lib/python3.8/site-packages/scipy/stats/kde.py", line 556, in set_bandwidth
      self._compute_covariance()
    File "/opt/conda/envs/reporting/lib/python3.8/site-packages/scipy/stats/kde.py", line 565, in _compute_covariance
      self._data_covariance = atleast_2d(cov(self.dataset, rowvar=1,
    File "<__array_function__ internals>", line 5, in cov
    File "/opt/conda/envs/reporting/lib/python3.8/site-packages/numpy/lib/function_base.py", line 2469, in cov
      avg, w_sum = average(X, axis=1, weights=w, returned=True)
    File "<__array_function__ internals>", line 5, in average
    File "/opt/conda/envs/reporting/lib/python3.8/site-packages/numpy/lib/function_base.py", line 407, in average
      scl = wgt.sum(axis=axis, dtype=result_dtype)
    File "/opt/conda/envs/reporting/lib/python3.8/site-packages/numpy/core/_methods.py", line 47, in _sum
      return umr_sum(a, axis, dtype, out, keepdims, initial, where)
  TypeError: No loop matching the specified signature and casting was found for ufunc add

Remove linting error dummy files on next release of tools

files_exist: File not found: bin/scrape_software_versions.py
files_exist: File not found: modules/local/get_software_versions.nf

Add an extra MultiQC table to show a list of software versions not by process (update nf-core as well)

Consider changing samplesheet format to that of the ChIP-seq pipeline

Is your feature request related to a problem? Please describe

In short, the samplesheet as it currently stands is somewhat confusing for new users to utilize and needlessly complicated. It also results in IgG output filenames being hardcoded, which is generally annoying.

Describe the solution you'd like

Switching to use a more consistent and simple samplesheet like the ChIP-seq pipeline may be worth consideration, as it's generally more intuitive and flexible.

Update pipeline to support new nf-core/modules software version framework

Pipeline has no release, but no WIP or UNDER CONSTRUCTION warning

igg_control parameter is boolean not int as in description

Description of the bug

igg_control parameter is boolean not int as in description

--igg_control  Specifies if the samplesheet contains an IgG control default: 1

Command used and terminal output

nextflow run nf-core/cutandrun \
            --input samplesheet.csv \
            --genome GRCm38 \
            --multiqc_title title \
            --save_spikein_aligned \
            --spikein_fasta spike.fna \
            --igg_control 1 \
            -profile cbe

ERROR: Validation of pipeline parameters failed!
* --igg_control: expected type: Boolean, found: Integer (1)
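
The validation error above reflects a strict type check: the value 1 is an integer, not a Boolean, even though the help text lists "default: 1". The Python sketch below imitates that behaviour. It is a hypothetical illustration of strict Boolean parsing, not the validator the pipeline actually uses, and the function name `parse_boolean_param` is invented for this example.

```python
def parse_boolean_param(name, value):
    """Accept real booleans or the strings 'true'/'false'; reject everything else.

    Integers such as 1 are rejected on purpose, mirroring the error message
    in the report above.
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.lower() in ("true", "false"):
        return value.lower() == "true"
    raise TypeError(
        f"--{name}: expected type: Boolean, found: {type(value).__name__} ({value!r})"
    )


if __name__ == "__main__":
    print(parse_boolean_param("igg_control", "false"))
    try:
        parse_boolean_param("igg_control", 1)
    except TypeError as err:
        print(err)
```

Under this reading, passing --igg_control true or --igg_control false on the command line should satisfy the check, while --igg_control 1 does not.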



Relevant files

No response

System information

No response

pipeline fails at MARK_DUPLICATES_PICARD:SAMTOOLS_INDEX

Description of the bug

The error message is not very explicit, and I can't find the work/ subdirectory corresponding to the command.

Command used and terminal output

$ nextflow run nf-core/cutandrun \
    -profile docker \
    --input cutandrun_sheet_se.csv \
    --igenomes_base ../igenomes \
    --genome GRCh38 \
    --igg_control false


Error executing process > 'NFCORE_CUTANDRUN:CUTANDRUN:MARK_DUPLICATES_PICARD:SAMTOOLS_INDEX (C52_Pol2_S9_R1)'

Caused by:
  Oops.. something wrong happened while creating task 'NFCORE_CUTANDRUN:CUTANDRUN:MARK_DUPLICATES_PICARD:SAMTOOLS_INDEX' unique id -- Offending keys: [
 - type=java.util.UUID value=e24faeb5-1185-432a-86ff-892d11384595, 
 - type=java.lang.String value=NFCORE_CUTANDRUN:CUTANDRUN:MARK_DUPLICATES_PICARD:SAMTOOLS_INDEX, 
 - type=java.lang.String value="""
    samtools index $options.args $bam
    cat <<-END_VERSIONS > versions.yml
    ${getProcessName(task.process)}:
        ${getSoftwareName(task.process)}: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    END_VERSIONS
    """
, 
 - type=java.lang.String value=quay.io/biocontainers/samtools:1.13--h8c37831_0, 
 - type=java.lang.String value=meta, 
 - type=java.util.LinkedHashMap value=[id:C52_Pol2_S9_R1, group:C52_Pol2_S9, replicate:1, control_group:4, single_end:true, bt2_total_reads_target:20939652, bt2_align1_target:, bt2_align_gt1_target:, bt2_non_aligned_target:, bt2_total_aligned_target:0, bt2_total_reads_spikein:20939652, bt2_align1_spikein:, bt2_align_gt1_spikein:, bt2_non_aligned_spikein:, bt2_total_aligned_spikein:0], 
 - type=java.lang.String value=bam, 
 - type=nextflow.util.ArrayBag value=[FileHolder(sourceObj:/data/array/sfacchini/PR0626/work/e4/ac5ce468f9676ad37c33243b1f767a/C52_Pol2_S9_R1.target.markdup.bam, storePath:/data/array/sfacchini/PR0626/work/e4/ac5ce468f9676ad37c33243b1f767a/C52_Pol2_S9_R1.target.markdup.bam, stageName:C52_Pol2_S9_R1.target.markdup.bam)], 
 - type=java.lang.String value=$, 
 - type=java.lang.Boolean value=true, 
 - type=java.util.HashMap$EntrySet value=[task.process=NFCORE_CUTANDRUN:CUTANDRUN:MARK_DUPLICATES_PICARD:SAMTOOLS_INDEX, options.args=]]

Relevant files

.nextflow.log

System information

No response

Deeptools compute matrix empty file error

No IgG control

test_full does not work with AWS Batch or Docker profiles

Description of the bug

We created an AWS Batch environment to test nf-core/cutandrun before running it against our data. According to the documentation of nf-core:

Finally, each pipeline comes with a config profile called test and test_full. These are used for automated pipeline CI tests and will run the workflow with a minimal / full-size public dataset, respectively. They can also be used for performing test(s) run of nf-core pipeline on your infrastructure, before using your own data.

The test profile works as expected. AWS Batch fires up new instances and churns through the pipeline in about 20 minutes (please see the attached screen capture for an example of some jobs in action).

The test_full_small profile also works with AWS Batch.

However, the test_full profile fails on the first steps of the pipeline (please see below for more detailed error output).

executor >  awsbatch (3)
[5b/bc654a] process > NFCORE_CUTANDRUN:CUTANDRUN:PREPARE_GENOME:CUSTOM_GETCHROMSIZES (genome.fa)    [100%] 1 of 1, failed: 1 ✘
[62/cccfd6] process > NFCORE_CUTANDRUN:CUTANDRUN:PREPARE_GENOME:GET_SPIKEIN_CHROM_SIZES (genome.fa) [100%] 1 of 1, failed: 1 ✘
[21/029f06] process > NFCORE_CUTANDRUN:CUTANDRUN:INPUT_CHECK:SAMPLESHEET_CHECK (test-GSE145187-a... [  0%] 0 of 1

When we try to run the pipeline against our data, we see the exact same error that is returned by the test_full profile.

This is not a bucket access issue because we see the outputs from the test and test_full_small appearing as expected.

It seems that the test_full profile is unable to find the reference genomes?

Any advice or guidance you can offer on how to move forward would be greatly appreciated.

Command used and terminal output

# Running test with AWS Batch is successful.

nextflow run nf-core/cutandrun -profile test,awsbatch --awsqueue nextflow-cutandrun_queue --awsregion us-east-2 --outdir s3://rmarable-cutandrun-us-east-2/rmarable/outputs -bucket-dir s3://rmarable-cutandrun-us-east-2/rmarable/bucket-dir

# Running test_full_small is also successful.

nextflow run nf-core/cutandrun -profile test_small,awsbatch --awsqueue nextflow-cutandrun_queue --awsregion us-east-2 --outdir s3://rmarable-cutandrun-us-east-2/rmarable/outputs -bucket-dir s3://rmarable-cutandrun-us-east-2/rmarable/bucket-dir

# Running test_full with AWS Batch fails.

nextflow run nf-core/cutandrun -profile test_full,awsbatch --awsqueue nextflow-cutandrun_queue --awsregion us-east-2 --outdir s3://rmarable-cutandrun-us-east-2/rmarable/outputs -bucket-dir s3://rmarable-cutandrun-us-east-2/rmarable/bucket-dir

# Pipeline error output - this is the same error we observe when running the pipeline against our own data.

-[nf-core/cutandrun] Pipeline completed with errors-
Error executing process > 'NFCORE_CUTANDRUN:CUTANDRUN:PREPARE_GENOME:GET_SPIKEIN_CHROM_SIZES (genome.fa)'

Caused by:
  Essential container in task exited

Command executed:

  samtools faidx genome.fa
  cut -f 1,2 genome.fa.fai > genome.fa.sizes

  cat <<-END_VERSIONS > versions.yml
  GET_SPIKEIN_CHROM_SIZES:
      get: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  nxf-scratch-dir ip-10-2-43-3.us-east-2.compute.internal:/tmp/nxf.XXXXOifpcp
  An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
  fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden

Work dir:
  s3://flaretx-cutandrun-us-east-2/rmarable/bucket-dir/62/cccfd6236681a6fab380e08d0530c6

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

Relevant files

nextflow.log-2.gz

System information

Nextflow Version

      N E X T F L O W
      version 21.10.6 build 5660
      created 21-12-2021 16:55 UTC
      cite doi:10.1038/nbt.3820
      http://nextflow.io

Executor = awsbatch
Container engine = Docker
OS = Amazon Linux 2
nf-core/cutandrun = 1.1

Create single mulled container for all custom reporting

iGenomes not accepted

Description of the bug

The pipeline does not accept --genome GRCm38

however GRCm38 is clearly in the file cutandrun/conf/igenomes.config

Command used and terminal output

nextflow run nf-core/cutandrun \
            --input samplesheet.csv \
            --genome GRCm38 \
            --multiqc_title analysis \
            --save_spikein_aligned \
            --spikein_fasta spike.fa \
            --igg_control true \
            -profile cbe

  Genome 'GRCm38' not found in any config files provided to the pipeline.
  Currently, the available genome keys are:
  GRCh38



Relevant files

No response

System information

nextflow version 22.01.0-edge.5656

add getchromsizes to nf-core modules

"nf-core download nf-core/cutandrun" fails

Description of the bug

Attempting to download the pipeline along with Singularity images fails with the following error:
CRITICAL [Errno 2] No such file or directory: '/big_storage/work/nextflow/singularity_cache/luslab-cutandrun-dev-frip-latest.img'
This appears to happen because, during the download, all images starting with luslab-* are created one folder above NXF_SINGULARITY_CACHEDIR, so when nf-core looks for them in NXF_SINGULARITY_CACHEDIR they are not present.
I worked around this by running nf-core download nf-core/cutandrun from inside NXF_SINGULARITY_CACHEDIR, but it is clearly a bug. All other images are successfully downloaded to NXF_SINGULARITY_CACHEDIR and read from there.

Command used and terminal output

$ nf-core download nf-core/cutandrun
select 1.1 version
with Singularity images
(pack option does not matter)

Relevant files

No response

System information

Ubuntu 20.04.4
nf-core/tools version 2.3 (fresh conda download)

Support multiple type of normalisation

Description of feature

None
Spike-in
Read depth

Include statement is not allowed within a workflow definition

Check Documentation

I have checked the following places for your error:

Description of the bug

I am seeing Include statement is not allowed within a workflow definition when running cutnrun:

$ nextflow run ../src/cutandrun -c cutnrun.conf -profile uiuc_hpcbio --custom_config_base /home/a-m/cjflds/projects/zhao5/2021-June-CutAndRun/src/configs -resume -qs 3 -with-report -with-trace
N E X T F L O W  ~  version 21.06.0-edge
Launching `../src/cutandrun/main.nf` [serene_ride] - revision: 0041d1bad8


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/cutandrun v1.0.0dev
------------------------------------------------------
Core Nextflow options
  runName                   : serene_ride
  containerEngine           : singularity
  launchDir                 : /home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/results
  workDir                   : /home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/results/work
  projectDir                : /home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/src/cutandrun
  userName                  : cjfields
  profile                   : uiuc_hpcbio
  configFiles               : /home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/src/cutandrun/nextflow.config, /home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/results/cutnrun.conf

Input/output options
  input                     : samplesheet.csv
  email                     : [email protected]
  save_reference            : true

Reference data options
  gtf                       : /home/groups/hpcbio/references/nextflow/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf
  blacklist                 : /home/a-m/cjfields/projects/zhao5/2021-June-CutAndRun/data/reference/ENCFF356LFX.bed
  spikein_fasta             : /home/a-m/cjfields/projects/zhao5/2021-June-CutAndRun/data/reference/GCF_000146045.2_R64_genomic.fna
  fasta                     : /home/a-m/cjfields/projects/zhao5/2021-June-CutAndRun/data/reference/GRCh38_flu_RSV.fa
  igenomes_base             : /home/groups/hpcbio/references/nextflow

Pipeline Options
  igg_control               : false
  gene_bed                  : /home/groups/hpcbio/references/nextflow/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed

Institutional config options
  custom_config_base        : /home/a-m/cjfields/projects/zhao5/2021-June-CutAndRun/src/configs
  config_profile_description: University of Illinois IGB Biocluster set up, base provided by nf-core/configs.
  config_profile_contact    : Chris Fields (cjfields - at - illinois.edu)
  config_profile_url        : https://http://biocluster.igb.illinois.edu/

Max job request options
  max_cpus                  : 24
  max_memory                : 500 GB
  max_time                  : 24d 20h 31m 24s

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/cutandrun for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/cutandrun/blob/master/CITATIONS.md
------------------------------------------------------
Include statement is not allowed within a workflow definition

 -- Check script '/home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/src/cutandrun/main.nf' at line: 64 or see '.nextflow.log' file for more details

Steps to reproduce

Steps to reproduce the behaviour:

Command line: nextflow run cutandrun -c cutnrun.conf -profile uiuc_hpcbio --custom_config_base /home/a-m/cjflds/projects/zhao5/2021-June-CutAndRun/src/configs -resume -qs 3 -with-report -with-trace
See error: Include statement is not allowed within a workflow definition

Expected behaviour

Workflow runs

Log files

Have you provided the following extra information/files:

The command used to run the pipeline (see above)
The .nextflow.log file (see above)

System

Hardware: HPC
Executor: SLURM
OS: CentOS Linux
Version 7

Nextflow Installation

Version: 21.06.0-edge

Container engine

Engine: Singularity
version: 3.8.1

Add pipeline diagram to README

WARNING: Skipping mount /local/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container

Description of the bug

Error executing process > 'NFCORE_CUTANDRUN:CUTANDRUN:CALCULATE_FRAGMENTS:BEDTOOLS_BAMTOBED
Caused by:
Process NFCORE_CUTANDRUN:CUTANDRUN:CALCULATE_FRAGMENTS:BEDTOOLS_BAMTOBED terminated with an error exit status (140)

Command executed:

bedtools
bamtobed
-bedpe
-i file_R3.mapped.sorted.bam
| bedtools sort > file_R3.bed

cat <<-END_VERSIONS > versions.yml
"NFCORE_CUTANDRUN:CUTANDRUN:CALCULATE_FRAGMENTS:BEDTOOLS_BAMTOBED":
bedtools: $(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS

Command exit status:
140

Command output:
(empty)

Command error:
WARNING: Skipping mount /local/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container

Command used and terminal output

No response

Relevant files

No response

System information

No response

fail at FRIP calculation

Description of the bug

The pipeline fails consistently at the FRIP calculation.
How can I skip this step and still get the MultiQC steps?

Thank you

Command used and terminal output

nextflow run ./cutandrun     \
    -profile docker    \
    --input /home/shiyi/mnt/cutandrun/fastq_2022jan06/sample.csv \
    --bowtie2 /home/shiyi/mnt/2021genome/bowtie/   \
    --gtf /home/shiyi/mnt/2021genome/Sus_scrofa.Sscrofa11.1.104.gtf \
    --fasta /home/shiyi/mnt/2021genome/Sus_scrofa.Sscrofa11.1.dna.toplevel.fa \
    --spikein_genome 'K12-MG1655' \
    --max_memory 128.GB \
    --max_cpus 70 \
    --outdir 2022_jan27_cutandrun \
    -resume special_sammet

The run works well through IGV session generation.
It then fails at the FRIP calculation.

Relevant files

nextflow.log.1.txt

System information

ec2 instance

CALCULATE_FRIP fails when no peaks are found

Check Documentation

I have checked the following places for your error:

Description of the bug

First, thank you for this great pipeline!
I noticed the following bug: For samples for which no peaks are called (i.e. the .peaks.bed.stringent.bed file is empty), the following error occurs during CALCULATE_FRIP : "RuntimeError: .peaks.bed.stringent.bed does not seem to be a recognized file type!"

My impression is that the problem lies here, in bin/frip.py. It seems like it would make sense to check if the .bed file is empty before calling crpb.CountReadsPerBin, and return 0 as the frip value in this case.
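
The guard suggested above could look roughly like the sketch below. It is not the code in bin/frip.py: `bed_has_intervals` and `frip_or_zero` are hypothetical helper names, and `compute_frip` is a stand-in for whatever deepTools-based calculation the script already performs. Treating `#`, `track` and `browser` lines as headers is also an assumption.

```python
from pathlib import Path


def bed_has_intervals(bed_path):
    """Return True if the BED file contains at least one interval line."""
    with Path(bed_path).open() as handle:
        for line in handle:
            stripped = line.strip()
            # Skip blank lines and common BED header/comment lines (assumption).
            if stripped and not stripped.startswith(("#", "track", "browser")):
                return True
    return False


def frip_or_zero(bed_path, compute_frip):
    """Return 0.0 for a peak file with no intervals, else call compute_frip.

    compute_frip is a hypothetical callable standing in for the existing
    FRiP calculation; it is only invoked when peaks are present.
    """
    if not bed_has_intervals(bed_path):
        return 0.0
    return compute_frip(bed_path)
```

With a guard of this kind, a sample with no called peaks would report a FRiP of 0 instead of reaching the file-type inference that raises the RuntimeError.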

Note that I have replaced the real sample/group name with <group> in the error log below due to confidentiality.

Steps to reproduce

Steps to reproduce the behaviour:

Command line: nextflow run nf-core/cutandrun -r 1.0.0 -bg --input $(pwd)/config/CUTandRUN/2021-12-20.samplesheet.csv --outdir $(pwd)/res/CUTandRUN/nextflow_CUTandRUN_pipeline/2021-12-20/nextflow -profile docker --genome hg38 --spikein_genome K12-MG1655 --max_memory "120.GB" --max_cpus 20 -resume 13659c93-98fb-41b6-a4b6-28729f6b59cd
See error:
Error executing process > 'NFCORE_CUTANDRUN:CUTANDRUN:CALCULATE_FRIP (H2BK15ac_Dox_s03_R1)'

Caused by:
Process NFCORE_CUTANDRUN:CUTANDRUN:CALCULATE_FRIP (<group>) terminated with an error exit status (1)

Command executed:

frip.py
--bams ".bam"
--peaks ".bed"
--threads 12
--outpath .

python --version | grep -E -o "([0-9]{1,}.)+[0-9]{1,}" > python.version.txt

Command exit status:
1

Command output:
Calculating .target.markdup.bam using .peaks.bed.stringent.bed

Command error:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-94b1cu4t because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Traceback (most recent call last):
  File "/home/peter/.nextflow/assets/nf-core/cutandrun/bin/frip.py", line 44, in <module>
    reads_at_peaks = cr.run()
  File "/opt/conda/envs/packages/lib/python3.8/site-packages/deeptools/countReadsPerBin.py", line 356, in run
    imap_res = mapReduce.mapReduce([],
  File "/opt/conda/envs/packages/lib/python3.8/site-packages/deeptools/mapReduce.py", line 85, in mapReduce
    bed_interval_tree = GTF(bedFile, defaultGroup=defaultGroup, transcriptID=transcriptID, exonID=exonID, transcript_id_designator=transcript_id_designator, keepExons=keepExons)
  File "/opt/conda/envs/packages/lib/python3.8/site-packages/deeptoolsintervals/parse.py", line 591, in __init__
    ftype = self.inferType(fp, line, labelColumn)
  File "/opt/conda/envs/packages/lib/python3.8/site-packages/deeptoolsintervals/parse.py", line 166, in inferType
    raise RuntimeError('{0} does not seem to be a recognized file type!'.format(self.filename))
RuntimeError: .peaks.bed.stringent.bed does not seem to be a recognized file type!

Expected behaviour

CALCULATE_FRIP on empty .bed files should return 0 instead of an error.

Log files

Have you provided the following extra information/files:

[x] The command used to run the pipeline
The .nextflow.log file (sample names are confidential)

System

Hardware: HPC
Executor: local
OS: Debian
Version 4.9.88-1+deb9u1 (2018-05-07) x86_64

Nextflow Installation

Version: 21.10.0.5640

Container engine

Engine: Docker
version: Docker version 18.09.0, build 4d60db4

Add pipeline crick config

Description of feature

see https://github.com/nf-core/configs/tree/master/conf/pipeline/rnaseq

Auto-disable upset plots above a certain sample number

Description of the bug

Warn then disable rather than error

Command used and terminal output

No response

Relevant files

No response

System information

No response

How to set sample sheet without IgG control?

Description of the bug

I have tried rerunning the updated cutandrun workflow (v2) using the following sample sheet, based on prior recommendations for skipping IgG controls (set the control to '1' and --igg_control to false):

group,replicate,fastq_1,fastq_2,control
1_1h_1,1,/home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/data/raw-seq/1_1h_1_GCAGAATT-TGGCCGGT_L001_R1_001.fastq.gz,/home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/data/raw-seq/1_1h_1_GCAGAATT-TGGCCGGT_L001_R2_001.fastq.gz,1
1_1h_2,2,/home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/data/raw-seq/1_1h_2_CTCTCGTC-AGGTTATA_L001_R1_001.fastq.gz,/home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/data/raw-seq/1_1h_2_CTCTCGTC-AGGTTATA_L001_R2_001.fastq.gz,1
1_1h_Heat_1,1,/home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/data/raw-seq/1_1h_Heat_1_GCTTGTCA-GTATGTTC_L001_R1_001.fastq.gz,/home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/data/raw-seq/1_1h_Heat_1_GCTTGTCA-GTATGTTC_L001_R2_001.fastq.gz,1
1_1h_Heat_2,2,/home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/data/raw-seq/1_1h_Heat_2_AATCCGGA-AACTGTAG_L001_R1_001.fastq.gz,/home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/data/raw-seq/1_1h_Heat_2_AATCCGGA-AACTGTAG_L001_R2_001.fastq.gz,1

and the following config:

params {
     input = "samplesheet.small.csv"
     save_reference = true
     igg_control = false
     use_control = false
     fasta = "/home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/data/reference/GRCh38_flu_RSV.fa"
     gtf = "/home/groups/hpcbio/references/nextflow/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf"
     gene_bed = "/home/groups/hpcbio/references/nextflow/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed"
     blacklist = "/home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/data/reference/ENCFF356LFX.bed"
     dedup_target_reads = false
     email = "[email protected]"
     spikein_fasta = "/home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/data/reference/GCF_000146045.2_R64_genomic.fna"
     custom_config_base = "/home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/src/configs"
     peakcaller="macs2,seacr"
     replicate_threshold=2
     save_spikein_aligned=true
     skip_upset_plots=false
     skip_reporting=false
}

Anything I'm doing wrong?

Command used and terminal output

Run as follows:


$ nextflow run nf-core/cutandrun -r 2.0 -c cutnrun-yeast.conf -profile uiuc_hpcbio --custom_config_base /home/groups/hpcbio/projects/src/configs -resume -qs 4 -with-report -with-trace

This is the error I'm seeing:

Error executing process > 'NFCORE_CUTANDRUN:CUTANDRUN:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet.small.csv)'

Caused by:
  Process `NFCORE_CUTANDRUN:CUTANDRUN:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet.small.csv)` terminated with an error exit status (1)

Command executed:

  check_samplesheet.py samplesheet.small.csv samplesheet.valid.csv false

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CUTANDRUN:CUTANDRUN:INPUT_CHECK:SAMPLESHEET_CHECK":
      python: $(python --version | grep -E -o "([0-9]{1,}\.)+[0-9]{1,}")
  END_VERSIONS

Command exit status:
  1

Command output:
  ERROR: Please check samplesheet -> Each control entry must match at least one group entry! Unmatched control entry: 1.

Command error:

Work dir:
  /home/groups/hpcbio/projects/zhao5/2021-June-CutAndRun/results/2022-06-14-CUTNRUNv2/work/40/f81d18706ad65dff43bb65d3c12ef9

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
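
The rule named in the error message (every control entry must match a group entry) can be illustrated with the sketch below. It is not the pipeline's check_samplesheet.py: `unmatched_controls` is a hypothetical helper, the FASTQ file names are placeholders, and treating an empty control cell as "no control" is an assumption, not confirmed pipeline behaviour.

```python
import csv
import io


def unmatched_controls(samplesheet_text):
    """Return sorted control values that do not appear in the group column."""
    rows = list(csv.DictReader(io.StringIO(samplesheet_text)))
    groups = {row["group"] for row in rows}
    # Empty control cells are treated as "no control" in this sketch (assumption).
    controls = {row["control"] for row in rows if row["control"]}
    return sorted(controls - groups)


# Same shape as the sample sheet above, with placeholder file names.
sheet = """group,replicate,fastq_1,fastq_2,control
1_1h_1,1,a_R1.fastq.gz,a_R2.fastq.gz,1
1_1h_2,2,b_R1.fastq.gz,b_R2.fastq.gz,1
"""
print(unmatched_controls(sheet))  # ['1']
```

Here the control value `1` is reported because no row has `1` in the group column, which matches the "Unmatched control entry: 1" message.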



Relevant files

[nextflow.log](https://github.com/nf-core/cutandrun/files/8905814/nextflow.log)

System information

CentOS 7 using Singularity 3.8.1, SLURM on local HPC.  Nextflow 22.04.3

nextflow run nf-core/cutandrun error: Nextflow version 20.10.0 does not match workflow required version: >=20.11.0-edge

nextflow run nf-core/cutandrun --input samplesheet.csv -profile docker --genome GRCm38

N E X T F L O W ~ version 20.10.0
Launching `nf-core/cutandrun` [tiny_agnesi] - revision: `4531603` [master]

nf-core/cutandrun v1.0dev

Core Nextflow options
revision : master
runName : tiny_agnesi
containerEngine: docker
launchDir : /cut_tag/test
workDir : /cut_tag/test/work
projectDir : /home/.nextflow/assets/nf-core/cutandrun
userName : lee
profile : docker
configFiles : /home/.nextflow/assets/nf-core/cutandrun/nextflow.config

Input/output options
input : samplesheet.csv

Reference genome options
genome : GRCm38
fasta : s3://ngi-igenomes/igenomes//Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa

If you use nf-core/cutandrun for your analysis please cite:

The pipeline
https://doi.org/10.5281/zenodo.1400710
The nf-core framework
https://dx.doi.org/10.1038/s41587-020-0439-x
https://rdcu.be/b1GjZ
Software dependencies
https://github.com/nf-core/cutandrun/blob/master/CITATIONS.md

Nextflow version 20.10.0 does not match workflow required version: >=20.11.0-edge

Broad peak calling and DESeq2 for comparison of called peaks between conditions

Description of feature

Two enhancements suggested:

Add the SEACR "relaxed" flag as an option, and broadpeak for macs2.
The nf-core chipseq pipeline uses DESeq2 to compare peaks between different conditions. I suggest adding this feature to the cutandrun pipeline.

Do not define "null" string as a default value at nextflow_schema.json

Description

Some fields in the nextflow_schema.json file define their default value as a string ("default": "null"). This will be a problem in the upcoming version of tower.nf: the "null" string will be set in the launchpad form and sent to Nextflow when launching the pipeline, and the run will then fail because Nextflow interprets it as a string rather than as an empty parameter.

Solution

In line with the discussion here about enforcing stricter rules for initialising params with no default value, I suggest simply setting these fields to null in the nextflow.config file and removing the default setting from the schema file. This will be compatible with the future tower.nf release.
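
As a rough aid for finding the affected fields, the sketch below walks a parsed schema and reports every object whose "default" is the string "null". The helper name `find_null_string_defaults` and the schema fragment are hypothetical; the real nextflow_schema.json has its own layout and parameter names, which is why the walk makes no assumption about nesting.

```python
import json


def find_null_string_defaults(node, path=""):
    """Recursively collect paths of objects whose "default" is the string "null"."""
    hits = []
    if isinstance(node, dict):
        if node.get("default") == "null":
            hits.append(path or "/")
        for key, value in node.items():
            hits.extend(find_null_string_defaults(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_null_string_defaults(value, f"{path}/{index}"))
    return hits


# Hypothetical schema fragment; real parameter names may differ.
schema = json.loads("""
{
  "properties": {
    "example_param": {"type": "string", "default": "null"},
    "other_param": {"type": "integer", "default": 2}
  }
}
""")
print(find_null_string_defaults(schema))  # ['/properties/example_param']
```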

How should I fill the samplesheet if I do not have IGG control

Description of the bug

Hello,

I have a question about how to fill in the sample sheet required to run the pipeline. Unfortunately, I do not have an IgG sample for normalization. At first I left the control_group column out of the sample sheet, and the pipeline failed because the sample sheet did not have enough columns (even though I passed --igg_control false on the command line). I then added the control_group column but left it empty (that is, just ,, for this column), and I got the following error (again with --igg_control false on the command line):

Execution cancelled -- Finishing pending tasks before exit

Error executing process > 'NFCORE_CUTANDRUN:CUTANDRUN:INPUT_CHECK:SAMPLESHEET_CHECK (design.csv)'

Caused by:
Process NFCORE_CUTANDRUN:CUTANDRUN:INPUT_CHECK:SAMPLESHEET_CHECK (design.csv) terminated with an error exit status (1)

Command executed:

check_samplesheet.py design.csv samplesheet.valid.csv false

Command exit status:
1

Command output:
ERROR: Please check samplesheet -> Invalid number of columns (minimum = 5)!
Line: ''

How should I fill in the samplesheet to avoid this error if I do not have an IgG control?

Thank you

Command used and terminal output

nextflow run nf-core/cutandrun \
-profile singularity \
--input $pathdesign \
--genome GRCh38 \
--igg_control false \
--peak_threshold 0.05 \
--outdir $pathresults \
-resume \
-r 1.0.0

Relevant files

Execution cancelled -- Finishing pending tasks before exit

Error executing process > 'NFCORE_CUTANDRUN:CUTANDRUN:INPUT_CHECK:SAMPLESHEET_CHECK (design.csv)'

Caused by:
Process NFCORE_CUTANDRUN:CUTANDRUN:INPUT_CHECK:SAMPLESHEET_CHECK (design.csv) terminated with an error exit status (1)

Command executed:

check_samplesheet.py design.csv samplesheet.valid.csv false

Command exit status:
1

Command output:
ERROR: Please check samplesheet -> Invalid number of columns (minimum = 5)!
Line: ''

System information

nextflow/21.04.3
HPC
slurm
Singularity
Linux
1.0.0

Dots in sample ids breaks reporting

Found unexpected parameters in modules

Description of the bug

Calling the pipeline produces a lot of warnings regarding unexpected parameters to modules. Many of these parameters look crucial to the execution of the modules and I think the pipeline will not execute correctly because they look like command lines.

e.g.:

 samtools_frag_len:[args:-F 0x04, args2:awk -F'\t' 'function abs(x){return ((x < 0.0) ? -x : x)} {print abs($9)}' | sort -T '.' | uniq -c | awk -v OFS="\t" '{print $2, $1/2}', suffix:.frag_len, publish_dir:03_peak_calling/06_fragments]]

Command used and terminal output

nextflow run nf-core/cutandrun \
            --input samplesheet.csv \
            --genome GRCm38 \
            --multiqc_title title \
            --save_spikein_aligned \
            --spikein_fasta spike.fa \
            --igg_control true \
            -profile cbe

N E X T F L O W  ~  version 22.01.0-edge
Launching `nf-core/cutandrun` [admiring_plateau] - revision: c30a37fd57 [master]
WARN: Access to undefined parameter `fasta` -- Initialise it to a default value eg. `params.fasta = some_value`

WARN: Found unexpected parameters:
* --modules: [cat_fastq:[publish_dir:01_prealign/merged_fastq], fastqc:[args:--quiet, publish_dir:01_prealign/pretrim_fastqc], trimgalore:[args:, publish_dir:01_prealign/trimgalore, publish_files:[txt:, html:fastqc, zip:fastqc]], bowtie2_spikein_align:[args:--end-to-e
nd --very-sensitive --no-overlap --no-dovetail --no-mixed --no-discordant --phred33 -I 10 -X 700, suffix:.spikein, publish_files:false], samtools_spikein_sort:[suffix:.spikein.sorted, publish_files:false], bowtie2_align:[args:--end-to-end --very-sensitive --no-mixed -
-no-discordant --phred33 -I 10 -X 700, suffix:.target, publish_files:false], samtools_sort:[suffix:.target.sorted, publish_files:false], samtools_view_qfilter:[suffix:.target.filtered, publish_files:false], samtools_qfilter:[suffix:.target.filtered, publish_files:false], picard_markduplicates:[args:-ASSUME_SORT_ORDER coordinate -REMOVE_DUPLICATES false -VALIDATION_STRINGENCY LENIENT -TMP_DIR tmp, suffix:.target.markdup, publish_files:false], picard_markduplicates_samtools:[suffix:.target.markdup, publish_files:false], picard_dedup:[args:-ASSUME_SORT_ORDER coordinate -REMOVE_DUPLICATES true -VALIDATION_STRINGENCY LENIENT -TMP_DIR tmp, suffix:.target.dedup, publish_files:false], picard_dedup_samtools:[suffix:.target.dedup, publish_files:false], awk_dedup:[args:, suffix:.awk, publish_dir:false, command:'/^[^#]/{print}', command2: > int1.txt && head -2 int1.txt > int2.txt && sed 's/\t/,/g' int2.txt > int3.txt && sed 's/.*/\L&/g' int3.txt , publish_files:false], awk_bt2:[suffix:.target, publish_files:false], awk_bt2_spikein:[suffix:.spikein, publish_files:false], bedtools_genomecov_bedgraph:[publish_files:false], sort_bedgraph:[publish_dir:03_peak_calling/01_bam_to_bedgraph, suffix:.sorted], ucsc_bedclip:[suffix:.clipped, publish_dir:03_peak_calling/02_clip_bed], ucsc_bedgraphtobigwig:[publish_dir:03_peak_calling/03_bed_to_bigwig], seacr:[args:non stringent, suffix:.peaks.bed, publish_dir:03_peak_calling/04_called_peaks], awk_name_peak_bed:[command:'{OFS = "\t"} {print $0, FILENAME}', publish_files:false, ext:bed], sort_group_peaks:[args:-k1,1 -k2,2n, publish_files:false, ext:bed], bedtools_merge_groups:[args: -c 2,3,4,5,6,7,7 -o collapse,collapse,collapse,collapse,collapse,collapse,count_distinct, publish_dir:03_peak_calling/05_consensus_peaks, suffix:.consensus.peaks], awk_threshold:[publish_dir:03_peak_calling/05_consensus_peaks, suffix:.consensus.peaks.filtered, ext:bed], plot_peaks:[publish_dir:04_reporting], igv:[publish_dir:04_reporting/igv], 
dt_compute_mat_gene:[args:scale-regions --beforeRegionStartLength 3000 --regionBodyLength 5000 --afterRegionStartLength 3000 --skipZeros --missingDataAsZero, publish_dir:04_reporting/heatmaps/gene], dt_compute_mat_peaks:[args:reference-point -a 3000 -b 3000 --referencePoint center --skipZeros --missingDataAsZero, publish_dir:04_reporting/heatmaps/peaks], dt_plotheatmap_gene:[args:--sortUsing sum, publish_dir:04_reporting/heatmaps/gene], dt_plotheatmap_peaks:[args:--sortUsing sum --startLabel "Peak Start" --endLabel "Peak End" --xAxisLabel "" --regionsLabel "Peaks", publish_dir:04_reporting/heatmaps/peaks], awk_edit_peak_bed:[command:'{split($6, summit, ":"); split(summit[2], region, "-"); print summit[1]"\t"region[1]"\t"region[2]}', suffix:.max_signal, publish_files:false], export_meta:[publish_dir:04_reporting], calc_frip:[publish_files:false], meta_csv_frip_options:[publish_files:false], bedtools_intersect:[publish_files:false], calc_peak_repro_cut:[args:-f 1,2,3,6, suffix:.repro, ext:bed], calc_peak_repro:[publish_files:false], meta_csv_peak_repro_options:[publish_files:false], generate_reports:[publish_dir:04_reporting/qc], multiqc:[args:-v, publish_dir:04_reporting/multiqc], calc_frag_samtools:[suffix:.mapped, publish_files:false], calc_frag_samtools_view:[args:-F 0x04 -b, suffix:.mapped, publish_files:false], calc_frag_samtools_sort:[args:-n, suffix:.sorted, publish_files:false], calc_frag_bamtobed:[args:-bedpe, publish_files:false], calc_frag_awk:[suffix:.clean, ext:bed, command:'$1==$4 && $6-$2 < 1000 {print $0}', publish_files:false], calc_frag_cut:[args:-f 1,2,6, suffix:.frags, ext:bed, command:| sort -T '.' -k1,1 -k2,2n -k3,3n, publish_dir:03_peak_calling/06_fragments], awk_frag_bin:[args:-v w=500, suffix:.frags.bin500, ext:bed, publish_dir:03_peak_calling/06_fragments, command:'{print $1, int(($2 + $3)/(2*w))*w + w/2, FILENAME}', command2:| sort -T '.' -k1,1V -k2,2n | uniq -c | awk -v OFS="\t" '{print $2, $3, $1, $4}' | sort -T '.' 
-k1,1V -k2,2n], samtools_frag_len:[args:-F 0x04, args2:awk -F'\t' 'function abs(x){return ((x < 0.0) ? -x : x)} {print abs($9)}' | sort -T '.' | uniq -c | awk -v OFS="\t" '{print $2, $1/2}', suffix:.frag_len, publish_dir:03_peak_calling/06_fragments]]
- Ignore this warning: params.schema_ignore_params = "modules"



Relevant files

No response

System information

N E X T F L O W
      version 22.01.0-edge build 5656
      created 07-02-2022 11:08 UTC (12:08 CEST)
      cite doi:10.1038/nbt.3820
      http://nextflow.io

Re-work consensus peaks

Description of feature

Make sure you add tests!!

NFCORE_CUTANDRUN:CUTANDRUN:ANNOTATE_DEDUP_META:AWK sometimes failing with exit status 141

Description of the bug

In nf-core/cutandrun we have a custom local module awk.nf, which allows one to execute an awk command followed by a series of other linux commands. In the module we have:

awk $options.args $options.command $input $options.command2 > ${prefix}.awk.${ext}

Originally we had command = "'/^[^#]/{print}'" and command2 = | head -2 | sed 's/\t/,/g' - | sed 's/.*/\L&/g' - >

So that the final command run would be something like

awk  '/^[^#]/{print}' igg_R1.markdup.MarkDuplicates.metrics.txt | head -2 | sed 's/\t/,/g' - | sed 's/.*/\L&/g' - > igg_R1.awk.awk.txt

However, in some instances this results in an error with exit status 141 - a piping failure.

Steps to reproduce

I ran up the docker container used for this module:

docker run -it -v "$PWD":/home/ biocontainers/biocontainers:v1.2.0_cv1 /bin/bash

and ran

$ awk  '/^[^#]/{print}' igg_R1.markdup.MarkDuplicates.metrics.txt | head -2 | sed 's/\t/,/g' - | sed 's/.*/\L&/g' - > igg_R1.awk.awk.txt
$ echo ${PIPESTATUS[@]}

The output produces the comma-separated file igg_R1.awk.awk.txt as expected. However, around 70% of the time the pipe status comes back as

0 0 0 0

and around 30% of the time, it comes back as

141 0 0 0

This indicates that the first command in the pipeline is exiting with status 141, randomly, at a seemingly stable rate. I'm not sure why this is happening, but command and command2 have been replaced with the following workaround, which seems to be stable.

Fix

awk  '/^[^#]/{print}' igg_R1.markdup.MarkDuplicates.metrics.txt  > int1.txt && head -2 int1.txt > int2.txt && sed 's/\t/,/g' int2.txt > int3.txt && sed 's/.*/\L&/g' int3.txt  > igg_R1.awk.awk.txt

The intermediary files make this solution a little messy, but these files tend to be very small so it shouldn't be an issue.
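
For context, exit status 141 is 128 + 13, meaning the process was terminated by SIGPIPE. One plausible explanation, not confirmed in this report, is that `head -2` exits after two lines while `awk` is still writing, which would depend on timing and so look random. A pipe-free alternative is to do the same filtering in a single process. The Python sketch below mirrors the awk/head/sed steps; `metrics_header_as_csv` is a hypothetical name and the input is made up rather than real Picard output.

```python
def metrics_header_as_csv(text):
    """Return the first two non-comment lines, comma-separated and lowercased."""
    # Like awk '/^[^#]/': keep lines that are non-empty and do not start with '#'.
    kept = [line for line in text.splitlines() if line and not line.startswith("#")]
    # Like head -2, then the two sed steps (tabs to commas, lowercase).
    return [line.replace("\t", ",").lower() for line in kept[:2]]


# Made-up input for illustration, not real Picard output.
example = "# comment\n\nCOL_A\tCOL_B\nValue1\t2\nignored\t3\n"
print(metrics_header_as_csv(example))  # ['col_a,col_b', 'value1,2']
```

Because nothing is piped between processes, there is no early-closing reader and no SIGPIPE to propagate.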

System and Container engine

biocontainer v1.2.0_cv1 run locally on macOS

Upgrade to SEACR 1.4

reporting.py error:ValueError: cannot reindex from a duplicate axis

Check Documentation

I have checked the following places for your error:

Description of the bug

I'm a beginner. Since I have neither root access to the server nor network access, I chose to test the pipeline locally with a conda environment. Because there is no environment.yml as there is for atac-seq, I had to install the appropriate packages myself according to the channel. Everything went smoothly until the process NFCORE_CUTANDRUN:CUTANDRUN:GENERATE_REPORTS.

Steps to reproduce

Steps to reproduce the behaviour:

Command line:

      --meta meta_table.csv \
      --meta_ctrl meta_table_ctrl.csv \
      --raw_frag "*.frag_len.txt" \
      --bin_frag "*bin500.awk.bed" \
      --seacr_bed "*bed.*.bed" \
      --output . \
      --log log.txt
  
  if [ -f "03_03_frag_len_mqc.txt" ]; then
      cat frag_len_header.txt 03_03_frag_len_mqc.txt > frag_len_mqc.yaml
  fi
  
  python --version | grep -E -o "([0-9]{1,}\.)+[0-9]{1,}" > python.version.txt

See error:

  2022-01-04 20:16:02,898:gen_img:INFO - Generating plots to output folder
  Traceback (most recent call last):
    File "/gpfs/home/zengchunhua/.nextflow/assets/nf-core/cutandrun/bin/reporting.py", line 72, in <module>
      parsed_args.func(parsed_args)
    File "/gpfs/home/zengchunhua/.nextflow/assets/nf-core/cutandrun/bin/reporting.py", line 42, in gen_png
      report_gen.generate_cutandrun_reports(output_path)
    File "/gpfs/home/zengchunhua/.nextflow/assets/nf-core/cutandrun/bin/lib/reports.py", line 63, in generate_cutandrun_reports
      plots, data, mqc_frag_hist = self.generate_reports()
    File "/gpfs/home/zengchunhua/.nextflow/assets/nf-core/cutandrun/bin/lib/reports.py", line 271, in generate_reports
      plot4, data4 = self.fraglen_summary_histogram()
    File "/gpfs/home/zengchunhua/.nextflow/assets/nf-core/cutandrun/bin/lib/reports.py", line 454, in fraglen_summary_histogram
      ax = sns.lineplot(data=self.frag_hist, x="Size", y="Occurrences", hue="group", style="replicate", palette = "magma")
    File "/gpfs/home/zengchunhua/bin/Miniconda3/envs/nf-cut/lib/python3.8/site-packages/seaborn/_decorators.py", line 46, in inner_f
      return f(**kwargs)
    File "/gpfs/home/zengchunhua/bin/Miniconda3/envs/nf-cut/lib/python3.8/site-packages/seaborn/relational.py", line 710, in lineplot
      p.plot(ax, kwargs)
    File "/gpfs/home/zengchunhua/bin/Miniconda3/envs/nf-cut/lib/python3.8/site-packages/seaborn/relational.py", line 471, in plot
      for sub_vars, sub_data in self.iter_data(grouping_vars, from_comp_data=True):
    File "/gpfs/home/zengchunhua/bin/Miniconda3/envs/nf-cut/lib/python3.8/site-packages/seaborn/_core.py", line 983, in iter_data
      data = self.comp_data
    File "/gpfs/home/zengchunhua/bin/Miniconda3/envs/nf-cut/lib/python3.8/site-packages/seaborn/_core.py", line 1057, in comp_data
      comp_col.loc[orig.index] = pd.to_numeric(axis.convert_units(orig))
    File "/gpfs/home/zengchunhua/bin/Miniconda3/envs/nf-cut/lib/python3.8/site-packages/pandas/core/indexing.py", line 692, in __setitem__
      iloc._setitem_with_indexer(indexer, value, self.name)
    File "/gpfs/home/zengchunhua/bin/Miniconda3/envs/nf-cut/lib/python3.8/site-packages/pandas/core/indexing.py", line 1637, in _setitem_with_indexer
      self._setitem_single_block(indexer, value, name)
    File "/gpfs/home/zengchunhua/bin/Miniconda3/envs/nf-cut/lib/python3.8/site-packages/pandas/core/indexing.py", line 1851, in _setitem_single_block
      value = self._align_series(indexer, Series(value))
    File "/gpfs/home/zengchunhua/bin/Miniconda3/envs/nf-cut/lib/python3.8/site-packages/pandas/core/indexing.py", line 1987, in _align_series
      ser = ser.reindex(obj.axes[0][indexer[0]], copy=True)._values
    File "/gpfs/home/zengchunhua/bin/Miniconda3/envs/nf-cut/lib/python3.8/site-packages/pandas/core/series.py", line 4345, in reindex
      return super().reindex(index=index, **kwargs)
    File "/gpfs/home/zengchunhua/bin/Miniconda3/envs/nf-cut/lib/python3.8/site-packages/pandas/core/generic.py", line 4811, in reindex
      return self._reindex_axes(
    File "/gpfs/home/zengchunhua/bin/Miniconda3/envs/nf-cut/lib/python3.8/site-packages/pandas/core/generic.py", line 4832, in _reindex_axes
      obj = obj._reindex_with_indexers(
    File "/gpfs/home/zengchunhua/bin/Miniconda3/envs/nf-cut/lib/python3.8/site-packages/pandas/core/generic.py", line 4877, in _reindex_with_indexers
      new_data = new_data.reindex_indexer(
    File "/gpfs/home/zengchunhua/bin/Miniconda3/envs/nf-cut/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 1301, in reindex_indexer
      self.axes[axis]._can_reindex(indexer)
    File "/gpfs/home/zengchunhua/bin/Miniconda3/envs/nf-cut/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3477, in _can_reindex
      raise ValueError("cannot reindex from a duplicate axis")
  ValueError: cannot reindex from a duplicate axis
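
The failing frame (comp_col.loc[orig.index] = ...) suggests that the DataFrame passed to sns.lineplot carries repeated index labels. One common way to end up with them, assumed here rather than confirmed from reports.py, is concatenating per-sample tables without resetting the index. The sketch below uses made-up data with the column names from the lineplot call and shows the usual workaround.

```python
import pandas as pd

# Hypothetical per-sample fragment-length tables; the values are invented.
sample_a = pd.DataFrame({"Size": [100, 200], "Occurrences": [5, 3], "group": "h3k4me3", "replicate": 1})
sample_b = pd.DataFrame({"Size": [100, 200], "Occurrences": [4, 6], "group": "igg", "replicate": 1})

# Plain concatenation keeps each table's own 0..n-1 labels, so labels repeat.
combined = pd.concat([sample_a, sample_b])
print(combined.index.is_unique)  # False

# Resetting the index gives every row a unique label again.
fixed = combined.reset_index(drop=True)
print(fixed.index.is_unique)  # True
```

Whether the error appears can depend on the installed seaborn and pandas versions, so comparing them with the versions pinned in the pipeline's container may also be worthwhile.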

Expected behaviour

Log files

Have you provided the following extra information/files:

The command used to run the pipeline
nextflow run nf-core/cutandrun -r 1.0.0 -name cut_test -params-file nf-params.json
The .nextflow.log file

N E X T F L O W  ~  version 21.10.0
Launching `nf-core/cutandrun` [desperate_ekeblad] - revision: 5b9f4fad41 [1.0.0]


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/cutandrun v1.0.0
------------------------------------------------------
Core Nextflow options
  revision     : 1.0.0
  runName      : desperate_ekeblad
  launchDir    : /gpfs/home/zengchunhua/cutandrun
  workDir      : /gpfs/home/zengchunhua/cutandrun/work
  projectDir   : /gpfs/home/zengchunhua/.nextflow/assets/nf-core/cutandrun
  userName     : zengchunhua
  profile      : standard
  configFiles  : /gpfs/home/zengchunhua/.nextflow/assets/nf-core/cutandrun/nextflow.config

Input/output options
  input        : GSE145187.csv

Reference data options
  gtf          : hg38.ncbiRefSeq.gtf
  spikein_fasta: E.coli_K12_MG1655.fa
  fasta        : hg38.fa
  igenomes_base: s3://ngi-igenomes/igenomes/

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/cutandrun for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/cutandrun/blob/master/CITATIONS.md
------------------------------------------------------
WARN: =============================================================================
  No genome blacklist file specified, switching to dummy empty file...
===================================================================================


executor >  local (36)
[50/21a1d3] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 1 of 1, cached: 1 ✔
[07/64bd79] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 1 of 1, cached: 1 ✔
[7e/ad17f8] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 1 of 1, cached: 1 ✔
[43/b8ecee] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 1 of 1, cached: 1 ✔
[e9/0fb1e5] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 1 of 1, cached: 1 ✔
[-        ] process > NFCORE_CUTANDRUN:CUTANDRUN:... -
[10/5305ee] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[1a/f7dc1c] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[dc/b7abdb] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[d3/04f80f] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[29/ec95bb] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[17/7501b9] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[dc/ac8320] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[88/c28cb7] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[96/2c1b12] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[5f/228d15] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[d8/03e2a9] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[49/6b3651] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[0e/c34dda] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[24/3cfa9c] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[89/23b003] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[05/4301a9] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[a8/a2e1b4] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 2 ✔
[58/ca0934] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 2 ✔
[07/ad9048] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 2 ✔
[e3/c18948] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 2 of 2, cached: 2 ✔
[df/8def4b] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 4 ✔
[50/58ba92] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 1 ✔
[30/de8e72] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 1 ✔
[52/ac8bfb] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 1 ✔
[ca/4ee9aa] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[27/3ba016] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[da/0f4063] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[fa/9f3c9f] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[cd/653326] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[a3/0e6d16] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[97/3edf1c] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[29/36dfd8] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 4 of 4, cached: 4 ✔
[64/69ee9c] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 4 of 4, cached: 4 ✔
[65/9bcd5f] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 1 of 1, cached: 1 ✔
[fb/848416] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 1 of 1 ✔
[eb/fd0e62] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 1 of 1 ✔
[91/96ee6a] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 1 of 1 ✔
[6c/3d0295] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 2 of 2, cached: 2 ✔
[00/a28cdb] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 2 of 2, cached: 2 ✔
[28/814671] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 2 of 2, cached: 2 ✔
[e9/41cd09] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 1 of 1, cached: 1 ✔
[04/3333e8] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[84/fd8554] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[80/435de6] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[75/16ff30] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[57/99451a] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[09/5d3b6b] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 6 ✔
[fa/b08584] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 6 of 6, cached: 4 ✔
[ac/78c2e9] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 1 of 1, cached: 1 ✔
[68/890f9a] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 4 of 4, cached: 3 ✔
[fb/49eacb] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 4 of 4, cached: 4 ✔
[70/39494e] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 4 of 4, cached: 4 ✔
[1c/55709e] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 1 of 1, cached: 1 ✔
[ab/c4c3c3] process > NFCORE_CUTANDRUN:CUTANDRUN:... [100%] 1 of 1, cached: 1 ✔
[b5/c44669] process > NFCORE_CUTANDRUN:CUTANDRUN:... [  0%] 0 of 1
[-        ] process > NFCORE_CUTANDRUN:CUTANDRUN:... -
[-        ] process > NFCORE_CUTANDRUN:CUTANDRUN:... -
Error executing process > 'NFCORE_CUTANDRUN:CUTANDRUN:GENERATE_REPORTS'

Caused by:
  Process `NFCORE_CUTANDRUN:CUTANDRUN:GENERATE_REPORTS` terminated with an error exit status (1)

System

Hardware: HPC
Executor: PBS local
OS: Red Hat
Version 7.5

Nextflow Installation

Version: 21.10.0.5640

Container engine

Engine: Conda
version: 4.11.0

Additional context

The PBS task node has no network access, so I wrote the following environment.yml based on the pipeline requirements:

channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - python=3.8.3
  - numpy
  - pandas=1.2.5
  - seaborn
  - pyranges
  - pysam
  - dask
  - deeptools
  - fastqc
  - trim-galore
  - bowtie2
  - samtools
  - picard
  - bamtools
  - bedtools
  - ucsc-bedgraphtobigwig
  - deeptools
  - multiqc
  - igv
  - pip
  - seacr
  - ucsc-bedclip
  - upsetplot

BOWTIE2_ALIGN step failed

Description of the bug

I had a problem running the pipeline. The test command nextflow run nf-core/cutandrun -profile test,docker works perfectly fine on my machine, but when I run the pipeline with my own data, the alignment step fails with the error message [main_samview] fail to read the header from "-" (the detailed message is shown in the section below).

It seems to me that the code below failed, so I tried to run the command manually on my machine with bowtie2 v2.4.5 and samtools v1.14 and it completed with no error.

bowtie2 \
      -x $INDEX \
      -1 igg_R1_1.trimmed.fastq.gz \
      -2 igg_R1_2.trimmed.fastq.gz \
      --threads 1 \
       \
      --end-to-end --very-sensitive --no-mixed --no-discordant --phred33 -I 10 -X 700 \
      2> igg_R1.target.bowtie2.log \
      | samtools view -@ 1  -bhS -o igg_R1.target.bam -

I tried to find out the version of Samtools that the pipeline used but failed to do so. I was using the Docker image pulled from https://hub.docker.com/r/nfcore/cutandrun/ by running docker pull nfcore/cutandrun:dev.

The data I used are DE_FoxA2_(20180205_MM_HsSc_0122MPM151) and DE_IgG_(20180205_MM_HsSc_0122MPM153) from GSE126612.

I am not quite sure why this doesn't work in the pipeline. Could you please help me with this? Thanks very much!

Command used and terminal output

$ nextflow run nf-core/cutandrun -r 1.1 -name DE_FoxA2_test_2 -profile docker -work-dir path_to_work_dir -params-file nf-params.json

The pipeline report:

----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~\
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/cutandrun v1.1.0
----------------------------------------------------

Run Name: DE_FoxA2_test_2

####################################################
## nf-core/cutandrun execution completed unsuccessfully! ##
####################################################
The exit status of the task that caused the workflow execution to fail was: 1.
The full error message was:

Error executing process > 'NFCORE_CUTANDRUN:CUTANDRUN:ALIGN_BOWTIE2:BOWTIE2_ALIGN (igg_R1)'

Caused by:
  Process `NFCORE_CUTANDRUN:CUTANDRUN:ALIGN_BOWTIE2:BOWTIE2_ALIGN (igg_R1)` terminated with an error exit status (1)

Command executed:

  INDEX=`find -L ./ -name "*.rev.1.bt2" | sed 's/.rev.1.bt2//'`
  bowtie2 \
      -x $INDEX \
      -1 igg_R1_1.trimmed.fastq.gz \
      -2 igg_R1_2.trimmed.fastq.gz \
      --threads 1 \
       \
      --end-to-end --very-sensitive --no-mixed --no-discordant --phred33 -I 10 -X 700 \
      2> igg_R1.target.bowtie2.log \
      | samtools view -@ 1  -bhS -o igg_R1.target.bam -
  
  if [ -f igg_R1.target.unmapped.fastq.1.gz ]; then
      mv igg_R1.target.unmapped.fastq.1.gz igg_R1.target.unmapped_1.fastq.gz
  fi
  if [ -f igg_R1.target.unmapped.fastq.2.gz ]; then
      mv igg_R1.target.unmapped.fastq.2.gz igg_R1.target.unmapped_2.fastq.gz
  fi
  
  cat <<-END_VERSIONS > versions.yml
  BOWTIE2_ALIGN:
      bowtie2: $(echo $(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*$//')
      samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
      pigz: $( pigz --version 2>&1 | sed 's/pigz //g' )
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  [main_samview] fail to read the header from "-".

Relevant files

nextflow.log

System information

Nextflow version: 22.04.1
local
Docker
macOS Big Sur v11.6.4
nf-core/cutandrun version: 1.1.0

New feature: nucleosome repeat length and spacing

As discussed on Slack with Chris Cheshire, sequencing CUT&RUN samples deeply (>15M reads) gave us nucleosome positioning information on the K4me3 samples. It would be useful to add a nucleosome spacing output to the pipeline. I suggest this algorithm: https://github.com/tommyjohn21/nrl_finder. It was recently used in a paper published by my colleagues: https://www.nature.com/articles/s41586-020-3032-z

Optimise process labels, especially for single core python processes

Description of feature

Optimise process labels, especially for single core python processes - FRIP