Enable multi-lane support (exoseq, open issue)

nf-core commented on June 2, 2024

Some sequencing setups will split libraries across lanes. This is currently not modeled…

Comments (4)

andreas-wilm commented on June 2, 2024

Thanks @vsmalladi and @maxulysse for the references!

andreas-wilm commented on June 2, 2024

Hi Marco,

I think this is important (more so for WGS samples). We use an approach where an input YAML file is created that consists of a samples dictionary, in which each sample holds its so-called readunits. The only mandatory readunit key is fq1 (FastQ R1); the optional ones are fq2, run_id, lane_id, library_id and flowcell_id. This also allows us to construct a read group from these values.
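
As an illustration only, here is a plain Groovy sketch of how a read group could be derived from one readunit. The helper name `buildReadGroup`, using the readunit key as the read-group ID, the `FLOWCELL.LANE` platform unit and the fixed `PL:ILLUMINA` are all assumptions, not part of the setup described above.

```groovy
// Hypothetical helper (not from the original pipeline): build a BWA-style
// read-group line from one readunit map. Only fq1 is mandatory, so the
// optional fields are added only when present.
String buildReadGroup(String sampleKey, String readunitKey, Map ru) {
    // Assumption: the readunit key is unique per lane/run, so reuse it as ID
    def fields = ["ID:${readunitKey}", "SM:${sampleKey}"]
    if (ru.library_id) fields << "LB:${ru.library_id}"
    // PU (platform unit) conventionally encodes flowcell and lane
    if (ru.flowcell_id && ru.lane_id) fields << "PU:${ru.flowcell_id}.${ru.lane_id}"
    fields << "PL:ILLUMINA"   // assumption: Illumina data
    // Literal backslash-t separators, as passed to `bwa mem -R`
    return "@RG\\t" + fields.join("\\t")
}
```

The literal `\t` sequences are intentional: BWA-MEM's `-R` option takes the read-group line in escaped form and expands the tabs itself.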

The input channel is then set up as follows:

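```nextflow
// Return the FastQ file(s) of one readunit; only fq1 is mandatory,
// so fq2 is added only when present (single-end data otherwise)
def GetReadPair = { sk, rk ->
    def ru = params.samples[sk].readunits[rk]
    ru['fq2'] ? tuple(file(ru['fq1']), file(ru['fq2'])) : tuple(file(ru['fq1']))
}

// All readunit keys (e.g. one per lane) of sample sk
def GetReadUnitKeys = { sk ->
    params.samples[sk].readunits.keySet()
}

// sample_keys is defined elsewhere in the pipeline (presumably the keys of params.samples)
Channel
    .from(sample_keys)
    // One item per sample: [sample key, flat list of all its FastQ files across readunits]
    .map { sk -> tuple(sk, GetReadUnitKeys(sk).collect { GetReadPair(sk, it) }.flatten()) }
    .set { fastq_ch }
```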
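
If each lane should instead be aligned on its own (so it keeps its own read group and can be merged later), the input could emit one item per readunit rather than one per sample. A minimal plain-Groovy sketch over the same samples → readunits structure; the function name `perReadunitItems` is hypothetical, and in a pipeline the resulting list could be handed to `Channel.fromList`.

```groovy
// Hypothetical alternative to the per-sample channel above: one item per
// readunit (typically per lane), shaped [sampleKey, readunitKey, fastqs].
List perReadunitItems(Map samples) {
    samples.collectMany { sk, sample ->
        sample.readunits.collect { rk, ru ->
            // fq2 is optional: drop it when absent (single-end data)
            [sk, rk, [ru.fq1, ru.fq2].findAll { it }]
        }
    }
}
```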

There might be more elegant ways; it probably makes sense to discuss this on Gitter.

maxulysse commented on June 2, 2024

In Sarek, we have a single process that pipes BWA-MEM into samtools sort (cf. https://github.com/SciLifeLab/Sarek/blob/master/main.nf#L166-L187). We then group samples by their read groups (https://github.com/SciLifeLab/Sarek/blob/master/main.nf#L199-L205) and finally merge the BAMs (https://github.com/SciLifeLab/Sarek/blob/master/main.nf#L207-L222).
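
The align, group, merge flow above can be sketched in plain Groovy, independently of Sarek's actual code. The `[sampleId, readGroupId, bamPath]` item layout and the `mergeCommands` helper are assumptions; inside a Nextflow pipeline the grouping would more typically be done with the `groupTuple()` operator ahead of a merge process.

```groovy
// Hypothetical sketch: one [sampleId, readGroupId, bamPath] item per aligned
// lane in, one `samtools merge` command line per sample out.
Map<String, String> mergeCommands(List laneBams) {
    laneBams
        .groupBy { it[0] }                         // sampleId -> its lane items
        .collectEntries { sampleId, items ->
            // Sort BAM paths so the command is deterministic
            def bams = items.collect { it[2] }.sort()
            [(sampleId): "samtools merge ${sampleId}.bam ${bams.join(' ')}".toString()]
        }
}
```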

vsmalladi commented on June 2, 2024

For read pairs we do this:

```nextflow
// Define channel for raw reads
if (pairedEnd) {
  rawReads = designFilePaths
    .splitCsv(sep: '\t', header: true)
    .map { row -> [ row.sample_id, [row.fastq_read1, row.fastq_read2], row.experiment_id, row.biosample, row.factor, row.treatment, row.replicate, row.control_id ] }
} else {
  rawReads = designFilePaths
    .splitCsv(sep: '\t', header: true)
    .map { row -> [ row.sample_id, [row.fastq_read1], row.experiment_id, row.biosample, row.factor, row.treatment, row.replicate, row.control_id ] }
}
```
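
To cover multi-lane samples with a design file like this, one option (an assumption, not what the snippet above does) is to allow several rows per sample_id, one per lane, and collect their FastQ files per sample. The sketch below mimics `splitCsv(sep: '\t', header: true)` in plain Groovy; `readsPerSample` is a hypothetical name, and in the pipeline itself the equivalent would be a `.map { ... }` followed by `groupTuple()`.

```groovy
// Hypothetical sketch: parse tab-separated text with a header line, then
// gather every row (lane) of the same sample_id into one list of FastQs.
Map readsPerSample(String tsv, boolean pairedEnd) {
    def lines = tsv.readLines().findAll { it.trim() }
    def header = lines.head().split('\t') as List
    // Each row becomes a map keyed by the header columns
    def rows = lines.tail().collect { line ->
        [header, line.split('\t') as List].transpose().collectEntries()
    }
    rows.groupBy { it.sample_id }.collectEntries { sampleId, sampleRows ->
        [(sampleId): sampleRows.collect { row ->
            pairedEnd ? [row.fastq_read1, row.fastq_read2] : [row.fastq_read1]
        }]
    }
}
```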