Coder Social home page Coder Social logo

Enable multi-lane support about exoseq HOT 4 OPEN

nf-core avatar nf-core commented on June 2, 2024
Enable multi-lane support

from exoseq.

Comments (4)

andreas-wilm avatar andreas-wilm commented on June 2, 2024 1

Thanks @vsmalladi and @maxulysse for the references!

from exoseq.

andreas-wilm avatar andreas-wilm commented on June 2, 2024

Hi Marco,

I think this is important (more so for WGS samples). We use an approach where an input yaml file is created, that consists of a samples dictionary, with so called readunits as its members. The only mandatory readunit key is fq1 (fastq R1). The others are fq2, run_id, lane_id, library_id, flowcell_id. This allows to also construct a read group based on these values.

The input channel is then set up as follows:

def GetReadPair = { sk, rk ->
    tuple(file(params.samples[sk].readunits[rk]['fq1']),
          file(params.samples[sk].readunits[rk]['fq2']))
}

def GetReadUnitKeys = { sk ->
    params.samples[sk].readunits.keySet()
}

Channel
    .from(sample_keys)
    .map { sk -> tuple(sk, GetReadUnitKeys(sk).collect{GetReadPair(sk, it)}.flatten()) }
    .set { fastq_ch }

There might be more elegant ways. Probably makes sense to discuss this on Gitter...

from exoseq.

maxulysse avatar maxulysse commented on June 2, 2024

In Sarek, we have one BWA-mem | samtools sort process
cf: https://github.com/SciLifeLab/Sarek/blob/master/main.nf#L166-L187
Then we're grouping samples by the read groups:
https://github.com/SciLifeLab/Sarek/blob/master/main.nf#L199-L205
And then merging the BAMs:
https://github.com/SciLifeLab/Sarek/blob/master/main.nf#L207-L222

from exoseq.

vsmalladi avatar vsmalladi commented on June 2, 2024

For read pairs we do this:

// Define channel for raw reads
if (pairedEnd) {
  rawReads = designFilePaths
    .splitCsv(sep: '\t', header: true)
    .map { row -> [ row.sample_id, [row.fastq_read1, row.fastq_read2], row.experiment_id, row.biosample, row.factor, row.treatment, row.replicate, row.control_id ] }
} else {
rawReads = designFilePaths
  .splitCsv(sep: '\t', header: true)
  .map { row -> [ row.sample_id, [row.fastq_read1], row.experiment_id, row.biosample, row.factor, row.treatment, row.replicate, row.control_id ] }
}

from exoseq.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.