Comments (4)
Thanks @vsmalladi and @maxulysse for the references!
from exoseq.
Hi Marco,
I think this is important (more so for WGS samples). We use an approach in which an input YAML file is created that contains a samples dictionary, with so-called readunits as its members. The only mandatory readunit key is fq1 (FASTQ R1). The others are fq2, run_id, lane_id, library_id and flowcell_id. This also makes it possible to construct a read group from these values.
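For illustration, such an input file might look like the sketch below. The nesting mirrors the params.samples[sk].readunits[rk] lookups in the channel code of this comment; the sample names, readunit names, file paths and ID values are made up, and it assumes the file is handed to Nextflow as a params file (for example with -params-file).

```yaml
# Hypothetical input file: all names, paths and IDs below are placeholders.
samples:
  sampleA:
    readunits:
      unit1:
        fq1: /path/to/sampleA_L001_R1.fastq.gz   # mandatory
        fq2: /path/to/sampleA_L001_R2.fastq.gz   # optional
        run_id: run01
        lane_id: L001
        library_id: libA
        flowcell_id: FC0001
      unit2:
        fq1: /path/to/sampleA_L002_R1.fastq.gz
        fq2: /path/to/sampleA_L002_R2.fastq.gz
        run_id: run01
        lane_id: L002
        library_id: libA
        flowcell_id: FC0001
  sampleB:
    readunits:
      unit1:
        fq1: /path/to/sampleB_L001_R1.fastq.gz
```

Each readunit corresponds to one FASTQ file (or pair), so a sample sequenced across several lanes simply lists several readunits.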
The input channel is then set up as follows:
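// Return the (fq1, fq2) files of one readunit (rk) of one sample (sk)
def GetReadPair = { sk, rk ->
    tuple(file(params.samples[sk].readunits[rk]['fq1']),
          file(params.samples[sk].readunits[rk]['fq2']))
}

// Return the readunit keys of one sample
def GetReadUnitKeys = { sk ->
    params.samples[sk].readunits.keySet()
}

// Emit one tuple per sample: (sample key, flat list of all its FASTQ files)
Channel
    .from(sample_keys)
    .map { sk -> tuple(sk, GetReadUnitKeys(sk).collect{GetReadPair(sk, it)}.flatten()) }
    .set { fastq_ch }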
There might be more elegant ways. Probably makes sense to discuss this on Gitter...
In Sarek, we have a single BWA-MEM | samtools sort process:
https://github.com/SciLifeLab/Sarek/blob/master/main.nf#L166-L187
Then we're grouping samples by the read groups:
https://github.com/SciLifeLab/Sarek/blob/master/main.nf#L199-L205
And then merging the BAMs:
https://github.com/SciLifeLab/Sarek/blob/master/main.nf#L207-L222
For read pairs we do this:
// Define channel for raw reads
if (pairedEnd) {
    rawReads = designFilePaths
        .splitCsv(sep: '\t', header: true)
        .map { row -> [ row.sample_id, [row.fastq_read1, row.fastq_read2], row.experiment_id, row.biosample, row.factor, row.treatment, row.replicate, row.control_id ] }
} else {
    rawReads = designFilePaths
        .splitCsv(sep: '\t', header: true)
        .map { row -> [ row.sample_id, [row.fastq_read1], row.experiment_id, row.biosample, row.factor, row.treatment, row.replicate, row.control_id ] }
}