Hi Ilias,
If I understand correctly, after running this one sample through Strelka and LoFreq, it had no variants called? Have you performed any QC on this sample to see what might be causing this?
A workaround is to run MuTect2 separately from the SLMS-3 pipeline without specifying the candidate VCF file. However, if Strelka and LoFreq returned empty VCF files, it's unlikely that MuTect2 will find anything either. If this one sample is low quality, it should probably be excluded from your analyses anyway.
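Before excluding the sample, it can help to confirm that the Strelka and LoFreq VCFs really contain no variant records. This is a minimal sketch that assumes plain or gzip/bgzip-compressed VCFs. `count_vcf_records` is a hypothetical helper, not part of lcr-modules or SLMS-3:

```python
import gzip
from pathlib import Path


def count_vcf_records(path):
    """Count variant records (non-header, non-blank lines) in a VCF.

    Hypothetical helper for a quick sanity check; not part of lcr-modules.
    Handles plain .vcf and gzip/bgzip-compressed .vcf.gz files.
    """
    path = Path(path)
    # bgzip output is gzip-compatible (concatenated members), so gzip.open can read it
    opener = gzip.open if path.suffix == ".gz" else open
    n = 0
    with opener(path, "rt") as handle:
        for line in handle:
            # VCF header lines start with '#'; also ignore blank lines
            if line.startswith("#") or not line.strip():
                continue
            n += 1
    return n


if __name__ == "__main__":
    import sys

    # print one "<file>\t<record count>" line per VCF given on the command line
    for vcf in sys.argv[1:]:
        print(f"{vcf}\t{count_vcf_records(vcf)}")
```

If both callers report 0 records for the sample, that matches the situation described above and points towards a QC problem with the sample rather than the pipeline.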
If you really want to work around this, you could use Python exception handling (example) to catch the empty file in the checkpoint, and only use it to build the list of chromosomes if it's not empty. Something like:
```python
checkpoint _mutect2_input_chrs:
    input:
        candidate_positions = CFG["inputs"]["candidate_positions"] if CFG["inputs"]["candidate_positions"] else str(rules._mutect2_dummy_positions.output),
        chrs = reference_files("genomes/{genome_build}/genome_fasta/main_chromosomes.txt"),
        capture_arg = _mutect_get_capspace
    output:
        chrs = CFG["dirs"]["inputs"] + "chroms/{seq_type}--{genome_build}/{tumour_id}--{normal_id}--{pair_status}/mutated_chromosomes.txt"
    run:
        # obtain list of main chromosomes
        main_chrs = pd.read_csv(input.chrs, comment='#', sep='\t', header=None)
        main_chrs = main_chrs.iloc[:, 0].astype(str).unique().tolist()
        # obtain list of chromosomes in candidate positions
        # (header=None so the first record is not consumed as a column header)
        try:
            candidate_chrs = pd.read_csv(input.candidate_positions, comment='#', sep='\t', header=None)
            candidate_chrs = candidate_chrs.iloc[:, 0].astype(str).unique().tolist()
        # read_csv raises EmptyDataError when the file has no records; fall back to an
        # empty list instead of failing (`continue` is only valid inside a loop)
        except pd.errors.EmptyDataError:
            print(f"Note: {input.candidate_positions} was empty. Skipping it.")
            candidate_chrs = []
        # obtain list of chromosomes in the capture space
        interval_chrs = pd.read_csv(input.capture_arg, comment='@', sep='\t', header=None)
        interval_chrs = interval_chrs.iloc[:, 0].astype(str).unique().tolist()
        # intersect the lists to obtain chromosomes present in all of them,
        # only restricting by candidate chromosomes when that list is not empty
        intersect_chrs = set(main_chrs) & set(interval_chrs)
        if candidate_chrs:
            intersect_chrs &= set(candidate_chrs)
        # convert the sorted chromosomes to a single-column df
        # (sorting first also avoids a KeyError when the result is empty)
        intersect_chrs = pd.DataFrame(sorted(intersect_chrs))
        # write out the file with mutated chromosomes
        intersect_chrs.to_csv(output.chrs, index=False, header=False)
```
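The core of this workaround (reading an empty candidate-positions file without failing, and only intersecting with it when it has entries) can be tried outside Snakemake. This is a minimal sketch assuming headerless, tab-separated inputs whose header lines start with `#` (VCF) or `@` (interval list); `first_column_values` and `mutated_chromosomes` are hypothetical names, not lcr-modules functions:

```python
import io

import pandas as pd


def first_column_values(source, comment):
    """Return the unique first-column values as strings, or [] if there are no data rows."""
    try:
        df = pd.read_csv(source, comment=comment, sep="\t", header=None)
    except pd.errors.EmptyDataError:
        # raised when the file is empty or every line is a comment (e.g. a header-only VCF)
        return []
    return df.iloc[:, 0].astype(str).unique().tolist()


def mutated_chromosomes(main_src, candidate_src, interval_src):
    """Hypothetical stand-alone version of the checkpoint's intersection logic."""
    main_chrs = set(first_column_values(main_src, "#"))
    interval_chrs = set(first_column_values(interval_src, "@"))
    candidate_chrs = set(first_column_values(candidate_src, "#"))
    chrs = main_chrs & interval_chrs
    # an empty candidate set means "no restriction" rather than "no chromosomes"
    if candidate_chrs:
        chrs &= candidate_chrs
    return sorted(chrs)


if __name__ == "__main__":
    main = io.StringIO("1\n2\n3\n")
    intervals = io.StringIO("@HD\tVN:1.6\n1\t100\t200\n2\t100\t200\n")
    header_only_vcf = io.StringIO("##fileformat=VCFv4.2\n#CHROM\tPOS\tID\n")
    print(mutated_chromosomes(main, header_only_vcf, intervals))  # prints ['1', '2']
```

Treating an empty candidate set as "no restriction" is what keeps the problematic sample running, which is also why it changes how that sample is handled compared with the others.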
I probably won't add this to the pipeline as a permanent solution, though: it would mean some samples are handled differently in SLMS-3, and in practice any sample with zero variants from both Strelka and LoFreq probably needs to be dropped from the analysis anyway.
Hope this helps!
Laura
from lcr-modules.
Hi Laura,
Thank you very much for your response, I appreciate it! You are absolutely right that this will only happen with a problematic sample and should be quite rare. My thinking was to avoid manual intervention during a run, but adding workarounds that fail silently can be dangerous too, so keeping it as is makes a lot of sense!
Thanks again,
Ilias