Comments (5)
Hi,
Can you provide additional information on your project (e.g., the average number of indels per sample)? Can you also provide the list of parameters that you are using to run the simulator?
Best
from sigprofilerclusters.
Hi,
The average number of indels per sample is about 36,000 (range: 50 to 345,000).
Parameters for simulator:
sigSim.SigProfilerSimulator(project=project, project_path=project_path, genome="GRCh38", contexts=["ID"], exome=False, simulations=100,
                            updating=False, bed_file=None, overlap=False, gender="female", seqInfo=False, chrom_based=True, seed_file=None,
                            spacing=1, noisePoisson=False, noiseUniform=0, cushion=100, region=None, vcf=False, mask=None)
Best
Hi,
We have not benchmarked against samples with an indel TMB this high, so the simulator is likely struggling to complete many of the high-TMB samples. Within PCAWG, we have on average ~1,400 indels per sample with a maximum of ~137,000, which should run through without any problems. However, if you also have a large number of samples, I would recommend batching your project into multiple jobs. For projects containing more than 500 samples, we typically simulate across batches to manually speed up the process.
Best
Hi,
Thanks for your suggestion.
Do you mean I should first split my project into multiple simulator jobs, and then either run clusters on each batch separately or run clusters on the merged simulator results?
Best
Correct, we typically:
1) Separate the samples into smaller projects.
2) Run the simulator on each batch.
3) Run clusters on each batch.
This should help speed up the process and make it easier to locate problematic samples.
Best
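
For anyone following the batching advice above, here is a minimal sketch of step 1 (splitting the samples into smaller projects). It is not part of SigProfilerSimulator or SigProfilerClusters; the helper names `split_into_batches` and `make_batch_projects`, the `batch_001`-style directory names, the default batch size of 100, and the assumption that each sample is a single `*.vcf` file in one input directory are all hypothetical and should be adapted to your own layout.

```python
from pathlib import Path
import shutil


def split_into_batches(items, batch_size):
    """Return consecutive chunks of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def make_batch_projects(input_dir, output_root, batch_size=100, pattern="*.vcf"):
    """Copy sample files into output_root/batch_001/, batch_002/, ...

    Assumes one file per sample directly inside input_dir (an assumption,
    not something the thread specifies). Returns the created directories.
    """
    # Sort so that the batch assignment is reproducible between runs.
    samples = sorted(Path(input_dir).glob(pattern))
    batch_dirs = []
    for number, batch in enumerate(split_into_batches(samples, batch_size), start=1):
        batch_dir = Path(output_root) / f"batch_{number:03d}"
        batch_dir.mkdir(parents=True, exist_ok=True)
        for sample in batch:
            # Copy rather than move, so the original project stays intact.
            shutil.copy2(sample, batch_dir / sample.name)
        batch_dirs.append(batch_dir)
    return batch_dirs
```

Each returned directory could then be used as the `project_path` for its own simulator run and its own clusters run (steps 2 and 3), with a distinct `project` name per batch. A batch that stalls or fails then points to a much smaller set of candidate samples.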