ge's Issues
singleton
- Plot flip rate vs MAF, to see how flips are affected by singletons, doubletons, and so on.
- remove singletons.
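A minimal sketch of how flip rate could be summarised per minor allele count class. The input format (one `(minor_allele_count, n_flips, n_tested)` tuple per site) and the bin boundaries are assumptions made for illustration, not the format produced by any particular tool.

```python
from collections import defaultdict


def mac_bin(mac):
    """Map a minor allele count to a coarse frequency class (hypothetical bins)."""
    if mac == 1:
        return "singleton"
    if mac == 2:
        return "doubleton"
    if mac <= 10:
        return "3-10"
    return ">10"


def flip_rate_by_mac(sites):
    """Aggregate flips per frequency class.

    sites: iterable of (minor_allele_count, n_flips, n_tested) tuples.
    Returns {bin_label: flips / tested} for bins with at least one test.
    """
    flips = defaultdict(int)
    tested = defaultdict(int)
    for mac, n_flips, n_tested in sites:
        label = mac_bin(mac)
        flips[label] += n_flips
        tested[label] += n_tested
    return {label: flips[label] / tested[label]
            for label in tested if tested[label] > 0}
```

Dropping the `"singleton"` entry from the result gives the "remove singletons" view without re-running anything.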
Flip rate
Merge all samples to a single file - chr22
Diary
GEL sequencing data SHAPEIT4 testing
liftover 10292 b37 samples
- The count is based on the data available on 2018-10-07, filtered by: rare disease and cancer germline cohorts, qc_passed, genome_build != NA.
- Combined with the 38874 build 38 files, this makes a release of 49166 samples.
full vanilla shapeit4 re-test
The switch error results differ between the two datasets.
I applied switch error rate checking to all ~7000 trios in ~/meta/trios_for_bcftools_mendelian.ped, in ~/data/phasing_test/shapeit4_full_all_snp_wrong/, on two datasets:
1) directly on gwas.phased.vcf.gz
2) on child_phased.vcf.gz, 100 trios only
Issues:
- Only 3625 trios are calculated. Which trios are missing, and why?
- I expected the same number of tested sites, switch rate, and Mendelian errors for the 100 trios in both dataset 1) and dataset 2), but that turned out not to be the case. Why? I need to check the sites first to see where they differ.
- Switch error rate: ~1.5% in 1), 0.5% in 2).
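One way to start on the site check is to compare the sets of tested sites from the two runs. This is only a sketch: it assumes the tested sites of each run have already been extracted into `(chrom, pos)` pairs, and the function name is hypothetical.

```python
def compare_site_sets(sites_a, sites_b):
    """Compare two collections of (chrom, pos) pairs.

    Returns the number of shared sites and the sorted sites unique to each run.
    """
    a, b = set(sites_a), set(sites_b)
    return {
        "shared": len(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }
```

The same set difference works for trio IDs, for example to list which trios from the ped file have no result.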
Phasing - should divide chunks by Mb instead of by number of sites.
Shorter chromosomes may need a larger chunk size. We have to make sure that each chunk is sufficiently large.
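A minimal sketch of chunking by physical length rather than by site count. All parameter values (chunk size, overlap, minimum tail length) are placeholders chosen for illustration, not the settings used in the pipeline. A short trailing remainder is merged into the previous chunk so that no chunk ends up undersized.

```python
def chunk_by_length(chrom_length, chunk_mb=5.0, overlap_mb=0.5, min_mb=2.0):
    """Split a chromosome of chrom_length bp into 1-based inclusive chunks.

    Each core chunk spans chunk_mb megabases and is padded by overlap_mb on
    both sides (clipped to the chromosome). If fewer than min_mb megabases
    would remain after a chunk, the chunk is extended to the chromosome end.
    """
    chunk = int(chunk_mb * 1_000_000)
    overlap = int(overlap_mb * 1_000_000)
    min_len = int(min_mb * 1_000_000)
    if chunk <= 0:
        raise ValueError("chunk_mb must be positive")

    chunks = []
    start = 1
    while start <= chrom_length:
        end = min(start + chunk - 1, chrom_length)
        if chrom_length - end < min_len:
            end = chrom_length  # absorb a too-short tail into this chunk
        chunks.append((max(1, start - overlap), min(chrom_length, end + overlap)))
        start = end + 1
    return chunks
```

For a 12 Mb chromosome with `chunk_mb=5`, `overlap_mb=0.5`, `min_mb=3`, this yields two chunks, the second one absorbing the final 2 Mb.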
Why so many flips?
- Sanity check:
- Tabulate the parent and child genotype combinations and assign every flip to one of the valid entries. The point is to confirm that no entry shows an abnormality.
- Flips against AF.
- Simon said: probability of flips against frequency in the panel.
- Check the gelsnp dataset, because the problem may be caused by an undersized chunk size: too few het sites.
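The genotype tabulation could be sketched as follows. Genotypes are assumed to be encoded as alternate allele counts (0, 1, 2) at biallelic sites, and each record is a hypothetical `(father, mother, child, is_flip)` tuple; neither is a format taken from the pipeline.

```python
from itertools import product

# Alleles (0 = ref, 1 = alt) a parent with a given genotype can transmit.
TRANSMIT = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_consistent(father, mother, child):
    """True if the child genotype can arise from one allele of each parent."""
    return any(a + b == child
               for a, b in product(TRANSMIT[father], TRANSMIT[mother]))


def tabulate_flips(records):
    """Count sites and flips per (father, mother, child) genotype combination."""
    table = {}
    for father, mother, child, is_flip in records:
        entry = table.setdefault(
            (father, mother, child),
            {"sites": 0, "flips": 0,
             "valid": is_mendelian_consistent(father, mother, child)},
        )
        entry["sites"] += 1
        entry["flips"] += int(bool(is_flip))
    return table
```

Any flips landing in an entry with `valid` set to `False`, or one entry carrying a disproportionate share of flips, would be the abnormality to look for.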
VCF subsetting results are empty.
A large number of the VCF outputs are empty (size: 203Kb).
My first guess is that some error occurred.
My second guess is the contig naming issue: chr22 instead of 22.
- Pick an empty file and redo the subsetting.
- Make a script for chr22-style files.
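A small sketch for both checks, assuming the outputs are gzip- or bgzip-compressed VCF text (Python's `gzip` module reads both). The helper names are hypothetical. The first counts data records so that header-only files can be listed; the second gives the alternative contig spelling to retry with.

```python
import gzip


def count_records(vcf_gz_path):
    """Count non-header, non-blank lines in a compressed VCF."""
    n = 0
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                n += 1
    return n


def alternate_contig_name(name):
    """Switch between '22' and 'chr22' style contig names."""
    return name[3:] if name.startswith("chr") else "chr" + name
```

If a file with zero records becomes non-empty when the region is requested under the alternate name, the naming mismatch is the likely cause.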
shapeit4_full_allsnp (28792) missing one sample
- Sample LP3000448-DNA_A09 is missing.
multi-allelic using bcftools norm
This is potentially problematic. In the current version of merged data, we removed all multi-allelic data.
Did we keep one of the alleles at each multi-allelic site?
No. I can confirm that all multi-allelic sites are removed in the latest merged file /data/pipeline2018/run_0001/merge/chr20/merged_m2M2_snps_nohemi_minac1.vcf.gz
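A sketch of an independent check on VCF text lines, written under the assumption that a leftover multi-allelic site would show up either as a comma-separated ALT field or as several records at the same position (which is what a site split into biallelic records looks like). The function name is hypothetical.

```python
def find_multiallelic(vcf_lines):
    """Scan VCF text lines for signs of multi-allelic sites.

    Returns (comma_alt, repeated): positions whose ALT lists several alleles,
    and positions that occur in more than one record.
    """
    comma_alt = []
    seen = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], int(fields[1]))  # CHROM, POS
        if "," in fields[4]:               # ALT column
            comma_alt.append(key)
        seen[key] = seen.get(key, 0) + 1
    repeated = sorted(key for key, count in seen.items() if count > 1)
    return comma_alt, repeated
```

Both lists being empty would be consistent with all multi-allelic sites having been removed.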
High Mendelian error
We observed high Mendelian error and a low switch error rate in trio LP3000134-dna_c04.
Why?
Flip/switch rate above 50%?
I obtained some flip rates (number of flips / number of switches) above 50%.
Can this happen? Needs investigating.
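Whether this can happen depends on how flips and switches are counted. The sketch below assumes one common definition, which may not be the one used by the tool that produced these numbers: a switch is a change of phase orientation between consecutive het sites, and a flip is a pair of switches at adjacent sites (a single mis-phased het). Under that assumption every flip uses up two switches, so flips / switches cannot exceed 50%, and a higher value would point to a different counting convention.

```python
def count_switches_and_flips(orientation):
    """Count switches and flips along one individual's het sites.

    orientation: list of 0/1 giving, per het site, which true haplotype the
    first inferred haplotype matches (hypothetical input encoding).
    Returns (n_switches, n_flips), where a flip is two switches at
    adjacent sites and each switch belongs to at most one flip.
    """
    switch_positions = [i for i in range(1, len(orientation))
                        if orientation[i] != orientation[i - 1]]
    flips = 0
    i = 0
    while i < len(switch_positions) - 1:
        if switch_positions[i + 1] == switch_positions[i] + 1:
            flips += 1
            i += 2  # both switches are consumed by this flip
        else:
            i += 1
    return len(switch_positions), flips
```

For example, `[0, 0, 1, 0, 0, 1, 1, 1]` has switches at indices 2, 3 and 5, of which the first two form one flip.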
SHAPEIT2 worse accuracy than SHAPEIT4?
Are we sure of this? It was not the case when we made the ASHG slide.
Double check.