Comments (4)
from gwa_tutorial.
In general, the best way to convert from VCF to plink is to split multi-allelic sites, left align/normalize, give unique IDs, and then convert. This is described here: http://apol1.blogspot.com/2014/11/best-practice-for-converting-vcf-files.html
The command is:
bcftools norm -Ou -m -any input.vcf.gz |
bcftools norm -Ou -f human_g1k_v37.fasta |
bcftools annotate -Ob -x ID \
-I +'%CHROM:%POS:%REF:%ALT' |
plink --bcf /dev/stdin \
--keep-allele-order \
--vcf-idspace-to _ \
--const-fid \
--allow-extra-chr 0 \
--split-x b37 no-fail \
--make-bed \
--out output
This requires bcftools 1.9 and PLINK 1.9 or 2.0 (2.0 was still in alpha as of Aug 6th, 2019). It also requires the reference genome in FASTA format (.fa or .fasta).
Setting the IDs this way also lets you keep track of which is ref vs. alt.
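To make the ID scheme concrete, here is a minimal sketch on made-up data. It does not call bcftools; it uses awk to build the same CHROM:POS:REF:ALT string from a toy tab-separated table, then checks for duplicate IDs. The file names (toy_sites.tsv, toy_ids.txt, toy_dup_ids.txt) and the variant values are assumptions for illustration only.

```shell
# Toy table (tab-separated): CHROM POS ID REF ALT. All values are made up.
# The first two rows mimic one multi-allelic site after splitting:
# both rows share the same rs ID but differ in ALT.
printf '1\t1000\trs1\tA\tG\n1\t1000\trs1\tA\tT\n2\t5000\t.\tCT\tC\n' > toy_sites.tsv

# Build CHROM:POS:REF:ALT IDs, the same pattern the annotate step requests.
awk -F'\t' '{ print $1 ":" $2 ":" $4 ":" $5 }' toy_sites.tsv > toy_ids.txt
cat toy_ids.txt

# List any ID that occurs more than once; an empty file means all IDs are unique.
sort toy_ids.txt | uniq -d > toy_dup_ids.txt
```

In this toy case the rs ID alone would be duplicated across the two split records, while the constructed IDs stay distinct and show REF before ALT. The same duplicate check could presumably be run on the second column of the .bim file that PLINK writes.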
rs numbers as IDs are kind of wonky in general, despite being a very common practice. Please see the discussion that freeseek references in their blog post (http://annovar.openbioinformatics.org/en/latest/articles/dbSNP/), written by the author of ANNOVAR.
Normalization is well described here: https://genome.sph.umich.edu/wiki/Variant_Normalization
It's pretty cool that bcftools can be piped that way, but sometimes you want to keep the intermediate files. The tee utility lets you save these: it writes what it receives on standard input to a file and also to standard output for the next pipe, e.g.
echo "Testing, 1, 2, 3..." | tee ./test_file.txt | grep Test
This will both show the output of grep on screen and write "Testing, 1, 2, 3..." to the file test_file.txt.
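The same idea extends to a multi-stage pipe, with one tee after each stage whose output you want to keep. The sketch below uses plain text filters (sort, uniq, wc) as stand-ins for the bcftools stages, and the file names are hypothetical.

```shell
# Stand-in pipeline: each stage is a simple text filter, not bcftools.
# tee saves a copy of each intermediate result and passes it on unchanged.
printf 'b\na\nb\nc\n' |
  sort | tee stage1_sorted.txt |
  uniq | tee stage2_unique.txt |
  wc -l > final_count.txt
```

After this runs, stage1_sorted.txt holds the sorted lines, stage2_unique.txt holds the de-duplicated lines, and final_count.txt holds the final line count. Placed between bcftools stages, tee should save the stream in whatever format -O selected, which is uncompressed BCF for -Ou.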