Genomic predictors and Annotation predictors
Genomic predictors | variant to gene | gene annotation of variants
This repository is the code implementation of the tutorial above.
general input: a dataframe whose row names are SNP rsids and whose rsid column holds the same rsids
general output: the same dataframe with annotation columns added one by one, such as nGene, eGene and cGene
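A minimal sketch of that input/output shape, assuming base R only. The rsids are placeholders, not real variants, and the NA columns only stand in for what the annotation functions would fill in.

```r
# Placeholder rsids (hypothetical, for illustration only).
rsids <- c("rs0000001", "rs0000002", "rs0000003")

# General input: row names and the rsid column both hold the SNP rsid.
rsid_df <- data.frame(rsid = rsids, row.names = rsids, stringsAsFactors = FALSE)

# General output: each annotation step appends its own column.
# NA placeholders stand in for values that get_nGene() / get_eGene() would add.
rsid_df$nGene <- NA_character_
rsid_df$eGene <- NA_character_

print(rsid_df)
```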
- this is the most important part
- the functions are very simple
- get_nGene(), Nearby (nGene)
- get_eGene(), eQTL (eGene)
- get_cGene(), Conformation (cGene)
- get_fGene(), Function (fGene)
- get_pGene(), Phenotype (pGene)
- get_dGene(), Disease (dGene)
- extend_LD(), get all SNPs in high LD (R2 > 0.8)
input:
- rsid df, a dataframe, rowname is SNP rsid, rsid column is rsid
data preparation:
- go to the VEP website: https://asia.ensembl.org/Tools/VEP
- download the VEP annotation result, example: VEP/all.reported.snp.LD.merged_VEP.txt
function pipeline:
- simplify columns
- rename categories from VEP
- remove duplicates according to priority
output:
- one unique functional annotation per SNP, as a dataframe, for example:
- nGene_type: Splicing
- nGene: MYSM1
- nGene_biotype: protein_coding
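The three pipeline steps above can be sketched as follows. This is an illustration under stated assumptions, not the repository's actual get_nGene(): the toy table stands in for a VEP result file, the column names (Uploaded_variation, Consequence, SYMBOL, BIOTYPE) are assumed to match the VEP text output after import, and both the category mapping and the priority order are hypothetical.

```r
# Toy stand-in for a VEP result table (a real file has many more columns,
# and a Consequence field may list several comma-separated terms).
vep <- data.frame(
  Uploaded_variation = c("rsA", "rsA", "rsB", "rsB", "rsC"),
  Consequence = c("intron_variant", "splice_region_variant",
                  "missense_variant", "intron_variant", "intergenic_variant"),
  SYMBOL  = c("GENE1", "GENE1", "GENE2", "GENE3", NA),
  BIOTYPE = c("protein_coding", "protein_coding", "protein_coding", "lncRNA", NA),
  stringsAsFactors = FALSE
)

# Step 1: simplify columns.
ann <- vep[, c("Uploaded_variation", "Consequence", "SYMBOL", "BIOTYPE")]
colnames(ann) <- c("rsid", "consequence", "nGene", "nGene_biotype")

# Step 2: rename VEP categories (hypothetical mapping, adjust as needed).
category_map <- c(
  missense_variant      = "Exonic",
  splice_region_variant = "Splicing",
  intron_variant        = "Intronic",
  intergenic_variant    = "Intergenic"
)
ann$nGene_type <- unname(category_map[ann$consequence])

# Step 3: remove duplicates according to priority (hypothetical order).
priority <- c("Exonic", "Splicing", "Intronic", "Intergenic")
ann$rank <- match(ann$nGene_type, priority)
ann <- ann[order(ann$rsid, ann$rank), ]   # best-ranked row first per SNP
ann <- ann[!duplicated(ann$rsid), ]       # keep only that first row
rownames(ann) <- ann$rsid
ann <- ann[, c("rsid", "nGene_type", "nGene", "nGene_biotype")]

print(ann)
```

Sorting by rank before calling duplicated() is what makes the priority take effect: duplicated() keeps the first row it sees for each rsid.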
input:
- rsid df, a dataframe, rowname is SNP rsid, rsid column is rsid
data preparation:
- download the eQTL data from eQTLGen: https://www.eqtlgen.org/cis-eqtls.html
function pipeline:
- match each rsid to its eQTL genes (one SNP can map to multiple genes)
output:
- eGene: geneA,geneB
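A minimal sketch of the rsid-to-genes matching, assuming the eQTL table has one row per SNP-gene pair with columns named SNP and GeneSymbol (an assumption about the downloaded file; check the actual header). The toy rows below are invented for illustration and this is not the repository's get_eGene().

```r
# Toy stand-in for the eQTL table: one row per SNP-gene pair.
eqtl <- data.frame(
  SNP        = c("rsA", "rsA", "rsB", "rsA"),
  GeneSymbol = c("geneA", "geneB", "geneC", "geneA"),
  stringsAsFactors = FALSE
)

# Input dataframe in the general format (row names = rsid).
ids <- c("rsA", "rsB", "rsC")
rsid_df <- data.frame(rsid = ids, row.names = ids, stringsAsFactors = FALSE)

# Collapse all distinct genes of each SNP into one comma-separated string.
genes_by_snp <- vapply(
  split(eqtl$GeneSymbol, eqtl$SNP),
  function(g) paste(unique(g), collapse = ","),
  character(1)
)

# Look up each rsid; SNPs without any eQTL gene get NA.
rsid_df$eGene <- unname(genes_by_snp[rsid_df$rsid])

print(rsid_df)
```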
input:
- rsid df, a dataframe, rowname is SNP rsid, rsid column is rsid
input:
- rsid, example: unique(asso.df$SNPS)
function pipeline:
- library("LDlinkR")
- LDproxy()
output:
- a dataframe of all high LD SNPs
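The LD extension could look roughly like the sketch below. It is not the repository's extend_LD(): the helper name extend_ld_sketch, the population "CEU", and the fetch argument are assumptions made for illustration, and the code assumes the LDproxy() result has RS_Number and R2 columns. A real run needs the LDlinkR package, network access and a personal LDlink API token, so the usage example swaps in a fake fetch function.

```r
# Hypothetical helper: query proxies per rsid and keep those with R2 > r2_min.
# `fetch` defaults to LDlinkR::LDproxy and is only resolved when actually used.
extend_ld_sketch <- function(rsids, token, r2_min = 0.8, pop = "CEU",
                             fetch = LDlinkR::LDproxy) {
  hits <- lapply(unique(rsids), function(snp) {
    proxies <- fetch(snp = snp, pop = pop, r2d = "r2", token = token)
    r2 <- as.numeric(proxies$R2)
    keep <- !is.na(r2) & r2 > r2_min
    if (!any(keep)) return(NULL)          # no high LD proxies for this SNP
    data.frame(query = snp,
               proxy = proxies$RS_Number[keep],
               R2    = r2[keep],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)                    # NULL entries are dropped by rbind
}

# Offline usage example with a fake fetch function (invented values).
fake_fetch <- function(snp, pop, r2d, token) {
  data.frame(RS_Number = paste0(snp, c("", "_p1", "_p2")),
             R2 = c(1, 0.9, 0.5),
             stringsAsFactors = FALSE)
}
ld_df <- extend_ld_sketch(c("rsA", "rsB"), token = "not-a-real-token",
                          fetch = fake_fetch)
print(ld_df)
```

With a real token, dropping the fetch argument would send one LDproxy() request per rsid, for example over unique(asso.df$SNPS).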