rajlabmssm / echold Goto Github PK

View Code? Open in Web Editor NEW

0.0 0.0 0.0 1.85 MB

echoverse module: LD downloading and processing

R 97.28% Python 2.72%

1000genomes echoverse genomics linkage-disequilibrium ukbiobank

echold's People

Contributors

Watchers

echold's Issues

Fix UKB LD after alkesgroup data move

Alkes price lab moved their data to Google Storage. This is....not great, because it may mean all the data is now behind a paywall.

https://alkesgroup.broadinstitute.org/UKBB_LD/

Will see if I can work around this, or at least provide users a way to supply their GS credentials and still get the data.

Enable custom LD

Enable custom LD panel by:

VCFs
pre-computed LD matrices

Locus specific LD VCF file generated from 1000genomes Reference Panel is empty

Hi,

It turnes out that when running the "full pipeline vignette" the bgzipped VCF file generated under ./RESULTS/GWAS/Nalls23andMe_2019/BST1/LD is empty.

I believe this is causing the "Segmentation fault (core dumped)" error, so that no finemapping tool works downstream since we are missing the locus specific LD structure.

r$> library(echolocatoR)                                                                                                                                                                                   
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
Bioconductor version 3.12 (BiocManager 1.30.12), ?BiocManager::install for help
Bioconductor version '3.12' is out-of-date; the current release version '3.14' is available with R version '4.1'; see https://bioconductor.org/install

data("Nalls_top_SNPs"); 
    top_SNPs <- import_topSNPs( 
      # topSS = "~/Desktop/Fine_Mapping/Data/GWAS/Nalls23andMe_2019/Nalls2019_TableS2.xlsx", 
      topSS = Nalls_top_SNPs, 
      chrom_col = "CHR",  
      position_col = "BP",  
      snp_col="SNP", 
      pval_col="P, all studies",  
      effect_col="Beta, all studies",  
      gene_col="Nearest Gene",  
      locus_col = "Nearest Gene", 
      grouping_vars = c("Locus"), 
      remove_variants = "rs34637584")  
    #> [1] "+ Assigning gene_col and locus_col independently" 
    head(top_SNPs)

     Locus    Gene CHR       POS        SNP        P  Effect
1:   ASXL3   ASXL3  18  31304318  rs1941685 1.69e-08  0.0531
2:    BAG3    BAG3  10 121415685 rs72840788 1.57e-11  0.0763
3:    BIN3    BIN3   8  22525980  rs2280104 1.16e-08  0.0556
4:   BRIP1   BRIP1  17  59917366 rs61169879 9.28e-10  0.0820
5:    BST1    BST1   4  15737348  rs4698412 2.06e-28  0.1035
6: C5orf24 C5orf24   5 134199105 rs11950533 7.16e-09 -0.0916

r$> fullSS_path <- example_fullSS(fullSS_path=file.path("/mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR","Nalls23andMe_2019.fullSS_subset.tsv"))

r$> fullSS_path                                                                                                                                                                                            
[1] "/mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/Nalls23andMe_2019.fullSS_subset.tsv"

                                              top_SNPs = top_SNPs,   
                                              #  It's best to give absolute paths 
                                              results_dir = file.path("/mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR","results"), 
                                              loci = c("BST1","MEX3C"),# top_SNPs$Locus, 
                                              dataset_name = "Nalls23andMe_2019", 
                                              dataset_type = "GWAS",   
                                              force_new_subset = T, 
                                              force_new_LD = T, 
                                              force_new_finemap = T, 
                                              remove_tmps = F, 
                                               
                     # SUMMARY STATS ARGUMENTS 
                     fullSS_path = fullSS_path, 
                     query_by ="tabix", 
                     chrom_col = "CHR", position_col = "POS", snp_col = "RSID", 
                     pval_col = "p", effect_col = "beta", stderr_col = "se", 
                     freq_col = "freq", MAF_col = "calculate", 
                     A1_col = "A1", 
                     A2_col = "A2", 
                      
                     # FILTERING ARGUMENTS 
                     ## It's often desirable to use a larger window size  
                     ## (e.g. 2Mb which is bp_distance=500000*2),  
                     ## but we use a small window here to speed up the process.  
                     bp_distance = 10000,#500000*2, 
                     min_MAF = 0.001,   
                     trim_gene_limits = F, 
                      
                     # FINE-MAPPING ARGUMENTS 
                     ## General 
                     finemap_methods = c("ABF","FINEMAP","SUSIE","POLYFUN_SUSIE"),  
                     n_causal = 5, 
                     PP_threshold = .95,  
                      
                     # LD ARGUMENTS  
                     LD_reference = "1KGphase1",#"UKB", 
                     superpopulation = "EUR", 
                     download_method = "axel", 
                      
                     # PLOT ARGUMENTS  
                     ## general    
                     plot.types=c("fancy"), 
                     ## Generate multiple plots of different window sizes;  
                     ### all SNPs, 4x zoomed-in, and a 50000bp window 
                     plot.zoom = c("all","4x","10x"), 
                     ## XGR 
                     # plot.XGR_libnames=c("ENCODE_TFBS_ClusteredV3_CellTypes"),  
                     ## Roadmap 
                     plot.Roadmap = F, 
                     plot.Roadmap_query = NULL, 
                     # Nott et al. (2019) 
                     plot.Nott_epigenome = T,  
                     plot.Nott_show_placseq = T,                                                                                                                                                           
[1] "+ CONDA:: Activating conda env 'echoR'"
[1] "Checking for tabix installation..."
[1] "Checking for bcftools installation..."

)   )  ) ))))))}}}}}}}} {{{{{{{{{(((((( (  (   (
BST1 (1 / 2)
)   )  ) ))))))}}}}}}}} {{{{{{{{{(((((( (  (   (
[1] "+ CONDA:: Identified bgzip executable in echoR env."
[1] "Determining chrom type from file header"
[1] "LD:: Standardizing summary statistics subset."
[1] "++ Preparing Gene col"
[1] "++ Preparing A1,A1 cols"
[1] "++ Preparing MAF,Freq cols"
[1] "++ Inferring MAF from frequency column..."
[1] "++ Preparing N_cases,N_controls cols"
[1] "++ Preparing `proportion_cases` col"
[1] "++ Calculating `proportion_cases`."
[1] "++ Preparing N col"
[1] "+ Preparing sample_size (N) column"
[1] "++ Computing effective sample size."
[1] "++ Preparing t-stat col"
[1] "+ Calculating t-statistic from Effect and StdErr..."
[1] "++ Assigning lead SNP"
[1] "++ Ensuring Effect, StdErr, P are numeric"
[1] "++ Ensuring 1 SNP per row"
[1] "++ Removing extra whitespace"
[1] "++ Saving subset ==> /mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/results/GWAS/Nalls23andMe_2019/BST1/BST1_Nalls23andMe_2019_subset.tsv.gz"
[1] "LD:: Using 1000Genomes as LD reference panel."
[1] "LD Reference Panel = 1KGphase1"
[1] "+ LD:: Querying 1KG remote server."
[1] "+ CONDA:: Identified tabix executable in echoR env."
[1] "LD:: Querying VCF subset"
[1] "/home/acarrasco/.conda/envs/echoR/bin/tabix -fh -p vcf ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521//ALL.chr4.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 4:15727389-15747197 > /mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/results/GWAS/Nalls23andMe_2019/BST1/LD/BST1.1KGphase1.vcf"
Segmentation fault (core dumped)
[1] "LD:BCFTOOLS:: Compressing vcf file..."
[1] "LD:TABIX:: Re-indexing vcf.gz..."
[1] "LD:BCFTOOLS:: Subsetting vcf to only include EUR individuals ( 378 / 1091 )."
Failed to read from /mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/results/GWAS/Nalls23andMe_2019/BST1/LD/BST1.1KGphase1.vcf.gz: unknown file type
[1] "LD:PLINK:: Converting vcf.gz to .bed/.bim/.fam"
PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/results/GWAS/Nalls23andMe_2019/BST1/LD/plink.log.
Options in effect:
  --out /mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/results/GWAS/Nalls23andMe_2019/BST1/LD/plink
  --vcf /mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/results/GWAS/Nalls23andMe_2019/BST1/LD/BST1.1KGphase1.vcf.gz

257891 MB RAM detected; reserving 128945 MB for main workspace.

Error: File read failure.
[1] "LD:snpStats:: Computing LD (stats = R)"
Error in data.table::fread(bim_path, col.names = c("CHR", "SNP", "V3",  : 
  File '/mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/results/GWAS/Nalls23andMe_2019/BST1/LD/plink.bim' does not exist or is non-readable. getwd()=='/mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR'
Fine-mapping complete in:
Time difference of 11.7 secs

)   )  ) ))))))}}}}}}}} {{{{{{{{{(((((( (  (   (
MEX3C (2 / 2)
)   )  ) ))))))}}}}}}}} {{{{{{{{{(((((( (  (   (
[1] "Determining chrom type from file header"
[1] "LD:: Standardizing summary statistics subset."
[1] "++ Preparing Gene col"
[1] "++ Preparing A1,A1 cols"
[1] "++ Preparing MAF,Freq cols"
[1] "++ Inferring MAF from frequency column..."
[1] "++ Preparing N_cases,N_controls cols"
[1] "++ Preparing `proportion_cases` col"
[1] "++ Calculating `proportion_cases`."
[1] "++ Preparing N col"
[1] "+ Preparing sample_size (N) column"
[1] "++ Computing effective sample size."
[1] "++ Preparing t-stat col"
[1] "+ Calculating t-statistic from Effect and StdErr..."
[1] "++ Assigning lead SNP"
[1] "++ Ensuring Effect, StdErr, P are numeric"
[1] "++ Ensuring 1 SNP per row"
[1] "++ Removing extra whitespace"
[1] "++ Saving subset ==> /mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/results/GWAS/Nalls23andMe_2019/MEX3C/MEX3C_Nalls23andMe_2019_subset.tsv.gz"
[1] "LD:: Using 1000Genomes as LD reference panel."
[1] "LD Reference Panel = 1KGphase1"
[1] "+ LD:: Querying 1KG remote server."
[1] "+ CONDA:: Identified tabix executable in echoR env."
[1] "LD:: Querying VCF subset"
[1] "/home/acarrasco/.conda/envs/echoR/bin/tabix -fh -p vcf ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521//ALL.chr18.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 18:48673708-48692728 > /mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/results/GWAS/Nalls23andMe_2019/MEX3C/LD/MEX3C.1KGphase1.vcf"
Segmentation fault (core dumped)
[1] "LD:BCFTOOLS:: Compressing vcf file..."
[1] "LD:TABIX:: Re-indexing vcf.gz..."
[1] "LD:BCFTOOLS:: Subsetting vcf to only include EUR individuals ( 378 / 1091 )."
Failed to read from /mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/results/GWAS/Nalls23andMe_2019/MEX3C/LD/MEX3C.1KGphase1.vcf.gz: unknown file type
[1] "LD:PLINK:: Converting vcf.gz to .bed/.bim/.fam"
PLINK v1.90b6.9 64-bit (4 Mar 2019)            www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/results/GWAS/Nalls23andMe_2019/MEX3C/LD/plink.log.
Options in effect:
  --out /mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/results/GWAS/Nalls23andMe_2019/MEX3C/LD/plink
  --vcf /mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/results/GWAS/Nalls23andMe_2019/MEX3C/LD/MEX3C.1KGphase1.vcf.gz

257891 MB RAM detected; reserving 128945 MB for main workspace.

Error: File read failure.
[1] "LD:snpStats:: Computing LD (stats = R)"
Error in data.table::fread(bim_path, col.names = c("CHR", "SNP", "V3",  : 
  File '/mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/results/GWAS/Nalls23andMe_2019/MEX3C/LD/plink.bim' does not exist or is non-readable. getwd()=='/mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR'
Fine-mapping complete in:
Time difference of 8.7 secs
Error in `[.data.table`(x, r, vars, with = FALSE) : 
  column(s) not found: SNP

Then, we see how the loci specific LD file is empty, leading to the segmentation fault error

acarrasco@kronos:/mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/results/GWAS/Nalls23andMe_2019/BST1/LD$ gunzip BST1.1KGphase1.vcf.gz 
acarrasco@kronos:/mnt/rreal/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/results/GWAS/Nalls23andMe_2019/BST1/LD$ cat BST1.1KGphase1.vcf

Same output when I tried to use the phase 3 from 1000G Panel

Conclussion

I believe there is an issue accessing the 1000Genome Ref Panel. When I changed the reference panel to UKB, the finemap_loci() function worked

get_LD() error: invalid 'path' argument

I am trying to compute LD matrix from summary statistics, and I want to use the 1000 genome phase 3 reference panel.

echoLD::get_LD(
  query_dat = sumstats,
  locus_dir = save_dir,
  query_genome="hg38",
  LD_reference = "1KGphase3",
  superpopulation = "EUR",
  nThread = 16
)

This works for some loci, but not always. For example, I have a loci on chr2:8906676-9922165, the error said invalid 'path' argument, but I am sure the path is correct.

What is the issue behind it? How should I resolve this problem?

Improve error handling during LD-sumstats merging

Add a message making it more clear when the function fails due to lack of overlapping RSIDS. Perhaps with a backup strategy that involves using the genomic coordinates.

See RajLabMSSM/echolocatoR#47

Replace`bcftools` functionality with R package

RajLabMSSM/echolocatoR#60 (comment)

`get_LD_UKB`: dylib errors during tests

In R console

Running this example seems to work totally fine on my Mac (albeit slowly and taking lots of memory, as usual):

query_dat <- echodata::BST1[seq(1, 50), ]
locus_dir <- file.path(tempdir(), echodata::locus_dir)
LD_list <- echoLD:::get_LD_UKB(
    query_dat = query_dat,
    locus_dir = locus_dir)

Using UK Biobank LD reference panel.
+ UKB LD file name: chr4_14000001_17000001
Downloading full .gz/.npz UKB files and saving to disk.
Downloading with axel [1thread(s)]: 
 https://data.broadinstitute.org/alkesgroup/UKBB_LD/chr4_14000001_17000001.gz ==> /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpFq7D4S/results/GWAS/Nalls23andMe_2019/BST1/LD/chr4_14000001_17000001.gz

axel download complete.
Time difference of 1.4 secs
Downloading with axel [1thread(s)]: 
 https://data.broadinstitute.org/alkesgroup/UKBB_LD/chr4_14000001_17000001.npz ==> /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpFq7D4S/results/GWAS/Nalls23andMe_2019/BST1/LD/chr4_14000001_17000001.npz

axel download complete.
Time difference of 22.9 secs
load_ld() python function input: /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpFq7D4S/results/GWAS/Nalls23andMe_2019/BST1/LD/chr4_14000001_17000001
Reading LD matrix into memory. This could take some time...
/var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpFq7D4S/results/GWAS/Nalls23andMe_2019/BST1/LD/chr4_14000001_17000001.gz
/var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpFq7D4S/results/GWAS/Nalls23andMe_2019/BST1/LD/chr4_14000001_17000001.npz
Processed URL: /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpFq7D4S/results/GWAS/Nalls23andMe_2019/BST1/LD/chr4_14000001_17000001
Some other message at the end
+ Full UKB LD matrix: 22,085 x 22,085
+ Full UKB LD SNP data.table: 22,085 x 5
50 x 50 LD_matrix (sparse)
Converting obj to sparseMatrix.
Saving sparse LD matrix ==> /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpFq7D4S/results/GWAS/Nalls23andMe_2019/BST1/LD/BST1.UKB_LD.RDS
+ Removing .gz temp files.
+ Removing .npz temp files.

In tests during package checks

Nevertheless, when running unit tests during check, this error is produced. Not sure why testthat wouldn't have access to all the same libraries that the main R console does.

INTEL MKL ERROR: dlopen(/opt/anaconda3/lib/libmkl_intel_thread.1.dylib, 0x0009): Library not loaded: @rpath/libiomp5.dylib
  Referenced from: /opt/anaconda3/lib/libmkl_intel_thread.1.dylib
  Reason: tried: '/Library/Frameworks/R.framework/Resources/lib/libiomp5.dylib' (no such file), '/Library/Java/JavaVirtualMachines/jdk1.8.0_241.jdk/Contents/Home/jre/lib/server/libiomp5.dylib' (no such file).
Intel MKL FATAL ERROR: Cannot load libmkl_intel_thread.1.dylib.

Session info

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] echoLD_0.99.6   snpStats_1.46.0 Matrix_1.4-1    survival_3.4-0 

loaded via a namespace (and not attached):
  [1] bitops_1.0-7                matrixStats_0.62.0          bit64_4.0.5                
  [4] filelock_1.0.2              progress_1.2.2              httr_1.4.4                 
  [7] rprojroot_2.0.3             GenomeInfoDb_1.32.3         tools_4.2.1                
 [10] utf8_1.2.2                  R6_2.5.1                    DT_0.24                    
 [13] DBI_1.1.3                   BiocGenerics_0.42.0         tidyselect_1.1.2           
 [16] prettyunits_1.1.1           bit_4.0.4                   curl_4.3.2                 
 [19] compiler_4.2.1              cli_3.3.0                   Biobase_2.56.0             
 [22] basilisk.utils_1.8.0        xml2_1.3.3                  DelayedArray_0.22.0        
 [25] rtracklayer_1.57.0          readr_2.1.2                 rappdirs_0.3.3             
 [28] stringr_1.4.1               digest_0.6.29               Rsamtools_2.12.0           
 [31] piggyback_0.1.4             R.utils_2.12.0              basilisk_1.8.1             
 [34] XVector_0.36.0              pkgconfig_2.0.3             htmltools_0.5.3            
 [37] echodata_0.99.11            MatrixGenerics_1.8.1        dbplyr_2.2.1               
 [40] fastmap_1.1.0               BSgenome_1.64.0             htmlwidgets_1.5.4          
 [43] rlang_1.0.4                 rstudioapi_0.14             RSQLite_2.2.16             
 [46] BiocIO_1.6.0                generics_0.1.3              jsonlite_1.8.0             
 [49] echoconda_0.99.6            BiocParallel_1.30.3         R.oo_1.25.0                
 [52] zip_2.2.0                   dplyr_1.0.9                 VariantAnnotation_1.42.1   
 [55] RCurl_1.98-1.8              magrittr_2.0.3              GenomeInfoDbData_1.2.8     
 [58] Rcpp_1.0.9                  S4Vectors_0.34.0            fansi_1.0.3                
 [61] reticulate_1.25             R.methodsS3_1.8.2           lifecycle_1.0.1            
 [64] stringi_1.7.8               yaml_2.3.5                  SummarizedExperiment_1.26.1
 [67] zlibbioc_1.42.0             BiocFileCache_2.4.0         grid_4.2.1                 
 [70] blob_1.2.3                  parallel_4.2.1              crayon_1.5.1               
 [73] dir.expiry_1.4.0            lattice_0.20-45             Biostrings_2.64.1          
 [76] splines_4.2.1               GenomicFeatures_1.48.3      hms_1.1.2                  
 [79] KEGGREST_1.36.3             pillar_1.8.1                GenomicRanges_1.48.0       
 [82] rjson_0.2.21                codetools_0.2-18            biomaRt_2.52.0             
 [85] stats4_4.2.1                pkgload_1.3.0               XML_3.99-0.10              
 [88] glue_1.6.2                  BiocManager_1.30.18         data.table_1.14.2          
 [91] tzdb_0.3.0                  png_0.1-7                   vctrs_0.4.1                
 [94] tidyr_1.2.0                 purrr_0.3.4                 assertthat_0.2.1           
 [97] cachem_1.0.6                openxlsx_4.2.5              echotabix_0.99.7           
[100] restfulr_0.0.15             downloadR_0.99.3            tibble_3.1.8               
[103] GenomicAlignments_1.32.1    AnnotationDbi_1.58.0        memoise_2.0.1              
[106] IRanges_2.30.1              ellipsis_0.3.2              here_1.0.1

Liftover `dat` before querying LD panel

`Error : 'snpStats' required; try BiocManager::install('snpStats')`

For some reason this shows up even when snpStats is already installed...

LD reference optimization according to GWAS ancestry from input

Is your feature request related to a problem? Please describe.
We want to be able to get the exact superpopulation, or even better, the subpopulation from the input GWAS, by correlating the input GWAS MAFs against the the LDref MAFs.

Describe the solution you'd like

Get the GWAS MAFs. Split into a) Genotyped, b) Geenotyped + Imputed and perform the correlation
Get the most correlated superpopulation / subpopulation, and use the subset to perform fine-mapping

Describe alternatives you've considered
If the inputted superpopulation does not really mach, rise a warning for now

Additional context
Read what is out there and maybe try to perform an undustment od the LD reference (ie deconvolution methods)

NOTE
Go to translate_population.R

Connect to UKBB 150K

**Using UKBB 150 to calculate LD structure **
Maybe acessing UKB 150K is another alternative to calculate LD
https://www.nature.com/articles/s41586-022-04965-x

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.