
dap's Introduction

Integrative Genetic Association Analysis using Deterministic Approximation of Posteriors (DAP)

The current implementation is DAP-G!

This repository contains software implementations of a suite of statistical methods for genetic association analysis that integrates genomic annotations. These methods are designed to perform rigorous enrichment analysis, QTL discovery and multi-SNP fine-mapping efficiently. The statistical model and the key algorithm, Deterministic Approximation of Posteriors (DAP), are described in this manuscript and this preprint.

The repository includes source code, scripts and necessary data to replicate the results described in the manuscript. A detailed tutorial to guide the users through some specific analysis tasks is also included.

For questions or comments regarding the software package, please contact Xiaoquan (William) Wen (xwen at umich dot edu).

License

The software is distributed under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. See LICENSE for more details.

Repository directories

  • dap_src: C/C++ source code of the adaptive DAP algorithm with new and improved features (now working with summary-level statistics)

  • torus_src: C/C++ source code of the EM-DAP1 algorithm (for enrichment analysis and QTL discovery)

  • utility: utility scripts for results interpretation, file format conversion etc.

  • version 1: legacy code of the DAP implementation from version 1

User manual

The user manual for DAP-G is available in PDF and HTML.

Tutorial

We are in the process of updating the tutorial for the new DAP-G. The tutorial for the old version of DAP can be found here.

Contributors

  • Xiaoquan Wen (University of Michigan)
  • Roger Pique-Regi (Wayne State University)
  • Yeji Lee (University of Michigan)

Citation


dap's Issues

How to choose grid file?

Hi

I am trying to fine-map my eQTL and sQTL data and then run colocalization with GWAS. I am wondering how to make my grid file. I only have one study, so I know that I can set phi to zero. However, I do not know how to choose omega. Is it OK to just use 0.1, 0.2, 0.4, 0.8, 1.6 as shown in the example?

Thanks

-p prior_file format

DAP-G worked well in my project.
I would like to add prior estimated from functional annotation.
Could you please provide some documentation on the file format required for the -p option?
Is the prior a value from 0 to 1 for each variant with 0 indicating low probability of being causal?
Providing an example prior data file would be great too.

Thank you

Errors in Makefile

Hi!

I was compiling dap and discovered an error in the Makefiles for dap. The libraries are linked with "-L /usr/local/lib", which should be "-L/usr/local/lib" without a space; otherwise the linker won't find the libraries. I installed the GSL libraries in a nonstandard location and linked them like so...

main: main.o controller.o parser.o SSLR.o
	g++ -fopenmp -O3 main.o parser.o controller.o SSLR.o -L/apps/gsl-2.4/lib -lgsl -lgslcblas -lm -o dap-g
static: main.o controller.o parser.o SSLR.o
	g++ -fopenmp -O3 main.o parser.o controller.o SSLR.o -L/apps/gsl-2.4/lib -lgsl -lgslcblas -lm -static -o dap-g.static
main.o: main.cc
	g++ -c -O3 -I/apps/gsl-2.4/include main.cc
parser.o: parser.cc parser.h
	g++ -c -O3 -I/apps/gsl-2.4/include parser.cc
controller.o: controller.cc controller.h
	g++ -c -O3 -fopenmp -I/apps/gsl-2.4/include controller.cc
SSLR.o: SSLR.h SSLR.cc
	g++ -c -O3 -I/apps/gsl-2.4/include SSLR.cc
clean:
	rm -f *.o dap-g dap-g.static

`dap1` source

Hi there!
I need the source code for the binary version1/dap1_src/dap1 so that the system administrator at my organization can install it on our cluster. Is there any chance of getting it?

I had mixed success using a binary I built from version1/dap_src as a replacement. Should both of them work similarly for enloc?

Thanks!

Generate eQTL weights using sufficient summary statistics/individual level data

I am trying to follow the PTWAS tutorial/docs (PTWAS_scan#1-ptwas-eqtl-weight-construction) using sufficient summary statistics (SSS). After the DAP-G step on SSS, there are no instructions or hints on what to do with the output or how to generate weights from it. Is it only possible to generate the eQTL weights from individual-level data? If so, how can one generate the sbams format? This does not appear to be documented anywhere; there is only a description of what an sbams file looks like. I would appreciate any advice on generating these eQTL weights. It would be great to see reproducible command-line usage showing exactly how PTWAS_scan.weights.hg38.txt.gz.tbi was generated. The relevant information may be here: https://github.com/xqwen/dap/tree/master/gtex_v8_analysis, but how can one use it to generate eQTL weights for one's own data (not GTEx v8 data)?
Many thanks

No cluster output

Hi,
I am trying to use dap together with fastenloc. The perl script that summarizes the dap output expects to find cluster level output in the file. However, in my analysis of z-scores, I do not get any clusters (all SNPs are in cluster -1). How could I solve this issue?

Could I tune some parameters to get clusters?

Or, if indeed all SNPs are independent, can the output of cluster information (containing only one SNP) be enforced?

Thanks for your assistance

Best,
Matthias

LD r or r^2

Hi,
from the documentation it is not entirely clear whether LD is to be provided as r (correlation) or r^2. Maybe you could add to your README how you generate this data, for instance with plink (--r or --r2 option)?
Thanks a lot for clarifying.
Best,
Matthias

Segmentation fault of torus

Hi Xiaoquan,

I'm trying to run the torus example, but the program always fails with a segmentation fault. I cloned the git repository from https://github.com/xqwen/dap. Both dap-g and torus compiled successfully. The dap-g example finished successfully, but the torus example failed.
The geuv data were downloaded from http://www-personal.umich.edu/~xwen/dap/data/geuv/.

Any suggestion?

Thanks for your help.


The following are the details.

localhost@torus_src~ pwd
/home/wqshi/packages/dap-master/torus_src

localhost@torus_src~ ls
classdef.h controller.o examples geuv.egene.rst geuv.snp.map.gz locus.cc logistic.h main.cc Makefile torus
controller.cc data geuv.annot.gz geuv.gene.map.gz geuv.summary.bf.gz logistic.cc logistic.o main.o README.md

localhost@torus_src~ torus -d geuv.summary.bf.gz --load_bf -smap geuv.snp.map.gz -gmap geuv.gene.map.gz -annot geuv.annot.gz -est -dump_prior gene_prior > geuv.enrichment.est
Segmentation fault (core dumped)

localhost@torus_src~ g++ --version
g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

dap-g : Segmentation fault

I am trying to run the DAP on my dataset with the following command:

dap-g -d gene.sbams.dat -p gene.prior -ld_control 0.5 --all -t 4 > output_file

The tool fails with "Segmentation fault".

Any help will be appreciated.

Error: unknown option "-d_z"

Hi!

I ran make on the dap_src folder and obtained the dap-g executable.

However, when I ran the following code
dap-g -d_z sample_data/sim.1.zval.dat -d_ld sample_data/sim.1.LD.dat

I got the error of
Error: unknown option "-d_z"

Please advise; Thanks!

Cheers,
Marie

A question about the GTEx case

Hi Xiaoquan:
Thank you for your tools; they are great.

  1. I found that there is no t-statistic in the Liver summary file "Liver.allpairs.txt.gz" from GTEx. Did I download the wrong data, or do I need to convert p_normal to t myself?
  2. I am just learning this: what do the numbers (1, 2) in the comment file mean, and how do I get them?
  3. Does running this step (DAP-G) before colocalization (R package coloc) mean that only the SNPs with PIP above a threshold are analyzed for causal SNPs in colocalization, rather than all SNPs?
    Recently I have been reading your articles; they are all very good work.
    I look forward to your reply.

DAP-G always returns 1 as exit code

Hi,

I have noticed that whenever I run any of the DAP-G commands following these instructions with the provided test files, the exit code is 1. All outputs are produced and everything seems fine, but if I run echo $? I get 1, even though the commands produced no error messages.

I am guessing that this has something to do with the last couple of lines in the main.cc file:

    // all done, print all configs
    con.print_dap_config();

    con.run();
    return 1;

Is this behavior expected?

Thank you in advance,

probalica

example data SYY=515.6

Hi Xiaoquan,

Could you share the formula used to calculate SYY? I get 1527.381, which differs from 515.6.

Thanks.

dap-g -t 8 -ld_control 0.25 -converg_thresh 0.01 -d sample_data/sim.1.sbams.dat --dump_summary2 

segmentation fault with sample z score and LD data

Hi,
I am encountering a problem when running the code I checked out and compiled today. Any idea what could be the problem?
Best,
Matthias

./dap-g -d_z sample_data/sim.1.zval.dat -d_ld sample_data/sim.1.LD.dat

============ DAP Configuration ============

INPUT

    * summary-level data (z-scores and LD matrix)
    * number of candidate SNPs: 1001

PROGRAM OPTIONS

    * maximum model size allowed [-msize]: 1001 (no restriction)
    * LD control threshold [-ld_control]: 0.25
    * normalizing constant convergence threshold [-converg_thresh]: 1.00e-02 (log10 scale)

RUN LOG

    Model_Size      candidates        log10(NC)

Segmentation fault (core dumped)

Segmentation fault in sQTL Fine-mapping of GTEx Data

Hi Xiaoquan,

I'm trying to fine-map my sQTL data, following the steps in this document, which summarizes the processing of GTEx v8 data. dap-g compiled successfully, but running it on my own data fails with a segmentation fault.

The command was:
~/software/dap-master/dap_src/dap-g -d chr9:99960032:99960105:clu_59499:ENSG00000136874.10.sbams.dat -p ENSG00000136874.10.prior -ld_control 0.5 --all -t 1 > chr9:99960032:99960105:clu_59499:ENSG00000136874.10.txt

Any suggestion?
Thanks

How to include NA genotypes in sbam file

Hi,
Does the dap-g individual-level data format (sbams) support missing genotypes for a given variant in a given sample, or must one impute the genotype matrix before running dap-g?
Best wishes,
Thank you
