lcv's Issues

Outlier removal procedure

Hi there, thanks for such an interesting method! I calculated GCP estimates between genetically correlated traits, but I am getting non-significant GCP p-values, and I wanted to explore these relationships further by detecting and removing potential outliers in the GWAS summary statistics. The supplemental material of the paper states that "non-significant LCV p-values do not constitute evidence against a causal effect", and then discusses, for example, how the relationship between BMI and T2D was explored further using an outlier removal approach, which is not generally recommended but which helped the authors clarify the genetic relationship between these two traits. Would it be possible to share the code used for this outlier removal approach?

negative heritability estimates

Dear LCV users,

When using LCV as shown in the tutorial:

LCV = RunLCV(data[,"L2"], data[,"Z.x"], data[,"Z.y"])

I would often get the error "NaNs produced, probably due to negative heritability estimates."

However, when providing sample sizes, I got LCV to run without errors:

LCV = RunLCV(data[,"L2"], data[,"Z.x"], data[,"Z.y"],
             n.1=n.1, n.2=n.2, ldsc.intercept=0)

Notice that ldsc.intercept is set to 0, not 1 as stated in the header of RunLCV.R.

How to upload LD scores

Hello there!

I was trying to run an LCV analysis, but I did not understand how to read in the LD score files. The LD scores I use with LDSC come as many files in a directory (one per chromosome), but the code only accepts a single file. Is there a way to load all of the LD score files?
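
LDSC distributes LD scores as one file per chromosome (e.g. eur_w_ld_chr/1.l2.ldscore.gz through 22.l2.ldscore.gz), whereas RunLCV expects a single vector of LD scores aligned to the summary statistics. A minimal sketch of stacking the per-chromosome files, assuming the standard LDSC file layout and column names (CHR, SNP, BP, L2); the function name and the chrs argument are illustrative, not part of LCV:

```r
# Stack per-chromosome LDSC LD score files into one data frame.
# Assumes files named <dir>/<chr>.l2.ldscore.gz with a tab-separated
# header containing (at least) SNP and L2 columns.
read_ldscores <- function(dir, chrs = 1:22) {
  files <- file.path(dir, paste0(chrs, ".l2.ldscore.gz"))
  do.call(rbind, lapply(files, function(f)
    read.table(gzfile(f), header = TRUE, stringsAsFactors = FALSE)))
}

# ldsc <- read_ldscores("eur_w_ld_chr")
# After merging with your summary statistics on the SNP column, pass the
# reordered ldsc$L2 as the first argument of RunLCV.
```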

Effect of trait1 on trait2

I have a question regarding the GCP z-score.
If, for example, the GCP between trait1 and trait2 is 0.65, I'm interested in estimating the effect trait1 has on trait2.
Am I right in assuming that multiplying the GCP z-score by the GCP standard error gives me the causal effect (beta value) of trait1 on trait2?

How to interpret the results that GCP is close to 0

When using the TwoSampleMR package for analysis, I obtained a bidirectional causal relationship between major depression and GERD, with both directions significant. When using the LCV approach, I got the following results; how do I interpret them? Does the fact that the GCP is very close to 0 mean that there is no causal relationship between the two traits?

Example script in case of own ss data

Thanks for posting the scripts for this interesting method. I have been trying to run LCV on my own data but haven't been successful so far. The example script seems to be based on simulated data, which I don't want. I have prepared the three input files that the description says are needed: ell (LD scores), z.1 (effects on trait 1) and z.2 (effects on trait 2). I also sourced the scripts RunLCV.R and MomentFunctions.R. I then tried adapting ExampleScript.R to run the RunLCV function on my own datasets, but I get many different errors.

For instance, is the line below correct for running the actual LCV model?
LCV <- RunLCV(ell, z.1, z.2, no.blocks, crosstrait.intercept, ldsc.intercept, weights, sig.threshold, N.1, N.2, intercept.12)

But this keeps giving me errors like 'requires numeric/complex matrix/vector arguments' or 'undefined columns selected'.

Could you please provide some explanation? It would be perfect if you could upload a script with only the lines needed to run the LCV model, given that one has their own data (in the files ell, z.1 and z.2).

thanks!
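
For what it's worth, the two error messages quoted above often come from passing one-column data frames (or extra positional arguments) where RunLCV expects plain numeric vectors. A minimal, hedged sketch of a run on one's own data; the file names are placeholders and the coercion helper is illustrative, not part of LCV:

```r
# Coerce a one-column data frame (or list) to a plain numeric vector;
# passing data frames directly is a common cause of
# "requires numeric/complex matrix/vector arguments".
as_num_vec <- function(x) as.numeric(unlist(x))

# source("MomentFunctions.R"); source("RunLCV.R")   # from this repo
# ell <- as_num_vec(read.table("ell.txt"))          # LD scores (placeholder file name)
# z.1 <- as_num_vec(read.table("z.1.txt"))          # Z scores, trait 1
# z.2 <- as_num_vec(read.table("z.2.txt"))          # Z scores, trait 2
# stopifnot(length(ell) == length(z.1), length(z.1) == length(z.2))
# LCV <- RunLCV(ell, z.1, z.2)  # only the 3 required args; name any extras
```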

Zscore & GCP give opposite interpretations

It would be great if you could help me out with an issue I've been having. I ran some LCV analyses and I'm a bit confused about the results. The zscore is positive (which, according to the explanatory notes in the LCV script, means that trait 1 causes trait 2), while the GCP statistic is negative (which, according to the notes, means that trait 2 causes trait 1). Looking at the p-values, it seems that trait 1 is causal for trait 2 (in agreement with the zscore, but not with the GCP statistic).

Could you let me know what the right interpretation is for these results?

zscore = 5.84
gcp.pm = -0.82
gcp.pse = 0.14
rho.est = -0.17
rho.err = 0.008
pval.fullycausal.1 = 7.4E-06
pval.fullycausal.2 = 0.951
h2.zscore.1 = 278.19
h2.zscore.2 = 165.81
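
One arithmetic note on the numbers above: if zscore were simply gcp.pm divided by gcp.pse, the magnitudes would agree almost exactly while the signs would differ, which is exactly the discrepancy described (this is an observation about the quoted numbers, not a claim about how RunLCV computes zscore internally):

```r
gcp.pm  <- -0.82   # values quoted above
gcp.pse <- 0.14
zscore  <- 5.84
abs(gcp.pm) / gcp.pse   # ~5.86: agrees with |zscore| but not with its sign
```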

A tutorial would be very helpful

Hi, I would very much like to use your algorithm to explore a few ideas I have in mind. A simple tutorial would be a great help in guiding users through the implementation. Would it be possible for you to post the code you used to analyze the UK Biobank traits in the Nature Genetics paper? Thanks a lot!

example R dataset

I am interested in using your R program. Could you provide an example dataset?

Equation to compute Estimates of mixed 4th moments

The estimation equations in the file estimate_k4.m (lines 74-76):

% Estimates of mixed 4th moments
k41=weighted_mean(nZ2.*nZ1.^3-3*nZ1.*nZ2*(intercept1/s1^2)-3*(nZ1.^2-intercept1/s1^2)*intercept12/s1/s2,weights);%/sum(nZ1.*nZ2)^2;
k42=weighted_mean(nZ1.*nZ2.^3-3*nZ1.*nZ2*(intercept2/s2^2)-3*(nZ2.^2-intercept2/s2^2)*intercept12/s1/s2,weights)

seem to have different signs on some terms of the sum than the ones presented in the article
https://www.biorxiv.org/content/biorxiv/early/2018/04/17/205435.full.pdf (equation 8, p. 16).
In the article, all terms in the sum have a plus sign in front of them, but in the code the terms -3*nZ1.*nZ2*(intercept1/s1^2) and -3*(nZ1.^2-intercept1/s1^2)*intercept12/s1/s2 have minus signs in front of them. So is there a small error, or have I misunderstood something? (I think the article provides rather little justification for the steps of equation 8.)
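
For background, the quantities being estimated here are mixed fourth moments of (approximately) Gaussian Z scores; by Isserlis' theorem, for zero-mean, unit-variance jointly normal X and Y with correlation rho, E[X^3 Y] = 3*rho. A quick Monte Carlo check of that identity alone (not of the LCV estimator or of the signs in question):

```r
set.seed(1)
n   <- 1e6
rho <- 0.5
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)  # cor(x, y) = rho in expectation
mean(x^3 * y)  # close to 3 * rho = 1.5 (Isserlis' theorem)
```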

opposite signs of zscore and gcp.pm

Thanks for providing the software for LCV.
We were running LCV for several pairs of traits using https://github.com/lukejoconnor/LCV/blob/master/R/RunLCV.R.

However, we noticed that the signs of zscore and gcp.pm are frequently opposite to each other. This seems to contradict the documentation of the program.
May I know whether we can safely ignore the sign of zscore in this case, or are there other ways to solve this issue? Thanks a lot.

OUTPUT VARIABLES:
lcv.output, a list with named entries:
"zscore", Z score for partial genetic causality. zscore>>0 implies gcp>0.

Genetic correlation

LCV is giving a completely different (and much lower) genetic correlation than regular LDSC on the same munged sumstats. How can that be?

Inconsistent LCV results for the same pair of traits

Dear LCV developers,

We ran three LCV tests to study the relationship between two UKBB traits - varicose veins (VV, three sets of GWAS data, rg = 0.98-1.00) and osteoarthritis (OA) - and obtained contradictory GCP estimates (all of them statistically significant, but differing in sign and magnitude). In two analyses we observed GCP < -0.6, but the third GCP value was equal to 0.12 (please see the full table with the results in LCV_results for OA and VV.xlsx).

We expected to get concordant LCV results from all three analyses, since we studied the same trait pair and used data that are highly genetically correlated between datasets. What could be the reason for such contradictory results? How should they be interpreted?

Here https://drive.google.com/drive/folders/1_HWYDanaeZ1dT1xaPwZQDOYPk6GjFWdG?usp=sharing you can find the processed data for VV and OA, munged using the GenomicSEM R package with maf.filter = 0.05 and HapMap 3 reference data for 1000 people.

The code for the LCV analyses is available via the link https://drive.google.com/drive/folders/1kdBebTmwguEnGD_JWAqb0e_EvLFLMEXI?usp=sharing.

Data description:

For VV, we tested GWAS summary statistics from 3 open UKBB-based resources with strong genetic correlations between them (rg = 0.98-1.00, LCV estimates). For OA, we used only one dataset.

Trait 1 – “varicose veins of lower extremities” (VV)
• Resource 1 – Gene Atlas (trait: I83: VVs of lower extremities)
• Resource 2 – The Neale Lab (trait: I83: VVs of lower extremities)
• Resource 3 – PheWeb (PheWAS code 454.1: VVs of lower extremity)

Trait 2 – “osteoarthritis, localized, primary” (OA)
• Resource – PheWeb (PheWAS code 740.11: Osteoarthrosis, localized, primary)

Very low LCV p-value but high standard error

Hi

I observe instances of very low GCP p-values where the GCP standard error is nonetheless quite large, e.g. p = 5.4x10-8 where the GCP is 0.47 (SE = 0.52). By z-score alone, this is nowhere near significant, and it is difficult to even make an inference in this instance. I noticed that for one phenotype the h2 SNP z-score is > 20, while for the other it is 4, so one dataset has very limited power.

Accordingly, my suspicion is that the LCV result is a false positive due to the relatively weak signal in one dataset. Would you also suspect this? If so, do you have recommendations for a minimum z-score for running LCV, e.g. the LDSC rule of thumb of not examining rg between phenotypes with h2 Z < 4? I imagine the additional complexity of the LCV model would require a more stringent threshold than for rg.

Thanks!
Adam
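
A trivial sketch of the kind of power filter described above, using the output names quoted elsewhere on this page (h2.zscore.1, h2.zscore.2); the threshold of 4 is the LDSC rule of thumb mentioned, which, as noted, may well be too lenient for LCV:

```r
# Flag trait pairs where either heritability z-score suggests low power.
# `lcv` is assumed to be the list returned by RunLCV; min_h2_z is a
# hypothetical, user-chosen threshold, not an LCV recommendation.
well_powered <- function(lcv, min_h2_z = 4) {
  min(lcv$h2.zscore.1, lcv$h2.zscore.2) >= min_h2_z
}

# Made-up numbers mirroring the case described above:
well_powered(list(h2.zscore.1 = 20, h2.zscore.2 = 4))   # TRUE, but borderline
```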

rg by LCV has different sign with rg by LDSC

Hi LCV developer,
I'm using LCV for causality estimation on summary statistics:

library(data.table)  # for fread

int = fread("ID94.sumstats.gz", data.table=FALSE)   # data from PMID 30837455
d1 = fread("ID644.sumstats.gz", data.table=FALSE)   # neuroticism data from PMID 30643256
d1 = d1[which(!is.na(d1$Z)), ]
rownames(d1) = d1[, 1]
rownames(int) = int[, 1]
# ldsc is a data.frame built from the LD scores under the eur_w_ld_chr
# folder, with its rownames set to rsIDs
l = rownames(ldsc)
L = intersect(l, rownames(d1))
L = intersect(L, rownames(int))
exp = d1[L, "Z"]
out = int[L, "Z"]
ld = ldsc[L, "L2"]
RunLCV(ld, exp, out)

The output rho.est (0.27, SE 0.07) has the opposite sign compared with the rg (-0.23, SE 0.03) from LDSC:

C:\Users\goubegou\ldsc\ldsc.py --rg ID94.sumstats.gz,ID644.sumstats.gz --ref-ld-chr
F:\selection\eur_w_ld_chr\ --w-ld-chr F:\selection\eur_w_ld_chr\ --out ulcer

I believe I haven't flipped the effect column during these procedures (I made no modifications to the .sumstats files since their generation by munge_sumstats.py), and I'm not sure whether some critical bias has crept in. Should I simply flip gcp.pm and its z-score, assuming this issue just reflects some accidental error? Thanks for your help.

Best Regards
