xtonyjiang / gnova

A principled framework to estimate annotation-stratified genetic covariance using GWAS summary statistics.

Home Page: http://www.cell.com/ajhg/abstract/S0002-9297(17)30453-6

License: GNU General Public License v3.0

Python 100.00%

gnova's Issues

default sumstats args

Running this as a module in my code I got this error:

Traceback (most recent call last):
  File "ldsc_thin.py", line 451, in <module>
    df = pr.pre_function(args)
  File "/net/zhao/rlp48/ldsc_test/ldsc/custom/prep.py", line 51, in pre_function
    dfs.append(munge_sumstats.munge_sumstats(args, p=False))
  File "/net/zhao/rlp48/ldsc_test/ldsc/custom/munge_sumstats.py", line 523, in munge_sumstats
    if args.no_alleles and args.merge_alleles:
AttributeError: 'Namespace' object has no attribute 'no_alleles'

I think the problem here is that the munge_sumstats parser never gets called when the script isn't run as main, so the default argument attributes it would normally set are missing. You'll need to add some code to set every default arg attribute that the munge_sumstats parser sets (starts at line 444 of sumstats.py). It would also be worth testing your code more thoroughly to make sure that munge_sumstats runs the way you expect it to.
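
One possible way to do this is sketched below: parse an empty argument list to collect the parser's defaults, then copy over any attribute the caller's Namespace lacks. The parser here is a hypothetical stand-in with only the two flags named in the traceback, and `build_munge_parser` and `fill_missing_defaults` are illustrative names, not functions from ldsc or GNOVA.

```python
import argparse


def build_munge_parser():
    # Hypothetical stand-in for the real munge_sumstats parser; only the
    # two flags that appear in the traceback are declared here.
    parser = argparse.ArgumentParser()
    parser.add_argument("--no-alleles", action="store_true", default=False)
    parser.add_argument("--merge-alleles", default=None)
    return parser


def fill_missing_defaults(args, parser):
    # Parsing an empty list yields a Namespace holding every default.
    defaults = vars(parser.parse_args([]))
    for name, value in defaults.items():
        # Keep anything the caller already set; add only what is missing.
        if not hasattr(args, name):
            setattr(args, name, value)
    return args


# A Namespace built by some other script, without the munge defaults.
caller_args = argparse.Namespace(sumstats1="trait1.txt")
caller_args = fill_missing_defaults(caller_args, build_munge_parser())
```

This only works as written if the parser has no required arguments; otherwise `parse_args([])` exits with an error.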

document ldscore format

Hello,
I have my own ld scores computed with the ldsc software, and was hoping to use them with GNOVA; however, the GNOVA CSV format for LD files seems to be different than LDSC output. Could you document your LD file format, or provide an example?

Problem with NaNs

Hi Tony,

I've been trying to use GNOVA in my project. For some of the sumstats files I'm using, I get the following error:

Traceback (most recent call last):
  File "gnova.py", line 86, in <module>
    pipeline(parser.parse_args())
  File "gnova.py", line 47, in pipeline
    out = calculate(gwas_snps, ld_scores, annots, N1, N2)
  File "/Volumes/BD/GNOVA/calculate.py", line 72, in calculate
    m1 = linear_model.LinearRegression().fit(ld_scores, pd.DataFrame((Z_x) ** 2), sample_weight=w1)
  File "/Volumes/Users/Library/Python/2.7/lib/python/site-packages/sklearn/linear_model/base.py", line 458, in fit
    y_numeric=True, multi_output=True)
  File "/Volumes/Users/Library/Python/2.7/lib/python/site-packages/sklearn/utils/validation.py", line 750, in check_X_y
    dtype=None)
  File "/Volumes/Users/Library/Python/2.7/lib/python/site-packages/sklearn/utils/validation.py", line 568, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/Volumes/Users/Library/Python/2.7/lib/python/site-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
    raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I tracked it down to prep.py, where the line

df = pd.merge(bim, dfs[1], on=['SNP']).merge(dfs[0], on=['SNP'])

produces NaN values. I don't think this should happen. Could it be a matter of the pandas version used?
My workaround was to add

df.dropna(inplace=True)

but I don't think that's how it's meant to be.

Rainer
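
A note on the merge: `pd.merge` defaults to an inner join, which does not itself introduce missing values, so NaNs in the result most likely were already present in one of the inputs. The following is a minimal sketch for locating them, using toy data. The column names `Z_x` and `Z_y` and the frame names are assumptions for illustration, not the actual contents of prep.py.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the bim table and the two summary-statistics tables.
bim = pd.DataFrame({"SNP": ["rs1", "rs2", "rs3"]})
ss1 = pd.DataFrame({"SNP": ["rs1", "rs2", "rs3"], "Z_x": [0.5, np.nan, 1.2]})
ss2 = pd.DataFrame({"SNP": ["rs1", "rs2"], "Z_y": [0.1, -0.4]})

# Count missing values per column in each input before merging.
for name, frame in [("bim", bim), ("ss1", ss1), ("ss2", ss2)]:
    print(name, frame.isna().sum().to_dict())

# Same merge shape as the line quoted from prep.py (inner joins on SNP).
df = pd.merge(bim, ss2, on=["SNP"]).merge(ss1, on=["SNP"])

# Rows that carry a NaN after the merge, and a targeted drop on the
# statistic columns only instead of a blanket dropna.
nan_rows = df[df.isna().any(axis=1)]
clean = df.dropna(subset=["Z_x", "Z_y"])
```

If the per-input counts are all zero and NaNs still appear, the cause lies elsewhere and the sketch above does not apply.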

MAF stratification

Could you give any information on your MAF stratified annotations from 1000g, what were the MAF cut-offs for each quartile? I haven't been able to find this anywhere (including in the AJHG paper). Sorry in advance if I've missed it!

TIA!

LDSC step does not work if no annot matrix is provided

This happens when we don't want annotation stratification. I'll take care of this; I just want to record it here for the sake of having a pseudo TODO list.

link to download annotation files dead

Hello,
the link given in the Readme to download the annotation files is dead:
http://genocanyon.med.yale.edu/GNOVAFiles/annotations.tar.gz
Is it possible to fix it?
Best,
Elise

Reference and annotation files not available

Hi!

Thank you for such an amazing tool. I'm trying to access the plink reference file and the annotation files, but it seems that they are not available. Is this a temporary error, or will they no longer be accessible?

Thanks!

Judit

Problem preparing files for gnova

Hi,

I am getting this error repeatedly, even though I have formatted the files according to the README specifications. Could you help me with this issue? Thanks!

Preparing files for analysis...
Traceback (most recent call last):
  File "./gnova.py", line 85, in <module>
    pipeline(parser.parse_args())
  File "./gnova.py", line 34, in pipeline
    args.sumstats2)
  File "/Users/usuario/Desktop/pipeline/prep.py", line 67, in prep
    len_b, len_a = len(bim_files), len(annot_files)
TypeError: object of type 'NoneType' has no len()
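
The final frame shows that `bim_files` or `annot_files` was None when `len()` was called; the traceback does not say which one. A minimal sketch of a check that names the missing input is given below. `count_inputs` is a hypothetical helper, not part of GNOVA, and it assumes a None value means no files were supplied or found.

```python
def count_inputs(bim_files, annot_files):
    # Report which input is None instead of failing inside len().
    named = {"bim_files": bim_files, "annot_files": annot_files}
    missing = [name for name, value in named.items() if value is None]
    if missing:
        raise ValueError("No files found for: " + ", ".join(missing))
    return len(bim_files), len(annot_files)


# Example: the annotation list was never populated.
try:
    count_inputs(["chr1.bim"], None)
except ValueError as err:
    message = str(err)
```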

Running error

TypeError: __init__() got multiple values for keyword argument 'keep_snps'

Bug

TypeError: __init__() got multiple values for keyword argument 'keep_snps'
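
Neither report includes a traceback, so the cause in GNOVA is not known. In general, Python raises this kind of error when a constructor receives the same parameter both positionally and by keyword. The sketch below reproduces that mechanism with a hypothetical class; only the parameter name `keep_snps` comes from the error message.

```python
class GenotypeReader:
    """Hypothetical class used only to reproduce the error mechanism."""

    def __init__(self, path, keep_snps=None):
        self.path = path
        self.keep_snps = keep_snps


def build_reader():
    try:
        # The second positional argument already binds to keep_snps,
        # so the keyword argument supplies the same parameter twice.
        return GenotypeReader("chr1.bed", [0, 1, 2], keep_snps=[0, 1, 2])
    except TypeError as err:
        return str(err)


message = build_reader()

# Passing the value once, by keyword only, avoids the clash.
reader = GenotypeReader("chr1.bed", keep_snps=[0, 1, 2])
```

A mismatch of this kind can appear when a caller was written against a different version of the class it instantiates, but that is a guess for these two reports.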

Multi-chromosome pattern match too liberal

Replacing the @ sign with a * wildcard could cause problems if there are multiple versions of the bim file in the directory. For example, you'd end up globbing both chr1.bim and chr1_nosex.bim in the call and get multiple copies of each chromosome. You need a regex here that specifically matches one or two digits only (we may also want to avoid picking up sex chromosome files for now; I'll check with Qiongshi).
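
A minimal sketch of such a match, assuming the pattern contains a single @ placeholder and that file names are compared without their directory part. `match_chromosome_files` is an illustrative name, not an existing GNOVA function.

```python
import re


def match_chromosome_files(pattern, filenames):
    # Split the pattern around the @ placeholder and escape both halves
    # so that characters like "." are matched literally.
    prefix, suffix = pattern.split("@", 1)
    regex = re.compile(re.escape(prefix) + r"(\d{1,2})" + re.escape(suffix))
    matches = {}
    for name in filenames:
        m = regex.fullmatch(name)
        if m:
            # Key by chromosome number so the result can be sorted.
            matches[int(m.group(1))] = name
    return [matches[chrom] for chrom in sorted(matches)]


files = ["chr1.bim", "chr1_nosex.bim", "chr2.bim", "chr10.bim", "chrX.bim"]
selected = match_chromosome_files("chr@.bim", files)
```

Because only digits are accepted in place of @, files such as chrX.bim are skipped as a side effect, which matches the suggestion to leave out sex chromosomes for now.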

Interpreting corrected p-value

I understood that the corrected p-value is the p-value adjusted for sample overlap. However, in my case I am sure that there is no sample overlap. Can I therefore use only the raw p-value, or does the corrected p-value account for something else besides sample overlap?
Also, sometimes the raw p-value is not significant but the corrected p-value is, even though there is no real sample overlap in my data. How is that possible?

Thank you very much in advance!

Program frozen for a long time without error/output

Hi,

Thanks for this tool.

I ran the code to estimate the rg between two traits, but the code got stuck for more than 3 hours, without outputting any files.

Preparing files for analysis...
Calculating LD scores...
~/software/gnova/lib/python2.7/site-packages/numpy/core/fromnumeric.py:56: FutureWarning: Series.nonzero() is deprecated and will be removed in a future version.Use Series.to_numpy().nonzero() instead
  return getattr(obj, method)(*args, **kwds)

Is this normal???

Thanks
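
For reference, the FutureWarning in the log is a deprecation notice and not an error, so on its own it does not explain the stall. The replacement it suggests looks like this on a toy Series (the data here is made up for illustration):

```python
import pandas as pd

mask = pd.Series([0, 3, 0, 5])

# Form suggested by the warning: convert to a NumPy array first, then
# take the positions of the non-zero entries.
idx = mask.to_numpy().nonzero()[0]
```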