
metaxcan's Introduction

MetaXcan

MetaXcan is a set of tools to integrate genomic information of biological mechanisms with complex traits. Almost all of the software here is command-line based.

This software was recently migrated to Python 3, as Python 2 has been sunset.

Tools

Here you can find the latest implementation of PrediXcan: PrediXcan.py. It uses individual-level genotype and phenotype data, along with a mechanism's prediction model (e.g. models predicting expression or splicing quantification), to compute associations between omic features and a complex trait.

S-PrediXcan is an extension that infers PrediXcan's results using only summary statistics, implemented in SPrediXcan.py. A manuscript describing S-PrediXcan and the MetaXcan framework, with an application, can be found here: S-PrediXcan.

MultiXcan (MulTiXcan.py) and S-MultiXcan (SMulTiXcan.py) compute omic associations, integrating measurements across tissues while accounting for their correlation. For example, if you have prediction models, each trained on a different region of the brain, MulTiXcan will combine the information across all of them. This is effectively a meta-analysis across tissues, where each tissue is an experiment and we explicitly account for correlation.

Prerequisites

The software is developed and tested in Linux and Mac OS environments. The main S-PrediXcan script is also supported in Windows.

To run S-PrediXcan, you need Python 3.5 or higher, with the following libraries:

To run PrediXcan Associations and MulTiXcan, you also need:

To run prediction of biological mechanisms on individual-level data, you will also need:

R with ggplot and dplyr is needed for some optional statistics and charts.

We recommend a tool like Conda to set up a working environment for MetaXcan. Tools like pyenv also work, but the bgen-reader dependency currently takes some effort to get going on pyenv.

Example conda environment setup

A quick-and-dirty solution to install the basic requirements is using Miniconda and the file software/conda_env.yaml in this repository to create a working environment.

conda env create -f /path/to/this/repo/software/conda_env.yaml
conda activate imlabtools

Useful Data & Prediction models

We make available several transcriptome prediction models and LD references here. These files should be enough for running SPrediXcan.py, MulTiXcan.py and SMulTiXcan.py on practically any GWAS study. We highly recommend the MASHR models therein, as they are parsimonious and biologically informed, using fine-mapped variants and cross-tissue QTL patterns. In the following we use gene repeatedly to refer to the prediction model of a genetic feature, but it can stand for other units such as the prediction of an intron's quantification.

We provide an end-to-end tutorial for integrating GWAS summary statistics with the latest release of GTEx models.

Project Layout

The software folder contains an implementation of the S-PrediXcan method and associated tools. The following scripts from that folder constitute different components of the MetaXcan pipeline:

SPrediXcan.py
PrediXcan.py
MulTiXcan.py
SMulTiXcan.py

Of these, SPrediXcan.py is the most widely applicable; it contains the current implementation of S-PrediXcan. MulTiXcan.py and SMulTiXcan.py are the multiple-tissue methods. MulTiXcan.py takes as input the predicted levels generated by PrediXcan.py.

The rest of the scripts in the software folder are Python packaging support scripts and convenience wrappers such as the GUI.

The subfolder software/metax contains the bulk of MetaXcan's logic, implemented as a Python module.

S-PrediXcan Input data

S-PrediXcan calculates gene-level association results from GWAS summary statistics. It supports most GWAS formats by accepting command line arguments that specify the data columns. Some precalculated data is needed and must be set up before running S-PrediXcan.

The gist of S-PrediXcan's input is:

  • A Transcriptome Prediction Model database (an example is here)
  • A file with the covariance matrices of the SNPs within each gene model (such as this one)
  • GWAS results (such as these, which were computed on a randomly generated phenotype). GWAS results can be in a single file or split into multiple ones (e.g. split by chromosome). You can specify the necessary columns via command line arguments (e.g. which column holds SNPs, which holds p-values, etc.)

You can use precalculated databases, or generate new ones with tools available in the PredictDB repository. Precalculated data for GTEx-based tissues and 1000 Genomes covariances can be found here.

(Please refer to /software/Readme.md for more detailed information)
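
As a rough illustration of what a prediction model database holds, the sketch below queries per-gene SNP weights with Python's sqlite3 module. It assumes the model is a SQLite file with a weights table whose columns include gene, rsid, varID, ref_allele, eff_allele and weight; that layout is an assumption here, so check it against your own database. The in-memory database and its rows are made up for the example.

```python
import sqlite3


def load_gene_weights(connection, gene):
    """Return {variant id: weight} for one gene from a `weights` table."""
    cursor = connection.execute(
        "SELECT rsid, weight FROM weights WHERE gene = ? ORDER BY rsid",
        (gene,),
    )
    return dict(cursor.fetchall())


# Hypothetical stand-in for a real model file; with a downloaded model you
# would call sqlite3.connect("path/to/model.db") instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weights (gene TEXT, rsid TEXT, varID TEXT, "
    "ref_allele TEXT, eff_allele TEXT, weight REAL)"
)
conn.executemany(
    "INSERT INTO weights VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("GENE_A", "rs1", "chr1_100_A_G_b38", "A", "G", 0.25),
        ("GENE_A", "rs2", "chr1_200_C_T_b38", "C", "T", -0.5),
        ("GENE_B", "rs3", "chr2_300_G_A_b38", "G", "A", 0.75),
    ],
)

weights = load_gene_weights(conn, "GENE_A")
print(weights)
```

Inspecting a model this way is a quick check that the variant identifiers in the database look like the ones in your GWAS file.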

GWAS summary statistic format

S-PrediXcan supports a large number of input GWAS formats through command line arguments. Once you specify the appropriate input file column names, S-PrediXcan will analyze the file without any extra input conversion. Input GWAS files can be plain text or gzip-compressed.

For example, you can specify an effect allele column and a standard error column, or a pvalue column and an odds ratio column, or only a GWAS zscore column. S-PrediXcan will try to use the following (in that order) if available from the command line arguments and input GWAS file:

  1. use a z-score column if available from the arguments and input file;
  2. use a p-value column and either an effect, odds ratio or direction column;
  3. use effect size (or odds ratio) and standard error columns if available.

Check the GitHub wiki for the options that work best for your data, and for help interpreting the results. For example, if your GWAS has p-values that are extremely small (e.g. 1e-350), you should avoid specifying a p-value column because numerical problems might arise; use effect size and standard error instead.
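
To make the remark about tiny p-values concrete, here is a minimal sketch of the two usual ways to obtain a per-SNP z-score from GWAS columns. It is not S-PrediXcan's own code: the function names are hypothetical, and it assumes two-sided p-values with the sign taken from the effect size (or from the log of the odds ratio).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def zscore_from_beta_se(beta, se):
    """z-score from an effect size and its standard error."""
    return beta / se


def zscore_from_pvalue(pvalue, sign_source):
    """z-score from a two-sided p-value, signed like `sign_source`.

    `sign_source` can be a beta or the log of an odds ratio.
    """
    half = pvalue / 2.0
    if not 0.0 < half <= 0.5:
        raise ValueError("p-value must be in (0, 1] and representable as a float")
    magnitude = -_STD_NORMAL.inv_cdf(half)
    return math.copysign(magnitude, sign_source)


print(zscore_from_beta_se(0.5, 0.25))
print(zscore_from_pvalue(0.05, -0.3))

# A p-value below the double-precision range is read as exactly 0.0, so the
# magnitude of the z-score cannot be recovered from it.
print(float("1e-350") == 0.0)
```

This is why effect size and standard error are the safer inputs when p-values underflow: their ratio stays well within floating-point range.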

A remark on individual-level genotype format

PrediXcan supports three input file formats:

  • vcf
  • bgen
  • internal "dosage format".

Associations are output as a tab-separated file.

Predicted levels can be output as either text files or HDF5 files. HDF5 files allow a more efficient computation of MultiXcan, as only the data for a single gene (or intron, or other feature) across all tissues needs to be loaded at a time.
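
For intuition about what a "predicted level" is, the following sketch computes it as a weighted sum of genotype dosages for a linear prediction model. It is a simplified illustration under stated assumptions, not the PrediXcan.py implementation: the function name and data are hypothetical, dosages are assumed to already count the model's effect allele, and variants missing from the genotype are simply skipped.

```python
def predict_expression(dosages, weights):
    """Predict one gene's level for each sample.

    dosages: {variant id: [dosage per sample]}
    weights: {variant id: weight} for a single gene model
    Returns (predicted levels per sample, number of model variants used).
    """
    if not dosages:
        return [], 0
    n_samples = len(next(iter(dosages.values())))
    predicted = [0.0] * n_samples
    used = 0
    for variant, weight in weights.items():
        sample_dosages = dosages.get(variant)
        if sample_dosages is None:
            continue  # model variant absent from the genotype data
        used += 1
        for i, dosage in enumerate(sample_dosages):
            predicted[i] += weight * dosage
    return predicted, used


# Made-up example: three samples, two genotyped variants, three model variants.
dosages = {"rs1": [0.0, 1.0, 2.0], "rs2": [2.0, 1.0, 0.0]}
weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 1.0}
predicted, used = predict_expression(dosages, weights)
print(predicted, used)
```

When no variant identifiers match between the model and the genotype, `used` is 0 and every prediction is 0.0, which is worth checking before looking for other causes.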

Setup and Usage Example on a UNIX-like operating system

The following example assumes that you have python 3.5 (or higher), numpy, and scipy installed.

  1. Clone this repository.
$ git clone https://github.com/hakyimlab/MetaXcan
  2. Go to the software folder.
$ cd MetaXcan/software
  3. Download example data.

This may take a few minutes depending on your connection: approximately 200 MB of data has to be downloaded. The downloaded data includes an appropriate Transcriptome Model Database, GWAS/meta-analysis summary statistics, and SNP covariance matrices.

Extract it with:

tar -xzvpf sample_data.tar.gz
  4. Run the high-level S-PrediXcan script.
./SPrediXcan.py \
--model_db_path data/DGN-WB_0.5.db \
--covariance data/covariance.DGN-WB_0.5.txt.gz \
--gwas_folder data/GWAS \
--gwas_file_pattern ".*gz" \
--snp_column SNP \
--effect_allele_column A1 \
--non_effect_allele_column A2 \
--beta_column BETA \
--pvalue_column P \
--output_file results/test.csv

This should take less than a minute on a 3 GHz computer. For the full specification of command line parameters, you can check the wiki.

The example command parameters mean:

  • --model_db_path Path to the tissue transcriptome model.
  • --covariance Path to file containing covariance information. This covariance should have information related to the tissue transcriptome model.
  • --gwas_folder Folder containing GWAS summary statistics data.
  • --gwas_file_pattern This option allows the program to select which files from the input folder to use, based on their names. ... This lets you ignore support files that your GWAS analysis might have generated, such as plink logs.
  • --snp_column Argument with the name of the column containing the RSIDs.
  • --effect_allele_column Argument with the name of the column containing the effect allele (i.e. the one being regressed on).
  • --non_effect_allele_column Argument with the name of the column containing the non effect allele.
  • --beta_column Name of the column containing the phenotype beta for each SNP in the input GWAS files.
  • --pvalue_column Name of the column containing the p-value for each SNP in the input GWAS files.
  • --output_file Path where results will be saved to.
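
The effect of a file pattern such as ".*gz" can be previewed with a few lines of Python before launching a run. This is only a sketch: the helper name and file names are made up, and it assumes the pattern is applied as a regular expression searched anywhere in the file name, which may differ in detail from what the script does.

```python
import re


def select_gwas_files(file_names, pattern):
    """Return the file names matched by a regular expression, sorted."""
    regexp = re.compile(pattern)
    return sorted(name for name in file_names if regexp.search(name))


# Hypothetical contents of a GWAS folder.
folder_contents = ["chr1.assoc.gz", "chr2.assoc.gz", "plink.log", "README.txt"]
selected = select_gwas_files(folder_contents, ".*gz")
print(selected)
```

Note that the pattern is a regular expression rather than a shell glob, so ".*gz" is the regex counterpart of the glob "*gz"; appending "$" anchors the match to the end of the name.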

Its output is a CSV file that looks like:

gene,gene_name,zscore,effect_size,pvalue,var_g,pred_perf_r2,pred_perf_pval,pred_perf_qval,n_snps_used,n_snps_in_cov,n_snps_in_model
ENSG00000150938,CRIM1,4.190697619877402,0.7381499095142079,2.7809807629839122e-05,0.09833448081630237,0.13320775358,1.97496173512e-30,7.47907447189e-30,37,37,37
...

Where each row is a gene's association result:

  • gene: a gene's ID, as listed in the Tissue Transcriptome model. Ensembl ID for most gene model releases. Can also be an intron's ID for splicing model releases.
  • gene_name: gene name as listed by the Transcriptome Model, typically HUGO for a gene. It can also be an intron's ID.
  • zscore: S-PrediXcan's association z-score for the gene.
  • effect_size: S-PrediXcan's association effect size for the gene. Can only be computed when beta from the GWAS is used.
  • pvalue: P-value of the aforementioned statistic.
  • pred_perf_r2: (cross-validated) R2 of tissue model's correlation to gene's measured transcriptome (prediction performance). Not all model families have this (e.g. MASHR).
  • pred_perf_pval: pval of tissue model's correlation to gene's measured transcriptome (prediction performance). Not all model families have this (e.g. MASHR).
  • pred_perf_qval: qval of tissue model's correlation to gene's measured transcriptome (prediction performance). Not all model families have this (e.g. MASHR).
  • n_snps_used: number of snps from GWAS that got used in S-PrediXcan analysis
  • n_snps_in_cov: number of snps in the covariance matrix
  • n_snps_in_model: number of snps in the model
  • var_g: variance of the gene expression, calculated as W' * G * W (where W is the vector of SNP weights in a gene's model, W' is its transpose, and G is the covariance matrix)
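
The var_g formula above is a quadratic form and can be checked by hand on a small case. The sketch below uses plain Python lists and made-up numbers; the function name is hypothetical, and the weights and covariance rows are assumed to be listed in the same SNP order.

```python
def gene_variance(weights, covariance):
    """Compute W' * G * W for a weight vector W and covariance matrix G."""
    n = len(weights)
    if len(covariance) != n or any(len(row) != n for row in covariance):
        raise ValueError("covariance must be an n x n matrix matching the weights")
    return sum(
        weights[i] * covariance[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )


# Made-up two-SNP model.
W = [0.5, -0.25]
G = [
    [0.5, 0.125],
    [0.125, 0.25],
]
var_g = gene_variance(W, G)
print(var_g)
```

Here the result is 0.125 - 0.03125 + 0.015625 = 0.109375: the two diagonal terms plus twice the off-diagonal term.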

If --additional_output is used when running S-PrediXcan, you'll get two additional columns:

  • best_gwas_p: the best (most significant) p-value among the GWAS SNPs used in this model
  • largest_weight: the largest (absolute value) weight in this model

S-PrediXcan on Windows

Please see the following article in the wiki.

S-MultiXcan

Run SMulTiXcan.py --help to see arguments and options. Output is a tab-separated text file with the following columns:

  • gene: a gene's ID, as listed in the Tissue Transcriptome model. Ensembl ID for most gene model releases. Can also be an intron's ID for splicing model releases.
  • gene_name: gene name as listed by the Transcriptome Model, typically HUGO for a gene. It can also be an intron's id.
  • pvalue: significance p-value of S-MultiXcan association
  • n: number of "tissues" available for this gene
  • n_indep: number of independent components of variation kept among the tissues' predictions. (Synthetic independent tissues)
  • p_i_best: best p-value of single-tissue S-PrediXcan association.
  • t_i_best: name of best single-tissue S-PrediXcan association.
  • p_i_worst: worst p-value of single-tissue S-PrediXcan association.
  • t_i_worst: name of worst single-tissue S-PrediXcan association.
  • eigen_max: In the SVD decomposition of predicted expression correlation: eigenvalue (variance explained) of the top independent component
  • eigen_min: In the SVD decomposition of predicted expression correlation: eigenvalue (variance explained) of the last independent component
  • eigen_min_kept: In the SVD decomposition of predicted expression correlation: eigenvalue (variance explained) of the smallest independent component that was kept.
  • z_min: minimum z-score among single-tissue S-PrediXcan associations.
  • z_max: maximum z-score among single-tissue S-PrediXcan associations.
  • z_mean: mean z-score among single-tissue S-PrediXcan associations.
  • z_sd: standard deviation of the z-scores among single-tissue S-PrediXcan associations.
  • tmi: trace of T * T', where T is the correlation of predicted expression levels for different tissues multiplied by its SVD pseudo-inverse. It is an estimate of the number of independent components of variation in predicted expression across tissues (typically close to n_indep)
  • status: If there was any error in the computation, it is stated here
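
To see how n, n_indep, eigen_max, eigen_min and eigen_min_kept relate to one another, the sketch below summarizes a list of eigenvalues. The rule used to decide which components are kept (the ratio of the largest eigenvalue to each eigenvalue must stay below a condition-number cutoff) and the cutoff value are assumptions made for illustration; consult SMulTiXcan.py --help for the options that actually control this. The function name and eigenvalues are made up.

```python
def summarize_components(eigenvalues, max_condition_number):
    """Summarize eigenvalues of a tissue-by-tissue correlation matrix.

    A component is kept when eigen_max / eigenvalue is below
    `max_condition_number` (an assumed truncation rule).
    """
    ordered = sorted(eigenvalues, reverse=True)
    if not ordered or ordered[0] <= 0:
        raise ValueError("need at least one positive eigenvalue")
    eigen_max = ordered[0]
    kept = [e for e in ordered if e > 0 and eigen_max / e < max_condition_number]
    return {
        "n": len(ordered),
        "n_indep": len(kept),
        "eigen_max": eigen_max,
        "eigen_min": ordered[-1],
        "eigen_min_kept": kept[-1],
    }


# Made-up eigenvalues for four correlated tissues.
summary = summarize_components([2.4, 0.5, 0.09, 0.01], max_condition_number=30)
print(summary)
```

In this example the last component is dropped because 2.4 / 0.01 = 240 exceeds the cutoff, so four tissues yield three independent components.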

Installation

You also have the option of installing the MetaXcan package to your python distribution. This will make the metax library available for development, and install on your system path the main MetaXcan scripts.

You can install it from the software folder with:

# ordinary install
$ python setup.py install

Alternatively, if you are going to modify the sources, the following may be more convenient:

# developer mode installation
python setup.py develop

PIP support coming soon-ish.

Support & Community

Issues and questions can be raised at this repository's issue tracker.

There is also a Google Group mailing list for general discussion, feature requests, etc. Join if you want to be notified of new releases, feature sets and important news concerning this software.

You can check here for the release history.

Cautionary Warning to Existing Users on Updates and Transcriptome Models

Transcriptome Models are a key component of PrediXcan and S-PrediXcan input. As models are improved, sometimes the format of these databases needs to be changed too. We only provide support for the very latest databases; if you update your repository clone to the latest version and MetaXcan complains about the transcriptome weight dbs, please check if new databases have been published here.

For the time being, the only way to use old transcriptome models is to use older versions of MetaXcan.

Older versions of MetaXcan have a MetaXcan.py script, which was meant to be an entry point to all MetaXcan tools; it has since been renamed SPrediXcan.py.

Where to go from here

Check the software folder in this repository if you want to learn about more general or advanced usage of S-PrediXcan, MulTiXcan and SMulTiXcan.

Check out the Wiki for exhaustive usage information.

The code lies at

/software

New release and features coming soon!

metaxcan's People

Contributors

fnyasimi, hakyim, heroico, jiamaozheng, jrouhana, liangyy, meliao, sabrina-mi, torstees


metaxcan's Issues

ImportError: No module named misc

Hi,

I'm trying to setup MetaXcan on an HPC. The software versions I'm using are:

Python 2.7.12
Pandas 0.18.1
Scipy 1.2.1
Numpy 1.11.1

After installing the software and making a module out of it, attempting to run MetaXcan.py, or any of the bin files, yields:

Traceback (most recent call last):
File "/data/Segre_Lab/modules/software/MetaXcan/bin/MetaXcan.py", line 4, in <module>
__import__('pkg_resources').run_script('MetaXcan==0.6.9', 'MetaXcan.py')
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 719, in run_script
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 1511, in run_script
File "/data/Segre_Lab/modules/software/MetaXcan/lib/python2.7/site-packages/MetaXcan-0.6.9-py2.7.egg/EGG-INFO/scripts/MetaXcan.py", line 14, in <module>

File "/gpfs/fs1/data/Segre_Lab/modules/software/MetaXcan/bin/M03_betas.py", line 4, in <module>
__import__('pkg_resources').run_script('MetaXcan==0.6.9', 'M03_betas.py')
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 719, in run_script
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 1511, in run_script
File "/data/Segre_Lab/modules/software/MetaXcan/lib/python2.7/site-packages/MetaXcan-0.6.9-py2.7.egg/EGG-INFO/scripts/M03_betas.py", line 23, in <module>

ImportError: No module named misc

Any ideas?

Smallest possible p-value

Hello,

what does a p-value of 0 in S-PrediXcan mean? Is it just smaller than the smallest possible floating point number in Python (~1e-325)?

Thanks

SMultiXcan error - No objects to concatenate

Dear MetaXcan team!

I am having an issue with SMultiXcan using the elastic net models.
The error I got is "INFO - Unexpected error: No objects to concatenate".

./SMulTiXcan.py --models_folder PrediXcan/elastic_net_models/ --models_name_filter "en_Brain_(.).db" --models_name_pattern "en_Brain_(.).db" --snp_covariance SMulTiXcan/gtex_v8_expression_elastic_net_snp_smultixcan_covariance.txt.gz --metaxcan_folder SMulTiXcan/metaxcan_spredixcan_folder/ --metaxcan_filter "en_Brain_(.)allchr_cpd.csv" --metaxcan_file_name_parse_pattern "en(.).csv" --gwas_folder GWASonCPD/en_gwas/ --gwas_file_pattern *.txt --snp_column SNP --effect_allele_column A1 --non_effect_allele_column A2 --beta_column BETA --pvalue_column P --output SMulTiXcan/en_output_chr/SMultixcan_allchr_cpd.csv
INFO - Creating context
INFO - Creating MetaXcan results manager
INFO - Loading genes
INFO - Context for snp covariance
INFO - Assessing GWAS-Models SNP intersection
INFO - Processing GWAS command line parameters
INFO - Unexpected error: No objects to concatenate

I used spredixcan to generate metaxcan files.
For example,
./SPrediXcan.py --model_db_path PrediXcan/elastic_net_models/en_Brain_Substantia_nigra.db --covariance PrediXcan/elastic_net_models/en_Brain_Substantia_nigra.txt.gz --gwas_folder GWASonCPD/en_gwas/ --gwas_file_pattern ".*txt" --snp_column SNP --effect_allele_column A1 --non_effect_allele_column A2 --beta_column BETA --pvalue_column P --output_file SMulTiXcan/metaxcan_spredixcan_folder/en_Brain_Substantia_nigra_allchr_cpd.csv.

Is it correct? If not, what are metaxcan files, and how to generate them? Because I didn't see the Metaxcan.py script in the "software" directory, which was downloaded from the GitHub MetaXcan.

Any suggestion is appreciated. Thank you very much for your help!

Best regards,
Zhenyao

0 % of model's snps used

Hi, MetaXcan group,

I built my own Transcriptome Model from my own data with PredictDB, and then I ran MetaXcan.py with the resulting db file, covariance file and downloaded GLGC summary data. However, it always comes up with no results and reports ' 0 % of model's snps used'. I have checked the format of the db file and covariance file, which is exactly the same as in the provided example. I also checked that around 20% of the SNPs in the db overlap with those in GLGC. So I am really confused about why ' 0 % of model's snps used' is reported.
One thing to note is that the SNP IDs I used in the model and the GLGC data are not formal rsIDs, but are formatted like '10_94140_G_A_b38'. I do not think this should cause a problem, since I have pointed out the SNP column in the command. Would you please help me figure out where the problem is? Thank you very much!

Here is the command I used:
./MetaXcan.py \
--model_db_path /net/mulan/disk2/qidif/S-Predixcan/PredictDBPipeline/data/output/dbs/MESA_MESA_alpha0.5_window1e6_filtered.db \
--covariance /net/mulan/disk2/qidif/S-Predixcan/PredictDBPipeline/data/output/allCovariances/MESA_MESA_alpha0.5_window1e6.txt.gz \
--gwas_folder /net/mulan/disk2/qidif/S-Predixcan/Input/gwas/ \
--gwas_file_pattern "jointGwasMc_HDL.modified.2.txt" \
--snp_column SNP_hg38 \
--effect_allele_column A1 \
--non_effect_allele_column A2 \
--beta_column beta \
--pvalue_column P-value \
--output_file results/test.csv

And here is the printout:
INFO - Processing GWAS command line parameters
INFO - Building beta for jointGwasMc_HDL.modified.4.txt and /net/mulan/disk2/qidif/S-Predixcan/PredictDBPipeline/data/output/dbs/MESA_MESA_alpha0.5_window1e6_filtered.db
INFO - Reading input gwas with special handling: /net/mulan/disk2/qidif/S-Predixcan/Input/gwas/jointGwasMc_HDL.modified.4.txt
INFO - Processing input gwas
INFO - Successfully parsed input gwas in 203.336451054 seconds
INFO - Started metaxcan process
INFO - Loading model from: /net/mulan/disk2/qidif/S-Predixcan/PredictDBPipeline/data/output/dbs/MESA_MESA_alpha0.5_window1e6_filtered.db
INFO - Loading covariance data from: /net/mulan/disk2/qidif/S-Predixcan/PredictDBPipeline/data/output/allCovariances/MESA_MESA_alpha0.5_window1e6.txt.gz
/net/mulan/disk2/qidif/MetaXcan/software/metax/MatrixManager.py:31: FutureWarning: read_table is deprecated, use read_csv instead.
d = pandas.read_table(path, sep="\s+")
INFO - Processing loaded gwas
INFO - Started metaxcan association
INFO - 0 % of model's snps used
INFO - Sucessfully processed metaxcan association in 839.607428074 seconds

PrediXcan issue with HCP data

Hello, my name is Amy Miles, and I am a postdoctoral fellow in imaging-genetics at the Centre for Addiction and Mental Health in Toronto, Canada. I have been running PrediXcan.py using (1) imputed genotype data, including > 25 million SNPs stored in dosage format, from the Human Connectome Project and (2) the GTEx version 7 cortex prediction model, downloaded from PredictDB. When I do so, a significant number of genes (~50% of the subset I'm interested in) are imputed as 0 for all subjects. Can you please advise as to what might be going wrong? I'm at a bit of a loss, particularly because I had no such issue when running the same set of commands using non-HCP genotype data that had been subject to the same QC and formatting steps. Thanks in advance for any insight you can provide!

Using the Gtexv8- mashr model

Hi,
Thank you for the updated version.
I have used the GTEX v8 mashr models. I did the GWAS harmonization using the first (or the recommended) method. I didn't impute and ran the MetaXcan.py.
In the output file, three columns related to prediction - pred_* had all NAs.
Is there a value(parameter) I should have used to get these?

The following is my script and parameters
python2 MetaXcan/software/MetaXcan.py \
--gwas_file mygwas-harmonize.txt.gz \
--snp_column panel_variant_id --effect_allele_column effect_allele --non_effect_allele_column non_effect_allele --zscore_column zscore \
--model_db_path data/models/eqtl/mashr/mashr_Whole_Blood.db \
--covariance data/models/eqtl/mashr/mashr_Whole_Blood.txt.gz \
--keep_non_rsid --additional_output --model_db_snp_key varID \
--throw \
--output_file mygwas-mashr-GTexV8_Whole_Blood.csv

Thank you for helping.

Predict.py - 0 % of models' snps used

Dear MetaXcan team:
I have a problem with variant mapping. I ran Predict.py to predict gene expression from genotype data that were imputed via the Michigan Imputation Server. There were no rsIDs in the vcf file. I used the --variant_mapping and --on_the_fly_mapping options, and downloaded gtex_v8_eur_filtered_maf0.01_monoallelic_variants.txt.gz from the sample dataset. But " 0 % of models' snps used" showed up, and no errors were reported. The script is

python3 $METAXCAN/Predict.py \
--model_db_path $DATA/mashr_Heart_Atrial_Appendage.db \
--vcf_genotypes $DATA/ABB_ATBdonor_GSA2017_chr1_22.GRCh38_R2GE80MAF001.sort.vcf.gz \
--vcf_mode imputed \
--variant_mapping $DATA/gtex_v8_eur_filtered_maf0.01_monoallelic_variants.txt.gz id rsid \
--on_the_fly_mapping METADATA "chr{}{}{}_{}_b38" \
--prediction_output $DATA/AA_predict_GSA2017.txt \
--prediction_summary_output $DATA/AA_summary_GSA2017.txt \
--verbosity 1 \
--throw

INFO - Loading samples
INFO - Loading model
INFO - Acquiring variant mapping
INFO - Acquiring on-the-fly mapping
INFO - Preparing genotype dosages
INFO - Setting whitelist from available models
INFO - Processing genotypes
INFO - Preparing prediction
INFO - Couldn't import h5py_cache. Anyway, this dependency should be removed. It has been folded into h5py
Level 9 - Processing vcfs
Level 9 - Processing vcf /home/sunh/MetaXcan/geno_data/ABB_ATBdonor_GSA2017_chr1_22.GRCh38_R2GE80MAF001.sort.vcf.gz
INFO - 0 % of models' snps used
INFO - Storing prediction
INFO - Saving prediction as a text file
INFO - Saving summary
INFO - Successfully predicted expression in 19.879122994840145 seconds

Thanks a lot!

Han Sun

PrediXcan - clarify predict gene expression

Hi,

first of all, thanks for making these models available - much appreciated!

I would like to clarify a use-case of the published models exclusively for GREx prediction. If I manage to match my dosage files to the GTEx IDs in the provided models *.db/weights table, do I still need to run PrediXcan.py or can I just add up weights times dosage myself?

Best wishes,

Kevin

spredixcan with elastic net models

Hi,
I have gwas summary stats where my SNP is coded as chr:BP:A1:A2. I am trying to run spredixcan using elastic net models but I am unable to match the SNP with rsids from gtex_v8_eur_filtered_maf0.01_monoallelic_variants.txt.gz file.

I am running the following code.
python $meta/summary-gwas-imputation/src/gwas_parsing.py \
-gwas_file $indat/sumstats_adc_lses.txt.gz \
-snp_reference_metadata $meta/data/gtex_v8_eur_filtered_maf0.01_monoallelic_variants.txt.gz METADATA \
-output_column_map SNP rsid \
-output_column_map A2 non_effect_allele \
-output_column_map A1 effect_allele \
-output_column_map BETA effect_size \
-output_column_map P pvalue \
-output_column_map stderr standard_error \
-output_column_map CHR chromosome \
--chromosome_format \
-output_column_map BP position \
-output_column_map MAF frequency \
--insert_value sample_size 9603 \
--insert_value n_cases 8468 \
-output_order rsid panel_variant_id chromosome position effect_allele non_effect_allele frequency pvalue zscore effect_size standard_error sample_size n_cases \
-output $indat/rsnew/sumstats_adc_lses_hg38new.txt.gz

Please let me know what I am doing incorrectly in order to be able to run elastic net models (v8).
Thanks.

Using my own eQTL data

Hi,

I have a question regarding using S-PrediXcan with my own eQTL summary statistics. What would be the first step? I see the repository (QTL_to_PredictDB); is that the one I should start with?

Thanks for your time!

Yingbo

ERROR in hg19-based genotype, on GTEx v8 MASHR Model

Dear MetaXcan team:
I have a problem with variant mapping. I ran Predict.py to predict gene expression from hg19-based genotype data. The genotype data was preprocessed with the script https://github.com/hakyimlab/PrediXcan/blob/master/Software/convert_plink_to_dosage.py, and some of the variants don't have rsIDs, e.g. 22:17063121_AAAAG_A. But the error "KeyError: 'TA'" showed up. The script is
python Predict.py --model_db_path /mashr/mashr_Lung.db --model_db_snp_key varID --text_genotypes ./chr22.txt.gz --text_sample_ids ./chr22.sam --liftover ./liftover/hg19ToHg38.over.chain.gz --on_the_fly_mapping METADATA "chr{}{}{}_{}_b38" --prediction_output ./chr22_wb.txt --prediction_summary_output ./chr22_wb_summary.txt --verbosity 9 --throw

INFO - Loading samples
INFO - Loading model
INFO - Acquiring on-the-fly mapping
INFO - Preparing genotype dosages
INFO - Acquiring liftover conversion
INFO - Setting whitelist from available models
INFO - Processing genotypes
INFO - Preparing prediction
INFO - Couldn't import h5py_cache. Anyway, this dependency should be removed. It has been folded into h5py
Level 9 - Processing Dosage geno ./chr22.txt.gz
Traceback (most recent call last):
File "/software/MetaXcan/MetaXcan-master/software/Predict.py", line 272, in <module>
run(args)
File "/software/MetaXcan/MetaXcan-master/software/Predict.py", line 176, in run
for i,e in enumerate(dosage_source):
File "/software/MetaXcan/MetaXcan-master/software/metax/genotype/DosageGenotype.py", line 60, in dosage_files_geno_lines
for e in dosage_file_geno_lines(f, variant_mapping=variant_mapping, whitelist=whitelist, skip_palindromic=skip_palindromic, liftover_conversion=liftover_conversion):
File "/software/MetaXcan/MetaXcan-master/software/metax/genotype/DosageGenotype.py", line 41, in dosage_file_geno_lines
_id, id = Genomics.maybe_map_variant(id, chrom, pos, ref_allele, alt_allele, variant_mapping, is_dict_mapping)
File "/software/MetaXcan/MetaXcan-master/software/metax/misc/Genomics.py", line 106, in maybe_map_variant
varid = variant_mapping(chr, pos, ref, alt)
File "/software/Predict.py", line 126, in <lambda>
mapping = lambda chromosome, position, ref_allele, alt_allele: Genomics.coordinate_format(checklist, args.on_the_fly_mapping[1], chromosome, position, ref_allele, alt_allele)
File "/software/MetaXcan/MetaXcan-master/software/metax/misc/Genomics.py", line 12, in coordinate_format
c = format.format(chromosome, position, nt[ref_allele], nt[alt_allele])
KeyError: 'TA'

Thanks a lot!

Lin

en models work, but mashr models do not

Hi Alvaro,
I downloaded harmonized hg38 summary stats from the GWAS catalog to test SPrediXcan.py. The GTEx v8 Whole Blood elastic net model worked beautifully, but the mashr model did not. Any ideas why? I'm puzzled. See my code below. Thank you!

#sumstats I used
wget ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/WojcikGL_31217584_GCST008046/harmonised/31217584-GCST008046-EFO_0004530.h.tsv.gz
#en model
export METAXCAN=/home/hwheeler1/MetaXcan/software
export DB=/home/hwheeler1/PTRS_MESA/predictdb/elastic_net_models
export SUMSTATS=/home/hwheeler1/test_SPrediXcan
export RESULTS=/home/hwheeler1/test_SPrediXcan/results

~/anaconda3/bin/python3 $METAXCAN/SPrediXcan.py \
--model_db_path $DB/en_Whole_Blood.db \
--covariance $DB/en_Whole_Blood.txt.gz \
--gwas_file $SUMSTATS/31217584-GCST008046-EFO_0004530.h.tsv.gz \
--snp_column hm_rsid \
--effect_allele_column effect_allele \
--non_effect_allele_column other_allele \
--beta_column beta \
--se_column standard_error \
--pvalue_column p_value \
--output_file $RESULTS/test_Triglyceride_GTExv8_en_Whole_Blood.csv
#en output
INFO - Processing GWAS command line parameters
INFO - Building beta for /home/hwheeler1/test_SPrediXcan/31217584-GCST008046-EFO_0004530.h.tsv.gz and /home/hwheeler1/PTRS_MESA/predictdb/elastic_net_models/en_Whole_Blood.db
INFO - Reading input gwas with special handling: /home/hwheeler1/test_SPrediXcan/31217584-GCST008046-EFO_0004530.h.tsv.gz
INFO - Processing input gwas
INFO - Aligning GWAS to models
INFO - Trimming output
INFO - Successfully parsed input gwas in 140.28683778643608 seconds
INFO - Started metaxcan process
INFO - Loading model from: /home/hwheeler1/PTRS_MESA/predictdb/elastic_net_models/en_Whole_Blood.db
INFO - Loading covariance data from: /home/hwheeler1/PTRS_MESA/predictdb/elastic_net_models/en_Whole_Blood.txt.gz
INFO - Processing loaded gwas
INFO - Started metaxcan association
INFO - 10 % of model's snps found so far in the gwas study
INFO - 20 % of model's snps found so far in the gwas study
INFO - 30 % of model's snps found so far in the gwas study
INFO - 40 % of model's snps found so far in the gwas study
INFO - 50 % of model's snps found so far in the gwas study
INFO - 60 % of model's snps found so far in the gwas study
INFO - 70 % of model's snps found so far in the gwas study
INFO - 80 % of model's snps found so far in the gwas study
INFO - 90 % of model's snps found so far in the gwas study
INFO - 99 % of model's snps used
INFO - Sucessfully processed metaxcan association in 123.70860436931252 seconds
#mashr model
export METAXCAN=/home/hwheeler1/MetaXcan/software
export DB=/home/hwheeler1/PTRS_MESA/predictdb/eqtl/mashr
export SUMSTATS=/home/hwheeler1/test_SPrediXcan
export RESULTS=/home/hwheeler1/test_SPrediXcan/results

~/anaconda3/bin/python3 $METAXCAN/SPrediXcan.py \
--model_db_path $DB/mashr_Whole_Blood.db \
--covariance $DB/mashr_Whole_Blood.txt.gz \
--gwas_file $SUMSTATS/31217584-GCST008046-EFO_0004530.h.tsv.gz \
--snp_column hm_rsid \
--effect_allele_column effect_allele \
--non_effect_allele_column other_allele \
--beta_column beta \
--se_column standard_error \
--pvalue_column p_value \
--output_file $RESULTS/test_Triglyceride_GTExv8_mashr_Whole_Blood.csv
#mashr output
INFO - Processing GWAS command line parameters
INFO - Building beta for /home/hwheeler1/test_SPrediXcan/31217584-GCST008046-EFO_0004530.h.tsv.gz and /home/hwheeler1/PTRS_MESA/predictdb/eqtl/mashr/mashr_Whole_Blood.db
INFO - Reading input gwas with special handling: /home/hwheeler1/test_SPrediXcan/31217584-GCST008046-EFO_0004530.h.tsv.gz
INFO - Processing input gwas
INFO - Aligning GWAS to models
INFO - Trimming output
INFO - Successfully parsed input gwas in 134.36815129593015 seconds
INFO - Started metaxcan process
INFO - Loading model from: /home/hwheeler1/PTRS_MESA/predictdb/eqtl/mashr/mashr_Whole_Blood.db
INFO - Loading covariance data from: /home/hwheeler1/PTRS_MESA/predictdb/eqtl/mashr/mashr_Whole_Blood.txt.gz
INFO - Processing loaded gwas
INFO - Started metaxcan association
INFO - 0 % of model's snps used
INFO - Sucessfully processed metaxcan association in 4.979896888136864 seconds
#when I add --verbosity 1
INFO - Processing GWAS command line parameters
INFO - Building beta for /home/hwheeler1/test_SPrediXcan/31217584-GCST008046-EFO_0004530.h.tsv.gz and /home/hwheeler1/PTRS_MESA/predictdb/eqtl/mashr/mashr_Whole_Blood.db
INFO - Reading input gwas with special handling: /home/hwheeler1/test_SPrediXcan/31217584-GCST008046-EFO_0004530.h.tsv.gz
INFO - Processing input gwas
Level 9 - Calculating zscore from pvalue
Level 9 - Acquiring sign from beta
INFO - Aligning GWAS to models
INFO - Trimming output
INFO - Successfully parsed input gwas in 133.81744511052966 seconds
INFO - Started metaxcan process
INFO - Loading model from: /home/hwheeler1/PTRS_MESA/predictdb/eqtl/mashr/mashr_Whole_Blood.db
INFO - Loading covariance data from: /home/hwheeler1/PTRS_MESA/predictdb/eqtl/mashr/mashr_Whole_Blood.txt.gz
INFO - Processing loaded gwas
INFO - Started metaxcan association
Level 9 - Processing gene 0:ENSG00000000457.13
INFO - Unexpected error: Last 2 dimensions of the array must be square

MultiXcan problem parsing --expression_pattern

Hi there, I'm trying to run MultiXcan with previous PrediXcan expression results.

I'm using the following base commandline parameters with various --expression_pattern patterns:

python MulTiXcan.py --expression_folder input/ --input_phenos_file phenotype/pheno.fam --input_phenos_column pheno --output output/pmat_ --verbosity 5 --throw 

My directory layout looks like this:

├── input
│   ├── results_blood_pmat_predicted_expression.txt
│   └── results_cortex_pmat_predicted_expression.txt
├── output
│   └── pmat_
└── phenotype
    ├── pheno.fam
    └── pheno_full.fam

I tried several --expression_pattern values.

Not defining it, so that it defaults to None, results in an "almost" successful run: all genes are processed, but the run ultimately fails, as the result file output/pheno_ looks like this:

gene    pvalue  n_models    n_samples   p_i_best    m_i_best    p_i_worst   m_i_worst   status                                       
ENSG00000116032.5   NA  2   NA  NA  NA  NA  NA  name_'results_blood_pmat_predicted_expression'_is_not_defined                        
ENSG00000174132.8   NA  1   NA  NA  NA  NA  NA  name_'results_blood_pmat_predicted_expression'_is_not_defined 

When using --expression_pattern ".*txt" I get the error:

File "metax/expression/PlainTextExpression.py", line 81, in _structure
    name = _regex.match(file).group(1)
IndexError: no such group

When I change the line in question to use group 0 instead:

  name = _regex.match(file).group(0)

I get the same problem as when leaving it to default to None.
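The `IndexError: no such group` above is what Python's `re` module raises when `.group(1)` is requested from a pattern that has no parenthesised capture group, which is the case for `".*txt"`. Below is a minimal sketch, assuming the name is meant to come from the first capture group (as the `group(1)` call in the traceback suggests). The second pattern is a hypothetical one fitted to the file names in the directory listing, not a documented MetaXcan default, and the thread does not confirm whether it also resolves the `is_not_defined` status.

```python
import re

# File names from the directory layout above.
files = [
    "results_blood_pmat_predicted_expression.txt",
    "results_cortex_pmat_predicted_expression.txt",
]

# No capture group: group(1) does not exist, hence "no such group".
no_group = re.compile(r".*txt")
try:
    no_group.match(files[0]).group(1)
    error = None
except IndexError as exc:
    error = str(exc)

# Hypothetical pattern with one capture group around the tissue name.
with_group = re.compile(r"results_(.*)_pmat_predicted_expression\.txt")
tissues = [with_group.match(f).group(1) for f in files]

print(error)
print(tissues)
```

With a capture group, the extracted names are short identifiers rather than whole file names.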

error using train_models.py

I would like to use MetaXcan with eQTL data from Blueprint. To understand the process of creating input files for PrediXcan, I am following your tutorial with the gEUVADIS data, but I'm afraid I didn't get far. When running "python train_models.py", which submits 22 jobs to the HPC, I get the errors below. Can you please advise? Many thanks!

qsub -v study=gEUVADIS,expr_RDS=../../data/intermediate/expression_phenotypes/geuvadis.expr.RDS,geno=../../data/intermediate/genotypes/geuvadis.snps.chr1.txt,gene_annot=../../data/intermediate/annotations/gene_annotation/gencode.v12.genes.parsed.RDS,snp_annot=../../data/intermediate/annotations/snp_annotation/geuvadis.annot.chr1.RDS,n_k_folds=10,alpha=0.5,out_dir=../../data/intermediate/model_by_chr/,chrom=1,snpset=HapMap,window=1e6 -N gEUVADIS_model_chr1 -d /GWD/bioinfo/projects/RD-TSci-Software/CB/customrecipes/PredictDBPipeline/joblogs/example train_model_by_chr.pbs
qsub: ERROR! Wrong time format "/GWD/bioinfo/projects/RD-TSci-Software/CB/customrecipes/PredictDBPipeline/joblogs/example" specified to -d option

running MetaXcan with Blueprint eQTL summary statistics

Dear MetaXcan authors,

I would like to run MetaXcan with Blueprint. I have downloaded Blueprint genome-wide eQTL summary statistics, but I don't have the raw data. I thought I could still use MetaXcan, but now having gone through the example for creating the input files and databases (necessarily through PrediXcan), I don't think I can anymore. Can you please confirm if one can run MetaXcan having only genome-wide eQTL summary statistics? Many thanks.

Best,
Ioanna

SPrediXcan script execution __version__

Hi,

I just had two comments on this whole repository.
First, thank you for the detailed tutorial. I think there is a small mistake in the "Setup and Usage Example on a UNIX-like operating system" section: it uses a script called MetaXcan.py that is not present in the software folder. I don't know whether that was just for the example or a typo.

Then I tried to use the SPrediXcan script on my data, but the following error occurred:

Traceback (most recent call last):
  File "./SPrediXcan.py", line 6, in <module>
    __version__ = metax.__version__
AttributeError: module 'metax' has no attribute '__version__'

I've checked the __init__ file in the metax folder and there is a version in the code ("0.7.5"), so I don't know where the error comes from. Any ideas would be welcome.

Thank you,
David

Fail when using installed software

Hi,

Though everything seems to work fine when running the various tools from MetaXcan/software, that is not the case when attempting to run the installed version (after using the setup script). Only SPrediXcan works after installation; here are the error messages for the others:

PrediXcan.py --help
Traceback (most recent call last):
  File "/data2/home/hhx037/python3/bin/PrediXcan.py", line 4, in <module>
    __import__('pkg_resources').run_script('MetaXcan==0.7.5', 'PrediXcan.py')
  File "/data2/home/hhx037/python3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 667, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/data2/home/hhx037/python3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1471, in run_script
    exec(script_code, namespace, namespace)
  File "/data2/home/hhx037/python3/lib/python3.6/site-packages/MetaXcan-0.7.5-py3.6.egg/EGG-INFO/scripts/PrediXcan.py", line 8, in <module>
ModuleNotFoundError: No module named 'Predict'
MulTiXcan.py --help
Traceback (most recent call last):
  File "/data2/home/hhx037/python3/bin/MulTiXcan.py", line 4, in <module>
    __import__('pkg_resources').run_script('MetaXcan==0.7.5', 'MulTiXcan.py')
  File "/data2/home/hhx037/python3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 667, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/data2/home/hhx037/python3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1471, in run_script
    exec(script_code, namespace, namespace)
  File "/data2/home/hhx037/python3/lib/python3.6/site-packages/MetaXcan-0.7.5-py3.6.egg/EGG-INFO/scripts/MulTiXcan.py", line 13, in <module>
ModuleNotFoundError: No module named 'metax.predixcan'
SMulTiXcan.py --help
Traceback (most recent call last):
  File "/data2/home/hhx037/python3/bin/SMulTiXcan.py", line 4, in <module>
    __import__('pkg_resources').run_script('MetaXcan==0.7.5', 'SMulTiXcan.py')
  File "/data2/home/hhx037/python3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 667, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/data2/home/hhx037/python3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1471, in run_script
    exec(script_code, namespace, namespace)
  File "/data2/home/hhx037/python3/lib/python3.6/site-packages/MetaXcan-0.7.5-py3.6.egg/EGG-INFO/scripts/SMulTiXcan.py", line 11, in <module>
ModuleNotFoundError: No module named 'metax.cross_model'

Is there any way I can fix this, or does the setup script need to be altered?

Cheers,

Steph

SNP column name cannot be found when using SPrediXcan

Hi Alvaro,
I get an error when following the SPrediXcan tutorial. The GWAS file I prepared seems to be in the same format as the example.
I wonder why the SNP column name cannot be found. Thank you very much.

Best wishes,
Crane King

PS:
The head of the GWAS file:
SNP CHR BP A1 A2 Z N FRQ beta se z pval
rs11250701 10 1689546 A G 0.514064827551 10480.9119402 0.3588 0.00986809128193 0.0115398329315 0.855132941739 0.392477554721
rs3750679 10 1228623 T C 0.286823567907 11272.3267685 0.3181 0.003938126234 0.0115548679411 0.340819666142 0.733239344476
rs1079389 10 1161980 A G -1.33442191526 12795.4215107 0.0616 -0.0158727319187 0.0213125995792 -0.74475813519 0.456417947609
rs904962 10 1372501 T C 0.388541499779 9608.96696345 0.0726 0.00469197622864 0.0220624100173 0.21266834516 0.831585658841

The error:
./SPrediXcan.py \
--model_db_path /media/EXTend2018/Wanghe2019/traits/TwosampleMR/JTI/MetaXcan/data/PsychENCODE_brain_prefrontal_cortex.db \
--covariance /media/EXTend2018/Wanghe2019/traits/TwosampleMR/JTI/MetaXcan/data/covariance.PsychENCODE.txt.gz \
--gwas_folder /media/EXTend2018/Wanghe2019/traits/TwosampleMR/JTI/MetaXcan/data \
--gwas_file_pattern ".*gz" \
--snp_column SNP \
--effect_allele_column A1 \
--non_effect_allele_column A2 \
--beta_column beta \
--pvalue_column pval \
--output_file /media/EXTend2018/Wanghe2019/traits/TwosampleMR/JTI/MetaXcan/result/test.csv
INFO - Processing GWAS command line parameters
INFO - Building beta for BIP_mtag.txt.gz and /media/EXTend2018/Wanghe2019/traits/TwosampleMR/JTI/MetaXcan/data/PsychENCODE_brain_prefrontal_cortex.db
INFO - Reading input gwas with special handling: /media/EXTend2018/Wanghe2019/traits/TwosampleMR/JTI/MetaXcan/data/BIP_mtag.txt.gz
INFO - Processing input gwas
INFO - Aligning GWAS to models
INFO - Trimming output
INFO - Building beta for PsychENCODE.txt.gz and /media/EXTend2018/Wanghe2019/traits/TwosampleMR/JTI/MetaXcan/data/PsychENCODE_brain_prefrontal_cortex.db
INFO - Reading input gwas with special handling: /media/EXTend2018/Wanghe2019/traits/TwosampleMR/JTI/MetaXcan/data/PsychENCODE.txt.gz
ERROR - Did not find snp colum name
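The log shows a second file, PsychENCODE.txt.gz, being read as a GWAS just before the error, which suggests the pattern ".*gz" selects more than the intended summary statistics. The sketch below assumes that --gwas_file_pattern is applied as a Python regular expression to every file name in --gwas_folder. The `select` helper and the narrower pattern are hypothetical illustrations, not MetaXcan code.

```python
import re

# File names taken from the command and log above; all live in the same data folder.
folder_listing = [
    "BIP_mtag.txt.gz",
    "PsychENCODE.txt.gz",
    "covariance.PsychENCODE.txt.gz",
    "PsychENCODE_brain_prefrontal_cortex.db",
]

def select(pattern, names):
    """Return the names that a regular expression would pick up (hypothetical helper)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]

broad = select(r".*gz", folder_listing)            # every gzipped file
narrow = select(r"^BIP_mtag.*gz$", folder_listing)  # only the GWAS file

print(broad)
print(narrow)
```

Keeping the GWAS files in a folder of their own, as the tutorial layout (data/GWAS) does, avoids the need for a narrower pattern.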

getLogger() message

Dear MetaXcan developers,

I tried to run the documentation example:

./MetaXcan.py \
--model_db_path data/DGN-WB_0.5.db \
--covariance data/covariance.DGN-WB_0.5.txt.gz \
--gwas_folder data/GWAS \
--gwas_file_pattern ".*gz" \
--snp_column SNP \
--effect_allele_column A1 \
--non_effect_allele_column A2 \
--beta_column BETA \
--pvalue_column P \
--output_file results/test.csv

but got the error message below. Is there a way to fix it?

Traceback (most recent call last):
  File "./MetaXcan.py", line 55, in <module>
    Logging.configureLogging(int(args.verbosity))
  File "/home/jhz22/D/genetics/hakyimlab/ftp/MetaXcan/software/metax/Logging.py", line 7, in configureLogging
    logger = logging.getLogger()
AttributeError: 'module' object has no attribute 'getLogger'
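An error of this shape usually means that `import logging` resolved to something other than the standard library module, for example a local file whose name collides with it. That is one possible cause and is not confirmed in this thread. A minimal sketch for checking which file was actually imported, using only standard library calls:

```python
import logging
import os
import sysconfig

# Where the interpreter keeps its standard library.
stdlib_dir = os.path.realpath(sysconfig.get_paths()["stdlib"])

# Where the imported "logging" module actually lives.
module_path = os.path.realpath(logging.__file__)

is_stdlib = module_path.startswith(stdlib_dir)
has_get_logger = hasattr(logging, "getLogger")

print(module_path)
print(is_stdlib, has_get_logger)
```

If the printed path points into the MetaXcan tree rather than the Python installation, a name collision is the likely culprit.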

I have

uname -a
Linux jhz22-VirtualBox 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:16:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Jing Hua Zhao

new GTEx release

Dear MetaXcan authors,

I understand there has recently been a new GTEx release. Are you planning to update the GTEx data that come with your software? If not, would it be easy for a user to transform the GTEx release into the format needed for MetaXcan? Apologies if that's a naive question; I have not investigated whether the format is straightforward.

Best,
Ioanna

Error message: "INFO - Unexpected error: nothing to repeat"

Hi, I'm trying to use MetaXcan on some results from a meta-analysis GWAS study. The study results are in a whitespace-delimited file (meta_08092017_combined.TBL), like this:
MarkerName Allele1 Allele2 Freq1 FreqSE MinFreq MaxFreq Effect StdErr P.value Direction
rs17216707 t c 0.7971 0.0041 0.7916 0.8037 0.0451 0.0051 1.182e-18 ?-++?+?+
rs6127099 a t 0.7189 0.0067 0.7016 0.7231 0.0449 0.0052 4.106e-18 ?-?+?+?+
rs4074995 a g 0.2625 0.0553 0.05 0.2848 -0.0344 0.0042 2.631e-16 ----+---
rs209955 t c 0.2627 0.061 0.05 0.2978 -0.0367 0.0045 7.252e-16 -+?-----
rs17217119 a g 0.8034 0.0445 0.7844 0.95 0.0425 0.0054 2.374e-15 ++?+++?+
rs11746443 a g 0.2507 0.0582 0.05 0.2768 -0.0361 0.0046 5.683e-15 --?-----
rs687289 a g 0.3192 0.0811 0.05 0.3624 0.0338 0.0044 8.91e-15 ++?+-+++
rs11741640 a g 0.269 0.0025 0.2675 0.276 -0.0385 0.005 1.571e-14 ?-?-?-?-
rs209957 a g 0.7326 0.0585 0.7016 0.95 0.0324 0.0042 1.968e-14 +-++++++

I've tried the MetaXcan command, as follows:
$ ./MetaXcan.py \
--model_db_path data/DGN-WB_0.5.db \
--covariance data/covariance.DGN-WB_0.5.txt.gz \
--gwas_folder data/GWAS \
--gwas_file_pattern "*.TBL" \
--snp_column MarkerName \
--effect_allele_column Allele1 \
--non_effect_allele_column Allele2 \
--beta_column Effect \
--pvalue_column P.value \
--output_file results/fgf23.csv

and get the following error:
INFO - Unexpected error: nothing to repeat

I've also tried the M03_betas command, as follows, with the same error.
$ ./M03_betas.py \
--model_db_path data/DGN-WB_0.5.db \
--output_folder temp/beta_f \
--gwas_folder data/GWAS \
--gwas_file_pattern "*.TBL" \
--snp_column MarkerName \
--non_effect_allele_column Allele2 \
--effect_allele_column Allele1 \
--beta_column Effect \
--pvalue_column P.value
INFO - Unexpected error: nothing to repeat

Everything works just fine when I'm running the example.

Any thoughts as to what might be going on?

Thanks!!

Cassy
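"nothing to repeat" is the message Python's `re` module gives when a pattern starts with a quantifier such as `*`. The sketch below assumes that --gwas_file_pattern is compiled as a Python regular expression (consistent with the tutorial's ".*gz") rather than expanded as a shell glob. The regex shown is one possible equivalent of the glob, not a value taken from the documentation.

```python
import re

glob_style = "*.TBL"      # shell glob, as passed in the commands above
regex_style = r".*\.TBL"  # regular-expression equivalent

try:
    re.compile(glob_style)
    message = None
except re.error as exc:
    message = str(exc)

# The regex form matches the file name mentioned in the post.
matches = bool(re.match(regex_style, "meta_08092017_combined.TBL"))

print(message)
print(matches)
```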

Spredixcan.py output effect_size column NA

Thank you for the nice software and update to GTEX v8!

I'm running the Python 3 based SPrediXcan.py.

Is there an option that yields values other than NA in the effect_size column of the output?
I did full harmonization of my GWAS (no imputation) and tested the options
--beta_column
--zscore_column
--pvalue_column
but they all gave the same results: values in the zscore column, while the effect_size column was NA in the output.

PrediXcan MashR model 0% of model's snps used

Hi MetaXcan group,
I am trying to run PrediXcan using imputed genetic data with variant IDs formatted like 1:13380:C:G. However, I keep getting an empty file with "0 % of models' snps used". I also tried adding --keep_non_rsid based on a similar post, but that option seems to be specific to the MetaXcan program. Should I convert the IDs to the format 1_13380_C_G? Is there an option in PrediXcan that converts the colons into underscores directly? I would really appreciate your input. Thank you very much!
Here is the command I used:
python3 ./Predict.py --model_db_path $DATA/mashr_Brain_Frontal_Cortex_BA9.db \
--model_db_snp_key varID \
--vcf_genotypes $CARTA/AxiomVCF.vcf \
--vcf_mode imputed \
--liftover $DATA/hg19ToHg38.over.chain.gz \
--on_the_fly_mapping METADATA "chr{}_{}_{}_{}_b38" \
--prediction_output $RESULT/Frontalcortex__predict.txt \
--prediction_summary_output $RESULT/frontalcortex__summary.txt \
--verbosity 9 \
--throw

and here is the output:
INFO - Loading samples
INFO - Loading model
INFO - Acquiring on-the-fly mapping
INFO - Preparing genotype dosages
INFO - Acquiring liftover conversion
INFO - Setting whitelist from available models
INFO - Processing genotypes
INFO - Preparing prediction
INFO - Couldn't import h5py_cache. Anyway, this dependency should be removed. It has been folded into h5py
Level 9 - Processing vcfs
Level 9 - Processing vcf /mnt/sample/data/untared/AxiomVCF.vcf
INFO - 0 % of models' snps used
INFO - Storing prediction
INFO - Saving prediction as a text file
INFO - Saving summary
INFO - Successfully predicted expression in 2.8029250 seconds

Thank you very much!
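For reference, the GTEx v8 models key variants by IDs of the form chr1_13380_C_G_b38 (chromosome, position, reference allele, alternate allele, build). The sketch below only illustrates that string layout. `to_gtex_v8_id` is a hypothetical helper, not part of Predict.py, and it performs no liftover, so the position stays in whatever build the input uses.

```python
def to_gtex_v8_id(variant, build="b38"):
    """Turn 'chrom:pos:ref:alt' into 'chr{chrom}_{pos}_{ref}_{alt}_{build}'.

    Hypothetical helper for illustration; it reformats the string only and
    does not change the genomic coordinate.
    """
    chrom, pos, ref, alt = variant.split(":")
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return "chr{}_{}_{}_{}_{}".format(chrom, pos, ref, alt, build)

example = to_gtex_v8_id("1:13380:C:G")
print(example)
```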

mismatch between gene2pheno and WB-DGN models

Hi,
I used the 'gene2pheno' database for A2LD1 gene with WB-DGN-0.5 models. It says that 7 SNPs are in the PredictDB model, and 5 were used. However, when I downloaded the predictDB files, I found 54 SNPs for the same gene. On top of that, the R^2 values do not match between the 2 databases.

Could you help me figure out what your strategy for filtering the SNPs was?
Thanks,
Kevin Vervier, PhD

MulTiXcan Error: Will not import single tissue expression text files

Hi, I'm having trouble running MulTiXcan with my completed single tissue expression text files (example text file with the first two columns for FID and IID removed: example_tissue_exprs_output.txt). I keep receiving this error output (error.txt) after running MulTiXcan using these settings:

/data100t1/gapps/MetaXcan/software/MulTiXcan.py \
--expression_folder ./predicted_gene_exprs/ \
--verbosity 1 \
--throw \
--mode logistic \
--input_phenos_file MAXUNRELATED_PHENO.txt \
--input_phenos_column PHENO \
--covariates_file MAXUNRELATED_PHENO.txt \
--covariates PC1 PC2 PC3 PC4 PC5 PC6 \
--coefficient_output multi-tissue_coeff \
--output

Any help is appreciated, thank you!

tutorial data

When I try to run the tutorial data (please see the command below), I get the following message (please see the bottom):

./SPrediXcan.py \
--model_db_path data/DGN-WB_0.5.db \
--covariance data/covariance.DGN-WB_0.5.txt.gz \
--gwas_folder data/GWAS \
--gwas_file_pattern ".*gz" \
--snp_column SNP \
--effect_allele_column A1 \
--non_effect_allele_column A2 \
--beta_column BETA \
--pvalue_column P \
--output_file results/test.csv

my output

"WARNING - Issues processing gene THAP11, skipped
WARNING - Issues processing gene TMEM203, skipped
WARNING - Issues processing gene TMEM30B, skipped
WARNING - Issues processing gene ZBED6, skipped
WARNING - Issues processing gene ZFP112, skipped
WARNING - Issues processing gene ZNF643, skipped
WARNING - Issues processing gene ZNF747, skipped
INFO - Unexpected error: type object 'object' has no attribute 'dtype'"

A question about MetaXcan

Hi, thank you for your tools; I think they are very useful.
Can it make predictions for only a few tissues together (for example, 13 brain tissues) rather than all 49 tissues?
I ask because it throws the error "AttributeError: 'NoneType' object has no attribute 'group'" when I run just the 13 brain tissues.

So I ran spredixcan with the 13 brain tissue models, and then I ran smetaxcan with the 13 spredixcan results in the spredxcan/eqtl file (.csv). But it throws the error "No intersection between model names in MetaXcan Results and Prediction Models. Please verify your input".
Thank you. Looking forward to your reply.

Unexpected error: CRC check failed

Hi,

I am testing S-PrediXcan for the first time, using the example dataset and the code:

wget https://s3.amazonaws.com/imlab-open/Data/MetaXcan/sample_data/v0_5/sample_data.tar.gz  

tar -xzvpf sample_data.tar.gz  

./MetaXcan.py \
--model_db_path data/DGN-WB_0.5.db \
--covariance data/covariance.DGN-WB_0.5.txt.gz \
--gwas_folder data/GWAS \
--gwas_file_pattern ".*gz" \
--snp_column SNP \
--effect_allele_column A1 \
--non_effect_allele_column A2 \
--beta_column BETA \
--pvalue_column P \
--output_file results/test.csv

After processing the last chromosome, I get this "CRC check failed" error:

.....
INFO - Building beta for chr9.assoc.dosage.gz and data/DGN-WB_0.5.db
INFO - Reading input gwas with special handling: data/GWAS/chr9.assoc.dosage.gz
INFO - Processing input gwas
INFO - Aligning GWAS to models
INFO - Trimming output
INFO - Successfully parsed input gwas in 42.6134901047 seconds
INFO - Started metaxcan process
INFO - Loading model from: data/DGN-WB_0.5.db
INFO - Loading covariance data from: data/covariance.DGN-WB_0.5.txt.gz
INFO - Unexpected error: CRC check failed 0x4ed8f5d6 != 0xc37e2172L

What might be causing this? Thanks in advance!

PS: I am working on a conda python 2.7 environment with below packages & versions:

Name Version Build Channel

_libgcc_mutex 0.1 conda_forge conda-forge
blas 2.11 openblas conda-forge
ca-certificates 2019.11.27 0 anaconda
certifi 2019.11.28 py27_0 anaconda
intel-openmp 2019.5 281 anaconda
libblas 3.8.0 11_openblas conda-forge
libcblas 3.8.0 11_openblas conda-forge
libffi 3.2.1 he1b5a44_1006 conda-forge
libgcc-ng 9.2.0 h24d8f2e_1 conda-forge
libgfortran 3.0.0 1 conda-forge
libgfortran-ng 7.3.0 hdf63c60_0 anaconda
libgomp 9.2.0 h24d8f2e_1 conda-forge
liblapack 3.8.0 11_openblas conda-forge
liblapacke 3.8.0 11_openblas conda-forge
libopenblas 0.3.6 h5a2b251_2 anaconda
libstdcxx-ng 9.2.0 hdf63c60_1 conda-forge
mkl 2019.5 281 anaconda
mkl-service 2.3.0 py27he904b0f_0 anaconda
mkl_fft 1.0.10 py27_0 conda-forge
mkl_random 1.0.4 py27hf2d7682_0 conda-forge
ncurses 6.1 hf484d3e_1002 conda-forge
numpy 1.14.2 py27hdbf6ddf_1 anaconda
numpy-base 1.11.3 py27h2f8d375_12 anaconda
openblas 0.2.19 0 anaconda
openmp_impl 4.5 0_gnu conda-forge
openssl 1.1.1 h7b6447c_0 anaconda
pandas 0.24.2 py27he6710b0_0 anaconda
pip 19.3.1 py27_0 conda-forge
python 2.7.15 h5a48372_1009 conda-forge
python-dateutil 2.8.1 py_0 anaconda
pytz 2019.3 py_0 anaconda
readline 8.0 hf8c457e_0 conda-forge
scipy 1.2.1 py27he2b7bc3_0 anaconda
setuptools 44.0.0 py27_0 conda-forge
six 1.13.0 py27_0 anaconda
sqlalchemy 1.3.12 py27h7b6447c_0 anaconda
sqlite 3.30.1 hcee41ef_0 conda-forge
tk 8.6.10 hed695b0_0 conda-forge
wheel 0.33.6 py27_0 conda-forge
zlib 1.2.11 h516909a_1006 conda-forge
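"CRC check failed" is raised by Python's gzip handling when the checksum stored at the end of a .gz file does not match the decompressed data, which usually points to a truncated or corrupted download. Whether that is the cause here is not confirmed in the thread. A minimal Python 3 sketch for testing a file independently of MetaXcan follows; `gzip_is_intact` is a hypothetical helper name.

```python
import gzip
import zlib

def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses and passes its CRC check."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end: the CRC is only verified once the stream is exhausted.
            while handle.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers a bad CRC or a bad header; EOFError covers truncation.
        return False
```

For example, `gzip_is_intact("data/covariance.DGN-WB_0.5.txt.gz")` returning False would suggest downloading and extracting sample_data.tar.gz again. Note that a missing file also yields False, since `FileNotFoundError` is an `OSError`.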

SMultiXcan not recognizing the file pattern, unclear debugging

I am trying to run a combined tissue test using SMultiXcan.py

I have run MetaXcan for all of the tissues. Now I am feeding those results into SMultiXcan.py which uses them and the original prediction files to try to combine. The error is coming when it is trying to figure out the naming structure.

I get this error:
File "/projects/bsi/psych/s105109.goa/BP_Biobank/PrediXcan/MetaXcan2/MetaXcan/software/metax/metaxcan/MetaXcanResultsManager.py", line 45, in _parse_name
pheno, model = g[0], g[1]
IndexError: tuple index out of range

I’m just not sure about a fix. Below is the code I am using. I have bolded the two lines which I think are causing the problem. I need the program to connect the files (MetaXcan results and PrediXcan db models) with that pattern where (.*) is.

PROJDIR=/projects/bsi/psych/s105109.goa/BP_Biobank/PrediXcan
DBDIR=/data5/bsi/psych/s105109.goa/BP_Biobank/larry/PrediXcan/PredictDB
OUTDIR=/data5/bsi/psych/s105109.goa/BP_Biobank/batzler/PGC_Analysis/metaAnalysis_imputed/BMI.pcAdj.assoc.linear
OUTDIR2=/data5/bsi/psych/s105109.goa/BP_Biobank/WNT_Pathway/PGC

cd $PROJDIR/MetaXcan2/MetaXcan/software/

./SMulTiXcan.py \
--models_folder $DBDIR/GTEx-V6p-1KG-2016-11-16 \
--models_name_filter "TW_Brain(.*)_0.5_1KG.db" \
**--models_name_pattern "TW_Brain(.*)_0.5_1KG.db"** \
--snp_covariance $PROJDIR/MetaXcan/snps_covariance.txt.gz \
--metaxcan_folder $OUTDIR2/metaxcan_results \
--metaxcan_filter "TW_Brain(.*)_0.5_1KG.pgcbmi.caseonly.csv" \
**--metaxcan_file_name_parse_patter "TW_Brain(.*)_0.5_1KG.pgcbmi.caseonly.csv"** \
--gwas_file $OUTDIR2/ALL_BMI.pcAdj.assoc.linear_P_MAresultsFE.txt.gz \
--snp_column SNP \
--effect_allele_column A1 \
--non_effect_allele_column A2 \
--beta_column b \
--pvalue_column pval \
--cutoff_condition_number 30 \
--verbosity 7 \
--throw \
--output $OUTDIR2/brain_pgcbmi.caseonly.csv
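The failing line in the traceback, `pheno, model = g[0], g[1]`, reads two entries from `g`. If `g` holds the capture groups of the file-name pattern (an assumption based on the traceback alone), a pattern with a single `(.*)` cannot supply the second entry. The sketch below reproduces that. The file name and the two-group pattern are hypothetical, and it only demonstrates the tuple length: the traceback reads the phenotype from the first group and the model from the second, so a real pattern would need its groups in that order.

```python
import re

# Hypothetical result file name following the naming used in the command above.
name = "TW_Brain_Cortex_0.5_1KG.pgcbmi.caseonly.csv"

# One capture group, as in the posted pattern.
one_group = re.compile(r"TW_Brain(.*)_0.5_1KG.pgcbmi.caseonly.csv")
g = one_group.match(name).groups()
try:
    pheno, model = g[0], g[1]  # same unpacking as in the traceback
    error = None
except IndexError as exc:
    error = str(exc)

# Hypothetical pattern with two capture groups: the tuple now has two entries.
two_groups = re.compile(r"(TW_Brain.*)_0\.5_1KG\.(.*)\.csv")
g2 = two_groups.match(name).groups()

print(error)
print(g2)
```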

Providing sample ids with bgen files

Hi,
I am running Predict.py with some bgen files, and I have two issues. The first is that I am unable to pass sample IDs. I tried supplying the sample file with the --text_sample_ids option, but I still see the message "Sample IDs are not present in this file. I will generate them on my own". Is there another way I should supply them?

The second issue is that the run time is very long: many hours so far. What is the normal running time for predicting from the GTEx v8 models?

Thanks!
Jean

ERROR - Could not read input tissue database. Please try updating the tissue model files.

Hello,

I am trying to test if SPrediXcan.py can run on my cluster.
I am using: python-3.7.3
and following this tutorial: https://github.com/hakyimlab/MetaXcan/wiki/S-PrediXcan-Command-Line-Tutorial

The sample data was downloaded and I was running this:

/MetaXcan/software $ ./SPrediXcan.py \
--model_db_path data/DGN-WB_0.5.db \
--covariance data/covariance.DGN-WB_0.5.txt.gz \
--gwas_folder data/GWAS \
--gwas_file_pattern ".*gz" \
--snp_column SNP \
--effect_allele_column A1 \
--non_effect_allele_column A2 \
--beta_column BETA \
--pvalue_column P \
--output_file results/test.csv
INFO - Processing GWAS command line parameters
ERROR - Could not read input tissue database. Please try updating the tissue model files.

Please advise,
Ana

How to construct a PrediXcan MASHR model using my own data?

Hi,
I know that you provide GTEx v8 eQTL and sQTL PrediXcan MASHR models, but what if I want to build a model from my own data?
Could you provide the code used to build the GTEx eQTL models?

Best regards,
Yueming

Error on GWAS summary stats imputation

Hi

I'm trying to impute summary-stats data. I was able to run the harmonization pipeline properly, but I get an error when I run the imputation script on chr1. I'm using Python/3.8.2-GCCcore-9.3.0 and pyarrow-3.0.0.

$ python3 summary-gwas-imputation/src/gwas_summary_imputation.py \
-by_region_file data/data/eur_ld.bed.gz \
-gwas_file outpu/harmonized_gwas/Summary_results_iPSYCH_PGC_10_euro_excluding_span_filtered_harmo.txt.gz \
-parquet_genotype data/data/reference_panel_1000G/chr1.variants.parquet \
-parquet_genotype_metadata data/data/reference_panel_1000G/variant_metadata.parquet \
-window 100000 \
-parsimony 7 \
-chromosome 22 \
-regularization 0.1 \
-frequency_filter 0.01 \
-sub_batches 10 \
-sub_batch 0 \
--standardise_dosages \
-output results_summary_imputation/ADHD_chr1_sb0_reg0.1_ff0.01_by_region.txt.gz
INFO - Beginning process
INFO - Creating context by variant
INFO - Loading study
INFO - Loading variants' parquet file
Traceback (most recent call last):
File "summary-gwas-imputation/src/gwas_summary_imputation.py", line 97, in <module>
run(args)
File "summary-gwas-imputation/src/gwas_summary_imputation.py", line 60, in run
results = run_by_region(args)
File "summary-gwas-imputation/src/gwas_summary_imputation.py", line 40, in run_by_region
context = SummaryImputationUtilities.context_by_region_from_args(args)
File "/home/juditc/ADHD/GWAS_TDAH/TDAH/GWAS_TDAH_b38/MetaXcan/MetaXcan_nou/software/summary-gwas-imputation/src/genomic_tools_lib/summary_imputation/Utilities.py", line 229, in context_by_region_from_args
study = load_study(args)
File "/home/juditc/ADHD/GWAS_TDAH/TDAH/GWAS_TDAH_b38/MetaXcan/MetaXcan_nou/software/summary-gwas-imputation/src/genomic_tools_lib/summary_imputation/Utilities.py", line 162, in load_study
study = Parquet.study_from_parquet(args.parquet_genotype, args.parquet_genotype_metadata, chromosome=args.chromosome)
File "/home/juditc/ADHD/GWAS_TDAH/TDAH/GWAS_TDAH_b38/MetaXcan/MetaXcan_nou/software/summary-gwas-imputation/src/genomic_tools_lib/file_formats/Parquet.py", line 218, in study_from_parquet
_v = pq.ParquetFile(variants)
File "/home/juditc/.local/lib/python3.8/site-packages/pyarrow/parquet.py", line 217, in __init__
self.reader.open(source, use_memory_map=memory_map,
File "pyarrow/_parquet.pyx", line 949, in pyarrow._parquet.ParquetReader.open
File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
OSError: Couldn't deserialize thrift: TProtocolException: Exceeded size limit

Since the problem seems to be due to the size of the file, I also tried to impute chr22, but that gives me another error:

$ python3 summary-gwas-imputation/src/gwas_summary_imputation.py \
-by_region_file data/data/eur_ld.bed.gz \
-gwas_file outpu/harmonized_gwas/Summary_results_iPSYCH_PGC_10_euro_excluding_span_filtered_harmo.txt.gz \
-parquet_genotype data/data/reference_panel_1000G/chr22.variants.parquet \
-parquet_genotype_metadata data/data/reference_panel_1000G/variant_metadata.parquet \
-window 100000 \
-parsimony 7 \
-chromosome 22 \
-regularization 0.1 \
-frequency_filter 0.01 \
-sub_batches 10 \
-sub_batch 0 \
--standardise_dosages \
-output results_summary_imputation/ADHD_chr22_sb0_reg0.1_ff0.01_by_region.txt.gz
INFO - Beginning process
INFO - Creating context by variant
INFO - Loading study
INFO - Loading variants' parquet file
INFO - Loading variants metadata
Level 9 - Loading row group 21
INFO - Loading regions
INFO - Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO - NumExpr defaulting to 8 threads.
Level 9 - Selecting target regions with specific chromosome
Level 9 - Selecting target regions from sub-batches
Level 9 - generating GWAS whitelist
INFO - Loading gwas
INFO - Acquiring filter tree for 35799 targets
INFO - Processing gwas source
Level 9 - Loaded 6667 GWAS variants
Level 9 - Parsing GWAS
Level 9 - Processing region 1/3 [15927607.0, 17193405.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (22,15927607.0,17193405.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 2/3 [17193405.0, 17813322.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (22,17193405.0,17813322.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 3/3 [17813322.0, 19924835.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (22,17813322.0,19924835.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
INFO - Finished in 26.57472068723291 seconds

M03_betas.py doesn't work for odds ratio values

Hi Haky and Scott,

The M03_betas.py script works fine for a zscore-based summary file, but with an OR-based summary file it fails with "Unexpected error", and I couldn't figure out what the problem is. Can you verify whether the script works with an OR-based summary file?

Thanks

-Veera

Imputation problem

Hi,

I am trying to follow the steps outlined in the tutorial (https://github.com/hakyimlab/MetaXcan/wiki/Tutorial:-GTEx-v8-MASH-models-integration-with-a-Coronary-Artery-Disease-GWAS).

I am having a problem at the 'Imputation' stage when I run the example script. I get the following stdout and stderr messages:

INFO - Beginning process
INFO - Creating context by variant
INFO - Loading study
INFO - Loading variants' parquet file
INFO - Loading variants metadata
Level 9 - Loading row group 0
INFO - Loading regions
Level 9 - Selecting target regions with specific chromosome
Level 9 - Selecting target regions from sub-batches
Level 9 - generating GWAS whitelist
INFO - Loading gwas
INFO - Acquiring filter tree for 227041 targets
INFO - Processing gwas source
Level 9 - Loaded 59616 GWAS variants
Level 9 - Parsing GWAS
Level 9 - Processing region 1/14 [10583.0, 1961168.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (1,10583.0,1961168.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 2/14 [1961168.0, 3666172.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (1,1961168.0,3666172.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 3/14 [3666172.0, 4320751.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (1,3666172.0,4320751.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 4/14 [4320751.0, 5853833.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (1,4320751.0,5853833.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 5/14 [5853833.0, 7187275.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (1,5853833.0,7187275.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 6/14 [7187275.0, 9305140.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (1,7187275.0,9305140.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 7/14 [9305140.0, 10746927.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (1,9305140.0,10746927.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 8/14 [10746927.0, 11717784.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (1,10746927.0,11717784.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 9/14 [11717784.0, 12719464.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (1,11717784.0,12719464.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 10/14 [12719464.0, 14565015.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (1,12719464.0,14565015.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 11/14 [14565015.0, 16571235.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (1,14565015.0,16571235.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 12/14 [16571235.0, 18336405.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (1,16571235.0,18336405.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 13/14 [18336405.0, 20142656.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (1,18336405.0,20142656.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 14/14 [20142656.0, 21410095.0]
Level 8 - Roll out imputation
Level 8 - Preparing data
INFO - Error for region (1,20142656.0,21410095.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
INFO - Finished in 75.25838277116418 seconds

On further inspection, the error appears to originate from the function call "variants = _get_variants(context, ids)" on line 109 of the summary-gwas-imputation/src/genomic_tools_lib/summary_imputation/SmmaryInputation.py script.

Any advice would be much appreciated.

Best wishes,

Nay
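
Note that the run ends with "Finished" even though every region reported an error. A minimal sketch for summarizing such a log, assuming the line layout shown above ("Error for region (chromosome,start,end): message"); the helper name is hypothetical and not part of summary-gwas-imputation:

```python
import re

# Matches lines such as:
# INFO - Error for region (1,10583.0,1961168.0): AttributeError("...")
ERROR_LINE = re.compile(
    r"Error for region \((?P<chrom>\d+),(?P<start>[\d.]+),(?P<end>[\d.]+)\): (?P<message>.*)"
)


def summarize_region_errors(log_text):
    """Return (failed_regions, distinct_messages) found in a log dump."""
    failed = []
    messages = set()
    for line in log_text.splitlines():
        match = ERROR_LINE.search(line)
        if match:
            failed.append(
                (int(match.group("chrom")),
                 float(match.group("start")),
                 float(match.group("end")))
            )
            messages.add(match.group("message"))
    return failed, messages


sample_log = """Level 9 - Processing region 1/14 [10583.0, 1961168.0]
INFO - Error for region (1,10583.0,1961168.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
Level 9 - Processing region 2/14 [1961168.0, 3666172.0]
INFO - Error for region (1,1961168.0,3666172.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'")
"""

failed, messages = summarize_region_errors(sample_log)
print(len(failed), "regions failed with", len(messages), "distinct message(s)")
```

If all regions fail with the same message, as here, the cause is likely in the environment (for example the installed pyarrow version) rather than in any particular region of the data.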

Attribute Error; run(args)

I am trying to run the Predict.py script to apply a MASHR model.

  • My VCF file does not have rsid.

I used your example 5. The genotype is hg38-based.

printf "Predict expression\n\n"

python3 $dir/Predict.py \
--model_db_path $dir/mashr_Whole_Blood.db \
--model_db_snp_key varID \
--vcf_genotypes $dir/predixcan.blood.vcf.gz \
--vcf_mode genotyped \
--prediction_output $dir/Whole_Blood__predict.txt \
--prediction_summary_output $dir/Whole_Blood__summary.txt \
--verbosity 9 \
--throw \
--variant_mapping $dir/gtex_v8_eur_filtered_maf0.01_monoallelic_variants.txt.gz id rsid \
--on_the_fly_mapping METADATA "{}{}{}_{}_b38"

I keep getting the following error message

INFO - Loading samples
INFO - Loading model
INFO - Acquiring variant mapping
INFO - Acquiring on-the-fly mapping
INFO - Preparing genotype dosages
INFO - Setting whitelist from available models
INFO - Processing genotypes
INFO - Preparing prediction
INFO - Couldn't import h5py_cache. Anyway, this dependency should be removed. It has been folded into h5py
Level 9 - Processing vcfs
Level 9 - Processing vcf predixcan.blood.vcf.gz
INFO - 0 % of models' snps used
INFO - Storing prediction
Traceback (most recent call last):
  File "Predict.py", line 270, in <module>
    run(args)
  File "Predict.py", line 218, in run
    results.store_prediction()
AttributeError: 'NoneType' object has no attribute 'store_prediction'
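
The "0 % of models' snps used" line suggests that no variant id built from the VCF matched an id in the model. A minimal sketch of how a template like the one passed to --on_the_fly_mapping could expand, assuming it is filled with chromosome, position, reference allele and alternate allele in that order (an assumption about Predict.py, not confirmed in this thread). The helper name and the example variant are hypothetical. GTEx v8 variant ids have the form chr1_13550_G_A_b38.

```python
def make_variant_id(template, chromosome, position, ref_allele, alt_allele):
    """Fill a positional format template with variant metadata (hypothetical helper)."""
    return template.format(chromosome, position, ref_allele, alt_allele)


# Template with explicit separators, producing GTEx v8 style ids
gtex_style = "chr{}_{}_{}_{}_b38"
# Template exactly as it appears in the command above; underscores may have
# been lost when the page was rendered, so this may not be what was run
as_posted = "{}{}{}_{}_b38"

print(make_variant_id(gtex_style, 1, 13550, "G", "A"))  # chr1_13550_G_A_b38
print(make_variant_id(as_posted, 1, 13550, "G", "A"))   # 113550G_A_b38
```

If the VCF chromosome field already carries a "chr" prefix, the template would need to omit it. Either way, comparing one generated id against an id stored in the model database is a quick way to check the mapping.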

Unexpected error in test case

Hello,

I have been following along with your online example and running the appropriate modules (Python 2.7.10 and the correct versions of all modules; the compiler is gcc). When I attempt to run the example script (starts with ./MetaXcan.p ) I get the warning "INFO - unexpected error: 'module' object has no attribute 'to_numeric'".

Is there an easy fix to this issue?

Thanks
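
For reference, to_numeric is a pandas function that was added in pandas 0.17.0, so this message typically points to an older pandas installation. A minimal sketch of a version check, assuming a conventional "major.minor.patch" version string; the helper name is hypothetical:

```python
def supports_to_numeric(pandas_version):
    """Return True if the given pandas version string is 0.17 or newer."""
    parts = pandas_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    # pandas.to_numeric was introduced in release 0.17.0
    return (major, minor) >= (0, 17)


print(supports_to_numeric("0.16.2"))  # False
print(supports_to_numeric("0.17.0"))  # True
```

In a real environment the argument would be pandas.__version__.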

covariance file LD panel

Hello!
Can you please clarify whether the covariance file provided on Zenodo was made with all five populations of 1000G and not just EUR? I read this in the first MetaXcan paper, but I still wanted to confirm. If not, can you provide a reference for the EAS population? My goal is to predict gene expression for EAS summary stats.
Thank you for developing this approach, it's great work.
Thank you for developing this approach, its great work.

errors at imputation step

Hello,

I am trying to do the imputation step (below), using the sample data provided, and I am getting the following error messages:

# The following imputes a portion of the input GWAS.
# The imputation is meant to be split over many executions to improve parallelism
# so you would have to iterate -chromosome between [1,22] and -sub_batch between [0,_sub_batches],
# ideally in an HPC environment

python3 $GWAS_TOOLS/gwas_summary_imputation.py \
-by_region_file $DATA/eur_ld.bed.gz \
-gwas_file $OUTPUT/harmonized_gwas/CARDIoGRAM_C4D_CAD_ADDITIVE.txt.gz \
-parquet_genotype $DATA/reference_panel_1000G/chr1.variants.parquet \
-parquet_genotype_metadata $DATA/reference_panel_1000G/variant_metadata.parquet \
-window 100000 \
-parsimony 7 \
-chromosome 1 \
-regularization 0.1 \
-frequency_filter 0.01 \
-sub_batches 10 \
-sub_batch 0 \
--standardise_dosages \
-output $OUTPUT/summary_imputation_1000G/CARDIoGRAM_C4D_CAD_ADDITIVE_chr1_sb0_reg0.1_ff0.01_by_region.txt.gz
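
The iteration described in the comments can be sketched as a generator of per-job arguments. This is only an illustration: it assumes -sub_batch runs from 0 to sub_batches - 1 and covers only the flags that change between jobs; the function name is hypothetical.

```python
def imputation_jobs(sub_batches, chromosomes=range(1, 23)):
    """Yield the flags that vary per job: one job per (chromosome, sub_batch)."""
    for chromosome in chromosomes:
        for sub_batch in range(sub_batches):
            yield [
                "-chromosome", str(chromosome),
                "-sub_batches", str(sub_batches),
                "-sub_batch", str(sub_batch),
            ]


jobs = list(imputation_jobs(10))
print(len(jobs))  # 220 jobs: 22 chromosomes x 10 sub-batches
```

Each job would also need its own -parquet_genotype file and -output path, following the chr1 and sb0 naming seen in the command above.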


INFO - Beginning process
INFO - Creating context by variant
INFO - Loading study
INFO - Loading variants' parquet file
Traceback (most recent call last):
  File "summary-gwas-imputation/src/gwas_summary_imputation.py", line 97, in <module>
    run(args)
  File "summary-gwas-imputation/src/gwas_summary_imputation.py", line 60, in run
    results = run_by_region(args)
  File "summary-gwas-imputation/src/gwas_summary_imputation.py", line 40, in run_by_region
    context = SummaryImputationUtilities.context_by_region_from_args(args)
  File "/user/PrediXcan/sampledata/summary-gwas-imputation/src/genomic_tools_lib/summary_imputation/Utilities.py", line 229, in context_by_region_from_args
    study = load_study(args)
  File "/user/PrediXcan/sampledata/summary-gwas-imputation/src/genomic_tools_lib/summary_imputation/Utilities.py", line 162, in load_study
    study = Parquet.study_from_parquet(args.parquet_genotype, args.parquet_genotype_metadata, chromosome=args.chromosome)
  File "/user/PrediXcan/sampledata/summary-gwas-imputation/src/genomic_tools_lib/file_formats/Parquet.py", line 218, in study_from_parquet
    _v = pq.ParquetFile(variants)
  File "/home/user/.local/lib/python3.8/site-packages/pyarrow/parquet.py", line 199, in __init__
    self.reader.open(source, use_memory_map=memory_map,
  File "pyarrow/_parquet.pyx", line 1021, in pyarrow._parquet.ParquetReader.open
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
OSError: Couldn't deserialize thrift: TProtocolException: Exceeded size limit

What am I doing wrong? Could someone help me please?

Many thanks for your help.
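
One cheap sanity check before digging into pyarrow itself: a Parquet file starts and ends with the 4-byte magic string "PAR1". The sketch below, with a hypothetical helper name, only rules out an obviously truncated or non-Parquet file (for example an error page saved by a failed download). It does not diagnose the thrift error itself.

```python
import os

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if the file has the Parquet magic bytes at both ends."""
    # Smallest plausible file: leading magic + footer length + trailing magic
    if os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as handle:
        head = handle.read(4)
        handle.seek(-4, os.SEEK_END)
        tail = handle.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

It could be called on the file given to -parquet_genotype, for instance chr1.variants.parquet in the command above. If the check passes, the installed pyarrow version would be the next thing to look at.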

ImportError: No module named abc with PrediXcan.py script

Hi all,
I've been unable to run the PrediXcan.py script with newer versions of Python 2.7. I've tried on

Python 2.7.16 (default, Oct 7 2019, 17:36:04)
[GCC 8.3.0] on linux2

and

Python 2.7.17 (default, Nov 7 2019, 10:07:09)
[GCC 7.4.0] on linux2

Here is the error trace:

$ PrediXcan.py -h
Traceback (most recent call last):
  File "/usr/local/bin/PrediXcan.py", line 12, in <module>
    from metax.predixcan import PrediXcanAssociation
  File "/usr/local/bin/metax/predixcan/PrediXcanAssociation.py", line 5, in <module>
    import statsmodels.api as sm
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/api.py", line 5, in <module>
    from . import iolib
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/iolib/__init__.py", line 1, in <module>
    from .foreign import StataReader, genfromdta, savetxt
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/iolib/foreign.py", line 14, in <module>
    from statsmodels.compat.python import (lzip, lmap, lrange,
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/compat/__init__.py", line 1, in <module>
    from statsmodels.tools._testing import PytestTester
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/tools/__init__.py", line 1, in <module>
    from .tools import add_constant, categorical
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/tools/tools.py", line 11, in <module>
    from statsmodels.tools.validation import array_like
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/tools/validation/__init__.py", line 1, in <module>
    from .validation import (array_like, bool_like,  dict_like,
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/tools/validation/validation.py", line 1, in <module>
    from collections.abc import Mapping
ImportError: No module named abc

Thank you for maintaining a great tool and thanks for any help with this.
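
For context, collections.abc exists only in Python 3.3 and later; in Python 2.7 the same abstract classes live directly in collections. The trace therefore suggests that the installed statsmodels release targets Python 3 only. The sketch below shows the usual compatibility import to illustrate the cause. It is not a patch for statsmodels.

```python
try:
    # Python 3.3+: abstract base classes live in collections.abc
    from collections.abc import Mapping
except ImportError:
    # Python 2.7: the same class is exposed directly by collections
    from collections import Mapping

print(isinstance({}, Mapping))  # True
```

In practice the options are installing a statsmodels release that still supports Python 2.7, or running the Python 3 version of the software, which the README says is now the supported one.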
