SuShiE🍣

SuShiE (Sum of Shared Single Effect) is a Python software to fine-map causal SNPs, compute prediction weights, and infer effect size correlation for molecular data (e.g., mRNA levels and protein levels etc.) across multiple ancestries.

- We detest usage of our software or scientific outcome to promote racial discrimination.

SuShiE is described in

Improved multi-ancestry fine-mapping identifies cis-regulatory variants underlying molecular traits and disease risk.

Zeyun Lu, Xinran Wang, Matthew Carr, Artem Kim, Steven Gazal, Pejman Mohammadi, Lang Wu, Alexander Gusev, James Pirruccello, Linda Kachuri, Nicholas Mancuso.

Check here for full documentation.

Installation

Users can download the latest repository and then use pip:

git clone https://github.com/mancusolab/sushie.git
cd sushie
pip install .

We currently only support Python3.8+.

Before installation, we recommend to create a new environment using conda so that it will not affect the software versions of the other projects.

Get Started with Example

SuShiE software is very easy to use:

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --covar EUR.covar AFR.covar --output ./test_result

It can perform:

SuShiE: multi-ancestry fine-mapping accounting for ancestral correlation
Single-ancestry SuSiE (Sum of Single Effect)
Independent SuShiE: multi-ancestry SuShiE without accounting for correlation
Meta-SuSiE: single-ancestry SuSiE followed by meta-analysis
Mega-SuSiE: single-ancestry SuSiE on row-wise stacked data across ancestries
QTL effect size correlation estimation
cis-SNP heritability estimation
Cross-validation for SuShiE prediction weights
Convert prediction results to FUSION format, thus can be used in TWAS

See here for more details on how to use SuShiE.

If you want to use in-software SuShiE inference function, you can use following code as an example:

from sushie.infer import infer_sushie
# Xs is for genotype data, and it should be a list of numpy array whose length is the number of ancestry.
# ys is for phenotype data, and it should also be a list of numpy array whose length is the number of ancestry.
infer_sushie(Xs=X, ys=y)

You can play it with your own ideas!

Notes

SuShiE currently only supports continuous phenotype fine-mapping.
SuShiE currently only supports fine-mapping on autosomes.
SuShiE uses JAX with Just In Time compilation to achieve high-speed computation. However, there are some issues for JAX with Mac M1 chip. To solve this, users need to initiate conda using miniforge, and then install SuShiE using pip in the desired environment.

Version History

Version	Description
0.1	Initial Release
0.11	Fix the bug for OLS to compute adjusted r squared.
0.12	Update io.corr function so that report all the correlation results no matter cs is pruned or not.
0.13	Add `--keep` command to enable user to specify a file that contains the subjects ID SuShiE will perform on. Add `--ancestry_index` command to enable user to specify a file that contains the ancestry index for fine-mapping. With this, user can input single phenotype, genotype, and covariate file that contains all the subjects across ancestries. Implement padding to increase inference time. Record elbo at each iteration and can access it in the `infer.SuShiEResult` object. The alphas table now outputs the average purity and KL divergence for each `L`. Change `--kl_threshold` to `--divergence`. Add `--maf` command to remove SNPs that less than minor allele frequency threshold within each ancestry. Add `--max_select` command to randomly select maximum number of SNPs to compute purity to avoid unnecessary memory spending. Add a QC function to remove duplicated SNPs.
0.14	Remove KL-Divergence pruning. Enhance command line appearance and improve the output files contents. Fix small bugs on multivariate KL.
0.15	Fix several typos; add a sanity check on reading vcf genotype data by assigning gt_types==Unknown as NA; Add preprint information.
0.16	Add option to remove ambiguous SNPs; fix several bugs and enhance codes quality.

Support

For any questions, comments, bug reporting, and feature requests, please contact Zeyun Lu ([email protected]) and Nicholas Mancuso ([email protected]), and open a new thread in the Issue Tracker.

Other Software

Feel free to use other software developed by Mancuso Lab:

MA-FOCUS: a Bayesian fine-mapping framework using TWAS statistics across multiple ancestries to identify the causal genes for complex traits.
SuSiE-PCA: a scalable Bayesian variable selection technique for sparse principal component analysis
twas_sim: a Python software to simulate TWAS statistics.
FactorGo: a scalable variational factor analysis model that learns pleiotropic factors from GWAS summary statistics.
HAMSTA: a Python software to estimate heritability explained by local ancestry data from admixture mapping summary statistics.
Traceax: a Python library to perform stochastic trace estimation for linear operators.

This project has been set up using PyScaffold 4.1.1. For details and usage information on PyScaffold see https://pyscaffold.org/.

ancestry	rsq	p_value	N	trait
1	0.4932215908291845	4.133847921724767e-74	639	Trait
2	0.4743709425200131	3.0193789659893506e-91	639	Trait

Issue with running example data

Hello,

Thanks for developing this amazing software! I followed the instructions to install the package into a conda environment with Python 3.10.14 on a Linux system. The installation seems to be successful, but when I run the "Get Started with Example" command:

cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --covar EUR.covar AFR.covar --output ./test_result

I encounter the following error message:

===================================
             SuShiE v0.15.post1.dev3+g97cb3ad             
===================================
sushie finemap
        --pheno EUR.pheno
                AFR.pheno
        --vcf vcf/EUR.vcf
              vcf/AFR.vcf
        --covar EUR.covar
                AFR.covar
        --output ./test_result

Starting log...
[2024-06-15 00:27:34 - INFO] Detect phenotypes for Trait from 2 ancestries.
[2024-06-15 00:27:34 - INFO] Detect genotype data in vcf format.
[2024-06-15 00:27:34 - INFO] Detect covariates data.
[2024-06-15 00:27:34 - INFO] Fine-mapping finishes for Trait, and thanks for using our software. For bug reporting, suggestions, and comments, please go to https://github.com/mancusolab/sushie.
Traceback (most recent call last):
  File "/home/xxs410/.conda/envs/sushie/lib/python3.10/site-packages/sushie/cli.py", line 1014, in run_finemap
    rawData = io.read_data(
  File "/home/xxs410/.conda/envs/sushie/lib/python3.10/site-packages/sushie/io.py", line 116, in read_data
    bim, fam, bed = geno_func(geno_paths[idx])
  File "/home/xxs410/.conda/envs/sushie/lib/python3.10/site-packages/sushie/io.py", line 221, in read_vcf
    var.gt_types = jnp.where(var.gt_types == 3, jnp.nan, var.gt_types)
AttributeError: attribute 'gt_types' of 'cyvcf2.cyvcf2.Variant' objects is not writable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxs410/.conda/envs/sushie/bin/sushie", line 8, in <module>
    sys.exit(run_cli())
  File "/home/xxs410/.conda/envs/sushie/lib/python3.10/site-packages/sushie/cli.py", line 1641, in run_cli
    return _main(sys.argv[1:])
  File "/home/xxs410/.conda/envs/sushie/lib/python3.10/site-packages/sushie/cli.py", line 1635, in _main
    args.func(args)
  File "/home/xxs410/.conda/envs/sushie/lib/python3.10/site-packages/sushie/cli.py", line 1053, in run_finemap
    traceback.format_exception(
TypeError: format_exception() got an unexpected keyword argument 'etype'

I believe the issue is specific to the VCF format genotype data, as the error does not occur when I replace --vcf with --plink.

Thanks in advance!

Xinyu

mancusolab / sushie Goto Github PK

sushie's Introduction

SuShiE🍣

Installation

Get Started with Example

Notes

Version History

Support

Other Software

sushie's People

Contributors

Stargazers

Watchers

Forkers

sushie's Issues

Recommend Projects

Recommend Topics

Recommend Org