
focus's People

Contributors

quattro, ruthjohnson95

focus's Issues

Unable to match ID when importing fusion weights

Hi,
I was trying to import FUSION weights with focus import, but got the following warning for every gene in my FUSION weights:

Starting log...
[2019-12-05 17:54:55 - INFO] Preparing weight database
[2019-12-05 17:54:56 - INFO] Starting import from FUSION database test_WEIGHTS/WEIGHTS/train_weights.pos.new
[2019-12-05 17:54:56 - INFO] Querying mygene servers for gene annotations
[2019-12-05 17:55:00 - INFO] Starting individual model conversion
[2019-12-05 17:55:00 - WARNING] Unable to match ENSG00000117868 to Ensembl ID. Using symbol for ID
[2019-12-05 17:55:00 - WARNING] Unable to match ENSG00000111647 to Ensembl ID. Using symbol for ID
[2019-12-05 17:55:00 - WARNING] Unable to match ENSG00000168569 to Ensembl ID. Using symbol for ID
[2019-12-05 17:55:00 - WARNING] Unable to match ENSG00000137266 to Ensembl ID. Using symbol for ID
....

If I change the IDs to ENSG IDs and use --use-ens-id, I still get similar errors.

I also downloaded the FUSION example data from their website, https://data.broadinstitute.org/alkesgroup/FUSION/WGT/GTEx.Whole_Blood.tar.bz2
and encountered the same error.

Meanwhile, in my own Python session I can import mygene and run mg.querymany(ens_list, scopes='ensembl.gene') successfully.

Any thoughts about what may be the issue? Thanks a lot.

Yue

How to interpret results for a gene that appears in two regions

Hi @quattro,

I ran FOCUS and got results. However, I found one gene that appears in two regions, as shown below.

GENE CHR START END FUSION_pval cv.R2.pval twas_z PIP in_cred_set region
TRIT1 1 40303782 40349183 8.62E-02 0.065217 -0.406 0.000171 0 1:38732300-1:40199346
TRIT1 1 40303782 40349183 8.62E-02 0.065217 2.65 0.00479 0 1:40201007-1:41975226

The transcribed region of TRIT1 (40303782-40349183) lies within 1:40201007-1:41975226. Why does TRIT1 also have a result in 1:38732300-1:40199346? Could you explain?

Best Regards,
Hiroyuki
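One possible explanation, which the FOCUS source would need to confirm: if a gene is attached to every region that its model's cis window touches, rather than only the region containing its transcribed body, then a gene close to a region boundary will be tested in both regions. TRIT1 starts 104,436 bp after the end of 1:38732300-1:40199346, so any window of at least that size would reach the first region. The sketch below checks this; `window_overlaps_region` is a hypothetical helper, and the 500 kb window is an illustrative assumption, not a documented FOCUS setting.

```python
def window_overlaps_region(tx_start, tx_stop, region_start, region_stop, window=0):
    """True if [tx_start - window, tx_stop + window] overlaps [region_start, region_stop].

    `window` is a hypothetical cis-window size in bp, not a value taken from FOCUS.
    """
    lo = max(0, tx_start - window)
    hi = tx_stop + window
    return lo <= region_stop and hi >= region_start


# TRIT1 and the two chromosome 1 regions from the results above
trit1 = (40303782, 40349183)
regions = [(38732300, 40199346), (40201007, 41975226)]

for window in (0, 500_000):  # gene body only vs. an assumed 500 kb window
    print(window, [window_overlaps_region(*trit1, *r, window=window) for r in regions])
```

With no window only the second region matches; with the assumed window both do, which would reproduce the two rows above.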

How to fine-map loci without a GWAS p-value < 5e-8?

Hello,

I ran FOCUS and successfully fine-mapped loci with a GWAS p-value < 5e-8.
However, loci without any SNP at p < 5e-8 are skipped, as shown below.

[2020-12-26 15:10:40 - WARNING] No GWAS SNPs with p-value < 5e-08 found at 2:12419150 - 2:14335308. Skipping
[2020-12-26 15:10:40 - WARNING] No GWAS SNPs with p-value < 5e-08 found at 2:12419150 - 2:14335308. Skipping
[2020-12-26 15:10:40 - WARNING] No GWAS SNPs with p-value < 5e-08 found at 2:14335308 - 2:16329735. Skipping

Previously, I ran FUSION and identified significant genes at loci with no GWAS SNP at p < 5e-8. I would like to run FOCUS on those loci, but they are skipped.

If possible, would you tell me how to fine-map those loci?
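Other reports on this page pass `--p-threshold` to `focus finemap` (for example `--p-threshold 1` and `--p-threshold 5e-07`), and in the `5e-07` run the warning reads "No GWAS SNPs with p-value < 5e-07", which suggests the flag sets this cutoff. The sketch below only mirrors the skip rule implied by the warning text; `region_passes_threshold` is a hypothetical name, not FOCUS code.

```python
def region_passes_threshold(pvalues, p_threshold=5e-8):
    """Hypothetical mirror of the skip rule in the warning: a region is
    fine-mapped only if some GWAS SNP has p < p_threshold."""
    return any(p < p_threshold for p in pvalues)


locus_p = [3e-6, 1.2e-4, 0.02]  # made-up p-values for one sub-threshold locus
print(region_passes_threshold(locus_p))                    # skipped at 5e-8
print(region_passes_threshold(locus_p, p_threshold=1e-5))  # kept at 1e-5
```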

Integer column has NA values in column 3

I ran into problems while running "focus finemap"; the output looks like this:

         FOCUS v0.802

===================================
focus finemap
../Output/agnu_meta_Output_clean.trans_phar/agnu_meta_Output_clean.trans_phar_munge.sumstats.gz
../1000G_EUR_Phase3_plink/1000G.EUR.QC.11
GTEx_Adipose_Visceral_Omentum.db
--chr 11
--tissue Adipose_Visceral_Omentum
--p-threshold 1
--out ../Output/agnu_meta_Output_clean.trans_phar/TWASresults/GTEx_Adipose_Visceral_Omentum/GTEx_Adipose_Visceral_Omentum.chr11

Starting log...
[2022-08-30 19:59:53 - INFO] Preparing GWAS summary file
[2022-08-30 20:00:24 - INFO] Preparing reference SNP data
[2022-08-30 20:00:24 - ERROR] Integer column has NA values in column 3
[2022-08-30 20:00:24 - INFO] Finished twas & fine-mapping
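The log does not say which file failed, but the error appears right after "Preparing reference SNP data". Assuming the message comes from pandas reading the PLINK files with an integer dtype (and that "column 3" is 0-based, which in a .bim file would be the base-pair position), a quick scan for malformed .bim rows may locate the culprit. `find_bad_bim_rows` is a hypothetical helper:

```python
import io


def find_bad_bim_rows(handle):
    """Yield (line_number, line) for malformed PLINK .bim rows.

    A .bim row has six whitespace-separated fields: chromosome, variant ID,
    genetic position (cM), base-pair position, allele 1, allele 2. A row is
    flagged if it lacks six fields or if the cM / bp fields are not numeric.
    """
    for lineno, line in enumerate(handle, start=1):
        fields = line.split()
        if len(fields) != 6:
            yield lineno, line.rstrip("\n")
            continue
        try:
            float(fields[2])  # genetic position (cM)
            int(fields[3])    # base-pair position, must be an integer
        except ValueError:
            yield lineno, line.rstrip("\n")


# In-memory example; for a real file use open("1000G.EUR.QC.11.bim") instead.
example = io.StringIO(
    "11\trs1\t0\t1000\tA\tG\n"
    "11\trs2\t0\tNA\tC\tT\n"
    "11\trs3\t0\t3000\tG\n"
)
for lineno, line in find_bad_bim_rows(example):
    print(lineno, line)
```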

add_credible_set rejects genes with very high PIP values

Hi, first of all, thanks a lot for developing such an amazing tool. I recently found that finemap does not include genes with very high PIP values in the credible set.
The add_credible_set function in finemap rejects very high PIP values:
```python
df = df.sort_values(by=["pip"], ascending=False)

# add credible set flag
psum = np.sum(df.pip.values)
npost = df.pip.values / psum
csum = np.cumsum(npost)
in_cred_set = csum <= credible_set
df["in_cred_set"] = in_cred_set.astype(int)
```

For example, if df.pip.values = [0.999, 0.001, 0.001, 0.0001], csum will be [0.9979023, 0.9989012, 0.9999001, 1.0000000].
The first csum value is already greater than credible_set (for example 0.9), so the top gene is rejected.
My understanding is that such genes should be included. Could you help me with this?
Thanks in advance.
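A minimal sketch of one alternative rule (a suggestion, not FOCUS's actual fix): rank genes by PIP and keep every gene up to and including the first one at which the cumulative normalized PIP reaches `credible_set`, i.e. the smallest top-ranked set whose mass reaches the threshold. It works on a plain array instead of the DataFrame used in `add_credible_set`, and `credible_set_flags` is a hypothetical name.

```python
import numpy as np


def credible_set_flags(pip, credible_set=0.9):
    """Return 0/1 flags (in input order) marking the smallest set of
    top-ranked genes whose normalized PIPs sum to at least `credible_set`."""
    pip = np.asarray(pip, dtype=float)
    order = np.argsort(-pip, kind="stable")   # highest PIP first
    csum = np.cumsum(pip[order] / pip.sum())   # same normalization as add_credible_set
    # First index where csum >= credible_set; that gene is included too.
    n_in = int(np.searchsorted(csum, credible_set)) + 1
    flags = np.zeros(len(pip), dtype=int)
    flags[order[:n_in]] = 1
    return flags


print(credible_set_flags([0.999, 0.001, 0.001, 0.0001]))
```

Because the gene that crosses the threshold is always kept, a single gene with PIP 0.999 now forms the credible set on its own.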

Importing weights from FUSION

Hi,

I am trying to import FUSION weights to focus. The FUSION weights are directly downloaded from FUSION databases (http://gusevlab.org/projects/fusion/).

The directory test/ contains two entries:
~/test$ ls
Minor_Salivary_Gland Minor_Salivary_Gland.P01.pos

"Minor_Salivary_Gland" is a directory containing all the .wgt.RDat weight files, and Minor_Salivary_Gland.P01.pos is the file listing the weights (Minor_Salivary_Gland/*****.wgt.RDat).

From the test/ directory, I run:
focus import Minor_Salivary_Gland.P01.pos fusion --tissue Minor_Salivary_Gland --name GTEx --assay rnaseq --output DB_NAME

I always get this error:

Starting log...
[2019-07-10 12:30:56 - INFO] Preparing weight database
[2019-07-10 12:30:57 - ERROR] 'rpy2.rinterface.SexpEnvironment' object has no attribute 'find'
[2019-07-10 12:30:57 - INFO] Finished importing prediction models

I also tried importing weights from PrediXcan and that works, so FOCUS seems to be installed correctly. I therefore think the problem might be the FUSION weights, but the weights were downloaded directly and all the packages needed for FUSION and focus are installed.

Any suggestions will be appreciated. Thank you for your help!
Sincerely,
Sophie

P less than 5e-8 but it says it doesn't exist

Hi
Thank you for your tool. It is very helpful for us.
But I have a question about it.

  1. My GWAS has the columns SNP CHR BP A1 A2 BETA SE P MAF. After munging, P is no longer in the data, the fine-mapping result is empty, and the log file says there is no SNP with p < 5e-8 ....
    #my munge code
    focus munge $GWAS --N 80610 --N-cas 20806 --snp SNP --a1 A1 --a2 A2 --p P --frq MAF --output $CLEAN_OUT/AAA_combind_clean
    The munged result also has far fewer rows. (Did I do something wrong?)
    What is the main purpose of munging the data?
    #my fine map code
    focus finemap $CLEAN_OUT $REF $PREDICT_DB --chr 9 --tissue Brain_Caudate_basal_ganglia --plot --output $FINE_OUT/AAA_combind_clean_fine_chr9
  2. Can FOCUS only analyze one chromosome, or one tissue, at a time? Is it not possible to compare genes across multiple tissues simultaneously, such as the 13 GTEx brain tissues?

Thank you! I look forward to your reply!
Have a nice day!
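The munge flags used here (`--N`, `--N-cas`, `--snp`, `--a1`, `--a2`, `--p`, `--frq`) mirror LDSC's `munge_sumstats.py`, which replaces P and BETA with a signed Z column. Assuming `focus munge` does the same (check the header of the munged file), the p-value is still recoverable from Z, and a two-sided p < 5e-8 corresponds to roughly |Z| > 5.45. A small sketch; `p_from_z` is a hypothetical helper:

```python
import math


def p_from_z(z):
    """Two-sided p-value for a standard-normal Z-score, P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))


for z in (1.96, -5.0, 6.0):
    print(z, p_from_z(z), p_from_z(z) < 5e-8)
```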

Importing weights from FUSION

Hi,

I am trying to import FUSION weights to focus. The FUSION weights are directly downloaded from FUSION databases (http://gusevlab.org/projects/fusion/).

The directory test/ contains the following files:
~/test$ ls
YFS.BLOOD.RNAARR YFS.BLOOD.RNAARR.pos YFS.BLOOD.RNAARR.profile.err
YFS.BLOOD.RNAARR.list YFS.BLOOD.RNAARR.profile

"YFS.BLOOD.RNAARR" is a directory containing all the .wgt.RDat weight files, and YFS.BLOOD.RNAARR.pos is the file listing the weights (YFS.BLOOD.RNAARR/*****.wgt.RDat).

From the test/ directory, I run:
focus import YFS.BLOOD.RNAARR.pos fusion --tissue blood --name GTEx --assay array --output YFS

I always get this error:

===================================
FOCUS v0.7

focus import
YFS.BLOOD.RNAARR.pos
fusion
--tissue blood
--name GTEx
--assay array
--output YFS

Starting log...
[2021-09-17 00:48:07 - INFO] Preparing weight database
[2021-09-17 00:48:07 - ERROR] import_fusion() missing 1 required positional argument: 'session'
[2021-09-17 00:48:07 - INFO] Finished importing prediction models

Any suggestions will be appreciated.
Thanks!
Sincerely,
lyn

Posterior predictive check

The older R-based version of focus could perform posterior-predictive checks at each fine-mapped region. The python version should support this feature as well.

The simplest implementation would be an additional flag that indicates computing the simulation, and exporting the results per region.

unsupported operand type(s) for +: 'float' and 'str'

Hi FOCUS team,

Thanks for developing this useful tool.
I got the following error when I ran focus finemap.
I know the same issue was reported in #41, but it was closed without a solution.

I munged the sumstats before finemap without any trouble.
Also, I used PLINK bfiles (https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_plinkfiles.tgz)
and weight files (focus.db) as the Wiki page indicated.
FOCUS version (v0.802) seems to be up-to-date.

I would appreciate it if you could tell me a solution for this issue.

Best,
Nobuyuki

===================================
             FOCUS v0.802             
===================================
focus finemap
	../tmp/GWAS/input/munged.sumstats.gz
	../dat/focus_dat/1000G_EUR_Phase3_plink/1000G.EUR.QC.1
	../dat/focus_dat/focus.db
	--chr 1
	--locations 37:EUR
	--verbose
	--out ../tmp/GWAS/output/finemap_chr1

Starting log...
[2022-10-30 14:06:39 - INFO] Detecting 1 populations for fine-mapping.
[2022-10-30 14:06:39 - INFO] As a result, running single-population FOCUS.
[2022-10-30 14:06:39 - INFO] Preparing GWAS summary file for population at ../tmp/GWAS/input/munged.sumstats.gz.
[2022-10-30 14:07:11 - INFO] Preparing reference SNP data for population at ../dat/focus_dat/1000G_EUR_Phase3_plink/1000G.EUR.QC.1.
[2022-10-30 14:07:16 - INFO] Preparing weight database at ../dat/focus_dat/focus.db.
[2022-10-30 14:07:18 - INFO] Preparing user-defined locations at 37:EUR.
[2022-10-30 14:07:18 - INFO] Found 1703 independent regions on the entire genome.
[2022-10-30 14:07:18 - INFO] 133 independent regions currently used after being filtered on chromosome, start, and stop.
[2022-10-30 14:07:20 - INFO] Preparing data at region 1:14891511-1:16897730. Skipping if following warning occurs.
[2022-10-30 14:07:20 - INFO] Deciding prior probability for a gene to be causal.
[2022-10-30 14:07:20 - INFO] Using gencode file prior probability 0.010869565217391304.
[2022-10-30 14:07:48 - INFO] Fine-mapping starts at region 1:14891511-1:16897730.
[2022-10-30 14:07:48 - INFO] Aligning GWAS, LD, and eQTL weights for the single population. Region 1:14891511-1:16897730 will skip if following errors occur.
[2022-10-30 14:07:48 - ERROR] unsupported operand type(s) for +: 'float' and 'str'
[2022-10-30 14:07:48 - INFO] Finished TWAS & fine-mapping. Thanks for using FOCUS, and have a nice day!
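Python raises this exact message when a float is added to a string, so one plausible cause (not confirmed here) is a field that should be numeric but contains text, such as "." or "NA", in one of the inputs. Assuming the munged file is gzipped and tab-separated with a header row, the hypothetical helper below lists non-numeric values in chosen columns; the column names passed to it are placeholders to adapt to the actual header.

```python
import csv
import gzip
import os
import tempfile


def non_numeric_cells(path, columns, limit=10):
    """Return up to `limit` (row_number, column, value) tuples where a value in
    one of `columns` cannot be parsed as a float. Assumes a gzipped,
    tab-separated file with a header row (row 1)."""
    found = []
    with gzip.open(path, "rt", newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        for row_no, row in enumerate(reader, start=2):
            for col in columns:
                value = row.get(col)
                try:
                    float(value)
                except (TypeError, ValueError):  # None, "", ".", "NA", ...
                    found.append((row_no, col, value))
                    if len(found) >= limit:
                        return found
    return found


# Self-contained demo on a throwaway file; point it at munged.sumstats.gz instead.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.sumstats.gz")
    with gzip.open(demo, "wt") as out:
        out.write("SNP\tA1\tA2\tZ\tN\nrs1\tA\tG\t1.5\t1000\nrs2\tC\tT\t.\t1000\n")
    print(non_numeric_cells(demo, ["Z", "N"]))
```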

[ERROR] import_fusion 'Series' object has no attribute 'DIR'

Hello,

I wonder if someone knows what could be the issue/solution here.
I'm trying to look into the code 'pyfocus/models/convert.py' , but I'm not familiar with python.

The script I use:
focus import WEIGHTS/GTEx.Whole_Blood.pos fusion --tissue Whole_Blood --name GTEx --assay rnaseq --output fusion

Which print this:
Starting log...
[2022-04-11 12:29:08 - INFO] Preparing weight database
[2022-04-11 12:29:08 - INFO] Starting import from FUSION database WEIGHTS/GTEx.Whole_Blood.pos
[2022-04-11 12:29:08 - INFO] Querying mygene servers for gene annotations
[2022-04-11 12:29:15 - INFO] Starting individual model conversion
[2022-04-11 12:29:15 - ERROR] 'Series' object has no attribute 'DIR'
[2022-04-11 12:29:15 - INFO] Finished importing prediction models

Thanks a lot,
Emilio

Allow custom column names for predixcan import

Hello and thank you for this great piece of software! I'm having an issue importing a PrediXcan-format db (which runs fine with PrediXcan / MetaXcan), and I have no idea where the error comes from.

Command executed:

${FOCUS} import ${PREDIXCAN_DB_FILE} predixcan --tissue BLOOD --name INTERVAL.SomaLogic --assay array --output INTERVAL.SOMALOGIC.BLOOD.cis_1mb.focus.db

Log, with the error "list index out of range":

[2019-05-30 15:39:18 - INFO] Starting import from PrediXcan database ...
[2019-05-30 15:39:19 - INFO] Querying mygene servers for gene annotations
[2019-05-30 15:39:21 - INFO] Starting individual model conversion
[2019-05-30 15:39:21 - ERROR] list index out of range
[2019-05-30 15:39:21 - INFO] Finished importing prediction models

Could it be that the genes in the weights can't be found by the mygene package? Any pointers would be greatly appreciated; I'm happy to provide further info.

Cheers,

Bram
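Since the database works with PrediXcan/MetaXcan but fails on import, comparing its schema with a PrediXcan-format database that imports successfully may show whether table or column names differ. Python's built-in `sqlite3` can list every table and its columns; `describe_tables` is a hypothetical helper, and it does not encode which columns FOCUS expects.

```python
import sqlite3


def describe_tables(db_path):
    """Return {table_name: [column names]} for every table in a SQLite file.

    Opens the file read-only so a mistyped path raises instead of
    silently creating an empty database.
    """
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        tables = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk)
        return {t: [col[1] for col in con.execute(f'PRAGMA table_info("{t}")')]
                for t in tables}
    finally:
        con.close()


# Usage (path is a placeholder):
# print(describe_tables("path/to/predixcan_model.db"))
```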

Importing weights from FUSION

Hi @quattro
It's very kind of you to make this algorithm. I ran into a problem when trying to import FUSION weights into focus.
The R version is 4.0.4.
focus import GTEx.Cells_EBV-transformed_lymphocytes.P01/Cells_EBV-transformed_lymphocytes.P01.pos
fusion
--tissue Cells_EBV-transformed_lymphocytes
--name GTEx
--assay rnaseq
--output Focus.Model/fusion.EBV

Starting log...
[2021-04-20 13:25:39 - INFO] Preparing weight database
[2021-04-20 13:25:39 - ERROR] cannot load library '/usr/local/lib64/R/lib/libR.so': /usr/local/lib64/R/lib/libR.so: cannot open shared object file: No such file or directory
[2021-04-20 13:25:39 - INFO] Finished importing prediction models.

After switching to R 3.5.0, the error changed:

Starting log...
[2021-04-20 13:22:16 - INFO] Preparing weight database
Error: package or namespace load failed for ‘methods’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/opt/ohpc/pub/libs/gnu8/R/3.5.0/lib64/R/library/methods/libs/methods.so':
libR.so: cannot open shared object file: No such file or directory
Error: package or namespace load failed for ‘utils’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/opt/ohpc/pub/libs/gnu8/R/3.5.0/lib64/R/library/utils/libs/utils.so':
libR.so: cannot open shared object file: No such file or directory
Error: package or namespace load failed for ‘grDevices’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/opt/ohpc/pub/libs/gnu8/R/3.5.0/lib64/R/library/grDevices/libs/grDevices.so':
libR.so: cannot open shared object file: No such file or directory
Error: package or namespace load failed for ‘graphics’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/opt/ohpc/pub/libs/gnu8/R/3.5.0/lib64/R/library/grDevices/libs/grDevices.so':
libR.so: cannot open shared object file: No such file or directory
Error: package or namespace load failed for ‘stats’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/opt/ohpc/pub/libs/gnu8/R/3.5.0/lib64/R/library/grDevices/libs/grDevices.so':
libR.so: cannot open shared object file: No such file or directory
Error: package or namespace load failed for ‘methods’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/opt/ohpc/pub/libs/gnu8/R/3.5.0/lib64/R/library/methods/libs/methods.so':
libR.so: cannot open shared object file: No such file or directory
During startup - Warning messages:
1: package "methods" in options("defaultPackages") was not found
2: package ‘utils’ in options("defaultPackages") was not found
3: package ‘grDevices’ in options("defaultPackages") was not found
4: package ‘graphics’ in options("defaultPackages") was not found
5: package ‘stats’ in options("defaultPackages") was not found
6: package ‘methods’ in options("defaultPackages") was not found
R[write to console]: Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/opt/ohpc/pub/libs/gnu8/R/3.5.0/lib64/R/library/methods/libs/methods.so':
libR.so: cannot open shared object file: No such file or directory

[2021-04-20 13:22:16 - ERROR] Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/opt/ohpc/pub/libs/gnu8/R/3.5.0/lib64/R/library/methods/libs/methods.so':
libR.so: cannot open shared object file: No such file or directory

[2021-04-20 13:22:16 - INFO] Finished importing prediction models

Thank you for your help!
Sincerely,
LDH
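rpy2 loads R through its shared library `libR.so`. That library is only built when R is configured with `--enable-R-shlib`, and the dynamic linker must also be able to find it (the R 3.5.0 log shows R's own packages failing to resolve `libR.so`). The sketch checks both conditions for a given R home directory; `check_libR` is a hypothetical helper, and `R_HOME` is assumed to be the directory printed by `R RHOME`.

```python
import os
from pathlib import Path


def check_libR(r_home):
    """Report whether <r_home>/lib/libR.so exists and whether that lib
    directory is listed on LD_LIBRARY_PATH."""
    lib_dir = Path(r_home) / "lib"
    ld_paths = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return {
        "libR.so exists": (lib_dir / "libR.so").is_file(),
        "lib dir on LD_LIBRARY_PATH": str(lib_dir) in ld_paths,
    }


# Default taken from the path in the R 4.0.4 error above.
print(check_libR(os.environ.get("R_HOME", "/usr/local/lib64/R")))
```

If the file is missing, R likely needs to be rebuilt with `--enable-R-shlib`; if it exists but the linker still fails, adding its `lib` directory to `LD_LIBRARY_PATH` is the usual next step.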

[ERROR] Import submodule requires mygene and rpy2 to be installed

Hi,

I am trying to create weights using FUSION models and got the following error

===================================
FOCUS v0.7

focus import
../Banner/train_weights.pos
fusion
--tissue brain_dorsolateral_prefrontal_cortex
--name BAN
--assay rnaseq
--output ../Results/fusion

Starting log...
[2022-05-23 16:13:25 - INFO] Preparing weight database
[2022-05-23 16:13:25 - ERROR] Import submodule requires mygene and rpy2 to be installed.
[2022-05-23 16:13:25 - ERROR] No module named 'mygene'
[2022-05-23 16:13:25 - INFO] Finished importing prediction models

I would greatly appreciate any help from your end.
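The second error line shows that `mygene` cannot be imported by the Python interpreter that runs `focus`. Installing both packages into that same environment (e.g. `pip install mygene rpy2`) is the usual fix. To confirm what that interpreter sees, the sketch below reports whether each module is importable and, if so, its installed version; `check_optional_deps` is a hypothetical helper.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def check_optional_deps(names=("mygene", "rpy2")):
    """Map each module name to its installed version, or None if missing."""
    status = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            status[name] = None
            continue
        try:
            status[name] = version(name)
        except PackageNotFoundError:  # importable, but no distribution metadata
            status[name] = "installed (version unknown)"
    return status


print(check_optional_deps())
```

Run it with the same interpreter as `focus` (for example the `python` from the virtualenv FOCUS was installed into).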

[ERROR] ufunc 'isfinite' not supported for the input types

Hi,

Thanks for developing this nice tool for TWAS fine-mapping.

Is there a reason why fine-mapping works for certain regions but gives the error message below for others? Any ideas on how to solve this issue?

===================================
FOCUS v0.802

focus finemap
../FOCUS_results/SMK.cleaned.sumstats.gz
../1000G_EUR_Phase3_plink/1000G.EUR.QC.17
../FUSION_Weights/ROS_fxn.db
--p-threshold 5e-06
--location 37:EUR
--start 2206677
--stop 2228554
--chr 17
--plot
--out ../FOCUS_results/SkK_chr17

Starting log...
[2022-06-03 14:18:32 - INFO] Detecting 1 populations for fine-mapping.
[2022-06-03 14:18:32 - INFO] As a result, running single-population FOCUS.
[2022-06-03 14:18:32 - INFO] Preparing GWAS summary file for population at ../FOCUS_results/SkK.cleaned.sumstats.gz.
[2022-06-03 14:19:01 - INFO] Preparing reference SNP data for population at ../1000G_EUR_Phase3_plink/1000G.EUR.QC.17.
[2022-06-03 14:19:02 - INFO] Preparing weight database at ../FUSION_Weights/ROS_fxn.db.
[2022-06-03 14:19:02 - INFO] Preparing user-defined locations at 37:EUR.
[2022-06-03 14:19:02 - INFO] Found 1703 independent regions on the entire genome.
[2022-06-03 14:19:02 - INFO] 1 independent regions currently used after being filtered on chromosome, start, and stop.
[2022-06-03 14:19:02 - INFO] Preparing data at region 17:1928731-17:3702312. Skipping if following warning occurs.
[2022-06-03 14:19:02 - INFO] Deciding prior probability for a gene to be causal.
[2022-06-03 14:19:02 - INFO] Using gencode file prior probability 0.012048192771084338.
[2022-06-03 14:19:04 - INFO] Fine-mapping starts at region 17:1928731-17:3702312.
[2022-06-03 14:19:04 - INFO] Aligning GWAS, LD, and eQTL weights for the single population. Region 17:1928731-17:3702312 will skip if following errors occur.
[2022-06-03 14:19:04 - ERROR] ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
[2022-06-03 14:19:04 - INFO] Finished TWAS & fine-mapping. Thanks for using FOCUS, and have a nice day!
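NumPy produces this exact message when `np.isfinite` is applied to an array of dtype `object`, for example a column that pandas kept as text because some entries are not numbers. That could explain why only some regions fail, if only those regions contain such entries; this is an inference, not a confirmed diagnosis. The sketch reproduces the error and shows one way to coerce values, with unparseable entries becoming NaN; `coerce_numeric` is a hypothetical helper.

```python
import numpy as np


def coerce_numeric(values):
    """Convert values to a float array; entries that cannot be parsed as
    numbers become NaN. Returns the array and the indices of bad entries."""
    out = np.empty(len(values), dtype=float)
    bad = []
    for i, v in enumerate(values):
        try:
            out[i] = float(v)
        except (TypeError, ValueError):
            out[i] = np.nan
            bad.append(i)
    return out, bad


mixed = np.array([0.12, "0.5", None, "NA"], dtype=object)
try:
    np.isfinite(mixed)  # object dtype: isfinite has no loop for it
except TypeError as err:
    print("isfinite failed:", err)

clean, bad = coerce_numeric(mixed)
print(np.isfinite(clean), bad)
```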

Data folder doesn't contain gencode_map_v37.tsv

Hi,
I am trying to run focus but got the following error. Could you please let me know where the error originates and how to solve it?
I'd greatly appreciate your help.

Thanks
BK

 ===================================
             FOCUS v0.8             
===================================
focus finemap
       test.sumstats.gz
        1000G_EUR_Phase3_plink/1000G.EUR.QC.22
        focus/focus.db
        --locations 37:EUR
        --chr 22
        --out ss_gwas_2020.chr22

Starting log...
[2022-04-04 20:56:24 - INFO] Detecting 1 populations for fine-mapping.
[2022-04-04 20:56:24 - INFO] As a result, running single-population FOCUS.
[2022-04-04 20:56:24 - INFO] Preparing GWAS summary file for population at test.sumstats.gz.
[2022-04-04 20:56:36 - INFO] Preparing reference SNP data for population at 1000G_EUR_Phase3_plink/1000G.EUR.QC.22.
[2022-04-04 20:56:37 - INFO] Preparing weight database at focus/focus.db.
[2022-04-04 20:56:37 - INFO] Preparing user-defined locations at 37:EUR.
[2022-04-04 20:56:37 - INFO] Found 1703 independent regions on the entire genome.
[2022-04-04 20:56:37 - INFO] 24 independent regions currently used after being filtered on chromosome, start, and stop.
[2022-04-04 20:56:37 - INFO] Preparing data at region 22:16050408-22:17674295. Skipping if following warning occurs.
[2022-04-04 20:56:37 - INFO] Deciding prior probability for a gene to be causal.
[2022-04-04 20:56:37 - ERROR] Data folder doesn't contain gencode_map_v37.tsv [Errno 2] No such file or directory: '/usr/local/analysis/focus/0.8/venv/lib/python3.10/site-packages/pyfocus/data/gencode_map_v37.tsv'
[2022-04-04 20:56:37 - INFO] Finished TWAS & fine-mapping. Thanks for using FOCUS, and have a nice day!
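The error shows FOCUS looking for `gencode_map_v37.tsv` in the `data` folder of the installed `pyfocus` package. The sketch below lists which `gencode_map_*` files that folder actually contains, using only the standard library; `package_data_files` is a hypothetical helper. If the file is missing, reinstalling FOCUS may restore it, though that is an assumption, not something verified here.

```python
import importlib.util
from pathlib import Path


def package_data_files(package="pyfocus", pattern="gencode_map_*"):
    """List files in <package>/data matching `pattern`, or None if the
    package or its data folder cannot be found."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    data_dir = Path(next(iter(spec.submodule_search_locations))) / "data"
    if not data_dir.is_dir():
        return None
    return sorted(p.name for p in data_dir.glob(pattern))


print(package_data_files())
```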

Using GTEx v8 Weight Databases

Hi,

First of all, thank you @quattro for developing such a useful and great tool! I am currently using the FOCUS dev version (v0.7). I have two questions regarding the weight databases:

First, I am interested in creating custom weight databases for FOCUS, to fine-map the TWAS results (from the TWAS/FUSION pipeline) that I generated using the same custom panels.

In addition, I generated TWAS results using weights available at http://predictdb.org/, which are usable in the MetaXcan/PrediXcan pipeline. I know that FOCUS is compatible with PrediXcan weights as well, but the wiki example only shows GTEx v7-based analyses, not GTEx v8.

I created GTEx v8-based weight databases, and that worked without problems. However, I used MASHR-based models rather than Elastic Net models (as the authors recommend), and when I checked my FOCUS output, the "inference" column still listed my MASHR-based models as "ElasticNet".

I wonder whether this biases the statistical testing and whether I should avoid MASHR-based models in the FOCUS pipeline. Do you have any experience with this?

If possible, I'd like to use MASHR-based models, since I generated my original TWAS results with them. The output for the fine-mapped genes seems reasonable to me, but I wanted to ask your opinion as well.

Second, should I convert the SNP IDs in the summary statistics file to GTEx variant naming, such as the "chr1_100_A_T_b38" format, since these GTEx v8 models use this format instead of rsIDs? I use the GTEx naming format for summary statistics used with PrediXcan/MetaXcan, but I am not sure whether I should do the same for FOCUS.

Thank you in advance!

Edit: I tried GTEx IDs versus rsIDs for the second question. With GTEx IDs, FOCUS gave no output and reported "No overlap between LD reference and GWAS". So I assume FOCUS is only compatible with rsIDs (and, if those are unavailable, the CHR_POS_REF_ALT format with a "_b38" suffix?), even though the GTEx weights have no rsIDs (so I assume these are converted during weight database preparation?).
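GTEx v8 variant IDs encode chromosome, GRCh38 position, reference allele and alternate allele, so they can be split and joined against an rsID lookup for the same build before running FOCUS. The sketch below parses the format shown above; `parse_gtex_variant`, `to_rsid`, and the lookup table are hypothetical, and building the lookup (e.g. from a GTEx or dbSNP variant annotation file) is left out.

```python
import re

GTEX_V8_ID = re.compile(r"^chr([0-9]+|X|Y|M)_(\d+)_([ACGT]+)_([ACGT]+)_b38$")


def parse_gtex_variant(variant_id):
    """Split 'chr1_100_A_T_b38' into ('1', 100, 'A', 'T'); None if no match."""
    m = GTEX_V8_ID.match(variant_id)
    if m is None:
        return None
    chrom, pos, ref, alt = m.groups()
    return chrom, int(pos), ref, alt


def to_rsid(variant_id, lookup):
    """Map a GTEx v8 ID to an rsID via a hypothetical
    {(chrom, pos, ref, alt): rsid} dictionary."""
    key = parse_gtex_variant(variant_id)
    return lookup.get(key) if key is not None else None


print(parse_gtex_variant("chr1_100_A_T_b38"))
```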

Focus import error - "500 Server Error: Internal Server Error for url: http://mygene.info/v3/query/"

Hi there

Has anyone seen this before?

wget https://data.broadinstitute.org/alkesgroup/FUSION/WGT/NTR.BLOOD.RNAARR.tar.bz2
tar -xf NTR.BLOOD.RNAARR.tar.bz2
focus import NTR.BLOOD.RNAARR.pos fusion --tissue NTR_BLOOD --name NTR_BLOOD  --output NTR_BLOOD_focus_db

===================================
             FOCUS v0.802
===================================
focus import
        NTR.BLOOD.RNAARR.pos
        fusion
        --tissue NTR_BLOOD
        --name NTR_BLOOD
        --output DB_NAME

Starting log...
[2022-10-31 13:39:51 - INFO] Preparing weight database
[2022-10-31 13:39:51 - INFO] Starting import from FUSION database NTR.BLOOD.RNAARR.pos
[2022-10-31 13:39:51 - INFO] Querying mygene servers for gene annotations
[2022-10-31 13:39:52 - ERROR] 500 Server Error: Internal Server Error for url: http://mygene.info/v3/query/
[2022-10-31 13:39:52 - INFO] Finished importing prediction models

I am guessing it's a temporary error (fingers crossed): I've used the exact same command before to generate FOCUS weights from FUSION weights and it worked normally, but today something's wrong.
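If the 500 error is transient, as suspected, one option before rerunning `focus import` is to confirm that the mygene service answers again, using the same `querymany` call mentioned in the first issue on this page, wrapped in a small retry-with-backoff helper. `call_with_retries` is hypothetical and not part of FOCUS or mygene.

```python
import time


def call_with_retries(func, attempts=4, base_delay=2.0, retry_on=(Exception,)):
    """Call func(); on failure wait base_delay * 2**i seconds and retry.
    Re-raises the last error once all attempts are used."""
    for i in range(attempts):
        try:
            return func()
        except retry_on:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)


# Example with mygene (requires network access):
# import mygene
# mg = mygene.MyGeneInfo()
# hits = call_with_retries(lambda: mg.querymany(["ENSG00000117868"], scopes="ensembl.gene"))
```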

Tissue prioritization

Hi all! Thanks for this fantastic tool!

This is probably a stupid question, but here we go: when should we prioritize tissues with FOCUS (using the --tissue flag)?

Is it only when there's clear evidence that a tissue is involved? For example, in the FOCUS paper, the authors analyzed a GWAS of LDL levels and prioritized adipose tissue, which makes sense. However, could we prioritize tissues based on findings from the conditional analyses performed in FUSION? For example, if I ran a TWAS using all GTEx tissues and found some TWAS hits that were significant in the conditional analyses in 2-3 tissues, could I run FOCUS prioritizing one (or more) of these tissues?

sqlite3.OperationalError

Hello,

I was using the FOCUS finemap function and seem to be getting the following error. I've munged the dataset and I'm using the recommended LD panel + focus.db. Do you know why I may be getting this error? Thank you so much!

ERROR BELOW:

[2019-04-08 15:54:39 - WARNING] No GWAS SNPs with p-value < 5e-08 found at 1:41975327 - 1:43758457. Skipping
[2019-04-08 15:54:39 - ERROR] (sqlite3.OperationalError) too many SQL variables
[SQL: SELECT weight.id, weight.snp, weight.chrom, weight.pos, weight.effect_allele, weight.alt_allele, weight.effect, weight.std_error, weight.model_id, model.id, model.inference, model.ref_id, model.mol_id, molecularfeature.id, molecularfeature.ens_gene_id, molecularfeature.ens_tx_id, molecularfeature.mol_name, molecularfeature.type, molecularfeature.chrom, molecularfeature.tx_start, molecularfeature.tx_stop, refpanel.id, refpanel.ref_name, refpanel.tissue, refpanel.assay
FROM weight JOIN model ON model.id = weight.model_id JOIN molecularfeature ON molecularfeature.id = model.mol_id JOIN refpanel ON refpanel.id = model.ref_id
WHERE weight.snp IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ....'rs2429049', 'rs673242', 'rs11210864', 'rs3011220', 'rs12746640', 'rs11210913', 'rs3791102', 'rs325145', 'rs3818536', 'rs783304', 'rs12140156', 'rs803682', 'rs72683973', 'rs11210949', 'rs2527776', 'rs574736', 'rs78263317')]
(Background on this error at: http://sqlalche.me/e/e3q8)
[2019-04-08 15:54:39 - INFO] Finished twas & fine-mapping
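The query binds one parameter per SNP in the region, and SQLite caps the number of bound parameters per statement (999 by default in releases before 3.32.0). A common workaround is to split the SNP list into chunks. The sketch below uses Python's `sqlite3` against a toy table modeled on the `weight` table in the query above; it is not FOCUS's actual code, and `fetch_weights_for_snps` is a hypothetical name.

```python
import sqlite3

MAX_VARS = 999  # conservative: SQLite's default limit before version 3.32.0


def fetch_weights_for_snps(con, snps, chunk_size=MAX_VARS):
    """Run `... WHERE snp IN (...)` in chunks so no single statement
    exceeds SQLite's bound-parameter limit."""
    rows = []
    for start in range(0, len(snps), chunk_size):
        chunk = snps[start:start + chunk_size]
        placeholders = ", ".join("?" * len(chunk))
        rows.extend(con.execute(
            f"SELECT snp, effect FROM weight WHERE snp IN ({placeholders})", chunk))
    return rows


# Toy database: 3000 SNPs, then query 1500 of them (more than 999 at once).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE weight (snp TEXT, effect REAL)")
con.executemany("INSERT INTO weight VALUES (?, ?)",
                [(f"rs{i}", 0.01 * i) for i in range(3000)])
print(len(fetch_weights_for_snps(con, [f"rs{i}" for i in range(0, 3000, 2)])))
```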

HLA/MHC region should be skipped by default

FOCUS performs association testing and fine-mapping at the HLA/MHC region by default. Due to the pervasive LD in this region, this is troublesome both computationally and statistically.

FOCUS should operate similarly to FUSION and skip this region by default. I think adding a flag to explicitly include it, along with a warning/log message upon startup, would be useful.
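A minimal sketch of the proposed default, assuming regions are (chromosome, start, stop) tuples on GRCh37/hg19 and using the commonly cited GRC MHC bounds chr6:28,477,797-33,448,354. Both the bounds and the `include_mhc` switch (standing in for the proposed flag) are assumptions, not existing FOCUS options.

```python
import logging

# Assumed MHC bounds on GRCh37/hg19 chromosome 6; adjust for other builds.
MHC_CHROM = "6"
MHC_START, MHC_STOP = 28_477_797, 33_448_354


def overlaps_mhc(chrom, start, stop):
    return str(chrom) == MHC_CHROM and start <= MHC_STOP and stop >= MHC_START


def drop_mhc(regions, include_mhc=False):
    """Filter (chrom, start, stop) regions, skipping the MHC unless asked."""
    if include_mhc:
        logging.warning("Including the HLA/MHC region; results there may be unreliable.")
        return list(regions)
    return [r for r in regions if not overlaps_mhc(*r)]


regions = [("6", 26_000_000, 28_000_000), ("6", 29_000_000, 30_000_000)]
print(drop_mhc(regions))
```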

Investigate JIT performance

http://numba.pydata.org/ looks like a really flexible JIT for scientific code. A JIT ("just-in-time") compiler translates Python code to native machine code on the fly (this is how Java and C# achieve reasonable speed).

While pyfocus is typically fairly quick in the prioritized-tissue case, it might be worthwhile to squeeze every last bit of performance out of the code for the multi-tissue case.
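A minimal sketch of how numba could be adopted without making it a hard dependency: decorate a hot loop with `numba.njit` and fall back to the plain Python function when numba is missing. The quadratic form `w^T V w` (a TWAS-style variance term) is only an illustrative kernel, not a profiled hotspot in pyfocus.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # numba not installed: use a no-op decorator instead
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]            # used as @njit
        return lambda func: func      # used as @njit(...)


@njit
def quad_form(w, ld):
    """Compute w^T V w with explicit loops, the kind of code numba compiles well."""
    n = w.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += w[i] * ld[i, j] * w[j]
    return total


print(quad_form(np.array([1.0, 2.0]), np.eye(2)))
```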

Look into pyreadr

https://github.com/ofajardo/pyreadr is a package to read RData and Rds files. It is much more lightweight than the rpy2 package and may cover the functionality we need. Its interface appears to be stable, so we would also be less at the mercy of updates.

The TWAS z-score obtained by S-PrediXcan and FOCUS software is different

Hi,

I'm trying to use FOCUS for post-TWAS fine-mapping. Before that,
I used elastic net to train my own prediction models.
Then, I used SPrediXcan.py (https://github.com/hakyimlab/MetaXcan/tree/master/software) to run TWAS.
I provided the covariance of the SNPs in the prediction models, calculated from in-sample genotype information (the data used for model training).

python SPrediXcan.py
--model_db_path chr1.db
--keep_non_rsid
--covariance chr1.covariances.txt.gz
--gwas_file chr1.assoc.gz
--snp_column Variant
--effect_allele_column ALT
--non_effect_allele_column REF
--beta_column BETA
--se_column SE
--additional_output
--overwrite
--output_file chr1.csv

S-PrediXcan results seem reasonable.

For fine-mapping with FOCUS, I provided my own weight database and a PLINK file of the in-sample genotypes (the data used for model training).
All files meet the required format, and I am quite sure the SNPs in the SQL db, GWAS, and PLINK files are consistent.

I double-checked the script, and focus finemap runs successfully without errors (except for the warning: PerformanceWarning: Slicing with an out-of-order index is generating 15 times more chunks)

focus finemap
chr1.assoc.sumstats.gz
geno.chr1
FOCUS.chr1.db
--locations hg38.LD-block.ASN.bed
--chr 1
--plot
--ridge-term 0
--out chr1

However, the z-scores obtained by S-PrediXcan and FOCUS differ, with a Pearson correlation of about 0.7.
For some genes, the TWAS p-value from S-PrediXcan is unimpressive (e.g. p > 1e-4), while in FOCUS it is highly significant (p < 1e-30, computed as 2*pnorm(abs(twas_z), lower.tail=F)).

I have been unable to identify the reason for this discrepancy and have been trying to resolve the issue for some time now.
Any assistance that you can provide would be greatly appreciated!

Thank you for your time and consideration.
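Both tools compute a TWAS statistic of the general form z = wᵀz_GWAS / sqrt(wᵀVw), where w are the model weights and V is the SNP correlation (LD) matrix, so the denominator is where differences in the LD input enter. Here S-PrediXcan used the supplied covariance file, while FOCUS estimated LD from the PLINK genotypes with `--ridge-term 0`. The sketch below (not either tool's code; `twas_z` is a hypothetical helper) shows how the same weights and GWAS z-scores give different TWAS z-scores as the assumed LD changes.

```python
import numpy as np


def twas_z(weights, gwas_z, ld):
    """TWAS z-score: w^T z_gwas / sqrt(w^T V w), with V an LD (correlation) matrix."""
    w = np.asarray(weights, dtype=float)
    num = w @ np.asarray(gwas_z, dtype=float)
    var = w @ np.asarray(ld, dtype=float) @ w
    return num / np.sqrt(var)


w = np.array([0.4, 0.4])
z = np.array([3.0, 3.0])
for r in (0.0, 0.5, 0.9):  # correlation between the two SNPs in V
    ld = np.array([[1.0, r], [r, 1.0]])
    print(r, round(float(twas_z(w, z, ld)), 3))
```

Higher assumed LD inflates the denominator and shrinks the statistic, so an LD estimate that understates correlation between model SNPs would push TWAS z-scores upward.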

[ERROR] 'str' object has no attribute 'N'

I received a new error after updating sqlite3 to the latest version.

python
Python 3.6.5 (default, Jun 19 2018, 16:28:05) 
[GCC 6.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> sqlite3.sqlite_version
'3.28.0'

However, a new error now occurs when FOCUS tries to write its output:

[2019-05-30 18:43:07 - INFO] Starting fine-mapping at region 1:48978188 - 1:49894177
[2019-05-30 18:43:09 - INFO] Completed fine-mapping at region 1:48978188 - 1:49894177
[2019-05-30 18:43:09 - INFO] Creating FOCUS plots at region 1:48978188 - 1:49894177
[2019-05-30 18:43:10 - ERROR] 'str' object has no attribute 'N'
[2019-05-30 18:43:10 - INFO] Finished twas & fine-mapping

Training models is broken

FOCUS has a mechanism to train weights, but it appears to be broken. On top of that, it is not really amenable to parallelization, which will be a HUGE bottleneck.

I think the easiest workaround is to train models with either PrediXcan or FUSION and import them afterwards.

[ERROR] MetaData.__init__() got multiple values for argument 'schema'

This error appears when I try to run FOCUS. It seems related to the weights, but I don't know what the problem is or how to fix it.
Thank you!

Starting log...
[2022-12-28 13:04:53 - INFO] Preparing GWAS summary file
[2022-12-28 13:04:56 - INFO] Preparing reference SNP data
[2022-12-28 13:04:56 - INFO] Preparing weight database
[2022-12-28 13:04:56 - ERROR] MetaData.__init__() got multiple values for argument 'schema'
[2022-12-28 13:04:56 - INFO] Finished twas & fine-mapping

R issue

Hi,

I have an issue when trying to import GTEx weights from FUSION

cannot load library '/home/R/R-3.5.1/lib/R/lib/libR.so': libRblas.so: cannot open shared object file: No such file or directory

[ERROR] Import submodule requires mygene and rpy2 to be installed

Hi,
I am trying to create weights for all tissues using mashr models (GTEx v8) and got the following error.

===================================
FOCUS v0.6.10

focus import
mashr_Whole_Blood.db
predixcan
--tissue Whole_Blood
--name GTEx
--assay rnaseq
--output Whole_Blood_focus

Starting log...
[2021-01-28 15:22:17 - INFO] Preparing weight database
[2021-01-28 15:22:17 - ERROR] Import submodule requires mygene and rpy2 to be installed.
[2021-01-28 15:22:17 - ERROR] No module named 'mygene'
[2021-01-28 15:22:17 - INFO] Finished importing prediction models

Any help to solve this would be appreciated. Thank you!

unsupported operand type(s) for +: 'float' and 'str'

Hi,

Thank you for developing this useful tool. I got the following error when I ran focus finemap. Do you know how to fix it? Thanks!

===================================
FOCUS v0.802

focus finemap
EA_WBC.Chen.2020/EA_WBC_new_munged_chr1.tsv.gz
./1000G_EUR_Phase3_plink/1000G.EUR.QC.1
focus.db
--locations 37:EUR
--p-threshold 5e-07
--chr 1
--out 1_AAA.chr1

Starting log...
[2022-05-18 16:30:57 - INFO] Detecting 1 populations for fine-mapping.
[2022-05-18 16:30:57 - INFO] As a result, running single-population FOCUS.
[2022-05-18 16:30:57 - INFO] Preparing GWAS summary file for population at EA_WBC.Chen.2020/EA_WBC_new_munged_chr1.tsv.gz.
[2022-05-18 16:31:05 - INFO] Preparing reference SNP data for population at ./1000G_EUR_Phase3_plink/1000G.EUR.QC.1.
[2022-05-18 16:31:11 - INFO] Preparing weight database at focus.db.
[2022-05-18 16:31:12 - INFO] Preparing user-defined locations at 37:EUR.
[2022-05-18 16:31:12 - INFO] Found 1703 independent regions on the entire genome.
[2022-05-18 16:31:12 - INFO] 133 independent regions currently used after being filtered on chromosome, start, and stop.
[2022-05-18 16:31:12 - INFO] Preparing data at region 1:10583-1:1892607. Skipping if following warning occurs.
[2022-05-18 16:31:12 - INFO] Deciding prior probability for a gene to be causal.
[2022-05-18 16:31:12 - INFO] Using gencode file prior probability 0.007575757575757576.
[2022-05-18 16:31:12 - WARNING] No GWAS SNPs with p-value < 5e-07 found at region 1:10583-1:1892607 at EA_WBC.Chen.2020/EA_WBC_new_munged_chr1.tsv.gz. Skipping.
[2022-05-18 16:31:13 - INFO] Preparing data at region 1:4380811-1:5913893. Skipping if following warning occurs.
[2022-05-18 16:31:13 - INFO] Deciding prior probability for a gene to be causal.
[2022-05-18 16:31:13 - INFO] Using gencode file prior probability 0.09090909090909091.
[2022-05-18 16:31:13 - WARNING] No GWAS SNPs with p-value < 5e-07 found at region 1:4380811-1:5913893 at EA_WBC.Chen.2020/EA_WBC_new_munged_chr1.tsv.gz. Skipping.
[2022-05-18 16:31:13 - INFO] Preparing data at region 1:5913893-1:7247335. Skipping if following warning occurs.
[2022-05-18 16:31:13 - INFO] Deciding prior probability for a gene to be causal.
[2022-05-18 16:31:14 - INFO] Using gencode file prior probability 0.03125.
[2022-05-18 16:31:14 - WARNING] No GWAS SNPs with p-value < 5e-07 found at region 1:5913893-1:7247335 at EA_WBC.Chen.2020/EA_WBC_new_munged_chr1.tsv.gz. Skipping.
[2022-05-18 16:31:14 - INFO] Preparing data at region 1:7247335-1:9365199. Skipping if following warning occurs.
[2022-05-18 16:31:14 - INFO] Deciding prior probability for a gene to be causal.
[2022-05-18 16:31:14 - INFO] Using gencode file prior probability 0.018867924528301886.
[2022-05-18 16:32:24 - INFO] Fine-mapping starts at region 1:7247335-1:9365199.
[2022-05-18 16:32:24 - INFO] Aligning GWAS, LD, and eQTL weights for the single population. Region 1:7247335-1:9365199 will skip if following errors occur.
[2022-05-18 16:32:24 - ERROR] unsupported operand type(s) for +: 'float' and 'str'
[2022-05-18 16:32:24 - INFO] Finished TWAS & fine-mapping. Thanks for using FOCUS, and have a nice day!
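Python raises this message when a float is added to a string, which suggests (an assumption, not a confirmed diagnosis) that some field in the inputs was read as text rather than as a number. A minimal pandas sketch for listing which columns of a summary-statistics table contain values that do not parse as numbers; the helper name is hypothetical, and the column names to check are assumptions based on the munged format shown elsewhere in these issues.

```python
import pandas as pd


def non_numeric_columns(df: pd.DataFrame, columns) -> dict:
    """Map each listed column to the number of non-missing entries that fail numeric parsing."""
    bad = {}
    for col in columns:
        if col not in df.columns:
            continue  # silently skip columns the file does not have
        parsed = pd.to_numeric(df[col], errors="coerce")
        # Entries that were present but became NaN could not be parsed as numbers.
        n_bad = int((parsed.isna() & df[col].notna()).sum())
        if n_bad:
            bad[col] = n_bad
    return bad
```

For example, `non_numeric_columns(pd.read_csv("sumstats.tsv.gz", sep="\t", dtype=str), ["Z", "N", "CHR", "BP"])` (placeholder path) returns an empty dict when every checked column is fully numeric.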

[Wiki addition] : PLINK-formatted genotype data for computing reference LD

It might be worthwhile to add to the wiki that the PLINK-formatted genotype data needed for computing reference LD, as used in the example (1000G.EUR.QC.1), can be downloaded and extracted as follows:

wget https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_plinkfiles.tgz
tar -xvzf 1000G_Phase3_plinkfiles.tgz

error in creating the weight database

Hi, I was trying to create the weight database using the CMC and GTEXv8 FUSION weights and encountered the error below. Although a .db file was created, I am not able to use the database that I created to run the finemapping portion of the analysis. Can you please offer guidance on this?

===================================
             FOCUS v0.802
===================================
focus import
	/data/ALS_50k/BrainBankSeq_May2021/MSA/Analysis.TWAS/fusion_twas/WEIGHTS/CMC.BRAIN.RNASEQ.pos
	fusion
	--tissue brain_dorsolateral_prefrontal_cortex
	--name CMC
	--assay rnaseq
	--output fusion

Starting log...
[2023-01-22 20:29:43 - INFO] Preparing weight database
[2023-01-22 20:29:43 - INFO] Starting import from FUSION database /data/ALS_50k/BrainBankSeq_May2021/MSA/Analysis.TWAS/fusion_twas/WEIGHTS/CMC.BRAIN.RNASEQ.pos
[2023-01-22 20:29:43 - INFO] Querying mygene servers for gene annotations
[2023-01-22 20:29:57 - INFO] Starting individual model conversion
[2023-01-22 20:29:57 - ERROR] 'Series' object has no attribute 'DIR'
[2023-01-22 20:29:57 - INFO] Finished importing prediction models

FYI, the FOCUS tool was installed in a container with the following pyfocus dependencies:

apt-get install -y python3-pip r-base
pip install numpy==1.23.1
pip install pyfocus mygene rpy2

I am guessing this error may have something to do with python packages/versions installed in the container. Can you tell me what specific versions are compatible with the current FOCUS tool?

thanks!
Ruth

Allele mismatch for weight database

Right now FOCUS will try its best to align SNPs by alleles between the GWAS and the LD reference panel, but it will break if the SNPs in the weight database don't align. Currently, it throws an error and short-circuits any estimation.

One option is to filter out SNPs from the weight database on-the-fly and log a warning that weights have been discarded in the inference procedure due to mismatch.
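The on-the-fly filtering proposed above could look roughly like this minimal pandas sketch. It assumes, hypothetically, that the weight table and the GWAS table each have `SNP`, `A1`, and `A2` columns. A weight SNP is kept when its alleles match the GWAS alleles either as-is or swapped; everything else is dropped and a warning is logged. Strand (complement) flips and strand-ambiguous SNPs are deliberately not handled.

```python
import logging

import pandas as pd

log = logging.getLogger("focus_sketch")


def filter_mismatched_weights(weights: pd.DataFrame, gwas: pd.DataFrame) -> pd.DataFrame:
    """Drop weight SNPs whose alleles do not match the GWAS alleles (as-is or swapped)."""
    merged = weights.merge(
        gwas[["SNP", "A1", "A2"]], on="SNP", how="left", suffixes=("", "_gwas")
    )
    same = (merged["A1"] == merged["A1_gwas"]) & (merged["A2"] == merged["A2_gwas"])
    # Swapped alleles are kept; a real implementation would also flip the weight's sign.
    swapped = (merged["A1"] == merged["A2_gwas"]) & (merged["A2"] == merged["A1_gwas"])
    # SNPs absent from the GWAS compare as NaN, so they are dropped as well.
    keep = same | swapped
    n_dropped = int((~keep).sum())
    if n_dropped:
        log.warning("Discarding %d weight SNP(s) due to allele mismatch", n_dropped)
    # Return only the original weight columns for the retained SNPs.
    return merged.loc[keep, weights.columns].reset_index(drop=True)
```

Logging instead of raising lets the remaining, consistent weights still contribute to inference, which matches the trade-off described above.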


Default --n-min in munge

Hello,

In the munge help documentation, it says the default --n-min value is '(90th percentile N) / 2'. However, when I look at the default value used in the code, it says 'dat.N.quantile(0.9) / 1.5'. Does this mean the default value of --n-min is actually '(90th percentile N) / 1.5'?

Apologies if I am confused and thank you for your help.

Best wishes,

Ollie
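For reference, the two readings of the default differ only in the divisor. A small sketch, assuming a pandas Series of per-SNP sample sizes like the `N` column of a munged file; the helper names are hypothetical, and which divisor the released `focus munge` applies should be confirmed from its source (the snippet quoted above suggests 1.5). In LDSC's original munge tool, SNPs with N below the `--n-min` value are removed, which the second helper mirrors.

```python
import pandas as pd


def default_n_min(n: pd.Series, divisor: float) -> float:
    """Return the 90th percentile of N divided by `divisor`."""
    return float(n.quantile(0.9)) / divisor


def drop_low_n(n: pd.Series, n_min: float) -> pd.Series:
    """Keep only entries with N at or above the threshold."""
    return n[n >= n_min]


if __name__ == "__main__":
    n = pd.Series([18052.0] * 10)
    print("documented (/2):  ", default_n_min(n, 2.0))
    print("in code   (/1.5): ", default_n_min(n, 1.5))
```

A larger divisor gives a lower threshold, so dividing by 1.5 keeps fewer low-N SNPs than dividing by 2 would.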

name 'zscores' is not defined error when running finemap with --intercept

Hi,
I tried running finemap with the --intercept option and received the following error. See full log below.
Can you please tell me how I should resolve this?
thanks,
Ruth

===================================
             FOCUS v0.802
===================================
focus finemap
	Trait_MAF0.01_forFOCUS.cleaned.sumstats.gz
	1000G_EUR_Phase3_plink/1000G.EUR.QC.9
	fusion_GTEXv8.db
	--trait ALS
	--chr 9
	--start 26111759
	--stop 28224285
	--tissue brain
	--out results_fusion/trait_fusion_GTEXv8.chr9_brain
	--locations 38:EUR
	--p-threshold 0.0001
	--prior-prob gencode38
	--max-genes 10
	--credible-level 0.9
	--min-r2pred 0.8
	--intercept
Starting log...
[2023-03-09 08:24:36 - INFO] Detecting 1 populations for fine-mapping.
[2023-03-09 08:24:36 - INFO] As a result, running single-population FOCUS.
[2023-03-09 08:24:36 - INFO] Preparing GWAS summary file for population at META_ALSmegav1_addCohorts.noDups.UNRELATED.hg38.Rsq05.hwe1e-6_filteredHetISq80_MAF0.01_forFOCUS.cleaned.sumstats.gz.
[2023-03-09 08:24:55 - INFO] Preparing reference SNP data for population at 1000G_EUR_Phase3_plink/1000G.EUR.QC.9.
[2023-03-09 08:24:56 - INFO] Preparing weight database at fusion_GTEXv8.db.
[2023-03-09 08:24:56 - INFO] Preparing user-defined locations at 38:EUR.
[2023-03-09 08:24:56 - INFO] Found 1700 independent regions on the entire genome.
[2023-03-09 08:24:56 - INFO] 1 independent regions currently used after being filtered on chromosome, start, and stop.
[2023-03-09 08:24:56 - INFO] Preparing data at region 9:26111759-9:28224285. Skipping if following warning occurs.
[2023-03-09 08:24:56 - INFO] Deciding prior probability for a gene to be causal.
[2023-03-09 08:24:56 - INFO] Using gencode file prior probability 0.037037037037037035.
[2023-03-09 08:27:19 - INFO] Prioritizing genes by brain tissue then on predictive performance at region 9:26111835 - 9:28223982 of 1000G_EUR_Phase3_plink/1000G.EUR.QC.9.
[2023-03-09 08:27:20 - INFO] Fine-mapping starts at region 9:26111759-9:28224285.
[2023-03-09 08:27:20 - INFO] Aligning GWAS, LD, and eQTL weights for the single population. Region 9:26111759-9:28224285 will skip if following errors occur.
[2023-03-09 08:27:20 - INFO] Find 21 common genes to be fine-mapped at region 9:26111759-9:28224285.
[2023-03-09 08:27:20 - INFO] Running TWAS for the single population.
[2023-03-09 08:27:20 - ERROR] name 'zscores' is not defined
[2023-03-09 08:27:20 - INFO] Finished TWAS & fine-mapping. Thanks for using FOCUS, and have a nice day!

Filter invalid SNPs error

There are some rare instances of SNPs that look like they could be a match between QC'd and munged GWAS and RefPanel, but result in an error. This prevents fine-mapping from occurring.

Cleaning GWAS summary data

Hi FOCUS,
Thank you for the tool. I am new to this area. I tried very hard to follow the instructions in the wiki page Cleaning GWAS summary data. It looks so simple, but I am just not getting it! The instructions say: "In order to do this we included a modified 'munge' tool from the LDSC software (FOCUS needs a few extra bells and whistles in its summary data compared with the original munge tool). At a minimum, FOCUS requires CHR, SNP, BP, A1 (effect allele), A2, BETA (or OR), and P to produce a munged GWAS data file." In order to munge the data, enter the command
focus munge GWAS.txt --output GWAS.cleaned
My GWAS summary statistics have the columns CHR, SNP, BP, A1, A2, OR, and P, and yet I get an error saying that it could not determine the N column. Why can't I use just these columns? Can you please help? Do I need all of the columns, even though the wiki says these are the minimum?
Starting log...
[2021-11-17 16:12:04 - ERROR] Could not determine N column.
[2021-11-17 16:12:04 - INFO] Conversion finished
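One way around the missing-N error, assuming the study's total sample size is known and the same for every SNP, is to add a constant `N` column before munging. A minimal pandas sketch; the helper names, file paths, and sample size are placeholders, not values from this issue. The original LDSC munge tool also accepts the sample size on the command line via `--N`; whether `focus munge` exposes the same flag should be checked with `focus munge --help`.

```python
import pandas as pd


def add_constant_n(sumstats: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return a copy of `sumstats` with a constant sample-size column N."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    out = sumstats.copy()
    out["N"] = n
    return out


def add_n_to_file(in_path: str, out_path: str, n: int) -> None:
    """Read whitespace-delimited summary statistics, add N, and write a TSV."""
    gwas = pd.read_csv(in_path, sep=r"\s+")
    add_constant_n(gwas, n).to_csv(out_path, sep="\t", index=False)
```

For example, `add_n_to_file("GWAS.txt", "GWAS.withN.txt", 50000)` (placeholder sample size) followed by `focus munge GWAS.withN.txt --output GWAS.cleaned`. For case/control studies, the appropriate N is a study-specific choice this sketch does not make.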

symbol issue when creating weight database using Predixcan databases

Hi @quattro,

I am trying to create the weight database using Predixcan v8 (downloaded from https://predictdb.org/post/2021/07/21/gtex-v8-models-on-eqtl-and-sqtl/). However, I am encountering the same error message as found in an issue you had previously closed (https://github.com/bogdanlab/focus/issues/26). I am not sure what the resolution is for this because I didn't see it in your reply to that thread. Can you please offer some guidance on this?

Thanks!

===================================
             FOCUS v0.802
===================================
focus import
	./eqtl/mashr/mashr_Whole_Blood.db
	predixcan
	--tissue Whole_Blood
	--name GTEXv8
	--assay rnaseq
	--use-ens-id
	--from-gencode
	--output predixcan_mashr_eqtl_v8

Starting log...
[2023-01-25 16:02:20 - INFO] Preparing weight database
[2023-01-25 16:02:20 - INFO] Starting import from PrediXcan database ./eqtl/mashr/mashr_Whole_Blood.db
[2023-01-25 16:02:21 - INFO] Querying mygene servers for gene annotations
[2023-01-25 16:02:50 - INFO] Starting individual model conversion
[2023-01-25 16:02:53 - INFO] Committed 250 models to db
[2023-01-25 16:02:56 - INFO] Committed 250 models to db
[2023-01-25 16:02:59 - INFO] Committed 250 models to db
[2023-01-25 16:03:02 - INFO] Committed 250 models to db
[2023-01-25 16:03:05 - INFO] Committed 250 models to db
[2023-01-25 16:03:08 - INFO] Committed 250 models to db
[2023-01-25 16:03:11 - INFO] Committed 250 models to db
[2023-01-25 16:03:15 - INFO] Committed 250 models to db
[2023-01-25 16:03:18 - INFO] Committed 250 models to db
[2023-01-25 16:03:21 - INFO] Committed 250 models to db
[2023-01-25 16:03:24 - INFO] Committed 250 models to db
[2023-01-25 16:03:27 - INFO] Committed 250 models to db
[2023-01-25 16:03:30 - INFO] Committed 250 models to db
[2023-01-25 16:03:33 - INFO] Committed 250 models to db
[2023-01-25 16:03:36 - ERROR] 'symbol'
[2023-01-25 16:03:36 - INFO] Finished importing prediction models

FOCUS stalls on chromosome 6

First of all, thank you for a great tool!

Using the default p value, I have been able to fine map all chromosomes except for chromosome 6. It is getting stuck at region 6:30798168 - 6:31570931. From reading the Wiki, I know that the HLA region is not fine mapped by default, but I cannot think of another reason why the program is getting stuck on chromosome 6. Any troubleshooting advice would be very appreciated as it has been stuck on this chromosome for a few days.

#Focus finemap code
focus finemap sumstats.gz 1000G.EUR.QC.6 focus.db --chr 6 --out sumstats.chr6

Additionally, when using any p-value threshold less stringent than the default (e.g. p < 0.001 or p < 0.01), chromosome 1 produces an empty .tsv file and an operational error is reported. I seem to have unlimited memory and CPU time, and sqlite3 is the latest version (as explained in #1), so I am not sure how to remedy this. Is this a problem you have seen before, given that chromosome 1 is the largest in terms of base pairs to fine-map?

Thank you,
MacKenzie

No GWAS data found at {chr}{start}:{chr}{stop}. Skipping

Hello,

I was using FOCUS to fine-map some specified loci. When I provide the locus information using the --chr, --start, and --stop flags, the generated log file seems fine, with "No GWAS SNPs with p-value < 5e-08 found at {chr}{start}:{chr}{stop}. Skipping."

However, when I provide the loci as a BED file using the --locations flag, the generated log is not correct. It shows "No GWAS data found at {chr}{start}:{chr}{stop}. Skipping" for all the loci in the provided BED file. I then checked the GWAS summary statistics, and there are SNPs within the specified loci.

Do you know why I might be getting this problem? Thank you very much!

.log file:

[2019-06-11 02:04:11 - INFO] Preparing GWAS summary file
[2019-06-11 02:04:24 - INFO] Preparing reference SNP data
[2019-06-11 02:04:26 - INFO] Preparing weight database
[2019-06-11 02:04:26 - INFO] Preparing user-defined locations
[2019-06-11 02:04:26 - WARNING] No GWAS data found found at 1:8383469 - 1:8910986. Skipping
[2019-06-11 02:04:26 - WARNING] No GWAS data found found at 1:9338719 - 1:9396660. Skipping
[2019-06-11 02:04:27 - WARNING] No GWAS data found found at 1:10271688 - 1:10638604. Skipping
[2019-06-11 02:04:27 - WARNING] No GWAS data found found at 1:12076347 - 1:12175658. Skipping
[2019-06-11 02:04:27 - WARNING] No GWAS data found found at 1:150142430 - 1:150142430. Skipping
[2019-06-11 02:04:27 - WARNING] No GWAS data found found at 1:150570943 - 1:151306728. Skipping

.bed file:

chrom	start	stop
1	8383469	8910986
1	9338719	9396660
1	10271688	10638604
1	12076347	12175658
1	150142430	150142430
1	150570943	151306728

GWAS sumstats:

SNP	A1	A2	Z	N	CHR	BP
...
rs6675857	A	G	1.08	18052.0	1	8652154
rs7523186	A	G	0.878	18052.0	1	8655495
rs4480384	A	G	-2.737	18052.0	1	8659714
...
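To confirm independently that the summary statistics overlap the BED regions, a pandas sketch like the following counts SNPs per region. It casts the chromosome columns to strings on both sides, because a type mismatch between files (e.g. `1` read as an integer in one file and as text in another) is one plausible, unconfirmed reason an overlap test can come up empty. Column names follow the files shown above; the function name is hypothetical, and intervals are treated as closed on both ends for simplicity.

```python
import io

import pandas as pd


def count_snps_per_region(gwas: pd.DataFrame, regions: pd.DataFrame) -> pd.Series:
    """Count GWAS SNPs whose BP falls in [start, stop] for each BED region."""
    chrom = gwas["CHR"].astype(str)
    bp = gwas["BP"].astype(int)
    counts, labels = [], []
    for row in regions.itertuples(index=False):
        in_region = (chrom == str(row.chrom)) & bp.between(int(row.start), int(row.stop))
        counts.append(int(in_region.sum()))
        labels.append(f"{row.chrom}:{row.start}-{row.stop}")
    return pd.Series(counts, index=labels, name="n_snps")


if __name__ == "__main__":
    bed = pd.read_csv(io.StringIO(
        "chrom\tstart\tstop\n1\t8383469\t8910986\n1\t9338719\t9396660\n"), sep="\t")
    gwas = pd.read_csv(io.StringIO(
        "SNP\tA1\tA2\tZ\tN\tCHR\tBP\n"
        "rs6675857\tA\tG\t1.08\t18052.0\t1\t8652154\n"
        "rs7523186\tA\tG\t0.878\t18052.0\t1\t8655495\n"
        "rs4480384\tA\tG\t-2.737\t18052.0\t1\t8659714\n"), sep="\t")
    print(count_snps_per_region(gwas, bed))
```

Nonzero counts here but "No GWAS data found" from FOCUS would point at how the locations file is parsed rather than at the summary statistics themselves.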

[ERROR] 'symbol'

Hi, I got an error when importing the GTEx v8 mashr model.

===================================
              FOCUS v0.6.10            
===================================
focus import
        mashr_Lung.db
        predixcan
        --tissue Lung
        --name GTEx
        --assay rnaseq
        --output mashr_lung_focus

Starting log...
[2020-11-05 07:06:59 - INFO] Preparing weight database
[2020-11-05 07:06:59 - INFO] Starting import from PrediXcan database mashr_Lung.db
[2020-11-05 07:06:59 - INFO] Querying mygene servers for gene annotations
[2020-11-05 07:16:43 - INFO] Starting individual model conversion
[2020-11-05 07:16:43 - ERROR] 'symbol'
[2020-11-05 07:16:43 - INFO] Finished importing prediction models

Not sure what it means. Any help is appreciated.
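The bare `'symbol'` in the log looks like the text of a Python `KeyError`, which would be consistent with (though it does not prove) some mygene hit lacking a `symbol` field, for example a query that mygene reports as not found. A defensive sketch of reading symbols from `querymany`-style results, assuming each result is a dict with a `query` key and, for misses, `notfound: True`; the helper name is hypothetical and this is not FOCUS's code.

```python
from typing import Dict, List, Optional


def symbols_by_query(results: List[dict]) -> Dict[str, Optional[str]]:
    """Map each query to its gene symbol, or None if mygene returned no symbol."""
    out: Dict[str, Optional[str]] = {}
    for hit in results:
        query = hit["query"]
        if hit.get("notfound"):
            out.setdefault(query, None)
            continue
        symbol = hit.get("symbol")  # .get avoids raising KeyError: 'symbol'
        # Keep the first symbol seen when a query returns several hits.
        if out.get(query) is None:
            out[query] = symbol
    return out
```

Genes mapped to `None` could then be skipped or fall back to their Ensembl ID instead of aborting the whole import.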

option to use predefined gene sets

Hi,

Great method! I wanted to file a feature request. Would it be possible to have an option to use only protein coding genes for the analysis or, more generally, to use genes from a predefined set of gene names?

Thanks!
Pete
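Until such an option exists, the requested behaviour amounts to a filter applied to the gene models before fine-mapping. A minimal sketch, assuming a hypothetical table of models with `gene_id`, `symbol`, and `gene_type` columns (the `gene_type` value mirroring mygene's `ensembl.type_of_gene`, e.g. `protein_coding`); these names are illustrative and are not FOCUS's database schema.

```python
from typing import Iterable, Optional

import pandas as pd


def select_genes(
    models: pd.DataFrame,
    gene_set: Optional[Iterable[str]] = None,
    gene_type: Optional[str] = None,
) -> pd.DataFrame:
    """Keep models whose gene is in `gene_set` (by ID or symbol) and/or has `gene_type`."""
    keep = pd.Series(True, index=models.index)
    if gene_set is not None:
        wanted = set(gene_set)
        keep &= models["gene_id"].isin(wanted) | models["symbol"].isin(wanted)
    if gene_type is not None:
        keep &= models["gene_type"] == gene_type
    return models[keep]
```

Applying both filters together gives, for example, only the protein-coding genes from a user-supplied list.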

Parameter error for mygene.querymany

Hi,

I encountered a problem in mygene.querymany() function when trying to import predixcan gene models.

For mygene version 3.1.0, the fields parameter in querymany() should be a list OR a comma separated string, but not a comma separated string inside of a list as it currently is in the following lines.

  • results = mg.querymany(genes, scopes='ensembl.gene', verbose=False,
    fields=['ensembl.gene,genomic_pos,symbol,ensembl.type_of_gene,alias'], species="human")
  • results = mg.querymany(genes, scopes='symbol', verbose=False,
    fields=['ensembl.gene,genomic_pos,symbol,ensembl.type_of_gene,alias'], species="human")
  • results = mg.querymany(genes, scopes='ensembl.gene', verbose=False,
    fields=["genomic_pos_hg19,symbol,alias"], species="human")

Best,
-Jonathan
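Following the report above, the fix is to pass `fields` either as a list of separate field names or as one comma-separated string, not as a one-element list containing a comma-joined string. A minimal sketch of both forms; `mg` is expected to be a `mygene.MyGeneInfo()` instance and is passed in as a parameter only so the function can be exercised without network access. The constant and function names are hypothetical.

```python
from typing import List, Sequence

# Each field as its own list element (not one comma-joined string inside a list).
GENE_FIELDS: List[str] = [
    "ensembl.gene",
    "genomic_pos",
    "symbol",
    "ensembl.type_of_gene",
    "alias",
]
# Equivalent single comma-separated string form.
GENE_FIELDS_STR = ",".join(GENE_FIELDS)


def query_genes(mg, genes: Sequence[str], scope: str = "ensembl.gene"):
    """Query mygene for `genes`, passing `fields` as a proper list."""
    return mg.querymany(
        list(genes),
        scopes=scope,
        fields=GENE_FIELDS,
        species="human",
        verbose=False,
    )
```

Usage, with network access: `import mygene` and then `query_genes(mygene.MyGeneInfo(), ["ENSG00000117868"])`.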

ImpG to impute missing GWAS summary statistics

To maintain feature parity with other tools (FUSION, PrediXcan GWAS QC), FOCUS should internally perform GWAS imputation of missing variant z-scores. This way the db will have the best chance to overlap with GWAS data.

There should be some internal limit (or user flag) specifying what proportion of imputed GWAS data can be included. If a large proportion of z-scores is imputed from the same LD that is then used to model the SE, the reported statistics will be overconfident.
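The core of ImpG-Summary-style imputation is a conditional-Gaussian prediction of untyped z-scores from typed ones using reference LD: z_u = Σ_ut (Σ_tt + λI)⁻¹ z_t, with per-SNP imputation quality r² = diag(Σ_ut (Σ_tt + λI)⁻¹ Σ_tu). A minimal NumPy sketch, with a ridge term `lam` for numerical stability and a check of the imputed fraction against a user limit as suggested above; the function names, the default `lam`, and the `max_imputed_frac` cutoff are assumptions, not existing FOCUS options.

```python
import numpy as np


def impute_z(z_typed, ld_tt, ld_ut, lam=0.1):
    """Impute untyped z-scores and their imputation r^2 from reference LD.

    z_typed: (t,) z-scores at typed SNPs
    ld_tt:   (t, t) LD among typed SNPs
    ld_ut:   (u, t) LD between untyped and typed SNPs
    """
    z_typed = np.asarray(z_typed, dtype=float)
    ld_ut = np.asarray(ld_ut, dtype=float)
    reg = np.asarray(ld_tt, dtype=float) + lam * np.eye(len(z_typed))
    # Solve instead of inverting: weights = ld_ut @ reg^{-1} (reg is symmetric).
    weights = np.linalg.solve(reg, ld_ut.T).T
    z_untyped = weights @ z_typed
    # Row-wise dot products give diag(ld_ut @ reg^{-1} @ ld_ut.T).
    r2 = np.einsum("ij,ij->i", weights, ld_ut)
    return z_untyped, r2


def imputed_fraction_ok(n_imputed, n_total, max_imputed_frac=0.5):
    """Return True if the share of imputed SNPs does not exceed the limit."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_imputed / n_total <= max_imputed_frac
```

The r² values could also be used to drop poorly imputed SNPs, which addresses the overconfidence concern at the level of individual variants rather than only in aggregate.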
