Coder Social home page Coder Social logo

gwastools's Introduction

gwasTools

A collection of R scripts that might be useful to plot GWAS results.

The following R pakages need to be installed for running these Rscripts:

optparse, data.table, RColorBrewer, plotrix, pwr, Map2NCBI, basicPlotteR, devtools devtools::install_github('JosephCrispell/basicPlotteR')

QQ Plot

Frequency binned and thinned QQ plot.

Please check out the required/available Rscript parameters by using the following command

Rscript QQplot.r --help

Manhattan Plot

Fast Manhattan plot script. Please check out the required/available Rscript parameters by using the following command

Rscript ManhattanPlot.r --help

80% Power Plot

80% Power plots based on Cohen's effect size calculation for proportions as implemented in the 'pwr' R package

Rscript PowerPlot.r --help

Minimal/example Input format for QQ and Manhattan plots

CHROM POS MAF PVALUE
1 1ย  0.05 0.99
2 2 0.15 0.1
3 3 0.5 0.25

Examples (run from command line):

Rscript QQplot.r \
--input ExampleGWAS.txt \
--prefix Example \
--maf MAF \
--pvalue PVALUE \
--maintitle 'An Example QQ plot'


# GWAS results from http://csg.sph.umich.edu/abecasis/public/amd2015/Fritsche_2015_AdvancedAMD.txt.gz
Rscript ManhattanPlot.r \
--input Fritsche_2015_AdvancedAMD.txt.gz \
--prefix Example \
--chr Chrom \
--pos Pos \
--pvalue GC.Pvalue \
--coltop T \
--maintitle 'Age-related macular degeneration (Fritsche et al. 2016)' \
--threads 8


Rscript PowerPlot.r \
--prefix Example \
--cases 500,1000,5000 \
--controls 1000,2000,10000 \
--minMAF 0.001 \
--alpha 5E-8 

gwastools's People

Contributors

ilarsf avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

gwastools's Issues

Script fails if no SNPs are significant?

If I run the test data included with SAIGE I get the following results:

CHROM    POS    ID    REF    ALT    AC    MAC    MAF    AF    CHISQ    BETA    SEBETA_GC    PVALUE_GC    LOG10P_GC
1    4    rs4    2    1    20    20    0.01    0.01    0.00900168133609557    -0.0748360120098867    0.788767156474349    0.924    0.0341342498337596
1    9    rs9    2    1    4    4    0.002    0.002    0.778767920480584    -1.36954622722431    1.5519319320999    0.378    0.42306208504815
1    10    rs10    2    1    4    4    0.002    0.002    0.198787281155206    0.655246816602172    1.46963884563484    0.656    0.183293820523815
1    11    rs11    2    1    2    2    0.001    0.001    0.0982969720215684    -1.10691964528169    3.53057974490197    0.754    0.122696058298277
1    12    rs12    2    1    30    30    0.015    0.015    0.0187906336497365    -0.0897892694469568    0.655018740034906    0.891    0.0501376964185931
1    14    rs14    2    1    75    75    0.0375    0.0375    0.0298836109133599    -0.0747761355054132    0.432560123012224    0.863    0.0641127257257747
1    15    rs15    2    1    244    244    0.122    0.122    1.32622747205384    -0.28363607944272    0.24629322403832    0.249    0.602968677777628
1    17    rs17    2    1    12    12    0.006    0.006    0.00243722719101353    -0.0411351808952507    0.833230970885822    0.961    0.0174457511091217
1    20    rs20    2    1    762    762    0.381    0.381    0.000120928933151418    -0.00181338599534426    0.164901705220366    0.991    0.00382730415769189
1    22    rs22    2    1    459    459    0.2295    0.2295    0.559747345486054    0.144922266593042    0.193704226331097    0.454    0.342597941239989

But if I run your script on these data I get the message:

Error in seq.default(2, length(chrLab), by = 2) : wrong sign in 'by' argument Calls: axis -> seq -> seq.default Execution halted

I do not know if fixing this is worth your time, but I thought I should report the error :)

ManhattanPlot.R: NCBI annotation not available

Hi,

For a collaboration project, we need to make Manhattan plots from our GWAS data. I have been able to use the Rscript ManhattanPlot.r and that given the following message.

Please run the following code line by line in an interactive R session:
library(Map2NCBI)
library(data.table)
file.genelist37 <- path.expand("~/GeneList_Homo_sapiens_BUILD.37.3.txt")
genelist37 <- GetGeneList_v11("Homo sapiens",build="BUILD.37.3",savefiles=F,destfile=tempfile())
fwrite(genelist37,file.genelist37,sep="\t",quote=T)

Unfortunately, the ftp://ftp.ncbi.nih.gov/genomes/MapView for build37 is not available. Do you have other options for the annotation?

Your help will be much appreciated.

Error including candidate regions file in updated Manhattan plot script

I have successfully included a hitregions file in an older version of the Manhattan plot, but in the newest version i get an error message:

Error in merge.data.table(plotdata[, .(CHROM, POS, x, y)], candidateRegions,  : 
 Elements listed in `by` must be valid column names in x and y
Calls: ManhattanPlot -> merge -> merge.data.table

This is the tabseparated candidate region file (I tried to change the last column name from LEGENDTEXT to LABEL, as indicated in the --help function), but nothing changed

CHROM START END COL LABEL
8 65984044 66384044 green Novel_locus
7 20490738 20890738 green Novel_locus
12 22975219 24975219 blue Known_locus
8 129718772 131718772 blue Known_locus
18 49379032 51379032 blue Known_locus

This is the code I am running

Rscript ManhattanPlot.r --input plots/low_back_pain_H23_forplots.txt --prefix test --chr CHR --pos POS --pvalue p.value --hitregion hitregions.txt --maintitle 'test'

I might have missed an update on how to make the hitregion file, but the only difference I could see by comparing the two scripts, is the header of the last column (LEGENDTEXT vs. LABEL). The excact same code and files (except for changing the header of the last column) works when running the older script. It also works fine when dropping the --hitregion option.

Sorry if I have missed something obvious.

Thanks,
Sigrid

Error in candidateRegions$POS[r] <- rhits$POS[which.max(rhits$LOG10P)] : replacement has length zero

I just downloaded the updated scripts (my old was probably several years old), but I have an issue with the Manhattan plot script.

The error message I get is
Error in candidateRegions$POS[r] <- rhits$POS[which.max(rhits$LOG10P)] :
replacement has length zero
Calls: ManhattanPlot
In addition: Warning message:
In as.data.table.list(x, keep.rownames = keep.rownames, check.names = check.names, :
Item 3 has 0 rows but longest item has 1; filled with NA

I have tried to run the script on two phenotypes, where it works on one, but not the other. I have gone through the files, but I am not able to find anything wrong; all columns have value within expected ranges, with no NA. They should also be sorted on chr - pos. For the phenotype giving an error, I tried with the old script I had on the cluster, and it works fine.

This is the first ten lines of the input file (which is tab-separated)
CHR POS P MAF
1 85022 0.99079011268437 0.00404269760474563
1 135195 0.52322468045994 0.00930232927203178
1 233476 0.44981101380145 0.00814521033316851
1 233582 0.286454369491075 0.00564469257369637
1 554968 0.294178441484736 0.00653086602687836
1 569994 0.615267901422401 0.00595780368894339
1 574151 0.622304870569572 0.0127126947045326
1 581812 0.302093715615789 0.0190441608428955
1 648843 0.643641380536429 0.00501822307705879

This is the code I am running: Rscript ManhattanPlot.r --input test.txt --prefix test --chr CHR --pos POS --pvalue P

It guess it must be something wrong with the input file, that is tolerated by the old script, but not the new one, but I have not been able to figure out by looking through the code or the inputfile...

Do you have any ideas what your updated script is not tolerating?
Thankful for any help!

Edit: The only difference I could find between the two phenotypes, is that the phenotype that did not work, did not have any significant SNPs. For the file where the script worked, I removed the significant SNP(s). Then running the Manhattan plot script again I get the error. Could it be a bug in the script that requires the resultfile to contain significant SNPs?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.