swvanderlaan / qtltoolkit
A workflow based on QTLtools to run cis- and trans-QTL analyses.
Home Page: https://swvanderlaan.github.io/QTLToolKit/
We should detach the RTC (Regulatory Trait Concordance) score calculation from the main script, QTLAnalyzer.
Add a routine to remove CpG probes that contain SNPs or that map to multiple genomic locations. Refer to: Zhou W. et al. Nucleic Acids Res. 2016.
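A minimal sketch of the filtering step. It assumes two things. First, the probes to drop (for example SNP-containing or multi-mapping probes taken from the Zhou et al. annotation) have already been collected into a plain list of probe IDs, one per line. Second, the probe ID sits in column 4 of a QTLtools-style phenotype BED. The function name filter_probes is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop probes listed in an exclusion file from a
# tab-separated, QTLtools-style phenotype BED (probe ID in column 4).
# For a bgzipped BED, pass <(zcat phenotypes.bed.gz) as the first argument.
filter_probes() {
  local bed="$1" exclude="$2"
  awk -F '\t' '
    FILENAME == ARGV[1] { drop[$1] = 1; next }   # first file: probe IDs to remove
    /^#/ || !($4 in drop)                         # keep header and unlisted probes
  ' "$exclude" "$bed"
}
```

Testing FILENAME rather than the common NR == FNR idiom means that an empty exclusion list still passes every probe through.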
Add an annotation creation script for expression and/or methylation array data.
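A minimal sketch of such a script. It assumes a tab-separated probe annotation with one header line and the columns probe ID, chromosome, 1-based position, strand and an optional gene name. The script writes the six annotation columns of a QTLtools phenotype BED (#chr, start, end, pid, gid, strand) using the 0-based BED start. The per-sample phenotype columns would still have to be appended. The function name and input layout are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a probe annotation table into the annotation part
# of a QTLtools-style phenotype BED, sorted by chromosome (lexically) and start.
make_annotation_bed() {
  local tab
  tab="$(printf '\t')"
  printf '#chr\tstart\tend\tpid\tgid\tstrand\n'          # header stays on top
  awk -F '\t' 'BEGIN { OFS = "\t" }
    NR > 1 {                                             # skip the input header
      gid = ($5 != "") ? $5 : $1                         # group by gene if given, else by probe
      print $2, $3 - 1, $3, $1, gid, $4                  # 1-based position -> 0-based BED start
    }' "$1" | LC_ALL=C sort -t "$tab" -k1,1 -k2,2n
}
```

For expression arrays the gid column groups probes by gene. For methylation arrays without a gene column, each CpG forms its own group.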
Add code to produce a simple report of the QTL analysis.
The column order (among other things) is incorrect for mQTL analyses.
There is an issue with re-ordering the nominal results: the files are empty after re-ordering them with the following statements:
    if [[ ${QTL_TYPE} == "CIS" ]]; then
        if [[ ${CLUMP} == "Y" ]]; then
            echo "* Clumped ${QTL_TYPE}-QTL results."
            gzip -fv ${SUMMARY}/${STUDYNAME}_QC_qtlnom_clumped_summary.txt
            gzip -fv ${SUMMARY}/${STUDYNAME}_QC_qtlperm_clumped_summary.txt
            echo "* And re-ordering based on p-value."
            zcat ${SUMMARY}/${STUDYNAME}_QC_qtlnom_clumped_summary.txt.gz | (head -n 1 && tail -n +3 | sort -t , -k 24) > ${SUMMARY}/${STUDYNAME}_QC_qtlnom_clumped_summary.txt
            gzip -fv ${SUMMARY}/${STUDYNAME}_QC_qtlnom_clumped_summary.txt
        elif [[ ${ANALYSIS_TYPE} == "MQTL" ]]; then
            echo "* Non-clumped ${QTL_TYPE}-QTL results."
            gzip -fv ${SUMMARY}/${STUDYNAME}_QC_qtlperm_summary.txt
            echo "* And re-ordering based on p-value."
            cat ${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt | (head -n 1 && tail -n +3 | sort -t , -k 32) > ${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt
            gzip -fv ${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt
        else
            echo "* Non-clumped ${QTL_TYPE}-QTL results."
            gzip -fv ${SUMMARY}/${STUDYNAME}_QC_qtlperm_summary.txt
            echo "* And re-ordering based on p-value."
            cat ${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt | (head -n 1 && tail -n +3 | sort -t , -k 24) > ${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt
            gzip -fv ${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt
        fi
    elif [[ ${QTL_TYPE} == "TRANS" ]]; then
        gzip -fv ${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt
    fi
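The statements above have three problems.

1. In the non-clumped branches, cat reads _QC_qtlnom_summary.txt while the same file is the target of the > redirection. The shell truncates that file before cat runs, so the result is always empty.
2. In (head -n 1 && tail -n +3) fed from a pipe, head may read a whole block of input, so tail does not see all data lines.
3. sort -k 24 compares text from field 24 to the end of the line rather than the p-value as a number.

A minimal sketch of a safer re-order follows. It assumes a single header line; the issue's code uses tail -n +3, so adjust this if the files carry a second header line. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-order a comma-separated summary by its p-value column,
# keeping the header line on top. Safe to call with infile == outfile.
reorder_by_pvalue() {
  local infile="$1" col="$2" outfile="$3" tmp
  # Write to a temporary file next to the output first; redirecting straight
  # into "$infile" would truncate it before it is read (the empty-file bug).
  tmp="$(mktemp "${outfile}.XXXXXX")" || return 1
  {
    head -n 1 "$infile"                          # header, read from the file itself
    tail -n +2 "$infile" |                       # data lines (assumes one header line)
      LC_ALL=C sort -t ',' -k "${col},${col}g"   # general-numeric sort on that column only
  } > "$tmp" || { rm -f "$tmp"; return 1; }
  mv -f "$tmp" "$outfile"
}

# Example for the non-clumped CIS branch (p-value in column 24, as in the issue):
#   reorder_by_pvalue "${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt" 24 \
#                     "${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt"
#   gzip -fv "${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt"
```

The g key modifier is used rather than n because it also understands p-values written in exponent notation, such as 1e-08. For the clumped branch, decompress the .gz file to a plain file first and then call the same helper.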
We should add environment checks, similar to those in the slideToolkit scripts.
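A minimal sketch of such a check. It reports every required program that is missing from PATH instead of stopping at the first one. The function name and the tool names in the example call are placeholders for whatever the workflow actually needs.

```bash
#!/usr/bin/env bash
# Hypothetical environment check: list every required program missing from PATH.
check_requirements() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "*** ERROR *** Required program not found in PATH: ${tool}" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]   # succeed only if nothing is missing
}

# Example (placeholder tool list):
#   check_requirements QTLtools Rscript python bgzip tabix || exit 1
```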
Several items need improvement or fixing, for example: use the data.table package to read and write tables with fread and fwrite.
Add a routine for genome-wide analyses of QTLs.
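One possible starting point relies on QTLtools cis mode, which can split a genome-wide run into N chunks with --chunk k N so that each chunk runs as a separate job. A minimal sketch that only prints the commands, for example to hand to a cluster scheduler, follows. It assumes bgzipped, tabix-indexed inputs. The file names, the number of permutations and the function name are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: print one QTLtools cis command per chunk (dry run, nothing is executed).
print_chunk_commands() {
  local vcf="$1" bed="$2" cov="$3" nchunks="$4" prefix="$5" k=1
  while [ "$k" -le "$nchunks" ]; do
    printf 'QTLtools cis --vcf %s --bed %s --cov %s --permute 1000 --chunk %d %d --out %s_%d_%d.txt\n' \
      "$vcf" "$bed" "$cov" "$k" "$nchunks" "$prefix" "$k" "$nchunks"
    k=$((k + 1))
  done
}

# Example: print_chunk_commands genotypes.vcf.gz phenotypes.bed.gz covariates.txt 20 permutations
```

The per-chunk outputs can then be concatenated before the QC and summary steps.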
We should think of a solution to plot mQTL results in a regional association plot, maybe using LocusZoom.
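Whatever plotting tool is chosen, the first step is to pull the nominal results for a single CpG into a marker/p-value table. A minimal sketch follows. It assumes the comma-separated summary files used elsewhere in the workflow and SNP IDs that match the plotting tool's reference (e.g. rsIDs). It writes a tab-separated, METAL-style two-column file of the kind LocusZoom can read. The column names MarkerName and P-value are an assumption; adjust them to what your LocusZoom setup expects. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract one CpG's nominal results as a two-column table.
# Column numbers are passed in because they differ between analysis types.
to_locuszoom_input() {
  local summary="$1" cpg="$2" pheno_col="$3" snp_col="$4" p_col="$5"
  printf 'MarkerName\tP-value\n'
  awk -F ',' -v cpg="$cpg" -v pc="$pheno_col" -v sc="$snp_col" -v vc="$p_col" \
    'NR > 1 && $pc == cpg { print $sc "\t" $vc }' "$summary"   # skip header, keep one CpG
}
```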
Update the workflow image and include a high-resolution version.
When running a QTL analysis (usually in cis, around GWAS loci), we should always report the r^2 with the GWAS lead SNP. This is done when CLUMP="Y" (using QTLSumEditor.py), but it should be done regardless.
This connects to QTLSumParser.py: because regular, non-clumped data do not have the r^2 filled in, the QTLSumParser.py script cannot make the lists of interesting eGenes and eSNPs.
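A minimal sketch of the join step. It assumes an LD file produced by PLINK 1.9 for the GWAS lead SNP, for example with plink --bfile REFERENCE --r2 --ld-snp LEADSNP --ld-window-kb 1000 --ld-window 99999 --ld-window-r2 0 --out lead. In PLINK's default .ld output, SNP_B is column 6 and R2 is column 7. The hypothetical helper add_r2_column appends that r^2 to a comma-separated summary and writes NA for SNPs that are not in the LD file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append the r^2 with the GWAS lead SNP (PLINK 1.9 .ld file)
# to a comma-separated QTL summary; snp_col is the summary's SNP ID column.
add_r2_column() {
  local ldfile="$1" summary="$2" snp_col="$3"
  awk -v sc="$snp_col" '
    FILENAME == ARGV[1] { if (FNR > 1) r2[$6] = $7; next }   # .ld: SNP_B in col 6, R2 in col 7
    FNR == 1 { print $0 ",R2_with_GWAS_lead"; next }          # extend the summary header
    { split($0, f, ","); print $0 "," ((f[sc] in r2) ? r2[f[sc]] : "NA") }
  ' "$ldfile" "$summary"
}
```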
We should have a proper --help flag that explains the various options of the workflow.
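A minimal sketch of --help handling in bash. The long options are placeholders that map onto the workflow variables QTL_TYPE, ANALYSIS_TYPE and CLUMP; they are not existing QTLToolKit flags, and the script name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: option names are placeholders, not existing QTLToolKit flags.
usage() {
  cat <<'EOF'
Usage: qtl_workflow.sh [options]

  --qtl-type CIS|TRANS      Type of QTL analysis (sets QTL_TYPE).
  --analysis-type TYPE      Analysis type, e.g. MQTL (sets ANALYSIS_TYPE).
  --clump Y|N               Report clumped results (sets CLUMP).
  -h, --help                Show this help and exit.
EOF
}

# Returns 0 to continue, 1 after printing the help (caller should exit 0),
# and 2 on a usage error (message and help go to stderr).
parse_args() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -h|--help)
        usage
        return 1 ;;
      --qtl-type|--analysis-type|--clump)
        if [ "$#" -lt 2 ]; then            # guard: "shift 2" would fail without a value
          echo "*** ERROR *** Missing value for $1" >&2
          usage >&2
          return 2
        fi
        case "$1" in
          --qtl-type) QTL_TYPE="$2" ;;
          --analysis-type) ANALYSIS_TYPE="$2" ;;
          --clump) CLUMP="$2" ;;
        esac
        shift 2 ;;
      *)
        echo "*** ERROR *** Unknown option: $1" >&2
        usage >&2
        return 2 ;;
    esac
  done
  return 0
}
```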
We need to fix the Q-value calculation. It goes awry when there are many (or only) very significant p-values, when some p-values are NA, or when there are too few values. Currently we use the Benjamini-Hochberg FDR correction, which is fine: it is less conservative than Bonferroni and gives results quite similar to the Q-value (Storey & Tibshirani), which is the least conservative.
It concerns this part of the QTL_QC.R script:
    cat("\n* Least conservative correction: Storey & Tibshirani correction...\n")
    ### Storey & Tibshirani correction - Least conservative
    ### references:
    ### - http://en.wikipedia.org/wiki/False_discovery_rate
    ### - http://svitsrv25.epfl.ch/R-doc/library/qvalue/html/qvalue.html
    ### Requires a bioconductor package: "qvalue"
    if(opt$resulttype == "NOM") {
        #RESULTS$Q = qvalue(RESULTS$Nominal_P)$qvalues # original code
        RESULTS$Q = "Not calculated: throws an error when p-value is infinite or NA. NEED FIXING"
    } else if(opt$resulttype == "PERM") {
        #RESULTS$Q = qvalue(RESULTS$Approx_Perm_P)$qvalues # original code
        RESULTS$Q = ifelse(RESULTS$Approx_Perm_P > 0, qvalue(RESULTS$Approx_Perm_P)$qvalues, "NA")
    } else {
        cat ("\n\n*** ERROR *** Something is rotten in the City of Gotham; most likely a typo. Double back, please.\n\n",
             file=stderr()) # print error messages to stderr
    }
    # RESULTS$Q = "Currently not calculated due to an issue with the qvalue() package."
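The failure points listed above are NA or infinite p-values and very few values. A minimal sketch in the workflow's shell style shows one way to handle them. It computes Benjamini-Hochberg adjusted p-values, not Storey's q-value, using only entries that are plain numbers in [0, 1]. All other entries get NA and are excluded from the number of tests m. It assumes one p-value per line on stdin, and bh_adjust is a hypothetical name.

```bash
#!/usr/bin/env bash
# Benjamini-Hochberg adjustment (NOT Storey's q-value) that tolerates NA/Inf.
# Input: one p-value per line. Output: "p<TAB>adjusted_p" in the input order.
bh_adjust() {
  local tab
  tab="$(printf '\t')"
  awk 'BEGIN { OFS = "\t" }
    {
      ok = ($1 ~ /^[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/ && $1 + 0 <= 1)
      print NR, $1, ok                        # remember the original position
    }' |
  LC_ALL=C sort -t "$tab" -k3,3nr -k2,2g |    # valid rows first, by ascending p
  awk -F '\t' 'BEGIN { OFS = "\t" }
    { idx[NR] = $1; p[NR] = $2; ok[NR] = $3; if ($3 == 1) m++ }
    END {
      prev = 1                                # adjusted values are capped at 1
      for (i = m; i >= 1; i--) {              # walk down from the largest valid p
        q = p[i] * m / i
        if (q > prev) q = prev                # step-up rule: minimum over j >= i
        prev = q
        adj[i] = q
      }
      for (i = 1; i <= NR; i++)
        print idx[i], p[i], (ok[i] == 1 ? adj[i] : "NA")
    }' |
  LC_ALL=C sort -t "$tab" -k1,1n |            # restore the input order
  cut -f 2,3
}

# Example: printf '0.01\n0.04\nNA\n0.03\n' | bh_adjust
```

For that example the adjusted values are 0.03, 0.04, NA and 0.04. Whether NA entries should count towards m is a modelling choice; this sketch leaves them out.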
Write (or update) a proper wiki that explains the workflow and what each of its parts and outputs is.
We should detach the Functional Enrichment Analysis from the main script, QTLAnalyzer.
Add a workflow to calculate SMR (summary-data-based Mendelian randomization).
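For reference, the SMR test statistic from Zhu et al. (Nat Genet 2016) combines the GWAS and QTL z-scores of the top cis-QTL SNP as T_SMR = z_gwas^2 * z_qtl^2 / (z_gwas^2 + z_qtl^2). This is approximately chi-squared with 1 degree of freedom. A minimal sketch of the statistic alone follows. The SMR software itself also runs the HEIDI test, which this does not replace, and smr_stat is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch of the SMR test statistic for one gene/probe:
#   T_SMR = z_gwas^2 * z_qtl^2 / (z_gwas^2 + z_qtl^2)
smr_stat() {
  awk -v zy="$1" -v zx="$2" 'BEGIN {
    a = zy * zy; b = zx * zx
    if (a + b == 0) { print "NA"; exit }   # undefined when both z-scores are 0
    printf "%.6g\n", a * b / (a + b)
  }'
}

# Example: smr_stat 3 4   prints 5.76
```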