swvanderlaan / qtltoolkit

A workflow based on QTLtools to run cis- and trans-QTL analyses.

Home Page: https://swvanderlaan.github.io/QTLToolKit/

Shell 63.56% Python 17.13% R 16.99% Perl 2.32%

qtl qtl-analyses gene-expression methylation mqtl eqtl

qtltoolkit's People

Contributors

Stargazers

Watchers

Forkers

jacco-schaap llandsmeer lichenbiostat zhenggongwei conghan-01

qtltoolkit's Issues

Add in genome-wide QTL analysis script

Detach RTC score

We should detach the RTC Score from the main script QTLAnalyzer.

Remove CpGs/probes containing SNPs

Add a routine (somewhere) to remove CpGs (probes) that contain SNPs or map to multiple locations. Refer to: Zhou W. et al. Nucleic Acids Res. 2016.

add an annotation creation script

Add an annotation creation script for expression and/or methylation array data.

QTL analysis report

Add in some code to produce a simple report of the QTL analysis.

QTL QC: mQTL results

Order of columns, etc. is incorrect for mQTL analyses.

Summarizer: re-order results

There is an issue with the re-ordering of the nominal results: the output files are empty after attempting to re-order them using the following statements:

if [[ ${QTL_TYPE} == "CIS" ]]; then

	if [[ ${CLUMP} == "Y" ]]; then
		echo "* Clumped ${QTL_TYPE}-QTL results."
		gzip -fv ${SUMMARY}/${STUDYNAME}_QC_qtlnom_clumped_summary.txt
		gzip -fv ${SUMMARY}/${STUDYNAME}_QC_qtlperm_clumped_summary.txt

		echo "* And re-ordering based on p-value."
		zcat ${SUMMARY}/${STUDYNAME}_QC_qtlnom_clumped_summary.txt.gz | (head -n 1 && tail -n +3  | sort -t , -k 24) > ${SUMMARY}/${STUDYNAME}_QC_qtlnom_clumped_summary.txt
		gzip -fv ${SUMMARY}/${STUDYNAME}_QC_qtlnom_clumped_summary.txt

	elif [[ ${ANALYSIS_TYPE} == "MQTL" ]]; then
		echo "* Non-clumped ${QTL_TYPE}-QTL results."
		gzip -fv ${SUMMARY}/${STUDYNAME}_QC_qtlperm_summary.txt

		echo "* And re-ordering based on p-value."
		cat ${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt | (head -n 1 && tail -n +3  | sort -t , -k 32) > ${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt
		gzip -fv ${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt

	else
		echo "* Non-clumped ${QTL_TYPE}-QTL results."
		gzip -fv ${SUMMARY}/${STUDYNAME}_QC_qtlperm_summary.txt

		echo "* And re-ordering based on p-value."
		cat ${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt | (head -n 1 && tail -n +3  | sort -t , -k 24) > ${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt
		gzip -fv ${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt

	fi

elif [[ ${QTL_TYPE} == "TRANS" ]]; then
	gzip -fv ${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt

fi
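
A possible explanation and fix, sketched under some assumptions: in the non-clumped branches the pipeline reads from and redirects to the same file, so the shell typically truncates that file before cat has read it, which would leave the output empty. The hypothetical helper reorder_by_pvalue below (not part of the workflow) writes to a temporary file first and only then replaces the original. It assumes a comma-separated file with a single header line, and a sort that supports general-numeric ordering (-g, as in GNU coreutils) so that values such as 1e-10 are ordered numerically rather than as text.

```shell
# Hypothetical helper: sort a comma-separated summary file by one column,
# keeping the header line on top.
#   $1 = file to re-order in place, $2 = 1-based number of the p-value column
reorder_by_pvalue() {
    local infile="$1" column="$2" tmpfile
    tmpfile="$(mktemp "${infile}.XXXXXX")" || return 1
    {
        # Copy the header unchanged.
        head -n 1 "${infile}"
        # Sort only the body, numerically (-g), on the chosen column alone.
        tail -n +2 "${infile}" | LC_ALL=C sort -t , -k "${column},${column}g"
    } > "${tmpfile}" || { rm -f "${tmpfile}"; return 1; }
    # Replace the original only after the sorted copy is complete.
    mv "${tmpfile}" "${infile}"
}

# Possible use inside the workflow (column number taken from the issue):
# reorder_by_pvalue "${SUMMARY}/${STUDYNAME}_QC_qtlnom_summary.txt" 24
```

Note that the sketch uses tail -n +2, whereas the original statements use tail -n +3; if the summary files carry a second header-like line, the offset would need to be adjusted accordingly.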

Environment checks

We should add in checks of the environment, similar to slideToolkit scripts.
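
As a starting point, such a check could verify that the required programs are on the PATH before anything runs. This is a minimal sketch and not the slideToolkit implementation; require_commands is a hypothetical name, and the list of programs in the usage comment is an assumption about what the workflow needs.

```shell
# Hypothetical helper: report every missing command and
# return non-zero if at least one is missing.
require_commands() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "${cmd}" > /dev/null 2>&1; then
            echo "*** ERROR *** Required command not found: ${cmd}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

# Possible use at the top of a workflow script (program list is an assumption):
# require_commands QTLtools Rscript python perl || exit 1
```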

Edit QC pipeline of QTL

Several items need improvement or fixing:

Check the delimiter of the annotation file automatically.
Use data.table's fread and fwrite to read and write tables.
Update the eQTL part (nom/perm for cis) to match the new 'strand' column (the column numbers changed when the 'strand' column was added to the output).
Double-check that the trans-QTL part matches the new 'strand' column (same column-number change).
Double-check that the mQTL part matches the new 'strand' column (same column-number change).
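
For the first item, note that data.table's fread detects the separator automatically by default, which may already be sufficient inside the R script. Where the check is needed in the shell part of the workflow instead, a minimal sketch could look like the following. detect_delimiter is a hypothetical helper; it only inspects the header line and tries tab, comma and semicolon in that order before falling back to a space.

```shell
# Hypothetical helper: guess the field delimiter from the header line of a file.
# Prints the delimiter character (tab, comma, semicolon, or space).
detect_delimiter() {
    local header tab
    tab="$(printf '\t')"
    header="$(head -n 1 "$1")"
    case "${header}" in
        *"${tab}"*) printf '\t' ;;
        *,*)        printf ',' ;;
        *\;*)       printf ';' ;;
        *)          printf ' ' ;;
    esac
}

# Possible use (variable name is an assumption):
# DELIM="$(detect_delimiter "${ANNOTATIONFILE}")"
```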

Add routine for genome-wide QTL analysis

Add a routine for genome-wide analyses of QTLs.

Plotting of mQTL analyses

We should think of a solution to plot mQTL results in a regional association plot, maybe using LocusZoom.

update workflow image

Update the workflow image and include a higher-resolution version.

Add r^2 to results - SumEditor/SumParser

When running a QTL analysis, usually in cis with GWAS loci, we should always report the r^2 with the GWAS locus.
This is done when CLUMP="Y", using QTLSumEditor.py, but it should be done regardless.
This also affects QTLSumParser.py: because regular, non-clumped data does not have the r^2 plugged in, the QTLSumParser.py script cannot make the lists of interesting eGenes and eSNPs.

add proper --help flag

We should have a proper --help flag to aid in explaining the various options in the workflow.
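
One way this could look in a shell script is sketched below. The function names, the option text, and the configuration-file argument are all hypothetical, since the actual command-line interface of the workflow is not described here.

```shell
# Hypothetical usage message; the argument and option names are assumptions.
usage() {
    cat <<EOF
Usage: ${0##*/} [-h|--help] <configuration-file>

Run a cis- or trans-QTL analysis as described in the configuration file.

Options:
  -h, --help    Show this help message and exit.
EOF
}

# Print the usage message and return 0 if any argument is -h or --help;
# return 1 otherwise so the caller can continue normally.
handle_help_flag() {
    local arg
    for arg in "$@"; do
        case "${arg}" in
            -h|--help) usage; return 0 ;;
        esac
    done
    return 1
}

# Possible use at the top of a script:
# if handle_help_flag "$@"; then exit 0; fi
```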

QTL QC

We need to fix the Q-value calculation. It goes awry when many (or all) of the p-values are very significant, when some are NA, or when there are too few values. Currently we use the Benjamini-Hochberg FDR correction, which is fine (a bit less conservative) and quite similar to the Q-value (Storey & Tibshirani), which is the least conservative.

This concerns the following part of the QTL_QC.R script:

  cat("\n* Least conservative correction: Storey & Tibshirani correction...\n")
  ### Storey & Tibshirani correction - Least conservative
  ### references:
  ###     - http://en.wikipedia.org/wiki/False_discovery_rate
  ###     - http://svitsrv25.epfl.ch/R-doc/library/qvalue/html/qvalue.html
  ### Requires a bioconductor package: "qvalue"
  if(opt$resulttype == "NOM") {
    #RESULTS$Q = qvalue(RESULTS$Nominal_P)$qvalues # original code
    RESULTS$Q = "Not calculated: throws an error when p-value is infinite or NA. NEED FIXING"
  } else if(opt$resulttype == "PERM") {
    #RESULTS$Q = qvalue(RESULTS$Approx_Perm_P)$qvalues # original code
    RESULTS$Q = ifelse(RESULTS$Approx_Perm_P > 0, qvalue(RESULTS$Approx_Perm_P)$qvalues, "NA")
  } else {
    cat ("\n\n*** ERROR *** Something is rotten in the City of Gotham; most likely a typo. Double back, please.\n\n",
         file=stderr()) # print error messages to stderr
  }
  # RESULTS$Q = "Currently not calculated due to an issue with the qvalue() package."