Coder Social home page Coder Social logo

treedater's Introduction

treedater

treedater fits a strict or relaxed molecular clock to a phylogenetic tree and estimates evolutionary rates and times of common ancestry. The calendar time of each sample must be specified (possibly with bounds of uncertainty) and the length of the sequences used to estimate the tree.

treedater uses heuristic search to optimise the TMRCAs of a phylogeny and the substitution rate. An uncorrelated relaxed molecular clock accounts for rate variation between lineages of the phylogeny which is parameterised using a Gamma-Poisson mixture model.

To cite:

Installation

You can install the latest development version from github using the devtools package:

library(devtools)
install_github( 'emvolz/treedater')

Basic usage

dater( tre, sts, s, omega0)

where

  • tre is an ape::phylo phylogeny,
  • sts is a named vector of sample times for each tip in tre
  • s is the length of the genetic sequences used to estimate tre
  • omega0 is an initial guess of the substitution rate (can be omitted)

For a detailed introduction to features available in treedater, see the vignette on analysis of Influena H3N2: vignette('h3n2').

Command line

You can also use treedater from the command line without starting R using the tdcl script:

./tdcl -h
Usage: ./tdcl [-[-help|h] [<logical>]] [-[-treefn|t] <character>] [-[-samplefn|s] <character>] [-[-sequenceLength|l] <double>] [-[-output|o] [<character>]]

-t <file> : file name of tree in newick format  
-s  <file> : should be a comma-separated-value file with sample times in format <taxon-id,sample-time> and no header
-l <length> :  the integer length of sequences in alignment used to construct the tree 
-o <file>: name of file for saving output 

Note that you may need to modify the first line of the tdcl script with the correct path to Rscript or littler.

treedater's People

Contributors

emvolz avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

treedater's Issues

Error in if (cv > 1) { : missing value where TRUE/FALSE needed message

time.data <- treedater::dater(tree, sts=sts, s = seqlen, clock ="uncorrelated", searchRoot=TRUE)
Note: Initial guess of substitution rate not provided. Will attempt to guess starting conditions. Provide initial guesses of the rate using omega0 parameter.
Note: Minimum temporal branch length (minblen) set to 0.0186331312223994. Increase minblen in the event of convergence failures.
Tree is rooted. Not estimating root position.
Initial guesses of substitution rate: 0.0223656181040256,0.0223811368102775,0.0367202750598568,0.0369537668001815,0.0510749320156881,0.0515263967900855
Error in if (cv > 1) { : missing value where TRUE/FALSE needed

I found the error message when I want to running treedater.

This is a brief of my data:

sts
WT-NORO-0324 WT-NORO-0325 WT-NORO-0326 WT-NORO-0331 WT-NORO-0693 WT-NORO-0989
2003.819 2003.816 2003.816 2003.934 2004.194 2004.934

summary(tree)

Phylogenetic tree: tree

Number of tips: 6
Number of nodes: 5
Branch lengths:
mean: 0.006128069
variance: 0.0001479546
distribution summary:
Min. 1st Qu. Median 3rd Qu. Max.
0.000000e+00 3.844413e-05 4.141834e-04 6.424393e-03 3.920875e-02
No root edge.
Tip labels: WT-NORO-0989
WT-NORO-0693
WT-NORO-0331
WT-NORO-0325
WT-NORO-0324
WT-NORO-0326
Node labels: Root
97
94
57
100

Enhancements?

Hi @emvolz

Some suggested minor tweaks:

  • Add in a field for the elapsed time (handy for benchmarking) and number of iterations
  • Include an option for 'quiet' during optimisation
  • Add in a field for the TMRCA (I assume it's min(x$Ti) though)

outlierTips() always prints some output to console

I've recently started using your package, many thanks for developing it!

After fitting the tree with dater(), when I call outlierTips() on the fitted tree, a data.frame is printed to the console even when I assign the output to a variable. I was wondering if it would be possible to silence printing to console when the output of the function is assigned to a variable? The solution seems simple, happy to raise a PR if you're okay with that.

only sampling year

What should I do if some sequences only have sampling year for sts, thanks.

Units for meanRateLimits

Hi,

Thanks for a great program. I'd just like to confirm that I'm correctly understanding the units in the dater function. omgeg0 should be in units of substituions/site and so should meanRateLimits, correct? Does the seqlen parameter include the full length of the genome, if an ML tree was inferred from a SNPs called from the full genome (with ascertainment bias correction for invariant sites)?

Thank you and best,

sts

I have a csv file that contained the sampling time, the first column is the sequence name, and the second column is the sampling time, how can I transform it into the "sts" for treedater, thanks.

Missing dates

Hi again I was able to load in my tree and the metadata. I created a data frame corresponding to the dates, and that loaded perfectly. However, after running the following commands I recieved an error that I am missing sample time information. What should I do about samples in the tree for which I have no dates? Additionally for the mrl values are these just guides or do I need to change this respective to my bug?

tre <- ape::read.tree

tre <- read.tree("RAxML_bipartitionsBranchLabels.finalTree")
tre <- unroot(tre) # will estimate root position

#Load Metadata: 

md <- read.csv( 'Isolate_Metadata.csv', sep = ',', header = TRUE )
rownames(md) <- md$node
head( md )
sts <- md$year
head(sts)
sts.df <- data.frame(lower = sts[1:2] - 0, upper = sts[1:2] + 1 )
head(sts.df)
mrl <- c( .0005, .002 ) # the estimated clock rate should fall between these values 
s.dtr0 <- dater( tre, sts, s = 18e3 , estimateSampleTimes = sts.df , quiet = FALSE, ncpu = 6, strict=TRUE, meanRateLimits = mrl)

Error in dater(tre, sts, s = 18000, estimateSampleTimes = sts.df, quiet = FALSE,  : 
  Missing sample time information.

Thanks for your help!

TODO

  • Parametric bootstrap
  • Improve optima by starting from multiple start conditions

How to add dates if they are not already on tip labels or can I import them from a csv file?

Apologize as I am new to this...

I have some samples run through ParSNP to get the core genome and gubbins to remove recombination. I am attempting to use your tools and I have successfully imported the tree using:

tre <- read.tree("gubbins.final_tree.tre")

But I soon realized my tips do not have a date on them. That information is in a separate datafile. Is there a way to pull this data from my csv file or add them to the tips on my tree?

Thanks!

Error in double(nm * nm) : vector size cannot be NA

I am not sure as to what "nm" (or the referenced vector) is referring to here. I read in the bootstrap-labeled tree using read.tree() and the dates file as a named vector (converting dates to decimals using lubridate::decimal_date()), as usual, but I have never received this error before. Any help would be appreciated.
issue.zip

Generate the tmrca of every node

I have utilized parboot to find tmrca and HPD 05% of my tree. Can parboot also extract the interval of every node in my tree?

Error when providing multiple initial values for mutation rate

Throws error from the evaluation of the logical operator is.na( omega0 ) saying there are multiple conditions when omega0 is a vector of multiple values. The condition is evaluated in multiple places within .dater() and dater().

reprex

library(treedater)

this.tree <- structure(list(edge = structure(c(8L, 8L, 9L, 10L, 11L, 11L, 
10L, 9L, 12L, 12L, 8L, 1L, 9L, 10L, 11L, 2L, 3L, 4L, 12L, 5L, 
6L, 7L), dim = c(11L, 2L)), edge.length = c(0.0005870302, 0.001771467, 
0.000586391, 2.0761e-06, 2.1467e-06, 2.63e-06, 0.019689715, 0.0023575334, 
1e-06, 1e-06, 0.0011784197), Nnode = 5L, tip.label = c("EPI_ISL_85580_20101028_Alabama", 
"EPI_ISL_85818_20101103_Alabama", "EPI_ISL_85819_20101107_Alabama", 
"EPI_ISL_90804_20110404_Alabama", "EPI_ISL_87729_20101216_Alabama", 
"EPI_ISL_87882_20101216_Alabama", "EPI_ISL_89756_20110105_Alabama"
)), class = "phylo", order = "cladewise")


these.label.dates <- c(EPI_ISL_85580_20101028_Alabama = 2010.82191780822, EPI_ISL_85818_20101103_Alabama = 2010.83835616438, 
EPI_ISL_85819_20101107_Alabama = 2010.84931506849, EPI_ISL_90804_20110404_Alabama = 2011.25479452055, 
EPI_ISL_87729_20101216_Alabama = 2010.95616438356, EPI_ISL_87882_20101216_Alabama = 2010.95616438356, 
EPI_ISL_89756_20110105_Alabama = 2011.01095890411)


this.seqlength <- 1701


dater(tre = this.tree, 
      sts = these.labels.dates, 
      s = this.seqlength
     , omega0 = c(2e-4, 13e-4, 2e-3)
     # , numStartConditions = 0 
     # , clock = "uncorrelated"
     , meanRateLimits = c(1e-4, 3e-3)
     # , quiet = FALSE                       
)

I forked and implemented quick fixes; I'll submit pull request.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.