Coder Social home page Coder Social logo

Comments (8)

yasuiniko avatar yasuiniko commented on July 17, 2024

Update
The issue is that the root node may have an edge length. Node.distance_from_root() includes the root's edge length, but Tree.calc_node_root_distance() ignores it.

Output

distance_from_root: 2.4693498882842526
[2.4693498882842526, 2.4693498882842526, 2.4693498882842526]

calc_node_root_distances: 0.5186108276003377
[0.5186108276003377, 0.5186108276003377, 0.5186108276003377]

root edge length: 1.950739060683915
difference between 2 calculation methods: 1.9507390606839148

Code

import dendropy as dp

def distance_from_root(node):
    dist = node.edge.length
    parent = node.parent_node
    while parent:
        dist += parent.edge.length
        parent = parent.parent_node
    return dist

def distance_from_root2(node):
    dist = 0
    while node.parent_node:
        dist += node.edge.length
        node = node.parent_node
    return dist

species = dp.TaxonNamespace(["A", "B", "C"])
t = dp.simulate.birth_death_tree(birth_rate=1.0, death_rate=0, taxon_namespace=species)


# First style
print()
one = max(map(distance_from_root, t.leaf_node_iter()))
print("distance_from_root: {}".format(one))
print([x.distance_from_root() for x in t.leaf_node_iter()])
print()

# Second style
two = max(map(distance_from_root2, t.leaf_node_iter()))
print("calc_node_root_distances: {}".format(two))
print(t.calc_node_root_distances())
print()

print("root edge length: {}".format(t.seed_node.edge.length))
print("difference between 2 calculation methods: {}".format(one-two))

from dendropy.

jeetsukumaran avatar jeetsukumaran commented on July 17, 2024

Nice catch. The question is now, which of the behaviors is correct? Or,
put another way, which is less surprising?

On the one hand, if the declared intention is "distance from root node",
then it seems that any edge lengths before the root node should be
disregarded.

On the other, one might expect the distance from root node for tips to
be the entire tree height?

I am leaning toward the former, i.e., to consider both tree height and
distance from root node to start from the root node itself, ignoring the
edge length of the root node if any.

On 7/6/16 1:13 AM, Niko Yasui wrote:

Update
The issue is that the root node may have an edge length.
Node.distance_from_root() includes the root's edge length, but
Tree.calc_node_root_distance() ignores it.

Output

distance_from_root: 2.4693498882842526
[2.4693498882842526, 2.4693498882842526, 2.4693498882842526]

calc_node_root_distances: 0.5186108276003377
[0.5186108276003377, 0.5186108276003377, 0.5186108276003377]

root edge length: 1.950739060683915
difference between 2 calculation methods: 1.9507390606839148

Code

import dendropy as dp

def distance_from_root(node):
dist = node.edge.length
parent = node.parent_node
while parent:
dist += parent.edge.length
parent = parent.parent_node
return dist

def distance_from_root2(node):
dist = 0
while node.parent_node:
dist += node.edge.length
node = node.parent_node
return dist

species = dp.TaxonNamespace(["A", "B", "C"])
t = dp.simulate.birth_death_tree(birth_rate=1.0, death_rate=0, taxon_namespace=species)

First style

print()
one = max(map(distance_from_root, t.leaf_node_iter()))
print("distance_from_root: {}".format(one))
print([x.distance_from_root() for x in t.leaf_node_iter()])
print()

Second style

two = max(map(distance_from_root2, t.leaf_node_iter()))
print("calc_node_root_distances: {}".format(two))
print(t.calc_node_root_distances())
print()

print("root edge length: {}".format(t.seed_node.edge.length))
print("difference between 2 calculation methods: {}".format(one-two))


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#60 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AABmR0d2lUxc20Au9YWiawZTFQW4GqIhks5qSzmOgaJpZM4JE3Py.


Jeet Sukumaran

[email protected]

Blog/Personal Pages:
http://jeetworks.org/
GitHub Repositories:
http://github.com/jeetsukumaran
Photographs (as stream):
http://www.flickr.com/photos/jeetsukumaran/
Photographs (by galleries):

http://www.flickr.com/photos/jeetsukumaran/sets/

from dendropy.

yasuiniko avatar yasuiniko commented on July 17, 2024

Since I'm using dendropy to reproduce Mesquite-like behavior, and Mesquite assumes that the root node bifurcates immediately, I have so far manually set the root node's edge length to 0 in my birth/death trees.

I agree with your assessment of the situation. Based on my reading of the function names and my theoretical understanding of trees, the root node should have an edge with length 0 (or even better, not have an edge).

from dendropy.

yasuiniko avatar yasuiniko commented on July 17, 2024

@jeetsukumaran,

I've also noticed that contained coalescent trees tend to have extremely short edges near the root, especially when species tree edges are short relative to the population size.

Could this be because on line 516 in coalescent.py, coalesce_nodes is forcing all remaining taxa to rapidly coalesce, instead of using the total length of the root node's edge length to coalesce?

from dendropy.

jeetsukumaran avatar jeetsukumaran commented on July 17, 2024

If period=None is specified, then no restriction is put on the total
time available for coalescence, and coalescence proceeds until all nodes
have coalesced into one.

BUT the rate of coalescence should continue to follow the Kingman
distribution.

Ln 272 shows that if time_remaining (which is assigned the value of
period) is None, the loop repeats without restriction.

-- jeet

On 7/7/16 9:03 PM, Niko Yasui wrote:

@jeetsukumaran https://github.com/jeetsukumaran,

I've also noticed that contained coalescent trees tend to have extremely
short edges near the root, especially when species tree edges are short
relative to the population size.

Could this be because on line 516 in coalescent.py, coalesce_nodes is
forcing all remaining taxa to rapidly coalesce, instead of using the
total length of the root node's edge length to coalesce?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#60 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AABmR58f7b9P_QauurfYRo1CqIS_fPgZks5qTaH3gaJpZM4JE3Py.


Jeet Sukumaran

[email protected]

Blog/Personal Pages:
http://jeetworks.org/
GitHub Repositories:
http://github.com/jeetsukumaran
Photographs (as stream):
http://www.flickr.com/photos/jeetsukumaran/
Photographs (by galleries):

http://www.flickr.com/photos/jeetsukumaran/sets/

from dendropy.

yasuiniko avatar yasuiniko commented on July 17, 2024

That's true, I'll keep digging. There is something about the comments that I'm confused about, though.

Line 101

    The waiting time for coalescence of *any* two gene lineages in a sample of
    $n$ gene lineages evolving in a population with haploid size $N$ is an
    exponentially-distributed random variable with rate of $\\choose{N, 2}$ and
    an expectation of $\\frac{1}{\choose{N, 2}}$.

This seems incorrect, because the code is using the below formula, but also because the number of gene lineages is completely ignored.

$N$ * Exponential($\\choose{n, 2}$)

Why is the exponential multiplied by the population size?

Also, line 80 should describe the continuous time version, rather than the descrete time version:

A random draw from the "Kingman distribution" (discrete time version): 

from dendropy.

jeetsukumaran avatar jeetsukumaran commented on July 17, 2024

The documentation is indeed incorrect (notation slippage, with the
number of genes $n$ being replaced by the population size $N$), but the
formula we actually use is the correct one:
133: rate = combinatorics.choose(n_genes, n_to_coalesce)
134: tmrca = rng.expovariate(rate)

The number of gene lineages is n_genes. n_to_coalesce defaults to 2.

That is, the waiting time to coalescence for any two genes in a sample
of $n$ genes is exponentially-distributed with a rate of
$\choose{n_genes, 2}$ (not $\choose(N,2)$, as indicated in the
documentation). This formula gives the rate in units of $N$. To get the
time in, e.g. calendar time, you multiply the result by $N$.

Thank you for pointing this documentation error out.

-- jeet

On 7/7/16 10:56 PM, Niko Yasui wrote:

That's true, I'll keep digging. There is something about the comments
that I'm confused about, though.

Line 101

The waiting time for coalescence of *any* two gene lineages in a sample of
$n$ gene lineages evolving in a population with haploid size $N$ is an
exponentially-distributed random variable with rate of $\\choose{N, 2}$ and
an expectation of $\\frac{1}{\choose{N, 2}}$.

This seems incorrect, because the code is using the below formula, but
also because the number of gene lineages is completely ignored.

$N$ * Exponential($\choose{n, 2}$)

Based on the wikipedia page, I suspect the correct formula should be
something like this:

Exponential($N$) / $\choose{n, 2}$,

where the time for a given pair of genes to coalesce is divided by the
number of pairs of genes. What do you think?

Also, line 80 should describe the continuous time version, rather than
the descrete time version:

A random draw from the "Kingman distribution" (discrete time version):


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#60 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AABmR5oVfOE9CYaCkK1lU4OjNCoZteGFks5qTbxygaJpZM4JE3Py.


Jeet Sukumaran

[email protected]

Blog/Personal Pages:
http://jeetworks.org/
GitHub Repositories:
http://github.com/jeetsukumaran
Photographs (as stream):
http://www.flickr.com/photos/jeetsukumaran/
Photographs (by galleries):

http://www.flickr.com/photos/jeetsukumaran/sets/

from dendropy.

yasuiniko avatar yasuiniko commented on July 17, 2024

Thanks for the clarification!

from dendropy.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.