
jeetsukumaran / dendropy


A Python library for phylogenetic scripting, simulation, data processing and manipulation.

Home Page: https://pypi.org/project/DendroPy/.

License: Other

Python 85.62% Shell 0.01% R 0.01% Turing 14.36%

dendropy's Introduction


DendroPy is a Python library for phylogenetic computing. It provides classes and functions for the simulation, processing, and manipulation of phylogenetic trees and character matrices, and supports the reading and writing of phylogenetic data in a range of formats, such as NEXUS, NEWICK, NeXML, Phylip, and FASTA. Application scripts for performing some useful phylogenetic operations, such as data conversion and tree posterior distribution summarization, are also distributed and installed as part of the library. DendroPy can thus function as a stand-alone library for phylogenetics, a component of more complex multi-library phyloinformatic pipelines, or a scripting "glue" that assembles and drives such pipelines.

The primary home page for DendroPy, with detailed tutorials and documentation, is at:

https://jeetsukumaran.github.io/DendroPy/

DendroPy is also hosted in the official Python repository:

http://pypi.org/project/DendroPy/

Requirements and Installation

The current version of DendroPy requires Python 3.

You can install DendroPy by running:

$ sudo pip install dendropy

More information is available here:

https://jeetsukumaran.github.io/DendroPy/downloading.html

Documentation

Full documentation is available here:

https://jeetsukumaran.github.io/DendroPy/

This includes:

and more.

Citing

If you use any portion of DendroPy v5 in your research, please cite it as:

Moreno, M. A., Sukumaran, J., and M. T. Holder. 2024. DendroPy 5: a mature Python library for phylogenetic computing. arXiv preprint arXiv:2405.14120. https://doi.org/10.48550/arXiv.2405.14120

For BibTeX users:

@misc{dendropy5,
   title = {DendroPy 5: a mature Python library for phylogenetic computing},
   author = {Moreno,  Matthew Andres and Sukumaran,  Jeet and Holder,  Mark T.},
   year = {2024},
   keywords = {Populations and Evolution (q-bio.PE),  FOS: Biological sciences,  FOS: Biological sciences},
   publisher = {arXiv},
   doi = {10.48550/ARXIV.2405.14120},
   url = {https://arxiv.org/abs/2405.14120},
   copyright = {arXiv.org perpetual, non-exclusive license}
}

Earlier DendroPy versions can be cited as:

Sukumaran, J. and M. T. Holder. 2010. DendroPy: a Python library for phylogenetic computing. Bioinformatics 26: 1569-1571. https://doi.org/10.1093/bioinformatics/btq228

Consider also leaving a star on GitHub!

License and Warranty

Please see the file "LICENSE.rst" for details.

Developers

dendropy's People

Contributors

actapia, allista, andrewguy, biologyguy, carreau, crosenth, dependabot[bot], eascarrunz, hdetering, jeetsukumaran, joaks1, jonchang, kyungtaeklim, mmore500, mtholder, nephrain, nicoladm, noahamsel, petercombs, samstudio8, snacktavish, synedraacus, wosnat, wrightaprilm, wwood


dendropy's Issues

Duplicating trees while pruning

Hi Jeet-

I encountered a behavior and wasn't sure whether it is expected.

When pruning tips from a tree, I've noticed that

tree.prune_taxa(first_foss)

returns a tree but that

tree1 = tree.prune_taxa(first_foss)

does not assign a tree object to tree1.

If that's the expected behavior, cool. If not, I can write up a more detailed example with an example input file and put it somewhere for you to access.
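For context, many mutating methods in Python change the object in place and return None, so assigning their result to a new name yields nothing useful. The sketch below illustrates that pattern with a hypothetical TinyTree class (an assumption for illustration, not DendroPy's API) and shows one way to keep the unpruned tree by copying it first. Whether prune_taxa behaves exactly like this in a given DendroPy version should be checked against its documentation.

```python
import copy


class TinyTree:
    """Hypothetical stand-in for a tree; it only stores leaf labels."""

    def __init__(self, leaves):
        self.leaves = list(leaves)

    def prune_taxa(self, labels):
        # Mutates this object in place and implicitly returns None.
        self.leaves = [leaf for leaf in self.leaves if leaf not in labels]


tree = TinyTree(["A", "B", "C", "D"])

# Copy first if the original tree is still needed afterwards.
pruned = copy.deepcopy(tree)
result = pruned.prune_taxa({"B", "D"})

print(result)         # None: the method has no return value
print(pruned.leaves)  # ['A', 'C']
print(tree.leaves)    # ['A', 'B', 'C', 'D']
```

With an in-place method of this kind, the pruned tree is the object the method was called on, not the value of the call.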

uncaught exception when using "plot_metric='length'" without lengths

Hi

This may be a minor bug: the call below looks for a non-existent attribute. Perhaps it could raise a more informative error that points to the missing edge lengths, rather than the indirect complaint about a missing attribute?

import dendropy

t = dendropy.Tree.get_from_string("(('A1','A2'),('B1','B2'),('C1','C2'));", schema = "newick")
print(t.as_ascii_plot(plot_metric='length'))

# AttributeError: 'Tree' object has no attribute 'as_newick_string'

t = dendropy.Tree.get_from_string("(('A1':0.1,'A2':0.1):0.2,('B1':0.1,'B2':0.1):0.2,('C1':0.1,'C2':0.1):0.2);", schema = "newick")
print(t.as_ascii_plot(plot_metric='length'))

# prints OK
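The kind of informative error requested here could look roughly like the following. This is a minimal sketch with a hypothetical Edge class and a hypothetical require_edge_lengths helper; neither is part of DendroPy, and the actual fix inside as_ascii_plot may differ.

```python
class Edge:
    """Hypothetical minimal edge; length is None when no branch length is set."""

    def __init__(self, length=None):
        self.length = length


def require_edge_lengths(edges):
    """Raise a descriptive error if any edge in the list lacks a length."""
    missing = sum(1 for edge in edges if edge.length is None)
    if missing:
        raise ValueError(
            "plot_metric='length' requires edge lengths, but "
            f"{missing} of {len(edges)} edges have none"
        )


with_lengths = [Edge(0.1), Edge(0.2)]
without_lengths = [Edge(), Edge(0.2)]

require_edge_lengths(with_lengths)  # passes silently

try:
    require_edge_lengths(without_lengths)
except ValueError as err:
    message = str(err)
    print(message)
```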

UPGMA Trees are None-rooted

This might be an intentional feature, but just in case...

The PDM class's upgma_tree() method produces a "None"-rooted tree, similarly to the default Tree class. Is this on purpose? I can't imagine why a user would assume UPGMA creates an unrooted tree.

Best,
Niko

Is there a recommended way to run the tests in parallel?

The test suite is awe-inspiring... but obviously it takes a while to run. Do you have any tricks for running it in parallel?

I just played around with using concurrencytest on a concurrenttest branch. I did not submit a pull request, as I'm not confident that this is the optimal solution (that package does not seem to be updated often, though perhaps that is not a problem). I've only tested on Python 2.7.9 (on Ubuntu).

It did improve the runtime of:

time python dendropy/test/__main__.py

from:

real    5m26.438s
user    5m14.988s

to:

real  2m30.712s
user  0m0.724s

possible enhancement: retain edge identity when rerooting

This is a low priority request, and may not be a great idea...

When you are rerooting at an existing node and the tree is currently rooted at a polytomy, the number of nodes and edges in the tree does not change.
It would be somewhat nice if the rerooting were accomplished by flipping edges so that the mapping between an edge and the (unrooted) bipartition of taxa was maintained. This is not currently done.

An example, when writing https://gist.github.com/mtholder/8c5a4079d04a0129c652e431575add01 I tried:

def edge_label_method():
    tree = get_tree()
    # Copy each internal node's label onto the edge above it.
    for edge in tree.preorder_edge_iter():
        h = edge.head_node
        if not h.is_leaf():
            edge.label = h.label
    tree.seed_node.label = ''
    # Reroot at the parent of the outgroup taxon "X".
    outgroup_node = tree.find_node_with_taxon_label("X")
    new_root = outgroup_node.parent_node
    tree.reseed_at(new_root)
    # Copy the edge labels back onto the head nodes after rerooting.
    for edge in tree.preorder_edge_iter():
        h = edge.head_node
        if not h.is_leaf():
            h.label = edge.label
    new_root.label = ''
    return tree

but this did not work because of the lack of persistence of edge identity.

Possible bugs setting parent_node, not _parent_node

Just looking through the source at https://pythonhosted.org/DendroPy/_modules/dendropy/datamodel/treemodel.html#Tree.ladderize, I see there are 3 instances where .parent_node is assigned (without leading underscore), e.g. in _set_tail_node():

self._head_node.parent_node = node

in _set_seed_node():

self._seed_node.parent_node = None

and in from_split_bitmasks():

parent_node = parent_node.parent_node

Perhaps I'm misunderstanding this, but I was wondering if you mean ._parent_node, not .parent_node in at least 2 of these cases?
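Whether the two spellings differ in practice depends on whether parent_node is a property with setter logic. The sketch below uses a hypothetical Node class (an assumption for illustration, not DendroPy's actual implementation) to show how assigning through a property can keep related state in sync, while assigning to the underscored attribute bypasses that logic.

```python
class Node:
    """Hypothetical node whose parent_node property keeps child lists in sync."""

    def __init__(self, label):
        self.label = label
        self._parent_node = None
        self._child_nodes = []

    @property
    def parent_node(self):
        return self._parent_node

    @parent_node.setter
    def parent_node(self, parent):
        # Detach from the old parent, then attach to the new one.
        if self._parent_node is not None:
            self._parent_node._child_nodes.remove(self)
        self._parent_node = parent
        if parent is not None:
            parent._child_nodes.append(self)


root, a, b = Node("root"), Node("a"), Node("b")

a.parent_node = root    # goes through the setter: root learns about a
b._parent_node = root   # bypasses the setter: root's child list is not updated

print([n.label for n in root._child_nodes])  # ['a']
```

Note also that the third quoted line, parent_node = parent_node.parent_node, assigns to a local name and only reads the attribute, so it does not set anything on a node.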

The trprobs file built by sumtrees.py contains unrooted duplicates when run with rooted trees

I'd like to know how frequently topologies are found in a sample of rooted trees, but sumtrees.py seems to return unrooted trees.
I provide an example with trees containing just 4 leaves. sumtrees.py produces the following set of 15 trees, some of which look identical to each other:

TREE Tree1 = [&count=66,probability=0.132,cumulative_probability=0.132](A_2,B_2,(A_1,B_1));

TREE Tree2 = [&count=63,probability=0.126,cumulative_probability=0.258](A_2,B_1,(A_1,B_2));

TREE Tree3 = [&count=57,probability=0.114,cumulative_probability=0.372](A_2,A_1,(B_1,B_2));

TREE Tree4 = [&count=36,probability=0.072,cumulative_probability=0.444](A_2,B_1,(A_1,B_2));

TREE Tree5 = [&count=33,probability=0.066,cumulative_probability=0.51](A_2,B_2,(A_1,B_1));

TREE Tree6 = [&count=29,probability=0.058,cumulative_probability=0.568](A_2,A_1,(B_1,B_2));

TREE Tree7 = [&count=26,probability=0.052,cumulative_probability=0.62](A_2,A_1,(B_1,B_2));

TREE Tree8 = [&count=26,probability=0.052,cumulative_probability=0.672](A_2,B_2,(A_1,B_1));

TREE Tree9 = [&count=26,probability=0.052,cumulative_probability=0.724](A_2,B_1,(A_1,B_2));

TREE Tree10 = [&count=26,probability=0.052,cumulative_probability=0.776](A_2,B_2,(A_1,B_1));

TREE Tree11 = [&count=26,probability=0.052,cumulative_probability=0.828](A_2,B_1,(A_1,B_2));

TREE Tree12 = [&count=24,probability=0.048,cumulative_probability=0.876](A_2,B_2,(A_1,B_1));

TREE Tree13 = [&count=23,probability=0.046,cumulative_probability=0.922](A_2,A_1,(B_1,B_2));

TREE Tree14 = [&count=20,probability=0.04,cumulative_probability=0.962](A_2,B_1,(A_1,B_2));

TREE Tree15 = [&count=19,probability=0.038,cumulative_probability=1.0](A_2,A_1,(B_1,B_2));

Thanks for your help!

Bastien.

My command:

/Downloads/sumtrees.py --rooted --trprobs=/Downloads/treeProbs.txt ~/Downloads/500Trees.txt

The file (shortened, otherwise github won't let me submit an issue; also, the file is properly uploaded, but does not display well in the github issue viewer for some reason. If I click on "edit" I have access to the correct file):
&R[&index=6]:0;

Inconsistent AttributeError on deepcopy

Hi there,
I've been getting an intermittent error that you might be interested in.

PhyloBuddy.py:408: in _make_copy
    return deepcopy(_phylobuddy)
../../../anaconda/lib/python3.4/copy.py:182: in deepcopy
    y = _reconstruct(x, rv, 1, memo)
../../../anaconda/lib/python3.4/copy.py:300: in _reconstruct
    state = deepcopy(state, memo)
../../../anaconda/lib/python3.4/copy.py:155: in deepcopy
    y = copier(x, memo)
../../../anaconda/lib/python3.4/copy.py:246: in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
../../../anaconda/lib/python3.4/copy.py:155: in deepcopy
    y = copier(x, memo)
../../../anaconda/lib/python3.4/copy.py:219: in _deepcopy_list
    y.append(deepcopy(a, memo))
../../../anaconda/lib/python3.4/copy.py:166: in deepcopy
    y = copier(memo)
../../../anaconda/lib/python3.4/site-packages/dendropy/datamodel/treemodel.py:3117: in __deepcopy__
    return basemodel.Annotable.__deepcopy__(self, memo=memo)
../../../anaconda/lib/python3.4/site-packages/dendropy/datamodel/basemodel.py:785: in __deepcopy__
    other.__dict__[k] = copy.deepcopy(self.__dict__[k], memo)
../../../anaconda/lib/python3.4/copy.py:166: in deepcopy
    y = copier(memo)
../../../anaconda/lib/python3.4/site-packages/dendropy/datamodel/treemodel.py:1006: in __deepcopy__
    return basemodel.Annotable.__deepcopy__(self, memo=memo)
../../../anaconda/lib/python3.4/site-packages/dendropy/datamodel/basemodel.py:785: in __deepcopy__
    other.__dict__[k] = copy.deepcopy(self.__dict__[k], memo)
../../../anaconda/lib/python3.4/copy.py:155: in deepcopy
    y = copier(x, memo)
../../../anaconda/lib/python3.4/copy.py:219: in _deepcopy_list
    y.append(deepcopy(a, memo))
../../../anaconda/lib/python3.4/copy.py:166: in deepcopy
    y = copier(memo)
../../../anaconda/lib/python3.4/site-packages/dendropy/datamodel/treemodel.py:1006: in __deepcopy__
    return basemodel.Annotable.__deepcopy__(self, memo=memo)
../../../anaconda/lib/python3.4/site-packages/dendropy/datamodel/basemodel.py:785: in __deepcopy__
    other.__dict__[k] = copy.deepcopy(self.__dict__[k], memo)
../../../anaconda/lib/python3.4/copy.py:155: in deepcopy
    y = copier(x, memo)
../../../anaconda/lib/python3.4/copy.py:219: in _deepcopy_list
    y.append(deepcopy(a, memo))
../../../anaconda/lib/python3.4/copy.py:166: in deepcopy
    y = copier(memo)
../../../anaconda/lib/python3.4/site-packages/dendropy/datamodel/treemodel.py:1006: in __deepcopy__
    return basemodel.Annotable.__deepcopy__(self, memo=memo)
../../../anaconda/lib/python3.4/site-packages/dendropy/datamodel/basemodel.py:785: in __deepcopy__
    other.__dict__[k] = copy.deepcopy(self.__dict__[k], memo)
../../../anaconda/lib/python3.4/copy.py:155: in deepcopy
    y = copier(x, memo)
../../../anaconda/lib/python3.4/copy.py:219: in _deepcopy_list
    y.append(deepcopy(a, memo))
../../../anaconda/lib/python3.4/copy.py:166: in deepcopy
    y = copier(memo)
../../../anaconda/lib/python3.4/site-packages/dendropy/datamodel/treemodel.py:1006: in __deepcopy__
    return basemodel.Annotable.__deepcopy__(self, memo=memo)
../../../anaconda/lib/python3.4/site-packages/dendropy/datamodel/basemodel.py:785: in __deepcopy__
    other.__dict__[k] = copy.deepcopy(self.__dict__[k], memo)
../../../anaconda/lib/python3.4/copy.py:155: in deepcopy
    y = copier(x, memo)
../../../anaconda/lib/python3.4/copy.py:219: in _deepcopy_list
    y.append(deepcopy(a, memo))
../../../anaconda/lib/python3.4/copy.py:166: in deepcopy
    y = copier(memo)
../../../anaconda/lib/python3.4/site-packages/dendropy/datamodel/treemodel.py:1006: in __deepcopy__
    return basemodel.Annotable.__deepcopy__(self, memo=memo)
../../../anaconda/lib/python3.4/site-packages/dendropy/datamodel/basemodel.py:785: in __deepcopy__
    other.__dict__[k] = copy.deepcopy(self.__dict__[k], memo)
../../../anaconda/lib/python3.4/copy.py:166: in deepcopy
    y = copier(memo)
../../../anaconda/lib/python3.4/site-packages/dendropy/datamodel/treemodel.py:759: in __deepcopy__
    return basemodel.Annotable.__deepcopy__(self, memo=memo)
../../../anaconda/lib/python3.4/site-packages/dendropy/datamodel/basemodel.py:789: in __deepcopy__
    other.deep_copy_annotations_from(self, memo)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <dendropy.datamodel.treemodel.Edge object at 0x10b9be128>
other = <dendropy.datamodel.treemodel.Edge object at 0x10b93ba90>
memo = {4297067376: None, 4489007632: 1.0, 4489196880: <PhyloBuddy.PhyloBuddy object at 0x10c074f98>, 4489197608: <Tree object at 0x10c074198>, ...}

    def deep_copy_annotations_from(self, other, memo=None):
        """
            Note that all references to ``other`` in any annotation value (and
            sub-annotation, and sub-sub-sub-annotation, etc.) will be
            replaced with references to ``self``. This may not always make sense
            (i.e., a reference to a particular entity may be absolute regardless of
            context).
            """
        if hasattr(other, "_annotations"):
            # if not isinstance(self, other.__class__) or not isinstance(other, self.__class__):
            if type(self) is not type(other):
                raise TypeError("Cannot deep-copy annotations from different type (unable to assume object equivalence in dynamic or nested annotations)")
            if memo is None:
                memo = {}
            for a1 in other._annotations:
                a2 = copy.deepcopy(a1, memo=memo)
                memo[id(a1)] = a2
                if a2.is_attribute and a1._value[0] is other:
                    a2._value = (self, a1._value[1])
                self.annotations.add(a2)
>           memo[id(other._annotations)] = self._annotations
E           AttributeError: 'Edge' object has no attribute '_annotations'

../../../anaconda/lib/python3.4/site-packages/dendropy/datamodel/basemodel.py:736: AttributeError

I'm making a bunch of calls to my _make_copy() function during unit testing, and I'd estimate that the deepcopy() fails with an AttributeError about 30% of the time (seemingly randomly). I haven't really dug into it myself yet, so there could well be a weakness in my own code, but I wanted to pass the traceback along to put it on your radar.
Cheers,
-Steve

sumtrees.py outputs an extra tree when summarizing onto a target tree

Given the command:

sumtrees.py bootstrap.trees -f 0.33 -F newick --suppress-annotations -o output.tree -i newick

sumtrees will output a single line containing a tree, as expected. However, if a (single) target tree is specified:

sumtrees.py bootstrap.trees -f 0.33 -F newick --suppress-annotations -o output.tree -i newick -t target.tree

The output contains two identical trees. I'm using version 4.1.0 of both DendroPy and SumTrees, on Python 2.7.3.

Perhaps the culprit is line 1720:

target_trees.append(tree)

This line occurs immediately after the message about the number of target trees is printed, and it sits outside both the if/else block that handles target trees and the for loop that iterates over them.

Indeed, if I add a second target tree to the target.tree file, SumTrees prints:

SumTrees: Summarizing onto 2 target trees defined in 'target.tree':

but then it outputs three trees.
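The suspected pattern can be reproduced in isolation. The sketch below is not the SumTrees source; the function names are hypothetical and only mimic the structure described above, where an append sits after the loop rather than only inside it.

```python
def collect_targets_buggy(trees):
    """Reproduces the suspected pattern: an extra append after the loop."""
    target_trees = []
    for tree in trees:
        target_trees.append(tree)
    # 'tree' still refers to the last item here, so it is added twice.
    target_trees.append(tree)
    return target_trees


def collect_targets_fixed(trees):
    """Same loop without the trailing append."""
    target_trees = []
    for tree in trees:
        target_trees.append(tree)
    return target_trees


print(collect_targets_buggy(["t1"]))        # ['t1', 't1']
print(collect_targets_buggy(["t1", "t2"]))  # ['t1', 't2', 't2']
print(collect_targets_fixed(["t1", "t2"]))  # ['t1', 't2']
```

The counts match the report: one target tree yields two trees, and two target trees yield three.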

bipartition.is_trivial is True for non-trivial bipartitions in v4.0.3

In the following print() output, I would not expect line 3 to be included because AB | CD isn't a trivial bipartition:

tree = dendropy.Tree.get_from_string('((A,B),(C,D));', 'newick', rooting=None)
tree.encode_bipartitions()
for bipartition in tree.bipartition_encoding:
    if bipartition.is_trivial:
        print(bipartition.split_as_newick_string(tree.taxon_namespace))

Output:

((B, C, D), (A));
((B), (A, C, D));
((C, D), (A, B));
((C), (A, B, D));
((D), (A, B, C));
(A,B,C,D);

I also wouldn't expect the last line to be printed because "rooting=None" so it is partitioning an unrooted tree from nothing. Is this behaviour unexpected?
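One possible explanation, assuming is_trivial is a method rather than a property in this version (not confirmed here), is that the condition tests the method object itself, which is always truthy in Python. The sketch below uses a hypothetical Split class with a bitmask-based triviality check to show the difference between referencing the method and calling it.

```python
class Split:
    """Hypothetical bipartition stored as a bitmask over the leaf set."""

    def __init__(self, split_bitmask, leafset_bitmask):
        self.split_bitmask = split_bitmask
        self.leafset_bitmask = leafset_bitmask

    def is_trivial(self):
        # Trivial: one side of the split holds at most one leaf.
        inside = bin(self.split_bitmask & self.leafset_bitmask).count("1")
        total = bin(self.leafset_bitmask).count("1")
        return inside <= 1 or total - inside <= 1


# Leaves A, B, C, D map to bits 0b0001, 0b0010, 0b0100, 0b1000.
ab_cd = Split(0b0011, 0b1111)   # AB | CD
a_bcd = Split(0b0001, 0b1111)   # A | BCD

print(bool(ab_cd.is_trivial))   # True: a bound method object is always truthy
print(ab_cd.is_trivial())       # False: AB | CD is not trivial
print(a_bcd.is_trivial())       # True
```

Under this definition the split containing all leaves also counts as trivial, because its other side is empty.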

Problems with DendroPy installed alongside other software

Thank you for providing this platform to support DendroPy users.

I installed a program called CheckM, and DendroPy was installed along with it. However, there seem to be some errors. The details are as follows:

[root@student ~]# python -m dendropy
DendroPy version : DendroPy 4.1.0
DendroPy location : /usr/local/lib/python2.7/site-packages/dendropy
Python version : 2.7 (r27:82500, Apr 29 2016, 10:17:23) [GCC 4.4.6 20120305 (Red Hat 4.4.6-4)]
Python executable : /usr/local/bin/python
Python site packages : ['/usr/local/lib/python2.7/site-packages', '/usr/local/lib/site-python']
Exception AttributeError: "'NoneType' object has no attribute 'ImmutableTypeError'" in <bound method FrozenOrderedDict.__del__ of FrozenOrderedDict([('1', <StateIdentity at 0x1ed8490: '1'>), ('0', <StateIdentity at 0x1ed84d0: '0'>)])> ignored
Exception AttributeError: "'NoneType' object has no attribute 'ImmutableTypeError'" in <bound method FrozenOrderedDict.__del__ of FrozenOrderedDict([('1', <StateIdentity at 0x1ed8490: '1'>), ('0', <StateIdentity at 0x1ed84d0: '0'>)])> ignored
Exception AttributeError: "'NoneType' object has no attribute 'ImmutableTypeError'" in <bound method FrozenOrderedDict.__del__ of FrozenOrderedDict()> ignored
Exception AttributeError: "'NoneType' object has no attribute 'ImmutableTypeError'" in <bound method FrozenOrderedDict.__del__ of FrozenOrderedDict([(0, <StateIdentity at 0x1ed8490: '1'>), (1, <StateIdentity at 0x1ed84d0: '0'>)])> ignored
Exception AttributeError: "'NoneType' object has no attribute 'ImmutableTypeError'" in <bound method FrozenOrderedDict.__del__ of FrozenOrderedDict()> ignored

Documentation for newick output possibly wrong

At https://pythonhosted.org/DendroPy/schemas/newick.html (which I presume is generated from the codebase), under reading, it says

  • suppress_internal_node_taxa (boolean, default: True) – If False, internal node labels will be instantiated into Taxon objects. If True, internal node labels will not be instantiated as strings.
  • suppress_leaf_node_taxa (boolean, default: False) – If False, leaf (external) node labels will be instantiated into Taxon objects. If True, leaf (external) node labels will not be instantiated as strings.

Should these read "If True, xxx node labels will not be instantiated into Taxon objects, but be placed as node labels instead"? That's what seems to happen anyway.

Feature request: Writing to JSON schema

DendroPy is awesome.

I would love to see DendroPy's Tree data structures access modern tree visualization tools like D3. A simple method for writing the Tree data structure to a JSON schema would do the trick. I'm imagining JSON that follows the format described in this post.

I'd be happy to work on a PR if y'all agree it would be useful.
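A rough idea of what such a writer might produce is sketched below. The target layout is an assumption: nested objects with "name" and "children" keys, as commonly used for D3 hierarchies, plus a "length" key for branch lengths. The Node class and to_d3_dict function are hypothetical and independent of DendroPy's own classes.

```python
import json


class Node:
    """Hypothetical minimal tree node (not DendroPy's Node class)."""

    def __init__(self, name=None, length=None, children=()):
        self.name = name
        self.length = length
        self.children = list(children)


def to_d3_dict(node):
    """Convert a node and its descendants into nested dicts."""
    record = {}
    if node.name is not None:
        record["name"] = node.name
    if node.length is not None:
        record["length"] = node.length
    if node.children:
        record["children"] = [to_d3_dict(child) for child in node.children]
    return record


# The tree ((A,B):1,C); built by hand.
root = Node(children=[
    Node(length=1.0, children=[Node("A"), Node("B")]),
    Node("C"),
])

text = json.dumps(to_d3_dict(root), indent=2)
print(text)
```

The recursion here is kept for brevity; a very deep tree would call for an iterative traversal instead.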

mrca changes underlying tree?

from dendropy import Tree

tr = Tree.get(schema='newick', data='((A,B):1,(C,(D,E):2):3);')
print(str(tr))

lca = tr.mrca(taxon_labels=['A', 'B'])
print(str(tr))

gives

((A,B):1.0,(C,(D,E):2.0):3.0)
((A,B):4.0,C,(D,E):2.0)

I've tried this using python2 & 3, as well as on 4.1.0 and development-master. Calling update_bipartitions() before mrca was no help either.

mrca isn't supposed to change the tree, right?
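For comparison, a most recent common ancestor lookup can be written as a purely read-only operation. The sketch below is not DendroPy's implementation; it uses a hypothetical Node class with parent links and intersects ancestor chains without modifying any node.

```python
class Node:
    """Hypothetical node with a label and a link to its parent."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent


def ancestors(node):
    """Return the chain from the node itself up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def mrca(nodes):
    """Return the deepest node that is an ancestor of every given node."""
    common = ancestors(nodes[0])
    for node in nodes[1:]:
        seen = {id(n) for n in ancestors(node)}
        common = [n for n in common if id(n) in seen]
    return common[0]


# Topology of ((A,B),(C,(D,E)));
root = Node("root")
ab = Node("AB", root)
cde = Node("CDE", root)
de = Node("DE", cde)
a, b = Node("A", ab), Node("B", ab)
c = Node("C", cde)
d, e = Node("D", de), Node("E", de)

print(mrca([a, b]).label)  # AB
print(mrca([c, d]).label)  # CDE
print(mrca([a, e]).label)  # root
```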

Maximum recursion depth error

Hi @jeetsukumaran,

Before I continue reporting on this error, just want to say a big thanks for putting out this library. It's come in really useful for a few papers we've written.

For a project that we're working on in which we are analyzing a large phylogenetic tree dataset, we are getting a RecursionError: maximum recursion depth exceeded. Not sure where this is coming from, but will do my best to illustrate what's going on here.

First off, the traceback looks like this:

Traceback (most recent call last):
  File "isolate_pds.py", line 14, in <module>
    t = Tree.get(path=tree_filename, schema='nexus')
  File "/home/ericmjl/anaconda/envs/h5n2/lib/python3.5/site-packages/dendropy/datamodel/treemodel.py", line 2730, in get
    return cls._get_from(**kwargs)
  File "/home/ericmjl/anaconda/envs/h5n2/lib/python3.5/site-packages/dendropy/datamodel/basemodel.py", line 155, in _get_from
    return cls.get_from_path(src=src, schema=schema, **kwargs)
  File "/home/ericmjl/anaconda/envs/h5n2/lib/python3.5/site-packages/dendropy/datamodel/basemodel.py", line 218, in get_from_path
    **kwargs)
  File "/home/ericmjl/anaconda/envs/h5n2/lib/python3.5/site-packages/dendropy/datamodel/treemodel.py", line 2635, in _parse_and_create_from_stream
    global_annotations_target=None)
  File "/home/ericmjl/anaconda/envs/h5n2/lib/python3.5/site-packages/dendropy/dataio/ioservice.py", line 362, in read_tree_lists
    global_annotations_target=global_annotations_target)
  File "/home/ericmjl/anaconda/envs/h5n2/lib/python3.5/site-packages/dendropy/dataio/nexusreader.py", line 362, in _read
    self._parse_nexus_stream(stream)
  File "/home/ericmjl/anaconda/envs/h5n2/lib/python3.5/site-packages/dendropy/dataio/nexusreader.py", line 567, in _parse_nexus_stream
    self._parse_trees_block()
  File "/home/ericmjl/anaconda/envs/h5n2/lib/python3.5/site-packages/dendropy/dataio/nexusreader.py", line 1115, in _parse_trees_block
    taxon_symbol_mapper=taxon_symbol_mapper)
  File "/home/ericmjl/anaconda/envs/h5n2/lib/python3.5/site-packages/dendropy/dataio/nexusreader.py", line 995, in _parse_tree_statement
    tree = self._build_tree_from_newick_tree_string(tree_factory, taxon_symbol_mapper)
  File "/home/ericmjl/anaconda/envs/h5n2/lib/python3.5/site-packages/dendropy/dataio/nexusreader.py", line 1017, in _build_tree_from_newick_tree_string
    taxon_symbol_map_fn=taxon_symbol_mapper.require_taxon_for_symbol)
  File "/home/ericmjl/anaconda/envs/h5n2/lib/python3.5/site-packages/dendropy/dataio/newickreader.py", line 374, in _parse_tree_statement
    is_internal_node=None)
  File "/home/ericmjl/anaconda/envs/h5n2/lib/python3.5/site-packages/dendropy/dataio/newickreader.py", line 550, in _parse_tree_node_description
    is_internal_node=is_new_internal_node,

The last line gets repeated many, many times, until we get the final lines:

  File "/home/ericmjl/anaconda/envs/h5n2/lib/python3.5/site-packages/dendropy/dataio/newickreader.py", line 541, in _parse_tree_node_description
    new_node = tree.node_factory();
  File "/home/ericmjl/anaconda/envs/h5n2/lib/python3.5/site-packages/dendropy/datamodel/treemodel.py", line 2999, in node_factory
    return Node(**kwargs)
  File "/home/ericmjl/anaconda/envs/h5n2/lib/python3.5/site-packages/dendropy/datamodel/treemodel.py", line 1002, in __init__
    length=kwargs.pop("edge_length", None))
  File "/home/ericmjl/anaconda/envs/h5n2/lib/python3.5/site-packages/dendropy/datamodel/treemodel.py", line 746, in __init__
    basemodel.DataObject.__init__(self, label=kwargs.pop("label", None))
RecursionError: maximum recursion depth exceeded

My computing environment is as follows:

$ conda list
# packages in environment at /home/ericmjl/anaconda/envs/h5n2:
#
cycler                    0.10.0                   py35_0    defaults
dendropy                  4.1.0                     <pip>
fontconfig                2.11.1                        5    defaults
freetype                  2.5.5                         0    defaults
icu                       56.1                          2    conda-forge
libiconv                  1.14                          1    conda-forge
libpng                    1.6.17                        0    defaults
libxml2                   2.9.3                         6    conda-forge
matplotlib                1.5.1               np111py35_0    defaults
mkl                       11.3.1                        0    defaults
numpy                     1.11.0                   py35_0    defaults
openssl                   1.0.2h                        0    defaults
pandas                    0.18.1              np111py35_0    defaults
pip                       8.1.1                    py35_1    defaults
pyparsing                 2.1.1                    py35_0    defaults
pyqt                      4.11.4                   py35_1    defaults
python                    3.5.1                         0    defaults
python-dateutil           2.5.3                    py35_0    defaults
pytz                      2016.4                   py35_0    defaults
qt                        4.8.7                         1    defaults
readline                  6.2                           2    defaults
setuptools                20.7.0                   py35_0    defaults
sip                       4.16.9                   py35_0    defaults
six                       1.10.0                   py35_0    defaults
sqlite                    3.11.1                        0    conda-forge
tk                        8.5.18                        0    defaults
tqdm                      4.5.0                     <pip>
wheel                     0.29.0                   py35_0    defaults
xz                        5.0.5                         1    defaults
zlib                      1.2.8                         0    defaults

Would this be an issue with the input tree file? We are happy to provide a copy of the Nexus tree, as well as the script, for further testing.
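The traceback shows one recursive call per level of nesting, so a very deep (ladder-like) tree can exceed Python's recursion limit regardless of how large the file is. The sketch below is not DendroPy's parser: parse_nested is a hypothetical, topology-only reader that ignores branch lengths, quoting, and comments. It shows how an explicit stack avoids the limit.

```python
def parse_nested(newick):
    """Parse parentheses and comma-separated labels into nested lists, iteratively."""
    root = []
    stack = [root]
    label = ""
    for ch in newick:
        if ch == "(":
            # Open a new clade and descend into it.
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif ch in ",);":
            # A separator ends the current label, if any.
            if label:
                stack[-1].append(label)
                label = ""
            if ch == ")":
                stack.pop()
        else:
            label += ch
    return root[0]


print(parse_nested("((A,B),C);"))  # [['A', 'B'], 'C']

# A ladder-shaped tree nested 5000 levels deep.
depth = 5000
newick = "(" * depth + "t0" + "".join(f",t{i})" for i in range(1, depth + 1)) + ";"
tree = parse_nested(newick)

# Count nesting levels without recursion.
levels = 0
node = tree
while isinstance(node, list):
    levels += 1
    node = node[0]
print(levels)  # 5000
```

As a stopgap with a recursive parser, raising the limit through sys.setrecursionlimit may help, although very large values can crash the interpreter.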

python3 test failure

Hi,

I'm attempting to package dendropy for the GNU Guix package manager, and I'm hitting 2 errors when running the tests of python3(.4.3). I've not done very extensive searching, but I wonder if these have already been fixed in master and are unreleased? Or is there some other issue? There is no problem on python 2.7.

======================================================================
ERROR: test_distances (dendropy.test.test_phylogenetic_distance_matrix.NodeToNodeDistancesTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/nix-build-python-dendropy-4.1.0.drv-0/DendroPy-4.1.0/dendropy/test/test_phylogenetic_distance_matrix.py", line 817, in test_distances
    ndm = tree.node_distance_matrix()
  File "/tmp/nix-build-python-dendropy-4.1.0.drv-0/DendroPy-4.1.0/dendropy/datamodel/treemodel.py", line 5512, in node_distance_matrix
    return NodeDistanceMatrix.from_tree(tree=self)
  File "/tmp/nix-build-python-dendropy-4.1.0.drv-0/DendroPy-4.1.0/dendropy/calculate/phylogeneticdistance.py", line 1363, in from_tree
    ndm.compile_from_tree(tree=tree)
  File "/tmp/nix-build-python-dendropy-4.1.0.drv-0/DendroPy-4.1.0/dendropy/calculate/phylogeneticdistance.py", line 1393, in compile_from_tree
    for ch1_subtree_node in self._node_phylogenetic_distances[ch1].keys():
RuntimeError: dictionary changed size during iteration

======================================================================
ERROR: test_mrca (dendropy.test.test_phylogenetic_distance_matrix.NodeToNodeDistancesTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/nix-build-python-dendropy-4.1.0.drv-0/DendroPy-4.1.0/dendropy/test/test_phylogenetic_distance_matrix.py", line 839, in test_mrca
    ndm = tree.node_distance_matrix()
  File "/tmp/nix-build-python-dendropy-4.1.0.drv-0/DendroPy-4.1.0/dendropy/datamodel/treemodel.py", line 5512, in node_distance_matrix
    return NodeDistanceMatrix.from_tree(tree=self)
  File "/tmp/nix-build-python-dendropy-4.1.0.drv-0/DendroPy-4.1.0/dendropy/calculate/phylogeneticdistance.py", line 1363, in from_tree
    ndm.compile_from_tree(tree=tree)
  File "/tmp/nix-build-python-dendropy-4.1.0.drv-0/DendroPy-4.1.0/dendropy/calculate/phylogeneticdistance.py", line 1393, in compile_from_tree
    for ch1_subtree_node in self._node_phylogenetic_distances[ch1].keys():
RuntimeError: dictionary changed size during iteration

----------------------------------------------------------------------
Ran 833 tests in 467.171s

FAILED (errors=2)

Thanks,
ben
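
For reference, the RuntimeError in both tracebacks is the generic Python 3 complaint about inserting into a dict while looping over its live key view. The sketch below reproduces that pattern in isolation and shows the usual snapshot fix. The function names and the toy table are hypothetical and are not taken from phylogeneticdistance.py.

```python
def add_aliases_unsafe(table):
    # Loops over the live key view while inserting new keys.
    # On Python 3 this raises RuntimeError on the next iteration step.
    for key in table.keys():
        table[key + "_alias"] = table[key]


def add_aliases_safe(table):
    # list(...) copies the keys up front, so later insertions are harmless.
    for key in list(table.keys()):
        table[key + "_alias"] = table[key]


distances = {"a": 1.0, "b": 2.0}

try:
    add_aliases_unsafe(dict(distances))
    unsafe_error = None
except RuntimeError as exc:
    unsafe_error = str(exc)

safe = dict(distances)
add_aliases_safe(safe)
print(unsafe_error)
print(sorted(safe))
```

On Python 2, keys() returns a list, so the unsafe version would run there without complaint, which may be why the tests only fail under Python 3.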

Inconsistent distances from leaves to root in birth/death trees

[Edit: Added code blocks rather than images]
Bug Description
The functions "tree.calc_node_root_distances()" and "tree.distance_from_root()" (when applied to leaves) return different distances. I have not tested any other types of trees.

Output

Calc root node distances: [0.9332675163652531, 0.9332675163652531, 0.933267516365253, 0.933267516365253, 0.9332675163652528]
Distance from root: [1.1328731100915086, 1.1328731100915086, 1.1328731100915086, 1.1328731100915086, 1.1328731100915084]

Code

import dendropy as dp

t = dp.simulate.birth_death_tree(birth_rate=1.0,
                                 death_rate=0,
                                 ntax=5)

print("Calc root node distances: {}".format(t.calc_node_root_distances()))

print("Distance from root: {}".format([x.distance_from_root() for x in t.leaf_node_iter()]))

System Information
Operating System: OSX 10.11.5 (15F34)
DendroPy version : DendroPy 4.1.0
DendroPy location : /Users/nikoyasui/anaconda3/lib/python3.5/site-packages/dendropy
Python version : 3.5.1 |Anaconda custom (x86_64)| (default, Dec 7 2015, 11:24:55) [GCC 4.2.1 (Apple Inc. build 5577)]
Python executable : /Users/nikoyasui/anaconda3/bin/python
Python site packages : ['/Users/nikoyasui/anaconda3/lib/python3.5/site-packages']

taxon_set.labels() does not properly parse underscores

Jeet,
I love DendroPy, but there's a bug in the labels function that strips underscores from the taxa names. This is for version 3.8.1.

tree = "((((((pan_tro:0.000000,(nom_leu:0.004887,gor_gor:0.007390):0.000000):0.000000,hom_sap:0.007444):0.002207,pon_abe:0.015340):0.002445,(rhe_mac:0.004855,pap_ham:0.002393):0.007689):0.005980,cal_jac:0.038147):0.015467,mus_mus:0.122579,oto_gar:0.011451);"
t = dendropy.Tree()
t.read_from_string(tree,'newick')
t.taxon_set.labels()

Result:

['pan tro', 'nom leu', 'gor gor', 'hom sap', 'pon abe', 'rhe mac', 'pap ham', 'cal jac', 'mus mus', 'oto gar']

Cheers,
Nick
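
A note on the behavior above: the Newick convention reads underscores in unquoted labels as spaces, while labels wrapped in single quotes are taken literally. The sketch below illustrates that rule only. The function newick_label_to_name is hypothetical and is not DendroPy's parser. DendroPy's Newick reader documentation describes a preserve_underscores keyword for keeping underscores; whether it is available in 3.8.1 is an assumption worth checking.

```python
def newick_label_to_name(token, preserve_underscores=False):
    """Interpret one raw Newick label token (illustrative helper only)."""
    if len(token) >= 2 and token[0] == "'" and token[-1] == "'":
        # Quoted label: kept literally; a doubled quote stands for one quote.
        return token[1:-1].replace("''", "'")
    if preserve_underscores:
        # Opt out of the convention and keep the token untouched.
        return token
    # Unquoted label: underscores are read as spaces.
    return token.replace("_", " ")


print(newick_label_to_name("pan_tro"))
print(newick_label_to_name("'pan_tro'"))
print(newick_label_to_name("pan_tro", preserve_underscores=True))
```

Under this convention, quoting the labels in the tree string is another way to keep the underscores.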

This newick string breaks tree reader

(Loxodonta_cyclotis_x_africana_ott4943109,(Loxodonta_africana_knochenhaueri_ott904474)Loxodonta_africana_ott541936,Loxodonta_cyclotis_ott418570)Loxodonta_ott541933;

reroot_at_edge() causes trifurcation and incorrect branch lengths

The Tree.reroot_at_edge() method does not work as described. In version 4.0.2 and 4.0.3, it results in a trifurcation and does not respect the provided edge lengths:

newick_str = '(((A:0.5,B:0.5):0.5,C:0.5):0.5,(D:0.5,E:0.5):0.5);'
tree = dendropy.Tree.get(data=newick_str, schema='newick')

mrca = tree.mrca(taxon_labels=('A', 'B'))
print 'mcra.edge_length', mrca.edge_length

tree.reroot_at_edge(mrca.edge, length1=0.5 * mrca.edge_length, length2=0.5 * mrca.edge_length, update_bipartitions=True)

for child in tree.seed_node.child_nodes():
    print 'child.edge_length', child.edge_length

tree.write_to_path('rooted.tree', schema='newick')

The output is:

mcra.edge_length 0.5
child.edge_length 0.5
child.edge_length 0.5
child.edge_length 1.0

The resulting tree is:
[&R] ((A:0.5,B:0.5):0.5,C:0.5,(D:0.5,E:0.5):1.0);

Note that this tree incorrectly has a trifurcation and that the edge lengths to the root are not 0.25 as expected (i.e., half the edge length of the parent branch from the MRCA node).

rerooting trifurcated tree issue

Hi there,

I'm encountering a bit of a problem where I'm trying to reroot a tree that is trifurcated at the root so that the new root is on the longest edge descending from the current root. I've outlined the problem with the following code:

example tree

import dendropy

tree_string = '(1111319:0.0267,(1111717:0.01494,1111294:0.02598)1.000:0.02879,(1111874:0.03322,1111783:0.01707)0.772:0.01236);'
tree = dendropy.Tree.get(data=tree_string, schema="newick")

The tree is trifurcated

print tree.as_ascii_plot(plot_metric='length')
/------------------------------ 1111319
|
|                                /----------------- 1111717
+--------------------------------+
|                                \------------------------------ 1111294
|
|             /-------------------------------------- 1111874
\-------------+
              \------------------- 1111783

I want to reroot the tree on the longest edge descending from the root (i.e. the middle one). So I find that edge and attempt to reroot the tree on it:

children = {x.length:x for x in tree.seed_node.child_edges()}
longest_edge_length = max(children.keys())
longest_edge = children[longest_edge_length]
tree.reroot_at_edge(longest_edge, 
                    length1=(longest_edge_length/2), 
                    length2=(longest_edge_length/2))

But the tree is still trifurcated

print tree.as_ascii_plot(plot_metric='length')
                                 /----------------- 1111717
/--------------------------------+
|                                \------------------------------ 1111294
|
+------------------------------ 1111319
|
|             /-------------------------------------- 1111874
\-------------+
              \------------------- 1111783

Strangely enough, when I try it again on the just-rerooted tree it works


tree.reroot_at_edge(longest_edge,
                    length1=(longest_edge_length/2),
                    length2=(longest_edge_length/2))
print tree.as_ascii_plot(plot_metric='length')

               /--------------- 1111717
/--------------+
|              \--------------------------- 1111294
+
|              /--------------------------- 1111319
\--------------+
               |            /----------------------------------- 1111874
               \------------+
                            \----------------- 1111783

The following description of the reroot_at_edge function led me to believe that it would create a node on the edge you specify, and reroot the tree on that node, with the distances on either side of the node being specified with the 'length' parameters:

Takes an internal edge, edge, adds a new node to it, and then roots
the tree on the new node.

I'm not totally sure if this is a bug or just my misunderstanding of how to use the reroot_at_edge function. Do you have any idea why it would not be working on the first try? How would I be able to achieve what I need on the first go?

I'm using version 4.1.0 of dendropy

Thanks so much for any help you can provide.

Joel

'Tree' object has no attribute 'bipartitions'

Calling TreeList.frequency_of_bipartition() can fail as the bipartitions attribute is not defined for Tree objects. Perhaps this is supposed to be bipartition_encoding?

File "/srv/sw/DendroPy/4.0.3/lib/python2.7/site-packages/dendropy/datamodel/treecollectionmodel.py", line 1197, in frequency_of_bipartition
    if not is_bipartitions_updated or not tree.bipartitions:
AttributeError: 'Tree' object has no attribute 'bipartitions'

char. matrix `from_dict` and unicode keys

If the taxon labels are unicode, then (in python 2) from_dict will fail because there is a special case:

if isinstance(x, str):

logic. We should probably have a string utility function. peyotl uses an is_str_type method for this. See https://github.com/mtholder/peyotl/blob/c3a544211edc669e664bae28095d52cecfa004f3/peyotl/utility/str_util.py#L5-L25 to deal with the py2 + py3 differences.

All cases of isinstance(..., str) should probably be checked.

Not a high priority (for me at least)
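
A minimal sketch of the kind of string utility suggested above, assuming a single helper that hides the Python 2 and Python 3 difference. The name is_str_type follows the peyotl helper mentioned in the post, but this body is an illustration, not peyotl's or DendroPy's code.

```python
import sys

if sys.version_info[0] >= 3:
    # Python 3: text is always str.
    string_types = (str,)
else:
    # Python 2: basestring covers both str and unicode.
    string_types = (basestring,)  # noqa: F821 (only evaluated on Python 2)


def is_str_type(x):
    """Return True if x is a text string on the running Python version."""
    return isinstance(x, string_types)


print(is_str_type("taxon_a"), is_str_type(u"taxon_b"), is_str_type(42))
```

Each isinstance(..., str) check could then call is_str_type(...) instead, so unicode taxon labels take the same branch as plain strings on Python 2.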

trprobs memory crash causes truncated consensus tree

I was trying to get all of the tree probabilities (using the --trprobs command), however my runs have been crashing while writing the tree probability file.
Even more problematic, it seems to stop writing the con tree file before moving on to the tree probabilities file (the file is simply cut off before it is complete).
Does the trprobs command require much more memory than the normal analysis?
Even if that is true, why would it stop printing the con tree before it has finished?
Does it move onto the tree probabilities while it is printing the tree, so it crashes before printing?
I have tried running the analysis without --trprobs and it runs without a problem.

Ordered Parsimony

Would it be possible to implement ordered parsimony (Wagner parsimony.. I think)? I see that Fitch parsimony is already implemented.

Jamie

Duplicate labels

Trying to read a tree with duplicate labels fails.

If I have test.out containing
((a,a),(b,c));

The following line:
trees = dendropy.TreeList.get_from_path('test.out','newick',allow_duplicate_taxon_labels=True)

outputs:
Traceback (most recent call last):
  File "", line 1, in
  File "python/lib/python2.7/site-packages/DendroPy-3.12.0-py2.7.egg/dendropy/dataobject/tree.py", line 641, in __init__
    self.process_source_kwargs(**kwargs)
  File "python/lib/python2.7/site-packages/DendroPy-3.12.0-py2.7.egg/dendropy/utility/iosys.py", line 285, in process_source_kwargs
    self.read(stream=stream, schema=schema, **kwargs)
  File "python/lib/python2.7/site-packages/DendroPy-3.12.0-py2.7.egg/dendropy/dataobject/tree.py", line 701, in read
    tree = Tree._parse_from_stream(stream, schema, **kwargs)
  File "python/lib/python2.7/site-packages/DendroPy-3.12.0-py2.7.egg/dendropy/dataobject/tree.py", line 485, in _parse_from_stream
    d = DataSet(stream=stream, schema=schema, **kwargs)
  File "python/lib/python2.7/site-packages/DendroPy-3.12.0-py2.7.egg/dendropy/dataobject/dataset.py", line 88, in __init__
    self.process_source_kwargs(**kwargs)
  File "python/lib/python2.7/site-packages/DendroPy-3.12.0-py2.7.egg/dendropy/utility/iosys.py", line 285, in process_source_kwargs
    self.read(stream=stream, schema=schema, **kwargs)
  File "python/lib/python2.7/site-packages/DendroPy-3.12.0-py2.7.egg/dendropy/dataobject/dataset.py", line 165, in read
    raise x
dendropy.utility.error.DataParseError: Error parsing data source "test.out" on line 1 at column 7: Taxon a used twice (it appears as a the second time)

Unstable behavior in symmetric differences

Hi Jeet-

I'm having an issue with the behavior of symmetric differences on a huge tree (4161 taxa).

lit_tree = dendropy.Tree.get_from_path("original_tree.tre", "newick")
ml = dendropy.Tree.get_from_path("RAxML_bestTree.result", "newick")
ml.is_rooted
False
lit_tree.is_rooted
False

ml.symmetric_difference(lit_tree)
820
ml.symmetric_difference(lit_tree)
7100
lit_tree.symmetric_difference(ml)
7100

Taxon set is the same. Do you have a sense for what might be going on here?

PatristicDistanceMatrix

Hi,

I have a script that uses PatristicDistanceMatrix. It does not work on 4.1, and when I looked it up it is marked as legacy and soon to be deprecated. I tried using patristic_distance instead, but since I work with the whole tree, this was much slower.

Jim

Infinite loop in treemodel.Tree.reroot_at_midpoint()

In the following while loop, I believe there needs to be a break following the else statement.

def reroot_at_midpoint(self, update_bipartitions=False, suppress_unifurcations=True):
        ...
        # going up ...
        while cur_node is not mrca_node:
            if cur_node.edge.length > plen:
                target_edge = cur_node.edge
                head_node_edge_len = plen #cur_node.edge.length - plen
                plen = 0
                break
            elif cur_node.edge.length < plen:
                plen -= cur_node.edge.length
                cur_node = cur_node._parent_node
            else:
                break_on_node = cur_node

Otherwise there isn't a way out of the loop if you make it past the if statements.
-Steve
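
To make the control flow concrete, here is a stripped-down sketch of the same three-way comparison, using a plain list of edge lengths instead of DendroPy nodes. The function locate_midpoint and its return values are hypothetical. The only point is that the equality branch must leave the loop, because it changes neither the remaining length nor the current position.

```python
def locate_midpoint(edge_lengths, plen):
    """Walk edge lengths from a leaf toward the MRCA, consuming plen.

    Returns ("edge", index, remaining) when the midpoint falls inside an
    edge, ("node", index) when it falls exactly at the end of an edge,
    or None if plen exceeds the whole path.
    """
    i = 0
    while i < len(edge_lengths):
        length = edge_lengths[i]
        if length > plen:
            # Midpoint lies partway along this edge.
            return ("edge", i, plen)
        elif length < plen:
            # Consume this edge and move one step up.
            plen -= length
            i += 1
        else:
            # Exact hit: nothing is updated here, so without leaving the
            # loop the same comparison would repeat forever.
            return ("node", i)
    return None


print(locate_midpoint([1.0, 2.0], 0.5))
print(locate_midpoint([1.0, 2.0], 1.0))
print(locate_midpoint([1.0, 2.0], 2.0))
```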

can node labels be parsed as support values for tree rooting purposes?

Hi @jeetsukumaran and @mtholder!
I have recently expanded the analysis by Czech and Stamatakis regarding the incorrect mapping of support values after tree rooting in tree viewers, and found that multiple toolkits (perhaps also dendropy) might be affected.

@lczech, @stamatak and I are now working on reporting/confirming the bug in more frameworks and tree viewers.

Current tests are here:
https://github.com/jhcepas/test_branch_support_after_tree_rerooting

Regarding dendropy, I don't know why the rooted topology returned looks unrooted in the image. Is that normal? In any case, it seems that the support values are not mapped correctly (i.e. the support value of 20 does not correspond to the same partition as in the original tree).

Could you take a look and confirm that this is not a problem in our tests?
tx!

sumlabels?

Hi there!

A couple of years ago I was using sumlabels.py to map support values from multiple different trees onto one tree. I see that the script has not been updated in some time and I cannot seem to find it in the newest repository. Can you tell me if it has been incorporated into another package or given another name?

I also have a specific question about the script. I noticed that if it does not find the same support value for a given bipartition, it will not map a value to the output tree. This makes sense since that bipartition was not found! However, when mapping the values, it's hard to determine which value was added from which tree. Is there a way to get the script to write a '-' or 'nf.' in place of nothing?

Thank you!
Courtney

Error installing DendroPy along Python 2.7 on Mac OS

Dear all,

I've been trying to install DendroPy using pip and the GitHub repository. Neither conda nor pip install worked for me.

Using the command:

conda install -c ericmjl dendropy=4.1.0

UnsatisfiableError: The following specifications were found to be in conflict:

  • dendropy 4.1.0*
  • python 2.7*
    Use "conda info " to see the dependencies for each package.

pip install tells me:

sudo pip install -U dendropy

Collecting dendropy
Downloading DendroPy-4.2.0.tar.gz (16.7MB)
100% |████████████████████████████████| 16.7MB 38kB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "", line 1, in
  File "/private/tmp/pip-build-p12Yma/dendropy/setup.py", line 35, in
    from dendropy import version, revision_description, description
  File "dendropy/__init__.py", line 24, in
    from dendropy.dataio.nexusprocessing import get_rooting_argument
  File "dendropy/dataio/__init__.py", line 20, in
    from dendropy.dataio import newickreader
  File "dendropy/dataio/newickreader.py", line 27, in
    from dendropy.utility.textprocessing import StringIO
  File "dendropy/utility/textprocessing.py", line 40, in
    ENCODING = locale.getdefaultlocale()[1]
  File "/Users/rubenbakker/anaconda/lib/python2.7/locale.py", line 545, in getdefaultlocale
    return _parse_localename(localename)
  File "/Users/rubenbakker/anaconda/lib/python2.7/locale.py", line 477, in _parse_localename
    raise ValueError, 'unknown locale: %s' % localename
ValueError: unknown locale: UTF-8

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /private/tmp/pip-build-p12Yma/dendropy/

Using the commands:

cd
git clone git://github.com/jeetsukumaran/DendroPy.git
sudo python setup.py develop

I receive the same error message as using pip.
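
The traceback ends with locale.getdefaultlocale() rejecting the bare locale string "UTF-8". A commonly reported workaround on macOS is to export LC_ALL and LANG as en_US.UTF-8 in the shell before installing. On the library side, a defensive lookup along the following lines would avoid the crash at import time. This is a sketch only: the helper name default_encoding and the fallback value are assumptions, not DendroPy code.

```python
import locale


def default_encoding(fallback="utf-8"):
    """Return the preferred text encoding, or a fallback if it cannot be determined."""
    try:
        # getpreferredencoding(False) queries the encoding without
        # changing the process locale settings.
        encoding = locale.getpreferredencoding(False)
    except (ValueError, locale.Error):
        encoding = None
    # An empty or missing answer also falls back to the default.
    return encoding or fallback


ENCODING = default_encoding()
print(ENCODING)
```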

Do other people experience the same issues when installing dendropy? Has the library been discontinued?

Thank you for reading this far down my post.

Ruben

resolve_polytomies inconsistently creates 0-length branches

In python3/dendropy 4.1.0 (NB: I couldn't find 4.2.0 in the https://pypi.python.org repo, despite what is written on the dendropy home page), when resolve_polytomies() is called with no arguments it creates 0-length branches (which is what I prefer, and would expect). When it is called with an rng argument, the newly created edges have no length, however.

from dendropy import Tree
import random
t1 = Tree.get_from_string("((A,B,C),D);", schema="newick")
t1.resolve_polytomies()
print(t1.as_string(schema="newick")) #contains 0-length branches, where polytomies have been resolved
r = random.Random()
r.seed(1234)
t2 = Tree.get_from_string("((A,B,C),D);", schema="newick")
t2.resolve_polytomies(rng=r)
print(t2.as_string(schema="newick")) #no 0-length branches

I presume there should be some consistency here. Personally, I would prefer t2 to be in the style of t1, with zero-length edges, although I could see that setting either length = 0 or length = None might be a further useful switch to the resolve_polytomies() method.

Floating point error accumulation in contained coalescent trees

When a coalescent tree is contained within a species tree with longest distance to root of N, the longest distances to root in the coalescent tree are slightly longer than N. I believe this can be solved by using Decimal, or at least importing some popular numerical packages so that users can convert their edge lengths to a numerical representation of their choice.

If that is not in line with the vision of the project, please change the documentation in treemodel.py to reflect the types of edge lengths.
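
As a small illustration of the effect described above (not DendroPy code): summing binary floating-point edge lengths drifts slightly from the exact total, so root distances are safer to compare with a tolerance, or to accumulate with Decimal when exact decimal arithmetic is needed. The edge lengths here are made up for the example.

```python
import math
from decimal import Decimal

# Ten edges of length 0.1 along one root-to-tip path (made-up values).
float_total = sum([0.1] * 10)

# The same path accumulated with exact decimal arithmetic.
decimal_total = sum(Decimal("0.1") for _ in range(10))

print(float_total == 1.0)               # exact comparison fails
print(math.isclose(float_total, 1.0))   # tolerance-based comparison passes
print(decimal_total == Decimal("1.0"))  # decimal sum is exact
```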

Best,
Niko

NameError raised by dendropy.calculate.treecompare.find_missing_bipartitions

Hi

In version 4.0.3 a NameError is raised by find_missing_bipartitions(), unless I'm doing something wrong. Can be reproduced with:

tree1 = dendropy.Tree.get_from_string('(A,(B,C));', 'newick')
tree2 = dendropy.Tree.get_from_string('((A,B),C);', 'newick', taxon_namespace = tree1.taxon_namespace)
dendropy.calculate.treecompare.find_missing_bipartitions(tree1, tree2)

which gives me

DendroPy-4.0.3/dendropy/calculate/treecompare.pyc in find_missing_bipartitions(reference_tree, comparison_tree, is_bipartitions_updated)
    340     if not is_bipartitions_updated:
    341         reference_tree.encode_bipartitions()
--> 342         comparision_tree.encode_bipartitions()
    343     else:
    344         if reference_tree.bipartition_encoding is None:

NameError: global name 'comparision_tree' is not defined

thanks for all the hard work! Version 4 looks good.

PHYLIP reader can only read one character matrix per file

It would be useful if the PHYLIP reader could read a bunch of character matrices from a single PHYLIP file. For example a PHYLIP file might look like

200 345
...200 lines with 345 characters each...
200 221
...200 lines with 221 characters each...
etc.
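
Until the reader supports this, one possible workaround is to split the file into single-matrix chunks and feed each chunk to the existing reader. The sketch below assumes strict sequential PHYLIP with one line per taxon and a header line of the form "ntax nchar". The function split_phylip_blocks is hypothetical and not part of DendroPy.

```python
def split_phylip_blocks(text):
    """Split concatenated sequential PHYLIP matrices into separate strings."""
    # Ignore blank lines between or inside blocks.
    lines = [line for line in text.splitlines() if line.strip()]
    blocks = []
    pos = 0
    while pos < len(lines):
        # Header line: number of taxa, then number of characters.
        ntax, nchar = (int(token) for token in lines[pos].split())
        rows = lines[pos + 1 : pos + 1 + ntax]
        if len(rows) != ntax:
            raise ValueError("block starting at line %d is truncated" % (pos + 1))
        blocks.append("\n".join([lines[pos]] + rows))
        pos += 1 + ntax
    return blocks


example = "2 4\nA ACGT\nB ACGA\n2 3\nA ACG\nB ACC\n"
blocks = split_phylip_blocks(example)
print(len(blocks))
```

Each returned string is a complete single-matrix PHYLIP document, so it could be passed to a single-matrix reader one at a time.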

Change of consensus frequency in sumtrees.py does not work

The -f parameter has no effect. Suggested change:
tree = tree_array.consensus_tree(min_consensus_freq=args.min_consensus_freq, summarize_splits=False)

Becomes

tree = tree_array.consensus_tree(min_freq=args.min_consensus_freq, summarize_splits=False)

Either that or change treedatamodel.py
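
One plausible reason the misnamed keyword fails silently rather than raising a TypeError is that the callee accepts arbitrary extra keyword arguments. The sketch below shows that pattern with hypothetical stand-in functions. It is an assumption about the cause, not a copy of consensus_tree.

```python
def consensus_tree_sketch(min_freq=0.5, **summarization_kwargs):
    # Unknown keywords are collected here, so min_freq keeps its default.
    return min_freq, summarization_kwargs


def strict_consensus_tree_sketch(min_freq=0.5, **summarization_kwargs):
    # Reject keywords that are not explicitly expected.
    allowed = {"summarize_splits"}
    unknown = set(summarization_kwargs) - allowed
    if unknown:
        raise TypeError("unexpected keyword(s): " + ", ".join(sorted(unknown)))
    return min_freq, summarization_kwargs


used_freq, leftovers = consensus_tree_sketch(min_consensus_freq=0.95)
print(used_freq, leftovers)
```

With the strict variant, the same call raises a TypeError, which would have surfaced the mismatch immediately.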

avoid recursion over trees.

See https://groups.google.com/forum/#!topic/dendropy-users/-CVyTrcM2Sw
_compose_node in dataio.newickwriter is recursive which causes crashes on big trees.

A workaround is:

import sys
sys.setrecursionlimit(2000)

but that should not be required.

DendroPy's nice tree iterators should make recursion easy to avoid.
Side note: In peyotl, I recently added a before_after_apply method to the node class. It takes a function to be called before a subtree is entered, another for each leaf encountered, and a third for leaving the subtree. That can be a convenient way of writing some functions with iteration rather than recursion. For example, the (very simplistic) write_newick in peyotl is then:

def write_newick(out, node, **kwargs):
    # kwargs are passed through to the per-node writer.
    def _open_newick(node):
        # Called before a subtree is entered.
        if node.is_first_child_of_parent:
            out.write('(')
        else:
            out.write(',(')
    def _t(node):
        # Called for each leaf encountered.
        if not node.is_first_child_of_parent:
            out.write(',')
        _write_node_info_newick(out, node, **kwargs)
    def _a(node):
        # Called when leaving a subtree.
        out.write(')')
        _write_node_info_newick(out, node, **kwargs)
    # Traverse the subtree rooted at node.
    node.before_after_apply(before_fn=_open_newick, after_fn=_a, leaf_fn=_t)
    out.write(';\n')
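
For comparison, here is a self-contained sketch of the same idea with an explicit stack and no helper method. It assumes a toy tree representation where each node is a (label, children) tuple. This is not DendroPy's or peyotl's data model, and branch lengths are left out to keep it short.

```python
def to_newick(root):
    """Serialize a (label, children) tree to Newick without recursion."""
    out = []
    # Each frame is (node, index of the next child to visit).
    stack = [(root, 0)]
    while stack:
        node, i = stack.pop()
        label, children = node
        if i == 0 and children:
            out.append("(")
        if i < len(children):
            if i > 0:
                out.append(",")
            # Come back to this node after the child has been written.
            stack.append((node, i + 1))
            stack.append((children[i], 0))
        else:
            if children:
                out.append(")")
            out.append(label)
    return "".join(out) + ";"


small = ("", [("A", []), ("", [("B", []), ("C", [])])])
print(to_newick(small))

# A caterpillar tree far deeper than the default recursion limit.
deep = ("t0", [])
for n in range(1, 5000):
    deep = ("", [deep, ("t%d" % n, [])])
deep_newick = to_newick(deep)
```

Because the pending work lives in a list rather than on the call stack, the depth of the tree no longer interacts with sys.setrecursionlimit.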

note edited to correct markdown

[User Error] Bug: Error using TreeList and contained coalescent tree

[Edit: added code blocks]
Bug Description
Hi @jeetsukumaran,

I'm trying to make a bunch of contained coalescent trees, and when I use a TreeList to hold my species trees I get a weird error (traceback below). If I use a regular list there's no problem. I don't think my use case is that strange, since it seems like we're meant to use TreeLists as an efficient list of trees when their taxon namespaces are all the same.

Sorry for butchering all your function names in the demo code.

Best,
Niko

Traceback

Traceback (most recent call last):
  File "debug.py", line 49, in <module>
    gene_trees = list(map(make_ctrees, sp_trees))
  File "debug.py", line 47, in <lambda>
    range(n_gene_trees)))
  File "debug.py", line 46, in <lambda>
    make_ctrees = lambda tree: list(map(lambda x: cc_tree(tree, si_map),
  File "/Users/nikoyasui/anaconda3/lib/python3.5/site-packages/dendropy/model/coalescent.py", line 524, in contained_coalescent_tree
    uncoal = coalesce_nodes(nodes=pop_node_genes[edge.head_node],
KeyError: <Node object at 0x1038e7198: 'None' (<Taxon 0x1038e7518 'C'>)>

Demo Code

import dendropy as dp
import string

taxa_map = dp.TaxonNamespaceMapping.create_contained_taxon_mapping
bd_tree = dp.simulate.birth_death_tree
cc_tree = dp.simulate.treesim.contained_coalescent_tree

c = 1
n_sp_trees = 2
n_gene_trees = 1000
n_sp = 5
n_ind = 2 # can be int or list

Ne = 10000
sp_depth = Ne*c

species = dp.TaxonNamespace(string.ascii_uppercase[:n_sp])

if "Creating a list is OK":
    sp_trees = list(map(lambda x: bd_tree(birth_rate=1.0,
                                          death_rate=0,
                                          taxon_namespace=species), 
                        range(n_sp_trees)))

if "Creating a TreeList breaks everything":
    sp_trees = dp.TreeList(map(lambda x: bd_tree(birth_rate=1.0,
                                                 death_rate=0,
                                                 taxon_namespace=species), 
                           range(n_sp_trees)))

elif "Creating a TreeList from a list also breaks everything":
    sp_trees = dp.TreeList(list(map(lambda x: bd_tree(birth_rate=1.0,
                                                      death_rate=0,
                                                      taxon_namespace=species), 
                                    range(n_sp_trees))))


for tree in sp_trees:
    for edge in tree.postorder_edge_iter():
        setattr(edge, 'pop_size', Ne)

si_map = taxa_map(containing_taxon_namespace=species,
                  num_contained=n_ind)

make_ctrees = lambda tree: list(map(lambda x: cc_tree(tree, si_map),
                                    range(n_gene_trees)))

gene_trees = list(map(make_ctrees, sp_trees))

System Info
OS Version: OSX 10.11.5 (15F34)
DendroPy version : DendroPy 4.1.0
DendroPy location : /Users/nikoyasui/anaconda3/lib/python3.5/site-packages/dendropy
Python version : 3.5.1 |Anaconda custom (x86_64)| (default, Dec 7 2015, 11:24:55) [GCC 4.2.1 (Apple Inc. build 5577)]
Python executable : /Users/nikoyasui/anaconda3/bin/python3
Python site packages : ['/Users/nikoyasui/anaconda3/lib/python3.5/site-packages']
