
Comments (13)

ms609 commented on July 23, 2024

Thanks for getting in touch. The present limit is 2048 leaves (which I'll document); I'm looking into a workaround but it's less straightforward than I'd hoped. I'll post an update once I get somewhere with this.

from treetools.

ms609 commented on July 23, 2024

To calculate distances between trees with <8192 tips, you can now:

  1. Uninstall TreeDist and TreeTools

    remove.packages("TreeDist")
    remove.packages("TreeTools")

  • Check the console output to be sure that the packages are fully uninstalled.

  2. Install a modified TreeTools

    devtools::install_github("ms609/TreeTools", ref = "more-leaves")

  3. Re-install TreeDist from source

    devtools::install_github("ms609/TreeDist", ref = "more-leaves")

  • Do not use install.packages("TreeDist"), which installs pre-compiled binaries that will not link to the customized TreeTools.

Note that distance computation scales with the square of the number of tips, so comparing two 8000-leaf trees will take a couple of minutes.

I've updated the documentation with this information. Please let me know how you get on; I had a bit of trouble getting this running locally, but hopefully the above instructions will avoid these problems.


noranekonobokkusu commented on July 23, 2024

Hi Martin,

Thanks a lot for such a rapid reply! It aborts my RStudio session the moment I run this command now 😅 But I guess that means I did re-install it successfully and this will work on a computational cluster!


ms609 commented on July 23, 2024

Drat – this is the issue I was running into as well.
My diagnosis was that the crash occurred when the modified TreeTools was reinstalled without uninstalling and re-installing TreeDist. Could you confirm that you uninstalled both packages before installing both from source, using install_github()?
I'll also be interested to hear whether it runs successfully on a cluster!


noranekonobokkusu commented on July 23, 2024

I can confirm I did all that. When I try running it from the command line, I get:

    > TreeDistance(t_large, t_large)
    Error: segfault from C stack overflow

Even for two trees with 10 leaves each!

On a cluster, it works with 8GB (which is less than on my laptop) for 8,000 leaves 🤔


ms609 commented on July 23, 2024

Weird – sorry it's not proving straightforward! I've reproduced this issue on a second PC.
My suspicion is that this is related to the (un)installation of the packages. I'll investigate.


ms609 commented on July 23, 2024

Okay, I think I've got to the bottom of the issue, which is that the stack overflow error should be taken literally: there is not enough space on the stack to create the two SplitList() objects of the required dimensions that are used to compute the distances.

In summary, this means that a significant re-coding will be required for larger trees to be handled – and that the computation for larger trees will be significantly slower (as it will need to make more use of the heap, rather than fast stack memory). That's a bigger job than I am able to attempt right now. Sorry.
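To illustrate what "making more use of the heap" could look like, here is a minimal sketch. `HeapSplitTable` is a hypothetical stand-in, not the real TreeTools `SplitList`, and the layout (one bit per tip packed into 64-bit words, at most `n_tips - 3` splits) is an assumption made only for this example. The point is that a heap-backed container can be sized from the tree actually passed in, at the cost of an allocation on every call.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical heap-backed split table; NOT the real TreeTools SplitList.
struct HeapSplitTable {
  std::size_t n_splits;              // assumed: at most n_tips - 3 splits
  std::size_t n_bins;                // assumed: one 64-bit word per 64 tips
  std::vector<std::uint64_t> state;  // n_splits * n_bins words, on the heap

  // Storage is sized from the tree actually supplied, not a fixed maximum.
  explicit HeapSplitTable(std::size_t n_tips)
      : n_splits(n_tips > 3 ? n_tips - 3 : 0),
        n_bins((n_tips + 63) / 64),
        state(n_splits * n_bins, 0) {}

  // Access the word for a given split and bin in the flattened storage.
  std::uint64_t& at(std::size_t split, std::size_t bin) {
    return state[split * n_bins + bin];
  }
};
```

With this sketch, `HeapSplitTable small(8);` reserves five words, while `HeapSplitTable big(8192);` reserves roughly 8 MB, and neither touches the stack beyond a few bytes of bookkeeping.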


More details for my own future reference:

  • Initially attempted using #define SL_MAX_BINS 128 in TreeTools::SplitList.h

  • This conclusion was reached by editing cpp_mutual_clustering():

    • Rewrite cpp_mutual_clustering to {return <empty list>; const SplitList a(x);: Succeeds
    • Add const SplitList b(y);: stack overflows
  • Test performed by running cpp_mutual_clustering(as.Splits(BalancedTree(8)), as.Splits(PectinateTree(8)), 8)

  • "more-leaves" branches of TreeTools and TreeDist rename packages to BigTreeTools / BigTreeDist

  • some int16s replaced by int32s to allow multiplication in array lookup
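One plausible reading of the last note, sketched under assumptions: if a lookup position is computed as `split * bins + bin` over a flattened array (a hypothetical layout, not confirmed by the source), the result stops fitting in a 16-bit integer very quickly once there are 128 bins. The helper names below are illustrative only.

```cpp
#include <cstdint>
#include <limits>

constexpr std::int32_t kBins = 128;  // SL_MAX_BINS from the note above

// Hypothetical flattened position of (split, bin), computed in 32 bits.
constexpr std::int32_t flat_index(std::int32_t split, std::int32_t bin) {
  return split * kBins + bin;
}

// True if the value could be stored in a 16-bit signed integer unchanged.
constexpr bool fits_int16(std::int32_t value) {
  return value >= std::numeric_limits<std::int16_t>::min() &&
         value <= std::numeric_limits<std::int16_t>::max();
}

static_assert(fits_int16(flat_index(255, 127)), "32767 still fits");
static_assert(!fits_int16(flat_index(256, 0)), "32768 does not");
static_assert(!fits_int16(flat_index(8188, 127)), "nor does 1048191");
```

Under this assumed layout, only the first 256 splits would be addressable with a 16-bit result, which would explain why wider integers are needed.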


noranekonobokkusu commented on July 23, 2024

What I still don't understand is why it stopped working locally even for two tiny trees with 10 leaves each but actually worked on a cluster for a huge tree.

Thanks a lot for looking into this anyhow!


ms609 commented on July 23, 2024

A fixed amount of memory is allocated as soon as the underlying C++ function is called; because this is allocated on the stack, the amount of memory to allocate is pre-determined and is independent of the variables actually passed. So whatever size of tree is passed, the software requests enough stack memory to compare two 8192-leaf trees.
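As a rough worked example of why the request does not depend on the trees passed, the sketch below computes the size of two fixed-capacity tables from the compile-time maximum alone. The layout is an assumption (64-bit words with one bit per tip, and up to `max_tips - 3` splits per table); the real `SplitList` may be laid out differently, so treat the numbers as illustrative.

```cpp
#include <cstddef>
#include <cstdint>

// Words of 64 bits needed to give each tip one bit.
constexpr std::size_t bins_for(std::size_t max_tips) {
  return (max_tips + 63) / 64;
}

// Bytes in one table sized for the compile-time maximum (assumed layout).
constexpr std::size_t table_bytes(std::size_t max_tips) {
  return (max_tips - 3) * bins_for(max_tips) * sizeof(std::uint64_t);
}

// Stack needed for the two tables in one comparison, whatever trees are passed.
constexpr std::size_t frame_bytes(std::size_t max_tips) {
  return 2 * table_bytes(max_tips);
}

static_assert(bins_for(8192) == 128, "128 bins cover 8192 tips");
static_assert(frame_bytes(2048) == 1047040, "about 1 MB at the 2048-leaf limit");
static_assert(frame_bytes(8192) == 16771072, "about 16 MB at the 8192-leaf limit");
```

Under these assumptions, raising the limit from 2048 to 8192 leaves increases the fixed request about sixteen-fold, and a 10-leaf comparison asks for exactly as much stack as an 8192-leaf one.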

Differences between a local PC and a cluster will reflect how much memory is available on the stack, which will reflect aspects of memory management that are context-dependent: for instance, I see a crash when using RStudio, but not when running a standalone R session, presumably because Windows allocates memory differently in these contexts.


pterzian commented on July 23, 2024

Hi Martin,

I've been trying to compare two trees with around 5k leaves (both have the same number of leaves) but I couldn't get past the error: This many leaves cannot be supported. Please contact the TreeTools maintainer if you need to use more!.
I first tried following the above process (uninstalling previous versions) as you suggest, but it still gives me the error on my local computer. Then I built a fresh R conda env on a remote server with more resources, but I still get the same error.
Any idea what could cause this issue?

Also, thanks a lot for your tools, they have been very useful so far!
Paul.


ms609 commented on July 23, 2024

Glad you have been finding the tools useful, @pterzian. It's not clear why you would be seeing the "This many leaves" error with ~5000 leaves if you are using the BigTreeTools and BigTreeDist packages; it may be worth checking that you are using the functions from these modified packages (which have different names, so need loading with e.g. library("BigTreeDist")) rather than those from TreeDist.


pterzian commented on July 23, 2024

You are absolutely right! I saw that the BigTreeTools package was installed. However, I don't see any BigTreeDist package. Should it be installed by the devtools::install_github("ms609/TreeDist") command?

Checking the conda logs:

  • This command successfully installed BigTreeTools: devtools::install_github("ms609/TreeTools", ref = "more-leaves")

    • building ‘BigTreeTools_1.10.0.tar.gz’

  • However, this command did not install BigTreeDist: devtools::install_github("ms609/TreeDist")

    • building ‘TreeDist_2.7.0.tar.gz’


ms609 commented on July 23, 2024

Looks like the ref = "more-leaves" argument is missing from your second command. (Note updated above.)

