Comments (13)
Thanks for getting in touch. The present limit is 2048 leaves (which I'll document); I'm looking into a workaround but it's less straightforward than I'd hoped. I'll post an update once I get somewhere with this.
from treetools.
To calculate distances between trees with <8192 tips, you can now:
- Uninstall TreeDist and TreeTools
remove.packages("TreeDist")
remove.packages("TreeTools")
- Check the console output to be sure that the packages are fully uninstalled.
- Install a modified TreeTools
devtools::install_github("ms609/TreeTools", ref = "more-leaves")
- Re-install TreeDist from source
devtools::install_github("ms609/TreeDist", ref = "more-leaves")
(not install.packages("TreeDist"), which installs pre-compiled binaries that will not link to the customized TreeTools).
Note that distance computation scales with the square of the number of tips, so comparing two 8000-leaf trees will take a couple of minutes.
I've updated the documentation with this information. Please let me know how you get on; I had a bit of trouble getting this running locally, but hopefully the above instructions will avoid these problems.
from treetools.
Hi Martin,
Thanks a lot for such a rapid reply! It aborts my RStudio session the moment I run this command now 😅 But I guess that means I did re-install it successfully and this will work on a computational cluster!
from treetools.
Drat – this is the issue I was running into as well.
My diagnosis was that the crash occurred when the modified TreeTools was reinstalled without uninstalling and re-installing TreeDist. Could you confirm that you uninstalled both packages before installing both from source, using install_github()?
I'll also be interested to hear whether it runs successfully on a cluster!
from treetools.
I can confirm I did all that. When I try running it from the command line, I get
> TreeDistance(t_large, t_large)
Error: segfault from C stack overflow
Even for two trees with 10 leaves each!
On a cluster, it works with 8GB (which is less than on my laptop) for 8,000 leaves 🤔
from treetools.
Weird – sorry it's not proving straightforward! I've reproduced this issue on a second PC.
My suspicion is that this is related to the (un)installation of the packages. I'll investigate.
from treetools.
Okay, I think I've got to the bottom of the issue – which is that the stack overflow error should be taken literally; there is not enough space in the stack to create two SplitList() objects of the required dimensions, used to compute the distances.
In summary, this means that a significant re-coding will be required for larger trees to be handled – and that the computation for larger trees will be significantly slower (as it will need to make more use of the heap, rather than fast stack memory). That's a bigger job than I am able to attempt right now. Sorry.
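As a rough illustration of the kind of re-coding described above, the sketch below keeps the split data in a std::vector, so the storage lives on the heap and is sized from the tree actually passed in. HeapSplitList and its members are hypothetical names invented for this example, and the 64-bit bin layout is an assumption; this is not the TreeTools SplitList class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a split list; NOT the TreeTools SplitList class.
// Each split is a bit-vector over the leaves, packed into 64-bit bins
// (the bin width is an assumption made for this sketch).
class HeapSplitList {
 public:
  HeapSplitList(std::size_t n_splits, std::size_t n_leaves)
      : n_bins_((n_leaves + 63) / 64),     // bins needed per split
        state_(n_splits * n_bins_, 0) {}   // std::vector keeps its elements on the heap

  // Mark `leaf` as a member of `split`.
  void set(std::size_t split, std::size_t leaf) {
    assert(split * n_bins_ + leaf / 64 < state_.size());
    state_[split * n_bins_ + leaf / 64] |= std::uint64_t{1} << (leaf % 64);
  }

  // Report whether `leaf` is a member of `split`.
  bool test(std::size_t split, std::size_t leaf) const {
    return (state_[split * n_bins_ + leaf / 64] >> (leaf % 64)) & 1u;
  }

  // Bytes of heap storage requested for this particular tree.
  std::size_t bytes() const { return state_.size() * sizeof(std::uint64_t); }

 private:
  std::size_t n_bins_;                 // declared first, so initialized first
  std::vector<std::uint64_t> state_;
};
```

With this layout, a 10-leaf tree with 7 splits requests 56 bytes, whereas a fixed-size member array would reserve the maximum capacity for every call. The trade-off mentioned above still applies: each object costs a heap allocation, and access goes through a pointer.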
More details for my own future reference:
- Initially attempted using #define SL_MAX_BINS 128 in TreeTools::SplitList.h
- This conclusion was reached by editing cpp_mutual_clustering():
  - Rewrite cpp_mutual_clustering to {return <empty list>; const SplitList a(x);: Succeeds
  - Add const SplitList b(y);: stack overflows
- Test performed by running cpp_mutual_clustering(as.Splits(BalancedTree(8)), as.Splits(PectinateTree(8)), 8)
- "more-leaves" branches of TreeTools and TreeDist rename packages to BigTreeTools / BigTreeDist
- some int16s replaced by int32s to allow multiplication in array lookup
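The last note, about int16s and array lookup, can be illustrated with a small sketch. The functions offset32 and offset16 are hypothetical and are not taken from TreeTools; they only show why a row-major offset stored in a 16-bit integer stops being usable once the table grows past 32767 entries.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flat-array lookup: row-major offset of element (row, col).
// The product is kept in 32 bits, so it holds offsets for the larger tables.
inline std::int32_t offset32(std::int16_t row, std::int16_t col,
                             std::int16_t n_cols) {
  return static_cast<std::int32_t>(row) * n_cols + col;
}

// The same lookup with the result narrowed to 16 bits, as a variable
// declared int16_t would store it. Once the true offset exceeds 32767 the
// stored value no longer matches it (the narrowing is modular in C++20 and
// implementation-defined in earlier standards).
inline std::int16_t offset16(std::int16_t row, std::int16_t col,
                             std::int16_t n_cols) {
  return static_cast<std::int16_t>(row * n_cols + col);
}
```

For example, with 128 columns the offset of row 8000, column 5 is 1024005, which offset32 returns intact; offset16 agrees with offset32 only while the offset stays within the 16-bit range.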
from treetools.
What I still don't understand is why it stopped working locally even for two tiny trees with 10 leaves each but actually worked on a cluster for a huge tree.
Thanks a lot for looking into this anyhow!
from treetools.
A fixed amount of memory is allocated as soon as the underlying C++ function is called; because this is allocated on the stack, the amount of memory to allocate is pre-determined and is independent of the variables actually passed. So whatever size of tree is passed, the software requests enough stack memory to compare two 8192-leaf trees.
Differences between a local PC and a cluster will reflect how much memory is available on the stack, which will reflect aspects of memory management that are context-dependent: for instance, I see a crash when using RStudio, but not when running a standalone R session, presumably because Windows allocates memory differently in these contexts.
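To make the fixed footprint concrete, here is a minimal sketch of the arithmetic. FixedSplitList is a hypothetical layout, not the real class, and the 64-bit bin width is an assumption (it is consistent with 128 bins covering 8192 leaves). Because the array bounds are compile-time constants, sizeof gives the same answer whatever tree is passed at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-capacity layout; the real SplitList may differ.
template <std::size_t MaxSplits, std::size_t MaxBins>
struct FixedSplitList {
  std::uint64_t state[MaxSplits][MaxBins];  // capacity reserved up front
  std::size_t n_splits;                     // rows actually in use
};

// Stack space a function would need to hold two such objects as locals.
// This uses sizeof only, so nothing is allocated by calling it.
template <std::size_t MaxLeaves>
constexpr std::size_t two_list_bytes() {
  return 2 * sizeof(FixedSplitList<MaxLeaves, MaxLeaves / 64>);
}

constexpr std::size_t kMiB = 1024 * 1024;

// Under these assumptions, two lists sized for 2048 leaves need about
// 1 MiB, and two lists sized for 8192 leaves need about 16 MiB.
static_assert(two_list_bytes<2048>() >= 1 * kMiB, "2048-leaf capacity");
static_assert(two_list_bytes<8192>() >= 16 * kMiB, "8192-leaf capacity");
```

Whether a request of that size succeeds then depends on the stack limit of the process, which is set by the operating system and the host application rather than by the amount of RAM installed.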
from treetools.
Hi Martin,
I've been trying to compare two trees with around 5k leaves (both have the same number of leaves) but I couldn't get past the error: This many leaves cannot be supported. Please contact the TreeTools maintainer if you need to use more!
I first tried following the above process (uninstalling previous versions) as you suggest, but it still gives me the error on my local computer. Then I built a fresh R conda env on a remote server with more resources, but I still get the same error.
Any idea what could cause this issue?
Also, thanks a lot for your tools, they have been very useful so far!
Paul.
from treetools.
Glad you have been finding the tools useful, @pterzian. Not clear why you would be seeing the "This many leaves" error with ~5000 leaves if you are using the BigTreeTools and BigTreeDist packages; maybe worth checking that you are using the functions from these modified packages (which have different names, so need loading with e.g. library("BigTreeDist")) rather than TreeDist?
from treetools.
You are absolutely right! I saw the BigTreeTools package was installed. However, I don't see any BigTreeDist package. Should it be installed by the devtools::install_github("ms609/TreeDist") command?
Checking conda logs:
- This command successfully installed BigTreeTools:
  devtools::install_github("ms609/TreeTools", ref = "more-leaves")
  building ‘BigTreeTools_1.10.0.tar.gz’
- However, this command did not install BigTreeDist:
  devtools::install_github("ms609/TreeDist")
  building ‘TreeDist_2.7.0.tar.gz’
from treetools.
Looks like the ref = "more-leaves" argument is missing from your second command. (Note updated above.)
from treetools.