
Comments (3)

rohansuri avatar rohansuri commented on May 31, 2024

Thanks for your question. I didn't investigate it before, but now have. Here's my understanding:

TLDR

The reason is subtly stated in another paper.

> ART outperforms the other indexes for all workloads and key types except Scan/Insert, where its iteration requires more memory access than the OpenBw-Tree.

Longer version

One way to think intuitively about why ART requires more memory accesses: user keys are stored only in the leaf nodes, whereas in RB trees every node stores a key. This means ART has additional supporting links/nodes to get to those leaf nodes, so any traversal that visits many successive user keys must pass through those supporting links/nodes.

I ran some experiments with a dataset of 100,000 random 64-bit integers. The RB tree has 2n links, i.e. 200,000, since it has a node for every user key and every node has links to its left and right children.
For ART it depends on the dataset. In this case the node and link counts are:

| Node | #Nodes | #Links | #Nodes per level |
|---|---|---|---|
| Node4 | 20785 | 83140 | L3=20193, L4=592 |
| Node16 | 6252 | 100032 | L3=6252 |
| Node48 | 0 | 0 | |
| Node256 | 129 | 33024 | L1=1, L2=128 |
| LeafNode | 100000 | | L3=4786, L4=94029, L5=1185 |

In total that is 216,196 links, so ART has about 16k more links than the RB tree. More links mean more load instructions and likely more cache misses. On the other hand, ART's height is 5 whereas the RB tree's height is 20, and point queries only require traversing the height, hence ART is faster for those.
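The totals above can be recomputed directly from the per-type child capacities (4, 16, 48, and 256 slots). A small sketch using the counts from the table:

```java
// Sketch: recomputing the link totals from the node counts reported above.
// Each inner-node type contributes (#nodes * child slots) links; the RB tree
// contributes two child links per node.
public class LinkTotals {
    public static void main(String[] args) {
        long n = 100_000;
        long rbLinks = 2 * n; // one left + one right link per node

        long artLinks = 20_785L * 4    // Node4:   4 child slots each
                      + 6_252L * 16    // Node16: 16 child slots each
                      + 0L * 48        // Node48: none in this dataset
                      + 129L * 256;    // Node256: 256 child slots each

        System.out.println(rbLinks);            // 200000
        System.out.println(artLinks);           // 216196
        System.out.println(artLinks - rbLinks); // 16196
    }
}
```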

A range scan is a ceilingEntry call plus a number of successor calls depending on the scan length. The ceilingEntry involves a depth traversal just as in get, and in some cases, on reaching the bottom, a successor call. A scan of length 0 requires only the ceilingEntry call, which is largely the depth traversal, so ART should be faster there because of its smaller depth. As the scan length increases, more of those supporting links need to be traversed. For example, if the present user key came from the 2nd child of a Node4, the next user key is simply the next child in the same Node4, and likewise the child after that. Beyond that we are required to visit the Node4's parent and go through the supporting links. This seems to be the case for this dataset, since most of the leaf nodes are at L4 and have Node4 parents at L3. That suggests a scan length of 4 should be the turning point, and from the plots it does look like it is.
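In `java.util.TreeMap` terms, the scan pattern described above is one `ceilingEntry` followed by repeated successor calls. A minimal sketch (the `scan` helper and `Long` key type are illustrative, not from the repo):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ScanSketch {
    // Collect up to 'length' keys, starting at the smallest key >= start.
    static List<Long> scan(NavigableMap<Long, String> map, long start, int length) {
        List<Long> out = new ArrayList<>();
        Map.Entry<Long, String> e = map.ceilingEntry(start); // depth traversal
        while (e != null && out.size() < length) {
            out.add(e.getKey());
            e = map.higherEntry(e.getKey()); // successor call
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> map = new TreeMap<>();
        for (long k = 0; k < 10; k++) map.put(k * 2, "v" + k); // even keys 0..18
        System.out.println(scan(map, 3, 4)); // [4, 6, 8, 10]
    }
}
```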

Plots of perf events:

- Average time
- L1-dcache-loads
- L1-dcache-load-misses
- LLC-loads
For a better understanding of the traversal, here's a raw dump of the moves taken to get to the next key for a single YCSB scan request of length 50, in both ART and TreeMap.

It'd be nice to somehow derive a theoretical worst-case bound for ART on the number of links visited, depending on n, span, scan length, and maybe a distribution factor. For the RB tree it is straightforward, as the tree structure depends only on n.

Hope that explains why ART is slower than the RB tree for range scans. Let me know if you have further questions.

from adaptive-radix-tree.

JohannesLichtenberger avatar JohannesLichtenberger commented on May 31, 2024

2n links for binary trees? Isn't it n + 1 (with null links)?

Hm, I think you're right. However, in binary trees you sometimes also have to go up even more links during range scans.

But I think that compared to a binary tree, a cache-oblivious B+ tree might be far better because of locality and CPU caches. That said, I don't know how a cache-oblivious implicit binary tree compares (with just one array and without direct pointers), built using the van Emde Boas layout.
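The pointer-free idea can be sketched with the simpler breadth-first (Eytzinger) layout rather than the van Emde Boas layout mentioned above: children of index i live at 2i+1 and 2i+2, so traversal is pure index arithmetic with no pointer chasing.

```java
// Sketch of a pointer-free binary search tree stored in one array, using the
// breadth-first (Eytzinger) layout: left child of i at 2i+1, right at 2i+2.
// (The van Emde Boas layout additionally recursively groups subtrees to be
// cache-oblivious; this sketch only shows the implicit, pointer-free part.)
public class ImplicitTree {
    static boolean contains(long[] tree, long key) {
        int i = 0;
        while (i < tree.length) {
            if (tree[i] == key) return true;
            i = key < tree[i] ? 2 * i + 1 : 2 * i + 2; // descend left : right
        }
        return false; // fell off the bottom of the implicit tree
    }

    public static void main(String[] args) {
        // A complete BST over {1..7}, stored breadth-first.
        long[] tree = {4, 2, 6, 1, 3, 5, 7};
        System.out.println(contains(tree, 5)); // true
        System.out.println(contains(tree, 8)); // false
    }
}
```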

BTW: Any plans to kind of "advance" your ART to a HOT?

https://dbis.uibk.ac.at/sites/default/files/2018-04/hot-height-optimized-author-version.pdf


JohannesLichtenberger avatar JohannesLichtenberger commented on May 31, 2024

On another note, do you have any recommendations regarding JVM implementations and hardware support (for instance SIMD instructions)? It seems in-memory performance is now all about using cache lines efficiently, but I don't know what's possible with JVM-based languages nowadays. I'm also not sure whether it makes sense to use Azul Zing or GraalVM (the latter makes sense to me, for instance because I use lambdas often, and small intermediate objects don't seem to be that costly).

