Coder Social home page Coder Social logo

Comments (20)

peterkogo avatar peterkogo commented on September 22, 2024 2

I made a shallow deep dive into acceleration structures for spatial queries so here are my 2 cents.

Due to the declarative nature of React Flow (pass in edges, get out fancy graphs), we pretty much have to expect nodes and edges to change 100% in between updates. Whatever algorithm we chose in the end, fully rebuilding the structure should be fast. To further extend on this point, we can substantially simplify the implementation if we manage to do this in a stateless fashion and without additional heuristics (e.g when do we just update vs fully rebuild).

I did some preliminary testing on implementing Quadtrees, a BVH and using the libraries rbush and flatbush.

Quadtrees

  • Pretty fast to construct, but you end up with a lot of inefficient trees for various situations
  • does not work so good if the full extent you'd like to subdivide is not known from the start (which might happen if we don't want to iterate over all nodes before adding)
  • managing a lot of empty space as well as managing node density, is both influenced by tree depth
  • deciding on what depth and how many nodes per leaf to choose can vary depending on type of flow graph

BVH

  • surprisingly complex and a lot of work involved in optimizing the partition algorithm
  • might take a little longer to construct
  • you end up with a pretty good tree very easily

rbush & flatbush

  • bulk insertion is really fast
  • almost optimal tree
  • it can deal with a lot of weird spatial distributions quite well
  • it's a tested library and just weighs around 3kb
  • flatbush is only really faster for tens of thousands of nodes (when fully rebuilding)

So thats that. I am very inclined to just use rbush - it should be noted however that its currently in a broken state in terms of build dependencies (already opened a pull request) & es module support (forking might be the easiest route, not sure if library will be updated...)

I will release a small benchmark repo for these things soon, however have to deal with js module issues 🥲 first...

If anyone has some libraries I am unaware of or some comments on my preliminary findings feel invited to join the conversation!

from xyflow.

peterkogo avatar peterkogo commented on September 22, 2024 1

Thanks so much for the RFC! I am just going to dump a couple of thoughts here.

  1. Performance metrics to check out for Spatial Querying implementation: initial build time, single update, bulk update, point query, rect query, memory

  2. I know BVH trees from 3D game engines, where you have a lot of overlapping bounding boxes and the goal is mostly ray intersections. It might be interesting to see how different spatial data structures behave in this regard as flow graphs are usually a little further spaced out. But maybe this becomes irrelevant when edges are also taken into consideration... Do you have a good resource/information/experience on how in our case BVH might be the way to go?

  3. In my head it would work like this:

  • maintain an AABB in internal node

  • build up a BVH tree in adoptUserNodes or updateNodeInternals

  • query the BVH when viewport moves to render visible nodes (make mounting/unmounting of nodes faster to prevent lag)

  • (maybe) only update the tree after node dragging has finished

  • expose interface for querying the BVH tree

    If I look at the performance of javascript BVH implementations I don't think an async function is really needed and it might degrade performance as async does not come for free and complicates computation ordering. Plus some of that work has to be done anyway when updating node dimensions/positions.

    Is there a really a use case for exposing anything more, even have a cache?

  1. I think if you'd want to lazy load nodes you'd have to load all nodes and edges (maybe not all edges) in advance with no information attached to them. And then implement a placeholder node that fetches the information when it gets into view/big enough. You could also create a way to batch the calls from these nodes. But I feel like this can be implemented implicitly and does not require more BVH API surface area.

  2. This would also pave the way for canvas based edges

from xyflow.

peterkogo avatar peterkogo commented on September 22, 2024 1

@ncthbrt I talked to @moklick about how to move forward on this.

You submitted this RFC at an undoubtably tumultuous time here at xyflow :) Just to give a bit of context, we have changed quite a lot about the inner workings of React Flow for the next release and are in the middle of a rewrite of Svelte Flow. There is one (hopefully) very last thing we would like to get into v12, which would influence (and simplify) this RFC immensely - namely how we handle node origins.

Just as a rough timeframe, I will create a PR with the node origin changes in the next 2 weeks, so we can move forward on this afterwards. I'll keep you posted!

from xyflow.

peterkogo avatar peterkogo commented on September 22, 2024 1

@fredericrous this RFC would support dynamic changes to node size & position. Should be no problem.

from xyflow.

moklick avatar moklick commented on September 22, 2024 1

Thanks for the deep dive and the detailed explanation @peterkogo!

We could fork rbush and publish it under @xyflow/rbush. If your PR mourner/rbush#138 gets merged, we can replace the dependency with the original rbush package again.

Do I get it right, that the rough process would be:

  • build a new tree when nodes come in (in adoptUserNodes for example)
  • use tree for culling (onlyRenderVisibleElements)
  • use tree for node selections

Questions:

  • do we want to export helpers to query it?

from xyflow.

peterkogo avatar peterkogo commented on September 22, 2024 1

That sounds about right!
We can expose functions for finding intersections for a specific node, finding intersections for an arbitrary rectangle and finding all colliding nodes.

from xyflow.

peterkogo avatar peterkogo commented on September 22, 2024 1

I have some updates from my investigation I'd like to share.

On accelerating spatial queries

First and foremost, RBush and Flatbush turned out to be delightful libraries. Overall, Flatbush performed slightly better than RBush in terms of memory usage & garbage collection delays — even though it required am additional mapping of node IDs to Flatbush indices. Both libraries needed to be extended by a function determining full intersections.

However, measuring the performance impact of building these trees (~1ms with 2500 nodes on M3 Apple Silicon) I realized that we started our optimization journey on the wrong end, really. With our current naive method, determining which of the 2500 nodes is in view takes roughly 0.25ms (I'm re-discovering every time how astonishingly fast computers can iterate over arrays), which is a fraction of the 80ms it takes to re-render all the elements.

The very same bottleneck applies to the lag observed with onlyRenderVisible enabled or while drawing selection boxes: the biggest culprit is mounting/unmounting/updating React components. What a bummer…

No need to fret

Well, what do we make of this?

For one, the virtualization part of this RFC would be much more impactful, than focusing on faster intersection algorithms. onlyRenderVisible for instance, provides quite a boost in performance when interacting with big flows with partially visible nodes & edges. The only downside is, that moving the viewport tends to be laggy as nodes & edges need to be constantly mounted and unmounted while coming into view. Having a way to render placeholder nodes (possibly canvas based) while moving the viewport and only rendering the actual nodes in their full glory when the interaction has stopped, would make it a far more recommendable option. One limitation of this approach, however, is that nodes and handle positions need predetermined dimensions and positions for this to work. As a small consolation, you would receive SSR for free.

Further, having the ability to render the minimap and edges on a canvas, would pose another possibility to bypass reacts lifecycle overhead. Tradeoffs of canvas-based approaches apply.

Conclusion

This was quite valuable research, thanks for providing the impulse. We will most definitely use Flatbush in the future for supporting pointer interactions on canvas elements, and we will also look into possibilities for delaying certain state updates during interactions. This helped us quite a lot in identifying more fruitful optimization paths.

from xyflow.

peterkogo avatar peterkogo commented on September 22, 2024 1

My message sounds unintentionally very conclusive. We will continue working on this.

from xyflow.

ncthbrt avatar ncthbrt commented on September 22, 2024 1

I agree with this. And mirrors my performance profiling

I have some updates from my investigation I'd like to share.

On accelerating spatial queries

First and foremost, RBush and Flatbush turned out to be delightful libraries. Overall, Flatbush performed slightly better than RBush in terms of memory usage & garbage collection delays — even though it required am additional mapping of node IDs to Flatbush indices. Both libraries needed to be extended by a function determining full intersections.

However, measuring the performance impact of building these trees (~1ms with 2500 nodes on M3 Apple Silicon) I realized that we started our optimization journey on the wrong end, really. With our current naive method, determining which of the 2500 nodes is in view takes roughly 0.25ms (I'm re-discovering every time how astonishingly fast computers can iterate over arrays), which is a fraction of the 80ms it takes to re-render all the elements.

The very same bottleneck applies to the lag observed with onlyRenderVisible enabled or while drawing selection boxes: the biggest culprit is mounting/unmounting/updating React components. What a bummer…

No need to fret

Well, what do we make of this?

For one, the virtualization part of this RFC would be much more impactful, than focusing on faster intersection algorithms. onlyRenderVisible for instance, provides quite a boost in performance when interacting with big flows with partially visible nodes & edges. The only downside is, that moving the viewport tends to be laggy as nodes & edges need to be constantly mounted and unmounted while coming into view. Having a way to render placeholder nodes (possibly canvas based) while moving the viewport and only rendering the actual nodes in their full glory when the interaction has stopped, would make it a far more recommendable option. One limitation of this approach, however, is that nodes and handle positions need predetermined dimensions and positions for this to work. As a small consolation, you would receive SSR for free.

Further, having the ability to render the minimap and edges on a canvas, would pose another possibility to bypass reacts lifecycle overhead. Tradeoffs of canvas-based approaches apply.

Conclusion

This was quite valuable research, thanks for providing the impulse. We will most definitely use Flatbush in the future for supporting pointer interactions on canvas elements, and we will also look into possibilities for delaying certain state updates during interactions. This helped us quite a lot in identifying more fruitful optimization paths.

from xyflow.

moklick avatar moklick commented on September 22, 2024

Hey @ncthbrt

thanks for this detailed RFC!

Even if React Flow wasn't built for huge graphs, I really like the idea of having a better support and a better performance in general. In v12 we already improved the performance for dragging nodes in bigger graphs, but the onlyRenderVisibleElements implementation wasn't touched.

It was always important to us, that React Flow is flexible and adjustable. Maybe it's an option here to expose some functionality, so that users can implement their own intersection algorithms (just yesterday someone created an issue #4272 that goes into that direction). For now we are using getNodesInside for the selection rectangle and onlyRenderVisibleElements. Would it be possible for you to use a BVH if you could overwrite that function somehow? It would be nice if we could find a better implementation than the current naive approach and give users the option to overwrite that functionality. What do you think about that? I would like to start with a small change. In my view it would make sense to concentrate on the virtualization topic first.

Thanks @peterkogo, what do you think about this? Would it be possible to expose some functions to be able to implement a more performant virtualization strategy?

from xyflow.

ncthbrt avatar ncthbrt commented on September 22, 2024

@moklick It would be possible to do that I think. I can work on a POC to that end?

Thanks for the comments @peterkogo

from xyflow.

moklick avatar moklick commented on September 22, 2024

That would be great, but let's wait for @peterkogo feedback here. I know he is also interested in the topic and maybe he already started something!

Would you like to make a POC for a better built-in virtualization or expose functionality or both?

To answer you open questions:

Ordering may be tricky with optimistic updates and cache handling and invalidation

Not needed for now when we focus on virtualization first

How to document and communicate these changes effectively to users

We should try not to introduce breaking changes, but only more options. We could create a page under /learn where we explain how to use a quadtree or something like that with the new API.

How to effectively support this across all packages

Let's focus on React first.

MiniMap` support?

I think it would make sense to implement an alternative canvas based minimap at some point but I would like to postpone this task.

Selection-rect support for edges. Implement parser for edge renderer?

In my view a naive approach via nodes is enough (if one connected node is visible, an edge is visible too)

from xyflow.

ncthbrt avatar ncthbrt commented on September 22, 2024

Happy to do that too!

from xyflow.

ncthbrt avatar ncthbrt commented on September 22, 2024

from xyflow.

fredericrous avatar fredericrous commented on September 22, 2024

This RFC initiative looks exciting. I have a use case that I wonder could complicate the algorithm:

How about nodes that grow?

For context, on my project I have a diagram that represents the sitemap of a website. I drag and drop elements onto the nodes, that make the nodes grow and I then trigger a relayout with d3-flextree and d3-hierarchy

from xyflow.

ncthbrt avatar ncthbrt commented on September 22, 2024

Nice work on the POC @peterkogo.

Still want to experiment with improving performance beyond culling nodes but not in a huge rush to do so right now, so this could be a great intermediate win.

from xyflow.

ncthbrt avatar ncthbrt commented on September 22, 2024

Would edges also be included in this?

from xyflow.

moklick avatar moklick commented on September 22, 2024

Currently we create a box of the connected nodes. We could do the same here and create a tree for the edges too. wdyt @peterkogo

from xyflow.

peterkogo avatar peterkogo commented on September 22, 2024

@ncthbrt what other ideas do you have?

Still want to experiment with improving performance beyond culling nodes [...]

Some ideas from me: add a inView prop to custom nodes & edges, so it's easier to lazy load things. Maybe have some kind of padding to load things that are slightly out of view as well.

Would edges also be included in this?

I would include edges sooner or later because implementing hover effects on canvas based edges will require this eventually. Though for culling only, having a fast way to determine if either the source or target node is in view would already come a long way. edit: this does not work. Just use a bounding box for edges as well.

Edit: edges will probably have a separate tree for various reasons

from xyflow.

ncthbrt avatar ncthbrt commented on September 22, 2024

The biggest bottleneck that I've measured for manipulating large, zoomed out graphs has been the store update logic (I'm using yrs, a Rust port of yjs) and updating a thousand or more nodes in the store within a transaction can be quite costly. My hypothesis is that a tiered/hierarchical approach to updates would be appropriate for slower stores. In this hierarchy, responsiveness would be maintained by optimistically updating properties in an internal store until an appropriate idle point is reached at which point the updates are written back. This would also reduce the amount of time spent on integrating changes as updates would be batched.

Some ideas from me: add a inView prop to custom nodes & edges, so it's easier to lazy load things. Maybe have some kind of padding to load things that are slightly out of view as well.

That is a good idea as well!

My idea is tangent to this issue however

from xyflow.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.