Coder Social home page Coder Social logo

Performance improvements about cdt HOT 21 CLOSED

artem-ogre avatar artem-ogre commented on June 19, 2024
Performance improvements

from cdt.

Comments (21)

artem-ogre avatar artem-ogre commented on June 19, 2024

More information can be found in: #37

from cdt.

starseeker avatar starseeker commented on June 19, 2024

It might be worth taking a look at https://github.com/nushoin/RTree to see if it could be of use for this purpose - I ended up using a slightly modded version of the https://github.com/DevHwan/RTree version for some work I did a couple years back, and it did OK (although I didn't hammer it with the sort of benchmarks you've got there, so I can't say definitely how it would compare.) I was using poly2tri for the CDT, but I did do quite a lot of data management and lookups using that RTree and it seemed to work fairly well.

from cdt.

artem-ogre avatar artem-ogre commented on June 19, 2024

Thanks for the suggestions @starseeker. I wander how cgal and Triangle do it. I doubt that some other rtree implementation will greatly outperform boost's...

from cdt.

artem-ogre avatar artem-ogre commented on June 19, 2024

I got a little bit of time to do the profiling. From the data it seems that boost::rtree performance is not a bottleneck.

Original interactive flame graph as ZIP.

Flame Graph

flamegraph

Benchmark Code

#include "CDT.h"

typedef double CoordType;
typedef CDT::Triangulation<CoordType> Triangulation;
typedef CDT::V2d<CoordType> V2d;
typedef CDT::Vertex<CoordType> Vertex;
typedef CDT::Triangle Triangle;
typedef CDT::Box2d<CoordType> Box2d;
typedef CDT::Edge Edge;

#include <chrono>
#include <iostream>

int main(int argc, char* argv[])
{
    const size_t N = 1000;
    std::vector<V2d> vertices;
    vertices.reserve(N * N);
    for(size_t i = 0; i < N; ++i)
    {
        for(size_t j = 0; j < N; ++j)
        {
            vertices.push_back(V2d::make(i * 1e-5, j * 1e-5));
        }
    }

    using std::chrono::duration;
    using std::chrono::duration_cast;
    using std::chrono::high_resolution_clock;
    using std::chrono::milliseconds;
    auto t1 = high_resolution_clock::now();

    Triangulation triangulation(CDT::FindingClosestPoint::BoostRTree);
    triangulation.insertVertices(vertices);

    auto t2 = high_resolution_clock::now();
    /* Getting number of milliseconds as an integer. */
    auto ms_int = duration_cast<milliseconds>(t2 - t1);
    /* Getting number of milliseconds as a double. */
    duration<double, std::milli> ms_double = t2 - t1;
    std::cout << ms_int.count() << "ms\n";

    return 0;
}

from cdt.

AndreFecteau avatar AndreFecteau commented on June 19, 2024

I should of been more descriptive, you are correct that is flip needed is a larger issue when using a "quad" grid. Try profiling using random numbers and you will see the one that I posted earlier. What I expect is happening (a total guess) in this case is that when you have 4 points on a circle the Delaunay triangulation always flip the triangulation when something happens to any of the neighbour triangles. Since this entire mesh is made of 4 point Delaunay edge cases then there is a lot of flipping on every input. This would not happen in a random mesh.

from cdt.

AndreFecteau avatar AndreFecteau commented on June 19, 2024

I have also been implementing a simple KD-Tree and linking to CDT to see if my expectations were correct. From first benchmarking it seems like my KD-Tree for now was more efficient by a large margin but I don't remember the exact numbers and not really tested so far. I might have time this weekend to continue working on it and will update you with the numbers when I have them.

from cdt.

artem-ogre avatar artem-ogre commented on June 19, 2024

@AndreFecteau
Ah, right! I mindlessly grabbed the first benchmark as it showed some really bad timings. With random coordinates rtree indeed consumes most of the time.

I will post the flame graph and full benchmark later for documentation.

Vertices placed on a regular grid is an interesting but exotic corner case. Random vertices better reflect the common usage of triangulation. So thank you very much for your help with it. I am looking forward for the results.

from cdt.

AndreFecteau avatar AndreFecteau commented on June 19, 2024

Attached you can find the comparison of the 2 implementations. The one with the KDTree (CDT) and the Boost:RTree (CDT2). Theres still a lot of work left before I can push the KDTree to a git repo so it can be added to this project, but it's definitely a good proof of concept for now. Note: this flame snippet is missing the erase super-triangle section of the code that is considerable.

Screen Shot 2021-08-08 at 5 53 05 PM

from cdt.

artem-ogre avatar artem-ogre commented on June 19, 2024

Here's the random benchmark data.

Flame Graph (insertVertices call only)

flamegraph_rand

Original interactive flame graph

flamegraph_rand.zip

Benchmark Code

#include "CDT.h"

typedef double CoordType;
typedef CDT::Triangulation<CoordType> Triangulation;
typedef CDT::V2d<CoordType> V2d;
typedef CDT::Vertex<CoordType> Vertex;
typedef CDT::Triangle Triangle;
typedef CDT::Box2d<CoordType> Box2d;
typedef CDT::Edge Edge;

#include <chrono>
#include <iostream>

int main(int argc, char* argv[])
{
    const size_t N = 500;
    std::vector<V2d> vertices;
    vertices.reserve(N * N);
    for(size_t i = 0; i < N; ++i)
    {
        for(size_t j = 0; j < N; ++j)
        {
            bool inserted = false;
            while(!inserted)
            {
                size_t x = (rand() % 10000);
                size_t y = (rand() % 10000);
                auto it = std::find_if(
                    vertices.begin(), vertices.end(), [&](const auto& xy) {
                        return x == xy.x && y == xy.y;
                    });
                if(it == vertices.end())
                {
                    vertices.push_back(V2d::make(x * 1e-5, y * 1e-5));
                    inserted = true;
                }
            }
        }
    }

    using std::chrono::duration;
    using std::chrono::duration_cast;
    using std::chrono::high_resolution_clock;
    using std::chrono::milliseconds;
    auto t1 = high_resolution_clock::now();

    Triangulation triangulation(CDT::FindingClosestPoint::BoostRTree);
    triangulation.insertVertices(vertices);

    auto t2 = high_resolution_clock::now();
    /* Getting number of milliseconds as an integer. */
    auto ms_int = duration_cast<milliseconds>(t2 - t1);
    /* Getting number of milliseconds as a double. */
    duration<double, std::milli> ms_double = t2 - t1;
    std::cout << ms_int.count() << "ms\n";

    return 0;
}

from cdt.

artem-ogre avatar artem-ogre commented on June 19, 2024

Interesting info: how triangle location is implemented in CGAL

from cdt.

AndreFecteau avatar AndreFecteau commented on June 19, 2024

From memory, the timings that I show in the CGAL benchmarks is using the the range inserter for vertices. I would be interesting to see if inserting points one at a time would amount to the same efficiency. I believe that there are much faster algorithms for Delaunay triangulation when you are able to sort the data points initially.

from cdt.

artem-ogre avatar artem-ogre commented on June 19, 2024

Note on shuffling vertices and performance

CDT does not perform vertex shuffling which other implementations might do by default. For example a benchmark with vertices on a grid gets at least 2x faster when vertices are shuffled:

std::random_device rd;
std::mt19937 g(rd());
std::shuffle(vertices.begin(), vertices.end(), g); // shuffle vertices

On some of my inputs (very long and narrow strip with very dense points - looks like a road) I observed up to 300x speed-ups 🚀 😱

Currently users need to shuffle vertices themselves if it improves performance for their inputs. I am considering including shuffling into the library and make it on by default with a possibility to opt-out. Still thinking about a good and flexible way to implement this.

from cdt.

AndreFecteau avatar AndreFecteau commented on June 19, 2024

I am almost done implementing the KDTree to incorporate in CDT (If you want to) and I have a few questions for you (@artem-ogre ). The most up to date branch is https://github.com/AndreFecteau/CDT/tree/OptimizationByUsingKDTree. It is definitely not complete but I want to know:

Are you ok with this being integrated in CDT?
How do you want the external library to be stored in the files? I currently set up an external folders for the files.
Do you want the hole project included or just the minimalistic files?
Are you ok if I remove the RTree since the KDTree is faster?

from cdt.

artem-ogre avatar artem-ogre commented on June 19, 2024

@AndreFecteau
First, sorry for somehow slow reply: my current personal circumstances are such that there is too little time I can dedicate to the project.

Neat, thanks for the nice effort! I had a quick look. I myself also came to the conclusion that point locator should be a customization point for Triangulation class (template parameter as you done or an interface).

I am working on randomized vertex insertion right now and have a PoC. The change I've done requires re-factoring 'random-nearest' point locator. So I think it is a good opportunity to implement custom point-locator support.
I intend to implement point locator as template parameter and re-write boost::rtree and nearest random locators as stand-alone classes (similar to your branch).

Then when we have an interface in place you could do minimal changes to make it work with your KDTree implementation. And we could take it forward from there!

Are you ok with this being integrated in CDT?

Yes, but we would need to do the code review and re-write. There are likely going to be changes to be done: conventions, naming, formatting, supporting pre-c++11 standard, etc. Given my limited availability, it will take time to re-factor and merge. It is good to have it in the fork for the time being so that it is immediately usable for those who need it (and not constrained by ancient c++ standard 😃 ). And let's try to move the fork upstream when we can.
By the way, have you investigated alternative structures such as quad-trees or grids? I would also like to explore the solution space more before replacing boost's rtree.

How do you want the external library to be stored in the files? I currently set up an external folders for the files.

I will re-implement current locators so we'll have a reference implementation. I like having it in separate files but they don't have to be 'external', could go right into 'include' folder. Something to think about still.

Do you want the hole project included or just the minimalistic files?

Not sure I understand this question.

Are you ok if I remove the RTree since the KDTree is faster?

Yes, totally it can be removed. It could also be left there just in case, but if other approach is clearly superior I don't see why to keep boost rtree.

from cdt.

AndreFecteau avatar AndreFecteau commented on June 19, 2024

my current personal circumstances are such that there is too little time I can dedicate to the project.

@artem-ogre There is no real rush on my end to get this in, obviously I am using this software and better efficiency is always nice but it is not crucial.

template parameter as you done or an interface

I agree that having the point locator as a template parameter will help in the future test out different libraries, I would not be surprised that there are faster ones out there for this specific purpose.

conventions, naming, formatting, supporting pre-c++11 standard, etc

Let me know what conventions need to be supported and I can change it. I hope I don't loose smart pointers but if I have to I can remove them.

have you investigated alternative structures such as quad-trees or grids

I have not and I think that it would be worthwhile, unfortunately at my current employment I have written multiple quad-trees and I'm trying to avoid having too similar work on personal projects.

from cdt.

artem-ogre avatar artem-ogre commented on June 19, 2024

@AndreFecteau

First, thank you for your kd-tree implementation! It was easy to use after I extracted locator into a template parameter. And the code was easy to follow. 👍

I took it as a starting point and ended up re-writing most of the code, but the ideas and algorithms are the same. Performance is also about the same.

Given this situation how would you like to be attributed? I will add a mention in contributors list in README. We can also make it so that you author kd-tree code (maybe by doing a PR) so that your user-name shows up in contributors list auto-generated by github.

Could you maybe take a look at the PR #45?
Here are some notable differences to your implementation:

  • KDTree class is now tightly integrated with CDT: this enables simpler code and better efficiency (e.g., re-using points buffer, see points below)
  • All functionality is in a single header: CDT/include/KDTree.h
  • nodes are stored in a single buffer and addressed with indices
  • recursive methods (insertion and nearest query) are changed to loop-based where possible (this looks neater in profiling and can be slightly more efficient)
  • requires less memory: nodes only store children indices (not a leaf) and vector of points (leaf)
  • point coordinates are not stored in kd-tree. Instead point-buffer from CDT::Triangulation is re-used. This allows to save quite some memory too
  • unique_ptr are not used anymore, kd-tree should easily support value-semantics (e.g., copy-able)
  • c++98 compatibility

from cdt.

artem-ogre avatar artem-ogre commented on June 19, 2024

The PR also contains a feature of randomizing points insertion. For some datasets it can provides huge performance gains.
Here's an example long-narrow-stripe.txt
With randomization speed up is around 300x. With randomization+kd-tree it is about 500x 😮

from cdt.

AndreFecteau avatar AndreFecteau commented on June 19, 2024

@artem-ogre I took a quick look at the new KDTree that you implemented and I can't claim credit for that. Not much of my implementation is left (Not to worry, I can definitely see the benefits of having this integrated). Just a mention in the readme would be great.

I'll probably take a deeper dive this weekend since I'm always curious on other ways of programming stuff. Few things that I'm curious about.

By using an std::vector to stored the leafs there is definitely some speedup to have by having contiguous memory and other algorithms that don't have to do a leaf pointer call every time. Did you see any improvement by switching to this method or was it kind of hidden in the rest of the optimizations?

Are you not afraid that by using a std::vector that for extremely large models, std::vector may have to reallocate more space a significant amount of times (which is a move at best, deep copy in c++03) and technically (as far as I can imagine and I am not sure of this) double the memory requirement while reallocating.

from cdt.

artem-ogre avatar artem-ogre commented on June 19, 2024

By using an std::vector to stored the leafs there is definitely some speedup to have by having contiguous memory and other algorithms that don't have to do a leaf pointer call every time. Did you see any improvement by switching to this method or was it kind of hidden in the rest of the optimizations?

This was my hope as well, but I could not see a noticeable speed improvement with the vector. I still like the simplicity and how easy it is to copy compared to pointers.

Are you not afraid that by using a std::vector that for extremely large models, std::vector may have to reallocate more space a significant amount of times (which is a move at best, deep copy in c++03) and technically (as far as I can imagine and I am not sure of this) double the memory requirement while reallocating.

Yes, depending on the vector implementation it could consume 1.5x-2x more memory than needed for storing it's elements. This is a trade-off that sounds scary in theory but is almost always not a problem in practice (in my experience).
If it really becomes a problem: switching to some kind of list container and using iterators instead of indices should be the same as using individual allocation for each node, I guess.

from cdt.

artem-ogre avatar artem-ogre commented on June 19, 2024

@AndreFecteau I'm trying to figure out how to add you as a reviewer to the PR. It looks like you should be the collaborator for this to work (I've sent an invite)

from cdt.

artem-ogre avatar artem-ogre commented on June 19, 2024

@AndreFecteau I closed this issue. Thank you for all the help! :)

from cdt.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.