Coder Social home page Coder Social logo

go-kd's Introduction

go-kd

Golang K-D tree implementation with duplicate coordinate support

See Wikipedia for more information.

Testing

go test github.com/downflux/go-kd/...
go test github.com/downflux/go-kd/internal/perf \
  -bench . \
  -benchmem \
  -timeout=60m \
  -args -performance_test_size=large

Example

package main

import (
	"fmt"

	"github.com/downflux/go-geometry/nd/hyperrectangle"
	"github.com/downflux/go-geometry/nd/vector"
	"github.com/downflux/go-kd/point"

	"github.com/downflux/go-kd/kd"
)

// P implements the point.P interface, which needs to provide a coordinate
// vector function P().
var _ point.P = &P{}

type P struct {
	p   vector.V
	tag string
}

func (p *P) P() vector.V     { return p.p }
func (p *P) Equal(q *P) bool { return vector.Within(p.P(), q.P()) && p.tag == q.tag }

func main() {
	data := []*P{
		&P{p: vector.V{1, 2}, tag: "A"},
		&P{p: vector.V{2, 100}, tag: "B"},
	}

	// Data is copy-constructed and may be read from outside the k-D tree.
	t := kd.New[*P](kd.O[*P]{
		Data: data,
		K:    2,
		N:    1,
	})

	fmt.Println("KNN search")
	for _, p := range kd.KNN(
		t,
		/* v = */ vector.V{0, 0},
		/* k = */ 2,
		func(p *P) bool { return true }) {
		fmt.Println(p)
	}

	// Remove deletes the first data point at the given input coordinate and
	// matches the input check function.
	p, ok := t.Remove(data[0].P(), data[0].Equal)
	fmt.Printf("removed %v (found = %v)\n", p, ok)

	// RangeSearch returns all points within the k-D bounds and matches the
	// input filter function.
	fmt.Println("range search")
	for _, p := range kd.RangeSearch(
		t,
		*hyperrectangle.New(
			/* min = */ vector.V{0, 0},
			/* max = */ vector.V{100, 100},
		),
		func(p *P) bool { return true },
	) {
		fmt.Println(p)
	}
}

Performance (@v1.0.0)

This k-D tree implementation was compared against a brute force method, as well as with the leading Golang k-D tree implementation (http://github.com/kyroy/kdtree). Overall, we have found that

  • tree construction is about 10x faster for large N.

    BenchmarkNew/kyroy/K=16/N=1000-8               758980 ns/op  146777 B/op
    BenchmarkNew/Real/K=16/N=1000/LeafSize=16-8    200749 ns/op   32637 B/op
    
    BenchmarkNew/kyroy/K=16/N=1000000-8                7407144200 ns/op  184813784 B/op
    BenchmarkNew/Real/K=16/N=1000000/LeafSize=256-8     588456300 ns/op   12462912 B/op
    
  • KNN is significantly faster; for small N, we have found our implementation is ~10x faster than the reference implementation and ~20x faster than brute force. For large N, we have found up to ~15x faster than brute force, and a staggering ~1500x speedup when compared to the reference implementation.

    BenchmarkKNN/BruteForce/K=16/N=1000-8                   1563019 ns/op  2220712 B/op
    BenchmarkKNN/kyroy/K=16/N=1000/KNN=0.05-8                791415 ns/op    21960 B/op
    BenchmarkKNN/Real/K=16/N=1000/LeafSize=16/KNN=0.05-8      69537 ns/op    12024 B/op
    
    BenchmarkKNN/BruteForce/K=16/N=1000000-8                       5030811400 ns/op  5347687464 B/op
    BenchmarkKNN/kyroy/K=16/N=1000000/KNN=0.05-8                 529703585200 ns/op    23755688 B/op
    BenchmarkKNN/Real/K=16/N=1000000/LeafSize=256/KNN=0.05-8        335845533 ns/op     6044016 B/op
    
  • RangeSearch is slower for small N -- we are approximately at parity for brute force, and ~10x slower than the reference implementation. However, at large N, we are ~300x faster than brute force, and ~100x faster than the reference implementation.

    BenchmarkRangeSearch/BruteForce/K=16/N=1000-8                        154712 ns/op   25208 B/op
    BenchmarkRangeSearch/kyroy/K=16/N=1000/Coverage=0.05-8                13373 ns/op     496 B/op
    BenchmarkRangeSearch/Real/K=16/N=1000/LeafSize=16/Coverage=0.05-8    193276 ns/op  101603 B/op
    
    BenchmarkRangeSearch/BruteForce/K=16/N=1000000-8                         173427000 ns/op  41678072 B/op
    BenchmarkRangeSearch/kyroy/K=16/N=1000000/Coverage=0.05-8                 56820240 ns/op       496 B/op
    BenchmarkRangeSearch/Real/K=16/N=1000000/LeafSize=256/Coverage=0.05-8       530937 ns/op    212134 B/op
    

Raw data on these results may be found here.

go-kd's People

Contributors

kevmo314 avatar minkezhang avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

go-kd's Issues

Implement faster RangeSearch

RangeSearch implements a recursive model for searching through a bounding box. In order to increase efficiency, RangeSearch runs the left and right recursion calls in parallel via a channel. This recursion becomes expensive as we near the bottom of the tree, and we should instead swap to using a serial execution model instead. However, we do not have a good heuristic as to how much data is left in the subtree to prune -- node.Data() []T only returns the data stored in each individual node and does not reflect child nodes. We will need to implement node.Depth() int or some other heuristic in order to make RangeSearch more efficient.

Overall, this makes RangeSearch of small N (< 10k) fairly slow as compared to the reference http://github.com/kyroy/kdtree implementation. For large N, RangeSearch is still overall faster, but the execution time can still be improved.

Implement (1+ε)-ANN

KNN is very slow, due to

  • the thrashing caused by the internal priority queue constantly attempting to maintain order for large number of points, and
  • the large number of nodes visited for large N

Running a KNN performance test in 10M randomly generated 10D points and searching for 50K points takes ~2s and looks to be irreducible, barring some minor memory optimization.

We should look into offering ANN as an alternative when performance is critical.

[Radial]Filter very slow

Per Reddit, running RadialSearch on 100k 3D points netted a much slower result than naive search.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.