Coder Social home page Coder Social logo

unixjunkie / bisec-tree Goto Github PK

View Code? Open in Web Editor NEW
25.0 3.0 0.0 631 KB

Bisector tree implementation in OCaml

License: BSD 3-Clause "New" or "Revised" License

Makefile 1.20% OCaml 97.53% Shell 1.27%
ocaml-library nearest-neighbor-search nearest-neighbours bisector-tree distance metric point-query point-cloud range-query static-spatial-indexing

bisec-tree's Introduction

bisec-tree

Bisector tree implementation in OCaml.

A bisector tree allows to do fast and exact nearest neighbor searches in any space provided that you have a metric (function) to measure the distance between any two points in that space.

Cf. this article for details: "A Data Structure and an Algorithm for the Nearest Point Problem"; Iraj Kalaranti and Gerard McDonald. ieeexplore.ieee.org/iel5/32/35936/01703102.pdf

Bunny

Figure: the Stanford bunny, consisting of 35947 3D points, guillotined by the first layer of a bisector tree.

bisec-tree's People

Contributors

unixjunkie avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

bisec-tree's Issues

add length

without creating a potentially huge tmp list

to_dot

output in graphviz format

add method

to "grow" an existing tree with more points

partition

like neighbors, but will also keep the non matching points

add an any_neighbors query

boolean answer: yes/no your query point has neighbors in the tree at given distance;
i.e. the range query, but we don't care about the list of neighbors.
Such queries should terminate faster than the range query.

corner case

P.dist vp1 vp2 = 0.0
then all remaining points must be put into the same bucket; this bucket might have a size > k

Pre_node is used too often

in one_band and two_bands.
This is a performance bug since the two VPs are random in that case.
We should use a bucket

simplify code in neighbors fun

the code can be simpler:

  • should we include all points below
  • else should we include nothing
  • else inspect vp + points below

incremental build

algorithm:

  • build tree for a smallish random partition
  • address in parallel all remaining molecules
  • emboss the small tree
  • Eventually, continue building the tree for each group of molecules that obtained the same address

buckets on disk

now this makes a lot of sense with the add method having been added (to grow an existing tree)

add a version parameterized by a functor with a distance bound

With the regulat functor: only exact distances are used.
If the user knows a cheap to compute upper bound, we could exploit that
one as often as possible.

Inspired by:

Swamidass, S. J., & Baldi, P. (2007). Bounds and algorithms for fast exact searches of chemical fingerprints in linear and sublinear time. Journal of chemical information and modeling, 47(2), 302-317.

store buckets on disk

make another implementation, where only the index will be in memory, actual bucket
content's is in a persistent hash table on disk.
maybe just change the bucket type as a start, then add another config option (storage = Volatile | Persistent).

Wording...

I'm a little puzzled by the wording in the README:

A bisector tree allows to do fast but exact nearest neighbor

isn't 'exact nearest neighbor' what we want, so it'd be better to say:

A bisector tree allows to do fast and exact nearest neighbor

or, and I've only read the abstract so it isn't clear but is it actual an inexact nearest neighbor?

on-line algorithm

  • randomize points
  • two_band with given sample size
  • address all remaining points in //
  • add points to the tree
  • recurse on remaining buckets

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.