Coder Social home page Coder Social logo

u1234x1234 / pynanoflann Goto Github PK

View Code? Open in Web Editor NEW
32.0 3.0 8.0 73 KB

Unofficial python wrapper to the nanoflann k-d tree

License: BSD 2-Clause "Simplified" License

Python 57.02% C++ 42.98%
python kd-tree kdtree nearest-neighbor-search nearest-neighbors nanoflann pybind11

pynanoflann's Introduction

Build Status codecov

pynanoflann

Unofficial python wrapper to the nanoflann library [1] with sklearn interface and additional multithreaded capabilities.

nanoflann implementation of k-d tree provides one of the best performance for many k-nearest neighbour problems [2].

It is a good choice for exact k-NN, radius searches in low dimensional spaces.

Install

pip install git+https://github.com/u1234x1234/[email protected]

Basic Usage

import numpy as np
import pynanoflann

index = np.random.uniform(0, 100, (100, 4))
queries = np.random.uniform(0, 100, (10, 4))

nn = pynanoflann.KDTree(n_neighbors=5, metric='L1', radius=100)
nn.fit(index)

# Get k-nearest neighbors
distances, indices = nn.kneighbors(queries)

# Radius search
distances, indices = nn.radius_neighbors(queries)

Save, load

If you need to save the model, there are two options:

  1. Save only the built index. In this case, data points are NOT stored in file. Very efficient, but inconvenient in some cases.
kdtree.fit(X)
kdtree.save_index('index.bin')

prebuilt_kdtree = pynanoflann.KDTree()
# Must use the same data on which the index was built.
prebuilt_kdtree.fit(X, 'index.bin')

Please refer to the detailed example

  1. Pickle the whole model with data points. Less efficient, but convenient.
kdtree.fit(X)
with open('kdtree.pkl', 'wb') as out_file:
    pickle.dump(kdtree, out_file)
with open('kdtree.pkl', 'rb') as in_file:
    unpickled_kdtree = pickle.load(in_file)

Please refer to the detailed example

Multicore parallelization

Implemented on C++ side to reduce python's multiprocessing overhead.

Performance

Generally it much faster than brute force or cython implementation of k-d tree in sklearn

To run benchmark:

python benchmark.py

Links

  1. Blanco, Jose Luis and Rai, Pranjal Kumar, 2014, nanoflann: a C++ header-only fork of FLANN, a library for Nearest Neighbor ({NN}) with KD-trees.
  2. Vermeulen, J.L., Hillebrand, A. and Geraerts, R., 2017. A comparative study of k‐nearest neighbour techniques in crowd simulation.
  3. OpenFLANN Performance comparison between PCL, nanoflann, picoflann

pynanoflann's People

Contributors

ccinc avatar u1234x1234 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

pynanoflann's Issues

Multicore processing of query points

Hey,

Another feature I've in mind is if it is possible to implement parallellized multicore processing for queries. For instance, in Scipy cKDTree (https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.cKDTree.query.html#scipy.spatial.cKDTree.query) you can define number of cores with n_jobs parameter.

What I mean is that we could set 'n_cores' parameter when executing the query:

import pynanoflann
import numpy as np

target = np.random.rand(10000, 3)
query = np.random.rand(2000, 3)

kd_tree = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)
kd_tree.fit(target)
d, nn_idx = kd_tree.kneighbors(query, n_cores=10)

This would create a single kd-tree for the data in 'target' and then split the data in 'query' for multiple cores to do the queries.

Python implementation of this will introduce quite much overhead when working with relatively small data sets. I assume that this would be much faster when implemented in c++ side.

Best regards
Juho

Serializing KDTrees

Hi @u1234x1234,

Thanks for your awesome wrapper. I would like to use it for radius search in 3D point clouds. In my implementation, I would like to be able to save a KDTree in a text file and then reload it afterwards.

I tried to use pickle, but nanoflann_ext.KDTree32 or nanoflann_ext.KDTree64 object are not supported.

Do you think there is a way to save these objects to files? Or enable pickle to serialize them?

Best,
Hugues

How to find the package 'nanoflann_ext' ?

Hello,

Thanks for this great tool.

I try to use your code, and I realize that there is no specific package named 'nanoflann_ext' in import nanoflann_ext.

Can I get some help with this? Thanks in advance

Parallelized batch processing

Hey,

I'm wondering whether it could be possible to implement multi-core implementation for processing multiple batches of data simultaneously.

My current approach is:

import pynanoflann
import numpy as np

n_batches = 10
target = np.random.rand(n_batches, 10000, 3)
query = np.random.rand(n_batches, 2000, 3)

for i in range(n_batches):
    pts_target = target[i]
    pts_query = query[i]
    
    kd_tree = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)
    kd_tree.fit(pts_target)
    d, nn_idx = kd_tree.kneighbors(pts_query)

Instead, I'd like to do something like that:

import pynanoflann
import numpy as np

n_batches = 10
target = np.random.rand(n_batches, 10000, 3)
query = np.random.rand(n_batches, 2000, 3)

kd_trees = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)
kd_trees.fit(pts_target)
distances, nn_indexes = kd_trees.kneighbors(pts_query)

This would create a kd-tree for each batch in 'target'. Corresponding batches in 'query' then would be used to make nearest neighbor searches with corresponding kd-trees.

Currently, if I want to implement this kind of parallelized processing I have do it in python. For large data sets this is not a problem but for smaller data sets it will cause too much overhead. I assume that it would be much faster to implement in c++ side of the code.

Best regards
Juho

Parallelized Radius Query

Hey! I really like the new multiprocessing capabilities of kneighbors, for one of my projects I need to use radius queries, would it be possible to get the same feature for radius_neighbors?

Your library is great, by the way, I have found it to be the fastest option for processing very large point clouds, so thank you!

What does the radius parameter do?

First thanks for this great tool.
I'm a little confused by the radius parameter:
nn = pynanoflann.KDTree(n_neighbors=5, metric='L1', radius=100)

It seems like I can set this to a tiny value and still get correct results when I query.

Memory leak

Hey,

Thanks for the very good library, it is quite fast (fastest python implementation I've found so far). Unfortunately it seems that there is a memory leak somewhere.

If I run the following code memory usage keeps slowly increasing:

import pynanoflann
import numpy as np

for i in range(100):
    print(i)
    
    nn_search = pynanoflann.KDTree(n_neighbors=1, metric='L2', leaf_size=20)    
    nn_search.fit(np.random.rand(10000000,3))
    _, nn_idx = nn_search.kneighbors(np.random.rand(1000,3))
    
    del nn_search, nn_idx

I'm using version 0.0.1 of pynanoflann. The problem limits the usability of the library since I've to create new kdtree in each loop iteration and in the end I'll run out of memory.

Best regards
Juho

Adding pynanoflann to pypi

Do you have any plans to add pynanoflann to pypi to make it easier to include in other projects? I've made some updates and written a GitHub workflow to build wheels of pynanoflann the I can merge if you are interested. Otherwise and can also upload my fork to pypi (possibly under a different name) if you're not interested.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.