
ryanrhymes / panns

Python Approximate Nearest Neighbor Search in very high dimensional spaces with optimised indexing.

Home Page: http://www.cl.cam.ac.uk/~lw525/

License: GNU General Public License v2.0

Python 100.00%

panns's Issues

Dump mmap file using parallel build.

Dave reported that an index cannot be loaded after a parallel build. My hunch is that the mmap object is being pickled instead of the in-memory object.

Error creating Pann index on Windows 8.1 64bit

ERROR:

Traceback (most recent call last):
  File "..\approximate_nearest_neighbors/panns_test/main.py", line 12, in <module>
    p.build(50)
  File "..\site-packages\panns\index.py", line 136, in build
    self.build_sequential(c)
  File "..\site-packages\panns\index.py", line 152, in build_sequential
    self.make_tree(tree.root, children)
  File "..\site-packages\panns\index.py", line 170, in make_tree
    parent.proj = numpy.random.randint(2**32-1)
  File "mtrand.pyx", line 1262, in mtrand.RandomState.randint (numpy\random\mtrand\mtrand.c:14458)
ValueError: high is out of bounds for int32

RESOLUTION:

Explicitly pass "dtype=numpy.uint32" to numpy.random.randint in:
class = "PannsIndex"
function = "make_tree"
original line = "parent.proj = numpy.random.randint(2**32-1)"
fixed line = "parent.proj = numpy.random.randint(2**32-1, dtype=numpy.uint32)"
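
A minimal sketch of why the original call fails and what the fix changes. It assumes a NumPy version that supports the dtype keyword of numpy.random.randint (1.11 or later); the helper name random_projection_seed is hypothetical and is not part of panns.

```python
import numpy

# The bound used in make_tree does not fit into a signed 32-bit integer,
# which is the default integer type behind the ValueError in the traceback.
HIGH = 2**32 - 1
assert HIGH > numpy.iinfo(numpy.int32).max

def random_projection_seed():
    """Hypothetical helper: draw a value with an explicit unsigned 32-bit dtype.

    The upper bound is exclusive, so results lie in [0, 2**32 - 2].
    """
    return numpy.random.randint(HIGH, dtype=numpy.uint32)

seed = random_projection_seed()
print(int(seed))
```

Passing the dtype explicitly makes the call independent of the platform's default integer width, which is why the same line can work on Linux or macOS and fail on Windows.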

Issue with example on numpy 1.9.1

Here is the snippet:

from panns import *

# create an index of Euclidean distance
p = PannsIndex('euclidean')

# generate a 1000 x 100 dataset
for i in xrange(1000):
   v = gaussian_vector(100)
   p.add_vector(v)

# build an index of 50 trees and save to a file
p.build(int(50))
#p.save('test.idx')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-b53e159ae9fd> in <module>()
    10 
    11 # build an index of 50 trees and save to a file
---> 12 p.build(int(50))
    13 #p.save('test.idx')

/Users/*****/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/panns/index.pyc in build(self, c)
   132         """
   133         num_prj = int(2 ** (numpy.log2(len(self.mtx) / self.K) + 1))
--> 134         self.prj = self.random_directions(num_prj)
   135         if self.parallel:
   136             self.mmap_core_data()

/Users/******/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/panns/index.pyc in random_directions(self, c)
   258         c: the number of principle components needed.
   259         """
--> 260         return [ gaussian_vector(self.dim, True, self.typ) for _ in xrange(c) ]
   261 
   262

/Users/******/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/panns/utils.pyc in gaussian_vector(size, normalize, dtype)
   112     normalize: the vector length is normalized to 1 if True.
   113     """
--> 114     v = numpy.random.normal(0,1,size)
   115     if normalize:
   116         v = v / linalg.norm(v)

/Users/****/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/numpy/random/mtrand.so in mtrand.RandomState.normal (numpy/random/mtrand/mtrand.c:12052)()

/Users/****/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/numpy/random/mtrand.so in mtrand.cont2_array_sc (numpy/random/mtrand/mtrand.c:2757)()

TypeError: an integer is required

I've tested this same example on numpy 1.8.0rc1, and it works perfectly. Perhaps this is a numpy bug in 1.9.1 where the casting is being done badly. The "self.dim" parameter somehow gets converted to something other than a C int.

If it seems that this is a numpy bug and not a panns bug, I can investigate that and do a PR on numpy instead, to resolve the issue.

Thanks!!
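
One way to narrow this down, sketched under assumptions: the snippet calls PannsIndex('euclidean'), while another report on this page uses PannsIndex(dimension=100, metric='euclidean'), so it is possible that the string ends up as the dimension and reaches numpy.random.normal as the size. The check_dimension helper below is hypothetical and not part of panns; it only shows how such a value could be rejected early with a clearer message.

```python
import operator

def check_dimension(dim):
    """Hypothetical guard: return dim as a plain int or fail with a clear error."""
    try:
        # operator.index accepts int-like objects (including NumPy integers)
        # and rejects strings and floats.
        value = operator.index(dim)
    except TypeError:
        raise TypeError("dimension must be an integer, got %r" % (dim,))
    if value <= 0:
        raise ValueError("dimension must be positive, got %d" % value)
    return value

print(check_dimension(100))

try:
    check_dimension('euclidean')
except TypeError as err:
    print(err)
```

If the dimension passes a check like this and the error still appears, the NumPy version difference becomes the more likely explanation.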

Error on import

Hi @ryanrhymes,

I just installed panns (using pip install). Importing it resulted in the following error:

Traceback (most recent call last):

  File "/opt/anaconda3/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3444, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "/var/folders/5j/b2d4gmb51n18l6yy_x9gky2r0000gn/T/ipykernel_10917/510385268.py", line 1, in <module>
    from panns import *

  File "/opt/anaconda3/lib/python3.9/site-packages/panns/__init__.py", line 12, in <module>
    from .index import PannsIndex

  File "/opt/anaconda3/lib/python3.9/site-packages/panns/index.py", line 24
    except Exception, err:
                    ^
SyntaxError: invalid syntax
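
The line flagged in the traceback, "except Exception, err:", is Python 2 syntax, and the Python 3.9 interpreter shown in the paths rejects it. The spelling "except Exception as err:" is accepted by Python 2.6+ and Python 3. A minimal sketch of the portable form follows; the safe_import helper is hypothetical and only illustrates the syntax, since the page does not show what panns does inside that try block.

```python
def safe_import(name):
    """Hypothetical helper showing the 'except ... as ...' spelling."""
    try:
        return __import__(name)
    except Exception as err:  # valid on Python 2.6+ and Python 3
        print("import failed: %s" % err)
        return None

print(safe_import("json") is not None)
print(safe_import("no_such_module_for_this_sketch"))
```

The examples on this page also use xrange, which exists only in Python 2, so changing this one line may not be enough to run the package on Python 3.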

Problem running the sample example given

EDIT 2:

My bad! I ran the example like this:
p = PannsIndex('euclidean')

This was from your github.io page, so obviously it failed.

Please update the example on your github.io page so that it initializes the PannsIndex object properly.

Please close this issue now.

Inconsistent licensing?

Hi:
setup.py has a license of "GNU LGPL v2.1" as well as most of the code, but the LICENSE file is GPL?
Which one should I consider?
Thanks!

Error

After pip install panns, I tried to run this code:

from panns import *

# create an index of Euclidean distance
p = PannsIndex(dimension=100, metric='euclidean')

# generate a 1000 x 100 dataset
for i in xrange(1000):
    v = gaussian_vector(100)
    p.add_vector(v)

# build an index of 50 trees and save to a file
p.build(50)
p.save('test.idx')

After running this code there is one .idx file and one .npy file, which is fine. But when I run the code again, all of these files are deleted, and the directory is deleted as well. What is the problem?
