matsui528 / nanopq


Pure python implementation of product quantization for nearest neighbor search

License: MIT License

Makefile 1.06% Python 98.94%
approximate-nearest-neighbor-search data-compression nearest-neighbor-search product-quantization

nanopq's People

Contributors

calvinmccarter, de9uch1, hiroshiba, lsb, matsui528, mpskex

nanopq's Issues

Typo

I think this should be 8 bits, not 256. Otherwise the package is very helpful, thanks!

into 256 bits = 1 byte = uint8)

How to compute distance between PQ codes?

Not sure if this should be a feature request.

Suppose I just want to approximate the distance between two PQ codes (produced by the same encoder, of course). What is the most efficient way to perform such an operation?
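One common way to do this is symmetric distance computation: precompute the squared distances between every pair of codewords in each subspace, then sum one table entry per subspace. The sketch below is not taken from nanopq; it assumes the trained codebook is available as a NumPy array of shape (M, Ks, Ds), and the names `build_sdc_tables` and `sdc_distance` are hypothetical. The codebook here is random, for illustration only.

```python
import numpy as np


def build_sdc_tables(codewords):
    """Precompute squared L2 distances between all codeword pairs, per subspace.

    codewords: array of shape (M, Ks, Ds) (assumed codebook layout).
    Returns an array of shape (M, Ks, Ks).
    """
    diff = codewords[:, :, None, :] - codewords[:, None, :, :]
    return np.sum(diff ** 2, axis=-1)


def sdc_distance(tables, code_a, code_b):
    """Approximate squared L2 distance between two PQ codes, each of shape (M,)."""
    m = np.arange(tables.shape[0])
    return float(tables[m, code_a, code_b].sum())


# Toy codebook standing in for a trained one (random values, illustration only).
rng = np.random.default_rng(0)
M, Ks, Ds = 4, 16, 6
codewords = rng.random((M, Ks, Ds)).astype(np.float32)
tables = build_sdc_tables(codewords)

code_a = rng.integers(0, Ks, size=M)
code_b = rng.integers(0, Ks, size=M)
d2 = sdc_distance(tables, code_a, code_b)
```

The tables cost M * Ks * Ks floats once; after that each pairwise distance is M lookups and a sum. The result equals the squared distance between the two decoded vectors, so it carries the quantization error of both codes.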

Why does parametric init perform worse than the non-parametric one?

Hi, I have two questions.

  1. Why does parametric init perform worse than the non-parametric one, according to your unit test `test_parametric_init`? This is inconsistent with the conclusion of the paper "Optimized Product Quantization for Approximate Nearest Neighbor Search" by Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun.

```python
def test_parametric_init(self):
    N, D, M, Ks = 100, 12, 4, 10
    X = np.random.random((N, D)).astype(np.float32)
    opq = nanopq.OPQ(M=M, Ks=Ks)
    opq.fit(X, parametric_init=False, rotation_iter=1)
    err_init = np.linalg.norm(opq.rotate(X) - opq.decode(opq.encode(X)))

    opq = nanopq.OPQ(M=M, Ks=Ks)
    opq.fit(X, parametric_init=True, rotation_iter=1)
    err = np.linalg.norm(opq.rotate(X) - opq.decode(opq.encode(X)))

    self.assertLess(err_init, err)
```
  2. The code that computes the norm does not need to rotate X: decode already rotates the codes back to the original space, at line 255 of opq.py:

self.pq.decode(codes) @ self.R.T
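To make the second point concrete, here is a small NumPy check, independent of nanopq. It assumes that `rotate` computes `X @ R` with an orthogonal `R` (inferred from the decode line quoted above, not verified against the library); `Y_rot` is a made-up stand-in for a reconstruction in the rotated space.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 12
X = rng.random((N, D))

# Random orthogonal matrix standing in for a learned rotation R.
R, _ = np.linalg.qr(rng.standard_normal((D, D)))

# Stand-in for a reconstruction in the rotated space (rotated data plus noise).
Y_rot = X @ R + 0.01 * rng.standard_normal((N, D))

# Both sides in the rotated space.
err_rotated_space = np.linalg.norm(X @ R - Y_rot)
# Both sides in the original space (reconstruction rotated back).
err_original_space = np.linalg.norm(X - Y_rot @ R.T)
# Mixed spaces: rotated data against a reconstruction that was rotated back.
err_mixed = np.linalg.norm(X @ R - Y_rot @ R.T)
```

Because an orthogonal rotation preserves the Frobenius norm, the first two errors agree, while the mixed comparison measures mostly the rotation itself rather than the quantization error.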

Turn print statements into logging

Hi!

Thanks for writing this package, it looks great!

I'd be interested in turning the print statements (emitted when verbose=True) into logging statements. The verbose flag could then control whether this logging is output to stdout (i.e., by setting the log level). Is this something you are interested in? If so, I could submit a PR.
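A minimal sketch of what this proposal could look like, using only the standard `logging` module. The logger name `nanopq_sketch`, the helper `set_verbose`, and the function `train_subspaces` are hypothetical and do not reflect nanopq's actual code.

```python
import logging
import sys

# Hypothetical module-level logger; a real package would typically use __name__.
logger = logging.getLogger("nanopq_sketch")


def set_verbose(verbose):
    """Map a boolean verbose flag onto a log level, logging to stdout."""
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO if verbose else logging.WARNING)


def train_subspaces(M, verbose=False):
    """Placeholder training loop that reports progress through the logger."""
    set_verbose(verbose)
    for m in range(M):
        # Previously this would have been: if verbose: print(...)
        logger.info("Training the subspace: %d / %d", m, M)
    return M


train_subspaces(M=4, verbose=True)   # progress lines go to stdout
train_subspaces(M=4, verbose=False)  # silent
```

Keeping the flag preserves the existing call signature, while users who configure logging themselves can attach their own handlers to the same logger.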

Centroid of Centroids using NanoPQ

I am looking into computing centroids of centroids using NanoPQ. Is this possible? I have a first-level nanopq model with M=4, K=16, D=24. The codewords it produces have shape (4, 16, 6). Can this output be fed as input to a second-level NanoPQ model to calculate centroids of centroids? The reason for investigating centroids of centroids is to reduce processing time on large datasets.
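As a shape-level illustration only (not an answer from the maintainers), the sketch below shows two ways the (4, 16, 6) codeword array could be arranged as training input for a second quantizer. The array is filled with random placeholder values, and the variable names are hypothetical.

```python
import numpy as np

M, Ks, D = 4, 16, 24
Ds = D // M  # 6 dimensions per subspace

# Placeholder for first-level codewords of shape (M, Ks, Ds) = (4, 16, 6).
rng = np.random.default_rng(0)
codewords = rng.random((M, Ks, Ds)).astype(np.float32)

# Option A: pool all codewords into one set of 64 six-dimensional vectors.
pooled = codewords.reshape(M * Ks, Ds)

# Option B: keep each subspace separate, giving four sets of 16 vectors.
per_subspace = [codewords[m] for m in range(M)]
```

Either arrangement is a valid (N, D) array, but N is small: a second-level quantizer trained on it cannot use more centroids than there are input vectors (64 pooled, or 16 per subspace), and its own number of subspaces must divide 6.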

About the reconstruction error

Thanks for your work.

```python
import nanopq
import numpy as np

N, Nt, D = 10000, 2000, 128
X = np.random.random((N, D)).astype(np.float32)  # 10,000 128-dim vectors to be indexed
Xt = np.random.random((Nt, D)).astype(np.float32)  # 2,000 128-dim vectors for training
query = np.random.random((D,)).astype(np.float32)  # a 128-dim query vector

pq = nanopq.PQ(M=8, Ks=256)
pq.fit(Xt, seed=123)
X_code = pq.encode(X)  # (10000, 8) with dtype=np.uint8
X_reconstructed = pq.decode(codes=X_code)

tmp = X[0]
tmp1 = X_reconstructed[0]
dis = np.sqrt(np.sum(np.square(tmp - tmp1)))
```

The value of dis is a little over 2.0. Does that look right?
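One way to put that number in context is to compare it with a trivial baseline: reconstructing every vector as the dataset mean. Each coordinate drawn uniformly from [0, 1) has variance 1/12, so for 128 dimensions the baseline error is close to sqrt(128 / 12), roughly 3.27. The sketch below computes this with NumPy only; it does not run nanopq and says nothing precise about the value the library should produce.

```python
import numpy as np

rng = np.random.default_rng(123)
N, D = 10000, 128
X = rng.random((N, D)).astype(np.float32)  # same distribution as in the question

# Baseline "reconstruction": replace every vector by the dataset mean.
mean = X.mean(axis=0)
per_vector_error = np.linalg.norm(X - mean, axis=1)
baseline = float(per_vector_error.mean())

# Theoretical value for uniform [0, 1) coordinates.
expected = float(np.sqrt(D / 12.0))
```

A reconstruction error a little over 2.0 sits below this baseline, which is at least consistent with the quantizer capturing part of the structure of data that, being uniform random, is hard to compress.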

Add `shubham0204/pq.rs`, a Rust implementation of `pq.py`, as a community resource in `README.md`

I wanted to learn how product quantization works, and this repository provided excellent code to understand how it works. As I had been learning Rust for a few months now, I decided to re-write the pq.py script in Rust to understand each step thoroughly by self-implementation. Here's the repository containing the Rust code: shubham0204/pq.rs.

The following steps still have to be taken to complete the project:

  1. Complete README.md and add a small usage sample of the Rust API
  2. Prepare a crate and upload it to crates.io

Do let me know if the repository can be included as a community resource. Like me, many other learners would like to study implementations of product quantization in languages other than Python, so a section listing implementations in other languages would be of great help. Moreover, I'm also working on a detailed blog post that will explain product quantization from first principles, with a Rust implementation.
