intel / scalablevectorsearch

Home Page: https://intel.github.io/ScalableVectorSearch/

License: GNU Affero General Public License v3.0

CMake 1.47% Python 4.59% C++ 93.80% C 0.09% Shell 0.03% Dockerfile 0.01%

Scalable Vector Search

Scalable Vector Search (SVS) is a performance library for vector similarity search. Thanks to its use of Locally-adaptive Vector Quantization [ABHT23] and its highly optimized indexing and search algorithms, SVS provides vector similarity search:

  • on billions of high-dimensional vectors,
  • at high accuracy
  • and state-of-the-art speed,
  • while enabling the use of less memory than its alternatives.
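To make the memory savings concrete, here is a simplified per-vector scalar quantizer in the spirit of Locally-adaptive Vector Quantization. It is a sketch under stated assumptions: vectors are first centered by the dataset mean, and each vector is then quantized to B bits using its own minimum and maximum as bounds. The function names (`dataset_mean`, `lvq_encode`, `lvq_decode`) are hypothetical; this is neither the SVS implementation nor its API.

```python
"""Simplified per-vector scalar quantization in the spirit of LVQ.

Assumption: mean-centering followed by B-bit quantization with per-vector
min/max bounds. Illustrative only; not the SVS implementation or API.
"""
from typing import List, Tuple

Vector = List[float]


def dataset_mean(data: List[Vector]) -> Vector:
    """Component-wise mean of all vectors in the dataset."""
    n = len(data)
    return [sum(col) / n for col in zip(*data)]


def lvq_encode(x: Vector, mean: Vector, bits: int) -> Tuple[List[int], float, float]:
    """Encode x as B-bit integer codes plus its per-vector (lower, delta) bounds."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    lower, upper = min(centered), max(centered)
    levels = (1 << bits) - 1  # highest code value, e.g. 255 for 8 bits
    # Step size between adjacent quantization levels (guard constant vectors).
    delta = (upper - lower) / levels if upper > lower else 1.0
    # Round each centered component to the nearest level (values are >= 0).
    codes = [int((c - lower) / delta + 0.5) for c in centered]
    return codes, lower, delta


def lvq_decode(codes: List[int], lower: float, delta: float, mean: Vector) -> Vector:
    """Reconstruct an approximation of the original vector."""
    return [lower + delta * q + mi for q, mi in zip(codes, mean)]


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0, 4.0], [2.0, 0.0, 1.0, 5.0]]
    mean = dataset_mean(data)
    codes, lower, delta = lvq_encode(data[0], mean, bits=8)
    print(codes, lvq_decode(codes, lower, delta, mean))
```

Each reconstructed component is within `delta / 2` of the original. For a 96-dimensional float32 vector (384 bytes), 8-bit codes plus two float32 bounds take 96 + 8 = 104 bytes, roughly 3.7x less memory.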

This lets application and framework developers who rely on similarity search get the most out of Intel® Xeon® CPUs (2nd generation and newer).

SVS offers a fully featured yet simple Python API that is compatible with most standard libraries. SVS is written in C++ to facilitate its integration into performance-critical applications.

Performance

SVS provides state-of-the-art performance and accuracy [ABHT23] for billion-scale similarity search on standard benchmarks.

For example, on the standard billion-scale Deep-1B dataset, different configurations of SVS yield significantly higher throughput (measured in queries per second, QPS) with a smaller memory footprint (horizontal axis) than the alternatives¹.

SVS is primarily optimized for large-scale similarity search, but it still offers state-of-the-art performance at million scale.
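Comparisons like the one above plot throughput against accuracy. As a rough illustration of how those two numbers are typically computed, the sketch below assumes that accuracy is reported as recall@k (the fraction of the true k nearest neighbors that a search returns, averaged over queries) and that QPS is the number of queries divided by wall-clock time. The helper names are hypothetical and this is not the benchmarking code used for the published results.

```python
"""Sketch of the two metrics behind QPS-vs-accuracy comparisons.

Assumptions: recall@k averaged over queries; QPS measured as queries divided
by wall-clock seconds for a sequential loop. Helper names are hypothetical.
"""
import time
from typing import Callable, List, Sequence


def recall_at_k(results: Sequence[Sequence[int]],
                ground_truth: Sequence[Sequence[int]], k: int) -> float:
    """Average fraction of the true k nearest neighbors found per query."""
    total = 0.0
    for found, truth in zip(results, ground_truth):
        total += len(set(found[:k]) & set(truth[:k])) / k
    return total / len(results)


def measure_qps(search: Callable[[List[float]], List[int]],
                queries: List[List[float]]) -> float:
    """Run every query through `search` and return queries per second."""
    start = time.perf_counter()
    for q in queries:
        search(q)
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    print(recall_at_k([[1, 2, 3], [4, 5, 6]], [[1, 2, 9], [4, 5, 6]], k=3))
```

Real benchmarks usually batch queries and use multiple threads, so a sequential loop like this one only gives a lower-bound style estimate.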

The best performance is obtained on 4th generation Intel® Xeon® processors (Sapphire Rapids) by making use of Intel® AVX-512 instructions, with excellent results also on 2nd and 3rd generation Intel® Xeon® processors (Cascade Lake and Ice Lake).

Performance will be degraded if Intel® AVX-512 instructions are not available. In that case, a warning message appears when the SVS Python module is loaded.
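Before installing, it can help to check whether a machine supports AVX-512 at all. The sketch below assumes a Linux system, where CPU feature flags are listed in `/proc/cpuinfo` and the AVX-512 Foundation subset appears as the flag `avx512f`. This is not how SVS itself performs the check; the helper names are hypothetical and other operating systems need a different approach.

```python
"""Sketch: check for AVX-512 Foundation support on Linux via /proc/cpuinfo.

Assumption: Linux lists CPU feature flags on 'flags' lines of /proc/cpuinfo,
with AVX-512 Foundation shown as 'avx512f'. Not the check SVS performs.
"""
from pathlib import Path
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def has_avx512(cpuinfo_text: str) -> bool:
    """True if the AVX-512 Foundation flag is present."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("AVX-512F available:", has_avx512(path.read_text()))
    else:
        print("/proc/cpuinfo not found (not a Linux system?)")
```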

Key Features

SVS supports:

  • Similarity functions: Euclidean distance, inner product, cosine similarity.
  • Vectors with individual values encoded as: float32, float16, uint8, int8.
  • Vector compression (including Locally-adaptive Vector Quantization [ABHT23])
  • Optimizations for Intel® Xeon® processors:
    • 2nd generation (Cascade Lake)
    • 3rd generation (Ice Lake)
    • 4th generation (Sapphire Rapids)
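For reference, the three similarity functions listed above can be written out directly, together with an exact brute-force k-nearest-neighbor scan that is useful as ground truth when checking approximate results. This is a minimal sketch with hypothetical function names, not the SVS API; it scans every vector and so is far slower than the graph-based index and compressed representations SVS is built around. Note that Euclidean distance ranks smaller values as more similar, while inner product and cosine similarity rank larger values as more similar.

```python
"""Reference similarity functions and an exact (brute-force) k-NN search.

Function names are hypothetical; this is a correctness baseline only.
"""
import heapq
import math
from typing import List, Sequence

Vec = Sequence[float]


def euclidean(a: Vec, b: Vec) -> float:
    """Euclidean (L2) distance: smaller means more similar."""
    return math.dist(a, b)


def inner_product(a: Vec, b: Vec) -> float:
    """Inner product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity: inner product of the normalized vectors."""
    na, nb = math.sqrt(inner_product(a, a)), math.sqrt(inner_product(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # undefined for zero vectors; treat as dissimilar
    return inner_product(a, b) / (na * nb)


def knn(query: Vec, data: Sequence[Vec], k: int, metric: str = "euclidean") -> List[int]:
    """Return the indices of the k most similar vectors under `metric`."""
    if metric == "euclidean":
        return heapq.nsmallest(k, range(len(data)), key=lambda i: euclidean(query, data[i]))
    if metric == "inner_product":
        score = inner_product
    elif metric == "cosine":
        score = cosine
    else:
        raise ValueError(f"unknown metric: {metric}")
    return heapq.nlargest(k, range(len(data)), key=lambda i: score(query, data[i]))


if __name__ == "__main__":
    data = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [-1.0, 0.0]]
    for m in ("euclidean", "inner_product", "cosine"):
        print(m, knn([1.0, 0.1], data, k=2, metric=m))
```

With the sample data, the three metrics return different neighbors for the same query, which is why the choice of similarity function matters when building an index.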

See Roadmap for upcoming features.

Documentation

The SVS documentation includes getting-started tutorials with installation instructions for Python and C++ and step-by-step search examples, an API reference, and several guides and benchmark comparisons.

References

If you use SVS in a research paper, please cite:

@article{aguerrebere2023similarity,
        title={Similarity search in the blink of an eye with compressed indices},
        volume = {16},
        number = {11},
        pages = {3433--3446},
        journal = {Proceedings of the VLDB Endowment},
        author={Cecilia Aguerrebere and Ishwar Bhati and Mark Hildebrand and Mariano Tepper and Ted Willke},
        year = {2023}
}

[ABHT23] Aguerrebere, C.; Bhati, I.; Hildebrand, M.; Tepper, M.; Willke, T.: Similarity search in the blink of an eye with compressed indices. In: Proceedings of the VLDB Endowment, 16(11), 3433-3446 (2023).

How to Contribute

We welcome your contributions to this project. See How to Contribute for more details.

Legal

Refer to the LICENSE file for details.

Footnotes

  1. Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. No product or component can be absolutely secure. Your costs and results may vary. Intel technologies may require enabled hardware, software or service activation. © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

Issues

About reproducibility

It seems that MLVQ.py in reproducibility/ no longer works because of the updated API. Could you please fix it?

Build issues for GCC-12+

Newer versions of GCC are more pedantic about the use of concepts in requires clauses. We should fix these uses and add gcc-12/gcc-13 to CI.

Reproducibility

Thank you for the great work.
I would like to reproduce the results and compare them with another engine. According to the documentation, it appears that you have already included a test in ANN-benchmarks:

The code to run OG-LVQ can be found here, and it was included in the ANN-benchmarks and Big-ANN-benchmarks evaluation codes following their guidelines.

Unfortunately, I am not able to find it. Could you please provide a link?

Thank you.
