grubdragon / lanterndb

This project forked from lanterndata/lantern

PostgreSQL vector database extension for building AI applications

Home Page: https://lantern.dev

License: MIT License

LanternDB 🏮

LanternDB is a relational and vector database, packaged as a Postgres extension. It provides a new index type for vector columns, called hnsw, which speeds up ORDER BY queries that sort rows by vector distance.

Quickstart

Building and Installing LanternDB

LanternDB builds and uses usearch for its single-header state-of-the-art HNSW implementation.

To build and install LanternDB:

git clone --recursive https://github.com/lanterndata/lanterndb.git
cd lanterndb
mkdir build
cd build
cmake ..
make install
# optionally
# make test
If you have previously cloned LanternDB and would like to update:

```bash
git pull
git submodule update
```

To install on M1 Macs, replace "cmake .." in the steps above with "cmake -DUSEARCH_NO_MARCH_NATIVE=ON .." to avoid building usearch with the unsupported -march=native flag.

Using LanternDB

  1. Run the following SQL command to enable lanterndb:
CREATE EXTENSION lanterndb;
  2. Create a table with a vector column and populate it with data.
CREATE TABLE small_world (
    id varchar(3),
    vector real[]
);

INSERT INTO small_world (id, vector) VALUES
('000', '{0,0,0}'),
('001', '{0,0,1}'),
('010', '{0,1,0}'),
('011', '{0,1,1}'),
('100', '{1,0,0}'),
('101', '{1,0,1}'),
('110', '{1,1,0}'),
('111', '{1,1,1}');
  3. Create an hnsw index on the table.
-- create index with default parameters
CREATE INDEX ON small_world USING hnsw (vector);
-- create index with custom parameters
-- CREATE INDEX ON small_world USING hnsw (vector) WITH (M=2, ef_construction=10, ef=4, dims=3);
  4. Use the index in queries such as:
SELECT id, ROUND(l2sq_dist(vector, array[0,0,0])::numeric, 2) as dist
FROM small_world
ORDER BY vector <-> array[0,0,0] LIMIT 5;
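
To make the result of the query above concrete, here is a minimal brute-force sketch in Python of what it computes: the squared Euclidean distance from each stored vector to the query vector, sorted ascending and cut off at five rows. It assumes l2sq_dist is the plain squared L2 distance, as its name suggests. It does not model the hnsw index itself, which returns approximate nearest neighbors, and the helper names are illustrative rather than part of LanternDB.

```python
# Brute-force reference for the small_world query; not LanternDB code.

def l2sq_dist(a, b):
    """Squared Euclidean distance (assumed meaning of l2sq_dist)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Same rows as the INSERT statement in the quickstart.
small_world = {
    "000": [0, 0, 0],
    "001": [0, 0, 1],
    "010": [0, 1, 0],
    "011": [0, 1, 1],
    "100": [1, 0, 0],
    "101": [1, 0, 1],
    "110": [1, 1, 0],
    "111": [1, 1, 1],
}


def nearest(table, query, limit):
    """Return (id, distance) pairs ordered by distance, like ORDER BY ... LIMIT."""
    scored = [(row_id, l2sq_dist(vec, query)) for row_id, vec in table.items()]
    scored.sort(key=lambda pair: pair[1])  # stable sort: ties keep insertion order
    return scored[:limit]


results = nearest(small_world, [0, 0, 0], 5)
for row_id, dist in results:
    print(row_id, round(dist, 2))
```

Three rows sit at distance 1 and three at distance 2, so the five returned distances are 0, 1, 1, 1, 2, while the row that fills the fifth slot depends on how ties are broken.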

A Note on Index Construction Parameters

The M, ef, and efConstruction parameters (the last is spelled ef_construction in the WITH clause shown above) control the tradeoffs of the HNSW algorithm. In general, lower M and efConstruction speed up index creation at the cost of recall. Lower M and ef speed up search and reduce shared buffer hits, again at the cost of recall. Tuning these parameters requires experimentation for your specific use case. An upcoming LanternDB release will include an optional auto-tuning index.
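
Since tuning requires experimentation, one way to organize it is to generate one CREATE INDEX statement per parameter combination and then measure build time, recall, and latency for each. The Python sketch below only builds the SQL strings. It assumes the WITH option names shown in the quickstart (M, ef_construction, ef, dims); the function name and the candidate values are hypothetical, chosen for illustration rather than as recommendations.

```python
from itertools import product


def hnsw_index_statements(table, column, dims,
                          m_values, ef_construction_values, ef_values):
    """Build one CREATE INDEX statement per (M, ef_construction, ef) combination.

    The table and column names are interpolated directly, so pass only
    trusted identifiers.
    """
    statements = []
    for m, ef_c, ef in product(m_values, ef_construction_values, ef_values):
        statements.append(
            f"CREATE INDEX ON {table} USING hnsw ({column}) "
            f"WITH (M={m}, ef_construction={ef_c}, ef={ef}, dims={dims});"
        )
    return statements


# Hypothetical candidate values; pick ranges that suit your data.
statements = hnsw_index_statements(
    "small_world", "vector", dims=3,
    m_values=[2, 4], ef_construction_values=[10, 20], ef_values=[4],
)
for stmt in statements:
    print(stmt)
```

In practice you would likely drop each index before creating the next one, so that every measurement reflects a single parameter combination.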

A Note on Performance

LanternDB's hnsw enables search latency similar to pgvector's ivfflat and is faster than ivfflat under certain construction parameters. LanternDB enables higher search throughput on the same hardware since the HNSW algorithm requires fewer distance comparisons than the IVF algorithm, leading to less CPU usage per search.

A Note on Operators and Operator Classes

Currently, only one operator, <->, is available.
It is intended exclusively for index lookups, as in ORDER BY vector <-> array[0,0,0].
Consequently, a query such as SELECT array[0,0,0] <-> array[0,0,0] results in an error.

Four operator classes are defined for use during index creation:

  • dist_l2sq_ops: Default for the type real[]
  • dist_vec_l2sq_ops: Default for the type vector
  • dist_cos_ops: Applicable to the type real[]
  • dist_hamming_ops: Applicable to the type integer[]

When creating an index, you have the option to specify the operator class to be used, like so:

CREATE INDEX ON small_world USING hnsw (vector dist_cos_ops);

This approach lets the <-> operator automatically select the appropriate distance function when it is used in index lookups.
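
For intuition about what each operator class measures, the sketch below gives textbook Python definitions of the three distance families named above. These are assumptions for illustration: the README does not spell out the exact formulas, so details such as whether cosine distance is 1 minus cosine similarity, or whether Hamming distance is counted per bit or per element, may differ in LanternDB.

```python
import math


def l2sq(a, b):
    """Squared Euclidean distance (the dist_l2sq_ops family)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity (assumed definition for dist_cos_ops)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def hamming_bits(a, b):
    """Number of differing bits between two arrays of non-negative integers
    (per-bit counting is an assumption for dist_hamming_ops)."""
    if any(x < 0 or y < 0 for x, y in zip(a, b)):
        raise ValueError("this sketch only handles non-negative integers")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


print(l2sq([0, 1, 1], [0, 0, 0]))
print(cosine_distance([1, 0, 0], [0, 1, 0]))
print(hamming_bits([1, 0, 3], [0, 0, 0]))
```

Squared L2 distance orders neighbors exactly as plain L2 distance does, because the square root is monotonic, so skipping it saves work without changing the ranking.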

Roadmap

  • Postgres WAL-backed hnsw index creation on existing tables with sane defaults
  • Efficient index lookups, backed by usearch and the Postgres WAL
  • INSERTs into the created index
  • DELETEs from the index and VACUUMing
  • Automatic index creation parameter (M, ef, efConstruction) tuning
  • Support for 16-bit and 8-bit vector elements
  • Support for vectors with more than 2000 dimensions
  • Support for INDEX-ONLY scans
  • Support for INCLUDE clauses in index creation, to expand the use of INDEX-ONLY scans
  • Allow out-of-band indexing and external index importing (to speed up index generation for large tables)
  • Allow using Postgres ARRAYs as vectors
  • Add more distance functions
  • Add Product Quantization as another vector compression method
  • Implement the Vamana index introduced in DiskANN to potentially reduce the number of buffers hit during an index scan

Contributors

ngalstyan4, var77, dqii, davkhech, grubdragon, ezra-varady, siddharth1729