lanterndata / lantern
PostgreSQL vector database extension for building AI applications
Home Page: https://lantern.dev
License: Other
Test the case when the table does not have a vector column but an hnsw index is created on an expression-generated vector,
something like
CREATE INDEX ON papers(clip_text(abstract));
context:
https://www.postgresqltutorial.com/postgresql-indexes/postgresql-index-on-expression/
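A minimal sketch of such a test, assuming a hypothetical clip_text() function that embeds text into a real[] vector:

```sql
-- papers and clip_text() are hypothetical, following the example above
CREATE TABLE papers (id serial PRIMARY KEY, abstract text);
CREATE INDEX ON papers USING hnsw (clip_text(abstract));
-- the search side would then also go through the expression
SELECT id FROM papers ORDER BY clip_text(abstract) <-> clip_text('query text') LIMIT 10;
```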
Currently, inserting elements into an empty index is not supported.
Add support for that.
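The flow that should start working might look like this (schema borrowed from the other examples in these issues):

```sql
CREATE TABLE small_world (id integer, vector real[3]);
CREATE INDEX ON small_world USING hnsw (vector);   -- index built while the table is empty
INSERT INTO small_world VALUES (0, '{0,0,0}');     -- insertion into the empty index
SELECT id FROM small_world ORDER BY vector <-> ARRAY[0,0,0] LIMIT 1;
```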
We should have stress tests such as:
We concurrently insert into the database from 4 clients, query from another 3 clients, and do some misc things from another client (e.g. ALTER TABLE ADD COLUMN, create an additional index, drop/add an index).
We can compute some invariants to check at the end of the regression test, e.g. the total number of rows in the table, or the result of a query after all this load is done.
We can achieve this with pg_regress. To do that we should just avoid creating a separate DB for each test and instead use pg_regress the way most people usually use it.
I am open to keeping the current tests intact and just adding a new mechanism for running this kind of test. I am open to other suggestions as well.
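The end-of-load invariants could be as simple as (table and column names hypothetical):

```sql
-- after all clients finish
SELECT count(*) FROM small_world;   -- must equal the number of successful inserts
SELECT id FROM small_world ORDER BY vector <-> ARRAY[0,0,0] LIMIT 1;   -- must match a precomputed answer
```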
I think we compile and package all release binaries on Ubuntu 22.04, with the corresponding glibc version.
However, older systems have an older libc, and CREATE EXTENSION lantern on those systems fails with the following message:
ERROR: could not load library "/usr/lib/postgresql/13/lib/lantern.so": /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /usr/lib/postgresql/13/lib/lantern.so)
All works well when compiled from source on these systems since the correct libc dependency is used.
@var77, thoughts on solutions?
Are libc versions backward compatible, or do they require an exact match? If they are backward compatible, we could use an old machine for compiling the binaries.
Every insert seems to land at the same level, as the seed gets reset to its default value every time.
The ask is to look into this and fix it.
In particular, make sure to run benchmark code to ensure no regression before and after the fix.
Currently we hold a lock on the header page of the index for much longer than necessary on insertions.
When building Lantern on WSL and running make test, the first test, debug_helpers, fails. Diffing the generated output file against the expected output in test/expected/debug_helpers.out produces the following:
5,7c5,7
< CREATE TABLE small_world (
< id varchar(3),
< vector vector(3)
---
> CREATE TABLE small_world (
> id varchar(3),
> vector vector(3)
10,17c10,17
< INSERT INTO small_world (id, vector) VALUES
< ('000', '[0,0,0]'),
< ('001', '[0,0,1]'),
< ('010', '[0,1,0]'),
< ('011', '[0,1,1]'),
< ('100', '[1,0,0]'),
< ('101', '[1,0,1]'),
< ('110', '[1,1,0]'),
---
> INSERT INTO small_world (id, vector) VALUES
> ('000', '[0,0,0]'),
> ('001', '[0,0,1]'),
> ('010', '[0,1,0]'),
> ('011', '[0,1,1]'),
> ('100', '[1,0,0]'),
> ('101', '[1,0,1]'),
> ('110', '[1,1,0]'),
These appear to be identical, but running cat -net debug_helpers.out on the generated output shows the following:
1 CREATE EXTENSION IF NOT EXISTS vector;$
2 CREATE EXTENSION$
3 CREATE EXTENSION IF NOT EXISTS lanterndb;$
4 CREATE EXTENSION$
5 CREATE TABLE small_world (^M$
6 id varchar(3),^M$
7 vector vector(3)^M$
8 );$
9 CREATE TABLE$
10 INSERT INTO small_world (id, vector) VALUES ^M$
11 ('000', '[0,0,0]'),^M$
12 ('001', '[0,0,1]'),^M$
13 ('010', '[0,1,0]'),^M$
14 ('011', '[0,1,1]'),^M$
15 ('100', '[1,0,0]'),^M$
16 ('101', '[1,0,1]'),^M$
17 ('110', '[1,1,0]'),^M$
18 ('111', '[1,1,1]');$
19 INSERT 0 8$
20 SHOW hnsw.init_k;$
21 hnsw.init_k $
...
So apparently some lines of the generated output have Windows-style CRLF line endings, while most of the file has Unix-style LF.
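A quick way to list exactly which lines carry a CR, shown here on a small stand-in file rather than the real debug_helpers.out:

```shell
# Build a sample mixing LF and CRLF endings, then print the CRLF line numbers
printf 'CREATE TABLE small_world (\r\n    id varchar(3)\r\n);\n' > /tmp/crlf_sample.out
grep -n $'\r' /tmp/crlf_sample.out | cut -d: -f1   # lines 1 and 2 carry \r
```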
Currently compilation fails on my machine, since it has __AVX512F__ defined but the environment does not support _Float16. See code below:
There is another underlying issue in the else branch of the same code: it throws away the environment's choice even if USEARCH_USE_NATIVE_F16 is pre-defined.
A better choice: when __AVX512F__ is defined, we additionally check whether _Float16 is supported.
This will be used for benchmarking, fuzz-testing and other testing.
I have a couple of embedded devices I would like to connect to these CI/CD hooks as well (so benchmarks continually run on those devices too). So it would be great if the approach extends to triggering the same script on multiple servers.
Currently the Docker Hub home page for LanternDB is empty.
Add documentation and basic usage instructions there; look at the Postgres Docker image for inspiration.
There is a repo, LanternDB Extras; we should include it in our archive package in the release pipeline.
Should we add it to our Docker release?
Currently we properly detect and fail the query when the distance operator <-> is used outside of an index-search context.
We only check for the existence of the index, however; we do not check whether the index will actually be triggered.
This means that on small tables search queries will fail unless we set enable_seqscan=off. This is bad UX.
We can instead detect that the index is not used and rewrite the query to trigger the right distance operator.
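The failure mode and the current workaround, using the small_world schema from the examples above:

```sql
-- the planner prefers a seqscan on a tiny table, so the operator check errors out
SELECT id FROM small_world ORDER BY vector <-> ARRAY[0,0,0] LIMIT 1;  -- fails today
SET enable_seqscan = off;  -- forces the index scan; this is the bad-UX workaround
SELECT id FROM small_world ORDER BY vector <-> ARRAY[0,0,0] LIMIT 1;
```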
All the info is already in the CMakeLists.txt version, so we can generate that file like we generate others.
This will make a release one step shorter.
Currently, for testing we simply run diff between the expected output and the output generated on the instance being tested.
We should move to using pg_regress.
pg_regress is a binary available from the postgres server dev packages. It is usually not added to $PATH, but is available under $(pg_config --pkglibdir)/pgxs/src/test/regress/pg_regress.
pg_regress calls diff under the hood, but has advantages over invoking diff naively.
$ docker run -dit --rm -p 5432:5432 -e 'POSTGRES_PASSWORD=postgres' lanterndata/lantern:latest-pg15
cc6e330dca087bef8370d21ee4925f79b76d05ebc28d7a47840360d7f63a1cf3
$ psql -h localhost -p 5432 -U postgres -d postgres
Password for user postgres:
psql (10.23 (Ubuntu 10.23-0ubuntu0.18.04.2), server 15.4 (Debian 15.4-1.pgdg120+1))
WARNING: psql major version 10, server major version 15.
Some psql features might not work.
Type "help" for help.
postgres=# CREATE EXTENSION lantern;
CREATE EXTENSION
postgres=# CREATE TABLE small_world (id integer, vector real[3]);
CREATE TABLE
postgres=# INSERT INTO small_world (id, vector) VALUES (0, '{0,0,0}'), (1, '{0,0,1}');
INSERT 0 2
postgres=# CREATE INDEX ON small_world USING hnsw (vector);
INFO: done init usearch index
INFO: inserted 2 elements
INFO: done saving 2 vectors
CREATE INDEX
postgres=# CREATE INDEX ON small_world USING hnsw (vector dist_l2sq_ops)
postgres-# WITH (M=2, ef_construction=10, ef=4, dims=3);
ERROR: unrecognized parameter "dims"
postgres=# CREATE INDEX ON small_world USING hnsw (vector dist_l2sq_ops)
WITH (M=2, ef_construction=10, ef=4, dim=3);
INFO: done init usearch index
INFO: inserted 2 elements
INFO: done saving 2 vectors
CREATE INDEX
postgres=# SELECT id, l2sq_dist(vector, ARRAY[0,0,0]) AS dist
postgres-# FROM small_world ORDER BY vector <-> ARRAY[0,0,0] LIMIT 1;
ERROR: Operator <-> has no standalone meaning and is reserved for use in vector index lookups only
postgres=# \q
I was able to get it working by changing dims to dim and omitting the <-> ... operator and subsequent clause, but I'm not sure how important these are for the overall workflow. Thanks for this; I was struggling to get the SQLite and pgvector extensions running within a spike-efficient timeframe.
Once the script is ready, we can add it to CI/CD to run for all supported minor-version upgrades and ensure that the upgrade scripts work properly.
B-Tree indices support incremental sorts on top of an index.
postgres=# create table temp_table(id serial primary key, v integer, b integer);
CREATE TABLE
postgres=# insert into temp_table(v) values (1), (2), (3);
INSERT 0 3
postgres=# create index on temp_table(v);
CREATE INDEX
postgres=# set enable_seqscan = off;
SET
postgres=# explain select 1 from temp_table order by v;
QUERY PLAN
-----------------------------------------------------------------------------------------
Index Only Scan using temp_table_v_idx on temp_table (cost=0.13..12.18 rows=3 width=8)
(1 row)
postgres=# explain select 1 from temp_table order by v, id;
QUERY PLAN
-------------------------------------------------------------------------------------------
Incremental Sort (cost=4.15..12.31 rows=3 width=12)
Sort Key: v, id
Presorted Key: v
-> Index Scan using temp_table_v_idx on temp_table (cost=0.13..12.18 rows=3 width=12)
(4 rows)
postgres=# explain select 1 from temp_table order by v, b;
QUERY PLAN
-------------------------------------------------------------------------------------------
Incremental Sort (cost=4.15..12.31 rows=3 width=12)
Sort Key: v, b
Presorted Key: v
-> Index Scan using temp_table_v_idx on temp_table (cost=0.13..12.18 rows=3 width=12)
(4 rows)
We should do the same.
Postgres sometimes does not use the hnsw index and returns wrong values. Fix this.
Query:
CREATE TABLE sift_base10k_1 (
id SERIAL PRIMARY KEY,
v real[128]
);
\copy sift_base10k_1 (v) FROM '/tmp/lanterndb/vector_datasets/siftsmall_base_arrays_smaller.csv' with csv;
CREATE INDEX hnsw_idx ON sift_base10k_1 USING hnsw (v dist_l2sq_ops) WITH (M=5, ef_construction=10, ef=100, dims=128);
-- \copy sift_base10k_1 (v) FROM '/tmp/lanterndb/vector_datasets/siftsmall_base_arrays_smaller.csv' with csv;
-- SELECT V AS v4444 FROM sift_base10k_1 WHERE id = 4444 \gset
EXPLAIN ANALYZE SELECT * FROM sift_base10k_1 order by v <-> '{120,15,0,0,0,8,8,10,34,0,0,0,1,120,89,25,16,0,0,1,3,39,120,120,26,0,0,11,70,7,19,26,111,16,0,0,0,105,49,16,3,0,0,0,1,120,83,8,120,2,0,0,0,20,55,98,120,1,4,10,55,39,13,46,16,0,0,0,8,87,31,8,8,0,0,0,0,36,34,14,120,89,6,6,11,7,12,31,64,24,4,6,26,54,117,25,0,0,0,2,16,7,1,0,11,0,0,2,2,0,0,2,120,26,2,4,25,28,19,69,3,6,1,4,57,52,24,12}'
LIMIT 5;
-- should be 5100
SELECT id, l2sq_dist(v, '{120,15,0,0,0,8,8,10,34,0,0,0,1,120,89,25,16,0,0,1,3,39,120,120,26,0,0,11,70,7,19,26,111,16,0,0,0,105,49,16,3,0,0,0,1,120,83,8,120,2,0,0,0,20,55,98,120,1,4,10,55,39,13,46,16,0,0,0,8,87,31,8,8,0,0,0,0,36,34,14,120,89,6,6,11,7,12,31,64,24,4,6,26,54,117,25,0,0,0,2,16,7,1,0,11,0,0,2,2,0,0,2,120,26,2,4,25,28,19,69,3,6,1,4,57,52,24,12}'), v FROM sift_base10k_1 order by v <-> '{120,15,0,0,0,8,8,10,34,0,0,0,1,120,89,25,16,0,0,1,3,39,120,120,26,0,0,11,70,7,19,26,111,16,0,0,0,105,49,16,3,0,0,0,1,120,83,8,120,2,0,0,0,20,55,98,120,1,4,10,55,39,13,46,16,0,0,0,8,87,31,8,8,0,0,0,0,36,34,14,120,89,6,6,11,7,12,31,64,24,4,6,26,54,117,25,0,0,0,2,16,7,1,0,11,0,0,2,2,0,0,2,120,26,2,4,25,28,19,69,3,6,1,4,57,52,24,12}'
LIMIT 5;
Data:
"{0.0, 16.0, 35.0, 5.0, 32.0, 31.0, 14.0, 10.0, 11.0, 78.0, 55.0, 10.0, 45.0, 83.0, 11.0, 6.0, 14.0, 57.0, 102.0, 75.0, 20.0, 8.0, 3.0, 5.0, 67.0, 17.0, 19.0, 26.0, 5.0, 0.0, 1.0, 22.0, 60.0, 26.0, 7.0, 1.0, 18.0, 22.0, 84.0, 53.0, 85.0, 119.0, 119.0, 4.0, 24.0, 18.0, 7.0, 7.0, 1.0, 81.0, 106.0, 102.0, 72.0, 30.0, 6.0, 0.0, 9.0, 1.0, 9.0, 119.0, 72.0, 1.0, 4.0, 33.0, 119.0, 29.0, 6.0, 1.0, 0.0, 1.0, 14.0, 52.0, 119.0, 30.0, 3.0, 0.0, 0.0, 55.0, 92.0, 111.0, 2.0, 5.0, 4.0, 9.0, 22.0, 89.0, 96.0, 14.0, 1.0, 0.0, 1.0, 82.0, 59.0, 16.0, 20.0, 5.0, 25.0, 14.0, 11.0, 4.0, 0.0, 0.0, 1.0, 26.0, 47.0, 23.0, 4.0, 0.0, 0.0, 4.0, 38.0, 83.0, 30.0, 14.0, 9.0, 4.0, 9.0, 17.0, 23.0, 41.0, 0.0, 0.0, 2.0, 8.0, 19.0, 25.0, 23.0, 1.0}"
"{120.0, 15.0, 6.0, 1.0, 0.0, 0.0, 0.0, 12.0, 80.0, 18.0, 65.0, 123.0, 0.0, 0.0, 0.0, 2.0, 1.0, 2.0, 123.0, 123.0, 2.0, 0.0, 0.0, 0.0, 7.0, 10.0, 58.0, 39.0, 3.0, 0.0, 0.0, 0.0, 123.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 40.0, 123.0, 23.0, 6.0, 15.0, 1.0, 3.0, 4.0, 32.0, 10.0, 8.0, 24.0, 115.0, 90.0, 48.0, 7.0, 1.0, 7.0, 10.0, 7.0, 60.0, 54.0, 23.0, 8.0, 1.0, 119.0, 8.0, 1.0, 0.0, 0.0, 0.0, 0.0, 53.0, 100.0, 3.0, 0.0, 0.0, 0.0, 47.0, 121.0, 123.0, 2.0, 1.0, 0.0, 0.0, 30.0, 123.0, 123.0, 5.0, 32.0, 18.0, 0.0, 1.0, 10.0, 62.0, 63.0, 19.0, 31.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 25.0, 16.0, 0.0, 0.0, 5.0, 29.0, 41.0, 55.0, 31.0, 24.0, 16.0, 11.0, 5.0, 7.0, 44.0, 48.0, 9.0, 52.0, 34.0, 16.0, 5.0, 0.0, 5.0, 3.0, 2.0}"
"{19.0, 14.0, 10.0, 13.0, 7.0, 0.0, 2.0, 4.0, 15.0, 43.0, 139.0, 48.0, 6.0, 0.0, 0.0, 1.0, 4.0, 93.0, 139.0, 22.0, 0.0, 0.0, 0.0, 2.0, 0.0, 61.0, 87.0, 1.0, 0.0, 0.0, 0.0, 0.0, 88.0, 14.0, 1.0, 2.0, 2.0, 1.0, 3.0, 27.0, 139.0, 94.0, 106.0, 14.0, 0.0, 0.0, 2.0, 24.0, 13.0, 28.0, 139.0, 56.0, 1.0, 0.0, 4.0, 3.0, 5.0, 51.0, 139.0, 14.0, 3.0, 1.0, 0.0, 0.0, 108.0, 7.0, 0.0, 0.0, 0.0, 1.0, 6.0, 19.0, 139.0, 14.0, 0.0, 0.0, 0.0, 0.0, 11.0, 114.0, 13.0, 2.0, 8.0, 2.0, 1.0, 3.0, 51.0, 55.0, 13.0, 4.0, 5.0, 5.0, 4.0, 5.0, 15.0, 13.0, 42.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 21.0, 126.0, 1.0, 0.0, 0.0, 0.0, 0.0, 12.0, 125.0, 11.0, 6.0, 3.0, 0.0, 0.0, 4.0, 92.0, 57.0, 28.0, 6.0, 0.0, 0.0, 0.0, 0.0, 23.0, 37.0}"
"{86.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 65.0, 124.0, 10.0, 3.0, 2.0, 0.0, 1.0, 7.0, 54.0, 2.0, 3.0, 3.0, 6.0, 20.0, 16.0, 11.0, 2.0, 0.0, 0.0, 0.0, 0.0, 31.0, 35.0, 29.0, 7.0, 138.0, 13.0, 0.0, 0.0, 0.0, 2.0, 7.0, 110.0, 138.0, 34.0, 1.0, 1.0, 5.0, 4.0, 10.0, 32.0, 5.0, 1.0, 1.0, 3.0, 58.0, 55.0, 12.0, 2.0, 2.0, 0.0, 0.0, 6.0, 94.0, 53.0, 13.0, 6.0, 138.0, 38.0, 3.0, 0.0, 0.0, 1.0, 12.0, 45.0, 117.0, 10.0, 0.0, 0.0, 2.0, 31.0, 138.0, 90.0, 0.0, 0.0, 0.0, 0.0, 11.0, 129.0, 138.0, 6.0, 0.0, 0.0, 0.0, 0.0, 11.0, 138.0, 116.0, 0.0, 13.0, 17.0, 6.0, 0.0, 2.0, 3.0, 16.0, 15.0, 3.0, 2.0, 1.0, 1.0, 2.0, 33.0, 88.0, 16.0, 0.0, 0.0, 0.0, 1.0, 8.0, 23.0, 97.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 20.0, 35.0, 1.0}"
"{12.0, 9.0, 18.0, 5.0, 4.0, 5.0, 13.0, 19.0, 20.0, 60.0, 46.0, 18.0, 5.0, 7.0, 8.0, 3.0, 4.0, 36.0, 36.0, 89.0, 31.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 37.0, 20.0, 0.0, 0.0, 0.0, 33.0, 2.0, 0.0, 0.0, 1.0, 50.0, 87.0, 52.0, 136.0, 64.0, 1.0, 5.0, 2.0, 12.0, 46.0, 44.0, 35.0, 28.0, 3.0, 118.0, 88.0, 1.0, 0.0, 2.0, 0.0, 0.0, 23.0, 82.0, 45.0, 0.0, 0.0, 0.0, 96.0, 1.0, 0.0, 0.0, 10.0, 123.0, 21.0, 24.0, 136.0, 12.0, 0.0, 0.0, 0.0, 4.0, 41.0, 117.0, 19.0, 1.0, 0.0, 17.0, 37.0, 93.0, 116.0, 33.0, 0.0, 0.0, 5.0, 28.0, 41.0, 47.0, 47.0, 7.0, 53.0, 0.0, 0.0, 0.0, 1.0, 22.0, 2.0, 20.0, 77.0, 0.0, 0.0, 0.0, 0.0, 8.0, 136.0, 113.0, 6.0, 0.0, 0.0, 0.0, 0.0, 75.0, 136.0, 32.0, 37.0, 0.0, 0.0, 0.0, 0.0, 10.0, 51.0, 47.0}"
"{61.0, 3.0, 0.0, 0.0, 5.0, 47.0, 9.0, 17.0, 2.0, 1.0, 6.0, 21.0, 23.0, 89.0, 77.0, 43.0, 11.0, 15.0, 63.0, 41.0, 4.0, 0.0, 15.0, 67.0, 34.0, 21.0, 8.0, 7.0, 0.0, 0.0, 0.0, 7.0, 87.0, 8.0, 0.0, 34.0, 76.0, 39.0, 4.0, 20.0, 0.0, 1.0, 5.0, 57.0, 104.0, 74.0, 58.0, 75.0, 43.0, 10.0, 13.0, 7.0, 2.0, 0.0, 52.0, 121.0, 72.0, 28.0, 13.0, 6.0, 7.0, 2.0, 4.0, 50.0, 75.0, 12.0, 1.0, 42.0, 50.0, 1.0, 5.0, 47.0, 0.0, 21.0, 84.0, 121.0, 93.0, 1.0, 2.0, 5.0, 66.0, 121.0, 121.0, 7.0, 0.0, 0.0, 1.0, 33.0, 110.0, 77.0, 9.0, 0.0, 24.0, 42.0, 1.0, 12.0, 13.0, 54.0, 34.0, 3.0, 3.0, 0.0, 0.0, 7.0, 1.0, 62.0, 121.0, 22.0, 7.0, 7.0, 9.0, 1.0, 1.0, 87.0, 114.0, 2.0, 8.0, 25.0, 27.0, 0.0, 3.0, 16.0, 8.0, 2.0, 41.0, 59.0, 6.0, 0.0}"
"{24.0, 22.0, 1.0, 0.0, 0.0, 1.0, 10.0, 39.0, 14.0, 13.0, 2.0, 0.0, 1.0, 45.0, 47.0, 27.0, 8.0, 3.0, 13.0, 30.0, 2.0, 12.0, 88.0, 86.0, 3.0, 0.0, 12.0, 89.0, 12.0, 0.0, 6.0, 17.0, 27.0, 52.0, 22.0, 6.0, 30.0, 62.0, 17.0, 6.0, 9.0, 11.0, 2.0, 2.0, 13.0, 120.0, 79.0, 7.0, 120.0, 10.0, 0.0, 0.0, 0.0, 40.0, 82.0, 120.0, 120.0, 5.0, 13.0, 27.0, 17.0, 0.0, 3.0, 51.0, 0.0, 2.0, 17.0, 92.0, 108.0, 29.0, 14.0, 3.0, 4.0, 37.0, 120.0, 71.0, 36.0, 46.0, 6.0, 0.0, 120.0, 100.0, 107.0, 3.0, 1.0, 0.0, 0.0, 7.0, 120.0, 19.0, 3.0, 6.0, 29.0, 2.0, 2.0, 16.0, 0.0, 0.0, 6.0, 17.0, 8.0, 6.0, 86.0, 20.0, 0.0, 21.0, 87.0, 18.0, 1.0, 6.0, 98.0, 34.0, 19.0, 18.0, 56.0, 8.0, 4.0, 17.0, 53.0, 24.0, 84.0, 10.0, 2.0, 4.0, 3.0, 0.0, 1.0, 18.0}"
"{8.0, 2.0, 1.0, 6.0, 30.0, 14.0, 25.0, 39.0, 13.0, 1.0, 1.0, 5.0, 46.0, 45.0, 46.0, 71.0, 1.0, 0.0, 0.0, 0.0, 53.0, 51.0, 2.0, 3.0, 69.0, 4.0, 0.0, 0.0, 17.0, 11.0, 0.0, 12.0, 28.0, 53.0, 28.0, 56.0, 34.0, 5.0, 1.0, 7.0, 25.0, 10.0, 8.0, 126.0, 77.0, 7.0, 4.0, 72.0, 4.0, 6.0, 10.0, 4.0, 45.0, 38.0, 5.0, 26.0, 102.0, 80.0, 14.0, 4.0, 9.0, 2.0, 0.0, 6.0, 110.0, 64.0, 14.0, 37.0, 4.0, 0.0, 0.0, 14.0, 41.0, 28.0, 25.0, 126.0, 126.0, 0.0, 2.0, 16.0, 1.0, 126.0, 114.0, 29.0, 19.0, 5.0, 3.0, 12.0, 30.0, 126.0, 125.0, 7.0, 11.0, 2.0, 0.0, 0.0, 100.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 56.0, 51.0, 16.0, 23.0, 84.0, 94.0, 11.0, 2.0, 31.0, 38.0, 126.0, 49.0, 23.0, 16.0, 1.0, 0.0, 6.0, 5.0, 41.0, 45.0, 12.0, 9.0, 1.0, 0.0, 1.0}"
"{33.0, 20.0, 2.0, 0.0, 0.0, 37.0, 75.0, 28.0, 1.0, 0.0, 0.0, 0.0, 7.0, 96.0, 134.0, 21.0, 92.0, 29.0, 1.0, 3.0, 22.0, 61.0, 74.0, 31.0, 26.0, 19.0, 1.0, 15.0, 89.0, 16.0, 0.0, 0.0, 3.0, 4.0, 0.0, 1.0, 19.0, 134.0, 80.0, 13.0, 9.0, 11.0, 6.0, 7.0, 4.0, 62.0, 134.0, 21.0, 134.0, 7.0, 2.0, 5.0, 17.0, 32.0, 65.0, 120.0, 50.0, 3.0, 0.0, 6.0, 123.0, 89.0, 7.0, 20.0, 16.0, 37.0, 0.0, 0.0, 40.0, 46.0, 4.0, 4.0, 15.0, 66.0, 8.0, 22.0, 32.0, 0.0, 2.0, 2.0, 134.0, 23.0, 2.0, 16.0, 49.0, 24.0, 36.0, 77.0, 37.0, 0.0, 0.0, 0.0, 22.0, 88.0, 67.0, 29.0, 10.0, 60.0, 6.0, 1.0, 5.0, 1.0, 0.0, 0.0, 4.0, 72.0, 38.0, 33.0, 22.0, 0.0, 0.0, 0.0, 38.0, 83.0, 22.0, 35.0, 46.0, 15.0, 4.0, 2.0, 19.0, 10.0, 8.0, 9.0, 12.0, 14.0, 11.0, 8.0}"
"{107.0, 8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.0, 49.0, 8.0, 20.0, 51.0, 126.0, 26.0, 0.0, 5.0, 0.0, 45.0, 126.0, 47.0, 76.0, 20.0, 2.0, 0.0, 1.0, 42.0, 118.0, 17.0, 0.0, 0.0, 0.0, 0.0, 126.0, 28.0, 0.0, 0.0, 0.0, 0.0, 0.0, 17.0, 117.0, 55.0, 96.0, 44.0, 20.0, 0.0, 0.0, 4.0, 0.0, 59.0, 126.0, 50.0, 33.0, 8.0, 0.0, 0.0, 0.0, 9.0, 89.0, 71.0, 33.0, 16.0, 0.0, 0.0, 126.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 26.0, 126.0, 14.0, 3.0, 0.0, 2.0, 3.0, 1.0, 44.0, 5.0, 3.0, 4.0, 5.0, 71.0, 89.0, 1.0, 4.0, 0.0, 0.0, 0.0, 3.0, 41.0, 72.0, 13.0, 1.0, 70.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 53.0, 106.0, 0.0, 0.0, 0.0, 1.0, 6.0, 4.0, 104.0, 7.0, 0.0, 0.0, 9.0, 64.0, 78.0, 11.0, 9.0, 0.0, 0.0, 0.0, 1.0, 34.0, 67.0, 16.0, 0.0}"
"{14.0, 14.0, 0.0, 3.0, 8.0, 3.0, 4.0, 2.0, 19.0, 36.0, 39.0, 41.0, 13.0, 0.0, 0.0, 2.0, 11.0, 6.0, 76.0, 107.0, 10.0, 0.0, 0.0, 1.0, 118.0, 18.0, 4.0, 17.0, 0.0, 0.0, 0.0, 5.0, 102.0, 23.0, 0.0, 0.0, 1.0, 4.0, 21.0, 38.0, 105.0, 118.0, 76.0, 2.0, 0.0, 3.0, 3.0, 9.0, 5.0, 53.0, 90.0, 118.0, 61.0, 52.0, 12.0, 0.0, 118.0, 20.0, 4.0, 77.0, 32.0, 9.0, 2.0, 21.0, 118.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 61.0, 118.0, 13.0, 0.0, 0.0, 0.0, 52.0, 118.0, 95.0, 4.0, 6.0, 0.0, 3.0, 18.0, 118.0, 118.0, 4.0, 118.0, 9.0, 0.0, 7.0, 11.0, 23.0, 38.0, 57.0, 24.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 25.0, 11.0, 0.0, 0.0, 0.0, 0.0, 24.0, 83.0, 34.0, 0.0, 0.0, 0.0, 0.0, 0.0, 37.0, 58.0, 5.0, 28.0, 0.0, 0.0, 0.0, 0.0, 1.0, 14.0, 62.0}"
"{0.0, 0.0, 0.0, 6.0, 13.0, 7.0, 10.0, 7.0, 15.0, 2.0, 0.0, 13.0, 25.0, 16.0, 3.0, 2.0, 143.0, 41.0, 0.0, 1.0, 1.0, 1.0, 0.0, 25.0, 96.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 21.0, 0.0, 2.0, 7.0, 12.0, 4.0, 10.0, 73.0, 22.0, 43.0, 4.0, 11.0, 1.0, 8.0, 24.0, 53.0, 16.0, 143.0, 31.0, 0.0, 0.0, 1.0, 1.0, 1.0, 46.0, 114.0, 6.0, 0.0, 0.0, 0.0, 0.0, 7.0, 35.0, 5.0, 48.0, 143.0, 67.0, 1.0, 0.0, 7.0, 8.0, 18.0, 40.0, 143.0, 39.0, 0.0, 0.0, 5.0, 5.0, 143.0, 40.0, 137.0, 46.0, 0.0, 0.0, 2.0, 73.0, 46.0, 6.0, 4.0, 1.0, 0.0, 0.0, 40.0, 72.0, 72.0, 32.0, 59.0, 5.0, 0.0, 0.0, 2.0, 58.0, 9.0, 22.0, 143.0, 18.0, 0.0, 0.0, 10.0, 31.0, 4.0, 4.0, 114.0, 123.0, 4.0, 2.0, 1.0, 8.0, 7.0, 7.0, 20.0, 38.0, 1.0, 0.0, 5.0, 13.0}"
"{0.0, 0.0, 0.0, 0.0, 2.0, 30.0, 49.0, 0.0, 0.0, 0.0, 0.0, 1.0, 24.0, 61.0, 66.0, 1.0, 21.0, 7.0, 0.0, 1.0, 48.0, 74.0, 29.0, 8.0, 45.0, 58.0, 3.0, 0.0, 0.0, 3.0, 3.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 80.0, 137.0, 9.0, 0.0, 0.0, 0.0, 0.0, 2.0, 130.0, 137.0, 7.0, 137.0, 24.0, 0.0, 0.0, 3.0, 102.0, 130.0, 48.0, 137.0, 55.0, 5.0, 1.0, 7.0, 7.0, 1.0, 14.0, 0.0, 16.0, 20.0, 5.0, 0.0, 14.0, 54.0, 3.0, 4.0, 0.0, 13.0, 32.0, 37.0, 30.0, 51.0, 3.0, 137.0, 1.0, 0.0, 3.0, 12.0, 8.0, 14.0, 98.0, 137.0, 0.0, 0.0, 0.0, 12.0, 35.0, 18.0, 95.0, 1.0, 14.0, 13.0, 4.0, 3.0, 0.0, 0.0, 0.0, 1.0, 4.0, 9.0, 28.0, 65.0, 9.0, 0.0, 0.0, 96.0, 54.0, 0.0, 4.0, 24.0, 3.0, 5.0, 45.0, 68.0, 16.0, 0.0, 0.0, 3.0, 64.0, 34.0, 47.0}"
"{0.0, 0.0, 0.0, 0.0, 0.0, 93.0, 60.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 92.0, 136.0, 14.0, 42.0, 0.0, 0.0, 0.0, 1.0, 38.0, 127.0, 63.0, 27.0, 0.0, 0.0, 2.0, 1.0, 1.0, 5.0, 34.0, 3.0, 13.0, 11.0, 0.0, 0.0, 136.0, 136.0, 2.0, 26.0, 6.0, 14.0, 7.0, 3.0, 83.0, 136.0, 24.0, 136.0, 8.0, 0.0, 0.0, 0.0, 12.0, 57.0, 116.0, 57.0, 2.0, 0.0, 0.0, 0.0, 5.0, 22.0, 100.0, 26.0, 109.0, 39.0, 9.0, 11.0, 31.0, 16.0, 0.0, 25.0, 13.0, 30.0, 35.0, 40.0, 36.0, 2.0, 5.0, 136.0, 20.0, 0.0, 2.0, 1.0, 13.0, 12.0, 38.0, 27.0, 11.0, 1.0, 0.0, 0.0, 15.0, 28.0, 84.0, 19.0, 20.0, 0.0, 0.0, 22.0, 58.0, 1.0, 3.0, 29.0, 1.0, 0.0, 3.0, 37.0, 49.0, 2.0, 16.0, 136.0, 8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 22.0, 29.0, 11.0, 3.0, 2.0, 0.0, 0.0, 0.0, 4.0}"
"{7.0, 2.0, 5.0, 1.0, 0.0, 1.0, 10.0, 58.0, 15.0, 0.0, 6.0, 54.0, 29.0, 16.0, 11.0, 75.0, 132.0, 6.0, 5.0, 49.0, 43.0, 13.0, 2.0, 28.0, 132.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 38.0, 2.0, 9.0, 95.0, 47.0, 0.0, 0.0, 3.0, 14.0, 8.0, 2.0, 44.0, 37.0, 12.0, 23.0, 36.0, 47.0, 131.0, 35.0, 15.0, 7.0, 22.0, 17.0, 14.0, 28.0, 132.0, 49.0, 0.0, 0.0, 0.0, 0.0, 0.0, 58.0, 4.0, 4.0, 73.0, 61.0, 3.0, 0.0, 0.0, 2.0, 33.0, 15.0, 132.0, 132.0, 2.0, 1.0, 0.0, 13.0, 84.0, 83.0, 132.0, 64.0, 2.0, 0.0, 1.0, 7.0, 132.0, 89.0, 15.0, 0.0, 0.0, 0.0, 0.0, 12.0, 3.0, 31.0, 20.0, 13.0, 3.0, 2.0, 1.0, 1.0, 30.0, 77.0, 90.0, 8.0, 0.0, 0.0, 0.0, 10.0, 11.0, 43.0, 114.0, 22.0, 0.0, 0.0, 0.0, 1.0, 28.0, 21.0, 32.0, 5.0, 0.0, 0.0, 1.0, 5.0}"
"{5.0, 18.0, 20.0, 27.0, 14.0, 30.0, 9.0, 2.0, 7.0, 14.0, 4.0, 0.0, 0.0, 21.0, 81.0, 39.0, 62.0, 6.0, 2.0, 0.0, 0.0, 3.0, 47.0, 95.0, 41.0, 9.0, 5.0, 3.0, 0.0, 0.0, 0.0, 25.0, 0.0, 3.0, 2.0, 5.0, 101.0, 105.0, 7.0, 0.0, 0.0, 0.0, 1.0, 5.0, 14.0, 34.0, 60.0, 23.0, 126.0, 5.0, 1.0, 0.0, 4.0, 2.0, 39.0, 98.0, 126.0, 12.0, 7.0, 5.0, 4.0, 2.0, 1.0, 58.0, 0.0, 0.0, 48.0, 126.0, 44.0, 27.0, 0.0, 0.0, 0.0, 23.0, 126.0, 29.0, 13.0, 4.0, 2.0, 2.0, 126.0, 126.0, 85.0, 3.0, 6.0, 0.0, 2.0, 19.0, 67.0, 59.0, 28.0, 1.0, 2.0, 1.0, 2.0, 54.0, 1.0, 1.0, 78.0, 108.0, 0.0, 0.0, 0.0, 2.0, 1.0, 39.0, 126.0, 43.0, 2.0, 1.0, 0.0, 0.0, 3.0, 126.0, 115.0, 3.0, 3.0, 0.0, 2.0, 1.0, 0.0, 23.0, 24.0, 7.0, 18.0, 3.0, 4.0, 0.0}"
"{4.0, 3.0, 0.0, 0.0, 0.0, 26.0, 42.0, 10.0, 7.0, 5.0, 4.0, 2.0, 1.0, 54.0, 126.0, 15.0, 27.0, 1.0, 1.0, 1.0, 2.0, 55.0, 69.0, 56.0, 24.0, 7.0, 16.0, 4.0, 4.0, 0.0, 0.0, 23.0, 2.0, 0.0, 0.0, 4.0, 46.0, 94.0, 61.0, 7.0, 1.0, 0.0, 4.0, 2.0, 40.0, 96.0, 126.0, 6.0, 87.0, 3.0, 6.0, 0.0, 3.0, 17.0, 126.0, 126.0, 118.0, 3.0, 4.0, 0.0, 2.0, 1.0, 3.0, 126.0, 5.0, 0.0, 0.0, 21.0, 79.0, 41.0, 6.0, 14.0, 1.0, 5.0, 15.0, 34.0, 58.0, 23.0, 0.0, 0.0, 126.0, 31.0, 13.0, 4.0, 2.0, 2.0, 0.0, 23.0, 126.0, 45.0, 2.0, 1.0, 0.0, 0.0, 1.0, 39.0, 22.0, 26.0, 16.0, 32.0, 9.0, 2.0, 5.0, 16.0, 2.0, 4.0, 100.0, 108.0, 7.0, 0.0, 0.0, 2.0, 48.0, 126.0, 41.0, 26.0, 0.0, 0.0, 0.0, 0.0, 76.0, 107.0, 0.0, 0.0, 0.0, 2.0, 1.0, 1.0}"
"{35.0, 10.0, 0.0, 0.0, 0.0, 7.0, 34.0, 18.0, 11.0, 3.0, 0.0, 0.0, 0.0, 82.0, 122.0, 22.0, 10.0, 0.0, 3.0, 25.0, 79.0, 40.0, 89.0, 58.0, 3.0, 0.0, 10.0, 68.0, 99.0, 8.0, 5.0, 6.0, 3.0, 2.0, 0.0, 0.0, 17.0, 122.0, 78.0, 8.0, 41.0, 11.0, 2.0, 6.0, 17.0, 122.0, 122.0, 21.0, 122.0, 12.0, 5.0, 6.0, 7.0, 10.0, 56.0, 122.0, 22.0, 2.0, 17.0, 75.0, 105.0, 62.0, 0.0, 14.0, 7.0, 40.0, 24.0, 5.0, 27.0, 122.0, 10.0, 2.0, 24.0, 39.0, 81.0, 45.0, 10.0, 13.0, 3.0, 0.0, 122.0, 25.0, 21.0, 17.0, 8.0, 14.0, 13.0, 122.0, 38.0, 14.0, 20.0, 23.0, 62.0, 88.0, 5.0, 51.0, 30.0, 52.0, 8.0, 3.0, 2.0, 14.0, 5.0, 6.0, 3.0, 10.0, 42.0, 26.0, 4.0, 12.0, 15.0, 8.0, 16.0, 14.0, 28.0, 36.0, 7.0, 6.0, 10.0, 20.0, 32.0, 20.0, 3.0, 4.0, 1.0, 5.0, 8.0, 24.0}"
"{13.0, 13.0, 16.0, 5.0, 1.0, 4.0, 3.0, 4.0, 10.0, 7.0, 4.0, 15.0, 34.0, 44.0, 26.0, 4.0, 89.0, 8.0, 0.0, 19.0, 47.0, 16.0, 12.0, 37.0, 75.0, 9.0, 3.0, 2.0, 2.0, 0.0, 0.0, 24.0, 5.0, 10.0, 28.0, 32.0, 17.0, 25.0, 28.0, 9.0, 1.0, 10.0, 30.0, 52.0, 33.0, 77.0, 40.0, 4.0, 117.0, 35.0, 2.0, 24.0, 21.0, 16.0, 24.0, 44.0, 117.0, 102.0, 13.0, 10.0, 2.0, 0.0, 0.0, 10.0, 36.0, 18.0, 67.0, 64.0, 16.0, 12.0, 16.0, 61.0, 23.0, 11.0, 117.0, 117.0, 8.0, 1.0, 1.0, 22.0, 107.0, 117.0, 114.0, 117.0, 8.0, 0.0, 0.0, 8.0, 110.0, 117.0, 10.0, 12.0, 3.0, 0.0, 0.0, 1.0, 42.0, 0.0, 0.0, 1.0, 13.0, 17.0, 44.0, 86.0, 37.0, 1.0, 19.0, 19.0, 4.0, 22.0, 89.0, 109.0, 20.0, 48.0, 57.0, 18.0, 3.0, 14.0, 49.0, 94.0, 17.0, 24.0, 2.0, 1.0, 1.0, 0.0, 8.0, 37.0}"
"{31.0, 0.0, 0.0, 0.0, 0.0, 11.0, 17.0, 27.0, 22.0, 2.0, 0.0, 0.0, 2.0, 86.0, 52.0, 18.0, 91.0, 27.0, 0.0, 0.0, 0.0, 15.0, 34.0, 82.0, 16.0, 3.0, 0.0, 1.0, 20.0, 22.0, 10.0, 28.0, 70.0, 3.0, 2.0, 2.0, 37.0, 53.0, 15.0, 58.0, 27.0, 5.0, 0.0, 0.0, 38.0, 128.0, 62.0, 18.0, 128.0, 6.0, 0.0, 0.0, 0.0, 106.0, 97.0, 128.0, 60.0, 1.0, 0.0, 1.0, 14.0, 114.0, 42.0, 62.0, 20.0, 4.0, 6.0, 4.0, 25.0, 93.0, 25.0, 17.0, 3.0, 49.0, 6.0, 0.0, 21.0, 128.0, 11.0, 0.0, 128.0, 128.0, 4.0, 0.0, 1.0, 42.0, 5.0, 26.0, 47.0, 21.0, 9.0, 6.0, 13.0, 51.0, 9.0, 21.0, 1.0, 22.0, 9.0, 9.0, 62.0, 33.0, 0.0, 0.0, 1.0, 128.0, 72.0, 5.0, 11.0, 27.0, 0.0, 0.0, 36.0, 101.0, 45.0, 12.0, 6.0, 6.0, 7.0, 9.0, 8.0, 3.0, 1.0, 2.0, 16.0, 38.0, 12.0, 5.0}"
"{120.0, 15.0, 0.0, 0.0, 0.0, 8.0, 8.0, 10.0, 34.0, 0.0, 0.0, 0.0, 1.0, 120.0, 89.0, 25.0, 16.0, 0.0, 0.0, 1.0, 3.0, 39.0, 120.0, 120.0, 26.0, 0.0, 0.0, 11.0, 70.0, 7.0, 19.0, 26.0, 111.0, 16.0, 0.0, 0.0, 0.0, 105.0, 49.0, 16.0, 3.0, 0.0, 0.0, 0.0, 1.0, 120.0, 83.0, 8.0, 120.0, 2.0, 0.0, 0.0, 0.0, 20.0, 55.0, 98.0, 120.0, 1.0, 4.0, 10.0, 55.0, 39.0, 13.0, 46.0, 16.0, 0.0, 0.0, 0.0, 8.0, 87.0, 31.0, 8.0, 8.0, 0.0, 0.0, 0.0, 0.0, 36.0, 34.0, 14.0, 120.0, 89.0, 6.0, 6.0, 11.0, 7.0, 12.0, 31.0, 64.0, 24.0, 4.0, 6.0, 26.0, 54.0, 117.0, 25.0, 0.0, 0.0, 0.0, 2.0, 16.0, 7.0, 1.0, 0.0, 11.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 2.0, 120.0, 26.0, 2.0, 4.0, 25.0, 28.0, 19.0, 69.0, 3.0, 6.0, 1.0, 4.0, 57.0, 52.0, 24.0, 12.0}"
I have a table with a column v REAL[]. When I tried to create an index, I got this error:
lantern=# CREATE INDEX sift_base10k_index ON sift_base10k USING
hnsw (v) WITH (
M=2,
ef_construction=16,
ef=10
);
INFO: done init usearch index
ERROR: Wrong number of dimensions: 128 instead of 3 expected
It seems like dims=128 is a required parameter. It would be nice if this could be derived from the data itself.
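Until dims can be inferred, the index creation from the session above should work once the parameter is passed explicitly (a sketch, assuming 128-dimensional data):

```sql
CREATE INDEX sift_base10k_index ON sift_base10k USING hnsw (v) WITH (
    M=2,
    ef_construction=16,
    ef=10,
    dims=128   -- must currently match the length of the stored vectors
);
```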
Properly implement postgres index cost estimation for the postgres query planner.
Resources
Bad things happen if two postgres extensions define symbols having the same name.
Make sure CI/CD fails if this is detected.
Related to #145 and can probably use the same pipeline
Sidenote: you can compare the common symbols exported by a pair of object files like this:
comm -12 <(nm -gD ./lantern.so | awk '{print $3}'| sort ) <(nm -gD ~/pgvector/vector.so | awk '{print $3}'| sort)
TL;DR: we at @arguflow would like to test this out ASAP to see how it compares to pgvector v0.5.0, but would like to avoid building from source if at all possible.
It should be an extremely quick test considering we have a notebook already working for pgvector - https://github.com/arguflow/pgvector-vs-qdrant-comparison/blob/main/qdrant-to-pgvector.ipynb .
context:
detailed description of the issue in the comment here:
https://github.com/lanterndata/lantern/blob/main/scripts/extern_defined.sh#L5-L34
https://github.com/lanterndata/lantern/blob/main/scripts/check_symbols.sh
The usearch_save function of usearch saves the index to a file. We need the index structure in memory so we can transfer it to the postgres WAL. So we currently save it to a file in the /tmp directory, then map it back into memory to transfer it to the postgres WAL (see the code).
Instead, we should directly get the index memory buffers for writing to the WAL, without spilling the index to a file first.
This will require modifying the C interface of usearch.
Currently the maximum index size is limited by the number of blockmap pages we initially create in the index.
The current implementation assumes all these pages are sequential and live between the index header page and the node pages.
This should be changed to accommodate a dynamically growing blockmap area.
pgvector does this. Similar to SET LOCAL hnsw.ef_search = 100;
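The desired UX, sketched with a hypothetical GUC name (the extension already exposes GUCs such as hnsw.init_k):

```sql
BEGIN;
SET LOCAL hnsw.ef = 100;  -- hypothetical GUC, scoped to this transaction only
SELECT id FROM small_world ORDER BY vector <-> ARRAY[0,0,0] LIMIT 10;
COMMIT;
```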
Currently we have only one operator, <->.
In the src/hnsw/options.c file there's a function HnswGetMetricKind which determines the operator class being used with an index and picks the right metric kind for the usearch index using the support function pointers.
This is great, as we can have only one operator supporting various distance functions; but when used outside of index scope, for example in a SELECT statement, the operator cannot automatically detect which distance function should be used.
We are currently throwing an error when <-> is used outside of an index lookup. We do this using ExecutorStart_hook; the hook implementation is defined in src/hnsw/options.c as void executor_hook.
This function receives the QueryDesc struct, and we are currently doing regexp matching on sourceText. This approach does not cover cases where the operator is used with ORDER BY but there won't be an index scan.
To fix all the cases we might use plannedstmt, which contains the AST of the planned statement, where we can find information about the indexes and much more.
After making these changes, there is a hnsw_operators_todo.sql test file. The file should be renamed to hnsw_operators.sql and included in the schedule.txt file.
An updating list of ideas to add to CI/CD (in decreasing order of priority):
/tmp/lanterndb usage. The tmp directory is currently also used for a temporary `schedule.txt`.
TODO README: CREATE EXTENSION lanterndb; should be run in postgres, but there is no mention of it.
Currently usearch code uses the default (malloc-based) allocator.
We should use palloc to make sure all memory is managed through postgres.
We can do that by adding a field to usearch_init that takes a custom allocation function and builds a custom default allocator on it in the usearch code.
Currently, in the code we hardcode MAIN_FORK as the fork number.
But unlogged tables (tables that are not written to WAL and are reset after a crash) also have a separate init fork (INIT_FORK, I think?).
We need to change the hardcoded MAIN_FORKs in the code and make sure all LanternDB functionality works smoothly on unlogged tables.
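A coverage sketch for this (schema mirrors the other examples in these issues):

```sql
CREATE UNLOGGED TABLE small_world_u (id integer, vector real[3]);
INSERT INTO small_world_u VALUES (0, '{0,0,0}'), (1, '{0,0,1}');
CREATE INDEX ON small_world_u USING hnsw (vector);  -- must work despite the extra init fork
SELECT id FROM small_world_u ORDER BY vector <-> ARRAY[0,0,0] LIMIT 1;
```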
B-Tree indices support nested loops within index scans.
postgres=# create table temp_table(id serial primary key, v integer, b integer);
CREATE TABLE
postgres=# insert into temp_table(v) values (1), (2), (3);
INSERT 0 3
postgres=# create index on temp_table(v);
CREATE INDEX
postgres=# set enable_seqscan = off;
SET
postgres=# create table temp_table2(id serial primary key);
CREATE TABLE
postgres=# explain select 1 from temp_table join temp_table2 using (id) order by v;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Nested Loop (cost=0.29..28.70 rows=3 width=8)
-> Index Scan using temp_table_v_idx on temp_table (cost=0.13..12.18 rows=3 width=8)
-> Index Only Scan using temp_table2_pkey on temp_table2 (cost=0.15..5.51 rows=1 width=4)
Index Cond: (id = temp_table.id)
We should do the same.
Currently some of our tests use the vector type, which creates a dependency on pgvector.
We should convert the tests to use the native postgres real[] type by default, and separate the tests using the vector type into test files named $testname_vector.sql.
By doing this we will be able to detect whether the pgvector extension exists on the system and include the vector tests in our test schedule as well. These checks can be done in the run_all_tests.sh file.
For every test run using run_all_tests.sh, data files are downloaded. This can be sped up by checking whether the file is already downloaded and verifying its hash before proceeding with the rest of the code.
This saves a few seconds every run.
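The check could be a few lines of shell; the expected checksum here is for a toy file rather than a real dataset:

```shell
FILE=$(mktemp)
printf 'hello\n' > "$FILE"   # stand-in for an already-downloaded dataset
EXPECTED_SHA=5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03  # sha256 of "hello\n"
if [ -f "$FILE" ] && [ "$(sha256sum "$FILE" | awk '{print $1}')" = "$EXPECTED_SHA" ]; then
    echo "cache hit, skipping download"
else
    echo "downloading dataset"   # wget/curl would go here
fi
```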
CI should not check out the main branch. Instead, it should check out the commit that the repo is pointing to.
Potentially relevant source:
According to Narek, it's possible to SSH into CI/CD so maybe we can do that and get some info out.
Likely, the github action configuration needs to be updated.
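A hedged sketch of how the checkout step might be pinned (the actual workflow file, event type, and job layout in this repo may differ):

```yaml
# actions/checkout defaults to the ref/SHA of the triggering event, so
# pinning `ref` explicitly guards against accidentally tracking a branch head.
- name: Checkout triggering commit
  uses: actions/checkout@v4
  with:
    ref: ${{ github.sha }}  # the exact commit the event points at, not `main`
```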
Implementation for this exists at #114.
It needs to be rebased and properly benchmarked.
l2sq_dist works on pgvector's vector type.
e.g., SELECT l2sq_dist('[1,1]'::vector, '[0,0]'::vector);
and SELECT vector_l2sq_dist('[1,1]'::vector, '[0,1]'::vector);
Currently our index returns at most 50 elements (see here), so SELECT .. LIMIT 50 and SELECT .. LIMIT 100 will both return 50 results.
The correct behaviour is to return the number of results the query asked for.
In the index scanning routine we do not know how many results the user has asked for, so we should initially retrieve a fixed number of results from the index (e.g. 50; this value should be configured according to init_k) and then retrieve more (2*init_k, 4*init_k, ...) until scanning is over.
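The retry loop described above can be sketched as follows (a minimal sketch, not the actual LanternDB code: `fetch_fn` stands in for the real usearch search call, and all symbol names are illustrative):

```c
#include <stddef.h>

/* Returns how many results the index produced for a request of k. */
typedef size_t (*fetch_fn)(size_t k, void *state);

/* Keep doubling k until the scan has `limit` results or the index is
 * exhausted (i.e. it returned fewer results than were requested). */
static size_t results_for_limit(fetch_fn fetch, void *state,
                                size_t init_k, size_t limit)
{
    size_t k = init_k;
    size_t got = fetch(k, state);

    while (got < limit && got == k) {
        k *= 2;               /* init_k, 2*init_k, 4*init_k, ... */
        got = fetch(k, state);
    }
    return got < limit ? got : limit;
}
```

The `got == k` check is what detects exhaustion: if the index returned fewer results than requested, asking for a larger k cannot help.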
This should format lantern.sql and the regression test SQL files.
sqlfluff is one good option for this, but we have not looked into it closely yet.
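If sqlfluff were adopted, a minimal configuration could be checked in at the repo root. The following is a hypothetical starting point, not a vetted rule set; the keys follow sqlfluff's documented config format:

```ini
# .sqlfluff -- illustrative only
[sqlfluff]
dialect = postgres
exclude_rules = L016  # long lines are common in generated lantern.sql

[sqlfluff:indentation]
tab_space_size = 4
```

Then `sqlfluff fix lantern.sql test/sql/` (or a `lint`-only variant in CI) would cover both targets.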
Do not allocate a large flat buffer to cache LanternDB hnsw index blockmap pages.
Use a hashmap instead.
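The idea can be sketched as a small chained hash map keyed by blockmap page number, so that only pages actually touched get cached (a sketch under assumed names; the bucket count and types are illustrative, not LanternDB's actual structures):

```c
#include <stdlib.h>

#define CACHE_BUCKETS 64

typedef struct CachedPage {
    unsigned long pageno;        /* blockmap page number */
    void *page;                  /* cached page contents */
    struct CachedPage *next;     /* chain for bucket collisions */
} CachedPage;

typedef struct {
    CachedPage *buckets[CACHE_BUCKETS];
} PageCache;

static void *cache_lookup(PageCache *c, unsigned long pageno)
{
    for (CachedPage *e = c->buckets[pageno % CACHE_BUCKETS]; e; e = e->next)
        if (e->pageno == pageno)
            return e->page;
    return NULL;
}

static void cache_insert(PageCache *c, unsigned long pageno, void *page)
{
    CachedPage *e = malloc(sizeof(CachedPage));
    e->pageno = pageno;
    e->page = page;
    e->next = c->buckets[pageno % CACHE_BUCKETS];
    c->buckets[pageno % CACHE_BUCKETS] = e;
}
```

Unlike the flat buffer, memory grows with the number of distinct pages visited rather than with the index's total blockmap size. (In the extension itself, postgres's dynahash/HTAB would be the idiomatic choice rather than a hand-rolled map.)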
We want to make sure our regression tests cover the different scenarios in which a feature can be used.
These scenarios do not always have SQL-visible output. For such cases we currently rely on adding elog(DEBUG5, ..) statements and making sure those elogs are only enabled for the test in question.
See PRs #78 and #103 for more details on how this kind of testing works. In #78, for example, we use these elogs to make sure that for different values of init_k, usearch_search is triggered the expected number of times.
This has downsides: elogs have a performance impact even when their output is not logged anywhere. It would be better to add a special variable on an internal schema (e.g. _lanterndb_internal.state_for_tests) that can be used to communicate data to our regression tests. In tests where we want to enable this kind of communication, we would set the variable and print it from the regression test.
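A minimal sketch of what that could look like (the schema and table names come from the suggestion above; the key/value layout and the example key are made up):

```sql
CREATE SCHEMA IF NOT EXISTS _lanterndb_internal;
CREATE TABLE IF NOT EXISTS _lanterndb_internal.state_for_tests (
    key   text PRIMARY KEY,
    value text
);

-- The extension would update rows from C while a test runs (e.g. a counter of
-- usearch_search calls); the regression test then just selects and diffs it:
SELECT value FROM _lanterndb_internal.state_for_tests
WHERE key = 'usearch_search_calls';
```

Unlike elog, writes to this table would happen only when a test explicitly enables them, so the production code path pays nothing.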
This will make sure that when we use one distance metric but should in fact be using another one (e.g. because the user supplied a custom distance metric), a regression test failure gets triggered.
We should have a methodical way of catching all the places where a wrong-sized variable is used (e.g. int in place of int64).
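One methodical option (a suggestion, not a decided approach) is to let the compiler flag narrowing and sign conversions; with GCC/Clang this is just a matter of adding warning flags to the build:

```makefile
# Hypothetical Makefile snippet: warn on implicit int <-> int64 narrowing
# and signedness changes; -Werror would make CI fail on new offenders.
CFLAGS += -Wconversion -Wsign-conversion
```

Existing warnings would need to be triaged once, after which CI keeps new ones out.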
Inspiration: #186
Note: this is quite an involved issue. It will require understanding of the index storage format and structures on postgres blocks.
Relevant links:
docs: https://www.postgresql.org/docs/current/queries-limit.html
context: https://news.ycombinator.com/reply?id=37501161&goto=item%3Fid%3D37499375%2337501161
Consider a scenario where I retrieve and render X elements from the table for my first page of database results and allow the user to page into N other pages of X results each. Assume there is a P% chance the user will navigate to page i+1 given they are on page i.
What is the best strategy from the app developer's perspective here? Options include:
- running a separate LIMIT...OFFSET query for each next page
- fetching enough results for M pages upfront and only accessing the database on the first page
It would be great if benchmarking gave some insight into how to navigate this situation.
For scan queries, LanternDB currently interfaces with usearch in one of two ways:
- it copies the node requested by usearch into a temporary memory buffer and returns its pointer to usearch, or
- it returns to usearch a pointer directly into the page holding the node.
In both cases LanternDB has to keep track of all the currently possibly-alive objects. Right now we use a fixed-size array for this tracking in both cases. This requires a larger-than-necessary allocation in some cases and, in others, fails to return the requested results once the fixed buffer fills up.
A better approach is to use a linked list for both of these use cases: grow it as necessary and traverse-and-clean-up upon exit.
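The proposed tracking structure can be sketched like this (a minimal sketch: the type and function names are illustrative, not the actual LanternDB symbols, and `release` stands in for whatever frees a buffer copy or unpins a page):

```c
#include <stdlib.h>

/* Singly linked list of objects currently handed out to usearch. */
typedef struct TrackedNode {
    void *object;              /* buffer copy or pinned-page pointer */
    struct TrackedNode *next;
} TrackedNode;

/* Prepend a newly handed-out object; returns the new list head.
 * O(1), and the list grows only as far as the scan actually needs. */
static TrackedNode *track_object(TrackedNode *head, void *object)
{
    TrackedNode *node = malloc(sizeof(TrackedNode));
    node->object = object;
    node->next = head;
    return node;
}

/* Traverse-and-clean-up on scan exit: release every tracked object and
 * free the list nodes themselves.  Returns the number released. */
static size_t cleanup_tracked(TrackedNode *head, void (*release)(void *))
{
    size_t released = 0;
    while (head != NULL) {
        TrackedNode *next = head->next;
        release(head->object);
        free(head);
        head = next;
        released++;
    }
    return released;
}
```

This removes both failure modes of the fixed array: no over-allocation for small scans, and no hard cap once the buffer would have filled. (Inside the extension, the list nodes would come from a memory context rather than raw malloc.)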