Comments (7)
FYI, I used this query to see disk usage:
SELECT pg_relation_filepath(oid), relpages*8 as kb, relname FROM pg_class order by kb desc;
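One caveat on the query above: relpages is only a planner estimate that gets refreshed by VACUUM, ANALYZE, and some DDL such as CREATE INDEX, and multiplying by 8 assumes the default 8 kB block size. A minimal sketch of an alternative that asks for the on-disk size directly (the LIMIT of 20 is an arbitrary choice, not something from this thread):

```sql
-- Size of each relation's main data fork, largest first.
-- pg_relation_size() reads the actual size on disk instead of
-- relying on the relpages estimate.
SELECT relname,
       pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
ORDER BY pg_relation_size(oid) DESC NULLS LAST
LIMIT 20;
```

Note that pg_relation_size() does not count TOAST data or indexes, so large vector columns may still be under-reported for the table itself.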
from pgvector.
Hey @ncoder, use pg_table_size to get the table size.
from pgvector.
fwiw, test script:
CREATE TABLE items (embedding vector(1500));
INSERT INTO items (embedding)
SELECT (
SELECT array_agg(i) FROM generate_series(1, 1500) i
) FROM generate_series(1, 100000) n;
SELECT pg_size_pretty(pg_table_size('items')) AS table_size;
SET maintenance_work_mem = '500MB';
CREATE INDEX my_index ON items USING ivfflat (embedding) WITH (lists = 1000);
SELECT pg_size_pretty(pg_total_relation_size('my_index')) AS index_size;
and output:
CREATE TABLE
INSERT 0 100000
table_size
------------
795 MB
(1 row)
SET
CREATE INDEX
index_size
------------
797 MB
(1 row)
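For a rough sanity check of the output above: pgvector documents each vector as taking 4 * dimensions + 8 bytes, which is 6008 bytes at 1500 dimensions. The reported 795 MB over 100,000 rows works out to roughly 8.3 kB per row, so a fair share of the table is storage overhead that the formula does not model. A small sketch that puts the two numbers side by side, assuming the items table from the script still exists:

```sql
-- Compare the documented per-vector size with the measured
-- average on-disk footprint per row (heap + TOAST, no indexes).
SELECT 4 * 1500 + 8                      AS formula_bytes_per_vector,  -- 6008
       pg_table_size('items') / count(*) AS measured_bytes_per_row
FROM items;
```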
from pgvector.
db=# SELECT pg_size_pretty(pg_table_size('qa')) as pg_table_size, pg_size_pretty(pg_total_relation_size('qa')) as pg_total_relation_size;
pg_table_size | pg_total_relation_size
---------------+------------------------
25 GB | 48 GB
from pgvector.
Wait, I made a second index on this table to test... hold up.
(OK, edited now.)
from pgvector.
Using your script, I replicate your results exactly.
I corrected my query to be the equivalent on my data:
SELECT pg_size_pretty(pg_table_size('qa')) as pg_table_size, pg_size_pretty(pg_total_relation_size('qa_embedding_idx')) as pg_total_relation_size;
pg_table_size | pg_total_relation_size
---------------+------------------------
25 GB | 24 GB
(1 row)
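The difference between the two queries comes from what pg_total_relation_size counts: called on a table, it includes the table's indexes and TOAST data, while called on an index it returns only that index. A sketch that breaks the pieces apart in one row, using the qa and qa_embedding_idx names from this thread (the column aliases are just illustrative):

```sql
-- pg_total_relation_size(table) = pg_table_size(table) + pg_indexes_size(table)
SELECT pg_size_pretty(pg_table_size('qa'))                AS table_incl_toast,
       pg_size_pretty(pg_indexes_size('qa'))              AS all_indexes,
       pg_size_pretty(pg_relation_size('qa_embedding_idx')) AS embedding_index_only,
       pg_size_pretty(pg_total_relation_size('qa'))       AS table_plus_indexes;
```

If the table has more than one index, all_indexes will be larger than embedding_index_only.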
from pgvector.
So I accept that nothing is wrong, except my understanding of this page:
https://www.postgresql.org/docs/current/disk-usage.html
Given that I have 3 million rows, the vectors alone should take about 18 GB, so pg_table_size() is in the right ballpark.
Thanks for the pointers, @ankane.
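The 18 GB estimate can be reproduced from pgvector's documented per-vector storage of 4 * dimensions + 8 bytes. The thread does not state the embedding dimension, so the sketch below assumes 1536, which is a guess chosen because it is consistent with the 18 GB figure:

```sql
-- Assumption: 1536-dimensional embeddings (not stated in the thread).
-- 3,000,000 rows * (4 * 1536 + 8) bytes = 18,456,000,000 bytes.
SELECT 3000000::bigint * (4 * 1536 + 8)                 AS estimated_bytes,
       pg_size_pretty(3000000::bigint * (4 * 1536 + 8)) AS estimated_pretty;
```

Keep in mind that pg_size_pretty reports binary units, so 18.456 billion bytes shows up as a smaller number of "GB" than the decimal figure suggests.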
from pgvector.