
Comments (7)

ncoder commented on May 21, 2024

FYI, I used this query to check disk usage:

SELECT pg_relation_filepath(oid), relpages * 8 AS kb, relname FROM pg_class ORDER BY kb DESC;

from pgvector.

ankane commented on May 21, 2024

Hey @ncoder, use pg_table_size to get the table size.


ankane commented on May 21, 2024

fwiw, test script:

CREATE TABLE items (embedding vector(1500));
INSERT INTO items (embedding)
    SELECT (
        SELECT array_agg(i) FROM generate_series(1, 1500) i
    ) FROM generate_series(1, 100000) n;
SELECT pg_size_pretty(pg_table_size('items')) AS table_size;
SET maintenance_work_mem = '500MB';
CREATE INDEX my_index ON items USING ivfflat (embedding) WITH (lists = 1000);
SELECT pg_size_pretty(pg_total_relation_size('my_index')) AS index_size;

and output:

CREATE TABLE
INSERT 0 100000
 table_size 
------------
 795 MB
(1 row)

SET
CREATE INDEX
 index_size 
------------
 797 MB
(1 row)
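
A rough way to sanity-check the index size above is to count pages. This is a minimal sketch in Python, assuming pgvector's documented storage cost of 4 * dimensions + 8 bytes per vector, the default 8 kB PostgreSQL block size, and that a vector is never split across index pages. It ignores page headers, tuple headers, and per-list bookkeeping, so it is an estimate only and not how pgvector itself computes anything. The function names are hypothetical.

```python
import math

BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size in bytes


def vector_bytes(dim: int) -> int:
    """Storage per vector as stated in the pgvector README: 4 bytes per dimension plus 8."""
    return 4 * dim + 8


def estimated_pages(rows: int, dim: int, block_size: int = BLOCK_SIZE) -> int:
    """Pages needed if only whole vectors are packed into each page (headers ignored)."""
    per_page = block_size // vector_bytes(dim)
    if per_page == 0:
        raise ValueError("vector does not fit in a single page")
    return math.ceil(rows / per_page)


# Values taken from the test script: 100,000 rows of vector(1500).
pages = estimated_pages(rows=100_000, dim=1500)
size_mib = pages * BLOCK_SIZE / 2**20
print(pages, size_mib)
```

With 1500 dimensions a vector takes 6008 bytes, so only one fits per 8 kB page. The estimate comes to 100,000 pages, roughly 781 MB in the binary units that pg_size_pretty reports, which is in the same range as the 797 MB shown.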


ncoder commented on May 21, 2024
db=# SELECT pg_size_pretty(pg_table_size('qa')) as pg_table_size, pg_size_pretty(pg_total_relation_size('qa')) as pg_total_relation_size;
 pg_table_size | pg_total_relation_size
---------------+------------------------
 25 GB         | 48 GB


ncoder commented on May 21, 2024

Wait, I created a second index on this table as a test... hold on.

(Good, now edited...)


ncoder commented on May 21, 2024

Using your script, I reproduce your results exactly.

I corrected my query to be the equivalent on my data:

SELECT pg_size_pretty(pg_table_size('qa')) as pg_table_size, pg_size_pretty(pg_total_relation_size('qa_embedding_idx')) as pg_total_relation_size;
 pg_table_size | pg_total_relation_size
---------------+------------------------
 25 GB         | 24 GB
(1 row)


ncoder commented on May 21, 2024

So I accept that nothing is wrong, except my understanding of this page:
https://www.postgresql.org/docs/current/disk-usage.html

Given that I have 3 million rows, the vectors alone should take about 18 GB, so pg_table_size() is in the right ballpark.

Thanks for the pointers, @ankane.
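
The 18 GB figure can be reproduced with a short calculation. This is a minimal sketch in Python, assuming 1500-dimensional vectors (the dimension used in the test script; the actual dimension of the qa table is not stated in the thread) and pgvector's documented cost of 4 * dimensions + 8 bytes per vector. The function name is hypothetical, and the result covers raw vector payload only, with no row, page, or TOAST overhead.

```python
def raw_vector_bytes(rows: int, dim: int) -> int:
    """Raw payload: each vector costs 4 bytes per dimension plus 8 bytes."""
    return rows * (4 * dim + 8)


# Assumption: 3 million rows of 1500-dimensional vectors.
total = raw_vector_bytes(rows=3_000_000, dim=1500)

decimal_gb = total / 1e9     # decimal gigabytes (10^9 bytes)
binary_gb = total / 2**30    # binary units, as pg_size_pretty reports them
print(total, round(decimal_gb, 1), round(binary_gb, 1))
```

Under these assumptions the payload is about 18.0 GB in decimal units, or about 16.8 GB in the binary units used by pg_size_pretty, so a pg_table_size() of 25 GB leaves room for storage overhead on top of the raw vectors.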
