I am Acton, a database testing engineer. Recently, I have been conducting benchmark te

Table Insert Performance with HNSW Index (pgvector issue, closed)

actonxidian99 commented on September 25, 2024

Table Insert Performance with HNSW Index

from pgvector.

Comments (3)

jkatz commented on September 25, 2024

I'm unsure of how you "manually committed" every row when ingesting the data, but I'd make a few recommendations:

If you can, use COPY
If you're using COPY, use BINARY mode to avoid the translation from binary => text => binary, which adds significant overhead. See here for an example:

erikbern/ann-benchmarks#488

If you need to use INSERT and the rows are batched together, use a bulk INSERT statement (multiple rows in the VALUES section).
You can also INSERT (and COPY) concurrently, especially with a target machine with those resources. Here is a test on an older version that shows the benefits of doing that:

https://aws.amazon.com/blogs/database/accelerate-hnsw-indexing-and-searching-with-pgvector-on-amazon-aurora-postgresql-compatible-edition-and-amazon-rds-for-postgresql/
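To illustrate why BINARY mode avoids the text round trip, here is a minimal sketch that builds a binary COPY payload by hand for a table with a single vector column. It is not taken from the thread. The function names `encode_vector` and `build_copy_payload` are hypothetical, and the vector layout (two big-endian 16-bit integers followed by float32 values) is an assumption based on pgvector's binary send/receive format. In practice a driver usually handles this encoding.

```python
import struct

# Fixed 11-byte signature that starts every PostgreSQL binary COPY stream.
COPY_SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def encode_vector(values):
    """Encode floats in the (assumed) pgvector binary layout.

    Big-endian: uint16 dimension count, uint16 unused (0),
    then one float32 per element.
    """
    return struct.pack(f">HH{len(values)}f", len(values), 0, *values)


def build_copy_payload(rows):
    """Build a binary COPY payload for a table with one vector column."""
    out = bytearray(COPY_SIGNATURE)
    out += struct.pack(">ii", 0, 0)  # flags field, header extension length
    for vec in rows:
        field = encode_vector(vec)
        out += struct.pack(">h", 1)           # number of fields in this tuple
        out += struct.pack(">i", len(field))  # byte length of the field data
        out += field
    out += struct.pack(">h", -1)  # trailer: a field count of -1 ends the stream
    return bytes(out)


payload = build_copy_payload([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

The resulting bytes could then be streamed to a statement such as `COPY items (embedding) FROM STDIN WITH (FORMAT BINARY)`, where `items` and `embedding` are placeholder names.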

ankane commented on September 25, 2024

Hi @actonxidian99, you'll also want to create any indexes after loading the data for best performance (docs).
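The ordering suggested above can be sketched as a small helper that returns the statements in the order they would be run: create the table, load the data, and only then build the HNSW index. The table name `items`, the column name `embedding`, the dimension, and the `vector_l2_ops` operator class are illustrative assumptions, not details from the thread.

```python
def load_then_index_statements(table="items", column="embedding", dim=3):
    """Return SQL statements in load-first, index-last order."""
    return [
        # 1. Create the table with no vector index on it yet.
        f"CREATE TABLE {table} (id bigserial PRIMARY KEY, {column} vector({dim}))",
        # 2. Bulk load the rows while no HNSW index has to be maintained.
        f"COPY {table} ({column}) FROM STDIN WITH (FORMAT BINARY)",
        # 3. Build the HNSW index once, over the fully loaded table.
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_l2_ops)",
    ]


for statement in load_then_index_statements():
    print(statement)
```

Building the index last means the HNSW graph is constructed in a single pass, instead of being updated for every inserted row.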

actonxidian99 commented on September 25, 2024

Thank you, @jkatz and @ankane, for your valuable suggestions! First, I switched to bulk loading with COPY in binary mode, which gave a slight performance improvement. Then I found that loading the data first and building the index afterwards was 10 to 20 times faster than building the index first and then inserting the data.

Is this performance normal? Loading 1 million rows first and then building the index took only 2 minutes, whereas building the index first and then loading the data with COPY took 70 to 80 minutes.
