Comments (3)
I'm unsure of how you "manually committed" every row when ingesting the data, but I'd make a few recommendations:
- If you can, use COPY
- If you're using COPY, use BINARY mode to avoid the translation from binary => text => binary, which adds significant overhead. See here for an example:
- If you need to use INSERT and the rows are bulked together, use a bulk INSERT statement (multiple rows in the VALUES section).
- You can also INSERT (and COPY) concurrently, especially with a target machine with those resources. Here is a test on an older version that shows the benefits of doing that:
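To make the binary COPY suggestion concrete, here is a minimal sketch in Python that builds a `COPY ... WITH (FORMAT BINARY)` payload by hand. It assumes a hypothetical table with a `bigint` id column followed by a `vector` column, and it assumes pgvector's binary representation of a vector is a 16-bit dimension count, a 16-bit unused field, and then the float4 values in network byte order. Treat it as an illustration of the wire format rather than a replacement for a driver's own binary COPY support.

```python
import struct

# Fixed 11-byte signature that starts every PostgreSQL binary COPY stream.
PGCOPY_SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def encode_vector(values):
    """Encode one vector, assuming pgvector's binary layout:
    uint16 dimension count, uint16 unused, then big-endian float4 values."""
    return struct.pack(f">HH{len(values)}f", len(values), 0, *values)


def encode_copy_binary(rows):
    """Encode (id, embedding) pairs for a hypothetical (bigint, vector) table."""
    out = bytearray(PGCOPY_SIGNATURE)
    out += struct.pack(">ii", 0, 0)  # flags field, header extension length
    for row_id, embedding in rows:
        out += struct.pack(">h", 2)           # number of fields in this tuple
        out += struct.pack(">iq", 8, row_id)  # field length, then bigint value
        vec = encode_vector(embedding)
        out += struct.pack(">i", len(vec)) + vec  # field length, then vector bytes
    out += struct.pack(">h", -1)  # file trailer
    return bytes(out)


payload = encode_copy_binary([(1, [1.0, 2.0, 3.0]), (2, [4.0, 5.0, 6.0])])
```

With a driver such as psycopg 3, a payload like this could be passed to `copy.write(...)` inside `cursor.copy("COPY items (id, embedding) FROM STDIN WITH (FORMAT BINARY)")`, where `items` and its columns are placeholder names.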
from pgvector.
Hi @actonxidian99, you'll also want to create any indexes after loading the data for best performance (docs).
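The ordering advice above can be sketched as a small Python helper that returns the statements in the order they would be run: load first, index last. The table name `items`, the column name `embedding`, and the choice of an HNSW index with `vector_l2_ops` are assumptions for illustration, not details from this thread.

```python
def bulk_load_plan(table, column, index_method="hnsw", opclass="vector_l2_ops"):
    """Return SQL statements in the suggested order: COPY the data, then index it.

    All identifiers are placeholders supplied by the caller.
    """
    copy_sql = f"COPY {table} (id, {column}) FROM STDIN WITH (FORMAT BINARY)"
    index_sql = f"CREATE INDEX ON {table} USING {index_method} ({column} {opclass})"
    # The index is created only after the load, so it is built once over the
    # full table instead of being updated for every inserted row.
    return [copy_sql, index_sql]


plan = bulk_load_plan("items", "embedding")
```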
Thank you all @jkatz @ankane for your valuable suggestions for improvement! First, I used bulk insertion of binary data with COPY, which resulted in a slight performance improvement. Later, I found that inserting the data first and then building the indexes was 10 to 20 times faster than building the indexes first and then inserting the data.
Is this performance normal? Inserting data for 1 million rows first and then building indexes only took 2 minutes. However, building indexes first and then inserting data (COPY) took as long as 70 to 80 minutes.