
Comments (3)

jkatz commented on September 25, 2024

I'm unsure of how you "manually committed" every row when ingesting the data, but I'd make a few recommendations:

  1. If you can, use COPY
  2. If you're using COPY, use BINARY mode to avoid the translation from binary => text => binary, which adds significant overhead. See here for an example:

erikbern/ann-benchmarks#488

  3. If you need to use INSERT and the rows are batched together, use a bulk INSERT statement (multiple rows in the VALUES section).

  4. You can also INSERT (and COPY) concurrently, especially if the target machine has the resources for it. Here is a test on an older version that shows the benefits of doing that:

https://aws.amazon.com/blogs/database/accelerate-hnsw-indexing-and-searching-with-pgvector-on-amazon-aurora-postgresql-compatible-edition-and-amazon-rds-for-postgresql/
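As a rough illustration of recommendations 1 to 3, here is a minimal Python sketch. It assumes psycopg 3 and the pgvector Python package are installed, and it assumes a hypothetical table `items` with a single `vector` column named `embedding`; none of these names come from the thread. The helper function names are also made up for this example, and the table and column names are interpolated directly into the SQL, so they must be trusted values.

```python
from typing import Sequence


def copy_sql(table: str, columns: Sequence[str]) -> str:
    """Build a COPY ... FROM STDIN statement that requests the binary format."""
    return f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH (FORMAT BINARY)"


def multirow_insert_sql(table: str, columns: Sequence[str], n_rows: int) -> str:
    """Build one INSERT whose VALUES list holds n_rows placeholder groups."""
    group = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ", ".join([group] * n_rows)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values}"


def load_with_binary_copy(conn, embeddings) -> None:
    """Stream embeddings (e.g. NumPy arrays) into items.embedding via binary COPY.

    `conn` is assumed to be an open psycopg 3 connection to a database
    where the vector extension and the hypothetical `items` table exist.
    """
    from pgvector.psycopg import register_vector  # third-party, imported lazily

    register_vector(conn)  # lets psycopg adapt arrays to the vector type
    with conn.cursor() as cur:
        with cur.copy(copy_sql("items", ["embedding"])) as copy:
            copy.set_types(["vector"])  # binary COPY needs explicit column types
            for embedding in embeddings:
                copy.write_row([embedding])
    conn.commit()  # one commit for the whole load, not one per row


def load_with_multirow_insert(conn, rows, batch_size: int = 1000) -> None:
    """Fallback when COPY is not an option: many rows per INSERT statement.

    `rows` is a sequence of 1-tuples, each holding one embedding.
    """
    from pgvector.psycopg import register_vector

    register_vector(conn)
    rows = list(rows)
    with conn.cursor() as cur:
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            sql = multirow_insert_sql("items", ["embedding"], len(batch))
            # Flatten the batch so each placeholder receives one value.
            cur.execute(sql, [value for row in batch for value in row])
    conn.commit()
```

For example, `multirow_insert_sql("items", ["id", "embedding"], 2)` produces a single statement with two `(%s, %s)` groups. PostgreSQL limits the number of bind parameters per statement, so `batch_size` times the number of columns should stay well below that limit.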

from pgvector.

ankane commented on September 25, 2024

Hi @actonxidian99, you'll also want to create any indexes after loading the data for best performance (docs).
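The ordering can be sketched as follows, again assuming psycopg 3 and the pgvector Python package. The table `items`, the column `embedding`, the function names, and the choice of an HNSW index with `vector_l2_ops` are assumptions for illustration; the thread does not say which index type or distance operator is in use.

```python
def schema_sql(dims: int) -> list[str]:
    """Statements that create the extension and an empty, unindexed table."""
    return [
        "CREATE EXTENSION IF NOT EXISTS vector",
        f"CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector({dims}))",
    ]


# Assumed index definition; swap in the index type and operator class you use.
INDEX_SQL = "CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)"


def load_then_index(conn, embeddings, dims: int) -> None:
    """Create the table, bulk-load it, and only then build the index.

    `conn` is assumed to be an open psycopg 3 connection.
    """
    from pgvector.psycopg import register_vector  # third-party, imported lazily

    with conn.cursor() as cur:
        for statement in schema_sql(dims):
            cur.execute(statement)
        register_vector(conn)  # must run after the extension exists

        # Step 1: load every row while the table has no vector index.
        with cur.copy("COPY items (embedding) FROM STDIN WITH (FORMAT BINARY)") as copy:
            copy.set_types(["vector"])
            for embedding in embeddings:
                copy.write_row([embedding])

        # Step 2: build the index once, over the fully loaded table.
        cur.execute(INDEX_SQL)
    conn.commit()
```

With this order the index is built in a single pass over the loaded data, rather than being updated once for every row that arrives.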


actonxidian99 commented on September 25, 2024

Thank you, @jkatz and @ankane, for your valuable suggestions! First, I switched to bulk-loading the data with COPY in binary mode, which gave a slight performance improvement. Later, I found that loading the data first and then building the indexes was 10 to 20 times faster than building the indexes first and then inserting the data.

Is this performance difference normal? Loading 1 million rows first and then building the indexes took only 2 minutes. However, building the indexes first and then inserting the data (with COPY) took as long as 70 to 80 minutes.

