
Comments (5)

jkatz commented on June 2, 2024

What you're describing is independent of where you run PostgreSQL.

The switch from EXTENDED to PLAIN storage does affect how PostgreSQL costs the sequential scan and plans parallel workers. Specifically, PostgreSQL only accounts for the value of relpages in the main table, not the TOAST table (where vectors above the storage threshold, typically 2KB by default, are stored out-of-line). With PLAIN storage, because the vectors are stored in the table pages, PostgreSQL sees a larger value of relpages and will plan for more workers. While this post describes this in relation to creating workers for parallel index builds, it also applies to sequential scans.
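As a minimal sketch of what that change looks like (the table name "items" and column "embedding" are assumed for illustration, not taken from this thread):

-- Switch the vector column to PLAIN storage so values are kept in the
-- main heap pages instead of being TOASTed out-of-line.
-- Note: SET STORAGE only affects rows written afterwards; existing rows
-- keep their current storage until the table is rewritten.
ALTER TABLE items ALTER COLUMN embedding SET STORAGE PLAIN;

-- After an ANALYZE, relpages reflects the in-line vector data, which is
-- what the planner consults when sizing parallel scans.
ANALYZE items;
SELECT relname, relpages FROM pg_class WHERE relname = 'items';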

For the rest of your question, it'd help to understand more about your experiment, e.g. how frequently you ran your queries, and the output of EXPLAIN ANALYZE, to see whether the execution plan changed at all.


david-gang commented on June 2, 2024

Thanks @jkatz for your help !

It is a production system in the early stages, with a request rate of about 1 query per second.
Every day we load about 50,000 more vectors and examine performance. The graph above spans about one week, so between the drop in query time and the right edge of the graph we loaded about 250,000 more rows.

The EXPLAIN ANALYZE output is:

Limit  (cost=222555.47..222567.14 rows=100 width=24) (actual time=530.305..530.491 rows=100 loops=1)
   ->  Gather Merge  (cost=222555.47..283149.37 rows=519340 width=24) (actual time=530.303..530.473 rows=100 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=221555.45..222204.63 rows=259670 width=24) (actual time=519.330..519.349 rows=78 loops=3)
               Sort Key: (abs((embedding <#> '[...]'::vector))) DESC
               Sort Method: top-N heapsort  Memory: 37kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 38kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 38kB
               ->  Parallel Seq Scan on vectors v  (cost=0.00..211631.05 rows=259670 width=24) (actual time=0.008..449.042 rows=207736 loops=3)
 Planning Time: 0.345 ms
 Execution Time: 530.576 ms
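For reference, the plan above is consistent with a query of roughly this shape, reconstructed from the Limit node and sort key (the projected columns and the vector literal are unknown; '[...]' stays elided as in the plan):

-- A reconstruction, not the exact query text from the application.
SELECT *  -- projected columns unknown; width=24 suggests a few small ones
FROM vectors v
ORDER BY abs(embedding <#> '[...]'::vector) DESC
LIMIT 100;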

and here is the EXPLAIN ANALYZE output after I run the command:

SET max_parallel_workers_per_gather = 4;

Limit  (cost=217152.52..217164.49 rows=100 width=24) (actual time=545.110..547.536 rows=100 loops=1)
   ->  Gather Merge  (cost=217152.52..291815.21 rows=623568 width=24) (actual time=545.108..547.524 rows=100 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         ->  Sort  (cost=216152.46..216542.19 rows=155892 width=24) (actual time=519.393..519.407 rows=85 loops=5)
               Sort Key: (abs((embedding <#> '[...]'::vector))) DESC
               Sort Method: top-N heapsort  Memory: 38kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 38kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 37kB
               Worker 2:  Sort Method: top-N heapsort  Memory: 38kB
               Worker 3:  Sort Method: top-N heapsort  Memory: 37kB
               ->  Parallel Seq Scan on vectors v  (cost=0.00..210194.38 rows=155892 width=24) (actual time=0.009..445.478 rows=124713 loops=5)
 Planning Time: 0.095 ms
 Execution Time: 547.578 ms

As you can see, setting max_parallel_workers_per_gather to a higher value did not help. I also tried the same on my Mac M1, which has lots of CPUs.


jkatz commented on June 2, 2024

I should have added EXPLAIN (ANALYZE, BUFFERS) to see how much you're reading from memory vs. going to disk, which could impact this, as well as your total data set size. In my local tests with a data set of 1,000,000 768-dim vectors (using EXTENDED storage, but with min_parallel_table_scan_size set to 1 to maximize parallel workers), execution time did continue to go down until I reached 6 parallel workers, though the improvement wasn't as dramatic as when I had to retrieve pages from disk.
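The session setup for such a test might look as follows (a sketch: min_parallel_table_scan_size = 1 comes from the comment above, while the worker cap of 6 is an assumed value matching where the gains stopped):

-- min_parallel_table_scan_size = 1 (one 8kB block) makes even a small
-- table eligible for the maximum number of planned workers.
SET min_parallel_table_scan_size = 1;
SET max_parallel_workers_per_gather = 6;
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- the query under test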


david-gang commented on June 2, 2024

EXPLAIN (ANALYZE, BUFFERS) gives the following:

Limit  (cost=230157.22..230168.89 rows=100 width=24) (actual time=648.258..648.449 rows=100 loops=1)
   Buffers: shared hit=215569
   ->  Gather Merge  (cost=230157.22..293014.38 rows=538738 width=24) (actual time=648.256..648.427 rows=100 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=215569
         ->  Sort  (cost=229157.20..229830.62 rows=269369 width=24) (actual time=633.809..633.823 rows=81 loops=3)
               Sort Key: ((embedding <#> '[...]'::vector))
               Sort Method: top-N heapsort  Memory: 38kB
               Buffers: shared hit=215569
               Worker 0:  Sort Method: top-N heapsort  Memory: 38kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 37kB
               ->  Parallel Seq Scan on vectors v  (cost=0.00..218862.11 rows=269369 width=24) (actual time=0.017..542.565 rows=215495 loops=3)
                     Buffers: shared hit=215495
 Planning:
   Buffers: shared hit=100
 Planning Time: 0.501 ms
 Execution Time: 648.568 ms
(18 rows)

after setting

SET max_parallel_workers_per_gather = 4;

 Limit  (cost=224692.36..224704.33 rows=100 width=24) (actual time=612.711..613.165 rows=100 loops=1)
   Buffers: shared hit=215643
   ->  Gather Merge  (cost=224692.36..302098.89 rows=646484 width=24) (actual time=612.709..613.147 rows=100 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         Buffers: shared hit=215643
         ->  Sort  (cost=223692.30..224096.36 rows=161621 width=24) (actual time=578.866..578.881 rows=86 loops=5)
               Sort Key: ((embedding <#> '[...]'::vector))
               Sort Method: top-N heapsort  Memory: 38kB
               Buffers: shared hit=215643
               Worker 0:  Sort Method: top-N heapsort  Memory: 38kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 38kB
               Worker 2:  Sort Method: top-N heapsort  Memory: 38kB
               Worker 3:  Sort Method: top-N heapsort  Memory: 38kB
               ->  Parallel Seq Scan on vectors v  (cost=0.00..217515.26 rows=161621 width=24) (actual time=0.008..493.254 rows=129297 loops=5)
                     Buffers: shared hit=215495
 Planning Time: 0.124 ms
 Execution Time: 613.212 ms
(18 rows)

As we can see, the problem is not disk reads, since every buffer access is a shared buffer hit.
Does this EXPLAIN (BUFFERS) output make sense? It looks like my query is served entirely from RAM, so using 4 workers should make it much faster. I just want to ensure there is no sort of lock that prevents the workers from running in parallel, or some similar issue.
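One way to double-check that the table really fits in memory is to compare its heap size with shared_buffers (a sketch; "vectors" is the table name from the plans above):

-- ~215,495 heap pages of 8kB each is roughly 1.6GB, so shared_buffers
-- must be at least that large for the all-hit pattern shown above.
SELECT pg_size_pretty(pg_relation_size('vectors'));
SHOW shared_buffers;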


ankane commented on June 2, 2024

Hi @david-gang, the EXPLAIN output looks consistent with the data being in memory. There are coordination costs with parallel workers, so beyond a certain point, adding more workers won't improve performance (this isn't specific to pgvector).

