Help me!!! I created the index using hnsw, but I can't use the index for even the simp

pg-vector not using hnsw indexes about pgvector HOT 9 CLOSED

lin-goo commented on June 7, 2024 1

pg-vector not using hnsw indexes

from pgvector.

Comments (9)

lin-goo commented on June 7, 2024 1

"No, it means the planner will use (and show) a different plan when there's more data." (quoting ankane)

Hi~ I have now increased the amount of data to 8000 entries and the index is now working properly. Thanks again for your answer!

-- analyse result
Limit  (cost=108.60..108.72 rows=2 width=16) (actual time=4.346..4.375 rows=2 loops=1)
  ->  Index Scan using faces_tsv_content_hnsw_idx on faces  (cost=108.60..628.14 rows=8923 width=16) (actual time=4.344..4.372 rows=2 loops=1)
        Order By: (tsv_content <=> '[-0.121626005... ,0.015510366]'::vector)
Planning Time: 0.357 ms
Execution Time: 4.467 ms

ankane commented on June 7, 2024

Hi @lin-goo, it looks like you only have ~500 rows, so a table scan will likely be around the same speed. See the docs for how to encourage the planner to use the index.
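For reference, one way to encourage the planner is to discourage sequential scans for a single transaction. This is a minimal sketch, assuming the `tf` table and query posted elsewhere in this thread. Note that `enable_seqscan = off` only penalizes sequential scans in the cost model; it does not guarantee that the HNSW index is chosen.

```sql
BEGIN;

-- SET LOCAL limits the setting to the current transaction,
-- so other sessions and later queries are unaffected.
SET LOCAL enable_seqscan = off;

-- Same query as in the thread; compare this plan with the default one.
EXPLAIN ANALYZE
SELECT id FROM tf
ORDER BY tsv_content <=> '[1, 2, 3]'
LIMIT 2;

COMMIT;
```

This is meant as a diagnostic for small development tables rather than a setting to leave on globally.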

lin-goo commented on June 7, 2024

I recreated the table with fewer dimensions, and this time the index is used. Details below:

-- create table sql
CREATE TABLE tf (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL DEFAULT 0,
    tsv_content vector(3) UNIQUE NOT NULL,
    created_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    deleted_time BIGINT DEFAULT 0
) ;
CREATE INDEX tf_tsv_content_hnsw_idx ON tf USING hnsw (tsv_content vector_cosine_ops) WITH (m = 16, ef_construction = 64);


-- query sql
EXPLAIN ANALYSE SELECT id FROM tf ORDER BY
    tsv_content <=> '[1, 2, 3]'
LIMIT 2;


-- analyse result
Limit  (cost=4.48..4.60 rows=2 width=16) (actual time=0.045..0.047 rows=2 loops=1)
  ->  Index Scan using tf_tsv_content_hnsw_idx on tf  (cost=4.48..54.60 rows=810 width=16) (actual time=0.043..0.044 rows=2 loops=1)
        Order By: (tsv_content <=> '[0,0,0]'::vector)
Planning Time: 0.077 ms
Execution Time: 0.070 ms

lin-goo commented on June 7, 2024

Does whether the index is used depend on the number of dimensions in the vector? @ankane

lin-goo commented on June 7, 2024

"Hi @lin-goo, it looks like you only have ~500 rows, so a table scan will likely be around the same speed. See the docs for how to encourage the planner to use the index." (quoting ankane)

There are only 500 rows because the project is still in development and we are not storing more data yet. The production environment will hold much more data.

ankane commented on June 7, 2024

The difference likely has to do with TOAST (vectors over 498 dimensions / 2 KB are stored out-of-line by default, and this isn't included in the table scan cost estimate). When there are more rows, it should use the index.
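To check whether TOAST is in play for a given table, the standard PostgreSQL catalog and size functions can help. This is a rough sketch, assuming the original table is named `faces` (as the index name `faces_tsv_content_hnsw_idx` suggests) and has the same `id` and `tsv_content` columns as `tf`.

```sql
-- Storage strategy of the vector column:
-- 'p' = plain, 'm' = main, 'e' = external, 'x' = extended.
-- 'e' and 'x' allow values to be moved out-of-line into the TOAST table.
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'faces'::regclass
  AND attname = 'tsv_content';

-- Stored size in bytes of a few vectors.
SELECT id, pg_column_size(tsv_content) AS vector_bytes
FROM faces
LIMIT 5;

-- pg_relation_size covers only the main heap;
-- pg_table_size also includes the TOAST table
-- (plus the free space map and visibility map).
SELECT pg_relation_size('faces') AS heap_bytes,
       pg_table_size('faces')    AS heap_plus_toast_bytes;
```

A large gap between the two sizes would suggest that most of the vector data lives in TOAST, which is the part the comment above says the table scan estimate leaves out.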

lin-goo commented on June 7, 2024

Do you mean that even though it doesn't show the use of indexes in the analysis results, it is used in the actual query?

ankane commented on June 7, 2024

No, it means the planner will use (and show) a different plan when there's more data.
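One way to see the plan change is to load more rows into a scratch table and rerun `EXPLAIN`. The sketch below is illustrative only: the table name `faces_demo`, the index name, and the 512-dimension size are assumptions, not details from this thread, and the vectors are random.

```sql
-- Hypothetical scratch table; name and dimension count are assumptions.
CREATE TABLE faces_demo (
    id BIGSERIAL PRIMARY KEY,
    tsv_content vector(512) NOT NULL
);

CREATE INDEX faces_demo_hnsw_idx ON faces_demo
    USING hnsw (tsv_content vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Insert 8000 random vectors. Referencing g inside the subquery makes it
-- correlated, so a fresh array is generated for every outer row.
INSERT INTO faces_demo (tsv_content)
SELECT (SELECT array_agg(random() + g * 0)
        FROM generate_series(1, 512))::vector(512)
FROM generate_series(1, 8000) AS g;

-- Refresh planner statistics after the bulk load.
ANALYZE faces_demo;

-- Use an existing row as the query vector to avoid a 512-value literal.
EXPLAIN ANALYZE
SELECT id FROM faces_demo
ORDER BY tsv_content <=> (SELECT tsv_content FROM faces_demo WHERE id = 1)
LIMIT 2;
```

Running the same `EXPLAIN` after inserting only a few hundred rows gives a plan to compare against.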

lin-goo commented on June 7, 2024

I'll try increasing the amount of data then and see if the index is used, thank you very much for your reply!
