Comments (9)
No, it means the planner will use (and show) a different plan when there's more data.
Hi~ I have now increased the data to 8000 rows, and the planner now uses the index. Thanks again for your answer!
-- analyse result
Limit  (cost=108.60..108.72 rows=2 width=16) (actual time=4.346..4.375 rows=2 loops=1)
  ->  Index Scan using faces_tsv_content_hnsw_idx on faces  (cost=108.60..628.14 rows=8923 width=16) (actual time=4.344..4.372 rows=2 loops=1)
        Order By: (tsv_content <=> '[-0.121626005... ,0.015510366]'::vector)
Planning Time: 0.357 ms
Execution Time: 4.467 ms
from pgvector.
Hi @lin-goo, it looks like you only have ~500 rows, so a table scan will likely be around the same speed. See the docs for how to encourage the planner to use the index.
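For readers who want to try this, here is a minimal sketch of one way to nudge the planner toward the index for a single query: discourage sequential scans inside a transaction. It assumes the three-dimensional `tf` table and `tsv_content` column that appear in this thread; treat it as a diagnostic aid, not a production setting.

```sql
-- Sketch only: discourage sequential scans for this transaction,
-- so the planner prefers the HNSW index if it is usable at all.
BEGIN;
SET LOCAL enable_seqscan = off;  -- reverts automatically at COMMIT/ROLLBACK

-- Assumption: table tf and column tsv_content as defined in this thread.
EXPLAIN ANALYZE
SELECT id FROM tf
ORDER BY tsv_content <=> '[1, 2, 3]'
LIMIT 2;

COMMIT;
```

If the plan still shows a sequential scan with this setting, the index probably cannot serve the query as written (for example, the ORDER BY operator does not match the index's operator class).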
I recreated the table with a smaller vector dimension, and this time the index was used. Details below:
-- create table sql
CREATE TABLE tf (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL DEFAULT 0,
    tsv_content vector(3) UNIQUE NOT NULL,
    created_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    deleted_time BIGINT DEFAULT 0
);
CREATE INDEX tf_tsv_content_hnsw_idx ON tf USING hnsw (tsv_content vector_cosine_ops) WITH (m = 16, ef_construction = 64);
-- query sql
EXPLAIN ANALYSE SELECT id FROM tf
ORDER BY tsv_content <=> '[1, 2, 3]'
LIMIT 2;
-- analyse result
Limit  (cost=4.48..4.60 rows=2 width=16) (actual time=0.045..0.047 rows=2 loops=1)
  ->  Index Scan using tf_tsv_content_hnsw_idx on tf  (cost=4.48..54.60 rows=810 width=16) (actual time=0.043..0.044 rows=2 loops=1)
        Order By: (tsv_content <=> '[0,0,0]'::vector)
Planning Time: 0.077 ms
Execution Time: 0.070 ms
Does the use of an index correlate with the size of the vector dimension? @ankane
Hi @lin-goo, it looks like you only have ~500 rows, so a table scan will likely be around the same speed. See the docs for how to encourage the planner to use the index.
There are only 500 rows because the project is still in development and doesn't store more data yet. The production environment will hold far more data.
The difference likely has to do with TOAST (vectors over 498 dimensions / 2 KB are stored out-of-line by default, and this isn't included in the table scan cost estimate). When there are more rows, it should use the index.
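As a rough check of the TOAST point above, the following sketch compares the main heap size with the total table size, which includes TOAST. It assumes the original table is named `faces` with a `tsv_content` vector column, as the index name in this thread suggests; the size formula (4 bytes per dimension plus 8 bytes) is the one given in the pgvector documentation.

```sql
-- Storage for a 498-dimensional vector: 4 bytes per dimension + 8 bytes.
SELECT 4 * 498 + 8 AS bytes_for_498_dims;  -- 2000

-- Assumption: table faces, column tsv_content (names taken from the thread).
-- Bytes actually stored for a few vector values.
SELECT pg_column_size(tsv_content) AS stored_bytes
FROM faces
LIMIT 5;

-- Main heap only vs. heap plus TOAST (indexes are excluded from both).
SELECT pg_size_pretty(pg_relation_size('faces')) AS heap_only,
       pg_size_pretty(pg_table_size('faces'))    AS heap_plus_toast;
```

A large gap between the two sizes would suggest that most of the vector data lives out-of-line, which is the part the sequential-scan cost estimate does not account for.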
Do you mean that the index is used in the actual query even though the EXPLAIN ANALYSE output doesn't show it?
No, it means the planner will use (and show) a different plan when there's more data.
I'll try increasing the amount of data then and see if the index is used, thank you very much for your reply!
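One way to run that experiment without production data is to load random vectors and re-check the plan. This is a sketch under assumptions: it targets the three-dimensional `tf` table defined earlier in this thread, and the row count of 8000 simply mirrors the figure reported in the thread. For a higher-dimensional table the array would need one `random()` call per dimension.

```sql
-- Assumption: the tf table from this thread (tsv_content is vector(3), UNIQUE).
-- Random values make a UNIQUE collision very unlikely, though not impossible.
INSERT INTO tf (tsv_content)
SELECT ARRAY[random(), random(), random()]::vector
FROM generate_series(1, 8000);

-- Refresh planner statistics so row estimates reflect the new data.
ANALYZE tf;

-- Re-run the query and inspect which plan the planner now chooses.
EXPLAIN ANALYZE
SELECT id FROM tf
ORDER BY tsv_content <=> '[1, 2, 3]'
LIMIT 2;
```

Running `ANALYZE` before `EXPLAIN` matters here, because stale statistics can leave the planner costing the query as if the table were still small.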