stac-utils / pgstac

Schema, functions and a python library for storing and accessing STAC collections and items in PostgreSQL

License: MIT License

Dockerfile 0.06% Shell 0.56% PLpgSQL 97.96% Python 1.37% Rust 0.04%
postgis stac sql

pgstac's Issues

add bulk insert function

In order to add a bulk insert endpoint it would be great to have a create_items function

CREATE OR REPLACE FUNCTION create_items(datas jsonb) RETURNS VOID AS $$
    -- unpack the JSON array and insert each element as an item
    INSERT INTO items (content) SELECT * FROM jsonb_array_elements(datas);
    -- recompute/backfill the datetime partitions for the new items
    SELECT backfill_partitions();
$$ LANGUAGE SQL SET SEARCH_PATH TO pgstac, public;
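For reference, a hedged usage sketch of the proposed function; the item bodies below are placeholders, not real STAC items:

-- Hypothetical call to the proposed function: pass a JSON array of items.
SELECT create_items('[
    {"id": "item-1", "collection": "c1", "properties": {"datetime": "2021-01-01T00:00:00Z"}},
    {"id": "item-2", "collection": "c1", "properties": {"datetime": "2021-01-08T00:00:00Z"}}
]'::jsonb);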

Collections search does not work with pgstac >= 0.3

First mentioned in #39 (comment), the collections search doesn't seem to work anymore.

I think I found out what is going on:

# using `collections`
postgis=# SELECT * from search('{"collections": ["facebook-population-density"]}');
NOTICE:  SEARCH CQL 1: {"collections": ["facebook-population-density"]}
NOTICE:  SEARCH CQL 2: {"filter": {}, "collections": ["facebook-population-density"]}
NOTICE:  newprops: []
NOTICE:  SEARCH CQL Final: {"filter": {}, "collections": ["facebook-population-density"]}
NOTICE:  j: {}, op: <NULL>, jtype: object
NOTICE:  parsing object
NOTICE:  Getting stats for  TRUE 
NOTICE:  Estimated Count: 253
NOTICE:  Actual Count: 253
NOTICE:  SEARCH: (a8cb91b69e25c49753b759a3f73e9462,"{""collections"": [""facebook-population-density""]}"," TRUE ","datetime DESC, id DESC","2021-08-16 10:19:03.68915+00",1,"2021-08-16 10:19:03.68915+00",253,253)

# using `collection` (without the trailing "s")
postgis=# SELECT * from search('{"collection": ["facebook-population-density"]}');
NOTICE:  SEARCH: (ef6ca26f3bf560dfc90e2227abfa5735,"{""collection"": [""facebook-population-density""]}","(((collection_id = ANY ( '{facebook-population-density}'::text[] ))))","datetime DESC, id DESC","2021-08-16 10:20:43.002928+00",2,"2021-08-16 10:18:14.842254+00",91,90)
NOTICE:  FULL QUERY (((collection_id = ANY ( '{facebook-population-density}'::text[] )))) 00:00:00.011999

It seems that when the query is translated to CQL, the code expects `collection` while the spec says `collections`:

ELSIF path_elements[1] = 'collection' THEN

pgstac/sql/004_search.sql

Lines 149 to 158 in be27663

IF j ? 'collection' THEN
newprop := jsonb_build_object(
'in',
jsonb_build_array(
'{"property":"collection"}'::jsonb,
j->'collection'
)
);
newprops := jsonb_insert(newprops, '{1}', newprop);
END IF;

) - '{id,collection,datetime,bbox,intersects}'::text[];

In the previous (~0.2.*) versions, here is what we had:

pgstac/sql/004_search.sql

Lines 454 to 456 in c9f0f64

IF _search ? 'collections' THEN
qa := array_append(qa, in_array_q('collection_id', _search->'collections'));
END IF;
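A hedged sketch of what the fix might look like in the 0.3+ code path — simply accepting the spec's plural key when translating to CQL (a hypothetical adjustment, not the actual patch):

IF j ? 'collections' THEN
    newprop := jsonb_build_object(
        'in',
        jsonb_build_array(
            '{"property":"collection"}'::jsonb,
            j->'collections'
        )
    );
    newprops := jsonb_insert(newprops, '{1}', newprop);
END IF;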

cc @bitner @lossyrob

load_ndjson with loadopt.upsert throws UniqueViolationError

pgstac version - 0.4.3
pypgstac version - 0.4.5

When using load_ndjson with the method loadopt.upsert (https://github.com/NASA-IMPACT/cmr-pgstac-loader/blob/main/lambdas/pgstac_loader/handler.py#L40), records with duplicate ids throw UniqueViolationError. An example full stack trace:

[ERROR] UniqueViolationError: duplicate key value violates unique constraint "items_p2022w11_id_pk"
DETAIL:  Key (id)=(HLS.S30.T49TCF.2022073T032539.v2.0) already exists.
Traceback (most recent call last):
  File "/var/task/aws_lambda_powertools/middleware_factory/factory.py", line 133, in wrapper
    response = middleware()
  File "/var/task/aws_lambda_powertools/utilities/data_classes/event_source.py", line 39, in event_source
    return handler(data_class(event), context)
  File "/var/task/handler.py", line 36, in handler
    asyncio.run(
  File "/var/lang/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/var/lang/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/var/task/pypgstac/load.py", line 246, in load_ndjson
    await load_iterator(f, table, conn, method)
  File "/var/task/pypgstac/load.py", line 234, in load_iterator
    await copy_upsert(iter, table, conn)
  File "/var/task/pypgstac/load.py", line 212, in copy_upsert
    await conn.copy_to_table(
  File "/var/task/asyncpg/connection.py", line 897, in copy_to_table
    return await self._copy_in(copy_stmt, source, timeout)
  File "/var/task/asyncpg/connection.py", line 1101, in _copy_in
    return await self._protocol.copy_in(
  File "asyncpg/protocol/protocol.pyx", line 529, in copy_in
    status_msg = await waiter

I can confirm that loadopt.upsert appears to work in pypgstac tests by modifying https://github.com/stac-utils/pgstac/blob/main/test/testdata/items.ndjson to include an item with a duplicate id and running the pypgstac LoadTest.

Inconsistently, I am also seeing the following error immediately afterwards:

[ERROR] CardinalityViolationError: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
Traceback (most recent call last):
  File "/var/task/aws_lambda_powertools/middleware_factory/factory.py", line 133, in wrapper
    response = middleware()
  File "/var/task/aws_lambda_powertools/utilities/data_classes/event_source.py", line 39, in event_source
    return handler(data_class(event), context)
  File "/var/task/handler.py", line 36, in handler
    asyncio.run(
  File "/var/lang/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/var/lang/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/var/task/pypgstac/load.py", line 246, in load_ndjson
    await load_iterator(f, table, conn, method)
  File "/var/task/pypgstac/load.py", line 234, in load_iterator
    await copy_upsert(iter, table, conn)
  File "/var/task/pypgstac/load.py", line 212, in copy_upsert
    await conn.copy_to_table(
  File "/var/task/asyncpg/connection.py", line 897, in copy_to_table
    return await self._copy_in(copy_stmt, source, timeout)
  File "/var/task/asyncpg/connection.py", line 1101, in _copy_in
    return await self._protocol.copy_in(
  File "asyncpg/protocol/protocol.pyx", line 529, in copy_in
    status_msg = await waiter
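A hedged sketch of one way the loader could avoid the second error: deduplicate the staged rows by id before the single upsert statement runs. pgstactemp is a hypothetical staging table (borrowed from the \copy discussion further down), not necessarily what pypgstac uses internally:

-- Hypothetical dedup step: keep only one row per item id in the staging
-- table so ON CONFLICT DO UPDATE never touches the same row twice.
DELETE FROM pgstactemp a
USING pgstactemp b
WHERE a.content->>'id' = b.content->>'id'
  AND a.ctid < b.ctid;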

Issue running migration against DB with sslmode=require

Running migrations against a --dsn that has ?sslmode=require (or with PGSSLMODE=require), I get:

ERROR:asyncio:Fatal error on SSL transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x7f4882e35e20>
transport: <_SelectorSocketTransport closing fd=8>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/asyncio/selector_events.py", line 910, in write
    n = self._sock.send(data)
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/asyncio/sslproto.py", line 685, in _process_write_backlog
    self._transport.write(chunk)
  File "/opt/conda/lib/python3.8/asyncio/selector_events.py", line 916, in write
    self._fatal_error(exc, 'Fatal write error on socket transport')
  File "/opt/conda/lib/python3.8/asyncio/selector_events.py", line 711, in _fatal_error
    self._force_close(exc)
  File "/opt/conda/lib/python3.8/asyncio/selector_events.py", line 723, in _force_close
    self._loop.call_soon(self._call_connection_lost, exc)
  File "/opt/conda/lib/python3.8/asyncio/base_events.py", line 719, in call_soon
    self._check_closed()
  File "/opt/conda/lib/python3.8/asyncio/base_events.py", line 508, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

I'm able to connect to the database using the same dsn using psql. Any ideas?

Push to dockerhub during CI publish workflow

As follow-up work on #18, we need to add the ability to push Docker images to the cipublish script. PyPI publishing is already handled by cipublish, and adding Docker image publishing will cover most of the remaining functionality in the Makefile, which should be removed once this and any other functionality is covered by scripts/.

cannot start database service built with `pgstac:v0.3.5`

docker-compose.yml

  database:
    container_name: stac-db
    platform: linux/amd64
    image: ghcr.io/stac-utils/pgstac:v0.3.5
    environment:
      - POSTGRES_USER=username
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=postgis
    ports:
      - "5439:5432"
    command: postgres -N 500

docker-compose up database

stac-db  | /usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/pgstac.sql
stac-db  | BEGIN
stac-db  | psql:/docker-entrypoint-initdb.d/pgstac.sql:2: error: sql/001_core.sql: No such file or directory

cc @bitner

Add documentation for using `\copy` for faster data ingest

Any thoughts on adding a section in the documentation (maybe for "advanced users") for how to ingest data to pgstac using \copy?

I tried using pypgstac load, but for some reason it appeared to hang and after 24 hours I killed that process.

I saw that the existing load commands basically use copy through asyncpg:

await conn.copy_to_table(
    table,
    source=bytes_iter,
    columns=["content"],
    format="csv",
    quote=chr(27),
    delimiter=chr(31),
)

So I figured it would be worth exploring using \copy directly through psql and seeing if that would be faster. The following steps worked for me, appearing to load 1.4 million Landsat STAC items (the same file as above) in about 30 minutes on my local Mac machine (the steps are collected into a single psql session after the list).

  • Added a volume to the docker-compose section to be able to copy in data
  • Connect to psql with docker exec -it stac-db psql -U username -d postgis
  • I think I just needed to run SET SEARCH_PATH TO pgstac, public; in psql. For some reason the pgstac SQL definitions weren't picked up automatically, so I also had to run them manually (\i /docker-entrypoint-initdb.d/001_core.sql, etc.)
  • Following roughly https://github.com/stac-utils/pgstac/blob/8d85be7f26b0be0384140aed6d850d122a2a9947/pypgstac/pypgstac/load.py#L133-L161
    • Create a temp table CREATE TEMP TABLE pgstactemp (content jsonb);
    • Copy data to the temp table from the volume (6 minutes) \copy pgstactemp FROM '/app/data/stac_items.jsonl';
    • Insert data into pgstac.items (14 minutes)
       INSERT INTO pgstac.items (content)
       SELECT content FROM pgstactemp
       ON CONFLICT DO NOTHING;
    • Recompute partitions (10 minutes) (SELECT backfill_partitions();)
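For convenience, here are the steps above collected into a single psql session (the path, database name, and search_path are the ones used in this report; adjust to your setup):

-- Run inside: docker exec -it stac-db psql -U username -d postgis
SET SEARCH_PATH TO pgstac, public;

-- Stage the raw ndjson (one JSON document per line) in a temp table.
CREATE TEMP TABLE pgstactemp (content jsonb);
\copy pgstactemp FROM '/app/data/stac_items.jsonl';

-- Move the staged rows into pgstac.items, skipping duplicates.
INSERT INTO pgstac.items (content)
SELECT content FROM pgstactemp
ON CONFLICT DO NOTHING;

-- Recompute/backfill the datetime partitions.
SELECT backfill_partitions();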

Hierarchically organized STAC APIs

In the most recent release of the STAC API specification (i.e. v1.0.0-beta.5), two features have been added which imply a hierarchical representation of STAC documents within STAC APIs: children and browseable. A recently added draft on stac_fastapi implements these features with an in-memory representation of the hierarchy which defines behavior for both of these features.

In this draft, the hierarchy is specified separately from the ingested contents of the database. For instance, the data which defines the Joplin collection and its various related items is supplied via a json file and an environment variable which tells the server where a valid hierarchy definition can be located.

Assuming this structure is sufficient, and wanting a more robust implementation that leverages pgstac's performance and benefits implementers of other STAC libraries that might lean on it, it would be great to push this information and the associated behaviors down to Postgres.

Below are some TODOs in the stac_fastapi draft that sit next to functions/behavior which could likely find a home in pgstac.

Search with only Item ID can return invalid item

I am seeing behavior where, if a search is performed with only "items" specified but no "collections", invalid results can occur. I saw this through stac-fastapi, where the get_item() call was only supplying an item_id to the PgstacSearch. This is incorrect and I'll be fixing it in a PR there, but what I was seeing was that an item that did not have a matching Item ID was being returned, and that item belonged to another collection. A search on an Item ID without a collection ID should return all (potentially multiple) matching Items from across collections. I have not reproduced this error in the project, and it may be hard to reproduce - this seems to have been working fine, but after a redeploy (I'm not sure if some state changed in the DB) I started consistently getting incorrect results from Item queries.
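For reference, a minimal example of the kind of query involved (assuming the standard STAC search "ids" parameter); the expected behavior is to match on id alone, across all collections:

-- An ids-only search: should return every item with this id, regardless of
-- which collection it belongs to.
SELECT * FROM pgstac.search('{"ids": ["some-item-id"]}'::jsonb);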

Can't override `default-filter-lang` if it is set to `cql-json`

If the pgstac_settings default-filter-lang is set to cql-json, I don't believe it can be overridden as intended by this line:

IF (search ? 'filter-lang' AND search->>'filter-lang' = 'cql-json') OR get_setting('default-filter-lang', _search->'conf')='cql-json' THEN

In the case where the default is set to "cql-json" and the client sends a filter-lang property of "cql2-json", I read the logic of that line as (false OR true), so it attempts, and fails, to parse the CQL2 as CQL (which is the behavior I'm seeing). Instead, I think the default should only be checked if the search didn't provide a filter-lang property.

If the default is set to cql2-json, the search-supplied and override behavior seems correct.
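A hedged sketch of the condition as I read the intended behavior — only fall back to the configured default when the search itself doesn't specify a filter-lang (a hypothetical rewrite of the quoted line, not the actual patch):

IF (search ? 'filter-lang' AND search->>'filter-lang' = 'cql-json')
   OR (NOT search ? 'filter-lang'
       AND get_setting('default-filter-lang', _search->'conf') = 'cql-json') THEN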

Update where_stats after ingest of new data

I'm running into a situation where a query is logged in the search_wheres table before new data is brought in. This causes queries to miss the newly ingested items, returning only results from the previous set of partitions. If a query was run before any items were ingested, it will return 0 results because no partitions are logged in search_wheres for that query.

The workaround is to perform a query like:

delete from search_wheres  where _where LIKE '%COLLECTIONID%';

where COLLECTIONID is the collection ID for the recently ingested items.

Perhaps some purging of the search_wheres table should be done after a load to avoid stale cached results? Or turn down the default TTL, as 1 day is a long time to wait for new items to show up.
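A hedged, blunter alternative to the targeted delete above would be to clear the whole cache after a bulk load, so every search recomputes its partition list and counts:

-- Clears all cached where-clause statistics; they are recomputed on the
-- next search, at the cost of re-running the estimates.
DELETE FROM search_wheres;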

error message: pglogger() takes 1 positional argument but 2 were given

I'm seeing in logging output:

ERROR    asyncio:base_events.py:1707 Unhandled exception in asyncpg log message listener callback <function pglogger at 0x7fc70bb4c280>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/asyncpg/connection.py", line 1435, in _call_log_listener
    cb(con_ref, message)
TypeError: pglogger() takes 1 positional argument but 2 were given

This doesn't cause the code to error out, so it is really more of a warning, but perhaps there's a mismatch between the pglogger method's parameters and the callback signature asyncpg expects.

Add `pypgstac version` command

Use case: I was about to migrate a database and wanted to confirm I had the latest pypgstac installed in my container. It would be useful to have a pypgstac version or pypgstac --version command to check the installed version.

Search returning null features for searches with no results

Upgrading stac-fastapi to pgstac 0.3.2, some tests are failing because search responses look like:

{
  "type": "FeatureCollection",
  "context": {
    "limit": 10,
    "matched": 0,
    "returned": 0
  },
  "features": null
}

Seems as though pgstac is returning null instead of [] for empty searches, at least in some cases.
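For reference, this is the default behavior of PostgreSQL's jsonb_agg over zero rows, which is likely where the null comes from; coalescing to an empty array is the usual shape of the fix:

-- jsonb_agg over an empty set yields NULL unless coalesced to '[]'.
SELECT jsonb_agg(x)                        AS without_coalesce,  -- NULL
       coalesce(jsonb_agg(x), '[]'::jsonb) AS with_coalesce      -- []
FROM generate_series(1, 0) AS x;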

Partitioned tables cannot have BEFORE / FOR EACH ROW triggers

Executing migrations/pgstac.0.4.3.sql on PostgreSQL 12.8 fails with the following:

"items" is a partitioned table
Partitioned tables cannot have BEFORE / FOR EACH ROW triggers

Issue #63 reported the same error; it was later closed by the same user with the comment "Forgot to install pg_partman.", indicating that installing pg_partman solved the problem. However, PR #66 removed pg_partman from the list of required extensions, and my testing suggests pg_partman does not make a difference.

pgstac.0.3.4.sql fails in the same way on PostgreSQL 12.8.
pgstac.0.4.3.sql succeeds in PostgreSQL 13.4 with no further changes besides version number.

To reproduce: https://github.com/captaincoordinates/pgstac-pg12-test/

  1. Does pgstac still support PostgreSQL 12.x?
  2. Is pg_partman required for 12.x?

Any feedback much appreciated 🙏
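For context, the limitation appears to be a PostgreSQL one rather than a pg_partman one: BEFORE ... FOR EACH ROW triggers on partitioned tables were only allowed starting with PostgreSQL 13. A minimal illustration (hypothetical table and function names):

-- Fails on PostgreSQL 12 with the same error; succeeds on 13+.
CREATE TABLE demo_part (id int, ts timestamptz) PARTITION BY RANGE (ts);

CREATE FUNCTION demo_noop() RETURNS trigger AS $$
BEGIN
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER demo_before
    BEFORE INSERT ON demo_part
    FOR EACH ROW EXECUTE FUNCTION demo_noop();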

Issues running migrations

Working on a project that had pypgstac 0.2.3 installed.

Running pypgstac migrate against a development database using image bitner/pgstac:0.1.9, I got the error:

root@39835833116a:/opt/src# pypgstac migrate --dsn postgresql://username:password@database:5432/postgis
Traceback (most recent call last):
  File "/opt/conda/bin/pypgstac", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.8/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.8/site-packages/pypgstac/pypgstac.py", line 125, in migrate
    typer.echo(asyncio.run(run_migration(dsn)))
  File "/opt/conda/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.8/site-packages/pypgstac/pypgstac.py", line 110, in run_migration
    await conn.execute(migration_sql)
  File "/opt/conda/lib/python3.8/site-packages/asyncpg/connection.py", line 297, in execute
    return await self._protocol.query(query, timeout)
  File "asyncpg/protocol/protocol.pyx", line 336, in query
asyncpg.exceptions.PostgresSyntaxError: syntax error at or near "'0.2.3'"

I then updated to pypgstac 0.2.4 and reran, and got the error:

root@62d1d24ad3c3:/opt/src# pypgstac migrate --dsn postgresql://username:password@database:5432/postgis
Traceback (most recent call last):
  File "/opt/conda/bin/pypgstac", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.8/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.8/site-packages/pypgstac/pypgstac.py", line 125, in migrate
    typer.echo(asyncio.run(run_migration(dsn)))
  File "/opt/conda/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.8/site-packages/pypgstac/pypgstac.py", line 110, in run_migration
    await conn.execute(migration_sql)
  File "/opt/conda/lib/python3.8/site-packages/asyncpg/connection.py", line 297, in execute
    return await self._protocol.query(query, timeout)
  File "asyncpg/protocol/protocol.pyx", line 336, in query
asyncpg.exceptions.DuplicateObjectError: constraint "items_collections_fk" for relation "items" already exists

Looking up a valid item by ID with the wrong collection incorrectly returns the item

https://planetarycomputer-staging.microsoft.com/api/stac/v1/collections/mtbs/items/S2A_MSIL2A_20220329T215531_R029_T07WDU_20220330T055949 is a link to a valid item from Sentinel-2, but it's using the wrong collection in the path (mtbs). That should return an error saying that that item isn't in the collection mtbs. Instead, it returns the sentinel-2 item.

This seems to be fixed on the 0.5 branch (https://pc-pqe-red-mqe.westeurope.cloudapp.azure.com/api/stac/v1/collections/mtbs/items/S2B_MSIL2A_20220309T105759_R094_T39XVH_20220309T203945), but I wanted to report it anyway in case we want to add a regression test. This is also not present in previous versions of pgstac (e.g. https://planetarycomputer.microsoft.com/api/stac/v1/collections/mtbs/items/S2B_MSIL2A_20220309T105759_R094_T39XVH_20220309T203945; I'm not sure what version that's running)

Feel free to close if you think the 0.5 release will make this impossible by design.

looking for consistency in search hash.

It seems that every time there is a pgstac update, the hash for a search is different (I have hash-based tests in https://github.com/stac-utils/titiler-pgstac/blob/master/tests/test_titiler_pgstac.py#L7-L8 and I have to update the hash every time the pgstac version changes).

I think this is usually fine, but it would be great if we could have a way to have a consistent id for registering/searching/accessing a search query.

I'm mostly thinking about using a name to describe a search query. This would be optional: a search could still be located by its hash, but also by its name.

pgstac/sql/004_search.sql

Lines 1000 to 1004 in 76512ab

CREATE OR REPLACE FUNCTION search_query(
    _search jsonb = '{}'::jsonb,
    updatestats boolean = false,
    _metadata jsonb = '{}'::jsonb
) RETURNS searches AS $$
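Given that search_query() already accepts a _metadata argument (shown above), a hedged sketch of what registering a named search could look like from the caller's side; the "name" key is an assumption, not an existing pgstac convention:

-- Hypothetical: register a search and attach a human-readable name in the
-- metadata so it could later be looked up by name as well as by hash.
SELECT * FROM search_query(
    '{"collections": ["facebook-population-density"]}'::jsonb,
    false,
    '{"name": "fb-pop-density"}'::jsonb
);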

CI for publishing pypgstac

I noticed that the latest version of pypgstac on PyPI is 0.2.4. The main branch has a version of 0.2.3, and there aren't tags for all released versions.

We should set up GitHub Actions to do the publishing, much like we are doing in pystac. That way releases happen on each GitHub release, and the Python package is published by CI against tagged versions. This will require the stacutils user to be made a maintainer on the pypgstac PyPI project.

Support for adding sslrootcert with asyncpg 0.24.0

The latest version of pypgstac (0.4.3) requires asyncpg >= 0.22.0 and < 0.23.0. However, support for sslrootcert was only added to asyncpg in 0.24.0. The dependencies need to be updated to at least asyncpg 0.24.0 so that PostgreSQL servers that require a root cert to connect, like Azure PostgreSQL, can be supported.

Multiple entries of version in pgstac.migrations table

Running the pypgstac migrate command from a previous version to a new version leads to multiple entries of the final version in the pgstac.migrations table:

Running migrations for ['pgstac.0.2.4-0.2.7.sql', 'pgstac.0.2.7-0.2.8.sql', 'pgstac.0.2.8-0.2.9.sql', 'pgstac.0.2.9-0.3.0.sql', 'pgstac.0.3.0-0.3.1.sql', 'pgstac.0.3.1-0.3.2.sql', 'pgstac.0.3.2-0.3.3.sql'].
pgstac version 0.3.3
 version |           datetime
---------+-------------------------------
 0.2.4   | 2021-07-23 00:00:00+00
 0.2.7   | 2021-09-03 09:02:22.32407+00
 0.3.3   | 2021-09-03 09:02:22.360333+00
 0.2.8   | 2021-09-03 09:02:22.362009+00
 0.3.3   | 2021-09-03 09:02:22.362903+00
 0.2.9   | 2021-09-03 09:02:22.364059+00
 0.3.3   | 2021-09-03 09:02:22.364059+00
 0.3.0   | 2021-09-03 09:02:22.366378+00
 0.3.3   | 2021-09-03 09:02:22.366378+00
 0.3.1   | 2021-09-03 09:05:24.690869+00
 0.3.3   | 2021-09-03 09:05:24.690869+00
 0.3.2   | 2021-09-03 09:05:24.716236+00
 0.3.3   | 2021-09-03 09:05:24.716236+00
 0.3.3   | 2021-09-03 09:05:24.723922+00
 0.3.3   | 2021-09-03 09:05:24.723922+00
(15 rows)

For any migration file (e.g., 0.2.4 to 0.2.7), two inserts are currently performed:

  1. From the migration file
    https://github.com/stac-utils/pgstac/blob/main/pypgstac/pypgstac/migrations/pgstac.0.2.4-0.2.7.sql#L313

  2. From the Python script (which always inserts the final version being migrated to - in my case 0.3.3):
    https://github.com/stac-utils/pgstac/blob/main/pypgstac/pypgstac/migrate.py#L160-L170

I think this can be resolved by removing the insert statement from the Python script?

Multiple pgstac.migrations records can have same timestamp

I was testing out pgstac migrations (with pypgstac migrate) and it looks like multiple version numbers can be inserted at the same time.

db=> SELECT * FROM pgstac.migrations;
 version |           datetime
---------+-------------------------------
 0.2.4   | 2021-08-24 18:34:59.364497+00
 0.2.7   | 2021-08-24 18:39:26.047286+00
 0.3.4   | 2021-08-24 18:39:26.124702+00
 0.2.8   | 2021-08-24 18:39:26.129415+00
 0.3.4   | 2021-08-24 18:39:26.133057+00
 0.2.9   | 2021-08-24 18:39:26.136371+00
 0.3.4   | 2021-08-24 18:39:26.136371+00
(7 rows)

0.2.4 was added to the table manually, but all other rows were added as part of pypgstac migrate. This is an issue because pypgstac migrate picks up the most-recently-added row:

version = await conn.fetchval(
    """
    SELECT version FROM pgstac.migrations
    ORDER BY datetime DESC LIMIT 1;
    """
)

db=> SELECT version FROM pgstac.migrations
db-> ORDER BY datetime DESC LIMIT 1;
 version
---------
 0.2.9
(1 row)

If pgstac is committed to semver, could you just take MAX(version) and rely on a lexicographic sort of the string?

db=> SELECT MAX(version) FROM pgstac.migrations;
  max
-------
 0.3.4
(1 row)
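One caveat with a lexicographic MAX(version): it breaks as soon as a version component reaches two digits (e.g. '0.10.0' sorts before '0.9.0' as text). A hedged alternative is to order by the numeric components (assuming plain x.y.z versions with no suffixes):

-- Orders semver-style strings numerically by splitting on '.' and casting
-- each component to an integer.
SELECT version
FROM pgstac.migrations
ORDER BY string_to_array(version, '.')::int[] DESC
LIMIT 1;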

Implement PGTap SQL tests in scripts/test

As follow-up work to #18, we need to get the pgTAP SQL tests to run as part of the testing workflow. With that PR, only the Python tests run as part of scripts/test and CI; this issue will integrate the tests in the top-level test folder.

Intersects geometry in search appears limited to 8K bytes in version >= 0.3.5

When running searches that include an "intersects" for complex geometries, I am seeing this error:

ERROR: index row requires 293792 bytes, maximum size is 8191

The "293792 bytes" seems to correspond to the size of the input geometry.

The error does not occur for simpler geometries (I ran one that was a simple polygon loosely covering the same area as the complex polygon), nor in earlier versions of pgstac. Below is the stack trace from the database when running a search with one collection and an intersects of California, although I left off the first two NOTICE: ARGS after array cleaning notices since the geometry value is so long. Happy to provide the specific California geometry if needed.

NOTICE:  Checking if update is needed.
NOTICE:  Stats Last Updated: <NULL>
NOTICE:  TTL: 1 day, Age: <NULL>
NOTICE:  Context: off, Existing Total: <NULL>
NOTICE:  Time for just the explain: 00:00:00.428574
NOTICE:  Time for explain + join: 00:00:00.431062

ERROR:  index row requires 293792 bytes, maximum size is 8191
CONTEXT:  SQL statement "INSERT INTO search_wheres SELECT sw.*
    ON CONFLICT (_where)
    DO UPDATE
        SET
            lastused = sw.lastused,
            usecount = sw.usecount,
            statslastupdated = sw.statslastupdated,
            estimated_count = sw.estimated_count,
            estimated_cost = sw.estimated_cost,
            time_to_estimate = sw.time_to_estimate,
            partitions = sw.partitions,
            total_count = sw.total_count,
            time_to_count = sw.time_to_count"
PL/pgSQL function where_stats(text,boolean) line 189 at SQL statement
SQL statement "SELECT where_stats(search._where, updatestats)"
PL/pgSQL function search_query(jsonb,boolean,jsonb) line 41 at PERFORM
PL/pgSQL function search(jsonb) line 71 at assignment
SQL state: 54000

Thank you!
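A hedged guess at the cause, and one common mitigation for this class of error: the btree row-size limit is being hit by an index on the raw _where text, and indexing a hash of the text instead keeps index rows small regardless of geometry size. A sketch, not the actual pgstac fix:

-- Hypothetical mitigation: replace a unique index on the raw where-clause
-- text with one on its md5 hash so very long geometries fit.
CREATE UNIQUE INDEX search_wheres_where_md5_idx ON search_wheres (md5(_where));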

Errors on fresh run of `psql -f pgstac.sql`

Tried psql -f pgstac.sql on an empty PostgreSQL 12 GCP Cloud SQL database on commit d60d7e8 and got some errors, especially:

psql:sql/003_items.sql:269: NOTICE:  trigger "items_stmt_trigger" for relation "items" does not exist, skipping
psql:sql/004_search.sql:191: ERROR:  return type mismatch in function declared to return box3d

Here's the full output:

BEGIN
CREATE EXTENSION
CREATE SCHEMA
CREATE EXTENSION
CREATE SCHEMA
SET
CREATE TABLE
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
SET
CREATE TABLE
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
SET
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
DELETE 0
 create_parent 
---------------
 t
(1 row)

CREATE FUNCTION
CREATE FUNCTION
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE TYPE
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
psql:sql/003_items.sql:269: NOTICE:  trigger "items_stmt_trigger" for relation "items" does not exist, skipping
DROP TRIGGER
CREATE TRIGGER
CREATE VIEW
CREATE VIEW
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
SET
DROP VIEW
CREATE VIEW
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
psql:sql/004_search.sql:191: ERROR:  return type mismatch in function declared to return box3d
DETAIL:  Actual return type is geometry.
CONTEXT:  SQL function "bbox_geom"
psql:sql/004_search.sql:195: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:sql/004_search.sql:199: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:sql/004_search.sql:259: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:sql/004_search.sql:279: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:sql/004_search.sql:288: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:sql/004_search.sql:330: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:sql/004_search.sql:361: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:sql/004_search.sql:371: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:sql/004_search.sql:595: ERROR:  current transaction is aborted, commands ignored until end of transaction block
ROLLBACK

Postgres 13 `create_items` results in `UndefinedTableError`

Hi there!

We are using the stac-fastapi with the pgstac backend. The database was initially set up using pypgstac migrate

We are attempting to insert a test item:

{
  "type": "Feature",
  "stac_version": "1.0.0",
  "id": "test_item",
  "properties": {
    "datetime": "1990-12-27T08:27:00Z"
  },
  "bbox": [-180, -90, 180, 90],
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "links": [
    {
      "rel": "root",
      "href": "http://stac.mydomain.com/",
      "type": "application/json",
      "title": "stac-fastapi"
    },
    {
      "rel": "collection",
      "href": "https://stac.mydomain.com/collections/test_collection",
      "type": "application/json"
    },
    {
      "rel": "parent",
      "href": "https://stac.mydomain.com/collections/test_collection",
      "type": "application/json"
    },
    {
      "rel": "self",
      "href": "https://stac.mydomain.com/collections/test_collection/items/test_item",
      "type": "application/json"
    }
  ],
  "assets": {},
  "stac_extensions": [],
  "collection": "test_collection"
}

However the API is returning the following error:

asyncpg.exceptions.UndefinedTableError: relation "items_p1990w52" does not exist

I figured this might be caused by pg_partman not being installed, so we amended the database:

create schema partman;
create extension pg_partman with schema partman;

create table partman.pgstac_items_template(
    like pgstac.items
);

create index on partman.pgstac_items_template (datetime asc);
create index on partman.pgstac_items_template (end_datetime asc);
create index on partman.pgstac_items_template using gist(geometry);

select partman.create_parent('pgstac.items', 'datetime', 'native', 'weekly',
                             p_template_table := 'partman.pgstac_items_template');

This unfortunately did not help.

Not sure what is going on here. Is anyone able to point us in the direction of a solution?

Specify schema in called functions for pg_dump / pg_restore

Let me know if you have another recommendation for backing up and restoring the database, but I attempted to use pg_dump and pg_restore and I get this error:

CONTEXT:  SQL function "stac_geom" during inlining
....
pg_restore: error: COPY failed for table "items_p2021w37": ERROR:  function st_geomfromgeojson(text) does not exist
LINE 5:                 ST_GeomFromGeoJSON(value->>'geometry')
                        ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
QUERY:  
SELECT
    CASE
            WHEN value->>'geometry' IS NOT NULL THEN
                ST_GeomFromGeoJSON(value->>'geometry')
            WHEN value->>'bbox' IS NOT NULL THEN
                ST_MakeEnvelope(
                    (value->'bbox'->>0)::float,
                    (value->'bbox'->>1)::float,
                    (value->'bbox'->>2)::float,
                    (value->'bbox'->>3)::float,
                    4326
                )
            ELSE NULL
        END as geometry
;

This is happening because pg_dump explicitly sets the search_path to empty, so custom functions (e.g., from extensions or pgstac's own functions) that aren't schema-qualified can't be found. I was able to fix this specific error by adding "public" in front of the ST_GeomFromGeoJSON call, but there were many more errors like this.

If functions called in the pgstac functions had explicit schemas set, I think it would solve the issue and pg_dump and pg_restore could be used for backups.
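For context, this reproduces the resolution problem outside of pg_dump; with an empty search_path only schema-qualified calls resolve (assuming PostGIS is installed in the public schema):

SET search_path TO '';

-- Fails: ST_GeomFromGeoJSON is not visible without a schema qualifier.
-- SELECT ST_GeomFromGeoJSON('{"type":"Point","coordinates":[0,0]}');

-- Works: the call is schema-qualified.
SELECT public.ST_GeomFromGeoJSON('{"type":"Point","coordinates":[0,0]}');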

Postgres version 12 error due to BEFORE row triggers in partitioned tables

Running pypgstac migrate --toversion 0.3.5 against a clean Postgres version 12 database fails with message:

asyncpg.exceptions.WrongObjectTypeError: "items" is a partitioned table
DETAIL:  Partitioned tables cannot have BEFORE / FOR EACH ROW triggers.

This might be because BEFORE row level triggers for partitioned tables are not supported for Postgres version 12, but are supported for Postgres version 13 (found some hints here: https://www.depesz.com/2020/03/31/waiting-for-postgresql-13-enable-before-row-level-triggers-for-partitioned-tables/ ).

Upgrading to Postgres 13 resolved the error.

Full stack trace:

No pgstac version set, installing 0.3.5 from scratch.
Running migrations for ['/home/jthetzel/.cache/pypoetry/virtualenvs/c-stac-api-4u_-WkNV-py3.9/lib/python3.9/site-packages/pypgstac/migrations/pgstac.0.3.5.sql'].
Traceback (most recent call last):
  File "/home/jthetzel/.cache/pypoetry/virtualenvs/c-stac-api-4u_-WkNV-py3.9/bin/pypgstac", line 8, in <module>
    sys.exit(app())
  File "/home/jthetzel/.cache/pypoetry/virtualenvs/c-stac-api-4u_-WkNV-py3.9/lib/python3.9/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/home/jthetzel/.cache/pypoetry/virtualenvs/c-stac-api-4u_-WkNV-py3.9/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/jthetzel/.cache/pypoetry/virtualenvs/c-stac-api-4u_-WkNV-py3.9/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/jthetzel/.cache/pypoetry/virtualenvs/c-stac-api-4u_-WkNV-py3.9/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/jthetzel/.cache/pypoetry/virtualenvs/c-stac-api-4u_-WkNV-py3.9/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jthetzel/.cache/pypoetry/virtualenvs/c-stac-api-4u_-WkNV-py3.9/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/jthetzel/.cache/pypoetry/virtualenvs/c-stac-api-4u_-WkNV-py3.9/lib/python3.9/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/home/jthetzel/.cache/pypoetry/virtualenvs/c-stac-api-4u_-WkNV-py3.9/lib/python3.9/site-packages/pypgstac/pypgstac.py", line 31, in migrate
    version = asyncio.run(run_migration(dsn, toversion))
  File "/home/jthetzel/.pyenv/versions/3.9.6/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/jthetzel/.pyenv/versions/3.9.6/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/jthetzel/.cache/pypoetry/virtualenvs/c-stac-api-4u_-WkNV-py3.9/lib/python3.9/site-packages/pypgstac/migrate.py", line 163, in run_migration
    await conn.execute(migration_sql)
  File "/home/jthetzel/.cache/pypoetry/virtualenvs/c-stac-api-4u_-WkNV-py3.9/lib/python3.9/site-packages/asyncpg/connection.py", line 297, in execute
    return await self._protocol.query(query, timeout)
  File "asyncpg/protocol/protocol.pyx", line 336, in query
asyncpg.exceptions.WrongObjectTypeError: "items" is a partitioned table
DETAIL:  Partitioned tables cannot have BEFORE / FOR EACH ROW triggers.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/jthetzel/src/c-core-labs/c-stac-api/scripts.py", line 53, in migrate
    check_call(["pypgstac", "migrate"])
  File "/home/jthetzel/.pyenv/versions/3.9.6/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['pypgstac', 'migrate']' returned non-zero exit status 1.

Handling start_datetime and end_datetime?

There are some datasets that do not have a datetime, but instead have a start_datetime and end_datetime. For example, a landcover dataset that represents a year of data has no single datetime to represent the data. STAC specifies that in this case datetime is null and start_datetime and end_datetime are set.

How can pgstac handle these types of items?
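A hedged sketch of how range-only items could still be matched, assuming items carry both a datetime and an end_datetime column (an end_datetime index is referenced elsewhere in these issues): a query window matches when the intervals overlap.

-- Interval-overlap check for a query window [2020-01-01, 2020-12-31]:
-- the item's range must start before the window ends and end after it starts.
SELECT id
FROM pgstac.items
WHERE datetime     <= '2020-12-31T23:59:59Z'::timestamptz
  AND end_datetime >= '2020-01-01T00:00:00Z'::timestamptz;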

pgstac seems to require globally unique item IDs contrary to STAC spec

I've been loading some STAC items into my pgstac instance, and it seems that pgstac requires globally unique Item IDs. For example, if I create a new collection, and then add an item that I've already added, with the same ID as an existing item, then I get the following error:

$ pypgstac load items bec-items.json
How to deal conflicting ids (insert, insert_ignore, upsert) [insert]:
loading bec-items.json into items using insert
Traceback (most recent call last):
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\Scripts\pypgstac.exe\__main__.py", line 7, in <module>
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\site-packages\typer\main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\site-packages\typer\main.py", line 500, in wrapper
    return callback(**use_params)  # type: ignore
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\site-packages\pypgstac\pypgstac.py", line 43, in load
    typer.echo(asyncio.run(load_ndjson(file=file, table=table, dsn=dsn, method=method)))
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\asyncio\base_events.py", line 642, in run_until_complete
    return future.result()
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\site-packages\pypgstac\load.py", line 246, in load_ndjson
    await load_iterator(f, table, conn, method)
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\site-packages\pypgstac\load.py", line 230, in load_iterator
    await copy(iter, table, conn)
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\site-packages\pypgstac\load.py", line 134, in copy
    await conn.copy_to_table(
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\site-packages\asyncpg\connection.py", line 859, in copy_to_table
    return await self._copy_in(copy_stmt, source, timeout)
  File "C:\Users\rwilson3\Documents\mambaforge\envs\anglo\lib\site-packages\asyncpg\connection.py", line 1041, in _copy_in
    return await self._protocol.copy_in(
  File "asyncpg\protocol\protocol.pyx", line 506, in copy_in
asyncpg.exceptions.UniqueViolationError: duplicate key value violates unique constraint "items_p1999w52_datetime_id_idx"
DETAIL:  Key (datetime, id)=(2000-01-01 00:00:00+00, 4096a_0_10.0_32_7_-46) already exists.

Is this intentional? The STAC spec states that:

The ID should be unique within the Collection that contains the Item.

Split SQL and Python testing in CI

The CI process is a bit complex IMO. It's built on top of a series of shell scripts which call other shell scripts.

I think it will be good to refactor a bit our CI process to:

  • test SQL Only
  • test Python Only
  • avoid creating migration files every time we want to change the SQL code.

Ideally the migration files should also be created automatically within the CI when pushing a new Tag.

Add support for tiled search.

Rather than a limit based on a count, the limit would be reached when the footprint of the found items covers the entire tile.

Search returning incorrect results after migration to latest version

I had an old version of pgstac running, which I have just upgraded to the latest version. Since then, I've found that some of the search queries I've run have returned incorrect results. I actually found the incorrect results when querying through stac-fastapi, but I've checked by running queries using the underlying pgstac SQL functions and found that they return incorrect results, which are then simply passed through by stac-fastapi.

For example, when running the query:

SELECT * FROM pgstac.search('{"collection":"test-bec-4"}'::jsonb)

I get a JSON result which starts like this:

{
  "next": "N51E000.tif",
  "prev": null,
  "type": "FeatureCollection",
  "context": {
    "limit": 10,
    "returned": 10
  },
  "features": [
    {
      "id": "core-4",
      "bbox": [
        35,
        19.3,
        35,
        19.3
      ],
      "type": "Feature",
      "links": [
        {
          "rel": "self",
          "href": "http://localhost/test.tif",
          "type": "application/json"
        }
      ],
      "assets": {},
      "geometry": {
        "type": "Point",
        "coordinates": [
          35,
          19.3
        ]
      },
      "collection": "test-anglo-cores-3",
      "properties": {
        "datetime": "2020-02-08T11:23:00Z",
        "core-type": "shallow",
        "last-reviewed": "2021-03-19"
      },
...

In that returned JSON, it lists the collection as test-anglo-cores-3 not test-bec-4.

Running

SELECT * FROM items WHERE collection_id = 'test-bec-4'

gives me 100 results, which is the correct number - and shows that there are items in the database with that collection ID. I've also checked the content field of the items table, and in there the collection is also specified correctly as test-bec-4.

I have no idea how to go forward with debugging this, as my SQL skills are quite limited. Any help would be very much appreciated.
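One thing worth double-checking, given the "Collections search" issue earlier in this list: the query above uses a singular "collection" key with a bare string, whereas the STAC API search spec uses a "collections" array. A hedged suggestion to try:

-- Spec-style form of the same search: plural key, array value.
SELECT * FROM pgstac.search('{"collections": ["test-bec-4"]}'::jsonb);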

Support for arbitrary partitioning

The data upload functionality assumes the data is temporally dense and creates datetime partitions that are 1 week long.

Some datasets (eg: CMIP6) are sparsely populated over a very large period of time (eg: 20 files per month spanning 150 years) - this leads to about 1800 partitions with about 20 files each (the monthly data is timestamped to the 1st of the month, so only the partition for the 1st week of the month is created).

I haven't benchmarked the search functionality with this many partitions, but after a discussion today with @bitner, I wanted to log an issue for implementing arbitrary partitions, since it seems like a valuable feature anyway.

The STAC API will contain both the CMIP6 dataset and other datasets that are much more temporally dense and with shorter temporal extent. The preferred solution in this case would be to have the ability to make large partitions for the sparsely populated time-ranges and smaller partitions for the densely populated time-ranges (eg: 1 year or 5 year partitions from 1950-1999, week or month partitions from 2000-2030 and then 1 or 5 year partitions again from 2031-2100).

This can be a function that needs to be executed manually before uploading the data (as opposed to dynamically calculating the partition size during upload).

A further improvement could be a function to further split up partitions (or even recombine them) when data is added to or removed from the STAC database.
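As a very rough illustration of the idea (hypothetical: no such pgstac function exists today, and the partition bounds and naming here are made up), manually pre-creating a coarse partition for a sparse historical range might look like:

-- Hypothetical: one wide partition covering 1950-1999 for sparse data,
-- created manually before loading, alongside the usual weekly partitions.
CREATE TABLE pgstac.items_1950_1999 PARTITION OF pgstac.items
    FOR VALUES FROM ('1950-01-01 00:00:00+00') TO ('2000-01-01 00:00:00+00');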

Document process for release

Document the process for version tagging a release and generating the appropriate migration scripts.

This is follow-up work from #18, which introduced a release process that is based on GitHub releases, so that releases are automated based on a new tag/GitHub release.

The Makefile processing currently creates a script that inserts the version in the migration table. This issue would create a similar workflow that is done as a manual step prior to releasing.
