PostgreSQL schema and functions for Spatio Temporal Asset Catalog (STAC)
PGStac
PGDatabase Schema and Functions for Storing and Accessing STAC collections and items in PostgreSQL
STAC Client that uses PGStac available in STAC-FastAPI
PGStac requires Postgresql>=13 and PostGIS>=3. Best performance will be had using PostGIS>=3.1.
PGStac Settings
PGStac installs everything into the pgstac schema in the database. You will need to make sure that this schema is set up in the search_path for the database.
There are additional variables that control the settings used for calculating and displaying context (total row count) for a search, as well as a variable to set the filter language (cql-json or cql-json2). The context is "off" by default, and the default filter language is set to "cql2-json".
Variables can be set either by passing them in via the connection options using your connection library, setting them in the pgstac_settings table or by setting them on the Role that is used to log in to the database.
Example for updating the pgstac_settings table with a new value:
INSERT INTO pgstac_settings (name, value)
VALUES
('default-filter-lang', 'cql-json'),
('context', 'on')
ON CONFLICT ON CONSTRAINT pgstac_settings_pkey DO UPDATE SET value = excluded.value;
Alternatively, update the role:
ALTER ROLE <username> SET SEARCH_PATH to pgstac, public;
ALTER ROLE <username> SET pgstac.context TO <'on','off','auto'>;
ALTER ROLE <username> SET pgstac.context_estimated_count TO '<number of estimated rows when in auto mode that when an estimated count is less than will trigger a full count>';
ALTER ROLE <username> SET pgstac.context_estimated_cost TO '<estimated query cost from explain when in auto mode that when an estimated cost is less than will trigger a full count>';
ALTER ROLE <username> SET pgstac.context_stats_ttl TO '<an interval string ie "1 day" after which pgstac search will force recalculation of it's estimates>>';
PyPGStac
PGStac includes a Python utility for bulk data loading and managing migrations.
PyPGStac is available on PyPI
pip install pypgstac
Or can be built locally using Poetry
git clone https://github.com/stac-utils/pgstac
cd pgstac/pypgstac
poetry build pypgstac
pip install dist/pypgstac-[version]-py3-none-any.whl
pypgstac --help
Usage: pypgstac [OPTIONS] COMMAND [ARGS]...
Options:
--install-completion Install completion for the current shell.
--show-completion Show completion for the current shell, to copy it or
customize the installation.
--help Show this message and exit.
Commands:
initversion Get initial version.
load Load STAC data into a pgstac database.
migrate Migrate a pgstac database.
pgready Wait for a pgstac database to accept connections.
version Get version from a pgstac database.
PyPGStac will get the database connection settings from the standard PG environment variables:
- PGHOST=0.0.0.0
- PGPORT=5432
- PGUSER=username
- PGDATABASE=postgis
- PGPASSWORD=asupersecretpassword
It can also take a DSN database url "postgresql://..." via the --dsn flag.
Migrations
PyPGStac has a utility to help apply migrations to an existing PGStac instance to bring it up to date.
There are two types of migrations:
- Base migrations install PGStac into a database with no current PGStac installation. These migrations follow the file pattern
"pgstac.[version].sql"
- Incremental migrations are used to move PGStac from one version to the next. These migrations follow the file pattern
"pgstac.[version].[fromversion].sql"
Migrations are stored in pypgstac/pypgstac/migration`s
and are distributed with the PyPGStac package.
Running Migrations
PyPGStac has a utility for checking the version of an existing PGStac database and applying the appropriate migrations in the correct order. It can also be used to setup a database from scratch.
To create an initial PGStac database or bring an existing one up to date, check you have the pypgstac version installed you want to migrate to and run:
pypgstac migrate
Bulk Data Loading
A python utility is included which allows to load data from any source openable by smart-open using python in a memory efficient streaming manner using PostgreSQL copy. There are options for collections and items and can be used either as a command line or a library.
To load an ndjson of items directly using copy (will fail on any duplicate ids but is the fastest option to load new data you know will not conflict)
pypgstac load items
To load skipping any records that conflict with existing data
pypgstac load items --method insert_ignore
To upsert any records, adding anything new and replacing anything with the same id
pypgstac load items --method upsert
Contribution & Development
PGStac uses a dockerized development environment. You can set this up using:
scripts/setup
To bring up the development database:
scripts/server
To run tests, use:
scripts/test
To rebuild docker images:
scripts/update
To drop into a console, use
scripts/console
To drop into a psql console on the database container, use:
scripts/console --db
To run migrations on the development database, use
scripts/migrate
To stage code and configurations and create template migrations for a version release, use
scripts/stageversion [version or increment type]
Examples:
scripts/stageversion 0.2.8
scripts/stageversion patch # if current version is 0.2.7 will bump to 0.2.8
scripts/stageversion minor # if current version is 0.2.7 will bump to 0.3.0
scripts/stageversion major # if current version is 0.2.7 will bump to 1.0.0
This will create a base migration for the new version and will create incremental migrations between any existing base migrations. The incremental migrations that are automatically generated by this script will have the extension ".staged" on the file. You must manually review (and make any modifications necessary) this file and remove the ".staged" extension to enable the migration.
Making Changes to SQL
All changes to SQL should only be made in the /sql
directory. SQL Files will be run in alphabetical order.
Adding Tests
PGStac uses PGTap to test SQL. Tests can be found in tests/pgtap.sql and are run using scripts/test
Release Process
- Make sure all your code is added and committed
- Create a PR against the main branch
- Once the PR has been merged, start the release process.
- Use
scripts/stagerelease
as documented in migrations section above making sure to rename any files ending in ".staged" in the migrations section - Add details for release to the CHANGELOG
- Add/Commit any changes
- Run tests
scripts/test
- Create a git tag
git tag v0.2.8
using new version number - Push the git tag
git push origin v0.2.8
- The CI process will push pypgstac to PyPi, create a docker image on ghcr.io, and create a release on github.