smnorris / fwapg Goto Github PK

View Code? Open in Web Editor NEW

9.0 5.0 5.0 7 MB

PostgreSQL tools for working with British Columbia's Freshwater Atlas

Home Page: https://smnorris.github.io/fwapg

License: MIT License

Shell 6.28% PLpgSQL 82.01% Makefile 10.94% Dockerfile 0.76%

fwa british-columbia postgis freshwater-atlas linear-referencing watersheds rivers streams lakes

fwapg's People

Contributors

Stargazers

Watchers

Forkers

newgraphenvironment poissonconsulting chazgorman frantarkenton haileye

fwapg's Issues

FWA_WatershedAtMeasure QA

don't include polys with invalid codes
double check against fwakit.watersheds
test cut watersheds for validity

FWA_LocateAlongInterval - add max measure for interval

poissonconsulting/fwatlasbc#23

default to max measure for the given stream
warn user if provided max is greater than max in db

Not sure how/if pg_featureserv can handle warnings but otherwise this should be simple.

complete large FWA tables now available via BCDC

Provincial layers previously too large for the distribution service are now directly downloadable via the DataBC catalogue.

makefile / postgis-baselayers

To support loading via GUI supplied by postgis-baselayers, the script could potentially be converted to a make file, see https://github.com/kokoalberti/postgis-baselayers/blob/master/app/datasets/naturalearth/Makefile

tidy upstream area scripts

Adding the upstream area attribute to streams is useful but can be taken from fwa_watersheds_upstream_area, no need to calculate upstream area twice

FWA_WatershedAtMeasure performance

Unsurprisingly, bigger areas take longer to build:

Sooke River @ mouth:

time psql -c "SELECT wscode_ltree FROM fwa_refinedwatershed(354153927, 0)"
real	0m1.366s
user	0m0.009s
sys	0m0.010s

Cowichan River @ 7.2km:

time psql -c "SELECT wscode_ltree FROM fwa_refinedwatershed(354155148, 7200)"
real	0m4.057s
user	0m0.007s
sys	0m0.006s

Running the Fraser River at measure of 1000m takes too long to wait for.

Fix for now is to adjust the query to use fwa_assessment_watersheds_poly as pre-aggregated areas. As a test, this takes ~47s for Fraser River at measure 1000m. This is acceptable for current use case, but is there a faster way to get the outer edge of the geometry collection? Also, using the assessment watersheds is only quick if only using that layer, there doesn't seem to be a quick way to select upstream assessment watersheds and then tack on the first order watersheds that are not upstream of that first selection.

A cache of some kind is probably the way to go but using a built in postgis function would be nice.

FWA_WatershedAtMeasure fails on Spillimacheen River

SELECT * FROM FWA_WatershedAtMeasure(356570193, 4999.9999);
ERROR:  more than one row returned by a subquery used as an expression
CONTEXT:  PL/pgSQL function fwa_watershedatmeasure(integer,double precision) line 26 at RETURN QUERY

add m values on load

We should probably be adding M values to the geoms on load (rather than calculating them on the fly whenever doing linear referencing):
http://postgis.net/docs/ST_AddMeasure.html
http://postgis.net/docs/ST_LocateAlong.html
http://postgis.net/docs/ST_LocateBetween.html

FWA_Upstream - queries fail to use index when wrapped in the function

Calling something like this can be extremely slow, the function still doesn't seem to use the indexes reliably.

(4.5m for a query joining ~6000 points to streams, restricted to a single watershed group, but only 2.4s when using the raw upstream SQL)

-- using function, 4.5m
SELECT a.col1, sum(st_length(geom)) as length_upstream
FROM a
LEFT OUTER JOIN streams b
ON FWA_Upstream(
    a.blue_line_key,
    a.downstream_route_measure,
    a.wscode_ltree,
    a.localcode_ltree,
    b.blue_line_key,
    b.downstream_route_measure,
    b.wscode_ltree,
    b.localcode_ltree
   )

-- spell it out, 2.4s
SELECT a.col1, sum(st_length(geom)) as length_upstream
FROM a
LEFT OUTER JOIN streams b
ON 
    -- b is a child of a, always
  b.wscode_ltree <@ a.wscode_ltree
  AND
      -- conditional upstream join logic, based on whether watershed codes are equivalent
    CASE
      -- first, consider simple case - streams where wscode and localcode are equivalent
      -- this is all segments with equivalent bluelinekey and a larger measure
      -- (plus fudge factor)
       WHEN
          a.wscode_ltree = a.localcode_ltree AND
          (
              (b.blue_line_key <> a.blue_line_key OR
               a.downstream_route_measure < b.downstream_route_measure + .01)
          )
       THEN TRUE
       -- next, the more complicated case - where wscode and localcode are not equal
       WHEN
          a.wscode_ltree != a.localcode_ltree AND
          (
           -- higher up the blue line (plus fudge factor)
              (b.blue_line_key = a.blue_line_key AND
               a.downstream_route_measure < b.downstream_route_measure + .01 )
              OR
           -- tributaries: b wscode > a localcode and b wscode is not a child of a localcode
              (b.wscode_ltree > a.localcode_ltree AND
               NOT b.wscode_ltree <@ a.localcode_ltree)
              OR
           -- capture side channels: b is the same watershed code, with larger localcode
              (b.wscode_ltree = a.wscode_ltree
               AND b.localcode_ltree > a.localcode_ltree)
          )
        THEN TRUE
    END

Tidy, modify, or remove function FWA_LengthInstream()

Although I don't think I'm currently using the function, the code in FWA_LengthInstream() is outdated. It is useful though, and could be used for fish habitat modelling. It should be updated to use the FWA_Upstream/ FWA_Downstream funcs directly.

Also, the func would probably be much more useful if it was generalized to be similar to those functions - simply return true if a feature is 'instream' between two points. Length can be derived from the returned features.

To find features between two locations (without creating a graph/making routing queries), find downstream path of both input locations and return only features that are not common to the two result sets:

Something like this, but with some kind of logic that ensures no results are returned when querying features that are not connected (I think this would connect features across the ocean):

WITH dnstr_a AS
(
    SELECT
      linear_feature_id,
      blue_line_key,
      downstream_route_measure,
      wscode_ltree,
      localcode_ltree,
      geom
    FROM whse_basemapping.fwa_stream_networks_sp s
    WHERE FWA_Downstream(360878058, 10524, '400.431358.815756'::ltree, '400.431358.815756.466791'::ltree,
    s.blue_line_key, s.downstream_route_measure, s.wscode_ltree, s.localcode_ltree)
),

dnstr_b AS
(
    SELECT
      linear_feature_id,
      blue_line_key,
      downstream_route_measure,
      wscode_ltree,
      localcode_ltree,
      geom
    FROM whse_basemapping.fwa_stream_networks_sp s
    WHERE FWA_Downstream(360881038, 5115, '400.431358.918528'::ltree, '400.431358.918528.104334'::ltree,
    s.blue_line_key, s.downstream_route_measure, s.wscode_ltree, s.localcode_ltree)
)

SELECT
  COALESCE(a.linear_feature_id, b.linear_feature_id) as linear_feature_id,
  COALESCE(a.geom, b.geom) as geom
FROM dnstr_a a
FULL OUTER JOIN dnstr_b b
on a.linear_feature_id = b.linear_feature_id
WHERE (a.linear_feature_id IS NULL AND b.linear_feature_id IS NOT NULL)
OR (a.linear_feature_id IS NOT NULL AND b.linear_feature_id IS NULL)

add_area_upstream references incorrect table

Point to whse_basemapping schema, not temp (used for test)

extras - add lake/reservoir/wetland upstream areas

Add to whse_basemapping.fwa_stream_networks_sp, for quicker fish habitat modelling queries:

upstream_lake_ha
upstream_reservoir_ha
upstream_wetland_ha

CUT method at measure very close to mouth of a river emptying into lake

SELECT * FROM fwa_watershedrefine(356568595, 21.6)

Too many watersheds adjacent to the double line river are selected for the cut. And cutting is not necessary in this case, the point is at the start of the river waterbody where the fundamental watersheds terminate at the same location (although the waterbody does not start at measure 0 - we may be able to assume the fundamental watersheds terminate at the same location where fwa_watershed_code = local_watershed_code but this needs checking.)

streams-assessment wsds lookup query returns non-unique linear_feature_id with latest FWA data

The query works with FTP data posted 2016-04-25 but not with the latest:

psql:sql/create_fwa_assessment_watersheds_streams.sql:27: ERROR:  duplicate key value violates unique constraint "fwa_assessment_watersheds_streams_lut_pkey"
DETAIL:  Key (linear_feature_id)=(868144294) already exists.

FWA_WatershedExBC - only create hydroshed basin when actually needed

Smoky Creek (blue_line_key=356534008) is nowhere near the border, but result returned includes all of the Kootenay drainage in the US:
https://www.hillcrestgeo.ca/fwapg/functions/fwa_watershedatmeasure/items.html?blue_line_key=356534008&downstream_route_measure=1016&limit=50

basic handling of points in YT/NWT/AB/AK using hydrosheds

As per smnorris/bcbasins#12.

A function is needed that takes a point or coordinates and returns and aggregated shape including the hydroshed it is within plus those upstream. Just take the query from FWA_WatershedExBC():

-- For streams in other areas we use hydrosheds, which has far less detail
-- Find the hydroshed at the point of interest and work upstream from there
    WITH RECURSIVE walkup (hybas_id, geom) AS
        (
            SELECT hybas_id, wsd.geom
            FROM hydrosheds.hybas_lev12_v1c wsd
            INNER JOIN (select * FROM FWA_LocateAlong(blkey, meas)) as pt
            ON ST_Intersects(wsd.geom, pt.geom)

            UNION ALL

            SELECT b.hybas_id, b.geom
            FROM hydrosheds.hybas_lev12_v1c b,
            walkup w
            WHERE b.next_down = w.hybas_id
        )
    SELECT
      'hybas_na_lev12_v1c' AS source,
      ST_Union(w.geom) as geom
    FROM walkup w;

FWA_IndexPoint() - a point outside BC returns no results

The function needs to return an indicator if a point is outside BC. Currently this only happens if there is also stream in BC within the tolerance.

finding trans-boundary streams and border - stream intersections

When finding points where streams likely cross the border with fwa_upstreambordercrossings(blkey, meas), the fuzzy result of the -75m shift and the approximate location of the border created by create_fwa_approx_borders.sql can lead to border points being created in the wrong drainage, resulting in incorrect returns from subsequent calls to fwa_upstreambordercrossings.
For example on Ewart Creek, because the height of land is right on the border, the point is created in the adjacent drainage (Wall Creek):

QGIS slow to list layers

QGIS is extremely slow to list spatial tables of general 'Geometry' type.
Work around this by specifying type, eg geometry(MultiLineString,3005)

FWA_WatershedExBC - when returning hydrosheds data, only return areas outside of BC

Clip hydrosheds data to areas outside of BC, FWA boundaries need to take priority within BC.

fwa_watershed_groups_subdivided lacks the source primary key

add watershed_group_id

add simple linear upstream/downstream functions

Rather than copying pasting the uptream/downstream logic into all the various queries.

watershed query bails on Parsnip River

See poissonconsulting/fwapgr#12

request https://hillcrestgeo.ca/fwa/v1/watershed/359567332?downstream_route_measure=10&srid=4326

returns

{
  "statusCode": 500,
  "code": "42702",
  "error": "Internal Server Error",
  "message": "column reference \"geom\" is ambiguous"
}

Python dependencies

Note python dependency for the extras in README, provide a requirements.txt and/or environment.yml and remove pgdata dependency from the extras scripts.

upgrade to postgres 13 / postgis 3.1 / geos 3.9

Upgrading to pg13/postgis3.1/geos3.9 gives significant performance benefits. The fixed precision operations should also remove the need for the lostgis ST_Safe_ functions.

Also, because the ST_Safe_ functions do not work with postgres parallelization, some of the FWA functions that currently use them may not work under postgres 13 anyway - removing these functions is a priority.

download fwa.zip

This isn't exactly an issue with your scripts but I'm having trouble downloading the fwa.zip file required before running your scripts.

When I run

wget https://geobc.s3-us-west-2.amazonaws.com/FWA.zip

I get

--2020-05-05 20:01:35--  https://geobc.s3-us-west-2.amazonaws.com/FWA.zip
Resolving geobc.s3-us-west-2.amazonaws.com (geobc.s3-us-west-2.amazonaws.com)... 52.218.251.41
Connecting to geobc.s3-us-west-2.amazonaws.com (geobc.s3-us-west-2.amazonaws.com)|52.218.251.41|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2020-05-05 20:01:36 ERROR 403: Forbidden.

Can you confirm that you are still able to download from that link?
Cheers!

add new linear referencing functions for pg_featureserv

Return a point at specified meters upstream
Return multiple points at specified distance along stream with labels (0 starting at mouth and moving up)

I think a linear function would be useful too (provided a blue_line_key and two measures, return a single feature representing the stream between them) - but that is out of scope for now.

FWA_WatershedAtMeasure - for lake watersheds, use difference of exbc data and bc boundary

Cut exbc data by bc boundary for the LAKE method as we do for others here: https://github.com/smnorris/fwapg/blob/main/sql/functions/FWA_WatershedAtMeasure.sql#L329

fwa_locatealonginterval API function fails on columbia river

blue_line_key = 356570372

https://www.hillcrestgeo.ca/fwapg/functions/fwa_locatealonginterval/items.html?blue_line_key=356570372

works on other blue_line_keys, e.g.
https://www.hillcrestgeo.ca/fwapg/functions/fwa_locatealonginterval/items.html?blue_line_key=356568123

FWA_Upstream - improve comparison of lines and points

For locations on the same blue_line_key, the function compares point locations only (at the provided measures).
This gives a weird result if considering whether point b at measure 10 is upstream of line at measure 5, with length 10. The point is at the midpoint of the line but the function currently returns true for this instance, length/upstream measure is not considered.

v0.0.1

Other tools depend on fwapg and the FWA data is being continuously updated by GeoBC. To help with stability, fwapg releases should be created, with each version linked to a specific data extract.

FWA_WatershedAtMeasure still too slow

Peace @ Ft St John takes >5min. Ok for local processing but not so good for requests via pg_featureserv:

 LOG:  duration: 313465.312 ms  statement: select * from postgisftw.fwa_watershedatmeasure(359572348, 1590599);

FWA_WatershedAtMeasure - returns no results if no watersheds upstream

For example:

# SELECT * FROM  FWA_WatershedAtMeasure(356551270, 681);
 wscode_ltree | localcode_ltree | area_ha | refine_method | geom
--------------+-----------------+---------+---------------+------
(0 rows)

Terminus of this first order watershed poly is at downstream_route_measure = 606.3m

I can't recall if this is expected behaviour based on other watersheds but it doesn't seem right in this case.

All streams have gnis_name IS NULL WHERE watershed_group_code='DRIR'

This is a data error that has been reported to GeoBC.
I'm only noting the issue here as a reminder to run an extract when fixes are available.

upstream area / upstream length queries - speed and disk space issues

Rather than calculate as needed, just include it in the table.

encoding of gnis_name in FWA_NAMED_WATERSHEDS

Encoding of NAMED_WATERSHEDS_POLY column gnis_name in FWA.gpkg file dumped from the warehouse is off.

Appears to be an issue for that layer only, the gnis_name Barrière River comes through fine in the streams table.

$ ogrinfo FWA.gpkg -sql "SELECT gnis_name FROM FWA_NAMED_WATERSHEDS_POLY WHERE GNIS_NAME LIKE 'Barri%'"
INFO: Open of `FWA.gpkg'
      using driver `GPKG' successful.
Layer name: SELECT
Geometry: None
Feature Count: 7
Layer SRS WKT:
(unknown)
GNIS_NAME: String (80.0)
OGRFeature(SELECT):0
  GNIS_NAME (String) = Barri� River

extras - calculate upstream area for watersheds

Current script populates a column in the streams table. Because upstream of each watershed is needed as well, it is probably more efficient to calculate upstream area for each watershed and work with that.

clean outputs from hydrosheds

Some cleaning of the aggregated output might be required, there are some slivers in the test shape. Fixing #45 might be enough though.

tests

A few basic tests for the various fwa_ functions would be valuable.

FWA_WatershedAtMeasure - add tests

Expect more changes to this func so add a simple test to make sure it is returning something consistent in the common use cases.

FWA_WatershedAtMeasure fails on 0 measures

poissonconsulting/fwatlasbc#16

# select fwa_watershedatmeasure(356528119, 0);
ERROR:  more than one row returned by a subquery used as an expression
CONTEXT:  PL/pgSQL function fwa_watershedatmeasure(integer,double precision) line 26 at RETURN QUERY

remove data fixes for watershed lookup

George has applied the fixes and they are available in the warehouse.
On next extract, remove fixes from https://github.com/smnorris/fwapg/blob/main/04_assmnt_wsds_lut.sh#L20

FWA_WatershedAtMeasure() ERROR: 2nd arg must be smaller then 3rd arg

Halfway River @ 90536m

# SELECT * FROM fwa_watershedrefined(359570578, 90536);
ERROR:  2nd arg must be smaller then 3rd arg
CONTEXT:  PL/pgSQL function fwa_watershedrefined(integer,double precision) line 21 at RETURN QUERY

FWA_NearestStream name

This name is still a bit misleading - the function doesn't return the stream geoms it returns the nearest point geom on the nearest stream(s). FWA_IndexPoint is probably better, like the EPA service (https://www.epa.gov/waterdata/point-indexing-service)

functions missing

Not all required functions are currently being added. Review and tidy.

FWA_WatershedAtMeasure - point within river poly with no associated fundamental watersheds

The function tries to cut watersheds that aren't present.
Eg hydrometric station 10CD005 at Adsett Creek blue_line_key = 359565544, downstream_route_measure=6715

FWA_WatershedAtMeasure output type

Currently the function RETURNS TABLE. This makes sense for the typical case where just requesting a single watershed, especially via the current api.

When using the db directly, it would be nice to use the function to generate watersheds within a more general/typical context.

eg - I have a bunch of IDs and want a set of results with watersheds for each:

SELECT
 stream_crossing_id,
 downstream_route_measure,
 blue_line_key,
 FWA_WatershedAtMeasure(blue_line_key, downstream_route_measure)
FROM bcfishpass.crossings
WHERE stream_crossing_id in (50152,62515,50181,50185,50261,62423,62516,197533,197534,197542,197555);

This gives a result like this, the output from the function is a nested table / values with watershed codes, refine method and geom:

62515	112	356547712	(300.625474.584724.233916.539602,300.625474.584724.233916.539602,444.72,DEM,<geom>)

Returning the watershed codes and area isn't really necessary, but returning the refine method as well as the output geom definitely is - perhaps return an array holding these two values rather than a table would be easier to work with?

postgres 12

https://www.postgresql.org/about/news/1943/

could use generated columns for ltree watershed codes rather than creating with ogr2ogr and altering table
with planner inlining CTEs, tidying of some queries is probably possible

FWA_Upstream/Downstream - tolerances and returning features at equivalent measures

Returning all streams upstream of a point location where there is a break in the stream network is a common query.
Currently, the stream feature at the measure provided will not be returned by FWA_Upstream() because measures of the stream and point are equivalent (+/- the provided tolerance). There should to be an option to return features with equivalent measures rather than tacking them on with another subquery.

FWA_WatershedExBC - simplify trans-border query when using hydrosheds data

blue_line_key = 360887232
downstream_route_measure = 8412

Result includes entire Atlin/Tagish drainage, should just be Tutshi Lake. This is probably a trans-boundary data quirk?