
stactools


stactools is a high-level command line tool and Python library for working with STAC. It is based on PySTAC.

This is the core stactools repository, which provides a basic command line interface (CLI) and API for working with STAC catalogs. A suite of packages is available in other repositories for working with a variety of datasets and for performing more complicated operations on STAC data. See packages for more information.

Installation

To install the latest version via pip:

pip install stactools

To install the latest version with conda:

conda install -c conda-forge stactools

To install the latest development version from the source repository:

git clone https://github.com/stac-utils/stactools.git
cd stactools
pip install .

NOTE: In order to read and write Cloud Optimized GeoTIFFs, GDAL version 3.1 or greater is required. If your system GDAL is older than version 3.1, consider using Docker or Conda to get a modern GDAL.

Optional dependencies

stactools includes two optional dependencies:

  • s3: Enables s3 hrefs via fsspec and s3fs
  • validate: Enables stac validate and stac lint

To install optional dependencies:

pip install 'stactools[s3]'
pip install 'stactools[validate]'

Docker

To download the Docker image from the registry:

docker pull ghcr.io/stac-utils/stactools:latest

Running

stac --help

Running from docker

docker run --rm ghcr.io/stac-utils/stactools:latest --help

Documentation

See the documentation page for the latest docs.

Packages

stactools comprises many sub-packages that provide library and CLI functionality. Officially supported packages are hosted in the GitHub stactools-packages organization, and other subpackages may be available from other sources.

There are over 25 packages that translate specific types of data into STAC, including imagery sources (aster, landsat, modis, naip, planet, sentinel1, sentinel1-grd, sentinel2, sentinel3), land use/land cover data (corine, cgls_lc100, aafc-landuse), Digital Elevation Models (cop-dem, alos-dem), population data (gpw, worldpop), point clouds, and many more.

There are also cool tools like stactools-browse, which makes it super easy to deploy a STAC Browser from the command line to browse any local data.

For the list of officially supported packages see the list of STAC packages on the stactools-packages GitHub organization. Each package can be installed via pip install stactools-{package}, e.g. pip install stactools-landsat. Third-party packages can be installed in the same way, or, if they are not on PyPI, directly from the source repository, e.g. pip install /path/to/my/code/stactools-greatdata.

Developing

Clone the repository and install it in editable mode with the dev optional dependencies:

git clone https://github.com/stac-utils/stactools.git
cd stactools
pip install -e '.[dev]'

Linting and formatting are handled with pre-commit. You will need to install pre-commit before committing any changes:

pre-commit install

Tests are handled with pytest:

pytest

Run a Jupyter notebook:

scripts/notebook

Using docker

You can also develop in a Docker container. Build the container with:

docker/build

Once the container is built, you can run the scripts/ scripts inside a docker console by running:

docker/console

A complete build and test can be run with:

docker/cibuild

In scenarios where you want to run scripts in docker/ but don't want to run the build, images can be downloaded via the pull script:

docker/pull

Run a Jupyter notebook:

docker/notebook

You can run the CLI through docker by running:

docker/stac --help

Using conda

conda is a useful tool for managing dependencies, both binary and Python-based. If you have conda installed, you can create a new environment for stactools development by running the following command from the top-level directory in this repo:

conda env create -f environment.yml

Then activate the stactools environment:

conda activate stactools

Finally, install stactools in editable mode and all development requirements:

pip install -e '.[dev]'

Developing the docs

To build and serve the docs, the development requirements must be installed with pip install -e '.[docs]'. To build the docs, you can use make html from inside of the docs directory, and to build the docs and start a server that watches for changes, use make livehtml:

cd docs
make html
make livehtml

If using make livehtml, once the server starts, navigate to http://localhost:8000 to see the docs. Use 'make' without arguments to see a list of available commands.

You can also run the previous commands in the docker container using:

docker/console

Code owners and repository maintainer(s)

This repository uses a code owners file to automatically request reviews for new pull requests. The current primary maintainer(s) of this repository are listed under the * rule in the CODEOWNERS file.

Adding a new package

To create a new stactools package, use the stactools package template. stactools utilizes Python's namespace packages to provide a suite of tools all under the stactools namespace. If you would like your package to be considered for inclusion as a core stactools package, please open an issue on this repository with a link to your package repository.

Releasing

See RELEASING.md for the steps to create a new release.

Contributors

alexgleith, carioca-au, cholmes, constantinius, cuttlefish, dependabot[bot], gadomski, geomatician, jbants, jonhealy1, jpolchlo, jsignell, justinfisk, lossyrob, pflickin, philvarner, pjhartzell, rushgeo, sharkinsspatial, sunu, thomas-maschler, tomaugspurger, tomiiwa, volaya


stactools's Issues

Use a 'via' rel to link back to the Planet data API url that is equivalent.

For RC1 release we will add a 'via' rel, see https://github.com/radiantearth/stac-spec/blob/dev/best-practices.md#using-relation-types

This is perfect to point back at the URL of the data API of Planet, as the original, online source of the STAC record. We can use the 'via' without needing the new STAC release, as you can put anything in 'rel' links.

So for each Item we should construct the data API url from the item_type and ID, and put it in the links with rel = via.
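A minimal sketch of that construction on a raw item dict. The data API URL pattern below is my assumption of Planet's scheme and should be verified against their docs:

```python
def add_via_link(item: dict, item_type: str) -> dict:
    """Append a 'via' link pointing back at the Planet data API record."""
    # URL pattern assumed from Planet's data API; double-check before relying on it.
    href = (
        "https://api.planet.com/data/v1/item-types/"
        f"{item_type}/items/{item['id']}"
    )
    item.setdefault("links", []).append({"rel": "via", "href": href})
    return item

item = add_via_link({"id": "20201212_000000_ss01"}, "SkySatCollect")
```

Because 'rel' values are open-ended, this works without waiting for a new STAC release, as the text notes.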

Add eo:band information to Planet skysat output

It'd be good to include eo:band information in any Planet output, so that STAC/COG readers can get the proper band ordering automatically.

Planet has flown a number of different filters on their Dove constellations, so doing this really well would likely involve lookups by satellite id, getting the exact band info for each filter flown, and doing the whole mapping. A simpler approach that is likely 'good enough' would be to key the band info off the instrument id alone.

Skysat may be easier, and is likely where we should start on this.
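A sketch of attaching band info to an asset on a raw item dict. The band list below is purely illustrative; the real names and wavelengths must come from Planet's documentation per instrument:

```python
# Hypothetical band list for illustration only; real values would come
# from Planet's documentation for the specific instrument.
SKYSAT_BANDS = [
    {"name": "blue", "common_name": "blue"},
    {"name": "green", "common_name": "green"},
    {"name": "red", "common_name": "red"},
    {"name": "nir", "common_name": "nir"},
]

EO_EXT = "https://stac-extensions.github.io/eo/v1.0.0/schema.json"

def set_eo_bands(item: dict, asset_key: str, bands: list) -> None:
    """Attach eo:bands to one asset and declare the extension on the item."""
    exts = item.setdefault("stac_extensions", [])
    if EO_EXT not in exts:
        exts.append(EO_EXT)
    item["assets"][asset_key]["eo:bands"] = bands

item = {"assets": {"analytic": {"href": "scene.tif"}}}
set_eo_bands(item, "analytic", SKYSAT_BANDS)
```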

Better description of UDM in Planet conversion / output.

The new file extension has an example of using a mapping object to describe a cloud cover mask. Seems like it'd be useful to include that information about Planet's cloud mask.

Info at https://developers.planet.com/docs/data/udm-2/ - hrm, it looks like we use one band per class, instead of one band with all classes in it, so perhaps it'd be better to describe it with a 'bands' object, even if there are no common names. Regardless, it'd be good to have more description of the UDM asset in STAC.

publishing capabilities

It could be nice if there were easy ways to 'publish' a STAC catalog onto AWS/GCP/Azure. Perhaps it'd be a publish command for each cloud. I'd expect it'd just use their CLIs / client libraries, but it'd be a shortcut to make sure that it's a published catalog (default to absolute, option for relative) and to upload all the assets to the cloud. More advanced would be options to keep a cloud catalog in sync, like an 'update' option.

Display the name of the file where a validation error occurred.

stactools is useful for doing validation, but it can be hard to see exactly where things went wrong.

I'm using stac info (maybe there's a better command to use), and on a 3-item catalog I get:

Traceback (most recent call last):
  File "/Users/cholmes/.local/bin/stac", line 8, in <module>
    sys.exit(run_cli())
  File "/Users/cholmes/.local/lib/python3.7/site-packages/stactools/cli/cli.py", line 40, in run_cli
    cli(prog_name='stac')
  File "/Users/cholmes/.local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/cholmes/.local/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/cholmes/.local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/cholmes/.local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/cholmes/.local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/cholmes/.local/lib/python3.7/site-packages/stactools/cli/commands/info.py", line 49, in info_command
    print_info(catalog_path)
  File "/Users/cholmes/.local/lib/python3.7/site-packages/stactools/cli/commands/info.py", line 23, in print_info
    for item in items:
  File "/Users/cholmes/.local/lib/python3.7/site-packages/pystac/stac_object.py", line 372, in get_stac_objects
    link.resolve_stac_object(root=self.get_root())
  File "/Users/cholmes/.local/lib/python3.7/site-packages/pystac/link.py", line 166, in resolve_stac_object
    obj = STAC_IO.read_stac_object(target_href, root=root)
  File "/Users/cholmes/.local/lib/python3.7/site-packages/pystac/stac_io.py", line 130, in read_stac_object
    d = cls.read_json(uri)
  File "/Users/cholmes/.local/lib/python3.7/site-packages/pystac/stac_io.py", line 108, in read_json
    return json.loads(STAC_IO.read_text(uri))
  File "/opt/salt/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/opt/salt/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/salt/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 54 column 3 (char 1122)

Very useful to tell me the exact place in the file that is messed up. But I don't see which file it is, I have to hunt through all of them. Some chance this is user error, but it'd be good to pop up the file where things messed up. Or maybe it's easier to just make a stac validate command that can be more informative? And then errors on other operations can tell you to use stac validate.
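One way such a stac validate command could report the offending file: walk the child/item links itself and record which href failed to parse. A self-contained sketch (the link-walking and file layout here are simplified; real catalogs can have absolute hrefs, remote hrefs, etc.):

```python
import json
import os
import tempfile

def find_json_errors(catalog_path: str) -> list:
    """Walk child/item links, parsing each file and recording which file failed."""
    errors = []

    def visit(href: str) -> None:
        try:
            with open(href) as f:
                obj = json.load(f)
        except json.JSONDecodeError as err:
            errors.append((href, str(err)))  # keep the filename with the error
            return
        base = os.path.dirname(href)
        for link in obj.get("links", []):
            if link.get("rel") in ("child", "item"):
                visit(os.path.normpath(os.path.join(base, link["href"])))

    visit(catalog_path)
    return errors

# demo: a two-file catalog where the item is malformed
root = tempfile.mkdtemp()
with open(os.path.join(root, "catalog.json"), "w") as f:
    json.dump({"links": [{"rel": "item", "href": "item.json"}]}, f)
with open(os.path.join(root, "item.json"), "w") as f:
    f.write("{'single': 'quotes'}")  # invalid JSON on purpose
errors = find_json_errors(os.path.join(root, "catalog.json"))
```

Unlike the traceback above, the result pairs each parse error with the file it came from.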

Add 'roles' to Planet conversion

Currently converting from Planet does not use the new asset 'roles'. I'm not sure if stactools does in general, it just doesn't show up since Planet is not using thumbnails (#46). But it'd be good to use the best practice.

public-datasets repo

Work has been started in https://github.com/developmentseed/public-datasets by @vincentsarago and @kylebarron to create an end-to-end solution for indexing AWS public datasets in a STAC API. I propose that we merge the two projects into one to consolidate the code and to prevent creating the same catalogs in two slightly different ways. Interestingly enough, the structure and intent of the two projects are almost identical.

I won't pretend to know the best way to do this. I'm not an owner of either repo. But wanted to create a space where we could discuss the possibility of merging the two.

Option to generate extents per catalog

I believe stactools generates the extents of the collection, but it doesn't seem to for the catalogs. These show up in stac browser, and are informative as to the area that you are looking at. It would be nice if there was an option to generate extents for catalogs.
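Catalogs have no extent field in the STAC spec, so any implementation would have to compute a collection-style extent object from the children. The spatial part is just a bbox union; a minimal sketch:

```python
def union_bbox(bboxes: list) -> list:
    """Combine 2D [xmin, ymin, xmax, ymax] bboxes into one covering bbox."""
    xmins, ymins, xmaxs, ymaxs = zip(*bboxes)
    return [min(xmins), min(ymins), max(xmaxs), max(ymaxs)]

# bboxes gathered from the catalog's child items (collection is elided)
child_bboxes = [[0, 0, 10, 10], [5, -5, 20, 8]]
extent = {"spatial": {"bbox": [union_bbox(child_bboxes)]}}
```

Antimeridian-crossing bboxes would need special handling that this sketch ignores.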

Implement templating for easier modifications of metadata like descriptions

In radiantearth/stac-spec#986, @schwehr outlines a technique to use jsonnet to implement templating, which would allow users who are not necessarily Python devs to be able to effectively edit metadata that is then used to generate STAC Catalogs, Collections and Items.

I think this would be an effective technique to use in stactools. If we had jsonnet templates for the collections and items that were used to generate those objects, then users could make pull requests against stactools to update those templates in case there's any metadata errors or additions. There could be a core function that would take an object, say a Collection, and a template, and then update the collection based on any template values as a way to update the information.

For example, as someone who maintains a STAC API, I could:

  • Generate the collection and items from stactools for dataset X
  • Someone from the community notices an issue with the metadata, perhaps a misspelling in the description
  • Someone makes a PR to the jsonnet template to update that collection.
  • After merge, I run a workflow that reads the collection out of my STAC API, applies the template to update the values, and writes it back.
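The 'apply the template to update the values' step could be as simple as a recursive overlay. A sketch on plain dicts (jsonnet evaluation, which would produce the template dict, is elided):

```python
def apply_template(obj: dict, template: dict) -> dict:
    """Recursively overlay template values onto a STAC object, returning a copy."""
    merged = dict(obj)
    for key, value in template.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_template(merged[key], value)  # merge nested dicts
        else:
            merged[key] = value  # template value wins
    return merged

collection = {"id": "dataset-x", "description": "A descripton with a typo"}
fixed = apply_template(collection, {"description": "A description without a typo"})
```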

eo:band object w/ common names in for Planetscope data

With #40 we got band info for skysat.

We were going to wait on full band information (center wavelength and fwhm) for PlanetScope, but let's not let great be the enemy of good. It'd be very valuable to have band objects for PlanetScope, and they just need to have 'common_name', reporting bgrn for the analytic assets and rgb for the visual assets.

(in the future it may be pretty easy to add dove-r and superdove bands, but I'll do that in its own issue).

Issues with move-assets command

I'm trying to make a completely relative catalog so that I can easily zip it up and send to someone.

One route I attempted was to use the move-assets command, but it never quite worked right. It has seemingly failed in a few different ways, some of which I'm sure are user error, which is preventing me from making a super clear bug report.

The command that seemed to work the best was:

stac move-assets -h RELATIVE -s assets peru-t2/collection.json  

It actually moved things to the /assets directory. But the links to the assets from the item weren't updated at all. I was hoping that those would update.

I didn't actually care about the sub-directory, but for some reason it seemed to make things work a bit better. My other errors were on: stac move-assets -h RELATIVE peru-t2/collection.json. I got an error that said 'UnboundLocalError: local variable 'op' referenced before assignment'

Bottom of the stack trace is:

  File "/Users/cholmes/.local/lib/python3.7/site-packages/stactools/cli/commands/copy.py", line 30, in move_assets_command
    copy=copy)
  File "/Users/cholmes/.local/lib/python3.7/site-packages/stactools/core/copy.py", line 201, in move_all_assets
    ignore_conflicts)
  File "/Users/cholmes/.local/lib/python3.7/site-packages/stactools/core/copy.py", line 166, in move_assets
    ignore_conflicts=ignore_conflicts)
  File "/Users/cholmes/.local/lib/python3.7/site-packages/stactools/core/copy.py", line 115, in move_asset_file_to_item
    if op is not None:

My other attempt was: stac move-assets -h RELATIVE -c peru-t2/collection.json, which got me:

Usage: stactools move-assets [OPTIONS] CATALOG_PATH
Try 'stactools move-assets --help' for help.

Error: Missing argument 'CATALOG_PATH'.

Maybe I'm misunderstanding what the command is supposed to do, but the main thing I want out of it is to transform all my asset paths that refer to local file locations, as absolute paths, to be relative, so I can easily zip and send it.
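For what it's worth, PySTAC appears to expose this directly (Catalog.make_all_asset_hrefs_relative() plus saving as CatalogType.SELF_CONTAINED; worth verifying against the PySTAC docs), and the core rewrite is just relpath arithmetic. A sketch on a raw item dict:

```python
import os

def make_asset_hrefs_relative(item: dict, item_href: str) -> dict:
    """Rewrite absolute asset hrefs relative to the item's own location."""
    base = os.path.dirname(os.path.abspath(item_href))
    for asset in item.get("assets", {}).values():
        if os.path.isabs(asset["href"]):
            asset["href"] = os.path.relpath(asset["href"], base)
    return item

# hypothetical paths for illustration
item = {"assets": {"image": {"href": "/data/assets/scene.tif"}}}
make_asset_hrefs_relative(item, "/data/peru-t2/item.json")
```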

More curated Planet field output

Right now any Planet data API field that is not translated into STAC gets dumped out with a pl: prefix. I originally had a more 'curated' approach of not including less valuable / legacy fields from Planet's data API in our STAC output. So I think we should switch from dumping everything to more of a 'whitelist': if it's one of the values we want in STAC, include it; if not, just drop it.

Ones included from the Planet JSON should be:

anomalous_pixels
ground_control
item_type
pixel_resolution
quality_category
strip_id
publishing_stage
clear_percent

Note columns and rows are going away because the same info should be in proj:shape #39

All the rest should be ignored - many of them were before there were equivalent stac concepts, and then they also just have a lot of percents based on the UDM, which don't seem too necessary. If users ask we can add them later.

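The proposed whitelist filter as a minimal sketch on the raw Planet properties dict:

```python
PLANET_FIELD_WHITELIST = {
    "anomalous_pixels", "ground_control", "item_type", "pixel_resolution",
    "quality_category", "strip_id", "publishing_stage", "clear_percent",
}

def curate_planet_properties(planet_props: dict) -> dict:
    """Keep only whitelisted Planet fields, prefixed 'pl:'; drop the rest."""
    return {
        f"pl:{key}": value
        for key, value in planet_props.items()
        if key in PLANET_FIELD_WHITELIST
    }

# 'columns' is dropped because proj:shape carries the same info (#39)
props = curate_planet_properties(
    {"strip_id": "s1", "columns": 8000, "clear_percent": 97}
)
```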

Add Sentinel 1 subpackage

I'd love to get a STAC catalog generated for the Sentinel-1 RTC public dataset covering CONUS in AWS https://registry.opendata.aws/sentinel-1-rtc-indigo/

I put together the framework for this over in the stac-sentinel repo, but it seems like the current recommendation is to add a submodule here:
stac-utils/stac-sentinel#1 (comment)

I'm thinking of trying to follow the sentinel2 subpackage closely. For starters this would just cover the RTC dataset, but there are others out there that could likely be added later (e.g. GRD, SLCs https://registry.opendata.aws/sentinel-1/). Doesn't seem like anyone else is actively working on this right now, @matthewhanson @lossyrob?

Add linting for various use-cases

Over in stac-vrt, we rely on the presence of some extensions to efficiently build VRTs from STAC metadata. This issue is a request for two things

  1. Write a tool (CLI) to check that a STAC item contains the extended data needed by stac-vrt.
  2. Verify that the STAC items generated by stactools include the extended data (as a linting check, that can optionally be skipped when it's not applicable / desired for whatever reason).

https://stac-vrt.readthedocs.io/en/latest/#building-stac-vrt-compatible-stac-items has the list of items currently required by stac-vrt.

Option to specify catalog type in more places

This is probably a useful thing for other 'convert' commands, but this is the one I'm using the most.

Most of my workflows involve eventually sending the catalog out, either uploading to google cloud or putting into a zip to send someone. It'd be nice if I could just continually work with self-contained catalogs, instead of having to remember to do a copy with -t. This hits me most with planet convert, but also probably in merging (or my guess is it might).

Allow more customization of 'stac browse' command

In working on implementing a Sentinel-1 subpackage I came across an issue with the current implementation of stac browse which is limited to RGB assets it seems, leading to log errors like
'JPEG driver doesn't support data type UInt16. Only eight bit byte bands supported.', described in more detail in #84 (comment).

I spent some time familiarizing with how stac browse is using titiler behind the scenes, with configuration here:

- TILE_SOURCE_TEMPLATE=http://localhost:8000/cog/tiles/{z}/{x}/{y}?url={ASSET_HREF}

S1 assets are single-band dtype Uint16 or Float32, so a client has to specify a rescale parameter
https://api.cogeo.xyz/cog/tiles/8/38/88?url=https://sentinel-s1-rtc-indigo.s3.us-west-2.amazonaws.com/tiles/RTC/1/IW/10/U/CU/2017/S1A_20170101_10UCU_ASC/Gamma0_VV.tif&rescale=0,0.5

It would be great for stac browse to support all the endpoint parameters. In particular rescaling with expressions is extremely useful for SAR (for example converting power -> amplitude -> dB to modify dynamic range for visualizations)
https://api.cogeo.xyz/cog/preview?url=https://sentinel-s1-rtc-indigo.s3.us-west-2.amazonaws.com/tiles/RTC/1/IW/10/U/CU/2017/S1A_20170101_10UCU_ASC/Gamma0_VV.tif&expression=sqrt(b1)&rescale=0,1

What would be particularly nice would be to use the titiler /stac endpoint instead of /cog to extract default expressions and rescale ranges that could be stored in the metadata. cc @vincentsarago

transform_stac_to_stac with wrong blue band

I identified this issue while trying to transform some STAC 0.7 items to 1.0. I promptly fixed the library and created a new PR.

The issue is already fixed in the following PR. Could you review, approve, and release a new version, please?

#91

Thank you very much.
Rodrigo Carvalho
GA developer

Add logic to utilize MTL file in landsat subpackage

#23 adds a command to migrate and fixup the collection 2 STAC for Landsat 8 (thank you Alex!)

The usage of MTL and other sidecar files is still unimplemented. There's a lot of great discussion here about the challenges and potential for MTL conversion.

This issue is to determine the need and potential for using the MTL and other files to derive STAC data for landsat; if we should, implement it, and if it's not a good choice, remove the MTL codepath in the subpackage.

transform_stac_to_stac

Good afternoon,
I'm hoping to use the Landsat STAC transform from this PR: #23. Is there any chance we could help create a new release?

Option to generate summaries for a catalog/collection in CLI

It'd be great if it was much easier to generate summaries. This is tracked in pystac at stac-utils/pystac#178

In the command line I could see a dedicated 'generate summary' command, that takes a catalog or collection as an argument and updates it to have a summary of all its child elements. It should have an option to generate 'all' (at least the ones that make sense), or take a list of the properties to summarize.

I think it could also make sense to add it as an option to some of the commands, like copy and merge.

We might even consider making summaries default to true, as it's a good practice. Probably should see how they actually work first, as I worry a bit about generating really large summary lists. But once we get some sensible rules in for when/how to generate the summaries (like don't do them if it's more than 5 or 10 distinct values in a property, or only do 'known' STAC extension properties by default) then perhaps we make them default.
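A sketch of the 'cap the number of distinct values' rule on raw item dicts (the cutoff and key list are illustrative choices, not stactools behavior):

```python
def summarize(items: list, keys: list, max_distinct: int = 10) -> dict:
    """Collect distinct property values per key, skipping oversized value sets."""
    summaries = {}
    for key in keys:
        values = {item["properties"][key] for item in items
                  if key in item.get("properties", {})}
        # skip properties with too many distinct values to be a useful summary
        if 0 < len(values) <= max_distinct:
            summaries[key] = sorted(values)
    return summaries

items = [
    {"properties": {"gsd": 10, "platform": "sentinel-2a"}},
    {"properties": {"gsd": 20, "platform": "sentinel-2b"}},
]
result = summarize(items, ["gsd", "platform"])
```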

Include reflectance coefficients and license information

For Planetscope data we include an XML metadata file that is mostly redundant information, but does include two interesting and useful pieces:

  • Reflectance coefficients, per band. We need to figure out the structure in eo:bands for where to put per band information, but once we do we should add it.
  • License information - the xml file seems to have a link to the actual license text, so we should make use of that in the license link.

Enable writing remote HREF links, but saving locally

There's a common use case where a STAC will be copied to a remote location as either RELATIVE_PUBLISHED or ABSOLUTE_PUBLISHED, and it would be convenient to write files locally before copying them to a remote location.

For instance, if you'll be serving an ABSOLUTE_PUBLISHED catalog at https://example.com/catalog.json, you may want to write out the STAC files locally that have all links valid for the final destination, and then use file copying to write those files from the local machine up to their final destination.

The goal of this issue is to add stactools CLI commands to allow for that use case to be easily accomplished.
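Recent PySTAC versions appear to cover much of this (catalog.normalize_hrefs(remote_base) followed by catalog.save() with a dest_href; worth verifying against the PySTAC docs). The essential mapping from final-destination URL to local staging path is simple either way; a sketch:

```python
import os
from urllib.parse import urlparse

def local_path_for(remote_href: str, remote_base: str, local_dir: str) -> str:
    """Map a final-destination URL onto a local staging path."""
    rel = os.path.relpath(urlparse(remote_href).path,
                          urlparse(remote_base).path)
    return os.path.join(local_dir, rel)

# hypothetical catalog layout for illustration
path = local_path_for(
    "https://example.com/catalog/items/a/a.json",
    "https://example.com/catalog",
    "/tmp/stage",
)
```

The written files keep their remote self/root links intact, so a plain file copy to the final destination yields a valid ABSOLUTE_PUBLISHED catalog.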

Set href title based on Catalog title

For nicer STAC Browser navigation it'd be great if on creating a new STAC if the link 'title' attribute of 'child' links was set to be the same as the 'title' of the catalog that is linked to. Could do similar things if title is set on the item. See:

[Screenshot: Screen Shot 2020-12-16 at 9.17.02 AM]

The first I set the title on, the other two don't have title set on the link. But they all have nice titles after you click on the links. Could be an option for 'copy' command, to 'update titles' (perhaps default to true), or a special command that you run. I'm open on the approach, it just strikes me that if you're working with these tools it has the ability to access that information and make it all work together.
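A sketch of the title propagation on raw dicts; loading each child from disk is elided and stood in for by a pre-built href-to-child mapping:

```python
def propagate_child_titles(catalog: dict, children_by_href: dict) -> None:
    """Copy each linked catalog's 'title' onto its 'child' link.

    `children_by_href` maps link href -> already-loaded child dict.
    """
    for link in catalog.get("links", []):
        child = children_by_href.get(link.get("href"))
        if link.get("rel") == "child" and child and "title" not in link:
            title = child.get("title")
            if title:
                link["title"] = title  # STAC Browser shows this in navigation

catalog = {"links": [{"rel": "child", "href": "a/catalog.json"}]}
propagate_child_titles(catalog, {"a/catalog.json": {"title": "Area A"}})
```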

Add more file extension fields to Planet output

It'd be good to make use of at least some of the file extension fields in Planet output. Most of these will be new for RC1, so may need pystac improvements to be able to do them.

Top ones to use I think are: data_type, size and nodata. I'm not sure exactly how to calculate them. It might be useful to try to do more of the fields too, like bits_per_sample, byte_order, checksum...

The fields could also be interesting for the UDM, but I'll put that in its own issue.
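Of the fields named above, size and checksum can be computed with the standard library alone; data_type and nodata would require opening the raster (e.g. rasterio's dtypes and nodata dataset attributes). A sketch of the cheap two (note the file extension specifies a multihash-encoded checksum; a plain sha256 hex digest is used here as a simplification):

```python
import hashlib
import os
import tempfile

def file_ext_fields(path: str) -> dict:
    """Compute file:size and a checksum for an asset.

    The file extension spec uses multihash encoding for file:checksum;
    this sketch substitutes a plain sha256 hex digest.
    """
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {"file:size": os.path.getsize(path), "file:checksum": digest}

# demo on a throwaway file standing in for an asset
with tempfile.NamedTemporaryFile(suffix=".tif", delete=False) as f:
    f.write(b"not actually a tif")
fields = file_ext_fields(f.name)
```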

Sentinel-2 band assets with multiple spatial resolutions

Hi all,

most bands of Sentinel-2 are provided in multiple spatial resolutions (10m, 20m, 60m). The current implementation does not handle this - only the latest asset in the list of image files is listed in the STAC document (the previous ones are overwritten because of the same asset key). I guess this is not intended.

The variable band_id returned should be unique - at least when describing the original Sentinel-2 dataset, e.g. B02-10m (similar to the auxiliary images, such as visual-10m, visual-20m, visual-60m). This would reference all image files available. Otherwise only the image file with the highest spatial resolution (not the latest in list) could be referenced as an asset.

band_id_search = re.search(r'_(B\w{2})_', asset_href)
if band_id_search is not None:
    band_id = band_id_search.group(1)
    band = SENTINEL_BANDS[band_id]
    asset = pystac.Asset(href=asset_href,
                         media_type=asset_media_type,
                         title=band.description,
                         roles=['data'])
    item.ext.eo.set_bands([SENTINEL_BANDS[band_id]], asset)
    set_asset_properties(asset)
    return (band_id, asset)

Best
Jonas

Documentation for development environment and pip editable install

I'm not accustomed to working with subpackages and pip so maybe I'm missing something, but I spent a while trying to setup a development environment for #84. Wanted to document some things here, and could follow up with some additions to documentation. I think this is also related to discussion in #95.

For Python package development I'm accustomed to installing in 'editable' mode (pip install -e) into a virtual environment. For example:

conda create -n stactools-dev python 
conda activate stactools-dev
git clone https://github.com/stac-utils/stactools.git
cd stactools
pip install -e . 
# Successfully installed stactools-0.1.4

If I try to run command line tools (scripts/stac --help), I end up with a traceback. This comes from the threedep subpackage, which I was not intending on installing or working with:

Traceback (most recent call last):
  File "/Users/scott/miniconda3/envs/stactools-test/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/Users/scott/miniconda3/envs/stactools-test/lib/python3.9/runpy.py", line 147, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/Users/scott/miniconda3/envs/stactools-test/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/Users/scott/Downloads/stactools/stactools/cli/__init__.py", line 27, in <module>
    registry.load_plugins()
  File "/Users/scott/Downloads/stactools/stactools/cli/registry.py", line 29, in load_plugins
    discovered_plugins = {
  File "/Users/scott/Downloads/stactools/stactools/cli/registry.py", line 30, in <dictcomp>
    name: importlib.import_module(name)
  File "/Users/scott/miniconda3/envs/stactools-test/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/Users/scott/Downloads/stactools/stactools/threedep/__init__.py", line 3, in <module>
    from stactools.threedep.metadata import Metadata
  File "/Users/scott/Downloads/stactools/stactools/threedep/metadata.py", line 14, in <module>
    from stactools.threedep import utils
  File "/Users/scott/Downloads/stactools/stactools/threedep/utils.py", line 3, in <module>
    import boto3
ModuleNotFoundError: No module named 'boto3'

But to get around it, I installed the threedep subpackage (pip install stactools_threedep) and then re-ran scripts/stac --help:

Traceback (most recent call last):
  File "/Users/scott/miniconda3/envs/stactools-test/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/Users/scott/miniconda3/envs/stactools-test/lib/python3.9/runpy.py", line 147, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/Users/scott/miniconda3/envs/stactools-test/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/Users/scott/Downloads/stactools/stactools/cli/__init__.py", line 27, in <module>
    registry.load_plugins()
  File "/Users/scott/Downloads/stactools/stactools/cli/registry.py", line 37, in load_plugins
    register_plugin(self)
  File "/Users/scott/Downloads/stactools/stactools/aster/__init__.py", line 11, in register_plugin
    from stactools.aster import commands
  File "/Users/scott/Downloads/stactools/stactools/aster/commands.py", line 9, in <module>
    from stactools.aster.cog import create_cogs
  File "/Users/scott/Downloads/stactools/stactools/aster/cog.py", line 14, in <module>
    from stactools.aster.xml_metadata import XmlMetadata
  File "/Users/scott/Downloads/stactools/stactools/aster/xml_metadata.py", line 9, in <module>
    from stactools.core.io.xml import XmlElement
  File "/Users/scott/Downloads/stactools/stactools/core/io/xml.py", line 4, in <module>
    from lxml import etree
ModuleNotFoundError: No module named 'lxml'

Strange, because lxml is a dependency of stactools_core. Glancing at the logs, I'm surprised that subpackages are not installed from local source and are instead downloaded; not sure if that is related to lxml not ending up in the environment?

Obtaining file:///Users/scott/Downloads/stactools
Collecting stactools_core==0.1.4
  Downloading stactools_core-0.1.4-py3-none-any.whl (7.3 kB)

workaround

What I'm currently doing is looping through all the subdirectories, maybe there is a better way?

conda create -n stactools-dev python 
conda activate stactools-dev
git clone https://github.com/stac-utils/stactools.git
cd stactools
pip install -e .
for dir in stactools_*; do (cd "$dir" && pip install -e .); done

After that I'm able to modify code (create a sentinel1 package and run terminal commands like stac sentinel1 create-item --help)

Add method for generating data footprints

I posted this in #84 (comment) for Sentinel-1 and #83 for Landsat, but it seems like it would be a good addition to stactools to be able to generate footprints when none are provided with the original metadata.


I spent some time looking into this for Sentinel-1 a while back and used rasterio to get the footprints. I also tried using overviews to generate the footprints so it would be faster and wouldn't require reading as much of the file, but it really worked quite a bit better using the full resolution file.

The resulting geometry tends to be far too detailed, so you have to simplify it. It's a balancing act: simplify enough that the geometry stays small, but not so much that you lose the detail you need.

You can see in the gifs below the difference between the provided geometry (blue) and the one calculated with the above code (red)

This is a typical image and the provided boundary overestimates the area cross-track, and underestimates it along-track.

[gif: sentinel1-boundaries-small]

This is a more unusual collect that is along the water where some of the water was masked out in the data, and we see the situation is even worse. The generated footprint in red is far better.

[gif: sentinel1-boundaries-odd-small]

There might be edge cases where this would fail, so before going operational I think it should be used to generate a few hundred boundaries for visual inspection.

If the image were already downloaded or locally available, it's not that bad to run this processing and certainly worth it for the better footprint.

The code is here:
https://gist.github.com/matthewhanson/6be66c97c828acd1d39d8cbb97a0981e

Command / module to 'cog-ify' relevant assets

Often I'll download data, make a catalog from it, and then realize it's in a non-cloud-optimized format. It would remove a lot of pain if there were a command that would crawl the entire STAC catalog, convert the relevant assets to COGs, and update the assets as needed (for example, when converting from JP2 to COG, or just making sure the media type is right). Ideally the tool could also just 'report' on items that are not COGs before you decide to run the conversion.

This would need to depend on GDAL or rio-cogeo, so probably best in its own module.

Convert pointcloud .las to STAC

It'd be great to have a stactool that can convert from a point cloud in the LAS format to a STAC item, and perhaps a way to handle a directory of point clouds and convert that to a catalog. It may also be interesting to have an explicit 3DEP one, if it has particular metadata.

Edit properties by catalog - bulk edit

Often when working with a catalog I'd like to add a new field to all of the items in it. It would be nice if I could run a stac command that sets a value for a property for all the items beneath the catalog.

stac copy with SELF_CONTAINED still has absolute links for assets

The main goal I'm trying to accomplish here is to be able to make a stac catalog that has asset links that are all relative, so I can upload to the cloud or zip up and send to someone. One route I tried was move-assets, and my problems there are in #30.

The other thing I was hoping would help was to do stac copy -t SELF_CONTAINED. I assumed that it would make it so all the asset links would turn relative. This didn't happen, they all just stayed absolute, even if I included the assets in my copying. I also tried RELATIVE_PUBLISHED, and it didn't work either.

Maybe I'm expecting too much, and asset links are only supposed to be updated by move-assets; but if so, then we need to figure out #30, and we should also document that behavior here.

Copy or merge catalog as a child

A use case I've been encountering is that I've created a small catalog or collection, and I'd like to 'add' it to my larger catalog. It'd be great if there was a command that let me add this catalog as a 'child' in the right catalog/collection that I choose. I think this could be a stac copy where you add a 'parent' catalog as an argument, and it copies the full catalog 'underneath' it. Or it could be a stac merge where you specify that the catalog being copied should be a child of your destination, instead of merge into it.

Add view:azimuth to Planet conversion

I originally left the satellite_azimuth field in Planet off of view, as I wasn't sure it was the same. @matthewhanson confirmed online that they should be the same.

So we should stop setting pl:satellite_azimuth and just use view:azimuth, with the exact same value.

STAC Command to redo catalogs based on directory

The stac layout command is great, but often the organization scheme I want is not based on a clear property; it's something derived spatially.

As a user I would like to be able to take an existing set of STAC items and re-organize them in the end folder structure I want, and then have a STAC command that crawls the directory structure and makes a new catalog derived from it. Obviously the user would have to get the assets right - either copy them right, or else refer to them in an alternate location. The command would just ignore the existing catalog structure, and create new catalogs based on the items that are contained in my new folder organization.

(An alternate approach to this problem could be cool, which would be a command that adds a 'country' or 'location' property that is a string, made from a reverse geocode of the location, and then it could just work with the existing layout directory.)

Add a subcommand to print a summary of static catalog

From @cholmes via Gitter:

'catalog summary'. Reports back number of items in the whole catalog, and number of collections.
Perhaps items per collection, but not necessary as I could just point the CLI at a collection directly to get that (since it's also a catalog).

This could be implemented as an info command:

> stactools info /some/catalog.json

a la gdalinfo and friends.

Add proj:shape and proj:transform to Planet output

It'd be good to include the proj:shape and proj:transform fields in the Planet data output. Planet's columns plus origin_x and origin_y communicate the same information. I believe/hope we could get the shape and transform from those values (though how to do so is beyond me). If not, then perhaps we could look at the Planet data referenced, and/or I can work to get Planet to add it to their data API.
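
If the pixel size is known, the derivation is straightforward for north-up imagery. A sketch (the parameter names are guesses at the Planet metadata fields, and it assumes square pixels with no rotation):

```python
def proj_fields(origin_x, origin_y, pixel_size, rows, columns):
    """Derive proj:shape and proj:transform from an origin and pixel size.

    Assumes north-up imagery with square pixels and no rotation.
    """
    return {
        "proj:shape": [rows, columns],  # [height, width] in pixels
        # Row-major affine: x = a*col + b*row + c ; y = d*col + e*row + f.
        # The y pixel size is negative because row indices increase downward.
        "proj:transform": [pixel_size, 0.0, origin_x,
                           0.0, -pixel_size, origin_y],
    }
```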

Enable stactools to work with STAC APIs

Currently stactools and the stac CLI only work with static catalogs. This issue marks the need for each command, where appropriate, to work with a STAC API endpoint. Each command will require thinking through how to interact with the STAC API to provide similar functionality, and deciding whether the functionality can be baked into the existing command or needs its own STAC API-specific command.

STAC Command to add 'location string'

It'd be cool if there was a stac add-location command that would use a reverse geocoding service to take the geometry of each item in a catalog and add a string or strings to its properties. Ideally the user could specify whether they want country, region, or place, and whether each should be its own property or all concatenated into one string.

Idea from #19

Failing to transform stac 0.7 to 1.0

The process was failing because neither B10 nor B2 was found within the Landsat 5 STAC.
The issue is addressed in the following PR, which adds B6 as a fallback for B10.
#96

As soon as this change is added to a new version, 0.1.5 I suppose, I'll be adding it to our project.

Thank you for your attention and time.

Better options to reorganize catalogs / items

As a user setting up a small catalog I often want to group things into catalogs in particular ways. The defaults and the automated 'templates' are nice, but I have some sets of images that really belong in one 'folder' with no clear template to generate it.

I'd like some way to reorganize the structure easily. Today I'll manually move files, but then I have to correct a lot of links. Potential ideas:

  • Add a 'move item' command, which takes an item json and a destination catalog, and puts the item in the new catalog, updating all the relevant links.
  • Add an 'add item' command, so I can move the item JSON manually and then 'reconnect' it, ignoring its links. Could also have a 'remove item' command to remove the links.

#19 would also likely satisfy, but this is a more granular approach.

Landsat's `transform_stac_to_stac` fails when missing some expected assets

@gadomski I have changed the code; we no longer test for a specific band or image, and instead get all the information from the first available GeoTIFF image.
I also fixed the test that previously returned an error because the first image wasn't accessible.

Could you please have a look, review, and merge that for me?

Any extra information, please let me know.

Thank you

PR: #100

Add options to skip item counts in stac info

Add an option to the stac info command to skip counting items and to just collect info on Catalogs and Collections, to allow for speedy information to be reported on larger catalogs.

What should go in a subpackage's requirements.txt?

As raised in #84 (comment) and the subsequent discussion, it's not clear how stactools subpackages should define their dependencies. Should subpackages explicitly call out every third-party package they use (e.g. rasterio), even if these packages are also required by stactools_core?

I've pulled out the main arguments below. Once we resolve the question, we should

  1. Update all subpackages to conform, and
  2. Document the decision in https://github.com/stac-utils/stactools#adding-a-new-sub-package

All third-party package dependencies

As @kylebarron said (#84 (comment)):

It makes it more complicated if you have to remember "well we import rasterio, but use it as a side effect of stactools-core, so if we ever change that dependency in stactools-core, we'll have to remember it has changes here"

Only dependencies not included in stactools_core

I thought (#84 (comment)):

These packages are tightly coupled such that stactools_sentinel1 is never going to live in an environment w/o stactools_core.

also

By repeating the dependency multiple places in the same environment, you might inadvertently overwrite a pinned dependency, e.g. the bare rasterio dependency in this PR (#84) could do a major/minor rasterio bump over the pinned version in stactools_core.
