nasa / harmony-py

Python client library for working with NASA’s Earth observing system data using Harmony. https://harmony.earthdata.nasa.gov

License: Other

Jupyter Notebook 17.26% Python 81.97% Makefile 0.78%

harmony-py's Introduction

harmony-py

Harmony-Py is a Python library for integrating with NASA's Harmony Services.

Harmony-Py provides a Python alternative to directly using Harmony's RESTful API. It handles NASA Earthdata Login (EDL) authentication and optionally integrates with the CMR Python Wrapper by accepting collection results as a request parameter. It's convenient for scientists who wish to use Harmony from Jupyter notebooks as well as machine-to-machine communication with larger Python applications.

We welcome feedback on Harmony-Py via GitHub Issues.

Using Harmony Py

Prerequisites

Python 3.8+

Installing

The library is available from PyPI and can be installed with pip:

    $ pip install -U harmony-py

This will install harmony-py and its dependencies into your current Python environment. It's recommended that you install harmony-py into a virtual environment along with any other dependencies you may have.

Running Examples & Developing on Harmony Py

Prerequisites

Python 3.7+, ideally installed via a virtual environment

Installing Development & Example Dependencies

Install dependencies:
```
 $ make install
```
Optionally register your local copy with pip:
```
 $ pip install -e ./path/to/harmony_py
```

Running the Example Jupyter Notebooks

Jupyter notebooks in the examples subdirectory show how to use the Harmony Py library. Start up the Jupyter Lab notebook server and run these examples:

    $ make examples

The Jupyter Lab server will start and open in your browser. Double-click on a notebook in the file-browser sidebar and run the notebook. Note that some notebooks may have cells which prompt for your EDL username and password. Be sure to use your UAT credentials since most of the example notebooks use the Harmony UAT environment.

Developing

Generating Documentation

Documentation on the Read The Docs site is generated automatically. It is generated by using sphinx with reStructuredText (.rst) and other files in the docs directory. To generate the docs locally and see what they look like:

    $ make docs

You can then view the documentation in a web browser under ./docs/_build/html/index.html.

IMPORTANT: The documentation uses a notebook from the examples directory rendered as HTML. If you've modified that notebook (see the Makefile for the notebook that is currently rendered), you will need to run make docs locally. You will see a change to the docs/user/notebook.html file after doing so. This file should be committed to the git repo since it is used when the latest docs are pushed to the Read The Docs site (it can't currently be generated as part of the build).

Running the Linter & Unit Tests

Run the linter on the project source:

    $ make lint

Run unit tests and test coverage. This will display terminal output and generate an HTML coverage report in the htmlcov directory.

    $ make test

For development, you may want to run the unit tests continuously as you update tests and the code-under-test:

    $ make test-watch

Generating Request Parameters

The harmony.Request constructor can accept parameters that are defined in the Harmony OGC API schema. If this schema has been changed and the Request constructor needs to be updated, you may run the generator utility. This tool reads the Harmony schema and generates a partial constructor signature with docstrings:

    $ python internal/genparams.py ${HARMONY_DIR}/app/schemas/ogc-api-coverages/1.0.0/ogc-api-coverages-v1.0.0.yml

Either set HARMONY_DIR or replace it with your Harmony project directory path. You may then write standard output to a file and then use it to update the harmony.Request constructor and code.
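
As a rough illustration of what such a generator does, here is a minimal sketch. It is not the actual internal/genparams.py: the inline schema fragment, the type mapping, and the function names are all assumptions made for the example, and the real tool reads the Harmony YAML schema rather than a dict.

```python
import re

# Hypothetical schema fragment standing in for the parsed Harmony YAML file.
PARAMETERS = {
    "granuleId": {"type": "array", "description": "Granule IDs to retrieve"},
    "maxResults": {"type": "integer", "description": "Limit on the number of results"},
}

# Assumed mapping from schema types to Python type hints.
TYPE_MAP = {
    "string": "str",
    "integer": "int",
    "number": "float",
    "boolean": "bool",
    "array": "List[str]",
}


def to_snake_case(name):
    """Convert a camelCase schema name to a snake_case argument name."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def render_signature(parameters):
    """Return one keyword-argument line per schema parameter."""
    lines = []
    for name, spec in parameters.items():
        py_type = TYPE_MAP.get(spec.get("type"), "Any")
        lines.append(f"{to_snake_case(name)}: {py_type} = None,")
    return lines


def render_docstring(parameters):
    """Return one docstring line per schema parameter."""
    return [
        f"{to_snake_case(name)}: {spec.get('description', '')}"
        for name, spec in parameters.items()
    ]


if __name__ == "__main__":
    print("\n".join(render_signature(PARAMETERS)))
    print("\n".join(render_docstring(PARAMETERS)))
```

The output of a script like this is only a starting point; the generated lines still need to be reviewed and merged into the constructor by hand.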

CI

Harmony-py uses GitHub Actions to run the Linter & Unit Tests. The test coverage output is saved as a build artifact.

Building and Releasing

New versions of Harmony-Py are published to PyPI via a GitHub Action whenever a draft release is marked as published at https://github.com/nasa/harmony-py/releases.

harmony-py's People

Contributors

Stargazers

Watchers

Forkers

bilts vermeerlee anushkrishnav cthi77 nasa-openscapes hailiangzhang hailiangzhangnasa lauro-cesar kira-hart frankinspace standardgalactic owenlittlejohns python-repository-hub chathu84 terminalcult eni-awowale jhkennedy

harmony-py's Issues

Conda forge recipe

It would be great to have a conda-forge recipe for Harmony, to facilitate installation and version management via conda.

Here are the docs for adding recipes to conda-forge. I played around with this briefly and it is very straightforward to create recipes from pip using grayskull; it took me about 3 minutes to produce the recipe file below.

Once the recipe is created, I think all you have to do is submit a PR to https://github.com/conda-forge/staged-recipes.

Since it doesn't seem like much work, I can do this myself and implicitly volunteer as a maintainer of the recipe. However, I thought I'd post on here first to see if there were any reasons not to do this and/or if someone internal to ESDIS would step up to maintain the recipe instead.

Example recipe file

{% set name = "harmony-py" %}
{% set version = "0.4.2" %}

package:
  name: {{ name|lower }}
  version: {{ version }}

source:
  url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/harmony-py-{{ version }}.tar.gz
  sha256: 69ac699973c8dc430780cb6a2169a072eeffec7e11be4e6d0b4da8755b086d8f

build:
  noarch: python
  script: {{ PYTHON }} -m pip install . -vv
  number: 0

requirements:
  host:
    - pip
    - python >=3.6,<4.0
  run:
    - curlify >=2.2,==2.*
    - progressbar2 >=3.5,==3.*
    - python >=3.6,<4.0
    - python-dateutil >=2.7.5,==2.7.*
    - python-dotenv >=0.1,==0.*
    - requests >=2.2,==2.*
    - sphinxcontrib-napoleon >=0.7

test:
  imports:
    - harmony
  commands:
    - pip check
  requires:
    - pip

about:
  home: https://github.com/nasa/harmony-py
  summary: The NASA Harmony Python library
  license: Apache-2.0
  license_file: LICENSE

extra:
  recipe-maintainers:
    - ashiklom
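
The sha256 value in the recipe has to match the source tarball downloaded from PyPI. A minimal sketch for computing that digest locally, assuming the tarball has already been downloaded (the function name and the file path in the usage note are illustrative):

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha256_of_file("harmony-py-0.4.2.tar.gz") and comparing the result with the sha256 field is a quick sanity check before submitting the recipe.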

Introduction and first-glance high-level thoughts

Hello! I'm Jessica, and @asteiker and I have been chatting about this awesome new library. I'm the lead developer for icepyx, which leverages NSIDC's API for programmatic access to ICESat-2 data. I'd encourage you to check out our code base, as many of the tools I've developed there could readily be moved to a higher level library like this one (they'd just need some generalization where I've made them ICESat-2 specific and probably some transition to better practices since I based them primarily on one NSIDC data access notebook from a couple years ago). I'd be happy to help build on those tools here and then leverage them within icepyx as appropriate. I took a look at your Harmony introduction notebook and wanted to provide some high level first thoughts about where it looks like harmony-py is going and how that might align with icepyx dev goals (and with the acknowledgement that some of these things might already be in the works).

As it shows currently, the notebook showcases functionality that I would argue is too low level for most science users. For instance, functions such as setup_earthdata_login_auth should be wrapped entirely under the hood. I suspect this is the path that particular function is headed on, but that's less clear as you go and to my eye, there's a lot of code here the user should never see. For a case in point, check out the icepyx modules for granules and variables (https://github.com/icesat2py/icepyx/tree/development/icepyx/core). Most users don't even know they exist to handle granule searches/availability and variable-based subsetting. Inputs for data orders and downloads are validated and formatted automatically for the user in API_formatting.py, so they can enter the date as a string and not worry about it being put into the proper URL string format or dictionary with the appropriate key to submit. Preferred submission params are used by default, with the option for the user to edit them by updating the relevant parameter dictionary stored with the search parameters. For most use cases, users interact with just a single query object, which includes properties that return pretty-formatted abridged and full information about the datasets, what subsetting options are available, and what granules meet their search criteria. I'm excited for the possibilities being opened by having a Python library specifically aimed at easier programmatic access for NASA datasets and eager to chat with folks about how our work can be complementary!

What is the minimum python version?

It's unclear what the minimum Python version should be for harmony-py, as it's reported differently in various places:

The setup.py states Python 3.6+: https://github.com/nasa/harmony-py/blob/main/setup.py#L61
The "Using Harmony Py" section of the README lists Python 3.8+: https://github.com/nasa/harmony-py?tab=readme-ov-file#using-harmony-py
The "Running Examples & Developing on Harmony Py" section of the README lists Python 3.7+: https://github.com/nasa/harmony-py?tab=readme-ov-file#running-examples--developing-on-harmony-py

Since Python 3.6 and 3.7 are EOL, I suggest aligning everything on 3.8+.
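
One way to keep the stated minimum from drifting is to define it once and check against it. A minimal sketch, assuming 3.8 is the agreed floor (the constant and function name are illustrative and not part of harmony-py):

```python
import sys

# Assumed floor, following the suggestion to align everything on 3.8+.
MINIMUM_PYTHON = (3, 8)


def meets_minimum(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (major, minor, ...) version is supported."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major and minor so patch releases do not matter.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not meets_minimum():
        raise SystemExit(
            "Python %d.%d or newer is required" % MINIMUM_PYTHON
        )
```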

Consider removing upper bounds from dependencies in setup.py

harmony-py currently pins its core (and develop) dependencies down to compatible releases, specified all the way down to a specific patch set in some instances.

Since Python has a flat dependency tree (can't have two versions of the same package in an environment) and harmony-py is a library^, not an application, this is overly restrictive.

For example, python-dotenv is pinned to ~=0.20.0 (equivalently >=0.20.0,==0.20.*), a version that was released two years ago and is a major version behind (current v1.0.1). If any package in a user's environment depends on python-dotenv>=0.21.0, harmony-py cannot be installed.
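
To make the effect of that pin concrete, here is a simplified sketch of the compatible-release rule. It handles plain dotted numeric versions only and is not a full PEP 440 implementation; the helper names are mine.

```python
def parse(version):
    """Split a plain dotted version such as '0.20.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate, base):
    """Return True if candidate satisfies ~=base (numeric versions only)."""
    base_parts = parse(base)
    if len(base_parts) < 2:
        raise ValueError("~= needs at least two release segments")
    cand_parts = parse(candidate)
    # Pad with zeros so that '0.20' and '0.20.0' compare as equal.
    width = max(len(base_parts), len(cand_parts))
    cand_padded = cand_parts + (0,) * (width - len(cand_parts))
    base_padded = base_parts + (0,) * (width - len(base_parts))
    # ~=0.20.0 means >=0.20.0 with the '0.20' prefix held fixed.
    prefix = base_parts[:-1]
    return cand_padded >= base_padded and cand_padded[: len(prefix)] == prefix


if __name__ == "__main__":
    for version in ("0.20.0", "0.21.0", "1.0.1"):
        print(version, compatible_release(version, "0.20.0"))
```

With the pin at ~=0.20.0, both 0.21.0 and 1.0.1 are rejected, which is exactly the conflict described above.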

There's an excellent long-form discussion of upper bounds here: https://iscinumpy.dev/post/bound-version-constraints/

Notably, this problem also exists with over-constrained lower bounds, although it's less problematic over time.

I'd suggest including the dependencies directly in the project metadata (setup.py/setup.cfg/pyproject.toml) and only excluding known incompatibilities with version bounds, and only migrating bounds when new incompatibilities are discovered or resolved.

For development, it's still often desirable to have a known working environment for developers, which is where requirements.txt files come into play.

While this requires some duplication as new requirements must be added to both the project metadata and the development environment specification, that cost significantly improves the usability of the library.

^ Here, I'm using a common generalization of "library" and "application" to primarily distinguish between how packages are installed into environments and who owns said environments.

Library: intended to be installed into user environments alongside whatever packages are in their environments. The more restrictive the version bounds, the less likely it is possible to install the package in user environments, and the primary responsibility of the dependency bounds, in this case, is to say, "This package will not work with these package versions."
Application: intended to be installed into environments/infrastructure that the developers own and maintain but users do not have access to. It must work, so dependencies are strictly specified to ensure the environment is known to work.

Clarifying internal vs external contributing requirements

In the contributing guide, there's a section on "Commits" which refers to a seemingly internal ticketing tool (Jira?) and requests that you include ticket numbers in your commits.

For outside contributors, it isn't clear how to handle this, and it raises some questions, like:

Where is the ticketing tool, can I see it/access it, and should I be using it?
Should all contributors open a GitHub issue first and wait to receive a ticket number to be worked?
Do all feature branch commits need to have that prefix, or will branches be squash-merged and the prefix added at that point?
