gsoc-kechunk-2022's People

Contributors

martindurant, peterm790, rsignell-usgs

gsoc-kechunk-2022's Issues

IOOS Success Story with LiveOcean forecast collection

This GSoC project was chosen by the IOOS organization, so I wanted to report that @peterm790's help in understanding a problem with the fill_value in led to a Kerchunk PR, which was then successfully applied to an IOOS model collection: the LiveOcean forecast collection from the NANOOS Regional Association of IOOS.

We took 24 sample hourly LiveOcean NetCDF files from ROMS, put them on the Open Storage Network, and kerchunked them into a single virtual Zarr dataset. We were also able to modify the metadata, adding the standard_name='time' attribute to the ocean_time variable.
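The metadata edit can be made directly on the combined reference JSON before the dataset is opened, because Zarr attributes live in the references as inline JSON text under keys like `ocean_time/.zattrs`. A minimal sketch, assuming the reference file has already been loaded with `json.load`; `add_variable_attribute` is a hypothetical helper, not a kerchunk function:

```python
import json

def add_variable_attribute(refs: dict, var: str, name: str, value) -> dict:
    """Set attribute `name` on variable `var` in a kerchunk reference set, in place.

    Hypothetical helper: assumes Zarr v2 metadata keys of the form "<var>/.zattrs".
    """
    # Version-1 reference sets nest the keys under "refs"; version-0 sets are flat.
    store = refs.get("refs", refs)
    key = f"{var}/.zattrs"
    # The ".zattrs" entry holds inline JSON text; start empty if it is missing.
    raw = store.get(key, "{}")
    attrs = json.loads(raw) if isinstance(raw, str) else dict(raw)
    attrs[name] = value
    store[key] = json.dumps(attrs)
    return refs
```

For the LiveOcean case this would be a call such as `add_variable_attribute(refs, "ocean_time", "standard_name", "time")`, after which the dict can be written back out with `json.dump`.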

Here is a notebook demonstrating access and simple visualization:

Snapshot: [image: 2022-06-22_15-13-30]

LCMAP example: constructs a time coordinate from info found in the filenames

I didn't remember this quite right: we didn't read the original GeoTIFF files here, because kerchunk doesn't yet create xarray spatial coordinates when reading GeoTIFF. So we first converted the GeoTIFF files to NetCDF, and it's the NetCDF files we are using in this example:

https://nbviewer.org/gist/cea8e1ee16e318e05c128353482836b9
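The idea of deriving the time coordinate from filenames can be sketched with the standard library alone. This is a minimal sketch under a loudly hypothetical naming scheme (one four-digit year between underscores, e.g. `lcmap_cu_1985_v13.nc`); the real LCMAP filenames and the notebook's own code may differ. It returns CF-style "days since" values plus the matching units string, which is the information a combine step needs for a time dimension:

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern: assumes each file name contains a single "_YYYY_" year.
YEAR_RE = re.compile(r"_(\d{4})_")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def time_from_filename(path: str) -> datetime:
    """Extract the year from a file name and return January 1 of that year (UTC)."""
    name = path.rsplit("/", 1)[-1]
    match = YEAR_RE.search(name)
    if match is None:
        raise ValueError(f"no year found in {name!r}")
    return datetime(int(match.group(1)), 1, 1, tzinfo=timezone.utc)

def cf_time_values(paths, epoch=EPOCH):
    """Return (values, units): whole-day offsets from `epoch`, sorted by time."""
    times = sorted(time_from_filename(p) for p in paths)
    values = [(t - epoch).days for t in times]
    return values, f"days since {epoch:%Y-%m-%d}"
```

Sorting by the parsed time, rather than by filename, keeps the coordinate monotonic even if the file listing comes back in arbitrary order.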

It would be good to add this example to kerchunk (it also shows accessing an S3 endpoint that is not on AWS!)
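Reaching an S3-compatible endpoint that is not on AWS mostly comes down to the storage options handed to s3fs. A minimal sketch: `s3_storage_options` is a hypothetical helper and the endpoint URL is a placeholder, while `anon`, `client_kwargs`, and fsspec's `reference` filesystem with `fo`, `remote_protocol`, and `remote_options` are existing fsspec/s3fs parameters:

```python
def s3_storage_options(endpoint_url: str, anon: bool = True) -> dict:
    """Build s3fs storage options for an S3-compatible endpoint outside AWS."""
    if not endpoint_url.startswith(("http://", "https://")):
        raise ValueError("endpoint_url must include the scheme, e.g. https://")
    # s3fs forwards client_kwargs to the botocore client, so endpoint_url
    # sends requests to the other provider instead of amazonaws.com.
    return {"anon": anon, "client_kwargs": {"endpoint_url": endpoint_url}}

# Sketch of use with a kerchunk reference file (needs fsspec, s3fs, and
# network access, so it is left as a comment):
#   import fsspec
#   opts = s3_storage_options("https://example-endpoint.org")  # placeholder URL
#   fs = fsspec.filesystem("reference", fo="combined.json",
#                          remote_protocol="s3", remote_options=opts)
#   mapper = fs.get_mapper("")
```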

Develop code to convert NcML to Kerchunk JSON?

The NetCDF Markup Language (NcML) has been widely used for decades to create virtual NetCDF datasets that point to collections of NetCDF files and add/modify global and variable metadata and/or values.

It would be very useful to have a package that parses this info from an NcML XML file and generates the equivalent Kerchunk JSON virtual dataset.

A fairly complicated example of NcML that aggregates a collection of ROMS NetCDF files and makes them CF and SGRID compliant is here:

https://github.com/rsignell-usgs/xml/blob/master/THREDDS/geoport/COAWST_catalog.xml
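A first step toward such a package would be extracting the aggregation and attribute overrides from the NcML. A minimal sketch using the standard library's ElementTree and the NcML 2.2 namespace; `parse_ncml` and the shape of its returned "plan" dict are hypothetical, and it ignores typed attribute values, `<remove>` elements, and scan filters, all of which a real translator would need:

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
NS = {"nc": NCML_NS}

def _attrs(elem) -> dict:
    """Collect direct <attribute name=... value=...> children as strings."""
    return {a.get("name"): a.get("value") for a in elem.findall("nc:attribute", NS)}

def parse_ncml(xml_text: str) -> dict:
    """Extract the parts of an NcML document a kerchunk translator would need."""
    root = ET.fromstring(xml_text)
    # The <netcdf> element may be the root or embedded, e.g. in a THREDDS catalog;
    # document order means the first match is the outer (wrapping) element.
    if root.tag == f"{{{NCML_NS}}}netcdf":
        ncml = root
    else:
        ncml = root.find(".//nc:netcdf", NS)
    if ncml is None:
        raise ValueError("no NcML <netcdf> element found")
    plan = {
        "global_attrs": _attrs(ncml),
        "variable_attrs": {v.get("name"): _attrs(v) for v in ncml.findall("nc:variable", NS)},
        "aggregation": None,
    }
    agg = ncml.find("nc:aggregation", NS)
    if agg is not None:
        plan["aggregation"] = {
            "type": agg.get("type"),      # e.g. joinExisting, joinNew, union
            "dim": agg.get("dimName"),    # candidate concat dimension
            "files": [n.get("location") for n in agg.findall("nc:netcdf", NS)],
            "scans": [s.get("location") for s in agg.findall("nc:scan", NS)],
        }
    return plan
```

In kerchunk terms, the aggregation's file list and `dimName` would drive the combine step, and the attribute entries would become edits to the global and per-variable `.zattrs` keys of the resulting references.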

Create pangeo-forge recipes

@peterm790 , I think it would be useful to create pangeo-forge recipes for ERA5 and other datasets.

I didn't really see the point of using pangeo-forge for our workflows up until now, because:

  • we have functioning workflows for kerchunk already running elsewhere
  • it doesn't yet have a way to update on a schedule
  • it doesn't currently handle kerchunking of grib2 files

But @sharkinsspatial convinced me at SciPy2022 that even if we didn't need pangeo-forge, there was value in capturing the workflow for others using the pangeo-forge formalism. They are also apparently working on the scheduled-update part.

At the SciPy sprint on pangeo-forge, I created my first recipe, following the example here: https://pangeo-forge.readthedocs.io/en/latest/pangeo_forge_recipes/tutorials/hdf_reference/reference_cmip6.html
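Pangeo-forge recipes start from a file pattern: a format function that maps each key along a concatenation dimension (such as time) to a source URL. The stdlib sketch below only mimics that idea for a monthly ERA5-style collection; it is not the pangeo-forge-recipes API, and the URL template is hypothetical:

```python
from datetime import date
from typing import Iterator, Tuple

# Hypothetical URL template for monthly files; not a real ERA5 location.
URL_TEMPLATE = "https://example.org/era5/{year:04d}/era5_{year:04d}{month:02d}.grib"

def make_url(year: int, month: int) -> str:
    """Format function: map one (year, month) key to its source URL."""
    return URL_TEMPLATE.format(year=year, month=month)

def monthly_pattern(start: date, end: date) -> Iterator[Tuple[Tuple[int, int], str]]:
    """Yield ((year, month), url) for every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield (year, month), make_url(year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1
```

Keeping the key-to-URL mapping in one small function is what lets a recipe (or a scheduled update) regenerate the input list for any time range.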

@martindurant, two questions:

  • do you think this is a good idea?
  • should I submit an issue to create a GRIB2ReferenceRecipe, or should it be a more generic ReferenceRecipe?

Add Peter to the ESIP JupyterHub on AWS

The ESIP JupyterHub on AWS (https://jupyter.qhub.esipfed.org) is a great place to run workflows that analyze and visualize kerchunked data because:

  • it runs on AWS in us-west-2, so it has very fast network access
  • if the data being accessed is on AWS, performance is good and the data transmission (egress) costs for requester-pays buckets are low (and if the data is in AWS us-west-2, those costs are zero!)
  • you can easily scale up remote Dask clusters on demand to achieve high data throughput on chunked data
  • there are conda environments already set up for this type of work
