cogrib - Cloud Optimized GRIB

This library has utilities for facilitating access to GRIB2 files stored in cloud Blob Storage.

The problem

Traditionally, to load data from a GRIB2 file you would need to first download the entire GRIB2 file to disk and then read a portion of it.

Disks are relatively slow
GRIB files contain many variables, and you might just be interested in a small portion of it

We'd like to access subsets of the GRIB2 file directly from Blob Storage, without having to download all or part of the file locally first.

We could just convert the GRIB2 file to Zarr. But we'll assume that the data provider has to host the unmodified GRIB2 files and doesn't want to host two copies of the data.

How it works

There are two distinct stages: First a data host scans the GRIB2 file for datasets and figures out which portions the GRIB2 file each dataset refers to. These references, which are byte offsets and lengths for each variable, are saved off to a "kerchunk index file." The index file contains

All the metadata (the dimensions, coordinate values, each variable's attributes, etc.)
References to the GRIB2 as (url, offset, length) tuples.

In code, that looks like

>>> datasets = cfgrib.open_datasets("/path/to/file.grib2")  # must be local
>>> references = [cogrib.make_references(ds) for ds in datasets]

These references would be provided as, e.g., assets on a STAC item or collection.

Second, a user loads up this kerchunk index file and loads it into xarray using the Zarr engine using the "normal" Kerchunk / fsspec reference access pattern. For those unfamiliar, with this process, Zarr doesn't natively understand what to do with these references files. So we need a slightly intelligent Zarr store that will perform the HTTP range requests for the actual data from the GRIB2 file. The fsspec "reference" filesystem does just that.

>>> references = requests.get("http://path/to/references.json")
>>> store = fsspec.filesystem("reference", fo=references).get_mapper("")
>>> ds = xr.open_dataset(store, engine="zarr", chunks={})

Why isn't this in kerchunk / fsspec's reference filesystem?

It probably should be. Just experimenting for now.

tomaugspurger / cogrib Goto Github PK

cogrib's Introduction

cogrib - Cloud Optimized GRIB

The problem

How it works

Why isn't this in kerchunk / fsspec's reference filesystem?

cogrib's People

Contributors

Stargazers

Watchers

Forkers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent