blaylockbk / herbie
465 stars · 14 watchers · 73 forks · 169.4 MB

Download numerical weather prediction datasets (HRRR, RAP, GFS, IFS, etc.) from NOMADS, NODD partners (Amazon, Google, Microsoft), ECMWF open data, and the University of Utah Pando Archive System.

Home Page: https://herbie.readthedocs.io/

License: MIT License

Python 99.26% Makefile 0.74%
grib hrrr cfgrib xarray noaa-data big-data-program python rap nomads grib2

herbie's People

Contributors

alcoat, alexander0042, amotl, blaylockbk, cyrilbois, djgagne, fleegs79, gabrielks, gitter-badger, haim0n, haimjether, incubatorshokuhou, joshuaeh, karlwx, rafa-guedes, swnesbitt, williamhobbs, wtoma


herbie's Issues

Function to make soundings at any latitude/longitude point.

This is a common request: pluck out vertical data at a point and build a sounding. It would be really useful; I just need the time to implement it.

It wouldn't be very efficient, because I would need to download most of the data on the pressure levels, but it could still be handy.
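A minimal sketch of how this could work, assuming HRRR pressure-level data with 2-D latitude/longitude coordinates (the search string, coordinate names, and longitude convention are assumptions to verify against the actual dataset):

import numpy as np
from herbie.archive import Herbie

H = Herbie("2021-07-01 00:00", model="hrrr", product="prs")
ds = H.xarray(r":TMP:\d+ mb")  # temperature on all pressure levels

# Find the grid point nearest the requested latitude/longitude.
# (HRRR longitudes are often stored 0-360, hence the conversion.)
lat, lon = 40.77, 360 - 111.97
dist = (ds.latitude - lat) ** 2 + (ds.longitude - lon) ** 2
j, i = np.unravel_index(int(dist.argmin()), dist.shape)

sounding = ds.isel(y=j, x=i)  # the vertical profile at that point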

Subsetting issue on Windows, file not downloaded

Discussed in #84

Originally posted by imcslatte July 13, 2022
I've installed Herbie on a Windows 10 machine running Python 3.10. My objective is to generate regional subsets of surface met variables (ugrd, vgrd, tmp at 2 m, sfcpress, etc.) for use as forcing for a regional ocean model. The first step is to subset the data by variable.

I started with a simple example from the tutorial:

from herbie.archive import Herbie
H = Herbie('2022-7-12 00:00',product='sfc',source='aws')
✅ Found ┊ model=hrrr ┊ product=sfc ┊ 2022-Jul-12 00:00 UTC F00 ┊ GRIB2 @ aws ┊ IDX @ `aws`
searchstring=':TMP:2 m'
H.read_idx(searchstring)
  grib_message start_byte end_byte range reference_time valid_time variable level forecast_time search_this
71 36433970 37698543 36433970-37698543 2022-07-12 2022-07-12 TMP 2 m above ground anl :TMP:2 m above ground:anl
H.download(verbose=True)
✅ Success! Downloaded HRRR from local               
	src: C:\Users\Eli Hunter\data\hrrr\20220712\hrrr.t00z.wrfsfcf00.grib2
	dst: C:\Users\Eli Hunter\data\hrrr\20220712\hrrr.t00z.wrfsfcf00.grib2
WindowsPath('C:/Users/Eli Hunter/data/hrrr/20220712/hrrr.t00z.wrfsfcf00.grib2')

Downloading the entire file works fine. It appears in my home directory as expected.

However, when I try this:

H.download(searchstring,verbose=True)
📇 Download subset: ▌▌Herbie HRRR model sfc product initialized 2022-Jul-12 00:00 UTC F00 ┊ source=local                                                            
 cURL from file://C:\Users\Eli Hunter\data\hrrr\20220712\hrrr.t00z.wrfsfcf00.grib2
  71  :TMP:2 m above ground:anl
💾 Saved the subset to C:\Users\Eli Hunter\data\hrrr\20220712\subset_b7103ca278a75cad8f7d065acda0c2e80da0b7dc__hrrr.t00z.wrfsfcf00.grib2
WindowsPath('C:/Users/Eli Hunter/data/hrrr/20220712/subset_b7103ca278a75cad8f7d065acda0c2e80da0b7dc__hrrr.t00z.wrfsfcf00.grib2')

The output suggests a subset file was downloaded and is now in my home directory. However, the file subset_b7103ca278a75cad8f7d065acda0c2e80da0b7dc__hrrr.t00z.wrfsfcf00.grib2 does not exist. And the call was instantaneous; there did not seem to be any delay associated with a file download.

Is there a configuration step I am missing?

Thanks,
Eli

Docs: Restore my original header color (for light theme)

The new PyData Sphinx dark theme is awesome! But how do I restore the header color? The header banner background is now white instead of tan.

https://pydata-sphinx-theme.readthedocs.io/en/stable/user_guide/customizing.html?highlight=css#customize-the-css-of-light-and-dark-themes

This didn't seem to help:

.bd-header[data-theme="light"] {
    background-color: #f0ead2 !important;
}

a:-webkit-any-link[data-theme="light"] {
    color: #25529b !important;
}
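One untested guess: the PyData theme sets the data-theme attribute on the <html> element rather than on each component, so the attribute selectors may need to qualify html instead of .bd-header:

/* data-theme lives on <html>, so qualify the selectors with html[...] */
html[data-theme="light"] .bd-header {
    background-color: #f0ead2 !important;
}

html[data-theme="light"] a:-webkit-any-link {
    color: #25529b !important;
}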

On Windows, Herbie can't remove subset file

When I open a subset file with xarray, Herbie tries to remove the file if it didn't exist before (some basic clean-up). But this doesn't work on Windows:

H = Herbie(
    "2021-10-9",
    model="hrrr",
    product="prs",
)

ds = H.xarray('^TMP:2 m')
PermissionError: [WinError 32] 
The process cannot access the file because it is being used by another process: 
'C:\\Users\\blayl_depgywe\\data\\hrrr\\20211009\\hrrr.t00z.wrfprsf00.grib2.subset_8746b7e5d534efa196e92e53c61ec747f4c936a5'

The offending line is Line 761: local_file.unlink() # Removes file
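A hedged sketch of one possible workaround (variable names follow the snippet above, not Herbie's actual source): load the data into memory and close the handle before unlinking, and tolerate the Windows file lock if it persists.

ds = ds.load()  # pull all values into memory
ds.close()      # release cfgrib's open file handle

try:
    local_file.unlink()  # remove the temporary subset file
except PermissionError:
    # Windows keeps a file locked while any handle is open;
    # leave the file in place rather than crash.
    pass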

Move subset filename hash back to the end of the filename

Hey Brian,

I just wanted to add a comment here. Let me know if I should post this as a separate issue.

I noticed when downloading subset ECMWF operational forecast GRIB files that the subset-hash portion of the filename changes even though the requested variable does not. Because the hash changes, the filenames sort with the forecast hours out of order, so the files need to be renamed (e.g. by removing the hash) to list them in order.

An example of what I mean is shown below. The files are for a single variable (MSLP) over a 48 hr forecast period. I double-checked in pygrib that all the files do subset the same variable (MSLP).

subset_0716d9708d321ffb6a00818614779e779925365c__20220512000000-24h-oper-fc.grib2
subset_0716d9708d321ffb6a00818614779e779925365c__20220512000000-39h-oper-fc.grib2
subset_12c6fc06c99a462375eeb3f43dfd832b08ca9e17__20220512000000-0h-oper-fc.grib2
subset_12c6fc06c99a462375eeb3f43dfd832b08ca9e17__20220512000000-15h-oper-fc.grib2
subset_22d200f8670dbdb3e253a90eee5098477c95c23d__20220512000000-18h-oper-fc.grib2
subset_22d200f8670dbdb3e253a90eee5098477c95c23d__20220512000000-30h-oper-fc.grib2
subset_22d200f8670dbdb3e253a90eee5098477c95c23d__20220512000000-9h-oper-fc.grib2
subset_632667547e7cd3e0466547863e1207a8c0c0c549__20220512000000-6h-oper-fc.grib2
subset_761f22b2c1593d0bb87e0b606f990ba4974706de__20220512000000-12h-oper-fc.grib2
subset_7719a1c782a1ba91c031a682a0a2f8658209adbf__20220512000000-42h-oper-fc.grib2
subset_887309d048beef83ad3eabf2a79a64a389ab1c9f__20220512000000-33h-oper-fc.grib2
subset_bc33ea4e26e5e1af1408321416956113a4658763__20220512000000-45h-oper-fc.grib2
subset_bd307a3ec329e10a2cff8fb87480823da114f8f4__20220512000000-21h-oper-fc.grib2
subset_bd307a3ec329e10a2cff8fb87480823da114f8f4__20220512000000-3h-oper-fc.grib2
subset_f6e1126cedebf23e1463aee73f9df08783640400__20220512000000-36h-oper-fc.grib2
subset_fa35e192121eabf3dabf9f5ea6abdbcbc107ac3b__20220512000000-27h-oper-fc.grib2

This does not seem to occur for the other models I have tried (i.e. GFS, NAM, HRRR), so I assume it has something to do with how ECMWF releases and/or packages their data?

The only quick fix I could suggest would be moving the hash portion of the filename to the end, which would look something like:

20220512000000-0h-oper-fc_subset_12c6fc06c99a462375eeb3f43dfd832b08ca9e17.grib2

Maybe there are issues with this suggestion, though. Anyway, thanks for your time and attention. I hope this helps to improve the Herbie package.

Originally posted by @mariandob in #60 (comment)
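For files already on disk, a hypothetical cleanup along the lines of this suggestion might look like the following (the filename pattern is inferred from the listing above):

import re
from pathlib import Path

for f in Path(".").glob("subset_*__*.grib2"):
    m = re.match(r"subset_([0-9a-f]+)__(.+)\.grib2$", f.name)
    if m:
        subset_hash, stem = m.groups()
        # e.g. 20220512000000-0h-oper-fc_subset_12c6fc06....grib2
        f.rename(f.with_name(f"{stem}_subset_{subset_hash}.grib2"))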

How to dump GRIB data into a text file.

I am truly awful at Python, and I can't seem to work with GRIB outside of Python, so please don't laugh too hard when you read my code:

from herbie.archive import Herbie
import numpy as np

H = Herbie('2022-01-26', model='ecmwf', product='oper', fxx=12)

ds = H.xarray(':2t:', remove_grib=False)

dsw = H.xarray(':10(u|v):', remove_grib=False)
ds['spd'] = np.sqrt(dsw['u10'] ** 2 + dsw['v10'] ** 2)

dsp = H.xarray(':tp:', remove_grib=False)
ds['tp'] = dsp['tp']

file = open('test.txt', 'a')
for lon in ds['longitude']:
    for lat in ds['latitude']:
        point = ds.sel(longitude=lon, latitude=lat, method='nearest')
        line = str(point['longitude'].values) + ',' + str(point['latitude'].values) + ',' + str(point['t2m'].values) + ',' + str(point['spd'].values) + ',' + str(point['tp'].values) + '\n'
        file.write(line)
file.close()

After 5 minutes it was like 5% done. I get why it's bad, but I honestly just don't want to spend a month learning Python.

I would prefer to just make a (405900, 5) array and store it as a raw blob of float32s, like so:

lon1,lat1,t2m1,spd1,tp1,.....,lonN,latN,t2mN,spdN,tpN

Any advice would be amazing.
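For what it's worth, a vectorized sketch (assuming the same ds as above, with 1-D latitude/longitude coordinates) that builds the whole (N, 5) table at once instead of looping point by point:

import numpy as np

lon2d, lat2d = np.meshgrid(ds["longitude"].values, ds["latitude"].values)
table = np.column_stack(
    [
        lon2d.ravel(),
        lat2d.ravel(),
        ds["t2m"].values.ravel(),
        ds["spd"].values.ravel(),
        ds["tp"].values.ravel(),
    ]
).astype(np.float32)

table.tofile("test.bin")                      # raw float32 blob, row-major (N, 5)
np.savetxt("test.txt", table, delimiter=",")  # or as text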

Exception 'PosixPath' object is not iterable when using the Herbie xarray method

There appears to be a bug in the Herbie.xarray method where the file path is not converted to a string before calling cfgrib.open_datasets.

Logs in question:

🏋🏻‍♂️ Found  2022-Feb-09 00:00 UTC F01 [HRRR] [product=sfc] GRIB2 file from aws and index file from aws.

Search string: APCP:surface:0-1 hour
👨🏻‍🏭 Created directory: [/tmp/hrrr/20220209]
📇 Download subset: [HRRR] model [sfc] product run at 2022-Feb-09 00:00 UTC F01
 cURL from https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20220209/conus/hrrr.t00z.wrfsfcf01.grib2
   1: GRIB_message=84  :APCP:surface:0-1 hour acc fcst
indexpath value  is ignored
No results found for search string: APCP:surface:0-1 hour, Exception: 'PosixPath' object is not iterable

After changing

        Hxr = cfgrib.open_datasets(
            self.get_localFilePath(searchString=searchString),
            backend_kwargs=backend_kwargs,
        )

to

        fp = self.get_localFilePath(searchString=searchString)
        Hxr = cfgrib.open_datasets(
            str(fp),
            backend_kwargs=backend_kwargs,
        )

in the herbie.archive xarray method, the exception no longer occurs.

herbie.archive download issues

Hi Brian,

Thank you for creating Herbie! I had an issue with downloading HRRR sub-hourly data. It would be much appreciated if you could help! I'm using the recently released Herbie.
I edited the parameters as follows, but I got a warning saying the file was not found. I wonder if I missed anything.

H = Herbie(
    '2016-10-09 00:00',
    model='hrrr',
    product='subh', #subh
    fxx=0
)

I tried changing the product to "sfc" just for testing, and the file was found this time. However, when using H.download(":UGRD:10 m"), an error came out:

TypeError: _searchString_help() takes 0 positional arguments but 1 was given

Thanks so much!

Yue

Using Username/Password authorization when extending Herbie

I'm trying to add IMERG to the list of models. To download the data one must register with NASA and then use a username/password to access the data.

Typically I've gotten the data in Python through a URL of the form https://username:password@URL; however, when I add that style of URL to the model's self.SOURCES, I get the following traceback:

Traceback (most recent call last):
  File "herbie_methods.py", line 37, in <module>
    H.download(verbose=False)
  File "/Users/judson/opt/anaconda3/envs/main/lib/python3.8/site-packages/herbie/archive.py", line 642, in download
    urllib.request.urlretrieve(self.grib, outFile, _reporthook)
  File "/Users/judson/opt/anaconda3/envs/main/lib/python3.8/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/Users/judson/opt/anaconda3/envs/main/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/judson/opt/anaconda3/envs/main/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/Users/judson/opt/anaconda3/envs/main/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/Users/judson/opt/anaconda3/envs/main/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/Users/judson/opt/anaconda3/envs/main/lib/python3.8/urllib/request.py", line 1397, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/Users/judson/opt/anaconda3/envs/main/lib/python3.8/urllib/request.py", line 1323, in do_open
    h = http_class(host, timeout=req.timeout, **http_conn_args)
  File "/Users/judson/opt/anaconda3/envs/main/lib/python3.8/http/client.py", line 1383, in __init__
    super(HTTPSConnection, self).__init__(host, port, timeout,
  File "/Users/judson/opt/anaconda3/envs/main/lib/python3.8/http/client.py", line 834, in __init__
    (self.host, self.port) = self._get_hostport(host, port)
  File "/Users/judson/opt/anaconda3/envs/main/lib/python3.8/http/client.py", line 877, in _get_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
http.client.InvalidURL: nonnumeric port: '[email protected]'

Is there a way around this, or should I add the username/password differently?
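One hedged workaround, since urllib chokes on credentials embedded in the URL: fetch the file with the requests library and pass the credentials as HTTP Basic auth (the URL and filename below are placeholders, and NASA Earthdata may additionally require a redirect-aware session or a ~/.netrc entry):

import requests

url = "https://example.nasa.gov/data/imerg/file.grib2"  # placeholder URL
r = requests.get(url, auth=("username", "password"), timeout=60)
r.raise_for_status()
with open("file.grib2", "wb") as f:
    f.write(r.content)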

Improve RAP historical access from NCEI

The RAP/RUC historical files are messy to wade through. How can Herbie help manage the intricate file paths, products, and model-name changes across certain dates?

Herbie could loop through different sources. Will need to figure out all possible file paths.

Herbie GUI

Herbie is great, but some people don't know Python. Some sort of GUI like Brian's HRRR download page would be very helpful to many. The clickable webpage is also an easy way to know what data is available, and with a quick click you can get a file of interest.

How should the GUI be implemented?

'fast_Herbie_xarray' unable to grab 'gribfile_projection'

Hello,

I am downloading multiple forecasts using fast_Herbie_xarray(), and so far I have successfully used it to download HRRR data. Now I am attempting to grab GFS data, but I get the error:

Traceback (most recent call last):

  File "/var/folders/r9/_yc_yf6d5k38whv10mx_tzkd15bsjb/T/ipykernel_20173/4263070027.py", line 4, in <cell line: 1>
    ds = fast_Herbie_xarray(DATES=total_date_range, fxx=fxx, model=json_file[mod]['model_name'],

  File "/Users/akumler/miniconda3/envs/naerm/lib/python3.8/site-packages/herbie/tools.py", line 247, in fast_Herbie_xarray
    ds["gribfile_projection"] = ds.gribfile_projection[0][0]

  File "/Users/akumler/miniconda3/envs/naerm/lib/python3.8/site-packages/xarray/core/common.py", line 239, in __getattr__
    raise AttributeError(

AttributeError: 'Dataset' object has no attribute 'gribfile_projection'

Looking at some example GFS data via xarray, we see that 'gribfile_projection' is a data variable, but not an attribute.

<xarray.Dataset>
Dimensions:              (latitude: 721, longitude: 1440)
Coordinates:
    time                 datetime64[ns] 2021-07-11
    step                 timedelta64[ns] 00:00:00
    heightAboveGround    float64 2.0
  * latitude             (latitude) float64 90.0 89.75 89.5 ... -89.75 -90.0
  * longitude            (longitude) float64 0.0 0.25 0.5 ... 359.2 359.5 359.8
    valid_time           datetime64[ns] 2021-07-11
Data variables:
    t2m                  (latitude, longitude) float32 273.7 273.7 ... 221.6
    aptmp                (latitude, longitude) float32 269.0 269.0 ... 202.0
    gribfile_projection  object None
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   gfs
    product:                 pgrb2.0p25
    description:             Global Forecast System
    remote_grib:             https://noaa-gfs-bdp-pds.s3.amazonaws.com/gfs.20...
    local_grib:              /Users/akumler/data/gfs/20210711/subset_446869af...
    searchString:            TMP:2 m above

Digging deeper, it appears there is a projection, but the way 'fast_Herbie_xarray' is currently set up doesn't recognize it? Am I missing something? Thanks!

x.gribfile_projection
Out[25]: 
<xarray.DataArray 'gribfile_projection' ()>
array(None, dtype=object)
Coordinates:
    time               datetime64[ns] 2021-07-11
    step               timedelta64[ns] 00:00:00
    heightAboveGround  float64 2.0
    valid_time         datetime64[ns] 2021-07-11
Attributes:
    crs_wkt:                      GEOGCRS["unknown",DATUM["unknown",ELLIPSOID...
    semi_major_axis:              6371229.0
    semi_minor_axis:              6371229.0
    inverse_flattening:           0.0
    reference_ellipsoid_name:     unknown
    longitude_of_prime_meridian:  0.0
    prime_meridian_name:          Greenwich
    geographic_crs_name:          unknown
    grid_mapping_name:            latitude_longitude
    long_name:                    GFS model grid projection

GFS data downloaded with searchString is not complete. Radiation variables are omitted.

Hi!

I have tried to download some radiation variables from GFS with no success. Some months ago I was able to download this data with the same version (Herbie 0.0.6); however, now it only gives me the following set of allowed variables:

array(['PRMSL', 'CLWMR', 'ICMR', 'RWMR', 'SNMR', 'GRLE', 'REFD', 'REFC',
       'VIS', 'UGRD', 'VGRD', 'VRATE', 'GUST', 'HGT', 'TMP', 'RH', 'SPFH',
       'VVEL', 'DZDT', 'ABSV', 'O3MR', 'TCDC', 'HINDEX', 'MSLET', 'PRES',
       'TSOIL', 'SOILW', 'SOILL', 'CNWAT', 'WEASD', 'SNOD', 'ICETK',
       'DPT', 'APTMP', 'ICEG', 'CPOFP', 'PRATE', 'CSNOW', 'CICEP',
       'CFRZR', 'CRAIN', 'SFCR', 'FRICV', 'VEG', 'SOTYP', 'WILT', 'FLDCP',
       'SUNSD', 'LFTX', 'CAPE', 'CIN', 'PWAT', 'CWAT', 'TOZNE', 'LCDC',
       'MCDC', 'HCDC', 'HLCY', 'USTM', 'VSTM', 'ICAHT', 'VWSH', '4LFTX',
       'HPBL', 'POT', 'PLPL', 'LAND', 'ICEC', 'ICETMP'], dtype=object)

It is only a small subset of the full set of variables in NOAA GFS. Is it no longer possible to download radiation variables from GFS with Herbie?

Filename Too Long

When subsetting from a full GRIB2 with lots of fields, the code at:

Herbie/herbie/archive.py

Lines 395 to 400 in 0beffe1

# Get a list of all GRIB message numbers. We will use this
# in the output file name as a unique identifier.
all_grib_msg = "-".join([f"{i:g}" for i in self.idx_df.index])
# Append the filename to distinguish it from the full file.
outFile = outFile.with_suffix(f".grib2.subset_{all_grib_msg}")

attempts to create a filename that's too long and fails.

An example would be:

h = Herbie('2021-09-22 12:00', model='gfs')
#download full GRIB2 for the GFS @ f000
h.download()
ds = h.xarray('HGT')
#so many field numbers match that Herbie fails with an OSError filename too long

Fix Ideas

I have a couple of thoughts:

  1. Could hash the list of field numbers (something quick and short, SHA-1 maybe?) and use that as the filename (see the sketch after this list). That would allow future access to the same set of GRIB fields (by hashing the same list)... but it would make it impossible to figure out what's in the subsetted file just from the filename.
  2. Write a herbie index file that defines "these fields are in filename X" and use a shorter filename.
  3. Write each GRIB2 record to its own file (although this could create a large number of files, there's basically zero overhead otherwise, since GRIB2 files are just concatenated records...). This actually could be useful, as the granularity would allow for easy selection. (And it could also solve another issue I'm investigating -- if you've downloaded the full GRIB2 as above and then run the .xarray('HGT') command, it appears that Herbie won't subset the file that was already downloaded but will instead redownload the requested subset. In other words, instead of opening the full GRIB2 that was already downloaded and grabbing all of the "HGT" records, it'll redownload the GRIB2 "HGT" records...)
  4. Use a tempfile ?
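A minimal sketch of idea 1, hashing the message-number list so the suffix has a fixed length (variable names copied from the excerpt above; not necessarily how Herbie actually implements it):

import hashlib

# Hash the list of GRIB message numbers so the suffix has a fixed length.
all_grib_msg = "-".join(f"{i:g}" for i in self.idx_df.index)
hash_label = hashlib.sha1(all_grib_msg.encode()).hexdigest()
outFile = outFile.with_suffix(f".grib2.subset_{hash_label}")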

@functools.cached_property

I just learned about @functools.cached_property. Can this decorator be applied anywhere? Perhaps with the read_idx method.

The @functools.lru_cache may also be useful.

pseudo code:

class Herbie:
    def __init__(self, ...):
        # stuff here

    @functools.lru_cache
    def read_idx(self, searchString):
        # load the index inventory DataFrame

    @functools.cached_property
    def some_prop(self):
        # can anything else be cached?

From https://docs.python.org/3/library/functools.html
In general, the LRU cache should only be used when you want to reuse previously computed values. Accordingly, it doesn’t make sense to cache functions with side-effects, functions that need to create distinct mutable objects on each call, or impure functions such as time() or random().
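A tiny runnable illustration of both decorators (toy code, not Herbie's):

import functools

class Demo:
    @functools.lru_cache(maxsize=None)  # note: the cache keeps a reference to self
    def read_idx(self, searchString=None):
        print("reading index ...")
        return f"inventory filtered by {searchString}"

    @functools.cached_property
    def some_prop(self):
        print("computed once")
        return 42

d = Demo()
d.read_idx(":TMP:")  # prints "reading index ..."
d.read_idx(":TMP:")  # cached; nothing printed
d.some_prop          # prints "computed once"
d.some_prop          # cached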

Curl not a python package

Your setup.py includes curl, which is not a Python package and causes pip to fail. Just a heads up.

Collecting curl (from hrrrb)
Exception:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/lib/python3/dist-packages/pip/commands/install.py", line 353, in run
    wb.build(autobuilding=True)
  File "/usr/lib/python3/dist-packages/pip/wheel.py", line 749, in build
    self.requirement_set.prepare_files(self.finder)
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 554, in _prepare_file
    require_hashes
  File "/usr/lib/python3/dist-packages/pip/req/req_install.py", line 278, in populate_link
    self.link = finder.find_requirement(self, upgrade)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 465, in find_requirement
    all_candidates = self.find_all_candidates(req.name)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 423, in find_all_candidates
    for page in self._get_pages(url_locations, project_name):
  File "/usr/lib/python3/dist-packages/pip/index.py", line 568, in _get_pages
    page = self._get_page(location)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 683, in _get_page
    return HTMLPage.get_page(link, session=self.session)
  File "/usr/lib/python3/dist-packages/pip/index.py", line 795, in get_page
    resp.raise_for_status()
  File "/usr/share/python-wheels/requests-2.18.4-py2.py3-none-any.whl/requests/models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/curl/

Use Pandas to read .idx files directly

The Herbie read_idx method currently uses this to read the .idx files:

# Open the idx file
r = requests.get(self.idx)
assert r.ok, f"Index file does not exist: {self.idx}"   

read_idx = r.text.split('\n')[:-1]  # last line is empty
df = pd.DataFrame([i.split(':') for i in read_idx], 
                    columns=['grib_message', 'start_byte', 
                             'reference_time', 'variable', 
                             'level', 'forecast_time', 'none'])

But pandas can do this directly. I propose changing it to:

r = requests.get(self.idx)
assert r.ok, f"Index file does not exist: {self.idx}"   

read_idx = pd.read_csv(self.idx,
                       sep=':', 
                       names=['grib_message', 'start_byte', 
                              'reference_time', 'variable', 
                              'level', 'forecast_time', 'none'])

Is there any reason for not doing it this way?

  • Run time seems similar.
  • Is this the appropriate exception handling?
  • What about .idx files that are different?
    • Some .idx files on Pando don't end in ':', but the real .idx files do end in ':'. Need to take this into account when splitting the file using the separator.

Examples:
Does not have ':' at the end of each line:
https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20180101/hrrr.t00z.wrfsfcf00.grib2.idx

Does have ':' at the end of each line:
https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20180101/conus/hrrr.t00z.wrfsfcf00.grib2.idx
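As a hedged refinement of the proposal above, the text already fetched by requests could be reused (avoiding a second download) by wrapping it in a StringIO; pandas then fills the optional trailing ':' column with NaN on files that lack it:

from io import StringIO

import pandas as pd
import requests

r = requests.get(self.idx)
assert r.ok, f"Index file does not exist: {self.idx}"

names = ["grib_message", "start_byte", "reference_time",
         "variable", "level", "forecast_time", "none"]
df = pd.read_csv(StringIO(r.text), sep=":", names=names)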

cfgrib can't read the RAP wrfprs files because gridType is not included in file

When I try to open the full-grid RAP files, like wrfprs

from herbie.archive import Herbie
H = Herbie('2021-7-23 00:00', model='rap', product='wrfprs')
ds = H.xarray('TMP:2 m')

cfgrib throws an error...

KeyError: 'gridType'

There must be a key missing from the RAP prs files that cfgrib is trying to read. How can Herbie handle this error?

  • Herbie needs to tell cfgrib to not read the gridType key.
  • Can Herbie determine the grid_projection from other details?

Try This: Extending Herbie with custom templates

I think Herbie should grab custom templates from the ~/.config/Herbie/ folder. The reason is that if someone installs Herbie with pip, it's not easy to edit the template files in the source code. This would let someone put their own custom templates in the Herbie config folder and edit them there, and Herbie would load them from that location.

Herbie would need to append to the PYTHONPATH with

sys.path.append('~/.config/Herbie/')
from <local_template_name> import *

to access templates from that folder.

Alternatively, one could use "option 4" in this example to load the module from a file path. This is a bit verbose.

# importing the importlib.util module
import importlib.util        
  
# passing the file name and path as argument
spec = importlib.util.spec_from_file_location(
  "mod", "D:/projects/base/app/modules/mod.py")    
  
# importing the module as foo 
foo = importlib.util.module_from_spec(spec)        
spec.loader.exec_module(foo)
  
# calling the hello function of mod.py
foo.hello()

HRRR as Zarr on AWS

@blaylockbk , this is probably the wrong place to raise this, but I saw in your HRRR Archive FAQ, you said:

One day, we hope this data will be archived elsewhere that is more accessible to everyone. Perhaps soon it will be hosted by Amazon by their Opendata initiative. I would advocate to keep it in the GRIB2 format (the original format it is output as), but it would also be nice to store the data in a "cloud-friendly" format such as zarr.

To have archived HRRR data in Zarr would be AMAZING. We were trying to figure out how to download 1 year of HRRR surface fields to drive a Delaware Bay hydrodynamics simulation, and thinking how useful it would be to have the data on AWS. We could store as Zarr but create GRIB-on-demand service for those who need it. I've been active on the Pangeo project, and we have some tools now that could make the conversion, chunking and upload to cloud much easier. And I'd be happy to help out.

@zflamig, you guys would be up for a proposal on this, right ?

Try this: Multithreading for downloading speedup

Would multithreading help download many files (and many parts of files) quickly?

This would be a helper tool used in herbie.tools (in bulk_download).

Check out this article for some inspiration.

It seems like downloading files, or parts of files, is an IO-bound task that could see some speedup from multithreading.

We could possibly also see a speedup when iterating over downloads of chunks of a file.
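A minimal sketch of the idea (grib_sources is the list of Herbie objects that bulk_download already builds; the thread count and search string are arbitrary):

from concurrent.futures import ThreadPoolExecutor, as_completed

def _download(H):
    return H.download(searchString=":TMP:2 m")

with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(_download, H) for H in grib_sources]
    paths = [f.result() for f in as_completed(futures)]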

HRRR native subsetting

This might be slightly an edge case, but when trying to subset a native-level file:

H = Herbie('2021-07-19',
           model='hrrr',
           product='nat')
H.xarray(":(U|V)GRD:")

Produces " No GRIB messages found. There might be something wrong with searchString=':(U|V)GRD:hybrid'"
H.xarray("(U|V)GRD") works but does a list of xarray datasets at each hybrid level rather than an xarray dataset with a hybrid dimension

fast_Herbie_xarray() does not work with hrrr's subh product

In attempting to run

dates = pd.date_range('2022-01-01 1:00', 
                      '2022-01-01 3:00',
                      freq='1H')
fxx = 1
h_list = fast_Herbie_xarray(DATES=dates, fxx=fxx, model='hrrr', product='subh', searchString=':PRES:surface')

I get the following traceback

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Untitled-1.ipynb Cell 5' in <cell line: 1>()
----> 1 hh = fast_Herbie_xarray(DATES=dates, fxx=fxx, model='hrrr', product='subh', searchString=':PRES:surface')

File ~/miniforge3/envs/herbie/lib/python3.8/site-packages/herbie/tools.py:228, in fast_Herbie_xarray(DATES, searchString, fxx, max_threads, xarray_kw, **kwargs)
    225     ds_list = [future.result() for future in as_completed(futures)]
    227 # Sort the DataSets, first by lead time (step), then by run time (time)
--> 228 ds_list.sort(key=lambda ds: ds.step.item())
    229 ds_list.sort(key=lambda ds: ds.time.item())
    231 # Reshape list with dimensions (len(DATES), len(fxx))

File ~/miniforge3/envs/herbie/lib/python3.8/site-packages/herbie/tools.py:228, in fast_Herbie_xarray.<locals>.<lambda>(ds)
    225     ds_list = [future.result() for future in as_completed(futures)]
    227 # Sort the DataSets, first by lead time (step), then by run time (time)
--> 228 ds_list.sort(key=lambda ds: ds.step.item())
    229 ds_list.sort(key=lambda ds: ds.time.item())
    231 # Reshape list with dimensions (len(DATES), len(fxx))

AttributeError: 'list' object has no attribute 'step'

When I create just one Herbie object and read the data into xarray, I get the following note:

Note: Returning a list of [2] xarray.Datasets because of multiple hypercubes.

and H.xarray() ends up returning a list of two xarray.Datasets (one with the 15-min, 30-min, and 45-min forecasts and one with the 1-hr forecast). I'm pretty sure this is what prevents fast_Herbie_xarray() from working with the HRRR subh product. Not sure if there's a way around it?

Add template for GEFS

Data is available on Amazon in GRIB2 format and includes index files. It should be straightforward to include a GEFS template in Herbie.
https://registry.opendata.aws/noaa-gefs-reforecast/

Here is an example URL
https://noaa-gefs-retrospective.s3.amazonaws.com/GEFSv12/reforecast/2019/2019011200/p04/Days%3A1-10/apcp_sfc_2019011200_p04.grib2.idx

Herbie will need to process some arguments (a hedged sketch follows the list):

  • (Days 1-10 | Days 10-16): use the fxx argument, but some processing is needed to find which directory to go into.
  • (c00 | p01 | p02 | p03 | p04): use the member argument, but some processing is needed to find which directory to go into (c is control, p is perturbed).
  • Variable name: the user would need to know which file they want, as Herbie doesn't make a listing of the directories.
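A hedged sketch of how a GEFS template might translate those arguments into a URL (the directory layout is inferred only from the example URL above, and the Days cutoff at 240 h is an assumption):

def gefs_url(date, member, fxx, variable="apcp_sfc"):
    # Build a GEFS reforecast URL (illustrative only).
    days = "Days:1-10" if fxx <= 240 else "Days:10-16"
    mem = "c00" if member == 0 else f"p{member:02d}"  # c = control, p = perturbed
    ymdh = date.strftime("%Y%m%d%H")
    return (
        "https://noaa-gefs-retrospective.s3.amazonaws.com/GEFSv12/reforecast/"
        f"{date:%Y}/{ymdh}/{mem}/{days}/{variable}_{ymdh}_{mem}.grib2"
    )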

Create a conda-forge recipe so Herbie can be installed via conda

It would be nice if Herbie could be installed with conda directly, instead of using pip, especially since Herbie depends on cfgrib and cartopy, which have dependencies that can't be installed with pip (PROJ, GEOS, ecCodes).

I don't have experience with this, but want to learn. If anyone can help out with this, that would be awesome 😁

Build some CI with GitHub actions

I need some CI to check that Herbie can be installed and run on multiple platforms.

  • Test installing Herbie on different platforms with Conda
  • Run Herbie's tests with Python 3.8 - 3.10

I'm not at all experienced with this yet, but I want to learn. If anyone has some tips, please let me know.

When a grib2 file exists locally, create the index file with wgrib2 if it is installed

For the case when a user has downloaded the full GRIB2 file and later wants to subset it (e.g. to read into xarray), Herbie should look for the index file in the following order:

  1. Create the index file locally with wgrib2, if it is installed (a sketch follows this list) <-- if this is faster than option 2
  2. Check for the index file on the network and use that if it exists. If not, then make one locally (if wgrib2 exists)
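A hedged sketch of option 1, assuming wgrib2 is on the PATH (its -s flag prints the simple inventory that .idx files contain):

import shutil
import subprocess
from pathlib import Path

def make_local_idx(grib2_file):
    # Write <file>.idx next to a local GRIB2 file using wgrib2, if available.
    grib2_file = Path(grib2_file)
    if shutil.which("wgrib2") is None:
        return None  # wgrib2 not installed; fall back to the network index
    idx_file = grib2_file.with_suffix(grib2_file.suffix + ".idx")
    with open(idx_file, "w") as f:
        subprocess.run(["wgrib2", str(grib2_file), "-s"], stdout=f, check=True)
    return idx_file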

Carpenter Workshop instructions missing from tutorials

First, this tool is awesome. Thanks for publishing and maintaining it!

I installed your package using conda install -c conda-forge herbie-data, but wasn't able to run this portion of your tutorial https://blaylockbk.github.io/Herbie/_build/html/user_guide/notebooks/data_hrrr.html without first installing Carpenter Workshop, e.g.,

pip install git+https://github.com/blaylockbk/Carpenter_Workshop.git

It might be helpful to add some additional instruction/details in the tutorials or here https://github.com/blaylockbk/Herbie#installation.

Multiple Levels Subsetting

When using the subset function and attempting to subset multiple levels (t2m, surface, 10 m wind, etc.), xarray does not open all the variables, only the surface variables. I'm wondering whether this has to do with Herbie or with xarray. Thought I would make you aware of it.
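This sounds like the "multiple hypercubes" behavior noted in the fast_Herbie_xarray subh issue above: cfgrib groups GRIB messages that share coordinates into separate datasets, so a search string mixing 2 m, 10 m, and surface fields likely comes back as a list of Datasets rather than one merged Dataset. Something like this (illustrative search string) should reveal it:

from herbie.archive import Herbie

H = Herbie("2021-10-09", model="hrrr", product="sfc")
dss = H.xarray(":(TMP:2 m|UGRD:10 m|PRES:surface)")
if isinstance(dss, list):  # one Dataset per hypercube
    for ds in dss:
        print(list(ds.data_vars))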

issues with herbie installation

I have Python 3.7 installed, so I created a virtual environment with Python 3.9 to fulfill the version requirement.

When importing herbie I get the following error message:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 from herbie import Herbie

File C:\ProgramData\Anaconda3\envs\python39\lib\site-packages\herbie\__init__.py:101, in <module>
     96 config = toml.load(_config_path)
     98 config["default"]["save_dir"] = Path(config["default"]["save_dir"]).expand()
--> 101 from herbie.archive import Herbie
    102 from herbie.tools import fast_Herbie, fast_Herbie_download, fast_Herbie_xarray

File C:\ProgramData\Anaconda3\envs\python39\lib\site-packages\herbie\archive.py:58, in <module>
     55 from datetime import datetime, timedelta
     56 from io import StringIO
---> 58 import cfgrib
     59 import pandas as pd
     60 import pygrib

File C:\ProgramData\Anaconda3\envs\python39\lib\site-packages\cfgrib\__init__.py:20, in <module>
     18 # cfgrib core API depends on the ECMWF ecCodes C-library only
     19 from .abc import Field, Fieldset, Index, MappingFieldset
---> 20 from .cfmessage import COMPUTED_KEYS
     21 from .dataset import (
     22     Dataset,
     23     DatasetBuildError,
   (...)
     27     open_from_index,
     28 )
     29 from .messages import FieldsetIndex, FileStream, Message

File C:\ProgramData\Anaconda3\envs\python39\lib\site-packages\cfgrib\cfmessage.py:29, in <module>
     26 import attr
     27 import numpy as np
---> 29 from . import abc, messages
     31 LOG = logging.getLogger(__name__)
     33 # taken from eccodes stepUnits.table

File C:\ProgramData\Anaconda3\envs\python39\lib\site-packages\cfgrib\messages.py:28, in <module>
     25 import typing as T
     27 import attr
---> 28 import eccodes  # type: ignore
     29 import numpy as np
     31 from . import abc

File C:\ProgramData\Anaconda3\envs\python39\lib\site-packages\eccodes\__init__.py:13, in <module>
      1 #
      2 # (C) Copyright 2017- ECMWF.
      3 #
   (...)
     10 #
     11 #
---> 13 from .eccodes import *

File C:\ProgramData\Anaconda3\envs\python39\lib\site-packages\eccodes\eccodes.py:12, in <module>
      1 #
      2 # (C) Copyright 2017- ECMWF.
      3 #
   (...)
     10 #
     11 #
---> 12 from gribapi import (
     13     CODES_PRODUCT_ANY,
     14     CODES_PRODUCT_BUFR,
     15     CODES_PRODUCT_GRIB,
     16     CODES_PRODUCT_GTS,
     17     CODES_PRODUCT_METAR,
     18 )
     19 from gribapi import GRIB_CHECK as CODES_CHECK
     20 from gribapi import GRIB_MISSING_DOUBLE as CODES_MISSING_DOUBLE

File C:\ProgramData\Anaconda3\envs\python39\lib\site-packages\gribapi\__init__.py:13, in <module>
      1 #
      2 # (C) Copyright 2017- ECMWF.
      3 #
   (...)
     10 #
     11 #
---> 13 from .gribapi import *  # noqa
     14 from .gribapi import __version__, lib
     16 # The minimum recommended version for the ecCodes package

File C:\ProgramData\Anaconda3\envs\python39\lib\site-packages\gribapi\gribapi.py:34, in <module>
     30 from functools import wraps
     32 import numpy as np
---> 34 from gribapi.errors import GribInternalError
     36 from . import errors
     37 from .bindings import ENC

File C:\ProgramData\Anaconda3\envs\python39\lib\site-packages\gribapi\errors.py:16, in <module>
      1 #
      2 # (C) Copyright 2017- ECMWF.
      3 #
   (...)
      9 # does it submit to any jurisdiction.
     10 #
     12 """
     13 Exception class hierarchy
     14 """
---> 16 from .bindings import ENC, ffi, lib
     19 class GribInternalError(Exception):
     20     """
     21     @brief Wrap errors coming from the C API in a Python exception object.
     22 
     23     Base class for all exceptions
     24     """

File C:\ProgramData\Anaconda3\envs\python39\lib\site-packages\gribapi\bindings.py:35, in <module>
     33 library_path = findlibs.find("eccodes")
     34 if library_path is None:
---> 35     raise RuntimeError("Cannot find the ecCodes library")
     37 # default encoding for ecCodes strings
     38 ENC = "ascii"

RuntimeError: Cannot find the ecCodes library

Specify custom filename with 'H.download()'

See #58.

You can do this by setting H.LOCALFILE before calling H.download(), but it only works when downloading full files. Need to come up with a method to override the filename when downloading subsets.
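For reference, a hedged sketch of the existing workaround for full files (the attribute name is taken from the text above; untested):

from herbie.archive import Herbie

H = Herbie("2021-10-09", model="hrrr", product="sfc")
H.LOCALFILE = "my_custom_name.grib2"  # override before downloading
H.download()                          # full-file downloads only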

Interest in Adding Support for NAM Model

Hi Brian,

I was wondering if there are plans to add the North American Mesoscale (NAM) model to the list of models that Herbie works with? I recently found that this data is available through Amazon Web Services, and I would think that adding it to the list of Herbie-compatible models might not be too difficult? I am a relatively novice Python programmer, so maybe I am mistaken on this point, but if you are not working on this currently I would be happy to contribute with a small bit of guidance on what needs to be done.

Thanks for the excellent documentation on Herbie, and for the package itself.

Best,

Marian

conda environment.yml install

I am trying to install Herbie using the conda environment.yml file, but conda finds conflicts that cause the install to fail:

(base) jmiller@ubuntu:~$ conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: \
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:



Package geos conflicts for:
metpy -> cartopy[version='>=0.15.0'] -> shapely[version='>=1.6.4'] -> geos[version='>=3.4']
cartopy[version='>=0.20.3'] -> geos[version='>=3.10.3,<3.10.4.0a0|>=3.11.0,<3.11.1.0a0']
metpy -> cartopy[version='>=0.15.0'] -> geos[version='3.6.2|>=3.10.0,<3.10.1.0a0|>=3.10.1,<3.10.2.0a0|>=3.10.2,<3.10.3.0a0|>=3.10.3,<3.10.4.0a0|>=3.11.0,<3.11.1.0a0|>=3.6.2,<3.6.3.0a0|>=3.7.0,<3.7.1.0a0|>=3.7.1,<3.7.2.0a0|>=3.7.2,<3.7.3.0a0|>=3.8.0,<3.8.1.0a0|>=3.8.1,<3.8.2.0a0|>=3.9.0,<3.9.1.0a0|>=3.9.1,<3.9.2.0a0']
geopandas -> shapely -> geos[version='3.6.2|>=3.10.0,<3.10.1.0a0|>=3.10.1,<3.10.2.0a0|>=3.10.2,<3.10.3.0a0|>=3.10.3,<3.10.4.0a0|>=3.11.0,<3.11.1.0a0|>=3.4|>=3.6.2,<3.6.3.0a0|>=3.7.0,<3.7.1.0a0|>=3.7.1,<3.7.2.0a0|>=3.7.2,<3.7.3.0a0|>=3.8.0,<3.8.1.0a0|>=3.8.1,<3.8.2.0a0|>=3.9.0,<3.9.1.0a0|>=3.9.1,<3.9.2.0a0']

I was able to manually create a conda environment and then install using pip.

UPDATE:
Turns out the conda environment I set up using pip actually doesn't want to work either. I'll keep trying and see if I can get it working.

Write some tests

I have never written tests before, but I should learn how to do it.

Herbie can't find index file for GFS grib2 files

Since version >= 0.0.7, there seem to be problems finding index files even when they exist.
Even if I replace the suffix, it raises an error: "No index found for..."

H = Herbie(date, priority='aws', product='pgrb2.0p25', model='gfs', fxx=fxx, IDX_SUFFIX='.idx')
H.download() <- error

Is the subsetting of downloaded files working correctly?

(issue brought up by @danieldjewell)

If you have downloaded the full GRIB2 and then run the .xarray('HGT') method to open a subset, it appears that Herbie won't subset the file that was already downloaded, but will instead redownload the requested subset. In other words, instead of opening the full GRIB2 that was already downloaded and grabbing all of the "HGT" records, it'll redownload the GRIB2 "HGT" records from the remote source URL.

from herbie.archive import Herbie

# Download the full file
h = Herbie('2021-09-22 12:00', model='gfs')

# Read a subset of the file
ds = h.xarray('HGT')

See... the print message says the file is being redownloaded from the remote!

This may be harder to implement cleanly than I thought, because reading a subset of the downloaded data would make a copy of the full dataset. That makes redundant data copies on your local machine. I wonder how I could read the subset from the full local file based on the searchString argument without depending on wgrib2.

  1. One solution might be to not use Herbie to subset the local file, but to use cfgrib to subset the local file based on messages of interest, by level and variable (see the sketch after this list).
  2. Another option is to use the remote index file to subset-cURL the local file (this is how I thought the local subsetting worked, but I need to double-check).
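A hedged sketch of option 1: open the already-downloaded full file and let cfgrib filter messages by key (filter_by_keys is a standard cfgrib backend argument; "gh" is the GRIB shortName for HGT):

import cfgrib

datasets = cfgrib.open_datasets(
    str(h.get_localFilePath()),
    backend_kwargs={"filter_by_keys": {"shortName": "gh"}},
)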

herbie.tools.bulk_download runs into error when file is unavailable

Problem

herbie.tools.bulk_download runs into an error when a file is unavailable:

from herbie.tools import bulk_download
import pandas as pd

DATES = pd.date_range('2017-10-01', '2017-10-02', freq='1H')

h = bulk_download(
    DATES,
    model='hrrr',
    fxx=0,
    product='prs',
    searchString=":500 mb"
)
AssertionError                            Traceback (most recent call last)
/p/work1/tmp/blaylock/ipykernel_56524/274337681.py in <module>
      1 DATES = pd.date_range('2017-10-01', '2017-10-02', freq='1H')
      2 
----> 3 h = bulk_download(
      4     DATES,
      5     model='hrrr',

~/BB_python/Herbie/herbie/tools.py in bulk_download(DATES, searchString, fxx, model, product, priority, verbose)
     67     for i, g in enumerate(grib_sources):
     68         timer = datetime.now()
---> 69         g.download(searchString=searchString)
     70 
     71         # ---------------------------------------------------------

~/BB_python/Herbie/herbie/archive.py in download(self, searchString, source, save_dir, overwrite, verbose, errors)
    631         # If the file exists in the localPath and we don't want to
    632         # overwrite, then we don't need to download it.
--> 633         outFile = self.get_localFilePath(searchString=searchString)
    634 
    635         # This overrides the overwrite specified in __init__

~/BB_python/Herbie/herbie/archive.py in get_localFilePath(self, searchString)
    422         if searchString is not None:
    423             # Reassign the index DataFrame with the requested searchString
--> 424             self.idx_df = self.read_idx(searchString)
    425 
    426             # Get a list of all GRIB message numbers. We will use this

~/BB_python/Herbie/herbie/archive.py in read_idx(self, searchString)
    456         A Pandas DataFrame of the index file.
    457         """
--> 458         assert self.idx is not None, f"No index file found for {self.grib}."
    459 
    460         # Sometimes idx end in ':', other times it doesn't (in some Pando files).

AssertionError: No index file found for None.

Solution

  • Need to step over files that don't exist (a sketch follows)
  • Return a list of datetimes/files not downloaded
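A hedged sketch of that fix inside the bulk_download loop (names follow the traceback above; g.grib and g.idx are None when Herbie can't find the file or its index at any source):

failed = []
for i, g in enumerate(grib_sources):
    if g.grib is None or g.idx is None:
        failed.append(g.date)  # file or index not found at any source
        continue
    g.download(searchString=searchString)
# `failed` can be returned so the caller knows what was skipped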

Smarter curl ranges

Because of #40 (reply in thread)

It would be nice if Herbie grouped curl ranges when the byte ranges are adjacent (does that make sense?).

Instead of getting messages 4 and 5 separately, get 4 and 5 in the same range request.
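A minimal sketch of what that grouping could look like, with ranges given as (start, end) byte tuples parsed from the index file:

def merge_ranges(ranges):
    # Combine adjacent or overlapping (start, end) byte ranges.
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

merge_ranges([(100, 199), (200, 299), (500, 599)])
# -> [(100, 299), (500, 599)]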

15-minute data

Hello,
I was wondering if 15-minute data can be downloaded using Herbie. If so, how is it possible?

Thank you very much for your help.
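If this refers to HRRR's sub-hourly output, other issues in this list suggest it is exposed through the subh product. A hedged example (availability varies by date and archive):

from herbie.archive import Herbie

# product="subh" is HRRR's sub-hourly output (15-minute intervals)
H = Herbie("2022-01-01 01:00", model="hrrr", product="subh", fxx=1)
ds = H.xarray(":PRES:surface")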

Add documentation for new `FastHerbie()` class

FastHerbie will replace the fast_herbie, fast_herbie_xarray, and fast_herbie_download functions. I need to document it so people stop using those old functions (and then remove those functions, because they don't work as well as FastHerbie).

Let Herbie find the most recent GRIB2 file.

From San Joaquin Valley Newspaper

Hi Brian,

I'm an environmental reporter from the San Joaquin Valley who is going to use the HRRR smoke forecast system to help create outdoor activity guidelines during this year's wildfire season. On the server end, I'm just calling for the MASSDEN field and subsetting that geo data to some custom boundaries for the SJV.

After fiddling around with the raw index and wgrib2 calls in my own python script, I found your amazing package that seems to handle most of the stuff I need.

I'm writing to you for some advice. Still unclear to me is how to get the most up-to-date forecast data with the lowest time latencies. What strategy would you use? Is there a location at the HRRR AWS location that always has the latest simulation data?

Also, is there any difference in the latencies on the server side between when the Zarr database gets updated and the GRIB2 data?

Best,

Greg

Deal with starting byte range for UGRD and VGRD in RAP

This isn't a problem if you target both UGRD and VGRD (searchString="(U|V)GRD"), but if you only want UGRD, then you will run into this problem...

In this example inventory file, you will see that the UGRD and VGRD start bytes are the same:

112.1:3395201:d=2016021722:UGRD:1000 mb:anl:
112.2:3395201:d=2016021722:VGRD:1000 mb:anl:

One solution: if the next message's start byte is the same, advance to the next line again (a sketch follows).
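A hedged sketch of that solution, assuming the parsed index is a list of (message, start_byte, ...) tuples:

def end_byte(idx_rows, i):
    # End byte for message i; skip any following rows that share the start byte.
    start = idx_rows[i][1]
    for row in idx_rows[i + 1:]:
        if row[1] != start:
            return row[1] - 1  # stop just before the next distinct message
    return None  # last message: read to the end of the file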
