unidata / siphon

Siphon - A collection of Python utilities for retrieving atmospheric and oceanic data from remote sources, focusing on being able to retrieve data from Unidata data technologies, such as the THREDDS data server.

Home Page: https://unidata.github.io/siphon

License: BSD 3-Clause "New" or "Revised" License

Language: Python (100.00%)
Topics: python, unidata, netcdf, thredds, thredds-catalogs, hacktoberfest

siphon's Introduction

Siphon

[Badges: License, PRs Welcome, Latest Docs, PyPI Package, Conda Package, Travis Build Status, AppVeyor Build Status, Code Coverage Status, Codacy code issues]

Siphon is a collection of Python utilities for downloading data from Unidata data technologies. See our support page for ways to get help with Siphon.

Siphon follows semantic versioning. With our current 0.x version, that implies that Siphon's APIs (application programming interfaces) are still evolving (we won't break things just for fun, but many things are still changing as we work through design issues). For a version 0.x.y, we change x when we release new features and y when we make a release containing only bug fixes.

We support Python >= 3.7.

Dependencies

  • requests>=1.2
  • numpy>=1.8
  • protobuf>=3.0.0a3
  • beautifulsoup4>=4.6
  • pandas

Developer Dependencies

  • pytest
  • vcrpy
  • flake8

siphon's People

Contributors

akrherz, danielmwatkins, dcamron, deeplycloudy, dependabot[bot], dopplershift, haileyajohnson, jcla490, joelrahman, jrleeman, jthielen, julienchastang, kmosiejczuk, lesserwhirls, lprox2020, moonraker, ocefpaf, pharaohcola13, pjpokran, rpmanser, scollis, scotthavens, story645, swnesbitt, wep11, zbruick


siphon's Issues

Catalog API enhancement

The base catalog API is a bit clunky to use; we need to add some user-friendly options. Some examples are in bird_house/threddsclient. Ideas:

  • follow method for CatalogRef (see urllib.parse.urljoin)
  • Be able to see all top level datasets
  • Simple regex search for datasets (e.g. find every dataset with "GFS" in the name)
  • Easy access to latest, Best (outside of datasets dictionary)

The driving goal is to be able to access a TDS and inquire about datasets easily, using only the Python API rather than a web browser. Writing example notebooks for some tasks, starting from the top-level catalog.xml, is a great way to identify API clunkiness.
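
For illustration, here is a rough sketch of the sort of usage this issue is after; the follow()/latest/best accessors and the name search shown are aspirational, not an existing API:

from siphon.catalog import TDSCatalog

cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog.xml')

# Follow a catalogRef without constructing the child URL by hand
model_cat = cat.catalog_refs['Forecast Model Data'].follow()

# See all top-level datasets and do a simple name search
print(list(model_cat.datasets))
gfs = [name for name in model_cat.datasets if 'GFS' in name]

# Convenient access to the "latest"/"Best" datasets
latest_ds = model_cat.latest
best_ds = model_cat.best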

Improve example testing process

Examples solely showing siphon functionality can be pretty boring, but combining those examples with a pretty picture can help quite a bit.

Unfortunately, this tends to introduce non-siphon dependencies. While matplotlib is not a big deal, as it is pip-installable, Cartopy is a big deal and at this point cannot be used on Travis with our current notebook testing infrastructure. At some point we need to revisit the example testing process to see if we can enable our examples to use Cartopy.

Also, examples could probably use vcrpy to eliminate network traffic and the examples' reliance on a server being up and running 24x7.
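
A minimal sketch of wrapping an example in a vcrpy cassette (the cassette path and the assertion are illustrative):

import vcr
from siphon.catalog import TDSCatalog

# The first run records the HTTP exchange to the cassette; later runs replay it,
# so the example no longer needs the server to be reachable.
@vcr.use_cassette('fixtures/top_level_catalog.yaml')
def example_top_level_catalog():
    cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog.xml')
    assert len(cat.catalog_refs) > 0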

Automate release process

Based on SciPy 2015 talks, it would be possible, and really nice, to automate the release process. General concept:

  • Only on tagged builds
  • Use issues/PRs to make release notes
  • PyPI
    • Build:
      • src tarball
      • universal wheel
    • upload
  • Build conda package
    • Build linux package
    • use conda convert to make for all platforms (since we're pure python)
    • upload

Function annotations

Would be good to experiment with adding PEP-484 function annotations to assist PyCharm with consuming our API. Might also be able to find some static-checking tools that can use them.

While we remain Python 2.x compatible, we can put these in .pyi files
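
A minimal sketch of what such a stub file could look like; the attribute names mirror the catalog module, but the exact signatures here are assumptions:

# siphon/catalog.pyi (illustrative)
from typing import Dict

class Dataset:
    name: str
    access_urls: Dict[str, str]

class TDSCatalog:
    catalog_url: str
    datasets: Dict[str, Dataset]
    def __init__(self, catalog_url: str) -> None: ...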

Point collection catalog yields warnings

The following code:

from siphon.catalog import TDSCatalog
cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/nws/metar/ncdecoded/catalog.xml?'
                 'dataset=nws/metar/ncdecoded/Metar_Station_Data_fc.cdmr')

yields:

WARNING:root:controlledVocabulary must have an attribute: vocabulary
WARNING:root:controlledVocabulary must have an attribute: vocabulary

I'm not sure whether we're wrong or THREDDS is wrong, but since we control it all, we really should make these warnings go away somehow (i.e. fix the TDS or fix Siphon).

Build against nightly

Add a build against Travis's nightly Python binary. This would require a source for a NetCDF wheel.

Handle HTTP redirects properly

To be consistent with other services, TDS 5.0 serves out all catalogs using

thredds/catalog

The top level catalog's old url:

thredds/catalog.(html|xml)

redirects to:

thredds/catalog/catalog.(html|xml)

which siphon catches. However, the TDSCatalog object still has the old url, and thus any relative catalogRefs are constructed incorrectly (except for catalogRefs generated by catalogScan elements). For example, a TDS 5.0 catalogRef to forecastModels.xml is turned into an absolute URL of:

/thredds/forecastModels.xml

instead of the new, correct absolute URL of:

/thredds/catalog/forecastModels.xml

A request to the old absolute URL does not get a redirect, and thus fails with a 404.

Perhaps the correct thing to do here is to check for the redirect, set the catalog object's "cat_url" attribute in TDSCatalog accordingly, and make sure that absolute URLs for catalogRefs are built with the correct URL. This should keep backwards compatibility with TDS <= 4.6 while future-proofing for TDS 5.0.
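
A minimal sketch of that approach, assuming the catalog is fetched with requests (which follows redirects by default):

import requests
from urllib.parse import urljoin

resp = requests.get('http://thredds.ucar.edu/thredds/catalog.xml')
catalog_url = resp.url  # final URL after any redirect, e.g. .../thredds/catalog/catalog.xml

# Build absolute URLs for relative catalogRefs from the post-redirect URL
ref_href = 'forecastModels.xml'  # illustrative relative catalogRef
absolute_url = urljoin(catalog_url, ref_href)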

Timeline for v0.4.0 release and pypi upload?

I noticed when trying to work with the radarserver class that I was unable to retrieve metadata; I believe that capability was added right after the 0.3.0 release (06/17 if I recall). I was able to resolve this by cloning the repo... just wondering, in case there are radarserver examples for the workshop and users try to pip install or conda install. Also curious what features you're considering for the v0.4.0 milestone.

Add NCSS client.

This should encapsulate constructing NCSS query strings, as well as parsing the dataset.xml file served up by the NCSS endpoint. It should also handle dumping the returned NetCDF data to a temp file and opening it with netCDF4-python.
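
A sketch of the kind of usage such a client might support; the endpoint URL, variable name, and method names here are illustrative, not a settled API:

from datetime import datetime
from siphon.ncss import NCSS

ncss = NCSS('http://thredds.ucar.edu/thredds/ncss/grib/NCEP/GFS/Global_0p5deg/Best')
query = ncss.query()
query.lonlat_point(-105.0, 40.0).time(datetime.utcnow())
query.variables('Temperature_surface')
data = ncss.get_data(query)  # parsed from the returned netCDF via netCDF4-python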

Add `environment.yml`

That would make it easy to codify the steps for getting up and running with conda. This needs Unidata/conda-recipes#10 and Unidata/conda-recipes#9 to be handled.
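
An illustrative environment.yml; the channel and version pins are assumptions, and the dependency list mirrors the README above:

name: siphon-dev
channels:
  - conda-forge
dependencies:
  - python>=3.7
  - requests>=1.2
  - numpy>=1.8
  - protobuf>=3.0.0
  - beautifulsoup4>=4.6
  - pandas
  - pytest
  - vcrpy
  - flake8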

Replace print with exceptions and warnings

Using print() is a terrible idea for a library. If the condition is serious enough, throw an exception. If not, use the warnings module, so that client code can suppress them if so desired.
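
A minimal sketch of the pattern (the function and messages are hypothetical):

import warnings

def handle_value(value, valid_values):
    # Fatal problems raise; recoverable ones warn, so callers can filter them.
    if value is None:
        raise ValueError('missing required value')
    if value not in valid_values:
        warnings.warn('Value {} not valid; expected one of {}'.format(value, valid_values))

Client code can then silence these with warnings.simplefilter('ignore') if desired.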

Python 3 features

Just a tracker for python 3 features we could eventually use:

  • int.to_bytes() and int.from_bytes()
  • pathlib in stdlib
  • Simplify i/o (bytes vs. str more universal)
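
Quick illustrations of the first two items (the fixture path is hypothetical):

from pathlib import Path

# int.to_bytes()/int.from_bytes() replace manual struct packing for simple cases
value = int.from_bytes(b'\x01\x02', byteorder='big')  # 258
raw = (258).to_bytes(2, byteorder='big')              # b'\x01\x02'

# pathlib from the stdlib for cleaner path handling
fixture = Path('fixtures') / 'catalog.xml'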

Implement functionality from THREDDS client

Hi *,

I was looking for a THREDDS client to browse catalogs and to get resources (HTTP, OPeNDAP, ...). I didn't find anything that worked, so I started to write my own based on an already existing project. Now it's working for me, and yesterday I found siphon ;)

In my implementation I'm using the beautiful-soup Python module to parse the XML (based on lxml), which makes it easier to read.

You might have a look at my implementation as an example:

https://github.com/bird-house/threddsclient

Cheers,
Carsten

Use Siphon to obtain useful WRF initialization grids from TDS

The WRF initialization process relies heavily on GRIB data...this is a travesty, and is an issue that the WRF group does not think is an issue (because people only want to initialize WRF using grids that are FTP'd from NCEP, right?).

Thankfully, the data from the GRIB file is quickly extracted and put into a temporary file early in the WRF initialization process using the UNGRIB program. This feature would do the following in an attempt to circumvent the UNGRIB step of WRF initialization, which would allow users to use the netCDF Subset Service of the TDS to request and transfer only the data needed to spin up WRF:

  1. Use WRF v-tables to make NCSS request for only the grids that WRF needs for initialization (no need to download every grid available from the model run).
  2. Allow the user to subset temporally and spatially to further cut down on data transferred over the network (possibly reading directly from the WRF config file).
  3. Return a compressed netCDF-4 file containing those grids (even more reduction in data transferred over the network)
  4. Extract the netCDF-4 data into a binary file following the expected output by UNGRIB, which is written as fortran unformatted writes.

A side benefit of step 4 is that users could potentially take their own netCDF files and get them into the format needed by the METGRID step of the WRF initialization process, making it easier to use their own, non-GRIB data to initialize WRF (for example, several climate models only output netCDF, not GRIB).
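
A rough sketch of steps 1-3 using an NCSS-style request; the endpoint URL, variable names, and bounding box are placeholders, and the real variable list would come from the WRF v-tables:

from datetime import datetime, timedelta
from siphon.ncss import NCSS

ncss = NCSS('http://thredds.ucar.edu/thredds/ncss/grib/NCEP/GFS/Global_0p5deg/Best')
query = ncss.query()
# 1. Request only the fields WRF needs for initialization
query.variables('Temperature_isobaric', 'Geopotential_height_isobaric',
                'Relative_humidity_isobaric')
# 2. Subset spatially and temporally (could be read from the WRF config)
query.lonlat_box(north=55, south=20, east=-60, west=-130)
start = datetime.utcnow()
query.time_range(start, start + timedelta(hours=48))
# 3. Ask the server for compressed netCDF-4
query.accept('netcdf4')
data = ncss.get_data(query)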

Handle errors when passing HTML to `TDSCatalog`

Right now, if we accidentally pass an HTML catalog to TDSCatalog, we get:

  File "<string>", line unknown
ParseError: mismatched tag: line 7, column 66

I don't know if it's a good idea to rewrite the URL, but we should at least detect it somehow (maybe using the content type on the headers) and throw a clear error.
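
A minimal sketch of the detection, assuming we fetch the catalog with requests and check the Content-Type header (the URL is illustrative):

import requests

catalog_url = 'http://thredds.ucar.edu/thredds/catalog.html'  # illustrative
resp = requests.get(catalog_url)
if 'html' in resp.headers.get('content-type', '').lower():
    raise ValueError('{} returned HTML, not an XML catalog; '
                     'try the catalog.xml form of the URL'.format(catalog_url))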

Overly strict/verbose warnings for dataFormat when only case is different, e.g. netCDF instead of NetCDF

I ran the following code:

from siphon.catalog import TDSCatalog

maca_tasmax_url = 'http://inside-dev1.nkn.uidaho.edu:8080/thredds/catalog/MACAV1/inmcm4/catalog.html?dataset=REACCHDatasetScan_inmcm4/macav1metdata_tasmax_inmcm4_r1i1p1_rcp45_2096_2100_WUSA.nc'
maca_tasmax_cat = TDSCatalog(maca_tasmax_url)

And got the following long warning message:

/Users/mturner/miniconda3/envs/workshop2015/lib/python3.4/site-packages/siphon-0.3.1-py3.4.egg/siphon/catalog.py:61: 
UserWarning: URL http://inside-dev1.nkn.uidaho.edu:8080/thredds/catalog/MACAV1/inmcm4/catalog.html?dataset=REACCHDatasetScan_inmcm4/macav1metdata_tasmax_inmcm4_r1i1p1_rcp45_2096_2100_WUSA.nc returned HTML. 
Changing to: http://inside-dev1.nkn.uidaho.edu:8080/thredds/catalog/MACAV1/inmcm4/catalog.xml?dataset=REACCHDatasetScan_inmcm4/macav1metdata_tasmax_inmcm4_r1i1p1_rcp45_2096_2100_WUSA.nc
  new_url))

Value netCDF not valid for type dataFormat: must be ['BUFR', 'ESML', 'GEMPAK', 'GINI', 
'GRIB-1', 'GRIB-2', 'HDF4', 'HDF5', 'McIDAS-AREA', 'NcML', 'NetCDF', 'NetCDF-4', 
'NEXRAD2', 'NIDS', 'image/gif', 'image/jpeg', 'image/tiff', 'text/csv', 'text/html', 'text/plain',
 'text/tab-separated-values', 'text/xml', 'video/mpeg', 'video/quicktime',
...

In this case, the problem arose because of the following metadata in the linked catalog:

<metadata inherited="true">
    ...
    <dataFormat>netCDF</dataFormat>
    ....
</metadata>

The warning seems overly strict since it's just a capitalization difference, plus it seems too loud since the metadata entry doesn't have any real effect on whether or not the underlying data is valid.
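
A sketch of a more forgiving check (the function name and abridged list are illustrative): compare dataFormat values case-insensitively and warn only when there is no match at all.

import warnings

VALID_FORMATS = ['BUFR', 'GRIB-1', 'GRIB-2', 'NcML', 'NetCDF', 'NetCDF-4']  # abridged

def check_data_format(value):
    canonical = {fmt.lower(): fmt for fmt in VALID_FORMATS}.get(value.lower())
    if canonical is None:
        warnings.warn('Value {} not valid for type dataFormat'.format(value))
        return value
    return canonical  # e.g. 'netCDF' -> 'NetCDF'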

NetCDF tempfile handling broken on windows

When using the NCSS client to open data we saved to a tempfile, we get a RuntimeError due to "Permission denied". I'm guessing Windows doesn't like someone opening the file a second time.

We could try catching the error and opening a different way, but in that case it's a much bigger pain to ensure the file gets deleted. sigh Why can't windows just behave in a unix-like fashion?

This may be obviated once we get the in-memory solution in the netCDF library, but I think it will be a while after that before it's widely available. Might punt on this until a Windows user complains...
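
One possible workaround sketch: create the temp file with delete=False, close it before netCDF4 reopens it by name, and clean up manually. The returned_bytes placeholder stands in for the NCSS response content.

import os
import tempfile
from netCDF4 import Dataset

returned_bytes = b'...'  # placeholder for the netCDF bytes returned by NCSS

tmp = tempfile.NamedTemporaryFile(suffix='.nc', delete=False)
try:
    tmp.write(returned_bytes)
    tmp.close()              # release the handle so Windows allows reopening
    nc = Dataset(tmp.name)   # open by name instead of from the open handle
    # ... use nc ...
    nc.close()
finally:
    os.remove(tmp.name)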

Simple accessors for Best, latest, 2DTime

Like:

dap_url = siphon.catalog.get_latest_access_url("http://thredds-jumbo.unidata.ucar.edu/thredds/catalog/grib/HRRR/CONUS_3km/surface/catalog.xml", "OPENDAP")

Whoever wrote those docs (me) forgot about it.

Streamlined interface

Should investigate whether a shortened .sel() interface makes sense for NCSS (and RadarServer?) to provide a one-liner around query() and get_data(), possibly with a reduced set of supported query terms.
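
A hypothetical sketch of what such a one-liner could look like; sel() is not an existing method, and the endpoint and keywords are illustrative:

from datetime import datetime
from siphon.ncss import NCSS

ncss = NCSS('http://thredds.ucar.edu/thredds/ncss/grib/NCEP/GFS/Global_0p5deg/Best')

# Hypothetical: wraps query() + get_data() behind one call
data = ncss.sel(variables=['Temperature_surface'],
                time=datetime(2015, 7, 1, 12),
                lonlat=(-105.0, 40.0))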

Improve radar server example

Now that we have a better catalogRef API, we should do the right thing in the example and start at the top level catalog.

Use OrderedDict in catalogs

Right now, all key-value stores within TDSCatalog are plain Python dicts. It would be better to use OrderedDict, so that the order of iteration matches the order we parse in. In theory, that would sometimes eliminate the need to sort on date/time to get things in proper order, since the server usually puts the most recent first.
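
For illustration, OrderedDict preserves insertion order, so iteration would match the catalog's own ordering (the dataset names here are made up):

from collections import OrderedDict

datasets = OrderedDict()
datasets['GFS_20150701_1200.grib2'] = 'most recent entry in the catalog'
datasets['GFS_20150701_0600.grib2'] = 'older entry'
list(datasets)  # preserves catalog order: most recent first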

Radarserver metadata.

Now that TDSCatalog will parse it without raising an exception, we should parse it and pull out the variables.

Update versioneer

Newest versioneer now puts config in setup.cfg, which is cleaner. Need to update.

Only expose services enabled on a given dataset

Currently, TDSCatalog builds a set of access URLs for a dataset based on the services listed in the catalog. However, this is not quite correct. Once we are correctly parsing the metadata tags, each dataset should only expose the services and access URLs that are enabled on that dataset (i.e. the fullServices element as listed in the dataset metadata).

"Effective Python" ideas

Been reading Effective Python. Some things that might need reviewing:

  • Use generators instead of constructing lists and returning
  • enumerate(container, start)

(will add others as I keep reading)
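
Tiny illustrations of the first two, with hypothetical names:

# Generator instead of building and returning a list
def iter_access_urls(datasets):
    for ds in datasets:
        yield ds.access_urls

# enumerate() with an explicit start value
for lineno, name in enumerate(['catalog.xml', 'latest.xml'], start=1):
    print(lineno, name)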

TDSCatalog error on radar server catalogs

from siphon.catalog import TDSCatalog
cat = TDSCatalog('http://thredds.ucar.edu/thredds/radarServer/nexrad/level2/IDD/dataset.xml')

yields

ERROR:root:No parser found for element variables
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-0fb0245feb85> in <module>()
      1 from siphon.catalog import TDSCatalog
----> 2 cat = TDSCatalog('http://thredds.ucar.edu/thredds/radarServer/nexrad/level2/IDD/dataset.xml')

/Users/rmay/repos/siphon/siphon/catalog.py in __init__(self, catalog_url)
     62                 self._process_catalog_ref(child)
     63             elif (tag_type == "metadata") or (tag_type == ""):
---> 64                 self._process_metadata(child, tag_type)
     65             elif tag_type == "service":
     66                 if child.attrib["serviceType"] != "Compound":

/Users/rmay/repos/siphon/siphon/catalog.py in _process_metadata(self, element, tag_type)
     96         if tag_type == "":
     97             logging.warning("Trying empty tag type as metadata")
---> 98         self.metadata = TDSCatalogMetadata(element, self.metadata).metadata
     99 
    100     def _process_datasets(self):

/Users/rmay/repos/siphon/siphon/metadata.py in __init__(self, element, metadata_in)
    469         if element_name == "metadata":
    470             for child in element:
--> 471                 self._parse_element(child)
    472         else:
    473             self._parse_element(element)

/Users/rmay/repos/siphon/siphon/metadata.py in _parse_element(self, element)
    513 
    514         try:
--> 515             parser[element_name](element)
    516         except KeyError:
    517             logging.error("No parser found for element %s", element_name)

KeyError: 'variables'

API warts

Just a few things that bug me as I write notebooks:

  • The need to do: NCSS(ds.access_urls['NetcdfSubset']). Would be nice to just get the object back, since it's clear what needs to happen
  • The whole interface for pulling out datasets, e.g. list(cat.datasets.values())[0]. There needs to be a simple way to pull out a dataset by position, since first/last are really the most common uses. I rarely actually want to grab a dataset by name. (A sketch follows.)
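
A hypothetical sketch of the conveniences being asked for; neither indexing by position nor a subset() shortcut existed at the time, and the catalog URL is illustrative:

from siphon.catalog import TDSCatalog

cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/'
                 'Global_0p5deg/catalog.xml')

ds = cat.datasets[0]    # grab a dataset by position instead of by exact name
ncss = ds.subset()      # get the NCSS client directly, no access_urls lookup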

Narrative documentation.

We have good API docs, and we have some good examples. We just need some docs to discuss how the pieces fit together and just how to approach solving problems with the library. Might want to wait for the catalog API enhancements before doing this.

Update vcrpy

Try out the new 1.7.4 release and see if everything works now.
