unidata / siphon

Siphon - A collection of Python utilities for retrieving atmospheric and oceanic data from remote sources, focusing on being able to retrieve data from Unidata data technologies, such as the THREDDS data server.

Home Page: https://unidata.github.io/siphon

License: BSD 3-Clause "New" or "Revised" License

Language: Python (100.00%)
Topics: python, unidata, netcdf, thredds, thredds-catalogs, hacktoberfest

siphon's Introduction

Siphon

[Badges: License, PRs Welcome, Latest Docs, PyPI Package, Conda Package, Travis Build Status, AppVeyor Build Status, Code Coverage Status, Codacy code issues]

Siphon is a collection of Python utilities for downloading data from Unidata data technologies. See our support page for ways to get help with Siphon.

Siphon follows semantic versioning. With our current 0.x version, that implies that Siphon's APIs (application programming interfaces) are still evolving (we won't break things just for fun, but many things are still changing as we work through design issues). For a version 0.x.y, we change x when we release new features and y when we make a release containing only bug fixes.

We support Python >= 3.7.

Dependencies

  • requests>=1.2
  • numpy>=1.8
  • protobuf>=3.0.0a3
  • beautifulsoup4>=4.6
  • pandas

Developer Dependencies

  • pytest
  • vcrpy
  • flake8

siphon's People

Contributors

akrherz, danielmwatkins, dcamron, deeplycloudy, dependabot[bot], dopplershift, haileyajohnson, jcla490, joelrahman, jrleeman, jthielen, julienchastang, kmosiejczuk, lesserwhirls, lprox2020, moonraker, ocefpaf, pharaohcola13, pjpokran, rpmanser, scollis, scotthavens, story645, swnesbitt, wep11, zbruick


siphon's Issues

Catalog API enhancement

The base catalog API is a bit clunky to use; we need to add some user-friendly options. Some examples are in bird_house/threddsclient. Ideas:

  • follow method for CatalogRef (see urllib.parse.urljoin)
  • Be able to see all top level datasets
  • Simple regex search for datasets (e.g. find every dataset with "GFS" in the name)
  • Easy access to latest, Best (outside of datasets dictionary)

The driving goal is to be able to access a TDS and inquire about datasets easily, using only the Python API rather than a web browser. Writing example notebooks for some tasks, starting from the top-level catalog.xml, is a great way to identify API clunkiness.
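
For illustration, here is a rough sketch of the sort of usage this issue is after; the follow()/latest/best accessors and the name search shown are aspirational, not an existing API:

from siphon.catalog import TDSCatalog

cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog.xml')

# Follow a catalogRef without constructing the child URL by hand
model_cat = cat.catalog_refs['Forecast Model Data'].follow()

# See all top-level datasets and do a simple name search
print(list(model_cat.datasets))
gfs = [name for name in model_cat.datasets if 'GFS' in name]

# Convenient access to the "latest"/"Best" datasets
latest_ds = model_cat.latest
best_ds = model_cat.best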

Improve example testing process

Examples solely showing siphon functionality can be pretty boring, but combining those examples with a pretty picture can help quite a bit.

Unfortunately, this tends to introduce non-siphon dependencies. While matplotlib is not a big deal, as it is pip-installable, Cartopy is a big deal and at this point cannot be used on Travis with our current notebook testing infrastructure. At some point we need to revisit the example testing process to see if we can enable our examples to use Cartopy.

Also, examples could probably use vcrpy to eliminate network traffic and the examples' reliance on a server being up and running 24x7.
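
A minimal sketch of wrapping an example in a vcrpy cassette (the cassette path and the assertion are illustrative):

import vcr
from siphon.catalog import TDSCatalog

# The first run records the HTTP exchange to the cassette; later runs replay it,
# so the example no longer needs the server to be reachable.
@vcr.use_cassette('fixtures/top_level_catalog.yaml')
def example_top_level_catalog():
    cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog.xml')
    assert len(cat.catalog_refs) > 0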

Automate release process

Based on SciPy 2015 talks, it would be possible, and really nice, to automate the release process. General concept:

  • Only on tagged builds
  • Use issues/PRs to make release notes
  • PyPI
    • Build:
      • src tarball
      • universal wheel
    • upload
  • Build conda package
    • Build linux package
    • use conda convert to make for all platforms (since we're pure python)
    • upload

Function annotations

Would be good to experiment with adding PEP-484 function annotations to assist PyCharm with consuming our API. Might also be able to find some static-checking tools that can use them.

While we remain Python 2.x compatible, we can put these in .pyi files
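
A minimal sketch of what such a stub file could look like; the attribute names mirror the catalog module, but the exact signatures here are assumptions:

# siphon/catalog.pyi (illustrative)
from typing import Dict

class Dataset:
    name: str
    access_urls: Dict[str, str]

class TDSCatalog:
    catalog_url: str
    datasets: Dict[str, Dataset]
    def __init__(self, catalog_url: str) -> None: ...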

Point collection catalog yields warnings

The following code:

from siphon.catalog import TDSCatalog
cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/nws/metar/ncdecoded/catalog.xml?'
                 'dataset=nws/metar/ncdecoded/Metar_Station_Data_fc.cdmr')

yields:

WARNING:root:controlledVocabulary must have an attribute: vocabulary
WARNING:root:controlledVocabulary must have an attribute: vocabulary

I'm not sure whether we're wrong or THREDDS is wrong, but since we control it all, we really should make these warnings go away somehow (i.e. fix the TDS or fix Siphon).

Build against nightly

Add a build against Travis's nightly Python binary. This would require a source for a NetCDF wheel.

Handle HTTP redirects properly

To be consistent with other services, TDS 5.0 serves out all catalogs using

thredds/catalog

The top level catalog's old url:

thredds/catalog.(html|xml)

redirects to:

thredds/catalog/catalog.(html|xml)

which siphon catches. However, the TDSCatalog object still has the old url, and thus any relative catalogRefs are constructed incorrectly (except for catalogRefs generated by catalogScan elements). For example, a TDS 5.0 catalogRef to forecastModels.xml is turned into an absolute URL of:

/thredds/forecastModels.xml

instead of the new, correct absolute URL of:

/thredds/catalog/forecastModels.xml

A request to the old absolute URL does not get a redirect, and thus fails with a 404.

Perhaps the correct thing to do here is to check for the redirect, set the catalog object's "cat_url" attribute in TDSCatalog accordingly, and make sure that absolute URLs for catalogRefs are built with the correct URL. This should keep backwards compatibility with TDS <= 4.6 while future-proofing for TDS 5.0.
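
A minimal sketch of that approach, assuming the catalog is fetched with requests (which follows redirects by default):

import requests
from urllib.parse import urljoin

resp = requests.get('http://thredds.ucar.edu/thredds/catalog.xml')
catalog_url = resp.url  # final URL after any redirect, e.g. .../thredds/catalog/catalog.xml

# Build absolute URLs for relative catalogRefs from the post-redirect URL
ref_href = 'forecastModels.xml'  # illustrative relative catalogRef
absolute_url = urljoin(catalog_url, ref_href)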

Timeline for v0.4.0 release and pypi upload?

I noticed when trying to work with the radarserver class that I was unable to retrieve metadata; I believe that capability was added right after the 0.3.0 release (06/17 if I recall). I was able to resolve this by cloning the repo... just wondering, in case there are radarserver examples for the workshop and users try to pip install or conda install. Also curious what features you're considering for the v0.4.0 milestone.

Add NCSS client.

This should encapsulate constructing NCSS query strings, as well as parsing the dataset.xml file served up by the NCSS endpoint. It should also handle dumping the returned NetCDF data to a temp file and opening it with netCDF4-python.
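
A sketch of the kind of usage such a client might support; the endpoint URL, variable name, and method names here are illustrative, not a settled API:

from datetime import datetime
from siphon.ncss import NCSS

ncss = NCSS('http://thredds.ucar.edu/thredds/ncss/grib/NCEP/GFS/Global_0p5deg/Best')
query = ncss.query()
query.lonlat_point(-105.0, 40.0).time(datetime.utcnow())
query.variables('Temperature_surface')
data = ncss.get_data(query)  # parsed from the returned netCDF via netCDF4-python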

Add `environment.yml`

That would make it easy to codify the steps for getting up and running with conda. This needs Unidata/conda-recipes#10 and Unidata/conda-recipes#9 to be handled.
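
An illustrative environment.yml; the channel and version pins are assumptions, and the dependency list mirrors the README above:

name: siphon-dev
channels:
  - conda-forge
dependencies:
  - python>=3.7
  - requests>=1.2
  - numpy>=1.8
  - protobuf>=3.0.0
  - beautifulsoup4>=4.6
  - pandas
  - pytest
  - vcrpy
  - flake8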

Replace print with exceptions and warnings

Using print() is a terrible idea for a library. If the condition is serious enough, throw an exception. If not, use the warnings module, so that client code can suppress them if so desired.
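
A minimal sketch of the pattern (the function and messages are hypothetical):

import warnings

def handle_value(value, valid_values):
    # Fatal problems raise; recoverable ones warn, so callers can filter them.
    if value is None:
        raise ValueError('missing required value')
    if value not in valid_values:
        warnings.warn('Value {} not valid; expected one of {}'.format(value, valid_values))

Client code can then silence these with warnings.simplefilter('ignore') if desired.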

Python 3 features

Just a tracker for python 3 features we could eventually use:

  • int.to_bytes() and int.from_bytes()
  • pathlib in stdlib
  • Simplify i/o (bytes vs. str more universal)
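
Quick illustrations of the first two items (the fixture path is hypothetical):

from pathlib import Path

# int.to_bytes()/int.from_bytes() replace manual struct packing for simple cases
value = int.from_bytes(b'\x01\x02', byteorder='big')  # 258
raw = (258).to_bytes(2, byteorder='big')              # b'\x01\x02'

# pathlib from the stdlib for cleaner path handling
fixture = Path('fixtures') / 'catalog.xml'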

Implement functionality from THREDDS client

Hi *,

I was looking for a THREDDS client to browse catalogs and to get resources (HTTP, OPeNDAP, ...). I didn't find anything that worked, so I started to write my own based on an already existing project. Now it's working for me, and yesterday I found siphon ;)

In my implementation I'm using the beautiful-soup Python module to parse the XML (based on lxml), which makes it easier to read.

You might have a look at my implementation as an example:

https://github.com/bird-house/threddsclient

Cheers,
Carsten

Use Siphon to obtain useful WRF initialization grids from TDS

The WRF initialization process relies heavily on GRIB data...this is a travesty, and is an issue that the WRF group does not think is an issue (because people only want to initialize WRF using grids that are FTP'd from NCEP, right?).

Thankfully, the data from the GRIB file is quickly extracted and put into a temporary file early in the WRF initialization process using the UNGRIB program. This feature would do the following in an attempt to circumvent the UNGRIB step of WRF initialization, which would allow users to use the netCDF Subset Service of the TDS to request and transfer only the data needed to spin up WRF:

  1. Use WRF v-tables to make NCSS request for only the grids that WRF needs for initialization (no need to download every grid available from the model run).
  2. Allow the user to subset temporally and spatially to further cut down on data transferred over the network (possibly reading directly from the WRF config file).
  3. Return a compressed netCDF-4 file containing those grids (even more reduction in data transferred over the network)
  4. Extract the netCDF-4 data into a binary file following the expected output by UNGRIB, which is written as fortran unformatted writes.

A side benefit of step 4 is that users could potentially take their own netCDF files and get them into the format needed by the METGRID step of the WRF initialization process, making it easier to use their own, non-GRIB data to initialize WRF (for example, several climate models only output netCDF, not GRIB).
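
A rough sketch of steps 1-3 using an NCSS-style request; the endpoint URL, variable names, and bounding box are placeholders, and the real variable list would come from the WRF v-tables:

from datetime import datetime, timedelta
from siphon.ncss import NCSS

ncss = NCSS('http://thredds.ucar.edu/thredds/ncss/grib/NCEP/GFS/Global_0p5deg/Best')
query = ncss.query()
# 1. Request only the fields WRF needs for initialization
query.variables('Temperature_isobaric', 'Geopotential_height_isobaric',
                'Relative_humidity_isobaric')
# 2. Subset spatially and temporally (could be read from the WRF config)
query.lonlat_box(north=55, south=20, east=-60, west=-130)
start = datetime.utcnow()
query.time_range(start, start + timedelta(hours=48))
# 3. Ask the server for compressed netCDF-4
query.accept('netcdf4')
data = ncss.get_data(query)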

Handle errors when passing HTML to `TDSCatalog`

Right now, if we accidentally pass an HTML catalog to TDSCatalog, we get:

  File "<string>", line unknown
ParseError: mismatched tag: line 7, column 66

I don't know if it's a good idea to rewrite the URL, but we should at least detect it somehow (maybe using the content type on the headers) and throw a clear error.
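
A minimal sketch of the detection, assuming we fetch the catalog with requests and check the Content-Type header (the URL is illustrative):

import requests

catalog_url = 'http://thredds.ucar.edu/thredds/catalog.html'  # illustrative
resp = requests.get(catalog_url)
if 'html' in resp.headers.get('content-type', '').lower():
    raise ValueError('{} returned HTML, not an XML catalog; '
                     'try the catalog.xml form of the URL'.format(catalog_url))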

Overly strict/verbose warnings for dataFormat when only case is different, e.g. netCDF instead of NetCDF

I ran the following code:

from siphon.catalog import TDSCatalog

maca_tasmax_url = 'http://inside-dev1.nkn.uidaho.edu:8080/thredds/catalog/MACAV1/inmcm4/catalog.html?dataset=REACCHDatasetScan_inmcm4/macav1metdata_tasmax_inmcm4_r1i1p1_rcp45_2096_2100_WUSA.nc'
maca_tasmax_cat = TDSCatalog(maca_tasmax_url)

And got the following long warning message:

/Users/mturner/miniconda3/envs/workshop2015/lib/python3.4/site-packages/siphon-0.3.1-py3.4.egg/siphon/catalog.py:61: 
UserWarning: URL http://inside-dev1.nkn.uidaho.edu:8080/thredds/catalog/MACAV1/inmcm4/catalog.html?dataset=REACCHDatasetScan_inmcm4/macav1metdata_tasmax_inmcm4_r1i1p1_rcp45_2096_2100_WUSA.nc returned HTML. 
Changing to: http://inside-dev1.nkn.uidaho.edu:8080/thredds/catalog/MACAV1/inmcm4/catalog.xml?dataset=REACCHDatasetScan_inmcm4/macav1metdata_tasmax_inmcm4_r1i1p1_rcp45_2096_2100_WUSA.nc
  new_url))

Value netCDF not valid for type dataFormat: must be ['BUFR', 'ESML', 'GEMPAK', 'GINI', 
'GRIB-1', 'GRIB-2', 'HDF4', 'HDF5', 'McIDAS-AREA', 'NcML', 'NetCDF', 'NetCDF-4', 
'NEXRAD2', 'NIDS', 'image/gif', 'image/jpeg', 'image/tiff', 'text/csv', 'text/html', 'text/plain',
 'text/tab-separated-values', 'text/xml', 'video/mpeg', 'video/quicktime',
...

In this case, the problem arose because of the following metadata in the linked catalog:

<metadata inherited="true">
    ...
    <dataFormat>netCDF</dataFormat>
    ....
</metadata>

The warning seems overly strict since it's just a capitalization difference, plus it seems too loud since the metadata entry doesn't have any real effect on whether or not the underlying data is valid.
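
A sketch of a more forgiving check (the function name and abridged list are illustrative): compare dataFormat values case-insensitively and warn only when there is no match at all.

import warnings

VALID_FORMATS = ['BUFR', 'GRIB-1', 'GRIB-2', 'NcML', 'NetCDF', 'NetCDF-4']  # abridged

def check_data_format(value):
    canonical = {fmt.lower(): fmt for fmt in VALID_FORMATS}.get(value.lower())
    if canonical is None:
        warnings.warn('Value {} not valid for type dataFormat'.format(value))
        return value
    return canonical  # e.g. 'netCDF' -> 'NetCDF'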

NetCDF tempfile handling broken on windows

When using the NCSS client to open data we saved to a tempfile, we get a RuntimeError due to "Permission denied". I'm guessing Windows doesn't like someone opening the file a second time.

We could try catching the error and opening a different way, but in that case it's a much bigger pain to ensure the file gets deleted. sigh Why can't windows just behave in a unix-like fashion?

This may be obviated once we get the in-memory solution in the netCDF library, but I think it will be a while after that before it's widely available. Might punt on this until a Windows user complains...
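
One possible workaround sketch: create the temp file with delete=False, close it before netCDF4 reopens it by name, and clean up manually. The returned_bytes placeholder stands in for the NCSS response content.

import os
import tempfile
from netCDF4 import Dataset

returned_bytes = b'...'  # placeholder for the netCDF bytes returned by NCSS

tmp = tempfile.NamedTemporaryFile(suffix='.nc', delete=False)
try:
    tmp.write(returned_bytes)
    tmp.close()              # release the handle so Windows allows reopening
    nc = Dataset(tmp.name)   # open by name instead of from the open handle
    # ... use nc ...
    nc.close()
finally:
    os.remove(tmp.name)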

Simple accessors for Best, latest, 2DTime

Like:

dap_url = siphon.catalog.get_latest_access_url("http://thredds-jumbo.unidata.ucar.edu/thredds/catalog/grib/HRRR/CONUS_3km/surface/catalog.xml", "OPENDAP")

Whoever wrote those docs (me) forgot about it.

Streamlined interface

Should investigate whether a shortened .sel() interface makes sense for NCSS (and RadarServer?) to provide a one-liner around query() and get_data(), possibly with a reduced set of supported query terms.
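
A hypothetical sketch of what such a one-liner could look like; sel() is not an existing method, and the endpoint and keywords are illustrative:

from datetime import datetime
from siphon.ncss import NCSS

ncss = NCSS('http://thredds.ucar.edu/thredds/ncss/grib/NCEP/GFS/Global_0p5deg/Best')

# Hypothetical: wraps query() + get_data() behind one call
data = ncss.sel(variables=['Temperature_surface'],
                time=datetime(2015, 7, 1, 12),
                lonlat=(-105.0, 40.0))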

Improve radar server example

Now that we have a better catalogRef API, we should do the right thing in the example and start at the top level catalog.

Use OrderedDict in catalogs

Right now, all key-value stores within TDSCatalog are plain Python dicts. It would be better to use OrderedDict, so that the order of iteration matches the order we parse in. In theory, that would sometimes eliminate the need to sort on date/time to get things in proper order, since the server usually puts the most recent first.
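
For illustration, OrderedDict preserves insertion order, so iteration would match the catalog's own ordering (the dataset names here are made up):

from collections import OrderedDict

datasets = OrderedDict()
datasets['GFS_20150701_1200.grib2'] = 'most recent entry in the catalog'
datasets['GFS_20150701_0600.grib2'] = 'older entry'
list(datasets)  # preserves catalog order: most recent first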

Radarserver metadata.

Now that TDSCatalog will parse it without raising an exception, we should parse it and pull out the variables.

Update versioneer

Newest versioneer now puts config in setup.cfg, which is cleaner. Need to update.

Only expose services enabled on a given dataset

Currently, TDSCatalog builds a set of access URLs for a dataset based on the services listed in the catalog. However, this is not quite correct. Once we are correctly parsing the metadata tags, each dataset should only expose the services and access URLs that are enabled on that dataset (i.e. the fullServices element as listed in the dataset metadata).

"Effective Python" ideas

Been reading Effective Python. Some things that might need reviewing:

  • Use generators instead of constructing lists and returning
  • enumerate(container, start)

(will add others as I keep reading)
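
Tiny illustrations of the first two, with hypothetical names:

# Generator instead of building and returning a list
def iter_access_urls(datasets):
    for ds in datasets:
        yield ds.access_urls

# enumerate() with an explicit start value
for lineno, name in enumerate(['catalog.xml', 'latest.xml'], start=1):
    print(lineno, name)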

TDSCatalog error on radar server catalogs

from siphon.catalog import TDSCatalog
cat = TDSCatalog('http://thredds.ucar.edu/thredds/radarServer/nexrad/level2/IDD/dataset.xml')

yields

ERROR:root:No parser found for element variables
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-0fb0245feb85> in <module>()
      1 from siphon.catalog import TDSCatalog
----> 2 cat = TDSCatalog('http://thredds.ucar.edu/thredds/radarServer/nexrad/level2/IDD/dataset.xml')

/Users/rmay/repos/siphon/siphon/catalog.py in __init__(self, catalog_url)
     62                 self._process_catalog_ref(child)
     63             elif (tag_type == "metadata") or (tag_type == ""):
---> 64                 self._process_metadata(child, tag_type)
     65             elif tag_type == "service":
     66                 if child.attrib["serviceType"] != "Compound":

/Users/rmay/repos/siphon/siphon/catalog.py in _process_metadata(self, element, tag_type)
     96         if tag_type == "":
     97             logging.warning("Trying empty tag type as metadata")
---> 98         self.metadata = TDSCatalogMetadata(element, self.metadata).metadata
     99 
    100     def _process_datasets(self):

/Users/rmay/repos/siphon/siphon/metadata.py in __init__(self, element, metadata_in)
    469         if element_name == "metadata":
    470             for child in element:
--> 471                 self._parse_element(child)
    472         else:
    473             self._parse_element(element)

/Users/rmay/repos/siphon/siphon/metadata.py in _parse_element(self, element)
    513 
    514         try:
--> 515             parser[element_name](element)
    516         except KeyError:
    517             logging.error("No parser found for element %s", element_name)

KeyError: 'variables'

API warts

Just a few things that bug me as I write notebooks:

  • The need to do: NCSS(ds.access_urls['NetcdfSubset']). Would be nice to just get the object back, since it's clear what needs to happen
  • The whole interface for pulling out datasets, e.g. list(cat.datasets.values())[0]. There needs to be a simple way to pull out a dataset by position, since first/last are really the most common uses. I rarely actually want to grab a dataset by name. (A sketch follows.)
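
A hypothetical sketch of the conveniences being asked for; neither indexing by position nor a subset() shortcut existed at the time, and the catalog URL is illustrative:

from siphon.catalog import TDSCatalog

cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/'
                 'Global_0p5deg/catalog.xml')

ds = cat.datasets[0]    # grab a dataset by position instead of by exact name
ncss = ds.subset()      # get the NCSS client directly, no access_urls lookup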

Narrative documentation.

We have good API docs, and we have some good examples. We just need some docs to discuss how the pieces fit together and just how to approach solving problems with the library. Might want to wait for the catalog API enhancements before doing this.

Update vcrpy

Try out the new 1.7.4 release and see if everything works now.
