h5netcdf's Introduction

h5netcdf


A Python interface for the netCDF4 file-format that reads and writes local or remote HDF5 files directly via h5py or h5pyd, without relying on the Unidata netCDF library.

Why h5netcdf?

  • It has one fewer binary dependency (netCDF C). If you already have h5py installed, reading netCDF4 with h5netcdf may be much easier than installing netCDF4-Python.
  • We've seen occasional reports of better performance with h5py than netCDF4-python, though in many cases performance is identical. For one workflow, h5netcdf was reported to be almost 4x faster than netCDF4-python.
  • Anecdotally, HDF5 users seem to be unexcited about switching to netCDF; hopefully this will convince them that netCDF4 is actually quite sane!
  • Finally, side-stepping the netCDF C library (and Cython bindings to it) gives us an easier way to identify the source of performance issues and bugs in the netCDF libraries/specification.

Install

Ensure you have a recent version of h5py installed (I recommend using conda or the community effort conda-forge). At least version 3.0 is required. Then:

$ pip install h5netcdf

Or if you are already using conda:

$ conda install h5netcdf

Note:

From version 1.2, h5netcdf tries to align with a NEP 29-like support policy with regard to its upstream dependencies.

Usage

h5netcdf has two APIs, a new API and a legacy API. Both interfaces currently reproduce most of the features of the netCDF interface, with the notable exception of support for operations that rename or delete existing objects. We simply haven't gotten around to implementing this yet. Patches would be very welcome.

New API

The new API supports direct hierarchical access of variables and groups. Its design is an adaptation of h5py to the netCDF data model. For example:

import h5netcdf
import numpy as np

with h5netcdf.File('mydata.nc', 'w') as f:
    # set dimensions with a dictionary
    f.dimensions = {'x': 5}
    # and update them with a dict-like interface
    # f.dimensions['x'] = 5
    # f.dimensions.update({'x': 5})

    v = f.create_variable('hello', ('x',), float)
    v[:] = np.ones(5)

    # you don't need to create groups first
    # you also don't need to create dimensions first if you supply data
    # with the new variable
    v = f.create_variable('/grouped/data', ('y',), data=np.arange(10))

    # access and modify attributes with a dict-like interface
    v.attrs['foo'] = 'bar'

    # you can access variables and groups directly using hierarchical
    # keys like h5py
    print(f['/grouped/data'])

    # add an unlimited dimension
    f.dimensions['z'] = None
    # explicitly resize a dimension and all variables using it
    f.resize_dimension('z', 3)

Notes:

  • Automatic resizing of unlimited dimensions with array indexing is not available.
  • Dimensions need to be manually resized with Group.resize_dimension(dimension, size).
  • Arrays are returned padded with the fill value (taken from the underlying HDF5 dataset) up to the current size of the variable's dimensions; see the sketch below. The behaviour is equivalent to netCDF4-python's Dataset.set_auto_mask(False).
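
A hedged sketch of this padding behaviour, assuming netCDF4-python is available to create a file in which one variable on an unlimited dimension is never written:

import h5netcdf
import netCDF4
import numpy as np

with netCDF4.Dataset('pad.nc', 'w') as ds:
    ds.createDimension('t', None)          # unlimited dimension
    a = ds.createVariable('a', 'f8', ('t',))
    b = ds.createVariable('b', 'f8', ('t',))
    a[:3] = np.arange(3.0)                 # 'b' is never written

with h5netcdf.File('pad.nc', 'r') as f:
    # 'b' is stored with length 0, but 't' currently has size 3, so the
    # returned array is padded with the dataset's fill value to length 3
    print(f['b'][:])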

Legacy API

The legacy API is designed for compatibility with netCDF4-python. To use it, import h5netcdf.legacyapi:

import h5netcdf.legacyapi as netCDF4
# everything here would also work with this instead:
# import netCDF4
import numpy as np

with netCDF4.Dataset('mydata.nc', 'w') as ds:
    ds.createDimension('x', 5)
    v = ds.createVariable('hello', float, ('x',))
    v[:] = np.ones(5)

    g = ds.createGroup('grouped')
    g.createDimension('y', 10)
    g.createVariable('data', 'i8', ('y',))
    v = g['data']
    v[:] = np.arange(10)
    v.foo = 'bar'
    print(ds.groups['grouped'].variables['data'])

The legacy API is designed to be easy to try out for netCDF4-python users, but it is not an exact match. Here is an incomplete list of functionality we don't include:

  • Utility functions chartostring, num2date, etc., that are not directly necessary for writing netCDF files.
  • h5netcdf variables do not support automatic masking or scaling (e.g., of values matching the _FillValue attribute). We prefer to leave this functionality to client libraries (e.g., xarray), which can implement their exact desired scaling behavior; a sketch follows below. Nevertheless, arrays are returned padded with the fill value (taken from the underlying HDF5 dataset) up to the current size of the variable's dimensions, equivalent to netCDF4-python's Dataset.set_auto_mask(False).
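
A minimal sketch of such client-side mask-and-scale handling with the legacy API; the CF attribute names (_FillValue, scale_factor, add_offset) are assumptions about the file at hand:

import h5netcdf.legacyapi as netCDF4
import numpy as np

with netCDF4.Dataset('mydata.nc', 'r') as ds:
    v = ds.variables['hello']
    attrs = {k: v.getncattr(k) for k in v.ncattrs()}
    data = v[:].astype('f8')
    # mask values equal to the fill value
    if '_FillValue' in attrs:
        data[data == attrs['_FillValue']] = np.nan
    # apply CF scaling
    data = data * attrs.get('scale_factor', 1.0) + attrs.get('add_offset', 0.0)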

Invalid netCDF files

h5py implements some features that do not (yet) result in valid netCDF files:

  • Data types:
    • Booleans
    • Complex values
    • Non-string variable length types
    • Reference types
  • Arbitrary filters:
    • Scale-offset filters

By default [1], h5netcdf will not allow writing files using any of these features, as files with such features are not readable by other netCDF tools.

However, these are still valid HDF5 files. If you don't care about netCDF compatibility, you can use these features by setting invalid_netcdf=True when creating a file:

# avoid the .nc extension for non-netcdf files
f = h5netcdf.File('mydata.h5', invalid_netcdf=True)
...

# works with the legacy API, too, though compression options are not exposed
ds = h5netcdf.legacyapi.Dataset('mydata.h5', invalid_netcdf=True)
...

In such cases the _NCProperties attribute will not be saved to the file, or will be removed from an existing file. A warning will be issued if the file has a .nc extension.
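
A minimal sketch writing one of these unsupported data types (booleans), which is valid HDF5 but not valid netCDF:

import h5netcdf
import numpy as np

with h5netcdf.File('booleans.h5', 'w', invalid_netcdf=True) as f:
    f.dimensions['x'] = 3
    # a boolean variable; without invalid_netcdf=True this raises
    # h5netcdf.CompatibilityError
    v = f.create_variable('flags', ('x',), bool)
    v[:] = np.array([True, False, True])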

Decoding variable length strings

h5py 3.0 introduced new behavior for handling variable-length strings. Instead of being automatically decoded with UTF-8 into NumPy arrays of str, they are returned as arrays of bytes.

The legacy API preserves the old behavior of h5py (which matches netCDF4), and automatically decodes strings.

The new API matches h5py behavior. Explicitly set decode_vlen_strings=True in the h5netcdf.File constructor to opt in to automatic decoding.
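
A minimal sketch of the two behaviours, assuming 'strings.nc' contains a variable-length string variable named 'names':

import h5netcdf

with h5netcdf.File('strings.nc', 'r') as f:
    print(f['names'][0])   # h5py >= 3.0 default: bytes, e.g. b'foo'

with h5netcdf.File('strings.nc', 'r', decode_vlen_strings=True) as f:
    print(f['names'][0])   # decoded with UTF-8 to str, e.g. 'foo'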

Datasets with missing dimension scales

By default [2], h5netcdf raises a ValueError if variables with no dimension scale associated with one of their axes are accessed. You can set phony_dims='sort' when opening a file to let h5netcdf invent phony dimensions according to netCDF behaviour.

# mimic netCDF-behaviour for non-netcdf files
f = h5netcdf.File('mydata.h5', mode='r', phony_dims='sort')
...

Note that this iterates once over the whole group hierarchy, which can affect performance if you rely on lazy group access. You can set phony_dims='access' instead to defer phony dimension creation to group access time. The created phony dimension names will differ from netCDF behaviour.

f = h5netcdf.File('mydata.h5', mode='r', phony_dims='access')
...

Track Order

As of h5netcdf 1.1.0, if h5py 3.7.0 or greater is detected, the track_order parameter is set to True, enabling order tracking for newly created netCDF4 files. This helps ensure that files created with the h5netcdf library can be modified by the netCDF4-c and netCDF4-python implementations used in other software stacks. Since this change should be transparent to most users, it was made without deprecation.

Since track_order is set at creation time, any dataset that was created with track_order=False (h5netcdf version 1.0.2 and older, except for 0.13.0) will continue to be opened with order tracking disabled.

The following describes the behavior of h5netcdf with respect to order tracking for a few key versions:

  • In version 0.12.0 and earlier, the track_order parameter was missing and thus order tracking was implicitly disabled.
  • Version 0.13.0 enabled order tracking by setting the parameter track_order to True by default without deprecation.
  • Versions 0.13.1 to 1.0.2 set track_order to False due to an upstream bug in h5py, a core dependency of h5netcdf, which was resolved in h5py 3.7.0 with the help of the h5netcdf team.
  • In version 1.1.0, if h5py 3.7.0 or above is detected, the track_order parameter is set to True by default.
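
If you want deterministic behaviour across versions, a hedged sketch is to pass track_order explicitly at creation time (assuming h5py 3.7.0 or greater when setting it to True):

import h5netcdf

with h5netcdf.File('tracked.nc', 'w', track_order=True) as f:
    f.dimensions['x'] = 2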

Changelog


License

3-clause BSD


Footnotes

  [1] In this case, h5netcdf will raise h5netcdf.CompatibilityError.

  [2] The keyword defaults to phony_dims=None for backwards compatibility.

h5netcdf's People

Contributors

ajelenak, bnaul, dependabot[bot], dionhaefner, drew-parsons, ghisvail, groutr, hmaarrfk, kmuehlbauer, krischer, laliberte, mraspaud, paugier, scottyhq, shoyer, tomaugspurger, zequihg50


h5netcdf's Issues

Test failure with libnetcdf 4.6.2

These only seem to appear when I install libnetcdf 4.6.2 specifically. With libnetcdf 4.6.1, they go away:

=================================== FAILURES ===================================
______________________ test_write_legacyapi_read_netCDF4 _______________________
tmp_local_netcdf = '/tmp/pytest-of-travis/pytest-0/test_write_legacyapi_read_netC0/testfile.nc'
    def test_write_legacyapi_read_netCDF4(tmp_local_netcdf):
>       roundtrip_legacy_netcdf(tmp_local_netcdf, netCDF4, legacyapi)
h5netcdf/tests/test_h5netcdf.py:383: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
h5netcdf/tests/test_h5netcdf.py:379: in roundtrip_legacy_netcdf
    read_legacy_netcdf(tmp_netcdf, read_module, write_module)
h5netcdf/tests/test_h5netcdf.py:196: in read_legacy_netcdf
    ds = read_module.Dataset(tmp_netcdf, 'r')
netCDF4/_netCDF4.pyx:2287: in netCDF4._netCDF4.Dataset.__init__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
>   ???
E   AttributeError: 'NoneType' object has no attribute 'dimensions'
netCDF4/_netCDF4.pyx:1818: AttributeError
_______________________ test_write_h5netcdf_read_netCDF4 _______________________
tmp_local_netcdf = '/tmp/pytest-of-travis/pytest-0/test_write_h5netcdf_read_netCD0/testfile.nc'
    def test_write_h5netcdf_read_netCDF4(tmp_local_netcdf):
        write_h5netcdf(tmp_local_netcdf)
>       read_legacy_netcdf(tmp_local_netcdf, netCDF4, h5netcdf)
h5netcdf/tests/test_h5netcdf.py:401: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
h5netcdf/tests/test_h5netcdf.py:196: in read_legacy_netcdf
    ds = read_module.Dataset(tmp_netcdf, 'r')
netCDF4/_netCDF4.pyx:2287: in netCDF4._netCDF4.Dataset.__init__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
>   ???
E   AttributeError: 'NoneType' object has no attribute 'dimensions'
netCDF4/_netCDF4.pyx:1818: AttributeError

cannot read netCDF with more than one unlimited dimension

import numpy as np
import xarray as xr
import h5netcdf
import h5netcdf.legacyapi as netCDF4

# create a netCDF file with more than one unlimited dimension

x = np.arange(10.0)
ds = xr.Dataset({'data': ('dim0', x)}, 
             {'dim0': ('dim0', x), 'dim1': ('dim1', x)})

ds.to_netcdf('tst.nc', unlimited_dims=['dim0', 'dim1'])

# try to read it with h5netcdf

h5netcdf.File('tst.nc', 'r')
netCDF4.Dataset('tst.nc', 'r')

# works when using netCDF4
xr.open_dataset('tst.nc', engine='netcdf4')

This yields ValueError: Each dimension with an actual length must have a 'REFERENCE_LIST' attribute. I think this should work.

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.120-45-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

xarray: 0.10.2+dev6.g9261601
pandas: 0.22.0
numpy: 1.14.2
scipy: 1.0.1
netCDF4: 1.3.1
h5netcdf: 0.5.0
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: 1.0.0
dask: 0.17.2
distributed: 1.21.3
matplotlib: 2.2.2
cartopy: None
seaborn: None
setuptools: 39.0.1
pip: 9.0.2
conda: None
pytest: 3.4.2
IPython: 6.2.1
sphinx: None

Write the _NCProperties property

This was introduced in netcdf-c v4.4.1:

$ h5dump tmp.nc
HDF5 "tmp.nc" {
GROUP "/" {
   ATTRIBUTE "_NCProperties" {
      DATATYPE  H5T_STRING {
         STRSIZE 8192;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17"
      }
   }
}
}

Test suite is flaky due to failure to connect to h5pyd test server

Over the last few months, h5netcdf's test suite has been failing about 50% of the time on Travis-CI:
(screenshot: Travis-CI build history from 2018-08-18 showing the intermittent failures)

This is just enough to start to get annoying.

In particular, it fails at:

$ curl ${HS_ENDPOINT}/about && export WITHRESTAPI=--restapi
curl: (7) Failed to connect to 52.4.181.237 port 5104: Connection timed out
The command "curl ${HS_ENDPOINT}/about && export WITHRESTAPI=--restapi" failed and exited with 7 during .
Your build has been stopped.

@ajelenak-thg any ideas?

Tests yield OSError exceptions

I am getting a bunch of OSError exceptions when running the test suite for both the packaged version 0.3.1 and the new version 0.4.2. Here is the full log:

============================= test session starts ==============================
platform linux -- Python 3.6.3rc1, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: /<<PKGBUILDDIR>>, inifile:
collected 21 items

tests/test_h5netcdf.py ..F...F..............

=================================== FAILURES ===================================
______________________ test_write_netCDF4_read_legacyapi _______________________

tmp_netcdf = '/tmp/pytest-of-root/pytest-0/test_write_netCDF4_read_legacy0/testfile.nc'

    def test_write_netCDF4_read_legacyapi(tmp_netcdf):
>       roundtrip_legacy_netcdf(tmp_netcdf, legacyapi, netCDF4)

tests/test_h5netcdf.py:343: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/test_h5netcdf.py:331: in roundtrip_legacy_netcdf
    read_legacy_netcdf(tmp_netcdf, read_module, write_module)
tests/test_h5netcdf.py:193: in read_legacy_netcdf
    if is_h5py_char_working(tmp_netcdf, 'z'):
tests/test_h5netcdf.py:52: in is_h5py_char_working
    assert array_equal(v, _char_array)
tests/test_h5netcdf.py:32: in array_equal
    a, b = map(np.array, (a[...], b[...]))
h5py/_objects.pyx:54: in h5py._objects.with_phil.wrapper
    ???
h5py/_objects.pyx:55: in h5py._objects.with_phil.wrapper
    ???
/usr/lib/python3/dist-packages/h5py/_hl/dataset.py:496: in __getitem__
    self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
h5py/_objects.pyx:54: in h5py._objects.with_phil.wrapper
    ???
h5py/_objects.pyx:55: in h5py._objects.with_phil.wrapper
    ???
h5py/h5d.pyx:181: in h5py.h5d.DatasetID.read
    ???
h5py/_proxy.pyx:130: in h5py._proxy.dset_rw
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   OSError: Can't read data (no appropriate function for conversion path)

h5py/_proxy.pyx:84: OSError
_______________________ test_write_netCDF4_read_h5netcdf _______________________

tmp_netcdf = '/tmp/pytest-of-root/pytest-0/test_write_netCDF4_read_h5netc0/testfile.nc'

    def test_write_netCDF4_read_h5netcdf(tmp_netcdf):
        write_legacy_netcdf(tmp_netcdf, netCDF4)
>       read_h5netcdf(tmp_netcdf, netCDF4)

tests/test_h5netcdf.py:363: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/test_h5netcdf.py:281: in read_h5netcdf
    if is_h5py_char_working(tmp_netcdf, 'z'):
tests/test_h5netcdf.py:52: in is_h5py_char_working
    assert array_equal(v, _char_array)
tests/test_h5netcdf.py:32: in array_equal
    a, b = map(np.array, (a[...], b[...]))
h5py/_objects.pyx:54: in h5py._objects.with_phil.wrapper
    ???
h5py/_objects.pyx:55: in h5py._objects.with_phil.wrapper
    ???
/usr/lib/python3/dist-packages/h5py/_hl/dataset.py:496: in __getitem__
    self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
h5py/_objects.pyx:54: in h5py._objects.with_phil.wrapper
    ???
h5py/_objects.pyx:55: in h5py._objects.with_phil.wrapper
    ???
h5py/h5d.pyx:181: in h5py.h5d.DatasetID.read
    ???
h5py/_proxy.pyx:130: in h5py._proxy.dset_rw
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   OSError: Can't read data (no appropriate function for conversion path)

h5py/_proxy.pyx:84: OSError
===================== 2 failed, 19 passed in 1.13 seconds ======================

Group.name property returns full path instead of name

XRef: pydata/xarray#3680 for more detail.

Sorry, if that has been discussed before.

I see two points here:

  • alignment with h5py which returns the full name of the object (full path from root), status quo
  • alignment with netCDF4 which returns the name

Xarray tries to use h5netcdf in the same manner as netCDF4, which breaks due to this.

The question is, where should this be fixed? Thanks for any ideas!

List of Functions

Hello @shoyer

Let me congratulate you again on your work; I really think it's an important one and I hope you get the recognition.

That being said, I would like to suggest that you add a list of the supported features, especially for the legacyapi. NetCDF4 has lots of features and new ones are constantly added, so you cannot guarantee to support "most of the features of the netCDF interface". In particular, NetCDF4's createVariable has an endian argument, and NetCDF4 also has the mask and scale feature; I may be wrong, but it seems like h5netcdf doesn't have those features.

netcdf4 in wxpython can't visualize plot !

hi,

I have my app in wxPython, and I need to plot some netCDF4 files from a folder. I have two left panels: with a button in the first (leftpaneltop) I choose a netCDF4 file, and the second (leftpanelbottom) is for visualizing the file. But when I choose my file, it can't be opened, with this error:

Traceback (most recent call last):
File "/home/sarah/app2.py", line 80, in onOpen
LeftPanelBottom(parent=self, netCDF4=chosen_file)
TypeError: __init__() got an unexpected keyword argument 'netCDF4'

I work with wxPython 4 on Python 3.6.

thanks for help

suggest variable has no dimension scale associated with axis 0 warning, not error

Stephan,
(noob here, just want numpy-able data)

I have a dataset with 2 keys [u'NS', u'AlgorithmRuntimeInfo'] which

  • h5py reads fine
  • h5netcdf reads fine, but print f["AlgorithmRuntimeInfo"]
    ValueError: variable u'/AlgorithmRuntimeInfo' has no dimension scale associated with axis 0
  • xr.open_dataset( infile, engine="h5netcdf" )
    dies with the same ValueError.

Make this a warning, not an error ?

(The dataset is 50M, not mine; shall I copy it to some netcdf zoo for you ?
Its original name is 2A.GPM.Ku.V720170308.20180502-S014128-E021127.V05A.RT-H5
if that tells you anything.)

Thanks, cheers
-- denis

Pytest API change

All tests may fail because of using deprecated pytest features.

    def __getattr__(self, attr):
>       warnings.warn(PYTEST_CONFIG_GLOBAL, stacklevel=2)
E       pytest.PytestDeprecationWarning: the `pytest.config` global is deprecated.  Please use `request.config` or `pytest_configure` (if you're a pytest plugin) instead.

broken with h5py 2.7.1

Hello, it seems that some recent changes in h5py are making it FTBFS (testsuite failures).

Can you please have a look?
https://launchpadlibrarian.net/336263681/buildlog_ubuntu-artful-amd64.python-h5netcdf_0.4.1-0ubuntu1_BUILDING.txt.gz

   dh_auto_test -O--buildsystem=pybuild
I: pybuild base:184: cd /<<PKGBUILDDIR>>/.pybuild/pythonX.Y_3.6/build; python3.6 -m pytest /<<PKGBUILDDIR>>/h5netcdf/tests
============================= test session starts ==============================
platform linux -- Python 3.6.2, pytest-3.1.3, py-1.4.34, pluggy-0.4.0
rootdir: /<<PKGBUILDDIR>>, inifile:
collected 21 items

../../../h5netcdf/tests/test_h5netcdf.py ..F...F..............

=================================== FAILURES ===================================
______________________ test_write_netCDF4_read_legacyapi _______________________

tmp_netcdf = '/tmp/pytest-of-buildd/pytest-0/test_write_netCDF4_read_legacy0/testfile.nc'

    def test_write_netCDF4_read_legacyapi(tmp_netcdf):
>       roundtrip_legacy_netcdf(tmp_netcdf, legacyapi, netCDF4)

../../../h5netcdf/tests/test_h5netcdf.py:341: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../h5netcdf/tests/test_h5netcdf.py:329: in roundtrip_legacy_netcdf
    read_legacy_netcdf(tmp_netcdf, read_module, write_module)
../../../h5netcdf/tests/test_h5netcdf.py:191: in read_legacy_netcdf
    if is_h5py_char_working(tmp_netcdf, 'z'):
../../../h5netcdf/tests/test_h5netcdf.py:52: in is_h5py_char_working
    assert array_equal(v, _char_array)
../../../h5netcdf/tests/test_h5netcdf.py:32: in array_equal
    a, b = map(np.array, (a[...], b[...]))
h5py/_objects.pyx:54: in h5py._objects.with_phil.wrapper (/build/h5py-ZDuQPo/h5py-2.7.1/h5py/_objects.c:2847)
    ???
h5py/_objects.pyx:55: in h5py._objects.with_phil.wrapper (/build/h5py-ZDuQPo/h5py-2.7.1/h5py/_objects.c:2805)
    ???
/usr/lib/python3/dist-packages/h5py/_hl/dataset.py:496: in __getitem__
    self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
h5py/_objects.pyx:54: in h5py._objects.with_phil.wrapper (/build/h5py-ZDuQPo/h5py-2.7.1/h5py/_objects.c:2847)
    ???
h5py/_objects.pyx:55: in h5py._objects.with_phil.wrapper (/build/h5py-ZDuQPo/h5py-2.7.1/h5py/_objects.c:2805)
    ???
h5py/h5d.pyx:181: in h5py.h5d.DatasetID.read (/build/h5py-ZDuQPo/h5py-2.7.1/h5py/h5d.c:3428)
    ???
h5py/_proxy.pyx:130: in h5py._proxy.dset_rw (/build/h5py-ZDuQPo/h5py-2.7.1/h5py/_proxy.c:2006)
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   OSError: Can't read data (no appropriate function for conversion path)

h5py/_proxy.pyx:84: OSError
_______________________ test_write_netCDF4_read_h5netcdf _______________________

tmp_netcdf = '/tmp/pytest-of-buildd/pytest-0/test_write_netCDF4_read_h5netc0/testfile.nc'

    def test_write_netCDF4_read_h5netcdf(tmp_netcdf):
        write_legacy_netcdf(tmp_netcdf, netCDF4)
>       read_h5netcdf(tmp_netcdf, netCDF4)

../../../h5netcdf/tests/test_h5netcdf.py:361: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../h5netcdf/tests/test_h5netcdf.py:279: in read_h5netcdf
    if is_h5py_char_working(tmp_netcdf, 'z'):
../../../h5netcdf/tests/test_h5netcdf.py:52: in is_h5py_char_working
    assert array_equal(v, _char_array)
../../../h5netcdf/tests/test_h5netcdf.py:32: in array_equal
    a, b = map(np.array, (a[...], b[...]))
h5py/_objects.pyx:54: in h5py._objects.with_phil.wrapper (/build/h5py-ZDuQPo/h5py-2.7.1/h5py/_objects.c:2847)
    ???
h5py/_objects.pyx:55: in h5py._objects.with_phil.wrapper (/build/h5py-ZDuQPo/h5py-2.7.1/h5py/_objects.c:2805)
    ???
/usr/lib/python3/dist-packages/h5py/_hl/dataset.py:496: in __getitem__
    self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
h5py/_objects.pyx:54: in h5py._objects.with_phil.wrapper (/build/h5py-ZDuQPo/h5py-2.7.1/h5py/_objects.c:2847)
    ???
h5py/_objects.pyx:55: in h5py._objects.with_phil.wrapper (/build/h5py-ZDuQPo/h5py-2.7.1/h5py/_objects.c:2805)
    ???
h5py/h5d.pyx:181: in h5py.h5d.DatasetID.read (/build/h5py-ZDuQPo/h5py-2.7.1/h5py/h5d.c:3428)
    ???
h5py/_proxy.pyx:130: in h5py._proxy.dset_rw (/build/h5py-ZDuQPo/h5py-2.7.1/h5py/_proxy.c:2006)
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   OSError: Can't read data (no appropriate function for conversion path)

h5py/_proxy.pyx:84: OSError
===================== 2 failed, 19 passed in 2.28 seconds ======================
E: pybuild pybuild:283: test: plugin distutils failed with: exit code=1: cd /<<PKGBUILDDIR>>/.pybuild/pythonX.Y_3.6/build; python3.6 -m pytest {dir}/h5netcdf/tests

Non-attached HDF5 dimension scale does not have REFERENCE_LIST attribute

A file created using the example code from the New API section of the README file causes the following error on file open:

ValueError                                Traceback (most recent call last)
----> 1 f = h5netcdf.File('mydata.nc', 'r')

---DIR---/h5netcdf/core.py in __init__(self, path, mode, invalid_netcdf, **kwargs)
    591         # if we actually use invalid NetCDF features.
    592         self._write_ncproperties = (invalid_netcdf is not True)
--> 593         super(File, self).__init__(self, self._h5path)
    594
    595     def _check_valid_netcdf_dtype(self, dtype, stacklevel=3):

---DIR---/h5netcdf/core.py in __init__(self, parent, name)
    242                     # variables.
    243                     self._current_dim_sizes[k] = \
--> 244                         self._determine_current_dimension_size(k, current_size)
    245
    246                     if dim_id is None:

---DIR---/h5netcdf/core.py in _determine_current_dimension_size(self, dim_name, max_size)
    284             else:  # pragma: no cover
    285                 raise ValueError(
--> 286                     "Each dimension with an actual length must have a "
    287                     "'REFERENCE_LIST' attribute.")
    288

ValueError: Each dimension with an actual length must have a 'REFERENCE_LIST' attribute.

This is caused by the netCDF dimension (HDF5 dimension scale) z in the file. The example code creates and reshapes it but never assigns to any netCDF variable. On the HDF5 side this translates to not attaching the z dimension scale to another HDF5 dataset. The HDF5 library creates the REFERENCE_LIST attribute only when a dimension scale is attached for the first time.

h5dump output of the mydata.nc file:

HDF5 "mydata.nc" {
GROUP "/" {
   ATTRIBUTE "_NCProperties" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "version=1|h5netcdfversion=0.5.0|hdf5libversion=1.8.18"
      }
   }
   GROUP "grouped" {
      DATASET "data" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SIMPLE { ( 10 ) / ( 10 ) }
         ATTRIBUTE "DIMENSION_LIST" {
            DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
            DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
            DATA {
            (0): (DATASET 5480 /grouped/y )
            }
         }
         ATTRIBUTE "foo" {
            DATATYPE  H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_UTF8;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "bar"
            }
         }
      }
      DATASET "y" {
         DATATYPE  H5T_STRING {
            STRSIZE 1;
            STRPAD H5T_STR_NULLPAD;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SIMPLE { ( 10 ) / ( 10 ) }
         ATTRIBUTE "CLASS" {
            DATATYPE  H5T_STRING {
               STRSIZE 16;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "DIMENSION_SCALE"
            }
         }
         ATTRIBUTE "NAME" {
            DATATYPE  H5T_STRING {
               STRSIZE 54;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "This is a netCDF dimension but not a netCDF variable."
            }
         }
         ATTRIBUTE "REFERENCE_LIST" {
            DATATYPE  H5T_COMPOUND {
               H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
               H5T_STD_I32LE "dimension";
            }
            DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
            DATA {
            (0): {
                  DATASET 4192 /grouped/data ,
                  0
               }
            }
         }
         ATTRIBUTE "_Netcdf4Dimid" {
            DATATYPE  H5T_STD_I64LE
            DATASPACE  SCALAR
            DATA {
            (0): 1
            }
         }
      }
   }
   DATASET "hello" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 5 ) / ( 5 ) }
      ATTRIBUTE "DIMENSION_LIST" {
         DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): (DATASET 4792 /x )
         }
      }
   }
   DATASET "x" {
      DATATYPE  H5T_STRING {
         STRSIZE 1;
         STRPAD H5T_STR_NULLPAD;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SIMPLE { ( 5 ) / ( 5 ) }
      ATTRIBUTE "CLASS" {
         DATATYPE  H5T_STRING {
            STRSIZE 16;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "DIMENSION_SCALE"
         }
      }
      ATTRIBUTE "NAME" {
         DATATYPE  H5T_STRING {
            STRSIZE 54;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "This is a netCDF dimension but not a netCDF variable."
         }
      }
      ATTRIBUTE "REFERENCE_LIST" {
         DATATYPE  H5T_COMPOUND {
            H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
            H5T_STD_I32LE "dimension";
         }
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): {
               DATASET 800 /hello ,
               0
            }
         }
      }
      ATTRIBUTE "_Netcdf4Dimid" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
   }
   DATASET "z" {
      DATATYPE  H5T_STRING {
         STRSIZE 1;
         STRPAD H5T_STR_NULLPAD;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SIMPLE { ( 3 ) / ( H5S_UNLIMITED ) }
      ATTRIBUTE "CLASS" {
         DATATYPE  H5T_STRING {
            STRSIZE 16;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "DIMENSION_SCALE"
         }
      }
      ATTRIBUTE "NAME" {
         DATATYPE  H5T_STRING {
            STRSIZE 54;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "This is a netCDF dimension but not a netCDF variable."
         }
      }
      ATTRIBUTE "_Netcdf4Dimid" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SCALAR
         DATA {
         (0): 1
         }
      }
   }
}
}

I am not familiar enough with the h5netcdf code, but when the REFERENCE_LIST attribute is not present there should be no netCDF variables that depend on it, so it should be safe to report the maximum size of that HDF5 dimension scale as the netCDF dimension's maximum size. Something like the code below could fix this issue.

In the h5netcdf.core.Group._determine_current_dimension_size():

        dim_variable = _find_dim(self._h5group, dim_name)

        if "REFERENCE_LIST" not in dim_variable.attrs:
            return max_size

        root = self._h5group["/"]

suggestion - make it possible to use dimension objects instead of names

in netcdf-python

lon_dim = dset.createDimension('lon', 80)

longitudes = dset.createVariable('lon', 'f4', ('lon',))

suggest to make it possible to use the dimension object as an alternative to using the dimension name:

lon_dim = dset.create_dimension('lon', 80)

longitudes = dset.create_variable('lon', 'f4', (lon_dim,))

Move the change log to a separate file

Please consider moving the change log section of the README to a separate file (CHANGELOG.rst) and ensure the latter is shipped within the release tarball. Debian and other Linux distributions recommend usage of a separate change log file, and can automatically process it as part of the packaging process.

Cheers,

pytest failures

Hi,

I built from master with Python 3.8, h5py 2.10.0, netCDF4 1.5.4, and hdf5 1.10.4.

Running py.test -v h5netcdf/ (with WITHRESTAPI not set) I got the following two failures:

FAILED h5netcdf/tests/test_h5netcdf.py::test_invalid_netcdf4[testfile.nc] - AssertionError: assert 'phony_dim_0' == 'phony_dim_1'
FAILED h5netcdf/tests/test_h5netcdf.py::test_invalid_netcdf4_mixed[testfile.nc] - AssertionError: assert 'y1' == 'z1'

Any idea what's up with these?

Tried again with h5py 2.8.0 and got the same result. Strange, as it looks like the travis tests are passing.

Can't create variable on type "str"

Hi,
It looks like I can't create variables of type 'str' with the legacy api (works with netCDF4 lib)

import h5netcdf.legacyapi as netCDF4
# everything here would also work with this instead:
# import netCDF4
import numpy as np
with netCDF4.Dataset('mydata.nc', 'w') as ds:
    ds.createDimension('x', 1)
    v = ds.createVariable('myvar', str, ('x',))
    v[:] = np.array(["cool stuff"])

The error message in the end is

ValueError: Size must be positive (Size must be positive)

Is there a way to make this work ?

Thanks for the nice software !

Attributes not readable with ncdump?

The file written from example code is not readable by ncdump.

import h5netcdf
import numpy as np

with h5netcdf.File('mydata.nc', 'w') as f:
    f.dimensions = {'x': 5}
    v = f.create_variable('hello', ('x',), float)
    v[:] = np.ones(5)

with h5netcdf.File("mydata_attr.nc", "w") as f:
    f.dimensions = {"x": 5}
    v = f.create_variable("hello", ("x",), 'f8')
    v[:] = np.ones(5)
    v.attrs["test"] = "hello"
ncdump: mydata_attr.nc: NetCDF: Can't open HDF5 attribute

but without the attribute (mydata.nc), it displays just fine.

netcdf mydata {
dimensions:
    x = 5 ;
variables:
    double hello(x) ;
data:

 hello = 1, 1, 1, 1, 1 ;
}

Is there a workaround for this?

Add a bit of doc on motivation for this package

It would be great to have a bit of info on the Readme.md about the motivation for this package, and perhaps what the perceived advantages and disadvantages are (or could be) relative to other netcdf packages.

Missing git tag for 0.8.0 release

I noticed that 0.8.0 is out on PyPI and conda-forge, but not tagged on GitHub. Also, the changelog says TBD here and "Version 0.8.0 (February 4, 2020):" on PyPI. Seems like the release commit is missing.

New API

I've started to write a new API for h5netcdf as an alternative to h5netcdf.legacyapi which will continue to mirror netCDF4-python. The new API will follow PEP-8 standards and adhere much more closely to the design of h5py. The idea is to (1) explore alternative options for a low level netCDF API in Python and (2) make it more natural for h5py users. In general, h5py seems to have a better design for exploring hierarchical datasets.

Some ideas (some of these are implemented on master, some are not):

import h5netcdf

# to reduce confusion, we don't define h5netcdf.Dataset
f = h5netcdf.File('data.nc', mode='w')

# you can set dimensions either with a dictionary
f.dimensions = {'a': 2, 'b': 3}
# or with dictionary like assignment (this mirrors .attrs)
f.dimensions['c'] = 4

# you don't need to explicitly create a dimension first if you supply data with create_variable
f.create_variable('foo', ('x',), data=np.arange(5))

Some questions:

  1. Should we use dims or dimensions as an identifier? The former is shorter, and there is some precedent with attrs in h5py (and dims in xray), but the latter is more descriptive.
  2. Should we eliminate create_dimensions as a method entirely? The alternative is to only support setting dimensions via the dictionary like f.dimensions (see examples above). This would preclude #5.

CC @mangecoeur

legacyapi: set_auto_mask, S1 variables don't work

More things that don't work in h5netcdf.legacyapi but do work in netCDF4:

import h5netcdf.legacyapi as netCDF4
# import netCDF4  # works

with netCDF4.Dataset("out.e", "w") as rootgrp:
    rootgrp.createDimension("num_dim", 2)
    rootgrp.createDimension("len_string", 33)
    coor_names = rootgrp.createVariable("coor_names", "S1", ("num_dim", "len_string"))
    coor_names.set_auto_mask(False)  # fails
    coor_names[0, 0] = "X"  # fails
    coor_names[1, 0] = "Y"
AttributeError: NetCDF: attribute Variable not found
TypeError: No conversion path for dtype: dtype('<U1')

Error importing compound type

import h5netcdf
import netCDF4
import numpy as np
ncfile = netCDF4.Dataset('test.nc','w',format='NETCDF4')
complex128 = np.dtype([('real',np.float64),('imag',np.float64)])
complex128_t = ncfile.createCompoundType(complex128,'complex128')
ncfile.close()
h5netcdf.File('test.nc', 'r')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/venv/lib/python3.7/site-packages/h5netcdf/core.py", line 683, in __repr__
    return '\n'.join([header] + self._repr_body())
  File "/tmp/venv/lib/python3.7/site-packages/h5netcdf/core.py", line 538, in _repr_body
    ['Attributes:'] +
  File "/tmp/venv/lib/python3.7/site-packages/h5netcdf/core.py", line 537, in <listcomp>
    for k, v in self.variables.items()] +
  File "/tmp/venv/lib/python3.7/site-packages/h5netcdf/core.py", line 112, in dimensions
    self._dimensions = self._lookup_dimensions()
  File "/tmp/venv/lib/python3.7/site-packages/h5netcdf/core.py", line 96, in _lookup_dimensions
    for axis, dim in enumerate(self._h5ds.dims):
AttributeError: 'Datatype' object has no attribute 'dims'
% h5dump test.nc
HDF5 "test.nc" {
GROUP "/" {
   ATTRIBUTE "_NCProperties" {
      DATATYPE  H5T_STRING {
         STRSIZE 57;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.10.2"
      }
   }
   DATATYPE "complex128" H5T_COMPOUND {
      H5T_IEEE_F64LE "real";
      H5T_IEEE_F64LE "imag";
   }
}
}

include pep8-compliant API alternative

It seems to me that if you are making an alternative to the existing netcdf packages, it's a good opportunity to have an API that follows Python naming conventions as defined in pep8, principally with names separated by underscore rather than camelCase.

Also recommend make sure your Docs are uploaded to readthedocs.

Disable writing files with features not supported by netCDF-C?

Currently, we support several features implicitly via h5py that produce files the netCDF-C library can't read:

I am inclined to disable writing files with these features to avoid incompatibilities with standard netCDF tools. We would need to add some sort of encoding for handling complex data directly with xarray first to support that common use case.
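
A hedged sketch of one possible client-side encoding (an assumption for illustration, not an existing h5netcdf or xarray feature): store a complex array as a compound type with 'real' and 'imag' fields, which netCDF4 can represent:

import numpy as np

def encode_complex(arr):
    # split a complex array into a structured array that netCDF4 can store
    compound = np.dtype([('real', arr.real.dtype), ('imag', arr.imag.dtype)])
    out = np.empty(arr.shape, dtype=compound)
    out['real'] = arr.real
    out['imag'] = arr.imag
    return out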

createVariable: KeyError: 't' (works with netCDF4)

MWE:

import h5netcdf.legacyapi as netCDF4
# import netCDF4  # works


with netCDF4.Dataset("out.e", "w") as rootgrp:
    rootgrp.createDimension("time_step", None)
    data = rootgrp.createVariable("time_whole", "f4", "time_step")
    data[:] = 0.0
Traceback (most recent call last):
  File "b.py", line 7, in <module>
    data = rootgrp.createVariable("time_whole", "f4", "time_step")
  File "/home/nschloe/.local/lib/python3.8/site-packages/h5netcdf/legacyapi.py", line 91, in createVariable
    return super(Group, self).create_variable(
  File "/home/nschloe/.local/lib/python3.8/site-packages/h5netcdf/core.py", line 500, in create_variable
    return group._create_child_variable(keys[-1], dimensions, dtype, data,
  File "/home/nschloe/.local/lib/python3.8/site-packages/h5netcdf/core.py", line 463, in _create_child_variable
    shape = tuple(self._current_dim_sizes[d] for d in dimensions)
  File "/home/nschloe/.local/lib/python3.8/site-packages/h5netcdf/core.py", line 463, in <genexpr>
    shape = tuple(self._current_dim_sizes[d] for d in dimensions)
  File "/usr/lib/python3.8/collections/__init__.py", line 891, in __getitem__
    return self.__missing__(key)            # support subclasses that define __missing__
  File "/usr/lib/python3.8/collections/__init__.py", line 883, in __missing__
    raise KeyError(key)
KeyError: 't'
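
Judging from the traceback, create_variable iterates over the dimensions argument, so a bare string is treated as a sequence of characters ('t', 'i', 'm', ...). A hedged workaround sketch is to pass the dimensions as a tuple:

# hedged workaround: a tuple keeps the dimension name intact
data = rootgrp.createVariable("time_whole", "f4", ("time_step",))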

OSError: Unable to read attribute (No appropriate function for conversion path)

I'm trying to open a netCDF4 file created by Xarray (made by opening a netCDF3 file and re-saving as netCDF4 using .to_netcdf()) and I get the error below.

import h5netcdf

import h5py as h5
with h5netcdf.File('test.nc', 'r') as fd:
    print(fd['TMP_L103'])
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-2-f423ec5ccbc8> in <module>()
----> 1 dset2 = xr.open_dataset('test.nc', engine='h5netcdf')

/Users/jonathanchambers/anaconda/lib/python3.5/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, concat_characters, decode_coords, engine, chunks, lock, drop_variables)
    225             lock = _default_lock(filename_or_obj, engine)
    226         with close_on_error(store):
--> 227             return maybe_decode_store(store, lock)
    228     else:
    229         if engine is not None and engine != 'scipy':

/Users/jonathanchambers/anaconda/lib/python3.5/site-packages/xarray/backends/api.py in maybe_decode_store(store, lock)
    156             store, mask_and_scale=mask_and_scale, decode_times=decode_times,
    157             concat_characters=concat_characters, decode_coords=decode_coords,
--> 158             drop_variables=drop_variables)
    159 
    160         if chunks is not None:

/Users/jonathanchambers/anaconda/lib/python3.5/site-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables)
    880         file_obj = obj._file_obj
    881     elif isinstance(obj, AbstractDataStore):
--> 882         vars, attrs = obj.load()
    883         extra_coords = set()
    884         file_obj = obj

/Users/jonathanchambers/anaconda/lib/python3.5/site-packages/xarray/backends/common.py in load(self)
    112         """
    113         variables = FrozenOrderedDict((_decode_variable_name(k), v)
--> 114                                       for k, v in iteritems(self.get_variables()))
    115         attributes = FrozenOrderedDict(self.get_attrs())
    116         return variables, attributes

/Users/jonathanchambers/anaconda/lib/python3.5/site-packages/xarray/backends/h5netcdf_.py in get_variables(self)
     68     def get_variables(self):
     69         return FrozenOrderedDict((k, self.open_store_variable(v))
---> 70                                  for k, v in iteritems(self.ds.variables))
     71 
     72     def get_attrs(self):

/Users/jonathanchambers/anaconda/lib/python3.5/site-packages/xarray/core/utils.py in FrozenOrderedDict(*args, **kwargs)
    274 
    275 def FrozenOrderedDict(*args, **kwargs):
--> 276     return Frozen(OrderedDict(*args, **kwargs))
    277 
    278 

/Users/jonathanchambers/anaconda/lib/python3.5/site-packages/xarray/backends/h5netcdf_.py in <genexpr>(.0)
     68     def get_variables(self):
     69         return FrozenOrderedDict((k, self.open_store_variable(v))
---> 70                                  for k, v in iteritems(self.ds.variables))
     71 
     72     def get_attrs(self):

/Users/jonathanchambers/anaconda/lib/python3.5/site-packages/xarray/backends/h5netcdf_.py in open_store_variable(self, var)
     53         dimensions = var.dimensions
     54         data = indexing.LazilyIndexedArray(var)
---> 55         attrs = _read_attributes(var)
     56 
     57         # netCDF4 specific encoding

/Users/jonathanchambers/anaconda/lib/python3.5/site-packages/xarray/backends/h5netcdf_.py in _read_attributes(h5netcdf_var)
     24     attrs = OrderedDict()
     25     for k in h5netcdf_var.ncattrs():
---> 26         v = h5netcdf_var.getncattr(k)
     27         if k not in ['_FillValue', 'missing_value']:
     28             v = maybe_decode_bytes(v)

/Users/jonathanchambers/anaconda/lib/python3.5/site-packages/h5netcdf/legacyapi.py in getncattr(self, name)
      6 
      7     def getncattr(self, name):
----> 8         return self.attrs[name]
      9 
     10     def setncattr(self, name, value):

/Users/jonathanchambers/anaconda/lib/python3.5/site-packages/h5netcdf/attrs.py in __getitem__(self, key)
     14         if key in _hidden_attrs:
     15             raise KeyError(key)
---> 16         return self._h5attrs[key]
     17 
     18     def __setitem__(self, key, value):

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (-------src-dir--------/h5py/_objects.c:2582)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (-------src-dir--------/h5py/_objects.c:2541)()

/Users/jonathanchambers/anaconda/lib/python3.5/site-packages/h5py/_hl/attrs.py in __getitem__(self, name)
     77 
     78         arr = numpy.ndarray(shape, dtype=dtype, order='C')
---> 79         attr.read(arr, mtype=htype)
     80 
     81         if len(arr.shape) == 0:

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (-------src-dir--------/h5py/_objects.c:2582)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (-------src-dir--------/h5py/_objects.c:2541)()

h5py/h5a.pyx in h5py.h5a.AttrID.read (-------src-dir--------/h5py/h5a.c:5123)()

h5py/_proxy.pyx in h5py._proxy.attr_rw (-------src-dir--------/h5py/_proxy.c:915)()

OSError: Unable to read attribute (No appropriate function for conversion path)

By contrast, using h5py directly:

import h5py as h5
with h5.File('test.nc', 'r') as fd:
    print(fd['TMP_L103'])

Works fine. This error carries through to xarray itself if I try to use the h5netcdf backend.

createVariable chokes on scalar datasets

With the legacy API, the following crashes (while it works with netCDF4):

import h5netcdf.legacyapi as netCDF4
with netCDF4.Dataset('mydata.nc', 'w') as ds:
    v = ds.createVariable('myvar', int, (), zlib=6, fill_value=None)

here is the traceback:

Traceback (most recent call last):
  File "test_h5nc.py", line 31, in <module>
    v = ds.createVariable('myvar', int, (), zlib=6, fill_value=None)
  File "/home/a001673/usr/src/h5netcdf/h5netcdf/legacyapi.py", line 68, in createVariable
    chunks=chunksizes, fillvalue=fill_value, **kwds)
  File "/home/a001673/usr/src/h5netcdf/h5netcdf/core.py", line 262, in create_variable
    fillvalue, **kwargs)
  File "/home/a001673/usr/src/h5netcdf/h5netcdf/core.py", line 244, in _create_child_variable
    **kwargs)
  File "/home/a001673/usr/src/h5py/h5py/_hl/group.py", line 103, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "/home/a001673/usr/src/h5py/h5py/_hl/dataset.py", line 107, in make_new_dset
    shuffle, fletcher32, maxshape, scaleoffset)
  File "/home/a001673/usr/src/h5py/h5py/_hl/filters.py", line 87, in generate_dcpl
    raise TypeError("Scalar datasets don't support chunk/filter options")
TypeError: Scalar datasets don't support chunk/filter options

netCDF attributes loaded as ndarrays of bytes

When using h5netcdf as the xarray backend to open files created using netCDF4, it seems all the attributes are being loaded as ndarrays with a single item of 'byte' instead of being converted to strings.
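
A minimal client-side sketch of the decoding involved (compare maybe_decode_bytes in the xarray traceback of the earlier "OSError: Unable to read attribute" issue); the handling of single-item byte arrays is an assumption about the attributes at hand:

import numpy as np

def maybe_decode(value):
    # plain bytes -> str
    if isinstance(value, bytes):
        return value.decode('utf-8')
    # single-item ndarray of bytes -> str
    if isinstance(value, np.ndarray) and value.size == 1 and value.dtype.kind == 'S':
        return value.item().decode('utf-8')
    return value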

Feature Request: support endian argument to createVariable

You mention in the readme: "We don't support the endian argument to createVariable. The h5py API does not appear to offer this feature."

h5py does support endianness, but it's a component of the dtype, i.e. dtype='<f8' is little-endian while dtype='>f8' is big-endian. You can use the newbyteorder function on a numpy dtype object to set it, like so:

>>> np.dtype("<f, >f")
dtype([('f0', '<f4'), ('f1', '>f4')])
>>> np.dtype("<f, >f").newbyteorder('>')
dtype([('f0', '>f4'), ('f1', '>f4')])
>>> np.dtype("<f, >f").newbyteorder('<')
dtype([('f0', '<f4'), ('f1', '<f4')])
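
Following that suggestion, a minimal sketch with the new API, assuming h5netcdf passes the dtype through to h5py unchanged:

import h5netcdf
import numpy as np

with h5netcdf.File('endian.nc', 'w') as f:
    f.dimensions['x'] = 4
    v = f.create_variable('big', ('x',), dtype='>f8')  # big-endian float64
    v[:] = np.arange(4.0)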

problems with user defined types?

Howdy!

I'm the author of netcdf-4, and currently help maintain the C library. I just heard about this project and it looks really fun and interesting!

From a brief read of the docs, it seems you believe that vlens and some other user-defined types are not supported in netCDF. Everything should be supported except:

  • transient types (I just learned of them!)
  • reference types (I can't figure out how to handle them.)
  • circular group structures.

But things like vlens and other user-defined types should work.

Let me know if you have any questions about netcdf-c. Keep on netCDFing!

Humongous netCDF files with unlimited dimensions

I am the author of a pure-Python ocean model. After switching from netCDF4 to h5netcdf I noticed that output files were much larger.

This seems to be the case only when using unlimited dimensions, and the file does not grow by much when the dimension is resized later on, so I guess it is down to how much space is preallocated for unlimited dimensions.

Example script to reproduce the problem:

import os
import tempfile

import h5netcdf
import netCDF4

import numpy as np


DIMENSIONS = {
    "x": 250,
    "y": 250,
    "z": 40,
    "time": None,
}

VAR_DIMS = ["time", "z", "y", "x"]


def create_h5netcdf_file(outfile):
    with h5netcdf.File(outfile, "w") as ncfile:
        for dim, size in DIMENSIONS.items():
            ncfile.dimensions[dim] = size
            ncfile.create_variable(dim, (dim,), "float64")

        ncfile.create_variable("test", VAR_DIMS, "float64")

        if DIMENSIONS["time"] is None:
            ncfile.resize_dimension("time", 1)

        ncfile.variables["time"][0] = 1
        ncfile.variables["test"][:] = np.random.rand(
            *(DIMENSIONS[dim] or 1 for dim in VAR_DIMS)
        )


def create_netcdf4_file(outfile):
    with netCDF4.Dataset(outfile, "w") as ncfile:
        for dim, size in DIMENSIONS.items():
            ncfile.createDimension(dim, size)
            ncfile.createVariable(dim, "f8", dim)

        ncfile.createVariable("test", "f8", VAR_DIMS)
        ncfile.variables["time"][0] = 1
        ncfile.variables["test"][:] = np.random.rand(
            *(DIMENSIONS[dim] or 1 for dim in VAR_DIMS)
        )


def get_filesize_mb(infile):
    return os.stat(infile).st_size // 1024 ** 2


if __name__ == "__main__":
    outfile = tempfile.NamedTemporaryFile(delete=False)
    outfile.close()

    try:
        create_h5netcdf_file(outfile.name)
        print("h5netcdf: %sMB" % get_filesize_mb(outfile.name))
    finally:
        os.remove(outfile.name)

    outfile = tempfile.NamedTemporaryFile(delete=False)
    outfile.close()

    try:
        create_netcdf4_file(outfile.name)
        print("netcdf4: %sMB" % get_filesize_mb(outfile.name))
    finally:
        os.remove(outfile.name)

On my machine, this prints:

$ python h5netcdf_bug.py 
h5netcdf: 1344MB
netcdf4: 19MB

I.e., an increase in file size by a factor of ~100. If I replace the DIMENSIONS dict by

DIMENSIONS = {
    "x": 250,
    "y": 250,
    "z": 40,
    "time": 1,
}

I get

$ python h5netcdf_bug.py 
h5netcdf: 19MB
netcdf4: 19MB

as expected.

Platform

  • OSX Mojave
  • HDF5 1.10.5 built from source
  • h5py 2.9.0
  • netCDF4 4.6.2 via Homebrew
  • netCDF4-python 1.4.3

Iteration over Variable errors

In [1]: from h5netcdf.legacyapi import Dataset

In [2]: data = Dataset('/Users/jcrist/Code/ocean_slider_demo/noaa.oisst.v2.highres/sst.day.mean.1983.v2.nc').variables['sst']

In [3]: data.shape
Out[3]: (365, 720, 1440)

In [4]: import numpy as np

In [6]: np.array([a.mean() for a in data])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-336ed3fb3c22> in <module>()
----> 1 np.array([a.mean() for a in data])

/Users/jcrist/anaconda/envs/ocean/lib/python2.7/site-packages/h5netcdf/core.pyc in __getitem__(self, key)
     77
     78     def __getitem__(self, key):
---> 79         return self._h5ds[key]
     80
     81     def __setitem__(self, key, value):

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (--------src-dir---------/h5py-2.5.0/h5py/_objects.c:2458)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (--------src-dir---------/h5py-2.5.0/h5py/_objects.c:2415)()

/Users/jcrist/anaconda/envs/ocean/lib/python2.7/site-packages/h5py/_hl/dataset.pyc in __getitem__(self, args)
    429
    430         # Perform the dataspace selection.
--> 431         selection = sel.select(self.shape, args, dsid=self.id)
    432
    433         if selection.nselect == 0:

/Users/jcrist/anaconda/envs/ocean/lib/python2.7/site-packages/h5py/_hl/selections.pyc in select(shape, args, dsid)
     97
     98     sel = SimpleSelection(shape)
---> 99     sel[args]
    100     return sel
    101

/Users/jcrist/anaconda/envs/ocean/lib/python2.7/site-packages/h5py/_hl/selections.pyc in __getitem__(self, args)
    264             return self
    265
--> 266         start, count, step, scalar = _handle_simple(self.shape,args)
    267
    268         self._id.select_hyperslab(start, count, step)

/Users/jcrist/anaconda/envs/ocean/lib/python2.7/site-packages/h5py/_hl/selections.pyc in _handle_simple(shape, args)
    519         else:
    520             try:
--> 521                 x,y,z = _translate_int(int(arg), length)
    522                 s = True
    523             except TypeError:

/Users/jcrist/anaconda/envs/ocean/lib/python2.7/site-packages/h5py/_hl/selections.pyc in _translate_int(exp, length)
    539
    540     if not 0<=exp<length:
--> 541         raise ValueError("Index (%s) out of range (0-%s)" % (exp, length-1))
    542
    543     return exp, 1, 1

ValueError: Index (365) out of range (0-364)

Might be as simple as pointing the __iter__ method at Variable._h5ds.__iter__, as this works fine:

In [7]: np.array([a.mean() for a in data._h5ds])
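
A minimal sketch of that suggestion, assuming Variable stores the underlying h5py dataset as self._h5ds (as the traceback indicates):

class Variable:
    def __iter__(self):
        # delegate iteration to the wrapped h5py dataset, which yields one
        # slice along the first axis per step and stops at the right length
        return iter(self._h5ds)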

AttributeError when opening file with only groups at root level

When I open the this file with no other arguments, h5netcdf raises an AttributeError:

Data available from EUMETSAT: https://www.eumetsat.int/website/home/Satellites/FutureSatellites/MeteosatThirdGeneration/MTGData/MTGUserTestData/index.html --> ftp://ftp.eumetsat.int/pub/OPS/out/test-data/Test-data-for-External-Users/MTG_FCI_Test-Data/ --> uncompressed

import h5netcdf
f = "/path/to/file/W_XX...0067.nc"
ds = h5netcdf.File(f, 'r')

results in an AttributeError as follows:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/media/nas/x21324/miniconda3/envs/py37e/lib/python3.7/site-packages/h5netcdf/core.py", line 719, in __repr__
    return '\n'.join([header] + self._repr_body())
  File "/media/nas/x21324/miniconda3/envs/py37e/lib/python3.7/site-packages/h5netcdf/core.py", line 541, in _repr_body
    ['Attributes:'] +
  File "/media/nas/x21324/miniconda3/envs/py37e/lib/python3.7/site-packages/h5netcdf/core.py", line 540, in <listcomp>
    for k, v in self.variables.items()] +
  File "/media/nas/x21324/miniconda3/envs/py37e/lib/python3.7/site-packages/h5netcdf/core.py", line 114, in dimensions
    self._dimensions = self._lookup_dimensions()
  File "/media/nas/x21324/miniconda3/envs/py37e/lib/python3.7/site-packages/h5netcdf/core.py", line 98, in _lookup_dimensions
    for axis, dim in enumerate(self._h5ds.dims):
AttributeError: 'Datatype' object has no attribute 'dims'

Strangely, when run on the commandline I get:

Exception ignored in: <function File.close at 0x7f1acf790598>
Traceback (most recent call last):
  File "/media/nas/x21324/miniconda3/envs/py37e/lib/python3.7/site-packages/h5netcdf/core.py", line 701, in close
  File "/media/nas/x21324/miniconda3/envs/py37e/lib/python3.7/site-packages/h5py/_hl/files.py", line 431, in close
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 267, in h5py.h5f.get_obj_ids
  File "h5py/h5i.pyx", line 43, in h5py.h5i.wrap_identifier
ImportError: sys.meta_path is None, Python is likely shutting down

LZF is part of the NetCDF API

xarray 0.10.6, h5netcdf 0.5.1:

>>> import xarray
>>> xarray.Dataset({'x': [1, 2]}).to_netcdf('foo.nc', engine='h5netcdf', encoding={'x': {'compression': 'lzf'}})

xarray/backends/h5netcdf_.py:222: FutureWarning: lzf compression are supported by h5py, but not part of the NetCDF API. You are writing an HDF5 file that is not a valid NetCDF file! In the future, this will be an error, unless you set invalid_netcdf=True.
  fillvalue=fillvalue, **kwargs)

However, I understand that arbitrary compression plugins have recently become part of the NetCDF specification.
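
If an LZF-compressed (and therefore non-netCDF) file is acceptable, the warning suggests opting in explicitly. A minimal sketch, assuming create_variable forwards extra keywords such as compression through to h5py:

import h5netcdf
import numpy as np

# Sketch: invalid_netcdf=True acknowledges that the resulting HDF5 file
# will not be a valid netCDF file.
with h5netcdf.File('foo.nc', 'w', invalid_netcdf=True) as f:
    f.dimensions = {'x': 2}
    v = f.create_variable('x_data', ('x',), 'int64', compression='lzf')
    v[:] = np.array([1, 2])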

ValueError: Index (1) out of range (0-0)

When slicing a length-1 dimension with a boolean array, an unexpected ValueError is raised. This differs from the netCDF4-python behaviour, where such an operation is legal.

This is a test function that illustrates the behaviour:

import h5netcdf
import netCDF4
import numpy as np

def test_bool_slicing_length_one_dim(tmp_netcdf):
    with h5netcdf.File(tmp_netcdf, 'w') as ds:
        ds.dimensions = {'x': 1, 'y': 2}
        v = ds.create_variable('hello', ('x', 'y'), float)
        v[:] = np.ones((1, 2))

    bool_slice = np.array([True])
    # This works with netCDF4-python:
    with netCDF4.Dataset(tmp_netcdf, 'r') as ds:
        data = ds.variables['hello'][bool_slice, ...]

    # But the equivalent h5netcdf read raises the ValueError below:
    with h5netcdf.File(tmp_netcdf, 'r') as ds:
        data = ds['hello'][bool_slice, ...]

  File "build/bdist.linux-x86_64/egg/h5netcdf/core.py", line 98, in __getitem__
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (-------src-dir-------/h5py/_objects.c:2582)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (-------src-dir-------/h5py/_objects.c:2541)
  File "/home/laliberte/anaconda2/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 462, in __getitem__
    selection = sel.select(self.shape, args, dsid=self.id)
  File "/home/laliberte/anaconda2/lib/python2.7/site-packages/h5py/_hl/selections.py", line 92, in select
    sel[args]
  File "/home/laliberte/anaconda2/lib/python2.7/site-packages/h5py/_hl/selections.py", line 259, in __getitem__
    start, count, step, scalar = _handle_simple(self.shape,args)
  File "/home/laliberte/anaconda2/lib/python2.7/site-packages/h5py/_hl/selections.py", line 447, in _handle_simple
    x,y,z = _translate_int(int(arg), length)
  File "/home/laliberte/anaconda2/lib/python2.7/site-packages/h5py/_hl/selections.py", line 467, in _translate_int
    raise ValueError("Index (%s) out of range (0-%s)" % (exp, length-1))
ValueError: Index (1) out of range (0-0)
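
A hedged workaround until this is fixed: convert the boolean mask to integer indices with np.nonzero, which h5py accepts for any dimension length. A minimal sketch:

import numpy as np

bool_slice = np.array([True])
int_slice = np.nonzero(bool_slice)[0]  # array([0]) -- valid h5py fancy index
# data = ds['hello'][int_slice, ...]   # succeeds where the boolean mask fails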

ImportError: sys.meta_path is None, Python is likely shutting down

This works:

import xarray as xr
ds = xr.open_mfdataset(['test.h5'], concat_dim='datetime', engine='h5netcdf', decode_cf=False)
ds_new = ds['var'].copy()

But this does not:

import xarray as xr

def f():
    ds = xr.open_mfdataset(['test.h5'], concat_dim='datetime', engine='h5netcdf', decode_cf=False)
    ds_new = ds['var'].copy()
    return ds_new

ds_new = f()

The following exception message is written to stderr when the script ends.

Exception ignored in: <bound method CachingFileManager.__del__ of CachingFileManager(<function open_h5netcdf_group at 0x7f0a90e7ec80>, '/home/pinaultf/sandbox/dev/xarray/test.h5', mode='r', kwargs={'group': None})>
Traceback (most recent call last):
  File "/home/pinaultf/sandbox/dev/xarray/xarray/backends/file_manager.py", line 210, in __del__
  File "/home/pinaultf/sandbox/dev/xarray/xarray/backends/file_manager.py", line 188, in close
  File "/home/pinaultf/sandbox/dev/xarray/xarray/backends/netCDF4_.py", line 241, in close
  File "/home/pinaultf/miniconda3/envs/env_vegeo_12_2018/lib/python3.6/site-packages/h5netcdf/core.py", line 665, in close
  File "/home/pinaultf/miniconda3/envs/env_vegeo_12_2018/lib/python3.6/site-packages/h5py/_hl/files.py", line 330, in close
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 236, in h5py.h5f.get_obj_ids
  File "h5py/h5i.pyx", line 43, in h5py.h5i.wrap_identifier
ImportError: sys.meta_path is None, Python is likely shutting down

As far as I know, this may be related to #21 and to the garbage collector.

Note that adding "ds.close()" removes the message.

import xarray as xr

def f():
    ds = xr.open_mfdataset(['test.h5'], concat_dim='datetime', engine='h5netcdf', decode_cf=False)
    ds.close()
    ds_new = ds['var'].copy()
    return ds_new

ds_new = f()

Works as expected.
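
Equivalently, a with-block closes the dataset deterministically before interpreter shutdown. A sketch, assuming the variable should be loaded into memory before the file closes:

import xarray as xr

def f():
    # The Dataset context manager guarantees close() runs before Python
    # shuts down, avoiding the ignored ImportError above.
    with xr.open_mfdataset(['test.h5'], concat_dim='datetime',
                           engine='h5netcdf', decode_cf=False) as ds:
        return ds['var'].compute()

ds_new = f()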

h5netcdf creates files with groups that cannot be read with netCDF4

As reported over in pydata/xarray#2974.

For reference, here is the h5dump output for files created with netCDF4 and with h5netcdf:

HDF5 "test-netcdf4.nc" {
GROUP "/" {
   ATTRIBUTE "_NCProperties" {
      DATATYPE  H5T_STRING {
         STRSIZE 34;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "version=2,netcdf=4.6.2,hdf5=1.10.4"
      }
   }
   GROUP "grp1" {
      DATASET "data1" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
         DATA {
         (0,0): 1, 2,
         (1,0): 3, 4
         }
         ATTRIBUTE "DIMENSION_LIST" {
            DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
            DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
            DATA {
            (0): (DATASET 566 /grp1/y ), (DATASET 876 /grp1/x )
            }
         }
         ATTRIBUTE "_Netcdf4Dimid" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SCALAR
            DATA {
            (0): 0
            }
         }
      }
      DATASET "x" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
         DATA {
         (0): 1, 2
         }
         ATTRIBUTE "CLASS" {
            DATATYPE  H5T_STRING {
               STRSIZE 16;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "DIMENSION_SCALE"
            }
         }
         ATTRIBUTE "NAME" {
            DATATYPE  H5T_STRING {
               STRSIZE 2;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "x"
            }
         }
         ATTRIBUTE "REFERENCE_LIST" {
            DATATYPE  H5T_COMPOUND {
               H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
               H5T_STD_I32LE "dimension";
            }
            DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
            DATA {
            (0): {
                  DATASET 3294 /grp1/data1 ,
                  1
               }
            }
         }
         ATTRIBUTE "_Netcdf4Dimid" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SCALAR
            DATA {
            (0): 1
            }
         }
      }
      DATASET "y" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
         DATA {
         (0): 1, 2
         }
         ATTRIBUTE "CLASS" {
            DATATYPE  H5T_STRING {
               STRSIZE 16;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "DIMENSION_SCALE"
            }
         }
         ATTRIBUTE "NAME" {
            DATATYPE  H5T_STRING {
               STRSIZE 2;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "y"
            }
         }
         ATTRIBUTE "REFERENCE_LIST" {
            DATATYPE  H5T_COMPOUND {
               H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
               H5T_STD_I32LE "dimension";
            }
            DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
            DATA {
            (0): {
                  DATASET 3294 /grp1/data1 ,
                  0
               }
            }
         }
         ATTRIBUTE "_Netcdf4Dimid" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SCALAR
            DATA {
            (0): 0
            }
         }
      }
   }
   GROUP "grp2" {
      DATASET "data2" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SIMPLE { ( 3, 3 ) / ( 3, 3 ) }
         DATA {
         (0,0): 1, 2, 3,
         (1,0): 4, 5, 6,
         (2,0): 7, 8, 9
         }
         ATTRIBUTE "DIMENSION_LIST" {
            DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
            DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
            DATA {
            (0): (DATASET 9761 /grp2/y ), (DATASET 10071 /grp2/x )
            }
         }
      }
      DATASET "x" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
         DATA {
         (0): 1, 2, 3
         }
         ATTRIBUTE "CLASS" {
            DATATYPE  H5T_STRING {
               STRSIZE 16;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "DIMENSION_SCALE"
            }
         }
         ATTRIBUTE "NAME" {
            DATATYPE  H5T_STRING {
               STRSIZE 2;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "x"
            }
         }
         ATTRIBUTE "REFERENCE_LIST" {
            DATATYPE  H5T_COMPOUND {
               H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
               H5T_STD_I32LE "dimension";
            }
            DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
            DATA {
            (0): {
                  DATASET 12497 /grp2/data2 ,
                  1
               }
            }
         }
         ATTRIBUTE "_Netcdf4Dimid" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SCALAR
            DATA {
            (0): 3
            }
         }
      }
      DATASET "y" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
         DATA {
         (0): 1, 2, 3
         }
         ATTRIBUTE "CLASS" {
            DATATYPE  H5T_STRING {
               STRSIZE 16;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "DIMENSION_SCALE"
            }
         }
         ATTRIBUTE "NAME" {
            DATATYPE  H5T_STRING {
               STRSIZE 2;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "y"
            }
         }
         ATTRIBUTE "REFERENCE_LIST" {
            DATATYPE  H5T_COMPOUND {
               H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
               H5T_STD_I32LE "dimension";
            }
            DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
            DATA {
            (0): {
                  DATASET 12497 /grp2/data2 ,
                  0
               }
            }
         }
         ATTRIBUTE "_Netcdf4Dimid" {
            DATATYPE  H5T_STD_I32LE
            DATASPACE  SCALAR
            DATA {
            (0): 2
            }
         }
      }
   }
}
}

and

HDF5 "test-h5netcdf.nc" {
GROUP "/" {
   GROUP "grp1" {
      DATASET "data1" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
         DATA {
         (0,0): 1, 2,
         (1,0): 3, 4
         }
         ATTRIBUTE "DIMENSION_LIST" {
            DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
            DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
            DATA {
            (0): (DATASET 1832 /grp1/y ), (DATASET 4480 /grp1/x )
            }
         }
      }
      DATASET "x" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
         DATA {
         (0): 1, 2
         }
         ATTRIBUTE "CLASS" {
            DATATYPE  H5T_STRING {
               STRSIZE 16;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "DIMENSION_SCALE"
            }
         }
         ATTRIBUTE "NAME" {
            DATATYPE  H5T_STRING {
               STRSIZE 2;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "x"
            }
         }
         ATTRIBUTE "REFERENCE_LIST" {
            DATATYPE  H5T_COMPOUND {
               H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
               H5T_STD_I32LE "dimension";
            }
            DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
            DATA {
            (0): {
                  DATASET 4752 /grp1/data1 ,
                  1
               },
            (1): {
                  DATASET 4752 /grp1/data1 ,
                  1
               }
            }
         }
         ATTRIBUTE "_Netcdf4Dimid" {
            DATATYPE  H5T_STD_I64LE
            DATASPACE  SCALAR
            DATA {
            (0): 1
            }
         }
      }
      DATASET "y" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
         DATA {
         (0): 1, 2
         }
         ATTRIBUTE "CLASS" {
            DATATYPE  H5T_STRING {
               STRSIZE 16;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "DIMENSION_SCALE"
            }
         }
         ATTRIBUTE "NAME" {
            DATATYPE  H5T_STRING {
               STRSIZE 2;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "y"
            }
         }
         ATTRIBUTE "REFERENCE_LIST" {
            DATATYPE  H5T_COMPOUND {
               H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
               H5T_STD_I32LE "dimension";
            }
            DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
            DATA {
            (0): {
                  DATASET 4752 /grp1/data1 ,
                  0
               },
            (1): {
                  DATASET 4752 /grp1/data1 ,
                  0
               }
            }
         }
         ATTRIBUTE "_Netcdf4Dimid" {
            DATATYPE  H5T_STD_I64LE
            DATASPACE  SCALAR
            DATA {
            (0): 0
            }
         }
      }
   }
   GROUP "grp2" {
      DATASET "data2" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SIMPLE { ( 3, 3 ) / ( 3, 3 ) }
         DATA {
         (0,0): 1, 2, 3,
         (1,0): 4, 5, 6,
         (2,0): 7, 8, 9
         }
         ATTRIBUTE "DIMENSION_LIST" {
            DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
            DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
            DATA {
            (0): (DATASET 11328 /grp2/y ), (DATASET 11928 /grp2/x )
            }
         }
      }
      DATASET "x" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
         DATA {
         (0): 1, 2, 3
         }
         ATTRIBUTE "CLASS" {
            DATATYPE  H5T_STRING {
               STRSIZE 16;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "DIMENSION_SCALE"
            }
         }
         ATTRIBUTE "NAME" {
            DATATYPE  H5T_STRING {
               STRSIZE 2;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "x"
            }
         }
         ATTRIBUTE "REFERENCE_LIST" {
            DATATYPE  H5T_COMPOUND {
               H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
               H5T_STD_I32LE "dimension";
            }
            DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
            DATA {
            (0): {
                  DATASET 12200 /grp2/data2 ,
                  1
               }
            }
         }
         ATTRIBUTE "_Netcdf4Dimid" {
            DATATYPE  H5T_STD_I64LE
            DATASPACE  SCALAR
            DATA {
            (0): 1
            }
         }
      }
      DATASET "y" {
         DATATYPE  H5T_STD_I64LE
         DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }
         DATA {
         (0): 1, 2, 3
         }
         ATTRIBUTE "CLASS" {
            DATATYPE  H5T_STRING {
               STRSIZE 16;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "DIMENSION_SCALE"
            }
         }
         ATTRIBUTE "NAME" {
            DATATYPE  H5T_STRING {
               STRSIZE 2;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SCALAR
            DATA {
            (0): "y"
            }
         }
         ATTRIBUTE "REFERENCE_LIST" {
            DATATYPE  H5T_COMPOUND {
               H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
               H5T_STD_I32LE "dimension";
            }
            DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
            DATA {
            (0): {
                  DATASET 12200 /grp2/data2 ,
                  0
               }
            }
         }
         ATTRIBUTE "_Netcdf4Dimid" {
            DATATYPE  H5T_STD_I64LE
            DATASPACE  SCALAR
            DATA {
            (0): 0
            }
         }
      }
   }
}
}

Here's the diff. The notable differences: h5netcdf omits the _NCProperties attribute, writes duplicated REFERENCE_LIST entries, and stores _Netcdf4Dimid as H5T_STD_I64LE rather than H5T_STD_I32LE:

1c1
< HDF5 "test-netcdf4.nc" {
---
> HDF5 "test.nc" {
3,14d2
<    ATTRIBUTE "_NCProperties" {
<       DATATYPE  H5T_STRING {
<          STRSIZE 34;
<          STRPAD H5T_STR_NULLTERM;
<          CSET H5T_CSET_ASCII;
<          CTYPE H5T_C_S1;
<       }
<       DATASPACE  SCALAR
<       DATA {
<       (0): "version=2,netcdf=4.6.2,hdf5=1.10.4"
<       }
<    }
27,34c15
<             (0): (DATASET 566 /grp1/y ), (DATASET 876 /grp1/x )
<             }
<          }
<          ATTRIBUTE "_Netcdf4Dimid" {
<             DATATYPE  H5T_STD_I32LE
<             DATASPACE  SCALAR
<             DATA {
<             (0): 0
---
>             (0): (DATASET 1832 /grp1/y ), (DATASET 4480 /grp1/x )
73c54
<             DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
---
>             DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
76c57,61
<                   DATASET 3294 /grp1/data1 ,
---
>                   DATASET 4752 /grp1/data1 ,
>                   1
>                },
>             (1): {
>                   DATASET 4752 /grp1/data1 ,
82c67
<             DATATYPE  H5T_STD_I32LE
---
>             DATATYPE  H5T_STD_I64LE
124c109
<             DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
---
>             DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
127c112,116
<                   DATASET 3294 /grp1/data1 ,
---
>                   DATASET 4752 /grp1/data1 ,
>                   0
>                },
>             (1): {
>                   DATASET 4752 /grp1/data1 ,
133c122
<             DATATYPE  H5T_STD_I32LE
---
>             DATATYPE  H5T_STD_I64LE
154c143
<             (0): (DATASET 9761 /grp2/y ), (DATASET 10071 /grp2/x )
---
>             (0): (DATASET 11328 /grp2/y ), (DATASET 11928 /grp2/x )
196c185
<                   DATASET 12497 /grp2/data2 ,
---
>                   DATASET 12200 /grp2/data2 ,
202c191
<             DATATYPE  H5T_STD_I32LE
---
>             DATATYPE  H5T_STD_I64LE
205c194
<             (0): 3
---
>             (0): 1
247c236
<                   DATASET 12497 /grp2/data2 ,
---
>                   DATASET 12200 /grp2/data2 ,
253c242
<             DATATYPE  H5T_STD_I32LE
---
>             DATATYPE  H5T_STD_I64LE
256c245
<             (0): 2
---
>             (0): 0

test_create_variable_matching_saved_dimension is failing with h5pyd

This recently introduced test (added in #46) is failing with h5pyd.

Example failures (from https://travis-ci.org/shoyer/h5netcdf/builds/422824485) on Python 3.6:

=================================== FAILURES ===================================
________ test_create_variable_matching_saved_dimension[hdf5://testfile] ________
tmp_local_or_remote_netcdf = 'http://hsdshdflab.hdfgroup.org/home/shoyer/h5pyd_test/3.6/testfileDLJFW.nc'
    def test_create_variable_matching_saved_dimension(tmp_local_or_remote_netcdf):
        h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    
        with h5netcdf.File(tmp_local_or_remote_netcdf) as f:
            f.dimensions['x'] = 2
            f.create_variable('y', data=[1, 2], dimensions=('x',))
    
        with h5.File(tmp_local_or_remote_netcdf) as f:
>           assert f['y'].dims[0].keys() == [NOT_A_VARIABLE.decode('ascii')]
E           AssertionError: assert [b'This is a ...DF variable.'] == ['This is a ne...DF variable.']
E             At index 0 diff: b'This is a netCDF dimension but not a netCDF variable.' != 'This is a netCDF dimension but not a netCDF variable.'
E             Full diff:
E             - [b'This is a netCDF dimension but not a netCDF variable.']
E             ?  -
E             + ['This is a netCDF dimension but not a netCDF variable.']
h5netcdf/tests/test_h5netcdf.py:584: AssertionError
=============== 1 failed, 42 passed, 2 skipped in 252.53 seconds ===============

and on Python 2.7:


=================================== FAILURES ===================================
________ test_create_variable_matching_saved_dimension[hdf5://testfile] ________
tmp_local_or_remote_netcdf = 'http://hsdshdflab.hdfgroup.org/home/shoyer/h5pyd_test/2.7/testfilePZDTT.nc'
    def test_create_variable_matching_saved_dimension(tmp_local_or_remote_netcdf):
        h5 = get_hdf5_module(tmp_local_or_remote_netcdf)
    
        with h5netcdf.File(tmp_local_or_remote_netcdf) as f:
            f.dimensions['x'] = 2
            f.create_variable('y', data=[1, 2], dimensions=('x',))
    
        with h5.File(tmp_local_or_remote_netcdf) as f:
            assert f['y'].dims[0].keys() == [NOT_A_VARIABLE.decode('ascii')]
    
        with h5netcdf.File(tmp_local_or_remote_netcdf) as f:
            f.create_variable('x', data=[0, 1], dimensions=('x',))
    
        with h5.File(tmp_local_or_remote_netcdf) as f:
>           assert f['y'].dims[0].keys() == ['x']
E           AssertionError: assert ['This is a n...DF variable.'] == ['x']
E             At index 0 diff: 'This is a netCDF dimension but not a netCDF variable.' != 'x'
E             Full diff:
E             - ['This is a netCDF dimension but not a netCDF variable.']
E             + ['x']
h5netcdf/tests/test_h5netcdf.py:590: AssertionError
=============== 1 failed, 42 passed, 2 skipped in 275.54 seconds ===============

The Python 3.6 failure is a minor unicode/bytes inconsistency that might not actually matter for users. I haven't looked into it enough to have an opinion on whether this is an h5py or h5pyd bug :).

The Python 2.7 failure looks more substantial: it suggests that h5pyd isn't detaching dimension scales properly.

For now, I'm going to mark this test as xfail.
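
For reference, a minimal sketch of such a marker; the reason string and placeholder body are assumptions, not the actual test code:

import pytest

@pytest.mark.xfail(reason="h5pyd does not detach dimension scales properly")
def test_create_variable_matching_saved_dimension():
    # Placeholder body standing in for the failing assertions above.
    raise AssertionError("dimension scale still attached")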

CC @ajelenak-thg @jreadey
