@scottstanie I've a fix for this issue in #181
from h5netcdf.
Yes, this looks great from my perspective, thank you!
For more info about "this used to work": with v0.13.0 installed, the code runs successfully, but starting with v0.15.0 it fails with the AttributeError.
@scottstanie Thanks for the report. Could you please add the respective xarray version?
I might not find time in the coming days to fully investigate, but it looks like an issue with the "phony_dims" kwarg.
By the way, have you tried using this kwarg with xarray.open_dataset?
Normally h5netcdf should raise an error if dimensions without a scale are detected.
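As a rough illustration of that behavior (a sketch assuming h5py and h5netcdf are installed; the file name is arbitrary): opening a plain HDF5 file whose datasets have no dimension scales, without the phony_dims kwarg, makes h5netcdf raise a ValueError rather than invent dimension names.

```python
import h5py
import h5netcdf
import numpy as np

# a plain HDF5 file whose dataset has no dimension scales attached
with h5py.File("noscale.h5", "w") as hf:
    hf["data"] = np.zeros((2, 3))

# without the phony_dims kwarg, h5netcdf refuses to name the
# scale-less dimensions and raises a ValueError instead
raised = False
try:
    with h5netcdf.File("noscale.h5", "r") as f:
        f.variables["data"].dimensions
except ValueError:
    raised = True
```

The error message itself suggests passing phony_dims="sort" or phony_dims="access".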
Do you mean passing the kwarg like this?
In [6]: ds = xr.open_dataset("test.h5", phony_dims="sort", engine="h5netcdf")
In [7]: ds
Out[7]:
<xarray.Dataset>
Dimensions: (phony_dim_0: 3, phony_dim_1: 3)
Dimensions without coordinates: phony_dim_0, phony_dim_1
Data variables:
data (phony_dim_0, phony_dim_1) float64 ...
In [8]: ds.to_netcdf("test.nc", engine="h5netcdf")
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 ds.to_netcdf("test.nc", engine="h5netcdf")
File ~/miniconda3/envs/mapping/lib/python3.10/site-packages/xarray/core/dataset.py:1901, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
1898 encoding = {}
1899 from ..backends.api import to_netcdf
-> 1901 return to_netcdf(
1902 self,
1903 path,
1904 mode,
1905 format=format,
1906 group=group,
1907 engine=engine,
1908 encoding=encoding,
1909 unlimited_dims=unlimited_dims,
1910 compute=compute,
1911 invalid_netcdf=invalid_netcdf,
1912 )
File ~/miniconda3/envs/mapping/lib/python3.10/site-packages/xarray/backends/api.py:1072, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
1067 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
1068 # to avoid this mess of conditionals
1069 try:
1070 # TODO: allow this work (setting up the file for writing array data)
1071 # to be parallelized with dask
-> 1072 dump_to_store(
1073 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
1074 )
1075 if autoclose:
1076 store.close()
File ~/miniconda3/envs/mapping/lib/python3.10/site-packages/xarray/backends/api.py:1119, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
1116 if encoder:
1117 variables, attrs = encoder(variables, attrs)
-> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
File ~/miniconda3/envs/mapping/lib/python3.10/site-packages/xarray/backends/common.py:264, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
261 variables, attributes = self.encode(variables, attributes)
263 self.set_attributes(attributes)
--> 264 self.set_dimensions(variables, unlimited_dims=unlimited_dims)
265 self.set_variables(
266 variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
267 )
File ~/miniconda3/envs/mapping/lib/python3.10/site-packages/xarray/backends/common.py:341, in AbstractWritableDataStore.set_dimensions(self, variables, unlimited_dims)
339 elif dim not in existing_dims:
340 is_unlimited = dim in unlimited_dims
--> 341 self.set_dimension(dim, length, is_unlimited)
File ~/miniconda3/envs/mapping/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py:262, in H5NetCDFStore.set_dimension(self, name, length, is_unlimited)
260 self.ds.resize_dimension(name, length)
261 else:
--> 262 self.ds.dimensions[name] = length
File ~/repos/h5netcdf/h5netcdf/dimensions.py:30, in Dimensions.__setitem__(self, name, size)
27 if name in self._objects:
28 raise ValueError("dimension %r already exists" % name)
---> 30 self._objects[name] = Dimension(self._group, name, size, create_h5ds=True)
File ~/repos/h5netcdf/h5netcdf/dimensions.py:80, in Dimension.__init__(self, parent, name, size, create_h5ds)
78 self._size = 0 if size is None else size
79 if self._phony:
---> 80 self._root._phony_dim_count += 1
81 else:
82 self._root._max_dim_id += 1
AttributeError: 'File' object has no attribute '_phony_dim_count'
Here's the version info:
>>> xarray.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:07:06) [Clang 13.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.3.0
pandas: 1.4.2
numpy: 1.22.4
scipy: 1.8.1
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.13.0
h5py: 3.6.0
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2022.6.0
distributed: 2022.6.0
matplotlib: 3.4.3
cartopy: 0.20.2
seaborn: None
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: None
sparse: None
setuptools: 62.6.0
pip: 22.1.2
conda: 4.13.0
pytest: 7.1.2
IPython: 8.4.0
sphinx: 5.0.2
Thanks! Yes, something along these lines. The complete stack trace would be useful. But I think you might need to wrap this kwarg in backend_kwargs.
It would be great if you could reproduce this without using xarray (does reading the file work with h5netcdf alone?). Thanks for your efforts.
(edited my previous comment to include the whole stack trace)
Using the corrected backend kwarg doesn't change the result:
In [1]: import numpy as np, h5py, xarray as xr
...:
...: with h5py.File("test.h5", "w") as hf:
...: dset = "data"
...: hf["data"] = np.random.rand(3, 3)
...:
...: with xr.open_dataset("test.h5", engine='h5netcdf', backend_kwargs={"phony_dims":"sort"}) as ds:
...: ds.to_netcdf("test.nc", engine="h5netcdf", invalid_netcdf=True)
# raises the same AttributeError as above
Using engine="netcdf4" does seem to avoid the error:
In [2]: import numpy as np, h5py, xarray as xr
...:
...: with h5py.File("test.h5", "w") as hf:
...: dset = "data"
...: hf["data"] = np.random.rand(3, 3)
...:
...: with xr.open_dataset("test.h5", engine='netcdf4') as ds:
...: ds.to_netcdf("test.nc", engine="netcdf4")
In [3]:
$ ncdump test.nc
netcdf test {
dimensions:
phony_dim_0 = 3 ;
phony_dim_1 = 3 ;
variables:
double data(phony_dim_0, phony_dim_1) ;
data:_FillValue = NaN ;
data:
data =
0.554398717043143, 0.720796883960318, 0.228745365018886,
0.542514784472168, 0.218146351090197, 0.156048073946609,
0.560945499800545, 0.857158648825899, 0.68198861443614 ;
}
I'm not immediately sure how to recreate this without xarray (as I've only ever interacted with h5netcdf via xarray), but I can try to look into it.
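For reference, a minimal sketch of reading the same kind of file with h5netcdf alone (no xarray), assuming h5py and h5netcdf are installed; phony_dims="sort" can be passed directly to h5netcdf.File:

```python
import h5py
import h5netcdf
import numpy as np

# recreate the test file from the snippets above
with h5py.File("test.h5", "w") as hf:
    hf["data"] = np.random.rand(3, 3)

# read it back with h5netcdf only; phony_dims="sort" names the
# scale-less dimensions phony_dim_0, phony_dim_1, ...
with h5netcdf.File("test.h5", "r", phony_dims="sort") as f:
    dims = tuple(f.variables["data"].dimensions)
    shape = f.variables["data"].shape

print(dims)   # ('phony_dim_0', 'phony_dim_1')
print(shape)  # (3, 3)
```

This matches the dimension names xarray reported above, so the reading side works on its own; the failure appears only when writing those phony-named dimensions back out.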
@scottstanie Thanks again for describing this in detail.
I've identified the issue. It's connected with the naming scheme (phony_dim_0 etc.). The bad things happen at initialization of the Dimension object, see:
h5netcdf/h5netcdf/dimensions.py, lines 73 to 86 at ca0ba1b
In line 74 we assume that any dimension with phony_dim in its name is a real phony dim (a dimension without scale). That holds true when reading real phony dims, but not when reading/writing normal dims which just happen to be named phony_dim. I'm afraid I need to think better next time, which is now.
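To make the false positive concrete, here is a condensed, hypothetical stand-in for that name check (not the actual h5netcdf code):

```python
def looks_like_phony(name: str) -> bool:
    # simplified, hypothetical stand-in for the check in dimensions.py:
    # treat any dimension whose name contains "phony_dim" as phony
    return "phony_dim" in name

# intended case: a genuinely scale-less dimension named by h5netcdf
print(looks_like_phony("phony_dim_0"))  # True

# false positive: a real dimension (with a scale) that merely carries
# the same name, e.g. after a read/write round trip through xarray,
# takes the same code path and hits attributes that were never set up
print(looks_like_phony("phony_dim_0"))  # also True -> wrong code path

# a normally named dimension is unaffected
print(looks_like_phony("x"))  # False
```

Since the name alone cannot distinguish the two cases, the fix has to consult the actual HDF5 metadata (whether a scale is attached) rather than the string.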
As a workaround for the writing case you can use:
with xr.open_dataset("test.h5") as ds:
ds = ds.rename_dims({"phony_dim_0": "x", "phony_dim_1": "y"})
ds.to_netcdf("test.nc", engine="h5netcdf")
Unfortunately, reading files with fake phony dims (meaning real dimensions with scale, but named phony_dim_N) is currently broken.
The renaming fix seems perfectly good to me for now, as I doubt it's a common situation to create something partially compliant in HDF5 and then convert it using xarray. Thanks for figuring that out!
Glad it helps you for the time being.
@scottstanie I'd appreciate it if you could test that PR with some of your use cases.
The thing is that, for the moment, h5netcdf does not allow mixed variables (e.g. one dimension with a scale and the other without). I tried to compare with netcdf-c/netcdf4-python to iron out any issues (with little outcome so far). I'll let this sit for some days and come back with fresh eyes.
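For context, a sketch of what such a mixed variable looks like at the h5py level (assuming h5py is installed; the file name is arbitrary): a dimension scale is attached to axis 0 only, leaving axis 1 scale-less.

```python
import h5py
import numpy as np

with h5py.File("mixed.h5", "w") as hf:
    # a coordinate dataset promoted to an HDF5 dimension scale
    hf["x"] = np.arange(3.0)
    hf["x"].make_scale("x")

    # a 2-D variable: axis 0 gets the scale, axis 1 gets none
    hf["data"] = np.random.rand(3, 4)
    hf["data"].dims[0].attach_scale(hf["x"])

# reopen and count attached scales per axis: (1, 0) = "mixed"
with h5py.File("mixed.h5", "r") as hf:
    n_scales = (len(hf["data"].dims[0]), len(hf["data"].dims[1]))

print(n_scales)  # (1, 0)
```

Files like this fall between the pure-HDF5 and pure-netCDF cases, which is why handling them cleanly in h5netcdf is awkward.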
Resolved by #181.