
Comments (12)

kmuehlbauer avatar kmuehlbauer commented on June 11, 2024 1

@scottstanie I've a fix for this issue in #181

from h5netcdf.

scottstanie avatar scottstanie commented on June 11, 2024 1

yes this looks great from my perspective, thank you!


scottstanie avatar scottstanie commented on June 11, 2024

For more info about "this used to work": with v0.13.0 installed, the code runs successfully, but starting with v0.15.0 it fails with the AttributeError.


kmuehlbauer avatar kmuehlbauer commented on June 11, 2024

@scottstanie Thanks for the report. Could you please add the respective xarray version?

I might not find time in the next few days to fully investigate, but it looks like an issue with the "phony_dims" kwarg.

Btw, have you tried to use this kwarg within xarray.open_dataset?

Normally h5netcdf should raise an error if dimensions without scale are detected.
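To make "dimensions without scale" concrete, here is a small sketch (file names are made up for illustration) contrasting a bare HDF5 dataset with one whose dimensions carry proper netCDF-style dimension scales, using h5py's standard `make_scale`/`attach_scale` API:

```python
import h5py
import numpy as np

# File A: a bare dataset with no dimension scales attached. h5netcdf
# treats its dimensions as "dimensions without scale" and, by default,
# refuses to read the file unless phony_dims is given.
with h5py.File("no_scales.h5", "w") as f:
    f["data"] = np.zeros((3, 3))

# File B: the same data with netCDF-style dimension scales attached.
with h5py.File("with_scales.h5", "w") as f:
    f["data"] = np.zeros((3, 3))
    f["x"] = np.arange(3)
    f["y"] = np.arange(3)
    f["x"].make_scale("x")
    f["y"].make_scale("y")
    f["data"].dims[0].attach_scale(f["x"])
    f["data"].dims[1].attach_scale(f["y"])

# len() of a dimension proxy counts the attached scales.
with h5py.File("no_scales.h5", "r") as f:
    print(len(f["data"].dims[0]))  # 0: no scale attached
with h5py.File("with_scales.h5", "r") as f:
    print(len(f["data"].dims[0]))  # 1: one scale attached
```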


scottstanie avatar scottstanie commented on June 11, 2024

Do you mean passing the kwarg like this?

In [6]: ds = xr.open_dataset("test.h5", phony_dims="sort", engine="h5netcdf")

In [7]: ds
Out[7]:
<xarray.Dataset>
Dimensions:  (phony_dim_0: 3, phony_dim_1: 3)
Dimensions without coordinates: phony_dim_0, phony_dim_1
Data variables:
    data     (phony_dim_0, phony_dim_1) float64 ...

In [8]: ds.to_netcdf("test.nc", engine="h5netcdf")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 ds.to_netcdf("test.nc", engine="h5netcdf")

File ~/miniconda3/envs/mapping/lib/python3.10/site-packages/xarray/core/dataset.py:1901, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1898     encoding = {}
   1899 from ..backends.api import to_netcdf
-> 1901 return to_netcdf(
   1902     self,
   1903     path,
   1904     mode,
   1905     format=format,
   1906     group=group,
   1907     engine=engine,
   1908     encoding=encoding,
   1909     unlimited_dims=unlimited_dims,
   1910     compute=compute,
   1911     invalid_netcdf=invalid_netcdf,
   1912 )

File ~/miniconda3/envs/mapping/lib/python3.10/site-packages/xarray/backends/api.py:1072, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1067 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
   1068 # to avoid this mess of conditionals
   1069 try:
   1070     # TODO: allow this work (setting up the file for writing array data)
   1071     # to be parallelized with dask
-> 1072     dump_to_store(
   1073         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1074     )
   1075     if autoclose:
   1076         store.close()

File ~/miniconda3/envs/mapping/lib/python3.10/site-packages/xarray/backends/api.py:1119, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1116 if encoder:
   1117     variables, attrs = encoder(variables, attrs)
-> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/miniconda3/envs/mapping/lib/python3.10/site-packages/xarray/backends/common.py:264, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    261 variables, attributes = self.encode(variables, attributes)
    263 self.set_attributes(attributes)
--> 264 self.set_dimensions(variables, unlimited_dims=unlimited_dims)
    265 self.set_variables(
    266     variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
    267 )

File ~/miniconda3/envs/mapping/lib/python3.10/site-packages/xarray/backends/common.py:341, in AbstractWritableDataStore.set_dimensions(self, variables, unlimited_dims)
    339 elif dim not in existing_dims:
    340     is_unlimited = dim in unlimited_dims
--> 341     self.set_dimension(dim, length, is_unlimited)

File ~/miniconda3/envs/mapping/lib/python3.10/site-packages/xarray/backends/h5netcdf_.py:262, in H5NetCDFStore.set_dimension(self, name, length, is_unlimited)
    260     self.ds.resize_dimension(name, length)
    261 else:
--> 262     self.ds.dimensions[name] = length

File ~/repos/h5netcdf/h5netcdf/dimensions.py:30, in Dimensions.__setitem__(self, name, size)
     27 if name in self._objects:
     28     raise ValueError("dimension %r already exists" % name)
---> 30 self._objects[name] = Dimension(self._group, name, size, create_h5ds=True)

File ~/repos/h5netcdf/h5netcdf/dimensions.py:80, in Dimension.__init__(self, parent, name, size, create_h5ds)
     78 self._size = 0 if size is None else size
     79 if self._phony:
---> 80     self._root._phony_dim_count += 1
     81 else:
     82     self._root._max_dim_id += 1

AttributeError: 'File' object has no attribute '_phony_dim_count'

Here's the version info:

>>> xarray.show_versions()
/Users/staniewi/miniconda3/envs/mapping/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None
python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:07:06) [Clang 13.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.3.0
pandas: 1.4.2
numpy: 1.22.4
scipy: 1.8.1
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.13.0
h5py: 3.6.0
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2022.6.0
distributed: 2022.6.0
matplotlib: 3.4.3
cartopy: 0.20.2
seaborn: None
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: None
sparse: None
setuptools: 62.6.0
pip: 22.1.2
conda: 4.13.0
pytest: 7.1.2
IPython: 8.4.0
sphinx: 5.0.2


kmuehlbauer avatar kmuehlbauer commented on June 11, 2024

Thanks! Yes, something along these lines. The complete stack trace would be useful. But I think you might need to wrap this kwarg in backend_kwargs.

It would be great if you could reproduce this without using xarray (does reading the file work with only h5netcdf?). Thanks for your efforts.


scottstanie avatar scottstanie commented on June 11, 2024

(edited my previous comment to include the whole stack trace)

Correcting that backend kwarg doesn't change the result:

In [1]: import numpy as np, h5py, xarray as xr
   ...:
   ...: with h5py.File("test.h5", "w") as hf:
   ...:     dset = "data"
   ...:     hf["data"] = np.random.rand(3, 3)
   ...:
   ...: with xr.open_dataset("test.h5", engine='h5netcdf', backend_kwargs={"phony_dims":"sort"}) as ds:
   ...:     ds.to_netcdf("test.nc", engine="h5netcdf", invalid_netcdf=True)
# raises the same AttributeError as above

It does seem like using engine="netcdf4" avoids the error:

In [2]: import numpy as np, h5py, xarray as xr
   ...:
   ...: with h5py.File("test.h5", "w") as hf:
   ...:     dset = "data"
   ...:     hf["data"] = np.random.rand(3, 3)
   ...:
   ...: with xr.open_dataset("test.h5", engine='netcdf4') as ds:
   ...:     ds.to_netcdf("test.nc", engine="netcdf4",)

In [3]: 
$ ncdump test.nc
netcdf test {
dimensions:
	phony_dim_0 = 3 ;
	phony_dim_1 = 3 ;
variables:
	double data(phony_dim_0, phony_dim_1) ;
		data :_FillValue = NaN ;
data:

 data =
  0.554398717043143, 0.720796883960318, 0.228745365018886,
  0.542514784472168, 0.218146351090197, 0.156048073946609,
  0.560945499800545, 0.857158648825899, 0.68198861443614 ;
}

I'm not immediately sure how to recreate this without xarray (as I've only ever interacted with h5netcdf via xarray), but I can try to look into it.
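For reference, a possible h5netcdf-only reproduction, sketched from the stack trace rather than taken from the issue: the `dimensions[name] = size` assignment below mirrors what the xarray backend does internally, so on the affected versions the first assignment should raise the same AttributeError. On versions with the fix, it completes without error.

```python
import h5py
import h5netcdf
import numpy as np

# Recreate the scale-less HDF5 file from the examples above.
with h5py.File("test.h5", "w") as hf:
    hf["data"] = np.random.rand(3, 3)

# Read it back with h5netcdf's legacy API; phony_dims="sort" names the
# scale-less dimensions phony_dim_0 and phony_dim_1.
with h5netcdf.File("test.h5", "r", phony_dims="sort") as src:
    var = src.variables["data"]
    dims, shape, data = var.dimensions, var.shape, var[...]

# Re-create dimensions with those phony names in a writable file; this
# exercises the same Dimensions.__setitem__ code path as the traceback.
with h5netcdf.File("copy.nc", "w") as dst:
    for name, size in zip(dims, shape):
        dst.dimensions[name] = size  # AttributeError on broken versions
    out = dst.create_variable("data", dims, data.dtype)
    out[...] = data
```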


kmuehlbauer avatar kmuehlbauer commented on June 11, 2024

@scottstanie Thanks again for describing in detail.

I've identified the issue. It's connected with the naming scheme (phony_dim_0 etc.). The bad things happen at initialization of the Dimension object, see below.

self._parent_ref = weakref.ref(parent)
self._phony = "phony_dim" in name
self._root_ref = weakref.ref(parent._root)
self._h5path = _join_h5paths(parent.name, name)
self._name = name
self._size = 0 if size is None else size
if self._phony:
    self._root._phony_dim_count += 1
else:
    self._root._max_dim_id += 1
    self._dimensionid = self._root._max_dim_id
if parent._root._writable and create_h5ds and not self._phony:
    self._create_scale()
self._initialized = True

In Line 74 we assume that any dimension with phony_dim in its name is a real phony dim (a dimension without scale). That holds true when reading real phony dims, but not when reading/writing normal dimensions which just happen to be named phony_dim. I'm afraid I need to think better next time, which is now.
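To illustrate why the substring test is too broad, here is a hypothetical stricter check (not the actual fix in #181): require the exact naming pattern and confirm the dimension really has no attached scale.

```python
import re

def looks_phony(name: str) -> bool:
    # Current behavior: substring test flags any name containing
    # "phony_dim", including real (scaled) dimensions reusing the name.
    return "phony_dim" in name

def is_phony(name: str, has_scale: bool) -> bool:
    # Hypothetical stricter check: exact phony_dim_N pattern AND no
    # attached HDF5 dimension scale.
    return bool(re.fullmatch(r"phony_dim_\d+", name)) and not has_scale

print(looks_phony("phony_dim_0"))          # True
print(looks_phony("my_phony_dim_extra"))   # True: false positive
print(is_phony("phony_dim_0", True))       # False: scaled dim reusing the name
print(is_phony("phony_dim_0", False))      # True: genuine phony dimension
```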

As a workaround for the writing case you can use:

with xr.open_dataset("test.h5") as ds:
    ds = ds.rename_dims({"phony_dim_0": "x", "phony_dim_1": "y"})
    ds.to_netcdf("test.nc", engine="h5netcdf")

Unfortunately, reading files with fake phony_dims (meaning real dimensions with scale, but named phony_dim_N) is currently broken.


scottstanie avatar scottstanie commented on June 11, 2024

The renaming fix seems perfectly good to me for now, as I doubt it's a common situation to create something partially compliant in HDF5 and then convert it using xarray. Thanks for figuring that out!


kmuehlbauer avatar kmuehlbauer commented on June 11, 2024

Glad it helps you for the time being.


kmuehlbauer avatar kmuehlbauer commented on June 11, 2024

@scottstanie I'd appreciate if you could test that PR with some of your use cases.

The thing is that, for the moment, h5netcdf does not allow mixed variables (e.g. one dimension with scale and the other without). I tried to compare with netcdf-c/netcdf4-python to iron out any issues (with little outcome so far). I'll let this sit for some days and start with fresh eyes then.
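For clarity, a "mixed" variable of the kind described can be sketched with h5py (file name made up for illustration): one dimension gets an attached scale, the other is left bare.

```python
import h5py
import numpy as np

# Hypothetical "mixed" variable: dimension 0 has an attached scale,
# dimension 1 does not.
with h5py.File("mixed.h5", "w") as f:
    f["data"] = np.zeros((3, 4))
    f["x"] = np.arange(3)
    f["x"].make_scale("x")
    f["data"].dims[0].attach_scale(f["x"])
    # nothing attached to f["data"].dims[1]

with h5py.File("mixed.h5", "r") as f:
    print(len(f["data"].dims[0]), len(f["data"].dims[1]))  # 1 0
```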


kmuehlbauer avatar kmuehlbauer commented on June 11, 2024

Resolved by #181.

