hdfgroup / h5pyd
h5py distributed - Python client library for the HDF REST API
License: Other
Use of session objects should improve performance by allowing re-use of SSL connections. See: http://docs.python-requests.org/en/master/user/advanced/#session-objects.
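A minimal sketch of the idea using the requests library (the helper name and structure here are illustrative, not h5pyd's actual internals):

import requests

# One shared Session lets requests keep the underlying TCP/SSL
# connection alive across calls instead of re-negotiating the
# handshake for every request.
session = requests.Session()

def get_json(endpoint, req):
    # hypothetical helper: route every GET through the shared session
    rsp = session.get(endpoint + req)
    rsp.raise_for_status()
    return rsp.json()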
In h5py, I use the file id to determine if a file is open or not. So defining this function:
def isopen(f):
    if f:
        print('File is open')
    else:
        print('File is closed')
I get the following with a local HDF5 file.
>>> a=h5.File('chopper.nxs')
>>> isopen(a)
File is open
>>> a.close()
>>> isopen(a)
File is closed
With h5pyd, I get the following:
>>> a=h5d.File('chopper.exfac', mode='r', endpoint='http://some.server:5000')
>>> isopen(a)
File is open
>>> a.close()
>>> isopen(a)
File is open
I can use the File's id.uuid property instead, since that is set to 0 when the file is closed, but the current behavior is not fully compatible with h5py.
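A workaround sketch based on the id.uuid behavior described above, until File truth-testing matches h5py:

def isopen_h5pyd(f):
    # h5pyd sets id.uuid to 0 on close, so test that rather than
    # truth-testing the File object itself
    if f.id.uuid:
        print('File is open')
    else:
        print('File is closed')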
Even something preliminary, such as a release candidate or an alpha/beta build, would be useful.
Reasons for:
In order to use h5netcdf as a netcdf4 interface on top of h5pyd, we first need dimension scales working, which will enable the shared dimensions in netcdf4.
fill_value dataset creation properties are not implemented.
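For reference, this is the h5py-style usage that would need to be honored (note that h5py spells the keyword fillvalue):

# in h5py the fill value is fixed at creation time and readable back
# via dset.fillvalue; h5pyd currently doesn't implement this property
dset = f.create_dataset("temps", (100, 100), dtype='f4', fillvalue=-9999.0)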
There seems to be mention of pip install as an installation mechanism, though I don't see the package on PyPI, even though there are tags on the repo. Can a release be made?
@jreadey, I think you were aware of this, since you mentioned coordinate-list selection (dset[(x,y,z),:]) not being supported yet (not sure if you actually meant list). But I consider it one of the most used selection types, so it's worth the effort to add.
ds_local[1,[1,3,5]]
Out[64]: array([ 0., 0., 0.], dtype=float32)
ds_remote[1,[1,3,5]]
Traceback (most recent call last):
  File "/home/wjiang2/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-65-bc569229dc91>", line 1, in <module>
    ds_remote[1,[1,3,5]]
  File "/home/wjiang2/.local/lib/python3.6/site-packages/h5pyd/_hl/dataset.py", line 796, in __getitem__
    raise ValueError("selection type not supported")
ValueError: selection type not supported
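Until coordinate-list selection is supported, a slow but functional workaround is one read per coordinate; a sketch:

import numpy as np

# emulate ds_remote[1, [1, 3, 5]] with one scalar read per point
vals = np.array([ds_remote[1, j] for j in (1, 3, 5)])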
Add support for ACL operations.
This is just a minor annoyance, but I thought I'd mention it anyway:
Why the Linux-style hsls, but the Windows-style hsdel?
Either Linux-like (hsls and hsrm) or Windows-like (hsdir and hsdel) would be more consistent...
@jreadey, you used hsload to put our Hurricane Sandy netcdf4 file on HSDS:
(IOOS) rsignell@0e6be50c3dc2:~$ hsls /home/john/sandy.nc/
john domain 2017-09-07 22:11:07 /home/john/sandy.nc
1 items
If I try to use hsget to get that dataset back, I get errors:
(IOOS) rsignell@0e6be50c3dc2:~$ hsget /home/john/sandy.nc sandy.nc
2017-10-14 14:00:39,424 ERROR: failed to create dataset: Scalar datasets don't support chunk/filter options
ERROR: failed to create dataset: Scalar datasets don't support chunk/filter options
2017-10-14 14:01:50,324 ERROR: failed to create dataset: Scalar datasets don't support chunk/filter options
And although I do end up with a sandy.nc file, if I try to ncdump it, it doesn't work (see below). I guess that is not too surprising in light of #32, right? But do you think one day we will be able to round-trip a dataset using hsload and hsget?
(IOOS) rsignell@0e6be50c3dc2:~$ ncdump -h sandy.nc
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 140414440146688:
  #000: H5L.c line 1183 in H5Literate(): link iteration failed
    major: Symbol table
    minor: Iteration failed
  #001: H5Gint.c line 844 in H5G_iterate(): error iterating over links
    major: Symbol table
    minor: Iteration failed
  #002: H5Gobj.c line 708 in H5G__obj_iterate(): can't iterate over symbol table
    major: Symbol table
    minor: Iteration failed
  #003: H5Gstab.c line 566 in H5G__stab_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
  #004: H5B.c line 1221 in H5B_iterate(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #005: H5B.c line 1177 in H5B_iterate_helper(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #006: H5Gnode.c line 1039 in H5G__node_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
[identical HDF5-DIAG error stack repeated]
ncdump: sandy.nc: NetCDF: HDF error
(IOOS) rsignell@0e6be50c3dc2:~$
I have an HDF5 file that contains several datasets containing boolean values, both scalar and arrays, along with many other datasets and groups. Trying to read these using h5pyd returns an Internal Server Error, which doesn't seem to happen with datasets of other types. Here is a trace:
>>> import h5pyd as h5
>>> a=h5.File('mullite_300K.mullite.exfac', mode='r', endpoint='http://some.server:5000')
>>> a['/f1/instrument/detector/pixel_mask_applied']
---------------------------------------------------------------------------
IOError Traceback (most recent call last)
<ipython-input-16-2a29cb1cdcaa> in <module>()
----> 1 a['/f1/instrument/detector/pixel_mask_applied']
/Users/rosborn/anaconda/envs/py27/lib/python2.7/site-packages/h5pyd/_hl/group.pyc in __getitem__(self, name)
314 if link_class == 'H5L_TYPE_HARD':
315 #print "hard link, collection:", link_json['collection']
--> 316 tgt = getObjByUuid(link_json['collection'], link_json['id'])
317 elif link_class == 'H5L_TYPE_SOFT':
318 h5path = link_json['h5path']
/Users/rosborn/anaconda/envs/py27/lib/python2.7/site-packages/h5pyd/_hl/group.pyc in getObjByUuid(collection_type, uuid)
287 elif link_json['collection'] == 'datasets':
288 req = "/datasets/" + uuid
--> 289 dataset_json = self.GET(req)
290 tgt = Dataset(DatasetID(self, dataset_json))
291 else:
/Users/rosborn/anaconda/envs/py27/lib/python2.7/site-packages/h5pyd/_hl/base.pyc in GET(self, req, format)
522
523 if rsp.status_code != 200:
--> 524 raise IOError(rsp.reason)
525 if rsp.headers['Content-Type'] == "application/octet-stream":
526 self.log.info("returning binary content, length: " +
IOError: Internal Server Error
I get the impression that gzip compression is not taking effect; I set compression when creating the dataset.
My file size has doubled after adding one entry that should be only about 1/20 the size of my entire file.
grpOut.create_dataset("C_PEAT",data=PEAT_emissions, compression="gzip")
This is the same method I used to create my entire file originally.
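One way to check whether the filter was actually applied, assuming h5pyd mirrors h5py's compression properties:

dset = grpOut["C_PEAT"]
# None for .compression would mean gzip was silently dropped; .chunks
# matters too, since filters only apply to chunked layouts
print(dset.compression, dset.compression_opts, dset.chunks)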
Create Travis test script.
Running pip install h5pyd installs version 0.3.3 instead of the current 0.4.0.
$ pip --no-cache-dir install h5pyd
Collecting h5pyd
Using cached https://files.pythonhosted.org/packages/4e/00/513f05db05e5dc3b599f541b042d8b47f9ec7c4ca62312f92fd33e11b607/h5pyd-0.3.3.tar.gz
Requirement already satisfied: ...
...
...
Building wheels for collected packages: h5pyd
Building wheel for h5pyd (setup.py) ... done
...
Successfully built h5pyd
Installing collected packages: h5pyd
Successfully installed h5pyd-0.3.3
Other useful information:
$ python --version
Python 3.6.8
$ pip --version
pip 19.1.1
$ pip search h5pyd
h5pyd (0.4.0) - h5py compatible client lib for HDF REST API
INSTALLED: 0.3.3
LATEST: 0.4.0
I know pip has had its issues over time, but this is a fairly fresh Python virtual environment with the latest pip.
Implement Obj refs.
For me, code like the following:
# specify a chunk layout
f.create_dataset("chunked_data", (1024,1024,1024), dtype='f4',chunks=(1,1024,1024))
dset = f["chunked_data"]
dset.chunks
is producing the following numpy TypeError:
TypeError Traceback (most recent call last)
<ipython-input-13-1e2d51a33867> in <module>()
1 # specify a chunk layout
----> 2 f.create_dataset("chunked_data", (1024,1024,1024), dtype='f4',chunks=(1,1024,1024))
3 dset = f["chunked_data"]
4 dset.chunks
~/src/h5pyd/h5pyd/_hl/group.py in create_dataset(self, name, shape, dtype, data, **kwds)
148
149 with phil:
--> 150 dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
151 dset = dataset.Dataset(dsid)
152
~/src/h5pyd/h5pyd/_hl/dataset.py in make_new_dset(parent, shape, dtype, data, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times)
116 tmp_shape = maxshape if maxshape is not None else shape
117 # Validate chunk shape
--> 118 if isinstance(chunks, tuple) and (-numpy.array([ i>=j for i,j in zip(tmp_shape,chunks) if i is not None])).any():
119 errmsg = "Chunk shape must not be greater than data shape in any dimension. "\
120 "{} is not compatible with {}".format(chunks, shape)
TypeError: The numpy boolean negative, the `-` operator, is not supported, use the `~` operator or the logical_not function instead.
I am running Python 3.6.1 and numpy 1.13.1 on Arch Linux inside a virtual environment.
Use of the unary '-' operator on boolean arrays has apparently been debated; either way, it looks like numpy is moving away from supporting it.
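The fix is what the error message suggests: invert the boolean array with ~ rather than unary minus. A sketch of the corrected check, with the function's context variables filled in from the example above:

import numpy

tmp_shape, chunks, shape = (1024, 1024, 1024), (1, 1024, 1024), (1024, 1024, 1024)
# ~ is the supported element-wise NOT for boolean arrays
if isinstance(chunks, tuple) and (~numpy.array(
        [i >= j for i, j in zip(tmp_shape, chunks) if i is not None])).any():
    raise ValueError("Chunk shape must not be greater than data shape in any "
                     "dimension. {} is not compatible with {}".format(chunks, shape))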
Working with the tall data distributed with h5pyd, a simple selection generates an "invalid point argument" error; below, the same operation succeeds with a local HDF5 resource. Is there a reference on essential discrepancies between the two approaches?
%vjcair> python
Python 2.7.12 (default, Nov 17 2016, 17:26:31)
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import h5pyd as h5py
>>> f = h5py.File("tall.data.hdfgroup.org", "r", endpoint="https://data.hdfgroup.org:7258")
>>> g2 = f['g2']
>>> dset22 = g2['dset2.2']
>>> dset22[[1,2]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/h5pyd-0.1.0-py2.7.egg/h5pyd/_hl/dataset.py", line 664, in __getitem__
    raise ValueError("invalid point argument")
ValueError: invalid point argument
>>> h5py.version.version
'0.0.1'
%vjcair> python
Python 2.7.12 (default, Nov 17 2016, 17:26:31)
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import h5py
>>> f = h5py.File("tall.h5")
>>> g2 = f['g2']
>>> dset22 = g2['dset2.2']
>>> dset22[[1,2]]
array([[ 0.        ,  0.2       ,  0.40000001,  0.60000002,  0.80000001],
       [ 0.        ,  0.30000001,  0.60000002,  0.89999998,  1.20000005]], dtype=float32)
>>> h5py.version.version
'2.7.0'
>>> import h5pyd as h5py
>>> a = h5py.File('f3.hdfgroup.org',mode='r',endpoint='http://localhost:5001')
>>> a['/raw']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/h5pyd/_hl/group.py", line 397, in __getitem__
    parent_uuid, link_json = self.get_link_json(name)
  File "/usr/local/lib/python2.7/dist-packages/h5pyd/_hl/group.py", line 303, in get_link_json
    raise KeyError("Unable to open object (Component not found)")
KeyError: 'Unable to open object (Component not found)'
...Am I missing something? I looked at read_example.py in the examples folder of this repo, and it seemed to me it should work like the above.
When I try to get the items that are in there, I get:
>>> a.items()
[(u'f3', <HDF5 group "/" (1 members)>)]
but both of the chains below seem to loop infinitely:
>>> a.items()
[(u'f3', <HDF5 group "/" (1 members)>)]
>>> a['f3'].items()
[(u'f3', <HDF5 group "/" (1 members)>)]
>>> a['f3']['f3'].items()
[(u'f3', <HDF5 group "/" (1 members)>)]
>>> a['f3']['f3']['f3'].items()
[(u'f3', <HDF5 group "/" (1 members)>)]
>>> a['f3']['f3']['f3']['f3'].items()
[(u'f3', <HDF5 group "/" (1 members)>)]
>>> a.items()
[(u'f3', <HDF5 group "/" (1 members)>)]
>>> a['/'].items()
[(u'f3', <HDF5 group "/" (1 members)>)]
>>> a['/']['/'].items()
[(u'f3', <HDF5 group "/" (1 members)>)]
>>> a['/']['/']['/'].items()
[(u'f3', <HDF5 group "/" (1 members)>)]
>>> a['/']['/']['/']['/'].items()
[(u'f3', <HDF5 group "/" (1 members)>)]
>>>
It worked OK for the smaller region size.
(xstart, xend)
Out[155]: (15986, 25986)
(ystart, yend)
Out[156]: (59448, 69448)
vals3 = ds_remote[xstart:xend, ystart:yend]
Traceback (most recent call last):
  File "/home/wjiang2/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-153-2ab7845b65c0>", line 1, in <module>
    vals3 = ds_remote[xstart:xend, ystart:yend]
  File "/home/wjiang2/.local/lib/python3.6/site-packages/h5pyd/_hl/dataset.py", line 759, in __getitem__
    page_arr = numpy.reshape(arr1d, page_mshape)
  File "/app/python3/3.6.0/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 232, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
  File "/app/python3/3.6.0/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
ValueError: cannot reshape array of size 10498 into shape (10000,10000)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/wjiang2/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2850, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/wjiang2/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2927, in run_code
    self.showtraceback(running_compiled_code=True)
TypeError: showtraceback() got an unexpected keyword argument 'running_compiled_code'
We have an h5serv server running, and loading regular HDF5 files works well. However, if the file contains an external link, it cannot access the external file because the file path is not converted to a valid domain.
Here's some output when accessing a file at path mullite/mullite_300K.nxs (relative to the h5serv datapath), with the root domain name exfac (the config file sets the file extension to .nxs on this server):
>>> b=h5pyd.File('mullite_300K.mullite.exfac', mode='r', endpoint='http://some.server:5000')
>>> c=b['/entry/transform/v']
KeyErrorTraceback (most recent call last)
<ipython-input-5-610e75ab34f0> in <module>()
----> 1 c=b['/entry/transform/v']
/Users/rosborn/anaconda/envs/py27/lib/python2.7/site-packages/h5pyd/_hl/group.pyc in __getitem__(self, name)
327 except IOError:
328 # unable to find external link
--> 329 raise KeyError("Unable to open file: " + link_json['h5domain'])
330 return f[link_json['h5path']]
331
KeyError: u'Unable to open file: 300K/transform.nxs'
Presumably, h5pyd should convert the external file path to a valid domain string. In this case, the file path is relative to the parent HDF5 file - I'm not sure what a correct domain name would be if the file path was absolute.
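A sketch of the kind of conversion h5pyd might perform for relative links; the helper name and the reversed-component domain convention (inferred from how mullite/mullite_300K.nxs maps to mullite_300K.mullite.exfac) are assumptions:

import posixpath

def filepath_to_domain(h5path, parent_dir, root_domain="exfac"):
    # resolve the link path relative to the parent file's directory,
    # drop the extension, then reverse the components into a domain
    full = posixpath.normpath(posixpath.join(parent_dir, h5path))
    stem = posixpath.splitext(full)[0]
    parts = [p for p in stem.split('/') if p]
    return '.'.join(reversed(parts)) + '.' + root_domain

# filepath_to_domain('300K/transform.nxs', 'mullite')
#   -> 'transform.300K.mullite.exfac'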
If I create a dataset with create_dataset, write to it, and then call create_dataset again with the same name, then (a) no error occurs and (b) it seems like the old dataset is moved to /__db__/{datasets}/. Behavior (a) differs from h5py, which I think throws a KeyError when a dataset already exists with the name given to create_dataset.
To me (b) seems like a "dataset leak". Is this a feature or a bug?
I'd expect one of two behaviors: either an error on the second call (as in h5py), or replacement of the old dataset without leaving a copy behind.
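For comparison, a sketch of the h5py side, where the second create raises and require_dataset is the idiomatic "create or reuse" call:

import h5py

with h5py.File("test.h5", "w") as f:
    f.create_dataset("x", (10,), dtype='f4')
    # a second f.create_dataset("x", ...) raises here instead of
    # silently replacing the dataset
    dset = f.require_dataset("x", (10,), dtype='f4')  # reuse, no error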
I upgraded to the latest h5pyd, and now the fillvalues in my dataset are not working:
https://gist.github.com/rsignell-usgs/c3555fd60c391699197d53dcd0cb007c
The fill values in the original netcdf4 file were 1.0e37, but after writing with hsload, the new h5pyd thinks the fillvalue is 0.0.
This same notebook worked at the ESIP summer meeting.
Dear John,
I have encountered an error after pulling the latest code.
File "/usr/local/lib/python2.7/dist-packages/h5pyd-0.2.6-py2.7.egg/h5pyd/_hl/files.py", line 161, in init
raise IOError(rsp.status_code, rsp.reason)
IOError: [Errno 403] Forbidden
When I reverted to commit b03d59b, there was no problem.
I'm not sure what exactly changed; I don't see any changes to line 161 of _hl/files.py.
Would you be able to give me a clue or suggestion to resolve this?
Thanks
Ken
My file does not yet have a group called 2015, and I am not able to create a dataset inside it. In h5py, I was able to do so without first creating the group.
fileOut.create_dataset('2015/newdset',data=newdata,compression='gzip')
File "...\h5pyd\_hl\group.py", line 148, in create_dataset
self[name] = dset
File "...\h5pyd\_hl\group.py", line 420, in __setitem__
self.PUT(req, body=body)
File "...\h5pyd\_hl\base.py", line 418, in PUT
raise IOError(rsp.reason)
IOError: Bad Request: invalid linkname, '/' not allowed
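A workaround until h5pyd auto-creates intermediate groups, assuming it supports require_group as h5py does:

# create (or open) the intermediate group explicitly, then create the
# dataset with a link name that contains no '/'
grp = fileOut.require_group('2015')
grp.create_dataset('newdset', data=newdata, compression='gzip')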
I ran the following code on my file. The new group gets added; however, all of the old data in the file disappears. The file size is now 8 KB, down from the previous 50 MB, so it definitely got wiped.
fileOut = h5py.File("My_File.hdfgroup.org","w")
fileOut.create_group('2015')
This is a file that previously had ACL authentication, which I removed. Before removing it I got an IOError and no changes were made to the file.
You can't compare tuples with is, since a new tuple object with a new id is created every time you construct one.
>>> (Ellipsis,) is (Ellipsis,)
False
>>> (Ellipsis,) == (Ellipsis,)
True
The error is here in the code: line 581 in commit 8f92ac1.
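The fix is presumably to switch from identity to equality at that line, along these lines (the variable name args is assumed):

args = (Ellipsis,)                 # hypothetical selection argument
# before: `args is (Ellipsis,)` -- always False, new tuple each time
# after: structural comparison does what was intended
if args == (Ellipsis,):
    print("full selection")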
Implement regionrefs
Try dask on h5pyd instead of h5py to see if there are issues.
I'm trying to add h5pyd to a project's requirements.txt; however, when I run pip install -r requirements.txt, I get the following:
Obtaining h5pyd from git+http://github.com/HDFGroup/h5pyd.git@v0.3.0#egg=h5pyd (from -r requirements.txt (line 61))
Updating ./env/src/h5pyd clone (to v0.3.0)
Complete output from command python setup.py egg_info:
Download error on https://pypi.python.org/simple/pkgconfig/: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:590) -- Some packages may not be found!
Couldn't find index page for 'pkgconfig' (maybe misspelled?)
Download error on https://pypi.python.org/simple/: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:590) -- Some packages may not be found!
No local packages or working download links found for pkgconfig
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/nlaws/projects/reopt/webtool/reopt_api/env/src/h5pyd/setup.py", line 43, in <module>
    'hsconfigure = h5pyd._apps.hsconfigure:main']
  File "/Users/nlaws/projects/reopt/webtool/reopt_api/env/lib/python2.7/site-packages/setuptools/__init__.py", line 128, in setup
    _install_setup_requires(attrs)
  File "/Users/nlaws/projects/reopt/webtool/reopt_api/env/lib/python2.7/site-packages/setuptools/__init__.py", line 123, in _install_setup_requires
    dist.fetch_build_eggs(dist.setup_requires)
  File "/Users/nlaws/projects/reopt/webtool/reopt_api/env/lib/python2.7/site-packages/setuptools/dist.py", line 455, in fetch_build_eggs
    replace_conflicting=True,
  File "/Users/nlaws/projects/reopt/webtool/reopt_api/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 866, in resolve
    replace_conflicting=replace_conflicting
  File "/Users/nlaws/projects/reopt/webtool/reopt_api/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1146, in best_match
    return self.obtain(req, installer)
  File "/Users/nlaws/projects/reopt/webtool/reopt_api/env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1158, in obtain
    return installer(requirement)
  File "/Users/nlaws/projects/reopt/webtool/reopt_api/env/lib/python2.7/site-packages/setuptools/dist.py", line 522, in fetch_build_egg
    return cmd.easy_install(req)
  File "/Users/nlaws/projects/reopt/webtool/reopt_api/env/lib/python2.7/site-packages/setuptools/command/easy_install.py", line 667, in easy_install
    raise DistutilsError(msg)
distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('pkgconfig')
Does the version of pkgconfig need to be specified in setup.py (for the setup_requires arg)? If so, which version?
An array of byte strings fails to load into h5pyd:
time_index = pd.date_range('2016-01-01 00:30:00', '2016-12-31 23:30:00', freq='h')
time_index = np.array(time_index.astype(str), dtype='S20')
Loading using the data param of create_dataset:
with h5pyd.File('/home/mrossol/nsrdb_tmy.h5', 'w') as f:
    f.create_dataset('time_index', time_index.shape, dtype=time_index.dtype, data=time_index)
Produces the following error:
/anaconda/lib/python3.6/json/encoder.py in default(self, o)
    178         """
    179         raise TypeError("Object of type '%s' is not JSON serializable" %
--> 180                         o.__class__.__name__)
    181
    182     def encode(self, o):
TypeError: Object of type 'bytes' is not JSON serializable
If you create the dataset and then load the array it works:
with h5pyd.File('/home/mrossol/nsrdb_tmy.h5', 'w') as f:
    t_index = f.create_dataset('time_index', time_index.shape, dtype=time_index.dtype)
    t_index[...] = time_index
In this notebook:
https://gist.github.com/rsignell-usgs/07143a5ab54afb8ad6eb1af255d025c9
we use xarray to open a local netcdf4 file and then the same dataset that was hsload-ed to HSDS.
xarray automatically recognizes the CF-compliant time units and converts the time coordinate to datetime, so the plot is correctly labeled in cell [6]. But time is not recognized for the HSDS dataset plot in cell [5].
Any idea what the problem is?
v0,2.8 should be renamed v0.2.8
f_remote = h5pyd.File(domain, "r")
ds_remote = f_remote["/data"]
ds_remote is still writable; I've verified this by calling f_remote.close() and reopening the file.
If I load a two-dimensional slab from a three-dimensional array, I get a numpy array with ndim=2 in h5py, but ndim=3 in h5pyd, with one of the dimensions of size 1.
The following accesses the same file, both on the remote server and stored locally:
>>> import h5pyd as h5d
>>> a=h5d.File('mullite_300K.mullite.exfac', mode='r', endpoint='http://some.server:5000')
>>> a['/entry/transform/v'][400].shape
(1, 901, 901)
>>> import h5py as h5
>>> b=h5.File('mullite/mullite_300K.nxs')
>>> b['/entry/transform/v'][400].shape
(901, 901)
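Until h5pyd matches h5py here, the spurious length-1 axis can be dropped client-side; a sketch:

import numpy as np

arr = np.asarray(a['/entry/transform/v'][400])
# drop only the leading axis, so genuine size-1 data dimensions survive
if arr.ndim == 3 and arr.shape[0] == 1:
    arr = arr[0]          # now (901, 901), matching h5py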
My set-up is:
Windows 10 with "Bash on Windows" (WSL), Ubuntu Xenial.
HDF Server (h5serv) running on WSL on default port 5000, exposing "Novartis" dataset.
HDF Server is accessible from Windows browser:
http://localhost:5000/
{"root": "ddfa84c2-d5bc-11e7-bcec-d43d7e31e165", "lastModified": "2017-11-30T10:54:41Z", "created": "2017-11-30T10:54:41Z", "hrefs": [{"href": "http://localhost:5000/", "rel": "self"}, {"href": "http://localhost:5000/datasets", "rel": "database"}, {"href": "http://localhost:5000/groups", "rel": "groupbase"}, {"href": "http://localhost:5000/datatypes", "rel": "typebase"}, {"href": "http://localhost:5000/groups/ddfa84c2-d5bc-11e7-bcec-d43d7e31e165", "rel": "root"}]}
HDF Server is accessible via h5pyd from WSL:
>>> f = h5pyd.File('', 'r')
>>> print(list(f))
['Novartis']
File "C:\Program Files\Python 3.5\lib\site-packages\h5pyd\_hl\files.py", line 161, in __init__
raise IOError(rsp.status_code, rsp.reason)
OSError: [Errno 503] Service Unavailable
Any pointers how to debug and eliminate the issue are appreciated!
Link for "Reporting Issues" on Readme is invalid
Some firewall software will alter the Host header in requests sent to the server, causing the operation to fail.
I am trying to access a dataset which contains an enum array via h5serv; however, h5pyd throws the following exception:
  File "$HOME/project/venv/lib/python2.7/site-packages/h5pyd-0.2.6-py2.7.egg/h5pyd/_hl/group.py", line 335, in __getitem__
    tgt = getObjByUuid(link_json['collection'], link_json['id'])
  File "$HOME/project/venv/lib/python2.7/site-packages/h5pyd-0.2.6-py2.7.egg/h5pyd/_hl/group.py", line 311, in getObjByUuid
    tgt = Dataset(DatasetID(self, dataset_json))
  File "$HOME/project/venv/lib/python2.7/site-packages/h5pyd-0.2.6-py2.7.egg/h5pyd/_hl/dataset.py", line 416, in __init__
    self._dtype = createDataType(self.id.type_json)
  File "$HOME/project/venv/lib/python2.7/site-packages/h5pyd-0.2.6-py2.7.egg/h5pyd/_hl/h5type.py", line 725, in createDataType
    dt = createDataType(field['type'])  # recursive call
  File "$HOME/project/venv/lib/python2.7/site-packages/h5pyd-0.2.6-py2.7.egg/h5pyd/_hl/h5type.py", line 732, in createDataType
    dtRet = createBaseDataType(typeItem)  # create non-compound dt
  File "$HOME/project/venv/lib/python2.7/site-packages/h5pyd-0.2.6-py2.7.egg/h5pyd/_hl/h5type.py", line 638, in createBaseDataType
    raise TypeError("Array Type base type must be integer, float, or string")
TypeError: Array Type base type must be integer, float, or string
We can create a minimal dataset to reproduce the error using h5py as follows:
import h5py
import numpy as np
f = h5py.File('test.h5', 'w')
enum_type = h5py.special_dtype(enum=('i', {"FOO": 0, "BAR": 1, "BAZ": 2}))
comp_type = np.dtype([('my_enum_array', enum_type, 10), ('my_int', 'i'), ('my_string', np.str_, 32)])
dataset = f.create_dataset("test", (4,), comp_type)
f.close()
We then put it in h5serv's data directory and try to access it:
import h5pyd
f = h5pyd.File("test.hdfgroup.org", endpoint="http://127.0.0.1:5000")
print(f['test'])
This yields the above exception. Note that we are able to access the dataset as expected using regular h5py.
Applying the following patch to h5pyd prevents the exception and returns a dataset; however, it doesn't seem to give the correct behavior (the enum array appears to be treated as a plain int array):
diff --git a/h5pyd/_hl/h5type.py b/h5pyd/_hl/h5type.py
index 4ce6cb4..10ce562 100644
--- a/h5pyd/_hl/h5type.py
+++ b/h5pyd/_hl/h5type.py
@@ -637 +637 @@ def createBaseDataType(typeItem):
- if arrayBaseType["class"] not in ('H5T_INTEGER', 'H5T_FLOAT', 'H5T_STRING'):
+ if arrayBaseType["class"] not in ('H5T_INTEGER', 'H5T_FLOAT', 'H5T_STRING', 'H5T_ENUM'):
I'm not sure how to properly work around this. Thanks in advance for your advice.
In the nexusformat API, we load the entire HDF5 file tree by recursively walking through the groups in h5py, without reading in data values except for scalars and small arrays. On a local file, we can load files containing hundreds of objects without a significant time delay. For example, a file with 80 objects (groups, datasets, and attributes) takes 0.05s to load on my laptop. However, on h5pyd, the same load takes over 20s.
A call to load all the items in an HDF5 group requires two GET requests, and sometimes three, for each object, so there could be an improvement if all the metadata (shape, dtype, etc.) for each object were returned in a single call, and an even more significant one if all the items in a group could be returned with one GET request. Loading one group of 10 objects took 29 requests in my tests.
Binary data reads are fast, though.
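For reference, the access pattern is essentially the following (a sketch; the real nexusformat loader differs in detail), and each .items() and .attrs access translates into separate GET requests under h5pyd:

def load_tree(group):
    # recursively collect metadata only: shapes, dtypes, attributes
    tree = {}
    for name, obj in group.items():
        attrs = dict(obj.attrs)
        if hasattr(obj, 'shape'):            # dataset
            tree[name] = (obj.shape, obj.dtype, attrs)
        else:                                # group: recurse
            tree[name] = (load_tree(obj), attrs)
    return tree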
Pull endpoint, user, password, etc. from environment variables or a config file if not set explicitly.
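A sketch of the intended lookup order; the HS_* variable names and the ~/.hscfg key=value file format are assumptions here:

import os

def get_setting(name, explicit=None, cfgfile="~/.hscfg"):
    # precedence: explicit argument > environment > config file
    if explicit is not None:
        return explicit
    if name.upper() in os.environ:           # e.g. HS_ENDPOINT
        return os.environ[name.upper()]
    path = os.path.expanduser(cfgfile)
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                if '=' in line and not line.lstrip().startswith('#'):
                    key, value = line.split('=', 1)
                    if key.strip() == name:
                        return value.strip()
    return None

# endpoint = get_setting("hs_endpoint")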
Add package to the cheeseshop (PyPI)!
h5netcdf is a pythonic interface to netcdf4 files using h5py.
It would be super cool to try h5netcdf on top of h5pyd instead.
If that worked, we could try xarray with dask on top of h5pyd.
And if that worked, it would be amazing...
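A first experiment might be as simple as swapping the module h5netcdf binds to; a sketch that assumes h5netcdf reaches h5py only through a module-level import in h5netcdf.core:

import h5pyd
import h5netcdf
import h5netcdf.core

# point h5netcdf at the REST-backed File/Group/Dataset classes; this
# only works if h5netcdf never touches h5py's low-level (h5*) API
h5netcdf.core.h5py = h5pyd

f = h5netcdf.File("tall.data.hdfgroup.org", "r")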
According to the server's RESTful API spec, the parameter 'host' is needed, but in h5pyd the parameter seems to have been changed to 'domain'. It does not work.
Hi
I am trying to get h5pyd up and running on the h5serv docker image available here:
https://hub.docker.com/r/hdfgroup/h5serv/
Running the pip install command as documented gives the following:
# pip install h5pyd
Collecting h5pyd
Could not find a version that satisfies the requirement h5pyd (from versions: )
No matching distribution found for h5pyd
This looks like it's trying to use versions from a local requirements.txt file, but it does not exist. Not quite sure whether this is a pip or h5pyd issue.
Thanks
coords[1:3]
Out[100]: [(441, 82852), (441, 88209)]
len(coords)
Out[101]: 2500
data = ds_remote[coords]
Traceback (most recent call last):
  File "/home/wjiang2/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-99-6ac77acf88d2>", line 1, in <module>
    data = ds_remote[coords]
  File "/home/wjiang2/.local/lib/python3.6/site-packages/h5pyd/_hl/dataset.py", line 848, in __getitem__
    rsp = self.POST(req, body=body)
  File "/home/wjiang2/.local/lib/python3.6/site-packages/h5pyd/_hl/base.py", line 477, in POST
    raise IOError(rsp.reason)
OSError: Request Entity Too Large
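A client-side workaround is to split the point selection into batches small enough to stay under the server's request-size limit; a sketch:

import numpy as np

def read_points(dset, coords, batch=500):
    # several small point-selection requests instead of one 2500-point
    # request that triggers "Request Entity Too Large"
    parts = [dset[coords[i:i + batch]] for i in range(0, len(coords), batch)]
    return np.concatenate(parts)

# data = read_points(ds_remote, coords)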
As requested here by @mrocklin: pangeo-data/pangeo#75 (comment)
hsload isn't inspecting chunks prior to writing them to the server. This results in the server needlessly allocating chunks and in increased file size.
hsload should inspect each chunk and skip the write if the chunk is all zeros (or whatever the fill value is), as sketched below.
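The per-chunk test itself is cheap with numpy; a sketch of the check hsload could apply before each write:

import numpy as np

def chunk_needs_write(chunk, fillvalue):
    # skip chunks where every element equals the fill value (default 0),
    # so the server never allocates storage for them
    if fillvalue is None:
        fillvalue = 0
    return bool(np.any(chunk != fillvalue))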
Create testall.py script.
HTTPS endpoints fail with SSLError, e.g.:
requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:600)