Comments (6)
Hi @jreadey, we're interested in using fancy indexing with HSDS datasets as well. Initially, we tried using the equivalent point selection as suggested in #48 as a workaround, but are finding that the performance is poor relative to a hyperslab selection on a superset of the point selection once you get past the scale of 10k points.
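The superset workaround mentioned above can be sketched roughly as follows. This is a minimal illustration, not h5pyd code: the helper name `read_points_via_superset` is hypothetical, and a NumPy array stands in for the remote dataset. It reads the bounding hyperslab in a single request and then applies the point selection to the local copy.

```python
import numpy as np

def read_points_via_superset(dset, rows, cols):
    """Read the bounding hyperslab once, then pick the points in memory.

    Hypothetical helper: `dset` is anything that supports 2-D slice reads.
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    r0, r1 = int(rows.min()), int(rows.max()) + 1
    c0, c1 = int(cols.min()), int(cols.max()) + 1
    block = dset[r0:r1, c0:c1]          # one contiguous (hyperslab) read
    return block[rows - r0, cols - c0]  # point selection on the local copy

# In-memory stand-in for a remote dataset (assumption for illustration)
dset = np.arange(200).reshape(10, 20)
rows = [4, 5, 7]
cols = [2, 9, 5]
points = read_points_via_superset(dset, rows, cols)
```

The trade-off is that the bounding block may transfer far more data than the points themselves, so this only pays off when the points are reasonably clustered.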
Just curious to see if there's any update on what it would take to implement fancy indexing at this point. I was looking through the RESTful HDF5 white paper and noticed that there is a section under the Dataset POST spec that mentions "set-theoretical combinations of hyperslabs" but a detailed example request is not given. It's also not mentioned in the h5serv documentation so I'm wondering if it made it into the final version of the spec.
from h5pyd.
Yes, I've been meaning to get to this...
I think a fairly simple extension to the dataset GET API should work (basically passing in the h5py index as parameters).
The h5py docs have a warning that performance could be sub-optimal, but I'd want to make a related HSDS update to do the fancy selection efficiently.
You are looking to handle this type of indexing:
dset[4:6, [2,5,9]]
correct?
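For reference, here is a small sketch of what that expression selects, using a NumPy array as a stand-in for the dataset (an assumption for illustration; the values are made up). A slice on the first axis combined with an index list on the second returns one column per list entry.

```python
import numpy as np

# In-memory stand-in for an HSDS dataset (assumption for illustration)
dset = np.arange(60).reshape(6, 10)

# Rows 4 and 5, columns 2, 5 and 9
fancy = dset[4:6, [2, 5, 9]]

# The same result built one column at a time
looped = np.stack([dset[4:6, c] for c in (2, 5, 9)], axis=1)
```

Both arrays have shape (2, 3): two rows from the slice and three columns from the index list.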
Also, I see that h5py recently added support for Multi-Block selection: https://docs.h5py.org/en/stable/high/dataset.html#multi-block-selection. Is that of interest as well?
from h5pyd.
That's right. We don't have a specific need for multi-block selection at the moment, just fancy indexing as you've described.
To clarify, is it possible to make a fancy indexing request (albeit less efficiently) via the dataset value GET API currently? Wondering if we could test this out with a modification of the h5pyd client or if an HSDS update would be required.
from h5pyd.
Yes, you can just do multiple regular selections.
E.g., instead of dset[4:6, [2,5,9]], do the following:
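import numpy as np
arr = np.zeros((2, 3))
# Copy each requested dataset column into the next output column
for j, col in enumerate((2, 5, 9)):
    arr[:, j] = dset[4:6, col]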
It's unfortunate that h5pyd doesn't support asynchronous requests, which would let you do all the fetches without waiting for responses, but this should work until we have the fancy selection going.
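One possible way to overlap the per-column fetches on the client side is a thread pool. The sketch below is only an illustration: `fetch_columns` is a hypothetical helper, a NumPy array stands in for the dataset, and it assumes the dataset object can safely be read from several threads, which this thread does not confirm for h5pyd.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def fetch_columns(dset, row_slice, cols, max_workers=4):
    """Fetch each column in its own thread and assemble the result.

    Hypothetical helper; assumes concurrent reads on `dset` are safe.
    """
    def fetch(col):
        return dset[row_slice, col]  # one regular selection per column

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        columns = list(pool.map(fetch, cols))  # results keep the order of cols
    return np.stack(columns, axis=1)

# In-memory stand-in for a remote dataset (assumption for illustration)
dset = np.arange(100).reshape(10, 10)
arr = fetch_columns(dset, slice(4, 6), (2, 5, 9))
```

With a remote dataset, each `fetch` call would be a separate request, so the benefit depends on how well the server handles the concurrent load.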
from h5pyd.
I believe I have this feature working now. Code changes are in the fancyindx branch of hsds and fancyindx branch of h5pyd.
I'll be doing some additional testing and evaluation and then merge into master later this week.
from h5pyd.
The feature is merged into master now and in PyPI (version 0.9.2).
Here's a simple performance test comparing using fancy indexing vs iterating through a set of columns: https://gist.github.com/jreadey/bd75c469559f03596bd2d274dfb5a315. In my testing using fancy indexing was ~8x faster (running with 4 HSDS nodes).
from h5pyd.