
Fancy indexing is not supported (h5pyd issue, closed)

hdfgroup commented on September 26, 2024
Fancy indexing is not supported

from h5pyd.

Comments (6)

jananzhu commented on September 26, 2024

Hi @jreadey, we're interested in using fancy indexing with HSDS datasets as well. Initially, we tried using the equivalent point selection as suggested in #48 as a workaround, but are finding that the performance is poor relative to a hyperslab selection on a superset of the point selection once you get past the scale of 10k points.
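For readers following along, the "hyperslab selection on a superset" approach mentioned above can be sketched roughly as follows: read one contiguous block that covers all the wanted columns, then pick the columns out in memory. This is only an illustration; the helper name read_columns_via_hyperslab is hypothetical, and a NumPy array stands in for the HSDS dataset so the snippet runs without a server.

```python
import numpy as np

def read_columns_via_hyperslab(dset, row_slice, cols):
    """Read one contiguous block covering `cols`, then select locally.

    `dset` is anything that supports NumPy-style slicing; an h5pyd or
    h5py dataset is assumed to behave the same way for plain slices.
    """
    cols = np.asarray(cols)
    lo, hi = int(cols.min()), int(cols.max()) + 1
    block = dset[row_slice, lo:hi]   # single hyperslab read
    return block[:, cols - lo]       # fancy indexing on the in-memory array

# Stand-in for a remote dataset (assumption: real code would open one with h5pyd).
dset = np.arange(200).reshape(10, 20)
arr = read_columns_via_hyperslab(dset, slice(4, 6), [2, 5, 9])
```

The trade-off is extra data transferred (columns 2 through 9 instead of just three), which pays off only when the wanted columns are reasonably close together.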

Just curious to see if there's any update on what it would take to implement fancy indexing at this point. I was looking through the RESTful HDF5 white paper and noticed that there is a section under the Dataset POST spec that mentions "set-theoretical combinations of hyperslabs" but a detailed example request is not given. It's also not mentioned in the h5serv documentation so I'm wondering if it made it into the final version of the spec.


jreadey commented on September 26, 2024

Yes, I've been meaning to get to this...
I think a fairly simple extension to the dataset GET api should work (basically passing in the h5py index as parameters).
The h5py docs have a warning that performance could be sub-optimal, but I'd want to make a related HSDS update to do the fancy selection efficiently.

You are looking to handle this type of indexing:

dset[4:6, [2,5,9]]

correct?

Also, I see that h5py recently added support for Multi-Block selection: https://docs.h5py.org/en/stable/high/dataset.html#multi-block-selection. Is that of interest as well?
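As a rough illustration of what a multi-block selection covers along one axis: the parameter names below mirror h5py's MultiBlockSlice (start, count, stride, block) as I understand the linked docs, but the helper multi_block_indices is hypothetical and only computes the selected indices in plain Python. It is not how h5py implements the selection.

```python
def multi_block_indices(start, count, stride, block):
    """Indices covered by `count` blocks of `block` elements each,
    with consecutive blocks starting `stride` elements apart."""
    return [start + i * stride + j
            for i in range(count)
            for j in range(block)]

indices = multi_block_indices(start=1, count=3, stride=4, block=2)
print(indices)  # [1, 2, 5, 6, 9, 10]
```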


jananzhu commented on September 26, 2024

That's right. We don't have a specific need for multi-block selection at the moment, just fancy indexing as you've described.

To clarify, is it possible to make a fancy indexing request (albeit less efficiently) via the dataset value GET API currently? Wondering if we could test this out with a modification of the h5pyd client or if an HSDS update would be required.


jreadey commented on September 26, 2024

Yes, you can just do multiple regular selections.
E.g., instead of dset[4:6, [2,5,9]], do the following:

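import numpy as np
cols = (2, 5, 9)
arr = np.zeros((2, len(cols)))
for j, col in enumerate(cols):
  arr[:, j] = dset[4:6, col]  # one request per column; j is the output position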

It's unfortunate that h5pyd doesn't support asynchronous requests which would let you do all the fetches without waiting for responses, but this should work till we have the fancy selection going.
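One way to approximate "all the fetches without waiting" from the client side would be a thread pool, sketched below. This is an assumption-heavy illustration: nothing in this thread confirms that an h5pyd dataset can safely be shared across threads, so a NumPy array stands in for the dataset, and fetch_columns_concurrently is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def fetch_columns_concurrently(dset, row_slice, cols, max_workers=4):
    """Issue one regular selection per column from a pool of threads."""
    def fetch(col):
        return dset[row_slice, col]  # 1-D result, one element per selected row

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        columns = list(pool.map(fetch, cols))  # map() preserves input order

    return np.stack(columns, axis=1)  # shape: (n_rows, n_cols)

# Stand-in dataset (assumption: thread safety of real h5pyd objects is unverified).
dset = np.arange(200).reshape(10, 20)
arr = fetch_columns_concurrently(dset, slice(4, 6), (2, 5, 9))
```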


jreadey commented on September 26, 2024

I believe I have this feature working now. Code changes are in the fancyindx branch of hsds and fancyindx branch of h5pyd.
I'll be doing some additional testing and evaluation and then merge into master later this week.


jreadey commented on September 26, 2024

The feature is merged into master now and in PyPI (version 0.9.2).

Here's a simple performance test comparing using fancy indexing vs iterating through a set of columns: https://gist.github.com/jreadey/bd75c469559f03596bd2d274dfb5a315. In my testing using fancy indexing was ~8x faster (running with 4 HSDS nodes).
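For anyone who wants to try a comparison of this kind without opening the gist, here is a minimal sketch of the idea (not the gist's code). A NumPy array stands in for the dataset, so the timings it prints say nothing about HSDS; with a real h5pyd dataset each indexing expression would turn into one or more server requests.

```python
import time

import numpy as np

# Stand-in for a remote dataset (assumption: swap in an h5pyd dataset to measure HSDS).
dset = np.arange(100 * 1000).reshape(100, 1000)
cols = list(range(0, 1000, 10))  # increasing column indices

# Fancy indexing: a single selection covering every wanted column.
t0 = time.perf_counter()
fancy = dset[:, cols]
t_fancy = time.perf_counter() - t0

# Iterating: one regular selection per column.
t0 = time.perf_counter()
looped = np.zeros((dset.shape[0], len(cols)), dtype=dset.dtype)
for j, col in enumerate(cols):
    looped[:, j] = dset[:, col]
t_loop = time.perf_counter() - t0

# Both approaches must return the same data.
assert np.array_equal(fancy, looped)
print(f"fancy: {t_fancy:.6f}s  loop: {t_loop:.6f}s")
```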

