
Comments (2)

hpages commented on July 24, 2024

Hey @mikejiang, @gfinak, @raphg,

Yes, the seed class is mandated. Implementing a seed class is how you actually implement the backend. When you implement an extract_array() method for your seed objects, you're providing random access to the array data stored in your backend. Code in DelayedArray can then use extract_array() to extract array data from a seed object without knowing anything about the backend.
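To make the seed contract concrete, here is a minimal sketch of a seed backed by an ordinary in-memory matrix. The class name MatrixSeed and its data slot are made up for this example, and it assumes the DelayedArray package is installed. A real backend would read from disk inside extract_array() instead of subsetting a matrix.

```r
library(DelayedArray)

# Hypothetical seed class: the "backend" is just a matrix held in a slot.
setClass("MatrixSeed", slots = c(data = "matrix"))

# A seed must report its dimensions.
setMethod("dim", "MatrixSeed", function(x) dim(x@data))

# Random access: 'index' is a list with one element per dimension,
# where NULL means "everything along that dimension".
setMethod("extract_array", "MatrixSeed", function(x, index) {
    i <- index[[1L]]
    j <- index[[2L]]
    if (is.null(i)) i <- seq_len(nrow(x@data))
    if (is.null(j)) j <- seq_len(ncol(x@data))
    x@data[i, j, drop = FALSE]
})

seed <- new("MatrixSeed", data = matrix(1:6, nrow = 2))
x <- DelayedArray(seed)   # wrap the seed; no data is copied here
rs <- rowSums(x)          # goes through extract_array() under the hood
```

Nothing in the code that consumes x needs to know that the data lives in a plain matrix; it only relies on dim() and extract_array().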

About implementing a backend with dual layouts: For this to play well with DelayedArray, at least 2 things are needed:

  1. Your extract_array method will need to take advantage of the dual layout e.g. by using the row-oriented layout when the supplied index is of the form list(i, NULL) and by using the column-oriented layout when index is of the form list(NULL, j).

  2. When doing block-processing, DelayedArray will need to choose a block geometry that leads to optimal calls to extract_array(). For example, if DelayedArray object x has a seed with dual layout, rowSums() should ideally use blocks made of full rows and colSums() should use blocks made of full columns. That holds as long as x doesn't carry a delayed transposition; if it does, it's the other way around. Note that improving the block-processing strategy used by DelayedArray is still a work in progress. My priority at the moment is to have the block-processing strategy play well with the physical chunk geometry of the seed. Seeds will have a way to tell DelayedArray about the chunk geometry via a chunkdim method or something like that. If a seed provides no chunkdim method, a default block-processing strategy should be used. It would actually make sense for this default strategy to do the above, i.e. use blocks made of full rows or columns when calling row/column summarization functions like rowSums()/colSums(). It would then play well with your dual layout backend. I'm putting this on the TODO list.
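The layout choice in point 1 can be sketched in plain R. Everything below is hypothetical and only for illustration: make_dual(), choose_layout() and extract_from_dual() are not DelayedArray functions, and the two "layouts" are just two copies of the same in-memory matrix. In a real backend, the body of extract_from_dual() would live in the seed's extract_array() method and each copy would be an on-disk dataset.

```r
# Hypothetical stand-in for a dual-layout store: the same matrix held twice.
make_dual <- function(m) list(row_oriented = m, col_oriented = m)

# Pick the copy to read from, given an index of the form list(i, j).
choose_layout <- function(index) {
    stopifnot(is.list(index), length(index) == 2L)
    rows_all <- is.null(index[[1L]])
    cols_all <- is.null(index[[2L]])
    if (!rows_all && cols_all) return("row_oriented")   # list(i, NULL)
    if (rows_all && !cols_all) return("col_oriented")   # list(NULL, j)
    # Neither pure case applies: fall back on an arbitrary default.
    "col_oriented"
}

# Extract the requested sub-array from whichever copy was chosen.
extract_from_dual <- function(x, index) {
    src <- x[[choose_layout(index)]]
    i <- index[[1L]]
    j <- index[[2L]]
    if (is.null(i)) i <- seq_len(nrow(src))
    if (is.null(j)) j <- seq_len(ncol(src))
    src[i, j, drop = FALSE]
}

m <- matrix(1:12, nrow = 3)
d <- make_dual(m)
full_rows <- extract_from_dual(d, list(1:2, NULL))   # served by row_oriented
full_cols <- extract_from_dual(d, list(NULL, 4L))    # served by col_oriented
```

The fallback for mixed indices is arbitrary here; a real backend could pick whichever layout touches fewer chunks.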

If you're going to implement a seed class for HDF5 dual layout, you should probably avoid starting from scratch. It's going to be easier to define the new class on top of the HDF5ArraySeed class e.g. with something like this:

    setClass("DualHDF5ArraySeed",
        slots=c(row_oriented="HDF5ArraySeed",
                col_oriented="HDF5ArraySeed"))

It feels to me that the approach would be the same if you were going to implement a seed class for TileDB dual layout (except that AFAIK there is no TileDbSeed class yet, so you would need to start by implementing that). Implementing a dual layout seed might actually be done in a more generic way, e.g. with something like:

    setClass("DualSeed",
        slots=c(row_oriented="ANY",
                col_oriented="ANY"))

with a validity method that checks that the seeds stored in the row_oriented and col_oriented slots are "compatible". What "compatible" means exactly (and how strict it needs to be) still needs to be decided. For example, there is no reason a priori why the 2 sub-seeds would need to use the same backend.
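One possible starting point for such a validity method is sketched below. It assumes, purely for illustration, that "compatible" means nothing more than "both sub-seeds report the same dimensions"; the actual rule is still undecided, as noted above. Ordinary matrices stand in for the sub-seeds so the example runs without any backend.

```r
library(methods)

setClass("DualSeed",
    slots=c(row_oriented="ANY",
            col_oriented="ANY"))

# Assumed compatibility rule (illustrative only): identical dimensions.
setValidity("DualSeed", function(object) {
    d1 <- dim(object@row_oriented)
    d2 <- dim(object@col_oriented)
    if (is.null(d1) || is.null(d2))
        return("both sub-seeds must have dimensions")
    if (!identical(d1, d2))
        return("sub-seeds must have the same dimensions")
    TRUE
})

m <- matrix(1:6, nrow = 2)

# Same dimensions on both sides: the object is created.
ok <- new("DualSeed", row_oriented = m, col_oriented = m)

# Mismatched dimensions: new() fails with the validity message.
bad <- tryCatch(
    new("DualSeed", row_oriented = m, col_oriented = t(m)),
    error = function(e) conditionMessage(e)
)
```

Because the check only relies on dim(), it does not require the two sub-seeds to come from the same backend. A stricter rule could also compare element types or dimnames.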

from delayedarray.

hpages commented on July 24, 2024

@mikejiang Is it ok to close this?

