Currently the implementation of external data is limited to filling in data on a per-e

Add new document for "bulk" external resources about event-model HOT 6 CLOSED

tacaswell commented on August 17, 2024

Add new document for "bulk" external resources

from event-model.

Comments (6)

cjtitus commented on August 17, 2024

Thanks for writing this all up. My one question is -- in option D, what will the data look like when it is finally "filled"? And what will the syntax be?

Normally you have run.primary.read() to give you the data in tabular form. I think the whole point of the stream handler is that your data may not be in a form that is amenable to shoving in one table, because it may be ragged, etc.

So for some stream "external", would we have run.external? What would this be? A node that has been created via a Tiled adapter that is determined from the spec?

tacaswell commented on August 17, 2024

It would be run.external_stream.read(), just like it is now. From the client's point of view you should not be able to tell that the data is stored externally; it will just look like another stream.

There is currently no node type in tiled that "natively" handles ragged data correctly (we need to add the ability to expose awkward arrays). For these cases the best approach right now is probably a "long form" table in which one of the columns is the "group" each row belongs to, as that is something we can currently express (and is more or less how awkward handles it under the hood, while providing a nice API on top that is effectively variable striding, but I digress).

At the end of the day, because tiled is a service, the server needs to be able to tell the client "this is what I am about to send you" in a way we can serialize (as opposed to in Python, where you can say "that function will return 'a numpy array'" and leave it at that level of detail).

cjtitus commented on August 17, 2024

Ok, cool. I actually already do a long form with one column being "pixel", and the other columns being timestamp and photon energy, so I think that will work nicely for my detector, and other multi-element photon-counting detectors. This is an approach that I, personally, would be happy with, and it sounds like it would translate well to awkward array in the future, but will be useful before native support is implemented.
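
For readers following along, here is a minimal sketch of the long-form layout discussed above, using only the standard library. The column names follow the description in this comment; the values and the helper group_by_pixel are made up for illustration and are not part of tiled, awkward, or event-model.

```python
from collections import defaultdict

# Hypothetical photon events in long form: one row per detected photon.
# The "pixel" column plays the role of the "group" column; all values are invented.
rows = [
    {"pixel": 0, "timestamp": 0.10, "energy": 512.0},
    {"pixel": 2, "timestamp": 0.12, "energy": 498.5},
    {"pixel": 0, "timestamp": 0.31, "energy": 505.2},
    {"pixel": 1, "timestamp": 0.40, "energy": 520.1},
    {"pixel": 0, "timestamp": 0.55, "energy": 499.9},
]


def group_by_pixel(rows):
    """Rebuild the ragged per-pixel view from the flat long-form rows."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["pixel"]].append((row["timestamp"], row["energy"]))
    return dict(grouped)


ragged = group_by_pixel(rows)
# Each pixel ends up with a different number of events, which is why
# the data does not fit a rectangular table with one row per pixel.
counts = {pixel: len(events) for pixel, events in ragged.items()}
```

The flat table is rectangular and easy to describe to a client ahead of time, while the ragged structure can always be recovered by grouping on the pixel column.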

maffettone commented on August 17, 2024

Some hopefully quick questions for @tacaswell :

Can you point me at something that is directly using event model, so I get a feeling for the use case? Is the run engine processing Msgs to call functions like event_model.compose_run?
Is "stream" the right name to use here? It may get confusing/degenerate with other things called streams. Would sequential_resource and sequential_datum be more adequate? Or is sequential too limiting of a description?
Compatibility... This one might not be quick. My instinct is to extend core classes like DocumentRouter and Filler to have methods for the new documents. NamedTuple classes like ComposeRunBundle and their associated unit tests will start to fail if I extend them because of positional unpacking. I'm timid about editing tests and prefer to add new ones, but I don't think that'll be possible in this case, which makes me nervous about breaking downstream bluesky functionality that depends on this behavior...

tacaswell commented on August 17, 2024

Is the run engine processing Msgs to call functions like event_model.compose_run?

No, event_model is a refactoring of the logic that was scattered through the RE. There is an idea to make the RE actually use event model, but I do not think we have done that yet. The best reference may be either the event model tests or https://github.com/bluesky/bluesky/blob/master/bluesky/tests/test_callbacks.py#L518

Is "stream" the right name to use here?

I think so. The goal is to make data produced via these documents and data produced via queries to mongo indistinguishable from the user's point of view. For example, imagine you have a parquet file on disk; using these documents you should be able to graft that onto a run as a single stream.

My instinct is to extend core classes like DocumentRouter and Filler to have methods for the new documents

Yes, that is what I would do (and we have done that before).
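
As a rough illustration of what "adding methods for the new documents" could look like, here is a toy router that dispatches on the document name. ToyRouter is a stand-in written for this sketch, not event_model.DocumentRouter, and the method names stream_resource and stream_datum are assumptions about what the new documents might be called.

```python
class ToyRouter:
    """Toy stand-in for a document router; NOT event_model.DocumentRouter."""

    def __call__(self, name, doc):
        # Look up a method named after the document type and call it.
        handler = getattr(self, name, None)
        if handler is None:
            raise ValueError(f"no handler for document type {name!r}")
        return handler(doc)

    def start(self, doc):
        return ("start", doc)

    def event(self, doc):
        return ("event", doc)


class ExtendedToyRouter(ToyRouter):
    """Supports the new documents purely by adding methods (names assumed)."""

    def stream_resource(self, doc):
        return ("stream_resource", doc)

    def stream_datum(self, doc):
        return ("stream_datum", doc)


router = ExtendedToyRouter()
result = router("stream_resource", {"uid": "abc"})
```

In this sketch existing subclasses keep working unchanged, because the new document types only add methods and never alter the existing ones.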

NamedTuple classes like ComposeRunBundle and their associated unit tests will start to fail if I extend them

Oh, bad. Using a named tuple was an easy way to get dotted name access, but it does carry with it this order and length problem. In hindsight maybe these should have been data classes instead (side comment: the spread operator in JS is very nice, kinda wish we had that in Python).
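
The order and length problem can be reproduced with plain namedtuples. In this sketch OldBundle and NewBundle are hypothetical stand-ins for ComposeRunBundle before and after adding a field; they are not the real event-model classes.

```python
from collections import namedtuple

# Hypothetical stand-ins for the bundle before and after adding a field.
OldBundle = namedtuple(
    "OldBundle",
    ["start_doc", "compose_descriptor", "compose_resource", "compose_stop"],
)
NewBundle = namedtuple(
    "NewBundle", OldBundle._fields + ("compose_stream_resource",)
)


def noop():
    return None


# Existing downstream code unpacks exactly four values, which works today.
start_doc, compose_descriptor, compose_resource, compose_stop = OldBundle(
    {}, noop, noop, noop
)

# The same unpacking against the five-field tuple raises ValueError.
try:
    start_doc, compose_descriptor, compose_resource, compose_stop = NewBundle(
        {}, noop, noop, noop, noop
    )
    unpack_error = None
except ValueError as err:
    unpack_error = err
```

Attribute access such as bundle.compose_stop is unaffected by the extra field; only positional unpacking breaks.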

Leaving aside whether we want to abandon named tuples for now, I would break ComposeRunBundle by adding a new factory family on the end. This at least keeps the breakage in one place and makes it impossible to have a mix of Events and these new documents in the same stream (which in principle there is nothing wrong with, but let's get it working at all and then have someone ask for that before we make the most complex thing we can imagine....).

We could do terrible things to the named-tuple subclass to only iterate through the first N things, but if we went that way it should be a stepping stone to dropping positional unpacking altogether.
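
One way the "terrible things" could look, as a sketch only: a namedtuple subclass whose __iter__ yields just the first N fields. The class and constant names are hypothetical. It relies on unpacking going through __iter__ for tuple subclasses, which is the case in CPython.

```python
from collections import namedtuple

N_LEGACY_FIELDS = 4  # hypothetical: how many fields old callers unpack

_Fields = namedtuple(
    "_Fields",
    [
        "start_doc",
        "compose_descriptor",
        "compose_resource",
        "compose_stop",
        "compose_stream_resource",
    ],
)


class TruncatedIterBundle(_Fields):
    """Five named fields, but iteration only yields the first four."""

    __slots__ = ()

    def __iter__(self):
        # Slicing returns a plain tuple, so this does not recurse.
        return iter(self[:N_LEGACY_FIELDS])


def noop():
    return None


bundle = TruncatedIterBundle({}, noop, noop, noop, noop)

# Legacy four-way unpacking keeps working.
start_doc, compose_descriptor, compose_resource, compose_stop = bundle

# The inconsistency: len() reports five items, iteration yields four.
iterated = [item for item in bundle]
```

The mismatch between len(bundle) and the number of iterated items is the kind of internal inconsistency that makes this suitable only as a transitional step.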

The trade-off I see here is between breaking some stuff now, maintaining internally inconsistent compat code ("so it has N+1 keys, but I can only iterate over N of them?!"), and building a second full set of method families.

I could also see a case for making compose_run take a flag like include_whole_stream_tools (note: rename this) that controls whether it returns the current RunBundler or an ExtendedRunBundler. It is a bit of type instability, but only in the case of positional unpacking, and it lets users opt into the breaking change. If it defaults to False, it also gives us a chance to start warning that it will change. Instead of making this a bool, we could do it as extension_level=0 (with 0 being the current state and 1 being what you are adding now; if we add more in the future, we could increment it further).
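
A sketch of the extension_level idea, under heavy assumptions: compose_run_sketch, RunBundle, and ExtendedRunBundle are invented for illustration and do not reflect the real event_model.compose_run signature or return types.

```python
from collections import namedtuple

# Hypothetical bundle types; not the real event-model classes.
RunBundle = namedtuple(
    "RunBundle",
    ["start_doc", "compose_descriptor", "compose_resource", "compose_stop"],
)
ExtendedRunBundle = namedtuple(
    "ExtendedRunBundle", RunBundle._fields + ("compose_stream_resource",)
)


def compose_run_sketch(*, extension_level=0):
    """Return a bundle whose length depends on the opt-in level."""

    def factory(**kwargs):
        # Placeholder for a real document factory.
        return dict(kwargs)

    start_doc = {"uid": "placeholder"}
    if extension_level == 0:
        # Current behaviour: four fields, legacy unpacking works.
        return RunBundle(start_doc, factory, factory, factory)
    if extension_level == 1:
        # Opt-in: the caller has asked for the extra factory.
        return ExtendedRunBundle(start_doc, factory, factory, factory, factory)
    raise ValueError(f"unknown extension_level: {extension_level}")


legacy = compose_run_sketch()
extended = compose_run_sketch(extension_level=1)
```

Callers that never pass extension_level see no change, and callers that opt in know they must unpack five values or use attribute access.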

maffettone commented on August 17, 2024

Sounds good, I rely on unit tests to guide my quest, and will call things stream.

As for the breaking problem, and the future proofing, I think we may be best off with something like an extended data class. It's roughly similar to doing terrible things to a namedtuple subclass, but I think it's defensible. It will be fully backward compatible with the namedtuple implementation, and allow extension to new types of documents without breaking anything. This also avoids having new flags to think about, or a global variable to track how many new document types have been added since 2022.

The notion here is that if you are going to unpack this, you are going to get a simplified set of documents that will meet most use cases. If you want to work with more document types, you should start to be explicit. If KW_ONLY existed before Python 3.10, I'd consider plugging it in (https://docs.python.org/3.10/library/dataclasses.html#dataclasses.KW_ONLY).

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ComposeBundle:
    start_doc: dict
    compose_descriptor: Callable
    compose_resource: Callable
    compose_stop: Callable
    # New factories go last and default to None so old call sites keep working.
    compose_stream_resource: Optional[Callable] = None

    def __iter__(self):
        # Positional unpacking only ever sees the original four fields.
        return iter(
            (
                self.start_doc,
                self.compose_descriptor,
                self.compose_resource,
                self.compose_stop,
            )
        )


def foo():
    pass


if __name__ == "__main__":
    # Legacy usage: four-way positional unpacking still works.
    start_doc, compose_descriptor, compose_resource, compose_stop = ComposeBundle(
        {}, foo, foo, foo
    )

    # Extended usage: be explicit and read the fields by name.
    bundle = ComposeBundle({}, foo, foo, foo, compose_stream_resource=foo)
    (
        start_doc,
        compose_descriptor,
        compose_resource,
        compose_stream_resource,
        compose_stop,
    ) = (
        bundle.start_doc,
        bundle.compose_descriptor,
        bundle.compose_resource,
        bundle.compose_stream_resource,
        bundle.compose_stop,
    )
