Coder Social home page Coder Social logo

Comments (8)

Fatal1ty avatar Fatal1ty commented on July 1, 2024

Hey @mishamsk

First of all, congratulations on your project publication! I'll consider using it when I need to do some parsing stuff.

When I was developing discriminator feature, I thought about adding an attribute to Discriminator that would keep a tag generator function. In the current implementation it's hardcoded to take the tag from a dataclass field which you specifed in Discriminator object. For example, if we have Discriminator(field="__type", include_subtypes=True), the following variant registration code will be generated:

for variant in iter_all_subclasses(__main__.Node):
    try:
        variants_map[variant.__dict__['__type']] = variant
    except KeyError:
        continue

By introducing a tag generator function we can change it. For example, if we have Discriminator(field="__type", variant_tag_getter=lambda cls: cls.__name__, include_subtypes=True) the following variant registration code will be generated:

for variant in iter_all_subclasses(__main__.Node):
    try:
        variants_map[variant_tag_getter(variant)] = variant
    except KeyError:
        continue

I'm bad at naming but I guess that variant_tag_getter will more clearly indicate the purpose instead of variant_tagger_fn which may imply the "post serialize hook" functionality.

The question is, do we need or will we need functionality of automatic adding tags to all variants on serialization, so we could avoid using __post_serialize__ hooks. We can add a bool parameter add_tag_on_serialization (just a working name) to Discriminator for this but let's keep it for a separate issue.

What do you think? Maybe I'm missing something and you have your own vision of solving this issue.

from mashumaro.

mishamsk avatar mishamsk commented on July 1, 2024

@Fatal1ty first thanks for the wishes! it took me way to much time;-)

rgd your design suggestion, I have a couple of thoughts.

First, rgd my specific use case. I think it's a distinct usage pattern from a "model a messaging protocol and parse". Instead it's more of a "pickle" use case. I am controlling both serialization and deserialization, so do not care about validations or trying to decode data originating from some external system. I have a hunch that only this kind of usage pattern where one can expect a literal class name in serialized data. None of the public specs, except the ones used for interoperability, would ever dictate to store a literal class name in serialized data, because class name is obviously implementation specific.

So if you think that mashumaro is commonly used for such things, it may be worth implementing explicitly, instead of a "generic" approach. But you'd know better if this is the case. My usage pattern may be rare.

Rgd your proposal for the universal solution. I'd definitely say that whatever it will be, it's going to be the harder part to grasp for users. I had to read mashumaro's code to understand where this "getter" stuff will pop-up. I am not sure if you need a different argument for the Discriminator though. Why not just inspect the field value, and allow callable?

I do think that using tag instead of field would be generally better, now that the discriminator is not necessarily a field. It would also match some other libs (e.g. msgspec), but that would break the API...

from mashumaro.

Fatal1ty avatar Fatal1ty commented on July 1, 2024

I am not sure if you need a different argument for the Discriminator though. Why not just inspect the field value, and allow callable?

Assuming we are using JSON serialization format, we need two things:

  1. A JSON key name that holds a tag value in the serialized data. We will take this tag value on deserialization and find it in the "tag → class" mapping to determine which variant class should be deserialized.
  2. A function that returns a tag value for a class so that we could go through the class hierarchy and create "tag → class" mapping.

Currently we can configure only first item through Discriminator "field" attribute. The second item is hardcoded:

lines.append(
f"variants_map[variant.__dict__["
f"'{discriminator.field}']] = variant"
)

Semantically this function (I called it variant_tagger_fn in my proposal) returns a value for the class level attribute with the name discriminator.field:

def variant_tagger_fn(cls):
    return cls.__dict__[discriminator.field]

In your case you don't have class level attributes containing unique literals, so you could use the following function instead to get dynamic tag values:

def variant_tagger_fn(cls):
    return cls.__name__  # or something more practical

I do think that using tag instead of field would be generally better, now that the discriminator is not necessarily a field. It would also match some other libs (e.g. msgspec), but that would break the API...

msgspec has its tag_field parameter which semantically is the same as discriminator.field in mashumaro. The difference is that by default msgspec consider the class name as a tag value, and mashumaro consider the value of a specific class level attribute as a tag value. My proposal is to allow the modification of this "tagger".

from mashumaro.

mishamsk avatar mishamsk commented on July 1, 2024

Oh, sorry, I wasn't thinking straight. Yes this makes total sense.

Did you consider keeping only the current field attribute, but allowing it to be a tuple? I.e.:

  • No attribute - assume subclass based discrimination
  • Plain string - current logic, implicitly assume that the provide string is a key in __dict__ (which btw apparently won't work for slotted dataclasses! see code sample below. I wasn't expecting that)
  • A tuple of a string & callable - the new logic, which could also solve for slotted classes

I may be biased, but the word variant doesn't "speak" to me, sounds more like an implementation detail. But, either way, having a variant_tagger_fn is better than not having anything at all;-)

sample code - slotted dataclasses do not have __dict__ on class itself too!

from dataclasses import dataclass
from typing import ClassVar

@dataclass(slots=True)
class Test:
    x: ClassVar[int] = 0
    y: int = 0

print("__dict__" in dir(Test)) # False
print("__dict__" in dir(Test())) # False

from mashumaro.

Fatal1ty avatar Fatal1ty commented on July 1, 2024

@mishamsk Sorry for delay, I was busy at work.

Did you consider keeping only the current field attribute, but allowing it to be a tuple? I.e.:

  • No attribute - assume subclass based discrimination
  • Plain string - current logic, implicitly assume that the provide string is a key in __dict__ (which btw apparently won't work for slotted dataclasses! see code sample below. I wasn't expecting that)
  • A tuple of a string & callable - the new logic, which could also solve for slotted classes

It will work but I don't see any point to make one argument to accept different value types instead of using two arguments. It will also be complicated to explain the use of this parameter in the documentation. Let's make it simple.

I may be biased, but the word variant doesn't "speak" to me, sounds more like an implementation detail. But, either way, having a variant_tagger_fn is better than not having anything at all;-)

I agree that it doesn't sound very well but I didn't manage to find a better word. I do think this complexity has roots in using discriminator for both subclass deserialization and for discriminated union deserialization. If it was only for subclass deserilizaiton then I would called it subclass_tagger_fn. In my defense, I found some articles where the term "variant" is used for the same things:

  • current logic, implicitly assume that the provide string is a key in __dict__ (which btw apparently won't work for slotted dataclasses! see code sample below. I wasn't expecting that

I grabbed my head and almost believed it, but it made me go to check. Fortunately for us, your assumption is only partially correct. The truth is slotted dataclass instances don't have __dict__ attribute but class objects do have it. Compare this:

print("__dict__" in dir(Test))  # False
print(Test.__dict__)  # {'__module__': '__main__', '__annotations__': {'x': typing.ClassVar[int], ...

You might be curious why I even rely on __dict__ attribute when retrieving a tag. The reason is that I wanted to support intermediate nodes in class tree that don't have explicit tag defined and are only used as a base class for their nodes. If we were using getattr(variant_cls, tag) instead of variant_cls.__dict__[tag] we would get a value inherited from an ancestor for those intermediate nodes and overwrite a class registered for the tag.

from mashumaro.

mishamsk avatar mishamsk commented on July 1, 2024

nothing to apologize for! appreciate additional insights.

Sorry for the false alarm, didn’t know that dir had completely different behavior based on what’s being inspected.

Regarding the naming, as I’ve said, I do not have a strong opinion. You’ve build & maintained this library all along, so you’ll know better what works for the community. The only thing I can add, is that I’ve been trying to find other libs that supported auto-subclass (de)serialization, and couldn’t find any. Seems odd to me, that there is nothing in-between pickle and strictly typed validation/serialization (pydantic, msgspec, cattrs etc). So having such a feature in mashumaro would further set it apart!

Also, this change seems relatively simple. Probably even simpler than #136. I am on vacation, so am away from coding for the sake of the family, but I may get some time to draft a PR when I am back if you haven’t done it yourself by then.

from mashumaro.

Fatal1ty avatar Fatal1ty commented on July 1, 2024

It was easy to implement :)

from mashumaro.

mishamsk avatar mishamsk commented on July 1, 2024

@Fatal1ty nice! you are definitely moving faster than I do;-)

from mashumaro.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.