fatal1ty / mashumaro

Fast and well tested serialization library

License: Apache License 2.0

Python 99.82% Shell 0.10% Just 0.07%
serialization deserialization python python3 marshalling dataclasses json msgpack yaml typehints

mashumaro's People

Contributors

dependabot[bot], fatal1ty, gshank, kianmeng, matthew-chambers-pushly, mishamsk, peterallenwebb, ra80533, sirkonst, to-bee, ydylla, zupo

mashumaro's Issues

Alias option won't work with slots enabled

Hi!
I just noticed that if I enable slots (to reduce memory usage), the alias option doesn't work:

@dataclass(slots=True)
class DataClass(DataClassJSONMixin):
    a: int = field(metadata=field_options(alias="FieldA"))
    b: int = field(metadata=field_options(alias="#invalid"))

x = DataClass.from_dict({"FieldA": 1, "#invalid": 2})  # DataClass(a=1, b=2)
print(x)
x.to_dict()  # {"a": 1, "b": 2}  # no aliases on serialization by default

I got this error:

Traceback (most recent call last):
  File "test.py", line 9, in <module>
    x = DataClass.from_dict({"FieldA": 1, "#invalid": 2})  # DataClass(a=1, b=2)
  File "<string>", line 26, in from_dict
TypeError: DataClass.__init__() missing 2 required positional arguments: 'a' and 'b'

I don't know if there is a workaround; it would probably be good to mention this in the docs.

Using `InitVar` with mashumaro

First up, thanks for the library. It's proving really useful for something I started working on.

I've run into one problem. I'd like some properties of the dataclass to be ignored for most purposes. InitVar appears ideal for this usecase, https://docs.python.org/3/library/dataclasses.html#init-only-variables

If a field is an InitVar, it is considered a pseudo-field called an init-only field. As it is not a true field, it is not returned by the module-level fields() function. Init-only fields are added as parameters to the generated __init__() method, and are passed to the optional __post_init__() method. They are not otherwise used by dataclasses.

However, adding an InitVar seems to trip up mashumaro. It complains that the field is not serializable. I would expect InitVars to be ignored, because they are explicitly not intended to be part of the final serialization.

class Organization(object):
    api_token: InitVar[str] = None

raise UnserializableField(fname, ftype, parent)
mashumaro.exceptions.UnserializableField: Field "api_token" of type dataclasses.InitVar in models.Organization is not serializable

  • Are there any workarounds at the moment?
  • Does the above make sense? Should InitVars be ignored by mashumaro's validation?

`TypeError` using PEP-585 standard collection annotation types for Python <3.9

Summary

In release v2.8 of mashumaro, one of the updates adds support for PEP 585.

This implies supporting type annotations using the standard collection types list, tuple, ... instead of the corresponding generics typing.List, typing.Tuple, ...

However, on Python 3.7 or Python 3.8 a TypeError is raised.

Reproduce the Error

The following code snippet

from __future__ import annotations
from mashumaro import DataClassJSONMixin

# This works fine
x: list[int] = [1, 2, 3]
print(x)

# This raises TypeError on Python <3.9
class MyClass(DataClassJSONMixin):
    arg1: list[int]

runs without issues on Python 3.9, but produces an error on Python 3.7 and Python 3.8:

TypeError: 'type' object is not subscriptable

Environment

  • Operating System: Mac OS X
  • mashumaro Version: 2.10.1
  • Python Versions: 3.7.6, 3.8.1, 3.9.1

datetime parsing does not handle generic ISO-8601 strings

Currently the generated from_dict code calls datetime.fromisoformat():

elif origin_type in (datetime.datetime, datetime.date, datetime.time):
    return f'{value_name} if use_datetime else ' \
           f'datetime.{origin_type.__name__}.' \
           f'fromisoformat({value_name})'

fromisoformat is only designed to invert the strings generated by datetime.isoformat():

This does not support parsing arbitrary ISO 8601 strings - it is only intended as the inverse operation of datetime.isoformat(). A more full-featured ISO 8601 parser, dateutil.parser.isoparse is available in the third-party package dateutil.

According to mashumaro's documentation:

use_datetime: False  # False - load datetime oriented objects from ISO 8601 formatted string, True - keep untouched

I believe it should be within mashumaro's scope to handle generic ISO-8601 strings
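
One way to accept a slightly wider range of ISO 8601 strings without a third-party dependency is to normalize the input before handing it to fromisoformat(). This is only a minimal sketch: the helper name parse_iso8601 is hypothetical, it handles nothing beyond the trailing "Z" designator, and it is not how mashumaro parses datetimes.

```python
from datetime import datetime, timezone


def parse_iso8601(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, additionally accepting a trailing 'Z'.

    Hypothetical helper: older Python versions reject the 'Z' (UTC)
    designator in datetime.fromisoformat(), so it is rewritten first.
    """
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


parsed = parse_iso8601("2021-03-04T05:06:07Z")
print(parsed.tzinfo == timezone.utc)
```

A full-featured parser such as dateutil.parser.isoparse, mentioned in the quoted documentation, covers many more variants than this.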

Serialization strategy for TypeVar does not work in BaseConfig

I find that when I define a TypeVar, variables of that type are always serialized with the first type listed in the TypeVar definition.

In an attempt to fix that, I tried to set a global serialization strategy. However, that strategy was never selected; the selection also seems to be based on the first type listed in the TypeVar. Quite a bit of the code in metaprogramming.py seems to handle a TypeVar, so I wonder if I am missing something else?

The basic code is:

Numeric = TypeVar('Numeric', float, int)  # <== here float is first in the list, seems to be selected as the key for serialization strategy.

class NumericStrategy(SerializationStrategy):
    def serialize(self, value):
        return int(value)

    def deserialize(self, value):
        return int(value)

@dataclass
class foo(DataClassYAMLMixin):
    bar: Numeric = 1
    class Config(BaseConfig):
            serialization_strategy = {
                Numeric: NumericStrategy(),  # <== this is the line I think should activate the strategy
                int: NumericStrategy(),
                #float: NumericStrategy(),  # <== it seems that only this line would activate the strategy
            }

__slots__ are ignored

Hey there! First, thanks for this great library, I am really enjoying using it!

Sometimes I like to use __slots__ with dataclasses to prevent me from accidentally setting a value to a (typo-ed) new attribute instead of to an existing attribute.

However, if I use the DataClassDictMixin from mashumaro, it seems like __slots__ are ignored. I.e. If I set a value to a non-existing attribute, I do not get AttributeError, but a new attribute is created and value is set to it, as if __slots__ would not exist.

Here's a short pytest case to showcase the problem:

import pytest
from dataclasses import dataclass
from mashumaro import DataClassDictMixin


@dataclass
class Foo:
    __slots__ = ["number"]
    number: int


@dataclass
class Bar(DataClassDictMixin):
    __slots__ = ["number"]
    number: int


def test_slots():

    foo = Foo(1)
    with pytest.raises(AttributeError) as err:
        foo.new_attribute = 2
    assert str(err.value) == "'Foo' object has no attribute 'new_attribute'"

    bar = Bar(1)
    bar.new_attribute = 2
    # -> should also fail with "'Bar' object has no attribute 'new_attribute'",
    # but it doesn't
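
A possible explanation, stated as an assumption rather than a confirmed diagnosis: __slots__ only prevents new attributes when every class in the inheritance chain declares __slots__. If a base class (such as a mixin) does not, instances still get a __dict__. The sketch uses two hypothetical stand-in mixins and no mashumaro code.

```python
from dataclasses import dataclass


class PlainMixin:
    """Stand-in for a mixin that does not declare __slots__."""


class SlottedMixin:
    """Stand-in for a mixin that declares empty __slots__."""

    __slots__ = ()


@dataclass
class WithDict(PlainMixin):
    __slots__ = ("number",)
    number: int


@dataclass
class WithoutDict(SlottedMixin):
    __slots__ = ("number",)
    number: int


a = WithDict(1)
a.new_attribute = 2  # accepted: PlainMixin gives instances a __dict__

b = WithoutDict(1)
try:
    b.new_attribute = 2
    slots_enforced = False
except AttributeError:
    slots_enforced = True

print(hasattr(a, "__dict__"), hasattr(b, "__dict__"), slots_enforced)
```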

to_dict() fails for dictionaries with tuple keys

We have some dictionaries which have tuple keys. 'from_dict' works okay with a pre-constructed dictionary, but to_dict() fails with the error: TypeError: unhashable type: 'list'

#!/usr/bin/env python
from typing import Union, Tuple, Dict
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.types import SerializableType

@dataclass
class TestClass(DataClassDictMixin):
    name: str
    patches: Dict[Tuple[str, str], str]

dct = {
    'name': 'testing',
    'patches': {
        ('one', 'name'): 'test1',
        ('two', 'order'): 'test2',
        ('three', 'change'): 'test3',
    }
}

obj = TestClass.from_dict(dct)
print(obj)

new_dct = obj.to_dict()
print(new_dct)
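
The reported error message is what Python raises when a list is used as a dictionary key. Presumably (this is an assumption, not verified against the generated code) the tuple keys are being converted to lists during serialization. A plain-Python sketch of the failure and of keeping the keys hashable:

```python
patches = {("one", "name"): "test1", ("two", "order"): "test2"}

# Converting each tuple key to a list fails, because lists are unhashable
# and therefore cannot be used as dict keys.
try:
    broken = {list(key): value for key, value in patches.items()}
except TypeError as exc:
    error_message = str(exc)

# Keeping the keys as tuples avoids the error.
packed = {tuple(str(part) for part in key): value for key, value in patches.items()}

print(error_message)
print(packed)
```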

to_dict and from_dict do not support None as value for an Optional nested in Tuple

  • mashumaro version: 3.0
  • Python version: 3.6.9 (PyEnv, CPython)
  • Operating System: Ubuntu 20.04

Description

I'm trying to set an optional value in a Tuple (in the example Tuple[Optional[int], int]) and export it in YAML, but it results in a TypeError. By looking at the generated code (see below), we can see the Optional is ignored by the builder.

What I Did

from dataclasses import dataclass, field
from typing import Tuple, Optional

from mashumaro.mixins.yaml import DataClassYAMLMixin

@dataclass
class Foo(DataClassYAMLMixin):
    bar: Tuple[Optional[int], int] = field(default_factory=lambda: (None, 42))

print(Foo().to_dict())

Output:

Traceback (most recent call last):
  File "test2.py", line 17, in <module>
    print(Foo().to_dict())
  File "<string>", line 7, in to_dict
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

Generated code for to_dict:

__main__.Foo:
def to_dict(self):
    kwargs = {}
    value = getattr(self, 'bar')
    if value is None:
        kwargs['bar'] = None
    else:
        kwargs['bar'] = [int(value[0]), int(value[1])]
        # should have been [int(value[0]) if value[0] is not None else None, int(value[1])]
    return kwargs
setattr(cls, 'to_dict', to_dict)

EDIT: from_dict also suffers from the same issue
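
The correction suggested in the comment of the generated code can be written out as a standalone function. This is a hand-written sketch of what the packing step for Tuple[Optional[int], int] could look like; pack_bar is a hypothetical name, not generated by mashumaro.

```python
from typing import List, Optional, Tuple


def pack_bar(
    value: Optional[Tuple[Optional[int], int]],
) -> Optional[List[Optional[int]]]:
    """Pack the 'bar' field, leaving a None first element untouched."""
    if value is None:
        return None
    first, second = value
    return [int(first) if first is not None else None, int(second)]


print(pack_bar((None, 42)))
```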

field serde specification does not work?

Following the README, I'm trying to add a custom serde for a numpy array:

from dataclasses import dataclass
from dataclasses import field
import pickle

from mashumaro import DataClassMessagePackMixin
import numpy as np

@dataclass
class Numpy(DataClassMessagePackMixin):
    data: np.ndarray = field(metadata={'serialize': pickle.dumps, 'deserialize': pickle.loads})

It does not seem to work, as mashumaro complains:

UnserializableField: Field "data" of type numpy.ndarray in __main__.Numpy is not serializable

Python 3.8.10, mashumaro 2.6.2

P.S. (also consider adding mashumaro.__version__ property, currently I could not figure out how to check the mashumaro version without trying to pip install it again)

Custom IntEnum dumping

Hi, first of all, great package, congrats!
My question is the following: I would like to achieve that an IntEnum class would be dumped as its name to yaml.
I tried the following:

class Example(IntEnum, SerializableType):
    A = 1
    B = 2

    def _serialize(self):
        return {"value": self.name}

    @classmethod
    def _deserialize(cls, value):
        return Example.__members__[value["value"]]

However I get this exception: TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

Is there any other way to achieve this?
Thanks!
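
One way to sidestep the metaclass conflict is to leave the enum as a plain IntEnum and keep the name conversion in two small functions. The sketch below is stdlib only; dump_example and load_example are hypothetical names. Whether they can be plugged in through the "serialize" / "deserialize" field metadata shown in the numpy issue above depends on the mashumaro version, so treat that as an assumption.

```python
from enum import IntEnum


class Example(IntEnum):
    A = 1
    B = 2


def dump_example(member: Example) -> str:
    """Serialize the enum member as its name instead of its value."""
    return member.name


def load_example(name: str) -> Example:
    """Look the member up by name; raises KeyError for unknown names."""
    return Example[name]


print(dump_example(Example.B), load_example("A"))
```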

Support for dataclasses.field default_factory argument

Hi @Fatal1ty,

Thanks for making this library, I've found it to be easy to use and very performant.

Today I ran into an issue where deserialization of a subclass inheriting a field with default_factory failed. Here's a minimal example:

from dataclasses import dataclass, field
from typing import List

from mashumaro import DataClassJSONMixin

@dataclass()
class A(DataClassJSONMixin):
    foo: List[str] = field(default_factory=list)

@dataclass()
class B(A):
    pass

print(A())  # A(foo=[])
print(B())  # B(foo=[])
print(A.from_dict({}))  # A(foo=[])
print(B.from_dict({}))  # Exception

~/.pyenv/versions/3.8.2/lib/python3.8/site-packages/mashumaro/serializer/base/metaprogramming.py in from_dict(cls, d, use_bytes, use_enum, use_datetime)

MissingField: Field "foo" of type typing.List[str] is missing in __main__.B instance

I think this occurs because, when checking for default values of ancestors, only field.default is extracted and field.default_factory is ignored:

d[field.name] = field.default
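
A sketch of collecting defaults in a way that also honours default_factory, using only the public dataclasses API. The function name collect_defaults is hypothetical, and this is not the library's code; it only illustrates the check the issue describes as missing.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Any, Dict, List


def collect_defaults(cls: type) -> Dict[str, Any]:
    """Return default values for all fields, including inherited ones."""
    defaults: Dict[str, Any] = {}
    for f in fields(cls):
        if f.default is not MISSING:
            defaults[f.name] = f.default
        elif f.default_factory is not MISSING:
            # Call the factory so each caller gets a fresh object.
            defaults[f.name] = f.default_factory()
    return defaults


@dataclass
class A:
    foo: List[str] = field(default_factory=list)


@dataclass
class B(A):
    pass


print(collect_defaults(B))
```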

Removing use_enum & use_datetime from from_dict() in favor of auto detection?

Hi,
Why does the from_dict() function have use_enum and use_datetime as options, instead of an automatic isinstance({value_name}, {type_name(origin_type)}) check?

I think an instance check right at the beginning near the value is None check would also be useful for other special types like UUID, IPv4Address or Path where otherwise the current from_dict function would fail if it already contains a valid instance of the origin type.

Are you open to a pull request that removes the two options? Do you think it would even be possible, or is it not in line with your design goals?

broken serialization for subclass of MutableMapping

Commit 501b648 broke Mashumaro for our project (dbt). The problem appears to be that Python reports None as the origin for one of our classes that inherits from a class that also inherits from MutableMapping.

There is a test at https://github.com/gshank/mashumaro/blob/broken_config/tests/test_mutable_mapping.py. This test passes on Python 3.6 but is broken in Python 3.8.

I made an attempt to fix it but just went around in circles. Putting back the 'is_dataclass' calls that were removed in the breaking commit fixed this test, but broke two other tests: tests/test_data_types.py::test_dataclass_field_without_mixin and tests/test_data_types.py::test_serializable_type_dataclass.

Datamapper example produces error on list items

Great work on this package. It seems to be the only package that supports the schema changes I want to perform on JSON coming in from an external API.

However, the provided example in /examples/json_remapping.py does not work for list items. Suppose I have a dataclass setup like the one below (note the companies attribute):

@dataclass
class Company(DataClassJSONMixin):
    id: int
    name: str

    __remapping__ = {
        "ID": "id",
        "NAME": "name",
    }

@dataclass
class User(DataClassJSONMixin):
    id: int
    username: str
    email: str
    companies: List[Company]

    __remapping__ = {
        "ID": "id",
        "USERNAME": "username",
        "EMAIL": "email",
        "COMPANIES": ("companies", Company.__remapping__),
    }

We will receive an AttributeError: 'list' object has no attribute 'items', because the remapper does not seem to be prepared for list items:

def remapper(d: Dict[str, Any], rules: RemappingRules) -> Dict[str, Any]:
    result = {}
    for key, value in d.items():
        mapped_key = rules.get(key, key)
        if isinstance(mapped_key, tuple):
            value = remapper(value, mapped_key[1])
            result[mapped_key[0]] = value
        else:
            result[mapped_key] = value
    return result

How should I go about changing the remapper to also support list items?
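
One possible adaptation, as a sketch: when a rule is a (new_key, nested_rules) tuple, apply the nested rules to each dict inside a list as well as to a single nested dict. RemappingRules is assumed here to be a plain Dict[str, Any], since its definition is not shown in the issue.

```python
from typing import Any, Dict

RemappingRules = Dict[str, Any]  # assumed shape; not taken from the example file


def remapper(d: Dict[str, Any], rules: RemappingRules) -> Dict[str, Any]:
    result = {}
    for key, value in d.items():
        mapped_key = rules.get(key, key)
        if isinstance(mapped_key, tuple):
            new_key, nested_rules = mapped_key
            if isinstance(value, list):
                # Remap every dict item; leave other items untouched.
                value = [
                    remapper(item, nested_rules) if isinstance(item, dict) else item
                    for item in value
                ]
            elif isinstance(value, dict):
                value = remapper(value, nested_rules)
            result[new_key] = value
        else:
            result[mapped_key] = value
    return result


company_rules = {"ID": "id", "NAME": "name"}
user_rules = {"ID": "id", "COMPANIES": ("companies", company_rules)}
print(remapper({"ID": 1, "COMPANIES": [{"ID": 2, "NAME": "x"}]}, user_rules))
```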

String fields neither validated nor cast

I am declaring a class with a str field. I expect that, after parsing, it will contain only str data. The possibilities are: cast any other type to str, or raise a validation error during parsing. Neither of these happens.

For example, this code works, and obj.id will be a list even though it is declared as str:

@dataclass
class MashumaroTodo(DataClassDictMixin):
    id: str


obj = MashumaroTodo.from_dict({"id": ["invalid data"]})
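
Until the library validates such fields itself, one workaround is an explicit check in __post_init__. The sketch below uses a plain dataclass (StrictTodo is a hypothetical name) and assumes that from_dict() builds the instance through the normal constructor, so that __post_init__ runs.

```python
from dataclasses import dataclass


@dataclass
class StrictTodo:
    id: str

    def __post_init__(self) -> None:
        # Reject anything that is not already a str instead of storing it.
        if not isinstance(self.id, str):
            raise TypeError(
                f"id must be str, got {type(self.id).__name__}"
            )


try:
    StrictTodo(id=["invalid data"])
    rejected = False
except TypeError:
    rejected = True

print(rejected)
```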

Python 3.8 support

There is an issue with the InitVar test for me, trying to port it to Python 3.8.

Inconsistent ordering of Optional[Union[...]] types

We have a NodeConfig class that has a 'unique_key' attribute defined: unique_key: Optional[Union[str, List[str]]] = None

I have two identical tests in two different test directories ('test' and 'tests'). One of them is failing because "unique_key": "id" is converted to "unique_key": ["i", "d"], which implies that the List is processed first. The other one is not failing.

When I dump out the compiled code, I see a difference in the order:

< raise InvalidFieldValue('unique_key',typing.Union[str, typing.List[str], None],value,cls)
> raise InvalidFieldValue('unique_key',typing.Union[typing.List[str], str, None],value,cls)

I'm not sure where to look in the code for the order of the Union types.

self-referencing/forward-references dataclasses are not supported

Mashumaro does not currently support self-referencing classes. Code generation fails when it attempts to reflect on the field's type (which is a forward reference).

For example, consider the following case:

import dataclasses
from typing import Optional

@dataclasses.dataclass
class Node:
    value: str
    next: Optional['Node'] = None

@dataclasses.dataclass
class LinkedList:
    head: Optional[Node] = None

a = Node("A")
b = Node("B")
c = Node("C")

a.next = b
b.next = c

linked_list = LinkedList(head=a)

print("list", dataclasses.asdict(linked_list))
print("A", dataclasses.asdict(a))
print("B", dataclasses.asdict(b))
print("C", dataclasses.asdict(c))

This gives us the expected:

list {'head': {'value': 'A', 'next': {'value': 'B', 'next': {'value': 'C', 'next': None}}}}
A {'value': 'A', 'next': {'value': 'B', 'next': {'value': 'C', 'next': None}}}
B {'value': 'B', 'next': {'value': 'C', 'next': None}}
C {'value': 'C', 'next': None}

The equivalent classes using mashumaro:

import dataclasses
from typing import Optional

from mashumaro import DataClassJSONMixin

@dataclasses.dataclass
class Node(DataClassJSONMixin):
    value: str
    next: Optional['Node'] = None

@dataclasses.dataclass
class LinkedList(DataClassJSONMixin):
    head: Optional[Node] = None

a = Node("A")
b = Node("B")
c = Node("C")

a.next = b
b.next = c

linked_list = LinkedList(head=a)

print("list", linked_list.to_dict())
print("A", a.to_dict())
print("B", b.to_dict())
print("C", c.to_dict())

throws an error during Mashumaro's code generation:

Traceback (most recent call last):
  File "mashumaro_test.py", line 7, in <module>
    class Node(DataClassJSONMixin):
  File "mashumaro/serializer/base/dict.py", line 19, in __init_subclass__
    raise exc
  File "mashumaro/serializer/base/dict.py", line 15, in __init_subclass__
    builder.add_to_dict()
  File "mashumaro/serializer/base/metaprogramming.py", line 201, in add_to_dict
    for fname, ftype in self.fields.items():
  File "mashumaro/serializer/base/metaprogramming.py", line 78, in fields
    return self.__get_fields()
  File "mashumaro/serializer/base/metaprogramming.py", line 69, in __get_fields
    for fname, ftype in typing.get_type_hints(self.cls).items():
  File "/usr/lib/python3.9/typing.py", line 1410, in get_type_hints
    value = _eval_type(value, base_globals, localns)
  File "/usr/lib/python3.9/typing.py", line 279, in _eval_type
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
  File "/usr/lib/python3.9/typing.py", line 279, in <genexpr>
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
  File "/usr/lib/python3.9/typing.py", line 277, in _eval_type
    return t._evaluate(globalns, localns, recursive_guard)
  File "/usr/lib/python3.9/typing.py", line 533, in _evaluate
    eval(self.__forward_code__, globalns, localns),
  File "<string>", line 1, in <module>
NameError: name 'Node' is not defined
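
The traceback shows typing.get_type_hints() being called from __init_subclass__, that is, before the name Node has been bound in the module. The failure can be reproduced without mashumaro. In this sketch, EagerHintsMixin, EagerNode and LazyNode are hypothetical names; the second half shows that the same forward reference resolves once the class object exists.

```python
import typing


class EagerHintsMixin:
    """Resolves type hints while the subclass is still being created."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        typing.get_type_hints(cls)  # 'EagerNode' is not bound yet


try:
    class EagerNode(EagerHintsMixin):
        next: typing.Optional["EagerNode"] = None
except NameError as exc:
    eager_error = str(exc)


class LazyNode:
    next: typing.Optional["LazyNode"] = None


# After class creation the forward reference can be resolved explicitly.
hints = typing.get_type_hints(LazyNode, localns={"LazyNode": LazyNode})

print(eager_error)
print(hints["next"] == typing.Optional[LazyNode])
```

Deferring hint resolution until first use, or passing the class itself in the local namespace, are two directions a fix could take.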

generate schema from models?

Is there any way to generate a data dict of the model structure (field + type)? This would make it possible to generate TypeScript definitions, for example, as a kind of poor man's OpenAPI generator.

mypy raises error on writing msgpack to binary file

The line of code that saves msgpack to a binary file causes mypy to raise a typing error. How can I fix the typing here? Save/load works fine, by the way.

@attr.s(auto_attribs=True)
class Phrase(DataClassMessagePackMixin):
    some_data: str
....

with open(base_data_path / "my_data", "wb") as f:
    f.write(phrase.to_msgpack())

Here, mypy raises typing error:
error: Argument 1 to "write" of "IO" has incompatible type "Union[str, bytes, bytearray]"; expected "bytes"

mypy --version

mypy 0.930

to_dict() should take an options object or pass through kwargs

  • I want to interact with outside data types that need camelKeys, but want to use snake_keys for Python objects.
  • I want these rules to be applied recursively, so that whenever we convert a child DataClassDictMixin the behavior carries through.

I could get most of this behavior by overloading to_dict, and intercepting the input or output. The problem is that in some scenarios I want the camelCase behavior and in some scenarios I don't. An example would be JSON serialization where it's enabled, and dict serialization where it's not. If I add a flag to to_dict to control the behavior it won't pass through to nested instances. We could convert the method to take an Options dict, or we could pass through kwargs to recursive invocations.
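
As an illustration of the post-processing approach mentioned above (intercepting the output of to_dict), here is a minimal recursive key converter. to_camel and camelize_keys are hypothetical helper names, and the sketch works on plain dicts and lists only.

```python
from typing import Any


def to_camel(name: str) -> str:
    """Convert snake_case to camelCase, e.g. 'first_name' -> 'firstName'."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def camelize_keys(data: Any) -> Any:
    """Recursively rename str keys in nested dicts and lists."""
    if isinstance(data, dict):
        return {
            (to_camel(key) if isinstance(key, str) else key): camelize_keys(value)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [camelize_keys(item) for item in data]
    return data


print(camelize_keys({"user_name": {"first_name": "a"}, "tags": [{"tag_id": 1}]}))
```

Because the conversion runs on the finished dict, it applies to nested objects without any flag having to be passed through recursive to_dict calls.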

TestCase for CodeBuilder.defaults property

It seems to me that the defaults property of the CodeBuilder is populated from
self.cls.__dict__ (via the namespace property).

If I look at an actual class, I get:
cls.__dataclass_fields__ holds a mapping from field names to dataclass Field instances, each of which has either a default or a default_factory.

Maybe the defaults property should read values from cls.__dataclass_fields__, just as a normally instantiated dataclass does? (Or at least not raise a MissingError, as cls(**kwargs) would fill them in afterwards and only break if default is a NoneType.)

Member variables of forward-referenced class types are ignored during serialization

Member variables of forward-referenced class types are ignored during serialization.

For example, consider the following case:

from __future__ import annotations
from dataclasses import dataclass
from mashumaro import DataClassDictMixin

@dataclass
class A(DataClassDictMixin):
    a:B

@dataclass
class B(DataClassDictMixin):
    b:int

a = A(B(1))
print(a.to_dict())

This gives us the expected:

{'a': {'b': 1}}

On the other hand, running the following case does not give the expected results.

@dataclass
class Base(DataClassDictMixin):
    pass

@dataclass
class A1(Base):
    a:B1

@dataclass
class B1(Base):
    b:int


a = A1(B1(1))
print(a.to_dict())

Result:

{}

At this time, no exception or error message will be generated.

Expected:

{'a': {'b': 1}}

Thanks for the excellent tool!

Serializing any type via cattrs-like structure/unstructure

Hi, thanks for a great library!

A lot of field types are supported, is there a reason they can't be serialized on their own?
It would be nice to be able to do this:

mashumaro.structure({"test": 4}, collections.Counter[str])
mashumaro.unstructure(my_list_of_named_tuple_instances)

Problem with function type_name, exact in the __qualname__ attribute

Hello. Recently, while adapting my project code to the mashumaro library, I encountered a strange bug. Here is the code to reproduce it:

from enum import Enum
from typing import Set
from dataclasses import dataclass
from mashumaro import DataClassJSONMixin

class testing:
    
    @staticmethod
    def pets():
        
        class PetType(Enum):
            CAT = 'CAT'
            MOUSE = 'MOUSE'

        @dataclass(unsafe_hash=True)
        class Pet(DataClassJSONMixin):
            name: str
            age: int
            pet_type: PetType

        @dataclass
        class Person(DataClassJSONMixin):
            first_name: str
            second_name: str
            age: int
            pets: Set[Pet]


        tom = Pet(name='Tom', age=5, pet_type=PetType.CAT)
        jerry = Pet(name='Jerry', age=3, pet_type=PetType.MOUSE)
        john = Person(first_name='John', second_name='Smith', age=18, pets={tom, jerry})

        dump = john.to_json()
        person = Person.from_json(dump)

testing.pets()

I made this code based on an official example. Here is the runtime error itself:

Traceback (most recent call last):
  File "/data/user/0/ru.iiec.pydroid3/files/accomp_files/iiec_run/iiec_run.py", line 31, in <module>
    start(fakepyfile,mainpyfile)
  File "/data/user/0/ru.iiec.pydroid3/files/accomp_files/iiec_run/iiec_run.py", line 30, in start
    exec(open(mainpyfile).read(),  __main__.__dict__)
  File "<string>", line 36, in <module>
  File "<string>", line 16, in pets
  File "/data/user/0/ru.iiec.pydroid3/files/arm-linux-androideabi/lib/python3.8/site-packages/mashumaro/serializer/base/dict.py", line 21, in __init_subclass__
    raise exc
  File "/data/user/0/ru.iiec.pydroid3/files/arm-linux-androideabi/lib/python3.8/site-packages/mashumaro/serializer/base/dict.py", line 13, in __init_subclass__
    builder.add_from_dict()
  File "/data/user/0/ru.iiec.pydroid3/files/arm-linux-androideabi/lib/python3.8/site-packages/mashumaro/serializer/base/metaprogramming.py", line 323, in add_from_dict
    self.compile()
  File "/data/user/0/ru.iiec.pydroid3/files/arm-linux-androideabi/lib/python3.8/site-packages/mashumaro/serializer/base/metaprogramming.py", line 191, in compile
    exec(self.lines.as_text(), globals(), self.__dict__)
  File "<string>", line 43
    kwargs['pet_type'] = value if use_enum else __main__.testing.pets.<locals>.PetType(value)
                                                                      ^
SyntaxError: invalid syntax

[Program finished]

The problem seems to be the __qualname__ attribute, which is in the function type_name, of the file helpers.py. I tried to exclude the closure names (name.<locals>) that gives the qualname attribute, but failed.
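
The syntax error can be reproduced without mashumaro: for a class defined inside a function, __qualname__ contains "<locals>", which is not valid in a Python expression. The sketch also shows one generic way around it, namely binding the type object to a safe name in the namespace used for the generated code. The name PetType_ref is hypothetical, and this is not necessarily how the library handles it.

```python
from enum import Enum


def make_pet_type():
    class PetType(Enum):
        CAT = "CAT"
        MOUSE = "MOUSE"

    return PetType


PetType = make_pet_type()
dotted_name = f"{PetType.__module__}.{PetType.__qualname__}"

# The dotted name includes '<locals>', so it cannot be compiled as code.
try:
    compile(f"{dotted_name}('CAT')", "<generated>", "eval")
    compiles = True
except SyntaxError:
    compiles = False

# Referring to the type through a namespace entry avoids the problem.
namespace = {"PetType_ref": PetType}
restored = eval("PetType_ref('CAT')", namespace)

print(dotted_name, compiles, restored)
```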

How to omit a field on `from_dict`

I am deserializing a dictionary from a MongoDB query, and I want to disregard the _id field on the incoming record. Is there a way to do that?
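
Independently of what the library does with unknown keys, the incoming record can be filtered down to the declared fields before it is passed on. A stdlib-only sketch; Record, keep_known_keys and the sample values are made up for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class Record:
    name: str
    age: int


def keep_known_keys(cls: type, data: Dict[str, Any]) -> Dict[str, Any]:
    """Drop every key that is not a declared dataclass field."""
    known = {f.name for f in fields(cls)}
    return {key: value for key, value in data.items() if key in known}


raw = {"_id": "abc123", "name": "Ada", "age": 36}  # illustrative record
record = Record(**keep_known_keys(Record, raw))
print(record)
```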

Handling defaults in dataclasses

QUESTION

When it comes to default values in dataclasses, is there any documentation how it should be handled properly? Unfortunately I haven't found anything related.

e.g. I have my schema below:

@dataclass
class LogLevel(DataClassYAMLMixin):
    level: str = field(default="INFO")

@dataclass
class Config(DataClassYAMLMixin):
    logging: LogLevel

yaml file:

logging:


There is no log level defined in the YAML, only the 'logging' section, so I expected the dataclass to be created using the defaults, but it is not:

Class: MyServiceClass, Mq(mq=MqSetup(setup=1, setup2='neco')), Config(logging=None)

instead of:

Class: MyServiceClass, Mq(mq=MqSetup(setup=1, setup2='neco')), Config(logging=LogLevel(level='INFO'))

Can this use case be handled by the mashumaro library, please?

cannot install with conda for python 3.9

I'm trying to install mashumaro for python3.9 with conda. I use the following command:

conda create -n test -c conda-forge python=3.9 mashumaro

and I get the following errors:

Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                                                                    

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

Package python conflicts for:
mashumaro -> backports-datetime-fromisoformat -> python[version='2.7.*|3.5.*|3.6.*|>=2.7,<2.8.0a0|>=3.10,<3.11.0a0|>=3.6,<3.7|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0|>=3.7|>=3.9,<3.10.0a0|>=3.5,<3.6.0a0']
mashumaro -> python[version='>=3.6']
python=3.9The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.34=0
  - feature:|@/linux-64::__glibc==2.34=0

Your installed version is: 2.34

This will also fail for python 3.10. However, I could install mashumaro for python3.6, 3.7, 3.8 without this problem, and I could install python 3.9 solely without issue.

Skip default value members on serialization

Hi!

Is there a way to skip variables that are default values from serialization/dict conversion?

Example:

@dataclass
class MyClass(DataClassJSONMixin):
    name: str = None
    soul: str = None


c = MyClass()
c.name='itseme'

print(c.to_json())

output: {"name": "itseme", "soul": null}
desired output : {"name": "itseme"}

thanks!
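
As a sketch of the desired behaviour using only the standard library: compare each field with its declared default and leave it out when they match. The helper name to_dict_skip_defaults is hypothetical, and fields using default_factory are not handled here.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class MyClass:
    name: Optional[str] = None
    soul: Optional[str] = None


def to_dict_skip_defaults(obj: Any) -> Dict[str, Any]:
    """Return only the fields whose value differs from the default."""
    result: Dict[str, Any] = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        if f.default is not MISSING and value == f.default:
            continue
        result[f.name] = value
    return result


c = MyClass()
c.name = "itseme"
print(to_dict_skip_defaults(c))
```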

mashumaro explodes when used as a vendor'd install

I have a project where I am installing mashumaro in a "vendor" subdirectory (via pdistx) for use by my code; because of the environment the code is running in, installing mashumaro globally is not an option, and neither is updating PYTHONPATH or similar. This means that the import line in my code ends up needing to look like: from vendor.mashumaro import DataClassJSONMixin

Setting up the vendor directory (see reproduction steps below) and running the following script:

from dataclasses import dataclass
from vendor.mashumaro import DataClassJSONMixin

@dataclass
class TestClassTwo(DataClassJSONMixin):
    field1: int

@dataclass
class TestClassOne(DataClassJSONMixin):
    field1: TestClassTwo

...results in the following stack trace:

Traceback (most recent call last):
  File "D:\Users\username\Documents\testdir\test2.py", line 10, in <module>
    class TestClassOne(DataClassJSONMixin):
  File "D:\Users\username\Documents\testdir\vendor\mashumaro\serializer\base\dict.py", line 14, in __init_subclass__
    builder.add_from_dict()
  File "D:\Users\username\Documents\testdir\vendor\mashumaro\serializer\base\metaprogramming.py", line 229, in add_from_dict
    self._from_dict_set_value(fname, ftype, metadata, alias)
  File "D:\Users\username\Documents\testdir\vendor\mashumaro\serializer\base\metaprogramming.py", line 250, in _from_dict_set_value
    unpacked_value = self._unpack_field_value(fname=fname, ftype=ftype, parent=self.cls, metadata=metadata)
  File "D:\Users\username\Documents\testdir\vendor\mashumaro\serializer\base\metaprogramming.py", line 800, in _unpack_field_value
    raise UnserializableField(fname, ftype, parent)
vendor.mashumaro.exceptions.UnserializableField: Field "field1" of type TestClassTwo in TestClassOne is not serializable

This appears to happen because of the following hardcoded class name in meta/helpers.py:

DataClassDictMixinPath = "mashumaro.serializer.base.dict.DataClassDictMixin"

...which is no longer correct when the code is run in this way (the correct path would be vendor.mashumaro.serializer.base.dict.DataClassDictMixin).

I'm not sure what the best way to fix this is, but in the extremely short term I'm able to patch the class name for my specific use case. It would be great if I was able to vendor an install without having to edit it afterwards!

Reproduction steps:

  1. Install pdistx if you don't already have it installed.
  2. Create an empty directory for the test.
  3. In that directory, create a requirements.txt file with the following contents:

mashumaro==2.9
msgpack==1.0.3
pyyaml==6.0
typing-extensions==4.0.0

  4. Execute pdistx vendor -r requirements.txt vendor. This will create a 'vendor' subdirectory with mashumaro and its dependencies.
  5. Create a test script using the test script at the beginning of this issue, execute it, and observe the stack trace.

Thanks for the excellent tool!

Inconsistent checks for invalid value type for str field type

If a class based on DataClassDictMixin has a field of type str, it will construct instances from data that contains other types for that field, including numbers, lists, and dicts. However, fields of other types, e.g. int, do not accept incompatible types. I'm not sure if this is intentional and I'm missing something here, but it seems like unexpected and undesirable behaviour when you want the input data to be validated.

The following example only throws an error on the very last line:

from dataclasses import dataclass
from mashumaro import DataClassDictMixin


@dataclass
class StrType(DataClassDictMixin):
    a: str

StrType.from_dict({'a': 1})
StrType.from_dict({'a': [1, 2]})
StrType.from_dict({'a': {'b': 1}})


@dataclass
class IntType(DataClassDictMixin):
    a: int

IntType.from_dict({'a': 'blah'})
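
A plausible explanation, assuming values are converted by calling the field type's constructor (this is a guess, not confirmed from the library's source): str() accepts any object, while int() rejects text that is not a number. The sketch below contrasts that with a strict isinstance check. The helper names coerce and strict are hypothetical.

```python
def coerce(value, target_type):
    # Constructor-style conversion: whatever the constructor accepts passes.
    return target_type(value)


def strict(value, target_type):
    # Validation-style check: reject anything that is not already the type.
    if not isinstance(value, target_type):
        raise TypeError(
            f"expected {target_type.__name__}, got {type(value).__name__}"
        )
    return value


print(repr(coerce(1, str)))
print(repr(coerce([1, 2], str)))

try:
    coerce("blah", int)
except ValueError as error:
    print("int rejects it:", error)

try:
    strict(1, str)
except TypeError as error:
    print("strict check rejects it:", error)
```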

Deserialize json to union of different classes by parameter

Hi!
I'm trying to understand whether it is possible to deserialize a nested dataclass based on some value in the main dataclass.
I'm not sure how to explain it better, so my code will probably say more than I can:

I have JSON like this:

{
    "pointList": [
        {
            "r": {some data for pointType 1},
            "x": 1,
            "y": 1,
            "pointType": 1
        },
        {
            "p": {some data for pointType 4},
            "x": 2,
            "y": 2,
            "pointType": 4
        }
    ]
}

So I defined a dataclass for each pointType I have. Serialization worked like a charm, but how can I deserialize such JSON, choosing the correct dataclass for each point?

class BaseRequest(DataClassJSONMixin):
    class Config(BaseConfig):
        code_generation_options = [TO_DICT_ADD_OMIT_NONE_FLAG]

@dataclass(slots=True)
class MapPoint(BaseRequest):
    x: int
    y: int
    pointType: int

@dataclass(slots=True)
class Point4(MapPoint):
    r: Point4Data = None
    pointType: int = 4

@dataclass(slots=True)
class Point1(MapPoint):
    p: Point1Data = None
    pointType: int = 1

@dataclass(slots=True)
class MapData(BaseRequest):
    pointList: List[Union[Point1,Point4]]

For example, I could build a dict that maps each pointType to a class and pass it to the deserialization function. Or is this impossible and I want too much? :)
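
The dict-based dispatch described above could look roughly like the following. This is a library-free sketch using plain dataclasses, not a mashumaro feature; POINT_CLASSES, point_from_dict and points_from_list are hypothetical names, the payload dicts are made-up sample data, and the p and r fields follow the class definitions above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Type


@dataclass
class MapPoint:
    x: int
    y: int
    pointType: int


@dataclass
class Point1(MapPoint):
    p: Optional[dict] = None


@dataclass
class Point4(MapPoint):
    r: Optional[dict] = None


# Hypothetical registry: discriminator value -> concrete class.
POINT_CLASSES: Dict[int, Type[MapPoint]] = {1: Point1, 4: Point4}


def point_from_dict(data: Dict[str, Any]) -> MapPoint:
    # Pick the class from the discriminator field, then build the instance.
    try:
        cls = POINT_CLASSES[data["pointType"]]
    except KeyError:
        raise ValueError(f"unknown or missing pointType in {data!r}") from None
    return cls(**data)


def points_from_list(items: List[Dict[str, Any]]) -> List[MapPoint]:
    return [point_from_dict(item) for item in items]


points = points_from_list([
    {"p": {"note": "first"}, "x": 1, "y": 1, "pointType": 1},
    {"r": {"note": "second"}, "x": 2, "y": 2, "pointType": 4},
])
print(points)
```

With mashumaro-based classes, cls(**data) would presumably become cls.from_dict(data), so that nested fields are deserialized as well.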

Remapping json output and input

Is remapping of JSON input and output keys supported?

I see dict_param but it does not work as expected.

@dataclass
class User(DataClassJSONMixin):
    id: int
    username: str
    email: str

assert User.from_json(json.dumps({"ID":1})) == User(id=1) # FAILED
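
Other issues on this page use mashumaro's alias field option for this kind of key remapping, which is probably the intended route. As a library-free illustration of the idea only, the sketch below stores an alias in dataclass field metadata and renames keys in both directions. The helper names from_aliased_dict and to_aliased_dict, the "userName" alias, and the default values are assumptions made for the example.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class User:
    id: int = field(metadata={"alias": "ID"})
    username: str = field(default="", metadata={"alias": "userName"})
    email: str = ""  # no alias: the key equals the field name


def from_aliased_dict(cls, data: Dict[str, Any]):
    # Look up each field under its alias, falling back to the field name.
    kwargs = {}
    for f in fields(cls):
        key = f.metadata.get("alias", f.name)
        if key in data:
            kwargs[f.name] = data[key]
    return cls(**kwargs)


def to_aliased_dict(obj) -> Dict[str, Any]:
    # Emit each field under its alias, falling back to the field name.
    return {
        f.metadata.get("alias", f.name): getattr(obj, f.name)
        for f in fields(obj)
    }


user = from_aliased_dict(User, {"ID": 1})
print(user)
print(to_aliased_dict(user))
```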

Please add a Changelog

I couldn't find any information in the Readme about any breaking changes in the 3.0 release and there does not appear to be a changelog. Can you add a CHANGELOG.md file that contains the most important updates as well as a list of breaking changes for every release?

I like the format proposed by https://keepachangelog.com

Validation Fails on List[Optional[int]] types

Description

mashumaro seems to fail on fields of type List[Optional[int]] when the value contains None, for example [2, None].
It appears that mashumaro tries to convert each element to an int and raises when that conversion fails.

Is this possibly a bug?
Thanks for the great library by the way!

Here is a minimal working example:

from __future__ import annotations

from dataclasses import dataclass
from typing import List, Union

from mashumaro import DataClassJSONMixin


@dataclass
class MyModel(DataClassJSONMixin):
    my_list: List[Union[int, None]]


@dataclass
class MyModel2(DataClassJSONMixin):
    my_list: List[Union[int, None, str]]


if __name__ == '__main__':
    # Works fine
    model_1_a = MyModel.from_dict({'my_list': [120, 1]})
    print(model_1_a)
    # prints: MyModel(my_list=[120, 1])

    # prints:
    # Field "my_list" of type typing.List[typing.Union[int, NoneType]] in __main__.MyModel has invalid value [None, 1]
    # The underlying error appears to be:
    # TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
    try:
        model_1_b = MyModel.from_dict({'my_list': [None, 1]})
    except Exception as error:
        print(error)

    model_2_a = MyModel2.from_dict({'my_list': [None, 1]})
    print(model_2_a)
    # prints: MyModel2(my_list=[None, 1])

Temporary solution

A temporary workaround, for those who may also have encountered this, is to define your own deserializer using field(metadata={'deserialize': ...}):

from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Union, Optional

from mashumaro import DataClassJSONMixin


def _deserialize_list_optional_int(items: List[Optional[int]]) -> List[Optional[int]]:
    return [
        int(item) if item is not None else None
        for item in items
    ]


@dataclass
class MyModel(DataClassJSONMixin):
    my_list: List[Union[int, None]] = field(metadata={
        'deserialize': _deserialize_list_optional_int})


if __name__ == '__main__':
    # Works fine
    model_1_a = MyModel.from_dict({'my_list': [120, 1]})
    print(model_1_a)
    # prints: MyModel(my_list=[120, 1])

    model_1_b = MyModel.from_dict({'my_list': [None, 1]})
    print(model_1_b)

Provide options to to_dict and from_dict calls.

We have callouts that we use in our 'from_dict' and 'to_dict' calls that sometimes need to run some transformations and sometimes can't run the transformations. It's difficult to give them information on which case is correct for this particular call without the ability to pass options along to the callouts. One way to do that would be to allow an 'options' dictionary on to_dict and from_dict that is passed to pre/post serialize/deserialize calls.
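
Until something like this exists, one possible workaround is to carry per-call options in a context variable that the hooks read. This is only a sketch under assumptions: Record and its from_dict are simplified stand-ins for a mashumaro class, and serde_options and the uppercase_names option are hypothetical names.

```python
import contextvars
from contextlib import contextmanager

# Per-call options visible to hooks without changing their signatures.
_options = contextvars.ContextVar("serde_options", default={})


@contextmanager
def serde_options(**options):
    token = _options.set(options)
    try:
        yield
    finally:
        # Restore the previous options even if deserialization fails.
        _options.reset(token)


class Record:
    def __init__(self, name: str):
        self.name = name

    @classmethod
    def __pre_deserialize__(cls, d):
        # The hook decides whether to transform based on the current options.
        if _options.get().get("uppercase_names"):
            d = {**d, "name": d["name"].upper()}
        return d

    @classmethod
    def from_dict(cls, d):
        # Stand-in for the generated from_dict: run the hook, then build.
        d = cls.__pre_deserialize__(d)
        return cls(d["name"])


print(Record.from_dict({"name": "bill"}).name)
with serde_options(uppercase_names=True):
    print(Record.from_dict({"name": "bill"}).name)
```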

Nested class serialization breaks with future annotations import

Minimal code to reproduce:

from __future__ import annotations # <-- Brings troubles
from dataclasses import dataclass
from mashumaro import DataClassDictMixin


@dataclass
class Root(DataClassDictMixin):
    @dataclass
    class Nested(DataClassDictMixin):
        x: int

    nested: Nested


cfg = Root.from_dict({"nested": {"x": 1}}) 

print(cfg)

Traceback [python==3.9.5 / mashumaro==2.9.1]:
Traceback (most recent call last):
  File "bug.py", line 15, in <module>
    cfg = Root.from_dict({"nested": {"x": 1}})
  File "<path-to-venv>/lib/python3.9/site-packages/mashumaro/serializer/base/dict.py", line 50, in from_dict
    builder.add_from_dict()
  File "<path-to-venv>/lib/python3.9/site-packages/mashumaro/serializer/base/metaprogramming.py", line 282, in add_from_dict
    for fname, ftype in self.field_types.items():
  File "<path-to-venv>/lib/python3.9/site-packages/mashumaro/serializer/base/metaprogramming.py", line 178, in field_types
    return self.__get_field_types()
  File "<path-to-venv>/lib/python3.9/site-packages/mashumaro/serializer/base/metaprogramming.py", line 152, in __get_field_types
    raise UnresolvedTypeReferenceError(self.cls, name) from None
mashumaro.exceptions.UnresolvedTypeReferenceError: Class Root has unresolved type reference Nested in some of its fields

Code runs fine without from __future__ import annotations line.

Note: with DataClassYAMLMixin you get a rather cryptic error message:

Traceback (most recent call last):
  File "bug-yml.py", line 14, in <module>
    cfg = Root.from_yaml("""
  File "<path-to-venv>/lib/python3.9/site-packages/mashumaro/serializer/yaml.py", line 51, in from_yaml
    return cls.from_dict(
  File "<string>", line 10, in from_dict
TypeError: __init__() missing 1 required positional argument: 'nested'
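
A likely cause, stated as an assumption: with postponed evaluation the annotation is stored as the string "Nested", and resolving that string against module globals alone fails because Nested lives in Root's class namespace. How typing.get_type_hints searches by default differs between Python versions, so this sketch passes both namespaces explicitly. It uses a plain string annotation to mimic what the __future__ import does, and it does not use mashumaro.

```python
import typing
from dataclasses import dataclass


@dataclass
class Root:
    @dataclass
    class Nested:
        x: int

    # A string annotation, which is what PEP 563 turns every annotation into.
    nested: "Nested"


# "Nested" is not a module-level name; it only exists inside Root's namespace,
# so that namespace is passed explicitly as the local lookup scope.
hints = typing.get_type_hints(
    Root, globalns=globals(), localns=dict(vars(Root))
)
print(hints["nested"] is Root.Nested)
```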

Feature Request: Optional Serde By Alias

Hello, I have a feature request that would be very beneficial to us. We are moving models in and out of a document store where attribute names contribute to the size of the document. Because of this, we want to shorten attribute names when saving the document and rehydrate them when reading it back.

But there are places in our application where we don't have access to the deserialized dataclass and instead just have a dict. We want to abstract this so we don't have to know what the aliased fields are when working directly against a dict, and instead would like to have the ability to deserialize the dict based on either the alias or the unaliased keys.

FEATURE REQUEST: MIXED ALIAS DESERIALIZATION

Here is an example of a case where I'd like to build a new user entity from a dict. Then I'd like to deserialize it into my UserEntity. But while building my new user dict I don't want to have to know what the aliased fields are:

@dataclass
class UserEntity(DataClassDictMixin):
  name: str = field(metadata=field_options(alias="n"))
  age: str = field(metadata=field_options(alias="a"))

new_user = { "name": "Bill", "age": 30 }
new_user_ent = UserEntity.from_dict(new_user)

This doesn't work with DataClassDictMixin as far as I'm aware, so we created a helper mixin that translates unaliased fields to aliased fields before deserializing, in the __pre_deserialize__ method:

ALLOW_UNALIASED_DESERIALIZATION = True

@classmethod
def __pre_deserialize__(cls, d: Dict[Any, Any]) -> Dict[Any, Any]:
    if cls.ALLOW_UNALIASED_DESERIALIZATION:
        alias_revmap = cls.get_alias_revmap()
        for field, alias in alias_revmap.items():
            if field in d:
                d[alias] = d[field]
                d.pop(field)
    return d

@classmethod
def get_alias_map(cls) -> Dict[str, str]:
    """
    If aliases are defined, returns a map of the aliased field names to the non-aliased field names
    """
    if not hasattr(cls, "__alias_map__"):
        cls.__alias_map__ = {}
        for key, field in cls.__dataclass_fields__.items():
            alias = field.metadata.get("alias")
            if alias:
                cls.__alias_map__[alias] = key
    return cls.__alias_map__

@classmethod
def get_alias_revmap(cls) -> Dict[str, str]:
    """
    If aliases are defined, returns a map of the non-aliased field names to the aliased field names
    """
    if not hasattr(cls, "__alias_revmap__"):
        alias_map = cls.get_alias_map()
        # Cache under the same attribute name that is checked above.
        cls.__alias_revmap__ = {
            v: k for k, v in alias_map.items()
        }
    return cls.__alias_revmap__

FEATURE REQUEST: OPTIONALLY DISABLE ALIAS SERIALIZATION

When serializing entities, aliases are not used by default. Once again, we have a situation where we'd like to opt out of alias serialization selectively. This can be done with the TO_DICT_ADD_BY_ALIAS_FLAG, but, perhaps as an oversight, you cannot make alias serialization the default and then opt out of it. If it is optional, it's only opt-in.

@dataclass
class MyEntity(DataClassDictMixin):
    a: str = field(metadata=field_options(alias="FieldA"))
    b: str = field(metadata=field_options(alias="FieldB"))

    class Config(BaseConfig):
        serialize_by_alias = True
        code_generation_options = [TO_DICT_ADD_BY_ALIAS_FLAG]


def scratch():
    me = MyEntity(
        a="Hello",
        b="World"
    )
    print(me.to_dict())
    print(me.to_dict(by_alias=False))


if __name__ == "__main__":
    scratch()

In this sample I would expect print(me.to_dict()) to use aliases during serialization, and for print(me.to_dict(by_alias=False)) to opt out of the default. But even though I set the serialize_by_alias flag it will still use the unaliased fields during serialization because I've also added the TO_DICT_ADD_BY_ALIAS_FLAG.

I tried removing the TO_DICT_ADD_BY_ALIAS_FLAG entirely, but when you do that you no longer have access to by_alias in the to_dict() and it will throw this exception:

to_dict() got an unexpected keyword argument 'by_alias'

I tried to get around this by overriding the to_dict method in both my helper mixin and entity class, but that code never gets hit.

    def to_dict(self, **kwargs) -> dict:
        print("HERE")
        return super().to_dict(**kwargs)

This happens whether it is defined as an instance method or a class method.


Thank you for taking the time to read through my feature request.

Broken serialization when using Dict + serialization_strategy

I'm hitting a weird bug. When I use a Dict field type with a serialization_strategy, the serialize method is also passed the entire dict, not individual items of the dict. The result is broken serialization as every value contains all values.

Confusing, right? I'll try to explain with an example.

from dataclasses import dataclass
from decimal import Decimal
from typing import Dict

from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
from mashumaro.types import RoundedDecimal


@dataclass()
class Foo(DataClassDictMixin):
    bar: Dict[str, Decimal]

    class Config(BaseConfig):
        serialization_strategy = {
            Decimal: RoundedDecimal(),
        }


foo = Foo(bar={"a": 1, "b": 2})
print(foo.to_dict())
assert foo.to_dict() == {"bar": {"a": "1", "b": "2"}}

This fails, because foo.to_dict() does not return {"bar": {"a": "1", "b": "2"}} but it returns {'bar': {'a': "{'a': 1, 'b': 2}", 'b': "{'a': 1, 'b': 2}"}}. See it there? Every value in the returned dict contains all values of the dict.

Support Union type

I read in the TODO list that the support of Union types is planned. Is there any timeline?

Example use case:

@dataclass 
class CustomShape(DataClassYAMLMixin):
    name: str
    num_corners: int

@dataclass
class ShapeCollection(DataClassYAMLMixin):
    shapes: List[Union[str, CustomShape]]

# shapes.yaml
shapes:
  - triangle
  - name: square
    num_corners: 4

Expected parsed structure:

with open(file="shapes.yaml", mode='r') as f:
    shapes: ShapeCollection = ShapeCollection.from_yaml(data=f)

print(shapes)
# ShapeCollection(shapes=['triangle', CustomShape(name='square', num_corners=4)])

(btw: congrats on the project, it makes the most of the combination of dataclasses and YAML)

Possible to serialize a top-level list/array?

JSON allows an array at the top level (instead of an object). If we have, e.g., a List[MyDataClass], it would be nice to be able to serialize it directly without a wrapper. Is this possible in mashumaro?

Currently I'm working around this like follows:

json = f"[{','.join([item.to_json() for item in my_list])}]"
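
An alternative to joining JSON strings by hand, sketched under the assumption that each item exposes to_dict: build a list of dicts and let json.dumps produce the top-level array. The Item class below is a stdlib stand-in for a mashumaro dataclass, and dump_list and load_list are hypothetical helper names.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Item:
    name: str

    def to_dict(self) -> dict:
        # Stand-in for the to_dict method a mashumaro mixin would provide.
        return asdict(self)


def dump_list(items: List[Item]) -> str:
    # One json.dumps call handles brackets, commas and escaping.
    return json.dumps([item.to_dict() for item in items])


def load_list(payload: str) -> List[Item]:
    # Reverse direction: parse the array, then build one object per element.
    return [Item(**raw) for raw in json.loads(payload)]


my_list = [Item("a"), Item("b")]
payload = dump_list(my_list)
print(payload)
print(load_list(payload) == my_list)
```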

Mashumaro Doesn't Understand Imports In Imported File

my_enum.py:

class MyEnum(str, Enum):
  a = "A"
  b = "B"

my_base_class.py

from my_enum import MyEnum
class MyBaseClass(DataClassDictMixin):
  my_enum: MyEnum

my_class.py

from my_base_class import MyBaseClass

class MyClass(MyBaseClass):
  hello: str

main.py

from my_class import MyClass
print("hello")

Gives an error like:

name 'MyEnum' is not defined
Traceback (most recent call last):
name 'MyEnum' is not defined

  ...

  File "....", line 5, in <module>
    class MyClass(MyBaseClass):
  File "..../.pyenv/versions/3.7.7/lib/python3.7/abc.py", line 126, in __new__
    cls = super().__new__(mcls, name, bases, namespace, **kwargs)
  File "..../lib/python3.7/site-packages/mashumaro/serializer/base/dict.py", line 23, in __init_subclass__
    raise exc
  File "..../lib/python3.7/site-packages/mashumaro/serializer/base/dict.py", line 19, in __init_subclass__
    builder.add_to_dict()
  File "..../lib/python3.7/site-packages/mashumaro/serializer/base/metaprogramming.py", line 430, in add_to_dict
    for fname, ftype in self.field_types.items():
  File "..../lib/python3.7/site-packages/mashumaro/serializer/base/metaprogramming.py", line 169, in field_types
    return self.__get_field_types()
  File "..../lib/python3.7/site-packages/mashumaro/serializer/base/metaprogramming.py", line 144, in __get_field_types
    for fname, ftype in typing.get_type_hints(self.cls, globalns).items():
  File "..../.pyenv/versions/3.7.7/lib/python3.7/typing.py", line 982, in get_type_hints
    value = _eval_type(value, base_globals, localns)
  File "..../.pyenv/versions/3.7.7/lib/python3.7/typing.py", line 263, in _eval_type
    return t._evaluate(globalns, localns)
  File "..../.pyenv/versions/3.7.7/lib/python3.7/typing.py", line 468, in _evaluate
    eval(self.__forward_code__, globalns, localns),
  File "<string>", line 1, in <module>
  
NameError: name 'MyEnum' is not defined

When stepping through the debugger it looks like MyEnum is not in the globalns or localns when it tries to build a builder for the subclass.

How can I fix this without needing to import all parent class dependencies as imports to the subclasses?

Error when trying pip install

Hello,

I encounter this error when running pip install:

long_description=open('README.md').read(),
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe3 in position 13: illegal multibyte sequence

Please make it possible to install this package on all computers.

Thank you so much!
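
The traceback suggests that setup.py reads README.md with the platform's default codec (cp950 here). A likely fix, assuming the README is UTF-8 encoded, is to pass the encoding explicitly. The helper name read_long_description and the sample text are illustrative only.

```python
import os
import tempfile


def read_long_description(path: str) -> str:
    # Decode as UTF-8 explicitly instead of using the platform default codec.
    with open(path, encoding="utf-8") as readme:
        return readme.read()


# Demonstration with a temporary file containing non-ASCII text.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "README.md")
    with open(sample_path, "wb") as sample:
        sample.write("mashumaro (マシュマロ)".encode("utf-8"))
    sample_text = read_long_description(sample_path)

print(len(sample_text))
```

In setup.py this would correspond to long_description=open('README.md', encoding='utf-8').read().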

PEP-563 breaks SerializationStrategy

  1. SerializationStrategy example with DateTimeFormats works fine in py3.7
    https://github.com/Fatal1ty/mashumaro#user-defined-classes

  2. however introduction of PEP-563 via from __future__ import annotations
    https://www.python.org/dev/peps/pep-0563/
    breaks usage of SerializationStrategy with error trace:

Traceback (most recent call last):
  File ".../src/verify/code/mashu/mashu_pep563.py", line 22, in <module>
    class DateTimeFormats(DataClassDictMixin):
  File "/usr/lib/python3.7/site-packages/mashumaro/serializer/base/dict.py", line 19, in __init_subclass__
    raise exc
  File "/usr/lib/python3.7/site-packages/mashumaro/serializer/base/dict.py", line 15, in __init_subclass__
    builder.add_to_dict()
  File "/usr/lib/python3.7/site-packages/mashumaro/serializer/base/metaprogramming.py", line 163, in add_to_dict
    for fname, ftype in self.fields.items():
  File "/usr/lib/python3.7/site-packages/mashumaro/serializer/base/metaprogramming.py", line 53, in fields
    for fname, ftype in typing.get_type_hints(self.cls).items():
  File "/usr/lib/python3.7/typing.py", line 973, in get_type_hints
    value = _eval_type(value, base_globals, localns)
  File "/usr/lib/python3.7/typing.py", line 260, in _eval_type
    return t._evaluate(globalns, localns)
  File "/usr/lib/python3.7/typing.py", line 466, in _evaluate
    is_argument=self.__forward_is_argument__)
  File "/usr/lib/python3.7/typing.py", line 139, in _type_check
    raise TypeError(f"{msg} Got {arg!r:.100}.")
TypeError: Forward references must evaluate to types. Got <__main__.FormattedDateTime object at 0x7faec3540f60>.
