Is it OK that required fields don't work in load()?

[discuss] Validation behavior during deserialization vs. serialization about marshmallow HOT 16 CLOSED

marshmallow-code commented on August 19, 2024

[discuss] Validation behavior during deserialization vs. serialization

from marshmallow.

Comments (16)

sloria commented on August 19, 2024 1

As discussed in the above thread, defaults for deserialization can be defined in make_object or in a preprocessor. For example, if you want the deserialization defaults to be the same as serialization defaults, you could do the following:

from marshmallow import fields, Schema
from marshmallow.validate import Range

class GETSchema(Schema):

    page = fields.Integer(default=1)
    per_page = fields.Integer(default=10, validate=Range(min=10, max=50))
    order_by = fields.Select(['id', 'name', 'priority'], default='priority')
    sort = fields.Select(['asc', 'desc'], default='desc')

    def make_object(self, in_data):
        # Fill every absent key with the field's serialization default.
        for name, field in self.fields.items():
            if name not in in_data:
                in_data[name] = field.default
        return in_data

schema = GETSchema()
schema.load({}).data  # {'order_by': 'priority', 'per_page': 10, 'sort': 'desc', 'page': 1}

In lieu of the missing parameter discussed in previous comments, you can take advantage of the fact that extra kwargs passed to fields are stored in each field's metadata attribute. This would allow you to have different defaults between serialization/deserialization.

from marshmallow import fields, Schema
from marshmallow.validate import Range

class GETSchema(Schema):

    page = fields.Integer(default=1, missing='null')
    per_page = fields.Integer(default=10, missing='null', validate=Range(min=10, max=50))
    order_by = fields.Select(['id', 'name', 'priority'], default='priority', missing='id')
    sort = fields.Select(['asc', 'desc'], default='desc', missing='asc')

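    def make_object(self, in_data):
        # Extra kwargs such as `missing` end up in each field's metadata dict.
        for name, field in self.fields.items():
            # Test for the key itself so falsy values (0, '') are still applied.
            if name not in in_data and 'missing' in field.metadata:
                in_data[name] = field.metadata['missing']
        return in_data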

schema = GETSchema()
schema.load({}).data  # {'order_by': 'id', 'per_page': 'null', 'sort': 'asc', 'page': 'null'}

Yes, I am aware that these workarounds are just that: hacky workarounds. I will open up an issue to reopen discussion of a built-in missing param.

EDIT: Fix first example

sloria commented on August 19, 2024

Yes, I am considering adding the required check to Schema.load. Validation upon serialization made sense when marshmallow only had the ability to serialize (but not deserialize) objects. With the new API in 1.0, validation can occur on input data passed to load. Thank you for the suggestion.

sloria commented on August 19, 2024

OK, so I've implemented the following behavior (now on dev).

Schema#load will raise/store an error if a required field is missing from the input dictionary.

class UserSchema(Schema):
    name = fields.String(required=True)
    email = fields.Email()

in_data = {'email': 'user@example.com'}  # name is missing
data, errors = UserSchema().load(in_data)
errors  # {'name': 'Missing data for required field.'}
data  # OrderedDict([('email', 'user@example.com')])

Note that None is still a valid value that may be passed without failing validation.

Schema#dump will raise/store an error if a required attribute is None on the input object.

u = User(name=None, email='user@example.com')
data, errors = UserSchema().dump(u)
errors  # {'name': 'Missing data for required field.'}
data  # OrderedDict([('name', None), ('email', 'user@example.com')])

Schema#load will use the value of a field's default attribute if an input value is missing and the field is not required.
Schema#dump will use the value of a field's default attribute if the input object's corresponding attribute is None and the field is not required.
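
The load-side rules listed above can be modelled without marshmallow. The following is only a minimal sketch: `FieldSpec`, `load`, and the `_MISSING` sentinel are hypothetical names invented for illustration, not marshmallow's classes or its actual implementation.

```python
from dataclasses import dataclass
from typing import Any

_MISSING = object()  # sentinel: tells "key absent" apart from an explicit None


@dataclass
class FieldSpec:
    """Hypothetical stand-in for a field; not marshmallow's Field class."""
    required: bool = False
    default: Any = _MISSING


def load(specs, in_data):
    """Return (data, errors) following the load rules described above."""
    data, errors = {}, {}
    for name, spec in specs.items():
        value = in_data.get(name, _MISSING)
        if value is not _MISSING:
            data[name] = value  # an explicit None passes through unchanged
        elif spec.required:
            errors[name] = 'Missing data for required field.'
        elif spec.default is not _MISSING:
            data[name] = spec.default  # absent and optional: use the default
    return data, errors


specs = {
    'name': FieldSpec(required=True),
    'email': FieldSpec(),
    'page': FieldSpec(default=1),
}
data, errors = load(specs, {'email': None})
print(data)    # {'email': None, 'page': 1}
print(errors)  # {'name': 'Missing data for required field.'}
```

The sentinel is the design point here: using `None` to mean "absent" would make it impossible to accept `None` as a valid input value.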

I'm not completely sure whether the dump behavior is optimal. Some questions left to answer:

Should dump perform the same validation steps that load does, including checking required fields?
Should the default for deserialization be the same as the default for serialization, or should there be another Field parameter to allow for two different defaults?

Tagging this one for discussion.

sloria commented on August 19, 2024

Renamed the issue to broaden the scope of the discussion.

zerodivisi0n commented on August 19, 2024

In the case of dump, when the fields already conform to the internal model, validation is not required at all.

Two separate defaults are not really needed. I am more inclined to think that default values are only needed in dump and not in load; otherwise the code becomes more complex.

sloria commented on August 19, 2024

Thanks @zerodivisi0n for your input.

What if you don't need to use load but still want to do validation? Take the example of a REST API that allows requesting all objects for a specific resource.

blogs = Blogs.query.all()  # no need to use `load`
data, errors = BlogSchema().dump(blogs, many=True)

You raise a valid point though, that validation should occur before serialization and therefore is probably outside the scope of Schema#dump's responsibility.

sloria commented on August 19, 2024

I've modified the Field class to only apply user-defined validators on deserialization. The default parameter only applies to serialization.
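
To make that split concrete, here is a rough sketch of a field whose default applies only when serializing and whose validator runs only when deserializing. `FieldSketch` is a hypothetical class written for this illustration; it is not the actual marshmallow `Field` code.

```python
class FieldSketch:
    """Hypothetical field: default is dump-only, validator is load-only."""

    def __init__(self, default=None, validate=None):
        self.default = default
        self.validate = validate

    def serialize(self, value):
        # Serialization: substitute the default for None; run no validators.
        return self.default if value is None else value

    def deserialize(self, value):
        # Deserialization: run the validator; never apply the default.
        if self.validate is not None and not self.validate(value):
            raise ValueError('Validation failed for value: {!r}'.format(value))
        return value


per_page = FieldSketch(default=10, validate=lambda v: 10 <= v <= 50)
print(per_page.serialize(None))   # 10
print(per_page.serialize(500))    # 500 (out of range, but dump does not validate)
print(per_page.deserialize(25))   # 25
```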

I think there may be a use case for a second default parameter for missing input values, similar to the missing parameter in colander: http://colander.readthedocs.org/en/latest/null.html.

However, if support for preprocessing functions is added (see #47), there would be no need for the new parameter, since missing values could be inserted by a preprocessor.
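
As an illustration of that idea, a preprocessor could be as small as the function below. This is a sketch under assumptions: `fill_missing` and `MISSING_VALUES` are hypothetical names, and how the function would be registered with a schema is deliberately left out because the hook API is not shown in this thread.

```python
def fill_missing(in_data, missing_values):
    """Return a copy of in_data with absent keys filled from missing_values."""
    filled = dict(missing_values)  # start from the fallback values
    filled.update(in_data)         # keys present in the input always win
    return filled


# Hypothetical per-field values to use when the input omits a key.
MISSING_VALUES = {'page': 1, 'per_page': 10, 'sort': 'asc'}

raw = {'per_page': 25}
prepared = fill_missing(raw, MISSING_VALUES)
print(prepared)  # {'page': 1, 'per_page': 25, 'sort': 'asc'}
```

Returning a new dict rather than mutating the input keeps the raw request data available for logging or error reporting.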

zerodivisi0n commented on August 19, 2024

I'm not sure which solution is better. An additional missing parameter seems like over-engineering to me, but setting the default value for each field in a preprocessor is bad too: when there are many such fields, it requires repeated code for every one of them. So I guess the missing parameter is the better option for these cases.

sloria commented on August 19, 2024

Support for preprocessing functions has been added as of 7f23184. Missing values can be handled by either a pre-processing function or in make_object, so I won't add the extra parameter for now.

Closing this issue since the proper behavior has been implemented.

sloria commented on August 19, 2024

@zerodivisi0n I didn't see your comment before I posted my most recent comment and closed the issue. Reopening for further discussion.

sloria commented on August 19, 2024

I'm going to hold off on adding the missing parameter for now, to avoid the possibility of adding unneeded code. I believe a common use case for deserialization will be to use make_object to construct an app-level object. Defaults can be defined with dict.get.

class BlogSerializer(Schema):
    # fields...
    def make_object(self, in_data):
        # dict.get supplies a fallback when a key is absent from the input.
        return Blog.query.find_one(_id=in_data.get('_id', None),
                                   title=in_data.get('content', ''))

This isn't to say that the parameter won't be added in future releases. A compelling use case may arise that cannot be cleanly addressed using the existing hooks. Until we identify that use case, it will not be included.

zerodivisi0n commented on August 19, 2024

It's OK to use make_object for that purpose.

rastikerdar commented on August 19, 2024

But please consider this simple code:

class GETSchema(Schema):
    page = fields.Integer(default=1)
    per_page = fields.Integer(default=10, validate=Range(min=10, max=50))
    order_by = fields.Select(['id', 'name', 'priority'], default='priority')
    sort = fields.Select(['asc', 'desc'], default='desc')

How do I set all the default values for deserialization (load) without extra code and added complexity?

andrewbaxter commented on August 19, 2024

"In the case of dump, when the fields already conform to the internal model, validation is not required at all."

I get data from a number of sources, mix it with some of my own data, validate and persist. Later, I retrieve the data to do processing. I need to make sure the data's in the expected format, ideally before it leaves this edge system.

Validation would act as a debugging aid, a filter (for bad data), and would notify us if any of the sources changed their data formats before it clogs up the system.

I'd also like to point out that http://marshmallow.readthedocs.org/en/latest/quickstart.html#validation states: "Schema.dump() also validates the format of its fields and returns a dictionary of errors." I took that to indicate that validation does occur when dumping data.

I realize it's probably not easy to re-add validation now that it's been removed for some time, but I'd really appreciate it if this feature were to come back.

If I may present an alternative implementation, having validation as an optional separate step that always operates on Python-native data (before dumping or after loading) would solve my issues.

sloria commented on August 19, 2024

@andrewbaxter Could you use Schema.validate before serializing?
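
The validate-then-serialize flow being suggested can be sketched in plain Python as below. This only illustrates the pattern of validation as a separate step on Python-native data; `validate` and `RULES` are hypothetical names for this example and do not reproduce how Schema.validate is implemented.

```python
import json


def validate(data, rules):
    """Return {field: message} for every failed rule; an empty dict means valid."""
    errors = {}
    for name, (check, message) in rules.items():
        if name not in data:
            errors[name] = 'Missing data for required field.'
        elif not check(data[name]):
            errors[name] = message
    return errors


# Hypothetical rules: each field maps to (predicate, error message).
RULES = {
    'title': (lambda v: isinstance(v, str) and v != '', 'Must be a non-empty string.'),
    'views': (lambda v: isinstance(v, int) and v >= 0, 'Must be a non-negative integer.'),
}

record = {'title': 'Hello', 'views': -3}
errors = validate(record, RULES)
# Serialize only when validation found nothing wrong.
serialized = json.dumps(record) if not errors else None
print(errors)  # {'views': 'Must be a non-negative integer.'}
```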

andrewbaxter commented on August 19, 2024

Yeah, thanks, I'm afraid I hadn't noticed that. #189 still is a problem, but I think that's not by design so I'll go back to that issue. I'll open another issue regarding the docs.
