geewee / django-auto-prefetching Goto Github PK
View Code? Open in Web Editor NEWAutomatic prefetching for Django
License: MIT License
Automatic prefetching for Django
License: MIT License
I've seen an issue where if a large nested object is POSTed to the server, the WritableNestedSerializer
returns the instance it created from create()
and then that object is fetched without select_related
or prefetch_related
applied to it.
Using AutoPrefetchViewSetMixin
, the GET requests are all prefetched correctly but responses from POSTs are not. Is this something within scope for this project or is there some other kind of work around?
I am having performance issues with drf. I feel like your library might help, but I can't figure out how to use it, or if it pertains. All the things I try result in no impact. Please see the stackoverflow question.
I'd appreciate any thoughts.
I see djangorestframework-serializer-extensions provides similar functionality with its extensions_auto_optimize = True
(documented here). Are there any relevant differences to your knowledge?
https://github.com/evenicoulddoit/django-rest-framework-serializer-extensions
Does this only work with modelviewsets? or can it work with Generic views
https://www.django-rest-framework.org/api-guide/generic-views/
If you forget to add many=True to a serializer that has the reverse side of the ForeignKey we calculate the wrong prefetch_related fields and we get an error. We can't catch this error early because it's only thrown when the queryset is evaluated
Hi,
I have a problem using your mixin when my viewset is having their own custom get_queryset
method. Its simple does not work... Short example:
class ArtistViewSet(AutoPrefetchViewSetMixin, viewsets.ModelViewSet):
def get_queryset(self):
return Artist.objects.filter(owner=self.request.user)
in this case, your mixin method get_queryset
is not invoked at all...
Is there a way around to make this working?
Thanks
In my serializer I have 6 same FK fields:
field = CustomSerializer(
source="field", read_only=True
)
field = CustomSerializer(
source="field2", read_only=True
)
And django_auto_prefetching.prefetch(queryset, self.get_serializer_class())
generate 2 LEFT JOIN, that cause:
django.db.utils.OperationalError: (1116, 'Too many tables; MariaDB can only use 61 tables in a join')
Can we prevent duplicate JOIN
by default?
Reference Implementation:
class AutoPrefetchViewSetMixin:
auto_prefetch_excluded_fields = set()
auto_prefetch_extra_select_fields = set()
auto_prefetch_extra_prefetch_fields = set()
def get_auto_prefetch_excluded_fields(self):
return self.auto_prefetch_excluded_fields
def get_auto_prefetch_extra_select_fields(self):
return self.auto_prefetch_extra_select_fields
def get_auto_prefetch_extra_prefetch_fields(self):
return self.auto_prefetch_extra_prefetch_fields
def get_queryset(self):
serializer = self.get_serializer()
qs = super().get_queryset()
kwargs = {
"excluded_fields": self.get_auto_prefetch_excluded_fields(),
"extra_select_fiedls": self.get_auto_prefetch_extra_select_fields(),
"extra_prefetch_fields": self.get_auto_prefetch_extra_prefetch_fields(),
}
return prefetch(qs, serializer, **kwargs)
def prefetch(
queryset,
serializer: Type[ModelSerializer],
*,
excluded_fields=set(),
extra_select_fields=set(),
extra_prefetch_fields=set(),
):
select_related, prefetch_related = _prefetch(serializer)
select_related = select_related + extra_select_fields - excluded_fields
prefetch_related = prefetch_related + extra_prefetch_fields - excluded_fields
try:
if select_related:
queryset = queryset.select_related(*select_related)
if prefetch_related:
queryset = queryset.prefetch_related(*prefetch_related)
return queryset
except FieldError as e:
raise ValueError(
f"Calculated wrong field in select_related. Do you have a nested serializer for a ForeignKey where "
f"you've forgotten to specify many=True? Original error: {e}"
)
When using the AutoPrefetchViewSetMixin
errors get thrown for fields which are not on the original serializer_class
(?). We have a GenericViewSet
which has a main model and actions which return different serializers (basically nested info). When using the Mixin, the following error appears:
b'{"message":"Invalid field name(s) given in select_related: \'gateway\', \'unit_id\', \'protocol\', \'type\', \'manufacturer\', \'conf\'. Choices are: address_id, type_id, parent_id, created_by, company_id","status":500,"error":true,"detail":""}'
Where the select_related
fields are the actual fields of the serializer being used and the Choices
are the fields on the main model/ serializer_class for this ViewSet.
I think PrimaryKeyRelatedField
also doesn't need to apply prefetchin.
The code of to_representation in PrimaryKeyRelatedField is:
def to_representation(self, value):
if self.pk_field is not None:
return self.pk_field.to_representation(value.pk)
return value.pk
The pk_field
field must be a simple type (not related, i think) and it is also accessible directly in the main query without applying a join.
I'm having big overloads because the PrimaryKeyRelatedField add a lot of unnecessary joins.
What do you think about it?? I can send a PR.
In case you don't want to add that change, maybe it would be good to make it configurable. By default it would stay as it is to avoid risking other projects that use the library
Thanks!
This looks really cool.
My team has been using and contributing to RestQL for several months and we built the majority of the implementation around auto eager loading/mapping definitions, from the view level.
We definitely considered trying to do auto eager in the way you have but certainly ran into the issue of SerializerMethodField and for that reason we opted not to implement that.
All that said, having the option to have this mixin potentially play well with django-restql may be advantageous for projects that don't have as many complex serializers, don't want to manage the mapping definition for those serializers (it's explicit but needs to be maintained by the team), but still want the option of GraphQL-like data fetching (and other niceties provided by RestQL).
I haven't used this library: it was listed on the most recent Django newsletter (congrats!). But it seems like these two projects have a lot in common (again, at this time I haven't tried out this project or looked at its source), and either combining or testing out integrating them could be very advantageous.
Curious for your thoughts!
Hi, i want to ask you something. Related the issue on #18, does this works in function based view? like, using method_decorator
or something like that?
First of all thanks for this awesome package.
I noticed an issue whereby if your serializer does not make use of any Foreign Key relations in the underlying Django model, and those relations are not nullable, they will ALL be prefetched. This could cause a performance issue for people who want to use a serializer that does not render those relations as they will be prefetched even though they are never going to be serialized.
The root cause appears to be the behaviour specified by the Django docs:
There may be some situations where you wish to call select_related() with a lot of related objects, or where you don’t know all of the relations. In these cases it is possible to call select_related() with no arguments. This will follow all non-null foreign keys it can find - nullable foreign keys must be specified. This is not recommended in most cases as it is likely to make the underlying query more complex, and return more data, than is actually needed.
For example if we had this set of models and serializer:
class MyRelation(models.Model):
pass
class MyModel(models.Model):
textfield: models.TextField()
relation = models.ForeignKey(MyRelation, on_delete=models.CASCADE)
class MySerializer(serializers.ModelSerializer):
class Meta:
model = MyModel
fields = ('textfield',)
The link from MyModel
to MyRelation
is not specified for use in the serializer, but this package will prefetch it anyway. The reason is because in this section of code:
def prefetch(queryset, serializer: Type[ModelSerializer]):
select_related, prefetch_related = _prefetch(serializer)
try:
return queryset.select_related(*select_related).prefetch_related(
*prefetch_related
)
In this scenario the select_related evaluates to an empty set, and as a result, queryset.select_related
thinks you want it to automatically prefetch any non-nullable relationships.
I was going to submit a fix for this like so:
def prefetch(queryset, serializer: Type[ModelSerializer]):
select_related, prefetch_related = _prefetch(serializer)
try:
if select_related:
queryset = queryset.select_related(*select_related)
if prefetch_related:
queryset = queryset.prefetch_related(*prefetch_related)
return queryset
After re-running the tests, two tests fail:
FAIL: test_reverse_foreign_key_lookups (test_project.tests.tests.TestOneToMany)
AssertionError: 2 != 1 : 2 queries executed, 1 expected
Captured queries were:
1. SELECT "test_project_childb"."id", "test_project_childb"."childB_text", "test_project_childb"."parent_id" FROM "test_project_childb"
2. SELECT "test_project_toplevel"."id", "test_project_toplevel"."top_level_text" FROM "test_project_toplevel" WHERE "test_project_toplevel"."id" = 1 LIMIT 21
FAIL: test_it_prefetches_when_using_dotted_property_access (test_project.tests.tests.TestOneToMany)
AssertionError: 2 != 1 : 2 queries executed, 1 expected
Captured queries were:
1. SELECT "test_project_childb"."id", "test_project_childb"."childB_text", "test_project_childb"."parent_id" FROM "test_project_childb"
2. SELECT "test_project_toplevel"."id", "test_project_toplevel"."top_level_text" FROM "test_project_toplevel" WHERE "test_project_toplevel"."id" = 1 LIMIT 21
Both tests are related to the use of the ChildBSerializerWithDottedPropertyAccess
. The failures are not directly caused by my change, but rather, they were previously passing for the wrong reason: the relations were only being prefetched by default after the package calls select_related
with an empty set rather than with the name of the actual relation associated with that dotted property.
As a result I decided not to submit a PR and instead ask whether this was a known issue or intended behaviour. It seems like the dotted property access never actually worked as it was intended and the tests associated with this behaviour were really just passing by accident. But by fixing this the overzealous pre-fetching issue, we expose the issue with these dotted properties.
dotted notation to traverse attributes are not working
user_platforms = UserPlatformSerializer(source='user.user_platforms', read_only=True, many=True)
AttributeError: Cannot find 'user.user_platforms' on Artist object, 'user.user_platforms__platform__logo' is an invalid parameter to prefetch_related()
Exception:
FieldError at /api/web_content/engine_configs/
Invalid field name(s) given in select_related: '*'. Choices are: (none)
select_related is "*" in prefetch:
Workaround: ignore HyperlinkedIdentityField fields:
for name, field_instance in fields:
field_type_name = field_instance.__class__.__name__
logger.debug(
f'{" " * indentation} Field "{name}", type: {field_type_name}, src: "{field_instance.source}"'
)
# We potentially need to recurse deeper
if isinstance(field_instance, (BaseSerializer, RelatedField, ManyRelatedField)) \
and not isinstance(field_instance, HyperlinkedIdentityField):
# ...
Serializer:
class EngineUpdateSerializer(HispaSerializerMixin, serializers.ModelSerializer):
url = serializers.HyperlinkedIdentityField(view_name='web_contents:engineupdate-detail')
config = SimpleEngineConfigSerializer()
first_update = SimpleEngineUpdateSerializer()
parent = SimpleEngineUpdateSerializer()
class Meta:
model = EngineUpdate
Hi, thank you for releasing this amazing work, I found it's really a package with great potential.
I tried to adapt my current project to this package, and I found that we are currently checking the field_instance
as below:
# __init__.py, Line 144
# We potentially need to recurse deeper
if isinstance(
field_instance, (BaseSerializer, RelatedField, ManyRelatedField)
) and ...
IMHO, should the BaseSerializer
be replaced with ModelSerializer
? Because the serializers.Serializer
is also inherited from BaseSerializer
which is not the target we want to check if any select_related
and prefetch_related
fields should be included.
I know we can exclude the fields by specifying them with auto_prefetch_excluded_fields
but I'm just curious about the use cases, for me, the serializers.Serializer
is often used as the request body validator and it's also can be used to generate the API schema with drf-spectacular
(it's also an awesome package that generates the OpenAPI schema automatically btw).
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.