[V ] Alteration (enhancement/optimization) of existing feature(s) <h2

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Allow access to StrawberryField.metadata in a query level. about strawberry HOT 15 OPEN

shmoon-kr commented on July 21, 2024

Allow access to StrawberryField.metadata in a query level.

from strawberry.

Comments (15)

bellini666 commented on July 21, 2024 1

Oh interesting! Does that mean disabling qs._fetch_all() prevents the query from running async?

It prevents you from returning querysets to be resolved at graphql-core level, as iterating over them without them already being cached will trigger a db query T_T

from strawberry.

fireteam99 commented on July 21, 2024 1

I see, thanks for the explanation! Glad there will be a more permanent fix coming in the future

from strawberry.

patrick91 commented on July 21, 2024

@bellini666 how do you handle this use case in django?

from strawberry.

shmoon-kr commented on July 21, 2024

Another idea is to add a boolean flag to the parameters of strawberry_django.fields() that indicates a counter query.

For example, one can define a counter query like the following:

@strawberry.type
class Query:
    fruit_count: int = strawberry_django.field(count=True)

And in a filter apply resolver,

def apply(
    filters: Optional[object],
    queryset: _QS,
    info: Optional[Info] = None,
    pk: Optional[Any] = None,
    count: Bool = False,
) -> _QS:
    if pk not in (None, strawberry.UNSET):  # noqa: PLR6201
        queryset = queryset.filter(pk=pk)

    if filters in (None, strawberry.UNSET) or not has_django_definition(filters):  # noqa: PLR6201
        return queryset.count() if count else queryset

    ...

    return queryset.count() if count else queryset

Then a user can make counter function easily with the default field resolver.

I'm not familiar with the structure of the library but I think you can do easily with the above idea.

from strawberry.

bellini666 commented on July 21, 2024

Is there any way to make the metadata visible in the query?

@shmoon-kr what exactly do you mean by this? I'm not sure if I got it from your example.

from strawberry.

shmoon-kr commented on July 21, 2024

@bellini666

Please refer to the following link:
https://strawberry.rocks/docs/guides/pagination/overview

In the documentation above, there is an example of using metadata to return a total count when implementing the pagination function.

According to the above article, "It may be useful to add metadata to the result. For example, the metadata may specify how many items there are in total, so that the client knows what the greatest offset value can be." and gives the following query result example.

{
  "users": [
    ...
  ]
  "metadata": {
    "count": 25
  }
}

This is exactly what I need, but I don't have an example of how to add metadata to the results in a query resolver and how to include it in the resolver.

I checked the source code and saw that the StrawberryField class has a member variable called metadata, so I assumed I could assign it to this variable, and since the StrawberryDjangoField class inherits from the StrawberryField class, I thought I could do the same.

As a test, I tried to modify the pagination apply resolver defined in strawberry_django to include the metadata, but that didn't make the metadata appear in the query result like above. So, I'm asking how to get the metadata to appear in the query results, like the results of my query.

Thank you for your interest in this case.

from strawberry.

bellini666 commented on July 21, 2024

@shmoon-kr oh, I see what you mean

You can probably do something similar to what I do on strawberry-graphql-django for ListConnectionWithTotalCound: https://github.com/strawberry-graphql/strawberry-graphql-django/blob/main/strawberry_django/relay.py#L51

In there I overridden resolve_connection to store nodes at the object, which is basically the django's QuerySet, and then I use it in the total_count resolver, which could instead return a Metadata object.

Of course you would need to change your return value and probably implement your own resolver for that.

Let me know if that points you to the right direction. Otherwise feel free to share more code in here so I can give you more advices

from strawberry.

fireteam99 commented on July 21, 2024

Hi all, I'm also using strawberry-django and have a similar need for pagination metadata without using Relay. My current approach is a manually resolve my list queries using a generic "wrapper" node.

@strawberry.type
class GenericListNode(Generic[T]):
    page_meta: PageMetaNode
    results: List[T]


def get_generic_list(
    root,
    info: Info,
    model: Type[Model],
    node: Type[T],
    pagination: Optional[OffsetPaginationInput] = strawberry.UNSET,
    filters = None,
    order = None,
) -> GenericListNode[T]:
    """
    Provides pagination metadata with support for Strawberry Django's built in pagination, sorting, and ordering.
    """

    # for some reason trying to set pagination as a default argument does not work
    if not pagination:
        pagination = OffsetPaginationInput(offset=0, limit=30)

    qs: List[T] = model.objects.all()

    get_queryset = getattr(node, "get_queryset", None)
    if callable(get_queryset):
        qs = node.get_queryset(qs, info)

    qs: List[T] = strawberry_django.ordering.apply(order, qs)
    qs: List[T] = strawberry_django.filters.apply(filters, qs)

    total_count = qs.count()

    qs: List[T] = strawberry_django.pagination.apply(pagination, qs)

    return GenericListNode(
        page_meta=PageMetaNode(
            total_count=total_count,
        ),
        results=qs,
    )

# Usage
def get_foos(
    root,
    info: Info,
    pagination: Optional[OffsetPaginationInput] = None,
    filters: Optional[FooFilter] = None,
    order: Optional[FooOrder] = None,
) -> GenericListNode[UserNode]:
    return get_generic_list(root, info, FooModel, Foo, pagination, filter, order)

@strawberry.type(name="Query")
class FooQuery:
    foos: GenericListNode[Foo] = strawberry.field(resolver=get_foos)

This approach isn't perfect, since you still have to write a resolver function for each query and can't take advantage of strawberry-django's automatic list resolver.

Therefore, I've been trying encapsulate this logic into a field extension. Unfortunatly, this results in a performance issues because the next_ function in resolve evaluates the entire queryset (i think due to this line). This makes queries on large tables very slow - even when applying aggressive pagination.

Extension:

class PaginationExtension(FieldExtension):
    def apply(self, field: StrawberryField) -> None:
        type_argument = getattr(field.type, "of_type", None)

        if not isinstance(field.type, StrawberryList) or not type_argument:
            raise TypeError("Invalid type for PaginationExtension")

        # check if a pagination argument is already provided - either as a resolver argument or setting `pagination=True` with `strawberry_django.type`
        pagination_argument_exists = False
        for argument in field.arguments:
            if (
                argument.python_name == "pagination"
                and getattr(argument.type, "of_type", None) == OffsetPaginationInput
            ):
                pagination_argument_exists = True
                break

        if not pagination_argument_exists:
            raise ValueError("Missing pagination argument")

        field.type = GenericListNode[type_argument]

    def resolve(
        self,
        next_: Callable[..., Any],
        source: Any,
        info: Info,
        pagination: Optional[OffsetPaginationInput] = None,
        **kwargs,
    ):
        if not pagination:
            pagination = OffsetPaginationInput(offset=0, limit=30)

        # This is really slow because the default Strawberry Django resolver attempts to fetch and evaluate the entire queryset
        # before applying pagination. Source: https://github.com/strawberry-graphql/strawberry-graphql-django/blob/35a85e7d1b9d3cb30a91b167dbfafaaa22453e75/strawberry_django/resolvers.py#L29-L34
        results = next_(source, info, **kwargs)

        paginated_results = strawberry_django.pagination.apply(pagination, results)

        return GenericListNode(
            page_meta=PageMetaNode(
                total_count=results.count(),
            ),
            results=paginated_results,
        )

# Usage
@strawberry.type(name="Query")
class FooQuery:
    foos: list[Foo] = strawberry_django.field(
        extensions=[PaginationExtension()]
    )

@bellini666 would it be possible to prevent or delay the queryset from being evaluated by next_ in a field extension? I'd also love to get your thoughts on my approach in general.

from strawberry.

bellini666 commented on July 21, 2024

@fireteam99

would it be possible to prevent or delay the queryset from being evaluated by next_ in a field extension?

Yeah, that's the optimizer doing its thing =P

The relay connection also has a similar issue, and we workaround that internally by not doing that fetch you pointed out in here: https://github.com/strawberry-graphql/strawberry-django/blob/a4cc6a77ad221b6eb968685e9dfa54e9f82c5f51/strawberry_django/fields/field.py#L250

I would say that a very "ugly" workaround would be for you to do field.is_connection = True on your apply method. That might make it return the queryset without fetching it, I'm just not sure if it is going to have any other unwanted side effects...

A better fix would be to add a way to define if the field should do the prefetch or not, which by default uses the is_connection as I mentioned, but can also be overriden by an extension? What do you think?

I'd also love to get your thoughts on my approach in general.

Your approach seems fine :)

from strawberry.

fireteam99 commented on July 21, 2024

@bellini666 thanks for the explanation!

The relay connection also has a similar issue, and we workaround that internally by not doing that fetch you pointed out in here: https://github.com/strawberry-graphql/strawberry-django/blob/a4cc6a77ad221b6eb968685e9dfa54e9f82c5f51/strawberry_django/fields/field.py#L250

Ahh i see. Can confirm setting field.is_connection = True does indeed fix the performance issues! I didn't notice any issues with ordering, filter, prefetching, or any unwanted side affects for that matter. From what I understand, the qs._fetch_all() is used by the internal optimizer? I'm curious why I didn't encounter any n+1 issues even after disabling it.

A better fix would be to add a way to define if the field should do the prefetch or not, which by default uses the is_connection as I mentioned, but can also be overriden by an extension? What do you think?

Agreed, this would be a much better solution! Is this something we can override in an extension right now? Or will it require a code change in the library?

from strawberry.

bellini666 commented on July 21, 2024

From what I understand, the qs._fetch_all() is used by the internal optimizer? I'm curious why I didn't encounter any n+1 issues even after disabling it.

Actually the opposite. qs._fetch_all() is what django does internally to cache the results, so after that iterating over the queryset will use the already fetched results instead of querying the db again.

That is done like that because when running async, you can only access the database from the main thread (usually using sync_to_async), otherwise you get something like this:

Agreed, this would be a much better solution! Is this something we can override in an extension right now? Or will it require a code change in the library?

There isn't a way right now, but I'll try to add an api for that during the weekend, or maybe try to remove the _fetch_all() call and solve that in another way. I have an idea that might work...

from strawberry.

fireteam99 commented on July 21, 2024

Actually the opposite. qs._fetch_all() is what django does internally to cache the results, so after that iterating over the queryset will use the already fetched results instead of querying the db again.

That is done like that because when running async, you can only access the database from the main thread (usually using sync_to_async), otherwise you get something like this:

Oh interesting! Does that mean disabling qs._fetch_all() prevents the query from running async?

There isn't a way right now, but I'll try to add an api for that during the weekend, or maybe try to remove the _fetch_all() call and solve that in another way. I have an idea that might work...

Awesome, thank you so much!

from strawberry.

bellini666 commented on July 21, 2024

Hey @fireteam99 , I think this should allow you to solve your issue: strawberry-graphql/strawberry-django#481

When it is merged, you just need to do field.disable_fetch_list_results = True in your extension apply method

Let me know what you think.

There isn't a way right now, but I'll try to add an api for that during the weekend, or maybe try to remove the _fetch_all() call and solve that in another way. I have an idea that might work...

Also, regarding this... I couldn't find a way right now, but it seems that the next graphql-core version, once it is released, will allow us to remove that workaround once and for all :)

from strawberry.

fireteam99 commented on July 21, 2024

Hi @bellini666, thanks for adding this feature! This is a lot cleaner than the connection workaround.

Also, regarding this... I couldn't find a way right now, but it seems that the next graphql-core version, once it is released, will allow us to remove that workaround once and for all :)

Gotcha gotcha, so this shouldn't be an issue as long as I'm not using async?

from strawberry.

bellini666 commented on July 21, 2024

Gotcha gotcha, so this shouldn't be an issue as long as I'm not using async?

No. Although strawberry-django will do the fetching for both, unless you set the disable_fetch_list_results attribute to True

from strawberry.

Allow access to StrawberryField.metadata in a query level. about strawberry HOT 15 OPEN

Comments (15)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent