syrusakbary / aiodataloader
Asyncio DataLoader for Python3
License: MIT License
Any way to use this with Graphene at the application level instead of per request?
I prefer to subclass DataLoader and provide an overridden batch_load_fn on the subclass, rather than passing a function into DataLoader's constructor (which feels a little too Javascript-y).
Please could the same functionality be provided for the get_cache_key function? At the moment, there is no way to override this function on a subclass because of the way the function is defaulted inside the constructor.
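For illustration, a minimal sketch of the hook being requested. `DataLoaderSketch` and `UserByEmailLoader` are hypothetical stand-ins, not the current aiodataloader API: if `get_cache_key` were a plain method rather than a constructor default, subclasses could override it like this:

```python
class DataLoaderSketch:
    # Stand-in for DataLoader, with get_cache_key as an overridable method
    # instead of a default bound inside the constructor.
    def get_cache_key(self, key):
        # Default behavior: use the key itself.
        return key

class UserByEmailLoader(DataLoaderSketch):
    def get_cache_key(self, key):
        # Normalize emails so 'A@x.com' and 'a@x.com' share one cache entry.
        return key.lower()

loader = UserByEmailLoader()
```

This mirrors how `batch_load_fn` can already be provided by subclassing rather than by constructor argument.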
Hi @syrusakbary - since dataloader/aiodataloader are so closely tied with our graphql implementation using graphene, I got to wondering whether you'd consider moving them beneath graphql-python github org. Thoughts?
Scenario: a Python 3.11 GraphQL gateway using Ariadne with lots of nested data.
During development I found a significant performance degradation. I raised this issue in GraphQL core: graphql-python/graphql-core#190. After some more research I found that using gather on CPU-bound tasks causes significant overhead (graphql-python/graphql-core#190 (comment)). For CPU-bound async tasks it is better to await sequentially.
So I monkey-patched gather into a serial await in GraphQL core, but I still had very slow responses. Today I finally dove into this problem again and saw that there was another gather in aiodataloader!
As far as I understand, the goal of the dataloader (when used with its cache) is to perform only a few IO-bound lookups and serve all other loads directly from the cache. This means gather ends up being used on CPU-bound tasks. I monkey-patched the aiodataloader gather into a serial await and my requests went from 3s to 500ms.
I am not sure this is always the case (for example, when not using the cache), but as long as you want caching you really need a serial await. Maybe I am missing something (please let me know), but I would suggest adding a serial await to load_many when the cache is being used.
from asyncio import Future
from collections.abc import Iterable
from importlib import import_module
from typing import Any, Awaitable, List

async def serial_gather(*futures: Awaitable[Any]) -> List[Any]:
    # Await each load in turn instead of scheduling tasks via asyncio.gather;
    # for cache hits this avoids gather's per-task scheduling overhead.
    return [await future for future in futures]

aiodataloader = import_module("aiodataloader")

def load_many(self, keys: Iterable[Any]) -> "Future[List[Any]]":
    """
    Loads multiple keys, returning a list of values

    >>> a, b = await my_loader.load_many(['a', 'b'])

    This is equivalent to the more verbose:

    >>> a, b = await gather(
    >>>     my_loader.load('a'),
    >>>     my_loader.load('b')
    >>> )
    """
    if not isinstance(keys, Iterable):
        raise TypeError(
            "The loader.load_many() function must be called with Iterable<key> "
            "but got: {}.".format(keys)
        )
    return serial_gather(*[self.load(key) for key in keys])

aiodataloader.DataLoader.load_many = load_many
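To illustrate the overhead, here is a minimal, self-contained sketch (no aiodataloader involved) comparing asyncio.gather with a serial await over coroutines that complete immediately, which is roughly what cache hits look like. Absolute timings will vary by machine, so no numbers are claimed here:

```python
import asyncio
import time

async def cheap(x):
    # Simulates a cache hit: no IO, completes immediately.
    return x

async def via_gather(n):
    # gather wraps every awaitable in a Task, adding scheduling overhead.
    return await asyncio.gather(*[cheap(i) for i in range(n)])

async def via_serial(n):
    # A serial await runs each coroutine inline, with no task scheduling.
    return [await cheap(i) for i in range(n)]

async def main():
    n = 10_000
    t0 = time.perf_counter()
    g = await via_gather(n)
    t1 = time.perf_counter()
    s = await via_serial(n)
    t2 = time.perf_counter()
    assert g == s  # identical results either way
    print(f"gather: {t1 - t0:.4f}s  serial: {t2 - t1:.4f}s")

asyncio.run(main())
```

Both variants return the same values; only the scheduling strategy differs.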
File "/app/src/gateway/graphql/project.py", line 118, in <module>
from aiodataloader import DataLoader
File "/venv/lib/python3.10/site-packages/aiodataloader.py", line 2, in <module>
from collections import Iterable, namedtuple
ImportError: cannot import name 'Iterable' from 'collections' (/usr/local/lib/python3.10/collections/__init__.py)
See (for example) Kozea/pygal@7796f14 for a potential fix.
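The fix in that commit follows the common compatibility pattern of importing from `collections.abc` with a fallback (a sketch of the pattern, not the exact aiodataloader patch):

```python
try:
    from collections.abc import Iterable  # correct location since Python 3.3
except ImportError:
    from collections import Iterable  # fallback for very old Pythons
from collections import namedtuple  # namedtuple still lives in collections
```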
Hello,
I'm trying to use Redis backed dict instead of regular dict as a cache for the Dataloader.
But aiodataloader caches the future object directly instead of the actual value, so the Redis-backed dict ends up storing the str representation of the future.
class ABCLoader(DataLoader):
    async def batch_load_fn(self, keys):
        # some processing
        return response

loader = ABCLoader(cache_map=RedisDict(namespace="ABC", expires=600))

# The value which gets stored in Redis is:
#   key   = "ABC:key1"
#   value = "Future:<Future pending>"
Is there any way to cache the actual value instead of the future?
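One possible workaround (a sketch, not a supported aiodataloader feature): wrap the backing store in a dict-like cache_map that keeps the futures in memory but persists each future's resolved value via a done-callback. `ValueCache` below uses a plain dict in place of `RedisDict`; a real Redis-backed version would also need serialization:

```python
import asyncio

class ValueCache(dict):
    """Dict-like cache_map that persists resolved values, not Future reprs."""

    def __init__(self, backing=None):
        super().__init__()
        # `backing` stands in for the Redis-backed dict (hypothetical here).
        self.backing = backing if backing is not None else {}

    def __setitem__(self, key, future):
        # DataLoader hands us a Future; persist the real value once it resolves.
        future.add_done_callback(
            lambda f: self.backing.__setitem__(key, f.result())
        )
        super().__setitem__(key, future)

    def __contains__(self, key):
        return super().__contains__(key) or key in self.backing

    def __getitem__(self, key):
        if super().__contains__(key):
            return super().__getitem__(key)
        # Rehydrate a completed future from the persisted value.
        fut = asyncio.get_event_loop().create_future()
        fut.set_result(self.backing[key])
        super().__setitem__(key, fut)
        return fut

    def get(self, key, default=None):
        # dict.get bypasses the overridden __getitem__, so route it explicitly.
        try:
            return self[key]
        except KeyError:
            return default
```

With this, `ABCLoader(cache_map=ValueCache(backing=RedisDict(...)))` would store plain values in Redis while aiodataloader continues to see futures. Depending on the aiodataloader version, other dict methods may need the same routing as `get`.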
Hi!
This looks great, but I cannot use Python 3.5. Do you know of anything that does something similar for Python 3.4?
Thanks!
aiodataloader is very helpful to use with GraphQL frameworks such as Graphene or Strawberry for building complex query results.
I then started to wonder whether I could use aiodataloader alone (without pulling in a full GraphQL system) to generate such nested data (only a small subset of GraphQL: fetching nested related information) in simple scenarios, like writing a small script.
So I built the pydantic-resolve repo, a pure library which works nicely with aiodataloader to do so.
By defining the loader functions:
import asyncio
from typing import List, Optional
from pydantic import BaseModel
from pydantic_resolve import Resolver, mapper, LoaderDepend
async def friends_batch_load_fn(names):
    mock_db = {
        'tangkikodo': ['tom', 'jerry'],
        'john': ['mike', 'wallace'],
        'trump': ['sam', 'jim'],
        'sally': ['sindy', 'lydia'],
    }
    result = [mock_db.get(name, []) for name in names]
    return result

async def contact_batch_load_fn(names):
    mock_db = {
        'tom': 100, 'jerry': 200, 'mike': 3000, 'wallace': 400, 'sam': 500,
        'jim': 600, 'sindy': 700, 'lydia': 800, 'tangkikodo': 900, 'john': 1000,
        'trump': 1200, 'sally': 1300,
    }
    result = [mock_db.get(name, None) for name in names]
    return result
and the schema:
class Contact(BaseModel):
    number: Optional[int]

class Friend(BaseModel):
    name: str

    contact: Optional[Contact] = None
    @mapper(lambda n: Contact(number=n))
    def resolve_contact(self, loader=LoaderDepend(contact_batch_load_fn)):
        return loader.load(self.name)

class User(BaseModel):
    name: str
    age: int

    greeting: str = ''
    def resolve_greeting(self):
        return f"hello, i'm {self.name}, {self.age} years old."

    contact: Optional[Contact] = None
    @mapper(lambda n: Contact(number=n))
    def resolve_contact(self, loader=LoaderDepend(contact_batch_load_fn)):
        return loader.load(self.name)

    friends: List[Friend] = []
    @mapper(lambda items: [Friend(name=item) for item in items])  # transform after data received
    def resolve_friends(self, loader=LoaderDepend(friends_batch_load_fn)):
        return loader.load(self.name)

class Root(BaseModel):
    users: List[User] = []
    def resolve_users(self):
        return [
            User(name="tangkikodo", age=19),  # transform first
            User(name='john', age=21),
            # User(name='trump', age=59),  # uncomment to resolve more
            # User(name='sally', age=21),
            # User(name='some one', age=0)
        ]

async def main():
    import json
    root = Root()
    root = await Resolver().resolve(root)
    dct = root.dict()
    print(json.dumps(dct, indent=4))

asyncio.run(main())
It can then export the nested data as expected:
{
    "users": [
        {
            "name": "tangkikodo",
            "age": 19,
            "greeting": "hello, i'm tangkikodo, 19 years old.",
            "contact": {
                "number": 900
            },
            "friends": [
                {
                    "name": "tom",
                    "contact": {
                        "number": 100
                    }
                },
                {
                    "name": "jerry",
                    "contact": {
                        "number": 200
                    }
                }
            ]
        },
        {
            "name": "john",
            "age": 21,
            "greeting": "hello, i'm john, 21 years old.",
            "contact": {
                "number": 1000
            },
            "friends": [
                {
                    "name": "mike",
                    "contact": {
                        "number": 3000
                    }
                },
                {
                    "name": "wallace",
                    "contact": {
                        "number": 400
                    }
                }
            ]
        }
    ]
}
pydantic-resolve can also work with dataclasses or plain class instances, and the loader instance lifecycle is isolated within each single Resolver().resolve(data) call.
Hope this simple library can help someone, and thanks again for the great work on aiodataloader!
When I install the library, my GraphQL requests return "There is no current event loop in thread 'Thread-1'." Any idea how to solve this?
Is it possible to move this project to the aio-libs organization?
Of course, I could copy this package into my project and make the needed changes locally, but I guess many people are interested in continued project support and the growth of the aio ecosystem!
I noticed a warning while using the library:
/usr/local/lib/python3.7/site-packages/aiodataloader.py:2: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
which seems to have been fixed in this commit: 13e696e
However, no release has been made with that particular fix. Would it be possible to bump to 0.3.0 with that change?
Hello!
I've used the package, but I cannot use it in my pytest scenarios due to the infamous "got Future <Future pending> attached to a different loop" issue.
Consider the following test code:
import pytest
from aiodataloader import DataLoader

async def load_data(ids):
    return map(lambda x: x + 5, ids)

loader = DataLoader(batch_load_fn=load_data)

@pytest.mark.asyncio
async def test_loader_returns_data(event_loop):
    data = await loader.load(1)
    assert data == 6
When run, I get an error:
RuntimeError: Task <Task pending coro=<test_user_loader_succeedes() running at /app/loaders/tests/test_loader.py:14> cb=[_run_until_complete_cb() at /usr/local/lib/python3.7/asyncio/base_events.py:158]> got Future <Future pending> attached to a different loop
Is there an easy way to fix it?
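A likely cause (and fix) is sketched below: the loader is created at module import time, so the futures aiodataloader creates internally can bind to a different loop than the one pytest-asyncio spins up for the test. Constructing the loader inside the test (or a fixture) keeps everything on one running loop. `LazyLoader` is a hypothetical stand-in for `aiodataloader.DataLoader` so the sketch stays self-contained:

```python
import asyncio

class LazyLoader:
    # Stand-in for DataLoader: only construction-time loop binding matters here.
    def __init__(self, batch_load_fn):
        self.batch_load_fn = batch_load_fn

    async def load(self, key):
        # The real DataLoader queues keys and batches calls; here we delegate
        # directly, since we only want to show where the loader is constructed.
        results = await self.batch_load_fn([key])
        return list(results)[0]

async def load_data(ids):
    return map(lambda x: x + 5, ids)

async def test_loader_returns_data():
    # Created inside the running loop, not at module import time.
    loader = LazyLoader(batch_load_fn=load_data)
    assert await loader.load(1) == 6

asyncio.run(test_loader_returns_data())
```

With the real library, the same idea applies: move `DataLoader(batch_load_fn=load_data)` into the test body or a pytest fixture so its futures belong to the test's event loop.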
You could insert e.g. await asyncio.sleep(0) in the dispatch_queue_batch function. It's async but has blocking behavior if there is a lot of data to process.
DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
I'm currently seeing this warning using version 0.2.0.
I see that this was fixed in commit 13e696e
But this code is not available on PyPI. Could a new version be released?
I just started experimenting with aiodataloader and it's been great so far. Having type annotations (at least for the public DataLoader class) would make it even easier for folks who are leveraging Python's typing system.
From a preliminary look it seems like adding the basics for DataLoader would be pretty straightforward. We would need to accept types for the KeyType and ValueType. I think it would look something like:
from typing import Sequence

from aiodataloader import DataLoader

class UserLoader(DataLoader[int, User]):
    async def batch_load_fn(self, keys: Sequence[int]) -> Sequence[User]:
        return await my_batch_get_users(keys)

user_loader = UserLoader()
I may have some time to try doing this myself. If I do would you be interested in accepting the patch?