syrusakbary / aiodataloader
Asyncio DataLoader for Python3
License: MIT License
Any way to use this with Graphene at the application level instead of per request?
I prefer to subclass DataLoader and provide an overridden batch_load_fn on the subclass, rather than passing a function into DataLoader's constructor (which feels a little too Javascript-y).
Please could the same functionality be provided for the get_cache_key function? At the moment, there is no way to override this function on a subclass because of the way the function is defaulted inside the constructor.
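For illustration, a minimal sketch of the hook being requested. `DataLoaderSketch` and `UserByEmailLoader` are hypothetical stand-ins, not the current aiodataloader API: if `get_cache_key` were a plain method rather than a constructor default, subclasses could override it like this:

```python
class DataLoaderSketch:
    # Stand-in for DataLoader, with get_cache_key as an overridable method
    # instead of a default bound inside the constructor.
    def get_cache_key(self, key):
        # Default behavior: use the key itself.
        return key

class UserByEmailLoader(DataLoaderSketch):
    def get_cache_key(self, key):
        # Normalize emails so 'A@x.com' and 'a@x.com' share one cache entry.
        return key.lower()

loader = UserByEmailLoader()
```

This mirrors how `batch_load_fn` can already be provided by subclassing rather than by constructor argument.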
Hi @syrusakbary - since dataloader/aiodataloader are so closely tied with our graphql implementation using graphene, I got to wondering whether you'd consider moving them beneath graphql-python github org. Thoughts?
Scenario: a Python 3.11 GraphQL gateway using Ariadne with lots of nested data.
During development I found a significant performance degradation. I raised this issue in GraphQL core: graphql-python/graphql-core#190. After some more research I found that using gather on CPU-bound tasks causes significant overhead (graphql-python/graphql-core#190 (comment)). For CPU-bound async tasks it is better to await sequentially.
So I monkey-patched gather into a serial await in GraphQL core, but I still had very slow responses. Today I finally dove into this problem again and saw that there was another gather in aiodataloader!
As far as I understand, the goal of the dataloader (when used with its cache) is to perform only a few IO-bound lookups and serve all other loads directly from the cache. This means gather ends up being used on CPU-bound tasks. I monkey-patched the aiodataloader gather into a serial await and my requests went from 3s to 500ms.
I am not sure this is always the case (for example, when not using the cache), but as long as you want caching you really need a serial await. Maybe I am missing something (please let me know), but I would suggest adding a serial await to load_many when the cache is being used.
from asyncio import Future
from collections.abc import Iterable
from importlib import import_module
from typing import Any, Awaitable, List

async def serial_gather(*futures: Awaitable[Any]) -> List[Any]:
    # Await each load in turn instead of scheduling tasks via asyncio.gather;
    # for cache hits this avoids gather's per-task scheduling overhead.
    return [await future for future in futures]

aiodataloader = import_module("aiodataloader")

def load_many(self, keys: Iterable[Any]) -> "Future[List[Any]]":
    """
    Loads multiple keys, returning a list of values

    >>> a, b = await my_loader.load_many(['a', 'b'])

    This is equivalent to the more verbose:

    >>> a, b = await gather(
    >>>     my_loader.load('a'),
    >>>     my_loader.load('b')
    >>> )
    """
    if not isinstance(keys, Iterable):
        raise TypeError(
            "The loader.load_many() function must be called with Iterable<key> "
            "but got: {}.".format(keys)
        )
    return serial_gather(*[self.load(key) for key in keys])

aiodataloader.DataLoader.load_many = load_many
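To illustrate the overhead, here is a minimal, self-contained sketch (no aiodataloader involved) comparing asyncio.gather with a serial await over coroutines that complete immediately, which is roughly what cache hits look like. Absolute timings will vary by machine, so no numbers are claimed here:

```python
import asyncio
import time

async def cheap(x):
    # Simulates a cache hit: no IO, completes immediately.
    return x

async def via_gather(n):
    # gather wraps every awaitable in a Task, adding scheduling overhead.
    return await asyncio.gather(*[cheap(i) for i in range(n)])

async def via_serial(n):
    # A serial await runs each coroutine inline, with no task scheduling.
    return [await cheap(i) for i in range(n)]

async def main():
    n = 10_000
    t0 = time.perf_counter()
    g = await via_gather(n)
    t1 = time.perf_counter()
    s = await via_serial(n)
    t2 = time.perf_counter()
    assert g == s  # identical results either way
    print(f"gather: {t1 - t0:.4f}s  serial: {t2 - t1:.4f}s")

asyncio.run(main())
```

Both variants return the same values; only the scheduling strategy differs.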
File "/app/src/gateway/graphql/project.py", line 118, in <module>
from aiodataloader import DataLoader
File "/venv/lib/python3.10/site-packages/aiodataloader.py", line 2, in <module>
from collections import Iterable, namedtuple
ImportError: cannot import name 'Iterable' from 'collections' (/usr/local/lib/python3.10/collections/__init__.py)
See (for example) Kozea/pygal@7796f14 for a potential fix.
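The fix in that commit follows the common compatibility pattern of importing from `collections.abc` with a fallback (a sketch of the pattern, not the exact aiodataloader patch):

```python
try:
    from collections.abc import Iterable  # correct location since Python 3.3
except ImportError:
    from collections import Iterable  # fallback for very old Pythons
from collections import namedtuple  # namedtuple still lives in collections
```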
Hello,
I'm trying to use Redis backed dict instead of regular dict as a cache for the Dataloader.
But aiodataloader caches the future object directly instead of the actual value, so the Redis-backed dict ends up storing the str representation of the future.
class ABCLoader(DataLoader):
    async def batch_load_fn(self, keys):
        # some processing
        return response

loader = ABCLoader(cache_map=RedisDict(namespace="ABC", expires=600))

# The value which gets stored in Redis is:
#   key   = "ABC:key1"
#   value = "Future:<Future pending>"
Is there any way to cache the actual value instead of the future?
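One possible workaround (a sketch, not a supported aiodataloader feature): wrap the backing store in a dict-like cache_map that keeps the futures in memory but persists each future's resolved value via a done-callback. `ValueCache` below uses a plain dict in place of `RedisDict`; a real Redis-backed version would also need serialization:

```python
import asyncio

class ValueCache(dict):
    """Dict-like cache_map that persists resolved values, not Future reprs."""

    def __init__(self, backing=None):
        super().__init__()
        # `backing` stands in for the Redis-backed dict (hypothetical here).
        self.backing = backing if backing is not None else {}

    def __setitem__(self, key, future):
        # DataLoader hands us a Future; persist the real value once it resolves.
        future.add_done_callback(
            lambda f: self.backing.__setitem__(key, f.result())
        )
        super().__setitem__(key, future)

    def __contains__(self, key):
        return super().__contains__(key) or key in self.backing

    def __getitem__(self, key):
        if super().__contains__(key):
            return super().__getitem__(key)
        # Rehydrate a completed future from the persisted value.
        fut = asyncio.get_event_loop().create_future()
        fut.set_result(self.backing[key])
        super().__setitem__(key, fut)
        return fut

    def get(self, key, default=None):
        # dict.get bypasses the overridden __getitem__, so route it explicitly.
        try:
            return self[key]
        except KeyError:
            return default
```

With this, `ABCLoader(cache_map=ValueCache(backing=RedisDict(...)))` would store plain values in Redis while aiodataloader continues to see futures. Depending on the aiodataloader version, other dict methods may need the same routing as `get`.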
Hi!
This looks great, but I cannot use Python 3.5. Do you know of anything that does something similar for Python 3.4?
Thanks!
aiodataloader is very helpful to use with GraphQL frameworks such as Graphene or Strawberry for building complex query results.
I then started to wonder whether I could use aiodataloader alone (without pulling in a full GraphQL system) to generate such nested data (only a small subset of GraphQL: fetching nested related information) in simple scenarios, like writing a small script.
So I built the pydantic-resolve repo, a pure library which works nicely with aiodataloader to do so.
By defining the loader functions:
import asyncio
from typing import List, Optional
from pydantic import BaseModel
from pydantic_resolve import Resolver, mapper, LoaderDepend
async def friends_batch_load_fn(names):
    mock_db = {
        'tangkikodo': ['tom', 'jerry'],
        'john': ['mike', 'wallace'],
        'trump': ['sam', 'jim'],
        'sally': ['sindy', 'lydia'],
    }
    result = [mock_db.get(name, []) for name in names]
    return result

async def contact_batch_load_fn(names):
    mock_db = {
        'tom': 100, 'jerry': 200, 'mike': 3000, 'wallace': 400, 'sam': 500,
        'jim': 600, 'sindy': 700, 'lydia': 800, 'tangkikodo': 900, 'john': 1000,
        'trump': 1200, 'sally': 1300,
    }
    result = [mock_db.get(name, None) for name in names]
    return result
and the schema:
class Contact(BaseModel):
    number: Optional[int]

class Friend(BaseModel):
    name: str

    contact: Optional[Contact] = None
    @mapper(lambda n: Contact(number=n))
    def resolve_contact(self, loader=LoaderDepend(contact_batch_load_fn)):
        return loader.load(self.name)

class User(BaseModel):
    name: str
    age: int

    greeting: str = ''
    def resolve_greeting(self):
        return f"hello, i'm {self.name}, {self.age} years old."

    contact: Optional[Contact] = None
    @mapper(lambda n: Contact(number=n))
    def resolve_contact(self, loader=LoaderDepend(contact_batch_load_fn)):
        return loader.load(self.name)

    friends: List[Friend] = []
    @mapper(lambda items: [Friend(name=item) for item in items])  # transform after data received
    def resolve_friends(self, loader=LoaderDepend(friends_batch_load_fn)):
        return loader.load(self.name)

class Root(BaseModel):
    users: List[User] = []
    def resolve_users(self):
        return [
            User(name="tangkikodo", age=19),  # transform first
            User(name='john', age=21),
            # User(name='trump', age=59),  # uncomment to resolve more
            # User(name='sally', age=21),
            # User(name='some one', age=0)
        ]

async def main():
    import json
    root = Root()
    root = await Resolver().resolve(root)
    dct = root.dict()
    print(json.dumps(dct, indent=4))

asyncio.run(main())
It can then export the nested data as expected:
{
    "users": [
        {
            "name": "tangkikodo",
            "age": 19,
            "greeting": "hello, i'm tangkikodo, 19 years old.",
            "contact": {
                "number": 900
            },
            "friends": [
                {
                    "name": "tom",
                    "contact": {
                        "number": 100
                    }
                },
                {
                    "name": "jerry",
                    "contact": {
                        "number": 200
                    }
                }
            ]
        },
        {
            "name": "john",
            "age": 21,
            "greeting": "hello, i'm john, 21 years old.",
            "contact": {
                "number": 1000
            },
            "friends": [
                {
                    "name": "mike",
                    "contact": {
                        "number": 3000
                    }
                },
                {
                    "name": "wallace",
                    "contact": {
                        "number": 400
                    }
                }
            ]
        }
    ]
}
pydantic-resolve can also work with dataclasses or plain class instances, and the loader instance lifecycle is isolated within each single Resolver().resolve(data) call.
Hope this simple library can help someone, and thanks again for the great work on aiodataloader!
When I install the library, my GraphQL requests return "There is no current event loop in thread 'Thread-1'." Any idea how to solve this?
Is it possible to move this project to the aio-libs organization?
Of course, I could copy this package into my project and make the needed changes locally, but I guess many people are interested in continued project support and the growth of the aio ecosystem!
I noticed a warning while using the library:
/usr/local/lib/python3.7/site-packages/aiodataloader.py:2: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
which seems to have been fixed in this commit: 13e696e
However, no release has been made with that particular fix. Would it be possible to bump to 0.3.0 with that change?
Hello!
I've used the package, but I cannot use it in my pytest scenarios due to the infamous "got Future <Future pending> attached to a different loop" issue.
Consider the following test code:
import pytest
from aiodataloader import DataLoader

async def load_data(ids):
    return map(lambda x: x + 5, ids)

loader = DataLoader(batch_load_fn=load_data)

@pytest.mark.asyncio
async def test_loader_returns_data(event_loop):
    data = await loader.load(1)
    assert data == 6
When run, I get an error:
RuntimeError: Task <Task pending coro=<test_user_loader_succeedes() running at /app/loaders/tests/test_loader.py:14> cb=[_run_until_complete_cb() at /usr/local/lib/python3.7/asyncio/base_events.py:158]> got Future <Future pending> attached to a different loop
Is there an easy way to fix it?
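A likely cause (and fix) is sketched below: the loader is created at module import time, so the futures aiodataloader creates internally can bind to a different loop than the one pytest-asyncio spins up for the test. Constructing the loader inside the test (or a fixture) keeps everything on one running loop. `LazyLoader` is a hypothetical stand-in for `aiodataloader.DataLoader` so the sketch stays self-contained:

```python
import asyncio

class LazyLoader:
    # Stand-in for DataLoader: only construction-time loop binding matters here.
    def __init__(self, batch_load_fn):
        self.batch_load_fn = batch_load_fn

    async def load(self, key):
        # The real DataLoader queues keys and batches calls; here we delegate
        # directly, since we only want to show where the loader is constructed.
        results = await self.batch_load_fn([key])
        return list(results)[0]

async def load_data(ids):
    return map(lambda x: x + 5, ids)

async def test_loader_returns_data():
    # Created inside the running loop, not at module import time.
    loader = LazyLoader(batch_load_fn=load_data)
    assert await loader.load(1) == 6

asyncio.run(test_loader_returns_data())
```

With the real library, the same idea applies: move `DataLoader(batch_load_fn=load_data)` into the test body or a pytest fixture so its futures belong to the test's event loop.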
You could insert e.g. await asyncio.sleep(0) in the dispatch_queue_batch function. It's async but has blocking behavior if there is a lot of data to process.
DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
I'm currently seeing this warning using version 0.2.0.
I see that this was fixed in commit 13e696e
But this code is not available on PyPI. Could a new version be released?
I just started experimenting with aiodataloader and it's been great so far. Having type annotations (at least for the public DataLoader class) would make it even easier for folks who are leveraging Python's typing system.
From a preliminary look it seems like adding the basics for DataLoader would be pretty straightforward. We would need to accept types for the KeyType and ValueType. I think it would look something like:
from typing import Sequence

from aiodataloader import DataLoader

class UserLoader(DataLoader[int, User]):
    async def batch_load_fn(self, keys: Sequence[int]) -> Sequence[User]:
        return await my_batch_get_users(keys)

user_loader = UserLoader()
I may have some time to try doing this myself. If I do would you be interested in accepting the patch?