Hi, I'll keep this fairly brief for now, but will try to expand it later.
A common issue I've seen with people using DataLoader is that they either use it as an application-wide cache, or they don't understand how DataLoader works at a lower level and so misunderstand why they're using it.
I feel like this is mainly a documentation issue. DataLoader is a data-loading mechanism; where you load that data from is arbitrary. All DataLoader ensures is that a given key always resolves to the same value.
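To make that guarantee concrete, here's a minimal sketch of the per-key promise memoization (a self-contained stand-in, not the real `dataloader` package, which batches keys as well):

```javascript
// Minimal sketch (not the real dataloader package) of the guarantee
// described above: within one loader instance, a given key always
// resolves to the same value, because the promise is memoized per key.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // async (keys) => values, one per key, in order
    this.cache = new Map(); // key -> Promise<value>
  }
  load(key) {
    if (!this.cache.has(key)) {
      this.cache.set(key, this.batchFn([key]).then((values) => values[0]));
    }
    return this.cache.get(key); // the same promise (and value) every time
  }
}

// Usage: two loads of the same key share one promise and one fetch.
let calls = 0;
const loader = new TinyLoader(async (keys) => {
  calls++;
  return keys.map((k) => `user:${k}`);
});
Promise.all([loader.load(1), loader.load(1)]).then(([a, b]) => {
  console.log(a === b, calls); // true 1
});
```

Note that the "cache" here is just a `Map` living in the instance's own memory, which is exactly why its lifetime matters so much below.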
That is, I've seen people create a DataLoader instance at the application level and assume this is safe because they're calling `userLoader.clear()` after changing a user. The problem is that `userLoader.clear()` isn't broadcast to all the other instances in your cluster.
Say you have a Node.js server using DataLoader and you horizontally scale it. If you retain data between requests, then you can get a case where `server1`'s loader has different local state to `server2`'s, which in turn can differ from the database (e.g., `server3` has written to it). This issue appears as soon as you go from one process to more than one, because the DataLoader "cache" is in-process memory, not some sort of memory shared by all processes. (Using memcached for caching, by contrast, would share that memory between servers/processes.)
I think there's a better way to communicate what DataLoader is doing, and it should be strongly promoted that DataLoader is best suited to per-request caching: all values for a given key resolve to the same value within the one request, so you don't get data drift within that request (a sort of snapshot of the world at that point in time).
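In practice, per-request caching just means constructing the loaders inside the request handler rather than at module level. A hedged sketch (using an inline stand-in loader so the example is self-contained; the real `dataloader` package is scoped the same way, and `fetchUser` is a hypothetical data source):

```javascript
// Hedged sketch of per-request loader scoping. A stand-in loader is
// defined inline; `fetchUser` is a hypothetical data-source function.
const makeLoader = (fetchOne) => {
  const cache = new Map(); // dies with the loader, i.e. with the request
  return {
    load: (key) => {
      if (!cache.has(key)) cache.set(key, fetchOne(key));
      return cache.get(key);
    },
  };
};

async function fetchUser(id) {
  return { id, name: `user-${id}` }; // pretend this hits the database
}

// Construct loaders inside the request handler, never at module level:
async function handleRequest(userId) {
  const userLoader = makeLoader(fetchUser);   // fresh loader per request
  const user = await userLoader.load(userId); // memoized only for this request
  return user;
} // the loader goes out of scope here, so no state leaks across requests
```

Because nothing outlives `handleRequest`, there's no cross-request (let alone cross-server) state to invalidate in the first place.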
If the storage for the "cached" state for DataLoader actually came from a shared datasource (memcached, Redis, etc.), then this issue would be avoided completely; but then you're also bypassing the point of DataLoader in some ways.
I'd be willing to restructure/rewrite the documentation to be clearer, so as to split the "what it does" from the "API for doing it". Then again, I'd be fairly inclined to say `clear` or `reset` are anti-patterns and shouldn't generally be used; i.e., don't use them to clear state between requests, only within the instance of that object in one request, e.g.,
```
1 request start
2 -> userLoader.get(1)
3 -> Users.update(1, { new: "data" })
4 -> userLoader.clear(1);
5 respond with userLoader.get(1)
6 -> request end
```
i.e., that second `userLoader.get` call shouldn't return the memoized result from line 2; instead, it should return the fresh result from the database.
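Runnable version of that flow, again with a self-contained stand-in loader (`db` is a hypothetical stand-in for your actual datastore): clearing key 1 after the write makes line 5's load refetch, so the response reflects the update.

```javascript
// Self-contained sketch of the numbered flow above. `db` stands in for
// the real datastore; makeLoader is a stand-in, not the real package.
const db = new Map([[1, { id: 1, name: 'old' }]]);

const makeLoader = (fetchOne) => {
  const cache = new Map(); // key -> Promise<value>
  return {
    load: (key) => {
      if (!cache.has(key)) cache.set(key, fetchOne(key));
      return cache.get(key);
    },
    clear: (key) => { cache.delete(key); },
  };
};

async function demo() {
  const userLoader = makeLoader(async (id) => ({ ...db.get(id) }));
  const before = await userLoader.load(1);  // line 2: reads { name: 'old' }
  db.set(1, { id: 1, name: 'new' });        // line 3: the update
  userLoader.clear(1);                      // line 4: drop the stale entry
  const after = await userLoader.load(1);   // line 5: fresh read
  console.log(before.name, after.name);     // old new
}
demo();
```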
Perhaps a better API would be `userLoader.get()` and `userLoader.getFresh()`; i.e., we're not talking about a "cache" so much as call-identity memoization.
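A sketch of what that proposed API could look like, where `getFresh()` is just clear-then-get over the memoized state (these names are the hypothetical ones suggested above, not an existing `dataloader` API):

```javascript
// Hedged sketch of the proposed get/getFresh API over a stand-in
// memoizing loader. Names are hypothetical, from the suggestion above.
function withFreshness(fetchOne) {
  const cache = new Map(); // key -> Promise<value>, per-request lifetime
  const get = (key) => {
    if (!cache.has(key)) cache.set(key, fetchOne(key));
    return cache.get(key); // memoized: same promise for the same key
  };
  const getFresh = (key) => {
    cache.delete(key);     // forget the memoized identity for this key
    return get(key);       // and refetch from the underlying source
  };
  return { get, getFresh };
}

// Usage: get() hits the source once per key; getFresh() forces a refetch.
let fetches = 0;
const users = withFreshness(async (id) => ({ id, rev: ++fetches }));
users.get(1).then(() => users.get(1)).then(() => users.getFresh(1))
  .then((u) => console.log(fetches, u.rev)); // 2 2
```

Framing it this way keeps "forget this value" as an explicit, per-call decision, rather than something that looks like cache administration.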
I'd be interested in hearing others' thoughts here. In particular, let me know if this makes any sense at all, and whether you'd be interested in a PR to re-document this project to help users avoid these misunderstandings.