Comments (7)
Thanks @lawliet19189 !
Let's start with an MVP of this. This feature is ~urgent so we can probably integrate a minimal OK version then improve it incrementally.
Right now, the evaluation workflow often looks like this:
```python
from dsp.evaluation.utils import evaluate

def my_DSP_program(...):
    return ...

dev = [...]  # list of dev inputs
evaluate(my_DSP_program, dev)
```
This `evaluate` function is simply a loop (`map`) with some minor metric bookkeeping. The loop is executed entirely sequentially.
Let's start by adding a `parallel_evaluate` version of that. The current proposed design is to use multi-threading. Python will not truly parallelize compute across threads (the GIL prevents that), but because requests to the LM (and also the retrieval model) block while text is being generated (or retrieved), threads can make execution a whole lot faster.
This is not in general too difficult. The following works well in simple cases, and delivers very large speedups in my experience (>10x).
```python
import tqdm
from concurrent.futures import ThreadPoolExecutor

# Run the program over the dev set with 20 worker threads.
with ThreadPoolExecutor(max_workers=20) as executor:
    preds = list(tqdm.tqdm(executor.map(my_DSP_program, dev), total=len(dev)))
```
This seems to work fine for simple programs, e.g., vanilla invocations of the LM or simple retrieve-then-read.
But it has some problems:

- We need to check that the main primitives of DSP are thread-safe. Off the top of my head, the context manager `with dsp.settings.context(...)` isn't thread-safe: it modifies a stack (a list) in place. Locking isn't the answer here. Instead, we want a thread-local stack if possible.
- It's not obvious how this interacts with the LRU cache design for LM calls, which matters most during the demonstrate stage. Basically, many programs have this structure where the first thing they do is `annotate` a given set of examples to use as demonstrations. These demonstrations are often the same across different inputs to the program.

When you launch 20 threads in parallel, they all try to annotate the very same examples in parallel. Each of them gets a cache miss (I suppose; it's not even clear how the LRU cache is supposed to behave with threads, though it does appear to be thread-safe), so they all recompute the same demonstrations, at least 20 times (19 of them redundant).

What would be nice is if, for instance, all but the first thread that tries to annotate a given example would block until that example (or rather that LM call) is fulfilled by the thread that grabbed it first; see the sketch below. (A more advanced feature, for later, is to proceed to annotate a different example while one is blocked, but this isn't necessary.)
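A minimal sketch of that "first thread wins, others wait" idea. The `dedup_call` helper and `lm_request` argument are hypothetical names for illustration, not DSP's actual cache; a real version would also hand completed results off to the persistent cache rather than keeping them in `_inflight` forever.

```python
import threading
from concurrent.futures import Future

_inflight = {}  # prompt -> Future for the in-flight LM call
_inflight_lock = threading.Lock()

def dedup_call(prompt, lm_request):
    """Run lm_request(prompt) once per unique prompt; later threads block on the same Future."""
    with _inflight_lock:
        fut = _inflight.get(prompt)
        owner = fut is None
        if owner:
            fut = _inflight[prompt] = Future()
    if owner:
        try:
            fut.set_result(lm_request(prompt))
        except Exception as exc:
            fut.set_exception(exc)
    return fut.result()  # non-owners block here until the owner finishes
```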
@hmoazam Is this interesting to you as a first issue, by any chance? @lawliet19189 and I can help. We hope to have a working MVP by, say, Saturday.
Thanks Yifan! Excited to hear your plans.
Depending on what subset of DSP you need, you can already do multi-threading safely. For example, most programs never need to use the context manager anyway.
The main place where threading gets tricky is if you want to use `dsp.annotate` (i.e., create complex demonstrations with LLMs on the fly), `dsp.compile` (finetune models), or the context manager (mainly for using different LLMs in different parts of your program). I wouldn't be surprised if you don't need any of these to start.
In general though, we've been doing some work on making sure everything is safe and smooth (mainly @tomjoshi and @hmoazam) that may be relevant to you.
- This PR has fixes for thread-safety of the context: #67
- This branch has a near-completed effort to switch to a Redis cache that will be more concurrency-friendly: https://github.com/stanfordnlp/dsp/tree/utils/parallel-cache

The current cache works fine with parallelism afaik, but it sometimes gets slow and clunky when there are a lot of threads.
Another minor thing for the longer term: it'd be nice if the evaluation metric were printed by the main thread every few predictions, not just at the end.
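A minimal sketch of what that could look like, assuming a hypothetical `metric(example, pred)` scoring function; `as_completed` lets the main thread report a running score as worker threads finish:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def parallel_evaluate(program, dev, metric, max_workers=20, report_every=10):
    correct = done = 0
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(program, example): example for example in dev}
        for fut in as_completed(futures):  # main thread consumes results as they finish
            correct += metric(futures[fut], fut.result())
            done += 1
            if done % report_every == 0:
                print(f"{done}/{len(dev)} evaluated, running score: {correct / done:.1%}")
    return correct / len(dev)
```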
> We need to check that the main primitives of DSP are thread-safe. Off the top of my head, the context manager `with dsp.settings.context(...)` isn't thread-safe. It modifies a stack (list) in place. Locking isn't the answer here. Instead, we want a thread-local stack if possible.

+1

I would suggest using `threading.local` instead of a singleton.
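A minimal sketch of how a `threading.local`-backed settings stack could look; the class and attribute names here are illustrative, not DSP's actual implementation:

```python
import threading
from contextlib import contextmanager

class ThreadLocalSettings(threading.local):
    """Each thread gets its own override stack, so one thread's
    context manager cannot clobber another thread's settings."""

    def __init__(self):
        # threading.local calls __init__ once per thread that touches the object.
        self.stack = [{}]

    @contextmanager
    def context(self, **overrides):
        self.stack.append({**self.stack[-1], **overrides})
        try:
            yield
        finally:
            self.stack.pop()

    def get(self, key, default=None):
        return self.stack[-1].get(key, default)

settings = ThreadLocalSettings()

# Each thread sees only its own overrides:
with settings.context(lm="some-model"):
    assert settings.get("lm") == "some-model"
assert settings.get("lm") is None
```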
For context, I'm interested in figuring out possible integrations of DSP with HELM, specifically:
- Using HELM to evaluate DSP templates
- Using the Demonstrate part of DSP as in HELM's adaptation stage (selection of in-context learning examples)
For both of these use cases, I need to support multi-threaded usage. I'm currently most concerned about correctness, i.e., if I set the model in one thread using the settings context manager, it should not affect other threads.
I'm happy to help out and send a PR for the context manager change.
Thanks for the pointers! I'll take a look at the branches.