Comments (7)
Thanks @lawliet19189 !
Let's start with an MVP of this. This feature is ~urgent so we can probably integrate a minimal OK version then improve it incrementally.
Right now, the evaluation workflow often looks like this:
```python
from dsp.evaluation.utils import evaluate

def my_DSP_program(...):
    return ...

dev = [...]  # list of dev inputs
evaluate(my_DSP_program, dev)
```
This `evaluate` function is simply a loop (`map`) with some minor metric bookkeeping. The loop is executed entirely sequentially.
Let's start by adding a `parallel_evaluate` version of that. The current proposed design is to use multi-threading. Python will not truly parallelize compute across threads (the GIL prevents that), but because requests to the LM (and also the retrieval model) block while text is being generated (or retrieved), threads can make execution a whole lot faster.
This is not in general too difficult. The following works well in simple cases, and delivers very large speedups in my experience (>10x).
```python
import tqdm
from concurrent.futures import ThreadPoolExecutor

# Run the program over the dev set with 20 worker threads.
with ThreadPoolExecutor(max_workers=20) as executor:
    preds = list(tqdm.tqdm(executor.map(my_DSP_program, dev), total=len(dev)))
```
This seems to work fine for simple programs, e.g., vanilla invocations of the LM or simple retrieve-then-read.
But it has some problems:

- We need to check that the main primitives of DSP are thread-safe. Off the top of my head, the context manager `with dsp.settings.context(...)` isn't thread-safe: it modifies a stack (a list) in place. Locking isn't the answer here. Instead, we want a thread-local stack if possible.
- It's not obvious how this interacts with the LRU cache design for LM calls, which matters most during the demonstrate stage. Basically, many programs have this structure where the first thing they do is `annotate` a given set of examples to use as demonstrations. These demonstrations are often the same across different inputs to the program.

When you launch 20 threads in parallel, they all try to annotate the very same examples in parallel. Each of them gets a cache miss (I suppose; it's not even clear how the LRU cache is supposed to behave with threads, though it does appear to be thread-safe), so they all recompute the same demonstrations, at least 20 times (19 of them redundant).

What would be nice is if, for instance, all but the first thread that tries to annotate a given example would block until that example (or rather that LM call) is fulfilled by the thread that grabbed it first; see the sketch below. (A more advanced feature, for later, is to proceed to annotate a different example while one is blocked, but this isn't necessary.)
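A minimal sketch of that "first thread wins, others wait" idea. The `dedup_call` helper and `lm_request` argument are hypothetical names for illustration, not DSP's actual cache; a real version would also hand completed results off to the persistent cache rather than keeping them in `_inflight` forever.

```python
import threading
from concurrent.futures import Future

_inflight = {}  # prompt -> Future for the in-flight LM call
_inflight_lock = threading.Lock()

def dedup_call(prompt, lm_request):
    """Run lm_request(prompt) once per unique prompt; later threads block on the same Future."""
    with _inflight_lock:
        fut = _inflight.get(prompt)
        owner = fut is None
        if owner:
            fut = _inflight[prompt] = Future()
    if owner:
        try:
            fut.set_result(lm_request(prompt))
        except Exception as exc:
            fut.set_exception(exc)
    return fut.result()  # non-owners block here until the owner finishes
```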
@hmoazam Is this interesting to you as a first issue, by any chance? @lawliet19189 and I can help. We hope to have a working MVP by, say, Saturday.
Thanks Yifan! Excited to hear your plans.
Depending on what subset of DSP you need, you can already do multi-threading safely. For example, most programs never need to use the context manager anyway.
The main place where threading gets tricky is if you want to use `dsp.annotate` (i.e., create complex demonstrations with LLMs on the fly), `dsp.compile` (finetune models), or the context manager (mainly for using different LLMs in different parts of your program). I wouldn't be surprised if you don't need any of these to start.
In general though, we've been doing some work on making sure everything is safe and smooth (mainly @tomjoshi and @hmoazam) that may be relevant to you.
- This PR has fixes for thread-safety of the context: #67
- This branch has a near-completed effort to switch to a Redis cache that will be more concurrency-friendly: https://github.com/stanfordnlp/dsp/tree/utils/parallel-cache

The current cache works fine with parallelism afaik, but it sometimes gets slow and clunky when there are a lot of threads.
Another minor thing for the longer term: it'd be nice if the evaluation metric were printed by the main thread every few predictions, not just at the end.
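A minimal sketch of what that could look like, assuming a hypothetical `metric(example, pred)` scoring function; `as_completed` lets the main thread report a running score as worker threads finish:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def parallel_evaluate(program, dev, metric, max_workers=20, report_every=10):
    correct = done = 0
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(program, example): example for example in dev}
        for fut in as_completed(futures):  # main thread consumes results as they finish
            correct += metric(futures[fut], fut.result())
            done += 1
            if done % report_every == 0:
                print(f"{done}/{len(dev)} evaluated, running score: {correct / done:.1%}")
    return correct / len(dev)
```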
> We need to check that the main primitives of DSP are thread-safe. Off the top of my head, the context manager `with dsp.settings.context(...)` isn't thread-safe. It modifies a stack (list) in place. Locking isn't the answer here. Instead, we want a thread-local stack if possible.

+1

I would suggest using `threading.local` instead of a singleton.
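A minimal sketch of how a `threading.local`-backed settings stack could look; the class and attribute names here are illustrative, not DSP's actual implementation:

```python
import threading
from contextlib import contextmanager

class ThreadLocalSettings(threading.local):
    """Each thread gets its own override stack, so one thread's
    context manager cannot clobber another thread's settings."""

    def __init__(self):
        # threading.local calls __init__ once per thread that touches the object.
        self.stack = [{}]

    @contextmanager
    def context(self, **overrides):
        self.stack.append({**self.stack[-1], **overrides})
        try:
            yield
        finally:
            self.stack.pop()

    def get(self, key, default=None):
        return self.stack[-1].get(key, default)

settings = ThreadLocalSettings()

# Each thread sees only its own overrides:
with settings.context(lm="some-model"):
    assert settings.get("lm") == "some-model"
assert settings.get("lm") is None
```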
For context, I'm interested in figuring out possible integrations of DSP with HELM, specifically:
- Using HELM to evaluate DSP templates
- Using the Demonstrate part of DSP as in HELM's adaptation stage (selection of in-context learning examples)
For both of these use cases, I need to support multi-threaded usage. I'm currently most concerned about correctness, i.e., if I set the model in one thread using the settings context manager, it should not affect other threads.
I'm happy to help out and send a PR for the context manager change.
Thanks for the pointers! I'll take a look at the branches.