
Comments (7)

okhat commented on July 3, 2024

Thanks @lawliet19189!

Let's start with an MVP of this. This feature is ~urgent, so we can integrate a minimal OK version and then improve it incrementally.

Right now, the evaluation workflow often looks like this:

from dsp.evaluation.utils import evaluate

def my_DSP_program(...):
    return ...

dev = [list of inputs]

evaluate(my_DSP_program, dev)

This evaluate function is essentially a map over the dev examples with some minor metric bookkeeping, and the loop runs entirely sequentially.
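For reference, here's a minimal sketch of what that sequential loop amounts to (the metric handling and function signature here are illustrative, not the actual dsp.evaluation.utils API):

def sequential_evaluate(program, dev, metric=None):
    # Run the program on each example one at a time, optionally tracking a score.
    preds = []
    correct = 0
    for example in dev:
        pred = program(example)
        preds.append(pred)
        if metric is not None and metric(example, pred):
            correct += 1
    if metric is not None:
        print(f"Score: {correct}/{len(dev)}")
    return preds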

Let's start by adding a parallel_evaluate version of that. The current proposed design is to use multi-threading. Because of the GIL, Python threads won't truly run Python code in parallel, but since requests to the LM (and also to the retrieval model) block on I/O while text is being generated (or retrieved), threads can still make execution a whole lot faster.

This is not too difficult in general. The following works well in simple cases and delivers very large speedups in my experience (>10x).

import tqdm
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=20) as executor:
    preds = list(tqdm.tqdm(executor.map(my_DSP_program, dev), total=len(dev)))

This seems to work fine for simple programs, e.g., vanilla invocations of the LM or simple retrieve-then-read.

But it has some problems:

  1. We need to check that the main primitives of DSP are thread-safe. Off the top of my head, the context manager with dsp.settings.context(..) isn't thread safe. It modifies a stack (list) in place. Locking isn't the answer here. Instead, we want a thread-local stack if possible.

  2. It's not obvious how this interacts with the LRU cache design for LM calls, which matters most during the Demonstrate stage. Basically, many programs have a structure where the first thing they do is annotate a given set of examples to use as demonstrations, and these demonstrations are often the same across different inputs to the program.

When you launch 20 threads in parallel, they all try to annotate the very same examples at once. Each of them gets a cache miss (I suppose; it's not even clear how the LRU cache is supposed to behave across threads, though it does appear to be thread-safe), so they all end up recomputing the same demonstrations, at least 20 times over (19 of which are redundant).

What would be nice is if, for instance, all but the first thread that tries to annotate a given example blocked until that example (or rather, that LM call) is fulfilled by the thread that grabbed it first. (A more advanced feature, for later, would be to proceed to annotating a different example while one is blocked, but this isn't necessary.)
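One way to sketch that blocking behavior, independent of DSP's actual cache internals (the deduped_lm_call wrapper and its key argument are illustrative assumptions, not existing APIs): keep a per-key Event so the first thread to request a given LM call computes it, and every later thread waits for that result instead of recomputing it.

import threading

# Hypothetical single-flight wrapper; `lm_call` stands in for the real LM request.
# No error handling: if the owning thread raises, waiters would hang in this sketch.
_results = {}
_inflight = {}
_lock = threading.Lock()

def deduped_lm_call(key, lm_call):
    with _lock:
        if key in _results:              # already computed by some thread
            return _results[key]
        event = _inflight.get(key)
        if event is None:                # we are the first thread for this key
            event = threading.Event()
            _inflight[key] = event
            owner = True
        else:                            # another thread is already working on it
            owner = False

    if owner:
        result = lm_call()
        with _lock:
            _results[key] = result
            del _inflight[key]
        event.set()                      # wake up everyone waiting on this key
        return result

    event.wait()                         # block until the owner publishes the result
    with _lock:
        return _results[key]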


okhat commented on July 3, 2024

@hmoazam Is this interesting to you as a first issue, by any chance? @lawliet19189 and I can help. We hope to have a working initial MVP by, say, Saturday.


okhat commented on July 3, 2024

Thanks Yifan! Excited to hear your plans.

Depending on what subset of DSP you need, you can already do multi-threading safely. For example, most programs never need to use the context manager anyway.

The main place where threading gets tricky is if you want to use dsp.annotate (i.e., create complex demonstrations with LLMs on the fly), dsp.compile (finetune models), or the context manager (mainly used to run different LLMs for different parts of your program). I wouldn't be surprised if you don't need any of these to start.

In general, though, we've been doing some work (mainly @tomjoshi and @hmoazam) on making sure everything is thread-safe and smooth, which may be relevant to you.

The current cache works fine with parallelism as far as I know, but it sometimes gets slow and clunky when there are a lot of threads.


okhat commented on July 3, 2024

Another minor thing for the longer term: it'd be nice if the main thread printed the evaluation metric every few predictions, not just at the end.
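A rough sketch of how the main thread could do that while it collects results (the metric argument and report_every parameter are illustrative, not an existing API):

import tqdm
from concurrent.futures import ThreadPoolExecutor

def parallel_evaluate(program, dev, metric, report_every=50):
    correct = 0
    preds = []
    with ThreadPoolExecutor(max_workers=20) as executor:
        # executor.map preserves input order, so predictions line up with dev examples.
        results = executor.map(program, dev)
        for idx, (example, pred) in enumerate(tqdm.tqdm(zip(dev, results), total=len(dev)), start=1):
            preds.append(pred)
            if metric(example, pred):
                correct += 1
            if idx % report_every == 0:
                print(f"Running score after {idx} examples: {correct / idx:.3f}")
    print(f"Final score: {correct / len(dev):.3f}")
    return preds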


yifanmai commented on July 3, 2024

We need to check that the main primitives of DSP are thread-safe. Off the top of my head, the context manager with dsp.settings.context(..) isn't thread safe. It modifies a stack (list) in place. Locking isn't the answer here. Instead, we want a thread-local stack if possible.

+1

I would suggest using threading.local instead of a singleton.
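For instance, a minimal sketch of a thread-local settings stack, assuming nothing about dsp.settings beyond what's described above (the class and attribute names here are illustrative):

import threading
from contextlib import contextmanager

class ThreadLocalSettings:
    def __init__(self, **defaults):
        self._defaults = defaults
        self._local = threading.local()

    @property
    def _stack(self):
        # Each thread lazily gets its own stack of overrides.
        if not hasattr(self._local, "stack"):
            self._local.stack = []
        return self._local.stack

    def __getattr__(self, name):
        # Look up the most recent override in this thread, falling back to defaults.
        for frame in reversed(self._stack):
            if name in frame:
                return frame[name]
        return self._defaults[name]

    @contextmanager
    def context(self, **overrides):
        # Overrides pushed here are visible only to the current thread.
        self._stack.append(overrides)
        try:
            yield
        finally:
            self._stack.pop()

settings = ThreadLocalSettings(lm="default-lm")
with settings.context(lm="other-lm"):
    ...  # only this thread sees the override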


yifanmai commented on July 3, 2024

For context, I'm interested in figuring out possible integrations of DSP with HELM, specifically:

  1. Using HELM to evaluate DSP templates
  2. Using the Demonstrate part of DSP in HELM's adaptation stage (selection of in-context learning examples)

For both of these use cases, I need to support multi-threaded usage. I'm currently most concerned about correctness, i.e., if I set the model in one thread using the settings context manager, it should not affect other threads.

I'm happy to help out and send a PR for the context manager change.


yifanmai commented on July 3, 2024

Thanks for the pointers! I'll take a look at the branches.

