
Optimization dry runs (dspy issue, OPEN, 9 comments)

okhat commented on August 15, 2024
Optimization dry runs


Comments (9)

buzypi commented on August 15, 2024

This would be a wonderful feature. I have 3 things in mind (not sure how well this can be implemented, as I don't yet have a complete source-code-level understanding):

  1. A dry run at a module level that can show us the prompts it is planning to submit to the LM: predict(input, dry_run=True). We need a convention for separating the args from these settings. But it would be good if we don't have to change settings at a global level but rather at a module level: predict in dry_run mode, then run it without dry_run if we are happy. This is great in a Notebook/REPL environment.
  2. A stepper/debugger at a teleprompter level: BootstrapFewShot(metric=...).compile(..., step=True), which shows progress after each loop and asks if we want to continue. If we don't intend to continue, we can still work with the optimisations that have been obtained until the current cycle.
  3. An explain function that shows an estimate of the number of calls / tokens: BootstrapFewShot(...).explain(). Again, I am thinking from the perspective of running these in Notebooks where we develop the programs incrementally.


thomasahle commented on August 15, 2024

Is there a way to do this without adding more special features to the Predict class?
It already has a lot of "magical" keyword arguments.

Could it maybe be done using something like:

with dspy.dryrun():
    optimizer.compile(...)

Where the dryrun() is implemented by replacing the dsp.lm with a wrapped language model that does what dry_run does here?
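
To illustrate the idea (not actual dspy code): a minimal sketch of such a context manager, assuming the active LM lives at dsp.settings.lm, can be swapped by plain attribute assignment, and is called with a prompt returning a list of completions:

import contextlib
import dsp  # assumption: the active LM is reachable at dsp.settings.lm

class DryRunLM:
    """Records prompts instead of calling the real LM (illustrative only)."""
    def __init__(self, lm):
        self.lm = lm
        self.captured_prompts = []

    def __call__(self, prompt, **kwargs):
        self.captured_prompts.append(prompt)
        return [""]  # placeholder completion so downstream parsing doesn't crash

@contextlib.contextmanager
def dryrun():
    original_lm = dsp.settings.lm
    wrapper = DryRunLM(original_lm)
    dsp.settings.lm = wrapper          # swap in the recording wrapper (assumes attribute assignment works here)
    try:
        yield wrapper                  # caller can inspect wrapper.captured_prompts afterwards
    finally:
        dsp.settings.lm = original_lm  # always restore the real LM

That would keep the dry-run state scoped to the with block, which also addresses the earlier concern about flipping settings globally.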


okhat commented on August 15, 2024

To be clear, budgets are outside the scope of dry runs, but they're related.


smwitkowski commented on August 15, 2024

I've started to work on some of these - #408

Not going in the same order, but here's what I have so far:

  1. A stepper/debugger at a teleprompter level: BootstrapFewShot(metric=...).compile(..., step=True), which shows progress after each loop and asks if we want to continue. If we don't intend to continue, we can still work with the optimisations that have been obtained until the current cycle.

dspy/teleprompt/bootstrap.py

-   def compile(self, student, *, teacher=None, trainset, valset=None):
+   def compile(self, student, *, teacher=None, trainset, valset=None, step=False):
        self.trainset = trainset
        self.valset = valset
-   def _bootstrap(self, *, max_bootstraps=None):
+   def _bootstrap(self, *, max_bootstraps=None, step=False):
        max_bootstraps = max_bootstraps or self.max_bootstrapped_demos

        bootstrapped = {}
        self.name2traces = {name: [] for name in self.name2predictor}
        for round_idx in range(self.max_rounds):
            for example_idx, example in enumerate(tqdm.tqdm(self.trainset)):
                if len(bootstrapped) >= max_bootstraps:
                    break
                if example_idx not in bootstrapped:
                    success = self._bootstrap_one_example(example, round_idx)

                    if success:
                        bootstrapped[example_idx] = True
+           if step:
+               user_input = input("Continue bootstrapping? (Y/n): ")
+               if user_input.lower() == 'n':
+                   print("Bootstrapping interrupted by user.")
+                   return  # Exit the loop and method

        print(f'Bootstrapped {len(bootstrapped)} full traces after {example_idx+1} examples in round {round_idx}.')

This seems pretty straightforward. Just adding step to .compile and ._bootstrap does the job. I'm nervous about the interaction with Notebooks and running this via the command line though. Requesting user inputs in a notebook environment has been tricky for me in the past.
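
One way around that (just a sketch, not part of #408): let step accept a callable as well as a bool, so notebook users can supply their own continuation logic instead of answering a stdin prompt. A hypothetical helper:

def _should_continue(step, round_idx, bootstrapped):
    """Decide whether to keep bootstrapping after a round (illustrative helper).

    step may be False (never pause), True (prompt on stdin), or a callable
    that receives progress info and returns True/False.
    """
    if not step:
        return True
    if callable(step):
        return step(round_idx=round_idx, n_bootstrapped=len(bootstrapped))
    return input("Continue bootstrapping? (Y/n): ").strip().lower() != "n"

A notebook could then pass something like step=lambda round_idx, n_bootstrapped: n_bootstrapped < 8 and never block on input().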

2. A dry run at a module level that can show us the prompts that it is planning to submit to the LM: predict(input, dry_run=True). We need to have a convention of how we can separate the args from these settings. But it will be good if we don't have to change settings at a global level but rather at a module level, predict in dry_run mode and then run it without dry_run in case we are happy. This is great in a Notebook/REPL environment.

By changing Predict.forward to look for dry_run, we can add a check for whether the user wants to perform a dry run or not:

dspy/predict/predict.py

class Predict(Parameter):
    ...

    def forward(self, **kwargs):
        # Extract the three privileged keyword arguments.
        new_signature = kwargs.pop("new_signature", None)
        signature = kwargs.pop("signature", self.signature)
        demos = kwargs.pop("demos", self.demos)
+       dry_run = kwargs.pop("dry_run", False)

        ...

+       if dry_run:
+           # Prepare a structured output for the dry run
+           dry_run_info = {
+               'prompt': x,  # The prepared prompt
+               'config': config,  # The configuration used for generation
+               'signature': str(signature),  # The signature being used
+               'stage': self.stage,  # The current stage
+           }
+
+           # If an encoder is available, include encoded tokens in the output
+           encoder = dsp.settings.config.get('encoder', None)
+           if encoder is not None:
+               encoded_tokens = encoder.encode(x)
+               dry_run_info['encoded_tokens'] = encoded_tokens
+               dry_run_info['token_count'] = len(encoded_tokens)
+
+           # Option 1: Return the dry run information for further inspection
+           return dry_run_info

If they do want to perform a dry run, we use tiktoken to find the number of tokens, and return that with the prompt.

dsp/utils/utils.py

import tqdm
import datetime
import itertools
+import functools
+import tiktoken

from collections import defaultdict

...

+@functools.lru_cache(maxsize=None)
+def load_encoder_for_lm(lm):
+   """
+   Load and cache the tiktoken encoder for the given LM.
+
+   Args:
+       lm: The model name of the language model.
+   """
+   # Load the encoder. This is a placeholder; adjust based on how the encoder
+   # should be resolved from the configured LM. lru_cache handles the caching.
+   return tiktoken.encoding_for_model(lm)

Do note that this only works for OpenAI models, since we're using tiktoken. To make it more robust, we ought to consider the suite of models that are valid, and then expand load_encoder_for_lm to handle each case. I think that should be done before (3).
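
For instance (a sketch, assuming the LM is identified by a plain model-name string), load_encoder_for_lm could try tiktoken first and fall back to a generic encoding when the model isn't one tiktoken knows about:

import functools
import tiktoken

@functools.lru_cache(maxsize=None)
def load_encoder_for_lm(model_name: str):
    """Return a tokenizer for the given model name, falling back when unknown."""
    try:
        # Works for OpenAI models that tiktoken knows about (e.g. "gpt-3.5-turbo").
        return tiktoken.encoding_for_model(model_name)
    except KeyError:
        # Unknown / non-OpenAI model: fall back to a generic BPE so token counts
        # are at least a rough estimate instead of failing outright.
        return tiktoken.get_encoding("cl100k_base")

A fallback encoding only gives approximate counts for non-OpenAI models, so longer term it probably makes sense to ask each provider for its own tokenizer, which is essentially the suggestion in the next comment.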

Let me know your thoughts. Happy to continue working on this.


KCaverly commented on August 15, 2024

One thought on Token Counting - would it make sense to build this directly into the LM abstraction?

I imagine, while there may be some overlap in tokenization model to model, it may be cleaner to pair the tokenization directly with the provider.
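
Roughly what that could look like, with hypothetical class and method names (this is not existing dspy API):

class LM:
    """Hypothetical base class: each provider pairs its own tokenizer with its model."""

    def count_tokens(self, text: str) -> int:
        raise NotImplementedError

class OpenAILM(LM):
    def __init__(self, model: str):
        import tiktoken
        self.encoder = tiktoken.encoding_for_model(model)

    def count_tokens(self, text: str) -> int:
        return len(self.encoder.encode(text))

class HFLM(LM):
    def __init__(self, model: str):
        from transformers import AutoTokenizer  # assumes transformers is installed
        self.tokenizer = AutoTokenizer.from_pretrained(model)

    def count_tokens(self, text: str) -> int:
        return len(self.tokenizer.encode(text))

Dry-run code could then just call lm.count_tokens(prompt) without caring which provider is behind it.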


buzypi commented on August 15, 2024

> One thought on Token Counting - would it make sense to build this directly into the LM abstraction?
>
> I imagine, while there may be some overlap in tokenization model to model, it may be cleaner to pair the tokenization directly with the provider.

I think this ties into the refactoring work being discussed in #390, and I agree it would be good to think of a generic solution that will work when integrating other open-source models.


okhat commented on August 15, 2024

I love the explorations here. Will look more closely tomorrow most likely BUT:

The main challenge on this issue is possibly unaddressed, which is that a lot of the optimizer logic is complex and data-dependent.

For example, BootstrapFewShot stops once it has labeled enough training examples; it won't try to trace them all unnecessarily, and how many it needs depends on the metric.

It's unclear how to dry-run that behavior…


KCaverly commented on August 15, 2024

If the tokens used are non-deterministic given the optimization, could it provide value if we simply collected a broad sample of the number of optimization calls and used it to estimate a likely range, as opposed to a single estimate?
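
As a sketch of that idea: run the optimizer (or a cheap simulation of it) a few times on shuffled subsets of the trainset, record how many LM calls each run makes, and report a range instead of a point estimate. Purely illustrative:

import statistics

def estimate_call_range(call_counts):
    """Summarize sampled LM-call counts as a likely range (illustrative only).

    call_counts: number of LM calls observed in each sampled optimizer run.
    """
    counts = sorted(call_counts)
    return {
        "low": counts[int(0.1 * (len(counts) - 1))],   # ~10th percentile
        "median": statistics.median(counts),
        "high": counts[int(0.9 * (len(counts) - 1))],  # ~90th percentile
    }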


okhat commented on August 15, 2024

Great idea @KCaverly, yeah this also brings up having a "budget". I imagine saying: "please don't make more than 10,000 requests and don't cost me more than $4 on this run".

