
Optimization dry runs (dspy issue, OPEN, 9 comments)

okhat commented on August 15, 2024
Optimization dry runs


Comments (9)

buzypi commented on August 15, 2024

This would be a wonderful feature. I have 3 things in mind (not sure how well this can be implemented, as I don't yet have a complete source-code-level understanding):

  1. A dry run at a module level that can show us the prompts it is planning to submit to the LM: predict(input, dry_run=True). We need a convention for separating the args from these settings. But it would be good if we don't have to change settings at a global level but rather at a module level: predict in dry_run mode, then run it without dry_run if we are happy. This is great in a Notebook/REPL environment.
  2. A stepper/debugger at a teleprompter level: BootstrapFewShot(metric=...).compile(..., step=True), which shows progress after each loop and asks if we want to continue. If we don't intend to continue, we can still work with the optimisations that have been obtained until the current cycle.
  3. An explain function that shows an estimate of the number of calls / tokens: BootstrapFewShot(...).explain(). Again, I am thinking from the perspective of running these in Notebooks where we develop the programs incrementally.


thomasahle commented on August 15, 2024

Is there a way to do this without adding more special features to the Predict class?
It already has a lot of "magical" keyword arguments.

Could it maybe be done using something like:

with dspy.dryrun():
    optimizer.compile(...)

Where the dryrun() is implemented by replacing the dsp.lm with a wrapped language model that does what dry_run does here?
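
To illustrate the idea (not actual dspy code): a minimal sketch of such a context manager, assuming the active LM lives at dsp.settings.lm, can be swapped by plain attribute assignment, and is called with a prompt returning a list of completions:

import contextlib
import dsp  # assumption: the active LM is reachable at dsp.settings.lm

class DryRunLM:
    """Records prompts instead of calling the real LM (illustrative only)."""
    def __init__(self, lm):
        self.lm = lm
        self.captured_prompts = []

    def __call__(self, prompt, **kwargs):
        self.captured_prompts.append(prompt)
        return [""]  # placeholder completion so downstream parsing doesn't crash

@contextlib.contextmanager
def dryrun():
    original_lm = dsp.settings.lm
    wrapper = DryRunLM(original_lm)
    dsp.settings.lm = wrapper          # swap in the recording wrapper (assumes attribute assignment works here)
    try:
        yield wrapper                  # caller can inspect wrapper.captured_prompts afterwards
    finally:
        dsp.settings.lm = original_lm  # always restore the real LM

That would keep the dry-run state scoped to the with block, which also addresses the earlier concern about flipping settings globally.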


okhat commented on August 15, 2024

To be clear, budgets are outside the scope of dry runs, but they're related.


smwitkowski commented on August 15, 2024

I've started to work on some of these - #408

Not going in the same order, but here's what I have so far:

  1. A stepper/debugger at a teleprompter level: BootstrapFewShot(metric=...).compile(..., step=True), which shows progress after each loop and asks if we want to continue. If we don't intend to continue, we can still work with the optimisations that have been obtained until the current cycle.

dspy/teleprompt/bootstrap.py

-   def compile(self, student, *, teacher=None, trainset, valset=None):
+   def compile(self, student, *, teacher=None, trainset, valset=None, step=False):
        self.trainset = trainset
        self.valset = valset
-   def _bootstrap(self, *, max_bootstraps=None):
+   def _bootstrap(self, *, max_bootstraps=None, step=False):
        max_bootstraps = max_bootstraps or self.max_bootstrapped_demos

        bootstrapped = {}
        self.name2traces = {name: [] for name in self.name2predictor}
        for round_idx in range(self.max_rounds):
            for example_idx, example in enumerate(tqdm.tqdm(self.trainset)):
                if len(bootstrapped) >= max_bootstraps:
                    break
                if example_idx not in bootstrapped:
                    success = self._bootstrap_one_example(example, round_idx)

                    if success:
                        bootstrapped[example_idx] = True
+           if step:
+               user_input = input("Continue bootstrapping? (Y/n): ")
+               if user_input.lower() == 'n':
+                   print("Bootstrapping interrupted by user.")
+                   return  # Exit the loop and method

        print(f'Bootstrapped {len(bootstrapped)} full traces after {example_idx+1} examples in round {round_idx}.')

This seems pretty straightforward. Just adding step to .compile and ._bootstrap does the job. I'm nervous about the interaction with Notebooks and running this via the command line though. Requesting user inputs in a notebook environment has been tricky for me in the past.
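
One way around that (just a sketch, not part of #408): let step accept a callable as well as a bool, so notebook users can supply their own continuation logic instead of answering a stdin prompt. A hypothetical helper:

def _should_continue(step, round_idx, bootstrapped):
    """Decide whether to keep bootstrapping after a round (illustrative helper).

    step may be False (never pause), True (prompt on stdin), or a callable
    that receives progress info and returns True/False.
    """
    if not step:
        return True
    if callable(step):
        return step(round_idx=round_idx, n_bootstrapped=len(bootstrapped))
    return input("Continue bootstrapping? (Y/n): ").strip().lower() != "n"

A notebook could then pass something like step=lambda round_idx, n_bootstrapped: n_bootstrapped < 8 and never block on input().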

2. A dry run at a module level that can show us the prompts that it is planning to submit to the LM: predict(input, dry_run=True). We need to have a convention of how we can separate the args from these settings. But it will be good if we don't have to change settings at a global level but rather at a module level, predict in dry_run mode and then run it without dry_run in case we are happy. This is great in a Notebook/REPL environment.

By changing Predict.forward to look for dry_run, we can add a check for whether the user wants to perform a dry run or not:

dspy/predict/predict.py

class Predict(Parameter):
    ...

    def forward(self, **kwargs):
        # Extract the three privileged keyword arguments.
        new_signature = kwargs.pop("new_signature", None)
        signature = kwargs.pop("signature", self.signature)
        demos = kwargs.pop("demos", self.demos)
+       dry_run = kwargs.pop("dry_run", False)

        ...

+       if dry_run:
+           # Prepare a structured output for the dry run
+           dry_run_info = {
+               'prompt': x,  # The prepared prompt
+               'config': config,  # The configuration used for generation
+               'signature': str(signature),  # The signature being used
+               'stage': self.stage,  # The current stage
+           }
+
+           # If an encoder is available, include encoded tokens in the output
+           encoder = dsp.settings.config.get('encoder', None)
+           if encoder is not None:
+               encoded_tokens = encoder.encode(x)
+               dry_run_info['encoded_tokens'] = encoded_tokens
+               dry_run_info['token_count'] = len(encoded_tokens)
+
+           # Option 1: Return the dry run information for further inspection
+           return dry_run_info

If they do want to perform a dry run, we use tiktoken to find the number of tokens, and return that with the prompt.

dsp/utils/utils.py

import tqdm
import datetime
import itertools
+import functools
+import tiktoken

from collections import defaultdict

...

+@functools.lru_cache(maxsize=None)
+def load_encoder_for_lm(lm):
+   """
+   Load and cache the tiktoken encoder for the given LM.
+
+   Args:
+       lm: The model name of the language model.
+   """
+   # Load the encoder. This is a placeholder; adjust based on how the encoder
+   # should be resolved from the configured LM. lru_cache handles the caching.
+   return tiktoken.encoding_for_model(lm)

Do note that this only works for OpenAI models, since we're using tiktoken. To make it more robust, we ought to consider the suite of models that are valid, and then expand load_encoder_for_lm to handle each case. I think that should be done before (3).
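
For instance (a sketch, assuming the LM is identified by a plain model-name string), load_encoder_for_lm could try tiktoken first and fall back to a generic encoding when the model isn't one tiktoken knows about:

import functools
import tiktoken

@functools.lru_cache(maxsize=None)
def load_encoder_for_lm(model_name: str):
    """Return a tokenizer for the given model name, falling back when unknown."""
    try:
        # Works for OpenAI models that tiktoken knows about (e.g. "gpt-3.5-turbo").
        return tiktoken.encoding_for_model(model_name)
    except KeyError:
        # Unknown / non-OpenAI model: fall back to a generic BPE so token counts
        # are at least a rough estimate instead of failing outright.
        return tiktoken.get_encoding("cl100k_base")

A fallback encoding only gives approximate counts for non-OpenAI models, so longer term it probably makes sense to ask each provider for its own tokenizer, which is essentially the suggestion in the next comment.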

Let me know your thoughts. Happy to continue working on this.


KCaverly commented on August 15, 2024

One thought on Token Counting - would it make sense to build this directly into the LM abstraction?

I imagine, while there may be some overlap in tokenization model to model, it may be cleaner to pair the tokenization directly with the provider.
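
Roughly what that could look like, with hypothetical class and method names (this is not existing dspy API):

class LM:
    """Hypothetical base class: each provider pairs its own tokenizer with its model."""

    def count_tokens(self, text: str) -> int:
        raise NotImplementedError

class OpenAILM(LM):
    def __init__(self, model: str):
        import tiktoken
        self.encoder = tiktoken.encoding_for_model(model)

    def count_tokens(self, text: str) -> int:
        return len(self.encoder.encode(text))

class HFLM(LM):
    def __init__(self, model: str):
        from transformers import AutoTokenizer  # assumes transformers is installed
        self.tokenizer = AutoTokenizer.from_pretrained(model)

    def count_tokens(self, text: str) -> int:
        return len(self.tokenizer.encode(text))

Dry-run code could then just call lm.count_tokens(prompt) without caring which provider is behind it.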


buzypi commented on August 15, 2024

> One thought on Token Counting - would it make sense to build this directly into the LM abstraction?
>
> I imagine, while there may be some overlap in tokenization model to model, it may be cleaner to pair the tokenization directly with the provider.

I think this ties into the refactoring work being discussed in #390, and I agree it would be good to think of a generic solution that will work when integrating other open-source models.


okhat commented on August 15, 2024

I love the explorations here. Will look more closely tomorrow most likely BUT:

The main challenge on this issue is possibly unaddressed, which is that a lot of the optimizer logic is complex and data-dependent.

For example, BootstrapFewShot stops once it has labeled enough training examples; it won't try to trace them all unnecessarily, and how many it needs depends on the metric.

It's unclear how to dry-run that behavior…


KCaverly commented on August 15, 2024

If the tokens used are non-deterministic given the optimization, could it provide value if we simply collected a broad sample of the number of optimization calls and used it to estimate a likely range, as opposed to a single estimate?
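
As a sketch of that idea: run the optimizer (or a cheap simulation of it) a few times on shuffled subsets of the trainset, record how many LM calls each run makes, and report a range instead of a point estimate. Purely illustrative:

import statistics

def estimate_call_range(call_counts):
    """Summarize sampled LM-call counts as a likely range (illustrative only).

    call_counts: number of LM calls observed in each sampled optimizer run.
    """
    counts = sorted(call_counts)
    return {
        "low": counts[int(0.1 * (len(counts) - 1))],   # ~10th percentile
        "median": statistics.median(counts),
        "high": counts[int(0.9 * (len(counts) - 1))],  # ~90th percentile
    }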


okhat commented on August 15, 2024

Great idea @KCaverly, yeah this also brings up having a "budget". I imagine saying: "please don't make more than 10,000 requests and don't cost me more than $4 on this run".

