stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models

Home Page: https://dspy-docs.vercel.app/

License: MIT License

Python 77.24% Jupyter Notebook 22.45% HTML 0.16% CSS 0.05% JavaScript 0.02% Shell 0.08%

DSPy's Introduction

DSPy: Programming—not prompting—Foundation Models

[Oct'23] DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
[Jan'24] In-Context Learning for Extreme Multi-Label Classification
[Dec'23] DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines
[Dec'22] Demonstrate-Search-Predict: Composing Retrieval & Language Models for Knowledge-Intensive NLP

Getting Started:  

Documentation: DSPy Docs


DSPy is a framework for algorithmically optimizing LM prompts and weights, especially when LMs are used one or more times within a pipeline. To use LMs to build a complex system without DSPy, you generally have to: (1) break the problem down into steps, (2) prompt your LM well until each step works well in isolation, (3) tweak the steps to work well together, (4) generate synthetic examples to tune each step, and (5) use these examples to finetune smaller LMs to cut costs. Currently, this is hard and messy: every time you change your pipeline, your LM, or your data, all prompts (or finetuning steps) may need to change.

To make this more systematic and much more powerful, DSPy does two things. First, it separates the flow of your program (modules) from the parameters (LM prompts and weights) of each step. Second, DSPy introduces new optimizers, which are LM-driven algorithms that can tune the prompts and/or the weights of your LM calls, given a metric you want to maximize.

DSPy can routinely teach powerful models like GPT-3.5 or GPT-4 and local models like T5-base or Llama2-13b to be much more reliable at tasks, i.e., to deliver higher quality and/or avoid specific failure patterns. DSPy optimizers will "compile" the same program into different instructions, few-shot prompts, and/or weight updates (finetunes) for each LM. This is a new paradigm in which LMs and their prompts fade into the background as optimizable pieces of a larger system that can learn from data. tl;dr: less prompting, higher scores, and a more systematic approach to solving hard tasks with LMs.
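To give a concrete flavor of what this looks like, here is a minimal, uncompiled sketch (the model name and configuration are illustrative; the installation section and tutorials below cover the details):

import dspy

# Illustrative setup: point DSPy at an LM (assumes an OpenAI key in your environment).
turbo = dspy.OpenAI(model='gpt-3.5-turbo')
dspy.settings.configure(lm=turbo)

# A one-step program: the signature says "given a question, produce an answer".
qa = dspy.ChainOfThought("question -> answer")
response = qa(question="what is the capital of France?")
print(response.answer)  # e.g. "Paris"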

Table of Contents

If you need help thinking about your task, we recently created a Discord server for the community.

  1. Installation
  2. Tutorials & Documentation
  3. Framework Syntax
  4. Compiling: Two Powerful Concepts
  5. Pydantic Types
  6. FAQ: Is DSPy right for me?

Analogy to Neural Networks

When we build neural networks, we don't write manual for-loops over lists of hand-tuned floats. Instead, we use a framework like PyTorch to compose declarative layers (e.g., Convolution or Dropout) and then use optimizers (e.g., SGD or Adam) to learn the parameters of the network.

Ditto! DSPy gives you the right general-purpose modules (e.g., ChainOfThought, ReAct, etc.), which replace string-based prompting tricks. To replace prompt hacking and one-off synthetic data generators, DSPy also gives you general optimizers (BootstrapFewShotWithRandomSearch or BayesianSignatureOptimizer), which are algorithms that update parameters in your program. Whenever you modify your code, your data, your assertions, or your metric, you can compile your program again and DSPy will create new effective prompts that fit your changes.
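As a rough sketch of what "compiling" with one of these optimizers looks like (my_program, my_metric, and my_trainset are placeholders you would define yourself; a metric example follows in the Mini-FAQs below):

from dspy.teleprompt import BootstrapFewShotWithRandomSearch

# Placeholder names: my_program, my_metric, and my_trainset are assumed to be defined elsewhere.
optimizer = BootstrapFewShotWithRandomSearch(metric=my_metric, max_bootstrapped_demos=4)
compiled_program = optimizer.compile(my_program, trainset=my_trainset)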

Mini-FAQs

What do DSPy optimizers tune? Each optimizer is different, but they all seek to maximize a metric on your program by updating prompts or LM weights. Current DSPy optimizers can inspect your data, simulate traces through your program to generate good/bad examples of each step, propose or refine instructions for each step based on past results, finetune the weights of your LM on self-generated examples, or combine several of these to improve quality or cut cost. We'd love to merge new optimizers that explore a richer space: most manual steps you currently go through for prompt engineering, "synthetic data" generation, or self-improvement can probably be generalized into a DSPy optimizer that acts on arbitrary LM programs.

How should I use DSPy for my task? Using DSPy is an iterative process. You first define your task and the metrics you want to maximize, and prepare a few example inputs — typically without labels (or only with labels for the final outputs, if your metric requires them). Then, you build your pipeline by selecting built-in layers (modules) to use, giving each layer a signature (input/output spec), and then calling your modules freely in your Python code. Lastly, you use a DSPy optimizer to compile your code into high-quality instructions, automatic few-shot examples, or updated LM weights for your LM.
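For instance, the metric can be as simple as a plain Python function that compares a gold example with a prediction (a hypothetical sketch; the RAG walkthrough below shows a slightly richer version):

def my_metric(example, pred, trace=None):
    # Exact match on the final answer; `example` is a dspy.Example and `pred` is your program's output.
    return example.answer.lower() == pred.answer.lower()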

What if I have a better idea for prompting or synthetic data generation? Perfect. We encourage you to think if it's best expressed as a module or an optimizer, and we'd love to merge it in DSPy so everyone can use it. DSPy is not a complete project; it's an ongoing effort to create structure (modules and optimizers) in place of hacky prompt and pipeline engineering tricks.

What does DSPy stand for? It's a long story but the backronym now is Declarative Self-improving Language Programs, pythonically.

1) Installation

All you need is:

pip install dspy-ai

To install the very latest from main:

pip install git+https://github.com/stanfordnlp/dspy.git

Or open our intro notebook in Google Colab:

By default, DSPy installs the latest openai from pip. However, if you have an older version installed from before OpenAI changed their API (i.e., openai~=0.28.1), the library will use that just fine. Both are supported.

For the optional (alphabetically sorted) Chromadb, Groq, Marqo, Milvus, MongoDB, MyScaleDB, Pinecone, Qdrant, Snowflake, or Weaviate retrieval integration(s), include the extra(s) below:

pip install dspy-ai[chromadb] # or [groq] or [marqo] or [milvus] or [mongodb] or [myscale] or [pinecone] or [qdrant] or [snowflake] or [weaviate]

2) Documentation

The DSPy documentation is divided into tutorials (step-by-step illustration of solving a task in DSPy), guides (how to use specific parts of the API), and examples (self-contained programs that illustrate usage).

A) Tutorials

  • Getting Started (Beginner): Introduces the basic building blocks in DSPy. Tackles the task of complex question answering with HotPotQA.
  • Minimal Working Example (Beginner): Builds and optimizes a very simple chain-of-thought program in DSPy for math question answering. Very short.
  • Compiling for Tricky Tasks (Beginner): Teaches LMs to reason about logical statements and negation. Uses GPT-4 to bootstrap few-shot CoT demonstrations for GPT-3.5. Establishes a state-of-the-art result on ScoNe. Contributed by Chris Potts.
  • Local Models & Custom Datasets (Beginner): Illustrates two different things together: how to use local models (Llama-2-13B in particular) and how to use your own data examples for training and development.
  • The DSPy Paper (Intermediate): Sections 3, 5, 6, and 7 of the DSPy paper can be consumed as a tutorial. They include explained code snippets, results, and discussions of the abstractions and API.
  • DSPy Assertions (Intermediate): Introduces an example of applying DSPy Assertions while generating long-form responses to questions with citations. Presents comparative evaluation in both zero-shot and compiled settings.
  • Finetuning for Complex Programs (Intermediate): Teaches a local T5 model (770M) to do exceptionally well on HotPotQA. Uses only 200 labeled answers. Uses no hand-written prompts, no calls to OpenAI, and no labels for retrieval or reasoning.
  • Information Extraction (Advanced): Tackles extracting information from long articles (biomedical research papers). Combines in-context learning and retrieval to set SOTA on BioDEX. Contributed by Karel D’Oosterlinck.

Other resources people find useful:

B) Guides

If you're new to DSPy, it's probably best to go in sequential order. You will probably refer to these guides frequently after that, e.g. to copy/paste snippets that you can edit for your own DSPy programs.

  1. Language Models

  2. Signatures

  3. Modules

  4. Data

  5. Metrics

  6. Optimizers (formerly Teleprompters)

  7. DSPy Assertions

C) Examples

The DSPy team believes complexity has to be justified. We take this seriously: we never release a complex tutorial (above) or example (below) unless we can demonstrate empirically that this complexity has generally led to improved quality or cost. This kind of rule is rarely enforced by other frameworks or docs, but you can count on it in DSPy examples.

There are a bunch of examples in the examples/ directory and in the top-level directory. We welcome contributions!

You can find other examples tweeted by @lateinteraction on Twitter/X.

Some other examples (not exhaustive, feel free to add more via PR):

TODO: Add links to the state-of-the-art results by the University of Toronto on Clinical NLP, on Theory of Mind (ToM) by Plastic Labs, and the DSPy pipeline from Replit.

There are also recent cool examples at Weaviate's DSPy cookbook by Connor Shorten. See tutorial on YouTube.

3) Syntax: You're in charge of the workflow—it's free-form Python code!

DSPy hides tedious prompt engineering, but it cleanly exposes the important decisions you need to make: [1] what's your system design going to look like? [2] what are the important constraints on the behavior of your program?

You express your system as free-form Pythonic modules. DSPy will tune the quality of your program in whatever way you use foundation models: you can code with loops, if statements, or exceptions, and use DSPy modules within any Python control flow you think works for your task.

Suppose you want to build a simple retrieval-augmented generation (RAG) system for question answering. You can define your own RAG program like this:

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")
    
    def forward(self, question):
        context = self.retrieve(question).passages
        answer = self.generate_answer(context=context, question=question)
        return answer

A program has two key methods, which you can edit to fit your needs.

Your __init__ method declares the modules you will use. Here, RAG will use the built-in Retrieve for retrieval and ChainOfThought for generating answers. DSPy offers general-purpose modules that take the shape of your own sub-tasks — and not pre-built functions for specific applications.

Modules that use the LM, like ChainOfThought, require a signature. That is a declarative spec that tells the module what it's expected to do. In this example, we use the short-hand signature notation context, question -> answer to tell ChainOfThought it will be given some context and a question and must produce an answer. We will discuss more advanced signatures below.

Your forward method expresses any computation you want to do with your modules. In this case, we use the module self.retrieve to search for some context and then use the module self.generate_answer, which uses the context and question to generate the answer!

You can now use this RAG program either in zero-shot (uncompiled) mode or compile it to obtain higher quality. Zero-shot usage is simple: just define an instance of your program and then call it:

rag = RAG()  # zero-shot, uncompiled version of RAG
rag("what is the capital of France?").answer  # -> "Paris"

The next section will discuss how to compile our simple RAG program. When we compile it, the DSPy compiler will annotate demonstrations of its steps: (1) retrieval, (2) using context, and (3) using chain-of-thought to answer questions. From these demonstrations, the DSPy compiler will make sure it produces an effective few-shot prompt that works well with your LM, retrieval model, and data. If you're working with small models, it'll finetune your model (instead of prompting) to do this task.

If you later decide you need another step in your pipeline, just add another module and compile again. Maybe add a module that takes the chat history into account during search?
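One hypothetical sketch of that extra step (the module name and signature below are illustrative, not a built-in DSPy recipe):

class ChatRAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        # New step: rewrite the question into a search query, conditioned on the chat history.
        self.rewrite = dspy.ChainOfThought("chat_history, question -> search_query")
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question, chat_history=""):
        search_query = self.rewrite(chat_history=chat_history, question=question).search_query
        context = self.retrieve(search_query).passages
        return self.generate_answer(context=context, question=question)

Compiling again would then bootstrap fresh demonstrations for the new rewrite step, with no manual prompt changes.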

4) Two Powerful Concepts: Signatures & Teleprompters

Note: We will soon rename teleprompters to optimizers. This will not affect their functionality, but will simplify the terms used.

To make it possible to compile any program you write, DSPy introduces two simple concepts: Signatures and Teleprompters.

4.a) Declaring the input/output behavior of LMs with dspy.Signature

When we assign tasks to LMs in DSPy, we specify the behavior we need as a Signature. A signature is a declarative specification of input/output behavior of a DSPy module.

Instead of investing effort into how to get your LM to do a sub-task, signatures enable you to inform DSPy what the sub-task is. Later, the DSPy compiler will figure out how to build a complex prompt for your large LM (or finetune your small LM) specifically for your signature, on your data, and within your pipeline.

A signature consists of three simple elements:

  • A minimal description of the sub-task the LM is supposed to solve.
  • A description of one or more input fields (e.g., input question) that we will give to the LM.
  • A description of one or more output fields (e.g., the question's answer) that we will expect from the LM.

We support two notations for expressing signatures. The short-hand signature notation is for quick development. You simply provide your module (e.g., dspy.ChainOfThought) with a string of the form input_field_name_1, ... -> output_field_name_1, ..., with the fields separated by commas.

In the RAG class earlier, we saw:

self.generate_answer = dspy.ChainOfThought("context, question -> answer")

In many cases, this barebones signature is sufficient. However, sometimes you need more control. In these cases, you can use the full notation to express a more fully-fledged signature, as below.

class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()

### inside your program's __init__ function
self.generate_query = dspy.ChainOfThought(GenerateSearchQuery)

You can optionally provide a prefix and/or desc key for each input or output field to refine or constrain the behavior of modules using your signature. The description of the sub-task itself is specified as the docstring (i.e., """Write a simple...""").
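For instance, a sketch of a signature that uses desc (and a custom prefix) on its fields might look like this; the field names and descriptions here are purely illustrative:

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words", prefix="Answer:")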

4.b) Asking DSPy to automatically optimize your program with dspy.teleprompt.*

After defining the RAG program, we can compile it. Compiling a program will update the parameters stored in each module. For large LMs, this is primarily in the form of creating and validating good demonstrations for inclusion in your prompt(s).

Compiling depends on three things: a (potentially tiny) training set, a metric for validation, and your choice of teleprompter from DSPy. Teleprompters are powerful optimizers (included in DSPy) that can learn to bootstrap and select effective prompts for the modules of any program. (The "tele-" in the name means "at a distance", i.e., automatic prompting at a distance.)

DSPy typically requires very minimal labeling. For example, our RAG pipeline may work well with just a handful of examples that contain a question and its (human-annotated) answer. Your pipeline may involve multiple complex steps: our basic RAG example includes a retrieved context, a chain of thought, and the answer. However, you only need labels for the initial question and the final answer. DSPy will bootstrap any intermediate labels needed to support your pipeline. If you change your pipeline in any way, the data bootstrapped will change accordingly!

my_rag_trainset = [
  dspy.Example(
    question="Which award did Gary Zukav's first book receive?",
    answer="National Book Award"
  ),
  ...
]

Second, define your validation logic, which will express some constraints on the behavior of your program or individual modules. For RAG, we might express a simple check like this:

def validate_context_and_answer(example, pred, trace=None):
    # check the gold label and the predicted answer are the same
    answer_match = example.answer.lower() == pred.answer.lower()

    # check the predicted answer comes from one of the retrieved contexts
    context_match = any((pred.answer.lower() in c) for c in pred.context)

    return answer_match and context_match

Different teleprompters offer various tradeoffs in terms of how much they optimize cost versus quality, etc. For RAG, we might use the simple teleprompter called BootstrapFewShot. To do so, we instantiate the teleprompter itself with our validation function (validate_context_and_answer above) and then compile against some training set my_rag_trainset.

from dspy.teleprompt import BootstrapFewShot

teleprompter = BootstrapFewShot(metric=validate_context_and_answer)
compiled_rag = teleprompter.compile(RAG(), trainset=my_rag_trainset)

If we now use compiled_rag, it will invoke our LM with rich prompts with few-shot demonstrations of chain-of-thought retrieval-augmented question answering on our data.
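Calling the compiled program works exactly like calling the uncompiled one, and you can inspect the prompts DSPy constructed through your LM client (here assumed to be the illustrative turbo object configured near the top of this README; inspect_history is available on DSPy's LM clients):

compiled_rag("what is the capital of France?").answer  # same interface as the uncompiled RAG

# Peek at the most recent prompt, including the bootstrapped demonstrations.
turbo.inspect_history(n=1)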

5) Pydantic Types

Sometimes you need more than just string inputs/outputs. Assume, for example, you need to extract structured travel information from an email:

import datetime

from pydantic import BaseModel, Field

from dspy import InputField, OutputField, Signature
from dspy.functional import TypedPredictor

class TravelInformation(BaseModel):
    origin: str = Field(pattern=r"^[A-Z]{3}$")
    destination: str = Field(pattern=r"^[A-Z]{3}$")
    date: datetime.date
    confidence: float = Field(gt=0, lt=1)

class TravelSignature(Signature):
    """ Extract all travel information in the given email """
    email: str = InputField()
    flight_information: list[TravelInformation] = OutputField()

predictor = TypedPredictor(TravelSignature)
predictor(email='...')

This will output a list of TravelInformation objects.

There are other ways to create typed signatures too, such as:

from dspy.functional import TypedChainOfThought

predictor = TypedChainOfThought("question:str -> answer:int")

which applies chain of thought, and is guaranteed to return an int.
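A hypothetical call (the exact value naturally depends on the LM, but the parsed result is enforced to be an int):

prediction = predictor(question="What is the sum of the first five prime numbers?")
print(prediction.answer)  # e.g. 28, returned as a Python int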

There's even an approach inspired by tanuki.py, which can be convenient when defining modules:

from dspy.functional import FunctionalModule, predictor, cot

class MyModule(FunctionalModule):
    @predictor
    def hard_question(possible_topics: list[str]) -> str:
        """Write a hard question based on one of the topics. It should be answerable by a number."""

    @cot
    def answer(question: str) -> float:
        pass

    def forward(possible_topics: list[str]):
        q = hard_question(possible_topics=possible_topics)
        a = answer(question=q)
        return (q, a)

For more examples, see the list above, as well as the unit tests for the module.

6) FAQ: Is DSPy right for me?

The DSPy philosophy and abstraction differ significantly from other libraries and frameworks, so it's usually straightforward to decide whether DSPy is (or isn't) the right framework for your use case.

If you're an NLP/AI researcher (or a practitioner exploring new pipelines or new tasks), the answer is almost invariably yes. If you're a practitioner doing other things, please read on.

[6.a] DSPy vs. thin wrappers for prompts (OpenAI API, MiniChain, basic templating)

In other words: Why can't I just write my prompts directly as string templates? Well, for extremely simple settings, this might work just fine. (If you're familiar with neural networks, this is like expressing a tiny two-layer NN as a Python for-loop. It kinda works.)

However, when you need higher quality (or manageable cost), then you need to iteratively explore multi-stage decomposition, improved prompting, data bootstrapping, careful finetuning, retrieval augmentation, and/or using smaller (or cheaper, or local) models. The true expressive power of building with foundation models lies in the interactions between these pieces. But every time you change one piece, you likely break (or weaken) multiple other components.

DSPy cleanly abstracts away (and powerfully optimizes) the parts of these interactions that are external to your actual system design. It lets you focus on designing the module-level interactions: the same program expressed in 10 or 20 lines of DSPy can easily be compiled into multi-stage instructions for GPT-4, detailed prompts for Llama2-13b, or finetunes for T5-base.

Oh, and you wouldn't need to maintain long, brittle, model-specific strings at the core of your project anymore.

[6.b] DSPy vs. application development libraries like LangChain, LlamaIndex

Note: If you use LangChain as a thin wrapper around your own prompt strings, refer to answer [6.a] instead.

LangChain and LlamaIndex are popular libraries that target high-level application development with LMs. They offer many batteries-included, pre-built application modules that plug in with your data or configuration. In practice, indeed, many use cases genuinely don't need any special components. If you'd be happy to use someone's generic, off-the-shelf prompt for question answering over PDFs or standard text-to-SQL as long as it's easy to set up on your data, then you will probably find a very rich ecosystem in these libraries.

Unlike these libraries, DSPy doesn't internally contain hand-crafted prompts that target specific applications you can build. Instead, DSPy introduces a very small set of much more powerful and general-purpose modules that can learn to prompt (or finetune) your LM within your pipeline on your data.

DSPy offers a whole different degree of modularity: when you change your data, make tweaks to your program's control flow, or change your target LM, the DSPy compiler can map your program into a new set of prompts (or finetunes) that are optimized specifically for this pipeline. Because of this, you may find that DSPy obtains the highest quality for your task, with the least effort, provided you're willing to implement (or extend) your own short program. In short, DSPy is for when you need a lightweight but automatically-optimizing programming model — not a library of predefined prompts and integrations.

If you're familiar with neural networks:

This is like the difference between PyTorch (i.e., representing DSPy) and HuggingFace Transformers (i.e., representing the higher-level libraries). If you simply want to use off-the-shelf BERT-base-uncased or GPT2-large or apply minimal finetuning to them, HF Transformers makes it very straightforward. If, however, you're looking to build your own architecture (or extend an existing one significantly), you have to quickly drop down into something much more modular like PyTorch. Luckily, HF Transformers is implemented in backends like PyTorch. We are similarly excited about high-level wrappers around DSPy for common applications. If such a wrapper is implemented using DSPy, your high-level application can also adapt significantly to your data in a way that static prompt chains won't. Please open an issue if this is something you want to help with.

[6.c] DSPy vs. generation control libraries like Guidance, LMQL, RELM, Outlines

Guidance, LMQL, RELM, and Outlines are all exciting new libraries for controlling the individual completions of LMs, e.g., if you want to enforce JSON output schema or constrain sampling to a particular regular expression.

This is very useful in many settings, but it's generally focused on low-level, structured control of a single LM call. It doesn't help ensure the JSON (or structured output) you get is going to be correct or useful for your task.

In contrast, DSPy automatically optimizes the prompts in your programs to align them with various task needs, which may also include producing valid structured outputs. That said, we are considering allowing Signatures in DSPy to express regex-like constraints that are implemented by these libraries.

Testing

To run the tests, you need to first clone the repository.

Then install the package through Poetry (note: you may need to install Poetry first):

poetry install --with test

Then run all the tests, or a specific test suite, with the following commands:

poetry run pytest
poetry run pytest tests/PATH_TO_TEST_SUITE

Contribution Quickstart

See CONTRIBUTING.md for a quickstart guide to contributing to DSPy.

Contributors & Acknowledgements

DSPy is led by Omar Khattab at Stanford NLP with Chris Potts and Matei Zaharia.

Key contributors and team members include Arnav Singhvi, Krista Opsahl-Ong, Michael Ryan, Cyrus Nouroozi, Kyle Caverly, Amir Mehr, Karel D'Oosterlinck, Shangyin Tan, Manish Shetty, Herumb Shandilya, Paridhi Maheshwari, Keshav Santhanam, Sri Vardhamanan, Eric Zhang, Hanna Moazam, Thomas Joshi, Saiful Haq, and Ashutosh Sharma.

DSPy includes important contributions from Rick Battle and Igor Kotenkov. It reflects discussions with Peter Zhong, Haoze He, Lisa Li, David Hall, Ashwin Paranjape, Heather Miller, Chris Manning, Percy Liang, and many others.

The DSPy logo is designed by Chuyi Zhang.

📜 Citation & Reading More

To stay up to date or learn more, follow @lateinteraction on Twitter.

If you use DSPy or DSP in a research paper, please cite our work as follows:

@article{khattab2023dspy,
  title={DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines},
  author={Khattab, Omar and Singhvi, Arnav and Maheshwari, Paridhi and Zhang, Zhiyuan and Santhanam, Keshav and Vardhamanan, Sri and Haq, Saiful and Sharma, Ashutosh and Joshi, Thomas T. and Moazam, Hanna and Miller, Heather and Zaharia, Matei and Potts, Christopher},
  journal={arXiv preprint arXiv:2310.03714},
  year={2023}
}
@article{khattab2022demonstrate,
  title={Demonstrate-Search-Predict: Composing Retrieval and Language Models for Knowledge-Intensive {NLP}},
  author={Khattab, Omar and Santhanam, Keshav and Li, Xiang Lisa and Hall, David and Liang, Percy and Potts, Christopher and Zaharia, Matei},
  journal={arXiv preprint arXiv:2212.14024},
  year={2022}
}

You can also read more about the evolution of the framework from Demonstrate-Search-Predict to DSPy:

Note: If you're looking for Demonstrate-Search-Predict (DSP), which is the previous version of DSPy, you can find it on the v1 branch of this repo.

DSPy's People

Contributors

anindyadeep, anush008, arnavsinghvi11, cshorten, demontego, detaos, drawal1, fsndzomga, harishkumar1112001, insop, isaacbmiller, jasper-xian, joshmantova, kcaverly, klopsahlong, krypticmouse, lawliet19189, manishshettym, mogith-pn, nbqu, no-dice-io, okhat, sfc-gh-alherrera, shangyint, someshfengde, stalkermustang, thomasahle, tom-doerr, usamajamil43, xenonmolecule


DSPy's Issues

Typing

First of all, thanks for open-sourcing your work; I think it's a great step in the direction of making LLM tools more sane.

I see that DSP uses dotdict, which leads to extremely dynamic behavior that can cause hard-to-debug errors when something changes between versions (a field gets renamed, a required field is added, etc.), and is also problematic when someone would like to provide an HTTP API that returns your classes (because the developer needs to check which fields got set in code).

Questions:

  • Do you plan to migrate the project to Pydantic/dataclasses that would validate arguments?
  • If not, what are the obstacles?
  • If you plan to migrate it but just don't have enough man-hours, could you point me to the classes that you think will have a stable API? I can migrate these classes to something that will support validation

I think I'll be forking your project in the coming weeks; it would be great if I got some guidance so that my extension could be used by someone else.
An additional benefit of using pydantic/dataclasses is that it would be extremely easy to write an HTTP server (which is something I'm very likely to do anyway).

dsp.generate self-consistency

I am using dsp.generate for question answer with self-consistency. When I use dsp.generate with the following code I get varying numbers of completions (sometimes the full n and sometimes less, sometimes even 1). Would you know why I don't get the full n?

code:
example, completions = dsp.generate(qa_template_with_CoT_context_diag, n=23, temperature=0.7)(example, stage='qa')

Clarification: Is DSP only meant for short factual answers?

Might be a silly question, however all the examples I have seen have to do with short factoid answers, often between 1 and 5 words. I wanted to know if DSP is only meant for short factual answers, or can it also be used for cases where we want paraphrasing or summarising over multiple chunks of text that might contain the answer to a question asked by the user? If yes, it would be great to see an example of the same.

Thanks for all the great work!,
Regards,
Karrtik

intro.ipynb: Trial and Errors

First of all: I wish all papers came with step by step reproductions and iterative improvements like this one does. Kudos!

I attempted to run this on Colab and immediately ran into the problem of a missing OpenAI key. Easy enough, but conflicts with the statement in the notebook. Then Programs 3 and 5 failed to complete -- no errors were reported but they ran much longer than reasonable so I eventually killed them.

I also noticed that when updating the LLM from text-davinci-002 to -003, most of the improvements provided by DSP went away: while -002 got a measly 23.1%, -003 was able to answer correctly 38.5% of the time in Program 1, and with -003 Program 2's score went DOWN to 30.8% while Program 4 "only" achieved 46.2%. Certainly good, but not quite the improvement I hoped for. (Also, if using -002 shouldn't this be referred to as GPT-3? IIRC the 3.5 moniker was introduced with -003? [My memory may be wrong, and Google wasn't helpful]) Since I could not make Program 5 run, I'm not sure how much that improves things.

I'd be interested in seeing how Turbo or 4 improves on the basic task, but have not myself attempted to change the API call.

FWIW this is my modestly forked version: https://gist.github.com/oaustegard/6edf2ed6f5b17a04d307a4593b6af3f7#file-intro-ipynb

text-davinci-001: majority_vote_ function does not handle cases where pred[prediction_field] is a list of str instead of str

Link to the code snippet in question:
https://github.com/stanfordnlp/dsp/blob/12d7f1106f2f524be161d35865a81e0ea016e929/dsp/primitives/predict.py#L211

Expected behavior:
The function should work with pred[prediction_field] of type Union[List[str], str].

Current behavior:
The function raises an error when calling normalize_text on pred[prediction_field] when it is of type List[str].

efficient parallelization of DSP programs

The motivation for this issue is to be able to run DSP programs in a scalable manner. An example of a circumstance that warrants this is evaluating a DSP program on thousands of data points. In the aforementioned case, running the DSP program sequentially on all the data points might not be the most efficient choice. Therefore, a scalable approach that can optimize for these use cases is required.

The ideal scalable solution should:

  • be thread-safe: this issue seems to be observable in our LRU caches, where caching degrades on threaded calls due to multiple threads accessing the cache at the same time.
  • be non-blocking: most of the DSP program operations are I/O bound. Therefore, it should be possible to have high concurrency with very limited blocking calls.
  • have reliable caching: an extension of being thread-safe, but we need to make sure the system caches remain consistent and up-to-date with the underlying data.
  • have well-abstracted methods/APIs: it should be straightforward to use and extensible.

branch primitive not found

Hi, I would like to inquire about the branch primitive that is mentioned in the research paper. I don't think I am able to find it anywhere in the code in the repo...

Also, I would like to know if there is any way we could see the direct requests made to the API.

Transformers error

The following error is occurring when trying to run the Notebook for testing with KNN.
AttributeError: module 'dsp' has no attribute 'SentenceTransformersVectorizer'
Can you help me?

How to design a system to provide long answers

I am not sure how to train DSPy to synthesize long, detailed answers such as those required for "how to" questions. So far, I have tried training on long examples with RAG and SimplifiedBaleen, and answer_exact_match with frac set to different values ranging from 0.5 to 0.8.

In all cases, the system truncates the answers or gives incomplete answers, even when the answers are retrieved fully formed in the context

What is needed is a way to retrieve the answers to the various questions, and then instead of selecting one answer (which is what Predict does), to build/assemble the final answer from the answers to all the sub-questions

I tried doing this using manual prompt engineering and langchain and it works great! Unfortunately, it's complicated and requires special handling for questions that are more suitable for "factoid" answers.

Are there any examples to demonstrate synthesizing long, detailed answers?

KNN faiss index creation error

The following error occurred when calling dsp.knn(train_dataset), and changing line 76 to "encode_residuals" rendered another type mismatch error.

in create_faiss_index(emb_dim, n_objects, n_probe, max_gpu_devices, encode_residuals, in_list_dist_type, centroid_dist_type)
114 index = _get_brute_index(emb_dim=emb_dim, dist_type=in_list_dist_type)
115 else:
--> 116 index = _get_ivf_index(
117 emb_dim=emb_dim,
118 n_objects=n_objects,
119 in_list_dist_type=in_list_dist_type,
120 centroid_dist_type=centroid_dist_type,
121 encode_residuals=encode_residuals
...
70 index = faiss.IndexIVFScalarQuantizer(
71 quannizer,
72 emb_dim,
73 n_list,
74 faiss.ScalarQuantizer.QT_fp16, # TODO: should be optional?
75 centroid_metric,
76 encode_residuals=encode_residuals
)
TypeError: replacement_init() got an unexpected keyword argument 'encode_residuals'

Using davinci-003 makes results worse?

Have others noticed this issue as well? I swapped the LM to text-davinci-003 and my results are actually worse than with text-davinci-002.

Some of the answers are pretty "obviously wrong" too:

Question: What year was the party of the winner of the 1971 San Francisco mayoral election founded?

Rationale: Let's think step by step. First, we need to identify the winner of the 1971 San Francisco mayoral election. From the context, we can see that the winner of the 1971 San Francisco mayoral election was Joseph Alioto. Next, we need to identify the date in which the party of Joseph Alioto was founded. 

From the context, we can see that the Democratic Party of Serbia (of which Joseph Alioto was a member) was founded on February 3, 1990.

On multihop_QA_v2, instead of 84.6% results, I am getting 53.8%. multihop_QA_v1 achieved ~34%.

Feature request: more models

Many thanks for this amazing work! Eye-opening for a newbie who just got to know RAGstack.
So is it possible or suggested to use open-source LLMs such as Vicuna or Llama 2, and open-source embedding models like BGE?
Thank you in advance~~

About the colbertV2 search server

Hi, did you split the Wikipedia (Dec 2018) documents into small passages? Could you tell us more about the ColBERTv2 model you use? The ColBERT end-to-end model or the re-rank model? I was wondering why the search result is shorter than the real wiki page.

Documentation to help support a custom data set

I am setting up an experiment to try DSP using a custom dataset. This consists of 2200 customer support conversations and 140 document articles, used in finding answers to customer questions. We curate this data and add which parts of conversations are useful and whether a conversation part is a preliminary follow up question from the support agent and the subsequent answer from the customer. This is very common, where the agent needs more information to answer the initial customer question. We also add a list of questions that are answered by an article to provide context.

What would be useful in the documentation is a walkthrough of how to support such a custom dataset. Some of the considerations would be:

  1. How we condition our dataset to suit the Example inputs required by DSP. It is fairly clear that the preliminary questions and answers lend themselves to a CoT approach, but it is not clear how this relates to the DSP Examples required.
  2. Our final answers often have multiple steps to be followed by the customer. Also, a lot of intermediate answers are quite long. All your examples have very short answers. It is not clear if our dataset suits DSP as a consequence of this difference.
  3. We would need to train ColBERTv2 on our data. It is not clear where we can do the training on a service supporting ColBERTv2.
  4. What specification of PC would we need to run ColBERTv2 inference for the RM? I have an 8-core, 16GB RAM mini server available, but do not know if it would be adequate.
  5. It would also be interesting to see if there is much difference in performance replacing the RM with a simpler mechanism such as FAISS.

Thank you for the great work so far!

biggest challenge in retrieval*

Lots of cool stuff built around leveraging realtime human created knowledge.

The biggest challenge I'm running into is 'back propagation' for the entire RAG/REML/RAML/R* pipeline. Always tricky to do when stitching together a bunch of disparate models.

Not sure the REST-based 3rd-party models belong in lower-level efforts, btw. They are in too much flux / too unreliable. Very useful for sure, but probably best to keep them high level.

Request: Instructions for how to use Chat

Would it be possible to edit intro.ipynb to include instructions for how to utilize gpt-3.5-turbo as the LM?

In the setup paragraph I tried invoking Chat as the LM using

lm = dsp.GPT3(model='gpt-3.5-turbo', model_type='chat', api_key=openai_key)

as suggested by my reading of gpt3.py

But when getting to the execution of Program 1 I get the error

Who has a broader scope of profession: E. L. Doctorow or Julia Peterkin?
---------------------------------------------------------------------------
InvalidRequestError                       Traceback (most recent call last)
[<ipython-input-7-9f968c8aa805>](https://localhost:8080/#) in <cell line: 2>()
      1 print(dev[0].question)
----> 2 print(vanilla_LM_QA(dev[0].question))

18 frames
[/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py](https://localhost:8080/#) in _interpret_response_line(self, rbody, rcode, rheaders, stream)
    681         stream_error = stream and "error" in resp.data
    682         if stream_error or not 200 <= rcode < 300:
--> 683             raise self.handle_error_response(
    684                 rbody, rcode, resp.data, rheaders, stream_error=stream_error
    685             )

InvalidRequestError: Unrecognized request argument supplied: model_type

ColBERTv2 ec2 instance

Hi,

I am trying to query the ColBERT v2 ec2 instance in the notebook but have trouble getting a response from it. Is it still running?

Getting an Exception following RAG example in docs

This happens regardless of whether you pip install or git clone the source. I am using the BootstrapFewShot teleprompter as shown in the intro notebook

It works fine in zero shot mode (no compilation)

Running the code gives me the exception below:
Exception has occurred: TypeError
cannot pickle '_thread.RLock' object
File ".../dspy/primitives/module.py", line 49, in reset_copy
obj = copy.deepcopy(self)
File ".../src/bootstrap.py", line 51, in _prepare_student_and_teacher
self.student = student.reset_copy()
File ".../bootstrap.py", line 41, in compile
self._prepare_student_and_teacher(student, teacher)
File ".../DSPy_LLM.py", line 131, in main
compiled_rag = teleprompter.compile(RAG(), trainset=TRAIN_SET)
File ".../DSPy_LLM.py", line 166, in
main()
TypeError: cannot pickle '_thread.RLock' object

Any suggestions?

ColBERTv2 retriever not fetching appropriate passage for an obvious query

Hi,

Thanks for this great project!

I was playing around with different prompts of my own within the DSP framework, and I am having trouble getting a correct answer to the following simple question:

Which team does the player named 2015 Diamond Head Classic’s MVP play for?

There is a Wikipedia page about the 2015 Diamond Head Classic (link). The phrase "2015 Diamond Head Classic" appears in the title as well as the abstract. The abstract also mentions "Buddy Hield" was named MVP.

However, the ColBERTv2 retriever is unable to retrieve the exact Wikipedia page in top 5 results. I checked the page's history and it was added in 2015, so it should have been present in the 2019 Wikipedia dump.

1st Hop

Write a search query that will help answer a complex question; if unsure, say "Not Found".
---
Follow the following format.

Question: «question to be answered»
Rationale: Let's think step by step. To answer this question, we first need to find out «the missing information»
Search Query: «a simple question for seeking the missing information»
---
Question: Which team does the player named 2015 Diamond Head Classic’s MVP play for?
Rationale: Let's think step by step. To answer this question, we first need to find out the player's name.
Search Query: "2015 Diamond Head Classic's MVP"

2nd Hop

Write a search query that will help answer a complex question; if unsure, say "Not Found".
---
Follow the following format:

Context: «sources that may contain relevant content»
Question: «question to be answered»
Rationale: Let's think step by step. Based on the context, we have learned the following. «information from context that provides useful clues»
Search Query: «a simple question for seeking remaining missing information»
---
Context:
«2015–16 Big Ten Conference men's basketball season | rankings Throughout the conference regular season, the Big Ten offices named one or two players of the week and one or two freshmen of the week each Monday. On November 17 in the Champions Classic, Denzel Valentine led Michigan State over Kansas by posting the first triple-double of the 2015–16 NCAA Division I men's basketball season with 29 points, 12 rebounds and 12 assists. On January 5, Diamond Stone was named national freshman of the week by the United States Basketball Writers Association. This table summarizes the head-to-head results between teams in conference play. Each team played 18 conference games,»
«Nelson Figueroa | Diamondbacks on December 21, 2012. He was released on April 26, 2013. Figueroa again signed with Taiwan's Uni-President 7-Eleven Lions in mid-2013. Figueroa has a brief but successful stint with the Lions in 2007, during which he was voted the MVP of Taiwan Series that year. On February 16, 2015, SNY announced that Figueroa would replace Bob Ojeda as the pre/post-game analyst for their Mets broadcasts. Figueroa played as a pitcher for the Puerto Rican national team in the 2013 World Baseball Classic where he won a silver medal. Following the conclusion of the tournament, which was won by Dominican»
«Lucas Dias | Lucas Dias Lucas Dias Silva (born July 6, 1995) is a Brazilian professional basketball player who currently plays for Franca of the Novo Basquete Brasil (NBB). Dias was named Jordan Brand Classic International MVP in 2012. On April 21, 2015, it was announced that he would enter the 2015 NBA draft. However, he withdrew from the draft before the draft withdrawal deadline. Dias began his pro career with the Brazilian NBB League club E.C. Pinheiros. He was named the Brazilian League Revelation Player of the 2015–16 season. In 2016, he moved to the Brazilian club Paulistano. Dias represented Brazil at»
«2015 McDonald's All-American Boys Game | Kelley of the Bullis School in Potomac, Maryland coached the East team, while Robert Smith of Chicago's Simeon Career Academy coached the West team. The East defeated the West by a 111–91 score. Cheick Diallo earned MVP of the game after posting 18 points and 10 rebounds, for the East team. Five East team players (Diallo, Antonio Blakeney, Diamond Stone, Dwayne Bacon, and Isaiah Briscoe) and four West team players (Allonzo Trier, Brandon Ingram, P. J. Dozier, and Ivan Rabb) reached double figures in scoring. 2015 McDonald's All-American Boys Game The 2015 McDonald's All-American Boys Game is an All-star basketball»
«2015 MVP Cup | 2015 MVP Cup The 2015 Manny V. Pangilinan Cup, also known as the Master Game Face MVP Cup 2015 due to sponsorship reasons, was an invitational basketball tournament which was participated by four teams from September 11–13, 2015 at the Smart Araneta Coliseum. While a similarly named tournament was held in 2010, the 2010 MVP Invitational Champions' Cup, the 2015 MVP Cup is considered the inaugural edition of the MVP Cup and is planned to be held annually. The tournament was a single-round robin format and the champions were awarded $25,000. China, South Korea and Senegal were invited to join»
Question: Which team does the player named 2015 Diamond Head Classic’s MVP play for?
Rationale: Let's think step by step. Based on the context, we have learned the following. We need to find a player named 2015 Diamond Head Classic's MVP and which team he plays for.
Search Query: "2015 Diamond Head Classic's MVP" team

The subsequent hops cannot find the answer as the appropriate passage is not retrieved in the 2nd hop.

Thanks!

CC @okhat

Best way to use stop sequences with DSP

I have been playing around with DSP and a wide range of models. A big problem I noticed with Open Source models is that they don't really know when to stop.
This is quite easy to mitigate by using '\n' as the stopping sequence for the answer.

I set it globally with lm = dsp.GPT3(..., stop="\n").

This works well for vanilla_LM_QA and retrieve_then_read_QA, but in retrieve_then_read_QA_v2 it destroys the model's ability to generate a good rationale. I am looking for a good way to differentiate stop words between intermediary generations and the answer.
I made some adjustments to the do_generate function in predict.py so I can use stop words with GPT3 without degrading performance. You can see it below or in my fork.

I was wondering if there may be a simpler way without modifying the DSP code?

def do_generate(
       example: Example, stage: str, max_depth: int = 2, original_example=None
   ):
       if not dsp.settings.lm:
           raise AssertionError("No LM is loaded.")
       original_example = original_example or example
       assert stage is not None

       # Look up the appropriate fields in each demonstration.
       example = example.demos_at(lambda d: d[stage])

       # Generate and extract the fields.
       prompt = template(example)
       completions: list[dict[str, Any]] = generator(prompt, **kwargs)
       completions: list[Example] = [template.extract(example, p) for p in completions]

       # Find the completions that are most complete.
       field_names: list[str] = [field.input_variable for field in template.fields]

       last_field_idx = 0
       for field_idx, key in enumerate(field_names):
           completions_ = [
               c for c in completions if key in c.keys() and c[key] is not None
           ]

           # Filter out completions that are missing fields that are present in at least one completion.
           if len(completions_):
               completions = completions_
               last_field_idx = field_idx + 1

       # If none of the completions is completed (i.e., none has the final field set).
       if last_field_idx < len(field_names):
           # Pick the first completion that has gone farthest.
           completion = completions[0]
           completion[field_names[last_field_idx]] = ""

           # Recurse with greedy decoding and a shorter length.
           max_tokens = kwargs.get("max_tokens", dsp.settings.lm.kwargs["max_tokens"])
           max_tokens = min(max(75, max_tokens // 2), max_tokens)
           #MY CHANGES CODE START
           #Determine wheter this is a final generation or not
           if last_field_idx == len(field_names) - 1:
               #final generation
               new_kwargs = {
               **kwargs,
               "max_tokens": max_tokens,
               "n": 1,
               "temperature": 0.0,
               "stop": "\n"
           }
           else:
               #not final generation
               new_kwargs = {
                   **kwargs,
                   "max_tokens": max_tokens,
                   "n": 1,
                   "temperature": 0.0,
                   "stop": "\n\n"
               }
           #MY CHANGES CODE END
           assert max_depth > 0
           return generate(template, **new_kwargs)(
               completion,
               stage=stage,
               max_depth=max_depth - 1,
               original_example=original_example,
           )

       completions = Completions(completions, template=template)
       example = example.copy(completions=completions)

       if len(completions) == 1:
           completion = completions[0]
           example[stage] = example.copy(**completion)

           if dsp.settings.compiling:
               inputs_ = set(original_example.keys())
               inputs = [
                   f.input_variable
                   for f in template.fields
                   if f.input_variable in inputs_
               ]
               outputs = [
                   f.output_variable
                   for f in template.fields
                   if f.input_variable not in inputs_
               ]

               example.compiling_stages = example.get("compiling_stages", [])
               example.compiling_stages.append(
                   {
                       "name": stage,
                       "template": template,
                       "inputs": inputs,
                       "outputs": outputs,
                   }
               )
       else:
           # assert not dsp.settings.compiling, "TODO: At this point, cannot compile n>1 generations"
           example[stage] = dotdict(completions=completions)

       return example, completions

Unable to call openai for fine-tuning in the DSP compiler

I am trying the new DSP compiler to get a less expensive model to work on my own questions. Here is the code from the notebook that I am using with my own set of unlabeled questions:

compiled_QA = dsp.compile(program=multihop_QA, examples=dev_pubmed_unlabeled)

I am getting the following error which I traced back to the compiler.py file and this line

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Input In [56], in <cell line: 1>()
----> 1 compiled_QA = dsp.compile(program=multihop_QA, examples=dev_pubmed_unlabeled)

File ~/llm_search/dsp/dsp/primitives/compiler.py:164, in compile(program, examples, target)
    162 def compile(program, examples, target='ada'):
    163     training_data = simulate(program, examples)
--> 164     compiled_lm = finetune(training_data, target=target)
    166     def compiled_program(*args, **kwargs):
    167         with dsp.settings.context(compiled_lm=compiled_lm, compiling=False):

File ~/llm_search/dsp/dsp/primitives/compiler.py:155, in finetune(training_data, target)
    152     for line in training_data:
    153         f.write(ujson.dumps(line) + '\n')
--> 155 jobname, ft = openai_finetune(name, target)
    156 print(ft)
    158 ft = dsp.GPT3(model=ft, stop=" </s>")

File ~/llm_search/dsp/dsp/primitives/compiler.py:130, in openai_finetune(name, target)
    127 except:
    128     pass
--> 130 jobname, ft = openai_finetune_(name, target)
    132 with open(training_data_path, 'w') as f:
    133     f.write(ujson.dumps((jobname, ft)) + '\n')

File ~/llm_search/dsp/dsp/primitives/compiler.py:87, in openai_finetune_(name, target)
     84 print(command)
     86 # command = """python script.py"""
---> 87 process = subprocess.Popen(command.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     89 while line := process.stdout.readline().decode().strip():
     90     if 'created fine-tune:' in line.lower():

File ~/miniconda3/envs/lang_model/lib/python3.8/subprocess.py:858, in Popen.__init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
    854         if self.text_mode:
    855             self.stderr = io.TextIOWrapper(self.stderr,
    856                     encoding=encoding, errors=errors)
--> 858     self._execute_child(args, executable, preexec_fn, close_fds,
    859                         pass_fds, cwd, env,
    860                         startupinfo, creationflags, shell,
    861                         p2cread, p2cwrite,
    862                         c2pread, c2pwrite,
    863                         errread, errwrite,
    864                         restore_signals, start_new_session)
    865 except:
    866     # Cleanup if the child failed starting.
    867     for f in filter(None, (self.stdin, self.stdout, self.stderr)):

File ~/miniconda3/envs/lang_model/lib/python3.8/subprocess.py:1704, in Popen._execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
   1702     if errno_num != 0:
   1703         err_msg = os.strerror(errno_num)
-> 1704     raise child_exception_type(errno_num, err_msg, err_filename)
   1705 raise child_exception_type(err_msg)

FileNotFoundError: [Errno 2] No such file or directory: 'openai'

Is it because the OpenAI API has changed, or am I missing any packages? The openai package is of course already installed. Thanks.

Can we use Azure Cognitive Search to replace the ColBERT-based RM

In my use case, I have data indexed on Azure cognitive search.

How can I leverage DSP capabilities with such data?

In the DSP intro notebook, I notice the following Wrapper for the ColBERTv2 Retrieval.

rm = dsp.ColBERTv2(url=colbert_server)

How can I have a new wrapper which is compatible with Azure cognitive search? Is it a functionality you plan to include in the future? Thank you.

Is it legal to use OpenAI output to compile a smaller model?

The compilation step is very neat and I love the DSP paradigm; I was planning to implement it internally. But we are not sure about a legal consideration. According to OpenAI's terms of service,

(iii) use output from the Services to develop models that compete with OpenAI;

is not allowed.

Does this mean that it is fine to use outputs from GPT-4 to fine-tune a smaller open-source model that is used internally, or that powers a product which in no way competes with OpenAI? Or is that disallowed as well?

The intro.ipynb Colab fails

Hi, it seems that the Colab fails to install dsp-ml.

try: # When on google Colab, let's clone the notebook so we download the cache.
    import google.colab
    !git -C dsp/ pull || git clone https://github.com/stanfordnlp/dsp
except: pass

!pip install -U dsp-ml

Already up to date.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting dsp-ml
Using cached dsp-ml-0.1.4.tar.gz (18 kB)
Preparing metadata (setup.py) ... done
Collecting backoff
Using cached backoff-2.2.1-py3-none-any.whl (15 kB)
Requirement already satisfied: joblib in /usr/local/lib/python3.9/dist-packages (from dsp-ml) (1.2.0)
Collecting jupyter
Using cached jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)
Collecting openai
Using cached openai-0.27.1.tar.gz (57 kB)
Installing build dependencies ... done
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.


Feature: add custom HF-models as LLMs

Hi team,
Do you have any plans to support locally hosted models, not only APIs? Can we serve an LLM like Flan-T5-XXL ourselves and query it for free, instead of the OpenAI endpoint being the only available option?

According to this blog post from the Yahoo/Vespa search team, even a 3B model can deliver a performance boost when prompted properly, and it is accessible to individuals like me and the community.

IMO this change also requires some code/structure refactoring, and it's not a one-evening task.
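To make the request concrete, here is a minimal sketch of the kind of wrapper I'm imagining, assuming the LM abstraction only needs a callable that maps a prompt to a list of completions. The class name and interface are my guesses, not the actual DSP API; only the Hugging Face pipeline call itself is standard.

from transformers import pipeline

class LocalHFModel:
    """Hypothetical local LM wrapper around a Hugging Face seq2seq model (a sketch, not part of DSP)."""

    def __init__(self, model_name="google/flan-t5-xl", max_new_tokens=256, device=-1):
        # 'text2text-generation' covers encoder-decoder models such as Flan-T5.
        self.generator = pipeline("text2text-generation", model=model_name, device=device)
        self.max_new_tokens = max_new_tokens

    def __call__(self, prompt, n=1, **kwargs):
        outputs = self.generator(prompt, max_new_tokens=self.max_new_tokens,
                                 num_return_sequences=n, do_sample=n > 1, **kwargs)
        return [out["generated_text"] for out in outputs]

# lm = LocalHFModel("google/flan-t5-xl")
# print(lm("Answer the question: What is the capital of France?"))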

`dsp.Example` class not compatible with multiprocessing

I am trying to use DSP in an ipynb I wrote that sends hundreds of prompts to the OpenAI API. Because of the volume, I previously used Pool from the multiprocessing library to parallelize my requests. With DSP, however, I am not able to do this because the prompts, which are represented by the dsp.Example class, are not pickleable (since the __getstate__ and __setstate__ methods are undefined) and thus not compatible with Pool.

Without multiprocessing, making these requests to the OpenAI API takes 10-15 minutes instead of seconds.

I've created this gist with code from the dsp intro.ipynb to illustrate my use case and reproduce the error:
https://colab.research.google.com/gist/danielmachlab/fc79ce5d7e8eb7c505ea53ae56066253/knn_example.ipynb#scrollTo=vjtdEHWa19hD

The solution to this issue should be to define the __getstate__ and the __setstate__ methods.
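For illustration, here is a minimal sketch of what I mean, assuming Example is essentially a container for its instance attributes (the real internals of dsp.Example may differ):

class Example:
    # ... existing dsp.Example behavior ...

    def __getstate__(self):
        # Return a plain, pickleable snapshot of the instance's attributes.
        return dict(self.__dict__)

    def __setstate__(self, state):
        # Restore the attributes when unpickling, e.g. inside multiprocessing.Pool workers.
        self.__dict__.update(state)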

Can DSP Be Trained On JSON

Thank you for open-sourcing your DSP research. Quick question: can the DSP framework be used on JSON data, for example to translate an OpenAPI 3.0.0 spec into another format, or does it only work with text?
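For context, what I had in mind is simply serializing the JSON into the text fields of an example, roughly like the sketch below; the field names are my own illustration, not something DSP prescribes.

import json
import dsp

openapi_spec = {"openapi": "3.0.0", "info": {"title": "Petstore", "version": "1.0.0"}, "paths": {}}

# Treat the serialized JSON as ordinary text inside the example's fields.
example = dsp.Example(
    source_spec=json.dumps(openapi_spec, indent=2),
    target_format="a description of the desired output format",
)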

Again, I appreciate your effort, and thank you for showcasing your work.

How to replicate the results in the paper

Hi, thanks for the amazing work! Could you please share the data you used in the paper and provide more details on how to replicate its results, if possible?

DSP is a rather ambiguous name

I don't care too much, but thought I would point out that DSP has a well-established meaning (digital signal processing) in a field not too far removed from NLP.

Bypassing the context length / max token limit of LLMs using DSPy for in-context self-instruct

I have tried using Llama 2 to generate synthetic data for self-instruct. Unfortunately, my prompts are long, and the prompt/response combination from the Llama 2 13B chat model constantly exceeds the 4096-token limit.

Is there any way to bypass this limitation using DSPy with the Llama 2 model? Should I be using the chat model, or another model, to be able to do in-context self-instruct with DSPy and Llama 2?

Are there any examples in the DSPy documentation that I can refer to?
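In case it helps clarify what I'm trying to do: my only workaround so far is to truncate the prompt myself before calling the model, roughly as in the sketch below. It assumes I assemble the prompt string myself and uses the Llama 2 tokenizer purely for counting (the checkpoint is gated, so any compatible tokenizer would do); it is not a DSPy feature that I know of.

from transformers import AutoTokenizer

# Any tokenizer compatible with the serving model works here; this checkpoint requires access approval.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-chat-hf")
MAX_PROMPT_TOKENS = 3000  # leave headroom for the response within the 4096-token window

def truncate_prompt(prompt, max_tokens=MAX_PROMPT_TOKENS):
    # Keep only the first max_tokens tokens of the prompt.
    ids = tokenizer(prompt, truncation=True, max_length=max_tokens)["input_ids"]
    return tokenizer.decode(ids, skip_special_tokens=True)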

What is the best way to select train questions?

Suppose I have a question which requires the answer to be generated by examining multiple sections of a document.

What kind of training questions should I include?

Does it need to be a mix of direct-answer questions and ones that require examining multiple sections?

Are there any best practices for preparing the training Q/A pairs, especially when using Program 5 in the intro notebook? Thanks in advance.

How to use the HF client

Hi, I hope this is the correct place to ask, but I can't figure out how to run DSP with the HF client. So far I have managed to get the HF server running, but I can't figure out how to call it from the client.

I start the server with:
python -m dsp.modules.hf_server --port 4242 --model "google/flan-t5-base"

With curl everything is fine:
Client:
curl -d '{"prompt":"What is the answer to life, the universe, and everything?"}' -X POST "<my-server-ip>" -H 'Content-Type: application/json'

{"prompt":"What is the answer to life, the universe, and everything?","choices":[{"text":"atoms"}],"latency":22557.142734527588}

Server:

#> Response: "{'prompt': 'What is the answer to life, the universe, and everything?', 'choices': [{'text': 'atoms'}]}"
INFO:     <ip>:0 - "POST / HTTP/1.1" 200 OK

Afterwards, I tried to use the HF client as in the intro notebook.

Client:

>import dsp

>colbert_server = 'http://ec2-44-228-128-229.us-west-2.compute.amazonaws.com:8893/api/search'
>lm = dsp.HFModelClient(port="4242", model="google/flan-t5-base", url="<my-server-ip>")
>rm = dsp.ColBERTv2(url=colbert_server)

>dsp.settings.configure(lm=lm, rm=rm)
> lm._generate(prompt="What is 5+7?")
{'detail': 'Not Found'}

Server:
INFO: 141.56.132.131:0 - "POST /%3A HTTP/1.1" 404 Not Found
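My guess, and it is only a guess, is that the 404 on POST /%3A means the client builds the request URL by joining the url and port arguments, so my scheme-less "<my-server-ip>" ends up producing a malformed path. What I plan to try next is passing the same address that worked with curl, scheme included:

import dsp

# Assumption on my part: give HFModelClient the full address, including the scheme, and the port as a number.
lm = dsp.HFModelClient(model="google/flan-t5-base", url="http://<my-server-ip>", port=4242)
dsp.settings.configure(lm=lm, rm=dsp.ColBERTv2(url=colbert_server))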

curious about the prompt layout for language models

Hi okhat, thanks for open-sourcing this! I have two questions, if you would like to help:

  1. When I was playing with the demo, I noticed that the actual prompt string (fed into the language model) contains symbols like ${} and ---.
    I am new to this area; is that some special usage of the model?
    The layout of the references in the Context field also seems unfamiliar.
    As an example:
Write a search query that will help answer a complex question.

---

Follow the following format.

Context:
${sources that may contain relevant content}

Question: ${the question to be answered}

Rationale: Let's think step by step. Based on the context, we have learned the following. ${information from the context that provides useful clues}

Search Query: ${a simple question for seeking the missing information}

---

Context:
[1] «Right Back at It Again | at the Kerrang! Awards. Personnel per digital booklet. Right Back at It Again "Right Back at It Again" is the second track and the first single from A Day to Remember's fifth album, "Common Courtesy" (2013). In October 20, 2015, the song was featured in Activision rhythm-music game, "". Vocalist, Jeremy McKinnon wrote the lyrics, while the music was written by McKinnon, former guitarist Tom Denney, guitarist Neil Westfall and producer Andrew Wade. "Right Back at It Again" almost wasn't included on the album as it was one of the excess songs the band had recorded, "we realised that it»
[2] «Right Back at It Again | Right Back at It Again "Right Back at It Again" is the second track and the first single from A Day to Remember's fifth album, "Common Courtesy" (2013). In October 20, 2015, the song was featured in Activision rhythm-music game, "". Vocalist, Jeremy McKinnon wrote the lyrics, while the music was written by McKinnon, former guitarist Tom Denney, guitarist Neil Westfall and producer Andrew Wade. "Right Back at It Again" almost wasn't included on the album as it was one of the excess songs the band had recorded, "we realised that it sounded great, so on it went." "Right Back»

Question: Right Back At It Again contains lyrics co-written by the singer born in what city?
  2. Where do the QA pairs for in-context learning come from?
    You mentioned that, along with the instructions, a few QA pairs (in the variable train: list) would be helpful to define the task, but I am curious about how to select QA pairs to build such a list.
    In the demo, they were just given:
train = [('Who produced the album that included a re-recording of "Lithium"?', ['Butch Vig']),
         ('Who was the director of the 2009 movie featuring Peter Outerbridge as William Easton?', ['Kevin Greutert']),
         ('The heir to the Du Pont family fortune sponsored what wrestling team?', ['Foxcatcher', 'Team Foxcatcher', 'Foxcatcher Team']),
         ('In what year was the star of To Hell and Back born?', ['1925']),
         ('Which award did the first book of Gary Zukav receive?', ['U.S. National Book Award', 'National Book Award']),
         ('What city was the victim of Joseph Druces working in?', ['Boston, Massachusetts', 'Boston']),]

train = [dsp.Example(question=question, answer=answer) for question, answer in train]

Thanks a lot for your kind help!

How to set up the ColBERTv2 model on my own data?

Hi. In the notebooks, we can use a pre-set server for a ColBERT model that works on Wikipedia data, but I want to know how to do the same for my own set of documents. Can anybody please help?
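For anyone else with the same question, here is the rough sketch I pieced together from the ColBERT repository's README. The paths, index name, and config values are my own placeholders, and I have not verified that this is how the hosted Wikipedia server was built; serving the index behind an HTTP endpoint for dsp.ColBERTv2 would be a separate step.

# collection.tsv: one passage per line, formatted as "<pid>\t<passage text>"
from colbert import Indexer, Searcher
from colbert.infra import Run, RunConfig, ColBERTConfig

if __name__ == "__main__":
    with Run().context(RunConfig(nranks=1, experiment="my_docs")):
        config = ColBERTConfig(nbits=2, doc_maxlen=300)

        # Build the index from the public ColBERTv2 checkpoint on the Hugging Face Hub.
        indexer = Indexer(checkpoint="colbert-ir/colbertv2.0", config=config)
        indexer.index(name="my_docs.index", collection="collection.tsv", overwrite=True)

        # Query the freshly built index locally.
        searcher = Searcher(index="my_docs.index", config=config)
        pids, ranks, scores = searcher.search("my test query", k=3)
        for pid, rank, score in zip(pids, ranks, scores):
            print(rank, round(score, 1), searcher.collection[pid])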

Tool use?

It seems that DSP would be applicable to tool use.

Demonstrate:

  1. Successful instances of tool use

Search:

  1. Use tool

Predict:

  1. Utilizing output, generate grounded output

How to avoid repeated use of the demonstrate stage and save LLM calls

I am using Program 5 from the intro notebook for an experiment.

I have several hundred questions to answer, and I am using 5 train questions for demonstration.

I notice that, for each question, the demonstrate stage annotates answers again. Can this repeated work be avoided to save LLM calls in the demonstrate stage?

Can we run demonstrate once and reuse the demonstrations for the other questions as well?

Thanks.

Question 8: Server down

Hi, I think the ColBERTv2 server is down: when I try to run the programs with the RM using my own data set, the server doesn't connect. I wanted to ask whether the server will be permanently down.

Program with kNN and annotate requires too many tokens

Hi, I would like to ask how you keep within the token limit for Program 5 in intro.ipynb, because when I try to run it with my own questions it returns:
InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 456171 tokens. Please reduce the length of the messages.
