
[WIP] Major refactor roadmap (dspy) · OPEN · 57 comments

okhat avatar okhat commented on August 15, 2024 30
[WIP] Major refactor roadmap


Comments (57)

CyrusOfEden avatar CyrusOfEden commented on August 15, 2024 9

I'm for using as much Pydantic as we can here.


Neoxelox avatar Neoxelox commented on August 15, 2024 6

It would be great to have the same kind of LM abstraction for RMs.

I would create an RM class, like the existing LM class, that all the different third-party retriever models inherit from, instead of having them inherit from the Retrieve module. This would allow creating different advanced retrieval techniques as modules that inherit from the Retrieve module, which would use any RM transparently (it already does, but it is confusing because the RM is itself just another Retrieve module).

Something like ChainOfThought inheriting from Predict which uses an LM underneath.
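
As a rough sketch of that hierarchy (names are illustrative, not the current DSPy API): an RM base class would mirror LM, third-party retrievers would subclass it, and Retrieve (plus any more advanced retrieval module built on it) would use whichever RM is configured.

import dspy

class RM:
    """Hypothetical base class that third-party retrievers (Weaviate, Chroma, ...) subclass."""
    def __call__(self, query: str, k: int = 3) -> list[str]:
        raise NotImplementedError

class WeaviateRM(RM):
    def __call__(self, query: str, k: int = 3) -> list[str]:
        ...  # call the Weaviate client here and return k passages

class Retrieve(dspy.Module):
    """Uses whatever RM is configured, analogous to Predict using the configured LM."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.k = k

    def forward(self, query: str) -> list[str]:
        return dspy.settings.rm(query, k=self.k)  # assumes an RM was configured globally

A multi-hop or reranking module would then subclass Retrieve, just as ChainOfThought builds on Predict.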


okhat avatar okhat commented on August 15, 2024 6

Yes, RMs too, but RMs can honestly just be function calls, so it's easier for people to deal with them for now.

Actually, CoT and Predict should both be dspy.Modules. CoT shouldn't inherit from Predict; that's a bad old decision that we'll change.


CyrusOfEden avatar CyrusOfEden commented on August 15, 2024 4

#368 should get merged in (or at least, its tests should) before we embark on any major refactor, because we ought to have tests to ensure we don't introduce any unintended regressions.


S1M0N38 avatar S1M0N38 commented on August 15, 2024 4

Recently, Ollama released an OpenAI-compatible API. Other companies like Mistral AI also offer APIs that follow OpenAI specifications. Additionally, there are projects like LiteLLM that provide API-compatibility layers (e.g., using proxies).

So I think the LM abstraction could potentially just be a single thin wrapper around the OpenAI API specification. Is this a viable option?
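
For illustration, a minimal sketch of that idea, assuming the openai>=1.0 Python client: the same code can target Ollama's OpenAI-compatible endpoint or a LiteLLM proxy just by changing base_url (the URL and model name below are examples, not anything DSPy ships).

from openai import OpenAI  # assumes openai>=1.0

# Point the same client at any OpenAI-compatible server by swapping base_url,
# e.g. Ollama's local endpoint or a LiteLLM proxy.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

response = client.chat.completions.create(
    model="llama3",  # whatever model the local server exposes
    messages=[{"role": "user", "content": "Say hello."}],
    temperature=0.7,
)
print(response.choices[0].message.content)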


bencrouse avatar bencrouse commented on August 15, 2024 4

I'm new to this library. I'd love to see more support for production deployment; my prioritized wish list would be:

  • Deployment tools/docs (clarity for: how does this fit into your CI/CD?)
  • Async
  • Type annotations
  • Doc strings
  • Streaming


CyrusOfEden avatar CyrusOfEden commented on August 15, 2024 4

@bencrouse those are definitely on the roadmap — I think the focus right now is reliability / API / typed outputs / everything Omar originally mentioned and then afterwards we want to do some thinking through what it means to productionize this (async/streaming/deployment/etc.)


okhat avatar okhat commented on August 15, 2024 4

Yeah, I'm in favor of minimal abstractions around tools. I think a tool being just a function, whose docstring, arguments, and name can be used by ReAct, can achieve this.
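
To make that concrete (a sketch, not the ReAct implementation): everything needed to describe a tool to the LM can be read off a plain function with the standard inspect module.

import inspect

def search_wikipedia(query: str, k: int = 3) -> list[str]:
    """Search Wikipedia and return the top-k passage snippets for the query."""
    ...  # hypothetical retrieval call

# Name, docstring, and arguments are all a ReAct-style agent needs to present the tool.
print(search_wikipedia.__name__)            # search_wikipedia
print(inspect.getdoc(search_wikipedia))     # the docstring above
print(inspect.signature(search_wikipedia))  # (query: str, k: int = 3) -> list[str]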


okhat avatar okhat commented on August 15, 2024 4

Just merged a massive PR on this from the amazing @thomasahle : #451


AndreasMadsen avatar AndreasMadsen commented on August 15, 2024 4

Just my personal opinion, but I would much prefer seeing this module adopt an async-await programming model. All of the LLM calls are I/O bound, so the current thread model doesn't make much sense and is harder to debug. It's also much easier to go async -> sync (using asyncio.run()) than the other way around. This would also make it much simpler to throttle the number of parallel calls or load balance, since there is no need to share a counter/completion state between threads. Such refactors are often hard to do once a project has matured, so I hope you will consider it.
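
To make the throttling point concrete, a small sketch (not DSPy code) of the pattern being described: concurrency is capped with a semaphore rather than a counter shared across threads, and the whole thing is still callable from synchronous code with a single asyncio.run().

import asyncio

async def run_batch(prompts: list[str]) -> list[str]:
    sem = asyncio.Semaphore(8)  # at most 8 LM calls in flight

    async def call_lm(prompt: str) -> str:
        async with sem:
            await asyncio.sleep(0.1)  # stand-in for an I/O-bound HTTP request
            return f"completion for: {prompt}"

    return await asyncio.gather(*(call_lm(p) for p in prompts))

# Going async -> sync is one line:
results = asyncio.run(run_batch(["a", "b", "c"]))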


peteryongzhong avatar peteryongzhong commented on August 15, 2024 3

The coupling between DSPy and "tools", in the ReAct sense, should be as light as possible in my opinion. There could, however, be some annotations that make optimisation easier; for instance, to enable backtracking.

More generally, I feel like some of the coupling with other technologies like RMs may be a little too strong at the moment from a software engineering perspective. I understand the reason behind it, since it unlocks a lot of cool things like retrieveEnsemble, but it does feel a little specific at the moment.

My vague intuition is that it would be great if DSPy had more generic and well-defined boundaries when it comes to the backend components (hopefully backed by classes and types, and less so by magic strings/attributes). This comment might betray more of a personal dispreference against "duck" typing, but regardless, a more defined internal operating mechanism/schema could make future development of passes and features a lot less burdensome.


okhat avatar okhat commented on August 15, 2024 3

Async is definitely in popular demand and would be nice to have, but I don’t understand the claim about threading. Threads work great right now for throughput.


okhat avatar okhat commented on August 15, 2024 2

I disagree with that @krypticmouse . Predictions are already Examples anyway.


fearnworks avatar fearnworks commented on August 15, 2024 2

#392


ishaan-jaff avatar ishaan-jaff commented on August 15, 2024 2

Hi, I'm the litellm maintainer - what's missing in litellm to start using it in this repo? @CyrusOfEden @S1M0N38

Happy to help with any issues / feature requests - even minor ones


KCaverly avatar KCaverly commented on August 15, 2024 2

A question that's come up in a few different places.

With the LiteLLM integration, would we sunset all provider specific classes (OpenAI, Cohere etc), and direct everyone to use the LiteLLM interface?


krypticmouse avatar krypticmouse commented on August 15, 2024 1

Indeed, Predictions are basically Examples, though they do have the from_completion method that Examples don't. That doesn't make much of a difference, yes, but I thought it could've become a class method of a Pydantic model.

Not a major issue tbh, just a thought :)

Mostly just for better organization and readability.


CShorten avatar CShorten commented on August 15, 2024 1

@S1M0N38 you would still need the thin wrapper though to pass in optional arguments with kwargs.


S1M0N38 avatar S1M0N38 commented on August 15, 2024 1

@S1M0N38 you would still need the thin wrapper though to pass in optional arguments with kwargs.

@CShorten what are the optional kwargs that differ from provider to provider and are needed by DSPy? (e.g. I think temperature is one of those, needed to control the amount of prompt "exploration", isn't it?) Here, for example, are the input params that LiteLLM supports for different providers.
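
As an illustration of how a single call signature can paper over those differences (a sketch using LiteLLM, per the link above; the model names are just examples): the OpenAI-style kwargs are passed once, and LiteLLM translates or drops the ones a given provider doesn't support.

import litellm

litellm.drop_params = True  # silently drop params a provider doesn't support

for model in ["gpt-3.5-turbo", "command-nightly", "ollama/llama3"]:
    resp = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": "One-sentence summary of DSPy?"}],
        temperature=0.7,
        max_tokens=64,
    )
    print(model, resp.choices[0].message.content)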


CyrusOfEden avatar CyrusOfEden commented on August 15, 2024 1

@S1M0N38 yup, LiteLLM would be good for inference — and currently LMs don't have a .finetune method but we want that later.


CyrusOfEden avatar CyrusOfEden commented on August 15, 2024 1

@ishaan-jaff how does tool use work with models that don't necessarily support it?

Would be really cool if I could use LiteLLM for tool use for whatever model -- is there a table for tool use support somewhere?

Separately, is LiteLLM able to integrate something like Outlines to support tool use for models that don't natively support it?


CShorten avatar CShorten commented on August 15, 2024 1

Interesting question. I suspect digging into how dspy.ReAct implements the dspy.Retrieve tool could be a good start to understanding how to interface all tools, @CyrusOfEden.

Maybe this is the argument for why these tools should be integrated deeper into DSPy rather than used externally as calls in the forward pass (or there could be some kind of API contract with ReAct for passing in arbitrary functions as well).


ovshake avatar ovshake commented on August 15, 2024 1

@CShorten, @okhat Currently, in WeaviateRM, the Weaviate client is still from Weaviate v3. Are there any plans in the current roadmap to update it to Weaviate Python Client (v4)? If not, should we add it?


ovshake avatar ovshake commented on August 15, 2024 1

Thanks @CShorten , I will definitely take a stab at it and open a PR :)


isaacbmiller avatar isaacbmiller commented on August 15, 2024 1

@CyrusOfEden @okhat I started looking into a LiteLLM integration as a starting place for a nicer universal integration. We might be blocked from getting an OpenAI version running until we upgrade to OpenAI > v1.0.0 (#403 is merged). I'm still going to try to see if I can get an Ollama version to work.


okhat avatar okhat commented on August 15, 2024 1

Thanks @dharrawal, I wrote some thoughts on Discord:

  • I might be missing some of the point you're trying to make, but in the general case, it's not possible for people to specify a metric on each module.
  • The goal of a good optimizer (as in RL problems) is to figure out good intermediate supervision.
  • There are many ways to achieve that, some simple and others more complex, but the consistent thing is that optimizers will prefer intermediate steps that maximize the eventual metric at the end.
  • If you want to optimize each layer separately with a metric, you can just compile each module alone, but that's rarely the needed use case.


CShorten avatar CShorten commented on August 15, 2024 1

^ Big topic, I think -- I would love to at least be able to control max_bootstrapped_examples per module in optimization. For example, some tasks like reranking or summarizing 20 documents require more input tokens for few-shot examples than, say, query generation or yes/no / low-cardinality output routers (like routing between vector search and text-to-SQL, as a random example).

I think it's also related to the concept of taking a single component out, optimizing it separately on the incoming inputs and outputs, and plugging it back into the broader network with the compiled signature. But then I'm not sure how the new round of compilation impacts those examples -- although I suppose they are stored as predictor.demos, so the compiler probably does have some kind of interface in place.

Worked around this by setting ._compiled = True, for those interested.
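
For what it's worth, a hedged sketch of that kind of workaround (the per-module budgets are hypothetical; BootstrapFewShot, named_predictors, and predictor.demos are existing DSPy pieces, while my_metric, program, and trainset are assumed to exist): compile once, then trim each predictor's demos to its own cap.

from dspy.teleprompt import BootstrapFewShot

compiled = BootstrapFewShot(metric=my_metric, max_bootstrapped_demos=8).compile(
    program, trainset=trainset
)

budgets = {"rerank": 2, "generate_query": 8}  # hypothetical per-module caps
for name, predictor in compiled.named_predictors():
    predictor.demos = predictor.demos[: budgets.get(name, 4)]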


ishaan-jaff avatar ishaan-jaff commented on August 15, 2024 1

sent a discord request @CyrusOfEden


CyrusOfEden avatar CyrusOfEden commented on August 15, 2024 1

@okhat it's more for deployment —

when you're using threading to compile / run once it doesn't make all that big a difference

in production async uses way less memory


CyrusOfEden avatar CyrusOfEden commented on August 15, 2024 1

@AndreasMadsen I agree that async is the way to go to make DSPy more useful in production settings, and more elegant. In the meantime, might I recommend this way forward? I understand it's not 100% what you're looking for, but maybe in the meantime it unblocks you.

Here's an example of what you could do:

import asyncio

import dspy
from asgiref.sync import sync_to_async

program_a = ...  # your DSPy programs
program_b = ...

# Wrap the synchronous programs so awaiting them runs each call in a worker thread.
async_program_a = sync_to_async(thread_sensitive=False)(program_a)
async_program_b = sync_to_async(thread_sensitive=False)(program_b)

async def main(*args, **kwargs):
    with dspy.context(lm=x):  # x is whatever LM you've configured
        return await asyncio.gather(
            async_program_a(*args, **kwargs),
            async_program_b(*args, **kwargs),
        )

a, b = asyncio.run(main())

I like (and share) your idea of passing the LM directly to the program, because right now the code above is (probably) not compatible with using different LMs per concurrent program. I'll noodle on that further as part of the backend refactor :-)


krypticmouse avatar krypticmouse commented on August 15, 2024

Sounds perfect! I was wondering if we can shift Example, Prediction and Completions classes to Pydantic.

Tensors are the only dtype in PyTorch, and Examples could similarly be the one dtype here; internally, everything the other two do could be wrapped in class methods.

This would be a way bigger migration, though, and possibly not even backwards compatible, so we might want to think on this.


denver-smartspace avatar denver-smartspace commented on August 15, 2024

Reliably keeping within token limits when compiling and running programs, without having to set the module config for a specific LM, is big for me to be able to deploy this to production. IMO, ideally the config could stay pretty much the same if you move from a 32k context to an 8k context: you'd just recompile, and it'd automatically use fewer or shorter demos and whatever else it needed to.

My initial thoughts are that this has two main elements:

  1. Add something like an estimate_tokens method to LM. It'd take the same arguments as an LM call but would just return the number of tokens that would be used if you actually called it. Same idea as a "what if" in infrastructure deployments: it takes the same parameters but doesn't run anything, it just tells you what it'd do if you actually ran it. (A rough sketch follows below.)
  2. Make use of the new estimate_tokens method when compiling to stay within token limits.

The distinction between the two elements is because it's not just for compiling that it'd be useful. When we create a module or program, it'd be good to be able to estimate tokens so you can do things like limit the amount of context given by retrieval.
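
A rough sketch of element 1 (purely illustrative, not a DSPy API; assumes tiktoken and an OpenAI-style model):

import tiktoken

class EstimatingLM:
    """Hypothetical LM wrapper with a 'what if' token estimate."""

    def __init__(self, model: str = "gpt-3.5-turbo", max_context: int = 8192):
        self.model = model
        self.max_context = max_context
        self.encoding = tiktoken.encoding_for_model(model)

    def estimate_tokens(self, prompt: str, max_tokens: int = 256) -> int:
        # Same arguments as a real call, but nothing is sent to the provider.
        return len(self.encoding.encode(prompt)) + max_tokens

    def fits(self, prompt: str, max_tokens: int = 256) -> bool:
        return self.estimate_tokens(prompt, max_tokens) <= self.max_context

A compiler could then call estimate_tokens before adding another demo, and a retrieval module could use fits to cap how many passages it includes.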


peteryongzhong avatar peteryongzhong commented on August 15, 2024

I want to echo your point, @okhat, about the instructions/prompts in modules. I think right now they are a little spread out in the code as strings in various places that are sometimes appended together. If that could be elevated in terms of abstractions and/or made clearer, it might even make it easier to analyse a module and potentially perform some interesting transformations on it later down the line. I don't quite think we need to go as far as the prompting-first abstractions that LangChain offers, and prompting is not something we can completely divorce this from, but handling it in a more organised fashion that allows for future analysis could be useful.


thomasahle avatar thomasahle commented on August 15, 2024

Integrating 4 (optimizers) into the thinking early on might be necessary, since they are what put the biggest strain on the API. We need to think about what features they require to be available, such as:

  • Changing the signatures
  • Storing traces/examples
  • Reconfiguring the LMs

and so on, and then find the simplest Predict classes etc. that satisfy those (a rough sketch of that surface area follows below).

Assertions are another example of an "involved" feature that needs a lot of support, but hopefully not a lot of special casing. Right now there's the whole new_signature keyword argument that gets used sometimes, and it seems to have been introduced specifically for Retry to use.
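
Purely as a discussion aid, a sketch of that surface area as a typing.Protocol (the attribute names are illustrative, loosely mirroring what Predict exposes today):

from typing import Any, Protocol

class OptimizablePredictor(Protocol):
    signature: Any        # optimizers rewrite instructions / fields here
    demos: list[Any]      # optimizers store bootstrapped traces/examples here
    lm: Any               # optimizers may reconfigure or swap the LM

    def __call__(self, **kwargs: Any) -> Any:
        ...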


CShorten avatar CShorten commented on August 15, 2024

Hey team, some notes so far:

  1. Backend refactor sounds great!
  2. Indeed, this is an interesting one.
  3. Can’t comment on how this is currently configured.
  4. Awesome, certain the team you’ve put together will come up with something interesting for this! Already super love the BayesianSignatureOptimizer.
  5. Ah fantastic, sorry for the delay here — will touch up on the WeaviateRM.

I like the idea of extending RMs to be more than function calls, but I do think that interfacing, for example, the Weaviate Python client with the module's forward pass will probably work fine for a while.

Keeping within token limits sounds amazing. The LMs have an internal max_tokens state that you could probably just multiply by the upper bound on the number of calls in your module's forward pass. Compiling is another story; I don't know enough about DSPy yet to comment on it.

Still have a couple more responses to read; will update with mentions.


CyrusOfEden avatar CyrusOfEden commented on August 15, 2024

I'll try to kick off the backend refactor Saturday, if not, @isaacbmiller is down to have the first part ready by Tuesday–Wednesday of next week


CyrusOfEden avatar CyrusOfEden commented on August 15, 2024

@S1M0N38 @CShorten just came across LiteLLM today — and it seems like a home run for inference (not fine-tuning). Am I missing anything?


S1M0N38 avatar S1M0N38 commented on August 15, 2024

... just came across LiteLLM today — and it seems like a home run for inference (not fine-tuning). Am I missing anything?

@CyrusOfEden I believe you're correct, but upon examining the code in dsp/modules/[gpt3|cohere|ollama].py, it appears that the only requests being made are HTTP requests to the inference endpoint, namely /completion, chat/completion, /api/generate, api/chat, etc. These are all inference requests for text. Could you elaborate on the fine-tuning you mentioned?

I'm not entirely familiar with the inner workings and requirements of this framework, so everything I've mentioned may not be feasible. Therefore, please take my statements with a grain of salt. In my opinion, for a project like this, it's best to focus on core concepts rather than implementing numerous features. The idea is to defer those to other libraries or leave them to the user to implement, guided by high-quality documentation.


CShorten avatar CShorten commented on August 15, 2024

@S1M0N38 I think one is the way multiple generations are sampled -- for example, Cohere has num_generations but the google.generativeai API has no such option. There are probably little nuances like this, but the chart you shared is great; I trust your judgment on this.


buzypi avatar buzypi commented on August 15, 2024

+1 for supporting all OpenAI-compatible local LLM servers for inference, not just Ollama. I think this will increase adoption, because a lot of "application developers" of LMs who are not ML experts use tools like LM Studio, GPT4All, etc.


CyrusOfEden avatar CyrusOfEden commented on August 15, 2024

Hi, I'm the litellm maintainer - what's missing in litellm to start using it in this repo? @CyrusOfEden @S1M0N38

Happy to help with any issues / feature requests - even minor ones

Tbh I think you're good for now, great to have your support 🔥


CyrusOfEden avatar CyrusOfEden commented on August 15, 2024

Yeah I’m in favor of minimal abstractions around tools. I think a tool being just a function whose docstring and arguments and name can be used by ReAct can achieve this

LiteLLM is on it — really liking what I'm finding in this repo — they have a util for converting a Python function to a JSON schema tool definition [0]

[0] https://litellm.vercel.app/docs/completion/function_call#litellmfunction_to_dict---convert-functions-to-dictionary-for-openai-function-calling


ishaan-jaff avatar ishaan-jaff commented on August 15, 2024

@ishaan-jaff how does tool use work with models that don't necessarily support it?

We switch on JSON mode if the provider supports it, add the function / tools to the prompt, and then get a response. You would have to explicitly enable this: litellm.add_function_to_prompt = True

Does this answer your question?
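
For concreteness, a sketch of that flow (litellm.add_function_to_prompt is the flag described above; the model and function schema are made up for illustration):

import litellm

litellm.add_function_to_prompt = True  # inject function defs into the prompt for non-native models

response = litellm.completion(
    model="ollama/llama3",  # a model without native tool calling, as an example
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    functions=[{
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
)
print(response.choices[0].message.content)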


CShorten avatar CShorten commented on August 15, 2024

Hey @ovshake! Yes, planning on updating that soon, but please feel free if you're interested -- I would really, really appreciate it!

On a side note related to the WeaviateRM, @okhat: I am preparing a quick set of demos on the discussion above about how to use Weaviate as an external tool and how to then interface the functions with docstrings for ReAct. Want to get this nugget in there for the docs.


younes-io avatar younes-io commented on August 15, 2024

Hello,
Could we also add to the roadmap a feature that allows loading a compiled/optimized module along with its assertions and inferences?
If that's already supported, please share a notebook. I can propose a notebook if I get the code snippet for this.
Thank you!


CyrusOfEden avatar CyrusOfEden commented on August 15, 2024

@ishaan-jaff can you add me on Discord? cyrusofeden

Wanna chat about this DSPy integration


collinjung avatar collinjung commented on August 15, 2024

Hi, Omar asked me to tag you, @CyrusOfEden. Here is a notebook where I use LangChain's GPT model and am able to use it with the native DSPy Predict function: [Google Colab]. Please let me know if you have questions.


dharrawal avatar dharrawal commented on August 15, 2024

If the trainset and metric function were tied to the module (via the predict function for each module), the teleprompters could optimize each and every layer (module) in a multi-module app. Off the top, this does not seem like a difficult thing to implement.

I realize assertions and suggestions are a step in this direction, but I don't think they are a replacement for optimizing each layer individually.


mgbvox avatar mgbvox commented on August 15, 2024

@okhat @CyrusOfEden New to DSPy and looking to contribute, but the bounty board is empty and all the discussion seems to be happening here, so I thought I'd tag you directly. How might I actually contribute here?


peteryongzhong avatar peteryongzhong commented on August 15, 2024

@mgbvox agreed with the static typing aspects!


chris-boson avatar chris-boson commented on August 15, 2024

Even better would be just using enums and other nested Pydantic objects. This works, but it should ideally just implicitly call the typed versions when annotations are available, and use function calling to generate structured outputs.

from enum import Enum

import dspy
from dspy.functional import TypedPredictor  # import path may differ across versions


class EmotionType(Enum):
    sadness = "sadness"
    joy = "joy"
    love = "love"
    anger = "anger"
    fear = "fear"
    surprise = "surprise"


class Emotion(dspy.Signature):
    sentence: str = dspy.InputField()
    sentiment: EmotionType = dspy.OutputField()


sentence = "i started feeling a little vulnerable when the giant spotlight started blinding me"

classify = TypedPredictor(Emotion)
classify(sentence=sentence)
# -> Prediction(
#        sentiment=<EmotionType.fear: 'fear'>
#    )


thomasahle avatar thomasahle commented on August 15, 2024

@chris-boson You can already do this now, right?
I think Cyrus's backend work will allow types to go through function calling, schema APIs, or whichever way the underlying model prefers.


chris-boson avatar chris-boson commented on August 15, 2024

@chris-boson You can already do this now, right? I think Cyrus backend work will allow types to go through function calling, or schema APIs or whichever way the underlying model prefers.

@thomasahle Yes, it works, just a bit cumbersome to explicitly call the typed versions. Good to know we're working on integrating function calling!

Generally, having type annotations / Pydantic flow through the entire stack would make it significantly more useful when interacting with APIs or other traditional software systems where structured output is important. Also, types constrain the problem space and can be checked with something like mypy to uncover many issues early. I think that would mesh very well with the idea of "programming" LLMs. I think the way instructor is going looks very promising.


AndreasMadsen avatar AndreasMadsen commented on August 15, 2024

@okhat I think you are assuming a single LLM server, in which case each thread makes one call to the server at a time, and you can easily synchronize the throttle. However, if you are making requests to different LLM servers, that won't work. That becomes relevant both for load balancing and when different models are used in the same pipeline.

Consider this example. In the run_sync case, summary and reasoning are computed one after the other, when they could be computed simultaneously. Running them simultaneously is hard to do with the thread model, but easy with the async-await model (I removed the with statements for simplicity, but you could keep them while still using async-await). You also don't need to worry about thread safety, etc., since there is just a single thread.

Of course, summary + reasoning and sentiment are still sequential. So to fully saturate the inference server, the number of parallel tasks (run_async) would need to be greater than the throttle threshold, and the throttle needs to be implemented in the Client, not the Evaluator, but that would also be the case in the thread model. Additionally, this would enable different throttles for different clients.

import asyncio

import dspy

# bart, llama, t5 are LM clients configured elsewhere; `sentence` is the input text.
summary = dspy.Predict('sentence -> summary')
reasoning = dspy.Predict('sentence -> reasoning')
sentiment = dspy.Predict('summary, reasoning -> sentiment')

def run_sync():
  # summary and reasoning run one after the other, even though they are independent
  with dspy.context(lm=bart):
    s = summary(sentence=sentence)
  with dspy.context(lm=llama):
    r = reasoning(sentence=sentence)
  with dspy.context(lm=t5):
    return sentiment(summary=s.summary, reasoning=r.reasoning)

async def run_async():
  # hypothetical async API: predictors are awaitable and accept an lm directly
  s, r = await asyncio.gather(
    summary(sentence=sentence, lm=bart),
    reasoning(sentence=sentence, lm=llama)
  )
  with dspy.context(lm=t5):
    return await sentiment(summary=s.summary, reasoning=r.reasoning)

As a side note, threads are also hard to debug, because you can get intermingled print statements. That is never the case with async-await.


drawal1 avatar drawal1 commented on August 15, 2024

Modules is an area that needs refactoring, for sure. The problem is that dspy modules conflate the hosting provider (Bedrock, Sagemaker, Azure, ...) with the model (Mixtral8x7B, Haiku, ...).

A related issue is that DSPy prompts are currently model-agnostic, but the best results do require a model-aware prompt. People have pointed this out on Discord or come up with various hacks to simulate system prompts etc.

Is this work in scope? I see the LiteLLM posts here, which seem related but not quite the same thing.

The simplest solution here would be for modules to be per-model and take a "hosting provider" object in the constructor, OR for modules to be per-hosting-provider and take a "model" object in the constructor.
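
A sketch of the second shape (hypothetical classes, not existing DSPy ones), where the model and the hosting provider are separate objects and the provider takes a model in its constructor (the Bedrock model ID is just an example):

class Model:
    """Describes the model itself: name, context window, prompt/chat template, ..."""
    def __init__(self, name: str, context_window: int, chat_template: str | None = None):
        self.name = name
        self.context_window = context_window
        self.chat_template = chat_template

class Bedrock:
    """Hosting provider: owns auth and transport, takes a Model in the constructor."""
    def __init__(self, model: Model, region: str = "us-east-1"):
        self.model = model
        self.region = region

    def __call__(self, prompt: str, **kwargs) -> str:
        ...  # format prompt using self.model's template, then call the Bedrock API

mixtral = Model(name="mistral.mixtral-8x7b-instruct-v0:1", context_window=32000)
lm = Bedrock(model=mixtral)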


KCaverly avatar KCaverly commented on August 15, 2024

To catch this thread up, we've got new backend infrastructure up and basically ready to review/merge. This should offer the following:

  • LiteLLM integration, 100+ models out of the box.
  • Global caching for lm calls.
  • Chat mode and dynamic prompting.
  • JSON mode for applicable models.
  • A modularized, type-safe, and tested API to build from.


drawal1 avatar drawal1 commented on August 15, 2024

@KCaverly - Arize Phoenix depends on all LM classes being under the 'dsp' module. Something to consider testing before the merge.

