First of all: I wish all papers came with step by step reproductions and iterative imp

intro.ipynb: Trial and Errors (dspy issue, CLOSED)

stanfordnlp commented on July 3, 2024

intro.ipynb: Trial and Errors

Comments (3)

okhat commented on July 3, 2024

Thanks for summarizing these findings!

I just opened the intro notebook in Colab and selected "Run all", and it works just fine without an API key. Note that you'll need an API key if you make changes to the models or examples. The original content doesn't need one.

GPT-3.5 includes text-davinci-002, -003, and the turbo model. See, e.g., https://platform.openai.com/docs/models/gpt-3-5

As for the speed of runs, it seems that -003 (and possibly other models; I haven't checked) is experiencing some load issues or throttling. I'm seeing much higher response times than usual (maybe 7x or more), at least when generating many completions. You might want to try again later, or reduce the n=20 setting that's used for self-consistency.
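To make the n=20 trade-off concrete, here is a minimal sketch of self-consistency as a majority vote over sampled answers. It is plain Python, not DSP's implementation: `sample_answer` and `make_noisy_sampler` are hypothetical stand-ins for a sampled LLM completion, and the answer strings are made up for illustration.

```python
import random
from collections import Counter
from typing import Callable


def self_consistent_answer(
    sample_answer: Callable[[str], str], question: str, n: int = 20
) -> str:
    """Sample n answers and return the most frequent one (majority vote)."""
    votes = Counter(sample_answer(question) for _ in range(n))
    answer, _count = votes.most_common(1)[0]
    return answer


def make_noisy_sampler(seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: usually right, sometimes not."""
    rng = random.Random(seed)

    def sample(question: str) -> str:
        return "Paris" if rng.random() < 0.7 else rng.choice(["Lyon", "Nice"])

    return sample


if __name__ == "__main__":
    sampler = make_noisy_sampler(seed=0)
    # A smaller n means fewer completions to generate, at the price of a noisier vote.
    for n in (20, 5):
        print(n, self_consistent_answer(sampler, "What is the capital of France?", n=n))
```

Lowering `n` cuts the number of completions the model has to generate, which is why it helps when the endpoint is slow or throttled.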

As for your evaluations, it's not unreasonable to look at these scores for a weak signal, but keep in mind that there are only 13 examples, so a % score here means very little. What the examples are useful for is inspecting individual cases and seeing the patterns of mistakes in each program and LLM pair. We note in the notebook: "This tiny set is not meant to be a reliable benchmark, but it'll be instructive to use it for illustration." So don't read much into these scores.
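One way to see how little a percentage on 13 examples says is to put an approximate confidence interval around it. The sketch below uses the standard Wilson score interval; the count of 9 correct out of 13 is only an illustrative input, and the 95% level (z = 1.96) is an assumption of this example rather than anything the notebook computes.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate Wilson score interval for a binomial proportion."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


low, high = wilson_interval(9, 13)  # illustrative count, not a benchmark result
print(f"9/13 correct: plausible accuracy from {low:.1%} to {high:.1%}")
```

The interval spans several tens of percentage points, which is consistent with treating these scores as illustration only.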

If you're looking for a more reliable evaluation, check out the paper. It has tests at a much larger scale.

Anyway, I ran them for you with -003. Program 3 gets 46.2%, Program 4 gets 53.8%, and Program 5 gets 69.2%. Seems reasonable to me: all scores are within 1-2 questions of the davinci-002 runs, as you'd expect from an LLM upgrade that's reported to lower academic benchmark scores.
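For scale, the reported percentages can be mapped back to question counts. A small sketch, assuming the 13-example set mentioned above and that each score is simply correct/13 rounded to one decimal (the variable names are illustrative):

```python
TOTAL = 13  # size of the notebook's tiny dev set, per the thread

reported = {"Program 3": 46.2, "Program 4": 53.8, "Program 5": 69.2}


def implied_correct(pct: float, total: int = TOTAL) -> int:
    """Find the number of correct answers whose percentage rounds to pct."""
    for k in range(total + 1):
        if abs(100 * k / total - pct) < 0.05:
            return k
    raise ValueError(f"{pct}% does not match any count out of {total}")


implied = {name: implied_correct(pct) for name, pct in reported.items()}
per_question = round(100 / TOTAL, 1)  # percentage points moved by one question

print(implied)
print(f"one question = {per_question} percentage points")
```

Under that assumption, a difference of "1-2 questions" corresponds to roughly 8 to 15 percentage points.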

As for Program 2 going down by one example from Program 1 with -003, that's not necessarily surprising. Indeed, they perform very close to one another in the original -002 run anyway. Program 2 isn't something we advise you to use. It is the retrieve-then-read paradigm that we argue, in DSP, needs to be moved beyond. The goal of the notebook isn't to show some single best system; it's to show the scope of things you can implement in DSP, how it's done, why it makes sense, and what the tradeoffs are. That's why we don't have a UI demo of some cool program; we take you on the journey of building five different programs of varying complexity.
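For readers unfamiliar with the term, retrieve-then-read can be sketched as a two-stage pipeline: fetch passages for the question once, then hand them to a reader in a single step. The code below is a toy illustration in plain Python, not DSP's API or the notebook's Program 2. `retrieve`, `build_reader_prompt`, `retrieve_then_read`, and the tiny corpus are all hypothetical, with word-overlap ranking standing in for a real retriever and an arbitrary callable standing in for the LLM.

```python
from typing import Callable


def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank passages by word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda passage: len(q_words & set(passage.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_reader_prompt(question: str, passages: list[str]) -> str:
    """Format the retrieved passages and the question into one prompt."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def retrieve_then_read(
    question: str, corpus: list[str], reader: Callable[[str], str], k: int = 2
) -> str:
    """One retrieval step followed by one read step, with no iteration."""
    passages = retrieve(question, corpus, k)
    return reader(build_reader_prompt(question, passages))


if __name__ == "__main__":
    corpus = [
        "the eiffel tower is in paris",
        "bananas are yellow",
        "paris is the capital of france",
    ]
    # The reader here just echoes the prompt; a real one would call an LLM.
    print(retrieve_then_read("what is the capital of france", corpus, reader=lambda p: p, k=1))
```

The single retrieval followed by a single read is the fixed structure the reply describes; the later programs in the notebook are presented as going beyond it.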

Think of DSP like PyTorch. We're not telling you Program 2 is a better program than Program 1 (although, at large scale, it does work better on many tasks!). We're saying there's a natural program that can be built here with our primitives, and here's how it can be done. That's like a PyTorch tutorial saying here's how to build a CNN and here's how to build a Transformer. We're not endorsing the specific architecture of each program.

Hope this breakdown helps! Check out the compiler notebook too, while you're at it! Expect more large releases around the compiler quite soon.

oaustegard commented on July 3, 2024

My apologies: I realize a number of these issues came from my running the notebook from GitHub. I started running into similar issues with the compiler notebook, thought "this can't be right," ran it in the "native" environment instead, and can confirm your results. I had hoped to correct my findings before you noticed the errant issue-filing...

Again, I do wish every academic paper came with this type of iterative demonstration: now that I see Program 5 run to completion the results are very encouraging, and I'm surprised I haven't seen more discussion of this approach among the explosion of LangChain/LlamaIndex RAC use-cases and how-tos.

Closing this as a case of PEBCAK. Thanks again!

okhat commented on July 3, 2024

No worries at all! Always happy to help as issues arise. We do need to update a lot of the material and release more soon.
