
Comments (3)

okhat commented on July 3, 2024

Thanks for summarizing these findings!

I just opened the intro notebook in Colab and selected "Run all", and it works just fine without an API key. Note that you'll need an API key if you make changes to the models or examples. The original content doesn't need one.

GPT-3.5 includes text-davinci-002, -003, and the turbo model. See, e.g., https://platform.openai.com/docs/models/gpt-3-5

About the speed of runs, it seems that -003 (and possibly other models, I haven't checked) is experiencing some load issues or throttling. I'm seeing much higher response times than usual (maybe 7x slower or more), at least when generating many completions. You might want to try again later, or reduce the n=20 setting that's doing self-consistency.
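For context, self-consistency samples n completions per question and keeps the most frequent answer, so lowering n means fewer completions to generate. Here is a minimal sketch of just the voting step in plain Python. It is not the notebook's code; `majority_answer` and the sample strings are hypothetical stand-ins for model completions.

```python
from collections import Counter

def majority_answer(completions):
    """Return the most frequent answer after light normalization.

    Ties go to the answer that appeared first. Returns None if nothing usable was sampled.
    """
    normalized = [c.strip().lower() for c in completions if c.strip()]
    if not normalized:
        return None
    return Counter(normalized).most_common(1)[0][0]

# Hypothetical sampled answers standing in for n model completions.
samples = ["Paris", "paris ", "Lyon", "Paris", "Marseille"]
print(majority_answer(samples))
```

With fewer samples the vote is noisier, so reducing n trades some robustness for speed.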

As for your evaluations, it's not unreasonable to look at these scores for some weak signal, but keep in mind that there are only 13 examples, so a percentage score here means very little. Their value is in inspecting individual examples and seeing the patterns of mistakes for each program and LLM pair. We note in the notebook: "This tiny set is not meant to be a reliable benchmark, but it'll be instructive to use it for illustration." So don't read much into these scores.

If you're looking for a more reliable evaluation, check out the paper. It has tests at a much larger scale.

Anyway, I ran them for you with -003. Program 3 gets 46.2%, Program 4 gets 53.8%, and Program 5 gets 69.2%. Seems reasonable to me: all scores are within 1-2 questions of the davinci-002 runs, as you'd expect from an LLM upgrade that's reported to lower academic benchmark scores.
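To make the "1-2 questions" point concrete, here is the arithmetic on a 13-example set as a quick sketch. The correct-answer counts (6, 7, 9) are inferred from the percentages above, not taken from the notebook output.

```python
N_EXAMPLES = 13  # size of the tiny dev set discussed above

def pct(correct, total=N_EXAMPLES):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)

# Counts inferred from the reported scores: 46.2%, 53.8%, 69.2%.
for program, correct in (("Program 3", 6), ("Program 4", 7), ("Program 5", 9)):
    print(program, correct, pct(correct))

# A single question is worth this many percentage points.
one_question = round(100 / N_EXAMPLES, 1)
print(one_question)
```

One question is worth about 7.7 points here, so a swing of 1-2 questions moves the score by roughly 8 to 15 points.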

As for Program 2 going down by one example from Program 1 with -003, that's not necessarily surprising. Indeed, they perform very close to one another in the original -002 run anyway. Program 2 isn't something we advise you to use. It's the retrieve-then-read paradigm that we argue DSP needs to go beyond. The goal of the notebook isn't to show some single best system; it's to show the scope of things you can implement in DSP, how it's done, why it makes sense, and what the tradeoffs are. That's why we don't have a UI demo of some cool program; we take you on the journey of building five different programs of varying complexity.
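For readers unfamiliar with the term, retrieve-then-read means fetching passages once and handing them to the model in a single prompt. Here is a toy sketch of that shape in plain Python. It does not use the DSP API: `retrieve` and `build_prompt` are hypothetical helpers, the retriever is a naive word-overlap ranker, and the passages are made up for illustration.

```python
def retrieve(question, passages, k=1):
    """Toy retriever: rank passages by word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, context):
    """Assemble the single prompt a reader model would receive."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"

# Made-up passages for illustration only.
passages = [
    "The Eiffel Tower is in Paris.",
    "Mount Fuji is the tallest mountain in Japan.",
]
question = "Where is the Eiffel Tower?"
context = retrieve(question, passages, k=1)
prompt = build_prompt(question, context)
print(prompt)
```

The pipeline is fixed at one retrieval followed by one read, which is the limitation the more elaborate programs in the notebook are meant to address.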

Think of DSP like PyTorch. We're not telling you Program 2 is a better program than Program 1 (although, at large scale, it does work better on many tasks!). We're saying there's a natural program that can be built here with our primitives, and here's how it can be done. That's like a PyTorch tutorial saying here's how to do a CNN and here's how to do a Transformer. We're not endorsing the specific architecture of each program.

Hope this breakdown helps! Check out the compiler notebook too, while you're at it! Expect more large releases around the compiler quite soon.

from dspy.

oaustegard commented on July 3, 2024

My apologies: I realize a number of these issues are due to me running the notebook from GitHub. I started running into similar issues with the compiler notebook, then thought "this can't be right" and ran it in the "native" environment instead, and I can confirm your results. I had hoped to correct my findings before you noticed the errant issue-filing...

Again, I do wish every academic paper came with this type of iterative demonstration: now that I see Program 5 run to completion the results are very encouraging, and I'm surprised I haven't seen more discussion of this approach among the explosion of LangChain/LlamaIndex RAC use-cases and how-tos.

Closing this as a case of PEBCAK. Thanks again!


okhat commented on July 3, 2024

No worries at all! Always happy to help as issues arise. We do need to update a lot of the material and release more soon.
