Comments (5)

xukai92 commented on August 23, 2024

It's a known issue to us as well.

  • For now, for MT-Bench, we do the same thing (filtering out the -1 scores) when computing the average. That's okay because, if the -1 comes from a random API error, the average score is still unbiased (just noisier).
  • For this reason, we did track the error rates so that we know how noisy the average score is.
  • Perhaps a better way is to retry a few times (if there is stochasticity in either the answer generation or the teacher judgment) until we get a score that is not -1.
  • As for MT-Bench-Branch, since users would like to know the actual per-qna.yaml score, it's probably more important to get at least one score that is not -1.
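A minimal sketch of the filtering and error-rate tracking described in the bullets above. The function name `summarize_scores` and the sample scores are hypothetical and are not taken from the eval code; the only thing assumed from the thread is that a failed judgment is recorded as -1.

```python
from statistics import mean

ERROR_SCORE = -1  # sentinel for a failed judgment, as described in this thread


def summarize_scores(scores):
    """Return (average over valid scores, fraction of scores that were errors)."""
    if not scores:
        return None, 0.0
    valid = [s for s in scores if s != ERROR_SCORE]
    error_rate = (len(scores) - len(valid)) / len(scores)
    # If every judgment failed there is nothing to average.
    average = mean(valid) if valid else None
    return average, error_rate


# Hypothetical scores: two of the five judgments failed.
average, error_rate = summarize_scores([8, -1, 6, 7, -1])
print(f"average={average}, error_rate={error_rate:.0%}")
```

Reporting the error rate next to the average shows how many samples the average actually rests on.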

As it's on our radar anyway, I can take a stab at finding out why there is a -1 and how to make sure we get at least one meaningful score per Q&A.
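One way the retry idea could look, as a sketch only: `score_with_retries` and the `judge` callable are hypothetical stand-ins for whatever produces a score in the eval code, and the attempt limit of 3 is an arbitrary choice.

```python
ERROR_SCORE = -1  # sentinel for a failed judgment, as described in this thread


def score_with_retries(judge, question, max_attempts=3):
    """Call judge(question) until it returns a score other than -1.

    Returns (score, attempts_used). The score is still -1 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        score = judge(question)
        if score != ERROR_SCORE:
            return score, attempt
    return ERROR_SCORE, max_attempts


# Hypothetical judge that fails twice before returning a real score.
responses = iter([-1, -1, 9])
score, attempts = score_with_retries(lambda q: next(responses), "example question")
print(score, attempts)
```

Retrying only helps when the failure is stochastic. A deterministic failure would return -1 on every attempt, so the cause still needs to be tracked down.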

from eval.

danmcp commented on August 23, 2024

i can actually take a stab to see why there is -1 and how to make sure we can get at least one meaningful score per Q&A

The most common error I am seeing is hitting the 4096 context length limit. That gives an easy way to recreate the issue.

danmcp commented on August 23, 2024

This is the error vLLM gives if you set the context length larger than the model supports:

ValueError: User-specified max_model_len (5000) is greater than the derived max_model_len (max_position_embeddings=4096 or model_max_length=None in model's config.json). This may lead to incorrect model outputs or CUDA errors. Make sure the value is correct and within the model context size.

Raising max_position_embeddings in the model's config.json does get past the error, but the results are obviously no longer reliable. This is probably acceptable for testing, but it's not a great experience for a user who is just trying things out.
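As a rough sketch, a pre-flight check could compare the requested context length with the value in config.json before the server is started, so the user gets a clear message instead of editing the config. Both function names here are hypothetical and are not part of vLLM or the eval code; the only assumption about the file is that it is JSON with an optional `max_position_embeddings` key, as the error message above implies.

```python
import json
from pathlib import Path


def load_position_limit(config_path):
    """Read max_position_embeddings from a model's config.json (None if absent)."""
    config = json.loads(Path(config_path).read_text())
    return config.get("max_position_embeddings")


def validate_context_length(requested_len, position_limit):
    """Hypothetical check: reject a context length above the model's limit."""
    if position_limit is not None and requested_len > position_limit:
        raise ValueError(
            f"requested context length {requested_len} exceeds "
            f"max_position_embeddings={position_limit}"
        )
    return requested_len


# With the numbers from the error above, 5000 > 4096 is rejected up front.
try:
    validate_context_length(5000, 4096)
except ValueError as exc:
    print(exc)
```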

danmcp commented on August 23, 2024

Related: instructlab/instructlab#1615

danmcp commented on August 23, 2024

Resolved with #49 and instructlab/instructlab#1597
