When mt_bench gets an error in an OpenAI call, the current logic is to retry a few times…

API error handling ignores errors in results about eval HOT 5 CLOSED

danmcp commented on August 23, 2024

API error handling ignores errors in results

from eval.

Comments (5)

xukai92 commented on August 23, 2024

It's a known issue to us as well.

For now, for MT-Bench, we do the same thing (filtering out the -1 scores) when computing the average. That's okay because, if it's a random API error, the average score is still unbiased (just noisier).
For this reason, we track the error rates so that we know how noisy the average score is.
Perhaps a better approach is to at least retry a few times (if there is stochasticity in either the answer generation or the teacher judgment) until we get a score that is not -1.
As for MT-Bench-Branch, since users want to know the actual score per qna.yaml, it's probably more important to get at least one score that is not -1.
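The two strategies above (averaging only valid scores while tracking the error rate, and retrying until a judgment is not -1) can be sketched as follows. This is a minimal illustration, not the eval library's code: `ERROR_SCORE`, `mean_ignoring_errors`, and `judge_with_retries` are hypothetical names, and the judge is assumed to be any zero-argument callable that returns a score or -1 on failure.

```python
import statistics
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical sentinel: the thread describes failed judgments as a -1 score.
ERROR_SCORE = -1


def mean_ignoring_errors(scores: Iterable[float]) -> Tuple[Optional[float], float]:
    """Average only the valid scores and report the fraction that were errors."""
    scores = list(scores)
    if not scores:
        return None, 0.0
    valid = [s for s in scores if s != ERROR_SCORE]
    error_rate = (len(scores) - len(valid)) / len(scores)
    average = statistics.mean(valid) if valid else None
    return average, error_rate


def judge_with_retries(judge: Callable[[], float], max_attempts: int = 3) -> float:
    """Re-run a possibly stochastic judge until it returns a non-error score.

    Returns ERROR_SCORE only if every attempt failed.
    """
    score = ERROR_SCORE
    for _ in range(max_attempts):
        score = judge()
        if score != ERROR_SCORE:
            break
    return score


if __name__ == "__main__":
    # Two valid scores (8 and 6) and two errors.
    print(mean_ignoring_errors([8, -1, 6, -1]))  # (7, 0.5)

    # A fake judge that fails twice, then succeeds.
    outcomes = iter([-1, -1, 7])
    print(judge_with_retries(lambda: next(outcomes)))  # 7
```

Retrying only helps when the failure is random; a deterministic failure (for example, a prompt that never fits the model's context) returns -1 on every attempt, so the error rate is still worth reporting alongside the average.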

As it's on our radar anyway, I can take a stab at figuring out why there is a -1 and how to make sure we get at least one meaningful score per Q&A.

danmcp commented on August 23, 2024

> i can actually take a stab to see why there is -1 and how to make sure we can get at least one meaningful score per Q&A

The most common errors I am seeing come from hitting the 4096-token context length limit, which gives an easy way to recreate the issue.
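One way to surface context-length failures explicitly, instead of as an anonymous -1, is to check before sending a request whether the prompt plus the generation budget fits in the window. This is a hedged sketch: `check_fits`, `FitCheck`, and `whitespace_tokens` are hypothetical, and the whitespace "tokenizer" is only a stand-in; a real check would count tokens with the served model's own tokenizer.

```python
from dataclasses import dataclass
from typing import Callable

CONTEXT_LIMIT = 4096  # the context length mentioned in this thread


@dataclass
class FitCheck:
    fits: bool
    prompt_tokens: int
    requested_total: int


def check_fits(prompt: str, max_new_tokens: int,
               count_tokens: Callable[[str], int],
               context_limit: int = CONTEXT_LIMIT) -> FitCheck:
    """Check whether prompt tokens plus the generation budget fit in the window."""
    prompt_tokens = count_tokens(prompt)
    total = prompt_tokens + max_new_tokens
    return FitCheck(total <= context_limit, prompt_tokens, total)


def whitespace_tokens(text: str) -> int:
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""
    return len(text.split())


if __name__ == "__main__":
    # Recreate the failure: 4000 prompt "tokens" plus 512 new tokens exceeds 4096.
    long_prompt = "word " * 4000
    print(check_fits(long_prompt, max_new_tokens=512, count_tokens=whitespace_tokens))
    # FitCheck(fits=False, prompt_tokens=4000, requested_total=4512)
```

A caller could record `FitCheck` results next to each -1 score so the error-rate report distinguishes context overflows from transient API errors.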

danmcp commented on August 23, 2024

This is the error vLLM gives if you set the context length larger than the model supports:

ValueError: User-specified max_model_len (5000) is greater than the derived max_model_len (max_position_embeddings=4096 or model_max_length=None in model's config.json). This may lead to incorrect model outputs or CUDA errors. Make sure the value is correct and within the model context size.

Setting max_position_embeddings in the model config does get past the error, but the results are then no longer reliable. This is probably acceptable for testing, but it's not a great experience for a user who is just trying things out.
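To give users a clearer early failure than editing `max_position_embeddings`, a tool could compare the requested length against the model's `config.json` before starting the server. This is a simplified approximation that only looks at the two fields named in the vLLM error message above; it is not vLLM's own derivation, which is more involved. `derived_max_len`, `validate_max_model_len`, and `load_config` are hypothetical helpers, and taking the smaller of the two values is an assumption.

```python
import json
from pathlib import Path
from typing import Optional


def derived_max_len(config: dict) -> Optional[int]:
    """Return the context size a model config advertises, if any.

    Only checks the two fields named in the vLLM error message; takes the
    smaller one when both are set (an assumption, not vLLM's exact rule).
    """
    candidates = [config.get("max_position_embeddings"), config.get("model_max_length")]
    values = [v for v in candidates if isinstance(v, int)]
    return min(values) if values else None


def validate_max_model_len(requested: int, config: dict) -> None:
    """Raise a user-facing error if the requested length exceeds the model's."""
    limit = derived_max_len(config)
    if limit is not None and requested > limit:
        raise ValueError(
            f"Requested max_model_len ({requested}) exceeds the model's context "
            f"size ({limit}). Raising the limit in config.json avoids this error, "
            "but outputs beyond the trained length are unreliable."
        )


def load_config(model_dir: str) -> dict:
    """Read the model's config.json from a local model directory."""
    return json.loads((Path(model_dir) / "config.json").read_text())


if __name__ == "__main__":
    try:
        validate_max_model_len(5000, {"max_position_embeddings": 4096})
    except ValueError as err:
        print(err)
```

Failing with a message that names both numbers tells the user what to change, rather than leaving them to discover that scores silently became -1.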

danmcp commented on August 23, 2024

Related: instructlab/instructlab#1615

danmcp commented on August 23, 2024

Resolved with #49 and instructlab/instructlab#1597
