[QUESTION] Interpretation and comparison of COMET scores across several languages (closed)

clairehua1 commented on September 28, 2024

[QUESTION]Interpretation and comparison of COMET scores across several languages

from comet.

Comments (5)

ricardorei commented on September 28, 2024

Hi @clairehua1,

You should avoid comparing scores between languages, and even between domains. This applies not just to COMET but to any MT metric.

For example, BLEU, even though it is lexical, depends heavily on the underlying tokenizer, so the results vary a lot between languages.
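To make the tokenizer point concrete, here is a minimal sketch. It is not BLEU itself, only the clipped unigram precision that BLEU builds on, and the sentences and both toy tokenizers are made up for illustration. The same hypothesis/reference pair gets a different score depending only on how the text is split.

```python
from collections import Counter

def clipped_precision(hyp_tokens, ref_tokens):
    """Clipped unigram precision: matched hypothesis tokens / total hypothesis tokens."""
    hyp_counts = Counter(hyp_tokens)
    ref_counts = Counter(ref_tokens)
    # Each hypothesis token is credited at most as often as it appears in the reference.
    matched = sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())
    return matched / len(hyp_tokens)

def word_tokenize(text):
    """Toy tokenizer 1: split on whitespace."""
    return text.split()

def char_tokenize(text):
    """Toy tokenizer 2: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]

# Hypothetical example sentences, not taken from any real test set.
hypothesis = "das ist ein Hausboot"
reference = "das ist ein Boot"

word_p = clipped_precision(word_tokenize(hypothesis), word_tokenize(reference))
char_p = clipped_precision(char_tokenize(hypothesis), char_tokenize(reference))
print(f"word-level: {word_p:.3f}  char-level: {char_p:.3f}")
```

The translation pair is identical in both cases; only the tokenization changed, which is one reason absolute scores are hard to compare across languages with different tokenization behaviour.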

PS: even human annotation varies a lot between languages and domains. If we want reliable and comparable results, we need to make sure the test conditions are the same (same data, same annotators).

Cheers,
Ricardo

clairehua1 commented on September 28, 2024

Thanks for the answer Ricardo! Is there a way to interpret the COMET score other than using it as a ranking system?

ricardorei commented on September 28, 2024

@clairehua1, for a specific setting (one language pair and one domain), you could plot the distribution of scores and analyse it by looking at quantiles. The scores usually follow a normal distribution.
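As a rough sketch of that kind of analysis, the snippet below summarizes a list of segment-level scores with quartiles and reports where a given score falls in the distribution. The scores are synthetic (drawn from a normal distribution purely for illustration), and `summarize_scores` and `quantile_rank` are hypothetical helper names, not part of the COMET package. In practice you would substitute the segment scores produced for one language pair and domain.

```python
import random
import statistics

def summarize_scores(scores):
    """Return mean, standard deviation, and quartiles for a list of segment scores."""
    q1, median, q3 = statistics.quantiles(scores, n=4)
    return {
        "mean": statistics.fmean(scores),
        "stdev": statistics.stdev(scores),
        "q1": q1,
        "median": median,
        "q3": q3,
    }

def quantile_rank(scores, value):
    """Fraction of scores that are less than or equal to `value`."""
    return sum(s <= value for s in scores) / len(scores)

# Synthetic stand-in for segment-level scores from ONE language pair and domain.
rng = random.Random(0)
scores = [rng.gauss(0.5, 0.3) for _ in range(1000)]

summary = summarize_scores(scores)
print(summary)
# Where does a new segment score sit relative to this setting's distribution?
print(quantile_rank(scores, 0.8))
```

Reading a score as "in the top quarter for this language pair and domain" is more meaningful than reading the raw number, and it only holds within the setting the distribution was computed on.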

To give a bit more context: most models are trained to predict a z-normalized direct assessment (a z-score). Z-scores have a mean of 0 and follow a normal distribution, which means that, ideally, a score of 0 should represent an average translation.
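For readers unfamiliar with z-normalization, here is a minimal sketch of the transformation. The raw scores are invented, and treating them as one annotator's scores is an assumption about how direct assessments are typically normalized, not a description of the exact training pipeline.

```python
import statistics

def z_normalize(raw_scores):
    """Map raw scores to z-scores: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(raw_scores)
    stdev = statistics.pstdev(raw_scores)
    if stdev == 0:
        # All scores identical: every score is exactly "average".
        return [0.0 for _ in raw_scores]
    return [(s - mean) / stdev for s in raw_scores]

# Hypothetical raw direct-assessment scores (0-100) from a single annotator.
annotator_scores = [55, 70, 85, 90, 40]
z_scores = z_normalize(annotator_scores)

for raw, z in zip(annotator_scores, z_scores):
    print(f"raw={raw:3d}  z={z:+.2f}")
```

After this step the scores have mean 0 and unit standard deviation, so a positive value means "better than this annotator's average" and a negative value means "worse", which is why 0 would ideally correspond to an average translation.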

In practice, the distribution of scores (for the default model, wmt20-comet-da) is slightly skewed towards positive scores, which means that an average translation is usually assigned a score of 0.5. I have an explanation here

ricardorei commented on September 28, 2024

In the plots above you can see how different the scores are between English-German and English-Hausa. In particular, the "peak" for German is a bit higher than for Hausa.

This is expected, since German translations tend to be of better quality than Hausa ones.
