Comments (5)
Hi @clairehua1,
You should avoid comparing scores between languages and even between domains. This applies not just to COMET but to any MT metric.
For example, BLEU, even though it is lexical, depends heavily on the underlying tokenizer, so the results vary a lot between different languages.
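To make the tokenizer point concrete, here is a minimal sketch. It is not BLEU itself, only the clipped n-gram precision that BLEU builds on, and the two "tokenizers" (whitespace split and per-character split) as well as the example sentences are assumptions chosen purely for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(hyp_tokens, ref_tokens, n):
    """Clipped n-gram precision of a hypothesis against one reference."""
    hyp_counts = Counter(ngrams(hyp_tokens, n))
    ref_counts = Counter(ngrams(ref_tokens, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


# Hypothetical sentence pair, not taken from any test set.
hyp = "the cat sat on the mat"
ref = "the cat is on the mat"

# Same sentences, two illustrative tokenizations.
word_p = clipped_precision(hyp.split(), ref.split(), 2)
char_p = clipped_precision(list(hyp), list(ref), 2)

print(f"word-level bigram precision: {word_p:.3f}")
print(f"char-level bigram precision: {char_p:.3f}")
```

The same sentence pair gets a different number under each tokenization, which is why lexical scores computed with different tokenizers (or on languages that tokenize very differently) are not directly comparable.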
PS: even human annotation has a lot of variability between languages and domains. If we want reliable and comparable results, we need to make sure the test conditions are the same (same data, same annotators).
Cheers,
Ricardo
from comet.
Thanks for the answer Ricardo! Is there a way to interpret the COMET score other than using it as a ranking system?
from comet.
@clairehua1 for a specific setting (language pair and domain) you could plot the distribution of scores and analyse it by looking at quantiles. The scores usually follow a normal distribution.
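As a rough sketch of that quantile analysis, the snippet below uses only the Python standard library. The list of scores is made up for illustration and stands in for segment-level scores from a single language pair and domain; `score_quantiles` and `percentile_rank` are hypothetical helper names, not part of COMET.

```python
import statistics
from bisect import bisect_right


def score_quantiles(scores, n=4):
    """Return the n-1 cut points that split `scores` into n equal-sized groups."""
    return statistics.quantiles(scores, n=n, method="inclusive")


def percentile_rank(scores, value):
    """Fraction of the reference scores that are less than or equal to `value`."""
    ordered = sorted(scores)
    return bisect_right(ordered, value) / len(ordered)


# Hypothetical segment-level scores for ONE language pair and ONE domain.
scores = [-0.4, 0.1, 0.3, 0.45, 0.5, 0.55, 0.6, 0.7, 0.9]

q1, median, q3 = score_quantiles(scores)
print(f"Q1={q1:.2f} median={median:.2f} Q3={q3:.2f}")

# Where does a new translation with score 0.5 fall within this setting?
print(f"percentile rank of 0.5: {percentile_rank(scores, 0.5):.2f}")
```

Read this way, a score is interpreted relative to other scores from the same setting rather than as an absolute quality value.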
To give a bit more context, most models are trained to predict a z-normalized direct assessment (a z-score). Z-scores have a mean of 0 and roughly follow a normal distribution, which means that ideally a score of 0 should represent an average translation.
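For readers unfamiliar with z-normalization, here is a minimal sketch of the transformation. The raw scores are invented, and normalizing one flat list is a simplification: how the actual training data groups scores before normalizing (for example, per annotator) is not covered here.

```python
import statistics


def z_normalize(raw_scores):
    """Map raw scores to z-scores: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(raw_scores)
    std = statistics.pstdev(raw_scores)  # population standard deviation
    if std == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return [(score - mean) / std for score in raw_scores]


# Hypothetical raw direct assessment scores on a 0-100 scale.
raw = [40, 55, 60, 70, 75, 90]
z = z_normalize(raw)

for r, s in zip(raw, z):
    print(f"raw={r:3d} -> z={s:+.2f}")
```

After normalization, a raw score equal to the mean maps to 0, scores above the mean become positive, and scores below it become negative.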
In practice, the distribution of scores (for the default model `wmt20-comet-da`) is slightly skewed towards positive scores, which means that an average translation is usually assigned a score of 0.5. I have an explanation here
from comet.
In the plots above you can see how different the scores are between English-German and English-Hausa: the "peak" for German is a bit higher than the one for Hausa.
This is expected, since German translations tend to be of higher quality than Hausa ones.
from comet.