Comments (4)
Hey @BramVanroy, if you look at the z-scores used to train COMET (the wmt20-comet-da model), there are some outliers that go well over 1. In other words, for that model it is possible to get scores over 1.
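To illustrate, here is a minimal sketch (with made-up DA scores, not real annotation data) of why z-normalising the training targets produces values above 1:

```python
import statistics

# Hypothetical raw DA (direct assessment) scores on a 0-100 scale.
raw_da = [55, 62, 70, 71, 73, 75, 78, 96]

mean = statistics.mean(raw_da)
stdev = statistics.stdev(raw_da)

# z-normalisation: any score more than one standard deviation above the
# annotator mean ends up > 1, so a regression model trained on these
# targets can legitimately predict values above 1.
z_scores = [(s - mean) / stdev for s in raw_da]
print([round(z, 2) for z in z_scores])  # the 96 maps to a z-score near 2
```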
From "empirical" experience the model does not output scores > 1.0 very often and when it does its usually because the domain is easy with many short segments or your model just overfitted a specific domain. At Unbabel sometimes we came across such COMET values when we do domain adaptation for a domain where content repeats itself a lot.
Nonetheless, I also came across that blog post, and I think it is just publicity; I would not rely much on its scientific value. The baseline scores are also very high, which hints at a domain where content is not difficult, and if you give a model some example translations in that domain, it will easily learn to produce perfect translations. They explicitly talk about a new GPT-style model and then write that the model is 1000x smaller than GPT-4. What does that mean? First of all, the GPT-4 parameter count is not disclosed, and assuming the size of GPT-3 (175B parameters), 1000x fewer parameters is roughly the size of a Transformer-big model, which is commonly used for translation. Nonetheless, it is great that they are using in-context learning; there is a lot of value in those approaches for MT, and they can differentiate companies from generic MT like Google's, but I would not take the results very seriously (from a scientific perspective).
Btw, for the new model, wmt22-comet-da, it is much less common to get scores over 1.0, because the training data was scaled between 0 and 1.
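For illustration, a rough sketch of that kind of min-max scaling (the bounds here are an assumption for the example, not the actual wmt22-comet-da training setup):

```python
# Hypothetical min-max scaling of the training targets: squashing the
# annotation range into [0, 1] makes predictions above 1 rare, though an
# unbounded regression head can still overshoot slightly.
def min_max_scale(score, lo, hi):
    return (score - lo) / (hi - lo)

raw_scores = [-1.2, 0.0, 0.8, 1.9]  # z-score-style targets (made up)
lo, hi = min(raw_scores), max(raw_scores)
print([round(min_max_scale(s, lo, hi), 2) for s in raw_scores])  # all in [0, 1]
```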
Thanks for the response, Ricardo! I agree with your observations about the blog post: it did not contain much useful information, in the sense that technical details are missing. That being said, in-context learning / prompt-based translation does seem like a fruitful prospect in the months and years to come.
Sorry for bumping this again @ricardorei, but I am now experiencing the "opposite", with very low, negative scores. I was translating some WMT data with gpt-3.5 and got this translation back. Curiously, its COMET score (with the cometinho checkpoint) is -1.034. That is before multiplying by 100, so a very low score.
```python
from comet import load_from_checkpoint, download_model

if __name__ == "__main__":
    data = [{
        "src": "There's mask-shaming and then there's full on assault.",
        "ref": "Masken-Shaming ist eine Sache, Körperverletzung eine andere.",
        "mt": "Es gibt Maskenscham und dann gibt es den vollen Angriff."
    }]
    model_path = download_model("eamt22-cometinho-da")
    model = load_from_checkpoint(model_path)
    # COMET 1.x API: predict returns (segment-level scores, system-level score)
    seg_scores, sys_score = model.predict(data, batch_size=8, gpus=0)
    print(seg_scores, sys_score)
```
My worry is a bit similar to before: sentence scores like these greatly impact the system scores, which makes me wonder whether it makes sense to ReLU the scores, or even sigmoid them. If I remember correctly, that is what you do in the new models; is that correct? If so, would it make sense to do that when using the older models as well?
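To make the question concrete, here is a minimal sketch of the two post-hoc options I mean (hypothetical segment scores; this is not something the COMET package does for you):

```python
import math

def clamp_relu(score):
    # ReLU-style clamping: negative segment scores no longer drag the
    # system-level average below zero.
    return max(0.0, score)

def squash_sigmoid(score):
    # Sigmoid squashing: maps any real-valued score into (0, 1).
    return 1.0 / (1.0 + math.exp(-score))

seg_scores = [0.82, 0.75, -1.034]  # an outlier segment like the one above
print(sum(clamp_relu(s) for s in seg_scores) / len(seg_scores))
print(sum(squash_sigmoid(s) for s in seg_scores) / len(seg_scores))
```

Note that the sigmoid also squashes the well-behaved scores, so it rescales everything, not just the outliers, whereas the ReLU only touches negative values.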