Comments (21)
Hmm that seems like either the model is not converging or your ground truth is all the same scores.
It's the plain WMT 2020 DA CSV file.
OK, with MiniLM the learning_rate needs to be much higher, and then it works fine with the 2020 data.
You mean the DA's from WMT 22? Some years of WMT are known to have very noisy DA's. For WMT 22 I would not use the DA's... For WMT you have the SQM data or the MQM from the metrics task... The DA's from WMT 2022 were collected only into English and are known to be noisy.
It's too big. I'll share it by email.
You can train an XLM-RoBERTa-large on a 24GB GPU if you keep the embeddings frozen. XLM-R embeddings take a lot of space, but keeping them frozen has no impact on performance and reduces memory a lot.
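For illustration, this is roughly what that freezing amounts to on a bare Hugging Face encoder; COMET drives it through its config rather than code like this:

```python
# Rough sketch (not COMET's own code): freeze XLM-R's embedding table so no
# gradients or optimizer state are allocated for it. The ~250k-token vocabulary
# makes the embeddings a large share of the model's parameters.
from transformers import XLMRobertaModel

encoder = XLMRobertaModel.from_pretrained("xlm-roberta-large")
for param in encoder.embeddings.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in encoder.parameters() if p.requires_grad)
total = sum(p.numel() for p in encoder.parameters())
print(f"trainable: {trainable}/{total} parameters")
```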
How do I do that? Is it documented somewhere?
EDIT: found it, but they were frozen already...
Also, is the exact same 1720-da.csv dataset downloadable somewhere? Because I am running tests independently with 17, 18, 19, but with 20 I am getting this error (encoder is MiniLM):
Loading data/2020-da.csv.
Epoch 0:  30%|███       | 3167/10555 [01:59<04:39, 26.45it/s, v_num=0]
Encoder model fine-tuning
Epoch 0: 100%|██████████| 10555/10555 [13:16<00:00, 13.26it/s, v_num=0]
100%|██████████| 13/13 [00:00<00:00, 37.22it/s]
/home/vincent/miniconda3/envs/pt2.1.0/lib/python3.11/site-packages/scipy/stats/_stats_py.py:5445: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  warnings.warn(stats.ConstantInputWarning(warn_msg))
/home/vincent/miniconda3/envs/pt2.1.0/lib/python3.11/site-packages/scipy/stats/_stats_py.py:4781: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  warnings.warn(stats.ConstantInputWarning(msg))
Epoch 0: 100%|██████████| 10555/10555 [13:18<00:00, 13.22it/s, v_num=0, val_kendall=nan.0, val_spearman=nan.0, val_pearson=nan.0]
Epoch 0, global step 1320: 'val_kendall' reached -inf (best -inf), saving model to '/home/vincent/nlp/COMET/lightning_logs/version_0/checkpoints/epoch=0-step=1320-val_kendall=nan.ckpt' as top 5
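That ConstantInputWarning means scipy was handed a constant array, so Kendall/Spearman/Pearson are undefined and come out as NaN. One quick way to check whether the ground truth (rather than the model) is the culprit, assuming the CSV has a score column like COMET's other DA files:

```python
import pandas as pd

df = pd.read_csv("data/2020-da.csv")
print(df["score"].describe())          # std of 0 would make correlations undefined
print(df["score"].nunique(), "distinct score values")
```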
Also, you should use precision 16.
To keep embeddings frozen, just keep this flag at true.
You can find the data here
Embeddings frozen is already set to true in unified_metric.yaml, so that's not helping.
When I set precision: 16, I get a warning saying it's better to use 16-mixed for AMP.
I'll try 16-mixed, but I think I got an error with plain 16 before.
Hi @vince62s,
The precision value I currently use to avoid the warning is 16-mixed (following this). Also, you might want to try nr_frozen_epochs: 1.0 and a bigger value for accumulate_grad_batches.
Hope this helps.
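Putting those suggestions together as config edits might look like the sketch below; the key names follow unified_metric.yaml, but the exact nesting varies between COMET versions, so treat it as a guide rather than a drop-in script:

```python
# Hypothetical sketch: patch the training YAML with the settings discussed above.
# Verify the nesting against your own unified_metric.yaml before running.
import yaml

with open("configs/models/unified_metric.yaml") as f:
    cfg = yaml.safe_load(f)

model_args = cfg["unified_metric"]["init_args"]      # assumed layout
model_args["keep_embeddings_frozen"] = True
model_args["nr_frozen_epochs"] = 1.0

trainer_args = cfg["trainer"]["init_args"]           # assumed layout
trainer_args["precision"] = "16-mixed"               # avoids the AMP warning
trainer_args["accumulate_grad_batches"] = 8          # hypothetical value

with open("configs/models/unified_metric.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```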
The memory issue appears as soon as the encoder is no longer frozen, so to test (and avoid waiting) I set nr_frozen_epochs: 0.0 so that I can see right away whether things fit in VRAM.
With precision: 16 and batch_size 4 we are at the very limit of 24GB; it would be a pity if it crashed. There could be two nice options: 1) a filter-too-long step to exclude the very long examples that trigger this, and 2) a try/except when it goes OOM so that it can discard the batch and continue.
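Neither option exists in COMET at the time of writing; rough sketches of both follow, with torch.cuda.OutOfMemoryError requiring PyTorch 1.13+ and the CSV column names (src, mt) assumed:

```python
import pandas as pd
import torch
from transformers import AutoTokenizer

# Option 1: pre-filter the training CSV so overlong pairs never reach the GPU.
tok = AutoTokenizer.from_pretrained("xlm-roberta-large")
df = pd.read_csv("data/2020-da.csv")
ok = df.apply(lambda r: len(tok(str(r["src"]), str(r["mt"])).input_ids) <= 512, axis=1)
df[ok].to_csv("data/2020-da.filtered.csv", index=False)

# Option 2: catch CUDA OOM, discard the batch, and keep training.
def safe_training_step(model, batch, optimizer, loss_fn):
    try:
        loss = loss_fn(model, batch)        # hypothetical loss helper
        loss.backward()
        optimizer.step()
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()            # release the failed batch's memory
        print("OOM: batch discarded")
    finally:
        optimizer.zero_grad(set_to_none=True)
```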
@ricardorei can you share the script that computes those CSV files? I would like to redo the same but exclude some specific systems. Or do you have the same files with the system name as a column?
I actually found the notebooks I used... but I did not save the data, just the raw notebooks. They should help you redo the data.
They also point to the previous WMT websites where you can download the data.
Thanks, in the meantime I managed to do it for WMT 2021. I was able to exclude one system, but it gives me the same results.
I still have an issue with the WMT 22 data: whatever the learning rate, when training only on those data it does not converge.
But here: https://huggingface.co/datasets/RicardoRei/wmt-da-human-evaluation
it has some 2022 data. Is that DA or something else?
I trained on the 2022 extract from there, so it must be DA.
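For what it's worth, slicing one year out of that dataset is a one-liner with the datasets library; the year column name is taken from the dataset card, so double-check it:

```python
from datasets import load_dataset

ds = load_dataset("RicardoRei/wmt-da-human-evaluation", split="train")
ds_2022 = ds.filter(lambda ex: ex["year"] == 2022)   # "year" column assumed from the card
print(len(ds_2022), "rows from 2022")
```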
Yes, exactly. It's those DA's from WMT 22.
Usually I only use DA's from 2017 to 2020. Even those from 2021 I don't trust too much.
But do you have the exact dataset used for wmt23-cometkiwi-da-xl and for wmt22-cometkiwi-da?
Yes I do. Let me download it and I'll share it here.
It's basically WMT 17 to 20 + the MLQE-PE data.
Closing this, but training with XLM-RoBERTa large or XL is still an issue with 24GB VRAM.
Maybe using LoRA would help and be the solution.
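COMET has no built-in LoRA support, but on the bare encoder a sketch with the peft library could look like this (the query/value module names match RoBERTa's attention layers; everything else is an assumption):

```python
from peft import LoraConfig, get_peft_model
from transformers import XLMRobertaModel

encoder = XLMRobertaModel.from_pretrained("xlm-roberta-large")
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["query", "value"])  # hypothetical settings
encoder = get_peft_model(encoder, lora_cfg)
encoder.print_trainable_parameters()  # only the small adapter matrices get gradients
```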