Experiment setup for evaluating the effects of quality-estimation filtering on machine translation models
- PyTorch version >= 1.10.0
- Python version >= 3.8 (experiments were run with Python 3.9.12)
- fairseq: `pip install fairseq`
- TransQuest: `pip install transquest`
The German-English IWSLT17 dataset can be found here.
- Download `2017-01-trnmted.tgz`.
- Extract the files with `tar -xzvf 2017-01-trnmted.tgz`.
- From the extracted folder, also extract `texts/DeEnItNlRo/DeEnItNlRo/DeEnItNlRo.tgz`.
- Run `prep-iwslt17.sh` to prepare the train, test, and valid sets.
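The two-stage extraction above can also be scripted. A minimal sketch using only the standard library; the nested-archive layout (`texts/DeEnItNlRo/DeEnItNlRo/DeEnItNlRo.tgz` directly under the output directory) is assumed from the steps above, so adjust the path if your extracted layout differs:

```python
import tarfile
from pathlib import Path

def extract_iwslt17(archive="2017-01-trnmted.tgz", out_dir="iwslt17"):
    """Extract the outer IWSLT17 archive, then the nested DeEnItNlRo archive."""
    out = Path(out_dir)
    # Outer archive: contains the texts/ tree with a nested .tgz inside.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(out)
    # Nested archive holding the actual multilingual text files
    # (path assumed from the step above; adjust to your layout).
    inner = out / "texts" / "DeEnItNlRo" / "DeEnItNlRo" / "DeEnItNlRo.tgz"
    with tarfile.open(inner, "r:gz") as tar:
        tar.extractall(inner.parent)
```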
- Split the `train.de` and `train.en` files using `Data_Preprocessing/split-data.sh`. This creates all 7 dataset splits at once.
- Preprocess all split datasets using `Data_Preprocessing/preprocess.sh`. Change the paths to point to your train, valid, and test set locations.
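The splitting step can be pictured as cutting the aligned corpus into 7 pieces. The sketch below is only an illustration of that idea and assumes equal contiguous chunks; the actual split scheme is defined in `Data_Preprocessing/split-data.sh` and may differ:

```python
def split_parallel(src_lines, tgt_lines, n_splits=7):
    """Split an aligned parallel corpus into n contiguous, roughly equal parts.

    Illustrative only: the real splits come from split-data.sh.
    """
    assert len(src_lines) == len(tgt_lines), "corpus sides must stay aligned"
    size = len(src_lines)
    # Boundary indices for n roughly equal chunks.
    bounds = [round(i * size / n_splits) for i in range(n_splits + 1)]
    return [
        (src_lines[a:b], tgt_lines[a:b])
        for a, b in zip(bounds, bounds[1:])
    ]
```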
- Run `fs_tq_v1.py` for each split. Change the source and target data to `test.de` and `test.en`, respectively. Change the checkpoint and databin paths for all 7 splits. In line 76, save sentences that are rated lower than 0.70: `if (pred < 0.70)`.
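The step above (and the later variants with threshold 0.712 or a `>` comparison) all apply the same quality-estimation gate: each (source, translation) pair gets a QE score, and only pairs on the chosen side of the threshold are kept. A minimal sketch of that filtering logic, with scores passed in directly (in the actual scripts they come from a TransQuest sentence-level model; all names here are illustrative, not the scripts' own):

```python
def filter_by_qe(pairs, scores, threshold=0.70, keep_low=True):
    """Keep (src, tgt) pairs whose QE score falls on the chosen side of the threshold.

    keep_low=True mirrors `if (pred < 0.70)`; keep_low=False mirrors the
    `pred > threshold` variants used in the later experiments.
    """
    kept = []
    for (src, tgt), pred in zip(pairs, scores):
        keep = pred < threshold if keep_low else pred > threshold
        if keep:
            kept.append((src, tgt))
    return kept
```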
- Preprocess the saved sentences using `Data_Preprocessing/preprocess.sh`.
- Fine-tune the original models for each split with `finetune.sh`. Change the paths to the tokenized data, `checkpoint_best.pt`, and the save directory.
- Run `fs_tq_v1.py` for each split. Change the source and target data to `train.de` and `train.en`, respectively. Change the checkpoint and databin paths for all 7 splits. In line 76, save sentences that are rated lower than 0.712: `if (pred < 0.712)`.
- Preprocess the saved sentences using `Data_Preprocessing/preprocess.sh`.
- Fine-tune the original models for each split with `finetune.sh`. Change the paths to the tokenized data, `checkpoint_best.pt`, and the save directory.
- Run `fs_tq_v1.py` for each split. Change the source and target data to `train.de` and `train.en`, respectively. Change the checkpoint and databin paths for all 7 splits. In line 76, save sentences that are rated higher than 0.712: `if (pred > 0.712)`.
- Preprocess the saved sentences using `Data_Preprocessing/preprocess.sh`.
- Fine-tune the original models for each split with `finetune.sh`. Change the paths to the tokenized data, `checkpoint_best.pt`, and the save directory.
- Run `tq_iwslt17.py` for each original dataset split. Change the source and target data to `train.de` and `train.en`, respectively. In line 76, save sentences that are rated higher than 0.712: `if (pred > 0.712)`.
- Preprocess the saved sentences using `Data_Preprocessing/preprocess.sh`.
- Train a new model for each split with `finetune.sh`, with a learning rate of `--lr 5e-4`. Omit line 18, `--finetune-from-model checkpoints-v4/checkpoint7/checkpoint_best.pt`. Change the paths to the tokenized data and the save directory.
- Run `fs_tq_v4.py` for each split. Change the source and target data to `train.de` and `train.en`, respectively. Change the checkpoint to point to the models trained in experiment 4, and change the databin path for all 7 splits. In line 79, save sentences that are rated higher than 0.712: `if (pred > 0.712)`.
- Preprocess the saved sentences using `Data_Preprocessing/preprocess.sh`.
- Train a new model for each split with `finetune.sh`. Change the paths to the tokenized data and the save directory.
- Run `fs_tq_v4.py` for each split. Change the source and target data to `train.de` and `train.en`, respectively. Change the checkpoint to point to the models trained in experiment 4, and change the databin path for all 7 splits. In line 79, save sentences that are rated lower than 0.712: `if (pred < 0.712)`.
- Preprocess the saved sentences using `Data_Preprocessing/preprocess.sh`.
- Train a new model for each split with `finetune.sh`. Change the paths to the tokenized data and the save directory.