Our source code for the EACL 2021 workshop shared task: Offensive Language Identification in Dravidian Languages. We ranked 4th, 4th, and 3rd in the Tamil, Malayalam, and Kannada tracks of this task!🥳
Updated: the source code is released!🤩
I will release the code very soon.
├── README.md
├── ckpt                 # stores model weights during training
│   └── README.md
├── data                 # stores the data
│   └── README.md
├── gen_data.py          # generates the Dataset
├── install_cli.sh       # installs the required packages
├── loss.py              # loss function
├── main_xlm_bert.py     # trains multilingual-BERT
├── main_xlm_roberta.py  # trains XLM-RoBERTa
├── model.py             # model implementation
├── pred_data
│   └── README.md
├── preprocessing.py     # preprocesses the data
├── pretrained_weights   # stores the pretrained weights
│   └── README.md
└── train.py             # defines the training and validation loop
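`loss.py` holds the loss function used during training. As a minimal illustration only (a plain cross-entropy sketch, not necessarily the exact loss the repository implements), the classification loss for a single example can be computed as:

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one example: -log softmax(logits)[target],
    computed with the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

print(cross_entropy([0.0, 0.0, 0.0], 1))  # uniform logits -> log(3) ~= 1.0986
```

In practice the repository's loss may add class weighting or label smoothing for the imbalanced offensive-language classes; the sketch above shows only the core computation.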
Use the following command to install all of the required packages:
sh install_cli.sh
The first step is to preprocess the data. Just use the following command:
python3 -u preprocessing.py
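The exact cleaning rules live in `preprocessing.py`. As a rough, hypothetical sketch of what preprocessing social-media text for this task typically involves (the function name and rules below are assumptions, not the repository's actual code), one common pass strips URLs and user mentions and collapses whitespace:

```python
import re

def clean_text(text: str) -> str:
    """Hypothetical cleaning pass: drop URLs and @mentions, collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"\s+", " ", text)           # collapse repeated whitespace
    return text.strip()

print(clean_text("Check this @user1 out: https://example.com  now!!"))
# -> "Check this out: now!!"
```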
The second step is to train our models. In our solution, we trained two models, using multilingual-BERT and XLM-RoBERTa as the encoder, respectively.
To train the model that uses multilingual-BERT as the encoder, use the following command:
nohup python3 -u main_xlm_bert.py \
--base_path <your_base_path> \
--batch_size 8 \
--epochs 50 \
> train_xlm_bert_log.log 2>&1 &
To train the model that uses XLM-RoBERTa as the encoder, use the following command:
nohup python3 -u main_xlm_roberta.py \
--base_path <your_base_path> \
--batch_size 8 \
--epochs 50 \
> train_xlm_roberta_log.log 2>&1 &
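Both training scripts accept the same flags shown in the commands above (`--base_path`, `--batch_size`, `--epochs`). A hedged sketch of how such flags might be parsed with `argparse` (the defaults and help strings are assumptions, not copied from the repository):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Parser for the training flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Train an offensive-language classifier.")
    parser.add_argument("--base_path", type=str, required=True,
                        help="root directory holding data/ and ckpt/")
    parser.add_argument("--batch_size", type=int, default=8)
    parser.add_argument("--epochs", type=int, default=50)
    return parser

args = build_parser().parse_args(["--base_path", "/tmp/proj", "--epochs", "50"])
print(args.base_path, args.batch_size, args.epochs)  # -> /tmp/proj 8 50
```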
The final step is to run inference after training. Use the following command:
nohup python3 -u inference.py > inference.log 2>&1 &
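Since two models are trained, inference plausibly combines them; the sketch below is a hypothetical ensemble (averaging the two models' logits and taking the argmax), and the two-class label set is also an assumption — the shared task actually defines more fine-grained offensive categories:

```python
LABELS = ["Not_offensive", "Offensive"]  # hypothetical, simplified label set

def predict(logits_bert, logits_roberta):
    """Hypothetical ensemble: average the two models' logits, take the argmax label."""
    avg = [(a + b) / 2 for a, b in zip(logits_bert, logits_roberta)]
    return LABELS[max(range(len(avg)), key=avg.__getitem__)]

print(predict([0.2, 0.8], [0.4, 0.6]))  # -> "Offensive"
```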
Congratulations! You now have the final results!🤩
If you use our code, please cite this repository as the source.