Our source code for the EACL 2021 workshop shared task: Offensive Language Identification in Dravidian Languages. We ranked 4th, 4th, and 3rd in the Tamil, Malayalam, and Kannada tracks of this task!🥳
Updated: the source code is released!🤩
I will release the code very soon.
├── README.md
├── ckpt                 # stores model weights during training
│   └── README.md
├── data                 # stores the data
│   └── README.md
├── gen_data.py          # generates the Dataset
├── install_cli.sh       # installs the required packages
├── loss.py              # loss function
├── main_xlm_bert.py     # trains multilingual-BERT
├── main_xlm_roberta.py  # trains XLM-RoBERTa
├── model.py             # model implementation
├── pred_data
│   └── README.md
├── preprocessing.py     # preprocesses the data
├── pretrained_weights   # stores the pretrained weights
│   └── README.md
└── train.py             # defines the training and validation loop
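`loss.py` holds the loss function used during training. As a minimal illustration only (a plain cross-entropy sketch, not necessarily the exact loss the repository implements), the classification loss for a single example can be computed as:

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one example: -log softmax(logits)[target],
    computed with the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

print(cross_entropy([0.0, 0.0, 0.0], 1))  # uniform logits -> log(3) ~= 1.0986
```

In practice the repository's loss may add class weighting or label smoothing for the imbalanced offensive-language classes; the sketch above shows only the core computation.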
Use the following command to install all of the required packages:
sh install_cli.sh
The first step is to preprocess the data. Just use the following command:
python3 -u preprocessing.py
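The exact cleaning rules live in `preprocessing.py`. As a rough, hypothetical sketch of what preprocessing social-media text for this task typically involves (the function name and rules below are assumptions, not the repository's actual code), one common pass strips URLs and user mentions and collapses whitespace:

```python
import re

def clean_text(text: str) -> str:
    """Hypothetical cleaning pass: drop URLs and @mentions, collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"\s+", " ", text)           # collapse repeated whitespace
    return text.strip()

print(clean_text("Check this @user1 out: https://example.com  now!!"))
# -> "Check this out: now!!"
```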
The second step is to train our models. In our solution, we trained two models, using multilingual-BERT and XLM-RoBERTa as the encoder, respectively.
To train the model that uses multilingual-BERT as the encoder, use the following command:
nohup python3 -u main_xlm_bert.py \
--base_path <your_base_path> \
--batch_size 8 \
--epochs 50 \
> train_xlm_bert_log.log 2>&1 &
To train the model that uses XLM-RoBERTa as the encoder, use the following command:
nohup python3 -u main_xlm_roberta.py \
--base_path <your_base_path> \
--batch_size 8 \
--epochs 50 \
> train_xlm_roberta_log.log 2>&1 &
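Both training scripts accept the same flags shown in the commands above (`--base_path`, `--batch_size`, `--epochs`). A hedged sketch of how such flags might be parsed with `argparse` (the defaults and help strings are assumptions, not copied from the repository):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Parser for the training flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Train an offensive-language classifier.")
    parser.add_argument("--base_path", type=str, required=True,
                        help="root directory holding data/ and ckpt/")
    parser.add_argument("--batch_size", type=int, default=8)
    parser.add_argument("--epochs", type=int, default=50)
    return parser

args = build_parser().parse_args(["--base_path", "/tmp/proj", "--epochs", "50"])
print(args.base_path, args.batch_size, args.epochs)  # -> /tmp/proj 8 50
```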
The final step is to run inference after training. Use the following command:
nohup python3 -u inference.py > inference.log 2>&1 &
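Since two models are trained, inference plausibly combines them; the sketch below is a hypothetical ensemble (averaging the two models' logits and taking the argmax), and the two-class label set is also an assumption — the shared task actually defines more fine-grained offensive categories:

```python
LABELS = ["Not_offensive", "Offensive"]  # hypothetical, simplified label set

def predict(logits_bert, logits_roberta):
    """Hypothetical ensemble: average the two models' logits, take the argmax label."""
    avg = [(a + b) / 2 for a, b in zip(logits_bert, logits_roberta)]
    return LABELS[max(range(len(avg)), key=avg.__getitem__)]

print(predict([0.2, 0.8], [0.4, 0.6]))  # -> "Offensive"
```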
Congratulations! You now have the final results!🤩
If you use our code, please cite this repository as the source.