javanese-hate-speech's Introduction

A Deep Learning Approach to Abusive Language and Hate Speech Detection for the Javanese Language

Abstract

This paper develops a deep learning approach to abusive language and hate speech detection using Javanese and Indonesian large language models (LLMs). We experiment on a Javanese Twitter dataset created by Putri et al., aiming to beat their best F-measure of 0.780. With a fine-tuned Javanese GPT-2 as the feature extractor for our classifier, the model achieves an F-measure of 0.811. Surprisingly, using an Indonesian GPT-2 as the feature extractor yields a higher F-measure of 0.854, potentially attributable to code-mixing in Javanese Twitter data or to the model’s training on colloquial language. This study further explores the nuances of hate speech detection in Javanese, emphasizing the choice of language and model.
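Since the abstract compares models by F-measure, the following minimal sketch shows how that score is computed from predictions. It is illustrative only: the function names `f_measure` and `macro_f_measure` are hypothetical, and the README does not state whether the reported scores are binary, micro-averaged, or macro-averaged, so both a per-class and a macro-averaged variant are shown as assumptions.

```python
def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """F-beta score for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0

    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-class F1 (an assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f_measure(y_true, y_pred, positive=l) for l in labels) / len(labels)


if __name__ == "__main__":
    # Toy labels, not taken from the dataset.
    y_true = [1, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 1, 0]
    print(f_measure(y_true, y_pred))
    print(macro_f_measure(y_true, y_pred))
```

With `beta=1.0` this is the harmonic mean of precision and recall, which is the usual meaning of "F-measure" when no other weighting is given.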

Please see our paper.

Code

To run the code, follow these steps:

Clone the repository
Install the requirements in requirements.txt
Run data_preparation.ipynb to clean and split the data
Run javanese_experiments.ipynb to train and evaluate the models (GPU is recommended)
See model_analysis.ipynb for further analysis of the best model, Indonesian GPT-2
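As a rough illustration of what a "clean and split" step can look like, here is a minimal sketch using only the standard library. It is not the contents of data_preparation.ipynb: the helper names `clean_tweet` and `stratified_split`, the cleaning rules (dropping URLs and @-mentions, lowercasing), the 80/20 ratio, and the fixed seed are all assumptions made for the example.

```python
import random
import re
from collections import defaultdict


def clean_tweet(text):
    """Hypothetical cleaning: drop URLs and @-mentions, collapse whitespace, lowercase."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def stratified_split(examples, test_fraction=0.2, seed=0):
    """Split (text, label) pairs so each label keeps roughly the same test share."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((clean_tweet(text), label))

    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # At least one test example per label; a label with a single
        # example therefore contributes nothing to the training set.
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


if __name__ == "__main__":
    # Placeholder rows, not real data.
    rows = [(f"tweet {i} @user https://t.co/x", i % 2) for i in range(10)]
    train, test = stratified_split(rows)
    print(len(train), len(test))
```

Splitting per label keeps the class balance similar in both partitions, which matters when abusive or hateful tweets are a minority of the data.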

Repository: kevinywu/javanese-hate-speech on GitHub
