
Hate Speech Classification

This repository is my attempt to replicate the paper "Hate Speech Dataset from a White Supremacy Forum" (2018): https://www.aclweb.org/anthology/W18-5102/

Experimental setting as described in the paper:

The experiments are based on a balanced subset of labelled sentences. All the sentences labelled as HATE were collected, and an equal number of NOHATE sentences were randomly sampled, for a total of about 2k labelled sentences. Of these, 80% were used for training and the remaining 20% for testing.
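A minimal sketch of building this balanced subset and the 80/20 split, assuming the labels are the strings "HATE" and "NOHATE" (any other label is ignored here). The function name `balanced_split` is hypothetical, and splitting each class separately (so the test set stays balanced) is an assumption: the paper only states the 80/20 proportions.

```python
import random


def balanced_split(sentences, labels, test_ratio=0.2, seed=0):
    """Build a balanced HATE/NOHATE subset and split it into train/test lists.

    Every HATE sentence is kept and an equal number of NOHATE sentences is
    sampled at random. Each class is split separately (stratified), so the
    test set stays balanced. Returns (train, test) as lists of (sentence, label).
    """
    rng = random.Random(seed)
    hate = [s for s, y in zip(sentences, labels) if y == "HATE"]
    nohate = [s for s, y in zip(sentences, labels) if y == "NOHATE"]
    if len(nohate) < len(hate):
        raise ValueError("not enough NOHATE sentences to balance the subset")
    nohate = rng.sample(nohate, len(hate))  # down-sample NOHATE to |HATE|

    train, test = [], []
    for label, group in (("HATE", hate), ("NOHATE", nohate)):
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)  # 20% of each class by default
        test += [(s, label) for s in group[:n_test]]
        train += [(s, label) for s in group[n_test:]]
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Toy usage: 10 HATE and 30 NOHATE sentences -> a 20-sentence balanced subset.
sentences = [f"hate sentence {i}" for i in range(10)] + [f"other sentence {i}" for i in range(30)]
labels = ["HATE"] * 10 + ["NOHATE"] * 30
train, test = balanced_split(sentences, labels)
print(len(train), len(test))  # 16 4
```

With roughly 1k HATE sentences this yields the ~2k-sentence subset described above, about 1.6k for training and 400 for testing.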

The evaluated algorithms are the following:

  • Support Vector Machines (SVM) (Hearst et al., 1998) over Bag-of-Words vectors. Word-count vectors were computed and fed into a Python scikit-learn LinearSVC classifier to separate HATE and NOHATE instances.

  • Convolutional Neural Networks (CNN), as described by Kim (2014). The implementation is a simplified version using a single input channel of randomly initialized word embeddings.

  • Recurrent Neural Networks with Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997): an LSTM layer of size 128 over word embeddings of size 300.
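The SVM baseline pairs word-count (bag-of-words) vectors with a linear SVM. With scikit-learn installed, that is roughly `make_pipeline(CountVectorizer(), LinearSVC())`. The sketch below is a dependency-free stand-in, not what the paper ran: it trains a linear SVM with Pegasos-style stochastic subgradient descent on the hinge loss instead of scikit-learn's liblinear solver, and it omits the bias term that LinearSVC fits by default. The tokenizer, the hyper-parameters `lam` and `epochs`, the +1/-1 label encoding and all function names are assumptions for illustration.

```python
import random
import re
from collections import Counter


def tokenize(text):
    # Lower-case and keep runs of letters, digits and apostrophes (assumed tokenizer).
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(texts):
    # Map every training token to a column index.
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def bag_of_words(text, vocab):
    # Sparse word-count vector {column: count}; out-of-vocabulary tokens are dropped.
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: n for tok, n in counts.items()}


def train_linear_svm(X, y, dim, lam=0.01, epochs=20, seed=0):
    """Pegasos-style SGD on the L2-regularised hinge loss (no bias term).

    X: list of sparse vectors, y: labels in {+1, -1}, dim: vocabulary size.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    t = 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(w[j] * v for j, v in X[i].items())
            shrink = 1.0 - eta * lam  # gradient step on the regulariser
            w = [wj * shrink for wj in w]
            if margin < 1:  # hinge loss is active: step towards the example
                for j, v in X[i].items():
                    w[j] += eta * y[i] * v
    return w


def predict(w, x):
    # +1 = HATE, -1 = NOHATE
    return 1 if sum(w[j] * v for j, v in x.items()) >= 0 else -1


train_texts = ["kill them all", "they are vermin", "lovely weather today", "thanks for sharing"]
train_y = [1, 1, -1, -1]
vocab = build_vocab(train_texts)
X = [bag_of_words(t, vocab) for t in train_texts]
w = train_linear_svm(X, train_y, dim=len(vocab))
print(predict(w, bag_of_words("they are all vermin", vocab)))  # 1
```

The dense shrink step costs O(vocabulary) per update, which is fine for a sketch but is one reason to prefer the scikit-learn pipeline on the real ~2k-sentence dataset.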
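To make the CNN bullet concrete, here is a forward-pass-only sketch of a single-channel, Kim (2014)-style sentence CNN with randomly initialized embeddings. The window sizes (3, 4, 5) and 100 filters per size are Kim's defaults and only assumptions here, since the README does not state the hyper-parameters; dropout and training are omitted, and a sigmoid output replaces Kim's softmax for the two-class case. The class and method names are hypothetical; in practice this would be built and trained in a deep-learning framework.

```python
import math
import random


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class KimCNN:
    """Forward pass of a single-channel CNN for sentence classification in the
    style of Kim (2014): embeddings -> convolutions with several window sizes
    -> ReLU -> max-over-time pooling -> linear layer -> sigmoid."""

    def __init__(self, vocab_size, emb_dim=300, windows=(3, 4, 5), n_filters=100, seed=0):
        rng = random.Random(seed)
        self.emb_dim = emb_dim
        self.windows = windows
        # Randomly initialised embeddings (no pre-trained vectors).
        self.emb = rand_matrix(vocab_size, emb_dim, rng)
        # For each window size h: n_filters filters of shape h x emb_dim, each with a bias.
        self.filters = {
            h: [(rand_matrix(h, emb_dim, rng), 0.0) for _ in range(n_filters)]
            for h in windows
        }
        self.out_w = [rng.uniform(-0.1, 0.1) for _ in range(len(windows) * n_filters)]
        self.out_b = 0.0

    def features(self, token_ids):
        x = [self.emb[t] for t in token_ids]
        # Zero-pad sentences shorter than the widest window.
        x += [[0.0] * self.emb_dim for _ in range(max(self.windows) - len(x))]
        pooled = []
        for h in self.windows:
            for W, b in self.filters[h]:
                # One ReLU activation per window position, then keep the maximum.
                acts = [
                    max(0.0, b + sum(W[k][d] * x[i + k][d]
                                     for k in range(h) for d in range(self.emb_dim)))
                    for i in range(len(x) - h + 1)
                ]
                pooled.append(max(acts))
        return pooled

    def predict_proba(self, token_ids):
        # Probability of the HATE class (untrained weights, so not meaningful yet).
        z = self.out_b + sum(w * f for w, f in zip(self.out_w, self.features(token_ids)))
        return 1.0 / (1.0 + math.exp(-z))


model = KimCNN(vocab_size=50, emb_dim=8, n_filters=4)
feats = model.features([3, 1, 4, 1, 5])
print(len(feats))  # 12 = 3 window sizes x 4 filters
p = model.predict_proba([7, 2])
```

Max-over-time pooling is what lets sentences of any length map to a fixed-size feature vector: each filter contributes exactly one number, its strongest response anywhere in the sentence.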
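For the LSTM bullet, a layer of size 128 over 300-dimensional embeddings has 4 × (128 × (300 + 128) + 128) = 219,648 trainable weights, counting one bias vector per gate as Keras does (PyTorch's `nn.LSTM` keeps two bias vectors, which adds 4 × 128 = 512 more). The sketch below is a forward pass only, with random weights; the gate ordering, the class name `LSTMLayer` and the uniform initialisation are assumptions, and the embedding and output layers are not included in the count.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_param_count(input_dim, hidden):
    # 4 gates x (input weights + recurrent weights + one bias vector)
    return 4 * (hidden * (input_dim + hidden) + hidden)


class LSTMLayer:
    """Single LSTM layer (Hochreiter and Schmidhuber, 1997), forward pass only."""

    def __init__(self, input_dim, hidden, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        n_in = input_dim + hidden
        # 4*hidden rows, gate blocks in the order: input, forget, candidate, output.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(4 * hidden)]
        self.b = [0.0] * (4 * hidden)

    def step(self, x, h, c):
        xh = list(x) + list(h)  # concatenate [x_t; h_{t-1}]
        z = [bias + sum(w * v for w, v in zip(row, xh)) for row, bias in zip(self.W, self.b)]
        H = self.hidden
        i = [sigmoid(v) for v in z[:H]]            # input gate
        f = [sigmoid(v) for v in z[H:2 * H]]       # forget gate
        g = [math.tanh(v) for v in z[2 * H:3 * H]] # candidate cell state
        o = [sigmoid(v) for v in z[3 * H:]]        # output gate
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def forward(self, xs):
        # Run over the sequence of word embeddings; return the final hidden state.
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in xs:
            h, c = self.step(x, h, c)
        return h


print(lstm_param_count(300, 128))  # 219648
layer = LSTMLayer(input_dim=5, hidden=3)
h = layer.forward([[0.1] * 5 for _ in range(4)])
print(len(h))  # 3
```

The final hidden state (128 values in the configuration above) would then feed a classification layer that outputs HATE or NOHATE.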

Result Comparison

Results excluding relation label

                      |     Results from Paper     ||     My Implementation       |
Model                 |   Hate   | NoHate  |  All  ||  Hate  | NoHate  |  All     |
Logistic Regression   |   n/a    |   n/a   |  n/a  ||  x.xx  |  x.xx   |  x.xx    |
SVM                   |   0.72   |   0.76  |  0.74 ||  x.xx  |  x.xx   |  0.6875  |
CNN                   |   0.54   |   0.86  |  0.70 ||  x.xx  |  x.xx   |  0.72    |
LSTM                  |   0.76   |   0.80  |  0.78 ||  0.71  |  0.68   |  0.70    |
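The README does not say which metric the Hate, NoHate and All columns report. The paper's figures are consistent with per-class accuracy on a balanced test set: in each row, All is the mean of Hate and NoHate up to rounding (for SVM, (0.72 + 0.76) / 2 = 0.74). Assuming that reading, a minimal sketch of the computation follows; the function name and the toy labels are hypothetical.

```python
def per_class_accuracy(y_true, y_pred, labels=("HATE", "NOHATE")):
    """Accuracy computed separately on the gold sentences of each class
    (i.e. per-class recall), plus overall accuracy under the key "ALL"."""
    scores = {}
    for label in labels:
        gold = [p for t, p in zip(y_true, y_pred) if t == label]
        scores[label] = sum(p == label for p in gold) / len(gold)
    scores["ALL"] = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return scores


# Balanced toy test set: 4 HATE and 4 NOHATE gold labels.
y_true = ["HATE"] * 4 + ["NOHATE"] * 4
y_pred = ["HATE", "HATE", "HATE", "NOHATE", "NOHATE", "NOHATE", "HATE", "HATE"]
print(per_class_accuracy(y_true, y_pred))
# {'HATE': 0.75, 'NOHATE': 0.5, 'ALL': 0.625}
```

On a balanced test set, All is exactly the mean of the two per-class scores, which makes it a quick sanity check on the "My Implementation" columns.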

Results including relation label

                      |     Results from Paper     ||     My Implementation       |
Model                 |   Hate   | NoHate  |  All  ||  Hate  | NoHate  |  All     |
Logistic Regression   |   n/a    |   n/a   |  n/a  ||  x.xx  |  x.xx   |  x.xx    |
SVM                   |   0.69   |   0.73  |  0.71 ||  x.xx  |  x.xx   |  0.7025  |
CNN                   |   0.55   |   0.79  |  0.66 ||  x.xx  |  x.xx   |  0.67    |
LSTM                  |   0.71   |   0.75  |  0.73 ||  x.xx  |  x.xx   |  x.xx    |

Contributors

sandesh10