The Naughtyformer: A Transformer Understands Offensive Humor

Jokes are intentionally written to be funny, but not all jokes are created the same. Some jokes may be fit for a classroom of kindergarteners, but others are best reserved for a more mature audience. While recent work has shown impressive results on humor detection in text, here we instead investigate the more nuanced task of detecting humor subtypes, especially of the less innocent variety. To that end, we introduce a novel jokes dataset filtered from Reddit and solve the subtype classification task using a finetuned Transformer dubbed the Naughtyformer. Moreover, we show that our model is significantly better at detecting offensiveness in jokes compared to state-of-the-art methods.

Dataset

To train the Naughtyformer, we introduce a dataset of 92,153 total jokes across three categories: 1) Clean Jokes, 2) Dark Jokes, and 3) Dirty Jokes. We also include a fourth category, News, which represents non-joke text.

The cleaned and post-processed data is found in cleaned_data.csv. The original Reddit data is found in raw_data/.
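
As a quick sanity check on the category balance, the following is a minimal sketch for counting rows per category in a CSV file. It assumes the file has a header row and a column holding the category name. The column name `label` and the integer ids in `LABEL_TO_ID` are hypothetical and not confirmed by this README, so check the actual header of cleaned_data.csv before relying on them.

```python
import csv
from collections import Counter
from typing import TextIO

# Category names come from the Dataset section above.
# The ordering and integer ids are an assumption made for illustration.
LABELS = ["Clean Jokes", "Dark Jokes", "Dirty Jokes", "News"]
LABEL_TO_ID = {name: index for index, name in enumerate(LABELS)}


def count_by_category(handle: TextIO, label_column: str = "label") -> Counter:
    """Count rows per category in a CSV stream that has a header row.

    `label_column` is a hypothetical column name; pass the real one
    once you have inspected the file.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or label_column not in reader.fieldnames:
        raise KeyError(f"column {label_column!r} not found in {reader.fieldnames}")
    return Counter(row[label_column] for row in reader)
```

To run it on the real file, open it with `open("cleaned_data.csv", newline="", encoding="utf-8")` and pass the handle to `count_by_category`, adjusting `label_column` to match the actual header.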

Model Checkpoint

Our DeBERTa checkpoint for the Naughtyformer can be found here.

Citation

If you find this useful in your research, please consider citing:

@article{tang2022naughtyformer,
  title={The Naughtyformer: A Transformer Understands Offensive Humor},
  author={Tang, Leonard and Cai, Alexander and Li, Steve and Wang, Jason},
  journal={arXiv preprint arXiv:2211.14369},
  year={2022}
}

leonardtang / the-naughtyformer
