covid_fake_news's Introduction

COVID19 Fake News Detection in English 🔎 👀

This repository contains the code for the paper "A Heuristic-driven Ensemble Framework for COVID-19 Fake News Detection" (accepted at the CONSTRAINT Workshop, AAAI 2021).

Preprint: https://arxiv.org/abs/2101.03545

💡 Please also see our extended work, accepted at Neurocomputing: https://arxiv.org/pdf/2104.01791.pdf

Task Description

This is a subtask of the CONSTRAINT-2021 shared task on hostile post detection. The subtask focuses on detecting COVID19-related fake news in English. The data comes from various social-media platforms such as Twitter, Facebook and Instagram. Given a social media post, the objective is to classify it as either fake or real news.

For example, the following two posts belong to the fake and real categories, respectively.

[Image: example fake and real posts]

English Dataset: https://competitions.codalab.org/competitions/26655 or https://github.com/diptamath/covid_fake_news/tree/main/data

English dataset paper: https://arxiv.org/abs/2011.03327

Link to Competition: https://constraint-shared-task-2021.github.io/

Our Approach

Our basic approach involves trying out different language models. Such models have achieved state-of-the-art results on a variety of text classification tasks, which was the main motivation for using them. We have tried XLNet, RoBERTa, XLM-RoBERTa, DeBERTa, ELECTRA and ERNIE 2.0. The individual training model files can be obtained here.

To improve the performance of our classification model, we have tried various ensemble techniques over different combinations of these models. The combination that yielded the best result uses XLNet, RoBERTa, XLM-RoBERTa and DeBERTa. We created a new feature set from the predictions of the individual models and saved the resulting feature data. We tried two ensemble techniques, Hard Voting and Soft Voting; Soft Voting achieved superior results with the above model combination. The code files related to ensembling can be found at this link.
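The difference between the two voting schemes can be pictured with a minimal sketch. This is not the repository's ensembling code: the function names, the label order, the tie-breaking rule and the probabilities below are all assumptions made up for illustration.

```python
from collections import Counter

# Assumed label order for each [p_fake, p_real] pair.
LABELS = ("fake", "real")


def hard_vote(model_probs):
    """Majority vote over each model's most likely label for ONE post.

    Ties are broken in favour of the label listed first in LABELS
    (an arbitrary choice for this sketch).
    """
    votes = [LABELS[max(range(len(p)), key=p.__getitem__)] for p in model_probs]
    counts = Counter(votes)
    return max(LABELS, key=lambda label: counts[label])


def soft_vote(model_probs):
    """Average class probabilities across models, then take the argmax."""
    n = len(model_probs)
    avg = [sum(p[i] for p in model_probs) / n for i in range(len(LABELS))]
    return LABELS[max(range(len(avg)), key=avg.__getitem__)], avg


# Made-up probabilities from four hypothetical models for a single post.
probs = [
    [0.45, 0.55],  # weakly "real"
    [0.48, 0.52],  # weakly "real"
    [0.95, 0.05],  # strongly "fake"
    [0.40, 0.60],  # weakly "real"
]

hard_label = hard_vote(probs)            # three of four votes say "real"
soft_label, soft_avg = soft_vote(probs)  # the confident model dominates the average
print(hard_label, soft_label, soft_avg)
```

In this toy case the two schemes disagree: hard voting counts only the labels, while soft voting lets one confident model outweigh three uncertain ones.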

All our work related to Heuristic Post-Processing can be found in the Analysis folder. First, we extract username statistics and domain statistics from the training data and save them in the Statistical meta folder. We merge the statistical features using this code. Finally, we create the datasets for post-processing and apply our post-processing algorithm to obtain the final classification result.
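As a rough illustration of the first step, the sketch below counts how often each username handle and URL domain appears in fake and real training posts. It is an assumption-laden sketch, not the code in the Analysis folder: the regular expressions, function names, output layout and example posts are all hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Simplified patterns, assumed for this sketch only.
HANDLE_RE = re.compile(r"@(\w+)")
URL_RE = re.compile(r"https?://\S+")


def extract_attributes(text):
    """Return the lowercased handles and URL domains mentioned in a post."""
    handles = [h.lower() for h in HANDLE_RE.findall(text)]
    domains = [urlparse(u).netloc.lower() for u in URL_RE.findall(text)]
    return handles, [d for d in domains if d]


def attribute_statistics(posts):
    """posts: iterable of (text, label) pairs, label in {"fake", "real"}.

    Returns two dicts (handles, domains) mapping each attribute to its
    per-class counts and the fraction of its posts that are real.
    """
    handle_counts = defaultdict(lambda: {"fake": 0, "real": 0})
    domain_counts = defaultdict(lambda: {"fake": 0, "real": 0})
    for text, label in posts:
        handles, domains = extract_attributes(text)
        for h in set(handles):  # count each attribute once per post
            handle_counts[h][label] += 1
        for d in set(domains):
            domain_counts[d][label] += 1

    def with_rate(counts):
        return {
            key: {**c, "p_real": c["real"] / (c["fake"] + c["real"])}
            for key, c in counts.items()
        }

    return with_rate(handle_counts), with_rate(domain_counts)


# Made-up training posts with hypothetical handles and domains.
train = [
    ("Daily case update from @health_agency https://health.example.org/report", "real"),
    ("@health_agency reports that recoveries rose today", "real"),
    ("Miracle cure found! https://example.com/cure", "fake"),
    ("Doctors hate this trick https://example.com/trick", "fake"),
]

handle_stats, domain_stats = attribute_statistics(train)
print(handle_stats["health_agency"])
print(domain_stats["example.com"])
```

Statistics like these could then be merged with the model predictions, so that a later post-processing step can look up how strongly a handle or domain is associated with one class.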

We also perform an ablation study on the priority of username handles versus URL domains and on the threshold parameter, which can be accessed here.
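To make the two ablation axes concrete, the sketch below applies a simple override rule and sweeps its threshold and attribute priority on a toy validation set. The rule, its parameter names, the default values and every number in the data are assumptions for illustration; they are not the authors' algorithm or results.

```python
def post_process(model_p_real, handle_p_real, domain_p_real,
                 threshold=0.9, priority=("handle", "domain")):
    """Return "real" or "fake" for one post (hypothetical rule).

    model_p_real: ensemble probability that the post is real.
    handle_p_real / domain_p_real: training-set fraction of real posts for
        the post's handle / domain, or None if absent or unseen.
    An attribute overrides the model only if it points to one class with
    a rate of at least `threshold`; attributes are checked in `priority` order.
    """
    rates = {"handle": handle_p_real, "domain": domain_p_real}
    for name in priority:
        p = rates[name]
        if p is None:
            continue
        if p >= threshold:
            return "real"
        if 1.0 - p >= threshold:
            return "fake"
    # No confident attribute: fall back to the model prediction.
    return "real" if model_p_real >= 0.5 else "fake"


def accuracy(rows, threshold, priority):
    """Fraction of rows whose post-processed label matches the gold label."""
    hits = sum(
        post_process(m, h, d, threshold, priority) == gold
        for m, h, d, gold in rows
    )
    return hits / len(rows)


# Made-up rows: (model_p_real, handle_p_real, domain_p_real, gold label).
rows = [
    (0.40, 0.95, None, "real"),  # confident handle corrects the model
    (0.70, None, 0.05, "fake"),  # confident domain corrects the model
    (0.80, 0.60, None, "real"),  # handle not confident, model decides
    (0.30, 0.92, 0.03, "fake"),  # handle and domain disagree: priority matters
    (0.60, 0.15, None, "real"),  # a loose threshold lets a weak handle override
]

for threshold in (0.8, 0.9):
    for priority in (("handle", "domain"), ("domain", "handle")):
        print(threshold, priority, accuracy(rows, threshold, priority))
```

The toy rows are constructed so that both parameters change the outcome; they say nothing about which setting works best on the real data.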

Results

  • Our initial ensembling approach achieved an F1-score of 98.31, against the 98.69 F1-score of the top leaderboard entry.
  • After the evaluation phase, we improved our solution to an F1-score of 98.83 using Heuristic Post-Processing.
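For readers unfamiliar with the metric, the sketch below computes a per-class F1-score from gold and predicted labels. It returns a fraction, whereas the scores above are percentages. How the shared task averages the score across classes is not stated here, so this sketch deliberately stops at the per-class value; the labels in the example are made up.

```python
def f1_score(gold, pred, positive):
    """F1 for the class `positive`: harmonic mean of precision and recall."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly found
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for five posts.
gold = ["fake", "fake", "real", "real", "real"]
pred = ["fake", "real", "real", "real", "fake"]

f1_fake = f1_score(gold, pred, "fake")
f1_real = f1_score(gold, pred, "real")
print(f1_fake, f1_real)
```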

Citation

Please consider citing our paper in your publications if the project helps your research. The BibTeX reference is as follows:

@article{das2021heuristic,
title={A Heuristic-driven Ensemble Framework for COVID-19 Fake News Detection},
author={Das, Sourya Dipta and Basak, Ayan and Dutta, Saikat},
journal={arXiv preprint arXiv:2101.03545},
year={2021}
}

covid_fake_news's People

Contributors

ayanbasak13, diptamath, saikatdutta, testttttttt11