LIAR-PLUS

LIAR-PLUS is the extended LIAR dataset for fact-checking and fake news detection, released with our paper: Where is Your Evidence: Improving Fact-Checking by Justification Modeling. Tariq Alhindi, Savvas Petridis and Smaranda Muresan. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), Brussels, Belgium, November 1st, 2018.

This dataset adds evidence sentences extracted automatically from the full-text verdict reports written by journalists at Politifact. Our objective is to provide a benchmark for evidence retrieval and to show empirically that including evidence information in any automatic fake news detection method (regardless of features or classifier) always results in superior performance to any method lacking such information.

Below is the description of the TSV file, taken as is from the original LIAR dataset, which was published in this paper. We added a new column at the end that contains the extracted justification.

  • Column 1: the ID of the statement ([ID].json).
  • Column 2: the label.
  • Column 3: the statement.
  • Column 4: the subject(s).
  • Column 5: the speaker.
  • Column 6: the speaker's job title.
  • Column 7: the state info.
  • Column 8: the party affiliation.
  • Columns 9-13: the total credit history count, including the current statement.
    • 9: barely true counts.
    • 10: false counts.
    • 11: half true counts.
    • 12: mostly true counts.
    • 13: pants on fire counts.
  • Column 14: the context (venue / location of the speech or statement).
  • Column 15: the extracted justification.
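
The column layout above can be turned into a small loader. The following is a minimal sketch, assuming a tab-separated file with no header row and exactly the 15 columns listed, in that order. The field names in `COLUMNS` and the function name `read_rows` are hypothetical labels chosen for illustration, not names defined by the dataset. If your copy of the file has a different number of columns, the sketch raises an error so that you can adjust `COLUMNS` rather than silently misreading fields.

```python
import csv

# Hypothetical field names, one per column described in the list above.
COLUMNS = [
    "id", "label", "statement", "subjects", "speaker", "job_title",
    "state", "party", "barely_true_count", "false_count",
    "half_true_count", "mostly_true_count", "pants_on_fire_count",
    "context", "justification",
]
COUNT_COLUMNS = COLUMNS[8:13]  # columns 9-13: the credit history counts


def read_rows(lines):
    """Yield one dict per TSV line, keyed by the names in COLUMNS."""
    # QUOTE_NONE: treat quote characters inside statements as plain text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_number, fields in enumerate(reader, start=1):
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"line {line_number}: expected {len(COLUMNS)} columns, "
                f"got {len(fields)}"
            )
        row = dict(zip(COLUMNS, fields))
        for name in COUNT_COLUMNS:
            # Assumption: counts are numeric strings; empty means missing.
            row[name] = int(float(row[name])) if row[name] else None
        yield row
```

To read from disk, pass an open file object, for example `open(path, encoding="utf-8", newline="")`, where `path` points at your local copy of one of the TSV files.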

Our justification extraction method works as follows:

  • Get all sentences in the 'Our Ruling' section of the report if it exists; otherwise, get the last five sentences of the report.
  • Remove any sentence that contains the verdict or any verdict-related words. Verdict-related words are provided in the forbidden words file.
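
The two steps above can be sketched in a few lines. This is an illustrative approximation, not the code used to build the dataset. It assumes that the 'Our Ruling' section can be located by a case-insensitive search for that heading text, that sentences can be split naively on end punctuation, and that a forbidden word counts as present when it matches as a whole word or phrase regardless of case. The function names and the `ruling_heading` parameter are hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def extract_justification(report, forbidden_words, ruling_heading="Our ruling"):
    """Return the justification sentences of a verdict report as one string."""
    # Step 1: take the 'Our Ruling' section if present, else the last five
    # sentences of the whole report.
    match = re.search(re.escape(ruling_heading), report, re.IGNORECASE)
    if match:
        sentences = split_sentences(report[match.end():])
    else:
        sentences = split_sentences(report)[-5:]

    # Step 2: drop every sentence containing a verdict-related word.
    patterns = [
        re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for word in forbidden_words
        if word
    ]
    kept = [s for s in sentences if not any(p.search(s) for p in patterns)]
    return " ".join(kept)
```

In practice `forbidden_words` would be read from the forbidden words file, and a proper sentence tokenizer would likely handle abbreviations and quotations better than the regular expression used here.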

Please Note:
The dataset in the current commit is the second version, which was updated after the paper was published. We extended the list of forbidden words in the second version after realizing that we had missed a few in v1. To find the results of our experiments on v2 of the dataset, please refer to the poster. To find the results on v1 of the dataset, please refer to the paper. V1 of the dataset can be found in this commit.

Note that we do not provide the full-text verdict report in this current version of the dataset, but you can use the following command to access the full verdict report and links to the source documents:

wget http://www.politifact.com//api/v/2/statement/[ID]/?format=json
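
For scripted access, the same request can be built from the value in column 1. The sketch below is a minimal example, assuming the ID is the part of column 1 before the `.json` suffix and that the endpoint in the command above is still served, which is not guaranteed. The function names `statement_url` and `fetch_report` are hypothetical.

```python
import json
import urllib.request

# URL pattern copied from the wget command above; [ID] becomes {statement_id}.
URL_TEMPLATE = "http://www.politifact.com//api/v/2/statement/{statement_id}/?format=json"


def statement_url(column_1_value):
    """Build the report URL from a column 1 value such as '2635.json'."""
    statement_id = column_1_value.strip()
    if statement_id.endswith(".json"):
        statement_id = statement_id[: -len(".json")]
    return URL_TEMPLATE.format(statement_id=statement_id)


def fetch_report(column_1_value, timeout=30):
    """Download and decode the report; raises if the endpoint is unavailable."""
    with urllib.request.urlopen(statement_url(column_1_value), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `statement_url("2635.json")` produces the URL that the wget command would request for statement 2635.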

The original sources retain the copyright of the data. There are no guarantees with this data, and we provide this dataset "as is", but you are welcome to report any issues with this preliminary version of the data.
You are allowed to use this dataset for research purposes only.

Kindly cite our paper if you find this dataset useful.

@inproceedings{alhindi2018your,
title={Where is your Evidence: Improving Fact-checking by Justification Modeling},
author={Alhindi, Tariq and Petridis, Savvas and Muresan, Smaranda},
booktitle={Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)},
pages={85--90},
year={2018}
}

v2.0 10/24/2018
