
defacto / defactonlp


DeFactoNLP: An Automated Fact-checking System that uses Named Entity Recognition, TF-IDF vector comparison and Decomposable Attention models.

Languages: Python 97.50%, Java 1.13%, Batchfile 0.04%, Shell 1.16%, Dockerfile 0.14%, Prolog 0.03%
Topics: attention, deep-learning, defacto, defactonlp, fact-checking, ner, tf-idf

defactonlp's Introduction

DeFacto: Deep Fact Validation


A Fact-Validation framework ❌ ✅

DeFacto is a framework for validating statements by finding confirming sources for them. It takes a statement (such as “Jamaica Inn was directed by Alfred Hitchcock”) as input and then tries to find evidence for the truth of that statement by searching for information on the web (more information).

  • Java version
  • 🐍 Python version | 🔥 coming soon

Found a 🐛 bug? Please open an issue

Docker Image (please check out the defacto-docker branch)

Changelog

  • v2.1
    • HTTP service support
  • v2.0
    • Multilingual Deep Fact Validation feature
  • v1.0
    • No HTTP service support
    • Vaadin component required (user interface)

How to cite

@article{gerber2015,
  title     = {DeFacto - Temporal and Multilingual Deep Fact Validation},
  author    = {Gerber, Daniel and Esteves, Diego and Lehmann, Jens and B{\"u}hmann, Lorenz and Usbeck, Ricardo and {Ngonga Ngomo}, Axel-Cyrille and Speck, Ren{\'e}},
  journal   = {Web Semantics: Science, Services and Agents on the World Wide Web},
  year      = {2015},
  url       = {http://svn.aksw.org/papers/2015/JWS_DeFacto/public.pdf}
}

@article{Esteves:2018:TVA:3183573.3177873,
  title     = {Toward Veracity Assessment in RDF Knowledge Bases: An Exploratory Analysis},
  author    = {Esteves, Diego and Rula, Anisa and Reddy, Aniketh Janardhan and Lehmann, Jens},
  journal   = {J. Data and Information Quality},
  year      = {2018},
  url       = {http://doi.acm.org/10.1145/3177873},
  publisher = {ACM}
}


defactonlp's People

Contributors

anikethjr, diegoesteves, gilrocha


defactonlp's Issues

Using DeFacto score to discover evidence

Code a function defacto_evidence_retriever(claim, potential_evidence_sentences), where claim is the statement whose veracity we have to determine and potential_evidence_sentences is a list of sentences that could support the claim, refute it, or provide no relevant information. The function should return a dictionary called results with three fields:

  1. claim: the original claim
  2. label: the overall classification of the claim - SUPPORTS, REFUTES or NOT ENOUGH INFO
  3. evidence: if the label is SUPPORTS or REFUTES, the list of sentences that support or refute the claim; otherwise an empty list
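
The contract is easiest to see as code. A minimal sketch, assuming a hypothetical defacto_score callable that maps a (claim, sentence) pair to a confidence in [0, 1] (1.0 strongly supports, 0.0 strongly refutes); the helper name and the thresholds are illustrative, not part of the issue:

    def defacto_evidence_retriever(claim, potential_evidence_sentences,
                                   defacto_score, support_threshold=0.7,
                                   refute_threshold=0.3):
        # defacto_score is a hypothetical callable: (claim, sentence) -> float in [0, 1].
        supporting, refuting = [], []
        for sentence in potential_evidence_sentences:
            score = defacto_score(claim, sentence)
            if score >= support_threshold:
                supporting.append(sentence)
            elif score <= refute_threshold:
                refuting.append(sentence)

        # Majority side wins; with no strong signal either way, abstain.
        if supporting and len(supporting) >= len(refuting):
            label, evidence = "SUPPORTS", supporting
        elif refuting:
            label, evidence = "REFUTES", refuting
        else:
            label, evidence = "NOT ENOUGH INFO", []

        results = {"claim": claim, "label": label, "evidence": evidence}
        return results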

Using Textual Entailment to retrieve evidence

Code a function textual_entailment_evidence_retriever(claim, potential_evidence_sentences), where claim is the statement whose veracity we have to determine and potential_evidence_sentences is a list of sentences that could support the claim, refute it, or provide no relevant information. The function should return a dictionary called results with the same three fields:

  1. claim: the original claim
  2. label: the overall classification of the claim - SUPPORTS, REFUTES or NOT ENOUGH INFO
  3. evidence: if the label is SUPPORTS or REFUTES, the list of sentences that support or refute the claim; otherwise an empty list
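
A minimal sketch of this variant, assuming a caller-supplied rte_predict wrapper that maps a (claim, sentence) pair to an (rte_label, confidence) pair; the wrapper is hypothetical (the repository's RTE component is a Decomposable Attention model, not shown here), and determine_predicted_label is sketched under the next issue:

    def textual_entailment_evidence_retriever(claim, potential_evidence_sentences,
                                              rte_predict):
        # rte_predict is a hypothetical callable: (claim, sentence) -> (rte_label, confidence),
        # with rte_label in {"entailment", "contradiction", "neutral"}.
        preds = [(sentence, *rte_predict(claim, sentence))
                 for sentence in potential_evidence_sentences]
        label, evidence = determine_predicted_label(preds)  # see next issue
        results = {"claim": claim, "label": label, "evidence": evidence}
        return results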

Determine Label and supporting evidence given RTE predictions

Inside textual_entailment_evidence_retriever(claim, potential_evidence_sentences) there is a function called determinePredictedLabel(preds) that, given the RTE predictions, determines the label and the supporting sentences.

Aim:

improve the determinePredictedLabel(preds) function

Current version:

Returns the label predicted most often over the evidence sentences, together with all evidence sentences and their corresponding predicted labels.
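
A hedged sketch of that behaviour: a majority vote over per-sentence RTE predictions, with an assumed mapping from RTE classes to the task's claim labels (the triple format and label strings are illustrative assumptions):

    from collections import Counter

    # Assumed mapping from RTE classes to the task's claim labels.
    RTE_TO_CLAIM_LABEL = {"entailment": "SUPPORTS",
                          "contradiction": "REFUTES",
                          "neutral": "NOT ENOUGH INFO"}

    def determine_predicted_label(preds):
        # preds: list of (sentence, rte_label, confidence) triples.
        if not preds:
            return "NOT ENOUGH INFO", []
        majority = Counter(rte_label for _, rte_label, _ in preds).most_common(1)[0][0]
        label = RTE_TO_CLAIM_LABEL[majority]
        if label == "NOT ENOUGH INFO":
            return label, []
        evidence = [s for s, rte_label, _ in preds if rte_label == majority]
        return label, evidence

Note that ties are broken arbitrarily by Counter and the confidence scores are ignored entirely, which is precisely what the ideas below try to improve.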

Ideas:

  1. @aniketh: After obtaining the number and confidence of sentences that support, refute or are neutral towards the claim, we could feed these numbers to another classifier (such as a fully connected NN or an SVM) to make the final label determination (see the sketch after the Suggestions list below). A similar approach could be used to decide which candidate evidence sentences are finally output.

Questions:
@aniketh Do you have suggestions on how to formulate the problem?

Suggestions:

  • @gil:
    • Part 1: a feature representation such as (SupportScore, ContradictionScore), where SupportScore is the maximum confidence score the RTE model gave to an "Entailment" prediction over the candidate evidence, and ContradictionScore is defined analogously for "Contradiction". The labels for this task would be [Support, Contradiction, NotEnoughInfo]; see the sketch after this list.
    • Part 2: feature representation WIP
  1. How can we obtain some measure of the factuality/veracity of a given piece of evidence? (Recall that the RTE model only tells us how much a given sentence supports the claim; it does not judge the veracity of the sentence itself.)
    • Train a model to determine how factual the evidence is? Is there related work on this? Are there existing models we can use? Is there something Wikipedia-specific for determining the veracity of a sentence (@diego's work? Wikitables? Knowledge bases?)
    • Recursive approach: run the pipeline again, but with the evidence as the claim. If we obtain supporting sentences labeled "Support", that raises our confidence that the evidence is true; if we obtain sentences labeled "Contradiction", the confidence should be reduced.
      • What happens if the evidence is a simple fact? There may be no further evidence supporting it.
    • Do we have citations/references in the files? Sentences with citations to other work (e.g. scientific papers) are potentially better supported.
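
Combining @gil's Part 1 features with @aniketh's idea of a downstream classifier, a hedged sketch; the feature extractor, label strings, and the choice of scikit-learn's SVC are illustrative assumptions, not project decisions:

    from sklearn.svm import SVC

    def rte_features(preds):
        # (SupportScore, ContradictionScore): the maximum confidence the RTE model
        # assigned to an entailment / contradiction prediction over all candidate
        # evidence. preds: list of (sentence, rte_label, confidence) triples, as above.
        support = max((c for _, l, c in preds if l == "entailment"), default=0.0)
        contradiction = max((c for _, l, c in preds if l == "contradiction"), default=0.0)
        return [support, contradiction]

    # Hypothetical training setup: one feature vector per training claim, with gold
    # labels y drawn from {"SUPPORTS", "REFUTES", "NOT ENOUGH INFO"}.
    # X = [rte_features(p) for p in per_claim_predictions]
    # clf = SVC(probability=True).fit(X, y)
    # final_label = clf.predict([rte_features(preds)])[0]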
