
defacto / defactonlp


DeFactoNLP: An Automated Fact-checking System that uses Named Entity Recognition, TF-IDF vector comparison and Decomposable Attention models.

Languages: Python 97.50%, Java 1.13%, Batchfile 0.04%, Shell 1.16%, Dockerfile 0.14%, Prolog 0.03%
Topics: attention, deep-learning, defacto, defactonlp, fact-checking, ner, tf-idf

defactonlp's Introduction

DeFacto: Deep Fact Validation


A Fact-Validation framework ❌ ✅

DeFacto is a framework for validating statements by finding confirming sources for them. It takes a statement (such as “Jamaica Inn was directed by Alfred Hitchcock”) as input and then tries to find evidence for the truth of that statement by searching for information on the web (more information).

  • Java version
  • 🐍 Python version | 🔥 coming soon

Found a 🐛 bug? Please open an issue

Docker Image (please check out the defacto-docker branch)

Changelog

  • v2.1
    • HTTP service support
  • v2.0
    • Multilingual Deep Fact Validation feature
  • v1.0
    • No HTTP service support
    • Vaadin component required (user interface)

How to cite

@article{gerber2015,
  title     = {DeFacto - Temporal and Multilingual Deep Fact Validation},
  author    = {Gerber, Daniel and Esteves, Diego and Lehmann, Jens and B{\"u}hmann, Lorenz and Usbeck, Ricardo and {Ngonga Ngomo}, Axel-Cyrille and Speck, Ren{\'e}},
  journal   = {Web Semantics: Science, Services and Agents on the World Wide Web},
  year      = {2015},
  url       = {http://svn.aksw.org/papers/2015/JWS_DeFacto/public.pdf}
}

@article{Esteves:2018:TVA:3183573.3177873,
  title     = {Toward Veracity Assessment in RDF Knowledge Bases: An Exploratory Analysis},
  author    = {Esteves, Diego and Rula, Anisa and Reddy, Aniketh Janardhan and Lehmann, Jens},
  journal   = {J. Data and Information Quality},
  year      = {2018},
  url       = {http://doi.acm.org/10.1145/3177873},
  publisher = {ACM}
}


defactonlp's People

Contributors

anikethjr, diegoesteves, gilrocha


defactonlp's Issues

Using DeFacto score to discover evidence

Code a function defacto_evidence_retriever(claim, potential_evidence_sentences), where claim is the statement whose veracity we have to determine and potential_evidence_sentences is a list of sentences that could support the claim, refute it, or provide no relevant information. The function should return a dictionary called results with three fields:

  1. claim: the original claim
  2. label: the overall classification of the claim - SUPPORTS, REFUTES or NOT ENOUGH INFO
  3. evidence: if the label is SUPPORTS or REFUTES, the list of sentences that support or refute the claim; otherwise an empty list
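
The contract is easiest to see as code. A minimal sketch, assuming a hypothetical defacto_score callable that maps a (claim, sentence) pair to a confidence in [0, 1] (1.0 strongly supports, 0.0 strongly refutes); the helper name and the thresholds are illustrative, not part of the issue:

    def defacto_evidence_retriever(claim, potential_evidence_sentences,
                                   defacto_score, support_threshold=0.7,
                                   refute_threshold=0.3):
        # defacto_score is a hypothetical callable: (claim, sentence) -> float in [0, 1].
        supporting, refuting = [], []
        for sentence in potential_evidence_sentences:
            score = defacto_score(claim, sentence)
            if score >= support_threshold:
                supporting.append(sentence)
            elif score <= refute_threshold:
                refuting.append(sentence)

        # Majority side wins; with no strong signal either way, abstain.
        if supporting and len(supporting) >= len(refuting):
            label, evidence = "SUPPORTS", supporting
        elif refuting:
            label, evidence = "REFUTES", refuting
        else:
            label, evidence = "NOT ENOUGH INFO", []

        results = {"claim": claim, "label": label, "evidence": evidence}
        return results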

Using Textual Entailment to retrieve evidence

Code a function textual_entailment_evidence_retriever(claim, potential_evidence_sentences), where claim is the statement whose veracity we have to determine and potential_evidence_sentences is a list of sentences that could support the claim, refute it, or provide no relevant information. The function should return a dictionary called results with the same three fields:

  1. claim: the original claim
  2. label: the overall classification of the claim - SUPPORTS, REFUTES or NOT ENOUGH INFO
  3. evidence: if the label is SUPPORTS or REFUTES, the list of sentences that support or refute the claim; otherwise an empty list
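
A minimal sketch of this variant, assuming a caller-supplied rte_predict wrapper that maps a (claim, sentence) pair to an (rte_label, confidence) pair; the wrapper is hypothetical (the repository's RTE component is a Decomposable Attention model, not shown here), and determine_predicted_label is sketched under the next issue:

    def textual_entailment_evidence_retriever(claim, potential_evidence_sentences,
                                              rte_predict):
        # rte_predict is a hypothetical callable: (claim, sentence) -> (rte_label, confidence),
        # with rte_label in {"entailment", "contradiction", "neutral"}.
        preds = [(sentence, *rte_predict(claim, sentence))
                 for sentence in potential_evidence_sentences]
        label, evidence = determine_predicted_label(preds)  # see next issue
        results = {"claim": claim, "label": label, "evidence": evidence}
        return results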

Determine Label and supporting evidence given RTE predictions

Inside textual_entailment_evidence_retriever(claim, potential_evidence_sentences) there is a function called determinePredictedLabel(preds) that, given the RTE predictions, determines the label and the supporting sentences.

Aim:

improve the determinePredictedLabel(preds) function

Current version:

Returns the label predicted most often over the evidence sentences, together with all evidence sentences and their corresponding predicted labels.
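
A hedged sketch of that behaviour: a majority vote over per-sentence RTE predictions, with an assumed mapping from RTE classes to the task's claim labels (the triple format and label strings are illustrative assumptions):

    from collections import Counter

    # Assumed mapping from RTE classes to the task's claim labels.
    RTE_TO_CLAIM_LABEL = {"entailment": "SUPPORTS",
                          "contradiction": "REFUTES",
                          "neutral": "NOT ENOUGH INFO"}

    def determine_predicted_label(preds):
        # preds: list of (sentence, rte_label, confidence) triples.
        if not preds:
            return "NOT ENOUGH INFO", []
        majority = Counter(rte_label for _, rte_label, _ in preds).most_common(1)[0][0]
        label = RTE_TO_CLAIM_LABEL[majority]
        if label == "NOT ENOUGH INFO":
            return label, []
        evidence = [s for s, rte_label, _ in preds if rte_label == majority]
        return label, evidence

Note that ties are broken arbitrarily by Counter and the confidence scores are ignored entirely, which is precisely what the ideas below try to improve.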

Ideas:

  1. @aniketh: After obtaining the number and confidence of sentences that support, refute or are neutral towards the claim, we could feed these numbers to another classifier (such as a fully connected NN or an SVM) to make the final label determination (see the sketch after the Suggestions list below). A similar approach could be used to decide which candidate evidence sentences are finally output.

Questions:
@aniketh Do you have suggestions on how to formulate the problem?

Suggestions:

  • @gil:
    • Part 1: a feature representation such as (SupportScore, ContradictionScore), where SupportScore is the maximum confidence score the RTE model gave to an "Entailment" prediction over the candidate evidence, and ContradictionScore is defined analogously for "Contradiction". The labels for this task would be [Support, Contradiction, NotEnoughInfo]; see the sketch after this list.
    • Part 2: feature representation WIP
  1. How can we obtain some measure of the factuality/veracity of a given piece of evidence? (Recall that the RTE model only tells us how much a given sentence supports the claim; it does not judge the veracity of the sentence itself.)
    • Train a model to determine how factual the evidence is? Is there related work on this? Are there existing models we can use? Is there something Wikipedia-specific for determining the veracity of a sentence (@diego's work? Wikitables? Knowledge bases?)
    • Recursive approach: run the pipeline again, but with the evidence as the claim. If we obtain supporting sentences labeled "Support", that raises our confidence that the evidence is true; if we obtain sentences labeled "Contradiction", the confidence should be reduced.
      • What happens if the evidence is a simple fact? There may be no further evidence supporting it.
    • Do we have citations/references in the files? Sentences with citations to other work (e.g. scientific papers) are potentially better supported.
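
Combining @gil's Part 1 features with @aniketh's idea of a downstream classifier, a hedged sketch; the feature extractor, label strings, and the choice of scikit-learn's SVC are illustrative assumptions, not project decisions:

    from sklearn.svm import SVC

    def rte_features(preds):
        # (SupportScore, ContradictionScore): the maximum confidence the RTE model
        # assigned to an entailment / contradiction prediction over all candidate
        # evidence. preds: list of (sentence, rte_label, confidence) triples, as above.
        support = max((c for _, l, c in preds if l == "entailment"), default=0.0)
        contradiction = max((c for _, l, c in preds if l == "contradiction"), default=0.0)
        return [support, contradiction]

    # Hypothetical training setup: one feature vector per training claim, with gold
    # labels y drawn from {"SUPPORTS", "REFUTES", "NOT ENOUGH INFO"}.
    # X = [rte_features(p) for p in per_claim_predictions]
    # clf = SVC(probability=True).fit(X, y)
    # final_label = clf.predict([rte_features(preds)])[0]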
