This project forked from jpposadas/fakenewscorpusspanish

The Spanish Fake News Corpus contains a collection of 971 news items: 491 real and 480 fake. The corpus covers news from 9 different topics: Science, Sport, Economy, Education, Entertainment, Politics, Health, Security, and Society.

License: Creative Commons Attribution 4.0 International

📰 The Spanish Fake News Corpus

📢 Call for Participation in Fake News Detection in Spanish Shared Task (FakeDeS) 2021

We invite the entire GitHub community to participate in the second edition of the fake news detection task (FakeDeS). For more details and to register, visit the official event website: https://sites.google.com/view/fakedes

📆 Important dates

  • March 1st 2021: training data available to participants
  • April 19th 2021: test data available to participants
  • April 30th 2021: system results due to organizers
  • May 15th 2021: assessment returned to participants
  • June 7th 2021: working notes submission
  • June 21st 2021: working notes reviewed (peer-reviewed)
  • June 28th 2021: camera ready papers due to the organizers
  • September 2021: IberLEF@SEPLN 2021 Workshop
  • October 2021: Presentation of results at a workshop in Mexico

📄 Corpus Description

The Spanish Fake News Corpus contains a collection of news compiled from several resources on the Web: established newspaper websites, media companies' websites, websites dedicated to validating fake news, and websites that different journalists have identified as regularly publishing fake news. The news items were collected from **January to July of 2018** and all of them were written in Spanish. The corpus was tagged manually, and the method followed is described in the paper. The following aspects were considered: 1) a news item was tagged as true if there was evidence that it had been published on reliable sites, i.e., established newspaper websites or renowned journalists' websites; 2) a news item was tagged as fake if news from reliable sites, or from a website specialized in detecting deceptive content such as VerificadoMX (https://verificado.mx), contradicted it, or if no evidence about the news was found besides its source; 3) the correlation between news items was kept by collecting the true-fake news pair of an event; 4) we tried to trace the source of the news.

The corpus contains 971 news items: 491 real and 480 fake. The corpus covers news from 9 different topics: **Science, Sport, Economy, Education, Entertainment, Politics, Health, Security, and Society**. The corpus was split into train and test sets, using around 70% of the corpus for training and the rest for testing. We performed a hierarchical (stratified) distribution of the corpus, i.e., all the categories keep the 70%-30% ratio.
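The stratified 70%-30% split described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the procedure the authors actually ran: the helper name `stratified_split`, the choice of `Topic` as the grouping key, the fixed seed, and the toy records are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(records, key, train_fraction=0.7, seed=0):
    """Split records so that every group given by `key` keeps roughly the same ratio."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for group_name in sorted(groups):
        members = list(groups[group_name])
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Toy records that only mimic the corpus columns; they are NOT real corpus data.
toy = [
    {"Id": n, "Category": "fake" if n % 2 else "true", "Topic": topic}
    for n, topic in enumerate(["Politics"] * 10 + ["Health"] * 10)
]

train, test = stratified_split(toy, key=lambda r: r["Topic"])
print(Counter(r["Topic"] for r in train))
print(Counter(r["Topic"] for r in test))
```

Grouping by `(Topic, Category)` instead of `Topic` alone would also preserve the true/fake balance inside each topic; which of the two the original split used is not stated here.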

The corpus is contained in the files train.xlsx and development.xlsx. The columns have the following meaning:

  • Id: assigns an identifier to each instance.
  • Category: indicates the category of the news (true or fake).
  • Topic: indicates the topic related to the news.
  • Source: indicates the name of the source.
  • Headline: contains the headline of the news.
  • Text: contains the raw text of the news.
  • Link: contains the URL of the source.
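As a minimal sketch of how the columns above might be consumed, the following Python snippet checks that the expected columns are present and counts the labels. Several things here are assumptions: the helper names (`load_records`, `missing_columns`, `label_counts`) are hypothetical, the sample rows are invented placeholders rather than corpus data, the exact spelling of the `Category` values in the files may differ, and `load_records` requires pandas with an Excel engine such as openpyxl.

```python
from collections import Counter

EXPECTED_COLUMNS = ["Id", "Category", "Topic", "Source", "Headline", "Text", "Link"]


def load_records(path):
    """Read one corpus spreadsheet into a list of dicts (needs pandas and openpyxl)."""
    import pandas as pd  # imported lazily so the rest of the sketch runs without pandas

    return pd.read_excel(path).to_dict("records")


def missing_columns(records):
    """Return the expected columns that appear in none of the records."""
    present = set()
    for record in records:
        present.update(record.keys())
    return [column for column in EXPECTED_COLUMNS if column not in present]


def label_counts(records):
    """Count how many records carry each Category value."""
    return Counter(record["Category"] for record in records)


# Invented placeholder rows, used only to exercise the helpers.
sample = [
    {"Id": 1, "Category": "true", "Topic": "Health", "Source": "example-source",
     "Headline": "placeholder headline", "Text": "placeholder text",
     "Link": "https://example.org/a"},
    {"Id": 2, "Category": "fake", "Topic": "Health", "Source": "example-source",
     "Headline": "placeholder headline", "Text": "placeholder text",
     "Link": "https://example.org/b"},
]

print(missing_columns(sample))
print(label_counts(sample))
```

With the real files, `records = load_records("train.xlsx")` would replace the placeholder rows.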

📝 How to cite

If you use the corpus, please cite the following articles:

  1. Posadas-Durán, J. P., Gómez-Adorno, H., Sidorov, G., & Escobar, J. J. M. (2019). Detection of fake news in a new corpus for the Spanish language. Journal of Intelligent & Fuzzy Systems, 36(5), 4869-4876.

  2. Aragón, M. E., Jarquín, H., Gómez, M. M. Y., Escalante, H. J., Villaseñor-Pineda, L., Gómez-Adorno, H., ... & Posadas-Durán, J. P. (2020, September). Overview of MEX-A3T at IberLEF 2020: Fake news and aggressiveness analysis in Mexican Spanish. In Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain.

The Spanish Fake News Corpus was used for the Fake News Detection Task in the MEX-A3T competition at the IberLEF 2020 congress. The details of the competition can be found on the main page of the competition.

Authors of the corpus

Juan Manuel Ramírez Cruz (ESIME Zacatenco - IPN), Silvia Úrsula Palacios Alvarado (ESIME Zacatenco - IPN), Karime Elena Franca Tapia (ESIME Zacatenco - IPN), Juan Pablo Francisco Posadas Durán (ESIME Zacatenco - IPN), Helena Montserrat Gómez Adorno (IIMAS - UNAM), Grigori Sidorov (CIC - IPN)

Acknowledgments

The work was done with partial support from Red Temática de Tecnologías del Lenguaje, CONACYT project 240844, and SIP-IPN projects 20181849 and 20171813.

License

CC-BY-4.0.
