

MM-COVID

Multilingual and Multimodal COVID-19 Fake News Dataset

Data Structure

The data is stored on Google Drive.

  • news_collection.json: stores the fact-checking information, news content, and news label for each item.
  • news_tweet_relation.json: stores the Twitter discussion of each news item.
  • tweet_tweet_relation.json: stores the retweets and recursive replies of those tweets.
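The sketch below shows one way to load these files in Python and group tweet IDs by news item. The on-disk layout is an assumption: the loader accepts either a single JSON array or one JSON object per line. The field names `news_id` and `tweet_id` are hypothetical placeholders, so check them against the actual files.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_records(path):
    """Load a list of dicts from a JSON array file or a JSON Lines file.

    Which layout the released files use is an assumption, so both are handled.
    """
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return []
    if text.startswith("["):
        return json.loads(text)
    # Otherwise treat the file as JSON Lines: one object per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def tweets_by_news(relations, news_key="news_id", tweet_key="tweet_id"):
    """Group tweet IDs under the news item they discuss.

    news_key and tweet_key are hypothetical field names; adjust them to the
    real schema of news_tweet_relation.json.
    """
    grouped = defaultdict(list)
    for rel in relations:
        grouped[rel[news_key]].append(rel[tweet_key])
    return dict(grouped)
```

For example, `tweets_by_news(load_records("news_tweet_relation.json"))` would map each news item to the list of tweet IDs discussing it, provided the assumed field names match.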

Due to Twitter privacy concerns, we provide only the tweet IDs. You can use Twarc to hydrate these tweet IDs.
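Twarc's hydrate command reads a plain text file with one tweet ID per line. The sketch below prepares that file. It assumes the tweet IDs have already been collected into a Python iterable, for example from the relation files. The helper name and the output filename `ids.txt` are illustrative choices.

```python
def write_id_file(tweet_ids, out_path="ids.txt"):
    """Write unique tweet IDs, one per line, keeping first-seen order.

    IDs are handled as strings: tweet IDs exceed 2**53, so any tool that
    parses them as floating-point numbers would silently corrupt them.
    Returns the number of IDs written.
    """
    seen = set()
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for tid in tweet_ids:
            tid = str(tid).strip()
            if tid and tid not in seen:
                seen.add(tid)
                f.write(tid + "\n")
                count += 1
    return count
```

With twarc v2 installed and configured (`twarc2 configure`), the file can then be hydrated with `twarc2 hydrate ids.txt hydrated.jsonl`. Whether this still works depends on your current Twitter/X API access. Deleted or protected tweets will not be returned.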

Crawling Pipeline

This code stores the data in MongoDB. Install MongoDB before running the code.
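A quick check that a MongoDB server is listening can save a confusing failure partway through a crawl. The standard-library sketch below assumes MongoDB runs locally on its default port, 27017; adjust host and port for your setup. The function name is hypothetical and not part of the crawler.

```python
import socket


def mongodb_reachable(host="localhost", port=27017, timeout=2.0):
    """Return True if something accepts TCP connections at host:port.

    27017 is MongoDB's default port. A successful connection only shows that
    a listener exists, not that it is MongoDB or that authentication works.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```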

The main script is FakeNewsCrawler.py. Its pipeline is as follows:

[Figure: crawling pipeline]

Workflow

  1. Use the crawler to collect all the fake news from the fact-checking server.
  2. Fetch the HTML page of the source cited in the article, parse it, and extract the article's title.
  3. Using the title from the previous step, search Twitter's advanced search for matching tweets via web scraping.
  4. For every tweet related to the fake news, collect its favourites, replies, and retweets.
  5. For all users who posted those tweets, gather social network information such as followers and followees.
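Steps 2 and 3 can be sketched with the standard library alone. The sketch extracts a page's `<title>` and turns it into an exact-phrase Twitter search URL. The helper names and the `f=live` search parameter are assumptions for illustration; the real crawler may use different parsing libraries and query formats. Fetching the pages and scraping the search results are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def extract_title(html):
    """Step 2: return the page title with whitespace collapsed ('' if none)."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def search_url(title):
    """Step 3: build a search URL for the exact title phrase.

    The URL layout (q plus f=live) is an assumption about Twitter's web
    search, not necessarily the query the crawler sends.
    """
    return "https://twitter.com/search?" + urlencode({"q": f'"{title}"', "f": "live"})
```

Quoting the title asks the search for the exact phrase, which trades recall for precision. Tweets that paraphrase the headline will be missed.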

Installation

Requirements:

Credit to FakeNewsNet.

MongoDB setup: https://docs.mongodb.com/tutorials/install-mongodb-on-ubuntu/

Firefox driver (geckodriver) installation: https://askubuntu.com/questions/870530/how-to-install-geckodriver-in-ubuntu

The data download scripts are written in Python and require Python 3.6+ to run.

Twitter API keys are used to collect data from Twitter. Use the following link to obtain Twitter API keys: https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.html

The scripts read keys from the tweet_keys_file.json file in the code/resources folder, so your API keys need to be added to tweet_keys_file.json. Provide the keys as an array of JSON objects with the attributes app_key, app_secret, oauth_token, and oauth_token_secret, as shown in the sample file.
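The expected shape of the key file, and a small validator that catches missing fields before any requests are made, can be sketched as follows. The validator is a hypothetical helper, not part of the crawler. The placeholder values in the comment are not real credentials.

```python
import json

# Expected shape of code/resources/tweet_keys_file.json (placeholder values):
# [
#   {"app_key": "...", "app_secret": "...",
#    "oauth_token": "...", "oauth_token_secret": "..."}
# ]
REQUIRED_FIELDS = ("app_key", "app_secret", "oauth_token", "oauth_token_secret")


def load_twitter_keys(path):
    """Load the key file and fail early if any credential field is missing."""
    with open(path, encoding="utf-8") as f:
        keys = json.load(f)
    if not isinstance(keys, list) or not keys:
        raise ValueError(f"{path}: expected a non-empty JSON array of key objects")
    for i, entry in enumerate(keys):
        if not isinstance(entry, dict):
            raise ValueError(f"{path}: entry {i} is not a JSON object")
        # Absent fields and empty strings both count as missing.
        missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
        if missing:
            raise ValueError(f"{path}: entry {i} is missing {', '.join(missing)}")
    return keys
```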

Install all the libraries listed in requirements.txt with the following command:

pip install -r requirements.txt

Running Code

To collect the dataset quickly, the code uses process-level parallelism and synchronizes Twitter API key rate limits across multiple Python processes. Start the crawler with:

nohup python FakeNewsCrawler.py
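One way to share Twitter key limits across processes is sketched below, not necessarily the way the crawler does it. Per-key request counters live in shared memory behind a single lock. A worker asks the pool for a key index and blocks until some key still has budget in its current window. The class name and the `max_calls` and `window_s` values are hypothetical; real Twitter limits vary by endpoint.

```python
import multiprocessing as mp
import time


class SharedKeyPool:
    """Share per-key request budgets across worker processes.

    Each key may be used max_calls times per window_s seconds. The counters
    are shared-memory arrays guarded by one lock, so every process that
    receives the pool (e.g. as a Process argument) sees the same state.
    """

    def __init__(self, n_keys, max_calls, window_s):
        self.max_calls = max_calls
        self.window_s = window_s
        self._lock = mp.Lock()
        self._counts = mp.RawArray("i", n_keys)        # calls used in current window
        self._window_start = mp.RawArray("d", n_keys)  # window start (epoch seconds)

    def acquire(self, poll_s=0.5):
        """Block until some key has budget left, then return its index."""
        while True:
            with self._lock:
                now = time.time()
                for i in range(len(self._counts)):
                    # Start a fresh window once the previous one has expired.
                    if now - self._window_start[i] >= self.window_s:
                        self._window_start[i] = now
                        self._counts[i] = 0
                    if self._counts[i] < self.max_calls:
                        self._counts[i] += 1
                        return i
            time.sleep(poll_s)
```

To use it, create the pool once in the parent and pass it to each `mp.Process(target=worker, args=(pool, ...))`. Each worker calls `keys[pool.acquire()]` before every request. Polling keeps the sketch simple, at the cost of up to `poll_s` seconds of extra latency when all keys are exhausted.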

References

If you use this dataset, please cite the following paper:

@misc{li2020mmcovid,
  title={MM-COVID: A Multilingual and Multimodal Data Repository for Combating COVID-19 Disinformation}, 
  author={Yichuan Li and Bohan Jiang and Kai Shu and Huan Liu},
  year={2020},
  eprint={2011.04088},
  archivePrefix={arXiv},
  primaryClass={cs.SI}}

If you have any questions about this dataset, please contact Yichuan Li ([email protected]).

