twitter-etl-airflow-mongodb

ETL With Twitter Data

Overview

This project demonstrates how to work with the Twitter API in Python. It uses the Tweepy library to scrape data from Twitter, and shows how to extract, transform and load (ETL) that data into a CSV file and a MongoDB database.

Task

Part 1

Write a script that downloads tweet data on a specific search topic using the standard search API. The script should contain the following functions:

  1. scrape_tweets() that has the following parameters:

    • Search topic
    • The number of tweets to download per request
    • The number of requests

    and returns a dataframe.

  2. save_results_as_csv() that has the following parameters:

    • The dataframe returned by the function above

    and returns a CSV file with the following naming format:

    tweets_downloaded_yymmdd_hhmmss.csv (where ‘yymmdd_hhmmss’ is the current timestamp)
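
The naming format above can be sketched as follows. This is a minimal illustration, not the project's actual script: the helper build_csv_name() and the optional now parameter are hypothetical additions that make the timestamp easy to check, the function name is written in lowercase, and df is assumed to be a pandas DataFrame (any object with a to_csv() method works).

```python
from datetime import datetime


def build_csv_name(now=None):
    """Return tweets_downloaded_yymmdd_hhmmss.csv for the given (or current) time."""
    now = now or datetime.now()
    # %y%m%d_%H%M%S gives a two-digit year, month, day, then hour, minute, second.
    return f"tweets_downloaded_{now.strftime('%y%m%d_%H%M%S')}.csv"


def save_results_as_csv(df, now=None):
    """Write the dataframe to a timestamped CSV file and return the file name."""
    filename = build_csv_name(now)
    # index=False keeps the dataframe's row index out of the file.
    df.to_csv(filename, index=False)
    return filename
```

Returning the file name rather than nothing lets the caller log or reuse the path of the file that was just written.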

The following attributes of the tweets should be extracted:

  • Tweet text
  • Tweet id
  • Source
  • Coordinates
  • Retweet count
  • Likes count
  • User info
    • Username
    • Screenname
    • Location
    • Friends count
    • Verification status
    • Description
    • Followers count

Make sure to not include retweets.
Make sure the same tweet does not appear multiple times in your final CSV.
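
One way to meet the attribute, retweet and duplicate requirements is sketched below. It assumes each item is a Tweepy Status object built from the standard (v1.1) search payload, where retweets carry a retweeted_status attribute. The function name tweets_to_rows() and the column names are hypothetical choices for illustration.

```python
def tweets_to_rows(statuses):
    """Flatten tweet objects into dict rows, skipping retweets and duplicate ids."""
    rows, seen_ids = [], set()
    for status in statuses:
        # Assumption: retweets expose a `retweeted_status` attribute (v1.1 payload).
        if hasattr(status, "retweeted_status"):
            continue
        # The same tweet can come back from more than one request; keep the first copy.
        if status.id in seen_ids:
            continue
        seen_ids.add(status.id)
        user = status.user
        rows.append({
            "text": status.text,
            "tweet_id": status.id,
            "source": status.source,
            "coordinates": status.coordinates,
            "retweet_count": status.retweet_count,
            "likes_count": status.favorite_count,
            "username": user.name,
            "screen_name": user.screen_name,
            "location": user.location,
            "friends_count": user.friends_count,
            "verified": user.verified,
            "description": user.description,
            "followers_count": user.followers_count,
        })
    return rows
```

Passing the returned list to pandas.DataFrame(rows) yields the dataframe that scrape_tweets() is expected to return. Adding the -filter:retweets operator to the search query is another way to exclude retweets at the source, in which case the attribute check acts only as a safety net.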

Part 2

Create a MongoDB database called Tweets_db and store the extracted tweets in a collection named raw_tweets.
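
A minimal sketch of the load step follows, assuming client is a pymongo MongoClient and rows is a list of dicts, one per tweet. The function name store_raw_tweets() is hypothetical. MongoDB creates the database and collection on the first insert, so no separate creation step appears here.

```python
def store_raw_tweets(client, rows, db_name="Tweets_db", collection_name="raw_tweets"):
    """Insert tweet rows into the target collection and return how many were stored."""
    if not rows:
        # pymongo's insert_many rejects an empty list, so return early.
        return 0
    collection = client[db_name][collection_name]
    # Copy each row: pymongo adds an _id key to the documents it inserts,
    # and the caller's dicts should stay unchanged.
    result = collection.insert_many([dict(row) for row in rows])
    return len(result.inserted_ids)
```

Taking the client as a parameter keeps connection details (host, credentials) out of the function and makes it easy to exercise with a stand-in object.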

Pre-requisites

  • Twitter Developer Account
    Apply for a Twitter Developer account if you do not have one. You need its credentials to work with the Twitter API.
  • Twitter API credentials

Getting Started

The project was developed using:

  • Python 3.7.9
  • Anaconda (conda)
  • Tweepy
  • Pymongo
  • Pandas

Follow the steps below to set up the project.

Create environment

Create a conda environment using the command:

conda create -n "env-name" python=3.7

Activate environment

Activate the environment using the command:

conda activate env-name

Install packages

Install project packages using the command:

pip install -r requirements.txt

Store env variables

To store your access credentials (for example, API keys and database access credentials), follow the steps below:

  1. Duplicate the .env.example file and name the copy .env
  2. Store your access credentials as needed
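
Once the values are stored, a script can read them back from the environment. The sketch below assumes the contents of .env have already been loaded into the process environment (for instance by a tool such as python-dotenv, which this README does not name). The function load_credentials() and the variable names in the usage line are hypothetical; use the names defined in .env.example.

```python
import os


def load_credentials(names, env=None):
    """Return {name: value} for each required variable, failing if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        # Fail early with the full list so every gap can be fixed in one pass.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

For example, load_credentials(["TWITTER_API_KEY", "TWITTER_API_SECRET"]) returns both values as a dict, or raises before any API call is attempted.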
