imukoki / wrangle-and-analyze-data

Use Python to perform Data Wrangling (Gathering, Assessing, Cleaning) of the WeRateDogs Twitter account and archive, followed by storing, analyzing and visualizing the wrangled data.

Project 2: Wrangle-and-Analyze-Data

Introduction

Real-world data rarely comes clean. In this project, we use Python and its libraries to gather data from a variety of sources and in a variety of formats, assess its quality and tidiness, and then clean it.

The dataset that you will be wrangling (and analyzing and visualizing) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10. 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs Brent." WeRateDogs has over 4 million followers and has received international media coverage.

Project-Details

Your tasks in this project are as follows:

  • Data wrangling, which consists of:

    • Gathering data
    • Assessing data
    • Cleaning data
  • Storing, analyzing, and visualizing your wrangled data

  • Reporting on 1) your data wrangling efforts and 2) your data analyses and visualizations

Gathering-Data-for-this-Project

Gather each of the three pieces of data as described below in a Jupyter Notebook titled wrangle_act.ipynb:

  1. The WeRateDogs Twitter archive, twitter_archive_enhanced.csv. Download this file manually from the URL provided by Udacity.
  2. The tweet image predictions, i.e., what breed of dog (or other object, animal, etc.) is present in each tweet according to a neural network. This file (image_predictions.tsv) is hosted on Udacity's servers and should be downloaded programmatically using the Requests library and the following URL: https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv
  3. Each tweet's retweet count and favorite (i.e. "like") count at minimum, and any additional data you find interesting. Using the tweet IDs in the WeRateDogs Twitter archive, query the Twitter API for each tweet's JSON data using Python's Tweepy library and store each tweet's entire set of JSON data in a file called tweet_json.txt. Each tweet's JSON data should be written to its own line. Then read this .txt file line by line into a pandas DataFrame with (at minimum) tweet ID, retweet count, and favorite count.
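
The last part of step 3, reading tweet_json.txt line by line into a DataFrame, could look roughly like the sketch below. It assumes each line holds one tweet's JSON object with the fields id, retweet_count and favorite_count; the helper name load_tweet_counts and the two sample records are hypothetical and stand in for real API responses.

```python
import json
import os
import tempfile

import pandas as pd


def load_tweet_counts(path):
    """Read a file with one JSON object per line and keep three fields per tweet."""
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            tweet = json.loads(line)
            rows.append(
                {
                    "tweet_id": tweet["id"],
                    "retweet_count": tweet["retweet_count"],
                    "favorite_count": tweet["favorite_count"],
                }
            )
    return pd.DataFrame(rows, columns=["tweet_id", "retweet_count", "favorite_count"])


# Hypothetical sample records (assumed field names, made-up values).
sample_tweets = [
    {"id": 1001, "retweet_count": 5, "favorite_count": 20, "lang": "en"},
    {"id": 1002, "retweet_count": 7, "favorite_count": 31, "lang": "en"},
]

# Write each tweet's JSON to its own line, then read the file back.
tmp_dir = tempfile.mkdtemp()
json_path = os.path.join(tmp_dir, "tweet_json.txt")
with open(json_path, "w", encoding="utf-8") as fh:
    for tweet in sample_tweets:
        fh.write(json.dumps(tweet) + "\n")

tweet_counts = load_tweet_counts(json_path)
print(tweet_counts)
```

Parsing one line at a time keeps memory use flat and lets a malformed line be traced to its position in the file.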

Assessing-Data-for-this-Project

After gathering each of the above pieces of data, assess them visually and programmatically for quality and tidiness issues. Detect and document at least eight (8) quality issues and two (2) tidiness issues in your wrangle_act.ipynb Jupyter Notebook. To meet specifications, the assessment must cover the issues that satisfy the project's Key Points.
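
A programmatic assessment might start with a few checks like the ones sketched here. The tiny DataFrame and its column names (name, rating_numerator, rating_denominator) are assumptions made for illustration and may not match the real archive.

```python
import pandas as pd

# Hypothetical miniature archive; real column names may differ.
archive = pd.DataFrame(
    {
        "tweet_id": [1, 2, 2, 4],
        "name": ["Bella", None, None, "a"],
        "rating_numerator": [12, 13, 13, 11],
        "rating_denominator": [10, 10, 10, 7],
    }
)

# Structure and dtypes at a glance.
archive.info()

# Missing values per column.
missing_per_column = archive.isna().sum()

# Repeated tweet IDs: each tweet should appear only once.
duplicate_ids = int(archive["tweet_id"].duplicated().sum())

# Ratings whose denominator is not the usual 10.
odd_denominators = archive[archive["rating_denominator"] != 10]

print(missing_per_column)
print("duplicate tweet IDs:", duplicate_ids)
print(odd_denominators)
```

Each check that turns something up can be written down as one documented quality or tidiness issue.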

Cleaning-Data-for-this-Project

Clean each of the issues you documented while assessing. Perform this cleaning in wrangle_act.ipynb as well. The result should be a high-quality, tidy master pandas DataFrame (or DataFrames, if appropriate).
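
One way to structure a cleaning step is to work on a copy, apply the fix, and then verify it. The sketch below shows one quality fix and one tidiness fix on hypothetical stand-in tables; the column names and values are assumptions, not the real data.

```python
import pandas as pd

# Hypothetical inputs standing in for the gathered tables.
archive = pd.DataFrame(
    {
        "tweet_id": [1, 2, 3],
        "timestamp": [
            "2017-08-01 16:23:56",
            "2017-08-01 00:17:27",
            "2017-07-31 00:18:03",
        ],
        "rating_numerator": [13, 13, 12],
    }
)
tweet_counts = pd.DataFrame(
    {"tweet_id": [1, 2, 3], "retweet_count": [8, 6, 4], "favorite_count": [39, 33, 25]}
)

# Work on a copy so the original stays available for comparison.
archive_clean = archive.copy()

# Quality fix: timestamps stored as strings become proper datetimes.
archive_clean["timestamp"] = pd.to_datetime(archive_clean["timestamp"])

# Tidiness fix: the counts describe the same tweets, so join them on tweet_id.
master = archive_clean.merge(tweet_counts, on="tweet_id", how="inner")

# Verify that both fixes took effect.
assert pd.api.types.is_datetime64_any_dtype(master["timestamp"])
assert master["tweet_id"].is_unique
print(master.dtypes)
```

Keeping the untouched original next to the cleaned copy makes it easy to show before-and-after evidence for each documented issue.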

Storing-Analyzing-and-Visualizing-Data-for-this-Project

Store the clean DataFrame(s) in a CSV file, with the main one named twitter_archive_master.csv. If additional files exist because multiple tables are required for tidiness, name these files appropriately. You may also store the cleaned data in a SQLite database; if you do, submit the database as well.
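
Storing the result could be sketched as follows. The DataFrame contents, the database file name and the table name tweets are hypothetical; only twitter_archive_master.csv comes from the project text. The files go to a temporary directory here so the sketch leaves the working directory untouched.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Hypothetical cleaned table.
master = pd.DataFrame(
    {"tweet_id": [1, 2], "retweet_count": [8, 6], "favorite_count": [39, 33]}
)

out_dir = tempfile.mkdtemp()

# Required: the main table as CSV, without the pandas index column.
csv_path = os.path.join(out_dir, "twitter_archive_master.csv")
master.to_csv(csv_path, index=False)

# Optional: the same table in a SQLite database (file and table names are assumptions).
db_path = os.path.join(out_dir, "twitter_archive_master.db")
conn = sqlite3.connect(db_path)
try:
    master.to_sql("tweets", conn, if_exists="replace", index=False)
    stored_rows = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
finally:
    conn.close()

# Read the CSV back to confirm the round trip.
restored = pd.read_csv(csv_path)
print(stored_rows, restored.equals(master))
```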

Analyze and visualize your wrangled data in your wrangle_act.ipynb Jupyter Notebook. At least three (3) insights and one (1) visualization must be produced.
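
As a rough illustration of what three insights and one visualization might look like, the sketch below computes simple summaries and saves a scatter plot. The data is made up and the column names are assumptions; in a notebook the Agg backend line is unnecessary because figures render inline.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical wrangled data (made-up values).
master = pd.DataFrame(
    {
        "rating_numerator": [10, 11, 12, 13, 13, 12, 13],
        "retweet_count": [100, 300, 900, 2500, 2100, 800, 2300],
        "favorite_count": [500, 1200, 3500, 9000, 8000, 3000, 8500],
    }
)

# Insight 1: average favorite count for each rating.
mean_favorites_by_rating = master.groupby("rating_numerator")["favorite_count"].mean()

# Insight 2: how strongly retweets and favorites move together.
retweet_favorite_corr = master["retweet_count"].corr(master["favorite_count"])

# Insight 3: the most frequently given rating.
most_common_rating = int(master["rating_numerator"].mode().iloc[0])

# Visualization: retweets against favorites.
fig, ax = plt.subplots()
ax.scatter(master["retweet_count"], master["favorite_count"])
ax.set_xlabel("Retweet count")
ax.set_ylabel("Favorite count")
ax.set_title("Retweets vs. favorites")
plot_path = os.path.join(tempfile.mkdtemp(), "retweets_vs_favorites.png")
fig.savefig(plot_path)
plt.close(fig)

print(mean_favorites_by_rating)
print(retweet_favorite_corr)
print(most_common_rating)
```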

Installation

git clone https://github.com/imukoki/Wrangle-and-Analyze-Data.git
cd Wrangle-and-Analyze-Data
jupyter notebook

Requirements

  • pandas
  • NumPy
  • Seaborn
  • json (Python standard library)
  • Tweepy

Author

👤 Innocent Mukoki
