Coder Social home page Coder Social logo

nyc_bikesharing_analysis's Introduction

NYC Citi Bike Share Program Analysis

Winter 2021-22 & Summer 2022


Project background

Since 2013, the Citi Bike Program has implemented a robust infrastructure for collecting data on the program's utilization. Through the team's efforts, each month bike data is collected, organized, and made public on the "Citi Bike Data" webpage.

This project involved extracting a few NYC Citi Bike datasets and using to visualize and determine answers to company leadership questions.

The data was extracted from the website, transformed in a Jupyter Notebook using Python and Pandas then cleaned in a BigQuery using SQL queries and pushed out to a combined csv to then load into Tableau.

Each of the visualizations created contains interactive features for users to explore the different trends related to short-term customers and annual Citi Bike members, peak hours and days of the week, and the breakdown of bike types and seasons (winter and summer).

A report of findings is located in Analysis.md.


Some of the questions answered include:

  • How many trips have been recorded total during the chosen period?

  • By what percentage has total ridership grown?

  • How has the proportion of short-term customers and annual members changed?

  • What are the peak hours in which bikes are used during summer months?

  • What are the peak hours in which bikes are used during winter months?

  • What are the top 10 stations in the city for starting a journey/ ending a journey?

  • What are the bottom 10 stations in the city for starting a journey/ ending a journey?

  • What is the bikes type breakdown of all participants, short-term customers and annual members?

  • How does the average trip duration change?


Datasets Downloaded: (3 months of Winter 2021-2022 & 3 months of Summer 2022)

  • 202112-citibike-tripdata.csv
  • 202201-citibike-tripdata.csv
  • 202202-citibike-tripdata.csv
  • 202206-citibike-tripdata.csv
  • 202207-citibike-tripdata.csv
  • 202208-citibike-tripdata.csv

Note: The datasets used are not included in this folder due to storage space, but can be downloaded from this website directly.


Performing analysis

Transforming dataset with Python and Pandas

  • Use Pandas to calculate the difference between two dates (Started time and Ended time). Check the format of the data: df.info()

data_checking

  • Convert the dates to datetime format:
df['started_at'] = pd.to_datetime(df['started_at'], errors='coerce')
df['ended_at'] = pd.to_datetime(df['ended_at'], errors='coerce')
  • Calculate the difference between two dates as a timedelta and creating a new column 'tripduration': df['tripduration'] = df['ended_at'] - df['started_at']

  • Calculate the difference between two dates in seconds and creating a new column 'tripduration_seconds': df['tripduration_seconds'] = (df['ended_at'] - df['started_at']).dt.total_seconds()

new_columns

  • Export the Dataframe as a new CSV file

Repeate for all 6 files.

Cleaning the Data with SQL

  • Upload 6 CSV files into a BigQuery tables Create tables for 6 datasets:

create_tables

  • Create a new column "route":

CONCAT(start_station_name, " - ", end_station_name) AS route

create_route

  • Cleaning tables with SQL queries

Check and remove negative trip duration:

tripduration1

tripduration2

Check and remove NULL data:

Null1

Null2

  • Save new tables and export as new CSV files

Aggregate datasets into one CSV file

Jupyter Notebook was used to merge multiple datasets into one file.

df_winter_summer = pd.concat([df_202112, df_202201, df_202202, df_202206, df_202207, df_202208], ignore_index=True)

As a result, one CSV file was received: [4016535 rows x 16 columns]>

Then the new Dataframe was exported as new CSV file for further analysis in Tableau.

Create visualizations, design dashboards and create a Tableau story

Glimpse of Visualizations:

Popular routes and stations

Map


Map of bike stations

Peak Bike Stations


Peak ridership times dashboard

Peak Bike Stations


Outliers

Peak Bike Stations


Peak trips hours during the day

Peak Bike Stations

User type breakdown analysis

Peak Bike Stations

nyc_bikesharing_analysis's People

Contributors

akotovets avatar akotovets1 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.