Coder Social home page Coder Social logo

avani1998 / airbnb-data-analysis Goto Github PK

View Code? Open in Web Editor NEW
0.0 1.0 0.0 1.06 MB

This project, conducted on Google Colab, analyzes a dataset focusing on Airbnb listings. Key steps include data cleaning, visualization, hypothesis testing, and K-Means clustering of GPS locations.

License: MIT License

Jupyter Notebook 100.00%
geospatial hypothesis-testing kmeans-clustering plotly visualization airbnb data-analysis eda

airbnb-data-analysis's Introduction

Airbnb Data Analysis

drawing

Project Overview

This project, conducted on Google Colab, analyzes a dataset focusing on Airbnb listings. Key steps include data cleaning, visualization, hypothesis testing, and K-Means clustering of GPS locations.

Dataset Details:

  • Source: Inside Airbnb
  • File: listings.csv.gz
  • Location: Rio de Janeiro
  • Date Range: January 2020

Description:

The dataset contains detailed information about Airbnb listings in Rio de Janeiro, including attributes such as host information, property type, pricing, reviews, and geographical coordinates. By analyzing this dataset, insights into the Airbnb market dynamics and host-guest interactions in Rio de Janeiro can be gained.

Key Insights

After analyzing the Airbnb dataset, several key insights were uncovered:

  1. Price Distribution: The histogram of prices revealed that most listings fall within the range of $40 to $830, with a few outliers priced higher.

  2. Location Analysis: The correlation heatmap and geographical visualization highlighted that certain regions, like Rio de Janeiro, tend to have higher-priced listings.

  3. Outlier Detection: Utilizing the interquartile range method, outliers in attributes like price and host listing count were identified and addressed, improving the dataset's reliability.

  4. Hypothesis Testing: Statistical tests revealed significant relationships between various attributes.

    i. The result of the t-test comparing the "price" and "beds" suggests that there is a statistically significant difference between these two attributes. This indicates that there is strong evidence to suggest that there is indeed a relation between the price and the number of beds in the dataset.

    ii. The p-value (3.66e-250) of result of the t-test comparing the "price" and "host_total_listings_count" is far smaller than the typical significance level of 0.05, we reject the null hypothesis. This indicates that there is strong evidence to suggest that there is indeed a relation between the price and the host listing count in the dataset.

  5. K-Means Clustering: Applying K-Means clustering to GPS locations helped identify spatial patterns in listing distribution, facilitating further analysis of regional dynamics. We see from teh elbow curve that the optimal numebr of clusters is 4 indicating that there are 4 areas that define the Rio de Janerio airbnb space.

  6. Accomodatiion Type : The most popular airbnb room type in Rio de Janeiro are "Entire home/apt" and next most popular are the private rooms where as hotel rooms are the least popular.

These insights provide valuable information for understanding the Airbnb market and can guide decision-making processes for both hosts and guests.

airbnb-data-analysis's People

Contributors

avani1998 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.