bhavanachitragar / ipl-data-analysis-project-using-apache-spark-on-databricks

This project focuses on performing an end-to-end analysis of IPL data using Apache Spark on Databricks. It begins with setting up a Databricks environment, followed by ingesting and exploring the IPL dataset.

Home Page: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/19652298897236/1310443996444177/4655662666255799/latest.html

Jupyter Notebook 100.00%
apache-spark aws-s3 databricks-notebooks python


IPL Data Analysis Using Apache Spark on Databricks



This project performs an end-to-end analysis of Indian Premier League (IPL) data using Apache Spark on Databricks. It begins with setting up a Databricks environment, then ingests and explores the IPL dataset. The project also optimizes Spark queries to handle large datasets efficiently, using Databricks’ distributed computing capabilities. Finally, the results are visualized with Databricks notebooks and integrated tools as interactive dashboards or reports, which are intended to give stakeholders actionable insights.

Steps Involved:

1. Cleaning Data

  • Handling missing values.
  • Changing data types as needed.
  • Aggregating total and average runs scored in each match and inning.
  • Filtering to include only valid deliveries.
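
The cleaning steps above can be sketched as a single SQL statement. This is a minimal illustration, not the project's notebook code: the table name `ball_by_ball`, its columns (`runs_scored`, `wides`, `noballs`), the sample rows, and the rule that a valid delivery is one with no wide and no no-ball are all assumptions. To keep the sketch runnable without a cluster, it uses Python's built-in `sqlite3` as a stand-in for a Spark table.

```python
import sqlite3

# Hypothetical ball-by-ball rows: (match_id, inning, runs_scored, wides, noballs).
# runs_scored arrives as text, with None standing in for a missing value.
rows = [
    (1, 1, "4", 0, 0),
    (1, 1, None, 0, 0),  # missing value, treated as 0 runs
    (1, 1, "1", 1, 0),   # wide: assumed not to be a valid delivery
    (1, 2, "6", 0, 0),
    (1, 2, "2", 0, 0),
    (1, 2, "0", 0, 1),   # no-ball: assumed not to be a valid delivery
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ball_by_ball "
    "(match_id INTEGER, inning INTEGER, runs_scored TEXT, wides INTEGER, noballs INTEGER)"
)
conn.executemany("INSERT INTO ball_by_ball VALUES (?, ?, ?, ?, ?)", rows)

# COALESCE fills missing values, CAST changes the data type,
# WHERE keeps valid deliveries, GROUP BY aggregates per match and inning.
query = """
SELECT match_id,
       inning,
       SUM(CAST(COALESCE(runs_scored, '0') AS INTEGER)) AS total_runs,
       AVG(CAST(COALESCE(runs_scored, '0') AS INTEGER)) AS avg_runs
FROM ball_by_ball
WHERE wides = 0 AND noballs = 0
GROUP BY match_id, inning
ORDER BY match_id, inning
"""
runs_per_inning = conn.execute(query).fetchall()
print(runs_per_inning)
```

The statement relies only on `COALESCE`, `CAST`, `WHERE`, and `GROUP BY`, which Spark SQL also supports, so on Databricks a similar query could be passed to `spark.sql` after registering the DataFrame as a temporary view.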

2. Data Analysis

  • Using Apache Spark for comprehensive data analysis, with Databricks providing efficient data processing.
  • Analyzing key metrics such as player performance, match statistics, and team trends over different IPL seasons.
  • Using Spark to calculate additional metrics such as average scores, win rates, and player consistency across seasons.
  • Using Databricks’ integrated tools to visualize the data, making complex patterns easier to interpret.
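
As one example of the metrics listed above, a per-season win rate could be computed roughly as follows. This is a sketch under stated assumptions: the `matches` table, its columns (`season`, `team1`, `team2`, `match_winner`), and the placeholder team names are hypothetical, and an in-memory SQLite database stands in for a Spark table so the example runs on its own.

```python
import sqlite3

# Hypothetical match rows: (season, team1, team2, match_winner).
match_rows = [
    (2019, "A", "B", "A"),
    (2019, "A", "C", "C"),
    (2019, "B", "C", "C"),
    (2020, "A", "B", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (season INTEGER, team1 TEXT, team2 TEXT, match_winner TEXT)"
)
conn.executemany("INSERT INTO matches VALUES (?, ?, ?, ?)", match_rows)

# Each match contributes one appearance per side; the win rate is the
# share of a team's appearances in a season that it went on to win.
query = """
SELECT season,
       team,
       COUNT(*) AS played,
       SUM(CASE WHEN team = match_winner THEN 1 ELSE 0 END) AS wins,
       AVG(CASE WHEN team = match_winner THEN 1.0 ELSE 0.0 END) AS win_rate
FROM (
    SELECT season, team1 AS team, match_winner FROM matches
    UNION ALL
    SELECT season, team2 AS team, match_winner FROM matches
) AS appearances
GROUP BY season, team
ORDER BY season, team
"""
win_rates = conn.execute(query).fetchall()
for row in win_rates:
    print(row)
```

Stacking `team1` and `team2` with `UNION ALL` avoids writing separate aggregations for the two columns.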

3. Insights:

Team Performance After Winning the Toss:
  • Chennai Super Kings (CSK) has the highest number of wins after winning the toss, followed by Mumbai Indians and Kolkata Knight Riders.
  • Overall, there is a noticeable correlation between winning the toss and securing a win, with some teams taking better advantage of this than others.
Average Runs Scored by Batsmen in Winning Matches:
  • Rashid Khan stands out with the highest average runs scored in matches that his team won, significantly ahead of others.
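
A metric of this kind could be derived along the following lines. The sketch assumes that "average runs" means a batsman's runs per match, averaged over the matches his team won; the README does not state the exact definition. The `batting` and `matches` tables, their columns, and the placeholder players are hypothetical, and SQLite is used only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (match_id INTEGER, match_winner TEXT)")
conn.execute(
    "CREATE TABLE batting "
    "(match_id INTEGER, batsman TEXT, batting_team TEXT, runs_scored INTEGER)"
)

# Hypothetical data: one batting row per ball faced.
conn.executemany("INSERT INTO matches VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "A")])
conn.executemany(
    "INSERT INTO batting VALUES (?, ?, ?, ?)",
    [
        (1, "P1", "A", 4), (1, "P1", "A", 6), (1, "P2", "B", 6),
        (2, "P1", "A", 6), (2, "P2", "B", 2),
        (3, "P1", "A", 2), (3, "P2", "B", 4),
    ],
)

# Inner query: runs per batsman per match, restricted to matches his team won.
# Outer query: average of those per-match totals.
query = """
SELECT batsman, AVG(match_runs) AS avg_runs_in_wins
FROM (
    SELECT b.batsman AS batsman, b.match_id AS match_id, SUM(b.runs_scored) AS match_runs
    FROM batting AS b
    JOIN matches AS m ON m.match_id = b.match_id
    WHERE m.match_winner = b.batting_team
    GROUP BY b.batsman, b.match_id
) AS per_match
GROUP BY batsman
ORDER BY avg_runs_in_wins DESC, batsman
"""
avg_runs_in_wins = conn.execute(query).fetchall()
print(avg_runs_in_wins)
```

Averages like this are sensitive to sample size: a player with very few innings in winning matches can top the ranking, so a minimum-innings filter may be worth adding.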
Top Venues for High Scores:
  • OUTsurance Oval and Buffalo Park are the venues with the highest average scores, suggesting they may be favorable for batting.
  • Other high-scoring venues include Sheikh Zayed Stadium and Subrata Roy Sahara Stadium.
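
A venue ranking like this one could be produced as sketched below. It assumes "average score" means total runs per match, averaged over the matches played at each venue; the table and column names and the placeholder venues are hypothetical, and SQLite stands in for Spark to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (match_id INTEGER, venue TEXT)")
conn.execute("CREATE TABLE ball_by_ball (match_id INTEGER, runs_scored INTEGER)")

# Hypothetical data: three matches across two venues.
conn.executemany("INSERT INTO matches VALUES (?, ?)", [(1, "V1"), (2, "V1"), (3, "V2")])
conn.executemany(
    "INSERT INTO ball_by_ball VALUES (?, ?)",
    [(1, 4), (1, 6), (2, 2), (2, 4), (3, 6), (3, 6), (3, 1)],
)

# Inner query: total runs per match. Outer query: average of those totals per venue.
query = """
SELECT venue, AVG(match_total) AS avg_score
FROM (
    SELECT m.venue AS venue, b.match_id AS match_id, SUM(b.runs_scored) AS match_total
    FROM ball_by_ball AS b
    JOIN matches AS m ON m.match_id = b.match_id
    GROUP BY m.venue, b.match_id
) AS per_match
GROUP BY venue
ORDER BY avg_score DESC, venue
"""
venue_scores = conn.execute(query).fetchall()
print(venue_scores)
```

Adding a match count per venue to the output would show whether a high average rests on only a handful of games.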
Most Frequent Dismissal Types:
  • "Caught" is the most common conventional dismissal, followed by "Bowled" and "Run Out."
  • Dismissal types like "Obstructing the field" and "Retired hurt" are the least frequent.
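
Dismissal frequencies come down to a grouped count. In this sketch the `out_type` column name and the sample values are assumptions, with `None` marking deliveries on which no wicket fell; SQLite is used as a self-contained stand-in for a Spark table.

```python
import sqlite3

# Hypothetical out_type values, one per delivery; None means no dismissal.
out_types = ["caught", "bowled", None, "caught", "run out", "caught", "bowled", None]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ball_by_ball (out_type TEXT)")
conn.executemany("INSERT INTO ball_by_ball VALUES (?)", [(t,) for t in out_types])

# Count each dismissal type, ignoring deliveries without a dismissal.
query = """
SELECT out_type, COUNT(*) AS frequency
FROM ball_by_ball
WHERE out_type IS NOT NULL
GROUP BY out_type
ORDER BY frequency DESC, out_type
"""
dismissal_counts = conn.execute(query).fetchall()
print(dismissal_counts)
```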
Team Performance After Winning the Toss (Top and Lower Performers):
Top Performers:
  • Chennai Super Kings and Mumbai Indians have the highest number of wins after winning the toss, indicating a strong correlation between winning the toss and match performance for these teams.
  • Kolkata Knight Riders and Royal Challengers Bangalore also have a significant number of wins after winning the toss.
Lower Performers:
  • Teams like Pune Warriors and Kochi Tuskers Kerala have the fewest wins after winning the toss, suggesting that winning the toss has not been as beneficial for them.
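
The toss comparison can be sketched with one grouped query. The `matches` table, its `toss_winner` and `match_winner` columns, and the placeholder teams are hypothetical, and SQLite again stands in for Spark. Alongside the raw win count, the sketch also reports how many tosses each team won, which is an addition made here for context rather than something the README describes.

```python
import sqlite3

# Hypothetical rows: (toss_winner, match_winner).
toss_rows = [("A", "A"), ("A", "B"), ("A", "A"), ("B", "B"), ("C", "A")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (toss_winner TEXT, match_winner TEXT)")
conn.executemany("INSERT INTO matches VALUES (?, ?)", toss_rows)

# For each team: tosses won, and how many of those matches it also won.
query = """
SELECT toss_winner AS team,
       COUNT(*) AS tosses_won,
       SUM(CASE WHEN match_winner = toss_winner THEN 1 ELSE 0 END) AS wins_after_toss
FROM matches
GROUP BY toss_winner
ORDER BY wins_after_toss DESC, team
"""
toss_results = conn.execute(query).fetchall()
print(toss_results)
```

Raw win counts favor teams that have played more seasons, so dividing `wins_after_toss` by `tosses_won` gives a fairer basis for comparing how much each team benefits from the toss.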

Snapshots:

Screenshot 2024-08-08 180103

Screenshot 2024-08-08 180123

Screenshot 2024-08-08 180739


Credits: Darshil Parmar
