This project performs an end-to-end analysis of IPL data using Apache Spark on Databricks. It begins with setting up a Databricks environment, followed by ingesting and exploring the IPL dataset. The project also involves optimizing Spark queries to handle large datasets efficiently, leveraging Databricks’ capabilities for distributed computing. Finally, the results are visualized using Databricks notebooks and integrated tools, creating interactive dashboards or reports. These visualizations are intended to provide stakeholders with actionable insights.
- Handling missing values.
- Changing data types as needed.
- Aggregating total and average runs scored in each match and innings.
- Filtering to include only valid deliveries.
- Using Apache Spark to perform comprehensive data analysis, leveraging Databricks for efficient data processing.
- Analyzing key metrics such as player performance, match statistics, and team trends over different IPL seasons.
- Employing Spark's capabilities to calculate additional metrics like average scores, win rates, and player consistency across seasons.
- Using Databricks’ integrated tools to visualize data, making it easier to interpret complex patterns.
- Chennai Super Kings (CSK) has the highest number of wins after winning the toss, followed by Mumbai Indians and Kolkata Knight Riders.
- Overall, there is a noticeable correlation between winning the toss and securing a win, with some teams taking better advantage of this than others.
- Rashid Khan stands out with the highest average runs scored in matches that his team won, significantly ahead of others.
- OUTsurance Oval and Buffalo Park are the venues with the highest average scores, suggesting they may be favorable for batting.
- Other high-scoring venues include Sheikh Zayed Stadium and Subrata Roy Sahara Stadium.
- "Caught" is the most common conventional dismissal, followed by "Bowled" and "Run Out."
- Dismissal types like "Obstructing the field" and "Retired hurt" are the least frequent.
- Chennai Super Kings and Mumbai Indians have the highest number of wins after winning the toss, indicating a strong correlation between winning the toss and match performance for these teams.
- Kolkata Knight Riders and Royal Challengers Bangalore also have a significant number of wins after winning the toss.
- Teams like Pune Warriors and Kochi Tuskers Kerala have the fewest wins after winning the toss, suggesting that winning the toss has not been as beneficial for them.
Credits: Darshil Parmar