mareoraft / tennis

Tennis stats capstone project for The Data Incubator

Home Page: http://learnnation.org/tennis.html

Python 40.15% Dockerfile 4.68% JavaScript 45.54% HTML 2.71% CSS 3.15% SCSS 3.76%
tennis

tennis's Introduction

Tennis Capstone Project

This is the repo for my Tennis Capstone Project for The Data Incubator. The actual deployed web app is here. Both the frontend/ and backend/ directories contain README files for developers. THIS README describes the project itself.

Business Objective

Bring insightful tennis stats to tennis fans.

Tennis is the 4th most popular sport in the world [1]. The objective is to bring valuable information and insights about professional players to fans.

Information is valuable. We will provide both data and predictions to people through interactive visualizations. In addition to tennis hobbyists, people who gamble on tennis would find the info particularly beneficial. Professional tennis players themselves would find it helpful for identifying weaknesses to improve in their own game or to exploit in an opponent's.

The web app could be monetized by offering basic features for free and charging for advanced features. For example, the basic version may only allow you to compare at most 5 players, but the advanced version may allow you to compare 14.

Data Ingestion

Data will be combined, processed, and updated periodically.

The data comes from two CSV files that are posted at [2]. I plan to add match-level stats in the future which will require additional data from [2], [3], [4], [5], or [6].

The data is loaded with pandas, whittled down, combined, and processed into the information we need. In particular, text-splitting and regular expressions are used to pull player info out of 1 column here; maps are used to create new columns from existing column combinations; and then data is aggregated per-player here. For the PageRank algorithm (see code here), point result information is aggregated per player-pair and a weighted directed graph is created. NetworkX then computes the PageRank.
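
As a rough illustration of this cleaning step (the column names below are hypothetical; the real CSVs at [2] have their own schema), the pandas side might look like:

import pandas as pd

# toy stand-in for the raw CSV rows
df = pd.DataFrame({
    'server_info': ['Federer R. (SUI)', 'Karlovic I. (CRO)'],
    'aces': [11, 23],
    'double_faults': [2, 3],
    'serve_points': [80, 95],
})

# text-splitting / regex: pull player name and country out of one column
extracted = df['server_info'].str.extract(r'^(?P<player>.+?) \((?P<country>[A-Z]{3})\)$')
df = pd.concat([df, extracted], axis=1)

# map existing columns into a new one (e.g. serve points that were not aces)
df['non_ace_serve_points'] = df['serve_points'] - df['aces']

# aggregate per player
per_player = df.groupby('player')[['aces', 'double_faults', 'serve_points']].sum()
print(per_player)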

The ingestion pipeline is fully automated (it is enough to run this function) and I plan to rerun it periodically on the latest-and-greatest professional tennis data (the source data is updated every few months).

Visualizations

The project contains a bar chart which is used for both the stats comparisons and the PageRank comparison. There are six controls for interacting with the data, in addition to the zoom interactivity of the amChart itself.

Interactive Website

Users interact with the project via a website. Users explore the data by choosing a (1) statistic, (2) normalization, (3) gender, and some other options. Users can click on info buttons to get explanations of the various choices and methods used to compute the data.

The user interactivity is client-side, and the client will make calls to the server to update the data as necessary. Tools used to achieve this include JavaScript, React, Material-UI, amCharts, Python 3, Flask, Pandas, and Networkx.
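
For concreteness, here is a hedged sketch of the server side of that interaction (the /stats route and its parameters are hypothetical, not the app's actual API):

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/stats')
def stats():
    # hypothetical query parameters mirroring the front-end controls
    stat = request.args.get('stat', 'aces')
    normalization = request.args.get('normalization', 'percent')
    gender = request.args.get('gender', 'men')
    # in the real app this would be computed by the pandas/NetworkX pipeline
    data = [{'player': 'Ivo Karlovic', 'value': 13.5}]
    return jsonify(stat=stat, normalization=normalization, gender=gender, data=data)

The React client then fetches this JSON and hands it to the amCharts bar chart.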

Analysis and Results

The statistics are calculated as follows:

  • Points won: the number of points a player has won. When normalized by percentage, it is divided by the number of points they have played.
  • Service points won: the number of points a player won when serving. As a percentage, the denominator is the total number of service points they played.
  • Aces: the number of aces a player hit. As a percentage, the denominator is the number of service points they played.
  • Double faults: the number of double faults the player had. As a percentage, the denominator is the number of service points they played.

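For example, with made-up career totals for a single player, the percent normalizations above reduce to:

# hypothetical career totals for one player
points_played = 2000
points_won = 1090
service_points_played = 1000
service_points_won = 640
aces = 97
double_faults = 40

points_won_pct     = 100 * points_won / points_played                     # 54.5%
service_points_pct = 100 * service_points_won / service_points_played     # 64.0%
ace_pct            = 100 * aces / service_points_played                   # 9.7%
double_fault_pct   = 100 * double_faults / service_points_played          # 4.0%
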
The GOAT algorithm is the Google PageRank algorithm applied to the following graph definition: Each player is represented by exactly 1 node. If A and B are nodes, then the directed edge (A, B) has an integer weight which is the number of points that player A lost to player B.
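
A minimal sketch of that graph with made-up point counts (the player names and numbers are illustrative only):

import networkx as nx

# edge (A, B) carries the number of points A lost to B
G = nx.DiGraph()
G.add_weighted_edges_from([
    ('Nadal', 'Federer', 1200),     # Nadal lost 1200 points to Federer
    ('Federer', 'Nadal', 1150),
    ('Djokovic', 'Federer', 900),
    ('Federer', 'Djokovic', 870),
])

# PageRank over the weighted directed graph; a high score means a player
# takes many points off opponents who themselves take many points
scores = nx.pagerank(G, weight='weight')
print(sorted(scores.items(), key=lambda kv: -kv[1]))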

The time decay normalization is the same as the percentage normalization with the following difference: more recent points are weighted more heavily than points that happened a long time ago. We use a 1-year half-life exponential decay function, so that a point that occurred 1 year ago is only worth half as much as a point that happened today. In the percent normalization, a single point contributes 1 to the denominator and either 1 or 0 to the numerator. In the time decay normalization, a single point that occurred y years ago contributes (1/2)^y to the denominator and either (1/2)^y or 0 to the numerator.
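
A minimal sketch of that weight, assuming the 1-year half-life (the dates are arbitrary):

from datetime import date

def point_weight(point_date, today):
    """Weight of a single point under a 1-year half-life decay."""
    years_ago = (today - point_date).days / 365.25
    return 0.5 ** years_ago

# a point from one year ago counts about half as much as a point from today
print(point_weight(date(2019, 6, 1), date(2020, 6, 1)))  # ~0.5
print(point_weight(date(2020, 6, 1), date(2020, 6, 1)))  # 1.0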

The following are some selected results from the analysis:

stat             aces                    double-faults           points-won             The GOAT Algorithm
normalization    percent                 percent                 percent                raw count
#1 player        Ivo Karlovic 13.5%      Goran Ivanisevic 4.1%   Evgeny Donskoy 55.7%   Roger Federer 4.5%
#2 player        Goran Ivanisevic 9.7%   Noah Rubin 4.0%         Thomas Muster 54.6%    Rafael Nadal 3.1%
#3 player        John Isner 9.7%         Matthew Ebden 4.0%      Igor Sijsling 54.5%    Novak Djokovic 2.7%

tennis's People

Contributors

mareoraft


tennis's Issues

add **time-decay** normalization

  • 1. Add a date column to df

  • 2. Add a weight column to df, or use a function in aggregation (weight = (1/2)^(today - date))

  • 3. Do an agg where you multiply things by weights before summing (see the sketch below)
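
A sketch of those three steps on a toy DataFrame (the column names are hypothetical, not the repo's actual schema):

import pandas as pd

df = pd.DataFrame({
    'player': ['A', 'A', 'B'],
    'date': pd.to_datetime(['2019-01-01', '2020-01-01', '2020-01-01']),   # step 1
    'won': [1, 0, 1],   # 1 if the player won the point
})

today = pd.Timestamp('2020-06-01')
years_ago = (today - df['date']).dt.days / 365.25
df['weight'] = 0.5 ** years_ago   # step 2

# step 3: multiply by weights before summing, per player
decayed_pct = df.groupby('player').apply(
    lambda g: 100 * (g['won'] * g['weight']).sum() / g['weight'].sum()
)
print(decayed_pct)   # time-decayed "points won" percentage per player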

MemoryError backend

The backend container in PROD hits a MemoryError after switching a few times between different dropdown options on the website. Actually, the FIRST error I see is:

DAMN ! worker 1 (pid: 13) died, killed by signal 9 :( trying respawn ...
Respawned uWSGI worker 1 (new pid: 15)
  • is it taking up more memory for each selection?
  • are we caching something too big?
  • is the server itself out of memory? --> well, I freed up at least 1.3 GB (maybe as much as 2.6 GB), and the issue persists

PROD flask server

We have yet to use a trustworthy server to serve the backend in PRODUCTION.

For development we are using the built-in Flask development server; hence we kick it off with python3 main.py.

For prod I recall trying gunicorn but running into an issue. Let's try something else, and come back to gunicorn if that doesn't work.

CORS error

Below is a copy of my SO question that I'm NOT posting, because switching to a production Flask server and using http://162.243.168.182:5001 instead of http://clementine:5001 appears to fix the CORS error.

subject:
Why is Flask-Cors not detecting my Cross-Origin domain in production?

body:
My website has a separate server for the front-end and back-end, and so my back-end server needs to open up CORS permissions so that the front-end can request data from it.

I am using Flask-Cors successfully in development, but it doesn't work when I deploy to production. (please note that I have looked at other flask-cors questions on SO, but none of them fit my situation)

Here is the relevant code that is working in development:

# 3rd party imports
import flask
from flask import Flask, request, redirect, send_from_directory, jsonify
from flask_cors import CORS

# Create the app
app = Flask(__name__)
CORS(app, origins=[
  'http://localhost:5001',
])

# Define the routes
@app.route('/')
def index():
  # no CORS code was necessary here
  app.logger.info(f'request is: {flask.request}')
  # placeholder response so the view returns something
  return jsonify(status='ok')

What I've tried:

  • Adding my server's IP address 'http://162.243.168.182:5001' to the CORS list is not enough to resolve the issue, although I understand it should be there.
  • It seems that using '*' to allow ALL origins does not work either. (very suspicious!)

Please note that I am using a Docker container, so my environments in development and prod are almost identical. What's different is that I'm on a different server and I've modified the front-end to send the request to the new IP address (resulting in the famous missing “Access-Control-Allow-Origin” header CORS error).

Now I'm wondering if the flask.request object is somehow missing information, and this causes Flask-Cors to not send the Access-Control-Allow-Origin header like it's supposed to. I can provide that logging info if you think it would help!

add SQL Database

Let's make an SQL DB (maybe SQLite) and populate the stats into it, then use SQLAlchemy or pyodbc to pull the data from the database and hand it over.
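
A sketch of one way to do this with SQLite and pandas (the table and column names are made up; the same to_sql/read_sql calls also accept an SQLAlchemy engine):

import sqlite3
import pandas as pd

# pretend this came out of the ingestion pipeline
stats = pd.DataFrame({
    'player': ['Ivo Karlovic', 'Goran Ivanisevic'],
    'ace_pct': [13.5, 9.7],
})

with sqlite3.connect('tennis.db') as conn:
    # populate the stats into the DB ...
    stats.to_sql('player_stats', conn, if_exists='replace', index=False)
    # ... and pull them back out for the API layer
    out = pd.read_sql('SELECT player, ace_pct FROM player_stats ORDER BY ace_pct DESC', conn)

print(out)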

support ngrams

Add a feature to convert CDR3 sequences into overlapping n-gram sequences before comparison, so that the n-gram sequences are compared instead of the CDR3 sequences.

Example:

CDR3 seq     ->    n-gram seq
A,L,P        ->    A,L,P      (n=1)
             ->    AL,LP      (n=2)
             ->    ALP        (n=3)
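
A minimal sketch of that conversion (the function name is hypothetical):

def to_ngrams(seq, n):
    """Convert a sequence of residues into overlapping n-grams."""
    return [''.join(seq[i:i + n]) for i in range(len(seq) - n + 1)]

cdr3 = ['A', 'L', 'P']
print(to_ngrams(cdr3, 1))  # ['A', 'L', 'P']
print(to_ngrams(cdr3, 2))  # ['AL', 'LP']
print(to_ngrams(cdr3, 3))  # ['ALP']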
