warrierrajeev / ufc-predictions
A web app to predict UFC fights
Home Page: https://ufc-predictions.rajeevwarrier.com
Two CSV files are referenced in src/app/app.py, and I have no clue what they should be: https://github.com/WarrierRajeev/UFC-Predictions/blob/master/src/app/app.py#L22
I have successfully scraped all the data by running python -m src.create_ufc_data. I get the following files:
UFC-Predictions/data$ tree .
.
├── data.csv
├── event_and_fight_links.pickle
├── past_event_links.pickle
├── past_fighter_links.pickle
├── preprocessed_data.csv
├── raw_fighter_details.csv
├── raw_total_fight_data.csv
└── scraped_fighter_data_dict.pickle
0 directories, 8 files
Next, I tried running the web app using its Dockerfile:
docker build -t ufc/ufc-predictions .
docker run --env PORT=8765 ufc/ufc-predictions
I get an error message which ends with:
[2022-07-02 17:47:39 +0000] [10] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
worker.init_process()
File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/base.py", line 134, in init_process
self.load_wsgi()
File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
self.wsgi = self.app.wsgi()
File "/usr/local/lib/python3.6/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/usr/local/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
return self.load_wsgiapp()
File "/usr/local/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
return util.import_app(self.app_uri)
File "/usr/local/lib/python3.6/site-packages/gunicorn/util.py", line 359, in import_app
mod = importlib.import_module(module)
File "/usr/local/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/app/app.py", line 22, in <module>
fighter_df = pd.read_csv("app_data/latest_fighter_stats.csv", index_col="index")
File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
self._make_engine(self.engine)
File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 2010, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: 'app_data/latest_fighter_stats.csv'
[2022-07-02 17:47:39 +0000] [10] [INFO] Worker exiting (pid: 10)
[2022-07-02 17:47:39 +0000] [7] [INFO] Shutting down: Master
[2022-07-02 17:47:39 +0000] [7] [INFO] Reason: Worker failed to boot.
Scraping can stop partway through for various reasons. Add checkpoints/saves during scraping so that an interrupted run resumes automatically from the last checkpoint.
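A resumable scrape could be sketched roughly as below. The checkpoint path, helper names, and state layout are hypothetical, not the repo's actual format; the repo already pickles intermediate link lists, so this would mostly formalize that.

```python
import pickle
from pathlib import Path

def load_checkpoint(path):
    """Return previously saved progress, or a fresh state if no checkpoint exists."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    return {"scraped": {}, "remaining": []}

def save_checkpoint(path, state):
    """Persist progress so an interrupted scrape can pick up where it left off."""
    with Path(path).open("wb") as f:
        pickle.dump(state, f)
```

The scraper would call save_checkpoint after each event page and load_checkpoint on startup, skipping anything already in the saved state.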
Hello,
Fantastic work! Is it possible to upload the preprocessed database you used? I want to use TPOT to see whether there is a better ML pipeline. If I succeed in getting a better algo I will post it here :)
Currently, there are scripts to only scrape and download data. The pre-processing and formatting of the data is done in notebooks. Create scripts for these so they can be done simply using a single command.
Hey @WarrierRajeev, really interesting topic and awesome approach!
I was able to successfully execute the "python -m src.create_ufc_data" command (grabbing, processing, and saving the data) a handful of times.
However, I am now getting a series of errors, starting with attribute errors as well as key errors.
Curious if something changed in the structure of the data being used, causing these errors.
Hi @WarrierRajeev!
First, I'd like to say: great repo! I was considering scraping the data myself, but then stumbled across your repository and my headache went away :D
Issue:
There seems to be something wrong with the Heroku hosting
I'm not sure if this is relevant to you, but in case it is, here you go!
Hello there,
I think I increased prediction accuracy (using 80%-20% split) ever so slightly using TPOT (no oversampling applied yet).
try this:
from copy import copy

from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer, MaxAbsScaler
from tpot.builtins import StackingEstimator, ZeroCount
from tpot.export_utils import set_param_recursive
from xgboost import XGBClassifier

# Average CV score on the training set was: 0.6958245897228948
exported_pipeline = make_pipeline(
    make_union(
        make_pipeline(
            StackingEstimator(estimator=BernoulliNB(alpha=0.001, fit_prior=False)),
            ZeroCount()
        ),
        FunctionTransformer(copy)
    ),
    StackingEstimator(estimator=SGDClassifier(alpha=0.01, eta0=0.1, fit_intercept=False, l1_ratio=0.0, learning_rate="constant", loss="perceptron", penalty="elasticnet", power_t=0.1)),
    MaxAbsScaler(),
    XGBClassifier(learning_rate=0.1, max_depth=2, min_child_weight=19, n_estimators=100, n_jobs=1, subsample=0.4, verbosity=0)
)

# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 2)

exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
KeyError when running create_ufc_data
Also remove data from app/app_data
Hi,
How is src\app\app_data\latest_fighter_stats.csv generated? Thanks.
Hi,
I'm new to GitHub and Python. I tried to run
python -m src.create_ufc_data
from the root folder to scrape fresh data last week, and it worked successfully. But when I tried this week, it seems to scrape everything successfully, and then during processing I get this error:
'float' object has no attribute 'split'
Here is more detail:
Getting fighter urls
Getting fighter names and details
Scraping all fighter names and links:
Progress: |██████████████████████████████████████████████████| 100.00% Complete
No new fighter data to scrape at the moment, loaded existing data from C:\Users<username>...\UFC-Predictions-master\data\fighter_details.csv.
elapsed seconds = 19.22
Starting Preprocessing
Reading Files
Drop columns that contain information not yet occurred
Renaming Columns
Traceback (most recent call last):
File "C:\Users<username>\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users<username>\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users<username>\Documents\repos\mma\UFC-Predictions-master\src\create_ufc_data.py", line 21, in <module>
preprocessor.process_raw_data() # Preprocesses the raw data and saves the csv files in data folder
File "C:\Users<username>\Documents\repos\mma\UFC-Predictions-master\src\createdata\preprocess.py", line 34, in process_raw_data
self._rename_columns()
File "C:\Users<username>\Documents\repos\mma\UFC-Predictions-master\src\createdata\preprocess.py", line 115, in _rename_columns
self.fights[column + attempt_suffix] = self.fights[column].apply(
File "C:\Users<username>\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\series.py", line 4430, in apply
return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
File "C:\Users<username>\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\apply.py", line 1082, in apply
return self.apply_standard()
File "C:\Users<username>\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\apply.py", line 1137, in apply_standard
mapped = lib.map_infer(
File "pandas\_libs\lib.pyx", line 2870, in pandas._libs.lib.map_infer
File "C:\Users<username>\Documents\repos\mma\UFC-Predictions-master\src\createdata\preprocess.py", line 116, in <lambda>
lambda X: int(X.split("of")[1])
AttributeError: 'float' object has no attribute 'split'
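The crash happens because a missing stat comes through as NaN, which is a float and has no .split method. A hedged sketch of a NaN-safe guard (attempts is a hypothetical helper, not the repo's actual code, and treating a missing stat as zero is an assumption):

```python
import pandas as pd

def attempts(value):
    # Raw fight stats look like "12 of 34"; missing ones arrive as NaN (a float)
    if pd.isna(value):
        return 0  # assumption: treat a missing stat as zero attempts
    return int(value.split("of")[1])
```

Applying something like this in place of the bare lambda in preprocess.py would let the pipeline survive rows with missing stats.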
Make it easier to setup requirements to run the code
src.createdata.preprocess_fighter_data.py emits a repeated but non-fatal warning:
FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead
Documentation on deprecation:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.append.html
https://pandas.pydata.org/docs/whatsnew/v1.4.0.html#whatsnew-140-deprecations-frame-series-append
As the title suggests. Reading through the source code on how to make predictions once the model is built, I am confused about where red and blue are being passed into update_proba(). Searching for update_proba() only shows one instance of it, which is just the definition. Any insight would be appreciated. Thank you.
Firstly, thank you for your incredible project.
When I use data from Kaggle and run this cell from the preprocessing notebook:

pct_columns = ['Str_Acc', 'Str_Def', 'TD_Acc', 'TD_Def']

def pct_to_frac(X):
    # Note: `X != np.NaN` is always True, since NaN never compares equal;
    # a robust check would be `pd.isna(X)`
    if X != np.NaN:
        return float(X.replace('%', '')) / 100
    else:
        return 0

for column in pct_columns:
    fighter_details[column] = fighter_details[column].apply(pct_to_frac)
I get this error:
KeyError: 'Str_Acc'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 'Str_Acc'
It looks like raw_fighter_details.csv has some missing columns; can you provide feedback?
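One way to make that failure explicit is to check which of the expected columns actually exist before applying the conversion. A sketch with a toy stand-in DataFrame (not the real file):

```python
import pandas as pd

pct_columns = ['Str_Acc', 'Str_Def', 'TD_Acc', 'TD_Def']

# Toy stand-in for raw_fighter_details.csv with two of the four columns missing
fighter_details = pd.DataFrame({'Str_Acc': ['59%'], 'TD_Acc': ['40%']})

present = [c for c in pct_columns if c in fighter_details.columns]
missing = [c for c in pct_columns if c not in fighter_details.columns]
```

Reporting missing up front turns a cryptic KeyError into a clear message about which columns the CSV lacks.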
REV in the columns means reversals (https://www.foxsports.com/ufc/stats?weightclass=11&category=basic&sort=7).
Use https://docs.python.org/3/library/concurrent.futures.html to speed up web scraping.
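A rough sketch of what that could look like with a thread pool; fetch here is a placeholder for the real page-download function, and the URLs are made up:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
    # Placeholder: the real scraper would download and parse the page here
    return f"html for {url}"

urls = [f"http://ufcstats.example/fighter/{i}" for i in range(5)]

# Download several pages concurrently instead of one at a time
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(fetch, u): u for u in urls}
    pages = {futures[f]: f.result() for f in as_completed(futures)}
```

Threads suit this workload because scraping is I/O-bound; keep max_workers modest to stay polite to the source site.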
We create per-fight details using information up until that fight. Scraped fighter stats contain information that hasn't yet happened in most cases.
Currently, only a simple mean is taken. It would be better to use an exponential moving average of fighter stats, so that recent fights are given more importance.
Since EMAs give a higher weight to recent data than to older data, they are more responsive to the latest fight stats.
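With pandas this is essentially a one-liner via Series.ewm; a toy example (the strike counts are made up):

```python
import pandas as pd

# Significant-strike counts from a fighter's last five fights, oldest first (toy numbers)
strikes = pd.Series([40, 55, 30, 70, 90])

simple_mean = strikes.mean()                    # every fight weighted equally
ema = strikes.ewm(span=3, adjust=False).mean()  # recent fights count for more
latest_ema = ema.iloc[-1]
```

Because the fighter's recent outputs (70, 90) are higher than the older ones, the EMA ends up above the simple mean, reflecting current form.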