warrierrajeev / ufc-predictions
A web app to predict UFC fights
Home Page: https://ufc-predictions.rajeevwarrier.com
Two CSV files are referenced in src/app/app.py, and I have no clue what they should be: https://github.com/WarrierRajeev/UFC-Predictions/blob/master/src/app/app.py#L22
I have successfully scraped all the data by running python -m src.create_ufc_data. I get the following files:
UFC-Predictions/data$ tree .
.
├── data.csv
├── event_and_fight_links.pickle
├── past_event_links.pickle
├── past_fighter_links.pickle
├── preprocessed_data.csv
├── raw_fighter_details.csv
├── raw_total_fight_data.csv
└── scraped_fighter_data_dict.pickle
0 directories, 8 files
Next, I tried running the web app using its Dockerfile:
docker build -t ufc/ufc-predictions .
docker run --env PORT=8765 ufc/ufc-predictions
I get an error message which ends with:
[2022-07-02 17:47:39 +0000] [10] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
worker.init_process()
File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/base.py", line 134, in init_process
self.load_wsgi()
File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
self.wsgi = self.app.wsgi()
File "/usr/local/lib/python3.6/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/usr/local/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
return self.load_wsgiapp()
File "/usr/local/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
return util.import_app(self.app_uri)
File "/usr/local/lib/python3.6/site-packages/gunicorn/util.py", line 359, in import_app
mod = importlib.import_module(module)
File "/usr/local/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/app/app.py", line 22, in <module>
fighter_df = pd.read_csv("app_data/latest_fighter_stats.csv", index_col="index")
File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
self._make_engine(self.engine)
File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 2010, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: 'app_data/latest_fighter_stats.csv'
[2022-07-02 17:47:39 +0000] [10] [INFO] Worker exiting (pid: 10)
[2022-07-02 17:47:39 +0000] [7] [INFO] Shutting down: Master
[2022-07-02 17:47:39 +0000] [7] [INFO] Reason: Worker failed to boot.
Scraping can stop partway through for various reasons. Add checkpoints/saves during scraping so that an interrupted run resumes automatically from the last checkpoint.
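A resumable scrape could be sketched roughly as below. The checkpoint path, helper names, and state layout are hypothetical, not the repo's actual format; the repo already pickles intermediate link lists, so this would mostly formalize that.

```python
import pickle
from pathlib import Path

def load_checkpoint(path):
    """Return previously saved progress, or a fresh state if no checkpoint exists."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    return {"scraped": {}, "remaining": []}

def save_checkpoint(path, state):
    """Persist progress so an interrupted scrape can pick up where it left off."""
    with Path(path).open("wb") as f:
        pickle.dump(state, f)
```

The scraper would call save_checkpoint after each event page and load_checkpoint on startup, skipping anything already in the saved state.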
Hello,
Fantastic work! Is it possible to upload the preprocessed database you used? I want to use TPOT to see whether there is a better ML pipeline. If I succeed in getting a better algo I will post it here :)
Currently, there are scripts to only scrape and download data. The pre-processing and formatting of the data is done in notebooks. Create scripts for these so they can be done simply using a single command.
Hey @WarrierRajeev, really interesting topic and awesome approach!
I was able to successfully execute the "python -m src.create_ufc_data" command (grabbing, processing, and saving the data) a handful of times.
However, I am now getting a series of errors, starting with attribute errors as well as key errors.
Curious if something changed in the structure of the data being used, causing these errors.
Hi @WarrierRajeev!
First, I'd like to say: great repo! I was considering scraping the data myself, but then stumbled across your repository and my headache went away :D
Issue:
There seems to be something wrong with the Heroku hosting
I'm not sure if this is relevant to you, but in case it is, here you go!
Hello there,
I think I increased prediction accuracy (using 80%-20% split) ever so slightly using TPOT (no oversampling applied yet).
try this:
from copy import copy

from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer, MaxAbsScaler
from tpot.builtins import StackingEstimator, ZeroCount
from tpot.export_utils import set_param_recursive
from xgboost import XGBClassifier

# Average CV score on the training set was: 0.6958245897228948
exported_pipeline = make_pipeline(
    make_union(
        make_pipeline(
            StackingEstimator(estimator=BernoulliNB(alpha=0.001, fit_prior=False)),
            ZeroCount()
        ),
        FunctionTransformer(copy)
    ),
    StackingEstimator(estimator=SGDClassifier(alpha=0.01, eta0=0.1, fit_intercept=False, l1_ratio=0.0, learning_rate="constant", loss="perceptron", penalty="elasticnet", power_t=0.1)),
    MaxAbsScaler(),
    XGBClassifier(learning_rate=0.1, max_depth=2, min_child_weight=19, n_estimators=100, n_jobs=1, subsample=0.4, verbosity=0)
)

# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 2)

exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
KeyError when running create_ufc_data
Also remove data from app/app_data
Hi,
How is src\app\app_data\latest_fighter_stats.csv generated? Thanks.
Hi,
I'm new to GitHub and Python. I tried to run
python -m src.create_ufc_data
from the root folder to scrape fresh data last week, and it worked successfully. But when I tried this week, it seems to scrape everything successfully, and then during processing I get this error:
'float' object has no attribute 'split'
Here is more detail:
Getting fighter urls
Getting fighter names and details
Scraping all fighter names and links:
Progress: |██████████████████████████████████████████████████| 100.00% Complete
No new fighter data to scrape at the moment, loaded existing data from C:\Users<username>...\UFC-Predictions-master\data\fighter_details.csv.
elapsed seconds = 19.22
Starting Preprocessing
Reading Files
Drop columns that contain information not yet occurred
Renaming Columns
Traceback (most recent call last):
File "C:\Users<username>\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users<username>\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users<username>\Documents\repos\mma\UFC-Predictions-master\src\create_ufc_data.py", line 21, in <module>
preprocessor.process_raw_data() # Preprocesses the raw data and saves the csv files in data folder
File "C:\Users<username>\Documents\repos\mma\UFC-Predictions-master\src\createdata\preprocess.py", line 34, in process_raw_data
self._rename_columns()
File "C:\Users<username>\Documents\repos\mma\UFC-Predictions-master\src\createdata\preprocess.py", line 115, in _rename_columns
self.fights[column + attempt_suffix] = self.fights[column].apply(
File "C:\Users<username>\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\series.py", line 4430, in apply
return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
File "C:\Users<username>\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\apply.py", line 1082, in apply
return self.apply_standard()
File "C:\Users<username>\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\apply.py", line 1137, in apply_standard
mapped = lib.map_infer(
File "pandas\_libs\lib.pyx", line 2870, in pandas._libs.lib.map_infer
File "C:\Users<username>\Documents\repos\mma\UFC-Predictions-master\src\createdata\preprocess.py", line 116, in <lambda>
lambda X: int(X.split("of")[1])
AttributeError: 'float' object has no attribute 'split'
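The crash happens because a missing stat comes through as NaN, which is a float and has no .split method. A hedged sketch of a NaN-safe guard (attempts is a hypothetical helper, not the repo's actual code, and treating a missing stat as zero is an assumption):

```python
import pandas as pd

def attempts(value):
    # Raw fight stats look like "12 of 34"; missing ones arrive as NaN (a float)
    if pd.isna(value):
        return 0  # assumption: treat a missing stat as zero attempts
    return int(value.split("of")[1])
```

Applying something like this in place of the bare lambda in preprocess.py would let the pipeline survive rows with missing stats.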
Make it easier to setup requirements to run the code
src.createdata.preprocess_fighter_data.py emits a repeated but non-fatal warning:
FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead
Documentation on deprecation:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.append.html
https://pandas.pydata.org/docs/whatsnew/v1.4.0.html#whatsnew-140-deprecations-frame-series-append
As the title suggests. Reading through the source code on how to make predictions once the model is built, I am confused about where red and blue are being passed into update_proba(). Searching for update_proba() only shows one instance of it, which is just the definition. Any insight would be appreciated. Thank you.
Firstly, thank you for your incredible project.
When I use data from Kaggle and run this cell from the preprocessing notebook:

pct_columns = ['Str_Acc', 'Str_Def', 'TD_Acc', 'TD_Def']

def pct_to_frac(X):
    # Note: `X != np.NaN` is always True, since NaN never compares equal;
    # a robust check would be `pd.isna(X)`
    if X != np.NaN:
        return float(X.replace('%', '')) / 100
    else:
        return 0

for column in pct_columns:
    fighter_details[column] = fighter_details[column].apply(pct_to_frac)
I get this error:
KeyError: 'Str_Acc'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 'Str_Acc'
It looks like raw_fighter_details.csv has some missing columns; can you provide feedback?
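One way to make that failure explicit is to check which of the expected columns actually exist before applying the conversion. A sketch with a toy stand-in DataFrame (not the real file):

```python
import pandas as pd

pct_columns = ['Str_Acc', 'Str_Def', 'TD_Acc', 'TD_Def']

# Toy stand-in for raw_fighter_details.csv with two of the four columns missing
fighter_details = pd.DataFrame({'Str_Acc': ['59%'], 'TD_Acc': ['40%']})

present = [c for c in pct_columns if c in fighter_details.columns]
missing = [c for c in pct_columns if c not in fighter_details.columns]
```

Reporting missing up front turns a cryptic KeyError into a clear message about which columns the CSV lacks.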
REV in the columns means reversals (https://www.foxsports.com/ufc/stats?weightclass=11&category=basic&sort=7).
Use https://docs.python.org/3/library/concurrent.futures.html to speed up web scraping.
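A rough sketch of what that could look like with a thread pool; fetch here is a placeholder for the real page-download function, and the URLs are made up:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
    # Placeholder: the real scraper would download and parse the page here
    return f"html for {url}"

urls = [f"http://ufcstats.example/fighter/{i}" for i in range(5)]

# Download several pages concurrently instead of one at a time
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(fetch, u): u for u in urls}
    pages = {futures[f]: f.result() for f in as_completed(futures)}
```

Threads suit this workload because scraping is I/O-bound; keep max_workers modest to stay polite to the source site.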
We create per-fight details using information up until that fight. Scraped fighter stats contain information that hasn't yet happened in most cases.
Currently, only a simple mean is taken. It would be better to use an exponential moving average of fighter stats, so that recent fights are given more importance.
Since EMAs give a higher weight to recent data than to older data, they are more responsive to the latest fight stats.
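With pandas this is essentially a one-liner via Series.ewm; a toy example (the strike counts are made up):

```python
import pandas as pd

# Significant-strike counts from a fighter's last five fights, oldest first (toy numbers)
strikes = pd.Series([40, 55, 30, 70, 90])

simple_mean = strikes.mean()                    # every fight weighted equally
ema = strikes.ewm(span=3, adjust=False).mean()  # recent fights count for more
latest_ema = ema.iloc[-1]
```

Because the fighter's recent outputs (70, 90) are higher than the older ones, the EMA ends up above the simple mean, reflecting current form.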