ufc-predictions's People

Contributors

ekeany, gumshoes, warrierrajeev

ufc-predictions's Issues

app: referenced CSV files do not exist

Two CSV files are referenced in src/app/app.py, and I have no idea what they should be: https://github.com/WarrierRajeev/UFC-Predictions/blob/master/src/app/app.py#L22

I have successfully scraped all data running python -m src.create_ufc_data. I get the following files:

UFC-Predictions/data$ tree .
.
├── data.csv
├── event_and_fight_links.pickle
├── past_event_links.pickle
├── past_fighter_links.pickle
├── preprocessed_data.csv
├── raw_fighter_details.csv
├── raw_total_fight_data.csv
└── scraped_fighter_data_dict.pickle

0 directories, 8 files

Next, I tried running the web app using its Dockerfile:

docker build -t ufc/ufc-predictions .
docker run --env PORT=8765 ufc/ufc-predictions

I get an error message which ends with:

[2022-07-02 17:47:39 +0000] [10] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.6/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
  File "/usr/local/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/app/app.py", line 22, in <module>
    fighter_df = pd.read_csv("app_data/latest_fighter_stats.csv", index_col="index")
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.6/site-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: 'app_data/latest_fighter_stats.csv'
[2022-07-02 17:47:39 +0000] [10] [INFO] Worker exiting (pid: 10)
[2022-07-02 17:47:39 +0000] [7] [INFO] Shutting down: Master
[2022-07-02 17:47:39 +0000] [7] [INFO] Reason: Worker failed to boot.

Have checkpoints to resume from

Scraping can stop midway for various reasons. Add checkpoints/saves at certain points during scraping so that it can automatically resume from where it left off.
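One way to structure this is to cache each step's result on disk and skip the step on re-runs. A minimal sketch, with hypothetical function and file names (the repo already saves intermediate pickles such as past_event_links.pickle, so this fits the existing layout):

```python
import os
import pickle


def run_with_checkpoint(step_name, step_fn, checkpoint_dir="data"):
    """Run step_fn only if no checkpoint exists; otherwise load the saved result.

    Illustrative sketch - step_name and checkpoint_dir are hypothetical.
    """
    path = os.path.join(checkpoint_dir, f"{step_name}.pickle")
    if os.path.exists(path):
        # Checkpoint found: resume by loading the previously scraped result
        with open(path, "rb") as f:
            return pickle.load(f)
    result = step_fn()
    os.makedirs(checkpoint_dir, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```

With this pattern, an interrupted run re-executes only the steps whose checkpoint files are missing.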

Can I have pre-processed database?

Hello,

Fantastic work! Is it possible to upload the preprocessed database you used? I want to use TPOT to see whether there is a better ML pipeline. If I succeed in getting a better algo I will post it here :)

Write format-data scripts

Currently, there are scripts only to scrape and download data. The pre-processing and formatting of the data are done in notebooks. Create scripts for these steps so they can be run with a single command.
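A possible shape for such a script, sketched with argparse; the step names and dispatch are illustrative, not the repo's actual API:

```python
import argparse


def parse_steps(argv=None):
    """Parse which pipeline steps to run (step names are illustrative)."""
    parser = argparse.ArgumentParser(description="Scrape and format UFC data")
    parser.add_argument(
        "--steps",
        nargs="+",
        default=["scrape", "preprocess", "format"],
        choices=["scrape", "preprocess", "format"],
    )
    return parser.parse_args(argv).steps


if __name__ == "__main__":
    for step in parse_steps():
        # In a real script, each step would dispatch to the corresponding module
        print(f"running: {step}")
```

Running with no arguments would execute the full pipeline; `--steps preprocess` would run only the notebook-replacement step.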

Errors when executing

Hey @WarrierRajeev, really interesting topic and awesome approach!

I was able to successfully execute the "python -m src.create_ufc_data" command (grabbing, processing, and saving the data) a handful of times.

However, I am now getting a series of errors, starting with attribute errors as well as key errors.

Curious if something changed in the structure of the data that's being used, causing there to be errors.

heroku app down?

Hi @WarrierRajeev!
First, I'd like to say: great repo! I was considering scraping the data myself, but then stumbled across your repository and my headache went away :D

Issue:
There seems to be something wrong with the Heroku hosting
(screenshot omitted)

I'm not sure if this is relevant to you, but in case it is, here you go!

Slight improvement to accuracy score

Hello there,

I think I increased prediction accuracy (using 80%-20% split) ever so slightly using TPOT (no oversampling applied yet).

try this:

# Imports as in a standard TPOT-exported script (added here for completeness)
from copy import copy

from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer, MaxAbsScaler
from tpot.builtins import StackingEstimator, ZeroCount
from tpot.export_utils import set_param_recursive
from xgboost import XGBClassifier

# Average CV score on the training set was: 0.6958245897228948
exported_pipeline = make_pipeline(
    make_union(
        make_pipeline(
            StackingEstimator(estimator=BernoulliNB(alpha=0.001, fit_prior=False)),
            ZeroCount()
        ),
        FunctionTransformer(copy)
    ),
    StackingEstimator(estimator=SGDClassifier(alpha=0.01, eta0=0.1, fit_intercept=False, l1_ratio=0.0, learning_rate="constant", loss="perceptron", penalty="elasticnet", power_t=0.1)),
    MaxAbsScaler(),
    XGBClassifier(learning_rate=0.1, max_depth=2, min_child_weight=19, n_estimators=100, n_jobs=1, subsample=0.4, verbosity=0)
)
# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 2)

exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)

KeyError

KeyError when running create_ufc_data

'float' object has no attribute 'split'

Hi,

I'm new to github and python, I tried to run

python -m src.create_ufc_data

from the root folder to scrape fresh data last week, and it worked successfully, but when I tried this week, it seems to scrape everything successfully but then when processing I get this error:

'float' object has no attribute 'split'

Here is more detail:

Getting fighter urls

Getting fighter names and details

Scraping all fighter names and links:
Progress: |██████████████████████████████████████████████████| 100.00% Complete
No new fighter data to scrape at the moment, loaded existing data from C:\Users\<username>...\UFC-Predictions-master\data\fighter_details.csv.
elapsed seconds = 19.22
Starting Preprocessing

Reading Files
Drop columns that contain information not yet occurred
Renaming Columns
Traceback (most recent call last):
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\<username>\Documents\repos\mma\UFC-Predictions-master\src\create_ufc_data.py", line 21, in <module>
    preprocessor.process_raw_data()  # Preprocesses the raw data and saves the csv files in data folder
  File "C:\Users\<username>\Documents\repos\mma\UFC-Predictions-master\src\createdata\preprocess.py", line 34, in process_raw_data
    self._rename_columns()
  File "C:\Users\<username>\Documents\repos\mma\UFC-Predictions-master\src\createdata\preprocess.py", line 115, in _rename_columns
    self.fights[column + attempt_suffix] = self.fights[column].apply(
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\series.py", line 4430, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\apply.py", line 1082, in apply
    return self.apply_standard()
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\apply.py", line 1137, in apply_standard
    mapped = lib.map_infer(
  File "pandas\_libs\lib.pyx", line 2870, in pandas._libs.lib.map_infer
  File "C:\Users\<username>\Documents\repos\mma\UFC-Predictions-master\src\createdata\preprocess.py", line 116, in <lambda>
    lambda X: int(X.split("of")[1])
AttributeError: 'float' object has no attribute 'split'
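The error suggests the scraped data now contains missing attempt values: pandas reads empty cells as NaN, which is a float and has no .split method. A minimal sketch of a guard (the function name and the fallback value of 0 are illustrative, not the repo's actual fix):

```python
import pandas as pd


def parse_attempts(x):
    # "12 of 34" -> 34; NaN (a float) has no .split, so guard on type first
    if isinstance(x, str):
        return int(x.split("of")[1])
    return 0  # or pd.NA, depending on how missing attempts should be treated


s = pd.Series(["12 of 34", float("nan"), "0 of 5"])
parsed = s.apply(parse_attempts)
```

Applying a guard like this in `_rename_columns` would let preprocessing tolerate rows with missing fight stats instead of crashing.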

Add dev setup

Make it easier to set up the requirements to run the code.

Deprecation warning (pandas.DataFrame.append) in src.createdata.preprocess_fighter_data.py

src.createdata.preprocess_fighter_data.py creates a repeated but non-fatal warning:

FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead

Documentation on deprecation:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.append.html
https://pandas.pydata.org/docs/whatsnew/v1.4.0.html#whatsnew-140-deprecations-frame-series-append
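The replacement is mechanical; a small sketch (the column names are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"fighter": ["A"], "wins": [3]})
new_row = pd.DataFrame({"fighter": ["B"], "wins": [5]})

# Deprecated in pandas 1.4 and removed in pandas 2.0:
# df = df.append(new_row, ignore_index=True)

# Replacement:
df = pd.concat([df, new_row], ignore_index=True)
```

Note that appending rows one at a time is quadratic either way; collecting rows in a list and calling pd.concat once at the end is usually faster.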

Where is `update_proba()` getting called?

As the title suggests. Reading through the source code on how to make predictions once the model is made, I am confused on where red, and blue is being passed into update_proba(). Searching for update_proba() only shows one instance of it, which is just the definition. Any insight would be appreciated. Thank you.

Error in preprocessing notebook

Firstly, thank you for your incredible project.

When I use data from kaggle and run this cell from preprocessing notebook :

pct_columns = ['Str_Acc', 'Str_Def', 'TD_Acc', 'TD_Def']

def pct_to_frac(X):
    # Note: `X != np.NaN` is always True (NaN compares unequal to everything,
    # including itself), so missing values must be detected by type instead:
    if isinstance(X, str):
        return float(X.replace('%', '')) / 100
    else:
        return 0

for column in pct_columns:
    fighter_details[column] = fighter_details[column].apply(pct_to_frac)

I get this error:

KeyError: 'Str_Acc'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 'Str_Acc'

It looks like raw_fighter_details.csv has some missing columns; can you provide feedback?

Drop scraped average fighter stats

We create per-fight details using only information available up to that fight. The scraped career-average fighter stats, by contrast, in most cases include information from fights that had not yet happened at the time of a given fight.

Do exponential moving average

Currently, only a simple mean is taken. It would be better to use an exponential moving average of fighter stats, so that recent fights are given more importance.
Since EMAs weight recent data more heavily than older data, they are more responsive to the latest fight stats.
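With pandas this is available via Series.ewm. A small sketch comparing a running simple mean to an EMA on made-up per-fight numbers (the span is an arbitrary tunable choice, not a value from the repo):

```python
import pandas as pd

# Hypothetical significant-strike counts for one fighter, oldest fight first
sig_strikes = pd.Series([30, 45, 20, 60, 55])

# Running simple mean (the current approach): every past fight weighs equally
simple_mean = sig_strikes.expanding().mean()

# Exponential moving average: recent fights weigh more (span=3 => alpha=0.5)
ema = sig_strikes.ewm(span=3, adjust=False).mean()
```

On this series the final simple mean is 42.0, while the final EMA is pulled up toward the two strong recent fights, which is exactly the responsiveness the issue asks for.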
