aponte411 / numerai_train


Contains code necessary to train Numerai models (nx.Model)

Dockerfile 0.21% Python 99.57% Shell 0.21%


numerai_train

This repository contains code to train and tune models, and to make and submit predictions to the Numerai Data Science Tournament using the Numerox API.

https://docs.numer.ai/tournament/learn

Setup:

Before training models or making predictions, I recommend creating a virtual environment and installing the packages listed in requirements.txt.

Using Virtualenv

Use Python 3.7 to avoid some import issues I encountered:

virtualenv --python=/usr/bin/python3.7 <path/to/new/virtualenv>
On POSIX systems: source <path/to/new/virtualenv>/bin/activate
On Windows (where this repo was developed): <path\to\new\virtualenv>\Scripts\activate

Using conda

conda create -n yourenvname python=3.7 anaconda
conda activate yourenvname

Install requirements

Once you have your virtual environment set up, install the requirements:

pip install -r requirements.txt

Training/Inference:

The API has two basic components, Models and Trainers, which come together in the train_and_predict_model() function in the predict.py module. A Model builds on the Numerox API (nx.Model) to make submission simpler; a Trainer trains, loads, and saves models (locally or from/to an S3 bucket) and submits predictions.
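A minimal sketch of how these pieces might compose, using only the standard library. Trainer and train_and_predict_model() reuse names from this section, but their signatures here are assumptions; MeanModel is a hypothetical stand-in for a real Numerox-based Model, and the submit callback stands in for an actual upload.

```python
from typing import Callable, List, Sequence

Rows = Sequence[Sequence[float]]


class MeanModel:
    """Hypothetical stand-in for a Model: predicts the mean training target."""

    def __init__(self) -> None:
        self.mean = 0.0

    def fit(self, X: Rows, y: Sequence[float]) -> None:
        self.mean = sum(y) / len(y)

    def predict(self, X: Rows) -> List[float]:
        return [self.mean] * len(X)


class Trainer:
    """Hypothetical Trainer: owns a model, trains it, and passes predictions
    to a submit callback (standing in for an upload to Numerai)."""

    def __init__(self, model: MeanModel, submit_fn: Callable[[List[float]], None]) -> None:
        self.model = model
        self.submit_fn = submit_fn

    def train(self, X: Rows, y: Sequence[float]) -> None:
        self.model.fit(X, y)

    def predict(self, X: Rows) -> List[float]:
        return self.model.predict(X)

    def submit(self, predictions: List[float]) -> None:
        self.submit_fn(predictions)


def train_and_predict_model(trainer: Trainer, X_train: Rows, y_train: Sequence[float],
                            X_live: Rows, submit: bool = False) -> List[float]:
    """Train, predict on live rows, and submit only when asked to."""
    trainer.train(X_train, y_train)
    predictions = trainer.predict(X_live)
    if submit:
        trainer.submit(predictions)
    return predictions


# Usage: collect "submissions" in a list instead of uploading them.
submitted: List[List[float]] = []
trainer = Trainer(MeanModel(), submitted.append)
preds = train_and_predict_model(trainer, [[0.1], [0.2]], [0.0, 1.0], [[0.3], [0.4]], submit=True)
# preds == [0.5, 0.5]; submitted == [[0.5, 0.5]]
```

Keeping submission behind a callback means the training path can be exercised without network access or credentials.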

Create a parameter dictionary:

EXAMPLE_PARAMS = { 'depth': 7, 'learning_rate': 0.1, 'l2': 0.01, 'iterations': 100 }
Pass the dictionary to the train_and_predict_<model-name>_model() function as params=EXAMPLE_PARAMS.
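One way such a function might consume the dictionary is to merge it over defaults and reject misspelled keys. This is a sketch under assumptions: DEFAULT_PARAMS, resolve_params() and train_and_predict_example_model() are hypothetical, and the default values are illustrative only.

```python
from typing import Any, Dict, Optional

# Values copied from EXAMPLE_PARAMS above.
EXAMPLE_PARAMS: Dict[str, Any] = {"depth": 7, "learning_rate": 0.1, "l2": 0.01, "iterations": 100}

# Hypothetical defaults, used for any key the caller leaves out.
DEFAULT_PARAMS: Dict[str, Any] = {"depth": 6, "learning_rate": 0.03, "l2": 3.0, "iterations": 1000}


def resolve_params(params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge caller params over the defaults; fail fast on unknown keys."""
    params = params or {}
    unknown = set(params) - set(DEFAULT_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {**DEFAULT_PARAMS, **params}


def train_and_predict_example_model(params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical stand-in for a train_and_predict_<model-name>_model() function.
    A real version would build and train a model here; this one only returns
    the parameters it would use."""
    return resolve_params(params)


# Usage
resolved = train_and_predict_example_model(params=EXAMPLE_PARAMS)
```

Rejecting unknown keys catches typos such as 'learningrate' that would otherwise be silently ignored.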

To train a model, make predictions, and submit them, run the command below. After training completes, the model is saved to disk under a name of the form <model-name>_model_trained_<competition-name>. You can also save the model to an S3 bucket (more on that below):

python predict.py --model <model-name> --load-model <bool> --save-model <bool> --submit <bool>
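A sketch of how these options and the saved-model name could be handled, using the standard-library argparse. The repo's predict.py may parse its options differently; str_to_bool(), model_filename(), parse_args() and the default values are hypothetical. The point worth copying is str_to_bool(): argparse's type=bool treats any non-empty string, including "False", as True.

```python
import argparse
from typing import List, Optional


def str_to_bool(value: str) -> bool:
    """Parse 'True'/'False'-style strings; type=bool would turn 'False' into True."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def model_filename(model_name: str, competition_name: str) -> str:
    """Build the <model-name>_model_trained_<competition-name> name."""
    return f"{model_name}_model_trained_{competition_name}"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Train, predict, and optionally submit.")
    parser.add_argument("--model", required=True, help="model name, e.g. lstm")
    # Dashes become underscores: --load-model is read as args.load_model.
    parser.add_argument("--load-model", type=str_to_bool, default=False)
    parser.add_argument("--save-model", type=str_to_bool, default=True)
    parser.add_argument("--submit", type=str_to_bool, default=False)
    return parser.parse_args(argv)


# Usage
args = parse_args(["--model", "lstm", "--load-model", "False", "--submit", "True"])
name = model_filename(args.model, "example")  # "lstm_model_trained_example"
```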

Saving models to an S3 bucket:

Set up environment variables for the AWS S3 bucket:

export BUCKET=<bucket-name>
export AWS_ACCESS_KEY_ID=<access-key-id>
export AWS_SECRET_KEY=<secret-key>
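A small sketch for checking these variables before any training starts, so a missing credential fails immediately rather than after a long run. read_required_env() and S3_ENV_VARS are hypothetical helpers; the variable names come from the export commands above. Note that if the code relies on boto3's default credential lookup instead of passing the keys explicitly, boto3 reads AWS_SECRET_ACCESS_KEY, not AWS_SECRET_KEY.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Variable names taken from the export commands above.
S3_ENV_VARS = ("BUCKET", "AWS_ACCESS_KEY_ID", "AWS_SECRET_KEY")


def read_required_env(names: Iterable[str],
                      environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the requested variables, raising one error that lists every
    missing (or empty) name instead of failing partway through an upload."""
    environ = os.environ if environ is None else environ
    names = list(names)
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in names}


# Usage (reads the real environment):
# config = read_required_env(S3_ENV_VARS)
```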

By default, models are saved to and loaded from the S3 bucket, so you can run predict.py as is. To also save and load models locally, change the code to call the trainer.save_model_locally() and trainer.load_model_locally() methods.
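For illustration, local save/load could be as simple as pickling. The function names echo the trainer methods mentioned above, but these standalone versions and their signatures are assumptions; the real trainer may use a model-specific format (an LSTM, for example, is often saved with its framework's own serializer).

```python
import pickle
from pathlib import Path
from typing import Any, Union


def save_model_locally(model: Any, path: Union[str, Path]) -> Path:
    """Pickle the model to path, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path


def load_model_locally(path: Union[str, Path]) -> Any:
    """Unpickle a model saved by save_model_locally().
    Only load files you trust: unpickling can execute arbitrary code."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Usage (hypothetical path following the naming scheme above):
# path = save_model_locally(model, "models/lstm_model_trained_example")
# model = load_model_locally(path)
```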

Submitting predictions:

Set up environment variables for NumerAPI:

export NUMERAI_PUBLIC_ID=<public-id>
export NUMERAI_SECRET_KEY=<secret-key>

Then set the --submit option to True when running predict.py (for example, --submit True in the training command above).
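A sketch of the submission step, assuming predictions are written to a CSV and uploaded with the numerapi package. write_predictions_csv(), numerai_credentials() and submit_predictions() are hypothetical helpers, and the "prediction" column header is a placeholder: use whatever header the current tournament expects. numerapi is imported lazily so the CSV helper works without it installed.

```python
import csv
import os
from pathlib import Path
from typing import Sequence, Tuple, Union


def write_predictions_csv(ids: Sequence[str], predictions: Sequence[float],
                          path: Union[str, Path], column: str = "prediction") -> Path:
    """Write an id,<column> CSV, refusing mismatched lengths."""
    if len(ids) != len(predictions):
        raise ValueError(f"{len(ids)} ids but {len(predictions)} predictions")
    path = Path(path)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", column])
        writer.writerows(zip(ids, predictions))
    return path


def numerai_credentials() -> Tuple[str, str]:
    """Read the key pair exported above."""
    try:
        return os.environ["NUMERAI_PUBLIC_ID"], os.environ["NUMERAI_SECRET_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None


def submit_predictions(path: Union[str, Path]):
    """Upload a predictions CSV with numerapi and return its response."""
    from numerapi import NumerAPI

    public_id, secret_key = numerai_credentials()
    napi = NumerAPI(public_id=public_id, secret_key=secret_key)
    return napi.upload_predictions(str(path))
```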

Running Tests

With numerai_train as your working directory, run the following from the command line:

python -m pytest tests/tests_unit.py -v

Training models using AWS ECS

WIP

Running experiments on Polyaxon

polyaxon login --username=root --password=rootpassword
polyaxon project create --name=numerai_training --description='Train models on polyaxon'
polyaxon init numerai_training
CPU: polyaxon run -f configs/polyaxon_cpu.yaml
GPU: polyaxon run -f configs/polyaxon_gpu.yaml


numerai_train's Issues

Test out object oriented approach

Instead of keeping training functions in the train.py module and prediction functions in the predict.py module, I want to try creating a general base Trainer class that model-specific trainers inherit from. This would reduce redundancy and simplify the code. It would then require only one prediction function in the predict.py module, which gives users the option to load/save models alongside running inference.
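A minimal sketch of the proposed design with an abstract base class: shared training and prediction logic lives in BaseTrainer, each subclass only builds its own model, and a single train_and_predict() works for every trainer. All class and function names here are hypothetical; ConstantTrainer is a toy subclass for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Sequence

Rows = Sequence[Sequence[float]]


class BaseTrainer(ABC):
    """Shared logic lives here; subclasses implement only what differs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.model: Any = None

    @abstractmethod
    def build_model(self, params: Dict[str, Any]) -> Any:
        """Return an untrained model with fit() and predict() methods."""

    def train(self, X: Rows, y: Sequence[float], params: Dict[str, Any]) -> None:
        self.model = self.build_model(params)
        self.model.fit(X, y)

    def predict(self, X: Rows) -> List[float]:
        if self.model is None:
            raise RuntimeError(f"{self.name}: train or load a model first")
        return list(self.model.predict(X))


class ConstantModel:
    """Toy model that always predicts the same value."""

    def __init__(self, value: float) -> None:
        self.value = value

    def fit(self, X: Rows, y: Sequence[float]) -> None:
        pass

    def predict(self, X: Rows) -> List[float]:
        return [self.value] * len(X)


class ConstantTrainer(BaseTrainer):
    """Hypothetical subclass: only model construction is model-specific."""

    def build_model(self, params: Dict[str, Any]) -> ConstantModel:
        return ConstantModel(params.get("value", 0.5))


def train_and_predict(trainer: BaseTrainer, X_train: Rows, y_train: Sequence[float],
                      X_live: Rows, params: Dict[str, Any]) -> List[float]:
    """One prediction function for every trainer subclass."""
    trainer.train(X_train, y_train, params)
    return trainer.predict(X_live)
```

Adding a new model family then means writing one build_model() method rather than a new train/predict pair.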

Fix weird indexing error in LSTMModel

2020-01-20 14:15:21 - models - ERROR = Failure to prepare predictions with Shape of passed values is (1655355, 1), indices imply (1655356, 1)
Traceback (most recent call last):
  File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1681, in create_block_manager_from_blocks
    mgr = BlockManager(blocks, axes)
  File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 143, in __init__
    self._verify_integrity()
  File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 345, in _verify_integrity
    construction_error(tot_items, block.shape[1:], self.axes)
  File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1719, in construction_error
    "Shape of passed values is {0}, indices imply {1}".format(passed, implied)
ValueError: Shape of passed values is (1655355, 1), indices imply (1655356, 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "predict.py", line 328, in <module>
    predictions = main()
  File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "predict.py", line 316, in main
    return train_and_predict_lstm_model(submit)
  File "predict.py", line 184, in train_and_predict_lstm_model
    submit=submit_to_numerai)
  File "predict.py", line 60, in make_predictions_and_prepare_submission
    prediction: nx.Prediction = model.predict(data['tournament'], tournament)
  File "/Users/davidaponte/TRADING/numerai_training/numerai_train/models.py", line 260, in predict
    raise e
  File "/Users/davidaponte/TRADING/numerai_training/numerai_train/models.py", line 256, in predict
    tournament=tournament)
  File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/numerox/prediction.py", line 279, in merge_arrays
    df = pd.DataFrame(data=y, columns=[pair], index=ids)
  File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/pandas/core/frame.py", line 440, in __init__
    mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
  File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 213, in init_ndarray
    return create_block_manager_from_blocks(block_values, [columns, index])
  File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1688, in create_block_manager_from_blocks
    construction_error(tot_items, blocks[0].shape[1:], axes, e)
  File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1719, in construction_error
    "Shape of passed values is {0}, indices imply {1}".format(passed, implied)
ValueError: Shape of passed values is (1655355, 1), indices imply (1655356, 1)
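The traceback shows numerox building a DataFrame from y with index=ids, where ids has one more row than y (1655356 vs 1655355). One possible cause, not confirmed by the traceback, is sliding-window preparation for the LSTM: n rows with a lookback of L yield only n - L + 1 windows, so L = 2 produces exactly one prediction too few. The sketch below is hypothetical (make_windows() and check_alignment() are not in the repo); it shows that arithmetic, one padding fix, and a guard that fails with a readable message before the DataFrame is built.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_windows(rows: Sequence[T], lookback: int, pad: bool = False) -> List[List[T]]:
    """Build sliding windows of `lookback` consecutive rows.

    Without padding, n rows give n - lookback + 1 windows. With pad=True the
    first row is repeated lookback - 1 times at the front, so every row,
    including the first, gets a window and the counts match the ids."""
    if lookback < 1:
        raise ValueError("lookback must be >= 1")
    seq = list(rows)
    if pad and seq:
        seq = [seq[0]] * (lookback - 1) + seq
    return [seq[i:i + lookback] for i in range(len(seq) - lookback + 1)]


def check_alignment(ids: Sequence[str], yhat: Sequence[float]) -> None:
    """Raise a descriptive error when predictions and ids disagree in length."""
    if len(ids) != len(yhat):
        raise ValueError(
            f"{len(yhat)} predictions for {len(ids)} ids "
            f"(difference {len(ids) - len(yhat)}); check windowing or dropped rows"
        )


# Usage: 5 rows with lookback 2 give 4 windows unpadded, 5 when padded.
rows = [0, 1, 2, 3, 4]
unpadded = make_windows(rows, 2)
padded = make_windows(rows, 2, pad=True)
```

Padding with the first row is only one option; dropping the first id so ids and predictions line up is another, depending on which rows the model should score.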