numerai_train

This repository contains code to train and tune models, then make and submit predictions to the Numerai Data Science Tournament using the Numerox API.

https://docs.numer.ai/tournament/learn

Setup:

Before training models or making predictions, I recommend creating a virtual environment and installing the packages listed in the requirements file.

Using Virtualenv

Use Python 3.7 to avoid some import issues I encountered:

  • virtualenv --python=/usr/bin/python3.7 <path/to/new/virtualenv/>
  • On POSIX systems: source <path/to/new/virtualenv>/bin/activate
  • On Windows (where this repo was developed): <path\to\new\virtualenv>\Scripts\activate

Using conda

  • conda create -n yourenvname python=x.x anaconda
  • source activate yourenvname

Install requirements

Once you have your virtual environment set up, install the requirements:

  • pip install -r requirements.txt

Training/Inference:

The API has a few basic components that come together in the predict.py module: Models and Trainers, which together make up a train_and_predict_model() function in that module. A Model is built on top of the Numerox API to make submission a bit simpler. A Trainer contains functionality to train, load, and save models (locally or from/to an s3 bucket) and to submit predictions.

  1. Create a parameter dictionary:
  • EXAMPLE_PARAMS = { 'depth': 7, 'learning_rate': 0.1, 'l2': 0.01, 'iterations': 100 }
  • Pass the parameter dictionary as params=EXAMPLE_PARAMS to the train_and_predict_<model-name>_model() function.
  2. To train, make predictions, and submit them, run the command below. After training completes, the model weights are saved to disk under a name of the form <model-name>_model_trained_<competition-name>. You can also save the model to an s3 bucket (more on that below):
  • python predict.py --model <model-name> --load-model <bool> --save-model <bool> --submit <bool>
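
As a rough illustration of how the flags in the command above might be parsed, the sketch below accepts the same four options and derives the saved-model name. It uses argparse from the standard library to stay dependency-free (the traceback in the issues section suggests the repository itself uses click). The helpers parse_bool, build_parser, and trained_model_name are hypothetical, and the default flag values are assumptions rather than the repository's actual defaults.

```python
import argparse

# Parameter dictionary from step 1 of this README.
EXAMPLE_PARAMS = {"depth": 7, "learning_rate": 0.1, "l2": 0.01, "iterations": 100}


def parse_bool(value: str) -> bool:
    """Turn a command-line string such as 'True' or 'false' into a bool."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four flags shown in the README command."""
    parser = argparse.ArgumentParser(description="Train, predict, and optionally submit.")
    parser.add_argument("--model", required=True)
    # Defaults below are assumptions made for this sketch.
    parser.add_argument("--load-model", type=parse_bool, default=False)
    parser.add_argument("--save-model", type=parse_bool, default=True)
    parser.add_argument("--submit", type=parse_bool, default=False)
    return parser


def trained_model_name(model_name: str, competition_name: str) -> str:
    """Follow the <model-name>_model_trained_<competition-name> pattern."""
    return f"{model_name}_model_trained_{competition_name}"
```

Calling build_parser().parse_args(["--model", "lstm", "--submit", "False"]) yields a namespace whose submit attribute is a real bool, which avoids the common pitfall of the non-empty string "False" being treated as truthy.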

Saving model to s3 bucket:

  1. Set up environment variables for the AWS s3 bucket:
  • export BUCKET=<bucket-name>
  • export AWS_ACCESS_KEY_ID=<access-key-id>
  • export AWS_SECRET_KEY=<secret-key>
  2. By default, models are saved to and loaded from an s3 bucket, so run the predict.py module as is. To also save or load models locally, change the code to call the trainer.save/load_model_locally() methods.

Submitting predictions:

  1. Set up environment variables for NumerAPI:
  • export NUMERAI_PUBLIC_ID=<public-id>
  • export NUMERAI_SECREY_KEY=<secret-key>
  2. Set the --submit parameter to True when running predict.py, for example: python predict.py --model <model-name> --submit True
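
Since both the s3 and NumerAPI steps depend on exported variables, a small pre-flight check can catch a missing one before a long training run. The sketch below is only an illustration: missing_env_vars and require_env_vars are hypothetical helpers, not part of this repository, and the variable names are copied exactly as they are written in this README.

```python
import os
from typing import Iterable, List, Mapping

# Variable names copied verbatim from the setup steps in this README.
S3_VARS = ("BUCKET", "AWS_ACCESS_KEY_ID", "AWS_SECRET_KEY")
NUMERAPI_VARS = ("NUMERAI_PUBLIC_ID", "NUMERAI_SECREY_KEY")


def missing_env_vars(
    required: Iterable[str], environ: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names in `required` that are unset or empty in `environ`."""
    return [name for name in required if not environ.get(name)]


def require_env_vars(
    required: Iterable[str], environ: Mapping[str, str] = os.environ
) -> None:
    """Raise RuntimeError listing every required variable that is missing."""
    missing = missing_env_vars(required, environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

One way to use it would be to call require_env_vars(S3_VARS) before saving or loading a model, and require_env_vars(NUMERAPI_VARS) only when --submit is True.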

Running Tests

With numerai_train as your working directory, run the following from the command line:

  • python -m pytest tests/tests_unit.py -v

Training models using AWS ECS

WIP

Running experiments on Polyaxon

  • polyaxon login --username=root --password=rootpassword
  • polyaxon project create --name=numerai_training --description='Train models on polyaxon'
  • polyaxon init numerai_training
  • CPU: polyaxon run -f configs/polyaxon_cpu.yaml
  • GPU: polyaxon run -f configs/polyaxon_gpu.yaml

Contributors

aponte411

numerai_train's Issues

Test out object oriented approach

Instead of having training functions in the train.py module and prediction functions in the predict.py module, I want to try creating a general Trainer base class that model-specific Trainers inherit from. This will reduce redundancy and simplify things a bit. It would then require only one prediction function in the predict.py module, giving users the option to load/save models alongside conducting inference.
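
A minimal sketch of what that base-class design could look like follows. Everything here is an assumption made for illustration: the method signatures, the pickle-based storage, the toy MeanTrainer subclass, and the single train_and_predict function are not taken from the repository.

```python
import pickle
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, List, Optional, Sequence


class Trainer(ABC):
    """Shared load/save logic; subclasses supply model-specific training."""

    def __init__(self, name: str, params: Optional[Dict[str, Any]] = None) -> None:
        self.name = name
        self.params = dict(params or {})
        self.model: Any = None

    @abstractmethod
    def train_model(self, features: Sequence, targets: Sequence[float]) -> None:
        """Fit self.model on the given data."""

    @abstractmethod
    def predict(self, features: Sequence) -> List[float]:
        """Return one prediction per row of features."""

    def save_model_locally(self, directory: Path) -> Path:
        """Pickle the fitted model into `directory` and return the file path."""
        path = Path(directory) / f"{self.name}.pkl"
        with open(path, "wb") as handle:
            pickle.dump(self.model, handle)
        return path

    def load_model_locally(self, path: Path) -> None:
        """Restore a previously pickled model from `path`."""
        with open(path, "rb") as handle:
            self.model = pickle.load(handle)


class MeanTrainer(Trainer):
    """Toy subclass: 'trains' by storing the mean target."""

    def train_model(self, features: Sequence, targets: Sequence[float]) -> None:
        self.model = sum(targets) / len(targets)

    def predict(self, features: Sequence) -> List[float]:
        return [self.model] * len(features)


def train_and_predict(
    trainer: Trainer,
    features: Sequence,
    targets: Sequence[float],
    load_from: Optional[Path] = None,
    save_to: Optional[Path] = None,
) -> List[float]:
    """One entry point for any Trainer: load or train, optionally save, then predict."""
    if load_from is not None:
        trainer.load_model_locally(load_from)
    else:
        trainer.train_model(features, targets)
    if save_to is not None:
        trainer.save_model_locally(save_to)
    return trainer.predict(features)
```

With this shape, adding a new model would mean writing one subclass with train_model and predict, while loading, saving, and the prediction entry point stay shared.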

Fix weird indexing error in LSTMModel

`2020-01-20 14:15:21 - models - ERROR = Failure to prepare predictions with Shape of passed values is (1655355, 1), indices imply (1655356, 1)
Traceback (most recent call last):
File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1681, in create_block_manager_from_blocks
mgr = BlockManager(blocks, axes)
File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 143, in __init__
self._verify_integrity()
File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 345, in _verify_integrity
construction_error(tot_items, block.shape[1:], self.axes)
File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1719, in construction_error
"Shape of passed values is {0}, indices imply {1}".format(passed, implied)
ValueError: Shape of passed values is (1655355, 1), indices imply (1655356, 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "predict.py", line 328, in <module>
predictions = main()
File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "predict.py", line 316, in main
return train_and_predict_lstm_model(submit)
File "predict.py", line 184, in train_and_predict_lstm_model
submit=submit_to_numerai)
File "predict.py", line 60, in make_predictions_and_prepare_submission
prediction: nx.Prediction = model.predict(data['tournament'], tournament)
File "/Users/davidaponte/TRADING/numerai_training/numerai_train/models.py", line 260, in predict
raise e
File "/Users/davidaponte/TRADING/numerai_training/numerai_train/models.py", line 256, in predict
tournament=tournament)
File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/numerox/prediction.py", line 279, in merge_arrays
df = pd.DataFrame(data=y, columns=[pair], index=ids)
File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/pandas/core/frame.py", line 440, in __init__
mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 213, in init_ndarray
return create_block_manager_from_blocks(block_values, [columns, index])
File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1688, in create_block_manager_from_blocks
construction_error(tot_items, blocks[0].shape[1:], axes, e)
File "/Users/davidaponte/ENVS/numerai_training/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1719, in construction_error
"Shape of passed values is {0}, indices imply {1}".format(passed, implied)
ValueError: Shape of passed values is (1655355, 1), indices imply (1655356, 1)`
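
The traceback reports 1655355 prediction rows against 1655356 ids, so the DataFrame construction inside numerox fails. The cause is not identified in this issue. As a debugging aid, a length check placed just before the arrays are handed to numerox would surface the mismatch with a clearer message. The helper below is a hypothetical sketch, not code from the repository.

```python
from typing import Sequence


def check_alignment(ids: Sequence, predictions: Sequence) -> None:
    """Raise early if there is not exactly one prediction per id."""
    if len(ids) != len(predictions):
        raise ValueError(
            f"Got {len(predictions)} predictions for {len(ids)} ids "
            f"(difference: {len(ids) - len(predictions)})"
        )
```

Calling it with the ids and the model output right after inference narrows the search to the step that dropped (or never produced) a row, rather than the later pandas constructor.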
