
Application of machine learning to the Coinbase (GDAX) orderbook

License: BSD 3-Clause "New" or "Revised" License

Language: Jupyter Notebook

Topics: gdax, keras, tensorflow, machine-learning, bitcoin, lstm, gru, orderbook, trading, candlesticks


gdax-orderbook-ml

Application of machine learning to the Coinbase (GDAX) orderbook, using a stacked bidirectional LSTM/GRU model to predict new support and resistance on a 15-minute basis. Currently under heavy development.

Model structure (visual): [diagram]

General project API/data structure: [diagram]

General Project Requirements

  • Anaconda environment strongly recommended
    • see requirements.txt for pip, or environment.yml for Anaconda/conda
      • Jupyter Notebook
      • Python, Pandas, Matplotlib, MongoDB, PyMongo, Git LFS
      • Scipy, Numpy, Feather
      • Keras, Tensorflow, Scikit-Learn
  • Python client for the Coinbase Pro API: coinbasepro-python (https://github.com/danpaquin/coinbasepro-python)
  • CUDA/CUDNN-compatible GPU highly recommended for model training, testing, and predicting

Tensorflow/Keras local GPU backend configuration (Nvidia CUDA/cuDNN)

A local GPU is used to greatly accelerate prototyping, construction, and training of the ML model(s) for this project, given the size of the dataset and the complexity of the model.

  • Requirements to run tensorflow with GPU support
  • Nvidia GPU compatible with CUDA Compute Capability 3.0 or higher
    • Nvidia CUDA 9.0
    • Nvidia cuDNN 7.0 (v7.1.2)
      • Install the cuDNN .dll files into the CUDA install directory
      • Edit environment variables so that PATH includes:
        • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
    • pip uninstall tensorflow && pip install --ignore-installed --upgrade tensorflow-gpu
      • The default tensorflow install is CPU-only: install the CUDA and cuDNN requirements first, then uninstall tensorflow and reinstall tensorflow-gpu with the command above.
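After switching to the GPU build, it is worth confirming that TensorFlow actually sees the GPU. A minimal sketch (the helper name is hypothetical; it uses the TF 1.x-era `device_lib` API and degrades gracefully when TensorFlow is not installed):

```python
def gpu_available():
    """Return True if TensorFlow is installed and reports at least one GPU.

    Hypothetical helper, not part of this project; returns False when
    TensorFlow (or the GPU build) is absent, so it is safe on CPU-only
    machines.
    """
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return False
    # Each local device has a device_type of "CPU" or "GPU".
    return any(d.device_type == "GPU" for d in device_lib.list_local_devices())
```

If this returns False after installing tensorflow-gpu, the usual culprits are a CUDA/cuDNN version mismatch or the CUDA directory missing from PATH.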

Project/File Structure

Latest notebook file(s) with project code:

9_data_pipeline_development.ipynb:

  • Development of data pipelines and optimization of data from MongoDB instance to ML model pretraining
  • Removal of deprecated packages + base package version upgrades (e.g. Pandas)
  • Groundwork for an automation pipeline: automated hourly data scraping, cycling, and model training via a segregated instance or a live online model
  • Usage of in-line markdown cells in-notebook for readability and consistency
  • Even further refinement to program structure
  • Function scope and structure & function creation for common operations
  • Parsing of raw data into four consecutive 15-minute l2update segments

9a_model_restructure.ipynb:

  • Notebook used for further development of the model in different formats and for testing reduced-complexity models
    • Keras Sequential() Model
    • Keras Functional API
    • Raw Tensorflow

8_program_structure_improvement.ipynb:

  • Previous notebook with proof-of-concept output results
    • Several function calls via the API, and multiple required packages, are deprecated
    • Use as reference for updated development files/notebooks

6_raw_dataset_update.ipynb:

  • Notebook file used to scrape/update raw_data in both MongoDB and CSV format: 1 hour of websocket data from GDAX
    • L2 snapshot + L2 updates, without the overhead of the Match data response (the test data does include Match data, which adds significant I/O overhead)

Folder/Repository Structure

  • 'gdax-python' and 'gdax-ohlc-import' are repositories imported as Git Submodules:
    • After cloning the main project repository, the following command is required to ensure that the submodule repository contents are pulled/present: git submodule update --init --recursive
    • The .gitmodules file holds the submodule parameters
  • 'model_saved' folder:
    • Contains .json and .h5 files for current and previous Tensorflow/Keras models (trained model and model weight export/import)
  • 'documentation' folder:
    • 'rds_ml_yu.revised.pptx' is a PowerPoint presentation summarizing the key technical components, scope, and limitations of this project.
    • 'design_mockup' folder:
      • Contains diagrams, drawings, and notes used in the process of model and project design during prototyping, testing, and expansion.
    • 'design_explanation' folder:
      • Contains 8 pages of detailed explanations and diagrams regarding both project/model structure and design.
    • 'previous_revisions' folder:
      • Contains previous/outdated versions of the readme documentation and PowerPoint presentations documenting this project
  • 'saved_charts' folder:
    • Output of generate_chart() for candlestick chart with visualized autogenerated support and resistance from autoSR()
    • Screenshot of model layer structure in text format
    • Graphviz output of model layer structure
  • 'test_data' folder:
    • Only has 10 minutes of scraped data for testing, development, and model input prototyping (snapshot + l2 response updates)
  • 'raw_data' folder:
    • 1 hour of scraped data (snapshot + l2 response updates)
      • l2update_15min_1-4: 1 hour of l2 updates split into four 15-minute increments
      • mongo_raw.json: 1 hour of scraped data from the gdax-python API websocket in raw mongoDB format
  • 'raw_data_10h' folder:
    • 10 hours of scraped data:
      • l2update_10h, request_log_10h, and snapshot_asks/bids_10h
      • 10 hours of scraped data in raw mongoDB export (JSON): mongo_raw_10h.json
    • Data in .msg (MessagePack) format is currently experimental, being tested as an alternative to .csv for I/O operations
  • 'raw_data_pipeline' folder:
    • Contains data in .feather format as part of data pipeline(s) implementation and development
  • 'archived_ipynb' folder:
    • Contains previous Jupyter Notebook files used in the construction, design, and prototyping of components of this project.
      • Jupyter Notebook (.ipynb) notebook files 1-5 & 7
      • Each successive notebook was used to test whether, at each stage, a project of this scope would be technically feasible.
    • Successive numbered notebooks are iterative in nature, generally improving on the previous notebook files for this project.
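For reference, the rolling-extrema style of support/resistance generation behind autoSR() (whose output is saved under the 'saved_charts' folder) can be sketched as follows. This is a naive simplification for illustration, not the project's actual adaptation of nakulnayyar/SupResGenerator:

```python
def auto_sr(closes, window=2):
    """Return (support, resistance) levels from a list of closing prices.

    A close is a support level if it is the minimum of its +/- `window`
    bar neighborhood, and a resistance level if it is the maximum.
    Naive sketch only.
    """
    support, resistance = [], []
    for i in range(window, len(closes) - window):
        neighborhood = closes[i - window:i + window + 1]
        if closes[i] == min(neighborhood):
            support.append(closes[i])
        if closes[i] == max(neighborhood):
            resistance.append(closes[i])
    return support, resistance
```

Levels found this way can then be drawn as horizontal lines over a candlestick chart, which matches the kind of generate_chart() output described above.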

Misc. Technical Reference

Publications and Journals referenced for model structure and design

License

- gdax-orderbook-ml: BSD-3 Licensed, Copyright (c) 2018 Timothy Yu
- coinbasepro-python: MIT Licensed, Copyright (c) 2017 Daniel Paquin 
- autoSR() function adapted from nakulnayyar/SupResGenerator, Copyright (c) 2016 Nakul Nayyar (https://github.com/nakulnayyar/SupResGenerator)


gdax-orderbook-ml's Issues

scrape 10 hours/1 day of data

Reconstruct/alter the #6 Jupyter notebook scrape file for 10 hours of scraping.

issues:

  • RAM limitations
  • MongoDB raw scrape size
  • CSV limitations as a file format

new complementary tool

I want to offer a new point of view, and my collaboration.

Why this stock prediction project?

Things this project offers that I did not find in other free projects:

  • Testing with ~30 models: multiple feature combinations and multiple model selections (TensorFlow, XGBoost, and scikit-learn)
  • Threshold and model-quality evaluation
  • Uses ~1k technical indicators
  • A method for selecting the best features (technical indicators)
  • A categorical target (buy, sell, do nothing), simple and dynamic, instead of a continuous target variable
  • A powerful open-market real-time evaluation system
  • Versatile integration with Twitter, Telegram, and mail
  • Training the machine learning model with fresh same-day stock data

https://github.com/Leci37/stocks-prediction-Machine-learning-RealTime-telegram/tree/develop

fix scrape_start()

  • boolean flag for scrape_running() status

  • save to raw_data_pipeline folder

  • timezone basis/reference config (timezone of scrape)

  • outline of function def for new hour of data (mock function)

  • outline of error handling if scrape interrupted
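The fixes listed above (status flag, interruption handling, output folder) might be outlined as below; every name except the raw_data_pipeline folder is hypothetical, and the websocket loop is stubbed out:

```python
SCRAPE_RUNNING = False          # boolean status flag
OUTPUT_DIR = "raw_data_pipeline"  # save target for scraped data

def scrape_running():
    """Report whether a scrape is currently in progress."""
    return SCRAPE_RUNNING

def scrape_start(collect_one_hour=lambda: []):
    """Run one hour of scraping, clearing the flag even on interruption.

    `collect_one_hour` is a hypothetical stand-in for the websocket
    collection loop; results would be persisted under OUTPUT_DIR.
    """
    global SCRAPE_RUNNING
    SCRAPE_RUNNING = True
    try:
        return collect_one_hour()
    finally:
        # try/finally guarantees the flag is reset if the scrape is
        # interrupted or raises, addressing the error-handling outline.
        SCRAPE_RUNNING = False
```

The try/finally is the key piece: an interrupted scrape can no longer leave the status flag stuck at True.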

new 1 hour data scrape w/ OHLC data save

  1. Scrape a new set of 1-hour test data, with OHLC candlestick data additionally saved to msgpack/csv.

  2. related to "ch15m_req_time() not respecting time format" issue + autosr() results save:
    #9 & #33
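Deriving the OHLC candles to save alongside the raw scrape could look like the standard-library sketch below; the `(timestamp, price)` tick format and function name are assumptions, not the project's actual save path:

```python
from datetime import timedelta

def to_ohlc(ticks, start, minutes=15):
    """Aggregate (timestamp, price) ticks into OHLC candles of `minutes`
    width, keyed by candle index counted from `start`.

    Ticks before `start` are ignored. Illustrative sketch only.
    """
    step = timedelta(minutes=minutes)
    candles = {}
    for ts, price in sorted(ticks):  # time order matters for open/close
        if ts < start:
            continue
        idx = (ts - start) // step
        if idx not in candles:
            candles[idx] = {"open": price, "high": price,
                            "low": price, "close": price}
        else:
            c = candles[idx]
            c["high"] = max(c["high"], price)
            c["low"] = min(c["low"], price)
            c["close"] = price  # last tick seen in the bucket
    return candles
```

Each resulting dict row maps directly onto a candlestick, so the same structure can feed both the msgpack/csv save and generate_chart().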
