
sycityhunter / deep-reinforcement-stock-trading


This project forked from albert-z-guo/deep-reinforcement-stock-trading


A light-weight deep reinforcement learning framework for portfolio management. This project explores the possibility of applying deep reinforcement learning algorithms to stock trading in a highly modular and scalable framework.

License: GNU General Public License v3.0

Python 1.01% Jupyter Notebook 98.99%


Deep-Reinforcement-Stock-Trading

This project applies deep reinforcement learning to portfolio management. The framework structure is inspired by Q-Trader. The agent's reward is the net unrealized profit (profit on stocks that are still in the portfolio and have not yet been cashed out), evaluated at each action step. For inaction at a step, a negative penalty is added to the portfolio to represent the missed opportunity to invest in "risk-free" Treasury bonds. Many new features and improvements have been made to the training and evaluation pipelines. All evaluation metrics and visualizations are built from scratch.
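
The reward described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function names, the 2% annual Treasury rate, the 252-day trading year, and the choice to fold the penalty into the reward are all assumptions made for the example.

```python
def unrealized_profit(buy_prices, current_price):
    """Paper profit on shares still held: current price minus purchase price, summed."""
    return sum(current_price - p for p in buy_prices)


def hold_penalty(balance, annual_treasury_rate=0.02, trading_days=252):
    """Interest forgone on idle cash for one trading day.

    The rate and day count are assumed placeholders, not values from the project.
    """
    daily_rate = annual_treasury_rate / trading_days
    return -balance * daily_rate


def step_reward(action, buy_prices, current_price, balance):
    """Reward at one step: unrealized profit, minus a penalty when the agent holds."""
    reward = unrealized_profit(buy_prices, current_price)
    if action == "hold":
        reward += hold_penalty(balance)
    return reward
```

With these assumptions, holding $50,000 in cash for one day costs roughly $4, so an agent that never trades accumulates a steadily negative reward.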

Key assumptions and limitations of the current framework:

  • trading has no impact on the market
  • only a single stock type is supported
  • only 3 basic actions: buy, hold, sell (no short selling or other complex actions)
  • the agent performs only 1 action for portfolio reallocation at the end of each trading day
  • all reallocations can be finished at the closing prices
  • no missing data in price history
  • no transaction cost
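
Taken together, these assumptions make a single trading step very simple. The sketch below is hypothetical: `apply_action` is not a function from the project, and trading one share per action is an assumption added for illustration.

```python
def apply_action(action, close_price, balance, holdings):
    """Return (balance, holdings) after one end-of-day action.

    holdings is a list of purchase prices, one entry per share held.
    Trades execute at the closing price with no transaction cost.
    """
    holdings = list(holdings)  # copy so the caller's list is not mutated
    if action == "buy" and balance >= close_price:
        balance -= close_price
        holdings.append(close_price)
    elif action == "sell" and holdings:
        # No short selling: a sell is only possible when a share is held
        holdings.pop(0)
        balance += close_price
    # "hold", or an infeasible buy/sell, leaves the portfolio unchanged
    return balance, holdings
```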

Key challenges of the current framework:

  • implementing algorithms from scratch with a thorough understanding of their pros and cons
  • building a reliable reward mechanism (learning often stalls or gets stuck in local optima)
  • ensuring the framework is scalable and extensible

Currently, the state is defined as the normalized adjacent daily stock price differences for n days plus [stock_price, balance, num_holding].
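
A minimal sketch of how such a state vector could be assembled is shown below. The sigmoid normalization and the padding with the first price are assumptions, and `build_state` is a hypothetical name; the project's own code may differ.

```python
import numpy as np


def sigmoid(x):
    """Squash values into (0, 1); used here as the assumed normalization."""
    return 1.0 / (1.0 + np.exp(-x))


def build_state(prices, t, window_size, balance, num_holding):
    """State at day t: normalized adjacent price differences over the window,
    followed by [stock_price, balance, num_holding]."""
    start = t - window_size + 1
    if start >= 0:
        window = list(prices[start:t + 1])
    else:
        # Not enough history yet: pad the front with the first price
        window = [prices[0]] * (-start) + list(prices[:t + 1])
    diffs = np.diff(np.asarray(window, dtype=float))
    return np.concatenate([sigmoid(diffs), [prices[t], balance, num_holding]])
```

A window of n prices yields n - 1 adjacent differences, so the resulting vector has n + 2 entries.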

In the future, we plan to add other state-of-the-art deep reinforcement learning algorithms, such as Proximal Policy Optimization (PPO), to the framework, and to increase the complexity of the state in each algorithm by constructing richer price tensors with a wider range of deep learning approaches, such as convolutional neural networks or attention mechanisms. In addition, we plan to integrate better pipelines for high-quality data sources (e.g. from vendors like Quandl) and for backtesting (e.g. zipline).

Getting Started

To install all libraries/dependencies used in this project, run

pip3 install -r requirement.txt

To train a DDPG agent or a DQN agent, e.g. over S&P 500 from 2010 to 2015, run

python3 train.py --model_name=model_name --stock_name=stock_name
  • model_name is the model to use: either DQN or DDPG; default is DQN
  • stock_name is the stock used to train the model; default is ^GSPC_2010-2015, which is S&P 500 from 1/1/2010 to 12/31/2015
  • window_size is the span (days) of observation; default is 10
  • num_episode is the number of episodes used for training; default is 10
  • initial_balance is the initial balance of the portfolio; default is 50000
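
The options and defaults listed above could be parsed with `argparse` roughly as follows. This is a sketch that mirrors the documented options only; `parse_train_args` is a hypothetical helper, and the argument types and the actual parsing code in train.py are assumptions.

```python
import argparse


def parse_train_args(argv=None):
    """Parse the training options documented above, using their stated defaults."""
    parser = argparse.ArgumentParser(description="Train a DQN or DDPG trading agent")
    parser.add_argument("--model_name", default="DQN", choices=["DQN", "DDPG"])
    parser.add_argument("--stock_name", default="^GSPC_2010-2015")
    parser.add_argument("--window_size", type=int, default=10)
    parser.add_argument("--num_episode", type=int, default=10)
    parser.add_argument("--initial_balance", type=int, default=50000)
    return parser.parse_args(argv)
```

For example, `parse_train_args(["--model_name=DDPG", "--window_size=20"])` overrides two options and leaves the rest at their defaults.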

To evaluate a DDPG or DQN agent, run

python3 evaluate.py --model_name=model_name --model_to_load=model_to_load --stock_name=stock_name
  • model_to_load is the model to load; default is DQN_ep10
  • stock_name is the stock used to evaluate the model; default is ^GSPC_2018, which is S&P 500 from 1/1/2018 to 12/31/2018
  • initial_balance is the initial balance of the portfolio; default is 50000

where stock_name can be found in the data directory and model_to_load can be found in the saved_models directory.

To visualize the training loss and the history of portfolio value fluctuations, run:

tensorboard --logdir=logs/model_events

where model_events can be found in the logs directory.

Example Results

Note that the following results were obtained with only 10 epochs of training.

Frequently Asked Questions (FAQ)

  • How is this project different from other price prediction approaches, such as logistic regression or LSTM?
    • Price prediction approaches like logistic regression have numerical outputs, which have to be mapped (through some interpretation of the predicted price) to action space (e.g. buy, sell, hold) separately. On the other hand, reinforcement learning approaches directly output the agent's action.
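
The contrast in this answer can be illustrated with a small sketch. Everything here is hypothetical: the 1% threshold rule, the action ordering, and both function names are assumptions chosen for the example, not code from the project.

```python
import numpy as np

# Assumed action ordering, for illustration only
ACTIONS = ["hold", "buy", "sell"]


def action_from_price_prediction(predicted_price, current_price, threshold=0.01):
    """Prediction route: a separate hand-written rule maps a price forecast to an action."""
    change = (predicted_price - current_price) / current_price
    if change > threshold:
        return "buy"
    if change < -threshold:
        return "sell"
    return "hold"


def action_from_q_values(q_values):
    """RL route: the model scores each action and the agent takes the highest-scoring one."""
    return ACTIONS[int(np.argmax(q_values))]
```

In the first function the trading rule (the threshold) is a design decision made outside the model, whereas in the second the mapping to an action is part of what the agent learns.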

