

Awesome Machine Learning for Trading *)

A curated list of anything related to the application of machine learning in stock trading.

*) This work is sponsored by Stocks Asia, berita saham Indonesia (stock news search engine) and Stosia.ai.


Companies that Have Done It

List of companies that have successfully applied machine learning or deep learning in stock trading.

  • Renaissance Technologies - Probably the best-known and most successful hedge fund specializing in systematic trading using quantitative models derived from mathematical and statistical analyses.
  • Numerai - An AI hedge fund with crowdsourced algorithms. People can submit an algorithm and get paid if it works. See the corporate introduction video here, and longer interviews with the founder Richard Craib here and here.
  • Alpaca - A platform where end users mark their trading entries and exits on the chart, and a deep learning system builds a model to detect such patterns. See their video presentation here.
  • Binatix - There's nothing on their website, but you can read the coverage in "Introducing Binatix: A Deep Learning Trading Firm That's Already Profitable" (actually there's not much information there either).

Conferences, Summits

Books

Structuring Financial Machine Learning Company

  • Marcos Lopez de Prado: Advances in Financial Machine Learning. John Wiley & Sons, 2018. [Google Scholar (2)]

    This is a good book on how to structure the use of machine learning in finance. The first chapter is available online and outlines the main ideas taught in the rest of the book. The author knows both finance and machine learning very well: he is an expert in mathematical finance, has written many papers on machine learning and supercomputing, and has a $13 billion fund under his management. He knows what he's talking about.

    In this book, he describes 1) why many have failed to apply ML to finance, and 2) what it takes to succeed. He argues that you must first realize that there is no silver bullet, no magic ML model that will put you on the list of billionaires; what you must do instead is establish the infrastructure and processes that facilitate the creation and development of these models.

    Be warned, though, that you won't find any state-of-the-art ML techniques for finance in this book. No no no, that is just not going to happen with the finance culture on this planet. As the book says, it won't tell you how to make a car, but how to make the car factory.

Market Microstructure

Papers, Thesis

On End of Day Price Data

Using Multiple Techniques

  • Mikelsen, Stian; Andersen, André Christoffer: "A Novel Algorithmic Trading Framework Applying Evolution and Machine Learning for Portfolio Optimization", Master's Thesis (2012) [PDF] [Google Scholar (8)]

    In this thesis, the authors present 14 trading systems based on technical indicators (RSI, MACD, follow the leader), machine learning (SVM, NN including RNN, and regression), and the efficient frontier (EF), and evaluate them on EOD data for stocks in the Dow and OBX (Oslo Stock Exchange) indexes. The test periods are 12 and 6 years for the Dow and OBX respectively, which are somewhat short. The thesis concludes that the Dow is highly efficient, because it is hard to make a profit using the models, whereas they can still make some profit on OBX because it is less efficient.

Using DBN

  • C. Zhu et al.: A Stock Decision Support System based on DBNs, Journal of Computational Information Systems 10: 2 883-893 (2014) [PDF] [Google Scholar (10)]

    The trading system uses an oscillation box indicator on EOD data. A DBN learns from the data and predicts the box boundaries. Buy if the price breaks the upper bound, and sell if it breaks the lower bound. The model has 14 inputs (indicators), such as the standard HLOCV, MA, ROC, RSI, and MACD indicators. Each indicator is normalized and weighted by its influence on the price before being fed to the DBN. Gray correlation degree is used to calculate these influence weights. The data cover 400 stocks in the S&P 500, with 1200 days of EOD data for training and 400 days for testing.

    The results look very good: the model significantly outperforms the benchmark in all four conditions (bull, bear, fluctuating bull, and fluctuating bear). However, hyperparameters such as the training data duration, the box window sizes, and the transaction rate must be chosen carefully, and overfitting is a concern.
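The breakout rule described above can be sketched in a few lines. This is only an illustration: the function name `box_signal`, the "hold" outcome inside the box, and the idea that the bounds arrive as plain numbers from some upstream predictor are assumptions, not details taken from the paper.

```python
def box_signal(close: float, lower: float, upper: float) -> str:
    """Return a trading signal from a price and predicted box bounds.

    `lower` and `upper` stand in for the box boundaries that the DBN
    would predict (hypothetical inputs for this sketch).
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if close > upper:
        return "buy"   # price breaks out above the box
    if close < lower:
        return "sell"  # price breaks out below the box
    return "hold"      # price stays inside the box (assumed behaviour)


print(box_signal(105.0, 95.0, 100.0))
```

The sketch deliberately leaves out the indicator weighting and the DBN itself, which are where the paper's actual contribution lies.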

Using Multi-agent Reinforcement Learning

  • Lee, Jae Won, and O. Jangmin. "A multi-agent Q-learning framework for optimizing stock trading systems" International Conference on Database and Expert Systems Applications. Springer, Berlin, Heidelberg, 2002. [PDF] [Google Scholar (11)]

    The paper presents a multi-agent RL architecture with four agents: one each for generating buy signals, executing buy signals, generating sell signals, and executing sell signals. Each agent runs Q-learning. It operates on EOD data from KOSPI 200 stocks from Jan 1999 to Oct 2001 (less than 3 years). Good results are claimed on the test set (a period of about 4 months). The multi-agent architecture is interesting, but the details are a bit unclear.
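For readers unfamiliar with Q-learning, here is a minimal sketch of the generic tabular update that each agent would build on. It is the textbook rule, not the paper's specific design: the state names, action names, reward, and the `alpha` and `gamma` values are all hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, actions: Sequence[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target and in-place update.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q: QTable = defaultdict(float)  # unseen state-action pairs start at 0.0
actions = ["signal", "wait"]    # hypothetical action set for a buy-signal agent
print(q_update(q, "flat", "signal", 1.0, "holding", actions))
```

In the paper's setting each of the four agents would keep its own table and reward definition; how they hand over to one another is exactly the part the summary above calls unclear.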

On Intraday Price Data

Using Neural Networks

  • Dixon, M., Klabjan, D., & Bang, J. H. (2016). "Classification-based Financial Markets Prediction using Deep Neural Networks". Algorithmic Finance [PDF] [Google Scholar (21)]

    This paper analyses 5-minute interval prices of 43 commodity and FX futures over a 23-year period. It uses a standard NN with four hidden layers of 1000, 900, 800, and 700 neurons and 129 output neurons (three signals (up, down, and neutral) for each of the 43 securities). The input is extended with additional indicators and correlations with other securities, for a total of 9895 input features. Good results were claimed.
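To get a feel for the size of this network, the layer widths quoted above can be turned into a parameter count. The sketch assumes plain fully connected layers with one bias per neuron, which the summary does not state explicitly.

```python
from typing import Sequence

# Layer widths as quoted in the summary: input, four hidden layers, output.
LAYER_SIZES = [9895, 1000, 900, 800, 700, 129]


def count_parameters(sizes: Sequence[int]) -> int:
    """Count weights and biases of a fully connected network (assumed layout)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# Three signals (up, down, neutral) for each of the 43 securities.
assert 43 * 3 == LAYER_SIZES[-1]

print(count_parameters(LAYER_SIZES))  # 12168829
```

Under these assumptions the first layer alone accounts for about 9.9 million of the roughly 12.2 million parameters, because of the very wide input.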

On Market Microstructure

Theory

  • Rama Cont, Arseniy Kukanov, Sasha Stoikov. "The Price Impact of Order Book Events". 2011 [PDF] [Google Scholar (194)]

    The authors study the price impact of order book events (limit orders, market orders, and cancellations) on equity prices. They show that, over short time intervals, price changes are mainly driven by the order flow imbalance (OFI), defined as the imbalance between supply and demand at the best bid and ask prices. They further show that there is a linear relation between OFI and price movement, and argue that OFI captures other relationships, such as those between trade order imbalance or traded volume and price movement.
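A minimal sketch of how OFI could be computed from successive best quotes. The per-update contribution below follows one common reading of the paper's definition and is not spelled out in the summary above, so treat the exact formula, the `Quote` layout, and the function names as assumptions.

```python
from typing import List, Tuple

# (bid_price, bid_size, ask_price, ask_size) at the best quotes; assumed layout.
Quote = Tuple[float, int, float, int]


def ofi_contribution(prev: Quote, curr: Quote) -> int:
    """Signed contribution of one quote update to the order flow imbalance."""
    prev_bid, prev_bid_size, prev_ask, prev_ask_size = prev
    bid, bid_size, ask, ask_size = curr
    e = 0
    if bid >= prev_bid:
        e += bid_size        # demand added at (or above) the old best bid
    if bid <= prev_bid:
        e -= prev_bid_size   # demand removed from the old best bid
    if ask <= prev_ask:
        e -= ask_size        # supply added at (or below) the old best ask
    if ask >= prev_ask:
        e += prev_ask_size   # supply removed from the old best ask
    return e


def order_flow_imbalance(quotes: List[Quote]) -> int:
    """Sum the contributions of all consecutive quote updates in an interval."""
    return sum(ofi_contribution(p, c) for p, c in zip(quotes, quotes[1:]))


quotes = [
    (10.00, 100, 10.01, 200),
    (10.00, 150, 10.01, 200),  # bid size grows by 50
    (10.00, 150, 10.01, 120),  # ask size shrinks by 80
    (10.01, 50, 10.02, 300),   # both best prices move up
]
print(order_flow_imbalance(quotes))
```

A positive total indicates net buying pressure at the best quotes over the interval, which is the quantity the paper relates linearly to the price change.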

  • Charles Cao & Oliver Hansch & Xiaoxin Wang, 2009. "The information content of an open limit order book," Journal of Futures Markets, John Wiley & Sons, Ltd., vol. 29(1), pages 16-41 [PDF] [Google Scholar (210)]

    The authors primarily investigate whether the orders behind the best bid and ask prices contribute to price discovery and contain information about short-term future price movements. There are two competing schools of thought on this. The first believes that private information is short-lived, and any imbalances of information will be reflected immediately using market orders. An informed trader who knows that the current market price is too high will expect the price to go downward in the future, especially when the other traders learn the same information, thus the likelihood of achieving execution for a limit sell order is relatively small in this situation.

    The second believes that private information is long-lived and the number of traders who may discover the private information is small. They believe submitting market orders signals impatience and reveals too much information. Harris (1990) considered two types of limit-order traders: pre-committed and value-motivated traders. The former submit limit orders to reduce trading costs, but will switch to using market orders if their orders remain unfilled for too long. The latter express their valuations of the asset through their choice of limit price. Both types of trader contribute to price movements.

    The authors use the Hasbrouck method to calculate the information share of three components: the MID price (the mean of the best bid and ask prices), the P price (i.e. the last transaction price), and the weighted price WP of order book steps up to ten steps deep (calculated from the bid and ask prices of each step, weighted by the number of shares). The data are the order books of the 100 most active stocks on the ASX for March 2000, sampled every second.

    The authors find that the order book is moderately informative; its contribution to price discovery is approximately 22%. The remaining 78% is from the best bid and offer prices on the book and the last transaction price. Furthermore, the authors find that order imbalances between the demand and supply schedules along the book are significantly related to future short-term returns, even after controlling for the autocorrelations in return, the inside spread, and the trade imbalance.
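The MID and WP quantities used in the study can be illustrated with a small sketch. The size-weighted average below is one plausible reading of "scaled by the number of shares"; the function names, the `(price, size)` layout, and the step-range arguments are assumptions made for illustration.

```python
from typing import List, Tuple

Level = Tuple[float, int]  # (price, number of shares) at one order book step


def mid_price(best_bid: float, best_ask: float) -> float:
    """Mean of the best bid and best ask prices."""
    return (best_bid + best_ask) / 2.0


def weighted_price(bids: List[Level], asks: List[Level],
                   first_step: int = 1, last_step: int = 10) -> float:
    """Size-weighted average price over steps first_step..last_step (1-based).

    Both sides of the book are pooled; step 1 is the best bid and best ask.
    """
    levels = bids[first_step - 1:last_step] + asks[first_step - 1:last_step]
    total_size = sum(size for _, size in levels)
    if total_size == 0:
        raise ValueError("no shares in the requested steps")
    return sum(price * size for price, size in levels) / total_size


bids = [(9.99, 500), (9.98, 300)]    # best bid first
asks = [(10.01, 200), (10.02, 1000)]  # best ask first
print(mid_price(bids[0][0], asks[0][0]))
print(weighted_price(bids, asks, first_step=1, last_step=2))
```

In this toy book the weighted price sits above the mid price because most of the displayed shares are on the ask side, which is the kind of imbalance the authors relate to future short-term returns.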

  • Lee, Charles, and Mark J. Ready. "Inferring trade direction from intraday data." The Journal of Finance 46.2 (1991): 733-746. [PDF] [Google Scholar (3021)]

    In this paper, the authors explain the tick test, a technique that infers the direction of a trade by comparing its price to the price of the preceding trade(s), and propose a hybrid method that combines it with the quote method to increase accuracy. The tick test classifies each trade into four categories: an uptick, a downtick, a zero-uptick, and a zero-downtick. A trade is an uptick (downtick) if the price is higher (lower) than the price of the previous trade. When the price is the same as the previous trade (a zero tick), if the last price change was an uptick, then the trade is a zero-uptick. Similarly, if the last price change was a downtick, then the trade is a zero-downtick. A trade is classified as a buy if it occurs on an uptick or a zero-uptick; otherwise it is classified as a sell.
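The tick test as described above is short enough to sketch directly. One assumption is made here that the description does not cover: trades occurring before the first price change have no reference direction, so this sketch leaves them unclassified (`None`) rather than forcing them to "sell". The function name is hypothetical.

```python
from typing import List, Optional


def tick_test(prices: List[float]) -> List[Optional[str]]:
    """Classify each trade as 'buy' or 'sell' from the sequence of trade prices."""
    labels: List[Optional[str]] = []
    last_direction: Optional[str] = None  # direction of the latest price change
    for i, price in enumerate(prices):
        if i > 0:
            if price > prices[i - 1]:
                last_direction = "buy"    # uptick
            elif price < prices[i - 1]:
                last_direction = "sell"   # downtick
            # Equal price is a zero tick: it inherits the last direction,
            # giving a zero-uptick (buy) or a zero-downtick (sell).
        labels.append(last_direction)
    return labels


print(tick_test([10.0, 10.0, 10.1, 10.1, 10.0, 10.0, 10.2]))
# [None, None, 'buy', 'buy', 'sell', 'sell', 'buy']
```

The quote method and the Lee and Ready hybrid additionally need the prevailing bid and ask quotes at the time of each trade, so they are not covered by this sketch.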

    The accuracy of the Lee and Ready hybrid method has been compared with that of other methods, such as the tick test and quote methods. While it is generally more accurate than the tick test (80.8% on average vs 77.2%), it is not as accurate as Lee & Ready suggested (more than 90%). The results are summarized below [source: 'tick-test', private repository].

    Paper                     Dataset        Tick Test   Rev Tick Test   Quote    L & R
    Aitken and Frino [1996]   ASX            74.4%       -               -        -
    Ellis, et al. [2000]      Nasdaq         77.7%       -               76.4%    81.0%
    Odders-White [2000]       TORQ (NYSE)    78.6%       -               74.9%    85.0%
    Theissen [2001]           Frankfurt SE   72.2%       -               75.4%    72.8%
    Finucane [2000]           TORQ (NYSE)    83.0%       72.1%           -        84.4%
    Average                                  77.2%       72.1%           75.6%    80.8%

    References:

    • Aitken, Michael, and Alex Frino. "The accuracy of the tick test: Evidence from the Australian stock exchange." Journal of Banking & Finance 20.10 (1996): 1715-1729. [Google Scholar (103)]
    • Ellis, Katrina, Roni Michaely, and Maureen O'Hara. "The accuracy of trade classification rules: Evidence from Nasdaq." Journal of Financial and Quantitative Analysis 35.4 (2000): 529-551. [Google Scholar (467)]
    • Odders-White, Elizabeth R. "On the occurrence and consequences of inaccurate trade classification." Journal of Financial Markets 3.3 (2000): 259-286. [Google Scholar (273)]
    • Theissen, Erik. "A test of the accuracy of the Lee/Ready trade classification algorithm." Journal of International Financial Markets, Institutions and Money 11.2 (2001): 147-165. [Google Scholar (62)]
    • Finucane, Thomas J. "A direct test of methods for inferring trade direction from intra-day data." Journal of Financial and Quantitative Analysis 35.4 (2000): 553-576. [Google Scholar (153)]

Using RNN

On News

Using CNN

  • Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. 2015. "Deep learning for event-driven stock prediction". In Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI'15), Qiang Yang and Michael Wooldridge (Eds.). AAAI Press 2327-2333. [PDF] [Google Scholar (62)]

    This paper analyzes the short-, mid-, and long-term effects of news on stock prices. It proposes two main ideas: the use of event embeddings to represent news events (as opposed to word embeddings or bag-of-words methods), and the use of a CNN (combined with a feed-forward NN) to learn and predict the effect of these events on stock price movement. It concludes that using the two techniques together gives better results than using either one alone. The resulting accuracy is about 65%, about 6% better than the previous state of the art, and combined with a simple trading strategy it can give a handsome profit.

    This is a very useful paper for those who want to perform news extraction and analyse the impact on stock price.

Want to Read

General Machine Learning

  • Lopez de Prado, Marcos. "The 10 Reasons Most Machine Learning Funds Fail." (2018). [SSRN] [Google Scholar (0)]

Market Microstructure

  • Matthew Dixon, "Sequence classification of the limit order book using recurrent neural networks", Journal of Computational Science, Volume 24, 2018, Pages 277-286, ISSN 1877-7503, [URL] [PDF] [Google Scholar (7)]

  • Sirignano, Justin, "Deep Learning for Limit Order Books" (May 16, 2016). [SSRN] [PDF] [Google Scholar (14)]

    Interesting papers cited:

    • Cartea, Álvaro, Ryan Donnelly, and Sebastian Jaimungal. "Enhancing trading strategies with order book signals." Applied Mathematical Finance (2018): 1-35. [PDF] [Google Scholar (17)]

    • B. Zheng, E. Moulines and F. Abergel, "Price Jump Prediction in a Limit Order Book" Journal of Mathematical Finance, Vol. 3 No. 2, 2013, pp. 242-255 [PDF] [Google Scholar (20)]

    • Avellaneda, Marco, and Sasha Stoikov. "High-frequency trading in a limit order book." Quantitative Finance 8.3 (2008): 217-224. [PDF] [Google Scholar (270)]

    • Avellaneda, Marco, Josh Reed, and Sasha Stoikov. "Forecasting prices from Level-I quotes in the presence of hidden liquidity." Algorithmic Finance 1.1 (2011): 35-43. [PDF] [Google Scholar (35)]

  • D. Palguna and I. Pollak, "Mid-Price Prediction in a Limit Order Book," (December 31, 2014) [SSRN] [PDF] [Google Scholar (6)]

  • Kearns, Michael, and Yuriy Nevmyvaka. "Machine learning for market microstructure and high frequency trading." High Frequency Trading: New Realities for Traders, Markets, and Regulators (2013). [PDF] [Google Scholar (22)]

  • Rama Cont, Arseniy Kukanov, Sasha Stoikov. "The Price Impact of Order Book Events". 2011 [PDF] [Google Scholar (194)]

    The authors study the price impact of order book events (limit orders, market orders, and cancellations) on equity prices, using NYSE TAQ data for 50 U.S. stocks. They show that, over short time intervals, price changes are mainly driven by the order flow imbalance, defined as the imbalance between supply and demand at the best bid and ask prices.

    Interesting papers cited:

    • Eisler, Zoltan, Jean-Philippe Bouchaud, and Julien Kockelkoren. "The price impact of order book events: market orders, limit orders and cancellations." Quantitative Finance 12.9 (2012): 1395-1419. [PDF] [Google Scholar (105)]
    • Hautsch, Nikolaus, and Ruihong Huang. "The market impact of a limit order." Journal of Economic Dynamics and Control 36.4 (2012): 501-522. [PDF] [Google Scholar (112)]
    • Avellaneda, Marco, Josh Reed, and Sasha Stoikov. "Forecasting prices from Level-I quotes in the presence of hidden liquidity." Algorithmic Finance 1.1 (2011): 35-43. [PDF] [Google Scholar (35)]
    • Hopman, Carl. "Do supply and demand drive stock prices?." Quantitative Finance 7.1 (2007): 37-53. [Google Scholar (55)]
    • Weber, Philipp, and Bernd Rosenow. "Order book approach to price impact." Quantitative Finance 5.4 (2005): 357-364. [PDF] [Google Scholar (157)]
    • Bouchaud, Jean-Philippe, et al. "Fluctuations and response in financial markets: the subtle nature of random price changes" Quantitative finance 4.2 (2004): 176-190. [PDF] [Google Scholar (400)]
    • Engle, Robert F., and Asger Lunde. "Trades and quotes: a bivariate point process." Journal of Financial Econometrics 1.2 (2003): 159-188. [PDF] [Google Scholar (163)]
    • Hasbrouck, Joel, and Duane J. Seppi. "Common factors in prices, order flows, and liquidity." Journal of financial Economics 59.3 (2001): 383-411. [PDF] [Google Scholar (1143)]
    • Knez, Peter J., and Mark J. Ready. "Estimating the profits from trading strategies." The Review of Financial Studies 9.4 (1996): 1121-1163. [PDF] [Google Scholar (162)]
    • Karpoff, Jonathan M. "The relation between price changes and trading volume: A survey." Journal of Financial and quantitative Analysis 22.1 (1987): 109-126. [PDF] [Google Scholar (2873)]
  • Krishnamurti, Chandrasekhar. "Introduction to market microstructure." Investment management. Springer, Berlin, Heidelberg, 2009. 13-29. [PDF] [Google Scholar (10)]

  • Chordia, Tarun, Richard Roll, and Avanidhar Subrahmanyam. "Order imbalance, liquidity, and market returns." Journal of Financial economics 65.1 (2002): 111-130. [PDF] [Google Scholar (970)]

  • Hasbrouck, Joel. "One security, many markets: Determining the contributions to price discovery." The journal of Finance 50.4 (1995): 1175-1199. [Google Scholar (1477)]

News

  • Andersen, Torben G., et al. "Real-time price discovery in global stock, bond and foreign exchange markets." Journal of international Economics 73.2 (2007): 251-277. [PDF] [Google Scholar (952)]

Other Lists

  • Greg Harris: "A Survey of Deep Learning Techniques Applied to Trading" (July 2016) [URL (broken)] [PDF].

    Note that much of the content overlaps with items on this page.
