
ersinaksar / financial-data-structures


This project forked from tmorgan4/financial-data-structures


Create structured financial data in the form of time, tick, volume, and dollar bars from unstructured tick data. From Marcos Lopez de Prado's Advances in Financial Machine Learning textbook.

License: MIT License

Python 0.88% Jupyter Notebook 42.34% HTML 56.78%

financial-data-structures's Introduction

Create Financial Data Structures: Time, Tick, Volume, and Dollar Bars

Version of Pycharm Professional Edition: 2017.3

Version of Python: 3.6.5

Description

This program helps users turn unstructured tick data into structured financial data in the form of time, tick, volume, and dollar bars.

The user passes tick data to the create_bars(data, units=1000, type='tick') function and it returns the desired structured data. Everything can be found in the main.py file. I left lots of comments in the code.
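As a rough illustration of the idea (not the code in main.py), here is a minimal sketch of how ticks can be grouped into tick, volume, or dollar bars with pandas. The function name `sketch_bars` and the column names `date_time`, `price`, and `volume` are assumptions made for this example only. It also uses a simple fixed-grid rule: a tick belongs to bar k when the running total before that tick falls in the k-th block of `units`, which is only one of several reasonable ways to close a bar.

```python
import pandas as pd


def sketch_bars(ticks, units=1000, bar_type="tick"):
    """Group ticks into bars of roughly `units` ticks, contracts, or dollars.

    `ticks` is assumed to have 'date_time', 'price' and 'volume' columns
    (hypothetical names, chosen for this sketch).
    """
    if bar_type == "tick":
        measure = pd.Series(1.0, index=ticks.index)      # every tick counts as 1
    elif bar_type == "volume":
        measure = ticks["volume"].astype(float)          # contracts traded
    elif bar_type == "dollar":
        measure = ticks["price"] * ticks["volume"]       # value traded
    else:
        raise ValueError(f"unknown bar_type: {bar_type!r}")

    # Running total *before* each tick, floor-divided by the threshold,
    # gives the id of the bar that tick falls into.
    bar_id = ((measure.cumsum() - measure) // units).astype(int)

    grouped = ticks.groupby(bar_id.values)
    price = grouped["price"]
    bars = pd.DataFrame({
        "date_time": grouped["date_time"].last(),        # timestamp of the bar's last tick
        "open": price.first(),
        "high": price.max(),
        "low": price.min(),
        "close": price.last(),
        "volume": grouped["volume"].sum(),
    })
    return bars.reset_index(drop=True)
```

Calling `sketch_bars(ticks, units=1000, bar_type="dollar")` on a frame with those three columns would return one OHLCV row per bar. The last bar is usually incomplete, so you may want to drop it before computing statistics.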

These bars are used throughout the textbook (Advances in Financial Machine Learning, by Marcos Lopez de Prado, 2018, p. 25) to build the more interesting features for predicting financial time series data.

A great paper on why tick, volume, and dollar bars have better statistical properties than standard time-sampled data is: The Volume Clock: Insights into the High Frequency Paradigm, Lopez de Prado et al.

Note: Please make sure you unzip the ES_Trades.csv.zip file found in raw_tick_data. It was too big to upload without zipping it first.
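If you would rather unzip the file from Python than from a file manager, a small sketch using only the standard library could look like this. The helper name `unzip_ticks` is made up for this example; the path in the comment is the one mentioned in the note above and may need adjusting to your checkout.

```python
import zipfile
from pathlib import Path


def unzip_ticks(zip_path, out_dir=None):
    """Extract every member of `zip_path` and return the extracted paths.

    By default the files are written next to the archive.
    """
    zip_path = Path(zip_path)
    out_dir = Path(out_dir) if out_dir is not None else zip_path.parent
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
        return [out_dir / name for name in archive.namelist()]


# Example call, run from the project root:
# unzip_ticks("raw_tick_data/ES_Trades.csv.zip")
```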

Why Using Different Sampling Techniques is Cool!

The whole motivation for sampling data differently from the traditional fixed time intervals is that the alternative methods offer better statistical properties.

The following is from Lopez de Prado's textbook:

Fixed time interval sampling should be avoided for 2 reasons:

  1. Markets don't process information at fixed time intervals, so time bars oversample during quiet periods and undersample during busy periods.
  2. Time sampled bars often exhibit poor statistical properties:
    • Serial Correlation
    • Heteroscedasticity
    • Non Normality of Returns

Mandelbrot and Taylor [1967] were among the first to realize that sampling as a function of the number of transactions exhibited desirable statistical properties. Multiple studies have confirmed that sampling as a function of trading activity allows us to achieve returns closer to IID Normal (Ané and Geman [2000]).

This is important because many statistical methods rely on the assumption that observations are drawn from an IID Gaussian process.

In the paper 'The Volume Clock', the authors show standardized return distributions for fixed-time-interval bars and for volume bars. Notice how the volume bars have a lower kurtosis. If you check the notebook, you will see that the test statistics from the Jarque-Bera normality tests are much lower for the bars that are not sampled at fixed time intervals.
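For reference, the Jarque-Bera statistic combines sample skewness S and kurtosis K as JB = n/6 * (S^2 + (K - 3)^2 / 4), so fatter tails (higher kurtosis) push it up. The sketch below computes it by hand with numpy; the function name `jarque_bera_stat` and the simulated samples are illustrative assumptions and are not taken from the notebook.

```python
import numpy as np


def jarque_bera_stat(returns):
    """Return (JB statistic, skewness, kurtosis) for a 1-D sample."""
    x = np.asarray(returns, dtype=float)
    n = x.size
    centered = x - x.mean()
    variance = np.mean(centered ** 2)                    # population (biased) variance
    skewness = np.mean(centered ** 3) / variance ** 1.5
    kurtosis = np.mean(centered ** 4) / variance ** 2    # equals 3 for a normal distribution
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return jb, skewness, kurtosis


# Simulated data only: a fat-tailed sample versus a Gaussian one.
rs = np.random.RandomState(0)
fat_tailed = rs.standard_t(3, size=5000)
gaussian = rs.normal(size=5000)
print("fat-tailed:", jarque_bera_stat(fat_tailed))
print("gaussian:  ", jarque_bera_stat(gaussian))
```

A lower statistic means the sample looks closer to normal, which is the comparison made between bar types in the notebook.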

[Figure: small_sample]

The code in this repo is some of the only code that Lopez de Prado doesn't provide in his textbook, and I thought the community would find it useful to have a repo with the code needed to build time, tick, volume, and dollar bars.

As a sanity check on the code, I downloaded a small sample and plotted the same distribution: 20 days of tick data, E-mini S&P 500 futures, 1 Sep 2013 to 20 Sep 2013. (Sourced from Tick Data LLC: https://s3-us-west-2.amazonaws.com/tick-data-s3/downloads/ES_Sample.zip)

[Figure: small_sample]

As you can see from the results, the alternative bar types provide better statistical properties. A deeper dive can be found in the notebooks.

Data Analysis Jupyter Notebook

The following data analysis is performed on the series of E-mini S&P 500 futures tick data:

  1. Form tick, volume, and dollar bars
  2. Count the number of bars produced by tick, volume, and dollar bars on a weekly basis. Plot a time series of that bar count. Which bar type produces the most stable weekly count? Why?
  3. Compute the serial correlation of returns for the three bar types. Which bar method has the lowest serial correlation?
  4. Apply the Jarque-Bera normality test to returns from the three bar types. Which method achieves the lowest test statistic?
  5. Standardize & Plot the Distributions
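
Steps 2, 3, and 5 above can be sketched in a few lines of pandas. This is not the notebook's code. It assumes a `bars` DataFrame with `date_time` and `close` columns (hypothetical names), and the three helper functions are made up for this example.

```python
import numpy as np
import pandas as pd


def weekly_bar_count(bars):
    """Step 2: number of bars whose timestamp falls in each calendar week."""
    return bars.set_index("date_time")["close"].resample("W").count()


def lag1_autocorr(bars):
    """Step 3: lag-1 serial correlation of log returns on bar closes."""
    returns = np.log(bars["close"]).diff().dropna()
    return returns.autocorr(lag=1)


def standardize(returns):
    """Step 5: rescale returns to zero mean and unit (sample) standard deviation."""
    return (returns - returns.mean()) / returns.std()
```

Running these on the tick, volume, and dollar bars in turn gives one weekly-count series and one autocorrelation number per bar type, which can then be compared side by side.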

Notes:

  • dir: jupyter_notebooks
  • This Jupyter Notebook is labeled as DataAnalysis.ipynb
  • Accompanying html file for ease of use

Installation Windows

Ha-ha, be serious: quants don't use Windows. You can download a real operating system here: https://www.ubuntu.com/download/desktop You can also buy a real computer here: https://www.apple.com/shop/buy-mac/macbook-pro

Installation on Mac OS X

Install the latest version of the Anaconda 3 distribution, which includes an IDE (Spyder). Follow the install and update instructions at this link: https://www.anaconda.com/download/#mac

To install the package dependencies, run: pip install -r pip_requirements.txt

From Spyder or Pycharm IDE: Open the file main.py and run it.

From Terminal:

  1. Go to the directory where you saved the file, for example: cd Desktop/bars/awesome/
  2. pip install -r pip_requirements.txt
  3. Run the file: main.py (python main.py)

Installation on Ubuntu Linux

Install the latest version of the Anaconda 3 distribution, which includes an IDE (Spyder). Follow the install and update instructions at this link: https://www.anaconda.com/download/#linux

To install the package dependencies, run: pip install -r pip_requirements.txt

From Spyder or Pycharm IDE: Open the file main.py and run it.

From Terminal:

  1. Go to the directory where you saved the file, for example: cd Desktop/bars/awesome/
  2. pip install -r pip_requirements.txt
  3. Run the file: main.py (python main.py)

Packages Used

Packages can all be installed by running the following command in the terminal (project working directory): "pip install -r pip_requirements.txt"

  • numpy==1.14.2
  • pandas==0.22.0
  • Cython==0.28.2

Notes:

  • This program was built using a MacBook Pro on OS X.
  • Python 3.6.5 was used
  • Tested Successfully on both Mac OS X and Linux Ubuntu

License (MIT)

The MIT License (MIT)

Copyright (c) 2018 Jacques Joubert

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Contributors

jackal08
