mapping's People

Contributors: matthewgilbert

mapping's Issues

Flattening dictionary weights and returns

Currently util.calc_rets supports a nested dictionary structure for weights and
returns; however, there is no easy way to persist weights in this structure.

An example of this is

import pandas as pd
import mapping.util

idx = pd.MultiIndex.from_tuples([(pd.Timestamp('2015-01-02'), 'CLF5'),
                                  (pd.Timestamp('2015-01-03'), 'CLG5')])
ret1 = pd.Series([0.005, -0.02], index=idx)
idx = pd.MultiIndex.from_tuples([(pd.Timestamp('2015-01-02'), 'COF5'),
                                  (pd.Timestamp('2015-01-03'), 'COG5')])
ret2 = pd.Series([0.005, -0.02], index=idx)

rets = {"CL": ret1, "CO": ret2}

vals = [1, 1]
widx = pd.MultiIndex.from_tuples([(pd.Timestamp('2015-01-02'), 'CLF5'),
                                   (pd.Timestamp('2015-01-03'), 'CLG5')])
weights1 = pd.DataFrame(vals, index=widx, columns=["CL1"])
widx = pd.MultiIndex.from_tuples([(pd.Timestamp('2015-01-02'), 'COF5'),
                                   (pd.Timestamp('2015-01-03'), 'COG5')])
weights2 = pd.DataFrame(vals, index=widx, columns=["CO1"])

weights = {"CL": weights1, "CO": weights2}

mapping.util.calc_rets(rets, weights)

It would be good to add utility functions for flattening and unflattening both weight and
return dictionaries so they can easily be persisted to CSV.
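One possible shape for these helpers (the names flatten and unflatten are hypothetical, not existing mapping functions) is to stack the dictionary into a single object with the dictionary key as the outermost index level, which round-trips through CSV:

import pandas as pd

def flatten(frames):
    # Stack a dict of weights/returns into one object, with the dict key as
    # the outermost index level.
    return pd.concat(frames)

def unflatten(flat):
    # Invert flatten(): split on the outermost index level back into a dict.
    out = {}
    for key in flat.index.get_level_values(0).unique():
        sub = flat.xs(key, level=0)
        if isinstance(sub, pd.DataFrame):
            # Weight frames for different roots have different generic
            # columns (e.g. CL1 vs CO1), so drop the all-NaN columns that
            # concatenation introduced.
            sub = sub.dropna(axis=1, how="all")
        out[key] = sub
    return out

# Persist the flattened weights, e.g.
flatten(weights).to_csv("weights.csv")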

util.reindex() with leading return data is incorrect

If you attempt to util.reindex() a set of returns that contains leading returns, i.e. returns dated before the instrument has any weight, those leading returns are incorrectly compounded into the first return. An example is shown below

import pandas as pd
from pandas import Timestamp as TS
import mapping.util as util

idx = pd.MultiIndex.from_tuples([(TS('2015-01-02'), 'CLF5'),
                                 (TS('2015-01-02'), 'CLH5'),
                                 (TS('2015-01-03'), 'CLF5'),
                                 (TS('2015-01-03'), 'CLH5'),
                                 (TS('2015-01-04'), 'CLH5')])
returns = pd.Series([0.02, 0.01, 0.06, 0.03, -0.02], index=idx)
widx = pd.MultiIndex.from_tuples([(TS('2015-01-02'), 'CLF5'),
                                  (TS('2015-01-03'), 'CLF5'),
                                  (TS('2015-01-04'), 'CLH5')])
new_rets = util.reindex(returns, widx, limit=1)
returns
2015-01-02  CLF5    0.02
            CLH5    0.01
2015-01-03  CLF5    0.06
            CLH5    0.03
2015-01-04  CLH5   -0.02

new_rets
2015-01-02  CLF5    0.020000
2015-01-03  CLF5    0.060000
2015-01-04  CLH5    0.019494   <--- compounded (1 + 0.01) * (1 + 0.03) * (1 + -0.02) - 1
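
One possible way to avoid this (a hypothetical pre-filter, not an existing mapping function) is to drop, per instrument, any returns dated before that instrument's first date in the target index before any compounding happens:

def drop_leading_returns(returns, new_index):
    # First date each instrument appears in the target index (assumes the
    # index is sorted by date, as in the example above).
    first_date = {}
    for dt, instr in new_index:
        first_date.setdefault(instr, dt)
    # Keep only returns on or after that first date; instruments absent from
    # the target index are dropped entirely.
    keep = [instr in first_date and dt >= first_date[instr]
            for dt, instr in returns.index]
    return returns[keep]

drop_leading_returns(returns, widx)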

Reindexing returns

Related to #8, there are many scenarios where the calendar for instrument returns does not match the calendar for instrument weights. For example, if you are trading futures on two different exchanges, the exchanges may have different holiday calendars. If you define your weights on the subset of dates on which all instruments can be traded, then your instrument weight dates will not agree with your instrument return dates. It would therefore be helpful to provide a utility function to reindex the returns to the calendar indicated by the instrument weights. This should address a couple of different cases, listed below; a sketch of the compounding step is shown after the examples.

Case 1

Instrument returns include extra days which are not included in the instrument weights. These days should be compounded into the following day on which there is an instrument weight

returns

2015-01-02  CLF5    0.02
2015-01-03  CLF5    0.01
            CLH5    0.06
2015-01-04  CLF5    0.03
            CLH5   -0.02
2015-01-05  CLH5   -0.05

weights

2015-01-02  CLF5    1.0
2015-01-04  CLF5    0.5
            CLH5    0.5
2015-01-05  CLH5    1.0

reindexed returns

2015-01-02  CLF5    0.0200
2015-01-04  CLF5    0.0403   <--- compounded combination of 2015-01-03 and 2015-01-04
            CLH5    0.0388   <--- compounded combination of 2015-01-03 and 2015-01-04
2015-01-05  CLH5   -0.0500

Case 2

Instrument weights include dates which are not included in the instrument returns. This introduces NaNs, which can optionally be filled (with a return of 0) up to some limit.

returns

2015-01-02  CLF5    0.02
2015-01-04  CLF5   -0.02
            CLH5   -0.05

weights

2015-01-02  CLF5    1.0
2015-01-03  CLF5    1.0
2015-01-04  CLF5    0.5
            CLH5    0.5
2015-01-05  CLF5    0.5
            CLH5    0.5
2015-01-06  CLF5    0.5
            CLH5    0.5

reindexed returns filled with limit of 1

2015-01-02  CLF5    0.02
2015-01-03  CLF5    0.00
2015-01-04  CLF5   -0.02
            CLH5   -0.05
2015-01-05  CLF5    0.00
            CLH5    0.00
2015-01-06  CLF5     NaN
            CLH5     NaN
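
As a rough sketch of the compounding step in Case 1 (illustrative only, not mapping's implementation; it ignores restricting the output to the exact weight index entries and the NaN filling of Case 2), each return date can be mapped to the next date present in the weight index and the grouped returns compounded per instrument:

import pandas as pd

def compound_to_weight_dates(returns, weight_index):
    # Dates on which weights exist, sorted ascending.
    weight_dates = weight_index.get_level_values(0).unique().sort_values()
    pieces = {}
    for instr, rets in returns.groupby(level=1):
        dates = rets.index.get_level_values(0)
        # Position of the first weight date >= each return date.
        pos = weight_dates.searchsorted(dates)
        # Drop returns falling after the last weight date.
        keep = pos < len(weight_dates)
        grouped = (1 + rets[keep]).groupby(weight_dates[pos[keep]]).prod() - 1
        pieces[instr] = grouped
    return pd.concat(pieces).swaplevel().sort_index()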

static_transition non performant

The current implementation of static_transition performs very poorly. The example below demonstrates the issue.

import pandas as pd
import mapping.mappings


midx = pd.MultiIndex.from_product([["A1", "A2", "A3"], ["front", "back"]])
transition = pd.DataFrame([[1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1]],
                          index=[-5, -4], columns=midx)

ts = pd.bdate_range("2008-01-01", "2018-01-01")
dts = ts[ts.dayofweek == 2]

cinfo = pd.DataFrame(dts, columns=["dates"])
cinfo.loc[:, "month"] = dts.month
cinfo.loc[:, "year"] = dts.year
cinfo = cinfo.groupby(["year", "month"]).nth(2).reset_index()
code_dict = dict(zip(range(1, 13), "FGHJKMNQUVXZ"))
cinfo.loc[:, "code"] = cinfo.month.apply(lambda x: code_dict[x])
cinfo.loc[:, "name"] = (cinfo.year.apply(str) + "_" + cinfo.code +
                        "_A")

cdts = cinfo.dates.copy()
cdts.index = cinfo.name

ts = ts[ts < pd.Timestamp("2017-10-01")]
%timeit mapping.mappings.roller(timestamps=ts, contract_dates=cdts, get_weights=mapping.mappings.static_transition, transition=transition)

1 loop, best of 3: 11.1 s per loop

This could likely be sped up by using a numpy.ndarray instead of a pandas.DataFrame, since repeated calls to .loc[] can be quite slow.
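
As a rough illustration of the gap (this snippet is not mapping's internals, just a comparison of label-based DataFrame lookups against positional ndarray indexing):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 6), index=range(-10, 0))
arr = df.values
pos = {label: i for i, label in enumerate(df.index)}  # label -> row position

%timeit df.loc[-5]       # label-based DataFrame lookup
%timeit arr[pos[-5]]     # positional ndarray lookup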

For reference, a dummy get_weights() implementation is timed below

def dummy_getter(ts, dummy):
    return [("A1", "2015_Z_A", 1, ts)]

%timeit mapping.mappings.roller(timestamps=ts, contract_dates=cdts, get_weights=dummy_getter)

100 loops, best of 3: 9.87 ms per loop

Numpy 1.15.0 breaking tests

Upgrading numpy to 1.15.0 breaks the unit tests. The issue stems from a type conversion in np.busday_count when it is passed pd.Timestamp arguments.

import pandas as pd
import numpy as np

print(np.__version__)
1.15.0
print(pd.__version__)
0.23.0

front_expiry_dt = pd.Timestamp('2016-10-20')
timestamp = pd.Timestamp('2016-10-19')
np.busday_count(front_expiry_dt, timestamp)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-68d8718b4c95> in <module>()
----> 1 np.busday_count(front_expiry_dt, timestamp)

TypeError: Iterator operand 0 dtype could not be cast from dtype('<M8[us]') to dtype('<M8[D]') according to the rule 'safe'
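
One possible workaround (an assumption on my part, not a fix that has been applied) is to cast the timestamps down to day resolution before calling np.busday_count:

import numpy as np
import pandas as pd

front_expiry_dt = pd.Timestamp('2016-10-20')
timestamp = pd.Timestamp('2016-10-19')

# Converting to datetime.date (day resolution) avoids the unsafe us -> D cast.
np.busday_count(front_expiry_dt.date(), timestamp.date())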

NaNs for missing returns data in util.calc_rets

Currently, when the returns data passed to util.calc_rets does not contain values for all of the weights passed in, NaNs result for the missing data. This is a consequence of how MultiIndex selection works. As discussed here, at some point this will raise a warning and eventually an error, similar to the behaviour for pandas >= 0.22 with Index objects, as discussed here. Throwing a KeyError in the example below would be preferable behaviour, and will eventually be the behaviour of pandas. When this is implemented, a unit test covering this case should be added.

import pandas as pd
from mapping import util

idx = pd.MultiIndex.from_tuples([(pd.Timestamp('2015-01-03'), 'CLF5'),
                                 (pd.Timestamp('2015-01-04'), 'CLF5'),
                                 (pd.Timestamp('2015-01-04'), 'CLG5')])
irets = pd.Series([0.02, 0.01, 0.012], index=idx)
vals = [1, 1/2, 1/2, 1]
widx = pd.MultiIndex.from_tuples([(pd.Timestamp('2015-01-03'), 'CLF5'),
                                  (pd.Timestamp('2015-01-04'), 'CLF5'),
                                  (pd.Timestamp('2015-01-04'), 'CLG5'),
                                  (pd.Timestamp('2015-01-05'), 'CLG5')])
weights = pd.DataFrame(vals, index=widx, columns=["CL1"])

irets.head()
2015-01-03  CLF5    0.020
2015-01-04  CLF5    0.010
            CLG5    0.012
dtype: float64
weights.head()
                 CL1
2015-01-03 CLF5  1.0
2015-01-04 CLF5  0.5
           CLG5  0.5
2015-01-05 CLG5  1.0
util.calc_rets(irets, weights)
                 CL1
2015-01-03  0.004821
2015-01-04  0.003482
2015-01-05       NaN
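
Continuing the example above, a sketch of how the stricter behaviour could be implemented in the meantime (illustrative only, checking before selection rather than relying on pandas):

# Entries of the weight index with no corresponding return.
missing = weights.index.difference(irets.index)
if not missing.empty:
    raise KeyError("no returns for weights: {0}".format(list(missing)))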

MultiIndex names breaks util.reindex

Currently there is a bug in pandas whereby the ordering of the columns returned by MultiIndex.to_frame() is inconsistent across environments. As a result, util.reindex() can break due to the reordering, since it assumes the order is ['date', 'instrument'] when assigning column names via first_instr.columns = ['date', 'instrument']

conda create -q -n pandas1 python=3.6 pandas=0.23.4
source activate pandas1
>>> import pandas as pd
>>> 
>>> pd.MultiIndex.from_tuples(
...     [("B1", "A1"), ("B2", "A1")], names=["B", "A"]
... ).to_frame()
        B   A
B  A         
B1 A1  B1  A1
B2 A1  B2  A1
conda create -q -n pandas2 python=3.5 pandas=0.23.4
source activate pandas2
>>> import pandas as pd
>>> 
>>> pd.MultiIndex.from_tuples(
...     [("B1", "A1"), ("B2", "A1")], names=["B", "A"]
... ).to_frame()
        A   B
B  A         
B1 A1  A1  B1
B2 A1  A1  B2
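
A possible way to sidestep the inconsistency (a sketch, not the applied fix) is to avoid depending on the column order of MultiIndex.to_frame() and pull the levels out positionally instead:

import pandas as pd

widx = pd.MultiIndex.from_tuples([("2015-01-03", "CLF5"),
                                  ("2015-01-04", "CLH5")])
dates = widx.get_level_values(0)        # always the first level
instruments = widx.get_level_values(1)  # always the second level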

TravisCI install of cvxpy

TravisCI is failing to install cvxpy due to a failure installing osqp. Logs for the failure can be found here. Some discussion of the issue is available here.

Reindexing returns without instrument data

Using util.reindex() with an instrument which has no return data results in an error.

from mapping import util
import pandas as pd
from pandas import Timestamp as TS

idx = pd.MultiIndex.from_tuples([(TS('2015-01-03'), 'CLF5')])
returns = pd.Series([0.02], index=idx)
widx = pd.MultiIndex.from_tuples([(TS('2015-01-03'), 'CLF5'),
                                  (TS('2015-01-04'), 'CLH5')])
new_rets = util.reindex(returns, widx, limit=1)

...
TypeError: 'NoneType' object is not subscriptable

The problem is related to filling the first value of the returns. The issue is the line cumulative_rets.groupby(level=1).apply(lambda x: x.first_valid_index()), which returns

CLF5    (2015-01-03 00:00:00, CLF5)
CLH5                           None
dtype: object

This is then cast to a list which is used for indexing, but None is not a valid indexer.

The expected behaviour should be to return new_rets as

2015-01-03  CLF5    0.02
2015-01-04  CLH5     NaN
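
A possible fix sketch (cumulative_rets refers to the internal intermediate named above; illustrative only) would be to drop instruments with no data before building the indexer, leaving them as NaN in the output:

first_valid = cumulative_rets.groupby(level=1).apply(lambda x: x.first_valid_index())
# Instruments with no returns map to None; drop them so the remaining values
# can safely be used for indexing.
first_valid = first_valid.dropna()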

Missing weights for return days

There are times when an underlying instrument has return values for a given day but there are no weights for that day (for example, if the user treats the day as a holiday and therefore does not include a weight for it). This causes issues when splicing returns, since the returns for the date with no weights are dropped instead of being compounded into the following date.

import pandas as pd
import mapping.util as util
idx = pd.MultiIndex.from_tuples([(pd.Timestamp('2015-01-02'), 'CLF5'),
                                 (pd.Timestamp('2015-01-03'), 'CLF5'),
                                 (pd.Timestamp('2015-01-04'), 'CLF5'),])
rets = pd.Series([0.02, -0.03, 0.06], index=idx)
vals = [1, 1]
widx = pd.MultiIndex.from_tuples([(pd.Timestamp('2015-01-02'), 'CLF5'),
                                  (pd.Timestamp('2015-01-04'), 'CLF5')])
weights = pd.DataFrame(vals, index=widx, columns=["CL1"])
>>> rets
2015-01-02  CLF5    0.02
2015-01-03  CLF5   -0.03
2015-01-04  CLF5    0.06

>>> weights
                 CL1
2015-01-02 CLF5    1
2015-01-04 CLF5    1

>>> util.calc_rets(rets, weights)
             CL1
2015-01-02  0.02
2015-01-04  0.06

Whereas the result should be

2015-01-02  0.02
2015-01-04  0.0282
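
The 2015-01-04 value above compounds the unweighted 2015-01-03 return into the next weighted date; continuing the example:

# (1 - 0.03) * (1 + 0.06) - 1 = 0.0282
(1 + rets.loc[(pd.Timestamp('2015-01-03'), 'CLF5')]) * \
    (1 + rets.loc[(pd.Timestamp('2015-01-04'), 'CLF5')]) - 1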

Confusing contract_dates indexing error message

When insufficient contract_dates are given to mappings.static_transition, the instrument associated with a generic position at a given time will not exist, resulting in an IndexError. Currently the message is difficult to interpret and should be revised to incorporate the tuple of generic information that produced the positional index used to look up the expected instrument.

import pandas as pd
from mapping import mappings

cols = pd.MultiIndex.from_product([["CL1", "CL2"], ['front', 'back']])
idx = [-2, -1, 0]
transition = pd.DataFrame([[1.0, 0.0, 1.0, 0.0], [0.5, 0.5, 0.5, 0.5],
                           [0.0, 1.0, 0.0, 1.0]],
                          index=idx, columns=cols)
contract_dates = pd.Series([pd.Timestamp('2016-10-20'),
                            pd.Timestamp('2016-11-21')],
                           index=['CLX16', 'CLZ16'])
ts = pd.Timestamp('2016-10-19')
wts = mappings.static_transition(ts, contract_dates, transition)

IndexError: index 2 is out of bounds for axis 0 with size 2. No 'back' contract for 2016-10-19 00:00:00
Insufficient 'contract_dates', last row:
CLZ16   2016-11-21
dtype: datetime64[ns]
