casact / chainladder-python

185.0 19.0 71.0 29.78 MB

Actuarial reserving in Python

Home Page: https://chainladder-python.readthedocs.io/en/latest/

License: Mozilla Public License 2.0

Python 100.00%

reserving actuarial chainladder python pandas actuary scikit-learn estimators

chainladder-python's Introduction

chainladder (python)

chainladder: Property and Casualty Loss Reserving in Python

Welcome! The chainladder package was built to handle all of your actuarial needs in Python. It provides popular actuarial tools such as triangle data manipulation, link ratio calculation, and IBNR estimates with both deterministic and stochastic models. We built this package so you no longer have to rely on outdated software and tools when performing actuarial pricing or reserving indications.

This package strives to be minimalistic in needing its own API. The syntax mimics two popular packages: pandas for data manipulation and scikit-learn for model construction. An actuary who is already familiar with these tools will be able to pick up this package with ease and save their mental energy for actual actuarial work.
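For readers new to the terms, the core deterministic calculation behind link ratios and IBNR estimates can be sketched in a few lines of plain Python. This is a minimal illustration under simple assumptions (annual ages, a cumulative triangle stored as a dict of lists, volume-weighted averages, no tail); it does not use or reproduce the chainladder API, and the triangle values are made up.

```python
# Cumulative paid losses by origin year; entry j is the value at age 12 * (j + 1).
# The numbers are illustrative only.
triangle = {
    2017: [100.0, 150.0, 165.0],
    2018: [110.0, 170.0],
    2019: [120.0],
}


def volume_weighted_ldfs(tri):
    """Age-to-age factors: sum of next-age losses / sum of current-age losses."""
    n_ages = max(len(row) for row in tri.values())
    ldfs = []
    for j in range(n_ages - 1):
        # only origins that have reached the next age contribute
        rows = [row for row in tri.values() if len(row) > j + 1]
        ldfs.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return ldfs


def chainladder_ibnr(tri):
    """Project each origin's latest value to ultimate and return IBNR."""
    ldfs = volume_weighted_ldfs(tri)
    ibnr = {}
    for origin, row in tri.items():
        cdf = 1.0
        for factor in ldfs[len(row) - 1:]:  # factors still ahead of this origin
            cdf *= factor
        ibnr[origin] = row[-1] * cdf - row[-1]
    return ibnr


ldfs = volume_weighted_ldfs(triangle)
ibnr = chainladder_ibnr(triangle)
```

In the package itself, the same steps are expressed through estimators such as cl.Development and cl.Chainladder with scikit-learn style fit calls.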

Chainladder is built by a group of volunteers, and we need YOUR help!

This package is written in Python; if you are looking for a similar package written in R, please visit chainladder.

Dedicated Documentation Site

We have a dedicated documentation website, where you can find installation instructions, tutorials, example galleries, sample datasets, API references, change log history, and more.

Visit Chainladder-Python on Read the Docs.

Licenses

This package is released under Mozilla Public License 2.0.

chainladder-python's People

libardo1 thequackdaddy gitter-badger aragondajyosna raangulo snoozelieb abirr97 kevinsia kmostafa89 mdoerner jiayiderekchen aborah30 allenclong anjalicdw devenlu gig67 ruinedsubmartingale actuaryjoe luthieisilra johalnes agjackie synapticarbors actuarial-tools cbalona smsinclair brian-13 aegerton fibonacc112358 hp-analytica attiguyas enon0376 fychf genedan hepdowning zoasis21 odddkidout bndifferential kevinbamouni srwcf muhammadshamsherkhan2002 marc-c-schmidt claviermathieu jbogaardt j-maxey stephenll-forks kuo-tingkai jabran cbio-brendan stitliz tyler-corcoran phuang1226 henrydingliu wleescor jmgonz4 a108669 pafechet dreamfactory100 andrejakobsen miriumm el-maudra jzhng105 angelapper corneliusfranken blayz3r ramsaideep lorentzenchr thenorskeman matthewcaseres kiy0on eggbert001 cynthiatchetagni

chainladder-python's Issues

Tail on Triangle using Development defaults

Make tails work with triangle by using default development. For example, this should work:

import chainladder as cl
cl.TailCurve().fit(cl.load_dataset('raa'))

More options on Development periods

In cl.Development, we need to add the ability to:

exclude the high, low, or both high and low link ratios from each development period's LDF
omit entire valuation dates
omit a list of (origin, development) cells, e.g.

[('1988',24),('1987',48)]
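A minimal plain-Python sketch of how the high/low exclusion and the cell exclusion list could combine in a volume-weighted LDF. The function ldf and its arguments are hypothetical, not cl.Development's signature; it assumes an excluded (origin, development) cell removes the link ratio starting at that age, and it drops only one value at each extreme even when there are ties. Omitting whole valuation dates is not shown.

```python
ages = [12, 24, 36]
# Cumulative values keyed by origin (illustrative numbers)
triangle = {
    '1986': [100.0, 160.0, 176.0],
    '1987': [120.0, 180.0, 190.0],
    '1988': [90.0, 171.0],
    '1989': [110.0],
}


def ldf(tri, ages, age, drop_high=False, drop_low=False, drop=()):
    """Volume-weighted LDF from `age` to the next age (hypothetical helper).

    `drop` lists (origin, age) pairs whose link ratio starting at that age
    is excluded; this interpretation of the proposal is an assumption.
    """
    j = ages.index(age)
    pairs = [(o, row[j], row[j + 1]) for o, row in tri.items()
             if len(row) > j + 1 and (o, age) not in drop]
    # rank by individual link ratio, then remove one extreme on each side
    pairs.sort(key=lambda p: p[2] / p[1])
    if drop_low and len(pairs) > 1:
        pairs = pairs[1:]
    if drop_high and len(pairs) > 1:
        pairs = pairs[:-1]
    return sum(p[2] for p in pairs) / sum(p[1] for p in pairs)


all_in = ldf(triangle, ages, 12)
no_high = ldf(triangle, ages, 12, drop_high=True)
no_1988 = ldf(triangle, ages, 12, drop=[('1988', 12)])
```

Here the 1988 link ratio at age 12 is the highest, so dropping the high value and explicitly excluding that cell give the same factor.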

triangle needs to support partial accident years/quarters

If an accident month triangle starts with May-YYYY (or any month other than January), then grain('OYDY') will return a misshaped triangle that does not behave as intended with the rest of the chainladder package.

Refactor WeightedRegression

WeightedRegression class does not use the sklearn-style approach for an estimator, but it should.

Bokeh vs. Holoviews

So I agree that Bokeh is very pretty. But my experiments with it have left me a little bit overwhelmed with the complexity and large amounts of code necessary to get really pretty graphs.

Well, Holoviews to the rescue. Holoviews is essentially a wrapper for Bokeh (and matplotlib, for that matter), so you lose a little bit of flexibility, but writing graphs is much, much easier (IMHO).

Add a decay assumption to TailConstant

TailCurve decays over an annual period so that run-off expectations can be developed for origin periods that are at the end of the Triangle. TailConstant needs this same functionality with an explicit assumption at object initialization.
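One way such a decay assumption could work, sketched in plain Python under an assumed scheme that is not taken from TailCurve or TailConstant: the log of the tail factor is split across future periods with weights proportional to decay**k, so the period factors shrink toward 1.0 while their product still equals the tail. The function name decayed_tail_factors is hypothetical.

```python
import math


def decayed_tail_factors(tail, decay, n_periods):
    """Split a tail factor into n_periods factors that decay toward 1.0.

    Assumed scheme (hypothetical, not the package's): log(tail) is shared
    across periods with weights proportional to decay**k, so the product of
    the returned factors equals `tail`.
    """
    weights = [decay ** k for k in range(n_periods)]
    total = sum(weights)
    log_tail = math.log(tail)
    return [math.exp(log_tail * w / total) for w in weights]


# 1.15 tail with 0.75 decay spread over five future periods
factors = decayed_tail_factors(1.15, decay=0.75, n_periods=5)
```

An origin period near the end of the triangle would then run off through whichever of these factors still lie ahead of it.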

Triangle Class - Why not use pandas.Period?

So perusing through this code I have a few suggestions. Gonna make a separate issue for each.

For the triangle class, why not use the pandas.Period type for the origin labels of the pandas DataFrame? The advantage is that Period already has all of the fancy date math built in, and it's pretty easy to convert an array of dates or datetimes into Periods.

For example...

import pandas as pd

dates = pd.Series(['2017-06-29', '2015-03-21', '2016-10-15',
                   '2017-12-31', '2017-01-01', '2016-12-31',
                   '2017-09-30', '2017-09-01', '2017-10-01'])
dates = pd.DatetimeIndex(dates)

pd.PeriodIndex(dates, freq='A')  # Convert dates to year ending in December periods

pd.PeriodIndex(dates, freq='A-SEP')  # Convert dates to year ending in September periods

pd.PeriodIndex(dates, freq='Q')  # Convert dates to quarter periods

pd.PeriodIndex(dates, freq='M')  # Convert dates to month periods

A full list of the available frequency options is in the pandas documentation on offset aliases.
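To illustrate the point about built-in date math, a short sketch of computing a development age from Periods. It assumes the common convention that the first age counts both the origin and valuation months (so a January origin valued in December is age 12); that convention is an assumption here, not something the issue specifies.

```python
import pandas as pd

origin = pd.Period('2017-01', freq='M')
valuation = pd.Timestamp('2017-12-31').to_period('M')

# Subtracting two Periods of the same frequency returns a DateOffset whose
# .n attribute is the number of periods between them (11 here).
age_in_months = (valuation - origin).n + 1  # +1 counts both end months

# The same arithmetic works at quarterly grain.
quarters_between = (pd.Period('2018Q1', freq='Q') - pd.Period('2017Q1', freq='Q')).n
```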

SyntaxError: invalid syntax (core.py, line 178) when trying to import chainladder package

I have created a reprex project (https://rstudio.cloud/spaces/54476/project/1055533) in RStudio Cloud with a virtual environment for Python 3. When I try to load the chainladder package, previously installed within that environment, I receive the following error message in RStudio Cloud:

> reticulate::repl_python()
Python 3.5.2 (/cloud/project/NSS/bin/python)
Reticulate 1.14.9001 REPL -- A Python interpreter in R.
>>> import chainladder as cl
SyntaxError: invalid syntax (core.py, line 178)

Here is the R script:

install.packages("devtools")
devtools::install_github("rstudio/reticulate")
library(reticulate)
virtualenv_create("NSS", python = "/usr/bin/python3")
virtualenv_install('NSS', 'chainladder')
use_virtualenv('NSS', required = T)

Here is the Python script:

import chainladder as cl

This works satisfactorily on my desktop version of R/RStudio. Any suggestions, please?

ReadtheDocs documentation does not render well on mobile device

Tutorial example fit function not working

Hi,

I am trying to learn how to use your module, but I am having issues with your example. I could not run some of the Development methods, so I decided to try only your included dataset. However, it still does not work.

I have copied and pasted the following code:

import chainladder as cl

genins = cl.load_dataset('genins')

genins_dev = cl.Pipeline(
    [('dev', cl.Development()),
     ('tail', cl.TailCurve())]).fit_transform(genins)

This works.

Then when I run genins_model = cl.Chainladder().fit(genins_dev), I get the error message below:
'NaTType' object has no attribute 'to_timestamp'

Would you know where this comes from?

Best,
Nassim

drop_high and drop_low to support lists

drop_high and drop_low are booleans, so they are an all-or-nothing selection for the Development estimator. Consider also accepting a list with one entry per development age except the last (i.e. the number of development ages minus one) so that these selections can vary by age.
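A sketch of the proposed behaviour in plain Python, assuming simple-average LDFs and the convention that a bare boolean is broadcast to every age. The function ldfs_with_per_age_drops is hypothetical and is not the Development estimator's API.

```python
def ldfs_with_per_age_drops(ratios_by_age, drop_high, drop_low):
    """Simple-average LDFs where drop flags may be given per age.

    ratios_by_age: one list of individual link ratios per development age
    (length = number of ages - 1). drop_high / drop_low may be a single
    bool (applied to every age) or a list of bools of that same length.
    """
    n = len(ratios_by_age)
    if isinstance(drop_high, bool):
        drop_high = [drop_high] * n
    if isinstance(drop_low, bool):
        drop_low = [drop_low] * n
    if len(drop_high) != n or len(drop_low) != n:
        raise ValueError('drop lists need one entry per development age')
    out = []
    for ratios, high, low in zip(ratios_by_age, drop_high, drop_low):
        kept = sorted(ratios)
        if low and len(kept) > 1:
            kept = kept[1:]   # remove the single lowest ratio
        if high and len(kept) > 1:
            kept = kept[:-1]  # remove the single highest ratio
        out.append(sum(kept) / len(kept))
    return out


ratios = [[1.5, 1.6, 1.9], [1.1, 1.05], [1.02]]
# drop the high ratio only at the first age
selected = ldfs_with_per_age_drops(ratios, drop_high=[True, False, False], drop_low=False)
```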

Errors in projecting incomplete triangles

If a triangle contains multiple LOBs but not all LOBs cover the same time periods (e.g. a new LOB mixed with established LOBs), then for some LOBs the triangle fills the first rows and last columns with NaN, and computations throw errors.
I got some division-by-zero errors and dimension errors when trying to apply the chainladder method.

bootstrap hat matrix inversion problem on some triangles

When the model is over-parameterized, we sometimes run into hat matrix inversion problems that cause BootstrapODPSample to fail. Here is some offending code:

import chainladder as cl
cl.BootstrapODPSample().fit(cl.load_dataset('quarterly')['paid'].iloc[0,0])

MackChainladder.total_mack_std_err_ TypeError

Current property is incompatible with PeriodIndex datatype for origin.
This code should work but doesn't:

import chainladder as cl
clrd = cl.load_dataset('clrd').groupby('LOB').sum()['CumPaidLoss']
cl.MackChainladder().fit(clrd).total_mack_std_err_.to_frame()

We either need to assign origin to valuation_date or the ultimate date, or return a DataFrame rather than a Triangle object.

MunichAdjustment + Tail Factor

This code does not work, but it needs to, and its results should tie out to MunichChainLadder with a tail in the R package.

import chainladder as cl
tri = cl.load_dataset('mcl')
cl.MunichAdjustment(paid_to_incurred={'paid': 'incurred'}).fit(
    cl.TailCurve().fit_transform(cl.Development().fit_transform(tri)))

Development.drop_high and drop_low

The drop_high/drop_low parameters drop all LDFs that match the maximum (or minimum). In cases where all LDFs are the same (e.g. 1.0 in the tail), the method fails.

cell coloring option for HTML representation

Allow optional coloring of triangle cells (particularly link ratios) to better visualize outliers, similar to pandas.DataFrame.style.background_gradient.

Hetero Groups in Bootstrap

Need to include Shapland's hetero grouping option in ODP Bootstrap simulation.
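Shapland's hetero adjustment rescales residuals so that each user-defined group of development ages has the same spread as the residuals overall, using a factor of stdev(all residuals) / stdev(residuals in the group). Below is a minimal plain-Python sketch of that step. It assumes residuals are already computed and uses population standard deviations. The function hetero_adjust and the example grouping are hypothetical.

```python
from statistics import pstdev


def hetero_adjust(residuals, groups):
    """Rescale residuals so every hetero group has the overall spread.

    residuals: dict mapping (origin, age) -> residual
    groups: dict mapping age -> group label (a user-chosen grouping)
    Group factor = stdev(all residuals) / stdev(residuals in the group).
    """
    overall = pstdev(list(residuals.values()))
    by_group = {}
    for (_origin, age), r in residuals.items():
        by_group.setdefault(groups[age], []).append(r)
    factors = {}
    for label, rs in by_group.items():
        spread = pstdev(rs)
        factors[label] = overall / spread if spread > 0 else 1.0
    adjusted = {key: r * factors[groups[key[1]]] for key, r in residuals.items()}
    return adjusted, factors


residuals = {(2017, 12): 1.0, (2018, 12): -1.0, (2019, 12): 0.5,
             (2017, 24): 0.2, (2018, 24): -0.2}
groups = {12: 'early', 24: 'late'}
adjusted, factors = hetero_adjust(residuals, groups)
```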

Development fit producing incorrect result with quarterly data

The LDFs fit for quarterly data are incorrect.
A short example is below. With n_periods=1, the LDFs should equal the latest diagonal of link_ratio, regardless of average='simple' or 'volume'. This appears to work only for the first quarter. Note that simple and volume (and regression) averages produce different results even with n_periods=1.

import chainladder as cl

data = cl.load_dataset('quarterly')
data = data['paid']
print(data.link_ratio)
print(cl.Development(n_periods=1, average='simple').fit(data).ldf_)
print(cl.Development(n_periods=1, average='volume').fit(data).ldf_)

Example is easier to see if you run in a notebook without using print().
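With n_periods=1 each average contains a single link ratio, so the simple and volume-weighted averages must coincide, and both must equal the latest link ratio at that age. The plain-Python sketch below checks that property on a made-up quarterly triangle. ldf_last_n is a hypothetical helper and not cl.Development, but a check of this shape could serve as a regression test for the bug.

```python
# Cumulative values, oldest origin first (illustrative numbers)
quarterly = {
    '2017Q1': [100.0, 130.0, 143.0],
    '2017Q2': [80.0, 100.0, 112.0],
    '2017Q3': [90.0, 117.0],
    '2017Q4': [70.0],
}


def ldf_last_n(tri, j, n_periods, average):
    """LDF from age index j to j + 1 using only the latest n_periods origins."""
    rows = [row for row in tri.values() if len(row) > j + 1][-n_periods:]
    if average == 'simple':
        ratios = [row[j + 1] / row[j] for row in rows]
        return sum(ratios) / len(ratios)
    if average == 'volume':
        return sum(r[j + 1] for r in rows) / sum(r[j] for r in rows)
    raise ValueError(f'unknown average: {average!r}')


simple_first = ldf_last_n(quarterly, 0, 1, 'simple')
volume_first = ldf_last_n(quarterly, 0, 1, 'volume')
```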

TailCurve shape misaligned

offending code:

import chainladder as cl
tri = cl.load_dataset('raa')
steps = [('dev', cl.Development()),     
         ('tail', cl.TailCurve()), 
         ('chainladder', cl.Chainladder())]           
pipe = cl.Pipeline(steps=steps).fit(tri)
pipe.predict(tri)

improve origin/development inference on Triangle class

Date inference relies heavily on how pandas.to_datetime infers date-like columns. It is not perfect, so many date-like styles are not supported.

# Create a dataframe with a bunch of date-like columns
import pandas as pd
df = pd.Series(['2019-01-15','2018-07-15','2019-05-05','2019-10-01'],
               dtype='datetime64[ns]', name='date').to_frame()
df['year'] = df['date'].dt.year
df['quarter'] = df['date'].dt.quarter
df['month'] = df['date'].dt.month
df['year_month']=df['date'].dt.strftime('%Y%m').astype(int)
df['year_quarter1']=df['year'].astype(str) + 'Q' + df['quarter'].astype(str)
df['year_quarter2']=df['year'].astype(str) + '-Q' + df['quarter'].astype(str)
df['quarter_year'] = 'Q' + df['quarter'].astype(str) + df['year'].astype(str)
df['date_as_str1'] = df['date'].dt.strftime('%Y-%m-%d')
df['date_as_str2'] = df['date'].dt.strftime('%m/%d/%Y')

# See how chainladder infers them
import chainladder as cl  

cl.Triangle.to_datetime(df, ['year','quarter']) # works, but treats quarters as months
cl.Triangle.to_datetime(df, ['year','month'])  # works
cl.Triangle.to_datetime(df, ['date'])   # works
cl.Triangle.to_datetime(df, ['year_quarter1'])  # works
cl.Triangle.to_datetime(df, ['year_quarter2'])  # works
cl.Triangle.to_datetime(df, ['date_as_str1'])  # works
cl.Triangle.to_datetime(df, ['date_as_str2'])  # works
# cl.Triangle.to_datetime(df, ['quarter_year']) # doesn't work, but should
# cl.Triangle.to_datetime(df, ['Month', 'Day', 'Year']) # doesn't work, but should
# cl.Triangle.to_datetime(df, ['Year','Month', 'Day']) # doesn't work, but should
# cl.Triangle.to_datetime(df, ['Quarter', 'Year']) # doesn't work, but probably should

An improvement would be to expose a format argument to the user, e.g. origin_format and development_format in Triangle.__init__, so they can override the auto-inference.
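A sketch of what an explicit format buys, in plain Python. parse_quarter_year is a hypothetical helper for the 'Q32019' layout that currently fails; origin_format and development_format remain a proposal, not an existing Triangle argument. The strptime line shows how a format string removes the month/day ambiguity of '10/01/2019'.

```python
import re
from datetime import date, datetime


def parse_quarter_year(value):
    """Parse a quarter-first label such as 'Q32019' to the quarter's first day."""
    m = re.fullmatch(r'Q([1-4])(\d{4})', value)
    if m is None:
        raise ValueError(f'not a quarter-year label: {value!r}')
    quarter, year = int(m.group(1)), int(m.group(2))
    return date(year, 3 * (quarter - 1) + 1, 1)


q3 = parse_quarter_year('Q32019')
# With an explicit format, '10/01/2019' is unambiguously month-first.
month_first = datetime.strptime('10/01/2019', '%m/%d/%Y').date()
```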

Problems converting a triangle to a higher grain using the grain method

I created a dataframe with OriginDates and Development Dates, from 01-01-2017 to 31-12-2019, and a payment value of 1 for each Origin Month-Development Month.

Then, the incremental triangle with Origin in Months and Development in Months will have a value of 1 in each cell. I created this triangle with the chainladder package and it is ok.

However, when I try to transform the incremental triangle to Origin in Years and Development in Years, using the grain method, the result is:
        11     23     35
2017  66.0  144.0  144.0
2018  66.0  144.0    NaN
2019  66.0    NaN    NaN

But to be correct, it should be:

        12     24     36
2017  78.0  144.0  144.0
2018  78.0  144.0    NaN
2019  78.0    NaN    NaN

I also tried first converting the triangle to cumulative, but it still doesn't show the expected triangle.

Code:

import pandas as pd
import chainladder as cl

# Month-end dates from Jan-2017 through Dec-2019 (36 months)
datesm = pd.date_range('2017-01-01', '2019-12-31', freq='M')

# Origin month i repeats once per development month from i through Dec-2019
origin_months = datesm.repeat(repeats=list(range(36, 0, -1)))

dev_months = list()
for i in range(len(datesm)):
    dev_months.extend(datesm[i:len(datesm)])

incremental_payments = [1] * len(origin_months)

data = {'AccidentMonths': origin_months, 'DevelopmentMonths': dev_months, 'Payments': incremental_payments}
testdf = pd.DataFrame(data)

xMM = cl.Triangle(data=testdf, origin='AccidentMonths', development='DevelopmentMonths', columns='Payments')

xYY = xMM.grain('OYDY')

Thank you!
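The expected annual triangle can be reproduced with a plain-Python aggregation, which may help as a reference result when testing grain on incremental data. This sketch assumes annual ages are measured from the start of the origin year (12, 24, 36) and that each monthly cell is already incremental; monthly_to_annual is a hypothetical helper, not part of chainladder.

```python
from collections import defaultdict


def monthly_to_annual(cells):
    """Aggregate incremental monthly cells to annual origin/development grain.

    cells: iterable of ((origin_year, origin_month), (dev_year, dev_month), amount)
    Returns {(origin_year, age_in_months): amount} with ages 12, 24, 36, ...
    """
    out = defaultdict(float)
    for (origin_year, _om), (dev_year, _dm), amount in cells:
        age = 12 * (dev_year - origin_year + 1)
        out[(origin_year, age)] += amount
    return dict(out)


# Rebuild the example from the issue: one unit of payment for every
# (origin month, development month) pair from Jan-2017 to Dec-2019.
months = [(y, m) for y in (2017, 2018, 2019) for m in range(1, 13)]
cells = [(months[i], months[k], 1.0)
         for i in range(len(months)) for k in range(i, len(months))]
annual = monthly_to_annual(cells)
```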

Attachment age support in Tail methods

Tails attach to the end of LDF/CDF arrays. There is currently no way to express attaching the tail factor to an earlier maturity.

Using existing names doesn't replace column

Offending code:

import chainladder as cl
raa = cl.load_dataset('raa')
raa['values']=raa['values']*1
raa

Triangle.grain only works on cumulative triangles

Need to add functionality to the method that allows the user to specify whether a triangle is incremental and then appropriately compute the grain change.

Triangle.trend functionality along the diagonal (valuation axis)

Currently we can trend Triangle data along the origin or development axes, but it would be nice to include functionality for the valuation axis as well.

cl.concat

Should have cl.concat analogous to pd.concat

also Triangle.append needs to work better when merging triangles with differing origin/development axes.

Chainladder version compatible with IBM Watson

The highest Python version used by Watson Studio is 3.5. Do you have chainladder version that is compatible with one of the Watson Studio Python versions?

Ultimates is missing?

Did I miss something, or is the Ultimates (method or attribute) missing from the Chainladder object? I looked at the code for it, and it doesn't seem like it's there in the same form as in the documentation. I'm happy to add it if it needs to be added, or to change the documentation.

Thanks
Matt

support pandas>=1.0.0

Biggest issue seems to be comparisons between PeriodIndex and Timestamp are no longer valid.

GridSearchCV with time-series cv functionality

Need to create cross-validation functionality that holds out valuation dates similar to sklearn.model_selection.TimeSeriesSplit.
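A minimal sketch of the splitting logic, assuming cells are indexed by integer (origin, age) positions so that origin + age identifies the calendar diagonal. Like sklearn.model_selection.TimeSeriesSplit, each split trains on everything up to a cutoff and tests on the next period, but here the period is a valuation diagonal. valuation_holdout_splits is hypothetical; wiring it into GridSearchCV is not shown.

```python
def valuation_holdout_splits(cells, n_splits):
    """Yield (train, test) cell lists, holding out the latest diagonals.

    cells: list of (origin_index, age_index) pairs; the diagonal
    (valuation period) of a cell is origin_index + age_index.
    """
    diagonals = sorted({o + a for o, a in cells})
    for cutoff in diagonals[-n_splits - 1:-1]:
        train = [c for c in cells if sum(c) <= cutoff]
        test = [c for c in cells if sum(c) == cutoff + 1]
        yield train, test


# A 4 x 4 triangle: origin o has ages 0 .. 3 - o
cells = [(o, a) for o in range(4) for a in range(4 - o)]
splits = list(valuation_holdout_splits(cells, n_splits=2))
```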

need to support and/or filtering

import chainladder as cl

clrd=cl.load_dataset('clrd')
clrd[clrd.origin<1995][clrd['LOB']=='wkcomp'] # Works
clrd[(clrd.origin<1995)&(clrd['LOB']=='wkcomp')] # Doesn't work

cdf_ labels should be 'xx-Ult'

cdf_.development copies from ldf_.development, but the preference would be to relabel them as age to ultimate labels.
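A short sketch of the relabeling, assuming ldf_ labels run '12-24', '24-36', and so on: CDFs are reverse cumulative products of the age-to-age factors (times any tail), and each is labeled from its starting age to ultimate. cdfs_with_labels is a hypothetical plain-Python helper, not the package's implementation.

```python
def cdfs_with_labels(ldfs, ages, tail=1.0):
    """Turn age-to-age factors into age-to-ultimate factors labeled 'xx-Ult'.

    ldfs[i] develops ages[i] to ages[i + 1]; the last age carries only the tail.
    """
    if len(ldfs) != len(ages) - 1:
        raise ValueError('need one LDF per age except the last')
    running = tail
    pairs = [(f'{ages[-1]}-Ult', running)]
    for age, factor in zip(reversed(ages[:-1]), reversed(ldfs)):
        running *= factor
        pairs.append((f'{age}-Ult', running))
    return dict(reversed(pairs))  # youngest age first


cdfs = cdfs_with_labels([1.5, 1.1], [12, 24, 36])
```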

json serializers for everything

Objects should serialize nicely to_json and from_json. This will be useful for adding RDBMS support for more sophisticated workflow management.

Bayesian MCMC

Add a method to do Bayesian MCMC. See:
CAS Monograph #1

arithmetic fail

The offending code throws an error:

import chainladder as cl
raa = cl.load_dataset('raa')
raa.latest_diagonal / raa

Propagate bootstrap process risk

Need to complete bootstrap functionality to allow for propagation of process risk into the deterministic methods. Currently bootstrap only supports the simulation of new triangles, but does not simulate IBNR with requisite process uncertainty.
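A minimal sketch of adding process risk, assuming an over-dispersed Poisson setting where the variance of each future incremental cell is phi times its mean. A gamma distribution with shape mean/phi and scale phi matches both moments and is a common stand-in when simulating ODP process variance. simulate_with_process_risk is hypothetical; non-positive expectations are passed through unchanged as a simplification.

```python
import random


def simulate_with_process_risk(expected, phi, rng):
    """Draw one outcome per expected incremental value.

    Gamma(shape=m / phi, scale=phi) has mean m and variance phi * m.
    Non-positive expectations are returned unchanged (a simplification).
    """
    return [rng.gammavariate(m / phi, phi) if m > 0 else m for m in expected]


rng = random.Random(42)
# 20,000 draws for a single future cell with mean 1000 and phi = 50
draws = [simulate_with_process_risk([1000.0], 50.0, rng)[0] for _ in range(20000)]
```

In a full bootstrap, the expected values would come from each simulated triangle's projection, so parameter and process uncertainty are combined.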

groupby not consistent with pandas

offending code throws an error, but doesn't error in a similar pandas construct:

import chainladder as cl
raa = cl.load_dataset('raa')
raa.groupby('Total').sum()

Development lag calculation is off by one quarter

Importing data and creating a triangle with an origin grain of 'Y' and a development grain of 'Q' results in the development calculation being off by 3 months (one quarter).

Recreate:

import pandas as pd
import chainladder as cl

data = {'state': ['ST'] * 21,
        'AY': ['2017'] * 11 + ['2018'] * 7 + ['2019'] * 3,
        'evaluation_date': ['3/31/2017', '6/30/2017', '9/30/2017', '12/31/2017',
                            '3/31/2018', '6/30/2018', '9/30/2018', '12/31/2018',
                            '3/31/2019', '6/30/2019', '9/30/2019',
                            '3/31/2018', '6/30/2018', '9/30/2018', '12/31/2018',
                            '3/31/2019', '6/30/2019', '9/30/2019',
                            '3/31/2019', '6/30/2019', '9/30/2019'],
        'paid': list(range(3, 36, 3)) + list(range(3, 24, 3)) + list(range(3, 12, 3))}
df = pd.DataFrame(data)

triangle = cl.Triangle(df,
                       origin='AY',
                       development=['evaluation_date'],
                       index=['state'],
                       columns=['paid'])

print(triangle[triangle.development <= 6])

Adding a zero to QuarterEnd in _period_end in chainladder/core/base.py from:
offset = {12: pd.tseries.offsets.MonthEnd(), 4: pd.tseries.offsets.QuarterEnd(), 1: pd.tseries.offsets.YearEnd()}
to
offset = {12: pd.tseries.offsets.MonthEnd(), 4: pd.tseries.offsets.QuarterEnd(0), 1: pd.tseries.offsets.YearEnd()}
seems to fix the problem. It doesn't appear to be an issue with annual grains. I have not tested month grains.
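The fix hinges on how pandas anchored offsets treat dates that already sit on the anchor. With the default n=1, QuarterEnd() always moves to the next quarter end, so a March 31 evaluation date is pushed to June 30; QuarterEnd(0) rolls forward only when the date is not already a quarter end. A short demonstration:

```python
import pandas as pd
from pandas.tseries.offsets import QuarterEnd

on_quarter_end = pd.Timestamp('2017-03-31')
mid_quarter = pd.Timestamp('2017-02-15')

# Default n=1: always moves to the next quarter end, even from a quarter end.
rolled_default = on_quarter_end + QuarterEnd()   # Timestamp('2017-06-30')

# n=0: rolls forward only if the date is not already on a quarter end.
rolled_zero = on_quarter_end + QuarterEnd(0)     # Timestamp('2017-03-31')
mid_rolled = mid_quarter + QuarterEnd(0)         # Timestamp('2017-03-31')
```

MonthEnd() follows the same n=1 rule, so month-end evaluation dates may be worth checking for month grains as well.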

documentation search on ReadTheDocs is pointed at sklearn and not chainladder

dropna + grain

dropna will create triangles with odd shapes and these will not always work with grain. Need to make grain function work better with these odd shapes.

Triangle Class - Internal Storing of Data

The triangle class allows you to import data in 2 manners:

Tabular or "long" format
Triangle format

I would suggest that, while you can import the data in either format, Triangle should internally reshape the data into one of the two. I'm somewhat ambivalent about which one is more appropriate, although I think I have a slight preference for the triangle format. I'm not fully convinced one way or the other.

My objection to having the class save it in either format is that all the resulting methods/functions need to check first whether the triangle is in tabular or triangular format, and you have to write separate code for each use case. If we store the data internally in one format, this problem is eliminated.

Add attachment_age parameter to TailConstant

By default, tail factors are applied to the latest development date in a triangle. It would be nice to allow the user to override the default with a different age.

import chainladder as cl
cl.TailConstant(1.15, decay=.75, attachment_age=144)

grain doesn't work on full_expectation or full_triangle

offending code:

import chainladder as cl
tri = cl.load_dataset('quarterly')
cl.Chainladder().fit(tri).full_expectation_.grain("OYDY")

PEP8 - Lots o changes

Much of the code does not conform to PEP8. I fully understand this is a free tool and all, but PEP8 provides a nice, (mostly) standardized way to write Python. I think it's unnecessary (and potentially dangerous) to make PRs for the sole purpose of major PEP8 fixes. It would make more sense to just fix the problems as the code is refactored.

To that end, would there be objections to having one of the CI builds just run a linter/flake8 on the diff between the PR and master? Another good encouragement tool.

Note that most Python editors I'm aware of have PEP8 style analysis/linters either built in or as easily installable plugins. Atom and Spyder are two I use regularly, and both do it very nicely.

Slicing cdf not working correctly

offending code:

import chainladder as cl
raa = cl.load_dataset('raa')
cdf = cl.Development().fit(raa).cdf_
cdf[cdf.origin==cdf.origin.max()]

User-specified column order should be honored

Like pandas, a user should be able to rearrange the order of columns in a triangle.

A failing unit test:

import chainladder as cl
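clrd = cl.load_dataset('clrd')
a = clrd[['CumPaidLoss', 'BulkLoss']]
b = clrd[['BulkLoss', 'CumPaidLoss']]
# Column order should follow the order requested by the user
assert list(a.columns) == ['CumPaidLoss', 'BulkLoss']
assert list(b.columns) == ['BulkLoss', 'CumPaidLoss']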

Using read_pickle directly on the path where the package is installed works.
