casact / chainladder-python

Actuarial reserving in Python

Home Page: https://chainladder-python.readthedocs.io/en/latest/

License: Mozilla Public License 2.0

Python 100.00%
reserving actuarial chainladder python pandas actuary scikit-learn estimators

chainladder-python's Introduction

chainladder (python)

chainladder: Property and Casualty Loss Reserving in Python

Welcome! The chainladder package was built to handle all of your actuarial needs in Python. It consists of popular actuarial tools, such as triangle data manipulation, link ratio calculation, and IBNR estimates with both deterministic and stochastic models. We built this package so you no longer have to rely on outdated software and tools when performing actuarial pricing or reserving indications.

This package strives to be minimalistic in needing its own API. The syntax mimics the popular packages pandas for data manipulation and scikit-learn for model construction. An actuary who is already familiar with these tools will be able to pick up this package with ease. You will be able to save your mental energy for actual actuarial work.

Chainladder is built by a group of volunteers, and we need YOUR help!

This package is written in Python. If you are looking for a similar package written in R, please visit chainladder.

Dedicated Documentation Site

We have a dedicated documentation website, where you can find installation instructions, tutorials, example galleries, sample datasets, API references, change log history, and more.

Visit Chainladder-Python on Read the Docs.

Licenses

This package is released under Mozilla Public License 2.0.

chainladder-python's People

Contributors

a108669, aegerton, allenclong, andrejakobsen, brian-13, cbalona, genedan, gig67, henrydingliu, jbogaardt, jmatthewpeters, johalnes, kennethshsu, matthewcaseres, srwcf, synapticarbors, thequackdaddy, wleescor, yuuuxt

chainladder-python's Issues

need to support and/or filtering

import chainladder as cl

clrd=cl.load_dataset('clrd')
clrd[clrd.origin<1995][clrd['LOB']=='wkcomp'] # Works
clrd[(clrd.origin<1995)&(clrd['LOB']=='wkcomp')] # Doesn't work
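The request amounts to letting two boolean masks combine elementwise before they index the triangle, as pandas does. A minimal sketch of that idea in plain Python follows; the `Mask` class and `select` helper are hypothetical illustrations, not chainladder internals.

```python
class Mask:
    """Hypothetical stand-in for the boolean result of a comparison
    such as clrd.origin < 1995."""

    def __init__(self, flags):
        self.flags = [bool(f) for f in flags]

    def __and__(self, other):
        # Elementwise AND: keep a row only if both conditions hold
        return Mask(a and b for a, b in zip(self.flags, other.flags))

    def __or__(self, other):
        # Elementwise OR: keep a row if either condition holds
        return Mask(a or b for a, b in zip(self.flags, other.flags))


def select(rows, mask):
    """Return the rows whose mask flag is True."""
    return [row for row, keep in zip(rows, mask.flags) if keep]


# Made-up (origin, LOB) rows
rows = [(1994, "wkcomp"), (1994, "ppauto"), (1996, "wkcomp")]
early = Mask(origin < 1995 for origin, _ in rows)
wkcomp = Mask(lob == "wkcomp" for _, lob in rows)

both = select(rows, early & wkcomp)
either = select(rows, early | wkcomp)
```

Combining the masks first gives the same rows as the chained indexing that already works, and it also makes the "or" case expressible.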

Errors in projecting incomplete triangles

If you have multiple LOBs in a triangle, but not all LOBs cover the same time period, then for some LOBs the triangle fills the first rows and last columns with NaN and throws errors on computation.
e.g. a new LOB mixed with established LOB.
I got some div-by-zero errors and dimension errors when trying to apply the chainladder method.

json serializers for everything

Objects should serialize nicely to_json and from_json. This will be useful for adding RDBMS support for more sophisticated workflow management.
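A round trip of this kind could look roughly like the sketch below. `TinyTriangle` is a hypothetical, simplified container used only for illustration; it is not chainladder's Triangle, and the JSON layout is an assumption.

```python
import json


class TinyTriangle:
    """Hypothetical, simplified container; not chainladder's Triangle."""

    def __init__(self, origin, development, values):
        self.origin = list(origin)
        self.development = list(development)
        self.values = [list(row) for row in values]

    def to_json(self):
        # None marks cells that have not been observed yet
        return json.dumps({
            "origin": self.origin,
            "development": self.development,
            "values": self.values,
        })

    @classmethod
    def from_json(cls, text):
        payload = json.loads(text)
        return cls(payload["origin"], payload["development"], payload["values"])


tri = TinyTriangle([2017, 2018], [12, 24], [[100.0, 150.0], [110.0, None]])
restored = TinyTriangle.from_json(tri.to_json())
```

Because the serialized form is plain text, it can be stored in a database column and rebuilt later.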

SyntaxError: invalid syntax (core.py, line 178) when trying to import chainladder package

I have created a reprex project (https://rstudio.cloud/spaces/54476/project/1055533) in RStudio Cloud with a virtual environment for Python 3. When I try to load the chainladder package previously installed within that environment, I receive the following error message in RStudio Cloud:

> reticulate::repl_python()
Python 3.5.2 (/cloud/project/NSS/bin/python)
Reticulate 1.14.9001 REPL -- A Python interpreter in R.
>>> import chainladder as cl
SyntaxError: invalid syntax (core.py, line 178)

Here is the R script:

install.packages("devtools")
devtools::install_github("rstudio/reticulate")
library(reticulate)
virtualenv_create("NSS", python = "/usr/bin/python3")
virtualenv_install('NSS', 'chainladder')
use_virtualenv('NSS', required = T)

Here is the Python script:

import chainladder as cl

This works satisfactorily on my desktop version of R/RStudio. Any suggestions, please?

support pandas>=1.0.0

Biggest issue seems to be comparisons between PeriodIndex and Timestamp are no longer valid.

Ultimates is missing?

Did I miss something, or is Ultimates (method or attribute) missing from the Chainladder object? I looked at the code and it doesn't seem like it's there in the same form as in the documentation. I'm happy to add it if it needs to be added, or to change the documentation.

Thanks
Matt

Bokeh vs. HoloViews

So I agree that Bokeh is very pretty. But my experiments with it have left me a little bit overwhelmed with the complexity and large amounts of code necessary to get really pretty graphs.

Well, HoloViews to the rescue. HoloViews is essentially a wrapper for Bokeh (and matplotlib, for that matter), so you lose a little bit of flexibility, but writing graphs is much, much easier (IMHO).

Development fit producing incorrect result with quarterly data

The ldfs fit for quarterly data are incorrect.
A simple short example is below. With n_periods=1 the ldfs should be the same as the last diagonal/row of link_ratios, regardless of average='simple' or 'volume'. This appears to work only for the first quarter. Note that simple and volume (and regression) produce different results even with n_periods=1.

import chainladder as cl

data = cl.load_dataset('quarterly')
data = data['paid']
print(data.link_ratio)
print(cl.Development(n_periods=1, average='simple').fit(data).ldf_)
print(cl.Development(n_periods=1, average='volume').fit(data).ldf_)

Example is easier to see if you run in a notebook without using print().
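The expectation in this report follows from arithmetic: with a single period, the volume-weighted average is one numerator over one denominator, which is exactly that period's link ratio. A small sketch on a made-up cumulative triangle, with hypothetical helper names, shows the two averages agreeing for one period and diverging for more:

```python
def adjacent_pairs(triangle, age):
    """(value at `age`, value at `age + 1`) for every origin row that has both."""
    return [(row[age], row[age + 1]) for row in triangle if len(row) > age + 1]


def simple_average(pairs, n_periods):
    # Unweighted mean of the most recent link ratios
    recent = pairs[-n_periods:]
    return sum(nxt / prev for prev, nxt in recent) / len(recent)


def volume_average(pairs, n_periods):
    # Volume-weighted mean: sum of later values over sum of earlier values
    recent = pairs[-n_periods:]
    return sum(nxt for _, nxt in recent) / sum(prev for prev, _ in recent)


# Made-up cumulative triangle; each row is one origin period, ragged on the right
triangle = [
    [100.0, 180.0, 200.0],
    [120.0, 210.0],
    [90.0],
]

pairs = adjacent_pairs(triangle, age=0)
simple_1 = simple_average(pairs, n_periods=1)
volume_1 = volume_average(pairs, n_periods=1)
simple_all = simple_average(pairs, n_periods=2)
volume_all = volume_average(pairs, n_periods=2)
```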

Development.drop_high and drop_low

The high/low parameters drop all LDFs that match the max. In cases where all LDFs are the same (e.g. 1.0 in the tail), the method fails.
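The failure mode can be reproduced without the package. The sketch below contrasts the behaviour described here with one possible alternative that removes a single occurrence of the max; both function names are hypothetical.

```python
def drop_all_matching_max(ldfs):
    """Behaviour described in the issue: every value equal to the max is removed."""
    highest = max(ldfs)
    return [x for x in ldfs if x != highest]


def drop_single_max(ldfs):
    """One possible alternative: remove only the first occurrence of the max."""
    kept = list(ldfs)
    kept.remove(max(kept))
    return kept


tail_ldfs = [1.0, 1.0, 1.0]
emptied = drop_all_matching_max(tail_ldfs)   # nothing left to average
kept = drop_single_max(tail_ldfs)
average = sum(kept) / len(kept)
```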

More options on Development periods

In cl.Development, we need to add the ability to do:

  • Exclude the high, the low, or both from each development period's LDFs
  • Omit entire valuation dates
  • Omit a list of (origin, development) cells, e.g. [('1988', 24), ('1987', 48)]

Add attachment_age parameter to TailConstant

By default, tail factors are applied to the latest development date in a triangle. It would be nice to allow the user to override the default with a different age.

import chainladder as cl
cl.TailConstant(1.15, decay=.75, attachment_age=144)

Development lag calculation is off by one quarter

Importing data and creating a triangle with an origin grain of 'Y' and a development grain of 'Q' results in the development lag calculation being off by 3 months.

Recreate:
import pandas as pd
import chainladder as cl

# Renamed from `dict` so the builtin is not shadowed
data = {
    'state': ['ST'] * 21,
    'AY': ['2017'] * 11 + ['2018'] * 7 + ['2019'] * 3,
    'evaluation_date': [
        '3/31/2017', '6/30/2017', '9/30/2017', '12/31/2017',
        '3/31/2018', '6/30/2018', '9/30/2018', '12/31/2018',
        '3/31/2019', '6/30/2019', '9/30/2019',
        '3/31/2018', '6/30/2018', '9/30/2018', '12/31/2018',
        '3/31/2019', '6/30/2019', '9/30/2019',
        '3/31/2019', '6/30/2019', '9/30/2019'],
    'paid': list(range(3, 36, 3)) + list(range(3, 24, 3)) + list(range(3, 12, 3)),
}
df = pd.DataFrame(data)

triangle = cl.Triangle(df,
                       origin='AY',
                       development=['evaluation_date'],
                       index=['state'],
                       columns=['paid'])

print(triangle[triangle.development <= 6])

Adding a zero to QuarterEnd in _period_end in chainladder/core/base.py from:
offset = {12: pd.tseries.offsets.MonthEnd(), 4: pd.tseries.offsets.QuarterEnd(), 1: pd.tseries.offsets.YearEnd()}
to
offset = {12: pd.tseries.offsets.MonthEnd(), 4: pd.tseries.offsets.QuarterEnd(0), 1: pd.tseries.offsets.YearEnd()}
seems to fix the problem. It doesn't appear to be an issue with annual grains. I have not tested month grains.
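For reference, the lag expected for a yearly origin and a quarter-end valuation can be computed directly from the dates. The sketch below uses only the standard library and a hypothetical helper name; it states the expected answer and does not reproduce chainladder's internal calculation. In the reproduction above, the `paid` amounts happen to equal these expected lags.

```python
from datetime import date


def development_lag_months(origin_year, valuation):
    """Months from the start of the origin year to the end of the valuation month."""
    return (valuation.year - origin_year) * 12 + valuation.month


first = development_lag_months(2017, date(2017, 3, 31))
last = development_lag_months(2017, date(2019, 9, 30))
newest = development_lag_months(2019, date(2019, 9, 30))
```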

Tutorial example fit function not working

Hi,

I am trying to learn how to use your module but I am having issues with your example. I could not run some of the Development methods and thus I decided to try only with your included dataset. However it still does not work.

I have copied and pasted the following code:

import chainladder as cl

genins = cl.load_dataset('genins')

genins_dev = cl.Pipeline(
    [('dev', cl.Development()),
     ('tail', cl.TailCurve())]).fit_transform(genins)

This works.

Then when I run genins_model = cl.Chainladder().fit(genins_dev), I get the error message below:
'NaTType' object has no attribute 'to_timestamp'

Would you know where this comes from?

Best,
Nassim

MunichAdjustment + Tail Factor

This code does not work, but it needs to, and it should tie out to MunichChainLadder with a tail in the R package.

import chainladder as cl
tri = cl.load_dataset('mcl')
cl.MunichAdjustment(paid_to_incurred={'paid':'incurred'}) \
   .fit(cl.TailCurve().fit_transform(cl.Development().fit_transform(tri)))

Add a decay assumption to TailConstant

TailCurve decays over an annual period so that run-off expectations can be developed for origin periods that are at the end of the Triangle. TailConstant needs this same functionality with an explicit assumption at object initialization.
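One way to picture an explicit decay assumption is to split the constant tail into per-period factors whose development shrinks geometrically. The scheme below is purely illustrative and is an assumption of this sketch; it is not taken from chainladder and may differ from whatever formula the package adopts. The function name is hypothetical.

```python
import math


def decayed_tail_factors(tail, decay, n_periods):
    """Split `tail` into per-period factors whose log-development shrinks
    by `decay` each period.

    Illustrative scheme only: period k receives the share
    (1 - decay) * decay**k of log(tail), so the shares sum to
    1 - decay**n_periods and approach 1 as n_periods grows.
    """
    log_tail = math.log(tail)
    return [math.exp(log_tail * (1 - decay) * decay ** k) for k in range(n_periods)]


factors = decayed_tail_factors(tail=1.15, decay=0.75, n_periods=60)
cumulative = math.prod(factors)  # close to the original tail of 1.15
```

With such a split, an origin period that is already part-way through the tail would only receive the product of the remaining factors.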

Triangle Class - Why not use pandas.Period?

So perusing through this code I have a few suggestions. Gonna make a separate issue for each.

For the triangle class, why not use the pandas.Period type for the origin labels of the pandas dataframe? The advantages are that Period already has all of the fancy date math built in, and it's pretty easy to convert an array of dates or datetimes into Periods.

For example...

import pandas as pd

dates = pd.Series(['2017-06-29', '2015-03-21', '2016-10-15',
                   '2017-12-31', '2017-01-01', '2016-12-31',
                   '2017-09-30', '2017-09-01', '2017-10-01'])
dates = pd.DatetimeIndex(dates)

pd.PeriodIndex(dates, freq='A')  # Convert dates to year ending in December periods

pd.PeriodIndex(dates, freq='A-SEP')  # Convert dates to year ending in September periods

pd.PeriodIndex(dates, freq='Q')  # Convert dates to quarter periods

pd.PeriodIndex(dates, freq='M')  # Convert dates to month periods

A full list of the available options is here.

drop_high and drop_low to support lists

drop_high and drop_low are boolean and are an all or none selection for the Development estimator. Consider allowing a list with a length of our development age - 1 to allow these selections by different ages.
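A sketch of what per-age selections might look like, assuming one flag per column of link ratios. The function name and the bool-or-list handling are hypothetical and only illustrate the proposal.

```python
def average_ldfs(link_ratios_by_age, drop_high):
    """Average link ratios per age, dropping the single highest where flagged.

    `drop_high` may be one bool (applied to every age) or a list with one
    bool per development age.
    """
    n_ages = len(link_ratios_by_age)
    flags = [drop_high] * n_ages if isinstance(drop_high, bool) else list(drop_high)
    if len(flags) != n_ages:
        raise ValueError("drop_high needs one flag per development age")

    averages = []
    for ratios, flag in zip(link_ratios_by_age, flags):
        kept = sorted(ratios)
        if flag and len(kept) > 1:
            kept = kept[:-1]  # discard the highest ratio for this age only
        averages.append(sum(kept) / len(kept))
    return averages


# Made-up link ratios for three development ages
ratios = [[2.0, 1.8, 2.5], [1.2, 1.3], [1.05]]
all_ages = average_ldfs(ratios, drop_high=True)
first_age_only = average_ldfs(ratios, drop_high=[True, False, False])
```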

bootstrap hat matrix inversion problem on some triangles

When the model is over-parameterized, we sometimes run into hat matrix inversion problems that cause BootstrapODPSample to fail. Here is some offending code:

import chainladder as cl
cl.BootstrapODPSample().fit(cl.load_dataset('quarterly')['paid'].iloc[0,0])

groupby not consistent with pandas

offending code throws an error, but doesn't error in a similar pandas construct:

import chainladder as cl
raa = cl.load_dataset('raa')
raa.groupby('Total').sum()

improve origin/development inference on Triangle class

date inference relies heavily on how pandas.to_datetime infers date-like columns. It is not perfect, so many date-like styles are not supported.

# Create a dataframe with a bunch of date-like columns
import pandas as pd
df = pd.Series(['2019-01-15','2018-07-15','2019-05-05','2019-10-01'],
               dtype='datetime64[ns]', name='date').to_frame()
df['year'] = df['date'].dt.year
df['quarter'] = df['date'].dt.quarter
df['month'] = df['date'].dt.month
df['year_month']=df['date'].dt.strftime('%Y%m').astype(int)
df['year_quarter1']=df['year'].astype(str) + 'Q' + df['quarter'].astype(str)
df['year_quarter2']=df['year'].astype(str) + '-Q' + df['quarter'].astype(str)
df['quarter_year'] = 'Q' + df['quarter'].astype(str) + df['year'].astype(str)
df['date_as_str1'] = df['date'].dt.strftime('%Y-%m-%d')
df['date_as_str2'] = df['date'].dt.strftime('%m/%d/%Y')

# See how chainladder infers them
import chainladder as cl  

cl.Triangle.to_datetime(df, ['year','quarter']) # works, but treats quarters as month
cl.Triangle.to_datetime(df, ['year','month'])  # works
cl.Triangle.to_datetime(df, ['date'])   # works
cl.Triangle.to_datetime(df, ['year_quarter1'])  # works
cl.Triangle.to_datetime(df, ['year_quarter2'])  # works
cl.Triangle.to_datetime(df, ['date_as_str1'])  # works
cl.Triangle.to_datetime(df, ['date_as_str2'])  # works
# cl.Triangle.to_datetime(df, ['quarter_year']) # doesn't work, but should
# cl.Triangle.to_datetime(df, ['Month', 'Day', 'Year']) # doesn't work, but should
# cl.Triangle.to_datetime(df, ['Year','Month', 'Day']) # doesn't work, but should
# cl.Triangle.to_datetime(df, ['Quarter', 'Year']) # doesn't work, but probably should

An improvement would be to expose the format argument to the user, e.g. origin_format and development_format in Triangle.__init__, so that they can override the auto-inference.
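The benefit of an explicit format is that parsing no longer depends on inference. A minimal standard-library sketch follows; both helper names are hypothetical, and the 'Q32018' handling is an assumed convention for the quarter_year style shown above.

```python
import re
from datetime import datetime


def parse_with_format(value, fmt):
    """Parse a date string using an explicit strptime format instead of inference."""
    return datetime.strptime(value, fmt)


def parse_quarter_year(value):
    """Parse labels such as 'Q32018' into the first day of that quarter."""
    match = re.fullmatch(r"Q([1-4])(\d{4})", value)
    if match is None:
        raise ValueError(f"not a quarter-year label: {value!r}")
    quarter, year = int(match.group(1)), int(match.group(2))
    return datetime(year, 3 * (quarter - 1) + 1, 1)


us_style = parse_with_format("01/15/2019", "%m/%d/%Y")
q_label = parse_quarter_year("Q32018")
```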

Triangle Class - Internal Storing of Data

The triangle class allows you to import data in 2 manners:

  • Tabular or "long" format
  • Triangle format

I would suggest that, while you can import the data in either format, Triangle should internally reshape the data into one of the two. I'm somewhat ambivalent about which one is more appropriate, although I think I have a slight preference for the triangle format. I'm not fully convinced one way or the other.

My objection to having the class store it in either format is that every resulting method or function first needs to know whether the triangle is in tabular or triangular format, and you have to write separate code for each use case. If we store the data internally in one format, this problem is eliminated.
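Converting between the two layouts is mechanical, which is what makes a single internal representation practical. A sketch with hypothetical helper names and made-up records:

```python
def long_to_triangle(records):
    """Pivot (origin, development, value) records into a row-per-origin grid."""
    origins = sorted({o for o, _, _ in records})
    ages = sorted({d for _, d, _ in records})
    lookup = {(o, d): v for o, d, v in records}
    # Cells with no record stay None, which produces the empty lower-right corner
    grid = [[lookup.get((o, d)) for d in ages] for o in origins]
    return origins, ages, grid


def triangle_to_long(origins, ages, grid):
    """Inverse of long_to_triangle: emit one record per observed cell."""
    return [
        (o, d, value)
        for o, row in zip(origins, grid)
        for d, value in zip(ages, row)
        if value is not None
    ]


records = [(2017, 12, 100.0), (2017, 24, 150.0), (2018, 12, 110.0)]
origins, ages, grid = long_to_triangle(records)
round_trip = triangle_to_long(origins, ages, grid)
```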

User-specified column order should be honored

Like pandas, a user should be able to rearrange the order of columns in a triangle.

A failing unit test:

import chainladder as cl
clrd = cl.load_dataset('clrd')
a = clrd[['CumPaidLoss','BulkLoss']]
b = clrd[['BulkLoss', 'CumPaidLoss']]
assert not a.values.all() == b.values.all()

cdf_ labels should be 'xx-Ult'

cdf_.development copies from ldf_.development, but the preference would be to relabel them as age to ultimate labels.
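The relabelling reflects what a CDF is: the product of all age-to-age factors from a given age onward. A small sketch with made-up factors and a hypothetical function name:

```python
def cdf_from_ldf(ages, ldfs):
    """Build age-to-ultimate factors and labels from age-to-age factors.

    `ages` holds the starting age of each factor; the last factor is assumed
    to run to ultimate (an assumption of this sketch).
    """
    cdfs = []
    running = 1.0
    for ldf in reversed(ldfs):
        running *= ldf          # accumulate from the oldest age backwards
        cdfs.append(running)
    cdfs.reverse()
    labels = [f"{age}-Ult" for age in ages]
    return dict(zip(labels, cdfs))


cdf = cdf_from_ldf(ages=[12, 24, 36], ldfs=[2.0, 1.5, 1.1])
```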

Problems converting a triangle to a higher grain using the grain method

I created a dataframe with OriginDates and Development Dates, from 01-01-2017 to 31-12-2019, and a payment value of 1 for each Origin Month-Development Month.

Then, the incremental triangle with Origin in Months and Development in Months will have a value of 1 in each cell. I created this triangle with the chainladder package and it is ok.

However, when I try to transform the incremental triangle to Origin in Years and Development in Years, using the grain method, the result is:
        11     23     35
2017  66.0  144.0  144.0
2018  66.0  144.0    NaN
2019  66.0    NaN    NaN

But to be correct, it should be:
        12     24     36
2017  78.0  144.0  144.0
2018  78.0  144.0    NaN
2019  78.0    NaN    NaN

I also tried to transform first the triangle into cumulative, but it still doesn't show the expected triangle.

Code:
import pandas as pd
import chainladder as cl

datesm = pd.date_range('2017-01-01', '2019-12-31', freq='M')

origin_months = datesm.repeat(repeats=list(range(36, 0, -1)))

# Each origin month develops through every later month up to 2019-12
dev_months = list()
for i in range(len(datesm)):
    dev_months.extend(datesm[i:len(datesm)])

incremental_payments = [1] * len(origin_months)

# Renamed from `dict` so the builtin is not shadowed
data = {'AccidentMonths': origin_months, 'DevelopmentMonths': dev_months, 'Payments': incremental_payments}
testdf = pd.DataFrame(data)

xMM = cl.Triangle(data=testdf, origin='AccidentMonths', development='DevelopmentMonths', columns='Payments')

xYY = xMM.grain('OYDY')

xMM
xYY

Thank you!
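The expected figures can be checked independently of the package by summing the monthly incrementals into yearly cells. The sketch below rebuilds the same data with the standard library only; the helper name is hypothetical, and it verifies the arithmetic rather than reproducing how the grain method works.

```python
from collections import defaultdict


def monthly_to_yearly(records):
    """Sum monthly incremental records into (origin year, age in months) cells.

    Each record is (origin_year, origin_month, valuation_year, valuation_month, amount).
    """
    cells = defaultdict(float)
    for o_year, _o_month, v_year, _v_month, amount in records:
        age = 12 * (v_year - o_year + 1)  # 12 for the origin year itself, 24 for the next
        cells[(o_year, age)] += amount
    return dict(cells)


# Every origin month from 2017-01 to 2019-12 pays 1 in each month
# from the origin month through 2019-12
months = [(y, m) for y in (2017, 2018, 2019) for m in range(1, 13)]
records = [
    (oy, om, vy, vm, 1.0)
    for i, (oy, om) in enumerate(months)
    for (vy, vm) in months[i:]
]

yearly = monthly_to_yearly(records)
```

The first development year holds 12 + 11 + ... + 1 = 78 payments, and each later year holds 12 × 12 = 144.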

PEP8 - Lots o changes

Much of the code does not conform to PEP8. Fully understand this is a free tool and all, but PEP8 provides a nice (mostly) standardized way to write Python. I think it's unnecessary (and potentially dangerous) to make PRs for the sole purpose of major PEP8 fixes. It would make more sense to just fix the problems as the code is refactored.

To that end, would there be objections to having one of the CI builds just run a linter/flake8 on the diff between the PR and master? Another good encouragement tool.

Note that most python editors I'm aware of have PEP8 style analysis/linters either built in or as easily installable plugins. Atom and spyder are 2 I use regularly and both do it very nicely.

cl.concat

Should have cl.concat analogous to pd.concat

also Triangle.append needs to work better when merging triangles with differing origin/development axes.

Slicing cdf not working correctly

offending code:

import chainladder as cl
raa = cl.load_dataset('raa')
cdf = cl.Development().fit(raa).cdf_
cdf[cdf.origin==cdf.origin.max()]

Propagate bootstrap process risk

Need to complete bootstrap functionality to allow for propagation of process risk into the deterministic methods. Currently bootstrap only supports the simulation of new triangles, but does not simulate IBNR with requisite process uncertainty.
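As a rough illustration of what propagating process risk could involve, each expected future incremental can be replaced by a random draw whose mean is the expected value and whose variance scales with it. The sketch below uses a gamma distribution for that draw, which is an assumption of this example and not a statement of how the package does or will do it; the function name, the expected values and the dispersion parameter are all made up.

```python
import random


def add_process_risk(expected, dispersion, rng):
    """Draw one incremental payment with mean `expected` and
    variance `dispersion * expected`.

    A gamma draw is used here as a stand-in for the over-dispersed Poisson;
    that choice is an assumption of this sketch.
    """
    if expected <= 0:
        return 0.0
    shape = expected / dispersion
    # mean = shape * scale, variance = shape * scale**2
    return rng.gammavariate(shape, dispersion)


rng = random.Random(42)
expected_incrementals = [500.0, 250.0, 100.0]   # made-up future cells for one origin
dispersion = 20.0                               # made-up scale parameter

simulated_ibnr = [
    sum(add_process_risk(m, dispersion, rng) for m in expected_incrementals)
    for _ in range(5000)
]
mean_ibnr = sum(simulated_ibnr) / len(simulated_ibnr)
```

The simulated mean stays near the deterministic IBNR of 850, while the spread of `simulated_ibnr` reflects the added process uncertainty.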

TailCurve shape misaligned

offending code:

import chainladder as cl
tri = cl.load_dataset('raa')
steps = [('dev', cl.Development()),     
         ('tail', cl.TailCurve()), 
         ('chainladder', cl.Chainladder())]           
pipe = cl.Pipeline(steps=steps).fit(tri)
pipe.predict(tri)

MackChainladder.total_mack_std_err_ TypeError

Current property is incompatible with PeriodIndex datatype for origin.
This code should work but doesn't:

import chainladder as cl
clrd = cl.load_dataset('clrd').groupby('LOB').sum()['CumPaidLoss']
cl.MackChainladder().fit(clrd).total_mack_std_err_.to_frame()

Either need to assign origin to a valuation_date or ultimate date, or return a DataFrame and not a Triangle object.

Load datasets fails

Tried some of the examples listed in the 2.0.1 documentation.

In the pd.read_pickle() call the input name is converted to lowercase, but when installed from pip many of the datasets are named with uppercase letters, which gives an error on many of the nice examples provided on Read the Docs.

Using read_pickle directly on the path where the package is installed works.

arithmetic fail

offending code throws error:

import chainladder as cl
raa = cl.load_dataset('raa')
raa.latest_diagonal / raa

dropna + grain

dropna will create triangles with odd shapes and these will not always work with grain. Need to make grain function work better with these odd shapes.
