
derrynknife / surpyval


A Python package for survival analysis. The most flexible survival analysis package available. SurPyval can work with arbitrary combinations of observed, censored, and truncated data. SurPyval can also fit distributions with 'offsets' with ease, for example the three parameter Weibull distribution.

Home Page: https://surpyval.readthedocs.io/en/latest/index.html

License: MIT License

Python 99.16% TeX 0.84%
survival-analysis parametric-methods probability-plot parametric-distribution weibull reliability reliability-engineering churn-prediction non-parametric risk-analysis

surpyval's People

Contributors

anthonycarbone, derrynknife, dfm


surpyval's Issues

Left-censored data error

Hello,

I am trying to analyse left-censored data with the non-parametric Kaplan-Meier estimator, but I get the following error:

import numpy as np
import surpyval as surv
x = np.array([20,20,50,50,70])
c = np.array([-1,-1,-1,-1,-1])
km_surpyval = surv.KaplanMeier.fit(x=x,c=c)
km_surpyval.plot()

The error:

ValueError: xrd format can't be used with left (c=-1) or interval (c=2) censoring

What are the alternatives?
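
One general alternative, independent of surpyval, is the "reverse" Kaplan-Meier approach: flipping every value about the maximum turns left censoring into right censoring, so the usual product-limit estimator applies. The sketch below is a plain-Python illustration of that idea, not surpyval's implementation; the function name and the mixed data are hypothetical. With every observation left-censored, as in the report above, there are no exact values for the product-limit estimator to work with, so the example uses three exact and two left-censored values.

```python
def reverse_kaplan_meier(x, c):
    """Estimate P(X < x) from exact (c=0) and left-censored (c=-1) values.

    Flipping each value about the maximum turns left censoring into right
    censoring, so the ordinary product-limit estimator can be used.
    """
    flip = max(x)
    y = [flip - xi for xi in x]  # flipped observations
    # Distinct flipped times at which an exact observation occurs
    event_times = sorted({yi for yi, ci in zip(y, c) if ci == 0})

    surv = 1.0
    estimate = {}
    for t in event_times:
        at_risk = sum(1 for yi in y if yi >= t)
        events = sum(1 for yi, ci in zip(y, c) if yi == t and ci == 0)
        surv *= 1.0 - events / at_risk
        # S_Y(t) = P(Y > t) = P(X < flip - t)
        estimate[flip - t] = surv
    return estimate


# Hypothetical data: three exact values and two left-censored ones
x = [20, 20, 50, 50, 70]
c = [0, -1, 0, -1, 0]
for value, prob in sorted(reverse_kaplan_meier(x, c).items()):
    print(f"P(X < {value}) ~ {prob:.2f}")
```

The surpyval documentation also describes a Turnbull estimator for censored data, which may be the more direct route inside the package.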

Confidence bounds.

The confidence bounds need improvement. They work, but only for confidence on the reliability. Uncertainty on the offset value should also be optional.

10.0.1 lost fs_to_xcn

I cannot call fs_to_xcn in version 10.0.1; I had to go back to 10.0.0 to get the function. I am using right censored data.
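
Until the function is available again, a stand-in along the following lines may be enough for right-censored data. This is a hypothetical sketch, not surpyval's own `fs_to_xcn`: it assumes `f` holds failure times, `s` holds suspension (right-censored) times, and that right censoring is flagged with `c=1` and exact failures with `c=0`.

```python
from collections import Counter


def fs_to_xcn_sketch(f, s):
    """Convert failures f and suspensions s into x, c, n lists.

    Assumed convention: c=0 marks an observed failure, c=1 a right-censored
    value, and n counts how often each (x, c) pair occurs.
    """
    counts = Counter((value, 0) for value in f)
    counts.update((value, 1) for value in s)
    keys = sorted(counts)  # order by time, then by censoring flag
    x = [key[0] for key in keys]
    c = [key[1] for key in keys]
    n = [counts[key] for key in keys]
    return x, c, n


x, c, n = fs_to_xcn_sketch(f=[1, 2, 2], s=[2, 3])
print(x, c, n)
```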

Functions ExpoWeibull, LogLogistic, and LogNormal fail to compute confidence bounds

When fitting ExpoWeibull, LogLogistic, or LogNormal with maximum likelihood estimation, the method fails to compute confidence intervals. See the example below.

from surpyval import ExpoWeibull, LogLogistic, LogNormal
for dist in [ExpoWeibull, LogLogistic, LogNormal]:
    # ExpoWeibull takes three parameters, the other two take two
    if dist is ExpoWeibull:
        x = dist.random(100, 1, 1, 1)
    else:
        x = dist.random(100, 1, 1)
    # Fit the same distribution the sample was drawn from
    model = dist.fit(x)
    print(model.cb(0))

In all three cases the confidence bound is output as:

[[nan nan]]

Gumbel distribution

I was looking at the survival function of the Gumbel distribution, and I found that it is defined as:

S(x) = np.exp(-np.exp((x - mu)/sigma))

Shouldn't it be:

S(x) = 1 - np.exp(-np.exp(-(x - mu)/sigma))

I changed it manually in the code, and when I fitted the function the problem was not solved (i.e., the fitting is still done with the first equation shown above). Any suggestions on what to do?
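
For reference, both expressions are valid survival functions: the first belongs to the minimum (smallest extreme value) form of the Gumbel distribution, the second to the maximum form. The short check below, with arbitrary illustrative values for `mu` and `sigma`, shows that the two forms are mirror images of each other. Which form the package intends is a question for the maintainers.

```python
import math


def sf_min(x, mu, sigma):
    """Survival function of the Gumbel (minimum) form."""
    return math.exp(-math.exp((x - mu) / sigma))


def cdf_max(x, mu, sigma):
    """CDF of the Gumbel (maximum) form."""
    return math.exp(-math.exp(-(x - mu) / sigma))


def sf_max(x, mu, sigma):
    """Survival function of the Gumbel (maximum) form."""
    return 1.0 - cdf_max(x, mu, sigma)


mu, sigma = 10.0, 2.0  # arbitrary illustrative values
for x in [4.0, 8.0, 10.0, 12.0, 16.0]:
    # Reflecting x and mu maps the minimum form onto the maximum form
    assert math.isclose(sf_min(x, mu, sigma), cdf_max(-x, -mu, sigma))
    print(x, round(sf_min(x, mu, sigma), 4), round(sf_max(x, mu, sigma), 4))
```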

Possible documentation error

In docs/Non-Parametric Estimation.rst, under Turnbull Estimation, is the equation for M(s) correct? I wouldn't have expected M(s) to be in the denominator on the right hand side. Perhaps it was an inadvertent carryover from the previous equation?

This is how it is now:
[image: the current equation for M(s) in the docs]

[Joss review] Test tolerance and sample sizes

I'd like to see tighter tolerances in the tests, and tests with smaller sample sizes. Currently the TOL is set to 0.2, which is quite a large margin and practically useless in some cases. The minimum sample size is 5000, which is medium-sized data, but how well do the algorithms work for small data sets (fewer than 100 observations, for example)?

openjournals/joss-reviews#3484

model.plot() not working

Hello, I am new to this library. I have been looking for one that handles interval-censored data and came across it.
I quite like SurPyval, but I have not been able to visualize anything yet, as I keep getting this error and an empty plot:

"ValueError: keyword grid_b is not recognized; valid keywords are ['size', 'width', 'color', 'tickdir', 'pad', 'labelsize', 'labelcolor', 'labelfontfamily', 'zorder', 'gridOn', 'tick1On', 'tick2On', 'label1On', 'label2On', 'length', 'direction', 'left', 'bottom', 'right', 'top', 'labelleft', 'labelbottom', 'labelright', 'labeltop', 'labelrotation', 'grid_agg_filter', 'grid_alpha', 'grid_animated', 'grid_antialiased', 'grid_clip_box', 'grid_clip_on', 'grid_clip_path', 'grid_color', 'grid_dash_capstyle', 'grid_dash_joinstyle', 'grid_dashes', 'grid_data', 'grid_drawstyle', 'grid_figure', 'grid_fillstyle', 'grid_gapcolor', 'grid_gid', 'grid_in_layout', 'grid_label', 'grid_linestyle', 'grid_linewidth', 'grid_marker', 'grid_markeredgecolor', 'grid_markeredgewidth', 'grid_markerfacecolor', 'grid_markerfacecoloralt', 'grid_markersize', 'grid_markevery', 'grid_mouseover', 'grid_path_effects', 'grid_picker', 'grid_pickradius', 'grid_rasterized', 'grid_sketch_params', 'grid_snap', 'grid_solid_capstyle', 'grid_solid_joinstyle', 'grid_transform', 'grid_url', 'grid_visible', 'grid_xdata', 'grid_ydata', 'grid_zorder', 'grid_aa', 'grid_c', 'grid_ds', 'grid_ls', 'grid_lw', 'grid_mec', 'grid_mew', 'grid_mfc', 'grid_mfcalt', 'grid_ms']"

Is anyone else experiencing this?

Mixture Models

Hi,
I find this Python package very helpful for analyzing Weibull data. However, I found that the documentation on "Mixture models" could be more detailed.
I was using the following example:

import surpyval as surv
import numpy as np
from matplotlib import pyplot as plt

x = [1, 2, 3, 4, 5, 6, 6, 7, 8, 10, 13, 15, 16, 17 ,17, 18, 19]
x_ = np.linspace(np.min(x), np.max(x))

model = surv.Weibull.fit(x)
wmm = surv.MixtureModel(x=x, dist=surv.Weibull, m=2)

model.plot(plot_bounds=False)
plt.plot(x_, wmm.ff(x_))

However,
(1) I could not work out what percentage of the data (or which data points) is assigned to the first Weibull distribution and what percentage to the second. wmm.params gives me the parameters, but not the fraction of the data from which each set was calculated. How do we get the individual data for each Weibull distribution? It seems the code currently just splits the entire data set into two equal arrays for Weibull fitting.
(2) How do we get the goodness-of-fit statistics for each Weibull? How do we know whether a fit is good or bad?
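
On question (1): in a mixture model, points are generally not split into hard groups. Each point instead has a membership probability for every component, computed from the component weights and densities. The sketch below shows that calculation for a two-component Weibull mixture. The parameters and weights are hypothetical placeholders rather than values fitted by surpyval, and the report does not show which attribute, if any, exposes the fitted weights.

```python
import math


def weibull_pdf(x, alpha, beta):
    """Two-parameter Weibull density with scale alpha and shape beta."""
    return (beta / alpha) * (x / alpha) ** (beta - 1) * math.exp(-((x / alpha) ** beta))


def responsibilities(x, params, weights):
    """Probability that x belongs to each component of the mixture."""
    weighted = [w * weibull_pdf(x, a, b) for (a, b), w in zip(params, weights)]
    total = sum(weighted)
    return [value / total for value in weighted]


# Hypothetical (alpha, beta) pairs and mixing weights, for illustration only
params = [(4.0, 2.0), (16.0, 6.0)]
weights = [0.4, 0.6]

x = [1, 2, 3, 4, 5, 6, 6, 7, 8, 10, 13, 15, 16, 17, 17, 18, 19]
for xi in x:
    r = responsibilities(xi, params, weights)
    print(xi, [round(value, 3) for value in r])
```

Averaging the membership probabilities over all points gives an estimate of the share of the data attributed to each component.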

[JOSS review] License clarification

Hi @derrynknife, I'm one of the reviewers for your JOSS article. I can see the project is still being worked on, so I'll hold back some of my review for a few weeks until it stabilizes more. I'll create some minor issues for now, though.

In the MIT license in this repo, you have The Python Packaging Authority as the holder - I suspect you instead want yourself there, unless you indeed do want the Python Packaging Authority to have the copyright?

openjournals/joss-reviews#3484

Save/Load surpyval

Hello,

I am trying to save a survival model locally, but I get an error. Here is an example:

import surpyval as surv
import numpy as np
import pickle
from joblib import dump

np.random.seed(10)
x = surv.Weibull.random(50, 30., 9.)
model = surv.Weibull.fit(x)
results = {'model': model}

# Using pickle
with open('surpyval_model', 'wb') as fh:
    pickle.dump(results, fh)
# Using joblib
dump(model, 'surpyval_model.joblib')

Neither method works. These are the errors I get:

Using pickle
AttributeError: Can't pickle local object 'bounds_convert.<locals>.transform'
Using dump
PicklingError: Can't pickle <function bounds_convert.<locals>.transform at 0x7f98c4178a60>: it's not found as surpyval.parametric.fitters.bounds_convert.<locals>.transform

Any advice?
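
The error messages point at a function defined inside another function (`bounds_convert.<locals>.transform`), which the standard pickle module cannot serialize. The sketch below reproduces that limitation with a hypothetical nested function and shows one possible workaround: storing only the fitted parameters in a plain format such as JSON. The dictionary values are placeholders standing in for whatever the fitted model reports, and rebuilding a model from them is left to the library's own API.

```python
import json
import pickle


def make_transform(lower):
    """Return a nested function, similar in shape to the one in the error."""
    def transform(value):
        return max(value, lower)
    return transform


def can_pickle(obj):
    """Report whether pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError, TypeError):
        return False
    return True


# A function defined inside another function cannot be pickled
print(can_pickle(make_transform(0.0)))

# Workaround sketch: keep only plain numbers (placeholder values here)
params = {"dist": "Weibull", "alpha": 30.0, "beta": 9.0}
payload = json.dumps(params)
restored = json.loads(payload)
print(restored == params)
```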

add truncation to MPS

If scalars are used for the left and/or right truncation, the maximum product of spacings (MPS) method can work.

Examples not working

I'm having trouble with the examples you provide on the GitHub and documentation sites. In a new Python 3.10 virtual environment I tried running:

from surpyval import Weibull
from surpyval.datasets import BoforsSteel

# Fetch some data that comes with SurPyval
data = BoforsSteel.df

x = data['x']
n = data['n']

model = Weibull.fit(x=x, n=n, offset=True)
model.plot();

and received the following error message:


AttributeError Traceback (most recent call last)
Cell In[2], line 5
2 from surpyval.datasets import BoforsSteel
4 # Fetch some data that comes with SurPyval
----> 5 data = BoforsSteel.df
7 x = data['x']
8 n = data['n']

AttributeError: 'BoforsSteel_' object has no attribute 'df'

From the documentation site I tried:

import surpyval as surv
import numpy as np

np.random.seed(10)
x = surv.Weibull.random(50, 30., 9.)
model = surv.Weibull.fit(x)
print(model)
model.plot();

and received the following error message:


ValueError Traceback (most recent call last)
Cell In[1], line 8
6 model = surv.Weibull.fit(x)
7 print(model)
----> 8 model.plot()

File ~/env/surpyval/lib/python3.10/site-packages/surpyval/parametric/parametric.py:1141, in Parametric.plot(self, heuristic, plot_bounds, alpha_ci, ax)
1138 ax.set_xticks(d['x_minor_ticks'], minor=True)
1139 ax.set_xticklabels([], minor=True)
-> 1141 ax.grid(b=True, which='major', color='g', alpha=0.4, linestyle='-')
1142 ax.grid(b=True, which='minor', color='g', alpha=0.1, linestyle='-')
1144 ax.set_title('{} Probability Plot'.format(self.dist.name))

File ~/env/surpyval/lib/python3.10/site-packages/matplotlib/axes/_base.py:3194, in _AxesBase.grid(self, visible, which, axis, **kwargs)
3192 _api.check_in_list(['x', 'y', 'both'], axis=axis)
3193 if axis in ['x', 'both']:
-> 3194 self.xaxis.grid(visible, which=which, **kwargs)
3195 if axis in ['y', 'both']:
3196 self.yaxis.grid(visible, which=which, **kwargs)

File ~/env/surpyval/lib/python3.10/site-packages/matplotlib/axis.py:1660, in Axis.grid(self, visible, which, **kwargs)
1657 if which in ['major', 'both']:
1658 gridkw['gridOn'] = (not self._major_tick_kw['gridOn']
1659 if visible is None else visible)
-> 1660 self.set_tick_params(which='major', **gridkw)

1661 self.stale = True

File ~/env/surpyval/lib/python3.10/site-packages/matplotlib/axis.py:932, in Axis.set_tick_params(self, which, reset, **kwargs)
919 """
920 Set appearance parameters for ticks, ticklabels, and gridlines.
921
(...)
929 gridlines.
930 """
931 _api.check_in_list(['major', 'minor', 'both'], which=which)
--> 932 kwtrans = self._translate_tick_params(kwargs)
934 # the kwargs are stored in self._major/minor_tick_kw so that any
935 # future new ticks will automatically get them
936 if reset:

File ~/env/surpyval/lib/python3.10/site-packages/matplotlib/axis.py:1076, in Axis._translate_tick_params(kw, reverse)
1074 for key in kw_:
1075 if key not in allowed_keys:
-> 1076 raise ValueError(
1077 "keyword %s is not recognized; valid keywords are %s"
1078 % (key, allowed_keys))
1079 kwtrans.update(kw_)
1080 return kwtrans

ValueError: keyword grid_b is not recognized; valid keywords are ['size', 'width', 'color', 'tickdir', 'pad', 'labelsize', 'labelcolor', 'zorder', 'gridOn', 'tick1On', 'tick2On', 'label1On', 'label2On', 'length', 'direction', 'left', 'bottom', 'right', 'top', 'labelleft', 'labelbottom', 'labelright', 'labeltop', 'labelrotation', 'grid_agg_filter', 'grid_alpha', 'grid_animated', 'grid_antialiased', 'grid_clip_box', 'grid_clip_on', 'grid_clip_path', 'grid_color', 'grid_dash_capstyle', 'grid_dash_joinstyle', 'grid_dashes', 'grid_data', 'grid_drawstyle', 'grid_figure', 'grid_fillstyle', 'grid_gapcolor', 'grid_gid', 'grid_in_layout', 'grid_label', 'grid_linestyle', 'grid_linewidth', 'grid_marker', 'grid_markeredgecolor', 'grid_markeredgewidth', 'grid_markerfacecolor', 'grid_markerfacecoloralt', 'grid_markersize', 'grid_markevery', 'grid_mouseover', 'grid_path_effects', 'grid_picker', 'grid_pickradius', 'grid_rasterized', 'grid_sketch_params', 'grid_snap', 'grid_solid_capstyle', 'grid_solid_joinstyle', 'grid_transform', 'grid_url', 'grid_visible', 'grid_xdata', 'grid_ydata', 'grid_zorder', 'grid_aa', 'grid_c', 'grid_ds', 'grid_ls', 'grid_lw', 'grid_mec', 'grid_mew', 'grid_mfc', 'grid_mfcalt', 'grid_ms']

List of packages installed and versions:
Package Version


anyio 3.7.0
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
arrow 1.2.3
astor 0.8.1
asttokens 2.2.1
attrs 23.1.0
autograd 1.6.2
autograd-gamma 0.5.0
backcall 0.2.0
beautifulsoup4 4.12.2
bleach 6.0.0
cffi 1.15.1
comm 0.1.3
contourpy 1.1.0
cycler 0.11.0
debugpy 1.6.7
decorator 5.1.1
defusedxml 0.7.1
exceptiongroup 1.1.1
executing 1.2.0
fastjsonschema 2.17.1
fonttools 4.40.0
formulaic 0.6.2
fqdn 1.5.1
future 0.18.3
idna 3.4
interface-meta 1.3.0
ipykernel 6.23.3
ipython 8.14.0
ipython-genutils 0.2.0
ipywidgets 8.0.6
isoduration 20.11.0
jedi 0.18.2
Jinja2 3.1.2
jsonpointer 2.4
jsonschema 4.17.3
jupyter 1.0.0
jupyter_client 8.3.0
jupyter-console 6.6.3
jupyter_core 5.3.1
jupyter-events 0.6.3
jupyter_server 2.6.0
jupyter_server_terminals 0.4.4
jupyterlab-pygments 0.2.2
jupyterlab-widgets 3.0.7
kiwisolver 1.4.4
llvmlite 0.40.1
MarkupSafe 2.1.3
matplotlib 3.7.1
matplotlib-inline 0.1.6
mistune 3.0.1
nbclassic 1.0.0
nbclient 0.8.0
nbconvert 7.6.0
nbformat 5.9.0
nest-asyncio 1.5.6
notebook 6.5.4
notebook_shim 0.2.3
numba 0.57.1
numpy 1.24.4
numpy-indexed 0.3.7
overrides 7.3.1
packaging 23.1
pandas 2.0.2
pandocfilters 1.5.0
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.5.0
pip 23.1.2
platformdirs 3.8.0
prometheus-client 0.17.0
prompt-toolkit 3.0.38
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
pycparser 2.21
Pygments 2.15.1
pyparsing 3.1.0
pyrsistent 0.19.3
python-dateutil 2.8.2
python-json-logger 2.0.7
pytz 2023.3
PyYAML 6.0
pyzmq 25.1.0
qtconsole 5.4.3
QtPy 2.3.1
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
scipy 1.11.0
Send2Trash 1.8.2
setuptools 67.8.0
six 1.16.0
sniffio 1.3.0
soupsieve 2.4.1
stack-data 0.6.2
surpyval 0.10.10
terminado 0.17.1
tinycss2 1.2.1
tornado 6.3.2
traitlets 5.9.0
typing_extensions 4.6.3
tzdata 2023.3
uri-template 1.3.0
wcwidth 0.2.6
webcolors 1.13
webencodings 0.5.1
websocket-client 1.6.1
wheel 0.40.0
widgetsnbextension 4.0.7
wrapt 1.15.0
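
The traceback shows surpyval calling `ax.grid(b=True, ...)`. Matplotlib renamed the `b` parameter of `grid` to `visible` in version 3.5, and the 3.7.1 release in the list above no longer accepts the old name, so `b` is passed on as an ordinary grid keyword and rejected as `grid_b`. A minimal sketch of the kind of translation that avoids the error follows; the helper name is hypothetical and this is not the fix adopted by the package.

```python
def translate_grid_kwargs(kwargs):
    """Return a copy of kwargs with the legacy ``b`` key renamed to ``visible``."""
    fixed = dict(kwargs)
    if "b" in fixed and "visible" not in fixed:
        fixed["visible"] = fixed.pop("b")
    return fixed


# Keyword arguments as they appear in the traceback
legacy = {"b": True, "which": "major", "color": "g", "alpha": 0.4, "linestyle": "-"}
print(translate_grid_kwargs(legacy))
```

The translated dictionary could then be passed as `ax.grid(**translate_grid_kwargs(legacy))`. Installing an older Matplotlib release that still accepts `b` is another possible stopgap.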

Standardize names for lfp, zi, and offset parameter names

If the LFP (limited failure population) and ZI (zero-inflated) models are to be incorporated into the fit method, there will need to be standard names for the parameters. These can also be reserved names in the Distribution class so that a user does not unwittingly cause an error.

add zero inflated option

Add the case that is symmetrical to limited failure populations, i.e. zero-inflated models.

It should be relatively simple to include a weight to which the rest of the distribution is added.
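
As a rough illustration of the symmetry, the sketch below writes both cases for a Weibull base distribution. A zero-inflated model puts a point mass `p0` at zero and scales the rest of the distribution by `1 - p0`; a limited failure population model caps the CDF at a proportion `p`. The function and parameter names are hypothetical, since the standard names are still to be chosen.

```python
import math


def weibull_cdf(x, alpha, beta):
    """Two-parameter Weibull CDF with scale alpha and shape beta."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / alpha) ** beta))


def zi_cdf(x, alpha, beta, p0):
    """Zero-inflated CDF: mass p0 at zero, the remainder follows the Weibull."""
    if x < 0:
        return 0.0
    return p0 + (1.0 - p0) * weibull_cdf(x, alpha, beta)


def lfp_cdf(x, alpha, beta, p):
    """Limited failure population CDF: only a proportion p ever fails."""
    return p * weibull_cdf(x, alpha, beta)


alpha, beta = 30.0, 9.0  # illustrative values
print(zi_cdf(0.0, alpha, beta, p0=0.1))   # jump of size p0 at zero
print(lfp_cdf(1e6, alpha, beta, p=0.8))   # levels off at p
```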

Using Pandas

The parametric fitter has a pandas fit option, but this needs to be extended to the non-parametric fitters and the data wrangling utilities. Once that is done, the docs need to be updated.
