The surpyval from derrynknife

Left-censored data error

Hello,

I am trying to analyse left-censored data with the non-parametric Kaplan-Meier estimator, but I get the following error:

import numpy as np
import surpyval as surv
x = np.array([20,20,50,50,70])
c = np.array([-1,-1,-1,-1,-1])
km_surpyval = surv.KaplanMeier.fit(x=x,c=c)
km_surpyval.plot()

The error:

ValueError: xrd format can't be used with left (c=-1) or interval (c=2) censoring

What are the alternatives?
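One commonly used workaround, independent of surpyval, is to reverse the time axis: a left-censored value x becomes a right-censored value M - x for any constant M larger than every observation, and the ordinary Kaplan-Meier estimator applied to the flipped data then estimates P(X < x). The following is a minimal pure-Python sketch of that idea. The helper names, the constant `M=100` and the mixed data set are assumptions made for illustration, not surpyval API. The censoring flags differ from the example above, where every point is left-censored and there is no exact observation to anchor the estimate.

```python
def kaplan_meier(times, observed):
    """Kaplan-Meier for right-censored data.

    times: observation times; observed: 1 for an exact event, 0 for right-censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    pairs = sorted(zip(times, observed))
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        leaving = 0
        # Group every observation tied at time t.
        while i < len(pairs) and pairs[i][0] == t:
            events += pairs[i][1]
            leaving += 1
            i += 1
        if events:
            surv *= 1 - events / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def left_censored_cdf(x, c, M):
    """Estimate P(X < x) from exact (c=0) and left-censored (c=-1) data."""
    flipped = [M - xi for xi in x]  # left-censored becomes right-censored
    observed = [1 if ci == 0 else 0 for ci in c]
    # S_Y(y) = P(M - X > y) = P(X < M - y), so map each time back to x.
    return sorted((M - y, s) for y, s in kaplan_meier(flipped, observed))


# Hypothetical data: three exact values and two left-censored ones.
x = [20, 20, 50, 50, 70]
c = [0, -1, 0, -1, 0]
cdf = left_censored_cdf(x, c, M=100)
for value, prob in cdf:
    print(f"P(X < {value}) ~= {prob:.2f}")
```

The choice of M does not affect the estimate as long as it exceeds the largest observation; it only shifts the flipped axis.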

Confidence bounds.

The confidence bounds need improvement. They currently work, but only for confidence on reliability. Including the uncertainty on the offset value should also be optional.

Combine OffsetParametric and Parametric into one class

Integrate OffsetParametric and Parametric into one class. Will be easier to maintain if these two classes are made the same. Will also provide opportunity to combine it with the LFP class.

analyse whole project using flake8

I shudder to think how long this will take...

Create option in .fit() method to make it a limited failure population

At present there is an LFP class to fit a limited failure population model. A simpler way to do so would be an lfp=True option in the .fit() method.
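For context, a limited failure population model caps the fraction of units that can ever fail: if F(t) is the CDF of the underlying distribution and p is the proportion of the population that is susceptible, the model CDF is p * F(t). The sketch below illustrates this with a Weibull CDF in plain Python. The function names and parameter values are hypothetical and do not reflect how surpyval implements it.

```python
import math


def weibull_cdf(t, alpha, beta):
    """Weibull CDF with scale alpha and shape beta."""
    return 1.0 - math.exp(-((t / alpha) ** beta))


def lfp_cdf(t, alpha, beta, p):
    """CDF of a limited failure population: only a fraction p can ever fail."""
    return p * weibull_cdf(t, alpha, beta)


# Hypothetical parameters: 30% of units are susceptible to failure.
alpha, beta, p = 10.0, 2.0, 0.3
for t in (5.0, 10.0, 50.0, 1000.0):
    print(t, round(lfp_cdf(t, alpha, beta, p), 4))
```

The CDF levels off at p rather than 1, which is the behaviour an lfp option would need to switch on.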

Add docs on local testing

Add to the docs how to do testing locally.

10.0.1 lost fs_to_xcn

I cannot call fs_to_xcn in version 10.0.1; I had to go back to 10.0.0 to get the function. I am using right-censored data.

Functions ExpoWeibull, LogLogistic, and LogNormal fail to compute confidence bounds

When the ExpoWeibull, LogLogistic, and LogNormal distributions are fitted with Maximum Likelihood Estimation, the method fails to compute confidence intervals. See the example below.

from surpyval import ExpoWeibull, LogLogistic, LogNormal

# Fit each distribution to a random sample drawn from itself
for dist in [ExpoWeibull, LogLogistic, LogNormal]:
    if dist is ExpoWeibull:
        # ExpoWeibull takes three parameters
        x = dist.random(100, 1, 1, 1)
    else:
        x = dist.random(100, 1, 1)
    model = dist.fit(x)
    print(model.cb(0))

In all three cases the confidence bound it prints is:

[[nan nan]]

Gumbel distribution

I was looking at the survival function of the Gumbel distribution, and I found that it is defined as:

S(x) = np.exp(-np.exp((x - mu)/sigma))

Shouldn't it be:

S(x) = 1 - np.exp(-np.exp(-(x - mu)/sigma))

I changed it manually in the code, and when I fitted the function the problem was not solved (i.e., the fitting is still done with the first equation shown above). Any suggestions on what to do?
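Both expressions are valid survival functions, but of two different distributions that share the Gumbel name: the first is the smallest (minimum) extreme value form, and the second is the largest (maximum) extreme value form. They are mirror images about mu. The short check below, written with only the standard library and with arbitrary values for `mu` and `sigma`, illustrates the relationship. It is not taken from the surpyval code base.

```python
import math


def sf_gumbel_min(x, mu, sigma):
    """Survival function of the smallest extreme value (minimum) Gumbel."""
    return math.exp(-math.exp((x - mu) / sigma))


def sf_gumbel_max(x, mu, sigma):
    """Survival function of the largest extreme value (maximum) Gumbel."""
    return 1.0 - math.exp(-math.exp(-(x - mu) / sigma))


mu, sigma = 10.0, 2.0  # arbitrary illustrative values
for d in (-3.0, -1.0, 0.0, 1.0, 3.0):
    # Mirror property: S_min(mu + d) equals the CDF of the maximum form at mu - d.
    lhs = sf_gumbel_min(mu + d, mu, sigma)
    rhs = 1.0 - sf_gumbel_max(mu - d, mu, sigma)
    print(f"d={d:+.1f}  S_min={lhs:.6f}  F_max(mirrored)={rhs:.6f}")
```

A consequence of the mirror property is that data following the maximum form can be negated and fitted with the minimum form, with the fitted location then being the negative of the original one.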

In docs/Non-Parametric Estimation.rst, under Turnbull Estimation, is the equation for M(s) correct? I wouldn't have expected M(s) to be in the denominator on the right hand side. Perhaps it was an inadvertent carryover from the previous equation?

This is how it is now:

[JOSS review] Tests for regression models

Similar to #18, I'd like to see tests for regression models.

openjournals/joss-reviews#3484

[JOSS review] Test tolerance and sample sizes

I'd like to see tighter tolerances in the tests, and tests with smaller sample sizes. Currently the TOL is set to 0.2, which is quite a large margin and practically useless in some cases. The minimum sample size is 5000. That's medium-sized data, but how well do the algorithms work for small data sets (fewer than 100 observations, for example)?

openjournals/joss-reviews#3484

[JOSS review] Parametric form in docs

One of the annoying things about statistics is all the different ways one can parameterize a distribution. For absolute clarity to users, I think it's important to have the parametric form used in the docs. It's been done for some distributions, but needs to be there for all (example missing.

openjournals/joss-reviews#3484

[JOSS review] `scipy` should be capitalized in paper.md

openjournals/joss-reviews#3484

model.plot() not working

Hello, I am new to this library. I have been looking for one that handles interval-censored data and came across it.
I quite like SurPyval; however, I have not been able to visualize anything yet, as I keep getting this error and an empty plot:

"ValueError: keyword grid_b is not recognized; valid keywords are ['size', 'width', 'color', 'tickdir', 'pad', 'labelsize', 'labelcolor', 'labelfontfamily', 'zorder', 'gridOn', 'tick1On', 'tick2On', 'label1On', 'label2On', 'length', 'direction', 'left', 'bottom', 'right', 'top', 'labelleft', 'labelbottom', 'labelright', 'labeltop', 'labelrotation', 'grid_agg_filter', 'grid_alpha', 'grid_animated', 'grid_antialiased', 'grid_clip_box', 'grid_clip_on', 'grid_clip_path', 'grid_color', 'grid_dash_capstyle', 'grid_dash_joinstyle', 'grid_dashes', 'grid_data', 'grid_drawstyle', 'grid_figure', 'grid_fillstyle', 'grid_gapcolor', 'grid_gid', 'grid_in_layout', 'grid_label', 'grid_linestyle', 'grid_linewidth', 'grid_marker', 'grid_markeredgecolor', 'grid_markeredgewidth', 'grid_markerfacecolor', 'grid_markerfacecoloralt', 'grid_markersize', 'grid_markevery', 'grid_mouseover', 'grid_path_effects', 'grid_picker', 'grid_pickradius', 'grid_rasterized', 'grid_sketch_params', 'grid_snap', 'grid_solid_capstyle', 'grid_solid_joinstyle', 'grid_transform', 'grid_url', 'grid_visible', 'grid_xdata', 'grid_ydata', 'grid_zorder', 'grid_aa', 'grid_c', 'grid_ds', 'grid_ls', 'grid_lw', 'grid_mec', 'grid_mew', 'grid_mfc', 'grid_mfcalt', 'grid_ms']"

Is anyone else experiencing this?

[JOSS review] Tests with real-life data, and comparison with other implementations

This is more relevant for testing regression models. Since real-life data never fits neatly into a "box", it is useful to compare results against other implementations (lifelines uses R's survival and flexsurvreg libraries).

openjournals/joss-reviews#3484

More handling of number of failures needed to get convergence.

Need to make sure:

If fully observed, at least two failures
If only interval observed, at least two failures
If only censored, at least one left and one right, one interval and one bounded

Update plotting in these scenarios.

[JOSS review] Need tests for Kaplan-Meier and other non-parametrics

The KM is by far the most widely used model, and most important non-parametric model. I'd like to see sufficient tests for it included.

openjournals/joss-reviews#3484

Mixture Models

Hi,
I find this Python package very helpful for analyzing Weibull data. However, I found that the documentation on "Mixture models" could be more detailed.
I was using the following example:

import surpyval as surv
import numpy as np
from matplotlib import pyplot as plt

x = [1, 2, 3, 4, 5, 6, 6, 7, 8, 10, 13, 15, 16, 17 ,17, 18, 19]
x_ = np.linspace(np.min(x), np.max(x))

model = surv.Weibull.fit(x)
wmm = surv.MixtureModel(x=x, dist=surv.Weibull, m=2)

model.plot(plot_bounds=False)
plt.plot(x_, wmm.ff(x_))

However,
(1) I could not work out what percentage of the data (or which data points) is assigned to the first Weibull distribution and what percentage is assigned to the second. wmm.params gives me the parameters, but not the corresponding fraction of data from which each was calculated. How do we get the individual Weibull data for each distribution? It seems the code currently just splits the entire data set into 2 equal arrays for Weibull fitting.
(2) How do we get the goodness-of-fit statistics for each Weibull? How do we know whether it is a good fit or a bad fit?
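On the first question: in a standard finite mixture model the data are not split into hard groups. Each point belongs to each component with some probability, obtained from the mixing weights and the component densities. The sketch below computes those membership probabilities in plain Python. The weights and Weibull parameters are made-up values standing in for whatever the fitted model reports, and none of the names are surpyval API.

```python
import math


def weibull_pdf(x, alpha, beta):
    """Weibull density with scale alpha and shape beta."""
    z = x / alpha
    return (beta / alpha) * z ** (beta - 1) * math.exp(-(z ** beta))


def memberships(x, weights, params):
    """Posterior probability that each point came from each component."""
    rows = []
    for xi in x:
        joint = [w * weibull_pdf(xi, a, b) for w, (a, b) in zip(weights, params)]
        total = sum(joint)
        rows.append([j / total for j in joint])
    return rows


x = [1, 2, 3, 4, 5, 6, 6, 7, 8, 10, 13, 15, 16, 17, 17, 18, 19]
weights = [0.5, 0.5]                # hypothetical mixing weights
params = [(5.0, 2.0), (17.0, 8.0)]  # hypothetical (alpha, beta) per component
resp = memberships(x, weights, params)

# The average membership per component is the share of data it explains.
shares = [sum(r[k] for r in resp) / len(x) for k in range(len(weights))]
print([round(s, 3) for s in shares])
```

Each row of `resp` sums to one, so a point can be labelled with its most probable component if a hard assignment is wanted.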

[JOSS review] License clarification

Hi @derrynknife, I'm one of the reviewers for your JOSS article. I can see the project is still being worked on, so I'll hold back some of my review for a few weeks until it stabilizes more. I'll create some minor issues for now, though.

In the MIT license in this repo, you have The Python Packaging Authority as the holder - I suspect you instead want yourself there, unless you indeed do want the Python Packaging Authority to have the copyright?

openjournals/joss-reviews#3484

Save/Load surpyval

Hello,

I am trying to save a survival model locally, but I get an error. Below is an example:

import pickle

import numpy as np
import surpyval as surv
from joblib import dump

np.random.seed(10)
x = surv.Weibull.random(50, 30., 9.)
model = surv.Weibull.fit(x)
results = {'model': model}

# Using pickle
with open('surpyval_model', 'wb') as f:
    pickle.dump(results, f)
# Using joblib
dump(model, 'surpyval_model.joblib')

Neither method works. These are the errors I get:

Using pickle
AttributeError: Can't pickle local object 'bounds_convert.<locals>.transform'
Using dump
PicklingError: Can't pickle <function bounds_convert.<locals>.transform at 0x7f98c4178a60>: it's not found as surpyval.parametric.fitters.bounds_convert.<locals>.transform

Any advice?
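The error message points at a function defined inside another function (`bounds_convert.<locals>.transform`). The standard `pickle` module stores functions by reference to their importable name, so such local functions cannot be serialised. Until the model object itself is picklable, one possible workaround is to persist only the distribution name and fitted parameters and rebuild the model later. The sketch below uses only the standard library. The parameter values are placeholders, and the closing comment about rebuilding via `from_params` is an assumption about the surpyval API that should be checked against the documentation.

```python
import json
import pickle


def make_transform(lower):
    # A function defined inside another function, like the one in the error.
    def transform(x):
        return x - lower
    return transform


try:
    pickle.dumps(make_transform(0.0))
    picklable = True
except (AttributeError, pickle.PicklingError) as err:
    picklable = False
    print("pickle failed:", err)

# Workaround sketch: store plain data instead of the model object.
# Placeholder values; in practice they would come from the fitted model.
saved = {"dist": "Weibull", "params": [30.0, 9.0]}
text = json.dumps(saved)
restored = json.loads(text)
print(restored)

# Assumption: the model could then be rebuilt with something like
# surv.Weibull.from_params(restored["params"]).
```

Storing parameters rather than objects also keeps the saved file readable across library versions, at the cost of losing the fitted data and any fit diagnostics.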

add truncation to MPS

If scalars are used for the left and/or right truncation, the maximum product of spacings (MPS) method can work.

Examples not working

I'm having trouble with the examples you provide on the GitHub and documentation sites. In a new Python 3.10 virtual environment I tried running:

from surpyval import Weibull
from surpyval.datasets import BoforsSteel

# Fetch some data that comes with SurPyval
data = BoforsSteel.df

x = data['x']
n = data['n']

model = Weibull.fit(x=x, n=n, offset=True)
model.plot();

and received the following error message:

AttributeError Traceback (most recent call last)
Cell In[2], line 5
2 from surpyval.datasets import BoforsSteel
4 # Fetch some data that comes with SurPyval
----> 5 data = BoforsSteel.df
7 x = data['x']
8 n = data['n']

AttributeError: 'BoforsSteel_' object has no attribute 'df'

From the documentation site I tried:

import surpyval as surv
import numpy as np

np.random.seed(10)
x = surv.Weibull.random(50, 30., 9.)
model = surv.Weibull.fit(x)
print(model)
model.plot();

and received the following error message:

ValueError Traceback (most recent call last)
Cell In[1], line 8
6 model = surv.Weibull.fit(x)
7 print(model)
----> 8 model.plot()

File ~/env/surpyval/lib/python3.10/site-packages/surpyval/parametric/parametric.py:1141, in Parametric.plot(self, heuristic, plot_bounds, alpha_ci, ax)
1138 ax.set_xticks(d['x_minor_ticks'], minor=True)
1139 ax.set_xticklabels([], minor=True)
-> 1141 ax.grid(b=True, which='major', color='g', alpha=0.4, linestyle='-')
1142 ax.grid(b=True, which='minor', color='g', alpha=0.1, linestyle='-')
1144 ax.set_title('{} Probability Plot'.format(self.dist.name))

File ~/env/surpyval/lib/python3.10/site-packages/matplotlib/axes/_base.py:3194, in _AxesBase.grid(self, visible, which, axis, **kwargs)
3192 _api.check_in_list(['x', 'y', 'both'], axis=axis)
3193 if axis in ['x', 'both']:
-> 3194 self.xaxis.grid(visible, which=which, **kwargs)
3195 if axis in ['y', 'both']:
3196 self.yaxis.grid(visible, which=which, **kwargs)

File ~/env/surpyval/lib/python3.10/site-packages/matplotlib/axis.py:1660, in Axis.grid(self, visible, which, **kwargs)
1657 if which in ['major', 'both']:
1658 gridkw['gridOn'] = (not self._major_tick_kw['gridOn']
1659 if visible is None else visible)
-> 1660 self.set_tick_params(which='major', **gridkw)

1661 self.stale = True

File ~/env/surpyval/lib/python3.10/site-packages/matplotlib/axis.py:932, in Axis.set_tick_params(self, which, reset, **kwargs)
919 """
920 Set appearance parameters for ticks, ticklabels, and gridlines.
921
(...)
929 gridlines.
930 """
931 _api.check_in_list(['major', 'minor', 'both'], which=which)
--> 932 kwtrans = self._translate_tick_params(kwargs)
934 # the kwargs are stored in self._major/minor_tick_kw so that any
935 # future new ticks will automatically get them
936 if reset:

File ~/env/surpyval/lib/python3.10/site-packages/matplotlib/axis.py:1076, in Axis._translate_tick_params(kw, reverse)
1074 for key in kw:
1075 if key not in allowed_keys:
-> 1076 raise ValueError(
1077 "keyword %s is not recognized; valid keywords are %s"
1078 % (key, allowed_keys))
1079 kwtrans.update(kw_)
1080 return kwtrans

ValueError: keyword grid_b is not recognized; valid keywords are ['size', 'width', 'color', 'tickdir', 'pad', 'labelsize', 'labelcolor', 'zorder', 'gridOn', 'tick1On', 'tick2On', 'label1On', 'label2On', 'length', 'direction', 'left', 'bottom', 'right', 'top', 'labelleft', 'labelbottom', 'labelright', 'labeltop', 'labelrotation', 'grid_agg_filter', 'grid_alpha', 'grid_animated', 'grid_antialiased', 'grid_clip_box', 'grid_clip_on', 'grid_clip_path', 'grid_color', 'grid_dash_capstyle', 'grid_dash_joinstyle', 'grid_dashes', 'grid_data', 'grid_drawstyle', 'grid_figure', 'grid_fillstyle', 'grid_gapcolor', 'grid_gid', 'grid_in_layout', 'grid_label', 'grid_linestyle', 'grid_linewidth', 'grid_marker', 'grid_markeredgecolor', 'grid_markeredgewidth', 'grid_markerfacecolor', 'grid_markerfacecoloralt', 'grid_markersize', 'grid_markevery', 'grid_mouseover', 'grid_path_effects', 'grid_picker', 'grid_pickradius', 'grid_rasterized', 'grid_sketch_params', 'grid_snap', 'grid_solid_capstyle', 'grid_solid_joinstyle', 'grid_transform', 'grid_url', 'grid_visible', 'grid_xdata', 'grid_ydata', 'grid_zorder', 'grid_aa', 'grid_c', 'grid_ds', 'grid_ls', 'grid_lw', 'grid_mec', 'grid_mew', 'grid_mfc', 'grid_mfcalt', 'grid_ms']
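The traceback shows surpyval calling `ax.grid(b=True, ...)`. Matplotlib renamed that parameter from `b` to `visible` (the rename was announced with the 3.5 release), and the installed 3.7.1 evidently no longer accepts the old name, so `b` falls through to the grid keyword arguments and is rejected as `grid_b`. A minimal version of the same call with the newer keyword, independent of surpyval, might look like this:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the snippet runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Same styling as the failing call, but with `visible` instead of `b`.
ax.grid(visible=True, which="major", color="g", alpha=0.4, linestyle="-")
ax.grid(visible=True, which="minor", color="g", alpha=0.1, linestyle="-")

major_on = all(line.get_visible() for line in ax.xaxis.get_gridlines())
print("major gridlines visible:", major_on)
```

Until the plotting code is updated, installing a Matplotlib release older than 3.7 may avoid the error, although that has not been verified here.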

List of packages installed and versions:
Package Version

anyio 3.7.0
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
arrow 1.2.3
astor 0.8.1
asttokens 2.2.1
attrs 23.1.0
autograd 1.6.2
autograd-gamma 0.5.0
backcall 0.2.0
beautifulsoup4 4.12.2
bleach 6.0.0
cffi 1.15.1
comm 0.1.3
contourpy 1.1.0
cycler 0.11.0
debugpy 1.6.7
decorator 5.1.1
defusedxml 0.7.1
exceptiongroup 1.1.1
executing 1.2.0
fastjsonschema 2.17.1
fonttools 4.40.0
formulaic 0.6.2
fqdn 1.5.1
future 0.18.3
idna 3.4
interface-meta 1.3.0
ipykernel 6.23.3
ipython 8.14.0
ipython-genutils 0.2.0
ipywidgets 8.0.6
isoduration 20.11.0
jedi 0.18.2
Jinja2 3.1.2
jsonpointer 2.4
jsonschema 4.17.3
jupyter 1.0.0
jupyter_client 8.3.0
jupyter-console 6.6.3
jupyter_core 5.3.1
jupyter-events 0.6.3
jupyter_server 2.6.0
jupyter_server_terminals 0.4.4
jupyterlab-pygments 0.2.2
jupyterlab-widgets 3.0.7
kiwisolver 1.4.4
llvmlite 0.40.1
MarkupSafe 2.1.3
matplotlib 3.7.1
matplotlib-inline 0.1.6
mistune 3.0.1
nbclassic 1.0.0
nbclient 0.8.0
nbconvert 7.6.0
nbformat 5.9.0
nest-asyncio 1.5.6
notebook 6.5.4
notebook_shim 0.2.3
numba 0.57.1
numpy 1.24.4
numpy-indexed 0.3.7
overrides 7.3.1
packaging 23.1
pandas 2.0.2
pandocfilters 1.5.0
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.5.0
pip 23.1.2
platformdirs 3.8.0
prometheus-client 0.17.0
prompt-toolkit 3.0.38
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
pycparser 2.21
Pygments 2.15.1
pyparsing 3.1.0
pyrsistent 0.19.3
python-dateutil 2.8.2
python-json-logger 2.0.7
pytz 2023.3
PyYAML 6.0
pyzmq 25.1.0
qtconsole 5.4.3
QtPy 2.3.1
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
scipy 1.11.0
Send2Trash 1.8.2
setuptools 67.8.0
six 1.16.0
sniffio 1.3.0
soupsieve 2.4.1
stack-data 0.6.2
surpyval 0.10.10
terminado 0.17.1
tinycss2 1.2.1
tornado 6.3.2
traitlets 5.9.0
typing_extensions 4.6.3
tzdata 2023.3
uri-template 1.3.0
wcwidth 0.2.6
webcolors 1.13
webencodings 0.5.1
websocket-client 1.6.1
wheel 0.40.0
widgetsnbextension 4.0.7
wrapt 1.15.0

Standardize names for lfp, zi, and offset parameter names

If the lfp and zi models are to be incorporated into the fit method, there will need to be standard names for the parameters. These can also be reserved names in the Distribution class so that a user does not unwittingly cause an error.

Negative values with offsets not working

Non-parametric models are not easy to plot once used.

Need to change what the non-parametric methods return, i.e. they need to return everything by default.

add zero inflated option

This is the case that is symmetrical to limited failure populations, i.e. zero-inflated models.

Should be relatively simple to include a weight from which the rest is added.
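As a sketch of the idea, assuming "zero-inflated" means that a fraction f0 of units has already failed at time zero and the remainder follows an ordinary distribution with CDF F(t), the model CDF is f0 + (1 - f0) * F(t). The function and parameter names below are hypothetical.

```python
import math


def weibull_cdf(t, alpha, beta):
    """Weibull CDF with scale alpha and shape beta."""
    return 1.0 - math.exp(-((t / alpha) ** beta))


def zero_inflated_cdf(t, alpha, beta, f0):
    """CDF when a fraction f0 has failed at t = 0 and the rest is Weibull."""
    return f0 + (1.0 - f0) * weibull_cdf(t, alpha, beta)


# Hypothetical parameters: 10% of units are dead on arrival.
alpha, beta, f0 = 10.0, 2.0, 0.1
for t in (0.0, 5.0, 10.0, 1000.0):
    print(t, round(zero_inflated_cdf(t, alpha, beta, f0), 4))
```

The curve starts at f0 instead of 0 and still rises to 1, mirroring the limited failure population case, which starts at 0 and levels off below 1.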

MLE for Weibulls breaks when leading term is censored

add truncation to mixture models

[JOSS review] Kaplan-Meier underflow

For large datasets with lots of unique observation times, the cumprod in KaplanMeierFitter will underflow due to the multiplication of many values less than 1.
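One common remedy, sketched here under the assumption that the estimator is a running product of factors (1 - d_i / r_i), is to accumulate the logarithms of the factors and exponentiate only when a probability is needed. The example uses only the standard library and artificial inputs (every factor equal to 0.5) to make the underflow obvious. It is not the surpyval or lifelines implementation.

```python
import math

# Artificial worst case: 2000 distinct event times, each halving survival.
factors = [0.5] * 2000

# Naive running product underflows to exactly 0.0 in double precision.
naive = 1.0
for f in factors:
    naive *= f

# Log-space accumulation keeps the magnitude as a finite number.
log_surv = 0.0
log_curve = []
for f in factors:
    log_surv += math.log(f)
    log_curve.append(log_surv)

print(naive)          # 0.0
print(log_curve[-1])  # about -1386.29, i.e. 2000 * log(0.5)
```

With NumPy the same idea is a cumulative sum of logs followed by an exponential, which stays accurate for as long as the survival probabilities are representable and preserves their ordering even after they are not.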

openjournals/joss-reviews#3484