stan-dev / pystan

PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io

License: ISC License

Python 98.48% Shell 1.52%

pystan's Issues

Close connection if Exception is raised

Currently, the shutdown procedure is not called if model_string is invalid. This keeps the port (8080) locked.

import pystan
model_string = "parameters {vector y;} model {y ~ cauchy(0,1);}"
program = pystan.compile(model_string, {})
# AssertionError is raised

model_string2 = "parameters {vector[10] y;} model {y ~ cauchy(0,1);}"
program2 = pystan.compile(model_string2, {})
# OSError is raised 
# OSError: [Errno 48] error while attempting to bind on address ('127.0.0.1', 8080): address already in use

One solution is to add try-finally blocks for post and post_aiter functions.
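
The try-finally idea could look roughly like the sketch below. `FakeServer`, `post`, and `failing_handler` are hypothetical stand-ins, not the actual pystan or httpstan code; the only point is that the shutdown step runs even when the request raises.

```python
class FakeServer:
    """Hypothetical stand-in for the background HTTP server."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def shutdown(self):
        # A real server would release its port here.
        self.running = False


def post(server, handler):
    """Run `handler` against a started server and always shut it down."""
    server.start()
    try:
        return handler()
    finally:
        # Runs on success and on any exception, so the port is freed.
        server.shutdown()


def failing_handler():
    raise AssertionError("invalid program code")


server = FakeServer()
try:
    post(server, failing_handler)
except AssertionError:
    pass

print(server.running)
```

With this shape, a second call after a failed one should be able to bind the port again, because the first call has already shut its server down.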

Why do we need the assertion assert "data" not in kwargs, "`data` is set in `build`."?

@riddell-stan, in the sample function in model.py, why do we need the assertion assert "data" not in kwargs, "`data` is set in `build`."?
As far as I know, in PyStan 2.0 we can build a Stan model without data and then pass the data to the optimizing or sampling function to fit the model. Thank you.

tqdm progress bar missing

.sample doesn't show a tqdm progress bar. What is the current status?

Could we track sampling progress in httpstan and send just the iteration count to pystan, which could then update the progress display?

Use 'build' instead of 'compile' as function name for compiling a Stan program.

Most users will find the verb 'build' more familiar than the verb 'compile'. (RStan 3 plans on using the function/method name 'build'.)

Warnings not shown to user

Warnings printed after sampling has finished are not shown to the user. For example (PyStan 2):

Elapsed Time: 3.53326 seconds (Warm-up)
               5.25664 seconds (Sampling)
               8.7899 seconds (Total)

WARNING:pystan:174 of 500 iterations ended with a divergence (34.8 %).
WARNING:pystan:Try running with adapt_delta larger than 0.8 to remove the divergences.
WARNING:pystan:294 of 500 iterations saturated the maximum tree depth of 10 (58.8 %)
WARNING:pystan:Run again with max_treedepth larger than 10 to avoid saturation

Missing `plot` method for Fit instances

Title says it all. Code can be copied from PyStan 2.

Upload wheel to PyPI automatically

Travis supports this:

https://docs.travis-ci.com/user/deployment/pypi/

Since pystan 3 is a universal wheel, we only need to upload a single version.

Failing to use caching

On Windows, trying to recompile a previously run model (IPython was closed and reopened between the runs) fails with:

In [3]: posterior = stan.build(program_code, data=data)
ERROR:aiohttp.server:Error handling request
Traceback (most recent call last):
  File "C:\Users\user\miniconda3\envs\stan3\lib\site-packages\aiohttp\web_protocol.py", line 418, in start
    resp = await task
  File "C:\Users\user\miniconda3\envs\stan3\lib\site-packages\aiohttp\web_app.py", line 458, in _handle
    resp = await handler(request)
  File "c:\users\user\github\httpstan\httpstan\views.py", line 154, in handle_show_params
    model_module = httpstan.models.import_model_extension_module(model_name, module_bytes)
  File "c:\users\user\github\httpstan\httpstan\models.py", line 172, in import_model_extension_module
    return _import_module(module_name, module_path)
  File "c:\users\user\github\httpstan\httpstan\models.py", line 136, in _import_module
    module = importlib.import_module(module_name)
  File "C:\Users\user\miniconda3\envs\stan3\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'model_d968dc8b91'
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-3-249fe4204f1a> in <module>
----> 1 posterior = stan.build(program_code, data=data)

c:\users\user\github\pystan-next\stan\model.py in build(program_code, data, random_seed)
    222         path, payload = f"/v1/{model_name}/params", {"data": data}
    223         response = requests.post(f"http://{host}:{port}{path}", json=payload)
--> 224         response_payload = response.json()
    225         if response.status_code != 200:
    226             raise RuntimeError(response_payload["message"])

~\miniconda3\envs\stan3\lib\site-packages\requests\models.py in json(self, **kwargs)
    895                     # used.
    896                     pass
--> 897         return complexjson.loads(self.text, **kwargs)
    898
    899     @property

~\miniconda3\envs\stan3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    352             parse_int is None and parse_float is None and
    353             parse_constant is None and object_pairs_hook is None and not kw):
--> 354         return _default_decoder.decode(s)
    355     if cls is None:
    356         cls = JSONDecoder

~\miniconda3\envs\stan3\lib\json\decoder.py in decode(self, s, _w)
    340         end = _w(s, end).end()
    341         if end != len(s):
--> 342             raise JSONDecodeError("Extra data", s, end)
    343         return obj
    344

JSONDecodeError: Extra data: line 1 column 5 (char 4)

After removing the httpstan cached models folder and database, recompilation works.

C:\Users\user\AppData\Local\httpstan\httpstan\Cache\0.8.1

import stan

Hi,

I'm not sure if this is a viable option, but for PyStan 3 should we start to prefer an "import as" idiom for examples and common usage?

import pystan as stan

Or could we even use

import stan

The last option would probably fail for the many users who have a folder named stan in their working directory.

Missing `summary` method for Fit instances

Title says it all. Much of the (ugly) code can be imported from PyStan 2.

Missing `check_hmc_diagnostics` in PyStan 3

Sampling with check_hmc_diagnostics needs to be supported at some point.

as in

fit = normal_posterior.sample(num_samples=10, num_chains=1, check_hmc_diagnostics=True)

Add information from model to fit

Hi,

Could we add model.program_code, and maybe other information too, to the fit, so that one can inspect the model and recreate it without the model instance?

Could these go into a dict under fit.model_info?
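
A minimal sketch of what such a dict might hold, assuming a simplified `Fit` class. The class, the `make_fit` helper, and the keys `program_code` and `model_name` are illustrative choices, not the existing PyStan API.

```python
from dataclasses import dataclass, field


@dataclass
class Fit:
    """Hypothetical, simplified Fit holding draws plus model metadata."""

    draws: list
    model_info: dict = field(default_factory=dict)


def make_fit(program_code, model_name, draws):
    # Copy the information needed to rebuild the model into the fit.
    info = {"program_code": program_code, "model_name": model_name}
    return Fit(draws=draws, model_info=info)


fit = make_fit(
    "parameters {real y;} model {y ~ normal(0, 1);}",
    "model_abc",  # placeholder name, not a real httpstan model id
    [0.1, -0.3],
)
print(fit.model_info["program_code"])
```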

Setup travis

Travis needs a .travis.yml file.

Could not find .travis.yml, using standard configuration

Document how to run tests

Dear @ariddell,

Newcomers would welcome a "Setup" section in the README. I have followed these steps to install the version I'm also hacking on:

$ conda create -n pystan-next python=3.6 numpy scipy cython -c conda-forge
$ source activate pystan-next
$ pip install -r test-requirements.txt
$ pip install -e .

But I couldn't run the test suite successfully. Running

$ python -m pytest

gave OS errors related to servers and processes ([Errno 98] Address already in use).
I haven't dived into the internals yet; I would like to work on #338 (comment) directly!

Thank you,
Marianne

PyStan changes data in place

I noticed that the following loop changes data in place.

# in `data`: convert numpy arrays to normal lists
for key, value in data.items():
    if isinstance(value, np.ndarray):
        data[key] = value.tolist()

Is this what we want? I highly doubt that a deep copy would fill anyone's RAM.
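
One way to leave the caller's dict untouched is to build a new dict during the conversion. The sketch below is an assumption about how that could look, not the code in PyStan. To stay dependency-free it checks for a `tolist` method instead of `np.ndarray`, and `array.array` stands in for a NumPy array. Note that this is a new top-level dict with converted values, not a full deep copy.

```python
import array


def to_plain_lists(data):
    """Return a new dict in which array-like values are plain lists.

    Assumption: anything exposing a ``tolist`` method (such as a NumPy
    array) should be converted. The dict passed in is not modified.
    """
    return {
        key: value.tolist() if hasattr(value, "tolist") else value
        for key, value in data.items()
    }


# array.array stands in for numpy.ndarray so the example has no dependencies.
original = {"J": 2, "y": array.array("d", [28.0, 8.0])}
converted = to_plain_lists(original)

print(type(original["y"]).__name__)
print(converted["y"])
```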

Add relevant flake8 ignore rules to comply with black

There is a list of flake8 errors which should be ignored. Error codes are listed in the black README. They all need to be added to tox.ini

Exceptions need to be transmitted to pystan from httpstan

If an integer variable for data is out of bounds, httpstan throws the right exception:

ValueError: Exception: _7b946e826d3147f9a1b54029a1a8d47e28f04a956afab3f87ecfa5fba97b478e_namespace::_7b946e826d3147f9a1b54029a1a8d47e28f04a956afab3f87ecfa5fba97b478e: player0[k0__] is 0, but must be greater than or equal to 1  (in 'unknown file name' at line 6)

but this doesn't arrive at pystan in an aesthetically pleasing form.

CI: Add a basic Windows test

In the event that changes in mingw or Stan math solve the Windows crashing bug, it would be nice to have an automatic way of learning about it. Testing Windows compilation and model running with mingw, even if it fails right now, would let us do that.

Diagnostic messages not printed

In PyStan 2 I get this message (no hard error):

DIAGNOSTIC(S) FROM PARSER:
Unknown variable: to_vector

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_b8f2cb83530f4085fcbf89c7040888ab NOW.

The diagnostic message should be printed for the user in pystan 3.

Unexpected argument message is too verbose

Steps to reproduce:

posterior.sample(num_chains=4, num_samples=1000, seed=1)

This one is particularly annoying since there's an error message for each thread.

Segfault sampling from model, 8schools samples work fine

The model is straight from the manual (below). 8schools works fine.

program_code = """
data {
  int<lower=0> N;              // num individuals
  int<lower=1> K;              // num ind predictors
  int<lower=1> J;              // num groups
  int<lower=1> L;              // num group predictors
  int<lower=1,upper=J> jj[N];  // group for individual
  matrix[N, K] x;              // individual predictors
  row_vector[L] u[J];          // group predictors
  vector[N] y;                 // outcomes
}
parameters {
  corr_matrix[K] Omega;        // prior correlation
  vector<lower=0>[K] tau;      // prior scale
  matrix[L, K] gamma;          // group coeffs
  vector[K] beta[J];           // indiv coeffs by group
  real<lower=0> sigma;         // prediction error scale
}
model {
  tau ~ cauchy(0, 2.5);
  Omega ~ lkj_corr(2);
  to_vector(gamma) ~ normal(0, 5);
  {
    row_vector[K] u_gamma[J];
    for (j in 1:J)
      u_gamma[j] = u[j] * gamma;
    beta ~ multi_normal(u_gamma, quad_form_diag(Omega, tau));
  }
  for (n in 1:N)
    y[n] ~ normal(x[n] * beta[jj[n]], sigma);
}
"""

AssertionError with num_flat_params and constrained_param_names

Hi,

I have a simple model which won't sample.

import stan
import numpy as np

schools_code = """
data {
             int<lower=0> J;
             real y[J];
             real<lower=0> sigma[J];
}
parameters {
             real mu;
             real<lower=0> tau;
             real theta_tilde[J];
}
transformed parameters {
             real theta[J];
             for (j in 1:J)
                 theta[j] = mu + tau * theta_tilde[j];
}
model {
             mu ~ normal(0, 5);
             tau ~ cauchy(0, 5);
             theta_tilde ~ normal(0, 1);
             y ~ normal(theta, sigma);
}
generated quantities {
             vector[J] log_lik;
             vector[J] y_hat;
             for (j in 1:J) {
                 log_lik[j] = normal_lpdf(y[j] | theta[j], sigma[j]);
                 y_hat[j] = normal_rng(theta[j], sigma[j]);
             }
}
"""

data = {
    "J": 8,
    "y": np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]),
    "sigma": np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0]),
}

posterior = stan.build(schools_code, data=data)
fit = posterior.sample(num_chains=4, num_samples=100, num_warmup=0)

This will give the following exception

Traceback (most recent call last):
  File "run_sc.py", line 43, in <module>
    fit = stan_model.sample(num_chains=4, num_samples=100, num_warmup=0)
  File ".../miniconda3/envs/stan3/lib/python3.7/site-packages/stan/model.py", line 176, in sample
    save_warmup,
  File ".../miniconda3/envs/stan3/lib/python3.7/site-packages/stan/fit.py", line 58, in __init__
    assert num_flat_params == len(constrained_param_names)
AssertionError

This compiles and samples fine with PyStan 2.18.

All sampling-related tests from pystan 2 should pass

Tests from the old pystan repository need to be translated into pystan 3.

Compile

test_stanc.py

Sampling

test_basic_array.py
test_basic_matrix.py
test_basic_pars.py
test_basic.py (split into test_basic_normal.py and test_basic_bernoulli.py)
test_generated_quantities_seed.py (#47)
test_linear_regression.py

All non-sampling, non-maximize related tests from PyStan 2 should pass

Tests from the old pystan repository need to be translated into pystan 3.

Compiling

test_extra_compile_args.py

Fixed param

test_fixed_param.py

Other

Extra

test_lookup.py

Implement `save_warmup`

Relevant tests are in PyStan 2's test_extract.py. Should be relatively easy to do.

I myself have never used this. Is it widely used?

Verify `data` is JSON-serializable and has correct form

Before sending data to the backend (httpstan), we need to check that it:

is JSON-serializable
has the appropriate structure for Stan C++/array_var_context (see https://github.com/stan-dev/httpstan/blob/master/httpstan/utils.py)

Checking for this needs to be done on the pystan(-next) side in order to give the user useful feedback.
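
A minimal sketch of the first check, using only the standard library. The function name `check_json_serializable` and the wording of the error are hypothetical; the structural check against array_var_context is not covered here.

```python
import json


def check_json_serializable(data):
    """Raise a descriptive TypeError if any value cannot be encoded as JSON."""
    for key, value in data.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError) as exc:
            raise TypeError(
                f"Value for data variable {key!r} is not JSON-serializable: {exc}"
            ) from exc


check_json_serializable({"J": 8, "y": [28, 8, -3]})  # passes silently

try:
    check_json_serializable({"J": 8, "y": {1, 2, 3}})  # a set cannot be encoded
except TypeError as exc:
    message = str(exc)
    print(message)
```

Checking each variable separately lets the error name the offending entry instead of reporting a failure for the whole payload.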

Missing informative repr for Fit instances

Following the conventions of pandas or sklearn seems like it would be a good idea here.
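
As a rough illustration of a short, informative repr, here is a sketch on a simplified `Fit` class. The attribute names `param_names`, `num_chains`, and `num_samples` are assumptions made for the example, and the exact layout is a matter of taste.

```python
class Fit:
    """Hypothetical, simplified Fit used only to illustrate __repr__."""

    def __init__(self, param_names, num_chains, num_samples):
        self.param_names = tuple(param_names)
        self.num_chains = num_chains
        self.num_samples = num_samples

    def __repr__(self):
        params = ", ".join(self.param_names)
        return (
            f"<Fit: parameters=({params}), "
            f"num_chains={self.num_chains}, num_samples={self.num_samples}>"
        )


fit = Fit(["mu", "tau"], num_chains=4, num_samples=1000)
print(repr(fit))
```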

Exception: variable does not exist;

I ran PyStan 3.0 with the eight schools example and got an error:
  File "/opt/python3/lib/python3.6/site-packages/stan/model.py", line 189, in build
    raise RuntimeError(response_payload["error"]["message"])
RuntimeError: Error calling param_names: `Exception: variable does not exist; processing stage=data initialization; variable name=J; base type=int (in 'unknown file name' at line 3)

Could you please check it? Thank you.

The source code is:
import stan
program_code = """
data {
int<lower=0> J; // number of schools
real y[J]; // estimated treatment effects
real<lower=0> sigma[J]; // s.e. of effect estimates
}
parameters {
real mu;
real<lower=0> tau;
real eta[J];
}
transformed parameters {
real theta[J];
for (j in 1:J)
theta[j] = mu + tau * eta[j];
}
model {
target += normal_lpdf(eta | 0, 1);
target += normal_lpdf(y | theta, sigma);
}
"""

data = {'J': 8,
'y': [28, 8, -3, 7, -1, 1, 18, 12],
'sigma': [15, 10, 16, 11, 9, 11, 10, 18]}

posterior = stan.build(program_code, data=data)
fit = posterior.sample(num_chains=4, num_samples=1000)

Update .travis.yml to test Python 3.7

See the configuration used by httpstan.

`stan` is an (empty) package on PyPI

People may be confused and try to pip install stan. stan was claimed on Nov 18, 2018. But there seems to be nothing there. We might try sending a nice email to the owner asking for the name. The owner's name is in the setup.py of the (empty) package.

PyPI has a single owner

@ahartikainen what is your PyPI username? The PyPI entries for pystan and httpstan could use a backup administrator.

Test coverage is not at 100%

100% test coverage on httpstan
100% test coverage on pystan-next

Contributing guidelines should match httpstan

Title says it all.

Incorrect alpha version for pbr

alpha versions need to be prefixed by a zero for pbr, e.g., 2.0.0.0a1. We've been using 2.0.0a1, which is wrong.

All the details are available here: https://docs.openstack.org/pbr/latest/user/semver.html

pbr is somehow correcting the tag when we upload to PyPI. This might only create problems for local installs.

Missing `log_prob` method

Model instances should make a log_prob method available (just as in PyStan 2). Requires work in httpstan, see stan-dev/httpstan#113

Upgrading guide missing

Adding an upgrading.rst to the docs with basic tips would, I think, be a good idea. Here's an example of a package which has an upgrading page: https://marshmallow.readthedocs.io/en/3.0/upgrading.html

Check docstring format

Arviz is doing this. Seems like a good idea.

use keyword-only API

There's a new trick in Python 3 which is tailor-made for calls to the sampling functions and any functions which have lots of arguments. It is possible to require that arguments be passed by keyword.

Details: https://www.python.org/dev/peps/pep-3102/

I believe one uses it like this:

def hmc_nuts(*, iter, warmup, adapt, kwarg1, kwarg2, kwarg3, kwarg4):
    ...
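
The effect of the bare `*` can be seen in a small runnable sketch. The parameter names `num_samples`, `num_warmup`, and `delta` are placeholders chosen for illustration, not the actual sampler arguments.

```python
def hmc_nuts(*, num_samples, num_warmup, delta=0.8):
    """Every parameter after the bare ``*`` must be passed by keyword."""
    return {"num_samples": num_samples, "num_warmup": num_warmup, "delta": delta}


settings = hmc_nuts(num_samples=1000, num_warmup=500)

try:
    hmc_nuts(1000, 500)  # positional arguments are rejected
except TypeError as exc:
    positional_error = str(exc)
    print(positional_error)
```

Callers can no longer swap two numeric arguments by accident, which is the main appeal for functions with many similar-looking parameters.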

Cannot specify initial values

PyStan 3 does not currently support user-specified initial values.

Documentation missing

http://pystan-next.readthedocs.org does not work

Issue and Pull Request templates missing

I'd like to use them but perhaps modify the old ones, borrowing from Astropy.

Build fails with new marshmallow

File "/usr/local/envs/testenv_3.6_PYSTAN_preview_PYRO_0.2.1_EMCEE_3/lib/python3.6/site-packages/marshmallow/schema.py", line 741, in _run_validator
validator_func(output, partial=partial, many=many)
TypeError: validate_values() got an unexpected keyword argument 'partial'

https://travis-ci.org/arviz-devs/arviz/jobs/546691569

black/flake8 should wrap on 100, not 99

The line length argument to black governs the column at which wrapping occurs. So a limit of 100, I believe, limits us to 99 characters, which is what PEP8 mentions.

(I'm happy with 120 but if arviz is going with 100, let's do that.)

No way to match Fit instance with Model instance

Storing model_name and fit_name with the instances would do the trick.

See #75

Use xarray-dataset to save all data

I think we should store all the data from the sampling straight to xarray.Dataset. This way the results are in a compact place and we have a good way to access it.

I'm not sure whether we should also store functions there or only the data.

See, e.g., arviz-devs/arviz#97

Type annotations are not at 100%

All code should be type annotated. (We need to at least start measuring this.)

100% typing coverage on httpstan
100% typing coverage on pystan

Run from JupyterLab / Jupyter Notebook

Running httpstan from JupyterLab / Jupyter Notebook fails because Jupyter is already running an asyncio event loop:

~\miniconda3\envs\stan3\lib\asyncio\base_events.py in run_forever(self)
    427             raise RuntimeError(
--> 428                 'Cannot run the event loop while another loop is running')
    429         self._set_coroutine_wrapper(self._debug)

RuntimeError: Cannot run the event loop while another loop is running

Using IPython works (although I'm running this on Windows, and Python crashes when I exit the interpreter, so I save the results to netCDF with ArviZ and use them later).
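
One possible workaround, sketched here with only the standard library and assuming Python 3.7+ for `asyncio.run`, is to run the coroutine on a fresh event loop in a helper thread. `run_in_new_loop`, `fake_request`, and `notebook_like_caller` are hypothetical names, and this is not necessarily how PyStan or httpstan handle the problem. The calling thread blocks until the coroutine finishes, and an exception inside the worker would surface as a missing result in this simplified version.

```python
import asyncio
import threading


def run_in_new_loop(coro):
    """Run `coro` to completion on a fresh event loop in a helper thread.

    This avoids starting a second loop in a thread whose loop is already
    running, which is the situation inside Jupyter.
    """
    result = {}

    def worker():
        # asyncio.run creates and closes a new loop owned by this thread.
        result["value"] = asyncio.run(coro)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return result["value"]


async def fake_request():
    # Hypothetical stand-in for an asynchronous call to the backend.
    await asyncio.sleep(0)
    return "ok"


async def notebook_like_caller():
    # Inside this coroutine a loop is already running, as in a notebook cell.
    return run_in_new_loop(fake_request())


outcome = asyncio.run(notebook_like_caller())
print(outcome)
```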

Maximize

test_optimizing_example.py
test_rstan_testoptim.py
test_basic_array.py
test_basic_matrix.py