stan-dev / pystan
PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io
License: ISC License
Hi,
I have a simple model which won't sample.
import stan
import numpy as np
schools_code = """
data {
  int<lower=0> J;
  real y[J];
  real<lower=0> sigma[J];
}
parameters {
  real mu;
  real<lower=0> tau;
  real theta_tilde[J];
}
transformed parameters {
  real theta[J];
  for (j in 1:J)
    theta[j] = mu + tau * theta_tilde[j];
}
model {
  mu ~ normal(0, 5);
  tau ~ cauchy(0, 5);
  theta_tilde ~ normal(0, 1);
  y ~ normal(theta, sigma);
}
generated quantities {
  vector[J] log_lik;
  vector[J] y_hat;
  for (j in 1:J) {
    log_lik[j] = normal_lpdf(y[j] | theta[j], sigma[j]);
    y_hat[j] = normal_rng(theta[j], sigma[j]);
  }
}
"""
data = {
    "J": 8,
    "y": np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]),
    "sigma": np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0]),
}
posterior = stan.build(schools_code, data=data)
fit = posterior.sample(num_chains=4, num_samples=100, num_warmup=0)
This gives the following exception:
Traceback (most recent call last):
File "run_sc.py", line 43, in <module>
fit = stan_model.sample(num_chains=4, num_samples=100, num_warmup=0)
File ".../miniconda3/envs/stan3/lib/python3.7/site-packages/stan/model.py", line 176, in sample
save_warmup,
File ".../miniconda3/envs/stan3/lib/python3.7/site-packages/stan/fit.py", line 58, in __init__
assert num_flat_params == len(constrained_param_names)
AssertionError
This compiles and samples fine with PyStan 2.18.
Most users will find the verb 'build' more familiar than the verb 'compile'. (RStan 3 plans on using the function/method name 'build'.)
The line-length argument to black governs the column at which wrapping occurs. So a limit of 100, I believe, limits us to 99 characters, which is what PEP 8 mentions.
(I'm happy with 120 but if arviz is going with 100, let's do that.)
Tests from the old pystan repository need to be translated into pystan 3.
Arviz is doing this. Seems like a good idea.
@riddell-stan, in the sample function of model.py, why do we need the assertion assert "data" not in kwargs, "data is set in build."?
As far as I know, in PyStan 2.0 we can build a Stan model without data and then pass the data to the optimizing or sampling function to fit the model. Thank you.
Model instances should make a log_prob method available (just as in PyStan 2). Requires work in httpstan, see stan-dev/httpstan#113
I noticed that the following loop changes data in place.
# in `data`: convert numpy arrays to normal lists
for key, value in data.items():
    if isinstance(value, np.ndarray):
        data[key] = value.tolist()
Is this what we want? I highly doubt that a deep copy would fill anyone's RAM.
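A non-mutating variant of the loop could look roughly like the sketch below. It is only an illustration: to_json_ready is a hypothetical helper name, and the check is duck-typed on a tolist method (which np.ndarray provides) so that the example runs with the standard library alone.

```python
from array import array


def to_json_ready(data):
    """Return a new dict in which array-like values are converted to lists.

    The caller's dict is left untouched. Anything with a ``tolist`` method
    (for example ``np.ndarray``) is converted; other values are reused as-is.
    """
    return {
        key: value.tolist() if hasattr(value, "tolist") else value
        for key, value in data.items()
    }


# array.array stands in for a NumPy array so no third-party package is needed.
original = {"J": 3, "y": array("d", [28.0, 8.0, -3.0])}
converted = to_json_ready(original)

print(converted["y"])                 # a plain Python list
print(type(original["y"]).__name__)   # the input still holds its array
```

Building a new dict this way is a shallow copy rather than a deep copy, which should be enough here because every converted value is a freshly created list.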
Sampling with check_hmc_diagnostics needs to be supported at some point, as in
fit = normal_posterior.sample(num_samples=10, num_chains=1, check_hmc_diagnostics=True)
People may be confused and try to pip install stan. The name stan was claimed on Nov 18, 2018, but there seems to be nothing there. We might try sending a nice email to the owner asking for the name. The owner's name is in the setup.py of the (empty) package.
Hi,
I'm not sure if this is a viable option, but for PyStan 3 should we start to prefer the import as idiom for examples and common usage?
import pystan as stan
Or could we even use
import stan
The last option would probably fail for many users who have a folder named stan in their working directory.
Alpha versions need to be prefixed by a zero for pbr, e.g., 2.0.0.0a1. We've been using 2.0.0a1, which is wrong.
All the details are available here: https://docs.openstack.org/pbr/latest/user/semver.html
pbr is somehow correcting the tag when we upload to PyPI. This might only create problems for local installs.
In the event that changes in mingw or Stan Math solve the Windows crashing bug, it would be nice to have an automatic way of learning about it. Testing Windows compilation and model running with mingw, even if it fails right now, would let us do that.
The model is straight from the manual (below). 8schools works fine.
program_code = """
data {
  int<lower=0> N;              // num individuals
  int<lower=1> K;              // num ind predictors
  int<lower=1> J;              // num groups
  int<lower=1> L;              // num group predictors
  int<lower=1,upper=J> jj[N];  // group for individual
  matrix[N, K] x;              // individual predictors
  row_vector[L] u[J];          // group predictors
  vector[N] y;                 // outcomes
}
parameters {
  corr_matrix[K] Omega;        // prior correlation
  vector<lower=0>[K] tau;      // prior scale
  matrix[L, K] gamma;          // group coeffs
  vector[K] beta[J];           // indiv coeffs by group
  real<lower=0> sigma;         // prediction error scale
}
model {
  tau ~ cauchy(0, 2.5);
  Omega ~ lkj_corr(2);
  to_vector(gamma) ~ normal(0, 5);
  {
    row_vector[K] u_gamma[J];
    for (j in 1:J)
      u_gamma[j] = u[j] * gamma;
    beta ~ multi_normal(u_gamma, quad_form_diag(Omega, tau));
  }
  for (n in 1:N)
    y[n] ~ normal(x[n] * beta[jj[n]], sigma);
}
"""
http://pystan-next.readthedocs.org does not work
All code should be type annotated. (We need to at least start measuring this.)
Title says it all.
I'd like to use them but perhaps modify the old ones, borrowing from Astropy.
Dear @ariddell,
Newcomers would welcome a "Setup" section in the README. I have followed these steps to install the version I'm also hacking on:
$ conda create -n pystan-next python=3.6 numpy scipy cython -c conda-forge
$ source activate pystan-next
$ pip install -r test-requirements.txt
$ pip install -e .
But I couldn't run the test suite successfully. Running
$ python -m pytest
gave OS errors related to servers and processes ([Errno 98] Address already in use).
I haven't dived into the internals, I would like to work on #338 (comment) directly!
Thank you,
Marianne
Before sending data to the backend (httpstan), it needs to be:
array_var_context (see https://github.com/stan-dev/httpstan/blob/master/httpstan/utils.py)
Checking for this needs to be done on the pystan(-next) side in order to give the user useful feedback.
Relevant tests are in PyStan 2's test_extract.py. Should be relatively easy to do.
I myself have never used this. Is it widely used?
PyStan 3 should probably pin to a minor version of httpstan so users do not accidentally switch versions of Stan (e.g., 2.18 to 2.19).
Storing model_name and fit_name with the instances would do the trick.
See #75
PyStan 3 does not currently support user-specified initial values.
Adding an upgrading.rst to the docs with basic tips would, I think, be a good idea. Here's an example of a package which has an upgrading page: https://marshmallow.readthedocs.io/en/3.0/upgrading.html
File "/usr/local/envs/testenv_3.6_PYSTAN_preview_PYRO_0.2.1_EMCEE_3/lib/python3.6/site-packages/marshmallow/schema.py", line 741, in _run_validator
validator_func(output, partial=partial, many=many)
TypeError: validate_values() got an unexpected keyword argument 'partial'
I think we should store all the data from sampling straight in an xarray.Dataset. This way the results are in one compact place and we have a good way to access them.
I'm not sure whether we should also store functions there or only the data.
See, e.g., arviz-devs/arviz#97
@ahartikainen what is your PyPI username? The PyPI entries for pystan and httpstan could use a backup administrator.
Running httpstan from Jupyter Lab/Notebook fails because Jupyter is already running an asyncio event loop:
~\miniconda3\envs\stan3\lib\asyncio\base_events.py in run_forever(self)
427 raise RuntimeError(
--> 428 'Cannot run the event loop while another loop is running')
429 self._set_coroutine_wrapper(self._debug)
RuntimeError: Cannot run the event loop while another loop is running
Using IPython works (although I'm running this on Windows and Python crashes when I exit the interpreter, so I save the results to netCDF with ArviZ and use them later).
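One common workaround for the "another loop is running" situation is to run the coroutine on a fresh event loop in a separate thread whenever the calling thread already has a running loop. The sketch below illustrates that general pattern with the standard library only; run_blocking is a hypothetical helper name, and this is not necessarily how httpstan or PyStan handle it.

```python
import asyncio
import threading


def run_blocking(coro_factory):
    """Run ``coro_factory()`` to completion and return its result.

    If no event loop is running in this thread, ``asyncio.run`` is used
    directly. Otherwise (for example inside Jupyter) the coroutine is run
    on a new event loop in a worker thread.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running here, so starting one is safe.
        return asyncio.run(coro_factory())

    outcome = {}

    def worker():
        try:
            outcome["value"] = asyncio.run(coro_factory())
        except BaseException as exc:  # re-raised in the calling thread
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


async def answer():
    await asyncio.sleep(0)
    return 42


result = run_blocking(answer)
print(result)
```

Note that the calling thread blocks until the worker finishes, so a loop that is already running there makes no progress in the meantime.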
In PyStan 2 I get this message (no hard error):
DIAGNOSTIC(S) FROM PARSER:
Unknown variable: to_vector
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_b8f2cb83530f4085fcbf89c7040888ab NOW.
The diagnostic message should be printed for the user in pystan 3.
I ran PyStan 3.0 with the eight schools example and got an error:
File "/opt/python3/lib/python3.6/site-packages/stan/model.py", line 189, in build
raise RuntimeError(response_payload["error"]["message"])
RuntimeError: Error calling param_names: `Exception: variable does not exist; processing stage=data initialization; variable name=J; base type=int (in 'unknown file name' at line 3)
Could you please check it? Thank you.
The source code is:
import stan
program_code = """
data {
  int<lower=0> J;          // number of schools
  real y[J];               // estimated treatment effects
  real<lower=0> sigma[J];  // s.e. of effect estimates
}
parameters {
  real mu;
  real<lower=0> tau;
  real eta[J];
}
transformed parameters {
  real theta[J];
  for (j in 1:J)
    theta[j] = mu + tau * eta[j];
}
model {
  target += normal_lpdf(eta | 0, 1);
  target += normal_lpdf(y | theta, sigma);
}
"""
data = {'J': 8,
        'y': [28, 8, -3, 7, -1, 1, 18, 12],
        'sigma': [15, 10, 16, 11, 9, 11, 10, 18]}
posterior = stan.build(program_code, data=data)
fit = posterior.sample(num_chains=4, num_samples=1000)
Installing from source with pip fails due to a non-ASCII character in README.rst (Python 3.6 on Ubuntu (miniconda) Docker).
Could (r) work instead of ®?
Warnings printed after sampling is finished are not shown to the user. For example (PyStan 2):
Elapsed Time: 3.53326 seconds (Warm-up)
5.25664 seconds (Sampling)
8.7899 seconds (Total)
WARNING:pystan:174 of 500 iterations ended with a divergence (34.8 %).
WARNING:pystan:Try running with adapt_delta larger than 0.8 to remove the divergences.
WARNING:pystan:294 of 500 iterations saturated the maximum tree depth of 10 (58.8 %)
WARNING:pystan:Run again with max_treedepth larger than 10 to avoid saturation
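As a rough illustration of how such a warning could be produced on the Python side, the sketch below counts divergent iterations and emits a message in the same format through the standard logging module. The function names and the idea of passing a flat list of per-iteration flags are assumptions made for the example, not the PyStan 2 or PyStan 3 implementation.

```python
import logging

logger = logging.getLogger("pystan")


def divergence_message(divergent):
    """Build a PyStan-2-style divergence message, or None if there is nothing to report."""
    total = len(divergent)
    count = sum(bool(flag) for flag in divergent)
    if total == 0 or count == 0:
        return None
    percent = 100 * count / total
    return f"{count} of {total} iterations ended with a divergence ({percent:.1f} %)."


def warn_about_divergences(divergent):
    """Log the divergence message as a warning and return it for inspection."""
    message = divergence_message(divergent)
    if message is not None:
        logger.warning(message)
    return message


# Hypothetical per-iteration flags matching the counts in the example above.
flags = [True] * 174 + [False] * 326
message = warn_about_divergences(flags)
```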
Steps to reproduce:
posterior.sample(num_chains=4, num_samples=1000, seed=1)
This one is particularly annoying since there's an error message for each thread.
Title says it all. Much of the (ugly) code can be imported from PyStan 2.
There's a new trick in Python 3 which is tailor-made for calls to the sampling functions and any functions which have lots of arguments. It is possible to require that arguments be passed by keyword.
Details: https://www.python.org/dev/peps/pep-3102/
I believe one uses it like this:
def hmc_nuts(*, iter, warmup, adapt, kwarg1, kwarg2, kwarg3, kwarg4):
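A small self-contained sketch of the keyword-only syntax from PEP 3102 follows. The parameter names and the returned dict are placeholders chosen for illustration, not the actual sampling signature.

```python
def hmc_nuts(*, num_samples, num_warmup, adapt_delta=0.8):
    """Toy stand-in: every argument after the bare ``*`` must be passed by keyword."""
    return {
        "num_samples": num_samples,
        "num_warmup": num_warmup,
        "adapt_delta": adapt_delta,
    }


# Keyword calls work and are self-documenting at the call site.
settings = hmc_nuts(num_samples=1000, num_warmup=500)

# Positional calls are rejected with a TypeError.
try:
    hmc_nuts(1000, 500)
except TypeError as exc:
    positional_error = exc
else:
    positional_error = None

print(settings)
print(type(positional_error).__name__)
```

A side benefit is that arguments can later be reordered or added without breaking existing callers, since no caller can depend on position.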
Title says it all. Code can be copied from PyStan 2.
.sample doesn't show tqdm. What is the current situation with it?
Can we follow sampling in httpstan, and just send iter count info over to pystan which then could update the graph?
See the configuration used by httpstan.
Travis needs a .travis.yml file.
Could not find .travis.yml, using standard configuration
Tests from the old pystan repository need to be translated into pystan 3.
(test_basic_normal.py and test_basic_bernoulli.py)
If an integer variable for data is out of bounds, httpstan throws the right exception:
ValueError: Exception: _7b946e826d3147f9a1b54029a1a8d47e28f04a956afab3f87ecfa5fba97b478e_namespace::_7b946e826d3147f9a1b54029a1a8d47e28f04a956afab3f87ecfa5fba97b478e: player0[k0__] is 0, but must be greater than or equal to 1 (in 'unknown file name' at line 6)
but this doesn't arrive at pystan in an aesthetically pleasing form.
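One way to make such a message easier to read would be to strip the generated model-name prefix before showing it to the user. The sketch below does this with a regular expression; tidy_stan_message is a hypothetical helper, and the pattern assumes the prefix is always an underscore followed by a 64-character hex digest, as in the example above.

```python
import re

# Matches "_<64 hex chars>", optionally followed by "_namespace::",
# plus a trailing colon and whitespace if present.
_MODEL_PREFIX = re.compile(r"_[0-9a-f]{64}(?:_namespace::)?:?\s*")


def tidy_stan_message(message):
    """Remove generated model-name prefixes from a Stan error message."""
    return _MODEL_PREFIX.sub("", message)


digest = "7b946e826d3147f9a1b54029a1a8d47e28f04a956afab3f87ecfa5fba97b478e"
raw = (
    f"Exception: _{digest}_namespace::_{digest}: "
    "player0[k0__] is 0, but must be greater than or equal to 1 "
    "(in 'unknown file name' at line 6)"
)
tidy = tidy_stan_message(raw)
print(tidy)
```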
Hi,
could we add model.program_code, and maybe other information too, so one can infer and recreate the model without a model instance?
These could go into a dict under fit.model_info.
Travis supports this:
https://docs.travis-ci.com/user/deployment/pypi/
Since pystan 3 is a universal wheel, we only need to upload a single version.
Currently, the shutdown procedure is not called if model_string is invalid. This keeps the port (8080) locked.
import pystan
model_string = "parameters {vector y;} model {y ~ cauchy(0,1);}"
program = pystan.compile(model_string, {})
# AssertionError is raised
model_string2 = "parameters {vector[10] y;} model {y ~ cauchy(0,1);}"
program2 = pystan.compile(model_string2, {})
# OSError is raised
# OSError: [Errno 48] error while attempting to bind on address ('127.0.0.1', 8080): address already in use
One solution is to add try-finally blocks for the post and post_aiter functions.
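The try-finally idea can be sketched in isolation as follows. FakeServer and this simplified post are hypothetical stand-ins used only to show the control flow; the real functions talk to httpstan and have different signatures.

```python
class FakeServer:
    """Hypothetical stand-in for the background server; tracks whether the port is held."""

    def __init__(self):
        self.port_in_use = False

    def start(self):
        if self.port_in_use:
            raise OSError("address already in use")
        self.port_in_use = True

    def shutdown(self):
        self.port_in_use = False


def post(server, handler):
    """Start the server, run the request, and always release the port afterwards."""
    server.start()
    try:
        return handler()
    finally:
        # Runs whether handler() returned normally or raised.
        server.shutdown()


def failing_request():
    raise AssertionError("invalid model")


server = FakeServer()
try:
    post(server, failing_request)
except AssertionError:
    pass  # the error still reaches the caller, but the port was released

# A second call succeeds because the earlier failure did not leak the port.
second_result = post(server, lambda: "ok")
print(second_result)
```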
On Windows, trying to recompile a previously run model (IPython closed and reopened between the runs) fails with:
In [3]: posterior = stan.build(program_code, data=data)
ERROR:aiohttp.server:Error handling request
Traceback (most recent call last):
File "C:\Users\user\miniconda3\envs\stan3\lib\site-packages\aiohttp\web_protocol.py", line 418, in start
resp = await task
File "C:\Users\user\miniconda3\envs\stan3\lib\site-packages\aiohttp\web_app.py", line 458, in _handle
resp = await handler(request)
File "c:\users\user\github\httpstan\httpstan\views.py", line 154, in handle_show_params
model_module = httpstan.models.import_model_extension_module(model_name, module_bytes)
File "c:\users\user\github\httpstan\httpstan\models.py", line 172, in import_model_extension_module
return _import_module(module_name, module_path)
File "c:\users\user\github\httpstan\httpstan\models.py", line 136, in _import_module
module = importlib.import_module(module_name)
File "C:\Users\user\miniconda3\envs\stan3\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'model_d968dc8b91'
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
<ipython-input-3-249fe4204f1a> in <module>
----> 1 posterior = stan.build(program_code, data=data)
c:\users\user\github\pystan-next\stan\model.py in build(program_code, data, random_seed)
222 path, payload = f"/v1/{model_name}/params", {"data": data}
223 response = requests.post(f"http://{host}:{port}{path}", json=payload)
--> 224 response_payload = response.json()
225 if response.status_code != 200:
226 raise RuntimeError(response_payload["message"])
~\miniconda3\envs\stan3\lib\site-packages\requests\models.py in json(self, **kwargs)
895 # used.
896 pass
--> 897 return complexjson.loads(self.text, **kwargs)
898
899 @property
~\miniconda3\envs\stan3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
352 parse_int is None and parse_float is None and
353 parse_constant is None and object_pairs_hook is None and not kw):
--> 354 return _default_decoder.decode(s)
355 if cls is None:
356 cls = JSONDecoder
~\miniconda3\envs\stan3\lib\json\decoder.py in decode(self, s, _w)
340 end = _w(s, end).end()
341 if end != len(s):
--> 342 raise JSONDecodeError("Extra data", s, end)
343 return obj
344
JSONDecodeError: Extra data: line 1 column 5 (char 4)
After removing the httpstan cached-models folder and database, recompilation works.
C:\Users\user\AppData\Local\httpstan\httpstan\Cache\0.8.1
There is a list of flake8 errors which should be ignored. Error codes are listed in the black README. They all need to be added to tox.ini.
Following the conventions of pandas or sklearn seems like it would be a good idea here.