pairs plot (shinystan issue, closed)

stan-dev commented on May 26, 2024
pairs plot

from shinystan.

Comments (11)

bgoodri commented on May 26, 2024

This should be prioritized now that rstan 2.7.0 has new default settings. I think of it primarily as a sampling diagnostic rather than a way of visualizing cor(as.matrix(stanfit)). Anyway, as described at the end of the documentation, it now splits the lower triangle from the upper triangle by median accept_stat__ and overplots the divergent transitions (in red) and the transitions that exceed max_treedepth (in yellow). So you can see whether the problems are concentrated in the tails or near the mode, and which element(s) of the control list you need to tweak.
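For readers following along, the split described above can be sketched roughly as below. This is plain Python rather than rstan code, and it only illustrates the bookkeeping: the dictionary keys, the function name, and the default of 10 for max_treedepth are assumptions made for the example, as is the choice to flag a transition when its treedepth is greater than or equal to the limit.

```python
from statistics import median


def split_draws_for_pairs(draws, max_treedepth=10):
    """Group draw indices the way the comment describes (hypothetical helper).

    `draws` is a list of dicts, one per transition, with the assumed keys
    'accept_stat', 'divergent' and 'treedepth'.
    """
    cutoff = median(d["accept_stat"] for d in draws)
    groups = {
        "below_median": [],        # one triangle of the pairs plot
        "at_or_above_median": [],  # the other triangle
        "divergent": [],           # overplotted in red
        "hit_max_treedepth": [],   # overplotted in yellow
    }
    for i, d in enumerate(draws):
        # Every draw lands in exactly one triangle, chosen by accept_stat.
        side = "below_median" if d["accept_stat"] < cutoff else "at_or_above_median"
        groups[side].append(i)
        # Problem transitions are additionally marked for overplotting.
        if d["divergent"]:
            groups["divergent"].append(i)
        if d["treedepth"] >= max_treedepth:  # boundary choice is an assumption
            groups["hit_max_treedepth"].append(i)
    return groups
```

If the red points pile up in the below-median triangle, that hints at which sampler setting to revisit; the sketch itself draws nothing and only shows how each draw would be classified.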

jgabry commented on May 26, 2024

Cool. I agree 100%. The problem I was having before was that it was really
really slow to produce the pairs plots. Is the new function more efficient?

(In reply to bgoodri, Sat, Jul 18, 2015, 1:23 PM.)

bgoodri commented on May 26, 2024

No

(In reply to Jonah Gabry, Sat, Jul 18, 2015, 1:28 PM.)

bgoodri commented on May 26, 2024

It is not so slow if you select a reasonably small number of parameters,
and if you select an unreasonably large number of parameters it is both
slow and impossible to visualize anything.

(In reply to Ben Goodrich, Sat, Jul 18, 2015, 1:30 PM.)

bob-carpenter commented on May 26, 2024

That sounds great --- I like the clear delineation of intent.

How would you recommend visualizing the correlations?

- Bob

(In reply to bgoodri, Jul 18, 2015, 10:23 AM.)

bob-carpenter commented on May 26, 2024

It does take a long time if there are more than a handful of
variables. I killed a live demo in Sydney by forgetting to turn
off pairs() in the script and upping the predictors to 200 or so.

So maybe it should just do the first 5 parameters or something by
default with a warning to the user that if they want more, they can
specify the parameters explicitly?

- Bob
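The proposal above (cap the default and warn) might look something like the following sketch. It is plain Python for illustration only, not anything rstan or shinystan implements; `choose_pairs_params`, `n_scatter_panels`, and the `requested` and `default_limit` arguments are all hypothetical names.

```python
import warnings


def choose_pairs_params(all_params, requested=None, default_limit=5):
    """Pick which parameters to plot (hypothetical helper).

    With no explicit request, keep only the first `default_limit` names and
    warn, so a large fit never triggers a huge grid by accident.
    """
    if requested is not None:
        unknown = [p for p in requested if p not in all_params]
        if unknown:
            raise ValueError(f"unknown parameters: {unknown}")
        return list(requested)
    if len(all_params) > default_limit:
        warnings.warn(
            f"plotting only the first {default_limit} of {len(all_params)} "
            "parameters; pass `requested` to choose others",
            stacklevel=2,
        )
    return list(all_params[:default_limit])


def n_scatter_panels(n_params):
    """Number of off-diagonal panels in an n-by-n pairs grid."""
    return n_params * (n_params - 1)
```

With 200 predictors the grid would hold 200 * 199 = 39800 scatterplot panels, against 20 for five parameters, which is why an uncapped default can stall a session.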

(In reply to bgoodri, Jul 18, 2015, 10:30 AM.)

bgoodri commented on May 26, 2024

Well, you could use pairs to visualize correlations or you could just do
cor(as.matrix(stanfit)) and see the numbers. But posterior correlation
isn't that interesting, per se, for HMC. Combined with changing variance,
it is something to worry about but if that is a concern, then it will
probably be manifested in transitions that diverge or exceed max_treedepth,
both of which are now evident from the pairs plot. Richard's version of the
pairs plot puts scatterplots in the upper triangle and the correlation
rounded to two digits in the lower triangle but to me, that makes it seem
as if there are some values of the bivariate correlation that the user
should freak out about.
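As a rough illustration of what "see the numbers" amounts to, here is a plain-Python sketch of a pairwise posterior correlation table. It stands in for the idea behind cor(as.matrix(stanfit)) and is not that call; the `draws` layout (parameter name mapped to a list of draws) and both function names are assumptions for the example.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_table(draws):
    """Map each (name_a, name_b) pair to the correlation of their draws."""
    names = list(draws)
    return {(a, b): pearson(draws[a], draws[b]) for a in names for b in names}
```

The table gives the linear association only; it says nothing about divergences or treedepth, which is the point made above about why the annotated pairs plot is the more useful diagnostic.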

(In reply to Bob Carpenter, Sat, Jul 18, 2015, 1:53 PM.)

bgoodri commented on May 26, 2024

Perhaps shinystan should not do all variables by default because it is easy
to select variables from a list before the plot fires. But for rstan I
worry that if it just does the first 5, then people will say "everything
looks okay" and move on. You really need to be looking at lp__ and
hyperparameters more than a bunch of regression coefficients.

(In reply to Bob Carpenter, Sat, Jul 18, 2015, 1:54 PM.)

bob-carpenter commented on May 26, 2024

I agree that your plot's more helpful diagnostically overall.

I think it helps diagnose non-identifiability issues and the
posterior scatterplot's interesting to see in its own right to
understand how a well-formulated model's variables relate to one
another.

I don't care about seeing the linear correlation as much as
I want to see how the variables relate in the posterior. Of course
in ShinyStan we can do it with three variables, which is super cool,
but we want O(N^3) plots even less than we want O(N^2) ones.

- Bob

(In reply to bgoodri, Jul 18, 2015, 2:02 PM.)

betanalpha commented on May 26, 2024

I think of these plots as not being so much diagnostic
but rather exploratory. Counting n_divergent and
histogram’ing max_treedepth provide the immediate
diagnostics but the pairs plots are a great first step
towards identifying exactly what is causing those
diagnostics to fail. This means that they should
require at least some thought from the user, and
I would personally be fine with pairs requiring
that the user specify variables explicitly (based
on, for example, prior suspicion).
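The two immediate diagnostics mentioned above reduce to simple tallies. A minimal sketch in plain Python, assuming the per-transition divergence flags and treedepths have already been extracted into two lists (the function name and argument names are hypothetical, and nothing here calls rstan):

```python
from collections import Counter


def sampler_summary(divergent, treedepth):
    """Count divergent transitions and tabulate treedepth values.

    `divergent` holds one truthy/falsy flag per transition; `treedepth`
    holds the matching integer depths.
    """
    n_divergent = sum(1 for flag in divergent if flag)
    depth_counts = Counter(treedepth)
    # Sort by depth so the result reads like a histogram table.
    return n_divergent, dict(sorted(depth_counts.items()))
```

A nonzero divergence count, or mass sitting at the largest depth, is the cue to go and look at a pairs plot of the parameters you suspect.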

(In reply to Bob Carpenter, Jul 18, 2015, 7:25 PM.)

statwonk commented on May 26, 2024

As a newcomer, I can confirm that the pairs plot seems to be the place to focus when diagnosing issues. The default of plotting all covariates is a blocker, or at least it was for me until I learned about pars and include. I couldn't see the plot (R was choking), so it wasn't clear which argument I needed to change (pars). I'm interested in this remark:

"You really need to be looking at lp__ and hyperparameters more than a bunch of regression coefficients."

What exactly am I looking for? My expectation is something unimodal and smooth-looking, but annotated visual examples of good and bad would go a long way towards helping me get started with Stan. :)
