pairs plot, about stan-dev/shinystan

Comments (11)

bgoodri commented on May 26, 2024

This should be prioritized now that rstan 2.7.0 has new default settings. I think of it primarily as a sampling diagnostic rather than a way of visualizing cor(as.matrix(stanfit)). Anyway, as described at the end of the documentation, the pairs plot now splits the lower triangle from the upper triangle by median accept_stat__ and overplots the divergent transitions (in red) and the transitions whose tree depth exceeds max_treedepth (in yellow). So you can see whether the problems are concentrated in the tails or near the mode, and which element(s) of the control list you need to tweak.
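
A language-neutral sketch of the split described above, written in Python rather than R and not taken from rstan's source. It assumes each post-warmup draw is a dict whose keys are modeled on Stan's sampler parameters (accept_stat__, divergent__, treedepth__), a hypothetical layout, and that max_treedepth is 10, rstan's default. Which triangle receives the below-median half is an arbitrary choice here. A transition counts as saturated when its tree depth is at or above the cap; that is an assumption about how "exceeds" is counted, since the sampler stops doubling once it reaches the cap.

```python
from statistics import median


def split_for_pairs(draws, max_treedepth=10):
    """Partition draw indices the way the pairs plot above is described.

    draws: list of dicts with "accept_stat__", "divergent__" and
    "treedepth__" keys (hypothetical layout modeled on Stan's sampler params).
    """
    cutoff = median(d["accept_stat__"] for d in draws)
    # Below-median acceptance goes to one triangle, the rest to the other
    lower = [i for i, d in enumerate(draws) if d["accept_stat__"] < cutoff]
    upper = [i for i, d in enumerate(draws) if d["accept_stat__"] >= cutoff]
    # Divergent transitions: overplotted in red in the described plot
    divergent = [i for i, d in enumerate(draws) if d["divergent__"]]
    # Transitions at (or above) the tree-depth cap: overplotted in yellow
    saturated = [i for i, d in enumerate(draws)
                 if d["treedepth__"] >= max_treedepth]
    return {"lower": lower, "upper": upper,
            "divergent": divergent, "saturated": saturated}
```

Because the split is at the median, each triangle holds roughly half the draws, so red or yellow points that cluster in only one triangle suggest the trouble is tied to low (or high) acceptance.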

from shinystan.

jgabry commented on May 26, 2024

Cool. I agree 100%. The problem I was having before was that it was really,
really slow to produce the pairs plots. Is the new function more efficient?


bgoodri commented on May 26, 2024

No


bgoodri commented on May 26, 2024

It is not so slow if you select a reasonably small number of parameters,
and if you select an unreasonably large number of parameters, it is both
slow and impossible to visualize anything.


bob-carpenter commented on May 26, 2024

That sounds great; I like the clear delineation of intent.

How would you recommend visualizing the correlations?


bob-carpenter commented on May 26, 2024

It does take a long time if there are more than a handful of
variables. I killed a live demo in Sydney by forgetting to turn
off pairs() in the script and upping the predictors to 200 or so.

So maybe it should just plot the first 5 parameters or so by default,
with a warning telling the user that if they want more, they can
specify the parameters explicitly?
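
A minimal Python sketch of this proposal, for illustration only and not rstan's implementation: if the caller names parameters, use them as given; otherwise keep the first few and warn that the rest were dropped. The helper name default_pairs_pars and the limit of 5 are hypothetical.

```python
import warnings


def default_pairs_pars(all_pars, pars=None, limit=5):
    """Choose which parameters to plot (hypothetical helper).

    Explicitly named parameters are used as given; otherwise only the
    first `limit` are kept and a warning reports how many were dropped.
    """
    if pars is not None:
        return list(pars)
    chosen = list(all_pars[:limit])
    if len(all_pars) > len(chosen):
        warnings.warn(
            f"plotting only the first {len(chosen)} of {len(all_pars)} "
            "parameters; pass pars= to choose others explicitly"
        )
    return chosen
```

A positional cutoff like this is cheap, but it decides what the user sees purely by declaration order, which is the weakness raised later in the thread.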


bgoodri commented on May 26, 2024

Well, you could use pairs to visualize correlations, or you could just run
cor(as.matrix(stanfit)) and look at the numbers. But posterior correlation
isn't that interesting, per se, for HMC. Combined with changing variance
it is something to worry about, but if that is a concern, it will
probably manifest in transitions that diverge or exceed max_treedepth,
both of which are now evident from the pairs plot. Richard's version of the
pairs plot puts scatterplots in the upper triangle and the correlation,
rounded to two digits, in the lower triangle, but to me that makes it seem
as if there are some values of the bivariate correlation that the user
should freak out about.


bgoodri commented on May 26, 2024

Perhaps shinystan should not plot all variables by default, because it is easy
to select variables from a list before the plot fires. But for rstan I
worry that if it just plots the first 5, people will say "everything
looks okay" and move on. You really need to be looking at lp__ and
hyperparameters more than a bunch of regression coefficients.


bob-carpenter commented on May 26, 2024

I agree that your plot is more helpful diagnostically overall.

I think it helps diagnose non-identifiability issues, and the
posterior scatterplot is interesting to see in its own right to
understand how a well-formulated model's variables relate to one
another.

I don't care about seeing the linear correlation as much as
I want to see how the variables relate in the posterior. Of course,
in ShinyStan we can do it with three variables, which is super cool,
but we want O(N^3) plots even less than we want O(N^2) ones.
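
To put the O(N^2) versus O(N^3) point in numbers: a scatterplot matrix over N parameters shows N(N-1)/2 distinct pairs in each triangle, while covering every three-way view would take N(N-1)(N-2)/6. A quick Python check with math.comb:

```python
from math import comb


def pair_and_triple_counts(n):
    """Number of distinct 2-D and 3-D views for n parameters."""
    return comb(n, 2), comb(n, 3)


for n in (5, 20, 200):
    pairs, triples = pair_and_triple_counts(n)
    print(f"{n:>4} parameters: {pairs:>6} pairs, {triples:>9} triples")
```

At 5 parameters both counts are 10; at roughly 200 parameters, as in the Sydney demo mentioned above, the pairs plot already needs 19,900 panels per triangle, and three-way views would need 1,313,400.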


betanalpha commented on May 26, 2024

I think of these plots as not being so much diagnostic
but rather exploratory. Counting n_divergent and
histogramming the tree depth against max_treedepth provide the immediate
diagnostics, but the pairs plots are a great first step
towards identifying exactly what is causing those
diagnostics to fail. This means that they should
require at least some thought from the user, and
I would personally be fine with pairs requiring
that the user specify variables explicitly (based
on, for example, prior suspicion).
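
The two immediate diagnostics named here are simple tallies. A Python sketch, assuming each post-warmup draw is a dict with "divergent__" (0/1) and "treedepth__" keys, a hypothetical layout modeled on Stan's sampler parameters, and rstan's default max_treedepth of 10:

```python
from collections import Counter


def hmc_tallies(draws, max_treedepth=10):
    """Count divergences and tabulate tree depths across draws.

    Returns (number of divergent transitions,
             {tree depth: count} sorted by depth,
             number of transitions at or above max_treedepth).
    """
    n_divergent = sum(1 for d in draws if d["divergent__"])
    depth_hist = Counter(d["treedepth__"] for d in draws)
    n_at_cap = sum(count for depth, count in depth_hist.items()
                   if depth >= max_treedepth)
    return n_divergent, dict(sorted(depth_hist.items())), n_at_cap
```

Nonzero counts say that something is wrong; the pairs plot is then the exploratory step for finding where.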


statwonk commented on May 26, 2024

As a newcomer, I can confirm that the pairs plot seems to be where one should focus to diagnose issues. The default of plotting all covariates is a blocker; at least it was for me until I learned about pars and include. I couldn't see the plot (R was choking), so it wasn't clear which argument I needed to change (pars). I'm intrigued by this:

"You really need to be looking at lp__ and
hyperparameters more than a bunch of regression coefficients."

What exactly am I looking for? My expectation is something unimodal and smooth-looking, but annotated visual examples of good and bad plots would go a long way towards helping me get started with Stan. :)
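
The pars and include arguments mentioned above work as a selection rule: pars names the parameters to plot, and include = FALSE turns that list into an exclusion list. The Python sketch below mimics the rule on a list of flattened parameter names; letting a bare name such as "beta" match every element beta[1], beta[2], ... is meant to mirror how rstan expands vector names, but the helper itself is hypothetical.

```python
def select_pars(all_names, pars, include=True):
    """Keep (include=True) or drop (include=False) the named parameters.

    A name without brackets also matches all of its elements, so "beta"
    selects "beta[1]", "beta[2]", ... as well as a scalar named "beta".
    """
    wanted = set(pars)

    def base(name):
        # "beta[3]" -> "beta"; "lp__" -> "lp__"
        return name.split("[", 1)[0]

    def matches(name):
        return name in wanted or base(name) in wanted

    return [n for n in all_names if matches(n) == include]
```

Selecting a short list such as lp__ plus the hyperparameters keeps the plot small enough to render while showing the quantities the thread recommends looking at first.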

