
hneth / riskyr

A toolbox for rendering risk literacy more transparent

R 32.92% HTML 67.08%
2x2-matrix bayesian-inference contingency-table r r-package representation risk risk-literacy rstats visualization

riskyr's People

Contributors

hneth, ndphillips, nigradwohl

Forkers

nigradwohl

riskyr's Issues

Include vignettes and package loading message with link to main package guide

Hey guys, it looks like you have made tremendous progress on riskyr! I just installed it in the hopes of playing around, but without a package guide I wasn't sure how to proceed.

Do you guys plan to create one soon? It looks like you have plenty of existing documentation on GitHub, so it probably wouldn't be difficult to port it over to your package.

See my FFTrees package guide for an example: https://github.com/ndphillips/FFTrees/blob/master/vignettes/guide.Rmd

Looking forward to playing around with riskyr!!

Pass frequencies (instead of probabilities) to objects

To define a riskyr object, we currently pass 3 essential probabilities (prev, sens, spec). However, we also have functions translating probabilities into frequencies (and vice versa). Hence, why not allow passing either probabilities or frequencies to define a riskyr object?
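A minimal sketch of what this could look like (make_scenario() is a hypothetical name, not riskyr's actual constructor), using the fact that the 4 essential frequencies determine the 3 essential probabilities and N:

    make_scenario <- function(prev = NULL, sens = NULL, spec = NULL,
                              hi = NULL, mi = NULL, fa = NULL, cr = NULL,
                              N = NULL) {
      if (!is.null(hi) && !is.null(mi) && !is.null(fa) && !is.null(cr)) {
        # Derive the probabilities (and N) from the 4 essential frequencies:
        N    <- hi + mi + fa + cr
        prev <- (hi + mi) / N
        sens <- hi / (hi + mi)
        spec <- cr / (fa + cr)
      } else if (is.null(prev) || is.null(sens) || is.null(spec)) {
        stop("Provide either (prev, sens, spec) or (hi, mi, fa, cr).")
      }
      structure(list(prev = prev, sens = sens, spec = spec, N = N),
                class = "riskyr")
    }

    # Both calls describe the same scenario:
    make_scenario(prev = 0.10, sens = 0.90, spec = 0.80, N = 1000)
    make_scenario(hi = 90, mi = 10, fa = 180, cr = 720)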

Representing incomplete scenarios

Related to the issues of visualizing uncertainty and representing changes:

  • How can we express and depict incomplete or partial scenarios, in which some parameters are known, but others are unknown or may be irrelevant?

For instance, many well-known problems involving conditional probabilities (e.g., see the so-called prosecutor's fallacy) can be visualized and explained by showing partial frequency trees (with only 1 main branch being of interest). The confusion typically results from a (mis-)interpretation of 2 different conditional probabilities.

Without a way of plotting incomplete scenarios, we have no means of representing such problems.

Allow plotting riskyr objects

At the moment, we distinguish between plotting riskyr objects (via the plot.riskyr method) and using low-level plotting functions with parameter inputs (e.g., prev, sens, spec). Why not allow the low-level functions to accept riskyr objects as well?
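One possible pattern (a sketch, not the package's actual implementation): let each low-level function check for a riskyr object first and fall back to the individual parameters otherwise.

    plot_tree_sketch <- function(x = NULL, prev = NULL, sens = NULL, spec = NULL, ...) {
      if (inherits(x, "riskyr")) {
        # Extract the essential probabilities from the riskyr object:
        prev <- x$prev
        sens <- x$sens
        spec <- x$spec
      }
      # ... existing low-level plotting code would use prev, sens, spec here ...
      invisible(list(prev = prev, sens = sens, spec = spec))
    }

The plot.riskyr method could then simply dispatch to these low-level functions, avoiding duplicated plotting code.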

Generalize plots from 2 to 3 perspectives

Many riskyr plots (e.g., plot_fnet, plot_tree, and plot_mosaic) currently allow choosing between 2 perspectives (by splitting the population into 2 sub-groups by either condition or decision, i.e., by = "cd" vs. by = "dc"). Adding accuracy (by = "ac") as a 3rd perspective would support 3 x 2 different versions of each plot.
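For illustration, the calls would look as follows (assuming the parameter-based signature described in these issues; by = "ac" is the proposed addition, not an existing option):

    plot_tree(prev = 0.10, sens = 0.90, spec = 0.80, by = "cd")  # split by condition
    plot_tree(prev = 0.10, sens = 0.90, spec = 0.80, by = "dc")  # split by decision
    plot_tree(prev = 0.10, sens = 0.90, spec = 0.80, by = "ac")  # proposed: split by accuracy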

Turn riskyr into a package?

Hey Hans, I see you're still hard at work on riskyr, that's great!

I was just looking at the repository and wanted to suggest that you restructure the project as an R package. Wickham has a great tutorial here http://r-pkgs.had.co.nz/ on creating R packages, and here's one on how to include Shiny apps https://deanattali.com/2015/04/21/r-package-shiny-app/

I'd be happy to try and do it myself, but I really have to finish other things this month as you might have guessed :)

Visualizing uncertainty

As of now, only plot_curve has an option for expressing uncertainty about parameter values (by setting the uc argument to a percentage value, resulting in ranges of uncertainty around a given parameter value). One could argue that uc, being a numeric value, actually represents a form of risk. But irrespective of semantics, it would be desirable to include some means of expressing imprecision or vagueness in the other representations.

Perhaps a simple solution (or fall-back option) would be to specify parameter ranges (e.g., from min to max) and then create 2 graphs that represent the best and worst case scenarios?
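A rough sketch of that fall-back option (plot_range_sketch() is a hypothetical helper, not part of riskyr): take a [min, max] range for prev and draw the two extreme scenarios side by side.

    plot_range_sketch <- function(prev_range, sens, spec, N = 1000) {
      op <- par(mfrow = c(1, 2))            # two panels: best vs. worst case
      on.exit(par(op))
      for (prev in prev_range) {
        hi  <- N * prev * sens              # true positives
        fa  <- N * (1 - prev) * (1 - spec)  # false positives
        ppv <- hi / (hi + fa)               # positive predictive value
        barplot(c(hi = hi, fa = fa),
                main = sprintf("prev = %.2f (PPV = %.2f)", prev, ppv))
      }
    }

    plot_range_sketch(prev_range = c(0.05, 0.15), sens = 0.90, spec = 0.80)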

Distinguish between 2 types of scaling

When plotting frequencies as graphical objects (lines, boxes, or squares), their dimensions can be scaled by magnitude (e.g., plot_fnet with area = "sq", or the new plot_bar function). When rounding frequencies to integers (as is done by default), the scaled graph may deviate from the underlying probabilities (especially for small population sizes N). In the extreme, small frequencies may be rounded to 0 and disappear from plots.

To control this effect, introduce a scale option that defines whether objects are scaled by (rounded or non-rounded) frequencies or by (exact) probabilities. (See plot_bar for a first implementation and generalize to other plots.)
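A sketch of how such a scale option could behave (box_widths() is a hypothetical helper): compute the relative widths of the hi and mi boxes either from rounded frequencies or from exact probabilities.

    box_widths <- function(prev, sens, N, scale = c("freq", "prob")) {
      scale <- match.arg(scale)
      if (scale == "freq") {
        hi <- round(N * prev * sens)        # rounded frequencies (current default behavior)
        mi <- round(N * prev * (1 - sens))
        c(hi = hi, mi = mi) / (hi + mi)
      } else {
        c(hi = sens, mi = 1 - sens)         # exact probabilities (within condition = TRUE)
      }
    }

    # With a small N, rounding can make a small frequency vanish entirely:
    box_widths(prev = 0.10, sens = 0.95, N = 20, scale = "freq")  # mi is rounded to 0
    box_widths(prev = 0.10, sens = 0.95, N = 20, scale = "prob")  # mi remains at 0.05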

Determine necessary and sufficient conditions for a well-defined riskyr scenario

We know that providing 3 essential probabilities (prev, sens, spec) OR providing 4 essential frequencies (hi, mi, fa, cr) fully specifies a scenario. However, which combinations of (arbitrary) probabilities and frequencies are necessary or sufficient?

(Looking at a network diagram should tell us which parts are independent from vs. dependent on each other. Interestingly, probabilities allow abstracting from frequencies, but not vice versa...)
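The arithmetic behind the two known cases (a sketch; both helpers are hypothetical): frequencies determine the probabilities and N, whereas probabilities only determine frequencies once N is supplied.

    # Probabilities + N --> frequencies (N is needed to make the scenario concrete):
    prob_to_freq <- function(prev, sens, spec, N) {
      c(hi = N * prev * sens,
        mi = N * prev * (1 - sens),
        fa = N * (1 - prev) * (1 - spec),
        cr = N * (1 - prev) * spec)
    }

    # Frequencies --> probabilities (N is implied, so nothing else is needed):
    freq_to_prob <- function(hi, mi, fa, cr) {
      c(prev = (hi + mi) / (hi + mi + fa + cr),
        sens = hi / (hi + mi),
        spec = cr / (fa + cr))
    }

    prob_to_freq(prev = 0.10, sens = 0.90, spec = 0.80, N = 1000)
    freq_to_prob(hi = 90, mi = 10, fa = 180, cr = 720)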

Defining scenarios by description vs. from data/cases (i.e., by experience)

Idea

riskyr currently assumes that scenarios are defined by 3 essential probabilities (typically prev, sens, and spec or fart, plus some population size N) or 4 essential frequencies (typically hi, mi, fa, and cr).

A more flexible setup would allow defining scenarios either from parameters (i.e., "by description") or from data or cases (i.e., "by experience").

  1. By description: Define a scenario from parameters (to create/simulate cases):

    • provide 4 essential frequencies (i.e., specifying the result)
    • provide 3 essential probabilities, N, and round to exact frequencies
    • provide 3 essential probabilities, N, and sample from given probabilities
  2. By experience: Define scenario from data or cases (to compute/extract parameters):

    • provide a binary data frame of cases (or a 2x2 matrix of frequencies)
    • provide a non-binary data frame of cases plus a criterion to be maximized when binarizing the predictor variable

ToDo

See comp_popu() for a first function that generates data/cases (as df popu) from one type of description:

  • from 4 essential frequencies

Add option for generating corresponding simulations: Generate popu (as df):

  • from probabilities and N (using exact or rounded values)
  • from probabilities and N (and sample() from N)

Define a complementary function desc_data() that generates the description from (binary or binarized) data or cases.
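A sketch of what desc_data() could do (hypothetical implementation; the column names condition and decision are assumptions), deriving the 4 essential frequencies and 3 essential probabilities from a binary data frame of cases:

    desc_data <- function(data) {
      hi <- sum( data$condition &  data$decision)   # hits
      mi <- sum( data$condition & !data$decision)   # misses
      fa <- sum(!data$condition &  data$decision)   # false alarms
      cr <- sum(!data$condition & !data$decision)   # correct rejections
      list(hi = hi, mi = mi, fa = fa, cr = cr,
           N    = hi + mi + fa + cr,
           prev = (hi + mi) / (hi + mi + fa + cr),
           sens = hi / (hi + mi),
           spec = cr / (fa + cr))
    }

    # Example: simulate 1000 cases "by experience" and recover their description:
    set.seed(1)
    popu <- data.frame(condition = runif(1000) < 0.10)
    popu$decision <- ifelse(popu$condition, runif(1000) < 0.90, runif(1000) < 0.20)
    desc_data(popu)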

Representing changes

Related to the issue of visualizing uncertainty, it would be desirable to visualize changes in parameter values. For instance, if a condition's prevalence (prev) or a test's sensitivity (sens) changed by some percentage, how would this affect the entire scenario or some other parameter (e.g., PPV, hi, acc)?

Again, this could easily be expressed by 2 distinct representations (pre- vs. post-change). Are there better ways to integrate the effects of changes into 1 representation (similarly to showing ranges of variability in some types of plots)?
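As a minimal illustration of the 2-representation approach, one can compute a downstream parameter (here PPV, via Bayes' rule) before and after a change in prevalence (ppv() is a hypothetical helper):

    ppv <- function(prev, sens, spec) {
      (prev * sens) / (prev * sens + (1 - prev) * (1 - spec))
    }

    ppv(prev = 0.10, sens = 0.90, spec = 0.80)  # pre-change:  PPV = 0.33
    ppv(prev = 0.20, sens = 0.90, spec = 0.80)  # post-change: PPV = 0.53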

Suggestion: An interactive simulation?

The riskyr package is looking great!

As I was looking through the (very good!) documentation and examples on GitHub, I thought of one other communication tool you might consider. Namely, creating an interactive Shiny application that allows people to interactively sample cases and observe outcomes. I know this technique has gotten some buzz in JDM and forecasting areas recently (though I can't remember the exact papers right now...).

There are many ways you could do this, but one way would be a screen like this one:

[Screenshot of the suggested simulation screen omitted]

Every time the user clicks a "Next case" button, a ball falls from 'the sky'; balls are either Red (true positive) or Blue (true negative) cases. They then cross a decision line, where they are classified as either positive or negative based on the sensitivity / specificity of the test.

Positive classifications go to the left, and negative classifications go to the right.

Over time, the ever-growing group of classified cases would form an icon array.

Just an idea :)
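A rough Shiny sketch of this idea (all names and parameter values here are assumptions, not part of riskyr): each click of "Next case" samples one case, classifies it via sens/spec, and adds it to a growing icon array, with positive classifications on the left and negative ones on the right.

    library(shiny)

    prev <- 0.10; sens <- 0.90; spec <- 0.80   # illustrative parameter values

    ui <- fluidPage(
      actionButton("next_case", "Next case"),
      plotOutput("icon_array")
    )

    server <- function(input, output, session) {
      cases <- reactiveVal(data.frame(condition = logical(0), decision = logical(0)))

      observeEvent(input$next_case, {
        condition <- runif(1) < prev            # does this case have the condition?
        decision  <- if (condition) runif(1) < sens else runif(1) < (1 - spec)
        cases(rbind(cases(), data.frame(condition = condition, decision = decision)))
      })

      output$icon_array <- renderPlot({
        d <- cases()
        if (nrow(d) == 0) return(NULL)
        # Arrange cases in a 5-wide icon array per decision group:
        i <- ave(seq_len(nrow(d)), d$decision, FUN = seq_along)
        x <- ifelse(d$decision, 1, 2) + ((i - 1) %% 5) * 0.15
        y <- (i - 1) %/% 5 + 1
        plot(x, y, pch = 19, cex = 2,
             col = ifelse(d$condition, "red", "blue"),  # colour encodes the true state
             xlim = c(0.5, 3), xaxt = "n", xlab = "", ylab = "Row in icon array")
        axis(1, at = c(1.3, 2.3), labels = c("Decide positive", "Decide negative"))
      })
    }

    shinyApp(ui, server)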
