indrajeetpatil / ggstatsplot

Enhancing {ggplot2} plots with statistical analysis 📊📣

Home Page: https://indrajeetpatil.github.io/ggstatsplot/

License: GNU General Public License v3.0


ggstatsplot's Introduction

{ggstatsplot}: {ggplot2} Based Plots with Statistical Details


Raison d’être

“What is to be sought in designs for the display of information is the clear portrayal of complexity. Not the complication of the simple; rather … the revelation of the complex.” - Edward R. Tufte

{ggstatsplot} is an extension of the {ggplot2} package for creating graphics with details from statistical tests included in the information-rich plots themselves. In a typical exploratory data analysis workflow, data visualization and statistical modeling are two different phases: visualization informs modeling, and modeling in turn can suggest a different visualization method, and so on. The central idea of {ggstatsplot} is simple: combine these two phases into one in the form of graphics with statistical details, which makes data exploration simpler and faster.

Installation

Type         Source  Command
Release      CRAN    install.packages("ggstatsplot")
Development  GitHub  pak::pak("IndrajeetPatil/ggstatsplot")

Citation

If you want to cite this package in a scientific journal or in any other context, run the following code in your R console:

citation("ggstatsplot")
To cite package 'ggstatsplot' in publications use:

  Patil, I. (2021). Visualizations with statistical details: The
  'ggstatsplot' approach. Journal of Open Source Software, 6(61), 3167,
  doi:10.21105/joss.03167

A BibTeX entry for LaTeX users is

  @Article{,
    doi = {10.21105/joss.03167},
    url = {https://doi.org/10.21105/joss.03167},
    year = {2021},
    publisher = {{The Open Journal}},
    volume = {6},
    number = {61},
    pages = {3167},
    author = {Indrajeet Patil},
    title = {{Visualizations with statistical details: The {'ggstatsplot'} approach}},
    journal = {{Journal of Open Source Software}},
  }

Acknowledgments

I would like to thank all the contributors to {ggstatsplot} who pointed out bugs or requested features I hadn’t considered. I would especially like to thank other package developers (Daniel Lüdecke, Dominique Makowski, Mattan S. Ben-Shachar, Brenton Wiernik, Patrick Mair, Salvatore Mangiafico, and others) who have patiently and diligently answered my relentless questions and supported feature requests in their projects. I also want to thank Chuck Powell for his initial contributions to the package.

The hexsticker was generously designed by Sarah Otterstetter (Max Planck Institute for Human Development, Berlin). This package has also benefited from the larger #rstats community on Twitter, LinkedIn, and StackOverflow.

Thanks are also due to my postdoc advisers (Mina Cikara and Fiery Cushman at Harvard University; Iyad Rahwan at Max Planck Institute for Human Development) who patiently supported me spending hundreds (?) of hours working on this package rather than what I was paid to do. 😁

Documentation and Examples

To see the detailed documentation for each function in the stable CRAN version of the package, see the package website: https://indrajeetpatil.github.io/ggstatsplot/

Summary of available plots

Function          Plot                   Description
ggbetweenstats()  violin plots           for comparisons between groups/conditions
ggwithinstats()   violin plots           for comparisons within groups/conditions
gghistostats()    histograms             for the distribution of a numeric variable
ggdotplotstats()  dot plots/charts       for the distribution of a labeled numeric variable
ggscatterstats()  scatterplots           for the correlation between two variables
ggcorrmat()       correlation matrices   for correlations between multiple variables
ggpiestats()      pie charts             for categorical data
ggbarstats()      bar charts             for categorical data
ggcoefstats()     dot-and-whisker plots  for regression models and meta-analysis

In addition to these basic plots, {ggstatsplot} also provides grouped_ versions (see below) that make it easy to repeat the same analysis across the levels of a grouping variable.

Summary of types of statistical analyses

The table below summarizes all the different types of analyses currently supported in this package:

Functions                         Description                                        Parametric  Non-parametric  Robust  Bayesian
ggbetweenstats()                  Between group/condition comparisons                ✅          ✅              ✅      ✅
ggwithinstats()                   Within group/condition comparisons                 ✅          ✅              ✅      ✅
gghistostats(), ggdotplotstats()  Distribution of a numeric variable                 ✅          ✅              ✅      ✅
ggcorrmat()                       Correlation matrix                                 ✅          ✅              ✅      ✅
ggscatterstats()                  Correlation between two variables                  ✅          ✅              ✅      ✅
ggpiestats(), ggbarstats()        Association between categorical variables          ✅          ✅              ❌      ✅
ggpiestats(), ggbarstats()        Equal proportions for categorical variable levels  ✅          ✅              ❌      ✅
ggcoefstats()                     Regression model coefficients                      ✅          ✅              ✅      ✅
ggcoefstats()                     Random-effects meta-analysis                       ✅          ❌              ✅      ✅

Summary of Bayesian analysis

Analysis                         Hypothesis testing  Estimation
(one/two-sample) t-test          ✅                  ✅
one-way ANOVA                    ✅                  ✅
correlation                      ✅                  ✅
(one/two-way) contingency table  ✅                  ✅
random-effects meta-analysis     ✅                  ✅

Statistical reporting

For all statistical tests reported in the plots, the default template abides by the gold standard for statistical reporting. For example, here are results from Yuen’s test for trimmed means (robust t-test):

Summary of statistical tests and effect sizes

Statistical analysis is carried out by the {statsExpressions} package, and thus a summary table of all the statistical tests currently supported across various functions can be found in the article for that package: https://indrajeetpatil.github.io/statsExpressions/articles/stats_details.html

Primary functions

ggbetweenstats()

This function creates either a violin plot, a box plot, or a mix of the two for between-group or between-condition comparisons, with results from statistical tests in the subtitle. The simplest function call looks like this:

set.seed(123)

ggbetweenstats(
  data  = iris,
  x     = Species,
  y     = Sepal.Length,
  title = "Distribution of sepal length across Iris species"
)

Defaults return

✅ raw data + distributions
✅ descriptive statistics
✅ inferential statistics
✅ effect size + CIs
✅ pairwise comparisons
✅ Bayesian hypothesis-testing
✅ Bayesian estimation

A number of other arguments can be specified to make this plot even more informative or change some of the default options. Additionally, there is also a grouped_ variant of this function that makes it easy to repeat the same operation across a single grouping variable:

set.seed(123)

grouped_ggbetweenstats(
  data             = dplyr::filter(movies_long, genre %in% c("Action", "Comedy")),
  x                = mpaa,
  y                = length,
  grouping.var     = genre,
  ggsignif.args    = list(textsize = 4, tip_length = 0.01),
  p.adjust.method  = "bonferroni",
  palette          = "default_jama",
  package          = "ggsci",
  plotgrid.args    = list(nrow = 1),
  annotation.args  = list(title = "Differences in movie length by mpaa ratings for different genres")
)

Details about underlying functions used to create graphics and statistical tests carried out can be found in the function documentation: https://indrajeetpatil.github.io/ggstatsplot/reference/ggbetweenstats.html

For more, also read the following vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggbetweenstats.html

ggwithinstats()

The ggbetweenstats() function has a twin, ggwithinstats(), for repeated measures designs. It behaves in the same fashion, with a few minor tweaks to properly visualize the repeated measures design. As the example below shows, the only difference in plot structure is that the group means are now connected by paths to highlight that these data are paired.

set.seed(123)
library(WRS2) ## for data
library(afex) ## to run ANOVA

ggwithinstats(
  data    = WineTasting,
  x       = Wine,
  y       = Taste,
  title   = "Wine tasting"
)

Defaults return

✅ raw data + distributions
✅ descriptive statistics
✅ inferential statistics
✅ effect size + CIs
✅ pairwise comparisons
✅ Bayesian hypothesis-testing
✅ Bayesian estimation

As with ggbetweenstats(), this function also has a grouped_ variant that makes repeating the same analysis across a single grouping variable quicker. Here is an example with only repeated measurements:

set.seed(123)

grouped_ggwithinstats(
  data            = dplyr::filter(bugs_long, region %in% c("Europe", "North America"), condition %in% c("LDLF", "LDHF")),
  x               = condition,
  y               = desire,
  type            = "np",
  xlab            = "Condition",
  ylab            = "Desire to kill an arthropod",
  grouping.var    = region
)

Details about underlying functions used to create graphics and statistical tests carried out can be found in the function documentation: https://indrajeetpatil.github.io/ggstatsplot/reference/ggwithinstats.html

For more, also read the following vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggwithinstats.html

gghistostats()

To visualize the distribution of a single variable and check if its mean is significantly different from a specified value with a one-sample test, gghistostats() can be used.

set.seed(123)

gghistostats(
  data       = ggplot2::msleep,
  x          = awake,
  title      = "Amount of time spent awake",
  test.value = 12,
  binwidth   = 1
)

Defaults return

✅ counts + proportion for bins
✅ descriptive statistics
✅ inferential statistics
✅ effect size + CIs
✅ Bayesian hypothesis-testing
✅ Bayesian estimation
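
As a rough cross-check of the one-sample comparison described above, the same question can be sketched in base R. This assumes that the default parametric approach corresponds to a one-sample Student's t-test against test.value; the names awake_hours and res are illustrative and not part of {ggstatsplot}.

```r
# Sketch only: a base R one-sample t-test, assumed to mirror the default
# parametric comparison of mean(awake) against test.value = 12.
awake_hours <- ggplot2::msleep$awake
awake_hours <- awake_hours[!is.na(awake_hours)]

res <- stats::t.test(awake_hours, mu = 12)

res$statistic # t statistic
res$parameter # degrees of freedom (n - 1)
res$p.value   # two-sided p-value
res$conf.int  # 95% confidence interval for the mean
```

The plot adds the effect size and the Bayesian details on top of this, so the base R call covers only the frequentist test itself.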

There is also a grouped_ variant of this function that makes it easy to repeat the same operation across a single grouping variable:

set.seed(123)

grouped_gghistostats(
  data              = dplyr::filter(movies_long, genre %in% c("Action", "Comedy")),
  x                 = budget,
  test.value        = 50,
  type              = "nonparametric",
  xlab              = "Movies budget (in million US$)",
  grouping.var      = genre,
  normal.curve      = TRUE,
  normal.curve.args = list(color = "red", size = 1),
  ggtheme           = ggthemes::theme_tufte(),
  ## modify the defaults from `{ggstatsplot}` for each plot
  plotgrid.args     = list(nrow = 1),
  annotation.args   = list(title = "Movies budgets for different genres")
)

Details about underlying functions used to create graphics and statistical tests carried out can be found in the function documentation: https://indrajeetpatil.github.io/ggstatsplot/reference/gghistostats.html

For more, also read the following vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/gghistostats.html

ggdotplotstats()

This function is similar to gghistostats(), but is intended to be used when the numeric variable also has a label.

set.seed(123)

ggdotplotstats(
  data       = dplyr::filter(gapminder::gapminder, continent == "Asia"),
  y          = country,
  x          = lifeExp,
  test.value = 55,
  type       = "robust",
  title      = "Distribution of life expectancy in Asian continent",
  xlab       = "Life expectancy"
)

Defaults return

✅ descriptives (mean + sample size)
✅ inferential statistics
✅ effect size + CIs
✅ Bayesian hypothesis-testing
✅ Bayesian estimation

As with the rest of the functions in this package, there is also a grouped_ variant of this function to facilitate looping the same operation for all levels of a single grouping variable.

set.seed(123)

grouped_ggdotplotstats(
  data            = dplyr::filter(ggplot2::mpg, cyl %in% c("4", "6")),
  x               = cty,
  y               = manufacturer,
  type            = "bayes",
  xlab            = "city miles per gallon",
  ylab            = "car manufacturer",
  grouping.var    = cyl,
  test.value      = 15.5,
  point.args      = list(color = "red", size = 5, shape = 13),
  annotation.args = list(title = "Fuel economy data")
)

Details about underlying functions used to create graphics and statistical tests carried out can be found in the function documentation: https://indrajeetpatil.github.io/ggstatsplot/reference/ggdotplotstats.html

For more, also read the following vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggdotplotstats.html

ggscatterstats()

This function creates a scatterplot with marginal distributions overlaid on the axes and results from statistical tests in the subtitle:

ggscatterstats(
  data  = ggplot2::msleep,
  x     = sleep_rem,
  y     = awake,
  xlab  = "REM sleep (in hours)",
  ylab  = "Amount of time spent awake (in hours)",
  title = "Understanding mammalian sleep"
)

Defaults return

✅ raw data + distributions
✅ marginal distributions
✅ inferential statistics
✅ effect size + CIs
✅ Bayesian hypothesis-testing
✅ Bayesian estimation

There is also a grouped_ variant of this function that makes it easy to repeat the same operation across a single grouping variable.

set.seed(123)

grouped_ggscatterstats(
  data             = dplyr::filter(movies_long, genre %in% c("Action", "Comedy")),
  x                = rating,
  y                = length,
  grouping.var     = genre,
  label.var        = title,
  label.expression = length > 200,
  xlab             = "IMDB rating",
  ggtheme          = ggplot2::theme_grey(),
  ggplot.component = list(ggplot2::scale_x_continuous(breaks = seq(2, 9, 1), limits = (c(2, 9)))),
  plotgrid.args    = list(nrow = 1),
  annotation.args  = list(title = "Relationship between movie length and IMDB ratings")
)

Details about underlying functions used to create graphics and statistical tests carried out can be found in the function documentation: https://indrajeetpatil.github.io/ggstatsplot/reference/ggscatterstats.html

For more, also read the following vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggscatterstats.html

ggcorrmat()

ggcorrmat() makes a correlogram (a matrix of correlation coefficients) with a minimal amount of code. The defaults alone produce publication-ready correlation matrices. But, for the sake of exploring the available options, let’s change some of the defaults. For example, multiple aesthetics-related arguments can be modified to change the appearance of the correlation matrix.

set.seed(123)

## as a default this function outputs a correlation matrix plot
ggcorrmat(
  data     = ggplot2::msleep,
  colors   = c("#B2182B", "white", "#4D4D4D"),
  title    = "Correlogram for mammals sleep dataset",
  subtitle = "sleep units: hours; weight units: kilograms"
)

Defaults return

✅ effect size + significance
✅ careful handling of NAs

If there are NAs present in the selected variables, the legend will display minimum, median, and maximum number of pairs used for correlation tests.
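
To make the "number of pairs" idea concrete, here is a minimal base R sketch that counts pairwise-complete observations for the numeric columns of msleep. It is an illustration of the concept only, not the package's internal code; the names m, observed, n_pairs, and pair_counts are hypothetical.

```r
# Sketch only: count pairwise-complete observations for the numeric
# columns of msleep (base R, not the package's internal code).
m <- as.matrix(ggplot2::msleep[, c(
  "sleep_total", "sleep_rem", "sleep_cycle", "awake", "brainwt", "bodywt"
)])

# 1 where a value is observed, 0 where it is missing
observed <- (!is.na(m)) * 1

# n_pairs[i, j] = number of rows where columns i and j are both observed
n_pairs <- crossprod(observed)

# summarize over distinct variable pairs (upper triangle, no diagonal)
pair_counts <- n_pairs[upper.tri(n_pairs)]
c(
  min    = min(pair_counts),
  median = stats::median(pair_counts),
  max    = max(pair_counts)
)
```

Each correlation is based on a different subset of rows when missingness differs by column, which is why a range of pair counts is reported instead of a single sample size.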

There is also a grouped_ variant of this function that makes it easy to repeat the same operation across a single grouping variable:

set.seed(123)

grouped_ggcorrmat(
  data         = dplyr::filter(movies_long, genre %in% c("Action", "Comedy")),
  type         = "robust",
  colors       = c("#cbac43", "white", "#550000"),
  grouping.var = genre,
  matrix.type  = "lower"
)

Details about underlying functions used to create graphics and statistical tests carried out can be found in the function documentation: https://indrajeetpatil.github.io/ggstatsplot/reference/ggcorrmat.html

For more, also read the following vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggcorrmat.html

ggpiestats()

This function creates a pie chart for categorical or nominal variables with results from contingency table analysis (Pearson’s chi-squared test for between-subjects design and McNemar’s chi-squared test for within-subjects design) included in the subtitle of the plot. If only one categorical variable is entered, results from one-sample proportion test (i.e., a chi-squared goodness of fit test) will be displayed as a subtitle.

To study an interaction between two categorical variables:

set.seed(123)

ggpiestats(
  data         = mtcars,
  x            = am,
  y            = cyl,
  package      = "wesanderson",
  palette      = "Royal1",
  title        = "Dataset: Motor Trend Car Road Tests",
  legend.title = "Transmission"
)

Defaults return

✅ descriptives (frequency + %s)
✅ inferential statistics
✅ effect size + CIs
✅ Goodness-of-fit tests
✅ Bayesian hypothesis-testing
✅ Bayesian estimation
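
For orientation, the two kinds of tests named in this section can be sketched with base R on the same mtcars variables. This is a hedged illustration of the tests themselves, not of how {ggstatsplot} computes them internally; assoc and gof are illustrative names.

```r
# Sketch only: base R versions of the two tests named above.

# Pearson's chi-squared test of association between two categorical
# variables. suppressWarnings() hides the small-expected-count warning
# that this small dataset triggers.
assoc <- suppressWarnings(
  stats::chisq.test(table(mtcars$am, mtcars$cyl), correct = FALSE)
)

# Chi-squared goodness-of-fit test for a single variable
# (equal proportions across levels under the null hypothesis)
gof <- stats::chisq.test(table(mtcars$cyl))

assoc$parameter # degrees of freedom: (2 - 1) * (3 - 1) = 2
gof$statistic   # chi-squared statistic for the goodness-of-fit test
gof$p.value
```

The first call corresponds to the two-variable case (x and y both supplied), the second to the one-variable case.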

There is also a grouped_ variant of this function that makes it easy to repeat the same operation across a single grouping variable. The following example is a case where the theoretical question is about proportions for different levels of a single nominal variable:

set.seed(123)

grouped_ggpiestats(
  data         = mtcars,
  x            = cyl,
  grouping.var = am,
  label.repel  = TRUE,
  package      = "ggsci",
  palette      = "default_ucscgb"
)

Details about underlying functions used to create graphics and statistical tests carried out can be found in the function documentation: https://indrajeetpatil.github.io/ggstatsplot/reference/ggpiestats.html

For more, also read the following vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggpiestats.html

ggbarstats()

In case you are not a fan of pie charts (for very good reasons), you can alternatively use the ggbarstats() function, which has a similar syntax.

N.B. The p-values from one-sample proportion test are displayed on top of each bar.

set.seed(123)
library(ggplot2)

ggbarstats(
  data             = movies_long,
  x                = mpaa,
  y                = genre,
  title            = "MPAA Ratings by Genre",
  xlab             = "movie genre",
  legend.title     = "MPAA rating",
  ggplot.component = list(ggplot2::scale_x_discrete(guide = ggplot2::guide_axis(n.dodge = 2))),
  palette          = "Set2"
)

Defaults return

✅ descriptives (frequency + %s)
✅ inferential statistics
✅ effect size + CIs
✅ Goodness-of-fit tests
✅ Bayesian hypothesis-testing
✅ Bayesian estimation

And, needless to say, there is also a grouped_ variant of this function:

## setup
set.seed(123)

grouped_ggbarstats(
  data         = mtcars,
  x            = am,
  y            = cyl,
  grouping.var = vs,
  package      = "wesanderson",
  palette      = "Darjeeling2"
)

Details about underlying functions used to create graphics and statistical tests carried out can be found in the function documentation: https://indrajeetpatil.github.io/ggstatsplot/reference/ggbarstats.html

For more, also read the following vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggbarstats.html

ggcoefstats()

The function ggcoefstats() generates dot-and-whisker plots for regression models saved in a tidy data frame. The tidy data frames are prepared using parameters::model_parameters(). Additionally, if available, the model summary indices are also extracted from performance::model_performance().

Although the statistical models displayed in the plot may differ based on the class of models being investigated, there are a few aspects of the plot that will be invariant across models:

  • The dot-whisker plot contains a dot representing each estimate and whiskers showing its confidence interval (95% by default). The estimate can be either an effect size (for tests that depend on the F-statistic) or a regression coefficient (for tests with t-, $\chi^{2}$-, and z-statistics). By default, the function displays an x-axis label that clarifies which estimates are being shown. The confidence intervals can sometimes be asymmetric if bootstrapping was used.

  • The label attached to each dot provides more details from the statistical test carried out; it typically contains the estimate, statistic, and p-value.

  • The caption contains diagnostic information, if available, that can be useful for model selection: the smaller the Akaike’s Information Criterion (AIC) and Bayesian Information Criterion (BIC) values, the “better” the model is.

  • The output of this function is a {ggplot2} object and can thus be further modified (e.g., by changing themes) with {ggplot2} functions.

set.seed(123)

## model
mod <- stats::lm(formula = mpg ~ am * cyl, data = mtcars)

ggcoefstats(mod)

Defaults return

✅ inferential statistics
✅ estimate + CIs
✅ model summary (AIC and BIC)
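
To see what kind of numbers end up in the dots, whiskers, and caption, here is a minimal base R sketch for the same model. ggcoefstats() itself relies on parameters::model_parameters() and performance::model_performance() as described above; the base R calls below are a stand-in chosen for illustration, and est is a hypothetical name.

```r
# Sketch only: base R extraction of the same kinds of quantities
# (estimates, 95% CIs, AIC, BIC) for the model used above.
mod <- stats::lm(formula = mpg ~ am * cyl, data = mtcars)

# one row per term: point estimate plus lower and upper confidence limits
est <- cbind(
  estimate = stats::coef(mod),
  stats::confint(mod, level = 0.95)
)
est

# model-level summary indices of the kind reported in the caption
c(AIC = stats::AIC(mod), BIC = stats::BIC(mod))
```

Each row of est corresponds to one dot (the estimate) and one whisker (the interval) in the plot.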

Details about underlying functions used to create graphics and statistical tests carried out can be found in the function documentation: https://indrajeetpatil.github.io/ggstatsplot/reference/ggcoefstats.html

For more, also read the following vignette: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggcoefstats.html

Extracting expressions and data frames with statistical details

{ggstatsplot} also offers convenience functions to extract the expressions displayed in {ggstatsplot} plots, as well as the data frames with statistical details that are used to create them.

set.seed(123)

p <- ggbetweenstats(mtcars, cyl, mpg)

# extracting expression present in the subtitle
extract_subtitle(p)
#> list(italic("F")["Welch"](2, 18.03) == "31.62", italic(p) == 
#>     "1.27e-06", widehat(omega["p"]^2) == "0.74", CI["95%"] ~ 
#>     "[" * "0.53", "1.00" * "]", italic("n")["obs"] == "32")

# extracting expression present in the caption
extract_caption(p)
#> list(log[e] * (BF["01"]) == "-14.92", widehat(italic(R^"2"))["Bayesian"]^"posterior" == 
#>     "0.71", CI["95%"]^HDI ~ "[" * "0.57", "0.79" * "]", italic("r")["Cauchy"]^"JZS" == 
#>     "0.71")

# a list of tibbles containing statistical analysis summaries
extract_stats(p)
#> $subtitle_data
#> # A tibble: 1 × 14
#>   statistic    df df.error    p.value
#>       <dbl> <dbl>    <dbl>      <dbl>
#> 1      31.6     2     18.0 0.00000127
#>   method                                                   effectsize estimate
#>   <chr>                                                    <chr>         <dbl>
#> 1 One-way analysis of means (not assuming equal variances) Omega2        0.744
#>   conf.level conf.low conf.high conf.method conf.distribution n.obs expression
#>        <dbl>    <dbl>     <dbl> <chr>       <chr>             <int> <list>    
#> 1       0.95    0.531         1 ncp         F                    32 <language>
#> 
#> $caption_data
#> # A tibble: 6 × 17
#>   term     pd prior.distribution prior.location prior.scale     bf10
#>   <chr> <dbl> <chr>                       <dbl>       <dbl>    <dbl>
#> 1 mu    1     cauchy                          0       0.707 3008850.
#> 2 cyl-4 1     cauchy                          0       0.707 3008850.
#> 3 cyl-6 0.780 cauchy                          0       0.707 3008850.
#> 4 cyl-8 1     cauchy                          0       0.707 3008850.
#> 5 sig2  1     cauchy                          0       0.707 3008850.
#> 6 g_cyl 1     cauchy                          0       0.707 3008850.
#>   method                          log_e_bf10 effectsize         estimate std.dev
#>   <chr>                                <dbl> <chr>                 <dbl>   <dbl>
#> 1 Bayes factors for linear models       14.9 Bayesian R-squared    0.714  0.0503
#> 2 Bayes factors for linear models       14.9 Bayesian R-squared    0.714  0.0503
#> 3 Bayes factors for linear models       14.9 Bayesian R-squared    0.714  0.0503
#> 4 Bayes factors for linear models       14.9 Bayesian R-squared    0.714  0.0503
#> 5 Bayes factors for linear models       14.9 Bayesian R-squared    0.714  0.0503
#> 6 Bayes factors for linear models       14.9 Bayesian R-squared    0.714  0.0503
#>   conf.level conf.low conf.high conf.method n.obs expression
#>        <dbl>    <dbl>     <dbl> <chr>       <int> <list>    
#> 1       0.95    0.574     0.788 HDI            32 <language>
#> 2       0.95    0.574     0.788 HDI            32 <language>
#> 3       0.95    0.574     0.788 HDI            32 <language>
#> 4       0.95    0.574     0.788 HDI            32 <language>
#> 5       0.95    0.574     0.788 HDI            32 <language>
#> 6       0.95    0.574     0.788 HDI            32 <language>
#> 
#> $pairwise_comparisons_data
#> # A tibble: 3 × 9
#>   group1 group2 statistic   p.value alternative distribution p.adjust.method
#>   <chr>  <chr>      <dbl>     <dbl> <chr>       <chr>        <chr>          
#> 1 4      6          -6.67 0.00110   two.sided   q            Holm           
#> 2 4      8         -10.7  0.0000140 two.sided   q            Holm           
#> 3 6      8          -7.48 0.000257  two.sided   q            Holm           
#>   test         expression
#>   <chr>        <list>    
#> 1 Games-Howell <language>
#> 2 Games-Howell <language>
#> 3 Games-Howell <language>
#> 
#> $descriptive_data
#> NULL
#> 
#> $one_sample_data
#> NULL
#> 
#> $tidy_data
#> NULL
#> 
#> $glance_data
#> NULL

Note that all of this analysis is carried out by the {statsExpressions} package: https://indrajeetpatil.github.io/statsExpressions/
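
As an independent cross-check, the Welch F-test reported in the subtitle data above ("One-way analysis of means (not assuming equal variances)") can be reproduced with base R. This sketch only covers the frequentist test statistic, not the effect size or the Bayesian caption; welch is an illustrative name.

```r
# Sketch only: reproduce the Welch F-test from the subtitle with base R.
# var.equal = FALSE is the default of oneway.test(); it is spelled out
# here for clarity.
welch <- stats::oneway.test(mpg ~ factor(cyl), data = mtcars, var.equal = FALSE)

welch$statistic # F statistic
welch$parameter # numerator and denominator degrees of freedom
welch$p.value
```

The values should agree, up to rounding, with the statistic, df, df.error, and p.value columns of $subtitle_data shown above.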

Using {ggstatsplot} statistical details with custom plots

Sometimes you may not like the default plots produced by {ggstatsplot}. In such cases, you can use other custom plots (from {ggplot2} or other plotting packages) and still use {ggstatsplot} functions to display results from the relevant statistical test.

For example, in the following chunk, we will create our own plot using the {ggplot2} package, and use a {ggstatsplot} function to extract the expression:

## loading the needed libraries
set.seed(123)
library(ggplot2)
library(ggstatsplot)

## using `{ggstatsplot}` to get expression with statistical results
stats_results <- extract_subtitle(ggbetweenstats(morley, Expt, Speed))

## creating a custom plot of our choosing
ggplot(morley, aes(x = as.factor(Expt), y = Speed)) +
  geom_boxplot() +
  labs(
    title = "Michelson-Morley experiments",
    subtitle = stats_results,
    x = "Experiment number",
    y = "Speed of light"
  )

Summary of benefits of using {ggstatsplot}

  • No need to use scores of packages for statistical analysis (e.g., one to get stats, one to get effect sizes, another to get Bayes Factors, and yet another to get pairwise comparisons, etc.).

  • Minimal amount of code needed for all functions (typically only data, x, and y), which minimizes chances of error and makes for tidy scripts.

  • Conveniently toggle between statistical approaches.

  • Truly makes your figures worth a thousand words.

  • No need to copy-paste results to the text editor (MS-Word, e.g.).

  • Disembodied figures stand on their own and are easy to evaluate for the reader.

  • More breathing room for theoretical discussion and other text.

  • No need to worry about updating figures and statistical details separately.

Misconceptions about {ggstatsplot}

This package is…

❌ an alternative to learning {ggplot2}
✅ (The better you know {ggplot2}, the more you can modify the defaults to your liking.)

❌ meant to be used in talks/presentations
✅ (Default plots can be too complicated for effectively communicating results in time-constrained presentation settings, e.g., conference talks.)

❌ the only game in town
✅ (GUI software alternatives: JASP and jamovi).

Extensions

In case you use the GUI software jamovi, you can install a module called jjstatsplot, which is a wrapper around {ggstatsplot}.

Contributing

I’m happy to receive bug reports, suggestions, questions, and (most of all) contributions to fix problems and add features. I personally prefer using the GitHub issues system over trying to reach out to me in other ways (personal e-mail, Twitter, etc.). Pull Requests for contributions are encouraged.

Here are some simple ways in which you can contribute (in the increasing order of commitment):

  • Read and correct any inconsistencies in the documentation
  • Raise issues about bugs or wanted features
  • Review code
  • Add new functionality (in the form of new plotting functions or helpers for preparing subtitles)

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

ggstatsplot's People

Contributors

antoinesoetewey, csoneson, danheck, dependabot[bot], emilhvitfeldt, hbaniecki, ibecav, indrajeetpatil, mikemahoney218, wibeasley


ggstatsplot's Issues

robust anova not working when `NA`s present in data

# data
ggplot2::msleep
#> # A tibble: 83 x 11
#>    name  genus vore  order conservation sleep_total sleep_rem sleep_cycle
#>    <chr> <chr> <chr> <chr> <chr>              <dbl>     <dbl>       <dbl>
#>  1 Chee~ Acin~ carni Carn~ lc                  12.1      NA        NA    
#>  2 Owl ~ Aotus omni  Prim~ <NA>                17         1.8      NA    
#>  3 Moun~ Aplo~ herbi Rode~ nt                  14.4       2.4      NA    
#>  4 Grea~ Blar~ omni  Sori~ lc                  14.9       2.3       0.133
#>  5 Cow   Bos   herbi Arti~ domesticated         4         0.7       0.667
#>  6 Thre~ Brad~ herbi Pilo~ <NA>                14.4       2.2       0.767
#>  7 Nort~ Call~ carni Carn~ vu                   8.7       1.4       0.383
#>  8 Vesp~ Calo~ <NA>  Rode~ <NA>                 7        NA        NA    
#>  9 Dog   Canis carni Carn~ domesticated        10.1       2.9       0.333
#> 10 Roe ~ Capr~ herbi Arti~ lc                   3        NA        NA    
#> # ... with 73 more rows, and 3 more variables: awake <dbl>, brainwt <dbl>,
#> #   bodywt <dbl>

# with `WRS2` works
WRS2::t1way(formula = sleep_rem ~ vore, 
            data = ggplot2::msleep)
#> Call:
#> WRS2::t1way(formula = sleep_rem ~ vore, data = ggplot2::msleep)
#> 
#> Test statistic: F = 2.7569 
#> Degrees of freedom 1: 3 
#> Degrees of freedom 2: 9.37 
#> p-value: 0.10159 
#> 
#> Explanatory measure of effect size: 0.78

# when bootstrapping, it doesn't work
subtitle_ggbetween_rob_anova(
  data = ggplot2::msleep,
  x = vore,
  y = sleep_rem
)
#> Error in subtitle_ggbetween_rob_anova(data = ggplot2::msleep, x = vore, : could not find function "subtitle_ggbetween_rob_anova"

Created on 2018-09-25 by the reprex package (v0.2.1)

possible issue with devel version of broom.mixed?

Testing devel version of ggstatsplot with devel version of broom.mixed (0.2.3, on GitHub, about to go to CRAN ...) I get

Quitting from lines 556-575 (ggcoefstats.Rmd) 
Error: processing vignette 'ggcoefstats.Rmd' failed with diagnostics:
replacement has 1 row, data has 0
Execution halted

As far as I have been able to dig in this seems to come from inside tidy.clm(), which is not part of broom.mixed ... can you double-check on your end please?

 1: plotlist %>% purrr::map(.x = ., .f = ~ggstatsplot::ggcoefstats(x = ordinal:
 2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
 3: eval(quote(`_fseq`(`_lhs`)), env, env)
 4: eval(quote(`_fseq`(`_lhs`)), env, env)
 5: `_fseq`(`_lhs`)
 6: freduce(value, `_function_list`)
 7: withVisible(function_list[[k]](value))
 8: function_list[[k]](value)
 9: purrr::map(.x = ., .f = ~ggstatsplot::ggcoefstats(x = ordinal::clm(formula 
10: .f(.x[[i]], ...)
11: ggstatsplot::ggcoefstats(x = ordinal::clm(formula = as.factor(rating) ~ bel
12: broom::tidy(x = x, conf.int = TRUE, conf.level = conf.level, quick = FALSE,
13: tidy.clm(x = x, conf.int = TRUE, conf.level = conf.level, quick = FALSE, co
14: process_clm(ret, x, conf.int = conf.int, conf.level = conf.level, exponenti
15: `[<-`(`*tmp*`, ret$term %in% names(x$zeta), "coefficient_type", value = "ze
16: `[<-.data.frame`(`*tmp*`, ret$term %in% names(x$zeta), "coefficient_type", 

bug in `grouped_` variants of functions?

As the README mentions, all functions in ggstatsplot are supposed to work irrespective of whether you enter a character (x = "x") or a bare expression (x = x). But this doesn't seem to be working for grouped_ variants of functions for the grouping.var argument?

Definitely something is not right with the way I've implemented this using rlang, my Achilles heel.

@ibecav You wanna take a look at this?

library(ggstatsplot)

# works
ggstatsplot::grouped_ggbetweenstats(
  data = dplyr::sample_frac(tbl = ggstatsplot::movies_long, size = 0.25) %>%
    dplyr::filter(.data = ., mpaa %in% c("R", "PG-13"), genre %in% c("Drama", "Comedy")),
  x = genre,
  y = rating,
  grouping.var = mpaa
)

# doesn't work
ggstatsplot::grouped_ggbetweenstats(
  data = dplyr::sample_frac(tbl = ggstatsplot::movies_long, size = 0.25) %>%
    dplyr::filter(.data = ., mpaa %in% c("R", "PG-13"), genre %in% c("Drama", "Comedy")),
  x = genre,
  y = rating,
  grouping.var = "mpaa"
)
#> Error in arrange_impl(.data, dots): incorrect size (1) at position 1, expecting : 341

12.
stop(structure(list(message = "incorrect size (1) at position 1, expecting : 341", 
    call = arrange_impl(.data, dots), cppstack = structure(list(
        file = "", line = -1L, stack = "C++ stack not available on this system"), class = "Rcpp_stack_trace")), class = c("Rcpp::exception", 
"C++Error", "error", "condition"))) 
11.
arrange_impl(.data, dots) 
10.
arrange.tbl_df(.data = ., !!rlang::enquo(grouping.var)) 
9.
dplyr::arrange(.data = ., !!rlang::enquo(grouping.var)) 
8.
function_list[[i]](value) 
7.
freduce(value, `_function_list`) 
6.
`_fseq`(`_lhs`) 
5.
eval(quote(`_fseq`(`_lhs`)), env, env) 
4.
eval(quote(`_fseq`(`_lhs`)), env, env) 
3.
withVisible(eval(quote(`_fseq`(`_lhs`)), env, env)) 
2.
df %<>% dplyr::mutate_if(.tbl = ., .predicate = purrr::is_bare_character, 
    .funs = ~as.factor(.)) %>% dplyr::mutate_if(.tbl = ., .predicate = is.factor, 
    .funs = ~base::droplevels(.)) %>% dplyr::filter(.data = ., 
    !is.na(!!rlang::enquo(grouping.var))) %>% dplyr::arrange(.data = .,  ... at grouped_ggbetweenstats.R#132
1.
ggstatsplot::grouped_ggbetweenstats(data = dplyr::sample_frac(tbl = ggstatsplot::movies_long, 
    size = 0.25) %>% dplyr::filter(.data = ., mpaa %in% c("R", 
    "PG-13"), genre %in% c("Drama", "Comedy")), x = genre, y = rating, 
    grouping.var = "mpaa") 

Created on 2018-11-13 by the reprex package (v0.2.1)
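The traceback shows the string reaching `dplyr::arrange()` through `!!rlang::enquo(grouping.var)`. When `grouping.var = "mpaa"`, the captured expression is the constant string rather than a column, which is consistent with the "incorrect size (1)" error. A minimal sketch of one way to accept both input styles is `rlang::ensym()`, which converts either a bare name or a string into a symbol. The helper `group_levels()` is hypothetical and only illustrates the idea; it is not the fix that was applied in the package.

```r
# hypothetical helper (not part of ggstatsplot): works with bare or quoted names
group_levels <- function(data, grouping.var) {
  # ensym() accepts a bare name or a string and always returns a symbol
  grouping.var <- rlang::ensym(grouping.var)

  # look the column up by its name and return the distinct values
  unique(as.character(data[[rlang::as_string(grouping.var)]]))
}

# both calls return the same result
group_levels(iris, Species)
group_levels(iris, "Species")
```

Inside a dplyr pipeline the same symbol could be unquoted with `!!`, so the column is found regardless of how the user typed it.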

goals for 0.0.7

(Goal for release date: last week of December)

To do:

  • Add groupedstats as dependencies and import shared functions from there
  • Go full rlang rather than using short-cuts
  • Refactor code to remove stats::na.omit(). Take a more fine-grained approach to remove NAs only from columns of interest.
  • Add results.subtitle argument to all functions
  • Get ggcoefstats to work with dataframe arguments
  • Showing both 50% and 95% CIs for ggcoefstats (like in Bayesian inference plots: e.g., https://twitter.com/tjmahr/status/1048226472710873089)
  • Add many more tests and get the code coverage to at least 50%
    (currently at 14%: https://github.com/IndrajeetPatil/ggstatsplot/tree/master/tests)
  • Check font size for theme_ggstatsplot function; give user arguments option to change all aspects of the theme?
  • Clean up Rmd using gramr package
  • Change k = 2 for all functions to follow APA guidelines
  • Add Bayes Factors to ggscatterstats, ggpiestats, and ggbetweenstats (anova designs)
  • When there are many levels in a factor, ggpiestats labels can overlap; give the option to have the labels to be either "internal" (current default) or "external" to the slices
  • Add group option for ggscatterstats to support grouped marginals (https://github.com/daattali/ggExtra/blob/master/inst/vignette_files/ggExtra_files/figure-markdown_strict/ggmarginal-grouping-1.png)
  • Add ggplot.function argument to grouped_ variants to make modifications with ggplot2 functions to customize the plot
  • Add pairwise comparisons support for ggbetweenstats
  • Add new function ggdotplotstats for dot plots/charts
  • Change 95% CI to have 95% as a subscript

When marginal=TRUE on ggscatterstats, graph does not display when you run the chunk within an R notebook

This issue appears to be specific to running a chunk within an R notebook.

When I run ggscatterstats within a chunk in an R notebook with marginal=TRUE, I get the error "Warning: This function doesn't return ggplot2 object and is not further modifiable with ggplot2 commands."

Here is my code:

library(ggstatsplot)
ggscatterstats(cars,speed,dist,marginal=TRUE)

I should note I also tried with messages=FALSE. That suppressed the message, but did not render the plot.

And for clarity, here is a screen shot of it not working with marginal, and working without marginal.

image

I love those histograms, so I hope there is a way to do this in notebooks!

Package Installation without .rmd files

Hi - my employer blocks .rmd files and this unfortunately prevents me from being able to install the package; I was wondering if there was a way to install it without these - not sure how this would work, but figured I would ask. Thanks for your time.

this is a snippet of the error: /README.Rmd': Permission denied

Missing images in the documentation?

Hi Indrajeet Patil,

it seems as if the images of the documentation here on GitHub went missing - they won't load (at least using Firefox).

Best,
Jonas

Warn linux users of required linux packages for ggstatsplot installation

Dear Indrajeet Patil,

Thank you so much for your hard work on this package!

I'm using Arch Linux, and was getting errors when trying to install this package, both from CRAN and directly from GitHub.
Then I found out that Linux users need to have OpenGL libraries installed in order to install this package (namely libx11, mesa and the Mesa OpenGL Utility library, glu), because its dependency 'rgl' requires them.
Could you warn Linux users about this in the installation instructions in the README?

Best regards,
João

Preparing for 0.0.6 release

To do:

  • Check font size for theme_ggstatsplot function and if you wanna change it
  • Clean up Rmd using gramr package
  • Change merMod tidiers as soon as broom.mixed is on CRAN.
  • Attempt to reduce size of vignettes to prevent R CMD check from producing a NOTE.
  • Add tests for all the new functions being exported to prepare text subtitles for results.
  • Change how the bf.message is currently implemented.
  • Add warning message to every grouped_ function that they can't be further modified.

ggpiestats not recycling colors

I'm trying to use ggpiestats on a variable that has more than 8 levels. The first 8 levels work fine, but any level beyond the eighth does not get a color assigned. Perhaps that's by design, but it seems dysfunctional. Could it at least recycle colors so the extras are not white? Or am I doing something wrong?

Very, very very simple reprex below

library(ggstatsplot)
mainexample <- rep(c("one","two","three","four","five","six","seven","eight","nine","ten"),5)
mainexample<-as.data.frame(mainexample)
# This shows the problem
ggpiestats(data = mainexample, main = "mainexample")

# I can of course manually force a different palette
ggpiestats(data = mainexample, main = "mainexample", palette = "Set3")

Created on 2018-10-24 by the reprex package (v0.2.1)
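The request above (recycle colors when there are more levels than palette entries) can be sketched in base R. The eight hex codes below are those of ColorBrewer's Dark2 palette; that this was the default palette in ggpiestats at the time is an assumption, and the snippet does not show how the package itself handles the problem.

```r
# eight base colours (ColorBrewer "Dark2" hex codes; assumed default palette)
base_cols <- c(
  "#1B9E77", "#D95F02", "#7570B3", "#E7298A",
  "#66A61E", "#E6AB02", "#A6761D", "#666666"
)
n_levels <- 10

# option 1: recycle the palette, so levels 9 and 10 reuse colours 1 and 2
recycled <- rep_len(base_cols, n_levels)

# option 2: interpolate between the base colours to obtain 10 colours
interpolated <- grDevices::colorRampPalette(base_cols)(n_levels)
```

Recycling keeps the original colors but makes some levels indistinguishable; interpolation keeps every level distinct in principle but drifts away from the original qualitative palette.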

Shapiro-Wilk test

When I try to plot between-group statistics with ggbetweenstats, it fails for samples larger than 5000 because of the Shapiro-Wilk test:

dat <- data.frame(x = c(rep(1,2500),rep(2,2501)), y = rnorm(5001))
ggbetweenstats(data = dat, x = x, y = y)
Warning:  aesthetic `x` was not a factor; converting it to factor
Reference:  Welch's t-test is used as a default. (Delacre, Lakens, & Leys, International Review of Social Psychology, 2017).
Error in stats::shapiro.test(data$y) :
  sample size must be between 3 and 5000

This could be fixed easily by using ks.test(y, "pnorm") instead.
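A small base R sketch of the limitation and of two possible workarounds follows. It is only an illustration, not what the package ended up doing. Note that `ks.test()` with a mean and standard deviation estimated from the same data gives only an approximate p-value, so it is not a drop-in replacement for the Shapiro-Wilk test.

```r
set.seed(123)
y <- stats::rnorm(5001)

# shapiro.test() refuses samples larger than 5000; capture the error message
sw_full <- tryCatch(
  stats::shapiro.test(y),
  error = function(e) conditionMessage(e)
)

# workaround 1: Kolmogorov-Smirnov test against a normal distribution
# (parameters are estimated from the data, so the p-value is approximate)
ks <- stats::ks.test(y, "pnorm", mean = mean(y), sd = stats::sd(y))

# workaround 2: Shapiro-Wilk test on a random subsample of 5000 values
sw_sub <- stats::shapiro.test(sample(y, size = 5000))
```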

standardizing regression coefficients

@ibecav Opening a new issue to discuss how to standardize regression coefficients.

The existing functions that do this:

The latter is the most general in the sense that it can work with any model object.

I think we should do something similar to by_2sd: write a standalone function (maybe with S3 methods) that takes regression model objects and outputs a tidy dataframe with standardized estimates and their confidence intervals (using broom::tidy()/broom.mixed::tidy() in the backend).

I still think this issue should be given a low priority (because there is still the option of rescaling variables, which can alleviate resolution issues) compared to writing tests. I am a bit bummed that all results from the ggcoefstats function with merMod objects are showing incorrect results in the CRAN vignettes due to the broom.mixed bug (bbolker/broom.mixed#30). If there were tests, this would have been caught immediately.
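To make the proposal concrete, here is a minimal base R sketch of such a standalone function for `lm` objects only. The name `standardize_lm()` is hypothetical, and the approach (refitting the model on z-scored numeric variables) is just one of several ways to standardize; it assumes a formula without in-formula transformations such as `log(x)`.

```r
# hypothetical helper: refit an lm on z-scored numeric variables and
# return the standardized estimates with confidence intervals
standardize_lm <- function(model, conf.level = 0.95) {
  # model frame holds the response and predictors used in the original fit
  mf <- stats::model.frame(model)

  # z-score every numeric column (factors are left untouched)
  num <- vapply(mf, is.numeric, logical(1))
  mf[num] <- lapply(mf[num], function(v) as.numeric(scale(v)))

  # refit with the same formula on the standardized data
  refit <- stats::lm(stats::formula(model), data = mf)
  ci <- stats::confint(refit, level = conf.level)

  data.frame(
    term = names(stats::coef(refit)),
    estimate = unname(stats::coef(refit)),
    conf.low = unname(ci[, 1]),
    conf.high = unname(ci[, 2]),
    row.names = NULL
  )
}

std_coefs <- standardize_lm(stats::lm(mpg ~ hp + wt, data = mtcars))
std_coefs
```

An S3 generic with methods per model class would follow the same pattern, with each method deciding how to obtain the standardized refit.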

Problem installing ggstatsplot - NON ZERO EXIT status

Hi,

As prescribed,
tried installing ggstatsplot from CRAN
but get the error message (below).
Same error
if I try to install from GITHUB ("If you have time" install code)

ERROR messages at the end of install...

  • installing source package ‘rgl’ ...
    ** package ‘rgl’ successfully unpacked and MD5 sums checked
    checking for gcc... gcc -std=gnu99
    checking whether the C compiler works... yes
    checking for C compiler default output file name... a.out
    checking for suffix of executables...
    checking whether we are cross compiling... no
    checking for suffix of object files... o
    checking whether we are using the GNU C compiler... yes
    checking whether gcc -std=gnu99 accepts -g... yes
    checking for gcc -std=gnu99 option to accept ISO C89... none needed
    checking how to run the C preprocessor... gcc -std=gnu99 -E
    checking for gcc... (cached) gcc -std=gnu99
    checking whether we are using the GNU C compiler... (cached) yes
    checking whether gcc -std=gnu99 accepts -g... (cached) yes
    checking for gcc -std=gnu99 option to accept ISO C89... (cached) none needed
    checking for libpng-config... yes
    configure: using libpng-config
    configure: using libpng dynamic linkage
    checking for X... libraries , headers
    checking GL/gl.h usability... no
    checking GL/gl.h presence... no
    checking for GL/gl.h... no
    checking GL/glu.h usability... no
    checking GL/glu.h presence... no
    checking for GL/glu.h... no
    configure: error: missing required header GL/gl.h
    ERROR: configuration failed for package ‘rgl’
  • removing ‘/home/ray/R/i686-pc-linux-gnu-library/3.5/rgl’
    Error in i.p(...) :
    (converted from warning) installation of package ‘rgl’ had non-zero exit status

And so, ggstatsplot is not installed...
Help!
SFd99
San Francisco

Using the latest RStudio with R 3.5.1 on Ubuntu Linux 14.04 (32-bit).
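The configure log above stops at the missing `GL/gl.h` and `GL/glu.h` headers. A quick check from within R, before attempting to build 'rgl' from source, might look like the sketch below. The header directory `/usr/include/GL` and the Debian/Ubuntu package names in the message are assumptions about a typical setup, not something confirmed in this thread.

```r
# header locations are an assumption (typical for Debian/Ubuntu systems)
gl_headers <- file.path("/usr/include/GL", c("gl.h", "glu.h"))
missing_headers <- gl_headers[!file.exists(gl_headers)]

if (length(missing_headers) > 0) {
  message(
    "Missing OpenGL headers: ", paste(missing_headers, collapse = ", "), "\n",
    "On Debian/Ubuntu they are usually provided by the system packages ",
    "libgl1-mesa-dev and libglu1-mesa-dev."
  )
} else {
  message("OpenGL headers found; the 'rgl' configure step should get past this check.")
}
```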

Loading additional palettes doesn't seem to work

I'm trying to load additional palettes in to help me plot a variable with more than 8 categories. However, when I do that, I get the following error:

Error in ggstatsplot::ggbetweenstats(data = sums_inc, x = AnnualIncome, :
unused arguments (ggstatsplot.layer = FALSE, package = "wesanderson")

Any advice/insights on what to do?

ggstatsplot::ggbetweenstats(
  data = sums_inc,
  x = AnnualIncome,
  y = negative_affect.z,
  mean.plotting = F,
  mean.label.size = 3.5,
  k = 2,
  xlab = "Annual Income",
  ylab = "Negative Affect",
  title = "Income and Negative Affect",
  plot.type = "boxviolin",
  type = F,
  ggtheme = ggthemes::theme_fivethirtyeight(),
  messages = FALSE,
  ggstatsplot.layer = FALSE,
  package = "wesanderson",
  palette = "Darjeeling1"
)

Factor Re-leveling supported?

Hi - thank you so much for this package, it's my absolute favorite thing. I'm not sure if this is an issue so I apologize if I'm posting in the wrong place, but I was wondering if there's a way to re-order levels so that each grouping would be sorted in the same order; see below - is it possible to re-sort the 'Sepal.Width' grouping in ascending order so it looks like the other groups or is each group sorted independently by design? Thank you!

image

# `melt()` is assumed to come from the reshape2 package
melt <- reshape2::melt(iris)

grouped_ggbetweenstats(melt, Species, value, grouping.var = variable, messages = F)
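In plain R, the display order of groups is normally controlled through the factor levels. The sketch below shows two ways to set them for the iris data; whether `grouped_ggbetweenstats()` keeps that order in every panel is not confirmed in this thread, so treat it as a starting point.

```r
# order Species by ascending mean Sepal.Width
species_by_width <- stats::reorder(iris$Species, iris$Sepal.Width, FUN = mean)
levels(species_by_width)
#> [1] "versicolor" "virginica"  "setosa"

# or impose one explicit order to be shared by every grouping
species_fixed <- factor(
  iris$Species,
  levels = c("setosa", "versicolor", "virginica")
)
levels(species_fixed)
#> [1] "setosa"     "versicolor" "virginica"
```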

Bug in gghistostats

Hi,

Love this package. Forked to learn and study more. Encountered a bug in gghistostats that generates:

Error in as.list.environment(x, all.names = TRUE) :
object 'len' not found

This happens for what should be identical behavior between the vector (non-dataframe) and dataframe formats. I tried both the CRAN and GitHub versions of the code. I'll put some sleuthing into it, but I only just started looking at your code, which is quite complex.

Thank you. Chuck

library(ggstatsplot)
# Minimum reproducible example
gghistostats(
  x = ToothGrowth$len,
  xlab = "Tooth length",
  bar.measure = "mix"
)
gghistostats(
  data = ToothGrowth,
  x = len, 
  xlab = "Tooth length",
  bar.measure = "mix"
)
gghistostats(
  data = ToothGrowth,
  x = len, 
  xlab = "Tooth length"
)

color control

Terrific package.
I'm using ggstatsplot::ggbetweenstats and couldn't find any function that might control for the fill color of the points that are created in the violin plot that is produced.
Similar to how you have xfill and yfill arguments for the ggscatterstats script, is there an option within ggbetweenstats that could function where you pass in a vector of colors associated with each group?
I'd like to use a 2 hue, 2 color system where light/dark represents one variable, and the other color (say yellow/purple) represents a different variable.
In your between stats example combining plots, I can get half way there because the same scheme is used for each subplot, but I don't want to split my 4 groups apart.

Here's the plot I have so far:

image

What I'd like is to have something that allows me to substitute those four colors with the four hex-code specified colors I want, which would ultimately produce something where the first two groups are yellow (light yellow, dark yellow) and the last two groups are purple (light purple, dark purple).

Apologies if this already exists and I can't find it in the function descriptions!

Thanks
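The two-hue, light/dark scheme described above can be sketched with plain ggplot2 and a named vector of hex codes. The group names, hex codes, and simulated data are all hypothetical; whether adding a manual scale to the object returned by ggbetweenstats works the same way is an assumption that would need to be checked.

```r
library(ggplot2)

# four hypothetical groups: two hues, each in a light and a dark shade
group_colors <- c(
  a_light = "#FFE082",
  a_dark  = "#FFA000",
  b_light = "#CE93D8",
  b_dark  = "#6A1B9A"
)

# simulated data, only to have something to plot
set.seed(123)
df <- data.frame(
  group = factor(rep(names(group_colors), each = 25), levels = names(group_colors)),
  value = stats::rnorm(100)
)

# the named vector maps each factor level to its own colour
p <- ggplot(df, aes(x = group, y = value, color = group)) +
  geom_violin() +
  geom_jitter(width = 0.1) +
  scale_color_manual(values = group_colors)
```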

user question

require("foreign")
#> Loading required package: foreign
library(foreign)
## import and descriptives
aggression <-
  read.spss(
    "http://www.people.fas.harvard.edu/~mair/datasets/aggression.sav",
    to.data.frame = TRUE,
    use.value.labels = FALSE
  )
#> re-encoding from CP1252
colnames(aggression) <-
  c("car", "sex", "age", "frequency", "duration", "honk")
aggression[, 1] <-
  factor(aggression[, 1], labels = c("BMW", "Ford KA"))
aggression[, 2] <-
  factor(aggression[, 2], labels = c("male", "female"))
head(aggression)
#>   car  sex age frequency duration honk
#> 1 BMW male   7         5        2    8
#> 2 BMW male   8         1        4    1
#> 3 BMW male  NA         3        1    4
#> 4 BMW male   6         1        7    1
#> 5 BMW male   6         0       NA    0
#> 6 BMW male   6         1        9    3
dim(aggression)
#> [1] 127   6

## DV: honking duration; Factor: car (BMW vs. Ford)
# hist(aggression$duration)

using ggstatsplot:

ggstatsplot::ggbetweenstats(
  data = aggression,
  x = car,
  y = duration,
  messages = FALSE
)

ggstatsplot::ggbetweenstats(
  data = na.omit(aggression),
  x = car,
  y = duration,
  messages = FALSE
) 

Created on 2018-09-24 by the reprex package (v0.2.1)

changing `k`/`digits` argument doesn't make any difference to `ggcorrmat` plot correlations

This should display correlations with 3 digits after the decimal point.

# for reproducibility
set.seed(123)

# as a default this function outputs a correlogram plot
ggstatsplot::ggcorrmat(
  data = ggplot2::msleep,
  corr.method = "robust",                    # correlation method
  sig.level = 0.001,                         # threshold of significance
  p.adjust.method = "holm",                  # p-value adjustment method for multiple comparisons
  cor.vars = c(sleep_rem, awake:bodywt),     # a range of variables can be selected  
  cor.vars.names = c("REM sleep",            # variable names
                     "time awake", 
                     "brain weight", 
                     "body weight"), 
  matrix.type = "upper",                     # type of visualization matrix
  digits = 3,                                # no. of digits after decimal point
  colors = c("#B2182B", "white", "#4D4D4D"), 
  title = "Correlalogram for mammals sleep dataset",
  subtitle = "sleep units: hours; weight units: kilograms"
)

Created on 2018-11-21 by the reprex package (v0.2.1)

Treatment of NAs in gghistostats with regard to central tendency measures

In gghistostats, if you give a vector with NAs, it will still plot the histogram, but it won't draw the mean/median. It might be worth making what the function accepts for plotting consistent with what it displays central tendency for.

(As noted on twitter, "The na.rm = TRUE arguments are indeed missing for the geom_line() function in gghistostats and that's why it's behaving this way.")
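The underlying R behavior is easy to reproduce: summary functions return `NA` as soon as the input contains a missing value, unless `na.rm = TRUE` is passed. This is general base R behavior, shown here on a toy vector, and not the package's internal code.

```r
x <- c(2, 4, 6, NA)

# without na.rm the result is NA, so there is nothing to draw
mean(x)
#> [1] NA

# with na.rm = TRUE the missing value is dropped first
mean(x, na.rm = TRUE)
#> [1] 4
stats::median(x, na.rm = TRUE)
#> [1] 4
```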

Bug in `ggbetweenstats` `var.equal`

Just formalizing a bug I noted earlier. I tested a lot of permutations, and it appears that it is var.equal that breaks the tibble of mean differences that is output. As a side effect, it actually makes the resultant plots inaccurate as well.

library(ggstatsplot)
# works
ggstatsplot::ggbetweenstats(
  data = movies_long,
  x = mpaa,
  y = rating,
  pairwise.comparisons = TRUE
)
#> Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.
#> 
#> # tibble [3 × 11]
#>   group1 group2 mean.difference conf.low conf.high    se t.value    df
#>   <chr>  <chr>            <dbl>    <dbl>     <dbl> <dbl>   <dbl> <dbl>
#> 1 R      PG-13           -0.219   -0.375    -0.064 0.047   3.31  1142.
#> 2 R      PG              -0.323   -0.573    -0.074 0.075   3.05   277.
#> 3 PG-13  PG              -0.104   -0.362     0.154 0.077   0.952  309.
#> # ... with 3 more variables: p.value <dbl>, significance <chr>,
#> #   p.value.label <chr>
#> Note: Shapiro-Wilk Normality Test for rating : p-value = < 0.001
#> 
#> Note: Bartlett's test for homogeneity of variances for factor mpaa : p-value = 0.004
#> 

# gives incorrect effect size tibble
ggstatsplot::ggbetweenstats(
  data = movies_long,
  x = mpaa,
  y = rating,
  pairwise.comparisons = TRUE,
  var.equal = TRUE
)
#> Note: 95% CI for effect size estimate was computed with 100 bootstrap samples.
#> 
#> Warning: Expected 2 pieces. Additional pieces discarded in 2 rows [1, 3].
#> # tibble [5 × 8]
#>   group1 group2 mean.difference conf.low conf.high  p.value significance
#>   <chr>  <chr>            <dbl>    <dbl>     <dbl>    <dbl> <chr>       
#> 1 PG     13               0.104  -0.140      0.348 NA       <NA>        
#> 2 R      PG               0.323   0.0944     0.552  0.00283 **          
#> 3 R      PG               0.219   0.0570     0.381  0.00283 **          
#> 4 PG-13  PG              NA      NA         NA      0.316   ns          
#> 5 R      PG-13           NA      NA         NA      0.00310 **          
#> # ... with 1 more variable: p.value.label <chr>
#> Note: Shapiro-Wilk Normality Test for rating : p-value = < 0.001
#> 
#> Note: Bartlett's test for homogeneity of variances for factor mpaa : p-value = 0.004
#> 

Created on 2018-12-14 by the reprex package (v0.2.1)

making `ggscatterstats` consistent with the overall API

Since #38 was solved, ggscatterstats arguments label.var and label.expression work only with characters but not bare expressions. This is inconsistent with the API principles for the rest of the functions in the package. It would be nice if this function's arguments behaved the same way as in other functions and accepted both bare expressions and characters.

# setup
set.seed(123)
library(ggstatsplot)
library(ggplot2)

# works
ggscatterstats(
  msleep,
  brainwt,
  sleep_total,
  marginal = FALSE,
  label.var = "genus",
  label.expression = "brainwt > 3"
)

# doesn't work
ggscatterstats(
  msleep,
  brainwt,
  sleep_total,
  marginal = FALSE,
  label.var = genus,
  label.expression = brainwt > 3
)
#> Error in parse_exprs(x): object 'brainwt' not found

Created on 2018-12-14 by the reprex package (v0.2.1)
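For an argument like label.expression, which holds a whole condition rather than a column name, one way to support both input styles is to capture the argument, parse it only when it is a string, and then evaluate it against the data. The helper `keep_rows()` below is hypothetical and only sketches that idea with rlang; it does not handle a string stored in a variable.

```r
# hypothetical helper: accept a condition as a bare expression or as a string
keep_rows <- function(data, condition) {
  # capture what the user typed without evaluating it
  condition <- rlang::enexpr(condition)

  # a string literal such as "mpg > 30" is parsed into an expression
  if (is.character(condition)) {
    condition <- rlang::parse_expr(condition)
  }

  # evaluate the expression with the data frame columns in scope
  keep <- rlang::eval_tidy(condition, data = data)
  data[which(keep), , drop = FALSE]
}

# both calls select the same rows
keep_rows(mtcars, mpg > 30)
keep_rows(mtcars, "mpg > 30")
```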

Adaptation to new `effsize` version

The new version of the effsize package breaks a test that was designed to adapt to a known bug.

The test output with version 0.7.4 (available on https://github.com/mtorchiano/effsize)

1. Failure: parametric t-test works (between-subjects without NAs) (@test_subtitle_t_parametric.R#63) 

This is a possible correction to the test case:

testthat::test_that(
  desc = "parametric t-test works (between-subjects without NAs)",
  code = {

    # ggstatsplot output
    set.seed(123)
    using_function1 <-
      suppressWarnings(
        ggstatsplot::subtitle_t_parametric(
          data = dplyr::filter(
            ggstatsplot::movies_long,
            genre == "Action" | genre == "Drama"
          ),
          x = genre,
          y = rating,
          effsize.type = "d",
          effsize.noncentral = TRUE,
          var.equal = TRUE,
          conf.level = .99,
          k = 5,
          messages = FALSE
        )
      )

    # expected output
    # this test will have to be changed with the next release of `effsize`
    # d here should be negative but is displayed as positive
    # this is a bug in effsize and has been fixed in the development version
    # (https://github.com/mtorchiano/effsize/commit/3561d93f9e9f5a61b3460ba120b316f7e4c3352f)
    set.seed(123)
    results1 <-
      ggplot2::expr(
        paste(
          italic("t"),
          "(",
          "1317.00000",
          ") = ",
          "-9.46816",
          ", ",
          italic("p"),
          " = ",
          "< 0.001",
          ", ",
          italic("d"),
          " = ",
          #"0.51775",
          "-0.56364",  ## FIX
          ", CI"["99%"],
          " [",
          #"0.36213",
          "-0.71947",  ## FIX
          ", ",
          #"0.67319",
          "-0.40762",  ## FIX
          "]",
          ", ",
          italic("n"),
          " = ",
          1319L
        )
      )

    
    # testing overall call
    testthat::expect_equal(using_function1, results1)
  }
)

displaying outlier labels properly in case `plot.type = "violin"`

Since geom_violin(), unlike geom_boxplot(), doesn't have outlier point highlighting, figure out a way to better display the outliers.

set.seed(123)

# plot
ggstatsplot::ggbetweenstats(
  data = ToothGrowth,
  x = supp,
  y = len,
  plot.type = "violin",
  messages = FALSE,
  results.subtitle = FALSE,
  outlier.tagging = TRUE,
  outlier.coef = 0.75,
  mean.plotting = FALSE,
  sample.size.label = FALSE
)

Created on 2018-12-11 by the reprex package (v0.2.1)
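Because a violin layer does not mark outliers itself, the points to tag have to be computed separately. A base R sketch using `grDevices::boxplot.stats()` per group is shown below; it assumes that `outlier.coef` plays the same role as the `coef` argument there (a multiplier of the interquartile range), which is not confirmed in this thread.

```r
# values beyond coef * IQR from the hinges, computed separately per group
outliers_by_group <- lapply(
  split(ToothGrowth$len, ToothGrowth$supp),
  function(v) grDevices::boxplot.stats(v, coef = 0.75)$out
)

# a named list with one numeric vector of outlying values per supplement
outliers_by_group
```

These values could then be drawn as an extra point or label layer on top of the violins.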

`ggcoefstats` displays incorrect labels for anova + partial omega-squared combo

All p-values are below 0.05 but confidence intervals contain 0. This traces back to sjstats::omega_sq() (strengejacke/sjstats#51).

# for reproducibility
set.seed(123)
library(ggstatsplot)

# to speed up the calculation, let's use only 10% of the data
movies_10 <-
  dplyr::sample_frac(tbl = ggstatsplot::movies_long, size = 0.1)

# `aov` object
stats.object <- stats::aov(formula = rating ~ mpaa * genre,
                           data = movies_10)

# plot
ggstatsplot::ggcoefstats(x = stats.object, effsize = "omega")

Created on 2018-11-17 by the reprex package (v0.2.1)

Using ggcoefstats to display a tbl_df containing the output of a brms model

Hi,

I tried displaying the output of a Bayesian regression model using ggcoefstats but I just can't manage to get it to work. I don't have reproducible code as it basically involves one function (ggcoefstats()) and one line of code, but here's my workflow, described in words:

  1. After fitting my model using brms() and saving the output as an .RDS file, I load the .RDS file and convert it to a tidied tibble using broom.mixed::tidy(), which supports brms;

  2. I then try displaying the tibble using ggcoefstats, but I get an error without any description. To be more precise, I'm trying ggstatsplot::ggcoefstats(x = RT_model_tidy), without any test statistic specification. Note that the tibble has all the appropriate columns, as required (i.e., term and estimate).

Thanks in advance for any tips!

bug in pairwise_p for Student's t test comparisons

set.seed(123)
library(ggstatsplot)

# works properly (with the defaults)
pairwise_p(movies_wide,
           mpaa,
           rating,
           var.equal = FALSE)
#> Note: The parametric pairwise multiple comparisons test used-
#>  Games-Howell test.
#>  Adjustment method for p-values: holm
#> 
#> # A tibble: 3 x 11
#>   group1 group2 mean.difference conf.low conf.high    se t.value    df
#>   <chr>  <chr>            <dbl>    <dbl>     <dbl> <dbl>   <dbl> <dbl>
#> 1 PG-13  R                0.219    0.064     0.375 0.047   3.31  1142.
#> 2 PG-13  PG              -0.104   -0.362     0.154 0.077   0.952  309.
#> 3 R      PG              -0.323   -0.573    -0.074 0.075   3.05   277.
#> # ... with 3 more variables: p.value <dbl>, significance <chr>,
#> #   p.value.label <chr>

# doesn't work properly
pairwise_p(movies_wide,
           mpaa,
           rating,
           var.equal = TRUE)
#> Warning: Expected 2 pieces. Additional pieces discarded in 2 rows [1, 3].
#> Note: The parametric pairwise multiple comparisons test used-
#>  Student's t-test.
#>  Adjustment method for p-values: holm
#> 
#> # A tibble: 5 x 8
#>   group1 group2 mean.difference conf.low conf.high  p.value significance
#>   <chr>  <chr>            <dbl>    <dbl>     <dbl>    <dbl> <chr>       
#> 1 PG     13               0.104  -0.140      0.348 NA       <NA>        
#> 2 R      PG               0.323   0.0944     0.552  0.00283 **          
#> 3 R      PG               0.219   0.0570     0.381  0.00283 **          
#> 4 PG-13  PG              NA      NA         NA      0.316   ns          
#> 5 R      PG-13           NA      NA         NA      0.00310 **          
#> # ... with 1 more variable: p.value.label <chr>

Created on 2018-12-14 by the reprex package (v0.2.1)

This issue is specific to dataframes where x factor levels have a - in their name, which messes with the following code (esp. L331-335):

if (isTRUE(var.equal) || isTRUE(paired)) {
  df <-
    dplyr::full_join(
      # mean difference and its confidence intervals
      x = stats::aov(formula = y ~ x, data = data) %>%
        stats::TukeyHSD(x = .) %>%
        broom::tidy(x = .) %>%
        dplyr::select(
          .data = .,
          comparison, estimate, conf.low, conf.high
        ) %>%
        tidyr::separate(
          data = .,
          col = comparison,
          into = c("group1", "group2"),
          sep = "-"
        ) %>%
        dplyr::rename(.data = ., mean.difference = estimate),
      y = broom::tidy(
        stats::pairwise.t.test(
          x = data$y,
          g = data$x,
          p.adjust.method = p.adjust.method,
          paired = paired,
          alternative = "two.sided",
          na.action = na.omit
        )
      ) %>%
        ggstatsplot::signif_column(data = ., p = p.value),
      by = c("group1", "group2")
    )
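The failure mode can be reproduced without any model fitting: splitting a comparison label such as "PG-13-PG" on "-" yields three pieces, so the level "PG-13" is torn apart. The base R sketch below shows the problem and one possible way around it, which is to build the labels from the known factor levels and look them up instead of splitting. The level and label vectors are written out by hand for illustration, and this is not necessarily the fix adopted in the package.

```r
lv <- c("PG", "PG-13", "R")
comparison <- c("PG-13-PG", "R-PG", "R-PG-13")

# naive split on "-": the first and third labels produce three pieces
strsplit(comparison, split = "-", fixed = TRUE)

# alternative: enumerate all ordered pairs of levels and match the labels
pairs <- expand.grid(group1 = lv, group2 = lv, stringsAsFactors = FALSE)
pairs <- pairs[pairs$group1 != pairs$group2, ]
pairs$comparison <- paste(pairs$group1, pairs$group2, sep = "-")

parsed <- pairs[match(comparison, pairs$comparison), c("group1", "group2")]
rownames(parsed) <- NULL
parsed
```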

Is it possible to change color palette for ggscatterstats?

I notice that a custom color palette can be passed to the plot via the arguments package=xxx, palette=xxx.

This works in most functions of ggstatsplot, but it seems to be impossible to pass these arguments to ggscatterstats.

Here is the error message:

Error in ggscatterstats(., , : unused argument (package = "ggsci")

Bug in lm_effsize_ci

There's a bug in lm_effsize_ci that is limited to the case where your input object is of type anova and you specify partial = FALSE. Note that partial = TRUE works.

Reprex

library(ggstatsplot)
# works as it should
ggstatsplot:::lm_effsize_ci((lm(mpg ~ hp * wt, mtcars)), effsize = "eta", partial = FALSE)
#> # A tibble: 3 x 8
#>   term  F.value   df1   df2  p.value  etasq conf.low conf.high
#>   <chr>   <dbl> <int> <int>    <dbl>  <dbl>    <dbl>     <dbl>
#> 1 hp      146.      1    28 1.23e-12 0.602   0.472       0.772
#> 2 wt       54.5     1    28 4.86e- 8 0.224   0.0666      0.357
#> 3 hp:wt    14.1     1    28 8.11e- 4 0.0580 -0.00395     0.123
# works as it should
ggstatsplot:::lm_effsize_ci((aov(mpg ~ hp * wt, mtcars)), effsize = "eta", partial = FALSE)
#> # A tibble: 3 x 8
#>   term  F.value   df1   df2  p.value  etasq conf.low conf.high
#>   <chr>   <dbl> <dbl> <dbl>    <dbl>  <dbl>    <dbl>     <dbl>
#> 1 hp      146.      1    28 1.23e-12 0.602   0.475       0.763
#> 2 wt       54.5     1    28 4.86e- 8 0.224   0.0734      0.363
#> 3 hp:wt    14.1     1    28 8.11e- 4 0.0580 -0.00767     0.118
# fails even though anova and aov are just different aspect of the same analysis
ggstatsplot:::lm_effsize_ci(anova(aov(mpg ~ hp * wt, mtcars)), effsize = "eta", partial = FALSE)
#> Warning in anova.lm(model): ANOVA F-tests on an essentially perfect fit are
#> unreliable

#> (the same warning is repeated 5 more times)
#> Error in mutate_impl(.data, dots): Evaluation error: Result 2 is not a length 1 atomic vector.
# fails for lm as well
ggstatsplot:::lm_effsize_ci(anova(lm(mpg ~ hp * wt, mtcars)), effsize = "eta", partial = FALSE)
#> Warning in anova.lm(model): ANOVA F-tests on an essentially perfect fit are
#> unreliable

#> (the same warning is repeated 32 more times here)

#> Warning in anova.lm(model): ANOVA F-tests on an essentially perfect fit are
#> unreliable

#> Warning in anova.lm(model): ANOVA F-tests on an essentially perfect fit are
#> unreliable

#> Warning in anova.lm(model): ANOVA F-tests on an essentially perfect fit are
#> unreliable

#> Warning in anova.lm(model): ANOVA F-tests on an essentially perfect fit are
#> unreliable

#> Warning in anova.lm(model): ANOVA F-tests on an essentially perfect fit are
#> unreliable

#> Warning in anova.lm(model): ANOVA F-tests on an essentially perfect fit are
#> unreliable

#> Warning in anova.lm(model): ANOVA F-tests on an essentially perfect fit are
#> unreliable

#> Warning in anova.lm(model): ANOVA F-tests on an essentially perfect fit are
#> unreliable

#> Warning in anova.lm(model): ANOVA F-tests on an essentially perfect fit are
#> unreliable

#> Warning in anova.lm(model): ANOVA F-tests on an essentially perfect fit are
#> unreliable
#> Error in mutate_impl(.data, dots): Evaluation error: Result 2 is not a length 1 atomic vector.
# but the next four calls work: passing the lm object directly (even with partial = FALSE), or dropping partial = FALSE
ggstatsplot:::lm_effsize_ci((lm(mpg ~ hp * wt, mtcars)), effsize = "eta", partial = FALSE)
#> # A tibble: 3 x 8
#>   term  F.value   df1   df2  p.value  etasq conf.low conf.high
#>   <chr>   <dbl> <int> <int>    <dbl>  <dbl>    <dbl>     <dbl>
#> 1 hp      146.      1    28 1.23e-12 0.602   0.473       0.755
#> 2 wt       54.5     1    28 4.86e- 8 0.224   0.0793      0.365
#> 3 hp:wt    14.1     1    28 8.11e- 4 0.0580 -0.00410     0.115
ggstatsplot:::lm_effsize_ci(anova(aov(mpg ~ hp * wt, mtcars)), effsize = "eta")
#> # A tibble: 3 x 8
#>   term  F.value   df1   df2  p.value partial.etasq conf.low conf.high
#>   <chr>   <dbl> <int> <int>    <dbl>         <dbl>    <dbl>     <dbl>
#> 1 hp      146.      1    28 1.23e-12         0.839   0.699      0.892
#> 2 wt       54.5     1    28 4.86e- 8         0.661   0.414      0.773
#> 3 hp:wt    14.1     1    28 8.11e- 4         0.335   0.0729     0.539
ggstatsplot:::lm_effsize_ci(anova(aov(mpg ~ hp * wt, mtcars)), effsize = "eta", conf.level = .99)
#> # A tibble: 3 x 8
#>   term  F.value   df1   df2  p.value partial.etasq conf.low conf.high
#>   <chr>   <dbl> <int> <int>    <dbl>         <dbl>    <dbl>     <dbl>
#> 1 hp      146.      1    28 1.23e-12         0.839   0.638      0.906
#> 2 wt       54.5     1    28 4.86e- 8         0.661   0.322      0.801
#> 3 hp:wt    14.1     1    28 8.11e- 4         0.335   0.0236     0.593
ggstatsplot:::lm_effsize_ci(anova(aov(mpg ~ hp * wt, mtcars)), effsize = "eta", conf.level = .99, nboot = 100)
#> # A tibble: 3 x 8
#>   term  F.value   df1   df2  p.value partial.etasq conf.low conf.high
#>   <chr>   <dbl> <int> <int>    <dbl>         <dbl>    <dbl>     <dbl>
#> 1 hp      146.      1    28 1.23e-12         0.839   0.638      0.906
#> 2 wt       54.5     1    28 4.86e- 8         0.661   0.322      0.801
#> 3 hp:wt    14.1     1    28 8.11e- 4         0.335   0.0236     0.593


Created on 2018-10-04 by the reprex package (v0.2.1)
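
For reference, the two effect sizes in the tables above can be recomputed from the sums of squares of the ANOVA table using base R alone. This is only a minimal sketch of the textbook definitions (eta-squared = SS_effect / SS_total; partial eta-squared = SS_effect / (SS_effect + SS_residual)), not the internals of lm_effsize_ci; the variable names are illustrative.

```r
# sequential (type I) ANOVA table for the same model used in the reprex
tab <- anova(lm(mpg ~ hp * wt, data = mtcars))

# named vector of sums of squares, one entry per row of the table
ss <- tab[["Sum Sq"]]
names(ss) <- rownames(tab)
ss_resid <- ss[["Residuals"]]
ss_effect <- ss[names(ss) != "Residuals"]

# eta-squared: share of the total sum of squares
eta_sq <- ss_effect / sum(ss)

# partial eta-squared: effect relative to effect + residual
partial_eta_sq <- ss_effect / (ss_effect + ss_resid)

round(cbind(eta_sq, partial_eta_sq), 3)
```

The values should agree with the etasq and partial.etasq columns printed above, up to rounding.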

unable to further modify plots w/ ggplot2 syntax

Love the package, thanks I!

However, I'm not having any luck modifying a grouped_ggbetweenstats plot by adding further ggplot2 code with +; see the example below. I'm an intermediate coder at best, so I may be missing something obvious. In any case, it works with the ungrouped version, but the addition doesn't seem to be passed on to the two plots in the grouped version. Thanks for all your work on this!


Brief description of the problem

# packages used below
library(ggstatsplot)
library(ggplot2)

# generate some data (all three columns have 100 rows)
d1 <- data.frame(target=rep(c('boy','girl'),50),
                 rating=rnorm(100),
                 gender=rep(c('boy','boy','girl','girl'),25))

# these work as expected

ggbetweenstats(data=d1,
               x=target,y=rating) +
  scale_y_continuous(breaks=seq(-3,3,.5))

ggbetweenstats(data=d1,
               x=target,y=rating) +
  ggplot2::labs(subtitle = NULL)

# but these have no effect in the grouped_ versions

grouped_ggbetweenstats(data=d1,
                       x=target,y=rating, grouping.var = gender,
                       title.prefix = 'Participant Gender') + ggplot2::labs(subtitle = NULL)

grouped_ggbetweenstats(data=d1,
                       x=target,y=rating, grouping.var = gender,
                       title.prefix = 'Participant Gender') +
  scale_y_continuous(breaks=seq(-3,3,.5))
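
A possible workaround, sketched under the assumption that the grouped_ functions return an already-combined grid rather than a single ggplot object (which would explain why + has no effect): build one ggbetweenstats plot per level of the grouping variable, modify each one with ggplot2 functions, and only then combine them with combine_plots. The data are simulated for illustration, and the exact behaviour may differ between package versions.

```r
library(ggplot2)
library(ggstatsplot)

# simulated data for illustration
set.seed(123)
d1 <- data.frame(
  target = rep(c("boy", "girl"), 50),
  rating = rnorm(100),
  gender = rep(c("boy", "boy", "girl", "girl"), 25)
)

# one plot per level of the grouping variable, each modified individually
plot_list <- lapply(split(d1, d1$gender), function(d) {
  ggbetweenstats(
    data = d, x = target, y = rating,
    title = paste("Participant Gender:", d$gender[1])
  ) +
    scale_y_continuous(breaks = seq(-3, 3, 0.5)) +
    labs(subtitle = NULL)
})

# combine the already-modified plots into a single figure
combine_plots(plotlist = plot_list)
```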

Feature request ggpiestats

I'd like to suggest adding to ggpiestats something similar to test.k in gghistostats, i.e., a way to change the number of decimal places shown in the labels on the pie slices. Right now it just displays a rounded integer.
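
As an illustration of the requested behaviour, slice labels with a configurable number of decimal places could be built as follows. This is a base R sketch only; format_perc and its argument k are hypothetical names, not part of ggpiestats.

```r
# hypothetical helper: percentage labels with k decimal places
format_perc <- function(counts, k = 0) {
  perc <- 100 * counts / sum(counts)
  paste0(formatC(perc, format = "f", digits = k), "%")
}

counts <- c(1, 2)
format_perc(counts)         # "33%" "67%" (rounded integers, like the current labels)
format_perc(counts, k = 2)  # "33.33%" "66.67%"
```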

adding support for `gamlss` class objects in `ggcoefstats`

library(gamlss)
#> Loading required package: splines
#> Loading required package: gamlss.data
#> Loading required package: gamlss.dist
#> Loading required package: MASS
#> Loading required package: nlme
#> Loading required package: parallel
#>  **********   GAMLSS Version 5.1-2  **********
#> For more on GAMLSS look at http://www.gamlss.org/
#> Type gamlssNews() to see new features/changes/bug fixes.
library(tidyverse)
library(ggstatsplot)

g <- gamlss(
  y ~ pb(x),
  sigma.fo = ~ pb(x),
  family = BCT,
  data = abdom,
  method = mixed(1, 20)
)
#> GAMLSS-RS iteration 1: Global Deviance = 4771.925 
#> GAMLSS-CG iteration 1: Global Deviance = 4771.013 
#> GAMLSS-CG iteration 2: Global Deviance = 4770.994 
#> GAMLSS-CG iteration 3: Global Deviance = 4770.994

broom::tidy(g, conf.int = TRUE)
#>   parameter        term     estimate   std.error   statistic       p.value
#> 1        mu (Intercept) -64.44299460 1.328921129 -48.4927158 1.889994e-210
#> 2        mu       pb(x)  10.69463541 0.057769202 185.1269371  0.000000e+00
#> 3     sigma (Intercept)  -2.65041283 0.108045909 -24.5304321  8.093605e-93
#> 4     sigma       pb(x)  -0.01002512 0.003784911  -2.6487067  8.290567e-03
#> 5        nu (Intercept)  -0.10715726 0.557434072  -0.1922331  8.476237e-01
#> 6       tau (Intercept)   2.49483399 0.301271895   8.2810047  7.765827e-16
confint(g)
#> Warning in vcov.gamlss(object, robust = robust): Additive terms exists in the  mu formula. 
#>   Standard errors for the linear terms maybe are not appropriate
#> Warning in vcov.gamlss(object, robust = robust): Additive terms exists in the  sigma formula. 
#>   Standard errors for the linear terms maybe are not appropriate
#>                 2.5 %    97.5 %
#> (Intercept) -67.05752 -61.82847
#> pb(x)        10.58121  10.80806

ggcoefstats(x = g)
#> Note: No 95% confidence intervals available for regression coefficients from gamlss object, so skipping whiskers in the plot.
#> 
#> Error in `levels<-`(`*tmp*`, value = as.character(levels)): factor level [2] is duplicated

Created on 2018-10-07 by the reprex package (v0.2.1)
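
One plausible cause (an assumption, not confirmed in this thread) is that the tidied output repeats the same term, e.g. (Intercept), for several distribution parameters, so the terms cannot be used directly as factor levels. Below is a base R sketch of the problem and of one way to make the terms unique, using a hand-copied version of the parameter and term columns shown above.

```r
# parameter/term columns copied from the broom::tidy() output above
tidy_df <- data.frame(
  parameter = c("mu", "mu", "sigma", "sigma", "nu", "tau"),
  term = c("(Intercept)", "pb(x)", "(Intercept)", "pb(x)",
           "(Intercept)", "(Intercept)"),
  stringsAsFactors = FALSE
)

# duplicated terms cannot serve as factor levels; capture the error message
dup_error <- tryCatch(
  factor(tidy_df$term, levels = tidy_df$term),
  error = function(e) conditionMessage(e)
)
dup_error

# prefixing each term with its parameter makes the labels unique
tidy_df$term <- paste(tidy_df$parameter, tidy_df$term, sep = "_")
term_factor <- factor(tidy_df$term, levels = tidy_df$term)
levels(term_factor)
```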

Possible bug in ggscatterstats

label.expression works for single conditions but seems to fail silently for a joint condition. See the reprex below (sorry, for some reason only the final plot renders on my machine, but I assure you the first two work).

library(ggstatsplot)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
# Remove non unique movies for clarity
# Test "length < 60" it works
ggscatterstats(
  data = distinct(movies_long, title, year, .keep_all = TRUE),
  x = length,
  y = budget, 
  label.expression = "length < 60", 
  label.var = "title"
)
#> Warning: This plot can't be further modified with `ggplot2` functions.
#> In case you want a `ggplot` object, set `marginal = FALSE`.
#> 
# Remove non unique movies for clarity
# Test "budget > 150" it works
ggscatterstats(
  data = distinct(movies_long, title, year, .keep_all = TRUE),
  x = length,
  y = budget, 
  label.expression = "budget > 150", 
  label.var = "title"
)
#> Warning: This plot can't be further modified with `ggplot2` functions.
#> In case you want a `ggplot` object, set `marginal = FALSE`.
#> 
# Remove non unique movies for clarity
# Try both and it silently drops the labels
ggscatterstats(
  data = distinct(movies_long, title, year, .keep_all = TRUE),
  x = length,
  y = budget, 
  label.expression = "budget > 150 & length < 60", 
  label.var = "title"
)
#> Warning: This plot can't be further modified with `ggplot2` functions.
#> In case you want a `ggplot` object, set `marginal = FALSE`.
#> 

Created on 2018-12-05 by the reprex package (v0.2.1)
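
Before concluding that the joint condition is being dropped, it may be worth checking how many rows actually satisfy it, since a condition that matches zero rows would also produce a plot without labels. The sketch below uses a small made-up data frame (not movies_long) and evaluates the same kind of condition strings in base R; count_matches is a hypothetical helper.

```r
# made-up data for illustration only
toy <- data.frame(
  title  = c("A", "B", "C", "D"),
  length = c(45, 50, 120, 150),
  budget = c(1, 5, 160, 200)
)

# hypothetical helper: number of rows matching a condition given as a string
count_matches <- function(data, condition) {
  keep <- eval(parse(text = condition), envir = data)
  sum(keep, na.rm = TRUE)
}

count_matches(toy, "length < 60")                 # 2
count_matches(toy, "budget > 150")                # 2
count_matches(toy, "budget > 150 & length < 60")  # 0
```

If the joint condition matches no rows in the real data either, the missing labels would be expected behaviour rather than a bug.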

not shortening names for `ggcorrmat` output when confidence intervals are returned

This stems from the fact that the psych::corr.test function itself produces shortened names when confidence intervals are requested, even if the minlength argument is changed.

# for reproducibility
set.seed(123)

# creating the object
res_df <- psych::corr.test(
    x = dplyr::select(iris, -Species),
    y = NULL,
    use = "pairwise",
    alpha = .05,
    ci = TRUE,
    minlength = 20
  )

# checking confidence intervals
res_df$ci
#>                  lower          r       upper            p
#> Spl.L-Spl.W -0.2726932 -0.1175698  0.04351158 1.518983e-01
#> Spl.L-Ptl.L  0.8270363  0.8717538  0.90550805 1.038667e-47
#> Spl.L-Ptl.W  0.7568971  0.8179411  0.86483606 2.325498e-37
#> Spl.W-Ptl.L -0.5508771 -0.4284401 -0.28794993 4.513314e-08
#> Spl.W-Ptl.W -0.4972130 -0.3661259 -0.21869663 4.073229e-06
#> Ptl.L-Ptl.W  0.9490525  0.9628654  0.97298532 4.675004e-86

Created on 2018-10-31 by the reprex package (v0.2.1)
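
If full variable names are needed, one option is to rebuild the pair labels from the column names and overwrite the abbreviated row names. The sketch below only constructs the labels in base R; it assumes that the rows of res_df$ci follow the pair order shown above, which matches the order produced by combn().

```r
vars <- names(iris)[names(iris) != "Species"]

# all unordered pairs, in the order (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
pairs <- combn(vars, 2)
full_names <- paste(pairs[1, ], pairs[2, ], sep = "-")
full_names

# with the object from the reprex one could then do, for example:
# rownames(res_df$ci) <- full_names
```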

why is this not working?

# needed libraries
library(ggstatsplot)
library(tidyverse)

# data
df <- data.frame(x = c(1:100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)

# looking at the structure
str(df)
#> 'data.frame':    100 obs. of  2 variables:
#>  $ x: int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ y: num  22.57 16.74 8.33 19.36 120.54 ...
colnames(df)
#> [1] "x" "y"

# adding results from correlation
ggscatterstats(df, x, y)
#> Error in filter_impl(.data, quo): Evaluation error: object 'x' not found.

Created on 2018-09-12 by the reprex package (v0.2.0.9000).

Feature request for new function ggbarstats

So I find piecharts very appealing for looking at univariate situations, although there are many who dislike them even when you add labels as you have. But as soon as you move to bivariate cases especially when one of the variables has multiple factor levels, I (and my students more so) have a hard time interpreting multiple pie charts depicting the relationship between two variables.

So, looking at the usual Titanic example, I can imagine a function that is very similar in nature but shifts to percentage bars with labels.

You'll notice all I have really done is take hunks of your current code and change the call to ggplot in one small way...

library(ggstatsplot)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
ggpiestats(Titanic_full, main = Survived, condition = Class, title = "Test")
#> Note: Results from faceted one-sample proportion tests:
#> 
#> # A tibble: 4 x 7
#>   condition No     Yes    `Chi-squared`    df `p-value` significance
#>   <fct>     <chr>  <chr>          <dbl> <dbl>     <dbl> <chr>       
#> 1 1st       37.54% 62.46%         20.2      1     0     ***         
#> 2 2nd       58.60% 41.40%          8.43     1     0.004 **          
#> 3 3rd       74.79% 25.21%        174.       1     0     ***         
#> 4 Crew      76.05% 23.95%        240.       1     0     ***         
#> Note: 95% CI for Cramer's V was computed with 25 bootstrap samples.
#> 

# this is the crucial piece of ggpiestats defunctionalized
Titanic_full %>%
  group_by(Class,Survived) %>%
  summarize(counts = n()) %>% 
  mutate(perc = (counts / sum(counts)))-> tempdf
# only real changes are geom bar and percent y axis
ggplot(tempdf, aes(fill=Survived, y=perc, x=Class)) +
  geom_bar(stat="identity", position="fill") +
  ylab("Percent") +
  scale_y_continuous(labels = scales::percent, breaks = seq(0, 1, by = 0.10)) +
  geom_label(aes(label = paste0(round(x = perc*100, digits = 1), "%")), show.legend = FALSE, position = position_fill(vjust = 0.5)) +
  ggtitle("test", subtitle = subtitle_contigency_tab(Titanic_full,Class,Survived))
#> Note: 95% CI for Cramer's V was computed with 25 bootstrap samples.
#> 

Created on 2018-10-26 by the reprex package (v0.2.1)
Thanks for considering
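
The row-wise percentages behind such a bar chart can be cross-checked in base R. This sketch uses datasets::Titanic (the aggregated table that ships with R) rather than Titanic_full, on the assumption that both describe the same passengers; the survival percentages should then match the ones printed by ggpiestats above.

```r
# collapse the 4-way Titanic table to Class x Survived counts
tab <- margin.table(Titanic, margin = c(1, 4))

# proportions within each class (rows sum to 1)
perc <- prop.table(tab, margin = 1)
round(100 * perc, 2)
```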

purrr::pmap not working with ggscatterstats when expression is used

The expression is not evaluated properly and so there are 0 rows in label_data and geom_label_repel fails.

library(tidyverse)

# for reproducibility
set.seed(123)

# let's split the dataframe and create a list by mpaa rating
mpaa_list <- ggstatsplot::movies_wide %>%
  base::split(x = ., f = .$mpaa, drop = TRUE)

# this created a list with 4 elements, one for each mpaa rating
# you can check the structure of the file for yourself
# str(mpaa_list)

# checking the length and names of each element
length(mpaa_list)
#> [1] 4
names(mpaa_list)
#> [1] "NC-17" "PG"    "PG-13" "R"

# running the function on every element of this list; note that if you want the
# same value for a given argument across all elements of the list, you need to
# specify it just once
plot_list1 <- purrr::pmap(
  .l = list(
    data = mpaa_list,
    x = "budget",
    y = "rating",
    xlab = "Budget (in millions of US dollars)",
    ylab = "Rating on IMDB",
    title = list(
      "MPAA Rating: NC-17",
      "MPAA Rating: PG",
      "MPAA Rating: PG-13",
      "MPAA Rating: R"
    ),
    label.var = list("title", "year", "votes", "genre"),
    label.expression = list(
      ("rating" > 8.5 &
        "budget" < 50),
      ("rating" > 8.5 &
        "budget" < 100),
      ("rating" > 8 & "budget" < 50),
      ("rating" > 9 & "budget" < 10)
    ),
    type = list("r", "np", "p", "np"),
    method = list(MASS::rlm, "lm", "lm", "lm"),
    marginal.type = list("histogram", "boxplot", "density", "violin"),
    centrality.para = "mean",
    xfill = list("#56B4E9", "#009E73", "#999999", "#0072B2"),
    yfill = list("#D55E00", "#CC79A7", "#F0E442", "#D55E00"),
    ggtheme = list(
      ggplot2::theme_grey(),
      ggplot2::theme_classic(),
      ggplot2::theme_light(),
      ggplot2::theme_minimal()
    ),
    messages = FALSE
  ),
  .f = ggstatsplot::ggscatterstats
)
#> Error: Aesthetics must be either length 1 or the same as the data (1): x, y

# combine plots
ggstatsplot::combine_plots(
  plotlist = plot_list1,
  nrow = 4
)

Created on 2018-09-01 by the reprex package (v0.2.0.9000).
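
Part of the problem is visible without ggstatsplot: an expression such as ("rating" > 8.5 & "budget" < 50) is evaluated immediately, comparing the literal strings with numbers, so the list ends up holding single logical values rather than conditions on columns. The base R sketch below contrasts this with quoted expressions that are evaluated later against a data frame. The toy data are made up, and whether ggscatterstats accepts quoted expressions here is not something this sketch settles.

```r
# evaluated on the spot: a single TRUE/FALSE, not a condition on columns
eager <- ("rating" > 8.5 & "budget" < 50)
str(eager)

# quote() stores the unevaluated expression instead
deferred <- list(
  quote(rating > 8.5 & budget < 50),
  quote(rating > 8.5 & budget < 100)
)

# made-up data for illustration
toy <- data.frame(rating = c(9, 7, 8.8), budget = c(20, 30, 80))

# each stored expression can now be evaluated against the data
lapply(deferred, function(cond) toy[eval(cond, envir = toy), ])
```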

Typo in explanation test


Brief description of the problem

"t-tets" is written instead of "t-test" in last line of first paragraph

goals for `0.0.8` release

Planned date: early Feb 2019

(Goal for release date: last week of December)

To do:

  • Massively refactor subtitle maker functions to avoid repetition of code across functions
  • Go full rlang rather than using short-cuts
  • Since #38 was solved, ggscatterstats arguments label.var and label.expression work only with characters but not bare expressions. This is inconsistent with the API principles for the rest of the functions in the package. See if there is a way to implement this.
  • Refactor grouped_ variants of functions to remove the ugly purrr hack they currently implement
  • Refactor code to remove stats::na.omit(). Take a more fine-grained approach to remove NAs only from columns of interest.
  • Showing both 50% and 95% CIs for ggcoefstats (like in Bayesian inference plots: e.g., https://twitter.com/tjmahr/status/1048226472710873089)
  • Check font size for theme_ggstatsplot function; maybe give user arguments option to change all aspects of the theme?
  • Clean up .Rmd file language using gramr package
  • When there are many levels in a factor, ggpiestats labels can overlap; give the option to place the labels either "internal" (current default) or "external" to the slices
  • Add group option for ggscatterstats to support grouped marginals (https://github.com/daattali/ggExtra/blob/master/inst/vignette_files/ggExtra_files/figure-markdown_strict/ggmarginal-grouping-1.png)
  • Add ggplot.function argument to grouped_ variants to make modifications with ggplot2 functions to customize the plot
  • Make the package lighter; the number of dependencies is getting out of control

`theme_wsj()` doesn't work with ggstatsplot layer

# doesn't work
ggstatsplot::ggscatterstats(
  data = iris,
  x = Sepal.Length,
  y = Petal.Width,
  ggtheme = ggthemes::theme_wsj(),
  ggstatsplot.layer = TRUE
)
#> Error in unit(rep(just$hjust, n), "npc"): 'x' and 'units' must have length > 0

# works
ggstatsplot::ggscatterstats(
  data = iris,
  x = Sepal.Length,
  y = Petal.Width,
  ggtheme = ggthemes::theme_wsj(),
  ggstatsplot.layer = FALSE
)
#> Warning: The plot is not a `ggplot` object and therefore can't be further modified with `ggplot2` functions.
#> 

Created on 2018-09-25 by the reprex package (v0.2.1)

adding missing tests

Some of the following functions were discovered to have bugs that had gone unnoticed because there were no tests for them:

  • context("subtitle_ggscatterstats")
      • Pearson's r
      • percentage bend correlations
      • bayes factor

  • context("subtitle_t_onesample")
      • Wilcox test
      • robust location measure test

  • context("subtitle_mann_nonparametric")
      • within-subjects design

  • context("effctsize_ci") (new functions introduced in 0.0.7)
      • yuend_ci
      • kw_eta_h_ci

  • context("pairwise comparisons")
      • tests for pairwise_p() function

diagnosing Ubuntu fails in Travis

@IndrajeetPatil, in response to #23, I'll try a few things in sequence to isolate the problem, including

  • temporarily disable OS-X builds in the matrix (in case they're timing out --but I don't think that's the case).

At the very least, that should buy you some extra run time on Travis before the time-limit is reached.

  • experiment w/ location of the package sources, which hopefully addresses this error message
The command "eval sudo apt-get install -y r-cran-stringi r-cran-magrittr r-cran-curl r-cran-jsonlite r-cran-rcpp r-cran-bindrcpp r-cran-rcppeigen r-cran-openssl r-cran-rlang r-cran-utf8 r-cran-gss r-cran-haven r-cran-data.table r-cran-dplyr r-cran-purrr r-cran-tidyr r-cran-readr r-cran-minqa r-cran-mvtnorm r-cran-nloptr r-cran-sparsem r-cran-lme4 r-cran-httpuv r-cran-markdown r-cran-sem r-cran-readxl r-cran-openxlsx r-cran-pander " failed. Retrying, 2 of 3.

NA omission is too harsh for `grouped_` variant of some functions

The same dataset yields different sample sizes (n) across the bare and grouped variants of a function.

For example, with ggbetweenstats:

set.seed(123)
library(ggplot2)
library(ggstatsplot)

# create a dataset
df <- ggplot2::msleep
df$group <- "1"

# bare function
ggbetweenstats(df,
               vore,
               brainwt,
               messages = FALSE,
               outlier.label = conservation)

# grouped function
grouped_ggbetweenstats(
  df,
  vore,
  brainwt,
  grouping.var = group,
  outlier.label = conservation,
  messages = FALSE
)

Created on 2018-12-12 by the reprex package (v0.2.1)

Plus, outlier.tagging is not set to TRUE, and yet the outlier.label column is still being evaluated. This needs to be fixed.
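
The difference between the two approaches to missing values can be sketched in base R. The data frame is made up, and the column names merely echo the msleep example; the point is that dropping rows with an NA in any column removes more observations than dropping rows with an NA only in the columns actually being analysed.

```r
# made-up data: 'conservation' has NAs that are irrelevant to the analysis
df <- data.frame(
  vore = c("carni", "herbi", NA, "omni", "herbi"),
  brainwt = c(0.1, NA, 0.3, 0.4, 0.5),
  conservation = c(NA, "lc", "lc", NA, "en")
)

# harsh: drop every row with an NA in any column
n_all <- nrow(stats::na.omit(df))

# fine-grained: drop rows with NAs only in the columns of interest
cols <- c("vore", "brainwt")
n_cols <- nrow(df[stats::complete.cases(df[, cols]), ])

c(all_columns = n_all, columns_of_interest = n_cols)
```

Here the first approach keeps 1 row and the second keeps 3, which is the kind of discrepancy in n described above.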

writing unit tests for bootstrapped effect sizes

Somehow, even after setting the seed to the same value, the subtitle prepared by the standalone subtitle function and the one computed inside the plotting function consistently differ slightly. This is because the confidence intervals for the effect size are not identical.

How do you write tests for such cases?
I want to make sure that the entire call is identical in the helper subtitle function and its instantiation in the plotting function.

# plot
set.seed(123)
p <- ggstatsplot::ggbetweenstats(
  data = mtcars,
  x = cyl,
  y = wt,
  nboot = 50,
  var.equal = TRUE,
  messages = FALSE,
  k = 3
)


# subtitle
set.seed(123)
p_subtitle <- ggstatsplot::subtitle_anova_parametric(
  data = mtcars,
  x = cyl,
  y = wt,
  nboot = 50,
  var.equal = TRUE,
  messages = FALSE,
  k = 3
)

# checking if these two are equal
p$labels$subtitle
#> paste(italic("F"), "(", 2, ",", "29", ") = ", "22.911", ", ", 
#>     italic("p"), " = ", "< 0.001", ", ", omega["p"]^2, " = ", 
#>     "0.578", ", CI"["95%"], " [", "0.432", ", ", "0.774", "]", 
#>     ", ", italic("n"), " = ", 32L)

p_subtitle
#> paste(italic("F"), "(", 2, ",", "29", ") = ", "22.911", ", ", 
#>     italic("p"), " = ", "< 0.001", ", ", omega["p"]^2, " = ", 
#>     "0.578", ", CI"["95%"], " [", "0.431", ", ", "0.770", "]", 
#>     ", ", italic("n"), " = ", 32L)

Created on 2018-11-29 by the reprex package (v0.2.1)
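
One possible explanation, offered as an assumption rather than a diagnosis, is that the plotting function consumes random numbers (for example, for jittering points) between set.seed() and the bootstrap, so the two bootstraps start from different RNG states. The base R sketch below reproduces that effect with a hypothetical boot_ci() helper and shows two testing strategies: compare exactly when the RNG state is guaranteed to match, or compare with a tolerance otherwise.

```r
# hypothetical helper: percentile bootstrap CI for the mean
boot_ci <- function(x, nboot = 50) {
  means <- replicate(nboot, mean(sample(x, replace = TRUE)))
  unname(quantile(means, probs = c(0.025, 0.975)))
}

# same seed and same RNG state at the start of the bootstrap: identical CIs
set.seed(123)
ci_a <- boot_ci(mtcars$wt)
set.seed(123)
ci_b <- boot_ci(mtcars$wt)
identical(ci_a, ci_b)

# same seed, but one random draw happens before the bootstrap
set.seed(123)
invisible(runif(1))
ci_c <- boot_ci(mtcars$wt)

# an exact comparison is no longer appropriate here; a tolerance-based check,
# with a tolerance chosen to suit the statistic, is one alternative
identical(ci_a, ci_c)
all.equal(ci_a, ci_c, tolerance = 0.1)
```

In a test suite, the tolerance-based comparison could be applied to the numeric CI bounds rather than to the full subtitle expression.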
