brad-cannell / meantables

The goal of meantables is to quickly make tables of descriptive statistics (i.e., counts, means, confidence intervals) for continuous variables. This package is designed to work in a Tidyverse pipeline, and consideration has been given to getting results from R to Microsoft Word® with minimal pain.

License: Other

R 100.00%

epidemiology descriptive-statistics r

meantables's Introduction

meantables

Installation

You can install the released version of meantables from CRAN with:

install.packages("meantables")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("brad-cannell/meantables")

Example

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(meantables)

data("mtcars")

Overall mean table with defaults

mtcars %>% 
  mean_table(mpg)
#> # A tibble: 1 × 9
#>   response_var     n  mean    sd   sem   lcl   ucl   min   max
#>   <chr>        <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 mpg             32  20.1  6.03  1.07  17.9  22.3  10.4  33.9

Formatting overall mean and 95% CI

mtcars %>%
  mean_table(mpg) %>%
  mean_format(
    recipe = "mean (lcl - ucl)",
    name = "mean_95",
    digits = 1
  ) %>% 
  select(response_var, mean_95)
#> # A tibble: 1 × 2
#>   response_var mean_95           
#>   <chr>        <chr>             
#> 1 mpg          20.1 (17.9 - 22.3)

Formatting grouped means table with mean and sd

mtcars %>%
  group_by(cyl) %>%
  mean_table(mpg) %>%
  mean_format("mean (sd)") %>% 
  select(response_var:group_cat, formatted_stats)
#> # A tibble: 3 × 4
#>   response_var group_var group_cat formatted_stats
#>   <chr>        <chr>         <dbl> <chr>          
#> 1 mpg          cyl               4 26.66 (4.51)   
#> 2 mpg          cyl               6 19.74 (1.45)   
#> 3 mpg          cyl               8 15.1 (2.56)

Grouped means table with defaults

mtcars %>% 
  group_by(cyl) %>% 
  mean_table(mpg)
#> # A tibble: 3 × 11
#>   response_var group_var group…¹     n  mean    sd   sem   lcl   ucl   min   max
#>   <chr>        <chr>       <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 mpg          cyl             4    11  26.7  4.51 1.36   23.6  29.7  21.4  33.9
#> 2 mpg          cyl             6     7  19.7  1.45 0.549  18.4  21.1  17.8  21.4
#> 3 mpg          cyl             8    14  15.1  2.56 0.684  13.6  16.6  10.4  19.2
#> # … with abbreviated variable name ¹group_cat

meantables's Issues

Make mean_table(x) work with variable named "x"

Currently, this produces an error:

df <- tibble(
  x = c(1, 2, NA, 4, 5)
)

df <- df %>% 
  mean_table(x)

 Error: Problem with `summarise()` column `response_var`.
ℹ `response_var = rlang::quo_name(x)`.
x `expr` must quote a symbol, scalar, or call
Run `rlang::last_error()` to see where the error occurred.

I'm pretty sure it's because of the x argument in

mean_table <- function(.data, x, t_prob = 0.975, output = default, digits = 2, ...)

Just change the x to .x or .var
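
A minimal sketch of the rename suggested above, for illustration only. The function name mean_table_sketch and its reduced set of statistics are hypothetical and are not the package's implementation. The sketch assumes the error arises because, inside summarise(), a data column called x masks the function argument x. Capturing the column with rlang::enquo() and resolving its name before summarise() runs sidesteps that lookup.

```r
library(dplyr)

# Hypothetical, stripped-down stand-in for mean_table(); only the argument
# handling is of interest here
mean_table_sketch <- function(.data, .x) {
  # Capture the unevaluated column and resolve its name up front, outside
  # of summarise(), where data columns cannot mask the argument
  response <- rlang::enquo(.x)
  response_name <- rlang::as_name(response)

  .data %>%
    summarise(
      response_var = .env$response_name,
      n            = n(),
      mean         = mean(!!response, na.rm = TRUE)
    )
}

# The data frame from the issue, with a column literally named x
df <- tibble(x = c(1, 2, NA, 4, 5))
result <- mean_table_sketch(df, x)
result
```

Prefixing the argument with a dot follows the same convention as .data, which makes a clash with a real column name unlikely.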

Create NEWS.md

Replace t_prob argument with percent_ci

Make consistent with freqtables.
Entering percent_ci = 95 is much more natural for the end user than entering t_prob = 0.975.

Edit the function documentation
Edit the unit tests
Edit the using_meantables vignette
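
The relationship between the two arguments can be sketched as follows. This assumes the interval is a two-sided t-based interval computed with qt(), which the t_prob name suggests but the issue does not state. The helper percent_ci_to_t_prob is a hypothetical name used only for this example.

```r
# Hypothetical helper: convert a two-sided confidence level given as a
# percentage into the upper-tail probability passed to qt()
percent_ci_to_t_prob <- function(percent_ci = 95) {
  alpha <- 1 - percent_ci / 100
  1 - alpha / 2
}

t_prob <- percent_ci_to_t_prob(95)

# Reproduce a t-based 95% CI for mpg by hand
x      <- mtcars$mpg
n      <- sum(!is.na(x))
sem    <- sd(x, na.rm = TRUE) / sqrt(n)
t_crit <- qt(t_prob, df = n - 1)
lcl    <- mean(x, na.rm = TRUE) - t_crit * sem
ucl    <- mean(x, na.rm = TRUE) + t_crit * sem
round(c(lcl = lcl, ucl = ucl), 1)
```

Under these assumptions the limits round to 17.9 and 22.3, the same values shown in the overall mean table in the README example.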

Add mean_format to mean_tables

Use the same (or similar) code that freq_tables uses.

Change the calculation for "n"

Currently, the calculation for n simply uses the n() function. However, this doesn't give us the answer we are most likely looking for when there is missing data -- the number of non-missing values.

Replace: n = n(),
With: n = sum(!is.na(.data[[rlang::quo_name(x)]])),

This may also require tweaking the arguments to the mean and sd calculations. See confidence_intervals.Rmd in r_notes. Wait, maybe not.
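
A small sketch of the difference between the two counts, using the five-value example from the earlier issue. The column names n_rows, n_nonmissing, mean_x, and sd_x are made up for this example. Whether the package's mean and sd calls already pass na.rm = TRUE is not stated here, so that part is an assumption. The last lines show an operator-precedence pitfall to watch for when the non-missing count is written as a pipe.

```r
library(dplyr)

df <- tibble(x = c(1, 2, NA, 4, 5))

counts <- df %>%
  summarise(
    n_rows       = n(),             # counts every row, including the NA
    n_nonmissing = sum(!is.na(x)),  # counts only observed values
    mean_x       = mean(x, na.rm = TRUE),
    sd_x         = sd(x, na.rm = TRUE)
  )
counts

# Precedence pitfall: %>% binds more tightly than !, so this is evaluated as
# !(is.na(x) %>% sum()) and yields a single logical, not a count
pitfall <- !is.na(df$x) %>% sum()
pitfall
```

Any statistic that divides by n, such as the standard error of the mean, would then need to use the non-missing count as well.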

Add standard deviation to mean_tables

Initial submit to CRAN

Submit to win-builder
Change version number
Check changes that had to be made to freqtables

Consistent variable naming

There will probably be other things as I dig into this.

Should we use a formula input to the functions (i.e., similar to lm())? Would that make the distinction between response variables and grouping variables clearer?
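
To make the question concrete, here is a rough sketch of how a formula could separate the two roles. The helper split_table_formula is hypothetical and is not part of meantables. It uses only base R, and aggregate() is included because it already accepts a formula of this shape.

```r
# Hypothetical helper: split a two-sided formula into a response variable
# and grouping variables
split_table_formula <- function(formula) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3)
  list(
    response_var = all.vars(formula[[2]]),  # left-hand side
    group_vars   = all.vars(formula[[3]])   # right-hand side
  )
}

parts <- split_table_formula(mpg ~ cyl + am)
parts

# Base R's aggregate() already accepts this kind of formula
grouped_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
grouped_means
```

The left-hand side plays the role of response_var and the right-hand side the role of group_var, so the formula states the distinction explicitly. The trade-off is that it fits less naturally into a group_by() pipeline.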

mode_val <- function(x) {
  
  # Count the number of occurrences for each value of x
  value_counts <- table(x)
  
  # Get the maximum number of times any value is observed
  max_count <- max(value_counts)
  
  # Create an index vector that identifies the positions that correspond to
  # count values that are the same as the maximum count value: TRUE if so
  # and FALSE otherwise
  index <- value_counts == max_count
  
  # Use the index vector to get all values that are observed the same number 
  # of times as the maximum number of times that any value is observed
  unique_values <- names(value_counts)
  result <- unique_values[index]
  
  # If result is the same length as value_counts, that means that every value
  # occurred the same number of times. If every value occurred the same number
  # of times, then there is no mode
  no_mode <- length(value_counts) == length(result)
  
  # If there is no mode then change the value of result to NA
  if (no_mode) {
    result <- NA
  }
  
  # Return result
  result
}