brad-cannell / freqtables
freqtables's Introduction

freqtables

The goal of freqtables is to quickly make tables of descriptive statistics (i.e., counts, percentages, confidence intervals) for categorical variables. This package is designed to work in a tidyverse pipeline, and consideration has been given to getting results from R to Microsoft Word® with minimal pain.

Installation

You can install the released version of freqtables from CRAN with:

install.packages("freqtables")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("brad-cannell/freqtables")

Example

Because freqtables is intended to be used in a dplyr pipeline, loading dplyr into your current R session is recommended.

library(dplyr)
library(freqtables)

The examples below will use R’s built-in mtcars data set.

data("mtcars")

freq_table()

The freq_table() function produces one-way and two-way frequency tables for categorical variables. In addition to frequencies, the freq_table() function displays percentages, and the standard errors and confidence intervals of the percentages. For two-way tables only, freq_table() also displays row (subgroup) percentages, standard errors, and confidence intervals.

For one-way tables, the default 95 percent confidence intervals displayed are logit transformed confidence intervals equivalent to those used by Stata. Additionally, freq_table() will return Wald ("linear") confidence intervals when ci_type = "wald".

For two-way tables, freq_table() returns logit transformed confidence intervals equivalent to those used by Stata.

Here is an example of using freq_table() to create a one-way frequency table with all function arguments left at their default values:

mtcars %>% 
  freq_table(am)
#>   var cat  n n_total percent       se   t_crit      lcl      ucl
#> 1  am   0 19      32  59.375 8.820997 2.039513 40.94225 75.49765
#> 2  am   1 13      32  40.625 8.820997 2.039513 24.50235 59.05775
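
As noted above, Wald ("linear") confidence intervals can be requested instead by passing ci_type = "wald" (output omitted here):

mtcars %>% 
  freq_table(am, ci_type = "wald")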

Here is an example of using freq_table() to create a two-way frequency table with all function arguments left at their default values:

mtcars %>% 
  freq_table(am, cyl)
#> # A tibble: 6 × 17
#>   row_var row_cat col_var col_cat     n n_row n_total percent_total se_total
#>   <chr>   <chr>   <chr>   <chr>   <int> <int>   <int>         <dbl>    <dbl>
#> 1 am      0       cyl     4           3    19      32          9.38     5.24
#> 2 am      0       cyl     6           4    19      32         12.5      5.94
#> 3 am      0       cyl     8          12    19      32         37.5      8.70
#> 4 am      1       cyl     4           8    13      32         25        7.78
#> 5 am      1       cyl     6           3    13      32          9.38     5.24
#> 6 am      1       cyl     8           2    13      32          6.25     4.35
#> # … with 8 more variables: t_crit_total <dbl>, lcl_total <dbl>,
#> #   ucl_total <dbl>, percent_row <dbl>, se_row <dbl>, t_crit_row <dbl>,
#> #   lcl_row <dbl>, ucl_row <dbl>

You can learn more about the freq_table() function and ways to adjust default behaviors in vignette("descriptive_analysis").

freq_test()

The freq_test() function is an S3 generic. It currently has methods for conducting hypothesis tests on one-way and two-way frequency tables. Further, it is made to work in a dplyr pipeline with the freq_table() function.

For the freq_table_two_way class, the methods used are Pearson’s chi-square test of independence and Fisher’s exact test. When cell counts are <= 5, Fisher’s exact test is considered more reliable.

Here is an example of using freq_test() to test the equality of proportions on a one-way frequency table with all function arguments left at their default values:

mtcars %>%
  freq_table(am) %>%
  freq_test() %>%
  select(var:percent, p_chi2_pearson)
#>   var cat  n n_total percent p_chi2_pearson
#> 1  am   0 19      32  59.375      0.2888444
#> 2  am   1 13      32  40.625      0.2888444

Here is an example of using freq_test() to conduct a chi-square test of independence on a two-way frequency table with all function arguments left at their default values:

mtcars %>%
  freq_table(am, vs) %>%
  freq_test() %>%
  select(row_var:n, percent_row, p_chi2_pearson)
#> # A tibble: 4 × 7
#>   row_var row_cat col_var col_cat     n percent_row p_chi2_pearson
#>   <chr>   <chr>   <chr>   <chr>   <int>       <dbl>          <dbl>
#> 1 am      0       vs      0          12        63.2          0.341
#> 2 am      0       vs      1           7        36.8          0.341
#> 3 am      1       vs      0           6        46.2          0.341
#> 4 am      1       vs      1           7        53.8          0.341

You can learn more about the freq_test() function and ways to adjust default behaviors in vignette("using_freq_test").

freq_format()

The freq_format() function is intended to make it quick and easy to format the output of the freq_table() function for tables that may be used in publications. For example, a percentage and its 95% confidence interval could be formatted as “24.00 (21.00 - 27.00).”

mtcars %>%
  freq_table(am) %>%
  freq_format(
    recipe = "percent (lcl - ucl)",
    name = "percent_95",
    digits = 2
  ) %>%
  select(var, cat, percent_95)
#>   var cat            percent_95
#> 1  am   0 59.38 (40.94 - 75.50)
#> 2  am   1 40.62 (24.50 - 59.06)

You can learn more about the freq_format() function by reading the function documentation.

freqtables's People

Contributors

mbcann01


Forkers

alturabi1990

freqtables's Issues

Make freq_table aware of when it is inside summarise

To truly make freq_table work in a typical dplyr pipeline, it really needs to be able to work inside summarise(). Something like:

mtcars %>% summarise( freq_table( cyl ) )

Now that dplyr 1.0 allows summarise to return multiple columns, this should be possible. However, to make freq_table backwards compatible and because there is probably still some value in being able to just use mtcars %>% freq_table( cyl ), I'd like to be able to make freq_table aware of when it is inside of summarise() or not and respond accordingly. Having said that, I haven't tried out mtcars %>% summarise( freq_table( cyl ) ) yet. It may just work already.
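
For reference, here is a minimal, untested sketch of what this might look like with dplyr >= 1.0, assuming cur_data() is used to grab the data inside summarise() (cur_data() is superseded in newer dplyr versions):

mtcars %>% 
  summarise(freq_table(cur_data(), cyl))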

Fix error: "Error in vars(-n) : could not find function "vars""

This error pops up if you try to use freqtables without loading dplyr. For example:

freqtables::freq_table(mtcars, am)
Error in vars(-n) : could not find function "vars"

I don't consider this a huge problem because freqtables is meant to be used in a dplyr pipeline. However, I think it's a pretty easy fix (dplyr::vars()?), so I'd like to go ahead and fix it.
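
The real call site inside freq_table() may differ, but the principle of the fix is simply to namespace the helper. A small standalone illustration:

d <- data.frame(cat = c("a", "b"), n = c(3, 5))
dplyr::select_at(d, dplyr::vars(-n))     # works without library(dplyr)
# dplyr::select_at(d, vars(-n))          # fails: could not find function "vars"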

Add example to README

  • Add quick examples for use to README.
  • Use the same examples that you use in the roxygen documentation.
  • Do this after all the changes in functionality.

Create an "interpret function"

Given results from various analyses, return narrative prose that describes the results in a clear and statistically valid way - along with citations.

Return factor inputs as factors in the results

The current behavior is to return them as character type. This is a pain if you've made them factors above to control ordering, and have to make them factors again in the results table.

Sun Study example:

map_student <- map_student %>% 
  mutate(
    ss_application_f = factor(
      ss_application,
      c("absent", "finished", "no_start", "no_finish", "disturbance")
    )
  )

p_overall <- map_student %>% 
  # Complete case analysis
  filter(!is.na(ss_application_f)) %>% 
  freq_table(period_f, ss_application_f) %>% 
  rename(period = row_cat, ss_application = col_cat) %>% 
  # Have to coerce to factor again to control ordering in plot
  mutate(
    ss_application = factor(
      ss_application, 
      c("absent", "finished", "no_start", "no_finish", "disturbance")
    ),
    period = factor(
      period, 
      c("Baseline", "Intervention", "Randomization", "Maintenance"))
  ) %>% 
  select(period, ss_application, percent_row) %>% 
  ggplot(aes(period, percent_row, group = ss_application)) +
    geom_point(aes(col = ss_application))

Remove rounding from freq_tables

We just want raw output from freq_table; format_table (freq_format) will do all the formatting, including rounding.

  • Also remove rounding parameter from the function definition.

Create a hex sticker for freqtables

I added a hex sticker to the README. Now I'm getting an error that says "Non-standard file/directory found at top level". Need to fix this. I think I might have fixed it already in meantables.
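
Assuming the check is complaining about the sticker source file sitting at the top level of the repo, one likely fix is to add it to .Rbuildignore (the path below is illustrative):

usethis::use_build_ignore("hex_sticker.png")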

Create the freq_tbl function

Overview

This is documented in the wiki.

Creating the freq_tbl() function can be done in isolation, but it is part of a bigger effort to retool the freqtables package in several important ways. For example, this is also related to using group_by() with freq_tbl() and freq_table() (#40).

Tasks

  • Add simulated data to the master branch (#46).
  • Create a new branch, iss-39-freq-tbl, for working on this issue.
  • Get a working freq_tbl function.
  • Create unit tests
  • Create roxygen2 documentation

Create function to add an empty row

Often, when I am building tables for reports, I want there to be an empty row in between variables. Currently, I can manually add an empty row with dplyr::add_row(). Here is an example from the Sun Study report:

# Add a blank line in between each classroom
table_ss_application <- table_ss_application %>% 
  add_row(Classroom = "", `Sunscreen Application Outcome` = "", 
          Baseline = "", Intervention = "",
          Randomization = "", Maintenance = "", .after = 5
  ) %>% 
  add_row(Classroom = "", `Sunscreen Application Outcome` = "", 
          Baseline = "", Intervention = "",
          Randomization = "", Maintenance = "", .after = 11
  ) %>% 
  add_row(Classroom = "", `Sunscreen Application Outcome` = "", 
          Baseline = "", Intervention = "",
          Randomization = "", Maintenance = "", .after = 17
  )

However, that is long and clunky, and you have to list every variable by name. I want to come up with something more generic I can use to quickly add empty rows.

Would be even better if it were possible to do this directly in flextable rather than in data frames.
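
A hypothetical helper (not part of freqtables) that inserts an all-blank row after each requested position, assuming every column is character as in the table above:

add_blank_rows <- function(.data, after) {
  blanks <- setNames(as.list(rep("", ncol(.data))), names(.data))
  # Insert from the bottom up so earlier insertions don't shift later positions
  for (i in sort(after, decreasing = TRUE)) {
    .data <- add_row(.data, !!!blanks, .after = i)
  }
  .data
}

# Positions are illustrative:
# table_ss_application <- add_blank_rows(table_ss_application, after = c(5, 10, 15))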

Use group_by with freq_table

Overview

Previously, in #1 I removed the ability to use a grouped tibble with freq_table(). Now, I'm finding that using group_by() might be the most dplyr way to do things. Remember, freq_table() is intended to be integrated with a dplyr pipeline.

Additionally, using group_by() might help with issue #9 in that group_var_1, group_var_2, etc. would naturally flow from the variables added to group_by().

Adding multiple var names to the group_by() function could result in multiple tables rather than being used as grouping variables. (Nah, I don't think I like this idea).

Passing one var name to freq_table() should still produce a one-way frequency table. In other words, you shouldn't need to use group_by() to produce a one-way frequency table.

It turns out that just removing

  if (("grouped_df" %in% .data_class)) {
    .data <- dplyr::ungroup(.data)
  }

from the freq_table code will make it so that group_by() works again. All the stats still work too. The only issue that I can see is that

mtcars %>% 
  freq_table(am, cyl)

and

mtcars %>% 
  group_by(am) %>%
  freq_table(cyl)

now return the exact same result. I'm not sure if that's good or not. I guess one problem is that it makes it harder to rename the output columns as described in #9 (i.e., group and outcome). Does it though? Need to think more about this.

One good thing is that we don't have to worry about previous groupings messing up the groups we expect when using group_by() with freq_table(). According to the group_by() documentation, if you apply group_by() to an already grouped dataset, it will overwrite the existing grouping variables.

Left off at

2023-03-17

Working through the stuff below in test.Rmd. Decided to create some test files that I can use to compare freqtables to Stata and SAS. The specifics are outlined in #22.

2022-07-31

Trying to decide if I want to soft deprecate or hard deprecate the ... argument in freq_table(). In the iss-40-group-by branch, I have four different versions of the freq_table() function:

  1. freq_table(): The current CRAN version of the function.
  2. freq_table_v2(): In this version, I'm soft deprecating .... It still works, but I'm also adding a .x argument and an informative warning message for users about deprecating .... This is probably the safest route, but it feels like it will slow me down from doing what I actually want to do with freqtables. Also, not being able to use the .x argument by position feels wrong. (A minimal sketch of this route appears after the task list below.)
  3. freq_table_v3(): In this version, I'm hard deprecating .... I'm just replacing it with the .x argument and an informative warning message for users about deprecating .... Of course, there are issues with the approach breaking code.
  4. freq_table_v4(): In this version, I'm also hard deprecating .... This is the most extreme version and what I was last working on. It begins from the new freq_tbl function and builds on from there. Not only might this fix the group_by issue, but we might also address #9, #14, #39, and #22. And also modularize the code a little more, which is something I've been wanting to do for a while. Of course, there are lots of issues with the approach breaking code.

Task list

  • Remove group_by check from freq_table() code
  • Remove group_by check from freq_table() documentation
  • Change row_var to group_var and row_cat to group_cat
  • Change col_var to freq_var and col_cat to freq_cat
  • Change ... to .x or something like that. .x = Calculate the number of times each value of this variable is observed in the data frame. If the data frame is grouped with group_by(), then freq_table() will calculate the number of times each value of this variable is observed separately for each value of the grouping variable(s).
  • Look into documenting the deprecations/changes with the lifecycle package
  • Update README
  • Update documentation
  • Build check
  • Update version number
  • Submit to CRAN
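
A minimal sketch of the soft-deprecation warning from freq_table_v2() above (argument names and wording are illustrative; a real implementation would likely use lifecycle::deprecate_warn(), per the task list):

freq_table_v2 <- function(.data, .x = NULL, ...) {
  if (...length() > 0) {
    warning(
      "Passing variables to freq_table() via ... is deprecated. ",
      "Please use the .x argument instead.",
      call. = FALSE
    )
  }
  # ... existing freq_table() logic, using .x ...
}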

Add an argument that will allow the user to choose to have the names of the analysis variables in a column or in the column name

The current behavior is to have the names of the analysis variables in a column and use generic column names like "row_var" and "col_var". This is usually the behavior I want because it makes it easy to combine multiple results tables into a single table for presentation (e.g. Table 1). However, there are times (e.g., when passing results to ggplot) that it can be useful to have the analysis variable as the column name containing its categories.

Sun Study example:

map_student %>% 
  # Complete case analysis
  filter(!is.na(ss_application_f)) %>% 
  freq_table(period_f, ss_application_f) %>% 
  rename(period = row_cat, ss_application = col_cat) %>% 
  select(period, ss_application, percent_row) %>% 
  ggplot(aes(period, percent_row)) +
    geom_point()

Add ability to make n-way tables

Overview

Currently, freqtables will only create one- and two-way tables. It will not create n-way tables. We want to add the ability to create n-way tables.

What I had in mind was something like:

demo_nih %>% 
    freqtables::freq_table(ethnicity_nih, race_nih, sex_nih)

However, what I've been doing in the meantime is:

make_table_section <- function(cat) {
  demo_nih %>% 
    filter(ethnicity_nih == cat) %>% 
    freqtables::freq_table(race_nih, sex_nih) 
}

And then:

purrr::map_dfc(
  .x = c("Not Hispanic or Latino", "Hispanic or Latino", "Unknown/Not Reported"),
  .f = make_table_section
)

Obviously, this is more verbose, but it gets the job done and is very versatile (e.g., the user can return a list instead of a data frame). However, the spirit of freqtables isn't really to be the most "versatile" package. It's to be the easiest to use "out of the box" for 85%+ of normal use. Give this some thought.

The suggestion from a user on RStudio Community could also be useful:

mtcars %>% 
  gather(variable, category, cyl, vs, am, factor_key = TRUE) %>%
  group_by(variable, category) %>%
  summarize(n = n())

Left off at

2023-03-17: Working on test.Rmd as part of #40.

  • I created two data files for comparing freqtables with Stata and SAS.
  • The data files are called /inst/extdata/freq_study.dta and /inst/extdata/freq_study.xpt.
  • These data files are created using data-raw/study.R.
  • I also created a do file - /inst/extdata/compare_freqtables.do - and a SAS script - /inst/extdata/compare_freqtables.sas.
  • I added all of these files to .Rbuildignore.

2020-06-11: Created test.Rmd on the plane to Minnesota to test out different ways of doing this. test.Rmd is git ignored and build ignored.

Tasks

  • Complete one, two, and n-way tables in Stata (/inst/extdata/compare_freqtables.do). Use them for comparison.
  • Complete one, two, and n-way tables in SAS (/inst/extdata/compare_freqtables.sas). Use them for comparison.
  • Figure out how you want freq_tbl to treat n-way tables.
  • Figure out how you want freq_table to treat n-way tables.
  • Figure out how you want freq_test to calculate stats for n-way tables.

Change freq_table functioning

  • Instead of choosing which stats to return with an argument to freq_table, return everything and then select what you want with the select verb.
  • Make it so that format_table then automatically gives you what you want depending on what you selected (e.g., n and percent). If it doesn't know how based on the combination you gave then it gives you instructions.
  • If it's a one-way table, don't use group_by first. I feel like it might make more sense to pass the variable directly to freq_table. If it's a two-way table, then use group_by for the grouping variable.
  • Change the wording of @return in the roxygen header to mention something about the arbitrary nature of "row" and "column" in the context of the table of results that are returned.

Count explicit 0 for unobserved factor level in freq_table

Currently, if I have a factor variable in my data with an unobserved level, that level will not get an explicit n = 0 in freq_table(). See the example below.

  • Update the documentation to explicitly state that it must be a factor variable.

df <- data.frame(
  cat_var = factor(
    c(rep("Always", 2), rep("Sometimes", 3)),
    levels = c("Always", "Sometimes", "Never")
  )
)

df %>%
  group_by(cat_var) %>%
  freq_table()

A tibble: 2 x 7
  var     cat        n n_total percent   lcl   ucl
1 cat_var Always     2       5      40  3.77  91.9
2 cat_var Sometimes  3       5      60  8.1   96.2

This can be fixed by adding .drop = FALSE to the underlying code.

df %>%
  count(cat_var, .drop = FALSE)

A tibble: 3 x 2
  cat_var     n
1 Always      2
2 Sometimes   3
3 Never       0

Remove the need to group_by first in freq_table

Current syntax:

mtcars %>%
  group_by(am) %>%
  freq_table()

Move group_by into the body of freq_table (like dplyr::count) so that the new syntax would simply be:

mtcars %>%
  freq_table(am)

or

mtcars %>%
  freq_table(am, disp)

Add folder structure to wiki

Overview

The folder structure for this package is starting to get large and slightly confusing. I need to document it in the repo's wiki.

You might want to wait until you are done with #40 and merge everything back into the main branch. It's possible that there will be some big changes once that happens.

Group and subgroup make more sense than row and col.

Row and column make sense for a contingency table, but not so much for a frequency table.

For a frequency table, we should use group and outcome. Or possibly just use the variable names as column headers?

Handy code if we want to grab the group variable names for some reason:

mtcars %>% 
  group_by(am) %>% 
  group_vars()

Or the group levels

mtcars %>% 
  group_by(am) %>% 
  group_keys()

freq-table return all stats

Instead of choosing which stats to return with an argument to freq_table, return everything and then select what you want with the select verb.

Work with contingency tables

Overview

Need to come up with a way to work with 2x2 contingency tables.

Here's what I want to do:

  • Input a contingency table
  • Display the cases
  • Convert from raw data to data frame
  • Display contingency table as a data frame
  • Display the contingency table as a 2x2 table (matrix)

Helpful websites:

Thoughts

  • Enter cross tab as a freq_table.
  • Create functions for risks, odds, effect modification, etc.
  • Create a function that will allow freq_table to output in table format.
  • Create a function that will allow freq_table to output a data frame of observations/cases
  • Instead of row and column, consider using exposure and outcome.
  • Update vignettes

Other thoughts
freq_crosstab:

  • freqtable input
  • Manual frequency table input
  • Data frame of observations input
  • Output is a table or matrix
  • From there, we can calculate quantities of interest (e.g. odds ratio). Broom style.
  • Also, need a display only version that can show n’s, row/column percents, etc.

Potential output style

..1          ..2       ..3         ..4
                       Col var
Row var      Stats     Outcome +   Outcome -
Exposure +   N
Exposure +   Percent
Exposure -   N
Exposure -   Percent

Maybe even display as a flextable. Document that this layout is for viewing rather than analysis?
Maybe just use gmodels::CrossTable() instead of reinventing the wheel?
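
If we go the gmodels route, the usage would look something like this (gmodels is not a freqtables dependency, so it would need to be installed separately):

# install.packages("gmodels")
gmodels::CrossTable(mtcars$am, mtcars$vs, prop.chisq = FALSE)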

Upload to CRAN

I want to upload this to CRAN for at least 3 reasons:

  1. To show that I can.
  2. For easier install.
  3. For vignettes.

Before resubmission

  • Run checks
  • Update cran-comments.md

After accepted,

Move over the rest of the relevant files from bfuncs

Eventually, I'd like to make several changes to the way freq_table works. Those are documented as separate issues. First, though, I just need to get all the old files moved over.

  • R scripts
    • freq_table
    • freq_test
    • format_table
  • Unit tests
  • Vignettes

format_table change percent/mean_95 to percent/mean_ci

They aren't necessarily 95% confidence intervals.
It would be really cool to make the number after the "_" reflect the confidence level.

Actually, consider changing to format_table(...), where ... can be arbitrary columns and symbols that you want to paste together. For example format_table(n, " (", percent, ")").

  • Calculate p-values first.
  • If you want to do something stupid like "***" for p < 0.001 you can.
  • Should it automatically select just the variable labels and formatted statistics? Probably.
  • Create an option to round at this stage too.

Call whatever you output formatted_stat so that it is easier to bind with meantables

  • Change the name from format_table to freq_format
  • Make the new freq_format code w/ recipe argument into its own R script. Borrow from the old freq_format (e.g., description as needed). Erase the old freq_format and format_table R scripts.
  • Update the unit tests for freq_format
  • Change the name of the unit test file
  • Update the vignette
  • Update the version number

This is all good, but how do I use 90% CIs instead of 95% CIs?
OK, I can change this in freq_table().
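
Assuming the confidence level is controlled by an argument like percent_ci (check ?freq_table to confirm), that would look something like:

mtcars %>% 
  freq_table(am, percent_ci = 90)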

Improve Vignettes

You've made a lot of changes to freqtables. Make vignettes that show all of the capabilities.

  • Function documentation should contain examples of using all the different function options.
  • Vignette should not contain examples of using all the different function options. It should contain one or two example use cases from start to finish.
  • Change name of get_group_n to freq_group_n for naming consistency
  • Move CSS out of the vignette Rmd file and into a separate CSS file
  • Move demonstration of changing function arguments from vignette to function documentation
  • Improve README too. I just tried to get something down quickly before submitting to CRAN.
  • Convert to one or two compelling use cases
  • Create a vignette for making a Table 1 (adapt Rmd file from R notes).
  • Remove the function results from the help documentation for all of the functions.

Create an informative error for when all (most?) of the n's are zero

During the Sun Study, I was using this code:

map_student %>% 
  # Complete case analysis
  filter(!is.na(ss_application_f)) %>% 
  filter(period_f == "baseline") %>% 
  freq_table(teacher_f, ss_application_f)

And got this error:

Problem with `mutate()` input `t_crit_total`.
x NaNs produced
ℹ Input `t_crit_total` is `stats::qt(t_prob, df = n_total - 1)`.
Problem with `mutate()` input `t_crit_row`.
x NaNs produced
ℹ Input `t_crit_row` is `stats::qt(t_prob, df = n_total - 1)`.

The problem was that there was no "baseline" factor level. There was a "Baseline" factor level. I either need to come up with a more informative error or a data check to prevent this situation.
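
One possible data check (hypothetical, not currently in the package) would be to fail fast when the incoming data has zero rows, before any of the qt() calls run. Something like this near the top of freq_table():

if (nrow(.data) == 0) {
  stop(
    "The data frame passed to freq_table() has 0 rows. ",
    "Check your filter() conditions (e.g., factor level spelling such as ",
    "'baseline' vs 'Baseline').",
    call. = FALSE
  )
}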

Creating multiple one or n-way tables

Currently using purrr::map_df(). Here is an example from the L2C quarterly report:

# Loop over all categorical vars
cat_stats <- purrr::map_df(
  quos(gender_f, race_3cat_f, hispanic_f), 
  function(x) {
    demographics %>%
      filter(screened_in == 1) %>% 
      freq_table({{x}}) %>%
      freq_format(recipe = "n (percent)", digits = 1) %>%
      select(var, cat, formatted_stats) %>%
      # Add a row with the var name only
      add_row(var = quo_name(x), .before = 1) %>% 
      # Add blank row below
      add_row(var = "", cat = "", formatted_stats = "")
  }
)

I should either:

  1. Create a wrapper function to make this easier to read (see the sketch after this list).
  2. Document using purrr::map_df really well.
  3. Both.
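
A rough sketch of option 1 (a hypothetical wrapper, not part of the package), reusing the map_df pattern above:

freq_table_many <- function(.data, ...) {
  purrr::map_df(
    quos(...),
    function(x) {
      .data %>%
        freq_table({{ x }}) %>%
        freq_format(recipe = "n (percent)", digits = 1) %>%
        select(var, cat, formatted_stats) %>%
        # Add a row with the var name only, then a blank spacer row
        add_row(var = quo_name(x), .before = 1) %>%
        add_row(var = "", cat = "", formatted_stats = "")
    }
  )
}

# cat_stats <- demographics %>%
#   filter(screened_in == 1) %>%
#   freq_table_many(gender_f, race_3cat_f, hispanic_f)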

Update @return in freq_table

Change the wording of @return in the roxygen header to mention something about the arbitrary nature of "row" and "column" in the context of the table of results that are returned.

Add an informative error for user if they forget to pass a data frame to freq_table

A user submitted this code

freq_table(scrGen)

Which produced the following error:

Error in UseMethod("summarise_") : no applicable method for "summarise_" applied to an object of class "c('integer', 'numeric')"

The problem was that the user was not passing the data frame name to freq_table because they had already "attached" the data frame. I need to come up with a more informative error for when this happens.

  • Create error message
  • Create a unit test
  • Update news
  • Update version number

Make return all levels of a factor optional

At some point I changed the code so that unobserved factor levels appear in the table with 0 observations. There are definitely times when I want that to happen. But, there are also times when I don't. An example came up when I was working on the sun study.

map_student_day20 %>%
  freq_table(period, success)

Gave me 0 observations for the "maintenance" period when really I wanted to drop the maintenance period.

So, add an argument that makes the return of all factor levels optional.

  • Create a .drop argument for freq_table (possible usage sketched below)
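
A sketch of what the proposed argument might look like once implemented (argument name and behavior are as proposed above, not yet released):

map_student_day20 %>%
  freq_table(period, success, .drop = TRUE)  # drop unobserved levels like "maintenance"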

Add a total row

See example from L2C quarterly report. The table needed a column for each group, but also a total column (see attachment). That isn't currently easy to get out of freq_tables.

Screen Shot 2020-12-10 at 4.01.50 PM.png

Think about making this part of the default output or about creating a helper function.

Add select stats to vignette

In #13 we changed the output of freq_table to include all stats. We recognize that many users will often not need/want all of these statistics, but restricting display of some of the statistics would require one of the following options:

  1. Developers make a decision about which stats are displayed and the rest are lost to the user.
  2. Create separate functions for each combination of stats that the user wants to display (e.g., freq_table_n_percent()).
  3. Add arguments to the freq_table function that allow the user to select which stats are displayed.

Option 1 is too restrictive.
Option 2 is unwieldy and contradicts the intent of freq_table, which is to be simple and easy to remember/use.
Option 3 was the original solution, but it felt kind of clunky and it still required the developers to make choices up front about which combinations of statistics the users could choose to display (e.g., stats = "n and percent"). Further, we are trying to adhere to the philosophy that the function should do one specific thing. This function creates a table of statistics. The dplyr::select() function makes it really straightforward to choose which of those statistics to keep.

Therefore, when the user does not wish to display all of the statistics that freq_table outputs by default, we recommend one of the following two solutions:

  1. Just use select. After all, freq_table was made to be used in a dplyr pipeline.

mtcars %>%
  freq_table(am) %>%
  select(var, cat, n, percent)

  2. If you are going to use the same pattern of variables in select repeatedly, then just quickly create a function wrapper.

my_freq_table <- function(.data, ...) {
  .data %>%
    freq_table(...) %>%
    select(var, cat, n, percent)
}
