danchaltiel / crosstable

Easy and thorough description of datasets

Home Page: https://danchaltiel.github.io/crosstable/

R 99.37% CSS 0.28% JavaScript 0.35%

r descriptive-statistics rstats html-report frequency-table msword flextable officer

crosstable's Introduction

crosstable

Crosstable is a package centered on a single function, crosstable, which easily computes descriptive statistics on datasets. It can use the tidyverse syntax and is interfaced with the package officer to create automated reports.

Installation

# Install the latest version available on CRAN
install.packages("crosstable")

# Install the development version from GitHub
devtools::install_github("DanChaltiel/crosstable", build_vignettes=TRUE)

# Install a specific commit or tagged version (for reproducibility purposes)
devtools::install_github("DanChaltiel/crosstable@ee012f6", build_vignettes=TRUE)
# Replace <tag> with the release tag you need
devtools::install_github("DanChaltiel/crosstable@<tag>", build_vignettes=TRUE)

Note that, for reproducibility purposes, an even better solution is to use renv.

Overview

Here are 2 examples to try and show you the main features of crosstable. See the documentation website for more.

Example #1

Dear crosstable, using the mtcars2 dataset, please describe columns disp and vs depending on the levels of column am, with totals in both rows and columns, and with proportions formatted with group size, percent on row and percent on column, with no decimals.

library(crosstable)
ct1 = crosstable(mtcars2, c(disp, vs), by=am, total="both", 
                 percent_pattern="{n} ({p_row}/{p_col})", percent_digits=0) %>%
  as_flextable()
ct1

With only a few arguments, we selected the columns to describe (c(disp, vs)), defined a grouping variable (by=am), set how percentages are computed and displayed by row and column (percent_pattern=), and asked for totals (total=).
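
To make the {p_row} and {p_col} placeholders concrete, here is a base-R sketch of row and column percentages. It assumes the built-in mtcars dataset (not the labelled mtcars2) and only illustrates the arithmetic; it is not crosstable's internal implementation.

```r
# Cross-tabulate vs (rows) against am (columns) with base R
tab <- table(vs = mtcars$vs, am = mtcars$am)

# Row percentages: each row sums to 100 (the idea behind {p_row})
p_row <- 100 * prop.table(tab, margin = 1)

# Column percentages: each column sums to 100 (the idea behind {p_col})
p_col <- 100 * prop.table(tab, margin = 2)

# Combine counts and percentages in a "{n} ({p_row}/{p_col})"-like layout
cells <- sprintf("%d (%.0f%%/%.0f%%)",
                 as.vector(tab), as.vector(p_row), as.vector(p_col))
formatted <- matrix(cells, nrow = nrow(tab), dimnames = dimnames(tab))
formatted
```

The two percentages in each cell generally differ because they use different denominators (the row total versus the column total).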

Since mtcars2 is a dataset with labels, they are displayed instead of the variable name (see here for how to add some).
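
As a minimal sketch of what "a dataset with labels" means, assume that each label is stored as a "label" attribute on its column, as in the structure() dumps quoted in the issues on this page. The base-R lines below set and read such an attribute by hand. The label text and the get_label() helper are made up for illustration; crosstable ships its own helpers, which the linked documentation covers.

```r
df <- data.frame(mpg = c(21, 22.8), cyl = c(6, 4))

# Attach a label to a column as an attribute (label text is illustrative)
attr(df$mpg, "label") <- "Miles per gallon"

# Hypothetical helper: read a label, falling back to the column name
get_label <- function(x, default) {
  lab <- attr(x, "label")
  if (is.null(lab)) default else lab
}

labels <- mapply(get_label, df, names(df))
labels
```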

Since crosstable() returns a data.frame, we use as_flextable() to output a beautiful HTML table. This table can even be exported to MS Word with a few more lines of code (see here to learn how).

Example #2

Here is a more advanced example.

Dear crosstable, using the mtcars2 dataset again, please describe all columns whose name starts with "cy" and those whose name ends with "at", depending on the levels of both columns am and vs, without considering labels, applying mean() and quantile() as summary function, with probs 25% and 75% defined for this latter function, and with 3 decimals for numeric variables:

ct2 = crosstable(mtcars2, c(starts_with("cy"), ends_with("at")), by=c(am, vs), 
                 label=FALSE, num_digits=3, funs=c(mean, quantile), 
                 funs_arg=list(probs=c(.25,.75))) %>% 
  as_flextable(compact=TRUE, header_show_n=1:2)
ct2

Here, the variables were selected using tidyselect helpers, and the summary functions mean and quantile were specified, along with the argument probs for the latter. Using label=FALSE made it possible to see which variables were selected, but it is best to keep the labels in the final table.

In as_flextable(), the compact=TRUE option yields a longer output, which may be more suited in some contexts (for instance for publication), and header_show_n=1:2 adds the group sizes for both rows of the header.

Documentation

You can find the whole documentation on the dedicated website:

vignette("crosstable") for a first step-by-step guide on how to use crosstable (link)
vignette("crosstable-report") for more on creating MS Word reports using either {officer} or Rmarkdown (link)
vignette("percent_pattern") for more on how to use percent_pattern (link)
vignette("crosstable-selection") for more on variable selection (link), although you would do better to read https://tidyselect.r-lib.org/articles/syntax.html.

There are lots of other features you can learn about there, for instance (non-exhaustive list):

description of correlation, dates, and survival data (link)
variable selection with functions, e.g. is.numeric (link)
formula interface, which allows describing transformed columns, e.g. sqrt(mpg) or Surv(time, event) (link)
automatic computation of statistical tests (link) and of effect sizes (link)
global options to avoid repeating arguments (link)

Getting help and giving feedback

If you have a question about how to use crosstable, please ask on StackOverflow with the tag crosstable. You can mention @DanChaltiel in a comment if you are struggling to get answers. Don't forget to add a minimal reproducible example to your question, ideally using the reprex package.

If you are missing a feature that you think belongs in crosstable, please file a Feature Request issue.

If you encounter an unexpected error while using crosstable, please file a Bug Report issue. In case of any installation problem, try the solutions proposed in this article first.

Acknowledgement

In its earliest development phase, crosstable was based on the awesome package biostat2 written by David Hajage. Thanks David!


crosstable's Issues

Issue installing the package on Mac

I have the same issue that you mentioned in the package page. It says:
ERROR: failed to lock directory ‘/Library/Frameworks/R.framework/Versions/4.1/Resources/library’ for modifying
Try removing ‘/Library/Frameworks/R.framework/Versions/4.1/Resources/library/00LOCK-flextable’

The downloaded source packages are in
‘/private/var/folders/zh/sd5lrgn551x1jp3p7pzy1qz00000gp/T/RtmpDW8QTG/downloaded_packages’
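
A stale 00LOCK-* directory is typically left behind by an interrupted installation. Below is a hedged sketch of the cleanup that the error message itself suggests. It assumes the lock sits in the first library returned by .libPaths(); adjust the path to the one printed in your own error.

```r
# Path of the stale lock directory (assumed location; use the path from your error)
lock_dir <- file.path(.libPaths()[1], "00LOCK-flextable")

# Remove it only if it exists
if (dir.exists(lock_dir)) {
  unlink(lock_dir, recursive = TRUE)
}

# Then retry the installation in a fresh R session:
# install.packages("flextable")
# install.packages("crosstable")
```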

Unintuitive showNA="no"

Describe the bug
When setting showNA="no", the user expects NAs to be omitted from the totals as well. However, they creep in (see column 6 below). This means that manually summing n across the cells of a column (or row) does not give the total. Whether or not this is intended, it is very counter-intuitive. Also, NAs in the by variable are shown even though showNA is "no" (see column NA below).

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(crosstable)
x <- data.frame(a=c(1:3, NA), b=c(NA, 4:6))
crosstable(x, cols=a, by=b, total="both", showNA = "no")
#> # A tibble: 4 × 8
#>   .id   label variable `4`         `5`         `6`        `NA`  Total      
#>   <chr> <chr> <chr>    <chr>       <chr>       <chr>      <chr> <chr>      
#> 1 a     a     1        0 (NA%)     0 (NA%)     0 (NA%)    <NA>  1 (33.33%) 
#> 2 a     a     2        1 (100.00%) 0 (0%)      0 (0%)     <NA>  1 (33.33%) 
#> 3 a     a     3        0 (0%)      1 (100.00%) 0 (0%)     <NA>  1 (33.33%) 
#> 4 a     a     Total    1 (33.33%)  1 (33.33%)  1 (33.33%) 1     4 (100.00%)

Created on 2022-10-25 with reprex v2.0.2
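
The mismatch can be reproduced with base R alone. This sketch uses table() rather than crosstable, so it only illustrates the arithmetic: cells built from complete pairs cannot add up to margins that count every non-missing value of a single variable.

```r
x <- data.frame(a = c(1:3, NA), b = c(NA, 4:6))

# Cells restricted to rows where both a and b are observed
tab_no <- table(a = x$a, b = x$b, useNA = "no")

# Same table, but keeping NA as its own level on both margins
tab_all <- table(a = x$a, b = x$b, useNA = "ifany")

sum(tab_no)       # number of complete pairs
sum(tab_all)      # all rows, i.e. nrow(x)
colSums(tab_no)   # column totals from complete pairs only
colSums(tab_all)  # column totals when rows with a missing `a` are counted too
```

The column totals of tab_all for b = 4, 5, 6 match the "Total" row in the output above, which is why the visible cells do not sum to it.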

Session info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23 ucrt)
#>  os       Windows 10 x64 (build 22621)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  nb.utf8
#>  ctype    nb.utf8
#>  tz       Europe/Berlin
#>  date     2022-10-25
#>  pandoc   2.19.2 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package     * version    date (UTC) lib source
#>    assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.2.1)
#>    backports     1.4.1      2021-12-13 [1] CRAN (R 4.2.0)
#>  P base64enc     0.1-3      2015-07-28 [?] CRAN (R 4.0.3)
#>    checkmate     2.1.0      2022-04-21 [1] CRAN (R 4.2.1)
#>    cli           3.4.1      2022-09-23 [1] CRAN (R 4.2.1)
#>    crosstable  * 0.5.0.9006 2022-10-25 [1] Github (DanChaltiel/crosstable@995c7f6)
#>    data.table    1.14.2     2021-09-27 [1] CRAN (R 4.2.1)
#>    DBI           1.1.3      2022-06-18 [1] CRAN (R 4.2.1)
#>  P digest        0.6.30     2022-10-18 [?] CRAN (R 4.2.1)
#>    dplyr       * 1.0.10     2022-09-01 [1] CRAN (R 4.2.1)
#>    ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.2.1)
#>  P evaluate      0.17       2022-10-07 [?] CRAN (R 4.2.1)
#>    fansi         1.0.3      2022-03-24 [1] CRAN (R 4.2.1)
#>  P fastmap       1.1.0      2021-01-25 [?] CRAN (R 4.0.3)
#>    flextable     0.8.2      2022-09-26 [1] CRAN (R 4.2.1)
#>    forcats       0.5.2      2022-08-19 [1] CRAN (R 4.2.1)
#>  P fs            1.5.2      2021-12-08 [?] CRAN (R 4.1.2)
#>    gdtools       0.2.4      2022-02-14 [1] CRAN (R 4.2.1)
#>    generics      0.1.3      2022-07-05 [1] CRAN (R 4.2.1)
#>  P glue          1.6.2      2022-02-24 [?] CRAN (R 4.1.2)
#>  P highr         0.9        2021-04-16 [?] CRAN (R 4.0.5)
#>  P htmltools     0.5.3      2022-07-18 [?] CRAN (R 4.2.1)
#>  P knitr         1.40       2022-08-24 [?] CRAN (R 4.2.1)
#>    lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.2.1)
#>  P magrittr      2.0.3      2022-03-30 [?] CRAN (R 4.1.3)
#>    officer       0.4.4      2022-09-09 [1] CRAN (R 4.2.1)
#>    pillar        1.8.1      2022-08-19 [1] CRAN (R 4.2.1)
#>    pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.2.1)
#>    purrr         0.3.5      2022-10-06 [1] CRAN (R 4.2.1)
#>  P R6            2.5.1      2021-08-19 [?] CRAN (R 4.1.1)
#>    Rcpp          1.0.9      2022-07-08 [1] CRAN (R 4.2.1)
#>    reprex        2.0.2      2022-08-17 [1] CRAN (R 4.2.1)
#>  P rlang         1.0.6      2022-09-24 [?] CRAN (R 4.2.1)
#>  P rmarkdown     2.17       2022-10-07 [?] CRAN (R 4.2.1)
#>    rstudioapi    0.14       2022-08-22 [1] CRAN (R 4.2.1)
#>  P sessioninfo   1.2.2      2021-12-06 [?] CRAN (R 4.2.1)
#>  P stringi       1.7.8      2022-07-11 [?] CRAN (R 4.2.1)
#>  P stringr       1.4.1      2022-08-20 [?] CRAN (R 4.2.1)
#>    systemfonts   1.0.4      2022-02-11 [1] CRAN (R 4.2.1)
#>    tibble        3.1.8      2022-07-22 [1] CRAN (R 4.2.1)
#>    tidyr         1.2.1      2022-09-08 [1] CRAN (R 4.2.1)
#>    tidyselect    1.2.0      2022-10-10 [1] CRAN (R 4.2.1)
#>    utf8          1.2.2      2021-07-24 [1] CRAN (R 4.2.1)
#>    uuid          1.1-0      2022-04-19 [1] CRAN (R 4.2.0)
#>    vctrs         0.4.2      2022-09-29 [1] CRAN (R 4.2.1)
#>    withr         2.5.0      2022-03-03 [1] CRAN (R 4.2.1)
#>  P xfun          0.34       2022-10-18 [?] CRAN (R 4.2.1)
#>    xml2          1.3.3      2021-11-30 [1] CRAN (R 4.2.1)
#>  P yaml          2.3.6      2022-10-18 [?] CRAN (R 4.2.1)
#>    zip           2.2.1      2022-09-08 [1] CRAN (R 4.2.1)
#> 
#>  [1] C:/Users/py128/OneDrive - NIFU/Github-R/skolesporringer/renv/library/R-4.2/x86_64-w64-mingw32
#>  [2] C:/Users/py128/AppData/Local/Temp/Rtmpgp6H7Y/renv-system-library
#>  [3] C:/Users/py128/AppData/Local/Programs/R/R-4.2.1/library
#> 
#>  P ── Loaded and on-disk path mismatch.
#> 
#> ──────────────────────────────────────────────────────────────────────────────

'as_gt' does not apply the labels provided in the 'generic_labels' optional arg

Describe the bug
Labels for generic columns passed to the as_gt function are not applied to the resulting gt table.

Reproducible example

library(crosstable)

t <- data.frame(x=c('a', 'a', 'b'))
ct <- crosstable(t, cols=c('x'))
gt <- as_gt(ct, generic_labels=list(value="count"))
print(gt)

The output is:

value
x
a	2 (66.67%)
b	1 (33.33%)

Expected behaviour would be for the topmost label to be "count" rather than "value".

Session info

R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] crosstable_0.4.1

loaded via a namespace (and not attached):
[1] zip_2.2.0 Rcpp_1.0.8 pillar_1.7.0 compiler_4.1.2 forcats_0.5.1 base64enc_0.1-3 tools_4.1.2 digest_0.6.29 uuid_1.0-3 gtable_0.3.0
[11] evaluate_0.15 lifecycle_1.0.1 tibble_3.1.6 checkmate_2.0.0 pkgconfig_2.0.3 rlang_1.0.1 rstudioapi_0.13 DBI_1.1.2 cli_3.2.0 xfun_0.29
[21] fastmap_1.1.0 stringr_1.4.0 officer_0.4.1 dplyr_1.0.8 knitr_1.37 xml2_1.3.3 sass_0.4.0 gdtools_0.2.4 generics_0.1.2 vctrs_0.3.8
[31] systemfonts_1.0.4 grid_4.1.2 tidyselect_1.1.2 glue_1.6.2 data.table_1.14.2 R6_2.5.1 fansi_1.0.2 rmarkdown_2.11 ggplot2_3.3.5 tidyr_1.2.0
[41] purrr_0.3.4 magrittr_2.0.2 scales_1.1.1 backports_1.4.1 ellipsis_0.3.2 htmltools_0.5.2 gt_0.4.0 assertthat_0.2.1 colorspace_2.0-3 flextable_0.6.10
[51] renv_0.15.2 utf8_1.2.2 stringi_1.7.6 munsell_0.5.0 crayon_1.5.0

allow time variables: hms, period...

library(crosstable)
library(tibble)
library(lubridate)  # provides hm()

tibble(h=rpois(10, 10), m=rpois(10, 30), hm=hm(paste0(h,":",m))) %>% crosstable()

x=structure(c(3600, 3600, 7200, 16200, 3600, 3600), class = c("hms", "difftime"), units = "secs")
tibble(x) %>% crosstable()

Same labels are merged

mtcars2 %>% 
  apply_labels(mpg="Number of cylinders") %>% 
  crosstable(c(mpg, cyl)) %>% 
  af()

If this happens, there should be a warning.

Cannot load package

install.packages("crosstable")
Installing package into ‘C:/Users/davis/OneDrive/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.1/crosstable_0.4.1.zip'
Content type 'application/zip' length 539115 bytes (526 KB)
downloaded 526 KB

library(crosstable)
Error: package or namespace load failed for ‘crosstable’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
namespace ‘xfun’ 0.25 is being loaded, but >= 0.29 is required
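
The message says that the xfun already loaded in the session (0.25) is older than the minimum required (0.29). A small sketch of that comparison in base R, with the usual remedy left commented out:

```r
loaded   <- "0.25"  # version reported in the error above
required <- "0.29"  # minimum version requested in the error above

# compareVersion() returns -1 when the first version is older than the second
cmp <- utils::compareVersion(loaded, required)
cmp

# Typical remedy: restart R so xfun is not loaded, then update it
# if (packageVersion("xfun") < required) install.packages("xfun")
```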

it can not be installed

when I type the following
devtools::install_github("DanChaltiel/crosstable", build_vignettes=TRUE)

It kept giving the following error messages:
** using staged installation
** R
Error in parse(outFile) :
/tmp/RtmpDgOkKg/Rbuild22af522f5a7/crosstable/R/utils.R:575:46: unexpected '>'
574: ansi_align_by = function(text, pattern){
575: pos = gregexpr(pattern, ansi_strip(text)) |>
^
ERROR: unable to collate and parse R files for package ‘crosstable’
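
The parse error points at the native pipe `|>`, which R only understands from version 4.1.0 onwards, so an older R cannot parse these sources. A short sketch to check this before installing the development version:

```r
# The native pipe `|>` was introduced in R 4.1.0
supports_native_pipe <- getRversion() >= "4.1.0"

if (!supports_native_pipe) {
  message("R ", getRversion(), " cannot parse `|>`; consider upgrading R.")
}
```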

Deprecation of functions `officer::slip_in_*`

Dear Dan,

Your package uses the slip_in_* functions, which I have wanted to deprecate for a long time. This is now done in the GitHub version: davidgohel/officer@8f31e5f

Could you update your code to use fpar/ftext/run_word_field/run_bookmark/...?

KR
David

Store N as attribute

For downstream wrapper functions (i.e. table notes, captions, etc) it would be great if N, meaning nrow(data) was stored as an attribute as part of the crosstable class. Should be quite easy to implement, and not resulting in risky behaviour? Currently impossible to recreate N by summing up the value-column, because the table might only consist of e.g. percentages.
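
A minimal sketch of the idea in base R. The attribute name "N" and the with_n() helper are hypothetical, and a plain data.frame stands in for a crosstable result:

```r
# Hypothetical helper: attach the number of input rows to a result table
with_n <- function(result, data) {
  attr(result, "N") <- nrow(data)
  result
}

data <- data.frame(g = c("a", "a", "b"))

# Stand-in for a result that only holds percentages
result <- data.frame(variable = c("a", "b"), value = c("66.67%", "33.33%"))

result <- with_n(result, data)
attr(result, "N")  # downstream code (captions, notes) can read N back
```

One caveat with this design: some data.frame operations drop custom attributes, so wrappers would need to read N before transforming the table.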

Change "NA" to "NA (string)" instead of warning

This warning is annoying and unnecessary.

Cannot describe column "exnd_s" as it contains both NA (missing values) and "NA" (string)

Change every "NA" in the column to "NA (string)" instead.
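
A sketch of the requested behaviour, assuming a character vector as input. relabel_na_string() is a hypothetical helper name, not an existing crosstable function:

```r
# Hypothetical helper: relabel literal "NA" strings so they stay
# distinct from true missing values
relabel_na_string <- function(x, replacement = "NA (string)") {
  is_string_na <- !is.na(x) & x == "NA"
  x[is_string_na] <- replacement
  x
}

x <- c("yes", "NA", NA, "no")
relabeled <- relabel_na_string(x)
relabeled
```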

Optimizing by optional getTableCI()

Hi Dan,
While profiling my own downstream package, I noticed that crosstable() spends a significant amount of time in getTableCI() even when I have not requested confidence intervals. It takes about 300 ms* (out of a total of 600 ms). Could we either have a simple logical argument to turn it off, or have the parser of the glue-like string check whether it is needed? When creating many tables in a report (plus the inevitable iteration until perfection), those 300 ms make a difference. From perusing your summarize_categorical_single() functions, it seems easy to insert such a condition. I also had a look at getTableCI() and across_unpack() but could not spot any obvious improvements there; I guess the bottleneck is the actual CI computation.

*Note: My toy dataset contained 4 variables and 100 rows. When using the mtcars2 the timings are halved, but still getTableCI() takes about half the time.
crosstable::mtcars2 |> crosstable::crosstable(cols = tidyselect::matches("vs|am"), percent_pattern = "{n}", showNA = "ifany", label = TRUE)
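
The pattern check suggested above could look like the sketch below. pattern_needs_ci() is a hypothetical helper, and it assumes that confidence-interval placeholders end in _inf or _sup (for example "{p_col_inf}"); that naming should be checked against the percent_pattern documentation.

```r
# Hypothetical check: does any pattern mention a confidence-interval placeholder?
# Assumption: CI placeholders end in `_inf` or `_sup`
pattern_needs_ci <- function(pattern) {
  any(grepl("\\{[^}]*_(inf|sup)\\}", unlist(pattern), perl = TRUE))
}

pattern_needs_ci("{n}")
pattern_needs_ci("{p_col} [{p_col_inf}; {p_col_sup}]")
```

The interval computation could then be skipped whenever the check returns FALSE, without adding a new user-facing argument.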

`header_show_n` doesn't work when `by==NULL`

crosstable(iris, 1:2) %>% af(header_show_n=TRUE)

`header_show_n=T` doesn't work if `by=NULL`

crosstable(mtcars) %>% af(header_show_n=T)
crosstable(mtcars, by=am) %>% af(by_header=F, header_show_n=T)

Bug in `percent_pattern` when misfilled

BUG if percent_pattern is of length 1

crosstable(mtcars2, am, by=vs, total="both", percent_pattern=list(total_all = "{n}")) %>% 
  as_flextable(keep_id=TRUE)

BUG result has NA if percent_pattern has missing values

crosstable(mtcars2, am, by=vs, total="both", percent_pattern=list(total_row="{n}", total_all = "{n}")) %>% 
  as_flextable(keep_id=TRUE)

improve `format_fixed()`

Allow scientific=FALSE to simply turn it off
Allow eps=0.001 like in base::format.pval(): values <eps are labelled "<{eps}"
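
For reference, this is how base R behaves for the two requests; it shows the target behaviour only, not how format_fixed() is or would be implemented:

```r
# Turn scientific notation off entirely
no_sci <- format(0.00001234, scientific = FALSE)
no_sci

# Label values below a threshold as "<eps", as base::format.pval() does
pvals <- format.pval(c(0.0001, 0.02), eps = 0.001)
pvals
```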

`body_add_normal()` do not remove the `

It should.
Maybe keep them if doubled?

check that all options are handled in `crosstable_options()`

#' @noRd
#' @keywords internal
missing_options_helper = function(){
  options_found = dir("R/", full.names=T) %>% 
    map(readLines) %>% 
    map(~str_subset(.x, "getOption")) %>% 
    keep(~length(.x)>0) %>% 
    unlist() %>% 
    str_extract_all("getOption\\((.*?)\\)") %>% unlist() %>% 
    str_extract("getOption\\((.*?)(,(.*))?\\)", group=1) %>% 
    unique() %>% 
    str_subset('"') %>% 
    str_remove_all('"')
  
  options_proposed = names(formals(crosstable_options))
  
  options_found %>% setdiff(options_proposed)
}


#in tests
test_that("No missing options", {
  missing_options = missing_options_helper()
  expect_identical(missing_options, character(0))
})

unnecessary warning when ... is empty

crosstable::crosstable(iris, Sepal.Length, ) |> invisible()
#> Warning: The `...` argument of `crosstable()` is deprecated as of crosstable 0.2.0.
#> i Please use the `cols` argument instead.
#> i Instead of `crosstable(iris, Sepal.Length, , ...)`, write `crosstable(iris,
#>   c(Sepal.Length, ), ...)`
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.

Created on 2023-08-02 with reprex v2.0.2

Thousand separator (i.e ",")

Describe the new feature
Most of the time, when I am dealing with numbers in the thousands, shortening them with scientific notation is too much, so a thousands separator would improve the readability of those numbers.

Known workaround
I am just accepting as it is.
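
For comparison, base R already offers a thousands separator through the big.mark argument. The sketch below only shows the requested output format; it is not a crosstable option:

```r
# Integer-like value with a thousands separator
big_int <- format(1234567, big.mark = ",")
big_int

# Fixed notation with two decimals and a thousands separator
big_dec <- formatC(1234567.891, format = "f", digits = 2, big.mark = ",")
big_dec
```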

option to sort values

Maybe use https://stackoverflow.com/questions/32378108/using-gtoolsmixedsort-or-alternatives-with-dplyrarrange ?

allow empty `,` in `...`

crosstable(iris, 1:2, )

Same in apply_label

crosstable doesn't properly show N in header when using multiple by variables and removing header keys

Describe the bug
When using multiple by variables and removing header keys there are two issues.

First, the key for the first by variable is truncated to only display the text after N=. So if the text should read "right (N=232)", instead it just reads "232)".

Second, the key for the other by variable isn't removed.

Reproducible example

library(crosstable)


set.seed(1)
n <- 1000
data <- data.frame(
  g1 = ifelse(runif(n) > 0.5, "high", "low"),
  g2 = ifelse(runif(n) > 0.5, "right", "left")
)

data$y <- as.integer(data$g1 == "high") + rnorm(n)

crosstable(data, 
           cols = y,
           by = c(g2, g1)) %>% 
  as_flextable(remove_header_keys = TRUE,
               header_show_n = TRUE)

Session info

R version 4.1.2 (2021-11-01)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

other attached packages:
[1] crosstable_0.5.0.9002

loaded via a namespace (and not attached):
[1] zip_2.2.0 Rcpp_1.0.9 pillar_1.8.1
[4] compiler_4.1.2 forcats_0.5.2 base64enc_0.1-3
[7] tools_4.1.2 digest_0.6.29 uuid_1.0-3
[10] evaluate_0.16 lifecycle_1.0.1 tibble_3.1.8
[13] checkmate_2.1.0 pkgconfig_2.0.3 rlang_1.0.4
[16] DBI_1.1.1 cli_3.3.0 rstudioapi_0.13
[19] xfun_0.32 fastmap_1.1.0 officer_0.4.2
[22] dplyr_1.0.9 stringr_1.4.1 knitr_1.40
[25] xml2_1.3.3 generics_0.1.3 gdtools_0.2.4
[28] vctrs_0.4.1 systemfonts_1.0.4 tidyselect_1.1.2
[31] glue_1.6.2 data.table_1.14.2 R6_2.5.1
[34] fansi_1.0.3 rmarkdown_2.16 tidyr_1.2.0
[37] purrr_0.3.4 magrittr_2.0.3 ellipsis_0.3.2
[40] backports_1.4.1 htmltools_0.5.3 assertthat_0.2.1
[43] flextable_0.7.0 utf8_1.2.2 stringi_1.7.8
[46] crayon_1.4.2


Unexpected warning "NaNs produced" when calculating percents in totals

library(crosstable)
x = structure(list(x = structure(c(NA, NA, 1L, NA, NA, 1L, NA, NA, 
                                   NA, NA, NA, NA, NA, NA, NA, NA, NA, 2L, NA, NA, NA, NA, NA, NA, 
                                   NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2L, 
                                   NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, 2L, 2L, 
                                   2L, NA, 2L, 2L, 2L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
                                   NA, NA, NA, NA, NA, NA, NA, NA, 2L, NA, NA, NA, NA, NA, NA, NA, 
                                   NA, NA, NA, NA, 2L, NA, NA, 2L, NA, NA, 2L, NA, NA, NA, 2L, 1L, 
                                   NA, NA, NA, 2L, 1L, 2L, 1L, 2L, NA, 2L, NA, NA, NA, NA, NA, NA, 
                                   NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
                                   NA, NA, NA, NA, NA, NA, NA, NA, 2L, NA, NA, NA, NA, 2L, 2L, 2L, 
                                   1L, NA, NA, 2L, 2L, 2L, NA, NA, NA, NA, 2L, 2L), 
                                 levels = c("A", 
                                            "B"), 
                                 class = "factor"), 
                   y = structure(c(2L, 2L, 2L, 2L, 2L, 
                                   2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
                                   2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 
                                   1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 
                                   2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 
                                   1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 
                                   1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 
                                   1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 
                                   1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 
                                   2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 
                                   2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L), 
                                 levels = c("A", 
                                            "B"), class = "factor")), row.names = c(NA, -164L), class = c("tbl_df", "tbl", "data.frame"))
crosstable(x, x, by=y, total=TRUE)
#> Warning in sqrt((p * (1 - p) + 0.25 * z2/n)/n): NaNs produced

#> Warning in sqrt((p * (1 - p) + 0.25 * z2/n)/n): NaNs produced

#> Warning in sqrt((p * (1 - p) + 0.25 * z2/n)/n): NaNs produced
#> # A tibble: 4 x 6
#>   .id   label variable A           B            Total        
#>   <chr> <chr> <chr>    <chr>       <chr>        <chr>        
#> 1 x     x     A        1 (14.29%)  6 (85.71%)   7 (21.21%)   
#> 2 x     x     B        7 (26.92%)  19 (73.08%)  26 (78.79%)  
#> 3 x     x     NA       47          84           131          
#> 4 x     x     Total    55 (33.54%) 109 (66.46%) 164 (100.00%)

Created on 2022-09-01 with reprex v2.0.2
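
The expression in the warning can be evaluated on its own in base R. As a sketch only, and under the unverified assumption that a proportion above 1 reaches it (for instance a count of missing values divided by the non-missing total, 131/33 here), the same kind of warning appears. se_term() is a hypothetical name wrapping the expression quoted in the warning.

```r
# Same expression as in the warning; z2 is the squared normal quantile
se_term <- function(p, n, z2 = qnorm(0.975)^2) {
  sqrt((p * (1 - p) + 0.25 * z2 / n) / n)
}

# A valid proportion gives a finite result
ok <- se_term(7 / 33, 33)

# p > 1 makes p * (1 - p) negative, so sqrt() returns NaN (with a warning)
bad <- suppressWarnings(se_term(131 / 33, 33))

c(ok = ok, bad = bad)
```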

crosstable is checking the contents of foreign error messages

Hello, I'm preparing a release of tidyselect and found a problem with your package.

It boils down to crosstable checking for the contents of error messages it doesn't generate, e.g.

    expect_error({A="foobar";crosstable(iris2, ~A, by="Species")},
                 "Can't coerce element 1 from a character to a logical.*")

Could you please change all of these checks to snapshot tests (see https://testthat.r-lib.org/articles/snapshotting.html). This allows you to monitor error messages without causing CRAN failures when upstream packages change messages.

support more markdown, like in `onbrand::md_to_officer()`

The documentation of onbrand::md_to_officer() says:

bold: can be either "**text in bold**" or "__text in bold__"
italics: can be either "*text in italics*" or "_text in italics_"
subscript: "Normal~subscript~"
superscript: "Normal^superscript^"
color: "<color:red>red text</color>"
shade: "<shade:#33ff33>shading</shade>"
font family: "<ff:symbol>symbol</ff>"

Replace superseded dplyr functions with across

I noticed a few uses of dplyr functions which are now considered superseded:
importFrom(dplyr,mutate_all)
importFrom(dplyr,mutate_at)
importFrom(dplyr,mutate_if)
You already use across() so should be easy replacements. Just to be future-proof.
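
A sketch of the three replacements on a toy data frame (the data and object names are made up for illustration; the calls are standard dplyr):

```r
library(dplyr)

df <- data.frame(x = c(1.234, 5.678), y = c("a", "b"), z = c(9.87, 6.54))

# mutate_if() -> across(where(...))
old_if <- df %>% mutate_if(is.numeric, round)
new_if <- df %>% mutate(across(where(is.numeric), round))

# mutate_at() -> across() with an explicit column selection
old_at <- df %>% mutate_at(vars(x, z), round, 1)
new_at <- df %>% mutate(across(c(x, z), ~ round(.x, 1)))

# mutate_all() -> across(everything(), ...)
old_all <- df %>% select(x, z) %>% mutate_all(as.character)
new_all <- df %>% select(x, z) %>% mutate(across(everything(), as.character))
```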

Failure on dev testthat

This is just an official issue that crosstable fails with dev testthat:

── Failure ('test-by_factor.R:290:3'): By multiple errors ──────────────────────
`.` did not throw the expected error.
Backtrace:
    ▆
 1. ├─... %>% expect_error(class = "crosstable_all_na_by_warning") at test-by_factor.R:290:2
 2. └─testthat::expect_error(., class = "crosstable_all_na_by_warning")

wrong information in the documentation (again)

https://danchaltiel.github.io/crosstable/articles/crosstable-report.html#post-production-for-tablefigure-legends

Depending on your version of officer, Word will ask you to update the fields

What version of 'officer' automatically updates fields for users? I'm not sure I understand what you mean about 'officer', but it seems badly adapted to users and lacking in clarity. As you pretend to document officer based on your experience of autocompleting, maybe you should warn your reader you don't read the documentation ? davidgohel/flextable#620 (comment)

Be aware that you unfortunately cannot reference a bookmark more than once using this method. Writing: body_add_normal("Table \@ref(iris_col1) is about flowers. I like this Table \@ref(iris_col1).") will prevent all the numbering from applying.

Is that a limit of crosstable? If your sentence implies it is a limit of 'officer', please correct and explain this is about your implementation because officer has not this limitation

Crosstable installation on Mac issue

Describe the bug
I came here from: https://cran.r-project.org/web/packages/crosstable/vignettes/crosstable-install.html
to try and diagnose my crosstable installation issues on the Mac. I experienced the error you noted at the bottom.

Reproducible example

install.packages("crosstable")
installing the source package ‘crosstable’

trying URL 'https://cran.rstudio.com/src/contrib/crosstable_0.4.0.tar.gz'
Content type 'application/x-gzip' length 985971 bytes (962 KB)
==================================================
downloaded 962 KB

* installing *source* package ‘crosstable’ ...
** package ‘crosstable’ successfully unpacked and MD5 sums checked
** using staged installation
** R
** data
*** moving datasets to lazyload DB
** inst
** byte-compile and prepare package for lazy loading
Error in dyn.load(file, DLLpath = DLLpath, ...) : 
  unable to load shared object '/Library/Frameworks/R.framework/Versions/3.6/Resources/library/systemfonts/libs/systemfonts.so':
  dlopen(/Library/Frameworks/R.framework/Versions/3.6/Resources/library/systemfonts/libs/systemfonts.so, 0x0006): Library not loaded: /opt/X11/lib/libfreetype.6.dylib
  Referenced from: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/systemfonts/libs/systemfonts.so
  Reason: tried: '/opt/X11/lib/libfreetype.6.dylib' (no such file), '/Library/Frameworks/R.framework/Resources/lib/libfreetype.6.dylib' (no such file), '/Library/Java/JavaVirtualMachines/jdk1.8.0_231.jdk/Contents/Home/jre/lib/server/libfreetype.6.dylib' (no such file)
Calls: <Anonymous> ... asNamespace -> loadNamespace -> library.dynam -> dyn.load
Execution halted
ERROR: lazy loading failed for package ‘crosstable’
* removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/crosstable’
Warning in install.packages :
  installation of package ‘crosstable’ had non-zero exit status

The downloaded source packages are in
	‘/private/var/folders/_r/j4pb20313cb2czs8ly2qdlhr0000gp/T/RtmpdJUfpN/downloaded_packages’

Session info

R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] colorspace_2.0-0 scales_1.1.1     compiler_3.6.2   assertthat_0.2.1 R6_2.5.0         cli_2.2.0        tools_3.6.2     
 [8] glue_1.4.2       rstudioapi_0.13  crayon_1.3.4     fansi_0.4.2      knitr_1.30       xfun_0.20        lifecycle_0.2.0 
[15] munsell_0.5.0    rlang_1.0.1

NAs in crosstable() not getting their percentage

Describe the bug
As title says, seems something is odd in the output. Notice the is.na(variable)-rows.

Reproducible example

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(crosstable)
x <-
    structure(list(l_tp_1 = structure(c(3L, 1L, 1L, 1L, 2L, 3L, 2L, 
                                        1L, 3L, 2L, 1L, 2L, 1L, 1L, NA, 3L, 3L, 1L, 4L), levels = c("I liten grad", 
                                                                                                    "I noe grad", "I stor grad", "Vet ikke"), label = "Kvalitative metoder - Fundamentale paradigmer for forståelse\n - Trygg på", class = "factor"), 
                   l_tp_2 = structure(c(2L, 1L, 1L, 1L, 1L, 3L, 2L, 1L, 2L, 
                                        2L, 3L, 1L, 1L, 1L, NA, 3L, 2L, 2L, 1L), levels = c("I liten grad", 
                                                                                            "I noe grad", "I stor grad", "Vet ikke"), label = "Kvalitative metoder - Fenomenologi - Trygg på", class = "factor"), 
                   l_tp_3 = structure(c(3L, 3L, 1L, 2L, 1L, 4L, 3L, 1L, 2L, 
                                        NA, 2L, 1L, 2L, 2L, NA, 2L, 1L, 1L, 2L), levels = c("I liten grad", 
                                                                                            "I noe grad", "I stor grad", "Vet ikke"), label = "Kvalitative metoder - Grounded theory - Trygg på", class = "factor"), 
                   l_tp_4 = structure(c(3L, 2L, 1L, 2L, 2L, 3L, 3L, 1L, 2L, 
                                        NA, 1L, 3L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("I liten grad", 
                                                                                            "I noe grad", "I stor grad", "Vet ikke"), label = "Kvalitative metoder - Konstruktivisme - Trygg på", class = "factor"), 
                   l_tp_5 = structure(c(2L, 3L, 2L, 2L, 3L, NA, 1L, 1L, 2L, 
                                        3L, 1L, 3L, 3L, 2L, 3L, 3L, 3L, 2L, 2L), levels = c("I liten grad", 
                                                                                            "I noe grad", "I stor grad", "Vet ikke"), label = "Kvalitative metoder - Dokumentanalyse - Trygg på", class = "factor"), 
                   l_tp_6 = structure(c(1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 3L, 
                                        NA, 3L, 1L, 1L, 1L, NA, 1L, 4L, 2L, 1L), levels = c("I liten grad", 
                                                                                            "I noe grad", "I stor grad", "Vet ikke"), label = "Kvalitative metoder - Klasseromsobservasjon - Trygg på", class = "factor"), 
                   l_tp_7 = structure(c(3L, 1L, 2L, 1L, 4L, 3L, 2L, 1L, 1L, 
                                        NA, 1L, 1L, 3L, 2L, NA, 1L, 2L, 4L, 2L), levels = c("I liten grad", 
                                                                                            "I noe grad", "I stor grad", "Vet ikke"), label = "Kvalitative metoder - Kvalitativ komparativ analyse (med sett-teori) - Trygg på", class = "factor"), 
                   l_tp_8 = structure(c(2L, 3L, 1L, 1L, 3L, 3L, 2L, 1L, 2L, 
                                        3L, 1L, 3L, 1L, 2L, NA, 1L, 2L, 2L, 1L), levels = c("I liten grad", 
                                                                                            "I noe grad", "I stor grad", "Vet ikke"), label = "Kvalitative metoder - Diskursanalyse - Trygg på", class = "factor"), 
                   l_tp_9 = structure(c(2L, 3L, 2L, 2L, 2L, 3L, 1L, 1L, 3L, 
                                        3L, 1L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 2L), levels = c("I liten grad", 
                                                                                            "I noe grad", "I stor grad", "Vet ikke"), label = "Kvalitative metoder - Evalueringer av prosess - Trygg på", class = "factor"), 
                   l_tp_10 = structure(c(2L, 1L, 1L, 1L, 2L, 3L, 2L, 1L, 3L, 
                                         NA, 1L, 1L, 1L, 2L, NA, 2L, 2L, 2L, 2L), levels = c("I liten grad", 
                                                                                             "I noe grad", "I stor grad", "Vet ikke"), label = "Kvalitative metoder - Etnografi - Trygg på", class = "factor"), 
                   l_tp_11 = structure(c(3L, 3L, 2L, 2L, 2L, 3L, 3L, 2L, 3L, 
                                         2L, 2L, 2L, 2L, 3L, NA, 3L, 2L, 2L, 2L), levels = c("I liten grad", 
                                                                                             "I noe grad", "I stor grad", "Vet ikke"), label = "Kvalitative metoder - Koding av kvalitativ data - Trygg på", class = "factor"), 
                   l_tp_12 = structure(c(3L, 3L, 3L, 1L, 2L, 3L, 2L, 2L, 3L, 
                                         3L, 3L, 3L, 2L, 4L, 3L, 3L, 3L, 2L, 2L), levels = c("I liten grad", 
                                                                                             "I noe grad", "I stor grad", "Vet ikke"), label = "Kvalitative metoder - Tematisk analyse - Trygg på", class = "factor"), 
                   l_tp_13 = structure(c(2L, 3L, 3L, 2L, 2L, 3L, 4L, 1L, 2L, 
                                         3L, 2L, 3L, 3L, 4L, 3L, 2L, 3L, 2L, 2L), levels = c("I liten grad", 
                                                                                             "I noe grad", "I stor grad", "Vet ikke"), label = "Kvalitative metoder - Innholdsanalyse - Trygg på", class = "factor"), 
                   l_tp_14 = structure(c(2L, 2L, 1L, 1L, 1L, 3L, 2L, 1L, 2L, 
                                         3L, 2L, 2L, 1L, 4L, 1L, 2L, 1L, 3L, 1L), levels = c("I liten grad", 
                                                                                             "I noe grad", "I stor grad", "Vet ikke"), label = "Kvalitative metoder - Narrativ analyse - Trygg på", class = "factor"), 
                   l_tp_15 = structure(c(2L, 2L, 2L, 3L, 2L, 1L, 3L, 2L, 3L, 
                                         NA, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L), levels = c("I liten grad", 
                                                                                             "I noe grad", "I stor grad", "Vet ikke"), label = "Kvalitative metoder - Nvivo - Trygg på", class = "factor")), class = c("tbl_df", 
                                                                                                                                                                                                                       "tbl", "data.frame"), row.names = c(NA, -19L))
x %>% 
    select(all_of(c('l_tp_10', 'l_tp_11', 'l_tp_15', 'l_tp_2', 'l_tp_3', 'l_tp_4', 'l_tp_5',
                    'l_tp_6', 'l_tp_7', 'l_tp_8'))) %>%
    crosstable(by = l_tp_2, percent_pattern = "{p_col}") %>% transpose_crosstable()
#> # A tibble: 25 × 12
#>    .id    l_tp_2 varia…¹ Kvali…² Kvali…³ Kvali…⁴ Kvali…⁵ Kvali…⁶ Kvali…⁷ Kvali…⁸
#>    <chr>  <chr>  <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 I lit… I lit… I lite… 66.67%  0%      11.11%  44.44%  33.33%  11.11%  77.78% 
#>  2 I lit… I lit… I noe … 33.33%  77.78%  77.78%  44.44%  55.56%  44.44%  22.22% 
#>  3 I lit… I lit… I stor… 0%      22.22%  11.11%  11.11%  11.11%  44.44%  0%     
#>  4 I lit… I lit… Vet ik… 0%      0%      0%      0%      0%      0%      0%     
#>  5 I lit… I lit… NA      0       0       0       0       0       0       0      
#>  6 I noe… I noe… I lite… 0%      0%      0%      40.00%  0%      16.67%  20.00% 
#>  7 I noe… I noe… I noe … 80.00%  50.00%  60.00%  20.00%  60.00%  50.00%  40.00% 
#>  8 I noe… I noe… I stor… 20.00%  50.00%  40.00%  40.00%  40.00%  33.33%  20.00% 
#>  9 I noe… I noe… Vet ik… 0%      0%      0%      0%      0%      0%      20.00% 
#> 10 I noe… I noe… NA      1       0       1       1       1       0       1      
#> # … with 15 more rows, 2 more variables:
#> #   `Kvalitative metoder - Kvalitativ komparativ analyse (med sett-teori) - Trygg på` <chr>,
#> #   `Kvalitative metoder - Diskursanalyse - Trygg på` <chr>, and abbreviated
#> #   variable names ¹variable, ²`Kvalitative metoder - Etnografi - Trygg på`,
#> #   ³`Kvalitative metoder - Koding av kvalitativ data - Trygg på`,
#> #   ⁴`Kvalitative metoder - Nvivo - Trygg på`,
#> #   ⁵`Kvalitative metoder - Grounded theory - Trygg på`, …

Created on 2022-09-01 with reprex v2.0.2

Session info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23 ucrt)
#>  os       Windows 10 x64 (build 22000)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  nb.utf8
#>  ctype    nb.utf8
#>  tz       Europe/Berlin
#>  date     2022-09-01
#>  pandoc   2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.0.3)
#>  backports     1.4.1      2021-12-13 [1] CRAN (R 4.1.2)
#>  base64enc     0.1-3      2015-07-28 [1] CRAN (R 4.0.3)
#>  checkmate     2.1.0      2022-04-21 [1] CRAN (R 4.1.3)
#>  cli           3.3.0      2022-04-25 [1] CRAN (R 4.1.3)
#>  crayon        1.5.1      2022-03-26 [1] CRAN (R 4.1.3)
#>  crosstable  * 0.5.0.9001 2022-08-29 [1] Github (DanChaltiel/crosstable@efb5882)
#>  data.table    1.14.2     2021-09-27 [1] CRAN (R 4.1.1)
#>  DBI           1.1.3      2022-06-18 [1] CRAN (R 4.2.0)
#>  digest        0.6.29     2021-12-01 [1] CRAN (R 4.1.2)
#>  dplyr       * 1.0.9      2022-04-28 [1] CRAN (R 4.2.1)
#>  ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.0.5)
#>  evaluate      0.16       2022-08-09 [1] CRAN (R 4.2.1)
#>  fansi         1.0.3      2022-03-24 [1] CRAN (R 4.1.3)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.0.3)
#>  flextable     0.7.3      2022-08-09 [1] CRAN (R 4.2.1)
#>  forcats       0.5.2      2022-08-19 [1] CRAN (R 4.2.1)
#>  fs            1.5.2      2021-12-08 [1] CRAN (R 4.1.2)
#>  gdtools       0.2.4      2022-02-14 [1] CRAN (R 4.1.2)
#>  generics      0.1.3      2022-07-05 [1] CRAN (R 4.2.1)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.1.2)
#>  highr         0.9        2021-04-16 [1] CRAN (R 4.0.5)
#>  htmltools     0.5.3      2022-07-18 [1] CRAN (R 4.2.1)
#>  knitr         1.40       2022-08-24 [1] CRAN (R 4.2.1)
#>  lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.1.1)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.1.3)
#>  officer       0.4.3      2022-06-12 [1] CRAN (R 4.2.0)
#>  pillar        1.8.1      2022-08-19 [1] CRAN (R 4.2.1)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.0.3)
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.0.3)
#>  R.cache       0.16.0     2022-07-21 [1] CRAN (R 4.2.1)
#>  R.methodsS3   1.8.2      2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo          1.25.0     2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils       2.12.0     2022-06-28 [1] CRAN (R 4.2.1)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.1.1)
#>  Rcpp          1.0.9      2022-07-08 [1] CRAN (R 4.2.1)
#>  reprex        2.0.2      2022-08-17 [1] CRAN (R 4.2.1)
#>  rlang         1.0.4      2022-07-12 [1] CRAN (R 4.2.1)
#>  rmarkdown     2.16       2022-08-24 [1] CRAN (R 4.2.1)
#>  rstudioapi    0.14       2022-08-22 [1] CRAN (R 4.2.1)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.1.2)
#>  stringi       1.7.8      2022-07-11 [1] CRAN (R 4.2.1)
#>  stringr       1.4.1      2022-08-20 [1] CRAN (R 4.2.1)
#>  styler        1.7.0      2022-03-13 [1] CRAN (R 4.2.1)
#>  systemfonts   1.0.4      2022-02-11 [1] CRAN (R 4.1.2)
#>  tibble        3.1.8      2022-07-22 [1] CRAN (R 4.2.1)
#>  tidyr         1.2.0      2022-02-01 [1] CRAN (R 4.2.1)
#>  tidyselect    1.1.2      2022-02-21 [1] CRAN (R 4.2.1)
#>  utf8          1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
#>  uuid          1.1-0      2022-04-19 [1] CRAN (R 4.2.0)
#>  vctrs         0.4.1      2022-04-13 [1] CRAN (R 4.1.3)
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.1.2)
#>  xfun          0.32       2022-08-10 [1] CRAN (R 4.2.1)
#>  xml2          1.3.3      2021-11-30 [1] CRAN (R 4.1.2)
#>  yaml          2.3.5      2022-02-21 [1] CRAN (R 4.1.2)
#>  zip           2.2.0      2021-05-31 [1] CRAN (R 4.1.0)
#> 
#>  [1] C:/Users/py128/OneDrive - NIFU/R
#>  [2] C:/Program Files/R/R-4.2.1/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

How to only show percentages using crosstable?

Hi there,

I really like the usability of this package, but I've been having issues with figuring out how to show only percentages on my output from the instructions. For example, my output would come out like the photo below, but I only want the percentages and not the Count + percentages. I have my code and an example of my output attached below. Thank you!

df1 <- crosstable(ce_c, c(msgc1_emotion,msgc2_emotion,msgc3_emotion,msgc4_emotion,msgc5_emotion), by=condition,showNA="no", percent_digits=0) %>%
as_flextable(keep_id=FALSE)
df1
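
One possible answer, sketched on the package's own `mtcars2` dataset because the `ce_c` data above is not available here: the `percent_pattern` argument controls what each cell displays, so leaving `{n}` out of it should keep only the percentage. The choice of columns is an assumption made for illustration.

```r
library(crosstable)

# Cells show only the row percentage; "{p_col}" would give column percentages instead
ct <- crosstable(mtcars2, c(cyl, vs), by = am, showNA = "no",
                 percent_pattern = "{p_row}", percent_digits = 0)
ct

# Same rendering step as in the question
as_flextable(ct, keep_id = FALSE)
```

`{p_row}` is used here because it matches the percentage shown by default in the other examples on this page; swap in `{p_col}` or `{p_tot}` if another denominator is wanted.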

Compact tables for Likert-type categorical variables with shared response options

Describe the new feature
It is very difficult to find (tidyverse-compliant) packages that produce tables such as this:
Google search: likert table

Known workaround
Not a full workaround, but essentially using tidyr::pivot_wider(names_from = variable, values_from = value) on the output from crosstable(). Unfortunately, as_flextable() and as_workbook() do not work from there (likely due to tidyverse/tidyr#1379), but as_gt() does.
The workaround is fairly easy, but I would like a super-easy solution for my colleagues.
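
A minimal sketch of that workaround, on made-up survey data (the data frame, item names, and response levels below are invented for illustration):

```r
library(crosstable)
library(tidyr)

# Toy data: two Likert-type items sharing the same response options
lv <- c("Low", "Medium", "High")
survey <- data.frame(
  q1 = factor(c("Low", "High", "Medium", "High", "Low"), levels = lv),
  q2 = factor(c("Medium", "Medium", "High", "Low", "High"), levels = lv)
)

# Without `by`, crosstable() returns the columns .id, label, variable and value
long <- crosstable(survey, c(q1, q2))

# One row per item, one column per response option
wide <- pivot_wider(long, names_from = variable, values_from = value)
wide
```

This only gives a compact layout when every item shares the same levels; items with different response options would produce mostly empty columns.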

Some values of percent_pattern cause percentages in "Total" row or column to report 100%

Describe the bug
Some choices of percent_pattern cause the percentages in the "Total" row or column to report as 100%. (The examples below are the best way to illustrate.)

Reproducible example

1. Default and expected behaviour. The percentages in the rightmost column show the total of each row as a percentage of the grand total; the percentages in the bottom row show the total of each column as a percentage of the grand total.

> crosstable(mtcars, gear, by=cyl, total="both")
# A tibble: 4 x 7
  .id   label variable `4`         `6`        `8`         Total       
  <chr> <chr> <chr>    <chr>       <chr>      <chr>       <chr>       
1 gear  gear  3        1 (6.67%)   2 (13.33%) 12 (80.00%) 15 (46.88%) 
2 gear  gear  4        8 (66.67%)  4 (33.33%) 0 (0%)      12 (37.50%) 
3 gear  gear  5        2 (40.00%)  1 (20.00%) 2 (40.00%)  5 (15.62%)  
4 gear  gear  Total    11 (34.38%) 7 (21.88%) 14 (43.75%) 32 (100.00%)

2. Using percent_pattern with a value that simply reformats the percentage in the non-total cells. The cells in the total row/column don't change their format accordingly, but they still report the right numerical percentage.

> crosstable(mtcars, gear, by=cyl, total="both", percent_pattern = "{n} ~~~ {p_row}")
# A tibble: 4 x 7
  .id   label variable `4`          `6`          `8`           Total       
  <chr> <chr> <chr>    <chr>        <chr>        <chr>         <chr>       
1 gear  gear  3        1 ~~~ 6.67%  2 ~~~ 13.33% 12 ~~~ 80.00% 15 (46.88%) 
2 gear  gear  4        8 ~~~ 66.67% 4 ~~~ 33.33% 0 ~~~ 0%      12 (37.50%) 
3 gear  gear  5        2 ~~~ 40.00% 1 ~~~ 20.00% 2 ~~~ 40.00%  5 (15.62%)  
4 gear  gear  Total    11 (34.38%)  7 (21.88%)   14 (43.75%)   32 (100.00%)

3. Using percent_pattern with a value that applies a formula to p_row. This breaks the percentage calculation in the total row and column, so that the cells in the total column all show 100% (formatted according to the pattern), while the cells in the total row show no percentage.

> crosstable(mtcars, gear, by=cyl, total="both", percent_pattern = "{n} ~~~ {str_replace(p_row, '%', 'pct')}")
# A tibble: 4 x 7
  .id   label variable `4`            `6`            `8`             Total           
  <chr> <chr> <chr>    <chr>          <chr>          <chr>           <chr>           
1 gear  gear  3        1 ~~~ 6.67pct  2 ~~~ 13.33pct 12 ~~~ 80.00pct 15 ~~~ 100.00pct
2 gear  gear  4        8 ~~~ 66.67pct 4 ~~~ 33.33pct 0 ~~~ 0pct      12 ~~~ 100.00pct
3 gear  gear  5        2 ~~~ 40.00pct 1 ~~~ 20.00pct 2 ~~~ 40.00pct  5 ~~~ 100.00pct 
4 gear  gear  Total    11             7              14              32

4. Wrapping that formula in paste fixes the calculation and gives the same behaviour as (1):

> crosstable(mtcars, gear, by=cyl, total="both", percent_pattern = "{n} ~~~ {paste(str_replace(p_row, '%', 'pct'))}")
# A tibble: 4 x 7
  .id   label variable `4`            `6`            `8`             Total       
  <chr> <chr> <chr>    <chr>          <chr>          <chr>           <chr>       
1 gear  gear  3        1 ~~~ 6.67pct  2 ~~~ 13.33pct 12 ~~~ 80.00pct 15 (46.88%) 
2 gear  gear  4        8 ~~~ 66.67pct 4 ~~~ 33.33pct 0 ~~~ 0pct      12 (37.50%) 
3 gear  gear  5        2 ~~~ 40.00pct 1 ~~~ 20.00pct 2 ~~~ 40.00pct  5 (15.62%)  
4 gear  gear  Total    11 (34.38%)    7 (21.88%)     14 (43.75%)     32 (100.00%)

5. But wrapping it in identity instead gives the same issue as shown in (3):

> crosstable(mtcars, gear, by=cyl, total="both", percent_pattern = "{n} ~~~ {identity(str_replace(p_row, '%', 'pct'))}")
# A tibble: 4 x 7
  .id   label variable `4`            `6`            `8`             Total           
  <chr> <chr> <chr>    <chr>          <chr>          <chr>           <chr>           
1 gear  gear  3        1 ~~~ 6.67pct  2 ~~~ 13.33pct 12 ~~~ 80.00pct 15 ~~~ 100.00pct
2 gear  gear  4        8 ~~~ 66.67pct 4 ~~~ 33.33pct 0 ~~~ 0pct      12 ~~~ 100.00pct
3 gear  gear  5        2 ~~~ 40.00pct 1 ~~~ 20.00pct 2 ~~~ 40.00pct  5 ~~~ 100.00pct 
4 gear  gear  Total    11             7              14              32

This seems to be due to the code around lines 126-139 of cross_categorical.R, which does this...

        any_p = pattern_vars %>% str_starts("p") %>% any()

...and then uses that to decide whether to do a margin calculation. The dependency on having a glue variable that starts with the letter "p" would explain why using "paste" (as in (4)) is an effective workaround, but using "identity" (as in (5)) is not.
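
The quoted line does not show how `pattern_vars` is built. Assuming it holds the raw text of each `{...}` placeholder (a guess that is consistent with the outputs above, not a copy of the package code), the check can be mimicked in base R. The helper names `brace_contents` and `any_p` are hypothetical.

```r
# Hypothetical stand-in for the package internals: return the text inside each {...}
brace_contents <- function(pattern) {
  m <- regmatches(pattern, gregexpr("\\{[^{}]*\\}", pattern))[[1]]
  substr(m, 2, nchar(m) - 1)
}

# Mimics `pattern_vars %>% str_starts("p") %>% any()`
any_p <- function(pattern) any(startsWith(brace_contents(pattern), "p"))

any_p("{n} ~~~ {p_row}")                                     # TRUE
any_p("{n} ~~~ {str_replace(p_row, '%', 'pct')}")            # FALSE
any_p("{n} ~~~ {paste(str_replace(p_row, '%', 'pct'))}")     # TRUE
any_p("{n} ~~~ {identity(str_replace(p_row, '%', 'pct'))}")  # FALSE
```

Under that assumption the test looks at the first character of the whole expression, so `paste(...)` passes only because the function name happens to start with "p".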

Session info

``` r
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] crosstable_0.4.1 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.8 purrr_0.3.4 readr_2.1.2 tidyr_1.2.0 tibble_3.1.6 ggplot2_3.3.5
[10] tidyverse_1.3.1

loaded via a namespace (and not attached):
[1] xfun_0.29 tidyselect_1.1.2 haven_2.4.3 colorspace_2.0-3 vctrs_0.3.8 generics_0.1.2 htmltools_0.5.2 base64enc_0.1-3 utf8_1.2.2
[10] rlang_1.0.1 pillar_1.7.0 glue_1.6.2 withr_2.4.3 DBI_1.1.2 gdtools_0.2.4 dbplyr_2.1.1 uuid_1.0-3 modelr_0.1.8
[19] readxl_1.3.1 lifecycle_1.0.1 munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0 zip_2.2.0 rvest_1.0.2 evaluate_0.15 knitr_1.37
[28] fastmap_1.1.0 tzdb_0.2.0 fansi_1.0.2 broom_0.7.12 Rcpp_1.0.8 renv_0.15.2 scales_1.1.1 backports_1.4.1 checkmate_2.0.0
[37] jsonlite_1.8.0 systemfonts_1.0.4 fs_1.5.2 digest_0.6.29 hms_1.1.1 stringi_1.7.6 grid_4.1.2 cli_3.2.0 tools_4.1.2
[46] magrittr_2.0.2 crayon_1.5.0 pkgconfig_2.0.3 ellipsis_0.3.2 data.table_1.14.2 xml2_1.3.3 reprex_2.0.1 lubridate_1.8.0 rmarkdown_2.11
[55] officer_0.4.1 assertthat_0.2.1 httr_1.4.2 rstudioapi_0.13 flextable_0.6.10 R6_2.5.1 compiler_4.1.2

```

library(crosstable) does not work

I used the standard R 4.1.0 GUI on Windows 10.
I got no errors when I installed crosstable today (8/27).

library(crosstable)
Error: package or namespace load failed for ‘crosstable’:
object ‘check_dots_empty’ is not exported by 'namespace:rlang'
In addition: Warning messages:
1: package ‘crosstable’ was built under R version 4.1.3
2: replacing previous import ‘purrr::list_along’ by ‘rlang::list_along’ when loading ‘crosstable’
3: replacing previous import ‘purrr::modify’ by ‘rlang::modify’ when loading ‘crosstable’
4: replacing previous import ‘purrr::prepend’ by ‘rlang::prepend’ when loading ‘crosstable’

use of t-tests or rank-sum tests

Hi Dan,
I am using the crosstable() function to perform some analyses on my data. Specifically, I am using the following code:

crosstable(mtcars2, cols = c(disp, qsec), by = am, test = T) %>% as_flextable()

However, I noticed that the crosstable() function uses a Wilcoxon rank sum test for the disp variable and a t-test for the qsec variable, when comparing the means between the two levels of the am variable. I would like to modify the function to use t-tests for both variables, to ensure consistency across the analyses.

Is it possible to modify the crosstable() function to achieve this? If so, could you please provide some guidance on how to do it?
Thank you very much for your help!
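
A possible approach, sketched under the assumption that the installed crosstable version exposes `test_args` and `crosstable_test_args(test_summarize=)` as described in the package documentation: supply a custom function that takes the numeric variable and the grouping variable and returns a list with `p.value` and `method`. The function name `test_summarize_ttest` is hypothetical.

```r
library(crosstable)

# Hypothetical custom test: always a Welch t-test, whatever the distribution
test_summarize_ttest <- function(x, g) {
  res <- stats::t.test(x ~ g)
  list(p.value = res$p.value, method = res$method)
}

# Quick check of the helper on base R data
test_summarize_ttest(mtcars$disp, mtcars$am)

# Plug it into crosstable() in place of the automatic test selection
ct <- crosstable(mtcars2, c(disp, qsec), by = am, test = TRUE,
                 test_args = crosstable_test_args(test_summarize = test_summarize_ttest))
as_flextable(ct)
```

This sketch only makes sense with a two-level `by` variable, since `t.test()` with a formula requires exactly two groups.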

as_flextable() generic_labels argument ignored

# from last example in as_flextable() documentation, changed from
# total = "Tot" to total = "Total"
crosstable(mtcars2, -model, by=vs, total="both", test=TRUE, effect=TRUE) %>%
  rename(ID=.id, math=variable, Tot=Total, lab=label, pval=test, fx=effect) %>%
  as_flextable(by_header = "Engine shape", 
               generic_labels=list(id = "ID", variable = "math", total="Total", 
                                   label = "lab", test = "pval", effect="fx"))

produces the table shown in the attached screenshot, with the heading 'Tot', rather than the desired 'Total'

Wonderful package. If necessary, I can run the html through a filter to get the presentation required.

apply line breaks in `body_add_normal()`

Parse the text in body_add_normal() and friends to apply line breaks with run_linebreak() :

body_add_normal("xxx \n xxx")

https://davidgohel.github.io/officer/reference/run_linebreak.html

Beware, str_squish() removes \n.
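
A rough sketch of the idea using officer directly, outside of `body_add_normal()`: split the text on `\n` before any whitespace squishing, then interleave the pieces with `run_linebreak()`. The helpers `split_lines` and `text_to_fpar` are hypothetical names, not crosstable functions.

```r
library(officer)

# Split on newlines first, then trim each piece (squishing first would lose the "\n")
split_lines <- function(text) {
  trimws(strsplit(text, "\n", fixed = TRUE)[[1]])
}

# Build one paragraph whose text runs are separated by explicit line breaks
text_to_fpar <- function(text) {
  parts <- split_lines(text)
  runs <- list()
  for (i in seq_along(parts)) {
    if (i > 1) runs <- c(runs, list(run_linebreak()))
    runs <- c(runs, list(ftext(parts[i])))
  }
  do.call(fpar, runs)
}

doc <- read_docx()
doc <- body_add_fpar(doc, text_to_fpar("xxx \n xxx"), style = "Normal")
```

Markdown-like formatting and references handled by `body_add_normal()` are ignored here; a real implementation would need to apply the split before that parsing step.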

Test mysteriously fails in crosstable()

Describe the bug
In short, I came across the common "issue" with fisher.test() in crosstable(). As has been discussed for other packages (e.g. Lagkouvardos/Rhea#17), perhaps crosstable() could use a tryCatch that reruns the test with a larger workspace argument in fisher.test()? Of course, it is not too hard to do this manually outside of crosstable(), or to simply run a chisq.test(), but colleagues get seriously confused by the error, and it presents an obstacle to automation.

Reproducible example
Note, you may not experience this error, depending upon your hardware and setup.

library(crosstable)
structure(list(s_901_1 = structure(c(NA, 1L, 4L, 4L, 4L, 4L, 
                                     4L, 4L, 4L, 4L, 3L, 4L, 1L, 3L, 4L, NA, 3L, NA, 4L, 4L, NA, 4L, 
                                     4L, 4L, 4L, 4L, 4L, 3L, NA, 3L, 1L, 3L, 4L, 2L, 4L, NA, 3L, 3L, 
                                     2L, 4L, 3L, 4L, 3L, 3L, 3L, 4L, 3L, 4L, 3L, NA, NA, 3L, NA, 4L, 
                                     NA, NA, 4L, 4L, 4L, 4L, 4L, NA, NA, 3L, NA, NA, 4L, 4L, 3L, 3L, 
                                     NA, 4L, 4L, 3L, 3L, NA, 3L, 4L, NA, 4L, 4L, 3L, 4L, 3L, 4L, 3L, 
                                     NA, 3L, 4L, NA, 4L, 3L, NA, 3L, 3L, 3L, 3L, 3L, NA, 4L, 4L, 3L, 
                                     3L, 2L, 4L, NA, NA, NA, NA, NA, 3L, NA, 3L, 3L, 2L, 4L, 4L, NA, 
                                     2L, 3L), levels = c("a", "b", "c", 
                                                         "d"), 
                                   class = "factor"), 
               fylke_gs = structure(c(NA, 1L, 
                                      6L, 9L, 9L, 7L, 6L, 2L, 7L, 4L, 10L, 1L, 6L, 11L, 8L, NA, 4L, 
                                      NA, 6L, 11L, NA, 11L, 4L, 4L, 1L, 1L, 5L, 7L, NA, 2L, 4L, 5L, 
                                      6L, 7L, 9L, 5L, 5L, 7L, 6L, 6L, 7L, 10L, 10L, 1L, 1L, 10L, 5L, 
                                      11L, 2L, NA, NA, 1L, NA, 6L, NA, 8L, 2L, 9L, 5L, 10L, 3L, NA, 
                                      NA, 2L, NA, NA, 8L, 7L, 1L, 2L, NA, 9L, 5L, 9L, 9L, NA, 2L, 6L, 
                                      NA, 3L, 6L, 9L, 6L, 3L, 2L, 2L, 3L, 10L, 11L, NA, 7L, 3L, NA, 
                                      6L, 5L, 4L, 2L, 10L, NA, 6L, 5L, 2L, 6L, 3L, 2L, NA, NA, NA, 
                                      NA, NA, 9L, NA, 1L, 10L, 9L, 9L, 11L, NA, 6L, 9L), levels = letters[1:11], 
                                    class = "factor")), 
          row.names = c(NA, -120L), class = c("data.frame")) |>
  crosstable(s_901_1, by=fylke_gs, test = TRUE)
#> Warning in crosstable(structure(list(s_901_1 = structure(c(NA, 1L, 4L, 4L, : Be aware that automatic global testing should only be done in an exploratory
#> context, as it would cause extensive alpha inflation otherwise.
#> This warning is displayed once every 8 hours.
#> Error in fisher.test(x, y): FEXACT error 7(location). LDSTP=18630 is too small for this problem,
#>   (pastp=20.0336, ipn_0:=ipoin[itp=473]=7271, stp[ipn_0]=17.5124).
#> Increase workspace or consider using 'simulate.p.value=TRUE'

Created on 2022-11-10 with reprex v2.0.2
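
The retry suggested above could look roughly like this in base R. It is a sketch of the idea, not crosstable code: the function name `fisher_with_retry` and the workspace sizes are assumptions.

```r
# Hypothetical wrapper: retry fisher.test() with a growing workspace,
# then fall back to a Monte Carlo p-value if every attempt fails
fisher_with_retry <- function(x, y, workspaces = c(2e5, 2e6, 2e7), B = 1e4) {
  for (ws in workspaces) {
    res <- tryCatch(
      stats::fisher.test(x, y, workspace = ws),
      error = function(e) NULL  # swallow the error and try the next size
    )
    if (!is.null(res)) return(res)
  }
  stats::fisher.test(x, y, simulate.p.value = TRUE, B = B)
}

res <- fisher_with_retry(factor(mtcars$cyl), factor(mtcars$gear))
res$p.value
```

The fallback p-value is simulated, so it varies between runs unless a seed is set, and the `method` element of the result says so, which lets a report state which variant was used.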

Cannot activate library(crosstable)

Describe the bug
Cannot activate library(crosstable) version 6

Reproducible example
library(crosstable)
Warning: package ‘crosstable’ was built under R version 4.2.3
Registered S3 method overwritten by 'data.table':
  method           from
  print.data.table
Error: package or namespace load failed for ‘crosstable’:
 object ‘fct_na_value_to_level’ is not exported by 'namespace:forcats'


`as_flextable()` put bold lines in the wrong places

crosstable(iris) %>% af()

Don't show id if there is no label when `keep_id=TRUE`

mtcars2 %>% mutate(mpg=as.numeric(mpg), by=am) %>% crosstable(c(mpg, cyl)) %>% af(T, compact=TRUE)

Redundancy on mpg

Bad format for helper in ...

crosstable::crosstable(iris, Species, c(Sepal.Length, Sepal.Width)) |> invisible()
#> Warning: The `...` argument of `crosstable()` is deprecated as of crosstable 0.2.0.
#> i Please use the `cols` argument instead.
#> i Instead of `crosstable(iris, Species, c("c", "Sepal.Length", "Sepal.Width"),
#>   ...)`, write `crosstable(iris, c(Species, c("c", "Sepal.Length",
#>   "Sepal.Width")), ...)`
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.

Created on 2023-08-02 with reprex v2.0.2

add `body_add_section()`

Maybe also add a little normal paragraph?
Something like "Information about {tolower(title)} is described in Table @ref({bookmark})".

#' Add a section with a table and its legend
#'
#' @param doc a `rdocx` object
#' @param x a table: crosstable, flextable, or plain old dataframe
#' @param legend the legend to use
#' @param bookmark the bookmark to use. Default to the variable name of `x`
#' @param title the title to add for the section. Can also be `FALSE` (no title) or `TRUE` (the title defaults to `legend`)
#' @param title_lvl the title level if applicable
#' @param sentence if `TRUE`, add a short introductory sentence referring to the table before it
#' @param ... passed on to [body_add_flextable()] or [body_add_crosstable()] 
#'
#' @return The `docx` object `doc` 
#' @export
#'
#' @examples
body_add_table_section = function(
    doc, x, legend, bookmark=NULL, 
    title=getOption("crosstable_legend_title", TRUE), 
    title_lvl=getOption("crosstable_legend_title_level", 3), 
    sentence=FALSE,
    ...
){
  ctname = rlang::caller_arg(x)
  stopifnot(!is.null(x))
  stopifnot(inherits(x, c("data.frame", "flextable", "crosstable")))
  if(is.null(bookmark)) bookmark = ctname
  
  if(!is.null(title) && !isFALSE(title)){
    if(isTRUE(title)) title = legend
    doc = body_add_title(doc, title, title_lvl)
  }
  if(isTRUE(sentence)){
    doc = body_add_normal(doc, "Information about {tolower(title)} is described in Table @ref({bookmark})")
  }
  if(inherits(x, "crosstable")){
    doc = body_add_crosstable(doc, x, ...)
  } else {
    if(!inherits(x, "flextable")) x = qflextable(x)
    doc = body_add_flextable(doc, x, ...)
  }
  doc = body_add_table_legend(doc, legend=legend, bookmark=bookmark)
  doc
}

Fails with no error if crossing by a column containing only "label"

library(dplyr)
library(crosstable)
mtcars2 %>% mutate(x="label") %>% crosstable(am, by=x)
#> # A tibble: 2 x 3
#>   .id   label        variable
#>   <chr> <chr>        <chr>   
#> 1 am    Transmission auto    
#> 2 am    Transmission manual
mtcars2 %>% mutate(x="lab") %>% crosstable(am, by=x)
#> # A tibble: 2 x 4
#>   .id   label        variable lab        
#>   <chr> <chr>        <chr>    <chr>      
#> 1 am    Transmission auto     19 (59.38%)
#> 2 am    Transmission manual   13 (40.62%)

Created on 2024-03-28 with reprex v2.1.0

x = mtcars2 %>% 
  mutate(group="label", group2=ifelse(row_number()/n()>0.5, "label", "foobar"))

x %>% crosstable(am, by=group) %>% af
x %>% crosstable(am, by=group2) %>% af

Better error message in `as_flextable()` when crosstable is empty

library(crosstable)
crosstable(iris, 0) %>% af
#> Warning in crosstable(iris, 0): Variable selection in crosstable ended with no
#> variable to describe
#> Error in has_test && !is.null(x[[test]]): invalid 'x' type in 'x && y'

Created on 2023-07-05 with reprex v2.0.2

table autofitting: documentation is wrong

https://danchaltiel.github.io/crosstable/articles/crosstable-report.html#autofit-macro-for-large-tables

This is great, but large tables will unfortunately overflow your document.

This is a (known) limitation that cannot be fixed using R.

This is wrong, computed fields have nothing to do with your point. I find this curious, especially as the link doesn't take you to a theme related to tables and their dimensions at all.

Furthermore, the documentation of generate_autofit_macro() suggests that this functionality does not exist in flextable. I think it would be clearer for the user to say that your function reproduces set_table_properties(layout = "autofit"). From our discussion davidgohel/flextable#620 (comment), I know you don't read the manuals, so I think it's better if I give you the correct names of functions.

Replace forcats::fct_explicit_na() with forcats::fct_na_value_to_level()

With the latest forcats 1.0, there is a lifecycle note that forcats::fct_explicit_na() has been deprecated in favour of the new forcats::fct_na_value_to_level().
No big deal yet, works fine for now.

formats don't apply in `body_add_legend()`

read_docx() %>%
  body_add_table_legend("A *crosstable* of the `iris` dataset") %>%
  write_and_open()

improve body_add_parsed()

body_add_normal(read_docx(), "`du code` **en gras et** **  *italique* **: ") %>% write_and_open() #works but awkward
body_add_normal(read_docx(), "**`du code` en gras et *italique* **: ") %>% write_and_open() #doesnt work

implement cbind.crosstable()

left_join() seems to work but as_flextable() needs to look better.

Check that cols is the same on both flextables.

add a syntax in by maybe? It would be confusing with by=c(vs, am), maybe by=list(vs, am) ? or a custom wrapper?

library(tidyverse)
library(crosstable)

ct1 = crosstable(mtcars, cyl, by=vs)
ct2 = crosstable(mtcars, cyl, by=am)

ct = left_join(ct1, ct2, by=c(".id", "label", "variable"), 
               suffix=c("_vs", "_am"))
ct
#> # A tibble: 3 × 7
#>   .id   label variable `0_vs`       `1_vs`      `0_am`      `1_am`    
#>   <chr> <chr> <chr>    <chr>        <chr>       <chr>       <chr>     
#> 1 cyl   cyl   4        1 (9.09%)    10 (90.91%) 3 (27.27%)  8 (72.73%)
#> 2 cyl   cyl   6        3 (42.86%)   4 (57.14%)  4 (57.14%)  3 (42.86%)
#> 3 cyl   cyl   8        14 (100.00%) 0 (0%)      12 (85.71%) 2 (14.29%)

as_flextable(ct)

Created on 2022-06-16 by the reprex package (v2.0.1)

Source: https://stackoverflow.com/questions/70841416/creating-crosstable-with-multiple-variables-summarized-by-row-categories/72641518#72641518
