kaz-yos / tableone

R package to create "Table 1", description of baseline characteristics with or without propensity score weighting

Home Page: https://cran.r-project.org/web/packages/tableone/index.html

Makefile 1.02% R 98.98%

statistics descriptive-statistics r baseline-characteristics cran

tableone's Introduction

tableone

An R package to create “Table 1”, description of baseline characteristics

Creates “Table 1”, i.e., a description of baseline patient characteristics, which is essential in medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the survey package.

tableone was inspired by descriptive statistics functions in Deducer, a Java-based GUI package by Ian Fellows. This package does not require a GUI or Java, and is intended for command-line users.

tableone in action

The code being executed can be found in the introduction vignette.

tableone code example

In this table, continuous and categorical variables can be placed in any order. The p-values are from exact tests for pre-specified variables. For nonnormal variables, it shows the median and IQR instead of the mean and SD, and p-values are from nonparametric tests. Numerically coded categorical variables can be transformed on the fly with factorVars. SMD stands for standardized mean differences. For weighted data, first create a svydesign object, then use the svyCreateTableOne() function. Most other options remain the same.

## Load package
library(tableone)
## Load data
data(pbc, package = "survival")
# drop ID from variable list
vars <- names(pbc)[-1]
## Create Table 1 stratified by trt (can add more stratifying variables)
tableOne <- CreateTableOne(vars = vars, strata = c("trt"), data = pbc,
                            factorVars = c("status","edema","stage"))
## Specifying nonnormal variables will show the variables appropriately,
## and show nonparametric test p-values. Specify variables in the exact
## argument to obtain the exact test p-values.
print(tableOne, nonnormal = c("bili","chol","copper","alk.phos","trig"),
      exact = c("status","stage"), smd = TRUE,
      formatOptions = list(big.mark = ","))

##                          Stratified by trt
##                           1                           2                           p      test    SMD   
##   n                            158                         154                                         
##   time (mean (SD))        2,015.62 (1,094.12)         1,996.86 (1,155.93)          0.883          0.017
##   status (%)                                                                       0.884 exact    0.054
##      0                          83 (52.5)                   85 (55.2)                                  
##      1                          10 ( 6.3)                    9 ( 5.8)                                  
##      2                          65 (41.1)                   60 (39.0)                                  
##   trt (mean (SD))             1.00 (0.00)                 2.00 (0.00)             <0.001            Inf
##   age (mean (SD))            51.42 (11.01)               48.58 (9.96)              0.018          0.270
##   sex = f (%)                  137 (86.7)                  139 (90.3)              0.421          0.111
##   ascites (mean (SD))         0.09 (0.29)                 0.06 (0.25)              0.434          0.089
##   hepato (mean (SD))          0.46 (0.50)                 0.56 (0.50)              0.069          0.206
##   spiders (mean (SD))         0.28 (0.45)                 0.29 (0.46)              0.886          0.016
##   edema (%)                                                                        0.877          0.058
##      0                         132 (83.5)                  131 (85.1)                                  
##      0.5                        16 (10.1)                   13 ( 8.4)                                  
##      1                          10 ( 6.3)                   10 ( 6.5)                                  
##   bili (median [IQR])         1.40 [0.80, 3.20]           1.30 [0.72, 3.60]        0.842 nonnorm  0.171
##   chol (median [IQR])       315.50 [247.75, 417.00]     303.50 [254.25, 377.00]    0.544 nonnorm  0.038
##   albumin (mean (SD))         3.52 (0.44)                 3.52 (0.40)              0.874          0.018
##   copper (median [IQR])      73.00 [40.00, 121.00]       73.00 [43.00, 139.00]     0.717 nonnorm <0.001
##   alk.phos (median [IQR]) 1,214.50 [840.75, 2,028.00] 1,283.00 [922.50, 1,949.75]  0.812 nonnorm  0.037
##   ast (mean (SD))           120.21 (54.52)              124.97 (58.93)             0.460          0.084
##   trig (median [IQR])       106.00 [84.50, 146.00]      113.00 [84.50, 155.00]     0.370 nonnorm  0.017
##   platelet (mean (SD))      258.75 (100.32)             265.20 (90.73)             0.555          0.067
##   protime (mean (SD))        10.65 (0.85)                10.80 (1.14)              0.197          0.146
##   stage (%)                                                                        0.205 exact    0.246
##      1                          12 ( 7.6)                    4 ( 2.6)                                  
##      2                          35 (22.2)                   32 (20.8)                                  
##      3                          56 (35.4)                   64 (41.6)                                  
##      4                          55 (34.8)                   54 (35.1)

Installation

This version of the tableone package for R is developmental and may not be available from CRAN. You can install it in one of the following ways.

Direct installation from github

You first need to install the devtools package to do the following. You can choose from the latest stable version and the latest development version.

## Install devtools (if you do not have it already)
install.packages("devtools")
## Install directly from github (develop branch)
devtools::install_github(repo = "kaz-yos/tableone", ref = "develop")

Using devtools may require some preparation; please see the following link for information.

https://www.rstudio.com/projects/devtools/

Contributors

I would like to thank all the contributors!

Alexander Bartel ndevln
Jonathan J Chipman chipmanj
Justin Bohn jmb01
Lucy D’Agostino McGowan LucyMcGowan
Malcolm Barrett malcolmbarrett
Rune Haubo B Christensen runehaubo
gbouzill

Similar or complementary projects

There are multiple similar or complementary projects of interest.

DescTools: Tools for Descriptive Statistics. https://cran.r-project.org/web/packages/DescTools/index.html
Gmisc: Descriptive Statistics, Transition Plots, and More. https://cran.r-project.org/web/packages/Gmisc/
Hmisc (summary.formula): Advanced table making and many more. https://github.com/harrelfe/Hmisc/
arsenal: An Arsenal of ‘R’ Functions for Large-Scale Statistical Summaries. https://github.com/eheinzen/arsenal
atable: Create Tables for Reporting Clinical Trials. https://github.com/arminstroebel/atable
compareGroups: Descriptive Analysis by Groups. http://www.comparegroups.eu
expss: Tables with Labels and Some Useful Functions from Spreadsheets and ‘SPSS’ Statistics. https://github.com/gdemin/expss
finalfit: Quickly Create Elegant Regression Results Tables and Plots when Modelling. https://finalfit.org/index.html
flextable: Framework for easily creating tables for reporting. https://davidgohel.github.io/flextable/
furniture: Furniture for Quantitative Scientists. https://cran.r-project.org/web/packages/furniture/
gtsummary: Presentation-Ready Data Summary and Analytic Result Tables. https://CRAN.R-project.org/package=gtsummary
htmlTable: An R package for generating advanced tables. https://github.com/gforge/htmlTable
kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. https://github.com/haozhu233/kableExtra
pander: An R Pandoc Writer. https://rapporter.github.io/pander/
pixiedust: Format models for console and to markdown, HTML, and LaTeX. https://github.com/nutterb/pixiedust
qwraps2: quickly placing data summaries and formatted regression results into .Rnw or .Rmd files. https://github.com/dewittpe/qwraps2/
stargazer: Well-Formatted Regression and Summary Statistics Tables. https://cran.r-project.org/web/packages/stargazer/index.html
tab: Functions for Creating Summary Tables for Statistical Reports. https://cran.r-project.org/package=tab
table1: Tables of Descriptive Statistics in HTML. https://github.com/benjaminrich/table1
table1xls: Exports Reproducible Summary Tables to Multi-Tab Spreadsheet Files. https://cran.r-project.org/web/packages/table1xls/index.html
xtable: Export Tables to LaTeX or HTML. https://cran.r-project.org/web/packages/xtable/index.html
(Python) tableone: Create “Table 1” for research papers in Python. https://github.com/tompollard/tableone

tableone's Issues

Vertically align digits on the decimal point?

Hi Mr. Yoshida,

Thank you and your colleagues for creating a great resource for all of us to benefit from! My issue is not so much an issue as a functionality question. Is there a way to vertically align digits on the decimal point using the print(CreateTableOne) function? I get that calling print(CreateTableOne) creates a matrix object, but I'm not sure how to align those digits in R. Any advice or suggestions would be greatly appreciated.

repro<-structure(c("1000", "2000.00 (0.00)", "1000.00 (00.00)", "600", 
"700.00 (0.00)", "50.72 (50.00)"), .Dim = 3:2, .Dimnames = list(
    c("n", "unique_n", "Age"), `Stratified by Something` = c("0", 
    "1")))

         Stratified by Something
           0                 1              
  n        "1000"            "600"          
  unique_n "2000.00 (0.00)"  "700.00 (0.00)"
  Age      "1000.00 (00.00)" "50.72 (50.00)"
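One possible way to approach this outside of tableone is to left-pad each cell of the printed matrix so that the first decimal point in every cell of a column lands in the same position. The sketch below uses base R only; `align_decimal` is a hypothetical helper written for illustration and is not part of tableone. Cells without a decimal point (such as the n row) are treated as if the point followed their leading digits.

```r
# The matrix from the question above.
repro <- structure(
  c("1000", "2000.00 (0.00)", "1000.00 (00.00)",
    "600", "700.00 (0.00)", "50.72 (50.00)"),
  .Dim = 3:2,
  .Dimnames = list(c("n", "unique_n", "Age"),
                   `Stratified by Something` = c("0", "1")))

# Hypothetical helper (not a tableone function): pad cells on the left so
# that the first "." of every cell in a column is at the same position.
align_decimal <- function(x) {
  dot <- as.integer(regexpr(".", x, fixed = TRUE))
  no_dot <- dot == -1L
  # For cells without ".", pretend it follows the leading run of digits.
  lead <- attr(regexpr("^[0-9]+", x), "match.length")
  dot[no_dot] <- pmax(lead[no_dot], 0L) + 1L
  paste0(strrep(" ", max(dot) - dot), x)
}

aligned <- apply(repro, 2, align_decimal)
dimnames(aligned) <- dimnames(repro)

# Character matrices are left-justified by default, so the padding survives.
print(aligned, quote = FALSE)
```

Alignment only holds in a fixed-width font, so this is mainly useful for console or plain-text output.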

Categorical variables omitted from output

I can't seem to get the categorical variables into the CreateTableOne output; any help will be appreciated. You can see below that the dataset contains categorical as well as continuous variables.

https://www.dropbox.com/s/qbw9viacsx774q4/temp3.csv?dl=0 ## link to the dataset
https://www.dropbox.com/s/hi3sh82noslktpd/table%20one%20example.Rmd?dl=0

link to the rmd file containing code

setwd("/Dropbox/tableone example") ## set wd
temp3 <- read.csv("/Dropbox/tableone example/temp3.csv")
head(temp3)

Here is the code for making table One.

library(tableone)

factorVars <-c("tachy", "fever", "hypotherm", "luekocytosis", "leukopenia", "tachypnea", "sot", "immunosuppresive", "unequivocal_pos", "neutropenic")

## set factor variables

vars<- c("max_wbc", "procal_corr", "max_lact", "max_temp", "max_rr",
"max_hr", "min_temp", "max_anc", "min_sbp","sirs_score")

## set continuous variables

tableOne <- CreateTableOne(vars = vars, strata = "unequivocal_pos", data = temp3, factorVars = (factorVars)) #

tableOne
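A likely cause, going by the factorVars help text quoted further down this page ("The variables specified here must also be specified in the vars argument"), is that vars above lists only the continuous variables. The base R sketch below checks for that and builds a combined vector. The shortened variable lists and the name `vars_all` are illustrative assumptions, and this has not been verified against the poster's data.

```r
# Shortened versions of the vectors from the post above.
factorVars <- c("tachy", "fever", "hypotherm", "unequivocal_pos")
vars       <- c("max_wbc", "max_temp", "max_rr")
strata     <- "unequivocal_pos"

# Factor variables that would not be tabulated because they are not in vars.
missing_from_vars <- setdiff(factorVars, vars)

# Combined list to pass as vars; the stratifying variable is left out here
# because tabulating it against itself adds no information.
vars_all <- setdiff(union(vars, factorVars), strata)

# factorVars restricted to what is actually requested in vars_all.
factorVars_used <- intersect(factorVars, vars_all)

print(missing_from_vars)
print(vars_all)
```

With these vectors, the call would become CreateTableOne(vars = vars_all, strata = strata, data = temp3, factorVars = factorVars_used).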

Printing without = ... after factor variable

Thank you for the great package!

I was wondering whether it is possible not to report which level in the factor is used. For example (in bold):

Female = Female (%) 16 (37.2) 15 (36.6) 1.00

Thanks in advance!

Leading white spaces

I use CreateTableOne together with the kable function from the knitr package. Unfortunately, leading spaces are then lost (regardless of the noSpace argument to the print method).

I can solve this by the following (note: gsub(" ", "&nbsp;", rownames(.), fixed = TRUE)):

tableone::CreateTableOne(
  vars             = v_all,  
  strata           = "hip",  
  data             = d_paired,
  factorVars       = c(v_eq, "ingtyp_grov", "diagnos"),
  includeNA        = TRUE
) %>% 
print(
  printToggle      = FALSE,
  showAllLevels    = TRUE,
  cramVars         = "kon"
) %>% 
{data.frame(
  what             = gsub(" ", "&nbsp;", rownames(.), fixed = TRUE), ., 
  row.names        = NULL, 
  check.names      = FALSE, 
  stringsAsFactors = FALSE)} %>% 
knitr::kable()

But maybe this step could somehow be included in print.TableOne (at least if requested through an argument to the function)?

Covariates missing impact outputs ?

Hello
I've found some differences between the output of two adjusted tables, and I was a little surprised because it was supposed to be the same table, just with some more covariates.

I searched and found that the adjusted table can vary depending on the covariates specified, particularly if the added covariates have missing values (NA).

Eg.
Step 1 : just 1 covariate

                         1             2             3             p      test SMD   
  n                      152.77        155.31        154.70                          
  glycemie_g (mean (sd))   0.97 (0.22)   1.07 (0.37)   1.03 (0.29)  0.001       0.223

Step 2 : adding age (that has no NA)

  n                      152.77         155.31         154.70                           
  glycemie_g (mean (sd))   0.97 (0.22)    1.07 (0.37)    1.03 (0.29)   0.001       0.223
  age (mean (sd))         41.30 (11.88)  41.23 (11.46)  41.31 (10.53)  0.983       0.010

==> glycemie values are identical

Step 3 : adding imc_18, that has several NA

                         1              2              3              p      test SMD   
  n                      152.77         155.31         154.70                           
  glycemie_g (mean (sd))   0.97 (0.22)  **1.09 (0.39)**  1.03 (0.28)   0.001       0.223
  age (mean (sd))         41.10 (11.76)  41.27 (11.57)  40.60 (10.41)  0.983       0.010
  imc_18 (mean (sd))      30.01 (6.72)   28.67 (5.55)   24.12 (3.85)  <0.001       0.750

==> glycemie (and age) values have changed

I find this package very useful for quickly computing big tables in a decent format, but I'm a bit surprised by this issue when using the adjusted table. Maybe I've forgotten something?

EDIT: I'm not sure I explained well what I did: I did a weighted adjustment (using the AddMwToData function and svydesign), and the issue is with the adjusted table.

How to change the chisq.test for categorical data with no Continuity correction method?
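For reference, in base R the continuity correction for 2 x 2 tables is controlled by the `correct` argument of chisq.test(). The sketch below compares the two settings on a made-up table. As far as I can tell from the CreateTableOne help page, extra arguments for the approximate test are passed through an `argsApprox` list (default `list(correct = TRUE)`); treat that name as an assumption to check against your installed version.

```r
# Made-up 2 x 2 table of counts, for illustration only.
tab <- matrix(c(12, 5, 7, 9), nrow = 2,
              dimnames = list(group = c("a", "b"), outcome = c("no", "yes")))

with_cc    <- chisq.test(tab)                   # Yates correction (default)
without_cc <- chisq.test(tab, correct = FALSE)  # no continuity correction

print(c(corrected = with_cc$p.value, uncorrected = without_cc$p.value))

# Assumed tableone equivalent (not run here; check ?CreateTableOne):
# CreateTableOne(vars = vars, strata = "trt", data = pbc,
#                argsApprox = list(correct = FALSE))
```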

tableone for matched case-control study

Thank you for this helpful package.

Since my data come from a 1:2 matched case-control study, the available statistical methods in tableone are not appropriate. I want to use a linear mixed model (nlme) for continuous variables and conditional logistic regression (clogit) for categorical variables. I've tried looking into the separate functions (https://github.com/kaz-yos/tableone/tree/master/R) to see how I can use these methods for tableone, but I can't think of any way to do this. Any advice on this would be greatly appreciated.

Thank you in advance for your response.

Table formatting issue when showAllLevels = TRUE

Thanks so much for making this awesome package! When showAllLevels = FALSE (the default), print(table) returns a beautifully formatted table where categorical variables are separated nicely like this (if there are at least 3 levels within that categorical variable):

**Variable**                   **Overall**

3-Level Category                  
    Sub-category 1                  10
    Sub-category 2                  20
    Sub-category 3                  15
2-Level Category                  
    Sub-category 2                  25

If there are only 2 levels within the categorical variable, only 1 line is shown to avoid displaying redundant information. I'd still like to display both levels for 2-level categorical variables, and the solution to that seems to be to set 'showAllLevels' = TRUE.

However, when I do this, the formatting changes to add a 3rd column, and removes the nice indents, like this:

**Variable**             **Level**              **Overall**

3-Level Category         Sub-category 1            10
                         Sub-category 2            20
                         Sub-category 3            15
2-Level Category         Sub-category 1            30     
                         Sub-category 2            25

Is there any way you could add a function to maintain the same 2-column formatting that currently exists by default, but to include all levels? Ideally it would look exactly like the 1st example, but would also include both levels of a 2-level categorical variable, like this:

**Variable**                   **Overall**

3-Level Category                  
    Sub-category 1                  10
    Sub-category 2                  20
    Sub-category 3                  15
2-Level Category
    Sub-category 1                  30
    Sub-category 2                  25

Thanks!
David

format options

Are there any possibilities to format the numerical output?
e.g. by passing options to the function format as big.mark, small.mark, decimal.mark, ...

As far as I understand, most formatting is done with sprintf(?), which does not give a straightforward option to format numbers differently...

Support for svyrepdesign

It would be great to allow making tables based on designs with replication weights. Currently, the function ::StopIfNotSurveyDesign prevents such a table from being created. The design object with replication weights is of class "svyrep.design".

TableOne treats my continuous variables as categorical

Once I define some factorVars, all myVars are treated as categorical in the table. I have checked, and they are continuous in the data frame. I attach code and result. Thank you very much.
tableone categorical continous variables trouble.docx

[Enhancement] Ability to count by id column

One issue I run into when creating various Table 1s is that if a subject can select multiple categories, the counts are off.

For example, say I have 2 subjects, but they are allowed to select multiple races.

We typically report the total selected (in this case 0,1, or 2) for each race and the total within the race category may not sum to the total overall N (in this case 2). I don't think there is currently a way around this in CreateTableOne. I'd be willing to submit a pull request if this type of enhancement would be accepted.

library(tidyverse)
library(tableone)

dat <- tibble(id = c("A", "B", "B"),
              race = c("W", "W", "B"))

t2 <- CreateTableOne(data = dat, vars = c("race"))

print(t2, showAllLevels = TRUE)

gives

           level Overall  
  n              3        
  race (%) B     1 (33.3) 
           W     2 (66.7)

What I would like is to be able to say count by unique id and get something more like

           level Overall  
  n              2        
  race (%) B     1 (50.0) 
           W     2 (100)

thoughts?
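Until something like this exists in the package, the desired numbers can be computed by hand. The base R sketch below counts distinct ids per race and uses the number of unique ids as the denominator; it is a workaround under that assumption, not a tableone feature, and the object names are illustrative.

```r
dat <- data.frame(id   = c("A", "B", "B"),
                  race = c("W", "W", "B"))

# Denominator: number of distinct subjects rather than number of rows.
n_subjects <- length(unique(dat$id))

# Drop repeated id/race pairs so each subject counts once per category.
pairs  <- unique(dat[, c("id", "race")])
counts <- table(pairs$race)

# Percentages may sum to more than 100 because subjects can pick several races.
pct <- round(100 * counts / n_subjects, 1)

print(data.frame(level = names(counts),
                 n     = as.vector(counts),
                 pct   = as.vector(pct)))
```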

Is there a way tableone can provide row totals?

Huge fan of tableone. This is my first time posting on github. Is there a way for tableone to take a continuous variable and calculate its total rather than its mean?

For example I have a dataset where rows are different pharmacies and one column is units sold of a drug. I want tableone to show the total units sold across all pharmacies rather than showing the mean number of units sold at each pharmacy.

customize table title

The default display title is 'level Overall'. How can I change this?
For example, what if I want to remove 'level' and change 'Overall' to something else?

p-value NaN

Let's say I have some data:

data <- data.frame(
  a = factor(rep(1:2, 2), levels = 1:3), 
  b = rep(1:2, each = 2))

Hence, a has three levels but we didn't actually observe any elements of level 3.

Then:

tableone::CreateTableOne(
  vars             = "a",  
  strata           = "b",
  data             = data,
  factorVars       = "a"
) %>% 
  print(
    showAllLevels  = TRUE
)

gives:

       Stratified by b
        level 1         2         p      test
  n           2         2                    
  a (%) 1     1 (50.0)  1 (50.0)   1.000     
        2     1 (50.0)  1 (50.0)

hence, a p-value is given but the third level is dropped (as according to #15 ).

If I do not specify `factorVars = "a"`, such as:

tableone::CreateTableOne(
  vars             = "a",  
  strata           = "b",
  data             = data
) %>% 
  print(
    showAllLevels  = TRUE,
  )

I get:

       Stratified by b
        level 1         2         p    test
  n           2         2                  
  a (%) 1     1 (50.0)  1 (50.0)   NaN     
        2     1 (50.0)  1 (50.0)           
        3     0 ( 0.0)  0 ( 0.0)

hence, data for all levels but p-value = NaN.

According to the man page, kruskal.test is used in this situation, but if I call kruskal.test manually (kruskal.test(a ~ b, data)) I do get a valid p-value (= 1, the same as in the first case above).

I therefore assume it would be possible to give this p-value also when all levels are presented? And to do so would be quite nice!

Display order of Binary variables

Hi,

I love this library and use it a lot in my work.
I have a question about printing tableone output. For a binary categorical variable, is there an argument to choose which label to use? E.g., for surgeon specialty, the variable takes the value Thoracic or Other. Instead of printing SPECIALTY = Other (%), can I manually choose to display SPECIALTY = Thoracic (%)? I don't want to use showAllLevels mode because it would also display too many YES/NO variables, which would make my tableone unnecessarily long.

Thanks in advance

Difference between summary(tab1) and print(tab1)

print(tab1, etc.)

the output is:

_	Level	Non User	User	p	test	SMD
Continuous Variable (mean(sd))		0.42 (0.09)	0.44 (0.09)	<0.001		0.169

But when I use

summary(tab1)

the output is:

_	n	miss	p.miss	mean	sd	median	p25	p75	min	max	skew	kurt
Continuous Variable	15180	0	0	4e-01	9e-02	4e-01	4e-01	5e-01	0.2	5e-01	-0.8	0.4

What does "4e-01" even mean, and why is it displayed like that in summary() compared to print()?
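For readers unfamiliar with the notation: "4e-01" is R's scientific notation for 4 x 10^-1, i.e. 0.4, here shown to one significant digit. The base R sketch below confirms the equivalence and shows how format() can render such values in fixed notation; why summary() chooses this display is not addressed here.

```r
x <- c(mean = 4e-01, sd = 9e-02)

# Scientific and fixed notation denote the same numbers.
same_value <- x[["mean"]] == 0.4

# Render in fixed notation instead of scientific notation.
fixed <- format(x, scientific = FALSE)

print(same_value)
print(fixed)
```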

R crashes when using strata option

Hi there,

I've noticed that R will sometimes crash when using the strata option in the CreateTableOne() function. I've been unable to replicate the issue consistently. The problem seems to come and go depending on the number of variables included. I'm not sure if it will help but below is the problem signature provided by R. This issue occurs both in base R and R Studio. I've tried re-installing all packages to their latest version. Any thoughts on what could be the source of this issue?

Problem signature:
Problem Event Name: APPCRASH
Application Name: rsession.exe
Application Version: 1.1.447.0
Application Timestamp: 5ad67649
Fault Module Name: stats.dll
Fault Module Version: 3.41.7329.0
Fault Module Timestamp: 59563b37
Exception Code: c0000005
Exception Offset: 00000000000045d3
OS Version: 6.3.9600.2.0.0.16.7
Locale ID: 1033
Additional Information 1: 1d45
Additional Information 2: 1d4536b9e7e18a05ea8687f97e5e4e03
Additional Information 3: 46da
Additional Information 4: 46da746a32162c90f7b8cd6fcb5e1f64

Is there a way to change the mean/sd to median and range in CreateTableOne() ?

CreateTableOne() gives me mean/sd, but we use the median more often. I kind of know how to change it in CreateContTable, but I didn't find the funcNames argument in CreateTableOne().

using tableone within jamovi module

Dear @kaz-yos

I previously tried to contact you by email.
I have built a jamovi (https://github.com/jamovi/jamovi) module to ease my friends' research. For the descriptive statistics part, I have used some tableone functions as well.

Since the module has many dependencies, it is large, which is why it is not currently included in the jamovi library. It can be installed using side-load.

I would like to thank you for your package. I hope it will be more commonly used by non-R users as well.

Best wishes

Details: https://sbalci.github.io/ClinicoPathJamoviModule/
windows: https://library.jamovi.org/win64/R3.6.1/ClinicoPath-0.0.1.jmo
macOS: https://library.jamovi.org/macos/R3.6.1/ClinicoPath-0.0.1.jmo
https://sbalci.github.io/ClinicoPathJamoviModule/reference/figures/jamovi-sideload.gif

Total column

Great package. Any chance you could include an addTotal = TRUE option for the grand total across strata?

Nonnormal argument missing, and multiple group summary problem

Hi, I'm following the excellent intro to tableone in RStudio (1.1.447) and R version 3.5.0 but hitting a few errors:

Summarizing nonnormal variables: "Let’s do it with the nonnormal argument to the print() method". My code:

print(tab2, showAllLevels = TRUE, nonnormal = "notnormalVars")

The nonnormal argument doesn't seem to exist. The nonnormal argument is not even suggested in the command prompt pop ups. All of my non-normal variables are still being summarised with mean and sd (i.e. not median and IQR). Is base print() meant here? Feels like I'm missing a package install.

Multiple group summary. I'm grouping cases and want to summarise by group. My group is a binary int variable (0 or 1) called "known_software_client" (I summarise this as a factor in catVars(), but I don't think that's relevant for my problem). When I use this as a strata...

tab3 <- CreateTableOne(vars = myVars, strata = "known_software_client", data = ch_d, factorVars = catVars)

I get an error:

Error in parse(text = x, keep.source = FALSE) :
  <text>:1:12: unexpected symbol
1: ~ Provider Ownership
              ^

Not really sure what's going on here. Is this because there is a space in the variable "Provider Ownership" in my myVars list? When I put backticks around the variable name (i.e. `Provider Ownership`), it then doesn't even recognise the variable in my dataset. Besides, when I use

tab2<- CreateTableOne(vars = myVars, data = ch_d, factorVars = catVars)

There is no problem. The problem only seems to occur when I use strata.
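The parse error above is consistent with a variable name containing a space being pasted into a formula. A possible workaround, sketched below with base R only, is to convert the column names to syntactically valid ones before building the table. The toy data frame stands in for ch_d, and whether this resolves the issue in tableone itself has not been verified.

```r
# Toy stand-in for the poster's data; check.names = FALSE keeps the space.
ch_d <- data.frame(`Provider Ownership` = c("public", "private"),
                   known_software_client = c(0L, 1L),
                   check.names = FALSE)

# A name with a space cannot be parsed as a bare symbol in a formula.
parse_fails <- inherits(
  try(parse(text = "~ Provider Ownership"), silent = TRUE),
  "try-error")

# Make all column names syntactically valid ("Provider Ownership" becomes
# "Provider.Ownership"); unique = TRUE guards against accidental duplicates.
names(ch_d) <- make.names(names(ch_d), unique = TRUE)

# The cleaned name now parses without backticks.
f <- as.formula(paste("~", names(ch_d)[1]))

print(parse_fails)
print(names(ch_d))
```

The vectors passed as vars and factorVars would need the same renaming so that they still match the data.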

I think these are two separate issues, but I'm including them in one message. Any help welcome! Thanks and great package, I've been looking for something like this for MONTHS!!

CreateTableOne() doesn't accept ordered factors

First, thank you for this helpful package for medical research.

The CreateTableOne() function is not able to use ordered factors, although there is no problem with unordered factors.

Example :

# Create a data frame
nsubjects = 100
example_df <- data.frame(sex = factor(x = rbinom(n = nsubjects, size = 1, prob = 0.45), 
    labels = c("male", "female")), weight = round(rnorm(nsubjects, mean = 70, 
    sd = 9), digits = 1), study_level = factor(x = rbinom(nsubjects, 2, 0.3), 
    levels = 0:2, labels = c("primary", "secondary", "superior"), ordered = T), 
    city = factor(x = rbinom(n = nsubjects, size = 2, prob = 0.3), labels = c("Paris", 
        "London", "New-York")))

head(example_df)

##      sex weight study_level   city
## 1 female   91.8     primary London
## 2   male   65.5     primary  Paris
## 3   male   77.2     primary  Paris
## 4 female   67.2    superior London
## 5   male   75.8     primary  Paris
## 6 female   62.4     primary London

str(example_df)

## 'data.frame':    100 obs. of  4 variables:
##  $ sex        : Factor w/ 2 levels "male","female": 2 1 1 2 1 2 2 2 2 1 ...
##  $ weight     : num  91.8 65.5 77.2 67.2 75.8 62.4 76.8 67.6 57.2 74 ...
##  $ study_level: Ord.factor w/ 3 levels "primary"<"secondary"<..: 1 1 1 3 1 1 1 3 2 1 ...
##  $ city       : Factor w/ 3 levels "Paris","London",..: 2 1 1 2 1 2 3 2 3 1 ...

When I try to use CreateTableOne with this data.frame containing an ordered factor, I get a warning and the variable is dropped:

library(tableone)
CreateTableOne(vars = c("weight", "study_level", "city"), strata = "sex", data = example_df)

## Warning: Dropping variable(s) study_level due to unsupported class.

##                     Stratified by sex
##                      male          female        p      test
##   n                  43            57                       
##   weight (mean (sd)) 69.55 (9.09)  67.44 (9.98)   0.279     
##   city (%)                                        0.001     
##      Paris              32 (74.4)     21 (36.8)             
##      London              9 (20.9)     28 (49.1)             
##      New-York            2 ( 4.7)      8 (14.0)

In CreateTableOne(vars = c("weight", "study_level", "city"), strata = "sex",  :
  Dropping variable(s) study_level due to unsupported class.

This is because an ordered factor has two classes:

class(example_df$study_level)

## [1] "ordered" "factor"

ordered and factor.

Maybe it could work if you add ordered in line 122 of CreateTableOne.R:

varFactors  <- names(varClasses[varClasses == "factor"  | varClasses == "logical" | varClasses == "character" | varClasses == "ordered"])
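To illustrate the underlying point in base R: because class() returns two strings for an ordered factor, collecting classes with sapply() yields a list rather than a character vector, whereas is.factor() gives one logical per column and is TRUE for ordered and unordered factors alike. This is only a sketch of the class behaviour; it does not claim to reproduce how tableone builds varClasses internally.

```r
example_df <- data.frame(
  weight      = c(70.1, 65.5, 77.2),
  study_level = factor(c("primary", "superior", "secondary"),
                       levels  = c("primary", "secondary", "superior"),
                       ordered = TRUE))

# One column has two classes, so sapply() cannot simplify to a vector.
classes <- sapply(example_df, class)

# is.factor() is TRUE for ordered factors too and returns one value per column.
is_fac <- sapply(example_df, is.factor)

print(classes)
print(is_fac)
```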

createTableOne, factorVars does not work on variables that are already factors

Great package! Unfortunately, I am not getting the factorVars parameter to work properly and I get the error message below:

NOTE: no factor/logical/character variables supplied, using CreateContTable()

I have tried passing factorVars as a character vector. I have tried passing the column in my data frame as both a factor and an integer, but I get the same error message.

Thanks for your help!

Label for Variable Name

Is it possible to show labels instead of variable names in the output? For example, show "Age in Years (median [IQR])" instead of "age_in_years (median [IQR])". Thanks.

Support for more output formats?

Is it possible, when printing a tableone object, to include the median as well as mean (sd) in the output?

Something like mean +- sd (median)?

option to print.TableOne (or CreateTableOne?) that allows comma-savvy printing

Though perhaps in the future (#22) number and percent (or smd, etc.) columns might be separated, in any case it'd be great to be able to flag to use comma values (e.g. "9,992 (50.2)") for printing. As you mentioned in #22, dplyr separation on "(", then formatting the as.numeric cells might be a workaround... but this seems like useful functionality if you're going to excel or word with it.

Warning message about dropped variables pastes the names together

When using CreateTableOne() on unsupported classes (e.g. Dates), the names of these variables are pasted together in the warning message. Example:

In CreateTableOne(vars = c("CAT1", "CAT2", "CA", "SADMDTE", "DSCHDTE",  :
  Dropping variable(s) SADMDTEDSCHDTEDTHDTELSTCTDTE due to unsupported class.

A possible solution is to use paste with sep = ", " (but it didn't seem to work when I tried to implement it).

 warning("Dropping variable(s) ", paste(varDrop, sep = ", "), 
            " due to unsupported class.\n")

Numbers with comma at thousands

Is there any way to have the table include a comma at the thousands place? The current output is this:
alk.phos (mean (SD)) "1982.66 (2140.39)"

Is there a way to get this instead? alk.phos (mean (SD)) "1,982.66 (2,140.39)"
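The README example at the top of this page passes formatOptions = list(big.mark = ",") to print(), which produces this kind of output. For post-processing by hand, the same formatting can be sketched in base R with formatC(); the helper name `fmt` below is an illustrative assumption.

```r
# Hypothetical helper: two decimals with a comma as thousands separator.
fmt <- function(x) formatC(x, format = "f", digits = 2, big.mark = ",")

m <- 1982.66
s <- 2140.39

cell <- paste0(fmt(m), " (", fmt(s), ")")
print(cell)
```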

Option to separate n from (%) col

Hi there, big fan like many others.

Would you consider an option for print() (particularly useful for those formatting in Excel, I imagine) to separate the n and (%) columns? That way, Excel multi-column alignment would be a bit easier. As it is, I write Excel functions to split the combined column, but it's tedious each time. Thoughts?

Linear-by-linear association comparing > 2 groups of categorical variables

TableOne is such a great package! Thank you so much for making our life so much easier.
I would like to compare the association of a categorical variable between 3 groups (a specific blood test divided into tertiles).

How does TableOne compare categorical variables across > 2 groups?
Is it possible to compare 3 groups analyzing a linear-by-linear association ("p for trend")?

Thank you again.
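This thread does not say whether tableone offers a trend test. Outside the package, base R's `prop.trend.test()` gives a chi-squared test for trend in proportions, which covers the case of a binary variable across ordered groups. A sketch with made-up counts for three tertiles:

```r
# Illustrative counts only: events and group sizes in three ordered tertiles
events <- c(12, 18, 27)
totals <- c(50, 50, 50)

# Test for a linear trend in proportions across the ordered groups
trend <- prop.trend.test(x = events, n = totals, score = 1:3)
trend$p.value

# For comparison: test of equal proportions that ignores the ordering
overall <- prop.test(x = events, n = totals)
overall$p.value
```

The trend test uses one degree of freedom, while the unordered comparison of three groups uses two.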

Clarify help text for argument "factorVars"

The help page ?CreateTableOne says the following about the argument factorVars:

Numerically coded variables that should be handled as categorical variables given as a character  
vector. If omitted, only factors are considered categorical variables. If all categorical variables in the
dataset are already factors, this option is not necessary. The variables specified here must also be 
specified in the vars argument.

To me, "If all categorical variables in the dataset are already factors, this option is not necessary" sounds as if redundantly specifying these variables has no effect, when in reality such factor variables are releveled (unused levels are dropped).

Maybe this could be stated more clearly?
Maybe an additional message when a factor variable is releveled this way would also be helpful?

two overall

Hi everyone,

I am trying to create a table with measurements from two different regions. I managed to split it into the two groups, but I also need an overall column at the end of each region and haven't been able to add one. Does anybody have an idea how I can do it? I need the overall for each group (CONT and GMR).

labels <- list(
  variables = list(HGT = "Total Hg (ww)",
                       Nitro15 = "Nitrogen 15",
                       Carb13 = "Carbon 13",
                       Length = "Standard Length"), 
  groups=list("CONT","GMR"))
levels(tunahg2$ZONE) <- c("CONTCentral","CONTSouth", "GMRCentral", "GMRNorth", "GMRSouth", "GMRWest")

strata <- c (split(tunahg2, tunahg2$ZONE),list(Overall=tunahg2))
# stats.default() and stats.apply.rounding() come from the table1 package
my.render.cont <- function(x) {
  with(stats.apply.rounding(stats.default(x), digits = 1),
       c("",
         "Mean (SD)" = sprintf("%s (&plusmn; %s)", MEAN, SD),
         "Median [Min, Max]" = sprintf("%s [%s, %s]", MEDIAN, MIN, MAX)))
}
my.render.cat <- function(x) {
  c("", sapply(stats.default(x), function(y) with(y,
    sprintf("%d (%0.0f %%)", FREQ, PCT))))
}
table1(strata, labels, groupspan = c(2,4), render.continuous.default = my.render.cont, render.categorical.default = my.render.cat)

Output of summary.TableOne as data.frame?

Thank you for such excellent work.

Simple question: how can we save the summary.TableOne output as a data frame?
Wrapping it in as.data.frame doesn't seem to work the way it does for print.TableOne.

Any help/ideas are very much appreciated.

Facilitating latex output of tableone object

Hi. tableone is a great package, but it requires copying and pasting into Excel. Is it possible to enable LaTeX output, as with the "xtable" package? Thanks in advance.

svyCreateTableOne -- request for additional feature

Thanks for this helpful package.

With regard to svyCreateTableOne, it would be nice to be able to set a population total to normalize to (like Ntotal in svytable) and to show the SE that goes with it.

CreateTableOne drops unused factor levels

It seems CreateTableOne relevels factor variables in a data frame and drops unused levels.

library(dplyr)
library(tableone)

tribble(
  ~Group, ~value,
  #------|-----
  "G1"   , "Level 1" ,
  "G1"   , "Level 2" ,
  "G1"   , "Level 2" ,
  "G1"   , "Level 3" ,
  "G2"   , "Level 3" ,
  "G2"   , "Level 3" 
)%>%
  mutate(value = as.factor(value)) -> mydf

t1 <- CreateTableOne(vars = "value", factorVars = "value", strata = "Group", data = mydf)
print(t1, showAllLevels = TRUE)

This gives me what I'd expect

           Stratified by Group
            level   G1        G2         p      test
  n                 4         2                     
  value (%) Level 1 1 (25.0)  0 (  0.0)   0.223     
            Level 2 2 (50.0)  0 (  0.0)             
            Level 3 1 (25.0)  2 (100.0)

Now if I subset out the first row of my data frame

x <- mydf[-1, ]

str(x)

I still have 3 levels as expected

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	5 obs. of  2 variables:
 $ Group: chr  "G1" "G1" "G1" "G2" ...
 $ value: Factor w/ 3 levels "Level 1","Level 2",..: 2 2 3 3 3

t2 <- CreateTableOne(vars = "value", factorVars = "value", strata = "Group", data = x)
print(t2, showAllLevels = TRUE)

But now I'm only shown 2 of my 3 levels. I would have expected Level 1 to be in the output, with 0 (0.0) for both groups.

           Stratified by Group
            level   G1        G2         p      test
  n                 3         2                     
  value (%) Level 2 2 (66.7)  0 (  0.0)   0.576     
            Level 3 1 (33.3)  2 (100.0)

Looking at the structure of the tableone object it appears the data is releveled.

> str(t2)
List of 3
 $ ContTable: NULL
 $ CatTable :List of 2
  ..$ G1:List of 1
  .. ..$ value:'data.frame':	2 obs. of  7 variables:
  .. .. ..$ n          : int [1:2] 3 3
  .. .. ..$ miss       : int [1:2] 0 0
  .. .. ..$ p.miss     : num [1:2] 0 0
  .. .. ..$ level      : Factor w/ 2 levels "Level 2","Level 3": 1 2
  .. .. ..$ freq       : 'table' int [1:2(1d)] 2 1
  .. .. .. ..- attr(*, "dimnames")=List of 1
  .. .. .. .. ..$ x: chr [1:2] "Level 2" "Level 3"
  .. .. ..$ percent    : table [1:2(1d)] 66.7 33.3
  .. .. .. ..- attr(*, "dimnames")=List of 1
  .. .. .. .. ..$ x: chr [1:2] "Level 2" "Level 3"

Is there a way to keep the original levels? This issue comes up for me quite a bit: I'd like to show all possible levels, but some are quite rare (like death) in my actual data set.
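The output above is what one would see if the variable were passed through `factor()` a second time, since base R drops unused levels when re-factoring. Whether CreateTableOne does exactly this internally is an assumption here, not something confirmed in this thread. The base R behaviour itself can be checked directly:

```r
# Same situation as x$value above: "Level 1" is a declared but unused level
value <- factor(c("Level 2", "Level 2", "Level 3"),
                levels = c("Level 1", "Level 2", "Level 3"))

nlevels(value)          # the unused level is still present
nlevels(factor(value))  # re-applying factor() drops it

# table() on the original factor keeps a zero count for the unused level
counts <- table(value)
counts
```

A count table built with `table()` on the untouched factor is one way to report the zero rows separately when they are needed.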

odds ratio instead of p-value for categorical variables

Dream package! Kudos to its author Kazuki.

Now, for summarizing binary variables stratified by another binary variable, is there a way to output odds ratios (see attached picture taken from Agresti's CDA book) instead of p-values?

I guess there's an easy tweak to achieve this. Can you point me to the right place to get started? Thank you.
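For a single 2-by-2 table the odds ratio is easy to compute outside the package. The sketch below uses base R only and made-up counts; it is not a tableone feature, and the Wald interval is one common choice among several.

```r
# Illustrative counts only (rows: exposure yes/no, columns: outcome yes/no)
tab <- matrix(c(10, 20, 30, 40), nrow = 2,
              dimnames = list(exposure = c("yes", "no"),
                              outcome  = c("yes", "no")))

# Sample (cross-product) odds ratio: (n11 * n22) / (n12 * n21)
or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# Wald 95% confidence interval, computed on the log scale
se_log_or <- sqrt(sum(1 / tab))
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log_or)

or
ci
```

`fisher.test(tab)$estimate` returns a conditional maximum likelihood estimate instead, which differs slightly from the cross-product ratio.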

useful export combining overall and stratified table1 to excel

hi!

I thought this might be useful for someone.

These few lines of code combine table1 overall and table1 stratified and export it to a nice *.xlsx automatically.

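# printToggle = FALSE returns the formatted table without printing it to the console
table1_printed <- as.data.frame(print(table1, printToggle = FALSE))
table1strat_printed <- as.data.frame(print(table1strat, printToggle = FALSE))
# Put the overall and the stratified columns side by side
table1_final <- cbind(table1_printed, table1strat_printed)

xlsx::write.xlsx(table1_final, "table1_final.xlsx")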

Specify alternative summary-function to be used in print.ContTable for a given variable

Is it possible to have print.ContTable() use one of the other functions defined in CreateContTable() (via funcNames or funcAdditional) for a given subset of variables? Example: I quite often would like to report the total number of samples in each stratum (i.e. I have a variable "nr.of.samples" per patient and would like to see the "sum" in print.ContTable()).

Any suggestion?

By the way: great package that has already saved our team dozens of working hours!

Pascal

get percentage only when using surveydata

Thanks for this great package!

Is it possible to output only the percentages when using the package with survey data? The Ns that result from weighting are not appropriate, so an option to print unweighted Ns or only weighted percentages would be very useful.

Can't seem to change argsApprox

Hello,

I am trying to change the default setting for argsApprox to not use the continuity correction for the chi-square test. Maybe I am not defining the argument correctly. Here is the issue illustrated using the example in the package documentation:

## Load
library(tableone)
## Load Mayo Clinic Primary Biliary Cirrhosis Data
library(survival)
data(pbc)
## Check variables
head(pbc)
## Make categorical variables factors
varsToFactor <- c("status","trt","ascites","hepato","spiders","edema","stage")
pbc[varsToFactor] <- lapply(pbc[varsToFactor], factor)
## Create a variable list
dput(names(pbc))
vars <- c("time","status","age","sex","ascites","hepato",
          "spiders","edema","bili","chol","albumin",
          "copper","alk.phos","ast","trig","platelet",
          "protime","stage")


#### Continuity correction = TRUE -----------------------------

tableOne <- CreateTableOne(vars = vars, 
                           strata = c("trt"), 
                           data = pbc)
tableOne

#### Continuity correction = FALSE -----------------------------
tableOne_2 <- CreateTableOne(vars = vars, 
                             strata = c("trt"), 
                             data = pbc, 
                             testApprox = chisq.test, 
                             argsApprox = list(correct = FALSE))
tableOne_2

The p-value in both is 0.894. Maybe I am not setting it up right, but it seems like an issue.

I get the same results when I use CreateCatTable.

Session info:

R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] survival_2.41-3 tableone_0.9.2 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15     lattice_0.20-35  zoo_1.8-1        class_7.3-14     MASS_7.3-49      grid_3.4.3       labelled_1.0.1   magrittr_1.5    
 [9] e1071_1.6-8      survey_3.33      pillar_1.2.1     rlang_0.2.0.9000 Matrix_1.2-12    splines_3.4.3    forcats_0.3.0    tools_3.4.3     
[17] yaml_2.1.17      compiler_3.4.3   haven_1.1.1      tibble_1.4.2
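One thing worth ruling out on the argsApprox question above: `chisq.test()` applies the continuity correction only to 2-by-2 tables, so `correct = FALSE` cannot change the p-value for a variable with more than two levels. The sketch below shows this with made-up counts in base R. It does not establish whether tableone forwards `argsApprox` correctly.

```r
# Illustrative counts only
two_by_two   <- matrix(c(12, 5, 7, 9), nrow = 2)
two_by_three <- matrix(c(12, 5, 7, 9, 6, 8), nrow = 2)

# Small wrapper returning the p-value with or without the correction
p <- function(m, correct) chisq.test(m, correct = correct)$p.value

# 2 x 2: the two p-values differ
c(corrected = p(two_by_two, TRUE), uncorrected = p(two_by_two, FALSE))

# 2 x 3: the correct argument has no effect
c(corrected = p(two_by_three, TRUE), uncorrected = p(two_by_three, FALSE))
```

Checking a binary variable such as sex in both tables would show whether the argument reaches `chisq.test()`.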

Arguments in kableone and print

As brought up in #32, kableone() only lets you pass arguments to kable() and not to print.tableone().

I think there are a couple of related issues here. Right now I'm thinking a good PR for this will do the following:

Move the code currently in print.tableone to as.data.frame.tableone with printToggle = FALSE to avoid capture.output()
Change print.tableone to something like

print.tableone <- function(x, {arguments}, ...) {
  x <- as.data.frame(x, {arguments})
  print(x, ...)
}

Change kableone() to something like

kableone <- function(x, kable_args = list(), print_args = list()) {
 #  need to evaluate arguments as well
 # ...
  x <- as.data.frame(x, print_args)
  knitr::kable(x, kable_args)
}
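In the kableone() outline above, passing `print_args` or `kable_args` as a single positional argument would hand the whole list to one parameter. `do.call()` spreads a list into named arguments instead. The sketch below shows the pattern with base R stand-ins (`format()` and `paste()`) rather than tableone or knitr; `two_stage` is a hypothetical name.

```r
# Hypothetical wrapper: forward two separate argument lists to two functions
two_stage <- function(x, format_args = list(), paste_args = list()) {
  # First stage: call format(x, <format_args...>)
  formatted <- do.call(format, c(list(x), format_args))
  # Second stage: call paste(formatted, <paste_args...>)
  do.call(paste, c(list(formatted), paste_args))
}

out <- two_stage(c(1234.5, 2),
                 format_args = list(big.mark = ",", nsmall = 1, trim = TRUE),
                 paste_args  = list(collapse = " | "))
out
```

The same shape would let a wrapper keep the print arguments and the kable arguments apart without name clashes in `...`.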

white spaces in column names lead to error when using strata

Hi all, thank you for the effort on TableOne!
When the column names contain white spaces the following code runs fine:

CreateTableOne(
    vars = my_vars,
    data = my_data,
    factorVars = my_cat_vars
)

When the column names contain white spaces the following code leads to an error message:

CreateTableOne(
    vars = my_vars,
    data = my_data,
    factorVars = my_cat_vars,
    strata = c("my_grouping_column")
)

A straight-forward solution is to rename all columns. Do you have any other suggestions?
KR
Iakov
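The cause of the error is not identified in this thread, but the renaming workaround can at least be automated. A minimal base R sketch with a toy data frame (the column names are illustrative):

```r
# Toy data frame whose column names contain spaces
my_data <- data.frame(`my grouping column` = c("a", "b"),
                      `body weight` = c(70, 80),
                      check.names = FALSE)
names(my_data)

# make.names() turns each name into a syntactically valid one
names(my_data) <- make.names(names(my_data), unique = TRUE)
names(my_data)
```

Any character vectors passed as vars, factorVars, or strata would need the same `make.names()` treatment so that they still match the columns.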

How to create a table with hierarchical groups?

Many thanks to your package!
I want to create a table like below:

Say I have data like this:

Gram	Species	Count
pos	A	1
pos	B	1
neg	C	1
neg	D	1
neg	D	1

How can I output the first table using tableone?
THANK YOU!
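The target table is not reproduced here, so the sketch below only shows how the nested counts (Species within Gram) can be computed from the example data in base R. It is not tableone output, and the layout of the final table is left open.

```r
# The example data from the question
df <- data.frame(Gram    = c("pos", "pos", "neg", "neg", "neg"),
                 Species = c("A", "B", "C", "D", "D"),
                 Count   = c(1, 1, 1, 1, 1))

# Counts for each Species within each Gram group
nested <- aggregate(Count ~ Gram + Species, data = df, FUN = sum)
nested <- nested[order(nested$Gram, nested$Species), ]
nested

# Totals for the outer grouping level
gram_totals <- aggregate(Count ~ Gram, data = df, FUN = sum)
gram_totals
```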

Is there a way to change testApprox to mantelhaen.test in CreateTableOne?

I am trying to apply the Cochran-Mantel-Haenszel chi-squared test rather than the traditional chisq.test. Is this possible in CreateTableOne()?
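One difference to keep in mind: `mantelhaen.test()` needs a third, stratifying dimension, whereas `chisq.test()` works on a two-way table. This thread does not say what the function given as testApprox receives, so a drop-in swap may not work; that part is an assumption. The base R call on its own looks like this, using a dataset that ships with R:

```r
# UCBAdmissions is a built-in 2 x 2 x 6 table (Admit x Gender x Dept)
dim(UCBAdmissions)

# Cochran-Mantel-Haenszel test of Admit vs Gender, stratified by Dept
cmh <- mantelhaen.test(UCBAdmissions)
cmh$p.value
```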

write_csv error out

Currently a tableone object can only be written out using write.csv. I suggest extending support to write_csv.
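A likely reason is that the printed table is a character matrix with row names, while data-frame-oriented writers expect the labels in an ordinary column. The sketch below makes that conversion in base R. `tab_mat` is a stand-in with illustrative values, built on the assumption that print(..., printToggle = FALSE) returns a matrix of this shape.

```r
# Stand-in for a printed table: character matrix with variable names as row names
# (illustrative values only)
tab_mat <- matrix(c("100", "52.30 (10.10)"),
                  ncol = 1,
                  dimnames = list(c("n", "age (mean (SD))"), "Overall"))

# Move the row names into a regular column
tab_df <- data.frame(variable = rownames(tab_mat), tab_mat,
                     row.names = NULL, check.names = FALSE,
                     stringsAsFactors = FALSE)

# Written here with base R; the same data frame could be handed to readr::write_csv()
out <- tempfile(fileext = ".csv")
write.csv(tab_df, out, row.names = FALSE)
```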

svyCreateTableOne error: 'sum' not meaningful for factors

I'm getting the following error. Any ideas?

library(survey)
library(tableone)
age<- c(55,66,77,33,44)
gender<-c("Male","Male","Female","Male","Female")
weights<-c(2.3,1.0,3.0,2.3,1.0)

df<- data.frame(age,gender,weights)
svy <- svydesign(id = ~0, data = df, weights=~weights)
t1 <- svyCreateTableOne(data = svy, vars=c("age", "gender"))

Output:
Error in Summary.factor(c(3L, 1L), na.rm = TRUE) :
'sum' not meaningful for factors
In addition: Warning message:
package 'tableone' was built under R version 3.4.4

My session info:

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] tableone_0.9.3 survey_3.34 survival_2.41-3 Matrix_1.2-12

loaded via a namespace (and not attached):
[1] Rcpp_0.12.15 lattice_0.20-35 dplyr_0.7.4 assertthat_0.2.0
[5] R6_2.2.2 magrittr_1.5 labelled_1.1.0 pillar_1.1.0
[9] rlang_0.2.2 bindrcpp_0.2 splines_3.4.3 forcats_0.3.0
[13] tools_3.4.3 glue_1.2.0 hms_0.4.2 compiler_3.4.3
[17] pkgconfig_2.0.1 haven_1.1.2 bindr_0.1 tibble_1.4.2

output with R markdown

Hi,

Thank you very much for such a great package; it is very handy.
I am trying to output the result from CreateTableOne to a doc file, but neither R Markdown nor HTML works, and copying and pasting from the console mixes up the format. How do you output the table to other documents?

Show sub-group p-value when select 2 group variables

Hello.
Thanks for the great package. I have a suggestion.

How about showing subgroup p-values when two grouping variables are selected?

As far as I know, the CreateTableOne function only shows p-values across all subgroups combined.

For example:

When ("Sex", "Hypertension") are selected as grouping variables, there would be 2 p-values:

Normal vs Hypertension : Male
Normal vs Hypertension : Female

Sincerely.
Jinseob Kim
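Until something like this exists, per-subgroup tests can be run by splitting the data on the first grouping variable and testing within each piece. A base R sketch on the built-in mtcars data, where `am` stands in for the first grouping variable and `vs` for the second (a t-test is used only as an example test):

```r
# Split on the first grouping variable (am = 0 / 1)
by_am <- split(mtcars, mtcars$am)

# Within each subgroup, compare mpg between the levels of vs
p_within <- vapply(by_am,
                   function(d) t.test(mpg ~ vs, data = d)$p.value,
                   numeric(1))
p_within
```

This gives one p-value per level of the first variable, matching the "Normal vs Hypertension" within Male and within Female layout described above.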
