The crosstabr from tklebel

Programming vs interactive

Rethink internals in order to allow programming with crosstabr.

[Hadley's talk at UseR!2016](https://channel9.msdn.com/Events/useR-international-R-User-conference/useR2016/Towards-a-grammar-of-interactive-graphics)
Lazyeval

Clarify in the vignette/README that the current approach favours interactive use.

Missing values

Find a way to deal with NA. For starters, they could just be omitted.

Would there be any alternative? It doesn't seem to make sense to include them in any way in the table.
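As a rough illustration of the two options, here is a minimal base-R sketch. It uses `table()` as a stand-in for the package internals, and the data frame `df` is made up for the example. Omitting incomplete rows is the default; the alternative is to show `NA` as a category of its own.

```r
# Toy data with one missing value in each variable (hypothetical example data).
df <- data.frame(
  smoke  = factor(c("yes", "no", NA, "yes", "no", "yes")),
  gender = factor(c("male", "female", "male", NA, "female", "male"))
)

# Option 1: omit incomplete observations (what table() does by default).
tab_omit <- table(smoke = df$smoke, gender = df$gender)

# Option 2: keep NA as a separate category so missingness stays visible.
tab_keep <- table(smoke = df$smoke, gender = df$gender, useNA = "ifany")

# Number of rows that were silently dropped under option 1.
n_dropped <- nrow(df) - sum(tab_omit)
```

If NAs are omitted, reporting something like `n_dropped` alongside the table would at least tell the user how many observations are missing from it.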

Better way to find dimnames for matrix

Currently, dimnames are retrieved from the proportional data.frame by levels.

Check whether this approach is always valid. It should always find the correct labels for the resulting table.
Think about what to do with unused levels (levels whose values were changed to NA while the level vector stays the same). Should they simply be dropped by default? Should there be an option to keep them? (If yes, why?)
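One way to check the level-based approach is to compare the labels taken from `levels()` with the dimnames that base R's `table()` produces for the same data. This is only a sketch with made-up data, and `table()` is used as a reference, not as the package's actual implementation.

```r
# Hypothetical example data.
df <- data.frame(
  smoke  = factor(c("yes", "no", "yes", "no")),
  gender = factor(c("male", "female", "male", "male"))
)

# Labels as they would be retrieved from the factor levels.
labels_from_levels <- list(smoke = levels(df$smoke), gender = levels(df$gender))

# Labels as base R derives them when building the table.
tab <- table(smoke = df$smoke, gender = df$gender)
labels_from_table <- dimnames(tab)

# Should be TRUE whenever both approaches agree on labels and their order.
labels_agree <- isTRUE(all.equal(labels_from_levels, labels_from_table,
                                 check.attributes = FALSE))
```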

Consider using tableHTML

https://cran.r-project.org/web/packages/tableHTML/index.html

Using lapply or purrr::walk on crosstabr

Something along the following lines should work for end users.

list_fun <- function(indep) {
  print(crosstab(titanic, reformulate(indep, response = "Survived")))
}

vars <- names(titanic)[1:3]
purrr::walk(vars, list_fun)

Export Function

Think about a function to export the table. Export and copy to clipboard works in RStudio, but it could be useful to have a function similar to ggsave for programming use.
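A minimal sketch of what such a function could look like, assuming the table is already available as an HTML string. The name `save_table()` and its arguments are hypothetical and not part of crosstabr.

```r
# Hypothetical helper (not part of crosstabr): write an HTML table to a file,
# loosely modelled on the idea of ggsave().
save_table <- function(html, filename) {
  stopifnot(is.character(html), is.character(filename), length(filename) == 1L)
  # Wrap the table in a minimal HTML page so the file opens in a browser.
  page <- c("<!DOCTYPE html>", "<html><body>", html, "</body></html>")
  writeLines(page, con = filename)
  invisible(filename)
}

# Usage with a hand-written table standing in for the rendered cross table.
html <- "<table><tr><th>smoke</th><th>n</th></tr><tr><td>yes</td><td>5</td></tr></table>"
out <- save_table(html, tempfile(fileext = ".html"))
```

Returning the file name invisibly keeps the function usable inside loops and pipelines without printing anything.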

drop unobserved factor levels

Think about the following case:

test_df2 <- data.frame(
  gender = factor(c("male", "female")),
  smoke = factor(c(rep("yes", 5), rep("no", 5))),
  age = factor(c("young", "old"))
)
cross_table(test_df2, smoke~gender)

# now we recode a level to be missing
test_df2$gender[test_df2$gender == "female"] <- NA

# females still show up
cross_table(test_df2, smoke~gender)

# levels should be removed too
test_df2$gender <- factor(test_df2$gender, levels = "male")
cross_table(test_df2, smoke~gender)

Should we simply do layout_column(drop = TRUE) to drop all unobserved factor levels? Or should we let the user specify which levels to remove?

For the first case the computation could make use of tidyr::complete(model_frame).

gmodels::CrossTable seems to drop unobserved factor levels. For exploratory analysis this is not optimal: you should notice if some combinations were not observed in the data.

Maybe layout_column() could gain the argument drop:

drop can be TRUE or FALSE, with default TRUE.
Alternatively, drop can be a character vector specifying the levels to drop (if not all unobserved levels should be dropped).
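The proposed semantics of `drop` could be sketched as follows. `drop_levels()` is a hypothetical helper operating on a single factor, not an existing crosstabr function, and it assumes that a character vector should only ever remove levels that are unobserved.

```r
# Hypothetical helper illustrating the proposed `drop` argument.
drop_levels <- function(x, drop = TRUE) {
  stopifnot(is.factor(x))
  if (isTRUE(drop)) {
    # drop = TRUE: remove every unobserved level
    return(droplevels(x))
  }
  if (is.character(drop)) {
    # drop = character vector: remove only the named levels,
    # and only if they are unobserved (assumption of this sketch)
    observed   <- as.character(unique(x[!is.na(x)]))
    unobserved <- setdiff(levels(x), observed)
    keep       <- setdiff(levels(x), intersect(drop, unobserved))
    return(factor(x, levels = keep))
  }
  # drop = FALSE: leave the factor untouched
  x
}

x <- factor(c("a", "b", NA), levels = c("a", "b", "c", "d"))
lv_all  <- levels(drop_levels(x))                # unobserved "c" and "d" removed
lv_some <- levels(drop_levels(x, drop = "c"))    # only "c" removed
lv_none <- levels(drop_levels(x, drop = FALSE))  # nothing removed
```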

cross_table should have a separate argument layer, where you can specify a third variable, which you want to control for. This seems easier to remember than to add another variable to the input-formula.

Rethink internals for add_stats

Maybe incorporate the approach used by funs from dplyr?

At least use: lazyeval::lazy_dots(...) and lazyeval::as.lazy_dots().

Think about cases with user defined functions – are they working in the current approach?

build_tab

Extract parts which are building the table from print.cross_table.

Should be similar to ggvis:

at the top: guess layer, if none is provided
setup layout (column or row)
add custom labels, if provided
add stats computed by add_stats

implement add_stats

add_stats should take arguments as a vector and output a box with stats, taken from vcd::assocstats (Chi-square, Cramér's V, ...)
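For reference, the two statistics named above can be computed in base R as in the sketch below. This is a stand-in for `vcd::assocstats`, not a reproduction of it; the function name `assoc_sketch()` and the example counts are made up.

```r
# Hypothetical helper: Pearson chi-square and Cramér's V for a two-way table.
assoc_sketch <- function(tab) {
  test <- chisq.test(tab, correct = FALSE)  # no continuity correction
  n <- sum(tab)
  k <- min(dim(tab))                        # smaller of rows / columns
  chi_sq <- unname(test$statistic)
  c(
    chi_square = chi_sq,
    df         = unname(test$parameter),
    p_value    = test$p.value,
    cramers_v  = sqrt(chi_sq / (n * (k - 1)))
  )
}

# Made-up counts for illustration.
tab <- as.table(matrix(
  c(20, 10, 15, 25), nrow = 2,
  dimnames = list(smoke = c("no", "yes"), gender = c("female", "male"))
))
stats <- assoc_sketch(tab)
```

A named numeric vector like `stats` would be easy to filter by the user-supplied vector of requested statistics before it is rendered as a box.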

Plotting function

Think about plotting the table. Could possibly work with ggplot2 heat-map, where you plot the residuals.

Should be fairly simple, since the factors are in the data.frame already, so it is just a matter of gathering and plotting accordingly. A layered cross_table could be implemented with facet_wrap.
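A rough sketch of the residual heat map, assuming Pearson residuals from base R's `chisq.test()` and made-up counts. The ggplot2 part only runs if the package is installed.

```r
# Made-up counts for illustration.
tab <- as.table(matrix(
  c(20, 10, 15, 25), nrow = 2,
  dimnames = list(smoke = c("no", "yes"), gender = c("female", "male"))
))

# Pearson residuals, gathered into long format: one row per cell.
res <- chisq.test(tab, correct = FALSE)$residuals
plot_df <- as.data.frame(as.table(res), responseName = "residual")

# Heat map of the residuals; call print(p) to draw it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    plot_df,
    ggplot2::aes(x = gender, y = smoke, fill = residual)
  ) +
    ggplot2::geom_tile() +
    ggplot2::scale_fill_gradient2()  # diverging scale centred on zero
}
```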

Resizable table

Table should either be resizable by user (look into jQuery with ggvis) or should resize automatically (js?)

marginal counts

Add marginal counts and percentages to output.
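In base R, marginal counts and column-wise percentages can be obtained as sketched below. The counts are made up, and how the margins would be merged into the HTML output is left open.

```r
# Made-up counts for illustration.
tab <- as.table(matrix(
  c(20, 10, 15, 25), nrow = 2,
  dimnames = list(smoke = c("no", "yes"), gender = c("female", "male"))
))

# Counts with row and column totals appended as "Sum".
with_margins <- addmargins(tab)

# Column-wise percentages: each column sums to 100.
col_pct <- prop.table(tab, margin = 2) * 100

# Marginal distribution of the row variable, in percent of all observations.
row_margin_pct <- rowSums(tab) / sum(tab) * 100
```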

add_labels

Function to add labels which are to be used instead of variable names.

Manage css files for knit_print

Rename table class

crosstab_outer
crosstab_inner

Resolve compatibility issues between stylesheets

Currently the first row of the table () is displayed in bold letters, which shouldn't be the case. Furthermore, the stylesheet of crosstabr should not change anything other than its own objects.

Input as formula
Output to RStudio-viewer, similar to ggvis
simply display counts and column-wise percent

format table output with css

Create .css to format html output.
