rsquaredacademy / descriptr
Generate descriptive statistics
Home Page: https://descriptr.rsquaredacademy.com/
License: Other
Generate all plots using ggplot2.
We are contacting you because you are the maintainer of descriptr, which imports ggplot2 and uses vdiffr to manage visual test cases. The upcoming release of ggplot2 includes several improvements to plot rendering, including the ability to specify lineend and linejoin in geom_rect() and geom_tile(), and improved rendering of text. These improvements will result in subtle changes to your vdiffr doppelgangers when the new version is released.
Because vdiffr test cases do not run on CRAN by default, your CRAN checks will still pass. However, we suggest updating your visual test cases with the new version of ggplot2 as soon as possible to avoid confusion. You can install the development version of ggplot2 using remotes::install_github("tidyverse/ggplot2").
If you have any questions, let me know!
The following should accept multiple arguments or detect all continuous variables in the data set and return
metrics for all of them.
ds_measures_location()
ds_measures_variation()
ds_measures_symmetry()
ds_percentiles()
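The request above could be approached along these lines. This is a minimal base R sketch, not descriptr's implementation: the helper name location_all() is hypothetical, and only the mean and median are computed, using the built-in iris data for illustration.

```r
# Hypothetical helper (not part of descriptr): location measures for
# every numeric column of a data frame.
location_all <- function(data) {
  # Detect the continuous (numeric or integer) columns
  is_num <- vapply(data, is.numeric, logical(1))
  num_data <- data[, is_num, drop = FALSE]

  # One row of metrics per detected variable
  data.frame(
    var    = names(num_data),
    mean   = vapply(num_data, mean, numeric(1), na.rm = TRUE),
    median = vapply(num_data, median, numeric(1), na.rm = TRUE),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

res <- location_all(iris)
res
```

Returning one row per variable keeps the result easy to filter or join with other summaries.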
Prepare for release:
devtools::check_win_devel()
rhub::check_for_cran()
Perform release:
devtools::check_win_devel() (again!)
devtools::submit_cran()
pkgdown::build_site()
Wait for CRAN...
Template from r-lib/usethis#338
Redesign ds_multi_stats() to generate summary statistics for all continuous variables in the data set.
Rename ds_multi_stats() to ds_tidy_stats().
Add a function to return the current version of the package on CRAN and GitHub.
Move shiny app to xplorerr package.
Remove redundant comments and commented out code from the following:
The ftable returned by freq_table() must be a data.frame or tibble instead of a matrix.
> ecom <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/web.csv',
+ col_types = list(col_integer(),
+ col_factor(levels = c('bing', 'direct', 'google', 'social', 'yahoo')),
+ col_factor(levels = c('tablet', 'laptop', 'mobile')),
+ col_factor(levels = c('true', 'false')), col_integer(), col_double(),
+ col_double(), col_character(), col_factor(levels = c('true', 'false')),
+ col_double(), col_double())
+ )
> freq_table(ecom$referrer)
$ftable
Levels Frequency Cum Frequency Percent Cum Percent
[1,] "bing" "194" "194" "19.4" "19.4"
[2,] "direct" "191" "385" "19.1" "38.5"
[3,] "google" "208" "593" "20.8" "59.3"
[4,] "social" "200" "793" "20" "79.3"
[5,] "yahoo" "207" "1000" "20.7" "100"
$varname
[1] "referrer"
attr(,"class")
[1] "freq_table"
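A minimal base R sketch of the conversion the issue asks for, assuming the ftable element keeps the shape printed above. The matrix is rebuilt by hand from that output; how freq_table() builds it internally is not shown in the report.

```r
# A character matrix shaped like the ftable printed above
ftable <- cbind(
  Levels          = c("bing", "direct", "google", "social", "yahoo"),
  Frequency       = c("194", "191", "208", "200", "207"),
  `Cum Frequency` = c("194", "385", "593", "793", "1000"),
  Percent         = c("19.4", "19.1", "20.8", "20", "20.7"),
  `Cum Percent`   = c("19.4", "38.5", "59.3", "79.3", "100")
)

# Convert to a data frame, then restore numeric types for every
# column except the level labels
ftable_df <- as.data.frame(ftable, stringsAsFactors = FALSE)
ftable_df[-1] <- lapply(ftable_df[-1], as.numeric)

str(ftable_df)
```

A character matrix forces every count to a string; a data frame lets the counts and percentages stay numeric for further computation.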
Use rlang equivalents for errors, warnings and messages.
This is a request for an enhancement in ds_launch_shiny_app():
oway_tables() returns the following error:
> ecom <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/web.csv',
+ col_types = list(col_integer(),
+ col_factor(levels = c('bing', 'direct', 'google', 'social', 'yahoo')),
+ col_factor(levels = c('tablet', 'laptop', 'mobile')),
+ col_factor(levels = c('true', 'false')), col_integer(), col_double(),
+ col_double(), col_character(), col_factor(levels = c('true', 'false')),
+ col_double(), col_double())
+ )
> oway_tables(ecom)
Error in freq_table2.default(factors.df[, i], nam[i]) :
(list) object cannot be coerced to type 'double'
Called from: freq_table2.default(factors.df[, i], nam[i])
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252
[3] LC_MONETARY=English_India.1252 LC_NUMERIC=C
[5] LC_TIME=English_India.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] descriptr_0.1.1 readr_1.1.1 dplyr_0.7.2 bindrcpp_0.2
loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 learnr_0.9 compiler_3.4.1
[4] git2r_0.15.0 bindr_0.1 tools_3.4.1
[7] digest_0.6.10 jsonlite_1.5 evaluate_0.10
[10] memoise_1.0.0.9001 tibble_1.3.4 pkgconfig_2.0.1
[13] rlang_0.1.2 rstudioapi_0.6 shiny_1.0.4
[16] curl_2.2 yaml_2.1.14 withr_1.0.2
[19] stringr_1.2.0 httr_1.3.1 knitr_1.17
[22] hms_0.2 htmlwidgets_0.9 devtools_1.13.3
[25] tidyselect_0.1.1 rprojroot_1.2-10 glue_1.1.1
[28] R6_2.2.2 rmarkdown_1.6 tidyr_0.7.0
[31] purrr_0.2.3 skimr_0.9000 magrittr_1.5
[34] backports_1.0.4 htmltools_0.3.6 colformat_0.0.0.9000
[37] rsconnect_0.8.5 assertthat_0.2.0 mime_0.5
[40] xtable_1.8-0 httpuv_1.3.5 stringi_1.1.2
[43] crayon_1.3.2.9000 markdown_0.8
Generate descriptive statistics for a set of continuous/numeric variables.
The ds_summary_stats() function should be redesigned to:
The normal distribution plot does not change when the mean and standard deviation are changed.
Rename ds_auto_tabulation() to ds_auto_freq_table().
Generate summary statistics for combination of levels of two or more categorical variables.
Add an interactive tutorial using learnr.
In the presence of missing values, ds_summary_stats() does not show the correct number of observations and corresponding missing values.
library(descriptr)
mt <- mtcarz
mt[2, 1] <- NA
ds_summary_stats(mt, mpg)
#> Univariate Analysis
#>
#> N 31.00 Variance 37.51
#> Missing 0.00 Std Deviation 6.12
#> Mean 20.06 Range 23.50
#> Median 19.20 Interquartile Range 7.45
#> Mode 10.40 Uncorrected SS 13601.31
#> Trimmed Mean 19.92 Corrected SS 1125.19
#> Skewness 0.68 Coeff Variation 30.53
#> Kurtosis -0.10 Std Error Mean 1.10
#>
#> Quantiles
#>
#> Quantile Value
#>
#> Max 33.90
#> 99% 33.45
#> 95% 31.40
#> 90% 30.40
#> Q3 22.80
#> Median 19.20
#> Q1 15.35
#> 10% 14.30
#> 5% 11.85
#> 1% 10.40
#> Min 10.40
#>
#> Extreme Values
#>
#> Low High
#>
#> Obs Value Obs Value
#> 14 10.4 19 33.9
#> 15 10.4 17 32.4
#> 23 13.3 18 30.4
#> 6 14.3 27 30.4
#> 16 14.7 25 27.3
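In the output above, Missing is reported as 0.00 even though one value was set to NA. A minimal sketch of counting missing values before they are dropped follows. The helper count_obs() is hypothetical, and the built-in mtcars stands in for descriptr's mtcarz; whether N should mean all rows or only valid rows is a design choice for the package, so the sketch reports both.

```r
# Hypothetical helper: count observations and missing values before
# any NA is removed, so both numbers can be reported.
count_obs <- function(x) {
  n_missing <- sum(is.na(x))
  list(
    n       = length(x),              # all observations, including NA
    missing = n_missing,              # observations that are NA
    valid   = length(x) - n_missing   # observations used for statistics
  )
}

mpg <- mtcars$mpg   # mtcars stands in for mtcarz
mpg[2] <- NA
obs <- count_obs(mpg)
obs
```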
Users should be able to select plotting library from the following:
Remove all functions deprecated in 0.4.1
Update app with the new api
Use the standard template for README:
ds_freq_cont() should return a tibble for further usage.
Continue to launch shiny app from xplorerr package. Check whether suggested packages are available and offer to install missing packages.
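The availability check could be sketched as below, using only base R. The helper missing_packages() and the package name notARealPackage123 are hypothetical; the real list of suggested packages would come from the package's DESCRIPTION file.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "notARealPackage123" is a made-up name used only for illustration
needed <- c("stats", "utils", "notARealPackage123")
to_install <- missing_packages(needed)

# Offer to install only in an interactive session
if (length(to_install) > 0 && interactive()) {
  answer <- utils::menu(
    c("Yes", "No"),
    title = paste("Install missing packages:",
                  paste(to_install, collapse = ", "))
  )
  if (answer == 1) utils::install.packages(to_install)
}
```

Guarding the prompt with interactive() keeps scripts and automated checks from blocking on user input.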
Error in the 90th and 95th percentiles in ds_summary_stats.
Redesign and rename:
ds_oway_tables() to ds_auto_tabulate()
ds_tway_tables() to ds_auto_cross_table()
Both functions should allow users to specify a subset of columns to be used.
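A minimal base R sketch of the column-subset behaviour requested above. The function name auto_tabulate() and its cols argument are assumptions for illustration, not the package's API; with cols = NULL every factor column is tabulated.

```r
# Hypothetical sketch: frequency tables for all factor columns, or for a
# user-supplied subset of them.
auto_tabulate <- function(data, cols = NULL) {
  is_fct <- vapply(data, is.factor, logical(1))
  factor_cols <- names(data)[is_fct]

  if (!is.null(cols)) {
    # Reject names that are absent or not factors
    unknown <- setdiff(cols, factor_cols)
    if (length(unknown) > 0) {
      stop("Not factor columns in data: ", paste(unknown, collapse = ", "))
    }
    factor_cols <- cols
  }

  # One frequency table per selected column
  lapply(data[factor_cols], table)
}

# mtcars with two columns converted to factors, for illustration
mt <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))
res_subset <- auto_tabulate(mt, cols = "cyl")
res_all <- auto_tabulate(mt)
res_subset
```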
Error in the grand total displayed in ds_cross_table() in the presence of NAs.
library(descriptr)
mt <- mtcarz
mt[c(3, 6, 9, 12), c(2, 3, 5, 6, 8, 10)] <- NA
ds_cross_table(mt, gear, cyl)
#> Cell Contents
#> |---------------|
#> | Frequency |
#> | Percent |
#> | Row Pct |
#> | Col Pct |
#> |---------------|
#>
#> Total Observations: 32
#>
#> ----------------------------------------------------------------------------
#> | | cyl |
#> ----------------------------------------------------------------------------
#> | gear | 4 | 6 | 8 | Row Total |
#> ----------------------------------------------------------------------------
#> | 3 | 1 | 1 | 11 | 13 |
#> | | 0.031 | 0.031 | 0.344 | |
#> | | 0.08 | 0.08 | 0.85 | 0.41 |
#> | | 0.11 | 0.17 | 0.85 | |
#> ----------------------------------------------------------------------------
#> | 4 | 6 | 4 | 0 | 10 |
#> | | 0.188 | 0.125 | 0 | |
#> | | 0.6 | 0.4 | 0 | 0.31 |
#> | | 0.67 | 0.67 | 0 | |
#> ----------------------------------------------------------------------------
#> | 5 | 2 | 1 | 2 | 5 |
#> | | 0.062 | 0.031 | 0.062 | |
#> | | 0.4 | 0.2 | 0.4 | 0.16 |
#> | | 0.22 | 0.17 | 0.15 | |
#> ----------------------------------------------------------------------------
#> | Column Total | 9 | 6 | 13 | 32 |
#> | | 0.281 | 0.187 | 0.406 | |
#> ----------------------------------------------------------------------------
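In the table above the cell counts add up to 28, while the grand total shows 32. The base R sketch below reproduces both numbers, with the built-in mtcars standing in for mtcarz. It illustrates the mismatch and is not the fix applied in descriptr.

```r
mt <- mtcars   # stand-in for descriptr's mtcarz
mt[c(3, 6, 9, 12), c(2, 3, 5, 6, 8, 10)] <- NA   # same NA pattern as above

# table() drops NA by default, so the counts cover complete pairs only
counts <- table(gear = mt$gear, cyl = mt$cyl)

n_rows     <- nrow(mt)      # all rows, including those with missing values
n_complete <- sum(counts)   # rows where both gear and cyl are present

# Margins computed from the counts give a grand total consistent
# with the cells
totals <- addmargins(counts)
totals
```

Taking the grand total and the percentage denominators from the same count (here n_complete) keeps the cells and margins consistent.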
Return plot objects instead of printing. Use the argument print_plot with the default value TRUE.
Add the following:
Merge ds_multi_summary_stats() into ds_summary_stats().
In ds_freq_table(), add a row at the end of the frequency table to display the frequency of NA values.
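One way to get such a row in base R is the useNA argument of table(). The sketch below uses a small made-up factor and is not taken from ds_freq_table().

```r
# Made-up data with two missing values
x <- factor(c("a", "b", "a", NA, "b", "a", NA))

# useNA = "ifany" appends an NA entry only when missing values exist
counts <- table(x, useNA = "ifany")

freq <- data.frame(
  level     = names(counts),
  frequency = as.vector(counts),
  stringsAsFactors = FALSE
)

# The NA entry has a missing name; give it a printable label
freq$level[is.na(freq$level)] <- "NA"
freq
```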
plot.ds_data_summary() should create plots for the following data types:
numeric/integer
factors
Example
example_ds <- data.frame(
col_integer = c(2L, 2L, 2L, 5L, 5L),
col_numeric = c(1.9, NA, 2.9, 9.1, 9.6),
col_ordinal = ordered(c("S", "V", "V", "S", NA)),
col_factor = factor(c("R", "G", "G", "B", "B")),
col_logical = c(FALSE, TRUE, TRUE, FALSE, TRUE),
col_character = c("-", "-", "some text", "-", "-"),
col_date = Sys.Date(),
col_time = Sys.time(),
stringsAsFactors = FALSE
)
descriptr::ds_screener(example_ds)
This example fails with the error:
Error in max(sapply(lengths[[i]], nchar)) :
invalid 'type' (list) of argument
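A possible cause, stated here as an assumption and not confirmed by the report, is that some columns carry more than one class (ordered factors and date-times do), so collecting classes with sapply() yields a list instead of a character vector. The sketch below shows that behaviour and one way to avoid it.

```r
# A reduced version of the example data
example_ds <- data.frame(
  col_ordinal = ordered(c("S", "V", "V", "S", NA)),
  col_factor  = factor(c("R", "G", "G", "B", "B")),
  col_time    = Sys.time(),
  stringsAsFactors = FALSE
)

# sapply() returns a list here because the class vectors differ in length
classes_raw <- sapply(example_ds, class)
is.list(classes_raw)

# Collapsing each class vector to one string always gives a character
# vector, which nchar() and max() can handle
classes <- vapply(
  example_ds,
  function(x) paste(class(x), collapse = ", "),
  character(1)
)
max(nchar(classes))
```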
Create kable() friendly output for the following:
ds_summary_stats()
ds_screener()
ds_cross_table()
ds_freq_table()
ds_freq_cont()
ds_group_summary()
ds_oway_tables()
ds_tway_tables()
One common function for frequency tables. Merge ds_freq_cont() into ds_freq_table().
Soft deprecate all dist_* functions.
Improve code coverage by adding tests for plots.
Rename ds_auto_summary() to ds_auto_summary_stats().
Remove shiny application from the inst folder.
Generate automated report for descriptive statistics. Check the report package.
Modify all ds_* functions to handle missing values.
ds_multi_stats() throws an error in the presence of missing values.
library(descriptr)
mt <- mtcarz
mt[c(3, 6, 9, 12), c(2, 3, 5, 6, 8, 10)] <- NA
ds_multi_stats(mt, disp, hp, mpg)
#> Error in summarise_impl(.data, dots): Evaluation error: missing values and NaN's not allowed if 'na.rm' is FALSE.
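The message in the error above matches the one stats::quantile() raises when it meets NA with na.rm = FALSE. A minimal base R sketch of NA-aware statistics follows. The helper stats_na() is hypothetical, the built-in mtcars stands in for mtcarz, and how ds_multi_stats() computes its quantiles is an assumption.

```r
mt <- mtcars   # stand-in for descriptr's mtcarz
mt[c(3, 6, 9, 12), c(2, 3, 5, 6, 8, 10)] <- NA

# quantile() stops on NA unless na.rm = TRUE; capture the message
err <- tryCatch(quantile(mt$disp, 0.5),
                error = function(e) conditionMessage(e))

# Hypothetical helper: report the missing count and drop NA explicitly
stats_na <- function(x) {
  c(
    n_missing = sum(is.na(x)),
    mean      = mean(x, na.rm = TRUE),
    median    = median(x, na.rm = TRUE),
    q3        = unname(quantile(x, 0.75, na.rm = TRUE))
  )
}

# One column of statistics per variable
out <- sapply(mt[c("disp", "hp", "mpg")], stats_na)
out
```

Reporting n_missing next to the statistics makes it visible that values were dropped.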
ds_auto_summary() should identify and generate appropriate summary for the following data types:
numeric/integer
factor
Integrate the descriptive statistics report template from reportr.
User friendly error messages (check here).
tway_tables() returns the following error:
> ecom <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/web.csv',
+ col_types = list(col_integer(),
+ col_factor(levels = c('bing', 'direct', 'google', 'social', 'yahoo')),
+ col_factor(levels = c('tablet', 'laptop', 'mobile')),
+ col_factor(levels = c('true', 'false')), col_integer(), col_double(),
+ col_double(), col_character(), col_factor(levels = c('true', 'false')),
+ col_double(), col_double())
+ )
> tway_tables(ecom)
Cell Contents
|---------------|
| Frequency |
| Percent |
| Row Pct |
| Col Pct |
|---------------|
Total Observations: 1000
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
Called from: sort.list(y)
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252
[3] LC_MONETARY=English_India.1252 LC_NUMERIC=C
[5] LC_TIME=English_India.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] descriptr_0.1.1 readr_1.1.1 dplyr_0.7.2 bindrcpp_0.2
loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 learnr_0.9 compiler_3.4.1
[4] git2r_0.15.0 bindr_0.1 tools_3.4.1
[7] digest_0.6.10 jsonlite_1.5 evaluate_0.10
[10] memoise_1.0.0.9001 tibble_1.3.4 pkgconfig_2.0.1
[13] rlang_0.1.2 rstudioapi_0.6 shiny_1.0.4
[16] curl_2.2 yaml_2.1.14 withr_1.0.2
[19] stringr_1.2.0 httr_1.3.1 knitr_1.17
[22] hms_0.2 htmlwidgets_0.9 devtools_1.13.3
[25] tidyselect_0.1.1 rprojroot_1.2-10 glue_1.1.1
[28] R6_2.2.2 rmarkdown_1.6 tidyr_0.7.0
[31] purrr_0.2.3 skimr_0.9000 magrittr_1.5
[34] backports_1.0.4 htmltools_0.3.6 colformat_0.0.0.9000
[37] rsconnect_0.8.5 assertthat_0.2.0 mime_0.5
[40] xtable_1.8-0 httpuv_1.3.5 stringi_1.1.2
[43] crayon_1.3.2.9000 markdown_0.8