antoinesoetewey / statsandr

A blog on statistics and R aiming at helping academics and professionals working with data to grasp important concepts in statistics and to apply them in R. See www.statsandr.com

Home Page: http://statsandr.com/

HTML 33.30% CSS 0.76% JavaScript 64.99% TeX 0.85% R 0.01% SCSS 0.09%
statistics rstudio r rmarkdown shiny giscus blog

statsandr's Introduction

Welcome to the blog Stats and R. As the name suggests, this blog is about statistics and its applications in R (an open source statistical software program). It aims at helping academics and professionals working with data to grasp important concepts in statistics and to apply them in R.

The goal of this website is to make statistics easy to understand by illustrating statistical notions with examples and using plain English. When possible, for all statistical concepts covered in this blog, there is also an article on how to apply them in R.

If you are new to this blog, I invite you to:

statsandr's People

Contributors

antoinesoetewey

statsandr's Issues

Generate PDF and WORD from this code

library(shinyBS)
library(shiny)
library(shinythemes) # provides shinytheme(), used in navbarPage() below
library(rmarkdown)
library(knitr)

ui <- bootstrapPage(
  navbarPage(theme = shinytheme("flatly"), collapsible = TRUE,
             "TEST", 

  tabPanel("First page",
  sidebarLayout(
    sidebarPanel(
      
      sliderInput("bins1",
                  "Number of bins:",
                  min = 1,
                  max = 30,
                  value = 10),
      radioButtons("format", "Download report:", c("PDF", "Word"),
                   inline = TRUE),
      downloadButton("downloadReport")
    ),
    
    mainPanel(
      plotOutput("distPlot1")
    ))),
  
  tabPanel("Second page",
           sidebarLayout(
             sidebarPanel(
               
               sliderInput("bins2",
                           "Number of bins:",
                           min = 1,
                           max = 30,
                           value = 10)
             ),
             
             mainPanel(
               plotOutput("distPlot2")
  
  )))))

server <- function(input, output) {
  
  output$distPlot1 <- renderPlot({
    x    <- faithful[, 2]
    bins1 <- seq(min(x), max(x), length.out = input$bins1 + 1)
    
    hist(x, breaks = bins1, col = 'darkgray', border = 'white')
  })
  
  output$distPlot2 <- renderPlot({
    x    <- faithful[, 2]
    bins2 <- seq(min(x), max(x), length.out = input$bins2 + 1)
    
    hist(x, breaks = bins2, col = 'darkgray', border = 'white')
  })
  
  
  output$downloadReport <- downloadHandler(
    filename = function() {
      paste("my-report", sep = ".", switch(
        input$format, PDF = "pdf", Word = "docx"
      ))
    },
    
    content = function(file) {
      src <- normalizePath("report.Rmd")
      
      owd <- setwd(tempdir())
      on.exit(setwd(owd))
      file.copy(src, "report.Rmd", overwrite = TRUE)
      
      library(rmarkdown)
      out <- render("report.Rmd", switch(
        input$format,
        PDF = pdf_document(), Word = word_document()
      ))
      file.rename(out, file)
    }
  )
}

# Run the application
shinyApp(ui = ui, server = server)

idea

The article says that, for the Wilcoxon test, the medians are compared. However, the wilcox.test documentation (section "Details", http://finzi.psych.upenn.edu/R/library/stats/html/wilcox.test.html) notes:

Note that in the two-sample case the estimator for the difference in location parameters does not estimate the difference in medians (a common misconception) but rather the median of the difference between a sample from x and a sample from y.

# Hodges-Lehmann estimator:
> Boy <- subset(dat,Sex=="Boy")$Grade
> Girl <- subset(dat,Sex=="Girl")$Grade
> diff <- outer(Boy,Girl, "-")
> diff
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]   -3   -2    7   -1    8    9    0   -3   -4     7     5    -2
 [2,]  -14  -13   -4  -12   -3   -2  -11  -14  -15    -4    -6   -13
 [3,]   -4   -3    6   -2    7    8   -1   -4   -5     6     4    -3
 [4,]  -17  -16   -7  -15   -6   -5  -14  -17  -18    -7    -9   -16
 [5,]   -5   -4    5   -3    6    7   -2   -5   -6     5     3    -4
 [6,]   -4   -3    6   -2    7    8   -1   -4   -5     6     4    -3
 [7,]  -15  -14   -5  -13   -4   -3  -12  -15  -16    -5    -7   -14
 [8,]  -12  -11   -2  -10   -1    0   -9  -12  -13    -2    -4   -11
 [9,]   -4   -3    6   -2    7    8   -1   -4   -5     6     4    -3
[10,]  -13  -12   -3  -11   -2   -1  -10  -13  -14    -3    -5   -12
[11,]  -12  -11   -2  -10   -1    0   -9  -12  -13    -2    -4   -11
[12,]   -5   -4    5   -3    6    7   -2   -5   -6     5     3    -4
> median(diff)
[1] -4
> library("coin")
> wilcox_test(Grade ~ Sex,data= dat, conf.int= T,distribution = exact())

	Exact Wilcoxon-Mann-Whitney Test

data:  Grade by Sex (Boy, Girl)
Z = -2.3449, p-value = 0.01763
alternative hypothesis: true mu is not equal to 0
95 percent confidence interval:
 -10  -1
sample estimates:
difference in location 
                    -4 

The test amounts to a test of equality of medians only if the distributions are symmetric and have the same scale parameter.
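
The distinction quoted from the documentation above can be checked with a small base R sketch. The two samples below are hypothetical (they are not the Grade data used in this thread) and were chosen without ties so that wilcox.test() can use its exact method.

```r
# Hypothetical samples, deliberately skewed and without ties
x <- c(1, 2, 3, 10, 20)
y <- c(0.5, 4, 5, 6, 7)

# Hodges-Lehmann estimate: median of all pairwise differences x_i - y_j
hl_estimate <- median(outer(x, y, "-"))

# Naive difference between the two sample medians
median_diff <- median(x) - median(y)

# "difference in location" reported by wilcox.test() with conf.int = TRUE
wt_estimate <- unname(wilcox.test(x, y, conf.int = TRUE)$estimate)

print(c(hodges_lehmann = hl_estimate,
        median_difference = median_diff,
        wilcox_test = wt_estimate))
```

For these samples the estimate returned by wilcox.test() matches the median of the pairwise differences (0.5), not the difference between the sample medians (-2).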

idea

It is worth a mention:

  • Multivariate Outlier
> library(mvoutlier)
> Y <- as.matrix(ggplot2::mpg[,c(5,9)])
> res1 <- uni.plot(Y,symb=T)

# index outliers:
> which(res1$outliers == TRUE)
[1] 213 222 223


# value outliers:
> ggplot2::mpg[which(res1$outliers == TRUE),]
# A tibble: 3 x 11
  manufacturer model   displ  year   cyl trans   drv     cty   hwy fl    class  
  <chr>        <chr>   <dbl> <int> <int> <chr>   <chr> <int> <int> <chr> <chr>  
1 volkswagen   jetta     1.9  1999     4 manualf        33    44 d     compact
2 volkswagen   new be…   1.9  1999     4 manual… f        35    44 d     subcom…
3 volkswagen   new be…   1.9  1999     4 auto(l… f        29    41 d     subcom…

# value md:
> res1$md[which(res1$outliers == TRUE)]
[1] 4.161048 4.161048 3.423600

res1

> res2 <- aq.plot(Y)

res2

> par(mfrow=c(2,2))
> res3 <- dd.plot(Y)
> res4 <- symbol.plot(Y)
> res5 <- corr.plot(Y[,1], Y[,2])
> res6 <- color.plot(Y)
> which(res3$outliers == TRUE)
[1] 213 222 223

res3

  • Transformations

"The use of transformations is problematic for numerous reasons, including (a) transformations often fail to restore normality and homoscedasticity; (b) they do not deal with outliers; (c) they can reduce power; (d) they sometimes rearrange the order of the means from what they were originally; and (e) they make the interpretation of results difficult, as findings are based on the transformed rather than the original data (Grissom, 2000; Leech & Onwuegbuzie, 2002; Lix, Keselman, & Keselman, 1996). We strongly recommend using modern robust methods instead of conducting classic parametric analyses on transformed data."

Source: Erceg-Hurn, D. M., & Mirosevich, V. M. (2008). Modern robust statistical methods: An easy way to maximize the accuracy and power of your research. American Psychologist, 63(7), 591-601.

See:

?asbio::win
?DescTools::Winsorize
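
As a rough illustration of what the two help pages above cover, here is a minimal base R sketch of winsorizing. The helper name winsorize(), its probs argument, and the data are hypothetical and are not the interface of asbio::win or DescTools::Winsorize.

```r
# Sketch of winsorizing: values beyond the chosen quantiles are pulled back
# to those quantiles instead of being dropped.
winsorize <- function(x, probs = c(0.05, 0.95)) {
  limits <- unname(quantile(x, probs = probs, na.rm = TRUE))
  pmin(pmax(x, limits[1]), limits[2])
}

x <- c(1:9, 100)                        # hypothetical data with one extreme value
w <- winsorize(x, probs = c(0.1, 0.9))  # clamp at the 10th and 90th percentiles
print(w)
```

The vector keeps its length; only the most extreme values at each end are replaced by the corresponding quantile.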

cumulative_incident_cases

Dear Antoine,

Thank you very much for your paper on the top resources for COVID-19 analytics.

I tried to adapt your code for Belgium to Ukraine. It looks easy, thanks for this. But as far as I can see, the column "cumulative_incident_cases" in my table "fit" is the same as the column "I".

Should they be equal?

idea

Note that the presence of equal elements (ties) prevents an exact p-value calculation.

Other functions can handle ties:

  • Exact Wilcoxon-Mann-Whitney test with adjustment for ties:
> library("coin")
> wilcox_test(Grade ~ Sex,data= dat,alternative= "less",distribution = exact())

	Exact Wilcoxon-Mann-Whitney Test

data:  Grade by Sex (Boy, Girl)
Z = -2.3449, p-value = 0.008815
alternative hypothesis: true mu is less than 0
  • Asymptotic Wilcoxon-Mann-Whitney test with adjustment for ties:
> library("coin")
> wilcox_test(Grade ~ Sex,data= dat,alternative= "less")

	Asymptotic Wilcoxon-Mann-Whitney Test

data:  Grade by Sex (Boy, Girl)
Z = -2.3449, p-value = 0.009516
alternative hypothesis: true mu is less than 0
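
The tie limitation mentioned at the top of this thread can be reproduced in base R. The sketch below uses hypothetical data and assumes the note refers to stats::wilcox.test(); it captures the warning raised when an exact p-value is requested in the presence of ties.

```r
x <- c(1, 2, 2, 3)   # hypothetical data; the value 2 appears in both groups
y <- c(2, 4, 5, 6)

# Request an exact test and capture the warning message instead of the result
tie_warning <- tryCatch(
  wilcox.test(x, y, exact = TRUE),
  warning = function(w) conditionMessage(w)
)
print(tie_warning)
```

When the warning is not intercepted, wilcox.test() still returns a result, but the p-value is based on a normal approximation.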

Solution for stochastic equality:

  • Brunner-Munzel generalized Wilcoxon-Mann-Whitney test
library("nparcomp")
r1 <- npar.t.test(Grade ~ Sex,data= dat,alternative= "less",method="t.app",rounds=6)
summary(r1)

       Effect Estimator    Lower Upper        T  p.Value
1 p(Boy,Girl)   0.78125 0.612167     1 2.861507 0.004655
  • Permutation Brunner-Munzel generalized Wilcoxon-Mann-Whitney test
library("nparcomp")
r2 <- npar.t.test(Grade ~ Sex,data= dat,alternative= "less",method="permu",rounds=6)
summary(r2)

       Estimator Statistic    Lower    Upper p.value
id       0.78125  2.861507 0.573536 0.995312  0.0087
logit    0.78125  2.213386 0.560774 0.911383  0.0082
probit   0.78125  2.331348 0.562881 0.920822  0.0083
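
The "Estimator" column above appears to be the relative effect, P(X < Y) + 0.5 * P(X = Y), which equals 0.5 under stochastic equality. That reading is an assumption about the nparcomp output. A minimal base R sketch with a hypothetical helper and hypothetical data:

```r
# Estimate P(X < Y) + 0.5 * P(X = Y) from all pairs (x_i, y_j)
relative_effect <- function(x, y) {
  mean(outer(x, y, "<") + 0.5 * outer(x, y, "=="))
}

x <- c(1, 2, 3)   # hypothetical samples
y <- c(2, 3, 4)
p_hat <- relative_effect(x, y)
print(p_hat)
```

Values above 0.5 indicate that observations in y tend to be larger than those in x.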

idea

  • Comparison of several Pearson's linear correlation coefficients:
> RVAideMemoire::cor.multcomp(mtcars$mpg, mtcars$disp, factor(mtcars$cyl))

	Comparison of 3 Pearson's linear correlation coefficients

data:  mtcars$mpg and mtcars$disp by factor(mtcars$cyl) 
X-squared = 4.0477, df = 2, p-value = 0.1321
alternative hypothesis: true difference in coefficients is not equal to 0 
sample estimates:
coeff in group 4 coeff in group 6 coeff in group 8 
      -0.8052361        0.1030827       -0.5197670 

        Common correlation coefficient, 95% confidence interval
          and equality to given value 0 

     inf       r     sup theoretical     U Pr(>|U|)   
 -0.7831 -0.5681 -0.2318           0 3.092 0.001988 **
> correlation::correlation(dat, include_factors = TRUE, method = "auto")
Parameter1 | Parameter2 |     r |         95% CI |     t | df |      p |  Method | n_Obs
----------------------------------------------------------------------------------------
mpg        |        cyl | -0.85 | [-0.93, -0.72] | -8.92 | 30 | < .001 | Pearson |    32
mpg        |       disp | -0.85 | [-0.92, -0.71] | -8.75 | 30 | < .001 | Pearson |    32
mpg        |         hp | -0.78 | [-0.89, -0.59] | -6.74 | 30 | < .001 | Pearson |    32
mpg        |       drat |  0.68 | [ 0.44,  0.83] |  5.10 | 30 | < .001 | Pearson |    32
mpg        |         wt | -0.87 | [-0.93, -0.74] | -9.56 | 30 | < .001 | Pearson |    32
mpg        |       qsec |  0.42 | [ 0.08,  0.67] |  2.53 | 30 | 0.137  | Pearson |    32
mpg        |       gear |  0.48 | [ 0.16,  0.71] |  3.00 | 30 | 0.065  | Pearson |    32
mpg        |       carb | -0.55 | [-0.75, -0.25] | -3.62 | 30 | 0.016  | Pearson |    32
cyl        |       disp |  0.90 | [ 0.81,  0.95] | 11.45 | 30 | < .001 | Pearson |    32
cyl        |         hp |  0.83 | [ 0.68,  0.92] |  8.23 | 30 | < .001 | Pearson |    32
cyl        |       drat | -0.70 | [-0.84, -0.46] | -5.37 | 30 | < .001 | Pearson |    32
cyl        |         wt |  0.78 | [ 0.60,  0.89] |  6.88 | 30 | < .001 | Pearson |    32
cyl        |       qsec | -0.59 | [-0.78, -0.31] | -4.02 | 30 | 0.007  | Pearson |    32
cyl        |       gear | -0.49 | [-0.72, -0.17] | -3.10 | 30 | 0.054  | Pearson |    32
cyl        |       carb |  0.53 | [ 0.22,  0.74] |  3.40 | 30 | 0.027  | Pearson |    32
disp       |         hp |  0.79 | [ 0.61,  0.89] |  7.08 | 30 | < .001 | Pearson |    32
disp       |       drat | -0.71 | [-0.85, -0.48] | -5.53 | 30 | < .001 | Pearson |    32
disp       |         wt |  0.89 | [ 0.78,  0.94] | 10.58 | 30 | < .001 | Pearson |    32
disp       |       qsec | -0.43 | [-0.68, -0.10] | -2.64 | 30 | 0.131  | Pearson |    32
disp       |       gear | -0.56 | [-0.76, -0.26] | -3.66 | 30 | 0.015  | Pearson |    32
disp       |       carb |  0.39 | [ 0.05,  0.65] |  2.35 | 30 | 0.177  | Pearson |    32
hp         |       drat | -0.45 | [-0.69, -0.12] | -2.75 | 30 | 0.110  | Pearson |    32
hp         |         wt |  0.66 | [ 0.40,  0.82] |  4.80 | 30 | < .001 | Pearson |    32
hp         |       qsec | -0.71 | [-0.85, -0.48] | -5.49 | 30 | < .001 | Pearson |    32
hp         |       gear | -0.13 | [-0.45,  0.23] | -0.69 | 30 | 1.000  | Pearson |    32
hp         |       carb |  0.75 | [ 0.54,  0.87] |  6.21 | 30 | < .001 | Pearson |    32
drat       |         wt | -0.71 | [-0.85, -0.48] | -5.56 | 30 | < .001 | Pearson |    32
drat       |       qsec |  0.09 | [-0.27,  0.43] |  0.50 | 30 | 1.000  | Pearson |    32
drat       |       gear |  0.70 | [ 0.46,  0.84] |  5.36 | 30 | < .001 | Pearson |    32
drat       |       carb | -0.09 | [-0.43,  0.27] | -0.50 | 30 | 1.000  | Pearson |    32
wt         |       qsec | -0.17 | [-0.49,  0.19] | -0.97 | 30 | 1.000  | Pearson |    32
wt         |       gear | -0.58 | [-0.77, -0.29] | -3.93 | 30 | 0.008  | Pearson |    32
wt         |       carb |  0.43 | [ 0.09,  0.68] |  2.59 | 30 | 0.132  | Pearson |    32
qsec       |       gear | -0.21 | [-0.52,  0.15] | -1.19 | 30 | 1.000  | Pearson |    32
qsec       |       carb | -0.66 | [-0.82, -0.40] | -4.76 | 30 | < .001 | Pearson |    32
gear       |       carb |  0.27 | [-0.08,  0.57] |  1.56 | 30 | 0.774  | Pearson |    32
  • correlation matrix:
PerformanceAnalytics::chart.Correlation(as.matrix(dat), histogram=TRUE, pch="+")

corrPlot
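
To see where a confidence interval like the one reported for mpg and disp in the table above can come from, here is a sketch of the Fisher z-transformation computed by hand and compared with stats::cor.test(). Using this method to explain the table is an assumption, since the internals of the correlation package are not shown in this thread.

```r
x <- mtcars$mpg
y <- mtcars$disp
n <- length(x)

r  <- cor(x, y)        # Pearson correlation
z  <- atanh(r)         # Fisher z-transformation
se <- 1 / sqrt(n - 3)  # approximate standard error on the z scale

# 95% interval on the z scale, transformed back to the correlation scale
ci_manual   <- tanh(z + c(-1, 1) * qnorm(0.975) * se)
ci_cor_test <- as.numeric(cor.test(x, y)$conf.int)

print(rbind(manual = ci_manual, cor.test = ci_cor_test))
```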

Idea

Kruskal-Wallis test:

This non-parametric test, robust to non-normal distributions, has the same null and alternative hypotheses, and the same interpretation, as the ANOVA.

Note that the Kruskal-Wallis test requires neither the assumption of normality nor that of homoscedasticity.


Note 1

The Kruskal–Wallis one-way analysis of variance by ranks (named after William Kruskal and W. Allen Wallis) is a non-parametric method for testing whether samples originate from the same distribution.
It is used for comparing two or more samples that are independent, and that may have different sample sizes, and extends the Mann–Whitney U test to more than two groups. The parametric equivalent of the Kruskal-Wallis test is the one-way analysis of variance (ANOVA).
When rejecting the null hypothesis of the Kruskal-Wallis test, then at least one sample stochastically dominates at least one other sample. The test does not identify where this stochastic dominance occurs or for how many pairs of groups stochastic dominance obtains.
Dunn's test would help analyze the specific sample pairs for stochastic dominance.
Since it is a non-parametric method, the Kruskal–Wallis test does not assume a normal distribution of the residuals, unlike the analogous one-way analysis of variance.
If the researcher can make the more stringent assumptions of an identically shaped and scaled distribution for all groups, except for any difference in medians, then the null hypothesis is that the medians of all groups are equal, and the alternative hypothesis is that at least one population median of one group is different from the population median of at least one other group.

source: Cortés, Omar. (2015). Re: Which statistical test to use?. Retrieved from: https://www.researchgate.net/post/Which_statistical_test_to_use4/55edbeb260614b7ac18b458d/citation/download.


Note 2

A problem with the Kruskal-Wallis test is that, while it does not assume normality for groups, it still assumes homoscedasticity
(i.e. the groups have the same distributional shape). As a solution Brunner et al. (1997) proposed a heteroscedastic version of
the Kruskal-Wallis test which utilizes the F-distribution. Along with being robust to non-normality and heteroscedasticity, calculations of
exact P-values using the Brunner-Dette-Munk method are not made more complex by tied values. This is another obvious advantage over the
traditional Kruskal-Wallis approach.

source: asbio
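
For reference, a minimal sketch of the classic Kruskal-Wallis test from Note 1 in base R. It uses the built-in airquality data purely for illustration; the heteroscedastic variant from Note 2 is not shown here.

```r
# Does the distribution of ozone differ across months?
kw <- kruskal.test(Ozone ~ Month, data = airquality)
print(kw)

# Per-group medians, which help interpret a significant result
group_medians <- tapply(airquality$Ozone, airquality$Month, median, na.rm = TRUE)
print(group_medians)
```

A small p-value says that at least one month differs from another, not which ones, so a post-hoc procedure such as Dunn's test is still needed.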

Consider replacing utterances with giscus

Hey there,

I see that you're using utterances for comments. As we know, it utilizes GitHub Issues, which (given enough time) would flood your repository's issues with comments. (Looks like you've already got quite a lot of issues for comments here).

I've been developing an alternative: giscus, a similar project that utilizes GitHub Discussions instead. It has support for replies and other cool stuff from GitHub Discussions. The big advantage is the fact that it uses your repository's Discussions, which is more suitable for comments. I would really appreciate it if you tried it. Feedback is welcome, the code is open source.

You can convert existing issues into discussions, as described here.

Thanks!

Example of issue with reprex addin

# packages
library(palmerpenguins)
library(ggplot2)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# data
data("penguins")
head(penguins)
#> # A tibble: 6 x 8
#>   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
#>   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
#> 1 Adelie  Torge…           39.1          18.7              181        3750 male 
#> 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
#> 3 Adelie  Torge…           40.3          18                195        3250 fema…
#> 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
#> 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
#> 6 Adelie  Torge…           39.3          20.6              190        3650 male 
#> # … with 1 more variable: year <int>

# descriptive stats
summary(penguins)
#>       species          island    bill_length_mm  bill_depth_mm  
#>  Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10  
#>  Chinstrap: 68   Dream    :124   1st Qu.:39.23   1st Qu.:15.60  
#>  Gentoo   :124   Torgersen: 52   Median :44.45   Median :17.30  
#>                                  Mean   :43.92   Mean   :17.15  
#>                                  3rd Qu.:48.50   3rd Qu.:18.70  
#>                                  Max.   :59.60   Max.   :21.50  
#>                                  NA's   :2       NA's   :2      
#>  flipper_length_mm  body_mass_g       sex           year     
#>  Min.   :172.0     Min.   :2700   female:165   Min.   :2007  
#>  1st Qu.:190.0     1st Qu.:3550   male  :168   1st Qu.:2007  
#>  Median :197.0     Median :4050   NA's  : 11   Median :2008  
#>  Mean   :200.9     Mean   :4202                Mean   :2008  
#>  3rd Qu.:213.0     3rd Qu.:4750                3rd Qu.:2009  
#>  Max.   :231.0     Max.   :6300                Max.   :2009  
#>  NA's   :2         NA's   :2

# scatterplot
penguins %>%
  filter(!is.na(sex)) %>%
  ggplot() +
  aes(x = bill_length_mm, y = flipper_length_mm, colour = species) +
  geom_point(size = 1L) +
  scale_color_hue() +
  theme_minimal() +
  facet_wrap(vars(sex))

Created on 2020-12-18 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.2 (2020-06-22)
#>  os       macOS Catalina 10.15.7      
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Brussels             
#>  date     2020-12-18                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version date       lib source        
#>  assertthat       0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
#>  backports        1.1.8   2020-06-17 [1] CRAN (R 4.0.0)
#>  callr            3.5.1   2020-10-13 [1] CRAN (R 4.0.2)
#>  cli              2.0.2   2020-02-28 [1] CRAN (R 4.0.0)
#>  colorspace       1.4-1   2019-03-18 [1] CRAN (R 4.0.0)
#>  crayon           1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
#>  curl             4.3     2019-12-02 [1] CRAN (R 4.0.0)
#>  desc             1.2.0   2018-05-01 [1] CRAN (R 4.0.0)
#>  devtools         2.3.2   2020-09-18 [1] CRAN (R 4.0.2)
#>  digest           0.6.25  2020-02-23 [1] CRAN (R 4.0.0)
#>  dplyr          * 1.0.0   2020-05-29 [1] CRAN (R 4.0.0)
#>  ellipsis         0.3.1   2020-05-15 [1] CRAN (R 4.0.0)
#>  evaluate         0.14    2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi            0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
#>  farver           2.0.3   2020-01-16 [1] CRAN (R 4.0.0)
#>  fs               1.4.2   2020-06-30 [1] CRAN (R 4.0.2)
#>  generics         0.0.2   2018-11-29 [1] CRAN (R 4.0.0)
#>  ggplot2        * 3.3.2   2020-06-19 [1] CRAN (R 4.0.0)
#>  glue             1.4.1   2020-05-13 [1] CRAN (R 4.0.0)
#>  gtable           0.3.0   2019-03-25 [1] CRAN (R 4.0.0)
#>  highr            0.8     2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools        0.5.0   2020-06-16 [1] CRAN (R 4.0.0)
#>  httr             1.4.2   2020-07-20 [1] CRAN (R 4.0.2)
#>  knitr            1.29    2020-06-23 [1] CRAN (R 4.0.0)
#>  labeling         0.3     2014-08-23 [1] CRAN (R 4.0.0)
#>  lifecycle        0.2.0   2020-03-06 [1] CRAN (R 4.0.0)
#>  magrittr         1.5     2014-11-22 [1] CRAN (R 4.0.0)
#>  memoise          1.1.0   2017-04-21 [1] CRAN (R 4.0.0)
#>  mime             0.9     2020-02-04 [1] CRAN (R 4.0.0)
#>  munsell          0.5.0   2018-06-12 [1] CRAN (R 4.0.0)
#>  palmerpenguins * 0.1.0   2020-07-23 [1] CRAN (R 4.0.2)
#>  pillar           1.4.4   2020-05-05 [1] CRAN (R 4.0.0)
#>  pkgbuild         1.1.0   2020-07-13 [1] CRAN (R 4.0.2)
#>  pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.0.0)
#>  pkgload          1.1.0   2020-05-29 [1] CRAN (R 4.0.0)
#>  prettyunits      1.1.1   2020-01-24 [1] CRAN (R 4.0.0)
#>  processx         3.4.4   2020-09-03 [1] CRAN (R 4.0.2)
#>  ps               1.3.3   2020-05-08 [1] CRAN (R 4.0.0)
#>  purrr            0.3.4   2020-04-17 [1] CRAN (R 4.0.0)
#>  R6               2.4.1   2019-11-12 [1] CRAN (R 4.0.0)
#>  remotes          2.2.0   2020-07-21 [1] CRAN (R 4.0.2)
#>  rlang            0.4.7   2020-07-09 [1] CRAN (R 4.0.2)
#>  rmarkdown        2.3     2020-06-18 [1] CRAN (R 4.0.2)
#>  rprojroot        1.3-2   2018-01-03 [1] CRAN (R 4.0.0)
#>  scales           1.1.1   2020-05-11 [1] CRAN (R 4.0.0)
#>  sessioninfo      1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi          1.4.6   2020-02-17 [1] CRAN (R 4.0.0)
#>  stringr          1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
#>  testthat         2.3.2   2020-03-02 [1] CRAN (R 4.0.0)
#>  tibble           3.0.1   2020-04-20 [1] CRAN (R 4.0.0)
#>  tidyselect       1.1.0   2020-05-11 [1] CRAN (R 4.0.0)
#>  usethis          1.6.3   2020-09-17 [1] CRAN (R 4.0.2)
#>  utf8             1.1.4   2018-05-24 [1] CRAN (R 4.0.0)
#>  vctrs            0.3.1   2020-06-05 [1] CRAN (R 4.0.0)
#>  withr            2.2.0   2020-04-20 [1] CRAN (R 4.0.0)
#>  xfun             0.16    2020-07-24 [1] CRAN (R 4.0.2)
#>  xml2             1.3.2   2020-04-23 [1] CRAN (R 4.0.0)
#>  yaml             2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
#> 
#> [1] /Users/antoine/Library/R/4.0/library
#> [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

blog/do-my-data-follow-a-normal-distribution-a-note-on-the-most-widely-used-distribution-and-how-to-test-for-normality-in-r/

Do my data follow a normal distribution? A note on the most widely used distribution and how to test for normality in R - Stats and R

This article explains in detail what the normal (or Gaussian) distribution is, why it is important in statistics, and how to test whether your data are normally distributed.

https://statsandr.com/blog/do-my-data-follow-a-normal-distribution-a-note-on-the-most-widely-used-distribution-and-how-to-test-for-normality-in-r/

Data entry

Rather than typing the data out longhand in the code, it is more productive to store it in a file. The data can then be edited without touching the code, which also avoids coding errors. The data entry shrinks from about 10 lines, which would keep growing as more roles are added, to just two:

library(readxl)
cv <- read_excel("filename.xlsx")
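
The same idea can be sketched end to end with base R and a CSV file, which avoids depending on an Excel file that is not shown here. The column names and rows below are hypothetical placeholders, and read.csv() is used in place of read_excel().

```r
# Write a small, hypothetical data file (normally this would be edited by hand)
cv_file <- tempfile(fileext = ".csv")
writeLines(
  c("role,start,end",
    "Analyst,2018,2020",
    "Lecturer,2020,2023"),
  cv_file
)

# Reading the data is a single line, however many roles the file contains
cv <- read.csv(cv_file, stringsAsFactors = FALSE)
print(cv)
```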

ggplot2 tutorial

That is a nice tutorial on ggplot2. A minor update is that the first paragraph should probably mention lattice. Yes, it has been effectively superseded by ggplot2, but it still exists and did play an important role.

Terry Therneau
Mayo Clinic

Error: object 'death' not found

Hi there!
Thank you very much for your tutorial.
I am unable to replicate your results.
I get the message "Error: object 'death' not found" after running the following code segment:

`%>%` <- magrittr::`%>%`

# extract the cumulative incidence
df <- coronavirus %>%
  dplyr::filter(Country.Region == "Belgium") %>%
  dplyr::group_by(date, type) %>%
  dplyr::summarise(total = sum(cases, na.rm = TRUE)) %>%
  tidyr::pivot_wider(
    names_from = type,
    values_from = total
  ) %>%
  dplyr::arrange(date) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(active = confirmed - death - recovered) %>%
  dplyr::mutate(
    confirmed_cum = cumsum(confirmed),
    death_cum = cumsum(death),
    recovered_cum = cumsum(recovered),
    active_cum = cumsum(active)
  )

I am sure I am doing the right thing and also have relevant packages loaded.
Please check.
Thanks,
Emmy
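
One possible cause, which is an assumption and is not confirmed in this thread, is that the values in the type column differ from the names the code expects. pivot_wider() names the new columns after those values, so a later mutate() fails if one of them is absent. The check below uses a hypothetical toy data frame rather than the real coronavirus data.

```r
# Hypothetical data: note "deaths" instead of the expected "death"
toy <- data.frame(
  date  = as.Date(rep("2020-03-01", 3)),
  type  = c("confirmed", "deaths", "recovered"),
  cases = c(10, 1, 2)
)

# Column names the downstream mutate() call relies on
needed <- c("confirmed", "death", "recovered")

# Any expected name that is not among the values of `type`
missing_types <- setdiff(needed, unique(toy$type))
if (length(missing_types) > 0) {
  message("Missing in `type`: ", paste(missing_types, collapse = ", "))
}
```

Running unique(coronavirus$type) and names(coronavirus) on the real data would show whether the expected names are present in the installed version of the package.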
