antoinesoetewey / statsandr

A blog on statistics and R aiming at helping academics and professionals working with data to grasp important concepts in statistics and to apply them in R. See www.statsandr.com

Home Page: http://statsandr.com/

HTML 33.30% CSS 0.76% JavaScript 64.99% TeX 0.85% R 0.01% SCSS 0.09%

statistics rstudio r rmarkdown shiny giscus blog

statsandr's Introduction

statsandr.com

Welcome to the blog Stats and R. As the name suggests, this blog is about statistics and its applications in R (an open source statistical software program). It aims at helping academics and professionals working with data to grasp important concepts in statistics and to apply them in R.

The goal of this website is to make statistics easy to understand by illustrating statistical notions with examples and using plain English. When possible, for all statistical concepts covered in this blog, there is also an article on how to apply them in R.

If you are new to this blog, I invite you to:

See all articles or articles by categories
Learn more about who is behind this blog
Subscribe to the newsletter to receive updates by email every time a new article is published
Follow the blog on Twitter
Contribute by writing a guest post (collaborations are also welcome)
Support my work so I can keep providing free content on an ad-free blog
Contact me if you have any questions


statsandr's Issues

Generate PDF and WORD from this code

library(shiny)
library(shinyBS)
library(shinythemes) # provides shinytheme(), which the UI below calls
library(rmarkdown)
library(knitr)

ui <- bootstrapPage(
  navbarPage(theme = shinytheme("flatly"), collapsible = TRUE,
             "TEST", 

  tabPanel("First page",
  sidebarLayout(
    sidebarPanel(
      
      sliderInput("bins1",
                  "Number of bins:",
                  min = 1,
                  max = 30,
                  value = 10),
      radioButtons("format", "Download report:", c("PDF", "Word"),
                   inline = TRUE),
      downloadButton("downloadReport")
    ),
    
    mainPanel(
      plotOutput("distPlot1")
    ))),
  
  tabPanel("Second page",
           sidebarLayout(
             sidebarPanel(
               
               sliderInput("bins2",
                           "Number of bins:",
                           min = 1,
                           max = 30,
                           value = 10)
             ),
             
             mainPanel(
               plotOutput("distPlot2")
  
  )))))

server <- function(input, output) {
  
  output$distPlot1 <- renderPlot({
    x    <- faithful[, 2]
    bins1 <- seq(min(x), max(x), length.out = input$bins1 + 1)
    
    hist(x, breaks = bins1, col = 'darkgray', border = 'white')
  })
  
  output$distPlot2 <- renderPlot({
    x    <- faithful[, 2]
    bins2 <- seq(min(x), max(x), length.out = input$bins2 + 1)
    
    hist(x, breaks = bins2, col = 'darkgray', border = 'white')
  })
  
  
  output$downloadReport <- downloadHandler(
    filename = function() {
      paste("my-report", sep = ".", switch(
        input$format, PDF = "pdf", Word = "docx"
      ))
    },
    
    content = function(file) {
      src <- normalizePath("report.Rmd")
      
      owd <- setwd(tempdir())
      on.exit(setwd(owd))
      file.copy(src, "report.Rmd", overwrite = TRUE)
      
      # rmarkdown is already attached at the top of the script
      out <- render("report.Rmd", switch(
        input$format,
        PDF = pdf_document(), Word = word_document()
      ))
      file.rename(out, file)
    }
  )
}

# Run the application
shinyApp(ui = ui, server = server)

blog/world-map-of-visited-countries-in-r/

World map of visited countries in R - Stats and R

This article illustrates how to draw a world map of the countries you have visited in R. This world map can also be used to highlight some specific countries

https://statsandr.com/blog/world-map-of-visited-countries-in-r/

idea

For the Wilcoxon test, the medians are compared.

Documentation - section: Details: http://finzi.psych.upenn.edu/R/library/stats/html/wilcox.test.html:

Note that in the two-sample case the estimator for the difference in location parameters does not estimate the difference in medians (a common misconception) but rather the median of the difference between a sample from x and a sample from y.

# Hodges-Lehmann estimator:
> Boy <- subset(dat,Sex=="Boy")$Grade
> Girl <- subset(dat,Sex=="Girl")$Grade
> diff <- outer(Boy,Girl, "-")
> diff
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]   -3   -2    7   -1    8    9    0   -3   -4     7     5    -2
 [2,]  -14  -13   -4  -12   -3   -2  -11  -14  -15    -4    -6   -13
 [3,]   -4   -3    6   -2    7    8   -1   -4   -5     6     4    -3
 [4,]  -17  -16   -7  -15   -6   -5  -14  -17  -18    -7    -9   -16
 [5,]   -5   -4    5   -3    6    7   -2   -5   -6     5     3    -4
 [6,]   -4   -3    6   -2    7    8   -1   -4   -5     6     4    -3
 [7,]  -15  -14   -5  -13   -4   -3  -12  -15  -16    -5    -7   -14
 [8,]  -12  -11   -2  -10   -1    0   -9  -12  -13    -2    -4   -11
 [9,]   -4   -3    6   -2    7    8   -1   -4   -5     6     4    -3
[10,]  -13  -12   -3  -11   -2   -1  -10  -13  -14    -3    -5   -12
[11,]  -12  -11   -2  -10   -1    0   -9  -12  -13    -2    -4   -11
[12,]   -5   -4    5   -3    6    7   -2   -5   -6     5     3    -4
> median(diff)
[1] -4

> library("coin")
> wilcox_test(Grade ~ Sex,data= dat, conf.int= T,distribution = exact())

	Exact Wilcoxon-Mann-Whitney Test

data:  Grade by Sex (Boy, Girl)
Z = -2.3449, p-value = 0.01763
alternative hypothesis: true mu is not equal to 0
95 percent confidence interval:
 -10  -1
sample estimates:
difference in location 
                    -4

https://www.researchgate.net/post/does_wilcoxon_rank_sum_test_test_the_difference_in_medians:

The test is a test of equality of medians if the distributions are symmetric and have the same scale parameter.
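
The distinction quoted from the documentation can be checked on a small example. The sketch below uses two made-up samples (`x` and `y` are illustrative, not the `dat` used in this issue) and compares the Hodges-Lehmann estimate, i.e. the median of all pairwise differences, with the difference between the two sample medians.

```r
# Two small made-up samples (illustrative only, not the data from this issue)
x <- c(1, 2, 10)
y <- c(0, 4, 5)

# Hodges-Lehmann estimate: median of all pairwise differences x_i - y_j
pairwise_diff <- outer(x, y, "-")
hl_estimate <- median(pairwise_diff)

# Naive alternative: difference between the two sample medians
median_diff <- median(x) - median(y)

hl_estimate
median_diff
```

With these values the two quantities even have opposite signs (1 for the Hodges-Lehmann estimate, -2 for the difference in medians), which is why the estimate reported by the test should not be read as a difference in medians.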

blog/anova-in-r/

ANOVA in R - Stats and R

Learn how to perform an Analysis Of VAriance (ANOVA) in R to compare 3 groups or more. See also how to interpret the results and test the assumptions

https://statsandr.com/blog/anova-in-r/

idea

It is worth a mention:

Multivariate Outlier

> library(mvoutlier)
> Y <- as.matrix(ggplot2::mpg[,c(5,9)])
> res1 <- uni.plot(Y,symb=T)

# index outliers:
> which(res1$outliers == TRUE)
[1] 213 222 223


# value outliers:
> ggplot2::mpg[which(res1$outliers == TRUE),]
# A tibble: 3 x 11
  manufacturer model   displ  year   cyl trans   drv     cty   hwy fl    class  
  <chr>        <chr>   <dbl> <int> <int> <chr>   <chr> <int> <int> <chr> <chr>  
1 volkswagen   jetta     1.9  1999     4 manual… f        33    44 d     compact
2 volkswagen   new be…   1.9  1999     4 manual… f        35    44 d     subcom…
3 volkswagen   new be…   1.9  1999     4 auto(l… f        29    41 d     subcom…


# value md:
> res1$md[which(res1$outliers == TRUE)]
[1] 4.161048 4.161048 3.423600

> res2 <- aq.plot(Y)

> par(mfrow=c(2,2))
> res3 <- dd.plot(Y)
> res4 <- symbol.plot(Y)
> res5 <- corr.plot(Y[,1], Y[,2])
> res6 <- color.plot(Y)
> which(res3$outliers == TRUE)
[1] 213 222 223
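
For readers without mvoutlier installed, the idea behind these plots can be sketched in base R. This is only an approximation under stated assumptions: it uses the classical mean and covariance (mvoutlier relies on robust estimates, so the flagged rows can differ), it uses two columns of `mtcars` instead of `ggplot2::mpg`, and the 97.5% chi-square cutoff is a common convention rather than something taken from this issue.

```r
# Two numeric columns from a built-in dataset (assumption: mtcars, not ggplot2::mpg)
Y <- as.matrix(mtcars[, c("mpg", "hp")])

# Classical Mahalanobis distance of each row from the column means
md <- sqrt(mahalanobis(Y, center = colMeans(Y), cov = cov(Y)))

# Conventional cutoff: square root of the 97.5% chi-square quantile,
# with degrees of freedom equal to the number of variables
cutoff <- sqrt(qchisq(0.975, df = ncol(Y)))

# Rows whose distance exceeds the cutoff are candidate multivariate outliers
flagged <- which(md > cutoff)
flagged
```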

Transformations

" The use of transformations is problematic for numerous reasons, including (a) transformations often fail to
restore normality and homoscedasticity; (b) they do not
deal with outliers; (c) they can reduce power;
(d) they sometimes rearrange the order of the means from what they
were originally; and (e) they make the interpretation of
results difficult, as findings are based on the transformed
rather than the original data (Grissom, 2000; Leech &
Onwuegbuzie, 2002; Lix, Keselman, & Keselman, 1996).
We strongly recommend using modern robust methods
instead of conducting classic parametric analyses on transformed data ".

source: Erceg-Hurn, D. M., & Mirosevich, V. M. (2008). Modern robust statistical methods: an easy way to maximize the accuracy and power of your research. The American Psychologist, 63(7), 591-601.

See:

?asbio::win
?DescTools::Winsorize
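
As a rough illustration of what winsorizing does, here is a base R sketch. `winsorize_base` is a hypothetical helper written for this note, not the implementation found in asbio or DescTools, and the 10%/90% limits are arbitrary.

```r
# Hypothetical helper: replace values beyond the chosen quantiles
# by the quantiles themselves (not the asbio or DescTools code)
winsorize_base <- function(x, probs = c(0.05, 0.95)) {
  limits <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, limits[1]), limits[2])
}

# Made-up vector with one extreme value
x <- c(1:9, 100)
w <- winsorize_base(x, probs = c(0.1, 0.9))
w
```

The extreme value 100 is pulled back to the 90% quantile while the sample size is left unchanged, which is the main difference from trimming.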

blog/outliers-detection-in-r/

Outliers detection in R - Stats and R

Learn how to detect outliers in R thanks to descriptive statistics and via the Hampel filter, the Grubbs, the Dixon and the Rosner tests for outliers

https://statsandr.com/blog/outliers-detection-in-r/

blog/getting-started-in-r-markdown/

Getting started in R markdown - Stats and R

This article is a practical guide about R Markdown, from why it is an important writing tool in R to how to compile and edit your first R Markdown document

https://statsandr.com/blog/getting-started-in-r-markdown/

blog/mortgage-calculator-r-shiny/

Mortgage calculator in R Shiny - Stats and R

Mortgage calculator - an R Shiny app to compute monthly loan or mortgage payments and to generate amortization tables

https://statsandr.com/blog/mortgage-calculator-r-shiny/

blog/an-efficient-way-to-install-and-load-r-packages/

An efficient way to install and load R packages - Stats and R

What are R packages and how to use them? Discover also a more efficient way to install and load R packages in R thanks to the pacman and librarian packages

https://statsandr.com/blog/an-efficient-way-to-install-and-load-r-packages/

cumulative_incident_cases

Dear Antoine,

thank you very much for your paper on the top resources for COVID-19 analytics.

I tried to adapt your code for Belgium to Ukraine. It looks easy, thanks for this. But as far as I can see, the column "cumulative_incident_cases" in my table "fit" is the same as the column "I".

Should they be equal?

idea

Note that the presence of equal elements (ties) prevents an exact p-value calculation

Can other functions calculate it?

Exact Wilcoxon-Mann-Whitney test with adjustment for ties:

> library("coin")
> wilcox_test(Grade ~ Sex,data= dat,alternative= "less",distribution = exact())

	Exact Wilcoxon-Mann-Whitney Test

data:  Grade by Sex (Boy, Girl)
Z = -2.3449, p-value = 0.008815
alternative hypothesis: true mu is less than 0

Asymptotic Wilcoxon-Mann-Whitney test with adjustment for ties:

> library("coin")
> wilcox_test(Grade ~ Sex,data= dat,alternative= "less")

	Asymptotic Wilcoxon-Mann-Whitney Test

data:  Grade by Sex (Boy, Girl)
Z = -2.3449, p-value = 0.009516
alternative hypothesis: true mu is less than 0

Solution for stochastic equality:

Brunner-Munzel generalized Wilcoxon-Mann-Whitney test

library("nparcomp")
r1 <- npar.t.test(Grade ~ Sex,data= dat,alternative= "less",method="t.app",rounds=6)
summary(r1)

       Effect Estimator    Lower Upper        T  p.Value
1 p(Boy,Girl)   0.78125 0.612167     1 2.861507 0.004655

Permutation Brunner-Munzel generalized Wilcoxon-Mann-Whitney test

library("nparcomp")
r2 <- npar.t.test(Grade ~ Sex,data= dat,alternative= "less",method="permu",rounds=6)
summary(r2)

       Estimator Statistic    Lower    Upper p.value
id       0.78125  2.861507 0.573536 0.995312  0.0087
logit    0.78125  2.213386 0.560774 0.911383  0.0082
probit   0.78125  2.331348 0.562881 0.920822  0.0083
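
The note about ties at the top of this issue can be reproduced with base R. The sketch below uses made-up samples containing tied values and records any warning raised by `stats::wilcox.test()`; in the R versions I have used, the function warns and falls back to a normal approximation in this situation, but the exact wording of the message is not guaranteed.

```r
# Made-up samples with tied values (illustrative only)
x <- c(1, 2, 2, 5)
y <- c(2, 6, 7, 8)

# Collect warnings instead of printing them, then continue
warnings_seen <- character(0)
res <- withCallingHandlers(
  wilcox.test(x, y),
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

warnings_seen # any messages about ties end up here
res$p.value   # p-value actually returned by wilcox.test()
```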

blog/correlation-coefficient-and-correlation-test-in-r/

Correlation coefficient and correlation test in R - Stats and R

Learn how to compute a correlation coefficient (Pearson and Spearman) and perform a correlation test in R

https://statsandr.com/blog/correlation-coefficient-and-correlation-test-in-r/

blog/how-to-upload-r-code-on-github-example-with-an-r-script-on-mac-os/

How to upload your R code on GitHub: example with an R script on MacOS - Stats and R

See a step-by-step guide (with screenshots) on how to create a GitHub repository and upload R code and scripts on MacOS using GitHub Desktop

https://statsandr.com/blog/how-to-upload-r-code-on-github-example-with-an-r-script-on-mac-os/

Typo on normal distribution article

Hi Antoine, nice article about the normal distribution. I found a small typo in the economics and statistics example: it says 11 and should be 8, the standard deviation:

https://www.statsandr.com/blog/do-my-data-follow-a-normal-distribution-a-note-on-the-most-widely-used-distribution-and-how-to-test-for-normality-in-r/

blog/chi-square-test-of-independence-by-hand/

Chi-square test of independence by hand - Stats and R

Test if two categorical variables are dependent via the Chi-square test of independence. See also how to compute it by hand and how to interpret the results

https://statsandr.com/blog/chi-square-test-of-independence-by-hand/

blog/chi-square-test-of-independence-in-r/

Chi-square test of independence in R - Stats and R

Learn when and how to use the Chi-square test of independence in R. See also how it works in practice and how to interpret the results of the Chi-square test

https://statsandr.com/blog/chi-square-test-of-independence-in-r/

blog/correlogram-in-r-how-to-highlight-the-most-correlated-variables-in-a-dataset/

Correlogram in R: how to highlight the most correlated variables in a dataset - Stats and R

Make the most correlated variables stand out via a correlogram. See also how to enhance a correlation plot to show significant correlations among variables

https://statsandr.com/blog/correlogram-in-r-how-to-highlight-the-most-correlated-variables-in-a-dataset/

idea

Comparison of several Pearson's linear correlation coefficients:

> RVAideMemoire::cor.multcomp(mtcars$mpg, mtcars$disp, factor(mtcars$cyl))

	Comparison of 3 Pearson's linear correlation coefficients

data:  mtcars$mpg and mtcars$disp by factor(mtcars$cyl) 
X-squared = 4.0477, df = 2, p-value = 0.1321
alternative hypothesis: true difference in coefficients is not equal to 0 
sample estimates:
coeff in group 4 coeff in group 6 coeff in group 8 
      -0.8052361        0.1030827       -0.5197670 

        Common correlation coefficient, 95% confidence interval
          and equality to given value 0 

     inf       r     sup theoretical     U Pr(>|U|)   
 -0.7831 -0.5681 -0.2318           0 3.092 0.001988 **
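
The per-group coefficients listed under "sample estimates" can be recomputed in base R, which is a quick way to see what the test is comparing. This is a sketch of the first step only (the group-wise Pearson correlations); it does not reproduce the chi-square comparison performed by `cor.multcomp()`.

```r
# Pearson correlation between mpg and disp within each level of cyl
r_by_cyl <- sapply(
  split(mtcars, mtcars$cyl),
  function(d) cor(d$mpg, d$disp)
)

round(r_by_cyl, 7)
```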

https://easystats.github.io/correlation/:

> correlation::correlation(dat, include_factors = TRUE, method = "auto")
Parameter1 | Parameter2 |     r |         95% CI |     t | df |      p |  Method | n_Obs
----------------------------------------------------------------------------------------
mpg        |        cyl | -0.85 | [-0.93, -0.72] | -8.92 | 30 | < .001 | Pearson |    32
mpg        |       disp | -0.85 | [-0.92, -0.71] | -8.75 | 30 | < .001 | Pearson |    32
mpg        |         hp | -0.78 | [-0.89, -0.59] | -6.74 | 30 | < .001 | Pearson |    32
mpg        |       drat |  0.68 | [ 0.44,  0.83] |  5.10 | 30 | < .001 | Pearson |    32
mpg        |         wt | -0.87 | [-0.93, -0.74] | -9.56 | 30 | < .001 | Pearson |    32
mpg        |       qsec |  0.42 | [ 0.08,  0.67] |  2.53 | 30 | 0.137  | Pearson |    32
mpg        |       gear |  0.48 | [ 0.16,  0.71] |  3.00 | 30 | 0.065  | Pearson |    32
mpg        |       carb | -0.55 | [-0.75, -0.25] | -3.62 | 30 | 0.016  | Pearson |    32
cyl        |       disp |  0.90 | [ 0.81,  0.95] | 11.45 | 30 | < .001 | Pearson |    32
cyl        |         hp |  0.83 | [ 0.68,  0.92] |  8.23 | 30 | < .001 | Pearson |    32
cyl        |       drat | -0.70 | [-0.84, -0.46] | -5.37 | 30 | < .001 | Pearson |    32
cyl        |         wt |  0.78 | [ 0.60,  0.89] |  6.88 | 30 | < .001 | Pearson |    32
cyl        |       qsec | -0.59 | [-0.78, -0.31] | -4.02 | 30 | 0.007  | Pearson |    32
cyl        |       gear | -0.49 | [-0.72, -0.17] | -3.10 | 30 | 0.054  | Pearson |    32
cyl        |       carb |  0.53 | [ 0.22,  0.74] |  3.40 | 30 | 0.027  | Pearson |    32
disp       |         hp |  0.79 | [ 0.61,  0.89] |  7.08 | 30 | < .001 | Pearson |    32
disp       |       drat | -0.71 | [-0.85, -0.48] | -5.53 | 30 | < .001 | Pearson |    32
disp       |         wt |  0.89 | [ 0.78,  0.94] | 10.58 | 30 | < .001 | Pearson |    32
disp       |       qsec | -0.43 | [-0.68, -0.10] | -2.64 | 30 | 0.131  | Pearson |    32
disp       |       gear | -0.56 | [-0.76, -0.26] | -3.66 | 30 | 0.015  | Pearson |    32
disp       |       carb |  0.39 | [ 0.05,  0.65] |  2.35 | 30 | 0.177  | Pearson |    32
hp         |       drat | -0.45 | [-0.69, -0.12] | -2.75 | 30 | 0.110  | Pearson |    32
hp         |         wt |  0.66 | [ 0.40,  0.82] |  4.80 | 30 | < .001 | Pearson |    32
hp         |       qsec | -0.71 | [-0.85, -0.48] | -5.49 | 30 | < .001 | Pearson |    32
hp         |       gear | -0.13 | [-0.45,  0.23] | -0.69 | 30 | 1.000  | Pearson |    32
hp         |       carb |  0.75 | [ 0.54,  0.87] |  6.21 | 30 | < .001 | Pearson |    32
drat       |         wt | -0.71 | [-0.85, -0.48] | -5.56 | 30 | < .001 | Pearson |    32
drat       |       qsec |  0.09 | [-0.27,  0.43] |  0.50 | 30 | 1.000  | Pearson |    32
drat       |       gear |  0.70 | [ 0.46,  0.84] |  5.36 | 30 | < .001 | Pearson |    32
drat       |       carb | -0.09 | [-0.43,  0.27] | -0.50 | 30 | 1.000  | Pearson |    32
wt         |       qsec | -0.17 | [-0.49,  0.19] | -0.97 | 30 | 1.000  | Pearson |    32
wt         |       gear | -0.58 | [-0.77, -0.29] | -3.93 | 30 | 0.008  | Pearson |    32
wt         |       carb |  0.43 | [ 0.09,  0.68] |  2.59 | 30 | 0.132  | Pearson |    32
qsec       |       gear | -0.21 | [-0.52,  0.15] | -1.19 | 30 | 1.000  | Pearson |    32
qsec       |       carb | -0.66 | [-0.82, -0.40] | -4.76 | 30 | < .001 | Pearson |    32
gear       |       carb |  0.27 | [-0.08,  0.57] |  1.56 | 30 | 0.774  | Pearson |    32

correlation matrix:

PerformanceAnalytics::chart.Correlation(as.matrix(dat), histogram=TRUE, pch="+")

blog/hypothesis-test-by-hand/

Hypothesis test by hand - Stats and R

Learn the structure of a hypothesis test by hand, illustrated by 4 easy steps using the critical value, p-value and confidence interval methods

https://statsandr.com/blog/hypothesis-test-by-hand/

blog/student-s-t-test-in-r-and-by-hand-how-to-compare-two-groups-under-different-scenarios/

Student's t-test in R and by hand: how to compare two groups under different scenarios - Stats and R

Learn how to apply the Student's t-test by hand and in R in order to compare two independent or paired samples with known or unknown variances

https://statsandr.com/blog/student-s-t-test-in-r-and-by-hand-how-to-compare-two-groups-under-different-scenarios/

blog/data-manipulation-in-r/

Data manipulation in R - Stats and R

See how to subset a dataset, create a new variable, recode categorical variables or labels, rename a variable, create a dataframe and deal with missing values

https://statsandr.com/blog/data-manipulation-in-r/

Idea

Kruskal-Wallis test:

This non-parametric test, robust to non-normal distributions, has the same null and alternative hypotheses, and the same interpretation, as the ANOVA.

Note that the Kruskal-Wallis test requires neither normality nor homoscedasticity of the variances.

Note 1

The Kruskal–Wallis one-way analysis of variance by ranks (named after William Kruskal and W. Allen Wallis) is a non-parametric method for testing whether samples originate from the same distribution.
It is used for comparing two or more samples that are independent, and that may have different sample sizes, and extends the Mann–Whitney U test to more than two groups. The parametric equivalent of the Kruskal-Wallis test is the one-way analysis of variance (ANOVA).
When rejecting the null hypothesis of the Kruskal-Wallis test, then at least one sample stochastically dominates at least one other sample. The test does not identify where this stochastic dominance occurs or for how many pairs of groups stochastic dominance obtains.
Dunn's test would help analyze the specific sample pairs for stochastic dominance.
Since it is a non-parametric method, the Kruskal–Wallis test does not assume a normal distribution of the residuals, unlike the analogous one-way analysis of variance.
If the researcher can make the more stringent assumptions of an identically shaped and scaled distribution for all groups, except for any difference in medians, then the null hypothesis is that the medians of all groups are equal, and the alternative hypothesis is that at least one population median of one group is different from the population median of at least one other group.

source: Cortés, Omar. (2015). Re: Which statistical test to use?. Retrieved from: https://www.researchgate.net/post/Which_statistical_test_to_use4/55edbeb260614b7ac18b458d/citation/download.

Note 2

A problem with the Kruskal-Wallis test is that, while it does not assume normality for groups, it still assumes homoscedasticity
(i.e. the groups have the same distributional shape). As a solution Brunner et al. (1997) proposed a heteroscedastic version of
the Kruskal-Wallis test which utilizes the F-distribution. Along with being robust to non-normality and heteroscedasticity, calculations of
exact P-values using the Brunner-Dette-Munk method are not made more complex by tied values. This is another obvious advantage over the
traditional Kruskal-Wallis approach.

source: asbio
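
For reference, the classical Kruskal-Wallis test discussed in both notes is available in base R as `kruskal.test()`. The sketch below applies it to the built-in `PlantGrowth` dataset, which is chosen here only for illustration and is not mentioned in this issue. It does not implement the heteroscedastic Brunner-Dette-Munk version described in Note 2.

```r
# Compare plant weight across the three groups of a built-in dataset
kw <- kruskal.test(weight ~ group, data = PlantGrowth)

kw$statistic # Kruskal-Wallis chi-squared statistic
kw$parameter # degrees of freedom: number of groups minus 1
kw$p.value   # p-value for the null hypothesis of identical distributions
```

A small p-value only indicates that at least one group differs from another; a post-hoc procedure such as Dunn's test, mentioned in Note 1, is still needed to locate the differences.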

blog/graphics-in-r-with-ggplot2/

Graphics in R with ggplot2 - Stats and R

Learn how to create professional graphics and plots in R (histogram, barplot, boxplot, scatter plot, line plot, density plot, etc.) with the ggplot2 package

https://statsandr.com/blog/graphics-in-r-with-ggplot2/

blog/a-package-to-download-free-springer-books-during-covid-19-quarantine/

A package to download free Springer books during Covid-19 quarantine - Stats and R

This article presents an R package that lets you download free data science books in PDF that are made available by Springer during the COVID-19 quarantine

https://statsandr.com/blog/a-package-to-download-free-springer-books-during-covid-19-quarantine/

blog/how-to-do-a-t-test-or-anova-for-many-variables-at-once-in-r-and-communicate-the-results-in-a-better-way/

How to do a t-test or ANOVA for more than one variable at once in R - Stats and R

Learn how to compare samples for multiple variables at once in R thanks to a Student t-test or ANOVA and communicate the results in a better way

https://statsandr.com/blog/how-to-do-a-t-test-or-anova-for-many-variables-at-once-in-r-and-communicate-the-results-in-a-better-way/

blog/how-to-install-r-and-rstudio/

How to install R and RStudio? - Stats and R

This article explains what R and RStudio (an open source statistical software program) are, and how to install them. Some examples of basic code are also presented

https://statsandr.com/blog/how-to-install-r-and-rstudio/

blog/track-blog-performance-in-r/

How to track the performance of your blog in R? - Stats and R

Learn how to track the performance of your blog or website in R by analyzing page views, sessions, users and engagement with the {googleAnalyticsR} package

https://statsandr.com/blog/track-blog-performance-in-r/

blog/variable-types-and-examples/

Variable types and examples - Stats and R

Learn the differences between a quantitative continuous, quantitative discrete, qualitative ordinal and qualitative nominal variable via concrete examples

https://statsandr.com/blog/variable-types-and-examples/

blog/7-benefits-of-sharing-your-code-in-a-data-science-blog/

Why do I have a data science blog? 7 benefits of sharing your code - Stats and R

Learn about the 7 main benefits of and reasons for having a data science blog and sharing your code, expertise and knowledge through your blog

https://statsandr.com/blog/7-benefits-of-sharing-your-code-in-a-data-science-blog/

blog/how-to-publish-shiny-app-example-with-shinyapps-io/

How to publish a Shiny app: example with shinyapps.io - Stats and R

See a step-by-step guide (with screenshots) on how to deploy and publish online a Shiny app using shinyapps.io

https://statsandr.com/blog/how-to-publish-shiny-app-example-with-shinyapps-io/

blog/hello-world/

Hello World! - Stats and R

Hello readers! This is the first article of a blog aimed at making statistics easy to understand by illustrating with concrete examples and using plain English

https://statsandr.com/blog/hello-world/

Consider replacing utterances with giscus

Hey there,

I see that you're using utterances for comments. As we know, it utilizes GitHub Issues, which (given enough time) would flood your repository's issues with comments. (Looks like you've already got quite a lot of issues for comments here).

I've been developing an alternative: giscus, a similar project that utilizes GitHub Discussions instead. It has support for replies and other cool stuff from GitHub Discussions. The big advantage is the fact that it uses your repository's Discussions, which is more suitable for comments. I would really appreciate it if you tried it. Feedback is welcome, the code is open source.

You can convert existing issues into discussions, as described here.

Thanks!

blog/descriptive-statistics-in-r/

Descriptive statistics in R - Stats and R

Learn how to perform a descriptive analysis of your data in R, from simple descriptive statistics to more advanced graphics used to describe your data at hand

https://statsandr.com/blog/descriptive-statistics-in-r/

Example of issue with reprex addin

# packages
library(palmerpenguins)
library(ggplot2)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# data
data("penguins")
head(penguins)
#> # A tibble: 6 x 8
#>   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
#>   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
#> 1 Adelie  Torge…           39.1          18.7              181        3750 male 
#> 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
#> 3 Adelie  Torge…           40.3          18                195        3250 fema…
#> 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
#> 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
#> 6 Adelie  Torge…           39.3          20.6              190        3650 male 
#> # … with 1 more variable: year <int>

# descriptive stats
summary(penguins)
#>       species          island    bill_length_mm  bill_depth_mm  
#>  Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10  
#>  Chinstrap: 68   Dream    :124   1st Qu.:39.23   1st Qu.:15.60  
#>  Gentoo   :124   Torgersen: 52   Median :44.45   Median :17.30  
#>                                  Mean   :43.92   Mean   :17.15  
#>                                  3rd Qu.:48.50   3rd Qu.:18.70  
#>                                  Max.   :59.60   Max.   :21.50  
#>                                  NA's   :2       NA's   :2      
#>  flipper_length_mm  body_mass_g       sex           year     
#>  Min.   :172.0     Min.   :2700   female:165   Min.   :2007  
#>  1st Qu.:190.0     1st Qu.:3550   male  :168   1st Qu.:2007  
#>  Median :197.0     Median :4050   NA's  : 11   Median :2008  
#>  Mean   :200.9     Mean   :4202                Mean   :2008  
#>  3rd Qu.:213.0     3rd Qu.:4750                3rd Qu.:2009  
#>  Max.   :231.0     Max.   :6300                Max.   :2009  
#>  NA's   :2         NA's   :2

# scatterplot
penguins %>%
  filter(!is.na(sex)) %>%
  ggplot() +
  aes(x = bill_length_mm, y = flipper_length_mm, colour = species) +
  geom_point(size = 1L) +
  scale_color_hue() +
  theme_minimal() +
  facet_wrap(vars(sex))

Created on 2020-12-18 by the reprex package (v0.3.0)

Session info

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.2 (2020-06-22)
#>  os       macOS Catalina 10.15.7      
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Brussels             
#>  date     2020-12-18                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version date       lib source        
#>  assertthat       0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
#>  backports        1.1.8   2020-06-17 [1] CRAN (R 4.0.0)
#>  callr            3.5.1   2020-10-13 [1] CRAN (R 4.0.2)
#>  cli              2.0.2   2020-02-28 [1] CRAN (R 4.0.0)
#>  colorspace       1.4-1   2019-03-18 [1] CRAN (R 4.0.0)
#>  crayon           1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
#>  curl             4.3     2019-12-02 [1] CRAN (R 4.0.0)
#>  desc             1.2.0   2018-05-01 [1] CRAN (R 4.0.0)
#>  devtools         2.3.2   2020-09-18 [1] CRAN (R 4.0.2)
#>  digest           0.6.25  2020-02-23 [1] CRAN (R 4.0.0)
#>  dplyr          * 1.0.0   2020-05-29 [1] CRAN (R 4.0.0)
#>  ellipsis         0.3.1   2020-05-15 [1] CRAN (R 4.0.0)
#>  evaluate         0.14    2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi            0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
#>  farver           2.0.3   2020-01-16 [1] CRAN (R 4.0.0)
#>  fs               1.4.2   2020-06-30 [1] CRAN (R 4.0.2)
#>  generics         0.0.2   2018-11-29 [1] CRAN (R 4.0.0)
#>  ggplot2        * 3.3.2   2020-06-19 [1] CRAN (R 4.0.0)
#>  glue             1.4.1   2020-05-13 [1] CRAN (R 4.0.0)
#>  gtable           0.3.0   2019-03-25 [1] CRAN (R 4.0.0)
#>  highr            0.8     2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools        0.5.0   2020-06-16 [1] CRAN (R 4.0.0)
#>  httr             1.4.2   2020-07-20 [1] CRAN (R 4.0.2)
#>  knitr            1.29    2020-06-23 [1] CRAN (R 4.0.0)
#>  labeling         0.3     2014-08-23 [1] CRAN (R 4.0.0)
#>  lifecycle        0.2.0   2020-03-06 [1] CRAN (R 4.0.0)
#>  magrittr         1.5     2014-11-22 [1] CRAN (R 4.0.0)
#>  memoise          1.1.0   2017-04-21 [1] CRAN (R 4.0.0)
#>  mime             0.9     2020-02-04 [1] CRAN (R 4.0.0)
#>  munsell          0.5.0   2018-06-12 [1] CRAN (R 4.0.0)
#>  palmerpenguins * 0.1.0   2020-07-23 [1] CRAN (R 4.0.2)
#>  pillar           1.4.4   2020-05-05 [1] CRAN (R 4.0.0)
#>  pkgbuild         1.1.0   2020-07-13 [1] CRAN (R 4.0.2)
#>  pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.0.0)
#>  pkgload          1.1.0   2020-05-29 [1] CRAN (R 4.0.0)
#>  prettyunits      1.1.1   2020-01-24 [1] CRAN (R 4.0.0)
#>  processx         3.4.4   2020-09-03 [1] CRAN (R 4.0.2)
#>  ps               1.3.3   2020-05-08 [1] CRAN (R 4.0.0)
#>  purrr            0.3.4   2020-04-17 [1] CRAN (R 4.0.0)
#>  R6               2.4.1   2019-11-12 [1] CRAN (R 4.0.0)
#>  remotes          2.2.0   2020-07-21 [1] CRAN (R 4.0.2)
#>  rlang            0.4.7   2020-07-09 [1] CRAN (R 4.0.2)
#>  rmarkdown        2.3     2020-06-18 [1] CRAN (R 4.0.2)
#>  rprojroot        1.3-2   2018-01-03 [1] CRAN (R 4.0.0)
#>  scales           1.1.1   2020-05-11 [1] CRAN (R 4.0.0)
#>  sessioninfo      1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi          1.4.6   2020-02-17 [1] CRAN (R 4.0.0)
#>  stringr          1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
#>  testthat         2.3.2   2020-03-02 [1] CRAN (R 4.0.0)
#>  tibble           3.0.1   2020-04-20 [1] CRAN (R 4.0.0)
#>  tidyselect       1.1.0   2020-05-11 [1] CRAN (R 4.0.0)
#>  usethis          1.6.3   2020-09-17 [1] CRAN (R 4.0.2)
#>  utf8             1.1.4   2018-05-24 [1] CRAN (R 4.0.0)
#>  vctrs            0.3.1   2020-06-05 [1] CRAN (R 4.0.0)
#>  withr            2.2.0   2020-04-20 [1] CRAN (R 4.0.0)
#>  xfun             0.16    2020-07-24 [1] CRAN (R 4.0.2)
#>  xml2             1.3.2   2020-04-23 [1] CRAN (R 4.0.0)
#>  yaml             2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
#> 
#> [1] /Users/antoine/Library/R/4.0/library
#> [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

blog/wilcoxon-test-in-r-how-to-compare-2-groups-under-the-non-normality-assumption/

Wilcoxon test in R: how to compare 2 groups under the non-normality assumption - Stats and R

Learn how to do the Wilcoxon test (non-parametric version of the Student's t-test) in R, used to compare 2 groups when the normality assumption is violated

https://statsandr.com/blog/wilcoxon-test-in-r-how-to-compare-2-groups-under-the-non-normality-assumption/

blog/do-my-data-follow-a-normal-distribution-a-note-on-the-most-widely-used-distribution-and-how-to-test-for-normality-in-r/

Do my data follow a normal distribution? A note on the most widely used distribution and how to test for normality in R - Stats and R

This article explains in detail what the normal (or Gaussian) distribution is, why it matters in statistics, and how to test whether your data are normally distributed

https://statsandr.com/blog/do-my-data-follow-a-normal-distribution-a-note-on-the-most-widely-used-distribution-and-how-to-test-for-normality-in-r/

blog/top-r-resources-on-covid-19-coronavirus/

Top 100 R resources on Novel COVID-19 Coronavirus - Stats and R

Best R resources about Coronavirus (COVID-19). These resources are Shiny apps, R packages or code that you can use freely to analyze the Coronavirus outbreak

https://statsandr.com/blog/top-r-resources-on-covid-19-coronavirus/

blog/how-to-perform-a-one-sample-t-test-by-hand-and-in-r-test-on-one-mean/

How to perform a one sample t-test by hand and in R: test on one mean - Stats and R

Learn how to perform the one sample t-test by hand and in R in order to compare a sample to a hypothesized value, with known or unknown population variance

https://statsandr.com/blog/how-to-perform-a-one-sample-t-test-by-hand-and-in-r-test-on-one-mean/

blog/a-shiny-app-for-simple-linear-regression-by-hand-and-in-r/

A Shiny app for simple linear regression by hand and in R - Stats and R

This article presents a Shiny app for computing simple linear regression by hand and in R. Add your own data, see the results and download them as a report

https://statsandr.com/blog/a-shiny-app-for-simple-linear-regression-by-hand-and-in-r/

blog/a-shiny-app-for-inferential-statistics-by-hand/

A Shiny app for inferential statistics by hand - Stats and R

This article presents how to perform inferential statistics by hand, namely, confidence intervals and hypothesis tests for means, proportions and variances

https://statsandr.com/blog/a-shiny-app-for-inferential-statistics-by-hand/

blog/clustering-analysis-k-means-and-hierarchical-clustering-by-hand-and-in-r/

The complete guide to clustering analysis: k-means and hierarchical clustering by hand and in R - Stats and R

Learn how to perform clustering analysis, namely k-means and hierarchical clustering, by hand and in R. See also how the different clustering algorithms work

https://statsandr.com/blog/clustering-analysis-k-means-and-hierarchical-clustering-by-hand-and-in-r/

blog/covid-19-in-belgium/

COVID-19 in Belgium - Stats and R

This article presents an analysis of the Novel COVID-19 Coronavirus in Belgium using R. Feel free to apply it to your own country

https://statsandr.com/blog/covid-19-in-belgium/

blog/descriptive-statistics-by-hand/

Descriptive statistics by hand - Stats and R

Learn how to perform a descriptive analysis of your data by hand. You will learn how to compute both location and dispersion measures to describe your data

https://statsandr.com/blog/descriptive-statistics-by-hand/

Data entry

Rather than typing the database out by hand, it is more productive to keep the data in a file. That way, the database can be edited without touching the code. It also avoids coding errors, and the data entry shrinks from 10 lines of code (more as roles are added) to just two:

library(readxl)
cv <- read_excel("filename.xlsx")

ggplot2 tutorial

That is a nice tutorial on ggplot2. A minor update is that the first paragraph should probably mention lattice. Yes, it has been effectively superseded by ggplot2, but it still exists and did play an important role.

Terry Therneau
Mayo Clinic

Error: object 'death' not found

Hi there!
Thank you very much for your tutorial.
I am unable to replicate your code and results.
I get the message "Error: object 'death' not found" after running the following code segment:

`%>%` <- magrittr::`%>%`

# extract the cumulative incidence
df <- coronavirus %>%
  dplyr::filter(Country.Region == "Belgium") %>%
  dplyr::group_by(date, type) %>%
  dplyr::summarise(total = sum(cases, na.rm = TRUE)) %>%
  tidyr::pivot_wider(
    names_from = type,
    values_from = total
  ) %>%
  dplyr::arrange(date) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(active = confirmed - death - recovered) %>%
  dplyr::mutate(
    confirmed_cum = cumsum(confirmed),
    death_cum = cumsum(death),
    recovered_cum = cumsum(recovered),
    active_cum = cumsum(active)
  )

I am sure I am doing the right thing and also have relevant packages loaded.
Please check.
Thanks,
Emmy
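
One possible explanation, offered as a guess and not as a confirmed diagnosis: pivot_wider() only creates a column for each value of type that is actually present in the filtered data, so if no rows have type "death", the later mutate() cannot find a death column. The sketch below uses a small made-up data frame (toy, with hypothetical values standing in for the coronavirus dataset) and base R only to check which expected types are missing before pivoting.

```r
# Toy stand-in for the `coronavirus` data after filtering on one country.
# The values are hypothetical and only serve to illustrate the check.
toy <- data.frame(
  date  = as.Date(c("2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02")),
  type  = c("confirmed", "recovered", "confirmed", "recovered"),
  cases = c(5, 1, 8, 2)
)

# Columns that the mutate() step in the issue expects after pivot_wider().
expected_types <- c("confirmed", "death", "recovered")

# Types that never occur in the data will not become columns when pivoting,
# so any reference to them afterwards fails with "object '...' not found".
missing_types <- setdiff(expected_types, unique(toy$type))

if (length(missing_types) > 0) {
  message("Missing case types: ", paste(missing_types, collapse = ", "))
}
```

Running the same setdiff() check on the real filtered data (with the real column names of the installed package version) would show whether this is what happens here.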

blog/rstudio-addins-or-how-to-make-your-coding-life-easier/

RStudio addins, or how to make your coding life easier - Stats and R

Discover the best RStudio addins, how to use them in practice and how they can help you when writing code in R or R Markdown

https://statsandr.com/blog/rstudio-addins-or-how-to-make-your-coding-life-easier/

blog/how-to-create-a-simple-coronavirus-dashboard-specific-to-your-country-in-r/

How to create a simple Coronavirus dashboard specific to your country in R - Stats and R

This article will help you to build a visually appealing dashboard about the spread of COVID-19 Coronavirus specific to a country in R using flexdashboard

https://statsandr.com/blog/how-to-create-a-simple-coronavirus-dashboard-specific-to-your-country-in-r/

Add a link to the GitHub repo for all Shiny apps in all articles

blog/tips-and-tricks-in-rstudio-and-r-markdown/

Tips and tricks in RStudio and R Markdown - Stats and R

This article illustrates the main tips, tricks and shortcuts you can use in RStudio and R Markdown to write code more quickly and more efficiently

https://statsandr.com/blog/tips-and-tricks-in-rstudio-and-r-markdown/

statsandr's Issues

World map of visited countries in R - Stats and R

ANOVA in R - Stats and R

Outliers detection in R - Stats and R

Getting started in R markdown - Stats and R

Mortgage calculator in R Shiny - Stats and R

An efficient way to install and load R packages - Stats and R

Correlation coefficient and correlation test in R - Stats and R

How to upload your R code on GitHub: example with an R script on MacOS - Stats and R

Chi-square test of independence by hand - Stats and R

Chi-square test of independence in R - Stats and R

Correlogram in R: how to highlight the most correlated variables in a dataset - Stats and R

Hypothesis test by hand - Stats and R

Student's t-test in R and by hand: how to compare two groups under different scenarios - Stats and R

Data manipulation in R - Stats and R

Graphics in R with ggplot2 - Stats and R

A package to download free Springer books during Covid-19 quarantine - Stats and R

How to do a t-test or ANOVA for more than one variable at once in R - Stats and R

How to install R and RStudio? - Stats and R

How to track the performance of your blog in R? - Stats and R

Variable types and examples - Stats and R

Why do I have a data science blog? 7 benefits of sharing your code - Stats and R

How to publish a Shiny app: example with shinyapps.io - Stats and R

Hello World! - Stats and R

Descriptive statistics in R - Stats and R

Wilcoxon test in R: how to compare 2 groups under the non-normality assumption - Stats and R

Do my data follow a normal distribution? A note on the most widely used distribution and how to test for normality in R - Stats and R

Top 100 R resources on Novel COVID-19 Coronavirus - Stats and R

How to perform a one sample t-test by hand and in R: test on one mean - Stats and R

A Shiny app for simple linear regression by hand and in R - Stats and R

A Shiny app for inferential statistics by hand - Stats and R

The complete guide to clustering analysis: k-means and hierarchical clustering by hand and in R - Stats and R

COVID-19 in Belgium - Stats and R

Descriptive statistics by hand - Stats and R

extract the cumulative incidence

RStudio addins, or how to make your coding life easier - Stats and R

How to create a simple Coronavirus dashboard specific to your country in R - Stats and R

Tips and tricks in RStudio and R Markdown - Stats and R
