appliedepi / epirhandbook_eng
The repository for the English version of the Epidemiologist R Handbook
License: Other
Once this package hits CRAN we should consider updating the Sankey diagrams in the handbook:
https://github.com/davidsjoberg/ggsankey
On the Common errors page, the links for interpreting error messages don't work.
Line 11 lists StackExchange.com, stackoverflow.com, and community.rstudio.com as plain text; these should be working hyperlinks to StackExchange.com, stackoverflow.com, and community.rstudio.com.
Create a new repo that pulls all the html outputs from translations into subfolders so that website can be built on a single domain with landing page.
New bookdown version has option to specify folder for output - so worth considering a wrapper script for rendering.
Also need to consider how this will interact with #8
We should probably consider signing up for the beta version of GitHub project planning.
It seems quite functional, and from the FAQ it appears there will be a free version... thoughts?
Hello,
When rendering epicurves.Rmd (alone or as part of the entire book), I get the error below.
I am using R 4.1.1 with incidence2 1.2.1, and later incidence2 1.2.2.
Error in (function (cond) :
error in evaluating the argument 'x' in selecting a method for function 'plot': cumulate() was deprecated in incidence2 1.2.0 and is now defunct.
Lines 757-760 of epicurves.Rmd:
wkly_inci %>%
  cumulate() %>%
  plot()
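Since cumulate() is defunct, the cumulative counts can still be computed by hand. This is a minimal base-R sketch (a hypothetical stand-in using cumsum() on made-up weekly counts, not the incidence2 API):

```r
# Hypothetical stand-in for the defunct cumulate(): compute a running
# total of weekly case counts with base R's cumsum().
weekly_counts <- c(3, 7, 12, 5, 9)    # made-up weekly case counts
cumulative <- cumsum(weekly_counts)   # cumulative incidence per week
cumulative
# [1]  3 10 22 27 36
```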
Hi there,
I appreciate your work on the tidyverse-based Epidemiologist R Handbook and have learned a lot from it.
When I got to https://epirhandbook.com/pivoting-data.html#pivoting-data-of-multiple-classes, I found code like this:
df_long <-
df %>%
pivot_longer(
cols = -id,
names_to = c("observation", ".value"),
names_sep = "_"
)
df_long
df_long <-
df_long %>%
mutate(
date = date %>% lubridate::as_date(),
observation =
observation %>%
str_remove_all("obs") %>%
as.numeric()
)
df_long
I would like to recommend the following more idiomatic code:
#Import data
obs <-
structure(
list(
id = c("A", "B", "C"),
obs1_date = c("2021-04-23", "2021-04-23", "2021-04-23"),
obs1_status = c("Healthy", "Healthy", "Missing"),
obs2_date = c("2021-04-24", "2021-04-24", "2021-04-24"),
obs2_status = c("Healthy", "Healthy", "Healthy"),
obs3_date = c("2021-04-25", "2021-04-25", "2021-04-25"),
obs3_status = c("Unwell", "Healthy", "Healthy")),
row.names = c(NA,-3L),
class = c("tbl_df", "tbl", "data.frame"))
#Tidy data
obs %>%
pivot_longer(
2:last_col(),
names_to = c("obs", ".value"),
names_pattern = "obs(.)_(.+)",
names_transform = list(obs = as.integer),
values_transform = list(date = as.Date))
Best wishes,
Tony
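To see what the suggested names_pattern = "obs(.)_(.+)" actually captures, here is a small base-R illustration using regexec()/regmatches() on the column names from the example data (an editorial aside, not part of the handbook code):

```r
# The regex "obs(.)_(.+)" captures the observation number (group 1)
# and the value name (group 2) from each column name.
cols <- c("obs1_date", "obs1_status", "obs2_date")
m <- regmatches(cols, regexec("obs(.)_(.+)", cols))
obs_num  <- sapply(m, `[`, 2)   # "1" "1" "2"            -> the "obs" column
value_nm <- sapply(m, `[`, 3)   # "date" "status" "date" -> the ".value" names
```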
From email from [email protected] to [email protected] on 8 August
Thanks very much to Thuan from the Vietnamese team for pointing this out:
The code on the 'Standardised rates' page does not work. I eventually found a minor error in the code:
# Remove specific string from column values
standard_pop_clean <- standard_pop_data %>%
mutate(
age_cat5 = str_replace_all(age_cat5, "years", ""), # remove "year"
age_cat5 = str_replace_all(age_cat5, "plus", ""), # remove "plus"
age_cat5 = str_replace_all(age_cat5, " ", "")) %>% # remove " " space
rename(pop = WorldStandardPopulation) # change col name to "pop", as this is expected by dsr package
Simply change age_cat5 to AgeGroup in the first line of mutate() and everything works, as shown in the highlighted text below.
standard_pop_clean <- standard_pop_data %>%
mutate(
age_cat5 = str_replace_all(AgeGroup, "years", ""), # remove "year"
age_cat5 = str_replace_all(age_cat5, "plus", ""), # remove "plus"
age_cat5 = str_replace_all(age_cat5, " ", "")) %>% # remove " " space
rename(pop = WorldStandardPopulation) # change col name to "pop", as this is expected by dsr package
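The corrected cleaning step can be sketched in base R to show what it produces; the AgeGroup values below are made up for illustration:

```r
# Base-R sketch of the string cleaning: start from AgeGroup (which
# exists), not age_cat5 (which does not yet exist inside mutate()).
standard_pop_data <- data.frame(AgeGroup = c("0-4 years", "85 plus years"))
age_cat5 <- gsub("years", "", standard_pop_data$AgeGroup)  # remove "years"
age_cat5 <- gsub("plus",  "", age_cat5)                    # remove "plus"
age_cat5 <- gsub(" ",     "", age_cat5)                    # remove spaces
age_cat5
# [1] "0-4" "85"
```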
Use accuracy = 0.1 or 0.01 within scales functions such as scales::percent(), used on a normal data frame, to adjust the number of decimal places shown.
see https://stackoverflow.com/questions/53072282/how-to-prevent-scalespercent-from-adding-decimal
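For intuition, the effect of the accuracy argument can be mimicked in base R; pct() below is a hypothetical helper written for this sketch, not part of the scales package:

```r
# Round to a fixed number of decimal places before formatting,
# mimicking accuracy = 0.1 (1 decimal) or 0.01 (2 decimals).
pct <- function(x, digits = 1) {
  paste0(format(round(100 * x, digits), nsmall = digits), "%")
}
pct(0.1234)      # "12.3%"  (like accuracy = 0.1)
pct(0.1234, 2)   # "12.34%" (like accuracy = 0.01)
```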
# use the code below to automatically make epidemic curves by commune, for EVERY department (iteration)
# The plots are saved into the "png" folder in the R project.
# Define vector of unique department names
dept_names <- palu %>%
pull(Département) %>%
unique()
# "map" a function across each of the department names.
# The function filters the dataset for the department and creates/saves the plot
dept_plots <- palu %>% # begin with the complete dataset
group_split(Département) %>% # split into different datasets by Departement
purrr::map(~ggsave( # the function that is iterated is ggsave() to save the plot
# within the ggsave(), the file name is created as:
filename = here::here(
"png", # the folder "png"
str_glue("paludisme_par_mois_{first(.$Département)}.png")), # and then a dynamic file name that contains the department name.
# and the plot to save is created from the split data, created above (.x)
plot = ggplot(data = .x,
mapping = aes(x = mois, y = nombre_cas, group = annee))+
geom_line(aes(color = annee), size = 2, alpha = 0.6)+
facet_wrap(~Communes, scales = "free_y")+ # one plot per commune
labs(
y = "Nombre de cas",
x = "Mois",
title = str_glue("Paludisme, Department {first(.x$Département)}"))+
theme_minimal(16)+
theme(axis.text.x = element_text(angle = 90)),
width = 18,
height = 10
) # end ggsave()
) # end map()
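The same split-and-save pattern can be sketched without purrr or ggplot2; this package-free version uses made-up data, base graphics, and a PDF device writing to tempdir():

```r
# Split the data by department, then save one plot per piece.
palu <- data.frame(
  departement = rep(c("Nord", "Sud"), each = 3),  # made-up departments
  mois        = rep(1:3, times = 2),              # month
  nombre_cas  = c(10, 14, 9, 5, 7, 6))            # case counts
paths <- vapply(split(palu, palu$departement), function(d) {
  f <- file.path(tempdir(),
                 sprintf("paludisme_par_mois_%s.pdf", d$departement[1]))
  pdf(f)                                  # open one graphics file per department
  plot(d$mois, d$nombre_cas, type = "l",
       xlab = "Mois", ylab = "Nombre de cas", main = d$departement[1])
  dev.off()                               # close the device, writing the file
  f
}, character(1))
basename(paths)   # one saved file name per department
```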
In the ordered y axis section of 34.3, the below code is used to order the location_name variable, however, this does not work. The location_name variable needs to be converted to a factor first and then fct_relevel will do what is expected.
You'll see that the current figures on the webpage with/without an ordered y-axis are identical.
Maybe it would also be better to put p_load(forcats) at the start of the chapter rather than in this chunk.
# load package
pacman::p_load(forcats)

# create factor and define levels manually
agg_weeks <- agg_weeks %>%
  mutate(location_name = fct_relevel(
    location_name, facility_order$location_name)
  )
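The same reordering can be shown in base R, where factor(levels = ...) plays the role of fct_relevel(); the facility names below are made up:

```r
# Create the factor with its levels in the desired display order.
facility_order <- c("Central Hospital", "Port Clinic", "Military Hospital")
location_name  <- c("Port Clinic", "Military Hospital", "Central Hospital")
location_fct   <- factor(location_name, levels = facility_order)
levels(location_fct)
# [1] "Central Hospital" "Port Clinic"      "Military Hospital"
```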
Just so we don't forget to address all the leftover issues in the archive repo.
All translations need updated Github links:
Use Find in Files tool to replace all:
epirhandbook/Epi_R_handbook to:
appliedepi/epirhandbook_eng
Also update all the various pages that link to data (opening paragraphs).
In section 37.3 Handling, the subset code produces an error:
sub_attributes <- subset(
  epic,
  node_attribute = list(
    gender = "m",
    date_infection = as.Date(c("2014-04-01", "2014-07-01"))),
  edge_attribute = list(location = "Nosocomial"))
Error in FUN(X[[i]], ...) :
Value for date_infection is not found in dataset
In addition: Warning message:
In if (!attribute %in% data) stop(paste("Value for", name_attribute, :
the condition has length > 1 and only the first element will be used
It looks like this is because the dates are stored as datetime when reading the "linelist_cleaned.xlsx" dataset.
Adding the line below when reading the data fixes the problem:
linelist <- rio::import("linelist_cleaned.xlsx") %>% mutate(across(starts_with("date"), as.Date))
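A minimal, package-free demonstration of the class mismatch behind this error:

```r
# Excel importers often return POSIXct datetimes; downstream code
# expecting class Date then fails until the columns are converted.
x <- as.POSIXct("2014-04-01", tz = "UTC")
class(x)            # "POSIXct" "POSIXt"
class(as.Date(x))   # "Date"
```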
In the strptime section (lines 106-122 of dates.Rmd), both months and minutes are written with the same format code. Months should use %m (lowercase) and minutes %M (uppercase).
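A quick base-R check of the distinction:

```r
# %m (lowercase) is the month; %M (uppercase) is the minute.
t1 <- strptime("2021-07-09 14:05", format = "%Y-%m-%d %H:%M", tz = "UTC")
format(t1, "%m")   # "07" -- month
format(t1, "%M")   # "05" -- minute
```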
Epicurves page:
* Add closed = "left" to all ggplot chunks that use a breaks = statement, to count cases from the start of each breakpoint.
* Add a bullet in the Weekly Epicurve Example section explaining the use of closed = "left" in geom_histogram(), to ensure dates are counted in the correct bins.
* Edit the multi-line date labels to include labels = scales::label_date_short().
* Remove one of the old examples using faceting.
* Edit the green faceting example to remove the facet panel border.
* Also update the corresponding section on date axes in the ggplot tips page.
Technical updates to pages, to enable rendering:
* Add drop_na(objectid) %>% to the chunk that creates case_adm3_sf (~line 605) to drop any empty rows; this prevents an error about a discrepancy between 9 and 10 rows.
* Add the OpenStreetMap:: prefix to the openmap() command around line 730.
* Welcome page
* Epicurves page
Fixing issue #56 with these changes. Also need to add a sentence referring people to the Dates page if they are reading in an Excel file.
Nice to have: in the process of reducing the repo size (#6) we could also consider standardising the repo language-name extensions to be either 3-letter ISO codes or country domains.
The weather data is missing, and the code chunks for fitting regressions return error messages.
You can normally copy/paste most chunks of code from the book into RStudio, but much of the flexdashboards page is different. It's unclear whether this is intentional, but ideally the code should be in copyable chunks rather than images.
Minor point, there is a message appearing just before "Create heat plot" section in section 34.3
It says ## Joining, by = c("location_name", "week"), and it appears as code that a user could copy/paste into R.
It would probably be best to suppress this message.
https://juliasilge.com/blog/reorder-within/
We should add something on this, perhaps at the end of the page as an advanced tip (or in ggplot2 tips page)
Are there files available to run through the survey analysis section?
I couldn't find them after installing the handbook/or on github.
It would be helpful if such files were available to ensure users are getting the same results as presented online.
# import the survey data
survey_data <- rio::import("survey_data.xlsx")

# import the dictionary into R
survey_dict <- rio::import("survey_dict.xlsx")
The data begin in linelist format, with each row having a classification. They are then transformed to "wide" format, with one column per time point (values are the classifications) plus one column for counts, and then to long format with a ggalluvial-specific function.
Data preparation:
ELR_wide <- ELR %>%
pivot_wider(
id_cols = c(country, who_region),
names_from = week,
values_from = final) %>%
drop_na() %>%
group_by(across(contains("2021"))) %>%
count() %>%
ungroup() %>%
mutate(across(.cols = -n,
.fns = ~recode(.x,
"1 - Critical" = "Critical",
"2 - Very high" = "Very high",
"3 - High" = "High",
"4 - Medium" = "Medium",
"5 - Low" = "Low",
"6 - Minimal/NA" = "Minimal",
"No Data" = "No Data"))) %>%
mutate(across(.cols = -n,
.fns = ~fct_relevel(.x, c(
"Critical",
"Very high",
"High",
"Medium",
"Low",
"Minimal",
"No Data")))) %>%
mutate(across(.cols = -n,
.fns = fct_rev))
#levels(ELR_wide$`2021-07-19`)
library(ggalluvial)
#is_alluvia_form(as.data.frame(ELR_wide), axes = 1:3, silent = TRUE)
ELR_long_alluvial_original <- to_lodes_form(data.frame(ELR_wide),
key = "date",
axes = 1:6) %>%
mutate(date = str_replace_all(date, "X2021.", "")) %>%
mutate(stratum = fct_relevel(stratum,
c("Critical",
"Very high",
"High",
"Medium",
"Low",
"Minimal",
"No Data")))
# add current ELR classifications
ELR_long_alluvial <- left_join(
ELR_long_alluvial_original,
ELR_long_alluvial_original %>%
select(-n) %>%
filter(date == max(date, na.rm=T)) %>%
rename(
current_date = date,
current_ELR = stratum),
by = c("alluvium" ))
# add earliest ELR classifications
ELR_long_alluvial <- left_join(
ELR_long_alluvial,
ELR_long_alluvial_original %>%
select(-n) %>%
filter(date == min(date, na.rm=T)) %>%
rename(
oldest_date = date,
oldest_ELR = stratum),
by = c("alluvium" ))
Data visualization with ggalluvial:
# plot showing current classification
ggplot(data = ELR_long_alluvial,
aes(x = date, stratum = stratum, alluvium = alluvium,
y = n, label = stratum)) +
geom_alluvium(aes(fill = current_ELR)) +
geom_stratum() +
geom_text(stat = "stratum", size = 1) +
theme_minimal() +
scale_fill_manual(
values = c(
"Critical" = "black",
"Very high" = "darkred",
"High" = "red",
"Medium" = "darkorange",
"Low" = "darkgreen",
"Minimal" = "green",
"No Data" = "grey"))+
labs(
title = "[INTERNAL] ELR classification trajectories\ncolored by current classification (global)",
fill = str_glue("ELR classification\non {max(ELR_long_alluvial$date, na.rm=T)}"),
caption = "Last 6 weeks of data")
New to add:
https://www.aj2duncan.com/blog/missing-data-ggplot2-barplots/
https://stackoverflow.com/questions/10834382/ggplot2-keep-unused-levels-barplot
Also make clearer in the Factors page how to keep all levels in a plot. Note that you may need to use scale_x_discrete(drop = FALSE) as well as scale_fill_*(drop = FALSE), and cover how to deal with values missing in some facets of barplots.
The solution that worked for me was:
geom_col(position = position_dodge(preserve = 'single'))+
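The underlying behaviour can be seen without ggplot2: a count only keeps an empty category if the factor still carries that level (the status values below are made up):

```r
# table() keeps the empty "Missing" level, like drop = FALSE;
# dropping unused levels first mimics the ggplot2 default.
status <- factor(c("Healthy", "Healthy", "Unwell"),
                 levels = c("Healthy", "Unwell", "Missing"))
table(status)             # "Missing" shown with count 0
table(droplevels(status)) # "Missing" gone
```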
The EpiNow2 code chunk below returned an error message:
epinow_res <- epinow(
reported_cases = cases,
generation_time = generation_time,
delays = delay_opts(incubation_period),
return_output = TRUE,
verbose = TRUE,
horizon = 21,
stan = stan_opts(samples = 750, chains = 4)
)
Error Message:
Logging threshold set at INFO for the EpiNow2 logger
Writing EpiNow2 logs to the console and: C:\Users\AKARST1\AppData\Local\Temp\RtmpcZOade/regional-epinow/2015-04-30.log1\AppData\Local\Temp\RtmpcZOade/epinow/2015-04-30.log
Logging threshold set at INFO for the EpiNow2.epinow logger
Writing EpiNow2.epinow logs to the console and: C:\Users\AKARST
Error in seq.int(0, to0 - from, by) : 'to' must be a finite number
Follow this sequence of actions (see two linked issues below):
Languages to complete:
Thanks very much to Thuan from the Vietnamese team for pointing out that the offline version of the Vietnamese translation is not available from the website (I think this just needs to be knit?).
The Vietnamese version does not currently have an offline version, although it is not strictly required.
ggplotly no longer supports levels.grates_yearweek(), which the incidence package uses, so the code for the interactive plot does not run.
It doesn't work with other packages (tsibble or aweek) either; week and month classes are not supported by ggplotly.
Currently none of the language translations have the donate button on the homepage.
As suggested by Bassem we might want to look at the pricing plans for github large file storage if our clean up in (#6) still leaves us with a heavy load.
Somehow some translated sentences went live on the R Basics page. They need to be removed and the page re-rendered.
@yuriei @ishaberry @ebuajitti and @aspina7 @hitomik723
This is a copy of an issue from epiRhandbook_jp regarding gis.Rmd (28 GIS basics):
appliedepi/epiRhandbook_jp#5
A Geographic Information System (GIS) is geography-dependent, i.e. national/local governments may maintain their own coordinate systems and provide data such as national censuses and buildings. I suggest adding local information to the Japanese version of gis.Rmd.
Below is a sample of additional information for readers in Japan.
CRS used in Japan
* Japan Plane Rectangular CS I to XIII, EPSG: 2443-2455
* JGD2011 GRS80 ellipsoid, EPSG:6668
GIS data available for Japan
* [Census](http://e-stat.go.jp/SG2/eStatGIS/page/download.html) population by address
* [Suuchi](http://nlftp.mlit.go.jp/ksj/) - a variety of features (e.g. hospitals, schools, school districts)
* [Kiban](http://www.gsi.go.jp/kiban/etsuran.html) - e.g. building and road edges
* [Geospatial Japan](https://www.geospatial.jp/ckan/dataset) - a portal of GIS data by local governments and others
In the code, SRID 4326 is used. This is the global WGS 84 lat/lon system, not a US-specific one. The Japanese government maintains its own coordinate systems, revised in 2000 and again in 2011 (the latter following the Great East Japan Earthquake).
| Coordinates | EPSG | Region |
| :--- | :--- | :------- |
| lat/lon (WGS 84) | 4326 | World |
| projected (Web Mercator) | 3857 | Web maps |
| lat/lon | 4612 | Japan (JGD2000) |
| lat/lon | 6668 | Japan (JGD2011) |
The relationships dataset in Contact Tracing section is missing.
with thanks to @babayoshihiko.
Email from [email protected] to epiRhandbook on 28 July
Hello,
Thank you very much for providing the public with The Epidemiologist R Handbook. I am a relative novice in R but am reaching out to clarify the definition for factors that you list here: https://epirhandbook.com/factors.html
The use of "ordered" and "order" in the first and second sentences, respectively, seems to suggest that factors in R are only useful for ordinal variables. Can you clarify this? I could see how this might throw some users off.
Thank you again for this wonderful reference text.
Ryan
Ryan S. Babadi, PhD, MPH
Postdoctoral Research Fellow
Department of Environmental Health
Harvard T.H. Chan School of Public Health
Hi Ryan,
Good to hear from you - I'm glad you are finding the Handbook helpful. Thanks for writing to us about this wording, I appreciate the detailed feedback.
You are correct in identifying that one can convert a column to class factor without defining an order to the factor levels. This would presumably be with the intention of setting a limited range of acceptable values. However, in my experience with R in applied epi, the vast majority of use cases for factors are centered around specifying the level order, so that's what I focused on in writing the opening text to this page.
I've made a note to revise this language in version 2 of the Handbook, to be more clear. Thanks again for writing to us,
Neale
Hi there,
One of our Japanese translators (@KoKYura) let me know about an issue in the R code chunk below in Chapter 32, Epidemic Curves.
epi_day <- incidence( # create incidence object
x = linelist, # dataset
date_index = date_onset, # date column
interval = "day" # date grouping interval
)
Error: Not implemented for class POSIXct, POSIXt
The error occurs because the date fields are imported as "POSIXct" by the preceding code, which uses the import() function.
I was wondering if we could set the variable class to Date by using readxl::read_excel() rather than import()? Or we could convert the POSIXct variables to the Date class right after the dataset is imported, with code like this:
linelist <- import("linelist_cleaned.xlsx")
linelist[] <- lapply(linelist, function(x) {
if (inherits(x, "POSIXct")) as.Date(x) else x
})
Please let me know if I can help you with modifying the code.
Thank you,
Hitomi