epirhandbook_eng's People

Contributors

alexandreblake, aspina7, nsbatra, ntluong95, oliviabboyd

epirhandbook_eng's Issues

Review GIS page

  • need to update basemap section to align with case studies (ggplot friendlier packages + offline saving)
  • link to / incorporate material from rspatialdata

Stats page updates

  • follow up with mark from melbourne about epiR updates to stats page
  • work on gtsummary wrapper function for stratified analysis
  • show how to add counts to a regression table with {gtsummary} as detailed here

Restructuring website for translations

Create a new repo that pulls all the html outputs from translations into subfolders so that website can be built on a single domain with landing page.
New bookdown version has option to specify folder for output - so worth considering a wrapper script for rendering.
Also need to consider how this will interact with #8
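
A wrapper script for this could look roughly like the sketch below. It assumes one bookdown project folder per language and a shared `site` output root. The function names and folder layout are hypothetical; only `bookdown::render_book()` and its `output_dir` argument come from bookdown itself.

```r
# Hypothetical helper: subfolder that should hold the rendered HTML for one language
translation_output_dir <- function(lang, site_root = "site") {
  file.path(site_root, lang)
}

# Hypothetical wrapper: render one translation into its own subfolder.
# Assumes `book_dir` is a bookdown project folder containing index.Rmd.
render_translation <- function(lang, book_dir, site_root = "site") {
  out_dir <- translation_output_dir(lang, site_root)
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  # Use an absolute path, because rendering may change the working directory
  out_dir <- normalizePath(out_dir)
  bookdown::render_book(input = book_dir, output_dir = out_dir)
}

translation_output_dir("vn")
```

A landing page at the site root could then link to each language subfolder.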

Is something wrong or missing in epicurves.Rmd ?

Hola,

Rendering epicurves.Rmd (alone or as part of the entire book) gives the error below.
I am using R 4.1.1 and incidence2 1.2.1 and later incidence 1.2.2

Error in (function (cond) :
error in evaluating the argument 'x' in selecting a method for function 'plot': cumulate() was deprecated in incidence2 1.2.0 and is now defunct.

  • See the 1.2.0 NEWS for more information

lines 757-760 epicurves.Rmd

# plot cumulative incidence
wkly_inci %>%
  cumulate() %>%
  plot()
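
One possible replacement that does not depend on cumulate() is to compute the running total directly. The sketch below uses base R and a made-up weekly count table; `wkly_counts` and its columns are illustrative and are not the handbook's `wkly_inci` object.

```r
# Made-up weekly case counts (illustrative only)
wkly_counts <- data.frame(
  week  = as.Date("2021-01-04") + 7 * 0:4,
  count = c(3, 5, 2, 8, 4)
)

# Running total of cases across weeks
wkly_counts$cumulative <- cumsum(wkly_counts$count)

# Step plot of cumulative incidence
plot(
  wkly_counts$week, wkly_counts$cumulative,
  type = "s",
  xlab = "Week", ylab = "Cumulative cases"
)
```

The same `cumsum()` call can be used inside `dplyr::mutate()` if the counts are already in a data frame within a pipeline.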

Code improvement: Pivoting

Hi there,
I appreciate your work on the epirhandbook, which is based on the tidyverse, and I have learned a lot from it.
When I got to the section https://epirhandbook.com/pivoting-data.html#pivoting-data-of-multiple-classes,
I found code like this:

df_long <-
  df %>%
  pivot_longer(
    cols = -id,
    names_to = c("observation", ".value"),
    names_sep = "_"
  )

df_long

df_long <-
  df_long %>%
  mutate(
    date = date %>% lubridate::as_date(),
    observation =
      observation %>%
      str_remove_all("obs") %>%
      as.numeric()
  )

df_long

I would like to recommend the following code as a more concise alternative:

# Import data
obs <-
  structure(
    list(
      id = c("A", "B", "C"),
      obs1_date = c("2021-04-23", "2021-04-23", "2021-04-23"),
      obs1_status = c("Healthy", "Healthy", "Missing"),
      obs2_date = c("2021-04-24", "2021-04-24", "2021-04-24"),
      obs2_status = c("Healthy", "Healthy", "Healthy"),
      obs3_date = c("2021-04-25", "2021-04-25", "2021-04-25"),
      obs3_status = c("Unwell", "Healthy", "Healthy")),
    row.names = c(NA, -3L),
    class = c("tbl_df", "tbl", "data.frame"))

# Tidy data
obs %>%
  pivot_longer(
    2:last_col(),
    names_to = c("obs", ".value"),
    names_pattern = "obs(.)_(.+)",
    names_transform = list(obs = as.integer),
    values_transform = list(date = as.Date))

Best wishes,

Tony

From email from [email protected] to [email protected] on 8 August

Error in standardised rates page

Thanks very much to Thuan from the Vietnamese team for pointing this out:

The code on the 'Standardised rates' page does not work. I eventually found a minor error in the code:

# Remove specific string from column values
standard_pop_clean <- standard_pop_data %>%
     mutate(
          age_cat5 = str_replace_all(age_cat5, "years", ""),   # remove "year"
          age_cat5 = str_replace_all(age_cat5, "plus", ""),    # remove "plus"
          age_cat5 = str_replace_all(age_cat5, " ", "")) %>%   # remove " " space
     rename(pop = WorldStandardPopulation)   # change col name to "pop", as this is expected by dsr package

Changing age_cat5 to AgeGroup in the first line of mutate() makes everything work; see the corrected code below.

standard_pop_clean <- standard_pop_data %>%
     mutate(
          age_cat5 = str_replace_all(AgeGroup, "years", ""),   # remove "year"
          age_cat5 = str_replace_all(age_cat5, "plus", ""),    # remove "plus"
          age_cat5 = str_replace_all(age_cat5, " ", "")) %>%   # remove " " space
     rename(pop = WorldStandardPopulation)   # change col name to "pop", as this is expected by dsr package

Saving multiple plots within purrr map()

# use the code below to automatically make epidemic curves by commune, for EVERY Department (iteration)
# The plots are saved into the "png" folder in the R project.

# Define vector of unique department names
dept_names <- palu %>% 
  pull(Département) %>% 
  unique() 
  
# "map" a function across each of the department names.
# The function filters the dataset for the department and creates/saves the plot

dept_plots <- palu %>%             # begin with the complete dataset
  group_split(Département) %>%     # split into different datasets by Departement
  purrr::map(~ggsave(              # the function that is iterated is ggsave() to save the plot
    
    # within the ggsave(), the file name is created as:
    filename = here::here(                                        
      "png",                       # the folder "png"
      str_glue("paludisme_par_mois_{first(.$Département)}.png")), # and then a dynamic file name that contains the department name.
    
    # and the plot to save is created from the split data, created above (.x)
    plot = ggplot(data = .x,
                  mapping = aes(x = mois, y = nombre_cas, group = annee))+
           geom_line(aes(color = annee), size = 2, alpha = 0.6)+
           facet_wrap(~Communes, scales = "free_y")+                  # one plot per commune
           labs(
             y = "Nombre de cas",
             x = "Mois",
             title = str_glue("Paludisme, Department {first(.x$Département)}"))+
           theme_minimal(16)+
           theme(axis.text.x = element_text(angle = 90)),
    
    width = 18,
    height = 10
    ) # end ggsave() 
    ) # end map()

34.3 ordered y axis code doesn't change the y-axis order

In the ordered y axis section of 34.3, the below code is used to order the location_name variable, however, this does not work. The location_name variable needs to be converted to a factor first and then fct_relevel will do what is expected.
You'll see that the current figures on the webpage with/without an ordered y-axis are identical.

Maybe also better to put the p_load(forcats) to the start of the chapter rather than in this chunk

# load package
pacman::p_load(forcats)

# create factor and define levels manually
agg_weeks <- agg_weeks %>%
  mutate(location_name = fct_relevel(
    location_name, facility_order$location_name)
  )
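
As a minimal illustration of the "convert to a factor with explicit levels" idea, the sketch below uses base R `factor()` rather than the handbook's code. The vectors are made up and only stand in for `agg_weeks$location_name` and `facility_order$location_name`.

```r
# Made-up facility names as a plain character vector
location_name <- c("Port", "Central", "Military", "Port")

# Made-up desired display order (stand-in for facility_order$location_name)
facility_order <- c("Military", "Central", "Port")

# Convert to factor and set the level order in one step
location_fct <- factor(location_name, levels = facility_order)

levels(location_fct)
```

Plots that map this factor to an axis follow the level order instead of alphabetical order.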

Update github links in translations

All translations need updated Github links:

  • Github link on Welcome page
  • Almost all the links in the download data page

Use Find in Files tool to replace all:
epirhandbook/Epi_R_handbook to:
appliedepi/epirhandbook_eng

all the various pages that link to data (opening paragraphs)
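
If the replacement needs to be repeated across several translation repos, it could also be scripted. The sketch below is one possible approach in base R; `replace_in_files()` is a hypothetical helper, and the demo runs on a temporary folder rather than on a real repo.

```r
# Hypothetical helper: replace a fixed string in all matching files under `dir`
replace_in_files <- function(dir, old, new, pattern = "\\.Rmd$") {
  files <- list.files(dir, pattern = pattern, recursive = TRUE, full.names = TRUE)
  changed <- character()
  for (f in files) {
    text <- readLines(f, warn = FALSE)
    updated <- gsub(old, new, text, fixed = TRUE)  # literal match, no regex
    if (!identical(text, updated)) {
      writeLines(updated, f)
      changed <- c(changed, f)
    }
  }
  invisible(changed)  # paths of the files that were modified
}

# Demo on a throwaway folder
demo_dir <- file.path(tempdir(), "link_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_file <- file.path(demo_dir, "index.Rmd")
writeLines("See https://github.com/epirhandbook/Epi_R_handbook for the source.", demo_file)

changed <- replace_in_files(
  demo_dir,
  old = "epirhandbook/Epi_R_handbook",
  new = "appliedepi/epirhandbook_eng"
)
readLines(demo_file)
```

Files with non-ASCII text (most translations) may need an explicit encoding when reading and writing, so it is worth checking the result in a diff before committing.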

37 Transmission chains - Error with subset()

In section 37.3 Handling, the subset code produces an error:
sub_attributes <- subset(
  epic,
  node_attribute = list(
    gender = "m",
    date_infection = as.Date(c("2014-04-01", "2014-07-01"))
  ),
  edge_attribute = list(location = "Nosocomial")
)

Error in FUN(X[[i]], ...) :
Value for date_infection is not found in dataset
In addition: Warning message:
In if (!attribute %in% data) stop(paste("Value for", name_attribute, :
the condition has length > 1 and only the first element will be used

It looks like this is because the dates are stored as datetime when reading the "linelist_cleaned.xlsx" dataset.

Adding the line below when reading the data fixes the problem:

linelist <- rio::import("linelist_cleaned.xlsx") %>% mutate(across(starts_with("date"), as.Date))
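
The same conversion can be tried on a small stand-in dataset without the Excel file. The sketch below uses base R only; the data frame is made up, and the explicit `tz = "UTC"` is an assumption chosen so that the calendar date does not shift during conversion.

```r
# Made-up stand-in for the imported linelist, with datetime (POSIXct) date columns
linelist <- data.frame(
  case_id        = c("a1", "b2"),
  date_infection = as.POSIXct(c("2014-04-01", "2014-07-01"), tz = "UTC"),
  date_onset     = as.POSIXct(c("2014-04-05", "2014-07-04"), tz = "UTC")
)

# Convert every column whose name starts with "date" to class Date
date_cols <- startsWith(names(linelist), "date")
linelist[date_cols] <- lapply(linelist[date_cols], as.Date, tz = "UTC")

# Check the resulting classes
sapply(linelist, function(x) class(x)[1])
```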

Working with dates

Months and minutes are both written with the same strptime code. See lines 106-122 of dates.Rmd.
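
For reference, and assuming the issue refers to the usual strptime placeholders, months and minutes differ only by case: `%m` is the month number and `%M` is the minute. A short base R check:

```r
# A fixed timestamp: 4 March 2021, 15:07
stamp <- as.POSIXct("2021-03-04 15:07:00", tz = "UTC")

month_txt  <- format(stamp, "%m")  # lowercase m: month as a two-digit number
minute_txt <- format(stamp, "%M")  # uppercase M: minute as a two-digit number

# The same codes are used when parsing text into a datetime
parsed <- as.POSIXct("04/03/2021 15:07", format = "%d/%m/%Y %H:%M", tz = "UTC")
```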

Setup github actions

  • book a training date with bassem and those interested
  • write github actions so that any PR merged to the english repo (ideally with a "For translation" tag) triggers an issue to be created in all other language repos

Epicurves updates

Epicurves page:

Add to all ggplot chunks that use a breaks = statement
closed = "left", # count cases from start of breakpoint

Add bullet to explain its use, in Weekly Epicurve Example section

  • We use closed = "left" in geom_histogram() to ensure the dates are counted in the correct bins

Edit the multi-line date labels to include labels = scales::label_date_short().
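
A small sketch of how these two changes fit together is shown below. The dates are made up, with two cases falling exactly on a weekly breakpoint so that the effect of `closed = "left"` is visible; the object names are illustrative and are not taken from the Epicurves page.

```r
library(ggplot2)

# Made-up onset dates; two cases fall exactly on the second weekly breakpoint
onset <- data.frame(
  date_onset = as.Date(c("2021-01-04", "2021-01-11", "2021-01-11", "2021-01-14"))
)

# Weekly breakpoints (Mondays)
weekly_breaks <- seq(as.Date("2021-01-04"), as.Date("2021-01-18"), by = "week")

epicurve <- ggplot(onset, aes(x = date_onset)) +
  geom_histogram(
    breaks = weekly_breaks,
    closed = "left",            # count cases from start of breakpoint
    colour = "white"
  ) +
  scale_x_date(
    breaks = weekly_breaks,
    labels = scales::label_date_short()  # compact multi-line date labels
  )

# Counts per weekly bin, as computed by ggplot2
bin_counts <- ggplot_build(epicurve)$data[[1]]$count
```

With `closed = "left"`, a case dated on a breakpoint is counted in the week that starts on that date rather than in the week that ends on it.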

Remove one of the old examples using faceting.
Edit the green faceting example to remove the facet panel border.

Also this is updated in the ggplot tip page section on date axis.

Updates for all translations

Technical updates in pages, that enable rendering:

  • Cleaning page: change last R chunk at bottom (already echo=F) to eval=F. This prevents error that "file already exists" when saving the cleaned dataset.
  • GIS page, add drop_na(objectid) %>% to chunk that creates case_adm3_sf ~ line 605 # drop any empty rows. This prevents error about discrepancy between 9 and 10 rows.
  • GIS page: add OpenStreetMap:: around line 730 to openmap() command
  • Epicurves page: remove text and R code about cumulative for incidence2, which is now deprecated

Welcome page

  • Add donate button.
  • Add banner in language, and translators
  • Add "Languages:" in bold, just below top four bullet points on welcome page. Add links to English and others. Be sure to use https:// in front in the link.
  • Adjust Applied Epi descriptive text
  • Add Applied Epi logo to replace epiRhandbook logo
  • Ensure Core Team is subsumed into Authors and Supporters on welcome page

Epicurves page
Fixing issue #56 with these changes. Also need to add a sentence referring people to the Dates page if they are reading in an Excel file.

Inconsistent code appearance in 42.5

You can normally copy/paste most chunks of code from the book into RStudio, but much of the page on flexdashboards shows code as images instead. It is unclear whether this is intentional, but ideally the code would appear as chunks that can be copied.

Unnecessary message appearing under section 34.3

Minor point, there is a message appearing just before "Create heat plot" section in section 34.3
It says ## Joining, by = c("location_name", "week") and it appears as code that a user could copy/paste to R.
Probably good to just remove this message.
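
Two common ways to avoid this kind of message are to name the join columns explicitly with `by =`, or to set the knitr chunk option `message = FALSE`. The sketch below shows the first option with dplyr on made-up data; the data frames and column values are illustrative and are not the handbook's objects.

```r
library(dplyr)

# Made-up weekly reporting data
reports <- data.frame(
  location_name = c("Port", "Central"),
  week          = c(1, 1),
  n_days        = c(7, 5)
)

# Made-up expected reporting days
expected <- data.frame(
  location_name = c("Port", "Central"),
  week          = c(1, 1),
  expected_days = c(7, 7)
)

# Naming the join columns explicitly means dplyr has nothing to announce
joined <- left_join(reports, expected, by = c("location_name", "week"))
```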

Cannot find the survey data used in 26.2

Are there files available to run through the survey analysis section?
I couldn't find them after installing the handbook/or on github.
It would be helpful if such files were available to ensure users are getting the same results as presented online.

# import the survey data
survey_data <- rio::import("survey_data.xlsx")

# import the dictionary into R
survey_dict <- rio::import("survey_dict.xlsx")

Update alluvial plots to ggalluvial

Data begins in linelist format, with each row having a classification. Then transformed to "wide" with one column per time (values the classifications), plus one column for counts. Then transformed to long with ggalluvial-specific function.

Data preparation:

ELR_wide <- ELR %>% 
  pivot_wider(
    id_cols = c(country, who_region),
    names_from = week,
    values_from = final) %>% 
  drop_na() %>% 
  group_by(across(contains("2021"))) %>% 
  count() %>% 
  ungroup() %>% 
  mutate(across(.cols = -n, 
                .fns = ~recode(.x,
                              "1 - Critical" = "Critical",
                              "2 - Very high" = "Very high", 
                              "3 - High" = "High",
                              "4 - Medium" = "Medium",
                              "5 - Low" = "Low",
                              "6 - Minimal/NA" = "Minimal",
                              "No Data" = "No Data"))) %>% 
  mutate(across(.cols = -n,
                .fns = ~fct_relevel(.x, c(
                  "Critical",
                  "Very high", 
                  "High",
                  "Medium",
                  "Low",
                  "Minimal",
                  "No Data")))) %>% 
  mutate(across(.cols = -n,
                .fns = fct_rev))

#levels(ELR_wide$`2021-07-19`)


library(ggalluvial)
#is_alluvia_form(as.data.frame(ELR_wide), axes = 1:3, silent = TRUE)


ELR_long_alluvial_original <- to_lodes_form(data.frame(ELR_wide),
                              key = "date",
                              axes = 1:6) %>% 
  mutate(date = str_replace_all(date, "X2021.", "")) %>% 
  mutate(stratum = fct_relevel(stratum,
                                    c("Critical",
                                      "Very high", 
                                      "High",
                                      "Medium",
                                      "Low",
                                      "Minimal",
                                      "No Data")))

# add current ELR classifications
ELR_long_alluvial <- left_join(
  ELR_long_alluvial_original,
  ELR_long_alluvial_original %>%
              select(-n) %>% 
              filter(date == max(date, na.rm=T)) %>% 
              rename(
                current_date = date,
                current_ELR = stratum),
            by = c("alluvium" )) 

# add earliest ELR classifications
ELR_long_alluvial <- left_join(
  ELR_long_alluvial,
  ELR_long_alluvial_original %>%
              select(-n) %>% 
              filter(date == min(date, na.rm=T)) %>% 
              rename(
                oldest_date = date,
                oldest_ELR = stratum),
            by = c("alluvium" )) 

Data visualization with ggalluvial:

# plot showing current classification
ggplot(data = ELR_long_alluvial,
       aes(x = date, stratum = stratum, alluvium = alluvium,
           y = n, label = stratum)) +
  geom_alluvium(aes(fill = current_ELR)) +
  geom_stratum() +
  geom_text(stat = "stratum", size = 1) +
  theme_minimal() +
  scale_fill_manual(
    values = c(
      "Critical" = "black",
      "Very high" = "darkred",
      "High" = "red",
      "Medium" = "darkorange",
      "Low" = "darkgreen",
      "Minimal" = "green",
      "No Data" = "grey"))+
  labs(
    title = "[INTERNAL] ELR classification trajectories\ncolored by current classification (global)",
    fill = str_glue("ELR classification\non {max(ELR_long_alluvial$date, na.rm=T)}"),
    caption = "Last 6 weeks of data")

Surveys page updates

New to add:

  • group_by estimates e.g. for strata
  • weighted regression
  • isidro's cluster weighting code to account for diff population sizes by cluster

Trouble with EpiNow2 Code

The EpiNow2 code chunk below returned an error message:

# run epinow
epinow_res <- epinow(
  reported_cases = cases,
  generation_time = generation_time,
  delays = delay_opts(incubation_period),
  return_output = TRUE,
  verbose = TRUE,
  horizon = 21,
  stan = stan_opts(samples = 750, chains = 4)
)

Error Message:
Logging threshold set at INFO for the EpiNow2 logger
Writing EpiNow2 logs to the console and: C:\Users\AKARST1\AppData\Local\Temp\RtmpcZOade/regional-epinow/2015-04-30.log
Logging threshold set at INFO for the EpiNow2.epinow logger
Writing EpiNow2.epinow logs to the console and: C:\Users\AKARST1\AppData\Local\Temp\RtmpcZOade/epinow/2015-04-30.log
Error in seq.int(0, to0 - from, by) : 'to' must be a finite number

Reduce size of all language repos

Follow this sequence of actions (see two linked issues below):

  • INFORM REPO COORDINATOR
  • create empty repo with "_test" name extension (and fix language extension as per #10)
  • clone new repo locally
  • transfer important files (remove things like OLD, etc.... don't forget to keep index.Rmd and common.R!)
  • fix github links (as per #2)
  • push
  • publish via netlify on "test domain" to see if book builds
  • edit name of old repo to replace "_test" with "_archive" (and archive the repo on github)
  • edit name of new repo to have appropriate language ending (e.g. "_vn")
  • follow the digital ocean SOP for adding to the server

Languages to complete:

  • eng
  • vn
  • tr
  • mn
  • es
  • jp
  • fr

poorly formatted x-axis labels

In section "31.8 Highlighting" the x-axis labels are poorly formatted and cannot be read clearly.

Same problem in "31.11 Dual axes"

Could have a piece on how to correctly format labels?

Offline versions for translations

Thanks very much to Thuan from the Vietnamese team for pointing out that the offline version of the Vietnamese translation is not available from the website (I think it just needs to be knit).

The Vietnamese version does not have an offline version although it is not necessarily required.

translations in R basics page

Somehow there were some translated sentences that went live in the R Basics page. Need to be removed and re-rendered.

GIS and local information

@yuriei @ishaberry @ebuajitti and @aspina7 @hitomik723
This is a copy of an issue from epiRhandbook_jp regarding gis.Rmd (28 GIS basics):
appliedepi/epiRhandbook_jp#5

Geographic Information System (GIS) is geography-dependent, i.e. the national/local governments may maintain their coordinate systems and provide data such as national census and buildings. I suggest adding local information to the Japanese version of GIS.Rmd.

Below is a sample of additional information for readers in Japan.

Key terms

Visualizing spatial data {.unnumbered}

CRS used in Japan

* Japan Plane Rectangular CS I to XIII, EPSG: 2443-2455
* JGD2011 GRS80 ellipsoid, EPSG:6668

Getting started with GIS

GIS data available for Japan

*  [Census](http://e-stat.go.jp/SG2/eStatGIS/page/download.html) population by address
*  [Suuchi](http://nlftp.mlit.go.jp/ksj/) - a variety of features (e.g. hospitals, schools, school districts)
*  [Kiban](http://www.gsi.go.jp/kiban/etsuran.html) - e.g. building and road edges
*  [Geospatial Japan](https://www.geospatial.jp/ckan/dataset) - a portal of GIS data by local governments and others

Preparation

Sample case data

The code uses SRID=4326. This is mainly for the USA. The Japanese government maintains its own coordinate system, which was revised in 2000 and 2011 (the latter following the Great East Japan Earthquake).

| Coordinates | EPSG | Region |
| :--- | :--- | :------- |
| lat/lon | 4326 | US |
| lat/lon | 3857 | Web |
| lat/lon | 4612 | Japan (JGD2000) |
| lat/lon | 6668 | Japan (JGD2011) |

40.4 Incomplete explanation of knitr option `child=`

Found a 'note to self' under File structure, subsection 'Source other files' (in chapter 40 section 4). It looks like the explanation for knitr option child= is incomplete or something was meant to be added here at a later stage. See image.
Thank you for checking this!
[Image: issue_ch40_2021-11-22]

Typo: Push and pull, main vs master

with thanks to @babayoshihiko.

  • collaboration page L.674 should read: "PUSH - Clicking the green "Push" icon (upward arrow). You may be asked"
  • collaboration page L.734 should read: "git push # Push local commits of this branch to the remote branch"
  • also consider switching "master" to "main" in line with new approach outlined by github (unsure if this has been universally implemented)

Typo: Factors opening sentences

Email from [email protected] to epiRhandbook on 28 July

Hello,

Thank you very much for providing the public with The Epidemiologist R Handbook. I am a relative novice in R but am reaching out to clarify the definition for factors that you list here: https://epirhandbook.com/factors.html

The use of "ordered" and "order" in the first and second sentences, respectively, seem to suggest that factors in R are only useful for ordinal variables. Can you clarify this? I could see how this may throw some users off.

Thank you again for this wonderful reference text.

Ryan
Ryan S. Babadi, PhD, MPH
Postdoctoral Research Fellow
Department of Environmental Health
Harvard T.H. Chan School of Public Health

Hi Ryan,

Good to hear from you - I'm glad you are finding the Handbook helpful. Thanks for writing to us about this wording, I appreciate the detailed feedback.

You are correct in identifying that one can convert a column to class factor without defining an order to the factor levels. This would presumably be with the intention of setting a limited range of acceptable values. However, in my experience with R in applied epi, the vast majority of use cases for factors are centered around specifying the level order, so that's what I focused on in writing the opening text to this page.

I've made a note to revise this language in version 2 of the Handbook, to be more clear. Thanks again for writing to us,
Neale
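
The distinction discussed in this exchange can be seen in a few lines of base R. This is only an illustrative sketch with made-up values, not text from the Factors page.

```r
# Made-up outcome values
outcome <- c("Recover", "Death", "Recover", "Unknown")

# A factor without a specified level order: levels default to sorted order
outcome_fct <- factor(outcome)
levels(outcome_fct)

# The same values with an explicit level order (controls display order in tables and plots)
outcome_lvl <- factor(outcome, levels = c("Recover", "Death", "Unknown"))
levels(outcome_lvl)

# Neither is an *ordinal* factor unless ordered = TRUE is requested
is.ordered(outcome_lvl)
outcome_ord <- factor(outcome, levels = c("Recover", "Death", "Unknown"), ordered = TRUE)
is.ordered(outcome_ord)
```

Setting the level order affects how values are displayed and sorted, while `ordered = TRUE` additionally tells R that the levels represent a ranking.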

Error in Chapter 32 Epidemic Curve

Hi there,

One of our Japanese translators (@KoKYura) let me know an issue in the R code chunk below in Chapter 32 Epidemic Curves. 

epi_day <- incidence(       # create incidence object
  x = linelist,             # dataset
  date_index = date_onset,  # date column
  interval = "day"          # date grouping interval
  )
Error: Not implemented for class POSIXct, POSIXt

The error occurs because the date fields are imported as "POSIXct" in the previous code that uses import() function.
I was wondering if we could set the variable class as date by using readxl::read_excel() rather than import() function? Or we could convert the POSIXct variables to the date class right after the dataset is imported, using some code like this:

linelist <- import("linelist_cleaned.xlsx")

linelist[] <- lapply(linelist, function(x) {
    if (inherits(x, "POSIXct")) as.Date(x) else x
})

Please let me know if I can help you with modifying the code.

Thank you,
Hitomi
