epiforecasts / covid-rt-estimates

National and subnational estimates of the time-varying reproduction number for Covid-19

Home Page: https://epiforecasts.io/covid/

License: MIT License

Languages: R 96.12%, Python 2.32%, Shell 1.36%, Dockerfile 0.20%
Topics: country, covid-19, epinow2, open-source, reproduction-number, subnational-estimates

covid-rt-estimates's People

Contributors

chrispaulbennett, hamishgibbs, jallen42, joehickson, kathsherratt, sbfnk, seabbs

covid-rt-estimates's Issues

Running Rscript R/run-region-updates.R in a single session appears to lead to a crash

Not sure what is driving this, but it may be a RAM issue (i.e. RAM not being released). As this is flagged as the primary tool for running these estimates, it would be sensible to either note that there is an issue or point at a more robust approach (i.e. running one region at a time, as in the infra repo) so that others can easily reproduce.

This may actually be pointing at a more serious issue, though I have not been able to diagnose one.

Summary not generated due to a gsub error.

See server logs for details of the error.

I have sidestepped this for now by adding a tryCatch call around regional_epinow, which ensures that regional_summary always gets called.

I have been unable to reproduce this anywhere but the production server and have not been able to inspect the line number that is causing the error there. Manual inspection of the code (both here and in EpiNow2) does not shed any light.
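A minimal sketch of the workaround; run_estimates() and build_summary() are hypothetical stand-ins for the production calls around EpiNow2::regional_epinow and regional_summary, not real functions in this repo:

# Trap errors from the estimation step so the summary step always runs.
out <- tryCatch(
  run_estimates(region),  # hypothetical wrapper around EpiNow2::regional_epinow
  error = function(e) {
    message("estimation failed: ", conditionMessage(e))
    NULL
  }
)
build_summary(out)  # always reached, even if estimation errored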

Saved dated copies of each summary folder

Whilst it would be nice to be able to store full samples from each region, it may be more practical in the short term to instead save the summarised estimates available in each summary folder. This could either be linked to #9 or be a git-only implementation based on copying the summary folder (or just the CSVs it contains) into a dated folder for each region.

If a git-based intervention is used, summarised results could be stored in, for example, a covid-rt-estimates-archive repo in order to avoid additional history bloat and to provide a repo that is easier for others to download (as it won't contain the history of samples and hence will be much smaller, even with several hundred archived CSVs).
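A minimal sketch of the copying step, assuming the existing <region>/summary layout; the "archive" directory name and example path are illustrative:

# Copy a region's summary CSVs into a dated archive folder.
archive_summary <- function(region_dir) {
  dated <- file.path(region_dir, "archive", as.character(Sys.Date()))
  dir.create(dated, recursive = TRUE, showWarnings = FALSE)
  csvs <- list.files(file.path(region_dir, "summary"),
                     pattern = "\\.csv$", full.names = TRUE)
  file.copy(csvs, dated, overwrite = TRUE)
}

archive_summary("subnational/united-kingdom/cases")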

Filter on most recent data used.

Currently, we use a flat filter on all datasets to only start using data 3 days after it is first reported. This is because in some datasets updates occur over time that change reported case counts. If used without truncation, these datasets lead to estimates that are biased downwards, which needs to be avoided. However, many datasets do not have this issue, and having a flat cut-off limits how real-time our estimates can be.

There are two solutions to this:

  1. Reviewing each dataset to check whether it shows evidence of this behaviour and setting the cut-off accordingly. @kathsherratt has been doing a lot of work on the data and may have some thoughts on how feasible this is.

  2. Scanning the data over multiple days (a copy of the reported data is kept in the summary folder as reported_cases.csv), detecting if this is happening, and setting the truncation dynamically for each dataset accordingly. @joeHickson this would be more involved but also obviously more robust and would improve real-time performance. We can't really do this until we start storing multiple copies of the data. A rough sketch follows this list.
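A rough sketch of option 2, assuming dated snapshots of reported_cases.csv are kept; the file layout and the region/date/confirm column names are assumptions for illustration:

library(data.table)

# Compare two snapshots and work out how many recent days to drop.
days_to_truncate <- function(old_file, new_file) {
  old <- fread(old_file)
  new <- fread(new_file)
  both <- merge(old, new, by = c("region", "date"), suffixes = c("_old", "_new"))
  revised <- both[confirm_old != confirm_new]
  if (nrow(revised) == 0) return(0L)
  # truncate back to just before the earliest date still being revised
  as.integer(Sys.Date() - min(revised$date)) + 1L
}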

RAM usage.

On a run with the new set-up, I am currently seeing about 32 GB of RAM usage vs (I think) about 5 GB max using the previous version. This could be a false impression, as I have no log of RAM usage, but we should keep an eye on this to make sure it doesn't increase.

R is pretty bad at letting go of RAM within a single session, so I could imagine this being a factor.
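A simple way to get at least a rough RAM trail in the meantime (a sketch, assuming futile.logger is the logger in use; this only sees memory accounted to the main R process, not forked workers):

# Log R's own memory accounting after each region; column 2 of gc()'s
# output matrix is the "(Mb)" used column for Ncells/Vcells.
log_mem <- function(label) {
  mb <- sum(gc(full = TRUE)[, 2])
  futile.logger::flog.info("%s: ~%.0f Mb in use", label, mb)
}

log_mem("after colombia")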

Format logs for ingestion into Azure Monitor

Azure requires the timestamp to be the first thing on the line. I imagine other log-ingestion tools won't object to this!
Set the format on the file logger to:
"~t ~l [~n.~f] : ~m" instead of the default of "[~l] [~t] [~n.~f] ~m"
Timestamp, Level, Namespace.CallingFunction, Message
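Assuming the logger in use is futile.logger (whose layout tokens match the ~t/~l/~n/~f/~m strings above), a sketch of the change:

library(futile.logger)

# Put the timestamp first so Azure Monitor can ingest the lines.
flog.layout(layout.format("~t ~l [~n.~f] : ~m"))
flog.appender(appender.file("update.log"))
flog.info("layout check")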

error in number

Hi, I'm from Iraq.
Good day.
Every time I run the same source data through the Excel equation, it gives me different estimation and forecasting results.

Thanks
Sabri

Depend on a tag or commit version of EpiNow2

Currently, the estimates are based on the latest master version of EpiNow2. As GitHub is our dev version of EpiNow2, this introduces some constraints (with the CRAN version being the true release version). It would make sense to make the versioning system for EpiNow2 more manual in this repo. I think the sensible way to do this is to target either certain release tags (potentially the nicest option) or commits (easier, as we don't need to add a tag on EpiNow2); either way it is not a major feature. The pinned version can then be manually incremented, which frees EpiNow2 development from having to worry about introducing breaking changes here. It also makes it easier to test EpiNow2 upstream changes on a regular basis before putting them in production.

This issue is a particular problem now that we are looking to make breaking changes to the interface of EpiNow2 but cannot merge a PR into master, as it would break this repository. This has led to a single large version PR building up in EpiNow2, which is not the ideal dev cycle and keeps important updates that we are using elsewhere away from users.
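A sketch of how the pin could work with remotes; the tag and SHA below are placeholders:

# Target a release tag ...
remotes::install_github("epiforecasts/EpiNow2@v1.2.1")
# ... or an exact commit (placeholder SHA)
remotes::install_github("epiforecasts/EpiNow2@abc1234")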

Adding additional subnational estimates.

We are interested in supporting an increased range of subnational estimates. Ideally, contributors will be able to do the majority of the linking work in order for the core team to focus on optimizing the code and theoretical considerations. In order for this to be possible, we need several things.

  1. Whilst we would like to support estimates for as many places as possible, it is key that areas with the fewest other resources/greatest need are prioritized. Users should be encouraged to open an issue highlighting the subnational area they would like estimates to be produced for, flag the available data, and outline to what level they are able to contribute towards the necessary integration steps. At this time we are unlikely to be able to provide subnational estimates below the level 1 geographic region, though we would be happy to support external implementations.
  2. Users need to be linked to covidregionaldata and in particular the SMG, which outlines the steps required for adding subnational datasets. This step is required in order for us to support estimates.
  3. Users need to be provided with documentation on how to add a new subnational estimate to this repo.
  4. Ideally, once a new subnational estimate has been added, it will be tested and the results explored before entering daily production.

Capture R errors

The logging is very useful, but when an issue occurs that doesn't have a built-in logging message there is no feedback as to why it may have occurred. Obviously one option would be to add logging messages for all possible issues, but this would end up being essentially a rewrite of R's own warning/error system. Given this, it might be a good idea to set up console sinking to a separate logging file (https://stackoverflow.com/questions/11666086/output-error-warning-log-txt-file-when-running-r-script-under-command-line) as a secondary debug tool.
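A sketch of the sinking approach from the linked answer; the file name is illustrative:

# Divert stdout and the message stream (warnings/errors) to a debug file
# alongside the structured logs.
con <- file("debug-console.log", open = "wt")
sink(con)
sink(con, type = "message")
# ... run the update ...
sink(type = "message")
sink()
close(con)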

Increase timeout in the UK

The UK case data is problematic at the moment due to the underlying increase in cases and the breakdown of reliable testing data. This instability increases the difficulty of fitting the model. Given this, it looks like the timeout needs to be increased in order to provide estimates.

Update to EpiNow2 1.2.1

Just released to GitHub master and so may have some teething problems. The scale of this update means we need to test it before pushing into production.

Most changes are interface-related and walked through in the README.

Current problem locations

Starting a ticket to highlight problem locations that are having issues (from runtimes.csv).

Timeouts:
dataset | subregion | start_date | runtime
belgium | Brussels | 15/09/2020 16:13 | 999999
belgium | Flanders | 15/09/2020 16:13 | 999999
belgium | Unknown | 15/09/2020 16:13 | 999999
belgium | Wallonia | 15/09/2020 16:13 | 999999
brazil | São Paulo | 15/09/2020 16:31 | 999999
canada | New Brunswick | 15/09/2020 18:33 | 999999
germany | Baden-Württemberg | 15/09/2020 20:58 | 999999
india | Manipur | 15/09/2020 22:10 | 999999
italy | Lombardia | 17/09/2020 05:58 | 999999
italy | Trentino-Alto Adige | 17/09/2020 05:58 | 999999
united-kingdom | East Midlands | 17/09/2020 07:59 | 999999
united-kingdom | North West | 17/09/2020 07:59 | 999999
united-kingdom | Yorkshire and The Humber | 17/09/2020 07:59 | 999999

I'm re-running Belgium now to see if it's still an issue.

Cases and deaths still failing.

The new implementation of global cases and deaths fails for all scales (national and regional).

I see variants of the following error tree:

WARN [2020-09-15 00:05:56] cases: NULL - mccollect, jobs, TRUE
ERROR [2020-09-15 00:05:56]      █
ERROR [2020-09-15 00:05:56]   1. ├─base::tryCatch(...)
ERROR [2020-09-15 00:05:56]   2. │ └─base:::tryCatchList(expr, classes, parentenv, handlers)
ERROR [2020-09-15 00:05:56]   3. │   └─base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
ERROR [2020-09-15 00:05:56]   4. │     └─base:::doTryCatch(return(expr), name, parentenv, handler)
ERROR [2020-09-15 00:05:56]   5. ├─base::withCallingHandlers(...)
ERROR [2020-09-15 00:05:56]   6. ├─global::run_regional_updates(datasets = datasets, args = args)
ERROR [2020-09-15 00:05:56]   7. │ └─global::rru_process_locations(datasets, args, excludes, includes)
ERROR [2020-09-15 00:05:56]   8. │   ├─base::tryCatch(...)
ERROR [2020-09-15 00:05:56]   9. │   │ └─base:::tryCatchList(expr, classes, parentenv, handlers)
ERROR [2020-09-15 00:05:56]  10. │   │   └─base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
ERROR [2020-09-15 00:05:56]  11. │   │     └─base:::doTryCatch(return(expr), name, parentenv, handler)
ERROR [2020-09-15 00:05:56]  12. │   ├─base::withCallingHandlers(...)
ERROR [2020-09-15 00:05:56]  13. │   └─global::update_regional(...)
ERROR [2020-09-15 00:05:56]  14. │     └─EpiNow2::regional_epinow(...)
ERROR [2020-09-15 00:05:56]  15. │       └─future.apply::future_lapply(...)
ERROR [2020-09-15 00:05:56]  16. │         └─future.apply:::future_xapply(...)
ERROR [2020-09-15 00:05:56]  17. │           └─future::future(...)
ERROR [2020-09-15 00:05:56]  18. │             └─future:::makeFuture(...)
ERROR [2020-09-15 00:05:56]  19. │               └─future:::fun(...)
ERROR [2020-09-15 00:05:56]  20. │                 ├─future::run(future)
ERROR [2020-09-15 00:05:56]  21. │                 └─future:::run.MulticoreFuture(future)
ERROR [2020-09-15 00:05:56]  22. │                   └─future:::requestCore(...)
ERROR [2020-09-15 00:05:56]  23. │                     └─future:::usedCores()
ERROR [2020-09-15 00:05:56]  24. │                       └─future:::FutureRegistry(reg, action = "list", earlySignal = TRUE)
ERROR [2020-09-15 00:05:56]  25. │                         └─future:::collectValues(where, futures = futures[idxs], firstOnly = FALSE)
ERROR [2020-09-15 00:05:56]  26. │                           ├─future::resolved(future, run = FALSE)
ERROR [2020-09-15 00:05:56]  27. │                           └─future:::resolved.MulticoreFuture(future, run = FALSE)
ERROR [2020-09-15 00:05:56]  28. │                             └─future:::signalEarly(x, ...)
ERROR [2020-09-15 00:05:56]  29. │                               ├─future::result(future)
ERROR [2020-09-15 00:05:56]  30. │                               └─future:::result.MulticoreFuture(future)
ERROR [2020-09-15 00:05:56]  31. │                                 └─future:::FutureRegistry(...)
ERROR [2020-09-15 00:05:56]  32. │                                   └─future:::collectValues(where, futures = futures[idxs], firstOnly = FALSE)
ERROR [2020-09-15 00:05:56]  33. │                                     ├─future::resolved(future, run = FALSE)
ERROR [2020-09-15 00:05:56]  34. │                                     └─future:::resolved.MulticoreFuture(future, run = FALSE)
ERROR [2020-09-15 00:05:56]  35. │                                       └─future:::signalEarly(x, ...)
ERROR [2020-09-15 00:05:56]  36. │                                         ├─future::result(future)
ERROR [2020-09-15 00:05:56]  37. │                                         └─future:::result.MulticoreFuture(future)
ERROR [2020-09-15 00:05:56]  38. │                                           └─base::stop(ex)
ERROR [2020-09-15 00:05:56]  39. └─(function (e) ...
ERROR [2020-09-15 00:05:56] cases: Failed to retrieve the result of MulticoreFuture (future_lapply-72) from the forked worker (on localhost; PID 1620). Post-mortem diagnostic: No process exists with this PID, i.e. the forked localhost worker is no longer alive. -
WARN [2020-09-15 00:05:56] simpleWarning in outcome[[location$name]]$start <- start: Coercing LHS to a list

runtimes.csv analysis

We now have a regularly updated runtimes.csv that contains a large amount of information on the status of updates. Automated analysis could be performed on this to give the last update date for each region, the number of subregions in each region that timed out in that update, their last successful update, etc. This could then be extended in multiple directions to help track the performance of EpiNow2 and provide information for future model development.

Potentially some of this analysis could be pushed to epiforecasts.io/covid so that more casual users can easily track the status of updates.
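A starting point for the analysis, assuming runtimes.csv roughly matches the columns shown in the "Current problem locations" issue above (dataset, subregion, start_date, runtime, with 999999 marking a timeout):

library(data.table)

runtimes <- fread("runtimes.csv")
# Parse the dd/mm/yyyy hh:mm timestamps before taking maxima.
runtimes[, start_date := as.POSIXct(start_date, format = "%d/%m/%Y %H:%M")]
status <- runtimes[, .(
  last_update = max(start_date),
  timed_out   = sum(runtime == 999999),
  subregions  = uniqueN(subregion)
), by = dataset]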

Document and tidy run-region-update

Currently this is built using a series of functions that are not called in a completely clear order. It would be great to streamline this a little and provide some more supporting documentation.

Move output storage out of git and keep historic estimates

At the moment estimates are stored in git and overwritten with each new update. This causes two issues: 1. historic estimates are not available, and 2. the size of the git repo grows over time, meaning the history must periodically be cleaned.

The ideal solution would allow programmatically pushing results for each update with an easy-to-use, bash-friendly login process.

Truncation in UK test positive estimates.

Looking at the UK test-positive estimates, it looks like the delay to report has increased and our real-time estimates are now biased downwards in some regions. @kathsherratt have you seen any data changes? We may need to increase the time lag for this dataset only in order to avoid this for now. @joeHickson that should be possible now, right?

Null values

Working with the summary data:

national/deaths/summary/summary_table.csv

and rt.csv, cases_by_infection.csv, and cases_by_report.csv.

There are null summary values in summary_table.csv and missing ±20% CI estimates in the other three files. This appears to affect only Papua New Guinea.

This has caused some issues in rt_vis and RtD3 (and subsequently the epiforecasts site where RtD3 is implemented), which are in the process of being fixed.

We will work on accounting for the possibility of null values in summary estimates, but I wanted to raise the issue here in case it reflects some broader issue.

It looks like Papua New Guinea is currently the shortest time series (10 days) which may or may not have anything to do with the problem.

Thanks!

Subnational estimates not updating in list

When estimates are run in a list using bin/update-via-docker.sh, national estimates run without issue. Subnational estimates all hang with 0% CPU usage.

The docker logs indicate that the code has reached the EpiNow2::epinow function (i.e. each region is running).

I have been unable to reproduce this in R, in bash, or via docker (replacing the docker run command in bin/update-via-docker.sh with an Rscript call).

Any ideas on this issue would be helpful, as it is currently the only blocker to running a full update using EpiNow2.

Add estimates for a complete time series in each country/subnational region

Due to computational and storage constraints, it is not feasible to run the complete time series every day with our current resources. For this reason, we have shifted our daily updates to focus on a rolling window of the last 3 months of data. Many users may be interested in the complete time series - please respond to this issue in order for us to assess the priority of this requirement.

We are considering two solutions to this issue:

  1. Piecing together daily estimates from rolling model fits
  2. Running less frequent complete time series runs and linking these to our real-time estimates.

Specify root path for data output

blocks #9
Specifying the root path allows for shifting the data output into a scratch location away from the git checkout. This location will then be used to publish the data out elsewhere.

Currently this will default to the existing location, but potentially it should shift to defaulting to ./data/generated and be added to the gitignore. This final move would also suggest shuffling the current .rds files from ./data to ./data/source or ./data/reference to help keep the data folder tidy.

Only half the cores used on Azure in Docker

When running an update in docker on an Azure cluster only half of the available cores are used.

Cores are allocated using setup_future in R/utils.R. This uses future::availableCores() internally and should default to all cores when jobs > cores; when jobs < cores, the remaining cores should be shared between jobs and used to run multiple MCMC chains. Local tests indicate all of these features are working as intended outside of Azure/scripted use in docker.
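A minimal sketch of the intended allocation (the real logic lives in setup_future in R/utils.R; the job count here is illustrative):

library(future)

cores <- availableCores()
jobs <- 4  # e.g. number of regions to fit in parallel
# All cores are in use when jobs >= cores; spare cores run extra MCMC chains.
workers <- min(jobs, cores)
chains_per_job <- max(1L, floor(cores / jobs))
plan(multicore, workers = workers)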

Death to report

This delay needs to be updated based on literature/public estimates. @kathsherratt do you have any ideas on this? When I checked the line-list, I saw just 6 observations with complete data.

Long runtimes in some regions

EpiNow2 has large differences in runtime between regions. Some of this may be because some of the regions being fit have very little to no data. Identifying these regions is the first step towards fixing this behaviour.

Add global and WHO region estimates.

There is a policy interest in having global and higher-than-country-level estimates. These can either be implemented in the current case and death scripts or split out into their own processing stage. The best option is likely to split them out, as higher-level estimates will otherwise dominate the summary plots.

csv output from new UK data sources

Linked to #62, adding new data sources for UK estimates.

For these to be used, ideally the summary of Rt estimates needs to be output in a single csv file.

This should

  • Combine Rt estimates from each data source
  • Include a data source identifier
  • Filter the type variable to only the Rt "estimates" (not the estimates from partial data or forecasts)
  • Save as a csv in a dedicated folder within the united-kingdom folder.

Below is example code to get to what is needed. In this example the new dedicated folder for output is called "all-summary-rt".

However, I am not sure where this needs to go so that it runs in the process after the daily Rt estimates have finished.

# Get Rt estimates summary from each data source
library(data.table)

cases <- fread(here::here("subnational", "united-kingdom", "cases", "summary", "rt.csv"))
cases <- cases[cases$type == "estimate"][, data_source := "test-positive cases"]

deaths <- fread(here::here("subnational", "united-kingdom", "deaths", "summary", "rt.csv"))
deaths <- deaths[deaths$type == "estimate"][, data_source := "deaths"]

admissions <- fread(here::here("subnational", "united-kingdom", "admissions", "summary", "rt.csv"))
admissions <- admissions[admissions$type == "estimate"][, data_source := "hospital admissions"]

# Bind all data sources
uk_rt <- rbindlist(list(cases, deaths, admissions), fill = TRUE, use.names = TRUE)

# Save back to a dedicated folder in the main UK folder, creating it if needed
out_dir <- here::here("subnational", "united-kingdom", "all-summary-rt")
dir.create(out_dir, showWarnings = FALSE)
write.csv(uk_rt, file.path(out_dir, paste0(Sys.Date(), "-uk-rt.csv")), row.names = FALSE)

Issue with estimates not being run and logs

I am seeing the following:

INFO [2020-09-10 12:46:07] Data has not been updated since last run. If wanting to run again then remove /home/rstudio/covid-rt-estimates/last-update/colombia.rds
ERROR [2020-09-10 12:46:07]      █
ERROR [2020-09-10 12:46:07]   1. ├─base::tryCatch(...)
ERROR [2020-09-10 12:46:07]   2. │ └─base:::tryCatchList(expr, classes, parentenv, handlers)
ERROR [2020-09-10 12:46:07]   3. │   └─base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
ERROR [2020-09-10 12:46:07]   4. │     └─base:::doTryCatch(return(expr), name, parentenv, handler)
ERROR [2020-09-10 12:46:07]   5. ├─base::withCallingHandlers(...)
ERROR [2020-09-10 12:46:07]   6. ├─global::run_regional_updates(regions = regions, args = args)
ERROR [2020-09-10 12:46:07]   7. │ └─global::rru_process_locations(regions, args, excludes, includes)
ERROR [2020-09-10 12:46:07]   8. │   ├─base::tryCatch(...)
ERROR [2020-09-10 12:46:07]   9. │   │ └─base:::tryCatchList(expr, classes, parentenv, handlers)
ERROR [2020-09-10 12:46:07]  10. │   │   └─base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
ERROR [2020-09-10 12:46:07]  11. │   │     └─base:::doTryCatch(return(expr), name, parentenv, handler)
ERROR [2020-09-10 12:46:07]  12. │   ├─base::withCallingHandlers(...)
ERROR [2020-09-10 12:46:07]  13. │   └─global::update_regional(...)
ERROR [2020-09-10 12:46:07]  14. └─base::.handleSimpleError(...)
ERROR [2020-09-10 12:46:07]  15.   └─h(simpleError(msg, call))
ERROR [2020-09-10 12:46:07] colombia: object 'out' not found - update_regional, location, excludes[region == location$name], includes[region == location$name], args$force, args$timeout
WARN [2020-09-10 12:46:07] simpleWarning in outcome[[location$name]]$start <- start: Coercing LHS to a list

New summary table for status overview

In run-regional-updates, when processing the run outcome, handle a new csv (status.csv?) that contains the latest summary for each location. This may also help with #54.

Perhaps the following columns:

dataset | subregion | last run timestamp | last run status | latest results generated at | latest results data up to
united-kingdom | * | 2020-09-30 13:45:53 | No New Data Available | 2020-09-29 05:34:23 | 2020-09-25
united-kingdom | London | 2020-09-30 13:45:53 | No New Data Available | 2020-09-29 05:34:23 | 2020-09-25

last run status = Success | Error | Timed Out | No New Data Available (I think we should be able to detect all these conditions)
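A sketch of assembling status.csv when processing the run outcome; the per-location outcome structure (locations and its fields) is hypothetical:

# Build one row per location from the run outcome and write it out.
status <- data.table::rbindlist(lapply(locations, function(loc) {
  data.table::data.table(
    dataset = loc$dataset,
    subregion = loc$subregion,
    last_run_timestamp = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    last_run_status = loc$status,  # Success | Error | Timed Out | No New Data Available
    latest_results_generated_at = loc$generated_at,
    latest_results_data_up_to = loc$data_up_to
  )
}))
data.table::fwrite(status, "status.csv")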

Reschedule estimates

Due to a package mismatch between the server and the code, the update failed last night. @joeHickson should we clean out the last-update folder and reschedule?
