Repository for regularly updated data sets used by covid-data-model. Anything that we want to re-fetch periodically should live in this repo, along with the scripts that automate the updates.
- We recently changed our default branch from master to main. If you have the repo checked out locally, you can update with the following:
  $ git branch --unset-upstream
  $ git branch -u origin/main
- Use README.md files to document where data has been sourced from.
- Data is updated twice daily (midnight and noon, UTC) by a GitHub Action defined here. The action runs update.sh and then push_update.sh. To trigger a manual update, see the details in the workflow definition.
- Don't check in multiple versions of the same data. We can rely on git history instead.
- If data is being downloaded / scraped by a script, check the script in under scripts/
- Git LFS is required to correctly check out at least the US Census Shapefiles for now because they are very big. You have to run something like git lfs install and git lfs fetch.
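The guidelines above on keeping a single version of each data set and checking scraper scripts in under scripts/ can be illustrated with a minimal sketch. It assumes a script that already holds the downloaded bytes and wants to overwrite the checked-in file in place, leaving git history to track older versions. The function name `write_dataset` and the path `data/example/data.csv` are hypothetical and do not come from this repo.

```python
import os
import tempfile
from pathlib import Path


def write_dataset(data: bytes, destination: Path) -> Path:
    """Overwrite `destination` in place so only one version of the data exists on disk."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it over the
    # old file, so a failed write never leaves a half-written data set behind.
    fd, tmp_name = tempfile.mkstemp(dir=str(destination.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_name, destination)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return destination
```

A script under scripts/ could call `write_dataset(downloaded_bytes, Path("data/example/data.csv"))` on every run. Because the file name never changes, each update shows up as a normal diff in git rather than as a new dated copy.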
These are data sets that we've found that look interesting and we may want to consider pulling in the future.
- Johns Hopkins Data
- Scraped Data
- covid19-vis - CSV Dataset of interventions at county/state level including start dates.
- US State-Level Containment Policies
- AEI Action Tracker
  - Their source: National Governors Association
- NYTimes Stay-At-Home Orders
- LA Times Tracker (California only)
- Local Action Tracker
- National Association of Counties Tracker -- looks promising
- Oxford COVID-19 Response Tracker - Historical country-level intervention data
- Stateside State & Local Government Report
- Test and Trace
- American Hospital Directory (Link is to CA data, but seems to support any state)
- Medicare Claims for Inpatients
- CA Healthcare Facilities
- American Hospital Association (paywalled)
We recommend installing all requirements in a virtualenv. To set up your virtualenv, follow the steps here.
Once you have activated the venv, run make setup-dev to install packages in it.
We use black to automatically format python code. One way we keep this maintainable is by using a pre-commit step that automatically reformats modified files on commit.
To manually kick off a data update (e.g. if a previous update failed and a fix was merged), get a GitHub personal access token, and run:
GITHUB_TOKEN=<YOUR PERSONAL GITHUB TOKEN> ./tools/trigger-data-update.sh
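If you would rather drive the same command from Python, the following is a minimal sketch of a wrapper that refuses to run when no token is available. Only the script path and the GITHUB_TOKEN variable come from the command above; the helper names `build_trigger_env` and `trigger_update` are hypothetical, and the sketch makes no assumption about what the script does internally.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

# Script path as given in the command above; run from the repository root.
TRIGGER_SCRIPT = "./tools/trigger-data-update.sh"


def build_trigger_env(base_env: Mapping[str, str], token: Optional[str] = None) -> Dict[str, str]:
    """Return a copy of `base_env` with GITHUB_TOKEN set, or raise if no token is available."""
    env = dict(base_env)
    # Prefer an explicitly passed token, then fall back to the existing environment.
    chosen = (token or env.get("GITHUB_TOKEN", "")).strip()
    if not chosen:
        raise ValueError("GITHUB_TOKEN is not set; create a GitHub personal access token first")
    env["GITHUB_TOKEN"] = chosen
    return env


def trigger_update(token: Optional[str] = None) -> None:
    """Run the trigger script with the token in its environment; raise if the script fails."""
    env = build_trigger_env(os.environ, token)
    subprocess.run([TRIGGER_SCRIPT], env=env, check=True)
```

Calling `trigger_update()` with GITHUB_TOKEN already exported behaves like the shell one-liner, while a missing token fails early with a clear error instead of a failure inside the script.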