We are looking for volunteers who want to contribute to the cleaning of the raw datasets!
This repository hosts workflows that process several COVID-19 data sources, along with the resulting cleaned datasets of cases across the world.
- European Centre for Disease Prevention and Control, ECDC
  - Processed by Our World in Data: https://ourworldindata.org/coronavirus-source-data
- previously WHO
  - Processed by Our World in Data: https://ourworldindata.org/coronavirus-source-data
- Tableau: Tableau cleans the JHU CSSE dataset and provides a tidy-formatted dataset. However, as of now, it does not address the data consistency issues in the raw dataset.
- Wikipedia ISO3166 Country code data
- COVID-19 daily report by JHU: This has many consistency issues regarding country names and the aggregation of US data. The aggregation mechanism is not very transparent.
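Country-name inconsistencies of the kind mentioned above are commonly handled by mapping every name variant to a single ISO 3166 Alpha-3 code. The following is a minimal sketch, assuming a name-to-code table shaped like the `country_name_code.csv` file described later in this README. The column names `name` and `code`, the sample rows, and the helpers `load_name_to_code` and `to_alpha3` are hypothetical and not taken from the repository.

```python
import csv
import io

# Hypothetical excerpt of a name-to-code table; the real file's column
# names and rows may differ.
NAME_CODE_CSV = """name,code
United States,USA
US,USA
United States of America,USA
South Korea,KOR
"Korea, South",KOR
"""


def load_name_to_code(text):
    """Build a case-insensitive lookup from country name to Alpha-3 code."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["name"].strip().lower(): row["code"] for row in reader}


def to_alpha3(name, lookup):
    """Return the Alpha-3 code for a name, or None if the name is unknown."""
    return lookup.get(name.strip().lower())


lookup = load_name_to_code(NAME_CODE_CSV)
codes = [to_alpha3(n, lookup) for n in ["US", "Korea, South", "Atlantis"]]
print(codes)  # ['USA', 'KOR', None]
```

Returning `None` for unknown names, rather than raising, makes it easy to list every unmatched name in a raw dataset before deciding how to handle it.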
- `output/cases/cases_WHO.csv`: This converts the CSV dataset cleaned by the Our World in Data team to use ISO 3166 Alpha-3 country codes. It also fills in missing dates so that, for every country, the dataset starts from the same date (Jan. 21st). One may want to combine this with the country-level metadata or alternative country names listed below.
- `output/cases/cases_WHO_WP.csv`: similar to `cases_WHO.csv`, but the US data is overridden by the data from Wikipedia.
- `cntry_stat_owid.json`
  - Used in an interactive visualization of the case fatality rate of COVID-19
    - Website source code: https://github.com/covid19-data/covid19-dashboard
    - Visualization source code on ObservableHQ: https://observablehq.com/@yy/covid-19-fatality-rate and https://observablehq.com/@yy/covid-19-trends
  - An example to create case time series charts in ObservableHQ by benjyz
- `output/metadata/country/country_name_code.csv`: a conversion table from country name to code (ISO 3166 Alpha-3). Note that multiple names point to the same code.
- `output/metadata/country/country_code_name.csv`: a conversion table from country code (ISO 3166 Alpha-3) to country name. The shortest country names are picked from the above dataset.
- `output/metadata/country/country_metadata.csv`: Country metadata, such as population, region, and income group, indexed by the ISO 3166 Alpha-3 codes.
- `coordinates.csv`: Latitude/longitude location data from the JHU dataset (unreliable).
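The date filling described for `cases_WHO.csv` and the suggested join with the country metadata can be sketched with pandas. This is an illustration under assumptions, not the repository's workflow: the column names (`country_code`, `date`, `total_cases`, `region`), the toy sample rows, and the fill policy (carry the last cumulative count forward, use zero before a country's first report) are all hypothetical.

```python
import pandas as pd

# Toy long-format case table; values and column names are made up.
cases = pd.DataFrame(
    {
        "country_code": ["CHN", "CHN", "ITA"],
        "date": pd.to_datetime(["2020-01-21", "2020-01-23", "2020-01-31"]),
        "total_cases": [10, 30, 2],
    }
)

# Toy metadata table keyed by the same Alpha-3 codes.
metadata = pd.DataFrame(
    {
        "country_code": ["CHN", "ITA"],
        "region": ["East Asia & Pacific", "Europe & Central Asia"],
    }
)


def fill_dates(df, start="2020-01-21"):
    """Put every country on one daily calendar that begins at `start`."""
    calendar = pd.date_range(start, df["date"].max(), freq="D")
    filled = []
    for code, group in df.groupby("country_code"):
        series = group.set_index("date")["total_cases"].reindex(calendar)
        # Assumed policy: forward-fill cumulative counts, and treat days
        # before the first report as zero cases.
        series = series.ffill().fillna(0).astype(int)
        filled.append(
            pd.DataFrame(
                {
                    "country_code": code,
                    "date": calendar,
                    "total_cases": series.values,
                }
            )
        )
    return pd.concat(filled, ignore_index=True)


filled = fill_dates(cases)
# Attach country-level metadata; a left join keeps every case row.
merged = filled.merge(metadata, on="country_code", how="left")
print(merged.head())
```

A left join is used so that a country missing from the metadata table still keeps its case rows, with empty metadata columns that are easy to spot.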
Install pandas and snakemake using conda:
conda install -c bioconda -c conda-forge snakemake pandas
or pip:
pip install pandas snakemake
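After installing, one way to confirm that both packages are visible to the current Python interpreter is a small standard-library check. This is only a convenience sketch; the helper name `missing_packages` is hypothetical and not part of the repository.

```python
import importlib.util

# Packages named in the install commands above.
REQUIRED = ("pandas", "snakemake")


def missing_packages(names=REQUIRED):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
print("missing:", ", ".join(missing) if missing else "none")
```

The check only looks the packages up without importing them, so it is quick and has no side effects.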