simeonmiteff / nzcovid19cases

Scrape New Zealand's COVID-19 case, alert level and hardship grant information from government web pages, and render the data in various formats suitable for mapping, visualisation and analysis.

Home Page: https://nzcovid19api.xerra.nz

License: MIT License

Shell 1.01% Go 97.14% Dockerfile 1.85%
covid-19 covid19 covid19-data covid19-nz new-zealand coronavirus coronavirus-tracking coronavirus-real-time coronavirus-api coronavirus-tracker

nzcovid19cases's Introduction

NZ COVID-19 cases scraper

UPDATE: online scraper API retired

After one too many arbitrary format changes on the MOH web site I've decided to stop updating the scraper and shut down the online API. There are alternative sources of both live statistics and case data (see section below).

For me, this project was an object lesson in the futility of scraping hand-edited information. Open Data is necessary for the public to (feasibly) automatically process government-owned data. It turns out, in a crisis, Open Data is not a priority (indeed, as of 5 July 2020, in NZ the official government portal has scant COVID-19 datasets).

Sorry for the inconvenience, and thank you for your interest.

UPDATE: discontinuation of case detail scraping

On April 12, the MOH stopped publishing all COVID-19 case details in a single table, and began reporting monthly cases. At this point I don't think it makes sense for this API to offer detailed case information. The last successfully scraped case data is now archived. I will leave the scraping code as is for those who want to use the CLI tool to download the current month's case data.

Similarly, the location (per-DHB) statistics, which were derived from scraped cases, will now be incorrect, and MOH's own per-DHB case summary table also covers only the current month. Again, I will remove the API for /location/* and leave the CLI function in place, in case it is useful to anyone (unlikely, but who knows).

For those who are interested in obtaining a full snapshot of case information, the best source I know of is the arcgis.com dashboard linked from the MOH website.

Specifically, tables that appear to be obtained from, or maintained by, ESR in the backend web service can be dumped in JSON format with the right query strings:

UPDATE: real-time NZ COVID-19 statistics

ESR now provides a dashboard that (presumably) renders statistics directly from the authoritative database that all the NZ COVID-19 data comes from (EpiSurv): https://nzcoviddashboard.esr.cri.nz/

Unfortunately there is no usable API. As far as I can tell, R Shiny-server uses a baroque home-grown protocol. It exchanges strangely encoded messages (mixed with JSON) over streaming XHR connections:

Client: ["0#0|o|"]
Server: a["1#0|m|{\"busy\":\"busy\"}"]

If anyone feels there is significant value in reverse-engineering this, feel free to open an issue.
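
As a starting point for anyone curious, here is a minimal Go sketch that splits the two sample messages above into their pipe-separated parts. It is based only on those two lines: the `Frame` type, its field names, and the idea that a leading `a` marks a server line are all assumptions, not a description of the actual protocol.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Frame holds the three pipe-separated parts of one message.
// The field names are guesses; their real meaning is undocumented here.
type Frame struct {
	Header  string // e.g. "1#0"
	Kind    string // e.g. "o" or "m"
	Payload string // may be empty, or a JSON document
}

// parseFrames decodes one line such as a["1#0|m|{...}"] or ["0#0|o|"].
func parseFrames(line string) ([]Frame, error) {
	// The sample server line carries a leading "a" before the JSON array.
	line = strings.TrimPrefix(strings.TrimSpace(line), "a")

	var raw []string
	if err := json.Unmarshal([]byte(line), &raw); err != nil {
		return nil, fmt.Errorf("decoding outer array: %w", err)
	}

	frames := make([]Frame, 0, len(raw))
	for _, msg := range raw {
		// Split into at most three parts so any "|" in the payload survives.
		parts := strings.SplitN(msg, "|", 3)
		if len(parts) != 3 {
			return nil, fmt.Errorf("unexpected message shape: %q", msg)
		}
		frames = append(frames, Frame{Header: parts[0], Kind: parts[1], Payload: parts[2]})
	}
	return frames, nil
}

func main() {
	frames, err := parseFrames(`a["1#0|m|{\"busy\":\"busy\"}"]`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range frames {
		fmt.Printf("header=%s kind=%s payload=%s\n", f.Header, f.Kind, f.Payload)
	}
}
```

This only covers message framing. Session setup, the streaming transport, and the meaning of each message kind would still need to be worked out.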

Overview

This code is intended to scrape the following sources of COVID-19 data in New Zealand, and render the data in various formats suitable for mapping, visualisation and analysis:

Use this with caution - the NZ government may change their pages and break the scraper at any time.

This code is used as the core of an API service I'm running: https://nzcovid19api.xerra.nz/

Courtesy of @gizmoguy, the metrics exported are scraped by a Prometheus server, and visualised on a Grafana dashboard:

Screenshot of COVID-19 Grafana dashboard at https://nzcovid19.grafana.sla.ac/d/r4XZV79Wz/new-zealand-covid-19-tracker?orgId=1

Building

Building directly

To build the utilities, you'll need a Go 1.13+ toolchain installed (see https://golang.org/dl/ for details).

Running ./build.sh will build each tool in the cmd/ subdirectories.

Building with Docker

If you don't want to futz with Go, a Dockerfile is provided. Use docker to build a container:

$ docker build -t nzcovid19cases .
<snip>
Successfully tagged nzcovid19cases:latest

Usage

For now there is a CLI tool.

Running the directly built binaries

cmd/nzcovid19-cli$ ./nzcovid19-cli 

Usage: ./cmd/nzcovid19-cli/nzcovid19-cli <action>
	Where <action> is one of:
		- cases/json
		- cases/csv
		- locations/json
		- locations/csv
		- alertlevel/json
		- casestats/json
		- clusters/json
		- clusters/csv

Running the docker container

$ docker run -ti --rm nzcovid19cases alertlevel/json
{
  "Level": 4,
  "LevelName": "Eliminate"
}
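
If you want to consume this output from another Go program, a minimal sketch might look like the following. The `AlertLevel` struct and `parseAlertLevel` function are hypothetical names chosen for this example; only the two JSON field names come from the sample output above, and the project's own types may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// AlertLevel mirrors the two fields in the sample alertlevel/json output.
// The type name is an assumption made for this example.
type AlertLevel struct {
	Level     int    `json:"Level"`
	LevelName string `json:"LevelName"`
}

// parseAlertLevel decodes the JSON document printed by the CLI.
func parseAlertLevel(data []byte) (AlertLevel, error) {
	var al AlertLevel
	err := json.Unmarshal(data, &al)
	return al, err
}

func main() {
	// In practice the bytes would come from the CLI's stdout.
	sample := []byte(`{"Level": 4, "LevelName": "Eliminate"}`)

	al, err := parseAlertLevel(sample)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Alert level %d (%s)\n", al.Level, al.LevelName)
}
```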

Code license

This code is published under the MIT license.

Data copyright

The data processed by this tool is published under:

nzcovid19cases's Issues

Could you please expose "summary" info as a json api?

Thanks for your repo. I wonder if you could add some of the header information from the official website.

At the top of the page:
https://www.health.govt.nz/our-work/diseases-and-conditions/covid-19-novel-coronavirus/covid-19-current-cases

Summary
Number of confirmed and probable cases in New Zealand – 155
Number of recovered cases – 12
Number of community transmission cases – 4

IMHO, two fields per entry, such as "type" and "number", would be enough, and could also be used to calculate other totals.

Cheers

Temporal data?

Hi,

Thanks for making your repo open source and thanks for providing this API. I think it might be interesting to add the day on which each case was reported. This might be useful for building plots of cases over time, or timeline maps (such as https://public.tableau.com/views/coivd-nzmap/Dashboard1?:embed=y&:embed_code_version=3&:loadOrderID=1&:display_count=y&:origin=viz_share_link). While MOH don't have a "date reported" column on their site, we can correlate the case numbers with the days/case numbers on https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv - e.g.

Case 1 was reported on 2/28/20
Cases 2-3 were reported on 3/4/20
Case 4 was reported on 3/6/20

and so on. These dates could be added as properties to the geojson, or as a column to the csv.
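
The correlation described above could be sketched in Go roughly as follows. This is only an illustration: `DailyTotal` and `reportDates` are hypothetical names, the three data points are the ones quoted in this issue, and the approach assumes case numbers count up from 1 in the order the cases were reported.

```go
package main

import "fmt"

// DailyTotal is one entry of a cumulative time series: a date label and
// the cumulative number of confirmed cases on that date.
type DailyTotal struct {
	Date       string
	Cumulative int
}

// reportDates assigns each case number the first date on which the
// cumulative total reached it. Assumes case 1 is the oldest case.
func reportDates(series []DailyTotal) map[int]string {
	dates := make(map[int]string)
	seen := 0 // highest case number already assigned a date
	for _, d := range series {
		for n := seen + 1; n <= d.Cumulative; n++ {
			dates[n] = d.Date
		}
		if d.Cumulative > seen {
			seen = d.Cumulative
		}
	}
	return dates
}

func main() {
	// Cumulative totals implied by the examples in this issue.
	series := []DailyTotal{
		{"2/28/20", 1},
		{"3/4/20", 3},
		{"3/6/20", 4},
	}
	dates := reportDates(series)
	for n := 1; n <= 4; n++ {
		fmt.Printf("case %d: %s\n", n, dates[n])
	}
}
```

A downward revision of the cumulative total is simply ignored here, so the dates already assigned stay in place.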

[Bug] Case numbers count from newest

I noticed that the case number is no longer provided (maybe it will be back later).
Also, in your cases API, the numbering starts from the newest case. I think numbering from 1 at the oldest case would be more reasonable.
Cheers.

CORS

Access to XMLHttpRequest at 'https://nzcovid19api.xerra.nz/cases/geojson' from origin 'http://127.0.0.1:8080' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. - could you please add an "Access-Control-Allow-Origin: *" header to your nginx server? This might be useful - https://enable-cors.org/server_nginx.html
