
datasets / covid-19


Novel Coronavirus 2019 time series data on cases

Home Page: https://datahub.io/core/covid-19

Python 100.00%
coronavirus coronavirus-disease covid datapackage data-package covid-19 covid19-data dataset

covid-19's People

Contributors

actions-user, anuveyatsu, aravindnair430, jochym, kant, krunal-darji, morisset, nirabpudasaini, pidugusundeep, rufuspollock, trevorwinstral, weileizeng, zelima


covid-19's Issues

Executing process.py on 3/11/2020 gets ValidationError

Here's the traceback:

Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/dataflows/base/schema_validator.py", line 49, in schema_validator
row[f.name] = f.cast_value(row.get(f.name))
File "/usr/local/lib/python3.7/site-packages/tableschema/field.py", line 149, in cast_value
).format(field=self, value=value))
datapackage.exceptions.CastError: Field "Deaths" can't cast value "None" for type "number" with format "default"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "process.py", line 60, in
dump_to_path()
File "/usr/local/lib/python3.7/site-packages/dataflows/base/flow.py", line 12, in results
return self._chain().results(on_error=on_error)
File "/usr/local/lib/python3.7/site-packages/dataflows/base/datastream_processor.py", line 96, in results
for res in ds.res_iter
File "/usr/local/lib/python3.7/site-packages/dataflows/base/datastream_processor.py", line 96, in
for res in ds.res_iter
File "/usr/local/lib/python3.7/site-packages/dataflows/base/schema_validator.py", line 46, in schema_validator
for i, row in enumerate(iterator):
File "/usr/local/lib/python3.7/site-packages/dataflows/processors/dumpers/dumper_base.py", line 69, in row_counter
for row in iterator:
File "/usr/local/lib/python3.7/site-packages/dataflows/processors/dumpers/file_dumper.py", line 76, in rows_processor
for row in resource:
File "/usr/local/lib/python3.7/site-packages/dataflows/base/schema_validator.py", line 51, in schema_validator
if not on_error(resource['name'], row, i, e):
File "/usr/local/lib/python3.7/site-packages/dataflows/base/schema_validator.py", line 22, in raise_exception
raise ValidationError(res_name, row, i, e)
dataflows.base.schema_validator.ValidationError:
ROW: {'Date': datetime.date(2020, 3, 11), 'Province/State': 'Anhui', 'Country/Region': 'Mainland China', 'Lat': Decimal('31.8257'), 'Long': Decimal('117.2264'), 'Confirmed': None, 'Recovered': None, 'Deaths': 'None'}
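The failing row shows Deaths arriving as the literal string 'None' rather than a real missing value, which the number cast then rejects. A minimal pre-processing sketch (the helper name and field list are illustrative, not part of process.py) that coerces such strings before schema validation:

```python
def clean_row(row, numeric_fields=("Confirmed", "Recovered", "Deaths")):
    """Coerce literal 'None' strings (and empty strings) in numeric fields
    to real None so the schema validator treats them as missing values."""
    for field in numeric_fields:
        value = row.get(field)
        if isinstance(value, str) and value.strip() in ("None", ""):
            row[field] = None
    return row
```

Running rows through a step like this ahead of the schema validator would avoid the CastError without altering genuinely numeric values.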

Confirmed Cases missing

The current dataset on 4/7/2020 shows 0 cases for North Dakota, and the overall total for 4/6/2020 is off by about 140k.

FAQs (WIP)

Why this dataset? (After all, the authoritative one is elsewhere.)

Ans: well-structured data, packaged as a Data Package so you have tools to ingest it into your system of choice, and reliably kept up to date ...

Why this dashboard? After all, there are many others.

We provide a dashboard that is simple and well designed, but primarily because it is open source and easy for others to reuse.

Who's behind this?

@rufuspollock and colleagues at @datopian who have worked in #opendata and #opensource and #datasets for many years.

State Data Missing for US

I imported this file yesterday and it included state data for US - when I refreshed this morning, the data is now missing.

  • time-series-19-covid-combined_csv.csv

[optimization] Move longitude and latitude data to a separate CSV

As a user of the covid-19 data, I want the latitude and longitude data in a separate CSV file from the other data, so that it optimizes the use of the data by cutting down the file sizes, loading times, etc.

Acceptance criteria

  • Latitude and longitude data is moved to a separate CSV file
  • A new datapackage.json is created for the new CSV
  • A new visualization is created for it
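A sketch of the proposed split, assuming the combined rows carry the Province/State, Country/Region, Lat, and Long columns shown elsewhere in this repo; file writing is left out and the function name is illustrative:

```python
def split_latlon(combined_rows):
    """Split rows into (data_rows, latlon_rows): latitude and longitude
    are factored out into one lookup row per Province/State + Country/Region,
    so the (much larger) time-series file no longer repeats them."""
    latlon = {}
    data = []
    for row in combined_rows:
        key = (row["Province/State"], row["Country/Region"])
        latlon[key] = {"Province/State": key[0], "Country/Region": key[1],
                       "Lat": row["Lat"], "Long": row["Long"]}
        data.append({k: v for k, v in row.items() if k not in ("Lat", "Long")})
    return data, list(latlon.values())
```

Each output list would then be dumped to its own CSV, with the new datapackage.json describing both resources.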

unable to open database file

Hi,
When I try to run it in a Jupyter notebook, I get the following error:

OperationalError: unable to open database file

API based on the Data Package

As a lot of people want to connect from dashboards and get filtered/streaming access to the data, it would be good to also set up an (example) wrapper with API endpoints.

See also https://github.com/Quintessential-SFT/Covid-19-API and https://github.com/dataletsch/panoptikum/blob/master/app.py

Design (from @rufuspollock)

Jobs to be done: I want to get the latest data for my country / region.

url: coronavirus.api.datahub.io

Desired API

GET /country/{name or code} => (in reverse date order)
[ 
 {
  date: 
  confirmed: ...
  deaths: 
 }
]
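The core of that endpoint could be sketched as a plain filter over rows parsed from countries-aggregated.csv (column names assumed from that file; a web framework would wrap this in the GET /country/{name or code} route):

```python
from datetime import date

def country_series(rows, name):
    """Return the records for one country in reverse date order,
    shaped like the desired /country/{name} response."""
    matches = [r for r in rows if r["Country"] == name]
    matches.sort(key=lambda r: r["Date"], reverse=True)
    return [{"date": r["Date"].isoformat(),
             "confirmed": r["Confirmed"],
             "deaths": r["Deaths"]} for r in matches]
```

Code-based lookup (e.g. "IT" as well as "Italy") would only need an alias table in front of the Country comparison.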

API-ifying a Data Package

Can we take inspiration from https://github.com/simonw/datasette?

We have a datapackage.json - let's auto-API-ify it.

e.g. suppose we have a table cases.csv

Country, Date, Value

Each table => a url ...

/cases?field=x

Values => sub-urls

Dimension

Adding an id (??)

/cases/{country}/{date}
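A minimal sketch of the datasette-style idea, assuming only the structure of datapackage.json (resources with names and schema fields); each resource becomes a URL and its fields become the candidate query parameters:

```python
def routes_from_datapackage(datapackage):
    """Derive one URL route per resource in a datapackage.json dict.
    The returned mapping pairs each route with the resource's field
    names, which would become filterable query params, e.g. /cases?field=x."""
    routes = {}
    for resource in datapackage.get("resources", []):
        fields = [f["name"] for f in
                  resource.get("schema", {}).get("fields", [])]
        routes["/" + resource["name"]] = fields
    return routes
```

Sub-URLs per dimension value (/cases/{country}/{date}) could then be generated from whichever fields are flagged as dimensions.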

Wrong Numbers for Spain on 12/March/2020

Data for Spain on 12 March 2020 is wrong; it looks like you accidentally copied the values from 11 March 2020.

Hope you can fix this.

Edit: the file is countries-aggregated.csv

Executing on 3/14/2020 gets ValidationError & CastError

CastError                                 Traceback (most recent call last)
~/.local/lib/python3.8/site-packages/dataflows/base/schema_validator.py in schema_validator(resource, iterator, field_names, on_error)
     48             for f in schema_fields:
---> 49                 row[f.name] = f.cast_value(row.get(f.name))
     50         except CastError as e:

~/.local/lib/python3.8/site-packages/tableschema/field.py in cast_value(self, value, constraints)
    145             if cast_value == config.ERROR:
--> 146                 raise exceptions.CastError((
    147                     'Field "{field.name}" can\'t cast value "{value}" '

CastError: Field "Deaths" can't cast value "None" for type "number" with format "default"
During handling of the above exception, another exception occurred:

ValidationError                           Traceback (most recent call last)
<ipython-input-11-4036c1aa3210> in <module>
     18 extra_value = {'name': 'Case', 'type': 'number'}
     19 
---> 20 Flow(
     21       load(f'{BASE_URL}{CONFIRMED}'),
     22       load(f'{BASE_URL}{RECOVERED}'),

~/.local/lib/python3.8/site-packages/dataflows/base/flow.py in results(self, on_error)
     10 
     11     def results(self, on_error=None):
---> 12         return self._chain().results(on_error=on_error)
     13 
     14     def process(self):

~/.local/lib/python3.8/site-packages/dataflows/base/datastream_processor.py in results(self, on_error)
     92     def results(self, on_error=None):
     93         ds = self._process()
---> 94         results = [
     95             list(schema_validator(res.res, res, on_error=on_error))
     96             for res in ds.res_iter

~/.local/lib/python3.8/site-packages/dataflows/base/datastream_processor.py in <listcomp>(.0)
     93         ds = self._process()
     94         results = [
---> 95             list(schema_validator(res.res, res, on_error=on_error))
     96             for res in ds.res_iter
     97         ]

~/.local/lib/python3.8/site-packages/dataflows/base/schema_validator.py in schema_validator(resource, iterator, field_names, on_error)
     44         field_names = [f.name for f in schema.fields]
     45     schema_fields = [f for f in schema.fields if f.name in field_names]
---> 46     for i, row in enumerate(iterator):
     47         try:
     48             for f in schema_fields:

~/.local/lib/python3.8/site-packages/dataflows/processors/dumpers/dumper_base.py in row_counter(self, resource, iterator)
     67     def row_counter(self, resource, iterator):
     68         counter = 0
---> 69         for row in iterator:
     70             counter += 1
     71             yield row

~/.local/lib/python3.8/site-packages/dataflows/processors/dumpers/file_dumper.py in rows_processor(self, resource, writer, temp_file)
     74 
     75     def rows_processor(self, resource, writer, temp_file):
---> 76         for row in resource:
     77             writer.write_row(row)
     78             yield row

~/.local/lib/python3.8/site-packages/dataflows/base/schema_validator.py in schema_validator(resource, iterator, field_names, on_error)
     49                 row[f.name] = f.cast_value(row.get(f.name))
     50         except CastError as e:
---> 51             if not on_error(resource['name'], row, i, e):
     52                 continue
     53 

~/.local/lib/python3.8/site-packages/dataflows/base/schema_validator.py in raise_exception(res_name, row, i, e)
     20 
     21 def raise_exception(res_name, row, i, e):
---> 22     raise ValidationError(res_name, row, i, e)
     23 
     24 

ValidationError: 
ROW: {'Date': datetime.date(2020, 3, 14), 'Province/State': None, 'Country/Region': 'Thailand', 'Lat': Decimal('15.0'), 'Long': Decimal('101.0'), 'Confirmed': None, 'Recovered': None, 'Deaths': 'None'}
----

Regional granularity

Country-level comparisons are quite limiting; it is difficult to draw meaning about the impact of measures. For instance, mortality and intensive-care cases at country level are under- or over-estimated depending on whether co-morbidities are considered, or after a health system collapses. The statistics are already much more granular for the United States in the Johns Hopkins dataset, for the Italian regions, and for the Swiss cantons. It would be good to build on the work here to go beyond a country ranking.

Inconsistent file formatting

The data files have inconsistent file formatting making it difficult to write code which works on all files.

Header examples: Last Update changes to Last_Update, Confirmed changes to FIPS.

Changes to Country/Region: UK changes to United Kingdom.

Compare files '02-03-2020.csv' to '03-26-2020.csv' for example.
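Until the files themselves are made consistent, a small alias table can normalize the variants on read. This is a sketch only: the tables below cover just the renames mentioned in this issue plus one illustrative header variant, not a complete mapping:

```python
# Alias tables are illustrative, not exhaustive.
HEADER_ALIASES = {
    "Last Update": "Last_Update",
    "Province/State": "Province_State",  # assumed variant, same slash-to-underscore pattern
    "Country/Region": "Country_Region",
}
COUNTRY_ALIASES = {
    "UK": "United Kingdom",
}

def normalize_headers(headers):
    """Map older header spellings onto the newer ones so downstream
    code can address all daily files with one set of column names."""
    return [HEADER_ALIASES.get(h.strip(), h.strip()) for h in headers]

def normalize_country(name):
    """Map renamed country labels onto their current spelling."""
    return COUNTRY_ALIASES.get(name, name)
```

Applying both normalizers right after parsing each daily file would let the same downstream code handle '02-03-2020.csv' and '03-26-2020.csv' alike.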

[workflow] Actions pipeline stuck

Your action workflow seems to have been stuck for the last 10 hours.
Possibly something went wrong at the step Run pip install -r scripts/requirements.txt.


Italy has wrong data for March 23

I was updating my dashboards on https://corona.deleu.dev and I noticed completely flat data for Italy.

The dataset shows

2020-03-21,Italy,,43.0,12.0,53578,6072,4825
2020-03-22,Italy,,43.0,12.0,59138,7024,5476
2020-03-23,Italy,,43.0,12.0,59138,7024,5476

when in reality it should be

2020-03-21,Italy,,43.0,12.0,53578,6072,4825
2020-03-22,Italy,,43.0,12.0,59138,7024,5476
2020-03-23,Italy,,43.0,12.0,63927,7432,6077

effected vs affected

In the intro you say "effect", which is correct. But where you say "effected" it should be "affected".

To effect something means to bring it about; to be affected means to be influenced or changed by something.

Push fixes to upstream repo

Can we try to upstream our changes to the upstream repo? It may be tough as they have a lot of open PRs and a lot of noise right now. We initially planned (back in February) to put in a PR for datapackage.json (and maybe even a refactor of the file structure), but this may be hard now (they are certainly unlikely to change the file structure).

However, may still be worth trying to push data bugfixes.

Data Update

Hi,
when will the data be updated? Thanks, bye, Alberto

Dashboard for this

Create a simple dashboard similar to e.g. https://carbon.datahub.io or https://london.datahub.io to present this information and provide an open source basis for others to quickly create their own dashboards, especially per country.

Tasks

  • Design the dashboard
  • Sketch out dashboard
  • Implement

Implement

Analysis

Mockup


Charting libraries

v1 - worldwide data with key figures and choropleth map

v2 - added line chart with cumulative cases in top 5 countries


v3 - ability to select a country and showing a graph with cumulative cases, deaths per day and new cases per day


v4 - added figure for showing cases per 100k population


v5 - added choropleth map (again)



Charts to do

  • Time series of cases
  • Choropleth of cases by country

Needs Analysis

Domain Model

Value: (new confirmed) cases, deaths, recovered

Dimensions:

  • Time
  • Country
    • SubCountry i.e. Province/State
    • City

Job Stories

Key figures (for world and per country)

When wanting to know about the situation, I want to see key figures such as the total number of people infected/recovered/died, so that I understand the current status of the situation in the world.

  • In my country, in my locality

Specific items:

  • How many total cases? [single figure]
  • How many total cases (over time) i.e. cumulative? [time series]
  • How many cases "per day" over time [time series]
  • What is the mortality rate? (how that has changed over time?)
  • Cases in specific locations (lon, lat and by country)
  • Total Case by country (now)
  • Case by country (over time)

"What's happening in my country" => Ditto but just with my country

What's changed

  • When I see the COVID-19 dashboard, I want to see a figure showing change of total number of people affected in last 24h (something like stock market price), so that I can know if it's getting better or not.

Secondary

  • When I see the COVID-19 dashboard, I want to check number of cases per capita, so that I can compare my country against others.

Tertiary

  • When I see the COVID-19 dashboard, I want to see viz showing some correlation with economic indicators (by country), so that I can assess the economic impact.

Meta

  • When I see the COVID-19 dashboard, I want to be able to share it via twitter/facebook/instagram, so that my friends/colleagues can also check it out.
  • ...

State-wise data for the US

Hello,

I saw that some countries (e.g., China, Canada, Australia) have state/province data, but the US does not. Is there a reason that there is only data for the US as a whole?

Thanks!

Open Source Helps!

Thanks for your work to help people in need! Your site has been added! I currently maintain the Open-Source-COVID-19 page, which collects open source projects related to COVID-19, including maps, data, news, APIs, analysis, and medical and supply information. Please share it with anyone who might need the information in the list or may contribute to some of those projects. You are also welcome to recommend more projects.

http://open-source-covid-19.weileizeng.com/

Cheers!

NYT data (for the US)

NYT now have data - just for the US. https://github.com/nytimes/covid-19-data

But it's not open ...

In light of the current public health emergency, The New York Times Company is
providing this database under the following free-of-cost, perpetual,
non-exclusive license. Anyone may copy, distribute, and display the database, or
any part thereof, and make derivative works based on it, provided (a) any such
use is for non-commercial purposes only and (b) credit is given to The New York
Times in any public display of the database, in any publication derived in part
or in full from the database, and in any other public use of the data contained
in or derived from the database.

Dataset Design

Value: (new confirmed) cases, deaths, recovered

Dimensions:

  • Time
  • Country
    • SubCountry i.e. Province/State
Province/State,Country/Region,Lat,Long,date,case
Anhui,Mainland China,31.8257,117.2264,2020-03-04,6
Anhui,Mainland China,31.8257,117.2264,2020-03-05,6
Anhui,Mainland China,31.8257,117.2264,2020-03-06,6
Beijing,Mainland China,40.1824,116.4142,2020-01-22,0
Beijing,Mainland China,40.1824,116.4142,2020-01-23,0
Beijing,Mainland China,40.1824,116.4142,2020-01-24,0
Beijing,Mainland China,40.1824,116.4142,2020-01-25,0

Perfect dataset

Would go with cumulative numbers (we can always difference to get per-day values).

  • What about country totals? Do we compute them and put them in the file (e.g. if Country is null it is the total), or do we aggregate in the browser / elsewhere?
Country,Province,Date,Confirmed,Death,Recovered
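The differencing mentioned above is cheap to do downstream; a minimal sketch (cumulative-to-daily, with the first day treated as starting from zero):

```python
def daily_from_cumulative(series):
    """Given a date-ordered list of cumulative counts, return the
    per-day new counts. The first day's 'new' value equals its
    cumulative value."""
    return [cur - prev for prev, cur in zip([0] + series[:-1], series)]

daily_from_cumulative([2, 5, 5, 9])  # [2, 3, 0, 4]
```

This is why storing cumulative numbers loses nothing: per-day values are always recoverable, while the reverse requires knowing the starting point.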

province2latlon

Province,Lat,Lon

404 on the recovery url

On running process.py, I get a 404 on the recovered URL. This one:
RECOVERED = 'time_series_19-covid-Recovered.csv'

Maybe this is just some temporary url bug, but I thought I'd let you know.

Meanwhile, I have managed to get the script to run by commenting out all references to the recovered portion of the data, which is less than ideal.

Great job!

Blog post updating on progress so far

Blog post(s) to put on datahub.io/blog highlighting progress on this dataset plus all the work by others. Could also blog specific stuff e.g. the modelling background.

@Liyubov do you want to lead on this? I suggest drafting blog posts in markdown in HackMD so that they can be reviewed and then added to datahub.io/blog easily.

Potential Posts

  • How we are collecting and data packaging the data
  • An overview on the data, dashboarding and modelling efforts going on in the ecosystem
  • An overview of modelling approaches

Canada Recovery Data

Not seeing recovery data for Canada, but it is being updated in the Johns Hopkins data.

Those are the only NA's I'm seeing. Great work on this - thanks a ton.

Add clinical trials information

Add data about the current clinical trials being conducted against COVID-19.

This might (or might not) involve scraping some clinical trials registries (e.g. EUCTR, ICTRP etc.).

I will self-assign as I wanted to get them anyway and can't think of a better place to put them. The only caveat is that I will try to patch some of the OpenTrials collectors in order to do that, and that might not be the straightest (or most obvious) path to extract that information.

docs: methodology

Great stuff! I'm planning to use the API for my dashboard Pandemic Estimator, but I wish your API had better documentation of the methodology. I use JHU directly and I know what chaos it is; the most blatant example is that they provide "cumulative" data that quite often is not actually cumulative in practice. And the whole change of file formats, etc.

Can you please describe the methodology for how you deal with this? What's from JHU and what's from CSBS? What has been omitted, what has been "adjusted", and how? Thank you!

Romania data lagging one full day

First of all, congrats on the project! It took me almost no time to synchronize my excel workbook with your csv raw data. Thank you !

I have one issue: Romania data is lagging a full day. Do you think you could refresh the dataset faster or at another time? Or please advise how to proceed.

Thanks Again !

Automate keeping data up to date by pulling data from upstream

We want to automate collecting the data every day (or even every half day?). Since the upstream repo is updated at 23:59 GMT (once a day), we can run our update script right after that time, e.g. 00:00 GMT.

Acceptance criteria

  • The repo is updated at least every day
  • The new dataset is pushed to datahub.io/core/covid-19

Tasks

  • Build action - #15
    • Create github actions to:
      • setup python project
      • install dependencies
      • run the update scripts
      • commit changes and push to the repo (master branch)
    • Run it on a schedule at 00:00 GMT
    • Run it on master branch only
    • Setup github token so the action is authorized to push to the repo
  • Deploy action (to datahub.io) - 4c98133
    • prepare datapackage.json for the dataset
    • setup node project
    • install data-cli via npm/yarn
    • run data push command
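The build-action steps above could look something like the following workflow sketch. File layout, branch name, and action versions are assumptions; only scripts/requirements.txt and process.py appear in the issues above:

```yaml
# Sketch of a scheduled update workflow (paths and versions assumed).
name: Update data
on:
  schedule:
    - cron: "0 0 * * *"   # 00:00 GMT daily, just after upstream's 23:59 GMT update
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
      - run: pip install -r scripts/requirements.txt
      - run: python process.py
      - run: |
          git config user.name "actions-user"
          git add -A
          git commit -m "Automated data update" || true   # no-op when nothing changed
          git push
```

The push step relies on the GitHub token being configured so the action is authorized to push to master, per the task list above.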

Future

France aggregated count is down from yesterday, why?

France's confirmed count has an issue:
82 2020-04-12 133670
83 2020-04-13 137875
84 2020-04-14 131361
Why is the number going down from yesterday? As it is an aggregated number, it has to grow or stagnate ...
Thanks for any clarification.
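Drops like this could be caught automatically before publishing; a small sketch that flags every place a cumulative series decreases:

```python
def find_decreases(series):
    """Return (index, previous, current) for every position where a
    cumulative series drops, e.g. the France confirmed counts above."""
    return [(i, series[i - 1], series[i])
            for i in range(1, len(series))
            if series[i] < series[i - 1]]

find_decreases([133670, 137875, 131361])  # [(2, 137875, 131361)]
```

Running such a monotonicity check per country in the update pipeline would surface upstream corrections (or errors) as soon as they appear.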

Admin2/City field missing in US data

Since the following commit, ab35560, the "Admin2" field has been missing from the US CSV files. In my case, I was using this field to filter data by US city, and now I can only do so by state. Can this field be added back into the US datasets?

Reading in the data via read_csv gives NA results for Canada on 29 March

read_csv("time-series-19-covid-combined.csv", col_names = TRUE) gives 68 NA values for Confirmed and Deaths in the last update on 29 March 2020 for Canada. I cannot immediately see why, but I did pull the data into Excel and that works fine. It seems just the read_csv function is not working on this latest update.
