datasets / covid-19
Novel Coronavirus 2019 time series data on cases
Home Page: https://datahub.io/core/covid-19
Here's the traceback:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/dataflows/base/schema_validator.py", line 49, in schema_validator
row[f.name] = f.cast_value(row.get(f.name))
File "/usr/local/lib/python3.7/site-packages/tableschema/field.py", line 149, in cast_value
).format(field=self, value=value))
datapackage.exceptions.CastError: Field "Deaths" can't cast value "None" for type "number" with format "default"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "process.py", line 60, in <module>
dump_to_path()
File "/usr/local/lib/python3.7/site-packages/dataflows/base/flow.py", line 12, in results
return self._chain().results(on_error=on_error)
File "/usr/local/lib/python3.7/site-packages/dataflows/base/datastream_processor.py", line 96, in results
for res in ds.res_iter
File "/usr/local/lib/python3.7/site-packages/dataflows/base/datastream_processor.py", line 96, in <listcomp>
for res in ds.res_iter
File "/usr/local/lib/python3.7/site-packages/dataflows/base/schema_validator.py", line 46, in schema_validator
for i, row in enumerate(iterator):
File "/usr/local/lib/python3.7/site-packages/dataflows/processors/dumpers/dumper_base.py", line 69, in row_counter
for row in iterator:
File "/usr/local/lib/python3.7/site-packages/dataflows/processors/dumpers/file_dumper.py", line 76, in rows_processor
for row in resource:
File "/usr/local/lib/python3.7/site-packages/dataflows/base/schema_validator.py", line 51, in schema_validator
if not on_error(resource['name'], row, i, e):
File "/usr/local/lib/python3.7/site-packages/dataflows/base/schema_validator.py", line 22, in raise_exception
raise ValidationError(res_name, row, i, e)
dataflows.base.schema_validator.ValidationError:
ROW: {'Date': datetime.date(2020, 3, 11), 'Province/State': 'Anhui', 'Country/Region': 'Mainland China', 'Lat': Decimal('31.8257'), 'Long': Decimal('117.2264'), 'Confirmed': None, 'Recovered': None, 'Deaths': 'None'}
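A likely workaround for the cast failure above: the literal string "None" slips through where a null is expected, so it can be normalized in a row-level step before schema validation runs. This is only a sketch (the helper name and the flow wiring are illustrative, not the maintainers' actual fix), assuming a plain one-argument function can be dropped into the dataflows Flow as a row processor:

```python
def clean_nones(row):
    """Replace the literal string "None" with a real null so number casting succeeds."""
    for key, value in row.items():
        if value == "None":
            row[key] = None

# Hypothetical usage inside the existing flow (process.py):
# Flow(load(...), clean_nones, dump_to_path()).results()
```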
In the time series data the total number of deaths for Germany is lower than the day before. An issue is already open at the JHU GitHub: CSSEGISandData/COVID-19#2137 (comment)
2020-04-10,Germany,,51.0,9.0,122171,53913,2767
2020-04-11,Germany,,51.0,9.0,124908,57400,2736
The current dataset on 4/7/2020 shows 0 cases for North Dakota, and the overall total for 4/6/2020 is off by about 140k.
Ans: well structured data, data package'd so you have tools to ingest into your system of choice, reliably kept up to date ...
We provide a dashboard that is simple and well-designed, but above all open source and easy for others to reuse.
@rufuspollock and colleagues at @datopian who have worked in #opendata and #opensource and #datasets for many years.
I imported this file yesterday and it included state data for US - when I refreshed this morning, the data is now missing.
As a user of the covid-19 data, I want the latitude and longitude data in a separate CSV file from the other data, so that it optimizes the use of the data by cutting down the file sizes, loading times, etc.
Hi,
When I try to run it in a Jupyter notebook, I get an error: unable to open database file.
OperationalError: unable to open database file
would it be possible to feed it back in?
see
("time_series_covid19_recovered_global.csv | update recoverd time series with 3/26/20 data")
https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
When viewing https://datahub.io/core/covid-19 I want to see summary graphs so that a) I get an immediate overview, and b) I get a sense of what's in the dataset.
As a lot of people want to connect from dashboards and get filtered/streaming access to the data, it would be good to also set up an (example) wrapper with API endpoints.
See also https://github.com/Quintessential-SFT/Covid-19-API and https://github.com/dataletsch/panoptikum/blob/master/app.py
Jobs to be done: I want to get the latest data for my country / region.
url: coronavirus.api.datahub.io
Desired API
GET /country/{name or code} => (in reverse date order)
[
  {
    "date": ...,
    "confirmed": ...,
    "deaths": ...
  }
]
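A minimal sketch of that endpoint's core logic, assuming the combined data has already been loaded into memory as a list of dicts (`ROWS` and its field names are illustrative, loosely following countries-aggregated.csv):

```python
# Hypothetical in-memory copy of countries-aggregated.csv
ROWS = [
    {"Date": "2020-03-21", "Country": "Italy", "Confirmed": 53578, "Deaths": 4825},
    {"Date": "2020-03-22", "Country": "Italy", "Confirmed": 59138, "Deaths": 5476},
]

def country_series(rows, name):
    """Records for one country, newest first (ISO dates sort lexicographically)."""
    matches = [r for r in rows if r["Country"].lower() == name.lower()]
    return sorted(matches, key=lambda r: r["Date"], reverse=True)
```

A thin Flask or FastAPI route could then wrap `country_series` and serialize the result as JSON.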
Can we take inspiration from https://github.com/simonw/datasette?
We have a datapackage.json - let's auto-API-ify it.
e.g. suppose we have a table cases.csv
Country, Date, Value
Each table => a url ...
/cases?field=x
Values => sub-urls
Dimension
Adding an id (??)
/cases/{country}/{date}
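The "each table => a url" idea above can be sketched by walking the resources in datapackage.json and emitting candidate routes. The package content and the route shapes here are assumptions for illustration, not an implemented API:

```python
import json

# Hypothetical minimal datapackage.json content
DATAPACKAGE = json.loads("""
{
  "name": "covid-19",
  "resources": [
    {"name": "cases", "schema": {"fields": [
        {"name": "Country", "type": "string"},
        {"name": "Date", "type": "date"},
        {"name": "Value", "type": "number"}]}}
  ]
}
""")

def routes_from_datapackage(pkg):
    """Derive candidate API routes: one per resource, plus per-field filter URLs."""
    routes = []
    for res in pkg["resources"]:
        base = "/" + res["name"]
        routes.append(base)  # e.g. /cases
        for field in res["schema"]["fields"]:
            routes.append(f"{base}?field={field['name']}")  # e.g. /cases?field=Country
    return routes
```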
Hi,
the number of records in time-series-19-covid-combined.csv is approximately half of yesterday's count (31000 rows).
It seems some data are lost - or am I missing something? Thanks
Need to look at the current data sources and these and compare. Just keeping track here.
Data for Spain on 12/March/2020 is wrong; it looks like the values from 11/March/2020 were accidentally copied.
Hope you can fix this.
Edit: the file is countries-aggregated.csv
CastError Traceback (most recent call last)
~/.local/lib/python3.8/site-packages/dataflows/base/schema_validator.py in schema_validator(resource, iterator, field_names, on_error)
48 for f in schema_fields:
---> 49 row[f.name] = f.cast_value(row.get(f.name))
50 except CastError as e:
~/.local/lib/python3.8/site-packages/tableschema/field.py in cast_value(self, value, constraints)
145 if cast_value == config.ERROR:
--> 146 raise exceptions.CastError((
147 'Field "{field.name}" can\'t cast value "{value}" '
CastError: Field "Deaths" can't cast value "None" for type "number" with format "default"
During handling of the above exception, another exception occurred:
ValidationError Traceback (most recent call last)
<ipython-input-11-4036c1aa3210> in <module>
18 extra_value = {'name': 'Case', 'type': 'number'}
19
---> 20 Flow(
21 load(f'{BASE_URL}{CONFIRMED}'),
22 load(f'{BASE_URL}{RECOVERED}'),
~/.local/lib/python3.8/site-packages/dataflows/base/flow.py in results(self, on_error)
10
11 def results(self, on_error=None):
---> 12 return self._chain().results(on_error=on_error)
13
14 def process(self):
~/.local/lib/python3.8/site-packages/dataflows/base/datastream_processor.py in results(self, on_error)
92 def results(self, on_error=None):
93 ds = self._process()
---> 94 results = [
95 list(schema_validator(res.res, res, on_error=on_error))
96 for res in ds.res_iter
~/.local/lib/python3.8/site-packages/dataflows/base/datastream_processor.py in <listcomp>(.0)
93 ds = self._process()
94 results = [
---> 95 list(schema_validator(res.res, res, on_error=on_error))
96 for res in ds.res_iter
97 ]
~/.local/lib/python3.8/site-packages/dataflows/base/schema_validator.py in schema_validator(resource, iterator, field_names, on_error)
44 field_names = [f.name for f in schema.fields]
45 schema_fields = [f for f in schema.fields if f.name in field_names]
---> 46 for i, row in enumerate(iterator):
47 try:
48 for f in schema_fields:
~/.local/lib/python3.8/site-packages/dataflows/processors/dumpers/dumper_base.py in row_counter(self, resource, iterator)
67 def row_counter(self, resource, iterator):
68 counter = 0
---> 69 for row in iterator:
70 counter += 1
71 yield row
~/.local/lib/python3.8/site-packages/dataflows/processors/dumpers/file_dumper.py in rows_processor(self, resource, writer, temp_file)
74
75 def rows_processor(self, resource, writer, temp_file):
---> 76 for row in resource:
77 writer.write_row(row)
78 yield row
~/.local/lib/python3.8/site-packages/dataflows/base/schema_validator.py in schema_validator(resource, iterator, field_names, on_error)
49 row[f.name] = f.cast_value(row.get(f.name))
50 except CastError as e:
---> 51 if not on_error(resource['name'], row, i, e):
52 continue
53
~/.local/lib/python3.8/site-packages/dataflows/base/schema_validator.py in raise_exception(res_name, row, i, e)
20
21 def raise_exception(res_name, row, i, e):
---> 22 raise ValidationError(res_name, row, i, e)
23
24
ValidationError:
ROW: {'Date': datetime.date(2020, 3, 14), 'Province/State': None, 'Country/Region': 'Thailand', 'Lat': Decimal('15.0'), 'Long': Decimal('101.0'), 'Confirmed': None, 'Recovered': None, 'Deaths': 'None'}
----
These numbers are wrong. Where is this data taken from? Compare with https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6
Country-level comparisons are quite limiting; it is difficult to draw meaning about the impact of measures. For instance, mortality and intensive care cases at country level are under- or over-estimated depending on whether co-morbidities are considered, or after a health system collapses. The statistics are already much more granular for the United States in the Johns Hopkins dataset, for the Italian regions, or for the Swiss cantons. It would be good to build on the work here to go beyond a country ranking.
The data files have inconsistent formatting, making it difficult to write code that works on all files.
Header examples: "Last Update" changes to "Last_Update", "Confirmed" changes to "FIPS".
Country/Region values change too: "UK" changes to "United Kingdom".
Compare files '02-03-2020.csv' and '03-26-2020.csv', for example.
I was updating my dashboards on https://corona.deleu.dev and I noticed a full flat data on Italy.
The dataset shows
2020-03-21,Italy,,43.0,12.0,53578,6072,4825
2020-03-22,Italy,,43.0,12.0,59138,7024,5476
2020-03-23,Italy,,43.0,12.0,59138,7024,5476
when in reality it should be
2020-03-21,Italy,,43.0,12.0,53578,6072,4825
2020-03-22,Italy,,43.0,12.0,59138,7024,5476
2020-03-23,Italy,,43.0,12.0,63927,7432,6077
In the intro you say "effect", which is correct. But when you say "effected" it should be "affected".
"Effected" means to have brought something about or made it happen.
"Affected" means to have been influenced or changed by something else.
In the link, it should be "github" instead of "githab".
Create a new dataset (or add to existing) that is per capita data.
Upstream apparently already has this so we can merge.
Otherwise, the computation is easy and we can use https://github.com/datasets/population
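The computation really is a simple join against the population dataset. A stdlib-only sketch (the column names and the per-100k choice are assumptions for illustration; toy strings stand in for the real CSVs):

```python
import csv
import io

# Toy inputs standing in for countries-aggregated.csv and datasets/population
cases_csv = "Country,Date,Confirmed\nItaly,2020-03-22,59138\n"
pop_csv = "Country,Population\nItaly,60360000\n"

def per_capita(cases_text, pop_text, per=100_000):
    """Join case counts to populations and add a rate-per-`per`-people column."""
    pops = {r["Country"]: int(r["Population"])
            for r in csv.DictReader(io.StringIO(pop_text))}
    out = []
    for r in csv.DictReader(io.StringIO(cases_text)):
        rate = int(r["Confirmed"]) / pops[r["Country"]] * per
        out.append({**r, "ConfirmedPer100k": round(rate, 2)})
    return out
```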
The number of confirmed cases on April 13 increased by 127306 from the previous day. Is this right?
Can we try to upstream stuff to the upstream repo? It may be tough as they have a lot of open PRs and a lot of noise right now. We initially planned (back in Feb) to put in a PR for datapackage.json (and maybe even a refactor of the file structure), but this may be tough now (they are certainly unlikely to change the file structure).
However, may still be worth trying to push data bugfixes.
e.g. could use https://colab.research.google.com/
Hi,
when will the data be updated? Thanks, bye, Alberto
Create a simple dashboard similar to e.g. https://carbon.datahub.io or https://london.datahub.io to present this information and provide an open source basis for others to create their own dashboards quickly esp per country.
Value: (new confirmed) cases, deaths, recovered
Dimensions:
When wanting to know about the situation I want to see key figures such as total number of people infected/recovered/died, so that I understand current status of the situation in the World.
Specific items:
"What's happening in my country" => Ditto but just with my country
Secondary
Tertiary
Meta
Hello,
I saw that some countries (e.g., China, Canada, Australia) have state/province data, but not the US. Is there a reason there is only data for the whole US?
Thanks!
Thanks for your work to help the people in need! Your site has been added! I currently maintain the Open-Source-COVID-19 page, which collects all open source projects related to COVID-19, including maps, data, news, APIs, analysis, medical and supply information, etc. Please share with anyone who might need the information in the list, or who may contribute to some of those projects. You are also welcome to recommend more projects.
http://open-source-covid-19.weileizeng.com/
Cheers!
NYT now have data - just for the US. https://github.com/nytimes/covid-19-data
But it's not open ...
In light of the current public health emergency, The New York Times Company is
providing this database under the following free-of-cost, perpetual,
non-exclusive license. Anyone may copy, distribute, and display the database, or
any part thereof, and make derivative works based on it, provided (a) any such
use is for non-commercial purposes only and (b) credit is given to The New York
Times in any public display of the database, in any publication derived in part
or in full from the database, and in any other public use of the data contained
in or derived from the database.
Value: (new confirmed) cases, deaths, recovered
Dimensions:
Province/State,Country/Region,Lat,Long,date,case
Anhui,Mainland China,31.8257,117.2264,2020-03-04,6
Anhui,Mainland China,31.8257,117.2264,2020-03-05,6
Anhui,Mainland China,31.8257,117.2264,2020-03-06,6
Beijing,Mainland China,40.1824,116.4142,2020-01-22,0
Beijing,Mainland China,40.1824,116.4142,2020-01-23,0
Beijing,Mainland China,40.1824,116.4142,2020-01-24,0
Beijing,Mainland China,40.1824,116.4142,2020-01-25,0
Would go with cumulative numbers (we can always difference to get per day)
Country,Province,Date,Confirmed,Death,Recovered
province2latlon
Province,Lat,Lon
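A sketch of that split (which also addresses the earlier request to ship lat/long in a separate, smaller file). Column names follow time-series-19-covid-combined.csv; the toy input string and helper name are assumptions:

```python
import csv
import io

# Toy stand-in for time-series-19-covid-combined.csv
combined = """Province/State,Country/Region,Lat,Long,Date,Confirmed
Anhui,Mainland China,31.8257,117.2264,2020-03-04,990
"""

def split_latlon(text):
    """Split the combined table into a cases table and a province -> lat/lon lookup."""
    cases, latlon = [], {}
    for r in csv.DictReader(io.StringIO(text)):
        latlon[(r["Country/Region"], r["Province/State"])] = (r["Lat"], r["Long"])
        cases.append({k: r[k]
                      for k in ("Country/Region", "Province/State", "Date", "Confirmed")})
    return cases, latlon
```

Each dict in `cases` would become a row of the slimmer CSV, and `latlon` would be written once as the province2latlon lookup.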
On running process.py, I get a 404 on the recovered URL. This one:
RECOVERED = 'time_series_19-covid-Recovered.csv'
Maybe this is just some temporary url bug, but I thought I'd let you know.
Meanwhile, I have managed to get the script to run by commenting out all references to the recovered portion of the data, which is less than ideal.
Great job!
Hey guys, just noticed missing data here.
Incredible work btw!
Total tests for each country would give a sense of how active each country's government is.
Blog post(s) to put on datahub.io/blog highlighting progress on this dataset plus all the work by others. Could also blog specific stuff e.g. the modelling background.
@Liyubov do you want to lead on this? I suggest drafting blog posts in markdown in HackMD so that they can be reviewed and then added to datahub.io/blog easily.
This would follow convention and make a cleaner setup.
Not seeing recovery data for Canada, but it is being updated in the Johns Hopkins data.
Those are the only NA's I'm seeing. Great work on this - thanks a ton.
Add data about the current clinical trials being conducted against COVID-19.
This might (or might not) involve scraping some clinical trials registries (e.g. EUCTR, ICTRP etc.).
I will self assign as I wanted to get them anyway, can't think of a better place to put them. The only caveat is that I will try to patch some of the OpenTrials collectors in order to do that and that might not be the straightest (or most obvious) path to extract that information.
Great stuff! I'm planning to use the API for my dashboard Pandemic Estimator, but I wish your API had better documentation of the methodology. I'm using JHU directly and I know what chaos it is, the most blatant example being that they provide "cumulative" data that quite often is not actually cumulative in practice. And the whole change of file formats, etc.
Can you please describe the methodology for how you deal with it? What's from JHU and what's from CSBS? What has been omitted, what has been "adjusted", and how? Thank you!
Hello maintainer,
The data series have not been updated for 2 days. Is the script updated with the latest changes in structure from the original data sources?
First of all, congrats on the project! It took me almost no time to synchronize my excel workbook with your csv raw data. Thank you !
I have one issue: Romania data is lagging one full day. Do you think you could refresh the dataset faster or at another time? Or please advise how to proceed.
Thanks Again !
We want to automate collecting the data every day (or even every half-day?). Since upstream repo is update at 23:59 GMT (once a day), we can run our update script right after that time, eg, 00:00 GMT.
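One simple way to schedule that would be a cron entry shortly after the upstream refresh. The path, log file, and exact offset below are assumptions for illustration, and the machine is assumed to run on UTC since crontab uses local time:

```shell
# Hypothetical crontab entry: run the update 5 minutes after the upstream
# 23:59 GMT refresh (machine timezone assumed to be UTC).
# m h dom mon dow  command
5 0 * * * cd /srv/covid-19 && python process.py >> update.log 2>&1
```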
datapackage.json for the dataset
data-cli via npm/yarn
data push command
Dear maintainers, my team is curating a repository of country- (or region-) level secondary data deemed relevant for predicting cases. It might be relevant for people using your curated CSSE data:
https://github.com/cjvanlissa/COVID19_metadata
Sincerely,
Caspar
There are various issues primarily related to geo name normalization in upstream. Ultimately we'd like to upstream these but the maintainers there may be a bit overwhelmed atm so for now we should try and fix here:
France's confirmed count has an issue:
82 2020-04-12 133670
83 2020-04-13 137875
84 2020-04-14 131361
Why is the number going down from yesterday?
As it is a cumulative number, it has to grow or stagnate.
Thanks for any clarification.
This repository contains (manually updated, as the commit messages state) data for Portugal:
https://github.com/aperaltasantos/covid_pt/tree/master/datasets
Main site of project:
https://aperaltasantos.github.io/covid_pt/#vig-epidemiologica
The project readme lists this official government website as the data source:
https://covid19.min-saude.pt/ponto-de-situacao-atual-em-portugal/
Since the following commit: ab35560
The "Admin2" field is missing from US CSV files. In my case, I was using this field to filter data by US city, and now I can only do so by state. Can this field be added back into the US datasets?
read_csv("time-series-19-covid-combined.csv", col_names = TRUE)
gives 68 NA values for Confirmed and Deaths in the last update on 29 March 2020 for Canada. I cannot immediately see why, but I did pull the data into Excel and that works fine. It seems just the read_csv function is not working on this latest update.
There are changes upstream we need to handle (cf #33): CSSEGISandData/COVID-19#1250
us.csv