aaronward / covidify

Covidify - corona virus report and dataset generator for python 📈 [no longer being updated]

License: MIT License

Jupyter Notebook 97.19% Python 2.75% Shell 0.06%
coronavirus coronavirus-real-time 2019-ncov 2019ncov ncov 2020ncov virus pandemic trend coronavirus-analysis

covidify's Introduction

Features • How To Use • Install • Visualizations • Data Source • Credits • To-Do List


Features

Covidify downloads the latest covid-19 data for confirmed cases, deaths and recoveries.

  • Creates a time series dataset
  • Creates a daily stats dataset
  • Forecasts global and per-country confirmed cases
  • Generates visualizations
  • Filters by country
  • Lists all countries affected
  • Shows the number of people currently infected
  • Generates an Excel report including all of the above


Install

  • pip install covidify

How to use

$ covidify
Usage: covidify [OPTIONS] COMMAND [ARGS]...

  ☣  COVIDIFY ☣

   - use the most up-to-date data to generate reports of 
     confirmed cases, fatalities and recoveries.

Options:
  --help  Show this message and exit.

Commands:
  list  List all the countries that have confirmed cases.
  run   Generate reports for global cases or refine by country.
$ covidify run --help
Usage: covidify run [OPTIONS]

Options:
  --output TEXT    Folder to output data and reports [Default:
                   /Users/award40/Desktop/covidify-output/]
  --source TEXT    There are two datasources to choose from, Johns Hopkins
                   github repo or wikipedia -- options are JHU or wiki
                   respectively [Default: JHU]
  --country TEXT   Filter reports by a country
  --top TEXT       Top N infected countries for log plot. [Default: 10]
  --forecast TEXT  Number of days to forecast cumulative cases in the future.
                   [Default: 15]
  --help           Show this message and exit.

Example Commands:

# List all countries affected 
covidify list --countries
# Will default to desktop folder for output and github for datasource
covidify run 
# Specify the output folder
covidify run --output=<PATH TO DESIRED OUTPUT FOLDER>
# Filter reports by country
covidify run --country="South Korea"
# Show top 20 infected countries on a logarithmic scale
covidify run --top=20
# Forecast cumulative cases in America for 14 days into the future
covidify run --country=America --forecast=14

Visualizations

An Excel spreadsheet is generated with a number of visualizations and statistics.

Logarithmic Plot

This plot shows the top N infected countries on a logarithmic scale.

Forecasting

An ARIMA model is trained and used to forecast the cumulative cases N days into the future. (DISCLAIMER: the forecast is a ballpark figure and should not be taken as gospel.)

Accumulative Trend

This is an accumulative sum trendline for all the confirmed cases, deaths and recoveries.

Daily Trendline

This is a daily sum trendline for all the confirmed cases, deaths and recoveries.

Stacked Daily Confirmed Cases

This stacked bar chart shows a daily sum of people who are already confirmed (red) and the people who have been confirmed on that date (blue).

Daily Confirmed Cases

A count of new cases on a given date; it does not take past confirmations into account.
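To illustrate the difference between a cumulative series and a daily count, here is a minimal sketch that derives day-over-day new cases from cumulative totals. The function name and the sample numbers are made up for illustration and are not taken from covidify's code.

```python
def daily_new(cumulative):
    """Return day-over-day differences of a cumulative series."""
    if not cumulative:
        return []
    # The first day has no predecessor, so its daily value is the total itself.
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


cumulative_confirmed = [2, 5, 9, 9, 14]  # illustrative numbers only
print(daily_new(cumulative_confirmed))
```

A day on which the cumulative total does not change yields a daily count of zero.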

Daily Deaths

A count of deaths on a given date; it does not take past deaths into account.

Daily Recoveries

A count of new recoveries on a given date; it does not take past recoveries into account.

Currently Infected

A count of all the people who are currently infected on a given date.


Credits

covidify's People

Contributors

aaronward, ajaymaity, barberw-osu, barneshere, creeble, kant, pnuw, weisisheng

covidify's Issues

Suggestion: Remove contradictory and misleading tags

you're using misleading tags on this repository:

  • covid-virus
    • there's no such thing, there's a virus mentioned below and an illness covid-19 caused by it
  • wuhan, china
    • this is not a local event, it's a pandemic
  • wuhan-virus
    • its name is SARS-CoV-2
    • this term is used by nationalist circles, led by an orange maniac, in the United States of America only

please, from a European perspective how this pandemic is addressed and handled on the other side of the pond, is not comprehensible and very frightening. framings that deny the impact on any population do not make sense and are a danger b/c people are acting irresponsible when misinformation is spread by the authorities.

stay safe and protect others! speak scientific truth, not nationalist bigotry!

Error in starting ./pipeline.sh

Not sure what happened. thanks!

Traceback (most recent call last):
  File "./src/data_prep.py", line 79, in <module>
    cleaned_ranges = clean_sheet_names(sheets)
  File "./src/data_prep.py", line 27, in clean_sheet_names
    clean_new_ranges = new_ranges.copy()
AttributeError: 'list' object has no attribute 'copy'

Data Exploration

Traceback (most recent call last):
  File "./src/data_exploration.py", line 13, in <module>
    plt.style.use('ggplot')
AttributeError: 'module' object has no attribute 'style'
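Both errors are consistent with an old interpreter or old libraries: list.copy() only exists from Python 3.3 onward, and plt.style is missing from older matplotlib releases. As a sketch (not the project's actual fix), a list can be copied in a way that works on either major Python version. The variable contents below are placeholders.

```python
import sys

new_ranges = ["a", "b", "c"]  # stand-in for the list built in clean_sheet_names

# list.copy() was added in Python 3.3; list(...) and slicing work on older versions too.
clean_new_ranges = list(new_ranges)

# Changing the copy leaves the original untouched.
clean_new_ranges.append("d")
print(sys.version_info[0], new_ranges, clean_new_ranges)
```

Running the pipeline with a Python 3 interpreter would avoid the first error without any code change.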

Cumulative trendline decreases

For US data, the cumulative trendline for confirmed cases decreases on some days. (Recovereds decreases on some days as well.)

America_confirmed_trendline

For example, covidify shows 33 confirmed cases on 2020-03-13. I'm using the data_exploration.ipynb notebook.
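A quick way to locate such days is to scan the cumulative series for any value lower than the one before it. This is a minimal sketch; the function name and sample values are hypothetical and are not taken from the notebook.

```python
def find_decreases(dates, cumulative):
    """Return (date, previous, current) for each day the cumulative total drops."""
    drops = []
    for i in range(1, len(cumulative)):
        if cumulative[i] < cumulative[i - 1]:
            drops.append((dates[i], cumulative[i - 1], cumulative[i]))
    return drops


dates = ["2020-03-11", "2020-03-12", "2020-03-13", "2020-03-14"]
confirmed = [40, 52, 33, 60]  # illustrative values only
print(find_decreases(dates, confirmed))
```

Any date reported by such a check points at a row where the upstream data or the aggregation step needs a closer look.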

Data Missing When Parsed

Seems to be a bug in how it's parsing the csv pulled from the repo. As a test we will use "US" and "Texas"

covidify run --country="US"

Located in the "/tmp/corona/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/" folder, the csv reports show the correct metrics for "Texas", which at this moment is "43". This can be verified from "https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports"

When the CSV is parsed for metrics, the resulting "agg_data_2020-03-14.csv" lists "27" for Texas, which suggests a discrepancy.

US not listed as country anymore

Many thanks for covidify, Aaron.

Describe the bug
It seems that covidify does not find data from the US anymore.

To Reproduce
I added print(country_list) in data_prep.py to see the candidates

rd@h370:~/tmp.nobackup/git/covid-19-analysis$ ./build/lib/covidify/pipeline.sh ./build/lib/covidify ~/tmp.nobackup/covidify git "US"

Job arguments:

ENV: ./build/lib/covidify
OUTPUT FOLDER: /home/rd/tmp.nobackup/covidify
DATA SOURCE: git
COUNTRIES: US

Data Extraction

git pull from https://github.com/CSSEGISandData/COVID-19.git
Getting sheets...
... importing data: 100%|██████████| 64/64 [00:03<00:00, 21.24it/s]
['italy', 'mongolia', 'sweden', 'curacao', 'romania', 'central african republic', 'grenada', 'laos', 'japan', 'slovakia', 'mexico', 'martinique', 'cameroon', 'singapore', 'nicaragua', 'liechtenstein', 'guatemala', 'thailand', 'senegal', 'gabon', 'cape verde', 'belarus', 'angola', 'dominican republic', 'new zealand', 'puerto rico', 'faroe islands', 'sri lanka', 'the gambia', 'brazil', 'colombia', 'norway', 'maldives', 'kazakhstan', 'occupied palestinian territory', 'jordan', 'oman', 'australia', 'nigeria', 'saint barthelemy', 'libya', 'mayotte', 'poland', 'congo', 'luxembourg', 'montenegro', 'costa rica', 'hungary', 'seychelles', 'hong kong', 'bosnia', 'trinidad and tobago', 'vietnam', 'benin', 'malaysia', 'taiwan', 'monaco', 'saudi arabia', 'guinea', "cote d'ivoire", 'rwanda', 'greenland', 'united kingdom', 'switzerland', 'azerbaijan', 'canada', 'argentina', 'uruguay', 'uzbekistan', 'cabo verde', 'israel', 'togo', 'el salvador', 'ireland', 'guernsey', 'mauritius', 'ethiopia', 'greece', 'peru', 'ghana', 'aruba', 'madagascar', 'uganda', 'ecuador', 'palestine', 'armenia', 'cayman islands', 'morocco', 'mozambique', 'iran', 'others', 'channel islands', 'bolivia', 'gambia', 'guadeloupe', 'saint kitts and nevis', 'holy see', 'india', 'denmark', 'east timor', 'antigua and barbuda', 'ukraine', 'turkey', 'cuba', 'austria', 'papua new guinea', 'haiti', 'burkina faso', 'finland', 'belize', 'niger', 'georgia', 'panama', 'estonia', 'chile', 'spain', 'the bahamas', 'tanzania', 'mali', 'russia', 'russian federation', 'liberia', 'north macedonia', 'china', 'cyprus', 'eritrea', 'jamaica', 'kenya', 'belgium', 'iraq', 'nepal', 'pakistan', 'netherlands', 'namibia', 'serbia', 'kosovo', 'america', 'saint lucia', 'saint martin', 'moldova', 'kuwait', 'zambia', 'zimbabwe', 'bangladesh', 'vatican city', 'paraguay', 'malta', 'lithuania', 'honduras', 'dominica', 'bhutan', 'syria', 'bulgaria', 'venezuela', 'bahrain', 'qatar', 'macau', 'equatorial guinea', 'cambodia', 'french guiana', 'brunei', 
'philippines', 'indonesia', 'eswatini', 'reunion', 'portugal', 'croatia', 'ivory coast', 'algeria', 'san marino', 'latvia', 'slovenia', 'germany', 'djibouti', 'fiji', 'mauritania', 'suriname', 'guinea-bissau', 'south korea', 'guyana', 'saint vincent and the grenadines', 'jersey', 'south africa', 'somalia', 'united arab emirates', 'lebanon', 'guam', 'iceland', 'egypt', 'france', 'czechia', 'diamond princess', 'gibraltar', 'albania', 'afghanistan', 'kyrgyzstan', 'sudan', 'barbados', 'andorra', 'tunisia', 'st. martin', 'chad', 'timor-leste']
Country specified!
US was not listed.
rd@h370:~/tmp.nobackup/git/covid-19-analysis$
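The printed list contains 'america' but no 'us', which suggests the country names are normalized before the lookup. One possible workaround, sketched here with a hypothetical alias table (the mapping and function name are assumptions, not covidify's code), is to translate common spellings to the listed name before checking membership.

```python
# Hypothetical aliases; extend as needed.
ALIASES = {"us": "america", "usa": "america", "united states": "america"}


def resolve_country(name, country_list):
    """Return the listed country name for a user-supplied name, or None."""
    key = name.lower().strip()
    key = ALIASES.get(key, key)
    return key if key in country_list else None


countries = ["italy", "america", "south korea"]
print(resolve_country("US", countries))
```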

Could not pull from https://github.com/CSSEGISandData/COVID-19.git

It has stopped working (I just ran it again). It cannot pull from the repo and there is no data in covidify-output, which leads to an I/O error. Has something gone wrong with the path or data permissions?

DATA SOURCE: git

Data Extraction

git pull from https://github.com/CSSEGISandData/COVID-19.git
Could not pull from https://github.com/CSSEGISandData/COVID-19.git

Data Exploration

Importing Data...
Traceback (most recent call last):
  File "/Users/chriswarner/Desktop/covid/lib/python3.8/site-packages/covidify/data_exploration.py", line 45, in <module>
    agg_df = pd.read_parquet(os.path.join(data_dir, agg_file))
  File "/Users/chriswarner/Desktop/covid/lib/python3.8/site-packages/pandas/io/parquet.py", line 310, in read_parquet
    return impl.read(path, columns=columns, **kwargs)
  File "/Users/chriswarner/Desktop/covid/lib/python3.8/site-packages/pandas/io/parquet.py", line 124, in read
    result = self.api.parquet.read_table(
  File "/Users/chriswarner/Desktop/covid/lib/python3.8/site-packages/pyarrow/parquet.py", line 1271, in read_table
    pf = ParquetDataset(source, metadata=metadata, memory_map=memory_map,
  File "/Users/chriswarner/Desktop/covid/lib/python3.8/site-packages/pyarrow/parquet.py", line 1028, in __init__
    self.metadata_path) = _make_manifest(
  File "/Users/chriswarner/Desktop/covid/lib/python3.8/site-packages/pyarrow/parquet.py", line 1228, in _make_manifest
    raise IOError('Passed non-file path: {0}'
OSError: Passed non-file path: /Users/chriswarner/Desktop/covidify-output/data/2020-03-16/agg_data_2020-03-16.parquet.gzip
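The OSError appears to be a follow-on failure: the pull failed, so the aggregated file was presumably never written, and the exploration step then tried to read a path that does not exist. A minimal sketch of an explicit check before reading is shown below; the helper name is hypothetical and this is not how covidify itself handles the case.

```python
import os
import tempfile


def require_data_file(data_dir, file_name):
    """Return the full path if it is an existing file, else raise a clear error."""
    path = os.path.join(data_dir, file_name)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "Expected data file not found: {} (did the extraction step fail?)".format(path)
        )
    return path


# Demonstration with a temporary directory standing in for the output folder.
with tempfile.TemporaryDirectory() as tmp:
    try:
        require_data_file(tmp, "agg_data_2020-03-16.parquet.gzip")
    except FileNotFoundError as err:
        print("skipping exploration:", err)
```

Failing early with a message that names the missing file makes the root cause (the failed pull) easier to spot than the pyarrow traceback.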

No data shown

I keep trying to access data but it alternates between telling me covidify isn't a real command and I don't have access.

jordan@jordan-HP-EliteBook-8560w:~$ covidify run --source=git
MESSAGE: No country specified, defaulting to global cases
MESSAGE: No output directory given, defaulting to /Users/jordan/Desktop/
mkdir: cannot create directory ‘/Users’: Permission denied

jordan@jordan-HP-EliteBook-8560w:~$ covidify run --output=/Users/award40/Documents/projects-folder --source=git
MESSAGE: No country specified, defaulting to global cases
mkdir: cannot create directory ‘/Users’: Permission denied

Columns and DataType Not Explicitly Set on line 122 of github.py

Hello!

I found an AI-Specific Code smell in your project.
The smell is called: Columns and DataType Not Explicitly Set

You can find more information about it in this paper: https://dl.acm.org/doi/abs/10.1145/3522664.3528620.

According to the paper, the smell is described as follows:

Problem: If the columns are not selected explicitly, it is not easy for developers to know what to expect in the downstream data schema. If the datatype is not set explicitly, it may silently continue the next step even though the input is unexpected, which may cause errors later. The same applies to other data importing scenarios.
Solution: It is recommended to set the columns and DataType explicitly in data processing.
Impact: Readability

Example:

### Pandas Column Selection
import pandas as pd
df = pd.read_csv('data.csv')
+ df = df[['col1', 'col2', 'col3']]

### Pandas Set DataType
import pandas as pd
- df = pd.read_csv('data.csv')
+ df = pd.read_csv('data.csv', dtype={'col1': 'str', 'col2': 'int', 'col3': 'float'})

You can find the code related to this smell in this link:

def get_data(cleaned_sheets):
    all_csv = []
    # Import all CSV's
    for f in tqdm(sorted(cleaned_sheets), desc='... loading data: '):
        if 'csv' in f:
            try:
                tmp_df = pd.read_csv(os.path.join(DATA, f), index_col=None,header=0, parse_dates=['Last Update'])
            except:
                # Temporary fix for JHU's bullshit data management
                tmp_df = pd.read_csv(os.path.join(DATA, f), index_col=None,header=0, parse_dates=['Last_Update'])
            tmp_df = clean_data(tmp_df)
            tmp_df['date'] = tmp_df['datetime'].apply(get_date) # remove time to get date
            tmp_df['file_date'] = get_csv_date(f) #Get date of csv from file name
            tmp_df = tmp_df[KEEP_COLS]
            tmp_df['province'].fillna(tmp_df['country'], inplace=True) #If no region given, fill it with country
            all_csv.append(tmp_df)
    df_raw = pd.concat(all_csv, axis=0, ignore_index=True, sort=True) # concatenate all csv's into one df
    df_raw = fix_country_names(df_raw) # Fix mispelled country names

I also found instances of this smell in other files, such as:

File: https://github.com/AaronWard/covidify/blob/master/src/covidify/data_prep.py#L180-L190 Line: 185
File: https://github.com/AaronWard/covidify/blob/master/src/covidify/forecast.py#L95-L105 Line: 100

I hope this information is helpful!

Change Log 🎉

Version 1.3.0

  • Added pmdarima dependency
  • Fixed dating in bar graph images
  • Removed redundant parquet.gzip saving aggregation file from data_prep.py
  • Updated forecasting script to not do any testing (use all ground truth data for fitting arima model)
  • Added kofi button

Failed to run in windows.

When running on Windows, the code fails with an error because it cannot find a path.

  1. Move to the Desktop directory
  2. Try to run this command in CMD:
covidify run --country="Colombia" --forecast=15 --output="."

Screenshots
error

  • OS: Windows
  • Version 10

I was missing logscale

Is your feature request related to a problem? Please describe.
Since the curves are supposed to be exponential, log scale for the y axis seems to be adequate.

Here is some local modification which worked for me:

rd@h370:~/tmp.nobackup/git/covid-19-analysis$ git diff src/covidify
diff --git a/src/covidify/data_visualization.py b/src/covidify/data_visualization.py
index 89db5c6..9ceae9c 100644
--- a/src/covidify/data_visualization.py
+++ b/src/covidify/data_visualization.py
@@ -95,6 +95,7 @@ def create_trend_line(tmp_df, date_col, col, col2, col3, fig_title, country):
     fig, ax = plt.subplots(figsize=(20,10))
     tmp_df.groupby([date_col])[[col, col2, col3]].sum().plot(ax=ax, marker='o')
     ax.set_title(create_title(fig_title, country))
+    ax.set_yscale('log')
     fig = ax.get_figure()
     fig.savefig(os.path.join(image_dir, create_save_file(col, country, 'trendline')))

@@ -102,6 +103,7 @@ def create_bar(tmp_df, col, rgb, country):
     fig, ax = plt.subplots(figsize=(20,10))
     tmp = tmp_df.groupby(['date'])[[col]].sum()
     ax.set_title(create_title(col, country))
+    ax.set_yscale('log')
     tmp.plot.bar(ax=ax, rot=45, color=rgb)
     fig = ax.get_figure()
     fig.savefig(os.path.join(image_dir, create_save_file(col, country, 'bar')))
@@ -110,6 +112,7 @@ def create_stacked_bar(tmp_df, col1, col2, fig_title, country):
     tmp_df = tmp_df.set_index('date')
     fig, ax = plt.subplots(figsize=(20,10))
     ax.set_title(create_title(fig_title, country))
+    ax.set_yscale('log')
     tmp_df[[col2, col1]].plot.bar(ax=ax,
     rot=45,
     stacked=True)
rd@h370:~/tmp.nobackup/git/covid-19-analysis$

A parameter for pipeline.sh would certainly be better.

I saw that you have a log Jupyter notebook, which might do the same, but I did not get it working.

Automatically updating README.md

Could you please host the images on a remote server and set up a cron job to update these images often (every hour perhaps?), then load the images from the remote server in the README? That way I can just check the README.md here to keep up to date.

Thanks!

Data get mixed

Hi Aaron,

I just noticed that if I do two runs one after the other, e.g. one for Germany and then one for Austria, I get plots from Germany in the Austria Excel file.

I copied the covidify-test output directory to

http://bokomoko.de/~rd/covidify-test/

Here are my runs:

(covidify) rd@h370:~/virtualenv$ covidify run --source JHU --output ~/tmp.nobackup/covidify-test --country Germany
MESSAGE: No top countries given, defaulting to top 10

Job arguments:

... ENV: /home/rd/virtualenv/covidify/lib/python3.7/site-packages/covidify
... OUTPUT FOLDER: /home/rd/tmp.nobackup/covidify-test
... DATA SOURCE: JHU
... COUNTRIES: Germany
... TOP INFECTED COUNTRIES: 10
... FORECAST PERIOD: 10

Data Extraction

Creating folder...
... /tmp/corona/
Cloning Data Repo...
Getting sheets...
... loading data: 100%|██████████| 107/107 [00:06<00:00, 16.57it/s]
Country specified!
... filtering data for Germany
... Calculating dataframe for new cases
Calculating data for logarithmic plotting...
... Germany: 169430
Creating subdirectory for data...
... /home/rd/tmp.nobackup/covidify-test/data/2020-05-08
Saving...
... agg_data_2020-05-08.parquet.gzip
... agg_data_2020-05-08.csv
... trend_2020-05-08.csv
... log_2020-05-08.csv
Done!

Training Forecasting Model

Training forecasting model...
... train/test split: 0.95
... RMSE: 482.349274870671
... forecasting 10 days in the future
... saving file: forecast_2020-05-08.csv
... saving graph

Data Visualization

Importing Data...
Creating graphs...
... Time Series Trend Line
... Daily Figures
... Daily New Infections Differences
... Logarithmic plots
Creating excel spreadsheet report...
... reading images for: log
... reading images for: forecasts
... reading images for: bar
... reading images for: trendline
Done!

Complete!

  • Results in: /home/rd/tmp.nobackup/covidify-test

(covidify) rd@h370:~/virtualenv$ covidify run --source JHU --output ~/tmp.nobackup/covidify-test --country Austria
MESSAGE: No top countries given, defaulting to top 10

Job arguments:

... ENV: /home/rd/virtualenv/covidify/lib/python3.7/site-packages/covidify
... OUTPUT FOLDER: /home/rd/tmp.nobackup/covidify-test
... DATA SOURCE: JHU
... COUNTRIES: Austria
... TOP INFECTED COUNTRIES: 10
... FORECAST PERIOD: 10

Data Extraction

git pull from https://github.com/CSSEGISandData/COVID-19.git
Getting sheets...
... loading data: 100%|██████████| 107/107 [00:06<00:00, 16.25it/s]
Country specified!
... filtering data for Austria
... Calculating dataframe for new cases
Calculating data for logarithmic plotting...
... Austria: 15752
Creating subdirectory for data...
... /home/rd/tmp.nobackup/covidify-test/data/2020-05-08
Saving...
... agg_data_2020-05-08.parquet.gzip
... agg_data_2020-05-08.csv
... trend_2020-05-08.csv
... log_2020-05-08.csv
Done!

Training Forecasting Model

Training forecasting model...
... train/test split: 0.95
... RMSE: 78.96984754390114
... forecasting 10 days in the future
... saving file: forecast_2020-05-08.csv
... saving graph

Data Visualization

Importing Data...
Creating graphs...
... Time Series Trend Line
... Daily Figures
... Daily New Infections Differences
... Logarithmic plots
Creating excel spreadsheet report...
... reading images for: bar
... reading images for: forecasts
... reading images for: trendline
... reading images for: log
Done!

Complete!

  • Results in: /home/rd/tmp.nobackup/covidify-test

(covidify) rd@h370:~/virtualenv$

Desktop (please complete the following information):
(covidify) rd@h370:~/virtualenv$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
(covidify) rd@h370:~/virtualenv$ python -V
Python 3.7.3
(covidify) rd@h370:~/virtualenv$

Many thanks for maintaining covidify :-)

Rainer

Covidify broken since recent python 3.8.6 update

Hello,
Since yesterday's update of my Manjaro stable box (which also included some Python modules), the following command:

covidify run --output ~/Documents/Informatique/Python/covidify/france --country="France"

throws this traceback:

Traceback (most recent call last):
  File "/home/h2/.local/lib/python3.8/site-packages/covidify/data_prep.py", line 101, in <module>
    df = check_specified_country(df, country)
  File "/home/h2/.local/lib/python3.8/site-packages/covidify/data_prep.py", line 78, in check_specified_country
    country_list = list(map(lambda x:x.lower().strip(), set(df.country.values)))
  File "/home/h2/.local/lib/python3.8/site-packages/covidify/data_prep.py", line 78, in <lambda>
    country_list = list(map(lambda x:x.lower().strip(), set(df.country.values)))
AttributeError: 'float' object has no attribute 'lower'

But covidify list --countries works perfectly.

Thanks for taking a look and have a nice day.
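The traceback suggests that df.country.values contains a float, most likely a NaN from a row with a missing country. That is an assumption, since the data was not inspected here. A minimal sketch of a lookup that skips non-string entries before lower-casing could look like this (the function name is hypothetical):

```python
def normalized_countries(values):
    """Lower-case and strip country names, skipping non-string entries such as NaN."""
    return sorted({v.lower().strip() for v in values if isinstance(v, str)})


raw = ["France", " Italy ", float("nan"), "france"]  # illustrative input
print(normalized_countries(raw))
```

With pandas, dropping missing values from the column before building the list would have a similar effect.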

log-plots per country

A great way to compare the spread of Covid19 between countries at different stages of the outbreak is to look at the data in a log plot, since it makes any deviations from constant exponential growth really obvious.

See this here for an example of what I mean:

https://twitter.com/MarkJHandley/status/1237144386569416712/photo/1

What can also be helpful is to normalize the number of people infected to the population of the country, which makes it more intuitively clear how hard a country is hit.

The plot in the link above is great because it shifts the countries around in time to show how predictable the growth is, but that isn't absolutely necessary.

It would be great if your tool allowed making similar plots, for a user-defined selection of countries.

Thanks for your work!
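The three ideas in this request (shifting each country in time, normalizing by population, and using a log scale) can be sketched with plain Python as follows. All function names, the threshold of 100 cases and the sample numbers are assumptions made for illustration; none of this is part of covidify.

```python
import math


def align_from_threshold(cumulative, threshold=100):
    """Drop the days before the series first reaches the threshold."""
    for i, value in enumerate(cumulative):
        if value >= threshold:
            return cumulative[i:]
    return []


def per_100k(cumulative, population):
    """Express case counts per 100,000 inhabitants."""
    return [value / population * 100_000 for value in cumulative]


def log10_series(values):
    """Base-10 logarithm of each positive value (zeros are skipped)."""
    return [math.log10(v) for v in values if v > 0]


cases = [10, 50, 120, 300, 1000]  # illustrative cumulative counts
aligned = align_from_threshold(cases)
print(aligned, log10_series(aligned))
print(per_100k(aligned, population=1_000_000))  # placeholder population
```

Plotting the aligned series of several countries against "days since the threshold was reached" gives the kind of comparison shown in the linked example.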

program files windows error

Hi... thought this was really cool and i wanted to try it on my Windows machine. (This could very well be an issue with my install, but...)

Windows 10, Python 3.8.

Installed with pip install covidify. Everything went well.

When I run covidify run I get this:

PS C:\> covidify run
     MESSAGE: No output directory given, defaulting to /Users/<username>/Desktop/covidify-output/
     MESSAGE: No source given, defaulting to John Hopkin CSSE github repo
'c:\program' is not recognized as an internal or external command,
operable program or batch file.
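The message 'c:\program' is not recognized is typical of a path containing a space (for example under C:\Program Files) being passed unquoted to a shell. Whether that is what happens inside covidify is an assumption. As a sketch, passing the command as a list to subprocess.run avoids the shell's word splitting altogether; the helper name below is hypothetical.

```python
import subprocess
import sys


def run_without_shell(interpreter, args):
    """Run a command without a shell so that spaces in paths need no quoting."""
    return subprocess.run(
        [interpreter] + list(args), capture_output=True, text=True, check=True
    )


# The child process simply echoes back the argument it received.
result = run_without_shell(
    sys.executable,
    ["-c", "import sys; print(sys.argv[1])", "C:\\Program Files\\demo"],
)
print(result.stdout.strip())
```

The argument with the space arrives intact in the child process, which is what a shell command string without quotes fails to guarantee.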

pmdarima dependency

Hi Aaron,

I run

pip3 install covidify

in a virtualenv and I had to run

pip3 install pmdarima

Is it intended that pmdarima is not installed as a requirement?

Open Source Helps!

Thanks for your work to help the people in need! Your site has been added! I currently maintain the OpenSourceWuhan page, which collects all open source projects related to COVID-19, including maps, data, news, api, analysis, medical and supply information, etc. Please share to anyone who might need the information in the list, or will possibly contribute to some of those projects. You are also welcome to recommend more projects.

https://weileizeng.github.io/OpenSourceWuhan/world

Cheers!

ValueError: 'Last Update' is not in list

Since today I get the following (without updating covid-19-analysis):

... importing data:  95%|█████████▌| 59/62 [00:01<00:00, 25.02it/s]
Traceback (most recent call last):
  File "./build/lib/covidify/data_prep.py", line 42, in <module>
    df = github.get()
  File "/home/rd/tmp.nobackup/git/covid-19-analysis/build/lib/covidify/sources/github.py", line 140, in get
    df = get_data(cleaned_sheets)
  File "/home/rd/tmp.nobackup/git/covid-19-analysis/build/lib/covidify/sources/github.py", line 62, in get_data
    header=0, parse_dates=['Last Update'])
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1758, in __init__
    self._set_noconvert_columns()
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1826, in _set_noconvert_columns
    _set(val)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1816, in _set
    x = names.index(x)
ValueError: 'Last Update' is not in list
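The error indicates that one of the newer daily reports no longer has a 'Last Update' column. One way to cope with a renamed column, sketched here with the standard library rather than pandas, is to inspect the header first and pick whichever spelling is present. The two candidate names are the ones that appear in the get_data snippet quoted elsewhere on this page; the function name and the sample headers are illustrative.

```python
import csv
import io

# Spellings of the timestamp column seen in the daily reports.
DATE_COLUMN_CANDIDATES = ("Last Update", "Last_Update")


def find_date_column(csv_text):
    """Return the name of the date column found in the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    for name in DATE_COLUMN_CANDIDATES:
        if name in header:
            return name
    raise ValueError("no known date column in header: {}".format(header))


old_style = "Province/State,Country/Region,Last Update,Confirmed\n"
new_style = "Province_State,Country_Region,Last_Update,Confirmed\n"
print(find_date_column(old_style), find_date_column(new_style))
```

The detected name could then be passed to the parse_dates argument instead of relying on a bare except to switch between the two spellings.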

ImportError: No module named covidify.sources

Hi

I'm getting this error:

# covidify run --source=git --output=/root/map/ --country="Sweden"

Job arguments:

ENV: /usr/local/lib/python3.6/site-packages/covidify
OUTPUT FOLDER: /root/map/
DATA SOURCE: git
COUNTRIES: Sweden

Data Extraction

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/covidify/data_prep.py", line 22, in <module>
    from covidify.sources import github, wiki
ImportError: No module named covidify.sources

EDIT: Never mind, it's working. I changed the default python in pipeline.sh to python3.6.

ValueError: Format 'jpg' is not supported (supported formats: eps, pdf, pgf, png, ps, raw, rgba, svg, svgz)

I just installed covidify on my 2019 macbook air, python 3.7.6.
When I run 'covidify run', things go well until it tries to generate the line plots, and then throws the error:

ValueError: Format 'jpg' is not supported (supported formats: eps, pdf, pgf, png, ps, raw, rgba, svg, svgz)

I did some digging, and the issue is between matplotlib and pillow, specifically with versions of pillow >= 7.0. (see ipython/ipython#8052). Downgrading pillow to 6.x fixed it.

If it's happening to me, it'll happen to other folks. I think the easy fix is just to change the export to a .png format, which I believe has better support. That's my $.02

Full dump is below.

Alexs-Air-2:~ enjrolas$ covidify run
MESSAGE: No output directory given, defaulting to /Users/enjrolas/Desktop/covidify-output/
MESSAGE: No source given, defaulting to John Hopkin CSSE github repo

Job arguments:

ENV: /usr/local/lib/python3.7/site-packages/covidify
OUTPUT FOLDER: /Users/enjrolas/Desktop/covidify-output/
DATA SOURCE: git

Data Extraction

git pull from https://github.com/CSSEGISandData/COVID-19.git
Getting sheets...
... importing data: 100%|██████████| 49/49 [00:02<00:00, 22.23it/s]
Sorting by datetime...
Calculating dataframe for new cases...
Creating subdirectory for data...
... /Users/enjrolas/Desktop/covidify-output/data/2020-03-09
Saving...
... agg_data_2020-03-09.parquet.gzip
... agg_data_2020-03-09.csv
... trend_2020-03-09.csv
Done!

Data Exploration

Importing Data...
Creating graphs...
... Time Series Trend Line
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/covidify/data_exploration.py", line 93, in <module>
    create_trend_line(agg_df, 'confirmed', 'deaths', 'recovered')
  File "/usr/local/lib/python3.7/site-packages/covidify/data_exploration.py", line 68, in create_trend_line
    fig.savefig(os.path.join(image_dir, '{}_trendline.jpg'.format(col)))
  File "/usr/local/lib/python3.7/site-packages/matplotlib/figure.py", line 2180, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/matplotlib/backend_bases.py", line 2014, in print_figure
    canvas = self._get_output_canvas(format)
  File "/usr/local/lib/python3.7/site-packages/matplotlib/backend_bases.py", line 1956, in _get_output_canvas
    .format(fmt, ", ".join(sorted(self.get_supported_filetypes()))))
ValueError: Format 'jpg' is not supported (supported formats: eps, pdf, pgf, png, ps, raw, rgba, svg, svgz)

Complete!

Negative values in US summary

I'm trying to figure out the semantics of the negative numbers in the US report in new_confirmed_cases (e.g., a=32, c=-184).

I assume these are retractions of some kind? I understand that they must be in the data, but I wonder if anyone knows the history of these.
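One possible explanation, sketched under the assumption (not checked against the covidify source) that new_confirmed_cases is derived by differencing cumulative totals: if the upstream data revises a cumulative count downward, the day-over-day difference comes out negative. The numbers below are made up for illustration.

```python
def daily_new(cumulative):
    """Day-over-day change in a cumulative series (the first day has no predecessor)."""
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]


# Hypothetical cumulative confirmed counts; the third value is a downward revision.
cumulative_confirmed = [100, 150, 140, 180]
print(daily_new(cumulative_confirmed))  # [50, -10, 40]
```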

Improve default path for the output data

Hello,
It would be better not to hard-code the default path for the output data, as is currently done with:
/Users/award40/Desktop/covidify-output/. That path can't work on Linux and could mislead Windows users.
I think it's better to use os.path.expanduser("~") to find the user's home directory on Windows, Linux or Mac systems, and then the platform.system() function to choose the correct name for the user's desktop folder.
Otherwise, thank you for this nice work, it works very well under Linux.
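A minimal sketch of that suggestion. The function name `default_output_dir` is hypothetical, and the assumption that the desktop folder is called "Desktop" does not hold on every localized system, hence the fallback to the home directory on Linux.

```python
import os
import platform


def default_output_dir(folder_name="covidify-output"):
    """Return a per-user default output folder without a hard-coded username."""
    home = os.path.expanduser("~")
    # Assumption: Windows, macOS and most Linux desktops use a "Desktop" folder.
    desktop = os.path.join(home, "Desktop")
    if platform.system() == "Linux" and not os.path.isdir(desktop):
        # Localized or headless Linux setups may have no "Desktop"; use home instead.
        return os.path.join(home, folder_name)
    return os.path.join(desktop, folder_name)


print(default_output_dir())
```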

Updating the module would be desirable

With the latest updates of covidify's dependencies, I get some warnings:

Creating graphs...
... Time Series Trend Line
/home/h2/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py:1235: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_xticklabels(xticklabels)
... Daily Figures
/home/h2/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py:1235: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_xticklabels(xticklabels)
... Daily New Infections Differences
... Logarithmic plots
/home/h2/.local/lib/python3.8/site-packages/covidify/data_visualization.py:136: MatplotlibDeprecationWarning: The 'basey' parameter of __init__() has been renamed 'base' since Matplotlib 3.3; support for the old name will be dropped two minor releases later.
  ax.set_yscale('log', basey=10)

Your module will probably stop working properly with future updates of its dependencies.
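For the 'basey' deprecation, a small sketch of one way to stay compatible with both old and new Matplotlib. The helper `log_scale_kwargs` is hypothetical; the rename to 'base' in Matplotlib 3.3 is taken from the warning text above.

```python
def log_scale_kwargs(matplotlib_version, base=10):
    """Pick the keyword that set_yscale('log', ...) accepts for this version."""
    major, minor = (int(part) for part in matplotlib_version.split(".")[:2])
    # 'basey' was renamed to 'base' in Matplotlib 3.3, per the deprecation warning.
    key = "base" if (major, minor) >= (3, 3) else "basey"
    return {key: base}


print(log_scale_kwargs("3.3.0"))  # {'base': 10}
print(log_scale_kwargs("3.2.1"))  # {'basey': 10}

# Possible use at the call site flagged in the warning:
#   import matplotlib
#   ax.set_yscale('log', **log_scale_kwargs(matplotlib.__version__))
```

If support for Matplotlib older than 3.3 is not needed, calling `ax.set_yscale('log', base=10)` directly is simpler.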

No output with vanilla Windows Python install 3.8

Output of covidify run missing
I cloned the covidify repo and freshly installed both Python 2.7.17 and Python 3.8.2 (x64 versions) under Windows 10, following the recommendations found on https://bit.ly/2JgtF5d.

Both Python versions seemed to run correctly from "Program files", but I ran into several problems from the start when trying to run covidify after pip install. The first was a missing "git.exe", despite a previous GitHub Desktop installation; the last was the known unrecognized "c:\program" error caused by spaces in folder names. I tried to fix this by installing Git-scm first, then enabling 8dot3 names and/or creating a "c:\program" junction leading to "Program Files", but it did not work until I uninstalled Python, reinstalled it in a genuinely short-named "c:\program" directory, and reinstalled covidify by rerunning pip3 install.

Now "covidify list -countries" works fine, but running "covidify run" from a PowerShell or Console window, with either user or admin privileges and with or without an --output dir specification, only opens a console window that immediately vanishes, and produces no output.

Based on what I read in the closed issues, I'll try switching to a Conda environment. Sorry if vanilla Python is a problem.

Regards

To Reproduce
Steps to reproduce the behavior:

  1. Run 'covidify run' or 'covidify run -output .' from any user-writable directory
  2. Check the directory content
  3. It stays empty

Expected behavior
Some output should be generated instead of an empty output directory.

Screenshots
image

Desktop (please complete the following information):

  • OS: Windows
  • Version 10 Pro 64 bits
