Frontend and data generation pipeline for http://epidemicforecasting.org/
This project is divided into web and data-pipeline modules. Each of these modules has a README file describing the setup and development process.
epidemicforecasting.org visualization repository
Home Page: http://epidemicforecasting.org
License: GNU Affero General Public License v3.0
@Bachibouzouk made a PR for displaying containment measures from a .csv exported from [the Notion database of containment measures] on country pages
We ideally want this feeding of data to be automatic rather than manual, and to have it update whenever the Notion database updates.
Notion doesn't have an API, but there's an unofficial Python API wrapper
This issue requires further speccing out for what's needed. This is just a reminder that it needs doing.
Add hospital data into the Notion spec (design it), populate the data in the pipeline, and upload it.
The Y range should not be fixed at 0..200, since for many countries and scenarios much less than 20% of country pop will be infected at any moment. (E.g. 100k infected of 10M Czechs is a catastrophe but only goes to 10 on the chart?). I would vote for flexible domains?
I’m not a big fan of flexible domains; but your argument is compelling. Happy to do it for now.
Maybe in future we could add a horizontal line for hospital capacity? Or some other contextualising measure? As long as we could actually get reliable data on it.
MITIGATION STRENGTH
Here's what we're shooting for in terms of colors (moving away from the current all black):
Colors and more sketches can be found in this Figma file
The current one is just for test purposes.
AC:
Display the countermeasures from the most recent
@lagerros is going to compile a fixed JSON indexed by country; we place that as a file on GCS and render the number in a similar way to how we now do the plot, or did with countermeasures, or how it was done by @Mati-Roy here.
Proper data from pipeline will be provided later.
Numbers are 1:
with that note from the #160
For a given place, the user would input the size of an event; she would get a probability estimate someone is infected.
Like ... he is organising a concert with 100 ppl - would input 100. Based on the number of active cases and uncertainty, we would calculate the risk (basically just doing the (1-fraction)^N for them)
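A minimal sketch of that risk calculation (the function and variable names are illustrative, assuming attendees are sampled independently from the population):

```python
def event_infection_risk(active_cases: int, population: int, event_size: int) -> float:
    """Probability that at least one attendee is infected,
    i.e. 1 - (1 - fraction)^N from the note above."""
    fraction = active_cases / population
    return 1 - (1 - fraction) ** event_size

# e.g. a concert with 100 people in a country with 100k active cases out of 10M
risk = event_infection_risk(100_000, 10_000_000, 100)  # about 0.63
```

Incorporating the uncertainty in active-case counts mentioned above would mean evaluating this over a distribution of the infected fraction rather than a point estimate.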
This will apply to the content reachable via the navigation menu "Event Prediction" on the website, after submitting the form on this page.
Inputs:
user clicks on a button
Outputs:
Bonus points for making it automated, but that can come in a followup task.
Relevant discussions:
AC:
We currently have hardcoded IP addresses in our configs. Use domain names instead.
AC:
#108
E.g. if I am on this page:
http://epidemicforecasting.org/model?selection=France
And go to Japan, it gets added underneath the current list:
Based off of #138
Make this reflect the actual country selected from the dropdown menu on the "Model" page:
We want to add some analytics to the website to see the number of visits and some default demographics information.
Google Analytics comes to mind.
AC:
To prevent everyone from duplicating the work of understanding the data, it would be nice to have a data description in the repository (the data can be downloaded here). That means that every data file(-type) should be described, e.g.:
cities/123-3.tsv - is that a city with ID 123 from the md_cities.tsv? What's the suffix -3? (UPDATE: Jan Kulveit will tell us which is the correct one!)
Column meanings, e.g. "Median means the number of infected people", "Timestep corresponds to a day", ...
{area-type}/(unknown).tsv - median or maybe this difference?

The measures for each location can be found in this database: https://www.notion.so/977d5e5be0434bf996704ec361ad621d?v=aa8e0c75520a479ea48f56cb4c289b7e
Not yet implemented: parser from notion to pandas or csv
import os
import keyring
from notion.client import NotionClient

# Obtain the `token_v2` value by inspecting your browser cookies on a logged-in session on Notion.so
# I used keyring to save the token locally
token_v2 = keyring.get_password("notion", "token")
client = NotionClient(token_v2=token_v2)

cv = client.get_collection_view(
    "https://www.notion.so/977d5e5be0434bf996704ec361ad621d?v"
    "=aa8e0c75520a479ea48f56cb4c289b7e"
)

# here are the fields which interest us
# Country
# Description of measure implemented
# Keywords
# Source

# Run a filtered/sorted query using a view's default parameters
result = cv.default_query().execute()
for row in result:
    print(row.description_of_measure_implemented)
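The missing notion-to-pandas/csv step could then be a small transformation of the query result; the column names and row attributes below are assumptions based on the field list above:

```python
import pandas as pd

def rows_to_dataframe(rows) -> pd.DataFrame:
    # Each row is assumed to expose the Notion fields listed above
    # as notion-py style snake_case attributes.
    return pd.DataFrame([
        {
            "country": row.country,
            "measure": row.description_of_measure_implemented,
            "keywords": row.keywords,
            "source": row.source,
        }
        for row in rows
    ])

# e.g.: rows_to_dataframe(result).to_csv("containment_measures.csv", index=False)
```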
The containment measures could be in the sidebar with similar design as the previous one (one entry for each date, descending)
Current parameter configurations are listed in this Notion database.
This is currently done by hand, and the files are then manually imported to the server.
Ideally we'd find some kind of GLEaM API and be able to regularly run new simulations for all the different parameter configurations (as well as flexibly extending the number of configurations we want to run).
This needs further speccing out, but leaving this issue as a pointer
The negative infected numbers we saw in the line data do make sense, since the countries should not start at 0 infected but at the number we set them to in the simulation. Heavily pre-infected countries like Iran then quickly go negative. Not sure how to get this data in the fastest way, though - it is not part of the GV export nor the HDF5.
remove this: Disclaimer: All data used to produce this map are exclusively collected from publicly available sources including government reports and news media
Not sure if we want this one though, maybe it's easier this way than managing more images and deps.
See the todo in the Dockerfile. Currently we use the same image for both the server and bokeh and specify entrypoints everywhere.
@mati Roy I’d like to add a legend to the basic model visualisation. Should be doable in d3 quite easily.
Showing it to some users, their reaction was “I don’t know what all the colored lines mean”.
So I suggest we display some details about what the trajectories are. We’ll map the seasonality parameters {0.1, 0.7, 0.85} to {low seasonality, high seasonality, very high seasonality} and the air traffic parameters {0.2, 0.7} to {much reduced air traffic, slightly reduced air traffic}.
@jan Kulveit was keen not to include any simulation numbers that don’t make sense if you’re not familiar with gleam, so I hope changing from numbers to more vague verbal indications will work. But I expect Jan to have more opinions in the morning, and we’ll need some back and forth to figure out the right way of making the graphs more interpretable. Just treating this as a starting point for exploring options. (edited)
I'm fine with the verbal descriptions
I've checked with 2 users and you were right
- are the seasonality params really 0.1, 0.7, 0.85? that should not be the case; we should have "no seasonality | weak seasonality | medium seasonality" (it's not strong) for the param values (seasonality switched off or param setting 1; 0.85; 0.7)
for air traffic I'd call it "strong reduction in air travel|weak reduction in air travel"
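If the verbal labels go in, a simple lookup table would do; the parameter values and wording below follow the comments above but are assumptions until the final values are confirmed:

```python
# Assumed GLEAM parameter values -> user-facing labels (per the discussion above).
SEASONALITY_LABELS = {
    1.0: "no seasonality",   # seasonality effectively switched off
    0.85: "weak seasonality",
    0.7: "medium seasonality",
}
AIR_TRAFFIC_LABELS = {
    0.2: "strong reduction in air travel",
    0.7: "weak reduction in air travel",
}

def describe_run(seasonality: float, air_traffic: float) -> str:
    """Turn a pair of raw simulation parameters into a legend entry."""
    return f"{SEASONALITY_LABELS[seasonality]}, {AIR_TRAFFIC_LABELS[air_traffic]}"
```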
Instead of manually triggering the gleamviz computation, could we automate it? Design it and do some cards to implement it. Is it even possible with gleamviz?
related: #9
Currently, we don't handle autoscaling in any intelligent way.
The main node pool scales from 0-5 replicas, but the services themselves do not, and their replicas are set to 1.
Using horizontal pod autoscaler would probably help, even the default one:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
Currently the main page is the bubble map (called "Public information"). However, we want to de-emphasize that and focus on the core product which is the gleam-supported model.
When one clicks "Epidemic forecasting" one should be taken to the "Models" page, and the url epidemicforecasting.org should redirect to the "Models page"
Initially, this is likely a request for on-demand scenario simulation for the use of professionals / decision makers / health pros; the user would have the option to specify a more precise location, and we would initially run the required simulations by hand (source of description).
A user should be able to submit a form, which is somehow persisted (either via sending it to an email or to a DB).
This will apply to the content reachable via the navigation menu "Request Model" on the website.
So basically a form which would submit some stuff in our DB
Parameters (source of requirements):
If there are some typical use cases, we could automate it
Bonus points for knowing how to automate the recalculation of the model in the future.
Design and implement the deployment of the application. Likely the following: on a merge to master, deploy to the production environment; on a merge to the beta branch, deploy to the beta environment.
AC:
We currently have $300 and it seems that we burn maybe $50/day. But k8s will start to cost $72/month just for the node, therefore we may need to get more money...
Some rough calculation and maybe cluster optimization could be considered.
Related #35
From https://bl.ocks.org/d3noob/23e42c8f67210ac6c678db2cd07a747e
// text label for the x axis
svg.append("text")
    .attr("transform",
          "translate(" + (width / 2) + " ," + (height + margin.top + 20) + ")")
    .style("text-anchor", "middle")
    .text("Date");

// text label for the y axis
svg.append("text")
    .attr("transform", "rotate(-90)")
    .attr("y", 0 - margin.left)
    .attr("x", 0 - (height / 2))
    .attr("dy", "1em")
    .style("text-anchor", "middle")
    .text("Value");
Based on Jan Kulveit's description in slack (also this).
User should be able to select the following parameters:
area: a selection from countries and cities (not hemispheres, regions...)
reduction of transmission due to countermeasures: 0%, 25%, 50%, 75%

The user selects e.g. 25%, and then we display 4 (or up to 8) different (gleamviz-precomputed) timeseries with different additional gleamviz parameters (e.g. seasonality).
The motivation for multiple trajectories is to visualize the uncertainty (a tooltip "Why multiple lines?" with an explanation may be helpful).
Data will be provided by Jan.
Open questions:
x-axis: "timestep"; the start date is in the definition.xml (+ we should somewhere note the date of the model generation, e.g. in a tooltip)
median column for the given area

The GCS cache where the data files are stored has a default timeout of 1h - too much to do any experiments. It will also be a problem later in production.
Now, when you update a data file (preserving the name), the cache keeps serving the old data for 1h. Even if you delete the file in GCS, it is still served 🤷‍♂️
Individual files can get Cache-control: public, max-age=10, but this needs to be done on every upload (or we fall into the 1h trap again). Can we somehow fix this in a better way? @hnykda
Only if it's actually needed. We could reduce some load on the server if we hosted the static files (JS and CSS) elsewhere. That would improve responsiveness of the site (user-friendliness) and reduce compute and egress costs between the client and the server.
Solution:
gsutil cp -Z -a <somecacheheaders as in epifor> ...
- a step which would take the content of src/server/static and upload it to the bucket (for staging and production independently), ideally a two-way sync
- static_url to be served from GCS in deployments; maintain the file serving locally, which likely means dealing with url_for, maybe https://fastapi.tiangolo.com/tutorial/static-files/, and changing url_for from /static to something like https://storage.googleapis.com/static-covid/static/
We have ingress - SSL certs can be somewhat easily added to the ingress definition (see deploy/chart/templates...
).
Ingress doesn't allow http redirects though, so we may have to:
Currently not only master gets deployed to production, but every branch which has master in its name... So e.g. a branch called do-not-ever-push-to-master will make it straight into production.
Same is true for staging. (Because in the GitHub Actions steps I am using contains(github.ref, "master"), and not just github.ref == "master" or whatever should be there.)
AC
From @jacob (https://epidemicforecasting.slack.com/archives/CV0GGP151/p1584421050240600?thread_ts=1584420188.240200&cid=CV0GGP151)
For the form, make the countries appear in alphabetical order
Make the graph smaller so it fits the page (and the top of the “how the model works” section is in view)
[was already done] Make “Containment measures” heading smaller, also make the orange dates smaller
Include link to containment measures database at top of the sidebar and say that’s where data is sourced from: https://www.notion.so/977d5e5be0434bf996704ec361ad621d?v=aa8e0c75520a479ea48f56cb4c289b7e
[was already done] Vertically align all the items consistently to the left, e.g. “Different global mitigation strengths” with “Mitigation strength” heading, the form, the graph, the “How the model works” section. Currently they’re all misaligned
Display values on y-axis as percentages (by default they are per 1000 persons, so divide by 1000), and the label as “Active infections” (I think this is what it shows, since it goes down occasionally, but someone could correct me)
This takes off from #132 , which introduces this styling problem. When the font size is increased and the label "Date" is placed lower beneath the graph, it gets hidden behind some black content.
UI nr. 4 - just a world map with shading representing the intensity of infection; upon clicking a location, people would end up in UI nr. 1. The aim is to have simple navigation for the homepage.
I believe we want something like gmap or this
This list is generated by a js loop, but each element is in the following div
<div class="containment_measure">
<h3 class="num">{{ date }}</h3>
<div class="area">{{ text }} <a href="{{ source_link }}" target="_blank">Source</a></div>
</div>
These classes can be used for styling
The JS code in https://github.com/epidemics/covid/blob/master/src/server/static/js/lines.js#L65 explicitly tests for beta === "0.0"
etc., which breaks on any other (valid) floating-point encoding in the data (e.g. 0.00
, 0
). Can we make it more robust?
I suggest having a data_model package that will serve both the charts and the server, with the following API:
from dataclasses import dataclass
from typing import List, NamedTuple

import pandas as pd


class Location(NamedTuple):
    long: float
    lat: float


@dataclass
class Place:
    # The following can be properties or fields
    name: str
    type: str  # city / country / continent / world
    id: str
    location: Location  # (long, lat)
    population: int

    def get_foretold_io_data(self) -> pd.DataFrame:
        # Columns: date, number, pdf, cdf
        # Filter on origin DF
        ...

    def get_gleamviz_data(self) -> pd.DataFrame:
        # Columns: as required by the plots
        # Filter on origin DF
        ...

    def get_median_forecasts(self) -> pd.Series:
        # date -> float
        # from predict_io_data
        ...


class Data:
    def __init__(self, *files):
        # Initialize from the necessary files (or one file):
        # a) gleamviz output
        # b) foretold output (necessary for bubble chart)
        # c) locations (necessary for bubble chart)
        # d) populations (necessary for event risk prediction)
        # more?
        ...

    @property
    def selected_places(self) -> List[Place]:
        ...

    def __getitem__(self, name) -> Place:
        ...
This is a "nice-to-have" with lower priority
Example
<link href="../static/css/main.css" rel="stylesheet" />
would become
<link href="{{ url_for('static', path='/css/main.css') }}" rel="stylesheet" />
If you want the url of a view you can simply use url_for(<name of the view>)
Currently, pushing to staging and production can be done independently (e.g. you can deploy something to production without first testing on staging). Is it OK?
Maybe if we removed the "build and push" step from the production pipeline, we would know that it can be deployed without first having it in staging.
Check with devs.
I would propose CC 0 for the data and AGPL-3 for the code