Frontend and data generation pipeline for http://epidemicforecasting.org/
This project is divided into web and data-pipeline modules. Each of these modules has a README file describing the setup and development process.
epidemicforecasting.org visualization repository
Home Page: http://epidemicforecasting.org
License: GNU Affero General Public License v3.0
@Bachibouzouk made a PR for displaying containment measures from a .csv exported from [the Notion database of containment measures] on country pages
We ideally want this feeding of data to be automatic rather than manual, and to have it update whenever the Notion database updates.
Notion doesn't have an API, but there's an unofficial Python API wrapper
This issue requires further speccing out for what's needed. This is just a reminder that it needs doing.
Add hospital data into the Notion spec (design it), populate the data in the pipeline, and upload it.
The Y range should not be fixed at 0..200, since for many countries and scenarios much less than 20% of country pop will be infected at any moment. (E.g. 100k infected of 10M Czechs is a catastrophe but only goes to 10 on the chart?). I would vote for flexible domains?
I’m not a big fan of flexible domains; but your argument is compelling. Happy to do it for now.
Maybe in future we could add a horizontal line for hospital capacity? Or some other contextualising measure? As long as we could actually get reliable data on it.
MITIGATION STRENGTH
Here's what we're shooting for in terms of colors (moving away from the current all black):
Colors and more sketches can be found in this Figma file
The current one is just for test purposes.
AC:
Display the countermeasures from the most recent
@lagerros is going to compile a fixed JSON indexed by country; we place that as a file on GCS and render the number in a similar way to how we now do the plot, or did with countermeasures, or how it was done by @Mati-Roy here.
Proper data from pipeline will be provided later.
Numbers are 1:
with that note from the #160
For a given place, the user would input the size of an event; she would get a probability estimate someone is infected.
Like ... he is organising a concert with 100 ppl - would input 100. Based on the number of active cases and uncertainty, we would calculate the risk (basically just doing the (1-fraction)^N for them)
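A minimal sketch of that risk calculation (the function and variable names are illustrative, assuming attendees are sampled independently from the population):

```python
def event_infection_risk(active_cases: int, population: int, event_size: int) -> float:
    """Probability that at least one attendee is infected,
    i.e. 1 - (1 - fraction)^N from the note above."""
    fraction = active_cases / population
    return 1 - (1 - fraction) ** event_size

# e.g. a concert with 100 people in a country with 100k active cases out of 10M
risk = event_infection_risk(100_000, 10_000_000, 100)  # about 0.63
```

Incorporating the uncertainty in active-case counts mentioned above would mean evaluating this over a distribution of the infected fraction rather than a point estimate.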
This will apply to the content reachable via the navigation menu "Event Prediction" on the website, after submitting the form on this page.
Inputs:
user clicks on a button
Outputs:
Bonus points for making it automated, but that can come in a followup task.
Relevant discussions:
AC:
We currently have hardcoded IP addresses in our configs. Use domain names instead.
AC:
#108
E.g. if I am on this page:
http://epidemicforecasting.org/model?selection=France
And go to Japan, it gets added underneath the current list:
Based off of #138
Make this reflect the actual country selected from the dropdown menu on the "Model" page:
We want to add some analytics to the website to see the number of visits and some default demographics information.
Google Analytics comes to mind.
AC:
To prevent everyone from duplicating the work of understanding the data, it would be nice to have a data description in the repository (the data can be downloaded here). That means that every data file(-type) should be described, e.g.:
cities/123-3.tsv - is that a city with ID 123 from the md_cities.tsv? What's the suffix -3? (UPDATE: Jan Kulveit will tell us which is the correct one!)
Column meanings, e.g. "Median means the number of infected people", "Timestep corresponds to a day", ...
{area-type}/(unknown).tsv - median or maybe this difference?

The measures for each location can be found in this database: https://www.notion.so/977d5e5be0434bf996704ec361ad621d?v=aa8e0c75520a479ea48f56cb4c289b7e
Not yet implemented: parser from notion to pandas or csv
import os
import keyring
from notion.client import NotionClient

# Obtain the `token_v2` value by inspecting your browser cookies on a logged-in session on Notion.so
# I used keyring to save the token locally
token_v2 = keyring.get_password("notion", "token")
client = NotionClient(token_v2=token_v2)

cv = client.get_collection_view(
    "https://www.notion.so/977d5e5be0434bf996704ec361ad621d?v"
    "=aa8e0c75520a479ea48f56cb4c289b7e"
)

# here are the fields which interest us
# Country
# Description of measure implemented
# Keywords
# Source

# Run a filtered/sorted query using a view's default parameters
result = cv.default_query().execute()
for row in result:
    print(row.description_of_measure_implemented)
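The missing notion-to-pandas/csv step could then be a small transformation of the query result; the column names and row attributes below are assumptions based on the field list above:

```python
import pandas as pd

def rows_to_dataframe(rows) -> pd.DataFrame:
    # Each row is assumed to expose the Notion fields listed above
    # as notion-py style snake_case attributes.
    return pd.DataFrame([
        {
            "country": row.country,
            "measure": row.description_of_measure_implemented,
            "keywords": row.keywords,
            "source": row.source,
        }
        for row in rows
    ])

# e.g.: rows_to_dataframe(result).to_csv("containment_measures.csv", index=False)
```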
The containment measures could be in the sidebar with similar design as the previous one (one entry for each date, descending)
Current parameter configurations are listed in this Notion database.
This is currently done by hand, and the files are then manually imported to the server.
Ideally we'd find some kind of GLEaM API and be able to regularly run new simulations for all the different parameter configurations (as well as flexibly extending the number of configurations we want to run).
This needs further speccing out, but leaving this issue as a pointer
The negative infected numbers we saw in the line data do make sense, since the countries should not start at 0 infected but at the number we set them to in the simulation. Heavily pre-infected countries like Iran then quickly go negative. Not sure how to get this data in the fastest way, though - it is not part of the GV export nor the HDF5.
remove this: Disclaimer: All data used to produce this map are exclusively collected from publicly available sources including government reports and news media
Not sure if we want this one though, maybe it's easier this way than managing more images and deps.
See the todo in the Dockerfile. Currently we use the same image for both the server and bokeh and specify entrypoints everywhere.
@mati Roy I’d like to add a legend to the basic model visualisation. Should be doable in d3 quite easily.
Showing it to some users, their reaction was “I don’t know what all the colored lines mean”.
So I suggest we display some details about what the trajectories are. We’ll map the seasonality parameters {0.1, 0.7, 0.85} to {low seasonality, high seasonality, very high seasonality} and the air traffic parameters {0.2, 0.7} to {much reduced air traffic, slightly reduced air traffic}.
@jan Kulveit was keen not to include any simulation numbers that don’t make sense if you’re not familiar with gleam, so I hope changing from numbers to more vague verbal indications will work. But I expect Jan to have more opinions in the morning, and we’ll need some back and forth to figure out the right way of making the graphs more interpretable. Just treating this as a starting point for exploring options. (edited)
I'm fine with the verbal descriptions
I've checked with 2 users and you were right
- are the seasonality params really 0.1, 0.7, 0.85? that should not be the case; we should have "no seasonality | weak seasonality | medium seasonality" (it's not strong) for the param values (seasonality switched off or param setting 1; 0.85; 0.7)
for air traffic I'd call it "strong reduction in air travel|weak reduction in air travel"
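If the verbal labels go in, a simple lookup table would do; the parameter values and wording below follow the comments above but are assumptions until the final values are confirmed:

```python
# Assumed GLEAM parameter values -> user-facing labels (per the discussion above).
SEASONALITY_LABELS = {
    1.0: "no seasonality",   # seasonality effectively switched off
    0.85: "weak seasonality",
    0.7: "medium seasonality",
}
AIR_TRAFFIC_LABELS = {
    0.2: "strong reduction in air travel",
    0.7: "weak reduction in air travel",
}

def describe_run(seasonality: float, air_traffic: float) -> str:
    """Turn a pair of raw simulation parameters into a legend entry."""
    return f"{SEASONALITY_LABELS[seasonality]}, {AIR_TRAFFIC_LABELS[air_traffic]}"
```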
Instead of manually triggering the gleamviz computation, could we automate it? Design it and do some cards to implement it. Is it even possible with gleamviz?
related: #9
Currently, we don't handle autoscaling in any intelligent way.
The main node pool scales from 0-5 replicas, but the services themselves do not, and their replicas are set to 1.
Using horizontal pod autoscaler would probably help, even the default one:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
Currently the main page is the bubble map (called "Public information"). However, we want to de-emphasize that and focus on the core product which is the gleam-supported model.
When one clicks "Epidemic forecasting" one should be taken to the "Models" page, and the url epidemicforecasting.org should redirect to the "Models page"
Initially, this is likely a request for on-demand scenario simulation for the use of professionals / decision makers / health pros; the user would have the option to specify a more precise location, and we would initially run the required simulations by hand (source of description).
A user should be able to submit a form, which is somehow persisted (either via sending it to an email or to a DB).
This will apply to the content reachable via the navigation menu "Request Model" on the website.
So basically a form which would submit some stuff in our DB
Parameters (source of requirements):
If there are some typical use cases, we could automate it
Bonus points for knowing how to automate the recalculation of the model in the future.
Design and implement the deployment of the application. Likely the following: on a merge to master, deploy to the production environment; on a merge to the beta branch, deploy to the beta environment.
AC:
We currently have $300 and it seems that we burn maybe $50/day. But k8s will start to cost $72/month just for the node, therefore we may need to get more money...
Some rough calculation and maybe cluster optimization could be considered.
Related #35
From https://bl.ocks.org/d3noob/23e42c8f67210ac6c678db2cd07a747e
// text label for the x axis
svg.append("text")
    .attr("transform",
          "translate(" + (width / 2) + " ," + (height + margin.top + 20) + ")")
    .style("text-anchor", "middle")
    .text("Date");

// text label for the y axis
svg.append("text")
    .attr("transform", "rotate(-90)")
    .attr("y", 0 - margin.left)
    .attr("x", 0 - (height / 2))
    .attr("dy", "1em")
    .style("text-anchor", "middle")
    .text("Value");
Based on Jan Kulveit's description in slack (also this).
User should be able to select the following parameters:
area: a selection from countries and cities (not hemispheres, regions...)
reduction of transmission due to countermeasures: 0%, 25%, 50%, 75%

The user selects e.g. 25%, and then we display 4 (or up to 8) different (gleamviz-precomputed) timeseries with different additional gleamviz parameters (e.g. seasonality).
The motivation for multiple trajectories is to visualize the uncertainty (a tooltip "Why multiple lines?" with an explanation may be helpful).
Data will be provided by Jan.
Open questions:
x-axis: "timestep"; the start date is in the definition.xml (+ we should somewhere note the date of the model generation, e.g. in a tooltip)
median column for the given area

The GCS cache where the data files are stored has a default timeout of 1h - too much to do any experiments. It will also be a problem later in production.
Now, when you update a data file (preserving the name), the cache keeps serving the old data for 1h. Even if you delete the file in GCS, it is still served 🤷‍♂️
Individual files can get Cache-control: public, max-age=10, but this needs to be done on every upload (or we fall into the 1h trap again). Can we somehow fix this in a better way? @hnykda
Only if it's actually needed. We could reduce some load on the server if we hosted the static files (JS and CSS) elsewhere. That would improve responsiveness of the site (user-friendliness) and reduce compute and egress costs between the client and the server.
Solution:
gsutil cp -Z -a <somecacheheaders as in epifor> ...
- a step which would take the content of src/server/static and upload it to the bucket (for staging and production independently), ideally a two-way sync
- static_url to be served from GCS in deployments; maintain the file serving locally, which likely means dealing with url_for, maybe https://fastapi.tiangolo.com/tutorial/static-files/, and changing url_for from /static to something like https://storage.googleapis.com/static-covid/static/
We have ingress - SSL certs can be somewhat easily added to the ingress definition (see deploy/chart/templates...
).
Ingress doesn't allow http redirects though, so we may have to:
Currently not only master gets deployed to production, but every branch which has master in its name... So e.g. a branch called do-not-ever-push-to-master will make it straight into production.
Same is true for staging. (Because in the GitHub Actions steps I am using contains(github.ref, "master"), and not just github.ref == "master" or whatever should be there.)
AC
From @jacob (https://epidemicforecasting.slack.com/archives/CV0GGP151/p1584421050240600?thread_ts=1584420188.240200&cid=CV0GGP151)
For the form, make the countries appear in alphabetical order
Make the graph smaller so it fits the page (and the top of the “how the model works” section is in view)
[was already done] Make “Containment measures” heading smaller, also make the orange dates smaller
Include link to containment measures database at top of the sidebar and say that’s where data is sourced from: https://www.notion.so/977d5e5be0434bf996704ec361ad621d?v=aa8e0c75520a479ea48f56cb4c289b7e
[was already done] Vertically align all the items consistently to the left, e.g. “Different global mitigation strengths” with “Mitigation strength” heading, the form, the graph, the “How the model works” section. Currently they’re all misaligned
Display values on y-axis as percentages (by default they are per 1000 persons, so divide by 1000), and the label as “Active infections” (I think this is what it shows, since it goes down occasionally, but someone could correct me)
This takes off from #132 , which introduces this styling problem. When the font size is increased and the label "Date" is placed lower beneath the graph, it gets hidden behind some black content.
UI nr. 4 - just a world map with shading representing the intensity of infection; upon clicking a location, people would end up in UI nr. 1. The aim is to have simple navigation for the homepage.
I believe we want something like gmap or this
This list is generated by a js loop, but each element is in the following div
<div class="containment_measure">
<h3 class="num">{{ date }}</h3>
<div class="area">{{ text }} <a href="{{ source_link }}" target="_blank">Source</a></div>
</div>
These classes can be used for styling
The JS code in https://github.com/epidemics/covid/blob/master/src/server/static/js/lines.js#L65 explicitly tests for beta === "0.0"
etc., which breaks on any other (valid) floating-point encoding in the data (e.g. 0.00
, 0
). Can we make it more robust?
I suggest having a data_model package that will serve both the charts and the server, with the following API:
from dataclasses import dataclass
from typing import List, NamedTuple

import pandas as pd


class Location(NamedTuple):
    long: float
    lat: float


@dataclass
class Place:
    # The following can be properties or fields
    name: str
    type: str  # city / country / continent / world
    id: str
    location: Location  # (long, lat)
    population: int

    def get_foretold_io_data(self) -> pd.DataFrame:
        # Columns: date, number, pdf, cdf
        # Filter on origin DF
        ...

    def get_gleamviz_data(self) -> pd.DataFrame:
        # Columns: as required by the plots
        # Filter on origin DF
        ...

    def get_median_forecasts(self) -> pd.Series:
        # date -> float
        # from predict_io_data
        ...


class Data:
    def __init__(self, *files):
        # Initialize from the necessary files (or one file):
        # a) gleamviz output
        # b) foretold output (necessary for bubble chart)
        # c) locations (necessary for bubble chart)
        # d) populations (necessary for event risk prediction)
        # more?
        ...

    @property
    def selected_places(self) -> List[Place]:
        ...

    def __getitem__(self, name) -> Place:
        ...
This is a "nice-to-have" with lower priority
Example
<link href="../static/css/main.css" rel="stylesheet" />
would become
<link href="{{ url_for('static', path='/css/main.css') }}" rel="stylesheet" />
If you want the url of a view you can simply use url_for(<name of the view>)
Currently, pushing to staging and production can be done independently (e.g. you can deploy something to production without first testing on staging). Is it OK?
Maybe if we removed the "build and push" step from the production pipeline, we would know that it can be deployed without first having it in staging.
Check with devs.
I would propose CC 0 for the data and AGPL-3 for the code