indicatore_zona_gialla's Introduction


This README covers a few of the topics addressed in this repo. A more in-depth explanation of results and methodology, as well as the tech stack, is offered in the project documentation (currently under major revision).

Description

(Back to top)

The project is articulated into 3 parts:

  • set up a new KPI, the Indicatore di Stress Regionale
  • build an ETL (Extraction, Transformation & Loading) pipeline that sources and merges 3 data sources, then computes the Indicatore di Stress
  • visualize & design a frontend with [DataWrapper](https://www.datawrapper.de/) based on the aggregated data

The newly created Indicatore di Stress measures the overall stress sustained by the SSN (Servizio Sanitario Nazionale, i.e. the Italian NHS) as a combination of the Vaccination and Incidence effects computed per region. Both dimensions can reflect hospital saturation and define the criteria according to which regions are labelled as yellow, orange or red zones (Zona Gialla, Zona Arancione and Zona Rossa), in compliance with the most recent orders (D.L. 52/2021). This should help decision makers gain a clear understanding of the current situation and an updated perspective on future scenarios.

The ETL pipeline joins and cleans up 3 data sources:

Data is extracted, cleaned and aggregated every 6 hours. At the end of the pipeline the output file is written in .csv format to the ./data folder. Methodology and software choices are outlined in the documentation, which is currently under major revision.
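As a rough illustration, the extract–transform–load steps could be sketched in R as follows. The column names, the stress formula and the data below are placeholders, not the project's actual ones; the real pipeline reads the 3 remote CSV sources and writes to ./data:

```r
library(dplyr)

# Toy stand-ins for the remote sources (illustrative columns only)
casi    <- data.frame(denominazione_regione = c("Lazio", "Veneto"),
                      incidenza = c(50, 80))
vaccini <- data.frame(denominazione_regione = c("Lazio", "Veneto"),
                      copertura = c(0.8, 0.7))

# Transform: join on the region key and derive a placeholder score
output <- casi %>%
  inner_join(vaccini, by = "denominazione_regione") %>%
  mutate(stress = incidenza * (1 - copertura))  # NOT the real formula

# Load: the real pipeline writes to ./data; a temp file is used here
write.csv(output, file.path(tempdir(), "indicatore_stress.csv"),
          row.names = FALSE)
```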

etl pipeline

Visualizations (barcharts, arrowplots, custom tables and more) are made with Datawrapper whose data source points to the urls at ./data/data-graph.

visualization

Installation

(Back to top)

You might have noticed the Back to top button (if not, please notice, it's right there above!). It makes this README easy to navigate. If you want to install this project on your machine, the recommended way is to clone it from GitHub:

git clone https://github.com/Data-Network-Lab/indicatore_zona_gialla.git

Then, once you have cloned this repo on your machine, run:

if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv")
}

renv::restore()

Author disclaimer: contrary to what many believe, renv is not a panacea for reproducibility. It does, however, make reproducible projects easier: it records the versions of R and of the R packages used in a project, and provides tools to reinstall those packages at their declared versions.

Usage

(Back to top)

The whole repository, both the data and the cleaning process, can be used under the terms of the license.

Contribute and Conduct

(Back to top)

Please note that the indicatore_zona_gialla project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Sponsors

(Back to top)

ko-fi

The project is sponsored by ALTEMS (Alta Scuola di Economia e Management dei Sistemi Sanitari), but if you too would like to support our open-source work, tap the button above or simply donate at the link in the upper right-hand side of this repo (i.e. DN donations). If you are interested in being contacted for future work or collaborations, please reach out at [email protected] with any enquiry or question.

License

(Back to top)

Please visit the LICENSE.md file.


indicatore_zona_gialla's People

Contributors

actions-user, niccolosalvini, vincnardelli


indicatore_zona_gialla's Issues

[enhancement] add renv

Add renv to project

Contributors: @NiccoloSalvini

Problem statement

Reproducibility of R project dependencies across different machines and environments (open-source contributions).

Description of proposed solution

As part of the R dependency management initiative, the {renv} package was created to provide project-specific R dependency management. This package should be a robust, stable replacement for Packrat, with fewer surprises and better default behaviors, according to the project's creator.

Existing workflows should continue to work as they did before. renv helps manage library paths (and other project-specific state) to help isolate your project's R dependencies.

Detailed description of design and implementation of proposed solution

The general workflow when working with renv is:

  1. Call renv::init() to initialize a new project-local environment with a private R library.
  2. Work in the project as normal, installing and removing R packages as they are needed.
  3. Call renv::snapshot() to save the state of the project library to the lockfile (called renv.lock).
  4. Continue working on the project, installing and updating R packages as needed.
  5. Call renv::snapshot() again to save the state of the project library if the updates were successful, or call renv::restore() to revert to the previous state encoded in the lockfile if the updates introduced problems.
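The steps above condense into a short console session (the calls are shown for reference, not meant to be run blindly, since they modify the project library; the dplyr install is just an example):

```r
renv::init()              # once: create project-local library + renv.lock
install.packages("dplyr") # work as usual, adding packages as needed
renv::snapshot()          # record current package versions in renv.lock
renv::restore()           # on another machine, or to roll back:
                          # reinstall the versions locked in renv.lock
```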

[BUG] Different updating pace across data sources

Expected Behavior

Expecting tabella_semplice.csv (and each of the other files in \graph-data\*) to contain 22 rows: one per region/autonomous province (21 territorial units) plus a marginal Italia row.

Current Behavior

The regions count is lower than expected: either 20 or 18. This happens for tabella semplice, as well as for all the other outputs in \graph-data\*, because url_vaccini updates regional data with a 1-day lag, and a few regions (e.g. PA Bolzano and Valle d'Aosta) with a 2-day lag. As a result it takes up to 2 days to collect data from all the regions. Since the algorithm takes the last 22 rows, during the day some regions can be missing while a few others are recycled from the day before.

Possible Solution

A couple of solutions:

  • Set a daily release hour for the data (e.g. 8 pm), after the last region has updated. Data would then be published only when complete, but this limits usability, since before 8 pm no data would be available.
  • Recycle regional vaccini data from the last available date, since vaccination coverage is not expected to change drastically from one day to the next.
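The second option can be sketched with dplyr: for each region, keep the most recent available row, so that lagging regions fall back to the previous day. The data below is a toy example, not the real vaccini feed:

```r
library(dplyr)

# Toy data: Valle d'Aosta has not yet published the 2021-09-13 figures
vaccini <- data.frame(
  denominazione_regione = c("Lazio", "Valle d'Aosta", "Lazio"),
  data = as.Date(c("2021-09-12", "2021-09-12", "2021-09-13")),
  dosi = c(100, 10, 120)
)

# Recycle: keep the latest available row per region
latest <- vaccini %>%
  group_by(denominazione_regione) %>%
  slice_max(data, n = 1, with_ties = FALSE) %>%
  ungroup()
# Lazio keeps 2021-09-13; Valle d'Aosta falls back to 2021-09-12
```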

Steps to Reproduce

library(reprex)
#> Warning: package 'reprex' was built under R version 4.0.5
library(readr)
#> Warning: package 'readr' was built under R version 4.0.5
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.0.5
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

output = read_csv("https://raw.githubusercontent.com/Data-Network-Lab/indicatore_zona_gialla/main/data/indicatore_stress.csv")
#> Rows: 5645 Columns: 31
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr   (1): denominazione_regione
#> dbl  (29): totale_casi, terapia_intensiva, ricoverati_con_sintomi, totale_ca...
#> date  (1): data
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.

output %>% tail(22) %>%  count(denominazione_regione) 
#> # A tibble: 20 x 2
#>    denominazione_regione     n
#>    <chr>                 <int>
#>  1 Abruzzo                   1
#>  2 Basilicata                1
#>  3 Calabria                  1
#>  4 Campania                  1
#>  5 Emilia-Romagna            1
#>  6 Friuli Venezia Giulia     1
#>  7 Italia                    2
#>  8 Lazio                     1
#>  9 Liguria                   1
#> 10 Lombardia                 1
#> 11 Marche                    1
#> 12 Molise                    1
#> 13 P.A. Trento               1
#> 14 Piemonte                  1
#> 15 Puglia                    1
#> 16 Sardegna                  1
#> 17 Sicilia                   1
#> 18 Toscana                   1
#> 19 Umbria                    1
#> 20 Veneto                    2

output %>% filter(data== today()-2) %>%  count(denominazione_regione)
#> # A tibble: 22 x 2
#>    denominazione_regione     n
#>    <chr>                 <int>
#>  1 Abruzzo                   1
#>  2 Basilicata                1
#>  3 Calabria                  1
#>  4 Campania                  1
#>  5 Emilia-Romagna            1
#>  6 Friuli Venezia Giulia     1
#>  7 Italia                    1
#>  8 Lazio                     1
#>  9 Liguria                   1
#> 10 Lombardia                 1
#> # ... with 12 more rows

output %>% filter(data== today()-3) %>%  count(denominazione_regione)
#> # A tibble: 22 x 2
#>    denominazione_regione     n
#>    <chr>                 <int>
#>  1 Abruzzo                   1
#>  2 Basilicata                1
#>  3 Calabria                  1
#>  4 Campania                  1
#>  5 Emilia-Romagna            1
#>  6 Friuli Venezia Giulia     1
#>  7 Italia                    1
#>  8 Lazio                     1
#>  9 Liguria                   1
#> 10 Lombardia                 1
#> # ... with 12 more rows


anti_join(output %>% filter(data== today()-2) %>%  count(denominazione_regione),
          output %>% tail(22) %>%  count(denominazione_regione), 
          by = "denominazione_regione")
#> # A tibble: 2 x 2
#>   denominazione_regione     n
#>   <chr>                 <int>
#> 1 P.A. Bolzano              1
#> 2 Valle d'Aosta             1

Created on 2021-09-13 by the reprex package (v2.0.1)

[enhancement] introduce httr2

Introduce httr2

httr2 instead of httr

Motivation

  • You can now create and modify a request without performing it. This means that there’s now a single function to perform the request and fetch the result: req_perform(). (If you want to handle the response as it streams in, use req_stream() instead). req_perform() replaces httr::GET(), httr::POST(), httr::DELETE(), and more.

  • HTTP errors are automatically converted into R errors. Use req_error() to override the defaults (which turn all 4xx and 5xx responses into errors) or to add additional details to the error message.

  • You can automatically retry if the request fails or encounters a transient HTTP error (e.g. a 429 rate limit request). req_retry() defines the maximum number of retries, which errors are transient, and how long to wait between tries.

  • OAuth support has been totally overhauled to directly support many more flows and to make it much easier to both customise the built-in flows and to create your own.

  • You can manage secrets (often needed for testing) with secret_encrypt() and friends. You can obfuscate mildly confidential data with obfuscate(), preventing it from being scraped from published code.

  • You can automatically cache all cacheable results with req_cache(). Relatively few API responses are cacheable, but when they are it typically makes a big difference.
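A minimal taste of the request-building style, combining the retry and cache features listed above (the URL is a placeholder; the real pipeline would point at the project's data sources):

```r
library(httr2)

# Build the request without performing it
req <- request("https://example.org/api/regioni") |>
  req_retry(max_tries = 3) |>  # retry transient errors such as 429
  req_cache(tempdir())         # cache cacheable responses on disk

# Performing is a separate, explicit step:
# resp <- req_perform(req)
```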
