indicatore_zona_gialla's Introduction


This README covers a few of the topics addressed in this repo. A more in-depth explanation of results and methodology, as well as the tech stack, is offered in the project documentation (currently under major revision).

Description

(Back to top)

The project is articulated into 3 parts:

  • set up a new KPI, the Indicatore di Stress Regionale
  • build an ETL (Extraction, Transformation & Loading) pipeline that sources and merges 3 data sources, then computes the Indicatore di Stress
  • visualize & design a frontend with [DataWrapper](https://www.datawrapper.de/) based on the aggregated data

The newly created Indicatore di Stress measures the overall stress sustained by the SSN (Servizio Sanitario Nazionale, i.e. the Italian NHS) as a combination of the Vaccination and Incidence effects computed per region. Both dimensions can reflect hospital saturation and define the criteria according to which regions are labelled as yellow, orange or red zones (Zona Gialla, Zona Arancione and Zona Rossa), in compliance with the most recent orders (D.L. 52/2021). This should help decision makers gain a clear understanding of the current situation and an updated perspective on future scenarios.

The ETL pipeline joins and cleans up 3 data sources:

Data is extracted, cleaned and aggregated every 6 hours. At the end of the pipeline the output file is written in .csv format to the ./data folder. Methodology and software choices are outlined in the documentation, which is currently under major revision.
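As a rough illustration, the extract–transform–load steps could be sketched in R as follows. The column names, the stress formula and the data below are placeholders, not the project's actual ones; the real pipeline reads the 3 remote CSV sources and writes to ./data:

```r
library(dplyr)

# Toy stand-ins for the remote sources (illustrative columns only)
casi    <- data.frame(denominazione_regione = c("Lazio", "Veneto"),
                      incidenza = c(50, 80))
vaccini <- data.frame(denominazione_regione = c("Lazio", "Veneto"),
                      copertura = c(0.8, 0.7))

# Transform: join on the region key and derive a placeholder score
output <- casi %>%
  inner_join(vaccini, by = "denominazione_regione") %>%
  mutate(stress = incidenza * (1 - copertura))  # NOT the real formula

# Load: the real pipeline writes to ./data; a temp file is used here
write.csv(output, file.path(tempdir(), "indicatore_stress.csv"),
          row.names = FALSE)
```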

etl pipeline

Visualizations (barcharts, arrowplots, custom tables and more) are made with Datawrapper whose data source points to the urls at ./data/data-graph.

visualization

Installation

(Back to top)

You might have noticed the Back to top button (if not, please notice, it's right there above!). It makes this README easy to navigate. If you want to install this project on your machine, the recommended way is to clone it from GitHub:

git clone https://github.com/Data-Network-Lab/indicatore_zona_gialla.git

Then, once you have cloned this repo on your machine, run:

if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv")
}

renv::restore()

Author disclaimer: contrary to what many believe, renv is not a panacea for reproducibility. It does, however, make reproducible projects easier: it records the versions of R and of the R packages used in a project, and provides tools to reinstall those packages at their declared versions.

Usage

(Back to top)

The whole repository, both the data and the cleaning process, can be used under the terms of the license.

Contribute and Conduct

(Back to top)

Please note that the indicatore_zona_gialla project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Sponsors

(Back to top)

ko-fi

The project is sponsored by ALTEMS (Alta Scuola di Economia e Management dei Sistemi Sanitari), but if you too would like to support our open-source work, tap the button above or simply donate at the link in the upper right-hand side of this repo (i.e. DN donations). If you are interested in being contacted for future work or collaborations, please reach out at [email protected] with any enquiry or question.

License

(Back to top)

Please visit the LICENSE.md file.


indicatore_zona_gialla's People

Contributors

actions-user, niccolosalvini, vincnardelli


indicatore_zona_gialla's Issues

[enhancement] add renv

Add renv to project

Contributors: @NiccoloSalvini

Problem statement

Reproducibility of R project dependencies across different machines and environments (open-source contributions).

Description of proposed solution

As part of the R dependency management initiative, the {renv} package was created to provide project-specific R dependency management. This package should be a robust, stable replacement for Packrat, with fewer surprises and better default behaviors, according to the project's creator.

Existing workflows should continue to work as they did before. renv helps manage library paths (and other project-specific state) to help isolate your project's R dependencies.

Detailed description of design and implementation of proposed solution

The general workflow when working with renv is:

  1. Call renv::init() to initialize a new project-local environment with a private R library.
  2. Work in the project as normal, installing and removing R packages as they are needed.
  3. Call renv::snapshot() to save the state of the project library to the lockfile (called renv.lock).
  4. Continue working on the project, installing and updating R packages as needed.
  5. Call renv::snapshot() again to save the state of the project library if the updates were successful, or call renv::restore() to revert to the previous state encoded in the lockfile if the updates introduced problems.
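The steps above condense into a short console session (the calls are shown for reference, not meant to be run blindly, since they modify the project library; the dplyr install is just an example):

```r
renv::init()              # once: create project-local library + renv.lock
install.packages("dplyr") # work as usual, adding packages as needed
renv::snapshot()          # record current package versions in renv.lock
renv::restore()           # on another machine, or to roll back:
                          # reinstall the versions locked in renv.lock
```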

[BUG] Different updating pace across data sources

Expected Behavior

Expecting tabella_semplice.csv (and each of the other files in \graph-data\*) to contain 22 rows: one per region/autonomous province (21 territorial units) plus a marginal Italia row.

Current Behavior

The regions count is lower than expected: either 20 or 18. This happens for tabella semplice, as well as for all the other outputs in \graph-data\*, because url_vaccini updates regional data with a 1-day lag, and a few regions (e.g. PA Bolzano and Valle d'Aosta) with a 2-day lag. As a result it takes up to 2 days to collect data from all the regions. Since the algorithm takes the last 22 rows, during the day some regions can be missing while a few others are recycled from the day before.

Possible Solution

A couple of solutions:

  • Set a daily release hour for the data (e.g. 8 pm), after the last region has updated. Data would then be published only when complete, but this limits usability, since before 8 pm no data would be available.
  • Recycle regional vaccini data from the last available date, since vaccination coverage is not expected to change drastically from one day to the next.
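The second option can be sketched with dplyr: for each region, keep the most recent available row, so that lagging regions fall back to the previous day. The data below is a toy example, not the real vaccini feed:

```r
library(dplyr)

# Toy data: Valle d'Aosta has not yet published the 2021-09-13 figures
vaccini <- data.frame(
  denominazione_regione = c("Lazio", "Valle d'Aosta", "Lazio"),
  data = as.Date(c("2021-09-12", "2021-09-12", "2021-09-13")),
  dosi = c(100, 10, 120)
)

# Recycle: keep the latest available row per region
latest <- vaccini %>%
  group_by(denominazione_regione) %>%
  slice_max(data, n = 1, with_ties = FALSE) %>%
  ungroup()
# Lazio keeps 2021-09-13; Valle d'Aosta falls back to 2021-09-12
```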

Steps to Reproduce

library(reprex)
#> Warning: package 'reprex' was built under R version 4.0.5
library(readr)
#> Warning: package 'readr' was built under R version 4.0.5
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.0.5
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

output = read_csv("https://raw.githubusercontent.com/Data-Network-Lab/indicatore_zona_gialla/main/data/indicatore_stress.csv")
#> Rows: 5645 Columns: 31
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr   (1): denominazione_regione
#> dbl  (29): totale_casi, terapia_intensiva, ricoverati_con_sintomi, totale_ca...
#> date  (1): data
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.

output %>% tail(22) %>%  count(denominazione_regione) 
#> # A tibble: 20 x 2
#>    denominazione_regione     n
#>    <chr>                 <int>
#>  1 Abruzzo                   1
#>  2 Basilicata                1
#>  3 Calabria                  1
#>  4 Campania                  1
#>  5 Emilia-Romagna            1
#>  6 Friuli Venezia Giulia     1
#>  7 Italia                    2
#>  8 Lazio                     1
#>  9 Liguria                   1
#> 10 Lombardia                 1
#> 11 Marche                    1
#> 12 Molise                    1
#> 13 P.A. Trento               1
#> 14 Piemonte                  1
#> 15 Puglia                    1
#> 16 Sardegna                  1
#> 17 Sicilia                   1
#> 18 Toscana                   1
#> 19 Umbria                    1
#> 20 Veneto                    2

output %>% filter(data== today()-2) %>%  count(denominazione_regione)
#> # A tibble: 22 x 2
#>    denominazione_regione     n
#>    <chr>                 <int>
#>  1 Abruzzo                   1
#>  2 Basilicata                1
#>  3 Calabria                  1
#>  4 Campania                  1
#>  5 Emilia-Romagna            1
#>  6 Friuli Venezia Giulia     1
#>  7 Italia                    1
#>  8 Lazio                     1
#>  9 Liguria                   1
#> 10 Lombardia                 1
#> # ... with 12 more rows

output %>% filter(data== today()-3) %>%  count(denominazione_regione)
#> # A tibble: 22 x 2
#>    denominazione_regione     n
#>    <chr>                 <int>
#>  1 Abruzzo                   1
#>  2 Basilicata                1
#>  3 Calabria                  1
#>  4 Campania                  1
#>  5 Emilia-Romagna            1
#>  6 Friuli Venezia Giulia     1
#>  7 Italia                    1
#>  8 Lazio                     1
#>  9 Liguria                   1
#> 10 Lombardia                 1
#> # ... with 12 more rows


anti_join(output %>% filter(data== today()-2) %>%  count(denominazione_regione),
          output %>% tail(22) %>%  count(denominazione_regione), 
          by = "denominazione_regione")
#> # A tibble: 2 x 2
#>   denominazione_regione     n
#>   <chr>                 <int>
#> 1 P.A. Bolzano              1
#> 2 Valle d'Aosta             1

Created on 2021-09-13 by the reprex package (v2.0.1)

[enhancement] introduce httr2

Introduce httr2

httr2 instead of httr

Motivation

  • You can now create and modify a request without performing it. This means that there’s now a single function to perform the request and fetch the result: req_perform(). (If you want to handle the response as it streams in, use req_stream() instead). req_perform() replaces httr::GET(), httr::POST(), httr::DELETE(), and more.

  • HTTP errors are automatically converted into R errors. Use req_error() to override the defaults (which turn all 4xx and 5xx responses into errors) or to add additional details to the error message.

  • You can automatically retry if the request fails or encounters a transient HTTP error (e.g. a 429 rate limit request). req_retry() defines the maximum number of retries, which errors are transient, and how long to wait between tries.

  • OAuth support has been totally overhauled to directly support many more flows and to make it much easier to both customise the built-in flows and to create your own.

  • You can manage secrets (often needed for testing) with secret_encrypt() and friends. You can obfuscate mildly confidential data with obfuscate(), preventing it from being scraped from published code.

  • You can automatically cache all cacheable results with req_cache(). Relatively few API responses are cacheable, but when they are it typically makes a big difference.
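A minimal taste of the request-building style, combining the retry and cache features listed above (the URL is a placeholder; the real pipeline would point at the project's data sources):

```r
library(httr2)

# Build the request without performing it
req <- request("https://example.org/api/regioni") |>
  req_retry(max_tries = 3) |>  # retry transient errors such as 429
  req_cache(tempdir())         # cache cacheable responses on disk

# Performing is a separate, explicit step:
# resp <- req_perform(req)
```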
