gpw13 / billionaire

Calculate the WHO Triple Billions

Home Page: https://gpw13.github.io/billionaiRe/

License: GNU General Public License v3.0

R 100.00%

billionaire's Introduction

billionaiRe

The goal of billionaiRe is to provide an easy interface for using long format data to calculate the World Health Organization’s Triple Billions.

Installation

You can install billionaiRe from GitHub with:

remotes::install_github("gpw13/billionaiRe", build_vignettes = TRUE)

You will need to have already installed the wppdistro and whdh packages, which are stored in private repositories and made available only upon request to valid WHO users. Please contact [email protected] to request access.
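
Before installing, it can help to confirm that the two private dependencies are already available. The following is a minimal sketch using only base R; the variable names are illustrative and not part of billionaiRe.

```r
# Private dependencies named in the installation note above.
required_pkgs <- c("wppdistro", "whdh")

# requireNamespace() returns FALSE (rather than erroring) when a package is absent.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Request access and install first: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All private dependencies found; billionaiRe can be installed.")
}
```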

Calculations

The package is built around a set of functions that calculate each of the three Billions separately:

Healthier Populations (HPOP)
Health Emergencies Protection (HEP)
Universal Health Coverage (UHC)

HPOP Billion calculation

To calculate the HPOP Billion, the billionaiRe package provides a series of functions:

transform_hpop_data() to transform raw values into normalized values used within the calculations.
add_hpop_populations() to get relevant population groups for each country and indicator.
calculate_hpop_contributions() to calculate indicator level changes and contributions to the Billion.
calculate_hpop_billion() to calculate indicator level changes, country-level Billion, adjusting for double counting, and all contributions.

Run in sequence, these can calculate the entire HPOP Billion, or they can be run separately to produce different outputs as required. Details on the inputs of each function are available in their individual documentation, but below you can see the quick and easy Billions calculation done using the sample fake HPOP data provided in the package, hpop_df.

library(billionaiRe)

hpop_df %>%
  transform_hpop_data() %>%
  add_hpop_populations() %>%
  calculate_hpop_billion() %>%
  dplyr::filter(stringr::str_detect(ind, "hpop_healthier"))
#> # A tibble: 6 × 10
#>   iso3   year ind            value type  trans…¹ popul…² contr…³ contr…⁴ contr…⁵
#>   <chr> <dbl> <chr>          <dbl> <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1 AFG    2023 hpop_healthie…    NA <NA>       NA  4.45e7  2.72e7    61.2      NA
#> 2 AFG    2023 hpop_healthie…    NA <NA>       NA  4.45e7 -3.84e7   -86.3      NA
#> 3 AFG    2023 hpop_healthier    NA <NA>       NA  4.45e7 -1.12e7   -25.1      NA
#> 4 AFG    2023 hpop_healthie…    NA <NA>       NA  4.45e7  3.20e7    71.9      NA
#> 5 AFG    2023 hpop_healthie…    NA <NA>       NA  4.45e7 -7.46e7  -168.       NA
#> 6 AFG    2023 hpop_healthie…    NA <NA>       NA  4.45e7 -4.26e7   -95.7      NA
#> # … with abbreviated variable names ¹transform_value, ²population,
#> #   ³contribution, ⁴contribution_percent, ⁵contribution_percent_total_pop
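
The printed contribution_percent values are consistent with contribution expressed as a percentage of population. This relationship is inferred from the output above rather than taken from the package documentation, so treat it as an assumption. The short check below reproduces it from the rounded figures as displayed.

```r
# Figures copied from the printed tibble (already rounded to three significant figures).
population   <- 4.45e7
contribution <- c(2.72e7, -3.84e7, -1.12e7, 3.20e7, -7.46e7, -4.26e7)
displayed    <- c(61.2, -86.3, -25.1, 71.9, -168, -95.7)

# Assumed relationship: contribution as a percentage of the population.
recomputed <- contribution / population * 100

# Side-by-side comparison; differences are small relative to the display rounding.
data.frame(
  displayed  = displayed,
  recomputed = round(recomputed, 1),
  difference = round(recomputed - displayed, 2)
)
```

Values beyond 100% in absolute terms appear because hpop_df is fake sample data.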

UHC Billion calculation

To calculate the UHC Billion, the billionaiRe package provides a series of functions:

transform_uhc_data() to transform raw values into normalized values used within the calculations.
calculate_uhc_billion() to calculate average service coverage, financial hardship, and the UHC single measure for each country and year in the data frame.
calculate_uhc_contribution() to calculate country-level Billion for specified beginning and end year.

Run in sequence, these can calculate the entire UHC Billion, or they can be run separately to produce different outputs as required. Details on the inputs of each function are available in their individual documentation, but below you can see the quick and easy Billions calculation done using the sample fake UHC data provided in the package, uhc_df.

library(billionaiRe)

uhc_df %>%
  transform_uhc_data(end_year = 2023) %>%
  calculate_uhc_billion() %>%
  calculate_uhc_contribution(end_year = 2023, pop_year = 2023) %>% 
  dplyr::filter(ind %in% c("uhc_sm", "asc", "fh"),
                year == 2023)
#> # A tibble: 3 × 9
#>   iso3   year ind    value type      transform_value source      contr…¹ contr…²
#>   <chr> <dbl> <chr>  <dbl> <chr>               <dbl> <chr>         <dbl>   <dbl>
#> 1 AFG    2023 fh      25.4 Projected            74.6 <NA>        -3.00e6  -7.11 
#> 2 AFG    2023 asc     45.3 projected            45.3 WHO DDI ca…  1.72e6   4.06 
#> 3 AFG    2023 uhc_sm  33.8 projected            33.8 WHO DDI ca…  4.41e4   0.104
#> # … with abbreviated variable names ¹contribution, ²contribution_percent
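
In the output above, the fh row has a transform_value of 74.6 for a value of 25.4, which is consistent with inverting the percentage so that higher is better, while asc and uhc_sm pass through unchanged. The sketch below reproduces that arithmetic only; invert_percent() is a hypothetical helper, not a billionaiRe function, and the rule that only fh is inverted is an assumption read off the printed rows.

```r
# Hypothetical helper: flip a 0-100 percentage so that higher values are better.
invert_percent <- function(x) 100 - x

# Values copied from the printed tibble.
uhc_rows <- data.frame(
  ind   = c("fh", "asc", "uhc_sm"),
  value = c(25.4, 45.3, 33.8)
)

# Assumption inferred from the printed output: only fh is inverted.
uhc_rows$transform_value <- ifelse(
  uhc_rows$ind == "fh",
  invert_percent(uhc_rows$value),
  uhc_rows$value
)
uhc_rows
```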

HEP Billion calculation

To calculate the HEP Billion, the billionaiRe package provides a series of functions:

transform_hep_data() to transform raw values into normalized values used within the calculations. For now, this is primarily calculating the total prevent numerators and denominators for campaign and routine data.
calculate_hep_components() to calculate component indicators (Prevent coverages), the HEP index, and levels for all components.
calculate_hep_billion() to calculate the change for the three HEP components (DNR, Prepare, and Prevent), their contribution to the Billion, and overall HEPI change and contribution.

Run in sequence, these can calculate the entire HEP Billion, or they can be run separately to produce different outputs as required. Details on the inputs of each function are available in their individual documentation, but below you can see the quick and easy Billions calculation done using the sample fake HEP data provided in the package, hep_df.

library(billionaiRe)

hep_df %>%
  transform_hep_data() %>%
  calculate_hep_components() %>%
  calculate_hep_billion(end_year = 2023) %>%
  dplyr::filter(ind %in% c("prevent",
                           "espar",
                           "detect_respond",
                           "hep_idx"),
                year == 2023)
#> # A tibble: 4 × 12
#>   iso3   year ind       value type  source trans…¹ use_d…² use_c…³ level contr…⁴
#>   <chr> <dbl> <chr>     <dbl> <chr> <chr>    <dbl> <lgl>   <lgl>   <dbl>   <dbl>
#> 1 AFG    2023 espar      51.2 Proj… <NA>      51.2 NA      NA          3  5.00e6
#> 2 AFG    2023 detect_r…  91   Proj… <NA>      91   NA      NA          5  2.23e6
#> 3 AFG    2023 prevent    NA   proj… Unite…   100   NA      NA          5  0     
#> 4 AFG    2023 hep_idx    NA   proj… WHO D…    80.7 NA      NA          4  7.22e6
#> # … with 1 more variable: contribution_percent <dbl>, and abbreviated variable
#> #   names ¹transform_value, ²use_dash, ³use_calc, ⁴contribution

Scenarios

In the context of the Triple Billions and the billionaiRe package, scenarios are understood as alternative, plausible descriptions of how the future may develop based on a set of defined assumptions.

Scenarios must:

Be tidy: each row is a unique combination of iso3 country-code, year, indicator, and scenario (if relevant).
Contain all and only the data that are strictly needed for calculations with billionaiRe.
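
The tidiness requirement can be checked before passing data to the package. The following is a minimal base R sketch under the assumption that the key columns are named iso3, year, ind, and scenario; the toy data and variable names are illustrative.

```r
# Toy scenario data; the last row deliberately repeats an earlier key.
scenario_df <- data.frame(
  iso3     = c("AFG", "AFG", "AFG", "AFG"),
  year     = c(2022, 2023, 2023, 2023),
  ind      = c("pm25", "pm25", "pm25", "pm25"),
  scenario = c("default", "default", "sdg", "sdg"),
  value    = c(60, 61, 58, 57)
)

key_cols <- c("iso3", "year", "ind", "scenario")

# duplicated() on a data frame flags rows whose key has already appeared.
dup_rows <- scenario_df[duplicated(scenario_df[key_cols]), ]
is_tidy  <- nrow(dup_rows) == 0

if (!is_tidy) {
  message("Duplicated keys found in ", nrow(dup_rows), " row(s).")
}
```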

Four main sets of scenarios can be identified, from the most basic to the more complex:

Basic scenarios: These scenarios are the building blocks of the other scenarios, but they can also be called on their own. They can reach a fixed target at a specified year (scenario_fixed_target()), follow a specific rate of change (scenario_aroc()), etc. See Basic scenario for more details.
Target scenarios: Target-based scenarios apply indicator-specific targets or trajectories to be reached by a specific date, e.g. Sustainable Development Goal (SDG) targets (called sdg scenarios).
Benchmarking scenarios: Benchmarking scenarios compare the performance of countries within various groupings and aim at the level of a specified subset of best-performing countries.
Mixed scenarios: Developed with indicator-level subject matter knowledge to pick realistic improvement scenarios (e.g. acceleration scenarios).
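
To make the idea of a basic scenario concrete, here is a sketch of a path that reaches a fixed target in a given year. It assumes straight-line interpolation from the starting value, which may differ from what scenario_fixed_target() actually does; linear_to_target() and its arguments are hypothetical.

```r
# Hypothetical helper: straight-line path from a starting value to a target value.
linear_to_target <- function(start_value, start_year, target_value, target_year) {
  years <- start_year:target_year
  data.frame(
    year  = years,
    # seq() with length.out spaces the values evenly across the years.
    value = seq(start_value, target_value, length.out = length(years))
  )
}

path <- linear_to_target(
  start_value = 71, start_year = 2021,
  target_value = 61, target_year = 2025
)
path
```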

If billionaiRe requires data that are missing from a scenario, they are recycled from other scenarios (see Data recycling).

See Scenarios vignette for more details.

Quick start on billionaiRe scenarios

add_scenario() is the entry point to all other scenario functions. It takes a typical billionaiRe data frame (df) and applies the selected scenario function to it.

library(billionaiRe)

df <- tibble::tibble(
  value = 60:80,
  year = 2010:2030,
  ind = "pm25",
  type = "reported",
  iso3 = "AFG",
  scenario = "default",
  source = NA_character_
) %>%
  dplyr::mutate(
    scenario = dplyr::case_when(
      year > 2021 ~ "historical",
      TRUE ~ scenario
    ),
    type = dplyr::case_when(
      year > 2021 ~ "projected",
      TRUE ~ type
    )
  )

The choice of scenario function to apply to the df is done through the scenario_function parameter. Additional parameters can be passed through the ellipsis (...).

For instance, to halt the rise to the 2010 value by end_year (2025 by default), we can apply a simple function to df. This will apply the halt_rise function to every unique combination of country and indicator. In this case, there is just one combination:

df %>%
  add_scenario(
    scenario_function = "halt_rise",
    baseline_year = 2010
  )

To apply the SDG targets, we use the sdg scenario function:

df %>%
  add_scenario(
    scenario_function = "sdg"
  )

By default, the scenarios start from the last reported or estimated value in the default scenario. This can be bypassed by setting start_scenario_last_default to FALSE. The scenario will then start at start_year (2018 by default):

df %>%
  add_scenario(
    scenario_function = "sdg",
    start_scenario_last_default = FALSE
  )


billionaire's Issues

trim_values trims even when scenario is doing worse than values

When values are better than col, values should be kept, but are not.

Speed up add_hpop_populations()

add_hpop_populations is rather slow, and given its frequent use, it would be worth improving its performance.

calculate_hep_billion does not return levels for hep_idx and espar

hep_idx and espar should have levels following methods report pages 50 and 37 respectively.

Think through Niger and Somalia SDI ratio values

Given their extremely low SDIs, Niger and Somalia have currently been fudged to have the same SDI ratio as Chad. Need to think through if this is best or an alternative way to set the SDI for them.

Move UHC 'asc' to 'prop_acc_ess_serv' adjustment to billionaiRe

Move UHC 'asc' to 'prop_acc_ess_serv' adjustment to billionaiRe rather than doing as a post hoc adjustment

Untransform functions

Add functions to untransform Billions data.

`value_col` and `transform_value_col` fully support only one value

value_col and transform_value_col fully support only one value. When more values are passed (e.g. "value", "lower", "upper"), support is not guaranteed across all functions. This functionality should be implemented throughout the package's functions.

This would also mean reviewing some assert_ assumptions on data quality (assert_data_calculation_hep(), etc.) and presence (e.g. assert_iso3_not_empty(), xmart_col_types(), xmart_cols(), etc.) to allow for a value_col containing only NAs, since values should not always be expected.

calculate_uhc_billion returns `projected` and `estimated` `type` based on projected_year parameter

calculate_uhc_billion returns projected and estimated type based on the projected_year parameter for calculated indicators. It defaults to 2020, but maybe it should take the underlying data type?

Write vignette for glossary of terms used in billionaiRe

Would help to document in one place what is meant by:

projected
imputed
reported
estimated

Unit testing is failing

Since commit a30d95e, the unit testing is failing.

One reason is that the population figures stored in the wppdistro package have been updated, but the billionaiRe:::basic_test_calculated testing data frame has not. This creates a mismatch in the calculated Billions contributions as well as in the indicators that use populations (e.g. cholera).

Replace `*_col` arguments with a single mapping

Instead of having multiple arguments like ind_col = "ind", we could continue to provide the ability to have non-standard names for columns by simply moving these mappings to a single col_names or col_mappings argument that is NULL by default (meaning we expect the standard column names).

The user could then provide a named vector/list to provide a mapping of non-standard column names to the standard names. This could then be passed to dplyr::rename() to maintain the same functionality without the overhead of maintaining a long list of arguments of this type.
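
A rough sketch of how such a mapping could work, assuming dplyr is installed. standardise_cols() is a hypothetical helper, and the convention that the vector's names are the standard column names and its values are the user's names is an assumption made for this example.

```r
library(dplyr)

# Hypothetical helper: `col_names` maps standard names to the names used in `df`,
# e.g. c(ind = "indicator"). NULL means the standard names are already in place.
standardise_cols <- function(df, col_names = NULL) {
  if (is.null(col_names)) {
    return(df)
  }
  # all_of() with a named character vector renames as new_name = "old_name".
  rename(df, all_of(col_names))
}

raw <- data.frame(country = "AFG", year = 2023, indicator = "pm25", value = 60)
std <- standardise_cols(raw, c(iso3 = "country", ind = "indicator"))
names(std)
```

Because all_of() errors on names that are absent from the data frame, a misspelled column in the mapping fails early instead of being silently ignored.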

Data recycling should not happen on the base scenarios

recycle_data should not recycle the routine, reference_infilling and covid_shock scenarios.

Change scenario_best_of to also compute the scenarios instead of just comparing them

It would be great and make our code far less verbose if scenario_best_of would compute the different scenarios alongside comparing them. I'm thinking we could pass in three different lists as arguments:

a list of the scenario functions we want to call,
a list with the corresponding data frames to use for each function (this could potentially just be moved into the list of arguments),
a list with the arguments to pass to each scenario function (this would thus be a list of lists)

And it would then compute everything.
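
One possible shape for the three-list idea, sketched in base R. compute_scenarios() and the two toy functions are hypothetical stand-ins and not billionaiRe functions; the comparison step that scenario_best_of performs is left out.

```r
# Hypothetical sketch: run each scenario function on its data frame with its arguments.
compute_scenarios <- function(scenario_funs, dfs, args) {
  stopifnot(length(scenario_funs) == length(dfs), length(dfs) == length(args))
  Map(
    # The data frame goes first, followed by that function's own argument list.
    function(fun, df, fun_args) do.call(fun, c(list(df), fun_args)),
    scenario_funs, dfs, args
  )
}

# Toy stand-ins for scenario functions.
add_constant <- function(df, by) {
  df$value <- df$value + by
  df
}
scale_value <- function(df, multiplier) {
  df$value <- df$value * multiplier
  df
}

df <- data.frame(year = 2021:2023, value = c(10, 20, 30))
results <- compute_scenarios(
  scenario_funs = list(add = add_constant, scale = scale_value),
  dfs           = list(df, df),
  args          = list(list(by = 1), list(multiplier = 2))
)
```

The result is a named list with one data frame per scenario, which a later step could compare row by row.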

@ElliottMess Thoughts?

Add faster implementation of calculate_uhc_billion

@v-a-s-a implemented a faster version of the UHC billion calculation; we should move it here.

extrapolate_campaign_vector carries values past the bound fixed in n

For instance, if meningitis campaign numerators are as follows:

library(tidyverse)
campaign_num <- tibble(
  meningitis_campaign_num = c(11133831, rep(NA, 5), 3956618, rep(NA, 7)),
  year = 2010:2023
)

As meningitis vaccination is valid for 10 years, we would expect the values to be carried for 10 years, as the parameter n passed to extrapolate_campaign_vector in transform_prev_cmpgn_data suggests. The above data should then look something like meningitis_campaign_num_expected:

campaign_num_expected_data <- campaign_num %>%
  mutate(
    meningitis_campaign_num_expected = c(rep(11133831, 6), rep(11133831 + 3956618, 5), rep(3956618, 3)),
    meningitis_campaign_num_billionaiRe = billionaiRe:::extrapolate_campaign_vector(meningitis_campaign_num, 10)
  )
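
For reference, the expected behaviour can be written as a small base R function. carry_forward_sum() is a hypothetical helper and not the package's implementation. It assumes, as the expected vector in this issue implies, that each campaign counts in its own year and in the n years that follow, and that overlapping campaigns are summed.

```r
# Hypothetical helper illustrating the expected behaviour; not the package's code.
carry_forward_sum <- function(x, n) {
  total   <- numeric(length(x))
  covered <- logical(length(x))
  for (i in which(!is.na(x))) {
    # Campaign year plus the n following years, capped at the end of the vector.
    window <- i:min(i + n, length(x))
    total[window]   <- total[window] + x[i]
    covered[window] <- TRUE
  }
  # Years not covered by any campaign stay missing.
  total[!covered] <- NA_real_
  total
}

meningitis_campaign_num <- c(11133831, rep(NA, 5), 3956618, rep(NA, 7))
expected <- c(rep(11133831, 6), rep(11133831 + 3956618, 5), rep(3956618, 3))
all.equal(carry_forward_sum(meningitis_campaign_num, 10), expected)
```

With these inputs the helper reproduces the expected vector from the issue, so it could serve as a reference when writing a unit test for the fix.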

fh is not transformed properly

fh is not inverted by transform_uhc_data when it should be. This has no impact on calculations as calculate_uhc_billion inverts it before use.

For clarity in the code, the indicator should be inverted in transform_uhc_data, and calculate_uhc_billion should be modified accordingly.

Unable to install billionaiRe package using remotes

The code below results in an HTTP 404 error.

remotes::install_github("gpw13/billionaiRe", build_vignettes = TRUE)

Transient failures of WHDH downloads/uploads when ingesting all data

Azure Storage operations return HTTP 404 and 500 codes when running all ingestion scripts. Failures are transient, and the scripts can be rerun manually to completion. This occurs both in the automated AML runs, as well as in my local environment.

transform_hep_data returns 'projected' as type when infilling, even when in the past

When calling transform_hep_data on a dataset with missing values for some years, transform_hep_data infills the data. However, it infills the data with the type projected, even when the data is in the past. It should be 'infilled' instead, marking the source as DDI.

load_billion_data introduced by PR #39 is not retro-compatible

The new WHDH-powered load_billion_data introduced by PR #39 is not compatible with the previous version. This means that all calls to the function in old code will fail, as the tables are no longer available. There is a fundamental data compatibility issue that makes the changes difficult to retrofit. So for now, a _legacy function allows getting the data from xMart.

The situation could be reversed to have a _whdh function instead, possibly with a lifecycle notification on both functions to indicate that the xmart version will be retired.

@alicerobson what do you think?
