billionaiRe

The goal of billionaiRe is to provide an easy interface for using long format data to calculate the World Health Organization’s Triple Billions.

Installation

You can install billionaiRe from GitHub with:

remotes::install_github("gpw13/billionaiRe", build_vignettes = TRUE)

You will need to have already installed the wppdistro and whdh packages, which are stored in private repositories and are only made available upon request to valid WHO users. Please contact [email protected] to request access.

Calculations

The package is built around a set of functions that calculate each of the three Billions separately:

  • Healthier Populations (HPOP)
  • Health Emergencies Protection (HEP)
  • Universal Health Coverage (UHC)

HPOP Billion calculation

To calculate the HPOP Billion, there are a series of functions made available through the billionaiRe package:

  • transform_hpop_data() to transform raw values into normalized values used within the calculations.
  • add_hpop_populations() to get relevant population groups for each country and indicator.
  • calculate_hpop_contributions() to calculate indicator level changes and contributions to the Billion.
  • calculate_hpop_billion() to calculate indicator level changes, country-level Billion, adjusting for double counting, and all contributions.

Run in sequence, these can calculate the entire HPOP Billion, or they can be run separately to produce different outputs as required. Details on the inputs of each function are available in their individual documentation, but below you can see the quick and easy Billions calculation done using the sample fake HPOP data provided in the package, hpop_df.

library(billionaiRe)

hpop_df %>%
  transform_hpop_data() %>%
  add_hpop_populations() %>%
  calculate_hpop_billion() %>%
  dplyr::filter(stringr::str_detect(ind, "hpop_healthier"))
#> # A tibble: 6 × 10
#>   iso3   year ind            value type  trans…¹ popul…² contr…³ contr…⁴ contr…⁵
#>   <chr> <dbl> <chr>          <dbl> <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1 AFG    2023 hpop_healthie…    NA <NA>       NA  4.45e7  2.72e7    61.2      NA
#> 2 AFG    2023 hpop_healthie…    NA <NA>       NA  4.45e7 -3.84e7   -86.3      NA
#> 3 AFG    2023 hpop_healthier    NA <NA>       NA  4.45e7 -1.12e7   -25.1      NA
#> 4 AFG    2023 hpop_healthie…    NA <NA>       NA  4.45e7  3.20e7    71.9      NA
#> 5 AFG    2023 hpop_healthie…    NA <NA>       NA  4.45e7 -7.46e7  -168.       NA
#> 6 AFG    2023 hpop_healthie…    NA <NA>       NA  4.45e7 -4.26e7   -95.7      NA
#> # … with abbreviated variable names ¹​transform_value, ²​population,
#> #   ³​contribution, ⁴​contribution_percent, ⁵​contribution_percent_total_pop

UHC Billion calculation

To calculate the UHC Billion, there are a series of functions made available through the billionaiRe package:

  • transform_uhc_data() to transform raw values into normalized values used within the calculations.
  • calculate_uhc_billion() to calculate average service coverage, financial hardship, and the UHC single measure for each country and year in the data frame.
  • calculate_uhc_contribution() to calculate country-level Billion for specified beginning and end year.

Run in sequence, these can calculate the entire UHC Billion, or they can be run separately to produce different outputs as required. Details on the inputs of each function are available in their individual documentation, but below you can see the quick and easy Billions calculation done using the sample fake UHC data provided in the package, uhc_df.

library(billionaiRe)

uhc_df %>%
  transform_uhc_data(end_year = 2023) %>%
  calculate_uhc_billion() %>%
  calculate_uhc_contribution(end_year = 2023, pop_year = 2023) %>% 
  dplyr::filter(ind %in% c("uhc_sm", "asc", "fh"),
                year == 2023)
#> # A tibble: 3 × 9
#>   iso3   year ind    value type      transform_value source      contr…¹ contr…²
#>   <chr> <dbl> <chr>  <dbl> <chr>               <dbl> <chr>         <dbl>   <dbl>
#> 1 AFG    2023 fh      25.4 Projected            74.6 <NA>        -3.00e6  -7.11 
#> 2 AFG    2023 asc     45.3 projected            45.3 WHO DDI ca…  1.72e6   4.06 
#> 3 AFG    2023 uhc_sm  33.8 projected            33.8 WHO DDI ca…  4.41e4   0.104
#> # … with abbreviated variable names ¹​contribution, ²​contribution_percent

HEP Billion calculation

To calculate the HEP Billion, there are a series of functions made available through the billionaiRe package:

  • transform_hep_data() to transform raw values into normalized values used within the calculations. For now, this is primarily calculating the total prevent numerators and denominators for campaign and routine data.
  • calculate_hep_components() to calculate component indicators (Prevent coverages), the HEP index, and levels for all components.
  • calculate_hep_billion() to calculate the change for the three HEP components (DNR, Prepare, and Prevent), their contribution to the Billion, and overall HEPI change and contribution.

Run in sequence, these can calculate the entire HEP Billion, or they can be run separately to produce different outputs as required. Details on the inputs of each function are available in their individual documentation, but below you can see the quick and easy Billions calculation done using the sample fake HEP data provided in the package, hep_df.

library(billionaiRe)

hep_df %>%
  transform_hep_data() %>%
  calculate_hep_components() %>%
  calculate_hep_billion(end_year = 2023) %>%
  dplyr::filter(ind %in% c("prevent",
                           "espar",
                           "detect_respond",
                           "hep_idx"),
                year == 2023)
#> # A tibble: 4 × 12
#>   iso3   year ind       value type  source trans…¹ use_d…² use_c…³ level contr…⁴
#>   <chr> <dbl> <chr>     <dbl> <chr> <chr>    <dbl> <lgl>   <lgl>   <dbl>   <dbl>
#> 1 AFG    2023 espar      51.2 Proj… <NA>      51.2 NA      NA          3  5.00e6
#> 2 AFG    2023 detect_r…  91   Proj… <NA>      91   NA      NA          5  2.23e6
#> 3 AFG    2023 prevent    NA   proj… Unite…   100   NA      NA          5  0     
#> 4 AFG    2023 hep_idx    NA   proj… WHO D…    80.7 NA      NA          4  7.22e6
#> # … with 1 more variable: contribution_percent <dbl>, and abbreviated variable
#> #   names ¹​transform_value, ²​use_dash, ³​use_calc, ⁴​contribution

Scenarios

In the context of the Triple Billions and the billionaiRe package, scenarios are understood as alternative, plausible descriptions of how the future may develop based on a set of defined assumptions.

Scenarios must:

  1. Be tidy: each row is a unique combination of iso3 country-code, year, indicator, and scenario (if relevant).
  2. Contain all and only the data that is strictly needed for calculations with billionaiRe
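
The first requirement can be checked before any scenario function is called. The following is a minimal sketch, assuming the standard column names iso3, year, ind and scenario. The helper is_tidy_scenario_df() is hypothetical and is not part of billionaiRe:

```r
library(dplyr)

# Hypothetical helper (not part of billionaiRe): returns TRUE when every row
# is a unique combination of iso3, year, ind and scenario.
is_tidy_scenario_df <- function(df) {
  duplicated_keys <- df %>%
    count(iso3, year, ind, scenario, name = "n_rows") %>%
    filter(n_rows > 1)
  nrow(duplicated_keys) == 0
}

ok_df <- tibble::tibble(
  iso3 = "AFG",
  year = 2020:2021,
  ind = "pm25",
  scenario = "default",
  value = c(60, 61)
)

# Appending a copy of the first row creates a duplicated key
bad_df <- dplyr::bind_rows(ok_df, ok_df[1, ])

is_tidy_scenario_df(ok_df)
is_tidy_scenario_df(bad_df)
```

The first call should return TRUE and the second FALSE, because bad_df repeats the AFG / 2020 / pm25 / default combination.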

Four main sets of scenarios can be identified, from the most basic to the more complex:

  1. Basic scenarios: Those scenarios are the building blocks of the other scenarios, but they can also be called on their own. They can reach a fixed target at a specified year (scenario_fixed_target()), follow a specific rate of change (scenario_aroc()), etc. See Basic scenario for more details.
  2. Target scenarios: Target-based scenarios apply indicator-specific targets or trajectories to be reached by a specific date (e.g. Sustainable Development Goal (SDG) targets (called sdg scenarios)).
  3. Benchmarking scenarios: Benchmarking scenarios compare performance of countries given various grouping and aim at a specified sub-set of best performing countries.
  4. Mixed scenarios: Developed with indicator-level subject matter knowledge to pick realistic improvement scenarios (e.g. acceleration scenarios).

If billionaiRe requires data that are missing from a scenario, they will be recycled from other scenarios (see Data recycling).

See Scenarios vignette for more details.

Quick start on billionaiRe scenarios

add_scenario() is the entry point to all other scenario functions. It takes a typical billionaiRe data frame (df) and applies the selected scenario function to it.

library(billionaiRe)

df <- tibble::tibble(
    value = 60:80,
    year = 2010:2030,
    ind = "pm25",
    type = "reported",
    iso3 = "AFG",
    scenario = "default",
    source = NA_character_
  ) %>%
    dplyr::mutate(scenario = dplyr::case_when(
      year > 2021 ~ "historical",
      TRUE ~ scenario
    ),
    type = dplyr::case_when(
      year > 2021 ~ "projected",
      TRUE ~ type
    ))

The choice of scenario function to apply to the df is done through the scenario_function parameter. Additional parameters can be passed through the ellipsis (...).

For instance, to halt the rise to the 2010 value by end_year (2025 by default), we can apply the halt_rise function to df. It is applied to every unique combination of country and indicator. In this case, there is just one combination:

df %>%
  add_scenario(
    scenario_function = "halt_rise",
    baseline_year = 2010
  )

To apply the SDG targets, we use the sdg scenario function:

df %>%
  add_scenario(
    scenario_function = "sdg"
  )

By default, the scenarios start from the last reported or estimated value in the default scenario. This can be bypassed by setting start_scenario_last_default to FALSE. The scenario will then start at start_year (2018 by default):

df %>%
  add_scenario(
    scenario_function = "sdg",
    start_scenario_last_default = FALSE
  )

billionaiRe's Issues

Think through Niger and Somalia SDI ratio values

Given their extremely low SDIs, Niger and Somalia have currently been fudged to have the same SDI ratio as Chad. Need to think through if this is best or an alternative way to set the SDI for them.

`value_col` and `transform_value_col` fully support only one value

value_col and transform_value_col fully support only one value. When more values are passed (e.g. "value", "lower", "upper"), support by all functions is not guaranteed. Support for this functionality should be implemented throughout the package's functions.

This would also mean reviewing some assert_ assumptions on data quality (assert_data_calculation_hep(), etc.) and presence (e.g. assert_iso3_not_empty(), xmart_col_types(), xmart_cols(), etc.) to allow for a value_col containing only NAs, as values should not always be expected.

Unit testing is failing

Since commit a30d95e, the unit tests have been failing.

One of the reasons is that the population figures stored in the wppdistro package have been updated, but the billionaiRe:::basic_test_calculated testing data frame has not. As a result, there is a mismatch in the calculated Billions' contributions as well as in the indicators that use populations (e.g. cholera).

Replace `*_col` arguments with a single mapping

Instead of having multiple arguments like ind_col = "ind", we could continue to provide the ability to have non-standard names for columns by simply moving these mappings to a single col_names or col_mappings argument that is NULL by default (meaning we expect the standard column names).

The user could then provide a named vector/list to provide a mapping of non-standard column names to the standard names. This could then be passed to dplyr::rename() to maintain the same functionality without the overhead of maintaining a long list of arguments of this type.
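
A minimal sketch of what this could look like is shown below. The helper name standardise_cols() and the col_names argument are hypothetical and only illustrate the proposal; the sketch assumes the mapping is a named character vector whose names are the standard column names and whose values are the names used in the data:

```r
library(dplyr)

# Hypothetical helper illustrating the proposal; not an existing billionaiRe function.
# col_names: named character vector, names = standard names, values = names in df.
standardise_cols <- function(df, col_names = NULL) {
  if (is.null(col_names)) {
    return(df) # NULL means the standard column names are expected
  }
  # all_of() with a named vector renames each column to the vector's name
  dplyr::rename(df, dplyr::all_of(col_names))
}

raw <- tibble::tibble(country = "AFG", yr = 2023, indicator = "pm25", val = 60)

standardised <- standardise_cols(
  raw,
  col_names = c(iso3 = "country", year = "yr", ind = "indicator", value = "val")
)

names(standardised)
```

Because dplyr::all_of() errors when a mapped column is missing, a misspelled column name in the mapping would fail early rather than being silently ignored.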

Change scenario_best_of to also compute the scenarios instead of just comparing them

It would be great and make our code far less verbose if scenario_best_of would compute the different scenarios alongside comparing them. I'm thinking we could pass in three different lists as arguments:

  • a list of the scenario functions we want to call,
  • a list with the corresponding data frames to use for each function (this could potentially just be moved into the list of arguments),
  • a list with the arguments to pass to each scenario function (this would thus be a list of lists)

And it would then compute everything.
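
As a rough illustration of the three-list idea, the sketch below uses base R only. compute_scenarios() and the two toy scenario functions are hypothetical stand-ins, not billionaiRe functions, and the comparison step that scenario_best_of performs is deliberately left out:

```r
# Hypothetical sketch of the proposal; none of these functions exist in billionaiRe.
# fns:  list of scenario functions
# dfs:  list of data frames, one per function
# args: list of argument lists, one per function (a list of lists)
compute_scenarios <- function(fns, dfs, args) {
  stopifnot(length(fns) == length(dfs), length(fns) == length(args))
  Map(
    function(fn, df, extra) do.call(fn, c(list(df), extra)),
    fns, dfs, args
  )
}

# Toy stand-ins for real scenario functions
add_constant <- function(df, by = 1) {
  df$value <- df$value + by
  df
}
scale_value <- function(df, mult = 1) {
  df$value <- df$value * mult
  df
}

df <- data.frame(year = 2021:2023, value = c(10, 20, 30))

results <- compute_scenarios(
  fns  = list(add = add_constant, scale = scale_value),
  dfs  = list(df, df),
  args = list(list(by = 5), list(mult = 2))
)

results$add$value
results$scale$value
```

The result is a named list with one computed data frame per scenario, which a comparison step could then consume.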

@ElliottMess Thoughts?

extrapolate_campaign_vector carries values past the bound fixed in n

For instance, if the meningitis campaign numerators are as follows:

library(tidyverse)
campaign_num <- tibble(
  meningitis_campaign_num = c(11133831, rep(NA, 5), 3956618, rep(NA, 7)),
  year = 2010:2023
)

As meningitis vaccination has a 10-year validity, we would expect the values to be carried for 10 years, as the parameter n passed to extrapolate_campaign_vector in transform_prev_cmpgn_data suggests. The above data should then look something like meningitis_campaign_num_expected:

campaign_num_expected_data <- campaign_num %>%
  mutate(
    meningitis_campaign_num_expected = c(rep(11133831, 6), rep(11133831 + 3956618, 5), rep(3956618, 3)),
    meningitis_campaign_num_billionaiRe = billionaiRe:::extrapolate_campaign_vector(meningitis_campaign_num, 10)
  )
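
For reference, the expected column above can be reproduced with a small stand-alone function. This is only a sketch of the expected behaviour, not the package's implementation: carry_forward_sum() is a hypothetical name, and it assumes that each campaign value counts in its own year and in the n following years, with overlapping campaigns summed, which is what the expected vector implies:

```r
# Hypothetical reference implementation of the expected behaviour.
# Each non-missing value is carried over its own position plus the next n
# positions; overlapping campaigns are summed; uncovered positions stay NA.
carry_forward_sum <- function(x, n) {
  out <- rep(0, length(x))
  covered <- rep(FALSE, length(x))
  for (i in which(!is.na(x))) {
    idx <- i:min(i + n, length(x))
    out[idx] <- out[idx] + x[i]
    covered[idx] <- TRUE
  }
  out[!covered] <- NA
  out
}

campaign <- c(11133831, rep(NA, 5), 3956618, rep(NA, 7)) # years 2010 to 2023
expected <- c(rep(11133831, 6), rep(11133831 + 3956618, 5), rep(3956618, 3))

all.equal(carry_forward_sum(campaign, 10), expected)
```

With these inputs the 2010 campaign covers 2010 to 2020 and the 2016 campaign covers 2016 to 2023, so the comparison should return TRUE.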

fh is not transformed properly

fh is not inverted by transform_uhc_data when it should be. This has no impact on calculations as calculate_uhc_billion inverts it before use.

For clarity in the code, the indicator should be inverted in transform_uhc_data, and calculate_uhc_billion should be modified accordingly.
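
As an illustration of the inversion in question, here is a minimal sketch. It assumes that inverting fh means taking 100 minus the percentage value, which is consistent with the fh row in the UHC sample output (a value of 25.4 with a transform_value of 74.6); invert_percent() is a hypothetical helper, not a billionaiRe function:

```r
# Hypothetical helper: invert a percentage so that higher means better.
# Assumption: values are on a 0 to 100 scale.
invert_percent <- function(value) {
  100 - value
}

fh_value <- 25.4
fh_transformed <- invert_percent(fh_value)

all.equal(fh_transformed, 74.6)
```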

load_billion_data introduced by PR #39 is not retro-compatible

The new WHDH-powered load_billion_data introduced by PR #39 is not compatible with the previous version. This means that all calls to the function in old code will fail, as the tables are not available anymore. There is a fundamental data compatibility issue that makes the changes difficult to retrofit. So for now, a _legacy function allows getting the data from xMart.

The situation could be reversed by having a _whdh function instead, possibly with a lifecycle notification on both functions to indicate that the xMart version will be retired.

@alicerobson what do you think?

Fix double counting correction for IPV

The IPV indicator currently applies the double counting correction to both the male and female populations, when it should only remove double counting for the female population column.
