spsanderson / healthyr.data Goto Github PK

View Code? Open in Web Editor NEW

6.0 3.0 3.0 20.61 MB

Data sets for the healthyR package.

Home Page: https://www.spsanderson.com/healthyR.data/

License: Other

R 100.00%

data data-sets data-science r rstats healthcare healthcare-application healthcare-datasets healthcare-analysis

healthyr.data's Introduction

Hi there, I'm Steve I spend a lot of time in R and here is my github resume

I use these:

I authored and maintain these:

I have co-authored these:

about_me <- list(
  name            = "Sanderson"
  , first_name    = "Steven"
  , occupation    = "Manager of Applications"
  , employer      = "Stony Brook Medicine"
  , fav_prog_lang = "R"
  , languages     = list("R", "SQL", "Python (working on it)")
  , my_r_packages = list("healthyverse", "healthyR", "healthyR.data", 
                    "healthyR.ts", "healthyR.ai", "TidyDensity",
                    "tidyAML")
  , organizations = list("https://github.com/Koffi-Fredysessie")
  , orcid         = list("https://orcid.org/my-orcid?orcid=0009-0006-7661-8247")
  , org_pkgs      = list("BRVM")
  , website       = "https://www.spsanderson.com/"
  , blog          = "https://www.spsanderson.com/steveondata/"
  , mastadon      = "https://mstdn.social/@stevensanderson"
)

🔗 I am also on:

Mastodon

I am currently working on the R packages mentioned above and I use my package-downloads repository to keep track of the stats of those packages. I'm looking to collaborate on all or any of them and pull requests are welcome! I can be reached at spsanderson at gmail.com or via my LinkedIn page. Here is a link to my personal site: Steve's Homepage

I have also written a book with my co-author David Kun entitled: "Extending Excel with Python and R"

Package Stats

{healthyR}

{healthyR.ts}

{healthyR.ai}

{healthyR.data}

{healthyverse}

{TidyDensity}

{tidyAML}

{The 'spsanderson' universe} Click Here!

GitHub Stats

Extended Stats

Most Used Languages

healthyr.data's People

Contributors

Stargazers

Watchers

Forkers

minghao2016 federman rjake

healthyr.data's Issues

Release healthyR.data 1.0.1

Prepare for release:

Submit to CRAN:

usethis::use_version('patch')
devtools::submit_cran()
Approve email

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()
usethis::use_dev_version()

New Function `current_hcahps_data()`

Add function `current_comp_death_data()`

Function:

#' Get Current Complications and Death Data.
#'
#' @family Hospital Data
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @seealso \url{https://data.cms.gov/provider-data/topics/hospitals/}
#'
#' @description Get the current complications and death data.
#'
#' @details This function will obtain the current Complications and Death data
#' from the output of the [healthyR.data::current_hosp_data()] function, that is
#' the required input for the `.data` parameter. You can pass in a list of which
#' of those data sets you would like,
#'
#' @param .data The data that results from the `current_hosp_data()` function.
#' @param .data_sets The default is: c("Facility","State","National"), which
#' will bring back all of the data sets that are in the Ambulatory Surgery Center
#' OAS CAHPS data sets. You can choose from the following:
#' *  Facility
#' *  National
#' *  State
#'
#' You can also pass things like c("state","Nation") as behind the scenes only
#' the Complicaitons and Death data sets are available to the function to choose
#' from and `grep` is used to find matches with `ignore.case = TRUE` set.
#'
#' @examples
#' \dontrun{
#' library(dplyr)
#'
#' current_hosp_data() %>%
#'   current_comp_death_data(.data_sets = c("State","National"))
#' }
#'
#' @return
#' Gets the current Complications and Death data from the current hospital data file.
#'
#' @name current_comp_death_data
NULL

#' @export
#' @rdname current_comp_death_data

current_comp_death_data <- function(.data,
                                    .data_sets = c("Facility","State","National")) {
  
  # Variables
  ds <- .data_sets
  l <- .data
  
  # Checks
  if (!inherits(l, "current_hosp_data")){
    rlang::abort(
      message = "'.data' must come from the function 'current_hosp_data()",
      use_cli_format = TRUE
    )
  }
  
  # Manipulations
  # Get the exact files necessary to the ASC
  file_names_vec <- c("Complications_and_Deaths_Hospital.csv",
                      "Complications_and_Deaths_National.csv",
                      "Complications_and_Deaths_State.csv")
  
  asc_list <- l[names(l) %in% file_names_vec]
  names(asc_list)[1] <-  "Complications_and_Deaths_Facility.csv"
  
  # Make sure there are no 0 length items
  asc_list <- asc_list[lengths(asc_list) > 0]
  
  # Only keep the names we want
  ret <- asc_list[grep(
    paste(ds, collapse = "|"),
    names(asc_list),
    ignore.case = TRUE
  )]
  
  # Return\
  attr(ret, ".list_type") <- "current_comp_death_list"
  class(ret) <- c("current_comp_death_list", class(ret))
  
  return(ret)
  
}

New Function `current_opqr_data()`

Make a function to get the current medicare spend data current_opqr_data()

More data

IPFQR Current

Get current_ipfqr_data()

Get CMS Meta Data

Sample Code:

url <- "https://data.cms.gov/data.json"

data_sets <- httr2::request(url) |>
  httr2::req_perform() |>
  httr2::resp_body_json(check_type = FALSE, simplifyVector = TRUE)

data_tbl <- data_sets$dataset |>
  dplyr::tibble() |>
  dplyr::select(
    title, description, landingPage,
    modified, keyword, description,
    describedBy, contactPoint, identifier,
    temporal, references, distribution
  ) |>
  tidyr::unnest(cols = distribution, names_sep = "_") |>
  tidyr::unnest(cols = c(keyword, contactPoint, references)) |>
  janitor::clean_names() |>
  dplyr::select(-type, -distribution_type) |>
  dplyr::mutate(media_type = ifelse(is.na(distribution_format),
    distribution_media_type,
    distribution_format
  )) |>
  dplyr::mutate(data_link = ifelse(is.na(distribution_access_url),
    distribution_download_url,
    distribution_access_url
  )) |>
  dplyr::mutate(has_email = stringr::str_remove(has_email, "mailto:")) |>
  tidyr::separate(temporal,
    into = c("start", "end"), sep = "/",
    remove = TRUE
  ) |>
  tidyr::separate(distribution_temporal,
    into = c("distribution_start", "distribution_end"), sep = "/",
    remove = TRUE
  ) |>
  dplyr::mutate(dplyr::across(c(
    start, end, modified,
    distribution_modified, distribution_start,
    distribution_end
  ), as.Date)) |>
  dplyr::mutate(distribution_description = ifelse(is.na(distribution_description),
    "old", distribution_description
  )) |>
  dplyr::mutate(distribution_title = stringr::str_remove_all(distribution_title, "[:|-]")) |>
  dplyr::mutate(distribution_title = stringr::str_remove_all(distribution_title, "[:number:]")) |>
  dplyr::select(
    -distribution_format, -distribution_media_type,
    -distribution_access_url, -distribution_download_url
  ) |>
  dplyr::mutate(dplyr::across(dplyr::where(is.character), stringr::str_squish))

Output:

> data_tbl |>
+   head(1) |>
+   dplyr::glimpse()
Rows: 1
Columns: 19
$ title                    <chr> "Accountable Care Organization Participants"
$ description              <chr> "The Accountable Care Organization Participants data presents …
$ landing_page             <chr> "https://data.cms.gov/medicare-shared-savings-program/accounta…
$ modified                 <date> 2024-01-29
$ keyword                  <chr> "Medicare"
$ described_by             <chr> "https://data.cms.gov/resources/accountable-care-organization-…
$ fn                       <chr> "Shared Savings Program - CM"
$ has_email                <chr> "SharedSavingsProgram@cms.hhs.gov"
$ identifier               <chr> "https://data.cms.gov/data-api/v1/dataset/9767cb68-8ea9-4f0b-…
$ start                    <date> 2014-01-01
$ end                      <date> 2024-12-31
$ references               <chr> "https://data.cms.gov/resources/acos-aco-participants-and-snf-…
$ distribution_description <chr> "latest"
$ distribution_title       <chr> "Accountable Care Organization Participants"
$ distribution_modified    <date> 2024-01-29
$ distribution_start       <date> 2024-01-01
$ distribution_end         <date> 2024-12-31
$ media_type               <chr> "API"
$ data_link                <chr> "https://data.cms.gov/data-api/v1/dataset/9767cb68-8ea9-4f0b-8…

New function `current_pch_oncology_measures_hospital_data()`

new function current_pch_oncology_measures_hospital_data()

Release healthyR.data 1.0.3

Prepare for release:

Submit to CRAN:

usethis::use_version('patch')
devtools::submit_cran()
Approve email

Wait for CRAN...

Update NEWS.md

Update News file

New function `current_pch_outcomes_data()`

new function current_pch_outcomes_data()

Get Latest Data

Sample Code:

> data_tbl |> dplyr::filter(distribution_description == "latest") |> dplyr::glimpse()
Rows: 612
Columns: 19
$ title                    <chr> "Accountable Care Organization Participants", "Accountable Car…
$ description              <chr> "The Accountable Care Organization Participants data presents …
$ landing_page             <chr> "https://data.cms.gov/medicare-shared-savings-program/accounta…
$ modified                 <date> 2024-01-29, 2024-01-29, 2024-01-29, 2024-01-29, 2024-01-29, 2…
$ keyword                  <chr> "Medicare", "Value-Based Care", "Coordinated Care", "Payment M…
$ described_by             <chr> "https://data.cms.gov/resources/accountable-care-organization-…
$ fn                       <chr> "Shared Savings Program - CM", "Shared Savings Program - CM", …
$ has_email                <chr> "SharedSavingsProgram@cms.hhs.gov", "SharedSavingsProgram@cms.…
$ identifier               <chr> "https://data.cms.gov/data-api/v1/dataset/9767cb68-8ea9-4f0b-8…
$ start                    <date> 2014-01-01, 2014-01-01, 2014-01-01, 2014-01-01, 2014-01-01, 2…
$ end                      <date> 2024-12-31, 2024-12-31, 2024-12-31, 2024-12-31, 2024-12-31, 2…
$ references               <chr> "https://data.cms.gov/resources/acos-aco-participants-and-snf-…
$ distribution_description <chr> "latest", "latest", "latest", "latest", "latest", "latest", "l…
$ distribution_title       <chr> "Accountable Care Organization Participants", "Accountable Car…
$ distribution_modified    <date> 2024-01-29, 2024-01-29, 2024-01-29, 2024-01-29, 2024-01-29, 2…
$ distribution_start       <date> 2024-01-01, 2024-01-01, 2024-01-01, 2024-01-01, 2024-01-01, 2…
$ distribution_end         <date> 2024-12-31, 2024-12-31, 2024-12-31, 2024-12-31, 2024-12-31, 2…
$ media_type               <chr> "API", "API", "API", "API", "API", "API", "API", "API", "API",…
$ data_link                <chr> "https://data.cms.gov/data-api/v1/dataset/9767cb68-8ea9-4f0b-8…

Get Older Data

Sample Code:

> data_tbl |> dplyr::filter(distribution_description != "latest") |> dplyr::glimpse()
Rows: 13,709
Columns: 19
$ title                    <chr> "Accountable Care Organization Participants", "Accountable Car…
$ description              <chr> "The Accountable Care Organization Participants data presents …
$ landing_page             <chr> "https://data.cms.gov/medicare-shared-savings-program/accounta…
$ modified                 <date> 2024-01-29, 2024-01-29, 2024-01-29, 2024-01-29, 2024-01-29, 2…
$ keyword                  <chr> "Medicare", "Value-Based Care", "Coordinated Care", "Payment M…
$ described_by             <chr> "https://data.cms.gov/resources/accountable-care-organization-…
$ fn                       <chr> "Shared Savings Program - CM", "Shared Savings Program - CM", …
$ has_email                <chr> "SharedSavingsProgram@cms.hhs.gov", "SharedSavingsProgram@cms.…
$ identifier               <chr> "https://data.cms.gov/data-api/v1/dataset/9767cb68-8ea9-4f0b-8…
$ start                    <date> 2014-01-01, 2014-01-01, 2014-01-01, 2014-01-01, 2014-01-01, 2…
$ end                      <date> 2024-12-31, 2024-12-31, 2024-12-31, 2024-12-31, 2024-12-31, 2…
$ references               <chr> "https://data.cms.gov/resources/acos-aco-participants-and-snf-…
$ distribution_description <chr> "old", "old", "old", "old", "old", "old", "old", "old", "old",…
$ distribution_title       <chr> "Accountable Care Organization Participants", "Accountable Car…
$ distribution_modified    <date> 2024-01-29, 2024-01-29, 2024-01-29, 2024-01-29, 2024-01-29, 2…
$ distribution_start       <date> 2024-01-01, 2024-01-01, 2024-01-01, 2024-01-01, 2024-01-01, 2…
$ distribution_end         <date> 2024-12-31, 2024-12-31, 2024-12-31, 2024-12-31, 2024-12-31, 2…
$ media_type               <chr> "text/csv", "text/csv", "text/csv", "text/csv", "text/csv", "A…
$ data_link                <chr> "https://data.cms.gov/sites/default/files/2024-01/afc09855-5e4…

Update .Rbuildignore

Use tidyquant .Rbuildignore as an example

New function `current_payment_data()`

Make a current_payment_data() function.

Require R (>= 4.1.0) because of actuar package keep all packages the same

Add function `current_maternal_data()`

Make a function current_maternal_data()

rename function `current_imaging_efficiency_data()`

This needs to be renamed to current_outpatient_imaging_efficiency_data()

New function `current_unplanned_hospital_visits_data()`

new function current_unplanned_hospital_visits_data()

Release healthyR.data 1.0.0

Prepare for release:

Submit to CRAN:

usethis::use_version('major')
devtools::submit_cran()
Approve email

Wait for CRAN...

Add function `current_asc_oas_cahps_data()`

Function:

#' Get Current Ambulatory Surgery Center Quality Reporting Data.
#'
#' @family Hospital Data
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @seealso \url{https://data.cms.gov/provider-data/topics/hospitals/}
#'
#' @description Get the current Ambulatory Surgery Center data.
#'
#' @details This function will obtain the current Ambulatory Surgery Center data
#' from the output of the [healthyR.data::current_hosp_data()] function, that is
#' the required input for the `.data` parameter. You can pass in a list of which
#' of those data sets you would like,
#'
#' @param .data The data that results from the `current_hosp_data()` function.
#' @param .data_sets The default is: c("Facility","State","National"), which
#' will bring back all of the data sets that are in the Ambulatory Surgery Center
#' OAS CAHPS data sets. You can choose from the following:
#' *  Facility
#' *  National
#' *  State
#'
#' You can also pass things like c("state","Nation") as behind the scenes only
#' the Ambulatory Surgery Center data sets are available to the function to choose
#' from and `grep` is used to find matches with `ignore.case = TRUE` set.
#'
#' @examples
#' \dontrun{
#' library(dplyr)
#'
#' current_hosp_data() %>%
#'   current_asc_oas_cahps_data(.data_sets = c("State","National"))
#' }
#'
#' @return
#' Gets the current ASC data from the current hospital data file.
#'
#' @name current_asc_oas_cahps_data
NULL

#' @export
#' @rdname current_asc_oas_cahps_data

current_asc_oas_cahps_data <- function(.data,
                                       .data_sets = c("Facility","State","National")) {

    # Variables
    ds <- .data_sets
    l <- .data

    # Checks
    if (!inherits(l, "current_hosp_data")){
        rlang::abort(
            message = "'.data' must come from the function 'current_hosp_data()",
            use_cli_format = TRUE
        )
    }

    # Manipulations
    # Get the exact files necessary to the ASC
    file_names_vec <- c("ASCQR_OAS_CAHPS_BY_ASC.csv",
                        "ASCQR_OAS_CAHPS_NATIONAL.csv",
                        "ASCQR_OAS_CAHPS_STATE.csv")

    asc_list <- l[names(l) %in% file_names_vec]
    names(asc_list)[1] <-  "ASCQR_OAS_CAHPS_FACILITY.csv"

    # Make sure there are no 0 length items
    asc_list <- asc_list[lengths(asc_list) > 0]

    # Only keep the names we want
    ret <- asc_list[grep(
        paste(ds, collapse = "|"),
        names(asc_list),
        ignore.case = TRUE
    )]

    # Return\
    attr(ret, ".list_type") <- "current_asc_oascahps_list"
    class(ret) <- c("current_asc_oascahps_list", class(ret))

    return(ret)

}

New Function `current_pch_hai_hospital_data()`

New function current_pch_hai_hospital_data()

Release healthyR.data 1.0.2

Prepare for release:

Submit to CRAN:

usethis::use_version('patch')
devtools::submit_cran()
Approve email

Wait for CRAN...

New Function `current_hai_data()`

Add parameter of `.open_folder` as a boolean for `current_hosp_data_dict()`

Function:

current_hosp_data_dict <- function(.open_folder = FALSE) {
  
  # Create a temporary file to store the zip file
  f_path <- utils::choose.dir()
  destfile <- paste0(f_path, "\\Hospital_Data_Dictionary.pdf")
  
  # Download the zip file to the temporary location
  url <- "https://data.cms.gov/provider-data/sites/default/files/data_dictionaries/hospital/HOSPITAL_Data_Dictionary.pdf"
  utils::download.file(
    url = url,
    destfile = destfile,
    mode = "wb"
  )
  
  # Return Message
  rlang::inform(
    message = paste0(
      "The Hospital Data Dictionary has been downloaded to: ",
      f_path,
      "\\",
      basename(url)
    ),
    use_cli_format = TRUE
  )
  
  # Open file folder?
  if (.open_folder){
    shell.exec(f_path)
  }
  
}

Example:

> current_hosp_data_dict(TRUE)
trying URL 'https://data.cms.gov/provider-data/sites/default/files/data_dictionaries/hospital/HOSPITAL_Data_Dictionary.pdf'
Content type 'application/pdf' length 1202138 bytes (1.1 MB)
downloaded 1.1 MB

The Hospital Data Dictionary has been downloaded to:
C:\Users\sandes05\Downloads\HOSPITAL_Data_Dictionary.pdf

Program Emails Tibble

Sample code:

> data_tbl |> dplyr::select(title, has_email) |> dplyr::distinct() |> dplyr::glimpse()
Rows: 132
Columns: 2
$ title     <chr> "Accountable Care Organization Participants", "Accountable Care Organization …
$ has_email <chr> "SharedSavingsProgram@cms.hhs.gov", "SharedSavingsProgram@cms.hhs.gov", "ACOR…

Get `tempdir()` does not exist error

instead of using 'tempdir()` use utils::choose.dir() intead.

New function:

#' Download Current Hospital Data Files.
#'
#' @family Hospital Data
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @seealso \url{https://data.cms.gov/provider-data/topics/hospitals/}
#'
#' @description Download the current Hospital Data Sets.
#'
#' @details This function will download the current and the official hospital
#' data sets from the __CMS.gov__ website.
#'
#' The function makes use of a temporary directory and file to save and unzip
#' the data. This will grab the current Hospital Data Files, unzip them and
#' return a list of tibbles with each tibble named after the data file.
#'
#' The function returns a list object with all of the current hospital data as a
#' tibble. It does not save the data anywhere so if you want to save it you will
#' have to do that manually.
#'
#' This also means that you would have to store the data as a variable in order
#' to access the data later on. It does have a given attributes and a class so
#' that it can be piped into other functions.
#'
#' @examples
#' \dontrun{
#'   current_hosp_data()
#' }
#'
#' @return
#' Downloads the current hospital data sets.
#'
#' @name current_hosp_data
NULL

#' @export
#' @rdname current_hosp_data

current_hosp_data <- function() {
  
  # URL for file
  url <- "https://data.cms.gov/provider-data/sites/default/files/archive/Hospitals/current/hospitals_current_data.zip"
  
  # Create a temporary directory to process the zip file
  tmp_dir <- utils::choose.dir()#tempdir()
  download_location <- file.path(tmp_dir, "download.zip")
  extract_location <- file.path(tmp_dir, "extract")
  
  # Download the zip file to the temporary location
  utils::download.file(
    url = url,
    destfile = download_location
  )
  
  # Unzip the file
  utils::unzip(download_location, exdir = extract_location)
  
  # Read the csv files into a list
  csv_file_list <- list.files(
    path = extract_location,
    pattern = "\\.csv$",
    full.names = TRUE
  )
  
  # make named list
  csv_names <- stats::setNames(
      object = csv_file_list,
      nm =
        csv_file_list |>
        basename() |>
        gsub(pattern = "\\.csv$", replacement = "") |>
        janitor::make_clean_names()
    )
  
  # Process CSV Files
  parse_csv_file <- function(file) {
    # Normalize the path to use C:/path/to/file structure
    normalizePath(file, "/") |>
      # read in the csv file and use check.names = FALSE because some of
      # the names are very long
      utils::read.csv(check.names = FALSE) |>
      dplyr::as_tibble() |>
      # clean the field names
      janitor::clean_names()
  }
  
  list_of_tables <- lapply(csv_names, parse_csv_file)
  
  unlink(tmp_dir, recursive = TRUE)
  
  # Return the tibbles)
  # Add and attribute and a class type to the object
  attr(list_of_tables, ".list_type") <- "current_hosp_data"
  class(list_of_tables) <- c("current_hosp_data", class(list_of_tables))
  
  list_of_tables
}

@rjake would you have the bandwidth to confirm this one?

Add function `current_medicare_spending_data()`

Make function current_medicare_spending_data()

Make function `current_asc_data()`

Make a function that will take in data from the current_hosp_data() function and will only return the Ambulatory Surgery Center data with a parameter that will allow the user to filter which data sets inside of that list they want to use.

Function:

#' Get Current Ambulatory Surgery Center Data.
#'
#' @family Hospital Data
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @seealso \url{https://data.cms.gov/provider-data/topics/hospitals/}
#'
#' @description Get the current Ambulatory Surgery Center data.
#'
#' @details This function will obtain the current Ambulatory Surgery Center data
#' from the output of the [healthyR.data::current_hosp_data()] function, that is
#' the required input for the `.data` parameter. You can pass in a list of which
#' of those data sets you would like, if you return NULL then all of the data sets
#' will be returned in a list object.
#'
#' @param .data The data that results from the `current_hosp_data()` function.
#' @param .data_sets The default is NULL, which will bring back all of the data
#' sets that are in the file. You can choose from the following:
#' *  Facility
#' *  National
#' *  State
#'
#' @examples
#' \dontrun{
#' library(dplyr)
#'
#' current_hosp_data() %>%
#'   current_asc_data(.data_sets = c("State","National"))
#' }
#'
#' @return
#' Gets the current ASC data from the current hospital data file.
#'
#' @name current_asc_data
NULL

#' @export
#' @rdname current_asc_data

current_asc_data <- function(.data, .data_sets) {

    # Variables
    ds <- .data_sets
    l <- .data

    # Checks
    if (!is.list(l)){
        rlang::abort(
            message = "'.data' must be a list object",
            use_cli_format = TRUE
        )
    }

    if (!inherits(l, "current_hosp_data")){
        rlang::abort(
            message = "'.data' must come from the function 'current_hosp_data()",
            use_cli_format = TRUE
        )
    }

    # Manipulations
    keep_names_vec <- c("ASC_Facility.csv","ASC_National.csv","ASC_State.csv")
    asc_list <- l[names(l) %in% keep_names_vec]

    # Make sure there are no 0 length items
    asc_list <- asc_list[lapply(asc_list, length) > 0]
    # Only keep the names we want
    ret <- asc_list[grep(
        paste(keep_names_vec, collapse = "|"),
        names(asc_list),
        ignore.case = TRUE
        )]

    # Return\
    attr(ret, ".list_type") <- "current_asc_list"
    class(ret) <- c("current_asc_list", class(ret))

    return(ret)

}

download_data <- function() {
    
    # Create a temporary file to store the zip file
    tmp <- tempfile()
    tmp_dir <- tempdir()
    
    # Download the zip file to the temporary location
    utils::download.file("https://data.cms.gov/provider-data/sites/default/files/archive/Hospitals/current/hospitals_current_data.zip",
                  destfile = tmp)
    
    # Unzip the file
    unzip(tmp, exdir = tempdir())
    
    # Read the csv files into tibbles
    csv_file_list <- list.files(
        path = tempdir(),
        pattern = "\\.csv$",
        full.names = TRUE
    )
    
    csv_file_tbl <- csv_file_list %>%
        purrr::map(.f = ~ stringr::str_replace_all(.x, "\\\\","/")) %>%
        purrr::map(.f = ~ read.csv(.x, check.names = FALSE)) %>%
        purrr::map(janitor::clean_names)

    file_names <- csv_file_list %>%
        purrr::map(.f = ~ stringr::str_replace_all(.x, "\\\\","/")) %>%
        stringr::str_remove(paste0(stringr::str_replace_all(tmp_dir, "\\\\", "/"),"/")) %>%
        stringr::str_replace_all("-","_")
    
    names(csv_file_tbl) <- file_names
    
    csv_file_tbl <- purrr::map(csv_file_tbl, dplyr::as_tibble)
    
    unlink(tmp_dir, recursive = TRUE)
    
    # Return the tibbles
    return(csv_file_tbl)
}

Example:

test <- download_current_hospital_data()

> test
$ASC_Facility.csv
# A tibble: 10,918 × 36
   facilit…¹ facil…²    npi city  state zip_c…³ year  asc_9…⁴ asc_9…⁵ asc_1…⁶ asc_1…⁷ asc_1…⁸
   <chr>     <chr>    <int> <chr> <chr>   <int> <chr> <chr>     <int> <chr>     <int> <chr>  
 1 THE SURG… 05C000… 1.00e9 VACA… CA      95687 2021  71.67        NA N/A           5 831    
 2 THE SURG… 05C000… 1.00e9 VACA… CA      95687 2022… N/A           5 N/A           5 N/A    
 3 CARLSBAD… 05C000… 1.00e9 CARL… CA      92011 2021  44.26        NA N/A           5 65

Download `Hospital_Data_Dictionary.pdf`

Allow user to input save_to_dir if desired else, default to NULL and let utils::choose.dir() be executed.

Function:

download_current_hospital_data_dictionary <- function() {
    
    # Create a temporary file to store the zip file
    file_path <- utils::choose.dir()
    destfile <- paste0(file_path, "\\Hospital_Data_Dictionary.pdf")
    
    # Download the zip file to the temporary location
    url <- "https://data.cms.gov/provider-data/sites/default/files/data_dictionaries/hospital/HOSPITAL_Data_Dictionary.pdf"
    utils::download.file(
        url = url,
        destfile = destfile,
        mode = "wb"
    )
    
    # Return the tibbles
    rlang::inform(
        message = paste0(
            "The Hospital Data Dictionary has been downloaded to: ", 
            file_path,
            "\\",
            basename(url)
            ),
        use_cli_format = TRUE
        )
}

Example:

> download_current_hospital_data_dictionary()
trying URL 'https://data.cms.gov/provider-data/sites/default/files/data_dictionaries/hospital/HOSPITAL_Data_Dictionary.pdf'
Content type 'application/pdf' length 1202138 bytes (1.1 MB)
downloaded 1.1 MB

The Hospital Data Dictionary has been downloaded to:
C:\Users\steve\Downloads\HOSPITAL_Data_Dictionary.pdf

# Make sure there are no 0 length items
asc_list <- asc_list[lapply(asc_list, length) > 0]

New:

# Make sure there are no 0 length items
asc_list <- asc_list[lengths(asc_list) > 0]

Update logo image name to logo.png

Provider File MetaData

Sample code

> url_meta_data <- paste0('https://data.cms.gov/',
+                     'provider-data/api/1/metastore/',
+                     'schemas/dataset/items')
> 
> response <- httr2::request(url_meta_data) |>
+   httr2::req_perform() |>
+   httr2::resp_body_json(check_type = FALSE,
+                         simplifyVector = TRUE) |>
+   dplyr::select(-bureauCode, -`@type`, -programCode) |>
+   dplyr::mutate(contact_fn = contactPoint$fn) |>
+   dplyr::mutate(contact_email = contactPoint$hasEmail |>
+                   stringr::str_remove("mailto:")) |>
+   dplyr::mutate(publisher_name = publisher$name) |>
+   dplyr::mutate(download_url = distribution[[1]]$downloadURL) |>
+   dplyr::mutate(media_type = distribution[[3]]$mediaType) |>
+   dplyr::select(-contactPoint, -publisher, -distribution) |>
+   dplyr::mutate(dplyr::across(c(issued, modified, released), as.Date)) |>
+   dplyr::tibble() |>
+   janitor::clean_names()
> 
> dplyr::glimpse(response)
Rows: 228
Columns: 16
$ access_level    <chr> "public", "public", "public", "public", "public", "public", "public", "…
$ landing_page    <chr> "https://data.cms.gov/provider-data/dataset/46dc-a66c", "https://data.c…
$ issued          <date> 2021-06-04, 2023-08-17, 2023-08-17, 2022-07-11, 2022-07-11, 2022-07-11…
$ modified        <date> 2023-02-16, 2023-08-21, 2023-08-21, 2022-07-11, 2022-07-11, 2022-07-11…
$ released        <date> 2023-03-09, 2023-08-31, 2023-08-31, 2023-09-28, 2023-09-28, 2023-09-28…
$ keyword         <list> "Clinicians", <"Performance", "Quality", "Measures", "MIPS", "Quality …
$ identifier      <chr> "46dc-a66c", "0ba7-2cb0", "8c70-d353", "3614-1eef", "bdd5-4a04", "4269-…
$ description     <chr> "This file contains performance information for Merit-Based Incentive P…
$ title           <chr> "PY 2021 Virtual Group Public Reporting: MIPS Measures and Attestations…
$ theme           <list> "Doctors and clinicians", "Doctors and clinicians", "Doctors and clini…
$ archive_exclude <lgl> FALSE, FALSE, FALSE, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ contact_fn      <chr> "CMS", "CMS", "CMS", "PPL Dataset", "PPL Dataset", "PPL Dataset", "PPL …
$ contact_email   <chr> "[email protected]", "[email protected]", "[email protected]", "PPL_Dataset@c…
$ publisher_name  <chr> "Centers for Medicare &amp; Medicaid Services (CMS)", "Centers for Medi…
$ download_url    <chr> "https://data.cms.gov/provider-data/sites/default/files/resources/635d1…
$ media_type      <chr> "text/csv", "text/csv", "text/csv", "text/csv", "text/csv", "text/csv",…

New Function `current_imaging_efficiency_data()`

Make a new function current_imaging_efficiency_data()

try to re-work `current_hosp_data()`

try to drop use of tempfile() and tempdir()

breakup url for file to make save loc

use utils::choose.dir()

Get Data Distributions

Sample Code:

data_distribtution <- function(.data){
  df <- .data
  
  dplyr::tibble(data = df[["dataset"]][["distribution"]]) |>
    dplyr::mutate(cnm = purrr::map(data, \(x) names(x))) |>
    dplyr::mutate(title_in_col = purrr::map(cnm, \(x) "title" %in% x) |> unlist()) |>
    dplyr::mutate(accessURL_in_col = purrr::map(cnm, \(x) "accessURL" %in% x) |> unlist()) |>
    dplyr::filter(title_in_col == TRUE & accessURL_in_col == TRUE) |>
    dplyr::select(data) |>
    dplyr::mutate(data = purrr::map(
      data, \(x) x |>
        dplyr::select(title, modified, temporal, accessURL, 
                      downloadURL, mediaType) |>
        dplyr::mutate(data_link = dplyr::coalesce(accessURL, downloadURL)) |>
        dplyr::mutate(mediaType = ifelse(is.na(mediaType), "api", mediaType)) |>
        dplyr::mutate(modified = as.Date(modified)) |>
        dplyr::select(-accessURL, -downloadURL) |>
        dplyr::arrange(title, dplyr::desc(modified), mediaType) |>
        tidyr::separate(temporal, into = c("start", "end"), sep = "/",
                        remove = FALSE) |>
        dplyr::mutate(dplyr::across(c(start, end), as.Date)) |>
        dplyr::mutate(data_year = format(start, "%Y") |> as.numeric()) |>
        dplyr::mutate(title = stringr::str_remove_all(title, "[:|-]")) |>
        dplyr::mutate(title = stringr::str_remove_all(title, "[:number:]")) |>
        dplyr::mutate(title = stringr::str_squish(title))
    )) |>
    tidyr::unnest(cols = c(data)) |>
    janitor::clean_names()
}

dataset_distributions <- data_distribtution(data_sets)

Output:

> dataset_distributions |>
+   head(4) |>
+   dplyr::glimpse()
Rows: 4
Columns: 8
$ title      <chr> "Accountable Care Organization Participants", "Accountable Care Organization…
$ modified   <date> 2022-01-27, 2022-01-27, 2022-01-27, 2022-01-27
$ temporal   <chr> "2014-01-01/2014-12-31", "2014-01-01/2014-12-31", "2015-01-01/2015-12-31", …
$ start      <date> 2014-01-01, 2014-01-01, 2015-01-01, 2015-01-01
$ end        <date> 2014-12-31, 2014-12-31, 2015-12-31, 2015-12-31
$ media_type <chr> "api", "application/vnd.ms-excel", "api", "application/vnd.ms-excel"
$ data_link  <chr> "https://data.cms.gov/data-api/v1/dataset/5ebc6246-1861-4d9f-92b4-33c69b315d…
$ data_year  <dbl> 2014, 2014, 2015, 2015

Google analytics

Setup google analytics

https://stackoverflow.com/questions/41376989/how-to-include-google-analytics-in-an-rmarkdown-generated-github-page

`current_hosp_data()` update to factory function

use function factory

parse_csv_file <- function(file) {
  gsub("\\\\", "/", file |>
  read.csv(check.names = FALSE) |>
  janitor::clean_names()
}

csv_file_tbl <- lapply(csv_file_list, parse_csv_file)

get_csv_names <- function(file) {
  normalizePath(file, "/") |>
  gsub(pattern = path_remove, replacement = "") |>
  gsub(pattern = "-", replacement = "_")
}

file_names <- lapply(csv_file_list, get_csv_names)

Old

csv_file_list |>
        purrr::map(
            \(file) normalizePath(file, "/") |>
                gsub(pattern = path_remove, replacement = "") |>
                gsub(pattern = "-", replacement = "_")
        )

Add function `current_va_data()`

Add function current_va_data()

onAttach message

make a message for library() .onAttach()

Similar to bayesmodels

.onAttach <- function(libname, pkgname) {

bsu_rule_color <- "#2c3e50"
bsu_main_color <- "#1f78b4"

# Check Theme: If Dark, Update Colors
tryCatch({
    if (rstudioapi::isAvailable()) {
        theme <- rstudioapi::getThemeInfo()
        if (is.null(theme)) {
            bsu_rule_color <- "#2c3e50"
            bsu_main_color <- "#1f78b4"
        }
        if (theme$dark) {
            bsu_rule_color <- "#7FD2FF"
            bsu_main_color <- "#18bc9c"
        }
    }
}, error = function(e) {
    bsu_rule_color <- "#2c3e50"
    bsu_main_color <- "#1f78b4"
}, finally = {
    bsu_main <- crayon::make_style(bsu_main_color)
    
    msg <- paste0(
        cli::rule(left = "Welcome to bayesmodels", col = bsu_rule_color, line = 2),
        bsu_main('\nIf you are interested in time series, maybe you would like to check my other packages: garchmodels and boostime\n'),
        bsu_main('</> If you find this package useful, please leave a star: https://github.com/AlbertoAlmuinha/bayesmodels </>')
    )
    
    packageStartupMessage(msg)
})

}

Code review `current_hosp_data()` function

@rjake Here is the current function:

current_hosp_data <- function() {

    # URL for file
    url <- "https://data.cms.gov/provider-data/sites/default/files/archive/Hospitals/current/hospitals_current_data.zip"
    f_name <- basename(url)

    # Create a temporary file to store the zip file
    tmp <- tempfile()
    tmp_dir <- tempdir()

    # Download the zip file to the temporary location
    utils::download.file(
        url = url,
        destfile = tmp
    )

    # Unzip the file
    utils::unzip(tmp, exdir = tmp_dir)

    # Read the csv files into a list
    csv_file_list <- list.files(
        path = tmp_dir,
        pattern = "\\.csv$",
        full.names = TRUE
    )

    # Process CSV Files
    parse_csv_file <- function(file) {
        # Normalize the path to use C:/path/to/file structure
        normalizePath(file, "/") |>
            # read in the csv file and use check.names = FALSE because some of
            # the names are very long
            utils::read.csv(check.names = FALSE) |>
            # clean the field names
            janitor::clean_names()
        }

    csv_file_tbl <- lapply(csv_file_list, parse_csv_file)

    # Get File Names
    # Get the tmp_dir in normal form C:/path/to/file
    path_remove <- paste0(normalizePath(tmp_dir, "/"),"/")

    # Get csv names function
    get_csv_names <- function(file) {
        # Process the path to normal form
        normalizePath(file, "/") |>
            # remove the tmp_dir from the full file string
            gsub(pattern = path_remove, replacement = "") |>
            # change all - to _
            gsub(pattern = "-", replacement = "_")
        }
    # Get the names
    file_names <- lapply(csv_file_list, get_csv_names)

    # apply the names
    names(csv_file_tbl) <- file_names

    # Make the result a tibble for each file.
    csv_file_tbl <- lapply(csv_file_tbl, dplyr::as_tibble)

    unlink(tmp, recursive = TRUE)

    # Return the tibbles
    # Add and attribute and a class type to the object
    attr(csv_file_tbl, ".list_type") <- "current_hosp_data"
    class(csv_file_tbl) <- c("current_hosp_data", class(csv_file_tbl))

    return(csv_file_tbl)
}

spsanderson / healthyr.data Goto Github PK

healthyr.data's Introduction

Hi there, I'm Steve I spend a lot of time in R and here is my github resume

Package Stats

GitHub Stats

Extended Stats

Most Used Languages

healthyr.data's People

Contributors

Stargazers

Watchers

Forkers

healthyr.data's Issues

Recommend Projects

Recommend Topics

Recommend Org