Coder Social home page Coder Social logo

spsanderson / healthyr.data Goto Github PK

View Code? Open in Web Editor NEW
6.0 3.0 3.0 20.61 MB

Data sets for the healthyR package.

Home Page: https://www.spsanderson.com/healthyR.data/

License: Other

R 100.00%
data data-sets data-science r rstats healthcare healthcare-application healthcare-datasets healthcare-analysis

healthyr.data's Introduction

Hi there, I'm Steve I spend a lot of time in R and here is my github resume

I use these:

I authored and maintain these:

I have co-authored these:

about_me <- list(
  name            = "Sanderson"
  , first_name    = "Steven"
  , occupation    = "Manager of Applications"
  , employer      = "Stony Brook Medicine"
  , fav_prog_lang = "R"
  , languages     = list("R", "SQL", "Python (working on it)")
  , my_r_packages = list("healthyverse", "healthyR", "healthyR.data", 
                    "healthyR.ts", "healthyR.ai", "TidyDensity",
                    "tidyAML")
  , organizations = list("https://github.com/Koffi-Fredysessie")
  , orcid         = list("https://orcid.org/my-orcid?orcid=0009-0006-7661-8247")
  , org_pkgs      = list("BRVM")
  , website       = "https://www.spsanderson.com/"
  , blog          = "https://www.spsanderson.com/steveondata/"
  , mastadon      = "https://mstdn.social/@stevensanderson"
)

🔗  I am also on:

spsanderson

Mastodon

I am currently working on the R packages mentioned above and I use my package-downloads repository to keep track of the stats of those packages. I'm looking to collaborate on all or any of them and pull requests are welcome! I can be reached at spsanderson at gmail.com or via my LinkedIn page. Here is a link to my personal site: Steve's Homepage

I have also written a book with my co-author David Kun entitled: "Extending Excel with Python and R"

Package Stats

{healthyR}

CRAN_Status_Badge healthyR status badge Lifecycle: experimental PRs Welcome

{healthyR.ts}

CRAN_Status_Badge healthyR.ts status badge Lifecycle: experimental PRs Welcome

{healthyR.ai}

CRAN_Status_Badge healthyR.ai status badge Lifecycle: experimental PRs Welcome

{healthyR.data}

CRAN_Status_Badge healthyR.data status badge Lifecycle: stable PRs Welcome

{healthyverse}

CRAN_Status_Badge healthyverse status badge Lifecycle: stable PRs Welcome

{TidyDensity}

CRAN_Status_Badge TidyDensity status badge Lifecycle: stable PRs Welcome

{tidyAML}

CRAN_Status_Badge tidyAML status badge Lifecycle: experimental PRs Welcome

{The 'spsanderson' universe} Click Here!

:name status badge :registry status badge :total status badge

GitHub Stats

Steven P. Sanderson II, MPH

Extended Stats

Metrics

Most Used Languages

Your Repository's Stats

healthyr.data's People

Contributors

olivroy avatar rjake avatar spsanderson avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

healthyr.data's Issues

Release healthyR.data 1.0.1

Prepare for release:

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

Add function `current_comp_death_data()`

Function:

#' Get Current Complications and Death Data.
#'
#' @family Hospital Data
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @seealso \url{https://data.cms.gov/provider-data/topics/hospitals/}
#'
#' @description Get the current complications and death data.
#'
#' @details This function will obtain the current Complications and Death data
#' from the output of the [healthyR.data::current_hosp_data()] function, that is
#' the required input for the `.data` parameter. You can pass in a list of which
#' of those data sets you would like,
#'
#' @param .data The data that results from the `current_hosp_data()` function.
#' @param .data_sets The default is: c("Facility","State","National"), which
#' will bring back all of the data sets that are in the Ambulatory Surgery Center
#' OAS CAHPS data sets. You can choose from the following:
#' *  Facility
#' *  National
#' *  State
#'
#' You can also pass things like c("state","Nation") as behind the scenes only
#' the Complicaitons and Death data sets are available to the function to choose
#' from and `grep` is used to find matches with `ignore.case = TRUE` set.
#'
#' @examples
#' \dontrun{
#' library(dplyr)
#'
#' current_hosp_data() %>%
#'   current_comp_death_data(.data_sets = c("State","National"))
#' }
#'
#' @return
#' Gets the current Complications and Death data from the current hospital data file.
#'
#' @name current_comp_death_data
NULL

#' @export
#' @rdname current_comp_death_data

current_comp_death_data <- function(.data,
                                    .data_sets = c("Facility","State","National")) {
  
  # Variables
  ds <- .data_sets
  l <- .data
  
  # Checks
  if (!inherits(l, "current_hosp_data")){
    rlang::abort(
      message = "'.data' must come from the function 'current_hosp_data()",
      use_cli_format = TRUE
    )
  }
  
  # Manipulations
  # Get the exact files necessary to the ASC
  file_names_vec <- c("Complications_and_Deaths_Hospital.csv",
                      "Complications_and_Deaths_National.csv",
                      "Complications_and_Deaths_State.csv")
  
  asc_list <- l[names(l) %in% file_names_vec]
  names(asc_list)[1] <-  "Complications_and_Deaths_Facility.csv"
  
  # Make sure there are no 0 length items
  asc_list <- asc_list[lengths(asc_list) > 0]
  
  # Only keep the names we want
  ret <- asc_list[grep(
    paste(ds, collapse = "|"),
    names(asc_list),
    ignore.case = TRUE
  )]
  
  # Return\
  attr(ret, ".list_type") <- "current_comp_death_list"
  class(ret) <- c("current_comp_death_list", class(ret))
  
  return(ret)
  
}

Get CMS Meta Data

Sample Code:

url <- "https://data.cms.gov/data.json"

data_sets <- httr2::request(url) |>
  httr2::req_perform() |>
  httr2::resp_body_json(check_type = FALSE, simplifyVector = TRUE)

data_tbl <- data_sets$dataset |>
  dplyr::tibble() |>
  dplyr::select(
    title, description, landingPage,
    modified, keyword, description,
    describedBy, contactPoint, identifier,
    temporal, references, distribution
  ) |>
  tidyr::unnest(cols = distribution, names_sep = "_") |>
  tidyr::unnest(cols = c(keyword, contactPoint, references)) |>
  janitor::clean_names() |>
  dplyr::select(-type, -distribution_type) |>
  dplyr::mutate(media_type = ifelse(is.na(distribution_format),
    distribution_media_type,
    distribution_format
  )) |>
  dplyr::mutate(data_link = ifelse(is.na(distribution_access_url),
    distribution_download_url,
    distribution_access_url
  )) |>
  dplyr::mutate(has_email = stringr::str_remove(has_email, "mailto:")) |>
  tidyr::separate(temporal,
    into = c("start", "end"), sep = "/",
    remove = TRUE
  ) |>
  tidyr::separate(distribution_temporal,
    into = c("distribution_start", "distribution_end"), sep = "/",
    remove = TRUE
  ) |>
  dplyr::mutate(dplyr::across(c(
    start, end, modified,
    distribution_modified, distribution_start,
    distribution_end
  ), as.Date)) |>
  dplyr::mutate(distribution_description = ifelse(is.na(distribution_description),
    "old", distribution_description
  )) |>
  dplyr::mutate(distribution_title = stringr::str_remove_all(distribution_title, "[:|-]")) |>
  dplyr::mutate(distribution_title = stringr::str_remove_all(distribution_title, "[:number:]")) |>
  dplyr::select(
    -distribution_format, -distribution_media_type,
    -distribution_access_url, -distribution_download_url
  ) |>
  dplyr::mutate(dplyr::across(dplyr::where(is.character), stringr::str_squish))

Output:

> data_tbl |>
+   head(1) |>
+   dplyr::glimpse()
Rows: 1
Columns: 19
$ title                    <chr> "Accountable Care Organization Participants"
$ description              <chr> "The Accountable Care Organization Participants data presents …
$ landing_page             <chr> "https://data.cms.gov/medicare-shared-savings-program/accounta$ modified                 <date> 2024-01-29
$ keyword                  <chr> "Medicare"
$ described_by             <chr> "https://data.cms.gov/resources/accountable-care-organization-…
$ fn                       <chr> "Shared Savings Program - CM"
$ has_email                <chr> "SharedSavingsProgram@cms.hhs.gov"
$ identifier               <chr> "https://data.cms.gov/data-api/v1/dataset/9767cb68-8ea9-4f0b-$ start                    <date> 2014-01-01
$ end                      <date> 2024-12-31
$ references               <chr> "https://data.cms.gov/resources/acos-aco-participants-and-snf-…
$ distribution_description <chr> "latest"
$ distribution_title       <chr> "Accountable Care Organization Participants"
$ distribution_modified    <date> 2024-01-29
$ distribution_start       <date> 2024-01-01
$ distribution_end         <date> 2024-12-31
$ media_type               <chr> "API"
$ data_link                <chr> "https://data.cms.gov/data-api/v1/dataset/9767cb68-8ea9-4f0b-8

Release healthyR.data 1.0.3

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • git push
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • git push

Get Latest Data

Sample Code:

> data_tbl |> dplyr::filter(distribution_description == "latest") |> dplyr::glimpse()
Rows: 612
Columns: 19
$ title                    <chr> "Accountable Care Organization Participants", "Accountable Car…
$ description              <chr> "The Accountable Care Organization Participants data presents$ landing_page             <chr> "https://data.cms.gov/medicare-shared-savings-program/accounta…
$ modified                 <date> 2024-01-29, 2024-01-29, 2024-01-29, 2024-01-29, 2024-01-29, 2…
$ keyword                  <chr> "Medicare", "Value-Based Care", "Coordinated Care", "Payment M$ described_by             <chr> "https://data.cms.gov/resources/accountable-care-organization-…
$ fn                       <chr> "Shared Savings Program - CM", "Shared Savings Program - CM", …
$ has_email                <chr> "SharedSavingsProgram@cms.hhs.gov", "SharedSavingsProgram@cms.$ identifier               <chr> "https://data.cms.gov/data-api/v1/dataset/9767cb68-8ea9-4f0b-8…
$ start                    <date> 2014-01-01, 2014-01-01, 2014-01-01, 2014-01-01, 2014-01-01, 2…
$ end                      <date> 2024-12-31, 2024-12-31, 2024-12-31, 2024-12-31, 2024-12-31, 2…
$ references               <chr> "https://data.cms.gov/resources/acos-aco-participants-and-snf-$ distribution_description <chr> "latest", "latest", "latest", "latest", "latest", "latest", "l…
$ distribution_title       <chr> "Accountable Care Organization Participants", "Accountable Car$ distribution_modified    <date> 2024-01-29, 2024-01-29, 2024-01-29, 2024-01-29, 2024-01-29, 2$ distribution_start       <date> 2024-01-01, 2024-01-01, 2024-01-01, 2024-01-01, 2024-01-01, 2$ distribution_end         <date> 2024-12-31, 2024-12-31, 2024-12-31, 2024-12-31, 2024-12-31, 2$ media_type               <chr> "API", "API", "API", "API", "API", "API", "API", "API", "API",…
$ data_link                <chr> "https://data.cms.gov/data-api/v1/dataset/9767cb68-8ea9-4f0b-8…

Get Older Data

Sample Code:

> data_tbl |> dplyr::filter(distribution_description != "latest") |> dplyr::glimpse()
Rows: 13,709
Columns: 19
$ title                    <chr> "Accountable Care Organization Participants", "Accountable Car…
$ description              <chr> "The Accountable Care Organization Participants data presents$ landing_page             <chr> "https://data.cms.gov/medicare-shared-savings-program/accounta…
$ modified                 <date> 2024-01-29, 2024-01-29, 2024-01-29, 2024-01-29, 2024-01-29, 2…
$ keyword                  <chr> "Medicare", "Value-Based Care", "Coordinated Care", "Payment M$ described_by             <chr> "https://data.cms.gov/resources/accountable-care-organization-…
$ fn                       <chr> "Shared Savings Program - CM", "Shared Savings Program - CM", …
$ has_email                <chr> "SharedSavingsProgram@cms.hhs.gov", "SharedSavingsProgram@cms.$ identifier               <chr> "https://data.cms.gov/data-api/v1/dataset/9767cb68-8ea9-4f0b-8…
$ start                    <date> 2014-01-01, 2014-01-01, 2014-01-01, 2014-01-01, 2014-01-01, 2…
$ end                      <date> 2024-12-31, 2024-12-31, 2024-12-31, 2024-12-31, 2024-12-31, 2…
$ references               <chr> "https://data.cms.gov/resources/acos-aco-participants-and-snf-$ distribution_description <chr> "old", "old", "old", "old", "old", "old", "old", "old", "old",…
$ distribution_title       <chr> "Accountable Care Organization Participants", "Accountable Car…
$ distribution_modified    <date> 2024-01-29, 2024-01-29, 2024-01-29, 2024-01-29, 2024-01-29, 2…
$ distribution_start       <date> 2024-01-01, 2024-01-01, 2024-01-01, 2024-01-01, 2024-01-01, 2…
$ distribution_end         <date> 2024-12-31, 2024-12-31, 2024-12-31, 2024-12-31, 2024-12-31, 2…
$ media_type               <chr> "text/csv", "text/csv", "text/csv", "text/csv", "text/csv", "A$ data_link                <chr> "https://data.cms.gov/sites/default/files/2024-01/afc09855-5e4…

Release healthyR.data 1.0.0

Prepare for release:

  • Check that description is informative
  • Check licensing of included files
  • devtools::build_readme()
  • usethis::use_cran_comments()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • Update cran-comments.md
  • Review pkgdown reference index for, e.g., missing topics
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('major')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Update install instructions in README
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Add function `current_asc_oas_cahps_data()`

Function:

#' Get Current Ambulatory Surgery Center Quality Reporting Data.
#'
#' @family Hospital Data
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @seealso \url{https://data.cms.gov/provider-data/topics/hospitals/}
#'
#' @description Get the current Ambulatory Surgery Center data.
#'
#' @details This function will obtain the current Ambulatory Surgery Center data
#' from the output of the [healthyR.data::current_hosp_data()] function, that is
#' the required input for the `.data` parameter. You can pass in a list of which
#' of those data sets you would like,
#'
#' @param .data The data that results from the `current_hosp_data()` function.
#' @param .data_sets The default is: c("Facility","State","National"), which
#' will bring back all of the data sets that are in the Ambulatory Surgery Center
#' OAS CAHPS data sets. You can choose from the following:
#' *  Facility
#' *  National
#' *  State
#'
#' You can also pass things like c("state","Nation") as behind the scenes only
#' the Ambulatory Surgery Center data sets are available to the function to choose
#' from and `grep` is used to find matches with `ignore.case = TRUE` set.
#'
#' @examples
#' \dontrun{
#' library(dplyr)
#'
#' current_hosp_data() %>%
#'   current_asc_oas_cahps_data(.data_sets = c("State","National"))
#' }
#'
#' @return
#' Gets the current ASC data from the current hospital data file.
#'
#' @name current_asc_oas_cahps_data
NULL

#' @export
#' @rdname current_asc_oas_cahps_data

current_asc_oas_cahps_data <- function(.data,
                                       .data_sets = c("Facility","State","National")) {

    # Variables
    ds <- .data_sets
    l <- .data

    # Checks
    if (!inherits(l, "current_hosp_data")){
        rlang::abort(
            message = "'.data' must come from the function 'current_hosp_data()",
            use_cli_format = TRUE
        )
    }

    # Manipulations
    # Get the exact files necessary to the ASC
    file_names_vec <- c("ASCQR_OAS_CAHPS_BY_ASC.csv",
                        "ASCQR_OAS_CAHPS_NATIONAL.csv",
                        "ASCQR_OAS_CAHPS_STATE.csv")

    asc_list <- l[names(l) %in% file_names_vec]
    names(asc_list)[1] <-  "ASCQR_OAS_CAHPS_FACILITY.csv"

    # Make sure there are no 0 length items
    asc_list <- asc_list[lengths(asc_list) > 0]

    # Only keep the names we want
    ret <- asc_list[grep(
        paste(ds, collapse = "|"),
        names(asc_list),
        ignore.case = TRUE
    )]

    # Return\
    attr(ret, ".list_type") <- "current_asc_oascahps_list"
    class(ret) <- c("current_asc_oascahps_list", class(ret))

    return(ret)

}

Release healthyR.data 1.0.2

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • git push
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • git push

Add parameter of `.open_folder` as a boolean for `current_hosp_data_dict()`

Function:

current_hosp_data_dict <- function(.open_folder = FALSE) {
  
  # Create a temporary file to store the zip file
  f_path <- utils::choose.dir()
  destfile <- paste0(f_path, "\\Hospital_Data_Dictionary.pdf")
  
  # Download the zip file to the temporary location
  url <- "https://data.cms.gov/provider-data/sites/default/files/data_dictionaries/hospital/HOSPITAL_Data_Dictionary.pdf"
  utils::download.file(
    url = url,
    destfile = destfile,
    mode = "wb"
  )
  
  # Return Message
  rlang::inform(
    message = paste0(
      "The Hospital Data Dictionary has been downloaded to: ",
      f_path,
      "\\",
      basename(url)
    ),
    use_cli_format = TRUE
  )
  
  # Open file folder?
  if (.open_folder){
    shell.exec(f_path)
  }
  
}

Example:

> current_hosp_data_dict(TRUE)
trying URL 'https://data.cms.gov/provider-data/sites/default/files/data_dictionaries/hospital/HOSPITAL_Data_Dictionary.pdf'
Content type 'application/pdf' length 1202138 bytes (1.1 MB)
downloaded 1.1 MB

The Hospital Data Dictionary has been downloaded to:
C:\Users\sandes05\Downloads\HOSPITAL_Data_Dictionary.pdf

image

Program Emails Tibble

Sample code:

> data_tbl |> dplyr::select(title, has_email) |> dplyr::distinct() |> dplyr::glimpse()
Rows: 132
Columns: 2
$ title     <chr> "Accountable Care Organization Participants", "Accountable Care Organization …
$ has_email <chr> "SharedSavingsProgram@cms.hhs.gov", "SharedSavingsProgram@cms.hhs.gov", "ACOR

Get `tempdir()` does not exist error

instead of using 'tempdir()` use utils::choose.dir() intead.

New function:

#' Download Current Hospital Data Files.
#'
#' @family Hospital Data
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @seealso \url{https://data.cms.gov/provider-data/topics/hospitals/}
#'
#' @description Download the current Hospital Data Sets.
#'
#' @details This function will download the current and the official hospital
#' data sets from the __CMS.gov__ website.
#'
#' The function makes use of a temporary directory and file to save and unzip
#' the data. This will grab the current Hospital Data Files, unzip them and
#' return a list of tibbles with each tibble named after the data file.
#'
#' The function returns a list object with all of the current hospital data as a
#' tibble. It does not save the data anywhere so if you want to save it you will
#' have to do that manually.
#'
#' This also means that you would have to store the data as a variable in order
#' to access the data later on. It does have a given attributes and a class so
#' that it can be piped into other functions.
#'
#' @examples
#' \dontrun{
#'   current_hosp_data()
#' }
#'
#' @return
#' Downloads the current hospital data sets.
#'
#' @name current_hosp_data
NULL

#' @export
#' @rdname current_hosp_data

current_hosp_data <- function() {
  
  # URL for file
  url <- "https://data.cms.gov/provider-data/sites/default/files/archive/Hospitals/current/hospitals_current_data.zip"
  
  # Create a temporary directory to process the zip file
  tmp_dir <- utils::choose.dir()#tempdir()
  download_location <- file.path(tmp_dir, "download.zip")
  extract_location <- file.path(tmp_dir, "extract")
  
  # Download the zip file to the temporary location
  utils::download.file(
    url = url,
    destfile = download_location
  )
  
  # Unzip the file
  utils::unzip(download_location, exdir = extract_location)
  
  # Read the csv files into a list
  csv_file_list <- list.files(
    path = extract_location,
    pattern = "\\.csv$",
    full.names = TRUE
  )
  
  # make named list
  csv_names <- stats::setNames(
      object = csv_file_list,
      nm =
        csv_file_list |>
        basename() |>
        gsub(pattern = "\\.csv$", replacement = "") |>
        janitor::make_clean_names()
    )
  
  # Process CSV Files
  parse_csv_file <- function(file) {
    # Normalize the path to use C:/path/to/file structure
    normalizePath(file, "/") |>
      # read in the csv file and use check.names = FALSE because some of
      # the names are very long
      utils::read.csv(check.names = FALSE) |>
      dplyr::as_tibble() |>
      # clean the field names
      janitor::clean_names()
  }
  
  list_of_tables <- lapply(csv_names, parse_csv_file)
  
  unlink(tmp_dir, recursive = TRUE)
  
  # Return the tibbles)
  # Add and attribute and a class type to the object
  attr(list_of_tables, ".list_type") <- "current_hosp_data"
  class(list_of_tables) <- c("current_hosp_data", class(list_of_tables))
  
  list_of_tables
}

@rjake would you have the bandwidth to confirm this one?

Make function `current_asc_data()`

Make a function that will take in data from the current_hosp_data() function and will only return the Ambulatory Surgery Center data with a parameter that will allow the user to filter which data sets inside of that list they want to use.

Function:

#' Get Current Ambulatory Surgery Center Data.
#'
#' @family Hospital Data
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @seealso \url{https://data.cms.gov/provider-data/topics/hospitals/}
#'
#' @description Get the current Ambulatory Surgery Center data.
#'
#' @details This function will obtain the current Ambulatory Surgery Center data
#' from the output of the [healthyR.data::current_hosp_data()] function, that is
#' the required input for the `.data` parameter. You can pass in a list of which
#' of those data sets you would like, if you return NULL then all of the data sets
#' will be returned in a list object.
#'
#' @param .data The data that results from the `current_hosp_data()` function.
#' @param .data_sets The default is NULL, which will bring back all of the data
#' sets that are in the file. You can choose from the following:
#' *  Facility
#' *  National
#' *  State
#'
#' @examples
#' \dontrun{
#' library(dplyr)
#'
#' current_hosp_data() %>%
#'   current_asc_data(.data_sets = c("State","National"))
#' }
#'
#' @return
#' Gets the current ASC data from the current hospital data file.
#'
#' @name current_asc_data
NULL

#' @export
#' @rdname current_asc_data

current_asc_data <- function(.data, .data_sets) {

    # Variables
    ds <- .data_sets
    l <- .data

    # Checks
    if (!is.list(l)){
        rlang::abort(
            message = "'.data' must be a list object",
            use_cli_format = TRUE
        )
    }

    if (!inherits(l, "current_hosp_data")){
        rlang::abort(
            message = "'.data' must come from the function 'current_hosp_data()",
            use_cli_format = TRUE
        )
    }

    # Manipulations
    keep_names_vec <- c("ASC_Facility.csv","ASC_National.csv","ASC_State.csv")
    asc_list <- l[names(l) %in% keep_names_vec]

    # Make sure there are no 0 length items
    asc_list <- asc_list[lapply(asc_list, length) > 0]
    # Only keep the names we want
    ret <- asc_list[grep(
        paste(keep_names_vec, collapse = "|"),
        names(asc_list),
        ignore.case = TRUE
        )]

    # Return\
    attr(ret, ".list_type") <- "current_asc_list"
    class(ret) <- c("current_asc_list", class(ret))

    return(ret)

}

Download data for `https://data.cms.gov/provider-data/topics/hospitals`

Function:

download_data <- function() {
    
    # Create a temporary file to store the zip file
    tmp <- tempfile()
    tmp_dir <- tempdir()
    
    # Download the zip file to the temporary location
    utils::download.file("https://data.cms.gov/provider-data/sites/default/files/archive/Hospitals/current/hospitals_current_data.zip",
                  destfile = tmp)
    
    # Unzip the file
    unzip(tmp, exdir = tempdir())
    
    # Read the csv files into tibbles
    csv_file_list <- list.files(
        path = tempdir(),
        pattern = "\\.csv$",
        full.names = TRUE
    )
    
    csv_file_tbl <- csv_file_list %>%
        purrr::map(.f = ~ stringr::str_replace_all(.x, "\\\\","/")) %>%
        purrr::map(.f = ~ read.csv(.x, check.names = FALSE)) %>%
        purrr::map(janitor::clean_names)

    file_names <- csv_file_list %>%
        purrr::map(.f = ~ stringr::str_replace_all(.x, "\\\\","/")) %>%
        stringr::str_remove(paste0(stringr::str_replace_all(tmp_dir, "\\\\", "/"),"/")) %>%
        stringr::str_replace_all("-","_")
    
    names(csv_file_tbl) <- file_names
    
    csv_file_tbl <- purrr::map(csv_file_tbl, dplyr::as_tibble)
    
    unlink(tmp_dir, recursive = TRUE)
    
    # Return the tibbles
    return(csv_file_tbl)
}

Example:

test <- download_current_hospital_data()

> test
$ASC_Facility.csv
# A tibble: 10,918 × 36
   facilit…¹ facil…²    npi city  state zip_c…³ year  asc_9…⁴ asc_9…⁵ asc_1…⁶ asc_1…⁷ asc_1…⁸
   <chr>     <chr>    <int> <chr> <chr>   <int> <chr> <chr>     <int> <chr>     <int> <chr>  
 1 THE SURG05C0001.00e9 VACACA      95687 2021  71.67        NA N/A           5 831    
 2 THE SURG05C0001.00e9 VACACA      95687 2022N/A           5 N/A           5 N/A    
 3 CARLSBAD05C0001.00e9 CARLCA      92011 2021  44.26        NA N/A           5 65 

Download `Hospital_Data_Dictionary.pdf`

Allow user to input save_to_dir if desired else, default to NULL and let utils::choose.dir() be executed.

Function:

download_current_hospital_data_dictionary <- function() {
    
    # Create a temporary file to store the zip file
    file_path <- utils::choose.dir()
    destfile <- paste0(file_path, "\\Hospital_Data_Dictionary.pdf")
    
    # Download the zip file to the temporary location
    url <- "https://data.cms.gov/provider-data/sites/default/files/data_dictionaries/hospital/HOSPITAL_Data_Dictionary.pdf"
    utils::download.file(
        url = url,
        destfile = destfile,
        mode = "wb"
    )
    
    # Return the tibbles
    rlang::inform(
        message = paste0(
            "The Hospital Data Dictionary has been downloaded to: ", 
            file_path,
            "\\",
            basename(url)
            ),
        use_cli_format = TRUE
        )
}

Example:

> download_current_hospital_data_dictionary()
trying URL 'https://data.cms.gov/provider-data/sites/default/files/data_dictionaries/hospital/HOSPITAL_Data_Dictionary.pdf'
Content type 'application/pdf' length 1202138 bytes (1.1 MB)
downloaded 1.1 MB

The Hospital Data Dictionary has been downloaded to:
C:\Users\steve\Downloads\HOSPITAL_Data_Dictionary.pdf

Update length check of `current_asc_data()`

Currently:

# Make sure there are no 0 length items
asc_list <- asc_list[lapply(asc_list, length) > 0]

New:

# Make sure there are no 0 length items
asc_list <- asc_list[lengths(asc_list) > 0]

Provider File MetaData

Sample code

> url_meta_data <- paste0('https://data.cms.gov/',
+                     'provider-data/api/1/metastore/',
+                     'schemas/dataset/items')
> 
> response <- httr2::request(url_meta_data) |>
+   httr2::req_perform() |>
+   httr2::resp_body_json(check_type = FALSE,
+                         simplifyVector = TRUE) |>
+   dplyr::select(-bureauCode, -`@type`, -programCode) |>
+   dplyr::mutate(contact_fn = contactPoint$fn) |>
+   dplyr::mutate(contact_email = contactPoint$hasEmail |>
+                   stringr::str_remove("mailto:")) |>
+   dplyr::mutate(publisher_name = publisher$name) |>
+   dplyr::mutate(download_url = distribution[[1]]$downloadURL) |>
+   dplyr::mutate(media_type = distribution[[3]]$mediaType) |>
+   dplyr::select(-contactPoint, -publisher, -distribution) |>
+   dplyr::mutate(dplyr::across(c(issued, modified, released), as.Date)) |>
+   dplyr::tibble() |>
+   janitor::clean_names()
> 
> dplyr::glimpse(response)
Rows: 228
Columns: 16
$ access_level    <chr> "public", "public", "public", "public", "public", "public", "public", "
$ landing_page    <chr> "https://data.cms.gov/provider-data/dataset/46dc-a66c", "https://data.c$ issued          <date> 2021-06-04, 2023-08-17, 2023-08-17, 2022-07-11, 2022-07-11, 2022-07-11$ modified        <date> 2023-02-16, 2023-08-21, 2023-08-21, 2022-07-11, 2022-07-11, 2022-07-11$ released        <date> 2023-03-09, 2023-08-31, 2023-08-31, 2023-09-28, 2023-09-28, 2023-09-28$ keyword         <list> "Clinicians", <"Performance", "Quality", "Measures", "MIPS", "Quality …
$ identifier      <chr> "46dc-a66c", "0ba7-2cb0", "8c70-d353", "3614-1eef", "bdd5-4a04", "4269-$ description     <chr> "This file contains performance information for Merit-Based Incentive P…
$ title           <chr> "PY 2021 Virtual Group Public Reporting: MIPS Measures and Attestations$ theme           <list> "Doctors and clinicians", "Doctors and clinicians", "Doctors and clini…
$ archive_exclude <lgl> FALSE, FALSE, FALSE, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ contact_fn      <chr> "CMS", "CMS", "CMS", "PPL Dataset", "PPL Dataset", "PPL Dataset", "PPL$ contact_email   <chr> "[email protected]", "[email protected]", "[email protected]", "PPL_Dataset@c…
$ publisher_name  <chr> "Centers for Medicare &amp; Medicaid Services (CMS)", "Centers for Medi$ download_url    <chr> "https://data.cms.gov/provider-data/sites/default/files/resources/635d1…
$ media_type      <chr> "text/csv", "text/csv", "text/csv", "text/csv", "text/csv", "text/csv",…

Get Data Distributions

Sample Code:

data_distribtution <- function(.data){
  df <- .data
  
  dplyr::tibble(data = df[["dataset"]][["distribution"]]) |>
    dplyr::mutate(cnm = purrr::map(data, \(x) names(x))) |>
    dplyr::mutate(title_in_col = purrr::map(cnm, \(x) "title" %in% x) |> unlist()) |>
    dplyr::mutate(accessURL_in_col = purrr::map(cnm, \(x) "accessURL" %in% x) |> unlist()) |>
    dplyr::filter(title_in_col == TRUE & accessURL_in_col == TRUE) |>
    dplyr::select(data) |>
    dplyr::mutate(data = purrr::map(
      data, \(x) x |>
        dplyr::select(title, modified, temporal, accessURL, 
                      downloadURL, mediaType) |>
        dplyr::mutate(data_link = dplyr::coalesce(accessURL, downloadURL)) |>
        dplyr::mutate(mediaType = ifelse(is.na(mediaType), "api", mediaType)) |>
        dplyr::mutate(modified = as.Date(modified)) |>
        dplyr::select(-accessURL, -downloadURL) |>
        dplyr::arrange(title, dplyr::desc(modified), mediaType) |>
        tidyr::separate(temporal, into = c("start", "end"), sep = "/",
                        remove = FALSE) |>
        dplyr::mutate(dplyr::across(c(start, end), as.Date)) |>
        dplyr::mutate(data_year = format(start, "%Y") |> as.numeric()) |>
        dplyr::mutate(title = stringr::str_remove_all(title, "[:|-]")) |>
        dplyr::mutate(title = stringr::str_remove_all(title, "[:number:]")) |>
        dplyr::mutate(title = stringr::str_squish(title))
    )) |>
    tidyr::unnest(cols = c(data)) |>
    janitor::clean_names()
}

dataset_distributions <- data_distribtution(data_sets)

Output:

> dataset_distributions |>
+   head(4) |>
+   dplyr::glimpse()
Rows: 4
Columns: 8
$ title      <chr> "Accountable Care Organization Participants", "Accountable Care Organization…
$ modified   <date> 2022-01-27, 2022-01-27, 2022-01-27, 2022-01-27
$ temporal   <chr> "2014-01-01/2014-12-31", "2014-01-01/2014-12-31", "2015-01-01/2015-12-31", …
$ start      <date> 2014-01-01, 2014-01-01, 2015-01-01, 2015-01-01
$ end        <date> 2014-12-31, 2014-12-31, 2015-12-31, 2015-12-31
$ media_type <chr> "api", "application/vnd.ms-excel", "api", "application/vnd.ms-excel"
$ data_link  <chr> "https://data.cms.gov/data-api/v1/dataset/5ebc6246-1861-4d9f-92b4-33c69b315d$ data_year  <dbl> 2014, 2014, 2015, 2015

`current_hosp_data()` update to factory function

use function factory

parse_csv_file <- function(file) {
  gsub("\\\\", "/", file |>
  read.csv(check.names = FALSE) |>
  janitor::clean_names()
}

csv_file_tbl <- lapply(csv_file_list, parse_csv_file)
get_csv_names <- function(file) {
  normalizePath(file, "/") |>
  gsub(pattern = path_remove, replacement = "") |>
  gsub(pattern = "-", replacement = "_")
}

file_names <- lapply(csv_file_list, get_csv_names)

Old

csv_file_list |>
        purrr::map(
            \(file) normalizePath(file, "/") |>
                gsub(pattern = path_remove, replacement = "") |>
                gsub(pattern = "-", replacement = "_")
        )

onAttach message

make a message for library() .onAttach()

Similar to bayesmodels

.onAttach <- function(libname, pkgname) {

bsu_rule_color <- "#2c3e50"
bsu_main_color <- "#1f78b4"

# Check Theme: If Dark, Update Colors
tryCatch({
    if (rstudioapi::isAvailable()) {
        theme <- rstudioapi::getThemeInfo()
        if (is.null(theme)) {
            bsu_rule_color <- "#2c3e50"
            bsu_main_color <- "#1f78b4"
        }
        if (theme$dark) {
            bsu_rule_color <- "#7FD2FF"
            bsu_main_color <- "#18bc9c"
        }
    }
}, error = function(e) {
    bsu_rule_color <- "#2c3e50"
    bsu_main_color <- "#1f78b4"
}, finally = {
    bsu_main <- crayon::make_style(bsu_main_color)
    
    msg <- paste0(
        cli::rule(left = "Welcome to bayesmodels", col = bsu_rule_color, line = 2),
        bsu_main('\nIf you are interested in time series, maybe you would like to check my other packages: garchmodels and boostime\n'),
        bsu_main('</> If you find this package useful, please leave a star: https://github.com/AlbertoAlmuinha/bayesmodels </>')
    )
    
    packageStartupMessage(msg)
})

}

Code review `current_hosp_data()` function

@rjake Here is the current function:

current_hosp_data <- function() {

    # URL for file
    url <- "https://data.cms.gov/provider-data/sites/default/files/archive/Hospitals/current/hospitals_current_data.zip"
    f_name <- basename(url)

    # Create a temporary file to store the zip file
    tmp <- tempfile()
    tmp_dir <- tempdir()

    # Download the zip file to the temporary location
    utils::download.file(
        url = url,
        destfile = tmp
    )

    # Unzip the file
    utils::unzip(tmp, exdir = tmp_dir)

    # Read the csv files into a list
    csv_file_list <- list.files(
        path = tmp_dir,
        pattern = "\\.csv$",
        full.names = TRUE
    )

    # Process CSV Files
    parse_csv_file <- function(file) {
        # Normalize the path to use C:/path/to/file structure
        normalizePath(file, "/") |>
            # read in the csv file and use check.names = FALSE because some of
            # the names are very long
            utils::read.csv(check.names = FALSE) |>
            # clean the field names
            janitor::clean_names()
        }

    csv_file_tbl <- lapply(csv_file_list, parse_csv_file)

    # Get File Names
    # Get the tmp_dir in normal form C:/path/to/file
    path_remove <- paste0(normalizePath(tmp_dir, "/"),"/")

    # Get csv names function
    get_csv_names <- function(file) {
        # Process the path to normal form
        normalizePath(file, "/") |>
            # remove the tmp_dir from the full file string
            gsub(pattern = path_remove, replacement = "") |>
            # change all - to _
            gsub(pattern = "-", replacement = "_")
        }
    # Get the names
    file_names <- lapply(csv_file_list, get_csv_names)

    # apply the names
    names(csv_file_tbl) <- file_names

    # Make the result a tibble for each file.
    csv_file_tbl <- lapply(csv_file_tbl, dplyr::as_tibble)

    unlink(tmp, recursive = TRUE)

    # Return the tibbles
    # Add and attribute and a class type to the object
    attr(csv_file_tbl, ".list_type") <- "current_hosp_data"
    class(csv_file_tbl) <- c("current_hosp_data", class(csv_file_tbl))

    return(csv_file_tbl)
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.