careset / covid_hospital_puf

The community-created FAQ about the hospital-level COVID capacity data.

PHP 100.00%
capacity-data covid covid-response facility-level hhs hospital-capacity hospital-pk hospital-reporting icu-beds influenza-data

covid_hospital_puf's Issues

this needs to be added to the FAQ

Thank you for your interest in HHS Open Data. One note -- Patients should not be discouraged from seeking hospital care based on their interpretation of the data. Hospitals have protocols in place to keep patients safe from exposure and to ensure all patients are prioritized for care.

  • HealthData.gov Team

Duplicate descriptions in data dictionary

Hi, thank you for all of your work on this dataset and documentation, and I hope this is the right place to post about this issue.

I have been working with this dataset and noticed a few places where the data dictionary gives duplicate descriptions for different variables. I see that they point to different areas of the FAQ document, but it would be helpful to have an accurate description.

The variable pairs I noticed this for are:
all_adult_hospital_beds_7_day_sum + all_adult_hospital_inpatient_beds_7_day_sum
total_staffed_adult_icu_beds_7_day_avg + icu_beds_used_7_day_avg
previous_day_admission_adult_covid_confirmed_70-79_7_day_sum + previous_day_admission_adult_covid_suspected_70-79_7_day_sum
icu_beds_used_7_day_sum + total_icu_beds_7_day_sum
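
One way to find other pairs like these is to group the dictionary entries by description text. This is only a minimal sketch: the in-memory mapping of field name to description is a hypothetical stand-in for the real data dictionary, and the description strings are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_descriptions(dictionary):
    """Return groups of field names that share the same description.

    `dictionary` maps field name -> description text. This shape is an
    assumption for illustration, not the dictionary's actual file layout.
    """
    by_description = defaultdict(list)
    for field, description in dictionary.items():
        # Normalize case and whitespace so trivial differences do not hide duplicates.
        key = " ".join(description.lower().split())
        by_description[key].append(field)
    return [sorted(fields) for fields in by_description.values() if len(fields) > 1]


# Made-up descriptions, for illustration only.
sample_dictionary = {
    "icu_beds_used_7_day_sum": "Sum of ICU beds reported in the 7-day period.",
    "total_icu_beds_7_day_sum": "Sum of ICU beds  reported in the 7-day period.",
    "inpatient_beds_7_day_sum": "Sum of inpatient beds reported in the 7-day period.",
}

duplicates = find_duplicate_descriptions(sample_dictionary)
print(duplicates)
```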

Document age-stratified data missing in state aggregated files

The facility-level file includes variables for the count of admissions stratified by age, e.g. previous_day_admission_adult_covid_confirmed_30-39_7_day_sum. Because data is only shown in the public files when cell size is 4 or greater, these variables are often redacted.

This data would be much more useful aggregated by state. But the state level files omit these variables entirely. I've not seen any explanation for this. This is true for the state level files by day or week.

Obfuscation of true zeros

The HHS dataset is such a nice resource along with the FAQ here. However, I was wondering how obfuscation is applied. Is obfuscation only applied for counts 1-3? Is it possible that counts of 0 are ever obfuscated? Do sites themselves determine what should be considered obfuscated, or does this process happen automatically once data is aggregated?

Thank you - Meg
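
For anyone reading the file while this is unanswered, here is a minimal sketch of keeping reported zeros, blanks, and redacted cells apart when parsing. It assumes redacted cells carry a sentinel value of -999999; that value is an assumption here and should be checked against the FAQ. It does not settle whether a true zero is ever redacted.

```python
SUPPRESSED = -999999.0  # assumed sentinel for a redacted cell; verify against the FAQ


def parse_cell(raw):
    """Return (value, status), where status is 'missing', 'suppressed', or 'reported'."""
    if raw is None or raw.strip() == "":
        return None, "missing"
    value = float(raw)
    if value == SUPPRESSED:
        # Keep redacted cells separate from zeros so they are never summed as 0.
        return None, "suppressed"
    return value, "reported"


# Made-up cell contents, for illustration only.
cells = ["12", "0", "-999999", ""]
parsed = [parse_cell(cell) for cell in cells]
print(parsed)
```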

Document errors in OR / WA suspected pediatric admissions, state aggregate files

The previous_day_admission_pediatric_covid_suspected column in the state-aggregated version of the hospital file ("COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries", available here) has an apparent error for Oregon and Washington. See below; it appears that values after Oct. 19 are off.

HHS was notified of this error on Mar. 2, but it has not been fixed as of today.

image

add this question

It seems like there are missing hospitals in the data?

Hospital reporting is captured at the hospital level but is rolled up to the CCN level. In most cases the CCN = one single hospital. But there are cases where multiple hospitals are rolled up into one CCN.
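
If you have your own list of hospitals with their CCNs, a quick check for this kind of roll-up might look like the sketch below. The record layout (`ccn`, `name`) and the sample values are hypothetical, not taken from the dataset.

```python
from collections import defaultdict


def shared_ccns(hospitals):
    """Return {ccn: [names]} for every CCN that covers more than one hospital."""
    groups = defaultdict(list)
    for hospital in hospitals:
        groups[hospital["ccn"]].append(hospital["name"])
    return {ccn: names for ccn, names in groups.items() if len(names) > 1}


# Made-up hospitals and CCNs, for illustration only.
my_hospitals = [
    {"ccn": "000001", "name": "Example General - Main Campus"},
    {"ccn": "000001", "name": "Example General - North Campus"},
    {"ccn": "000002", "name": "Sample Community Hospital"},
]

rolled_up = shared_ccns(my_hospitals)
print(rolled_up)
```

Hospitals that appear in `rolled_up` would show up as a single row in a CCN-level file, which can look like missing hospitals.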

Interpreting percentages over 100%

Hey Fred,

Thank you so much for your help so far. I've been doing some analysis using the following formula from your FAQ:

How full is the hospital with adult confirmed and suspected COVID patients?
Formula:
total_adult_patients_hospitalized_confirmed_and_suspected_covid_7_day_avg/ all_adult_hospital_inpatient_beds_7_day_avg

And I noticed that there were some Texas hospitals that are reporting over 100% capacity over the whole data set. There was even one hospital that has been reporting over 600% capacity. I was wondering how you would interpret that, if these hospitals are tremendously over capacity or if these may be reporting errors. I noticed that a lot of the big percentages over 100% happened during the July 31 week which was the first week of data which may be a time when the hospitals were still getting used to reporting the data. However, this was also a time when we had peak COVID hospitalizations in Texas. That's why I'm not sure if I can completely write this off as data reporting error.
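
A minimal sketch of how that ratio can be computed and the over-100% rows pulled out for review. The two column names come from the formula above; the `hospital_name` key and the sample rows are made up for illustration.

```python
PATIENTS = "total_adult_patients_hospitalized_confirmed_and_suspected_covid_7_day_avg"
BEDS = "all_adult_hospital_inpatient_beds_7_day_avg"


def covid_occupancy(row):
    """Return patients / beds, or None when the ratio cannot be computed."""
    patients, beds = row.get(PATIENTS), row.get(BEDS)
    if patients is None or beds is None or beds <= 0:
        return None
    return patients / beds


# Made-up rows, for illustration only.
rows = [
    {"hospital_name": "A", PATIENTS: 30.0, BEDS: 120.0},
    {"hospital_name": "B", PATIENTS: 63.0, BEDS: 10.0},
    {"hospital_name": "C", PATIENTS: 12.0, BEDS: None},
]

# Rows above 100% are candidates for review, not proof of a reporting error.
over_capacity = [
    row["hospital_name"]
    for row in rows
    if covid_occupancy(row) is not None and covid_occupancy(row) > 1.0
]
print(over_capacity)
```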

In addition, there is also one hospital where the all_adult_hospital_inpatient_beds_7_day_avg number goes from 161.1 one week to 18.3 the next, and that number has continued to get lower and lower, with the latest number (in the 11/27/2020 collection week) being 6. It reminded me a bit of the staffed beds conversation we had in issue #5. Would this be an example of the hospital suffering a staffing shortage these last couple of weeks? Thank you again!
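
Sudden changes like this can be surfaced with a simple week-over-week check. This is a sketch only: the 50% threshold is an arbitrary assumption, and the 15.0 in the sample series is a made-up intermediate value (only 161.1, 18.3, and 6 come from the post above).

```python
def sharp_drops(values, threshold=0.5):
    """Return indices i where values[i] fell by more than `threshold`
    (as a fraction) relative to values[i - 1]."""
    flagged = []
    for i in range(1, len(values)):
        previous, current = values[i - 1], values[i]
        if previous is None or current is None or previous <= 0:
            continue  # cannot compute a relative change
        if (previous - current) / previous > threshold:
            flagged.append(i)
    return flagged


# Weekly bed counts for one hospital; 15.0 is made up for illustration.
weekly_beds = [161.1, 18.3, 15.0, 6.0]
drops = sharp_drops(weekly_beds)
print(drops)
```

A flagged week says nothing about the cause; a staffing shortage and a reporting change would look the same here.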

Tremendous number of data anomalies - be careful presenting this data.

I'm attempting to make use of this data to provide communities with information about their hospital capacity and COVID rates. However, this dataset is absolutely rife with anomalies. I cannot find one pair of metrics that I can compare across the dataset that does not result in nonsense data.

More specifically, data that is supposed to be a subset of some other data is often GREATER than the thing it is supposed to be a subset of. For example, there are over 1,000 records for which the all_adult_hospital_inpatient_beds_7_day (3.b below) is greater than the inpatient_beds_7_day (3.a). Since adult inpatient beds are a subset of the inpatient beds, that should not be possible. In some cases it is a rounding-type error (off by ±1 bed), and in other cases the adult inpatient beds value was specified but the inpatient beds value was not. However, in most cases the data simply does not make sense (e.g. 159 adult inpatient beds and 132.7 total inpatient beds). In those cases I looked at the other values to see if the facility was full beyond capacity or had a high number of COVID patients, but that was not the case.

Note that those values are not even census (i.e. actual patient counts) but the raw BED counts, regardless of whether or not the beds are occupied. Why are hospitals reporting data as nonsensical as that (we physically have more adult inpatient beds in the hospital than we have beds in total)?

From the document provided to hospitals providing guidance on reporting data (https://www.hhs.gov/sites/default/files/covid-19-faqs-hospitals-hospital-laboratory-acute-care-facility-data-reporting.pdf):

2-a) All hospital beds
Total number of all staffed inpatient and outpatient beds in your hospital, including all overflow, observation, and active surge/expansion beds used for inpatients and for outpatients (includes all ICU, ED, and observation).
Subset:
2-b) All adult hospital beds
Total number of all staffed inpatient and outpatient adult beds in your hospital, including all overflow and active surge/expansion beds for inpatients and for outpatients (includes all ICU, ED, and observation)

3-a) All hospital inpatient beds
Total number of staffed inpatient beds in your hospital including all overflow, observation, and active surge/expansion beds used for inpatients (includes all ICU beds). This is a subset of #2.
Subset:
3-b) Adult hospital inpatient beds
Total number of staffed inpatient adult beds in your hospital including all overflow, observation, and active surge/expansion beds used for inpatients (includes all designated ICU beds). This is also a subset of #2

There are thousands of records for every one of those subsets where the subset is greater than the parent set(s).

It is apparent that there is zero data validation during the data entry of this information (e.g. "Your total adult inpatient bed count cannot be greater than the total inpatient bed count for your facility"), and that there is serious confusion and varied interpretation by the staff providing this data as to what the data is supposed to mean. I have a hunch that some facilities are reporting the total daily inpatients as the total daily bed occupancy, which is NOT the same thing. A single bed can be occupied by more than one patient in a day, even if there are other beds or entire floors in the facility that are not being used. Thus it drastically inflates how full the hospital was, since all those patients were not occupying beds at the same time.

I don't see how this data can be used without either being vetted manually, record by record, by a person (not realistic for those of us who consume this data), or simply passing all these obvious inaccuracies along to the general public in our data presentation (garbage in, garbage out).
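
Short of manual vetting, a consumer-side check can at least flag the records described above. The sketch below is one possible approach, not something the dataset provides: the `_avg` column names are assumed spellings of the 3.a and 3.b fields, and the one-bed tolerance for rounding is an arbitrary choice.

```python
# (subset column, parent column) pairs; names are assumptions about the file.
SUBSET_PAIRS = [
    ("all_adult_hospital_inpatient_beds_7_day_avg", "inpatient_beds_7_day_avg"),
]


def subset_violations(row, pairs=SUBSET_PAIRS, tolerance=1.0):
    """Return (subset, parent, reason) tuples for each failed check in `row`."""
    problems = []
    for subset, parent in pairs:
        subset_value, parent_value = row.get(subset), row.get(parent)
        if subset_value is None:
            continue  # nothing to compare
        if parent_value is None:
            problems.append((subset, parent, "parent missing"))
        elif subset_value > parent_value + tolerance:
            # Differences within `tolerance` are treated as rounding noise.
            problems.append((subset, parent, "subset exceeds parent"))
    return problems


# The 159 vs 132.7 record is the example quoted above.
bad_row = {
    "all_adult_hospital_inpatient_beds_7_day_avg": 159.0,
    "inpatient_beds_7_day_avg": 132.7,
}
rounding_row = {
    "all_adult_hospital_inpatient_beds_7_day_avg": 100.4,
    "inpatient_beds_7_day_avg": 100.0,
}

print(subset_violations(bad_row))
print(subset_violations(rounding_row))
```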

Has the set not updated this week?

Hi,

Looking at the file, it seems like it didn't update this week (I'm seeing the last week as 1/1, when it should be 1/8). Is anyone else having this problem?
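
For reference, a check along these lines is how a stale file can be detected. It is a sketch: the `collection_week` column name, the ISO date format, and the years in the sample rows are all assumptions.

```python
from datetime import date


def latest_week(rows):
    """Return the most recent collection week found in `rows`."""
    return max(date.fromisoformat(row["collection_week"]) for row in rows)


def is_stale(rows, expected_week):
    """True when the newest week in the file is older than the expected one."""
    return latest_week(rows) < expected_week


# Made-up rows; only the month and day mirror the weeks mentioned above.
rows = [
    {"collection_week": "2020-12-25"},
    {"collection_week": "2021-01-01"},
]

stale = is_stale(rows, expected_week=date(2021, 1, 8))
print(latest_week(rows), stale)
```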

Thanks,

Jake

Question about ICU bed usage during COVID spike

I'm not sure if this is the right forum for this discussion, but I'm curious about my analysis of this data and how much sense it makes.

In short, I calculated an ICU bed usage ratio using the suggested formula (staffed_adult_icu_bed_occupancy_7_day_avg/ total_staffed_adult_icu_beds_7_day_avg) at the state level after correcting for missing values, etc.

I paired that data with weekly new COVID cases at the state level as reported by the NYTimes.

I then plotted the two metrics together and, surprisingly (?), there is essentially no correlation between ICU bed usage and new COVID cases. One might try to argue that the increase in new cases is simply the result of more testing but, as we all know, deaths are dramatically higher now, as well.

So my questions are, is my analysis wrong in some way? Is the data wrong? Do hospitals try to limit the percentage of ICU beds that are used to ~80-90% so that they can handle non-COVID emergencies (just speculating here)?

Red line = new covid cases per week
Blue line = ICU bed use ratio

[Image: plot of new COVID cases per week and ICU bed use ratio]
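
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of the two steps: the ICU ratio from the formula above, then a Pearson correlation against weekly cases. The weekly numbers are made up for illustration and are not taken from the dataset or from the NYTimes data.

```python
import math


def icu_ratio(occupied, total):
    """staffed_adult_icu_bed_occupancy_7_day_avg / total_staffed_adult_icu_beds_7_day_avg"""
    if occupied is None or total is None or total <= 0:
        return None
    return occupied / total


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / (spread_x * spread_y)


# Made-up weekly state-level values, for illustration only.
occupied = [80.0, 82.0, 81.0, 83.0]
total = [100.0, 100.0, 100.0, 100.0]
new_cases = [1000, 2000, 4000, 8000]

ratios = [icu_ratio(o, t) for o, t in zip(occupied, total)]
print(pearson(ratios, new_cases))
```

Note that the denominator is staffed beds, which can change from week to week, so the ratio can stay flat even when the occupied count rises.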

verify that this twitter question is answered...

2/3 For example, Deaf Smith County, Texas shows 93% of beds taken by COVID but only 43% of beds used. Harris County shows a hospital at 630% of capacity.

Response: We suspect these could be hospital reporting errors. For example in Deaf Smith County the underlying data shows the only hospital reporting total hospitalizations (11) < COVID hospitalizations (24). Not much we can do about that.

Adding reverse geo-encoders to the hospital dataset

Placekey information has already been added for each dataset.

Does it make sense to also add reverse geo-encoding for each hospital? Basically, this would add latitude and longitude information to the hospital dataset. This can help in identifying the hospital location and reaching out to them (??)

Archiving historical data?

Hi again,

One more question from me. What I got from the Readme is that this data is set to be released weekly. Will the historical data provided in previous weeks be stored or archived anywhere for easy download, or will the CSV flat file be overwritten with the latest weekly data every week? Thank you!

Take care,
Carla

Staffing levels?

Hi,

I'm a data reporter at the Texas Tribune. Thank you so much for making this available to us. And that Readme is really helpful. I have a question:

Every day most hospitals in the U.S. are required to report information on the following topics to the federal government:

Hospital capacity, including information on ICU capacity and available ventilators
Staffing levels, including any shortages
How many patients are coming into the hospital with confirmed or suspected COVID-19 cases
Many other relevant details that public health officials need to properly coordinate COVID responses

I'm mostly curious about the staffing levels, and when I saw the column names, I didn't see one that alluded to staffing levels or shortages. Is that not included in this initial dataset, or is that something we can calculate using the existing columns in the dataset? Thank you so much!

Take care,
Carla

What does the term "inpatient" represent in the data?

I'm trying to figure out what it means when the data dictionary references inpatient beds. Arizona's Department of Health has three types of beds: Inpatient (general population), ICU, and ED beds.

Sometimes, the data dictionary clearly states whether or not ICU or ED beds are included in that field. Other times, it doesn't. What types of beds are meant when the term "inpatient" is used?

For example, in the Facility COVID PUF Community FAQ, one of your recommended calculations is for How full is the hospital with adult confirmed and suspected COVID patients?

The formula is total_adult_patients_hospitalized_confirmed_and_suspected_covid_7_day_avg/ all_adult_hospital_inpatient_beds_7_day_avg.

That takes --

COL Q: Average number of patients currently hospitalized in an adult inpatient bed who have laboratory-confirmed or suspected COVID19, including those in observation beds reported during the 7-day period. (No mention of ICU beds)

Divided by

COL N: Average of total number of staffed inpatient adult beds in the hospital including all overflow and active surge/expansion beds used for inpatients (including all designated ICU beds) reported during the 7-day period.

So, if ICU beds are not included whenever the term inpatient is used, then the above formula is flawed because Col Q doesn't include ICU beds. If ICU and ED beds are included anytime the inpatient term is used, then life just got a whole lot easier.

Thanks so much for your help in advance.

Tracking hospital-acquired covid transmission rates

Posting this here after pickup over in CoronaWhy community, from a Chicago collaborator:

"Good morning. I’m currently rotating in internal medicine at a hospital in Chicago that primarily serves an urban poor demographic. We are looking for info on hospital-acquired covid transmission rates across the country to standardize against."

Within the CareSet data structures, how are facilities tracking hospital-acquired transmission? Can they? At first glance, it seems a simple enough question, but given the variables involved in an aerosolised virus with a long asymptomatic incubation, it would be difficult to work back through contact tracing and exposure events. Interested to hear if this is factored into the dataset design; I know the data model is likely evolving to support iatrogenic queries like this.
