pascalcrepey / hospitalnetwork

Building networks of hospitals through patients transfers

Home Page: https://pascalcrepey.github.io/HospitalNetwork/

License: GNU General Public License v3.0

R 100.00%

hospital-networks patients-transfers patient-database

hospitalnetwork's Introduction

HospitalNetwork

Building networks of hospitals through patients transfers

This R package contains functions to help interested researchers to build hospital networks from data on hospitalized patients transferred between hospitals.

The aim of the project is to provide a common framework to build and analyze hospital networks.

This project is partly supported by the NeWIS (NetWorks to Improve Surveillance) initiative, funded by JPIAMR, and by the Sphinx project, funded by ANR.

Step 0: Installing the package:

You can install the release version of this package from CRAN as follows:

install.packages("HospitalNetwork")

Or you can install the development version from GitHub:

The package devtools needs to be installed and loaded first:

install.packages("devtools")
library("devtools")

Then install the package from GitHub. Update or install all the required packages.

install_github("PascalCrepey/HospitalNetwork@*release")

This command will install the latest "released" version of the package.

Step 1: Checking the consistency of the database

The function checkBase() should be run first, and the resulting checked/repaired database should be used in the following step. The function checks that:

all required variables are present,
no records contain missing values,
identifiers are in the correct format (character),
admission and discharge dates are in date format (if not, they are converted to dates),
each discharge happened on the same day as the admission or later,
no hospital stays overlap (overlapping stays are corrected).

The minimal call is checkBase(base), where base is the patient admission database. The following parameters adapt the function to the database in question (default values are indicated in bold characters):

deleteErrors = "subject" or "record": how to handle missing or erroneous records. "record" deletes just the record with an error; "subject" deletes all records of any patient with one or more erroneous records.
convertDates = TRUE/FALSE: whether the dates should be converted,
dateFormat = the format of the dates as a character string (e.g. %y%m%d for 20190524, or %d-%m-%y for 24-05-2019)
subjectID = the name of the column/variable containing the subject (i.e. patient) identifier
facilityID = as above, for facility (i.e. hospital) identifier
disDate = as above, for discharge date
admDate = as above, for admissions date
maxIteration = the maximum number of times the script runs through the database to correct for overlapping admissions. Ideally set to more than the required number of times.
verbose = TRUE/FALSE: whether the function prints out what it is doing.
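The difference between the two deleteErrors modes can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper drop_errors() and the toy data are hypothetical, and only two of the checks (missing values, discharge before admission) are covered. Column names follow the example database shown further down (sID, fID, Adate, Ddate).

```r
# Toy database: subject s002 has one record whose discharge precedes its admission
base <- data.frame(
  sID   = c("s001", "s001", "s002", "s002"),
  fID   = c("f01", "f02", "f01", "f03"),
  Adate = as.Date(c("2019-01-01", "2019-01-10", "2019-02-01", "2019-02-20")),
  Ddate = as.Date(c("2019-01-05", "2019-01-12", "2019-01-30", "2019-02-25")),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of HospitalNetwork): flag records with missing
# values or with a discharge before the admission, then drop either those
# records or every record of the affected subjects
drop_errors <- function(base, deleteErrors = c("record", "subject")) {
  deleteErrors <- match.arg(deleteErrors)
  bad <- !complete.cases(base) | base$Ddate < base$Adate
  if (deleteErrors == "subject") {
    bad <- base$sID %in% base$sID[bad]
  }
  base[!bad, , drop = FALSE]
}

by_record  <- drop_errors(base, "record")   # only the erroneous record is removed
by_subject <- drop_errors(base, "subject")  # all records of s002 are removed
```

Deleting by subject is the more conservative choice, since a patient with one faulty record may have other records that are unreliable too, at the cost of losing more data.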

Step 2: Reconstructing the network

The best way to reconstruct the hospital network is to create a HospiNet object from the patient database. This object also allows for easy calculation of the network metrics, as well as plotting and printing of results.

hospinet_from_subject_database(checkedBase)

This function shares several input parameters with checkBase(): subjectID, facilityID, disDate, admDate, verbose. In addition, the following parameters can be set:

noloops = TRUE/FALSE, indicating whether movements/transfers back to the same hospital the patient was discharged from (self-loops) should be removed. These self-referrals are not necessary for the analysis of the hospital network.
window_threshold = the number of days allowed between discharge and next admission for the pair to be counted as a movement or transfer. We suggest setting it to 365 for a full year, and using 0, 7, 30, 182, and 365 for the final analysis.
nmoves_threshold = any edge/link between hospitals with this number of patients or fewer will be removed. We suggest keeping the default, NULL or 0.
create_MetricsTable = TRUE/FALSE, indicates whether the network metrics are calculated immediately, or only when called for.
count_option = "successive" or "all": the way movements are counted, i.e. whether a sequence of admissions from hospital A to B to C is counted as moves A→B and B→C (successive) or A→B, B→C, and A→C (all)
condition = "dates", "flags", or "both". Whether a move is counted based on the time difference between two stays (dates), or a variable indicating if the patient was directly transferred (flags), or both.
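To make window_threshold and noloops more concrete, here is a minimal base R sketch of counting "successive" moves from dates only. It is an illustration under simplifying assumptions, not the package's code: successive_moves() is a hypothetical helper, the gap is taken as next admission minus discharge in days, and noloops = TRUE is assumed to drop same-hospital moves.

```r
# Toy checked database, one row per stay
stays <- data.frame(
  sID   = c("s1", "s1", "s1", "s2", "s2"),
  fID   = c("A", "B", "A", "A", "A"),
  Adate = as.Date(c("2019-01-01", "2019-01-06", "2019-03-01",
                    "2019-01-01", "2019-01-04")),
  Ddate = as.Date(c("2019-01-05", "2019-01-10", "2019-03-05",
                    "2019-01-03", "2019-01-08")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pair each stay with the next stay in the sorted table,
# keep pairs belonging to the same subject whose gap is within the window
successive_moves <- function(stays, window_threshold = 365, noloops = TRUE) {
  stays <- stays[order(stays$sID, stays$Adate), ]
  n <- nrow(stays)
  cur <- stays[-n, ]   # every stay except the last
  nxt <- stays[-1, ]   # the stay that follows each of them
  gap <- as.numeric(nxt$Adate - cur$Ddate)   # days between discharge and next admission
  keep <- cur$sID == nxt$sID & gap <= window_threshold
  if (noloops) keep <- keep & cur$fID != nxt$fID
  data.frame(origin = cur$fID[keep], target = nxt$fID[keep],
             stringsAsFactors = FALSE)
}

moves_30  <- successive_moves(stays, window_threshold = 30)
moves_365 <- successive_moves(stays, window_threshold = 365)
moves_all <- successive_moves(stays, window_threshold = 365, noloops = FALSE)
```

In this toy example the readmission of s1 to hospital A happens 50 days after discharge from B, so it is counted with a 365-day window but not with a 30-day one; the A to A move of s2 only appears when loops are kept.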

Step 3: Exporting and plotting results.

The result of the reconstruction and analysis can be easily saved as an RDS file, using

saveRDS(HospiNet, "my_filename.RDS")

since they are all stored in the HospiNet object. This object does not include the raw database, just the edge list (which is basically the same as the contact matrix), the various network metrics for each hospital, and metrics on the size of the used database (number of patients, admissions, hospitals, etc.).

Currently, the reconstructed network can be plotted as a matrix using plot(HospiNet) or plot(HospiNet, type = "matrix"). This can also be done as a clustered matrix: plot(HospiNet, type = "clustered_matrix"). In addition, you can visualize the degree distribution of the nodes in the network with plot(HospiNet, type = "degree"). We will try to include easy ways to plot the network in other ways as well.

Example of use:

install.packages("devtools")  # only needs to be run once and can be
                              # commented out after use
library(devtools)             # provides install_github(), used to download
                              # and install the HospitalNetwork package

install_github("PascalCrepey/HospitalNetwork@*release") # can be commented out once it is installed

library(HospitalNetwork)	# load the HospitalNetwork library


# Here, we create a dummy database for testing purposes,
# final users can directly use their own database. This one looks like: 
#       sID fID      Adate      Ddate
#  1: s001 f09 2019-02-19 2019-02-26
#  2: s001 f10 2019-03-27 2019-03-31
#  3: s001 f09 2019-04-22 2019-04-25
#  4: s002 f08 2019-01-15 2019-01-20
#  5: s003 f11 2019-02-14 2019-02-19
#  ---                               
# 228: s098 f01 2019-02-08 2019-02-12
mydb = create_fake_subjectDB(n_subjects = 1000, n_facilities = 100)

# checking the database
mydb_checked = checkBase(mydb)

# building the hospital network in a HospiNet object
my_hosp_net = hospinet_from_subject_database(mydb_checked)

# plot the network as a "contact matrix"
plot(my_hosp_net)
#plot the network as a "contact matrix" ordered by clusters (if any)
plot(my_hosp_net, type = "clustered_matrix")
# plot the degree (number of neighbors) distribution of hospitals in the network
plot(my_hosp_net, type = "degree")

# save the network (not the original database)
saveRDS(my_hosp_net, file = "my_hosp_net.RDS")

hospitalnetwork's People

clementmassonnaud tjibbed mikelydeamore mufflyt larinnajim dicook

hospitalnetwork's Issues

Fake data

We need a fake dataset corresponding to the description given in the vignette. Our functions will be tested against this fake dataset to ensure their compliance.

Features you want to see in the package

Dear all,
obviously we do not want the package to do everything, but being able to do a few relatively simple things would be nice. So I think we need to define a feature set. For the moment, what we've implemented can be split into 4 parts:

Validate and fix data
- check basic data structure format
- check and fix dates format
- check and fix data consistency regarding overlapping stays
Perform basic analysis and visualization
- matrix visualization
- degree histograms
- cluster identification
Interface with other tools
- matrix format
- edgelist format
- igraph format
Simulate real data
- generate fake patient stays
- generate fake patient stays with hospital clusters

Now, what else do you want to see in the package?
(I do have some ideas but I'll share them in comments... ;-)

How overlapping stays are adjusted

I have a question regarding overlapping stays

@tjibbed I didn't go through your code in detail. Can you specify what types of overlapping stays we want to look at, and how we adjust for them, so that I can explain it in the vignettes?

I think there are two kinds of overlapping that could occur:

1. [A.......{B.........A].........B}
   t1____t2____t3_____t4

2. [A.......{B.........B}.........A]

Is that correct? Am I forgetting something?

As for how to deal with them, here's how I see it:

For 1: either
- [A.....A]{B....................B}
- t1____t2____t3_____t4
- [A.................A]{B........B}
For 2: either
- [A......A]{B.....B}[A.......A]
- t1____t2____t3_____t4

What are we doing for now? Am I forgetting something?
Thanks!
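To make the question concrete, here is a small base R sketch of one of the options above: cut the first stay at t2, and in the nested case let the first facility resume at t3. It only illustrates that option for a single subject and two stays; it is not what the package currently does, and resolve_overlap() is a hypothetical name.

```r
# Hypothetical helper for ONE subject and TWO overlapping stays a and b,
# where a is admitted first. Each stay is a list(fID, Adate, Ddate).
resolve_overlap <- function(a, b) {
  out <- list(
    # the first stay is cut at the admission of the second
    list(fID = a$fID, Adate = a$Adate, Ddate = b$Adate),
    b
  )
  if (a$Ddate > b$Ddate) {
    # nested case: the first facility resumes once the second stay ends
    out[[3]] <- list(fID = a$fID, Adate = b$Ddate, Ddate = a$Ddate)
  }
  out
}

d <- as.Date

# Case 1: partial overlap, A is discharged while B is still ongoing
partial <- resolve_overlap(
  list(fID = "A", Adate = d("2019-01-01"), Ddate = d("2019-01-10")),
  list(fID = "B", Adate = d("2019-01-05"), Ddate = d("2019-01-15"))
)

# Case 2: nested overlap, B starts and ends within A
nested <- resolve_overlap(
  list(fID = "A", Adate = d("2019-01-01"), Ddate = d("2019-01-20")),
  list(fID = "B", Adate = d("2019-01-05"), Ddate = d("2019-01-10"))
)
```

With this choice, case 1 yields two stays (A then B) and case 2 yields three (A, B, A).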

FR: function adding GPS coordinates to HospiNet object

It would be nice to be able to add GPS coordinates to HospiNet.
The function needs to:

check that the coordinates are valid
check that there is a match for each hospital ID in the database
add the GPS data in normalized column names in the main database (it's better than keeping it separately if one wants to split the network or be able to compute the geographic distance of transfers).

Checking base during edgelist_from_base()

There is a bit of code in NetworkBuilding.R that has some adverse effects:

## Checking base
message("Checking base...")
base = try({
    checkBase(base)
})
if (class(base)[[1]] == "try-error") {
    stop("Cannot compute the network: the database is not correctly formated or contains errors. The database must first be checked with the function 'checkBase()'. See the vignettes for more details on the workflow of the package.")
}

I think our workflow is designed to allow checkBase() to be performed separately, to allow the users to think about what is wrong in or with the database. However, the above code forces the complete function to be run as a first step, even if the input data has already been checked.

To me there seem to be three options here:
1- Don't check the database in this function (which creates the possibility that the user will pass an unchecked database to the function, with all potential problems that it might create).
2- Create a input flag for checking the database. (No guarantee that the above will not happen, but by actively having to set it to FALSE the user at least has to think about it).
3- Leave it as it is. This can be rather slow, as the database checking does take quite a bit of time for larger databases. On the other hand, it does create a fully integrated function to move from raw data to network in one go... The question is whether this is what we want.

I'd prefer option 1 or 2, but would like to hear your thoughts before adjusting the code.
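For discussion, option 2 could look roughly like the sketch below. Everything here is hypothetical: the argument name check, the function name edgelist_sketch(), and the two stub functions that stand in for checkBase() and for the actual network computation so that the snippet runs on its own.

```r
# Stand-ins so the sketch is self-contained; in the package these would be
# checkBase() and the real edge list computation
check_stub <- function(base) {
  message("Checking base...")
  base
}
build_stub <- function(base) base

# Hypothetical signature: the check runs by default and must be switched off explicitly
edgelist_sketch <- function(base, check = TRUE) {
  if (check) {
    base <- tryCatch(
      check_stub(base),
      error = function(e) {
        stop("Cannot compute the network: the database must first be checked ",
             "with checkBase().", call. = FALSE)
      }
    )
  }
  build_stub(base)
}

db <- data.frame(sID = "s1", fID = "A", stringsAsFactors = FALSE)
res_checked   <- edgelist_sketch(db)                 # runs the check first
res_unchecked <- edgelist_sketch(db, check = FALSE)  # skips it for an already checked base
```

Defaulting to TRUE keeps the current behaviour, so only users who deliberately pass check = FALSE take responsibility for the state of the database.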

Complete documentation of functions

Some "new" parameters are not described and some functions would need examples... I set it as a milestone for the next release...

Error when column name equals variable name

If I call the column containing the patient identifier "patientID", I get the following error message:

Error in checkFormat(new_base, patientID = patientID, hospitalID = hospitalID, :
The following column(s) is/are missing: pID. Please check that your database contains at least the columns mentionned in the documentation, and that they are in the right format. Names are case sensitive.

With any other column name, the script runs just fine.

The same happens for the other three required variables. e.g. if hospitalID="hospitalID". For some reason, R gets confused if the variable name is the same as the column name.

I think there are two general solutions to the problem:

The proper route: try and find out how this referencing of names produces this error and see if there is a better way of dealing with this (e.g. using ..patientID [Tried this, but didn't solve all of the problems])
Hard-wire a check to see if variable names and column names are the same, and rename them if so.
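I have not pinned down the exact cause in checkFormat(), but one plausible mechanism is ordinary R scoping: when an expression is evaluated inside the data, a column name takes precedence over a variable of the same name. The base R sketch below illustrates that assumption only; it does not reproduce the package code.

```r
base <- data.frame(patientID = c("s1", "s2"), stringsAsFactors = FALSE)
patientID <- "patientID"   # variable holding the column name

# Evaluated inside the data, the symbol resolves to the column, not to the variable
masked <- eval(quote(patientID), base)

# Looking the column up by its name as a string is unambiguous
by_name <- base[[patientID]]

# With a differently named variable, the symbol is not found among the columns
# and falls through to the variable, i.e. the string "patientID"
pID_col <- "patientID"
not_masked <- eval(quote(pID_col), base)
```

If this is indeed what happens, accessing columns by string (as in base[[patientID]]) would avoid the clash without renaming anything.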

new implementation of overlap management

As @ClementMassonnaud said, we need to look at the new version of the adjust_overlapping_stays() function proposed with commit #50200db

"[...] the two versions give the same results on all the tests I wrote. Considering the current version though, I'm not sure I understand the purpose of the iterator and how it works. I created a fake database with all the types of overlapping stays I could think of, with overlapping stays nested multiple times, etc., and it seems that the function adjusts for those stays correctly with only one pass in the while loop... Maybe you can take a look at the tests and see if I'm missing something.

Anyway, the new function works quite similarly to the current one, but with a slight change that allows to adjust partial overlaps with either admission or discharge leading. Also, it's using a bit more of the useful data.table syntax which makes the function shorter.

If the new function is not missing anything important that I might have missed, I would propose to use it as a default, mainly for the two advantages I mentioned."

a shiny ui frontend to the package

This would be a nice addition to the package and would certainly help some users to manage their patient database...

Unit test scripts

We need to develop a set of test scripts using testthat to ensure that our code base keeps working as it evolves.
For that, we need a small "fake" dataset with a known expected output, so that we can compare the results of our scripts against it.

Include (flagged) direct transfers in edgelist_from_patient_database

The edgelist_from_patient_database function now uses the time between discharge and admission to determine whether two records (n & n+1) constitute a transfer. This is based on the minimal dataset as defined in the NeWIS project (with just the variables pID, hID, Adate, and Ddate).

However, some datasets may have additional variable(s) that flag whether or not the admission (or discharge) is part of a direct transfer. I believe that in the French database these are mode_entree and mode_sortie.

The condition that two admissions are flagged as a direct transfer should be included in the function, but only conditional on if the variables exist, and if it is part of the -at that moment- used definition.
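A rough base R sketch of how such a condition could be combined with the date rule is given below. The column names gap_days and direct_flag, the helper is_transfer(), and the reading of "both" as "both conditions must hold" are all assumptions made for illustration.

```r
# One row per pair of consecutive stays of the same subject.
# 'gap_days' and 'direct_flag' are hypothetical column names.
pairs <- data.frame(
  origin      = c("A", "B", "C"),
  target      = c("B", "C", "A"),
  gap_days    = c(0, 12, 400),
  direct_flag = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

is_transfer <- function(pairs, condition = c("dates", "flags", "both"),
                        window_threshold = 30) {
  condition <- match.arg(condition)
  by_dates <- pairs$gap_days <= window_threshold
  # the flag is only used if the variable exists in the dataset
  if (condition != "dates" && !"direct_flag" %in% names(pairs)) {
    warning("No flag column found, falling back to dates only")
    return(by_dates)
  }
  switch(condition,
         dates = by_dates,
         flags = pairs$direct_flag,
         both  = by_dates & pairs$direct_flag)  # assumption: both must hold
}

t_dates <- is_transfer(pairs, "dates")
t_flags <- is_transfer(pairs, "flags")
t_both  <- is_transfer(pairs, "both")
```

Falling back to dates when the flag variable is absent keeps the function usable on the minimal dataset.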

FR: add fake GPS coordinates in fake database generation function

For testing purposes, we would need the fake data generator to also produce fake GPS.
As a first step we can just ensure that they are "valid"; in a second step, choosing the country could be nice, in particular if we want to test the visnetwork UI in the shiny interface.

Hospitals with no transfers get silently dropped

Not sure if this qualifies as a bug or not, but if there are hospitals present in the database that have admissions but no transfers, they are silently dropped from the transfer matrix. I'd expect a row (and column) of zeroes by default so that the same nodes are present in all elements of the hospinet object.
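For illustration, here is one way to get that behaviour in base R, assuming the full list of facilities is available when the matrix is built. The object names are made up for the example and this is not the package's code.

```r
# f03 has admissions in the database but is never part of a transfer
all_facilities <- c("f01", "f02", "f03")
edges <- data.frame(origin = c("f01", "f02", "f01"),
                    target = c("f02", "f01", "f02"),
                    stringsAsFactors = FALSE)

# Using the full facility list as factor levels keeps f03
# as a row and a column of zeroes
transfer_matrix <- table(
  origin = factor(edges$origin, levels = all_facilities),
  target = factor(edges$target, levels = all_facilities)
)
```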

S3 object HospiNet

We need to build an S3 object, let's call it HospiNet, which will contain the hospital matrix and relevant indicators (see vignette for details). This object should have a dedicated summary and print method.
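As a starting point for discussion, a bare-bones S3 version could look like the sketch below. The class name HospiNet_sketch, the constructor new_hospinet_sketch(), and the two stored indicators are placeholders, not a proposal for the final design.

```r
# Hypothetical constructor: stores the matrix and two simple indicators
new_hospinet_sketch <- function(mat) {
  stopifnot(is.matrix(mat), nrow(mat) == ncol(mat))
  structure(
    list(matrix = mat,
         n_facilities = nrow(mat),
         n_movements = sum(mat)),
    class = "HospiNet_sketch"
  )
}

# Dedicated print method
print.HospiNet_sketch <- function(x, ...) {
  cat("Hospital network:", x$n_facilities, "facilities,",
      x$n_movements, "movements\n")
  invisible(x)
}

# Dedicated summary method, returning the indicators as a list
summary.HospiNet_sketch <- function(object, ...) {
  list(n_facilities = object$n_facilities,
       n_movements = object$n_movements)
}

m <- matrix(c(0, 2, 1, 0), nrow = 2,
            dimnames = list(c("A", "B"), c("A", "B")))
net <- new_hospinet_sketch(m)
print(net)
s <- summary(net)
```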

Columns for checking duplicated entries

In the function adjust_overlapping_stays, the call for unique(base) takes over half of the function call time if there is a lot of auxiliary data. Is the intention to check for duplicates using the aux data as well? If not, adding a by=key(base) cuts the run time significantly:

Unit: seconds
                             expr     min      lq       mean     median   uq       max         neval
                 {     unique(base) } 30.65674 30.85987 31.83005 31.49651 32.37094 33.96304    10
 {     unique(base, by = key(base)) } 10.43815 10.70215 12.75402 13.49867 14.22774 15.69120    10
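Beyond run time, the two calls can also give different results when auxiliary columns differ between otherwise identical stays. The sketch below shows this with a base R analogue (duplicated() instead of data.table's unique()); the column names and the ward variable are made up for the example.

```r
key_cols <- c("sID", "fID", "Adate", "Ddate")
base <- data.frame(
  sID   = c("s1", "s1", "s1"),
  fID   = c("A", "A", "A"),
  Adate = as.Date(rep("2019-01-01", 3)),
  Ddate = as.Date(rep("2019-01-05", 3)),
  ward  = c("icu", "icu", "surgery"),   # auxiliary column
  stringsAsFactors = FALSE
)

# Comparing all columns: the auxiliary data makes the third row distinct
dedup_all <- base[!duplicated(base), ]

# Comparing key columns only: all three rows describe the same stay
dedup_key <- base[!duplicated(base[key_cols]), ]
```

So the choice is not only about speed: it also decides whether rows that differ in auxiliary data alone count as duplicates.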

Shiny app

I uploaded the code for the shiny app

The app can be launched by calling the function shiny_app()

This version of the shiny app is from a few weeks ago; I had some trouble integrating it with the package. Maybe there is still a better way to do it.
So the modifications / new features committed in the last few weeks might not be implemented in the app for now.

Anyway, I am looking forward to your feedback. Is everything working ok? The features you'd like to see added to the app, remarks about the design, the structure, the workflow, etc.

Summary statistics in HospiNet object are actually computed from the original base

Some of the statistics in HospiNet object are actually computed from the original base. They are computed from the base with the functions all_admissions_summary() and per_facility_summary() and then passed to the object.

For me, this is an issue. For instance, if accessing the number of facilities with net$numFacilities we actually get the number of facilities in the original base, not in the constructed network, which is what I would expect (actually this is computed by net$n_facilities). The same applies for the rest of the summary statistics

I would suggest to make a clear distinction between the statistics computed from the network and from the database. I think that only the statistics related to the network should be in the HospiNet object. The statistics related to the database can be computed using the functions externally.

circular network not working when only one cluster is present in the network

It's not that urgent but I do not see any reason why this plot should not work when no more than 1 cluster is present. It surely is less informative but still it should work...