
pascalcrepey / hospitalnetwork

Building networks of hospitals through patients transfers

Home Page: https://pascalcrepey.github.io/HospitalNetwork/

License: GNU General Public License v3.0

R 100.00%
hospital-networks patients-transfers patient-database

hospitalnetwork's Issues

FR: add fake GPS coordinates in fake database generation function

For testing purposes, we would need the fake data generator to also produce fake GPS.
As a first step, we can just ensure that the coordinates are "valid"; as a second step, being able to choose the country would be nice, in particular if we want to test the visNetwork UI in the Shiny interface.
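
A minimal sketch of what the first step could look like, in base R. The helper name fake_gps(), its arguments, and the column names lat and long are assumptions made for illustration; they are not part of the package. "Valid" is taken here to mean latitude in [-90, 90] and longitude in [-180, 180], and an optional bounding box stands in for "choosing the country".

```r
# Hypothetical helper: draw one random, valid coordinate pair per hospital.
# An optional bounding box narrows the range, e.g. to roughly cover a country.
fake_gps <- function(hospital_ids,
                     lat_range = c(-90, 90),
                     long_range = c(-180, 180)) {
  # Refuse bounding boxes that fall outside the valid coordinate ranges
  stopifnot(lat_range[1] >= -90, lat_range[2] <= 90,
            long_range[1] >= -180, long_range[2] <= 180)
  n <- length(hospital_ids)
  data.frame(
    hID  = hospital_ids,
    lat  = runif(n, lat_range[1], lat_range[2]),
    long = runif(n, long_range[1], long_range[2]),
    stringsAsFactors = FALSE
  )
}

set.seed(1)
# Illustrative bounding box only (rough values, not an official boundary)
gps <- fake_gps(c("h01", "h02", "h03"),
                lat_range = c(41, 51), long_range = c(-5, 10))
```

The points are drawn uniformly inside the box, so some may fall in the sea; for testing a map UI that is probably acceptable.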

Duplicate nodes in metrics table

When creating a HospiNet object from edgelists [i.e. HospiNet$new(edgelist, edgelist_long, window_threshold, nmoves_threshold, noloops)], the resulting object contains a $metricsTable that has more rows than the number of nodes. For a large number of the original nodes, the table also contains duplicated records with mostly NA values, except for hub_scores_global and hub_score_by_cluster_fast_greedy.

I suspect a merging step going wrong.

Interestingly, the function getMetricsTable() produces a metrics table that doesn't contain these NA duplicates.

Error when column name equals variable name

If I call the column containing the patient identifier "patientID", I get the following error message:

Error in checkFormat(new_base, patientID = patientID, hospitalID = hospitalID, :
The following column(s) is/are missing: pID. Please check that your database contains at least the columns mentionned in the documentation, and that they are in the right format. Names are case sensitive.

With any other column name, the script runs just fine.

The same happens for the other three required variables, e.g. if hospitalID = "hospitalID". For some reason, R gets confused when the variable name is the same as the column name.

I think there are two general solutions to the problem:

  1. The proper route: find out how this referencing of names produces the error and see if there is a better way of dealing with it (e.g. using ..patientID; I tried this, but it didn't solve all of the problems)
  2. Hard-wire a check to see if variable names and column names are the same, and rename them if so.
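
One way to sidestep the name clash altogether is to rename the columns by matching plain character strings, so that nothing is looked up through non-standard evaluation. The sketch below is in base R on a data.frame and is not the package's checkFormat(); the function name standardise_names() and the argument names admDate and disDate are assumptions, while the target names pID, hID, Adate and Ddate are the ones used in the minimal dataset.

```r
# Hypothetical sketch: rename user-supplied columns to the standard names
# by matching character strings, without non-standard evaluation.
standardise_names <- function(base, patientID, hospitalID,
                              admDate, disDate) {
  user_names <- c(patientID, hospitalID, admDate, disDate)
  std_names  <- c("pID", "hID", "Adate", "Ddate")
  missing_cols <- setdiff(user_names, names(base))
  if (length(missing_cols) > 0) {
    stop("The following column(s) is/are missing: ",
         paste(missing_cols, collapse = ", "))
  }
  # Positions of the user columns, in the order of std_names
  names(base)[match(user_names, names(base))] <- std_names
  base
}

db <- data.frame(patientID  = c("p1", "p2"),
                 hospitalID = c("h1", "h2"),
                 admDate    = as.Date(c("2020-01-01", "2020-01-05")),
                 disDate    = as.Date(c("2020-01-03", "2020-01-09")),
                 stringsAsFactors = FALSE)

# Argument names and column names coincide; the lookup stays unambiguous
db <- standardise_names(db, patientID = "patientID",
                        hospitalID = "hospitalID",
                        admDate = "admDate", disDate = "disDate")
```

Because the arguments are only ever used as strings, it makes no difference whether a column happens to share its name with an argument.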

Shiny app

I uploaded the code for the shiny app

The app can be launched by calling the function shiny_app()

This version of the shiny app is from a few weeks ago; I had some trouble integrating it with the package. Maybe there is still a better way to do it.
So the modifications and new features committed in the last few weeks might not be implemented in the app yet.

Anyway, I am looking forward to your feedback. Is everything working OK? I'd like to hear about the features you'd like to see added to the app, and any remarks about the design, the structure, the workflow, etc.

Columns for checking duplicated entries

In the function adjust_overlapping_stays, the call to unique(base) takes over half of the function's run time if there is a lot of auxiliary data. Is the intention to check for duplicates using the aux data as well? If not, adding by = key(base) cuts the run time significantly:

Unit: seconds
                            expr      min       lq     mean   median       uq      max neval
                { unique(base) } 30.65674 30.85987 31.83005 31.49651 32.37094 33.96304    10
{ unique(base, by = key(base)) } 10.43815 10.70215 12.75402 13.49867 14.22774 15.69120    10
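
Beyond speed, the two calls can give different results. The sketch below shows the difference in base R with duplicated() rather than data.table. The toy data, the auxiliary column ward, and the assumption that the key consists of pID, hID, Adate and Ddate are all illustrative.

```r
base <- data.frame(
  pID   = c("p1", "p1", "p2"),
  hID   = c("h1", "h1", "h2"),
  Adate = as.Date(c("2020-01-01", "2020-01-01", "2020-02-01")),
  Ddate = as.Date(c("2020-01-04", "2020-01-04", "2020-02-03")),
  ward  = c("A", "B", "A"),  # auxiliary column, differs between rows 1 and 2
  stringsAsFactors = FALSE
)
key_cols <- c("pID", "hID", "Adate", "Ddate")  # assumed key

# Comparing all columns keeps both p1 rows, because 'ward' differs
dedup_all <- base[!duplicated(base), ]

# Comparing only the key columns keeps the first p1 row and drops the second
dedup_key <- base[!duplicated(base[, key_cols]), ]
```

So restricting the comparison to the key is not only faster: it also treats stays that differ only in auxiliary data as duplicates, which is the design question raised above.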

Hospitals with no transfers get silently dropped

Not sure if this qualifies as a bug or not, but if there are hospitals present in the database that have admissions but no transfers, they are silently dropped from the transfer matrix. I'd expect a row (and column) of zeroes by default so that the same nodes are present in all elements of the hospinet object.
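
A sketch of the expected behaviour in base R, assuming an edgelist with columns origin and target (names chosen for illustration) and a separately known list of all hospitals. Fixing the factor levels before tabulating keeps the hospitals without transfers as zero rows and columns.

```r
edges <- data.frame(origin = c("h1", "h1", "h2"),
                    target = c("h2", "h2", "h1"),
                    stringsAsFactors = FALSE)

# All hospitals present in the database; h3 has admissions but no transfers
all_hospitals <- c("h1", "h2", "h3")

# Using the full set of hospitals as factor levels forces table() to
# produce a row and a column for every hospital, including h3
transfer_matrix <- unclass(table(
  origin = factor(edges$origin, levels = all_hospitals),
  target = factor(edges$target, levels = all_hospitals)
))
```

Here transfer_matrix is a 3 x 3 matrix in which the row and the column for h3 contain only zeroes.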

Unit test scripts

We need to develop a set of test scripts using testthat to ensure that our code base keeps working as it evolves.
For that, we need a small "fake" dataset with known expected output, against which the results of our scripts can be compared.

FR: function adding GPS coordinates to HospiNet object

It would be nice to be able to add GPS coordinates to HospiNet.
The function needs to:

  • check that the coordinates are valid

  • check that there is a match for each hospital ID in the database

  • add the GPS data under normalized column names in the main database (this is better than keeping it separately if one wants to split the network or compute the geographic distance of transfers).
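
The three requirements above could be sketched as follows in base R. This works on a plain data.frame rather than on a HospiNet object, and the function name add_gps() and the normalized column names lat and long are assumptions.

```r
# Hypothetical sketch: attach coordinates to the stays database.
# 'gps' must have columns hID, lat and long (one row per hospital).
add_gps <- function(base, gps) {
  stopifnot(all(c("hID", "lat", "long") %in% names(gps)))

  # 1. Check that the coordinates are valid
  bad <- is.na(gps$lat) | is.na(gps$long) |
    abs(gps$lat) > 90 | abs(gps$long) > 180
  if (any(bad)) {
    stop("Invalid coordinates for: ", paste(gps$hID[bad], collapse = ", "))
  }

  # 2. Check that every hospital ID in the database has a match
  unmatched <- setdiff(unique(base$hID), gps$hID)
  if (length(unmatched) > 0) {
    stop("No coordinates for: ", paste(unmatched, collapse = ", "))
  }

  # 3. Add the coordinates under normalized column names
  idx <- match(base$hID, gps$hID)
  base$lat  <- gps$lat[idx]
  base$long <- gps$long[idx]
  base
}

stays <- data.frame(pID = c("p1", "p1", "p2"),
                    hID = c("h1", "h2", "h1"),
                    stringsAsFactors = FALSE)
gps <- data.frame(hID  = c("h1", "h2"),
                  lat  = c(48.85, 45.76),   # arbitrary example values
                  long = c(2.35, 4.84),
                  stringsAsFactors = FALSE)
with_gps <- add_gps(stays, gps)
```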

Features you want to see in the package

Dear all,
obviously we do not want the package to do everything, but being able to do a few relatively simple things would be nice. So I think we need to define a feature set. For the moment, what we've implemented can be split into 4 parts:

  1. Validate and fix data
    • check basic data structure format
    • check and fix dates format
    • check and fix data consistency regarding overlapping stays
  2. Perform basic analysis and visualization
    • matrix visualization
    • degree histograms
    • cluster identification
  3. Interface with other tools
    • matrix format
    • edgelist format
    • igraph format
  4. Simulate real data
    • generate fake patient stays
    • generate fake patient stays with hospital clusters

Now, what else do you want to see in the package?
(I do have some ideas but I'll share them in comments... ;-)

New implementation of overlap management

As @ClementMassonnaud said, we need to look at the new version of the adjust_overlapping_stays() function proposed with commit #50200db

"[...] the two versions give the same results on all the tests I wrote. Considering the current version though, I'm not sure I understand the purpose of the iterator and how it works. I created a fake database with all the types of overlapping stays I could think of, with overlapping stays nested multiple times, etc., and it seems that the function adjusts for those stays correctly with only one pass in the while loop... Maybe you can take a look at the tests and see if I'm missing something.

Anyway, the new function works quite similarly to the current one, but with a slight change that allows adjusting partial overlaps with either admission or discharge leading. Also, it uses a bit more of the useful data.table syntax, which makes the function shorter.

If the new function is not missing anything important, I would propose to use it as a default, mainly for the two advantages I mentioned."

Include (flagged) direct transfers in edgelist_from_patient_database

The edgelist_from_patient_database function currently uses the time between discharge and admission to determine whether two records (n & n+1) constitute a transfer. This is based on the minimal dataset as defined in the NeWIS project (with just the variables pID, hID, Adate, and Ddate).

However, some datasets may have additional variable(s) that flag whether or not the admission (or discharge) is part of a direct transfer. I believe that in the French database these are mode_entree and mode_sortie.

The condition that two admissions are flagged as a direct transfer should be included in the function, but only if the variables exist and if the flag is part of the transfer definition in use at that moment.
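
A rough base R sketch of this idea, not the package's edgelist_from_patient_database(). It assumes a hypothetical logical column direct_transfer on the admitting record, since the actual coding of flag variables such as mode_entree and mode_sortie is not described here. It also treats the flag as an additional requirement on top of the time window, which is only one possible reading of the proposal.

```r
# Hypothetical sketch: build an edgelist from consecutive stays, optionally
# requiring a direct-transfer flag when that column exists.
flag_transfers <- function(stays, window_threshold = 0, use_flag = FALSE) {
  stopifnot(nrow(stays) >= 2)
  stays <- stays[order(stays$pID, stays$Adate), ]
  cur <- seq_len(nrow(stays) - 1)   # record n
  nxt <- cur + 1                    # record n + 1

  same_patient <- stays$pID[cur] == stays$pID[nxt]
  gap_days <- as.numeric(stays$Adate[nxt] - stays$Ddate[cur])
  transfer <- same_patient & gap_days <= window_threshold

  # Apply the flag only if requested AND the column is present
  if (use_flag && "direct_transfer" %in% names(stays)) {
    transfer <- transfer & stays$direct_transfer[nxt]
  }
  data.frame(origin = stays$hID[cur], target = stays$hID[nxt],
             stringsAsFactors = FALSE)[transfer, ]
}

stays <- data.frame(
  pID   = c("p1", "p1", "p2", "p2"),
  hID   = c("h1", "h2", "h1", "h3"),
  Adate = as.Date(c("2020-01-01", "2020-01-05", "2020-02-01", "2020-02-03")),
  Ddate = as.Date(c("2020-01-05", "2020-01-10", "2020-02-03", "2020-02-08")),
  direct_transfer = c(FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)
by_time <- flag_transfers(stays)                   # time window only
by_flag <- flag_transfers(stays, use_flag = TRUE)  # time window and flag
```

With the time window alone both patients contribute an edge; with the flag switched on, only the flagged admission of p1 to h2 remains.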

Backward compatibility

As part of a dashboard development I noticed the following:

RDS files created using older versions of the package don't allow for the node and ring colouring in the circular network plot, even when using the newer version of the package. This is because the plotting function is stored in the HospiNet object, and the older plotting function doesn't recognise the extra arguments.

To solve this, I propose creating a version of the $new() function that takes the entire old object and copies its network information into a current HospiNet object, which does contain the current plotting function.

As an addition, I'd like to propose adding an attribute to the object that contains the version number, so that any application using the package can check whether the input object was created with the most recent version.
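
A small sketch of the version check, using a plain list as a stand-in for the HospiNet object. The helper names stamp_version() and is_outdated() and the version number are hypothetical; inside the package the current version would more likely come from utils::packageVersion().

```r
# Hypothetical current version, hard-coded for this sketch
current_version <- package_version("1.2.0")

# Record the version the object was created with
stamp_version <- function(net, version = current_version) {
  attr(net, "version") <- as.character(version)
  net
}

# An object is outdated if it has no version stamp or an older one
is_outdated <- function(net, version = current_version) {
  v <- attr(net, "version")
  is.null(v) || package_version(v) < version
}

old_net <- list(matrix = matrix(0, 2, 2))           # saved before stamping existed
new_net <- stamp_version(list(matrix = matrix(0, 2, 2)))

is_outdated(old_net)
is_outdated(new_net)
```

A dashboard could call such a check right after readRDS() and rebuild the object when it is outdated.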

S3 object HospiNet

We need to build an S3 object, let's call it HospiNet, which will contain the hospital matrix and relevant indicators (see vignette for details). This object should have a dedicated summary and print method.
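
As a starting point, a bare-bones S3 class with its print and summary methods might look like the sketch below. The constructor name new_hospinet() and the fields n_movements and density are placeholders; the actual indicators are the ones listed in the vignette.

```r
# Hypothetical constructor: wraps a square transfer matrix in an S3 object
new_hospinet <- function(matrix) {
  stopifnot(is.matrix(matrix), nrow(matrix) == ncol(matrix))
  structure(
    list(matrix = matrix,
         n_facilities = nrow(matrix),
         n_movements = sum(matrix)),
    class = "HospiNet"
  )
}

# Dedicated print method: a one-line overview
print.HospiNet <- function(x, ...) {
  cat("HospiNet with", x$n_facilities, "facilities and",
      x$n_movements, "movements\n")
  invisible(x)
}

# Dedicated summary method: returns a few indicators as a list
summary.HospiNet <- function(object, ...) {
  list(n_facilities = object$n_facilities,
       n_movements = object$n_movements,
       density = mean(object$matrix > 0))  # share of non-zero cells
}

m <- matrix(c(0, 2, 1, 0), nrow = 2,
            dimnames = list(c("h1", "h2"), c("h1", "h2")))
net <- new_hospinet(m)
print(net)
s <- summary(net)
```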

How overlapping stays are adjusted

I have a question regarding overlapping stays.

@tjibbed I didn't go through your code in detail. Can you specify what types of overlapping stays we want to look at, and how we adjust for them, so that I can explain it in the vignettes?

I think there are two kinds of overlapping that could occur:

  1. [A.......{B.........A].........B}
     t1____t2____t3_____t4
  2. [A.......{B.........B}.........A]

Is that correct? Am I forgetting something?

As for how to deal with them, here's how I see it:

  • For 1: either
    • [A.....A]{B....................B}
      t1____t2____t3_____t4
    • [A.................A]{B........B}
  • For 2: either
    • [A......A]{B.....B}[A.......A]
      t1____t2____t3_____t4

What are we doing for now? Am I forgetting something?
Thanks!
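
To make the cases concrete, here is a toy base R sketch for a single pair of stays, with numeric times standing in for dates. It implements the first option for case 1 (A is cut at B's admission) and the split shown for case 2. It is an illustration only, not what adjust_overlapping_stays() does, and the function name adjust_pair() is hypothetical.

```r
# Hypothetical sketch: a and b are lists with fields adm and dis,
# and a is assumed to start no later than b (a$adm <= b$adm).
adjust_pair <- function(a, b) {
  stopifnot(a$adm <= b$adm)
  if (b$adm >= a$dis) {
    return(list(a, b))            # no overlap, nothing to adjust
  }
  if (b$dis >= a$dis) {
    a$dis <- b$adm                # case 1: cut A at B's admission
    return(list(a, b))
  }
  # case 2: B is nested in A, so A is split around B
  a_before <- a
  a_before$dis <- b$adm
  a_after <- a
  a_after$adm <- b$dis
  list(a_before, b, a_after)
}

# Case 1 with t1..t4 = 1..4: A = [1, 3], B = [2, 4]
partial <- adjust_pair(list(adm = 1, dis = 3), list(adm = 2, dis = 4))
# Case 2: A = [1, 4], B = [2, 3]
nested <- adjust_pair(list(adm = 1, dis = 4), list(adm = 2, dis = 3))
```

In the first example A becomes [1, 2] and B is unchanged; in the second the result is three stays: A [1, 2], B [2, 3], A [3, 4].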

Checking base during edgelist_from_base()

There is a bit of code in NetworkBuilding.R that has some adverse effects:

## Checking base
message("Checking base...")
base = try({
    checkBase(base)
})
if (class(base)[[1]] == "try-error") {
    stop("Cannot compute the network: the database is not correctly formated or contains errors. The database must first be checked with the function 'checkBase()'. See the vignettes for more details on the workflow of the package.")
}

I think our workflow is designed to allow checkBase() to be performed separately, to allow the users to think about what is wrong in or with the database. However, the above code forces the complete function to be run as a first step, even if the input data has already been checked.

To me there seem to be three options here:
1- Don't check the database in this function (which creates the possibility that the user will pass an unchecked database to the function, with all potential problems that it might create).
2- Create an input flag for checking the database. (No guarantee that the above will not happen, but by actively having to set it to FALSE the user at least has to think about it).
3- Leave it as it is. This can be rather slow, as the database checking does take quite a bit of time for larger databases. On the other hand, it does create a fully integrated function to move from raw data to network in one go... The question is whether this is what we want.
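
Option 2 could look roughly like the sketch below. The function name build_edgelist(), the argument check, and the check_fun parameter (a stand-in for checkBase() so that the example runs on its own) are all hypothetical, and the network construction itself is left out.

```r
# Hypothetical sketch of option 2: checking is on by default and must be
# switched off explicitly for a database that has already been checked.
build_edgelist <- function(base, check = TRUE, check_fun = identity) {
  if (check) {
    message("Checking base...")
    base <- tryCatch(
      check_fun(base),
      error = function(e) {
        stop("Cannot compute the network: the database failed the check (",
             conditionMessage(e), ")", call. = FALSE)
      }
    )
  }
  # ... network construction would follow here; omitted in this sketch
  base
}

# Count how often the (stand-in) check is actually run
calls <- 0
counting_check <- function(b) {
  calls <<- calls + 1
  b
}

db <- data.frame(pID = "p1", hID = "h1", stringsAsFactors = FALSE)
res_checked   <- build_edgelist(db, check_fun = counting_check)
res_unchecked <- build_edgelist(db, check = FALSE, check_fun = counting_check)
```

After both calls the check has run exactly once, for the call that left the flag at its default.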

I'd prefer option 1 or 2, but I'd like to hear your thoughts before adjusting the code.

Summary statistics in HospiNet object are actually computed from the original base

Some of the statistics in HospiNet object are actually computed from the original base. They are computed from the base with the functions all_admissions_summary() and per_facility_summary() and then passed to the object.

For me, this is an issue. For instance, when accessing the number of facilities with net$numFacilities, we actually get the number of facilities in the original base, not in the constructed network, which is what I would expect (the latter is actually given by net$n_facilities). The same applies to the rest of the summary statistics.

I would suggest making a clear distinction between the statistics computed from the network and those computed from the database. I think that only the statistics related to the network should be in the HospiNet object. The statistics related to the database can be computed externally using the functions.

Fake data

We need a fake dataset corresponding to the description given in the vignette. Our functions will be tested against this fake dataset to ensure their compliance.
