Comments (5)
Hi @ClementMassonnaud ,
I'm currently checking your new implementation of the overlapping admission algorithm. (I am using a subset of a real dataset, so no simulated data at this moment). I've run into a couple of issues that might need our attention:
- It's quite slow. The original method takes:
user system elapsed
1.413 0.000 0.751
whereas the new implementation times at:
user system elapsed
84.214 0.039 83.084
I think this is mainly due to the "per subject" calculation of overlaps, although I haven't checked this. I used to use the by=sID method myself before switching to the full list method. @PascalCrepey taught me this, and I guess you know it too. It's quite fast: you calculate all differences between record N and N+1 in one go, then exclude any differences where the sID of record N is not the sID of record N+1. Isn't that possible here as well?
over = base[, list("left" = .SD[, Adate][-1] - .SD[, Ddate][-.N], "right" = .SD[, Ddate][-1] - .SD[, Ddate][-.N], "I" = .I[-1]), by = sID]
- The checked database isn't final. It can still contain overlapping admissions, and re-running the function on the corrected database sometimes delivers a new corrected database that is slightly different.
This can be caused by staggered hospital stays or sequential overlaps, e.g.
Facility A: |--------------------------------------------|
Facility B: |----| |----| |----|
Because overlap detection is based on successive records, sorted on patient, admission, and discharge, the algorithm will only detect and correct the first instance, leaving two overlaps:
Facility A: |--------| |------------------------------|
Facility B: |----| |----| |----|
On the next run, the next overlap is detected and corrected:
Facility A: |--------| |-------| |-------------------|
Facility B: |----| |----| |----|
Etc.
This is why the original method uses iterations, and I think the new method could do the same.
- Some of the newly created records have negative lengths of stay. This is not a big issue, as they can easily be filtered out.
- If I run the algorithm until it gives the final answer, that answer is not the same as the original method's: the new method gives far more records. I am currently going through the database to see what the correct answer should be. As it is real data, this does take some time, but I think the extra "dirt" in the real data may help us find the potential issues here.
We might need to think of a way to also use a real dataset (or subset thereof) to validate the methods.
- Minor issue: it currently doesn't carry over auxiliary data. These might be needed further on, but this is relatively easy to implement.
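Two of the points above (filtering out negative lengths of stay, and comparing the output of the two methods) can be sketched with data.table's set operations. This is only an illustration under assumptions: `res_old` and `res_new` are made-up stand-ins for the two outputs, `fID` is an assumed facility column name, and dates are written as integer day numbers for brevity.

```r
library(data.table)

# Hypothetical outputs of the original and the new method for the same input.
res_old <- data.table(
  sID   = "p1",
  fID   = c("A", "B", "A"),
  Adate = c(1L, 5L, 10L),
  Ddate = c(5L, 10L, 50L)
)
res_new <- data.table(
  sID   = "p1",
  fID   = c("A", "B", "A", "A", "C"),
  Adate = c(1L, 5L, 10L, 12L, 60L),
  Ddate = c(5L, 10L, 50L, 11L, 70L)   # 4th record has a negative length of stay
)

# Drop records whose discharge precedes their admission.
res_new_clean <- res_new[Ddate >= Adate]

# Rows present in one result but not in the other (row order is ignored).
only_new <- fsetdiff(res_new_clean, res_old)
only_old <- fsetdiff(res_old, res_new_clean)

# TRUE only if both results contain exactly the same records.
same <- fsetequal(res_old, res_new_clean)
```

Inspecting `only_new` and `only_old` on a real subset shows directly which records the two methods disagree on, which may be quicker than browsing the full database.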
from hospitalnetwork.
My "beautiful" schematic of overlapping admission didn't align there... But I think you get the idea.
from hospitalnetwork.
Hi,
Thank you for your feedback.
It can still contain overlapping admissions, and re-running the function on the corrected database sometimes delivers a new corrected database that is slightly different. This can be caused by staggered hospital stays or sequential overlaps, e.g.
I tried to think about the different types of possible overlaps, but I missed this type. I now see what the issue is. I usually try to avoid 'while' loops as much as possible, but here it seems we don't have much choice.
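To make the need for iteration concrete, here is a minimal sketch of a single correction pass applied repeatedly until nothing changes. It is not the package's implementation: `fix_once()` and `fix_all()` are hypothetical helpers, `fID` is an assumed facility column name, and dates are integer day numbers for brevity. The input is the staggered example from the first comment (one long stay in A, three short stays in B).

```r
library(data.table)

# Hypothetical single pass: compare successive records and correct the
# overlaps that are visible between record N and record N+1.
fix_once <- function(base) {
  base <- copy(base)
  setorder(base, sID, Adate, Ddate)
  n <- nrow(base)
  if (n < 2L) return(list(base = base, n_fixed = 0L))
  idx <- which(base$sID[-1] == base$sID[-n] & base$Adate[-1] < base$Ddate[-n])
  if (length(idx) == 0L) return(list(base = base, n_fixed = 0L))
  cur <- base[idx]        # record N (a copy)
  nxt <- base[idx + 1L]   # record N+1
  # Remainder of record N after N+1 ends, kept only if N outlasts N+1.
  cur[, Adate := nxt$Ddate]
  tail_part <- cur[Adate < Ddate]
  # Truncate record N so that it ends when N+1 starts.
  base[idx, Ddate := nxt$Adate]
  base <- rbindlist(list(base, tail_part))
  setorder(base, sID, Adate, Ddate)
  list(base = base, n_fixed = length(idx))
}

# Repeat the pass until it finds nothing to fix (bounded, to be safe).
fix_all <- function(base, max_iter = 100L) {
  for (iter in seq_len(max_iter)) {
    res <- fix_once(base)
    base <- res$base
    if (res$n_fixed == 0L) return(list(base = base, passes = iter))
  }
  stop("overlaps remain after max_iter passes")
}

staggered <- data.table(
  sID   = "p1",
  fID   = c("A", "B", "B", "B"),
  Adate = c(1L, 5L, 20L, 35L),
  Ddate = c(50L, 10L, 25L, 40L)
)
one_pass <- fix_once(staggered)
final    <- fix_all(staggered)
```

In this toy case a single pass fixes only the first overlap, because the remainder of the stay in A only becomes adjacent to the second stay in B after re-sorting. The bounded `for` loop plays the role of the 'while' loop while guarding against a pass that never converges.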
We might need to think of a way to also use a real dataset (or subset thereof) to validate the methods.
Yes, I think it is a key point, to make sure the function deals correctly with every possible scenario. I wrote tests in tests/testthat/test-adjust_overlaps.R. I manually created a fake database with all the specific overlaps I could think of, listing them, and anticipating what the correct result should be. I think that it would be a good idea to use the other types of overlaps from a real database, as you suggested, and then 'merge' them with the current test database to complete the tests.
Minor issue: it currently doesn't carry over auxiliary data. These might be needed further on, but this is relatively easy to implement.
I'm not sure I understand what you mean by auxiliary data. If by that you mean additional columns in the database, I don't understand where the issue is.
basically you calculate all differences between record N and N+1 and exclude any differences where sID in N is not the sID in N+1. Isn't that possible here as well?
Yes, this is interesting; I missed that. I don't see why it would not be possible here as well. I am currently on the road, but I will look into it ASAP.
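For reference, a minimal sketch of the "record N versus N+1" comparison over the whole table, without grouping by subject. The column names sID, Adate and Ddate follow the snippet in the first comment; `fID` and the toy data are assumptions made for illustration.

```r
library(data.table)

# Toy data: p1 has a stay in B nested inside a stay in A; p2 has no overlap.
base <- data.table(
  sID   = c("p1", "p1", "p1", "p2", "p2"),
  fID   = c("A", "B", "C", "A", "B"),
  Adate = as.Date(c("2019-01-01", "2019-01-05", "2019-01-20",
                    "2019-01-03", "2019-01-10")),
  Ddate = as.Date(c("2019-01-10", "2019-01-07", "2019-01-25",
                    "2019-01-08", "2019-01-12"))
)
setorder(base, sID, Adate, Ddate)

n <- nrow(base)

# Does record N+1 belong to the same subject as record N?
same_subject <- base$sID[-1] == base$sID[-n]

# Days between discharge of record N and admission of record N+1;
# negative means N+1 starts before N has ended.
gap <- as.numeric(base$Adate[-1] - base$Ddate[-n])

# Row numbers of the records (N+1) that overlap their predecessor.
overlap_rows <- which(same_subject & gap < 0) + 1L
```

The comparison between the last record of p1 and the first record of p2 is computed as well, but `same_subject` masks it out, which is what avoids the per-subject grouping.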
from hospitalnetwork.
It seems to me we need to stay with @tjibbed's original implementation, with the loop. Am I right? Does it need further optimization? Can we merge @ClementMassonnaud's data.table optimization into @tjibbed's implementation? Or should we simply close this issue?
from hospitalnetwork.
Yes, I think we do need the loop.
But I think we should also implement the optimizations I proposed.
I'll merge the two versions and close this issue ASAP.
from hospitalnetwork.