Comments (5)

tjibbed commented on July 26, 2024

Hi @ClementMassonnaud ,

I'm currently checking your new implementation of the overlapping admission algorithm. (I am using a subset of a real dataset, so no simulated data at this moment). I've run into a couple of issues that might need our attention:

  • It's quite slow. The original method takes:

user system elapsed
1.413 0.000 0.751

whereas the new implementation times at:

user system elapsed
84.214 0.039 83.084

I think this is mainly due to the "per subject" calculation of overlaps, although I haven't checked this. I used to use the by=sID method myself before switching to the full list method. @PascalCrepey taught me this, and I guess you know it too. It's quite fast: basically you calculate all differences between record N and N+1 and exclude any differences where sID in N is not the sID in N+1. Isn't that possible here as well?

over = base[, list("left" = .SD[, Adate][-1] - .SD[, Ddate][-.N], "right" = .SD[, Ddate][-1] - .SD[, Ddate][-.N], "I" = .I[-1]), by = sID]
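A minimal sketch of the "full list" idea described above, written in base R so it runs without extra packages. The column names sID, Adate and Ddate follow the snippet above; the toy data, the variable names and the rule that a negative gap counts as an overlap are assumptions for illustration, not the package's actual code. With data.table, the same neighbour comparison could presumably be written with shift().

```r
# Hypothetical toy data: two subjects, five stays.
base <- data.frame(
  sID   = c("p1", "p1", "p1", "p2", "p2"),
  Adate = as.Date(c("2020-01-01", "2020-01-05", "2020-01-20",
                    "2020-01-03", "2020-01-10")),
  Ddate = as.Date(c("2020-01-10", "2020-01-08", "2020-01-25",
                    "2020-01-06", "2020-01-12"))
)

# Sort on patient, admission, discharge so that neighbours are successive stays.
base <- base[order(base$sID, base$Adate, base$Ddate), ]
n <- nrow(base)

# Compare record N (rows 1..n-1) with record N+1 (rows 2..n) in one vectorised step.
same_subject <- base$sID[-1] == base$sID[-n]
# Days between the discharge of N and the admission of N+1.
# Assumption: a negative gap means N+1 starts before N has ended.
gap_days <- as.numeric(base$Adate[-1] - base$Ddate[-n])

# Positions (in the sorted table) of records that overlap their predecessor.
# Differences across two different subjects are excluded by same_subject.
overlap_rows <- which(same_subject & gap_days < 0) + 1L
print(base[overlap_rows, ])
```

There is no grouping step here: the whole table is processed at once, and the comparison between the last stay of one subject and the first stay of the next is simply discarded.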

  1. The checked database isn't final. It can still contain overlapping admissions, and re-running the function on the corrected database sometimes delivers a new corrected database that is slightly different.
    This can be caused by staggered hospital stays or sequential overlaps, e.g.
    Facility A: |--------------------------------------------|
    Facility B: |----| |----| |----|

Because overlap detection is based on successive records, sorted on patient, admission, and discharge, the algorithm will only detect and correct the first instance, leaving two overlaps:
Facility A: |--------| |------------------------------|
Facility B: |----| |----| |----|

On the next run, the next overlap is detected and corrected:
Facility A: |--------| |-------| |-------------------|
Facility B: |----| |----| |----|

Etc.

This is why the original method uses iterations, and I think the same could be done in the new method.

  2. Some of the newly created records have negative lengths of stay. This is not a big issue, as they can easily be filtered out.

  3. If I run the algorithm until it gives the final answer, the answer is not the same as that of the original method. The new method gives far more records. I am currently going through the database to see what the correct answer should be. As it is real data, this does take some time, but I think the extra "dirt" in the real data may help us find the potential issues here.

We might need to think of a way to also use a real dataset (or subset thereof) to validate the methods.

  4. Minor issue: it currently doesn't carry over auxiliary data. These might be needed further on, but this is relatively easy to implement.

from hospitalnetwork.

tjibbed commented on July 26, 2024

My "beautiful" schematic of overlapping admission didn't align there... But I think you get the idea.


ClementMassonnaud commented on July 26, 2024

Hi,
Thank you for your feedback.

> It can still contain overlapping admissions, and re-running the function on the corrected database sometimes delivers a new corrected database that is slightly different. This can be caused by staggered hospital stays or sequential overlaps, e.g.

I tried to think about the different types of possible overlaps, but I missed this type. I now see what the issue is. I usually try to avoid 'while' loops as much as possible, but here it seems we don't have much choice.
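To make the need for a loop concrete, here is a rough sketch in base R of repeating a neighbour-based correction pass until nothing changes. Everything in it is hypothetical: fix_once() and adjust_until_stable() are illustrative names rather than functions from the package, dates are plain day numbers, fID is an assumed facility column, and the pass only handles the case from the schematic, where record N+1 lies entirely inside record N.

```r
# One correction pass (hypothetical helper). If record N+1 lies inside record N,
# record N is cut short at the admission of N+1 and a new record is added for
# the part of N that resumes after N+1 is discharged.
fix_once <- function(d) {
  d <- d[order(d$sID, d$Adate, d$Ddate), ]
  rownames(d) <- NULL
  n <- nrow(d)
  if (n < 2L) return(list(data = d, n_fixed = 0L))
  prev <- seq_len(n - 1L)
  nxt <- prev + 1L
  hit <- d$sID[nxt] == d$sID[prev] &
    d$Adate[nxt] < d$Ddate[prev] &
    d$Ddate[nxt] <= d$Ddate[prev]
  p <- prev[hit]
  q <- nxt[hit]
  tail_part <- d[p, , drop = FALSE]   # copy of record N, keeps its discharge date
  tail_part$Adate <- d$Ddate[q]       # ...and resumes when N+1 is discharged
  d$Ddate[p] <- d$Adate[q]            # record N now ends when N+1 is admitted
  # Drop zero-length remainders created by the split.
  tail_part <- tail_part[tail_part$Ddate > tail_part$Adate, , drop = FALSE]
  out <- rbind(d, tail_part)
  rownames(out) <- NULL
  list(data = out, n_fixed = length(p))
}

# Repeat the pass until it finds nothing to fix (R's repeat plays the role of
# the 'while' loop), with a guard against running forever.
adjust_until_stable <- function(d, max_iter = 100L) {
  passes <- 0L
  repeat {
    res <- fix_once(d)
    d <- res$data
    if (res$n_fixed == 0L) break
    passes <- passes + 1L
    if (passes >= max_iter) {
      warning("stopped at max_iter; overlaps may remain")
      break
    }
  }
  list(data = d, passes = passes)
}

# The schematic from the first comment: one long stay in facility A with three
# short stays in facility B inside it (day numbers are made up).
stays <- data.frame(
  sID = "p1",
  fID = c("A", "B", "B", "B"),
  Adate = c(1, 10, 30, 50),
  Ddate = c(100, 15, 35, 55)
)
out <- adjust_until_stable(stays)
print(out$passes)
print(out$data)
```

Each pass only sees the first contained stay, because the remaining ones are neighbours of each other rather than of the long stay; the split then moves the remainder of the long stay next to the following short stay, which is what the next pass picks up.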

> We might need to think of a way to also use a real dataset (or subset thereof) to validate the methods.

Yes, I think it is a key point, to make sure the function deals correctly with every possible scenario. I wrote tests in tests/testthat/test-adjust_overlaps.R. I manually created a fake database with all the specific overlaps I could think of, listing them, and anticipating what the correct result should be. I think that it would be a good idea to use the other types of overlaps from a real database, as you suggested, and then 'merge' them with the current test database to complete the tests.

> Minor issue: it currently doesn't carry over auxiliary data. These might be needed further on, but this is relatively easy to implement.

I'm not sure I understand what you mean by auxiliary data. If by that you mean additional columns in the database, I don't understand where the issue is.

> basically you calculate all differences between record N and N+1 and exclude any differences where sID in N is not the sID in N+1. Isn't that possible here as well?

Yes this is interesting, I missed that. I don't see why it would not be possible here as well. I am currently on the road, I will look into it ASAP.


PascalCrepey commented on July 26, 2024

It seems to me we need to stay with @tjibbed's original implementation, with the loop. Am I right? Does it need further optimization? Can we merge @ClementMassonnaud's data.table optimization into @tjibbed's implementation, or should we simply close this issue?


ClementMassonnaud commented on July 26, 2024

Yes, I think we do need the loop.
But I think we should also implement the optimizations I proposed.
I'll merge the two versions and close this issue ASAP.
