
Comments (2)

schuemie commented on June 1, 2024

I'd like to push back on this a bit for the following reasons:

  1. In our current environment, the data fetch typically takes no more than half an hour, which seems a reasonable time for a feasibility study.

  2. In the develop branch I've already added the option to sample the cohorts prior to fitting the PS model. Fitting the PS model can take up to 2 days in our environment, so sampling seems more helpful there.

  3. The way to sample is not entirely obvious to me: we could sample uniformly across T and C, but if T and C are of very different sizes we might shrink one cohort too much. If we sample T and C separately (as done in the createPs function mentioned above), the ratio between them changes, making interpretation difficult (in the createPs function this is automatically corrected for).
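To make the trade-off in point 3 concrete, here is a minimal Python sketch of the two sampling strategies. It is only an illustration, not how CohortMethod or createPs implements sampling; the function names `sample_pooled` and `sample_separately`, the cohort sizes, and the sample sizes are all hypothetical.

```python
import random


def sample_pooled(target, comparator, n, seed=0):
    """Draw n subjects uniformly from the pooled T and C cohorts.

    The T:C ratio is preserved in expectation, but the smaller cohort
    can end up with very few subjects.
    """
    rng = random.Random(seed)
    pooled = [("T", s) for s in target] + [("C", s) for s in comparator]
    drawn = rng.sample(pooled, min(n, len(pooled)))
    t = [s for label, s in drawn if label == "T"]
    c = [s for label, s in drawn if label == "C"]
    return t, c


def sample_separately(target, comparator, n, seed=0):
    """Draw up to n subjects from T and up to n from C independently.

    Neither cohort is shrunk below min(n, its size), but the T:C ratio
    in the sample no longer matches the ratio in the full data.
    """
    rng = random.Random(seed)
    t = rng.sample(target, min(n, len(target)))
    c = rng.sample(comparator, min(n, len(comparator)))
    return t, c


# Hypothetical cohorts: T is 100 times smaller than C.
target = list(range(1_000))
comparator = list(range(1_000, 101_000))

t_pooled, c_pooled = sample_pooled(target, comparator, 2_000)
t_sep, c_sep = sample_separately(target, comparator, 1_000)

print("pooled:    ", len(t_pooled), "T,", len(c_pooled), "C")
print("separately:", len(t_sep), "T,", len(c_sep), "C")
```

With pooled sampling, only about 1% of the drawn subjects are expected to come from T, so T shrinks to a handful of subjects. With separate sampling, T keeps all 1,000 subjects but the sample has a 1:1 ratio instead of the original 1:100, which is the interpretation problem mentioned above.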

Maybe we could just introduce a generic function that generates new cohorts by randomly sampling from other cohorts? It doesn't solve problem 3, but at least it puts the responsibility in the hands of the user.

from cohortmethod.

schuemie commented on June 1, 2024

OK, I added the option anyway (I needed it for the method evaluation, where some cohorts had more than 7 million subjects).

In the current development version getDbCohortMethodData has a new argument maxCohortSize. If set to a value >0, both target and comparator cohort will be restricted to this size (through random sampling): a54d834

Of course, this argument has also percolated to the createGetDbCohortMethodDataArgs function.
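The restriction described above can be sketched in a few lines of Python. This is a rough illustration of the stated behavior, not the package's R implementation: the helper name `restrict_cohort` is hypothetical, and it assumes that sampling is without replacement and that a cohort already at or below the limit is left unchanged.

```python
import random


def restrict_cohort(subject_ids, max_cohort_size, seed=None):
    """Return at most max_cohort_size subjects, chosen at random.

    Assumptions (not confirmed by the source): a value of 0 or less
    disables the restriction, sampling is without replacement, and
    cohorts no larger than the limit are returned unchanged.
    """
    subject_ids = list(subject_ids)
    if max_cohort_size <= 0 or len(subject_ids) <= max_cohort_size:
        return subject_ids
    rng = random.Random(seed)
    return rng.sample(subject_ids, max_cohort_size)


# Hypothetical cohorts; the same limit is applied to each one separately.
target = list(range(50_000))
comparator = list(range(50_000, 58_000))

target_sample = restrict_cohort(target, 10_000, seed=42)
comparator_sample = restrict_cohort(comparator, 10_000, seed=42)
unrestricted = restrict_cohort(target, 0)

print(len(target_sample), len(comparator_sample), len(unrestricted))
```

Because the limit applies to each cohort on its own, a large target cohort is cut down while a smaller comparator cohort can pass through untouched, so the ratio between the two may change, as discussed in point 3 of the earlier comment.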

