
Comments (5)

sbrugman commented on May 18, 2024

@csala Any thoughts on this proposal?

from sdgym.

sbrugman commented on May 18, 2024

@leix28 @katxiao would a PR be welcome?


csala commented on May 18, 2024

Hi @sbrugman, I think that, for now, the exact change you are proposing is not within the current SDGym roadmap, but some variation of it is:

My suggestion would be to make the following changes:

  • All synthesizers should inherit from a synthesizer base class (Baseline)
  • All synthesizers should implement a separate fit and sample method

To add some more context: the required input is a function instead of a class because wrapping a class-based synthesizer that follows the fit/sample abstraction in a single function (one that receives the real data, runs fit and sample internally, and returns a synthetic clone) is far easier than the opposite approach of adapting a function-based synthesizer to the fit/sample abstraction. Also, the current implementation of SDGym already supports class-based synthesizers that inherit from the Baseline class, so making this a hard requirement would not expand the support; it would only narrow it.
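To illustrate the wrapping direction described above, here is a minimal sketch. It deliberately does not use SDGym itself: `ResamplingSynthesizer`, `as_function`, and the single-argument `synthesize(real_data)` signature are all hypothetical names chosen for this example, and the actual synthesizer function signature expected by SDGym may take additional arguments.

```python
import random


class ResamplingSynthesizer:
    """Hypothetical class-based synthesizer that follows fit/sample."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._rows = None

    def fit(self, real_data):
        # "Training" here only memorizes the rows; a real model would learn from them.
        self._rows = list(real_data)

    def sample(self, num_rows):
        if self._rows is None:
            raise RuntimeError("fit must be called before sample")
        return [self._rng.choice(self._rows) for _ in range(num_rows)]


def as_function(synthesizer_cls, **init_kwargs):
    """Wrap a fit/sample class into a single function (assumed signature)."""

    def synthesize(real_data):
        model = synthesizer_cls(**init_kwargs)
        model.fit(real_data)
        # Return a synthetic clone with the same number of rows as the input.
        return model.sample(len(real_data))

    return synthesize


resampling_synthesizer = as_function(ResamplingSynthesizer, seed=42)
real = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
synthetic = resampling_synthesizer(real)
```

The wrapper needs only a few lines because the class already separates training from sampling. Going the other way would require splitting an opaque function into two phases, which is generally not possible without rewriting it.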

On the other hand, it is true that this support for class-based synthesizers is not really documented, so adding it to the docs would be interesting!

More interestingly, this structure allows for capturing valuable metrics that are currently out of reach related to fit/sampling time and complexity (time measurements or maybe even this package). SDGym would this way be able to benchmark this aspect of a synthesizer as well, which can be an important decision criterion for which synthesizer is best for a given use case: if the user expects to sample large quantities of data then a longer fitting time would be acceptable at a lower sampling complexity.

This is another story, and it could actually be interesting too! An option that would be acceptable without sacrificing the functional input would be to modify the code to capture such metrics only when the provided synthesizer is a Baseline subclass. We could make it so that model_time stays the same and is always reported for all synthesizers, but for Baseline subclasses two new columns, fit_time and sample_time, are added to the output table.
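A rough sketch of that conditional timing could look like the following. The `Baseline` class here is only a stand-in with an assumed fit/sample interface, and `timed_run` and `FirstRows` are hypothetical names; none of this is taken from the SDGym code base.

```python
import time


class Baseline:
    """Stand-in for the Baseline class mentioned above (assumed interface)."""

    def fit(self, real_data):
        raise NotImplementedError

    def sample(self, num_rows):
        raise NotImplementedError


class FirstRows(Baseline):
    """Toy subclass: returns the first rows of the training data."""

    def fit(self, real_data):
        self._rows = list(real_data)

    def sample(self, num_rows):
        return self._rows[:num_rows]


def timed_run(synthesizer, real_data):
    """Run a synthesizer and collect timings.

    model_time is always reported; fit_time and sample_time are added
    only when the synthesizer is a Baseline subclass.
    """
    if isinstance(synthesizer, type) and issubclass(synthesizer, Baseline):
        model = synthesizer()
        start = time.perf_counter()
        model.fit(real_data)
        fitted = time.perf_counter()
        synthetic = model.sample(len(real_data))
        end = time.perf_counter()
        timings = {
            "model_time": end - start,
            "fit_time": fitted - start,
            "sample_time": end - fitted,
        }
    else:
        # Plain function: only the total time can be observed from outside.
        start = time.perf_counter()
        synthetic = synthesizer(real_data)
        timings = {"model_time": time.perf_counter() - start}
    return synthetic, timings


data = [[1, 2], [3, 4], [5, 6]]
class_synthetic, class_timings = timed_run(FirstRows, data)
fn_synthetic, fn_timings = timed_run(lambda rows: list(rows), data)
```

With this shape, function-based synthesizers keep working unchanged, and the extra columns would simply be empty for them in the output table.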


sbrugman commented on May 18, 2024

Hi Carles, thanks for getting back to this. The clarification on why you would not like to impose the fit/sample abstraction on users is helpful. The backwards-compatibility argument also makes sense. Good to know that we can move forward by documenting the class-based synthesizers and by conditionally extending the benchmark with the extra metrics when the implementation is Baseline-based.


npatki commented on May 18, 2024

Hello, I'm jumping in here a few years after this initial conversation. Since 2021, we have made significant updates to the usage/API of SDGym as well as its functionality. And I believe that some key features that were discussed here have now been incorporated into the library.

  1. You can now supply a custom synthesizer by supplying 2 separate functions for fit and sample. For more information, see the custom synthesizer user guide. We will automatically use these functions to create a class for you.
  2. The benchmarking script now reports more granular results, such as time (fit time, sample time, evaluation time) and memory usage. See interpreting the results.
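As a rough illustration of the first point, the sketch below shows one way a class could be built from two separate functions. This is not SDGym's implementation or API: `make_synthesizer_class`, `fit_column_means`, and `sample_column_means` are hypothetical names, and the function signatures are assumptions. The custom synthesizer user guide describes the actual interface.

```python
def make_synthesizer_class(name, fit_fn, sample_fn):
    """Build a fit/sample class from two plain functions (hypothetical helper)."""

    def fit(self, real_data):
        # Whatever fit_fn returns is kept as the trained state.
        self._state = fit_fn(real_data)

    def sample(self, num_rows):
        return sample_fn(self._state, num_rows)

    return type(name, (), {"fit": fit, "sample": sample})


def fit_column_means(real_data):
    """Toy fit function: learn the mean of each numeric column."""
    columns = list(zip(*real_data))
    return [sum(column) / len(column) for column in columns]


def sample_column_means(means, num_rows):
    """Toy sample function: repeat the learned means for every row."""
    return [list(means) for _ in range(num_rows)]


MeanSynthesizer = make_synthesizer_class(
    "MeanSynthesizer", fit_column_means, sample_column_means
)

model = MeanSynthesizer()
model.fit([[1.0, 10.0], [3.0, 30.0]])
rows = model.sample(3)
```

Because the two phases are separate functions, a benchmark can time and measure each one independently, which is what makes the more granular results in the second point possible.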

So I'm going to mark this issue as (more-or-less) resolved.

Apologies if I've overlooked any of the finer points in the discussion. If there is more to add, I'd recommend filing a new issue and we can start a new discussion based on the vision and capabilities of the newest SDGym library. Thanks!
