Comments (4)

srinify commented on June 8, 2024

Hi there @deltaproximity, it looks like the data you're using for training doesn't adhere to the constraint you specified (a scalar range from 0.7 to 0.9). I see multiple values outside of this range.

Constraints in sdv are used to describe business rules inherent in your real data that you want the trained synthesizer / model to know about. This error is being thrown because sdv detected that the underlying data for training doesn't match the constraint you specified: InvalidDataError: The provided data does not match the metadata: Data is not valid for the 'ScalarRange' constraint:

Do you mind sharing more about your use case here? What's the motivation to define such a constraint that deviates from your original data?

deltaproximity commented on June 8, 2024

Hi @srinify, thanks for your reply. I need the synthesizer to use only the data from the range [0.7, 0.9], because the data I want to sample should only have values in this range for this column. As I understand the sdv documentation, conditional sampling only lets you fix specific values; you cannot specify a range to sample from. Therefore, I wanted to create a synthesizer that learns only from the data in the range specified above.
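For readers with the same goal, here is a minimal pandas-only sketch of splitting the training table into in-range and out-of-range rows before fitting. It was not proposed in this thread. The table `real_data`, the column name `score`, and the sample values are hypothetical stand-ins, and whether a synthesizer fitted on the filtered subset suits a given use case is an assumption to check.

```python
import pandas as pd

# Hypothetical stand-in for the real training table.
real_data = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "score": [0.65, 0.70, 0.82, 0.90, 0.95],
})

COL_NAME, LOW_RANGE, HIGH_RANGE = "score", 0.7, 0.9

# Boolean mask: True where the value lies in [LOW_RANGE, HIGH_RANGE]
# (Series.between is inclusive on both ends by default).
in_range = real_data[COL_NAME].between(LOW_RANGE, HIGH_RANGE)

# Rows outside the range, useful for inspecting what would be dropped.
violations = real_data[~in_range]

# Rows inside the range; this is the subset one would train on.
training_data = real_data[in_range].reset_index(drop=True)

print(f"{len(violations)} rows outside the range, {len(training_data)} kept")
```

Note that filtering shrinks the training set, so it is worth checking how many rows remain before fitting anything on `training_data`.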

srinify commented on June 8, 2024

Thanks for the context. A few things:

  • sdv is working as intended and this error is in line with how constraints work. You can read more here about the error.
  • It looks like InvalidDataError is being returned instead of ConstraintsNotMetError, which is likely a bug! So I'll open a separate bug issue for us to fix that and make the error clearer: #1842
  • It would be useful to be able to perform conditional sampling with a specified range (instead of just specific values). So I'll open a feature request issue for that and we'd love it if you could comment with your broader use case in mind: #1843

As a workaround, @deltaproximity, you can sample more rows than you need and filter out the ones outside your range:

# Request more rows than you need. Maybe 1,000 if you need 100 true rows.
synthetic_data = synthesizer.sample(1000)

# Filter out rows
filtered_synthetic_data = synthetic_data[(synthetic_data[COL_NAME] >= LOW_RANGE) & (synthetic_data[COL_NAME] <= HIGH_RANGE)]
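If a fixed number of in-range rows is needed, the sample-and-filter workaround above can be repeated in batches until enough rows have been collected. The following is a minimal sketch, not part of sdv: `sample_in_range`, `batch_size`, and `max_batches` are hypothetical names, and `sample_fn` is assumed to be any callable that takes a row count and returns a pandas DataFrame (for example `synthesizer.sample`). A stand-in sampler is used here so the snippet runs without a trained synthesizer.

```python
import random

import pandas as pd


def sample_in_range(sample_fn, column, low, high, num_rows,
                    batch_size=1000, max_batches=100):
    """Collect num_rows rows whose `column` value lies in [low, high]."""
    kept = []
    total = 0
    for _ in range(max_batches):
        batch = sample_fn(batch_size)
        # Keep only the rows inside the inclusive range.
        in_range = batch[batch[column].between(low, high)]
        kept.append(in_range)
        total += len(in_range)
        if total >= num_rows:
            break
    else:
        # The loop ended without collecting enough rows.
        raise RuntimeError(
            f"Only {total} in-range rows after {max_batches} batches"
        )
    # Trim any surplus so exactly num_rows rows are returned.
    return pd.concat(kept, ignore_index=True).head(num_rows)


def fake_sample(n):
    """Stand-in for synthesizer.sample: n rows, uniform values in [0, 1)."""
    return pd.DataFrame({"score": [random.random() for _ in range(n)]})


rows = sample_in_range(fake_sample, "score", 0.7, 0.9, num_rows=100)
print(len(rows), rows["score"].min(), rows["score"].max())
```

The `max_batches` cap keeps the loop from running forever when the synthesizer rarely or never produces values in the requested range.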

npatki commented on June 8, 2024

Hi all, I'm closing this issue out as it has been inactive for a few weeks. I believe we now have other issues that are more suited to the root cause of this (see previous comment).

Please feel free to reply if there is anything more to discuss. We can always reopen the issue for more investigation. Thanks.
