
npatki commented on June 2, 2024

Hi @MarcJohler, thanks for filing the issue with these details and for providing some insight into what's going on. We'll keep this issue open while we track the fix. Fortunately, there are a few workarounds, which I'll mention below.

Reproducing the Issue

The code below reproduces the issue. Do let us know if you meant something else by your original issue.

import pandas as pd
import numpy as np

from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer
from sdv.sampling import Condition

data = pd.DataFrame(data={
    'A': [round(i, 2) for i in np.random.uniform(low=0, high=10, size=100)],
    'B': [round(i) for i in np.random.uniform(low=0, high=10, size=100)],
    'C': np.random.choice(['Yes', 'No', 'Maybe'], size=100)
})

metadata = SingleTableMetadata.load_from_dict({
    'columns': {
        'A': { 'sdtype': 'numerical' },
        'B': { 'sdtype': 'numerical' },
        'C': { 'sdtype': 'categorical' }
    }
})

synth = GaussianCopulaSynthesizer(metadata)

constraint = {
    'constraint_class': 'ScalarRange',
    'constraint_parameters': {
        'column_name': 'B',
        'low_value': 0,
        'high_value': 10,
        'strict_boundaries': False
    }
}

synth.add_constraints([constraint])
synth.fit(data)

my_condition = Condition(num_rows=250, column_values={'B': 1})
synth.sample_from_conditions([my_condition])

Output:

ValueError: Unable to sample any rows for the given conditions. This may be because the provided values are out-of-bounds in the current model. 
Please try again with a different set of values.
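Read literally, the error says that none of the rows the model produced satisfied `B == 1`. One simplified way to picture this is reject sampling: generate candidate rows, keep only the ones that match the condition, and give up after a fixed number of attempts. The sketch below is a hypothetical illustration of that idea in plain Python (`sample_with_condition`, `max_tries`, `batch_size`, and the two toy models are made-up names), not SDV's actual implementation. It shows how a model whose values for `B` never land on the requested value ends up with zero accepted rows.

```python
import random
from typing import Callable, Dict, List

Row = Dict[str, float]


def sample_with_condition(sample_row: Callable[[], Row], column: str, value: float,
                          num_rows: int, max_tries: int = 100,
                          batch_size: int = 50) -> List[Row]:
    """Hypothetical reject sampling: keep only generated rows where row[column] == value."""
    accepted: List[Row] = []
    for _ in range(max_tries):
        batch = [sample_row() for _ in range(batch_size)]
        accepted.extend(row for row in batch if row[column] == value)
        if len(accepted) >= num_rows:
            return accepted[:num_rows]
    if not accepted:
        # Nothing matched after every attempt: the kind of situation behind the error above
        raise ValueError("Unable to sample any rows for the given conditions.")
    return accepted  # some matches, but fewer than requested


random.seed(0)


def model_in_range() -> Row:
    # Hypothetical model whose B values cover the integers 0..10
    return {'B': random.randint(0, 10)}


def model_out_of_range() -> Row:
    # Hypothetical model whose B values never equal 1
    return {'B': random.uniform(20, 30)}


rows = sample_with_condition(model_in_range, 'B', 1, num_rows=250)
print(len(rows))

try:
    sample_with_condition(model_out_of_range, 'B', 1, num_rows=250)
except ValueError as err:
    print(err)
```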

Workaround 1: Preprocessing

By default, SDV synthesizers are configured to enforce the observed min/max values of every numerical column. Since the observed values of B already lie between 0 and 10, there's no need to add a ScalarRange constraint in this case.
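Enforcing the observed min/max roughly means that any generated value outside the range seen during fitting is pulled back to the nearest boundary. As a minimal sketch, assuming simple clipping (the function names are hypothetical, not SDV or RDT APIs), this shows why a [0, 10] constraint on a column whose data already spans 0 to 10 adds nothing extra:

```python
from typing import List, Tuple


def fit_min_max(values: List[float]) -> Tuple[float, float]:
    """Remember the smallest and largest values seen during fitting."""
    return min(values), max(values)


def enforce_min_max(samples: List[float], bounds: Tuple[float, float]) -> List[float]:
    """Clip each generated value into the observed [min, max] range."""
    low, high = bounds
    return [min(max(x, low), high) for x in samples]


bounds = fit_min_max([0, 3, 7, 10])                # observed range of a column like B
print(enforce_min_max([-2.5, 4.0, 12.1], bounds))  # [0, 4.0, 10]
```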

Alternatively, you can turn min/max enforcement on or off for particular columns by updating the data transformers.

from rdt.transformers.numerical import FloatFormatter

# enforce for all columns (default)
synth = GaussianCopulaSynthesizer(metadata, enforce_min_max_values=True)

# selectively enforce: turn it off for every column, then re-enable it only for B
synth = GaussianCopulaSynthesizer(metadata, enforce_min_max_values=False)
synth.auto_assign_transformers(data)

synth.update_transformers({
    # keep B's learned rounding and clip it to the observed min/max
    'B': FloatFormatter(learn_rounding_scheme=True, enforce_min_max_values=True)
})

synth.fit(data)
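The snippet above relies on a global default (`enforce_min_max_values` on the synthesizer) that is then overridden for column `B` via `update_transformers`, and on `learn_rounding_scheme=True` to round values back to the number of decimal places seen in the data. Here is a simplified, hypothetical model of that layering in plain Python; `SimpleFormatter`, `learn_decimals`, and `build_formatters` are made-up names, not RDT classes. Every column gets the default formatter unless an override is supplied, and in this sketch only the overridden column is clipped.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def learn_decimals(values: List[float], max_decimals: int = 15) -> Optional[int]:
    """Smallest number of decimal places that reproduces every training value."""
    for d in range(max_decimals + 1):
        if all(round(v, d) == v for v in values):
            return d
    return None


@dataclass
class SimpleFormatter:
    """Hypothetical stand-in for a per-column numerical formatter."""
    learn_rounding_scheme: bool = True
    enforce_min_max_values: bool = False
    low: Optional[float] = None
    high: Optional[float] = None
    decimals: Optional[int] = None

    def fit(self, values: List[float]) -> None:
        self.low, self.high = min(values), max(values)
        if self.learn_rounding_scheme:
            self.decimals = learn_decimals(values)

    def reverse_transform(self, samples: List[float]) -> List[float]:
        out = []
        for x in samples:
            if self.enforce_min_max_values:
                x = min(max(x, self.low), self.high)  # clip to the observed range
            if self.decimals is not None:
                x = round(x, self.decimals)           # apply the learned rounding
            out.append(x)
        return out


def build_formatters(columns: List[str], default_enforce: bool,
                     overrides: Dict[str, SimpleFormatter]) -> Dict[str, SimpleFormatter]:
    """Global default for every column, then per-column overrides."""
    formatters = {name: SimpleFormatter(enforce_min_max_values=default_enforce)
                  for name in columns}
    formatters.update(overrides)
    return formatters


data = {'A': [0.5, 3.25, 9.75], 'B': [0, 4, 10]}
formatters = build_formatters(list(data), default_enforce=False,
                              overrides={'B': SimpleFormatter(enforce_min_max_values=True)})
for name, formatter in formatters.items():
    formatter.fit(data[name])

print(formatters['A'].reverse_transform([-1.234, 12.5]))     # [-1.23, 12.5] (not clipped)
print(formatters['B'].reverse_transform([-1.6, 3.4, 12.5]))  # [0, 3.0, 10] (clipped, rounded)
```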

Workaround 2: Update the bounds of ScalarRange

For this particular case, updating the lower boundary to -1 seemed to work for me. I'm not entirely sure why.

constraint = {
    'constraint_class': 'ScalarRange',
    'constraint_parameters': {
        'column_name': 'B',
        'low_value': -1,
        'high_value': 10,
        'strict_boundaries': False
    }
}

from sdv.
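With `strict_boundaries=False`, a ScalarRange on B asks for `low_value <= B <= high_value`. A quick check in plain Python (a hypothetical helper, not SDV's implementation) shows that for this data both `[0, 10]` and `[-1, 10]` accept every observed value of B as well as the conditioned value 1. So lowering the bound to -1 does not change which rows count as valid, which is consistent with the difference coming from how the constraint is applied internally rather than from the validity check itself.

```python
import operator
from typing import List


def scalar_range_is_valid(values: List[float], low: float, high: float,
                          strict_boundaries: bool = False) -> List[bool]:
    """Hypothetical range check: low <= v <= high, or low < v < high when strict."""
    cmp = operator.lt if strict_boundaries else operator.le
    return [cmp(low, v) and cmp(v, high) for v in values]


# B in the reproduction is round(uniform(0, 10)), i.e. an integer from 0 to 10
observed_b = list(range(11))

print(all(scalar_range_is_valid(observed_b, 0, 10)))   # True  (original bounds)
print(all(scalar_range_is_valid(observed_b, -1, 10)))  # True  (workaround bounds)
print(scalar_range_is_valid([1], 0, 10))               # [True] (the condition B == 1)
```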
