Comments (1)
Hi @MarcJohler, thanks for filing the issue with the details and providing some insight into what's going on. We'll keep this issue open while we track the fix. Fortunately, there are a few workarounds, which I'll mention below.
Reproducing the Issue
The code below reproduces the issue. Do let us know if you meant something else by your original issue.
import pandas as pd
import numpy as np

from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer
from sdv.sampling import Condition

data = pd.DataFrame(data={
    'A': [round(i, 2) for i in np.random.uniform(low=0, high=10, size=100)],
    'B': [round(i) for i in np.random.uniform(low=0, high=10, size=100)],
    'C': np.random.choice(['Yes', 'No', 'Maybe'], size=100)
})

metadata = SingleTableMetadata.load_from_dict({
    'columns': {
        'A': { 'sdtype': 'numerical' },
        'B': { 'sdtype': 'numerical' },
        'C': { 'sdtype': 'categorical' }
    }
})

synth = GaussianCopulaSynthesizer(metadata)

constraint = {
    'constraint_class': 'ScalarRange',
    'constraint_parameters': {
        'column_name': 'B',
        'low_value': 0,
        'high_value': 10,
        'strict_boundaries': False
    }
}

synth.add_constraints([constraint])
synth.fit(data)

my_condition = Condition(num_rows=250, column_values={'B': 1})
synth.sample_from_conditions([my_condition])
Output:
ValueError: Unable to sample any rows for the given conditions. This may be because the provided values are out-of-bounds in the current model.
Please try again with a different set of values.
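Since the error message points at out-of-bounds values, it can help to rule that out first. The snippet below is only a minimal sketch and not part of SDV: `condition_within_observed_range` is a hypothetical helper name, and the toy `observed` values stand in for the training data. It checks whether each conditioned value lies within the min/max seen in the data. In the reproduction above, `B = 1` would be expected to pass this check, which suggests the error is not caused by a genuinely out-of-bounds condition.

```python
def condition_within_observed_range(observed, column_values):
    """Return {column: bool} telling whether each conditioned value lies
    within the min/max observed for that column.

    `observed` maps column names to iterables of numbers (a dict of lists,
    or a pandas DataFrame, which supports the same `observed[column]` lookup).
    Hypothetical helper for illustration; not part of SDV.
    """
    result = {}
    for column, value in column_values.items():
        values = list(observed[column])
        result[column] = bool(min(values) <= value <= max(values))
    return result


# Toy stand-in for the training data in the reproduction above.
observed = {'B': [0, 3, 7, 10, 5]}
print(condition_within_observed_range(observed, {'B': 1}))
print(condition_within_observed_range(observed, {'B': 42}))
```

With the real data, the same call would be `condition_within_observed_range(data, {'B': 1})`.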
Workaround 1: Preprocessing
By default, SDV synthesizers are automatically configured to enforce the observed min/max values for all columns. So there's no need to add a ScalarRange constraint.
Alternatively, you can toggle this on/off for particular columns by updating the data transformers.
from rdt.transformers.numerical import FloatFormatter

# enforce for all columns (default)
synth = GaussianCopulaSynthesizer(metadata, enforce_min_max_values=True)

# selectively enforce
synth = GaussianCopulaSynthesizer(metadata, enforce_min_max_values=False)
synth.auto_assign_transformers(data)
synth.update_transformers({
    'B': FloatFormatter(learn_rounding_scheme=True, enforce_min_max_values=False)
})
synth.fit(data)
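To confirm that min/max enforcement is behaving as expected after fitting, one option is to count how many sampled values fall outside the range seen in the real data. This is a rough sketch rather than an SDV feature: `out_of_range_count` is a hypothetical helper, and the lists below are made-up values used only to show the check.

```python
def out_of_range_count(real_values, synthetic_values):
    """Count synthetic values that fall outside the real data's min/max.

    Hypothetical helper for illustration; not part of SDV.
    """
    real = list(real_values)
    low, high = min(real), max(real)
    return sum(1 for value in synthetic_values if value < low or value > high)


# Made-up values: 11 and -1 lie outside the real range [0, 10].
real_b = [0, 2, 5, 9, 10]
synthetic_b = [1, 4, 10, 11, -1]
print(out_of_range_count(real_b, synthetic_b))
```

With a fitted synthesizer, the same check could be run as `out_of_range_count(data['B'], synth.sample(num_rows=250)['B'])`. A result of 0 would be consistent with the bounds being enforced for that column.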
Workaround 2: Update the bounds of ScalarRange
For this particular case, updating the lower boundary to -1 seemed to work for me. I'm not entirely sure why.
constraint = {
    'constraint_class': 'ScalarRange',
    'constraint_parameters': {
        'column_name': 'B',
        'low_value': -1,
        'high_value': 10,
        'strict_boundaries': False
    }
}