Comments (4)
Hi there @deltaproximity, it looks like the data you're using for training doesn't adhere to the constraint you specified (scalar range from 0.7 to 0.9). I see multiple values outside this range.
Constraints in sdv are used to describe business rules inherent in your real data that you want the trained synthesizer / model to know about. This error is being thrown because sdv detected that the underlying data for training doesn't match the constraint you specified: InvalidDataError: The provided data does not match the metadata: Data is not valid for the 'ScalarRange' constraint:
Do you mind sharing more about your use case here? What's the motivation to define such a constraint that deviates from your original data?
Hi @srinify, thanks for your reply. I need the synthesizer to use only the data from the range [0.7, 0.9], because the data I want to sample should only have values of this column in that range. From what I understood from the sdv documentation, conditional sampling only lets you fix exact values; you cannot specify a range to sample from. Therefore, I wanted to create a synthesizer that learns only from the data in the range specified above.
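One way to get a synthesizer that learns only from in-range rows is to filter the real data before fitting, so that the training table itself respects the range. The snippet below is a minimal sketch of that filtering step in pandas only; the column name `score`, the toy values, and the variable names are assumptions made for illustration and do not come from the original issue.

```python
import pandas as pd

# Hypothetical column name and bounds, chosen to mirror the [0.7, 0.9] range.
COL_NAME = "score"
LOW_RANGE, HIGH_RANGE = 0.7, 0.9

# Toy stand-in for the real training table.
real_data = pd.DataFrame({
    "score": [0.65, 0.72, 0.80, 0.88, 0.95, 0.90],
    "label": ["a", "b", "c", "d", "e", "f"],
})

# Series.between is inclusive on both ends by default.
in_range = real_data[COL_NAME].between(LOW_RANGE, HIGH_RANGE)

# Keep only the rows inside the range and renumber the index.
training_data = real_data[in_range].reset_index(drop=True)
```

The filtered `training_data` would then be the table passed to the synthesizer's fit step, in place of the unfiltered data. Note that this discards every row outside the range, so the model sees less data overall.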
Thanks for the context. A few things:
- sdv is working as intended and this error is in line with how constraints work. You can read more here about the error.
- It looks like InvalidDataError is being returned instead of ConstraintsNotMetError, which is likely a bug! So I'll open a separate bug issue for us to fix that and make the error clearer: #1842
- It would be useful to be able to perform conditional sampling with a specified range (instead of just specific values). So I'll open a feature request issue for that and we'd love it if you could comment with your broader use case in mind: #1843
As a workaround @deltaproximity what you can do is sample a bunch of rows and filter out the ones outside your range:
# Request more rows than you need. Maybe 1,000 if you need 100 true rows.
synthetic_data = synthesizer.sample(1000)
# Filter out rows
filtered_synthetic_data = synthetic_data[(synthetic_data[COL_NAME] >= LOW_RANGE) & (synthetic_data[COL_NAME] <= HIGH_RANGE)]
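If a single oversampled batch might not contain enough in-range rows, the same idea can be wrapped in a loop that keeps sampling until the target count is reached. The sketch below is one possible way to do that, not part of sdv: `sample_in_range` and `fake_sample` are hypothetical helpers, and `fake_sample` only stands in for a fitted synthesizer's `sample` method so the example runs on its own.

```python
import numpy as np
import pandas as pd


def sample_in_range(sample_fn, n_rows, column, low, high,
                    batch_size=1000, max_batches=50):
    """Call sample_fn(batch_size) repeatedly, keeping only rows whose
    `column` value lies in [low, high], until n_rows rows are collected."""
    kept = []
    total = 0
    for _ in range(max_batches):
        batch = sample_fn(batch_size)
        batch = batch[batch[column].between(low, high)]  # inclusive bounds
        kept.append(batch)
        total += len(batch)
        if total >= n_rows:
            break
    if total < n_rows:
        raise ValueError(
            f"Only {total} in-range rows after {max_batches} batches."
        )
    # Combine the batches and trim to exactly the requested number of rows.
    return pd.concat(kept, ignore_index=True).head(n_rows)


# Stand-in sampler (an assumption for this example): uniform values in [0, 1).
rng = np.random.default_rng(0)


def fake_sample(num_rows):
    return pd.DataFrame({"score": rng.uniform(0.0, 1.0, size=num_rows)})


rows = sample_in_range(fake_sample, 100, "score", 0.7, 0.9)
```

With a fitted synthesizer, `synthesizer.sample` could be passed in place of `fake_sample`. The `max_batches` cap is there so the loop stops with an error rather than running indefinitely when in-range rows are very rare.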
Hi all, I'm closing this issue out as it has been inactive for a few weeks. I believe we now have other issues that are more suited to the root cause of this (see previous comment).
Please feel free to reply if there is anything more to discuss. We can always reopen the issue for more investigation. Thanks.