Comments (3)
Workaround
For anyone blocked by this, you can use the code snippet below. It samples more rows than you need, with no bounds on the column, and then filters the result down to a specific range.
# TODO: input the conditions you need
COL_NAME = 'my_column_name'
LOW_RANGE = 18.0 # minimum possible value in range
HIGH_RANGE = 100.0 # maximum possible value in range
# Request more rows than you need. Maybe 1,000 if you need 100 true rows.
synthetic_data = synthesizer.sample(1000)
# Filter out rows to within the range
filtered_synthetic_data = synthetic_data[(synthetic_data[COL_NAME] >= LOW_RANGE) & (synthetic_data[COL_NAME] <= HIGH_RANGE)]
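If a single oversized request does not leave enough rows after filtering, the same idea can be repeated in batches until the target count is reached. The following is only a sketch of that approach: `sample_in_range` and `FakeSynthesizer` are hypothetical names that are not part of SDV, and the only thing assumed about the real synthesizer is that `sample(n)` returns a pandas DataFrame, as in the snippet above.

```python
import random

import pandas as pd


def sample_in_range(synthesizer, num_rows, column, low, high,
                    batch_size=1000, max_batches=20):
    """Hypothetical helper: sample in batches and keep rows where low <= column <= high."""
    kept = []
    total = 0
    for _ in range(max_batches):
        batch = synthesizer.sample(batch_size)
        in_range = batch[(batch[column] >= low) & (batch[column] <= high)]
        kept.append(in_range)
        total += len(in_range)
        if total >= num_rows:
            break
    # May return fewer than num_rows if max_batches is exhausted first.
    return pd.concat(kept, ignore_index=True).head(num_rows)


class FakeSynthesizer:
    """Stand-in for a fitted synthesizer, only here to make the sketch runnable."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def sample(self, num_rows):
        values = [self._rng.uniform(0, 200) for _ in range(num_rows)]
        return pd.DataFrame({'my_column_name': values})


rows = sample_in_range(FakeSynthesizer(), 100, 'my_column_name', 18.0, 100.0)
```

With a real fitted synthesizer you would pass it in place of `FakeSynthesizer()`. This still generates rows that are thrown away, so it only makes the workaround more predictable rather than cheaper.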
Thanks @npatki for sharing the workaround code. Can such conditions be defined before generating the samples? I think it would be better to have something like generating with conditions (different from generating with constraints), to avoid the unnecessary computation time of generating rows and then filtering them.
Hi @adib0073, unfortunately I cannot think of a good workaround that would allow you to do so right now.
However in the future, when the team adds an actual feature to enable range-based conditional sampling, that is exactly how I envision it working.