Comments (3)
Hi @gnomesanta, when you use conditional sampling with a neural network-based synthesizer such as TVAE, it will require many attempts to match the conditions. It appears that you are running out of attempts because a Regex (in your metadata) is limiting the possibilities.
But to confirm this, would you be able to share your final metadata? You can do this by printing the metadata or by saving it as a JSON file and attaching it here (recommended approach).
metadata.save_to_json(filepath='my_metadata.json')
Once we confirm this, I may be able to suggest a few workarounds.
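To illustrate why matching conditions can take many attempts, here is a toy sketch of rejection-style sampling. It is not SDV's actual implementation: the generator, the `room_type` column, and the `sample_with_condition` helper are all hypothetical names made up for this example. The idea is simply that rows are generated freely and only the ones that satisfy the condition are kept, so a rare condition (or a small pool of valid values) can exhaust the attempt budget.

```python
import random


def sample_with_condition(generate_row, condition, num_rows, max_tries):
    """Generate rows until `num_rows` match `condition` or `max_tries` is reached."""
    matched = []
    tries = 0
    while len(matched) < num_rows and tries < max_tries:
        row = generate_row()
        tries += 1
        if condition(row):
            matched.append(row)
    return matched, tries


# Hypothetical stand-in for a trained synthesizer: it cannot be told
# which room_type to produce, so it just samples one at random.
rng = random.Random(0)


def toy_generator():
    return {
        "room_type": rng.choice(["BASIC", "DELUXE", "SUITE"]),
        "nights": rng.randint(1, 10),
    }


# Condition that roughly one row in three satisfies.
rows, tries = sample_with_condition(
    toy_generator,
    lambda row: row["room_type"] == "SUITE",
    num_rows=5,
    max_tries=1000,
)

# Condition that no row can satisfy: every attempt is used up.
no_rows, used = sample_with_condition(
    toy_generator,
    lambda row: row["room_type"] == "PENTHOUSE",
    num_rows=5,
    max_tries=50,
)

print(len(rows), tries, len(no_rows), used)
```

In the second call every attempt is spent without producing a single matching row, which is the kind of failure described above.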
Hi @gnomesanta, are you still running into this issue?
I'm not sure what your metadata looks like, so I don't know if this will help: if you have a Regex that is limiting the number of possibilities, you may want to update your metadata to provide a larger Regex.
For example, the regex `id_[0-9]{3}` allows for 1,000 possibilities (`id_000` to `id_999`). To allow for more possibilities, you can either widen the set of allowed characters (`id_[0-9a-z]{3}`) or increase the number of digits (`id_[0-9]{10}`). Do note that we plan to update the defaults in the future so the synthesizer will not crash if it runs out of Regex values (see RDT issue #749).
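As a rough check of those numbers, the sketch below counts how many distinct values a fixed-length pattern can produce. It assumes the simple case of one character class repeated a fixed number of times, and that the wider character class is applied to a three-character suffix (`id_[0-9a-z]{3}`). The `count_possibilities` helper is a hypothetical name for this example, not part of SDV or RDT.

```python
import itertools
import re
import string


def count_possibilities(alphabet_size, length):
    """Number of distinct strings of `length` characters drawn from the alphabet."""
    return alphabet_size ** length


digits_3 = count_possibilities(10, 3)    # id_[0-9]{3}
alnum_3 = count_possibilities(36, 3)     # id_[0-9a-z]{3}
digits_10 = count_possibilities(10, 10)  # id_[0-9]{10}

# Cross-check the smallest case by enumerating every value and
# confirming each one fully matches the regex.
pattern = re.compile(r"id_[0-9]{3}")
all_ids = [
    "id_" + "".join(chars)
    for chars in itertools.product(string.digits, repeat=3)
]
assert all(pattern.fullmatch(value) for value in all_ids)

print(digits_3, alnum_3, digits_10)  # 1000 46656 10000000000
```

If the number of rows you need to sample is close to or above the count for your pattern, a larger Regex is likely worth trying.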
Hi @gnomesanta, are you still working on this project? I'm closing off this issue since it has been inactive for some time and we've discussed one possible cause and suggestion for conditional sampling.
If there is more to discuss, please feel free to reply to this issue and we can always reopen for further investigation. Thanks.