Comments (5)
It is just a legacy thing in the dataset I am working with. Thanks for the suggestion!
from sdv.
Hi @longngng, unfortunately primary and foreign keys are not supported for custom constraints.
We have added this info to the custom constraints docs and FAQs.
From looking at your code, it seems like the foreign key that you have is a deterministic combination of both the letter and number columns. Since the letter and number columns can be completely recreated from the foreign key, I would consider dropping these columns even before you bring the data into the SDV. You can recreate these columns yourself after you have sampled synthetic data.
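As a rough sketch of that pre-processing idea (not SDV code, just pandas): the original table is not shown in this thread, so the key format `<letter>_<number>`, the sample values, and the helper names `drop_derived` and `restore_derived` below are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical child table. The '<letter>_<number>' key format is an
# assumption; adapt the split logic to however your key is really built.
child = pd.DataFrame({
    'foreign_key': ['a_1', 'b_2', 'c_3', 'c_3'],
    'letter': ['a', 'b', 'c', 'c'],
    'number': [1, 2, 3, 3],
    'other': [7, 8, 9, 10],
})


def drop_derived(df):
    """Remove the columns that can be rebuilt from the foreign key."""
    return df.drop(columns=['letter', 'number'])


def restore_derived(df):
    """Rebuild 'letter' and 'number' by splitting the foreign key."""
    parts = df['foreign_key'].str.split('_', n=1, expand=True)
    out = df.copy()
    out['letter'] = parts[0]
    out['number'] = parts[1].astype(int)
    return out


reduced = drop_derived(child)        # this is what you would hand to the SDV
restored = restore_derived(reduced)  # apply the same step to sampled data
```

Here `restore_derived` is applied to `reduced` only to show the round trip; in practice it would be applied to the synthetic child table after sampling.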
I see. What if the other columns in data_2 have some more information that I would like to keep? For example:
import pandas as pd

data_1 = pd.DataFrame({
    'key': ['a', 'b', 'c'],
})
data_2 = pd.DataFrame({
    # one foreign key per row; 'c' is referenced twice
    'foreign_key': ['a', 'b', 'c', 'c'],
    'letter': ['a_20231212', 'b_20231213', 'c_20231214', 'c_20231215'],
    'other': [7, 8, 9, 10]
})
In this case, I would like to enforce the PK-FK relationship while using the timestamp sdtype on the letter column after removing the foreign_key prefix. Do you have any suggestions for this?
In the example above, the 'letter' column seems to be of mixed type (half foreign key, half timestamp). The easiest solution would be to extract only the timestamp ('20231212', '20231213', ...). This would be a standard way to denote a timestamp. SDV can model both columns (foreign key and timestamp) if they are represented separately.
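A minimal pandas sketch of that split, using the data_2 example from the question. The column name `letter_date` and the helper `rebuild_letter` are hypothetical, and the `%Y%m%d` format is inferred from the sample values.

```python
import pandas as pd

data_2 = pd.DataFrame({
    'foreign_key': ['a', 'b', 'c', 'c'],
    'letter': ['a_20231212', 'b_20231213', 'c_20231214', 'c_20231215'],
    'other': [7, 8, 9, 10],
})

prepared = data_2.copy()
# Keep only the part after the first underscore, i.e. the date text.
date_text = prepared['letter'].str.split('_', n=1).str[1]
# 'letter_date' is a made-up column name holding the parsed timestamp.
prepared['letter_date'] = pd.to_datetime(date_text, format='%Y%m%d')
prepared = prepared.drop(columns=['letter'])


def rebuild_letter(df):
    """Recreate the mixed 'letter' column from the key and the date."""
    out = df.copy()
    out['letter'] = (
        out['foreign_key'] + '_' + out['letter_date'].dt.strftime('%Y%m%d')
    )
    return out.drop(columns=['letter_date'])


rebuilt = rebuild_letter(prepared)  # same step applies to sampled data
```

With the columns separated like this, `foreign_key` can be declared as the foreign key and `letter_date` as a datetime column, and `rebuild_letter` can be run on the synthetic table if the original combined format is still needed downstream.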
I would like to understand your project a bit better though. Why does the 'letter' column contain the foreign key's prefix? Is this standard practice in the datasets you are working with? Perhaps if I could better understand why the database is set up this way, it will help us come up with a solution.
No problem @longngng. Happy to help.
I'm closing this issue as out-of-scope since constraints were not designed to work with foreign keys. But if we ever come up with a compelling use case for this -- one that cannot be handled by some pre-processing -- then we can always open a new feature request to track it.