Comments (2)
Hi there @Vasanthpravin
When you run metadata.detect_from_dataframe(dp_pandas), SDV makes a best-guess attempt to automatically infer the metadata (and hence the sdtypes) for all of your columns.
However, this process isn't perfect, and we always recommend double-checking the metadata to make sure it matches what you expect. You can print the metadata object to get a read-out of the auto-detected sdtypes:
print(metadata)
Then, you can update the sdtype of multiple columns at once using the update_columns_metadata method from SingleTableMetadata:
metadata.update_columns_metadata(
    column_metadata={
        'personid': {'sdtype': 'numerical'},
        'phoneid': {'sdtype': 'phone_number'}
    }
)
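If you have many columns, it can help to work out which ones actually need an override before calling update_columns_metadata. The sketch below is only an illustration: build_sdtype_overrides is a hypothetical helper (not part of SDV), the detected values and the 'age' column are made up, and it assumes the detected metadata is available as a plain dict with a 'columns' key, as metadata.to_dict() should return.

```python
def build_sdtype_overrides(detected_metadata, expected_sdtypes):
    """Return a column_metadata dict for columns whose detected sdtype differs.

    Hypothetical helper, not part of SDV. `detected_metadata` is assumed to be
    shaped like the output of metadata.to_dict().
    """
    detected_columns = detected_metadata.get('columns', {})
    overrides = {}
    for column, expected in expected_sdtypes.items():
        if column not in detected_columns:
            raise KeyError(f"Column {column!r} is not in the metadata")
        if detected_columns[column].get('sdtype') != expected:
            overrides[column] = {'sdtype': expected}
    return overrides


# Made-up detection result for illustration only
detected = {
    'columns': {
        'personid': {'sdtype': 'id'},
        'phoneid': {'sdtype': 'unknown'},
        'age': {'sdtype': 'numerical'},
    }
}
expected = {'personid': 'numerical', 'phoneid': 'phone_number', 'age': 'numerical'}

overrides = build_sdtype_overrides(detected, expected)
print(overrides)
# With a real metadata object, the result could then be passed along:
# metadata.update_columns_metadata(column_metadata=overrides)
```

Only the mismatched columns end up in the dict, so columns that were detected correctly are left alone.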
Then you can create your synthesizer object, fit the model, and sample:
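from sdv.single_table import GaussianCopulaSynthesizer

# Fit on the same DataFrame used for metadata detection, then sample
synthesizer = GaussianCopulaSynthesizer(metadata=metadata)
synthesizer.fit(data=df_pandas)
synthetic_data = synthesizer.sample(num_rows=50)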
Hi there @Vasanthpravin, I'm closing out this issue for now, as there doesn't seem to be a clear bug here. Let me know if you're still running into the issue or uncover a related bug, and we can re-open it!