Comments (4)
Hi @ArupDukeBanerjee, at the moment SDGym is not intended to be used with your own dataset; it is only meant to evaluate and compare the performance of data synthesis methods over a set of well-known datasets.
For the scenario you mention, we are working on a separate package called SDMetrics, which will be made public in the coming days.
from sdgym.
Hi @csala
I just wanted to know one thing about this package. Can it be used only for generating data from real data? As you already stated, the benchmark is yet to come; in the meantime, can I use the different generators on my own data? Thanks a lot in advance!
Thanks,
Arup
@ArupDukeBanerjee Yes, SDGym synthesizers can be used for modeling and sampling your own data, but this is just a side effect of all the synthesizers here being implemented with a uniform API.
I would recommend using the CTGAN package instead. It is simpler to use and will give you better results in the long term, since it is actively maintained with ease of use and sampling quality in mind, while SDGym's goal is only to provide a benchmark.
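To illustrate what a uniform API means in practice, here is a minimal sketch, assuming every synthesizer exposes a `fit` and a `sample` method. The class `IndependentColumnSynthesizer` and the helper `synthesize` are hypothetical names made up for this example; they are not part of SDGym or CTGAN, and the toy model simply resamples each column independently.

```python
import random


class IndependentColumnSynthesizer:
    """Hypothetical toy synthesizer: resamples each column independently."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._columns = {}

    def fit(self, rows):
        # rows is a list of dicts that all share the same keys.
        if not rows:
            raise ValueError("need at least one row to fit")
        self._columns = {name: [row[name] for row in rows] for name in rows[0]}
        return self

    def sample(self, num_rows):
        if not self._columns:
            raise RuntimeError("call fit() before sample()")
        # Draw every column value independently from the values seen in fit().
        return [
            {name: self._rng.choice(values) for name, values in self._columns.items()}
            for _ in range(num_rows)
        ]


def synthesize(synthesizer, rows, num_rows):
    """Works with any object that exposes fit(rows) and sample(num_rows)."""
    synthesizer.fit(rows)
    return synthesizer.sample(num_rows)


# Your own data, here as a small in-memory table.
real_rows = [
    {"age": 34, "plan": "basic"},
    {"age": 51, "plan": "premium"},
    {"age": 27, "plan": "basic"},
]

synthetic_rows = synthesize(IndependentColumnSynthesizer(seed=0), real_rows, num_rows=5)
for row in synthetic_rows:
    print(row)
```

Because `synthesize` only relies on `fit` and `sample`, any other object with the same two methods could be swapped in without changing the calling code.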