Comments (4)
Hi @samgalen, the SDGym documentation website contains a reference to all the features that we support. The get_available_datasets function is listed but load_dataset is not -- meaning, it's not currently a supported feature.
Note that we are in the process of cleaning up our library so older, unsupported features may still be present in the code. So we ask that you please bear with us as we clean our repo!
BTW -- I'm curious about your use case? We found that loading datasets ad-hoc was not a frequently used feature, as most of our users are directly coming to benchmark synthesizers. If this would be helpful to you, we could track it as a feature request.
Hi @npatki - Thanks for the response.
My use case is that I'm trying to replicate prior work that uses the load_dataset function (but not other portions of SDGym). So it's not so much that I need to use the function regularly, but rather that I was trying to figure out how some of the datasets were processed, how the data was encoded, etc.
If there's a way to see that easily in the current version of SDGym, that would be ideal.
No problem! SDGym uses the SDV library for a majority of the predefined synthesizers. It also reads from the same demo datasets.
So one option is to pull directly from the SDV instead of SDGym. It should be automatically installed if you have SDGym already.
from sdv.datasets.demo import get_available_demos
from sdv.datasets.demo import download_demo

# get a table of all demos
# this should have the same datasets as what SDGym returns
all_demos = get_available_demos(modality='single_table')

# select a particular dataset name to download
data, metadata = download_demo(
    modality='single_table',
    dataset_name='fake_hotel_guests'
)
For more resources see:
- SDV demo API
- SDV transformation API. We now expose functions that allow you to see how the data is preprocessed (converted from raw -> numeric values) before applying the machine learning.
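To make the "raw -> numeric" idea concrete, here is a rough, standard-library-only sketch of what that kind of conversion can look like. It is a generic illustration and not the SDV's actual transformers: the function names, the rows, and the encoding choices (integer codes for categories, epoch seconds for dates) are all assumptions made up for this example, so please check the transformation API for what the library really does.

```python
from datetime import datetime, timezone


def fit_category_codes(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
    return codes


def encode_rows(rows, categorical_column, datetime_column):
    """Return all-numeric copies of the rows plus the category mapping that was used."""
    codes = fit_category_codes(row[categorical_column] for row in rows)
    encoded = []
    for row in rows:
        # Parse the date string and pin it to UTC so the result is reproducible
        parsed = datetime.strptime(row[datetime_column], "%Y-%m-%d")
        parsed = parsed.replace(tzinfo=timezone.utc)
        new_row = dict(row)
        new_row[categorical_column] = codes[row[categorical_column]]
        new_row[datetime_column] = parsed.timestamp()
        encoded.append(new_row)
    return encoded, codes


# Hypothetical rows, invented for illustration (not copied from any demo dataset)
rows = [
    {"room_type": "BASIC", "checkin_date": "2020-01-01", "room_rate": 100.0},
    {"room_type": "DELUXE", "checkin_date": "2020-01-02", "room_rate": 250.0},
    {"room_type": "BASIC", "checkin_date": "2020-01-03", "room_rate": 110.0},
]

encoded_rows, category_codes = encode_rows(rows, "room_type", "checkin_date")
for encoded_row in encoded_rows:
    print(encoded_row)
```

Keeping the mapping (`category_codes` here) around is what lets you reverse the conversion later, which is the part that matters when you are trying to work out how a prior paper encoded its data.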
Let me know if you have any more Qs!
Hi @samgalen, I'm closing this issue off since it has been inactive for some time and we've answered the original question.
I've filed a separate feature request in #261 to allow the ability to download and inspect datasets prior to running them in the benchmark. I've also copied over the workaround where you can access the datasets directly from the SDV library.
Feel free to reply if there is more to discuss and we can always reopen the issue. Alternatively, we can continue the conversation in the new feature request.