
load_dataset fails for about sdgym HOT 4 CLOSED

samgalen commented on June 9, 2024

load_dataset fails for


Comments (4)

npatki commented on June 9, 2024

Hi @samgalen, the SDGym documentation website contains a reference to all the features that we support. The get_available_datasets function is listed but load_dataset is not -- meaning, it's not currently a supported feature.

Note that we are in the process of cleaning up our library so older, unsupported features may still be present in the code. So we ask that you please bear with us as we clean our repo!

BTW -- I'm curious about your use case? We found that loading datasets ad-hoc was not a frequently used feature, as most of our users are directly coming to benchmark synthesizers. If this would be helpful to you, we could track it as a feature request.


samgalen commented on June 9, 2024

Hi @npatki - Thanks for the response.

My use case is that I'm trying to replicate prior work that uses the load_dataset function (but no other parts of SDGym). So it's not that I need to use the function regularly; rather, I was trying to figure out how some of the datasets were processed and how the data was encoded.

If there's a way to see that easily in the current version of SDGym, that would be ideal.


npatki commented on June 9, 2024

No problem! SDGym uses the SDV library for a majority of the predefined synthesizers. It also reads from the same demo datasets.

So one option is to pull directly from SDV instead of SDGym. SDV should already be installed if you have SDGym.

from sdv.datasets.demo import get_available_demos
from sdv.datasets.demo import download_demo

# get a table of all demos
# this should have the same datasets as what SDGym returns
all_demos = get_available_demos(modality='single_table')

# select a particular dataset name to download
data, metadata = download_demo(
    modality='single_table',
    dataset_name='fake_hotel_guests'
)

For more resources see:

SDV demo API
SDV transformation API. We now expose functions that allow you to see how the data is preprocessed (converted from raw -> numeric values) before applying the machine learning.
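As a rough sketch of what inspecting the preprocessing could look like, assuming an SDV 1.x install where single-table synthesizers provide auto_assign_transformers, get_transformers and preprocess: the helper name inspect_preprocessing and the choice of GaussianCopulaSynthesizer are illustrative assumptions, not something prescribed in this thread.

```python
def inspect_preprocessing(data, metadata):
    """Return (transformers, processed) for a single-table dataset.

    Hypothetical helper for illustration; assumes the SDV 1.x
    single-table synthesizer API.
    """
    # Imported lazily so the helper can be defined even where SDV is absent.
    from sdv.single_table import GaussianCopulaSynthesizer

    synthesizer = GaussianCopulaSynthesizer(metadata)

    # Pick a default transformer for each column based on the metadata.
    synthesizer.auto_assign_transformers(data)

    # Dictionary mapping each column name to its assigned transformer.
    transformers = synthesizer.get_transformers()

    # Run only the preprocessing step (raw -> numeric); no model is trained.
    processed = synthesizer.preprocess(data)
    return transformers, processed
```

Calling inspect_preprocessing(data, metadata) with the pair returned by download_demo should give the per-column transformers plus the numeric table that the model would see. Exact method names may differ between SDV versions, so check them against the version you have installed.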

Let me know if you have any more Qs!


npatki commented on June 9, 2024

Hi @samgalen, I'm closing this issue off since it has been inactive for some time and we've answered the original question.

I've filed a separate feature request in #261 to allow the ability to download and inspect datasets prior to running them in the benchmark. I've also copied over the workaround where you can access the datasets directly from the SDV library.

Feel free to reply if there is more to discuss and we can always reopen the issue. Alternatively, we can continue the conversation in the new feature request.
