
load_dataset fails for (sdgym, CLOSED)

samgalen commented on June 9, 2024
load_dataset fails for


Comments (4)

npatki commented on June 9, 2024

Hi @samgalen, the SDGym documentation website contains a reference to all the features that we support. The get_available_datasets function is listed but load_dataset is not -- meaning, it's not currently a supported feature.

Note that we are in the process of cleaning up our library so older, unsupported features may still be present in the code. So we ask that you please bear with us as we clean our repo!

BTW -- I'm curious about your use case. We found that loading datasets ad hoc was not a frequently used feature, as most of our users come directly to benchmark synthesizers. If this would be helpful to you, we could track it as a feature request.


samgalen commented on June 9, 2024

Hi @npatki - Thanks for the response.

My use case is that I'm trying to replicate prior work which uses the load_dataset function (but not other portions of SDGym). So it's not so much that I need to use the function regularly, but rather that I was trying to figure out how some of the datasets were processed and how the data was encoded.

If there's a way to see that easily in the current version of SDGym, that would be ideal.


npatki commented on June 9, 2024

No problem! SDGym uses the SDV library for a majority of the predefined synthesizers. It also reads from the same demo datasets.

So one option is to pull directly from SDV instead of SDGym. It should be installed automatically if you already have SDGym.

from sdv.datasets.demo import get_available_demos
from sdv.datasets.demo import download_demo

# get a table of all demos
# this should have the same datasets as what SDGym returns
all_demos = get_available_demos(modality='single_table')

# select a particular dataset name to download
data, metadata = download_demo(
    modality='single_table',
    dataset_name='fake_hotel_guests'
)
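Once a demo is downloaded, one way to see how its columns are typed is to compare the pandas dtypes with the types declared in the metadata. The following is only a sketch and not part of SDGym or SDV: `summarize_columns` and `inspect_demo` are hypothetical helper names, and the dictionary layout (a top-level `columns` key, or a `tables` key with one such entry per table, each column carrying an `sdtype`) is an assumption that may differ between SDV versions.

```python
def summarize_columns(metadata_dict):
    """Map each column name to its declared sdtype.

    Assumes a metadata dictionary with a top-level 'columns' key, or a
    'tables' key holding one such dictionary per table.
    """
    if 'tables' in metadata_dict:
        tables = metadata_dict['tables']
    else:
        tables = {'table': metadata_dict}

    summary = {}
    for table in tables.values():
        for name, spec in table.get('columns', {}).items():
            summary[name] = spec.get('sdtype')
    return summary


def inspect_demo(dataset_name):
    """Download one single-table demo and report how its columns are typed."""
    # Imported lazily so the helper above works without SDV installed.
    from sdv.datasets.demo import download_demo

    data, metadata = download_demo(
        modality='single_table',
        dataset_name=dataset_name
    )
    return {
        # dtypes as pandas sees the raw data
        'pandas_dtypes': {col: str(dtype) for col, dtype in data.dtypes.items()},
        # types declared in the metadata (assumed layout, see above)
        'sdtypes': summarize_columns(metadata.to_dict()),
    }
```

Calling `inspect_demo('fake_hotel_guests')` needs network access, since it downloads the dataset.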

For more resources see:

  • SDV demo API
  • SDV transformation API. We now expose functions that allow you to see how the data is preprocessed (converted from raw -> numeric values) before applying the machine learning.

Let me know if you have any more Qs!


npatki commented on June 9, 2024

Hi @samgalen, I'm closing this issue off since it has been inactive for some time and we've answered the original question.

I've filed a separate feature request in #261 to add the ability to download and inspect datasets before running them in the benchmark. I've also copied over the workaround for accessing the datasets directly from the SDV library.

Feel free to reply if there is more to discuss and we can always reopen the issue. Alternatively, we can continue the conversation in the new feature request.

