
Comments (3)

srinify commented on August 30, 2024

Hi @limhasic 👋

I'm a bit confused about exactly what you're asking -- do you mind clarifying a bit further? I understood your question to be -- "Why keep tables laid out in a multi-table pattern when I can just combine them into a single table and use SDV instead that way?" If this is incorrect, let me know!

Here are the key differences:

Single Table: Works best when you have a single identifier column (e.g. user_id) that uniquely identifies the entities in your data. If the same dataset contains other columns with identifier-like properties (e.g. post_id), single table models will not learn the relationship between your primary identifier column (user_id) and the secondary one (post_id). As a result, your synthetic data may contain rows with user_id and post_id pairings that don't correspond to any relationship in your real data.

Multi Table: Supports cases where your data has multiple identifier (ID) columns with relational links between them. With Multi Table, you specify the relationships between identifier columns, and SDV learns to model them more effectively. For example, SDV maintains referential integrity when generating synthetic data (e.g. every user_id referenced by a synthetic post will exist in the synthetic users table).
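A minimal stdlib-only Python sketch of both points, assuming hypothetical `users` and `posts` tables. `naive_column_sampler`, `unseen_pairs`, and `find_orphan_keys` are illustrative helpers, not SDV APIs, and the sampler is a deliberately crude stand-in for a model that does not know `user_id` and `post_id` are linked. It is not SDV's actual single-table algorithm.

```python
import random

# Hypothetical parent table: users (primary key user_id).
users = [{"user_id": "u1"}, {"user_id": "u2"}, {"user_id": "u3"}]
# Hypothetical child table: posts (primary key post_id, foreign key user_id -> users.user_id).
posts = [
    {"post_id": "p1", "user_id": "u1"},
    {"post_id": "p2", "user_id": "u1"},
    {"post_id": "p3", "user_id": "u2"},
    {"post_id": "p4", "user_id": "u3"},
]

def naive_column_sampler(rows, n, seed=0):
    """Toy single-table sampler (NOT SDV): draws each column independently,
    so it has no notion of which post_id belongs to which user_id."""
    rng = random.Random(seed)
    columns = {key: [row[key] for row in rows] for key in rows[0]}
    return [{key: rng.choice(values) for key, values in columns.items()} for _ in range(n)]

def unseen_pairs(real_rows, synthetic_rows):
    """(user_id, post_id) pairs that appear in synthetic_rows but never in real_rows."""
    real = {(r["user_id"], r["post_id"]) for r in real_rows}
    return {(s["user_id"], s["post_id"]) for s in synthetic_rows} - real

def find_orphan_keys(parent_rows, child_rows, primary_key, foreign_key):
    """Foreign-key values in child_rows with no matching primary key in parent_rows."""
    parent_keys = {row[primary_key] for row in parent_rows}
    return {row[foreign_key] for row in child_rows} - parent_keys

# Single-table view: sampling the flattened posts table column by column
# invents user/post pairings that never occurred in the real data.
synthetic_flat = naive_column_sampler(posts, n=20)
print(unseen_pairs(posts, synthetic_flat))  # the invented pairings

# Multi-table view: referential integrity means no post points at a missing user.
print(find_orphan_keys(users, posts, "user_id", "user_id"))  # set()
broken_posts = posts + [{"post_id": "p5", "user_id": "u9"}]  # u9 is not a user
print(find_orphan_keys(users, broken_posts, "user_id", "user_id"))  # {'u9'}
```

The orphan-key check is the property a relational (multi-table) synthesizer is expected to preserve: run against synthetic `users` and `posts` tables, it should return an empty set.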

npatki commented on August 30, 2024

Hi @limhasic,

To add to this, we always recommend using data that is as close as possible to its original source. The more you modify the data (splitting, joining, etc.), the more logic and dependencies you introduce into your dataset. As a result, it becomes much harder for SDV synthesizers to learn the data out of the box, because they must reverse-engineer every change that was introduced.
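As an illustration of the kind of dependency a join introduces, here is a stdlib-only sketch assuming hypothetical `users` and `posts` tables. After the join, each user's `country` is copied onto every one of their posts, so the flattened table now carries the rule "same `user_id` implies same `country`". `join_posts_with_users` and `holds_functional_dependency` are hypothetical helpers, not SDV APIs; a single-table synthesizer trained on the flattened rows would have to rediscover this rule on its own.

```python
def join_posts_with_users(users, posts):
    """Inner-join posts to users on user_id, copying user columns onto each post row."""
    users_by_id = {u["user_id"]: u for u in users}
    return [{**post, **users_by_id[post["user_id"]]}
            for post in posts if post["user_id"] in users_by_id]

def holds_functional_dependency(rows, determinant, dependent):
    """True if every value of `determinant` maps to exactly one value of `dependent`."""
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical source tables.
users = [{"user_id": "u1", "country": "US"}, {"user_id": "u2", "country": "FR"}]
posts = [
    {"post_id": "p1", "user_id": "u1"},
    {"post_id": "p2", "user_id": "u1"},
    {"post_id": "p3", "user_id": "u2"},
]

flat = join_posts_with_users(users, posts)
print(holds_functional_dependency(flat, "user_id", "country"))  # True

# Hypothetical synthetic row that breaks the rule the join introduced.
flat_bad = flat + [{"post_id": "p4", "user_id": "u1", "country": "FR"}]
print(holds_functional_dependency(flat_bad, "user_id", "country"))  # False
```

Keeping `users` and `posts` as separate tables avoids creating this rule in the first place: `country` is stored once per user, so there is nothing for a synthesizer to reverse-engineer.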

Hope that helps. As @srinify mentioned, it would help if you could provide an example so we can better understand your question. Thanks.

srinify commented on August 30, 2024

Hi @limhasic, we hope our answers were helpful! It's been 2 weeks since we've heard from you, and our general practice is to close issues that receive no response after 2 weeks.

If you have more questions, feel free to open more issues!
