
Comments (5)

rcurtin commented on June 1, 2024

Nice, I think this could make for more compelling examples than "generate random uniform data"! 👍

It's worth pointing out that mlpack already has a number of distribution-like classes: GaussianDistribution, GammaDistribution, LaplaceDistribution, DiscreteDistribution, and so forth. (See src/mlpack/core/dists/.) Now, it would be cool to generate data directly from one of these distribution classes, but there are some issues: those distribution classes are typically aimed at (1) generating random samples via Random(), and (2) evaluating probabilities via Probability(). That second function is irrelevant here, since we just want to generate datasets. Even the signature of (1) is not quite right: for existing distributions, it generates only a single point.
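To illustrate the signature mismatch, here is a minimal sketch of turning a "one point per call" sampler into a multi-point generator. It uses only the C++ standard library; `ToyGaussian` and `SampleMany` are hypothetical names invented for this example and are not mlpack's classes or API.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-in for a distribution class whose Random() returns a
// single point. This is NOT mlpack's GaussianDistribution.
class ToyGaussian
{
 public:
  ToyGaussian(std::vector<double> meanIn, double stddev, unsigned seed)
      : mean(std::move(meanIn)), noise(0.0, stddev), rng(seed) { }

  // Draw one point: the mean plus independent Gaussian noise per dimension.
  std::vector<double> Random()
  {
    std::vector<double> point(mean);
    for (double& x : point)
      x += noise(rng);
    return point;
  }

 private:
  std::vector<double> mean;
  std::normal_distribution<double> noise;
  std::mt19937 rng;
};

// Hypothetical helper: build an n-point dataset by calling Random() n times.
// Works with any type that exposes Random() returning std::vector<double>.
template<typename Distribution>
std::vector<std::vector<double>> SampleMany(Distribution& dist, std::size_t n)
{
  std::vector<std::vector<double>> data;
  data.reserve(n);
  for (std::size_t i = 0; i < n; ++i)
    data.push_back(dist.Random());
  return data;
}
```

A wrapper like this leaves the single-point interface untouched, which is one possible way to avoid adapting every existing distribution; a batched method on each class would be the alternative.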

So, certainly some additional infrastructure is necessary to generate labeled synthetic datasets, but I do think that whatever we write should be "aware" of the distribution code and make use of it when possible in the implementation (and add new distributions as needed).
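As a rough sketch of what "labeled synthetic dataset" generation could look like, the following draws one spherical Gaussian blob per class and records the class index as the label. It is standard-library-only; `LabeledData` and `GenerateBlobs` are hypothetical names for illustration, not a proposed mlpack interface.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical container for a labeled dataset; not an mlpack type.
struct LabeledData
{
  std::vector<std::vector<double>> points;
  std::vector<std::size_t> labels;
};

// Hypothetical generator: one spherical Gaussian blob per class center.
// Points for class c are centers[c] plus N(0, stddev^2) noise per dimension.
LabeledData GenerateBlobs(const std::vector<std::vector<double>>& centers,
                          std::size_t pointsPerClass,
                          double stddev,
                          unsigned seed)
{
  std::mt19937 rng(seed);
  std::normal_distribution<double> noise(0.0, stddev);

  LabeledData out;
  for (std::size_t c = 0; c < centers.size(); ++c)
  {
    for (std::size_t i = 0; i < pointsPerClass; ++i)
    {
      std::vector<double> p(centers[c]);
      for (double& x : p)
        x += noise(rng);
      out.points.push_back(std::move(p));
      out.labels.push_back(c);  // Label is the index of the class center.
    }
  }
  return out;
}
```

In a real implementation, the per-class sampling step is where the existing distribution classes could be plugged in instead of the hard-coded Gaussian noise.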

A minor pedantic thought is that after #3269, pretty much everything in mlpack is directly in the mlpack:: namespace for convenience (with the exception of a couple things in util:: and a couple things in data::). So, I'd personally prefer to avoid a simulate:: namespace.

At least personally I wouldn't worry about Open Question (2) too much; I think if we provide something relatively barebones at first, it will get immediately used in the documentation, and that's probably good enough for now.

from mlpack.

mlpack-bot commented on June 1, 2024

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍


coatless commented on June 1, 2024

Sounds like an expansion for distributions is in order to handle multi-point generation. With respect to random(), is this using a poorly spec'd PRNG?

On the note of namespaces, maybe this should go under util:: or where the train/test split is found?


rcurtin commented on June 1, 2024

> Sounds like an expansion for distributions is in order to handle multi-point generation.

Possibly, it would be great to keep things unified, but if it doesn't make sense (or if the amount of work for adapting older distributions is not feasible), in my view it's okay to keep them different.

> With respect to random() is this using a poorly spec'd PRNG?

It uses std::mt19937, not sure if that qualifies as "poor" (I am not an RNG expert).
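For what it's worth, `std::mt19937` is tightly specified at the engine level: the C++ standard requires that the 10000th output of a default-constructed engine be 4123659995. A small standard-library-only check of that property, with hypothetical function names, might look like this. It says nothing about how mlpack seeds or wraps the engine.

```cpp
#include <cstdint>
#include <random>

// Returns the 10000th output of a default-constructed std::mt19937.
// The C++ standard fixes this value at 4123659995.
std::uint_fast32_t TenThousandthOutput()
{
  std::mt19937 gen;    // Default seed is 5489.
  gen.discard(9999);   // Skip the first 9999 outputs.
  return gen();        // The 10000th output.
}

// Two engines constructed with the same seed yield the same raw outputs.
bool SameSeedSameStream(unsigned seed, int draws)
{
  std::mt19937 a(seed), b(seed);
  for (int i = 0; i < draws; ++i)
  {
    if (a() != b())
      return false;
  }
  return true;
}
```

One caveat: the distribution adaptors such as `std::normal_distribution` are not required to produce identical values across standard library implementations, so a fixed seed gives reproducible datasets only on a given implementation.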

> On the note of namespaces, maybe this should go under util:: or where the train/test split is found?

I really think a flat namespace is fine, since there aren't really going to be any naming conflicts, but Split() is in the data:: namespace (as is Load() and Save()), and I suppose we could use that too. util:: is primarily for internal mlpack tooling, but this would be user-facing.


arthiondaena commented on June 1, 2024

@rcurtin I am trying to find a good beginner's issue. Do you think this feature request could be implemented by a beginner as a way to learn about mlpack?

