
Comments (10)

bbengfort avatar bbengfort commented on July 22, 2024

@rebeccabilbro ok, so I added the connect four and bike share data (and added mushroom a while ago). The other suggestions were good, but unfortunately they weren't easy to transform into a format that Yellowbrick could use.

We've got a few datasets and between these and the dataset generators in scikit-learn I'd say we're covered. I'm going to close this issue, but we can create more dataset issues as needed.
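For reference, a minimal sketch of the scikit-learn dataset generators mentioned above, one each for classification, regression, and clustering. The sample sizes, feature counts, and class counts are arbitrary choices for illustration, not values taken from this thread.

```python
from sklearn.datasets import make_blobs, make_classification, make_regression

# Multiclass classification with 6 classes (illustrative sizes only).
X_clf, y_clf = make_classification(
    n_samples=500,
    n_features=20,
    n_informative=10,
    n_classes=6,
    n_clusters_per_class=1,
    random_state=42,
)

# Regression with a small amount of Gaussian noise on the target.
X_reg, y_reg = make_regression(
    n_samples=500, n_features=10, noise=0.5, random_state=42
)

# Clustering: isotropic Gaussian blobs around 4 centers.
X_clu, y_clu = make_blobs(n_samples=500, centers=4, random_state=42)

print(X_clf.shape, X_reg.shape, X_clu.shape)
```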

from yellowbrick.

rebeccabilbro avatar rebeccabilbro commented on July 22, 2024

Maybe the mushroom dataset for multi-class?


bbengfort avatar bbengfort commented on July 22, 2024

Mushroom data set would be a good one; and hopefully in the user testing we get a few as well.


hboyan avatar hboyan commented on July 22, 2024

Do you have any guidelines on the features you want included in these, or the number of observations? I have tons of datasets kicking around that I've used for my students to practice on, so they're fairly straightforward/clean. Would love to help!


bbengfort avatar bbengfort commented on July 22, 2024

@hboyan one thing we're looking for, and that you might be able to advise on, is datasets with interesting feature analysis and modeling constraints: for example, a dataset that is better suited to LASSO than Ridge (or vice versa), or one that is better suited to Bayesian modeling than logistic regression (or vice versa). Basically, raw datasets that require some feature, model, and hyperparameter analysis, which we can use to demonstrate the efficacy of visual diagnostics and to conduct a user study showing that visual diagnostics are faster than non-visual and potentially even search-based methods.
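To make the LASSO vs. Ridge distinction concrete, here is a rough sketch on synthetic data where only a few features carry signal. Everything in it (the generator settings, `alpha=1.0`, the variable names) is an assumption chosen for illustration; it is not one of the datasets discussed in this thread. It only shows the kind of difference such a dataset would need to expose: LASSO drives some coefficients exactly to zero, while Ridge shrinks them without zeroing them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data: 50 features, only 5 of which are informative.
X, y = make_regression(
    n_samples=100, n_features=50, n_informative=5, noise=5.0, random_state=0
)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Count coefficients that each model set exactly to zero.
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
print("zeroed coefficients:", n_zero_lasso, "(lasso) vs", n_zero_ridge, "(ridge)")

# Cross-validated R^2 for each model, for a side-by-side comparison.
lasso_r2 = cross_val_score(Lasso(alpha=1.0), X, y, cv=5).mean()
ridge_r2 = cross_val_score(Ridge(alpha=1.0), X, y, cv=5).mean()
print("mean CV R^2:", lasso_r2, "(lasso) vs", ridge_r2, "(ridge)")
```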


bbengfort avatar bbengfort commented on July 22, 2024

@rebeccabilbro ok, we've added the energy dataset for regression, and mushroom for multiclass (though I thought that was poisonous vs. not poisonous?). I think we need to add one more multiclass with > 5 classes.

I'm going to move this to in progress to mark that it's underway, and I can quickly add the last dataset once we've decided.


NealHumphrey avatar NealHumphrey commented on July 22, 2024

@bbengfort the example that I used as a basis for choosing the ConfusionMatrixVisualizer came out of the sklearn handwritten digits dataset. The music dataset I started out using as an example is kind of messy, so I was planning to convert my example over to the digits example. That might be a good one to use for a multi-class classifier example? It's nice because it's 10 classes, a good balance of big but not too big, and some nice overlap of some easy-to-identify and hard-to-identify classes.
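As a rough illustration of the digits idea, the sketch below computes a 10-class confusion matrix with plain scikit-learn. The classifier (`SVC` with `gamma=0.001`) and the split settings are assumptions picked for the example, and it deliberately uses `sklearn.metrics.confusion_matrix` rather than guessing at the Yellowbrick visualizer's API.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Handwritten digits: 8x8 images flattened to 64 features, 10 classes.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data,
    digits.target,
    test_size=0.25,
    stratify=digits.target,
    random_state=0,
)

# Any multiclass classifier would do; SVC is an arbitrary choice here.
model = SVC(gamma=0.001).fit(X_train, y_train)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, model.predict(X_test), labels=list(range(10)))
print(cm)
```

Off-diagonal cells show which digits get mistaken for one another, which is the easy-versus-hard class overlap described above.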


bbengfort avatar bbengfort commented on July 22, 2024

@NealHumphrey -- that's a good one, but I'd like to have our own examples just so we can show something slightly different from scikit-learn.

What about predicting religion from country flags?
http://archive.ics.uci.edu/ml/datasets/Flags

The categorical data in this one will require transformers. That's a drawback, but it would also help us evaluate the YB workload.
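For context, a minimal sketch of what "requires transformers" could look like in scikit-learn: one-hot encoding a categorical column while passing a numeric column through unchanged. The toy rows and the column names `mainhue` and `bars` are made up for illustration and are not read from the Flags dataset.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy rows, NOT real Flags data.
toy = pd.DataFrame(
    {
        "mainhue": ["red", "green", "blue", "red"],  # categorical feature
        "bars": [0, 3, 0, 2],                        # numeric feature
    }
)

# One-hot encode the categorical column; leave the numeric one as-is.
encoder = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["mainhue"]),
    ],
    remainder="passthrough",
)

X = encoder.fit_transform(toy)
print(X.shape)
```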


rebeccabilbro avatar rebeccabilbro commented on July 22, 2024

Ok, some suggestions:

For classification (multi-class)

  • Connect four data (target can be Win, Lose, or Draw)
  • Flags (target can be 0=Catholic, 1=Other Christian, 2=Muslim, 3=Buddhist, 4=Hindu, 5=Ethnic, 6=Marxist, 7=Others)

For regression

For clustering


rebeccabilbro avatar rebeccabilbro commented on July 22, 2024

@bbengfort can you please review?

