Comments (10)
@rebeccabilbro ok, so I added the connect four and bike share data (and added mushroom a while ago). The other suggestions were good, but unfortunately they weren't easy to transform into a format that yellowbrick could use.
We've got a few datasets and between these and the dataset generators in scikit-learn I'd say we're covered. I'm going to close this issue, but we can create more dataset issues as needed.
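For anyone landing here later, a minimal sketch of the scikit-learn dataset generators mentioned above. The sample sizes, feature counts, and class counts are arbitrary choices for illustration, not settings used anywhere in yellowbrick.

```python
from sklearn.datasets import make_blobs, make_classification, make_regression

# Multi-class classification with more than 5 classes (all sizes are illustrative).
X_clf, y_clf = make_classification(
    n_samples=500,
    n_features=20,
    n_informative=10,
    n_classes=6,
    n_clusters_per_class=1,
    random_state=42,
)

# Regression with a continuous target and some Gaussian noise.
X_reg, y_reg = make_regression(
    n_samples=500, n_features=10, noise=5.0, random_state=42
)

# Clustering: four isotropic blobs in two dimensions.
X_blobs, y_blobs = make_blobs(
    n_samples=500, centers=4, n_features=2, random_state=42
)

print(X_clf.shape, X_reg.shape, X_blobs.shape)
```

Fixing `random_state` keeps the generated data reproducible across runs, which helps when the same dataset backs several examples.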
from yellowbrick.
Maybe the mushroom dataset for multi-class?
from yellowbrick.
Mushroom data set would be a good one; and hopefully in the user testing we get a few as well.
from yellowbrick.
Do you have any guidelines on the features you want included in these, or the number of observations? I have tons of datasets kicking around that I've used for my students to practice on, so they're fairly straightforward/clean. Would love to help!
from yellowbrick.
@hboyan one thing that we're looking for that you might be able to give advice on is data sets that have interesting feature analysis and modeling constraints; for example a dataset that is better for LASSO than Ridge and vice versa; or a data set that is better for Bayesian modeling than logistic regression (and vice versa). Basically - raw data sets that require some feature, model, and hyperparameter analysis that we can use to demonstrate the efficacy of visual diagnostics and conduct a user study to demonstrate that visual diagnostics are faster than non-visual and potentially even search based methods.
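To make the LASSO vs. Ridge point concrete, here is a rough sketch on synthetic data where only a few features carry signal. It assumes scikit-learn's `make_regression`, and the `alpha` values and dataset sizes are arbitrary; it is not one of the raw datasets being asked for, just an illustration of the kind of contrast such a dataset would show.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

# Sparse ground truth: only 5 of the 50 features are informative (illustrative sizes).
X, y = make_regression(
    n_samples=200, n_features=50, n_informative=5, noise=10.0, random_state=0
)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# The L1 penalty drives some coefficients to exactly zero; the L2 penalty only shrinks them.
lasso_zeros = int(np.sum(lasso.coef_ == 0))
ridge_zeros = int(np.sum(ridge.coef_ == 0))

# Cross-validated R^2 for each model, using fresh estimators for each fold.
lasso_r2 = cross_val_score(Lasso(alpha=1.0), X, y, cv=5).mean()
ridge_r2 = cross_val_score(Ridge(alpha=1.0), X, y, cv=5).mean()

print(f"Lasso: {lasso_zeros} zero coefficients, mean CV R^2 = {lasso_r2:.3f}")
print(f"Ridge: {ridge_zeros} zero coefficients, mean CV R^2 = {ridge_r2:.3f}")
```

A real dataset with this property would let a coefficient plot or similar visual diagnostic show the difference at a glance, which is the comparison the user study is after.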
from yellowbrick.
@rebeccabilbro ok, we've added the energy dataset for regression, and mushroom for multiclass (though I thought that was poisonous vs. not poisonous?). I think we need to add one more multiclass with > 5 classes.
I'm going to move this to in progress to mark that it's underway, and I can quickly add the last data set once we've decided.
from yellowbrick.
@bbengfort the example that I used as a basis for choosing the ConfusionMatrixVisualizer came out of the sklearn handwritten digits dataset. The music dataset I started out using as an example is kind of messy, so I was planning to convert my example over to the digits example. That might be a good one to use for a multi-class classifier example? It's nice because it's 10 classes, a good balance of big but not too big, and some nice overlap of some easy-to-identify and hard-to-identify classes.
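A minimal sketch of the digits idea, using scikit-learn's plain `confusion_matrix` function rather than the ConfusionMatrixVisualizer itself. The choice of `GaussianNB`, the split size, and the random seed are assumptions made only to keep the example short and fast.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# 10-class handwritten digits dataset bundled with scikit-learn.
digits = load_digits()

# Stratify so every digit appears in the held-out set.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data,
    digits.target,
    test_size=0.25,
    stratify=digits.target,
    random_state=0,
)

# Any classifier would do here; GaussianNB is an arbitrary, quick choice.
model = GaussianNB().fit(X_train, y_train)

# Rows are true digits, columns are predicted digits.
cm = confusion_matrix(y_test, model.predict(X_test), labels=list(range(10)))
print(cm)
```

Off-diagonal cells show which digits get confused with each other, which is where the easy-to-identify and hard-to-identify classes become visible.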
from yellowbrick.
@NealHumphrey -- that's a good one, but I'd like to have our own examples just so we can show something slightly different from scikit-learn.
What about predicting religion from country flags?
http://archive.ics.uci.edu/ml/datasets/Flags
The problem is that it requires transformers for the categorical data, although that would also help us evaluate the YB workload.
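As a rough idea of what the transformer step could look like, here is a sketch with scikit-learn's `OneHotEncoder`. The three rows and the two column meanings (a main colour and a yes/no flag feature) are made-up toy values for illustration, not records from the UCI file.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy rows: [main colour, has some emblem]. Not real Flags data.
rows = [
    ["red", "yes"],
    ["green", "no"],
    ["blue", "yes"],
]

encoder = OneHotEncoder()

# fit_transform returns a sparse matrix by default; convert it for inspection.
encoded = encoder.fit_transform(rows).toarray()

# Categories are sorted per column, so the output columns are
# blue, green, red, no, yes.
print(encoder.categories_)
print(encoded)
```

Each categorical column expands into one binary column per category, so the feature count can grow quickly on a dataset with many categorical attributes.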
from yellowbrick.
Ok, some suggestions:
For classification (multi-class)
- Connect four data (target can be Win, Lose, or Draw)
- Flags (target can be 0=Catholic, 1=Other Christian, 2=Muslim, 3=Buddhist, 4=Hindu, 5=Ethnic, 6=Marxist, 7=Others)
For regression
- Songs by year (target is a year between 1922 and 2011)
- Daily bikeshare rentals (target is the number of bikes rented on a given day)
For clustering
from yellowbrick.
@bbengfort can you please review?
from yellowbrick.