Comments (6)

rebeccabilbro commented on August 24, 2024

http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html

from yellowbrick.

rebeccabilbro commented on August 24, 2024

In Scikit-Learn, it looks like most supervised estimators expose a coef_ or feature_importances_ attribute after fitting, which can be used to determine the most important features.
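As a minimal sketch of how a visualizer might read either attribute, the hypothetical helper below (`feature_scores` is not part of Scikit-Learn or Yellowbrick) checks for `feature_importances_` first and falls back to `coef_`. Collapsing a 2-D `coef_` by averaging absolute values across classes is an assumption made for illustration, not an established convention.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def feature_scores(fitted_model):
    """Return one non-negative score per feature from a fitted estimator.

    Hypothetical helper: prefers feature_importances_, falls back to coef_.
    """
    if hasattr(fitted_model, "feature_importances_"):
        return np.asarray(fitted_model.feature_importances_)
    if hasattr(fitted_model, "coef_"):
        coef = np.abs(np.asarray(fitted_model.coef_))
        # coef_ can be 2-D (n_classes, n_features); average over classes
        # (an illustrative choice, other reductions are possible).
        return coef if coef.ndim == 1 else coef.mean(axis=0)
    raise AttributeError("model exposes neither feature_importances_ nor coef_")


# Small synthetic dataset, used only to have something to fit.
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           random_state=0)

forest_scores = feature_scores(
    RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y))
linear_scores = feature_scores(LogisticRegression(max_iter=1000).fit(X, y))

print(forest_scores)
print(linear_scores)
```

The two score arrays are on different scales (tree importances sum to 1, coefficient magnitudes do not), so they are comparable within a model but not directly across models.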

Examples that use feature_importances_:

Examples that use coef_:

bbengfort commented on August 24, 2024

@rebeccabilbro adding the comments I wrote on this from the paper. Also, I'd say this is a feature, not technical debt?

Generalized linear models compute a predicted dependent variable via the linear combination of an array of coefficients with an array of independent variables. GLMs are fit by modifying the coefficients so as to minimize error, and regularization techniques specify how the model modifies coefficients in relation to each other. As a result, an opportunity presents itself: when features are on comparable scales, larger coefficients are "more informative" because they contribute a greater weight to the final prediction. Additionally, we may say that instance features may also be more or less "informative" depending on the product of the instance feature value with the feature coefficient. This creates two possibilities:

  1. We can compare models based on ranking of coefficients, such that a higher coefficient is "more informative".
  2. We can compare instances based on ranking of feature/coefficient products such that a higher product is "more informative".

In both cases, because the coefficient may be negative (indicating a negative relationship with the target), we must rank features by the absolute values of their coefficients. Visualizing a model or multiple models by most informative feature is usually done via a bar chart where the y-axis lists the feature names and the x-axis is the numeric value of the coefficient, so that the x-axis has both a positive and a negative side. The longer the bar, the more informative that feature is.
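A rough sketch of the ranking and bar chart described above, assuming NumPy and matplotlib are available. The function names and the example feature names and coefficients are made up for illustration; they are not Yellowbrick's API.

```python
import numpy as np


def rank_by_magnitude(feature_names, coefficients):
    """Return (name, coefficient) pairs sorted by |coefficient|, largest first."""
    coefficients = np.asarray(coefficients, dtype=float)
    order = np.argsort(-np.abs(coefficients), kind="stable")
    return [(feature_names[i], float(coefficients[i])) for i in order]


def plot_ranked_coefficients(feature_names, coefficients, ax=None):
    """Horizontal bar chart: features on the y-axis, signed coefficients on x."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    ranked = rank_by_magnitude(feature_names, coefficients)
    # barh draws from the bottom up, so reverse to put the largest bar on top.
    names = [name for name, _ in ranked][::-1]
    values = [value for _, value in ranked][::-1]
    if ax is None:
        _, ax = plt.subplots()
    ax.barh(names, values)
    ax.axvline(0, color="black", linewidth=0.8)  # separates negative/positive
    ax.set_xlabel("coefficient")
    return ax


# Illustrative values only.
names = ["age", "income", "tenure"]
coefs = [0.5, -2.0, 1.25]
ranked = rank_by_magnitude(names, coefs)
print(ranked)  # [('income', -2.0), ('tenure', 1.25), ('age', 0.5)]
```

Sorting by absolute value while plotting the signed value keeps the direction of each effect visible without letting negative coefficients sink to the bottom of the ranking.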

This method may also be used for instances, but generally there are very many instances relative to the number of models being compared. Instead, a heatmap grid is a better choice to inspect the influence of features on individual instances. Here the grid is constructed such that the x-axis represents individual features and the y-axis represents individual instances. The color of each cell (an instance, feature pair) represents the magnitude of the product of the instance value with the feature's coefficient for a single model. Visual inspection of this diagnostic may reveal a set of instances for which one feature is more predictive than another, or other types of regions of information in the model itself.
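The heatmap grid could be sketched roughly as follows, again assuming NumPy and matplotlib. `contribution_grid` and `plot_contribution_heatmap` are hypothetical names, and the data is invented for the example.

```python
import numpy as np


def contribution_grid(X, coefficients):
    """Return |x_ij * w_j| for every instance i and feature j."""
    X = np.asarray(X, dtype=float)
    coefficients = np.asarray(coefficients, dtype=float)
    # Broadcasting multiplies each column of X by that feature's coefficient.
    return np.abs(X * coefficients)


def plot_contribution_heatmap(X, coefficients, feature_names, ax=None):
    """Heatmap with features on the x-axis and instances on the y-axis."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    grid = contribution_grid(X, coefficients)
    if ax is None:
        _, ax = plt.subplots()
    image = ax.imshow(grid, aspect="auto", cmap="viridis")
    ax.set_xticks(range(len(feature_names)))
    ax.set_xticklabels(feature_names)
    ax.set_xlabel("feature")
    ax.set_ylabel("instance")
    ax.figure.colorbar(image, ax=ax, label="|value * coefficient|")
    return ax


# Two instances, two features; values are illustrative only.
X = [[1.0, 2.0], [3.0, -1.0]]
coefs = [0.5, -2.0]
grid = contribution_grid(X, coefs)
print(grid)
```

For the first instance the second feature dominates (4.0 versus 0.5), while for the second instance the two contributions are closer (2.0 versus 1.5), which is the kind of per-instance difference the heatmap is meant to surface.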

rebeccabilbro commented on August 24, 2024

@bbengfort is that a way of saying that you want to take this one?

bbengfort commented on August 24, 2024

@rebeccabilbro just saying that I had been thinking about it and had written some notes; if you'd like to take a first crack at it, please feel free! I'm still pushing on Radviz and Parallel coords (plus all the style and architecture stuff).

bbengfort commented on August 24, 2024

Closed by #317
