Need to add a class to enable the user to evaluate the features that were most informa

Add InformativeFeatures class, about districtdatalabs/yellowbrick

rebeccabilbro commented on August 24, 2024

http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html

from yellowbrick.

rebeccabilbro commented on August 24, 2024

In Scikit-Learn, it looks like most supervised estimators expose a coef_ or feature_importances_ attribute after fitting, which can be used to determine the most important features.

Examples that use feature_importances_:

Examples that use coef_:
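As a rough sketch of how the two attributes differ in practice (not one of the examples originally linked in this comment), the snippet below fits one tree ensemble and one linear model on synthetic data and reads a per-feature score from each. The helper name `feature_scores` is hypothetical, and averaging absolute coefficients across rows of a 2-D `coef_` is an assumption made here for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def feature_scores(model):
    """Return one non-negative score per feature (hypothetical helper)."""
    if hasattr(model, "feature_importances_"):
        return np.asarray(model.feature_importances_)
    if hasattr(model, "coef_"):
        coef = np.asarray(model.coef_)
        # coef_ may be 1-D or 2-D (one row per class or target);
        # assumption: average the absolute values across rows.
        return np.abs(coef).reshape(-1, coef.shape[-1]).mean(axis=0)
    raise TypeError("estimator exposes neither coef_ nor feature_importances_")


# Synthetic data, for illustration only.
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, random_state=0
)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
logit = LogisticRegression(max_iter=1000).fit(X, y)

forest_scores = feature_scores(forest)  # from feature_importances_
logit_scores = feature_scores(logit)    # from coef_
```

Both results have one entry per feature, so a single visualizer could in principle accept either kind of estimator.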

bbengfort commented on August 24, 2024

@rebeccabilbro adding the comments I wrote on this from the paper. Also, I'd say this is a feature, not technical debt?

Generalized linear models compute a predicted dependent variable via the linear combination of an array of coefficients with an array of independent variables. GLMs are fit by modifying the coefficients so as to minimize error, and regularization techniques specify how the model modifies coefficients in relation to each other. As a result, an opportunity presents itself: when features are on comparable scales, larger coefficients are "more informative" because they contribute a greater weight to the final prediction in most cases. Additionally, we may say that instance features may also be more or less "informative" depending on the product of the instance feature value with the feature coefficient. This creates two possibilities:

We can compare models based on ranking of coefficients, such that a higher coefficient is "more informative".
We can compare instances based on ranking of feature/coefficient products such that a higher product is "more informative".

In both cases, because the coefficient may be negative (indicating a strong negative correlation), we must rank features by the absolute values of their coefficients. Visualizing a model or multiple models by most informative feature is usually done via a bar chart where the y-axis lists the feature names and the x-axis is the numeric value of the coefficient, so that the x-axis has both a positive and a negative side. The longer the bar, the more informative that feature is.
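A minimal sketch of that ranking and bar chart, assuming the coefficients are already available as a flat array. The function names `rank_by_abs_coef` and `plot_ranked_coefs` and the feature names and values are made up for illustration; this is not an existing Yellowbrick API.

```python
import matplotlib.pyplot as plt
import numpy as np


def rank_by_abs_coef(names, coefs):
    """Return (name, coefficient) pairs ordered from largest to smallest |coefficient|."""
    coefs = np.asarray(coefs, dtype=float)
    order = np.argsort(-np.abs(coefs), kind="stable")
    return [(names[i], float(coefs[i])) for i in order]


def plot_ranked_coefs(ranked, ax=None):
    """Draw signed coefficients as horizontal bars, most informative at the top."""
    if ax is None:
        _, ax = plt.subplots()
    labels = [name for name, _ in ranked]
    values = [value for _, value in ranked]
    ax.barh(labels, values)
    ax.invert_yaxis()  # put the first (largest |coefficient|) bar at the top
    ax.axvline(0, color="black", linewidth=0.8)  # separates negative from positive
    ax.set_xlabel("coefficient")
    return ax


# Made-up coefficients, for illustration only.
ranked = rank_by_abs_coef(["age", "income", "tenure"], [0.4, -1.2, 0.05])
```

Calling `plot_ranked_coefs(ranked)` followed by `plt.show()` would display the chart; the sign is kept on the x-axis while the ordering uses only the magnitude.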

This method may also be used for instances, but generally there are very many instances relative to the number of models being compared. Instead, a heatmap grid is a better choice to inspect the influence of features on individual instances. Here the grid is constructed such that the x-axis represents individual features and the y-axis represents individual instances. The color of each cell (an instance, feature pair) represents the magnitude of the product of the instance value with the feature's coefficient for a single model. Visual inspection of this diagnostic may reveal a set of instances for which one feature is more predictive than another, or other types of regions of information in the model itself.
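The heatmap idea could be sketched roughly as follows, assuming a single linear model with one coefficient per feature. The names `instance_contributions` and `plot_contribution_heatmap`, the feature labels, and the toy numbers are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np


def instance_contributions(X, coefs):
    """Multiply each instance's feature values by the matching coefficient."""
    X = np.asarray(X, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    # Broadcasting: (n_instances, n_features) * (n_features,)
    return X * coefs


def plot_contribution_heatmap(contrib, feature_names, ax=None):
    """Show |value * coefficient| with features on x and instances on y."""
    if ax is None:
        _, ax = plt.subplots()
    image = ax.imshow(np.abs(contrib), aspect="auto", cmap="viridis")
    ax.set_xticks(range(len(feature_names)))
    ax.set_xticklabels(feature_names)
    ax.set_xlabel("feature")
    ax.set_ylabel("instance")
    ax.figure.colorbar(image, ax=ax, label="|value x coefficient|")
    return ax


# Toy data: two instances, three features (illustrative values only).
X = [[1, 2, 0], [0, 1, 3]]
coefs = [0.5, -1.0, 2.0]
contrib = instance_contributions(X, coefs)
```

Here the second instance is dominated by the third feature, which is the kind of per-instance pattern the grid is meant to surface. Calling `plot_contribution_heatmap(contrib, ["f0", "f1", "f2"])` would draw it.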

rebeccabilbro commented on August 24, 2024

@bbengfort is that a way of saying that you want to take this one?

bbengfort commented on August 24, 2024

@rebeccabilbro just saying that I had been thinking about it and had written some notes; if you'd like to take a first crack at it, please feel free! I'm still pushing on Radviz and Parallel coords (plus all the style and architecture stuff).

bbengfort commented on August 24, 2024

Closed by #317
