Comments (11)
I stumbled into this thread, searching for a way to display a shapviz waterfall with probabilities.
I understand the concern about the misleading element.
@madprogramer ... it would be nice if you shared your solution.
After spending a good hour on this myself, I'm sharing the completed function that was hinted at last year, as it might help others in the future.
# Rough approximation: map SHAP values from the logit scale to probabilities (binary case)
special_transform <- function(shp) {
  b <- get_baseline(shp)
  S <- get_shap_values(shp)
  X <- get_feature_values(shp)
  # Predicted probability from baseline and SHAP values on the logit scale
  p <- plogis(b + rowSums(S))
  # Map the baseline to a probability, then split (p - b_new) proportionally to S.
  # Note: rows with rowSums(S) == 0 produce NaN.
  b_new <- plogis(b)
  S_new <- S / rowSums(S) * (p - b_new)
  shapviz(S_new, X, b_new)
}
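To make the rescaling step concrete, here is a minimal base R sketch that skips the "shapviz" object entirely. The baseline and SHAP matrix are made-up toy numbers, not output from any model. It only checks that the rescaled values plus the new baseline add up to the predicted probabilities, which is all this approximation guarantees.

```r
# Toy inputs on the logit scale (made-up numbers, for illustration only)
b <- -0.5                          # baseline (average log-odds)
S <- rbind(c( 1.2, -0.3,  0.4),    # SHAP values, one row per observation
           c(-0.8,  0.1, -0.6))
colnames(S) <- c("x1", "x2", "x3")

p <- plogis(b + rowSums(S))        # predicted probabilities
b_new <- plogis(b)                 # baseline mapped to a probability

# Split the probability gap (p - b_new) in proportion to the logit-scale values
S_new <- S / rowSums(S) * (p - b_new)

# Rescaled values plus the new baseline reproduce the predicted probabilities
recovered <- b_new + rowSums(S_new)
print(cbind(p, recovered))
```

The additivity check passes by construction. It says nothing about whether the individual per-feature values are meaningful on the probability scale.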
from shapviz.
Actually, the SHAP values returned by XGBoost or LightGBM are on the logit link scale.
library(shapviz)
library(xgboost)

X <- iris[, -1]

# Binary logistic regression with XGBoost
fit <- xgb.train(
  params = list(objective = "binary:logistic"),
  data = xgb.DMatrix(data.matrix(X), label = iris[, 1] >= 5.8),
  nrounds = 30
)

# On logit scale
shp <- shapviz(fit, X_pred = data.matrix(X), X = X)
sv_waterfall(shp, row_id = 66)
If you instead mean switching to probabilities, I don't think that is possible without violating at least some of the Shapley fairness axioms (linearity). Still, I think we could add a utility function that maps the "shapviz" object approximately from logit to probability space, using the approach in the blog post you provided.
from shapviz.
Thank you! If you can add a link function, that would be superb!
BTW, I used your script and it returned the picture below:
For E[f(x)] = 0.116 and f(x) = 2.7, I guess these are not probabilities and still need to be converted, along with the numbers highlighted in the yellow bars.
from shapviz.
The values in the plot above are log-odds, i.e., they are on the logit scale. So I think you are interested in transforming them (somehow) via the logistic function (= inverse logit) to probabilities without badly violating SHAP properties. I will look into that soon!
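For the two totals quoted from the plot, the conversion itself is straightforward. The sketch below uses base R's `plogis()`, the logistic function, on the values E[f(x)] = 0.116 and f(x) = 2.7 mentioned above. It covers only the baseline and the prediction, not the per-feature bars.

```r
# Values read off the waterfall plot discussed above (logit scale)
baseline_logit <- 0.116   # E[f(x)]
pred_logit     <- 2.7     # f(x)

# plogis() is base R's logistic function: 1 / (1 + exp(-x))
baseline_prob <- plogis(baseline_logit)
pred_prob     <- plogis(pred_logit)

round(c(baseline = baseline_prob, prediction = pred_prob), 3)
```

This gives roughly 0.53 for the baseline and 0.94 for the prediction.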
from shapviz.
I looked into the matter. According to the proposed transformation, a jump from 0.5 to 0.59 would be as large as a jump from 0.9 to 0.99 on the probability scale. I currently don't see a situation where this makes sense, so I won't add this transformation to "shapviz". If you still need it, simply do the transformation based on this template:
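special_transform <- function(shp) {
  b <- get_baseline(shp)
  S <- get_shap_values(shp)
  X <- get_feature_values(shp)
  b_new <- g(b, S)  # g(): placeholder for your own baseline transformation
  S_new <- f(b, S)  # f(): placeholder for your own SHAP value transformation
  shapviz(S_new, X, b_new)
}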
from shapviz.
Thank you for writing this special transform function for me. However, it gives me an error: "Error in g(b, S) : could not find function "g"". Can you help with this?
from shapviz.
I leave this as an exercise to you. Based on your link, it is just 2-3 lines of code. But as I said, I can't think of a situation where the proposed mapping makes much sense. On the contrary, I think it is misleading.
from shapviz.
Thank you! I ran the Python code to get an idea of what the SHAP output array and baseline prediction look like. I tried a few different versions and the results turned out differently. I will spend more time looking at it. Appreciate your help!
from shapviz.
@mayer79 I know the issue is closed, but I made a logit to binary linking function some time ago.
I still understand that you don't want to add it in though :<
from shapviz.
As far as I remember, any non-linear transform will violate the fairness axioms of Shapley, so I'd like to keep it outside the package.
Since the "shapviz" object contains the baseline b and all SHAP values S, you could write a blogpost or similar for those interested?
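A small base R illustration of why the non-linearity matters (the starting points 0 and 3 and the shift of 1 are arbitrary example values): the same contribution on the logit scale translates into very different changes in probability depending on where it starts, so there is no single probability value that a logit-scale SHAP value corresponds to.

```r
# The same +1 contribution on the logit scale, starting from different log-odds
start <- c(0, 3)   # arbitrary starting log-odds
shift <- 1         # identical logit-scale contribution

prob_change <- plogis(start + shift) - plogis(start)
round(prob_change, 3)
```

Starting at log-odds 0, the probability moves by about 0.231. Starting at 3, it moves by only about 0.029.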
from shapviz.
@actuarial-lonewolf : Thanks for this input!
Comparing with Kernel SHAP (once on link scale and once on probability scale):
library(kernelshap)
library(shapviz)
fit <- glm(Treatment ~ Type + conc + uptake, data = CO2, family = binomial)
v <- c("Type", "conc", "uptake")
(link_scale <- shapviz(kernelshap(fit, X = CO2, bg_X = CO2, feature_names = v)))
(prob_scale <- shapviz(kernelshap(fit, X = CO2, bg_X = CO2, feature_names = v, type = "response")))
transformed <- special_transform(link_scale)
get_shap_values(prob_scale[1:5, ])
#           Type        conc      uptake
# [1,] 0.1945444 -0.12989845  0.34064436
# [2,] 0.1970985 -0.15394423 -0.08552033
# [3,] 0.1711899 -0.11740017 -0.23216460
# [4,] 0.1613382 -0.06415103 -0.30143362
# [5,] 0.1884975  0.02929410 -0.22343369
get_shap_values(transformed[1:5, ])
#           Type        conc     uptake
# [1,] 0.2197120 -0.19802628  0.3731001
# [2,] 0.3076373 -0.21203252 -0.1484753
# [3,] 0.2954062 -0.14487101 -0.3394145
# [4,] 0.2910848 -0.06558863 -0.4402471
# [5,] 0.3082047  0.05310590 -0.3774572
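One way to read the two printouts is to compare their row sums. The sketch below simply re-enters the printed values by hand (no model is refit) and computes the per-row gap. A constant gap would be consistent with the two versions using different baselines, e.g. the mean predicted probability versus the logistic transform of the mean log-odds. That interpretation is an assumption inferred from the numbers, not something verified against the packages.

```r
# SHAP values copied from the printed output above (first five rows)
prob_scale_S <- rbind(
  c(0.1945444, -0.12989845,  0.34064436),
  c(0.1970985, -0.15394423, -0.08552033),
  c(0.1711899, -0.11740017, -0.23216460),
  c(0.1613382, -0.06415103, -0.30143362),
  c(0.1884975,  0.02929410, -0.22343369)
)
transformed_S <- rbind(
  c(0.2197120, -0.19802628,  0.3731001),
  c(0.3076373, -0.21203252, -0.1484753),
  c(0.2954062, -0.14487101, -0.3394145),
  c(0.2910848, -0.06558863, -0.4402471),
  c(0.3082047,  0.05310590, -0.3774572)
)

# Row sums equal prediction minus baseline, so the gap reflects the baselines
gap <- rowSums(prob_scale_S) - rowSums(transformed_S)
round(gap, 4)

# Largest per-feature disagreement between the two versions
max(abs(prob_scale_S - transformed_S))
```

The row-sum gap is about 0.0105 in every row, while the individual feature values differ by much more, which shows how differently the two approaches distribute the same total.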
from shapviz.