christophm / interpretable-ml-book

Book about interpretable machine learning

Home Page: https://christophm.github.io/interpretable-ml-book/

License: Other

Makefile 0.08% TeX 0.21% Shell 0.04% R 1.30% HTML 0.11% CSS 0.05% Python 1.93% Jupyter Notebook 96.28%

interpretable-ml-book's People

Contributors

adavidzh, aditya-shirke, alainjungo, asattiraju13, bgreenwell, christophm, csinva, dandls, datajms, dependabot[bot], discdiver, dlhvelasco, etrama, expectopatronum, fangzhouli, goodgravy, jfkxs, jklaise, jtr13, juliabrosig, mattsonthieme, mkirchhof, nwsm, philip-khor, pitmonticone, raam93, rajshah4, rikhuijzer, thunfischtoast, tobiasgoerke

interpretable-ml-book's Issues

Gitbook

GitBook now seems to offer only a legacy option. Did you use the editor to create this on your desktop? Could you recommend an easy way to get GitBook going? I love your work and have linked to it from one of my websites.

Significance of gbm interactions

In section 5.4.5 you say:
"It is unclear whether an interaction is significantly greater than 0. We would need to conduct a statistical test, but this test is not (yet) available in a model-agnostic version."
I know that Friedman and Popescu (2008) recommend a method for creating a null distribution in order to determine the significance of interactions in a gbm model. Do you know if this method is implemented in R anywhere, even if only for gbm models?
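For reference, the quantity whose significance is in question is Friedman and Popescu's H-statistic. Below is a minimal, model-agnostic sketch of its squared two-way version, assuming that partial dependence is estimated at the observed data points (this costs O(n²) model calls, so it only suits small samples). It computes the statistic only; a significance test would additionally need a null distribution for H, which this sketch does not provide.

```python
def centered_pd(predict, X, features):
    """Partial dependence evaluated at each row's own values of `features`
    (averaging over all rows for the remaining features), then centered."""
    values = []
    for xi in X:
        preds = []
        for xz in X:
            row = list(xz)
            for f in features:
                row[f] = xi[f]  # fix the features of interest to xi's values
            preds.append(predict(row))
        values.append(sum(preds) / len(preds))
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def h_squared(predict, X, j, k):
    """Squared two-way H-statistic for features j and k (0 = no interaction)."""
    pd_jk = centered_pd(predict, X, [j, k])
    pd_j = centered_pd(predict, X, [j])
    pd_k = centered_pd(predict, X, [k])
    num = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    den = sum(a ** 2 for a in pd_jk)
    return num / den


# Hypothetical toy data on a 3 x 3 grid.
X = [[a, b] for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
print(h_squared(lambda r: r[0] + r[1], X, 0, 1))  # additive model: 0.0
print(h_squared(lambda r: r[0] * r[1], X, 0, 1))  # pure interaction: 1.0
```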

List CART (decision tree) alternatives and software

I am extending all chapters with a section on software implementations and alternative algorithms (also with software implementations). The software can be any free and open-source software: R, Python, Weka, ...

You can help out by posting links to papers and software implementations of decision tree algorithms as comments to this issue.

Captions of figures 5.10 and 5.11 are the same.

I guess this text belongs to figure 5.10 and not to figure 5.11?

FIGURE 5.11: The interaction strength for each feature with all other features for a random forest predicting the probability of cervical cancer. The number of diagnosed sexually transmitted diseases has the highest interaction effect with all other features, followed by the number of pregnancies.

Confusing difference for Figure 5.22

I guess it's because of rounding, but at first I was a bit confused that the "difference" between the actual prediction (0.43) and the average prediction (0.03) is shown as 0.41. I think it might confuse readers and could be explained in the text. If you agree, I can add a sentence explaining it.
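A small sketch shows how this can happen when the difference is computed from unrounded predictions and only then rounded for display. The unrounded values below are hypothetical; the figure only shows the rounded ones.

```python
# Hypothetical unrounded predictions (assumed for illustration only).
actual = 0.434
average = 0.026

# Difference computed from the unrounded values, then rounded for display.
diff_shown = round(actual - average, 2)  # 0.41
# Difference a reader computes from the rounded numbers in the figure.
diff_from_rounded = round(round(actual, 2) - round(average, 2), 2)  # 0.4

print(f"{actual:.2f} - {average:.2f}: figure shows {diff_shown:.2f}, "
      f"reader computes {diff_from_rounded:.2f}")
```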

Cross-references

Hi!

Thanks for your effort with this manuscript.

I've noticed that in many places you use "in this section" or "as can be seen here" as cross-references. These work fine if the manuscript is read as an interactive document, i.e., freely on your web-site. However, if read in a printed PDF, these cross-references are not particularly useful.

I'll work on a pull request to fix these issues, but I thought you should be aware of them.

Cheers,
Isak

List alternatives and software for interaction effect measures

I am extending all chapters with a section on software implementations and alternative algorithms (also with software implementations). The software can be any free and open-source software: R, Python, Weka, ...

You can help out by posting links to alternatives and software implementations of interaction effect measures as comments to this issue.

List alternatives and software for feature importance measures

I am extending all chapters with a section on software implementations and alternative algorithms (also with software implementations). The software can be any free and open-source software: R, Python, Weka, ...

You can help out by posting links to alternatives and software implementations of feature importance measures as comments to this issue.

Make the "next page >" symbol more obvious on the Preface page

On my cell phone, the next-page symbol ">" is light grey and very small. Even though I had set my phone to "Desktop site", no menu or sidebar was visible. I almost gave up on your book because I could only see the preface.

I suggest adding a regular link to the next page on the preface page.

List linear regression variants and software

I am extending all chapters with a section on software implementations and alternative algorithms (also with software implementations). The software can be any free and open-source software: R, Python, Weka, ...

You can help out by posting links to papers and software implementations of the linear regression model as comments to this issue.

Reference List

As a reader I would like to have all references in one place to look them up.

I stumbled upon this when I read Chapter 2.1.
The text references Miller 2017, but this reference is not in the footnotes. I eventually found it in the introduction of Chapter 2.
To spare others a search like mine, it would be nice if all references could be found in one place.

Need more clarity on 5.1.5.


I think this paragraph needs to explain more clearly how the effect of each feature (for this instance) influences the prediction.

Most of the confusion comes from expressions like "unusually little or much".

First and foremost, providing the instance used would give the reader more context.

More concepts that are not clear:

  1. Is "Temperature (2 degrees)" the value of this instance, or does it mean a two-unit increase?
  2. "contributes less towards the predicted value compared to the average": what is the average? The average prediction? The mean of the distribution?
  3. "“days_since_2011” unusually much, because this instance is from late 2011 (5 days)": so the feature contributes a lot because the instance is from late 2011 (only 5 days????). Shouldn't it be 300 days or so?

Thanks a lot for the book; it is very well written, but I do feel that this example needs a more step-by-step explanation,
e.g. providing the instance, showing how it maps to the plot (calculating the effects), and cross-referencing the weight table, so that readers can fully grasp what is going on.

P.S.: It would also be really nice to have the data with the extracted features in order to reproduce the work.
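Assuming the effect of a feature is defined as its weight times the instance's feature value, the mapping from instance to plot can be made explicit. The sketch below uses made-up weights, data and instance (not the book's bike rental model), and reads "compared to the average" as compared to the mean effect over the data, which is one plausible interpretation.

```python
# Hypothetical weights of a fitted linear model (not the book's estimates).
weights = {"temp": 110.0, "days_since_2011": 4.9}

# Hypothetical data set, used to see how effects are distributed.
data = [
    {"temp": 5.0, "days_since_2011": 100},
    {"temp": 15.0, "days_since_2011": 400},
    {"temp": 25.0, "days_since_2011": 650},
]

# Hypothetical instance to explain.
instance = {"temp": 2.0, "days_since_2011": 5}


def effects(row):
    """Effect of each feature for one row: weight * feature value."""
    return {name: w * row[name] for name, w in weights.items()}


instance_effects = effects(instance)
# Mean effect of each feature over the data set, for comparison.
mean_effects = {
    name: sum(effects(row)[name] for row in data) / len(data) for name in weights
}

for name in weights:
    print(f"{name}: instance effect {instance_effects[name]:.1f}, "
          f"mean effect {mean_effects[name]:.1f}")
```

With a positive weight, a small feature value yields an effect below the mean effect over the data, and a large value yields one above it.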

Feature importance disadvantages

I think some disadvantages could be added to the feature importance chapter.

There is this paper: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-25

Moreover, there is the problem of correlated variables. If two variables are highly correlated, their feature importance can be massively decreased, because one variable can substitute for the other (for example, when we grow a lot of trees). Here is a good blog post about it; I guess there are surely some papers on this topic as well: https://freakonometrics.hypotheses.org/20545

Btw, you should cite Breiman (2001), not 2011, since he died in 2005. ;)

Model-agnostic Permutation Feature Importance

Hi Christoph, and thank you for the great book!

In the Permutation Feature Importance section you say:

I haven’t found any paper that generalises permutation feature importance, so that it can be applied model-agnostic. Please drop me a mail if you know about model-agnostic feature importance.

But you propose a model-agnostic version of the algorithm in the same section. Did you forget to remove those sentences? Or am I missing something?
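The procedure in that section only calls the model's prediction function, so it applies to any model. A minimal sketch, assuming mean squared error as the loss and the error ratio (permuted error divided by original error) as the importance score; the error difference is a common alternative. The toy data and black box are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Model-agnostic permutation feature importance.

    `predict` can be any function mapping a list of rows to a list of
    predictions; the model's internals are never accessed. Returns, per
    feature, the mean ratio of the error after permuting that feature to
    the original error (1.0 means permuting it did not change the error).
    """
    rng = random.Random(seed)
    base_error = mse(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        ratios = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            ratios.append(mse(y, predict(X_perm)) / base_error)
        importances.append(sum(ratios) / n_repeats)
    return importances


# Hypothetical black box that only uses feature 0; y contains some noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [3 * row[0] + data_rng.gauss(0, 0.1) for row in X]


def black_box(rows):
    return [3 * row[0] for row in rows]


print(permutation_importance(black_box, X, y))
```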

render_book error: Cannot open file '10-references.Rmd'

I'm progressing through your README file and upon executing line 34:

bookdown::render_book('', 'bookdown::gitbook')

I receive the following error:

Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") : cannot open file '10-references.Rmd': No such file or directory

I checked the 'manuscript' folder for the following file: '10-references.Rmd' but did not see it in the folder.

Thank you for your time!

Can't see whole table for 4.3.2.

In my screenshot (taken 2018-11-16), the table appears to be cut off, which leads me to believe that it is incomplete.

I'm using Chrome on a Mac. I also tested it in the Brave browser (which is Chromium-based, too).
Maybe the table is just too long!

Cheers

List local and global surrogate models (like LIME or tree surrogates) alternatives and software

I am extending all chapters with a section on software implementations and alternative algorithms (also with software implementations). The software can be any free and open-source software: R, Python, Weka, ...

You can help out by posting links to papers and software implementations of local and global surrogate models (like LIME or tree surrogates) as comments to this issue.

Additions

Brilliant work.

Some suggested additions:

  • Add LIME advantages and disadvantages.
  • Explain what is meant by sparse explanations (Shapley value section).

Looking forward to your future work.

List alternatives and software for partial dependence plots

I am extending all chapters with a section on software implementations and alternative algorithms (also with software implementations). The software can be any free and open-source software: R, Python, Weka, ...

You can help out by posting links to alternatives and software implementations of partial dependence plots as comments to this issue.

Explanation of how the weights are calculated in the 5.7.2.1 example

Very good book, thanks for writing it! :)

Just one issue: could you please explain how the weights in the 5.7.2.1 example were calculated (second table, last column, after the probabilities)? I understand it is a distance between the generated sample text and the original sentence. Could you give us some hints on how the distance between sentences (or sets of words) was calculated?

Sorry, this issue has already been addressed elsewhere.
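One simple choice for such a weight, assumed here purely for illustration (it is not necessarily what the book or the `lime` package uses), is 1 minus the fraction of words removed from the original sentence. The sentence and perturbation below are hypothetical.

```python
def proximity(original_words, kept_mask):
    """Hypothetical proximity weight for a perturbed text sample:
    1 - (number of removed words / number of words in the original).

    kept_mask holds one 0/1 flag per original word (1 = word kept).
    """
    if len(kept_mask) != len(original_words):
        raise ValueError("need one flag per word")
    removed = len(original_words) - sum(kept_mask)
    return 1 - removed / len(original_words)


sentence = "check out my new video channel now".split()  # 7 words
sample = [1, 0, 1, 1, 0, 0, 1]  # hypothetical perturbation: 3 words removed
print(round(proximity(sentence, sample), 2))  # 0.57
```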

Prototypes/criticisms: add disadvantage

I fail to see the difference between the two concepts. In your example, in the top-right figure with an MMD of 0.316, the middle point is considered a prototype. But you selected another configuration, and the central blob ends up as a criticism. Shouldn't the conclusion be to add another prototype? It appears to me that criticisms depend heavily on the prototypes selected. Isn't it dangerous for interpretability to use two opposite concepts when blobs of data can be one or the other depending on an arbitrary cut-off value?

List Shapley value alternatives and software

I am extending all chapters with a section on software implementations and alternative algorithms (also with software implementations). The software can be any free and open-source software: R, Python, Weka, ...

You can help out by posting links to papers and software implementations of the Shapley value for machine learning as comments to this issue.

Create a pretty version of tree-to-rules graphic

The graphic that shows how a tree can be turned into decision rules is currently drawn by hand.
It would be great to have a pretty version of it, created on a computer.
The format can be SVG (preferred), png or jpg.
The filename is: rulefit.jpg

Add GAMs to chapter Interpretable Models

Hi Christoph,

First of all a big bravo and thank you for writing such an interesting book and making it open to everyone!

I would suggest adding a section on generalized additive models (GAMs). Although they are not very popular in the data science community, these methods are very powerful and offer a simple way to balance accuracy and interpretability.

You have probably seen Caruana's talks about GAMs for healthcare; in any case, here is the link: https://www.microsoft.com/en-us/research/video/intelligible-machine-learning-models-for-healthcare/

Cheers,

Write a chapter about your favorite deep learning interpretability method

There are many interpretability methods for deep neural networks.
The field is very young and most methods are quite new and under development. It's hard for a single person to keep track. This issue serves as a placeholder for interpretability methods specific for deep learning.

Leave a comment if you are interested in adding a chapter. Any method that helps interpreting neural networks is interesting. A good starting point is to take a paper or some code library and explain it in simpler words, possibly with an example.

Some possible starting points:

Software:

Scientific Paper:

Leave a comment if you are interested in writing a chapter!

Decision Trees Chapter

In the example in 4.4.2, you talk about purity (in terms of the Gini index).
But you don't explain it anywhere else, which might confuse the reader.

I have some background, so I know what you are talking about, but others will not!

So either add a link explaining the Gini index and what purity means,
or just give the intuition that purity describes how homogeneous a subset is: a subset becomes impure when it contains instances that do not "belong" to that group.
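The Gini index is one common way CART measures (im)purity: a node is pure when all its instances belong to one class, and a tree prefers splits whose size-weighted child impurity is low. A minimal sketch with hypothetical labels:

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k**2: 0 for a pure node, larger when classes mix."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)


print(gini_impurity(["yes"] * 4))                    # 0.0: pure
print(gini_impurity(["yes", "yes", "no", "no"]))     # 0.5: maximally mixed (2 classes)
print(split_impurity(["yes", "yes"], ["no", "no"]))  # 0.0: split yields pure subsets
```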

As always, keep up the good work,
Pedro

Inconsistent value in caption and chart

Hi,
The caption of this chart,

````
```{r ice-cervical-centered, fig.cap=sprintf("Centered ICE plot for predicted cancer probability by age. Lines are fixed to 0 at age %i. Compared to age %i, the predictions for most women remain unchanged until the age of 45 where the predicted probability increases.", min(cervical_subset$Age), min(cervical_subset$Age))}
````

seems to be OK ('Lines are fixed to 0 at age 13. Compared to age 13'), but it is inconsistent with the y-axis label: 'Cancer probability difference to age 18'.
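Since the caption and the axis label are computed separately, one way to keep them consistent is to derive both from a single anchor value. A minimal sketch, assuming a hypothetical black-box model and data and a plain-Python stand-in for the plotting code (the book's chunk itself is R):

```python
def ice_curves(predict, rows, feature, grid):
    """One ICE curve per row: the prediction as `feature` runs over `grid`."""
    return [[predict({**row, feature: value}) for value in grid] for row in rows]


def center(curves, grid, anchor):
    """Centered ICE: subtract each curve's prediction at the anchor value."""
    i = grid.index(anchor)
    return [[p - curve[i] for p in curve] for curve in curves]


# Hypothetical black box and data (not the book's cervical cancer model).
def predict(row):
    return 0.01 * max(row["Age"] - 45, 0) + 0.02 * row["Smokes"]


rows = [{"Age": 30, "Smokes": 0}, {"Age": 50, "Smokes": 1}]
grid = [13, 18, 30, 45, 60]

anchor = min(grid)  # compute the anchor once ...
centered = center(ice_curves(predict, rows, "Age", grid), grid, anchor)
# ... and reuse it in every label, so caption and axis cannot disagree.
caption = f"Lines are fixed to 0 at age {anchor}."
y_label = f"Cancer probability difference to age {anchor}"
print(caption, "|", y_label)
```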

PDF version

Hi Christoph,

lovely, thanks a lot for the book.
I might sound semi-old asking for the possibility to build a PDF version of your book, but in order to read and make notes on e.g. an iPad or other tablet this would make a lot of sense to me.

Is there any way to do so?

List decision rule algorithm alternatives and software

I am extending all chapters with a section on software implementations and alternative algorithms (also with software implementations). The software can be any free and open-source software: R, Python, Weka, ...

You can help out by posting links to papers and software implementations of decision rule algorithms as comments to this issue.

Unclear formulation: sensitive M

The explanation of the Monte Carlo approximation of the Shapley value says "It is unclear how to choose a sensitive M".

I'm not entirely sure what you mean by sensitive here.

Semi-related question: couldn't the correlation problem be solved by running a PCA first? (orthogonalize to the feature of interest, then run PCA on the remaining columns).
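The Monte Carlo approximation referred to here can be sketched as follows, assuming a numeric feature vector and a background dataset to sample from (model and data below are hypothetical). In this sketch, M is simply the number of sampling iterations: each one costs two model calls, and the standard error of the estimate shrinks roughly with 1/sqrt(M), so choosing M is a trade-off between precision and computation.

```python
import random


def shapley_mc(predict, x, background, j, M=1000, seed=0):
    """Monte Carlo estimate of the Shapley value of feature j for instance x.

    Each of the M iterations draws a random background row z and a random
    feature order, builds two hybrid rows that differ only in feature j,
    and records the difference in predictions.
    """
    rng = random.Random(seed)
    p = len(x)
    total = 0.0
    for _ in range(M):
        z = rng.choice(background)
        order = list(range(p))
        rng.shuffle(order)
        before_j = set(order[: order.index(j)])  # features ordered before j
        # With feature j taken from x ...
        x_plus = [x[k] if k in before_j or k == j else z[k] for k in range(p)]
        # ... and with feature j taken from z.
        x_minus = [x[k] if k in before_j else z[k] for k in range(p)]
        total += predict(x_plus) - predict(x_minus)
    return total / M


# Hypothetical additive model: its exact Shapley value for feature 0 is
# 2 * (x[0] - mean of feature 0 in the background data) = 2 * (3 - 1) = 4.
def model(row):
    return 2 * row[0] + 3 * row[1]


background = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
x = [3.0, 3.0]
for M in (10, 100, 2000):
    print(M, round(shapley_mc(model, x, background, j=0, M=M), 3))
```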

List logistic regression alternatives and software

I am extending all chapters with a section on software implementations and alternative algorithms (also with software implementations). The software can be any free and open-source software: R, Python, Weka, ...

You can help out by posting links to papers and software implementations of logistic regression algorithms as comments to this issue.

Empty _book directory after build

After installing all packages and running all the commands recommended in the README file, I get no error messages in the console, but there are no resulting HTML files in the _book directory.

What would you recommend I look at (as an R beginner)?

Add acknowledgements

  • Chapter contributors (Abi, Verena)
  • smaller fixes (go through PRs)
  • Cover (Yvonne)
  • Images (Shapley: Abi, flaticon; future: this Japanese website)
  • Funding from ZD.B
  • Early readers

Error in the RuleFit -> Guidelines -> Disadvantages

In the Disadvantages section of the RuleFit Guidelines you write:

"For example one decision rule (feature) for the bike prediction could be: “temp > 15” and another rule could be “temp > 10 & weather=‘GOOD’”. When the weather is good and the temperature is above 10 degrees, the temperature is automatically also always bigger then 15, which means in the cases where the second rule applies, the first one also always applies."

I think that you have swapped the numbers around and the rules should be:

  • temp > 10, and
  • temp > 15 & weather=‘GOOD’.

temp > 10 & weather=‘GOOD’ does not imply temp > 15, e.g. weather=‘GOOD’ & temp = 13.
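A quick way to sanity-check such statements is to evaluate the rules on a grid of hypothetical cases and test whether one rule holding always implies the other. This is a finite check on assumed values, not a proof:

```python
def implies(rule_1, rule_2, cases):
    """True if rule_2 holds in every case in which rule_1 holds."""
    return all(rule_2(*case) for case in cases if rule_1(*case))


# Hypothetical cases: integer temperatures 0..30 and two weather values.
cases = [(temp, weather) for temp in range(31) for weather in ("GOOD", "BAD")]


# Rules as quoted from the chapter.
def temp_gt_15(temp, weather):
    return temp > 15


def temp_gt_10_good(temp, weather):
    return temp > 10 and weather == "GOOD"


# Rules as proposed in this issue.
def temp_gt_10(temp, weather):
    return temp > 10


def temp_gt_15_good(temp, weather):
    return temp > 15 and weather == "GOOD"


print(implies(temp_gt_10_good, temp_gt_15, cases))  # False, e.g. temp = 13, GOOD
print(implies(temp_gt_15_good, temp_gt_10, cases))  # True
```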

LIME: better explain how different features can be used

Better explain the following statement: "the explanations created with local surrogate models can use other features than the original model." Can I use a feature the model hasn't been trained on? How would that work for tabular data?

A good example: model is trained on PCA components of features, but explanation is based on features directly.
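A minimal sketch of that example, under stated assumptions: the black box only sees a principal-component score (the projection direction is hypothetical and fixed by hand instead of fitted), while the LIME-style surrogate is fitted on the original features. The Gaussian perturbations, the exponential kernel and its width are assumptions, not the `lime` package's defaults.

```python
import math
import random


# Hypothetical black box: it sees only one principal-component score,
# pc1 = (x0 + x1) / sqrt(2) (direction fixed by hand, not fitted), and
# predicts 2 * pc1.
def black_box(x):
    pc1 = (x[0] + x[1]) / math.sqrt(2)
    return 2 * pc1


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def local_surrogate(predict, x, n_samples=500, scale=1.0, width=1.0, seed=0):
    """Weighted linear fit on the ORIGINAL features around x (LIME-style)."""
    rng = random.Random(seed)
    A = [[0.0] * 3 for _ in range(3)]  # X^T W X for the design [1, x0, x1]
    b = [0.0] * 3                      # X^T W y
    for _ in range(n_samples):
        z = [xi + rng.gauss(0, scale) for xi in x]        # perturb x
        d2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        weight = math.exp(-d2 / width**2)                  # proximity to x
        design = [1.0, z[0], z[1]]
        y = predict(z)
        for i in range(3):
            b[i] += weight * design[i] * y
            for k in range(3):
                A[i][k] += weight * design[i] * design[k]
    return solve(A, b)  # [intercept, weight of x0, weight of x1]


coef = local_surrogate(black_box, [1.0, 2.0])
print([round(c, 3) for c in coef])  # intercept ~0, both weights ~sqrt(2) = 1.414
```

Because this black box happens to be linear in the original features, the surrogate recovers its weights exactly; for a nonlinear model the surrogate would only be a local approximation around the instance.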

Write a chapter about your favorite tree ensemble interpretability method

Tree ensembles like boosted trees (e.g. xgboost) and random forests perform extremely well on tabular data.

A few interpretability methods were created specifically to interpret tree ensembles.

Some pointers:

Leave a comment if you are interested in writing a chapter about a tree ensemble-specific method!
