Official code for using / reproducing ACD from the paper "Hierarchical interpretations for neural network predictions" (ICLR 2019). This code produces hierarchical interpretations for different types of trained neural networks in pytorch.
Note: this repo is actively maintained. For any questions please file an issue.
- fully-contained data/models/code for reproducing and experimenting with ACD
- see the reproduce_figs folder for notebooks with examples of using ACD to reproduce figures in the paper
- try your own examples on these models with simple alterations to the notebooks
- for additional details, see the acd folder which contains a bulk of the code for getting importance scores and aggregating them
- the dsets folder contains small data/models/dataloaders for using different data sets
- more documentation is provided in readmes of subfolders and inline comments
- allows for different types of interpretations by changing hyperparameters (explained in examples)
- tested with python3 and pytorch 1.0 with/without gpu (although cpu doesn't work very well for vision models)
Inspecting NLP sentiment models | Detecting adversarial examples | Analyzing imagenet models |
---|---|---|
- to use ACD on your own model, replace the models in the examples with your own trained models. Specifically, 3 things must be altered:
- the pred_ims function must be replaced by a function you write using your own trained model. This function gets predictions from a model given a batch of examples.
- the model must be replaced with your model
- the current CD implementation doesn't always work for all types of networks. If you are getting an error inside of
cd.py
, you may need to write a custom function that iterates through the layers of your network (for examples seecd.py
)
- this work is part of an overarching project on interpretable machine learning, guided by the PDR framework
- for related work, see the github repo for contextual decomposition (ICLR 2018)
- the file scores/score_funcs.py also contains simple pytorch implementations of integrated gradients and the simple interpration technique gradient * input
-
feel free to use/share this code openly
-
if you find this code useful for your research, please cite the following:
@article{singh2018hierarchical, title={Hierarchical interpretations for neural network predictions}, author={Singh, Chandan and Murdoch, W James and Yu, Bin}, journal={arXiv preprint arXiv:1806.05337}, year={2018} }