pytorch / captum
Model interpretability and understanding for PyTorch
Home Page: https://captum.ai
License: BSD 3-Clause "New" or "Revised" License
Hey Guys,
While trying to run this tutorial, I am facing issues loading the GloVe vectors. After loading them, the vocabulary size is shown as 2, but it should be more than 10000. Can anyone help me with this?
My PyTorch version is 1.3.1.
My torchtext version is 0.5.1.
Please help me with this. Thanks!
When trying to run the Getting started with Captum Insights tutorial in a Google Colab notebook, I stumbled upon the following issue: when calling visualizer.render(debug=False), the result looks like the screenshot below.
The reason for this behavior is that Captum's render() method does not redirect requests as e.g. shown in TensorBoard's _display_colab() method. While the current implementation works fine with regular IPython notebooks, Colab requires some additional tweaks as described in the TensorBoard code.
Do you have any plans to support Colab or is this even a priority? If no one is already working on this, I could make a PR adding some code similar to TensorBoard as a proof of concept.
Hi @NarineK and Captum team, thanks for all the great work on interpretability with PyTorch.
Like others here (see #150, #249), I am trying to interpret a BERT classifier fine-tuned on a binary classification task, using the transformers library from HuggingFace.
Specifically, I have
model = BertForSequenceClassification.from_pretrained('finetuned-bert-base-cased')
I am not having much success adapting the SQuAD example https://github.com/pytorch/captum/blob/master/tutorials/Bert_SQUAD_Interpret.ipynb
So far, I have left almost everything else untouched and redefined
def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):
    text_ids = tokenizer.encode(text, add_special_tokens=False)
    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]
    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)
which I call with input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
and a custom forward method that reads
def custom_forward(inputs, token_type_ids=None, position_ids=None, attention_mask=None, position=0):
    outputs = predict(inputs, token_type_ids=token_type_ids, position_ids=position_ids, attention_mask=attention_mask)
    preds = outputs[0]
    # preds is like
    # tensor([[-1.9723, 2.2183]], grad_fn=<AddmmBackward>)
    return torch.tensor([torch.softmax(preds, dim = 1)[0][1]], requires_grad = True)
which I use in lig = LayerIntegratedGradients(custom_forward, model.bert.embeddings).
When calling lig.attribute (as in the tutorial), I get
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
Can you help me debug the above? I guess I am messing something up with the custom_forward method, and maybe also construct_input_ref_pair ... or more.
I am happy to post a working solution once done with this!
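A likely culprit, offered as a hedged guess: the last line of custom_forward wraps the probability in a fresh torch.tensor(..., requires_grad=True). That builds a new leaf tensor with no autograd history, so nothing connects it back to the embedding layer, which matches the "appears to not have been used in the graph" error. The sketch below reproduces the effect with a hypothetical linear stand-in for the classifier (weights, broken_forward and fixed_forward are illustrative, not from the tutorial); .item() just makes explicit the detachment that the re-wrap performs. The fixed version indexes the softmax output directly, which stays on the graph.

```python
import torch

torch.manual_seed(0)
weights = torch.randn(4, 2)  # hypothetical stand-in for the classifier head


def broken_forward(x):
    preds = x @ weights
    prob = torch.softmax(preds, dim=1)[0][1]
    # Mirrors the re-wrap in custom_forward: a brand-new leaf tensor with no history
    return torch.tensor([prob.item()], requires_grad=True)


def fixed_forward(x):
    preds = x @ weights
    # Plain indexing keeps the class-1 probability attached to the autograd graph
    return torch.softmax(preds, dim=1)[:, 1]


x = torch.randn(1, 4, requires_grad=True)
```

Returning one score per example, shape (batch,), should let Captum be called without a target argument; treat that as an assumption to check against the Captum documentation for your version.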
Is it possible to use the Captum library for keypoint detection?
I am running Captum on OS X 10.11.6 (and also on Ubuntu 16.04 LTS).
The example 'python -m captum.insights.example' returns an Internal Server Error when I try to connect to http://localhost:51283/ with Safari.
Any ideas?
============================= test session starts ==============================
platform darwin -- Python 3.6.7, pytest-5.0.1, py-1.8.0, pluggy-0.13.0
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/davidlaxer/captum/.hypothesis/examples')
rootdir: /Users/davidlaxer/captum
plugins: hypothesis-3.88.3
collected 212 items
tests/attr/test_approximation_methods.py .... [ 1%]
tests/attr/test_common.py ........ [ 5%]
tests/attr/test_data_parallel.py ssssssssssssssss [ 13%]
tests/attr/test_deeplift_basic.py ...... [ 16%]
tests/attr/test_deeplift_classification.py .....F.. [ 19%]
tests/attr/test_gradient.py ........ [ 23%]
tests/attr/test_gradient_shap.py ... [ 25%]
tests/attr/test_input_x_gradient.py ......... [ 29%]
tests/attr/test_integrated_gradients_basic.py ........................ [ 40%]
tests/attr/test_integrated_gradients_classification.py ........ [ 44%]
tests/attr/test_internal_influence.py .......... [ 49%]
tests/attr/test_layer_activation.py ...... [ 51%]
tests/attr/test_layer_conductance.py ............. [ 58%]
tests/attr/test_layer_gradient_x_activation.py ...... [ 60%]
tests/attr/test_neuron_conductance.py ......... [ 65%]
tests/attr/test_neuron_gradient.py ........ [ 68%]
tests/attr/test_neuron_integrated_gradients.py ........ [ 72%]
tests/attr/test_saliency.py ......... [ 76%]
tests/attr/test_targets.py ................................... [ 93%]
tests/attr/test_utils_batching.py ......... [ 97%]
tests/attr/models/test_base.py . [ 98%]
tests/attr/models/test_pytext.py ss [ 99%]
tests/insights/test_contribution.py .. [100%]
=================================== FAILURES ===================================
_____________ Test.test_softmax_classification_batch_zero_baseline _____________
self = <tests.attr.test_deeplift_classification.Test testMethod=test_softmax_classification_batch_zero_baseline>
def test_softmax_classification_batch_zero_baseline(self):
num_in = 40
input = torch.arange(0.0, num_in * 3.0, requires_grad=True).reshape(3, num_in)
baselines = 0 * input
model = SoftmaxDeepLiftModel(num_in, 20, 10)
dl = DeepLift(model)
> self.softmax_classification(model, dl, input, baselines)
tests/attr/test_deeplift_classification.py:54:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/attr/test_deeplift_classification.py:117: in softmax_classification
self._assert_attributions(model, attributions, input, baselines, delta, target2)
tests/attr/test_deeplift_classification.py:129: in _assert_attributions
"some samples".format(delta),
E AssertionError: False is not true : The sum of attribution values tensor([0.0008, 0.0023, 0.0039]) is not nearly equal to the difference between the endpoint for some samples
=============================== warnings summary ===============================
/Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/IPython/lib/pretty.py:91
/Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/IPython/lib/pretty.py:91: DeprecationWarning: IPython.utils.signatures backport for Python 2 is deprecated in IPython 6, which only supports Python 3
from IPython.utils.signatures import signature
/Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/IPython/utils/module_paths.py:28
/Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/IPython/utils/module_paths.py:28: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
tests/attr/test_deeplift_basic.py::Test::test_relu_deeplift
tests/attr/test_deeplift_basic.py::Test::test_relu_deeplift_batch
tests/attr/test_deeplift_basic.py::Test::test_relu_deeplift_batch_4D_input
tests/attr/test_deeplift_basic.py::Test::test_relu_deeplift_multi_ref
tests/attr/test_deeplift_basic.py::Test::test_relu_linear_deeplift
tests/attr/test_deeplift_basic.py::Test::test_tanh_deeplift
tests/attr/test_deeplift_classification.py::Test::test_convnet_with_maxpool1d
tests/attr/test_deeplift_classification.py::Test::test_convnet_with_maxpool2d
tests/attr/test_deeplift_classification.py::Test::test_convnet_with_maxpool3d
tests/attr/test_deeplift_classification.py::Test::test_sigmoid_classification
tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_batch_multi_baseline
tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_batch_zero_baseline
tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_multi_baseline
tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_zero_baseline
tests/attr/test_targets.py::Test::test_multi_target_deep_lift
tests/attr/test_targets.py::Test::test_multi_target_deep_lift_shap
tests/attr/test_targets.py::Test::test_simple_target_deep_lift
tests/attr/test_targets.py::Test::test_simple_target_deep_lift_shap
tests/attr/test_targets.py::Test::test_simple_target_deep_lift_shap_single_tensor
tests/attr/test_targets.py::Test::test_simple_target_deep_lift_shap_tensor
/Users/davidlaxer/captum/captum/attr/_core/deep_lift.py:327: UserWarning: Setting forward, backward hooks and attributes on non-linear
activations. The hooks and attributes will be removed
after the attribution is finished
after the attribution is finished"""
tests/attr/test_gradient.py::Test::test_apply_gradient_reqs
tests/attr/test_layer_conductance.py::Test::test_matching_conv_with_baseline_conductance
tests/attr/test_layer_conductance.py::Test::test_matching_pool1_conductance
tests/attr/test_layer_conductance.py::Test::test_matching_pool2_conductance
tests/attr/test_neuron_gradient.py::Test::test_matching_intermediate_gradient
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_input_linear1
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_input_relu2
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear1
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear2
tests/attr/test_targets.py::Test::test_multi_target_deep_lift
tests/attr/test_targets.py::Test::test_multi_target_input_x_gradient
tests/attr/test_targets.py::Test::test_multi_target_saliency
tests/attr/test_targets.py::Test::test_simple_target_deep_lift
tests/attr/test_targets.py::Test::test_simple_target_input_x_gradient
tests/attr/test_targets.py::Test::test_simple_target_saliency
tests/attr/test_targets.py::Test::test_simple_target_saliency_tensor
/Users/davidlaxer/captum/captum/attr/_utils/gradient.py:27: UserWarning: Input Tensor 0 did not already require gradients, required_grads has been set automatically.
"required_grads has been set automatically." % index
tests/attr/test_gradient.py::Test::test_apply_gradient_reqs
/Users/davidlaxer/captum/captum/attr/_utils/gradient.py:34: UserWarning: Input Tensor 1 had a non-zero gradient tensor, which is being reset to 0.
"which is being reset to 0." % index
tests/attr/test_gradient.py::Test::test_apply_gradient_reqs
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear2
/Users/davidlaxer/captum/captum/attr/_utils/gradient.py:27: UserWarning: Input Tensor 2 did not already require gradients, required_grads has been set automatically.
"required_grads has been set automatically." % index
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear1
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear2
/Users/davidlaxer/captum/captum/attr/_utils/gradient.py:27: UserWarning: Input Tensor 1 did not already require gradients, required_grads has been set automatically.
"required_grads has been set automatically." % index
tests/attr/models/test_base.py::Test::test_interpretable_embedding_base
/Users/davidlaxer/captum/captum/attr/_models/base.py:168: UserWarning: In order to make embedding layers more interpretable they will
be replaced with an interpretable embedding layer which wraps the
original embedding layer and takes word embedding vectors as inputs of
the forward function. This allows to generate baselines for word
embeddings and compute attributions for each embedding dimension.
The original embedding layer must be set
back by calling `remove_interpretable_embedding_layer` function
after model interpretation is finished.
after model interpretation is finished."""
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
/Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/matplotlib/colors.py:101: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
ret = np.asscalar(ex)
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
/Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/matplotlib/image.py:424: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
a_min = np.asscalar(a_min.astype(scaled_dtype))
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
/Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/matplotlib/image.py:425: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
a_max = np.asscalar(a_max.astype(scaled_dtype))
-- Docs: https://docs.pytest.org/en/latest/warnings.html
=========================== short test summary info ============================
SKIPPED [1] tests/attr/test_data_parallel.py:116: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:187: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:254: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:38: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:68: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:98: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:137: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:168: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:219: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:24: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:56: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:84: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:123: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:154: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:200: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:235: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/models/test_pytext.py:81: Skip the test since PyText is not installed
SKIPPED [1] tests/attr/models/test_pytext.py:68: Skip the test since PyText is not installed
FAILED tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_batch_zero_baseline
======= 1 failed, 193 passed, 18 skipped, 60 warnings in 1188.87 seconds =======
$ python -m captum.insights.example
Fetch data and view Captum Insights at http://localhost:51283/
<IPython.lib.display.IFrame object at 0x1211f1c18>
Hi,
In the tutorial Model Interpretation for Pretrained ResNet Model, in the occlusion experiment, rand_img_dist = torch.cat([input * 0, input * 1]) is defined but never used, so it could probably be removed.
occlusion = Occlusion(model)
rand_img_dist = torch.cat([input * 0, input * 1])
attributions_occ = occlusion.attribute(input,
                                       strides=(3, 50, 50),
                                       target=pred_label_idx,
                                       sliding_window_shapes=(3, 60, 60),
                                       baselines=0)
_ = viz.visualize_image_attr_multiple(np.transpose(attributions_occ.squeeze().cpu().detach().numpy(), (1, 2, 0)),
                                      np.transpose(transformed_img.squeeze().cpu().detach().numpy(), (1, 2, 0)),
                                      ["original_image", "heat_map"],
                                      ["all", "positive"],
                                      show_colorbar=True,
                                      outlier_perc=2,
                                     )
Often, in practice, we wish to compute the contributions w.r.t. the logits of the final sigmoid/softmax, rather than w.r.t. the final network output itself. This is to avoid artifacts that can be caused by the saturating nature of the sigmoid/softmax, and comes into play when comparing attributions between examples. It is particularly relevant if gradient*input is used as an attribution method, because for examples with very confident predictions, the sigmoid/softmax outputs tend to saturate and the gradients will approach zero. I'm wondering if it may be worth mentioning this in the documentation - in the current "getting started", the toy model has a sigmoid output:
I'm concerned that a naive user may try to compare the magnitudes of attributions across different examples without realizing that, for sigmoid/softmax outputs, it may be worth removing the final nonlinearity before doing such a comparison. We discuss this in Section 3.6 of the deeplift paper. Ideally there would be an option in Captum to ignore the final nonlinearity, but I realize it may not be trivial to add that option. Sorry if this is already addressed and I missed it.
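To make the saturation argument concrete, here is a small sketch with a hypothetical one-layer model (weights fixed to 1 and bias to 0; not the getting-started toy model) comparing gradient*input computed on the logit and on the sigmoid output. For a confident input the sigmoid's derivative is tiny, so its gradient*input shrinks below that of a barely-positive input, even though the confident input drives the logit far harder.

```python
import torch

# Hypothetical toy model: logit = x1 + x2 + x3 (weights fixed for reproducibility)
logit_model = torch.nn.Linear(3, 1)
with torch.no_grad():
    logit_model.weight.fill_(1.0)
    logit_model.bias.zero_()
prob_model = torch.nn.Sequential(logit_model, torch.nn.Sigmoid())


def grad_times_input(f, x):
    """Gradient*input attribution of the summed output of f with respect to x."""
    x = x.clone().requires_grad_(True)
    (grad,) = torch.autograd.grad(f(x).sum(), x)
    return grad * x


confident = torch.tensor([[4.0, 4.0, 4.0]])  # logit 12, sigmoid very close to 1
moderate = torch.tensor([[0.1, 0.1, 0.1]])   # logit 0.3, sigmoid near 0.57

for name, f in [("logit", logit_model), ("sigmoid", prob_model)]:
    print(name,
          grad_times_input(f, confident).sum().item(),
          grad_times_input(f, moderate).sum().item())
```

In this sketch, attributing the logit model corresponds to dropping the final nonlinearity before comparing attribution magnitudes across examples.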
Hi everyone,
I am applying the integrated gradients method to my dataset, which has categorical and numerical data: I convert the categorical data into embeddings and concatenate them with the numerical features. However, the integrated gradients attributions for all the categorical values are zero, while those for the numerical ones are computed correctly.
I have tried to do it with LayerIntegratedGradients, but since I do not have the development version of Captum installed, it failed.
Any suggestions?
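One plausible reason categorical features come out as zero: their raw inputs are integer indices into an embedding table, and an embedding lookup has no gradient with respect to the index, so gradient-based attributions cannot reach it. A workaround that does not need a development build is to run the embedding lookup outside the attributed function and attribute with respect to the concatenated float features; a per-category score can then be obtained by summing over that embedding's dimensions. The sketch below does this with a small hand-rolled integrated gradients (a midpoint Riemann sum along the straight-line path), not Captum's implementation; the model, sizes, and feature values are hypothetical.

```python
import torch

torch.manual_seed(0)
# Hypothetical model: one categorical feature (10 levels, 4-dim embedding) + 2 numerical features
embedding = torch.nn.Embedding(10, 4)
head = torch.nn.Sequential(torch.nn.Linear(4 + 2, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1))


def forward_from_features(features):
    # features: (..., 6) float tensor laid out as [embedded categorical | numerical]
    return head(features).squeeze(-1)


def integrated_gradients(f, x, baseline, steps=200):
    # Hand-rolled IG: midpoint Riemann sum along the straight path baseline -> x
    alphas = (torch.arange(steps, dtype=x.dtype) + 0.5) / steps
    path = (baseline + alphas.unsqueeze(1) * (x - baseline)).requires_grad_(True)
    (grads,) = torch.autograd.grad(f(path).sum(), path)
    return (x - baseline) * grads.mean(dim=0)


# Do the embedding lookup outside the attributed function, then treat the result as input
with torch.no_grad():
    x = torch.cat([embedding(torch.tensor(3)), torch.tensor([0.5, -1.2])])
baseline = torch.zeros_like(x)
attributions = integrated_gradients(forward_from_features, x, baseline)
categorical_score = attributions[:4].sum()  # sum over the embedding dimensions
```

Because the lookup happens up front, the attributed function only ever sees float tensors, which is the same idea LayerIntegratedGradients applies to an embedding layer.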
I am running the example application and wanted to ask if it's possible to set a particular port for the app?
Thanks
Hi all,
Just wanted to put this particular use-case on your radar. Sometimes we find that it is useful to get access to just the gradients ("multipliers"), before they are multiplied by the difference-from-reference to get the final attribution. Specifically, we use the multipliers to estimate how the network might have responded had it seen slightly different inputs. We refer to these estimates as "hypothetical contribution scores". If you are curious how these hypothetical contributions look, here's a notebook (on a fork of the DeepSHAP repository) where I compute hypothetical contributions in the context of genomic data: https://github.com/AvantiShri/shap/blob/0b0350ba3a42af275f6e99ca2e3c5877d7d94f8a/notebooks/deep_explainer/PyTorch%20Deep%20Explainer%20DeepSEA%20example.ipynb
You've all done an awesome job with this repository, and I will definitely point the PyTorch users in my lab to it once the release is formally announced. I totally understand if the ability to return just the multipliers is not something that you are likely to incorporate in the main release; I'm sure we can easily fork the repository and add that feature in for our lab's purposes.
Thanks again!
Av
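For readers unfamiliar with the idea, here is a minimal sketch of how hypothetical contribution scores can be derived from multipliers for one-hot sequence input. It assumes the multipliers m have already been computed (by DeepLIFT, or plain gradients as a rough stand-in) along with a per-position reference. For every position and every possible base b, it scores sum(m * (onehot(b) - reference)); at the observed base this reduces to the ordinary attribution summed over channels. The function name and shapes are illustrative, not an existing Captum API.

```python
import torch


def hypothetical_contributions(multipliers, reference):
    """Score every possible base at every position of a one-hot sequence.

    multipliers, reference: (L, C) tensors, C = 4 for DNA.
    Returns (L, C) where entry [l, b] = sum_c multipliers[l, c] * (onehot_b[c] - reference[l, c]).
    """
    num_channels = multipliers.shape[1]
    candidates = torch.eye(num_channels, dtype=multipliers.dtype)  # row b is the one-hot for base b
    diff = candidates.unsqueeze(0) - reference.unsqueeze(1)         # (L, C, C) indexed [l, b, c]
    return (diff * multipliers.unsqueeze(1)).sum(dim=-1)            # (L, C) indexed [l, b]
```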
Using Captum v0.1, so I'm not sure whether this happens with current master.
Something I have noticed when trying out DeepLIFT with CNNs is that reusing MaxPool2d layers, instead of explicitly defining one per usage, results in RuntimeErrors. Maybe this is related to #199.
For example, consider the CIFAR10 tutorial.
If we were to change the network structure to just reuse self.pool1 as follows:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        # self.pool2 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.relu1 = nn.ReLU()
        self.relu2 = nn.ReLU()
        self.relu3 = nn.ReLU()
        self.relu4 = nn.ReLU()

    def forward(self, x):
        x = self.pool1(self.relu1(self.conv1(x)))
        x = self.pool1(self.relu2(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = self.relu3(self.fc1(x))
        x = self.relu4(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
Training works just fine, but attributing with DeepLIFT fails with a size mismatch such as the following (unfortunately I can't download the dataset right now, so this was run on a local copy):
~\envs\lib\site-packages\captum\attr\_core\deep_lift.py in <genexpr>(.0)
282 """
283 delta_in = tuple(
--> 284 inp - inp_ref for inp, inp_ref in zip(module.input, module.input_ref)
285 )
286 delta_out = tuple(
RuntimeError: The size of tensor a (10) must match the size of tensor b (28) at non-singleton dimension 3
Is this a bug or a necessary convention? Note that reusing pooling layers actually occurs in official PyTorch tutorials.
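Why reuse can hurt, in simplified form: DeepLIFT-style implementations stash each module's inputs through hooks (the traceback reads module.input and module.input_ref), so a module called twice per forward pass keeps only one of its inputs. The shapes in the error fit this: with 32x32 CIFAR images, pool1's first call sees 28x28 feature maps and its second call sees 10x10 ones. The sketch below mimics that bookkeeping with a plain forward hook; it is an illustration, not Captum's code. Defining a separate nn.MaxPool2d per call site, as the commented-out self.pool2 line does, avoids the collision.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, 2)  # one module reused at two call sites
seen_shapes = []


def stash_input(module, inputs, output):
    # Simplified per-module bookkeeping: each call overwrites the previous one
    module.stashed_input = inputs[0]
    seen_shapes.append(tuple(inputs[0].shape))


pool.register_forward_hook(stash_input)

conv1 = nn.Conv2d(3, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)
x = torch.randn(1, 3, 32, 32)
h = pool(torch.relu(conv1(x)))  # pool sees 28x28 maps
h = pool(torch.relu(conv2(h)))  # pool sees 10x10 maps

# Only the second call's input survives on the module
print(seen_shapes, tuple(pool.stashed_input.shape))
```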
When I tried to install the latest version, I got the errors below.
error: can't copy 'captum/insights/frontend/widget/static/extension.js': doesn't exist or not a regular file
----------------------------------------
ERROR: Command errored out with exit status 1: /root//.pyenv/versions/3.7.4/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-xijz5fxd/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-xijz5fxd/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-lcp60r6o/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.
It seems to be caused by the wrong JS path, captum/insights/frontend/widget/static/extension.js.
Dear people of Captum,
I would like to add explainability to https://github.com/nicolas-chaulet/deeppointcloud-benchmarks
How complex would it be to extend Captum to support at least PyTorch Geometric (https://github.com/rusty1s/pytorch_geometric)?
Best,
Thomas Chaton
Hi, I am an undergrad student looking to apply Captum's implementation of DeepLift to a Graph Convolution Network.
Below is a snippet of the code in the forward function that is causing problems:
to_conv1d = batch_sortpooling_graphs.view((-1, 1, self.k * self.total_latent_dim))
conv1d_res = self.conv1d_params1(to_conv1d)
conv1d_res = self.conv1d_activation(conv1d_res)
conv1d_res = self.maxpool1d(conv1d_res)
conv1d_res = self.conv1d_params2(conv1d_res)
conv1d_res = self.conv1d_activation(conv1d_res)
to_dense = conv1d_res.view(len(graph_sizes), -1)
if self.output_dim > 0:
    out_linear = self.out_params(to_dense)
    reluact_fp = self.conv1d_activation(out_linear)
else:
    reluact_fp = to_dense
return self.conv1d_activation(reluact_fp)
As you can see, my code reshapes the tensors several times as they move from the input to the 1D convolution layer and finally to the dense layer. Running it as is gives me the following error:
Traceback (most recent call last):
File "main.py", line 625, in <module>
attribution = dl.attribute(input, additional_forward_args=[15], target=1)
File "/home/user/.local/lib/python3.6/site-packages/captum/attr/_core/deep_lift.py", line 202, in attribute
additional_forward_args=additional_forward_args,
File "/home/user/.local/lib/python3.6/site-packages/captum/attr/_utils/gradient.py", line 92, in compute_gradients
grads = torch.autograd.grad(torch.unbind(output), inputs)
File "/home/user/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 157, in grad
inputs, allow_unused)
File "/home/user/.local/lib/python3.6/site-packages/captum/attr/_core/deep_lift.py", line 284, in _backward_hook
inp - inp_ref for inp, inp_ref in zip(module.input, module.input_ref)
File "/home/user/.local/lib/python3.6/site-packages/captum/attr/_core/deep_lift.py", line 284, in <genexpr>
inp - inp_ref for inp, inp_ref in zip(module.input, module.input_ref)
RuntimeError: The size of tensor a (160) must match the size of tensor b (19) at non-singleton dimension 2
The shapes of each tensors are as follows:
batch_sortpooling_graphs: torch.Size([1, 19, 97])
conv1d_res (immediately after line 1): torch.Size([1, 1, 1843])
to_dense: torch.Size([1, 160])
Does anyone have an idea how to work around this so that DeepLift works with tensor reshapes? Thank you!
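A hedged observation: the snippet calls the same self.conv1d_activation module four times, and the traceback fails in the same module.input / module.input_ref subtraction as the MaxPool2d reuse report above, so module reuse rather than the reshapes may be the real problem. Below is a sketch of the same forward pass with one activation module per call site. The layer sizes (16 and 32 channels, kernel 97 with stride 97, then kernel 5) are hypothetical, picked only so that a 1x19x97 input yields the 160-dimensional to_dense vector from the report; num_graphs stands in for len(graph_sizes).

```python
import torch
import torch.nn as nn


class Conv1dHead(nn.Module):
    """Hypothetical re-creation of the snippet with one activation module per call site."""

    def __init__(self, k=19, total_latent_dim=97, output_dim=2):
        super().__init__()
        self.k, self.total_latent_dim, self.output_dim = k, total_latent_dim, output_dim
        self.conv1d_params1 = nn.Conv1d(1, 16, total_latent_dim, total_latent_dim)
        self.maxpool1d = nn.MaxPool1d(2, 2)
        self.conv1d_params2 = nn.Conv1d(16, 32, 5, 1)
        self.out_params = nn.Linear(160, output_dim)  # 32 channels * length 5
        # Separate instances instead of reusing a single self.conv1d_activation
        self.act1, self.act2, self.act3, self.act4 = (nn.ReLU() for _ in range(4))

    def forward(self, batch_sortpooling_graphs, num_graphs):
        to_conv1d = batch_sortpooling_graphs.view((-1, 1, self.k * self.total_latent_dim))
        conv1d_res = self.act1(self.conv1d_params1(to_conv1d))  # (N, 16, 19)
        conv1d_res = self.maxpool1d(conv1d_res)                 # (N, 16, 9)
        conv1d_res = self.act2(self.conv1d_params2(conv1d_res))  # (N, 32, 5)
        to_dense = conv1d_res.view(num_graphs, -1)              # (N, 160)
        if self.output_dim > 0:
            reluact_fp = self.act3(self.out_params(to_dense))
        else:
            reluact_fp = to_dense
        return self.act4(reluact_fp)
```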
Hi again,
I have a question about the _select_targets function, specifically as used in the DeepLift implementation. I figured out that the output passed into this function is the output of the last layer of the architecture. In my architecture, the last layer is a log_softmax. Sorry if this is a silly question, but should I return the predicted class (there are only 2 classes), the loss value, or the class probability of the target class as output?
Attached below is the code snippet for _select_targets for your reference.
def _select_targets(output, target):
    output = output[0]
    num_examples = output.shape[0]
    dims = len(output.shape)
    if target is None:
        return output
    elif isinstance(target, int) or isinstance(target, tuple):
        return _verify_select_column(output, target)
    elif isinstance(target, torch.Tensor):
        if torch.numel(target) == 1 and isinstance(target.item(), int):
            return _verify_select_column(output, target.item())
        elif len(target.shape) == 1 and torch.numel(target) == num_examples:
            assert dims == 2, "Output must be 2D to select tensor of targets."
            return torch.gather(output, 1, target.reshape(len(output), 1))
        else:
            raise AssertionError(
                "Tensor target dimension %r is not valid." % (target.shape,)
            )
    elif isinstance(target, list):
        assert len(target) == num_examples, "Target list length does not match output!"
        if type(target[0]) is int:
            assert dims == 2, "Output must be 2D to select tensor of targets."
            return torch.gather(output, 1, torch.tensor(target).reshape(len(output), 1))
        elif type(target[0]) is tuple:
            return torch.stack(
                [output[(i,) + targ_elem] for i, targ_elem in enumerate(target)]
            )
        else:
            raise AssertionError("Target element type in list is not valid.")
    else:
        raise AssertionError("Target type %r is not valid." % target)
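A hedged reading of the code above, not an official answer: the tensor- and list-target branches gather one column per example from a 2D output, so the forward output is expected to hold one score per class, shape (N, C), and target merely picks the column. With a log_softmax head that suggests returning the (N, 2) log-probabilities unchanged and passing the class of interest as target. Returning the predicted class would break attribution, because argmax yields an integer tensor with no gradient. The sketch reproduces the tensor-target gather on made-up values; select_target_column is an illustrative name.

```python
import torch


def select_target_column(output, target):
    # Same operation as the tensor-target branch of _select_targets above
    return torch.gather(output, 1, target.reshape(len(output), 1))


logits = torch.tensor([[2.0, 0.0], [0.0, 3.0]], requires_grad=True)
log_probs = torch.log_softmax(logits, dim=1)  # shape (N, 2): one score per class
picked = select_target_column(log_probs, torch.tensor([0, 1]))
predicted = torch.argmax(log_probs, dim=1)    # integer class ids, no gradient
```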
I was looking at the code in grad_cam.py for LayerGradCam.attribute() and noticed that no ReLU operation is applied after computing summed_grads * layer_eval here: https://github.com/pytorch/captum/blob/master/captum/attr/_core/layer/grad_cam.py#L177
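For reference, the original Grad-CAM formulation weights each channel of the layer activations by its spatially averaged gradient, sums over channels, and then applies a ReLU so that only regions pushing the class score up remain. The sketch below computes both variants from precomputed activations and gradients; grad_cam and its arguments are illustrative and do not reproduce LayerGradCam's signature.

```python
import torch
import torch.nn.functional as F


def grad_cam(layer_activations, layer_grads, apply_relu=True):
    # layer_activations, layer_grads: (N, C, H, W) for the chosen layer and target
    channel_weights = layer_grads.mean(dim=(2, 3), keepdim=True)          # (N, C, 1, 1)
    cam = (channel_weights * layer_activations).sum(dim=1, keepdim=True)  # (N, 1, H, W)
    # The final ReLU is the step from the Grad-CAM paper that the question is about
    return F.relu(cam) if apply_relu else cam
```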
The provided vision examples and documentation are excellent for single-class classification, but I am struggling to implement a multi-label use case.
For my use case, I use a single-channel image of a cell nucleus as input. The target is a tensor that describes whether or not the cell was positive for each of 22 different protein markers, e.g. tensor([0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1.,
0., 0., 0., 0.], dtype=torch.float64)
...that is, each cell can be positive for multiple markers, not only one. This is a simple multi-label classification task, where my model is the boilerplate torchvision.models.resnet18 with a custom final layer that accommodates the desired output.
I use the CIFAR vision example as a starting point as follows:
But I get AssertionError: Tensor target dimension torch.Size([22]) is not valid.
I see from the docstring for saliency.attribute that targets/outputs with more than two dimensions should be passed as tuples, but when I pass tuple(labels[ind]) instead, I get AssertionError: Cannot choose target column with output shape torch.Size([1, 22]).
Ideally, I'd like to set up an AttributionVisualizer that looks like the following mock-up:
...where I can click each element of the prediction (e.g. CK19) and see the corresponding attribution image for that marker.
Any chance that a multi-label classification example like this could be supplied?
Much thanks!
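One way to read the two assertion errors: the 22-element label vector is being passed as if it were a target, whereas Captum's target argument selects one output element per example. For multi-label outputs, a workaround is to attribute one marker at a time, looping over the positive indices and using each as an integer target. The sketch below does this with plain gradients (a hand-written saliency, not Captum's Saliency class) and a tiny hypothetical linear model standing in for the modified ResNet-18; positive_marker_indices and saliency_per_marker are illustrative names. With Captum, the analogous loop would pass target=idx to attribute on each iteration, and each resulting map could back one clickable marker in the visualizer mock-up.

```python
import torch
import torch.nn as nn


def positive_marker_indices(label):
    # label: (22,) multi-hot vector -> Python ints usable as one target per call
    return label.nonzero(as_tuple=True)[0].tolist()


def saliency_per_marker(model, image, label):
    maps = {}
    for idx in positive_marker_indices(label):
        x = image.clone().requires_grad_(True)
        score = model(x)[:, idx].sum()  # a single output column per call
        (grad,) = torch.autograd.grad(score, x)
        maps[idx] = grad.abs()          # plain-gradient saliency for this marker
    return maps
```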
Hi all,
I am using the Integrated Gradients (IG) implementation from Captum. I apply an LSTM to variable-length sequences, and then I try to get IG attributions from the trained model using the following line of code:
attr, delta = ig.attribute((data, seq_lengths), target=1, return_convergence_delta=True)
but I am getting the following error:
RuntimeError: lengths array must be sorted in decreasing order when enforce_sorted is True. You can pass enforce_sorted=False to pack_padded_sequence and/or pack_sequence to sidestep this requirement if you do not need ONNX exportability.
However, I have sorted the lengths in each batch in decreasing order.
Please note that if I use IG without pack_padded_sequence, it works perfectly.
Regarding the previous error, I set enforce_sorted=False in pack_padded_sequence, but now I get another error:
RuntimeError: Length of all samples has to be greater than 0, but found an element in 'lengths' that is <= 0
Here are the lengths of all the samples; none of them is less than or equal to zero:
tensor([23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 20,
14, 10])
Any help would be much appreciated.
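A plausible explanation, offered tentatively: integrated gradients interpolates every tensor in its inputs tuple between the baseline and the actual input, so seq_lengths is scaled too. Near a zero baseline the lengths round down to 0 and are no longer sorted, which would produce both errors even though the real lengths are fine. Passing the lengths through additional_forward_args instead, e.g. ig.attribute(data, additional_forward_args=(seq_lengths,), target=1), keeps them out of the interpolation; since Captum repeats those arguments across the interpolation steps, enforce_sorted=False is still needed. The sketch uses a hypothetical LSTMClassifier whose forward takes the lengths as a separate argument; the .cpu().long() conversion also addresses the "'lengths' argument should be a 1D CPU int64 tensor" error reported in a later issue.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class LSTMClassifier(nn.Module):
    # Hypothetical model: lengths arrive as a separate, non-attributed argument
    def __init__(self, in_dim=3, hidden=8, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, data, seq_lengths):
        packed = pack_padded_sequence(
            data, seq_lengths.cpu().long(), batch_first=True, enforce_sorted=False
        )
        _, (h_n, _) = self.lstm(packed)  # h_n: final hidden state of each sequence
        return self.fc(h_n[-1])


def interpolate(x, baseline, alpha):
    # What a straight-line IG path does to every tensor passed in `inputs`
    return baseline + alpha * (x - baseline)
```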
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
in
13
14 from captum.attr import visualization as viz
---> 15 from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
16 from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer
ImportError: cannot import name 'LayerIntegratedGradients'
Detectron2 has been released https://github.com/facebookresearch/detectron2. It would be very nice to see an example using the default models there.
Hello, I was experimenting with Captum and I was wondering if there was any way to trace/script an attribution model in order to just obtain the final heatmap as output of the serialized file.
I did not find any reference in the documentation nor in the code, and did not manage to integrate it myself by creating intermediate classes to, for example, wrap the Saliency class in a torch.nn.Module one.
Is there something I am missing / is it in the future plans?
I am trying to compute layer conductance in the IMDB tutorial, and I keep getting a scalar-related error. How should I pass the input (test_input_tensor) to get the attributions?
cond = LayerConductance(model, model.convs)
cond_vals = cond.attribute(test_input_tensor,target=1)
Thank you!
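A likely issue, offered tentatively: in the IMDB tutorial model.convs appears to be an nn.ModuleList, a container that is never called itself, so hooks registered on it never see any activations. Layer attribution methods generally need a module that is actually invoked during the forward pass, such as a single branch like model.convs[0]. The sketch below shows the difference with forward hooks on a hypothetical TinyCNN; it does not call Captum.

```python
import torch
import torch.nn as nn


class TinyCNN(nn.Module):
    # Hypothetical stand-in for a text CNN with several parallel conv branches
    def __init__(self):
        super().__init__()
        self.convs = nn.ModuleList([nn.Conv2d(1, 4, (k, 5)) for k in (3, 4, 5)])

    def forward(self, x):
        # Each branch is called individually; the ModuleList container never is
        return torch.cat([conv(x).amax(dim=(2, 3)) for conv in self.convs], dim=1)


fired = {"convs": 0, "convs[0]": 0}
model = TinyCNN()
model.convs.register_forward_hook(lambda m, i, o: fired.__setitem__("convs", fired["convs"] + 1))
model.convs[0].register_forward_hook(lambda m, i, o: fired.__setitem__("convs[0]", fired["convs[0]"] + 1))
out = model(torch.randn(2, 1, 20, 5))
```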
Hi, I am trying to interpret my intent classification model using your IMDB tutorial, and I am facing the following error: "RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor". This error is raised during the forward pass of an RNN (LSTM) that takes a packed sequence (from pack_padded_sequence) as input.
Hi all,
Thank you so much for the invitation to captum. Very grateful to all of you for putting this together! I had a quick question regarding the documentation. Currently, in the arguments description for DeepLiftShap, it says "The first dimension in baseline tensors defines the distribution from which we randomly draw samples" (
captum/captum/attr/_core/deep_lift.py
Line 313 in 6447777
Thanks,
Avanti
Whichever saliency method I try, I get this error:
AttributeError: 'AvgPool2d' object has no attribute 'divisor_override'
I do not understand why that happens.
It looks like additional_forward_args are passed to the target parameter:
Line 227 in d65be28
captum/captum/attr/_utils/common.py
Line 214 in d65be28
Hi folks,
Is there a proper .bib format available for Captum for the purposes of citation in research papers?
Thanks!
Hello,
Can I use this library for object localisation tasks? Could you prepare a very simple tutorial for this? I bet it would be very helpful for many people, since labelling images with bounding boxes or polygons is really time-consuming, as you know.
I am getting an import error for Occlusion when running the "Interpreting vision for ResNet" tutorial.
Error details
ImportError Traceback (most recent call last)
in
15 from captum.attr import IntegratedGradients
16 from captum.attr import GradientShap
---> 17 from captum.attr import Occlusion
18 from captum.attr import NoiseTunnel
19 from captum.attr import visualization as viz
ImportError: cannot import name 'Occlusion' from 'captum.attr' (/home/ubuntu/opt/anaconda3/envs/pytorch/lib/python3.7/site-packages/captum/attr/__init__.py)
When I try to run Captum Insights from a SageMaker notebook terminal on port 6006 and browse to <sagemaker_notebook_address>/proxy/6006/, the tab name shows "Captum Insights", but the web page is blank. The same method works fine on my local system, and tensorboard/flask apps work fine through SageMaker. It seems to be a problem with Captum+SageMaker specifically.
Alternatively, when attempting to run tutorials/CIFAR_TorchVision_Captum_Insights.ipynb, I get this error from within a notebook:
(I get the same error with visualizer.render(), just with fewer details.)
I upgraded my SageMaker pytorch_p36 conda environment to torch==1.3.0. I installed Captum from source with git clone https://github.com/pytorch/captum.git and then installed Insights with:
conda install -c conda-forge yarn
BUILD_INSIGHTS=1 python setup.py develop
Then I ran the example with python captum/insights/example.py and tried to access it via <sagemaker_notebook_address>/proxy/6006/ (the same way I access a running tensorboard server).
I also tried it with and without modifying line 66 in insights/server.py from tcp.bind(("", 0)) to tcp.bind(("", 6006)), in order to use port 6006 (since this port seemed to work fine for running a tensorboard server).
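Some context on the `server.py` change described in this report may help. Binding to port `0` asks the operating system for any free ephemeral port, so the server can end up on a different port on each run. A proxy path such as `/proxy/6006/`, by contrast, only reaches port 6006. A minimal stdlib sketch of the two behaviours follows. The helper name `bind_port` is hypothetical and not part of Captum.

```python
import socket


def bind_port(port: int = 0) -> int:
    """Bind a TCP socket to `port` on all interfaces and return the port actually used.

    port=0 lets the OS choose a free ephemeral port (what tcp.bind(("", 0)) does);
    a fixed port such as 6006 is needed if a proxy forwards only that port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp:
        tcp.bind(("", port))
        return tcp.getsockname()[1]


print(bind_port())  # some free port chosen by the OS, usually different between runs
```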
Hello,
firstly, thank you for this outstanding library!
I was wondering whether direct support for Layer-Wise Relevance Propagation or Deep Taylor Decomposition in general is planned or being worked on.
Since these methods are rather well known, surely they would make a good addition. Standalone/Keras implementations are available on GitHub, too.
Thank you for considering this feature request!
Best,
Eric
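For readers unfamiliar with the request: LRP redistributes an output relevance backwards, layer by layer. The following is a minimal sketch of the epsilon rule for a single fully connected layer, with weights laid out as in `nn.Linear` and written in plain PyTorch. The function name and the `eps` value are illustrative assumptions, not a Captum API.

```python
import torch


def lrp_epsilon_linear(a, weight, bias, relevance_out, eps=1e-6):
    """LRP epsilon rule for one linear layer (illustrative sketch).

    a:             input activations, shape (in_features,)
    weight, bias:  layer parameters, shapes (out_features, in_features) and (out_features,)
    relevance_out: relevance of each output neuron, shape (out_features,)
    Returns the relevance of each input, shape (in_features,).
    """
    z = weight @ a + bias                      # pre-activations z_j
    sign = torch.where(z >= 0, torch.ones_like(z), -torch.ones_like(z))
    s = relevance_out / (z + eps * sign)       # stabilised ratio R_j / z_j
    return a * (weight.t() @ s)                # R_i = a_i * sum_j w_ji * s_j


a = torch.tensor([1.0, 2.0, 3.0])
weight = torch.tensor([[0.5, -1.0, 2.0], [1.0, 1.0, 1.0]])
bias = torch.zeros(2)
z = weight @ a                                 # pre-activations 4.5 and 6.0
relevance_in = lrp_epsilon_linear(a, weight, bias, relevance_out=z)
print(relevance_in.sum())                      # approximately 10.5 = z.sum()
```

With zero bias, the relevance entering the layer approximately equals the relevance leaving it, which is the conservation property LRP relies on.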
I was trying to reproduce the "Interpreting text models: IMDB Sentiment Analysis" tutorial, but training my own model instead of just loading a pretrained one.
I adapted the code of the original CNN tutorial, but when I get to the point of calling interpret_sentence, the following error occurs:
RuntimeError Traceback (most recent call last)
<ipython-input-23-68d49a3d040b> in <module>()
----> 1 interpret_sentence(model, 'It was a fantastic performance !', label=1)
2 interpret_sentence(model, 'Best film ever', label=1)
3 interpret_sentence(model, 'Such a great show!', label=1)
4 interpret_sentence(model, 'It was a horrible movie', label=0)
5 interpret_sentence(model, 'I\'ve never watched something as bad', label=0)
<ipython-input-22-cbf5d478566f> in interpret_sentence(model, sentence, min_len, label)
29 # compute attributions and approximation delta using integrated gradients
30 attributions_ig, delta = ig.attribute(
---> 31 input_embedding, reference_embedding, n_steps=500, return_convergence_delta=True
32 )
33
/usr/local/lib/python3.6/dist-packages/captum/attr/_core/integrated_gradients.py in attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, internal_batch_size, return_convergence_delta)
232 end_point,
233 additional_forward_args=additional_forward_args,
--> 234 target=target,
235 )
236 return _format_attributions(is_inputs_tuple, attributions), delta
/usr/local/lib/python3.6/dist-packages/captum/attr/_utils/attribution.py in compute_convergence_delta(self, attributions, start_point, end_point, target, additional_forward_args)
232 row_sums = [_sum_rows(attribution) for attribution in attributions]
233 attr_sum = torch.tensor([sum(row_sum) for row_sum in zip(*row_sums)])
--> 234 return attr_sum - (end_point - start_point)
235
236
RuntimeError: expected device cpu but got device cuda:0
I am not sure, but I suppose the problem is that torch.tensor is created without any device argument. Is there a way to work around this issue?
The error can be reproduced in this Colab notebook.
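The traceback supports that guess. `attr_sum` is built with `torch.tensor(...)`, which places it on the CPU, and it is then subtracted from model outputs that live on `cuda:0`. Passing `return_convergence_delta=False` (or running on the CPU) avoids the failing call. If you still want the delta, you can compute it yourself while keeping every tensor on the attributions' device. The following is a minimal sketch of that approach. The helper `convergence_delta` is hypothetical, and it assumes `attributions` is a tuple of tensors with the batch on dimension 0 and that you already have the target outputs at the baseline and at the input.

```python
import torch


def convergence_delta(attributions, start_output, end_output):
    """Completeness check for path attributions: sum(attributions) - (f(x) - f(baseline)).

    attributions: tuple of tensors, each with the batch on dim 0
    start_output / end_output: target outputs at baseline / input, shape (batch,)
    """
    # Sum each attribution tensor per example, then across input tensors;
    # these reductions keep the result on the attributions' device.
    per_input = [attr.reshape(attr.shape[0], -1).sum(dim=1) for attr in attributions]
    attr_sum = torch.stack(per_input).sum(dim=0)
    return attr_sum - (end_output - start_output).to(attr_sum.device)


attrs = (torch.full((2, 3), 0.5), torch.ones(2, 1))
delta = convergence_delta(attrs, start_output=torch.zeros(2), end_output=torch.tensor([2.5, 2.0]))
print(delta)  # tensor([0.0000, 0.5000])
```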
When I built a Python wheel package for Captum with the following command:
BUILD_INSIGHTS=1 python setup.py bdist_wheel --python-tag py3
I got an error message:
error: can't copy 'captum/insights/frontend/widget/static/extension.js': doesn't exist or not a regular file
I found some errors in the setup.py file: the paths for extension.js, index.js and index.js.map were not correct.
One solution is the following:
diff --git a/setup.py b/setup.py
index 87f5068..ee0a379 100755
--- a/setup.py
+++ b/setup.py
@@ -150,9 +150,9 @@ if __name__ == "__main__":
(
"share/jupyter/nbextensions/jupyter-captum-insights",
[
- "captum/insights/frontend/widget/static/extension.js",
- "captum/insights/frontend/widget/static/index.js",
- "captum/insights/frontend/widget/static/index.js.map",
+ "captum/insights/widget/static/extension.js",
+ "captum/insights/widget/static/index.js",
+ "captum/insights/widget/static/index.js.map",
],
),
(
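The tuple in the diff has the shape of a setuptools `data_files` entry (target directory, list of source files), and the build stops as soon as one of those sources is missing. A small stdlib sketch lists which configured files are absent before you run `bdist_wheel`, which makes this kind of path mismatch easy to spot. The helper name is hypothetical.

```python
import os


def missing_data_files(data_files):
    """Return every source path in a setuptools-style data_files list that is not a regular file."""
    missing = []
    for _target_dir, sources in data_files:
        missing.extend(path for path in sources if not os.path.isfile(path))
    return missing


data_files = [
    (
        "share/jupyter/nbextensions/jupyter-captum-insights",
        [
            "captum/insights/widget/static/extension.js",
            "captum/insights/widget/static/index.js",
            "captum/insights/widget/static/index.js.map",
        ],
    ),
]
for path in missing_data_files(data_files):
    print("missing:", path)
```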
Hi,
I am currently integrating Captum into my deep learning toolkit; thanks for providing this library.
When I try to run IntegratedGradients on a standard densenet201 model on a CUDA device (11 GB VRAM), I get an out-of-memory error even for a single input image.
Just a quick check: Is this normal behaviour?
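Some memory growth is expected. Integrated gradients evaluates the model on `n_steps` interpolated copies of the input and back-propagates through all of them, so one image effectively becomes a batch of `n_steps` images. Captum's `IntegratedGradients.attribute` accepts an `internal_batch_size` argument for splitting those steps into smaller chunks. The following is a minimal plain-PyTorch sketch of the same idea. It uses a midpoint Riemann sum rather than the integration method Captum uses by default, and it shows why chunking bounds memory: only `internal_batch_size` scaled copies are alive at once. The function name is hypothetical.

```python
import torch
import torch.nn as nn


def integrated_gradients_chunked(model, x, baseline, target, n_steps=50, internal_batch_size=10):
    """Integrated gradients for one example x of shape (1, ...), evaluating at most
    internal_batch_size interpolation points per forward/backward pass."""
    # Midpoint rule: alphas at the centres of n_steps equal intervals in [0, 1].
    alphas = (torch.arange(n_steps, dtype=x.dtype, device=x.device) + 0.5) / n_steps
    grad_sum = torch.zeros_like(x)
    for start in range(0, n_steps, internal_batch_size):
        chunk = alphas[start:start + internal_batch_size]
        # Reshape alphas to broadcast over every non-batch dimension of x.
        chunk = chunk.view(-1, *([1] * (x.dim() - 1)))
        scaled = (baseline + chunk * (x - baseline)).detach().requires_grad_(True)
        output = model(scaled)[:, target].sum()
        (grads,) = torch.autograd.grad(output, scaled)
        grad_sum += grads.sum(dim=0, keepdim=True)
    # Average gradient along the path, times the input-baseline difference.
    return (x - baseline) * grad_sum / n_steps


model = nn.Linear(3, 2)
x = torch.tensor([[1.0, 2.0, 3.0]])
attributions = integrated_gradients_chunked(
    model, x, torch.zeros_like(x), target=1, n_steps=7, internal_batch_size=3
)
```

For a linear model, the result is exactly `(x - baseline) * weight[target]`, which makes the sketch easy to check. For an image model, a smaller `internal_batch_size` trades speed for a lower peak memory.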
Can I use Captum for semantic segmentation tasks?
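Attribution methods need one scalar per example and target to explain, while a segmentation network returns a score per class per pixel. One common workaround, sketched here as an assumption rather than an official recipe, wraps the model so it sums each class's logits over the spatial dimensions, or over a region of interest given as a mask. The wrapper's `(N, num_classes)` output can then be explained per class with a `target` index. `SegmentationSumWrapper` is a hypothetical name.

```python
import torch
import torch.nn as nn


class SegmentationSumWrapper(nn.Module):
    """Turn per-pixel class scores (N, C, H, W) into per-class scores (N, C)."""

    def __init__(self, seg_model, mask=None):
        super().__init__()
        self.seg_model = seg_model
        # Optional (H, W) mask restricting the sum to a region of interest.
        self.mask = mask

    def forward(self, x):
        scores = self.seg_model(x)                 # (N, C, H, W)
        if self.mask is not None:
            scores = scores * self.mask
        return scores.sum(dim=(2, 3))              # (N, C)


# Toy "segmentation model": a 1x1 convolution producing 5 class maps.
seg_model = nn.Conv2d(3, 5, kernel_size=1)
wrapped = SegmentationSumWrapper(seg_model)
print(wrapped(torch.randn(2, 3, 16, 16)).shape)    # torch.Size([2, 5])
```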
For the toy example from the README, running on CUDA:
import torch
from captum.attr import IntegratedGradients
# ToyModel as defined in the Captum README
model = ToyModel()
model = model.cuda()
model.eval()
input = torch.rand(2, 3).cuda()
baseline = torch.zeros(2, 3).cuda()
ig = IntegratedGradients(model)
attributions, delta = ig.attribute(input, baseline, target=0, return_convergence_delta=True)
fails with the error
~/anaconda3/envs/heterokaryon/lib/python3.7/site-packages/captum/attr/_utils/attribution.py in compute_convergence_delta(self, attributions, start_point, end_point, target, additional_forward_args)
232 row_sums = [_sum_rows(attribution) for attribution in attributions]
233 attr_sum = torch.tensor([sum(row_sum) for row_sum in zip(*row_sums)])
--> 234 return attr_sum - (end_point - start_point)
235
236
RuntimeError: expected device cpu and dtype Float but got device cuda:0 and dtype Float
presumably because attr_sum is not on the GPU. Setting return_convergence_delta to False results in no error.
Similar issues may arise in other places, though I haven't checked.
I am working with a number of models from the torchreid library. When I use DeepLift on these models, some work and some do not. For example, the DenseNet, MLFN, and MuDeep models work fine, but the OSNet, ResNetMid, and ResNet-50 models (and some others) do not. (N.B. I modified the models to not use inplace=True for nn.ReLU().)
The models that fail usually fail with an error along the lines of 'Sigmoid' object has no attribute 'input' (it also fails for the same reason if ReLU is used), but I can't see what I need to change in these models for them to work with DeepLift.
What is different about these models that causes this error? I understand the error message, but I don't understand why the module doesn't have an input attribute.
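One possible cause worth checking is an assumption, not something this report confirms. Captum's DeepLift works through hooks on the model's nonlinear modules, and a single module instance that is called more than once per forward pass is a known source of trouble for it. An example is one `nn.ReLU` shared by several layers; torchvision's ResNet bottleneck block, for instance, reuses `self.relu`. A minimal sketch counts how often each leaf module runs during one forward pass. The helper `find_reused_modules` is hypothetical.

```python
from collections import Counter

import torch
import torch.nn as nn


def find_reused_modules(model, example_input):
    """Return {module_name: call_count} for leaf modules called more than once in one forward pass."""
    counts = Counter()
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            counts[name] += 1
        return hook

    # named_modules() lists a shared module only once, under its first name.
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(example_input)
    finally:
        for handle in handles:
            handle.remove()
    return {name: n for name, n in counts.items() if n > 1}


class SharedReLUNet(nn.Module):
    """Toy model that reuses one ReLU instance twice."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 4)
        self.fc2 = nn.Linear(4, 2)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.fc2(self.relu(self.fc1(x))))


print(find_reused_modules(SharedReLUNet(), torch.randn(1, 4)))  # {'relu': 2}
```

If a module shows up here, giving each use its own instance (for example separate `nn.ReLU()` or `nn.Sigmoid()` objects) is the usual fix to try.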
The hyperlink "Getting Started with Captum" on this page is pointing to pytorch.org rather than the correct tutorial page.
I cannot build and launch Captum Insights on Linux Ubuntu 18.04 (inside a VirtualBox VM):
(captum) elena@elena-VirtualBox:~/eStep/XAI/Software/captum$ conda install -c conda-forge yarn
Collecting package metadata (repodata.json): done
Solving environment: done
All requested packages already installed.
(captum) elena@elena-VirtualBox:~/eStep/XAI/Software/captum$ BUILD_INSIGHTS=1 python setup.py develop
-- Building version 0.2.0
-- Building Captum Insights
Running: ./scripts/build_insights.sh
~/eStep/XAI/Software/captum/captum/insights/frontend ~/eStep/XAI/Software/captum
Install Dependencies
yarn install v1.22.0
[1/4] Resolving packages...
[2/4] Fetching packages...
info [email protected]: The platform "linux" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
info [email protected]: The platform "linux" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
[3/4] Linking dependencies...
warning " > @babel/[email protected]" has unmet peer dependency "@babel/core@^7.0.0-0".
warning "@babel/plugin-proposal-class-properties > @babel/[email protected]" has unmet peer dependency "@babel/core@^7.0.0".
warning " > [email protected]" has unmet peer dependency "@babel/core@^7.0.0".
warning " > [email protected]" has unmet peer dependency "webpack@>=2".
warning "react-scripts > @typescript-eslint/eslint-plugin > [email protected]" has unmet peer dependency "typescript@>=2.8.0 || >= 3.2.0-dev || >= 3.3.0-dev || >= 3.4.0-dev || >= 3.5.0-dev || >= 3.6.0-dev || >= 3.6.0-beta || >= 3.7.0-dev || >= 3.7.0-beta".
warning " > [email protected]" has unmet peer dependency "prop-types@^15.0.0".
warning " > [email protected]" has unmet peer dependency "[email protected]".
error An unexpected error occurred: "EPERM: operation not permitted, symlink '../../../parser/bin/babel-parser.js' -> '/home/elena/eStep/XAI/Software/captum/captum/insights/frontend/node_modules/@babel/core/node_modules/.bin/parser'".
info If you think this is a bug, please open a bug report with the information provided in "/home/elena/eStep/XAI/Software/captum/captum/insights/frontend/yarn-error.log".
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.
Traceback (most recent call last):
File "setup.py", line 105, in <module>
build_insights()
File "setup.py", line 88, in build_insights
subprocess.check_call(command)
File "/home/elena/anaconda3/envs/captum/lib/python3.7/subprocess.py", line 347, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command './scripts/build_insights.sh' returned non-zero exit status 1.
(captum) elena@elena-VirtualBox:~/eStep/XAI/Software/captum$
Hi,
I was trying to use captum.attr._core.layer_activation.LayerActivation to get the activation of the first convolutional layer in a simple model. Here is my code:
import numpy as np
import torch
import torch.nn as nn
from captum.attr import LayerActivation
torch.manual_seed(23)
np.random.seed(23)
model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
nn.ReLU(inplace=True),
nn.Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
nn.ReLU(inplace=True))
layer_act = LayerActivation(model, model[0])
input = torch.randn(1, 3, 5, 5)
mylayer = model[0]
# Compare the layer's direct output with the activation captured by Captum
print(torch.norm(mylayer(input) - layer_act.attribute(input), p=2))
I computed the activation in two different ways and compared them. I expected a value close to zero to be printed; however, this is what I got:
tensor(3.4646, grad_fn=<NormBackward0>)
I hypothesize that the in-place ReLU layer after the convolutional layer acts on the convolution's output, since there were many zeros in the activation computed by Captum (i.e. layer_act.attribute(input)). Indeed, when I changed the architecture of the network to the following:
model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
nn.ReLU(),
nn.Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
nn.ReLU(inplace=True))
then the outputs matched.
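The hypothesis is consistent with how forward hooks work. A hook receives the layer's output tensor itself, and a following `ReLU(inplace=True)` then overwrites that same tensor, so anything a hook-based method keeps without copying sees the clipped values. A minimal sketch switches every `nn.ReLU` in a model to out-of-place mode. It checks the effect with a plain forward hook rather than Captum, and both helper names are hypothetical.

```python
import torch
import torch.nn as nn


def disable_inplace_relu(model: nn.Module) -> nn.Module:
    """Make every nn.ReLU in the model compute out of place."""
    for module in model.modules():
        if isinstance(module, nn.ReLU):
            module.inplace = False
    return model


def hooked_output(model, layer, x):
    """Run the model and return the tensor a forward hook on `layer` received."""
    captured = []
    handle = layer.register_forward_hook(lambda mod, inp, out: captured.append(out))
    try:
        model(x)
    finally:
        handle.remove()
    return captured[0]


torch.manual_seed(23)
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(4, 4, kernel_size=3, padding=1), nn.ReLU(inplace=True),
)
x = torch.randn(1, 3, 5, 5)
with torch.no_grad():
    expected = model[0](x)
    before = hooked_output(model, model[0], x)  # clobbered by the in-place ReLU
    after = hooked_output(disable_inplace_relu(model), model[0], x)
print(torch.allclose(before, expected), torch.allclose(after, expected))  # False True
```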
System information
Hi there,
I tried to apply the Captum tutorial for BERT question answering to a BERT sentence classification task, but I am having difficulty adapting the baselines/references part of the code for classification and for the new HuggingFace tokenizer.
I just want to check whether someone else is working on the same topic, so we can share experiences.
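For a single-sentence classifier, the reference built by the SQuAD tutorial's `construct_input_ref_pair` reduces to one segment. You keep `[CLS]` and `[SEP]` and replace every real token with a reference token, typically `[PAD]`. The following minimal sketch works on already-tokenized ids, so it does not depend on a specific tokenizer version. The function name `construct_classification_ref_pair` and the ids in the example are hypothetical.

```python
from typing import List, Tuple


def construct_classification_ref_pair(
    token_ids: List[int], ref_token_id: int, sep_token_id: int, cls_token_id: int
) -> Tuple[List[int], List[int]]:
    """Build BERT-style input ids and a same-length reference (baseline) sequence
    for a single-segment classification input."""
    input_ids = [cls_token_id] + token_ids + [sep_token_id]
    # Special tokens stay in place; every content token becomes the reference token.
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(token_ids) + [sep_token_id]
    return input_ids, ref_input_ids


# Hypothetical ids: 7, 8, 9 are content tokens; 0 = [PAD], 2 = [SEP], 1 = [CLS].
print(construct_classification_ref_pair([7, 8, 9], ref_token_id=0, sep_token_id=2, cls_token_id=1))
# ([1, 7, 8, 9, 2], [1, 0, 0, 0, 2])
```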
Commit : c7583e3
File : tutorials/CIFAR_TorchVision_Interpret.ipynb
Cell : 12, Line : 3
When : passing NoiseTunnel() as algorithm into attribute_image_features()
TypeError : attribute() got an unexpected keyword argument 'noise_frac'
In the README:
"Next we will use IntegratedGradients algorithms to assign attribution scores to each input feature with respect to the second target output."
But then target=0 is set; shouldn't that be the first target output?
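For reference, an integer `target` in Captum indexes the output's second dimension using the usual 0-based convention, so `target=0` picks the first output column. Here is a plain-PyTorch illustration of that selection, with made-up output values:

```python
import torch

outputs = torch.tensor([[0.2, 0.8], [0.6, 0.4]])  # (batch, num_outputs), made-up values
target = 0
selected = outputs[:, target]   # what target=0 selects: the first output for every example
print(selected)                 # tensor([0.2000, 0.6000])
```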
Hello,
Kudos for the great work. I believe this has great potential.
I wonder what is on your roadmap, especially regarding perturbation-based attribution methods (Occlusion, LIME/KernelSHAP, Shapley Value sampling, etc.).
Are these planned at all? While being orders of magnitude slower, these methods have the advantage that they can be applied to any black-box model (i.e. any network architecture is supported out of the box, with no need to instrument layers or implement custom modules). Implementing them in Captum should be easier too. Moreover, Shapley Value attributions have unique theoretical properties that might be important when speed is not critical.
While it makes sense to focus on gradient-based methods first, maybe the structure of the library should be such that these methods can be easily added in the future.
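To make the idea concrete: Shapley value sampling treats the model as a black box. For random feature orderings, each feature is switched from a baseline value to its actual value, and its marginal change in the output is averaged. The following minimal stdlib sketch assumes a function that maps a list of feature values to a scalar. `shapley_sampling` is a hypothetical name, not a Captum API.

```python
import random
from typing import Callable, List, Sequence


def shapley_sampling(
    f: Callable[[List[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Monte Carlo estimate of the Shapley values of f at x relative to baseline."""
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)
        prev_value = f(current)
        # Add features one at a time in random order; credit each with its marginal gain.
        for i in order:
            current[i] = x[i]
            value = f(current)
            phi[i] += value - prev_value
            prev_value = value
    return [p / n_permutations for p in phi]


def model(v):
    # Black-box toy model with an interaction between features 0 and 1.
    return 2.0 * v[0] + v[1] + 3.0 * v[0] * v[1]


phi = shapley_sampling(model, x=[1.0, 1.0, 5.0], baseline=[0.0, 0.0, 0.0])
print(phi)  # close to the exact values [3.5, 2.5, 0.0]
```

Within every permutation the credits telescope to `f(x) - f(baseline)`, so the estimates always sum to that difference. The individual values converge as `n_permutations` grows.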
Hi all,
I am wondering whether there are examples I could learn from for using Captum on a regression problem with volumetric data. In my setting, a WxHxD (64x64x64) volume is fed to a 3D convnet that has only one neuron in the top layer, which outputs a real number. Thanks.
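For a single-output regression network, the quantity to explain is that one real number. With Captum, the usual choice is therefore `target=0` when the output has shape `(N, 1)`, or no target when it has shape `(N,)`. Volumetric input needs no special handling for gradient-based methods, since the attributions simply have the input's shape. The following plain-PyTorch sketch computes vanilla gradient saliency for a hypothetical 3D convnet. A smaller 16x16x16 volume keeps it quick; the same code runs on 64x64x64.

```python
import torch
import torch.nn as nn

# Hypothetical 3D convnet ending in a single regression output.
model = nn.Sequential(
    nn.Conv3d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
    nn.Linear(4, 1),
)
model.eval()

volume = torch.randn(1, 1, 16, 16, 16, requires_grad=True)  # (N, channels, D, H, W)
prediction = model(volume)                                  # shape (1, 1)
# Gradient of the single output (column 0, i.e. target=0) w.r.t. every voxel.
prediction[:, 0].sum().backward()
saliency = volume.grad.abs()                                # same shape as the volume
print(saliency.shape)  # torch.Size([1, 1, 16, 16, 16])
```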
captum/captum/attr/_core/gradient_shap.py, lines 16 to 35 in 5231c2e
According to the docs, the baselines parameter of the attribute method of GradientShap is optional and is replaced with a zero-filled tensor of the same size as the input if not provided. However, at the moment it is a required argument.
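As a workaround, passing an explicit zero baseline such as `torch.zeros_like(input)` should reproduce the documented default. For intuition about what the method computes, a simplified plain-PyTorch sketch of the GradientShap idea follows. It samples a baseline and adds Gaussian noise to the input. It then picks a random point on the straight line between them and averages gradient x (noisy input - baseline). `gradient_shap_sketch` is a hypothetical helper, not Captum's implementation.

```python
import torch
import torch.nn as nn


def gradient_shap_sketch(model, x, baselines=None, target=0, n_samples=20, stdev=0.0, seed=0):
    """Simplified GradientShap for inputs x of shape (N, ...).

    baselines: tensor (B, ...) whose rows form the baseline distribution;
    defaults to a single all-zero baseline, mirroring the documented default.
    """
    gen = torch.Generator().manual_seed(seed)
    if baselines is None:
        baselines = torch.zeros_like(x[:1])
    total = torch.zeros_like(x)
    for _ in range(n_samples):
        idx = torch.randint(len(baselines), (x.shape[0],), generator=gen)
        base = baselines[idx]                                   # one random baseline per example
        noisy = x + stdev * torch.randn(x.shape, generator=gen)
        alpha = torch.rand((x.shape[0],) + (1,) * (x.dim() - 1), generator=gen)
        point = (base + alpha * (noisy - base)).detach().requires_grad_(True)
        (grads,) = torch.autograd.grad(model(point)[:, target].sum(), point)
        total += grads * (noisy - base)
    return total / n_samples


model = nn.Linear(3, 2)
x = torch.tensor([[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]])
attributions = gradient_shap_sketch(model, x, target=1)
```

For a linear model without noise, the gradient is constant, so the result equals `x * weight[target]` for the zero baseline.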
Hi,
Thanks for the great work. The LSTM tutorial looks very nice.
Are there any suggestions on how to use Captum with Transformer-based / BERT-like pre-trained contextualized word embeddings? If I want to see the attribution of each token in the word embedding layer, would I also need the FFN layer fine-tuned on a downstream task in order to get the gradients? The current code is implemented with torch/text; I would really appreciate some hints on how to integrate it with BERT models (e.g. huggingface/transformers).
Thank you.
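Once attributions are computed at the embedding layer, each token has one attribution per hidden dimension. A common way to get one score per token is to sum over the hidden dimension and normalize, similar to the `summarize_attributions` helper in the Bert_SQUAD_Interpret tutorial. The following plain-PyTorch sketch assumes the shapes given in its comments, and the function name is hypothetical.

```python
import torch


def summarize_token_attributions(attributions: torch.Tensor) -> torch.Tensor:
    """Collapse (batch=1, seq_len, hidden) embedding attributions to one
    L2-normalised score per token, shape (seq_len,)."""
    per_token = attributions.sum(dim=-1).squeeze(0)   # sum over the hidden dimension
    return per_token / torch.norm(per_token)


attributions = torch.randn(1, 6, 768)   # e.g. 6 tokens, BERT-base hidden size
scores = summarize_token_attributions(attributions)
print(scores.shape)  # torch.Size([6])
```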