
pytorch / captum


Model interpretability and understanding for PyTorch

Home Page: https://captum.ai

License: BSD 3-Clause "New" or "Revised" License

Python 94.92% Shell 0.56% JavaScript 1.70% Makefile 0.02% Batchfile 0.03% CSS 1.53% HTML 0.01% TypeScript 1.21%
interpretability interpretable-ai interpretable-ml feature-importance feature-attribution

captum's People

Contributors

99warriors, agaction, amyreese, aobo-y, bilalsal, caraya10, cicichen01, crawlingcub, cyrjano, diegoolano, dkrako, edward-io, gabrieltseng, j0nreynolds, jessijzhao, miguelmartin75, mruberry, nanohanno, narinek, orionr, pingjunchen, progamergov, reubend, shubhammuttepawar, shuwenw, stanislavglebik, thatch, vivekmig, yucu, zpao


captum's Issues

Not able to Load Vectors

Hey Guys,

While trying to run this tutorial, I am facing issues loading the GloVe vectors. After loading the vectors, the vocabulary size is reported as 2, but it should be more than 10000. Can anyone help me out with this?

image

My pytorch version is 1.3.1
Torchtext version is 0.5.1

Any help would be appreciated. Thanks!
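For reference, here is a minimal sketch of how the vocabulary is typically built with torchtext 0.5 and GloVe vectors (the field and dataset names are assumptions, not the tutorial's exact code). One possible cause of a vocabulary size of 2 is calling build_vocab on a dataset object that only contains a couple of examples, so it may be worth checking what is actually passed in:

import torch
from torchtext import data, datasets
from torchtext.vocab import GloVe

# Hypothetical fields mirroring a typical IMDB setup.
TEXT = data.Field(lower=True, tokenize="spacy", batch_first=True)
LABEL = data.LabelField(dtype=torch.float)

# Build the vocabulary from the full training split and attach GloVe vectors.
train, test = datasets.IMDB.splits(TEXT, LABEL)
TEXT.build_vocab(train, vectors=GloVe(name="6B", dim=100))
LABEL.build_vocab(train)

print(len(TEXT.vocab))  # expected to be far larger than 2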

#fix_error

Captum Insights not working in Google Colab

When trying to run the Getting started with Captum Insights tutorial in a Google Colab notebook, I stumbled upon the following issue: when calling visualizer.render(debug=False), the result looks like the screenshot below.

Screenshot 2019-10-11 at 21 05 48

The reason for this behavior is that Captum's render() method does not redirect requests as e.g. shown in TensorBoard's _display_colab() method. While the current implementation works fine with regular IPython notebooks, Colab requires some additional tweaks as described in the TensorBoard code.

Do you have any plans to support Colab or is this even a priority? If no one is already working on this, I could make a PR adding some code similar to TensorBoard as a proof of concept.
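For what it is worth, here is a rough proof of concept, assuming the Insights server is already running on a known port inside the Colab kernel; Colab can proxy a kernel port into the notebook output instead of relying on localhost URLs:

# Colab-only sketch; `insights_port` is an assumption (whatever port the server bound to).
from google.colab import output

insights_port = 6006
output.serve_kernel_port_as_iframe(insights_port)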

How to interpret BERT for SequenceClassification?

Hi @NarineK and captum team, thanks for all the great work on interpretability with PyTorch.

As others here (see #150, #249), I am trying to interpret a BERT classifier finetuned on a binary classification task, using the transformers library from HuggingFace.
Indeed, I have

model = BertForSequenceClassification.from_pretrained('finetuned-bert-base-cased')

I have not had much success so far, starting from the SQuAD example https://github.com/pytorch/captum/blob/master/tutorials/Bert_SQUAD_Interpret.ipynb

So far, I left almost everything else untouched and redefined

def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):

    text_ids = tokenizer.encode(text, add_special_tokens=False)
    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids 
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

which I call with input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id) and a custom forward method that reads

def custom_forward(inputs, token_type_ids=None, position_ids=None, attention_mask=None, position=0):
    outputs = predict(inputs, token_type_ids=token_type_ids, position_ids=position_ids, attention_mask=attention_mask)
    preds = outputs[0]
    # preds is like
    # tensor([[-1.9723,  2.2183]], grad_fn=<AddmmBackward>)
    return torch.tensor([torch.softmax(preds, dim = 1)[0][1]], requires_grad = True)

which I use in lig = LayerIntegratedGradients(custom_forward, model.bert.embeddings).

When calling lig.attribute (as in the tutorial), I get

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Can you help me debug the above? I guess I am messing something up with the custom_forward method, and maybe also construct_input_ref_pair... or more.

I am happy to post a working solution once done with this!
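In case it helps anyone hitting the same message: wrapping the result in torch.tensor([...]) creates a brand-new leaf tensor, which disconnects the output from the autograd graph and would plausibly explain the "differentiated Tensors appears to not have been used in the graph" error. A minimal sketch of a version that keeps the graph intact (still assuming the tutorial's predict helper):

def custom_forward(inputs, token_type_ids=None, position_ids=None, attention_mask=None):
    outputs = predict(inputs, token_type_ids=token_type_ids,
                      position_ids=position_ids, attention_mask=attention_mask)
    preds = outputs[0]  # logits of shape (batch, 2)
    # Return the positive-class probability without constructing a new tensor,
    # so gradients can still flow back to the embeddings.
    return torch.softmax(preds, dim=1)[:, 1]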

Internal Server Error

I am running 'captum' on OS X 10.11.6 (also Ubuntu 16.04LTS).
The example 'python -m captum.insights.example' gets an Internal Server Error when I try to connect to http://localhost:51283/ with Safari.

Any ideas?

============================= test session starts ==============================
platform darwin -- Python 3.6.7, pytest-5.0.1, py-1.8.0, pluggy-0.13.0
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/davidlaxer/captum/.hypothesis/examples')
rootdir: /Users/davidlaxer/captum
plugins: hypothesis-3.88.3
collected 212 items                                                            

tests/attr/test_approximation_methods.py ....                            [  1%]
tests/attr/test_common.py ........                                       [  5%]
tests/attr/test_data_parallel.py ssssssssssssssss                        [ 13%]
tests/attr/test_deeplift_basic.py ......                                 [ 16%]
tests/attr/test_deeplift_classification.py .....F..                      [ 19%]
tests/attr/test_gradient.py ........                                     [ 23%]
tests/attr/test_gradient_shap.py ...                                     [ 25%]
tests/attr/test_input_x_gradient.py .........                            [ 29%]
tests/attr/test_integrated_gradients_basic.py ........................   [ 40%]
tests/attr/test_integrated_gradients_classification.py ........          [ 44%]
tests/attr/test_internal_influence.py ..........                         [ 49%]
tests/attr/test_layer_activation.py ......                               [ 51%]
tests/attr/test_layer_conductance.py .............                       [ 58%]
tests/attr/test_layer_gradient_x_activation.py ......                    [ 60%]
tests/attr/test_neuron_conductance.py .........                          [ 65%]
tests/attr/test_neuron_gradient.py ........                              [ 68%]
tests/attr/test_neuron_integrated_gradients.py ........                  [ 72%]
tests/attr/test_saliency.py .........                                    [ 76%]
tests/attr/test_targets.py ...................................           [ 93%]
tests/attr/test_utils_batching.py .........                              [ 97%]
tests/attr/models/test_base.py .                                         [ 98%]
tests/attr/models/test_pytext.py ss                                      [ 99%]
tests/insights/test_contribution.py ..                                   [100%]

=================================== FAILURES ===================================
_____________ Test.test_softmax_classification_batch_zero_baseline _____________

self = <tests.attr.test_deeplift_classification.Test testMethod=test_softmax_classification_batch_zero_baseline>

    def test_softmax_classification_batch_zero_baseline(self):
        num_in = 40
        input = torch.arange(0.0, num_in * 3.0, requires_grad=True).reshape(3, num_in)
        baselines = 0 * input
    
        model = SoftmaxDeepLiftModel(num_in, 20, 10)
        dl = DeepLift(model)
    
>       self.softmax_classification(model, dl, input, baselines)

tests/attr/test_deeplift_classification.py:54: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/attr/test_deeplift_classification.py:117: in softmax_classification
    self._assert_attributions(model, attributions, input, baselines, delta, target2)
tests/attr/test_deeplift_classification.py:129: in _assert_attributions
    "some samples".format(delta),
E   AssertionError: False is not true : The sum of attribution values tensor([0.0008, 0.0023, 0.0039]) is not nearly equal to the difference between the endpoint for some samples
=============================== warnings summary ===============================
/Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/IPython/lib/pretty.py:91
  /Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/IPython/lib/pretty.py:91: DeprecationWarning: IPython.utils.signatures backport for Python 2 is deprecated in IPython 6, which only supports Python 3
    from IPython.utils.signatures import signature

/Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/IPython/utils/module_paths.py:28
  /Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/IPython/utils/module_paths.py:28: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

tests/attr/test_deeplift_basic.py::Test::test_relu_deeplift
tests/attr/test_deeplift_basic.py::Test::test_relu_deeplift_batch
tests/attr/test_deeplift_basic.py::Test::test_relu_deeplift_batch_4D_input
tests/attr/test_deeplift_basic.py::Test::test_relu_deeplift_multi_ref
tests/attr/test_deeplift_basic.py::Test::test_relu_linear_deeplift
tests/attr/test_deeplift_basic.py::Test::test_tanh_deeplift
tests/attr/test_deeplift_classification.py::Test::test_convnet_with_maxpool1d
tests/attr/test_deeplift_classification.py::Test::test_convnet_with_maxpool2d
tests/attr/test_deeplift_classification.py::Test::test_convnet_with_maxpool3d
tests/attr/test_deeplift_classification.py::Test::test_sigmoid_classification
tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_batch_multi_baseline
tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_batch_zero_baseline
tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_multi_baseline
tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_zero_baseline
tests/attr/test_targets.py::Test::test_multi_target_deep_lift
tests/attr/test_targets.py::Test::test_multi_target_deep_lift_shap
tests/attr/test_targets.py::Test::test_simple_target_deep_lift
tests/attr/test_targets.py::Test::test_simple_target_deep_lift_shap
tests/attr/test_targets.py::Test::test_simple_target_deep_lift_shap_single_tensor
tests/attr/test_targets.py::Test::test_simple_target_deep_lift_shap_tensor
  /Users/davidlaxer/captum/captum/attr/_core/deep_lift.py:327: UserWarning: Setting forward, backward hooks and attributes on non-linear
                 activations. The hooks and attributes will be removed
              after the attribution is finished
    after the attribution is finished"""

tests/attr/test_gradient.py::Test::test_apply_gradient_reqs
tests/attr/test_layer_conductance.py::Test::test_matching_conv_with_baseline_conductance
tests/attr/test_layer_conductance.py::Test::test_matching_pool1_conductance
tests/attr/test_layer_conductance.py::Test::test_matching_pool2_conductance
tests/attr/test_neuron_gradient.py::Test::test_matching_intermediate_gradient
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_input_linear1
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_input_relu2
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear1
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear2
tests/attr/test_targets.py::Test::test_multi_target_deep_lift
tests/attr/test_targets.py::Test::test_multi_target_input_x_gradient
tests/attr/test_targets.py::Test::test_multi_target_saliency
tests/attr/test_targets.py::Test::test_simple_target_deep_lift
tests/attr/test_targets.py::Test::test_simple_target_input_x_gradient
tests/attr/test_targets.py::Test::test_simple_target_saliency
tests/attr/test_targets.py::Test::test_simple_target_saliency_tensor
  /Users/davidlaxer/captum/captum/attr/_utils/gradient.py:27: UserWarning: Input Tensor 0 did not already require gradients, required_grads has been set automatically.
    "required_grads has been set automatically." % index

tests/attr/test_gradient.py::Test::test_apply_gradient_reqs
  /Users/davidlaxer/captum/captum/attr/_utils/gradient.py:34: UserWarning: Input Tensor 1 had a non-zero gradient tensor, which is being reset to 0.
    "which is being reset to 0." % index

tests/attr/test_gradient.py::Test::test_apply_gradient_reqs
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear2
  /Users/davidlaxer/captum/captum/attr/_utils/gradient.py:27: UserWarning: Input Tensor 2 did not already require gradients, required_grads has been set automatically.
    "required_grads has been set automatically." % index

tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear1
tests/attr/test_neuron_gradient.py::Test::test_simple_gradient_multi_input_linear2
  /Users/davidlaxer/captum/captum/attr/_utils/gradient.py:27: UserWarning: Input Tensor 1 did not already require gradients, required_grads has been set automatically.
    "required_grads has been set automatically." % index

tests/attr/models/test_base.py::Test::test_interpretable_embedding_base
  /Users/davidlaxer/captum/captum/attr/_models/base.py:168: UserWarning: In order to make embedding layers more interpretable they will
          be replaced with an interpretable embedding layer which wraps the
          original embedding layer and takes word embedding vectors as inputs of
          the forward function. This allows to generate baselines for word
          embeddings and compute attributions for each embedding dimension.
          The original embedding layer must be set
          back by calling `remove_interpretable_embedding_layer` function
          after model interpretation is finished.
    after model interpretation is finished."""

tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
  /Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/matplotlib/colors.py:101: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
    ret = np.asscalar(ex)

tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
  /Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/matplotlib/image.py:424: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
    a_min = np.asscalar(a_min.astype(scaled_dtype))

tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_multi_features
tests/insights/test_contribution.py::Test::test_one_feature
tests/insights/test_contribution.py::Test::test_one_feature
  /Users/davidlaxer/anaconda/envs/ai/lib/python3.6/site-packages/matplotlib/image.py:425: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
    a_max = np.asscalar(a_max.astype(scaled_dtype))

-- Docs: https://docs.pytest.org/en/latest/warnings.html
=========================== short test summary info ============================
SKIPPED [1] tests/attr/test_data_parallel.py:116: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:187: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:254: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:38: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:68: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:98: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:137: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:168: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:219: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:24: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:56: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:84: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:123: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:154: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:200: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/test_data_parallel.py:235: Skipping GPU test since CUDA not available.
SKIPPED [1] tests/attr/models/test_pytext.py:81: Skip the test since PyText is not installed
SKIPPED [1] tests/attr/models/test_pytext.py:68: Skip the test since PyText is not installed
FAILED tests/attr/test_deeplift_classification.py::Test::test_softmax_classification_batch_zero_baseline
======= 1 failed, 193 passed, 18 skipped, 60 warnings in 1188.87 seconds =======

$ python -m captum.insights.example

Fetch data and view Captum Insights at http://localhost:51283/

<IPython.lib.display.IFrame object at 0x1211f1c18>

Screen Shot 2019-10-18 at 9 08 09 AM

rand_img_dist defined but not used in the official tutorial

Hi,

In the tutorial Model Interpretation for Pretrained ResNet Model, for the occlusion experiment, rand_img_dist = torch.cat([input * 0, input * 1]) is defined but never used; maybe you want to remove it.

occlusion = Occlusion(model)

rand_img_dist = torch.cat([input * 0, input * 1])
attributions_occ = occlusion.attribute(input,
                                       strides = (3, 50, 50),
                                       target=pred_label_idx,
                                       sliding_window_shapes=(3,60, 60),
                                       baselines=0)

_ = viz.visualize_image_attr_multiple(np.transpose(attributions_occ.squeeze().cpu().detach().numpy(), (1,2,0)),
                                      np.transpose(transformed_img.squeeze().cpu().detach().numpy(), (1,2,0)),
                                      ["original_image", "heat_map"],
                                      ["all", "positive"],
                                      show_colorbar=True,
                                      outlier_perc=2,
                                     )

Computing contributions w.r.t. logits rather than final activations

Often, in practice, we wish to compute the contributions w.r.t. the logits of the final sigmoid/softmax, rather than w.r.t. the final network output itself. This is to avoid artifacts that can be caused by the saturating nature of the sigmoid/softmax, and comes into play when comparing attributions between examples. It is particularly relevant if gradient*input is used as an attribution method, because for examples with very confident predictions, the sigmoid/softmax outputs tend to saturate and the gradients will approach zero. I'm wondering if it may be worth mentioning this in the documentation - in the current "getting started", the toy model has a sigmoid output:

Screenshot 2019-10-08 at 2 19 52 PM

I'm concerned that a naive user may try to compare the magnitudes of attributions across different examples without realizing that, for sigmoid/softmax outputs, it may be worth removing the final nonlinearity before doing such a comparison. We discuss this in Section 3.6 of the deeplift paper. Ideally there would be an option in Captum to ignore the final nonlinearity, but I realize it may not be trivial to add that option. Sorry if this is already addressed and I missed it.
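As a rough illustration of the workaround (not an existing Captum option), one can expose a flag on the model that skips the final nonlinearity and attribute against that callable instead; everything below is a self-contained toy sketch, not the library's API:

import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Hypothetical toy model along the lines of the getting-started example:
# two linear layers followed by a final sigmoid.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(3, 4)
        self.lin2 = nn.Linear(4, 1)

    def forward(self, x, apply_sigmoid=True):
        out = self.lin2(torch.relu(self.lin1(x)))
        return torch.sigmoid(out) if apply_sigmoid else out

model = ToyModel()
inp = torch.rand(2, 3)

# Attribute w.r.t. the pre-sigmoid logit by bypassing the final nonlinearity.
ig = IntegratedGradients(lambda x: model(x, apply_sigmoid=False))
attributions = ig.attribute(inp, baselines=inp * 0, target=0)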

How to calculate integrated gradients for a model that has embeddings

Hi everyone,

I am applying the integrated gradients method to my dataset, which has categorical and numerical features; I convert the categorical data into embeddings and concatenate them with the numerical features. However, the integrated gradients for all the categorical values are zero, while those for the numerical ones are calculated correctly.
I have tried to do it with LayerIntegratedGradients, but since I do not have the developer version of Captum installed, it failed.
Any suggestions?
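For reference, a hedged sketch of what the LayerIntegratedGradients route looks like for a mixed categorical/numerical model (this does require a Captum version that ships the class; the model below is a stand-in, not the poster's code):

import torch
import torch.nn as nn
from captum.attr import LayerIntegratedGradients

# Stand-in mixed-input model: categorical indices go through an embedding and
# are concatenated with the numerical features.
class MixedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.cat_embedding = nn.Embedding(10, 4)
        self.head = nn.Linear(4 + 3, 2)

    def forward(self, cat_idx, num_feats):
        emb = self.cat_embedding(cat_idx).squeeze(1)
        return self.head(torch.cat([emb, num_feats], dim=1))

model = MixedModel()
cat_idx = torch.randint(0, 10, (5, 1))
num_feats = torch.rand(5, 3)

# Attribute w.r.t. the embedding output rather than the raw integer indices,
# which is why the categorical attributions are no longer forced to zero.
lig = LayerIntegratedGradients(model, model.cat_embedding)
attributions = lig.attribute((cat_idx, num_feats), target=0)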

captum insights port?

I am running the example application and wanted to ask whether it's possible to set a particular port for the app.

Thanks

Returning only the gradients/"multipliers"

Hi all,

Just wanted to put this particular use-case on your radar. Sometimes we find that it is useful to get access to just the gradients ("multipliers"), before they are multiplied by the difference-from-reference to get the final attribution. Specifically, we use the multipliers to estimate how the network might have responded had it seen slightly different inputs. We refer to these estimates as "hypothetical contribution scores". If you are curious how these hypothetical contributions look, here's a notebook (on a fork of the DeepSHAP repository) where I compute hypothetical contributions in the context of genomic data: https://github.com/AvantiShri/shap/blob/0b0350ba3a42af275f6e99ca2e3c5877d7d94f8a/notebooks/deep_explainer/PyTorch%20Deep%20Explainer%20DeepSEA%20example.ipynb

You've all done an awesome job with this repository, and I will definitely point it to the pytorch users in my lab once the release is formally announced. I totally understand if the ability to return just the multipliers is not something that you are likely to incorporate in the main release; I'm sure we can easily fork the repository and add that feature in for our lab's purposes.

Thanks again!
Av

DeepLIFT fails when reusing MaxPool2d layer

I'm using Captum v0.1, so I'm not sure whether this happens with the current master.

Something I have noticed when trying out DeepLIFT with CNNs is that reusing MaxPool2d layers instead of explicitly defining one per usage results in RuntimeErrors. Maybe this is related to #199

For example, consider the CIFAR10 tutorial.
If we were to change the network structure to reuse self.pool1 as follows:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        # self.pool2 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.relu1 = nn.ReLU()
        self.relu2 = nn.ReLU()
        self.relu3 = nn.ReLU()
        self.relu4 = nn.ReLU()

    def forward(self, x):
        x = self.pool1(self.relu1(self.conv1(x)))
        x = self.pool1(self.relu2(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = self.relu3(self.fc1(x))
        x = self.relu4(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

Training works just fine, but attributing with DeepLIFT fails due to a size mismatch, such as the following (unfortunately I can't download the dataset right now, so I'm using a local copy):

~\envs\lib\site-packages\captum\attr\_core\deep_lift.py in <genexpr>(.0)
    282          """
    283         delta_in = tuple(
--> 284             inp - inp_ref for inp, inp_ref in zip(module.input, module.input_ref)
    285         )
    286         delta_out = tuple(

RuntimeError: The size of tensor a (10) must match the size of tensor b (28) at non-singleton dimension 3

Is this a bug or necessary convention? Note that reusing pooling layers actually occurs in official PyTorch tutorials.
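For completeness, a sketch of the workaround, assuming the only change is to stop sharing the module (keep the self.pool2 definition from the original tutorial and use it for the second pooling step):

def forward(self, x):
    x = self.pool1(self.relu1(self.conv1(x)))
    x = self.pool2(self.relu2(self.conv2(x)))  # separate instance per usage
    x = x.view(-1, 16 * 5 * 5)
    x = self.relu3(self.fc1(x))
    x = self.relu4(self.fc2(x))
    x = self.fc3(x)
    return x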

Cannot install the latest version

When I tried to install the latest version, I got the errors below.


    error: can't copy 'captum/insights/frontend/widget/static/extension.js': doesn't exist or not a regular file
    ----------------------------------------

ERROR: Command errored out with exit status 1: /root//.pyenv/versions/3.7.4/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-xijz5fxd/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-xijz5fxd/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-lcp60r6o/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.

It seems to be caused by the wrong js path, captum/insights/frontend/widget/static/extension.js.

Dealing with .view in DeepLift

Hi, I am an undergrad student looking to apply Captum's implementation of DeepLift for a Graph Convolution Network

Below is a snippet of the code in the forward function that is causing problems:

to_conv1d = batch_sortpooling_graphs.view((-1, 1, self.k * self.total_latent_dim))
conv1d_res = self.conv1d_params1(to_conv1d)
conv1d_res = self.conv1d_activation(conv1d_res)
conv1d_res = self.maxpool1d(conv1d_res)
conv1d_res = self.conv1d_params2(conv1d_res)
conv1d_res = self.conv1d_activation(conv1d_res)

to_dense = conv1d_res.view(len(graph_sizes), -1)

if self.output_dim > 0:
    out_linear = self.out_params(to_dense)
    reluact_fp = self.conv1d_activation(out_linear)
else:
    reluact_fp = to_dense
return self.conv1d_activation(reluact_fp)

As you can see, my code requires several reshapes of the tensors as it moves from the input to the 1d convolution layer and finally to the dense layer. Running as is gives me the following error:

Traceback (most recent call last):
  File "main.py", line 625, in <module>
    attribution = dl.attribute(input, additional_forward_args=[15], target=1)
  File "/home/user/.local/lib/python3.6/site-packages/captum/attr/_core/deep_lift.py", line 202, in attribute
    additional_forward_args=additional_forward_args,
  File "/home/user/.local/lib/python3.6/site-packages/captum/attr/_utils/gradient.py", line 92, in compute_gradients
    grads = torch.autograd.grad(torch.unbind(output), inputs)
  File "/home/user/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 157, in grad
    inputs, allow_unused)
  File "/home/user/.local/lib/python3.6/site-packages/captum/attr/_core/deep_lift.py", line 284, in _backward_hook
    inp - inp_ref for inp, inp_ref in zip(module.input, module.input_ref)
  File "/home/user/.local/lib/python3.6/site-packages/captum/attr/_core/deep_lift.py", line 284, in <genexpr>
    inp - inp_ref for inp, inp_ref in zip(module.input, module.input_ref)
RuntimeError: The size of tensor a (160) must match the size of tensor b (19) at non-singleton dimension 2

The shapes of each tensors are as follows:

batch_sortpooling_graphs: torch.Size([1, 19, 97])
conv1d_res (immediately after line 1): torch.Size([1, 1, 1843])
to_dense: torch.Size([1, 160])

May I ask if anyone has any idea how to circumvent this so that DeepLift can work with tensor reshapes? Thank you!

What is the desired output for _select_targets in common.py?

Hi again,

I have a question about the _select_targets function, specifically when used for the DeepLift implementation. I figured out that the output passed into this function is based on the output from the last layer of the architecture. For my architecture, the last layer is a log_softmax. Sorry if it is a silly question, but should I return the predicted class (there are only 2 classes), the loss value, or the class probability of the target class as output?

Attached below is the code snippet for _select_targets for your reference.

def _select_targets(output, target):
    output = output[0]
    num_examples = output.shape[0]
    dims = len(output.shape)

    if target is None:
        return output
    elif isinstance(target, int) or isinstance(target, tuple):
        return _verify_select_column(output, target)
    elif isinstance(target, torch.Tensor):
        if torch.numel(target) == 1 and isinstance(target.item(), int):
            return _verify_select_column(output, target.item())
        elif len(target.shape) == 1 and torch.numel(target) == num_examples:
            assert dims == 2, "Output must be 2D to select tensor of targets."
            return torch.gather(output, 1, target.reshape(len(output), 1))
        else:
            raise AssertionError(
                "Tensor target dimension %r is not valid." % (target.shape,)
            )
    elif isinstance(target, list):
        assert len(target) == num_examples, "Target list length does not match output!"
        if type(target[0]) is int:
            assert dims == 2, "Output must be 2D to select tensor of targets."
            return torch.gather(output, 1, torch.tensor(target).reshape(len(output), 1))
        elif type(target[0]) is tuple:
            return torch.stack(
                [output[(i,) + targ_elem] for i, targ_elem in enumerate(target)]
            )
        else:
            raise AssertionError("Target element type in list is not valid.")
    else:
        raise AssertionError("Target type %r is not valid." % target)

Request: example with multilabel attribution

The provided vision examples and documentation are excellent for single-class classification, but I am struggling to implement a multi-label use case.

For my use case, I use a single channel image of a cell nucleus as input. The target is a tensor that describes whether or not the cell was positive for each of 22 different protein markers, e.g. tensor([0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=torch.float64)
...that is, each cell can be positive for multiple markers, not only one. This is a simple multi-label classification task, where my model is the boilerplate torchvision.models.resnet18 with a custom final layer that accommodates the desired output.

I use the CIFAR vision example as a starting point as follows:
image

But I get AssertionError: Tensor target dimension torch.Size([22]) is not valid. I see from the docstring for saliency.attribute that targets/outputs with greater than two dimensions should be passed as tuples, but when I pass tuple(labels[ind]) instead, I get AssertionError: Cannot choose target column with output shape torch.Size([1, 22]).

Ideally, I'd like to set up an AttributionVisualizer that looks like the following mock-up:

image

...where I can click each element of the prediction (e.g. CK19) and see the corresponding attribution image for that marker.

Any chance that a multi-label classification example like this could be supplied?

Much thanks!
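In the meantime, a hedged workaround sketch: since target is an output index rather than the full label vector, one can attribute each of the 22 marker outputs in turn and collect the resulting maps:

from captum.attr import Saliency

saliency = Saliency(model)
# Hypothetical multi-label loop: one attribution map per marker output.
# `input` is a single (1, C, H, W) nucleus image; i indexes the 22 outputs.
per_marker_attributions = [saliency.attribute(input, target=i) for i in range(22)]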

Integrated gradients using with pack_padded_sequence returns error

Hi all,

I am using the integrated gradients (IG) implementation from the Captum package. I apply an LSTM to varying-length sequences and then try to get IG attributions from the trained model using the following line of code:

attr, delta = ig.attribute((data, seq_lengths), target=1, return_convergence_delta=True)

but I am getting the following error:

RuntimeError: lengths array must be sorted in decreasing order when enforce_sorted is True. You can pass enforce_sorted=False to pack_padded_sequence and/or pack_sequence to sidestep this requirement if you do not need ONNX exportability.

However, I have sorted the lengths of the arrays in each batch in decreasing order.
Please note that if I use IG without pack_padded_sequence, it works perfectly.

Regarding the previous error, I set enforce_sorted=False in pack_padded_sequence, but then I get another error:

RuntimeError: Length of all samples has to be greater than 0, but found an element in 'lengths' that is <= 0

Here are the lengths of all the samples; none of them is less than or equal to zero:

tensor([23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 20,
14, 10])

Any help would be much appreciated.
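A plausible explanation (an assumption on my part, not a confirmed diagnosis): because (data, seq_lengths) is passed as the attributed inputs, IG interpolates both tensors between the baseline and the input for each of the n_steps points, so the expanded lengths are no longer sorted and the smallest scaled lengths round down to 0. A sketch of the workaround, assuming the model's forward signature is forward(data, seq_lengths):

# Keep seq_lengths out of the attributed inputs so IG does not interpolate them.
attr, delta = ig.attribute(
    data,
    target=1,
    additional_forward_args=(seq_lengths,),
    return_convergence_delta=True,
)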

ImportError: cannot import name 'LayerIntegratedGradients'

---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
in
13
14 from captum.attr import visualization as viz
---> 15 from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
16 from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

ImportError: cannot import name 'LayerIntegratedGradients'
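If I remember correctly, LayerIntegratedGradients is newer than the 0.1.0 release on PyPI, so assuming that is the installed version, installing from the GitHub master branch should make the import work:

pip install git+https://github.com/pytorch/captum.git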

On custom architecture

@orionr @zpao @asmeurer @asuhan @kostmo Great work by the team, this is what I was looking for. I have a few queries:

  1. Can Captum be used for architectures like object detection and semantic segmentation?
  2. Would I be able to see the intermediate learnings during training?

Scripting/tracing Captum classes

Hello, I was experimenting with Captum and I was wondering if there was any way to trace/script an attribution model in order to just obtain the final heatmap as output of the serialized file.

I did not find any reference in the documentation nor in the code, and did not manage to integrate it myself by creating intermediate classes to, for example, wrap the Saliency class in a torch.nn.Module one.

Is there something I am missing / is it in the future plans?

Computing LayerConductance in IMDB sentiment analysis Tutorial

I am trying to compute layer conductance in the IMDB tutorial, and I keep getting a scalar issue. Any guidance on how I should pass the input (test_input_tensor) to get the attributions?

cond = LayerConductance(model, model.convs)
cond_vals = cond.attribute(test_input_tensor,target=1)

Thank you!

"RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor "

Hi, I am trying to interpret my intent classification model by using your IMDB tutorial, and I'm facing the following error: "RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor". This error is raised during the forward pass of an RNN (LSTM) which takes a packed sequence as input (via pack_padded_sequence).

Documentation on `baseline` argument in DeepLiftShap

Hi all,

Thank you so much for the invitation to captum. Very grateful to all of you for putting this together! I had a quick question regarding the documentation. Currently, in the arguments description for DeepLiftShap, it says "The first dimension in baseline tensors defines the distribution from which we randomly draw samples. All other dimensions starting after..." However, when I look at the code, it seems as though all the baselines are used for all the inputs (i.e. I'm not seeing any code that I would associate with sampling). Is my understanding correct? I actually prefer the deterministic behavior because in my lab we typically supply multiple baselines per input and we want all the baselines to be used.

Thanks,
Avanti

Issue with resnet18 model

Whenever I try to use any of the saliency methods, I get this error:
AttributeError: 'AvgPool2d' object has no attribute 'divisor_override'

I don't understand why that happens.
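One guess (an assumption, not a confirmed diagnosis): divisor_override was added to AvgPool2d in recent PyTorch releases, and this AttributeError typically appears when a model instance pickled under an older torch is executed under a newer one. Rebuilding the module under the current version and loading only the weights may avoid it:

import torch
import torchvision

# Hypothetical repair: re-instantiate resnet18 with the current torchvision and
# copy over the trained weights instead of unpickling the whole module.
# "resnet18_weights.pth" is a placeholder path.
model = torchvision.models.resnet18(pretrained=False)
model.load_state_dict(torch.load("resnet18_weights.pth"))
model.eval()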

how to get captum insights working

It gives an error after visualizer.render():
Screenshot (205)

And how do I get this saved image?

# show a screenshot if using notebook non-interactively
from IPython.display import Image
Image(filename='img/captum_insights.png')

BibTeX for citation

Hi folks,
Is there a proper .bib format available for Captum for the purposes of citation in research papers?

Thanks!

Could I use captum for object localisation?

Hello,
Can I use this library for object localisation tasks? Do you think you could prepare a very simple tutorial for this? I bet this would be very helpful for many people, since labelling images with bounding boxes or polygons is really time consuming, as you know.

Import error for Occlusion

Getting an import error for Occlusion when running the tutorial on interpreting vision models with ResNet.

Error details
ImportError Traceback (most recent call last)
in
15 from captum.attr import IntegratedGradients
16 from captum.attr import GradientShap
---> 17 from captum.attr import Occlusion
18 from captum.attr import NoiseTunnel
19 from captum.attr import visualization as viz
ImportError: cannot import name 'Occlusion' from 'captum.attr' (/home/ubuntu/opt/anaconda3/envs/pytorch/lib/python3.7/site-packages/captum/attr/__init__.py)

Captum Insights not working in SageMaker

When I try to run Captum Insights from a SageMaker notebook terminal on port 6006 by browsing to <sagemaker_notebook_address>/proxy/6006/, the tab name shows "Captum Insights", but the web page is blank. The same method works fine on my local system, or fine with tensorboard/flask apps through SageMaker. It seems to be a problem with Captum+SageMaker specifically.

Screenshot 2019-10-24 05 16 27

Alternatively, when attempting to run tutorials/CIFAR_TorchVision_Captum_Insights.ipynb I get this error from within a notebook:

Screenshot 2019-10-24 05 26 03

(I get the same error with visualizer.render(), just with less details)


Details:

I upgraded my SageMaker pytorch_p36 conda environment to torch==1.3.0. I installed captum from source with git clone https://github.com/pytorch/captum.git and then installed Insights with:

conda install -c conda-forge yarn
BUILD_INSIGHTS=1 python setup.py develop

Then ran the example with python captum/insights/example.py

And tried to access via <sagemaker_notebook_address>/proxy/6006/ (the same way I access a running tensorboard server)

I also tried it with/without modifying line 66 in insights/server.py from tcp.bind(("", 0)) to tcp.bind(("", 6006)) in order to use port 6006 (since this port seemed to work fine for running a tensorboard server).

RuntimeError: expected device cpu but got device cuda:0 when training and visualizing model on IMDB

I was trying to reproduce the "Interpreting text models: IMDB Sentiment Analysis" tutorial, but training my model instead of just loading a pretrained one.

I adapted the code of the original CNN tutorial but when I get to the point of calling interpret_sentence the following error occurs:

RuntimeError                              Traceback (most recent call last)
<ipython-input-23-68d49a3d040b> in <module>()
----> 1 interpret_sentence(model, 'It was a fantastic performance !', label=1)
      2 interpret_sentence(model, 'Best film ever', label=1)
      3 interpret_sentence(model, 'Such a great show!', label=1)
      4 interpret_sentence(model, 'It was a horrible movie', label=0)
      5 interpret_sentence(model, 'I\'ve never watched something as bad', label=0)

2 frames
<ipython-input-22-cbf5d478566f> in interpret_sentence(model, sentence, min_len, label)
     29     # compute attributions and approximation delta using integrated gradients
     30     attributions_ig, delta = ig.attribute(
---> 31         input_embedding, reference_embedding, n_steps=500, return_convergence_delta=True
     32     )
     33 

/usr/local/lib/python3.6/dist-packages/captum/attr/_core/integrated_gradients.py in attribute(self, inputs, baselines, target, additional_forward_args, n_steps, method, internal_batch_size, return_convergence_delta)
    232                 end_point,
    233                 additional_forward_args=additional_forward_args,
--> 234                 target=target,
    235             )
    236             return _format_attributions(is_inputs_tuple, attributions), delta

/usr/local/lib/python3.6/dist-packages/captum/attr/_utils/attribution.py in compute_convergence_delta(self, attributions, start_point, end_point, target, additional_forward_args)
    232         row_sums = [_sum_rows(attribution) for attribution in attributions]
    233         attr_sum = torch.tensor([sum(row_sum) for row_sum in zip(*row_sums)])
--> 234         return attr_sum - (end_point - start_point)
    235 
    236 

RuntimeError: expected device cpu but got device cuda:0

I am not sure, but I suppose the problem is that torch.tensor is being created without any device argument. Can I work around this issue?

In this Colab Notebook you can reproduce the error.
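Until this is fixed upstream, two workarounds that seem consistent with the traceback (both untested assumptions on my part): skip the convergence-delta computation, which is where the CPU-side tensor is created, or run the whole interpretation on CPU:

# Option 1: drop return_convergence_delta so the CPU tensor is never built.
attributions_ig = ig.attribute(input_embedding, reference_embedding, n_steps=500)

# Option 2: move the model (and therefore the inputs/embeddings) to CPU first.
model = model.cpu()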

Building failure for captum wheel package

When I built a python wheel package for captum with the following command:

BUILD_INSIGHTS=1 python setup.py bdist_wheel --python-tag py3

I got an error message:

error: can't copy 'captum/insights/frontend/widget/static/extension.js': doesn't exist or not a regular file

I found some errors in the setup.py file, where the paths for extension.js, index.js and index.js.map were not correct.

One solution is the following:

diff --git a/setup.py b/setup.py
index 87f5068..ee0a379 100755
--- a/setup.py
+++ b/setup.py
@@ -150,9 +150,9 @@ if __name__ == "__main__":
             (
                 "share/jupyter/nbextensions/jupyter-captum-insights",
                 [
-                    "captum/insights/frontend/widget/static/extension.js",
-                    "captum/insights/frontend/widget/static/index.js",
-                    "captum/insights/frontend/widget/static/index.js.map",
+                    "captum/insights/widget/static/extension.js",
+                    "captum/insights/widget/static/index.js",
+                    "captum/insights/widget/static/index.js.map",
                 ],
             ),
             (

CUDA OOM Error

Hi,

I am currently integrating Captum into my deep learning toolkit; thanks for providing this library.

When I try to run IntegratedGradients on a standard densenet201 model that is on a cuda device (11GB vram), I am getting an out-of-memory error even for one input image.

Just a quick check: Is this normal behaviour?
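One knob worth trying, based on the documented attribute signature (model, input and pred_idx below are placeholders): IntegratedGradients evaluates n_steps scaled copies of the input, so limiting internal_batch_size bounds how many of those copies sit in GPU memory at once.

from captum.attr import IntegratedGradients

ig = IntegratedGradients(model)
# Process the 50 interpolation steps a few at a time instead of all at once.
attributions = ig.attribute(
    input, target=pred_idx, n_steps=50, internal_batch_size=4
)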

Toy Example breaks with CUDA on compute_convergence_delta for Integrated Gradients

For the toy example with CUDA:

model = ToyModel()
model = model.cuda()
model.eval()

input = torch.rand(2, 3).cuda()
baseline = torch.zeros(2, 3).cuda()

ig = IntegratedGradients(model)
attributions, delta = ig.attribute(input, baseline, target=0, return_convergence_delta=True)

fails with the error

~/anaconda3/envs/heterokaryon/lib/python3.7/site-packages/captum/attr/_utils/attribution.py in compute_convergence_delta(self, attributions, start_point, end_point, target, additional_forward_args)
    232         row_sums = [_sum_rows(attribution) for attribution in attributions]
    233         attr_sum = torch.tensor([sum(row_sum) for row_sum in zip(*row_sums)])
--> 234         return attr_sum - (end_point - start_point)
    235 
    236 

RuntimeError: expected device cpu and dtype Float but got device cuda:0 and dtype Float

presumably since attr_sum is not on GPU. Turning return_convergence_delta to False results in no error.

Similar issues may arise in other places, though I haven't checked.

Models failing with error - Module has no input attribute

I am working with a number of models from the torchreid library. When I use DeepLift on these models, some work and some do not. For example, the DenseNet, MLFN, and MuDeep models work fine, but the OSNet, ResNetMid, and ResNet-50 (and some other) models do not. (N.B. I modified the models to not use inplace=True for nn.ReLU().)

The models that fail usually fail with an error along the lines of 'Sigmoid' object has no attribute 'input' (though they also fail for the same reason if ReLU is used); however, I can't see what I need to change in these models in order for them to work with DeepLift.

What is different about these models that causes this error? I understand the error message, but I don't understand why the module doesn't have an input attribute.

Captum Insights build fails on Linux Ubuntu18.04

Cannot build and launch Captum Insights on Ubuntu 18.04 (inside a VirtualBox VM):

(captum) elena@elena-VirtualBox:~/eStep/XAI/Software/captum$ conda install -c conda-forge yarn
Collecting package metadata (repodata.json): done
Solving environment: done

All requested packages already installed.

(captum) elena@elena-VirtualBox:~/eStep/XAI/Software/captum$ BUILD_INSIGHTS=1 python setup.py develop
-- Building version 0.2.0
-- Building Captum Insights
Running: ./scripts/build_insights.sh
~/eStep/XAI/Software/captum/captum/insights/frontend ~/eStep/XAI/Software/captum

Install Dependencies

yarn install v1.22.0
[1/4] Resolving packages...
[2/4] Fetching packages...
info [email protected]: The platform "linux" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
info [email protected]: The platform "linux" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
[3/4] Linking dependencies...
warning " > @babel/[email protected]" has unmet peer dependency "@babel/core@^7.0.0-0".
warning "@babel/plugin-proposal-class-properties > @babel/[email protected]" has unmet peer dependency "@babel/core@^7.0.0".
warning " > [email protected]" has unmet peer dependency "@babel/core@^7.0.0".
warning " > [email protected]" has unmet peer dependency "webpack@>=2".
warning "react-scripts > @typescript-eslint/eslint-plugin > [email protected]" has unmet peer dependency "typescript@>=2.8.0 || >= 3.2.0-dev || >= 3.3.0-dev || >= 3.4.0-dev || >= 3.5.0-dev || >= 3.6.0-dev || >= 3.6.0-beta || >= 3.7.0-dev || >= 3.7.0-beta".
warning " > [email protected]" has unmet peer dependency "prop-types@^15.0.0".
warning " > [email protected]" has unmet peer dependency "[email protected]".
error An unexpected error occurred: "EPERM: operation not permitted, symlink '../../../parser/bin/babel-parser.js' -> '/home/elena/eStep/XAI/Software/captum/captum/insights/frontend/node_modules/@babel/core/node_modules/.bin/parser'".
info If you think this is a bug, please open a bug report with the information provided in "/home/elena/eStep/XAI/Software/captum/captum/insights/frontend/yarn-error.log".
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.
Traceback (most recent call last):
File "setup.py", line 105, in
build_insights()
File "setup.py", line 88, in build_insights
subprocess.check_call(command)
File "/home/elena/anaconda3/envs/captum/lib/python3.7/subprocess.py", line 347, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command './scripts/build_insights.sh' returned non-zero exit status 1.
(captum) elena@elena-VirtualBox:~/eStep/XAI/Software/captum$

Undesirable behavior of LayerActivation in networks with inplace ReLUs

Hi,
I was trying to use captum.attr._core.layer_activation.LayerActivation to get the activation of the first convolutional layer in a simple model. Here is my code:

torch.manual_seed(23)
np.random.seed(23)
model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
                      nn.ReLU(inplace=True),
                      nn.Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
                      nn.ReLU(inplace=True))

layer_act = LayerActivation(model, model[0])
input = torch.randn(1, 3, 5, 5)
mylayer = model[0]
print(torch.norm(mylayer(input) - layer_act.attribute(input), p=2))

In fact, I have computed the activation in two different ways and compared them afterwards. Obviously, I expected a value close to zero to be printed as the output, however, this is what I got:

tensor(3.4646, grad_fn=<NormBackward0>)

I hypothesize that the inplace ReLU layer after the convolutional layer modifies its output in place, since there were many zeros in the activation computed by Captum (i.e. layer_act.attribute(input)). In fact, when I changed the architecture of the network to the following:

model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
                      nn.ReLU(),
                      nn.Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
                      nn.ReLU(inplace=True))

then the outputs matched.

System information

  • Python 3.7.0
  • torch 1.3.0
  • Captum 0.1.0

Captum for Bert Sentence Classification

Hi there,

I tried to apply the Captum Q&A tutorial to a BERT sentence classification task, but I am facing difficulties adapting the baselines/references part of the code for classification and the new HuggingFace tokenizer.

Just want to check if someone is working on the same topic, so we can share experiences.

is this a typo?

In the README, it says:

Next we will use IntegratedGradients algorithms to assign attribution scores to each input feature with respect to the second target output.

But then target=0 is set; should it be the first target output?

Plan for perturbation-based methods

Hello,
Kudos for the great work. I believe this has great potential.
I wonder what is in your roadmap, especially regarding perturbation-based attribution methods (Occlusion, LIME/KernelSHAP, Shapley Value sampling, etc.).

Are these planned at all? While being orders of magnitude slower, these methods have the advantage that they can be applied to any black-box model (i.e. any network architecture is supported out of the box, with no need to instrument layers or implement custom modules). The implementation in Captum should be easier too. Moreover, Shapley value attributions have unique theoretical properties that might be important when speed is not critical.

While it makes sense to focus on gradient-based methods first, maybe the structure of the library should be such that these methods can be easily added in the future.

Captum for regression problem

Hi all,

I am wondering if there are examples from which I could learn to use Captum for regression problems, as well as with volume data. My setting is feeding volume data of size WxHxD (64x64x64) to a 3D convnet which has only one neuron in the top layer that outputs a real number. Thanks.
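Not a full example, but a minimal sketch of what the regression case should look like: with a single output neuron there is no class index to pick, so target can simply be omitted (the 3D convnet below is a stand-in, not the poster's model):

import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Stand-in 3D regressor: one conv block, global pooling, one output neuron.
model = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
    nn.Linear(8, 1),
)
model.eval()

volume = torch.rand(1, 1, 64, 64, 64)
ig = IntegratedGradients(model)
# No target needed: the single regression output is attributed directly.
attributions = ig.attribute(volume, baselines=volume * 0)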

GradientShap's `attribute` method: `baselines` argument should be optional

class GradientShap(GradientAttribution):
    def __init__(self, forward_func):
        r"""
        Args:

            forward_func (function): The forward function of the model or
                any modification of it
        """
        GradientAttribution.__init__(self, forward_func)

    def attribute(
        self,
        inputs,
        baselines,
        n_samples=5,
        stdevs=0.0,
        target=None,
        additional_forward_args=None,
        return_convergence_delta=False,
    ):

According to the docs, the baselines parameter in the attribute method of GradientShap is optional and is replaced with a zero-filled tensor of the same size as the input if not provided. However, at the moment it's a required argument.

Captum for BERT

Hi,
Thanks for the great work. The LSTM tutorial looks very nice.
Are there any suggestions on how to use Captum for Transformer-based / BERT-like pre-trained contextualized word embeddings? If I want to see the attribution of each token in the word embedding layer, would I also need the FFN layer used for fine-tuning on downstream tasks in order to get the gradients? The current code is implemented with torchtext; I would really appreciate it if you could give some hints on how to integrate it with BERT models (e.g. huggingface/transformers).

Thank you.
