
Comments (4)

philipperemy commented on May 28, 2024
  • So, am I right in assuming that trainable_weights gradients are averaged over all the training examples, but the activation gradients are individually returned for each input sample?
    => Yes, you are right: the trainable_weights gradients are averaged over the minibatch. The two functions (trainable-weights grad and activation grad) operate on different nodes (activation nodes vs. trainable-parameter nodes); cf. below.
  • activation gradients are the gradients of the loss with respect to the outputs of the activation functions. Intuitively, they show how the loss changes under a tiny modification of your activation maps. They are not applied as weight updates in back-propagation because they are not gradients of the trainable weights: when you back-propagate you want dL/dW (W = trainable weight), whereas the activation gradients are dL/dA (A = output of a layer). See the sketch right after this list.
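A minimal sketch of the distinction (my addition, assuming TensorFlow 2.x and tf.GradientTape rather than keract's internals; the Dense layer and shapes are arbitrary):

import tensorflow as tf

dense = tf.keras.layers.Dense(4)
x = tf.random.uniform((8, 3))   # minibatch of 8 samples, 3 features each
y = tf.random.uniform((8, 4))

with tf.GradientTape() as tape:
    a = dense(x)                                 # A: the layer's output
    loss = tf.reduce_mean(tf.square(a - y))

# dL/dW: one tensor per trainable weight, already reduced over the batch.
# dL/dA: same shape as the activation, one slice per input sample.
dl_dw, dl_da = tape.gradient(loss, [dense.trainable_weights, a])
print([g.shape for g in dl_dw])  # [(3, 4), (4,)] -> no batch axis
print(dl_da.shape)               # (8, 4)         -> batch axis kept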

1 /

From https://www.tensorflow.org/api_docs/python/tf/nn/conv2d

The kernel of Conv2D has shape: [filter_height, filter_width, in_channels, out_channels]
In your case the kernel filter has a shape of (5, 5, 10, 16).

The gradients of this kernel will have exactly the same shape: (5, 5, 10, 16). That's what get_gradients_of_trainable_weights returns.

So yes, the gradients are averaged across the batch dimension (or summed, I don't remember). That's why it's better to use a large minibatch to avoid noisy gradients (or to accumulate gradients before applying a back-propagation step).

Note: The parameters also include a bias vector. You get both the kernel and the bias gradients.

TL;DR: get_gradients_of_trainable_weights returns the gradients used in the back-propagation (computed with the chain rule).
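To make the reduction over the batch concrete, here is a hedged sketch (my addition, plain TensorFlow 2.x, not keract code): with a mean loss, the weight gradient computed on the full minibatch equals the mean of the per-example weight gradients (a summed loss would give the sum instead).

import tensorflow as tf

dense = tf.keras.layers.Dense(2)
dense.build((None, 3))          # create kernel/bias up front

x = tf.random.uniform((4, 3))
y = tf.random.uniform((4, 2))

def kernel_grad(xb, yb):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(dense(xb) - yb))
    return tape.gradient(loss, dense.kernel)

batch_grad = kernel_grad(x, y)  # one tensor of shape (3, 2), no batch axis
per_example = [kernel_grad(x[i:i + 1], y[i:i + 1]) for i in range(4)]
mean_grad = tf.add_n(per_example) / 4.0

# Identical up to float error: the per-example information is lost.
print(tf.reduce_max(tf.abs(batch_grad - mean_grad)).numpy())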

2 /

Regarding get_gradients_of_activations, you get a gradient tensor of the same shape as the activation tensor. It therefore keeps the batch dimension and is attached to the output node of the conv layer. Keras gives that node this default name: BiasAdd (output of conv = right after the biases have been added).

TL;DR: get_gradients_of_activations returns gradients that are not applied to the weights in the back-propagation.
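For intuition, a minimal sketch (my addition, plain TensorFlow 2.x rather than keract) showing that the gradient of the loss with respect to the conv output keeps the batch dimension:

import tensorflow as tf

conv = tf.keras.layers.Conv2D(16, kernel_size=(5, 5), padding='same')
x = tf.random.uniform((2, 96, 96, 10))

with tf.GradientTape() as tape:
    a = conv(x)                          # the "BiasAdd" node: conv + bias
    loss = tf.reduce_mean(tf.square(a))  # dummy loss

print(tape.gradient(loss, a).shape)      # (2, 96, 96, 16): batch axis kept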

3 /

get_activations is the usual function to get the activations.

Output

********************************************************************************
1
get_activations - my_conv/BiasAdd:0 (1, 96, 96, 16)
get_gradients_of_activations - my_conv/BiasAdd:0 (1, 96, 96, 16)
get_gradients_of_trainable_weights - my_conv/kernel:0 (5, 5, 10, 16)
get_gradients_of_trainable_weights - my_conv/bias:0 (16,)
********************************************************************************
2
get_activations - my_conv/BiasAdd:0 (2, 96, 96, 16)
get_gradients_of_activations - my_conv/BiasAdd:0 (2, 96, 96, 16)
get_gradients_of_trainable_weights - my_conv/kernel:0 (5, 5, 10, 16)
get_gradients_of_trainable_weights - my_conv/bias:0 (16,)

Code

from __future__ import print_function

import keract
import numpy as np
from keras.layers import Conv2D
from keras.layers import Dense, Flatten
from keras.models import Sequential

if __name__ == '__main__':
    model = Sequential()
    # Conv kernel shape: (filter_h, filter_w, in_channels, out_channels) = (5, 5, 10, 16).
    model.add(Conv2D(16, kernel_size=(5, 5), padding='same', input_shape=(96, 96, 10), name='my_conv'))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='mse',
                  optimizer='adam',
                  metrics=['accuracy'])

    # Two random inputs: batch size 1 and batch size 2.
    x1 = np.random.uniform(size=(1, 96, 96, 10))
    x2 = np.random.uniform(size=(2, 96, 96, 10))

    y1 = np.random.uniform(size=(1,))
    y2 = np.random.uniform(size=(2,))


    def run(x, y):
        bs = len(x)  # batch size
        print(bs)
        # Activations and their gradients keep the batch dimension...
        print('get_activations - my_conv/BiasAdd:0',
              keract.get_activations(model, x)['my_conv/BiasAdd:0'].shape)

        print('get_gradients_of_activations - my_conv/BiasAdd:0',
              keract.get_gradients_of_activations(model, x, y)['my_conv/BiasAdd:0'].shape)

        # ...while the weight gradients have the weights' own shapes, with no batch axis.
        print('get_gradients_of_trainable_weights - my_conv/kernel:0',
              keract.get_gradients_of_trainable_weights(model, x, y)['my_conv/kernel:0'].shape)

        print('get_gradients_of_trainable_weights - my_conv/bias:0',
              keract.get_gradients_of_trainable_weights(model, x, y)['my_conv/bias:0'].shape)


    print('*' * 80)
    run(x1, y1)
    print('*' * 80)
    run(x2, y2)


philipperemy commented on May 28, 2024

If this explanation is good enough, I will add it to the README :)


Stochastic13 commented on May 28, 2024

Thanks for the explanation! It is good enough for me. :)

PS: I was thinking of a way to visualize these gradients_for_trainable_weights. One idea is to present out_channels rows, each made from the unraveled in_channels images of dimensions filter_h x filter_w. Personally, such a visualization would have come in handy for debugging vanishing gradients with relu. I can send a PR if this idea seems useful to you.
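A rough sketch of that layout (my addition, hypothetical code with matplotlib assumed; g stands in for the real (5, 5, 10, 16) kernel gradients):

import numpy as np
import matplotlib.pyplot as plt

g = np.random.randn(5, 5, 10, 16)   # stand-in for the kernel gradients
fh, fw, cin, cout = g.shape

# (fh, fw, cin, cout) -> (cout, fh, cin, fw) -> a (cout*fh, cin*fw) mosaic:
# one row of tiles per out_channel, one column of tiles per in_channel.
grid = g.transpose(3, 0, 2, 1).reshape(cout * fh, cin * fw)

plt.imshow(grid, cmap='coolwarm')
plt.xlabel('in_channels x filter_w')
plt.ylabel('out_channels x filter_h')
plt.colorbar()
plt.show()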


philipperemy commented on May 28, 2024

@Stochastic13 looks good to me. I'd be happy to review a PR on this. Will def be useful :)

