
Comments (4)

philipperemy commented on May 28, 2024
  • So, am I right in assuming that trainable_weights gradients are averaged over all the training examples, but the activation gradients are individually returned for each input sample?
    => Yes, you are right: the trainable_weights gradients are averaged over the minibatch. The two functions (trainable-weights grad and activation grad) operate on different nodes (activation nodes vs. trainable-parameter nodes); cf. below.
  • activation gradients are the gradients of the loss with respect to the outputs of the activation functions. Intuitively, they show how the loss changes under a tiny modification of your activation maps. They are not applied as weight updates in back-propagation because they are not gradients of the trainable weights: when you back-propagate you want dL/dW (W = trainable weight), whereas the activation gradients are dL/dA (A = output of a layer). See the sketch right after this list.
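A minimal sketch of the distinction (my addition, assuming TensorFlow 2.x and tf.GradientTape rather than keract's internals; the Dense layer and shapes are arbitrary):

import tensorflow as tf

dense = tf.keras.layers.Dense(4)
x = tf.random.uniform((8, 3))   # minibatch of 8 samples, 3 features each
y = tf.random.uniform((8, 4))

with tf.GradientTape() as tape:
    a = dense(x)                                 # A: the layer's output
    loss = tf.reduce_mean(tf.square(a - y))

# dL/dW: one tensor per trainable weight, already reduced over the batch.
# dL/dA: same shape as the activation, one slice per input sample.
dl_dw, dl_da = tape.gradient(loss, [dense.trainable_weights, a])
print([g.shape for g in dl_dw])  # [(3, 4), (4,)] -> no batch axis
print(dl_da.shape)               # (8, 4)         -> batch axis kept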

1 /

From https://www.tensorflow.org/api_docs/python/tf/nn/conv2d

The kernel of Conv2D has shape: [filter_height, filter_width, in_channels, out_channels]
In your case the kernel filter has a shape of (5, 5, 10, 16).

The gradients of this kernel will have exactly the same shape: (5, 5, 10, 16). That's what get_gradients_of_trainable_weights returns.

So yes, the gradients are averaged across the batch dimension (or summed, I don't remember). That's why it's better to use a large minibatch to avoid noisy gradients (or to accumulate gradients before applying a back-propagation step).

Note: The parameters also include a bias vector. You get both the kernel and the bias gradients.

TL;DR: get_gradients_of_trainable_weights returns the gradients used in the back-propagation (computed with the chain rule).
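To make the reduction over the batch concrete, here is a hedged sketch (my addition, plain TensorFlow 2.x, not keract code): with a mean loss, the weight gradient computed on the full minibatch equals the mean of the per-example weight gradients (a summed loss would give the sum instead).

import tensorflow as tf

dense = tf.keras.layers.Dense(2)
dense.build((None, 3))          # create kernel/bias up front

x = tf.random.uniform((4, 3))
y = tf.random.uniform((4, 2))

def kernel_grad(xb, yb):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(dense(xb) - yb))
    return tape.gradient(loss, dense.kernel)

batch_grad = kernel_grad(x, y)  # one tensor of shape (3, 2), no batch axis
per_example = [kernel_grad(x[i:i + 1], y[i:i + 1]) for i in range(4)]
mean_grad = tf.add_n(per_example) / 4.0

# Identical up to float error: the per-example information is lost.
print(tf.reduce_max(tf.abs(batch_grad - mean_grad)).numpy())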

2 /

Regarding get_gradients_of_activations, you get a gradient tensor of the same shape as the activation tensor. It therefore keeps the batch dimension and is attached to the output node of the conv layer. Keras gives that node this default name: BiasAdd (output of conv = right after the biases have been added).

TL;DR: get_gradients_of_activations returns gradients that are not applied to the weights in the back-propagation.
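For intuition, a minimal sketch (my addition, plain TensorFlow 2.x rather than keract) showing that the gradient of the loss with respect to the conv output keeps the batch dimension:

import tensorflow as tf

conv = tf.keras.layers.Conv2D(16, kernel_size=(5, 5), padding='same')
x = tf.random.uniform((2, 96, 96, 10))

with tf.GradientTape() as tape:
    a = conv(x)                          # the "BiasAdd" node: conv + bias
    loss = tf.reduce_mean(tf.square(a))  # dummy loss

print(tape.gradient(loss, a).shape)      # (2, 96, 96, 16): batch axis kept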

3 /

get_activations is the usual function to get the activations.

Output

********************************************************************************
1
get_activations - my_conv/BiasAdd:0 (1, 96, 96, 16)
get_gradients_of_activations - my_conv/BiasAdd:0 (1, 96, 96, 16)
get_gradients_of_trainable_weights - my_conv/kernel:0 (5, 5, 10, 16)
get_gradients_of_trainable_weights - my_conv/bias:0 (16,)
********************************************************************************
2
get_activations - my_conv/BiasAdd:0 (2, 96, 96, 16)
get_gradients_of_activations - my_conv/BiasAdd:0 (2, 96, 96, 16)
get_gradients_of_trainable_weights - my_conv/kernel:0 (5, 5, 10, 16)
get_gradients_of_trainable_weights - my_conv/bias:0 (16,)

Code

from __future__ import print_function

import keract
import numpy as np
from keras.layers import Conv2D
from keras.layers import Dense, Flatten
from keras.models import Sequential

if __name__ == '__main__':
    model = Sequential()
    # Conv kernel shape: (filter_h, filter_w, in_channels, out_channels) = (5, 5, 10, 16).
    model.add(Conv2D(16, kernel_size=(5, 5), padding='same', input_shape=(96, 96, 10), name='my_conv'))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='mse',
                  optimizer='adam',
                  metrics=['accuracy'])

    # Two random inputs: batch size 1 and batch size 2.
    x1 = np.random.uniform(size=(1, 96, 96, 10))
    x2 = np.random.uniform(size=(2, 96, 96, 10))

    y1 = np.random.uniform(size=(1,))
    y2 = np.random.uniform(size=(2,))


    def run(x, y):
        bs = len(x)  # batch size
        print(bs)
        # Activations and their gradients keep the batch dimension...
        print('get_activations - my_conv/BiasAdd:0',
              keract.get_activations(model, x)['my_conv/BiasAdd:0'].shape)

        print('get_gradients_of_activations - my_conv/BiasAdd:0',
              keract.get_gradients_of_activations(model, x, y)['my_conv/BiasAdd:0'].shape)

        # ...while the weight gradients have the weights' own shapes, with no batch axis.
        print('get_gradients_of_trainable_weights - my_conv/kernel:0',
              keract.get_gradients_of_trainable_weights(model, x, y)['my_conv/kernel:0'].shape)

        print('get_gradients_of_trainable_weights - my_conv/bias:0',
              keract.get_gradients_of_trainable_weights(model, x, y)['my_conv/bias:0'].shape)


    print('*' * 80)
    run(x1, y1)
    print('*' * 80)
    run(x2, y2)


philipperemy commented on May 28, 2024

If this explanation is good enough, I will add it to the README :)


Stochastic13 commented on May 28, 2024

Thanks for the explanation! It is good enough for me. :)

PS: I was thinking of a way to visualize these gradients_for_trainable_weights. One idea is to present out_channels rows, each made from the unraveled in_channels images of dimensions filter_h x filter_w. Personally, such a visualization would have come in handy for debugging vanishing gradients with relu. I can send a PR if this idea seems useful to you.
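A rough sketch of that layout (my addition, hypothetical code with matplotlib assumed; g stands in for the real (5, 5, 10, 16) kernel gradients):

import numpy as np
import matplotlib.pyplot as plt

g = np.random.randn(5, 5, 10, 16)   # stand-in for the kernel gradients
fh, fw, cin, cout = g.shape

# (fh, fw, cin, cout) -> (cout, fh, cin, fw) -> a (cout*fh, cin*fw) mosaic:
# one row of tiles per out_channel, one column of tiles per in_channel.
grid = g.transpose(3, 0, 2, 1).reshape(cout * fh, cin * fw)

plt.imshow(grid, cmap='coolwarm')
plt.xlabel('in_channels x filter_w')
plt.ylabel('out_channels x filter_h')
plt.colorbar()
plt.show()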


philipperemy commented on May 28, 2024

@Stochastic13 looks good to me. I'd be happy to review a PR on this. Will def be useful :)

