Comments (4)
- So, am I right in assuming that the trainable_weights gradients are averaged over all the training examples, while the activation gradients are returned individually for each input sample?
=> Yes, you are right: the trainable_weights gradients are averaged over the minibatch. The two functions (trainable_weights gradients and activation gradients) operate on different nodes (trainable parameter nodes vs. activation nodes), cf. below. - Activation gradients are the gradients of the loss with respect to the outputs of the activation functions. Intuitively, they show how the loss changes under a tiny modification of your activation maps. They are not used in the back-propagation update because they are not gradients with respect to the trainable weights: when you back-propagate you want dL/dW (W = trainable weight), whereas activation gradients are dL/dA (A = output of a layer).
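To make that distinction concrete, here is a minimal numpy sketch (illustrative, independent of keract; the layer and loss are made up) computing both kinds of gradients by hand for a single linear layer:

```python
import numpy as np

# Tiny linear "layer": A = X @ W, with loss L = mean(A ** 2).
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])        # inputs, shape (batch=2, in=2)
W = np.array([[0.5], [-1.0]])     # trainable weights, shape (in=2, out=1)
A = X @ W                         # activations, shape (2, 1)
n = A.size

# dL/dA: one value per activation entry -> keeps the batch dimension.
dL_dA = 2.0 * A / n               # shape (2, 1), same as A

# dL/dW: chain rule, summed over the batch -> same shape as W.
dL_dW = X.T @ dL_dA               # shape (2, 1), same as W

print(dL_dA.shape)  # like get_gradients_of_activations: matches A
print(dL_dW.shape)  # like get_gradients_of_trainable_weights: matches W
```

The shapes make the point: dL/dA has one entry per activation (so the batch dimension survives), while dL/dW collapses the batch and has exactly the shape of the weight tensor.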
1 /
From https://www.tensorflow.org/api_docs/python/tf/nn/conv2d:
The Conv2D weights have shape [filter_height, filter_width, in_channels, out_channels].
In your case the kernel filter has a shape of (5, 5, 10, 16).
The gradients of this kernel will have exactly the same shape: (5, 5, 10, 16). That's what get_gradients_of_trainable_weights returns.
So yes, the gradients are averaged out across the batch dimension (or summed, I don't remember). That's why it's better to have a large minibatch, to avoid noisy gradients (or to accumulate gradients before pushing a back-propagation step).
Note: The parameters also include a bias vector. You get both the kernel and the bias gradients.
TL;DR: get_gradients_of_trainable_weights returns the gradients used in the back-propagation (with the chain rule).
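The averaging claim can be checked directly. A quick numpy sketch (illustrative, independent of keract) shows that with a mean-reduced MSE loss, the minibatch weight gradient is exactly the mean of the per-sample gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))   # minibatch of 4 samples, 3 features
y = rng.normal(size=(4, 1))
w = rng.normal(size=(3, 1))   # stand-in for a trainable weight

# MSE with mean reduction: L = mean((X @ w - y) ** 2)
residual = X @ w - y

# Closed-form gradient over the whole minibatch, shape (3, 1) like w:
grad_batch = 2.0 * X.T @ residual / len(X)

# Per-sample gradients, then averaged:
per_sample = [2.0 * X[i:i + 1].T @ residual[i:i + 1] for i in range(len(X))]
grad_mean = np.mean(per_sample, axis=0)

print(np.allclose(grad_batch, grad_mean))  # True
```

With a sum-reduced loss the same identity holds with a sum instead of a mean, which is why gradient accumulation over several small minibatches reproduces a larger one.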
2 /
Regarding get_gradients_of_activations, you get a gradient matrix of the same shape as the activation matrix. It therefore includes the batch size and depends on the output node of the conv layer. Keras gives that node this default name: BiasAdd (output of the conv = right after the biases have been added).
TL;DR: get_gradients_of_activations is not used in the back-propagation.
3 /
get_activations is the usual function to get the activations.
Output
1
get_gradients_of_activations - my_conv/BiasAdd:0 (1, 96, 96, 16)
get_gradients_of_trainable_weights - my_conv/kernel:0 (5, 5, 10, 16)
get_gradients_of_trainable_weights - my_conv/bias:0 (16,)
********************************************************************************
2
get_activations - my_conv/BiasAdd:0 (2, 96, 96, 16)
get_gradients_of_activations - my_conv/BiasAdd:0 (2, 96, 96, 16)
get_gradients_of_trainable_weights - my_conv/kernel:0 (5, 5, 10, 16)
get_gradients_of_trainable_weights - my_conv/bias:0 (16,)
Code
from __future__ import print_function

import numpy as np
from keras.layers import Conv2D
from keras.layers import Dense, Flatten
from keras.models import Sequential

if __name__ == '__main__':
    model = Sequential()
    model.add(Conv2D(16, kernel_size=(5, 5), padding='same', input_shape=(96, 96, 10), name='my_conv'))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mse',
                  optimizer='adam',
                  metrics=['accuracy'])

    x1 = np.random.uniform(size=(1, 96, 96, 10))
    x2 = np.random.uniform(size=(2, 96, 96, 10))
    y1 = np.random.uniform(size=(1,))
    y2 = np.random.uniform(size=(2,))

    import keract

    def run(x, y):
        bs = len(x)
        print(bs)
        print('get_activations - my_conv/BiasAdd:0',
              keract.get_activations(model, x)['my_conv/BiasAdd:0'].shape)
        print('get_gradients_of_activations - my_conv/BiasAdd:0',
              keract.get_gradients_of_activations(model, x, y)['my_conv/BiasAdd:0'].shape)
        print('get_gradients_of_trainable_weights - my_conv/kernel:0',
              keract.get_gradients_of_trainable_weights(model, x, y)['my_conv/kernel:0'].shape)
        print('get_gradients_of_trainable_weights - my_conv/bias:0',
              keract.get_gradients_of_trainable_weights(model, x, y)['my_conv/bias:0'].shape)

    print('*' * 80)
    run(x1, y1)
    print('*' * 80)
    run(x2, y2)
If this explanation is good enough, I will add it in the README :)
Thanks for the explanation! It is good enough for me. :)
PS: I was thinking of a way to visualize these gradients_of_trainable_weights. One idea is to present out_channels rows, each made from in_channels unraveled images of dimensions filter_h x filter_w. Personally, such a visualization would have come in handy to debug vanishing gradients with relu. I can send a PR if this idea seems useful to you.
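A rough numpy sketch of that layout (the shapes are the ones from this thread; the gradients are faked with random values and plotting code is omitted): reshape the (filter_h, filter_w, in_channels, out_channels) gradient tensor into a mosaic with one row strip per output channel, each strip made of in_channels filter-sized tiles.

```python
import numpy as np

# Fake kernel gradients with the shape discussed above: (5, 5, 10, 16)
grads = np.random.uniform(size=(5, 5, 10, 16))
fh, fw, in_ch, out_ch = grads.shape

# Build a (out_ch * fh, in_ch * fw) mosaic: one row strip per output
# channel, each strip made of in_ch tiles of shape (fh, fw).
mosaic = (grads.transpose(3, 2, 0, 1)   # (out_ch, in_ch, fh, fw)
               .transpose(0, 2, 1, 3)   # (out_ch, fh, in_ch, fw)
               .reshape(out_ch * fh, in_ch * fw))

print(mosaic.shape)  # (80, 50): 16 strips of height 5, 10 tiles of width 5
```

The 2-D mosaic can then be fed straight to an image plot (e.g. matplotlib's imshow) to eyeball whether whole strips have gone to zero, which is the vanishing-gradient symptom mentioned above.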
@Stochastic13 looks good to me. I'd be happy to review a PR on this. Will def be useful :)