overlordgolddragon / see-rnn



RNN and general weights, gradients, & activations visualization in Keras & TensorFlow

License: MIT License

Python 99.90% Shell 0.10%

rnn tensorflow keras visualization deep-learning lstm gru

see-rnn's Introduction

See RNN

RNN weights, gradients, & activations visualization in Keras & TensorFlow (LSTM, GRU, SimpleRNN, CuDNN, & all others)

Features

Weights, gradients, activations visualization
Kernel visuals: kernel, recurrent kernel, and bias shown explicitly
Gate visuals: gates in gated architectures (LSTM, GRU) shown explicitly
Channel visuals: cell units (feature extractors) shown explicitly
General visuals: methods also applicable to CNNs & others
Weight norm tracking: useful for analyzing weight decay
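A rough sketch of what weight norm tracking involves, written in plain NumPy rather than with see-rnn's own API: record the L2 norm of each weight array (kernel, recurrent kernel, bias) once per epoch and watch the trend. The toy weights and the multiplicative `decay` loop are hypothetical stand-ins for real training with a weight-decay scheme.

```python
import numpy as np

def weight_norms(weights):
    """L2 (Frobenius) norm of each weight array, e.g. kernel, recurrent kernel, bias."""
    return [float(np.linalg.norm(w)) for w in weights]

rng = np.random.default_rng(0)
# Hypothetical stand-ins for one LSTM layer's weights (6 units, 8 input features)
weights = [rng.normal(size=(8, 24)), rng.normal(size=(6, 24)), np.zeros(24)]

decay = 0.1  # hypothetical per-epoch shrink factor, mimicking pure weight decay
history = []
for epoch in range(5):
    weights = [w * (1 - decay) for w in weights]  # every weight shrinks by 10%
    history.append(weight_norms(weights))

history = np.array(history)  # shape (epochs, n_weight_arrays)
print(history.round(3))
```

A steadily shrinking column with no compensating gradient signal is the kind of pattern that suggests excessive decay.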

Why use?

Introspection is a powerful tool for debugging, regularizing, and understanding neural networks; this repo's methods enable:

Monitoring weights & activations progression - how each changes epoch-to-epoch, iteration-to-iteration
Evaluating learning effectiveness - how well gradient backpropagates layer-to-layer, timestep-to-timestep
Assessing layer health - what percentage of neurons are "dead" or "exploding"
Tracking weight decay - how various schemes (e.g. l2 penalty) affect weight norms
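To make "percentage of dead or exploding neurons" concrete, here is a minimal NumPy sketch, assuming activations shaped (samples, timesteps, units). The helper name `layer_health` and both thresholds are illustrative choices, not see-rnn parameters.

```python
import numpy as np

def layer_health(acts, dead_tol=1e-6, explode_tol=1e3):
    """Return (% dead units, % exploding units) for activations shaped (samples, timesteps, units).

    dead: |activation| never exceeds dead_tol; exploding: |activation| reaches explode_tol."""
    mags = np.abs(acts).reshape(-1, acts.shape[-1])   # one column per unit
    dead = np.all(mags <= dead_tol, axis=0)
    exploding = np.any(mags >= explode_tol, axis=0)
    return float(100 * dead.mean()), float(100 * exploding.mean())

rng = np.random.default_rng(0)
acts = np.maximum(rng.normal(size=(16, 100, 4)), 0)  # ReLU-like activations, 4 units
acts[..., 0] = 0.0    # unit 0 never fires -> dead
acts[0, 5, 1] = 1e4   # unit 1 has one huge value -> exploding
print(layer_health(acts))  # (25.0, 25.0)
```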

It enables answering questions such as:

Is my RNN learning long-term dependencies? >> Monitor gradients: if a non-zero gradient flows through every timestep, then every timestep contributes to learning - i.e., resultant gradients stem from accounting for every input timestep, so the entire sequence influences weight updates. Hence, the RNN no longer ignores portions of long sequences and is forced to learn from them.
Is my RNN learning independent representations? >> Monitor activations: if each channel's outputs are distinct and decorrelated, then the RNN extracts richly diverse features.
Why do I have validation loss spikes? >> Monitor all: validation spikes may stem from sharp changes in layer weights due to large gradients, which will visibly alter activation patterns; seeing the details can help inform a correction.
Is my weight decay excessive or insufficient? >> Monitor weight norms: if norms drop to a small fraction of their usual values, decay might be excessive; if no effect is seen, increase decay.
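As one hedged way to quantify "distinct and decorrelated" channel outputs, the sketch below averages the absolute Pearson correlation between every pair of channels in one sample's outputs, shaped (timesteps, channels). The metric and the helper name are our assumptions, not part of see-rnn.

```python
import numpy as np

def mean_abs_channel_corr(outs):
    """Mean absolute off-diagonal Pearson correlation between channels.

    outs: (timesteps, channels). Near 0 -> diverse channels; near 1 -> redundant ones."""
    corr = np.corrcoef(outs, rowvar=False)              # (channels, channels)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]  # drop the trivial self-correlations
    return float(np.mean(np.abs(off_diag)))

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 100)
diverse = np.stack([np.sin(t), rng.normal(size=100), np.cos(3 * t)], axis=1)
redundant = np.stack([np.sin(t), 2 * np.sin(t), np.sin(t) + 0.01], axis=1)
print(mean_abs_channel_corr(diverse), mean_abs_channel_corr(redundant))
```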

For further info on potential uses, see this Stack Overflow post.

Installation

pip install see-rnn. Or, for the latest version (most likely stable):

pip install git+https://github.com/OverLordGoldDragon/see-rnn

To-do

Will possibly implement:

Weight norm inspection (all layers); see here
Pytorch support
Interpretability visuals (e.g. saliency maps, adversarial attacks)
Tools for better probing backprop of return_sequences=False
Unify _id and layer? Need duplicates resolution scheme

Examples

# for all examples
grads = get_gradients(model, 1, x, y)  # return_sequences=True,  layer index 1
grads = get_gradients(model, 2, x, y)  # return_sequences=False, layer index 2
outs  = get_outputs(model, 1, x)       # return_sequences=True,  layer index 1
# all examples use timesteps=100
# NOTE: `title_mode` kwarg below was omitted for simplicity; for Gradient visuals, would set to 'grads'

EX 1: bi-LSTM, 32 units - activations, activation='relu'
features_1D(outs[:1], share_xy=False)
features_1D(outs[:1], share_xy=True, y_zero=True)

Each subplot is an independent RNN channel's output (return_sequences=True)
In this example, each channel/filter appears to extract complex independent features of varying bias, frequency, and probabilistic distribution
Note that share_xy=False better pronounces features' shape, whereas share_xy=True allows for an even comparison - but may greatly 'shrink' waveforms to appear flatlined (not shown here)

EX 2: one sample, uni-LSTM, 6 units - gradients, return_sequences=True, trained for 20 iterations
features_1D(grads[:1], n_rows=2)

Note: gradients are to be read right-to-left, as they're computed (from last timestep to first)
Rightmost (latest) timesteps consistently have a higher gradient
Vanishing gradient: ~75% of leftmost timesteps have a zero gradient, indicating poor time dependency learning
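The "fraction of leftmost timesteps with zero gradient" reading can be computed directly. A minimal sketch, assuming gradients shaped (samples, timesteps, channels) as obtained for a return_sequences=True layer; the tolerance `tol` and the helper name are hypothetical. The toy gradients shrink by a factor of 0.7 per step going backward from the last timestep, mimicking a vanishing gradient.

```python
import numpy as np

def zero_grad_fraction(grads, tol=1e-8):
    """Per-sample fraction of timesteps whose gradient is ~zero in every channel.

    grads: (samples, timesteps, channels)."""
    near_zero = np.all(np.abs(grads) <= tol, axis=-1)   # (samples, timesteps)
    return near_zero.mean(axis=1)

# Toy gradients that shrink by 0.7x per step, reading right-to-left from the last timestep
decay = 0.7 ** np.arange(99, -1, -1)             # timestep t gets 0.7**(99 - t)
grads = np.ones((16, 100, 6)) * decay[None, :, None]
print(zero_grad_fraction(grads)[0])              # 0.48: the 48 leftmost timesteps are below tol
```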

EX 3: all (16) samples, uni-LSTM, 6 units -- return_sequences=True, trained for 20 iterations
features_1D(grads, n_rows=2)
features_2D(grads, n_rows=4, norm=(-.01, .01))

Each sample shown in a different color (but same color per sample across channels)
Some samples perform better than the one shown above, but not by much
The heatmap plots channels (y-axis) vs. timesteps (x-axis); blue=-0.01, red=0.01, white=0 (gradient values)
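The `norm=(-.01, .01)` mapping described above can be sketched as clip-then-rescale: values are clipped to the range and mapped linearly to [0, 1], so -0.01 lands at the blue end, 0 at the white midpoint, and 0.01 at the red end. This is an assumption about how such a diverging colormap is typically driven, not a copy of see-rnn's code.

```python
import numpy as np

def to_colormap_scale(values, norm=(-.01, .01)):
    """Clip values to norm=(lo, hi), then map linearly to [0, 1] (lo -> 0, midpoint -> 0.5, hi -> 1)."""
    lo, hi = norm
    clipped = np.clip(values, lo, hi)   # out-of-range gradients saturate at the colormap ends
    return (clipped - lo) / (hi - lo)

print(to_colormap_scale(np.array([-0.05, -0.01, 0.0, 0.005, 0.02])))
```

If plotting with matplotlib directly, matplotlib.colors.Normalize(vmin=-.01, vmax=.01, clip=True) performs the same linear mapping.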

EX 4: all (16) samples, uni-LSTM, 6 units -- return_sequences=True, trained for 200 iterations
features_1D(grads, n_rows=2)
features_2D(grads, n_rows=4, norm=(-.01, .01))

Both plots show the LSTM performing clearly better after 180 additional iterations
Gradient still vanishes for about half the timesteps
All LSTM units better capture time dependencies of one particular sample (blue curve, first plot) - which we can tell from the heatmap to be the first sample. We can plot that sample vs. other samples to try to understand the difference

EX 5: 2D vs. 1D, uni-LSTM: 256 units, return_sequences=True, trained for 200 iterations
features_1D(grads[0, :, :])
features_2D(grads[:, :, 0], norm=(-.0001, .0001))

2D is better suited for comparing many channels across few samples
1D is better suited for comparing many samples across a few channels
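The two slicings in EX 5 select different 2D views of the same (samples, timesteps, channels) array. A quick shape check on a placeholder array (hypothetical zeros: 16 samples, 100 timesteps, 256 channels) shows which axes each one keeps:

```python
import numpy as np

grads = np.zeros((16, 100, 256))  # hypothetical: (samples, timesteps, channels)

one_sample = grads[0, :, :]    # sample 0, all channels -> (timesteps, channels)
one_channel = grads[:, :, 0]   # channel 0, all samples -> (samples, timesteps)
print(one_sample.shape, one_channel.shape)  # (100, 256) (16, 100)
```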

EX 6: bi-GRU, 256 units (512 total) -- return_sequences=True, trained for 400 iterations
features_2D(grads[0], norm=(-.0001, .0001), reflect_half=True)

Backward layer's gradients are flipped for consistency w.r.t. time axis
Plot reveals a lesser-known advantage of Bi-RNNs - information utility: the collective gradient covers about twice the data. However, this isn't a free lunch: each layer is an independent feature extractor, so learning isn't really complemented
Lower norm for more units is expected, as approx. the same loss-derived gradient is being distributed across more parameters (hence the squared numeric average is less)
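A minimal sketch of the time-axis flip, assuming one sample's bidirectional gradients are shaped (timesteps, 2 * units) and ordered [forward | backward] along the last axis, as with Keras' default merge_mode='concat'. Whether reflect_half=True does exactly this internally is an assumption; the helper name is hypothetical.

```python
import numpy as np

def reflect_backward_half(grads):
    """Reverse the time axis for the backward half of the channels only.

    grads: (timesteps, 2 * units), assumed ordered [forward | backward]."""
    units = grads.shape[-1] // 2
    out = grads.copy()
    out[:, units:] = grads[::-1, units:]   # forward channels untouched, backward ones flipped
    return out

g = np.arange(8).reshape(4, 2)  # 4 timesteps, 1 forward + 1 backward unit
print(reflect_backward_half(g))
```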

EX 7: 0D, all (16) samples, uni-LSTM, 6 units -- return_sequences=False, trained for 200 iterations
features_0D(grads)

return_sequences=False utilizes only the last timestep's gradient (which is still derived from all timesteps, unless using truncated BPTT), requiring a new approach
Plot color-codes each RNN unit consistently across samples for comparison (can use one color instead)
Evaluating gradient flow is less direct and more theoretically involved. One simple approach is to compare distributions at beginning vs. later in training: if the difference isn't significant, the RNN does poorly in learning long-term dependencies
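One way to decide whether early vs. late distributions differ "significantly" is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs. The choice of test is ours, not the README's; the NumPy-only sketch below uses hypothetical (samples, units) last-timestep gradients. scipy.stats.ks_2samp computes the same statistic along with a p-value.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    a, b = np.sort(np.ravel(a)), np.sort(np.ravel(b))
    grid = np.concatenate([a, b])                        # evaluate both ECDFs at every data point
    cdf_a = np.searchsorted(a, grid, side='right') / a.size
    cdf_b = np.searchsorted(b, grid, side='right') / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(0)
grads_early = rng.normal(0, 1e-3, size=(16, 6))  # hypothetical gradients early in training
grads_late = rng.normal(0, 5e-3, size=(16, 6))   # hypothetical: wider spread later in training
print(ks_statistic(grads_early, grads_early))    # 0.0 for identical samples
print(ks_statistic(grads_early, grads_late))     # larger value -> distributions differ more
```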

EX 8: LSTM vs. GRU vs. SimpleRNN, unidir, 256 units -- return_sequences=True, trained for 250 iterations
features_2D(grads, n_rows=8, norm=(-.0001, .0001), show_xy_ticks=[0,0], title_mode=False)

Note: the comparison isn't very meaningful; each network thrives w/ different hyperparameters, whereas the same ones were used for all. LSTM, for one, bears the most parameters per unit, drowning out SimpleRNN
In this setup, LSTM definitively stomps GRU and SimpleRNN

EX 9: uni-LSTM, 256 units, weights -- batch_shape = (16, 100, 20) (input)
rnn_histogram(model, 'lstm', equate_axes=False, bias=False)
rnn_histogram(model, 'lstm', equate_axes=True, bias=False)
rnn_heatmap(model, 'lstm')

Top plot is a histogram subplot grid, showing weight distributions per kernel, and within each kernel, per gate
Second plot sets equate_axes=True for an even comparison across kernels and gates, improving quality of comparison, but potentially degrading visual appeal
Last plot is a heatmap of the same weights, with gate separations marked by vertical lines, and bias weights also included
Unlike histograms, the heatmap preserves channel/context information: input-to-hidden and hidden-to-hidden transforming matrices can be clearly distinguished
Note the large concentration of maximal values at the Forget gate; as trivia, in Keras (and usually), bias gates are all initialized to zeros, except the Forget bias, which is initialized to ones
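To see the per-gate split behind these plots, the sketch below slices stand-in arrays with Keras LSTM weight shapes (kernel: (input_dim, 4 * units), recurrent kernel: (units, 4 * units), bias: (4 * units,)) in Keras' gate order (input, forget, cell, output). The bias mimics the forget-bias-of-ones initialization noted above (Keras' unit_forget_bias=True default); the kernel values are random placeholders.

```python
import numpy as np

units, input_dim = 4, 3
GATES = ('input', 'forget', 'cell', 'output')  # Keras LSTM gate order along the last axis

rng = np.random.default_rng(0)
kernel = rng.normal(size=(input_dim, 4 * units))         # input-to-hidden (placeholder values)
recurrent_kernel = rng.normal(size=(units, 4 * units))  # hidden-to-hidden (placeholder values)
# unit_forget_bias=True: zeros everywhere except ones for the forget gate
bias = np.concatenate([np.zeros(units), np.ones(units), np.zeros(2 * units)])

per_gate = {
    name: (k, rk, b)
    for name, k, rk, b in zip(GATES,
                              np.split(kernel, 4, axis=-1),
                              np.split(recurrent_kernel, 4, axis=-1),
                              np.split(bias, 4))
}
for name, (k, rk, b) in per_gate.items():
    print(f"{name:>6}: kernel {k.shape}, recurrent {rk.shape}, bias mean {b.mean():.1f}")
```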

EX 10: bi-CuDNNLSTM, 256 units, weights -- batch_shape = (16, 100, 16) (input)
rnn_histogram(model, 'bidir', equate_axes=2)
rnn_heatmap(model, 'bidir', norm=(-.8, .8))

Bidirectional is supported by both; biases included in this example for histograms
Note again the bias heatmaps; they no longer appear to reside in the same locality as in EX 9. Indeed, CuDNNLSTM (and CuDNNGRU) biases are defined and initialized differently - something that can't be inferred from histograms

EX 11: uni-CuDNNGRU, 64 units, weight gradients -- batch_shape = (16, 100, 16) (input)
rnn_heatmap(model, 'gru', mode='grads', input_data=x, labels=y, cmap=None, absolute_value=True)

We may wish to visualize gradient intensity, which can be done via absolute_value=True and a greyscale colormap
Gate separations are apparent even without explicit separating lines in this example:
- New is the most active kernel gate (input-to-hidden), suggesting more error correction on permitting information flow
- Reset is the least active recurrent gate (hidden-to-hidden), suggesting least error correction on memory-keeping
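A hedged sketch of the gate-wise intensity reading: split an absolute-valued GRU kernel gradient into three equal blocks and average each. It assumes Keras' GRU block order (update, reset, new) along the last axis; the gradient values are synthetic, with the 'new' block deliberately scaled up for illustration.

```python
import numpy as np

GRU_GATES = ('update', 'reset', 'new')  # assumed Keras GRU ordering along the last axis

def gate_intensity(grad_matrix, gates=GRU_GATES):
    """Mean absolute gradient per gate for a (rows, n_gates * units) weight-gradient matrix."""
    chunks = np.split(np.abs(grad_matrix), len(gates), axis=-1)
    return {g: float(c.mean()) for g, c in zip(gates, chunks)}

rng = np.random.default_rng(0)
units = 64
# Synthetic kernel gradients (16 input features); the 'new' block is 10x larger on purpose
kernel_grads = rng.normal(size=(16, 3 * units)) * np.repeat([1e-4, 1e-4, 1e-3], units)
intensity = gate_intensity(kernel_grads)
print(max(intensity, key=intensity.get))  # new
```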

EX 12: NaN detection: LSTM, 512 units, weights -- batch_shape = (16, 100, 16) (input)

Both the heatmap and the histogram come with built-in NaN detection - kernel-, gate-, and direction-wise
Heatmap will print NaNs to console, whereas histogram will mark them directly on the plot
Both will set NaN values to zero before plotting; in example below, all related non-NaN weights were already zero
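The gate-wise NaN bookkeeping can be sketched in NumPy as: count NaNs in each gate block, then zero them out for plotting. The helper name and toy array are hypothetical; note that np.nan_to_num also replaces infinities with large finite numbers.

```python
import numpy as np

def nan_report(weights, gate_names):
    """Count NaNs per gate block, and return a copy with NaNs set to zero for plotting."""
    chunks = np.split(weights, len(gate_names), axis=-1)
    counts = {g: int(np.isnan(c).sum()) for g, c in zip(gate_names, chunks)}
    return counts, np.nan_to_num(weights, nan=0.0)

w = np.ones((2, 8))   # hypothetical kernel: 2 inputs, 4 gates x 2 units
w[0, 3] = np.nan      # a NaN in the 'forget' block (columns 2-3)
counts, cleaned = nan_report(w, ('input', 'forget', 'cell', 'output'))
print(counts)  # {'input': 0, 'forget': 1, 'cell': 0, 'output': 0}
```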

EX 13: Sparse Conv1D autoencoder weights -- w = layer.get_weights()[0]; w.shape == (16, 64, 128)
title = "((Layer Channels vs. Kernels) vs. Weights) vs. Input Channels -- norm = (-0.1, 0.1)"
features_2D(w, n_rows=16, norm=(-.1, .1), tight=True, borderwidth=1, title_mode=title)

One of the stacked Conv1D sparse autoencoder layers; the network was trained with Dropout(0.5, noise_shape=(batch_size, 1, channels)) (Spatial Dropout), encouraging sparse features which may benefit classification
Weights are seen to be 'sparse'; some are uniformly low, others uniformly large, others have bands of large weights among lows
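Dropout(0.5, noise_shape=(batch_size, 1, channels)) drops whole channels: the mask has size 1 on the time axis, so it broadcasts across every timestep. A NumPy sketch of that masking, using inverted-dropout scaling by 1 / (1 - rate) as Keras does during training; the function name is hypothetical.

```python
import numpy as np

def spatial_dropout(x, rate, rng):
    """Drop entire channels of x (batch, timesteps, channels), one mask per (sample, channel)."""
    batch, _, channels = x.shape
    keep = rng.random((batch, 1, channels)) >= rate   # noise_shape=(batch, 1, channels)
    return x * keep / (1.0 - rate)                    # broadcast over timesteps; rescale kept channels

rng = np.random.default_rng(0)
x = np.ones((4, 10, 8))
y = spatial_dropout(x, 0.5, rng)
# Each channel is either zeroed or kept (scaled to 2.0) across all 10 timesteps
print(np.unique(y))
```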

Usage

QUICKSTART: run sandbox.py, which includes all major examples and allows easy exploration of various plot configs.

Note: if using tensorflow.keras imports, set import os; os.environ["TF_KERAS"]='1'. Minimal example below.

visuals_gen.py functions can also be used to visualize Conv1D activations, gradients, or any other meaningfully-compatible data formats. Likewise, inspect_gen.py also works for non-RNN layers.

import numpy as np
from keras.layers import Input, LSTM
from keras.models import Model
from keras.optimizers import Adam
from see_rnn import get_gradients, features_0D, features_1D, features_2D

def make_model(rnn_layer, batch_shape, units):
    ipt = Input(batch_shape=batch_shape)
    x   = rnn_layer(units, activation='tanh', return_sequences=True)(ipt)
    out = rnn_layer(units, activation='tanh', return_sequences=False)(x)
    model = Model(ipt, out)
    model.compile(Adam(4e-3), 'mse')
    return model

def make_data(batch_shape):
    return np.random.randn(*batch_shape), \
           np.random.uniform(-1, 1, (batch_shape[0], units))

def train_model(model, iterations, batch_shape):
    x, y = make_data(batch_shape)
    for i in range(iterations):
        model.train_on_batch(x, y)
        print(end='.')  # progbar
        if i % 40 == 0:
            x, y = make_data(batch_shape)

units = 6
batch_shape = (16, 100, 2*units)

model = make_model(LSTM, batch_shape, units)
train_model(model, 300, batch_shape)

x, y  = make_data(batch_shape)
grads_all  = get_gradients(model, 1, x, y)  # return_sequences=True,  layer index 1
grads_last = get_gradients(model, 2, x, y)  # return_sequences=False, layer index 2

features_1D(grads_all, n_rows=2, show_xy_ticks=[1,1])
features_2D(grads_all, n_rows=8, show_xy_ticks=[1,1], norm=(-.01, .01))
features_0D(grads_last)

How to cite

Short form:

John Muradeli, see-rnn, 2019. GitHub repository, https://github.com/OverLordGoldDragon/see-rnn/. DOI: 10.5281/zenodo.5080359

BibTeX:

@article{OverLordGoldDragon2019see-rnn,
  title={See RNN},
  author={John Muradeli},
  journal={GitHub. Note: https://github.com/OverLordGoldDragon/see-rnn/},
  year={2019},
  doi={10.5281/zenodo.5080359},
}

see-rnn's Issues

Installation support needed

First of all, many thanks for this repo. I have been attempting to extract the partial derivatives dc_t/dc_0 in LSTM training, but this quantity quickly vanishes as t increases. see_rnn is apparently a great alternative for visualising the vanishing gradients in RNN-class models, and I would love to try it out.

Installation with pip install git+https://github.com/OverLordGoldDragon/see-rnn produces the following error:

ERROR: Command errored out with exit status 1:
   command: git version
       cwd: None
  Complete output (2 lines):
  xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun

I have used pip install see_rnn instead; however, when I tried the example under the Usage section, I received an error message saying it cannot import get_gradients.
In fact, none of the functions in the command from see_rnn import get_gradients, features_0D, features_1D, features_2D can be found in the installed package or here. I would like some help with this part. Thanks in advance!

Comparing NN

I apologise; I know this isn't how one contacts developers. If you have time, I need to discuss RNNs with you in a private session. Could you please provide your e-mail or any app through which I can talk with you? Thank you so much

An idea/suggestion on the gradient used

I would like to make an inquiry about the form of the gradients computed in the get_gradients function. It seems that you have computed dL_t/d_W directly, where W is a parameter. While the loss gradient w.r.t. a weight is simple enough in feedforward NNs, in RNNs the same weight is shared across all time steps, so each dL_t/d_W is actually a summation of partial derivative products of lengths 1, 2, ..., t respectively. Please see this tutorial for the actual form, in particular results (5) and (6).

Those longer partial derivative products correspond to the backpropagated signals over longer temporal dependencies. If these longer ones vanish (and they are prone to vanishing), then the weights are updated in a way that 'cannot' retain earlier information.

Therefore, it occurs to me that dL_t/d_W staying away from 0 does not seem to guarantee that vanishing gradients did not take place: the shorter, more vanishing-resistant partial derivative products may be what keeps the magnitude of dL_t/d_W away from 0.
A more direct indicator could be dh_t/dh_1, or dh_t/dh_0, where h_t is the hidden state at step t. Both are products of result (6) multiplied over the time steps. If, say, starting from t = 100 such a quantity vanishes, then we can claim the model is unable to retain information for more than 100 steps.
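For a vanilla tanh RNN, h_t = tanh(W_hh h_{t-1} + W_xh x_t), the proposed statistic can be sketched in NumPy: each step's Jacobian is diag(1 - h_t^2) W_hh, and dh_t/dh_0 is their product. This illustrates that simplified model only (an LSTM's cell-state path adds terms) and is not a see-rnn feature; the toy weights are random, with W_hh rescaled to spectral norm 0.9 so the decay is visible.

```python
import numpy as np

def jacobian_norms(W_hh, W_xh, xs, h0):
    """Spectral norm of dh_t/dh_0 at every step t for h_t = tanh(W_hh @ h_{t-1} + W_xh @ x_t)."""
    h, J = h0, np.eye(h0.size)              # J starts as dh_0/dh_0 = I
    norms = []
    for x in xs:
        h = np.tanh(W_hh @ h + W_xh @ x)
        step = (1 - h**2)[:, None] * W_hh   # dh_t/dh_{t-1} = diag(1 - h_t^2) @ W_hh
        J = step @ J                        # chain rule: dh_t/dh_0 = step_t @ dh_{t-1}/dh_0
        norms.append(np.linalg.norm(J, ord=2))
    return np.array(norms)

rng = np.random.default_rng(0)
n_hidden, n_in, T = 8, 3, 100
W_hh = rng.normal(size=(n_hidden, n_hidden))
W_hh *= 0.9 / np.linalg.norm(W_hh, ord=2)  # rescale to spectral norm 0.9 (toy choice)
W_xh = rng.normal(size=(n_hidden, n_in))
norms = jacobian_norms(W_hh, W_xh, rng.normal(size=(T, n_in)), np.zeros(n_hidden))
print(norms[[0, 9, 49, 99]])  # shrinking toward 0 as t grows
```

Because tanh' never exceeds 1, each norm here is bounded by 0.9^t, so the dependency on h_0 decays at least geometrically.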

That being said, I am not really an expert in RNNs, and I am just raising an idea here. I would really appreciate it if you could take a look at whether my understanding is correct, and whether the statistic dh_t/dh_1 or dh_t/dh_0 could be implemented.
Thanks in advance!

cannot import name error

Hi, thank you very much for the great work!
I tried to run my original Python script and encountered the error below.
cannot import name 'get_rnn_gradients' from 'see_rnn' (C:\Users\username\AppData\Local\Programs\Python\Python37\lib\site-packages\see_rnn\__init__.py)

How should I solve this error?

Regards,

ValueError: The last dimension of the inputs to Dense should be defined. Found None

When I tried the example provided on Stack Overflow,

There is an error: ValueError: The last dimension of the inputs to Dense should be defined. Found None.

Here is the example:

from keras.layers import Input, Dense, LSTM, Flatten, concatenate
from keras.models import Model
from keras.optimizers import Adam
from keras_self_attention import SeqSelfAttention
import numpy as np 

ipt   = Input(shape=(240,4))
x     = LSTM(60, activation='tanh', return_sequences=True)(ipt)
x     = SeqSelfAttention(return_attention=True)(x)
x     = concatenate(x)
x     = Flatten()(x)
out   = Dense(1, activation='sigmoid')(x)
model = Model(ipt,out)
model.compile(Adam(lr=1e-2), loss='binary_crossentropy')

X = np.random.rand(10,240,4) # dummy data
Y = np.random.randint(0,2,(10,1)) # dummy labels
model.train_on_batch(X, Y)

outs = get_layer_outputs(model, 'seq', X[0:1], 1)
outs_1 = outs[0]
outs_2 = outs[1]

show_features_1D(model,'lstm',X[0:1],max_timesteps=100,equate_axes=False,show_y_zero=False)
show_features_1D(model,'lstm',X[0:1],max_timesteps=100,equate_axes=True, show_y_zero=True)
show_features_2D(outs_2[0])  # [0] for 2D since 'outs_2' is 3D

Here is the full error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-777fb671e483> in <module>
     10 x     = concatenate(x)
     11 x     = Flatten()(x)
---> 12 out   = Dense(1, activation='sigmoid')(x)
     13 model = Model(ipt,out)
     14 model.compile(Adam(lr=1e-2), loss='binary_crossentropy')

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
    950     if _in_functional_construction_mode(self, inputs, args, kwargs, input_list):
    951       return self._functional_construction_call(inputs, args, kwargs,
--> 952                                                 input_list)
    953 
    954     # Maintains info about the `Layer.call` stack.

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in _functional_construction_call(self, inputs, args, kwargs, input_list)
   1089         # Check input assumptions set after layer building, e.g. input shape.
   1090         outputs = self._keras_tensor_symbolic_call(
-> 1091             inputs, input_masks, args, kwargs)
   1092 
   1093         if outputs is None:

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in _keras_tensor_symbolic_call(self, inputs, input_masks, args, kwargs)
    820       return nest.map_structure(keras_tensor.KerasTensor, output_signature)
    821     else:
--> 822       return self._infer_output_signature(inputs, args, kwargs, input_masks)
    823 
    824   def _infer_output_signature(self, inputs, args, kwargs, input_masks):

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in _infer_output_signature(self, inputs, args, kwargs, input_masks)
    860           # overridden).
    861           # TODO(kaftan): do we maybe_build here, or have we already done it?
--> 862           self._maybe_build(inputs)
    863           outputs = call_fn(inputs, *args, **kwargs)
    864 

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in _maybe_build(self, inputs)
   2708         # operations.
   2709         with tf_utils.maybe_init_scope(self):
-> 2710           self.build(input_shapes)  # pylint:disable=not-callable
   2711       # We must set also ensure that the layer is marked as built, and the build
   2712       # shape is stored since user defined build functions may not be calling

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py in build(self, input_shape)
   1180     last_dim = tensor_shape.dimension_value(input_shape[-1])
   1181     if last_dim is None:
-> 1182       raise ValueError('The last dimension of the inputs to `Dense` '
   1183                        'should be defined. Found `None`.')
   1184     self.input_spec = InputSpec(min_ndim=2, axes={-1: last_dim})

Do you have any clue why this error occurred?

No module named 'tensorflow.python.keras.mixed_precision.experimental'

I'm experiencing strange behaviour when importing the gradient functions:

Set up
pip install see-rnn

test.py

from see_rnn import features_0D, features_1D, features_2D
from see_rnn import get_gradients
from see_rnn import get_layer_gradients

exit()

Error

Exception has occurred: ImportError
cannot import name 'get_gradients' from 'see_rnn' (/Users/.../opt/anaconda3/envs/fns/lib/python3.8/site-packages/see_rnn/__init__.py)

The same happens with "get_layer_gradients".

Reproducing a stack overflow example

Thank you very much for creating see-rnn!
I've been working with RNNs for years, yet this kind of visualization was eye-opening for me.

At the bottom of your Stack Overflow post, you visualize gradients that have a near-constant magnitude at each timestep. How can I reproduce something similar in the provided rnn_sandbox?

My gradients always vanish towards zero for the earlier timesteps.

Link to stack overflow:
https://stackoverflow.com/questions/48714407/rnn-regularization-which-component-to-regularize

CNN with embedding

Hi there,

I like your GitRepo, and I have started to use it in my project.
I have a problem with the Embedding layer: if the first layer is an Embedding layer, see-rnn's get_gradients breaks down.
Any suggestions?

I have tried to use embedding with your example, like this (I don't care about the results, only to use embedding layer):

def make_model(rnn_layer, batch_shape, units):
    ipt = Input(batch_shape=batch_shape)
    input0 = Lambda(lambda x: x[:,:,0])(ipt)
    input1 = Lambda(lambda x: x[:,:,1:])(ipt)
    embed_layer1 = Embedding(100, 30, input_length=100, mask_zero=True)(input0)
    merged = concatenate([input1,embed_layer1])

    x   = rnn_layer(units, activation='tanh', return_sequences=True)(merged)
    out = rnn_layer(units, activation='tanh', return_sequences=False)(x)
    model = Model(ipt, out)
    model.compile(Adam(4e-3), 'mse')
    return model
    
def make_data(batch_shape):
    return np.random.randn(*batch_shape).astype('int32')+10, \
           np.random.randint(1, 99, (batch_shape[0], units))

def train_model(model, iterations, batch_shape):
    x, y = make_data(batch_shape)
    for i in range(iterations):
        model.train_on_batch(x, y)
        print(end='.')  # progbar
        if i % 40 == 0:
            x, y = make_data(batch_shape)

units = 6
batch_shape = (16, 100, 2*units)

model = make_model(LSTM, batch_shape, units)
train_model(model, 300, batch_shape)

x, y  = make_data(batch_shape)
grads_all  = get_gradients(model, 1, x, y)  # return_sequences=True,  layer index 1
grads_last = get_gradients(model, 2, x, y)  # return_sequences=False, layer index 2

I got this error:

AttributeError: Tensor.name is meaningless when eager execution is enabled.

My model is the following:

 #Residual block
def ResBlock(x,filters,kernel_size,dilation_rate):
    r=Conv1D(filters,kernel_size,padding='same',dilation_rate=dilation_rate,activation='relu')(x) #first convolution
    r=Conv1D(filters,kernel_size,padding='same',dilation_rate=dilation_rate)(r) #Second convolution
    if x.shape[-1]==filters:
        shortcut=x
    else:
        shortcut=Conv1D(filters,kernel_size,padding='same')(x) #shortcut (shortcut)
    o=add([r,shortcut])
    o=Activation('relu')(o) #Activation function
    return o

def build_model_withEmbedding(num_of_event,num_of_link,num_of_resource,maxlen,numOfFeat):       
    dropout=0.2
    n_timesteps = maxlen
    input_ = Input(shape=(n_timesteps,numOfFeat))
    input0 = Lambda(lambda x: x[:,:,0])(input_)
    input1 = Lambda(lambda x: x[:,:,1])(input_)
    input2 = Lambda(lambda x: x[:,:,2:3])(input_)
    input3 = Lambda(lambda x: x[:,:,3:])(input_)
    
    # embedding head Event
    embed_layer1 = Embedding(num_of_event+1, 30, input_length=n_timesteps, mask_zero=True)(input0)
    
    # embedding head Resource display name
    embed_layer2 = Embedding(num_of_resource+1, 30, input_length=n_timesteps, mask_zero=True)(input1)
    
    # lstm head Success
    lstm_layer = input2
    
    # CountVectorizer head Success
    CountVectorizer_layer = input3
    
    # merged head
    merged = concatenate([embed_layer1,embed_layer2,lstm_layer,CountVectorizer_layer])

    x=ResBlock(merged,filters=32,kernel_size=3,dilation_rate=1)
    x=ResBlock(x,filters=32,kernel_size=3,dilation_rate=2)
    x=ResBlock(x,filters=16,kernel_size=3,dilation_rate=4)
    merged = Flatten()(x)
    merged = Dense(10)(merged)
    outputs =Dense(1)(merged)
    
    model = Model(inputs=input_, outputs=outputs)
    model.compile(loss='mae', optimizer='adam')  
  
    return model
model = build_model_withEmbedding(num_of_event,num_of_link,num_of_resource,maxlen,train_X.shape[2])

[Query] object has no attribute '_clip_gradients' for TF2.4

Hi,

I am using inspect_gen.py with TF2.4 and am encountering the function _clip_gradients here. However, the optimizers class doesn't have a _clip_gradients function. Please let me know what I am missing, or an equivalent function for the same.

Thanks

Model comparison, interpretation questions

Unable to import get_gradients / get_outputs from see_rnn==1.15.1 and 1.15.0

In my jupyter I have

%pip install see_rnn==1.15.1

from see_rnn import get_gradients, get_outputs

ModuleNotFoundError Traceback (most recent call last)
[/.......-based.ipynb]/...based.ipynb) Cell 33 line 1
----> 1 from see_rnn import get_gradients, get_outputs

I also tried version 1.15.0.

What am I missing ?

Naive question

Dear @OverLordGoldDragon

Recently, I got familiar with your wonderful package and found it very helpful for my research. I am doing research in the field of neurodegenerative disease, more specifically Alzheimer's disease. I have a problem generating saliency maps over the time dimension and the features that are most significant to the predictions. I trained an LSTM network to predict the disease state of patients for the next 5 time points. I have 368 patients, 5 time points, and almost 20 features. Now I want to show which of these features play an important role in classifying the patients at each of the 5 predicted time steps. I found that saliency maps are able to show that. To be honest, I tried a lot to do it myself, yet I was not successful. Now, I would like to know whether I can do it with your wonderful package and, if so, how I can try it with my own model?
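For reference, a gradient-based saliency map scores each input entry by |d prediction / d input|, giving one value per (time point, feature). The sketch below approximates that with central finite differences on a hypothetical toy model; with a real Keras/TensorFlow model one would normally take the exact gradient with tf.GradientTape instead, and for several outputs (e.g. one per predicted time point) repeat it per output. This is not a see-rnn function; the README lists saliency maps only as a possible to-do.

```python
import numpy as np

def finite_difference_saliency(f, x, eps=1e-4):
    """Approximate |d f(x) / d x| for each (timestep, feature) entry via central differences.

    f maps an array shaped like x to a scalar prediction."""
    sal = np.zeros(x.shape)
    for idx in np.ndindex(*x.shape):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[idx] += eps
        x_minus[idx] -= eps
        sal[idx] = abs(f(x_plus) - f(x_minus)) / (2 * eps)
    return sal

# Hypothetical toy "model": feature 2 at the last time point dominates the prediction
W = np.zeros((5, 20))
W[:, 0] = 0.1
W[-1, 2] = 3.0

def toy_model(x):
    return float(np.tanh(np.sum(W * x)))

x = np.random.default_rng(0).normal(size=(5, 20)) * 0.1   # 5 time points x 20 features
sal = finite_difference_saliency(toy_model, x)
top = tuple(int(i) for i in np.unravel_index(np.argmax(sal), sal.shape))
print(top)  # (4, 2): last time point, feature 2
```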

All of your kind help would be appreciated in advance.

Sincerely yours,

Visualize features in CNN+LSTM

Hello!
Great work, I really appreciate it!
I am developing a network for video sequence classification composed of:
CNN (for feature extraction) + LSTM (for sequence class prediction).
The features of each sequence (let's consider 30 frames) are extracted by a pre-trained network and fed into the LSTM with return_sequences=False.
Is it possible to visualize, in heatmap form, the features that make the classification possible?
I am having some trouble understanding how to do that with your tool, even though it seems very powerful.
Thank you for the help,
Best

How to cite your work

Hi, there.
I really appreciate your work, and it helped me solve a big problem in my research.
Now I'm writing a paper, and I want to cite your work to express my gratitude.
So how do I cite your work? Or have you published academic papers based on this GitRepo?
I am looking forward to your reply.

give visibility to _watch_layer_outputs

Hi @OverLordGoldDragon, I made use of _watch_layer_outputs and, with some changes to your code, I managed to visualize a custom-made RNN built with the Keras Model Subclassing API. While your implementation through a wrapper helped me a lot, I only noticed the function existed after some time. Since the function is not called by any other function, perhaps it could be included in the README.md so other people can make use of it. Thanks for the awesome work; I could visually check my vanishing gradient problem!!

AttributeError: 'ResourceVariable' object has no attribute '_keras_history'

I have a GRU model (with return_sequences=False).
I am trying to use:
rnn_heatmap(model, 'gru', mode='grads', input_data=x, labels=y, cmap=None, absolute_value=True)

However, I am getting the below error: