kieranlitschel / xswem

A simple and explainable deep learning model for NLP.

License: MIT License
We should allow users to set the parameters of any layer.

We could do this using a configuration dictionary passed to the constructor of the model. This would map layer names to the configuration dictionary for the corresponding layer. Each layer's configuration dictionary could then be unpacked in that layer's constructor using **kwargs.
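A minimal sketch of how this could look; the `layer_config` argument and the layer names "embedding" and "output" are illustrative, not a final API:

```python
import tensorflow as tf

class XSWEM(tf.keras.Model):
    # "layer_config" maps each layer's name to the kwargs for that layer.
    # The names used here are assumptions for the sake of the sketch.
    def __init__(self, vocab_size, embedding_size, output_size,
                 layer_config=None, **kwargs):
        super().__init__(**kwargs)
        layer_config = layer_config or {}
        self.embedding = tf.keras.layers.Embedding(
            vocab_size, embedding_size, **layer_config.get("embedding", {}))
        self.output_layer = tf.keras.layers.Dense(
            output_size, **layer_config.get("output", {}))
```

Users could then write, for example, `XSWEM(10000, 300, 2, layer_config={"embedding": {"mask_zero": True}})`.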
It'd be nice if we could visualize the local explanations on the input sentence to make it easier to understand explanations. We should investigate how to do this; a rough sketch of one option follows.
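One option worth investigating is shading each word in the sentence by how many embedding components it maximises (see the shortlisting function further down). A notebook sketch, where `words` and `counts` are assumed inputs rather than part of any existing API:

```python
from IPython.display import HTML

def highlight_sentence(words, counts):
    """Shade each word by how many embedding components it maximises."""
    max_count = max(max(counts), 1)  # avoid division by zero
    spans = []
    for word, count in zip(words, counts):
        alpha = count / max_count  # 0 = no contribution, 1 = strongest
        spans.append(
            '<span style="background-color: rgba(255,165,0,{:.2f})">{}</span>'
            .format(alpha, word))
    return HTML(" ".join(spans))
```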
We have found that we can achieve similar performance by initializing the word embeddings randomly, but in the original paper the authors initialized them with pre-trained GloVe word embeddings. We should enable this functionality as well; a rough sketch follows.
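A sketch of how the pre-trained vectors could be loaded, assuming a GloVe text file and a `word_index` mapping from word to row index (both hypothetical here):

```python
import numpy as np

def load_glove_embeddings(glove_path, word_index, embedding_size):
    """Build an embedding matrix from a GloVe .txt file.

    Words missing from GloVe keep their random initialization. Row 0 is
    reserved for padding (an assumption of this sketch).
    """
    embedding_matrix = np.random.uniform(
        -0.05, 0.05, (len(word_index) + 1, embedding_size))
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            values = line.split()
            word = values[0]
            if word in word_index:
                embedding_matrix[word_index[word]] = np.asarray(
                    values[1:], dtype="float32")
    return embedding_matrix
```

The resulting matrix could then be passed to the Embedding layer via `embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix)`.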
The authors also sometimes added a Dense layer between the embedding and pooling layers to allow the model to adapt the embeddings to the task of interest. We should allow users to do this too; one possible shape for the API is sketched below.
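A sketch of an optional Dense layer between embedding and pooling; the `adapt_embeddings` flag is an illustrative name, not a final API:

```python
import tensorflow as tf

def build_swem_max(vocab_size, embedding_size, output_size,
                   adapt_embeddings=False):
    inputs = tf.keras.Input(shape=(None,), dtype="int32")
    x = tf.keras.layers.Embedding(vocab_size, embedding_size)(inputs)
    if adapt_embeddings:
        # Applied position-wise, so each word embedding is transformed
        # independently before pooling.
        x = tf.keras.layers.Dense(embedding_size, activation="relu")(x)
    x = tf.keras.layers.GlobalMaxPool1D()(x)
    outputs = tf.keras.layers.Dense(output_size, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```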
Typically we would freeze the embedding layer when using pre-trained weights, but the authors do not mention this explicitly in their paper, nor do they freeze the weights in their original source code, so in our implementation the embedding weights are trainable.
In section 4.1.1 of the original paper, the authors proposed a method for interpreting the components of the embeddings learned by SWEM-max. We should implement this method in XSWEM.
To do this we first need to implement a function that allows users to generate a histogram of their word embedding weights, so that they can confirm whether the embeddings learned by their model are also sparse. Second, we need to implement a function that returns the n words with the largest values for each component (n=5 should be the default).
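A sketch of both functions, assuming `vocab` is a list mapping each embedding row index to its word:

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_embedding_histogram(embedding_weights, bins=100):
    """Histogram of all embedding weight values, to check for sparsity."""
    plt.hist(embedding_weights.flatten(), bins=bins)
    plt.xlabel("Weight value")
    plt.ylabel("Frequency")
    plt.show()

def top_words_per_component(embedding_weights, vocab, n=5):
    """Return the n words with the largest value in each embedding component.

    embedding_weights: (vocab_size, embedding_size) array.
    vocab: list mapping each row index to its word (assumed here).
    """
    # argsort ascending along the vocab axis, take the last n rows
    # (the largest values), then reverse so they are in descending order.
    top_indices = np.argsort(embedding_weights, axis=0)[-n:][::-1]
    return [[vocab[i] for i in top_indices[:, component]]
            for component in range(embedding_weights.shape[1])]
```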
We only require the datasets package for the demo notebooks, so we should remove datasets from the requirements and instead pip install it at the start of each notebook.
At most d words (where d is the embedding dimensionality) from the input sentence contribute to the output of the network. This is because the max-pooling layer keeps only the maximum value of each dimension across the embeddings of the input sentence, so at most d words contribute to its output.

Thus, where d is smaller than the number of unique words in the input sentence, the max-pooling layer has the effect of shortlisting the d most important words for making a prediction. If d is larger than the number of unique words in the input sentence, it can still have the effect of shortlisting words, because some words may supply the maximum value for multiple dimensions, but shortlisting is not guaranteed.
We can find the shortlisted words by taking an argmax for each dimension across the embeddings of the input sentence. We should add a function to XSWEM to do this. This can be used as a method for local explainability.
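A sketch of such a function, operating on the embeddings of a single sentence:

```python
import numpy as np

def shortlist_words(sentence_embeddings, words):
    """Return the words selected by max-pooling, with their dimension counts.

    sentence_embeddings: (sequence_length, embedding_size) array of the
    embeddings for the input sentence. words: the corresponding tokens.
    """
    # For each embedding dimension, find which word supplied the maximum.
    selected = np.argmax(sentence_embeddings, axis=0)
    # Count how many dimensions each shortlisted word won.
    indices, counts = np.unique(selected, return_counts=True)
    return sorted(zip((words[i] for i in indices), counts),
                  key=lambda pair: -pair[1])
```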
Currently, dropout is applied to the output of the pooling layer. After re-reading the paper and its source code, we realized that it should be applied to the input of the pooling layer instead.
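In Keras terms the fix amounts to moving the Dropout layer so it runs on the word embeddings rather than the pooled vector; a minimal sketch:

```python
import tensorflow as tf

def pool_with_dropout(embeddings, rate=0.5, training=True):
    """Apply dropout to the word embeddings, then max-pool over the sequence.

    embeddings: (batch, sequence_length, embedding_size) tensor. Previously
    dropout was applied after pooling; here it is applied before, per the paper.
    """
    dropped = tf.keras.layers.Dropout(rate)(embeddings, training=training)
    return tf.keras.layers.GlobalMaxPool1D()(dropped)
```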
We are using numpy style docstrings. These can be used to automatically generate documentation in Sphinx using the Napoleon extension. We should set this up.
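Setting this up should mostly be a matter of enabling the extensions in the Sphinx conf.py, something like:

```python
# conf.py (sketch) -- enable autodoc and Napoleon so numpy-style
# docstrings are parsed when generating the documentation.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
]
napoleon_numpy_docstring = True    # parse numpy-style docstrings
napoleon_google_docstring = False  # we do not use Google style
```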
Create a child class of tf.keras.Model which implements SWEM-max as described in section 3.3 of the original paper.
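A minimal sketch of such a subclass; the layer names and constructor arguments are illustrative, and dropout is applied before pooling per the fix above:

```python
import tensorflow as tf

class SWEMMax(tf.keras.Model):
    """Sketch of SWEM-max: embed, drop out, then max-pool over the sequence."""

    def __init__(self, vocab_size, embedding_size, output_size,
                 dropout_rate=0.5, **kwargs):
        super().__init__(**kwargs)
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_size)
        # Dropout is applied to the embeddings, i.e. the input of the
        # pooling layer, as discussed above.
        self.dropout = tf.keras.layers.Dropout(dropout_rate)
        self.max_pool = tf.keras.layers.GlobalMaxPool1D()
        self.classifier = tf.keras.layers.Dense(output_size,
                                                activation="softmax")

    def call(self, inputs, training=False):
        x = self.embedding(inputs)
        x = self.dropout(x, training=training)
        x = self.max_pool(x)
        return self.classifier(x)
```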
When a unit is dropped out, its value is set to 0. As we apply dropout directly to the word embeddings, for long input sequences it becomes increasingly likely that, in each dimension, at least one word's value will be set to zero. This means that negative components can often die: the zeros introduced by dropout are taken as the maximum instead of them, so they never receive gradient updates and get stuck at negative values.
This is particularly problematic as our distribution for initializing embeddings is centred at zero, meaning around half of the components are initialized as values less than zero. The histogram below exemplifies this issue.
One possible solution is to initialize all embedding weights with values greater than zero. This should significantly reduce the number of dying units, though units can still die if they are updated to a value less than zero.

A better solution would be to make max-pooling ignore zero. But this may slow down training significantly, which would make the first solution preferable.
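One way to sketch the second option: replace exact zeros with -inf before taking the maximum, so dropped-out units can never win the pooling. This is only a sketch under the assumption that dropped-out units are exactly zero:

```python
import tensorflow as tf

def max_pool_ignoring_zeros(embeddings):
    """Max-pool over the sequence axis, ignoring exact zeros.

    embeddings: (batch, sequence_length, embedding_size) tensor in which
    dropped-out units are exactly zero. Zeros are replaced with -inf so they
    can never be selected as the maximum. (If every value in a dimension is
    zero, the result is -inf, which would need separate handling.)
    """
    masked = tf.where(tf.equal(embeddings, 0.0),
                      tf.fill(tf.shape(embeddings), float("-inf")),
                      embeddings)
    return tf.reduce_max(masked, axis=1)
```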
In #10 we observed that many of the weights in the embeddings appear never to be selected during training, and so retain their initialized values. From what we can tell this happens for most weights initialized with a negative value. If we initialize our embedding layer randomly, weights that were never selected during training have little to contribute to prediction at test time, as their values are random. We may be able to use this to make our saved models smaller.
After training, we could check which weights have values that have not changed from their initialized value and set them to zero. Then when saving the matrix of embedding weights we just need to save the non-zero values and where they occur in the matrix.
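A sketch of how this could work using scipy's sparse format; the tolerance, file format, and function names here are assumptions:

```python
import numpy as np
from scipy import sparse

def save_sparse_embeddings(trained_weights, initial_weights, path,
                           tolerance=1e-8):
    """Zero out weights that never moved from their initial values, then
    store only the non-zero entries and their positions in the matrix."""
    unchanged = np.isclose(trained_weights, initial_weights, atol=tolerance)
    pruned = np.where(unchanged, 0.0, trained_weights)
    sparse.save_npz(path, sparse.csr_matrix(pruned))

def load_sparse_embeddings(path):
    """Recover the dense embedding matrix (pruned weights come back as 0)."""
    return sparse.load_npz(path).toarray()
```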