kieranlitschel / xswem

A simple and explainable deep learning model for NLP.

License: MIT License
We should allow users to set the parameters of any layer.

We could do this using a configuration dictionary passed to the constructor of the model. This would map layer names to the configuration dictionary for the corresponding layer. Each layer's configuration dictionary could then be unpacked in that layer's constructor using **kwargs.
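A minimal sketch of how this could look; the `layer_config` argument and the layer names "embedding" and "output" are illustrative, not a final API:

```python
import tensorflow as tf

class XSWEM(tf.keras.Model):
    # "layer_config" maps each layer's name to the kwargs for that layer.
    # The names used here are assumptions for the sake of the sketch.
    def __init__(self, vocab_size, embedding_size, output_size,
                 layer_config=None, **kwargs):
        super().__init__(**kwargs)
        layer_config = layer_config or {}
        self.embedding = tf.keras.layers.Embedding(
            vocab_size, embedding_size, **layer_config.get("embedding", {}))
        self.output_layer = tf.keras.layers.Dense(
            output_size, **layer_config.get("output", {}))
```

Users could then write, for example, `XSWEM(10000, 300, 2, layer_config={"embedding": {"mask_zero": True}})`.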
It'd be nice if we could visualize the local explanations on the input sentence to make it easier to understand explanations. We should investigate how to do this; a rough sketch of one option follows.
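One option worth investigating is shading each word in the sentence by how many embedding components it maximises (see the shortlisting function further down). A notebook sketch, where `words` and `counts` are assumed inputs rather than part of any existing API:

```python
from IPython.display import HTML

def highlight_sentence(words, counts):
    """Shade each word by how many embedding components it maximises."""
    max_count = max(max(counts), 1)  # avoid division by zero
    spans = []
    for word, count in zip(words, counts):
        alpha = count / max_count  # 0 = no contribution, 1 = strongest
        spans.append(
            '<span style="background-color: rgba(255,165,0,{:.2f})">{}</span>'
            .format(alpha, word))
    return HTML(" ".join(spans))
```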
We have found that we can achieve similar performance by initializing the word embeddings randomly, but in the original paper the authors initialized them with pre-trained GloVe word embeddings. We should enable this functionality as well; a rough sketch follows.
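A sketch of how the pre-trained vectors could be loaded, assuming a GloVe text file and a `word_index` mapping from word to row index (both hypothetical here):

```python
import numpy as np

def load_glove_embeddings(glove_path, word_index, embedding_size):
    """Build an embedding matrix from a GloVe .txt file.

    Words missing from GloVe keep their random initialization. Row 0 is
    reserved for padding (an assumption of this sketch).
    """
    embedding_matrix = np.random.uniform(
        -0.05, 0.05, (len(word_index) + 1, embedding_size))
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            values = line.split()
            word = values[0]
            if word in word_index:
                embedding_matrix[word_index[word]] = np.asarray(
                    values[1:], dtype="float32")
    return embedding_matrix
```

The resulting matrix could then be passed to the Embedding layer via `embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix)`.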
The authors also sometimes added a Dense layer between the embedding and pooling layers to allow the model to adapt the embeddings to the task of interest. We should allow users to do this too; one possible shape for the API is sketched below.
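A sketch of an optional Dense layer between embedding and pooling; the `adapt_embeddings` flag is an illustrative name, not a final API:

```python
import tensorflow as tf

def build_swem_max(vocab_size, embedding_size, output_size,
                   adapt_embeddings=False):
    inputs = tf.keras.Input(shape=(None,), dtype="int32")
    x = tf.keras.layers.Embedding(vocab_size, embedding_size)(inputs)
    if adapt_embeddings:
        # Applied position-wise, so each word embedding is transformed
        # independently before pooling.
        x = tf.keras.layers.Dense(embedding_size, activation="relu")(x)
    x = tf.keras.layers.GlobalMaxPool1D()(x)
    outputs = tf.keras.layers.Dense(output_size, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```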
Typically we would freeze the embedding layer when using pre-trained weights, but the authors do not mention this explicitly in their paper, nor do they freeze the weights in their original source code, so in our implementation the embedding weights are trainable.
In section 4.1.1 of the original paper, the authors proposed a method for interpreting the components of the embeddings learned by SWEM-max. We should implement this method in XSWEM.
To do this we first need to implement a function that allows users to generate a histogram of their word embedding weights, so that they can confirm whether the embeddings learned by their model are also sparse. Second, we need to implement a function that returns the n words with the largest values for each component (n=5 should be the default).
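A sketch of both functions, assuming `vocab` is a list mapping each embedding row index to its word:

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_embedding_histogram(embedding_weights, bins=100):
    """Histogram of all embedding weight values, to check for sparsity."""
    plt.hist(embedding_weights.flatten(), bins=bins)
    plt.xlabel("Weight value")
    plt.ylabel("Frequency")
    plt.show()

def top_words_per_component(embedding_weights, vocab, n=5):
    """Return the n words with the largest value in each embedding component.

    embedding_weights: (vocab_size, embedding_size) array.
    vocab: list mapping each row index to its word (assumed here).
    """
    # argsort ascending along the vocab axis, take the last n rows
    # (the largest values), then reverse so they are in descending order.
    top_indices = np.argsort(embedding_weights, axis=0)[-n:][::-1]
    return [[vocab[i] for i in top_indices[:, component]]
            for component in range(embedding_weights.shape[1])]
```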
We only require the datasets package for the demo notebooks, so we should remove datasets from the requirements and instead pip install it at the start of each notebook.
At most d words (where d is the embedding dimensionality) from the input sentence contribute to the output of the network. This is because the max-pooling layer keeps only the maximum value of each dimension across the embeddings of the input sentence, so at most d words contribute to its output.

Thus, where d is smaller than the number of unique words in the input sentence, the max-pooling layer has the effect of shortlisting the d most important words for making a prediction. If d is larger than the number of unique words in the input sentence, it can still have the effect of shortlisting words, because some words may supply the maximum value for multiple dimensions, but shortlisting is not guaranteed.
We can find the shortlisted words by taking an argmax for each dimension across the embeddings of the input sentence. We should add a function to XSWEM to do this. This can be used as a method for local explainability.
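A sketch of such a function, operating on the embeddings of a single sentence:

```python
import numpy as np

def shortlist_words(sentence_embeddings, words):
    """Return the words selected by max-pooling, with their dimension counts.

    sentence_embeddings: (sequence_length, embedding_size) array of the
    embeddings for the input sentence. words: the corresponding tokens.
    """
    # For each embedding dimension, find which word supplied the maximum.
    selected = np.argmax(sentence_embeddings, axis=0)
    # Count how many dimensions each shortlisted word won.
    indices, counts = np.unique(selected, return_counts=True)
    return sorted(zip((words[i] for i in indices), counts),
                  key=lambda pair: -pair[1])
```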
Currently, dropout is applied to the output of the pooling layer. After re-reading the paper and its source code, we realized that it should be applied to the input of the pooling layer instead.
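In Keras terms the fix amounts to moving the Dropout layer so it runs on the word embeddings rather than the pooled vector; a minimal sketch:

```python
import tensorflow as tf

def pool_with_dropout(embeddings, rate=0.5, training=True):
    """Apply dropout to the word embeddings, then max-pool over the sequence.

    embeddings: (batch, sequence_length, embedding_size) tensor. Previously
    dropout was applied after pooling; here it is applied before, per the paper.
    """
    dropped = tf.keras.layers.Dropout(rate)(embeddings, training=training)
    return tf.keras.layers.GlobalMaxPool1D()(dropped)
```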
We are using numpy style docstrings. These can be used to automatically generate documentation in Sphinx using the Napoleon extension. We should set this up.
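Setting this up should mostly be a matter of enabling the extensions in the Sphinx conf.py, something like:

```python
# conf.py (sketch) -- enable autodoc and Napoleon so numpy-style
# docstrings are parsed when generating the documentation.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
]
napoleon_numpy_docstring = True    # parse numpy-style docstrings
napoleon_google_docstring = False  # we do not use Google style
```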
Create a child class of tf.keras.Model which implements SWEM-max as described in section 3.3 of the original paper.
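A minimal sketch of such a subclass; the layer names and constructor arguments are illustrative, and dropout is applied before pooling per the fix above:

```python
import tensorflow as tf

class SWEMMax(tf.keras.Model):
    """Sketch of SWEM-max: embed, drop out, then max-pool over the sequence."""

    def __init__(self, vocab_size, embedding_size, output_size,
                 dropout_rate=0.5, **kwargs):
        super().__init__(**kwargs)
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_size)
        # Dropout is applied to the embeddings, i.e. the input of the
        # pooling layer, as discussed above.
        self.dropout = tf.keras.layers.Dropout(dropout_rate)
        self.max_pool = tf.keras.layers.GlobalMaxPool1D()
        self.classifier = tf.keras.layers.Dense(output_size,
                                                activation="softmax")

    def call(self, inputs, training=False):
        x = self.embedding(inputs)
        x = self.dropout(x, training=training)
        x = self.max_pool(x)
        return self.classifier(x)
```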
When a unit is dropped out, its value is set to 0. As we apply dropout directly to the word embeddings, for long input sequences it becomes increasingly likely that, in each dimension, at least one word's value will be set to zero. This means that negative components can often die: the zeros introduced by dropout are taken as the maximum instead of them, so they never receive gradient updates and get stuck at negative values.
This is particularly problematic as our distribution for initializing embeddings is centred at zero, meaning around half of the components are initialized as values less than zero. The histogram below exemplifies this issue.
One possible solution is to initialize all embedding weights with values greater than zero. This should significantly reduce the number of dying units, though units can still die if they are updated to a value less than zero.

A better solution would be to make max-pooling ignore zero. But this may slow down training significantly, which would make the first solution preferable.
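One way to sketch the second option: replace exact zeros with -inf before taking the maximum, so dropped-out units can never win the pooling. This is only a sketch under the assumption that dropped-out units are exactly zero:

```python
import tensorflow as tf

def max_pool_ignoring_zeros(embeddings):
    """Max-pool over the sequence axis, ignoring exact zeros.

    embeddings: (batch, sequence_length, embedding_size) tensor in which
    dropped-out units are exactly zero. Zeros are replaced with -inf so they
    can never be selected as the maximum. (If every value in a dimension is
    zero, the result is -inf, which would need separate handling.)
    """
    masked = tf.where(tf.equal(embeddings, 0.0),
                      tf.fill(tf.shape(embeddings), float("-inf")),
                      embeddings)
    return tf.reduce_max(masked, axis=1)
```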
In #10 we observed that many of the weights in the embeddings appear never to be selected during training, and so retain their initialized values. From what we can tell this happens for most weights initialized with a negative value. If we initialize our embedding layer randomly, weights that were never selected during training have little to contribute to prediction at test time, as their values are random. We may be able to use this to make our saved models smaller.
After training, we could check which weights have values that have not changed from their initialized value and set them to zero. Then when saving the matrix of embedding weights we just need to save the non-zero values and where they occur in the matrix.
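A sketch of how this could work using scipy's sparse format; the tolerance, file format, and function names here are assumptions:

```python
import numpy as np
from scipy import sparse

def save_sparse_embeddings(trained_weights, initial_weights, path,
                           tolerance=1e-8):
    """Zero out weights that never moved from their initial values, then
    store only the non-zero entries and their positions in the matrix."""
    unchanged = np.isclose(trained_weights, initial_weights, atol=tolerance)
    pruned = np.where(unchanged, 0.0, trained_weights)
    sparse.save_npz(path, sparse.csr_matrix(pruned))

def load_sparse_embeddings(path):
    """Recover the dense embedding matrix (pruned weights come back as 0)."""
    return sparse.load_npz(path).toarray()
```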