matlab-deep-learning / transformer-models

Deep Learning Transformer models in MATLAB

License: Other

Topics: deep-learning, matlab, pretrained-models, transformer, gpt-2, gpt2, bert, finbert, matlab-deep-learning

transformer-models's Introduction

Transformer Models for MATLAB

This repository implements deep learning transformer models in MATLAB.

Requirements

BERT and FinBERT

  • MATLAB R2021a or later
  • Deep Learning Toolbox
  • Text Analytics Toolbox

GPT-2

  • MATLAB R2020a or later
  • Deep Learning Toolbox

Getting Started

Download or clone this repository to your machine and open it in MATLAB.

Functions

bert

mdl = bert loads a pretrained BERT transformer model and, if necessary, downloads the model weights. The output mdl is a structure with fields Tokenizer and Parameters that contain the BERT tokenizer and the model parameters, respectively.

mdl = bert("Model",modelName) specifies which BERT model variant to use:

  • "base" (default) - A 12 layer model with hidden size 768.
  • "multilingual-cased" - A 12 layer model with hidden size 768. The tokenizer is case-sensitive. This model was trained on multi-lingual data.
  • "medium" - An 8 layer model with hidden size 512.
  • "small" - A 4 layer model with hidden size 512.
  • "mini" - A 4 layer model with hidden size 256.
  • "tiny" - A 2 layer model with hidden size 128.
  • "japanese-base" - A 12 layer model with hidden size 768, pretrained on texts in the Japanese language.
  • "japanese-base-wwm" - A 12 layer model with hidden size 768, pretrained on texts in the Japanese language. Additionally, the model is trained with the whole word masking enabled for the masked language modeling (MLM) objective.

bert.model

Z = bert.model(X,parameters) performs inference with a BERT model on the input 1-by-numInputTokens-by-numObservations array of encoded tokens with the specified parameters. The output Z is an array of size (NumHeads*HeadSize)-by-numInputTokens-by-numObservations. The element Z(:,i,j) corresponds to the BERT embedding of input token X(1,i,j).

Z = bert.model(X,parameters,Name,Value) specifies additional options using one or more name-value pairs:

  • "PaddingCode" - Positive integer corresponding to the padding token. The default is 1.
  • "InputMask" - Mask indicating which elements to include for computation, specified as a logical array the same size as X or as an empty array. The mask must be false at indices positions corresponds to padding, and true elsewhere. If the mask is [], then the function determines padding according to the PaddingCode name-value pair. The default is [].
  • "DropoutProb" - Probability of dropout for the output activation. The default is 0.
  • "AttentionDropoutProb" - Probability of dropout used in the attention layer. The default is 0.
  • "Outputs" - Indices of the layers to return outputs from, specified as a vector of positive integers, or "last". If "Outputs" is "last", then the function returns outputs from the final encoder layer only. The default is "last".
  • "SeparatorCode" - Separator token specified as a positive integer. The default is 103.

finbert

mdl = finbert loads a pretrained BERT transformer model for sentiment analysis of financial text. The output mdl is a structure with fields Tokenizer and Parameters that contain the BERT tokenizer and the model parameters, respectively.

mdl = finbert("Model",modelName) specifies which FinBERT model variant to use:

  • "sentiment-model" (default) - The fine-tuned sentiment classifier model.
  • "language-model" - The FinBERT pretrained language model, which uses a BERT-Base architecture.

finbert.sentimentModel

sentiment = finbert.sentimentModel(X,parameters) classifies the sentiment of the input 1-by-numInputTokens-by-numObservations array of encoded tokens with the specified parameters. The output sentiment is a categorical array with categories "positive", "neutral", or "negative".

[sentiment, scores] = finbert.sentimentModel(X,parameters) also returns the corresponding sentiment scores in the range [-1 1].
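A hedged end-to-end sketch, again assuming that the tokenizer provides an encode method mapping text to token codes as in the repository's examples:

    mdl = finbert;   % "sentiment-model" variant by default

    str = "The company's quarterly revenue came in well above analyst expectations.";
    codes = encode(mdl.Tokenizer, str);  % assumption: encode maps text to token codes
    X = codes{1};

    [sentiment, scores] = finbert.sentimentModel(X, mdl.Parameters)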

gpt2

mdl = gpt2 loads a pretrained GPT-2 transformer model and, if necessary, downloads the model weights.

generateSummary

summary = generateSummary(mdl,text) generates a summary of the string or char array text using the transformer model mdl. The output summary is a char array.

summary = generateSummary(mdl,text,Name,Value) specifies additional options using one or more name-value pairs.

  • "MaxSummaryLength" - The maximum number of tokens in the generated summary. The default is 50.
  • "TopK" - The number of tokens to sample from when generating the summary. The default is 2.
  • "Temperature" - Temperature applied to the GPT-2 output probability distribution. The default is 1.
  • "StopCharacter" - Character to indicate that the summary is complete. The default is ".".

Example: Classify Text Data Using BERT

The simplest use of a pretrained BERT model is as a feature extractor. In particular, you can use the BERT model to convert documents into feature vectors, which you can then use as inputs to train a deep learning classification network.
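As a hedged sketch of this feature-extraction idea (assuming, as in the repository's examples, that the tokenizer provides an encode method and that the first token is the [CLS] token):

    mdl = bert;

    report = "Coolant is pooling underneath the sorter.";
    codes = encode(mdl.Tokenizer, report);   % assumption: encode -> cell array of token codes
    Z = bert.model(codes{1}, mdl.Parameters);

    features = Z(:,1);   % embedding of the first ([CLS]) token as the document feature vector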

The example ClassifyTextDataUsingBERT.m shows how to use a pretrained BERT model to classify failure events given a data set of factory reports. This example requires the factoryReports.csv data set from the Text Analytics Toolbox example Prepare Text Data for Analysis.

Example: Fine-Tune Pretrained BERT Model

To get the most out of a pretrained BERT model, you can retrain and fine-tune the BERT parameter weights for your task.

The example FineTuneBERT.m shows how to fine-tune a pretrained BERT model to classify failure events given a data set of factory reports. This example requires the factoryReports.csv data set from the Text Analytics Toolbox example Prepare Text Data for Analysis.

The example FineTuneBERTJapanese.m shows the same workflow using a pretrained Japanese-BERT model. This example requires the factoryReportsJP.csv data set from the Text Analytics Toolbox example Analyze Japanese Text Data, available in R2023a or later.

Example: Analyze Sentiment with FinBERT

FinBERT is a BERT model trained on financial text data and fine-tuned for sentiment analysis.

The example SentimentAnalysisWithFinBERT.m shows how to classify the sentiment of financial news reports using a pretrained FinBERT model.

Example: Predict Masked Tokens Using BERT and FinBERT

BERT models are trained to perform various tasks. One of these tasks is masked language modeling: predicting tokens in text that have been replaced by a mask value.

The example PredictMaskedTokensUsingBERT.m shows how to predict masked tokens and calculate the token probabilities using a pretrained BERT model.

The example PredictMaskedTokensUsingFinBERT.m shows how to predict masked tokens for financial text and calculate the token probabilities using a pretrained FinBERT model.

Example: Summarize Text Using GPT-2

Transformer networks such as GPT-2 can be used to summarize a piece of text. The trained GPT-2 transformer can generate text given an initial sequence of words as input. The model was trained on comments left on various web pages and internet forums.

Because many of these comments contain a summary indicated by the phrase "TL;DR" (too long, didn't read), you can use the transformer model to generate a summary by appending "TL;DR" to the input text. The generateSummary function takes the input text, automatically appends the string "TL;DR", and generates the summary.

The example SummarizeTextUsingTransformersExample.m shows how to summarize a piece of text using GPT-2.

transformer-models's People

Contributors

bwdgithub, conordaly0, debymf, jianghaw, misataguchi, segunshums

transformer-models's Issues

GPT-2 doesn't include dropout layers

We would like to use these issues to gauge user interest.

The GPT-2 implementation does not include dropout layers. Adding them would be useful for further pre-training and fine-tuning workflows to prevent overfitting.

what is numInputSubwords-by-numObs

Hi, may I ask what numInputSubwords and numObs are, respectively, in the code? For example, with "HOW ARE YOU", ignoring the positional coding and embedding, can you give me a hint on what numInputSubwords and numObs would be?

Do you have any examples of "vision transformer"?

In addition to the NLP applications, what about the CV applications? Are there any examples of transformers being used for "object detection", "behaviour recognition", or even "image classification"? It would be nice to have these.

Allow usage of `tokenizedDocument` in BERT tokenization

We would like to use these issues to gauge user interest.

The BERT tokenizer is intended as an identical reimplementation of the original BERT tokenization. However, it is possible to replace the bert.tokenizer.internal.BasicTokenizer with a tokenizer using tokenizedDocument.

The belief is that this should not affect the model too much, as the WordPiece encoding is still the same, and it is these WordPiece-encoded sub-tokens that are the input to the model.

Advantages of this are that tokenizedDocument is considerably faster than BasicTokenizer and may offer better integration with Text Analytics Toolbox functionality.

Add a GPT-2 training example

We would like to use these issues to gauge user interest.

It is possible to use the GPT-2 implementation for further language model training. There is no example demonstrating this in the repository or elsewhere.

Making this possible on a typical consumer GPU will likely require some technique to reduce the amount of GPU memory needed for training. There are a number of options:

  1. Add support for a smaller GPT-2 model.
  2. Only train a subset of the GPT-2 parameters.
  3. Use gradient accumulation (see the sketch after this list).
  4. Use gradient checkpointing.
  5. Use reduced-precision gradients.
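A minimal sketch of option 3, gradient accumulation, in a custom training loop. The names modelLossFcn, parameters, and mbq are hypothetical placeholders, not part of this repository.

    % Hypothetical placeholders: "parameters" holds the dlarray weights, "mbq" is
    % a minibatchqueue, and modelLossFcn returns the loss and gradients for one
    % mini-batch.
    numAccumulations = 4;
    accumulatedGradients = [];

    for k = 1:numAccumulations
        [X, T] = next(mbq);
        [loss, gradients] = dlfeval(@modelLossFcn, parameters, X, T);
        gradients = dlupdate(@(g) g./numAccumulations, gradients);
        if isempty(accumulatedGradients)
            accumulatedGradients = gradients;
        else
            accumulatedGradients = dlupdate(@plus, accumulatedGradients, gradients);
        end
    end

    % Apply one optimizer step with the accumulated gradients, e.g. using adamupdate.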

Query on nvp

Hi, I am confused about what nvp is in all the files. It seems like an input, but I did not find the definition of it. What is it, and how do I define nvp? Is it a method of something? It may be a stupid question :) I am a beginner.

MATLAB cannot load variables of the BERT fine-tuned on my task

Hi, I'm having an issue when loading variables from my fine-tuned model in MATLAB. I would like to use the parameters and other variables of the BERT model that I retrained and fine-tuned on my task to classify new data, but I get an error when I try to import 'parameters' and many other variables. I would also like to use the data stored in the confusion matrix to compute some evaluation metrics, but loading the variables 'TValidation' and 'YPredValidation' gives an error as well. Does anyone know how to fix this error, or is there another way to use my fine-tuned BERT model on new data?

Provide sparse cross entropy implementation

We would like to use these issues to gauge user interest.

Sparse cross entropy allows the computation of cross-entropy loss without one-hot encoding of the target classes. This is useful for language modeling, where the target classes span the entire vocabulary, a space far too large to one-hot encode memory-efficiently.

It is possible to make a custom implementation of sparse cross entropy computation with dlarray.
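A minimal sketch of one such implementation (not taken from this repository), assuming Y is a C-by-N dlarray of predicted class probabilities and targets is a numeric vector of class indices:

    function loss = sparseCrossEntropy(Y, targets)
    % Y       - C-by-N dlarray of predicted class probabilities (columns sum to 1)
    % targets - 1-by-N vector of target class indices in 1..C
    n   = size(Y, 2);
    idx = sub2ind(size(Y), double(targets(:).'), 1:n);  % entries of the true classes
    loss = -mean(log(Y(idx) + eps));                    % no one-hot encoding required
    end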

Class balancing in FineTune BERT

How do we add custom class weights in FineTuneBERT.m?

For my dataset, there is a huge skew toward the majority class, and I am using a weight array to address that. Where should I incorporate it?
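One possible approach, as a hedged sketch and not taken from FineTuneBERT.m, is to weight the cross-entropy term inside the model loss function, for example with inverse class frequencies:

    % Assumptions: T is a C-by-N one-hot dlarray of targets, Y is the C-by-N
    % softmax output, and classCounts is a C-by-1 vector of training
    % observations per class.
    classWeights = sum(classCounts) ./ (numel(classCounts) * classCounts(:));
    loss = -sum(classWeights .* T .* log(Y + eps), "all") / size(Y, 2);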
