kaspergroesludvigsen / influenza_transformer

PyTorch implementation of Transformer model used in "Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case"

Python 100.00%
deep-learning forecasting forecasting-models neural-networks pytorch sequence-to-sequence time-series time-series-forecasting transformers

influenza_transformer's Introduction

How to code a Transformer model for time series forecasting in PyTorch

PyTorch implementation of Transformer model used in "Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case"

This is the repo for the two Towards Data Science articles "How to make a Transformer for time series forecasting with PyTorch" and "How to run inference with a PyTorch time series Transformer".

The first article explains step by step how to code the Transformer model used in the paper "Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case." The article uses the Transformer architecture diagram from the paper as the point of departure and shows step by step how to implement the model with PyTorch.

The second article explains how to use the time series Transformer at inference time where you don't know the decoder input values.

The sandbox.py file shows how to use the Transformer to make a training prediction on the data from the .csv file in "/data".

The inference.py file contains the function that takes care of inference, and the inference_example.py file shows a pseudo-ish code example of how to use the function during model validation and testing.
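
To make the inference idea concrete, here is a minimal, hypothetical sketch of iterative (greedy) inference when the future decoder input values are unknown. It is an illustration only, not the actual API of inference.py: the model is called repeatedly, and each new prediction is appended to the decoder input for the next call. It assumes a single feature and batch_first=False.

import torch

def greedy_forecast(model, src, forecast_len):
    """Illustrative iterative inference loop (hypothetical helper, not from the repo).

    Assumes model(src, tgt, tgt_mask=...) with src and tgt shaped
    [seq_len, batch_size, 1] (batch_first=False) and one predicted feature.
    """
    model.eval()
    tgt = src[-1:, :, :]  # seed the decoder with the last known observation
    with torch.no_grad():
        for _ in range(forecast_len):
            n = tgt.size(0)
            # causal mask: -inf above the diagonal, 0 elsewhere
            tgt_mask = torch.triu(torch.full((n, n), float('-inf')), diagonal=1)
            out = model(src, tgt, tgt_mask=tgt_mask)
            # append the newest prediction as the next decoder input
            tgt = torch.cat([tgt, out[-1:, :, :]], dim=0)
    return tgt[1:]  # drop the seed; the remaining steps are the forecast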

influenza_transformer's People

Contributors

holgarsson, kaspergroesludvigsen, philipph77, smathewmanuel


influenza_transformer's Issues

NameError: name 'training_dataloader' and 'validation_dataloader' is not defined

In the inference_example.py file, 'training_dataloader' and 'validation_dataloader' are not defined. When I run it, I get the following error:
Traceback (most recent call last):
File "C:\Users\admin\Desktop\Transformer-main\inference_example.py", line 47, in
for i, (src, tgt, tgt_y) in enumerate(training_dataloader):
^^^^^^^^^^^^^^^^^^^
NameError: name 'training_dataloader' is not defined
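
For readers hitting this: the two dataloaders are placeholders that the example expects you to create yourself. A minimal sketch, assuming you have already sliced your series into (src, tgt, tgt_y) windows and using plain torch.utils.data utilities rather than the repo's own dataset class (all shapes below are made up):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical pre-built windows, shaped (num_windows, seq_len, num_features)
train_src, train_trg, train_trg_y = torch.randn(500, 92, 1), torch.randn(500, 48, 1), torch.randn(500, 48, 1)
val_src, val_trg, val_trg_y = torch.randn(100, 92, 1), torch.randn(100, 48, 1), torch.randn(100, 48, 1)

training_dataloader = DataLoader(TensorDataset(train_src, train_trg, train_trg_y), batch_size=128, shuffle=True)
validation_dataloader = DataLoader(TensorDataset(val_src, val_trg, val_trg_y), batch_size=128, shuffle=False)

# Each batch then unpacks the way inference_example.py expects:
# for i, (src, tgt, tgt_y) in enumerate(training_dataloader): ...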

Predictions being a straight line along the x-axis

I have implemented the example shared, but the model doesn't seem to learn and the loss isn't decreasing.
I used the following training loop:

model = tst.TimeSeriesTransformer(
    input_size=1,
    dec_seq_len=enc_seq_len,
    batch_first=batch_first,
    num_predicted_features=1
)

# Define your loss function and optimizer
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Number of training epochs
num_epochs = 10

for epoch in range(num_epochs):
    model.train()
    for i, batch in enumerate(training_data):

        src, trg, trg_y = batch

        # Zero the gradients
        optimizer.zero_grad()

        # Permute from shape [batch size, seq len, num features] to [seq len, batch size, num features]
        if batch_first == False:

            shape_before = src.shape
            src = src.permute(1, 0, 2)
            #print("src shape changed from {} to {}".format(shape_before, src.shape))

            shape_before = trg.shape
            trg = trg.permute(1, 0, 2)
            #print("trg shape changed from {} to {}".format(shape_before, trg.shape))

        output = model(
            src=src,
            tgt=trg,
            src_mask=src_mask,
            tgt_mask=tgt_mask)

        # Reshape trg_y to [batch size, seq len, 1], then permute to [seq len, batch size, 1]
        trg_y = trg_y.reshape(trg_y.shape[0], trg_y.shape[1], 1)
        trg_y = trg_y.permute(1, 0, 2)

        # print('output shape: ', output.shape)
        # print('target_y shape: ', trg_y.shape)

        # Calculate the loss
        loss = criterion(output, trg_y)

        # Backpropagation
        loss.backward()

        # Update parameters
        optimizer.step()

    # Print training progress
    print(f'Epoch [{epoch+1}/{num_epochs}], Batch [{i+1}/{len(training_data)}], Loss: {loss.item()}')

Could you please tell me what I am doing wrong?
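
One thing the snippet above leaves undefined is src_mask and tgt_mask. As an illustrative sketch (not a diagnosis of the flat-prediction problem): the decoder self-attention mask must be square with side equal to the target sequence length, and if src_mask is used as the decoder's memory mask (as the later question about mem_mask suggests), its shape is [target length, source length]. The lengths below are placeholders:

import torch

output_seq_len = 48   # length of trg / trg_y in your batches
enc_seq_len = 100     # length of src in your batches

# Decoder self-attention mask: [output_seq_len, output_seq_len], -inf above the diagonal
tgt_mask = torch.triu(torch.full((output_seq_len, output_seq_len), float('-inf')), diagonal=1)

# Memory (cross-attention) mask: [output_seq_len, enc_seq_len]
src_mask = torch.triu(torch.full((output_seq_len, enc_seq_len), float('-inf')), diagonal=1)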

RuntimeError: mat1 and mat2 shapes cannot be multiplied (6144x512 and 24576x48)

Hi, I ran the sandbox and got an error:

Reading file in data/dfs_merged_upload.csv
From get_src_trg: data size = torch.Size([41387, 1])
input_size is: 1
dim_val is: 512
Traceback (most recent call last):
  File "/Users/influenza_transformer-main/sandbox.py", line 101, in <module>
    tgt_mask=tgt_mask
  File "/Users/anaconda3/envs/ycjk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/influenza_transformer-main/transformer_timeseries.py", line 238, in forward
    decoder_output= self.linear_mapping(decoder_output)
  File "/Users/opt/anaconda3/envs/ycjk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/opt/anaconda3/envs/ycjk/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (6144x512 and 24576x48)
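
A hint for readers seeing the same error (an observation about the numbers, not a confirmed fix): 24576 = 48 × 512 and 6144 = 128 × 48, so the final linear layer appears to have been built with in_features = out_seq_len × dim_val, while the tensor reaching it is shaped [batch × seq, dim_val]. Comparing the layer's expected input size with the shapes reported above is a reasonable first check:

# Illustrative check; 'model' is the TimeSeriesTransformer instance built in sandbox.py.
print(model.linear_mapping)   # shows the in_features the final layer expects
# Printing decoder_output.shape just before self.linear_mapping(decoder_output) in
# transformer_timeseries.py shows the shape that actually reaches it.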

typo line 88,89 sandbox.py

I believe there is a typo in sandbox.py line 88 and 89.

src, trg, trg_y = batch

# Permute from shape [batch size, seq len, num features] to [seq len, batch size, num features]
if batch_first == False:

    shape_before = src.shape
    src = src.permute(1, 0, 2)
    print("src shape changed from {} to {}".format(shape_before, src.shape))

    shape_before = tgt.shape
    tgt = tgt.permute(1, 0, 2)
    print("src shape changed from {} to {}".format(shape_before, src.shape))

tgt does not exist in this context at the moment. I believe tgt should be trg.

There are negative values in the output; how do I get the predicted values? Thank you

output:tensor([[ 0.9394, -0.0657, -0.5242, ..., -0.3573, -0.3968, -0.8111],
[ 0.3269, 0.1186, -0.4571, ..., -0.1672, 0.0763, -0.6789],
[ 0.8612, 0.2685, -0.3978, ..., -0.2583, 0.0484, -0.9527],
...,
[ 0.8189, -0.3896, -0.2842, ..., -0.5775, -0.3065, -0.8513],
[ 0.5229, 0.0786, -0.3587, ..., -0.1097, -0.3169, -0.6342],
[ 0.8320, -0.1121, -0.6839, ..., -0.6073, 0.0972, -0.7809]],
grad_fn=)
output_size:torch.Size([128, 48])
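
A likely explanation, though it is an assumption since the preprocessing isn't shown: if the series was standardized before training, the model predicts in the scaled space, and the raw values are recovered by inverting that transform. A sketch with scikit-learn (train_values is a placeholder for your raw training series):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_values.reshape(-1, 1))  # fit on training data only

# ... train and predict in the scaled space ...

preds = scaler.inverse_transform(
    output.detach().numpy().reshape(-1, 1)   # 'output' as in the tensor above
).reshape(tuple(output.shape))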

Why do we need memory mask in the time series problem?

The reason for using tgt_mask is clear. However, I don't see the intuition for using mem_mask in your problem. Why do we need to prevent the decoder from attending to the whole past data (the encoder output/memory)?

Trained model give constant value prediction outputs

Hello there,

First of all, I want to say this implementation seems really interesting. I am currently working on time series forecasting with a Transformer model for my thesis. I studied the following posts:
https://towardsdatascience.com/how-to-make-a-pytorch-transformer-for-time-series-forecasting-69e073d4061e
https://towardsdatascience.com/how-to-run-inference-with-a-pytorch-time-series-transformer-394fd6cbe16c
as well as your repository and code.

It was a huge help for implementing the model with PyTorch, so I want to thank you for sharing your implementation.
I implemented the TimeseriesTransformer class in my project, as well as a few functions of my own for training, validation, batchifying data, etc.
I modified the positional encoder part a little so it is compatible with a single-sequence input. Also, I fixed the incompatibility issue with batch_first = True, but I saw that someone else had already fixed that before me.

Now let me introduce my issue. I tried to train the model on some time series data. I used the MSE loss function and the Adam optimizer, as introduced in the inference post. I trained it for 53 epochs of 21 batches of 16 samples each, as it seemed to be converging at that point. I trained it on a dataset of 357 time units with encoder_in_len = 10 and decoder_in_len = 1. Each sequence was shifted left by one unit compared to the previous sequence in a batch.
For example, the first 3 sequences of the first batch:

seq1: 343, 0, 128, 62, 0, 2.001, 1.010, 0, 572, 134 (T1->T10)
seq2: 0, 128, 62, 0, 2.001, 1.010, 0, 572, 134, 237 (T2->T11)
seq3: 128, 62, 0, 2.001, 1.010, 0, 572, 134, 237, 698 (T3->T12)
...

The thing is, both for these data and for dummy data I used earlier, the model seems to make wrong inferences.
Any time I tried to run inference on new or training data as a test, the prediction was a constant value for all sequences in the batch.

For example I got the following inference output for the above sequences:
427.4353, 427.4353, 427.4353 (decoder input 134, 237, 698 respectively. The output should be 237, 698 and 0, where T13 = 0)

What do you think is going on here? Is it the loss function, the optimizer, the model, or something else?
I will really appreciate your answer!

Implementation

Hi,
I am trying to run the code on my own data, but I have not been able to get it working. Does the repo contain the full code? Could you please also implement the validation and test portions of the code?

Time Series Transformer doesn't converge

I have a question: I was able to run the training loop, but the Transformer doesn't converge.
Is it because of the dataset, or because of wrong hyperparameters or the chosen loss function?

thanks.

wrong dimensions in sandbox.py example

There is some issue with the sandbox.py file's input parameters. I get the following error:

Traceback (most recent call last):
  File "/Users/anshumansinha/Desktop/StructRepGen_Dev/influenza_transformer-main/sandbox.py", line 163, in <module>
    prediction = model(src, tgt, src_mask, tgt_mask)
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/anshumansinha/Desktop/StructRepGen_Dev/influenza_transformer-main/transformer_timeseries.py", line 226, in forward
    decoder_output = self.decoder(
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/modules/transformer.py", line 369, in forward
    output = mod(output, memory, tgt_mask=tgt_mask,
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/modules/transformer.py", line 716, in forward
    x = self.norm1(x + self._sa_block(x, tgt_mask, tgt_key_padding_mask, tgt_is_causal))
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/modules/transformer.py", line 725, in _sa_block
    x = self.self_attn(x, x, x,
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 1205, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/functional.py", line 5251, in multi_head_attention_forward
    raise RuntimeError(f"The shape of the 2D attn_mask is {attn_mask.shape}, but should be {correct_2d_size}.")
RuntimeError: The shape of the 2D attn_mask is torch.Size([48, 48]), but should be (128, 128).
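
For readers hitting the same error (an observation, not a confirmed fix): the decoder self-attention mask must be square with side equal to the length of the tgt tensor actually passed to the model, so a [48, 48] mask combined with a length-128 tgt fails. A minimal check, assuming batch_first=False so the sequence length is dimension 0, and assuming the length of tgt (rather than the mask) is what you intended:

import torch

print(tgt.shape, tgt_mask.shape)  # the mask's sides must match tgt's sequence length

n = tgt.size(0)
tgt_mask = torch.triu(torch.full((n, n), float('-inf')), diagonal=1)
prediction = model(src, tgt, src_mask, tgt_mask)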

Bug: Error in Positional Encoder if batch_first=True

If setting batch_first=True, the Positional Encoder throws an error:

RuntimeError: The size of tensor a (batchsize) must match the size of tensor b (max_sequence_length) at non-singleton dimension 0

The code in this repository was adapted from the official PyTorch tutorial in order to also obtain positional encodings for inputs x provided in the form (B, T, D), and not only when provided as (T, B, D). Unfortunately, these adaptations seem to be incorrect.
(B: Batchsize, T: Sequence Length, D: Embedding Dimensions)

Code to reproduce this issue:

import torch
import torch.nn as nn 
import math
from torch import nn, Tensor

class PositionalEncoder(nn.Module):
    """
    The authors of the original transformer paper describe very succinctly what 
    the positional encoding layer does and why it is needed:
    
    "Since our model contains no recurrence and no convolution, in order for the 
    model to make use of the order of the sequence, we must inject some 
    information about the relative or absolute position of the tokens in the 
    sequence." (Vaswani et al, 2017)
    Adapted from: 
    https://pytorch.org/tutorials/beginner/transformer_tutorial.html
    """

    def __init__(
        self, 
        dropout: float=0.1, 
        max_seq_len: int=5000, 
        d_model: int=512,
        batch_first: bool=False
        ):

        """
        Parameters:
            dropout: the dropout rate
            max_seq_len: the maximum length of the input sequences
            d_model: The dimension of the output of sub-layers in the model 
                     (Vaswani et al, 2017)
        """

        super().__init__()

        self.d_model = d_model
        
        self.dropout = nn.Dropout(p=dropout)

        self.batch_first = batch_first

        self.x_dim = 1 if batch_first else 0

        # copy pasted from PyTorch tutorial
        position = torch.arange(max_seq_len).unsqueeze(1)
        
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        
        pe = torch.zeros(max_seq_len, 1, d_model)
        
        pe[:, 0, 0::2] = torch.sin(position * div_term)
        
        pe[:, 0, 1::2] = torch.cos(position * div_term)
        
        self.register_buffer('pe', pe)
        
    def forward(self, x: Tensor) -> Tensor:
        """
        Args:
            x: Tensor, shape [batch_size, enc_seq_len, dim_val] or 
               [enc_seq_len, batch_size, dim_val]
        """

        x = x + self.pe[:x.size(self.x_dim)]

        return self.dropout(x)

if __name__ == "__main__":
  batchsize = 64
  max_sequence_length = 200
  embedding_dim = 512
  
  pe = PositionalEncoder(max_seq_len=max_sequence_length, d_model=embedding_dim, batch_first=True)
  x = torch.randn(size=(batchsize, max_sequence_length, embedding_dim))
  pe(x)
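
A possible fix, offered as a sketch rather than a confirmed patch: the buffer pe is registered with shape [max_seq_len, 1, d_model], so when batch_first=True the sliced encoding needs its singleton dimension dropped so it broadcasts over the batch instead of colliding with it. Replacing the forward method of the PositionalEncoder above:

    def forward(self, x: Tensor) -> Tensor:
        """
        Args:
            x: Tensor, shape [batch_size, seq_len, d_model] if batch_first=True,
               otherwise [seq_len, batch_size, d_model]
        """
        if self.batch_first:
            # pe[:seq_len] has shape [seq_len, 1, d_model]; squeezing the middle
            # dim gives [seq_len, d_model], which broadcasts over the batch dim.
            x = x + self.pe[:x.size(1)].squeeze(1)
        else:
            # [seq_len, 1, d_model] broadcasts over [seq_len, batch_size, d_model] as-is.
            x = x + self.pe[:x.size(0)]
        return self.dropout(x)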
