kaspergroesludvigsen / influenza_transformer

PyTorch implementation of Transformer model used in "Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case"

Python 100.00%
deep-learning forecasting forecasting-models neural-networks pytorch sequence-to-sequence time-series time-series-forecasting transformers

influenza_transformer's Introduction

How to code a Transformer model for time series forecasting in PyTorch

PyTorch implementation of Transformer model used in "Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case"

This is the repo for the two Towards Data Science articles "How to make a Transformer for time series forecasting with PyTorch" and "How to run inference with a PyTorch time series Transformer".

The first article explains step by step how to code the Transformer model used in the paper "Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case." The article uses the Transformer architecture diagram from the paper as the point of departure and shows step by step how to implement the model with PyTorch.

The second article explains how to use the time series Transformer at inference time where you don't know the decoder input values.

The sandbox.py file shows how to use the Transformer to make a training prediction on the data from the .csv file in "/data".

The inference.py file contains the function that takes care of inference, and the inference_example.py file shows a pseudo-ish code example of how to use the function during model validation and testing.
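
To make the inference idea concrete, here is a minimal, hypothetical sketch of iterative (greedy) inference when the future decoder input values are unknown. It is an illustration only, not the actual API of inference.py: the model is called repeatedly, and each new prediction is appended to the decoder input for the next call. It assumes a single feature and batch_first=False.

import torch

def greedy_forecast(model, src, forecast_len):
    """Illustrative iterative inference loop (hypothetical helper, not from the repo).

    Assumes model(src, tgt, tgt_mask=...) with src and tgt shaped
    [seq_len, batch_size, 1] (batch_first=False) and one predicted feature.
    """
    model.eval()
    tgt = src[-1:, :, :]  # seed the decoder with the last known observation
    with torch.no_grad():
        for _ in range(forecast_len):
            n = tgt.size(0)
            # causal mask: -inf above the diagonal, 0 elsewhere
            tgt_mask = torch.triu(torch.full((n, n), float('-inf')), diagonal=1)
            out = model(src, tgt, tgt_mask=tgt_mask)
            # append the newest prediction as the next decoder input
            tgt = torch.cat([tgt, out[-1:, :, :]], dim=0)
    return tgt[1:]  # drop the seed; the remaining steps are the forecast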

influenza_transformer's People

Contributors

holgarsson, kaspergroesludvigsen, philipph77, smathewmanuel


influenza_transformer's Issues

NameError: name 'training_dataloader' and 'validation_dataloader' is not defined

In the inference_example.py file, 'training_dataloader' and 'validation_dataloader' are not defined. When I run it, I get the following error:
Traceback (most recent call last):
File "C:\Users\admin\Desktop\Transformer-main\inference_example.py", line 47, in
for i, (src, tgt, tgt_y) in enumerate(training_dataloader):
^^^^^^^^^^^^^^^^^^^
NameError: name 'training_dataloader' is not defined
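
For readers hitting this: the two dataloaders are placeholders that the example expects you to create yourself. A minimal sketch, assuming you have already sliced your series into (src, tgt, tgt_y) windows and using plain torch.utils.data utilities rather than the repo's own dataset class (all shapes below are made up):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical pre-built windows, shaped (num_windows, seq_len, num_features)
train_src, train_trg, train_trg_y = torch.randn(500, 92, 1), torch.randn(500, 48, 1), torch.randn(500, 48, 1)
val_src, val_trg, val_trg_y = torch.randn(100, 92, 1), torch.randn(100, 48, 1), torch.randn(100, 48, 1)

training_dataloader = DataLoader(TensorDataset(train_src, train_trg, train_trg_y), batch_size=128, shuffle=True)
validation_dataloader = DataLoader(TensorDataset(val_src, val_trg, val_trg_y), batch_size=128, shuffle=False)

# Each batch then unpacks the way inference_example.py expects:
# for i, (src, tgt, tgt_y) in enumerate(training_dataloader): ...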

Predictions being a straight line along the x-axis

I have implemented the example shared, but the model doesn't seem to learn and the loss isn't decreasing.
I used the following training loop:

model = tst.TimeSeriesTransformer(
    input_size=1,
    dec_seq_len=enc_seq_len,
    batch_first=batch_first,
    num_predicted_features=1
)

# Define your loss function and optimizer
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Number of training epochs
num_epochs = 10

for epoch in range(num_epochs):
    model.train()
    for i, batch in enumerate(training_data):

        src, trg, trg_y = batch

        # Zero the gradients
        optimizer.zero_grad()

        # Permute from shape [batch size, seq len, num features] to [seq len, batch size, num features]
        if batch_first == False:

            shape_before = src.shape
            src = src.permute(1, 0, 2)
            #print("src shape changed from {} to {}".format(shape_before, src.shape))

            shape_before = trg.shape
            trg = trg.permute(1, 0, 2)
            #print("trg shape changed from {} to {}".format(shape_before, trg.shape))

        output = model(
            src=src,
            tgt=trg,
            src_mask=src_mask,
            tgt_mask=tgt_mask)

        # Reshape trg_y to [batch size, seq len, 1], then permute to [seq len, batch size, 1]
        trg_y = trg_y.reshape(trg_y.shape[0], trg_y.shape[1], 1)
        trg_y = trg_y.permute(1, 0, 2)

        # print('output shape: ', output.shape)
        # print('target_y shape: ', trg_y.shape)

        # Calculate the loss
        loss = criterion(output, trg_y)

        # Backpropagation
        loss.backward()

        # Update parameters
        optimizer.step()

    # Print training progress
    print(f'Epoch [{epoch+1}/{num_epochs}], Batch [{i+1}/{len(training_data)}], Loss: {loss.item()}')

Could you please tell me what I am doing wrong?
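
One thing the snippet above leaves undefined is src_mask and tgt_mask. As an illustrative sketch (not a diagnosis of the flat-prediction problem): the decoder self-attention mask must be square with side equal to the target sequence length, and if src_mask is used as the decoder's memory mask (as the later question about mem_mask suggests), its shape is [target length, source length]. The lengths below are placeholders:

import torch

output_seq_len = 48   # length of trg / trg_y in your batches
enc_seq_len = 100     # length of src in your batches

# Decoder self-attention mask: [output_seq_len, output_seq_len], -inf above the diagonal
tgt_mask = torch.triu(torch.full((output_seq_len, output_seq_len), float('-inf')), diagonal=1)

# Memory (cross-attention) mask: [output_seq_len, enc_seq_len]
src_mask = torch.triu(torch.full((output_seq_len, enc_seq_len), float('-inf')), diagonal=1)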

RuntimeError: mat1 and mat2 shapes cannot be multiplied (6144x512 and 24576x48)

Hi, I ran the sandbox and got an error:

Reading file in data/dfs_merged_upload.csv
From get_src_trg: data size = torch.Size([41387, 1])
input_size is: 1
dim_val is: 512
Traceback (most recent call last):
  File "/Users/influenza_transformer-main/sandbox.py", line 101, in <module>
    tgt_mask=tgt_mask
  File "/Users/anaconda3/envs/ycjk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/influenza_transformer-main/transformer_timeseries.py", line 238, in forward
    decoder_output= self.linear_mapping(decoder_output)
  File "/Users/opt/anaconda3/envs/ycjk/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/opt/anaconda3/envs/ycjk/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (6144x512 and 24576x48)
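
A hint for readers seeing the same error (an observation about the numbers, not a confirmed fix): 24576 = 48 × 512 and 6144 = 128 × 48, so the final linear layer appears to have been built with in_features = out_seq_len × dim_val, while the tensor reaching it is shaped [batch × seq, dim_val]. Comparing the layer's expected input size with the shapes reported above is a reasonable first check:

# Illustrative check; 'model' is the TimeSeriesTransformer instance built in sandbox.py.
print(model.linear_mapping)   # shows the in_features the final layer expects
# Printing decoder_output.shape just before self.linear_mapping(decoder_output) in
# transformer_timeseries.py shows the shape that actually reaches it.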

typo line 88,89 sandbox.py

I believe there is a typo in sandbox.py line 88 and 89.

src, trg, trg_y = batch

# Permute from shape [batch size, seq len, num features] to [seq len, batch size, num features]
if batch_first == False:

    shape_before = src.shape
    src = src.permute(1, 0, 2)
    print("src shape changed from {} to {}".format(shape_before, src.shape))

    shape_before = tgt.shape
    tgt = tgt.permute(1, 0, 2)
    print("src shape changed from {} to {}".format(shape_before, src.shape))

tgt does not exist in this context at the moment. I believe tgt should be trg.

There are negative values in the output; how do I get the predicted values? Thank you

output:tensor([[ 0.9394, -0.0657, -0.5242, ..., -0.3573, -0.3968, -0.8111],
[ 0.3269, 0.1186, -0.4571, ..., -0.1672, 0.0763, -0.6789],
[ 0.8612, 0.2685, -0.3978, ..., -0.2583, 0.0484, -0.9527],
...,
[ 0.8189, -0.3896, -0.2842, ..., -0.5775, -0.3065, -0.8513],
[ 0.5229, 0.0786, -0.3587, ..., -0.1097, -0.3169, -0.6342],
[ 0.8320, -0.1121, -0.6839, ..., -0.6073, 0.0972, -0.7809]],
grad_fn=)
output_size:torch.Size([128, 48])
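
A likely explanation, though it is an assumption since the preprocessing isn't shown: if the series was standardized before training, the model predicts in the scaled space, and the raw values are recovered by inverting that transform. A sketch with scikit-learn (train_values is a placeholder for your raw training series):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_values.reshape(-1, 1))  # fit on training data only

# ... train and predict in the scaled space ...

preds = scaler.inverse_transform(
    output.detach().numpy().reshape(-1, 1)   # 'output' as in the tensor above
).reshape(tuple(output.shape))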

Why do we need memory mask in the time series problem?

The reason for using tgt_mask is clear. However, I don't see the intuition for using mem_mask in your problem. Why do we need to prevent the decoder from attending to the whole past data (the encoder output/memory)?

Trained model give constant value prediction outputs

Hello there,

First of all, I want to say this implementation seems really interesting. I am currently working on time series forecasting with a Transformer model for my thesis. I studied the following posts:
https://towardsdatascience.com/how-to-make-a-pytorch-transformer-for-time-series-forecasting-69e073d4061e
https://towardsdatascience.com/how-to-run-inference-with-a-pytorch-time-series-transformer-394fd6cbe16c
as well as your repository and code.

It was a huge help for implementing the model with PyTorch, so I want to thank you for sharing your implementation.
I implemented the TimeseriesTransformer class in my project, as well as a few functions of my own for training, validation, batchifying data, etc.
I modified the positional encoder part a little so it is compatible with a single-sequence input. Also, I fixed the incompatibility issue with batch_first = True, but I saw that someone else had already fixed that before me.

Now let me introduce my issue. I tried to train the model on some time series data. I used the MSE loss function and the Adam optimizer, as introduced in the inference post. I trained it for 53 epochs of 21 batches of 16 samples each, as it seemed to be converging at that point. I trained it on a dataset of 357 time units with encoder_in_len = 10 and decoder_in_len = 1. Each sequence was shifted left by one unit compared to the previous sequence in a batch.
For example, the first 3 sequences of the first batch:

seq1: 343, 0, 128, 62, 0, 2.001, 1.010, 0, 572, 134 (T1->T10)
seq2: 0, 128, 62, 0, 2.001, 1.010, 0, 572, 134, 237 (T2->T11)
seq3: 128, 62, 0, 2.001, 1.010, 0, 572, 134, 237, 698 (T3->T12)
...

The thing is, both for these data and for dummy data I used earlier, the model seems to make wrong inferences.
Any time I tried to run inference on new or training data as a test, the prediction was a constant value for all sequences in the batch.

For example I got the following inference output for the above sequences:
427.4353, 427.4353, 427.4353 (decoder input 134, 237, 698 respectively. The output should be 237, 698 and 0, where T13 = 0)

What do you think is going on here? Is it the loss function, the optimizer, the model, or something else?
I will really appreciate your answer!

Implementation

Hi,
I am trying to run the code on my own data, but I have not been able to get it working. Does the repo contain the full code? Could you please also implement the validation and test portions of the code?

Time Series Transformer doesn't converge

I have a question: I was able to run the training loop, but the Transformer doesn't converge.
Is it because of the dataset, or because of wrong hyperparameters or the chosen loss function?

thanks.

wrong dimensions in sandbox.py example

There is some issue with the sandbox.py file's input parameters. I get the following error:

Traceback (most recent call last):
  File "/Users/anshumansinha/Desktop/StructRepGen_Dev/influenza_transformer-main/sandbox.py", line 163, in <module>
    prediction = model(src, tgt, src_mask, tgt_mask)
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/anshumansinha/Desktop/StructRepGen_Dev/influenza_transformer-main/transformer_timeseries.py", line 226, in forward
    decoder_output = self.decoder(
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/modules/transformer.py", line 369, in forward
    output = mod(output, memory, tgt_mask=tgt_mask,
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/modules/transformer.py", line 716, in forward
    x = self.norm1(x + self._sa_block(x, tgt_mask, tgt_key_padding_mask, tgt_is_causal))
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/modules/transformer.py", line 725, in _sa_block
    x = self.self_attn(x, x, x,
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 1205, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "/Users/anshumansinha/miniconda3/lib/python3.10/site-packages/torch/nn/functional.py", line 5251, in multi_head_attention_forward
    raise RuntimeError(f"The shape of the 2D attn_mask is {attn_mask.shape}, but should be {correct_2d_size}.")
RuntimeError: The shape of the 2D attn_mask is torch.Size([48, 48]), but should be (128, 128).
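
For readers hitting the same error (an observation, not a confirmed fix): the decoder self-attention mask must be square with side equal to the length of the tgt tensor actually passed to the model, so a [48, 48] mask combined with a length-128 tgt fails. A minimal check, assuming batch_first=False so the sequence length is dimension 0, and assuming the length of tgt (rather than the mask) is what you intended:

import torch

print(tgt.shape, tgt_mask.shape)  # the mask's sides must match tgt's sequence length

n = tgt.size(0)
tgt_mask = torch.triu(torch.full((n, n), float('-inf')), diagonal=1)
prediction = model(src, tgt, src_mask, tgt_mask)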

Bug: Error in Positional Encoder if batch_first=True

If setting batch_first=True, the Positional Encoder throws an error:

RuntimeError: The size of tensor a (batchsize) must match the size of tensor b (max_sequence_length) at non-singleton dimension 0

The code in this repository was adapted from the official PyTorch tutorial in order to also obtain positional encodings for inputs x provided in the form (B, T, D), and not only when provided as (T, B, D). Unfortunately, these adaptations seem to be incorrect.
(B: Batchsize, T: Sequence Length, D: Embedding Dimensions)

Code to reproduce this issue:

import torch
import torch.nn as nn 
import math
from torch import nn, Tensor

class PositionalEncoder(nn.Module):
    """
    The authors of the original transformer paper describe very succinctly what 
    the positional encoding layer does and why it is needed:
    
    "Since our model contains no recurrence and no convolution, in order for the 
    model to make use of the order of the sequence, we must inject some 
    information about the relative or absolute position of the tokens in the 
    sequence." (Vaswani et al, 2017)
    Adapted from: 
    https://pytorch.org/tutorials/beginner/transformer_tutorial.html
    """

    def __init__(
        self, 
        dropout: float=0.1, 
        max_seq_len: int=5000, 
        d_model: int=512,
        batch_first: bool=False
        ):

        """
        Parameters:
            dropout: the dropout rate
            max_seq_len: the maximum length of the input sequences
            d_model: The dimension of the output of sub-layers in the model 
                     (Vaswani et al, 2017)
        """

        super().__init__()

        self.d_model = d_model
        
        self.dropout = nn.Dropout(p=dropout)

        self.batch_first = batch_first

        self.x_dim = 1 if batch_first else 0

        # copy pasted from PyTorch tutorial
        position = torch.arange(max_seq_len).unsqueeze(1)
        
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        
        pe = torch.zeros(max_seq_len, 1, d_model)
        
        pe[:, 0, 0::2] = torch.sin(position * div_term)
        
        pe[:, 0, 1::2] = torch.cos(position * div_term)
        
        self.register_buffer('pe', pe)
        
    def forward(self, x: Tensor) -> Tensor:
        """
        Args:
            x: Tensor, shape [batch_size, enc_seq_len, dim_val] or 
               [enc_seq_len, batch_size, dim_val]
        """

        x = x + self.pe[:x.size(self.x_dim)]

        return self.dropout(x)

if __name__ == "__main__":
  batchsize = 64
  max_sequence_length = 200
  embedding_dim = 512
  
  pe = PositionalEncoder(max_seq_len=max_sequence_length, d_model=embedding_dim, batch_first=True)
  x = torch.randn(size=(batchsize, max_sequence_length, embedding_dim))
  pe(x)
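
A possible fix, offered as a sketch rather than a confirmed patch: the buffer pe is registered with shape [max_seq_len, 1, d_model], so when batch_first=True the sliced encoding needs its singleton dimension dropped so it broadcasts over the batch instead of colliding with it. Replacing the forward method of the PositionalEncoder above:

    def forward(self, x: Tensor) -> Tensor:
        """
        Args:
            x: Tensor, shape [batch_size, seq_len, d_model] if batch_first=True,
               otherwise [seq_len, batch_size, d_model]
        """
        if self.batch_first:
            # pe[:seq_len] has shape [seq_len, 1, d_model]; squeezing the middle
            # dim gives [seq_len, d_model], which broadcasts over the batch dim.
            x = x + self.pe[:x.size(1)].squeeze(1)
        else:
            # [seq_len, 1, d_model] broadcasts over [seq_len, batch_size, d_model] as-is.
            x = x + self.pe[:x.size(0)]
        return self.dropout(x)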
