maxjcohen / transformer

Implementation of Transformer model (originally from Attention is All You Need) applied to Time Series.

Home Page: https://timeseriestransformer.readthedocs.io/en/latest/

License: GNU General Public License v3.0

Transformers for Time Series

Implementation of Transformer model (originally from Attention is All You Need) applied to Time Series (Powered by PyTorch).

Transformer model

Transformers are attention-based neural networks designed to solve NLP tasks. Their key features are:

linear complexity in the dimension of the feature vector ;
parallelisation of the computation over a sequence, as opposed to sequential computing ;
long-term memory, as we can look at any input time step directly.

This repo focuses on their application to time series.

Dataset and application as metamodel

Our use case is modeling a numerical simulator for building consumption prediction. To this end, we created a dataset by sampling random inputs (building characteristics and usage, weather, ...) and collecting the simulated outputs. We then convert these variables to a time series format and feed them to the transformer.

Adaptations for time series

In order to perform well on time series, a few adjustments had to be made:

The embedding layer is replaced by a generic linear layer ;
The original positional encodings are removed. A "regular" version, better matching the day/night patterns of the input sequence, can be used instead ;
A window is applied on the attention map to limit backward attention, and focus on short term patterns.
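
The windowing idea in the last item can be sketched in plain Python. This is a minimal illustration, not the repository's code: the helper name local_window_mask is hypothetical, and it assumes a symmetric window in which a query at step i may only attend to keys within attention_size steps of i. Whether the actual implementation is symmetric or also blocks future steps is not stated here.

```python
from typing import List


def local_window_mask(K: int, attention_size: int) -> List[List[bool]]:
    """Return a K x K matrix where True marks a (query, key) pair to mask out.

    Assumption: the window is symmetric around each query position.
    """
    return [[abs(i - j) > attention_size for j in range(K)] for i in range(K)]


mask = local_window_mask(K=6, attention_size=1)

# Print the map: "." means the query (row) may attend to the key (column),
# "x" means the pair is masked.
for row in mask:
    print("".join("x" if masked else "." for masked in row))
```

In an attention layer, the masked entries would typically be set to -inf before the softmax so that they receive zero weight.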

Installation

All required packages are listed in requirements.txt and are expected to run with python3.7. Note that you may have to install PyTorch manually if you are not using pip with a Debian distribution: head over to the PyTorch installation page. Here are a few lines to get started with pip and virtualenv:

$ apt-get install python3.7
$ pip3 install --upgrade --user pip virtualenv
$ virtualenv -p python3.7 .env
$ . .env/bin/activate
(.env) $ pip install -r requirements.txt

Usage

Downloading the dataset

The dataset is not included in this repo and must be downloaded manually. It comprises two files: dataset.npz contains all input and output values, and labels.json is a detailed list of the variables. Please refer to #2 for more information.

Running training script

Using jupyter, run the default training.ipynb notebook. All adjustable parameters can be found in the second cell. Be careful with BATCH_SIZE, as we use it to parallelize head and time chunk calculations.

Outside usage

The Transformer class can be used out of the box, see the docs for more info.

from tst import Transformer

net = Transformer(d_input, d_model, d_output, q, v, h, N, TIME_CHUNK, pe)

Building the docs

To build the doc:

(.env) $ cd docs && make html

transformer's Issues

Input sequence with padding

Thanks for sharing this project! I have a question regarding inputs with padding.

In my dataset, the sequences have different lengths, so I had to pad / truncate them to the same length K. I use zero vectors for the padded positions, which means that in some input sequences (K, d_input), some vectors (of size d_input) at the left end are zero vectors.

Ideally, I'd hope that those padded positions won't affect the loss or take part in the computation of subsequent non-padded positions. After looking at your code, it seems the current implementation cannot guarantee this. If we add the position embedding to the zero vectors, they will have non-zero values, and will therefore affect the loss and the computation of subsequent positions.

Please correct me if I am wrong. I am not quite familiar with PyTorch. Is there any easy fix for masking those padded positions?
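
One common way to keep padded positions out of the attention computation is to mask their scores before the softmax. The sketch below shows the idea on a single row of scores in plain Python; masked_softmax and the is_padding flags are hypothetical names, and nothing here is taken from the repository's code.

```python
import math
from typing import List


def masked_softmax(scores: List[float], is_padding: List[bool]) -> List[float]:
    """Softmax over one row of attention scores, ignoring padded key positions."""
    # Padded keys get a score of -inf, so exp() turns them into exactly 0.
    masked = [float("-inf") if pad else s for s, pad in zip(scores, is_padding)]
    peak = max(masked)
    if peak == float("-inf"):
        # Every key is padding: return all zeros instead of dividing by zero.
        return [0.0] * len(scores)
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Two padded steps on the left, two real steps on the right.
weights = masked_softmax([0.5, 1.0, 2.0, 0.1], [True, True, False, False])
print(weights)
```

The same flags could be used to leave padded steps out when averaging the loss.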

Using own dataset for time-series forecasting

Hi,
Firstly, thanks for such an interesting repo. I am fairly new to using transformers for Time Series Forecasting, so apologies if my questions are basic. After having gone through the Docs, I have a few queries:-

How can we use the Transformer model here for forecasting on our own datasets? Suppose, I have a Pandas dataframe with 20000 rows and 34 columns, wherein each column has a measurement parameter. For example:-

DateTime      Parameter1   Parameter2   ...   Parameter34
2020-01-01    1.2          2.3          ...   6.3
2020-01-02    4.4          1.4          ...   1.5
2020-01-03    1.3          5.6          ...   5.5
etc.

For now, it seems I can pass my own dataframe directly to dataLoader Class instead of the ozeDataset. But I am not sure if this is the correct way?

From the docs, I understand that d_input and d_output are related to the dimensions of the dataset. I believe if we are predicting 34 parameters, the d_output would be 34, but how is d_input decided please?

When making predictions using
predictions = np.empty(shape=(len(dataloader.dataset),k1 ,k2))
are k1 and k2 related to the intervals of the prediction windows? For example, if we want to predict 34 parameters, how do we decide k1 and k2?

Thanks a lot. I am sure this Repo will be very useful for many others, but would be great if you could address these basic questions.

x + residual size mismatch

Hi, thanks for making this implementation available. I am following the tutorial but I am encountering a size mismatch error when I call net(inputs) on my timeseries data. My input is 1 x K x d_input but the output of the self-attention layer appears to be truncated to K-5 and thus cannot be added to the residual.

     86         x = self._selfAttention(query=x, key=x, value=x)
     87         x = self._dopout(x)
---> 88         x = self._layerNorm1(x + residual)
     89 
     90         # Feed forward

RuntimeError: The size of tensor a (3864) must match the size of tensor b (3869) at non-singleton dimension 1

I am using the same parameters as the training.ipynb

d_model = 100 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 4 # Number of heads
N = 4 # Number of encoder and decoder to stack
attention_size = 24 # Attention window size
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = "window"
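
A quick arithmetic check on the numbers in the error message, offered as an observation rather than a confirmed diagnosis: 3864 is the largest multiple of 24 that does not exceed 3869, which would be consistent with the sequence being cut into whole windows and the remaining 5 steps being dropped. The helper covered_length is hypothetical and only reproduces that arithmetic; using 24 as the window length is an assumption based on attention_size above.

```python
def covered_length(K: int, window: int) -> int:
    """Number of time steps covered by whole, non-overlapping windows."""
    return (K // window) * window


K = 3869        # sequence length reported as "tensor b" in the error
window = 24     # assumption: window length equal to attention_size

covered = covered_length(K, window)
dropped = K - covered
print(covered, dropped)  # 3864 5
```

If that is what happens, trimming or padding the input so that K is a multiple of the window length before calling net(inputs) may avoid the mismatch.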

NameError: chunk_mode "True" not understood. Must be one of chunk, window or None.

Hello again, I just noticed you might want to change the default value of chunk_mode (currently it is set to boolean True).

from tst import transformer
from tst.transformer import Transformer
model = Transformer(d_input=1, d_model=4, d_output=1, q=8, v=8, h=8, N=2)
Traceback (most recent call last):
  File "/home/giulia/projects/transformer/venv/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-8d6b13c1d865>", line 1, in <module>
    model = Transformer(d_input=1, d_model=4, d_output=1, q=8, v=8, h=8, N=2)
  File "/home/giulia/projects/transformer/tst/transformer.py", line 76, in __init__
    chunk_mode=chunk_mode) for _ in range(N)])
  File "/home/giulia/projects/transformer/tst/transformer.py", line 76, in <listcomp>
    chunk_mode=chunk_mode) for _ in range(N)])
  File "/home/giulia/projects/transformer/tst/encoder.py", line 59, in __init__
    f'chunk_mode "{chunk_mode}" not understood. Must be one of {", ".join(chunk_mode_modules.keys())} or None.')
NameError: chunk_mode "True" not understood. Must be one of chunk, window or None.

Getting training.ipynb to run

Hi,
first of all thank you very much for making your code available!

I am having problems getting the code to work with the Oze Challenge dataset. I use
npz_check(Path('datasets'), 'dataset')
and it succeeds in producing datasets/dataset.npz only if I use the file ozechallenge_benchmark/labels.json. If I use the file transformer/labels.json I get this error:
KeyError: "['initial_temperature'] not in index"

I go on and use datasets/dataset.npz produced with the notebook transformer/training.ipynb with
DATASET_PATH = 'datasets/dataset.npz'
d_input = 27
d_output = 8
(I did not change d_input and d_output values from the values found in your repo)

I added two prints in the class OzeDataset, so that when I run the cell
ozeDataset = OzeDataset(DATASET_PATH)
I obtain these shapes for the _x and the _y tensors:
_x.shape = torch.Size([7500, 18, 691])
_y.shape = torch.Size([7500, 8, 672])

I had to change the line
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (23000, 1000, 1000))
into
dataset_train, dataset_val, dataset_test = random_split(ozeDataset, (5500, 1000, 1000))
to not have an error from random_split

but when I ran the training cell, I get the following error:
RuntimeError: size mismatch, m1: [144 x 691], m2: [27 x 64] at /tmp/pip-req-build-as628lz5/aten/src/TH/generic/THTensorMath.cpp:41
(see entire error at the end of this message)

What am I doing wrong? I also tried with
d_input = 37
as you suggested in another post, but I get the same error
RuntimeError: size mismatch, m1: [144 x 691], m2: [37 x 64] at /tmp/pip-req-build-as628lz5/aten/src/TH/generic/THTensorMath.cpp:41

In the end, I am not particularly interested in the Oze dataset; I would just like to be able to run your code to understand which input dimensions it needs, so that I can make sure my input has the same dimensions. So if it is easier for you, it would be sufficient for me to feed your model some tensors filled with random values and be able to make training.ipynb run.

Thank you for your help!
Camilla

RuntimeError Traceback (most recent call last)
in
12
13 # Propagate input
---> 14 netout = net(x.to(device))
15
16 # Comupte loss

/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

~/WORK/BVP/Transformers/transformer/tst/transformer.py in forward(self, x)
119
120 # Embeddin module
--> 121 encoding = self._embedding(x)
122
123 # Add position encoding

/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/linear.py in forward(self, input)
89
90 def forward(self, input: Tensor) -> Tensor:
---> 91 return F.linear(input, self.weight, self.bias)
92
93 def extra_repr(self) -> str:

/opt/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py in linear(input, weight, bias)
1674 ret = torch.addmm(bias, input, weight.t())
1675 else:
-> 1676 output = input.matmul(weight.t())
1677 if bias is not None:
1678 output += bias

RuntimeError: size mismatch, m1: [144 x 691], m2: [37 x 64] at /tmp/pip-req-build-as628lz5/aten/src/TH/generic/THTensorMath.cpp:41

General Questions for implementation.

For your implementation, what were the dimensions of your input and output? In other words, can I feed a multivariate time series into this model?
And if I do, should I expect a multivariate output, i.e. input dimensions [4,10] and output dimensions [3,10]?
If the inputs were multivariate, I assume that you used a neural network to embed that "information" into some kind of space analogous to word embedding spaces?
In the documentation, could you say more about what type of positional embedding you used? You mention "A "regular" version"; could you provide more detail?
"A window is applied on the attention map to limit backward attention, and focus on short term patterns." Since encoders and decoders can have different masks, where specifically did you apply the window?

dataset

Where is the download address of the dataset? thank you

Some questions about the prediction

First of all, thanks for your wonderful work. When I train my data with this model, the following problems arise:

The loss during training is acceptable, but the difference between the predicted results and the true results is huge. The predicted results do not change over time.
In fact, no matter which model is used, the predicted results do not change over time.
My dataset has 2000 samples, the sequence length is 125, and there are 11 variables in X and one variable in Y.
Attached is my data.

data.zip
dataset_22_125.zip
labels.zip

__init__.py in src

Hi,

I tried installing the library in google colab and was unable to run the training.py code.

Got the following error:
ModuleNotFoundError: No module named 'src'

As per the issue as discussed in
https://stackoverflow.com/questions/8953844/import-module-from-subfolder
the reason for the above error could be the missing __init__.py file in the src directory.

Thanks.

Accuracy vs LSTM

Have you compared the results of Transformer vs LSTM in time series prediction?

question about decoder input

Hi,

Thanks for this excellent implementation!

I have been playing with this model, and am wondering about a small detail regarding the decoder input. In here, you use the encoding output as the decoder input (which also serves as memory for the decoder layer here), instead of using the output target as the decoder input (as is given by trg in this post for the translation task). Do you have any idea why we do not use the output target as the decoder input? I assume that with proper masking, future information would not be included in the prediction, but I am not sure whether I am missing anything. Thank you!

Can this be setup as a PyPI module?

I'd love to be able to pip install it

the transformer to be applied to classification

How should I change the transformer so that it can be applied to classification, such as seq2seq (many to many)? How should I change the last layer of the model?
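
For many-to-many classification, one usual arrangement is to let the last layer produce one score per class at every time step and to take a softmax (for probabilities) or an argmax (for labels) over the class axis. The sketch below illustrates this on nested lists; softmax and predict_classes are hypothetical helpers, and treating d_output as the number of classes is an assumption, not something the repository documents here.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one time step's class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_classes(sequence_logits: List[List[float]]) -> List[int]:
    """Pick the highest-scoring class at each time step."""
    return [max(range(len(step)), key=step.__getitem__) for step in sequence_logits]


# One sequence with K = 2 time steps and 3 classes (d_output = 3, assumed).
sequence_logits = [[0.1, 2.0, -1.0], [3.0, 0.0, 0.5]]
probabilities = [softmax(step) for step in sequence_logits]
labels = predict_classes(sequence_logits)
print(labels)  # [1, 0]
```

With such an output, a cross-entropy loss over the class axis would normally replace a regression loss.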

How do I set d_model, q, v, h, N, dropout, attention_size value?

Hi, I have a question. If I want to use the transformer part on my image datasets, where the images are 256 pixels high and 128 pixels wide and each group contains 3000 images, how do I set the d_model, q, v, h, N, dropout and attention_size values?

Performance Evaluation

I think it is worthwhile to evaluate the performance of the transformer. I have an indication that it is performing slower when compared to LSTM as of now.

We’re improving the state of scalable GPU computing in Python.

-- Matthew Rocklin

This post covers Python, performance, and GPUs. It lays out the current status, and describes future work.

It might be worth evaluating the performance boost from these techniques.

RuntimeError: Expected object of scalar type Byte but got scalar type Bool for argument #2 'mask'

Thanks for your code.
When running the code 'training.ipynb', I got a RuntimeError as follows:
[Epoch 1/30]: 0%| | 0/5000 [00:00<?, ?it/s]
Traceback (most recent call last):

File "C:\Anaconda3\envs\pytorch\transformer-master\training.py", line 111, in
netout = net(x.to(device))

File "C:\Anaconda3\envs\py37\lib\site-packages\torch\nn\modules\module.py", line 493, in call
result = self.forward(*input, **kwargs)

File "C:\Anaconda3\envs\pytorch\transformer-master\tst\transformer.py", line 131, in forward
encoding = layer(encoding)

File "C:\Anaconda3\envs\py37\lib\site-packages\torch\nn\modules\module.py", line 493, in call
result = self.forward(*input, **kwargs)

File "C:\Anaconda3\envs\pytorch\transformer-master\tst\encoder.py", line 86, in forward
x = self._selfAttention(query=x, key=x, value=x)

File "C:\Anaconda3\envs\py37\lib\site-packages\torch\nn\modules\module.py", line 493, in call
result = self.forward(*input, **kwargs)

File "C:\Anaconda3\envs\pytorch\transformer-master\tst\multiHeadAttention.py", line 91, in forward
self._scores = self._scores.masked_fill(attention_mask, float('-inf'))

RuntimeError: Expected object of scalar type Byte but got scalar type Bool for argument #2 'mask'

Can you tell me how to fix it?

Citation Bibtex

Hi @maxjcohen, great application, thank you!

I'm wondering, what citation (preferably bibtex) can I use to cite your work?

Chunk, window, attention_size

Hello, nice work! Since this information is not currently included in the documentation, I was wondering what is exactly the role of chunks, how are they different from windows, and what does it mean when they are used together with an attention_size? Could you give some examples when one would want to use chunks vs windows vs attention_size?
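
As a reading aid, the sketch below shows one plausible interpretation of the two terms: chunks as non-overlapping segments of the sequence, and windows as overlapping segments taken with a smaller step. This is an assumption drawn from the names only, not from the documentation; split_chunks and split_windows are hypothetical helpers. Under this reading, attention_size would be a separate setting that limits how far each step can attend inside whatever segment is being processed.

```python
from typing import List, Sequence


def split_chunks(seq: Sequence[int], chunk_size: int) -> List[List[int]]:
    """Non-overlapping segments; a trailing partial segment is dropped."""
    stop = len(seq) - chunk_size + 1
    return [list(seq[i:i + chunk_size]) for i in range(0, stop, chunk_size)]


def split_windows(seq: Sequence[int], window_size: int, step: int) -> List[List[int]]:
    """Overlapping segments: consecutive windows share window_size - step items."""
    stop = len(seq) - window_size + 1
    return [list(seq[i:i + window_size]) for i in range(0, stop, step)]


time_steps = list(range(8))
chunks = split_chunks(time_steps, chunk_size=4)
windows = split_windows(time_steps, window_size=4, step=2)
print(chunks)   # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(windows)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```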

About params settings when using dataset_CAPT_v7.npz

Hi, when I set the params like

Model parameters

d_model = 64 # Latent dim
q = 8 # Query size
v = 8 # Value size
h = 8 # Number of heads
N = 4 # Number of encoder and decoder to stack
attention_size = 12 # Attention window size
dropout = 0.2 # Dropout rate
pe = None # Positional encoding
chunk_mode = None

d_input = 38 # From dataset
d_output = 8 # From dataset

the error comes at RuntimeError: size mismatch, m1: [144 x 691], m2: [38 x 64] at /Users/distiller/project/conda/conda-bld/pytorch_1587428061935/work/aten/src/TH/generic/THTensorMath.cpp:4
Then I changed d_input and d_output to
d_input = 691 # From dataset
d_output = 672 # From dataset
to match the shapes in the error.

Then this error appears:
RuntimeError: The size of tensor a (18) must match the size of tensor b (8) at non-singleton dimension 1
so I would have to add a linear layer in the transformer to work around the problem.
Can you help me? Thanks.
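
The shapes in these messages (691 and 672 on the last axis, 18 and 8 on the middle one) look as if the tensors are laid out as (batch, features, time), whereas the model is described elsewhere on this page as taking (batch_size, K, d_input). If that reading is right, the axes need swapping rather than d_input being set to 691. The sketch below shows the swap for a single sample in plain Python; to_time_major is a hypothetical helper, and whether a transpose alone is enough depends on how dataset.npz was built.

```python
from typing import List


def to_time_major(sample: List[List[float]]) -> List[List[float]]:
    """Turn one (features, time) sample into (time, features)."""
    return [list(step) for step in zip(*sample)]


# Toy sample: 2 features observed over 3 time steps.
sample = [
    [1.0, 2.0, 3.0],     # feature 0 over time
    [10.0, 20.0, 30.0],  # feature 1 over time
]
time_major = to_time_major(sample)

print(len(time_major), len(time_major[0]))  # 3 2
```

With PyTorch tensors the equivalent operation on a whole batch would be a transpose of axes 1 and 2.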

Possibility for time series anomaly detection?

Thank you for your outstanding work. I would like to know whether the Transformer can be applied to unsupervised time-series anomaly detection tasks, similar to reconstruction-based Seq2Seq or Autoencoder models. If so, where does this code need to be changed?
In addition, there is one point in your code that I don't understand: why are the decoder's inputs the encoder's outputs? They should only be used as memory. Looking forward to your reply!

runtimeerror

Traceback (most recent call last):
File "D:/transformer-master/transformer-master/training.py", line 81, in
dataloader_val, epochs=EPOCHS, pbar=pbar, device=device)
File "D:\transformer-master\transformer-master\src\utils\search.py", line 20, in fit
netout = net(x.to(device))
File "C:\anaconda\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "D:\transformer-master\transformer-master\tst\transformer.py", line 129, in forward
encoding = layer(encoding)
File "C:\anaconda\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "D:\transformer-master\transformer-master\tst\encoder.py", line 86, in forward
x = self._selfAttention(query=x, key=x, value=x)
File "C:\anaconda\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "D:\transformer-master\transformer-master\tst\multiHeadAttention.py", line 97, in forward
self._scores = self._scores.masked_fill(attention_mask, float('-inf'))
RuntimeError: Expected object of scalar type Byte but got scalar type Bool for argument #2 'mask'

Hello, thanks for your great works, I'm confused with the dataset.

Hello sir, I'm confused about the dataset. Could you share dataset_57M.npz or another demo dataset?
I just don't know the dataset's structure.

Can time series A be used to predict time series B?

I have read the docs of this GitHub project. In some notebooks, I found that time series A was used to predict time series B. I would like to know whether the Transformer proposed in this project can use time series A to predict time series B.

What's the need for a different positional encoding?

transformer/tst/utils.py, line 32 in 1ac4b34:

def generate_regular_PE(length: int, d_model: int, period: Optional[int] = 24) -> torch.Tensor:

Looks like this one is just sin?
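
To make the comparison concrete, the sketch below puts the sinusoidal encoding from Attention is All You Need next to a fixed-period one. The second function assumes, as this issue suggests, that the "regular" version is a single sine of the given period repeated across all d_model columns; it is an illustration in plain Python, not the body of generate_regular_PE.

```python
import math
from typing import List


def original_pe(length: int, d_model: int) -> List[List[float]]:
    """Sinusoidal encoding: sin on even columns, cos on odd columns,
    with wavelengths growing geometrically across columns."""
    pe = [[0.0] * d_model for _ in range(length)]
    for pos in range(length):
        for col in range(d_model):
            angle = pos / (10000 ** ((2 * (col // 2)) / d_model))
            pe[pos][col] = math.sin(angle) if col % 2 == 0 else math.cos(angle)
    return pe


def regular_pe(length: int, d_model: int, period: int = 24) -> List[List[float]]:
    """Assumed 'regular' encoding: one sine with a fixed period, e.g. 24
    hourly steps per day, copied into every column."""
    return [[math.sin(2 * math.pi * pos / period)] * d_model for pos in range(length)]


daily = regular_pe(length=48, d_model=4)
classic = original_pe(length=48, d_model=4)
```

Under this assumption every row of the regular encoding repeats after 24 steps, which is what would tie it to a day/night cycle, while the original encoding mixes many wavelengths and never repeats exactly over such a short range.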

Question about input of the decoder

tst/transformer.py, line 138:

# Decoding stack
decoding = encoding

# Add position encoding
if self._generate_PE is not None:
    positional_encoding = self._generate_PE(K, self.d_model)
    positional_encoding = positional_encoding.to(decoding.device)
    decoding.add(positional_encoding)

for layer in self.layers_decoding:
    decoding = layer(decoding, encoding)

Why are both input parameters of the decoder the output of the encoder? Shouldn't one of the inputs be a future sequence?

can you explain more on dimension arguments to transformer class ?

d_model here is the output encoding dimension from the positional encoding. What are d_input and d_output? Shouldn't the value be 1 for both variables?

d_input : Model input dimension.
d_model : Dimension of the input vector.
d_output: Model output dimension.

Can you explain this in relation to original paper ? (the application here to time-series )
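
One way to read the three arguments, pieced together from other statements on this page (the embedding is a linear layer, and the forward pass takes a tensor of shape (batch_size, K, d_input)), is as the feature sizes at three points in the network. The sketch below only tracks shapes; shape_flow is a hypothetical helper, and the output shape is an assumption. On this reading, d_input and d_output would be 1 only for a univariate series.

```python
from typing import Dict, Tuple

Shape = Tuple[int, int, int]


def shape_flow(batch_size: int, K: int, d_input: int,
               d_model: int, d_output: int) -> Dict[str, Shape]:
    """Assumed tensor shapes at each stage; K is the number of time steps."""
    return {
        "input": (batch_size, K, d_input),              # raw variables per step
        "after input linear layer": (batch_size, K, d_model),
        "after encoder/decoder stacks": (batch_size, K, d_model),
        "output": (batch_size, K, d_output),            # predicted variables per step
    }


# Example values taken from parameter lists quoted in other issues on this page.
shapes = shape_flow(batch_size=8, K=672, d_input=27, d_model=64, d_output=8)
for stage, shape in shapes.items():
    print(f"{stage}: {shape}")
```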

The training loss is a constant from beginning

Hi Max, thank you for the nice project!
I have a problem when I run the model with my data. I changed the input and output parameters according to my dataset, but when I trained the model, the training loss was constant from the beginning, and so was the validation loss. I have reduced the learning rate, but it didn't help.
The test loss was huge and the accuracy very poor.
[Epoch 1/15]: 100%|██████████| 10000/10000 [00:32<00:00, 306.73it/s, loss=80.6, val_loss=80.5]
[Epoch 2/15]: 100%|██████████| 10000/10000 [00:33<00:00, 299.26it/s, loss=80.5, val_loss=80.5]
[Epoch 3/15]: 100%|██████████| 10000/10000 [00:33<00:00, 299.21it/s, loss=80.5, val_loss=80.5]
......

By the way, my data is a DataFrame, and I used DataLoader to organize it; the input size is 7 and the output size is 1. I used MSELoss (when I used OZELoss, the training loss was NaN).

So do you know where the mistake is? Looking forward to your reply.

RuntimeError: The size of tensor a (896) must match the size of tensor b (14) at non-singleton dimension 0

Hello, I'm working on time series classification with the transformer. There are 10 classes, 14 features and 1 label (categorical -> LabelEncoder), with d_input = 14, d_output = 10, d_model?, window_size = 16, but I am getting the error below. What should I do, and why does y appear to have only one dimension?

[Epoch 1/2]: 0%| | 0/31514 [00:00<?, ?it/s]torch.Size([31514, 16, 14])
torch.Size([31514])
torch.Size([6092, 16, 14])
torch.Size([6092])
31514 6092
Running on the GPU
Using device cuda:0
torch.Size([64, 16, 14])
torch.Size([64])
[Epoch 1/2]: 0%| | 0/31514 [00:00<?, ?it/s]torch.Size([896, 16, 14])
torch.Size([896, 16, 14])

error------

RuntimeError Traceback (most recent call last)
in ()
149 print(y.shape)
150 optimizer.zero_grad()
--> 151 netout = net(x)
152 loss = loss_function(netout, y)
153 loss.backward()

5 frames
/content/multiHeadAttention.py in forward(self, query, key, value, mask)
92 print(queries)
93 print(keys)
---> 94 self._scores = ([email protected]) / np.sqrt(K)
95
96 # Compute local map mask

RuntimeError: The size of tensor a (896) must match the size of tensor b (14) at non-singleton dimension 0

Here is code:


def create_datasetX(dataset,look_back):
  dataX=[]
  row=0
  while(row+look_back) < len(dataset):
    dataX.append(dataset[row:(row+look_back)])
    row=row+3
  return np.array(dataX)

def create_datasetY(dataset,look_back):
  dataY=[]
  col=0
  while(col+look_back) < len(dataset):
    dataY.append(dataset[(col+look_back)])
    col=col+3
  return np.array(dataY)  



#Accuracy : 0.110
from numpy import vstack,argmax
from pandas import read_csv
import torch
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
from torch import Tensor
from torch.utils.data import Dataset,DataLoader,random_split
from torch.nn import *
import pandas as pd 
import torch.nn as nn
import torch.optim as optim
from loss import OZELoss
from transformer import Transformer
import seaborn as sns
from tqdm import tqdm
import datetime
from utils_ import compute_loss
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from scipy.stats import zscore
#from plot_functions import map_plot_function, plot_values_distribution, plot_error_distribution, plot_errors_threshold, plot_visual_sampl
class CSVDataset(Dataset):
    def __init__(self):
        KINEMATICS_USECOLS = [c-1 for c in [39, 40, 41, 51, 52, 53, 57, 58, 59, 60, 70, 71, 72, 76,77]]
        trainX = []
        trainY = []
        filenamesTrainX = ['C001.txt','C002.txt','C003.txt','C004.txt','C005.txt',
                          'D001.txt','D002.txt','D003.txt','D004.txt','D005.txt',
                          'E001.txt','E002.txt','E003.txt','E004.txt','E005.txt',
                          'F001.txt','F002.txt','F003.txt','F004.txt','F005.txt',       
                          'G001.txt','G002.txt','G003.txt','G004.txt','G005.txt',
                          
                           'I001.txt','I002.txt','I003.txt','I004.txt','I005.txt'
                          ]

        for fname in filenamesTrainX:
            trainXdata = pd.read_csv(fname,sep=',',usecols=KINEMATICS_USECOLS)
            self.X, self.y = trainXdata.values[:, :-1], trainXdata.values[:, -1]
            self.X = self.X.astype(np.float)          
            mean = np.mean(self.X, axis=(0, 1))
            std = np.std(self.X, axis=(0, 1))
            self.X = (self.X - mean) / (std + np.finfo(float).eps)
            self.X=self.X.astype(np.float32)
            #M = np.max(self.X, axis=(0, 1))
            #m = np.min(self.X, axis=(0, 1))
            self.X, self.y = self.X.astype('float32'), LabelEncoder().fit_transform(self.y)
            self.X = create_datasetX(self.X, look_back) 
            self.y = create_datasetY(self.y, look_back) 
            trainX.extend(self.X)
            trainY.extend(self.y)
        trainX=np.array(trainX)
        trainY=np.array(trainY)
        self.X = trainX
        self.y = trainY
        self.X= torch.Tensor(self.X)
        self.y= torch.Tensor(self.y)
        print(self.X.shape)
        print(self.y.shape)
    def __len__(self):
        return len(self.X)
    def __getitem__(self, idx):
    #  if torch.is_tensor(idx):
     #  idx = idx.tolist()
      return [self.X[idx], self.y[idx]]
class TestDataset(Dataset):
    def __init__(self):
        KINEMATICS_USECOLS = [c-1 for c in [39, 40, 41, 51, 52, 53, 57, 58, 59, 60, 70, 71, 72, 76,77]]
        from sklearn.preprocessing import MinMaxScaler
        from scipy.stats import zscore
        trainX = []
        trainY = []
        filenamesTrainX = ['B001.txt','B002.txt','B003.txt','B004.txt','B005.txt']
        for fname in filenamesTrainX:
            trainXdata = pd.read_csv(fname,sep=',',usecols=KINEMATICS_USECOLS)
            self.X, self.y = trainXdata.values[:, :-1], trainXdata.values[:, -1]
            self.X = self.X.astype(np.float)
            mean = np.mean(self.X, axis=(0, 1))
            std = np.std(self.X, axis=(0, 1))
            self.X = (self.X - mean) / (std + np.finfo(float).eps)
            #M = np.max(self.X, axis=(0, 1))
            #m = np.min(self.X, axis=(0, 1))
            #self.X = (self.X - m) / (M - m + np.finfo(float).eps)
            look_back = 16
            self.X, self.y = self.X.astype('float32'), LabelEncoder().fit_transform(self.y)
            self.X=self.X.astype(np.float32)
            self.X = create_datasetX(self.X, look_back) 
            self.y = create_datasetY(self.y, look_back) 
            trainX.extend(self.X)
            trainY.extend(self.y)
        trainX=np.array(trainX)
        trainY=np.array(trainY)
        self.X = trainX
        self.y = trainY
        self.X= torch.Tensor(self.X)
        self.y= torch.Tensor(self.y)
        print(self.X.shape)
        print(self.y.shape)
    def __len__(self):
        return len(self.X)
    def __getitem__(self, idx):
      if torch.is_tensor(idx):
        idx = idx.tolist()
      return [self.X[idx], self.y[idx]]
def prepare_data():
    dataset = CSVDataset()
    train_dl  = DataLoader(dataset, batch_size=BATCH_SIZE,shuffle=True, num_workers=NUM_WORKERS,pin_memory=False)
    return train_dl
BATCH_SIZE =64
NUM_WORKERS= 0
LR =0.01
EPOCHS=2
d_model =16
q=14
v=14
h = 14 
N = 7
attention_size = None
dropout = 0.5
pe = None
chunk_mode = None 
d_input = 14
d_output= 10
look_back=16
train_dl = prepare_data()
dataset_test =TestDataset()
test_dl = DataLoader(dataset_test, batch_size=BATCH_SIZE,shuffle=False, num_workers=NUM_WORKERS)
print(len(train_dl.dataset),  len(test_dl.dataset))
sns.set()
if torch.cuda.is_available():
    device = torch.device("cuda:0")  # you can continue going on here, like cuda:1 cuda:2....etc. 
    print("Running on the GPU")

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")

net = Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=attention_size, dropout=dropout, chunk_mode=chunk_mode, pe=pe)
optimizer = optim.Adam(net.parameters(), lr=LR)
#optimizer = optim.SGD(net.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)
#optimizer = optim.SGD(net.parameters(), lr=LR, momentum=0.9)
loss_function = nn.CrossEntropyLoss() 
model_save_path = f'models/model_{datetime.datetime.now().strftime("%Y_%m_%d__%H%M%S")}.pth'
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(train_dl.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
      for idx_batch, (x, y) in enumerate(train_dl):
        print(x.shape)
        print(y.shape)
        optimizer.zero_grad()
        netout = net(x)
        loss = loss_function(netout, y)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        pbar.set_postfix({'loss': running_loss/(idx_batch+1)})
        pbar.update(x.shape[0])

# evaluate the model
def evaluate_model(test_dl, model):
    predictions, actuals = list(), list()
    for i, (inputs, targets) in enumerate(test_dl):
        yhat = model(inputs)
        yhat = yhat.detach().numpy()
        actual = targets.numpy()
        yhat = argmax(yhat, axis=1)
        actual = actual.reshape((len(actual), 1))
        yhat = yhat.reshape((len(yhat), 1))
        predictions.append(yhat)
        actuals.append(actual)
    predictions, actuals = vstack(predictions), vstack(actuals)
    print(predictions)
    print(actuals)
    acc = accuracy_score(actuals, predictions)
    return acc

acc = evaluate_model(test_dl, net)
print('Accuracy: %.3f' % acc)

transformer (forward pass)

Parameters: | x (Tensor) – torch.Tensor of shape (batch_size, K, d_input).

What is K?

Regular PE

How different is regular_PE from the original one? Is it explained anywhere in the docs or the paper?

Is this implementation autoregressive for a seq2seq task?

I can't quite see where to pass inputs to the decoder. It seems to just copy the encoder inputs?

When should I use Attention window size?

Hi, when should I use the attention window size, and what is it used for? I recently tried to apply the transformer to load forecasting; should I use the original MultiHeadAttention instead?

RuntimeError: Sizes of tensors must match except in dimension 1. Got 249 and 250 (The offending index is 0)

I have:

reconstruct_spect_model = Transformer(d_input=128, d_output=128, d_model=1024, N=2, q=8, v=8, h=4).to(DEVICE)
pred = reconstruct_spect_model(x)

where x has size of (4, 499, 128). 4 items in the batch, 499 is the sequence length and 128 features.

However, I get the error:

  File "/home/shamoon/.local/share/virtualenvs/speech-reconstruction-7HMT9fTW/lib/python3.8/site-packages/tst/multiHeadAttention.py", line 206, in forward
    queries = torch.cat(torch.cat(self._W_q(query).chunk(self._h, dim=-1), dim=0).chunk(n_chunk, dim=1), dim=0)
RuntimeError: Sizes of tensors must match except in dimension 1. Got 249 and 250 (The offending index is 0)
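
One likely reading of this traceback, assuming the chunked attention path is active and n_chunk is 2 (neither is stated in the report): torch.chunk splits the length-499 axis into pieces of 250 and 249, and pieces of different length cannot then be concatenated along the batch axis. The sketch below reproduces only the chunk-size arithmetic in plain Python; `chunk_sizes` and `padded_length` are hypothetical helpers.

```python
import math


def chunk_sizes(length, n_chunk):
    """Sizes produced when an axis of `length` is split the way torch.chunk does:
    every chunk has ceil(length / n_chunk) items except possibly the last."""
    size = math.ceil(length / n_chunk)
    sizes = []
    remaining = length
    while remaining > 0:
        sizes.append(min(size, remaining))
        remaining -= size
    return sizes


def padded_length(length, n_chunk):
    """Smallest length >= `length` that splits into equal chunks."""
    return math.ceil(length / n_chunk) * n_chunk


print(chunk_sizes(499, 2))    # [250, 249]
print(padded_length(499, 2))  # 500
```

If this reading is right, padding or trimming the sequence to a multiple of n_chunk (500 or 498 here) would give equal chunks; whether that suits the data is a separate question.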

Why the sigmoid in the transformer?

I don't understand why there's a sigmoid in

transformer/tst/transformer.py

Line 152 in 2ebed9c

output = torch.sigmoid(output)

As far as I understand, the sigmoid maps an output to the range [0,1] which is useful in binary classification problems. I don't get how it works here though.
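
A sigmoid on the output is compatible with regression only if the targets also lie in [0, 1]. The sketch below assumes min-max scaled targets, which this issue does not confirm; `min_max_scale` and `inverse_scale` are hypothetical helpers and the target values are made up.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def min_max_scale(values):
    """Scale targets into [0, 1]; returns the scaled values and the (lo, hi) bounds."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], (lo, hi)


def inverse_scale(scaled, bounds):
    """Map a value in [0, 1] back to the original target units."""
    lo, hi = bounds
    return scaled * (hi - lo) + lo


targets = [18.0, 21.5, 25.0]               # made-up values
scaled, bounds = min_max_scale(targets)    # [0.0, 0.5, 1.0]
prediction = sigmoid(0.0)                  # 0.5, inside (0, 1)
print(inverse_scale(prediction, bounds))   # 21.5
```

With unscaled targets outside [0, 1], a sigmoid output could never reach them, which may be the source of the confusion.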

Where can I download the dataset?

Hello, thanks for the wonderful work!

Can you give more details about the dataset? And where can I download the dataset?

Thank you!

A more detailed Readme.

Hello,
I am also interested in Transformers applied to time series, but I am having a hard time understanding the repo.
I would suggest adding more info to the Readme to show the capabilities of the library.

Is it a forecasting model? Multistep ahead or single step?
Is it a classification model (like InceptionTime), or can it be adapted to be one?
Is it a regression model?

It would also be nice to have a small graph in the Readme, like the ones in the training notebooks, to show the potential.
Great work btw, and I am very interested to collaborate.
We run a Time Series Study group on the fast.ai forums:

Here for V1 of the library (soon outdated)
Here for the updated V2 fastai library.

Label time range

If the input is 8 points between t-7 and t, should the corresponding labels be for times t+1 to t+8?

What does the output time window look like?

Understand the dataset dimension

I am using the npz_check function to generate the npz file. Before it dumps the data to npz, I printed the dimensions of R, X and Z. They are R: (7500, 19), X: (7500, 8, 672), Z: (7500, 18, 672). There are 7500 rows and 672 entries per time series, as described by the challenge. 19, 8 and 18 are the numbers of labels for R, X and Z defined in labels.JSON. Why is R not defined with 672 entries, and is there a particular reason to define it like this?

The npz_check function and these variables are defined in this file: https://github.com/maxjcohen/ozechallenge_benchmark/blob/master/src/utils.py#L218

Where to download the dataset?

Hi sir, can you tell me where to download the dataset? Thanks!

Link to datasets?

Could you please provide links to datasets/dataset_57M.npz and labels.json?

The output of my Transformer is the same for all time steps

My model is:


from tst import Transformer
import torch.nn as nn


class AudioEncoder(nn.Module):
    def __init__(self, d_inp, d_out):
        super(AudioEncoder, self).__init__()
        self.encoder1 = nn.Linear(d_inp, d_out)
        self.encoder2 = nn.Linear(d_out, d_out)

    def forward(self, inp):
        out = self.encoder1(inp)
        out = nn.ReLU()(out)

        out = self.encoder2(out)
        out = nn.ReLU()(out)

        return out


class AudioDecoder(nn.Module):
    def __init__(self, d_inp, d_out):
        super(AudioDecoder, self).__init__()
        self.decoder1 = nn.Linear(d_inp, d_out)
        self.decoder2 = nn.Linear(d_out, d_out)

    def forward(self, inp):
        out = self.decoder1(inp)
        out = nn.ReLU()(out)

        out = self.decoder2(out)
        out = nn.Tanh()(out)

        return out


class AudioReconstructor(nn.Module):

    def __init__(self, d_input, d_output, d_model, N, q, v, h, chunk_mode, pe, dim_embedding):
        super(AudioReconstructor, self).__init__()

        self.transformer = Transformer(d_input=dim_embedding, d_output=dim_embedding, d_model=d_model, N=N, q=q, v=v, h=h, chunk_mode=chunk_mode, pe=pe, pe_period=800)

        self.audio_encoder = AudioEncoder(d_inp=d_input, d_out=dim_embedding)

        self.audio_decoder = AudioDecoder(d_inp=dim_embedding, d_out=d_output)


    def forward(self, src):
        out = self.audio_encoder(src)

        # Second, do the transformer operation
        out = self.transformer(out)
        out = nn.ReLU()(out)

        out = self.audio_decoder(out)

        return out

If I print the values after self.transformer(out), I get the same values for each time step. Any ideas why that might be?

IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2) when doing keys.transpose

I ran the code from the examples, but net(x) fails when the model calls keys.transpose, around line 91 of multiHeadAttention.py:

# Scaled Dot Product
self._scores = torch.bmm(queries, keys.transpose(1, 2)) / np.sqrt(K)

The error is:
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
The keys tensor should have shape (batch_size, K, d_model), but it has 2 dimensions instead of 3, which triggers the error.

Why is the keys tensor not (batch_size, K, d_model)?

Code to reproduce the error:

import numpy as np
from matplotlib import pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
import seaborn as sns

sns.set()
from tst import Transformer


BATCH_SIZE = 4
NUM_WORKERS = 0
LR = 1e-2
EPOCHS = 2
attention_size = 24  # Attention window size
dropout = 0.2  # Dropout rate
pe = None  # Positional encoding
chunk_mode = None

K = 300  # Time window length
d_model = 48  # Latent dim
q = 8  # Query size
v = 8  # Value size
h = 4  # Number of heads
N = 4  # Number of encoder and decoder to stack
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}")

d_input = 11  # From dataset
d_output = 1  # From dataset
# from excel_to_oze import getdata  # local helper, only needed for the commented-out getdata() call
data = np.random.random_sample((300,12))
# data = getdata()
data = data.astype(np.float32)
# data = torch.from_numpy(data,device=device)
data = torch.tensor(data, device=device)
print(data.shape)
dataloader = DataLoader(data,
                        batch_size=BATCH_SIZE,
                        shuffle=True,
                        num_workers=NUM_WORKERS
                        )

# Load transformer with Adam optimizer and MSE loss function
net = Transformer(d_input, d_model, d_output, q, v, h, N, attention_size=attention_size, dropout=dropout,
                  chunk_mode=chunk_mode, pe=pe).to(device)

optimizer = optim.Adam(net.parameters(), lr=LR)
loss_function = nn.MSELoss()

# Prepare loss history
hist_loss = np.zeros(EPOCHS)
for idx_epoch in range(EPOCHS):
    running_loss = 0
    with tqdm(total=len(dataloader.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
        for idx_batch, batch in enumerate(dataloader):
            optimizer.zero_grad()
            x = batch[:, :-1]
            y = batch[:, -1]
            print(x)
            # print(y)
            # Propagate input
            netout = net(x)

            # Compute loss
            loss = loss_function(netout, y)

            # Backpropagate loss
            loss.backward()

            # Update weights
            optimizer.step()

            running_loss += loss.item()
            pbar.set_postfix({'loss': running_loss / (idx_batch + 1)})
            pbar.update(BATCH_SIZE)

    hist_loss[idx_epoch] = running_loss / len(dataloader)
plt.plot(hist_loss, 'o-')
print(f"Loss: {float(hist_loss[-1]):5f}")

version 3
python 3.7.6
torch '1.5.0+cu101'
windows 10
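
In the reproduction script above, data has shape (300, 12), so the DataLoader yields batches of shape (4, 12) and x = batch[:, :-1] is 2-D, while the documented input shape is (batch_size, K, d_input). The plain-Python sketch below shows one way to group rows into windows first. `shape` and `to_windows` are hypothetical helpers, and non-overlapping windows of K = 100 rows are an arbitrary choice for illustration.

```python
def shape(nested):
    """Shape of a regular, non-empty nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def to_windows(rows, K):
    """Group consecutive rows into non-overlapping windows of K time steps."""
    n_windows = len(rows) // K  # a trailing partial window is dropped
    return [rows[i * K:(i + 1) * K] for i in range(n_windows)]


rows = [[0.0] * 12 for _ in range(300)]  # stands in for the (300, 12) array
print(shape(rows))                       # (300, 12): rows only, no time axis

windows = to_windows(rows, K=100)
x = [[step[:-1] for step in w] for w in windows]  # 11 input features per step
y = [[step[-1:] for step in w] for w in windows]  # 1 target feature per step
print(shape(x), shape(y))                # (3, 100, 11) (3, 100, 1)
```

Batching such windows gives the three-dimensional input that keys.transpose(1, 2) needs.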

Issue while training the model

First, I would like to thank you for creating and publishing this repo.
Also, thank you for taking the time to answer people's questions.
I was trying to run the training part and ran into this problem:
RuntimeError: CUDA out of memory. Tried to allocate 1.72 GiB (GPU 0; 2.00 GiB total capacity; 109.01 MiB already allocated; 1.14 GiB free; 112.00 MiB reserved in total by PyTorch)
Is there another way for me to train the model?
Thank you so much for your attention.
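
The attention score matrix grows with the square of the sequence length, which can be what exhausts a small GPU. The estimate below assumes float32 scores of shape (batch_size * h, K, K), as the torch.bmm(queries, keys.transpose(1, 2)) line quoted in another issue suggests. The batch sizes, h and K used here are placeholders, not the reporter's settings, and `attention_scores_bytes` is a hypothetical helper.

```python
def attention_scores_bytes(batch_size, h, K, bytes_per_value=4):
    """Rough size in bytes of one (batch_size * h, K, K) score tensor (float32 by default)."""
    return batch_size * h * K * K * bytes_per_value


GIB = 1024 ** 3
for batch_size in (64, 8):
    size = attention_scores_bytes(batch_size, h=4, K=672)
    print(f"batch_size={batch_size}: {size / GIB:.2f} GiB per attention layer")
```

This is a lower bound, since intermediate results are also kept for the backward pass. Reducing the batch size or the sequence length shrinks it directly; a chunked mode may help as well.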

Runtime error: mat1 dim 1 must match mat2 dim 0

Hey,
Thanks for providing code for time series prediction using a Transformer. I am using the Oze energy challenge dataset with 7500 samples. I am stuck at the following lines:

        # Propagate input
        netout = net(x.to(device))

        # Compute loss
        loss = loss_function(y.to(device), netout)

I am probably not setting the d_input, d_model and d_output parameters correctly. Please help me run this code on this dataset.

how to apply this transformer model to a long univariate time series data

Dear all,
First of all, thank you Max for sharing this amazing transformer implementation with us!
I was able to run training.ipynb by following the solutions in #34, using the x_train_LsAZgHU.csv and y_train_EFo1WyE.csv datasets. I have read the data descriptions in ozechallenge_benchmark:
for the input dataset (x_train_LsAZgHU.csv), there are 7500 rows and 12116 columns;
for the output dataset (y_train_EFo1WyE.csv), there are 7500 rows and 5377 columns;
each time series has 28 * 24 = 672 time steps.

My question is: if I have a long single-variable time series, for example a NumPy array of shape [1, 1000000], how do I feed it to this transformer model? I would like to precisely predict the trend of this series.

How should I prepare the initial npz dataset?
How should I set d_input, d_output, and the batch size?
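
One common way to feed a single long series to a sequence model is to cut it into fixed-length windows, so that each sample has shape (K, 1) and d_input = d_output = 1. The sketch below assumes that layout and pairs each input window with the `horizon` values that follow it. `make_windows` and its parameters are hypothetical, and the repo's npz format is not reproduced here.

```python
def make_windows(series, K, horizon, stride=1):
    """Cut a 1-D series into (input, target) pairs.

    Each input holds K consecutive values; its target holds the next
    `horizon` values. Every value is wrapped in a list so one sample has
    shape (K, 1), i.e. a single feature per time step.
    """
    inputs, targets = [], []
    last_start = len(series) - K - horizon
    for start in range(0, last_start + 1, stride):
        window = series[start:start + K]
        future = series[start + K:start + K + horizon]
        inputs.append([[v] for v in window])
        targets.append([[v] for v in future])
    return inputs, targets


series = list(range(20))                 # stands in for the long array
x, y = make_windows(series, K=8, horizon=8, stride=4)
print(len(x), len(x[0]), len(x[0][0]))   # 2 8 1
print([v[0] for v in y[0]])              # [8, 9, 10, 11, 12, 13, 14, 15]
```

Another issue on this page reports that the model's output has the same length K as its input, so choosing horizon equal to K may be the simplest fit. The batch size is then just the number of windows per step and is limited mainly by memory.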

Get Error/Applying Univariate Time Series Dataset

Hi, I am trying to use a univariate time series dataset. I got this error:

KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 521170

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17036/184118730.py in <module>
9 running_loss = 0
10 with tqdm(total=len(dataloader_train.dataset), desc=f"[Epoch {idx_epoch+1:3d}/{EPOCHS}]") as pbar:
---> 11 for idx_batch, (x, y) in enumerate(dataloader_train):
12 try:
13 print(f'idx_batch {dataloader_train} is {enumerate[dataloader_train]}')

~\anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __next__(self)
519 if self._sampler_iter is None:
520 self._reset()
--> 521 data = self._next_data()
522 self._num_yielded += 1
523 if self._dataset_kind == _DatasetKind.Iterable and \

~\anaconda3\lib\site-packages\torch\utils\data\dataloader.py in _next_data(self)
559 def _next_data(self):
560 index = self._next_index() # may raise StopIteration
--> 561 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
562 if self._pin_memory:
563 data = _utils.pin_memory.pin_memory(data)

~\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py in fetch(self, possibly_batched_index)
42 def fetch(self, possibly_batched_index):
43 if self.auto_collation:
---> 44 data = [self.dataset[idx] for idx in possibly_batched_index]
45 else:
46 data = self.dataset[possibly_batched_index]

~\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py in <listcomp>(.0)
42 def fetch(self, possibly_batched_index):
43 if self.auto_collation:
---> 44 data = [self.dataset[idx] for idx in possibly_batched_index]
45 else:
46 data = self.dataset[possibly_batched_index]

~\anaconda3\lib\site-packages\torch\utils\data\dataset.py in __getitem__(self, idx)
309
310 def __getitem__(self, idx):
--> 311 return self.dataset[self.indices[idx]]
312
313 def __len__(self):

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
3022 if self.columns.nlevels > 1:
3023 return self._getitem_multilevel(key)
-> 3024 indexer = self.columns.get_loc(key)
3025 if is_integer(indexer):
3026 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:

KeyError: 521170

I'd appreciate it if you could let me know whether your code is suitable for univariate time series, and how to solve this error.
I used the code from this link:
https://github.com/maxjcohen/transformer

Thanks
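
The traceback suggests the DataLoader is indexing a pandas DataFrame with an integer, which pandas treats as a column label rather than a row position, hence KeyError: 521170. A minimal sketch of a map-style dataset that returns rows by position is shown below. `RowDataset` is a hypothetical name, and converting the frame with DataFrame.to_numpy() before wrapping it is an assumption about the reporter's setup.

```python
class RowDataset:
    """Map-style dataset: an integer index returns the (x, y) pair for that row.

    `inputs` and `targets` are expected to support positional indexing,
    e.g. lists or NumPy arrays (for a DataFrame, df.to_numpy() gives one).
    """

    def __init__(self, inputs, targets):
        if len(inputs) != len(targets):
            raise ValueError("inputs and targets must have the same length")
        self._inputs = inputs
        self._targets = targets

    def __len__(self):
        return len(self._inputs)

    def __getitem__(self, idx):
        return self._inputs[idx], self._targets[idx]


dataset = RowDataset(inputs=[[0.1], [0.2], [0.3]], targets=[[0.2], [0.3], [0.4]])
print(len(dataset), dataset[2])  # 3 ([0.3], [0.4])
```

An object like this exposes __len__ and __getitem__, which is what torch.utils.data.DataLoader needs from a map-style dataset.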

self._encoderDecoderAttention defined but not used

Hi Max, thank you for sharing this amazing repo. I have a question regarding the architecture. In the Decoder module (https://github.com/maxjcohen/transformer/blob/master/tst/decoder.py) you define a self._encoderDecoderAttention (L62) that you do not seem to use in the forward method. Instead, in the forward method both the self-attention and the encoder-decoder attention use the same layer self._selfAttention. Is this a bug maybe? I compared the architecture with this https://nlp.seas.harvard.edu/2018/04/03/attention.html and I would guess that the encoder-decoder attention should be done using self._encoderDecoderAttention. What do you think? Can I also please ask you to tell me if you wrote the architecture yourself from scratch? Thanks in advance! :)

Do you plan a tensorflow version?

input problem.

Hi, here I come again.
The decoder layer of the original transformer takes both the encoder output and y as input. In your transformer, I can only find the encoder output being passed to the decoder layer. Why?

prediction problems

Hi, I have read the discussion in #5, but I am still confused about whether the model is applicable to prediction problems. I am working on applying the transformer to load forecasting. I want to use the load values of the previous 168 time steps to predict the load values of the next 24 time steps, so I set the input shape (x) to [batch_size, 168, 1], which is the input of the encoder layer, and the target shape (y) to [batch_size, 24, 1], which is also part of the input of the decoder layer. This does not work with your code because K does not match (168 != 24): the output shape is still [batch_size, 168, 1] instead of the [batch_size, 24, 1] I want. Can the original transformer, or your transformer, be applied to my problem?
Also, why is K the same in the encoder layer and the decoder layer in your transformer? In other implementations I have looked at, the sequence length (K) can differ between the encoder and the decoder.
Thank you very much!
