oliverguhr / transformer-time-series-prediction
proof of concept for a transformer-based time series prediction model
License: MIT License
I reproduced the plot in the readme for the multistep predictor of a sinusoid, but after changing some hyperparameters I'm seeing a mismatch between the loss and the predictive power. Below are the losses for a run with default parameters and a run with another set of hyperparameters labelled "best-1lyr" (lr=0.00843, decay factor=0.9673, num features=110):
Both converge to a stable result after ~60 epochs; however, their predictions are not stable, nor do they line up with predictive power. Below are GIFs of the predictor output for the default parameters and my other set of parameters, respectively:
The run with default parameters appears to jump out of one locally convex region and into another around the 50th epoch. It actually does this twice, and the 100th-epoch prediction is the one with higher-magnitude noise at the start and end of the prediction. The run with the new parameters seems to remain in a fixed region of the cost surface, yet it has consistently much lower predictive power than the run with default parameters while at the same time achieving a lower loss. Any ideas what issue(s) I might be running into?
One thing to note is that there appears to be some randomness in training even though the code sets random seeds for torch and numpy. I get different loss curves for multiple runs of the default parameters, but, oddly, they only diverge after exactly 15 epochs; otherwise the training curves look pretty much the same.
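If it helps, nondeterministic GPU kernels can cause exactly this kind of run-to-run divergence even when torch and numpy are seeded. A minimal sketch (assuming a reasonably recent PyTorch; not code from this repository) of forcing deterministic behavior:

```python
import random
import numpy as np
import torch

def seed_everything(seed: int = 42):
    """Seed the common RNG sources and opt into deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds CUDA generators on recent versions
    # Raise an error whenever a nondeterministic op would run
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False

seed_everything(0)
print(torch.rand(2))  # identical across runs with the same seed
```

With this in place, two runs that still diverge point to something outside the RNGs (e.g. data ordering or nondeterministic reductions).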
I wonder whether it would be possible to add covariates as input.
Hi, thanks for providing the code.
In transformer-singlestep.py, I change model = TransAm().to(device) to model = TransAm(num_layers=5).to(device). When I train the model, I find that it outputs the same value (0) for all inputs. I thought that increasing the number of layers would make the model more expressive, but it results in worse performance.
Have you met this problem before? I am not sure whether I need to change the training settings.
(BTW, I also tried the suggestion in another issue: I removed the self. in self.encoder_layer, but still can't make training converge.)
Prediction for epoch 100:
Hello, I want to ask: in this code the prediction is one-dimensional, but in my problem the prediction is three-dimensional. How should I modify this code?
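One possible direction, sketched under the assumption that the model ends with a Linear decoder as TransAm does: widen the final layer so it emits 3 values per timestep (the class and parameter names below are illustrative, not the repository's):

```python
import torch
import torch.nn as nn

class ThreeDimHead(nn.Module):
    """Illustrative head: project d_model encoder outputs to 3 target dims."""
    def __init__(self, d_model=64, out_dim=3):
        super().__init__()
        self.decoder = nn.Linear(d_model, out_dim)

    def forward(self, encoded):
        # encoded: (seq_len, batch, d_model) -> (seq_len, batch, 3)
        return self.decoder(encoded)
```

The loss would then be computed against targets of shape (seq_len, batch, 3); the input side needs a matching 3-feature projection.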
I believe that
input = torch.stack([item[0] for item in data]).view((input_window,batch_len,1))
should be changed to
input = torch.stack([item[0] for item in data]).T.unsqueeze(-1)
This is because the ordering is incorrect if the first form is used: the inputs that go into the transformer are scrambled, i.e. they do not proceed as t=0, t=1, t=2, ...
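A tiny demonstration of the difference: view() merely reinterprets the row-major buffer, while the transpose actually puts the time axis first (toy shapes, not the repository's data):

```python
import torch

batch_len, input_window = 2, 3
# Each row is one sequence: [[0, 1, 2], [3, 4, 5]]
stacked = torch.arange(batch_len * input_window).reshape(batch_len, input_window)

# view() reinterprets the flat buffer, interleaving the two sequences
wrong = stacked.view((input_window, batch_len, 1))
# transpose puts time first, then add the feature dimension
right = stacked.T.unsqueeze(-1)

print(wrong[:, 0, 0].tolist())  # [0, 2, 4] -- not a real sequence
print(right[:, 0, 0].tolist())  # [0, 1, 2] -- the first sequence, in order
```

So with view(), "sequence 0" seen by the transformer mixes timesteps from both batch elements.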
Thanks very much for your code. However, there are some differences between your code and the PyTorch tutorial "Sequence-to-Sequence Modeling with nn.Transformer and torchtext" in the class TransAm.
According to https://www.zhihu.com/question/67209417/answer/1264503855, adding self. to encoder_layers causes self.encoder_layers to be registered among the module's parameters, but only self.transformer_encoder is used in the forward pass; the actual weights are the nlayers copies of the nn.TransformerEncoderLayer that the encoder makes internally.
That is to say, self.encoder_layers does not participate in the model's computation, so it receives no gradient in backward, which leads to training errors.
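For context, the pattern in question can be sketched like this (a minimal stand-in, not the repository's TransAm class): nn.TransformerEncoder deep-copies the template layer nlayers times, so the template itself should stay a local variable rather than become a registered submodule.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    def __init__(self, d_model=32, nhead=4, nlayers=2):
        super().__init__()
        # Template layer: TransformerEncoder deep-copies it nlayers times.
        # Writing self.encoder_layer here would register extra parameters
        # that are never used in forward() and thus never get gradients.
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, nlayers)

    def forward(self, src):
        # src: (seq_len, batch, d_model)
        return self.transformer_encoder(src)
```

With this layout, every parameter in named_parameters() belongs to the copies inside transformer_encoder and participates in backward.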
Can I export the output as a CSV file instead of a PNG image?
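One way to do this is to write the arrays that the plotting code receives with numpy instead of (or in addition to) pyplot.savefig. A sketch, where predictions and truth are placeholder arrays rather than variables from the repository:

```python
import numpy as np

# Placeholder data standing in for the model's output and ground truth
predictions = np.array([0.1, 0.5, 0.9])
truth = np.array([0.0, 0.6, 1.0])

# Two columns, one row per timestep
np.savetxt("output.csv",
           np.column_stack([truth, predictions]),
           delimiter=",",
           header="truth,prediction",
           comments="")  # comments="" keeps the header line unprefixed
```

The resulting file opens directly in any spreadsheet or with pandas.read_csv.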
The create_inout_sequences function returns labels with the same length as the input sequence.
Like
But I saw that many LSTM works make the data sequences like:
so I changed the create_inout_sequences function to return a 100-step sequence and 1 label for each sample. It raises an error:
Traceback (most recent call last):
File "E:/Trans.py", line 254, in
train_data, val_data = get_data()
File "E:/Trans.py", line 120, in get_data
train_sequence = create_inout_sequences(train_data, input_window)
File "E:/Trans.py", line 98, in create_inout_sequences
return torch.FloatTensor(inout_seq)
ValueError: expected sequence of length 100 at dim 2 (got 1)
I don't know how to fix it. I also wonder why the label length is made the same as the sequence length. Is it possible to make the samples consist of a 100-step sequence and a 1-step label?
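For what it's worth, the ValueError comes from building a single FloatTensor out of pairs whose two elements have different lengths (100 vs. 1), which is ragged. A hedged sketch of a variant (function name and return convention are mine, not the repository's) that sidesteps this by returning two tensors:

```python
import torch

def create_inout_sequences_1label(data, input_window):
    """100-step input, 1-step label. Returns two rectangular tensors
    instead of one ragged FloatTensor (the ragged case is what raises
    'expected sequence of length 100 ... (got 1)')."""
    seqs, labels = [], []
    for i in range(len(data) - input_window):
        seqs.append(data[i:i + input_window])
        labels.append(data[i + input_window])
    return torch.stack(seqs), torch.stack(labels).unsqueeze(-1)

x = torch.arange(105.)
seqs, labels = create_inout_sequences_1label(x, 100)
print(seqs.shape, labels.shape)  # torch.Size([5, 100]) torch.Size([5, 1])
```

The rest of get_batch would then need to index the two tensors separately rather than unpack (input, target) pairs.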
Thank you very much for the code. I applied the model to load decomposition, but I found that the final output is a straight line. Is it because there is no connection between the input data and the label? In the prediction problem, the input and the label differ only in time step, but in the decomposition task it is difficult to establish a connection between the input values and the label values. I would like to know how to deal with this situation.
The two example scripts shown handle only the sin data, not the temperature data.
Where is the code for the temperature dataset?
Thanks
127 seq_len = min(batch_size, len(source) - 1 - i)
128 data = source[i:i+seq_len]
--> 129 input = torch.stack(torch.stack([item[0] for item in data]).chunk(input_window, 1)) # 1 is feature size
130 target = torch.stack(torch.stack([item[1] for item in data]).chunk(input_window, 1))
131 return input, target
IndexError: invalid index of a 0-dim tensor. Use tensor.item() in Python or tensor.item&lt;T&gt;() in C++ to convert a 0-dim tensor to a number
Which version of torch are you using? I use the newest version in Colab, but an error occurs. Please help me.
I have not been able to get the same results as those presented here, not even after 1000 epochs.
In the code the number of epochs is 10 and 100 for single-step and multi-step respectively. I was wondering if those were the lengths you trained for when posting the results?
I was wondering why the input to the multistep transformer has zeros of length output_window appended. Is there a reason why we can't do it the same way as for the single-step transformer? That is, instead of [0 1 2 3 4 0 0] as input and [0 1 2 3 4 5 6] as labels, use [0 1 2 3 4] for the input and [2 3 4 5 6] for the labels.
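To make the format being asked about concrete, here is the zero-padded multistep pair from the example above, built as tensors (toy values, not the repository's data pipeline):

```python
import torch

output_window = 2
series = torch.arange(7.)  # the window [0 1 2 3 4 5 6]

# Multistep format: the future output_window steps are replaced by zeros
# in the input, while the label keeps the true future values.
src = series.clone()
src[-output_window:] = 0
tgt = series

print(src.tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0, 0.0, 0.0]
print(tgt.tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

The shifted-label alternative in the question would instead pair series[:-output_window] with series[output_window:].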
I was reading and debugging the multi-step implementation to understand it better, and I've come across an interesting thing: it seems like the features and labels in training and evaluation are the same. Is this behavior correct? I thought that in a multi-step prediction problem the input features are delayed relative to the wanted labels; this way we have a window of the data's past behavior and we aim to predict its future behavior.
Hello, for multi-step prediction, how do you get the x and label y tensors? Could you use the sequence 1-100 to demonstrate it?
As the title says.
This seems to be a single-variable prediction, which uses only the sequence information of the time signal's own variable and does not use other features.
Thank you for uploading the code. I think the input data is history data, which shouldn't need to be masked; I don't understand this.
Thanks for your great work!
I have tried your single-step script to train and test on a certain time series dataset.
However, I noticed that although the predicted curve is close to the ground truth by mean squared error, it always seems to fall behind in predicting turning points of the curve.
For example, if the actual turning point appears at time point 10, my predicted turning point will probably appear at time point 11 or 12; the model seems unable to predict the turning point at the actual time.
Did you run into the same issue by any chance? Do you have any suggestions?
Hi, thanks for your work. May I ask what this line is used for? It seems you made the dataset smaller?
train_sequence = train_sequence[:-output_window]
I was wondering if this uses teacher forcing during training? And what terms did you use as the SOS and EOS tokens? :)
I have been trying to get the transformer to work on time series for over a month now, and it seems nearly impossible using the nn.Transformer model provided by PyTorch. Did you by any chance get the decoder in the original transformer to work as well?
Hello, the current situation is input [1 × 100] -> target [1 × 100], but what if the data is a CSV table, such as:

date        featuresA  featuresB  featuresC
2021/1/10   .........
2021/1/11   ........
2021/1/12   ...........
.......

Then the input is batch × 3 × 100 (3 features, 100 timesteps). How should I change your code? Thank you!
Hi,
For now, PositionalEncoding can only map a one-dimensional input to the encoder dimension. Please provide an efficient way to map multivariate features to the encoder dimension.
Thanks
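A common approach for the multivariate requests above (sketched here as an assumption, not code from this repository) is a learned Linear projection from n_features per timestep to d_model, applied before the positional encoding is added:

```python
import torch
import torch.nn as nn

class MultivariateEmbedding(nn.Module):
    """Hypothetical input projection: maps n_features per timestep to
    the encoder dimension d_model, so PositionalEncoding can then be
    added to a (seq_len, batch, d_model) tensor as usual."""
    def __init__(self, n_features=3, d_model=64):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)

    def forward(self, x):
        # x: (seq_len, batch, n_features) -> (seq_len, batch, d_model)
        return self.proj(x)
```

This replaces the implicit "1 feature = 1 scalar" assumption: the batch × features × timesteps data from the CSV case would be permuted to (timesteps, batch, features) and passed through this layer first.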