yuqinie98 / patchtst

An official implementation of PatchTST: "A Time Series is Worth 64 Words: Long-term Forecasting with Transformers" (ICLR 2023). https://arxiv.org/abs/2211.14730

License: Apache License 2.0

Python 88.94% Shell 11.06%

patchtst's People

Contributors

g0bel1n, koseoyoung, namctin, xkszltl, yuqinie98


patchtst's Issues

Question regarding decoder inputs

Hey guys, I really enjoyed reading the paper, and thanks for publishing the source code. I am working on a multivariate problem with 94 features and one output target, and I have a question about model inference/prediction.

In the predict method, we pass a decoder input that is first initialized with zeros matching batch_y's batch dimension, which makes sense, but it is then concatenated with batch_y.

    # decoder input
    dec_inp = torch.zeros([batch_y.shape[0], self.args.pred_len, batch_y.shape[-1]]).float()
    dec_inp = torch.cat([batch_y[:,:self.args.label_len,:], dec_inp], dim=1).float().to(self.device)

My question is: why are we concatenating batch_y values, given that at inference time in a real deployment we will not have batch_y?

Positional encoding in the code

    u = self.dropout(u + self.W_pos)
    self.W_pos = positional_encoding(pe, learn_pe, q_len, d_model)

In the code above, the positional_encoding function used to build self.W_pos does not appear to be defined anywhere.
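For reference, a minimal sketch of what such a helper could look like, assuming a standard sinusoidal table wrapped in an optionally learnable parameter. The names only mirror the call site; this is not the repo's actual implementation.

    import math
    import torch
    import torch.nn as nn

    def positional_encoding(pe, learn_pe, q_len, d_model):
        # Hypothetical sketch: 'sincos' builds the standard sinusoidal table,
        # anything else falls back to a zero-initialized table.
        if pe == 'sincos':
            pos = torch.arange(q_len, dtype=torch.float32).unsqueeze(1)
            div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                            * (-math.log(10000.0) / d_model))
            W_pos = torch.zeros(q_len, d_model)
            W_pos[:, 0::2] = torch.sin(pos * div)
            W_pos[:, 1::2] = torch.cos(pos * div)
        else:
            W_pos = torch.zeros(q_len, d_model)
        # learn_pe controls whether the table is trained with the model.
        return nn.Parameter(W_pos, requires_grad=learn_pe)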

More or Less features?

Hi Guys, I'm doing a school project and would appreciate some advice. I am doing multivariate forecasting for stocks. I want to predict stock "x" with the help of other stocks "y", "z" etc. Adding more features to the model can improve or hurt the model depending on the quality of the new data. Is there a way to determine which combination of features will deliver the best prediction ability?
Maybe a way to penalize features that worsen the model's accuracy and reward features that improve the accuracy?

Your help is appreciated!

Does part of the performance gain come from residual attention?

I noticed that the Transformer implementation here uses residual attention, which does not appear in some of the other baselines mentioned in the paper. Have you performed additional ablation studies to see the effect of residual attention on forecasting?
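For context, residual attention (as in RealFormer) carries the pre-softmax attention scores from one layer into the next and adds them to the current layer's scores. A minimal sketch of the mechanism, not the repo's exact module:

    import torch
    import torch.nn.functional as F

    def residual_attention(q, k, v, prev_scores=None):
        # q, k, v: [batch, heads, seq_len, d_k]
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        if prev_scores is not None:
            scores = scores + prev_scores      # residual link on the scores themselves
        attn = F.softmax(scores, dim=-1)
        # return the scores so the next layer can add them to its own
        return attn @ v, scores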

Location of datasets

Congratulations on the project, great work. I'm trying to do some test runs of the code. I've downloaded the datasets but I'm not sure where to place them. I keep getting the error: FileNotFoundError: [Errno 2] No such file or directory: '/data/datasets/public/ETDataset/ETT-small/ETTh1.csv'. Thanks in advance.

LogTrans Implementation

Thank you for the great work and the well-organized codebase!
Regarding LogTrans, it seems that the authors didn't release their official code. May I ask which implementation you used, and are you planning to release it?

Retraining of the model on new dataset

How do I retrain the model on a new dataset? For example, if I have trained the model on one stock's data (Apple) and I want to do incremental learning on another stock's data (Microsoft), how should I do it?
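For reference, a hedged sketch of the generic PyTorch pattern for continuing training from a saved checkpoint on new data. This is only illustrative; the repo's scripts do not expose this as a single flag as far as I can tell, and the Linear model below is just a stand-in.

    import torch
    import torch.nn as nn

    # Stand-in for a PatchTST model trained on the first stock and saved earlier.
    model = nn.Linear(96, 24)
    torch.save(model.state_dict(), 'checkpoint_apple.pth')

    # Incremental learning: reload the weights, then keep training with a
    # (usually smaller) learning rate on batches from the new stock's data.
    model.load_state_dict(torch.load('checkpoint_apple.pth'))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
    for x, y in [(torch.randn(32, 96), torch.randn(32, 24))]:   # new-data batches
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()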

About the self-supervised code

Thanks for your contribution.
After I run the command:
python patchtst_pretrain.py --dset ettm1 --mask_ratio 0.4
two files are created under PatchTST_self_supervised/saved_models/ettm1/: patchtst_pretrained_cw512_patch12_stride12_epochs-pretrain10_mask0.4_model1.pth (and loss.csv).
But when I then run the command:
python patchtst_finetune.py --dset ettm1 --pretrained_model <model_name>
I get an error like this:
FileNotFoundError: [Errno 2] No such file or directory: '/saved_models/ettm1/masked_patchtst/based_model/patchtst_pretrained_cw512_patch12_stride12_epochs-pretrain10_mask0.4_model1.pth'
What should <model_name> be? How can I run patchtst_finetune.py?
Thank you very much!

Input normalization twice - scaler and revin

While loading the data there is a z-score normalization z = (x - mean) / std of the input data:

    self.scaler = StandardScaler()
    self.scaler.fit(train_data.values)

Before the forward pass there is another z-score normalization of the input as part of the RevIN layer:

    def _normalize(self, x):
        x = x - self.mean
        x = x / self.stdev

Questions:

  1. Can you help me understand the difference between the two normalizations and why both are required by default?
  2. Why does RevInCB get denorm=True in fine-tuning but denorm=False in pretraining?
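For reference, a hedged sketch contrasting the two steps, assuming the usual setup: the StandardScaler uses statistics of the whole training split and is applied once at load time, while RevIN uses statistics of each individual input window, computed inside the forward pass and undone on the output. Names are illustrative, not the repo's exact code.

    import numpy as np
    import torch
    from sklearn.preprocessing import StandardScaler

    train = np.random.randn(1000, 7)                 # [num_train_steps, num_vars]

    # 1) Dataset-level z-score: one mean/std per variable over the training split.
    scaler = StandardScaler().fit(train)
    data = scaler.transform(train)

    # 2) RevIN-style instance normalization: one mean/std per window and variable.
    def revin_normalize(x, eps=1e-5):
        # x: [batch, seq_len, num_vars]
        mean = x.mean(dim=1, keepdim=True)
        stdev = torch.sqrt(x.var(dim=1, keepdim=True, unbiased=False) + eps)
        return (x - mean) / stdev, mean, stdev

    x = torch.tensor(data[:512], dtype=torch.float32).unsqueeze(0)   # [1, 512, 7]
    x_norm, mean, stdev = revin_normalize(x)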

run patchtst_finetune.py error

    args: Namespace(is_finetune=0, is_linear_probe=0, dset_finetune='exchange', context_points=512, target_points=96, batch_size=64, num_workers=0, scaler='standard', features='M', patch_len=12, stride=12, revin=1, n_layers=3, n_heads=16, d_model=128, d_ff=256, dropout=0.2, head_dropout=0.2, n_epochs_finetune=20, lr=0.0001, pretrained_model='./p_model.pth', finetuned_model_id=1, model_type='based_model')
    weight_path= saved_models/exchange/masked_patchtst/based_model/exchange_patchtst_finetuned_cw512_tw96_patch12_stride12_epochs-finetune20_model1
    number of patches: 42
    number of model params 920672
    Traceback (most recent call last):
      File "D:\pyyj\PatchTST-main\PatchTST_self_supervised\patchtst_finetune.py", line 235, in <module>
        out = test_func(weight_path)
      File "D:\pyyj\PatchTST-main\PatchTST_self_supervised\patchtst_finetune.py", line 200, in test_func
        out = learn.test(dls.test, weight_path=weight_path+'.pth', scores=[mse,mae])  # out: a list of [pred, targ, score]
      File "D:\pyyj\PatchTST-main\PatchTST_self_supervised\src\learner.py", line 258, in test
        if weight_path is not None: self.load(weight_path)
      File "D:\pyyj\PatchTST-main\PatchTST_self_supervised\src\learner.py", line 387, in load
        load_model(fname, self.model, self.opt, with_opt, device=device, strict=strict)
      File "D:\pyyj\PatchTST-main\PatchTST_self_supervised\src\learner.py", line 429, in load_model
        state = torch.load(path, map_location=device)
      File "C:\anaconda3\lib\site-packages\torch\serialization.py", line 771, in load
        with _open_file_like(f, 'rb') as opened_file:
      File "C:\anaconda3\lib\site-packages\torch\serialization.py", line 270, in _open_file_like
        return _open_file(name_or_buffer, mode)
      File "C:\anaconda3\lib\site-packages\torch\serialization.py", line 251, in __init__
        super(_open_file, self).__init__(open(name, mode))
    FileNotFoundError: [Errno 2] No such file or directory: 'saved_models/exchange/masked_patchtst/based_model/exchange_patchtst_finetuned_cw512_tw96_patch12_stride12_epochs-finetune20_model1.pth'

Linear Head

Hello, the paper says "Finally a flatten layer with linear head is used to obtain the prediction result". What does "linear head" mean here? Thank you.
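Roughly, the sentence describes flattening the per-patch encoder outputs of each channel into one vector and mapping it to the prediction horizon with a single nn.Linear. A hedged sketch of the idea, with illustrative shapes, not the exact Flatten_Head code:

    import torch
    import torch.nn as nn

    class SimpleFlattenHead(nn.Module):
        # Illustrative sketch: flatten [d_model x patch_num] per channel,
        # then one linear projection ("linear head") to the forecast horizon.
        def __init__(self, d_model, patch_num, target_window):
            super().__init__()
            self.flatten = nn.Flatten(start_dim=-2)
            self.linear = nn.Linear(d_model * patch_num, target_window)

        def forward(self, z):
            # z: [bs x nvars x d_model x patch_num]
            z = self.flatten(z)        # [bs x nvars x d_model*patch_num]
            return self.linear(z)      # [bs x nvars x target_window]

    head = SimpleFlattenHead(d_model=128, patch_num=42, target_window=96)
    out = head(torch.randn(8, 7, 128, 42))   # -> [8, 7, 96]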

Problem with the 'individual' hyperparameter

Dear author,
I see that in your code the hyperparameter individual defaults to 0, but the paper states that channel independence boosts performance. When I set individual to 1, the results actually became worse. What could be the cause of this?

ValueError: __len__() should return >= 0

File "/public/yxz/TimeSeriesForecast/V2+TS-library/PatchTST/PatchTST_supervised/exp/exp_main.py", line 102, in train
vali_data, vali_loader = self._get_data(flag='val')
File "/public/yxz/TimeSeriesForecast/V2+TS-library/PatchTST/PatchTST_supervised/exp/exp_main.py", line 43, in _get_data
data_set, data_loader = data_provider(self.args, flag)
File "/public/yxz/TimeSeriesForecast/V2+TS-library/PatchTST/PatchTST_supervised/data_provider/data_factory.py", line 44, in data_provider
print(flag, len(data_set))
ValueError: len() should return >= 0

Why is the test set empty??? Where is the bug?
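For context, a hedged sketch of the usual length computation in Autoformer-style datasets, which is typically where a negative __len__ comes from: if the split is shorter than seq_len + pred_len, no sliding window fits and the length goes negative.

    # Hedged sketch of the typical __len__ in this family of Dataset classes.
    def dataset_len(split_len, seq_len, pred_len):
        return split_len - seq_len - pred_len + 1

    # A validation/test split of 500 steps cannot hold a single window when
    # seq_len=512, so __len__ is negative and the DataLoader raises this error.
    print(dataset_len(split_len=500, seq_len=512, pred_len=96))   # -107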

Memory required to pretrain on Electricity and Traffic

Hi, could you please share the memory required to pretrain on the Electricity and Traffic datasets? It seems neither dataset fits on a single 32GB V100 due to the number of variates they have. Did you apply distributed training, or were you able to pretrain on a single A40? Thanks!

Question about Pre-Norm

Hello! This is a very interesting project, thank you for writing the paper and open sourcing the code. Quick question, did you try pre-norm transformer blocks at all, and if so what were the results vs. the post-norm described in the paper?
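For context, a minimal sketch of the difference being asked about, illustrative only and not the repo's block:

    import torch
    import torch.nn as nn

    d_model = 128
    norm = nn.LayerNorm(d_model)
    sublayer = nn.Linear(d_model, d_model)   # stand-in for attention / FFN
    x = torch.randn(8, 42, d_model)

    post_norm = norm(x + sublayer(x))        # post-norm: residual add, then normalize
    pre_norm = x + sublayer(norm(x))         # pre-norm: normalize, sublayer, then add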

Multi-GPU

Is there an option to run the training on multiple GPUs (single node)?
I would like to make the training faster via an effectively larger batch size.
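For reference, a hedged sketch of the plain single-node option in PyTorch, wrapping the model in nn.DataParallel so each batch is split across the visible GPUs. This is only illustrative; I have not checked which multi-GPU flags the repo's run scripts expose, and the Linear model is a stand-in.

    import torch
    import torch.nn as nn

    model = nn.Linear(512, 96)                  # stand-in for the PatchTST model
    if torch.cuda.device_count() > 1:
        # Splits every batch across the GPUs of a single node, which
        # effectively multiplies the per-step batch size.
        model = nn.DataParallel(model)
    model = model.to('cuda' if torch.cuda.is_available() else 'cpu')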

Model loading error during learning-rate search

Hello, I currently have a task of running self-supervised pretraining and fine-tuning on different datasets to search for hyperparameters.
When running multiple scripts on multiple GPUs on one server, searching for hyperparameters on different datasets, the following error appears:
[screenshot of the error]
My initial guess is that, because multiple programs search for the learning rate at the same time, they all save the intermediate checkpoint under the same name ./temp/current.pth, so the model that gets loaded back is not the one saved by my own pretraining run. I added the current GPU's ID to the "current" filename (each GPU runs only one script) so that the intermediate files saved by different runs are distinguished, but the error above still appears. How can this be solved? The screenshots below show the modification and the run results after the change:
[screenshots of the modification and the resulting output]

a question about head type of self-supervised PatchTST

Hi, I got a question about the head of self-supervised PatchTST.
As I understand it, self-supervised PatchTST uses a D × P linear layer (self.create_pretrain_head or PretrainHead) for pretraining, then removes this head and attaches a PredictionHead (Flatten_Head) for end-to-end fine-tuning or linear probing. Am I right?
By the way, there are two versions of PatchTST in PatchTST_self_supervised and PatchTST_supervised; which one is the latest version?
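For reference, a hedged sketch of the pretraining head shape described above: a single D x P linear applied patch-wise to reconstruct each masked patch. Illustrative only, not the repo's PretrainHead code.

    import torch
    import torch.nn as nn

    d_model, patch_len = 128, 12
    pretrain_head = nn.Linear(d_model, patch_len)   # D x P, shared across patches

    z = torch.randn(8, 7, 42, d_model)       # [bs, nvars, patch_num, d_model]
    recon = pretrain_head(z)                  # [8, 7, 42, patch_len]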

spatial sequence prediction

In my dataset, each sample lies along a piece of 1D space, which we have divided evenly into n segments. In each of these n segments, 4 known features are given, and I want to predict the other 2 target features over the original 1D space. Will this time-series prediction model help with spatial sequence prediction?

RevInCB and PatchMaskCB

In the current implementation the forward pass first applies normalization and then applies masking:

    cbs = [RevInCB(dls.vars)] if args.revin else []
    cbs += [PatchCB(patch_len=args.patch_len, stride=args.stride)]

Therefore the RevInCB mean and std are calculated on the unmasked input.
I think the RevInCB normalization can leak information about the masked patches and help the algorithm recover patterns that would otherwise stay hidden, if they differ significantly from the unmasked regions.
Is this the intended behavior?

About the application on the video dataset

Hi, thank you so much for such great work.
I noticed that the datasets you used are in .csv format. I would like to know whether this model's self-supervised task is effective for reconstruction or prediction on video datasets?

Channel-Independence

Hello,

Is there a way to allow for channel-mixing under the current implementation?

Thank you.

After the PatchTST encoder, why permute the last two dims?

Hello, you reshape u to [bs * nvars x patch_num x d_model] before the encoder:

    u = torch.reshape(x, (x.shape[0]*x.shape[1], x.shape[2], x.shape[3]))   # u: [bs * nvars x patch_num x d_model]

Why then permute z into [bs x nvars x d_model x patch_num]?

    z = self.encoder(u)                                            # z: [bs * nvars x patch_num x d_model]
    z = torch.reshape(z, (-1, n_vars, z.shape[-2], z.shape[-1]))   # z: [bs x nvars x patch_num x d_model]
    z = z.permute(0, 1, 3, 2)                                      # z: [bs x nvars x d_model x patch_num]

In the next step, z [bs x nvars x d_model x patch_num] is fed into the head module, where it passes through a flatten layer. Can I flatten z as (-1, patch_num, d_model) instead of (-1, d_model, patch_num)?

    z = self.backbone(z)   # z: [bs x nvars x d_model x patch_num]
    z = self.head(z)       # z: [bs x nvars x target_window]

    elif head_type == 'flatten':
        self.head = Flatten_Head(self.individual, self.n_vars, self.head_nf, target_window, head_dropout=head_dropout)

    x = self.flatten(x)
    x = self.linear(x)
    x = self.dropout(x)
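For reference, a small sketch showing that both flattening orders yield the same d_model * patch_num features per channel, so the linear head accepts either; only the ordering of the features (and hence of the learned weights) differs. This only illustrates the shapes and is not a statement about which ordering the authors intended.

    import torch
    import torch.nn as nn

    bs, nvars, patch_num, d_model = 8, 7, 42, 128
    z = torch.randn(bs, nvars, patch_num, d_model)

    flat_a = nn.Flatten(start_dim=-2)(z.permute(0, 1, 3, 2))   # flatten [d_model x patch_num]
    flat_b = nn.Flatten(start_dim=-2)(z)                        # flatten [patch_num x d_model]

    # Both are [bs, nvars, d_model * patch_num]; a head like
    # nn.Linear(d_model * patch_num, target_window) works with either ordering,
    # it just learns a correspondingly permuted weight matrix.
    print(flat_a.shape, flat_b.shape)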

Why use drop_last=True in test (and val) dataloader?

Hi,
First, thanks for the excellent paper and for sharing this repo. Great work!

I want to ask why you set drop_last=True for the test dataloader? By doing that, performance is not reported for all samples in the test dataset (some samples are dropped, which is not what we want). In addition, changes in the batch size lead to reporting performance on a different number of samples.
I've tested the difference by setting drop_last=False on the ILI dataset and the result is worse than the published one, although it is still the best published result I've seen so far.

I saw the same issue in the Autoformer repo and logged an issue (thuml/Autoformer#104). As a result they've now updated the code.

BTW, this is likely to also occur with other papers that seem to use a similar code base to Autoformer's.

About the range of the results

Dear authors, thank you very much for the excellent model you have proposed. When applying PatchTST to my own dataset, I found that the range of the model's output values differs greatly from the value range of the data in the dataset. After reading the code, I found that in the Dataset the scale option defaults to True. My question: how can I scale the output produced from the scaled data back to the original value range? Looking forward to your answer.
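For reference, the StandardScaler fitted inside the Dataset keeps the training-split mean and std, so predictions can be mapped back to the original range with its inverse transform. A hedged sketch assuming access to the fitted scaler (the Dataset classes in this code family generally expose an inverse_transform for this purpose):

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    train = np.random.randn(1000, 7) * 10 + 50       # data in its original range
    scaler = StandardScaler().fit(train)

    preds_scaled = np.random.randn(96, 7)            # model output in z-score space
    preds = scaler.inverse_transform(preds_scaled)   # back to the original value range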

About the results

Dear authors, thank you for your great work. When applying PatchTST to my own dataset (similar to the Weather dataset, spanning 2015-2021), I obtained the following results. Do you have any guidance on hyperparameter tuning?
[screenshot of results]

About Attention map visualization

Hello, may I ask how the attention map in Figure 6 of the paper is visualized?

  • Do you use the attention map from the training stage or the test stage for visualization? In either case, each training or testing epoch would generate an attention map, but the attention maps returned during training or testing do not seem to be saved.
  • Or is it visualized using the attention map from the model saved in the trained checkpoint .pth file?

Changing PatchTST's multi-head attention to ProbAttention

I noticed that the paper uses the Transformer encoder and decoder.
If I want to combine PatchTST with Informer and change the multi-head attention to ProbAttention,
which part of the code should I modify?

Data Loader use_time_features flag

Thank you for the excellent code base; it is very flexible and clear.
Comparing the data loaders between the supervised and self-supervised code, it seems:

  1. Self supervised has use_time_features=False by default https://github.com/yuqinie98/PatchTST/blob/main/PatchTST_self_supervised/src/data/pred_dataset.py#L97
  2. Supervised effectively always has use_time_features=True: https://github.com/yuqinie98/PatchTST/blob/main/PatchTST_supervised/data_provider/data_loader.py#L390

Questions:

  1. How important are the time features for model performance?
  2. Why does the behaviour differ between the self-supervised and supervised code?

exchange rate dataset

Thanks for your work! I noticed that there is no experimental result for the exchange rate dataset in the paper. Did you run experiments on the exchange_rate dataset? If so, could you provide the parameter settings that gave the best results on it?

About the fairness of comparisons with other transformer variants.

First of all, thank you for your contribution to this work. I have a question about the fairness of comparing PatchTST42 with other transformer variants, which seem to have a lookback window of 96 while PatchTST42 has a lookback window of 336 as mentioned in the article. Can you explain why this comparison is fair?

Torch.onnx.export error

from tqdm import tqdm
import torch
import pandas as pd
import numpy as np
import math
from torch import nn
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

data = np.genfromtxt('c:/data.csv', delimiter=",")
data=scaler.fit_transform(data.reshape(-1,1)).flatten()
# In[ ]:
iw = 96
ow = 15
train=data

from torch.utils.data import DataLoader, Dataset

class windowDataset(Dataset):
    def __init__(self, y, input_window=80, output_window=20, stride=3):
        L = y.shape[0]
        num_samples = (L - input_window - output_window) // stride + 1
        X = np.zeros([input_window, num_samples])
        Y = np.zeros([output_window, num_samples])
        print(X.shape,y.shape)
        for i in np.arange(num_samples):
            start_x = stride*i
            end_x = start_x + input_window
            X[:,i] = y[start_x:end_x]

            start_y = stride*i + input_window
            end_y = start_y + output_window
            Y[:,i] = y[start_y:end_y]

        # size: [num_samples, input_window, 1]
        X = X.reshape(X.shape[1], X.shape[0], 1) 
        Y = Y.reshape(Y.shape[1], Y.shape[0], 1)
        self.x = X
        self.y = Y
        self.len = len(X)

    def __getitem__(self, i):
        return self.x[i], self.y[i]
    def __len__(self):
        return self.len


# In[ ]:
train_dataset = windowDataset(train, input_window=iw, output_window=ow, stride=2)
train_loader = DataLoader(train_dataset, batch_size=512)
# In[ ]:
from PatchTST import PatchTST
class PT_config():
    def __init__(self):
        self.seq_len=iw
        self.pred_len=ow
        self.individual=0
        self.enc_in=1
        self.e_layers=3
        self.n_heads= 16 
        self.d_model= 128 
        self.d_ff= 256 
        self.dropout =0.2
        self.fc_dropout= 0.2
        self.head_dropout= 0
        self.patch_len =16
        self.stride =8
        self.padding_patch='end'
        self.revin=1
        self.affine=0
        self.subtract_last=0
        self.decomposition=1
        self.kernel_size=25

# In[ ]:
# # 3. Train
#device = torch.device("cuda")
device = torch.device("cpu")
lr = 1e-4

class modelParam():
    def __init__(self,label):
        self.model=PatchTST(configs=PT_config()).to(device)
        self.optimizer=torch.optim.Adam(self.model.parameters(), lr=lr)

    def epoch(self,epoch):
        self.epoch=epoch
        return self.epoch

# In[ ]:
criterion = nn.MSELoss() #0.58
# In[ ]:
PT=modelParam('PT')
PTmodel=PT.model.to(device)
optimizer=PT.optimizer
epoch=PT.epoch(2)  # number of training epochs
progress = tqdm(range(epoch))
PTmodel.train()
losses=[]
for i in progress:
    batchloss = 0.0
    for (inputs, outputs) in train_loader:
        optimizer.zero_grad()
        #'''
        result = PTmodel(inputs.float().to(device))
        loss = criterion(result, outputs.float().to(device))
        loss.backward()
        optimizer.step()
        batchloss += loss
    losses.append(batchloss.cpu().item())
    progress.set_description("PT- loss: {:0.6f}".format(batchloss.cpu().item() / len(train_loader)))

torch.save(PTmodel.to('cpu'), 'PTmodel.pth')

# In[ ]:
input_shape = (1, 96, 1)  
input_names = ["input"]  
output_names = ["output"]  


x = torch.randn(input_shape)

 
onnx_model_path = "PTmodel.onnx"  
dynamic_axes = {'input': {0: 'batch_size', 1: 'sequence_length', 2: 'input_dim'},   # dynamic axes of the input tensor
                'output': {0: 'batch_size', 1: 'sequence_length', 2: 'output_dim'}}  # dynamic axes of the output tensor
torch.onnx.export(PTmodel, x, onnx_model_path, input_names=input_names, output_names=output_names, dynamic_axes=dynamic_axes)


data.csv

How to patchify non-uniform data?

Thank you for the great work. I have a small question. Many real-world time series are non-uniformly sampled, but patching seems to require uniform sampling. How should this be handled?

Self Supervised vs Supervised

The paper is very dense and super informative, but I want to make sure I understand it and use the code correctly.
Suppose I have the following multivariate forecasting task (multivariate input predicting a univariate target):

  1. Multiple past metrics (~100+ metrics over ~96 past time stamps) including the past of the target as input feature
  2. Single output forecasting target (1 metric over future ~24 time stamps)

If I read Table 4 correctly, the self-supervised embedding should probably be better than the supervised embedding.
Question:

  1. Is patchtst_pretrain with features='MS' the correct starting point to train your model on my dataset?
  2. Are there any other code parameters you would suggest I consider setting for my first experiments?

Long run time?

I want to congratulate you for the great patch transformer paper.

I want to ask a question:
I have a dataset which i hold as a pandas dataframe.

Given some window size, I want to predict the next time step:

Given X1,...,Xt
Predict Xt+1

This means I want to predict only a single step into the future.

As I understand it, if I want to use your model for this task I will need as many forward passes as there are time steps in the dataset, since you are not using a causal mask in the transformer.

How can this be resolved?

Thanks

multivariate?

Hello! In the paper you state that you have a multivariate method; however, as far as I understand, each variate (or channel) is processed independently and the emission is also an independent point forecast.

Can you kindly clarify which part is multivariate? As far as I understand, the only multivariate aspect is that the input data is a vector of size M at each time point. However, I see this as a negative, since after making your patches you end up with M * (number of patches) vectors, so the compute and memory of the vanilla transformer encoder would be quadratic in M. If you had univariate inputs then at least you would not have the O(M^2) issue...

Thank you for any insight!

Runtime Error

Can anyone assist? I tried running:
sh ./scripts/PatchTST/ettm1.sh
and got the following error:

 RuntimeError: 
          An attempt has been made to start a new process before the
          current process has finished its bootstrapping phase.
  
          This probably means that you are not using fork to start your
          child processes and you have forgotten to use the proper idiom
          in the main module:
  
              if __name__ == '__main__':
                  freeze_support()
                  ...
  
          The "freeze_support()" line can be omitted if the program
          is not going to be frozen to produce an executable.
  [7]  + 18410 suspended  sh ./scripts/PatchTST/ettm1.sh

I'm using an M1 Mac.

Adding Entity ID

If I understand correctly, 'ettm1' and 'ettm2' are the same multivariate task recorded at two different stations.

  1. Have you considered training those two stations jointly?
    They probably share similar patterns, so training them on the same past dates should allow better forecasting.
  2. It would require adding a binary variable (e.g. station_id) to the pre-training so the embedding becomes station-aware. I am not sure how to add it to the data loader, which expects only a date x metric input (float). Do you have an idea where I should introduce such a change in the code?

Thank you

Transformer Encoder

Hi, it seems that only the encoder part of the Transformer is used in the model. However, both Autoformer and FEDformer use an encoder + decoder structure. Is it better to use only the encoder rather than the full encoder + decoder structure for time series forecasting? Could you provide some literature or experimental support?

Comparisons might not be consistent across batch sizes

The prediction and actual sizes here are not the same across batch sizes. This means that the metrics calculated are not exactly comparable across different models trained with different batch sizes.

Therefore, if my understanding is correct, all models might need to be re-evaluated.

I think the culprit is here: https://github.com/cure-lab/LTSF-Linear/blob/main/data_provider/data_factory.py#L20

The drop_last should be False during the test evaluations.
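For reference, a small sketch of the effect with a dummy test set of 1000 samples and batch size 128 (illustrative only):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    test_dataset = TensorDataset(torch.randn(1000, 96, 7))   # stand-in test set

    # drop_last=True silently discards the final partial batch: only 896 of the
    # 1000 samples are scored, and the number changes with the batch size.
    n_dropped = sum(b[0].shape[0] for b in DataLoader(test_dataset, batch_size=128, drop_last=True))
    # drop_last=False evaluates all 1000 samples regardless of batch size.
    n_all = sum(b[0].shape[0] for b in DataLoader(test_dataset, batch_size=128, drop_last=False))
    print(n_dropped, n_all)   # 896 1000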
