salesforce / cost

PyTorch code for CoST: Contrastive Learning of Disentangled Seasonal-Trend Representations for Time Series Forecasting (ICLR 2022)

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 93.85%, Shell 6.15%
Topics: contrastive-learning, deep-learning, self-supervised-learning, time-series, time-series-forecasting, time-series-decomposition, forecasting-model, machine-learning

cost's Introduction

CoST: Contrastive Learning of Disentangled Seasonal-Trend Representations for Time Series Forecasting (ICLR 2022)



Figure 1. Overall CoST Architecture.

Official PyTorch code repository for the CoST paper.

  • CoST is a contrastive learning method for learning disentangled seasonal-trend representations for time series forecasting.
  • CoST consistently outperforms state-of-the-art methods by a considerable margin, achieving a 21.3% improvement in MSE on multivariate benchmarks.

Requirements

  1. Install Python 3.8.
  2. Install the required dependencies: pip install -r requirements.txt

Data

The datasets can be obtained and placed in the datasets/ folder as follows (a small layout check is sketched after the list):

  • The 3 ETT datasets should be placed at datasets/ETTh1.csv, datasets/ETTh2.csv and datasets/ETTm1.csv.
  • Electricity dataset: place it at datasets/LD2011_2014.txt and run electricity.py.
  • Weather dataset (link from the Informer repository): place it at datasets/WTH.csv.
  • M5 dataset: place calendar.csv, sales_train_validation.csv, sales_train_evaluation.csv, sales_test_validation.csv and sales_test_evaluation.csv at datasets/ and run m5.py.
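Before running the preprocessing scripts, you can verify the expected layout with a small helper (an illustration only, not part of the repository):

from pathlib import Path

# Files expected directly under datasets/ (per the list above).
expected = [
    "ETTh1.csv", "ETTh2.csv", "ETTm1.csv",    # ETT
    "LD2011_2014.txt",                        # Electricity (raw file)
    "WTH.csv",                                # Weather
    "calendar.csv", "sales_train_validation.csv", "sales_train_evaluation.csv",
    "sales_test_validation.csv", "sales_test_evaluation.csv",  # M5
]
missing = [name for name in expected if not (Path("datasets") / name).exists()]
print("missing files:", missing or "none")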

Usage

To train and evaluate CoST on a dataset, run the corresponding script from the scripts folder, e.g. ./scripts/ETT_CoST.sh (make the scripts executable first via chmod u+x scripts/*).

After training and evaluation, the trained encoder, output and evaluation metrics can be found in training/<DatasetName>/<RunName>_<Date>_<Time>/.

Alternatively, you can run the Python script directly:

python train.py <dataset_name> <run_name> --archive <archive> --batch-size <batch_size> --repr-dims <repr_dims> --gpu <gpu> --eval
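For example, to train and evaluate on ETTh1 in the multivariate setting (the run name my_run here is purely illustrative; see the scripts folder for the exact settings used in the paper):

python train.py ETTh1 my_run --archive forecast_csv --batch-size 256 --repr-dims 320 --gpu 0 --eval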

Detailed descriptions of the arguments are as follows:

Parameter name   Description
dataset_name     The dataset name
run_name         The folder name used to save the model, output and evaluation metrics; this can be set to any word
archive          The archive name that the dataset belongs to; set to forecast_csv or forecast_csv_univar
batch_size       The batch size (defaults to 8)
repr_dims        The representation dimensions (defaults to 320)
gpu              The GPU number used for training and inference (defaults to 0)
eval             Whether to perform evaluation after training
kernels          Kernel sizes for the mixture of AR experts module
alpha            Weight for the loss function

(For descriptions of more arguments, run python train.py -h.)

Main Results

We perform experiments on five real-world public benchmark datasets, comparing against both state-of-the-art representation learning and end-to-end forecasting approaches. CoST achieves state-of-the-art performance, beating the best-performing end-to-end forecasting approach by 39.3% and 18.22% (MSE) in the multivariate and univariate settings, respectively. CoST also beats the next best-performing feature-based approach by 21.3% and 4.71% (MSE) in the multivariate and univariate settings, respectively (refer to the main paper for full results).

FAQs

Q: ValueError: Found array with dim 4. StandardScaler expected <= 2.

A: Please install the appropriate package requirements as found in requirements.txt, in particular, scikit_learn==0.24.1.

Q: How to set the --kernels parameter?

A: It should be a list of space-separated integers, e.g. --kernels 1 2 4. See the scripts folder for further examples.

Acknowledgements

The implementation of CoST relies on resources from the following codebases and repositories; we thank the original authors for open-sourcing their work.

Citation

Please consider citing our work if you find this code useful for your research.

@inproceedings{
    woo2022cost,
    title={Co{ST}: Contrastive Learning of Disentangled Seasonal-Trend Representations for Time Series Forecasting},
    author={Gerald Woo and Chenghao Liu and Doyen Sahoo and Akshat Kumar and Steven Hoi},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=PilZY3omXV2}
}

cost's People

Contributors

dependabot[bot], gorold


cost's Issues

The use of instance_contrastive_loss

Thanks for your great work!
Could you please explain the instance contrastive loss as written in cost.py?

logits = torch.tril(sim, diagonal=-1)[:, :, :-1] # T x 2B x (2B-1)
logits += torch.triu(sim, diagonal=1)[:, :, 1:]
logits = -F.log_softmax(logits, dim=-1)
i = torch.arange(B, device=z1.device)
loss = (logits[:, i, B + i - 1].mean() + logits[:, B + i, i].mean()) / 2

The above definition is different from the L_{amp} and L_{phase} losses noted in the paper.

Look forward to your reply! Thanks in advance!

How to visualize the trends representation after selecting a single seasonality

This is the relevant code I wrote myself, but I can't get the effect shown in figure 4 of the paper.

import numpy as np
import seaborn as sns
from cuml.manifold import TSNE
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style("ticks")
##Learned representations from CoST
dataset_path = 'xxxx'
read_data = pd.read_csv(dataset_path+'representations.csv')
##Top 160 are trends representation
trends = np.array(read_data.iloc[:,1:161])
##Last 160 are seasons representation
seasons = np.array(read_data.iloc[:,161:])
##Perform T-SNE on the trends with a fixed season
trend_tsne = TSNE(n_components=2).fit_transform(trends)
seasonal_tsne = TSNE(n_components=2).fit_transform(seasons)
fig, axs = plt.subplots(2, 1, figsize=(8, 12))
##After fixing a certain 160-dimensional seasonal item, draw the two trend item cluster pictures
for i in range(2):
    sns.scatterplot(x=trend_tsne[:, 0], y=trend_tsne[:, 1], hue=seasons[:, i+1], ax=axs[0], palette=['yellow', 'purple'])
    axs[0].set_title('Fixed Seasonal Item {}'.format(i+1))
    axs[0].set_xlabel('TSNE Dimension 1')
    axs[0].set_ylabel('TSNE Dimension 2')

##After fixing a certain trend item in the first 160 dimensions, draw the clustering pictures of the three seasonal items
for i in range(3):
    sns.scatterplot(x=seasonal_tsne[:, 0], y=seasonal_tsne[:, 1], hue=trends[:, i+1], ax=axs[1], palette=['yellow', 'blue', 'purple'])
    axs[1].set_title('Fixed Trend Item {}'.format(i+1))
    axs[1].set_xlabel('TSNE Dimension 1')
    axs[1].set_ylabel('TSNE Dimension 2')

plt.tight_layout()
plt.show()

Any help will be appreciated.

Error during Evaluation of the model

When I run train.py through the Electricity.sh script, I get this error after the training procedure:
Traceback (most recent call last):
  File "/content/CoST/train.py", line 110, in <module>
    out, eval_res = tasks.eval_forecasting(model, data, train_slice, valid_slice, test_slice, scaler, pred_lens, n_covariate_cols, args.max_train_length-1)
  File "/content/CoST/tasks/forecasting.py", line 70, in eval_forecasting
    test_pred_inv = scaler.inverse_transform(test_pred)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_data.py", line 1034, in inverse_transform
    X = check_array(
  File "/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py", line 915, in check_array
    raise ValueError(
ValueError: Found array with dim 4. None expected <= 2.

Learning Help

I have a special admiration for this research of yours. I'm not a computer science student, so I don't understand many parts of your code. Could I ask you for more detailed instructions on how to use the code? I appreciate your help.

I met a mistake.

Traceback (most recent call last):
File "D:\skrsuper\python1\pytorch\cost\train.py", line 97, in
loss_log = model.fit(
File "D:\skrsuper\python1\pytorch\cost\cost.py", line 299, in fit
loss = self.cost(x_q, x_k)
File "C:\Users\lx.conda\envs\py39\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\skrsuper\python1\pytorch\cost\cost.py", line 145, in forward
q_t, q_s = self.encoder_q(x_q)
File "C:\Users\lx.conda\envs\py39\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\skrsuper\python1\pytorch\cost\models\encoder.py", line 163, in forward
out = mod(x) # b t d
File "C:\Users\lx.conda\envs\py39\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\skrsuper\python1\pytorch\cost\models\encoder.py", line 65, in forward
output_fft[:, self.start:self.end] = self._forward(input_fft)
File "D:\skrsuper\python1\pytorch\cost\models\encoder.py", line 69, in _forward
output = torch.einsum('bti,tio->bto', input[:, self.start:self.end], self.weight)
File "C:\Users\lx.conda\envs\py39\lib\site-packages\torch\functional.py", line 378, in einsum
return _VF.einsum(equation, operands) # type: ignore[attr-defined]
RuntimeError: einsum(): subscript t has size 501 for operand 1 which does not broadcast with previously seen size 301

Some questions about replication

  • There is an "archive" option in args. Here is my understanding: "forecast_csv" = multivariate forecasting and "forecast_csv_univar" = univariate forecasting (results shown in Table 1 and Table 7). Is that correct?
  • How can I run CoST as a feature-based approach? (Results shown in Table 8.)

Inquiries about training loss

First of all, thanks for sharing your code.

When training with the sample data and with my own custom data, the code runs fine, but the training loss does not decrease.

I would appreciate it if you could tell me how to fix this or how the training should be done.

Rounding error concerning max_train_length

Hi, I think there is a rounding error concerning the max_train_length

CoST/cost.py

Lines 256 to 259 in afc26aa

if self.max_train_length is not None:
    sections = train_data.shape[1] // self.max_train_length
    if sections >= 2:
        train_data = np.concatenate(split_with_nan(train_data, sections, axis=1), axis=0)

To crop the data into sequences, each of which has a length less than <max_train_length>, the number of sections should be rounded up.

For example in the ETTh dataset cropping the train slice of length 8640 with max_train_length = 201 results in 42 sections of length 206, instead of 43 sections of length 201.
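A quick arithmetic check of the example above (numbers from the issue; the variable names are only for illustration), showing why the number of sections should be rounded up:

import math

max_train_length = 201
train_length = 8640  # length of the ETTh train slice in the example above

sections_floor = train_length // max_train_length           # 42 -> sections of length ~206 (too long)
sections_ceil = math.ceil(train_length / max_train_length)  # 43 -> sections of length <= 201
print(sections_floor, sections_ceil)
# The suggested fix would be to pass the rounded-up value to split_with_nan.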

Hello, Gerald. I am trying to run your code on my own dataset, but I ran into a problem:

My dataset is similar to yours, with a total of 8 columns and 1681 rows, and at runtime I get the following error:

Traceback (most recent call last):
File "train.py", line 109, in
out, eval_res = tasks.eval_forecasting(model, data, train_slice, valid_slice, test_slice, scaler, pred_lens, n_covariate_cols, args.max_train_length-1)
File "/userdata/lwy/CoST-main/tasks/forecasting.py", line 55, in eval_forecasting
lr = eval_protocols.fit_ridge(train_features, train_labels, valid_features, valid_labels)
File "/userdata/lwy/CoST-main/tasks/_eval_protocols.py", line 25, in fit_ridge
lr = Ridge(alpha=alpha).fit(train_features, train_y)
File "/userdata/lwy/.local/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py", line 762, in fit
return super().fit(X, y, sample_weight=sample_weight)
File "/userdata/lwy/.local/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py", line 542, in fit
X, y = self._validate_data(X, y,
File "/userdata/lwy/.local/lib/python3.8/site-packages/sklearn/base.py", line 433, in _validate_data
X, y = check_X_y(X, y, **check_params)
File "/userdata/lwy/.local/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "/userdata/lwy/.local/lib/python3.8/site-packages/sklearn/utils/validation.py", line 814, in check_X_y
X = check_array(X, accept_sparse=accept_sparse,
File "/userdata/lwy/.local/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "/userdata/lwy/.local/lib/python3.8/site-packages/sklearn/utils/validation.py", line 669, in check_array
raise ValueError("Found array with %d sample(s) (shape=%s) while a"
ValueError: Found array with 0 sample(s) (shape=(0, 320)) while a minimum of 1 is required.

compute loss (labels: torch.zeros)

Hello again.
I'm studying your paper and code.
However, regarding the following code in your cost.py file,

l_pos = torch.einsum('nc,nc->n', [a1, a2]).unsqueeze(-1)
# negative logits: NxK
l_neg = torch.einsum('nc,ck->nk', [a1, a2_neg])

# logits: Nx(1+K)
logits = torch.cat([l_pos, l_neg], dim=1)

# apply temperature
logits /= T

# labels: positive key indicators - first dim of each batch
labels = torch.zeros(logits.shape[0], dtype=torch.long).cuda()
loss = F.cross_entropy(logits, labels)

I think the label of the positive instance (one of the N instances) should be 1, because when the cross-entropy is calculated, the positive one's label becomes 1.
I don't understand this well, so I would like your advice. Thank you.
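A minimal sketch (an illustration, not code from the repository) of why the labels are all zeros: the positive logit l_pos is concatenated at column 0, so for every instance the target class index is 0, and F.cross_entropy then maximizes the probability assigned to the positive key:

import torch
import torch.nn.functional as F

# Toy numbers: 2 instances, 3 negative keys each.
l_pos = torch.tensor([[5.0], [5.0]])                       # similarity to the positive key
l_neg = torch.tensor([[0.1, 0.2, 0.0], [0.3, 0.4, 0.1]])   # similarities to negative keys
logits = torch.cat([l_pos, l_neg], dim=1)                  # N x (1 + K); positive is column 0
labels = torch.zeros(logits.shape[0], dtype=torch.long)    # target class 0 = the positive key
print(F.cross_entropy(logits, labels))                     # small loss, since column 0 dominates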

--eval problem

Thanks for sharing the code.
I found that the value of the parameter "padding" in forecasting.py (line 24, the first line in the function eval_forecasting()) should be equal to args.max_train_length; otherwise, the evaluation process does not work.

A little confused about Trend Feature Disentangler

Thank you for your brilliant work. I have read your paper, but I didn't get the Trend Feature Disentangler; in the paper there is a formula about it, which is

V^{(T)} = \mathrm{AvgPool}(\tilde{V}^{(T,0)}, \tilde{V}^{(T,1)}, \ldots, \tilde{V}^{(T,L)})

When I run it in my IDE, it looks more like a vertically connected conv block, so where is the AvgPool layer?
I hope to get your help, and I really appreciate your answer.
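For reference, a minimal sketch of the formula above (an illustration only, not the repository's implementation; the module and parameter names are made up): apply L+1 causal convolutions with different kernel sizes and average their outputs, which plays the role of the AvgPool in the equation.

import torch
import torch.nn as nn

class TrendDisentanglerSketch(nn.Module):
    def __init__(self, dim, kernel_sizes=(1, 2, 4)):
        super().__init__()
        self.kernel_sizes = kernel_sizes
        # One causal conv per kernel size (extra padding is trimmed below to keep causality).
        self.convs = nn.ModuleList(nn.Conv1d(dim, dim, k, padding=k - 1) for k in kernel_sizes)

    def forward(self, x):  # x: (batch, dim, time)
        outs = []
        for conv, k in zip(self.convs, self.kernel_sizes):
            h = conv(x)
            if k > 1:
                h = h[..., :-(k - 1)]  # drop the trailing steps so the conv stays causal
            outs.append(h)
        # Average over the L+1 outputs: the AvgPool in the formula.
        return torch.stack(outs, dim=0).mean(dim=0)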

TypeError: Rearrange can't be applied to an empty list

Hello, I'm interested in your research.

So, I tried to run your code as described on the repository home page:
python train.py ETTh1 output --archive forecast_csv --batch-size 256 --repr-dims 320 --gpu 0 --epochs 200
But, I got following error:
TypeError: Rearrange can't be applied to an empty list

How can I solve this problem?
Thank you.

How to use your approach for downstream forecasting tasks

Summary

Thanks for making the code available. I really like the idea of first learning the embeddings in a self-supervised manner and then using a simpler model for forecasting. However, I am struggling with how to use the learned embeddings for the forecasting part.

Problem Description

Say you are tasked with forecasting a monthly univariate time series Y = (y1, ..., yT), which is historically available from January 2010 until December 2020. The task is to forecast 2021, with the forecasting horizon being h=12 months. Based on the CoST framework, we are using the TCN encoder (f) to learn the embeddings, V = f(Y), where V = [V_Trend, V_Seasonality], for January 2010 until December 2020. For training the downstream forecasting model, say a ridge regression model, we are using the final timestamp of the learned representations. So far so good.

@gorold My question now is: given the representations and the trained Ridge model, how do we forecast 2021, since the representations are available only until the end of 2020? More specifically, what are the features for the Ridge model used for forecasting 2021?
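For context, a minimal sketch of the protocol described above (the names, shapes and random stand-ins are assumptions for illustration, not the repository API): fit the ridge regressor to map the representation at time t to the next h observations, then apply it to the representation at the final timestamp (December 2020) to produce the 2021 forecast.

import numpy as np
from sklearn.linear_model import Ridge

h = 12                              # forecasting horizon (months)
y = np.random.randn(132)            # stand-in for the monthly series, Jan 2010 - Dec 2020
reprs = np.random.randn(132, 320)   # stand-in for the CoST representation at each timestamp

# Training pairs: representation at time t -> the next h observations.
X_train = reprs[:-h]
Y_train = np.stack([y[t + 1 : t + 1 + h] for t in range(len(y) - h)])

ridge = Ridge(alpha=0.1).fit(X_train, Y_train)

# Forecast 2021 from the representation at the final timestamp (Dec 2020).
forecast_2021 = ridge.predict(reprs[-1:])   # shape (1, 12)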

Training Loss problem.

When I used your algorithm and parameters to train on both the WTH dataset and my own dataset, I found that the loss was very low in the first epoch, but increased sharply in the second epoch, and subsequently, the loss remained higher than in the first epoch. The variation in the training loss is perplexing, and I hope you can provide some insights.

A Quick Question

I'm an undergraduate beginner interested in your project. When I try to run your GitHub code, it reports an error that I'm having trouble solving; below are the parameter settings and the details of the error. Sorry for taking up your precious time, and I hope you can give me valuable advice on the bug.

Dataset: WTH
Arguments: Namespace(alpha=0.0005, archive='forecast_csv', batch_size=8, dataset='WTH', epochs=None, eval=False, gpu=0, iters=None, kernels=None, lr=0.001, max_threads=None, max_train_length=3000, repr_dims=320, run_name='saved_model', save_every=None, seed=None)
Traceback (most recent call last):
File "C:/Users/免仑/Desktop/CoST-main/train.py", line 101, in
verbose=True
File "C:\Users\免仑\Desktop\CoST-main\cost.py", line 299, in fit
loss = self.cost(x_q, x_k)
File "D:\anaconda\envs\fb_prophet\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\免仑\Desktop\CoST-main\cost.py", line 145, in forward
q_t, q_s = self.encoder_q(x_q)
File "D:\anaconda\envs\fb_prophet\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\免仑\Desktop\CoST-main\models\encoder.py", line 155, in forward
rearrange(trend, 'list b t d -> list b t d'),
File "D:\anaconda\envs\fb_prophet\lib\site-packages\einops\einops.py", line 422, in rearrange
raise TypeError("Rearrange can't be applied to an empty list")
TypeError: Rearrange can't be applied to an empty list

Process finished with exit code 1

'--kernels' problem

Thanks for sharing the code.
You set the parameter 'kernels' (kernel sizes for the mixture of AR experts module) to None in the code. When I run the code, it reports an error (Rearrange can't be applied to an empty list). How should '--kernels' be set? Could you give me some advice?
Looking forward to your reply!

different window offset for x_q, x_k

Hello,
it seems like x_q and x_k will contain different sequence data because of different window offsets; does it make sense to learn representations using data from different time frames?

CoST/cost.py

Lines 292 to 297 in 3c4e765

if self.max_train_length is not None and x_q.size(1) > self.max_train_length:
    window_offset = np.random.randint(x_q.size(1) - self.max_train_length + 1)
    x_q = x_q[:, window_offset : window_offset + self.max_train_length]
if self.max_train_length is not None and x_k.size(1) > self.max_train_length:
    window_offset = np.random.randint(x_k.size(1) - self.max_train_length + 1)
    x_k = x_k[:, window_offset : window_offset + self.max_train_length]

SystemExit: 2

Hello
I am using a MacBook Pro 2017 (2.9 GHz Quad-Core Intel Core i7, Intel HD Graphics 630 1536 MB) and Spyder. When I input the arguments in train.py:
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('electricity', help='The dataset name')
    parser.add_argument('forecast_univar', help='The folder name used to save model, output and evaluation metrics. This can be set to any word')
    parser.add_argument('--forecast_csv_univar', type=str, required=True, help='The archive name that the dataset belongs to. This can be set to forecast_csv, or forecast_csv_univar')
    parser.add_argument('--gpu', type=int, default=0, help='The gpu no. used for training and inference (defaults to 0)')
    parser.add_argument('--batch-size', type=int, default=8, help='The batch size (defaults to 8)')
    parser.add_argument('--lr', type=float, default=0.001, help='The learning rate (defaults to 0.001)')
    parser.add_argument('--repr-dims', type=int, default=320, help='The representation dimension (defaults to 320)')
    parser.add_argument('--max-train-length', type=int, default=3000, help='For sequence with a length greater than <max_train_length>, it would be cropped into some sequences, each of which has a length less than <max_train_length> (defaults to 3000)')
    parser.add_argument('--iters', type=int, default=None, help='The number of iterations')
    parser.add_argument('--epochs', type=int, default=None, help='The number of epochs')
    parser.add_argument('--save-every', type=int, default=None, help='Save the checkpoint every <save_every> iterations/epochs')
    parser.add_argument('--seed', type=int, default=None, help='The random seed')
    parser.add_argument('--max-threads', type=int, default=None, help='The maximum allowed number of threads used by this process')
    parser.add_argument('--eval', action="store_true", help='Whether to perform evaluation after training')

    parser.add_argument('--kernels', type=int, nargs='+', default=[1, 2, 4, 8, 16, 32, 64, 128], help='The kernel sizes used in the mixture of AR expert layers')
    parser.add_argument('--alpha', type=float, default=0.0005, help='Weighting hyperparameter for loss function')

    args = parser.parse_args()

I have this problem:
runfile('/Users/humamalkaabi/Documents/Applications/Tasks/CoST/train.py', wdir='/Users/humamalkaabi/Documents/Applications/Tasks/CoST')
Reloaded modules: models, models.dilated_conv, models.encoder, utils
usage: train.py [-h] --forecast_csv_univar FORECAST_CSV_UNIVAR [--gpu GPU]
[--batch-size BATCH_SIZE] [--lr LR] [--repr-dims REPR_DIMS]
[--max-train-length MAX_TRAIN_LENGTH] [--iters ITERS]
[--epochs EPOCHS] [--save-every SAVE_EVERY] [--seed SEED]
[--max-threads MAX_THREADS] [--eval]
[--kernels KERNELS [KERNELS ...]] [--alpha ALPHA]
electricity forecast_univar
train.py: error: the following arguments are required: electricity, forecast_univar, --forecast_csv_univar
An exception has occurred, use %tb to see the full traceback.

SystemExit: 2

I would like to ask for your help in solving this error. I would really appreciate it and be grateful.
My Respects and thanks.

einsum() operands do not broadcast with remapped shapes

Hi, Gerald.
I am trying to run your code on my own dataset, but I got a problem here:

RuntimeError: einsum() operands do not broadcast with remapped shapes [original->remapped]: [32, 37, 320]->[32, 37, 1, 320] [101, 320, 160]->[1, 101, 160, 320]
