salesforce / cost

PyTorch code for CoST: Contrastive Learning of Disentangled Seasonal-Trend Representations for Time Series Forecasting (ICLR 2022)

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 93.85%, Shell 6.15%
Topics: contrastive-learning, deep-learning, self-supervised-learning, time-series, time-series-forecasting, time-series-decomposition, forecasting-model, machine-learning

cost's Introduction

CoST: Contrastive Learning of Disentangled Seasonal-Trend Representations for Time Series Forecasting (ICLR 2022)



Figure 1. Overall CoST Architecture.

Official PyTorch code repository for the CoST paper.

  • CoST is a contrastive learning method for learning disentangled seasonal-trend representations for time series forecasting.
  • CoST consistently outperforms state-of-the-art methods by a considerable margin, achieving a 21.3% improvement in MSE on multivariate benchmarks.

Requirements

  1. Install Python 3.8.
  2. Install the required dependencies: pip install -r requirements.txt

Data

The datasets can be obtained and placed in the datasets/ folder as follows (a small layout check is sketched after the list):

  • The 3 ETT datasets should be placed at datasets/ETTh1.csv, datasets/ETTh2.csv and datasets/ETTm1.csv.
  • Electricity dataset: place it at datasets/LD2011_2014.txt and run electricity.py.
  • Weather dataset (link from the Informer repository): place it at datasets/WTH.csv.
  • M5 dataset: place calendar.csv, sales_train_validation.csv, sales_train_evaluation.csv, sales_test_validation.csv and sales_test_evaluation.csv at datasets/ and run m5.py.
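Before running the preprocessing scripts, you can verify the expected layout with a small helper (an illustration only, not part of the repository):

from pathlib import Path

# Files expected directly under datasets/ (per the list above).
expected = [
    "ETTh1.csv", "ETTh2.csv", "ETTm1.csv",    # ETT
    "LD2011_2014.txt",                        # Electricity (raw file)
    "WTH.csv",                                # Weather
    "calendar.csv", "sales_train_validation.csv", "sales_train_evaluation.csv",
    "sales_test_validation.csv", "sales_test_evaluation.csv",  # M5
]
missing = [name for name in expected if not (Path("datasets") / name).exists()]
print("missing files:", missing or "none")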

Usage

To train and evaluate CoST on a dataset, run the corresponding script from the scripts folder, e.g. ./scripts/ETT_CoST.sh (make the scripts executable first via chmod u+x scripts/*).

After training and evaluation, the trained encoder, output and evaluation metrics can be found in training/<DatasetName>/<RunName>_<Date>_<Time>/.

Alternatively, you can run the Python script directly:

python train.py <dataset_name> <run_name> --archive <archive> --batch-size <batch_size> --repr-dims <repr_dims> --gpu <gpu> --eval
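For example, to train and evaluate on ETTh1 in the multivariate setting (the run name my_run here is purely illustrative; see the scripts folder for the exact settings used in the paper):

python train.py ETTh1 my_run --archive forecast_csv --batch-size 256 --repr-dims 320 --gpu 0 --eval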

Detailed descriptions of the arguments are as follows:

Parameter name   Description
dataset_name     The dataset name
run_name         The folder name used to save the model, output and evaluation metrics; this can be set to any word
archive          The archive name that the dataset belongs to; set to forecast_csv or forecast_csv_univar
batch_size       The batch size (defaults to 8)
repr_dims        The representation dimensions (defaults to 320)
gpu              The GPU number used for training and inference (defaults to 0)
eval             Whether to perform evaluation after training
kernels          Kernel sizes for the mixture of AR experts module
alpha            Weight for the loss function

(For descriptions of more arguments, run python train.py -h.)

Main Results

We perform experiments on five real-world public benchmark datasets, comparing against both state-of-the-art representation learning and end-to-end forecasting approaches. CoST achieves state-of-the-art performance, beating the best-performing end-to-end forecasting approach by 39.3% and 18.22% (MSE) in the multivariate and univariate settings, respectively. CoST also beats the next best-performing feature-based approach by 21.3% and 4.71% (MSE) in the multivariate and univariate settings, respectively (refer to the main paper for full results).

FAQs

Q: ValueError: Found array with dim 4. StandardScaler expected <= 2.

A: Please install the appropriate package requirements as found in requirements.txt, in particular, scikit_learn==0.24.1.

Q: How to set the --kernels parameter?

A: It should be a list of space-separated integers, e.g. --kernels 1 2 4. See the scripts folder for further examples.

Acknowledgements

The implementation of CoST relies on resources from the following codebases and repositories; we thank the original authors for open-sourcing their work.

Citation

Please consider citing our work if you find this code useful for your research.

@inproceedings{
    woo2022cost,
    title={Co{ST}: Contrastive Learning of Disentangled Seasonal-Trend Representations for Time Series Forecasting},
    author={Gerald Woo and Chenghao Liu and Doyen Sahoo and Akshat Kumar and Steven Hoi},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=PilZY3omXV2}
}

cost's People

Contributors

dependabot[bot], gorold


cost's Issues

The use of instance_contrastive_loss

Thanks for your great work!
Could you please explain the instance contrastive loss as written in cost.py?

logits = torch.tril(sim, diagonal=-1)[:, :, :-1] # T x 2B x (2B-1)
logits += torch.triu(sim, diagonal=1)[:, :, 1:]
logits = -F.log_softmax(logits, dim=-1)
i = torch.arange(B, device=z1.device)
loss = (logits[:, i, B + i - 1].mean() + logits[:, B + i, i].mean()) / 2

The above definition is different from the L_{amp} and L_{phase} losses noted in the paper.

Look forward to your reply! Thanks in advance!

How to visualize the trends representation after selecting a single seasonality

This is the relevant code I wrote myself, but I can't get the effect shown in figure 4 of the paper.

import numpy as np
import seaborn as sns
from cuml.manifold import TSNE
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style("ticks")
##Learned representations from CoST
dataset_path = 'xxxx'
read_data = pd.read_csv(dataset_path+'representations.csv')
##Top 160 are trends representation
trends = np.array(read_data.iloc[:,1:161])
##Last 160 are seasons representation
seasons = np.array(read_data.iloc[:,161:])
##Perform T-SNE on the trends with a fixed season
trend_tsne = TSNE(n_components=2).fit_transform(trends)
seasonal_tsne = TSNE(n_components=2).fit_transform(seasons)
fig, axs = plt.subplots(2, 1, figsize=(8, 12))
##After fixing a certain 160-dimensional seasonal item, draw the two trend item cluster pictures
for i in range(2):
    sns.scatterplot(x=trend_tsne[:, 0], y=trend_tsne[:, 1], hue=seasons[:, i+1], ax=axs[0], palette=['yellow', 'purple'])
    axs[0].set_title('Fixed Seasonal Item {}'.format(i+1))
    axs[0].set_xlabel('TSNE Dimension 1')
    axs[0].set_ylabel('TSNE Dimension 2')

##After fixing a certain trend item in the first 160 dimensions, draw the clustering pictures of the three seasonal items
for i in range(3):
    sns.scatterplot(x=seasonal_tsne[:, 0], y=seasonal_tsne[:, 1], hue=trends[:, i+1], ax=axs[1], palette=['yellow', 'blue', 'purple'])
    axs[1].set_title('Fixed Trend Item {}'.format(i+1))
    axs[1].set_xlabel('TSNE Dimension 1')
    axs[1].set_ylabel('TSNE Dimension 2')

plt.tight_layout()
plt.show()

Any help will be appreciated.

Error during Evaluation of the model

When I run train.py through the Electricity.sh script, I get this error after the training procedure:
Traceback (most recent call last):
  File "/content/CoST/train.py", line 110, in <module>
    out, eval_res = tasks.eval_forecasting(model, data, train_slice, valid_slice, test_slice, scaler, pred_lens, n_covariate_cols, args.max_train_length-1)
  File "/content/CoST/tasks/forecasting.py", line 70, in eval_forecasting
    test_pred_inv = scaler.inverse_transform(test_pred)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_data.py", line 1034, in inverse_transform
    X = check_array(
  File "/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py", line 915, in check_array
    raise ValueError(
ValueError: Found array with dim 4. None expected <= 2.

Learning Help

I have a special admiration for this research of yours. I'm not a computer science student, so I don't understand many parts of your code. Could I ask you for more detailed instructions on how to use the code? I appreciate your help.

I met a mistake.

Traceback (most recent call last):
File "D:\skrsuper\python1\pytorch\cost\train.py", line 97, in
loss_log = model.fit(
File "D:\skrsuper\python1\pytorch\cost\cost.py", line 299, in fit
loss = self.cost(x_q, x_k)
File "C:\Users\lx.conda\envs\py39\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\skrsuper\python1\pytorch\cost\cost.py", line 145, in forward
q_t, q_s = self.encoder_q(x_q)
File "C:\Users\lx.conda\envs\py39\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\skrsuper\python1\pytorch\cost\models\encoder.py", line 163, in forward
out = mod(x) # b t d
File "C:\Users\lx.conda\envs\py39\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "D:\skrsuper\python1\pytorch\cost\models\encoder.py", line 65, in forward
output_fft[:, self.start:self.end] = self._forward(input_fft)
File "D:\skrsuper\python1\pytorch\cost\models\encoder.py", line 69, in _forward
output = torch.einsum('bti,tio->bto', input[:, self.start:self.end], self.weight)
File "C:\Users\lx.conda\envs\py39\lib\site-packages\torch\functional.py", line 378, in einsum
return _VF.einsum(equation, operands) # type: ignore[attr-defined]
RuntimeError: einsum(): subscript t has size 501 for operand 1 which does not broadcast with previously seen size 301

Some questions about replication

  • There is an "archive" option in args. Here is my understanding: "forecast_csv" = multivariate forecasting and "forecast_csv_univar" = univariate forecasting (results shown in Table 1 and Table 7). Is that correct?
  • How can I run CoST as a feature-based approach? (Results shown in Table 8.)

Inquiries about training loss

First of all, thanks for sharing your code.

When training with the sample data and with my own custom data, the code runs fine, but the training loss does not decrease.

I would appreciate it if you could tell me how to fix this or how the training should be done.

Rounding error concerning max_train_length

Hi, I think there is a rounding error concerning the max_train_length

CoST/cost.py

Lines 256 to 259 in afc26aa

if self.max_train_length is not None:
    sections = train_data.shape[1] // self.max_train_length
    if sections >= 2:
        train_data = np.concatenate(split_with_nan(train_data, sections, axis=1), axis=0)

To crop the data into sequences, each of which has a length less than <max_train_length>, the number of sections should be rounded up.

For example in the ETTh dataset cropping the train slice of length 8640 with max_train_length = 201 results in 42 sections of length 206, instead of 43 sections of length 201.
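A quick arithmetic check of the example above (numbers from the issue; the variable names are only for illustration), showing why the number of sections should be rounded up:

import math

max_train_length = 201
train_length = 8640  # length of the ETTh train slice in the example above

sections_floor = train_length // max_train_length           # 42 -> sections of length ~206 (too long)
sections_ceil = math.ceil(train_length / max_train_length)  # 43 -> sections of length <= 201
print(sections_floor, sections_ceil)
# The suggested fix would be to pass the rounded-up value to split_with_nan.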

Hello, Gerald. I am trying to run your code on my own dataset, but I ran into a problem:

My dataset is similar to yours, with a total of 8 columns and 1681 rows, and at runtime I get the following error:

Traceback (most recent call last):
File "train.py", line 109, in
out, eval_res = tasks.eval_forecasting(model, data, train_slice, valid_slice, test_slice, scaler, pred_lens, n_covariate_cols, args.max_train_length-1)
File "/userdata/lwy/CoST-main/tasks/forecasting.py", line 55, in eval_forecasting
lr = eval_protocols.fit_ridge(train_features, train_labels, valid_features, valid_labels)
File "/userdata/lwy/CoST-main/tasks/_eval_protocols.py", line 25, in fit_ridge
lr = Ridge(alpha=alpha).fit(train_features, train_y)
File "/userdata/lwy/.local/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py", line 762, in fit
return super().fit(X, y, sample_weight=sample_weight)
File "/userdata/lwy/.local/lib/python3.8/site-packages/sklearn/linear_model/_ridge.py", line 542, in fit
X, y = self._validate_data(X, y,
File "/userdata/lwy/.local/lib/python3.8/site-packages/sklearn/base.py", line 433, in _validate_data
X, y = check_X_y(X, y, **check_params)
File "/userdata/lwy/.local/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "/userdata/lwy/.local/lib/python3.8/site-packages/sklearn/utils/validation.py", line 814, in check_X_y
X = check_array(X, accept_sparse=accept_sparse,
File "/userdata/lwy/.local/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "/userdata/lwy/.local/lib/python3.8/site-packages/sklearn/utils/validation.py", line 669, in check_array
raise ValueError("Found array with %d sample(s) (shape=%s) while a"
ValueError: Found array with 0 sample(s) (shape=(0, 320)) while a minimum of 1 is required.

compute loss (labels: torch.zeros)

Hello again.
I'm studying your paper and code.
However, regarding the following code in your cost.py file,

l_pos = torch.einsum('nc,nc->n', [a1, a2]).unsqueeze(-1)
# negative logits: NxK
l_neg = torch.einsum('nc,ck->nk', [a1, a2_neg])

# logits: Nx(1+K)
logits = torch.cat([l_pos, l_neg], dim=1)

# apply temperature
logits /= T

# labels: positive key indicators - first dim of each batch
labels = torch.zeros(logits.shape[0], dtype=torch.long).cuda()
loss = F.cross_entropy(logits, labels)

I think the label of the positive instance (one of the N instances) should be 1, because when the cross-entropy is calculated, the positive one's label becomes 1.
I don't understand this well, so I would like your advice. Thank you.
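A minimal sketch (an illustration, not code from the repository) of why the labels are all zeros: the positive logit l_pos is concatenated at column 0, so for every instance the target class index is 0, and F.cross_entropy then maximizes the probability assigned to the positive key:

import torch
import torch.nn.functional as F

# Toy numbers: 2 instances, 3 negative keys each.
l_pos = torch.tensor([[5.0], [5.0]])                       # similarity to the positive key
l_neg = torch.tensor([[0.1, 0.2, 0.0], [0.3, 0.4, 0.1]])   # similarities to negative keys
logits = torch.cat([l_pos, l_neg], dim=1)                  # N x (1 + K); positive is column 0
labels = torch.zeros(logits.shape[0], dtype=torch.long)    # target class 0 = the positive key
print(F.cross_entropy(logits, labels))                     # small loss, since column 0 dominates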

--eval problem

Thanks for sharing the code.
I found that the value of the parameter "padding" in forecasting.py (line 24, the first line in the function eval_forecasting()) should be equal to args.max_train_length; otherwise, the evaluation process does not work.

A little confused about Trend Feature Disentangler

Thank you for your brilliant work. I have read your paper, but I didn't get the Trend Feature Disentangler; in the paper there is a formula about it, which is

V^{(T)} = \mathrm{AvgPool}(\tilde{V}^{(T,0)}, \tilde{V}^{(T,1)}, \ldots, \tilde{V}^{(T,L)})

When I run it in my IDE, it looks more like a vertically connected conv block, so where is the AvgPool layer?
I hope to get your help, and I really appreciate your answer.
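For reference, a minimal sketch of the formula above (an illustration only, not the repository's implementation; the module and parameter names are made up): apply L+1 causal convolutions with different kernel sizes and average their outputs, which plays the role of the AvgPool in the equation.

import torch
import torch.nn as nn

class TrendDisentanglerSketch(nn.Module):
    def __init__(self, dim, kernel_sizes=(1, 2, 4)):
        super().__init__()
        self.kernel_sizes = kernel_sizes
        # One causal conv per kernel size (extra padding is trimmed below to keep causality).
        self.convs = nn.ModuleList(nn.Conv1d(dim, dim, k, padding=k - 1) for k in kernel_sizes)

    def forward(self, x):  # x: (batch, dim, time)
        outs = []
        for conv, k in zip(self.convs, self.kernel_sizes):
            h = conv(x)
            if k > 1:
                h = h[..., :-(k - 1)]  # drop the trailing steps so the conv stays causal
            outs.append(h)
        # Average over the L+1 outputs: the AvgPool in the formula.
        return torch.stack(outs, dim=0).mean(dim=0)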

TypeError: Rearrange can't be applied to an empty list

Hello, I'm interested in your research.

So, I tried to run your code as described on the repository home page:
python train.py ETTh1 output --archive forecast_csv --batch-size 256 --repr-dims 320 --gpu 0 --epochs 200
But, I got following error:
TypeError: Rearrange can't be applied to an empty list

How can I solve this problem?
Thank you.

How to use your approach for downstream forecasting tasks

Summary

Thanks for making the code available. I really like the idea of first learning the embeddings in a self-supervised manner and then using a simpler model for forecasting. However, I am struggling with how to use the learned embeddings for the forecasting part.

Problem Description

Say you are tasked with forecasting a monthly univariate time series Y = (y1, ..., yT), which is historically available from January 2010 until December 2020. The task is to forecast 2021, with the forecasting horizon being h=12 months. Based on the CoST framework, we are using the TCN encoder (f) to learn the embeddings, V = f(Y), where V = [V_Trend, V_Seasonality], for January 2010 until December 2020. For training the downstream forecasting model, say a ridge regression model, we are using the final timestamp of the learned representations. So far so good.

@gorold My question now is: given the representations and the trained Ridge model, how do we forecast 2021, since the representations are available only until the end of 2020? More specifically, what are the features for the Ridge model used for forecasting 2021?
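For context, a minimal sketch of the protocol described above (the names, shapes and random stand-ins are assumptions for illustration, not the repository API): fit the ridge regressor to map the representation at time t to the next h observations, then apply it to the representation at the final timestamp (December 2020) to produce the 2021 forecast.

import numpy as np
from sklearn.linear_model import Ridge

h = 12                              # forecasting horizon (months)
y = np.random.randn(132)            # stand-in for the monthly series, Jan 2010 - Dec 2020
reprs = np.random.randn(132, 320)   # stand-in for the CoST representation at each timestamp

# Training pairs: representation at time t -> the next h observations.
X_train = reprs[:-h]
Y_train = np.stack([y[t + 1 : t + 1 + h] for t in range(len(y) - h)])

ridge = Ridge(alpha=0.1).fit(X_train, Y_train)

# Forecast 2021 from the representation at the final timestamp (Dec 2020).
forecast_2021 = ridge.predict(reprs[-1:])   # shape (1, 12)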

Training Loss problem.

When I used your algorithm and parameters to train on both the WTH dataset and my own dataset, I found that the loss was very low in the first epoch, but increased sharply in the second epoch, and subsequently, the loss remained higher than in the first epoch. The variation in the training loss is perplexing, and I hope you can provide some insights.

A Quick Question

I'm an undergraduate beginner interested in your project. When I try to run your GitHub code, it reports an error that I'm having trouble solving; below are the parameter settings and the details of the error. Sorry for taking up your precious time, and I hope you can give me valuable advice on the bug.

Dataset: WTH
Arguments: Namespace(alpha=0.0005, archive='forecast_csv', batch_size=8, dataset='WTH', epochs=None, eval=False, gpu=0, iters=None, kernels=None, lr=0.001, max_threads=None, max_train_length=3000, repr_dims=320, run_name='saved_model', save_every=None, seed=None)
Traceback (most recent call last):
File "C:/Users/免仑/Desktop/CoST-main/train.py", line 101, in
verbose=True
File "C:\Users\免仑\Desktop\CoST-main\cost.py", line 299, in fit
loss = self.cost(x_q, x_k)
File "D:\anaconda\envs\fb_prophet\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\免仑\Desktop\CoST-main\cost.py", line 145, in forward
q_t, q_s = self.encoder_q(x_q)
File "D:\anaconda\envs\fb_prophet\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\免仑\Desktop\CoST-main\models\encoder.py", line 155, in forward
rearrange(trend, 'list b t d -> list b t d'),
File "D:\anaconda\envs\fb_prophet\lib\site-packages\einops\einops.py", line 422, in rearrange
raise TypeError("Rearrange can't be applied to an empty list")
TypeError: Rearrange can't be applied to an empty list

Process finished with exit code 1

'--kernels' problem

Thanks for sharing the code.
You set the parameter 'kernels' (kernel sizes for the mixture of AR experts module) to None in the code. When I run the code, it reports an error (Rearrange can't be applied to an empty list). How should '--kernels' be set? Could you give me some advice?
Looking forward to your reply!

different window offset for x_q, x_k

Hello,
it seems like x_q and x_k will contain different sequence data because of different window offsets; does it make sense to learn representations using data from different time frames?

CoST/cost.py

Lines 292 to 297 in 3c4e765

if self.max_train_length is not None and x_q.size(1) > self.max_train_length:
    window_offset = np.random.randint(x_q.size(1) - self.max_train_length + 1)
    x_q = x_q[:, window_offset : window_offset + self.max_train_length]
if self.max_train_length is not None and x_k.size(1) > self.max_train_length:
    window_offset = np.random.randint(x_k.size(1) - self.max_train_length + 1)
    x_k = x_k[:, window_offset : window_offset + self.max_train_length]

SystemExit: 2

Hello
I am using a MacBook Pro 2017 (2.9 GHz Quad-Core Intel Core i7, Intel HD Graphics 630 1536 MB) and Spyder. When I input the arguments in train.py:
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('electricity', help='The dataset name')
    parser.add_argument('forecast_univar', help='The folder name used to save model, output and evaluation metrics. This can be set to any word')
    parser.add_argument('--forecast_csv_univar', type=str, required=True, help='The archive name that the dataset belongs to. This can be set to forecast_csv, or forecast_csv_univar')
    parser.add_argument('--gpu', type=int, default=0, help='The gpu no. used for training and inference (defaults to 0)')
    parser.add_argument('--batch-size', type=int, default=8, help='The batch size (defaults to 8)')
    parser.add_argument('--lr', type=float, default=0.001, help='The learning rate (defaults to 0.001)')
    parser.add_argument('--repr-dims', type=int, default=320, help='The representation dimension (defaults to 320)')
    parser.add_argument('--max-train-length', type=int, default=3000, help='For sequence with a length greater than <max_train_length>, it would be cropped into some sequences, each of which has a length less than <max_train_length> (defaults to 3000)')
    parser.add_argument('--iters', type=int, default=None, help='The number of iterations')
    parser.add_argument('--epochs', type=int, default=None, help='The number of epochs')
    parser.add_argument('--save-every', type=int, default=None, help='Save the checkpoint every <save_every> iterations/epochs')
    parser.add_argument('--seed', type=int, default=None, help='The random seed')
    parser.add_argument('--max-threads', type=int, default=None, help='The maximum allowed number of threads used by this process')
    parser.add_argument('--eval', action="store_true", help='Whether to perform evaluation after training')

    parser.add_argument('--kernels', type=int, nargs='+', default=[1, 2, 4, 8, 16, 32, 64, 128], help='The kernel sizes used in the mixture of AR expert layers')
    parser.add_argument('--alpha', type=float, default=0.0005, help='Weighting hyperparameter for loss function')

    args = parser.parse_args()

I have this problem:
runfile('/Users/humamalkaabi/Documents/Applications/Tasks/CoST/train.py', wdir='/Users/humamalkaabi/Documents/Applications/Tasks/CoST')
Reloaded modules: models, models.dilated_conv, models.encoder, utils
usage: train.py [-h] --forecast_csv_univar FORECAST_CSV_UNIVAR [--gpu GPU]
[--batch-size BATCH_SIZE] [--lr LR] [--repr-dims REPR_DIMS]
[--max-train-length MAX_TRAIN_LENGTH] [--iters ITERS]
[--epochs EPOCHS] [--save-every SAVE_EVERY] [--seed SEED]
[--max-threads MAX_THREADS] [--eval]
[--kernels KERNELS [KERNELS ...]] [--alpha ALPHA]
electricity forecast_univar
train.py: error: the following arguments are required: electricity, forecast_univar, --forecast_csv_univar
An exception has occurred, use %tb to see the full traceback.

SystemExit: 2

I would like to ask for your help in solving this error. I would really appreciate it and be grateful.
My Respects and thanks.

einsum() operands do not broadcast with remapped shapes

Hi, Gerald.
I am trying to run your code on my own dataset, but I got a problem here:

RuntimeError: einsum() operands do not broadcast with remapped shapes [original->remapped]: [32, 37, 320]->[32, 37, 1, 320] [101, 320, 160]->[1, 101, 160, 320]
