zalandoresearch / pytorch-ts

PyTorch-based probabilistic time series forecasting framework built on the GluonTS backend

License: MIT License

Python 100.00%
pytorch time-series probabilistic deepar lstnet n-beats

pytorch-ts's Introduction

PyTorchTS

PyTorchTS is a PyTorch-based probabilistic time series forecasting framework that provides state-of-the-art PyTorch time series models by using GluonTS as its back-end API for loading, transforming, and back-testing time series datasets.

Installation

$ pip3 install pytorchts

Quick start

Here we highlight the API changes relative to the GluonTS README.

import matplotlib.pyplot as plt
import pandas as pd
import torch

from gluonts.dataset.common import ListDataset
from gluonts.dataset.util import to_pandas

from pts.model.deepar import DeepAREstimator
from pts import Trainer

This simple example illustrates how to train a model on some data, and then use it to make predictions. As a first step, we need to collect some data: in this example we will use the volume of tweets mentioning the AMZN ticker symbol.

url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
df = pd.read_csv(url, header=0, index_col=0, parse_dates=True)

The first 100 data points look as follows:

df[:100].plot(linewidth=2)
plt.grid(which='both')
plt.show()

png

We can now prepare a training dataset for our model to train on. Datasets are essentially iterable collections of dictionaries: each dictionary represents a time series with possibly associated features. For this example, we only have one entry, specified by the "start" field which is the timestamp of the first data point, and the "target" field containing time series data. For training, we will use data up to midnight on April 5th, 2015.

training_data = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-05 00:00:00"]}],
    freq = "5min"
)

A forecasting model is a predictor object. One way of obtaining predictors is by training a corresponding estimator. Instantiating an estimator requires specifying the frequency of the time series that it will handle, as well as the number of time steps to predict. In our example we're using 5-minute data, so freq="5min", and we will train a model to predict the next hour, so prediction_length=12. The input to the model will be a vector of size input_size=19 at each time point. We also specify some minimal training options, in particular training for epochs=10 on a given device.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

estimator = DeepAREstimator(freq="5min",
                            prediction_length=12,
                            input_size=19,
                            trainer=Trainer(epochs=10,
                                            device=device))
predictor = estimator.train(training_data=training_data, num_workers=4)
    45it [00:01, 37.60it/s, avg_epoch_loss=4.64, epoch=0]
    48it [00:01, 39.56it/s, avg_epoch_loss=4.2, epoch=1] 
    45it [00:01, 38.11it/s, avg_epoch_loss=4.1, epoch=2] 
    43it [00:01, 36.29it/s, avg_epoch_loss=4.05, epoch=3]
    44it [00:01, 35.98it/s, avg_epoch_loss=4.03, epoch=4]
    48it [00:01, 39.48it/s, avg_epoch_loss=4.01, epoch=5]
    48it [00:01, 38.65it/s, avg_epoch_loss=4, epoch=6]   
    46it [00:01, 37.12it/s, avg_epoch_loss=3.99, epoch=7]
    48it [00:01, 38.86it/s, avg_epoch_loss=3.98, epoch=8]
    48it [00:01, 39.49it/s, avg_epoch_loss=3.97, epoch=9]

During training, useful information about the progress will be displayed. To get a full overview of the available options, please refer to the source code of DeepAREstimator (or other estimators) and Trainer.

We're now ready to make predictions: we will forecast the hour following midnight on April 15th, 2015.

test_data = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-15 00:00:00"]}],
    freq = "5min"
)
for test_entry, forecast in zip(test_data, predictor.predict(test_data)):
    to_pandas(test_entry)[-60:].plot(linewidth=2)
    forecast.plot(color='g', prediction_intervals=[50.0, 90.0])
plt.grid(which='both')

png

Note that the forecast is displayed in terms of a probability distribution: the shaded areas represent the 50% and 90% prediction intervals, respectively, centered around the median (dark green line).

Development

pip install -e .
pytest test

Citing

To cite this repository:

@software{pytorchgithub,
    author = {Kashif Rasul},
    title = {{P}yTorch{TS}},
    url = {https://github.com/zalandoresearch/pytorch-ts},
    version = {0.6.x},
    year = {2021},
}

Scientific Article

We have implemented the following model using this framework:

@INPROCEEDINGS{rasul2020tempflow,
  author = {Kashif Rasul and  Abdul-Saboor Sheikh and  Ingmar Schuster and Urs Bergmann and Roland Vollgraf},
  title = {{M}ultivariate {P}robabilistic {T}ime {S}eries {F}orecasting via {C}onditioned {N}ormalizing {F}lows},
  year = {2021},
  url = {https://openreview.net/forum?id=WiGQBFuVRv},
  booktitle = {International Conference on Learning Representations 2021},
}
@InProceedings{pmlr-v139-rasul21a,
  title = 	 {{A}utoregressive {D}enoising {D}iffusion {M}odels for {M}ultivariate {P}robabilistic {T}ime {S}eries {F}orecasting},
  author =       {Rasul, Kashif and Seward, Calvin and Schuster, Ingmar and Vollgraf, Roland},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {8857--8868},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/rasul21a/rasul21a.pdf},
  url = 	 {http://proceedings.mlr.press/v139/rasul21a.html},
}
@misc{gouttes2021probabilistic,
      title={{P}robabilistic {T}ime {S}eries {F}orecasting with {I}mplicit {Q}uantile {N}etworks}, 
      author={Adèle Gouttes and Kashif Rasul and Mateusz Koren and Johannes Stephan and Tofigh Naghibi},
      year={2021},
      eprint={2107.03743},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

pytorch-ts's People

Contributors

adelegouttes · aslinagy · edrinb-zalando · kashif · kousu · larkz · nielsrogge · sabman · samnor · shashankdeshpande · ssmall41


pytorch-ts's Issues

Implement Temporal Fusion Transformer?

Hi!

First of all, thank you for this library. It is the only Pytorch lib for ts with a usable API!

I am pretty sure you already know this, but one of the latest and most famous deep learning architectures for ts forecasting is the Temporal Fusion Transformer (TFT): https://arxiv.org/abs/1912.09363

There are already two Pytorch implementations out there: this one and this other one (that is heavily inspired by the former), but both lack a nice API and there are some implementation issues.

IMO it would be a great addition to pytorch-ts.

Thank you!

pip3 install failing

Outputs

ERROR: Could not find a version that satisfies the requirement pytorchts (from versions: none)
ERROR: No matching distribution found for pytorchts

DeepVAR: name 'SetField' is not defined

Description

I am trying to use DeepVAREstimator from the issue-3 branch, and it throws NameError: name 'SetField' is not defined.

To Reproduce

import torch

from pts import Trainer
from pts.model.deepvar import DeepVAREstimator
from pts.transform import SetField

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
trainer = Trainer(device = device, epochs = 10) 

estimator = DeepVAREstimator(input_size = 401,
                             freq = "1M", 
                             prediction_length = pred_h,
                             context_length = pred_h*2,
                             target_dim = target_dim,
                             use_feat_static_cat = True,
                             cardinality = card_static,
                             trainer = trainer)                              
predictor = estimator.train(training_data = train_ds)

Error message output

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-27-375c015eb18b> in <module>
     20                              # time_features = feat_dynamic_real_train,
     21                              trainer = trainer)                              
---> 22 predictor = estimator.train(training_data = train_ds)
     23 predictor.__dict__["prediction_net"]

~/miniconda3/envs/pytorchts/lib/python3.7/site-packages/pts/model/estimator.py in train(self, training_data)
    132 
    133     def train(self, training_data: Dataset) -> Predictor:
--> 134         return self.train_model(training_data).predictor

~/miniconda3/envs/pytorchts/lib/python3.7/site-packages/pts/model/estimator.py in train_model(self, training_data)
     98 
     99     def train_model(self, training_data: Dataset) -> TrainOutput:
--> 100         transformation = self.create_transformation()
    101         transformation.estimate(iter(training_data))
    102 

~/miniconda3/envs/pytorchts/lib/python3.7/site-packages/pts/model/deepvar/deepvar_estimator.py in create_transformation(self)
    154                 else []
    155             )
--> 156             + [
    157                 AsNumpyArray(
    158                     field=FieldName.FEAT_STATIC_CAT, expected_ndim=1, dtype=np.long,

NameError: name 'SetField' is not defined

Potential Solution

Include the following import in deepvar_estimator.py:

from pts.transform import (
...
    SetField
)

TQDM issue, perhaps with versioning

/databricks/python/lib/python3.7/site-packages/pts/trainer.py in __call__(self, net, train_iter, validation_iter)
     96             if validation_iter is not None:
     97                 cumm_epoch_loss_val = 0.0
---> 98                 with tqdm(validation_iter, total=total, colour="green") as it:
     99 
    100                     for batch_no, data_entry in enumerate(it, start=1):

/databricks/python/lib/python3.7/site-packages/tqdm/std.py in __init__(self, iterable, desc, total, leave, file, ncols, mininterval, maxinterval, miniters, ascii, disable, unit, unit_scale, dynamic_ncols, smoothing, bar_format, initial, position, postfix, unit_divisor, write_bytes, lock_args, gui, **kwargs)
    946                     fp_write=getattr(file, 'write', sys.stderr.write))
    947                 if "nested" in kwargs else
--> 948                 TqdmKeyError("Unknown argument(s): " + str(kwargs)))
    949 
    950         # Preprocess the arguments

The installed TQDM version has an incompatible signature (it does not accept the colour keyword).
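As a possible workaround (an assumption on my side, not an official fix), one can check the installed tqdm version up front, since the colour keyword is only accepted by newer tqdm releases:

import tqdm

# Assumption: the `colour` keyword used by pts.trainer is only available in
# newer tqdm releases, so fail early with a clear message if tqdm is too old.
major, minor, *_ = (int(x) for x in tqdm.__version__.split(".")[:2])
if (major, minor) < (4, 50):
    raise RuntimeError(
        f"tqdm {tqdm.__version__} looks too old for pts.trainer; "
        "try `pip install -U tqdm`"
    )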

File not found: datasets/pts_m5/metadata.json

Excuse me if this is a newbie question but when I try to run https://github.com/zalandoresearch/pytorch-ts/blob/master/examples/m5-tft.ipynb I am running into the following error:

Traceback (most recent call last):
  File "C:\Users\Gili\Documents\myproject\pytorch-ts-test.py", line 17, in <module>
    dataset = get_dataset("pts_m5", regenerate=False)
  File "C:\Users\Gili\Documents\myproject\python\lib\site-packages\gluonts\dataset\repository\datasets.py", line 232, in get_dataset
    return load_datasets(
  File "C:\Users\Gili\Documents\myproject\python\lib\site-packages\gluonts\dataset\common.py", line 491, in load_datasets
    meta = MetaData.parse_file(Path(metadata) / "metadata.json")
  File "pydantic\main.py", line 613, in pydantic.main.BaseModel.parse_file
  File "pydantic\parse.py", line 57, in pydantic.parse.load_file
  File "C:\Python39\lib\pathlib.py", line 1248, in read_bytes
    with self.open(mode='rb') as f:
  File "C:\Python39\lib\pathlib.py", line 1241, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "C:\Python39\lib\pathlib.py", line 1109, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Gili\\.mxnet\\gluon-ts\\datasets\\pts_m5\\metadata.json'

Looking at https://www.kaggle.com/c/m5-forecasting-accuracy/data this file does not seem to exist. What am I missing here?

Unable to reproduce the results of TimeGrad on some datasets

I'm very glad to read the paper "Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting"; it's very interesting work, and TimeGrad achieves state-of-the-art results on multivariate time series forecasting tasks. However, I cannot reproduce the results on some datasets with the model implemented in pytorch-ts. Could you release the hyperparameter settings used for these datasets in the paper? Thanks a lot!

Referencing gluon-ts and copyright

The vast majority of this repo seems to be copy-pasted from gluon-ts. This is a bit problematic, as it is not stated clearly, for instance in the README.md (which is also copy-pasted from gluon-ts).

I assume there is no ill intent; could you perhaps state clearly which files do not come from gluon-ts? Note also that all files that have been copy-pasted should still carry the initial copyright notice under the Apache license (see https://opensource.stackexchange.com/questions/5528/removing-copyright-notice-in-uis-of-apache-2-licensed-software).

Thanks!

ERROR: Could not install packages due to an OSError:

Hello, I was trying to install pytorch-ts using pip install pytorchts in an Anaconda environment and got the following error message. Any hints or suggestions on how to fix it would be highly appreciated. I did not get this kind of permission error when using pip to install other packages.

Successfully built pytorchts subprocess32 pathtools
Installing collected packages: smmap, gitdb, subprocess32, shortuuid, sentry-sdk, promise, pathtools, GitPython, docker-pycreds, configparser, wandb, torch, pytorchts
  Attempting uninstall: torch
    Found existing installation: torch 1.6.0
    Uninstalling torch-1.6.0:
      Successfully uninstalled torch-1.6.0
ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'c:\\users\\anaconda3\\envs\\jane_street_kaggle\\lib\\site-packages\\~orch\\lib\\asmjit.dll'
Consider using the `--user` option or check the permissions.

Potential bug in calculating `lags_for_fourier_time_features_from_frequency` function

Hello,

I was trying to understand the code execution and stumbled across a potential bug.
The lags_for_fourier_time_features_from_frequency() function returns an incorrect result when you pass a minute-level freq argument.

Example : lags_for_fourier_time_features_from_frequency(freq='10min') == [1] when it should return [1, 4, 12, 24, 48]

For other frequencies it works as expected -

  • lags_for_fourier_time_features_from_frequency(freq='1D') == [1, 7, 14]
  • lags_for_fourier_time_features_from_frequency(freq='10M') == [1, 12]

Could you also explain why max(lags) is added to self.history_length to increase the context length here?

Can I turn off the frequency argument in the DeepAREstimator?

Hi,

I am working on some time series data that do not have a stable frequency. I have a datetime column of when X occurs, and X can occur at any time after the previous occurrence, i.e., X_t can be 1 minute after X_{t-1} or even months after X_{t-1}.

Is there a way to adapt the frequency argument to be suitable for my case?

Thanks in advance!
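For what it's worth, one workaround I can think of (just a sketch, not an official feature of the library) is to resample the irregular event times onto a regular grid before building the ListDataset, e.g. by counting occurrences per bucket:

import pandas as pd

# toy, irregularly spaced occurrences of X (hypothetical data)
events = pd.Series(
    1.0,
    index=pd.to_datetime(
        ["2021-01-01 00:01", "2021-01-01 00:07", "2021-03-15 12:00"]
    ),
)

# resample onto a regular 1-hour grid: count occurrences per bucket so the
# series gains the fixed freq that DeepAREstimator expects
regular = events.resample("1H").sum()
print(regular.head())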

TempFlowEstimator deserialize error

I trained a TempFlow model and saved (serialized) it to a folder, but when I load (deserialize) the model, an error appears:

Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2961, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
p = Predictor.deserialize(get_model_path('tempflow'))
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\model\predictor.py", line 82, in deserialize
return tpe.deserialize(path, device)
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\model\predictor.py", line 172, in deserialize
transformation = load_json(fp.read())
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 173, in load_json
return decode(json.loads(s))
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 354, in decode
kwargs = decode(r["kwargs"]) if "kwargs" in r else {}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 362, in decode
return {k: decode(v) for k, v in r.items()}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 362, in
return {k: decode(v) for k, v in r.items()}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 368, in decode
return [decode(y) for y in r]
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 368, in
return [decode(y) for y in r]
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 354, in decode
kwargs = decode(r["kwargs"]) if "kwargs" in r else {}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 362, in decode
return {k: decode(v) for k, v in r.items()}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 362, in
return {k: decode(v) for k, v in r.items()}
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 368, in decode
return [decode(y) for y in r]
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 368, in
return [decode(y) for y in r]
File "C:\ProgramData\Anaconda3\lib\site-packages\pts\core\serde.py", line 355, in decode
return cls(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'normalized'

Forecast Reconciliation

Description

It would be very useful to allow for forecast reconciliation of hierarchical and/or grouped time series. This means that the sum of all forecasts that make up a hierarchy matches the forecast of the hierarchy. Say you forecast several time series that are within the same hierarchy plus the time series of the total (e.g., tourism visits for each Australian territory plus total tourism across the territories as an aggregate). What forecast reconciliation does is make sure that the bottom-level forecasts match the top-level aggregate forecast. As PyTorch-TS is a probabilistic framework, we also need to make sure that the uncertainty attached to the forecasts is corrected accordingly.

Besides cross-sectional hierarchies, you may also want to include temporal hierarchies, so that you train the model on daily, weekly and monthly data, and you make sure that all sum up to the temporal hierarchy of interest, e.g., monthly forecast.

Several papers show that cross-temporally coherent forecasts improve accuracy compared to not taking this information into account. A minimal numerical sketch of the bottom-up case is given below.
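For illustration only, here is a toy numpy sketch of the simplest (bottom-up) reconciliation for point forecasts; the names and numbers are made up:

import numpy as np

# hypothetical hierarchy: total = region_A + region_B
bottom = np.array([120.0, 80.0])   # bottom-level forecasts: region_A, region_B

# summing matrix S maps bottom-level series to every node of the hierarchy
S = np.array([
    [1.0, 1.0],   # total
    [1.0, 0.0],   # region_A
    [0.0, 1.0],   # region_B
])

# bottom-up reconciliation: all levels are derived from the bottom forecasts,
# so the hierarchy is coherent by construction
coherent = S @ bottom
print(coherent)   # [200. 120.  80.]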

References

This is a non-exhaustive list of references intended to give a first overview over the topic:

Anomaly detection

Hi, would it be possible to use the code for anomaly detection? I'm interested in applying the conditional normalizing flows model in the context of detecting gravitational waves and comparing its performance with other models like VAE + LSTM, GANs, etc.

Thanks !!
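For illustration, one simple way to use a probabilistic forecast for anomaly detection (just a sketch of the idea, not a feature of the library) is to flag observations that fall outside a chosen prediction interval:

import numpy as np

# hypothetical forecast samples with shape (num_samples, prediction_length)
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=(100, 12))
observed = np.array([0.1, -0.3, 5.0, 0.2, 0.0, -0.1, 0.4, -0.2, 0.3, 6.0, 0.1, -0.4])

# 90% prediction interval from the samples
lo, hi = np.quantile(samples, [0.05, 0.95], axis=0)

# time steps whose observation falls outside the interval are flagged
anomalies = (observed < lo) | (observed > hi)
print(np.nonzero(anomalies)[0])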

Cannot import because of tensorboard error, yet tensorboard is installed

I have a fresh install of pytorchts in its own environment on my Ubuntu 18 machine, but I am unable to import it because of a TensorBoard logging issue, despite the fact that I have TensorBoard installed.

Tensorboard, PyTorch, and PyTorchTS versions

(timeseries) amruch@wit:~/graphika/SBIR_COVID$ conda list | grep "board"
tensorboard               2.2.1              pyh532a8cf_0
tensorboard-plugin-wit    1.6.0                      py_0
(timeseries) amruch@wit:~/graphika/SBIR_COVID$ conda list | grep "torch"
pytorch                   1.6.0           py3.6_cuda10.1.243_cudnn7.6.3_0    pytorch
pytorchts                 0.2.0                    pypi_0    pypi
torchvision               0.7.0                py36_cu101    pytorch

Error thrown when importing various pts modules:

(timeseries) amruch@wit:~/graphika/SBIR_COVID$ python
Python 3.6.10 |Anaconda, Inc.| (default, May  8 2020, 02:54:21)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pts.dataset import ListDataset
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/pts/__init__.py", line 6, in <module>
    from .trainer import Trainer
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/pts/trainer.py", line 7, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/torch/utils/tensorboard/__init__.py", line 4, in <module>
    raise ImportError('TensorBoard logging requires TensorBoard version 1.15 or above')
ImportError: TensorBoard logging requires TensorBoard version 1.15 or above
>>> from pts.model.deepar import DeepAREstimator
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/pts/__init__.py", line 6, in <module>
    from .trainer import Trainer
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/pts/trainer.py", line 7, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.6/site-packages/torch/utils/tensorboard/__init__.py", line 4, in <module>
    raise ImportError('TensorBoard logging requires TensorBoard version 1.15 or above')
ImportError: TensorBoard logging requires TensorBoard version 1.15 or above

It seems to me that preventing any kind of import due to tensorboard is a bit overkill. Why not just throw a warning that progress/results cannot be logged?

Add early stopping to model training

Currently, model training only tracks the training loss per epoch. Add a feature that uses validation loss as an early-stopping criterion; a minimal sketch of the idea follows.
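A minimal, generic sketch of what such early stopping could look like (illustrative only, not the pts Trainer API; the per-epoch passes are stubbed out):

import random

def run_epoch(validation: bool) -> float:
    """Placeholder standing in for one pass over the (validation) data loader."""
    return random.random()

best_val, patience, bad_epochs = float("inf"), 3, 0
for epoch in range(100):
    train_loss = run_epoch(validation=False)
    val_loss = run_epoch(validation=True)

    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0   # would checkpoint the best model here
    else:
        bad_epochs += 1
        if bad_epochs >= patience:           # stop once validation stops improving
            break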

Scalable Representation Learning for Multivariate Time Series

Summary

It would be great to add Scalable Representation Learning for Multivariate Time Series as an additional feature input to the models implemented in PyTorch-TS.

Potential Benefits

Both univariate and multivariate models can potentially benefit from this approach, as it helps to bring more information to the model for prediction tasks. Also, the embeddings can be used for classification to better understand the data at hand.

Description

The basic idea is to learn embeddings of time series from which similarities of the time series can be derived. The objective is to ensure that similar time series obtain similar representations that can be used as an input for modelling. As for image embeddings, the learned representations may also be used to define a meaningful measure between time series, e.g., comparing time series using a distance measure between their representations with dimensionality reduction and/or clustering.

The criterion to select pairs of similar time series follows word2vec's intuition. For word embeddings, the representation of the context of a word should, on one hand, be close to the one of this word and, on the other hand, distant from the one of randomly chosen words, since they are probably unrelated to the original word's context. The corresponding loss then pushes pairs of (context, word) and (context, random word) to be linearly separable. This is called negative sampling. It can be visualized as follows:

image

The loss minimizes the distance between an anchor and a positive, both of which have the same identity, and maximizes the distance between the anchor and a negative of a different identity.

To adapt this principle to time series, one can consider a random subseries x_ref of a given time series y_i. Then, on one hand, the representation of x_ref should be close to the one of any of its subseries x_pos (a positive example). On the other hand, if one considers another subseries x_neg (a negative example) chosen at random (in a different random time series y_j if several series are available, or in the same time series if it is long enough and not stationary), then its representation should be distant from the one of x_ref. Following the analogy with word2vec, x_pos corresponds to a word, x_ref to its context, and x_neg to a random word. To improve the stability and convergence of the training procedure as well as the experimental results of the learned representations, one can introduce, as in word2vec, several negative samples (x_neg_k) chosen independently at random.

image

The loss pushes the computed representations to distinguish between x_ref and x_neg, and to assimilate x_ref and x_pos. Overall, the training procedure consists of iterating through the training dataset for several epochs (possibly using mini-batches), picking tuples x_ref, x_pos, (x_neg_k) at random and performing a minimization step on the corresponding loss for each tuple, until training ends. A minimal sketch of such a loss is given below.
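A minimal PyTorch sketch of one common form of such a negative-sampling loss (illustrative only; the encoder and the sampling of x_ref/x_pos/x_neg_k are left out, and this is not claimed to be the exact loss from the referenced paper):

import torch
import torch.nn.functional as F

def negative_sampling_loss(z_ref, z_pos, z_negs):
    """Pull z_ref towards z_pos and push it away from each negative z_neg_k.

    z_ref, z_pos: (batch, dim) representations; z_negs: (batch, K, dim)."""
    pos_term = -F.logsigmoid((z_ref * z_pos).sum(-1))
    neg_term = -F.logsigmoid(-(z_negs * z_ref.unsqueeze(1)).sum(-1)).sum(-1)
    return (pos_term + neg_term).mean()

# toy usage with hypothetical encoder outputs
z_ref, z_pos = torch.randn(8, 16), torch.randn(8, 16)
z_negs = torch.randn(8, 5, 16)   # K = 5 negative samples
print(negative_sampling_loss(z_ref, z_pos, z_negs))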

Some Initial Comments

The approach needs to be a two step procedure:

  • We need to learn the embeddings first, i.e., train an embedding model
  • Once the embeddings are learned, we can incorporate them as a feat_static_real into any of the available model implementations, i.e., DeepAR, DeepVAR, TransformerTempFlowEstimator, etc.
  • We should also output the embeddings as they are useful in their own right

References

Unsupervised Scalable Representation Learning for Multivariate Time Series:

README example is failing with "RuntimeError: input.size(-1) must be equal to input_size"

Hi, I'm executing the following code from the README:

import pandas as pd
import matplotlib.pyplot as plt

import torch
print(torch.__version__)

import gluonts
from gluonts.dataset.common import ListDataset
from gluonts.dataset.util import to_pandas

import pts
from pts.model.deepar import DeepAREstimator
from pts import Trainer

print(pts.__version__, gluonts.__version__)

url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
df = pd.read_csv(url, header=0, index_col=0, parse_dates=True)

df[:100].plot(linewidth=2)
plt.grid(which='both')
plt.show()


training_data = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-05 00:00:00"]}],
    freq = "5min"
)


device = "cpu"
estimator = DeepAREstimator(freq="5min",
                            prediction_length=12,
                            input_size=43,
                            trainer=Trainer(epochs=10,
                                            device=device))
predictor = estimator.train(training_data=training_data, num_workers=4)

and got the following error:

1.9.0
0.0.0-unknown 0.8.0

    203                     expected_input_dim, input.dim()))
    204         if self.input_size != input.size(-1):
--> 205             raise RuntimeError(
    206                 'input.size(-1) must be equal to input_size. Expected {}, got {}'.format(
    207                     self.input_size, input.size(-1)))

RuntimeError: input.size(-1) must be equal to input_size. Expected 43, got 19

Version:

pip list | grep pytorchts
pytorchts                     0.5.1

Any suggestions ?
Thanks
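For reference, the quick-start snippet at the top of this page passes input_size=19 rather than 43, which matches the "got 19" in the error; a minimal sketch of the adjusted call (assuming the same setup as above):

from pts import Trainer
from pts.model.deepar import DeepAREstimator

# sketch: input_size set to 19 to match the feature vector size reported in the error
estimator = DeepAREstimator(freq="5min",
                            prediction_length=12,
                            input_size=19,
                            trainer=Trainer(epochs=10, device="cpu"))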

When I try your Quick start, I have some trouble

BrokenPipeError Traceback (most recent call last)

in
6 trainer=Trainer(epochs=10,
7 device=device))
----> 8 predictor = estimator.train(training_data=training_data)

D:\software\anaconda\envs\tensorflow\lib\site-packages\pts\model\estimator.py in train(self, training_data)
146
147 def train(self, training_data: Dataset) -> Predictor:
--> 148 return self.train_model(training_data).predictor

D:\software\anaconda\envs\tensorflow\lib\site-packages\pts\model\estimator.py in train_model(self, training_data)
134 net=trained_net,
135 input_names=get_module_forward_input_names(trained_net),
--> 136 data_loader=training_data_loader,
137 )
138

D:\software\anaconda\envs\tensorflow\lib\site-packages\pts\trainer.py in call(self, net, input_names, data_loader)
46
47 with tqdm(data_loader) as it:
---> 48 for batch_no, data_entry in enumerate(it, start=1):
49 optimizer.zero_grad()
50 inputs = [data_entry[k].to(self.device) for k in input_names]

D:\software\anaconda\envs\tensorflow\lib\site-packages\tqdm\std.py in iter(self)
1163
1164 try:
-> 1165 for obj in iterable:
1166 yield obj
1167 # Update and possibly print the progressbar.

D:\software\anaconda\envs\tensorflow\lib\site-packages\torch\utils\data\dataloader.py in iter(self)
289 return _SingleProcessDataLoaderIter(self)
290 else:
--> 291 return _MultiProcessingDataLoaderIter(self)
292
293 @Property

D:\software\anaconda\envs\tensorflow\lib\site-packages\torch\utils\data\dataloader.py in init(self, loader)
735 # before it starts, and del tries to join but will get:
736 # AssertionError: can only join a started process.
--> 737 w.start()
738 self._index_queues.append(index_queue)
739 self._workers.append(w)

D:\software\anaconda\envs\tensorflow\lib\multiprocessing\process.py in start(self)
103 'daemonic processes are not allowed to have children'
104 _cleanup()
--> 105 self._popen = self._Popen(self)
106 self._sentinel = self._popen.sentinel
107 # Avoid a refcycle if the target function holds an indirect

D:\software\anaconda\envs\tensorflow\lib\multiprocessing\context.py in _Popen(process_obj)
221 @staticmethod
222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
224
225 class DefaultContext(BaseContext):

D:\software\anaconda\envs\tensorflow\lib\multiprocessing\context.py in _Popen(process_obj)
320 def _Popen(process_obj):
321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
323
324 class SpawnContext(BaseContext):

D:\software\anaconda\envs\tensorflow\lib\multiprocessing\popen_spawn_win32.py in init(self, process_obj)
63 try:
64 reduction.dump(prep_data, to_child)
---> 65 reduction.dump(process_obj, to_child)
66 finally:
67 set_spawning_popen(None)

D:\software\anaconda\envs\tensorflow\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
61
62 #

BrokenPipeError: [Errno 32] Broken pipe

How to plot forecasts of multivariate time series

I'd like to plot the predictions of the TempFlowEstimator on a multivariate time series dataset, similar to what is done in the README of this repository.

When I make the forecasts as follows:

from pts.evaluation import make_evaluation_predictions
from pts.evaluation import MultivariateEvaluator
import numpy as np

evaluator = MultivariateEvaluator(quantiles=(np.arange(20)/20.0)[1:],
                                  target_agg_funcs={'sum': np.sum})

forecast_it, ts_it = make_evaluation_predictions(dataset=dataset_test,
                                             predictor=predictor,
                                             num_samples=100)
forecasts = list(forecast_it)
targets = list(ts_it)

targets[0] is a Pandas dataframe containing the true values for each of the (in my case 12) time series for all time steps. forecasts[0] is a SampleForecast object whose samples is a Numpy array of shape (100, 365, 12). This means that we have 100 samples for each of the 365 time steps of the test set, for each of the 12 time series.

However, how can I plot the samples of the first time series for example? I tried to set the samples of the forecasts[0] object to the samples of the first series (i.e. forecasts[0].samples = forecasts[0].samples[:,:,0]), but when I call the plot function on that I get


Exception                                 Traceback (most recent call last)
<ipython-input-83-b629071ff750> in <module>()
----> 1 samples_first_time_series.plot()

2 frames
/usr/local/lib/python3.6/dist-packages/pts/model/forecast.py in plot(self, prediction_intervals, show_mean, color, label, output_file, *args, **kwargs)
    132 
    133         p50_data = ps_data[i_p50]
--> 134         p50_series = pd.Series(data=p50_data, index=self.index)
    135         p50_series.plot(color=color, ls="-", label=f"{label_prefix}median")
    136 

/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    303                     data = data.copy()
    304             else:
--> 305                 data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
    306 
    307                 data = SingleBlockManager(data, index, fastpath=True)

/usr/local/lib/python3.6/dist-packages/pandas/core/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    480     elif subarr.ndim > 1:
    481         if isinstance(data, np.ndarray):
--> 482             raise Exception("Data must be 1-dimensional")
    483         else:
    484             subarr = com.asarray_tuplesafe(data, dtype=dtype)

Exception: Data must be 1-dimensional
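For illustration, a minimal matplotlib sketch that bypasses forecast.plot and works directly on the samples array described above (this assumes the forecasts and targets variables from the snippet, and a DatetimeIndex on the target frame):

import numpy as np
import matplotlib.pyplot as plt

# samples for the first of the 12 series: shape (num_samples, prediction_length)
samples_dim0 = forecasts[0].samples[:, :, 0]
horizon = samples_dim0.shape[1]

median = np.quantile(samples_dim0, 0.5, axis=0)
lo, hi = np.quantile(samples_dim0, [0.05, 0.95], axis=0)   # 90% interval

index = targets[0].index[-horizon:]                        # forecasted time steps
plt.plot(index, targets[0].iloc[-horizon:, 0], label="actual")
plt.plot(index, median, color="g", label="median forecast")
plt.fill_between(index, lo, hi, color="g", alpha=0.3, label="90% interval")
plt.legend()
plt.show()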

Readme example not working

Description

Many thanks to the authors for making the implementation available! Great initiative.

I am trying to run the README.md example, but it is not working.

Code Snippet

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

estimator = DeepAREstimator(freq="5min",
                            prediction_length=12,
                            input_size=43,
                            trainer=Trainer(epochs=10,
                                            device=device))
predictor = estimator.train(training_data=training_data)

Error

Running the code in the README up to this snippet works just fine. When I run estimator.train(training_data=training_data), the snippet above throws the following error:

0it [00:02, ?it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-21-57e06c572bad> in <module>
      6                             trainer=Trainer(epochs=10,
      7                                             device=device))
----> 8 predictor = estimator.train(training_data=training_data)

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\model\estimator.py in train(self, training_data)
    133 
    134     def train(self, training_data: Dataset) -> Predictor:
--> 135         return self.train_model(training_data).predictor

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\model\estimator.py in train_model(self, training_data)
    118         trained_net = self.create_training_network(self.trainer.device)
    119 
--> 120         self.trainer(
    121             net=trained_net,
    122             input_names=get_module_forward_input_names(trained_net),

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\trainer.py in __call__(self, net, input_names, data_loader)
     50                     inputs = [data_entry[k].to(self.device) for k in input_names]
     51 
---> 52                     output = net(*inputs)
     53                     if isinstance(output, (list, tuple)):
     54                         loss = output[0]

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\model\deepar\deepar_network.py in forward(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
    244         future_observed_values: torch.Tensor,
    245     ) -> torch.Tensor:
--> 246         distr = self.distribution(
    247             feat_static_cat=feat_static_cat,
    248             feat_static_real=feat_static_real,

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\model\deepar\deepar_network.py in distribution(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
    219         future_observed_values: torch.Tensor,
    220     ) -> Distribution:
--> 221         rnn_outputs, _, scale, _ = self.unroll_encoder(
    222             feat_static_cat=feat_static_cat,
    223             feat_static_real=feat_static_real,

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\model\deepar\deepar_network.py in unroll_encoder(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target)
    166 
    167         # (batch_size, num_features)
--> 168         embedded_cat = self.embedder(feat_static_cat)
    169 
    170         # in addition to embedding features, use the log scale as it can help

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\modules\feature.py in forward(self, features)
     28 
     29         return torch.cat(
---> 30             [
     31                 embed(cat_feature_slice.squeeze(-1))
     32                 for embed, cat_feature_slice in zip(

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\pts\modules\feature.py in <listcomp>(.0)
     29         return torch.cat(
     30             [
---> 31                 embed(cat_feature_slice.squeeze(-1))
     32                 for embed, cat_feature_slice in zip(
     33                     self.__embedders, cat_feature_slices

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\torch\nn\modules\sparse.py in forward(self, input)
    110 
    111     def forward(self, input):
--> 112         return F.embedding(
    113             input, self.weight, self.padding_idx, self.max_norm,
    114             self.norm_type, self.scale_grad_by_freq, self.sparse)

C:\ProgramData\Anaconda3\envs\pytorchts\lib\site-packages\torch\nn\functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1482         # remove once script supports set_grad_enabled
   1483         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1484     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1485 
   1486 

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)

Environment

  • Operating system: Windows 10
  • Python version: 3.8.2
  • Pytorch: 1.4.0
  • Torchvision: 0.5.0
  • Cudatoolkit:10.0

Multivariate question

Hello, I'm trying to train on a simple dataframe made of {timestamp, a, b}, in which a at time t cycles through 0, 1, 2, 3 and b at time t+1 equals a at time t, so the model should be able to predict b simply from a.

But I can't understand how MultivariateGrouper works and what "max_target_dim" should be.
Also, why was quantiles=(np.arange(20)/20.0)[1:] used for MultivariateEvaluator in the example?

And is "target_dim" in TempFlowEstimator the last dimension because that is what we want to predict, or not?
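For illustration, a minimal sketch of how two aligned univariate entries are grouped into one multivariate entry (using the pts.dataset imports that appear in other issues here; the data and frequency are made up):

import numpy as np
from pts.dataset import ListDataset, MultivariateGrouper

# two aligned toy series: a cycles through 0..3, b at t+1 equals a at t
a = np.arange(100) % 4
b = np.roll(a, 1)

train_ds = ListDataset(
    [{"start": "2021-01-01", "target": a},
     {"start": "2021-01-01", "target": b}],
    freq="1H",
)

# max_target_dim is the number of univariate series stacked into one target
grouper = MultivariateGrouper(max_target_dim=2)
multivariate_ds = grouper(train_ds)   # one entry whose target has shape (2, 100)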

Matrix multiplication error during training TFT

Environment Details

  • Python version: 3.7.10
  • Operating System: Google Colab web platform

Error Description

I get a matrix multiplication error while training TFT:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (4608x2 and 1x32)

Steps to reproduce

  1. Open Colab https://colab.research.google.com/
  2. Run the code
!pip install pytorchts -q
!curl https://forecasters.org/data/m3comp/M3C.xls --create-dirs -o /root/.mxnet/gluon-ts/datasets/M3C.xls 

from gluonts.dataset.repository.datasets import get_dataset
from pts.model.tft import TemporalFusionTransformerEstimator
from pts import Trainer

dataset = get_dataset("m3_monthly", regenerate=False)

estimator = TemporalFusionTransformerEstimator(
    freq=dataset.metadata.freq,
    prediction_length=dataset.metadata.prediction_length,
    context_length=dataset.metadata.prediction_length,
    dropout_rate=0.1,
    num_outputs=15,
    trainer=Trainer(device='cpu',
                    epochs=20,
                    learning_rate=1e-3,
                    num_batches_per_epoch=100,
                    batch_size=128))

predictor = estimator.train(dataset.train)

Working Example Notebooks

Description

Referring to #6 (comment), it would be good to have notebooks that illustrate the usage of each estimator available in PyTorch-TS. That would greatly facilitate getting hands-on with the package.

TypeError: unhashable type: 'pandas._libs.tslibs.offsets.Day'

Despite ensuring I am on pandas 1.0.5 (cf. awslabs/gluonts#958), I am still getting a TypeError: unhashable type: 'pandas._libs.tslibs.offsets.Day' error when running the following:

# Define DL Time Series Model
estimator = DeepAREstimator(
    freq = FREQ,
    prediction_length = 1, #predict 1 day ahead
    input_size = 32,
    trainer = Trainer(
        epochs = 100,
        device = DEVICE
    )
)
predictor = estimator.train(training_data=training_data)

Which returned

0it [00:00, ?it/s]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-76-b7c68ebabaa3> in <module>
----> 1 predictor = estimator.train(training_data=training_data)

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/model/estimator.py in train(self, training_data)
    146 
    147     def train(self, training_data: Dataset) -> Predictor:
--> 148         return self.train_model(training_data).predictor

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/model/estimator.py in train_model(self, training_data)
    131         trained_net = self.create_training_network(self.trainer.device)
    132 
--> 133         self.trainer(
    134             net=trained_net,
    135             input_names=get_module_forward_input_names(trained_net),

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/trainer.py in __call__(self, net, input_names, data_loader)
     46 
     47             with tqdm(data_loader) as it:
---> 48                 for batch_no, data_entry in enumerate(it, start=1):
     49                     optimizer.zero_grad()
     50                     inputs = [data_entry[k].to(self.device) for k in input_names]

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/tqdm/std.py in __iter__(self)
   1128 
   1129         try:
-> 1130             for obj in iterable:
   1131                 yield obj
   1132                 # Update and possibly print the progressbar.

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
    361 
    362     def __next__(self):
--> 363         data = self._next_data()
    364         self._num_yielded += 1
    365         if self._dataset_kind == _DatasetKind.Iterable and \

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    987             else:
    988                 del self._task_info[idx]
--> 989                 return self._process_data(data)
    990 
    991     def _try_put_index(self):

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _process_data(self, data)
   1012         self._try_put_index()
   1013         if isinstance(data, ExceptionWrapper):
-> 1014             data.reraise()
   1015         return data
   1016 

~/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/_utils.py in reraise(self)
    393             # (https://bugs.python.org/issue2651), so we work around it.
    394             msg = KeyErrorMessage(msg)
--> 395         raise self.exc_type(msg)

TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 28, in fetch
    data.append(next(self.dataset_iter))
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/dataset/transformed_iterable_dataset.py", line 39, in __iter__
    data_entry = next(self._cur_iter)
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 128, in __call__
    for data_entry in data_it:
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 81, in __call__
    for data_entry in data_it:
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 81, in __call__
    for data_entry in data_it:
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 85, in __call__
    raise e
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/transform.py", line 83, in __call__
    yield self.map_transform(data_entry.copy(), is_train)
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/feature.py", line 195, in map_transform
    self._update_cache(start, length)
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/feature.py", line 169, in _update_cache
    end = shift_timestamp(start, length)
  File "/home/amruch/anaconda3/envs/timeseries/lib/python3.8/site-packages/pts/transform/split.py", line 33, in shift_timestamp
    return _shift_timestamp_helper(ts, ts.freq, offset)
TypeError: unhashable type: 'pandas._libs.tslibs.offsets.Day'

Where the following preceded that code:

# Print Timestamp Statistics
earliest_time = min(example_ny_df.index)
latest_time = max(example_ny_df.index)
time_range_full = (max(example_ny_df.index) - min(example_ny_df.index)).days

# Determine Cut-point for 80/20 Training/Testing Splits
TRAININGSPLIT = 0.8
time_range_split = int(time_range_full * TRAININGSPLIT)
time_split = min(example_ny_df.index) + datetime.timedelta(days=time_range_split)

# Create Training Split / Predictor Object
FREQ = "1D"
training_data = ListDataset(
    [{"start": earliest_time, "target": example_ny_df.positiveIncrease[:time_split]}],
    freq = FREQ
)

# Setup GPU, if Exists
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Processing device:", DEVICE)

I'm going to try to redo this from a 100% clean install, without even trying the GPU version of torch, as mentioned in #22.

Where to place m5 data?

When I want to use

from pts.dataset.repository import get_dataset

dataset = get_dataset("m5", regenerate=False)

I get the error that the files from Kaggle are not present in the directory:
RuntimeError: M5 data is available on Kaggle (https://www.kaggle.com/c/m5-forecasting-accuracy/data). You first need to agree to the terms of the competition before being able to download the data. After you have done that, please copy the files into /root/.pytorch/pytorch-ts/datasets/m5.

However, I have no idea where to put these files. I'm working in Google Colab. The root directory is called "content". Should I make a ./pytorch/pytorch-ts/datasets/m5 directory myself?
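For illustration, one way to copy the downloaded Kaggle files into the directory named in the error message from a Colab notebook (the file names and the /content upload location are assumptions on my part):

import shutil
from pathlib import Path

# destination taken from the RuntimeError message above
dest = Path("/root/.pytorch/pytorch-ts/datasets/m5")
dest.mkdir(parents=True, exist_ok=True)

# assumption: the Kaggle CSVs were uploaded to /content (Colab's working dir)
for name in ["calendar.csv", "sales_train_validation.csv",
             "sell_prices.csv", "sample_submission.csv"]:
    shutil.copy(Path("/content") / name, dest / name)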

how to customize dataset?

It's great work! I want to apply the model to predict air quality (such as PM10 and PM2.5) based on temperature, wind speed, wind direction, and so on. Can this model directly accept PyTorch's DataLoader object? If not, how do I build a custom dataset?
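For illustration, a minimal sketch of a custom dataset in the GluonTS/pts format, where the exogenous series are passed as feat_dynamic_real (the data and frequency here are made up, and all series are assumed to be aligned to the same timestamps):

import numpy as np
from gluonts.dataset.common import ListDataset

# toy aligned hourly series: the target plus two covariates
pm25 = np.random.rand(500)
temperature = np.random.rand(500)
wind_speed = np.random.rand(500)

training_data = ListDataset(
    [{
        "start": "2021-01-01 00:00:00",
        "target": pm25,
        # covariates are stacked into a (num_features, length) array
        "feat_dynamic_real": np.stack([temperature, wind_speed]),
    }],
    freq="1H",
)

Such a dataset is then passed directly to estimator.train(training_data=training_data) as in the quick-start example, so a separate PyTorch DataLoader is not needed.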

Report errors when using pip3

ERROR: Could not find a version that satisfies the requirement pytorchts (from versions: none)
ERROR: No matching distribution found for pytorchts

How to set input_size

Description

What does the input_size argument in DeepAREstimator or TransformerTempFlowEstimator stand for, and how does one set meaningful values for it? Would it be possible to derive the value directly from the input data?

Working Example of TransformerTempFlowEstimator

I have been trying to get TransformerTempFlowEstimator working without success.
Can you provide an example script? Issues include RuntimeError: Sizes of tensors must match except in dimension 2. Got 1 and 32 in dimension 0 and not understanding how the data loading works for multivariate data.
My example below:

from pts.dataset import MultivariateGrouper
import pandas as pd
import torch

from pts.dataset import ListDataset
from pts.model.transformer_tempflow import TransformerTempFlowEstimator
from pts import Trainer

url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
df = pd.read_csv(url, header=0, index_col=0, parse_dates=True)

train_ds = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-05 00:00:00"]+i}
        for i in range(2)],
    freq="5min"
)

grouper_train = MultivariateGrouper(max_target_dim=2)
gt = grouper_train(train_ds)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

trainer = Trainer(epochs=10)

estimator = TransformerTempFlowEstimator(input_size=1,
                                         freq="5min",
                                         prediction_length=100,
                                         context_length=4,
                                         target_dim=64,
                                         cardinality=[7, 20],
                                         trainer=trainer)

predictor = estimator.train(training_data=gt)

Inserting Target features

Hello,

As far as I know, in order to include a dynamic feature with the DeepAR estimator, the feat_dynamic_real length should be the target length plus the prediction length.

Example - 3 days of prediction
{"target: [10, 5, 0, 0, 0], "feat_dynamic_real": [2, 30, 3, 1, 6, 2, 10, 8]}

If I want to use the same features as this Kaggle solution, how should I include:

**Sale values:**

Lag 1 value
Moving average of 7, 28 days
Continuous zero-sale days until today

For the lag-1 value I just use the lags_seq parameter: lags_seq=[1].

However, the moving average and the continuous zero-sale days until today will have the same array length as the target. How should I include them? A sketch of one possible approach is below.
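One possible trick, shown here only as a sketch and not as an official recommendation: lag the target-derived feature by prediction_length so every future step only uses information available at forecast time, and pad it to the required length of target + prediction_length:

import numpy as np
import pandas as pd

target = np.array([10, 5, 0, 0, 0, 3, 7, 2], dtype=float)
prediction_length = 3

# toy moving average of past sales (window of 3 here instead of 7/28)
ma = pd.Series(target).rolling(3, min_periods=1).mean().to_numpy()

# lag by prediction_length and pad the front, so the feature covers
# len(target) + prediction_length values as DeepAR expects
feat = np.concatenate([np.full(prediction_length, ma[0]), ma])
assert len(feat) == len(target) + prediction_length

entry = {"start": "2021-01-01", "target": target,
         "feat_dynamic_real": feat[np.newaxis, :]}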

Getting error in embedding

Hi, while trying to reproduce the simple example from the "quick start" section, I keep getting the following error message:

Traceback (most recent call last):
  File "C:/Users/User/PycharmProjects/binance_conda/NEW/test-123.py", line 73, in <module>
    predictor = estimator.train(training_data=training_data)
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\model\estimator.py", line 148, in train
    return self.train_model(training_data).predictor
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\model\estimator.py", line 133, in train_model
    self.trainer(
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\trainer.py", line 52, in __call__
    output = net(*inputs)
  File "C:\ProgramData\Miniconda3\envs\binance_conda\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\model\deepar\deepar_network.py", line 246, in forward
    distr = self.distribution(
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\model\deepar\deepar_network.py", line 221, in distribution
    rnn_outputs, _, scale, _ = self.unroll_encoder(
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\model\deepar\deepar_network.py", line 168, in unroll_encoder
    embedded_cat = self.embedder(feat_static_cat)
  File "C:\ProgramData\Miniconda3\envs\binance_conda\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\modules\feature.py", line 30, in forward
    [
  File "C:\Users\User\AppData\Roaming\Python\Python38\site-packages\pts\modules\feature.py", line 31, in <listcomp>
    embed(cat_feature_slice.squeeze(-1))
  File "C:\ProgramData\Miniconda3\envs\binance_conda\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Miniconda3\envs\binance_conda\lib\site-packages\torch\nn\modules\sparse.py", line 124, in forward
    return F.embedding(
  File "C:\ProgramData\Miniconda3\envs\binance_conda\lib\site-packages\torch\nn\functional.py", line 1852, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)

Issue running example on Windows pc

Hey,

I ran into an issue while testing your example code. I use a Windows PC with CPU only, and the latest version of torch (1.7.1) is installed.

Any idea what could resolve the issue?

Thanks,
Pieter


RuntimeError Traceback (most recent call last)
in
6 trainer=Trainer(epochs=10,
7 device=device))
----> 8 predictor = estimator.train(training_data=training_data, num_workers=2)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\model\estimator.py in train(self, training_data, validation_data, num_workers, prefetch_factor, shuffle_buffer_length, cache_data, **kwargs)
171 shuffle_buffer_length=shuffle_buffer_length,
172 cache_data=cache_data,
--> 173 **kwargs,
174 ).predictor

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\model\estimator.py in train_model(self, training_data, validation_data, num_workers, prefetch_factor, shuffle_buffer_length, cache_data, **kwargs)
143 net=trained_net,
144 train_iter=training_data_loader,
--> 145 validation_iter=validation_data_loader,
146 )
147

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\trainer.py in call(self, net, train_iter, validation_iter)
68 inputs = [v.to(self.device) for v in data_entry.values()]
69
---> 70 output = net(*inputs)
71 if isinstance(output, (list, tuple)):
72 loss = output[0]

~\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\model\deepar\deepar_network.py in forward(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
252 future_time_feat=future_time_feat,
253 future_target=future_target,
--> 254 future_observed_values=future_observed_values,
255 )
256

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\model\deepar\deepar_network.py in distribution(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
226 past_observed_values=past_observed_values,
227 future_time_feat=future_time_feat,
--> 228 future_target=future_target,
229 )
230

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\model\deepar\deepar_network.py in unroll_encoder(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target)
166
167 # (batch_size, num_features)
--> 168 embedded_cat = self.embedder(feat_static_cat)
169
170 # in addition to embedding features, use the log scale as it can help

~\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\modules\feature.py in forward(self, features)
35 embed(cat_feature_slice.squeeze(-1))
36 for embed, cat_feature_slice in zip(
---> 37 self.__embedders, cat_feature_slices
38 )
39 ],

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pts\modules\feature.py in (.0)
34 [
35 embed(cat_feature_slice.squeeze(-1))
---> 36 for embed, cat_feature_slice in zip(
37 self.__embedders, cat_feature_slices
38 )

~\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

~\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\modules\sparse.py in forward(self, input)
124 return F.embedding(
125 input, self.weight, self.padding_idx, self.max_norm,
--> 126 self.norm_type, self.scale_grad_by_freq, self.sparse)
127
128 def extra_repr(self) -> str:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\nn\functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1850 # remove once script supports set_grad_enabled
1851 no_grad_embedding_renorm(weight, input, max_norm, norm_type)
-> 1852 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1853
1854

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)
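
A minimal reproduction of the dtype mismatch behind this error, for context: torch.nn.Embedding only accepts int64 ("Long") indices, and on Windows integer numpy arrays typically default to int32, so the static categorical feature reaches the embedder as an IntTensor. Where exactly to apply the cast inside the pts pipeline depends on the installed version; the sketch below only illustrates the diagnosis and that a .long() cast resolves it.

import torch

# nn.Embedding requires int64 ("Long") indices; int32 indices raise the
# RuntimeError shown in the traceback above.
emb = torch.nn.Embedding(num_embeddings=1, embedding_dim=1)
idx32 = torch.tensor([0], dtype=torch.int32)
# emb(idx32)               # RuntimeError: ... scalar type Long; but got torch.IntTensor
print(emb(idx32.long()))   # casting the indices to int64 works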

Problem running the example on Ubuntu

dataset: https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv
example: https://github.com/zalandoresearch/pytorch-ts


RuntimeError Traceback (most recent call last)

in
7 device=device))
8 # predictor = estimator.train(training_data=training_data, num_workers=4)
----> 9 predictor = estimator.train(training_data=training_data, num_workers=1)
10

~/Documents/pytorch-ts/pts/model/estimator.py in train(self, training_data, validation_data, num_workers, prefetch_factor, shuffle_buffer_length, cache_data, **kwargs)
179 shuffle_buffer_length=shuffle_buffer_length,
180 cache_data=cache_data,
--> 181 **kwargs,
182 ).predictor

~/Documents/pytorch-ts/pts/model/estimator.py in train_model(self, training_data, validation_data, num_workers, prefetch_factor, shuffle_buffer_length, cache_data, **kwargs)
147 net=trained_net,
148 train_iter=training_data_loader,
--> 149 validation_iter=validation_data_loader,
150 )
151

~/Documents/pytorch-ts/pts/trainer.py in call(self, net, train_iter, validation_iter)
70
71 inputs = [v.to(self.device) for v in data_entry.values()]
---> 72 output = net(*inputs)
73
74 if isinstance(output, (list, tuple)):

~/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),

~/Documents/pytorch-ts/pts/model/deepar/deepar_network.py in forward(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
252 future_time_feat=future_time_feat,
253 future_target=future_target,
--> 254 future_observed_values=future_observed_values,
255 )
256

~/Documents/pytorch-ts/pts/model/deepar/deepar_network.py in distribution(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target, future_observed_values)
226 past_observed_values=past_observed_values,
227 future_time_feat=future_time_feat,
--> 228 future_target=future_target,
229 )
230

~/Documents/pytorch-ts/pts/model/deepar/deepar_network.py in unroll_encoder(self, feat_static_cat, feat_static_real, past_time_feat, past_target, past_observed_values, future_time_feat, future_target)
198
199 # unroll encoder
--> 200 outputs, state = self.rnn(inputs)
201
202 # outputs: (batch_size, seq_len, num_cells)

~/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),

~/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
657 hx = self.permute_hidden(hx, sorted_indices)
658
--> 659 self.check_forward_args(input, hx, batch_sizes)
660 if batch_sizes is None:
661 result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,

~/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/rnn.py in check_forward_args(self, input, hidden, batch_sizes)
603 # See torch/nn/modules/module.py::_forward_unimplemented
604 def check_forward_args(self, input: Tensor, hidden: Tuple[Tensor, Tensor], batch_sizes: Optional[Tensor]): # type: ignore
--> 605 self.check_input(input, batch_sizes)
606 self.check_hidden_size(hidden[0], self.get_expected_hidden_size(input, batch_sizes),
607 'Expected hidden[0] size {}, got {}')

~/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/rnn.py in check_input(self, input, batch_sizes)
202 raise RuntimeError(
203 'input.size(-1) must be equal to input_size. Expected {}, got {}'.format(
--> 204 self.input_size, input.size(-1)))
205
206 def get_expected_hidden_size(self, input: Tensor, batch_sizes: Optional[Tensor]) -> Tuple[int, int, int]:

RuntimeError: input.size(-1) must be equal to input_size. Expected 43, got 19
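
One hedged way around this mismatch, assuming the error text keeps the format shown above: the number reported as "got" is the width of the feature vector the data pipeline actually produces, so re-creating the estimator with input_size set to that value (19 here) lets the RNN accept it. The sketch below simply runs a short attempt with a guess and parses the width out of the exception; training_data is assumed to be the ListDataset from the example.

import re
import torch
from pts import Trainer
from pts.model.deepar import DeepAREstimator

def find_input_size(guess: int = 1) -> int:
    # Train briefly with a guessed input_size; if the RNN size check fails,
    # return the actual feature width reported in the error message.
    try:
        estimator = DeepAREstimator(freq="5min", prediction_length=12, input_size=guess,
                                    trainer=Trainer(epochs=1, device=torch.device("cpu")))
        estimator.train(training_data=training_data)
        return guess
    except RuntimeError as err:
        match = re.search(r"Expected \d+, got (\d+)", str(err))
        if match:
            return int(match.group(1))
        raise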

data format - from pandas to the GluonTS data format for both univariate and multivariate data

Hi @kashif

I hope you're well!
I am just trying to get familiar with the library, and I generally work with the pandas data format.
Could you help me format my data and create a basic univariate model for predicting into the future?

import numpy as np

import pandas as pd
import yfinance as yf
data = yf.download("SPY", start="2012-01-01", end="2017-04-30")['Adj Close']
data=pd.DataFrame(data)
data

from gluonts.dataset.common import ListDataset
training_data = ListDataset(
    [{"start": data.index[0], "target": data.values[:1300]}],
    freq = "1D"
)

from gluonts.dataset.common import ListDataset
test_data = ListDataset(
    [{"start": data.index[1301], "target": data.values[1301:1339]}],
    freq = "1D"
)


from gluonts.model.simple_feedforward import SimpleFeedForwardEstimator
from gluonts.mx.trainer import Trainer

estimator = SimpleFeedForwardEstimator(
    num_hidden_dimensions=[10],
    prediction_length=38,
    context_length=100,
    freq='1D',
    trainer=Trainer(ctx="cpu", 
                    epochs=5, 
                    learning_rate=1e-3, 
                    num_batches_per_epoch=100
                   )
)
predictor = estimator.train(training_data=training_data)
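
Not an authoritative answer, just a possible sketch for the pts side of things: keep the target one-dimensional by squeezing the single-column frame into a Series and train a pts DeepAR estimator on it. The hyper-parameters below, input_size in particular, are illustrative guesses rather than recommended values, and data is assumed to be the frame built above.

import torch
from gluonts.dataset.common import ListDataset
from pts import Trainer
from pts.model.deepar import DeepAREstimator

series = data.squeeze()                        # one-column DataFrame -> Series
training_data = ListDataset(
    [{"start": series.index[0], "target": series.values[:1300]}],
    freq="1D",
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
estimator = DeepAREstimator(
    freq="1D",
    prediction_length=38,
    input_size=19,                             # adjust to the width the size check reports
    trainer=Trainer(epochs=5, device=device),
)
predictor = estimator.train(training_data=training_data)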

Error: wrong Input size

Hello, I'm trying the example in the README with a very simple univariate series of about 1300 data points.
It consists of just the dates (y-m-d) and the values.
I keep getting errors like "input size got is 37 != expected" (I tried using the length of the data frame), but I can't understand what that 37 is.

How was the input_size = 43 of the example calculated?
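
For what it's worth, the per-step RNN input in the DeepAR-style networks is a concatenation of lagged target values, time features and broadcast static features (the embeddings plus the log-scale term mentioned in the tracebacks above), so the required width depends mainly on the frequency-dependent lag set and can differ between library versions. In practice the easiest source of the exact number is the "got" value in the size error; the snippet below, using GluonTS helpers, only illustrates how the lag and time-feature counts change with the frequency.

from gluonts.time_feature import get_lags_for_frequency, time_features_from_frequency_str

for freq in ["5min", "1D"]:
    print(freq,
          "lags:", len(get_lags_for_frequency(freq)),
          "time features:", len(time_features_from_frequency_str(freq)))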

unable to reproduce results from notebook

I am unable to reproduce the results from the TimeGrad notebook. The loss diverges and eventually becomes NaN.

predictor = estimator.train(dataset_train, num_workers=8)

99it [00:22, 4.39it/s, avg_epoch_loss=0.945, epoch=0]
99it [00:22, 4.40it/s, avg_epoch_loss=0.495, epoch=1]
99it [00:22, 4.39it/s, avg_epoch_loss=0.466, epoch=2]
99it [00:22, 4.35it/s, avg_epoch_loss=0.795, epoch=3]
99it [00:22, 4.33it/s, avg_epoch_loss=0.852, epoch=4]
99it [00:22, 4.32it/s, avg_epoch_loss=nan, epoch=5]
99it [00:22, 4.33it/s, avg_epoch_loss=nan, epoch=6]
99it [00:22, 4.30it/s, avg_epoch_loss=nan, epoch=7]
99it [00:23, 4.30it/s, avg_epoch_loss=nan, epoch=8]
99it [00:22, 4.34it/s, avg_epoch_loss=nan, epoch=9]
99it [00:23, 4.29it/s, avg_epoch_loss=nan, epoch=10]
99it [00:23, 4.28it/s, avg_epoch_loss=nan, epoch=11]
99it [00:22, 4.33it/s, avg_epoch_loss=nan, epoch=12]
99it [00:23, 4.21it/s, avg_epoch_loss=nan, epoch=13]
99it [00:23, 4.30it/s, avg_epoch_loss=nan, epoch=14]
99it [00:23, 4.30it/s, avg_epoch_loss=nan, epoch=15]
99it [00:22, 4.34it/s, avg_epoch_loss=nan, epoch=16]
99it [00:22, 4.34it/s, avg_epoch_loss=nan, epoch=17]
99it [00:22, 4.34it/s, avg_epoch_loss=nan, epoch=18]
99it [00:23, 4.20it/s, avg_epoch_loss=nan, epoch=19]
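
Not a confirmed fix, but a common first mitigation when the loss diverges to NaN like this is to retrain with a more conservative learning rate; the values below are guesses rather than the notebook's settings, and device and the estimator are assumed to be defined as in the notebook.

from pts import Trainer

trainer = Trainer(
    epochs=20,
    learning_rate=1e-4,            # deliberately small; increase once training is stable
    num_batches_per_epoch=100,
    device=device,
)
# re-create the TimeGrad estimator from the notebook with trainer=trainer, then:
# predictor = estimator.train(dataset_train, num_workers=8)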

How to incorporate covariate information using TransformerTempFlowEstimator

Description

I am currently using the Australian retail trade turnover data set to get familiar with PyTorch-TS in general and with TransformerTempFlowEstimator in particular. The data looks as follows:

(screenshot: aus_retail_df)

Each series (133 series in total) has 417 months of training observations and is uniquely identified using two keys:

  • State: The Australian state (or territory)
  • Industry: The industry of retail trade

All series show quite strong positive dependencies, as the correlation matrix shows:

(correlation matrix: cor_matr)

As such, TransformerTempFlowEstimator seems to be a good option. I want to make use of both State and Industry as covariates in the model. For each categorical covariate, a generalized linear mixed model is fit to the outcome and the coefficients are returned as the encodings. The cardinality of State and Industry is [7, 20]. After bringing the data into the right format, I create the train data as follows:

train_ds = ListDataset([{FieldName.TARGET: target, 
                         FieldName.START: start,
                         FieldName.ITEM_ID: item_id,
                         FieldName.FEAT_DYNAMIC_REAL: feat_dynamic_real,
                         FieldName.FEAT_STATIC_REAL: feat_static_real,
                         FieldName.FEAT_TIME: time_feat
                        } 
                        for (target, 
                             start, 
                             item_id, 
                             feat_dynamic_real, 
                             feat_static_real, 
                             time_feat
                            ) in zip(target_train,
                                     start_train,
                                     item_id_train,
                                     feat_dynamic_real_train,
                                     feat_static_real_train,
                                     time_feat_train
                                    )],
                      freq = "1M") 

feat_static_real_train contains the embeddings and time_feat_train the month information. To transform the data into a multivariate data set, I use

grouper_train = MultivariateGrouper(max_target_dim = 133) # as there are 133 unique series 
train_ds = grouper_train(train_ds)

However, after applying grouper_train(train_ds), none of the covariate information is included anymore. To bring it back, I use

train_ds.list_data[0]["feat_dynamic_real"] = feat_dynamic_real_train
train_ds.list_data[0]["feat_static_real"] = feat_static_real_train

I then train the model as follows:

np.random.seed(123)
torch.manual_seed(123)
trainer = Trainer( epochs = 40) 

estimator = TransformerTempFlowEstimator(input_size = 401,
                                         freq = "1M", 
                                         prediction_length = 24,
                                         context_length = 48,
                                         target_dim = 133,
                                         cardinality = [7, 20],
                                         trainer = trainer)                              
predictor = estimator.train(training_data = train_ds)

The model summary is

predictor.__dict__["prediction_net"]
pts.model.transformer_tempflow.transformer_tempflow_network.TransformerTempFlowPredictionNetwork(act_type="gelu", cardinality=[7, 20], conditioning_length=200, context_length=48, d_model=32, dequantize=False, dim_feedforward_scale=4, dropout_rate=0.1, embedding_dimension=5, flow_type="RealNVP", hidden_size=100, history_length=60, input_size=401, lags_seq=[1, 12], n_blocks=3, n_hidden=2, num_decoder_layers=3, num_encoder_layers=3, num_heads=8, prediction_length=24, scaling=True, target_dim=133)

I also compared the forecast to some competing models, even though I am not sure that all models are correctly specified (i.e., covariate information, no parameter tuning).

(model comparison plot: model_comp)

Given the strong dependencies between the different series, I would suspect that TransformerTempFlowEstimator should outperform models that treat the series as being independent.

Question

Based on the above summary, I have the following questions concerning the proper use of TransformerTempFlowEstimator:

  • How can covariates be included, in particular categorical information?
  • Does the model automatically include, e.g., month and/or age information that it derives from the data itself, or do we need to pass it using time_features in the function call?
  • Does the model automatically derive holiday information from the data, or do we need to derive it ourselves as described here?
  • Does the model automatically select an appropriate lag structure from the data, or do we need to derive it ourselves as described here?
  • Which of the following field names are currently supported (see the inspection sketch after this list):
 "FieldName.START = 'start'",
 "FieldName.TARGET = 'target'",
 "FieldName.FEAT_STATIC_CAT = 'feat_static_cat'",
 "FieldName.FEAT_STATIC_REAL = 'feat_static_real'",
 "FieldName.FEAT_DYNAMIC_CAT = 'feat_dynamic_cat'",
 "FieldName.FEAT_DYNAMIC_REAL = 'feat_dynamic_real'",
 "FieldName.FEAT_TIME = 'time_feat'",
 "FieldName.FEAT_CONST = 'feat_dynamic_const'",
 "FieldName.FEAT_AGE = 'feat_dynamic_age'",
 "FieldName.OBSERVED_VALUES = 'observed_values'",
 "FieldName.IS_PAD = 'is_pad'",
 "FieldName.FORECAST_START = 'forecast_start'"]

Cannot cast array data from dtype('<M8[ns]') to dtype('float64') according to the rule 'safe'

While trying to forecast predictions and plot the confidence intervals, I received the following error: Cannot cast array data from dtype('<M8[ns]') to dtype('float64') according to the rule 'safe'. I'm not sure what's going on, as in my code I also transformed the index datetime dtype into a float manually. Is this related to some other error in my code and/or setup? My notebook is available here: https://drive.google.com/file/d/1B7kmDmdqY-zYFscL-LyV2nTJd_GGfAJ1/view?usp=sharing. The code uses public data and loads it via a request call, so you should be able to run it as is. There are no external dependencies beyond those used in the tutorial.

This is especially odd to me because I confirmed that the datetime type (datetime64[ns]) is the same for my dataset as for the dataset used in the example before I transform it to a float (it fails either way):

My data

>>> # Assess Index dtype
>>> daily_covid_df.index
DatetimeIndex(['2020-01-22', '2020-01-22', '2020-01-23', '2020-01-23',
               '2020-01-24', '2020-01-24', '2020-01-25', '2020-01-25',
               '2020-01-26', '2020-01-26',
               ...
               '2020-09-11', '2020-09-11', '2020-09-11', '2020-09-11',
               '2020-09-11', '2020-09-11', '2020-09-11', '2020-09-11',
               '2020-09-11', '2020-09-11'],
              dtype='datetime64[ns]', name='date', length=10738, freq=None)

Your data

>>> import pandas as pd
>>> url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
>>> df = pd.read_csv(url, header=0, index_col=0, parse_dates=True)
>>> df.index
DatetimeIndex(['2015-02-26 21:42:53', '2015-02-26 21:47:53',
               '2015-02-26 21:52:53', '2015-02-26 21:57:53',
               '2015-02-26 22:02:53', '2015-02-26 22:07:53',
               '2015-02-26 22:12:53', '2015-02-26 22:17:53',
               '2015-02-26 22:22:53', '2015-02-26 22:27:53',
               ...
               '2015-04-22 20:07:53', '2015-04-22 20:12:53',
               '2015-04-22 20:17:53', '2015-04-22 20:22:53',
               '2015-04-22 20:27:53', '2015-04-22 20:32:53',
               '2015-04-22 20:37:53', '2015-04-22 20:42:53',
               '2015-04-22 20:47:53', '2015-04-22 20:52:53'],
              dtype='datetime64[ns]', name='timestamp', length=15831, freq=None)

Also, I'm not sure if I understand the logic of what is done when the test_data object is created.

# Create Test Split
test_data = ListDataset(
    [{"start": example_ny_df.index[0], "target": example_ny_df.positiveIncrease}],
    freq = FREQ
)

I would have presumed that, because the start is the earliest date in the dataset and the target is the full data up through the last date, the plotting code

for test_entry, forecast in zip(test_data, predictor.predict(test_data)):
    to_pandas(test_entry)[-60:].plot(linewidth=2)
    forecast.plot(color='g', prediction_intervals=[50.0, 90.0])
plt.grid(which='both')

would plot predictions from the start date to the end date; however, in my code it plots from mid-August to the first week of September. Is this because of the -60 in the to_pandas(test_entry)[-60:] part?

Is there documentation available that explains these functions a bit more or should I just reference the code itself?

Thanks for your time and attention!

jit compiled error with DeepAR prediction net

I want to compile a trained DeepAR prediction net with

net = predictor.prediction_net
scripted_module = torch.jit.script(net)

and got the following error:
Compiled functions can't take variable number of arguments or use keyword-only arguments with defaults
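
A hedged alternative rather than a confirmed fix: torch.jit.script rejects networks whose forward uses keyword-only arguments with defaults, but torch.jit.trace records a concrete execution and therefore sidesteps that restriction, at the cost of baking in the control flow taken for the example inputs (which matters for sampling-based prediction nets). The toy module below only demonstrates the difference; tracing the actual prediction net would require a tuple of example tensors matching its forward signature.

import torch
import torch.nn as nn

class Toy(nn.Module):
    # keyword-only argument with a default, like the pts prediction networks
    def forward(self, x, *, scale: float = 1.0):
        return x * scale

net = Toy()
# torch.jit.script(net)                           # fails with the same error as above
traced = torch.jit.trace(net, (torch.ones(3),))   # tracing with example inputs works
print(traced(torch.ones(3)))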

Continue training after making predictor

I would like to perform time series cross-validation on a fairly large dataset,
to see whether the results are consistent over multiple folds.

The way I approach this is to stop training at certain points in time, then predict the next 2 weeks of my data. After this I can continue training (see the figure below).
(figure omitted)

Looking at the source code in pts/model/estimator.py, I can see a function train_model, which returns a trained neural network, and the function I currently use, train, which creates a predictor object.

I'm not sure how to combine these two to achieve the desired result; does anyone have experience with this?
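
A hedged sketch based only on the estimator internals visible in the tracebacks earlier on this page: train_model() accepts training_data and returns an object exposing a predictor, and the Trainer is callable with a network and data iterators, so one possible fold loop keeps that train output around and re-invokes the trainer on the same network later. The trained_net attribute and the construction of a fresh data loader are assumptions about the pts internals, so treat this as a starting point rather than a supported API.

# Per-fold sketch (attribute names are assumptions, not a confirmed API):
train_output = estimator.train_model(training_data=train_fold)   # fit up to the cut-off
predictor = train_output.predictor                                # forecast the next 2 weeks here

# To keep fitting the same network on later data, the callable Trainer could in
# principle be re-invoked directly; a data loader for the new window would have
# to be built the same way train_model builds its training loader:
# estimator.trainer(net=train_output.trained_net,
#                   train_iter=new_training_loader,
#                   validation_iter=None)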

Issue with fourier time-series features at weekly frequency

Hey!

My pandas version is 1.1.0. In pts/feature/fourier_date_feature.py on line 52, pandas.tseries.frequencies.to_offset is used to normalize frequency, but when the freq_str parameter is W this function produces the following:

offset = to_offset('W')
multiple, granularity = offset.n, offset.name

print(granularity)
# this prints 'W-SUN' which is equivalent to 'W'

Because of the assertion on line 66, W-SUN, and thus the initial W, is not accepted. Changing W to W-SUN in the features dictionary (or adding both) fixed the issue for me.
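
A minimal sketch of the normalization described above (not the library's actual patch): stripping the anchor from the offset name makes both W and W-SUN resolve to the same dictionary key.

from pandas.tseries.frequencies import to_offset

offset = to_offset("W")
granularity = offset.name.split("-")[0]     # "W-SUN" -> "W"
print(offset.n, offset.name, granularity)   # 1 W-SUN W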

Thanks for looking into this.

Where to find documentation?

I'm new to this project. Aside from the README, which doesn't cover the entire API surface (e.g., the Temporal Fusion Transformer), how are we supposed to discover the API? Is there more detailed API documentation somewhere?

Thank you.

get_dataset() function failing

The get_dataset() function fails with the call

dataset = get_dataset("pts_m5", regenerate=False)

It fails with the error message:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-6-3674bc0c6fce> in <module>
----> 1 dataset = get_dataset("pts_m5", regenerate=False)

~/code/geo-deep-forecast/.geo-deep-env3.8-v5/lib/python3.8/site-packages/gluonts/dataset/repository/datasets.py in get_dataset(dataset_name, path, regenerate)
    189         dataset obtained by either downloading or reloading from local file.
    190     """
--> 191     dataset_path = materialize_dataset(dataset_name, path, regenerate)
    192 
    193     return load_datasets(

~/code/geo-deep-forecast/.geo-deep-env3.8-v5/lib/python3.8/site-packages/gluonts/dataset/repository/datasets.py in materialize_dataset(dataset_name, path, regenerate)
    142         the path where the dataset is materialized
    143     """
--> 144     assert dataset_name in dataset_recipes.keys(), (
    145         f"{dataset_name} is not present, please choose one from "
    146         f"{dataset_recipes.keys()}."

AssertionError: pts_m5 is not present, please choose one from odict_keys(['constant', 'exchange_rate', 'solar-energy', 'electricity', 'traffic', 'exchange_rate_nips', 'electricity_nips', 'traffic_nips', 'solar_nips', 'wiki-rolling_nips', 'taxi_30min', 'm3_monthly', 'm3_quarterly', 'm3_yearly', 'm3_other', 'm4_hourly', 'm4_daily', 'm4_weekly', 'm4_monthly', 'm4_quarterly', 'm4_yearly', 'm5']).
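
Judging from the assertion message itself, the installed gluonts dataset repository registers this dataset under m5 rather than pts_m5, so requesting one of the listed names should get past the assertion (whether the pts notebook expects a differently preprocessed variant is a separate question).

from gluonts.dataset.repository.datasets import get_dataset

dataset = get_dataset("m5", regenerate=False)   # "m5" is among the listed recipe keys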

Relation to gluonts

First of all, thanks a lot for the interesting paper and for open-sourcing the corresponding model!

I was wondering about the precise relation of this project to GluonTS. In the README you say that this project uses GluonTS for data loading, transformations, etc., but looking at the source code it seems like you essentially ported the existing GluonTS code to PyTorch. So, in that sense, you're using the GluonTS API, and if I have some function (like a transform) written for GluonTS, chances are it is compatible with this project thanks to Python's duck typing?
Is this the correct understanding?

Multivariate Target Dim errors

Hello, I'm trying to use the dataset at https://github.com/smallGum/MLCNN-Multivariate-Time-Series/blob/master/data/nasdaq100_padding.csv to train TransformerTempFlowEstimator, but I keep getting errors related to target_dim. Here I'm using only 2 columns:

df = pd.read_csv ("./data/nasdaq100_padding.csv")

leng = len(df.NDX)

train = int(leng/2)
test = int(leng/2)
prediction_length = 15

training_data1 = ListDataset(
    [{"start":pd.Timestamp(2017, 1, 1, 12) , "target":df.AAPL[:train]},
     {"start":pd.Timestamp(2017, 1, 1, 12) , "target":df.AMZN[:train]}
    ],
    one_dim_target=False,
    freq = "min"
)
device = torch.device("cuda" )
estimator = TransformerTempFlowEstimator (freq="min", 
                            prediction_length=prediction_length,
                            input_size=600,
                            target_dim = 2,                          
                            trainer=Trainer(epochs=15,
                                            #learning_rate = 0.00001,
                                            device=device, 
                                            num_batches_per_epoch=500, 
                                            batch_size=20))
predictor = estimator.train(training_data=training_data1)

I'm getting errors like RuntimeError: Sizes of tensors must match except in dimension 0. Got 20 and 10 (The offending index is 0) (which I can usually get past by changing target_dim, but then I get):
RuntimeError: shape '[-1, 30, 3]' is invalid for input of size 600
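
Not a confirmed diagnosis, but one thing that stands out: with one_dim_target=False a multivariate dataset is usually supplied as a single entry whose target is a 2-D array of shape (target_dim, time), rather than one univariate entry per column. A hedged sketch of that construction (splitting off a test window is omitted):

import numpy as np
import pandas as pd
from gluonts.dataset.common import ListDataset

target = np.stack([df.AAPL[:train].to_numpy(),
                   df.AMZN[:train].to_numpy()])      # shape (2, train)

training_data1 = ListDataset(
    [{"start": pd.Timestamp(2017, 1, 1, 12), "target": target}],
    one_dim_target=False,
    freq="min",
)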
