Comments (10)
Thanks @AlexMRuch, having a look!
from pytorch-ts.
@AlexMRuch thanks for the suggestions! I agree it needs more documentation. So I have pushed an m5 data loader here which shows how one can create a dataset with categorical and dynamic features. It also shows how I created a meta file...
Do you think it's worth having a quick call?
from pytorch-ts.
BTW, to fix the issue in your notebook you can set prediction_length to be equal to the number of extra days in the test dataset (for calculating metrics), or to any length > 1. So for your example do:
# Define DL Time Series Model
estimator = DeepAREstimator(
    freq = FREQ,
    prediction_length = time_range_full - time_range_split,  # or anything > 1
    input_size = 37,
    trainer = Trainer(
        epochs = 10,
        device = DEVICE
    )
)
and then everything else in your notebook should work.
from pytorch-ts.
So the multivariate grouper is just a helper function to make multivariate time series out of the open datasets, which are all univariate. If you prepare the dataset yourself then you can just build it via ListDataset, but now the target will be a 2-D array and you will need to set the flag one_dim_target=False.
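To make the shape requirement concrete, here is a minimal sketch in plain Python (it does not import pytorch-ts). The helper name stack_univariate and the sample values are hypothetical, and it assumes that the 2-D target is laid out as (number of series, number of time steps).

```python
from typing import Dict, List


def stack_univariate(series: List[List[float]], start: str) -> Dict[str, object]:
    """Stack equally long univariate series into one multivariate entry.

    Assumption: the 2-D target has shape (num_series, num_time_steps).
    """
    lengths = {len(s) for s in series}
    if len(lengths) != 1:
        raise ValueError("all series must be non-empty in number and equally long")
    return {"start": start, "target": [list(s) for s in series]}


# Hypothetical data: two daily series with five observations each.
entry = stack_univariate(
    [[1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0]],
    start="2020-01-01",
)
target_dim = len(entry["target"])    # number of series stacked together
num_steps = len(entry["target"][0])  # number of time steps per series
print(target_dim, num_steps)
```

A list of entries like this is what would be handed to ListDataset together with the frequency and one_dim_target=False; the exact constructor arguments should be checked against the installed version.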
from pytorch-ts.
Thanks so much, @kashif! Really appreciate it!
from pytorch-ts.
I just wanted to follow up on my issue above and note that a substantial barrier to more widespread adoption of pytorch-ts may be that there just isn't enough commenting and instruction in your examples, or documentation for the library for that matter.
For example, I am working through the Multivariate-Flow-Solar.ipynb notebook now and the cells within Prepare data set are very uninformative, especially if you're working with a dataset that isn't built into the library (as most people will be, including myself).
Even having comments for things like what MultivariateGrouper does in
train_grouper = MultivariateGrouper(max_target_dim=int(dataset.metadata.feat_static_cat[0].cardinality))
(and how it differs from classes like ListDataset) would be tremendously helpful. However, when setting this up on my own data, I'm not sure how to even get my multivariate data into a pts-dataset form that has the metadata method.
It's very frustrating and really makes it hard for me to justify using the library more (e.g., if it's this hard just to make a dataset, I can be pretty confident that lots of other things are going to be a challenge).
I am more than happy to use this library more and to push contributions; however, the learning curve for getting things even up and running with the presented examples has been a challenge.
Thanks for your patience and help with this. I really am excited to use this library and potentially to help it develop, but right now I'm kind of stuck because unless I go through each of the specific modules to try to tease apart what everything does and how and when, I can only go with what's on the repo so far. It's a little perplexing given how well documented flair is: https://github.com/flairNLP/flair; however, that fact may be why flair has over 9k stars and 400+ users: you can pick it up and run with it in under an hour. Would really love to see pytorch-ts get to that stage too (and maybe help in the process).
from pytorch-ts.
So note that predictor.predict(test_data) will generate forecasts from where test_data ends, for prediction_length time points:
for test_entry, forecast in zip(test_data, predictor.predict(test_data)):
    to_pandas(test_entry)[-40:].plot(linewidth=2)
    forecast.plot(color='g', prediction_intervals=[50.0, 90.0])
plt.grid(which='both')
However, if you want to generate predictions in the test time range you need to use the make_evaluation_predictions helper:
forecast_it, ts_it = make_evaluation_predictions(
    dataset=test_data,  # test dataset
    predictor=predictor,  # predictor
    num_samples=100,  # number of sample paths we want for evaluation
)
forecasts = list(forecast_it)
tss = list(ts_it)
and then you can plot the predictions for each entry in your dataset together with the unseen test data via something like:
def plot_prob_forecasts(ts_entry, forecast_entry):
    plot_length = 50
    prediction_intervals = (50.0, 90.0)
    legend = ["observations", "median prediction"] + [f"{k}% prediction interval" for k in prediction_intervals][::-1]
    fig, ax = plt.subplots(1, 1, figsize=(10, 7))
    ts_entry[-plot_length:].plot(ax=ax)  # plot the time series
    forecast_entry.plot(prediction_intervals=prediction_intervals, color='g')
    plt.grid(which="both")
    plt.legend(legend, loc="upper left")
    plt.show()

plot_prob_forecasts(tss[0], forecasts[0])
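The difference between the two calls can be sketched without the library. This is a simplified illustration, assuming that predictor.predict forecasts the prediction_length steps after the last observed point, and that make_evaluation_predictions holds back the last prediction_length points of each test series and forecasts those instead. The function names are hypothetical; the lengths (157 training days, 40 extra test days) are the ones mentioned in this thread.

```python
from typing import Tuple


def forecast_window(n_obs: int, prediction_length: int) -> Tuple[int, int]:
    """Index range [start, end) forecast when predicting past the last observation."""
    return n_obs, n_obs + prediction_length


def evaluation_window(n_obs: int, prediction_length: int) -> Tuple[int, int]:
    """Index range [start, end) forecast when the last points are held out as ground truth."""
    if prediction_length > n_obs:
        raise ValueError("prediction_length cannot exceed the series length")
    return n_obs - prediction_length, n_obs


train_len = 157                            # days in the training series
test_len = 197                             # training days plus 40 extra days (assumed)
prediction_length = test_len - train_len   # size by which the test set is longer

future = forecast_window(test_len, prediction_length)
held_out = evaluation_window(test_len, prediction_length)
print(prediction_length, future, held_out)
```

Under these assumptions the evaluation window starts exactly where the training series ends, which is why its forecasts can be compared against the unseen test data.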
from pytorch-ts.
Ah, yes! Your note on updating prediction_length fixed it!
I was under the impression that prediction_length was "how far into the future to predict" given the freq parameter, so 1 in my case was meant to imply 1D. So, to clarify: 40 == time_range_full - time_range_split (as of today), and that's 40 days ahead (Oct. 27th, which aligns with your forecast). Was my issue simply that prediction_length needs to be more than 1 day (e.g., 7 or 14 days)? I was just curious about what your logic was for choosing time_range_full - time_range_split. I know you said this is done to set the prediction window "equal to the extra days in the test dataset," but I'm not sure if that's a general suggestion or something specific to pytorch-ts, given that 7 and 14 also work well. (Most of my work is with non-time-series DL, and my longitudinal data analysis background is with mixed effects modeling, so thanks for your thoughts here.)
I'm still unclear about why input_size is 37, given that my training_data object has 157 days and that each day only has one variable so far (a single float). I tried checking the source code for the model (https://github.com/zalandoresearch/pytorch-ts/blob/master/pts/model/deepar/deepar_estimator.py#L36), but it didn't describe it, so I'm not sure if this is a static variable or if I'll have to update it when I rerun this in the future (when more days are in the dataset).
Very helpful details on MultivariateGrouper and setting one_dim_target=False, thanks!
I can't even begin to thank you for the advice and examples on make_evaluation_predictions and plot_prob_forecasts. Seriously, thank you.
Is there a way, that you know of offhand, to plot both the make_evaluation_predictions and the predictor.predict() predictions in the same plot, so that evaluation and forecasting are shown together?
Also, how can we get clear evaluation accuracy reports (e.g., RMSE, etc.) for the evaluation above? Is there a built-in pytorch-ts method?
Thanks again, @kashif!
I am going to give the multivariate and m5 notebooks a try next week (I can either post new issues should I hit them, or I can post them here since you laid out some tips above), but would love to set up a time to chat with you in the near future about the library. I followed you on Twitter and can email you as well if you'd prefer.
from pytorch-ts.
Cool @AlexMRuch, let's try to catch up next week. So to answer some of your questions here:
Yes, so prediction_length needs to be more than 1, and I chose it to be the number of extra days in your test set so that I can compare the forecasts to the ground truth. I could have set the prediction length to another number, but then the test dataset would not be of much use. So if you want to obtain metrics on the prediction compared to the ground-truth test set, then it's a good idea to set prediction_length to the number of time steps by which the test set is longer than the training set. Otherwise, you typically set it to the number of time steps you would like to forecast for a specific problem or dataset.
So input_size is the size of the feature vector, which consists of the 1-dim target (as you correctly stated) together with other covariates, such as time-varying features like the encoding of the current time point, lag features, embeddings of a particular time series, etc. So for your current experiment, even though the target is 1-dim at every time point, the other features end up giving you a total feature size of 37. This is a bit clunky in pytorch... I should be able to calculate it from the parameters of the dataset and other parameters of the model, but I haven't gotten around to it.
So to get the metrics for uni-variate models you do:
from pts.evaluation import Evaluator
import json
evaluator = Evaluator()
agg_metrics, item_metrics = evaluator(iter(tss), iter(forecasts), num_series=len(test_data))
and that returns metrics aggregated over all the time series of your dataset as well as over individual ones in the tuple above, e.g.
print(json.dumps(agg_metrics, indent=4))
{
    "MSE": 13505.071875,
    "abs_error": 3810.694580078125,
    "abs_target_sum": 26724.0,
    "abs_target_mean": 668.1,
    "seasonal_error": 492.5,
    "MASE": 0.19343627310041245,
    "MAPE": 0.1368498585666218,
    "sMAPE": 0.14887559716391757,
    "OWA": NaN,
    "MSIS": 2.412104219543147,
    "QuantileLoss[0.1]": 1330.3681396484376,
    "Coverage[0.1]": 0.075,
    "QuantileLoss[0.2]": 2218.410595703125,
    "Coverage[0.2]": 0.075,
    "QuantileLoss[0.3]": 2886.637915039062,
    "Coverage[0.3]": 0.15,
    "QuantileLoss[0.4]": 3453.1165405273437,
    "Coverage[0.4]": 0.15,
    "QuantileLoss[0.5]": 3810.694549560547,
    "Coverage[0.5]": 0.2,
    "QuantileLoss[0.6]": 4070.61484375,
    "Coverage[0.6]": 0.2,
    "QuantileLoss[0.7]": 4184.155157470703,
    "Coverage[0.7]": 0.275,
    "QuantileLoss[0.8]": 3994.054711914063,
    "Coverage[0.8]": 0.35,
    "QuantileLoss[0.9]": 3354.4697082519533,
    "Coverage[0.9]": 0.425,
    "RMSE": 116.21132421154145,
    "NRMSE": 0.17394300884828837,
    "ND": 0.1425944686453422,
    "wQuantileLoss[0.1]": 0.04978177442180952,
    "wQuantileLoss[0.2]": 0.08301192170719672,
    "wQuantileLoss[0.3]": 0.10801668593919556,
    "wQuantileLoss[0.4]": 0.12921406004068792,
    "wQuantileLoss[0.5]": 0.14259446750338822,
    "wQuantileLoss[0.6]": 0.15232056742067057,
    "wQuantileLoss[0.7]": 0.15656919463668248,
    "wQuantileLoss[0.8]": 0.14945572189470374,
    "wQuantileLoss[0.9]": 0.12552274016808687,
    "mean_wQuantileLoss": 0.12183190374804684,
    "MAE_Coverage": 0.2888888888888889
}
hope that helps!
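Several of the aggregate metrics are simple functions of the others, which can help when reading the report. The sketch below recomputes a few of them from the values printed above. The relationships it uses (RMSE as the square root of MSE, NRMSE as RMSE over abs_target_mean, ND as abs_error over abs_target_sum, wQuantileLoss as QuantileLoss over abs_target_sum, and MAE_Coverage as the mean absolute gap between each coverage and its quantile level) are inferred from these numbers, not taken from the pytorch-ts documentation.

```python
import math

# Values copied from the agg_metrics output above.
mse = 13505.071875
abs_error = 3810.694580078125
abs_target_sum = 26724.0
abs_target_mean = 668.1
quantile_loss = {
    0.1: 1330.3681396484376,
    0.5: 3810.694549560547,
    0.9: 3354.4697082519533,
}
coverage = {
    0.1: 0.075, 0.2: 0.075, 0.3: 0.15, 0.4: 0.15, 0.5: 0.2,
    0.6: 0.2, 0.7: 0.275, 0.8: 0.35, 0.9: 0.425,
}

rmse = math.sqrt(mse)                  # root of the mean squared error
nrmse = rmse / abs_target_mean         # RMSE scaled by the mean absolute target
nd = abs_error / abs_target_sum        # normalized deviation
w_quantile_loss = {q: loss / abs_target_sum for q, loss in quantile_loss.items()}
# Mean absolute gap between observed coverage and the nominal quantile level.
mae_coverage = sum(abs(c - q) for q, c in coverage.items()) / len(coverage)

print(f"RMSE={rmse:.4f} NRMSE={nrmse:.4f} ND={nd:.4f} MAE_Coverage={mae_coverage:.4f}")
```

Read this way, the coverage values show that the predicted quantiles sit too low in this run: only 42.5% of the observations fall below the 0.9 quantile.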
from pytorch-ts.