business-science / modeltime.resample


Resampling Tools for Time Series Forecasting with Modeltime

Home Page: https://business-science.github.io/modeltime.resample/

License: Other

Languages: R 82.73%, CSS 17.27%

Topics: resampling, time-series, cross-validation, accuracy-metrics, tidymodels, modeltime, modeltime-resample, backtesting, forecasting, bootstrap

modeltime.resample's Introduction


Model Performance and Stability Assessment Tools for Single Time Series, Panel Data, & Cross-Sectional Time Series Analysis

A modeltime extension that implements forecast resampling tools that assess time-based model performance and stability for a single time series, panel data, and cross-sectional time series analysis.

Installation

CRAN version:

install.packages("modeltime.resample")

Development version (latest features):

remotes::install_github("business-science/modeltime.resample")

Why Modeltime Resample?

Resampling time series is an important strategy for evaluating the stability of models over time. However, it's a pain to do because it requires multiple for-loops to generate the predictions for multiple models and, potentially, multiple time series groups. Modeltime Resample simplifies this iterative forecasting process.

Modeltime Resample makes it easy to:

  1. Iteratively generate predictions from time series cross-validation plans.
  2. Evaluate the resample predictions to compare many time series models across multiple time series windows (see the sketch below).
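A minimal sketch of this workflow, following the package vignettes (here `models_tbl` stands in for an existing modeltime table of fitted models, and the m750 dataset and cross-validation settings are illustrative choices):

library(modeltime)
library(modeltime.resample)
library(timetk)
library(tidymodels)

# 1. Build a time series cross-validation plan (timetk)
resamples_tscv <- m750 %>%
    time_series_cv(
        assess      = "2 years",
        initial     = "5 years",
        skip        = "2 years",
        slice_limit = 6
    )

# 2. Refit every model in the modeltime table on each slice and
#    collect the out-of-sample predictions
resamples_fitted <- models_tbl %>%
    modeltime_fit_resamples(
        resamples = resamples_tscv,
        control   = control_resamples(verbose = FALSE)
    )

# 3. Compare models across the resample windows
resamples_fitted %>% modeltime_resample_accuracy(summary_fns = mean)
resamples_fitted %>% plot_modeltime_resamples(.interactive = FALSE)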

Here is an example from Resampling Panel Data, where we can see that the Prophet Boost and XGBoost models outperform Prophet with Regressors on the Walmart time series panel dataset, using a 6-slice time series cross-validation plan.

Model Accuracy for 6 Time Series Resamples

Resampled Model Accuracy (3 Models, 6 Resamples, 7 Time Series Groups)

Getting Started

  1. Getting Started with Modeltime: Learn the basics of forecasting with Modeltime.
  2. Resampling a Single Time Series: Learn the basics of time series resample evaluation.
  3. Resampling Panel Data: An advanced tutorial on resample evaluation with multiple time series groups (panel data).

Meet the modeltime ecosystem

Learn about a growing ecosystem of forecasting packages.

The modeltime ecosystem is growing

Modeltime is part of a growing ecosystem of forecasting packages.

Take the High-Performance Forecasting Course

Become the forecasting expert for your organization

High-Performance Time Series Forecasting Course

Time Series is Changing

Time series is changing. Businesses now need 10,000+ time series forecasts every day. This is what I call a High-Performance Time Series Forecasting System (HPTSF) - Accurate, Robust, and Scalable Forecasting.

High-Performance Forecasting Systems will save companies by improving accuracy and scalability. Imagine what will happen to your career if you can provide your organization a “High-Performance Time Series Forecasting System” (HPTSF System).

How to Learn High-Performance Time Series Forecasting

I teach how to build an HPTSF System in my High-Performance Time Series Forecasting Course. You will learn:

  • Time Series Machine Learning (cutting-edge) with Modeltime - 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)
  • Deep Learning with GluonTS (Competition Winners)
  • Time Series Preprocessing, Noise Reduction, & Anomaly Detection
  • Feature engineering using lagged variables & external regressors
  • Hyperparameter Tuning
  • Time series cross-validation
  • Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
  • Scalable Forecasting - Forecast 1000+ time series in parallel
  • and more.

Become the Time Series Expert for your organization.


Take the High-Performance Time Series Forecasting Course

modeltime.resample's People

Contributors

albertoalmuinha, mdancho84, olivroy


modeltime.resample's Issues

modeltime_fit_resamples() automatically 'lags' the .row column

The out-of-sample predictions generated by modeltime_fit_resamples() incorrectly show the predicted variable as if it were lagged.

When I run the following code, it outputs out-of-sample projections, as expected:

  submodels_resamples_tscv_tbl <- submodels_tbl %>% 
    modeltime_fit_resamples(
      resamples    = cv_resamples
    )

However, a closer inspection of the output provided by modeltime_fit_resamples() shows that the predicted variable is not indexed by the same ".row" (id column) as it was inside "cv_resamples".


# Input received by modeltime_fit_resamples()
input <- cv_resamples %>%
      filter(id == "Slice01") %>% 
      pull(splits) %>%
      first() %>% 
      training() %>% 
      select(.row, value)

input %>% filter(.row >= 1816, .row <= 1825 ) %>% print()

[screenshot: input rows 1816-1825]

# Output generated by modeltime_fit_resamples()

output <- submodels_resamples_tscv_tbl %>% 
    select(.model_desc, .resample_results) %>% 
    unnest(.resample_results) %>% 
    select(.model_desc, id, .predictions) %>% 
    unnest(.predictions)


output %>% select(.row, value) %>%  filter(.row >= 1816, .row <= 1825 ) %>% print()

[screenshot: output rows 1816-1825]

If we plot these variables, we can clearly see that the test set was lagged by a few days.
The first slice inside 'cv_resamples' includes both the training and the test set (the 'input' series).
The output includes the out-of-sample projections for the test set, but we ignore those and plot only the realized values of the target variable (the 'output' series).
I include the code below just for completeness.

ggplot() +
      geom_line(data = input,  aes(x = .row, y = value, color = "input")) +
      geom_line(data = output, aes(x = .row, y = value, color = "output"))

[screenshot: plot comparing the input and output series, showing the lag]

Could you clarify how the rolling root mean squared error (RMSE) is calculated?

Suppose that I have two slices, say [1, 2, 3, 4] and [2, 3, 4, 5], and I want to make a two-period-ahead forecast in each slice. For slice 1, I use the first two observations [1, 2] to train a model that is then used to forecast the last two observations [3, 4]. Similarly, for slice 2, I use [2, 3] to train a model that is then used to forecast [4, 5]. Let's assume that I obtain [3.5, 4.5] as the forecasts of [3, 4] in slice 1 and [4.1, 5.1] as the forecasts of [4, 5] in slice 2.

I then proceed to calculate the RMSE for slice 1 as $\sqrt{\left((3.5 - 3)^2 + (4.5 - 4)^2\right)/2}$ and the RMSE for slice 2 as $\sqrt{\left((4.1 - 4)^2 + (5.1 - 5)^2\right)/2}$. Is this how the rolling RMSE is calculated in modeltime?
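For reference, the per-slice arithmetic described in the question can be checked directly with yardstick's vector interface (a sketch; the numbers are the hypothetical forecasts above, and whether modeltime.resample then summarizes these per-slice values is the question for the maintainers):

library(yardstick)

# Slice 1: forecasts 3.5, 4.5 against actuals 3, 4
rmse_vec(truth = c(3, 4), estimate = c(3.5, 4.5))   # 0.5

# Slice 2: forecasts 4.1, 5.1 against actuals 4, 5
rmse_vec(truth = c(4, 5), estimate = c(4.1, 5.1))   # ~0.1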

Thank you!

Local Behavior in Resample Accuracy

Hi @mdancho84 ,

Now that modeltime can break down accuracy by local models, I think it would be nice to do the same for plot_modeltime_resamples() and modeltime_resample_accuracy() in this package. When using panel data, the results presented are those of the global models, but it would be interesting to break them down by local models as well.

Let me know what you think so we can discuss.

Regards,

Request to add resampling to nested forecasting

Hi Matt,

Requesting that you add k-fold and time-series-fold resampling to nested forecasting. I am starting to use nested forecasting a lot, but I get poor results when the most recent couple of months (my test set) are a little different from the rest of the time series.

Thanks!

modeltime_resample_accuracy with 1 observation test sets

Hi @mdancho84,

I cannot for the life of me get the function modeltime_resample_accuracy to work when using 1 observation in each test set.

When running:

resamples = time_series_cv(
  data = score_df,
  initial = "180 days",
  assess = "1 days"
)

... models ... 

resamples_fitted = model_table %>% 
  modeltime_fit_resamples(
    resamples = resamples,
    control = control_resamples(verbose = FALSE)
  )

resamples_fitted %>%
  modeltime_resample_accuracy(summary_fns = mean) %>%
  table_modeltime_accuracy(.interactive = FALSE)

I get the following error:

Error: In metric: `mase`
Problem with `summarise()` column `.estimate`.
i `.estimate = metric_fn(...)`.
x `truth` must have a length greater than `m` to compute the out-of-sample naive mean absolute error.
i The error occurred in group 1: .model_id = 1, .model_desc = "ARIMA(2,0,2)(1,1,2)[7] WITH DRIFT", .resample_id = "Slice01", .type = "Resamples".

Using 2 days instantly fixes the problem, but this is not what I want. I want to judge the average day-ahead performance of each model, e.g. over 30 days.
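One possible workaround (a sketch, not a confirmed fix) is to drop mase from the metric set, since MASE needs more than `m` test observations, while metrics such as MAE, RMSE, and SMAPE can be computed on a single-observation assessment set. This assumes the metric_set argument that modeltime_resample_accuracy() accepts (it is also passed in another issue further down this page):

library(yardstick)

resamples_fitted %>%
    modeltime_resample_accuracy(
        summary_fns = mean,
        metric_set  = metric_set(mae, rmse, smape)   # omit mase (and rsq) for 1-observation test sets
    ) %>%
    table_modeltime_accuracy(.interactive = FALSE)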

Resample Fails with Parsnip Fit Model That Uses Inline Functions

Full Example: https://github.com/wtmoreland3/reprex/blob/3915d45d424135d8ca27c8c607ffc9da70aa89bf/modeltime_resample_error

Models that have inline functions fail. Example:

model_fit_lm <- linear_reg() %>%
    set_engine("lm") %>%
    fit(
        value ~ as.numeric(date) + month(date, label = TRUE), 
        data = training(splits)
    )

Causes this error:

## > model_tbl_tuned_resamples <- submodels_tbl %>%
## +     slice(c(5)) %>%
## +     modeltime_fit_resamples(
## +         resamples = resamples_tscv,
## +         control   = control_resamples(verbose = TRUE, allow_par = TRUE)
## +     )
## -- Fitting Resamples --------------------------------------------
## 
## * Model ID: 5 LM
## Error: No in-line functions should be used here; use steps to define baking actions.
## 0.03 sec elapsed
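The error appears to come from the recipes machinery, which cannot replay inline formula functions during resampling. A hedged workaround is to move the transformations into recipe steps and wrap the model in a workflow; a sketch, assuming the splits, value, and date objects from the linked reprex:

library(tidymodels)

# Replace the inline as.numeric(date) and month(date, label = TRUE) with recipe steps
recipe_lm <- recipe(value ~ date, data = training(splits)) %>%
    step_mutate(date_num = as.numeric(date)) %>%
    step_date(date, features = "month") %>%
    step_rm(date)

wflw_fit_lm <- workflow() %>%
    add_model(linear_reg() %>% set_engine("lm")) %>%
    add_recipe(recipe_lm) %>%
    fit(training(splits))

# wflw_fit_lm can then be added to the modeltime table and resampled as usual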

Not possible to use ensemble

I was going to use an ensemble as part of my time series cross-validation. This did not work. Here is my code.

# Create average ensemble and add to the modeltime table
ml_mtbl <- ml_mtbl %>% 
    combine_modeltime_tables(
        ml_mtbl %>% 
            ensemble_average() %>% 
            modeltime_table()
    )
    
# TS CV
resamples_tscv <- time_series_cv(
    data        = train_data,
    assess      = "11 days",
    initial     = "730 days",
    skip        = 11,
    slice_limit = 20,
    cumulative = TRUE
    )
resamples_fitted <- ml_mtbl %>% 
    modeltime_fit_resamples(
        resamples = resamples_tscv,
        control   = control_resamples(verbose = FALSE, allow_par = TRUE)
    )

The results:

> resamples_fitted
# Modeltime Table
# A tibble: 6 x 4
  .model_id .model         .model_desc               .resample_results
      <int> <list>         <chr>                     <list>           
1         1 <workflow>     XGBOOST                   <rsmp[+]>        
2         2 <workflow>     RANGER                    <rsmp[+]>        
3         3 <workflow>     GLMNET                    <rsmp[+]>        
4         4 <workflow>     KERNLAB                   <rsmp[+]>        
5         5 <workflow>     KERNLAB                   <rsmp[+]>        
6         6 <ensemble [5]> ENSEMBLE (MEAN): 5 MODELS <lgl [1]>        

So when I want to check the accuracy, I only get the accuracy for the individual models, not the ensemble. The code below gives me a tibble with all five models, but not the ensemble.

resamples_fitted %>%
    modeltime_resample_accuracy()
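The <lgl [1]> entry suggests the ensemble row was skipped during resampling. One untested way to approximate the mean ensemble's resample accuracy is to average the sub-models' out-of-sample predictions per slice and row, then score the averaged predictions yourself. A sketch, assuming the outcome column in .predictions is named value (as in the other issues on this page):

library(dplyr)
library(tidyr)

ensemble_resample_preds <- resamples_fitted %>%
    filter(.model_id %in% 1:5) %>%                  # sub-models only
    select(.model_id, .resample_results) %>%
    unnest(.resample_results) %>%
    select(.model_id, id, .predictions) %>%
    unnest(.predictions) %>%
    group_by(id, .row) %>%
    summarise(
        value = first(value),   # actuals are identical across models
        .pred = mean(.pred),    # mean-ensemble prediction
        .groups = "drop"
    )

# Per-slice RMSE of the approximated mean ensemble
ensemble_resample_preds %>%
    group_by(id) %>%
    summarise(rmse = yardstick::rmse_vec(value, .pred), .groups = "drop")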

Ability to combine resample tables

Shafi Qureshi:

Hello Matt Dancho, I am using 15 different models: LightGBM, CatBoost, XGBoost, and others. The problem is that when I run all of the models at once with

resamples_fitted <- submodels_1_tbl %>%
    modeltime_fit_resamples(
        resamples = resample_spec,
        control   = control_resamples(verbose = TRUE, allow_par = TRUE)
    )

RStudio crashes. So I am running the LightGBM, CatBoost, and XGBoost models in separate resample fits, which leaves me with resamples_fitted_1, resamples_fitted_2, and resamples_fitted_3. I want to combine all three so they can be used for a stacked ensemble. Is there a way to combine the different resample fits (resamples_fitted) together, similar to combine_modeltime_tables()?

NA Values in modeltime_resample_accuracy

Hi @mdancho84,

I am running the ?modeltime_resample_accuracy example and I am getting all NA values. I have also realized that this was not happening before: in the article linked below it says "From the table below, ARIMA has a 6% lower RMSE", but the table now appears with all NA values:

https://business-science.github.io/modeltime.resample/articles/getting-started.html#accuracy-table-1

I also realized that this problem can be worked around using purrr::partial, but I think it may be better to limit the summarization functions and have the function handle NA values internally (not sure, I'm just thinking out loud).

m750_training_resamples_fitted %>%
    modeltime_resample_accuracy(summary_fns = purrr::partial(mean, na.rm = TRUE))

Finally, I would unify the output so we don't have so many columns; in this example we get 18 columns per metric when we actually have only 6 resamples. Couldn't the output have at most one column per metric per resample?

library(modeltime)
library(modeltime.resample)
library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.4
#> Warning: package 'tidyr' was built under R version 4.0.4
#> Warning: package 'dplyr' was built under R version 4.0.4
#> Warning: package 'forcats' was built under R version 4.0.4

# Mean (Default)
m750_training_resamples_fitted %>%
    modeltime_resample_accuracy() %>%
    glimpse()
#> Rows: 3
#> Columns: 112
#> $ .model_id   <int> 1, 2, 3
#> $ .model_desc <chr> "ARIMA(0,1,1)(0,1,1)[12]", "PROPHET", "GLMNET"
#> $ .type       <chr> "Resamples", "Resamples", "Resamples"
#> $ n           <int> 6, 6, 6
#> $ mae         <dbl> NA, NA, NA
#> $ mae_1       <dbl> NA, NA, NA
#> ... (remaining metric columns, mae_2 through rsq_17, all NA and omitted here)

# Mean and Standard Deviation
m750_training_resamples_fitted %>%
    modeltime_resample_accuracy(
        summary_fns = list(mean = mean, sd = sd)
    ) %>% glimpse()
#> Rows: 3
#> Columns: 220
#> $ .model_id     <int> 1, 2, 3
#> $ .model_desc   <chr> "ARIMA(0,1,1)(0,1,1)[12]", "PROPHET", "GLMNET"
#> $ .type         <chr> "Resamples", "Resamples", "Resamples"
#> $ n             <int> 6, 6, 6
#> $ mae_mean      <dbl> NA, NA, NA
#> $ mae_sd        <dbl> NA, NA, NA
#> ... (remaining *_mean and *_sd metric columns, mae_1_mean through rsq_17_sd, all NA and omitted here)

sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19041)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
#> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
#> [5] LC_TIME=Spanish_Spain.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] forcats_0.5.1                 stringr_1.4.0                
#>  [3] dplyr_1.0.5                   purrr_0.3.4                  
#>  [5] readr_1.4.0                   tidyr_1.1.3                  
#>  [7] tibble_3.1.0                  ggplot2_3.3.3                
#>  [9] tidyverse_1.3.0               modeltime.resample_0.1.0.9000
#> [11] modeltime_0.4.1.9000         
#> 
#> loaded via a namespace (and not attached):
#>  [1] fs_1.5.0             xts_0.12.1           lubridate_1.7.10    
#>  [4] httr_1.4.2           DiceDesign_1.9       tools_4.0.3         
#>  [7] backports_1.2.1      utf8_1.1.4           R6_2.5.0            
#> [10] rpart_4.1-15         DBI_1.1.0            colorspace_2.0-0    
#> [13] yardstick_0.0.7      nnet_7.3-14          withr_2.4.1         
#> [16] tidyselect_1.1.0     compiler_4.0.3       rvest_0.3.6         
#> [19] cli_2.3.1            xml2_1.3.2           scales_1.1.1        
#> [22] tune_0.1.3           digest_0.6.27        StanHeaders_2.21.0-7
#> [25] rmarkdown_2.7        pkgconfig_2.0.3      htmltools_0.5.1.1   
#> [28] parallelly_1.23.0    lhs_1.1.1            dbplyr_2.0.0        
#> [31] highr_0.8            readxl_1.3.1         rlang_0.4.10        
#> [34] rstudioapi_0.13      generics_0.1.0       jsonlite_1.7.2      
#> [37] zoo_1.8-9            magrittr_2.0.1       Matrix_1.2-18       
#> [40] Rcpp_1.0.6           munsell_0.5.0        fansi_0.4.2         
#> [43] GPfit_1.0-8          lifecycle_1.0.0      furrr_0.2.2         
#> [46] stringi_1.5.3        pROC_1.17.0.1        yaml_2.2.1          
#> [49] MASS_7.3-53          plyr_1.8.6           recipes_0.1.15      
#> [52] grid_4.0.3           parallel_4.0.3       listenv_0.8.0       
#> [55] crayon_1.4.1         lattice_0.20-41      haven_2.3.1         
#> [58] splines_4.0.3        hms_1.0.0            knitr_1.30          
#> [61] ps_1.6.0             pillar_1.5.1         dials_0.0.9         
#> [64] codetools_0.2-16     parsnip_0.1.5        timetk_2.6.1        
#> [67] reprex_1.0.0         glue_1.4.2           evaluate_0.14       
#> [70] rsample_0.0.9        modelr_0.1.8         RcppParallel_5.0.3  
#> [73] vctrs_0.3.6          foreach_1.5.1        cellranger_1.1.0    
#> [76] gtable_0.3.0         future_1.21.0        assertthat_0.2.1    
#> [79] xfun_0.21            gower_0.2.2          prodlim_2019.11.13  
#> [82] broom_0.7.2          class_7.3-17         survival_3.2-7      
#> [85] timeDate_3043.102    iterators_1.0.13     hardhat_0.1.5       
#> [88] lava_1.6.9           workflows_0.2.2      globals_0.14.0      
#> [91] ellipsis_0.3.1       ipred_0.9-10
Created on 2021-03-13 by the reprex package (v1.0.0)

add .resample_id to modeltime_resample_accuracy output if summary_fns = NULL

modeltime_resample_accuracy() returns the raw output of the resample fits when summary_fns = NULL. However, the output is returned without .resample_id, which means it is not possible to check the metrics for a specific resample id.

It would be valuable to have .resample_id available when calling modeltime_resample_accuracy(..., summary_fns = NULL).

I don't mind dropping a PR for this, but unclear when I'll have time.
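In the meantime, a hedged workaround is to bypass modeltime_resample_accuracy() and unnest the raw resample results directly. A sketch, where resamples_fitted stands for a table returned by modeltime_fit_resamples(), and the .metrics column is assumed from tune's standard resample output:

library(dplyr)
library(tidyr)

resamples_fitted %>%
    select(.model_id, .model_desc, .resample_results) %>%
    unnest(.resample_results) %>%                        # one row per resample slice
    select(.model_id, .model_desc, .resample_id = id, .metrics) %>%
    unnest(.metrics)                                     # .metric, .estimator, .estimate per slice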

Display resampling accuracy metrics by ID: is it possible to see the results of the resampling folds by ID, that is, for each predicted time series, using modeltime_resample_accuracy()?

Dear @mdancho84, @AlbertoAlmuinha,

Is there a possibility of viewing the metrics of the resampling forecast by ID? I saw that summary_fns = NULL (#1, #3) provides the results by resampling fold, but they are global results. I think it would be very valuable to be able to access the local metrics for each ID as well as the results per resampling fold. I saw that #4 already requested something similar. If this is already possible and you can help me visualize these results, I would appreciate it. The scripts follow:

Link to download the database used: https://github.com/forecastingEDs/Forecasting-of-admissions-in-the-emergency-departments/blob/131bd23723a39724ad4f88ad6b8e5a58f42a7960/datasets.xlsx

data_tbl <- datasets %>%
  select(id, Date, attendences, average_temperature, min, max,  sunday, monday, tuesday, wednesday, thursday, friday, saturday, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec) %>%
  set_names(c("id", "date", "value","tempe_verage", "tempemin", "tempemax", "sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))

data_tbl

# Full = Training + Forecast Datasets ----

full_data_tbl <- datasets %>%
  select(id, Date, attendences, average_temperature, min, max, sunday, monday, tuesday, wednesday, thursday, friday, saturday, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec) %>%
  set_names(c("id", "date", "value", "tempe_verage", "tempemin", "tempemax", "sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")) %>%

  # Apply Group-wise Time Series Manipulations
  group_by(id) %>%
  future_frame(
    .date_var   = date,
    .length_out = "3 days",
    .bind_data  = TRUE
  ) %>%
  ungroup() %>%

  # Consolidate IDs
  mutate(id = fct_drop(id))

# Training Data
data_prepared_tbl <- full_data_tbl %>%
  filter(!is.na(value))

# Forecast Data
future_tbl <- full_data_tbl %>%
  filter(is.na(value))

emergency_tscv <- data_prepared_tbl %>%
  time_series_cv(
    date_var    = date,
    assess      = "3 days",
    skip        = "30 days",
    cumulative  = TRUE,
    slice_limit = 5
  )

emergency_tscv

# test data preprocessing for ML ----

recipe_spec <- recipe(value ~ .,
                      data = training(emergency_tscv$splits[[1]])) %>%
  step_timeseries_signature(date) %>%
  step_rm(matches("(.iso$)|(.xts$)|(hour)|(minute)|(second)|(am.pm)")) %>%
  step_mutate(data = factor(value, ordered = TRUE)) %>%
  step_dummy(all_nominal(), one_hot = TRUE) %>%
  step_normalize(date_index.num, tempe_verage, tempemin, tempemax, date_year, -all_outcomes())

# Model 1: Xgboost ----

wflw_fit_xgboost <- workflow() %>%
  add_model(
    boost_tree("regression") %>% set_engine("xgboost") 
  ) %>%
  add_recipe(recipe_spec %>% step_rm(date)) %>%
  fit(training(emergency_tscv$splits[[1]]))

# Model 2: LightGBM ----

wflw_fit_lightgbm <- workflow() %>%
  add_model(
    boost_tree("regression") %>% set_engine("lightgbm")
  ) %>%
  add_recipe(recipe_spec %>% step_rm(date)) %>%
  fit(training(emergency_tscv$splits[[1]]))

# ---- MODELTIME TABLE ----

model_tbl <- modeltime_table(
  wflw_fit_xgboost,
  wflw_fit_lightgbm
)

model_tbl
resample_results <- model_tbl %>%
  modeltime_fit_resamples(
    resamples = emergency_tscv,
    control   = control_resamples(allow_par = TRUE, verbose = TRUE)
  )

resample_results

At this step I need the results by ID, but I only get the global results by resampling fold. Can you help me?

resample_results %>%
  modeltime_resample_accuracy(summary_fns = NULL, yardstick::metric_set(mape, smape, mase, rmse)) %>%
  table_modeltime_accuracy(.interactive = FALSE)

[screenshot: accuracy table showing global results per resample fold]
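Until per-ID support is added, a hedged workaround is to unnest the resample predictions and join the panel id back via .row (which indexes the rows of data_prepared_tbl), then score each series yourself. A sketch that assumes the .predictions column layout shown in the other issues on this page:

library(dplyr)
library(tidyr)

per_id_accuracy <- resample_results %>%
    select(.model_id, .model_desc, .resample_results) %>%
    unnest(.resample_results) %>%
    select(.model_id, .model_desc, .resample_id = id, .predictions) %>%
    unnest(.predictions) %>%
    # .row indexes rows of data_prepared_tbl, so join it back to recover the series id
    left_join(
        data_prepared_tbl %>% mutate(.row = row_number()) %>% select(.row, id),
        by = ".row"
    ) %>%
    group_by(.model_desc, .resample_id, id) %>%
    summarise(
        rmse = yardstick::rmse_vec(value, .pred),
        mae  = yardstick::mae_vec(value, .pred),
        .groups = "drop"
    )

per_id_accuracy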

resampling with external regressors with smooth_es

I tried to run the following code:

model_fit_exp_smooth <- exp_smoothing(
    mode            = "regression",
    seasonal_period = 12,
    error           = "additive",
    trend           = "additive",
    season          = "additive"
) %>%
    set_engine(engine = "smooth_es") %>%
    fit(value ~ date + moy + VIXRQBIN + DFFD, data = initial_training)

my_models_tbl <- modeltime_table(
    model_fit_exp_smooth,
    model_fit_lm
)

resamples_fitted <- my_models_tbl %>%
    modeltime_fit_resamples(
        resamples = resamples_tscv,
        control   = control_resamples(allow_par = TRUE, parallel_over = NULL, verbose = TRUE)
    )
── Fitting Resamples ────────────────────────────────────────────

• Model ID: 1 ETSX(AAA)
i Slice01: preprocessor 1/1
✓ Slice01: preprocessor 1/1
i Slice01: preprocessor 1/1, model 1/1
✓ Slice01: preprocessor 1/1, model 1/1
i Slice01: preprocessor 1/1, model 1/1 (extracts)
i Slice01: preprocessor 1/1, model 1/1 (predictions)
! Slice01: preprocessor 1/1, model 1/1 (predictions): The newdata is not provided.Predicting the explanatory variables based on what I have in-sample., Only additive model...
i Slice02: preprocessor 1/1
✓ Slice02: preprocessor 1/1
i Slice02: preprocessor 1/1, model 1/1
✓ Slice02: preprocessor 1/1, model 1/1
i Slice02: preprocessor 1/1, model 1/1 (extracts)
i Slice02: preprocessor 1/1, model 1/1 (predictions)
! Slice02: preprocessor 1/1, model 1/1 (predictions): The newdata is not provided.Predicting the explanatory variables based on what I have in-sample., Only additive model...
i Slice03: preprocessor 1/1
✓ Slice03: preprocessor 1/1
i Slice03: preprocessor 1/1, model 1/1
✓ Slice03: preprocessor 1/1, model 1/1
i Slice03: preprocessor 1/1, model 1/1 (extracts)
i Slice03: preprocessor 1/1, model 1/1 (predictions)
! Slice03: preprocessor 1/1, model 1/1 (predictions): The newdata is not provided.Predicting the explanatory variables based on what I have in-sample., Only additive model...

Does the warning "The newdata is not provided.Predicting the explanatory variables based on what I have in-sample., Only additive model..." mean that my external regressors for smooth_es are not used at all in modeltime_fit_resamples(), or does it mean that the external regressors are taken from the in-sample data of the slice being processed?
