business-science / modeltime.resample

Resampling Tools for Time Series Forecasting with Modeltime

Home Page: https://business-science.github.io/modeltime.resample/

License: Other

R 82.73% CSS 17.27%

resampling time-series cross-validation accuracy-metrics tidymodels modeltime modeltime-resample backtesting forecasting bootstrap bootstrapping statistics r-package

modeltime.resample's Issues

Resample Fails with Parsnip Fit Model that uses Inline Functions

Full Example: https://github.com/wtmoreland3/reprex/blob/3915d45d424135d8ca27c8c607ffc9da70aa89bf/modeltime_resample_error

Models that have inline functions fail. Example:

model_fit_lm <- linear_reg() %>%
    set_engine("lm") %>%
    fit(
        value ~ as.numeric(date) + month(date, label = TRUE), 
        data = training(splits)
    )

Causes this error:

## > model_tbl_tuned_resamples <- submodels_tbl %>%
## +     slice(c(5)) %>%
## +     modeltime_fit_resamples(
## +         resamples = resamples_tscv,
## +         control   = control_resamples(verbose = TRUE, allow_par = TRUE)
## +     )
## -- Fitting Resamples --------------------------------------------
## 
## * Model ID: 5 LM
## Error: No in-line functions should be used here; use steps to define baking actions.
## 0.03 sec elapsed
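
The error message asks for preprocessing to be defined outside the formula. One possible workaround, shown here as a minimal base R sketch and not as a confirmed fix, is to compute the features as ordinary columns first, so the formula contains only column names. The toy data and the column names `date_num` and `month_lbl` are hypothetical, and `lm()` stands in for the parsnip fit.

```r
# Toy monthly series (hypothetical data, for illustration only)
set.seed(1)
df <- data.frame(
  date = seq(as.Date("2020-01-01"), by = "month", length.out = 24)
)
df$value <- rnorm(24)

# Materialise the features as columns instead of calling functions in the formula
df$date_num  <- as.numeric(df$date)
df$month_lbl <- factor(format(df$date, "%m"))

# The formula now refers to plain columns only
fit <- lm(value ~ date_num + month_lbl, data = df)
attr(terms(fit), "term.labels")
```

The same idea can be expressed with recipe steps inside a workflow, which is what the error message itself suggests.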

Resample with Autoregressive Models

Is there a way to use Resample with Autoregressive Models?

Thanks

Could you clarify how the rolling root mean squared error (RMSE) is calculated?

Suppose that I have two slices, say [1, 2, 3, 4] and [2, 3, 4, 5], and I want to make a two-period-ahead forecast in each slice. For slice 1, I use the first two observations [1, 2] to train a model, which is then used to forecast the last two observations [3, 4]. Similarly, for slice 2, I use [2, 3] to train a model, which is then used to forecast [4, 5]. Let's assume that I obtain [3.5, 4.5] as the forecasts of [3, 4] in slice 1 and [4.1, 5.1] as the forecasts of [4, 5] in slice 2.

I then calculate the RMSE for slice 1 as $\sqrt{\left((3.5 - 3)^2 + (4.5 - 4)^2\right) / 2}$ and the RMSE for slice 2 as $\sqrt{\left((4.1 - 4)^2 + (5.1 - 5)^2\right) / 2}$. Is this how the rolling RMSE is calculated in modeltime?

Thank you!
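
The arithmetic in the question can be checked with a few lines of base R. This is only a sketch of the calculation as described by the poster. The helper `rmse()` is defined here for illustration, and averaging the per-slice values is an assumption about how a summary function such as `mean` would be applied, not a statement of modeltime's internals.

```r
# RMSE for one slice: root of the mean squared forecast error
rmse <- function(truth, estimate) sqrt(mean((estimate - truth)^2))

rmse_slice1 <- rmse(truth = c(3, 4), estimate = c(3.5, 4.5))  # 0.5
rmse_slice2 <- rmse(truth = c(4, 5), estimate = c(4.1, 5.1))  # 0.1, up to floating-point error

# Summarising the two per-slice values with mean() gives about 0.3
mean(c(rmse_slice1, rmse_slice2))
```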

modeltime_fit_resamples() automatically 'lags' the .row column

The out-of-sample predictions generated by modeltime_fit_resamples() incorrectly show the predicted variable as if it were lagged.

When I run the following code, it outputs out-of-sample projections, as expected:

  submodels_resamples_tscv_tbl <- submodels_tbl %>% 
    modeltime_fit_resamples(
      resamples    = cv_resamples
    )

However, a closer inspection of the output of modeltime_fit_resamples() shows that the predicted variable is not indexed by the same ".row" (ID column) as it was inside "cv_resamples".


# Input received by modeltime_fit_resamples()
input <- cv_resamples %>%
      filter(id == "Slice01") %>% 
      pull(splits) %>%
      first() %>% 
      training() %>% 
      select(.row, value)

input %>% filter(.row >= 1816, .row <= 1825 ) %>% print()

# Output generated by modeltime_fit_resamples()

output <- submodels_resamples_tscv_tbl %>% 
    select(.model_desc, .resample_results) %>% 
    unnest(.resample_results) %>% 
    select(.model_desc, id, .predictions) %>% 
    unnest(.predictions)


output %>% select(.row, value) %>%  filter(.row >= 1816, .row <= 1825 ) %>% print()

If we plot those variables, we can clearly see that the test set was lagged by a few days.
The first slice inside 'cv_resamples' includes the test and the training set, both depicted in blue.
The output includes the out-of-sample projections for the test set, but we ignore those and plot only the realized value of the target variable, in red.
I include the code below just for completeness.

ggplot() +
      geom_line(data = output, aes(x = .row, y = value, color = "input")) +
      geom_line(data = input,  aes(x = .row, y = value, color = "output"))

Error in out$extras$final(predictors_extras, outcomes_extras): argument "outcomes_extras" is missing, with no default

This is a replicate of tidymodels/hardhat#200

modeltime_resample_accuracy with 1 observation test sets

Hi @mdancho84,

I cannot for the life of me get the function modeltime_resample_accuracy to work when using 1 observation in each test set.

When running:

resamples = time_series_cv(
  data = score_df,
  initial = "180 days",
  assess = "1 days"
)

... models ... 

resamples_fitted = model_table %>% 
  modeltime_fit_resamples(
    resamples = resamples,
    control = control_resamples(verbose = FALSE)
  )

resamples_fitted %>%
  modeltime_resample_accuracy(summary_fns = mean) %>%
  table_modeltime_accuracy(.interactive = FALSE)

I get the following error:

Error: In metric: `mase`
Problem with `summarise()` column `.estimate`.
i `.estimate = metric_fn(...)`.
x `truth` must have a length greater than `m` to compute the out-of-sample naive mean absolute error.
i The error occurred in group 1: .model_id = 1, .model_desc = "ARIMA(2,0,2)(1,1,2)[7] WITH DRIFT", .resample_id = "Slice01", .type = "Resamples".

Using 2 days instantly fixes the problem, but this is not what I want. I want to judge the average day-ahead performance of each model, e.g. over 30 days.
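
The error message points at the MASE denominator, the out-of-sample naive mean absolute error, which needs at least `m + 1` observations in the test set. The base R sketch below illustrates why a single observation leaves nothing to average. `naive_mae()` is a simplified stand-in written for this example, not yardstick's actual code, and the numbers are made up.

```r
# Simplified out-of-sample naive MAE: mean absolute difference at lag m
naive_mae <- function(truth, m = 1) {
  mean(abs(diff(truth, lag = m)))
}

naive_mae(c(10, 12))  # two observations: one naive error, result 2
naive_mae(c(10))      # one observation: no naive errors, result NaN

# Day-ahead errors can still be pooled across slices with a metric
# that does not need a within-slice baseline, for example MAE
truth    <- c(10, 12, 11)   # one actual value per slice (hypothetical)
estimate <- c(11, 12, 9)    # one forecast per slice (hypothetical)
mean(abs(truth - estimate)) # 1
```

Under that reading, supplying a metric set that leaves out mase may be a way to keep one-day test sets, though that is not verified here.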

NA Values in modeltime_resample_accuracy

Hi @mdancho84,

I am running the ?modeltime_resample_accuracy example and I am getting all NA values. This did not happen before: the article linked below says "From the table below, ARIMA has a 6% lower RMSE", but the table now shows only NA values:

https://business-science.github.io/modeltime.resample/articles/getting-started.html#accuracy-table-1

I also found that the problem can be worked around with purrr::partial, but I think it may be better to limit the summarization functions and have the function handle this internally (I'm just thinking out loud).

m750_training_resamples_fitted %>%
    modeltime_resample_accuracy(summary_fns = partial(mean, na.rm = TRUE))
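
The effect of `na.rm = TRUE` can be seen in isolation with base R. The sketch below assumes, without confirmation from the package source, that the summary function receives a vector of per-slice metric values in which some entries are NA. The vector `slice_rmse` is hypothetical.

```r
# Hypothetical per-slice metric values with one missing entry
slice_rmse <- c(0.5, NA, 0.1)

# A plain mean propagates the NA
mean(slice_rmse)

# A wrapper that drops NAs, equivalent in effect to partial(mean, na.rm = TRUE)
mean_na_rm <- function(x) mean(x, na.rm = TRUE)
mean_na_rm(slice_rmse)  # 0.3
```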

Finally, I would unify the output so we don't have as many columns (in this example, we have 18 columns per metric when we actually have 6 resamples). Couldn't we have as many columns per metric as the number of resamples?

library(modeltime)
library(modeltime.resample)
library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.4
#> Warning: package 'tidyr' was built under R version 4.0.4
#> Warning: package 'dplyr' was built under R version 4.0.4
#> Warning: package 'forcats' was built under R version 4.0.4

# Mean (Default)
m750_training_resamples_fitted %>%
    modeltime_resample_accuracy() %>%
    glimpse()
#> Rows: 3
#> Columns: 112
#> $ .model_id   <int> 1, 2, 3
#> $ .model_desc <chr> "ARIMA(0,1,1)(0,1,1)[12]", "PROPHET", "GLMNET"
#> $ .type       <chr> "Resamples", "Resamples", "Resamples"
#> $ n           <int> 6, 6, 6
#> $ mae         <dbl> NA, NA, NA
#> $ mae_1       <dbl> NA, NA, NA
#> $ mae_2       <dbl> NA, NA, NA
#> $ mae_3       <dbl> NA, NA, NA
#> $ mae_4       <dbl> NA, NA, NA
#> $ mae_5       <dbl> NA, NA, NA
#> $ mae_6       <dbl> NA, NA, NA
#> $ mae_7       <dbl> NA, NA, NA
#> $ mae_8       <dbl> NA, NA, NA
#> $ mae_9       <dbl> NA, NA, NA
#> $ mae_10      <dbl> NA, NA, NA
#> $ mae_11      <dbl> NA, NA, NA
#> $ mae_12      <dbl> NA, NA, NA
#> $ mae_13      <dbl> NA, NA, NA
#> $ mae_14      <dbl> NA, NA, NA
#> $ mae_15      <dbl> NA, NA, NA
#> $ mae_16      <dbl> NA, NA, NA
#> $ mae_17      <dbl> NA, NA, NA
#> $ mape        <dbl> NA, NA, NA
#> $ mape_1      <dbl> NA, NA, NA
#> $ mape_2      <dbl> NA, NA, NA
#> $ mape_3      <dbl> NA, NA, NA
#> $ mape_4      <dbl> NA, NA, NA
#> $ mape_5      <dbl> NA, NA, NA
#> $ mape_6      <dbl> NA, NA, NA
#> $ mape_7      <dbl> NA, NA, NA
#> $ mape_8      <dbl> NA, NA, NA
#> $ mape_9      <dbl> NA, NA, NA
#> $ mape_10     <dbl> NA, NA, NA
#> $ mape_11     <dbl> NA, NA, NA
#> $ mape_12     <dbl> NA, NA, NA
#> $ mape_13     <dbl> NA, NA, NA
#> $ mape_14     <dbl> NA, NA, NA
#> $ mape_15     <dbl> NA, NA, NA
#> $ mape_16     <dbl> NA, NA, NA
#> $ mape_17     <dbl> NA, NA, NA
#> $ mase        <dbl> NA, NA, NA
#> $ mase_1      <dbl> NA, NA, NA
#> $ mase_2      <dbl> NA, NA, NA
#> $ mase_3      <dbl> NA, NA, NA
#> $ mase_4      <dbl> NA, NA, NA
#> $ mase_5      <dbl> NA, NA, NA
#> $ mase_6      <dbl> NA, NA, NA
#> $ mase_7      <dbl> NA, NA, NA
#> $ mase_8      <dbl> NA, NA, NA
#> $ mase_9      <dbl> NA, NA, NA
#> $ mase_10     <dbl> NA, NA, NA
#> $ mase_11     <dbl> NA, NA, NA
#> $ mase_12     <dbl> NA, NA, NA
#> $ mase_13     <dbl> NA, NA, NA
#> $ mase_14     <dbl> NA, NA, NA
#> $ mase_15     <dbl> NA, NA, NA
#> $ mase_16     <dbl> NA, NA, NA
#> $ mase_17     <dbl> NA, NA, NA
#> $ smape       <dbl> NA, NA, NA
#> $ smape_1     <dbl> NA, NA, NA
#> $ smape_2     <dbl> NA, NA, NA
#> $ smape_3     <dbl> NA, NA, NA
#> $ smape_4     <dbl> NA, NA, NA
#> $ smape_5     <dbl> NA, NA, NA
#> $ smape_6     <dbl> NA, NA, NA
#> $ smape_7     <dbl> NA, NA, NA
#> $ smape_8     <dbl> NA, NA, NA
#> $ smape_9     <dbl> NA, NA, NA
#> $ smape_10    <dbl> NA, NA, NA
#> $ smape_11    <dbl> NA, NA, NA
#> $ smape_12    <dbl> NA, NA, NA
#> $ smape_13    <dbl> NA, NA, NA
#> $ smape_14    <dbl> NA, NA, NA
#> $ smape_15    <dbl> NA, NA, NA
#> $ smape_16    <dbl> NA, NA, NA
#> $ smape_17    <dbl> NA, NA, NA
#> $ rmse        <dbl> NA, NA, NA
#> $ rmse_1      <dbl> NA, NA, NA
#> $ rmse_2      <dbl> NA, NA, NA
#> $ rmse_3      <dbl> NA, NA, NA
#> $ rmse_4      <dbl> NA, NA, NA
#> $ rmse_5      <dbl> NA, NA, NA
#> $ rmse_6      <dbl> NA, NA, NA
#> $ rmse_7      <dbl> NA, NA, NA
#> $ rmse_8      <dbl> NA, NA, NA
#> $ rmse_9      <dbl> NA, NA, NA
#> $ rmse_10     <dbl> NA, NA, NA
#> $ rmse_11     <dbl> NA, NA, NA
#> $ rmse_12     <dbl> NA, NA, NA
#> $ rmse_13     <dbl> NA, NA, NA
#> $ rmse_14     <dbl> NA, NA, NA
#> $ rmse_15     <dbl> NA, NA, NA
#> $ rmse_16     <dbl> NA, NA, NA
#> $ rmse_17     <dbl> NA, NA, NA
#> $ rsq         <dbl> NA, NA, NA
#> $ rsq_1       <dbl> NA, NA, NA
#> $ rsq_2       <dbl> NA, NA, NA
#> $ rsq_3       <dbl> NA, NA, NA
#> $ rsq_4       <dbl> NA, NA, NA
#> $ rsq_5       <dbl> NA, NA, NA
#> $ rsq_6       <dbl> NA, NA, NA
#> $ rsq_7       <dbl> NA, NA, NA
#> $ rsq_8       <dbl> NA, NA, NA
#> $ rsq_9       <dbl> NA, NA, NA
#> $ rsq_10      <dbl> NA, NA, NA
#> $ rsq_11      <dbl> NA, NA, NA
#> $ rsq_12      <dbl> NA, NA, NA
#> $ rsq_13      <dbl> NA, NA, NA
#> $ rsq_14      <dbl> NA, NA, NA
#> $ rsq_15      <dbl> NA, NA, NA
#> $ rsq_16      <dbl> NA, NA, NA
#> $ rsq_17      <dbl> NA, NA, NA

# Mean and Standard Deviation
m750_training_resamples_fitted %>%
    modeltime_resample_accuracy(
        summary_fns = list(mean = mean, sd = sd)
    ) %>% glimpse()
#> Rows: 3
#> Columns: 220
#> $ .model_id     <int> 1, 2, 3
#> $ .model_desc   <chr> "ARIMA(0,1,1)(0,1,1)[12]", "PROPHET", "GLMNET"
#> $ .type         <chr> "Resamples", "Resamples", "Resamples"
#> $ n             <int> 6, 6, 6
#> $ mae_mean      <dbl> NA, NA, NA
#> $ mae_sd        <dbl> NA, NA, NA
#> $ mae_1_mean    <dbl> NA, NA, NA
#> $ mae_1_sd      <dbl> NA, NA, NA
#> $ mae_2_mean    <dbl> NA, NA, NA
#> $ mae_2_sd      <dbl> NA, NA, NA
#> $ mae_3_mean    <dbl> NA, NA, NA
#> $ mae_3_sd      <dbl> NA, NA, NA
#> $ mae_4_mean    <dbl> NA, NA, NA
#> $ mae_4_sd      <dbl> NA, NA, NA
#> $ mae_5_mean    <dbl> NA, NA, NA
#> $ mae_5_sd      <dbl> NA, NA, NA
#> $ mae_6_mean    <dbl> NA, NA, NA
#> $ mae_6_sd      <dbl> NA, NA, NA
#> $ mae_7_mean    <dbl> NA, NA, NA
#> $ mae_7_sd      <dbl> NA, NA, NA
#> $ mae_8_mean    <dbl> NA, NA, NA
#> $ mae_8_sd      <dbl> NA, NA, NA
#> $ mae_9_mean    <dbl> NA, NA, NA
#> $ mae_9_sd      <dbl> NA, NA, NA
#> $ mae_10_mean   <dbl> NA, NA, NA
#> $ mae_10_sd     <dbl> NA, NA, NA
#> $ mae_11_mean   <dbl> NA, NA, NA
#> $ mae_11_sd     <dbl> NA, NA, NA
#> $ mae_12_mean   <dbl> NA, NA, NA
#> $ mae_12_sd     <dbl> NA, NA, NA
#> $ mae_13_mean   <dbl> NA, NA, NA
#> $ mae_13_sd     <dbl> NA, NA, NA
#> $ mae_14_mean   <dbl> NA, NA, NA
#> $ mae_14_sd     <dbl> NA, NA, NA
#> $ mae_15_mean   <dbl> NA, NA, NA
#> $ mae_15_sd     <dbl> NA, NA, NA
#> $ mae_16_mean   <dbl> NA, NA, NA
#> $ mae_16_sd     <dbl> NA, NA, NA
#> $ mae_17_mean   <dbl> NA, NA, NA
#> $ mae_17_sd     <dbl> NA, NA, NA
#> $ mape_mean     <dbl> NA, NA, NA
#> $ mape_sd       <dbl> NA, NA, NA
#> $ mape_1_mean   <dbl> NA, NA, NA
#> $ mape_1_sd     <dbl> NA, NA, NA
#> $ mape_2_mean   <dbl> NA, NA, NA
#> $ mape_2_sd     <dbl> NA, NA, NA
#> $ mape_3_mean   <dbl> NA, NA, NA
#> $ mape_3_sd     <dbl> NA, NA, NA
#> $ mape_4_mean   <dbl> NA, NA, NA
#> $ mape_4_sd     <dbl> NA, NA, NA
#> $ mape_5_mean   <dbl> NA, NA, NA
#> $ mape_5_sd     <dbl> NA, NA, NA
#> $ mape_6_mean   <dbl> NA, NA, NA
#> $ mape_6_sd     <dbl> NA, NA, NA
#> $ mape_7_mean   <dbl> NA, NA, NA
#> $ mape_7_sd     <dbl> NA, NA, NA
#> $ mape_8_mean   <dbl> NA, NA, NA
#> $ mape_8_sd     <dbl> NA, NA, NA
#> $ mape_9_mean   <dbl> NA, NA, NA
#> $ mape_9_sd     <dbl> NA, NA, NA
#> $ mape_10_mean  <dbl> NA, NA, NA
#> $ mape_10_sd    <dbl> NA, NA, NA
#> $ mape_11_mean  <dbl> NA, NA, NA
#> $ mape_11_sd    <dbl> NA, NA, NA
#> $ mape_12_mean  <dbl> NA, NA, NA
#> $ mape_12_sd    <dbl> NA, NA, NA
#> $ mape_13_mean  <dbl> NA, NA, NA
#> $ mape_13_sd    <dbl> NA, NA, NA
#> $ mape_14_mean  <dbl> NA, NA, NA
#> $ mape_14_sd    <dbl> NA, NA, NA
#> $ mape_15_mean  <dbl> NA, NA, NA
#> $ mape_15_sd    <dbl> NA, NA, NA
#> $ mape_16_mean  <dbl> NA, NA, NA
#> $ mape_16_sd    <dbl> NA, NA, NA
#> $ mape_17_mean  <dbl> NA, NA, NA
#> $ mape_17_sd    <dbl> NA, NA, NA
#> $ mase_mean     <dbl> NA, NA, NA
#> $ mase_sd       <dbl> NA, NA, NA
#> $ mase_1_mean   <dbl> NA, NA, NA
#> $ mase_1_sd     <dbl> NA, NA, NA
#> $ mase_2_mean   <dbl> NA, NA, NA
#> $ mase_2_sd     <dbl> NA, NA, NA
#> $ mase_3_mean   <dbl> NA, NA, NA
#> $ mase_3_sd     <dbl> NA, NA, NA
#> $ mase_4_mean   <dbl> NA, NA, NA
#> $ mase_4_sd     <dbl> NA, NA, NA
#> $ mase_5_mean   <dbl> NA, NA, NA
#> $ mase_5_sd     <dbl> NA, NA, NA
#> $ mase_6_mean   <dbl> NA, NA, NA
#> $ mase_6_sd     <dbl> NA, NA, NA
#> $ mase_7_mean   <dbl> NA, NA, NA
#> $ mase_7_sd     <dbl> NA, NA, NA
#> $ mase_8_mean   <dbl> NA, NA, NA
#> $ mase_8_sd     <dbl> NA, NA, NA
#> $ mase_9_mean   <dbl> NA, NA, NA
#> $ mase_9_sd     <dbl> NA, NA, NA
#> $ mase_10_mean  <dbl> NA, NA, NA
#> $ mase_10_sd    <dbl> NA, NA, NA
#> $ mase_11_mean  <dbl> NA, NA, NA
#> $ mase_11_sd    <dbl> NA, NA, NA
#> $ mase_12_mean  <dbl> NA, NA, NA
#> $ mase_12_sd    <dbl> NA, NA, NA
#> $ mase_13_mean  <dbl> NA, NA, NA
#> $ mase_13_sd    <dbl> NA, NA, NA
#> $ mase_14_mean  <dbl> NA, NA, NA
#> $ mase_14_sd    <dbl> NA, NA, NA
#> $ mase_15_mean  <dbl> NA, NA, NA
#> $ mase_15_sd    <dbl> NA, NA, NA
#> $ mase_16_mean  <dbl> NA, NA, NA
#> $ mase_16_sd    <dbl> NA, NA, NA
#> $ mase_17_mean  <dbl> NA, NA, NA
#> $ mase_17_sd    <dbl> NA, NA, NA
#> $ smape_mean    <dbl> NA, NA, NA
#> $ smape_sd      <dbl> NA, NA, NA
#> $ smape_1_mean  <dbl> NA, NA, NA
#> $ smape_1_sd    <dbl> NA, NA, NA
#> $ smape_2_mean  <dbl> NA, NA, NA
#> $ smape_2_sd    <dbl> NA, NA, NA
#> $ smape_3_mean  <dbl> NA, NA, NA
#> $ smape_3_sd    <dbl> NA, NA, NA
#> $ smape_4_mean  <dbl> NA, NA, NA
#> $ smape_4_sd    <dbl> NA, NA, NA
#> $ smape_5_mean  <dbl> NA, NA, NA
#> $ smape_5_sd    <dbl> NA, NA, NA
#> $ smape_6_mean  <dbl> NA, NA, NA
#> $ smape_6_sd    <dbl> NA, NA, NA
#> $ smape_7_mean  <dbl> NA, NA, NA
#> $ smape_7_sd    <dbl> NA, NA, NA
#> $ smape_8_mean  <dbl> NA, NA, NA
#> $ smape_8_sd    <dbl> NA, NA, NA
#> $ smape_9_mean  <dbl> NA, NA, NA
#> $ smape_9_sd    <dbl> NA, NA, NA
#> $ smape_10_mean <dbl> NA, NA, NA
#> $ smape_10_sd   <dbl> NA, NA, NA
#> $ smape_11_mean <dbl> NA, NA, NA
#> $ smape_11_sd   <dbl> NA, NA, NA
#> $ smape_12_mean <dbl> NA, NA, NA
#> $ smape_12_sd   <dbl> NA, NA, NA
#> $ smape_13_mean <dbl> NA, NA, NA
#> $ smape_13_sd   <dbl> NA, NA, NA
#> $ smape_14_mean <dbl> NA, NA, NA
#> $ smape_14_sd   <dbl> NA, NA, NA
#> $ smape_15_mean <dbl> NA, NA, NA
#> $ smape_15_sd   <dbl> NA, NA, NA
#> $ smape_16_mean <dbl> NA, NA, NA
#> $ smape_16_sd   <dbl> NA, NA, NA
#> $ smape_17_mean <dbl> NA, NA, NA
#> $ smape_17_sd   <dbl> NA, NA, NA
#> $ rmse_mean     <dbl> NA, NA, NA
#> $ rmse_sd       <dbl> NA, NA, NA
#> $ rmse_1_mean   <dbl> NA, NA, NA
#> $ rmse_1_sd     <dbl> NA, NA, NA
#> $ rmse_2_mean   <dbl> NA, NA, NA
#> $ rmse_2_sd     <dbl> NA, NA, NA
#> $ rmse_3_mean   <dbl> NA, NA, NA
#> $ rmse_3_sd     <dbl> NA, NA, NA
#> $ rmse_4_mean   <dbl> NA, NA, NA
#> $ rmse_4_sd     <dbl> NA, NA, NA
#> $ rmse_5_mean   <dbl> NA, NA, NA
#> $ rmse_5_sd     <dbl> NA, NA, NA
#> $ rmse_6_mean   <dbl> NA, NA, NA
#> $ rmse_6_sd     <dbl> NA, NA, NA
#> $ rmse_7_mean   <dbl> NA, NA, NA
#> $ rmse_7_sd     <dbl> NA, NA, NA
#> $ rmse_8_mean   <dbl> NA, NA, NA
#> $ rmse_8_sd     <dbl> NA, NA, NA
#> $ rmse_9_mean   <dbl> NA, NA, NA
#> $ rmse_9_sd     <dbl> NA, NA, NA
#> $ rmse_10_mean  <dbl> NA, NA, NA
#> $ rmse_10_sd    <dbl> NA, NA, NA
#> $ rmse_11_mean  <dbl> NA, NA, NA
#> $ rmse_11_sd    <dbl> NA, NA, NA
#> $ rmse_12_mean  <dbl> NA, NA, NA
#> $ rmse_12_sd    <dbl> NA, NA, NA
#> $ rmse_13_mean  <dbl> NA, NA, NA
#> $ rmse_13_sd    <dbl> NA, NA, NA
#> $ rmse_14_mean  <dbl> NA, NA, NA
#> $ rmse_14_sd    <dbl> NA, NA, NA
#> $ rmse_15_mean  <dbl> NA, NA, NA
#> $ rmse_15_sd    <dbl> NA, NA, NA
#> $ rmse_16_mean  <dbl> NA, NA, NA
#> $ rmse_16_sd    <dbl> NA, NA, NA
#> $ rmse_17_mean  <dbl> NA, NA, NA
#> $ rmse_17_sd    <dbl> NA, NA, NA
#> $ rsq_mean      <dbl> NA, NA, NA
#> $ rsq_sd        <dbl> NA, NA, NA
#> $ rsq_1_mean    <dbl> NA, NA, NA
#> $ rsq_1_sd      <dbl> NA, NA, NA
#> $ rsq_2_mean    <dbl> NA, NA, NA
#> $ rsq_2_sd      <dbl> NA, NA, NA
#> $ rsq_3_mean    <dbl> NA, NA, NA
#> $ rsq_3_sd      <dbl> NA, NA, NA
#> $ rsq_4_mean    <dbl> NA, NA, NA
#> $ rsq_4_sd      <dbl> NA, NA, NA
#> $ rsq_5_mean    <dbl> NA, NA, NA
#> $ rsq_5_sd      <dbl> NA, NA, NA
#> $ rsq_6_mean    <dbl> NA, NA, NA
#> $ rsq_6_sd      <dbl> NA, NA, NA
#> $ rsq_7_mean    <dbl> NA, NA, NA
#> $ rsq_7_sd      <dbl> NA, NA, NA
#> $ rsq_8_mean    <dbl> NA, NA, NA
#> $ rsq_8_sd      <dbl> NA, NA, NA
#> $ rsq_9_mean    <dbl> NA, NA, NA
#> $ rsq_9_sd      <dbl> NA, NA, NA
#> $ rsq_10_mean   <dbl> NA, NA, NA
#> $ rsq_10_sd     <dbl> NA, NA, NA
#> $ rsq_11_mean   <dbl> NA, NA, NA
#> $ rsq_11_sd     <dbl> NA, NA, NA
#> $ rsq_12_mean   <dbl> NA, NA, NA
#> $ rsq_12_sd     <dbl> NA, NA, NA
#> $ rsq_13_mean   <dbl> NA, NA, NA
#> $ rsq_13_sd     <dbl> NA, NA, NA
#> $ rsq_14_mean   <dbl> NA, NA, NA
#> $ rsq_14_sd     <dbl> NA, NA, NA
#> $ rsq_15_mean   <dbl> NA, NA, NA
#> $ rsq_15_sd     <dbl> NA, NA, NA
#> $ rsq_16_mean   <dbl> NA, NA, NA
#> $ rsq_16_sd     <dbl> NA, NA, NA
#> $ rsq_17_mean   <dbl> NA, NA, NA
#> $ rsq_17_sd     <dbl> NA, NA, NA

sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19041)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
#> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
#> [5] LC_TIME=Spanish_Spain.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] forcats_0.5.1                 stringr_1.4.0                
#>  [3] dplyr_1.0.5                   purrr_0.3.4                  
#>  [5] readr_1.4.0                   tidyr_1.1.3                  
#>  [7] tibble_3.1.0                  ggplot2_3.3.3                
#>  [9] tidyverse_1.3.0               modeltime.resample_0.1.0.9000
#> [11] modeltime_0.4.1.9000         
#> 
#> loaded via a namespace (and not attached):
#>  [1] fs_1.5.0             xts_0.12.1           lubridate_1.7.10    
#>  [4] httr_1.4.2           DiceDesign_1.9       tools_4.0.3         
#>  [7] backports_1.2.1      utf8_1.1.4           R6_2.5.0            
#> [10] rpart_4.1-15         DBI_1.1.0            colorspace_2.0-0    
#> [13] yardstick_0.0.7      nnet_7.3-14          withr_2.4.1         
#> [16] tidyselect_1.1.0     compiler_4.0.3       rvest_0.3.6         
#> [19] cli_2.3.1            xml2_1.3.2           scales_1.1.1        
#> [22] tune_0.1.3           digest_0.6.27        StanHeaders_2.21.0-7
#> [25] rmarkdown_2.7        pkgconfig_2.0.3      htmltools_0.5.1.1   
#> [28] parallelly_1.23.0    lhs_1.1.1            dbplyr_2.0.0        
#> [31] highr_0.8            readxl_1.3.1         rlang_0.4.10        
#> [34] rstudioapi_0.13      generics_0.1.0       jsonlite_1.7.2      
#> [37] zoo_1.8-9            magrittr_2.0.1       Matrix_1.2-18       
#> [40] Rcpp_1.0.6           munsell_0.5.0        fansi_0.4.2         
#> [43] GPfit_1.0-8          lifecycle_1.0.0      furrr_0.2.2         
#> [46] stringi_1.5.3        pROC_1.17.0.1        yaml_2.2.1          
#> [49] MASS_7.3-53          plyr_1.8.6           recipes_0.1.15      
#> [52] grid_4.0.3           parallel_4.0.3       listenv_0.8.0       
#> [55] crayon_1.4.1         lattice_0.20-41      haven_2.3.1         
#> [58] splines_4.0.3        hms_1.0.0            knitr_1.30          
#> [61] ps_1.6.0             pillar_1.5.1         dials_0.0.9         
#> [64] codetools_0.2-16     parsnip_0.1.5        timetk_2.6.1        
#> [67] reprex_1.0.0         glue_1.4.2           evaluate_0.14       
#> [70] rsample_0.0.9        modelr_0.1.8         RcppParallel_5.0.3  
#> [73] vctrs_0.3.6          foreach_1.5.1        cellranger_1.1.0    
#> [76] gtable_0.3.0         future_1.21.0        assertthat_0.2.1    
#> [79] xfun_0.21            gower_0.2.2          prodlim_2019.11.13  
#> [82] broom_0.7.2          class_7.3-17         survival_3.2-7      
#> [85] timeDate_3043.102    iterators_1.0.13     hardhat_0.1.5       
#> [88] lava_1.6.9           workflows_0.2.2      globals_0.14.0      
#> [91] ellipsis_0.3.1       ipred_0.9-10
Created on 2021-03-13 by the reprex package (v1.0.0)

add .resample_id to modeltime_resample_accuracy output if summary_fns = NULL

modeltime_resample_accuracy returns the raw output of the resample fits if summary_fns = NULL. However, the output is returned without .resample_id, which means that it is not possible to check the metrics for a specific resample ID.

It would be valuable to have the .resample_id available in the case where modeltime_resample_accuracy(..., summary_fns = NULL)

I don't mind opening a PR for this, but it is unclear when I'll have time.

Local Behavior in Resample Accuracy

Hi @mdancho84 ,

Now that modeltime can break down accuracy by local models, I think it would be nice to do the same for plot_modeltime_resamples() and modeltime_resample_accuracy() in this package. With panel data, the results presented are those of the global models, but it would be interesting to break them down by local models as well.

Let me know what you think about this.

Regards,

Display resampling accuracy metrics by ID. Is it possible to see the results of the resampling folds by ID, that is, for each predicted time series, using the function modeltime_resample_accuracy?

Dear @mdancho84 and @AlbertoAlmuinha,

Is it possible to view the resampling forecast metrics by ID? I saw that summary_fns = NULL (#1, #3) provides the results by resampling fold, but they are global results. I think it would be very valuable to be able to access the local metrics for each ID as well as the results per resampling fold. I saw that #4 already requested something similar. If this is already possible and you can help me visualize these results, I would appreciate it. The scripts follow:

Link to download the database used: https://github.com/forecastingEDs/Forecasting-of-admissions-in-the-emergency-departments/blob/131bd23723a39724ad4f88ad6b8e5a58f42a7960/datasets.xlsx

data_tbl <- datasets %>%
  select(id, Date, attendences, average_temperature, min, max,  sunday, monday, tuesday, wednesday, thursday, friday, saturday, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec) %>%
  set_names(c("id", "date", "value","tempe_verage", "tempemin", "tempemax", "sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))

data_tbl

# Full = Training + Forecast Datasets

full_data_tbl <- datasets %>%
  select(id, Date, attendences, average_temperature, min, max,  sunday, monday, tuesday, wednesday, thursday, friday, saturday, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec) %>%
  set_names(c("id", "date", "value","tempe_verage", "tempemin", "tempemax", "sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")) %>%

  # Apply Group-wise Time Series Manipulations
  group_by(id) %>%
  future_frame(
    .date_var   = date,
    .length_out = "3 days",
    .bind_data  = TRUE
  ) %>%
  ungroup() %>%

  # Consolidate IDs
  mutate(id = fct_drop(id))

# Training Data

data_prepared_tbl <- full_data_tbl %>%
  filter(!is.na(value))

# Forecast Data

future_tbl <- full_data_tbl %>%
  filter(is.na(value))

emergency_tscv <- data_prepared_tbl %>%
  time_series_cv(
    date_var    = date, 
    assess      = "3 days",
    skip        = "30 days",
    cumulative  = TRUE,
    slice_limit = 5
  )

emergency_tscv

# test data preprocessing for ML ----

recipe_spec <- recipe(value ~ ., 
                      data = training(emergency_tscv$splits[[1]])) %>%
  step_timeseries_signature(date) %>%
  step_rm(matches("(.iso$)|(.xts$)|(hour)|(minute)|(second)|(am.pm)")) %>%
  step_mutate(data = factor(value, ordered = TRUE))%>%
  step_dummy(all_nominal(), one_hot = TRUE)%>%
  step_normalize (date_index.num,tempe_verage,tempemin,tempemax,date_year, -all_outcomes())

# Model 1: Xgboost ----

wflw_fit_xgboost <- workflow() %>%
  add_model(
    boost_tree("regression") %>% set_engine("xgboost") 
  ) %>%
  add_recipe(recipe_spec %>% step_rm(date)) %>%
  fit(training(emergency_tscv$splits[[1]]))

# Model 2: LightGBM ----

wflw_fit_lightgbm <- workflow() %>%
  add_model(
    boost_tree("regression") %>% set_engine("lightgbm")
  ) %>%
  add_recipe(recipe_spec %>% step_rm(date)) %>%
  fit(training(emergency_tscv$splits[[1]]))

# ---- MODELTIME TABLE ----

model_tbl <- modeltime_table(
  wflw_fit_xgboost,
  wflw_fit_lightgbm
)

model_tbl

resample_results <- model_tbl %>%
  modeltime_fit_resamples(
    resamples = emergency_tscv,
    control   = control_resamples(allow_par = TRUE, verbose = TRUE)
  )

resample_results

At this step I need the results by ID, but I only get the global results per resampling fold. Can you help me?

resample_results %>%
  modeltime_resample_accuracy(summary_fns = NULL, yardstick::metric_set(mape, smape, mase, rmse)) %>%
  table_modeltime_accuracy(.interactive = FALSE)
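
One way to think about local metrics is to group the out-of-sample predictions by both fold and series ID before computing each metric. The base R sketch below shows that grouping on a toy data frame. The data and the column names `resample_id` and `id` are hypothetical, `.pred` is assumed to hold the predictions, and getting the series ID onto the unnested predictions (for example by joining on `.row`) is an assumption that this page does not confirm.

```r
# Toy stand-in for unnested resample predictions (hypothetical values)
preds <- data.frame(
  resample_id = rep(c("Slice1", "Slice2"), each = 4),
  id          = rep(c("A", "A", "B", "B"), times = 2),
  value       = c(10, 12, 20, 22, 11, 13, 21, 23),
  .pred       = c(11, 12, 18, 22, 11, 15, 21, 20)
)

rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# One group per fold and series ID, then one RMSE per group
groups     <- split(preds, list(preds$resample_id, preds$id))
local_rmse <- sapply(groups, function(d) rmse(d$value, d$.pred))
local_rmse
```

Each element of `local_rmse` is named by fold and ID, such as `Slice1.A`, so the values can be compared across series or averaged per ID.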

Request to add resampling to nested forecasting

Hi Matt,

Could you add k-fold and time series (ts-fold) cross-validation to nested forecasting? I am starting to use nested forecasting a lot, but I get poor results if the most recent couple of months (my test set) are a little off from the rest of the time series.

Thanks!

Ability to combine resample tables

Shafi Qureshi:

Hello Matt Dancho, I am using 15 different models: LightGBM, CatBoost, XGBoost, and others. The problem is that RStudio crashes when I run all the models with:

resamples_fitted <- submodels_1_tbl %>%
    modeltime_fit_resamples(
        resamples = resample_spec,
        control   = control_resamples(verbose = TRUE, allow_par = TRUE)
    )

So I am running the LightGBM, CatBoost, and XGBoost models in separate resamples_fitted objects, and I now have resamples_fitted_1, resamples_fitted_2, and resamples_fitted_3. I want to combine all three so they can be used for the stacked ensemble. Is there a way to combine the different resample fits (resamples_fitted), like combine_modeltime_tables does for modeltime tables?

Not possible to use ensemble

I was going to use an ensemble as a part of my time series cross validation. This did not work. Here is my code.

# Create average ensemble and add to the modeltime table
ml_mtbl <- ml_mtbl %>% 
    combine_modeltime_tables(
        ml_mtbl %>% 
            ensemble_average() %>% 
            modeltime_table()
    )
    
# TS CV
resamples_tscv <- time_series_cv(
    data        = train_data,
    assess      = "11 days",
    initial     = "730 days",
    skip        = 11,
    slice_limit = 20,
    cumulative = TRUE
    )
resamples_fitted <- ml_mtbl %>% 
    modeltime_fit_resamples(
        resamples = resamples_tscv,
        control   = control_resamples(verbose = FALSE, allow_par = TRUE)
    )

The results:

> resamples_fitted
# Modeltime Table
# A tibble: 6 x 4
  .model_id .model         .model_desc               .resample_results
      <int> <list>         <chr>                     <list>           
1         1 <workflow>     XGBOOST                   <rsmp[+]>        
2         2 <workflow>     RANGER                    <rsmp[+]>        
3         3 <workflow>     GLMNET                    <rsmp[+]>        
4         4 <workflow>     KERNLAB                   <rsmp[+]>        
5         5 <workflow>     KERNLAB                   <rsmp[+]>        
6         6 <ensemble [5]> ENSEMBLE (MEAN): 5 MODELS <lgl [1]>

So when I check the accuracy, I only get the accuracy for the individual models, not the ensemble. This code gives me a tibble with all five models, but not the ensemble.

resamples_fitted %>%
    modeltime_resample_accuracy()

resampling with external regressors with smooth_es

I tried to run the following code:

model_fit_exp_smooth <- exp_smoothing(
    mode            = "regression",
    seasonal_period = 12,
    error           = "additive",
    trend           = "additive",
    season          = "additive"
) %>%
    set_engine(engine = "smooth_es") %>%
    fit(value ~ date + moy + VIXRQBIN + DFFD, data = initial_training)

my_models_tbl <- modeltime_table(
    model_fit_exp_smooth,
    model_fit_lm
)

resamples_fitted <- my_models_tbl %>%
    modeltime_fit_resamples(
        resamples = resamples_tscv,
        control   = control_resamples(allow_par = TRUE, parallel_over = NULL, verbose = TRUE)
    )
── Fitting Resamples ────────────────────────────────────────────

• Model ID: 1 ETSX(AAA)
i Slice01: preprocessor 1/1
✓ Slice01: preprocessor 1/1
i Slice01: preprocessor 1/1, model 1/1
✓ Slice01: preprocessor 1/1, model 1/1
i Slice01: preprocessor 1/1, model 1/1 (extracts)
i Slice01: preprocessor 1/1, model 1/1 (predictions)
! Slice01: preprocessor 1/1, model 1/1 (predictions): The newdata is not provided.Predicting the explanatory variables based on what I have in-sample., Only additive model...
i Slice02: preprocessor 1/1
✓ Slice02: preprocessor 1/1
i Slice02: preprocessor 1/1, model 1/1
✓ Slice02: preprocessor 1/1, model 1/1
i Slice02: preprocessor 1/1, model 1/1 (extracts)
i Slice02: preprocessor 1/1, model 1/1 (predictions)
! Slice02: preprocessor 1/1, model 1/1 (predictions): The newdata is not provided.Predicting the explanatory variables based on what I have in-sample., Only additive model...
i Slice03: preprocessor 1/1
✓ Slice03: preprocessor 1/1
i Slice03: preprocessor 1/1, model 1/1
✓ Slice03: preprocessor 1/1, model 1/1
i Slice03: preprocessor 1/1, model 1/1 (extracts)
i Slice03: preprocessor 1/1, model 1/1 (predictions)
! Slice03: preprocessor 1/1, model 1/1 (predictions): The newdata is not provided.Predicting the explanatory variables based on what I have in-sample., Only additive model...

Does the warning "The newdata is not provided.Predicting the explanatory variables based on what I have in-sample., Only additive model..." mean that my external regressors for smooth_es are not used at all in modeltime_fit_resamples(), or does it mean that the external regressor values are taken from the in-sample data directly following the slice being processed?