business-science / modeltime.resample
Resampling Tools for Time Series Forecasting with Modeltime
Home Page: https://business-science.github.io/modeltime.resample/
License: Other
Full Example: https://github.com/wtmoreland3/reprex/blob/3915d45d424135d8ca27c8c607ffc9da70aa89bf/modeltime_resample_error
Models that have inline functions fail. Example:
model_fit_lm <- linear_reg() %>%
  set_engine("lm") %>%
  fit(
    value ~ as.numeric(date) + month(date, label = TRUE),
    data = training(splits)
  )
Causes this error:
## > model_tbl_tuned_resamples <- submodels_tbl %>%
## + slice(c(5)) %>%
## + modeltime_fit_resamples(
## + resamples = resamples_tscv,
## + control = control_resamples(verbose = TRUE, allow_par = TRUE)
## + )
## -- Fitting Resamples --------------------------------------------
##
## * Model ID: 5 LM
## Error: No in-line functions should be used here; use steps to define baking actions.
## 0.03 sec elapsed
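One possible workaround, offered as a sketch and not as a confirmed fix: the error text suggests that the formula is being handed to a recipe, which rejects function calls inside a formula. Materializing the derived columns before fitting keeps the formula free of inline functions. The toy data frame `df` and the column names `date_num` and `date_month` are assumptions made for illustration, and base R's `lm()` stands in for the parsnip model.

```r
# Toy daily series standing in for training(splits) (hypothetical data)
df <- data.frame(
  date  = seq(as.Date("2020-01-01"), by = "day", length.out = 120),
  value = seq_len(120) + rep(c(0, 1, -1), 40)
)

# Precompute what the formula previously derived inline
df$date_num   <- as.numeric(df$date)
df$date_month <- factor(format(df$date, "%m"))

# The formula now references plain columns only
fit_lm <- lm(value ~ date_num + date_month, data = df)
formula_vars <- all.vars(formula(fit_lm))
```

The same idea should carry over to the parsnip call: add the columns to the data before creating the splits, so that every resample already contains them, and fit with a formula such as `value ~ date_num + date_month`.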
Hi
Is there a way to use Resample with Autoregressive Models?
Thanks
Suppose that I have two slices, say [1, 2, 3, 4] and [2, 3, 4, 5], and I want to make a two-period-ahead forecast in each slice. For slice 1, I use the first two observations [1, 2] to train a model, which is then used to forecast the last two observations [3, 4]. Similarly, for slice 2, I use [2, 3] to train a model, which is then used to forecast [4, 5]. Let's assume that I obtain [3.5, 4.5] as the forecasts of [3, 4] in slice 1 and [4.1, 5.1] as the forecasts of [4, 5] in slice 2.
I then proceed to calculate the RMSE for slice 1 as $\sqrt{\left[(3.5 - 3)^2 + (4.5 - 4)^2\right] / 2}$ and the RMSE for slice 2 as
Thank you!
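The arithmetic described above can be checked with a few lines of base R. This is only a sketch of the per-slice calculation using the numbers from the question; the question text is cut off, so the final averaging step is shown as one common way to summarize per-slice metrics, not as what the package necessarily does. The helper name `rmse` is an assumption.

```r
# Root mean squared error of a forecast against the actual values
rmse <- function(actual, forecast) sqrt(mean((forecast - actual)^2))

rmse_slice1 <- rmse(actual = c(3, 4), forecast = c(3.5, 4.5))
rmse_slice2 <- rmse(actual = c(4, 5), forecast = c(4.1, 5.1))

# Average of the per-slice values
rmse_mean <- mean(c(rmse_slice1, rmse_slice2))
print(c(slice1 = rmse_slice1, slice2 = rmse_slice2, mean = rmse_mean))
```

Slice 1 gives 0.5 and slice 2 gives approximately 0.1, so the average of the two per-slice values is approximately 0.3.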
The out-of-sample predictions generated by modeltime_fit_resamples() incorrectly show the predicted variable as if it were lagged.
When I run the following code, it outputs out-of-sample projections, as expected:
submodels_resamples_tscv_tbl <- submodels_tbl %>%
  modeltime_fit_resamples(
    resamples = cv_resamples
  )
However, a closer inspection of the output of modeltime_fit_resamples() shows that the predicted variable is not indexed by the same ".row" (id column) as it was inside "cv_resamples":
# Input received by modeltime_fit_resamples()
input <- cv_resamples %>%
  filter(id == "Slice01") %>%
  pull(splits) %>%
  first() %>%
  training() %>%
  select(.row, value)
input %>% filter(.row >= 1816, .row <= 1825) %>% print()
# Output generated by modeltime_fit_resamples()
output <- submodels_resamples_tscv_tbl %>%
  select(.model_desc, .resample_results) %>%
  unnest(.resample_results) %>%
  select(.model_desc, id, .predictions) %>%
  unnest(.predictions)
output %>% select(.row, value) %>% filter(.row >= 1816, .row <= 1825) %>% print()
If we plot those variables, we can clearly see that the test set was lagged by a few days.
The first slice inside 'cv_resamples' includes the test and the training set, both depicted in blue.
The output includes the out-of-sample projections for the test set, but we ignore those and plot only the realized value of the target variable, in red.
I include the code below just for completeness.
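# Legend labels now match the data they describe; colors are set explicitly
# so that the input is blue and the output is red, as described in the text
ggplot() +
  geom_line(data = input, aes(x = .row, y = value, color = "input")) +
  geom_line(data = output, aes(x = .row, y = value, color = "output")) +
  scale_color_manual(values = c(input = "blue", output = "red"))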
This is a replicate of tidymodels/hardhat#200
Hi @mdancho84,
I cannot for the life of me get the function modeltime_resample_accuracy to work when using 1 observation in each test set.
When running:
resamples = time_series_cv(
  data = score_df,
  initial = "180 days",
  assess = "1 days"
)
... models ...
resamples_fitted = model_table %>%
  modeltime_fit_resamples(
    resamples = resamples,
    control = control_resamples(verbose = FALSE)
  )
resamples_fitted %>%
  modeltime_resample_accuracy(summary_fns = mean) %>%
  table_modeltime_accuracy(.interactive = FALSE)
I get the following error:
Error: In metric: `mase`
Problem with `summarise()` column `.estimate`.
i `.estimate = metric_fn(...)`.
x `truth` must have a length greater than `m` to compute the out-of-sample naive mean absolute error.
i The error occurred in group 1: .model_id = 1, .model_desc = "ARIMA(2,0,2)(1,1,2)[7] WITH DRIFT", .resample_id = "Slice01", .type = "Resamples".
Using 2 days instantly fixes the problem, but this is not what I want. I want to judge the average day-ahead performance of each model, e.g. over 30 days.
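The error message indicates that, when no training data is supplied, MASE is scaled by a naive forecast error computed from the test-set `truth` values themselves, which requires more than `m` observations. The base R sketch below reproduces that scaling term as described by the error text; it is not taken from yardstick's source, and `naive_mae` is a hypothetical helper.

```r
# Out-of-sample naive MAE: mean absolute difference at lag m
naive_mae <- function(truth, m = 1) {
  if (length(truth) <= m) {
    return(NA_real_)  # nothing to difference: the scaling term is undefined
  }
  mean(abs(diff(truth, lag = m)))
}

one_day  <- naive_mae(c(10))      # a single test observation
two_days <- naive_mae(c(10, 12))  # two test observations
```

With one observation the scaling term is undefined, while two observations give a usable value, which is consistent with the 2-day window working. For day-ahead evaluation, a metric set that leaves out `mase`, passed through the `metric_set` argument of modeltime_resample_accuracy(), may avoid the error; this is an untested suggestion, and `rsq` is likewise undefined for a single observation.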
Hi @mdancho84,
I am running the ?modeltime_resample_accuracy example and I am getting all NA values. This did not happen before: the article at the following link says "From the table below, ARIMA has a 6% lower RMSE", but the table now shows only NA values:
https://business-science.github.io/modeltime.resample/articles/getting-started.html#accuracy-table-1
I also realized that this problem can be worked around using purrr::partial, but I think it may be better to limit the summarization functions and have the function handle NA values internally (not sure, I'm just thinking out loud).
m750_training_resamples_fitted %>%
  modeltime_resample_accuracy(summary_fns = partial(mean, na.rm = TRUE))
Finally, I would unify the output so we don't have as many columns (in this example, we have 18 columns per metric when we actually have 6 resamples). Couldn't we have only as many columns per metric as the number of resamples?
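The effect of the `partial(mean, na.rm = TRUE)` workaround can be illustrated in base R. This is a minimal sketch with made-up per-resample values (toy numbers, not results from the m750 example); the wrapper `mean_na_rm` is a hypothetical stand-in for the purrr::partial call.

```r
# Per-resample metric values for one model, with one missing entry (toy data)
rmse_by_slice <- c(110.2, 98.7, NA, 120.4, 105.0, 101.3)

# The default summary propagates the NA
plain_mean <- mean(rmse_by_slice)

# Same idea as partial(mean, na.rm = TRUE): drop NAs before averaging
mean_na_rm  <- function(x) mean(x, na.rm = TRUE)
robust_mean <- mean_na_rm(rmse_by_slice)
```

A single missing per-resample value is enough to turn the default summary into NA, whereas the wrapper averages the remaining values. It does not explain why the values are missing in the first place.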
library(modeltime)
library(modeltime.resample)
library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.4
#> Warning: package 'tidyr' was built under R version 4.0.4
#> Warning: package 'dplyr' was built under R version 4.0.4
#> Warning: package 'forcats' was built under R version 4.0.4
# Mean (Default)
m750_training_resamples_fitted %>%
  modeltime_resample_accuracy() %>%
  glimpse()
#> Rows: 3
#> Columns: 112
#> $ .model_id <int> 1, 2, 3
#> $ .model_desc <chr> "ARIMA(0,1,1)(0,1,1)[12]", "PROPHET", "GLMNET"
#> $ .type <chr> "Resamples", "Resamples", "Resamples"
#> $ n <int> 6, 6, 6
#> $ mae <dbl> NA, NA, NA
#> $ mae_1 <dbl> NA, NA, NA
#> $ mae_2 <dbl> NA, NA, NA
#> $ mae_3 <dbl> NA, NA, NA
#> $ mae_4 <dbl> NA, NA, NA
#> $ mae_5 <dbl> NA, NA, NA
#> $ mae_6 <dbl> NA, NA, NA
#> $ mae_7 <dbl> NA, NA, NA
#> $ mae_8 <dbl> NA, NA, NA
#> $ mae_9 <dbl> NA, NA, NA
#> $ mae_10 <dbl> NA, NA, NA
#> $ mae_11 <dbl> NA, NA, NA
#> $ mae_12 <dbl> NA, NA, NA
#> $ mae_13 <dbl> NA, NA, NA
#> $ mae_14 <dbl> NA, NA, NA
#> $ mae_15 <dbl> NA, NA, NA
#> $ mae_16 <dbl> NA, NA, NA
#> $ mae_17 <dbl> NA, NA, NA
#> $ mape <dbl> NA, NA, NA
#> $ mape_1 <dbl> NA, NA, NA
#> $ mape_2 <dbl> NA, NA, NA
#> $ mape_3 <dbl> NA, NA, NA
#> $ mape_4 <dbl> NA, NA, NA
#> $ mape_5 <dbl> NA, NA, NA
#> $ mape_6 <dbl> NA, NA, NA
#> $ mape_7 <dbl> NA, NA, NA
#> $ mape_8 <dbl> NA, NA, NA
#> $ mape_9 <dbl> NA, NA, NA
#> $ mape_10 <dbl> NA, NA, NA
#> $ mape_11 <dbl> NA, NA, NA
#> $ mape_12 <dbl> NA, NA, NA
#> $ mape_13 <dbl> NA, NA, NA
#> $ mape_14 <dbl> NA, NA, NA
#> $ mape_15 <dbl> NA, NA, NA
#> $ mape_16 <dbl> NA, NA, NA
#> $ mape_17 <dbl> NA, NA, NA
#> $ mase <dbl> NA, NA, NA
#> $ mase_1 <dbl> NA, NA, NA
#> $ mase_2 <dbl> NA, NA, NA
#> $ mase_3 <dbl> NA, NA, NA
#> $ mase_4 <dbl> NA, NA, NA
#> $ mase_5 <dbl> NA, NA, NA
#> $ mase_6 <dbl> NA, NA, NA
#> $ mase_7 <dbl> NA, NA, NA
#> $ mase_8 <dbl> NA, NA, NA
#> $ mase_9 <dbl> NA, NA, NA
#> $ mase_10 <dbl> NA, NA, NA
#> $ mase_11 <dbl> NA, NA, NA
#> $ mase_12 <dbl> NA, NA, NA
#> $ mase_13 <dbl> NA, NA, NA
#> $ mase_14 <dbl> NA, NA, NA
#> $ mase_15 <dbl> NA, NA, NA
#> $ mase_16 <dbl> NA, NA, NA
#> $ mase_17 <dbl> NA, NA, NA
#> $ smape <dbl> NA, NA, NA
#> $ smape_1 <dbl> NA, NA, NA
#> $ smape_2 <dbl> NA, NA, NA
#> $ smape_3 <dbl> NA, NA, NA
#> $ smape_4 <dbl> NA, NA, NA
#> $ smape_5 <dbl> NA, NA, NA
#> $ smape_6 <dbl> NA, NA, NA
#> $ smape_7 <dbl> NA, NA, NA
#> $ smape_8 <dbl> NA, NA, NA
#> $ smape_9 <dbl> NA, NA, NA
#> $ smape_10 <dbl> NA, NA, NA
#> $ smape_11 <dbl> NA, NA, NA
#> $ smape_12 <dbl> NA, NA, NA
#> $ smape_13 <dbl> NA, NA, NA
#> $ smape_14 <dbl> NA, NA, NA
#> $ smape_15 <dbl> NA, NA, NA
#> $ smape_16 <dbl> NA, NA, NA
#> $ smape_17 <dbl> NA, NA, NA
#> $ rmse <dbl> NA, NA, NA
#> $ rmse_1 <dbl> NA, NA, NA
#> $ rmse_2 <dbl> NA, NA, NA
#> $ rmse_3 <dbl> NA, NA, NA
#> $ rmse_4 <dbl> NA, NA, NA
#> $ rmse_5 <dbl> NA, NA, NA
#> $ rmse_6 <dbl> NA, NA, NA
#> $ rmse_7 <dbl> NA, NA, NA
#> $ rmse_8 <dbl> NA, NA, NA
#> $ rmse_9 <dbl> NA, NA, NA
#> $ rmse_10 <dbl> NA, NA, NA
#> $ rmse_11 <dbl> NA, NA, NA
#> $ rmse_12 <dbl> NA, NA, NA
#> $ rmse_13 <dbl> NA, NA, NA
#> $ rmse_14 <dbl> NA, NA, NA
#> $ rmse_15 <dbl> NA, NA, NA
#> $ rmse_16 <dbl> NA, NA, NA
#> $ rmse_17 <dbl> NA, NA, NA
#> $ rsq <dbl> NA, NA, NA
#> $ rsq_1 <dbl> NA, NA, NA
#> $ rsq_2 <dbl> NA, NA, NA
#> $ rsq_3 <dbl> NA, NA, NA
#> $ rsq_4 <dbl> NA, NA, NA
#> $ rsq_5 <dbl> NA, NA, NA
#> $ rsq_6 <dbl> NA, NA, NA
#> $ rsq_7 <dbl> NA, NA, NA
#> $ rsq_8 <dbl> NA, NA, NA
#> $ rsq_9 <dbl> NA, NA, NA
#> $ rsq_10 <dbl> NA, NA, NA
#> $ rsq_11 <dbl> NA, NA, NA
#> $ rsq_12 <dbl> NA, NA, NA
#> $ rsq_13 <dbl> NA, NA, NA
#> $ rsq_14 <dbl> NA, NA, NA
#> $ rsq_15 <dbl> NA, NA, NA
#> $ rsq_16 <dbl> NA, NA, NA
#> $ rsq_17 <dbl> NA, NA, NA
# Mean and Standard Deviation
m750_training_resamples_fitted %>%
  modeltime_resample_accuracy(
    summary_fns = list(mean = mean, sd = sd)
  ) %>% glimpse()
#> Rows: 3
#> Columns: 220
#> $ .model_id <int> 1, 2, 3
#> $ .model_desc <chr> "ARIMA(0,1,1)(0,1,1)[12]", "PROPHET", "GLMNET"
#> $ .type <chr> "Resamples", "Resamples", "Resamples"
#> $ n <int> 6, 6, 6
#> $ mae_mean <dbl> NA, NA, NA
#> $ mae_sd <dbl> NA, NA, NA
#> $ mae_1_mean <dbl> NA, NA, NA
#> $ mae_1_sd <dbl> NA, NA, NA
#> $ mae_2_mean <dbl> NA, NA, NA
#> $ mae_2_sd <dbl> NA, NA, NA
#> $ mae_3_mean <dbl> NA, NA, NA
#> $ mae_3_sd <dbl> NA, NA, NA
#> $ mae_4_mean <dbl> NA, NA, NA
#> $ mae_4_sd <dbl> NA, NA, NA
#> $ mae_5_mean <dbl> NA, NA, NA
#> $ mae_5_sd <dbl> NA, NA, NA
#> $ mae_6_mean <dbl> NA, NA, NA
#> $ mae_6_sd <dbl> NA, NA, NA
#> $ mae_7_mean <dbl> NA, NA, NA
#> $ mae_7_sd <dbl> NA, NA, NA
#> $ mae_8_mean <dbl> NA, NA, NA
#> $ mae_8_sd <dbl> NA, NA, NA
#> $ mae_9_mean <dbl> NA, NA, NA
#> $ mae_9_sd <dbl> NA, NA, NA
#> $ mae_10_mean <dbl> NA, NA, NA
#> $ mae_10_sd <dbl> NA, NA, NA
#> $ mae_11_mean <dbl> NA, NA, NA
#> $ mae_11_sd <dbl> NA, NA, NA
#> $ mae_12_mean <dbl> NA, NA, NA
#> $ mae_12_sd <dbl> NA, NA, NA
#> $ mae_13_mean <dbl> NA, NA, NA
#> $ mae_13_sd <dbl> NA, NA, NA
#> $ mae_14_mean <dbl> NA, NA, NA
#> $ mae_14_sd <dbl> NA, NA, NA
#> $ mae_15_mean <dbl> NA, NA, NA
#> $ mae_15_sd <dbl> NA, NA, NA
#> $ mae_16_mean <dbl> NA, NA, NA
#> $ mae_16_sd <dbl> NA, NA, NA
#> $ mae_17_mean <dbl> NA, NA, NA
#> $ mae_17_sd <dbl> NA, NA, NA
#> $ mape_mean <dbl> NA, NA, NA
#> $ mape_sd <dbl> NA, NA, NA
#> $ mape_1_mean <dbl> NA, NA, NA
#> $ mape_1_sd <dbl> NA, NA, NA
#> $ mape_2_mean <dbl> NA, NA, NA
#> $ mape_2_sd <dbl> NA, NA, NA
#> $ mape_3_mean <dbl> NA, NA, NA
#> $ mape_3_sd <dbl> NA, NA, NA
#> $ mape_4_mean <dbl> NA, NA, NA
#> $ mape_4_sd <dbl> NA, NA, NA
#> $ mape_5_mean <dbl> NA, NA, NA
#> $ mape_5_sd <dbl> NA, NA, NA
#> $ mape_6_mean <dbl> NA, NA, NA
#> $ mape_6_sd <dbl> NA, NA, NA
#> $ mape_7_mean <dbl> NA, NA, NA
#> $ mape_7_sd <dbl> NA, NA, NA
#> $ mape_8_mean <dbl> NA, NA, NA
#> $ mape_8_sd <dbl> NA, NA, NA
#> $ mape_9_mean <dbl> NA, NA, NA
#> $ mape_9_sd <dbl> NA, NA, NA
#> $ mape_10_mean <dbl> NA, NA, NA
#> $ mape_10_sd <dbl> NA, NA, NA
#> $ mape_11_mean <dbl> NA, NA, NA
#> $ mape_11_sd <dbl> NA, NA, NA
#> $ mape_12_mean <dbl> NA, NA, NA
#> $ mape_12_sd <dbl> NA, NA, NA
#> $ mape_13_mean <dbl> NA, NA, NA
#> $ mape_13_sd <dbl> NA, NA, NA
#> $ mape_14_mean <dbl> NA, NA, NA
#> $ mape_14_sd <dbl> NA, NA, NA
#> $ mape_15_mean <dbl> NA, NA, NA
#> $ mape_15_sd <dbl> NA, NA, NA
#> $ mape_16_mean <dbl> NA, NA, NA
#> $ mape_16_sd <dbl> NA, NA, NA
#> $ mape_17_mean <dbl> NA, NA, NA
#> $ mape_17_sd <dbl> NA, NA, NA
#> $ mase_mean <dbl> NA, NA, NA
#> $ mase_sd <dbl> NA, NA, NA
#> $ mase_1_mean <dbl> NA, NA, NA
#> $ mase_1_sd <dbl> NA, NA, NA
#> $ mase_2_mean <dbl> NA, NA, NA
#> $ mase_2_sd <dbl> NA, NA, NA
#> $ mase_3_mean <dbl> NA, NA, NA
#> $ mase_3_sd <dbl> NA, NA, NA
#> $ mase_4_mean <dbl> NA, NA, NA
#> $ mase_4_sd <dbl> NA, NA, NA
#> $ mase_5_mean <dbl> NA, NA, NA
#> $ mase_5_sd <dbl> NA, NA, NA
#> $ mase_6_mean <dbl> NA, NA, NA
#> $ mase_6_sd <dbl> NA, NA, NA
#> $ mase_7_mean <dbl> NA, NA, NA
#> $ mase_7_sd <dbl> NA, NA, NA
#> $ mase_8_mean <dbl> NA, NA, NA
#> $ mase_8_sd <dbl> NA, NA, NA
#> $ mase_9_mean <dbl> NA, NA, NA
#> $ mase_9_sd <dbl> NA, NA, NA
#> $ mase_10_mean <dbl> NA, NA, NA
#> $ mase_10_sd <dbl> NA, NA, NA
#> $ mase_11_mean <dbl> NA, NA, NA
#> $ mase_11_sd <dbl> NA, NA, NA
#> $ mase_12_mean <dbl> NA, NA, NA
#> $ mase_12_sd <dbl> NA, NA, NA
#> $ mase_13_mean <dbl> NA, NA, NA
#> $ mase_13_sd <dbl> NA, NA, NA
#> $ mase_14_mean <dbl> NA, NA, NA
#> $ mase_14_sd <dbl> NA, NA, NA
#> $ mase_15_mean <dbl> NA, NA, NA
#> $ mase_15_sd <dbl> NA, NA, NA
#> $ mase_16_mean <dbl> NA, NA, NA
#> $ mase_16_sd <dbl> NA, NA, NA
#> $ mase_17_mean <dbl> NA, NA, NA
#> $ mase_17_sd <dbl> NA, NA, NA
#> $ smape_mean <dbl> NA, NA, NA
#> $ smape_sd <dbl> NA, NA, NA
#> $ smape_1_mean <dbl> NA, NA, NA
#> $ smape_1_sd <dbl> NA, NA, NA
#> $ smape_2_mean <dbl> NA, NA, NA
#> $ smape_2_sd <dbl> NA, NA, NA
#> $ smape_3_mean <dbl> NA, NA, NA
#> $ smape_3_sd <dbl> NA, NA, NA
#> $ smape_4_mean <dbl> NA, NA, NA
#> $ smape_4_sd <dbl> NA, NA, NA
#> $ smape_5_mean <dbl> NA, NA, NA
#> $ smape_5_sd <dbl> NA, NA, NA
#> $ smape_6_mean <dbl> NA, NA, NA
#> $ smape_6_sd <dbl> NA, NA, NA
#> $ smape_7_mean <dbl> NA, NA, NA
#> $ smape_7_sd <dbl> NA, NA, NA
#> $ smape_8_mean <dbl> NA, NA, NA
#> $ smape_8_sd <dbl> NA, NA, NA
#> $ smape_9_mean <dbl> NA, NA, NA
#> $ smape_9_sd <dbl> NA, NA, NA
#> $ smape_10_mean <dbl> NA, NA, NA
#> $ smape_10_sd <dbl> NA, NA, NA
#> $ smape_11_mean <dbl> NA, NA, NA
#> $ smape_11_sd <dbl> NA, NA, NA
#> $ smape_12_mean <dbl> NA, NA, NA
#> $ smape_12_sd <dbl> NA, NA, NA
#> $ smape_13_mean <dbl> NA, NA, NA
#> $ smape_13_sd <dbl> NA, NA, NA
#> $ smape_14_mean <dbl> NA, NA, NA
#> $ smape_14_sd <dbl> NA, NA, NA
#> $ smape_15_mean <dbl> NA, NA, NA
#> $ smape_15_sd <dbl> NA, NA, NA
#> $ smape_16_mean <dbl> NA, NA, NA
#> $ smape_16_sd <dbl> NA, NA, NA
#> $ smape_17_mean <dbl> NA, NA, NA
#> $ smape_17_sd <dbl> NA, NA, NA
#> $ rmse_mean <dbl> NA, NA, NA
#> $ rmse_sd <dbl> NA, NA, NA
#> $ rmse_1_mean <dbl> NA, NA, NA
#> $ rmse_1_sd <dbl> NA, NA, NA
#> $ rmse_2_mean <dbl> NA, NA, NA
#> $ rmse_2_sd <dbl> NA, NA, NA
#> $ rmse_3_mean <dbl> NA, NA, NA
#> $ rmse_3_sd <dbl> NA, NA, NA
#> $ rmse_4_mean <dbl> NA, NA, NA
#> $ rmse_4_sd <dbl> NA, NA, NA
#> $ rmse_5_mean <dbl> NA, NA, NA
#> $ rmse_5_sd <dbl> NA, NA, NA
#> $ rmse_6_mean <dbl> NA, NA, NA
#> $ rmse_6_sd <dbl> NA, NA, NA
#> $ rmse_7_mean <dbl> NA, NA, NA
#> $ rmse_7_sd <dbl> NA, NA, NA
#> $ rmse_8_mean <dbl> NA, NA, NA
#> $ rmse_8_sd <dbl> NA, NA, NA
#> $ rmse_9_mean <dbl> NA, NA, NA
#> $ rmse_9_sd <dbl> NA, NA, NA
#> $ rmse_10_mean <dbl> NA, NA, NA
#> $ rmse_10_sd <dbl> NA, NA, NA
#> $ rmse_11_mean <dbl> NA, NA, NA
#> $ rmse_11_sd <dbl> NA, NA, NA
#> $ rmse_12_mean <dbl> NA, NA, NA
#> $ rmse_12_sd <dbl> NA, NA, NA
#> $ rmse_13_mean <dbl> NA, NA, NA
#> $ rmse_13_sd <dbl> NA, NA, NA
#> $ rmse_14_mean <dbl> NA, NA, NA
#> $ rmse_14_sd <dbl> NA, NA, NA
#> $ rmse_15_mean <dbl> NA, NA, NA
#> $ rmse_15_sd <dbl> NA, NA, NA
#> $ rmse_16_mean <dbl> NA, NA, NA
#> $ rmse_16_sd <dbl> NA, NA, NA
#> $ rmse_17_mean <dbl> NA, NA, NA
#> $ rmse_17_sd <dbl> NA, NA, NA
#> $ rsq_mean <dbl> NA, NA, NA
#> $ rsq_sd <dbl> NA, NA, NA
#> $ rsq_1_mean <dbl> NA, NA, NA
#> $ rsq_1_sd <dbl> NA, NA, NA
#> $ rsq_2_mean <dbl> NA, NA, NA
#> $ rsq_2_sd <dbl> NA, NA, NA
#> $ rsq_3_mean <dbl> NA, NA, NA
#> $ rsq_3_sd <dbl> NA, NA, NA
#> $ rsq_4_mean <dbl> NA, NA, NA
#> $ rsq_4_sd <dbl> NA, NA, NA
#> $ rsq_5_mean <dbl> NA, NA, NA
#> $ rsq_5_sd <dbl> NA, NA, NA
#> $ rsq_6_mean <dbl> NA, NA, NA
#> $ rsq_6_sd <dbl> NA, NA, NA
#> $ rsq_7_mean <dbl> NA, NA, NA
#> $ rsq_7_sd <dbl> NA, NA, NA
#> $ rsq_8_mean <dbl> NA, NA, NA
#> $ rsq_8_sd <dbl> NA, NA, NA
#> $ rsq_9_mean <dbl> NA, NA, NA
#> $ rsq_9_sd <dbl> NA, NA, NA
#> $ rsq_10_mean <dbl> NA, NA, NA
#> $ rsq_10_sd <dbl> NA, NA, NA
#> $ rsq_11_mean <dbl> NA, NA, NA
#> $ rsq_11_sd <dbl> NA, NA, NA
#> $ rsq_12_mean <dbl> NA, NA, NA
#> $ rsq_12_sd <dbl> NA, NA, NA
#> $ rsq_13_mean <dbl> NA, NA, NA
#> $ rsq_13_sd <dbl> NA, NA, NA
#> $ rsq_14_mean <dbl> NA, NA, NA
#> $ rsq_14_sd <dbl> NA, NA, NA
#> $ rsq_15_mean <dbl> NA, NA, NA
#> $ rsq_15_sd <dbl> NA, NA, NA
#> $ rsq_16_mean <dbl> NA, NA, NA
#> $ rsq_16_sd <dbl> NA, NA, NA
#> $ rsq_17_mean <dbl> NA, NA, NA
#> $ rsq_17_sd <dbl> NA, NA, NA
sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19041)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
#> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
#> [5] LC_TIME=Spanish_Spain.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] forcats_0.5.1 stringr_1.4.0
#> [3] dplyr_1.0.5 purrr_0.3.4
#> [5] readr_1.4.0 tidyr_1.1.3
#> [7] tibble_3.1.0 ggplot2_3.3.3
#> [9] tidyverse_1.3.0 modeltime.resample_0.1.0.9000
#> [11] modeltime_0.4.1.9000
#>
#> loaded via a namespace (and not attached):
#> [1] fs_1.5.0 xts_0.12.1 lubridate_1.7.10
#> [4] httr_1.4.2 DiceDesign_1.9 tools_4.0.3
#> [7] backports_1.2.1 utf8_1.1.4 R6_2.5.0
#> [10] rpart_4.1-15 DBI_1.1.0 colorspace_2.0-0
#> [13] yardstick_0.0.7 nnet_7.3-14 withr_2.4.1
#> [16] tidyselect_1.1.0 compiler_4.0.3 rvest_0.3.6
#> [19] cli_2.3.1 xml2_1.3.2 scales_1.1.1
#> [22] tune_0.1.3 digest_0.6.27 StanHeaders_2.21.0-7
#> [25] rmarkdown_2.7 pkgconfig_2.0.3 htmltools_0.5.1.1
#> [28] parallelly_1.23.0 lhs_1.1.1 dbplyr_2.0.0
#> [31] highr_0.8 readxl_1.3.1 rlang_0.4.10
#> [34] rstudioapi_0.13 generics_0.1.0 jsonlite_1.7.2
#> [37] zoo_1.8-9 magrittr_2.0.1 Matrix_1.2-18
#> [40] Rcpp_1.0.6 munsell_0.5.0 fansi_0.4.2
#> [43] GPfit_1.0-8 lifecycle_1.0.0 furrr_0.2.2
#> [46] stringi_1.5.3 pROC_1.17.0.1 yaml_2.2.1
#> [49] MASS_7.3-53 plyr_1.8.6 recipes_0.1.15
#> [52] grid_4.0.3 parallel_4.0.3 listenv_0.8.0
#> [55] crayon_1.4.1 lattice_0.20-41 haven_2.3.1
#> [58] splines_4.0.3 hms_1.0.0 knitr_1.30
#> [61] ps_1.6.0 pillar_1.5.1 dials_0.0.9
#> [64] codetools_0.2-16 parsnip_0.1.5 timetk_2.6.1
#> [67] reprex_1.0.0 glue_1.4.2 evaluate_0.14
#> [70] rsample_0.0.9 modelr_0.1.8 RcppParallel_5.0.3
#> [73] vctrs_0.3.6 foreach_1.5.1 cellranger_1.1.0
#> [76] gtable_0.3.0 future_1.21.0 assertthat_0.2.1
#> [79] xfun_0.21 gower_0.2.2 prodlim_2019.11.13
#> [82] broom_0.7.2 class_7.3-17 survival_3.2-7
#> [85] timeDate_3043.102 iterators_1.0.13 hardhat_0.1.5
#> [88] lava_1.6.9 workflows_0.2.2 globals_0.14.0
#> [91] ellipsis_0.3.1 ipred_0.9-10
Created on 2021-03-13 by the reprex package (v1.0.0)
modeltime_resample_accuracy returns the raw output of the resample fits if summary_fns = NULL. However, the output is returned without .resample_id, which means that it is not possible to check the metrics for a specific resample id.
It would be valuable to have the .resample_id available in the case where modeltime_resample_accuracy(..., summary_fns = NULL)
I don't mind dropping a PR for this, but unclear when I'll have time.
Hi @mdancho84 ,
Now that breaking down the accuracy by local models has been implemented in modeltime, I think it would be nice to do the same for plot_modeltime_resamples() and modeltime_resample_accuracy() in this package. With panel data, the results presented are those of the global models, but it would be interesting to break them down by local model as well.
I don't know what you think about this; let's discuss.
Regards,
Dear @mdancho84 @AlbertoAlmuinha,
data_tbl <- datasets %>%
  select(id, Date, attendences, average_temperature, min, max, sunday, monday, tuesday, wednesday, thursday, friday, saturday, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec) %>%
  set_names(c("id", "date", "value", "tempe_verage", "tempemin", "tempemax", "sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
data_tbl
full_data_tbl <- datasets %>%
  select(id, Date, attendences, average_temperature, min, max, sunday, monday, tuesday, wednesday, thursday, friday, saturday, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec) %>%
  set_names(c("id", "date", "value", "tempe_verage", "tempemin", "tempemax", "sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")) %>%
  group_by(id) %>%
  future_frame(
    .date_var = date,
    .length_out = "3 days",
    .bind_data = TRUE
  ) %>%
  ungroup() %>%
  mutate(id = fct_drop(id))
data_prepared_tbl <- full_data_tbl %>%
  filter(!is.na(value))
future_tbl <- full_data_tbl %>%
  filter(is.na(value))
emergency_tscv <- data_prepared_tbl %>%
  time_series_cv(
    date_var = date,
    assess = "3 days",
    skip = "30 days",
    cumulative = TRUE,
    slice_limit = 5
  )
emergency_tscv
recipe_spec <- recipe(value ~ .,
                      data = training(emergency_tscv$splits[[1]])) %>%
  step_timeseries_signature(date) %>%
  step_rm(matches("(.iso$)|(.xts$)|(hour)|(minute)|(second)|(am.pm)")) %>%
  step_mutate(data = factor(value, ordered = TRUE)) %>%
  step_dummy(all_nominal(), one_hot = TRUE) %>%
  step_normalize(date_index.num, tempe_verage, tempemin, tempemax, date_year, -all_outcomes())
wflw_fit_xgboost <- workflow() %>%
  add_model(
    boost_tree("regression") %>% set_engine("xgboost")
  ) %>%
  add_recipe(recipe_spec %>% step_rm(date)) %>%
  fit(training(emergency_tscv$splits[[1]]))
wflw_fit_lightgbm <- workflow() %>%
  add_model(
    boost_tree("regression") %>% set_engine("lightgbm")
  ) %>%
  add_recipe(recipe_spec %>% step_rm(date)) %>%
  fit(training(emergency_tscv$splits[[1]]))
model_tbl <- modeltime_table(
  wflw_fit_xgboost,
  wflw_fit_lightgbm
)
model_tbl
resample_results <- model_tbl %>%
  modeltime_fit_resamples(
    resamples = emergency_tscv,
    control = control_resamples(allow_par = TRUE, verbose = TRUE)
  )
resample_results
resample_results %>%
  modeltime_resample_accuracy(summary_fns = NULL, yardstick::metric_set(mape, smape, mase, rmse)) %>%
  table_modeltime_accuracy(.interactive = FALSE)
Hi Matt,
Could you add k-fold and time series fold (ts-fold) resampling to nested forecasting? I am starting to use nested forecasting a lot, but I get poor results when the most recent couple of months (my test set) are a little off from the rest of the time series.
Thanks!
Shafi Qureshi:
Hello Matt Dancho, I am using 15 different models: LightGBM, CatBoost, XGBoost, and others. The problem is that RStudio crashes when I run all the models with:
resamples_fitted <- submodels_1_tbl %>%
  modeltime_fit_resamples(
    resamples = resample_spec,
    control = control_resamples(verbose = TRUE, allow_par = TRUE)
  )
So I am running the LightGBM, CatBoost, and XGBoost models in separate resamples_fitted calls, and I now have resamples_fitted_1, resamples_fitted_2, and resamples_fitted_3. I want to combine all three so that they can be used for the stacked ensemble. Is there a way to combine the different resample fits (resamples_fitted) together, like combine_modeltime_tables?
I was going to use an ensemble as a part of my time series cross validation. This did not work. Here is my code.
# Create average ensemble and add to the modeltime table
ml_mtbl <- ml_mtbl %>%
  combine_modeltime_tables(
    ml_mtbl %>%
      ensemble_average() %>%
      modeltime_table()
  )
# TS CV
resamples_tscv <- time_series_cv(
  data = train_data,
  assess = "11 days",
  initial = "730 days",
  skip = 11,
  slice_limit = 20,
  cumulative = TRUE
)
resamples_fitted <- ml_mtbl %>%
  modeltime_fit_resamples(
    resamples = resamples_tscv,
    control = control_resamples(verbose = FALSE, allow_par = TRUE)
  )
The results:
> resamples_fitted
# Modeltime Table
# A tibble: 6 x 4
.model_id .model .model_desc .resample_results
<int> <list> <chr> <list>
1 1 <workflow> XGBOOST <rsmp[+]>
2 2 <workflow> RANGER <rsmp[+]>
3 3 <workflow> GLMNET <rsmp[+]>
4 4 <workflow> KERNLAB <rsmp[+]>
5 5 <workflow> KERNLAB <rsmp[+]>
6 6 <ensemble [5]> ENSEMBLE (MEAN): 5 MODELS <lgl [1]>
So when I want to check the accuracy, I only get the accuracy for the individual models, not the ensemble. This code gives me a tibble with all five models, but not the ensemble.
resamples_fitted %>%
  modeltime_resample_accuracy()
I try to run the following code:
model_fit_exp_smooth <- exp_smoothing(
  mode = "regression",
  seasonal_period = 12,
  error = "additive",
  trend = "additive",
  season = "additive"
) %>%
  set_engine(engine = "smooth_es") %>%
  fit(value ~ date + moy + VIXRQBIN + DFFD, data = initial_training)
my_models_tbl <- modeltime_table(
  model_fit_exp_smooth,
  model_fit_lm
)
resamples_fitted <- my_models_tbl %>%
  modeltime_fit_resamples(
    resamples = resamples_tscv,
    control = control_resamples(allow_par = TRUE, parallel_over = NULL, verbose = T)
  )
── Fitting Resamples ────────────────────────────────────────────
• Model ID: 1 ETSX(AAA)
i Slice01: preprocessor 1/1
✓ Slice01: preprocessor 1/1
i Slice01: preprocessor 1/1, model 1/1
✓ Slice01: preprocessor 1/1, model 1/1
i Slice01: preprocessor 1/1, model 1/1 (extracts)
i Slice01: preprocessor 1/1, model 1/1 (predictions)
! Slice01: preprocessor 1/1, model 1/1 (predictions): The newdata is not provided.Predicting the explanatory variables based on what I have in-sample., Only additive model...
i Slice02: preprocessor 1/1
✓ Slice02: preprocessor 1/1
i Slice02: preprocessor 1/1, model 1/1
✓ Slice02: preprocessor 1/1, model 1/1
i Slice02: preprocessor 1/1, model 1/1 (extracts)
i Slice02: preprocessor 1/1, model 1/1 (predictions)
! Slice02: preprocessor 1/1, model 1/1 (predictions): The newdata is not provided.Predicting the explanatory variables based on what I have in-sample., Only additive model...
i Slice03: preprocessor 1/1
✓ Slice03: preprocessor 1/1
i Slice03: preprocessor 1/1, model 1/1
✓ Slice03: preprocessor 1/1, model 1/1
i Slice03: preprocessor 1/1, model 1/1 (extracts)
i Slice03: preprocessor 1/1, model 1/1 (predictions)
! Slice03: preprocessor 1/1, model 1/1 (predictions): The newdata is not provided.Predicting the explanatory variables based on what I have in-sample., Only additive model...
Does the warning "The newdata is not provided.Predicting the explanatory variables based on what I have in-sample., Only additive model..." mean that my external regressors for smooth_es are not used at all on modeltime_fit_resamples() or does it mean that the external regressors are taken from the in-sample data directly following the slice being processed?