business-science / modeltime

Modeltime unlocks time series forecast models and machine learning in one framework

Home Page: https://business-science.github.io/modeltime/

License: Other

R 99.08% CSS 0.92%
time time-series forecasting tidymodels machine-learning-algorithms machine-learning prophet arima ets tbats

modeltime's Introduction

modeltime

CRAN_Status_Badge Codecov test coverage R-CMD-check

Tidy time series forecasting in R.

Mission: Our number-one goal is to make high-performance time series analysis easier, faster, and more scalable. Modeltime solves this with a simple-to-use infrastructure for modeling and forecasting time series.

Quickstart Video

For those who prefer video tutorials, we have an 11-minute YouTube video that walks you through the Modeltime Workflow.

Introduction to Modeltime

(Click to Watch on YouTube)

Tutorials

Installation

CRAN version:

install.packages("modeltime", dependencies = TRUE)

Development version:

remotes::install_github("business-science/modeltime", dependencies = TRUE)

Why modeltime?

Modeltime unlocks time series models and machine learning in one framework

No need to switch back and forth between various frameworks. modeltime unlocks machine learning & classical time series analysis.

  • forecast: Use ARIMA, ETS, and more, with additional models on the way (arima_reg(), arima_boost(), & exp_smoothing()).
  • prophet: Use Facebook’s Prophet algorithm (prophet_reg() & prophet_boost()).
  • tidymodels: Use any parsnip model, such as rand_forest(), boost_tree(), linear_reg(), mars(), or svm_rbf(), to forecast. See the sketch below the list.
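For illustration, here is a minimal sketch of fitting a classical model and a plain parsnip model through the same formula interface. It assumes a tibble data_tbl with date and value columns; object names are illustrative, not from the package docs.

library(tidymodels)
library(modeltime)
library(timetk)
library(lubridate)

# Train/test split on the time series (assumed data_tbl with `date`, `value`)
splits <- time_series_split(data_tbl, assess = "3 months", cumulative = TRUE)

# Classical: Auto-ARIMA via arima_reg()
model_fit_arima <- arima_reg() %>%
    set_engine("auto_arima") %>%
    fit(value ~ date, data = training(splits))

# Prophet via prophet_reg()
model_fit_prophet <- prophet_reg() %>%
    set_engine("prophet") %>%
    fit(value ~ date, data = training(splits))

# Machine learning: any parsnip model, e.g. a linear regression on calendar features
model_fit_lm <- linear_reg() %>%
    set_engine("lm") %>%
    fit(value ~ as.numeric(date) + factor(month(date, label = TRUE), ordered = FALSE),
        data = training(splits))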

Forecast faster

A streamlined workflow for forecasting

Modeltime incorporates a streamlined workflow (see Getting Started with Modeltime) for applying forecasting best practices.
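A hedged sketch of that workflow, continuing from fitted models like those in the sketch above (object names are illustrative):

# 1. Collect fitted models in a Modeltime Table
models_tbl <- modeltime_table(
    model_fit_arima,
    model_fit_prophet,
    model_fit_lm
)

# 2. Calibrate on the hold-out set (computes residuals & confidence intervals)
calibration_tbl <- models_tbl %>%
    modeltime_calibrate(new_data = testing(splits))

# 3. Accuracy metrics on the test set
calibration_tbl %>% modeltime_accuracy()

# 4. Forecast the test set and visualize
calibration_tbl %>%
    modeltime_forecast(new_data = testing(splits), actual_data = data_tbl) %>%
    plot_modeltime_forecast()

# 5. Refit on the full dataset and forecast forward
calibration_tbl %>%
    modeltime_refit(data = data_tbl) %>%
    modeltime_forecast(h = "3 months", actual_data = data_tbl)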




Meet the modeltime ecosystem

Learn a growing ecosystem of forecasting packages

The modeltime ecosystem is growing


Modeltime is part of a growing ecosystem of Modeltime forecasting packages.

Summary

Modeltime is an amazing ecosystem for time series forecasting. But it can take a long time to learn:

  • Many algorithms
  • Ensembling and Resampling
  • Machine Learning
  • Deep Learning
  • Scalable Modeling: 10,000+ time series

You’re probably thinking, “How am I ever going to learn time series forecasting?” Here’s the solution that will save you years of struggling.

Take the High-Performance Forecasting Course

Become the forecasting expert for your organization

High-Performance Time Series Forecasting Course


Time Series is Changing

Time series is changing. Businesses now need 10,000+ time series forecasts every day. This is what I call a High-Performance Time Series Forecasting System (HPTSF) - Accurate, Robust, and Scalable Forecasting.

High-Performance Forecasting Systems will save companies by improving accuracy and scalability. Imagine what will happen to your career if you can provide your organization a “High-Performance Time Series Forecasting System” (HPTSF System).

How to Learn High-Performance Time Series Forecasting

I teach how to build an HPTSF System in my High-Performance Time Series Forecasting Course. You will learn:

  • Time Series Machine Learning (cutting-edge) with Modeltime - 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)
  • Deep Learning with GluonTS (Competition Winners)
  • Time Series Preprocessing, Noise Reduction, & Anomaly Detection
  • Feature engineering using lagged variables & external regressors
  • Hyperparameter Tuning
  • Time series cross-validation
  • Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
  • Scalable Forecasting - Forecast 1000+ time series in parallel
  • and more.

Become the Time Series Expert for your organization.


Take the High-Performance Time Series Forecasting Course

modeltime's People

Contributors

albertoalmuinha, davisvaughan, emilhvitfeldt, flrs, jorane, mdancho84, olivroy, regisely, steviey, tonyk7440, topepo


modeltime's Issues

Arima parameters

Hi there! First, I must say I'm loving modeltime! Really great workflow for forecasting! I have one question (kind of more philosophical) and one suggestion (if it makes sense).

  1. When training with auto.arima, I get a model ARIMA(1,2,3), for example. Then, when refitting, these parameters may change. Wouldn't it make more sense to keep the same parameters (in my example, 1,2,3) and recalculate just the coefficients? My rationale is that those are the parameters selected during training, the same way you select the best number of trees when using xgboost (and keep this number when refitting).
  2. As for my suggestion: in my team, we have found that it is interesting to fit multiple ARIMA models using different lengths of the time series as the train set, and select the best length. Using modeltime, I can easily do that with time_series_split, but, when refitting the model, the whole series is used. It would be interesting if modeltime_refit could use the same length of the time series that was used during training (or maybe it is already possible, and I just can't see how - I apologise if that's the case).

Thanks!

Dependency on Version of tidyselect

I installed {modeltime} and had {tidyselect} version 0.2.5.

Trying to use function plot_time_series() resulted in this error.

Error: 'eval_select' is not an exported object from 'namespace:tidyselect'

Upgrading to {tidyselect} 1.1.0 fixed the problem.

So perhaps the DESCRIPTION file should require some minimal version of {tidyselect}, somewhere in (0.2.5, 1.1.0].
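For example, a hedged sketch of what that DESCRIPTION constraint could look like (the exact lower bound is an assumption based on the report above):

Imports:
    tidyselect (>= 1.1.0)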

dials parameters

I'm working on another dials release. I was thinking of moving any dials parameter objects in ancillary packages into dials. Not required or anything, but having them in one place is nice from an organization standpoint.

Let me know if you want to move any of yours in there and I can add them (or you can PR).

modeltime_accuracy when test has just one point

When my test set has just one value, modeltime_accuracy doesn't return any metrics, although the predictions are made. Is there a restriction on using just one value for the test set? I know it doesn't make much sense, but I got curious.

Modeltime Model Calibration Failure - All Models Failed

Hi!..
I have an error in calibration step. The error message is:

> calibration_tbl %>%
+   modeltime::modeltime_calibrate(new_data = testing(splits), quiet = FALSE)

Error:

── Model Calibration Failure Report ────────────────────────
# A tibble: 1 x 4
  .model_id .model .model_desc .nested.col
1         1        LM          NA
All models failed Modeltime Calibration:
- Model 1: Failed Calibration.

Potential Solution: Use modeltime_calibrate(quiet = FALSE) AND check the Error/Warning messages for clues as to why your model(s) failed calibration.
── End Model Calibration Failure Report ────────────────────

Error: All models failed Modeltime Calibration.
Run rlang::last_error() to see where the error occurred.
In addition: Warning messages:
1: Problem with mutate() input .nested.col.
ℹ prediction from a rank-deficient fit may be misleading
ℹ Input .nested.col is purrr::map2(...).
2: In predict.lm(object = object$fit, newdata = new_data, type = "response") :
  prediction from a rank-deficient fit may be misleading
3: Problem with mutate() input .nested.col.
ℹ Could not reconcile actual data. To reconcile, please remove actual data from modeltime_forecast() and add manually using bind_rows().
ℹ Input .nested.col is purrr::map2(...).
4: Could not reconcile actual data. To reconcile, please remove actual data from modeltime_forecast() and add manually using bind_rows().

Archivo.zip

Thanks!

Error: No date or date-time variable provided. Please supply a date or date-time variable as a...

Hi Matt,

I am completely new to GitHub and I am trying to get an answer to the following problem I encountered.

I am trying to use fit_resamples() together with arima_reg() and I am getting an error message that I cannot understand.

I am using the following code:

library(modeltime)
library(tidymodels)

df <- data.frame(
  day   = seq(as.Date("2020-1-1"), as.Date("2020-3-31"), "days"),
  value = sample(1:100, length(seq(as.Date("2020-1-1"), as.Date("2020-3-31"), "days")))
)

splits <- sliding_window(df, lookback = Inf, skip = 6, assess_start = 7, assess_stop = 14, complete = T)

model_fit_arima <-
  arima_reg() %>%
  set_engine("auto_arima") %>%
  fit_resamples(value ~ day, data = df, resamples = splits)

I am getting the following error message:

x Slice01: model: Error: No date or date-time variable provided. Please supply a date or date-time variable as a...
x Slice02: model: Error: No date or date-time variable provided. Please supply a date or date-time variable as a...
x Slice03: model: Error: No date or date-time variable provided. Please supply a date or date-time variable as a...

The same code works with other models, i.e. this works:

linear_reg() %>% set_engine("lm")%>% fit_resamples(value~day, data=df, resamples=splits)

and the same data works with auto_arima without resampling, i.e. this works:

arima_reg() %>% set_engine("auto_arima")%>% fit(value~day,data=df)

I am not sure, why day is not accepted as date anymore, since:

class(analysis(splits$splits[[1]])$day)=="Date"
[1] TRUE

Many thanks in advance for your help and best wishes, Jan


> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8

attached base packages:
[1] stats4 grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] yardstick_0.0.7 workflows_0.2.1 tune_0.1.1 rsample_0.0.8
[5] recipes_0.1.14 parsnip_0.1.4 modeldata_0.1.0 infer_0.5.3
[9] dials_0.0.9 scales_1.1.1 tidymodels_0.1.1 modeltime_0.3.0
[13] RColorBrewer_1.1-2 skimr_2.1.2 forcats_0.5.0 purrr_0.3.4
[17] tibble_3.0.3 tidyverse_1.3.0 fable_0.2.1 fabletools_0.2.0
[21] tsibble_0.9.2 dummies_1.5.6 tseries_0.10-46 tscount_1.4.1
[25] magick_2.0 gganimate_1.0.2 lubridate_1.7.4 fpp2_2.3
[29] expsmooth_2.3 fma_2.3 forecast_8.5 readr_1.3.1
[33] data.table_1.12.8 haven_2.3.1 DMwR_0.4.1 xgboost_1.2.0.1
[37] circlize_0.4.5 plotROC_2.2.1 tableone_0.10.0 ggjoy_0.4.1
[41] ggridges_0.5.1 stringr_1.4.0 cowplot_0.9.4 ranger_0.11.1
[45] caret_6.0-81 pROC_1.16.2 rpart.plot_3.0.6 rpart_4.1-13
[49] party_1.3-1 strucchange_1.5-1 sandwich_2.5-0 zoo_1.8-4
[53] modeltools_0.2-22 mvtnorm_1.0-8 mctest_1.2 BSDA_1.2.0
[57] lattice_0.20-38 tidyr_1.1.0 plotrix_3.7-4 XML_3.98-1.17
[61] margins_0.3.23 broom_0.7.2 lme4_1.1-20 Matrix_1.2-15
[65] foreign_0.8-72 ggpubr_0.2 magrittr_1.5 reshape2_1.4.3
[69] ggplot2_3.3.2 readxl_1.3.1 dplyr_1.0.1 plyr_1.8.4

loaded via a namespace (and not attached):
[1] utf8_1.1.4 tidyselect_1.1.0 munsell_0.5.0 codetools_0.2-15
[5] future_1.11.1.1 withr_2.2.0 colorspace_1.4-1 knitr_1.29
[9] rstudioapi_0.11 ROCR_1.0-7 TTR_0.23-4 listenv_0.7.0
[13] repr_1.1.0 DiceDesign_1.8-1 farver_2.0.3 vctrs_0.3.2
[17] generics_0.0.2 TH.data_1.0-10 ipred_0.9-8 xfun_0.16
[21] R6_2.4.1 bitops_1.0-6 lhs_1.0.1 assertthat_0.2.1
[25] multcomp_1.4-8 nnet_7.3-12 gtable_0.3.0 globals_0.12.4
[29] timeDate_3043.102 rlang_0.4.7 GlobalOptions_0.1.0 splines_3.5.2
[33] ModelMetrics_1.2.2 yaml_2.2.1 prediction_0.3.6.2 abind_1.4-5
[37] modelr_0.1.6 backports_1.1.8 quantmod_0.4-13 tools_3.5.2
[41] lava_1.6.5 ltsa_1.4.6 ellipsis_0.3.1 gplots_3.0.1.1
[45] Rcpp_1.0.5 base64enc_0.1-3 progress_1.2.2 prettyunits_1.1.1
[49] slider_0.1.5 fracdiff_1.4-2 fs_1.5.0 survey_3.35-1
[53] furrr_0.1.0 warp_0.1.0 lmtest_0.9-36 reprex_0.3.0
[57] GPfit_1.0-8 hms_0.5.3 shape_1.4.4 compiler_3.5.2
[61] KernSmooth_2.23-15 crayon_1.3.4 minqa_1.2.4 StanHeaders_2.21.0-6
[65] htmltools_0.5.0 RcppParallel_5.0.2 DBI_1.1.0 tweenr_1.0.1
[69] dbplyr_1.4.2 MASS_7.3-51.1 cli_2.0.2 quadprog_1.5-5
[73] gdata_2.18.0 parallel_3.5.2 gower_0.1.2 pkgconfig_2.0.3
[77] coin_1.2-2 xml2_1.3.2 foreach_1.4.4 prodlim_2018.04.18
[81] anytime_0.3.7 rvest_0.3.5 distributional_0.2.0 digest_0.6.25
[85] cellranger_1.1.0 curl_4.3 gtools_3.8.1 urca_1.3-0
[89] nloptr_1.2.1 lifecycle_0.2.0 nlme_3.1-137 jsonlite_1.7.0
[93] fansi_0.4.1 pillar_1.4.6 httr_1.4.1 survival_2.43-3
[97] glue_1.4.1 xts_0.11-2 iterators_1.0.10 class_7.3-14
[101] stringi_1.4.6 caTools_1.17.1.1 e1071_1.7-0.1

Error: Problem with `mutate()` input `.is_null`. - New Factor Levels

Checking the accuracy table on the holdout test set using modeltime_accuracy() throws an error. Help!

library(modeltime)
library(tidymodels)
#> -- Attaching packages ---------------- tidymodels 0.1.0 --
#> v broom     0.7.0      v recipes   0.1.13
#> v dials     0.0.8      v rsample   0.0.7 
#> v dplyr     1.0.2      v tibble    3.0.3 
#> v ggplot2   3.3.2      v tune      0.1.1 
#> v infer     0.5.3      v workflows 0.2.1 
#> v parsnip   0.1.3      v yardstick 0.0.6 
#> v purrr     0.3.4
#> -- Conflicts ------------------- tidymodels_conflicts() --
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()
library(modeltime.ensemble)
library(timetk)
library(tidyverse)
library(recipes)


df.total_pr1 <- read_csv("data.csv")
#> Parsed with column specification:
#> cols(
#>   Week = col_date(format = ""),
#>   Product_Name = col_character(),
#>   Demand_value = col_double()
#> )

df.total_pr1 %>%
  plot_seasonal_diagnostics(
    Week,Demand_value,
    .feature_set = c("week", "month.lbl"),
    .interactive = TRUE
  )
#> Warning: `group_by_()` is deprecated as of dplyr 0.7.0.
#> Please use `group_by()` instead.
#> See vignette('programming') for more help
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.

splits <- df.total_pr1 %>%
  arrange(Week) %>% 
  time_series_split(date_var = Week,
                    assess = "40 weeks", cumulative = TRUE)

splits %>%
  tk_time_series_cv_plan() %>%
  plot_time_series_cv_plan(Week, Demand_value, .interactive = FALSE)

recipe_spec <- recipe(Demand_value ~ Week, df.total_pr1) %>%
  step_timeseries_signature(Week) %>%
  step_rm(matches("(iso$)|(xts$)|(day)|(hour)|(min)|(sec)|(am.pm)")) %>%
  step_mutate(Week_week = factor(Week_week, ordered = TRUE)) %>%
  step_dummy(all_nominal()) %>%
  step_normalize(contains("index.num"), Week_year)

recipe_spec %>% prep() %>% juice()
#> # A tibble: 52 x 35
#>    Week       Demand_value Week_index.num Week_year Week_half Week_quarter
#>    <date>            <dbl>          <dbl>     <dbl>     <int>        <int>
#>  1 2019-09-01        11773          0.792     0.550         2            3
#>  2 2019-10-01        13330          0.879     0.550         2            4
#>  3 2019-11-01         5743          0.969     0.550         2            4
#>  4 2019-12-01         6244          1.06      0.550         2            4
#>  5 2020-01-01         2560          1.15      1.57          1            1
#>  6 2020-03-01        24122          1.32      1.57          1            1
#>  7 2020-04-01         2980          1.41      1.57          1            2
#>  8 2020-05-01         1950          1.50      1.57          1            2
#>  9 2020-03-01          100          1.32      1.57          1            1
#> 10 2020-05-01          270          1.50      1.57          1            2
#> # ... with 42 more rows, and 29 more variables: Week_month <int>,
#> #   Week_mweek <int>, Week_week2 <int>, Week_week3 <int>, Week_week4 <int>,
#> #   Week_month.lbl_01 <dbl>, Week_month.lbl_02 <dbl>, Week_month.lbl_03 <dbl>,
#> #   Week_month.lbl_04 <dbl>, Week_month.lbl_05 <dbl>, Week_month.lbl_06 <dbl>,
#> #   Week_month.lbl_07 <dbl>, Week_month.lbl_08 <dbl>, Week_month.lbl_09 <dbl>,
#> #   Week_month.lbl_10 <dbl>, Week_month.lbl_11 <dbl>, Week_week_01 <dbl>,
#> #   Week_week_02 <dbl>, Week_week_03 <dbl>, Week_week_04 <dbl>,
#> #   Week_week_05 <dbl>, Week_week_06 <dbl>, Week_week_07 <dbl>,
#> #   Week_week_08 <dbl>, Week_week_09 <dbl>, Week_week_10 <dbl>,
#> #   Week_week_11 <dbl>, Week_week_12 <dbl>, Week_week_13 <dbl>

model_spec_glmnet <- linear_reg(penalty = 0.01, mixture = 0.5) %>%
  set_engine("glmnet")

#workflow
#elastic net
wflw_fit_glmnet <- workflow() %>%
  add_model(model_spec_glmnet) %>%
  add_recipe(recipe_spec %>% step_rm(Week)) %>%
  fit(training(splits))

#xgboost
model_spec_xgboost <- boost_tree() %>%
  set_engine("xgboost")

set.seed(123)
wflw_fit_xgboost <- workflow() %>%
  add_model(model_spec_xgboost) %>%
  add_recipe(recipe_spec %>% step_rm(Week)) %>%
  fit(training(splits))

#prophet
model_spec_prophet <- prophet_reg(
  seasonality_yearly = TRUE
) %>%
  set_engine("prophet") 

wflw_fit_prophet <- workflow() %>%
  add_model(model_spec_prophet) %>%
  add_recipe(recipe_spec) %>%
  fit(training(splits))
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.

submodels_tbl <- modeltime_table(
  wflw_fit_glmnet,
  wflw_fit_xgboost,
  wflw_fit_prophet
)

submodels_tbl
#> # Modeltime Table
#> # A tibble: 3 x 3
#>   .model_id .model     .model_desc          
#>       <int> <list>     <chr>                
#> 1         1 <workflow> GLMNET               
#> 2         2 <workflow> XGBOOST              
#> 3         3 <workflow> PROPHET W/ REGRESSORS

submodels_tbl %>% 
  modeltime_accuracy(testing(splits)) %>%
  table_modeltime_accuracy(.interactive = FALSE)
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i There are new levels in a factor: 14, 27
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: There are new levels in a factor: 14, 27
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i There are new levels in a factor: 14, 27
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: There are new levels in a factor: 14, 27
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i There are new levels in a factor: 14, 27
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: There are new levels in a factor: 14, 27
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i There are new levels in a factor: 14, 27
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: There are new levels in a factor: 14, 27
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i There are new levels in a factor: 14, 27
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: There are new levels in a factor: 14, 27
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i There are new levels in a factor: 14, 27
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: There are new levels in a factor: 14, 27
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i There are new levels in a factor: 14, 27
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: There are new levels in a factor: 14, 27
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i There are new levels in a factor: 14, 27
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: There are new levels in a factor: 14, 27
#> Error: Problem with `mutate()` input `.is_null`.
#> x object '.calibration_data' not found
#> i Input `.is_null` is `purrr::map_lgl(.calibration_data, is.null)`.

Created on 2020-10-19 by the reprex package (v0.3.0)

Warning messages: 1: Problem with `mutate()` input `.nested.col`. i prediction from a rank-deficient fit may be misleading i Input `.nested.col` is `purrr::map(...)`. 2: In predict.lm(object = object$fit, newdata = new_data, type = "response") : prediction from a rank-deficient fit may be misleading

Problem

You are getting this warning message when using linear regression from linear_reg() with lm via set_engine("lm").

Warning messages:
1: Problem with `mutate()` input `.nested.col`.
i prediction from a rank-deficient fit may be misleading
i Input `.nested.col` is `purrr::map(...)`. 
2: In predict.lm(object = object$fit, newdata = new_data, type = "response") :
  prediction from a rank-deficient fit may be misleading

Why the Warning?

You have a rank-deficient matrix, which isn't the end of the world. It's just a warning indicating your results could be misleading because you have a lot of features, some features may have zero variance, etc.

Solution 1 - Stick with LM & ignore the warning

This is just a warning message. The vctrs library attempts to locate where the warning is occurring. Internally, modeltime_calibrate() uses a temporary column called .nested.col that maps your models to calibration data. You get 2 warnings:

  1. The first warning message shows you where the warning occurs.
  2. The second shows you the call to predict.lm() that generates the warning message.

To get rid of this message, wrap your code in suppressWarnings().
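A minimal sketch of that, assuming a models_tbl and splits as in the usual workflow:

calibration_tbl <- suppressWarnings(
    models_tbl %>%
        modeltime_calibrate(new_data = testing(splits))
)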

Solution 2 - Use an algorithm that implements regularization

Algorithms like GLMNET & XGBoost implement regularized machine learning, which reduces or eliminates the effect of poor predictors.
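For example, a hedged sketch of swapping the lm engine for a regularized glmnet model (the penalty and mixture values are illustrative):

model_spec_glmnet <- linear_reg(penalty = 0.01, mixture = 0.5) %>%
    set_engine("glmnet")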

Lesson 11.11 - Model Inspection - Visualizing the Future Forecast

In lesson 11.11 I was trying to forecast my future data and I got the following error:

Error: Problem occurred combining processed data with timestamps. Most likely cause is rows being added or removed during preprocessing. Try imputing missing values to retain the full number of rows.

All my lag models were removed from my data. Here is my code:

forecast_future_tbl <- refit_tbl %>%
  modeltime_forecast(
    new_data    = forecast_tbl,
    actual_data = iniciativa_full1
  )

Where you read actual_data = iniciativa_full1, that is actual_data = data_prepared_tbl.
Here is my recipe for the lag models:

recipe_spec_2_lag <- recipe_spec_base %>%
  step_rm(data_planejada) %>%
  step_naomit(starts_with("lag_"))

I did not have any issues before this step; I could fit the lag models normally.

object '.key' not found

I get two top performing models from the modeltime_refit() function like so:

refit_tbl <- calibration_tbl %>%
  modeltime_refit(data = df_anomalized_tbl)

top_two_models <- refit_tbl %>% 
  modeltime_accuracy() %>% 
  arrange(mae) %>% 
  slice(1:2)

> refit_tbl %>%
+   filter(.model_id %in% top_two_models$.model_id)
# Modeltime Table
# A tibble: 2 x 5
  .model_id .model   .model_desc .type .calibration_data
      <int> <list>   <chr>       <chr> <list>           
1         4 <fit[+]> PROPHET     Test  <tibble [8 x 4]> 
2         6 <fit[+]> LM          Test  <tibble [8 x 4]> 

I then try to predict 1 year out like so

refit_tbl %>%
  filter(.model_id %in% top_two_models$.model_id) %>%
  modeltime_forecast(h = "1 year", actual_data = df_anomalized_tbl) %>%
  plot_modeltime_forecast(
    .legend_max_width = 25
    , .interactive = FALSE
    , .title = "IP Discharges Excess Days Forecast 1 Year Out"
  )

I get the following error:

> refit_tbl %>%
+   filter(.model_id %in% top_two_models$.model_id) %>%
+   modeltime_forecast(h = "1 year", actual_data = df_anomalized_tbl)
Error: Attempt to extend '.calibration_data' into the future using 'h' has failed.
Error: Attempt to extend '.calibration_data' into the future using 'h' has failed.
Error: Problem with `filter()` input `..1`.
x object '.key' not found
i Input `..1` is `.model_desc == "ACTUAL" | .key == "prediction"`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Unknown or uninitialised column: `.key`. 

Here is the output of rlang::last_error():

rlang::last_error()
<error/dplyr_error>
Problem with `filter()` input `..1`.
x object '.key' not found
i Input `..1` is `.model_desc == "ACTUAL" | .key == "prediction"`.
Backtrace:
  1. dplyr::filter(., .model_id %in% top_two_models$.model_id)
  2. modeltime::modeltime_forecast(., h = "1 year", actual_data = df_anomalized_tbl)
 25. dplyr:::h(simpleError(msg, call))
Run `rlang::last_trace()` to see the full context.

Here is the output of rlang::last_trace():

 rlang::last_trace()
<error/dplyr_error>
Problem with `filter()` input `..1`.
x object '.key' not found
i Input `..1` is `.model_desc == "ACTUAL" | .key == "prediction"`.
Backtrace:
     x
  1. +-`%>%`(...)
  2. | +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  3. | \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
  4. |   \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
  5. |     \-`_fseq`(`_lhs`)
  6. |       \-magrittr::freduce(value, `_function_list`)
  7. |         +-base::withVisible(function_list[[k]](value))
  8. |         \-function_list[[k]](value)
  9. |           +-modeltime::modeltime_forecast(., h = "1 year", actual_data = df_anomalized_tbl)
 10. |           \-modeltime:::modeltime_forecast.mdl_time_tbl(...)
 11. |             \-ret %>% dplyr::filter(.model_desc == "ACTUAL" | .key == "prediction")
 12. |               +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
 13. |               \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
 14. |                 \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
 15. |                   \-modeltime:::`_fseq`(`_lhs`)
 16. |                     \-magrittr::freduce(value, `_function_list`)
 17. |                       +-base::withVisible(function_list[[k]](value))
 18. |                       \-function_list[[k]](value)
 19. |                         +-dplyr::filter(., .model_desc == "ACTUAL" | .key == "prediction")
 20. |                         \-dplyr:::filter.data.frame(...)
 21. |                           \-dplyr:::filter_rows(.data, ...)
 22. |                             +-base::withCallingHandlers(...)
 23. |                             \-mask$eval_all_filter(dots, env_filter)
 24. \-base::.handleSimpleError(...)
 25.   \-dplyr:::h(simpleError(msg, call))

session info:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] anomalize_0.2.1            tidyquant_1.0.1            quantmod_0.4.17           
 [4] TTR_0.24.2                 PerformanceAnalytics_2.0.4 xts_0.12.1                
 [7] zoo_1.8-8                  janitor_2.0.1              DBI_1.1.0                 
[10] odbc_1.2.2                 timetk_2.4.0               lubridate_1.7.9           
[13] forcats_0.5.0              stringr_1.4.0              readr_1.4.0               
[16] tidyverse_1.3.0            modeltime_0.2.1            yardstick_0.0.7           
[19] workflows_0.2.1            tune_0.1.1                 tidyr_1.1.2               
[22] tibble_3.0.4               rsample_0.0.8              recipes_0.1.13            
[25] purrr_0.3.4                parsnip_0.1.3              modeldata_0.0.2           
[28] infer_0.5.3                ggplot2_3.3.2              dplyr_1.0.2               
[31] dials_0.0.9                scales_1.1.1               broom_0.7.1               
[34] tidymodels_0.1.1           pacman_0.5.1              

loaded via a namespace (and not attached):
  [1] readxl_1.3.1         backports_1.1.10     plyr_1.8.6           lazyeval_0.2.2      
  [5] splines_3.6.3        crosstalk_1.1.0.1    listenv_0.8.0        inline_0.3.16       
  [9] digest_0.6.25        foreach_1.5.1        htmltools_0.5.0      earth_5.3.0         
 [13] fansi_0.4.1          magrittr_1.5         globals_0.13.1       modelr_0.1.8        
 [17] gower_0.2.2          matrixStats_0.57.0   RcppParallel_5.0.2   hardhat_0.1.4       
 [21] prettyunits_1.1.1    forecast_8.13        tseries_0.10-47      colorspace_1.4-1    
 [25] blob_1.2.1           rvest_0.3.6          haven_2.3.1          callr_3.5.1         
 [29] crayon_1.3.4         jsonlite_1.7.1       progressr_0.6.0      survival_3.2-7      
 [33] iterators_1.0.13     glue_1.4.2           gtable_0.3.0         ipred_0.9-9         
 [37] V8_3.2.0             pkgbuild_1.1.0       Quandl_2.10.0        rstan_2.21.2        
 [41] Rcpp_1.0.5           plotrix_3.7-8        viridisLite_0.3.0    GPfit_1.0-8         
 [45] bit_4.0.4            Formula_1.2-3        stats4_3.6.3         tibbletime_0.1.6    
 [49] lava_1.6.8           StanHeaders_2.21.0-6 prodlim_2019.11.13   htmlwidgets_1.5.2   
 [53] httr_1.4.2           ellipsis_0.3.1       loo_2.3.1            pkgconfig_2.0.3     
 [57] farver_2.0.3         nnet_7.3-14          dbplyr_1.4.4         utf8_1.1.4          
 [61] tidyselect_1.1.0     labeling_0.3         rlang_0.4.8          DiceDesign_1.8-1    
 [65] reactR_0.4.3         TeachingDemos_2.12   munsell_0.5.0        cellranger_1.1.0    
 [69] tools_3.6.3          xgboost_1.2.0.1      cli_2.1.0            generics_0.0.2      
 [73] sweep_0.2.3          yaml_2.2.1           processx_3.4.4       bit64_4.0.5         
 [77] fs_1.5.0             future_1.19.1        nlme_3.1-149         reactable_0.2.3     
 [81] xml2_1.3.2           compiler_3.6.3       rstudioapi_0.11      plotly_4.9.2.1      
 [85] curl_4.3             reprex_0.3.0         lhs_1.1.1            stringi_1.5.3       
 [89] plotmo_3.6.0         ps_1.4.0             lattice_0.20-41      Matrix_1.2-18       
 [93] urca_1.3-0           vctrs_0.3.4          pillar_1.4.6         lifecycle_0.2.0     
 [97] furrr_0.2.0          lmtest_0.9-38        data.table_1.13.0    R6_2.4.1            
[101] gridExtra_2.3        codetools_0.2-16     MASS_7.3-53          assertthat_0.2.1    
[105] withr_2.3.0          fracdiff_1.5-1       parallel_3.6.3       hms_0.5.3           
[109] quadprog_1.5-8       grid_3.6.3           rpart_4.1-15         timeDate_3043.102   
[113] class_7.3-17         snakecase_0.11.0     prophet_0.6.1        pROC_1.16.2 

modeltime.rdb is corrupt

library(modeltime)
Error: package or namespace load failed for ‘modeltime’ in get(Info[i, 1], envir = env):
lazy-load database 'C:/R-4.0.2/library/modeltime/R/modeltime.rdb' is corrupt
In addition: Warning messages:
1: package ‘modeltime’ was built under R version 4.0.3
2: In get(Info[i, 1], envir = env) : internal error -3 in R_decompress1

Refit and Forecast 1 day ahead with exogenous regressors

Ubuntu: 16.4 LTS, R: 4.0.2, modeltime: 0.0.2

Thank you Matt for the marvelous code.

I can't find any documentation/information on how to refit and forecast 1 day ahead using exogenous regressors.
using exogenous regressors.

In the forecast package by Prof. Hyndman I would say:

arima.forecast <- forecast(arima.model,h=myH,xreg=newRegressors,biasadj=T)

When I say...

            unseenPredict<-calibration_table %>%
                modeltime_refit(model_table,data=allData) %>%
                #modeltime_forecast(h="3 months",actual_data=allData)# %>%
                modeltime_forecast(new_data=testing(splits),actual_data=allData)# %>%

... I get data by type actual and forecast until the last day contained in splits.
But I want to forecast one unseen day ahead, including exogenous regressors used while training.

In regard to "modeltime_forecast()" I read about a future tibble. But then the documentation PDF ends... "Forecasting Future Data: See future_frame() for creating future tibbles." "future_frame()" seems to be a dead link.

Are there any hints available- or a small code example?

Thank you.
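A possible sketch (not from the package docs) using timetk::future_frame() to build the one-day-ahead new_data, with a hypothetical exogenous regressor column filled in manually:

library(dplyr)
library(timetk)

# extend the index one day beyond the end of allData (assumed date column: `date`)
future_tbl <- allData %>%
    future_frame(.date_var = date, .length_out = "1 day") %>%
    mutate(temperature = 20)   # hypothetical known future value of an exogenous regressor

calibration_table %>%
    modeltime_refit(data = allData) %>%
    modeltime_forecast(new_data = future_tbl, actual_data = allData)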

Identical .model_desc for PROPHET

I have run two different PROPHET models like so:

# Prophet -----------------------------------------------------------------

model_fit_prophet <- prophet_reg() %>%
  set_engine(engine = "prophet") %>%
  fit(observed_cleaned ~ date_col, data = training(splits))

model_fit_prophet_boost <- prophet_boost(learn_rate = 0.1) %>% 
  set_engine("prophet_xgboost") %>%
  fit(observed_cleaned ~ date_col + as.numeric(date_col) + factor(hour(date_col), ordered = FALSE), data = training(splits))

When I add them to a modeltime_table() the model desc is the same, PROPHET

> models_tbl
# Modeltime Table
# A tibble: 7 x 3
  .model_id .model     .model_desc                              
      <int> <list>     <chr>                                    
1         1 <fit[+]>   ARIMA(1,1,2)(1,0,0)[12]                  
2         2 <fit[+]>   ARIMA(1,1,2)(1,0,0)[12] W/ XGBOOST ERRORS
3         3 <fit[+]>   ETS(M,AD,A)                              
4         4 <fit[+]>   PROPHET                                  
5         5 <fit[+]>   PROPHET                                  
6         6 <fit[+]>   LM                                       
7         7 <workflow> EARTH   

Would it be hard to implement naming the prophet_boost to something like PROPHET BOOST?

Recursive forecasting

Among the popular strategies of time series modelling, we can mention regression models with lagged variables. Such variables are often created by shifting our dependent variable(s). That makes us use special approaches to deal with lags in new data, such as recursive forecasting. As far as I know, there is currently no widely used R library which delivers that feature. Interestingly, @edgBR
in #5 names fable and modeltime as packages for recursive forecasting, which indicates he probably used some other sense of this notion.

I've written this issue because I started working on an add-on for the tidymodels/modeltime ecosystem which facilitates turning regular regression models into recursive ones. The proposed API may look as follows:

library(dplyr)
library(parsnip)
library(recipes)

dax_stock <- 
  as_tibble(EuStockMarkets) %>% 
  select(DAX) %>% 
  bind_rows(tibble(DAX = rep(NA, 30)))

recipe_dax_stock <-
  recipe(DAX ~ ., data = dax_stock) %>% 
  step_lag(all_outcomes(), lag = 1:5) %>% 
  prep()

model_linear <- 
  linear_reg() %>% 
  set_engine("lm") %>% 
  fit(DAX ~ ., data = dax_stock)

# Here, we add recursion to the model
# We pass recipe to re-generate new data after each step
# We get a model with additional class, say: 'recursive'
recursive_linear <- 
  model_linear %>% 
  recursive(recipe_dax_stock)

# predict.recursive, which internally calls predict.model_fit
recursive_linear %>% 
  predict(new_data)

Obviously, there are a couple of places where the implementation should be well thought out.
I can elaborate later if needed.

After this longish introduction, I would like to ask:
Would you be interested in including recursive forecasting in modeltime, or does it lie outside the scope of this great library?

Ability to Forecast without Calibrating/Refitting

In situations where accuracy & confidence intervals are not required, the ability to forecast without running through the additional calibration and refitting steps is desired.

Proposed Implementation:

The user will skip the calibration step and go straight from modeltime_table() to modeltime_forecast(). From there, the user can provide new_data or h along with actual_data, and the forecast will be produced without confidence intervals (this is the purpose of calibration).

modeltime_table(
    model_fit_prophet,
    model_fit_lm
) %>%
    modeltime_forecast(
        h = "3 years",
        actual_data = m750
    ) %>%
    plot_modeltime_forecast(.conf_interval_show = F)


Auto.arima fit function

I was inspecting your package today. In the first example, you use the auto.arima model:

# Model 1: auto_arima ----
model_fit_arima_no_boost <- arima_reg() %>%
    set_engine(engine = "auto_arima") %>%
    fit(value ~ date, data = training(splits))

I don't understand why you have a formula input in the fit function when the auto.arima function from the forecast package doesn't have a formula input. It has only a univariate series (y) argument. It's confusing for me to understand how the formula is converted to the main function's arguments.

namespace ‘vctrs’ 0.2.4 is already loaded, but >= 0.3.0 is required

library(tidymodels)
Error: package or namespace load failed for ‘tidymodels’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
namespace ‘rlang’ 0.4.6 is being loaded, but >= 0.4.7 is required
library(modeltime)
Error: package or namespace load failed for ‘modeltime’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
namespace ‘vctrs’ 0.2.4 is already loaded, but >= 0.3.0 is required
library(tidyverse)
Error: package or namespace load failed for ‘tidyverse’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
namespace ‘vctrs’ 0.2.4 is already loaded, but >= 0.3.0 is required
library(lubridate)

Attaching package: ‘lubridate’

The following objects are masked from ‘package:base’:

date, intersect, setdiff, union

library(timetk)
Error: package or namespace load failed for ‘timetk’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
namespace ‘dplyr’ 0.8.5 is already loaded, but >= 1.0.0 is required

Error: Tuning nnetar_reg() hyperparameter num_networks

Hello Matt,

thanks again for your fast implementation of NNETAR. I tried it out today and came across the following error after trying to tune the num_networks hyperparameter using the following code:

tune_nnetar_model <-
  nnetar_reg(
    seasonal_period = 12,
    non_seasonal_ar = 1,
    seasonal_ar = 1,
    hidden_units = tune(),
    num_networks = tune(),
    penalty = tune(),
    epochs = tune()
  ) %>%
  set_engine("nnetar") %>%
  set_mode("regression")

Tuning

n_levels <- 1
tune_grid <- grid_regular(
  hidden_units(),
  num_networks(),
  penalty(),
  epochs(),
  levels = n_levels
)

Workflow

nnetar_workflow <- 
  workflow() %>% 
  add_model(tune_nnetar_model) %>% 
  add_recipe(steenkopies_recipe)

Modelling with resamples

nnetar_resampling <- 
  nnetar_workflow %>% 
  tune_grid(
    resamples = resampling_strategy_cv5fold,
    grid = tune_grid)
Error: Problem with `mutate()` input `object`. x Error when calling num_networks(): Error : 'num_networks' is not an exported object from 'namespace:dials' i Input `object` is `purrr::map(call_info, eval_call_info)`.

Sorry for asking if the cause should be obvious but I couldn't find anything related in the documentation.
I can add a reprex if needed.

Baselines for comparisons: NAIVE and Window Regression

Baseline Methods

An important way to compare cutting-edge performance is to use a baseline model. These are simple models that organizations are accustomed to using (e.g. a simple moving average or naive models). We can use these to showcase how much better high-performance methods like stacking, ensembling, and better algorithms (e.g. XGBoost) can do.

Comparison models

There are currently two types of comparison models:

  1. Window Regression (window_reg) - This can be used to showcase how a moving average, weighted average, or even simple seasonal models would perform.
  2. NAIVE Regression (naive_reg) - This can be used to perform NAIVE (pick the most recent observation) and Seasonal NAIVE (replicate the most recent seasonal sequence) forecasts; see the sketch below.
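A hedged sketch of fitting these baselines (the engine names and window_function option are assumptions based on the modeltime documentation; splits as in the usual workflow):

# NAIVE and Seasonal NAIVE baselines
model_fit_naive <- naive_reg() %>%
    set_engine("naive") %>%
    fit(value ~ date, data = training(splits))

model_fit_snaive <- naive_reg() %>%
    set_engine("snaive") %>%
    fit(value ~ date, data = training(splits))

# Window regression baseline, e.g. a 12-period moving average
model_fit_window <- window_reg(window_size = 12) %>%
    set_engine("window_function", window_function = mean) %>%
    fit(value ~ date, data = training(splits))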

Parameter Tuning

  • Determine if tune helpers need to be added to modeltime.
  • Parameter tuning vignette

Error: package or namespace load failed for ‘prophet’ in dyn.load(file, DLLpath = DLLpath, ...)

MacOS seems to have an issue (sometimes???) when loading prophet.

> library(prophet)
Loading required package: Rcpp
Loading required package: rlang
Error: package or namespace load failed for ‘prophet’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/prophet/libs/prophet.so':
  dlopen(/Library/Frameworks/R.framework/Versions/4.0/Resources/library/prophet/libs/prophet.so, 6): Library not loaded: @rpath/libtbb.dylib
  Referenced from: /Library/Frameworks/R.framework/Versions/4.0/Resources/library/prophet/libs/prophet.so
  Reason: image not found

Solution

The easiest fix is to load StanHeaders first.
This sets the C++ flags and libraries, which seems to resolve the missing libtbb.dylib path.
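In practice that just means loading StanHeaders before prophet:

# load StanHeaders first to set the C++ flags, then prophet
library(StanHeaders)
library(prophet)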

updates for upcoming parsnip

I'll be sending parsnip 0.1.2 to CRAN very soon and it has some differences in how to choose encodings from how modeltime currently does it.

Using arima_reg as an example, the model definition should use the new set_encoding() interface (rather than passing indicators directly). To make sure that parsnip (and, soon, workflows) makes no modifications to the predictor columns, use

set_encoding(
  model = "arima_reg",
  eng = "auto_arima",
  mode = "regression",
  options = list(
    predictor_indicators = "none",
    compute_intercept = FALSE,
    remove_intercept = FALSE
  )
)

The same type of declaration is required for each engine/model combination.

The current GH version of parsnip can be used for testing.

arima_boost strange prediction

I'm not sure if this is a bug or what. I followed the "Getting Started" post on Modeltime but on my own time series. I did not change the XGBoost parameters (min_n = 2 and learning_rate = 0.015). The prediction is totally off, see graph. Why is this happening, any idea?


Forecast NNETAR - Add support in Modeltime

Hi all,

as already posted here:
"I would like to build a parsnip model for a simple neural net as implemented in the nnetar() function from the forecast package."

According to the hints from @topepo and @mdancho84 I followed this example to build a model from nnetar using the following code:

library(tidymodels)
library(modeltime)

set_new_model("narx_neuralnet")
set_model_mode(model = "narx_neuralnet", mode = "regression")
set_model_engine(
  "narx_neuralnet",
  mode = "regression",
  eng = "nnetar"
)
set_dependency("narx_neuralnet", eng = "nnetar", pkg = "forecast")

show_model_info("narx_neuralnet")


set_model_arg(
  "narx_neuralnet",
  eng = "nnetar",
  parsnip = "hidden_units",
  original = "size",
  func = list(fun = "hidden_units", pkg = "dials"),
  has_submodel = FALSE
)

set_model_arg(
  "narx_neuralnet",
  eng = "nnetar",
  parsnip = "epochs",
  original = "repeats",
  func = list(fun = "epochs", pkg = "dials"),
  has_submodel = FALSE
)

set_model_arg(
  "narx_neuralnet",
  eng = "nnetar",
  parsnip = "nonseas_lags",
  original = "p",
  func = list(fun = "sample_size", pkg = "dials"),
  has_submodel = FALSE
)

set_model_arg(
  "narx_neuralnet",
  eng = "nnetar",
  parsnip = "seas_lags",
  original = "P",
  func = list(fun = "sample_size", pkg = "dials"),
  has_submodel = FALSE
)

#decay = L2 regularization, 
set_model_arg(
  "narx_neuralnet",
  eng = "nnetar",
  parsnip = "penalty",
  original = "decay",
  func = list(fun = "penalty", pkg = "dials"),
  has_submodel = FALSE
)


narx_neuralnet <-
  function(mode = "regression",
           hidden_units = NULL,
           epochs = NULL,
           nonseas_lags = NULL,
           seas_lags = NULL,
           penalty = NULL) {
    # Check for correct mode
    if (mode != "regression") {
      stop("`mode` should be 'regression'", call. = FALSE)
    }

    # Capture the arguments in quosures
    args <- list(
      hidden_units = rlang::enquo(hidden_units),
      epochs = rlang::enquo(epochs),
      nonseas_lags = rlang::enquo(nonseas_lags),
      seas_lags = rlang::enquo(seas_lags),
      penalty = rlang::enquo(penalty)
    )

    parsnip::new_model_spec(
      "narx_neuralnet",
      args     = args,
      eng_args = NULL,
      mode     = mode,
      method   = NULL,
      engine   = NULL
    )
  }



# Bridge function for fitting
bridge_nnetar_fit <- function(x, y, 
                              hidden_units = NULL, 
                              epochs = NULL,
                              nonseas_lags = NULL,
                              seas_lags = NULL,
                              penalty = NULL, 
                              ...) {
  
  outcome    <- y # Comes in as a vector
  predictors <- x # Comes in as a data.frame (dates and possible xregs)
  
  # 2. Predictors - Handle Dates 
  index_tbl <- modeltime::parse_index_from_data(predictors)
  idx_col   <- names(index_tbl)
  idx       <- timetk::tk_index(index_tbl)
  
  # 3. Predictors - Handle Xregs
  xreg_recipe <- create_xreg_recipe(predictors, prepare = TRUE)
  xreg_matrix <- juice_xreg_recipe(xreg_recipe, format = "matrix")
  
  # 4. Fitting
  model_1 <- forecast::nnetar(y = outcome, 
                              x = predictors, 
                              size = hidden_units,
                              repeats = epochs,
                              p = nonseas_lags,
                              P = seas_lags,
                              decay = penalty,
                              ...)
  
  # 5. New Modeltime Bridge
  new_modeltime_bridge(
    class  = "bridge_nnetar_fit",
    models = list(model_1 = model_1),
    data   = tibble::tibble(
      idx_col   := idx,
      .actual    = y,
      .fitted    = model_1$fitted,
      .residuals = model_1$residuals
    ),
    extras = list(xreg_recipe = xreg_recipe), # Can add xreg preprocessors here
    desc   = stringr::str_c("NARX Model: ", model_1$model$method)
  )
  
}


print.bridge_nnetar_fit <- function(x, ...) {
  
  model <- x$models$model_1$model
  
  cat(x$desc)
  cat("\n")
  print(model$call)
  cat("\n")
  print(
    tibble(
      aic    = model$aic,
      bic    = model$bic,
      aicc   = model$aicc,
      loglik = model$loglik,
      mse    = model$mse
    )
  )
  invisible(x)
}

When testing the bridge function bridge_nnetar_fit with

data_reprex <- beaver1 %>% 
  as_tibble() %>% 
  mutate(day = as.Date(day, origin = "2019-01-01")) %>% 
  mutate(time = str_pad(as.character(time), 4, pad = "0")) %>% 
  mutate(date = lubridate::ymd_hm(str_c(day, time)),
         .before = 1) %>% 
  select(-day, -time)

nnetar_test <- bridge_nnetar_fit(
  x = select(data_reprex, -temp),
  y = pull(data_reprex, temp),
  epochs = 100,
  penalty = 0.001,
  seas_lags = 1,
  nonseas_lags = 1,
  hidden_units = 5
)

(It's not the best dataset for this purpose, but it doesn't matter at this point)

It gives me the error message:

Error in is.constant(na.interp(x)) : 
  'list' object cannot be coerced to type 'double'

I couldn't figure out its cause. I would be happy if anyone could help me out with that.
Thanks a lot,
Max

add_formula(): Error: No date or date-time variable provided. Please supply a date or date-time variable as a predictor.

The workflow add_formula() interface does not seem to be transferring the date as an encoded feature.

library(tidymodels)
library(modeltime)
library(tidyverse)
library(lubridate)
library(timetk)

# Data
m750 <- m4_monthly %>% filter(id == "M750")

# Split Data 80/20
splits <- initial_time_split(m750, prop = 0.9)

# --- MODELS ---

model_spec_arima <- arima_reg() %>%
    set_engine(engine = "auto_arima") 

# --- WORKFLOW ---
workflow() %>% 
    add_model(model_spec_arima) %>% 
    add_formula(value ~ date) %>%  # <-- Error here
    fit(training(splits))

# > Error: No date or date-time variable provided. Please supply a date or date-time variable as a predictor.

Modeltime Ecosystem Roadmap: New Algorithms & Models

Modeltime Ecosystem Roadmap

The modeltime project roadmap tracks the overall development of the Modeltime Ecosystem of forecasting packages. Modeltime is a cutting-edge ecosystem for forecasting using strategies and best practices that won or placed highly in major forecasting competitions. We have a state-of-the-art Time Series Forecasting Course (DS4B 203-R) that teaches Machine Learning, Deep Learning, and Feature Engineering for Time Series. Take this course to become the forecasting expert for your organization.

Forecasting Approaches

  • Global Models: Panel data and global modeling was introduced in Modeltime >= 0.3.0 and Modeltime Ensemble >= 0.3.0. We now have a vignette covering Forecasting with Panel Data.
  • Iterative Forecasting: The sknifedatar package extends modeltime for iterative forecasting. We are also working on an iterative forecasting solution that can be tracked with Issue #122. Nested Forecasting has been implemented. Refer to the Nested Forecasting Tutorial.

Ensembles (Stacking, Averaging)

Please refer to modeltime.ensemble R package: https://business-science.github.io/modeltime.ensemble/

Forecast

These algorithms are provided in the base modeltime package. Refer to included models in the model list: https://business-science.github.io/modeltime/articles/modeltime-model-list.html

GluonTS (Deep Learning)

Please refer to modeltime.gluonts R package: https://github.com/business-science/modeltime.gluonts

Autoregressive (Recursive) Forecasting

Recursive forecasting is required when Lags of Features < Forecast Horizon. Please refer to the recursive() function that allows any model to be converted into a recursive prediction model: https://business-science.github.io/modeltime/reference/recursive.html
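A minimal sketch of the recursive pattern (assuming a tibble data_tbl with date and value; the lag transform and tail size are illustrative):

library(tidymodels)
library(modeltime)
library(timetk)

# transform that (re)generates lagged features; reapplied at each recursive step
lag_transformer <- function(data) {
    data %>% tk_augment_lags(value, .lags = 1:12)
}

train_tbl <- data_tbl %>% lag_transformer() %>% na.omit()

model_fit_lm_recursive <- linear_reg() %>%
    set_engine("lm") %>%
    fit(value ~ ., data = train_tbl) %>%
    recursive(
        transform  = lag_transformer,
        train_tail = tail(train_tbl, 12)   # rows needed to compute the first future lags
    )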

Resampling & Backtesting

Hyper Parameter Tuning & Parallel Processing

Global Baseline Models

Refer to baseline algorithms #37

H2O

This project is being managed via modeltime.h2o: https://github.com/business-science/modeltime.h2o

General Additive Models (GAMs)

  • Parsnip now incorporates GAMs

GARCH Models

Refer to garchmodels (Website, GitHub).

  • GARCH
  • RUGARCH
  • RMGARCH

Bayesian Models

Refer to bayesmodels (Website, GitHub)

  • Sarima: bayesmodels connects to the bayesforecast package.
  • Garch: bayesmodels connects to the bayesforecast package.
  • Random Walk (Naive): bayesmodels connects to the bayesforecast package.
  • State Space Model: bayesmodels connects to the bayesforecast and bsts packages.
  • Stochastic Volatility Model: bayesmodels connects to the bayesforecast package.
  • Generalized Additive Models (GAMS): bayesmodels connects to the brms package.
  • Adaptive Splines Surface: bayesmodels connects to the BASS package.
  • Exponential Smoothing: bayesmodels connects to the Rlgt package.

Boosted Models: CatBoost, LightGBM

Refer to boostime (GitHub).

  • ARIMA + Catboost
  • ARIMA + LightGBM
  • Prophet + Catboost
  • Prophet + LightGBM

Conformal Predictions

  • Investigate integration of probably for Conformal Regression and Confidence Intervals. Follow GH Issue #173

Vector Autoregression (VAR)

  • Investigate integration of VAR models. Follow GH Issue #77.

Hierarchical Time Series

Modeltime Targets

  • Develop a reproducible targets workflow for ARIMA, Prophet, and Exponential Smoothing. Incorporate reporting and error diagnostics. #109

Spark: Distributed Compute

Investigate the possibility of adding Spark backend to improve scalability of modeltime, specifically for nested (iterative) forecasting.

Facebook Neural Prophet

  • Create a Modeltime Neural Prophet Extension.

LinkedIn Greykite

Uber Orbit

Facebook Kats

Temporal Hierarchical Forecasting (THIEF)

  • Implement the thief package for ensembling using temporal hierarchical forecast (THIEF) aggregations. #117

Smooth

Investigate the smooth library (https://github.com/config-i1/smooth). The smooth library has several benefits, including:

  • ETS + XREGS: An es() function that implements Exponential Smoothing with handling of xregs. This can be a new engine in exp_smoothing()
  • A Seasonal ARIMA auto.ssarima() / ssarima() implemented in state space. These can be additional options in arima_reg().

These methods have shown promising results in comparison to Facebook Prophet.

Follow #119.

Merlion

Pytorch Forecasting

  • Investigate Pytorch forecasting #163

ets :: alternative model string

Ubuntu: 16.4 LTS, R: 4.0.2, modeltime: 0.0.2

An alternative model string for ets seems not to work...

model_spec_ets <- exp_smoothing() %>%
    set_engine("ets",model=as.character(params$model))

Error in forecast::ets(outcome, model = model_ets, damped = damping_ets,  : 
  formal argument "model" matched by multiple actual arguments
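One possible workaround (a sketch, not from the thread): specify the ETS components through the exp_smoothing() arguments rather than passing a model string to set_engine(), so the model argument isn't supplied twice. Argument names are taken from the modeltime documentation.

model_spec_ets <- exp_smoothing(
    error  = "additive",
    trend  = "additive",
    season = "additive"
) %>%
    set_engine("ets")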

argument names

Would it be possible to use more descriptive argument names? For example, maybe non_seasonal_term or similar for p. We try to avoid jargon with the names.

Since the argument names are similar, maybe optimize them for tab-complete too.

Prophet :: Logistic Growth - Modeltime Support with logistic_cap & logistic_floor

Ubuntu: 16.4 LTS, R: 4.0.2, modeltime: 0.0.2

If the growth parameter is set to logistic, the 'cap' parameter has to be set, e.g.:

https://facebook.github.io/prophet/docs/saturating_forecasts.html

It seems this can't be done in either set_engine() or prophet_reg().

Message & Traceback:
Error in setup_dataframe(m, history, initialize_scales = TRUE) :
Capacities must be supplied for logistic growth.

  1. ├─global::bAlgos(i, f, "testing") ~/R/daScript.R:32006:12
  2. │ └─global::standAloneAlgos(...) ~/R/daScript.R:31621:12
  3. │ └─%>%(...) ~/R/daScript.R:16790:20
  4. │ ├─base::withVisible(eval(quote(_fseq(_lhs)), env, env))
  5. │ └─base::eval(quote(_fseq(_lhs)), env, env)
  6. │ └─base::eval(quote(_fseq(_lhs)), env, env)
  7. │ └─_fseq(_lhs)
  8. │ └─magrittr::freduce(value, _function_list)
  9. │ ├─base::withVisible(function_list[k])
  10. │ └─function_list[k]
  11. │ ├─parsnip::fit(...)
  12. │ └─parsnip::fit.model_spec(...)
  13. │ └─parsnip:::form_xy(...)
  14. │ └─parsnip:::xy_xy(...)
  15. │ ├─base::system.time(...)
  16. │ └─parsnip:::eval_mod(...)
  17. │ └─rlang::eval_tidy(e, ...)
  18. ├─modeltime::prophet_fit_impl(...)
  19. │ └─prophet::fit.prophet(m, df)
  20. │ └─prophet:::setup_dataframe(m, history, initialize_scales = TRUE)
  21. │ └─base::stop("Capacities must be supplied for logistic growth.")
  22. └─(function () ...
  23. └─lobstr::cst() ~/R/daScript.R:8:25

Thank you.

indicators in fit()

I noticed this argument in the fit() method. We are working on a general way to control if indicators are produced in a model-specific way in parsnip (that would also be used by workflows). This is related to tidymodels/parsnip#290 and tidymodels/workflows#34

I'm in the process of implementing this and it probably won't be ready for about a month (so hopefully that wouldn't disqualify it). A prototype of the user-facing code is here.

Prophet Regressors

    # Add regressors
    xreg_nms <- names(xreg_tbl)
    if (length(xreg_nms) > 0) {
        for (nm in xreg_nms) {
            m <- prophet::add_regressor(m, name = nm, prior.scale = 10000)
        }
    }

The add_regressor function also allows one to specify if the regressor will be standardized TRUE/FALSE. Is it possible to include this option in the modeltime framework?
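For reference, a hedged sketch of what exposing that option could look like inside the bridge code above (prophet::add_regressor() accepts a standardize argument, which defaults to "auto"):

    # Add regressors, exposing the standardize option
    xreg_nms <- names(xreg_tbl)
    if (length(xreg_nms) > 0) {
        for (nm in xreg_nms) {
            m <- prophet::add_regressor(m, name = nm, prior.scale = 10000, standardize = "auto")
        }
    }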

Cannot set Prophet yearly.seasonality

suppressMessages(library(tidyverse))
suppressMessages(library(tidyquant))
suppressMessages(library(timetk))
suppressMessages(library(tidymodels))
suppressMessages(library(modeltime))
sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: CentOS Linux 7 (Core)
#>
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] modeltime_0.2.1 yardstick_0.0.7
#> [3] workflows_0.2.1 tune_0.1.1.9000
#> [5] rsample_0.0.8 recipes_0.1.13
#> [7] parsnip_0.1.3 modeldata_0.0.2
#> [9] infer_0.5.2 dials_0.0.9
#> [11] scales_1.1.1 broom_0.7.0
#> [13] tidymodels_0.1.1 timetk_2.3.0
#> [15] tidyquant_1.0.1 quantmod_0.4.17
#> [17] TTR_0.23-6 PerformanceAnalytics_2.0.4
#> [19] xts_0.12-0 zoo_1.8-8
#> [21] lubridate_1.7.9 forcats_0.5.0
#> [23] stringr_1.4.0 dplyr_1.0.2
#> [25] purrr_0.3.4 readr_1.3.1
#> [27] tidyr_1.1.2 tibble_3.0.4
#> [29] ggplot2_3.3.2 tidyverse_1.3.0
#>
#> loaded via a namespace (and not attached):
#> [1] colorspace_1.4-1 ellipsis_0.3.1 class_7.3-17
#> [4] fs_1.5.0 rstudioapi_0.11 listenv_0.8.0
#> [7] furrr_0.2.0 prodlim_2019.11.13 fansi_0.4.1
#> [10] xml2_1.3.2 codetools_0.2-16 splines_4.0.2
#> [13] knitr_1.29.4 jsonlite_1.7.1 pROC_1.16.2
#> [16] dbplyr_1.4.4 compiler_4.0.2 httr_1.4.2
#> [19] backports_1.1.10 assertthat_0.2.1 Matrix_1.2-18
#> [22] cli_2.1.0 htmltools_0.5.0 tools_4.0.2
#> [25] gtable_0.3.0 glue_1.4.2 Rcpp_1.0.5
#> [28] cellranger_1.1.0 DiceDesign_1.8-1 vctrs_0.3.4
#> [31] iterators_1.0.12 timeDate_3043.102 gower_0.2.2
#> [34] xfun_0.17 globals_0.13.1 rvest_0.3.5
#> [37] lifecycle_0.2.0 future_1.19.1 MASS_7.3-51.6
#> [40] ipred_0.9-9 hms_0.5.3 parallel_4.0.2
#> [43] yaml_2.2.1 curl_4.3 StanHeaders_2.21.0-6
#> [46] rpart_4.1-15 stringi_1.5.3 highr_0.8
#> [49] foreach_1.5.0 lhs_1.1.0 lava_1.6.8
#> [52] rlang_0.4.8 pkgconfig_2.0.3 evaluate_0.14
#> [55] lattice_0.20-41 tidyselect_1.1.0 plyr_1.8.6
#> [58] magrittr_1.5 R6_2.4.1 generics_0.0.2
#> [61] DBI_1.1.0 pillar_1.4.6 haven_2.3.1
#> [64] withr_2.3.0 survival_3.1-12 nnet_7.3-14
#> [67] modelr_0.1.8 crayon_1.3.4 Quandl_2.10.0
#> [70] rmarkdown_2.3 grid_4.0.2 readxl_1.3.1
#> [73] blob_1.2.1 reprex_0.3.0 digest_0.6.25
#> [76] RcppParallel_5.0.2 munsell_0.5.0 GPfit_1.0-8
#> [79] quadprog_1.5-8

bike_transactions_tbl <- bike_sharing_daily %>%
  select(dteday, cnt) %>%
  set_names(c("date", "value"))

splits <- bike_transactions_tbl %>%
  time_series_split(assess = "3 months", cumulative = TRUE)
#> Using date_var: date

Prophet

model_fit_prophet <- prophet_reg() %>%
  set_engine("prophet", yearly.seasonality = TRUE) %>%
  fit(value ~ date, training(splits))
#> Warning: The following arguments cannot be manually modified and were removed:
#> yearly.seasonality.
#> Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
Created on 2020-10-15 by the reprex package (v0.3.0)
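For what it's worth, yearly seasonality can be requested through the prophet_reg() arguments instead of set_engine(); a sketch based on prophet_reg(seasonality_yearly = TRUE), which appears elsewhere in these issues:

model_fit_prophet <- prophet_reg(seasonality_yearly = TRUE) %>%
  set_engine("prophet") %>%
  fit(value ~ date, training(splits))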

Getting Started with Modeltime: Some Notes

Thank you for the page walking through Modeltime. I found two minor items that you may consider changing in the walkthrough text.

  1. In the script m750 <- m4_monthly %>% filter(id == "M750") I received an error because filter tried to call stats::filter. That went away when I invoked dplyr::filter.
  2. The comment says split 80/20 but the code calls for a 90/10 split with the argument prop=0.9.

Modeltime with R 3.6.2 - not available

I am using R version 3.6.2 and getting the following error. When I look at the package on CRAN it says it is available for R (≥ 3.5.0). Are there known issues with 3.6.2?

Error installing R package: Could not install package with error: 1: package ‘Modeltime’ is not available (for R version 3.6.2)
