business-science / modeltime

Modeltime unlocks time series forecast models and machine learning in one framework

Home Page: https://business-science.github.io/modeltime/

License: Other

R 99.08% CSS 0.92%
time time-series forecasting tidymodels machine-learning-algorithms machine-learning prophet arima ets tbats

modeltime's Introduction

modeltime

CRAN_Status_Badge Codecov test coverage R-CMD-check

Tidy time series forecasting in R.

Mission: Our number-one goal is to make high-performance time series analysis easier, faster, and more scalable. Modeltime solves this with a simple-to-use infrastructure for modeling and forecasting time series.

Quickstart Video

For those who prefer video tutorials, we have an 11-minute YouTube video that walks you through the Modeltime Workflow.

Introduction to Modeltime

(Click to Watch on YouTube)

Tutorials

Installation

CRAN version:

install.packages("modeltime", dependencies = TRUE)

Development version:

remotes::install_github("business-science/modeltime", dependencies = TRUE)

Why modeltime?

Modeltime unlocks time series models and machine learning in one framework

No need to switch back and forth between various frameworks. modeltime unlocks machine learning & classical time series analysis.

  • forecast: Use ARIMA, ETS, and more, with additional models on the way (arima_reg(), arima_boost(), & exp_smoothing()).
  • prophet: Use Facebook’s Prophet algorithm (prophet_reg() & prophet_boost()).
  • tidymodels: Use any parsnip model, such as rand_forest(), boost_tree(), linear_reg(), mars(), or svm_rbf(), to forecast. See the sketch below the list.
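For illustration, here is a minimal sketch of fitting a classical model and a plain parsnip model through the same formula interface. It assumes a tibble data_tbl with date and value columns; object names are illustrative, not from the package docs.

library(tidymodels)
library(modeltime)
library(timetk)
library(lubridate)

# Train/test split on the time series (assumed data_tbl with `date`, `value`)
splits <- time_series_split(data_tbl, assess = "3 months", cumulative = TRUE)

# Classical: Auto-ARIMA via arima_reg()
model_fit_arima <- arima_reg() %>%
    set_engine("auto_arima") %>%
    fit(value ~ date, data = training(splits))

# Prophet via prophet_reg()
model_fit_prophet <- prophet_reg() %>%
    set_engine("prophet") %>%
    fit(value ~ date, data = training(splits))

# Machine learning: any parsnip model, e.g. a linear regression on calendar features
model_fit_lm <- linear_reg() %>%
    set_engine("lm") %>%
    fit(value ~ as.numeric(date) + factor(month(date, label = TRUE), ordered = FALSE),
        data = training(splits))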

Forecast faster

A streamlined workflow for forecasting

Modeltime incorporates a streamlined workflow (see Getting Started with Modeltime) for applying forecasting best practices.
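A hedged sketch of that workflow, continuing from fitted models like those in the sketch above (object names are illustrative):

# 1. Collect fitted models in a Modeltime Table
models_tbl <- modeltime_table(
    model_fit_arima,
    model_fit_prophet,
    model_fit_lm
)

# 2. Calibrate on the hold-out set (computes residuals & confidence intervals)
calibration_tbl <- models_tbl %>%
    modeltime_calibrate(new_data = testing(splits))

# 3. Accuracy metrics on the test set
calibration_tbl %>% modeltime_accuracy()

# 4. Forecast the test set and visualize
calibration_tbl %>%
    modeltime_forecast(new_data = testing(splits), actual_data = data_tbl) %>%
    plot_modeltime_forecast()

# 5. Refit on the full dataset and forecast forward
calibration_tbl %>%
    modeltime_refit(data = data_tbl) %>%
    modeltime_forecast(h = "3 months", actual_data = data_tbl)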




Meet the modeltime ecosystem

Learn a growing ecosystem of forecasting packages

The modeltime ecosystem is growing


Modeltime is part of a growing ecosystem of Modeltime forecasting packages.

Summary

Modeltime is an amazing ecosystem for time series forecasting. But it can take a long time to learn:

  • Many algorithms
  • Ensembling and Resampling
  • Machine Learning
  • Deep Learning
  • Scalable Modeling: 10,000+ time series

You’re probably thinking, “How am I ever going to learn time series forecasting?” Here’s the solution that will save you years of struggling.

Take the High-Performance Forecasting Course

Become the forecasting expert for your organization

High-Performance Time Series Forecasting Course


Time Series is Changing

Time series is changing. Businesses now need 10,000+ time series forecasts every day. This is what I call a High-Performance Time Series Forecasting System (HPTSF) - Accurate, Robust, and Scalable Forecasting.

High-Performance Forecasting Systems will save companies by improving accuracy and scalability. Imagine what will happen to your career if you can provide your organization a “High-Performance Time Series Forecasting System” (HPTSF System).

How to Learn High-Performance Time Series Forecasting

I teach how to build an HPTSF System in my High-Performance Time Series Forecasting Course. You will learn:

  • Time Series Machine Learning (cutting-edge) with Modeltime - 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)
  • Deep Learning with GluonTS (Competition Winners)
  • Time Series Preprocessing, Noise Reduction, & Anomaly Detection
  • Feature engineering using lagged variables & external regressors
  • Hyperparameter Tuning
  • Time series cross-validation
  • Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
  • Scalable Forecasting - Forecast 1000+ time series in parallel
  • and more.

Become the Time Series Expert for your organization.


Take the High-Performance Time Series Forecasting Course

modeltime's People

Contributors

albertoalmuinha, davisvaughan, emilhvitfeldt, flrs, jorane, mdancho84, olivroy, regisely, steviey, tonyk7440, topepo


modeltime's Issues

Arima parameters

Hi there! First, I must say I'm loving modeltime! Really great workflow for forecasting! I have one question (kind of more philosophical) and one suggestion (if it makes sense).

  1. When training with auto.arima, I get a model ARIMA(1,2,3), for example. Then, when refitting, these parameters may change. Wouldn't it make more sense to keep the same parameters (in my example, 1,2,3) and recalculate just the coefficients? My rationale is that those are the parameters selected during training, the same way you select the best number of trees when using xgboost (and keep this number when refitting).
  2. As for my suggestion: in my team, we have found that it is interesting to fit multiple ARIMA models using different lengths of the time series as the train set, and select the best length. Using modeltime, I can easily do that with time_series_split, but, when refitting the model, the whole series is used. It would be interesting if modeltime_refit could use the same length of the time series that was used during training (or maybe it is already possible, and I just can't see how - I apologise if that's the case).

Thanks!

Dependency on Version of tidyselect

I installed {modeltime} and had {tidyselect} version 0.2.5.

Trying to use function plot_time_series() resulted in this error.

Error: 'eval_select' is not an exported object from 'namespace:tidyselect'

Upgrading to {tidyselect} 1.1.0 fixed the problem.

So perhaps the DESCRIPTION file should require some minimal version of {tidyselect}, somewhere in (0.2.5, 1.1.0].
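For example, a hedged sketch of what that DESCRIPTION constraint could look like (the exact lower bound is an assumption based on the report above):

Imports:
    tidyselect (>= 1.1.0)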

dials parameters

I'm working on another dials release. I was thinking of moving any dials parameter objects in ancillary packages into dials. Not required or anything, but having them in one place is nice from an organization standpoint.

Let me know if you want to move any of yours in there and I can add them (or you can PR).

modeltime_accuracy when test has just one point

When my test set has just one value, modeltime_accuracy doesn't return any metrics, although the predictions are made. Is there a restriction on using just one value for the test set? I know it doesn't make much sense, but I got curious.

Modeltime Model Calibration Failure - All Models Failed

Hi!..
I have an error in calibration step. The error message is:

> calibration_tbl %>%
+   modeltime::modeltime_calibrate(new_data = testing(splits), quiet = FALSE)

Error:

── Model Calibration Failure Report ────────────────────────
# A tibble: 1 x 4
  .model_id .model .model_desc .nested.col
1         1        LM          NA
All models failed Modeltime Calibration:
- Model 1: Failed Calibration.

Potential Solution: Use modeltime_calibrate(quiet = FALSE) AND check the Error/Warning messages for clues as to why your model(s) failed calibration.
── End Model Calibration Failure Report ────────────────────

Error: All models failed Modeltime Calibration.
Run rlang::last_error() to see where the error occurred.
In addition: Warning messages:
1: Problem with mutate() input .nested.col.
ℹ prediction from a rank-deficient fit may be misleading
ℹ Input .nested.col is purrr::map2(...).
2: In predict.lm(object = object$fit, newdata = new_data, type = "response") :
  prediction from a rank-deficient fit may be misleading
3: Problem with mutate() input .nested.col.
ℹ Could not reconcile actual data. To reconcile, please remove actual data from modeltime_forecast() and add manually using bind_rows().
ℹ Input .nested.col is purrr::map2(...).
4: Could not reconcile actual data. To reconcile, please remove actual data from modeltime_forecast() and add manually using bind_rows().

Archivo.zip

Thanks!

Error: No date or date-time variable provided. Please supply a date or date-time variable as a...

Hi Matt,

I am completely new to GitHub and I am trying to get an answer to the following problem I encountered.

I am trying to use fit_resamples() together with arima_reg() and I am getting an error message that I cannot understand.

I am using the following code:

library(modeltime)
library(tidymodels)

df <- data.frame(
  day   = seq(as.Date("2020-1-1"), as.Date("2020-3-31"), "days"),
  value = sample(1:100, length(seq(as.Date("2020-1-1"), as.Date("2020-3-31"), "days")))
)

splits <- sliding_window(df, lookback = Inf, skip = 6, assess_start = 7, assess_stop = 14, complete = T)

model_fit_arima <-
  arima_reg() %>%
  set_engine("auto_arima") %>%
  fit_resamples(value ~ day, data = df, resamples = splits)

I am getting the following error message:

x Slice01: model: Error: No date or date-time variable provided. Please supply a date or date-time variable as a...
x Slice02: model: Error: No date or date-time variable provided. Please supply a date or date-time variable as a...
x Slice03: model: Error: No date or date-time variable provided. Please supply a date or date-time variable as a...

The same code works with other models, i.e. this works:

linear_reg() %>% set_engine("lm")%>% fit_resamples(value~day, data=df, resamples=splits)

and the same data works with auto_arima without resampling, i.e. this works:

arima_reg() %>% set_engine("auto_arima")%>% fit(value~day,data=df)

I am not sure, why day is not accepted as date anymore, since:

class(analysis(splits$splits[[1]])$day)=="Date"
[1] TRUE

Many thanks in advance for your help and best wishes, Jan


> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8

attached base packages:
[1] stats4 grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] yardstick_0.0.7 workflows_0.2.1 tune_0.1.1 rsample_0.0.8
[5] recipes_0.1.14 parsnip_0.1.4 modeldata_0.1.0 infer_0.5.3
[9] dials_0.0.9 scales_1.1.1 tidymodels_0.1.1 modeltime_0.3.0
[13] RColorBrewer_1.1-2 skimr_2.1.2 forcats_0.5.0 purrr_0.3.4
[17] tibble_3.0.3 tidyverse_1.3.0 fable_0.2.1 fabletools_0.2.0
[21] tsibble_0.9.2 dummies_1.5.6 tseries_0.10-46 tscount_1.4.1
[25] magick_2.0 gganimate_1.0.2 lubridate_1.7.4 fpp2_2.3
[29] expsmooth_2.3 fma_2.3 forecast_8.5 readr_1.3.1
[33] data.table_1.12.8 haven_2.3.1 DMwR_0.4.1 xgboost_1.2.0.1
[37] circlize_0.4.5 plotROC_2.2.1 tableone_0.10.0 ggjoy_0.4.1
[41] ggridges_0.5.1 stringr_1.4.0 cowplot_0.9.4 ranger_0.11.1
[45] caret_6.0-81 pROC_1.16.2 rpart.plot_3.0.6 rpart_4.1-13
[49] party_1.3-1 strucchange_1.5-1 sandwich_2.5-0 zoo_1.8-4
[53] modeltools_0.2-22 mvtnorm_1.0-8 mctest_1.2 BSDA_1.2.0
[57] lattice_0.20-38 tidyr_1.1.0 plotrix_3.7-4 XML_3.98-1.17
[61] margins_0.3.23 broom_0.7.2 lme4_1.1-20 Matrix_1.2-15
[65] foreign_0.8-72 ggpubr_0.2 magrittr_1.5 reshape2_1.4.3
[69] ggplot2_3.3.2 readxl_1.3.1 dplyr_1.0.1 plyr_1.8.4

loaded via a namespace (and not attached):
[1] utf8_1.1.4 tidyselect_1.1.0 munsell_0.5.0 codetools_0.2-15
[5] future_1.11.1.1 withr_2.2.0 colorspace_1.4-1 knitr_1.29
[9] rstudioapi_0.11 ROCR_1.0-7 TTR_0.23-4 listenv_0.7.0
[13] repr_1.1.0 DiceDesign_1.8-1 farver_2.0.3 vctrs_0.3.2
[17] generics_0.0.2 TH.data_1.0-10 ipred_0.9-8 xfun_0.16
[21] R6_2.4.1 bitops_1.0-6 lhs_1.0.1 assertthat_0.2.1
[25] multcomp_1.4-8 nnet_7.3-12 gtable_0.3.0 globals_0.12.4
[29] timeDate_3043.102 rlang_0.4.7 GlobalOptions_0.1.0 splines_3.5.2
[33] ModelMetrics_1.2.2 yaml_2.2.1 prediction_0.3.6.2 abind_1.4-5
[37] modelr_0.1.6 backports_1.1.8 quantmod_0.4-13 tools_3.5.2
[41] lava_1.6.5 ltsa_1.4.6 ellipsis_0.3.1 gplots_3.0.1.1
[45] Rcpp_1.0.5 base64enc_0.1-3 progress_1.2.2 prettyunits_1.1.1
[49] slider_0.1.5 fracdiff_1.4-2 fs_1.5.0 survey_3.35-1
[53] furrr_0.1.0 warp_0.1.0 lmtest_0.9-36 reprex_0.3.0
[57] GPfit_1.0-8 hms_0.5.3 shape_1.4.4 compiler_3.5.2
[61] KernSmooth_2.23-15 crayon_1.3.4 minqa_1.2.4 StanHeaders_2.21.0-6
[65] htmltools_0.5.0 RcppParallel_5.0.2 DBI_1.1.0 tweenr_1.0.1
[69] dbplyr_1.4.2 MASS_7.3-51.1 cli_2.0.2 quadprog_1.5-5
[73] gdata_2.18.0 parallel_3.5.2 gower_0.1.2 pkgconfig_2.0.3
[77] coin_1.2-2 xml2_1.3.2 foreach_1.4.4 prodlim_2018.04.18
[81] anytime_0.3.7 rvest_0.3.5 distributional_0.2.0 digest_0.6.25
[85] cellranger_1.1.0 curl_4.3 gtools_3.8.1 urca_1.3-0
[89] nloptr_1.2.1 lifecycle_0.2.0 nlme_3.1-137 jsonlite_1.7.0
[93] fansi_0.4.1 pillar_1.4.6 httr_1.4.1 survival_2.43-3
[97] glue_1.4.1 xts_0.11-2 iterators_1.0.10 class_7.3-14
[101] stringi_1.4.6 caTools_1.17.1.1 e1071_1.7-0.1

Error: Problem with `mutate()` input `.is_null`. - New Factor Levels

Checking the accuracy table on the holdout test set using modeltime_accuracy() throws an error. Help!

library(modeltime)
library(tidymodels)
#> -- Attaching packages ---------------- tidymodels 0.1.0 --
#> v broom     0.7.0      v recipes   0.1.13
#> v dials     0.0.8      v rsample   0.0.7 
#> v dplyr     1.0.2      v tibble    3.0.3 
#> v ggplot2   3.3.2      v tune      0.1.1 
#> v infer     0.5.3      v workflows 0.2.1 
#> v parsnip   0.1.3      v yardstick 0.0.6 
#> v purrr     0.3.4
#> -- Conflicts ------------------- tidymodels_conflicts() --
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()
library(modeltime.ensemble)
library(timetk)
library(tidyverse)
library(recipes)


df.total_pr1 <- read_csv("data.csv")
#> Parsed with column specification:
#> cols(
#>   Week = col_date(format = ""),
#>   Product_Name = col_character(),
#>   Demand_value = col_double()
#> )

df.total_pr1 %>%
  plot_seasonal_diagnostics(
    Week,Demand_value,
    .feature_set = c("week", "month.lbl"),
    .interactive = TRUE
  )
#> Warning: `group_by_()` is deprecated as of dplyr 0.7.0.
#> Please use `group_by()` instead.
#> See vignette('programming') for more help
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_warnings()` to see where this warning was generated.

splits <- df.total_pr1 %>%
  arrange(Week) %>% 
  time_series_split(date_var = Week,
                    assess = "40 weeks", cumulative = TRUE)

splits %>%
  tk_time_series_cv_plan() %>%
  plot_time_series_cv_plan(Week, Demand_value, .interactive = FALSE)

recipe_spec <- recipe(Demand_value ~ Week, df.total_pr1) %>%
  step_timeseries_signature(Week) %>%
  step_rm(matches("(iso$)|(xts$)|(day)|(hour)|(min)|(sec)|(am.pm)")) %>%
  step_mutate(Week_week = factor(Week_week, ordered = TRUE)) %>%
  step_dummy(all_nominal()) %>%
  step_normalize(contains("index.num"), Week_year)

recipe_spec %>% prep() %>% juice()
#> # A tibble: 52 x 35
#>    Week       Demand_value Week_index.num Week_year Week_half Week_quarter
#>    <date>            <dbl>          <dbl>     <dbl>     <int>        <int>
#>  1 2019-09-01        11773          0.792     0.550         2            3
#>  2 2019-10-01        13330          0.879     0.550         2            4
#>  3 2019-11-01         5743          0.969     0.550         2            4
#>  4 2019-12-01         6244          1.06      0.550         2            4
#>  5 2020-01-01         2560          1.15      1.57          1            1
#>  6 2020-03-01        24122          1.32      1.57          1            1
#>  7 2020-04-01         2980          1.41      1.57          1            2
#>  8 2020-05-01         1950          1.50      1.57          1            2
#>  9 2020-03-01          100          1.32      1.57          1            1
#> 10 2020-05-01          270          1.50      1.57          1            2
#> # ... with 42 more rows, and 29 more variables: Week_month <int>,
#> #   Week_mweek <int>, Week_week2 <int>, Week_week3 <int>, Week_week4 <int>,
#> #   Week_month.lbl_01 <dbl>, Week_month.lbl_02 <dbl>, Week_month.lbl_03 <dbl>,
#> #   Week_month.lbl_04 <dbl>, Week_month.lbl_05 <dbl>, Week_month.lbl_06 <dbl>,
#> #   Week_month.lbl_07 <dbl>, Week_month.lbl_08 <dbl>, Week_month.lbl_09 <dbl>,
#> #   Week_month.lbl_10 <dbl>, Week_month.lbl_11 <dbl>, Week_week_01 <dbl>,
#> #   Week_week_02 <dbl>, Week_week_03 <dbl>, Week_week_04 <dbl>,
#> #   Week_week_05 <dbl>, Week_week_06 <dbl>, Week_week_07 <dbl>,
#> #   Week_week_08 <dbl>, Week_week_09 <dbl>, Week_week_10 <dbl>,
#> #   Week_week_11 <dbl>, Week_week_12 <dbl>, Week_week_13 <dbl>

model_spec_glmnet <- linear_reg(penalty = 0.01, mixture = 0.5) %>%
  set_engine("glmnet")

#workflow
#elastic net
wflw_fit_glmnet <- workflow() %>%
  add_model(model_spec_glmnet) %>%
  add_recipe(recipe_spec %>% step_rm(Week)) %>%
  fit(training(splits))

#xgboost
model_spec_xgboost <- boost_tree() %>%
  set_engine("xgboost")

set.seed(123)
wflw_fit_xgboost <- workflow() %>%
  add_model(model_spec_xgboost) %>%
  add_recipe(recipe_spec %>% step_rm(Week)) %>%
  fit(training(splits))

#prophet
model_spec_prophet <- prophet_reg(
  seasonality_yearly = TRUE
) %>%
  set_engine("prophet") 

wflw_fit_prophet <- workflow() %>%
  add_model(model_spec_prophet) %>%
  add_recipe(recipe_spec) %>%
  fit(training(splits))
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.

submodels_tbl <- modeltime_table(
  wflw_fit_glmnet,
  wflw_fit_xgboost,
  wflw_fit_prophet
)

submodels_tbl
#> # Modeltime Table
#> # A tibble: 3 x 3
#>   .model_id .model     .model_desc          
#>       <int> <list>     <chr>                
#> 1         1 <workflow> GLMNET               
#> 2         2 <workflow> XGBOOST              
#> 3         3 <workflow> PROPHET W/ REGRESSORS

submodels_tbl %>% 
  modeltime_accuracy(testing(splits)) %>%
  table_modeltime_accuracy(.interactive = FALSE)
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i There are new levels in a factor: 14, 27
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: There are new levels in a factor: 14, 27
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i There are new levels in a factor: 14, 27
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: There are new levels in a factor: 14, 27
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i There are new levels in a factor: 14, 27
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: There are new levels in a factor: 14, 27
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i There are new levels in a factor: 14, 27
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: There are new levels in a factor: 14, 27
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i There are new levels in a factor: 14, 27
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: There are new levels in a factor: 14, 27
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i There are new levels in a factor: 14, 27
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: There are new levels in a factor: 14, 27
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i There are new levels in a factor: 14, 27
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: There are new levels in a factor: 14, 27
#> Warning: Problem with `mutate()` input `.nested.col`.
#> i There are new levels in a factor: 14, 27
#> i Input `.nested.col` is `purrr::map(...)`.
#> Warning: There are new levels in a factor: 14, 27
#> Error: Problem with `mutate()` input `.is_null`.
#> x object '.calibration_data' not found
#> i Input `.is_null` is `purrr::map_lgl(.calibration_data, is.null)`.

Created on 2020-10-19 by the reprex package (v0.3.0)

Warning messages: 1: Problem with `mutate()` input `.nested.col`. i prediction from a rank-deficient fit may be misleading i Input `.nested.col` is `purrr::map(...)`. 2: In predict.lm(object = object$fit, newdata = new_data, type = "response") : prediction from a rank-deficient fit may be misleading

Problem

You are getting this warning message when using linear regression from linear_reg() with lm via set_engine("lm").

Warning messages:
1: Problem with `mutate()` input `.nested.col`.
i prediction from a rank-deficient fit may be misleading
i Input `.nested.col` is `purrr::map(...)`. 
2: In predict.lm(object = object$fit, newdata = new_data, type = "response") :
  prediction from a rank-deficient fit may be misleading

Why the Warning?

You have a rank-deficient matrix, which isn't the end of the world. It's just a warning indicating your results could be misleading because you have a lot of features, some features may have zero variance, etc.

Solution 1 - Stick with LM & ignore the warning

This is just a warning message. The vctrs library attempts to locate where the warning is occurring. Internally, modeltime_calibrate() uses a temporary column called .nested.col that maps your models to calibration data. You get 2 warnings:

  1. The first warning message shows you where the warning occurs.
  2. The second shows you the call to predict.lm() that generates the warning message.

To get rid of this message, wrap your code in suppressWarnings().
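A minimal sketch of that, assuming a models_tbl and splits as in the usual workflow:

calibration_tbl <- suppressWarnings(
    models_tbl %>%
        modeltime_calibrate(new_data = testing(splits))
)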

Solution 2 - Use an algorithm that implements regularization

Algorithms like GLMNET & XGBoost implement regularized machine learning, which reduces or eliminates the effect of poor predictors.
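For example, a hedged sketch of swapping the lm engine for a regularized glmnet model (the penalty and mixture values are illustrative):

model_spec_glmnet <- linear_reg(penalty = 0.01, mixture = 0.5) %>%
    set_engine("glmnet")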

Lesson 11.11 - Model Inspection - Visualizing the Future Forecast

In lesson 11.11 I was trying to forecast my future data and I got the following error:

Error: Problem occurred combining processed data with timestamps. Most likely cause is rows being added or removed during preprocessing. Try imputing missing values to retain the full number of rows.

All my lag models were removed from my data. Here is my code:

forecast_future_tbl <- refit_tbl %>%
  modeltime_forecast(
    new_data    = forecast_tbl,
    actual_data = iniciativa_full1
  )

Where you read actual_data = iniciativa_full1, that is actual_data = data_prepared_tbl.
Here is my recipe for the lag models:

recipe_spec_2_lag <- recipe_spec_base %>%
  step_rm(data_planejada) %>%
  step_naomit(starts_with("lag_"))

I did not have any issues before this step; I could fit the lag models normally.

object '.key' not found

I get two top performing models from the modeltime_refit() function like so:

refit_tbl <- calibration_tbl %>%
  modeltime_refit(data = df_anomalized_tbl)

top_two_models <- refit_tbl %>% 
  modeltime_accuracy() %>% 
  arrange(mae) %>% 
  slice(1:2)

> refit_tbl %>%
+   filter(.model_id %in% top_two_models$.model_id)
# Modeltime Table
# A tibble: 2 x 5
  .model_id .model   .model_desc .type .calibration_data
      <int> <list>   <chr>       <chr> <list>           
1         4 <fit[+]> PROPHET     Test  <tibble [8 x 4]> 
2         6 <fit[+]> LM          Test  <tibble [8 x 4]> 

I then try to predict 1 year out like so

refit_tbl %>%
  filter(.model_id %in% top_two_models$.model_id) %>%
  modeltime_forecast(h = "1 year", actual_data = df_anomalized_tbl) %>%
  plot_modeltime_forecast(
    .legend_max_width = 25
    , .interactive = FALSE
    , .title = "IP Discharges Excess Days Forecast 1 Year Out"
  )

I get the following error:

> refit_tbl %>%
+   filter(.model_id %in% top_two_models$.model_id) %>%
+   modeltime_forecast(h = "1 year", actual_data = df_anomalized_tbl)
Error: Attempt to extend '.calibration_data' into the future using 'h' has failed.
Error: Attempt to extend '.calibration_data' into the future using 'h' has failed.
Error: Problem with `filter()` input `..1`.
x object '.key' not found
i Input `..1` is `.model_desc == "ACTUAL" | .key == "prediction"`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Unknown or uninitialised column: `.key`. 

Here is the output of rlang::last_error():

rlang::last_error()
<error/dplyr_error>
Problem with `filter()` input `..1`.
x object '.key' not found
i Input `..1` is `.model_desc == "ACTUAL" | .key == "prediction"`.
Backtrace:
  1. dplyr::filter(., .model_id %in% top_two_models$.model_id)
  2. modeltime::modeltime_forecast(., h = "1 year", actual_data = df_anomalized_tbl)
 25. dplyr:::h(simpleError(msg, call))
Run `rlang::last_trace()` to see the full context.

Here is the output of rlang::last_trace():

 rlang::last_trace()
<error/dplyr_error>
Problem with `filter()` input `..1`.
x object '.key' not found
i Input `..1` is `.model_desc == "ACTUAL" | .key == "prediction"`.
Backtrace:
     x
  1. +-`%>%`(...)
  2. | +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  3. | \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
  4. |   \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
  5. |     \-`_fseq`(`_lhs`)
  6. |       \-magrittr::freduce(value, `_function_list`)
  7. |         +-base::withVisible(function_list[[k]](value))
  8. |         \-function_list[[k]](value)
  9. |           +-modeltime::modeltime_forecast(., h = "1 year", actual_data = df_anomalized_tbl)
 10. |           \-modeltime:::modeltime_forecast.mdl_time_tbl(...)
 11. |             \-ret %>% dplyr::filter(.model_desc == "ACTUAL" | .key == "prediction")
 12. |               +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
 13. |               \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
 14. |                 \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
 15. |                   \-modeltime:::`_fseq`(`_lhs`)
 16. |                     \-magrittr::freduce(value, `_function_list`)
 17. |                       +-base::withVisible(function_list[[k]](value))
 18. |                       \-function_list[[k]](value)
 19. |                         +-dplyr::filter(., .model_desc == "ACTUAL" | .key == "prediction")
 20. |                         \-dplyr:::filter.data.frame(...)
 21. |                           \-dplyr:::filter_rows(.data, ...)
 22. |                             +-base::withCallingHandlers(...)
 23. |                             \-mask$eval_all_filter(dots, env_filter)
 24. \-base::.handleSimpleError(...)
 25.   \-dplyr:::h(simpleError(msg, call))

session info:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] anomalize_0.2.1            tidyquant_1.0.1            quantmod_0.4.17           
 [4] TTR_0.24.2                 PerformanceAnalytics_2.0.4 xts_0.12.1                
 [7] zoo_1.8-8                  janitor_2.0.1              DBI_1.1.0                 
[10] odbc_1.2.2                 timetk_2.4.0               lubridate_1.7.9           
[13] forcats_0.5.0              stringr_1.4.0              readr_1.4.0               
[16] tidyverse_1.3.0            modeltime_0.2.1            yardstick_0.0.7           
[19] workflows_0.2.1            tune_0.1.1                 tidyr_1.1.2               
[22] tibble_3.0.4               rsample_0.0.8              recipes_0.1.13            
[25] purrr_0.3.4                parsnip_0.1.3              modeldata_0.0.2           
[28] infer_0.5.3                ggplot2_3.3.2              dplyr_1.0.2               
[31] dials_0.0.9                scales_1.1.1               broom_0.7.1               
[34] tidymodels_0.1.1           pacman_0.5.1              

loaded via a namespace (and not attached):
  [1] readxl_1.3.1         backports_1.1.10     plyr_1.8.6           lazyeval_0.2.2      
  [5] splines_3.6.3        crosstalk_1.1.0.1    listenv_0.8.0        inline_0.3.16       
  [9] digest_0.6.25        foreach_1.5.1        htmltools_0.5.0      earth_5.3.0         
 [13] fansi_0.4.1          magrittr_1.5         globals_0.13.1       modelr_0.1.8        
 [17] gower_0.2.2          matrixStats_0.57.0   RcppParallel_5.0.2   hardhat_0.1.4       
 [21] prettyunits_1.1.1    forecast_8.13        tseries_0.10-47      colorspace_1.4-1    
 [25] blob_1.2.1           rvest_0.3.6          haven_2.3.1          callr_3.5.1         
 [29] crayon_1.3.4         jsonlite_1.7.1       progressr_0.6.0      survival_3.2-7      
 [33] iterators_1.0.13     glue_1.4.2           gtable_0.3.0         ipred_0.9-9         
 [37] V8_3.2.0             pkgbuild_1.1.0       Quandl_2.10.0        rstan_2.21.2        
 [41] Rcpp_1.0.5           plotrix_3.7-8        viridisLite_0.3.0    GPfit_1.0-8         
 [45] bit_4.0.4            Formula_1.2-3        stats4_3.6.3         tibbletime_0.1.6    
 [49] lava_1.6.8           StanHeaders_2.21.0-6 prodlim_2019.11.13   htmlwidgets_1.5.2   
 [53] httr_1.4.2           ellipsis_0.3.1       loo_2.3.1            pkgconfig_2.0.3     
 [57] farver_2.0.3         nnet_7.3-14          dbplyr_1.4.4         utf8_1.1.4          
 [61] tidyselect_1.1.0     labeling_0.3         rlang_0.4.8          DiceDesign_1.8-1    
 [65] reactR_0.4.3         TeachingDemos_2.12   munsell_0.5.0        cellranger_1.1.0    
 [69] tools_3.6.3          xgboost_1.2.0.1      cli_2.1.0            generics_0.0.2      
 [73] sweep_0.2.3          yaml_2.2.1           processx_3.4.4       bit64_4.0.5         
 [77] fs_1.5.0             future_1.19.1        nlme_3.1-149         reactable_0.2.3     
 [81] xml2_1.3.2           compiler_3.6.3       rstudioapi_0.11      plotly_4.9.2.1      
 [85] curl_4.3             reprex_0.3.0         lhs_1.1.1            stringi_1.5.3       
 [89] plotmo_3.6.0         ps_1.4.0             lattice_0.20-41      Matrix_1.2-18       
 [93] urca_1.3-0           vctrs_0.3.4          pillar_1.4.6         lifecycle_0.2.0     
 [97] furrr_0.2.0          lmtest_0.9-38        data.table_1.13.0    R6_2.4.1            
[101] gridExtra_2.3        codetools_0.2-16     MASS_7.3-53          assertthat_0.2.1    
[105] withr_2.3.0          fracdiff_1.5-1       parallel_3.6.3       hms_0.5.3           
[109] quadprog_1.5-8       grid_3.6.3           rpart_4.1-15         timeDate_3043.102   
[113] class_7.3-17         snakecase_0.11.0     prophet_0.6.1        pROC_1.16.2 

modeltime.rdb is corrupt

library(modeltime)
Error: package or namespace load failed for ‘modeltime’ in get(Info[i, 1], envir = env):
lazy-load database 'C:/R-4.0.2/library/modeltime/R/modeltime.rdb' is corrupt
In addition: Warning messages:
1: package ‘modeltime’ was built under R version 4.0.3
2: In get(Info[i, 1], envir = env) : internal error -3 in R_decompress1

Refit and Forecast 1 day ahead with exogenous regressors

Ubuntu: 16.4 LTS, R: 4.0.2, modeltime: 0.0.2

Thank you Matt for the marvelous code.

I can't find any documentation/information on how to refit and forecast 1 day ahead using exogenous regressors.
using exogenous regressors.

In the forecast package by Prof. Hyndman I would say:

arima.forecast <- forecast(arima.model,h=myH,xreg=newRegressors,biasadj=T)

When I say...

            unseenPredict<-calibration_table %>%
                modeltime_refit(model_table,data=allData) %>%
                #modeltime_forecast(h="3 months",actual_data=allData)# %>%
                modeltime_forecast(new_data=testing(splits),actual_data=allData)# %>%

... I get data by type actual and forecast until the last day contained in splits.
But I want to forecast one unseen day ahead, including exogenous regressors used while training.

In regard to "modeltime_forecast()" I read about a future tibble. But then the documentation PDF ends... "Forecasting Future Data: See future_frame() for creating future tibbles." "future_frame()" seems to be a dead link.

Are there any hints available- or a small code example?

Thank you.
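A possible sketch (not from the package docs) using timetk::future_frame() to build the one-day-ahead new_data, with a hypothetical exogenous regressor column filled in manually:

library(dplyr)
library(timetk)

# extend the index one day beyond the end of allData (assumed date column: `date`)
future_tbl <- allData %>%
    future_frame(.date_var = date, .length_out = "1 day") %>%
    mutate(temperature = 20)   # hypothetical known future value of an exogenous regressor

calibration_table %>%
    modeltime_refit(data = allData) %>%
    modeltime_forecast(new_data = future_tbl, actual_data = allData)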

Identical .model_desc for PROPHET

I have run two different PROPHET models like so:

# Prophet -----------------------------------------------------------------

model_fit_prophet <- prophet_reg() %>%
  set_engine(engine = "prophet") %>%
  fit(observed_cleaned ~ date_col, data = training(splits))

model_fit_prophet_boost <- prophet_boost(learn_rate = 0.1) %>% 
  set_engine("prophet_xgboost") %>%
  fit(observed_cleaned ~ date_col + as.numeric(date_col) + factor(hour(date_col), ordered = FALSE), data = training(splits))

When I add them to a modeltime_table() the model desc is the same, PROPHET

> models_tbl
# Modeltime Table
# A tibble: 7 x 3
  .model_id .model     .model_desc                              
      <int> <list>     <chr>                                    
1         1 <fit[+]>   ARIMA(1,1,2)(1,0,0)[12]                  
2         2 <fit[+]>   ARIMA(1,1,2)(1,0,0)[12] W/ XGBOOST ERRORS
3         3 <fit[+]>   ETS(M,AD,A)                              
4         4 <fit[+]>   PROPHET                                  
5         5 <fit[+]>   PROPHET                                  
6         6 <fit[+]>   LM                                       
7         7 <workflow> EARTH   

Would it be hard to implement naming the prophet_boost to something like PROPHET BOOST?

Recursive forecasting

Among the popular strategies of time series modelling, we can mention regression models with lagged variables. Such variables are often created by shifting our dependent variable(s). That makes us use special approaches to deal with lags in new data, such as recursive forecasting. As far as I know, there is currently no widely used R library which delivers that feature. Interestingly, @edgBR
in #5 names fable and modeltime as packages for recursive forecasting, which indicates he probably used some other sense of this notion.

I've written this issue because I started working on an add-on for the tidymodels/modeltime ecosystem which facilitates turning regular regression models into recursive ones. The proposed API may look as follows:

library(dplyr)
library(parsnip)
library(recipes)

dax_stock <- 
  as_tibble(EuStockMarkets) %>% 
  select(DAX) %>% 
  bind_rows(tibble(DAX = rep(NA, 30)))

recipe_dax_stock <-
  recipe(DAX ~ ., data = dax_stock) %>% 
  step_lag(all_outcomes(), lag = 1:5) %>% 
  prep()

model_linear <- 
  linear_reg() %>% 
  set_engine("lm") %>% 
  fit(DAX ~ ., data = dax_stock)

# Here, we add recursion to the model
# We pass recipe to re-generate new data after each step
# We get a model with additional class, say: 'recursive'
recursive_linear <- 
  model_linear %>% 
  recursive(recipe_dax_stock)

# predict.recursive, which internally calls predict.model_fit
recursive_linear %>% 
  predict(new_data)

Obviously, there are a couple of places where the implementation should be well thought out.
I can elaborate later if needed.

After this longish introduction, I would like to ask:
Would you be interested in including recursive forecasting in modeltime, or does it lie outside the scope of this great library?

Ability to Forecast without Calibrating/Refitting

In situations where accuracy & confidence intervals are not required, the ability to forecast without running through the additional calibration and refitting steps is desired.

Proposed Implementation:

The user will skip the calibration step and go straight from modeltime_table() to modeltime_forecast(). From there, the user can provide new_data or h along with actual_data, and the forecast will be produced without confidence intervals (this is the purpose of calibration).

modeltime_table(
    model_fit_prophet,
    model_fit_lm
) %>%
    modeltime_forecast(
        h = "3 years",
        actual_data = m750
    ) %>%
    plot_modeltime_forecast(.conf_interval_show = F)


Auto.arima fit function

I was inspecting your package today. In the first example, you use the auto.arima model:

# Model 1: auto_arima ----
model_fit_arima_no_boost <- arima_reg() %>%
    set_engine(engine = "auto_arima") %>%
    fit(value ~ date, data = training(splits))

I don't understand why you have a formula input in the fit function when the auto.arima function from the forecast package doesn't have a formula input. It has only a univariate series (y) argument. It's confusing for me to understand how the formula is converted to the main function's arguments.

namespace ‘vctrs’ 0.2.4 is already loaded, but >= 0.3.0 is required

library(tidymodels)
Error: package or namespace load failed for ‘tidymodels’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
namespace ‘rlang’ 0.4.6 is being loaded, but >= 0.4.7 is required
library(modeltime)
Error: package or namespace load failed for ‘modeltime’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
namespace ‘vctrs’ 0.2.4 is already loaded, but >= 0.3.0 is required
library(tidyverse)
Error: package or namespace load failed for ‘tidyverse’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
namespace ‘vctrs’ 0.2.4 is already loaded, but >= 0.3.0 is required
library(lubridate)

Attaching package: ‘lubridate’

The following objects are masked from ‘package:base’:

date, intersect, setdiff, union

library(timetk)
Error: package or namespace load failed for ‘timetk’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
namespace ‘dplyr’ 0.8.5 is already loaded, but >= 1.0.0 is required

Error: Tuning nnetar_reg() hyperparameter num_networks

Hello Matt,

thanks again for your fast implementation of NNETAR. I tried it out today and came across the following error after trying to tune the num_networks hyperparameter using the following code:

tune_nnetar_model <-
  nnetar_reg(
    seasonal_period = 12,
    non_seasonal_ar = 1,
    seasonal_ar = 1,
    hidden_units = tune(),
    num_networks = tune(),
    penalty = tune(),
    epochs = tune()
  ) %>%
  set_engine("nnetar") %>%
  set_mode("regression")

Tuning

n_levels <- 1
tune_grid <- grid_regular(
  hidden_units(),
  num_networks(),
  penalty(),
  epochs(),
  levels = n_levels
)

Workflow

nnetar_workflow <- 
  workflow() %>% 
  add_model(tune_nnetar_model) %>% 
  add_recipe(steenkopies_recipe)

Modelling with resamples

nnetar_resampling <- 
  nnetar_workflow %>% 
  tune_grid(
    resamples = resampling_strategy_cv5fold,
    grid = tune_grid)
Error: Problem with `mutate()` input `object`. x Error when calling num_networks(): Error : 'num_networks' is not an exported object from 'namespace:dials' i Input `object` is `purrr::map(call_info, eval_call_info)`.

Sorry for asking if the cause should be obvious but I couldn't find anything related in the documentation.
I can add a reprex if needed.

Baselines for comparisons: NAIVE and Window Regression

Baseline Methods

An important way to compare cutting-edge performance is to use a baseline model. These are simple models that organizations are accustomed to using (e.g. a simple moving average or naive models). We can use these to showcase how much better high-performance methods like stacking, ensembling, and better algorithms (e.g. XGBoost) can do.

Comparison models

There are currently two types of comparison models:

  1. Window Regression (window_reg) - This can be used to showcase how a moving average, weighted average, or even simple seasonal models would perform.
  2. NAIVE Regression (naive_reg) - This can be used to perform NAIVE (pick the most recent observation) and Seasonal NAIVE (replicate the most recent seasonal sequence) forecasts; see the sketch below.
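A hedged sketch of fitting these baselines (the engine names and window_function option are assumptions based on the modeltime documentation; splits as in the usual workflow):

# NAIVE and Seasonal NAIVE baselines
model_fit_naive <- naive_reg() %>%
    set_engine("naive") %>%
    fit(value ~ date, data = training(splits))

model_fit_snaive <- naive_reg() %>%
    set_engine("snaive") %>%
    fit(value ~ date, data = training(splits))

# Window regression baseline, e.g. a 12-period moving average
model_fit_window <- window_reg(window_size = 12) %>%
    set_engine("window_function", window_function = mean) %>%
    fit(value ~ date, data = training(splits))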

Parameter Tuning

  • Determine if tune helpers need to be added to modeltime.
  • Parameter tuning vignette

Error: package or namespace load failed for ‘prophet’ in dyn.load(file, DLLpath = DLLpath, ...)

MacOS seems to have an issue (sometimes???) when loading prophet.

> library(prophet)
Loading required package: Rcpp
Loading required package: rlang
Error: package or namespace load failed for ‘prophet’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/prophet/libs/prophet.so':
  dlopen(/Library/Frameworks/R.framework/Versions/4.0/Resources/library/prophet/libs/prophet.so, 6): Library not loaded: @rpath/libtbb.dylib
  Referenced from: /Library/Frameworks/R.framework/Versions/4.0/Resources/library/prophet/libs/prophet.so
  Reason: image not found

Solution

The easiest fix is to load StanHeaders first.
This sets the C++ flags and libraries, which seems to resolve the missing libtbb.dylib path.
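In practice that just means loading StanHeaders before prophet:

# load StanHeaders first to set the C++ flags, then prophet
library(StanHeaders)
library(prophet)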

updates for upcoming parsnip

I'll be sending parsnip 0.1.2 to CRAN very soon and it has some differences in how to choose encodings from how modeltime currently does it.

Using arima_reg as an example, the model definition should use the new set_encoding() interface (rather than passing indicators directly). To make sure that parsnip (and, soon, workflows) makes no modifications to the predictor columns, use

set_encoding(
  model = "arima_reg",
  eng = "auto_arima",
  mode = "regression",
  options = list(
    predictor_indicators = "none",
    compute_intercept = FALSE,
    remove_intercept = FALSE
  )
)

The same type of declaration is required for each engine/model combination.

The current GH version of parsnip can be used for testing.

arima_boost strange prediction

I'm not sure if this is a bug or what. I followed the "Getting Started" post on Modeltime but on my own time series. I did not change the XGBoost parameters (min_n = 2 and learning_rate = 0.015). The prediction is totally off, see graph. Why is this happening, any idea?


Forecast NNETAR - Add support in Modeltime

Hi all,

as already posted here:
"I would like to build a parsnip model for a simple neural net as implemented in the nnetar() function from the forecast package."

According to the hints from @topepo and @mdancho84 I followed this example to build a model from nnetar using the following code:

library(tidymodels)
library(modeltime)

set_new_model("narx_neuralnet")
set_model_mode(model = "narx_neuralnet", mode = "regression")
set_model_engine(
  "narx_neuralnet",
  mode = "regression",
  eng = "nnetar"
)
set_dependency("narx_neuralnet", eng = "nnetar", pkg = "forecast")

show_model_info("narx_neuralnet")


set_model_arg(
  "narx_neuralnet",
  eng = "nnetar",
  parsnip = "hidden_units",
  original = "size",
  func = list(fun = "hidden_units", pkg = "dials"),
  has_submodel = FALSE
)

set_model_arg(
  "narx_neuralnet",
  eng = "nnetar",
  parsnip = "epochs",
  original = "repeats",
  func = list(fun = "epochs", pkg = "dials"),
  has_submodel = FALSE
)

set_model_arg(
  "narx_neuralnet",
  eng = "nnetar",
  parsnip = "nonseas_lags",
  original = "p",
  func = list(fun = "sample_size", pkg = "dials"),
  has_submodel = FALSE
)

set_model_arg(
  "narx_neuralnet",
  eng = "nnetar",
  parsnip = "seas_lags",
  original = "P",
  func = list(fun = "sample_size", pkg = "dials"),
  has_submodel = FALSE
)

#decay = L2 regularization, 
set_model_arg(
  "narx_neuralnet",
  eng = "nnetar",
  parsnip = "penalty",
  original = "decay",
  func = list(fun = "penalty", pkg = "dials"),
  has_submodel = FALSE
)


narx_neuralnet <-
  function(mode = "regression",
           hidden_units = NULL,
           epochs = NULL,
           nonseas_lags = NULL,
           seas_lags = NULL,
           penalty = NULL) {
    # Check for correct mode
    if (mode != "regression") {
      stop("`mode` should be 'regression'", call. = FALSE)
    }

    # Capture the arguments in quosures
    args <- list(
      hidden_units = rlang::enquo(hidden_units),
      epochs = rlang::enquo(epochs),
      nonseas_lags = rlang::enquo(nonseas_lags),
      seas_lags = rlang::enquo(seas_lags),
      penalty = rlang::enquo(penalty)
    )

    parsnip::new_model_spec(
      "narx_neuralnet",
      args     = args,
      eng_args = NULL,
      mode     = mode,
      method   = NULL,
      engine   = NULL
    )
  }



# Bridge function for fitting
bridge_nnetar_fit <- function(x, y, 
                              hidden_units = NULL, 
                              epochs = NULL,
                              nonseas_lags = NULL,
                              seas_lags = NULL,
                              penalty = NULL, 
                              ...) {
  
  outcome    <- y # Comes in as a vector
  predictors <- x # Comes in as a data.frame (dates and possible xregs)
  
  # 2. Predictors - Handle Dates 
  index_tbl <- modeltime::parse_index_from_data(predictors)
  idx_col   <- names(index_tbl)
  idx       <- timetk::tk_index(index_tbl)
  
  # 3. Predictors - Handle Xregs
  xreg_recipe <- create_xreg_recipe(predictors, prepare = TRUE)
  xreg_matrix <- juice_xreg_recipe(xreg_recipe, format = "matrix")
  
  # 4. Fitting
  model_1 <- forecast::nnetar(y = outcome, 
                              x = predictors, 
                              size = hidden_units,
                              repeats = epochs,
                              p = nonseas_lags,
                              P = seas_lags,
                              decay = penalty,
                              ...)
  
  # 5. New Modeltime Bridge
  new_modeltime_bridge(
    class  = "bridge_nnetar_fit",
    models = list(model_1 = model_1),
    data   = tibble::tibble(
      idx_col   := idx,
      .actual    = y,
      .fitted    = model_1$fitted,
      .residuals = model_1$residuals
    ),
    extras = list(xreg_recipe = xreg_recipe), # Can add xreg preprocessors here
    desc   = stringr::str_c("NARX Model: ", model_1$model$method)
  )
  
}


print.bridge_nnetar_fit <- function(x, ...) {
  
  model <- x$models$model_1$model
  
  cat(x$desc)
  cat("\n")
  print(model$call)
  cat("\n")
  print(
    tibble(
      aic    = model$aic,
      bic    = model$bic,
      aicc   = model$aicc,
      loglik = model$loglik,
      mse    = model$mse
    )
  )
  invisible(x)
}

When testing the bridge function bridge_nnetar_fit with

data_reprex <- beaver1 %>% 
  as_tibble() %>% 
  mutate(day = as.Date(day, origin = "2019-01-01")) %>% 
  mutate(time = str_pad(as.character(time), 4, pad = "0")) %>% 
  mutate(date = lubridate::ymd_hm(str_c(day, time)),
         .before = 1) %>% 
  select(-day, -time)

nnetar_test <- bridge_nnetar_fit(
  x = select(data_reprex, -temp),
  y = pull(data_reprex, temp),
  epochs = 100,
  penalty = 0.001,
  seas_lags = 1,
  nonseas_lags = 1,
  hidden_units = 5
)

(It's not the best dataset for this purpose, but it doesn't matter at this point)

It gives me the error message:

Error in is.constant(na.interp(x)) : 
  'list' object cannot be coerced to type 'double'

I couldn't figure out its cause. I would be happy if anyone could help me out with that.
Thanks a lot,
Max

add_formula(): Error: No date or date-time variable provided. Please supply a date or date-time variable as a predictor.

The workflow add_formula() interface does not seem to be transferring the date as an encoded feature.

library(tidymodels)
library(modeltime)
library(tidyverse)
library(lubridate)
library(timetk)

# Data
m750 <- m4_monthly %>% filter(id == "M750")

# Split Data 80/20
splits <- initial_time_split(m750, prop = 0.9)

# --- MODELS ---

model_spec_arima <- arima_reg() %>%
    set_engine(engine = "auto_arima") 

# --- WORKFLOW ---
workflow() %>% 
    add_model(model_spec_arima) %>% 
    add_formula(value ~ date) %>%  # <-- Error here
    fit(training(splits))

# > Error: No date or date-time variable provided. Please supply a date or date-time variable as a predictor.

Modeltime Ecosystem Roadmap: New Algorithms & Models

Modeltime Ecosystem Roadmap

The modeltime project roadmap tracks the overall development of the Modeltime Ecosystem of forecasting packages. Modeltime is a cutting-edge ecosystem for forecasting using strategies and best practices that won or placed highly in major forecasting competitions. We have a state-of-the-art Time Series Forecasting Course (DS4B 203-R) that teaches Machine Learning, Deep Learning, and Feature Engineering for Time Series. Take this course to become the forecasting expert for your organization.

Forecasting Approaches

  • Global Models: Panel data and global modeling was introduced in Modeltime >= 0.3.0 and Modeltime Ensemble >= 0.3.0. We now have a vignette covering Forecasting with Panel Data.
  • Iterative Forecasting: The sknifedatar package extends modeltime for iterative forecasting. We are also working on an iterative forecasting solution that can be tracked with Issue #122. Nested Forecasting has been implemented. Refer to the Nested Forecasting Tutorial.

Ensembles (Stacking, Averaging)

Please refer to modeltime.ensemble R package: https://business-science.github.io/modeltime.ensemble/

Forecast

These algorithms are provided in the base modeltime package. Refer to included models in the model list: https://business-science.github.io/modeltime/articles/modeltime-model-list.html

GluonTS (Deep Learning)

Please refer to modeltime.gluonts R package: https://github.com/business-science/modeltime.gluonts

Autoregressive (Recursive) Forecasting

Recursive forecasting is required when Lags of Features < Forecast Horizon. Please refer to the recursive() function that allows any model to be converted into a recursive prediction model: https://business-science.github.io/modeltime/reference/recursive.html
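A minimal sketch of the recursive pattern (assuming a tibble data_tbl with date and value; the lag transform and tail size are illustrative):

library(tidymodels)
library(modeltime)
library(timetk)

# transform that (re)generates lagged features; reapplied at each recursive step
lag_transformer <- function(data) {
    data %>% tk_augment_lags(value, .lags = 1:12)
}

train_tbl <- data_tbl %>% lag_transformer() %>% na.omit()

model_fit_lm_recursive <- linear_reg() %>%
    set_engine("lm") %>%
    fit(value ~ ., data = train_tbl) %>%
    recursive(
        transform  = lag_transformer,
        train_tail = tail(train_tbl, 12)   # rows needed to compute the first future lags
    )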

Resampling & Backtesting

Hyper Parameter Tuning & Parallel Processing

Global Baseline Models

Refer to baseline algorithms #37

H2O

This project is being managed via modeltime.h2o: https://github.com/business-science/modeltime.h2o

General Additive Models (GAMs)

  • Parsnip now incorporates GAMs

GARCH Models

Refer to garchmodels (Website, GitHub).

  • GARCH
  • RUGARCH
  • RMGARCH

Bayesian Models

Refer to bayesmodels (Website, GitHub)

  • Sarima: bayesmodels connects to the bayesforecast package.
  • Garch: bayesmodels connects to the bayesforecast package.
  • Random Walk (Naive): bayesmodels connects to the bayesforecast package.
  • State Space Model: bayesmodels connects to the bayesforecast and bsts packages.
  • Stochastic Volatility Model: bayesmodels connects to the bayesforecast package.
  • Generalized Additive Models (GAMS): bayesmodels connects to the brms package.
  • Adaptive Splines Surface: bayesmodels connects to the BASS package.
  • Exponential Smoothing: bayesmodels connects to the Rlgt package.

Boosted Models: CatBoost, LightGBM

Refer to boostime (GitHub).

  • ARIMA + Catboost
  • ARIMA + LightGBM
  • Prophet + Catboost
  • Prophet + LightGBM

Conformal Predictions

  • Investigate integration of probably for Conformal Regression and Confidence Intervals. Follow GH Issue #173

Vector Autoregression (VAR)

  • Investigate integration of VAR models. Follow GH Issue #77.

Hierarchical Time Series

Modeltime Targets

  • Develop a reproducible targets workflow for ARIMA, Prophet, and Exponential Smoothing. Incorporate reporting and error diagnostics. #109

Spark: Distributed Compute

Investigate the possibility of adding Spark backend to improve scalability of modeltime, specifically for nested (iterative) forecasting.

Facebook Neural Prophet

  • Create a Modeltime Neural Prophet Extension.

LinkedIn Greykite

Uber Orbit

Facebook Kats

Temporal Hierarchical Forecasting (THIEF)

  • Implement the thief package for ensembling using temporal hierarchical forecast (THIEF) aggregations. #117

Smooth

Investigate the smooth library (https://github.com/config-i1/smooth). The smooth library has several benefits, including:

  • ETS + XREGS: An es() function that implements Exponential Smoothing with handling of xregs. This can be a new engine in exp_smoothing()
  • A Seasonal ARIMA auto.ssarima() / ssarima() implemented in state space. These can be additional options in arima_reg().

These methods have shown promising results in comparison to Facebook Prophet.

Follow #119.

Merlion

Pytorch Forecasting

  • Investigate Pytorch forecasting #163

ets :: alternative model string

Ubuntu: 16.4 LTS, R: 4.0.2, modeltime: 0.0.2

An alternative model string for ets seems not to work...

model_spec_ets <- exp_smoothing() %>%
    set_engine("ets",model=as.character(params$model))

Error in forecast::ets(outcome, model = model_ets, damped = damping_ets,  : 
  formal argument "model" matched by multiple actual arguments
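One possible workaround (a sketch, not from the thread): specify the ETS components through the exp_smoothing() arguments rather than passing a model string to set_engine(), so the model argument isn't supplied twice. Argument names are taken from the modeltime documentation.

model_spec_ets <- exp_smoothing(
    error  = "additive",
    trend  = "additive",
    season = "additive"
) %>%
    set_engine("ets")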

argument names

Would it be possible to use more descriptive argument names? For example, maybe non_seasonal_term or similar for p. We try to avoid jargon with the names.

Since the argument names are similar, maybe optimize them for tab-complete too.

Prophet :: Logistic Growth - Modeltime Support with logistic_cap & logistic_floor

Ubuntu: 16.4 LTS, R: 4.0.2, modeltime: 0.0.2

If the growth parameter is set to logistic, the 'cap' parameter has to be set, e.g.:

https://facebook.github.io/prophet/docs/saturating_forecasts.html

It seems this can't be done in either set_engine() or prophet_reg().

Message & Traceback:
Error in setup_dataframe(m, history, initialize_scales = TRUE) :
Capacities must be supplied for logistic growth.

  1. ├─global::bAlgos(i, f, "testing") ~/R/daScript.R:32006:12
  2. │ └─global::standAloneAlgos(...) ~/R/daScript.R:31621:12
  3. │ └─%>%(...) ~/R/daScript.R:16790:20
  4. │ ├─base::withVisible(eval(quote(_fseq(_lhs)), env, env))
  5. │ └─base::eval(quote(_fseq(_lhs)), env, env)
  6. │ └─base::eval(quote(_fseq(_lhs)), env, env)
  7. │ └─_fseq(_lhs)
  8. │ └─magrittr::freduce(value, _function_list)
  9. │ ├─base::withVisible(function_list[k])
  10. │ └─function_list[k]
  11. │ ├─parsnip::fit(...)
  12. │ └─parsnip::fit.model_spec(...)
  13. │ └─parsnip:::form_xy(...)
  14. │ └─parsnip:::xy_xy(...)
  15. │ ├─base::system.time(...)
  16. │ └─parsnip:::eval_mod(...)
  17. │ └─rlang::eval_tidy(e, ...)
  18. ├─modeltime::prophet_fit_impl(...)
  19. │ └─prophet::fit.prophet(m, df)
  20. │ └─prophet:::setup_dataframe(m, history, initialize_scales = TRUE)
  21. │ └─base::stop("Capacities must be supplied for logistic growth.")
  22. └─(function () ...
  23. └─lobstr::cst() ~/R/daScript.R:8:25

Thank you.

indicators in fit()

I noticed this argument in the fit() method. We are working on a general way to control if indicators are produced in a model-specific way in parsnip (that would also be used by workflows). This is related to tidymodels/parsnip#290 and tidymodels/workflows#34

I'm in the process of implementing this and it probably won't be ready for about a month (so hopefully that wouldn't disqualify it). A prototype of the user-facing code is here.

Prophet Regressors

    # Add regressors
    xreg_nms <- names(xreg_tbl)
    if (length(xreg_nms) > 0) {
        for (nm in xreg_nms) {
            m <- prophet::add_regressor(m, name = nm, prior.scale = 10000)
        }
    }

The add_regressor function also allows one to specify if the regressor will be standardized TRUE/FALSE. Is it possible to include this option in the modeltime framework?
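For reference, a hedged sketch of what exposing that option could look like inside the bridge code above (prophet::add_regressor() accepts a standardize argument, which defaults to "auto"):

    # Add regressors, exposing the standardize option
    xreg_nms <- names(xreg_tbl)
    if (length(xreg_nms) > 0) {
        for (nm in xreg_nms) {
            m <- prophet::add_regressor(m, name = nm, prior.scale = 10000, standardize = "auto")
        }
    }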

Cannot set Prophet yearly.seasonality

suppressMessages(library(tidyverse))
suppressMessages(library(tidyquant))
suppressMessages(library(timetk))
suppressMessages(library(tidymodels))
suppressMessages(library(modeltime))
sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: CentOS Linux 7 (Core)
#>
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] modeltime_0.2.1 yardstick_0.0.7
#> [3] workflows_0.2.1 tune_0.1.1.9000
#> [5] rsample_0.0.8 recipes_0.1.13
#> [7] parsnip_0.1.3 modeldata_0.0.2
#> [9] infer_0.5.2 dials_0.0.9
#> [11] scales_1.1.1 broom_0.7.0
#> [13] tidymodels_0.1.1 timetk_2.3.0
#> [15] tidyquant_1.0.1 quantmod_0.4.17
#> [17] TTR_0.23-6 PerformanceAnalytics_2.0.4
#> [19] xts_0.12-0 zoo_1.8-8
#> [21] lubridate_1.7.9 forcats_0.5.0
#> [23] stringr_1.4.0 dplyr_1.0.2
#> [25] purrr_0.3.4 readr_1.3.1
#> [27] tidyr_1.1.2 tibble_3.0.4
#> [29] ggplot2_3.3.2 tidyverse_1.3.0
#>
#> loaded via a namespace (and not attached):
#> [1] colorspace_1.4-1 ellipsis_0.3.1 class_7.3-17
#> [4] fs_1.5.0 rstudioapi_0.11 listenv_0.8.0
#> [7] furrr_0.2.0 prodlim_2019.11.13 fansi_0.4.1
#> [10] xml2_1.3.2 codetools_0.2-16 splines_4.0.2
#> [13] knitr_1.29.4 jsonlite_1.7.1 pROC_1.16.2
#> [16] dbplyr_1.4.4 compiler_4.0.2 httr_1.4.2
#> [19] backports_1.1.10 assertthat_0.2.1 Matrix_1.2-18
#> [22] cli_2.1.0 htmltools_0.5.0 tools_4.0.2
#> [25] gtable_0.3.0 glue_1.4.2 Rcpp_1.0.5
#> [28] cellranger_1.1.0 DiceDesign_1.8-1 vctrs_0.3.4
#> [31] iterators_1.0.12 timeDate_3043.102 gower_0.2.2
#> [34] xfun_0.17 globals_0.13.1 rvest_0.3.5
#> [37] lifecycle_0.2.0 future_1.19.1 MASS_7.3-51.6
#> [40] ipred_0.9-9 hms_0.5.3 parallel_4.0.2
#> [43] yaml_2.2.1 curl_4.3 StanHeaders_2.21.0-6
#> [46] rpart_4.1-15 stringi_1.5.3 highr_0.8
#> [49] foreach_1.5.0 lhs_1.1.0 lava_1.6.8
#> [52] rlang_0.4.8 pkgconfig_2.0.3 evaluate_0.14
#> [55] lattice_0.20-41 tidyselect_1.1.0 plyr_1.8.6
#> [58] magrittr_1.5 R6_2.4.1 generics_0.0.2
#> [61] DBI_1.1.0 pillar_1.4.6 haven_2.3.1
#> [64] withr_2.3.0 survival_3.1-12 nnet_7.3-14
#> [67] modelr_0.1.8 crayon_1.3.4 Quandl_2.10.0
#> [70] rmarkdown_2.3 grid_4.0.2 readxl_1.3.1
#> [73] blob_1.2.1 reprex_0.3.0 digest_0.6.25
#> [76] RcppParallel_5.0.2 munsell_0.5.0 GPfit_1.0-8
#> [79] quadprog_1.5-8

bike_transactions_tbl <- bike_sharing_daily %>%
  select(dteday, cnt) %>%
  set_names(c("date", "value"))

splits <- bike_transactions_tbl %>%
  time_series_split(assess = "3 months", cumulative = TRUE)
#> Using date_var: date

Prophet

model_fit_prophet <- prophet_reg() %>%
  set_engine("prophet", yearly.seasonality = TRUE) %>%
  fit(value ~ date, training(splits))
#> Warning: The following arguments cannot be manually modified and were removed:
#> yearly.seasonality.
#> Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
Created on 2020-10-15 by the reprex package (v0.3.0)
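For what it's worth, yearly seasonality can be requested through the prophet_reg() arguments instead of set_engine(); a sketch based on prophet_reg(seasonality_yearly = TRUE), which appears elsewhere in these issues:

model_fit_prophet <- prophet_reg(seasonality_yearly = TRUE) %>%
  set_engine("prophet") %>%
  fit(value ~ date, training(splits))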

Getting Started with Modeltime: Some Notes

Thank you for the page walking through Modeltime. I found two minor items that you may consider changing in the walkthrough text.

  1. In the script m750 <- m4_monthly %>% filter(id == "M750") I received an error because filter tried to call stats::filter. That went away when I invoked dplyr::filter.
  2. The comment says split 80/20 but the code calls for a 90/10 split with the argument prop=0.9.

Modeltime with R 3.6.2 - not available

I am using R version 3.6.2 and getting the following error. When I look at the package on CRAN it says it is available for R (≥ 3.5.0). Are there known issues with 3.6.2?

Error installing R package: Could not install package with error: 1: package ‘Modeltime’ is not available (for R version 3.6.2)
