jjf234 / roll

Fast and efficient computation of rolling and expanding statistics for time-series data.

roll's Introduction

roll

Overview

roll is a package that provides fast and efficient computation of rolling and expanding statistics for time-series data.

The default algorithm in the roll package, suitable for most applications, is an online algorithm. Given the speed requirements and sequential nature of many problems in practice, online algorithms are a natural fit for computing rolling and expanding statistics of time-series data. That is, as observations are added to and removed from a window, online algorithms update statistics and discard observations from memory (Welford, 1962; West, 1979); as a result, each function evaluates significantly faster because the computation time is independent of the window width. In contrast, an offline algorithm requires all observations in memory to calculate the statistic for each window. Note that online algorithms are prone to loss of precision due to round-off error; hence, users can trade speed for accuracy and select the offline algorithm by setting the online argument to FALSE. Also, the RcppParallel package is used to parallelize the online algorithms across columns and the offline algorithms across windows.
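To make the online/offline distinction concrete, here is a minimal base R sketch. It is not the package's C++ implementation, and it uses a plain running sum rather than the Welford or West updates; the function names roll_mean_offline and roll_mean_online are hypothetical and exist only for this illustration.

```r
# Offline: recompute the mean from every observation in each window
roll_mean_offline <- function(x, width) {
  n <- length(x)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i >= width) {
      out[i] <- mean(x[(i - width + 1):i])
    }
  }
  out
}

# Online: keep a running sum, add the newest observation, drop the oldest
roll_mean_online <- function(x, width) {
  n <- length(x)
  out <- rep(NA_real_, n)
  window_sum <- 0
  for (i in seq_len(n)) {
    window_sum <- window_sum + x[i]
    if (i > width) {
      window_sum <- window_sum - x[i - width]
    }
    if (i >= width) {
      out[i] <- window_sum / width
    }
  }
  out
}

x <- c(2, 4, 6, 8, 10, 12)
all.equal(roll_mean_offline(x, 3), roll_mean_online(x, 3))
```

The online version does a constant amount of work per observation regardless of width, while the offline version touches every observation in the window each time. The repeated add and subtract steps are also where round-off error can accumulate.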

As mentioned above, the numerical calculations use the RcppParallel package to parallelize rolling and expanding statistics of time-series data. The RcppParallel package provides a complete toolkit for creating safe, portable, high-performance parallel algorithms, built on top of the Intel Threading Building Blocks (TBB) and TinyThread libraries. By default, all the available cores on a machine are used for parallel algorithms. If users are either already taking advantage of parallelism or instead want to use a fixed number or proportion of threads, then set the number of threads in the RcppParallel package with the RcppParallel::setThreadOptions function.
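As a minimal sketch of the thread setting described above, assuming the RcppParallel package is installed; the thread counts chosen here are illustrative only:

```r
library(RcppParallel)

# number of threads RcppParallel would use by default on this machine
n_cores <- defaultNumThreads()

# use a fixed number of threads (illustrative value)
setThreadOptions(numThreads = 2)

# or use a proportion of the available threads, here roughly one half
setThreadOptions(numThreads = max(1, floor(n_cores / 2)))

# restore the default behavior
setThreadOptions(numThreads = "auto")
```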

Installation

Install the released version from CRAN:

install.packages("roll")

Or the development version from GitHub:

# install.packages("devtools")
devtools::install_github("jasonjfoster/roll")

Usage

Load the package and supply a dataset:

library(roll)

n <- 15
x <- rnorm(n)
y <- rnorm(n)
weights <- 0.9 ^ (n:1)

Then, to compute rolling and expanding means, use the roll_mean function:

# rolling means with complete windows
roll_mean(x, width = 5)

# rolling means with partial windows
roll_mean(x, width = 5, min_obs = 1)

# expanding means with partial windows
roll_mean(x, width = n, min_obs = 1)

# expanding means with weights and partial windows
roll_mean(x, width = n, min_obs = 1, weights = weights)

Or use the roll_lm function to compute rolling and expanding regressions:

# rolling regressions with complete windows
roll_lm(x, y, width = 5)

# rolling regressions with partial windows
roll_lm(x, y, width = 5, min_obs = 1)

# expanding regressions with partial windows
roll_lm(x, y, width = n, min_obs = 1)

# expanding regressions with weights and partial windows 
roll_lm(x, y, width = n, min_obs = 1, weights = weights)

Note that handling of missing values is supported as well (see the min_obs, complete_obs, and na_restore arguments).
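A short sketch of how these arguments might be combined for roll_mean, assuming the roll package is installed. The comments describe the documented intent of each argument rather than verified output, so run the calls to inspect the results:

```r
library(roll)

x <- c(1, 2, NA, 4, 5, 6)

# default: min_obs equals width, so a window containing the NA yields NA
res_default <- roll_mean(x, width = 3)

# require only two non-missing observations per window
res_min_obs <- roll_mean(x, width = 3, min_obs = 2)

# additionally return NA at the positions where x itself is missing
res_restore <- roll_mean(x, width = 3, min_obs = 2, na_restore = TRUE)
```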

References

Welford, B.P. (1962). "Note on a Method for Calculating Corrected Sums of Squares and Products." Technometrics, 4(3), 419-420.

West, D.H.D. (1979). "Updating Mean and Variance Estimates: An Improved Method." Communications of the ACM, 22(9), 532-535.

roll's Issues

Does roll_quantile have an "online" algorithm?

Hi,

(1) I noticed the following note for version 1.1.5 in
https://cran.r-project.org/web/packages/roll/news/news.html

Version 1.1.5

New roll_quantile function for computing rolling and expanding quantiles of time-series data
Note: roll_quantile function is not calculated using an online algorithm

(2) I noticed that the function
RollQuantileOfflineVec
exists and that the function
RollQuantileOnlineVec
does not exist (as of December 2020) in
https://github.com/jjf234/roll/blob/b640085a6af337ed150ef869b588880009f54038/src/roll_vec.h

(3) I noticed that roll_quantile has an "online" argument

roll_quantile(x, width, weights = rep(1, width), p = 0.5,
min_obs = width, complete_obs = FALSE, na_restore = FALSE,
online = FALSE)

online logical. Process observations using an online algorithm

in
https://cran.r-project.org/web/packages/roll/roll.pdf

So, does roll_quantile have an "online" algorithm (or not)?

Thanks,

Andre Mikulec

roll with time interval not with width

I have panel data with stock id and time, and I want to compute the average price over a time interval. Since the time is irregular, it is more efficient to roll over a time interval rather than a fixed width. Could the author add this functionality? The same problem applies to roll_lm. Thank you very much.
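Until such a feature exists, the request can be sketched in base R for a single series. The function name roll_mean_by_time is hypothetical, the window is assumed to be (t - interval, t], and the times are assumed to be sorted in increasing order:

```r
# Rolling mean over a trailing time interval for irregularly spaced data
roll_mean_by_time <- function(time, value, interval) {
  stopifnot(!is.unsorted(time), length(time) == length(value), interval > 0)
  n <- length(time)
  out <- numeric(n)
  start <- 1          # index of the oldest observation still in the window
  window_sum <- 0
  for (i in seq_len(n)) {
    window_sum <- window_sum + value[i]
    # drop observations that fall outside (time[i] - interval, time[i]]
    while (time[start] <= time[i] - interval) {
      window_sum <- window_sum - value[start]
      start <- start + 1
    }
    out[i] <- window_sum / (i - start + 1)
  }
  out
}

time  <- c(1, 2, 4, 7, 8)
price <- c(10, 20, 30, 40, 50)
roll_mean_by_time(time, price, interval = 3)
```

For panel data, the same function could be applied per stock id, for example with split() and lapply().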

roll_cov

I tried this function on a stock returns dataset and, surprisingly, got only NA as output. I thought some returns were missing in my dataset, but then I noticed, even more surprisingly, that the simple example given at https://cran.r-project.org/web/packages/roll/roll.pdf produces the same NA output. What could the cause be?

roll_any() and roll_all()

Would you consider extending roll to also think about logical vectors? roll_any() and roll_all() are admittedly esoteric, but can sometimes be useful, and would complete the matrix of summary functions at https://github.com/r-lib/vctrs/issues/9
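For illustration, rolling any/all over a logical vector can be sketched in base R by counting TRUE values per window with a cumulative sum. The names roll_count_true, roll_any_sketch, and roll_all_sketch are hypothetical and are not functions from the package:

```r
# number of TRUE values in each complete right-aligned window
roll_count_true <- function(x, width) {
  cs <- cumsum(c(0L, as.integer(x)))
  n <- length(x)
  out <- rep(NA_integer_, n)
  if (n >= width) {
    idx <- width:n
    out[idx] <- cs[idx + 1L] - cs[idx - width + 1L]
  }
  out
}

roll_any_sketch <- function(x, width) roll_count_true(x, width) > 0L
roll_all_sketch <- function(x, width) roll_count_true(x, width) == width

x <- c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
roll_any_sketch(x, 2)
roll_all_sketch(x, 2)
```

This sketch assumes x contains no missing values; an NA would propagate through the cumulative sum to every later window.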

roll_lm gets slower with repeated iterations in the same R session

I find that computation time increases substantially when 'roll_lm' is called repeatedly. This is true for different data sets as well as for the same data set. I've attached an example together with the input data and the profvis output. This runs rolling window regressions for a bunch of firms on annual data, and repeats that analysis a number of times. In this case the analysis is repeated on the same data, but in my actual application I have a series of simulated data sets on which I want to run this.

For a series of five iterations, the time taken by roll_lm is as follows on my machine

[[1]]
user system elapsed
15.79 5.17 6.22

[[2]]
user system elapsed
41.27 8.44 10.58

[[3]]
user system elapsed
71.13 14.08 15.65

[[4]]
user system elapsed
96.97 18.94 20.40

[[5]]
user system elapsed
135.19 26.80 27.25

Hence, by iteration five the total CPU time is about 9 times as much as for the first iteration, which requires the same computations.

Restarting the R session appears to help, but other than that nothing (e.g., rm()) seems to matter for the performance degradation.

I didn't manage to attach a zip with the code and data, but the file in the link contains the relevant files including session info; I'm using the MRAN version of R 3.3.0 with the MKL libraries.

Desired functionality of roll_sd when center is provided

I have written a couple of unit tests to ensure that roll_scale matches the expected result for the 4 permutations of the way center and scale can be provided to roll_scale.

The four tests are as follows:

roll_scale(data, 60, scale = T, center = T)
roll_scale(data, 60, scale = T, center = F)
roll_scale(data, 60, scale = F, center = T)
roll_scale(data, 60, scale = F, center = F)

These need to be equivalent to the following calls in order:

(data - roll_mean(data,60)) / roll_sd(data,60)
data / roll_sd(data,60)
data - roll_mean(data,60)
data (or possibly data with NAs populated in the first 59 entries)

require(roll)
require(testthat)

testthat::expect_equivalent(roll_scale(data, 60, scale = T, center = T),(data - roll_mean(data,60)) / roll_sd(data,60))
testthat::expect_equivalent(roll_scale(data, 60, scale = T, center = F),data / roll_sd(data,60))
testthat::expect_equivalent(roll_scale(data, 60, scale = F, center = T),data - roll_mean(data,60))
testthat::expect_equivalent(roll_scale(data, 60, scale = F, center = F), data)

The tests fail in the following ways (only case #2 is a genuine failure)

> testthat::expect_equivalent(roll_scale(data, 60, scale = T, center = T),(data - roll_mean(data,60)) / roll_sd(data,60))
> testthat::expect_equivalent(roll_scale(data, 60, scale = T, center = F),data / roll_sd(data,60))
Error: roll_scale(data, 60, scale = T, center = F) not equivalent to data/roll_sd(data, 60).
Mean relative difference: 0.006639237
> testthat::expect_equivalent(roll_scale(data, 60, scale = F, center = T),data - roll_mean(data,60))
> testthat::expect_equivalent(roll_scale(data, 60, scale = F, center = F), data)
Error: roll_scale(data, 60, scale = F, center = F) not equivalent to `data`.
'is.NA' value mismatch: 0 in current 59 in target

Please load with dget.
rollDataExample.txt

Support vectors?

It would be useful if this worked:

x <- rnorm(100)
roll::roll_mean(x, 5)
#> Error in roll::roll_mean(x, 5): Not a matrix.

Created on 2019-07-23 by the reprex package (v0.3.0)

Add p-values for coefficients for roll_lm

Would it be at all possible to add p-values (Pr(>|t|)) to the output of roll_lm? It would be really useful when performing the rolling regression to observe if variables remained statistically significant through the regression period.

Thanks!
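As a sketch of what this request amounts to, p-values can be derived from coefficient and standard-error matrices with the usual t-test. The helper roll_p_values is hypothetical, and the example checks it against base R's lm on a single window of simulated data rather than against roll_lm output:

```r
# hypothetical helper: two-sided p-values from coefficient and standard-error
# matrices (one row per window, one column per coefficient)
roll_p_values <- function(coefficients, std_error, n_obs) {
  df <- n_obs - ncol(coefficients)     # residual degrees of freedom
  t_stat <- coefficients / std_error   # t-statistics
  2 * pt(-abs(t_stat), df = df)        # Pr(>|t|)
}

# check against base R on one window of simulated data
set.seed(1)
x <- rnorm(20)
y <- 1 + 2 * x + rnorm(20)
fit <- summary(lm(y ~ x))$coefficients

p <- roll_p_values(coefficients = t(fit[, "Estimate"]),
                   std_error = t(fit[, "Std. Error"]),
                   n_obs = 20)
all.equal(as.numeric(p), as.numeric(fit[, "Pr(>|t|)"]))
```

With roll_lm, the coefficients and std.error elements of the result could be passed in the same way, on the assumption that every window is unweighted and holds exactly n_obs observations; partial windows would need their own degrees of freedom.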

R Packages: roll vs rollRegres

Hi!

I was working with both packages and realised that the coefficients are not the same. Can someone please confirm whether this is the case?

It is important because the roll package computes standard errors while rollRegres doesn't.

I would appreciate any help finding an R package that computes rolling coefficients, standard errors and R².

Thanks a lot.

Document online functionality

What does it mean to calculate with an online versus an offline algorithm?

roll_rank( )

Thanks for the great package, I've benefitted from it a lot.

Would you consider adding a rolling version of rank ?

It works like :

n <- 20
width <- 5
x <- rnorm(n)
sapply(width:n, function(i) tail(rank(x[(i - width + 1):i]), 1))

ERROR: compilation failed for package 'roll'

> update.packages()
roll :
 Version 1.1.1 installed in W:/R-3.5._/R_LIBS_USER_3.5._
 Version 1.1.2 available at http://cran.revolutionanalytics.com
Update? (Yes/no/cancel) y

  There is a binary version available but the source version is later:
     binary source needs_compilation
roll  1.1.1  1.1.2              TRUE

Do you want to install from sources the package which needs compilation? (Yes/no/cancel) y
installing the source package 'roll'

. . . 

sh: -c: line 2: syntax error near unexpected token `('
sh: -c: line 2: `    echo W:/Rtools35/mingw_32/bin/g++  -shared -s -static-libgcc -o roll.dll roll-win.def RcppExports.o init.o roll.o -fopenmp -L"W:/R-35~1._/App/R-PORT~1/bin/i386" -lRlapack -L"W:/R-35~1._/App/R-PORT~1/bin/i386" -lRblas -lgfortran -lm -lquadmath                         .libPaths() 1 W:/R-3.5._/R_LIBS_USER_3.5._      2 W:/R-3.5._/App/R-Portable/library  from cluster process, Hello Andre! Set memory limit: R_MAX_MEM_SIZE; 7077888 -LW:/R-3.5._/R_LIBS_USER_3.5._/RcppParallel/lib/i386 -ltbb -ltbbmalloc from cluster process, Goodbye Andre!   -L"W:/R-35~1._/App/R-PORT~1/bin/i386" -lR ; \'
make: *** [W:/R-35~1._/App/R-PORT~1/share/make/winshlib.mk:13: roll.dll] Error 1
ERROR: compilation failed for package 'roll'
* removing 'W:/R-3.5._/R_LIBS_USER_3.5._/roll'
* restoring previous 'W:/R-3.5._/R_LIBS_USER_3.5._/roll'

The downloaded source packages are in
        'W:\R-3.5._\R_USER_3.5.__R_STUDIO\AppData\Local\Temp\RtmpyuSPNJ\downloaded_packages'
Warning message:
In install.packages(update[instlib == l, "Package"], l, repos = repos,  :
  installation of package 'roll' had non-zero exit status

roll_min and roll_max

Would you consider implementing rolling versions of min() and max()?
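One common way to compute a rolling maximum in a single pass is a monotonic deque of indices. The base R sketch below illustrates that technique under the assumption of no missing values; roll_max_deque is a hypothetical name and this is not how the package implements it:

```r
roll_max_deque <- function(x, width) {
  n <- length(x)
  out <- rep(NA_real_, n)
  deque <- integer(n)   # indices whose values are in decreasing order
  head <- 1L
  tail <- 0L
  for (i in seq_len(n)) {
    # drop indices from the back whose values cannot be a future maximum
    while (tail >= head && x[deque[tail]] <= x[i]) {
      tail <- tail - 1L
    }
    tail <- tail + 1L
    deque[tail] <- i
    # drop the front index once it has left the window
    if (deque[head] <= i - width) {
      head <- head + 1L
    }
    if (i >= width) {
      out[i] <- x[deque[head]]
    }
  }
  out
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6)
roll_max_deque(x, 3)
```

A rolling minimum follows by flipping the comparison in the while loop to >=.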

apply user defined function

Thanks for the great package. Any chance we can have a zoo::rollapply type of thing?

`partial` support

Would it be possible to add a partial argument to all suited roll functions similar to zoo::rollapply()? Take the following zoo based example:

## sample data
set.seed(
  1899
)

x = sample(
  1:5
  , size = 10
  , replace = TRUE
)

## compute rolling variance
zoo::rollapply(
  x
  , width = 3
  , FUN = var
  , partial = TRUE
)
# [1] 2.0000000 2.3333333 2.3333333 4.3333333 5.3333333 5.3333333 4.0000000 2.3333333 0.3333333
# [10] 0.0000000

The closest I get with roll would currently be

roll::roll_var(
  x
  , width = 3
  , min_obs = 1
)
# [1]        NA 2.0000000 2.3333333 2.3333333 4.3333333 5.3333333 5.3333333 4.0000000 2.3333333
# [10] 0.3333333

which adds a leading NA and omits the last variance value. Ultimately, I would want to have the same output as obtained from zoo::rollapply().

Am I missing something here? My data set has > 1e6 data points, so an Rcpp based approach would be highly appreciated from a computational point of view. Unfortunately, RcppRoll doesn't handle partial window indexes either, see #18.
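The difference between the two outputs above is consistent with window alignment: zoo::rollapply() defaults to align = "center", whereas the roll windows end at the current observation. A base R sketch of centered partial windows follows, assuming an odd width; roll_var_center_partial is a hypothetical name and the function is far slower than a compiled implementation:

```r
roll_var_center_partial <- function(x, width) {
  n <- length(x)
  half <- (width - 1) %/% 2   # observations on each side of the center
  vapply(seq_len(n), function(i) {
    # truncate the window at both ends of the series
    var(x[max(1, i - half):min(n, i + half)])
  }, numeric(1))
}

x <- c(1, 2, 4, 7)
roll_var_center_partial(x, 3)
```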

[DOC] - Revert to previous default formatting for function usage

With roxygen2 7.0.0 the default formatting for function usage has changed. See NEWS.

I feel the previous default was much better.

Here's how it looks now:

seg faults with roll_min / max

I have sometimes been experiencing various segfaults / memory issues, when using roll_min / roll_max. The errors are not happening every run.

E.g.

double free or corruption (!prev)

corrupted size vs. prev_size

I've seen other errors as well. However, the same code but with roll_mean, roll_sd etc. works as expected.

I've tried to find a minimal example for this error but I'm struggling to recreate it in a smaller example. Are you aware of this error? Or do you have an inkling as to what it's about?

roll_mad() - Mean | Median Absolute Deviation

Would that fit to the existing framework / family functions?

Adapt functions to work on vectors.

Would you consider modifying the functions so that they accept a vector as input and return a vector? I'm under the impression it would not break any existing code. Today I often end up creating functions like the one below.

roll_sum_vec <- function(x, ...) {
   return(roll_sum(as.matrix(x), ...)[,1])
}

roll_eigen

Hello, I noticed that this function was removed. It's a pity, as it is quite useful to have.
NN

Any chance to get back roll_eigen, roll_vif, and roll_pcr?

Thanks for this very powerful package.

Would it be possible to add back roll_eigen, roll_vif, and roll_pcr?

Thanks.

struggling to install the package

Installation failed: Could not find build tools necessary to build roll

support for `align`?

In #37 you briefly discuss that the package currently only supports zoo's align = "right" option. Are there any plans to support align = "center"? The added value for me would be that this package allows for min_obs, which I couldn't figure out for zoo's implementation.

roll_lm checks nrow(x) > width rather than nrow(x) > min_obs

Right now roll_lm throws the following error when you try to run a rolling regression on data that is shorter than width:

Error in eval(substitute(expr), envir, enclos) : 
  value of 'width' must be between one and number of rows in 'x' and 'y'

It seems more appropriate to check that nrow(x) and nrow(y) are just greater than min_obs. Here's an example:

library(roll)
n_vars <- 10
n_obs <- 50
x <- matrix(rnorm(n_obs * n_vars), nrow = n_obs, ncol = n_vars)
y <- matrix(rnorm(n_obs), nrow = n_obs, ncol = 1)
# Rolling regressions
result <- roll_lm(x, y, width=100, min_obs = 10)

Crash in roll_max with vectors of length around 545

roll::roll_max(rep(1,545), 10)

causes a crash in R-4.0.2 with roll v1.1.4

Inconsistent behavior of roll_sd from 32bit and 64bit R

I am using the roll_scale function on an xts object and find inconsistent behavior between the 32-bit and 64-bit versions of R.

Below is a sample I am referring to

On 32-bit R

library(xts)
library(roll)
sample <- structure(c(7.145, 7.145, 7.145, 7.145, 7.145, 7.145, 7.3684,
                      7.3684, 7.26, 7.26, 7.26, 7.26, 7.26, 7.26, 7.26, 7.26, 7.26,
                      7.26, 7.26, 7.26), class = c("xts", "zoo"), .indexCLASS = "Date",
                    tclass = "Date", .indexTZ = "UTC", tzone = "UTC",
                    index = structure(c(1507507200, 1507593600, 1507680000, 1507766400,
                                        1507852800, 1508112000, 1508198400, 1508284800,
                                        1508371200, 1508457600, 1508716800, 1508803200,
                                        1508889600, 1508976000, 1509062400, 1509321600,
                                        1509408000, 1509494400, 1509580800, 1509667200),
                                      tzone = "UTC", tclass = "Date"),
                    .Dim = c(20L, 1L), .Dimnames = list(NULL, "value"))
roll_scale(sample, 3)
# value
# 2017-10-09         NA
# 2017-10-10         NA
# 2017-10-11        NaN
# 2017-10-12        NaN
# 2017-10-13        NaN
# 2017-10-16        NaN
# 2017-10-17  1.1547005
# 2017-10-18  0.5773503
# 2017-10-19 -1.1547005
# 2017-10-20 -0.5773503
# 2017-10-23        NaN
# 2017-10-24        NaN
# 2017-10-25        NaN
# 2017-10-26        NaN
# 2017-10-27        NaN
# 2017-10-30        NaN
# 2017-10-31        NaN
# 2017-11-01        NaN
# 2017-11-02        NaN
# 2017-11-03        NaN

sessionInfo()
# R version 3.4.2 (2017-09-28)
# Platform: i386-w64-mingw32/i386 (32-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
# 
# Matrix products: default
# 
# locale:
#   [1] LC_COLLATE=English_Singapore.1252  LC_CTYPE=English_Singapore.1252
# [3] LC_MONETARY=English_Singapore.1252 LC_NUMERIC=C
# [5] LC_TIME=English_Singapore.1252
# 
# attached base packages:
#   [1] stats     graphics  grDevices utils     datasets  methods   base
# 
# other attached packages:
#   [1] roll_1.0.7 xts_0.10-1 zoo_1.8-1
# 
# loaded via a namespace (and not attached):
#   [1] compiler_3.4.2        Rcpp_0.12.14          grid_3.4.2
# [4] RcppParallel_4.3.20.2 lattice_0.20-35

On 64-bit R

library(xts)
library(roll)
sample <- structure(c(7.145, 7.145, 7.145, 7.145, 7.145, 7.145, 7.3684,
                      7.3684, 7.26, 7.26, 7.26, 7.26, 7.26, 7.26, 7.26, 7.26, 7.26,
                      7.26, 7.26, 7.26), class = c("xts", "zoo"), .indexCLASS = "Date",
                    tclass = "Date", .indexTZ = "UTC", tzone = "UTC",
                    index = structure(c(1507507200, 1507593600, 1507680000, 1507766400,
                                        1507852800, 1508112000, 1508198400, 1508284800,
                                        1508371200, 1508457600, 1508716800, 1508803200,
                                        1508889600, 1508976000, 1509062400, 1509321600,
                                        1509408000, 1509494400, 1509580800, 1509667200),
                                      tzone = "UTC", tclass = "Date"),
                    .Dim = c(20L, 1L), .Dimnames = list(NULL, "value"))
roll_scale(sample, 3)
# value
# 2017-10-09         NA
# 2017-10-10         NA
# 2017-10-11        NaN
# 2017-10-12        NaN
# 2017-10-13        NaN
# 2017-10-16        NaN
# 2017-10-17  1.1547005
# 2017-10-18  0.5773503
# 2017-10-19 -1.1547005
# 2017-10-20 -0.5773503
# 2017-10-23 -0.8164966
# 2017-10-24 -0.8164966
# 2017-10-25 -0.8164966
# 2017-10-26 -0.8164966
# 2017-10-27 -0.8164966
# 2017-10-30 -0.8164966
# 2017-10-31 -0.8164966
# 2017-11-01 -0.8164966
# 2017-11-02 -0.8164966
# 2017-11-03 -0.8164966

sessionInfo()
# R version 3.4.2 (2017-09-28)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
# 
# Matrix products: default
# 
# locale:
#   [1] LC_COLLATE=English_Singapore.1252  LC_CTYPE=English_Singapore.1252
# [3] LC_MONETARY=English_Singapore.1252 LC_NUMERIC=C
# [5] LC_TIME=English_Singapore.1252
# 
# attached base packages:
#   [1] stats     graphics  grDevices utils     datasets  methods   base
# 
# other attached packages:
#   [1] roll_1.0.7 xts_0.10-1 zoo_1.8-1
# 
# loaded via a namespace (and not attached):
#   [1] compiler_3.4.2        Rcpp_0.12.14          grid_3.4.2
# [4] RcppParallel_4.3.20.2 lattice_0.20-35

In the sample xts object, the values from 2017-10-19 onward are all the same. For the windows made up entirely of those values (from 2017-10-23 onward), 32-bit R gives NaN, whereas 64-bit R gives -0.8164966.

Floating Point Precision Issues in 1.1.2 vs. 1.1.3

Hi Jason,

Thanks for the great package. I've recently upgraded from 1.1.2 to 1.1.3 and it has broken some of my existing code (in another package I've written). After tracking down the issue, it seems that roll_sd is returning 0 if the values and weights are too small. This was not a problem in 1.1.2.

Here's a minimal reproducible example.

library(roll)
getNamespaceVersion("roll") # 1.1.2

set.seed(42)
x <- matrix(runif(100)) / 1000
width <- 10
x_sd_large_wts_112  <- roll_sd(x, width = width, weights = rep(1, width))
x_sd_small_wts_112  <- roll_sd(x, width = width, weights = rep(0.01, width))

all.equal(x_sd_large_wts_112, x_sd_small_wts_112) # TRUE
tail(x_sd_large_wts_112) # some small numbers
tail(x_sd_small_wts_112) # some small numbers

After upgrading to the new release,

detach("package:roll", unload=TRUE)
install.packages("roll")
library(roll)
getNamespaceVersion("roll") # 1.1.3

set.seed(42)
x <- matrix(runif(100)) / 1000
width <- 10
x_sd_large_wts_113  <- roll_sd(x, width = width, weights = rep(1, width))
x_sd_small_wts_113  <- roll_sd(x, width = width, weights = rep(0.01, width))

all.equal(x_sd_large_wts_113, x_sd_small_wts_113) # FALSE
tail(x_sd_large_wts_113) # some small numbers
tail(x_sd_small_wts_113) # all zeros

Thanks for looking into this!

error with latest version of RcppParallel

I get an error about tbb.dll when loading roll after the rgdal or sf package on Windows, whereas if roll is loaded first, before rgdal or sf, it is OK. I have tested it with both roll v1.1.4 and v1.1.5.
I think this is related to the tbb.dll required by gdal, which is included in the rgdal/sf packages, while the roll package depends on the tbb.dll in RcppParallel.
Could you please suggest the best way to work with both the roll and sf packages?

require(roll)
Loading required package: roll
Error: package or namespace load failed for ‘roll’:
.onLoad failed in loadNamespace() for 'RcppParallel', details:
call: inDL(x, as.logical(local), as.logical(now), ...)
error: unable to load shared object 'C:/Program Files/R/library/RcppParallel/libs/x64/RcppParallel.dll':
LoadLibrary failure:

sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936

attached base packages:
[1] grid tools graphics grDevices
[5] utils datasets stats methods
[9] base

other attached packages:
[1] RcppArmadillo_0.9.850.1.0
[2] gdata_2.18.0
[3] gstat_1.1-6
[4] rgdal_1.4-8
[5] mapproj_1.2.6
[6] maptools_0.9-4
[7] rasterVis_0.45
[8] latticeExtra_0.6-28
[9] RColorBrewer_1.1-2
[10] lattice_0.20-38
[11] scales_1.0.0
[12] ggthemes_4.0.1
[13] ggplot2_3.3.0
[14] POT_1.1-7
[15] RSapp_0.0.1
[16] data.table_1.11.8
[17] tibble_2.1.3
[18] purrr_0.3.2
[19] dplyr_0.8.1
[20] plyr_1.8.4
[21] magrittr_1.5
[22] reshape2_1.4.3
[23] raster_2.7-15
[24] sp_1.3-1
[25] maps_3.3.0
[26] my_3.0.3

loaded via a namespace (and not attached):
[1] gtools_3.8.1 zoo_1.8-4
[3] tidyselect_0.2.5 colorspace_1.3-2
[5] viridisLite_0.3.0 spacetime_1.2-2
[7] yaml_2.2.0 rlang_0.4.5
[9] hexbin_1.27.2 pillar_1.3.1
[11] foreign_0.8-71 glue_1.3.1
[13] withr_2.2.0 sessioninfo_1.1.0
[15] stringr_1.4.0 munsell_0.5.0
[17] gtable_0.2.0 codetools_0.2-16
[19] labeling_0.3 parallel_3.6.0
[21] xts_0.11-1 Rcpp_1.0.1
[23] FNN_1.1.2.1 digest_0.6.18
[25] stringi_1.4.3 cli_1.0.1
[27] crayon_1.3.4 pkgconfig_2.0.2
[29] assertthat_0.2.1 rstudioapi_0.11
[31] R6_2.4.0 intervals_0.15.1
[33] compiler_3.6.0

Export the predictions and the residuals of the roll_lm

Hi, I want to add the predictions and the residuals of the roll_lm function to two data.frames (let's say p_df and r_df for the predictions and the residuals, respectively). It might be a very easy task but I can't find how to do it.

This is what I have done so far:

x <- as.matrix(test[, 2:3])
y <- test[, 1]

m <- roll_lm(x = x, y = y, width = 5, complete_obs = TRUE)

From here, how can I extract the predictions and the residuals? In base R, the predictions of a linear model can be obtained by using the predict function and the residuals by subtracting the predictions from the y variable.

Here is a small data set:

 test <- structure(list(ntl = c(9.14866638183594, 15.3856477737427, 16.3302040100098, 
12.454291343689, 10.4823837280273, 11.394606590271, 8.1963529586792, 
4.50725030899048, 3.95374751091003, 5.73203563690186, 14.3955335617065, 
17.0745468139648, 14.2944135665894, 10.333722114563, 9.80743503570557, 
12.5352020263672, 19.8813304901123, 29.2410221099854, 32.8321876525879, 
29.575023651123), pop = c(31.2753772735596, 55.8289375305176, 
56.4003105163574, 33.795223236084, 31.0511913299561, 30.5730743408203, 
13.667106628418, 7.08161020278931, 6.89333772659302, 13.9001550674438, 
35.5272178649902, 42.4625587463379, 32.9688529968262, 21.4302787780762, 
12.6151924133301, 17.4939270019531, 38.1474113464355, 60.8120536804199, 
65.3665008544922, 53.8765907287598), tirs = c(30.9432029724121, 
31.7461566925049, 32.6338005065918, 32.6965866088867, 31.9309749603271, 
33.7227897644043, 31.1048107147217, 30.1847438812256, 30.2888336181641, 
33.8297653198242, 33.7649192810059, 32.5485496520996, 31.0377178192139, 
30.5556716918945, 30.0720176696777, 29.4081420898438, 31.0848445892334, 
33.8344841003418, 34.1492614746094, 32.9989166259766)), row.names = c(NA, 
20L), class = "data.frame")

Thank you.
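One way to approach this, sketched in base R: given a matrix of rolling coefficients with the intercept in the first column and one row per observation, the prediction for each row is the row-wise product of the design matrix and the coefficients. The helper roll_fitted is hypothetical, and the coefficient matrix below is made up for the illustration rather than produced by roll_lm:

```r
# hypothetical helper: predictions and residuals for the last observation of
# each window, from a coefficient matrix with the intercept in column 1
roll_fitted <- function(x, y, coefficients) {
  fitted <- rowSums(cbind(1, x) * coefficients)
  list(predictions = fitted, residuals = y - fitted)
}

# made-up inputs: every row uses intercept 1 and slopes 2 and 3
x <- matrix(1:6, nrow = 3, ncol = 2)
y <- c(16, 20, 24)
coefs <- matrix(c(1, 2, 3), nrow = 3, ncol = 3, byrow = TRUE)

res <- roll_fitted(x, y, coefs)
p_df <- data.frame(prediction = res$predictions)
r_df <- data.frame(residual = res$residuals)
```

With the objects from the question, the call would presumably be roll_fitted(x, y, m$coefficients), assuming the coefficients element has the intercept first. Rows whose window is incomplete hold NA coefficients and therefore give NA predictions and residuals.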

object '_roll_roll_mean' not found

I installed from github and the below bug appeared.

    > roll::roll_mean(xts(rnorm(100), seq.Date(Sys.Date(), length.out = 100, by=1)), width=5L)
    Error in roll::roll_mean(xts(rnorm(100), seq.Date(Sys.Date(), length.out = 100,  : 
      object '_roll_roll_mean' not found

Removing the GitHub version fixed the problem. I tried it on Debian and the GitHub version works just fine, so the problem seems to be specific to Windows.

Hope this helps. It's a fantastic package.

roll_eigen

Are you able to continue to include the roll_eigen function? I have found it useful and would like to continue to update the package with new enhancements and changes without losing this functionality.

Thanks.

Online algorithm and increasing exponential weights

Issue description

We don’t get the expected results when online = TRUE and weights
is a vector of increasing exponential weights, i.e., within the sliding
window the most recent observation has the smallest weight and the
oldest the highest weight (the weight grows exponentially as we go further back in time).

Minimal example:

roll_sum(1:5,
         width = 2L,
         weights = c(1, 0),
         online = TRUE)

## [1]  NA NaN NaN NaN NaN

# NOT EXP. WEIGHTS BUT PROBLEMATIC CASE
roll_sum(1:5,
         width = 3L,
         weights = c(1, 0, 0),
         online = TRUE)

## [1]  NA  NA NaN NaN NaN

When switching to off-line it works as expected:

roll_sum(1:5,
         width = 2L,
         weights = c(1, 0),
         online = FALSE)

## [1] NA  1  2  3  4

roll_sum(1:5,
         width = 3L,
         weights = c(1, 0, 0),
         online = FALSE)

## [1] NA NA  1  2  3

The above examples might be considered corner case because of the
selected weights.

So let’s consider the following example, which seems to work as
expected:

# ONLINE AND INCR. EXP. WEIGHTS
roll_sum(1:5,
         width = 2L,
         weights = c(1, 0.8),
         online = TRUE)

## [1]  NA 2.6 4.4 6.2 8.0

# OFFLINE AND INCR. EXP. WEIGHTS
roll_sum(1:5,
         width = 2L,
         weights = c(1, 0.8),
         online = FALSE)

## [1]  NA 2.6 4.4 6.2 8.0

The above examples are equal, although not identical. In addition, we
used a vector of length 5. So let’s re-run the above examples by
increasing the input size.

# ONLINE AND INCR. EXP. WEIGHTS
res.online <- roll_sum(1:200,
                       width = 2L,
                       weights = c(1, 0.8),
                       online = TRUE)

# OFFLINE AND INCR. EXP. WEIGHTS
res.offline <- roll_sum(1:200,
                        width = 2L,
                        weights = c(1, 0.8),
                        online = FALSE)

all.equal(res.offline, res.online)

## [1] "Mean relative difference: 2.976919"

identical(res.offline, res.online)

## [1] FALSE

By increasing the vector size to 200, the online version didn’t give the
expected result. To get an idea of the different results between online
and offline, compare the following:

tail(res.offline)

## [1] 350.0 351.8 353.6 355.4 357.2 359.0

tail(res.online)

## [1]  7362.624  9117.580 11310.825 14051.932 17477.865 21759.831

For completeness, for arbitrary weights the algorithm switches to
offline mode and throws a warning:

# ONLINE AND ARB. WEIGHTS
res.online <- roll_sum(1:200,
                       width = 2L,
                       weights = c(1, 0.8, 0.2),
                       online = TRUE)

## Warning in roll_sum(1:200, width = 2L, weights = c(1, 0.8, 0.2), online =
## TRUE): 'online' is only supported for equal or exponential 'weights'

Multiple Test Scenarios

To investigate further, I have created a custom test function to check
the output from base vs roll-online vs roll-offline for sum function.
The custom function accepts two input vectors and has the option to
reverse the weights.

run_check <- function(x, y, wts, rev_wts = FALSE) {

  n <- length(wts)

  if (rev_wts) {
    wts <- rev(wts)
  }

  x_base <- zoo::rollapplyr(x, n, function(x) sum(x * wts), fill = NA)
  y_base <- zoo::rollapplyr(y, n, function(x) sum(x * wts), fill = NA)

  x_online_true <- roll_sum(x, width = n,
                            weights = wts,
                            online = TRUE)

  x_online_false <- roll_sum(x, width = n,
                             weights = wts,
                             online = FALSE)

  y_online_true <- roll_sum(y, width = n,
                            weights = wts,
                            online = TRUE)

  y_online_false <- roll_sum(y, width = n,
                             weights = wts,
                             online = FALSE)

  out <- data.frame(x = c(all.equal(x_base, x_online_false),
                          all.equal(x_online_true, x_online_false)),
                    y = c(all.equal(y_base, y_online_false),
                          all.equal(y_online_true, y_online_false)))

  rownames(out) <- c("base vs offline  : ",
                     "online vs offline: ")
  colnames(out) <- c(paste("x", NROW(x), sep = "_"),
                     paste("y", NROW(y), sep = "_"))

  out
}

Test inputs:

x_100 <- 1:100
x_1000 <- 1:1000

Case: Decreasing Exponential Weights

Takeaways:

All test cases worked as expected.

rv <- FALSE

run_check(x_100, x_1000,  0.5 ^ (3:0), rev_wts = rv)  # works both ways

##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE

run_check(x_100, x_1000,  0.51 ^ (3:0), rev_wts = rv)

##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE

run_check(x_100, x_1000,  0.9 ^ (3:0), rev_wts = rv)

##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE

run_check(x_100, x_1000,  2 ^ (0:3), rev_wts = rv)

##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE

Case: Increasing Exponential Weights

Similar test cases as above with their weights reversed.

Takeaways:

The base vs offline test cases produce equal results.
The online vs offline test cases produce inconsistent results.
Results are correct when the weights either halve or double at each step,
i.e., 0.5 ^ (3:0) or rev(0.5 ^ (3:0)) give the same result
under online and offline.

rv <- TRUE

run_check(x_100, x_1000,  0.5 ^ (3:0), rev_wts = rv)  # works both ways

##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE

run_check(x_100, x_1000,  0.51 ^ (3:0), rev_wts = rv)

##                                           x_100
## base vs offline  :                         TRUE
## online vs offline:  Mean relative difference: 1
##                                          y_1000
## base vs offline  :                         TRUE
## online vs offline:  Mean relative difference: 1

run_check(x_100, x_1000,  0.9 ^ (3:0), rev_wts = rv)

##                     x_100                      y_1000
## base vs offline  :   TRUE                        TRUE
## online vs offline:   TRUE Mean relative difference: 1

run_check(x_100, x_1000,  2 ^ (0:3), rev_wts = rv)

##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE

[DOCS] - type of quantile algorithm

It's great to see that the new version will include a rolling quantile implementation. Would it be possible to document which type of quantile algorithm is used? It looks like Type 2.
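For context on why the type matters, here is a small base-R example using `stats::quantile` and its `type` argument. It only shows how the definitions differ on a toy vector and makes no claim about which one roll implements.

```r
x <- c(1, 2, 3, 4)

# Same data, same probability, three different quantile definitions
q <- sapply(c(type1 = 1, type2 = 2, type7 = 7),
            function(t) unname(quantile(x, probs = 0.25, type = t)))
print(q)
# type1 type2 type7
#  1.00  1.50  1.75
```

Type 1 is the inverse of the empirical CDF, Type 2 is the same but averages at discontinuities, and Type 7 (R's default) interpolates linearly, so the documented type determines which of these values users should expect.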

Weighting in roll_sum

I am trying to calculate a rolling sum with weights but I cannot make sense of the results I am getting.

roll_sum(as.matrix(c(1, 0, 0, 0)), width = 4, weights = c(0.5, 0.3, 0.2, 0.1), min_obs = 1, online = FALSE)

I expected this: 0.5, 0.8, 1.0, 1.1

Is this a bug or am I misunderstanding how the weighting works?

I asked this question here to no avail: https://stackoverflow.com/questions/57737496/how-does-the-weighting-in-the-roll-package-work-exactly
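One possible reading, sketched in base R: the weights are multiplied element-wise with the window, so the last weight applies to the most recent observation (this matches the zoo comparison used in the test function earlier on this page). The handling of partial windows here is an assumption, namely that they use the trailing weights; `weighted_roll_sum` is a hypothetical helper, not part of roll.

```r
weighted_roll_sum <- function(x, weights, min_obs = length(weights)) {
  width <- length(weights)
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    k <- min(i, width)                      # observations available in this window
    if (k < min_obs) next
    idx <- (i - k + 1):i
    w <- weights[(width - k + 1):width]     # assumption: partial windows use trailing weights
    out[i] <- sum(x[idx] * w)               # last weight multiplies the newest observation
  }
  out
}

weighted_roll_sum(c(1, 0, 0, 0), c(0.5, 0.3, 0.2, 0.1), min_obs = 1)
# 0.1 0.2 0.3 0.5
```

Under this alignment the single 1 is weighted by 0.1 when it is the newest observation and by 0.5 once it is the oldest, so the output is not a running total of the weights.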

Dynamic sliding window

Hello! My question is: can I use a dynamic sliding window?

That is, I would like width to be a vector of numbers
rather than a single number:

set.seed(1)
window_size <- sample(1:10, 20, replace = TRUE)
x <- rnorm(length(window_size))

roll::roll_mean(x = x, width = window_size)

That is, I need an analogue of this function

f <- function(x) {
  res <- rep(NA, length(x))
  for (i in 10:length(x)) {
    # window size
    n <- window_size[i] - 1
    # window vector
    ii <- (i - n):i
    res[i] <- mean(x[ii])
  }
  res
}

I understand from the documentation that width only takes one value
(width: integer. Window size.),
but perhaps there is a non-obvious way around this.
Thanks
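A possible workaround outside of roll, sketched in base R under the assumption that `x` has no missing values: with a cumulative sum, the mean over a window of any width ending at position i is a difference of two prefix sums, so a vector of widths can be handled without a loop. `roll_mean_dynamic` is a hypothetical helper name.

```r
roll_mean_dynamic <- function(x, width) {
  stopifnot(length(x) == length(width))
  cs <- c(0, cumsum(x))                 # cs[k + 1] is sum(x[1:k])
  end <- seq_along(x)
  start <- end - width + 1              # first index of each window
  out <- (cs[end + 1] - cs[pmax(start, 1)]) / width
  out[start < 1] <- NA                  # window reaches before the first observation
  out
}

roll_mean_dynamic(c(1, 2, 3, 4, 5), width = c(1, 2, 2, 3, 5))
# 1.0 1.5 2.5 3.0 3.0
```

Prefix sums can lose precision on long series with large values, so this trades some accuracy for speed.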

cov/cor between two metrics

For example, something like
(-1 * correlation(open, volume, 10))
Suppose we have one matrix per metric (open and volume), with one column per symbol. It would be much more convenient to call something like "roll_cor(open, volume, 10)" or "roll_cor(list(open, volume), 10)" rather than converting to vectors and then converting back.
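To make the request concrete, here is a plain base-R sketch of the desired behaviour: for each symbol (column), the rolling correlation between the matching columns of two matrices. `roll_cor_pairwise`, `open_px`, and `volume` are hypothetical names, and the double loop is only meant to pin down the semantics, not to be fast.

```r
roll_cor_pairwise <- function(a, b, width) {
  stopifnot(identical(dim(a), dim(b)))
  out <- matrix(NA_real_, nrow(a), ncol(a), dimnames = dimnames(a))
  if (nrow(a) < width) return(out)
  for (j in seq_len(ncol(a))) {           # one symbol per column
    for (i in width:nrow(a)) {            # window ends at row i
      idx <- (i - width + 1):i
      out[i, j] <- cor(a[idx, j], b[idx, j])
    }
  }
  out
}

set.seed(1)
open_px <- matrix(rnorm(40), nrow = 20, ncol = 2)
volume  <- matrix(rnorm(40), nrow = 20, ncol = 2)
-1 * roll_cor_pairwise(open_px, volume, 10)
```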

roll_lm residual standard error

I am sorry to bother you again. I rely heavily on your roll_lm function. I do pairs trading in the stock market, so I need to analyze the residuals, for example the residual standard error. Would it be feasible to add this functionality? Or could the function return the residuals in the output list, as lm does in base R?
I have tried to read your code, but my C++ skills are weak, so I cannot solve this on my own. Your help would be much appreciated.
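Until something like this is available in roll, the quantity can be sketched in base R for the single-regressor case. The residual standard error of a window is sqrt(RSS / residual degrees of freedom), which `stats::lm.fit` exposes through `residuals` and `df.residual`. `roll_lm_sigma` is a hypothetical helper and refits every window from scratch, so it is slow on long series.

```r
roll_lm_sigma <- function(x, y, width) {
  n <- length(y)
  out <- rep(NA_real_, n)
  if (n < width) return(out)
  for (i in width:n) {
    idx <- (i - width + 1):i
    fit <- lm.fit(cbind(1, x[idx]), y[idx])                   # intercept + slope
    out[i] <- sqrt(sum(fit$residuals^2) / fit$df.residual)    # residual standard error
  }
  out
}

set.seed(1)
x <- rnorm(30)
y <- 1 + 2 * x + rnorm(30)
sigma <- roll_lm_sigma(x, y, width = 10)

# The last window should agree with summary(lm(...))$sigma
all.equal(sigma[30], summary(lm(y[21:30] ~ x[21:30]))$sigma)
```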

Rolling which.min and which.max

Would you consider adding a rolling version of which.min and which.max? Or, do you see a way to compose this using the already implemented rolling functions?
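For reference, the behaviour being asked for can be written as a simple base-R loop. `roll_which_min` is a hypothetical name, and returning the position within the window (rather than within the full series) is an assumption about the desired output.

```r
roll_which_min <- function(x, width) {
  n <- length(x)
  out <- rep(NA_integer_, n)
  if (n < width) return(out)
  for (i in width:n) {
    # position of the minimum inside the window ending at i (1 = oldest)
    out[i] <- which.min(x[(i - width + 1):i])
  }
  out
}

roll_which_min(c(3, 1, 2, 5, 4), width = 3)
# NA NA  2  1  1
```

A rolling which.max follows by swapping in `which.max`.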

Add roll_median

Looking for a rolling median roll_median function. Doesn't look like the most recent CRAN version includes this. Let me know if you are interested in adding.

Thanks, Matt

[Feature Request] Add flag for sample and population standard deviations

Could you add a flag to choose between the sample and the population standard deviation, i.e., whether the divisor in the averaging is N-1 or N?

Thanks
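For reference, the two definitions differ only in the divisor, so one can be converted to the other after the fact. A short base-R illustration (base `sd()` uses N-1):

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

sd_sample     <- sd(x)                              # divisor N - 1
sd_population <- sqrt(sum((x - mean(x))^2) / n)     # divisor N

# Converting between the two only needs the window size
all.equal(sd_population, sd_sample * sqrt((n - 1) / n))
```

Here `sd_population` is exactly 2, while `sd_sample` is slightly larger.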

roll_cov with missing values in columns

Hi Jason,
In a time series setting, we often have new symbols that have a different starting date.

>data
             [,1]        [,2]        [,3]
[1,] -0.012799387 -0.04934229          NA
[2,]  0.020777262 -0.07074330          NA
[3,]  0.009478172 -0.01814425          NA
[4,] -0.052992939  0.02366151          NA
[5,] -0.097409442 -0.14201689 -0.08207020
[6,]  0.051052327  0.05400464  0.02308781

Hence, these columns have leading NAs. The roll_cov function then returns NAs for the entire covariance matrix, even on dates where we would like the covariance of, say, the first 2 columns:

>cov = roll_cov(data,NULL,2)
>cov[,,2]
    [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
[3,]   NA   NA   NA

Ideally, it would return a covariance matrix of 2 by 2:

> cov = roll_cov(data[,1:2],NULL,2)
> cov[,,2]
              [,1]          [,2]
[1,]  0.0005636957 -0.0003592870
[2,] -0.0003592870  0.0002290015

Is this possible?
Thank you
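To pin down the requested behaviour, here is how base R treats a single window of the data above. This is only a sketch of the semantics using `stats::cov`; it does not show or change what roll_cov does.

```r
data <- cbind(
  c(-0.012799387, 0.020777262, 0.009478172, -0.052992939, -0.097409442, 0.051052327),
  c(-0.04934229, -0.07074330, -0.01814425, 0.02366151, -0.14201689, 0.05400464),
  c(NA, NA, NA, NA, -0.08207020, 0.02308781)
)

window <- data[1:2, ]   # the width-2 window ending at row 2

# Option 1: pairwise deletion keeps the 3 x 3 shape, NA only where a pair is missing
cov_pairwise <- cov(window, use = "pairwise.complete.obs")

# Option 2: drop columns that are not fully observed in this window
keep <- colSums(is.na(window)) == 0
cov_observed <- cov(window[, keep, drop = FALSE])
```

Both options reproduce the 2 by 2 block from the issue for the first two columns; they differ only in whether the result keeps a row and column of NAs for the symbol that has not started yet.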