kevinushey / rcpproll

Fast rolling functions through Rcpp

R 45.34% C++ 54.66%

rcpproll's Introduction

RcppRoll

This package provides windowed versions of commonly used mathematical and statistical functions.

Install the latest release from CRAN with:

install.packages("RcppRoll")

Or, install the development version with:

devtools::install_github("kevinushey/RcppRoll")

rcpproll's Issues

argument n should be NULL by default...

...or something similar.

# the outputs of these function calls are identical (n doesn't alter the result)
roll_meanr(1:10, n = 0, c(0, 1))
roll_meanr(1:10, n = 2, c(0, 1))
roll_meanr(1:10, n = 100, c(0, 1))

# however, you have to give it an arbitrary value as the following throws an error
roll_meanr(1:10, weights = c(0, 1))

maybe warn when weights and n differ and weights wins?

hi. thanks very much for RcppRoll. but... :)

i find the following behavior odd

require(RcppRoll)
Loading required package: RcppRoll
roll_sum(1:9, n=2)
[1] 3 5 7 9 11 13 15 17
roll_sum(1:9, weights=1, n=2)
[1] 1 2 3 4 5 6 7 8 9
roll_sum(1:9, weights=c(1,1), n=2)
[1] 3 5 7 9 11 13 15 17

i would have expected weights to have been recycled, but, if i understand the code correctly, n is forced to the length of weights (if they are specified). i wonder if you might think of generating a warning message in this case? i certainly spent time scratching my head puzzling over this.

cheers!
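
Both reports above describe the same behavior: when weights is supplied, the window width appears to follow length(weights) and n is ignored. A minimal base R sketch of the kind of check being asked for follows. The helper name resolve_window() is hypothetical and not part of RcppRoll; it only illustrates the proposed rule (n may be NULL, and a mismatch triggers a warning).

```r
# Hypothetical helper (not part of RcppRoll): decide the window width from
# `n` and `weights`, warning when the two disagree.
resolve_window <- function(n = NULL, weights = NULL) {
  if (is.null(weights)) {
    if (is.null(n)) stop("supply at least one of 'n' or 'weights'")
    return(list(n = as.integer(n), weights = NULL))
  }
  if (!is.null(n) && n != length(weights)) {
    warning("'n' (", n, ") differs from length(weights) (",
            length(weights), "); using length(weights)")
  }
  list(n = length(weights), weights = weights)
}

resolve_window(weights = c(0, 1))$n   # n omitted: width taken from weights
resolve_window(n = 3)$n               # no weights: width taken from n
```

Calling resolve_window(n = 2, weights = 1) under this sketch would warn and fall back to a width of 1, which is the situation the second report stumbled over.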

Behavior of na.rm = T in roll_median

I'm confused by what na.rm = TRUE is supposed to do in roll_median:

library(RcppRoll)
x <- 1:20
x[3:4] <- NA
x

roll_medianr(x = x,n = 5,na.rm = TRUE)
[1] NA NA NA NA  2  2 NA NA  7  8  9 10 11 12 13 14 15 16 17 18
roll_maxr(x = x,n = 5,na.rm = TRUE)
[1] NA NA NA NA  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

Why is the median in position 6 equal to 2 rather than 3.5? Why are there NAs in positions 7 and 8?

Am I missing something obvious or is this a bug?
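
For comparison, here is a slow base R reference, assuming na.rm = TRUE is meant to drop the NAs inside each window before taking the median. The helper ref_medianr() is hypothetical and is not the package's implementation.

```r
# Reference sketch in base R (hypothetical helper, not RcppRoll code):
# right-aligned rolling median that can drop NAs inside each window.
ref_medianr <- function(x, n, na.rm = FALSE) {
  out <- rep(NA_real_, length(x))
  if (length(x) < n) return(out)
  for (i in n:length(x)) {
    out[i] <- median(x[(i - n + 1):i], na.rm = na.rm)
  }
  out
}

x <- 1:20
x[3:4] <- NA
ref_medianr(x, n = 5, na.rm = TRUE)[5:8]
# [1] 2.0 5.0 6.0 6.5
```

Under this reading, positions 7 and 8 would hold values rather than NA, so the reported output does not look like plain NA removal within each window.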

Rewrite + take advantage of RcppParallel

A lot of the internals should be rewritten (as this package was generated when I was less proficient with C++ / Rcpp).

In addition, it would be awesome if we could take advantage of RcppParallel for easily parallelizing operations both:

Across columns of a matrix, and
Within columns (by splitting ranges)

Argument "align" has no effect on function roll_mean

The "align" argument has no effect on the behavior of function roll_mean(). The function roll_mean() always calculates the rolling mean using a centered window, regardless of the "align" argument.

da_ta <- cumsum(rnorm(500))
foo <- RcppRoll::roll_mean(da_ta, align="left", n=51)
bar <- RcppRoll::roll_mean(da_ta, align="right", n=51)
all.equal(foo, bar)
	[1] TRUE

Align = "middle" vs "center"

For roll_mean(), the documentation specifies align can be {"left", "middle", "right"} but the function specifies {"left", "center", "right"}.

Lines 36 and 23 of this file: https://github.com/kevinushey/RcppRoll/blob/master/R/RcppRoll.R

Sums at end of sequence - partial?

I have a sequence
0 1 0 1 1 1 1 0 1 0 1 0
I want the rolling sum of 3 periods with a right align.
1, 2, 2, 3, 3, 2, 2, 1, 2, 1, 1, 0.
Instead I am getting
1, 2, 2, 3, 3, 2, 2, 1, 2, 1, NA, NA
Is this what partial is supposed to solve? How can I get the last n-1 periods to sum correctly?
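
The desired output corresponds to windows that cover the current element and the next two, shrinking at the end of the sequence. A base R sketch of that computation follows; partial_suml() is a hypothetical helper, and whether a partial option in RcppRoll behaves this way is not confirmed here.

```r
# Hypothetical helper (not RcppRoll code): rolling sum over the current
# element and the next n - 1, letting the window shrink at the end.
partial_suml <- function(x, n) {
  len <- length(x)
  vapply(seq_along(x),
         function(i) sum(x[i:min(i + n - 1, len)]),
         numeric(1))
}

x <- c(0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0)
partial_suml(x, 3)
# [1] 1 2 2 3 3 2 2 1 2 1 1 0
```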

roll_max(c(NA, NA, 3, 2, 1), n=2L, fill=NA, align="right", na.rm=TRUE) now gives c(NA, -Inf, 3, 3, 2)

roll_max(c(NA, NA, 3, 2, 1), n=2L, fill=NA, align="right", na.rm=TRUE) now gives c(NA, -Inf, 3, 3, 2).
If I remember correctly, previous package versions gave the "desired" result c(NA, NA, 3, 3, 2).

Similarly, roll_min(c(NA, NA, 3, 2, 1), 2L, fill=NA, align="right", na.rm=TRUE) results in c(NA, Inf, 3, 2, 1).

I currently have package version 0.3.0 installed on R version 4.1.1

Thank you.

Inconsistent Handling of NAs using roll_median()

Hi there,

Thank you for developing this extremely useful package.

When using roll_median() in the presence of NA values the output is inconsistent with all other functions, e.g. roll_mean(), roll_sum(), etc.

According to the documentation, all of the parameters are the same by default across roll_...() functions.

# a vector with NAs
vec <- c(1:5, rep(NA,3), 6:10, NA)

# expected output of computed & NA values
roll_mean(vec, 5)
roll_var(vec, 5)

# output using roll_median()
roll_median(vec, 5)

Also, for this function it makes no difference whether na.rm is TRUE or FALSE.

Thank you.

Cheers,
Christian

Add examples to make the package more user-friendly

In the README or in a separate vignette. I ended up going through https://www.mytinyshinys.com/2017/05/10/rcpproll/ in order to make sure that I am using it right. It would be helpful to add such examples to the package documentation.

not working with by > 1

With the by argument set to > 1 the left out points are returned as random/uninitialized values.
Example:
roll_mean(rep(1,20), n = 3, by = 3)
[1] 1.000000e+00 1.071419e+200 9.385277e+223 1.000000e+00 4.311955e-80 1.840099e-80 1.000000e+00 7.724200e-37 2.647809e-32
[10] 1.000000e+00 1.426530e-71 3.791529e+179 1.000000e+00 3.315553e-33 5.391965e+241 1.000000e+00 7.096584e-110 9.486060e-322

A workaround is:
foo = roll_mean(rep(1,20), n = 3, by = 3)
foo = foo[seq(1, length(foo), by = 3)]
but it would be nice if this was done inside the roll_ functions.

Using RcppRoll with Performance Analytics

Hi,

I would like to use functions defined in the PerformanceAnalytics package, such as Burke Ratio, MaxDrawdown and Sharpe Ratio, and then roll them using your package, as I believe it will give me substantial performance benefits. My knowledge of C++ is limited, and it would be very difficult for me to recreate those PerformanceAnalytics R functions in C++. Can I use your package to roll those R functions, or do I have to rewrite them in C++?

Regards
SD

Request for roll_mad

It would be great if you could add a roll_mad (for median absolute deviation: median(abs(Xi - median(X)))) to the functions of your package.
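
A slow base R sketch of what such a function would compute, using the formula from the request. ref_mad() is a hypothetical name; it returns the unscaled deviation over complete windows only, which matches stats::mad() with constant = 1.

```r
# Hypothetical helper (not RcppRoll code): unscaled rolling median absolute
# deviation over complete windows of width n.
ref_mad <- function(x, n) {
  if (length(x) < n) return(numeric(0))
  starts <- seq_len(length(x) - n + 1)
  vapply(starts, function(i) {
    w <- x[i:(i + n - 1)]
    median(abs(w - median(w)))
  }, numeric(1))
}

ref_mad(c(1, 2, 3, 4, 100), n = 3)
# [1] 1 1 1
```

The outlier 100 leaves the last window's value at 1, which is the robustness that motivates the request.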

Update RcppRoll for R 4.0

The CRAN version of RcppRoll is not available for R version 4.

rownames, colnames lost after applying roll_* function

Row names and column names are lost after applying a roll_* function to a matrix.

roll_median is crashing

Randomly, from time to time, either my R crashes or I get an error message by roll_median_..._numeric about "NULL provided where numeric is expected" or "internal provided where numeric is expected" or something like that.

It's happening across many different computers. It may work for a good long time, but then it may crash in the first few minutes as well.

I'm using it exclusively with a window of 3. I have no idea how to provide more information.

roll_any and roll_all

Not often that useful, but nice to have for completeness.

Error installing from source on OSX 10.8

Hello Kevin..

I am trying to install RcppRoll from source using install_github("kevinushey/RcppRoll") but i am getting the following error (which I couldn't decipher); and i was wondering if you knew the cause/reason?

Thanks

---------------------- error output below this line---------------

Installing github repo RcppRoll/master from kevinushey
Downloading master.zip from https://github.com/kevinushey/RcppRoll/archive/master.zip
Installing package from /var/folders/jc/md64bd2x7nq0_j5ppqp89k_00000gn/T//Rtmp6Mmi9i/master.zip
arguments 'minimized' and 'invisible' are for Windows only
Installing RcppRoll
'/Library/Frameworks/R.framework/Resources/bin/R' --vanilla CMD INSTALL
'/private/var/folders/jc/md64bd2x7nq0_j5ppqp89k_00000gn/T/Rtmp6Mmi9i/devtools4b25b9accdc/RcppRoll-master'
--library='/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages' --install-tests

* installing source package 'RcppRoll' ...
** libs
g++ -arch x86_64 -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG -I/usr/local/include -I"/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include" -I"/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/RcppArmadillo/include" -fPIC -mtune=core2 -g -O2 -c RcppExports.cpp -o RcppExports.o
g++ -arch x86_64 -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG -I/usr/local/include -I"/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include" -I"/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/RcppArmadillo/include" -fPIC -mtune=core2 -g -O2 -c rollit.cpp -o rollit.o
rollit.cpp:245:14: error: no member named 'isnan' in the global namespace; did you mean 'is_nan'?
if (!::isnan(x[offset + i])) {
~~^~~~~
is_nan
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: 'is_nan' declared here
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:245:12: error: no matching function for call to 'is_nan'
if (!::isnan(x[offset + i])) {
^~~~~~~
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: candidate template ignored: could not match 'VectorBase<RTYPE, na, type-parameter-0-2>' against 'const double'
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:260:14: error: no member named 'isnan' in the global namespace; did you mean 'is_nan'?
if (!::isnan(x[offset + i])) {
~~^~~~~
is_nan
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: 'is_nan' declared here
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:260:12: error: no matching function for call to 'is_nan'
if (!::isnan(x[offset + i])) {
^~~~~~~
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: candidate template ignored: could not match 'VectorBase<RTYPE, na, type-parameter-0-2>' against 'const double'
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:324:14: error: no member named 'isnan' in the global namespace; did you mean 'is_nan'?
if (!::isnan(x[offset + i])) {
~~^~~~~
is_nan
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: 'is_nan' declared here
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:324:12: error: no matching function for call to 'is_nan'
if (!::isnan(x[offset + i])) {
^~~~~~~
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: candidate template ignored: could not match 'VectorBase<RTYPE, na, type-parameter-0-2>' against 'const double'
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:337:14: error: no member named 'isnan' in the global namespace; did you mean 'is_nan'?
if (!::isnan(x[offset + i])) {
~~^~~~~
is_nan
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: 'is_nan' declared here
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:337:12: error: no matching function for call to 'is_nan'
if (!::isnan(x[offset + i])) {
^~~~~~~
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: candidate template ignored: could not match 'VectorBase<RTYPE, na, type-parameter-0-2>' against 'const double'
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:357:13: error: no member named 'isnan' in the global namespace; did you mean 'is_nan'?
if (::isnan(x[offset + i])) {
~~^~~~~
is_nan
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: 'is_nan' declared here
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:357:11: error: no matching function for call to 'is_nan'
if (::isnan(x[offset + i])) {
^~~~~~~
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: candidate template ignored: could not match 'VectorBase<RTYPE, na, type-parameter-0-2>' against 'const double'
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:371:13: error: no member named 'isnan' in the global namespace; did you mean 'is_nan'?
if (::isnan(x[offset + i])) {
~~^~~~~
is_nan
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: 'is_nan' declared here
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:371:11: error: no matching function for call to 'is_nan'
if (::isnan(x[offset + i])) {
^~~~~~~
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: candidate template ignored: could not match 'VectorBase<RTYPE, na, type-parameter-0-2>' against 'const double'
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:422:13: error: no member named 'isnan' in the global namespace; did you mean 'is_nan'?
if (::isnan(x[offset + i])) {
~~^~~~~
is_nan
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: 'is_nan' declared here
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:422:11: error: no matching function for call to 'is_nan'
if (::isnan(x[offset + i])) {
^~~~~~~
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: candidate template ignored: could not match 'VectorBase<RTYPE, na, type-parameter-0-2>' against 'const double'
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:437:13: error: no member named 'isnan' in the global namespace; did you mean 'is_nan'?
if (::isnan(x[offset + i])) {
~~^~~~~
is_nan
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: 'is_nan' declared here
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:437:11: error: no matching function for call to 'is_nan'
if (::isnan(x[offset + i])) {
^~~~~~~
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: candidate template ignored: could not match 'VectorBase<RTYPE, na, type-parameter-0-2>' against 'const double'
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:455:13: error: no member named 'isnan' in the global namespace; did you mean 'is_nan'?
if (::isnan(x[offset + i])) continue;
~~^~~~~
is_nan
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: 'is_nan' declared here
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:455:11: error: no matching function for call to 'is_nan'
if (::isnan(x[offset + i])) continue;
^~~~~~~
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: candidate template ignored: could not match 'VectorBase<RTYPE, na, type-parameter-0-2>' against 'const double'
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
rollit.cpp:468:13: error: no member named 'isnan' in the global namespace; did you mean 'is_nan'?
if (::isnan(x[offset + i])) continue;
~~^~~~~
is_nan
/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/Rcpp/include/Rcpp/sugar/functions/is_nan.h:49:33: note: 'is_nan' declared here
inline sugar::IsNaN<RTYPE,NA,T> is_nan( const Rcpp::VectorBase<RTYPE,NA,T>& t){
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make: *** [rollit.o] Error 1
ERROR: compilation failed for package 'RcppRoll'

removing '/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/RcppRoll'
restoring previous '/Users/ahmednagi/Documents/R_WorkSpaces/Rpackages/RcppRoll'

get_rollit_source() fails: "object 'outFile' not found"

> library('RcppRoll')
> get_rollit_source(roll_max, edit=FALSE)
Error in get("outFile", envir = environment(fun)) : 
  object 'outFile' not found
> ls(environment(roll_max))
 [1] "get_rollit_source" "roll_max"          "roll_maxl"         "roll_maxr"         "roll_mean"        
 [6] "roll_meanl"        "roll_meanr"        "roll_median"       "roll_medianl"      "roll_medianr"     
[11] "roll_min"          "roll_minl"         "roll_minr"         "roll_prod"         "roll_prodl"       
[16] "roll_prodr"        "roll_sd"           "roll_sdl"          "roll_sdr"          "roll_sum"         
[21] "roll_suml"         "roll_sumr"         "roll_var"          "roll_varl"         "roll_varr"        
[26] "rollit"            "rollit_example"    "rollit_raw"

Perhaps your package organization scheme changed from when this function was written?

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin16.7.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RcppRoll_0.2.2

loaded via a namespace (and not attached):
[1] compiler_3.4.2 tools_3.4.2    yaml_2.1.14    Rcpp_0.12.13

`na.rm` Argument to Deal with Missing Values

Something like this:

> library(zoo)
> ( x <- c(2, 4, 6, NA, 8) )
[1]  2  4  6 NA  8
> rollapplyr(x, 2, mean, na.rm = F)   # same as `roll_mean(x, 2)`
[1]  3  5 NA NA
> rollapplyr(x, 2, mean, na.rm = T)   # `roll_mean(x, 2, na.rm = T)`
[1] 3 5 6 8

Can't find rollit in latest version

Kevin, is there any replacement for the rollit?

Argument to pad with NAs when n > 1

Hi Kevin, just wondering if you might consider including an option to pad the returned vectors/matrices with NAs when n > 1?

I'd love to be able to do something like this

data.frame(a = 1:10, b = roll_sum(1:10, 3, padNA = TRUE))

instead of this

data.frame(a = 1:10, b = c(rep(NA,2),roll_sum(1:10, 3))).

Of course data.frame(a = 1:x, b = roll_sum(1:x, n = 3)) throws an error because length of roll_sum(x,n) for n > 1 is shorter than length(1:x).

Separate logic into 'populate' and 'fill' methods

populate will fill a vector with values from the actual function applied for each method, while fill will fill that vector with e.g. NA.

These functions could potentially run in parallel since they do not read or write the same memory units.

Use testthat

For unit tests.

Update to CRAN version

Hi Kevin. I am using your RcppRoll package from github, and will use it as a dependency in a package I am writing for identifying anomalous time series. In order to upload my package to CRAN, it would be useful if the CRAN version of RcppRoll was updated to match the version on github. Would that be possible?

Thanks.
Rob

Feature request for roll_quantile

Needed for skewed distributions.

Seems like it should be an easy extension of roll_median.
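
As an illustration of the requested behavior only, a base R sketch that applies quantile() to each complete window. ref_quantile() is a hypothetical helper, and it uses R's default quantile type rather than anything specific to roll_median.

```r
# Hypothetical helper (not RcppRoll code): rolling quantile over complete
# windows of width n, using base R's default quantile definition (type 7).
ref_quantile <- function(x, n, prob) {
  if (length(x) < n) return(numeric(0))
  starts <- seq_len(length(x) - n + 1)
  vapply(starts,
         function(i) quantile(x[i:(i + n - 1)], probs = prob, names = FALSE),
         numeric(1))
}

ref_quantile(1:5, n = 3, prob = 0.5)   # same as a rolling median here
```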

Incorrect mean values with na.rm = T

> tmp = c(1,1,1,1,NA,NA,NA,NA,1,1)
> roll_mean(tmp, 4, c(1,3,3,1), na.rm=T)
[1] 1.000000 1.166667 1.000000 0.500000      NaN 0.500000 1.000000

1.166667 and 0.5 are unexpected values. There should be only 1s, and one NaN.

This seems to work correctly when a default weighting is used:

> roll_mean(tmp, 4, na.rm=T)
[1]   1   1   1   1 NaN   1   1

rolling with multivariable

Any ideas about rolling with multiple variables for a matrix or data frame (data table)?

Roll backwards

Would it be possible to implement backwards rolling? It would be a very convenient way to implement multi-period conditions when handling panel data.

library(tidyverse)
#> + ggplot2 2.2.1          Date: 2017-05-29
#> + tibble  1.3.3             R: 3.3.2
#> + tidyr   0.6.3            OS: Windows 10 x64
#> + readr   1.1.1           GUI: RTerm
#> + purrr   0.2.2.2      Locale: Danish_Denmark.1252
#> + dplyr   0.5.0            TZ: Europe/Paris
#> + stringr 1.2.0        
#> + forcats 0.2.0
#> Warning: package 'ggplot2' was built under R version 3.3.3
#> Warning: package 'tibble' was built under R version 3.3.3
#> Warning: package 'tidyr' was built under R version 3.3.3
#> Warning: package 'readr' was built under R version 3.3.3
#> Warning: package 'purrr' was built under R version 3.3.3
#> Warning: package 'dplyr' was built under R version 3.3.3
#> Warning: package 'forcats' was built under R version 3.3.3
#> -- Conflicts ----------------------------------------------------
#> * filter(),  from dplyr, masks stats::filter()
#> * lag(),     from dplyr, masks stats::lag()
library(RcppRoll)
#> Warning: package 'RcppRoll' was built under R version 3.3.3

df <- tibble(id = rep(1, each = 10), wage = c(980, rep(1000, each = 9)))

df %>%
  mutate(wagecri = if_else(wage >= 1000, 1, 0)) %>%
  mutate(crimean = roll_meanr(wagecri, n = 2))
#> # A tibble: 10 x 4
#>       id  wage wagecri crimean
#>    <dbl> <dbl>   <dbl>   <dbl>
#>  1     1   980       0      NA
#>  2     1  1000       1     0.5
#>  3     1  1000       1     1.0
#>  4     1  1000       1     1.0
#>  5     1  1000       1     1.0
#>  6     1  1000       1     1.0
#>  7     1  1000       1     1.0
#>  8     1  1000       1     1.0
#>  9     1  1000       1     1.0
#> 10     1  1000       1     1.0

# I would like the result to be as follows


tibble(id = rep(1, each = 10), wage = c(980, rep(1000, each = 9))) %>%
  mutate(wagecri = if_else(wage >= 1000, 1, 0)) %>%
  mutate(crimean = c(0.5, rep(1, each = 8), NA_real_))
#> # A tibble: 10 x 4
#>       id  wage wagecri crimean
#>    <dbl> <dbl>   <dbl>   <dbl>
#>  1     1   980       0     0.5
#>  2     1  1000       1     1.0
#>  3     1  1000       1     1.0
#>  4     1  1000       1     1.0
#>  5     1  1000       1     1.0
#>  6     1  1000       1     1.0
#>  7     1  1000       1     1.0
#>  8     1  1000       1     1.0
#>  9     1  1000       1     1.0
#> 10     1  1000       1      NA

Best regards,
Jakob
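
The desired column is what a forward-looking (left-aligned) window padded with NA at the end would produce. A base R sketch follows; lead_mean() is a hypothetical helper. The package namespace listed elsewhere on this page includes roll_meanl, which may cover this case, but that is not verified here.

```r
# Hypothetical helper (not RcppRoll code): mean of the current element and
# the next n - 1, with NA where the window runs past the end.
lead_mean <- function(x, n) {
  out <- rep(NA_real_, length(x))
  last_start <- length(x) - n + 1
  if (last_start < 1) return(out)
  for (i in seq_len(last_start)) {
    out[i] <- mean(x[i:(i + n - 1)])
  }
  out
}

wagecri <- c(0, rep(1, 9))
lead_mean(wagecri, n = 2)
# [1] 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0  NA
```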

Functionality for missing values in a time series

I am finding RcppRoll very convenient to use in conjunction with dplyr with one caveat: if I am doing rolling summaries over a numeric vector which is indexed by date (or a time period), then I may still want this to be used for calculating the rolling window (with value 0). It is analogous to using OLAP functions in SQL with range.

mutate_over() is my most recent attempt at implementing this functionality. @kevinushey @hadley I am wondering whether something like this will sit in RcppRoll or dplyr in the future?

rollit function not available

Hi Kevin,
I hope you are well. Greetings from Lusaka.
I have installed RcppRoll from GitHub:
devtools::install_github("kevinushey/RcppRoll")

I have tried the standard functions roll_mean, roll_sum and roll_sd. They work great. Now I want to create my own rolling function with rollit, but there is no such function in the package.

What am I missing?

Using user defined function in RcppRoll

Hello, I recently came across RcppRoll. Sorry for the stupid question, but does the rollit function allow the use of a user-defined function (an R function or a C++ function)?
I can see that calling an R function from Rcpp might be even worse than looping in R itself, but it would provide some kind of sugar in some cases. Typically, using data.table's by or the equivalent dplyr behaviour, one could call rolling functions by group...
Do I understand correctly?
If so, even if it seems trivial, it might be worth adding a few words to the vignette so people won't expect this feature (and spend time trying :) ).

Thanks for providing this package to the community

Is this package still maintained?

RcppRoll roll_median na.rm doesn't work

I ran into this problem. It seems 'na.rm' in roll_median doesn't work.

Weight vector is normalized after call with normalize = TRUE

First, thanks for making RcppRoll available! It's very helpful for a new data science class that I'm teaching, especially because I can't expect my students to code Rcpp snippets.

Here's a little bug that I ran into. The normalize = TRUE flag will result in w itself being normalized. I can easily avoid this with weights = copy(w) but nonetheless I doubt this is the intended behavior.

Example:

x = 1:6
w = c(0.5, 0.5, 1.0)
roll_sum(x, n = 3L, weights = w, normalize = TRUE, align = "right",  fill = NA)

The result is that

> w
[1] 0.75 0.75 1.50

Feature request for roll_ema (rolling exponential moving average)

It would be nice if we can have a roll_ema

roll_sum overwrites weights when renormalizing them

An annoying and quite dangerous bug is that the weights renormalization overwrites the given weights vector instead of creating an internal temporary one:

> a = 1:25
> d <- data.frame(w = c(0.2, 0.1, 0.1, 0.05, 0.05) ) # the total weight is 0.5 on purpose
> d
     w
1 0.20
2 0.10
3 0.10
4 0.05
5 0.05
> roll_sum(a, n = length(d$w), weights = d$w)
 [1]  11.5  16.5  21.5  26.5  31.5  36.5  41.5  46.5  51.5  56.5
       61.5  66.5  71.5  76.5  81.5  86.5  91.5  96.5 101.5 106.5 111.5

> d
    w
1 2.0
2 1.0
3 1.0
4 0.5
5 0.5

The fix should be quite simple, just creating a local weight vector inside the functions

Moreover, the weights renormalization seems to be enabled by default, which would be worth mentioning in the popup help.

Thanks for all the work and this wonderful library.

segfault -- vector length smaller than window length

If you repeatedly call a rolling function, e.g. roll_sumr and the vector length is smaller than the window length, then after some time it will segfault.

If the segfault bug is fixed, I think the current behavior of the dev version on github is better than v0.1.0 on CRAN, i.e. to just return a NA-vector (of appropriate class) that has the vector length (as opposed to the longer window length).

Consider a default fill = `NA` for rollr, rolll

Because why would someone care how a window is aligned unless you planned on filling in the holes with something?

weighted median

I think the weight is placed on the ordered values, not on the original values passed in.
E.g.
> roll_medianr(5:3, 3, weight = c(1,0,0))
gives NA NA 3 (1 x smallest, 0 x 4, 0 x 5),
not NA NA 5 as expected (1 x 5, 0 x 4, 0 x 3).

My guess is this stems from sorting first without respect to the weights...
NumericVector copy(x.begin() + offset, x.begin() + offset + n);
std::sort(copy.begin(), copy.end());

double weights_sum = sum(weights);

int k = 0;
double sum = weights_sum - weights[0];

while (sum > weights_sum / 2) {
  ++k;
  sum -= weights[k];
}

return copy[k];
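
One way to keep each weight attached to its original value is to reorder the weights together with the values before accumulating. A base R sketch of that idea follows. weighted_median() is a hypothetical helper implementing a lower weighted median, not the package's code or a confirmed fix.

```r
# Hypothetical sketch (not the package's code): lower weighted median that
# reorders the weights together with the values.
weighted_median <- function(x, weights) {
  ord <- order(x)
  xs <- x[ord]          # values in ascending order
  ws <- weights[ord]    # each weight follows its own value
  half <- sum(ws) / 2
  xs[which(cumsum(ws) >= half)[1]]
}

weighted_median(5:3, c(1, 0, 0))   # 5: all weight sits on the value 5
weighted_median(5:3, c(1, 1, 1))   # 4: equal weights give the plain median
```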

Adding na.locf

I think it would be nice to have the equivalent of zoo::na.locf in RcppRoll as well.

And it would be nice to limit how far the NAs are carried forward (or backward if fromLast=T). This is similar to the maxgap argument, but the difference is that maxgap just kills everything (which I think is not good, or deserves an additional argument):

x <- 1:8
x[4:5] <- NA
na.locf(x, maxgap = 1)     # Current `zoo` behavior.
[1]  1  2  3 NA NA  6  7  8
na.locf(x, maxcarry = 1)     # This is what I mean.
[1]  1  2  3  3 NA  6  7  8
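
A base R sketch of the proposed semantics follows. The function locf_maxcarry() and its maxcarry argument are hypothetical (the argument name comes from the request above and exists in neither zoo nor RcppRoll); the sketch carries the last observation forward at most maxcarry steps and leaves the rest of the gap as NA.

```r
# Hypothetical helper: last observation carried forward, but at most
# `maxcarry` consecutive positions per gap.
locf_maxcarry <- function(x, maxcarry = Inf) {
  carried <- 0L
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) {
      # Fill only if the previous slot holds a value and the budget allows.
      if (!is.na(x[i - 1]) && carried < maxcarry) {
        x[i] <- x[i - 1]
        carried <- carried + 1L
      }
    } else {
      carried <- 0L   # a real observation resets the budget
    }
  }
  x
}

x <- 1:8
x[4:5] <- NA
locf_maxcarry(x, maxcarry = 1)
# [1]  1  2  3  3 NA  6  7  8
```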