devtools::install_github("clarkjoe/overtime", dependencies = TRUE)
The intent of overtime is to help machine learning developers generate a large number of summary statistics very quickly. Many summary statistics are computed by default, and more will be added in future releases. The default summary statistics are:
- sum
- mean
- median
- sd
- max
- min
- sd / mean (coefficient of variation)
- OO2
- OO3
- Largest positive sequence
- Largest negative sequence
- Largest zero sequence
- Largest increasing sequence
- Largest decreasing sequence
- Largest increasing positive sequence
- Largest decreasing positive sequence
- Largest increasing negative sequence
- Largest decreasing negative sequence
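To make a few of these statistics concrete, here is a minimal base R sketch. The helper names `coef_var` and `longest_run` are hypothetical and are not part of overtime; the reading of "largest sequence" as the longest run of consecutive values meeting a condition is also an assumption, and the package may define it differently.

```r
# Hypothetical helpers for illustration only; not the overtime implementation.

# Coefficient of variation: standard deviation relative to the mean.
coef_var <- function(x) sd(x) / mean(x)

# Length of the longest run of consecutive TRUE values in a logical vector.
longest_run <- function(flag) {
  runs <- rle(flag)
  hits <- runs$lengths[runs$values]
  if (length(hits) == 0) 0L else max(hits)
}

x <- c(1, 5, 7, 0, 0, 3, 2, -1)

cv <- coef_var(x)

largest_positive_seq <- longest_run(x > 0)   # 3 (the values 1, 5, 7)
largest_zero_seq     <- longest_run(x == 0)  # 2 (the values 0, 0)
largest_negative_seq <- longest_run(x < 0)   # 1 (the value -1)

# Counts consecutive increasing steps, assuming that is what
# "increasing sequence" means: 1 -> 5 -> 7 is two steps.
largest_increasing_steps <- longest_run(diff(x) > 0)  # 2
```

Using `rle()` keeps each "largest sequence" statistic to a single condition, so the positive, negative, and zero variants differ only in the comparison passed in.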
The package itself is small, but it applies to a wide range of use cases. Below is a simple example:
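library(overtime)
library(tidyverse)
library(magrittr)

# Load the raw data (it must follow the format described below)
data <- readRDS('../data/rawData.rds')

# Compute the summary statistics by day; the result is nested per group
nestedData <- data %>%
  overtime_by("day") %>%
  overtime_get()

# Expand the nested tibbles into one column per statistic
unnestedData <- nestedData %>%
  overtime_unnest()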
BOOM! It's that simple.
AccountNumber | D_Cognostics |
---|---|
A | tibble [1 x 18] |
B | tibble [1 x 18] |
C | tibble [1 x 18] |
AccountNumber | D_Count | D_Count | D_Count | D_Mean | D_Median | D_SD | D_Max | D_Min | ... |
---|---|---|---|---|---|---|---|---|---|
A | 13 | 13 | 13 | 4.33 | 5 | 3.06 | 7 | 1 | ... |
B | 10 | 10 | 10 | 3.33 | 4 | 2.08 | 5 | 1 | ... |
C | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... |
Currently, only a specific data format works with overtime. Here is an example data format:
AccountNumber | Date | Count |
---|---|---|
A | 2014-11-01 | 1 |
A | 2014-11-02 | 4 |
B | 2014-11-01 | 0 |
B | 2014-11-02 | 12 |
C | 2014-11-02 | 8 |
C | 2014-11-01 | 47 |
There must be:
- A grouping variable
- Continuous dates
  - All groups must have an equal number of dates
  - There can be no date jumps (e.g. 2014-11-01 | 2014-11-03)
- Numeric counts
  - Each cell must be a non-negative integer (e.g. no NA or -1)
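The requirements above can be checked before calling overtime. The following is a minimal base R sketch under stated assumptions: `check_overtime_input` is a hypothetical helper, not a function exported by the package, and the column names are taken from the example table. It treats zero as a valid count because the example data contains a zero.

```r
# Hypothetical validator for illustration; not part of the overtime API.
check_overtime_input <- function(df, group = "AccountNumber",
                                 date = "Date", count = "Count") {
  dates    <- as.Date(df[[date]])
  counts   <- df[[count]]
  problems <- character(0)

  # Counts must be whole numbers with no NA and no negative values.
  if (!is.numeric(counts) || anyNA(counts) ||
      any(counts < 0) || any(counts != floor(counts))) {
    problems <- c(problems, "counts must be non-negative integers without NA")
  }

  by_group <- split(dates, df[[group]])

  # Every group must cover the same number of dates.
  if (length(unique(lengths(by_group))) > 1) {
    problems <- c(problems, "groups have unequal numbers of dates")
  }

  # Within a group, sorted dates must advance by exactly one day,
  # which also rejects duplicated dates.
  has_gap <- vapply(by_group,
                    function(d) any(diff(sort(d)) != 1),
                    logical(1))
  if (any(has_gap)) {
    problems <- c(problems, "dates are not continuous within every group")
  }

  problems
}

example <- data.frame(
  AccountNumber = c("A", "A", "B", "B", "C", "C"),
  Date = c("2014-11-01", "2014-11-02", "2014-11-01",
           "2014-11-02", "2014-11-01", "2014-11-02"),
  Count = c(1, 4, 0, 12, 47, 8),
  stringsAsFactors = FALSE
)

# An empty character vector means no problems were found.
problems <- check_overtime_input(example)
```

Returning a vector of messages rather than stopping at the first failure makes it possible to report every violated requirement at once.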
An article that talks to the benefit of data generated by overtime.
This package is in development. Functionality and documentation are being improved and polished.