m4-methods's Issues

MLP with additive seasonality?

Hi,

I'm having trouble replicating the results of the MLP method, especially on the hourly dataset.

I'm using the hyper-parameter settings found in https://github.com/Mcompetitions/M4-methods/blob/master/ML_benchmarks.py.

       Yearly      Quarterly   Monthly     Weekly       Daily        Hourly
MLP    -7.910408   -7.948233   -4.790635   -44.658715   -51.489338   147.500501

Values are the percentage difference between the published sMAPE and the replicated sMAPE, i.e. a value of 100 means a 100% difference. Positive values indicate that the replicated results are worse than the published ones.
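A minimal sketch of the two quantities involved, assuming the M4 definition of sMAPE (200/h times the sum over the horizon of |y - ŷ| / (|y| + |ŷ|)) and reading "percentage difference" as 100 · (replicated - published) / published, which matches the sign convention stated above. The function names are hypothetical.

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric MAPE in percent: mean over the horizon of 200 * |y - yhat| / (|y| + |yhat|)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(200.0 * np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred))))

def pct_difference(replicated, published):
    """Relative gap in percent; positive means the replicated sMAPE is higher (worse)."""
    return 100.0 * (replicated - published) / published

# Illustrative numbers only, not M4 results.
print(smape([100, 200], [110, 180]))
print(pct_difference(replicated=12.0, published=10.0))  # 20.0
```

Under this reading, the Hourly entry of 147.5 means the replicated sMAPE is roughly 2.5 times the published one.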

[Screenshot, 2020-03-31: part of the H1 series with MLP point forecasts]

The plot shows part of the H1 training series, the full test series and the MLP point forecasts: y_pred_orig are the point forecasts from the submission-MLP.rar file, y_pred_add are the point forecasts I obtain with additive deseasonalisation, and y_pred_mul are those I obtain with multiplicative deseasonalisation.

I see similar patterns for the RNN. Are you by any chance using additive seasonality? Any other idea where the deviation may come from?
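To make the y_pred_add / y_pred_mul comparison concrete, here is a minimal NumPy sketch of classical decomposition in both modes. It is a generic textbook version (centred moving-average trend, seasonal indices averaged per season position and normalised), not the code in ML_benchmarks.py; the function names are hypothetical, and it assumes the series starts at season position 0.

```python
import numpy as np

def seasonal_indices(y, m, model="multiplicative"):
    """Classical-decomposition seasonal indices for period m (sketch, not the repository's code)."""
    y = np.asarray(y, dtype=float)
    # Centred moving average: a 2 x m MA for even m, a plain m-term MA for odd m.
    if m % 2 == 0:
        w = np.r_[0.5, np.ones(m - 1), 0.5] / m
    else:
        w = np.ones(m) / m
    trend = np.convolve(y, w, mode="valid")
    half = (len(w) - 1) // 2
    core = y[half:half + len(trend)]
    # Detrend by ratio (multiplicative) or by difference (additive); edges stay NaN.
    detrended = np.full(len(y), np.nan)
    detrended[half:half + len(trend)] = core / trend if model == "multiplicative" else core - trend
    # Average each season position across cycles, ignoring the NaN edges.
    idx = np.array([np.nanmean(detrended[i::m]) for i in range(m)])
    # Normalise so the indices average to 1 (multiplicative) or 0 (additive).
    return idx / idx.mean() if model == "multiplicative" else idx - idx.mean()

def deseasonalise(y, idx, model="multiplicative"):
    """Remove seasonality, assuming y[0] falls on season position 0."""
    y = np.asarray(y, dtype=float)
    s = idx[np.arange(len(y)) % len(idx)]
    return y / s if model == "multiplicative" else y - s

def reseasonalise(forecast, idx, start, model="multiplicative"):
    """Restore seasonality; start is the season position of the first forecast step (len(y) % m)."""
    f = np.asarray(forecast, dtype=float)
    s = idx[(start + np.arange(len(f))) % len(idx)]
    return f * s if model == "multiplicative" else f + s

# Toy example: level 10 with a multiplicative quarterly pattern.
y = 10 * np.tile([0.8, 1.2, 1.0, 1.0], 6)
idx = seasonal_indices(y, 4)
print(reseasonalise(np.full(4, 10.0), idx, start=len(y) % 4))
```

The two modes differ only in ratio vs difference when detrending and in division vs subtraction (multiplication vs addition when re-seasonalising), which is exactly where y_pred_add and y_pred_mul part ways.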

dataset

Hello, excuse me. What do V1, V2, V3, V4, V5, V6, etc. represent in the dataset? And what do Y1, Y2, Y3, Y4 represent?
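A minimal sketch for loading one of the training files, assuming the wide layout of the M4 CSVs: the first column (V1) holds the series ID (Y1, Y2, ... for the yearly set), the remaining columns (V2, V3, ...) hold the observations in time order, and shorter series are padded with empty cells. The function name and the sample below are hypothetical.

```python
import csv
import io

def read_m4_wide(f):
    """Map each series ID (column V1) to its observations (V2, V3, ...), dropping padding cells."""
    reader = csv.reader(f)
    next(reader)  # header row: V1, V2, V3, ...
    series = {}
    for row in reader:
        # Treat empty or "NA" cells as padding after the end of a shorter series.
        series[row[0]] = [float(v) for v in row[1:] if v.strip() not in ("", "NA")]
    return series

# Hypothetical two-series sample in the assumed layout (values made up).
sample = io.StringIO('"V1","V2","V3","V4"\n"Y1",10.0,11.5,12.0\n"Y2",7.0,8.0,\n')
data = read_m4_wide(sample)
print(data["Y2"])  # [7.0, 8.0]
```

With a real file this would be called on a handle opened with newline=""; the file name (for example Yearly-train.csv) is an assumption.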

Replicating ARIMA results

How were the ARIMA results generated? I can only find the submission file with the results, not the code used to generate them. Was any preprocessing applied? Were any non-default hyper-parameter settings used? Did you use forecast's auto.arima (v8.2)?

Replicating "260 - KaterinaKou" method results

Hi,
I am having issues replicating the results from the code submitted by Nikoletta-Zampeta Legaki. Running the code in RStudio gives the following error:

Error in datasets[[j]][[i]] : subscript out of bounds

Your assistance in this regard is much appreciated!

test dataset

Hello, is this the best validation set in the dataset? Was it generated by someone else's model? Thanks.

Reproducing benchmark point forecasts

Hi,

What version of the forecast package was used for calculating the point forecasts of the benchmarks?

I'm asking because when I try to reproduce the benchmarks' point forecasts with the code from "Benchmarks and Evaluation.R", I get different results for SES, Holt, etc. I'm using forecast 8.21 with R 4.3.1 on Linux.

Thank you.

Inconsistent seasonality tests

The seasonality tests in Python and R seem to give different results.

If you run the R code snippet below, you get FALSE. If you run the Python snippet you get True.

R code:

# copied from https://github.com/Mcompetitions/M4-
SeasonalityTest <- function(input, ppy){
  #Used to determine whether a time series is seasonal
  tcrit <- 1.645
  if (length(input)<3*ppy){
    test_seasonal <- FALSE
  }else{
    xacf <- acf(input, plot = FALSE)$acf[-1, 1, 1]
    clim <- tcrit/sqrt(length(input)) * sqrt(cumsum(c(1, 2 * xacf^2)))
    test_seasonal <- ( abs(xacf[ppy]) > clim[ppy] )
    
    if (is.na(test_seasonal)==TRUE){ test_seasonal <- FALSE }
  }
  
  return(test_seasonal)
}

data <- c(2.62434536, -0.61175641, -0.52817175, -1.07296862,  1.86540763,
          -2.3015387 ,  1.74481176, -0.7612069 ,  1.3190391 , -0.24937038,
          1.46210794, -2.06014071,  0.6775828 , -0.38405435,  1.13376944,
          -1.09989127)
ppy <- 4 
SeasonalityTest(data, ppy)

Python code:

import numpy as np
from math import sqrt
data = np.array([2.62434536, -0.61175641, -0.52817175, -1.07296862,  1.86540763,
-2.3015387 ,  1.74481176, -0.7612069 ,  1.3190391 , -0.24937038,
1.46210794, -2.06014071,  0.6775828 , -0.38405435,  1.13376944,
-1.09989127])

# copied from https://github.com/Mcompetitions/M4-methods/blob/master/ML_benchmarks.py
def seasonality_test(original_ts, ppy):
    """
    Seasonality test
    :param original_ts: time series
    :param ppy: periods per year
    :return: boolean value: whether the TS is seasonal
    """
    s = acf(original_ts, 1)
    for i in range(2, ppy):
        s = s + (acf(original_ts, i) ** 2)

    limit = 1.645 * (sqrt((1 + 2 * s) / len(original_ts)))

    return (abs(acf(original_ts, ppy))) > limit


def acf(data, k):
    """
    Autocorrelation function
    :param data: time series
    :param k: lag
    :return:
    """
    m = np.mean(data)
    s1 = 0
    for i in range(k, len(data)):
        s1 = s1 + ((data[i] - m) * (data[i - k] - m))

    s2 = 0
    for i in range(0, len(data)):
        s2 = s2 + ((data[i] - m) ** 2)

    return float(s1 / s2)

ppy = 4
seasonality_test(data, ppy)

The difference is that the Python code does not square the autocorrelation coefficient at the first lag. The first assignment should be:

s = acf(original_ts, 1) ** 2
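For reference, a vectorised Python sketch of the corrected test. It squares every lag from 1 to ppy - 1, adds the R version's 3·ppy length guard, and exposes tcrit as a keyword; the keyword and the vectorisation are additions for illustration and are in neither original.

```python
import numpy as np

def acf(data, k):
    """Sample autocorrelation at lag k; the denominator is the total sum of squares, as in R's acf()."""
    d = np.asarray(data, dtype=float)
    d = d - d.mean()
    return float(np.dot(d[k:], d[:len(d) - k]) / np.dot(d, d))

def seasonality_test(original_ts, ppy, tcrit=1.645):
    """Seasonality test with every lag 1..ppy-1 squared, mirroring R's
    clim <- tcrit/sqrt(n) * sqrt(cumsum(c(1, 2 * xacf^2)))."""
    n = len(original_ts)
    if n < 3 * ppy:  # same length guard as the R SeasonalityTest
        return False
    s = sum(acf(original_ts, i) ** 2 for i in range(1, ppy))  # lag 1 is squared too
    limit = tcrit * np.sqrt((1 + 2 * s) / n)
    return bool(abs(acf(original_ts, ppy)) > limit)

data = [2.62434536, -0.61175641, -0.52817175, -1.07296862, 1.86540763,
        -2.3015387, 1.74481176, -0.7612069, 1.3190391, -0.24937038,
        1.46210794, -2.06014071, 0.6775828, -0.38405435, 1.13376944,
        -1.09989127]
print(seasonality_test(data, 4))  # False, matching the R result
```

On this series the lag-1 autocorrelation is strongly negative (roughly -0.64). Left unsquared, it pulls the limit down to roughly 0.34, below |r(4)| of roughly 0.56. Squared, the limit is roughly 0.68, so the test no longer fires.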

Questions about the range and format of the data's StartingDate

Hi all, I was wondering whether there is any information about the date format or the range of the StartingDate column in M4-info.csv.

From what I can see, StartingDate is usually in the format "DD-MM-YY hh:mm", but some entries break this rule or are ambiguous. For example:

  • M369, whose StartingDate is 1882-07-01 12:00:00 (which I suppose is 1 July 1882).
  • M376, whose StartingDate is 01-01-17 12:00 (I can't tell whether this is 1 January 1917 or 2017).

Any clarification is appreciated!
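A minimal sketch that accepts both formats quoted above, assuming the DD-MM-YY reading given in the question. It does not resolve the ambiguity: Python's %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999, so 01-01-17 becomes 2017 purely by that convention. FORMATS and parse_starting_date are hypothetical names.

```python
from datetime import datetime

# Formats seen in the two examples above; M4-info.csv may contain others.
FORMATS = ("%Y-%m-%d %H:%M:%S", "%d-%m-%y %H:%M")

def parse_starting_date(text):
    """Try each known format in turn; raise if none matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised StartingDate: {text!r}")

print(parse_starting_date("1882-07-01 12:00:00"))  # 1882-07-01 12:00:00
print(parse_starting_date("01-01-17 12:00"))       # 2017-01-01 12:00:00 (%y pivot, not ground truth)
```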

RNN benchmark data reshaping

Hello. Thanks for sharing this repository.

I'm looking at ML_benchmarks.py#L156 and noticed that you simply reshape x_train so that each row holds one sequence. This means the rows do not overlap. Doesn't this hinder the RNN's performance?
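The two ways of building training rows can be contrasted on a toy series. This sketch does not reproduce ML_benchmarks.py. It only compares non-overlapping blocks from a reshape with stride-1 sliding windows, which give many more (overlapping) rows from the same data. The names and the toy series are illustrative.

```python
import numpy as np

series = np.arange(1.0, 13.0)  # toy series 1..12, a stand-in for x_train
window = 4

# Non-overlapping rows: consecutive blocks (requires len(series) % window == 0).
blocks = series.reshape(-1, window)

# Overlapping rows: one window starting at every time step (stride 1).
sliding = np.array([series[i:i + window] for i in range(len(series) - window + 1)])

# With sliding windows, each input row can be paired with the value that follows it.
x_sliding, y_sliding = sliding[:-1], series[window:]

print(blocks.shape)   # (3, 4)
print(sliding.shape)  # (9, 4)
```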

dataset

Hello, the paper "The M4 Competition: 100,000 time series and 61 forecasting methods" divides the M4 dataset into six data frequencies and six application domains. For the Yearly dataset, Micro accounts for 6,538 series, Industry for 3,716, Macro for 3,903, Finance for 6,519, Demographic for 1,088 and Other for 1,236. Which rows of the dataset belong to Micro? And which rows belong to Industry, Macro and Finance?
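A sketch of how the category split could be recovered. It assumes M4-info.csv has one row per series with columns M4id, category and SP (the frequency label) alongside StartingDate, and that M4id matches the V1 ID in the training files. Apart from StartingDate, these column names are assumptions, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict

def ids_by_category(f, sp="Yearly"):
    """Group series IDs by category for one frequency, from an M4-info.csv-style file."""
    groups = defaultdict(list)
    for row in csv.DictReader(f):
        if row["SP"] == sp:  # assumed column holding the frequency label
            groups[row["category"]].append(row["M4id"])  # assumed column names
    return dict(groups)

# Hypothetical sample in the assumed layout.
sample = io.StringIO(
    "M4id,category,Frequency,Horizon,SP,StartingDate\n"
    "Y1,Macro,1,6,Yearly,01-01-79 12:00\n"
    "Y2,Micro,1,6,Yearly,01-01-75 12:00\n"
    "Q1,Finance,4,8,Quarterly,01-01-90 12:00\n"
)
print(ids_by_category(sample))  # {'Macro': ['Y1'], 'Micro': ['Y2']}
```

Each returned ID would then identify the row of the training file whose first column holds that ID.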

M4 theta method (question)

Hi,

I came across this topic while searching for an existing forecasting tool.
Regarding the method shown, is there any reference for the Theta model it implements (i.e., the logic behind the code)?

Much appreciated.
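For orientation, the classic Theta method of Assimakopoulos and Nikolopoulos (2000) combines the theta = 0 line (the least-squares linear trend) with an SES forecast of the theta = 2 line (which doubles the series' curvature), weighting the two equally. The minimal sketch below fixes the smoothing parameter alpha (normally estimated), skips any seasonal adjustment, and is not claimed to match the repository's code; the function names are hypothetical.

```python
import numpy as np

def ses_level(y, alpha):
    """Final level of simple exponential smoothing, initialised at the first observation."""
    level = y[0]
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def theta_forecast(y, h, alpha=0.5):
    """Average the extrapolated theta=0 line with a flat SES forecast of the theta=2 line."""
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y) + 1)
    b, a = np.polyfit(t, y, 1)          # theta=0 line: OLS trend a + b*t
    theta2 = 2 * y - (a + b * t)        # theta=2 line: doubles the local curvature
    future_t = np.arange(len(y) + 1, len(y) + h + 1)
    trend_part = a + b * future_t
    ses_part = np.full(h, ses_level(theta2, alpha))  # SES forecasts are flat
    return 0.5 * (trend_part + ses_part)

print(theta_forecast([5.0] * 10, 3))
```

With equal weights the forecast follows the trend at half its slope, offset by the smoothed level of the theta = 2 line.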

Where can I find template_Naive.csv?

Hello :)
I was trying to run predict.py, but line 105 of predict.py needs template_Naive.csv, which I couldn't find in the M4 dataset. Where can I find it?

Thank you very much

Benchmarks naive_seasonal

In the file 'Benchmarks and Evaluation.R', function naive_seasonal, line 43: is "+ frcst - frcst" actually meaningful? It seems to do nothing.

Under which license is the data released?

Depending on where the dataset comes from, this might affect which license applies to the M4 dataset.

As a for-profit business, we need to figure out what we are allowed to do with this data.
