This is a repository to highlight certain parts of the Python for Time Series Data Analysis course, and record some of the key skills and concepts covered.
This part of the course asked me to apply exponential smoothing concepts using statsmodels.tsa.
For this set of exercises we're using data from Federal Reserve Economic Data (FRED) on the Industrial Production Index for Electricity and Gas Utilities, from January 1970 to December 1989.
Data source: https://fred.stlouisfed.org/series/IPG2211A2N
To do:
- Import the data
- Set a date index and assign a frequency to the DatetimeIndex.
- Add a 12-month Simple Moving Average (SMA)
- Add a 12-month Exponentially Weighted Moving Average (EWMA)
- Fit a Holt-Winters model (Triple Exponential Smoothing, TES)
- Plot the above
Repository:
(Link)
Notes:
The repository contains the Jupyter notebook and CSV datasets.
[Simple & triple exponential smoothing]
There are three common ways to evaluate a forecasting model:
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
- Minimizing the RMSE will lead to forecasts of the mean.
Mean Absolute Error (MAE)
- A forecast method that minimizes the MAE will lead to forecasts of the median.
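Here are the standard definitions of the three metrics. The notation is introduced here, not taken from the course: $y_t$ is the observed value, $\hat{y}_t$ is the forecast, and $n$ is the number of periods evaluated.

```latex
\text{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2,
\qquad
\text{RMSE} = \sqrt{\text{MSE}},
\qquad
\text{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|
```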
Most people use RMSE as the main metric for evaluating predictions: it penalizes large errors more heavily than MAE does, and it stays in the same units as the original data.
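The mean-versus-median claims above can be checked numerically in a simplified setting. Take a single constant forecast for a small, deliberately skewed sample (the numbers are hypothetical), then search a fine grid for the constant that minimizes each loss. The squared-error minimizer lands on the mean. Minimizing RMSE gives the same constant, because the square root is monotonic. The absolute-error minimizer lands on the median.

```python
import numpy as np

# Hypothetical skewed sample (one large value pulls the mean away from the median)
y = np.array([1.0, 2.0, 2.0, 3.0, 12.0])

# Candidate constant forecasts from 0 to 13 in steps of 0.001
candidates = np.linspace(0, 13, 13001)

# Average loss of each candidate across all observations (broadcast to 5 x 13001)
sq_loss = ((y[:, None] - candidates[None, :]) ** 2).mean(axis=0)  # MSE
abs_loss = np.abs(y[:, None] - candidates[None, :]).mean(axis=0)  # MAE

best_sq = candidates[sq_loss.argmin()]    # constant minimizing MSE (and RMSE)
best_abs = candidates[abs_loss.argmin()]  # constant minimizing MAE

print(f'MSE-optimal constant: {best_sq:.3f} (sample mean {y.mean():.3f})')
print(f'MAE-optimal constant: {best_abs:.3f} (sample median {np.median(y):.3f})')
```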
#Create a dataset
import numpy as np
import pandas as pd
np.random.seed(42)
df = pd.DataFrame(np.random.randint(20,30,(50,2)),
columns=['test','predictions'])
df.plot(figsize=(12,4));
#Evaluate the predictions
from statsmodels.tools.eval_measures import mse, rmse, meanabs
MSE = mse(df['test'],df['predictions'])
RMSE = rmse(df['test'],df['predictions'])
MAE = meanabs(df['test'],df['predictions'])
print(f'Model Mean Squared Error (MSE): {MSE:.3f}')
print(f'Model Root Mean Squared Error (RMSE): {RMSE:.3f}')
print(f'Model Mean Absolute Error (MAE): {MAE:.3f}')
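The Augmented Dickey-Fuller (ADF) test, available as adfuller in statsmodels, checks whether a time series is stationary.
- A stationary series keeps a roughly constant mean, variance and autocorrelation structure over time; a series with a trend or a unit root does not. The test's null hypothesis is that the series has a unit root (is non-stationary), so a small p-value is evidence of stationarity.
The following function prints a more user-friendly ADF report in Python.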
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def adf_test(series, title=''):
    """
    Pass in a time series and an optional title; prints an ADF report
    """
    print(f'Augmented Dickey-Fuller Test: {title}')
    result = adfuller(series.dropna(), autolag='AIC')  # .dropna() handles differenced data
    labels = ['ADF test statistic', 'p-value', '# lags used', '# observations']
    out = pd.Series(result[0:4], index=labels)
    for key, val in result[4].items():  # critical values at 1%, 5% and 10%
        out[f'critical value ({key})'] = val
    print(out.to_string())  # .to_string() removes the line "dtype: float64"
    if result[1] <= 0.05:
        print("Strong evidence against the null hypothesis")
        print("Reject the null hypothesis")
        print("Data has no unit root and is stationary")
    else:
        print("Weak evidence against the null hypothesis")
        print("Fail to reject the null hypothesis")
        print("Data has a unit root and is non-stationary")
# To use, type:
# adf_test(dataframe['column_name'])
The Granger causality test is a hypothesis test to determine whether one time series is useful in forecasting another.
It checks whether past values of one series, taken at a consistent lag, improve predictions of the other series beyond what that series' own past values already provide.
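The idea can be sketched with the textbook restricted-versus-unrestricted F-test. One regression uses only y's own lags; the other adds x's lags. If the extra lags cut the residual sum of squares by more than chance would explain, x "Granger-causes" y. The function name `granger_f_test` and the synthetic data are hypothetical. I believe this is the same comparison behind the `ssr_ftest` line of statsmodels' output, but treat that as an assumption rather than a description of its internals.

```python
import numpy as np
from scipy import stats

def granger_f_test(y, x, lags):
    """Hypothetical helper: F-test of whether `lags` past values of x improve
    a linear autoregression of y on its own `lags` past values."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    target = y[lags:]
    # Column i-1 holds the series lagged by i periods, aligned with `target`
    y_lags = np.column_stack([y[lags - i:n - i] for i in range(1, lags + 1)])
    x_lags = np.column_stack([x[lags - i:n - i] for i in range(1, lags + 1)])
    const = np.ones((n - lags, 1))

    restricted = np.hstack([const, y_lags])            # y's own past only
    unrestricted = np.hstack([const, y_lags, x_lags])  # y's past plus x's past

    def ssr(design):
        # Ordinary least squares fit, then the sum of squared residuals
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return resid @ resid

    ssr_r, ssr_u = ssr(restricted), ssr(unrestricted)
    df_num = lags                                  # number of added coefficients
    df_denom = len(target) - unrestricted.shape[1]
    f_stat = ((ssr_r - ssr_u) / df_num) / (ssr_u / df_denom)
    return f_stat, stats.f.sf(f_stat, df_num, df_denom)

# Hypothetical data: y copies x with a two-period delay, plus noise
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = np.zeros(300)
y[2:] = 0.8 * x[:-2] + 0.3 * rng.normal(size=298)

f_xy, p_xy = granger_f_test(y, x, lags=2)  # does x help predict y?
f_yx, p_yx = granger_f_test(x, y, lags=2)  # does y help predict x?
print(f'x -> y: F = {f_xy:.1f}, p = {p_xy:.3g}')
print(f'y -> x: F = {f_yx:.2f}, p = {p_yx:.3g}')
```

Note that "Granger causality" only means predictive usefulness at a lag; it does not establish causation in the everyday sense.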
Dataset:
[samples.csv]
#Import a dataset & set index to datetime and assign a datetime frequency
df3 = pd.read_csv('../Data/samples.csv',
index_col=0,
parse_dates=True)
df3.index.freq = 'MS'
df3[['a','d']].plot(figsize=(16,5));
# Import the statistical package needed
# Add a semicolon at the end to avoid duplicate output
from statsmodels.tsa.stattools import grangercausalitytests
# Null hypothesis: the second column ('d') does not Granger-cause the first ('a')
grangercausalitytests(df3[['a','d']],
                      maxlag=3);
# A visual representation of the relationship found
# Shifting column d forward by two periods lines it up with column a:
# changes in d show up in a two periods (months, given the 'MS' frequency) later.
df3['a'].iloc[2:].plot(figsize=(16,5),
legend=True);
df3['d'].shift(2).plot(legend=True);
A piece of code to be used as a template for seasonal forecasting with an exogenous variable using SARIMAX.