[Nov 2022] Python for Time Series Data Analysis

Description

This is a repository to highlight certain parts of the Python for Time Series Data Analysis course, and record some of the key skills and concepts covered.

Tools

Repository description
Exponential smoothing
- Task
- Solution
Evaluating forecasting models
- Code
Stationarity test: adfuller
- Code
Causality test: granger
- Code
Forecasting: SARIMAX
- Code
Reference material

📈Exponential smoothing

This part of the course asked me to apply exponential smoothing concepts using statsmodels.tsa.

⬆

🎯Task

For this set of exercises we're using data from the Federal Reserve Economic Database (FRED) concerning the Industrial Production Index for Electricity and Gas Utilities from January 1970 to December 1989.

Data source: https://fred.stlouisfed.org/series/IPG2211A2N

To do:

Import the data

Set a date index and assign a frequency to the DatetimeIndex.

Add a 12-month Simple Moving Average (SMA)

Add a 12-month Exponentially Weighted Moving Average (EWMA)

Fit a Holt-Winters model using Triple Exponential Smoothing (TES)

Plot the above

⬆

💡Solution

Repository: (Link)
Notes: Repository contains the Jupyter notebook and CSV datasets

[Simple & triple exponential smoothing]

🤔Evaluating forecasting models

There are three common ways to evaluate a forecasting model:

Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
- Minimizing the RMSE will lead to forecasts of the mean.
Mean Absolute Error (MAE)
- A forecast method that minimizes the MAE will lead to forecasts of the median.

Most people use RMSE as the main metric for evaluating predictions.
It penalizes large errors more heavily than small ones and stays in the same units as the original data.
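
To make the three metrics concrete, here is a small sketch that computes them by hand with NumPy and checks the mean/median claims numerically. The `actual` and `predicted` arrays are made-up numbers for illustration only, and the grid search over constant forecasts is just a simple way to show the idea, not part of the course material.

```python
import numpy as np

# Hypothetical values, chosen so that one error is much larger than the rest
actual = np.array([20.0, 22.0, 21.0, 25.0, 40.0])
predicted = np.array([21.0, 21.0, 23.0, 24.0, 30.0])

errors = actual - predicted
mse = np.mean(errors ** 2)        # average squared error
rmse = np.sqrt(mse)               # back in the units of the data
mae = np.mean(np.abs(errors))     # average absolute error

print(f'MSE: {mse:.3f}  RMSE: {rmse:.3f}  MAE: {mae:.3f}')

# Which single constant forecast minimizes each metric?
candidates = np.linspace(15, 45, 301)   # grid with step 0.1
mse_per_candidate = [np.mean((actual - c) ** 2) for c in candidates]
mae_per_candidate = [np.mean(np.abs(actual - c)) for c in candidates]

best_mse_const = candidates[np.argmin(mse_per_candidate)]
best_mae_const = candidates[np.argmin(mae_per_candidate)]

print(f'Best constant under MSE/RMSE: {best_mse_const:.1f} (mean = {actual.mean():.1f})')
print(f'Best constant under MAE:      {best_mae_const:.1f} (median = {np.median(actual):.1f})')
```

The single large error (10) dominates the RMSE but not the MAE, which is why RMSE is said to punish large misses.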

⬆

🐍Code

#Create a dataset
  
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(20,30,(50,2)),
                  columns=['test','predictions'])
df.plot(figsize=(12,4));

#Evaluate the predictions
  
from statsmodels.tools.eval_measures import mse, rmse, meanabs

MSE = mse(df['test'],df['predictions']) 
RMSE = rmse(df['test'],df['predictions'])
MAE = meanabs(df['test'],df['predictions'])

print(f'Model Mean Squared Error (MSE): {MSE:.3f}')
print(f'Model Root Mean Squared Error (RMSE): {RMSE:.3f}')
print(f'Model Mean Absolute Error (MAE): {MAE:.3f}')

⬆

👨🏼‍💻Stationarity test: adfuller

The augmented Dickey-Fuller (ADF) test, available as adfuller in statsmodels, allows us to check a dataset for stationarity.

Essentially, does a time series keep a similar mean and variance across time, or do they drift (for example because of a trend)? The null hypothesis is that the series has a unit root, i.e. that it is non-stationary.

This is a function for printing a more user-friendly adfuller report in Python.

⬆

🐍Code

import pandas as pd
from statsmodels.tsa.stattools import adfuller

def adf_test(series,title=''):
    """
    Pass in a time series and an optional title, prints an ADF report
    """
    print(f'Augmented Dickey-Fuller Test: {title}')
    result = adfuller(series.dropna(),autolag='AIC') # .dropna() handles differenced data

    labels = ['ADF test statistic','p-value','# lags used','# observations']
    out = pd.Series(result[0:4],index=labels)

    # Append the critical values (1%, 5%, 10%) to the report
    for key,val in result[4].items():
        out[f'critical value ({key})']=val

    print(out.to_string())          # .to_string() removes the line "dtype: float64"

    if result[1] <= 0.05:
        print("Strong evidence against the null hypothesis")
        print("Reject the null hypothesis")
        print("Data has no unit root and is stationary")
    else:
        print("Weak evidence against the null hypothesis")
        print("Fail to reject the null hypothesis")
        print("Data has a unit root and is non-stationary")

# To use, type: 
# adf_test(dataframe['column_name'])

⬆

👨🏼‍💻Causality test: granger

The Granger causality test is a hypothesis test to determine if one time series is useful in forecasting another.
It checks whether past values of one series help predict the other beyond what that series' own past values already explain, i.e. whether changes in one series are followed, after a consistent lag, by changes in the other.

⬆

🐍Code

Dataset: [samples.csv]

#Import a dataset & set index to datetime and assign a datetime frequency

df3 = pd.read_csv('../Data/samples.csv',
                  index_col=0,
                  parse_dates=True)
    
df3.index.freq = 'MS'
    
df3[['a','d']].plot(figsize=(16,5));

# Import the statistical package needed
# Add a semicolon at the end to avoid duplicate output
    
from statsmodels.tsa.stattools import grangercausalitytests
    
grangercausalitytests(df3[['a','d']],
                      maxlag=3);

# A visual representation of the relationship found:
# column d shifted forward by two periods (months here, since the index frequency is 'MS') lines up with column a.

df3['a'].iloc[2:].plot(figsize=(16,5),
                       legend=True);
    
df3['d'].shift(2).plot(legend=True);