

[Nov 2022] Python for Time Series Data Analysis


Description

This repository highlights parts of the Python for Time Series Data Analysis course and records some of the key skills and concepts it covers.

Tools

Jupyter Notebook, Python, pandas, matplotlib, statsmodels


📈Exponential smoothing

This part of the course asked me to apply exponential smoothing concepts using statsmodels.tsa.

🎯Task

For this set of exercises we're using data from the Federal Reserve Economic Database (FRED) concerning the Industrial Production Index for Electricity and Gas Utilities from January 1970 to December 1989.

Data source: https://fred.stlouisfed.org/series/IPG2211A2N

To do:

  • Import the data.
  • Set a date index and assign a frequency to the DatetimeIndex.
  • Add a 12-month Simple Moving Average (SMA).
  • Add a 12-month Exponentially Weighted Moving Average (EWMA).
  • Fit a Holt-Winters model using Triple Exponential Smoothing (TES).
  • Plot the above.

💡Solution

Repository: (Link)
Notes: The repository contains the Jupyter notebook and CSV datasets.

[Screenshot: simple & triple exponential smoothing]

🤔Evaluating forecasting models


There are three common metrics for evaluating a forecasting model:

  • Mean Squared Error (MSE)
  • Root Mean Squared Error (RMSE)
    • Minimizing the RMSE leads to forecasts of the mean.
  • Mean Absolute Error (MAE)
    • A forecast method that minimizes the MAE leads to forecasts of the median.
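The mean/median claims above can be checked numerically. This sketch uses a small, made-up skewed sample (hypothetical values) and searches a grid of constant forecasts for the one that minimizes each metric. Since RMSE is the square root of MSE, both share the same minimizer.

```python
import numpy as np

# Hypothetical, deliberately skewed sample (made-up values for illustration)
y = np.array([1.0, 2.0, 3.0, 4.0, 20.0])

# Grid of candidate constant forecasts from 0 to 25 in steps of 0.001
candidates = np.linspace(0.0, 25.0, 25001)
errors = y[None, :] - candidates[:, None]  # shape (25001, 5): one row per candidate

mse_per_candidate = np.mean(errors ** 2, axis=1)
mae_per_candidate = np.mean(np.abs(errors), axis=1)

best_for_mse = candidates[np.argmin(mse_per_candidate)]
best_for_mae = candidates[np.argmin(mae_per_candidate)]

print(f'MSE/RMSE-optimal constant: {best_for_mse:.3f} (sample mean = {y.mean():.3f})')
print(f'MAE-optimal constant:      {best_for_mae:.3f} (sample median = {np.median(y):.3f})')
```

The outlier (20.0) pulls the MSE-optimal forecast up to the mean (6.0), while the MAE-optimal forecast stays at the median (3.0).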

RMSE is the most commonly used of these metrics for evaluating predictions. Because errors are squared before averaging, it penalizes large errors more heavily, and taking the square root puts it back in the same units as the original data.
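A small hand-made example shows the penalty on large errors: two hypothetical forecasts with the same total absolute error get the same MAE but different RMSE values. The mae and rmse helpers here are written out from the definitions rather than taken from statsmodels.

```python
import numpy as np

def mae(y, y_hat):
    # Mean Absolute Error: average size of the errors
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    # Root Mean Squared Error: square, average, then take the square root
    return np.sqrt(np.mean((y - y_hat) ** 2))

# Hypothetical actual values and two forecasts with the same total absolute error (8)
actual = np.array([10.0, 10.0, 10.0, 10.0])
spread_out = np.array([12.0, 8.0, 12.0, 8.0])      # four errors of size 2
one_big_miss = np.array([10.0, 10.0, 10.0, 18.0])  # one error of size 8

for name, forecast in [('spread_out', spread_out), ('one_big_miss', one_big_miss)]:
    print(f'{name:13s} MAE = {mae(actual, forecast):.1f}  RMSE = {rmse(actual, forecast):.1f}')
# spread_out    MAE = 2.0  RMSE = 2.0
# one_big_miss  MAE = 2.0  RMSE = 4.0
```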

🐍Code

# Create a dataset

import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(20,30,(50,2)),
                  columns=['test','predictions'])
df.plot(figsize=(12,4));

[Example dataset plot]

# Evaluate the predictions

from statsmodels.tools.eval_measures import mse, rmse, meanabs

MSE = mse(df['test'],df['predictions'])
RMSE = rmse(df['test'],df['predictions'])
MAE = meanabs(df['test'],df['predictions'])

print(f'Model Mean Squared Error (MSE): {MSE:.3f}')
print(f'Model Root Mean Squared Error (RMSE): {RMSE:.3f}')
print(f'Model Mean Absolute Error (MAE): {MAE:.3f}')


👨🏼‍💻Stationarity test: adfuller

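The augmented Dickey-Fuller (ADF) test, available in statsmodels as adfuller, checks a time series for stationarity.

  • A stationary series keeps roughly the same mean and variance over time; a series with a trend or a unit root does not. The test's null hypothesis is that the series has a unit root (is non-stationary), so a small p-value (e.g. ≤ 0.05) is evidence that the series is stationary.

The function below prints a more user-friendly adfuller report in Python.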

🐍Code

import pandas as pd
from statsmodels.tsa.stattools import adfuller

def adf_test(series, title=''):
    """
    Pass in a time series and an optional title; prints an ADF report
    """
    print(f'Augmented Dickey-Fuller Test: {title}')
    result = adfuller(series.dropna(), autolag='AIC')  # .dropna() handles differenced data

    labels = ['ADF test statistic', 'p-value', '# lags used', '# observations']
    out = pd.Series(result[0:4], index=labels)

    # result[4] is a dict of critical values at the 1%, 5% and 10% levels
    for key, val in result[4].items():
        out[f'critical value ({key})'] = val

    print(out.to_string())  # .to_string() removes the line "dtype: float64"

    if result[1] <= 0.05:
        print("Strong evidence against the null hypothesis")
        print("Reject the null hypothesis")
        print("Data has no unit root and is stationary")
    else:
        print("Weak evidence against the null hypothesis")
        print("Fail to reject the null hypothesis")
        print("A unit root cannot be ruled out; treat the data as non-stationary")

# To use, type:
# adf_test(dataframe['column_name'])

[Example adfuller test output]

👨🏼‍💻Causality test: granger

The Granger causality test is a hypothesis test to determine whether one time series is useful in forecasting another.
It checks whether lagged values of one series improve predictions of the other beyond what the other series' own past values already provide. A significant result indicates predictive usefulness, not true causation.

🐍Code

Dataset: [samples.csv]

# Import a dataset, set the index to datetime and assign a datetime frequency

df3 = pd.read_csv('../Data/samples.csv',
                  index_col=0,
                  parse_dates=True)

df3.index.freq = 'MS'

df3[['a','d']].plot(figsize=(16,5));

[Plot: columns a and d]

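# Import the statistical package needed
# statsmodels tests whether the SECOND column (d) Granger-causes the FIRST column (a)
# The trailing semicolon stops Jupyter from also displaying the returned dict

from statsmodels.tsa.stattools import grangercausalitytests

grangercausalitytests(df3[['a','d']],
                      maxlag=3);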

[Screenshot: grangercausalitytests output]

# A visual representation of the relationship found
# Two periods (months, given the 'MS' frequency) after something happens in column d, column a reacts.
# Shifting d forward by two periods lines its values up with a.

df3['a'].iloc[2:].plot(figsize=(16,5),
                       legend=True);

df3['d'].shift(2).plot(legend=True);

[Plot: column a vs. column d shifted by two periods]

📈Forecasting: SARIMAX

A code template for seasonal forecasting with an exogenous variable using SARIMAX.


🐍Code

Code: [Code Snippet]




Contributors: samtaylor92
