mamimo's Introduction

MaMiMo

This is a small library that helps you with your everyday Marketing Mix Modelling. It contains a few saturation functions, carryovers, and some utilities for working with time features. You can also read my article about it here: >>>Click<<<.

Give it a try via pip install mamimo!

Small Example

You can create a marketing mix model using different components from MaMiMo as well as scikit-learn. First, we can create a dataset via

from mamimo.datasets import load_fake_mmm

data = load_fake_mmm()

X = data.drop(columns=['Sales'])
y = data['Sales']

X contains media spends only now, but you can enrich it with more information.

Feature Engineering

MaMiMo lets you add time features, for example, via

from mamimo.time_utils import add_time_features, add_date_indicators


X = (X
     .pipe(add_time_features, month=True)
     .pipe(add_date_indicators, special_date=["2020-01-05"])
     .assign(trend=range(200))
)

This adds

  • a month column (integers between 1 and 12),
  • a binary column named special_date that is 1 on the 5th of January 2020 and 0 everywhere else, and
  • a (so far linear) trend that simply counts up from 0 to 199.
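
For intuition, the three additions listed above can be reproduced with plain pandas. This is only a sketch: the weekly date index starting on 2018-01-07 and the placeholder TV column are assumptions made for illustration, and the code does not use, or claim to mirror the internals of, add_time_features or add_date_indicators.

```python
import pandas as pd

# Hypothetical weekly index; the dates in the real dataset may differ.
dates = pd.date_range("2018-01-07", periods=200, freq="W")
X_demo = pd.DataFrame({"TV": 0.0}, index=dates)  # placeholder media column

X_demo = X_demo.assign(
    # calendar month of each row, integers between 1 and 12
    month=X_demo.index.month,
    # 1 on the special date, 0 everywhere else
    special_date=(X_demo.index == pd.Timestamp("2020-01-05")).astype(int),
    # linear trend counting up from 0
    trend=range(len(X_demo)),
)

print(X_demo.loc["2019-12-22":"2020-01-19"])
```

Building the columns by hand like this can be handy for checking that the helper functions produce what you expect on your own data.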

X looks like this now:

[Figure: preview of X with the added month, special_date, and trend columns]

Building a Model

We can now build a final model like this:

from mamimo.time_utils import PowerTrend
from mamimo.carryover import ExponentialCarryover
from mamimo.saturation import ExponentialSaturation
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

cats =  [list(range(1, 13))] # different months, known beforehand

preprocess = ColumnTransformer(
    [
     ('tv_pipe', Pipeline([
            ('carryover', ExponentialCarryover()),
            ('saturation', ExponentialSaturation())
     ]), ['TV']),
     ('radio_pipe', Pipeline([
            ('carryover', ExponentialCarryover()),
            ('saturation', ExponentialSaturation())
     ]), ['Radio']),
     ('banners_pipe', Pipeline([
            ('carryover', ExponentialCarryover()),
            ('saturation', ExponentialSaturation())
     ]), ['Banners']),
    # note: scikit-learn >= 1.2 renames `sparse` to `sparse_output`
    ('month', OneHotEncoder(sparse=False, categories=cats), ['month']),
    ('trend', PowerTrend(), ['trend']),
    ('special_date', ExponentialCarryover(), ['special_date'])
    ]
)

model = Pipeline([
    ('preprocess', preprocess),
    ('regression', LinearRegression(
        positive=True,
        fit_intercept=False # no intercept because of the months
        ) 
    )
])

This builds a model that does the following:

  • the media channels are preprocessed using the adstock transformation, i.e. a carryover effect followed by a saturation effect
  • the month is one-hot (dummy) encoded
  • the trend is changed from linear to something like t^a, with some exponent a to be optimized
  • the special_date 2020-01-05 gets a carryover effect as well, meaning that the special effect on sales is present not only in that week but also in the weeks after it
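
To make the transformations in this list more concrete, here is a minimal, dependency-free sketch. The functional forms are assumptions chosen for illustration (carryover weights proportional to strength^lag over a fixed window, saturation of the form 1 - exp(-exponent * x), and a trend of the form t^power); the parameter names mirror the ones tuned later, but the code is not taken from MaMiMo and may differ from its implementation.

```python
import math


def exponential_carryover(spend, window, strength):
    """Spread each period's spend over the current and the next window - 1 periods."""
    weights = [strength ** lag for lag in range(window)]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalise the weights to sum to 1
    return [
        sum(w * spend[t - lag] for lag, w in enumerate(weights) if t - lag >= 0)
        for t in range(len(spend))
    ]


def exponential_saturation(values, exponent):
    """Diminishing returns: grows quickly at first, then flattens towards 1."""
    return [1.0 - math.exp(-exponent * v) for v in values]


def power_trend(t, power):
    """Bend a linear trend t into t^power."""
    return [v ** power for v in t]


spend = [100.0, 0.0, 0.0, 0.0, 0.0]  # a single burst of spend in the first week
carried = exponential_carryover(spend, window=3, strength=0.5)
saturated = exponential_saturation(carried, exponent=0.01)

print([round(v, 2) for v in carried])
print([round(v, 3) for v in saturated])
print(power_trend([0, 1, 4], power=0.5))
```

In this toy run, the burst of spend in the first week still has an effect in the two following weeks, and the saturation maps the carried-over spend into the interval [0, 1).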

Training The Model

We can then tune the hyperparameters of the model via

from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

tuned_model = RandomizedSearchCV(
    model,
    param_distributions={
        'preprocess__tv_pipe__carryover__window': randint(1, 10),
        'preprocess__tv_pipe__carryover__strength': uniform(0, 1),
        'preprocess__tv_pipe__saturation__exponent': uniform(0, 1),
        'preprocess__radio_pipe__carryover__window': randint(1, 10),
        'preprocess__radio_pipe__carryover__strength': uniform(0, 1),
        'preprocess__radio_pipe__saturation__exponent': uniform(0, 1),
        'preprocess__banners_pipe__carryover__window': randint(1, 10),
        'preprocess__banners_pipe__carryover__strength': uniform(0, 1),
        'preprocess__banners_pipe__saturation__exponent': uniform(0, 1),
        'preprocess__trend__power': uniform(0, 2),           
        'preprocess__special_date__window': randint(1, 10),  
        'preprocess__special_date__strength': uniform(0, 1), 
    },
    cv=TimeSeriesSplit(),
    random_state=0,
    n_iter=1000, # can take some time, lower number for faster results
)

tuned_model.fit(X, y)

You can also use GridSearch, Optuna, or other hyperparameter tuning methods and packages here, as long as they are compatible with scikit-learn.
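
If you are unsure what cv=TimeSeriesSplit() does, the short sketch below prints the folds it produces for 200 time-ordered rows. The array rows is just a stand-in for the index of X; each validation fold lies strictly after the rows used for training, so the model is never validated on the past.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

rows = np.arange(200)  # stand-in for 200 time-ordered observations

# The default TimeSeriesSplit uses n_splits=5 with an expanding training window.
folds = list(TimeSeriesSplit().split(rows))

for train_idx, test_idx in folds:
    print(
        f"train on rows 0..{train_idx[-1]}, "
        f"validate on rows {test_idx[0]}..{test_idx[-1]}"
    )
```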

Analyzing

With tuned_model.predict(X) and some plotting, we get

[Figure: plot of the model predictions]

You can get the best found hyperparameters using print(tuned_model.best_params_).

Plotting

You can compute the channel contributions via

from mamimo.analysis import breakdown

contributions = breakdown(tuned_model.best_estimator_, X, y)

This returns a dataframe with the contributions of each channel for each time step, summing to the historical values present in y. You can get a nice plot via

ax = contributions.plot.area(
    figsize=(16, 10),
    linewidth=1,
    title="Predicted Sales and Breakdown",
    ylabel="Sales",
    xlabel="Date",
)
handles, labels = ax.get_legend_handles_labels()
ax.legend(
    handles[::-1],
    labels[::-1],
    title="Channels",
    loc="center left",
    bbox_to_anchor=(1.01, 0.5),
)

[Figure: area plot "Predicted Sales and Breakdown" with one area per channel]

Wow, that's a lot of channels. Let us group some of them together.

group_channels = {'Baseline': [f'month__month_{i}' for i in range(1, 13)] + ['Base', 'trend__trend']} 
# read: 'Baseline consists of the months, base and trend.'
# You can add more groups!

contributions = breakdown(
    tuned_model.best_estimator_,
    X,
    y,
    group_channels
)

If we plot again, we get

[Figure: area plot of the breakdown with the months, base, and trend grouped into Baseline]

Yay!
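
For intuition about the breakdown and grouping used in the Plotting section, here is a small dependency-free sketch of one plausible way to get per-channel contributions from a linear model: multiply each transformed feature by its coefficient, then rescale every time step so that the contributions sum to the observed y. The function names breakdown_sketch and group_sketch, the toy numbers, and the rescaling step are assumptions for illustration; this is not claimed to be what mamimo.analysis.breakdown does internally.

```python
def breakdown_sketch(transformed, coefs, y):
    """transformed: dict mapping column name -> list of transformed feature values."""
    raw = {
        name: [coefs[name] * v for v in values]
        for name, values in transformed.items()
    }
    n = len(y)
    # unscaled model output per time step (assumed to be non-zero here)
    totals = [sum(raw[name][t] for name in raw) for t in range(n)]
    # rescale so that each time step sums to the observed y
    return {
        name: [raw[name][t] * y[t] / totals[t] for t in range(n)]
        for name in raw
    }


def group_sketch(contribs, groups):
    """Sum the listed columns into one new column per group, keep the rest as-is."""
    grouped, used = {}, set()
    for new_name, columns in groups.items():
        present = [c for c in columns if c in contribs]
        grouped[new_name] = [sum(vals) for vals in zip(*(contribs[c] for c in present))]
        used.update(present)
    for name, values in contribs.items():
        if name not in used:
            grouped[name] = values
    return grouped


# Toy numbers, two time steps.
transformed = {
    "TV": [1.0, 2.0],
    "month__month_1": [1.0, 1.0],
    "trend__trend": [0.0, 1.0],
}
coefs = {"TV": 2.0, "month__month_1": 3.0, "trend__trend": 1.0}
y = [10.0, 16.0]

contribs = breakdown_sketch(transformed, coefs, y)
grouped = group_sketch(contribs, {"Baseline": ["month__month_1", "trend__trend"]})
print(grouped)
```

Grouping only changes how the contributions are presented; the row sums still equal y before and after.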

mamimo's People

Contributors

amanabdullayev, garve

mamimo's Issues

ERROR: Could not find a version that satisfies the requirement mamimo (from versions: none)

Hello Dr. Robert, I found your article very useful, thank you a lot for sharing it.
By the way, I am trying to follow the steps in the article, and when I run pip install mamimo, it shows me 2 errors:
ERROR: Could not find a version that satisfies the requirement mamimo (from versions: none)
ERROR: No matching distribution found for mamimo
Do you know why that is?

It does not support numpy>=1.22

Hi Dr. Kubler,

I was trying to download MaMiMo to give it a try but when trying to do pip install it gave me 2 errors.

  1. requires daal==2021.4.0 which I don't have, do I just install that myself first?
  2. It requires numpy<1.22, but it says I have 1.22.4, as IT recently installed the latest Anaconda on my system.

Any help you can provide is greatly appreciated!

code for plot graphic

Hi Sr. Garve

Congratulations on your project! It's fantastic :)

Although this is not specifically a question about the mamimo package, I would like to know if you could share the code for plotting the saturation and carryover effects from your Towards Data Science article:

https://towardsdatascience.com/an-upgraded-marketing-mix-modeling-in-python-5ebb3bddc1b6

Specifically, the two plots shown below:

image

image

I can't figure out how to get this data to plot the graph and check both fantastic effects shown here.

Thanks for your help :)

Unable to Install mamimo

As a first step, I'm unable to install the mamimo library. I'm getting the error below:

Error
Defaulting to user installation because normal site-packages is not writeable
ERROR: Could not find a version that satisfies the requirement mamimo
ERROR: No matching distribution found for mamimo
Note: you may need to restart the kernel to use updated packages.

Here are my Python version details:
/opt/anaconda3/bin/python
3.8.8 (default, Apr 13 2021, 12:59:45)
[Clang 10.0.0 ]
sys.version_info(major=3, minor=8, micro=8, releaselevel='final', serial=0)

Train/test split does not take into account carryover effects in test set

Hi, and thank you for the great package. It is very intuitive and has helped me a lot.

Looking at the README, in the section Training The Model, RandomizedSearchCV first splits the initial X and y into train and test sets, and then applies the whole pipeline (carryover, saturation, and the model) to the train set and the test set separately.
But this misses all the carryover effects from media spend at the end of the train set into the beginning of the test set, which I think decreases the validity and accuracy of the model.

Shouldn't the preprocessing be applied first, and the preprocessed dataset then be passed to the grid search?
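
The effect described in this issue can be reproduced in a toy setting. The sketch below uses an assumed carryover form (weights proportional to strength^lag, normalised over the window) and made-up numbers; it only illustrates the concern and is not MaMiMo's code. A burst of spend in the last training week spills into the test weeks when the carryover is computed on the full series, but disappears when the test slice is transformed on its own.

```python
def carryover(spend, window, strength):
    """Assumed carryover: normalised weights strength**lag over `window` periods."""
    weights = [strength ** lag for lag in range(window)]
    total = sum(weights)
    return [
        sum(weights[lag] / total * spend[t - lag] for lag in range(window) if t - lag >= 0)
        for t in range(len(spend))
    ]


spend = [0.0, 0.0, 0.0, 100.0, 0.0, 0.0]
split = 4  # rows 0..3 are the train set, rows 4..5 the test set

# Carryover computed on the whole series, then cut at the split point.
full = carryover(spend, window=3, strength=0.5)
# Carryover computed on the test slice alone, as happens inside a CV fold.
test_alone = carryover(spend[split:], window=3, strength=0.5)

print("test rows, full series:", [round(v, 2) for v in full[split:]])
print("test rows, slice only: ", [round(v, 2) for v in test_alone])
```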

Thanks!
