davedavis / marketing-etl

Marketing-ETL is an application that pulls data from the Google Ads API, the Microsoft Ads API and Adobe.io and warehouses the data into a local DB for advanced custom reporting.

License: Apache License 2.0

Dockerfile 0.15% Python 99.85%

marketing-etl's Introduction

About

DG Tracker is an ETL application that extracts account, campaign and ad report data from the Google Ads and Microsoft Ads APIs, transforms it using pre-built models with appropriate relationships and loads it into a database. It also pulls metric reports, keyed by RSID, from Adobe Analytics and creates an RSID/Account(s) relationship so that spend and site metrics can be tracked together.

Most databases are supported because the app uses SQLAlchemy as its ORM, which makes it easy to switch database providers.
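As a rough illustration of what switching providers can look like, here is a minimal sketch that builds a SQLAlchemy-style database URL (`dialect+driver://user:password@host:port/database`) from environment variables. The variable names (`DB_DIALECT`, `DB_USER` and so on) and the function name are assumptions made for this example, not settings this project is known to read.

```python
import os
from urllib.parse import quote_plus


def database_url_from_env(env=os.environ):
    """Build a SQLAlchemy-style database URL from environment variables.

    All variable names used here (DB_DIALECT, DB_USER, ...) are hypothetical.
    """
    dialect = env.get("DB_DIALECT", "sqlite")
    name = env.get("DB_NAME", "marketing.db")
    if dialect.startswith("sqlite"):
        # SQLite URLs carry only a file path, e.g. sqlite:///relative/path.db
        return f"{dialect}:///{name}"

    # Percent-encode credentials so characters like "@" do not break the URL.
    user = quote_plus(env.get("DB_USER", ""))
    password = quote_plus(env.get("DB_PASSWORD", ""))
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT")

    if password:
        credentials = f"{user}:{password}@"
    elif user:
        credentials = f"{user}@"
    else:
        credentials = ""
    location = f"{host}:{port}" if port else host
    return f"{dialect}://{credentials}{location}/{name}"


# With SQLAlchemy installed, the result could be passed to create_engine:
#   engine = sqlalchemy.create_engine(database_url_from_env())
```

Under this sketch, moving from SQLite to MySQL would only mean changing environment variables, for example setting `DB_DIALECT=mysql+pymysql`, while the models stay untouched.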

This is the ETL component of a larger reporting suite, so what you do with the data once it's loaded is up to you. Sample DB views for paid media program managers will be provided in the example directory. If you are interested in the post-ETL application, please see my other repos.

DB Requirements

This app is quite flexible on DB choice. However, if you're using MySQL, you'll need to raise the max_allowed_packet setting to something large for the ads reports to actually write to the DB. In your /etc/my.cnf file, add:

[mysqld]
max_allowed_packet=999M

This is especially important if you're extracting data for a large account or for many accounts. The packet size needs to be increased because the app relies on the bulk write functions of each RDBMS, which are far faster than row-by-row writes but send much larger packets.
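To show the general idea of bulk writes with a bounded size, here is a small sketch that inserts rows in fixed-size batches. It uses the standard library's `sqlite3` as a stand-in database, and the `ad_report` table, its columns and both function names are invented for this example; it is not the project's actual write path. With MySQL, the batch size would be tuned so that each statement stays under `max_allowed_packet`.

```python
import sqlite3
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def bulk_insert(conn, rows, batch_size=1000):
    """Insert rows in batches and return the number of batches written."""
    batches = 0
    for batch in chunked(rows, batch_size):
        # One executemany call per batch instead of one call per row.
        conn.executemany(
            "INSERT INTO ad_report (campaign_id, clicks, cost) VALUES (?, ?, ?)",
            batch,
        )
        conn.commit()
        batches += 1
    return batches


# Hypothetical table and data, using an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_report (campaign_id INTEGER, clicks INTEGER, cost REAL)")
rows = [(i, i * 2, i * 0.5) for i in range(2500)]
batch_count = bulk_insert(conn, rows, batch_size=1000)
row_count = conn.execute("SELECT COUNT(*) FROM ad_report").fetchone()[0]
```

A smaller batch size trades some write speed for smaller packets, which may be an alternative where the server setting cannot be changed.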

ToDo

Supplemental Requirements

  • Must be stateless.
  • Must default to running current week from CLI and parameterize for backfill.
  • Must be containerized and scalable using Docker Swarm or Kubernetes.
  • Container must run on both ARM and x86, so the necessary wheels need to be built manually.
  • Final container image must be under 100 MB.
  • Must use clustered managed DB.
  • Must have an Airflow DAG for hourly runs.
  • Must deliver reports via mail.
  • Must stay within rate limits.
  • All secrets must come from Docker secrets, GitHub secrets or environment variables.
  • Must have robust test suite (but not TDD).
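The requirement to default to the current week from the CLI while allowing a backfill could be sketched as below. The `--start` and `--end` flags, the function names and the choice of Monday as the start of the week are all assumptions for this example; the project does not specify them.

```python
import argparse
from datetime import date, timedelta


def current_week(today):
    """Return (Monday, today) for the week containing `today`.

    Treating Monday as the first day of the week is an assumption.
    """
    monday = today - timedelta(days=today.weekday())
    return monday, today


def parse_range(argv, today=None):
    """Parse hypothetical --start/--end flags, defaulting to the current week."""
    today = today or date.today()
    parser = argparse.ArgumentParser(description="Hypothetical ETL entry point")
    parser.add_argument("--start", type=date.fromisoformat,
                        help="backfill start date (YYYY-MM-DD)")
    parser.add_argument("--end", type=date.fromisoformat,
                        help="backfill end date (YYYY-MM-DD)")
    args = parser.parse_args(argv)

    default_start, default_end = current_week(today)
    start = args.start or default_start
    end = args.end or default_end
    if start > end:
        parser.error("--start must not be after --end")
    return start, end
```

Because the date range comes entirely from the arguments and the clock, a run keeps no state between invocations, which fits the stateless requirement in the list above.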

Maybe Pile

  • Use GitHub Actions for CI/CD
  • Initial development script to automate Docker rebuilds
