This repo demonstrates the workflow of a data contract, using pre-commit hooks and GitHub Actions to show how the contract can power a data platform.

License: MIT License

Data Contracts Sandbox

This repository is used to maintain and manage our data contracts. These contracts are defined using JSON schemas. In this example we use pre-commit hooks to validate the contracts, and GitHub Actions to automatically bump the version when changes are pushed to the master branch.

In a production setting you could do far more sophisticated things, for example:

  • In the pre-commit activities you could
    • Look up the dataOwner in the company AD and fire warnings if it does not exist
    • Validate the contract :)
  • In the actions you could parse the contract (with a parser.py script) and automatically send changes to downstream services that utilise the metadata
    • Data governance, including access control when fields change
    • Data discoverability, updating the meta information used by these services
    • Create / evolve schema registry entries
    • Create / evolve catalog entries
  • A very interesting extension would be merging this with https://github.com/BauplanLabs/no-jvm-wap-with-iceberg to showcase data loading based on contracts, branching, etc.
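To make the ideas in the list above a little more concrete, here is a minimal sketch of two such checks: flagging an unknown dataOwner and summarising field-level changes between two versions of a contract. Everything here is an assumption for illustration: the contract layout (a `fields` list of objects with `name` and `type`), the helper names `unknown_owner` and `diff_fields`, and the in-memory set standing in for a directory lookup are all hypothetical and not taken from the repository.

```python
def unknown_owner(contract, known_owners):
    """Return True when the contract's dataOwner is not in the known set.

    `known_owners` is a stand-in for a real directory (e.g. AD) lookup.
    """
    return contract.get("dataOwner") not in known_owners


def diff_fields(old_contract, new_contract):
    """Summarise field-level changes between two versions of a contract."""
    # Index both versions by field name (hypothetical contract layout).
    old = {f["name"]: f for f in old_contract.get("fields", [])}
    new = {f["name"]: f for f in new_contract.get("fields", [])}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


old_version = {
    "dataOwner": "jane.doe",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount", "type": "integer"},
    ],
}
new_version = {
    "dataOwner": "john.roe",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount", "type": "number"},
        {"name": "currency", "type": "string"},
    ],
}

changes = diff_fields(old_version, new_version)
owner_warning = unknown_owner(new_version, known_owners={"jane.doe"})
print(changes)
print(owner_warning)
```

A downstream notifier could then decide what to do per category, for instance treating removed fields as breaking changes and added fields as non-breaking.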

Prerequisites

  • Python 3.7 or later
  • git for version control
  • GitHub account for managing the repository

Repository Structure

The repository has the following structure:

.
├── LICENSE
├── README.md
├── contract_schema.json
├── test_contract.json
├── validate.py
└── github_actions/version_bump.yaml

  • LICENSE: The license for this repository.
  • README.md: This document, which provides an overview of the repository.
  • contract_schema.json: The JSON schema for our data contracts.
  • test_contract.json: A sample data contract that conforms to our schema.
  • validate.py: The Python script that validates our data contracts against the schema.
  • github_actions/version_bump.yaml: The YAML workflow for version bumping when we commit to the repo.

JSON Schemas and Data Contracts

Our data contracts are defined using JSON schemas. These schemas provide a clear and concise definition of the structure and type of data that our contracts should have.

The contracts themselves are JSON documents that conform to the structure defined in the schema.
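As a rough illustration of what "conforming to the schema" means, the sketch below checks two small contracts against a tiny schema using only the standard library. The schema contents, the field names other than dataOwner, and the `check_contract` helper are assumptions made for this example; the repository's actual contract_schema.json and validate.py may look quite different, and a real setup would more likely rely on a full JSON Schema validator than on this hand-rolled check.

```python
import json

# Hypothetical, heavily simplified schema; not the repository's contract_schema.json.
SCHEMA = {
    "type": "object",
    "required": ["name", "dataOwner", "fields"],
    "properties": {
        "name": {"type": "string"},
        "dataOwner": {"type": "string"},
        "fields": {"type": "array"},
    },
}

# Maps JSON Schema type names to the Python types that json.loads produces.
JSON_TYPES = {"string": str, "array": list, "object": dict, "boolean": bool}


def check_contract(contract, schema):
    """Return a list of error messages; an empty list means the contract conforms.

    Only `required` and each property's `type` keyword are handled here.
    """
    errors = []
    for key in schema.get("required", []):
        if key not in contract:
            errors.append(f"missing required property: {key}")
    for key, rule in schema.get("properties", {}).items():
        expected = JSON_TYPES.get(rule.get("type"))
        if key in contract and expected and not isinstance(contract[key], expected):
            errors.append(f"{key} should be of type {rule['type']}")
    return errors


good = json.loads('{"name": "orders", "dataOwner": "jane.doe", "fields": []}')
bad = json.loads('{"name": "orders", "fields": "order_id"}')

print(check_contract(good, SCHEMA))
print(check_contract(bad, SCHEMA))
```

The second contract fails on two counts: it has no dataOwner, and its `fields` value is a string rather than an array.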

Pre-commit Validation

We use a pre-commit hook to validate our data contracts. This hook runs every time you make a commit. It checks that the changes you've made to the contracts are valid according to the schema.

To set up the pre-commit hook, create a shell script named pre-commit in the .git/hooks/ directory of your repository. This script will be executed every time you try to commit changes.

#!/bin/sh

# Validate the data contracts; a non-zero exit status aborts the commit.
if ! python validate.py; then
  echo "Data validation failed, commit rejected" >&2
  exit 1
fi

Make the pre-commit hook executable by running chmod +x .git/hooks/pre-commit.
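If you would rather script these two steps than do them by hand, something along the following lines could work. This is only a sketch: the `install_pre_commit_hook` helper is hypothetical, the hook body is a script equivalent to the one above, and the demo writes into a temporary directory rather than a real repository. It assumes a POSIX-style filesystem where execute bits are meaningful.

```python
import os
import stat
import tempfile
from pathlib import Path

# Hook body equivalent to the shell script described in this section.
HOOK_BODY = """#!/bin/sh

if ! python validate.py; then
  echo "Data validation failed, commit rejected" >&2
  exit 1
fi
"""


def install_pre_commit_hook(repo_root):
    """Write the hook script into .git/hooks/ and mark it executable."""
    hook_path = Path(repo_root) / ".git" / "hooks" / "pre-commit"
    hook_path.parent.mkdir(parents=True, exist_ok=True)
    hook_path.write_text(HOOK_BODY)
    # Equivalent of `chmod +x`: add the execute bits to the current mode.
    mode = hook_path.stat().st_mode
    hook_path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path


# Demonstrate against a throwaway directory rather than a real repository.
with tempfile.TemporaryDirectory() as tmp:
    installed = install_pre_commit_hook(tmp)
    is_executable = os.access(installed, os.X_OK)
    print(installed.name, is_executable)
```

In a real repository you would call `install_pre_commit_hook(".")` from the repository root. Note that files under .git/hooks/ are not versioned, so each clone needs the hook installed separately.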

Automatic Version Bumping

We use a GitHub Action to automatically bump the version of our contracts when changes are pushed to the master branch. The configuration for this action can be found in github_actions/version_bump.yaml.

Take the following steps to add this to the repo:

Create a new GitHub Action: In your GitHub repository, navigate to the "Actions" tab and click on "New Workflow". If you have no existing workflows, you can also click on "Set up a workflow yourself".

Set up the workflow: In the new workflow file, you will need to set up the triggering event and the jobs that the workflow will run. An example file can be found at github_actions/version_bump.yaml.

In this example, the workflow triggers when you push to the master branch. The job it runs checks out your code and then uses the github-tag-action to automatically bump the version and push a new tag.

Configure the action: The github-tag-action can be configured with several environment variables. Here are some of the most commonly used ones:

  • GITHUB_TOKEN (required): This is required for the action to have permission to push the new tag to your repository. The token is provided by GitHub; you just need to reference it in the workflow file.
  • DEFAULT_BUMP (optional): Which type of version bump to use if none is explicitly provided in the commit message (major, minor, or patch). Defaults to minor.
  • WITH_V (optional): Whether or not to prefix the tag with 'v'. If true, your tags will look like v1.0.0. If false, they will look like 1.0.0.
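To see how DEFAULT_BUMP and WITH_V interact, the following sketch reproduces the version arithmetic in plain Python. It is an illustration, not the action's actual implementation: the `next_tag` function is hypothetical, and it assumes the bump type is selected by a `#major`, `#minor`, or `#patch` token in the commit message.

```python
import re


def next_tag(latest_tag, commit_message, default_bump="minor", with_v=False):
    """Compute the next tag from the latest one (illustrative only)."""
    major, minor, patch = (int(part) for part in latest_tag.lstrip("v").split("."))
    # Assumed convention: a #major, #minor or #patch token in the commit
    # message selects the bump; otherwise fall back to default_bump.
    match = re.search(r"#(major|minor|patch)\b", commit_message)
    bump = match.group(1) if match else default_bump
    if bump == "major":
        major, minor, patch = major + 1, 0, 0
    elif bump == "minor":
        minor, patch = minor + 1, 0
    else:
        patch += 1
    version = f"{major}.{minor}.{patch}"
    return f"v{version}" if with_v else version


print(next_tag("1.0.0", "Add customer_id field"))                       # 1.1.0
print(next_tag("1.0.0", "Fix typo in description #patch", with_v=True)) # v1.0.1
print(next_tag("v1.4.2", "Remove deprecated field #major", with_v=True)) # v2.0.0
```

For data contracts, one reasonable convention would be to reserve major bumps for breaking changes such as removed or retyped fields, and to let additive changes fall through to the default.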

Commit and push the workflow: Once you have configured the workflow to your liking, commit and push it to your repository. The workflow will now trigger according to the event you specified.
