sodadata / soda-github-action

:zap: Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline.

Home Page: https://www.soda.io/

License: Apache License 2.0

Dockerfile 11.04% Shell 18.80% Python 70.16%
data-engineering data-monitoring data-observability data-quality data-quality-checks data-quality-monitoring data-quality-testing data-reliability data-testing data-unit-tests data-validation dataquality datatesting pipeline-testing snowflake

soda-github-action's Introduction

Soda GitHub Action

Soda enables Data Engineers to test data for quality where and when they need to. It works by taking the data quality checks that you prepare and using them to run a scan of datasets in a data source.

A scan is a CLI command which instructs Soda to prepare optimized SQL queries that execute data quality checks on your data source to find invalid, missing, or unexpected data. When checks fail, they surface bad-quality data and present check results that help you investigate and address quality issues.
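
As an illustration of what such checks can look like, the following is a minimal sketch of a checks file written in SodaCL, Soda's checks language. The dataset name dim_customer and the column customer_id are hypothetical placeholders, not names taken from this repository.

```yaml
# checks.yaml (sketch; dataset and column names are hypothetical)
checks for dim_customer:
  # Unexpected data: the dataset should not be empty
  - row_count > 0
  # Missing data: every row should have a customer_id
  - missing_count(customer_id) = 0
  # Unexpected duplicates: customer_id values should be unique
  - duplicate_count(customer_id) = 0
```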

Add the GitHub Action for Soda to your GitHub Workflow to automatically execute scans for data quality during development.

For example, in a repository in which you are adding a transformation or making changes to a dbt model, you can add the Soda GitHub Action to your workflow. With each new PR, or commit to an existing PR, it executes a Soda scan for data quality and presents the results of the scan in a comment in the pull request, and in a report in Soda Cloud.

Where the scan results indicate an issue with data quality, Soda notifies you both in the PR comment and by email, so that you can investigate and address the issue before merging the PR into production.

Refer to Soda documentation for an example use case.

Use the Soda GitHub Action

Add the action to your GitHub Workflow, as in the following example in the Perform Soda Scan step.

name: Scan for data quality

on: pull_request
jobs:
  soda_scan:
    runs-on: ubuntu-latest
    name: Run Soda Scan
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Perform Soda Scan
        uses: sodadata/soda-github-action@v1
        env:
          SODA_CLOUD_API_KEY: ${{ secrets.SODA_CLOUD_API_KEY }}
          SODA_CLOUD_API_SECRET: ${{ secrets.SODA_CLOUD_API_SECRET }}
          SNOWFLAKE_USERNAME: ${{ secrets.SNOWFLAKE_USERNAME }}
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
        with:
          soda_library_version: v1.0.4
          data_source: snowflake
          configuration: ./configuration.yaml
          checks: ./checks.yaml

Refer to testing files and the test workflow for more context for the example.
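
The environment variables set in the step above are typically consumed by the configuration file. The following is a rough sketch of what ./configuration.yaml might contain, assuming a Snowflake data source named snowflake. The account, database, warehouse, and schema values are hypothetical placeholders, and the exact connection keys should be checked against the Soda documentation for your data source.

```yaml
# configuration.yaml (sketch; connection values are placeholders)
data_source snowflake:              # name should match the data_source input
  type: snowflake
  username: ${SNOWFLAKE_USERNAME}   # resolved from the step's env block
  password: ${SNOWFLAKE_PASSWORD}
  account: my_account               # hypothetical
  database: MY_DATABASE             # hypothetical
  warehouse: MY_WAREHOUSE           # hypothetical
  schema: PUBLIC                    # hypothetical

soda_cloud:
  host: cloud.soda.io
  api_key_id: ${SODA_CLOUD_API_KEY}
  api_key_secret: ${SODA_CLOUD_API_SECRET}
```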

Action inputs

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| soda_library_version | Version of the Soda Library that runs the scan. Supply a specific version, such as v1.0.4, or latest. See soda-library docker images for possible versions. Compatible with Soda Library 1.0.4 and higher. | | - |
| data_source | Name of data source on which to perform the scan. | | - |
| configuration | File path to configuration YAML file. See Soda docs. | | - |
| checks | File path to checks YAML file. See Soda docs. Compatible with shell filename extensions. Identify multiple check files, if you wish. For example: ./checks_*.yaml or ./{check1.yaml,check2.yaml} | | - |
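
To illustrate the checks input with more than one file, here is a sketch of the step using a wildcard pattern. It assumes the repository root contains check files whose names start with checks_ (for example, a hypothetical checks_orders.yaml and checks_customers.yaml); the remaining values are copied from the example workflow.

```yaml
      - name: Perform Soda Scan
        uses: sodadata/soda-github-action@v1
        env:
          SODA_CLOUD_API_KEY: ${{ secrets.SODA_CLOUD_API_KEY }}
          SODA_CLOUD_API_SECRET: ${{ secrets.SODA_CLOUD_API_SECRET }}
          SNOWFLAKE_USERNAME: ${{ secrets.SNOWFLAKE_USERNAME }}
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
        with:
          soda_library_version: v1.0.4
          data_source: snowflake
          configuration: ./configuration.yaml
          # Quoted so YAML passes the pattern through as a plain string;
          # the file names it matches are hypothetical.
          checks: "./checks_*.yaml"
```

The brace form from the table, "./{check1.yaml,check2.yaml}", can be passed the same way when you prefer to list files explicitly.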

Self-hosted runners

  • Windows runners are not supported, including the use of official Windows-based images such as windows-latest.
  • macOS runners require installation of Docker because macos-latest does not come with Docker pre-installed.
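
As a sketch of how a job might target a self-hosted Linux runner instead, the snippet below uses the self-hosted and linux labels that GitHub assigns to such runners by default. It assumes Docker is already installed on the runner; adjust the labels to match your own setup.

```yaml
jobs:
  soda_scan:
    # Default labels for a self-hosted Linux runner; replace them with
    # the labels configured on your own runner if they differ.
    runs-on: [self-hosted, linux]
    name: Run Soda Scan
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      # Add the Perform Soda Scan step here, as in the example workflow.
```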

Access Soda documentation for more information.

soda-github-action's People

Contributors

dirkgroenen, gregkaczan, janet-can

soda-github-action's Issues

Configuration path does not exist

I have added this code to my GitHub Actions pipeline:

      - name: Perform Soda Scan
        uses: sodadata/[email protected]
        env:
          SODA_CLOUD_API_KEY: SODA_CLOUD_API_KEY
          SODA_CLOUD_API_SECRET: SODA_CLOUD_API_SECRET
          SNOWFLAKE_USERNAME: Username
          SNOWFLAKE_PASSWORD: Password
        with:
          soda_library_version: v1.0.4
          data_source: snowflake_soda_poc
          configuration: ${{ github.workspace }}/soda/configuration.yaml
          checks: ${{ github.workspace }}/soda/checks.yaml

The action adds /tmp/workspace/ as a prefix to my configuration and checks paths, and as a result I get this error:
Configuration path '/tmp/workspace//home/runner/work/dbt_training/dbt_training/soda/configuration.yaml' does not exist
[17:08:14] Path "/tmp/workspace//home/runner/work/dbt_training/dbt_training/soda/checks.yaml" does not exist

Any help is appreciated...
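
Judging by the error message, the action appears to prepend /tmp/workspace/ to whatever path it receives, so the absolute path produced by ${{ github.workspace }} ends up nested under that prefix. A possible workaround, not confirmed by the maintainers, is to pass paths relative to the repository root, as the README example does. The sketch below assumes the repository is checked out with actions/checkout and that the files live in a soda/ directory, as in the report above; the secret names are taken from the README example.

```yaml
      - name: Perform Soda Scan
        uses: sodadata/soda-github-action@v1
        env:
          SODA_CLOUD_API_KEY: ${{ secrets.SODA_CLOUD_API_KEY }}
          SODA_CLOUD_API_SECRET: ${{ secrets.SODA_CLOUD_API_SECRET }}
          SNOWFLAKE_USERNAME: ${{ secrets.SNOWFLAKE_USERNAME }}
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
        with:
          soda_library_version: v1.0.4
          data_source: snowflake_soda_poc
          # Relative paths, so that the /tmp/workspace/ prefix resolves
          # to the checked-out repository rather than a doubled path.
          configuration: ./soda/configuration.yaml
          checks: ./soda/checks.yaml
```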
