waterdipai / datachecks

Open Source Data Quality Monitoring.

Home Page: https://datachecks.io

License: Apache License 2.0

datachecks's Introduction

⭐️ If you like it, star the repo

Why Data Monitoring?

APM (Application Performance Monitoring) tools are used to monitor the performance of applications. APM tools are a mandatory part of the dev stack. Without APM tools, it is very difficult to monitor the performance of applications.

[Figure: why data observability]

But for data products, regular APM tools are not enough. We need a new kind of tool that can monitor the performance of data applications. Data monitoring tools monitor the data quality of databases and data pipelines. They identify potential issues in both, help trace data quality issues to their root cause, and help improve data quality.

What is datachecks?

Datachecks is an open-source data monitoring tool that monitors the data quality of databases and data pipelines. It identifies potential issues in both, helps trace data quality issues to their root cause, and helps improve data quality.

Datachecks can generate reliability, uniqueness, and completeness metrics from several data sources.

Reports: Data Quality Visualisation

You can generate a data quality report with a single command. The report presents all the metrics, and the resulting HTML file can be shared with the team.

CLI: Data Quality Visualisation in Bash

The data quality report can also be generated in the terminal, which is very useful for debugging. All it takes is one command.

Getting Started

Install datachecks with the command specific to your database.

Install Datachecks

To install datachecks with all its dependencies, use the command below.

pip install datachecks -U

Create the config file

With a simple config file, you can generate data quality reports for your data sources. A sample config is shown below. For more details, please visit the config guide.

[Image: sample config file]

Run from CLI

Generate Report in Terminal

datachecks inspect -C config.yaml

Generate HTML Report

datachecks inspect -C config.yaml --html-report

For more details, please visit the Quick Start Guide.
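
If you want to trigger the two commands above from a script (for example, a scheduled job), a minimal Python sketch could look like the following. It uses only the `inspect` subcommand and the `-C` and `--html-report` flags shown above. The helper names `build_inspect_command` and `run_inspect` are hypothetical and not part of datachecks, and the sketch assumes the `datachecks` executable is on your PATH.

```python
import subprocess


def build_inspect_command(config_path, html_report=False):
    """Build the argument list for `datachecks inspect`.

    Hypothetical helper: it only assembles the flags shown in this README.
    """
    command = ["datachecks", "inspect", "-C", config_path]
    if html_report:
        command.append("--html-report")
    return command


def run_inspect(config_path, html_report=False):
    """Run the command and return the CompletedProcess.

    check=False is used because this README does not state which exit
    codes datachecks returns, so the caller decides how to handle them.
    """
    return subprocess.run(build_inspect_command(config_path, html_report), check=False)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even when datachecks is not installed.
    print(build_inspect_command("config.yaml", html_report=True))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the config path contains spaces.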

Supported Data Sources

Datachecks supports SQL and search data sources. The supported data sources are listed below.

| Data Source | Type | Supported |
| --- | --- | --- |
| Postgres | Transactional Database | 👍 |
| MySQL | Transactional Database | 👍 |
| MS SQL Server | Transactional Database | 🔜 |
| OpenSearch | Search Engine | 👍 |
| Elasticsearch | Search Engine | 👍 |
| GCP BigQuery | Data Warehouse | 👍 |
| Databricks | Data Warehouse | 👍 |
| Snowflake | Data Warehouse | 🔜 |
| AWS Redshift | Data Warehouse | 👍 |

Metric Types

| Metric | Description |
| --- | --- |
| Reliability Metrics | Detect whether tables/indices/collections are updating with timely data |
| Numeric Distribution Metrics | Detect changes in numeric distributions, e.g. in values, variance, skew and more |
| Uniqueness Metrics | Detect when data constraints are breached, e.g. duplicates or the number of distinct values |
| Completeness Metrics | Detect when there are missing values in datasets, e.g. null or empty values |
| Validity Metrics | Detect whether data is formatted correctly and represents a valid value |
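
To make the metric types above concrete, here is a small plain-Python sketch of what each kind of metric measures. It is an illustration only: it does not use the datachecks API and does not claim to match how datachecks computes its metrics. The sample rows, the function names, and the email pattern are all assumptions made for the example.

```python
import re
from datetime import datetime, timedelta, timezone
from statistics import mean, pvariance

# Hypothetical sample rows; in practice these would come from a table or index.
rows = [
    {"email": "a@example.com", "amount": 10.0, "updated_at": datetime(2024, 1, 1, 0, tzinfo=timezone.utc)},
    {"email": "a@example.com", "amount": 20.0, "updated_at": datetime(2024, 1, 1, 6, tzinfo=timezone.utc)},
    {"email": None, "amount": 30.0, "updated_at": datetime(2024, 1, 1, 12, tzinfo=timezone.utc)},
    {"email": "not-an-email", "amount": 40.0, "updated_at": datetime(2024, 1, 1, 18, tzinfo=timezone.utc)},
]
now = datetime(2024, 1, 2, 0, tzinfo=timezone.utc)

# Deliberately simple pattern, assumed for illustration only.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def present_values(rows, column):
    """Values that are neither None nor an empty string."""
    return [r[column] for r in rows if r.get(column) not in (None, "")]


def null_count(rows, column):
    """Completeness: rows where the value is missing or empty."""
    return len(rows) - len(present_values(rows, column))


def distinct_count(rows, column):
    """Uniqueness: number of distinct non-missing values."""
    return len(set(present_values(rows, column)))


def duplicate_count(rows, column):
    """Uniqueness: non-missing values that repeat an earlier value."""
    values = present_values(rows, column)
    return len(values) - len(set(values))


def invalid_count(rows, column, pattern):
    """Validity: non-missing values that do not match the pattern."""
    return sum(1 for v in present_values(rows, column) if not pattern.fullmatch(v))


def freshness(rows, column, now):
    """Reliability: time elapsed since the most recent timestamp."""
    return now - max(r[column] for r in rows)


def numeric_summary(rows, column):
    """Numeric distribution: mean and population variance of a column."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return {"mean": mean(values), "variance": pvariance(values)}


if __name__ == "__main__":
    print("null emails:", null_count(rows, "email"))
    print("distinct emails:", distinct_count(rows, "email"))
    print("duplicate emails:", duplicate_count(rows, "email"))
    print("invalid emails:", invalid_count(rows, "email", EMAIL_PATTERN))
    print("freshness:", freshness(rows, "updated_at", now))
    print("amount summary:", numeric_summary(rows, "amount"))
```

Tracking values like these over time is what lets a monitoring tool flag a change, such as a sudden rise in null counts or a table that has stopped updating.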

Overview

[Figure: datachecks architecture]

What does Datachecks not do?

Community & Support

For additional information and help, you can use one of these channels:

  • Slack (Live chat with the team, support, discussions, etc.)
  • GitHub issues (Bug reports, feature requests)

Contributions

🙌 We greatly appreciate contributions - be it a bug fix, new feature, or documentation!

Check out the contributions guide and open issues.

Datachecks contributors: 💙

Telemetry

Usage Analytics & Data Privacy

License

This project is licensed under the terms of the Apache License 2.0.
