Coder Social home page Coder Social logo

hot-vs-cool / sqlservervalidationtoolkit Goto Github PK

View Code? Open in Web Editor NEW

This project forked from simonjonas/sqlservervalidationtoolkit

0.0 1.0 0.0 528 KB

A tool for rule-based-validation of SQL Server data

License: MIT License

C# 94.44% SQLPL 5.19% PLpgSQL 0.37%

sqlservervalidationtoolkit's Introduction

SqlServerValidationToolkit

A tool for data-validation of SQL Server tables

##What is it? The SqlServerValidationToolkit allows you to quickly define rules for column-values. You can use the tool to get an overview of invalid valid and to constantly corrrect them. Everytime you run the validation you have a current view of all invalid values.

You can use the tool in a local mode where the metadata are stored locally or in a shared mode where you store them in a SQL Server database that you can access from an SSIS-package.

##Rule types The following rule-types are available:

  • MinMax-rule
  • Like-rule with a like-expression
  • Comparison-rule to compare the value of one column with the value of another column
  • CustomQuery-rule which allows you to define a custom query and define your own error types. The query returns the id of the invalid values and the errorType-code.

##SSIS Integration You can integrate the validation toolkit in the SSIS-package.

  • Prepare the validation toolkit: In the SqlServerValidationToolkit.Configurator.exe.config-file go to the bottom of the file. Set the "StoreMetadataInSqlServer"-setting to "True" and in the "MetadataDatabaseConnectionString" set the connection string to the metadata-database.
  • Define the rules for the validation as in the local mode.
  • Integrate the tool into SSIS: In the project add an "Execute Process" Task and set "-v" in the Arguments-row.
  • Handle the invalid data: For example if you want to import the data into a data warehouse you can access the view [Validation_EntriesWithWrongValues] from the metadata database to access the Id-values of the invalid entries to exclude them from the import.

##What are the alternatives?

  • A simple way is to use SSIS with lookup transformations and conditional splits to weed out invalid data. This is a standard approach but if you have a lot of tables to validate, the validation logic is spread over many places. Also it means that you have to change and redeploy the SSIS package when the validation logic changes. This can be time consuming. The validation toolkit extracts the validation logic into an external component.
  • The "Data validation transform"-component from PragmaticsWorks (http://pragmaticworks.com/Products/Features/Feature/DataValidationTransform) can validate regular expressions.
  • Data Quality services (https://msdn.microsoft.com/en-us/library/ff877917.aspx) is the service from Microsoft to validate data. The validation toolkit is a simple alternative.

##Where can I use it? In a perfect world you would not need this tool. The applications would all validate the input-data themselves and give the user immediate feedback if one of the rules is broken. However sometimes applications don't validate or they don't validate the right thing and after a while your boss comes and sais the existing data needs to be shipped to a data warehouse. What do you do? You validate yourself. You write all SQL-queries that filter the wrong values in the ETL-process or you correct them one by one.

To help you in this process the tool was written. If you use it, you will not need to write so many SQL queries and keep track of all the wrong values. The tool does that for you. You can also copy the generated SQL queries and use them in your ETL-process. And with the customQuery-rule you can validate very complex situations.

##Example

Below you see the configuration for a MinMax-rule for the column "Length" of the table "Babies". Valid values are between 0 and 60. Null values should be interpreted as error: Configuration

The rule in the screenshot creates the following query:

SELECT BabyID,CASE WHEN (Length < 10) THEN 'TooLow' WHEN (60 < Length) THEN 'TooHigh' WHEN (Length IS NULL) THEN 'NotEntered' END AS ErrorType_code
	FROM 
	Babies
	WHERE 
    (Length < 10 OR 60 < Length)
	OR Length IS NULL

For the null values there are three strategies:

  • Ignore: Null-values are not validated
  • InterpretAsError: Null-values have a special code 'NotEntered'
  • ConvertToDefaultValue: Values are converted to 0 for int-columns, an empty string for string-columns and the current time for datetime-columns

After the validation you can see all invalid values in the "WrongValues"-tab: wrongValues

##Why are the imported tables called Source? Because I would like to extend the tool so that you can also validate views in a future version.

sqlservervalidationtoolkit's People

Contributors

simonjonas avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.