Coder Social home page Coder Social logo

sodadata / soda-core Goto Github PK

View Code? Open in Web Editor NEW
1.8K 12.0 185.0 2.83 MB

:zap: Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

Home Page: https://go.soda.io/core-docs

License: Apache License 2.0

Shell 0.10% Python 99.50% ANTLR 0.22% Dockerfile 0.17%
python data-engineering data-governance data-monitoring data-observability data-profiling data-quality data-quality-checks data-quality-monitoring data-reliability

soda-core's People

Contributors

aayush16 avatar bastienboutonnet avatar baturayo avatar bbakersmith avatar dakue-soda avatar deenkar avatar dirkgroenen avatar duyet avatar gpby avatar janet-can avatar jczuurmond avatar joristruong avatar linderttobias avatar m1n0 avatar milanaleksic avatar nathadfield avatar paololeonard avatar pre-commit-ci[bot] avatar rkshvish avatar robertomorandeira avatar rolandrmgservices avatar scottatdisney avatar stuart-robinson avatar tdstark avatar thiagodeschamps avatar tituskx avatar tombaeyens avatar vijaykiran avatar vinod901 avatar wintersrd avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

soda-core's Issues

Deal with reserved keywords

All data sources have reserved keywords, if these are used for table/column names we have to quote them. Can we just quote everything to get around the issue?

Improved error message - Statistical check

On purpose, I attempted to evaluate for avg on a varchar column. Naturally, there was an error, but the error message doesn’t explicitly tell me, “You can’t calculate average on varchar content, stoopid.” Could we replace the query execution error with something that will more directly lead me to my error? I’m afraid the HINT didn’t help…

checks for sodatest_customers_b7580920:
 - avg(pct) between 1 and 10
soda scan -d local_postgres_sodasql -c ~/.soda/configuration.yaml -ch checks.yaml
Soda Core 0.0.1
Query execution error in local_postgres_sodasql.sodatest_customers_b7580920.aggregation[0]: function avg(character varying) does not exist
LINE 2:  AVG(pct) 
     ^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.

SELECT 
 AVG(pct) 
FROM sodatest_customers_b7580920
 +-> function avg(character varying) does not exist
LINE 2:  AVG(pct) 
     ^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.

Metrics avg were not computed for check avg(pct) between 1 and 10
Scan summary:
1/1 check NOT EVALUATED: 
  sodatest_customers_b7580920 in local_postgres_sodasql
   avg(pct) between 1 and 10 [NOT EVALUATED]
    check_value: None
1 checks not evaluated. See logs for details.
2 errors.
Oops! 2 errors. 0 failures. 0 warnings. 0 pass.
ERRORS:
Query execution error in local_postgres_sodasql.sodatest_customers_b7580920.aggregation[0]: function avg(character varying) does not exist
LINE 2:  AVG(pct) 
     ^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.

SELECT 
 AVG(pct) 
FROM sodatest_customers_b7580920
 +-> function avg(character varying) does not exist
LINE 2:  AVG(pct) 
     ^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.

Metrics avg were not computed for check avg(pct) between 1 and 10

For an integer column, this worked as expected.

checks for sodatest_customers_b7580920:
 - avg(distance) between 1 and 10
soda scan -d local_postgres_sodasql -c ~/.soda/configuration.yaml -ch checks.yaml
Soda Core 0.0.1
Scan summary:
1/1 check FAILED: 
  sodatest_customers_b7580920 in local_postgres_sodasql
   avg(distance) between 1 and 10 [FAILED]
    check_value: 445.6666666666667
Oops! 1 failures. 0 warnings. 0 errors. 0 pass.

Issues when creating percentage validity checks

This check fails:

checks for retail_orders:
   - invalid_count(discount) < 1:
       valid format:  positive percentage

With this exception:

(.venv) $soda scan -V -d aws_postgres_retail orders.yml
Soda Core 0.0.1
Reading configuration file "/Users/albin/.soda/configuration.yml"
Reading SodaCL file "orders.yml"
Scan execution starts
Query aws_postgres_retail.retail_orders.aggregation[0]:
SELECT 
  COUNT(CASE WHEN NOT (discount IS NULL) AND NOT (discount ~ '^ *\+? *(\d+([\.,]\d+)?|([\.,]\d+)) *% *$') THEN 1 END) 
FROM retail_orders
Query execution error in aws_postgres_retail.retail_orders.aggregation[0]: operator does not exist: double precision ~ unknown
LINE 2: ...ASE WHEN NOT (discount IS NULL) AND NOT (discount ~ '^ *\+? ...
                                                             ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

SELECT 
  COUNT(CASE WHEN NOT (discount IS NULL) AND NOT (discount ~ '^ *\+? *(\d+([\.,]\d+)?|([\.,]\d+)) *% *$') THEN 1 END) 
FROM retail_orders
  +-> operator does not exist: double precision ~ unknown
LINE 2: ...ASE WHEN NOT (discount IS NULL) AND NOT (discount ~ '^ *\+? ...
                                                             ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

Metrics invalid_count were not computed for check invalid_count(discount) < 1
Scan summary:
1/1 query ERROR
  aws_postgres_retail.retail_orders.aggregation[0] [ERROR] 0:00:00.617506
SELECT 
  COUNT(CASE WHEN NOT (discount IS NULL) AND NOT (discount ~ '^ *\+? *(\d+([\.,]\d+)?|([\.,]\d+)) *% *$') THEN 1 END) 
FROM retail_orders
    operator does not exist: double precision ~ unknown
    LINE 2: ...ASE WHEN NOT (discount IS NULL) AND NOT (discount ~ '^ *\+? ...
                                                                 ^
    HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

1/1 check NOT EVALUATED: 
    retail_orders in aws_postgres_retail
      invalid_count(discount) < 1 [NOT EVALUATED]
        check_value: None
1 checks not evaluated.
2 errors.
Oops! 2 errors. 0 failures. 0 warnings. 0 pass.
ERRORS:
Query execution error in aws_postgres_retail.retail_orders.aggregation[0]: operator does not exist: double precision ~ unknown
LINE 2: ...ASE WHEN NOT (discount IS NULL) AND NOT (discount ~ '^ *\+? ...
                                                             ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

SELECT 
  COUNT(CASE WHEN NOT (discount IS NULL) AND NOT (discount ~ '^ *\+? *(\d+([\.,]\d+)?|([\.,]\d+)) *% *$') THEN 1 END) 
FROM retail_orders
  +-> operator does not exist: double precision ~ unknown
LINE 2: ...ASE WHEN NOT (discount IS NULL) AND NOT (discount ~ '^ *\+? ...
                                                             ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

Metrics invalid_count were not computed for check invalid_count(discount) < 1
(.venv) $

Screenshot 2022-03-14 at 11 40 20

Improve error message missing column - Statistical

For this check, I left out the column name to see what would happen and it errored as expected but maybe the error message could be improved?
'NoneType' object has no attribute 'column_name' + a stacktrace is... not super useful. Could it be shortened to just "Missing column name"?

(Also, need to correct the docs for the metric: should be 'stddev' instead of 'stdev'.)

checks for sodatest_customers_b7580920:
  - stdev > 4
soda scan -d local_postgres_sodasql -c ~/.soda/configuration.yaml -ch checks.yaml
Soda Core 0.0.1
Unknown error while executing scan.
  | 'NoneType' object has no attribute 'column_name'
  | Stacktrace:
  | Traceback (most recent call last):
  |   File "/Users/Janet/Documents/GitHub/soda-sql-v3/soda_core/soda_core/scan.py", line 323, in execute
  |     self.__create_check(check_cfg, data_source_scan, partition)
  |   File "/Users/Janet/Documents/GitHub/soda-sql-v3/soda_core/soda_core/scan.py", line 470, in __create_check
  |     check = Check.create(
  |   File "/Users/Janet/Documents/GitHub/soda-sql-v3/soda_core/soda_core/execution/check.py", line 37, in create
  |     return MetricCheck(check_cfg, data_source_scan, partition, column)
  |   File "/Users/Janet/Documents/GitHub/soda-sql-v3/soda_core/soda_core/execution/metric_check.py", line 79, in __init__
  |     metric = data_source_scan.resolve_metric(metric)
  |   File "/Users/Janet/Documents/GitHub/soda-sql-v3/soda_core/soda_core/execution/data_source_scan.py", line 46, in resolve_metric
  |     metric.ensure_query()
  |   File "/Users/Janet/Documents/GitHub/soda-sql-v3/soda_core/soda_core/execution/numeric_query_metric.py", line 126, in ensure_query
  |     self.partition.ensure_query_for_metric(self)
  |   File "/Users/Janet/Documents/GitHub/soda-sql-v3/soda_core/soda_core/execution/partition.py", line 56, in ensure_query_for_metric
  |     sql_aggregation_expression = metric.get_sql_aggregation_expression()
  |   File "/Users/Janet/Documents/GitHub/soda-sql-v3/soda_core/soda_core/execution/numeric_query_metric.py", line 92, in get_sql_aggregation_expression
  |     values_expression = self.column.column_name
  | AttributeError: 'NoneType' object has no attribute 'column_name'

Include calculation result in passed tests - Statistical

A thought: would it be valuable to show the check_value of a statistical calculation when a test passes? I like seeing the value when tests fail. Is that possible?

checks for sodatest_customers_b7580920:
  - min(size) >= 0
  - max(size) <= 5000
  - avg(size) between -5 and 1000
  - sum(distance) = 100
soda scan -d local_postgres_sodasql -c ~/.soda/configuration.yaml -ch checks.yaml
Soda Core 0.0.1
Scan summary:
2/4 checks PASSED: 
    sodatest_customers_b7580920 in local_postgres_sodasql
      max(size) <= 5000 [PASSED]
      avg(size) between -5 and 1000 [PASSED]
2/4 checks FAILED: 
    sodatest_customers_b7580920 in local_postgres_sodasql
      min(size) >= 0 [FAILED]
        check_value: -3.0
      sum(distance) = 100 [FAILED]
        check_value: 4011
Oops! 2 failures. 0 warnings. 0 errors. 2 pass.

Test coverage for data types

Ensure we have test coverage for most of the common data types.

  • integer
  • decimals
  • varchar
  • date
  • time
  • timestamp
  • boolean

Improve error message - Schema checks

This check works as expected:

checks for sodatest_customers_b7580920:
 - schema:
   warn:
    when required column missing: [sombrero]
soda scan -d local_postgres_sodasql -c ~/.soda/configuration.yaml -ch checks.yaml
Soda Core 0.0.1
Scan summary:
1/1 check WARNED: 
  sodatest_customers_b7580920 in local_postgres_sodasql
   schema [WARNED]
    missing_column_names = [sombrero]
    schema_measured = [id character varying, size double precision, sizetxt character varying, distance integer, pct character varying, cat character varying, country character varying, zip character varying, email character varying, date date, ts timestamp without time zone, ts_with_tz timestamp with time zone]
Only 1 warning. 0 failure. 0 errors. 0 pass.

But when I purposely messed up my syntax by adding an errant when, the error message is misleading in pointing out what exactly is wrong. The colon in line 5 isn’t the problem, the aimless extra when is the issue. More precise error message?

checks for sodatest_customers_b7580920:
 - schema:
   warn:
    when
    when required column missing: [sombrero]
soda scan -d local_postgres_sodasql -c ~/.soda/configuration.yaml -ch checks.yaml
Soda Core 0.0.1
YAML syntax error
 | mapping values are not allowed here
 |  in "<unicode string>", line 5, column 37:
 |       when required column missing: [sombrero]
 |                     ^ (line: 5)
 | Stacktrace:
 | Traceback (most recent call last):
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/soda_core/soda_core/common/parser.py", line 65, in _parse_yaml_str
 |   file_dict = yaml.load(yaml_str)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/main.py", line 434, in load
 |   return constructor.get_single_data()
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/constructor.py", line 119, in get_single_data
 |   node = self.composer.get_single_node()
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 76, in get_single_node
 |   document = self.compose_document()
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 99, in compose_document
 |   node = self.compose_node(None, None)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 143, in compose_node
 |   node = self.compose_mapping_node(anchor)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 223, in compose_mapping_node
 |   item_value = self.compose_node(node, item_key)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 141, in compose_node
 |   node = self.compose_sequence_node(anchor)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 185, in compose_sequence_node
 |   node.value.append(self.compose_node(node, index))
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 143, in compose_node
 |   node = self.compose_mapping_node(anchor)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 223, in compose_mapping_node
 |   item_value = self.compose_node(node, item_key)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 143, in compose_node
 |   node = self.compose_mapping_node(anchor)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 223, in compose_mapping_node
 |   item_value = self.compose_node(node, item_key)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 113, in compose_node
 |   if self.parser.check_event(AliasEvent):
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/parser.py", line 146, in check_event
 |   self.current_event = self.state()
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/parser.py", line 629, in parse_block_mapping_value
 |   if self.scanner.check_token(ValueToken):
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/scanner.py", line 1796, in check_token
 |   self._gather_comments()
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/scanner.py", line 1838, in _gather_comments
 |   self.fetch_more_tokens()
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/scanner.py", line 288, in fetch_more_tokens
 |   return self.fetch_value()
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/scanner.py", line 657, in fetch_value
 |   raise ScannerError(
 | ruamel.yaml.scanner.ScannerError: mapping values are not allowed here
 |  in "<unicode string>", line 5, column 37:
 |       when required column missing: [sombrero]
 |                     ^ (line: 5)

 +-> line=5,col=37 in checks.yaml
Scan summary:
1 errors.
Oops! 1 error. 0 failures. 0 warnings. 0 pass.
ERRORS:
YAML syntax error
 | mapping values are not allowed here
 |  in "<unicode string>", line 5, column 37:
 |       when required column missing: [sombrero]
 |                     ^ (line: 5)
 | Stacktrace:
 | Traceback (most recent call last):
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/soda_core/soda_core/common/parser.py", line 65, in _parse_yaml_str
 |   file_dict = yaml.load(yaml_str)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/main.py", line 434, in load
 |   return constructor.get_single_data()
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/constructor.py", line 119, in get_single_data
 |   node = self.composer.get_single_node()
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 76, in get_single_node
 |   document = self.compose_document()
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 99, in compose_document
 |   node = self.compose_node(None, None)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 143, in compose_node
 |   node = self.compose_mapping_node(anchor)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 223, in compose_mapping_node
 |   item_value = self.compose_node(node, item_key)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 141, in compose_node
 |   node = self.compose_sequence_node(anchor)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 185, in compose_sequence_node
 |   node.value.append(self.compose_node(node, index))
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 143, in compose_node
 |   node = self.compose_mapping_node(anchor)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 223, in compose_mapping_node
 |   item_value = self.compose_node(node, item_key)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 143, in compose_node
 |   node = self.compose_mapping_node(anchor)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 223, in compose_mapping_node
 |   item_value = self.compose_node(node, item_key)
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/composer.py", line 113, in compose_node
 |   if self.parser.check_event(AliasEvent):
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/parser.py", line 146, in check_event
 |   self.current_event = self.state()
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/parser.py", line 629, in parse_block_mapping_value
 |   if self.scanner.check_token(ValueToken):
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/scanner.py", line 1796, in check_token
 |   self._gather_comments()
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/scanner.py", line 1838, in _gather_comments
 |   self.fetch_more_tokens()
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/scanner.py", line 288, in fetch_more_tokens
 |   return self.fetch_value()
 |  File "/Users/Janet/Documents/GitHub/soda-sql-v3/.venv/lib/python3.8/site-packages/ruamel/yaml/scanner.py", line 657, in fetch_value
 |   raise ScannerError(
 | ruamel.yaml.scanner.ScannerError: mapping values are not allowed here
 |  in "<unicode string>", line 5, column 37:
 |       when required column missing: [sombrero]
 |                     ^ (line: 5)

 +-> line=5,col=37 in checks.yaml

Quotes in schema check

Testing checks with quotes on table names and column names.

  • Test 001: I can successfully execute a scan for row_count check with quotes on table name.
  • Test 002: I can successfully execute a scan for schema check without quotes on table name.
  • Test 003: I cannot execute a scan for schema check with quotes on table name.

Set up:

  • using sodacl-workshop repo with latest version of soda-core
  • db is adventureworks running in docker container on the sodacl-workshop repo
  • connected to dev.sodadata.io account with API keys set in configuration.yml

Test 001 - row_count check with table name in quotes; works as expected
checks.yml:

checks for "dim_account":
  - row_count > 0

configuration.yml

data_source adventureworks:
  type: postgres
  connection:
    host: db
    username: postgres
    password: secret
  database: postgres
  schema: public

Scan input:
soda scan -d adventureworks -V -c configuration.yml checks.yml

Scan output:

Soda Core 0.0.1
Reading configuration file "configuration.yml"
Reading SodaCL file "checks.yml"
Scan execution starts
Query adventureworks."dim_account".aggregation[0]:
SELECT 
  COUNT(*) 
FROM "dim_account"
Scan summary:
1/1 query OK
  adventureworks."dim_account".aggregation[0] [OK] 0:00:00.034581
1/1 check PASSED: 
    "dim_account" in adventureworks
      row_count > 0 [PASSED]
        check_value: 99
All is good. No failures. No warnings. No errors.

Test 002 - schema check; works as expected
checks.yml:

checks for dim_account:
 - schema:
     warn:
       when required column missing: [sombrero]

configuration.yml
as above

Scan input:
soda scan -d adventureworks -V -c configuration.yml checks.yml

Scan output:

Soda Core 0.0.1
Reading configuration file "configuration.yml"
Reading SodaCL file "checks.yml"
Scan execution starts
Query adventureworks.dim_account.schema[dim_account]:
SELECT column_name, data_type, is_nullable 
FROM information_schema.columns 
WHERE lower(table_name) = 'dim_account' 
  AND lower(table_catalog) = 'postgres' 
  AND lower(table_schema) = 'public'
ORDER BY ORDINAL_POSITION
Scan summary:
1/1 query OK
  adventureworks.dim_account.schema[dim_account] [OK] 0:00:00.118499
1/1 check WARNED: 
    dim_account in adventureworks
      schema [WARNED]
        missing_column_names = [sombrero]
        schema_measured = [account_key integer, parent_account_key integer, account_code_alternate_key integer, parent_account_code_alternate_key integer, account_description character varying, account_type character varying, operator character varying, custom_members character varying, value_type character varying, custom_member_options character varying]
Only 1 warning. 0 failure. 0 errors. 0 pass.

Test 003 - schema check with table name in quotes; error
checks.yml:

checks for "dim_account":
 - schema:
     warn:
       when required column missing: [sombrero]

configuration.yml
as above

Scan input:
soda scan -d adventureworks -V -c configuration.yml checks.yml

Scan output:

Soda Core 0.0.1
Reading configuration file "configuration.yml"
Reading SodaCL file "checks.yml"
Scan execution starts
Query adventureworks."dim_account".schema["dim_account"]:
SELECT column_name, data_type, is_nullable 
FROM information_schema.columns 
WHERE lower(table_name) = '"dim_account"' 
  AND lower(table_catalog) = 'postgres' 
  AND lower(table_schema) = 'public'
ORDER BY ORDINAL_POSITION
Metrics schema were not computed for check schema
Scan summary:
1/1 query OK
  adventureworks."dim_account".schema["dim_account"] [OK] 0:00:00.097976
1/1 check NOT EVALUATED: 
    "dim_account" in adventureworks
      schema [NOT EVALUATED]
1 checks not evaluated.
1 errors.
Oops! 1 error. 0 failures. 0 warnings. 0 pass.
ERRORS:
Metrics schema were not computed for check schema

error message is repeated twice

➜ soda scan -ef environments.yml -cf actor-checks.yml
Couldn't execute scan
  | connection to server at "localhost" (::1), port 5432 failed: Connection refused
  |     Is the server running on that host and accepting TCP/IP connections?
  | connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
  |     Is the server running on that host and accepting TCP/IP connections?
INFO   | Reading SodaCL file "environments.yml"
INFO   | Reading SodaCL file "actor-checks.yml"
ERROR  | Couldn't execute scan: connection to server at "localhost" (::1), port 5432 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?

Referential integrity output needs to be less verbose and more informative

Given this SodaCL

checks for fact_product_inventory:
  - reference from (product_key) to dim_product (product_key)

the output of running soda scan results in 111455 lines because there are too many rows in fact_product_inventory with a product_key that's not there in dim_product. This makes it very inconvenient to see which keys are missing.

...
483         20140616 2014-06-16    '$36.80'  0        0         4            
483         20140617 2014-06-17    '$36.62'  0        0         4            
483         20140618 2014-06-18    '$36.78'  0        0         4            
483         20140619 2014-06-19    '$36.88'  0        0         4            
483         20140620 2014-06-20    '$36.86'  0        0         4            
483         20140621 2014-06-21    '$36.84'  0        0         4            
483         20140622 2014-06-22    '$37.01'  0        0         4            
483         20140623 2014-06-23    '$37.13'  0        0         4            
483         20140624 2014-06-24    '$37.29'  0        0         4            
483         20140625 2014-06-25    '$37.19'  0        0         4            
483         20140626 2014-06-26    '$37.13'  0        0         4            
483         20140627 2014-06-27    '$37.13'  0        0         4            
483         20140628 2014-06-28    '$37.26'  0        0         4            
483         20140629 2014-06-29    '$37.16'  0        0         4            
483         20140630 2014-06-30    '$37.22'  0        0         4   
...

Suggestion: only show the distinct columns (in this case product_key) that are violating the referential integrity check.

Context: https://soda-community.slack.com/archives/C032BDZN3HV/p1645463084048419

For each must filter on schema

Using the only wildcard, %, with the for each check does not include the datasource schema resulting in a query which scans all tables within the information_schema.tables table.

QUERY PRODUCE...

SELECT table_name
FROM information_schema.tables
WHERE (lower(table_name) like '%');

SHOULD BE...

SELECT table_name
FROM information_schema.tables
WHERE (lower(table_name) like '%')
AND lower(table_schema) = '[SCHEMA_NAME]';

Soda Scan

soda scan -d raw_stripe stitch_stripe.yml

Configuration

data_source raw_stripe:
  type: snowflake
  connection:
    username: ${SNOWFLAKE_USER}
    password: ${SNOWFLAKE_PASSWORD}
    account: ${SNOWFLAKE_ACCOUNT}
    database: RAW
    schema: STRIPE
    warehouse: my_warehouse
    role: my_role

Check

for each table T:
  tables:
    - raw_stripe.charges # works
    - raw_stripe.% # works but queries all tables
    - '%' # works but queries all tables 
  checks:
    - row_count > 0

Result
Queries [RANDOM] tables not in the schema which results in execution errors

Soda Core 3.0.0b4
Reading configuration file "configuration.yml"
Adding SodaCL dir checks/raw/
Reading SodaCL file "checks/raw/stitch_stripe.yml"
Scan execution starts
Query raw_stripe.for_each_table_T[0]:
SELECT table_name 
FROM information_schema.tables
WHERE (lower(table_name) like '%')

Query raw_stripe.[RANDOM_TABLE].aggregation[0]:
SELECT 
  COUNT(*) 
FROM [RANDOM_TABLE].
Query execution error in raw_stripe.[RANDOM_TABLE]..aggregation[0]: 002003 (42S02): SQL compilation error:
Object '[RANDOM_TABLE].' does not exist or not authorized.

[...] -- ALL other tables in `information_schema.tables`

Bad error message for non existing data source

soda scan -d does_not_exist -c configuration.yml checks.yml

results in

Soda Core 3.0.0b4
Unknown error while executing scan.
  | dictionary changed size during iteration
  | Stacktrace:
  | Traceback (most recent call last):
  |   File "/Users/m1n0/dev/soda/soda-sql/soda-core/soda/core/soda/scan.py", line 267, in execute
  |     for data_source_scan_cfg in self._sodacl_cfg.data_source_scan_cfgs.values():
  | RuntimeError: dictionary changed size during iteration

Empty `table` property in cloud request for standalone checks causes cloud integration to fail

Soda Cloud does not support check results without table property. This property is filled with checks for X and is not present in "standalone" checks with no table property e.g.

checks:
  ....

Options:

  1. Disallow checks without table association
    This seems very suboptimal as such checks have been identified as useful

  2. Detect a table from the checks automatically
    Seems unreliable and too much work

  3. Change cloud to accept checks with no table.
    This is technically easy, but more challenging from product point of view - the UI would have to deal with such checks appropriately.

Review log summary

For the CLI output for checks status:

  • why does a warning include the text Only? Only 1 warning. 0 failure. 0 error. 0 pass. If there are more than one warnings, the Only is still there: Only 2 warnings. 0 failure. 0 errors. 0 pass. Are 50 warnings still Only? 100? Maybe we could use a different word, like Note. (However, the failures trump the warnings in order of precedence so 1 failure and 2 warnings seems fine. Oops! 1 failure. 2 warnings. 0 errors. 0 pass.)
  • Sometimes it says 0 warnings and sometimes it says No warnings— is that intentional? All is good. No failures. No warnings. No errors. Oops! 1 failure. 0 warnings. 0 errors. 0 pass. I suppose I am advocating for consistency when the status is 0/none.
  • Can we change the output message to apply plural where appropriate? Oops! 1 failures. 0 pass. 2 pass. 0 failure
    Why does the word Oops!appear when there is a failure? It implies that something is wrong when, really, a failed test means that something is working correctly/as expected! Maybe instead:Note` ?

Identity changes unexpectedly

Given the checks.yml configuration

checks for bitcoin:
  - row_count between 0 and 1000000
  - min(usd):
      warn: when < 20000
      fail: when < 10000
  - max(usd):
      warn: when > 70000
      fail: when > 100000

execution of soda scan will create 3 new checks.

After changing the threshold of the first check and leaving the other 2 checks unchanged in checks.yml:

checks for bitcoin:
  - row_count:
      warn: when between 20 and 100
  - min(usd):
      warn: when < 20000
      fail: when < 10000
  - max(usd):
      warn: when > 70000
      fail: when > 100000

all of a sudden min(usd) and max(usd) get new identity resulting in creation of new checks in soda-cloud and you end up with 6 checks instead of 4.

Attribute error due to a parsing issue with a typo

@tombaeyens you asked me to file an issue with the situation I had a full on traceback

my check was written like so with an intentional typo to see what the parser says:

checks for new_fact_currency_rate:
  - countsame as fact_currency_rate

This gave:

root@0c07d91bb019:/sodacl# soda scan -d adventureworks -c configuration.yaml -ch checks.yaml
Soda Core 0.0.1
Invalid check "countsame as fact_currency_rate": mismatched input 'as' expecting {'between', 'not', 'warn', 'fail', 'pass', '!=', '<>', '<=', '>=', '=', '<', '>'}
  +-> line=9,col=5 in checks.yaml
  +-> See https://go.soda.io/SECTHEADER
No threshold specified for check "countsame as fact_currency_rate"
  +-> line=9,col=5 in checks.yaml
Unknown error while executing scan.
  | 'NoneType' object has no attribute 'column_name'
  | Stacktrace:
  | Traceback (most recent call last):
  |   File "/usr/local/lib/python3.8/dist-packages/soda_core/scan.py", line 327, in execute
  |     self.__create_check(check_cfg, data_source_scan, partition)
  |   File "/usr/local/lib/python3.8/dist-packages/soda_core/scan.py", line 480, in __create_check
  |     check.ensure_metrics_and_queries(data_source_scan=data_source_scan, partition=partition)
  |   File "/usr/local/lib/python3.8/dist-packages/soda_core/execution/metric_check.py", line 84, in ensure_metrics_and_queries
  |     metric = data_source_scan.resolve_metric(metric)
  |   File "/usr/local/lib/python3.8/dist-packages/soda_core/execution/data_source_scan.py", line 46, in resolve_metric
  |     metric.ensure_query()
  |   File "/usr/local/lib/python3.8/dist-packages/soda_core/execution/numeric_query_metric.py", line 183, in ensure_query
  |     self.partition.ensure_query_for_metric(self)
  |   File "/usr/local/lib/python3.8/dist-packages/soda_core/execution/partition.py", line 58, in ensure_query_for_metric
  |     sql_aggregation_expression = metric.get_sql_aggregation_expression()
  |   File "/usr/local/lib/python3.8/dist-packages/soda_core/execution/numeric_query_metric.py", line 149, in get_sql_aggregation_expression
  |     values_expression = self.column.column_name
  | AttributeError: 'NoneType' object has no attribute 'column_name'

Improved CLI output - Duplicates check

This check passed, as expected, but I don’t understand why the Sample information appears in the output.

checks for sodatest_customers_b7580920:
 - duplicates(id) = 0
soda scan -d local_postgres_sodasql -c ~/.soda/configuration.yaml -ch checks.yaml
Soda Core 0.0.1
Sample 2022-03-02 15:22:36.421468/local_postgres_sodasql/sodatest_customers_b7580920/local_postgres_sodasql.sodatest_customers_b7580920.id.duplicates:
id frequency
-- ---------
Scan summary:
1/1 check PASSED: 
  sodatest_customers_b7580920 in local_postgres_sodasql
   duplicates(id) = 0 [PASSED]
All is good. No failures. No warnings. No errors.

[cloud] Fix freshness value to be an int milliseconds

Freshness check sends the value as str(time delta) this should be milliseconds

{
      "identity": "check-test-workshop_ds-ORDERS-daily-created_at-Freshness-bfdcb2cd",
      "name": "freshness using created_at < 2h",
      "type": "metricThreshold",
      "definition": "checks for ORDERS [daily]:\n  freshness using created_at < 2h",
      "location": {
        "filePath": "checks/02_advanced_and_cool.yml",
        "line": 34,
        "col": 5
      },
      "dataSource": "workshop_ds",
      "table": "ORDERS",
      "column": "created_at",
      "metrics": [
        "metric-test-workshop_ds-ORDERS-daily-created_at-max"
      ],
      "outcome": "fail",
      "diagnostics": {
        "value": "27 days, 2:29:20.044015",
        "maxColumnTimestamp": "2022-02-24 16:25:52",
        "maxColumnTimestampUtc": "2022-02-24 16:25:52+00:00",
        "nowVariableName": "NOW",
        "nowTimestamp": "2022-03-23T18:55:12.044015",
        "nowTimestampUtc": "2022-03-23 18:55:12.044015+00:00",
        "freshness": "27 days, 2:29:20.044015"
      }
    }

For each improve logging

For each, when used as designed, works perfectly:

for each table T:
 tables:
  - sodatest_customers_b7580920
  - sodatest_orders_f7532be6
 checks:
  - count > 0
soda scan -d local_postgres_sodasql -c ~/.soda/configuration.yaml -ch checks.yaml
Soda Core 0.0.1
Scan summary:
2/2 checks PASSED: 
  sodatest_customers_b7580920 in local_postgres_sodasql
   count > 0 [PASSED]
  sodatest_orders_f7532be6 in local_postgres_sodasql
   count > 0 [PASSED]
All is good. No failures. No warnings. No errors.

But, because I’m a dummy, I was initially using the For Each check to check columns in a single table instead of tables/dataset in data source. Even though it CLEARLY says tables in the checks file. Duh. But the CLI output showed me all tests passed everything was fine, even though there was no way it could’ve found non-existent table names like id and size. Could we add something so that for each errors out if the input are non-table names? For me and the other stoopids of the world?

for each table T:
 tables:
  - id
  - size
 checks:
  - count > 0
scan -d local_postgres_sodasql -c ~/.soda/configuration.yaml -ch checks.yaml
Soda Core 0.0.1
Scan summary:
All is good. No failures. No warnings. No errors.

Improve error message

Invalid check "columns types": no viable alternative at input 'columnstypes'
  +-> checks.yml line=4,col=5 "checks for actor"."[1]"(columns types)
Couldn't detect table check type of 'columns types'
  +-> checks.yml line=4,col=5 "checks for actor"."[1]"(columns types)
INFO   | Reading SodaCL file "environments.yml"
INFO   | Reading SodaCL file "checks.yml"
ERROR  | Invalid check "columns types": no viable alternative at input 'columnstypes' checks.yml line=4,col=5 "checks for actor"."[1]"(columns types)
ERROR  | Couldn't detect table check type of 'columns types' checks.yml line=4,col=5 "checks for actor"."[1]"(columns types)
  • In the error "columnstypes" without spaces shows up
  • The error is caught in 2 places ?

Switch to regex library

Switch from the built-in re regex library to another external library for regex resolving.

compatible API, enhanced capability and standard unicode support

~/.soda/configuration.yml not automatically used

Using soda-core-postgres, 3.0.0b4.
Python 3.9.2

I have created a configuration.yml file in my ~/.soda directory. If I try to use the following scan command that ought to implicitly find the file, it produces an error.

soda scan -d aws_postgres_retail checks.yml
Soda Core 3.0.0b4
No configuration file specified nor found on ~/.soda/configuration.yml
Unknown error while executing scan.
  | dictionary changed size during iteration
  | Stacktrace:
  | Traceback (most recent call last):
  |   File "/Users/Janet/.pyenv/versions/3.9.2/lib/python3.9/site-packages/soda/scan.py", line 267, in execute
  |     for data_source_scan_cfg in self._sodacl_cfg.data_source_scan_cfgs.values():
  | RuntimeError: dictionary changed size during iteration

But if I explicitly identify the configuration.yml and path, it works just fine.

soda scan -d aws_postgres_retail -c ~/.soda/configuration.yml checks.yml
Soda Core 3.0.0b4
Scan summary:
1/1 check PASSED: 
    orders in aws_postgres_retail
      row_count > 0 [PASSED]
All is good. No failures. No warnings. No errors.

wrong indentation in configuration.yml results in query errors

environments.yml

warehouse dvdrental:
  type: postgres
  connection:
    host: localhost
    port: '5432'
    username: sodasql
    password:
    database: dvdrental
    schema: public

checks.yml

warehouse: dvdrental
checks for actor:
  - count > 0

Scan command output

➜ soda scan -ef environments.yml -cf actor-checks.yml
Query execution error in dvdrental.actor.aggregation[0]: relation "actor" does not exist
LINE 3: FROM actor
             ^

SELECT
  COUNT(*)
FROM actor
  | relation "actor" does not exist
  | LINE 3: FROM actor
  |              ^
Metrics count were not computed for check count > 0
INFO   | Reading SodaCL file "environments.yml"
INFO   | Reading SodaCL file "actor-checks.yml"
INFO   | Query dvdrental.actor.aggregation[0]:
SELECT
  COUNT(*)
FROM actor
ERROR  | Query execution error in dvdrental.actor.aggregation[0]: relation "actor" does not exist
LINE 3: FROM actor
             ^

SELECT
  COUNT(*)
FROM actor: relation "actor" does not exist
LINE 3: FROM actor
             ^

ERROR  | Metrics count were not computed for check count > 0
INFO   | Scan results:
DEBUG  |   1 queries ERROR
DEBUG  |     dvdrental.actor.aggregation[0] | ERROR | 0:00:00.003656
SELECT
  COUNT(*)
FROM actor
DEBUG  |     relation "actor" does not exist
DEBUG  |     LINE 3: FROM actor
DEBUG  |                  ^
INFO   | 1 checks not evaluated.  See logs above for details.
INFO   | Apart from the checks that have not been evaluated, no other checks failed or warned.

Valid format dates backwards

These valid formats backwards in soda/sodacl/format_cfg.py. date eu should be DD/MM/YY and date us should be MM/DD/YY.

            "date eu": rf"^{s}{month}[-\./]{day}[-\./]{year}{s}$",
            "date us": rf"^{s}{day}[-\./]{month}[-\./]{year}{s}$",

Improved CLI output - Invalid & missing

A couple of unexpected things with this check. I copy+pasted this out of the docs, changing only the column name and adding the extra check for count > 0.

checks for sodatest_customers_b7580920:
 - count > 0
 - invalid_percent(country) < 1%:
   missing values: [N/A, No value, null]

And the output confused me.

  • First, because it advised that Only strings or only numbers allowed in missing values: [‘N/A’, ‘No value’, None]
  • and secondly, because it executed the check anyway and found that it passed!

Of course, it has no correct missing values to check against, but I would expect that if the missing values errors out, the check that uses the missing values would also error out, or at least not pass.

soda scan -d local_postgres_sodasql -c ~/.soda/configuration.yaml -ch checks.yaml
Soda Core 0.0.1
Only strings or only numbers allowed in missing values: ['N/A', 'No value', None]
 +-> line=4,col=7 in checks.yaml
Counting invalid without valid specification does not make sense
Scan summary:
2/2 checks PASSED: 
  sodatest_customers_b7580920 in local_postgres_sodasql
   count > 0 [PASSED]
   invalid_percent(country) < 1% [PASSED]
1 errors.
Oops! 1 error. 0 failures. 0 warnings. 2 pass.
ERRORS:
Only strings or only numbers allowed in missing values: ['N/A', 'No value', None]
 +-> line=4,col=7 in checks.yaml

This one worked, though, without the square brackets. Though now it occurs to me that the square brackets are perhaps not needed in the check, but were in the docs to communicate "these are your options for the value to put here".

checks for sodatest_customers_b7580920:
 - count > 0
 - invalid_percent(country) < 1%:
   missing values: N/A
soda scan -d local_postgres_sodasql -c ~/.soda/configuration.yaml -ch checks.yaml
Soda Core 0.0.1
Counting invalid without valid specification does not make sense
Scan summary:
2/2 checks PASSED: 
  sodatest_customers_b7580920 in local_postgres_sodasql
   count > 0 [PASSED]
   invalid_percent(country) < 1% [PASSED]
All is good. No failures. No warnings. No errors.

And this did not work as expected. Though I was hoping the error message might tell me what I think I have already deduced, that you can only use a string for missing values, not a list.

checks for sodatest_customers_b7580920:
 - count > 0
 - invalid_percent(country) < 1%:
   missing values: N/A, None
soda scan -d local_postgres_sodasql -c ~/.soda/configuration.yaml -ch checks.yaml
Soda Core 0.0.1
Counting invalid without valid specification does not make sense
Scan summary:
2/2 checks PASSED: 
  sodatest_customers_b7580920 in local_postgres_sodasql
   count > 0 [PASSED]
   invalid_percent(country) < 1% [PASSED]
All is good. No failures. No warnings. No errors.

Change over time checks fail with `AttributeError: 'NoneType' object has no attribute 'get'`

When running the following checks with a valid Soda Cloud connection I get an error:

checks for CUSTOMERS:

  # Detailed schema expectations control
  - schema:
      fail:
        when required column missing: ['id', 'size', 'distance']
        when forbidden column present: ['email_address', 'pii_%']
        when wrong column type:
          id: varchar
          distance: integer
        when wrong column index:
          id: 0

      # And/or simple notifications on any schema change:
      warn:
        when schema changes: any

The error returned is:

Unknown error while executing scan.
  | 'NoneType' object has no attribute 'get'
  | Stacktrace:
  | Traceback (most recent call last):
  |   File "/home/dirk/work/soda/code/meuk/data-council-workshop/.venv/lib/python3.8/site-packages/soda/scan.py", line 339, in execute
  |     historic_value = historic_data.get(historic_descriptor)
  | AttributeError: 'NoneType' object has no attribute 'get'

Log loading of files

Log when configuration files and SodaCL files are being loaded when -v is present. (using logs.debug)

This might already the case for SodaCL files: to be double checked.
But I suspect it may not be done for configuration files.

Unexpected error with warn and statistical, freshness checks

Trying to use a warn in a check that uses a statistical metric and it SKIPPED the check.

checks for sodatest_customers_b7580920:
  warn:
    when min(size) >= 0
Soda Core 0.0.1
Skipping section "checks for sodatest_customers_b7580920" because content is not a list
  +-> line=1,col=1 in checks.yaml
Scan summary:
1 errors.
Oops! 1 error. 0 failures. 0 warnings. 0 pass.
ERRORS:
Skipping section "checks for sodatest_customers_b7580920" because content is not a list
  +-> line=1,col=1 in checks.yaml

I tried this one as well:

checks for sodatest_customers_b7580920:
  warn:
    - when stddev(size) between 3 and 4
Soda Core 0.0.1
Skipping section "checks for sodatest_customers_b7580920" because content is not a list
  +-> line=1,col=1 in checks.yaml
Scan summary:
1 errors.
Oops! 1 error. 0 failures. 0 warnings. 0 pass.
ERRORS:
Skipping section "checks for sodatest_customers_b7580920" because content is not a list
  +-> line=1,col=1 in checks.yaml

Same error appears when I attempt to use warn with a freshness check:

checks for sodatest_customers_b7580920:
  warn:
    when freshness using last_name < 1h

Improved message output for Reference checks

I get that the output displays a sample of the rows that failed the test, right? If so, could the label read "Failed samples"?

Additionally, why does the output list all columns? Could it be limited to just the columns I specified in the check?? (probably not, but I'm asking anyway!)

If there are tables with, like, 50 columns or something, what does the output look like?

checks for sodatest_orders_f7532be6:
  - reference from (customer_country, customer_zip) to sodatest_customers_b7580920 (country, zip)
soda scan -d local_postgres_sodasql -c ~/.soda/configuration.yaml -ch checks.yaml
Soda Core 0.0.1
Sample 2022-03-03 11:09:05.365801/local_postgres_sodasql/local_postgres_sodasql.reference[customer_country,customer_zip]:
id   customer_id_nok customer_id_ok customer_country customer_zip text   
---- --------------- -------------- ---------------- ------------ -------
'O3' 'ID1'           'ID2'          'BE'             '2000'       'three'
'O4'                 'ID1'          'BE'                          'four' 
'O5' 'ID98'          'ID4'                           '2360'       'five' 
'O6' 'ID99'          'ID1'          'UK'             '2360'       'six'  
                     'ID3'                                        'seven'
Scan summary:
1/1 check FAILED: 
    sodatest_orders_f7532be6 in local_postgres_sodasql
      reference from (customer_country, customer_zip) to sodatest_customers_b7580920 (country, zip) [FAILED]
Oops! 1 failures. 0 warnings. 0 errors. 0 pass.

Exclude tables

for each table T:
  tables:
    - fact_%
    - exclude fact_sales_quota
  checks:
    - count > 0

Expected - all fact_* except fact_sales_quota should be included.
Actual - all fact_* are included, exclude is not taken into account

Check `for each` doesn't support `schema.table_name`

The for each table T: check works for <table_name> but not <schema.table_name> unlike, and as expected, the checks for check which supports <schema.table_name>

Example Checks.yml

# Not working...
for each table T:
  tables:
    - android.application_installed
  checks:
    - row_count > 0

# Working...
checks for android.application_installed:
  - row_count > 0

Result - for each table T:

$ soda scan -d raw -c configuration.yml -V checks/checks.yml
Soda Core 3.0.0b4
Reading configuration file "configuration.yml"
Reading SodaCL file "checks/checks.yml"
Cross data source table sets not yet supported. android.application_installed refers to non default data source  android. Default data source is raw
Scan execution starts
Query raw.for_each_table_T[0]:
SELECT table_name 
FROM information_schema.tables
Query raw.APPLICATION_INSTALLED.aggregation[0]:

Result - checks for:

$ soda scan -d raw -c configuration.yml -V checks/checks.yml
Soda Core 3.0.0b4
Reading configuration file "configuration.yml"
Reading SodaCL file "checks/checks.yml"
Scan execution starts
Query raw.android.application_installed.aggregation[0]:
SELECT 
  COUNT(*) 
FROM android.application_installed

Invalid checks file doesn't result in error

➜ soda scan -ef environments.yml -cf actor-checks.yml
Couldn't add SodaCL file actor-checks.yml
  | 'warehouse'
INFO   | Reading SodaCL file "environments.yml"
INFO   | Reading SodaCL file "actor-checks.yml"
ERROR  | Couldn't add SodaCL file actor-checks.yml: 'warehouse'
INFO   | Scan results:
INFO   | All is good. No checks failed or warned.

Histograms

Port histogram metrics as to be used in profiling.

Row count comparison partitions

See docs/soda_checks_yaml.md. L69 find the TODO

Consider if we should push it to the user to define the right variables and avoid clashes between the variable names when comparing?

So will we map the variables of 1 table to the other as the user makes the reference in the check? Or do we force the user to submit the right variables in the scan as input.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.