Comments (5)
SAS-3274
from soda-core.
The problem with the current identity generation is that dataset-level checks will often need an identity_suffix. We need to improve the default behavior so that users have to fiddle with the identity in fewer circumstances.
from soda-core.
Goal: Create a better correlation mechanism between checks in the files and checks in Soda Cloud. We want to impose the least possible burden on users and avoid the need for them to configure correlation ids themselves. But we also want to ensure that users can edit contract files freely and move checks around while still preserving the identity. Not preserving the identity causes a new check to be created on Soda Cloud, losing the metric history and historic check results.
Proposed solution 1:
The check identity is composed of
- scope
  - warehouse_name
  - schema_name
  - dataset_name
  - column_name
- check type
- check type specific correlation properties
  - expression_sql hash for metric_expression checks
  - query_sql hash for metric_query checks
- user defined
  - correlation_id
Users then only have to specify a correlation_id if the scope and check type specific correlation properties do not uniquely identify a check in the source YAML file.
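To make solution 1 concrete, here is a minimal sketch of how the identity parts could be collected and how ambiguous checks could be detected. It is not the soda-core implementation: the dict layout of a check, the helper names (sql_hash, identity_parts, checks_needing_correlation_id) and the truncated SHA-256 hash are all assumptions made for illustration.

```python
from collections import defaultdict
import hashlib


def sql_hash(sql: str) -> str:
    # Hash the SQL text so the identity does not embed the full expression.
    # The 8-character truncation is an arbitrary choice for this sketch.
    return hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()[:8]


def identity_parts(check: dict) -> tuple:
    """Collect the solution 1 correlation properties of one check (hypothetical layout)."""
    parts = [
        check.get("warehouse_name"),
        check.get("schema_name"),
        check.get("dataset_name"),
        check.get("column_name"),  # None for dataset-level checks
        check["type"],
    ]
    # Check type specific correlation properties.
    if check["type"] == "metric_expression":
        parts.append(sql_hash(check["expression_sql"]))
    elif check["type"] == "metric_query":
        parts.append(sql_hash(check["query_sql"]))
    # User defined part, only needed to break ties.
    parts.append(check.get("correlation_id"))
    return tuple(parts)


def checks_needing_correlation_id(checks: list[dict]) -> list[dict]:
    """Return the checks whose identity parts collide with those of another check."""
    groups = defaultdict(list)
    for check in checks:
        groups[identity_parts(check)].append(check)
    return [c for group in groups.values() if len(group) > 1 for c in group]
```

With this shape, two checks of the same type on the same dataset collide until one of them gets a correlation_id, while two metric_expression checks with different SQL are already distinct.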
Proposed solution 2:
Check identity is a composition of
- warehouse_name
- schema_name
- dataset_name
- column_name
- check type
- check name
We can use the check name to provide uniqueness. In that case, however, users have to know that changing the name may change the check id and hence break the history in Soda Cloud.
from soda-core.
In the HelloFresh use case, the name is treated as a unique ID, and the user must fill it in.
I think solution 2 is the best option. We can explain in our docs that the name is also linked to Soda Cloud and that changing the name loses the history. I think users will expect that, because the name is a unique ID for us.
from soda-core.
More analysis notes:
Solution 2 would also work best on the Soda Cloud backend.
Based on the correlation properties, the contracts lib should create an identity property in the generated SodaCL using a hash (and not the full serialized property text).
We can offer a check renaming workflow through a name_deprecated property. That should translate to an identity_deprecated in SodaCL. Soda Cloud backend work should be planned to support the identity migration. The changes needed in the Soda Cloud backend should not be too big, as the data model matches this strategy.
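As a rough illustration of the hash-based identity and the renaming workflow, the sketch below derives identity and identity_deprecated from the solution 2 properties. The function names, the dict layout, the JSON serialization and the 16-character SHA-256 truncation are assumptions for this example, not the actual contracts library behavior.

```python
import hashlib
import json


def identity_hash(scope: dict, check_type: str, name: str) -> str:
    # Serialize the correlation properties in a fixed order, then hash them,
    # so the identity is a short hash rather than the full serialized text.
    parts = [
        scope.get("warehouse_name"),
        scope.get("schema_name"),
        scope.get("dataset_name"),
        scope.get("column_name"),  # None for dataset-level checks
        check_type,
        name,
    ]
    text = json.dumps(parts)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def sodacl_identity_properties(scope: dict, check: dict) -> dict:
    """Build the identity properties to emit for one check (hypothetical shape)."""
    props = {"identity": identity_hash(scope, check["type"], check["name"])}
    # A renamed check carries its previous name in name_deprecated, which
    # yields the old identity so the backend can migrate the history.
    if check.get("name_deprecated"):
        props["identity_deprecated"] = identity_hash(
            scope, check["type"], check["name_deprecated"]
        )
    return props
```

In this sketch, the identity_deprecated of a renamed check equals the identity it had under its old name, which is the link Soda Cloud would need to carry the history over.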
from soda-core.