Comments (7)
@abuckenheimer You're totally right.
This is tougher than it should be today because a test doesn't actually have a good link between itself and the model it's defined on. It's going to be even trickier after the introduction of the where
config in v0.20.0 (#3392). If this config is set, instead of templating out {{ model }}
as <your dataset>.foo
, dbt will template it out as (select * from <your dataset>.foo where col = 1) foo
.
At the time, I was thinking that it was important for the subquery to be aliased the same as the model identifier, in case the generic test definition depended on using the model identifier as a column alias. But it wouldn't really be a generic test if it did, now would it?
Here are the options I can think of:
- Not do this. In general, I don't think it's a good practice for a BigQuery column to share the name of its view/table, for exactly this reason—it causes unpleasant surprises when querying.
- Change the
{{ model }}
templating to always alias the selected-from object/subquery asmodel
, whether or not thewhere
config is supplied. In theunique
,not_null
, andaccepted_values
tests, qualify columns with themodel
alias. - Add another subquery + alias, to wrap around the subquery + alias associated with the optional
where
config:
{% macro default__test_unique(model, column_name) %}
select
model.{{ column_name }},
count(*) as n_records
from (select * from {{ model }}) model
where model.{{ column_name }} is not null
group by model.{{ column_name }}
having count(*) > 1
{% endmacro %}
Option 3 is something you can do right now, in your own project, to get this working in the meantime. But I'm leaning toward option 2, which would require adjusting this (gnarly) code:
In order to always alias, and always use the same alias (just the word model
):
def build_model_str(self):
targ = self.target
cfg_where = "config.get('where')"
alias = "model"
if isinstance(self.target, UnparsedNodeUpdate):
identifier = self.target.name
target_str = f"{{{{ ref('{targ.name}') }}}}"
elif isinstance(self.target, UnpatchedSourceDefinition):
target_str = f"{{{{ source('{targ.source.name}', '{targ.table.name}') }}}}"
unfiltered = f"{target_str} {alias}"
filtered = f"(select * from {target_str} where {{{{{cfg_where}}}}}) {alias}"
return f"{{% if {cfg_where} %}}{filtered}{{% else %}}{unfiltered}{{% endif %}}"
From there, it's as simple as adding model.
to qualify the columns referenced in generic test definitions.
I don't love using model
as an alias—it's an overloaded term, and the thing being tested could just as easily be a source, snapshot, or seed—but for better or for worse, generic test definitions have been written with {{ model }}
in their bodies. I struggle to come up with a better alias name that isn't overly clunky (thing_being_tested
).
What do you think? If you're on board with the proposal, we may be able to sneak it in for v0.20.0rc2, since it's a relevant tweak to code that's changing in v0.20.
from dbt-bigquery.
Yes, I think you're right, we should pick an alias that's unlikely to clash. It would be a shame to fix this issue (name + column clash) and create another!
There's another option, which now feels so obvious that I'm embarrassed I didn't think of it earlier. We could rewrite generic tests to use an "import" CTE, similar to our style guide:
with dbt_test__target as (
select * from {{ model }}
)
select
{{ column_name }},
count(*) as n_records
from dbt_test__target
where {{ column_name }} is not null
group by {{ column_name }}
having count(*) > 1
I don't think we'd need to qualify the column name at all, assuming there's no change the column is also named dbt_test__target
.
from dbt-bigquery.
Resolved by #10
from dbt-bigquery.
I think your onto something with the model
alias but I actually think instead of picking a natural name you should pick an unnatural one, model
is just as likely to run into the disambiguation problem I ran into (presumably multiple people have columns with model
as the name), but since its injected by dbt the reasoning behind it may be a bit more obscure. If you use something like __dbt_model
as the name your less likely to have a collision and consequently have everything work naturally without having to rely on the user specifying <alias>.<column>
. Also think __dbt_model
may better hint to the user that this is an implementation detail of dbt and not something they'd normally have to worry about.
from dbt-bigquery.
ah actually I may be conflating your approaches 2 and 3. I guess what your saying for 2 is that users should always qualify model columns with the model.<column>
syntax which may let dbt better understand the test expression in the process. While 3 is that dbt tests macros should always alias to something (which is more relevant for the __dbt_model
comment above)
from dbt-bigquery.
love it
from dbt-bigquery.
@abuckenheimer Any interest in a contribution? :)
from dbt-bigquery.
Related Issues (20)
- the partition expiry is set to 7 days by default for temp table
- [ADAP-1082] [Feature] Migrate base adapter references to match dbt-core
- [ADAP-1090] [Feature] Adding Pyspark job options to the model config
- [Regression] `docs generate` raises error when querying external tables
- [Regression] Resolve broken proto build in CI HOT 2
- [Bug] jobs timing out early regardles of job_execution_timeout_seconds HOT 9
- [Feature] Support Python Stored procedure as a way to submit python models HOT 1
- [Feature] Submit bigframe syntax in python models HOT 1
- [Bug] For BigQuery, a multiline string needs to be escaped with 3 quotes instead of 1 like in `default__string_literal()`
- [Unit Testing] Support primative types + objects and arrays in dbt-bigquery
- [Feature] Change BigQuery `labels` to be additive instead of "clobber" HOT 5
- [Bug] Enforced contracts break recursive CTE rules HOT 3
- [CT-3486] [Feature] Include JSON-formatted query comments for all queries within incremental models HOT 6
- “Syntax error: Unexpected keyword DECLARE” on valid query string HOT 1
- Set_sql_header is incompatible with {{ this }}
- [Bug] Model run throws - Operation did not complete within designated timeout of 900 seconds HOT 2
- [Bug] Error when loading NULL result from JSON column HOT 1
- [CT-1153] [Feature] BigQuery Incremental - `on_schema_change: ignore` cannot ignore missing columns HOT 7
- Python models are running out of CPUs in CI HOT 1
- [Feature] Spike on supporting Py3.12
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dbt-bigquery.