Comments (7)
So what we want is:
on_schema_change |
In case of new columns... | In case of removed columns... |
---|---|---|
classic |
columns ignored | run failed |
ignore |
columns ignored | columns ignored |
fail |
run failed | run failed |
append_new_columns |
columns added | columns ignored |
sync_all_columns |
columns added | columns removed |
- A new
classic
mode, wired as the currentignore
mode - Fixing
ignore
to actually ignore removed columns
I'm good for it!
from dbt-bigquery.
Hello @Fleid,
We're really excited about this feature! 😄
Could you please let us know if development is currently underway for it? Has the DBT community reached a decision on this issue and approved updates to the behavior? And if the work hasn't started yet, would it be possible for me to get involved and contribute to its development? 🚀
from dbt-bigquery.
I am seeing this same behavior in Redshift.
For further reference, if you set on_schema_change='append_new_columns'
, the materialization works just fine even if there are columns in the target table that do not exist in the source.
from dbt-bigquery.
I can reproduce this. After spending some time looking at the code, I feel like the actual descriptions of the options are:
on_schema_change |
In case of new columns... | In case of removed columns... |
---|---|---|
ignore |
columns ignored | run failed |
fail |
run failed | run failed |
append_new_columns |
columns added | columns ignored |
sync_all_columns |
columns added | columns removed |
@jtcohen6 it may just be that ignore
is mislabeled and should really be original
? Looks like everybody took for granted what the original behavior was, but now that the doc's not there anymore, it's hard to say looking at it from the future ;) Do you remember the time?
Anyway, in incremental.sql
, we try process_schema_changes
to get the target schema, and if we get nothing we pull the existing one (with the removed rows).
https://github.com/dbt-labs/dbt-core/blob/e8da84fb9e177d9eee3d7722d3d6906bb283183d/core/dbt/include/global_project/macros/materializations/models/incremental/incremental.sql#L45-L49
In on_schema_change.sql
, we abandon ship if the option is ignore
:
So yes, ignore
will result in a DML statement being built with dest_columns
listing the removed rows.
Speaking of DML statement, looking at incremental merge (through 3 layers of dispatch):
https://github.com/dbt-labs/dbt-core/blob/f841a7ca76b44429eb94174bc5d0c2fecbf2763b/core/dbt/include/global_project/macros/materializations/models/incremental/merge.sql#L40-L44
We're looping on update_columns
which comes from:
So really just dest_columns
. Note, I don't think we have documented merge_exclude_columns
, only merge_update_columns
.
And back to the merge, for inserted columns:
Which can only fail, since it tries to pull columns from the source that have been removed.
I'd love a sanity check here, but my thinking is that this bug is actually a documentation clarification (ignore
really is original
or classic
) plus an enhancement (a new mode that actually does ignore
/ignore
on that table at the top)
from dbt-bigquery.
@Fleid nice sleuth work
Although I didn't actually sanity check this for you, I did create the following issue for the documentation clarification:
We can always yank that issue if it turns out to not be needed.
How are you thinking about accomplishing the full switch from ignore
→ original
or classic
? One PR or multiple? How would we communicate the change?
from dbt-bigquery.
Can we make this into a feature request then? It doesn't make a lot of sense, semantically, that append_new_columns
will ignore extra columns in the db, but ignore
will fail. I think the use case mentioned in the ticket is kind of a reasonable thing to support (i.e. dbt being able to allow another system to manage db schemas).
from dbt-bigquery.
@awong-evoiq I just switched this issue from a bug to a feature request. Could you take a look at the following subject line and see if you want to adjust it accordingly?
BigQuery Incremental -
on_schema_change
: ignore cannot ignore missing columns
@Fleid would you be willing to re-assess this through the lens of a feature request?
from dbt-bigquery.
Related Issues (20)
- BigQuery tags do not work HOT 1
- [Bug] State modified does not pick up changes to policy tags
- [Bug] dbt grant doesn't work for clone
- [Feature] Improvement in data processed/cost incurred in insert_overwrite method HOT 4
- [Release Improvements] Refresh workflows for the `pyproject.toml` migration
- [Bug] persist_docs not working for seeds HOT 2
- Import relevant pytest(s) for cross-database `cast` macro
- [Bug] `docs generate` appears to be returning no table metadata when run with the `--no-compile` option HOT 1
- [Tech Debt] `test_dbt_debug` is not dropping its test schema
- [Bug] `docs generate` does not find all schemas when there are more than 10K schemas
- Cross-database `date` macro
- [Bug] incremental run with __dbt_tmp table does not log the real bytes_billed in run_results.json HOT 1
- [Bug] Requested entity not found when writing a Python model to BigQuery
- [Feature] support copy multiple tables in parallel using copy_partitions
- [Feature] Implement batch metadata freshness using `INFORMATION_SCHEMA.TABLE_STORAGE` instead of `client.get_table` HOT 1
- BigQuery authorized dataset HOT 1
- [Bug] --empty flag not working on Pseudo-columns HOT 2
- [Feature] Support overriding `grant_access_to`
- [Bug] many timeouts with `priority: interactive` config HOT 7
- [Bug] Incremental models don't update new elements in struct column
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dbt-bigquery.