Is this a new bug in dbt-core? <li class="task-l

So what we want is: <code class="notranslate

Hello <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-ur

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

[CT-1153] [Feature] BigQuery Incremental - `on_schema_change: ignore` cannot ignore missing columns about dbt-bigquery HOT 7 OPEN

awong-evoiq commented on June 9, 2024 1

[CT-1153] [Feature] BigQuery Incremental - `on_schema_change: ignore` cannot ignore missing columns

from dbt-bigquery.

Comments (7)

Fleid commented on June 9, 2024 3

So what we want is:

`on_schema_change`	In case of new columns...	In case of removed columns...
`classic`	columns ignored	run failed
`ignore`	columns ignored	columns ignored
`fail`	run failed	run failed
`append_new_columns`	columns added	columns ignored
`sync_all_columns`	columns added	columns removed

A new classic mode, wired as the current ignore mode
Fixing ignore to actually ignore removed columns

I'm good for it!

from dbt-bigquery.

sudo-pradip commented on June 9, 2024 1

Hello @Fleid,

We're really excited about this feature! 😄

Could you please let us know if development is currently underway for it? Has the DBT community reached a decision on this issue and approved updates to the behavior? And if the work hasn't started yet, would it be possible for me to get involved and contribute to its development? 🚀

from dbt-bigquery.

mbarugelCA commented on June 9, 2024

I am seeing this same behavior in Redshift.

For further reference, if you set on_schema_change='append_new_columns', the materialization works just fine even if there are columns in the target table that do not exist in the source.

from dbt-bigquery.

Fleid commented on June 9, 2024

I can reproduce this. After spending some time looking at the code, I feel like the actual descriptions of the options are:

`on_schema_change`	In case of new columns...	In case of removed columns...
`ignore`	columns ignored	run failed
`fail`	run failed	run failed
`append_new_columns`	columns added	columns ignored
`sync_all_columns`	columns added	columns removed

@jtcohen6 it may just be that ignore is mislabeled and should really be original? Looks like everybody took for granted what the original behavior was, but now that the doc's not there anymore, it's hard to say looking at it from the future ;) Do you remember the time?

Anyway, in incremental.sql, we try process_schema_changes to get the target schema, and if we get nothing we pull the existing one (with the removed rows).
https://github.com/dbt-labs/dbt-core/blob/e8da84fb9e177d9eee3d7722d3d6906bb283183d/core/dbt/include/global_project/macros/materializations/models/incremental/incremental.sql#L45-L49

In on_schema_change.sql, we abandon ship if the option is ignore:

https://github.com/dbt-labs/dbt-core/blob/7efb6ab62dae3ea9da4e6e28e9a30aa4e0e74919/core/dbt/include/global_project/macros/materializations/models/incremental/on_schema_change.sql#L102-L106

So yes, ignore will result in a DML statement being built with dest_columns listing the removed rows.

Speaking of DML statement, looking at incremental merge (through 3 layers of dispatch):
https://github.com/dbt-labs/dbt-core/blob/f841a7ca76b44429eb94174bc5d0c2fecbf2763b/core/dbt/include/global_project/macros/materializations/models/incremental/merge.sql#L40-L44

We're looping on update_columns which comes from:

https://github.com/dbt-labs/dbt-core/blob/f841a7ca76b44429eb94174bc5d0c2fecbf2763b/core/dbt/include/global_project/macros/materializations/models/incremental/merge.sql#L11-L12

So really just dest_columns. Note, I don't think we have documented merge_exclude_columns, only merge_update_columns.

And back to the merge, for inserted columns:

https://github.com/dbt-labs/dbt-core/blob/f841a7ca76b44429eb94174bc5d0c2fecbf2763b/core/dbt/include/global_project/macros/materializations/models/incremental/merge.sql#L47-L50

Which can only fail, since it tries to pull columns from the source that have been removed.

I'd love a sanity check here, but my thinking is that this bug is actually a documentation clarification (ignore really is original or classic) plus an enhancement (a new mode that actually does ignore/ignore on that table at the top)

from dbt-bigquery.

dbeatty10 commented on June 9, 2024

@Fleid nice sleuth work

Although I didn't actually sanity check this for you, I did create the following issue for the documentation clarification:

dbt-labs/docs.getdbt.com#2902

We can always yank that issue if it turns out to not be needed.

How are you thinking about accomplishing the full switch from ignore → original or classic? One PR or multiple? How would we communicate the change?

from dbt-bigquery.

awong-evoiq commented on June 9, 2024

Can we make this into a feature request then? It doesn't make a lot of sense, semantically, that append_new_columns will ignore extra columns in the db, but ignore will fail. I think the use case mentioned in the ticket is kind of a reasonable thing to support (i.e. dbt being able to allow another system to manage db schemas).

from dbt-bigquery.

dbeatty10 commented on June 9, 2024

@awong-evoiq I just switched this issue from a bug to a feature request. Could you take a look at the following subject line and see if you want to adjust it accordingly?

BigQuery Incremental - on_schema_change: ignore cannot ignore missing columns

@Fleid would you be willing to re-assess this through the lens of a feature request?

from dbt-bigquery.

[CT-1153] [Feature] BigQuery Incremental - `on_schema_change: ignore` cannot ignore missing columns about dbt-bigquery HOT 7 OPEN

Comments (7)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent