cloudera / dbt-hive
The dbt-hive adapter allows you to use dbt with Apache Hive and Cloudera Data Platform.
License: Apache License 2.0
The latest version of dbt Core, dbt-core==1.5.0rc1, was published on April 13, 2023 (PyPI | Github).
dbt-labs/dbt-core#7213 is an open discussion with more detailed information. If you have questions, please put them there!
The above linked guide has more information, but below is a high-level checklist of work that would enable a successful 1.5.0 release of your adapter.
1.6.0
FYI, dbt-core==1.6.0 is expected to be released at the end of July, with a release cut at least two weeks prior.
Is it possible to set Hive parameters like set hive.tez.container.size=8192 for all connections or per model?
I added the statement in a dbt macro like this:
{% macro hive_settings() %}
{% set sql %}
set hive.tez.container.size=8192
{% endset %}
{% do run_query(sql) %}
{% do log("Hive settings applied", info=True) %}
{% endmacro %}
But running it with dbt run-operation hive_settings ends in:
Running with dbt=1.3.2
Hive settings applied
Encountered an error while running operation: Runtime Error
Runtime Error
Unable to establish connection to Hive server: Error while compiling statement: FAILED: ParseException line4:8 cannot recognize input near 'set' 'hive' '.' in statement
Additionally, it would be nice to set them via profiles.yml, similar to the server_side_parameters option in the dbt-spark adapter; see https://docs.getdbt.com/reference/warehouse-setups/spark-setup#odbc.
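For reference, this is roughly what the dbt-spark equivalent looks like in profiles.yml (a sketch based on the linked docs; the profile name and connection fields are placeholders, and dbt-hive does not currently offer a matching option, which is what this request is about):

```yaml
my_profile:
  target: dev
  outputs:
    dev:
      type: spark
      method: odbc
      host: spark.example.com        # placeholder host
      schema: analytics
      server_side_parameters:
        "hive.tez.container.size": "8192"
```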
Hi,
I am trying to connect to the database specified in profiles.yml, but it seems that dbt-hive connects to the default database instead of using the schema property.
Is there a way to connect to a specific database using dbt-hive?
Do you have plans to support multi-threading options in profiles.yml?
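For context, threads is already a standard top-level field in any dbt profile, so the question is really whether dbt-hive executes models concurrently when it is set. A sketch, with placeholder host and schema values:

```yaml
my_profile:
  target: dev
  outputs:
    dev:
      type: hive
      host: hiveserver.example.com   # placeholder
      port: 10000
      schema: analytics
      threads: 4                     # standard dbt setting: max concurrent models
```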
The latest release cut for 1.3.0, dbt-core==1.3.0rc2, was published on October 3, 2022 (PyPI | Github). We are targeting releasing the official cut of 1.3.0 in time for the week of October 16 (in time for the Coalesce conference).
We're trying to establish the following precedent w.r.t. minor versions:
Partner adapter maintainers release their adapter's minor version within four weeks of the initial RC being released. Given the delay on our side in notifying you, we'd like to set a target date of November 7 (four weeks from today) for maintainers to release their minor version.
Timeframe | Date (intended) | Date (actual) | Event
---|---|---|---
D - 3 weeks | Sep 21 | Oct 10 | dbt Labs informs maintainers of upcoming minor release
D - 2 weeks | Sep 28 | Sep 28 | core 1.3 RC is released
Day D | Oct 12 | Oct 12 | core 1.3 official is published
D + 2 weeks | Oct 26 | Nov 7 | dbt-adapter 1.3 is published
dbt-labs/dbt-core#6011 is an open discussion with more detailed information, and dbt-labs/dbt-core#6040 is for keeping track of the community's progress on releasing 1.2.0
Below is a checklist of work that would enable a successful 1.2.0 release of your adapter.
Hi, attempting to pip install dbt-hive on Windows 10, I am unable to install the kerberos dependency.
The initial error: error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools"
With the latest Build Tools installed, a more specific error appears:
fatal error C1083: Cannot open include file: 'gssapi/gssapi.h': No such file or directory
error: command 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Professional\\VC\\Tools\\MSVC\\14.34.31933\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for kerberos
Running setup.py clean for kerberos
Failed to build kerberos
Hello!
I'm trying to install the package directly from the GitHub repository in order to get the most up-to-date work (mainly this commit) with the following command:
pip install git+https://github.com/cloudera/dbt-hive.git
Unfortunately I'm met with the error:
error: can't copy 'dbt/adapters/hive/.env': doesn't exist or not a regular file
It seems to be related to the fact that setup.py contains data_files=[('', ['dbt/adapters/hive/.env'])], while this file doesn't exist at the specified location in the master branch.
Would it be possible to add an empty file at this location to fix this?
Thank you for your support!
Versions:
dbt-core = 1.3.3
dbt-hive = 1.3.1
python = anaconda3/python3.9
I set the config and used is_incremental() in the SQL, but it does not take effect. On the second execution, Hive reports that the table already exists. I checked the compiled SQL and found it is converted to a 'create table as' statement rather than an insert.
test_1.sql:
{{
config(
materialized='incremental'
)
}}
select * from {{ ref('test') }}
{% if is_incremental() %}
where event_time = (select max(event_time) from {{ this }} )
{% endif %}
error: Hive server: Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.parse.SemanticException: Table already exists: testdb.test_1
Hello Team,
Does dbt-hive support the on_schema_change feature released in dbt version 0.21.0? A customer would like to use this config.
NEW on_schema_change CONFIG IN DBT VERSION v0.21.0
Incremental models can now be configured to include an optional on_schema_change parameter to enable additional control when incremental model columns change. These options enable dbt to continue running incremental models in the presence of schema changes, resulting in fewer --full-refresh scenarios and saving query costs.
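For reference, on_schema_change is set per model alongside the incremental materialization; the documented values are ignore (the default), fail, append_new_columns, and sync_all_columns. A sketch of the config block (the ref name is a placeholder, and whether dbt-hive honors the setting is exactly the open question here):

```sql
{{
    config(
        materialized='incremental',
        on_schema_change='append_new_columns'
    )
}}

select * from {{ ref('my_source_model') }}
```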
When using dbt seed, strings may be double quoted. There is no currently known workaround. A fix is being worked on. Internal ticket: https://jira.cloudera.com/projects/DBT/issues/DBT-225
While experimenting with this adapter, I'm trying to find a way to pass configuration properties to Hive.
For example, I would like to be able to run:
set hive.auto.convert.join=true;
set hive.stats.fetch.partition.stats=true;
set hive.vectorized.execution.enabled=true;
SELECT * FROM mytable
Apparently with Impyla we need to pass these properties in a configuration dictionary when executing a query.
Is it possible to attach such a dictionary in a model's config?
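One thing worth trying with stock dbt features is a pre_hook, which runs arbitrary statements on the model's connection before the model itself. This is a sketch, not a confirmed workaround: as the macro-based attempt earlier in this thread shows, Hive may reject SET statements sent through this path.

```sql
{{
    config(
        materialized='table',
        pre_hook=[
            "set hive.auto.convert.join=true",
            "set hive.stats.fetch.partition.stats=true",
            "set hive.vectorized.execution.enabled=true"
        ]
    )
}}

select * from mytable
```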
Hi,
Thanks for developing this dbt adapter for Hive. I have been running into some issues while trying to create external partitioned tables with specific file formats, for instance using the following configuration:
{{
config(
materialized="incremental",
incremental_strategy= "insert_overwrite",
external=True,
file_format= "orc",
partition_by=["some column"],
)
}}
Looks like this issue happens because of where the file format clause is added when the table is being created (dbt-hive/dbt/include/hive/macros/adapters.sql, lines 125 to 137 in f293ba9). This is not consistent with the Hive CREATE TABLE definition, which expects STORED AS to come after the partitioning clause and before the location.
So I guess it should be something like the following:
{% else %}
create {% if is_external == true -%}external{%- endif %} table {{ relation }}
{% endif %}
{{ options_clause() }}
{{ partition_cols(label="partitioned by") }}
{{ clustered_cols(label="clustered by") }}
{{ file_format_clause() }}
{{ location_clause() }}
{{ comment_clause() }}
{{ properties_clause(_properties) }}
as
{{ sql }}
Best Regards,
Zuma
The latest version of dbt Core, dbt-core==1.4.0, was published on January 25, 2023 (PyPI | Github). In fact, a patch, dbt-core==1.4.1 (PyPI | Github), was also released on the same day.
dbt-labs/dbt-core#6624 is an open discussion with more detailed information. If you have questions, please put them there! dbt-labs/dbt-core#6849 is for keeping track of the community's progress on releasing 1.4.0
The above linked guide has more information, but below is a high-level checklist of work that would enable a successful 1.4.0 release of your adapter.
FYI, dbt-core==1.5.0 is expected to be released at the end of April. Please plan on allocating more effort to upgrade support compared to previous minor versions. Expect to hear more in the middle of April.
At a high level, expect much greater adapter test coverage (a very good thing!), and some likely heavy renaming and restructuring as the API-ification of dbt-core is now well underway. See https://github.com/dbt-labs/dbt-core/milestone/82 for more information.
To reproduce:
10:43:15 2 of 2 ERROR creating incremental model dbtdemo_mart_covid.covid_cases ......... [ERROR in 33.75s]
10:43:15
10:43:15 Finished running 1 view model, 1 incremental model in 121.74s.
10:43:15
10:43:15 Completed with 1 error and 0 warnings:
10:43:15
10:43:15 Runtime Error in model covid_cases (models/marts/covid/covid_cases.sql)
10:43:15 Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 4:60 Table not found 'covid_cases__dbt_tmp'
10:43:15
10:43:15 Done. PASS=1 WARN=0 ERROR=1 SKIP=0 TOTAL=2