cloudera / dbt-hive
The dbt-hive adapter allows you to use dbt with Apache Hive and Cloudera Data Platform.
License: Apache License 2.0
The latest version of dbt Core, dbt-core==1.5.0rc1, was published on April 13, 2023 (PyPI | Github).
dbt-labs/dbt-core#7213 is an open discussion with more detailed information. If you have questions, please put them there!
The above linked guide has more information, but below is a high-level checklist of work that would enable a successful 1.5.0 release of your adapter.
1.6.0
FYI, dbt-core==1.6.0 is expected to be released at the end of July, with a release cut at least two weeks prior.
Is it possible to set Hive parameters like set hive.tez.container.size=8192 for all connections or per model?
I added the statement in a dbt macro like this:
{% macro hive_settings() %}
{% set sql %}
set hive.tez.container.size=8192
{% endset %}
{% do run_query(sql) %}
{% do log("Hive settings applied", info=True) %}
{% endmacro %}
But running it with dbt run-operation hive_settings ends in:
Running with dbt=1.3.2
Hive settings applied
Encountered an error while running operation: Runtime Error
Runtime Error
Unable to establish connection to Hive server: Error while compiling statement: FAILED: ParseException line4:8 cannot recognize input near 'set' 'hive' '.' in statement
Additionally, it would be nice to set them via profiles.yml, similar to the server_side_parameters option in the dbt-spark adapter; see https://docs.getdbt.com/reference/warehouse-setups/spark-setup#odbc.
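For reference, this is roughly what the dbt-spark equivalent looks like in profiles.yml (a sketch based on the linked docs; the profile name and connection fields are placeholders, and dbt-hive does not currently offer a matching option, which is what this request is about):

```yaml
my_profile:
  target: dev
  outputs:
    dev:
      type: spark
      method: odbc
      host: spark.example.com        # placeholder host
      schema: analytics
      server_side_parameters:
        "hive.tez.container.size": "8192"
```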
Hi,
I am trying to connect to the database specified in profiles.yml, but it seems that dbt-hive connects to the default database instead of using the schema property.
Is there a way to connect to a specific database using dbt-hive?
Do you have plans to support multi-threading options in profiles.yml?
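For context, threads is already a standard top-level field in any dbt profile, so the question is really whether dbt-hive executes models concurrently when it is set. A sketch, with placeholder host and schema values:

```yaml
my_profile:
  target: dev
  outputs:
    dev:
      type: hive
      host: hiveserver.example.com   # placeholder
      port: 10000
      schema: analytics
      threads: 4                     # standard dbt setting: max concurrent models
```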
The latest release cut for 1.3.0, dbt-core==1.3.0rc2, was published on October 3, 2022 (PyPI | Github). We are targeting releasing the official cut of 1.3.0 in time for the week of October 16 (in time for the Coalesce conference).
We're trying to establish the following precedent w.r.t. minor versions:
Partner adapter maintainers release their adapter's minor version within four weeks of the initial RC being released. Given the delay on our side in notifying you, we'd like to set a target date of November 7 (four weeks from today) for maintainers to release their minor version.
Timeframe | Date (intended) | Date (actual) | Event
---|---|---|---
D - 3 weeks | Sep 21 | Oct 10 | dbt Labs informs maintainers of upcoming minor release
D - 2 weeks | Sep 28 | Sep 28 | core 1.3 RC is released
Day D | Oct 12 | Oct 12 | core 1.3 official is published
D + 2 weeks | Oct 26 | Nov 7 | dbt-adapter 1.3 is published
dbt-labs/dbt-core#6011 is an open discussion with more detailed information, and dbt-labs/dbt-core#6040 is for keeping track of the community's progress on releasing 1.2.0
Below is a checklist of work that would enable a successful 1.2.0 release of your adapter.
Hi, attempting to pip install dbt-hive on Windows 10, I am unable to install the kerberos dependency.
The initial error: error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools"
With the latest Build Tools installed, a more specific error appears:
fatal error C1083: Cannot open include file: 'gssapi/gssapi.h': No such file or directory
error: command 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Professional\\VC\\Tools\\MSVC\\14.34.31933\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for kerberos
Running setup.py clean for kerberos
Failed to build kerberos
Hello!
I'm trying to install the package directly from the GitHub repository in order to get the most up-to-date work (mainly this commit) with the following command:
pip install git+https://github.com/cloudera/dbt-hive.git
Unfortunately I'm met with the error:
error: can't copy 'dbt/adapters/hive/.env': doesn't exist or not a regular file
It seems to be related to the fact that setup.py contains data_files=[('', ['dbt/adapters/hive/.env'])], while this file doesn't exist at the specified location in the master branch.
Would it be possible to add an empty file at this location to fix this?
Thank you for your support!
Versions:
dbt-core = 1.3.3
dbt-hive = 1.3.1
python = anaconda3/python3.9
I set the config and used is_incremental() in the SQL, but it does not take effect. On the second execution, Hive reports that the table already exists. I checked the compiled SQL and found it is converted to a 'create table as' statement rather than an insert.
test_1.sql:
{{
config(
materialized='incremental'
)
}}
select * from {{ ref('test') }}
{% if is_incremental() %}
where event_time = (select max(event_time) from {{ this }} )
{% endif %}
error: Hive server: Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.parse.SemanticException: Table already exists: testdb.test_1
Hello Team,
Does dbt-hive support the on_schema_change feature released in dbt version 0.21.0? A customer would like to use this config.
NEW on_schema_change CONFIG IN DBT VERSION v0.21.0
Incremental models can now be configured to include an optional on_schema_change parameter to enable additional control when incremental model columns change. These options enable dbt to continue running incremental models in the presence of schema changes, resulting in fewer --full-refresh scenarios and saving query costs.
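For reference, on_schema_change is set per model alongside the incremental materialization; the documented values are ignore (the default), fail, append_new_columns, and sync_all_columns. A sketch of the config block (the ref name is a placeholder, and whether dbt-hive honors the setting is exactly the open question here):

```sql
{{
    config(
        materialized='incremental',
        on_schema_change='append_new_columns'
    )
}}

select * from {{ ref('my_source_model') }}
```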
When using dbt seed, strings may be double quoted. There is no currently known workaround. A fix is being worked on. Internal ticket: https://jira.cloudera.com/projects/DBT/issues/DBT-225
While experimenting with this adapter, I'm trying to find a way to pass configuration properties to Hive.
For example, I would like to be able to run:
set hive.auto.convert.join=true;
set hive.stats.fetch.partition.stats=true;
set hive.vectorized.execution.enabled=true;
SELECT * FROM mytable
Apparently with Impyla we need to pass these properties in a configuration dictionary when executing a query.
Is it possible to attach such a dictionary in a model's config?
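One thing worth trying with stock dbt features is a pre_hook, which runs arbitrary statements on the model's connection before the model itself. This is a sketch, not a confirmed workaround: as the macro-based attempt earlier in this thread shows, Hive may reject SET statements sent through this path.

```sql
{{
    config(
        materialized='table',
        pre_hook=[
            "set hive.auto.convert.join=true",
            "set hive.stats.fetch.partition.stats=true",
            "set hive.vectorized.execution.enabled=true"
        ]
    )
}}

select * from mytable
```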
Hi,
Thanks for developing this dbt adapter for Hive. I have been running into some issues while trying to create external partitioned tables with specific file formats, for instance using the following configuration:
{{
config(
materialized="incremental",
incremental_strategy= "insert_overwrite",
external=True,
file_format= "orc",
partition_by=["some column"],
)
}}
Looks like this issue happens because of where the file format clause is added when the table is being created (dbt-hive/dbt/include/hive/macros/adapters.sql, lines 125 to 137 in f293ba9). This is not consistent with the Hive CREATE TABLE definition, which expects STORED AS to come after the partitioning clause and before the location.
So I guess it should be something like the following:
{% else %}
create {% if is_external == true -%}external{%- endif %} table {{ relation }}
{% endif %}
{{ options_clause() }}
{{ partition_cols(label="partitioned by") }}
{{ clustered_cols(label="clustered by") }}
{{ file_format_clause() }}
{{ location_clause() }}
{{ comment_clause() }}
{{ properties_clause(_properties) }}
as
{{ sql }}
Best Regards,
Zuma
The latest version of dbt Core, dbt-core==1.4.0, was published on January 25, 2023 (PyPI | Github). In fact, a patch, dbt-core==1.4.1 (PyPI | Github), was also released on the same day.
dbt-labs/dbt-core#6624 is an open discussion with more detailed information. If you have questions, please put them there! dbt-labs/dbt-core#6849 is for keeping track of the community's progress on releasing 1.4.0
The above linked guide has more information, but below is a high-level checklist of work that would enable a successful 1.4.0 release of your adapter.
FYI, dbt-core==1.5.0 is expected to be released at the end of April. Please plan on allocating more effort to upgrade support compared to previous minor versions. Expect to hear more in the middle of April.
At a high level, expect much greater adapter test coverage (a very good thing!), and some likely heavy renaming and restructuring as the API-ification of dbt-core is now well underway. See https://github.com/dbt-labs/dbt-core/milestone/82 for more information.
To reproduce:
10:43:15 2 of 2 ERROR creating incremental model dbtdemo_mart_covid.covid_cases ......... [ERROR in 33.75s]
10:43:15
10:43:15 Finished running 1 view model, 1 incremental model in 121.74s.
10:43:15
10:43:15 Completed with 1 error and 0 warnings:
10:43:15
10:43:15 Runtime Error in model covid_cases (models/marts/covid/covid_cases.sql)
10:43:15 Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 4:60 Table not found 'covid_cases__dbt_tmp'
10:43:15
10:43:15 Done. PASS=1 WARN=0 ERROR=1 SKIP=0 TOTAL=2