
dbt-hive's People

Contributors

bachng2017, himanshuajmera, hpasumarthi, raghotham, sanjeevgitprofile, sujitkp-blr, tovganesh, zczhuohuo

dbt-hive's Issues

upgrade to support dbt-core v1.5.0

Background

The latest version of dbt Core, dbt-core==1.5.0rc1, was published on April 13, 2023 (PyPI | Github).

How to upgrade

dbt-labs/dbt-core#7213 is an open discussion with more detailed information. If you have questions, please put them there!

The above linked guide has more information, but below is a high-level checklist of work that would enable a successful 1.5.0 release of your adapter.

  • Add support for Python 3.11 (if you haven't already)
  • Add support for relevant tests (there are a lot of new ones!)
  • Add support for model contracts
  • Add support for materialized views (this likely will be bumped to 1.6.0)

the next minor release: 1.6.0

FYI, dbt-core==1.6.0 is expected to be released at the end of July, with a release cut at least two weeks prior.

hive parameters not settable

Is it possible to set hive parameters like set hive.tez.container.size=8192 for all connections or per model?

I added the statement in a dbt macro like this:


{% macro hive_settings() %}

{% set sql %}
    set hive.tez.container.size=8192
{% endset %}

{% do run_query(sql) %}

{% do log("Hive settings applied", info=True) %}
{% endmacro %}

But running it with dbt run-operation hive_settings ends in:

Running with dbt=1.3.2
Hive settings applied
Encountered an error while running operation: Runtime Error
Runtime Error
Unable to establish connection to Hive server: Error while compiling statement: FAILED: ParseException line4:8 cannot recognize input near 'set' 'hive' '.' in statement

Additionally, it would be nice to be able to set them via profiles.yml, similar to the server_side_parameters in the dbt-spark adapter; see https://docs.getdbt.com/reference/warehouse-setups/spark-setup#odbc.
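
For a per-model workaround, the standard dbt mechanism would be a pre_hook, as in the minimal sketch below. This is only a sketch of the usual dbt approach, not a confirmed fix: given the ParseException above, the adapter may reject SET statements sent through hooks just as it does through run_query.

{# Sketch only: pre_hook is standard dbt; whether dbt-hive accepts SET statements here is unconfirmed. #}
{{
  config(
    materialized='table',
    pre_hook=["set hive.tez.container.size=8192"]
  )
}}

select 1 as id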

dbt-hive can only connect to default database

Hi,

I am trying to connect to the database specified in profiles.yml, but it seems dbt-hive connects to the default database rather than using the schema property.

Is there a way to connect to a specific database using dbt-hive?
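
In stock dbt, the database a model builds into is driven by the schema config (from profiles.yml, dbt_project.yml, or the model itself); whether dbt-hive honors it is exactly what this issue is asking. A minimal sketch of the per-model form, with a placeholder database name:

{# Sketch only: 'my_database' is a placeholder. Note that by default dbt's generate_schema_name #}
{# appends a custom schema to the profile schema, so the result may be <profile_schema>_my_database. #}
{{ config(schema='my_database') }}

select 1 as id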

upgrade to support dbt-core v1.3.0

Background

The latest release cut for 1.3.0, dbt-core==1.3.0rc2, was published on October 3, 2022 (PyPI | Github). We are targeting releasing the official cut of 1.3.0 in time for the week of October 16 (in time for the Coalesce conference).

We're trying to establish the following precedent w.r.t. minor versions:
Partner adapter maintainers release their adapter's minor version within four weeks of the initial RC being released. Given the delay on our side in notifying you, we'd like to set a target date of November 7 (four weeks from today) for maintainers to release their minor version.

Timeframe   | Date (intended) | Date (actual) | Event
D - 3 weeks | Sep 21          | Oct 10        | dbt Labs informs maintainers of upcoming minor release
D - 2 weeks | Sep 28          | Sep 28        | core 1.3 RC is released
Day D       | October 12      | Oct 12        | core 1.3 official is published
D + 2 weeks | October 26      | Nov 7         | dbt-adapter 1.3 is published

How to upgrade

dbt-labs/dbt-core#6011 is an open discussion with more detailed information, and dbt-labs/dbt-core#6040 is for keeping track of the community's progress on releasing 1.3.0.

Below is a checklist of work that would enable a successful 1.3.0 release of your adapter.

  • Python Models (if applicable)
  • Incremental Materialization: cleanup and standardization
  • More functional adapter tests to inherit

Unable to install on Windows

Hi, I am attempting to pip install dbt-hive on Windows 10.

I am unable to install the kerberos dependency.

Initial error: error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools":

With the latest Build Tools installed, a more specific error appears:

fatal error C1083: Cannot open include file: 'gssapi/gssapi.h': No such file or directory
      error: command 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Professional\\VC\\Tools\\MSVC\\14.34.31933\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for kerberos
  Running setup.py clean for kerberos
Failed to build kerberos

Issue when installing from master branch

Hello!

I'm trying to install the package directly from the GitHub repository in order to get the most up-to-date work (mainly this commit) with the following command:

pip install git+https://github.com/cloudera/dbt-hive.git

Unfortunately I'm met with the error:

error: can't copy 'dbt/adapters/hive/.env': doesn't exist or not a regular file

It seems to be related to the fact that in setup.py there is data_files=[('', ['dbt/adapters/hive/.env'])],
while this file doesn't exist at the specified location in the master branch.

Would it be possible to add an empty file at this location to fix this?

Thank you for your support!

Setting incremental does not take effect

Versions:
dbt-core = 1.3.3
dbt-hive = 1.3.1
python = anaconda3 / python 3.9

I set the config and is_incremental() is in the SQL, but it does not take effect. On the second execution, it reports that the table already exists in Hive. I checked the compiled SQL and found that it is converted to 'create table as' rather than an insert statement.

test_1.sql:
{{
config(
materialized='incremental'
)
}}

select * from {{ ref('test') }}

{% if is_incremental() %}

where event_time = (select max(event_time) from {{ this }} )

{% endif %}

error: Hive server: Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.parse.SemanticException: Table already exists: testdb.test_1
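
For comparison, a minimal sketch of the same model with the incremental strategy made explicit is below; it assumes dbt-hive supports an append strategy (check the adapter docs for your version), and whether that avoids the 'create table as' behaviour reported here depends on the adapter.

{# Sketch only: assumes incremental_strategy='append' is available in dbt-hive 1.3.x. #}
{{
  config(
    materialized='incremental',
    incremental_strategy='append'
  )
}}

select * from {{ ref('test') }}

{% if is_incremental() %}
  where event_time > (select max(event_time) from {{ this }})
{% endif %}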

Support config on_schema_change for incremental models

Hello Team,
Does dbt-hive support the on_schema_change feature released in dbt version 0.21.0?
A customer would like to use this config.

NEW on_schema_change CONFIG IN DBT VERSION v0.21.0
Incremental models can now be configured to include an optional on_schema_change parameter to enable additional control when incremental model columns change. These options enable dbt to continue running incremental models in the presence of schema changes, resulting in fewer --full-refresh scenarios and saving query costs.
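
In dbt-core this is a per-model config on incremental models, so the open question is whether dbt-hive's incremental materialization honors it. A minimal sketch of the standard usage, with a placeholder ref:

{# Sketch only: on_schema_change values in dbt-core are ignore, fail, append_new_columns, sync_all_columns. #}
{# Whether dbt-hive acts on this config is the question raised above; 'upstream_model' is a placeholder.    #}
{{
  config(
    materialized='incremental',
    on_schema_change='sync_all_columns'
  )
}}

select * from {{ ref('upstream_model') }}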

Passing Hive configuration properties

While experimenting with this adapter, I'm trying to find a way to pass configuration properties to Hive.
For example, I would like to be able to run:

set hive.auto.convert.join=true;
set hive.stats.fetch.partition.stats=true;
set hive.vectorized.execution.enabled=true;

SELECT * FROM mytable

Apparently with Impyla we need to pass these properties in a configuration dictionary when executing a query.

Is it possible to attach such a dictionary in a model's config?
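
To make the ask concrete, a hypothetical model-level attachment might look like the sketch below; note that a hive_conf config like this does not exist in dbt-hive today (it is precisely what this issue is requesting).

{# Hypothetical only: 'hive_conf' is not a real dbt-hive config; it just illustrates the request. #}
{{
  config(
    materialized='table',
    hive_conf={
      'hive.auto.convert.join': 'true',
      'hive.vectorized.execution.enabled': 'true'
    }
  )
}}

select * from mytable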

Cannot create external tables with partitions and non-default file format

Hi,

Thanks for developing this dbt adapter for Hive,

I have been running into some issues while trying to create external partitioned tables with specific file formats, for instance using the following configuration:

{{
  config(
    materialized="incremental",
    incremental_strategy= "insert_overwrite",
    external=True,
    file_format= "orc",
    partition_by=["some column"],
  ) 
}}

Looks like this issue is happening because of where the file format clause is added when the table is being created:

{% else %}
create {% if is_external == true -%}external{%- endif %} table {{ relation }}
{% endif %}
{{ file_format_clause() }}
{{ options_clause() }}
{{ partition_cols(label="partitioned by") }}
{{ clustered_cols(label="clustered by") }}
{{ location_clause() }}
{{ comment_clause() }}
{{ properties_clause(_properties) }}
as
{{ sql }}
{%- endif %}

This is not consistent with the Hive CREATE TABLE definition, which expects STORED AS to come after partitioning/clustering and before LOCATION.
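
For reference, the clause order in the Hive CREATE TABLE grammar is roughly the following (a skeleton of the documented syntax with optional clauses elided, not a runnable statement):

CREATE [EXTERNAL] TABLE table_name
  [(col_name data_type, ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type, ...)]
  [CLUSTERED BY (col_name, ...) INTO num_buckets BUCKETS]
  [ROW FORMAT row_format]
  [STORED AS file_format]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  [AS select_statement]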

So I guess it should be something like the following:

    {% else %}
      create {% if is_external == true -%}external{%- endif %} table {{ relation }}
    {% endif %}
    {{ options_clause() }}
    {{ partition_cols(label="partitioned by") }}
    {{ clustered_cols(label="clustered by") }}
    {{ file_format_clause() }}
    {{ location_clause() }}
    {{ comment_clause() }}
    {{ properties_clause(_properties) }}
    as
      {{ sql }}

Best Regards,
Zuma

upgrade to support dbt-core v1.4.0

Background

The latest version of dbt Core, dbt-core==1.4.0, was published on January 25, 2023 (PyPI | Github). In fact, a patch, dbt-core==1.4.1 (PyPI | Github), was also released on the same day.

How to upgrade

dbt-labs/dbt-core#6624 is an open discussion with more detailed information. If you have questions, please put them there! dbt-labs/dbt-core#6849 is for keeping track of the community's progress on releasing 1.4.0

The above linked guide has more information, but below is a high-level checklist of work that would enable a successful 1.4.0 release of your adapter.

  • support Python 3.11 (only if your adapter's dependencies allow)
  • Consolidate timestamp functions & macros
  • Replace deprecated exception functions
  • Add support for more tests

the next minor release: 1.5.0

FYI, dbt-core==1.5.0 is expected to be released at the end of April. Please plan on allocating more effort to upgrading support compared to previous minor versions. Expect to hear more in the middle of April.

At a high level, expect much greater adapter test coverage (a very good thing!), and likely some heavy renaming and restructuring as the API-ification of dbt-core is now well underway. See https://github.com/dbt-labs/dbt-core/milestone/82 for more information.

Incremental update seem to fail

To reproduce:

  1. Use https://github.com/cloudera/dbt-impala-example
  2. Do the usual first run (dbt run)
  3. Run dbt run again; you should see the following error:

10:43:15 2 of 2 ERROR creating incremental model dbtdemo_mart_covid.covid_cases ......... [ERROR in 33.75s]
10:43:15
10:43:15 Finished running 1 view model, 1 incremental model in 121.74s.
10:43:15
10:43:15 Completed with 1 error and 0 warnings:
10:43:15
10:43:15 Runtime Error in model covid_cases (models/marts/covid/covid_cases.sql)
10:43:15 Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 4:60 Table not found 'covid_cases__dbt_tmp'
10:43:15
10:43:15 Done. PASS=1 WARN=0 ERROR=1 SKIP=0 TOTAL=2
