quintoandar / hive-metastore-client

A client for connecting to and running DDLs on the Hive Metastore.

License: Apache License 2.0

Makefile 4.28% Python 42.12% Thrift 53.60%
hive hive-metastore hive-metastore-client etl python data-engineering package metastore ddls

hive-metastore-client's Issues

Add TABLEPROPERTIES to CREATE TABLE

Thanks for this amazing client!
Is there a way to automatically scan and sync partitions between the file system and the metastore, something like the MSCK REPAIR TABLE command in Hive?
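There is no MSCK-style repair visible in the examples, but a manual sync can be sketched. A minimal sketch, assuming the partition values have already been discovered from the file system out of band and that PartitionBuilder's sd argument is optional (as it appears to be in the other builders); host, table, and values below are placeholders:

from hive_metastore_client import HiveMetastoreClient
from hive_metastore_client.builders import PartitionBuilder

HIVE_HOST = "xxx"  # placeholder
HIVE_PORT = 9083

# Partition values discovered out of band, e.g. by listing
# s3://bucket/table/year=.../month=.../day=... prefixes.
discovered_values = [["2022", "05", "17"], ["2022", "05", "18"]]

partition_list = [
    PartitionBuilder(values=values, db_name="default", table_name="my_table").build()
    for values in discovered_values
]

with HiveMetastoreClient(HIVE_HOST, HIVE_PORT) as client:
    client.add_partitions_if_not_exists("default", "my_table", partition_list)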

ColumnBuilder has no attribute 'write'

I could use a hint as to what might throw this error out of the Thrift client.
I'm using thrift version 0.13.0 with this.

I'm mostly following the examples for building a table.

Traceback (most recent call last):
  File "./load_metastore.py", line 126, in <module>
    mc.create_table(my_table)
  File "/home/rotten/.virtualenvs/load_metastore/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ThriftHiveMetastore.py", line 2632, in create_table
    self.send_create_table(tbl)
  File "/home/rotten/.virtualenvs/load_metastore/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ThriftHiveMetastore.py", line 2639, in send_create_table
    args.write(self._oprot)
  File "/home/rotten/.virtualenvs/load_metastore/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ThriftHiveMetastore.py", line 20777, in write
    self.tbl.write(oprot)
  File "/home/rotten/.virtualenvs/load_metastore/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ttypes.py", line 5253, in write
    self.sd.write(oprot)
  File "/home/rotten/.virtualenvs/load_metastore/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ttypes.py", line 4897, in write
    iter170.write(oprot)
AttributeError: 'ColumnBuilder' object has no attribute 'write'
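This error usually means a builder object was serialized where a Thrift struct was expected. A likely fix, sketched on the assumption that a ColumnBuilder was placed in the columns list without calling .build() (which returns the underlying Thrift struct that actually has a write method):

from hive_metastore_client.builders import ColumnBuilder

# Hypothesis: the builder itself ends up inside the storage descriptor,
# and Thrift then tries to call .write() on it during serialization.
columns = [ColumnBuilder("id", "string")]  # missing .build()

# Fix: call .build() so the list holds the real Thrift structs.
columns = [ColumnBuilder("id", "string").build()]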

alter and drop tables

Feature related:

Sorry to bother you. I need to alter and drop tables, but there aren't any examples in your code base for how to do that, and it isn't obvious. For now, when I need to alter a table, I drop out to the Presto command line, drop it there, then jump back into this client to build the updated version. It is rather inelegant.

If this is already supported and you could point me to a better approach, I'd love to learn more.

Thank you!
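A possible approach, sketched under the assumption that HiveMetastoreClient inherits the generated Thrift client's methods, so the raw metastore calls drop_table, get_table, and alter_table should be reachable even without dedicated examples (host and table names are placeholders):

from hive_metastore_client import HiveMetastoreClient

with HiveMetastoreClient("hms-host", 9083) as client:
    # Drop: deleteData=True also removes the underlying files
    # for managed tables.
    client.drop_table("default", "my_table", True)

    # Alter: fetch the current Table struct, tweak it, send it back.
    table = client.get_table("default", "other_table")
    table.parameters["comment"] = "updated via thrift"  # assumes parameters is populated
    client.alter_table("default", "other_table", table)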

thrift_hive_metastore_client.ttypes.MetaException: MetaException(message='java.lang.NullPointerException')




Describe the bug

I use the Python lib, and calling the create_table function raises java.lang.NullPointerException:

Traceback (most recent call last):
  File "gittb.py", line 53, in <module>
    hive_metastore_client.create_table(table)
  File "/home/ec2-user/lfyang/spark-ui/jupyter/yes/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ThriftHiveMetastore.py", line 2633, in create_table
    self.recv_create_table()
  File "/home/ec2-user/lfyang/spark-ui/jupyter/yes/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ThriftHiveMetastore.py", line 2659, in recv_create_table
    raise result.o3
thrift_files.libraries.thrift_hive_metastore_client.ttypes.MetaException: MetaException(message='java.lang.NullPointerException')

To Reproduce

Steps to reproduce the behavior:

  1. pip install hive-metastore-client
  2. Code file createtable.py:

from hive_metastore_client import HiveMetastoreClient
from hive_metastore_client.builders import (
    ColumnBuilder,
    SerDeInfoBuilder,
    StorageDescriptorBuilder,
    TableBuilder,
)

HIVE_HOST = "xxx"
HIVE_PORT = 9083

columns = [
    ColumnBuilder("id", "string", "col comment").build(),
    ColumnBuilder("client_name", "string").build(),
    ColumnBuilder("amount", "string").build(),
    ColumnBuilder("year", "string").build(),
    ColumnBuilder("month", "string").build(),
    ColumnBuilder("day", "string").build(),
]

partition_keys = [
    ColumnBuilder("year", "string").build(),
    ColumnBuilder("month", "string").build(),
    ColumnBuilder("day", "string").build(),
]

serde_info = SerDeInfoBuilder(
    serialization_lib="org.apache.hadoop.hive.ql.io.orc.OrcSerde"
).build()

storage_descriptor = StorageDescriptorBuilder(
    columns=columns,
    location="s3a://mys3bucket/xx",
    # Note: this passes the output format class as input_format; the ORC
    # input format is org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.
    input_format="org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",
    output_format="org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",
    serde_info=serde_info,
).build()

table = TableBuilder(
    table_name="test_tmp_table",
    db_name="default",
    owner="owner name",
    storage_descriptor=storage_descriptor,
    partition_keys=partition_keys,
).build()

with HiveMetastoreClient(HIVE_HOST, HIVE_PORT) as hive_metastore_client:
    hive_metastore_client.create_table(table)

  3. python createtable.py
  4. See error


Environment

  • Python version: 3.8.5
  • Lib version:
  • Hive Metastore version: 1.2.2
  • Other (e.g. OS):


Method add_partitions doesn't respect storage descriptor of Partition

I want to add a Hive partition with a custom path to the standalone metastore using Python's HiveMetastoreClient. In other words, I want to reproduce the Hive command:

alter table table_name add partition(dt='2022051705') location '2022/05/17/05';

I use the following code, but it creates the partition with the default path 'bucket_name/table_name/dt=2022051704' (creating a new folder) instead of 'bucket_name/table_name/2022/05/17/04', where the files are actually stored:

from hive_metastore_client import HiveMetastoreClient
from hive_metastore_client.builders import (
    StorageDescriptorBuilder,
    SerDeInfoBuilder,
    PartitionBuilder
)

HIVE_HOST = "xx.xx.xx.xx"
HIVE_PORT = 9083
DATABASE_NAME = 'default'
TABLE_NAME = 'table_name'

columns = [columns_list]

serde_info = SerDeInfoBuilder(
    serialization_lib="org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
).build()

partition_storage_descriptor = StorageDescriptorBuilder(
    columns=columns,
    location="/2022/05/17/04",
    input_format="org.apache.hadoop.mapred.TextInputFormat",
    output_format="org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    serde_info=serde_info,
).build()

partition_list = [
    PartitionBuilder(
        values=["2022051704"], db_name=DATABASE_NAME, table_name=TABLE_NAME,
        sd=partition_storage_descriptor
    ).build()
]


with HiveMetastoreClient(HIVE_HOST, HIVE_PORT) as hive_client:
    hive_client.add_partitions_if_not_exists(DATABASE_NAME, TABLE_NAME, partition_list)

An additional question: why is it required to specify the columns list in StorageDescriptorBuilder when the columns were already determined at table creation?
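A possible workaround, sketched under the assumption that HiveMetastoreClient also exposes the generated Thrift client's raw add_partitions call, which sends the Partition structs (including their storage descriptors) to the metastore as-is:

from hive_metastore_client import HiveMetastoreClient

with HiveMetastoreClient(HIVE_HOST, HIVE_PORT) as hive_client:
    # Bypass the convenience wrapper: the inherited Thrift method takes
    # the Partition objects, custom storage descriptor and all.
    hive_client.add_partitions(partition_list)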

Setting table_type = 'EXTERNAL_TABLE' builds a 'MANAGED_TABLE'

Sorry to open one more...

When I create an external table it ends up being created as a managed table.

my_table = TableBuilder(
    table_name='my_table',
    db_name=table['DatabaseName'],
    storage_descriptor=storage_descriptor,
    partition_keys=partition_keys,
    parameters=parameters,
    table_type='EXTERNAL_TABLE',
    owner='root',
).build()

However when I look in the metastore postgresql database for that table:

# select "TBL_TYPE" from "TBLS"  where "TBL_NAME" = 'my_table';
   TBL_TYPE
---------------
 MANAGED_TABLE
(1 row)

FWIW, create table WITH (external_location = xxx) works fine from the Presto client and creates the EXTERNAL_TABLE type in the database.

I'm still looking for a root cause or workaround, but thought I'd log what I've run into while I'm looking. Your examples and tests don't include creating an external table.
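A likely workaround, assuming the metastore decides managed vs. external from the table parameter EXTERNAL=TRUE rather than from table_type alone (older Hive Metastore versions silently downgrade the type when that parameter is missing):

# Hypothetical fix: set the EXTERNAL parameter explicitly
# alongside table_type.
parameters = {"EXTERNAL": "TRUE"}

my_table = TableBuilder(
    table_name='my_table',
    db_name=table['DatabaseName'],
    storage_descriptor=storage_descriptor,
    partition_keys=partition_keys,
    parameters=parameters,
    table_type='EXTERNAL_TABLE',
    owner='root',
).build()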

Kerberos support

Does this client work with the Kerberos authentication I activated in the Hive Metastore Service?

Do not make `storage_descriptor` require all arguments for `TableBuilder`

storage_descriptor should not require all arguments in TableBuilder, because of virtual views, e.g.:

table = TableBuilder(
    table_name="test_view",
    db_name="default",
    owner="test",
    table_type="VIRTUAL_VIEW",
    storage_descriptor=storage_descriptor,
    view_expanded_text="select * from test",
    view_original_text="select * from test"
).build()

When the user wants to create a virtual view, they should be able to pass just the columns to the storage descriptor instead of everything.

Instead of

storage_descriptor = StorageDescriptorBuilder(
    columns=columns,
    location="s3a://path/to/file",
    input_format="org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
    output_format="org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
    serde_info=serde_info,
).build()

the user should be able to pass just

storage_descriptor = StorageDescriptorBuilder(
    columns=columns,
).build()

Program hangs after first method call

Describe the bug

When calling methods in a loop, only the first call succeeds; each subsequent call seems to hang indefinitely, even when the same table is used in succession. This happens with any method call, not just get_partition_keys_objects.

To Reproduce

Steps to reproduce the behavior:

from hive_metastore_client import HiveMetastoreClient

tables = [
    "my_table",
    "my_table",
]

with HiveMetastoreClient("my_url") as hive_client:
    for table in tables:
        print(table)
        print(hive_client.get_partition_keys_objects("default", table))

Expected behavior

Each call should succeed in a timely manner

Environment

  • Python version: 3.7
  • Lib version: 1.0.9
  • Hive Metastore version: 2.3.8
  • Other (e.g. OS): Ubuntu 18.04
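Until the hang is tracked down, a workaround sketch, assuming a fresh connection per call avoids whatever state the first request leaves on the shared Thrift transport:

from hive_metastore_client import HiveMetastoreClient

tables = ["my_table", "my_table"]

# Hypothetical workaround: open a new client (and therefore a new Thrift
# connection) for each call instead of reusing one across the loop.
for table in tables:
    with HiveMetastoreClient("my_url") as hive_client:
        print(hive_client.get_partition_keys_objects("default", table))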

Hive Metastore Client Cataloging for Delta

Hi guys, we here at CVCCorp have a limitation in Hive cataloging regarding Delta data.

This is an example of what the cataloging model for data in Delta should look like:

CREATE EXTERNAL TABLE table_teste(
    tabela STRING,
    data_update STRING,
    count BIGINT)
STORED BY 'io.delta.hive.DeltaStorageHandler'
LOCATION 's3://bucket-name/example/table_teste/';

Our motivation for using Delta is that we use Databricks, and in our benchmarks Delta has better performance.
We also centralized all metadata in a Hive cluster for integration with Databricks.

For any questions, I will be in contact with Lucas on LinkedIn.

Confusing extra 's' in library name

The first thing the examples and documentation tell you to do is:

from hive_metastore_client import HiveMetastoreClient

That does not actually work. This does:

from hive_metastore_client.hive_mestastore_client import HiveMetastoreClient                                                                                                                

Note that there is an extra s in the second invocation: hive_mestastore_client.

That was confusing for a few minutes.
