quintoandar / hive-metastore-client

A client for connecting to and running DDLs on the Hive Metastore.

License: Apache License 2.0

Makefile 4.28% Python 42.12% Thrift 53.60%
hive hive-metastore hive-metastore-client etl python data-engineering package metastore ddls

hive-metastore-client's Issues

Add TABLEPROPERTIES to CREATE TABLE

Thanks for this amazing client!
Is there a way to automatically scan and sync partitions between the file system and the metastore, something like the MSCK REPAIR TABLE command in Hive?
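There is no MSCK-style repair visible in the examples, but a manual sync can be sketched. A minimal sketch, assuming the partition values have already been discovered from the file system out of band and that PartitionBuilder's sd argument is optional (as it appears to be in the other builders); host, table, and values below are placeholders:

from hive_metastore_client import HiveMetastoreClient
from hive_metastore_client.builders import PartitionBuilder

HIVE_HOST = "xxx"  # placeholder
HIVE_PORT = 9083

# Partition values discovered out of band, e.g. by listing
# s3://bucket/table/year=.../month=.../day=... prefixes.
discovered_values = [["2022", "05", "17"], ["2022", "05", "18"]]

partition_list = [
    PartitionBuilder(values=values, db_name="default", table_name="my_table").build()
    for values in discovered_values
]

with HiveMetastoreClient(HIVE_HOST, HIVE_PORT) as client:
    client.add_partitions_if_not_exists("default", "my_table", partition_list)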

ColumnBuilder has no attribute 'write'

I could use a hint as to what might throw this error out of the Thrift client.
I'm using thrift version 0.13.0 with this.

I'm mostly following the examples for building a table.

Traceback (most recent call last):
  File "./load_metastore.py", line 126, in <module>
    mc.create_table(my_table)
  File "/home/rotten/.virtualenvs/load_metastore/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ThriftHiveMetastore.py", line 2632, in create_table
    self.send_create_table(tbl)
  File "/home/rotten/.virtualenvs/load_metastore/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ThriftHiveMetastore.py", line 2639, in send_create_table
    args.write(self._oprot)
  File "/home/rotten/.virtualenvs/load_metastore/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ThriftHiveMetastore.py", line 20777, in write
    self.tbl.write(oprot)
  File "/home/rotten/.virtualenvs/load_metastore/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ttypes.py", line 5253, in write
    self.sd.write(oprot)
  File "/home/rotten/.virtualenvs/load_metastore/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ttypes.py", line 4897, in write
    iter170.write(oprot)
AttributeError: 'ColumnBuilder' object has no attribute 'write'
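This error usually means a builder object was serialized where a Thrift struct was expected. A likely fix, sketched on the assumption that a ColumnBuilder was placed in the columns list without calling .build() (which returns the underlying Thrift struct that actually has a write method):

from hive_metastore_client.builders import ColumnBuilder

# Hypothesis: the builder itself ends up inside the storage descriptor,
# and Thrift then tries to call .write() on it during serialization.
columns = [ColumnBuilder("id", "string")]  # missing .build()

# Fix: call .build() so the list holds the real Thrift structs.
columns = [ColumnBuilder("id", "string").build()]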

alter and drop tables

Feature related:

Sorry to bother you. I need to alter and drop tables, but there aren't any examples in your code base for how to do that, and it isn't obvious. For now, when I need to alter a table, I drop out to the Presto command line, drop it there, then jump back into this client to build the updated version. It is rather inelegant.

If this is already supported and you could point me to a better approach, I'd love to learn more.

Thank you!
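A possible approach, sketched under the assumption that HiveMetastoreClient inherits the generated Thrift client's methods, so the raw metastore calls drop_table, get_table, and alter_table should be reachable even without dedicated examples (host and table names are placeholders):

from hive_metastore_client import HiveMetastoreClient

with HiveMetastoreClient("hms-host", 9083) as client:
    # Drop: deleteData=True also removes the underlying files
    # for managed tables.
    client.drop_table("default", "my_table", True)

    # Alter: fetch the current Table struct, tweak it, send it back.
    table = client.get_table("default", "other_table")
    table.parameters["comment"] = "updated via thrift"  # assumes parameters is populated
    client.alter_table("default", "other_table", table)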

thrift_hive_metastore_client.ttypes.MetaException: MetaException(message='java.lang.NullPointerException')




Describe the bug

I use the Python lib, and calling the create_table function raises java.lang.NullPointerException:

Traceback (most recent call last):
  File "gittb.py", line 53, in <module>
    hive_metastore_client.create_table(table)
  File "/home/ec2-user/lfyang/spark-ui/jupyter/yes/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ThriftHiveMetastore.py", line 2633, in create_table
    self.recv_create_table()
  File "/home/ec2-user/lfyang/spark-ui/jupyter/yes/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ThriftHiveMetastore.py", line 2659, in recv_create_table
    raise result.o3
thrift_files.libraries.thrift_hive_metastore_client.ttypes.MetaException: MetaException(message='java.lang.NullPointerException')

To Reproduce

Steps to reproduce the behavior:

  1. pip install hive-metastore-client
  2. Code file createtable.py:

from hive_metastore_client import HiveMetastoreClient
from hive_metastore_client.builders import (
    ColumnBuilder,
    SerDeInfoBuilder,
    StorageDescriptorBuilder,
    TableBuilder,
)

HIVE_HOST = "xxx"
HIVE_PORT = 9083

columns = [
    ColumnBuilder("id", "string", "col comment").build(),
    ColumnBuilder("client_name", "string").build(),
    ColumnBuilder("amount", "string").build(),
    ColumnBuilder("year", "string").build(),
    ColumnBuilder("month", "string").build(),
    ColumnBuilder("day", "string").build(),
]

partition_keys = [
    ColumnBuilder("year", "string").build(),
    ColumnBuilder("month", "string").build(),
    ColumnBuilder("day", "string").build(),
]

serde_info = SerDeInfoBuilder(
    serialization_lib="org.apache.hadoop.hive.ql.io.orc.OrcSerde"
).build()

storage_descriptor = StorageDescriptorBuilder(
    columns=columns,
    location="s3a://mys3bucket/xx",
    # Note: this passes the output format class as input_format; the ORC
    # input format is org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.
    input_format="org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",
    output_format="org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",
    serde_info=serde_info,
).build()

table = TableBuilder(
    table_name="test_tmp_table",
    db_name="default",
    owner="owner name",
    storage_descriptor=storage_descriptor,
    partition_keys=partition_keys,
).build()

with HiveMetastoreClient(HIVE_HOST, HIVE_PORT) as hive_metastore_client:
    hive_metastore_client.create_table(table)

  3. python createtable.py
  4. See error


Environment

  • Python version: 3.8.5
  • Lib version:
  • Hive Metastore version: 1.2.2
  • Other (e.g. OS):


Method add_partitions doesn't respect storage descriptor of Partition

I want to add a Hive partition with a custom path to the standalone metastore using Python's HiveMetastoreClient. In other words, I want to reproduce the Hive command:

alter table table_name add partition(dt='2022051705') location '2022/05/17/05';

I use the following code, but it creates the partition with the default path 'bucket_name/table_name/dt=2022051704' (creating a new folder) instead of 'bucket_name/table_name/2022/05/17/04', where the files are actually stored:

from hive_metastore_client import HiveMetastoreClient
from hive_metastore_client.builders import (
    StorageDescriptorBuilder,
    SerDeInfoBuilder,
    PartitionBuilder
)

HIVE_HOST = "xx.xx.xx.xx"
HIVE_PORT = 9083
DATABASE_NAME = 'default'
TABLE_NAME = 'table_name'

columns = [columns_list]

serde_info = SerDeInfoBuilder(
    serialization_lib="org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
).build()

partition_storage_descriptor = StorageDescriptorBuilder(
    columns=columns,
    location="/2022/05/17/04",
    input_format="org.apache.hadoop.mapred.TextInputFormat",
    output_format="org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    serde_info=serde_info,
).build()

partition_list = [
    PartitionBuilder(
        values=["2022051704"], db_name=DATABASE_NAME, table_name=TABLE_NAME,
        sd=partition_storage_descriptor
    ).build()
]


with HiveMetastoreClient(HIVE_HOST, HIVE_PORT) as hive_client:
    hive_client.add_partitions_if_not_exists(DATABASE_NAME, TABLE_NAME, partition_list)

An additional question: why is it required to specify the columns list in StorageDescriptorBuilder when the columns were already determined at table creation?
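A possible workaround, sketched under the assumption that HiveMetastoreClient also exposes the generated Thrift client's raw add_partitions call, which sends the Partition structs (including their storage descriptors) to the metastore as-is:

from hive_metastore_client import HiveMetastoreClient

with HiveMetastoreClient(HIVE_HOST, HIVE_PORT) as hive_client:
    # Bypass the convenience wrapper: the inherited Thrift method takes
    # the Partition objects, custom storage descriptor and all.
    hive_client.add_partitions(partition_list)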

Setting table_type = 'EXTERNAL_TABLE' builds a 'MANAGED_TABLE'

Sorry to open one more...

When I create an external table it ends up being created as a managed table.

my_table = TableBuilder(
    table_name='my_table',
    db_name=table['DatabaseName'],
    storage_descriptor=storage_descriptor,
    partition_keys=partition_keys,
    parameters=parameters,
    table_type='EXTERNAL_TABLE',
    owner='root',
).build()

However when I look in the metastore postgresql database for that table:

# select "TBL_TYPE" from "TBLS"  where "TBL_NAME" = 'my_table';
   TBL_TYPE
---------------
 MANAGED_TABLE
(1 row)

FWIW, create table WITH (external_location = xxx) works fine from the Presto client and creates the EXTERNAL_TABLE type in the database.

I'm still looking for a root cause or workaround, but thought I'd log what I've run into while I'm looking. Your examples and tests don't include creating an external table.
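A likely workaround, assuming the metastore decides managed vs. external from the table parameter EXTERNAL=TRUE rather than from table_type alone (older Hive Metastore versions silently downgrade the type when that parameter is missing):

# Hypothetical fix: set the EXTERNAL parameter explicitly
# alongside table_type.
parameters = {"EXTERNAL": "TRUE"}

my_table = TableBuilder(
    table_name='my_table',
    db_name=table['DatabaseName'],
    storage_descriptor=storage_descriptor,
    partition_keys=partition_keys,
    parameters=parameters,
    table_type='EXTERNAL_TABLE',
    owner='root',
).build()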

Kerberos support

Does this client work with the Kerberos authentication I activated in the Hive Metastore Service?

Do not make `storage_descriptor` require all arguments for `TableBuilder`

storage_descriptor should not require all arguments in TableBuilder, because of virtual views, e.g.:

table = TableBuilder(
    table_name="test_view",
    db_name="default",
    owner="test",
    table_type="VIRTUAL_VIEW",
    storage_descriptor=storage_descriptor,
    view_expanded_text="select * from test",
    view_original_text="select * from test"
).build()

When the user wants to create a virtual view, they should be able to pass just the columns to the storage descriptor instead of everything.

Instead of

storage_descriptor = StorageDescriptorBuilder(
    columns=columns,
    location="s3a://path/to/file",
    input_format="org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
    output_format="org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
    serde_info=serde_info,
).build()

the user should be able to pass just

storage_descriptor = StorageDescriptorBuilder(
    columns=columns,
).build()

Program hangs after first method call

Describe the bug

When calling methods in a loop, only the first call succeeds; each subsequent call seems to hang indefinitely, even when the same table is used in succession. This happens with any method call, not just get_partition_keys_objects.

To Reproduce

Steps to reproduce the behavior:

from hive_metastore_client import HiveMetastoreClient

tables = [
    "my_table",
    "my_table",
]

with HiveMetastoreClient("my_url") as hive_client:
    for table in tables:
        print(table)
        print(hive_client.get_partition_keys_objects("default", table))

Expected behavior

Each call should succeed in a timely manner

Environment

  • Python version: 3.7
  • Lib version: 1.0.9
  • Hive Metastore version: 2.3.8
  • Other (e.g. OS): Ubuntu 18.04
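Until the hang is tracked down, a workaround sketch, assuming a fresh connection per call avoids whatever state the first request leaves on the shared Thrift transport:

from hive_metastore_client import HiveMetastoreClient

tables = ["my_table", "my_table"]

# Hypothetical workaround: open a new client (and therefore a new Thrift
# connection) for each call instead of reusing one across the loop.
for table in tables:
    with HiveMetastoreClient("my_url") as hive_client:
        print(hive_client.get_partition_keys_objects("default", table))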

Hive Metastore Client Cataloging for Delta

Hi guys, we here at CVCCorp have a limitation in Hive cataloging regarding Delta data.

This is an example of what the cataloging model for data in Delta should look like:

CREATE EXTERNAL TABLE table_teste(
    tabela STRING,
    data_update STRING,
    count BIGINT)
STORED BY 'io.delta.hive.DeltaStorageHandler'
LOCATION 's3://bucket-name/example/table_teste/';

Our motivation for using Delta is that we use Databricks, and in our benchmarks Delta has better performance.
We also centralized all metadata in a Hive cluster for integration with Databricks.

For any questions, I will be in contact with Lucas on LinkedIn.

Confusing extra 's' in library name

The first thing the examples and documentation tell you to do is:

from hive_metastore_client import HiveMetastoreClient

That does not actually work. This does:

from hive_metastore_client.hive_mestastore_client import HiveMetastoreClient                                                                                                                

Note that there is an extra s in the second invocation: hive_mestastore_client.

That was confusing for a few minutes.
