quintoandar / hive-metastore-client
A client for connecting and running DDLs on Hive Metastore.
License: Apache License 2.0
Thanks for this amazing client!
Is there a way to automatically scan and sync partitions between the file system and the metastore?
Something like the MSCK REPAIR TABLE tool in Hive?
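Not that this client documents one, but a hand-rolled sketch of the idea might look like the following (list_partition_dirs is a hypothetical helper standing in for your own filesystem listing, and get_partition_names is the raw thrift method the client inherits; treat both assumptions as unverified):

from hive_metastore_client import HiveMetastoreClient

with HiveMetastoreClient("xxx", 9083) as client:
    # partition names already registered in the metastore, e.g. 'dt=2022051704'
    known = set(client.get_partition_names("default", "my_table", -1))
    # list_partition_dirs() stands in for your own listing of partition
    # directories under the table location
    missing = [p for p in list_partition_dirs() if p not in known]
    # build Partition objects for the missing names and register them,
    # e.g. via add_partitions_if_not_exists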
I could use a hint as to what might throw this error out of the Thrift client.
I'm using thrift version 0.13.0 with this.
I'm mostly following the examples for building a table.
Traceback (most recent call last):
File "./load_metastore.py", line 126, in <module>
mc.create_table(my_table)
File "/home/rotten/.virtualenvs/load_metastore/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ThriftHiveMetastore.py", line 2632, in create_table
self.send_create_table(tbl)
File "/home/rotten/.virtualenvs/load_metastore/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ThriftHiveMetastore.py", line 2639, in send_create_table
args.write(self._oprot)
File "/home/rotten/.virtualenvs/load_metastore/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ThriftHiveMetastore.py", line 20777, in write
self.tbl.write(oprot)
File "/home/rotten/.virtualenvs/load_metastore/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ttypes.py", line 5253, in write
self.sd.write(oprot)
File "/home/rotten/.virtualenvs/load_metastore/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ttypes.py", line 4897, in write
iter170.write(oprot)
AttributeError: 'ColumnBuilder' object has no attribute 'write'
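A guess at the cause (not confirmed anywhere in this thread): the table was built with ColumnBuilder instances where the thrift serializer expects the built Column objects. The builder examples elsewhere in this thread all end with .build(); a minimal sketch of the difference:

from hive_metastore_client.builders import ColumnBuilder

# breaks during create_table: ColumnBuilder has no write() method
columns = [ColumnBuilder("id", "string")]

# works: .build() returns the serializable thrift Column object
columns = [ColumnBuilder("id", "string").build()]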
Feature related:
Sorry to bother you. I have a need to alter and drop tables. There aren't any examples in your code base for how to do that, and it really isn't obvious. For now, when I need to alter a table, I'm dropping out to the Presto command line and dropping it, then jumping back into this client to build the updated version. It is rather inelegant.
If this is already supported and you could point me to a better approach, I'd love to learn more.
Thank you!
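If it helps while waiting for proper examples: HiveMetastoreClient appears to extend the generated thrift client, so the raw thrift methods may already be callable on it. A sketch, assuming the standard metastore thrift signature drop_table(dbname, name, deleteData); treat it as unverified:

from hive_metastore_client import HiveMetastoreClient

with HiveMetastoreClient("xxx", 9083) as client:
    # deleteData=True also removes the underlying files for managed tables;
    # pass False to drop only the metadata
    client.drop_table("default", "my_table", True)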
name: calling create_table raises java.lang.NullPointerException
about: Problems and issues with code or docs
title: ''
labels: bug
assignees: ''
Describe the bug
I use the Python lib, and calling the create_table function raises java.lang.NullPointerException:
Traceback (most recent call last):
File "gittb.py", line 53, in
hive_metastore_client.create_table(table)
File "/home/ec2-user/lfyang/spark-ui/jupyter/yes/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ThriftHiveMetastore.py", line 2633, in create_table
self.recv_create_table()
File "/home/ec2-user/lfyang/spark-ui/jupyter/yes/lib/python3.8/site-packages/thrift_files/libraries/thrift_hive_metastore_client/ThriftHiveMetastore.py", line 2659, in recv_create_table
raise result.o3
thrift_files.libraries.thrift_hive_metastore_client.ttypes.MetaException: MetaException(message='java.lang.NullPointerException')
To Reproduce
Steps to reproduce the behavior:
1. Save the following script as createtable.py (imports added here for completeness):
from hive_metastore_client import HiveMetastoreClient
from hive_metastore_client.builders import (
    ColumnBuilder,
    SerDeInfoBuilder,
    StorageDescriptorBuilder,
    TableBuilder,
)

HIVE_HOST = "xxx"
HIVE_PORT = 9083

columns = [
    ColumnBuilder("id", "string", "col comment").build(),
    ColumnBuilder("client_name", "string").build(),
    ColumnBuilder("amount", "string").build(),
    ColumnBuilder("year", "string").build(),
    ColumnBuilder("month", "string").build(),
    ColumnBuilder("day", "string").build(),
]
partition_keys = [
    ColumnBuilder("year", "string").build(),
    ColumnBuilder("month", "string").build(),
    ColumnBuilder("day", "string").build(),
]
serde_info = SerDeInfoBuilder(
    serialization_lib="org.apache.hadoop.hive.ql.io.orc.OrcSerde"
).build()
storage_descriptor = StorageDescriptorBuilder(
    columns=columns,
    location="s3a://mys3bucket/xx",
    input_format="org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",
    output_format="org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",
    serde_info=serde_info,
).build()
table = TableBuilder(
    table_name="test_tmp_table",
    db_name="default",
    owner="owner name",
    storage_descriptor=storage_descriptor,
    partition_keys=partition_keys,
).build()
with HiveMetastoreClient(HIVE_HOST, HIVE_PORT) as hive_metastore_client:
    hive_metastore_client.create_table(table)
2. Run python createtable.py
3. See error
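Two details in the script stand out as possible culprits (unverified guesses, not a confirmed root cause): input_format is set to the ORC output format class, and the partition key columns are repeated in the regular columns list. The conventional ORC format pair would be:

input_format="org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
output_format="org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",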
I want to add a Hive partition to the standalone metastore using Python's HiveMetastoreClient with a custom path. In other words, I want to reproduce the Hive command
alter table table_name add partition(dt='2022051705') location '2022/05/17/05';
I use the following code, but it creates the partition with the default path 'bucket_name/table_name/dt=2022051704' (it creates a new folder) instead of 'bucket_name/table_name/2022/05/17/04', where the files are stored:
from hive_metastore_client import HiveMetastoreClient
from hive_metastore_client.builders import (
    StorageDescriptorBuilder,
    SerDeInfoBuilder,
    PartitionBuilder,
)

HIVE_HOST = "xx.xx.xx.xx"
HIVE_PORT = 9083
DATABASE_NAME = 'default'
TABLE_NAME = 'table_name'

columns = [columns_list]
serde_info = SerDeInfoBuilder(
    serialization_lib="org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
).build()
partition_storage_descriptor = StorageDescriptorBuilder(
    columns=columns,
    location="/2022/05/17/04",
    input_format="org.apache.hadoop.mapred.TextInputFormat",
    output_format="org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    serde_info=serde_info,
).build()
partition_list = [
    PartitionBuilder(
        values=["2022051704"],
        db_name=DATABASE_NAME,
        table_name=TABLE_NAME,
        sd=partition_storage_descriptor,
    ).build()
]

with HiveMetastoreClient(HIVE_HOST, HIVE_PORT) as hive_client:
    hive_client.add_partitions_if_not_exists(DATABASE_NAME, TABLE_NAME, partition_list)
An additional question: why is it required to specify the columns list in StorageDescriptorBuilder when the columns were already determined at table creation?
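One possible workaround for both points (a sketch, assuming the raw thrift get_table and add_partition methods are callable on the client; neither is confirmed by this repo's docs): reuse the table's own storage descriptor so the columns and formats are guaranteed to match, and override only the location.

from copy import deepcopy

from hive_metastore_client import HiveMetastoreClient
from hive_metastore_client.builders import PartitionBuilder

with HiveMetastoreClient(HIVE_HOST, HIVE_PORT) as hive_client:
    table = hive_client.get_table(DATABASE_NAME, TABLE_NAME)
    partition_sd = deepcopy(table.sd)
    # point the partition at the existing directory instead of the
    # default <table location>/dt=<value> path
    partition_sd.location = table.sd.location + "/2022/05/17/04"
    partition = PartitionBuilder(
        values=["2022051704"],
        db_name=DATABASE_NAME,
        table_name=TABLE_NAME,
        sd=partition_sd,
    ).build()
    # the raw thrift add_partition call registers the partition exactly as
    # built, which may sidestep any location rewriting done by the helper
    hive_client.add_partition(partition)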
Sorry to open one more...
When I create an external table, it ends up being created as a managed table.
my_table = TableBuilder(
    table_name='my_table',
    db_name=table['DatabaseName'],
    storage_descriptor=storage_descriptor,
    partition_keys=partition_keys,
    parameters=parameters,
    table_type='EXTERNAL_TABLE',
    owner='root'
).build()
However when I look in the metastore postgresql database for that table:
# select "TBL_TYPE" from "TBLS" where "TBL_NAME" = 'my_table';
TBL_TYPE
---------------
MANAGED_TABLE
(1 row)
FWIW, create table WITH (external_location = xxx)
works fine from the Presto client and creates the EXTERNAL_TABLE type in the database.
I'm still looking for a root cause or work-around, but thought I'd log what I've run into while I'm looking. Your examples and tests don't include creating an external table.
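A workaround worth trying (an assumption based on how Hive itself marks external tables, not something confirmed by this repo's docs): the metastore decides managed vs. external from the 'EXTERNAL' table parameter as well as TBL_TYPE, so set both:

parameters = {"EXTERNAL": "TRUE"}  # assumed to be checked by the metastore in addition to table_type

my_table = TableBuilder(
    table_name='my_table',
    db_name=table['DatabaseName'],
    storage_descriptor=storage_descriptor,
    partition_keys=partition_keys,
    parameters=parameters,
    table_type='EXTERNAL_TABLE',
    owner='root'
).build()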
Does this client work with Kerberos authentication if it is activated in the Hive Metastore Service?
storage_descriptor should not require all arguments for the table builder, because of virtual views, e.g.:
table = TableBuilder(
    table_name="test_view",
    db_name="default",
    owner="test",
    table_type="VIRTUAL_VIEW",
    storage_descriptor=storage_descriptor,
    view_expanded_text="select * from test",
    view_original_text="select * from test"
).build()
When users want to create a virtual view, they should be able to pass just the columns to the storage descriptor instead of everything.
Instead of
storage_descriptor = StorageDescriptorBuilder(
    columns=columns,
    location="s3a://path/to/file",
    input_format="org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
    output_format="org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
    serde_info=serde_info,
).build()
the user should pass just
storage_descriptor = StorageDescriptorBuilder(
    columns=columns,
).build()
Describe the bug
When calling methods in a loop, only the first call succeeds. The next call seems to hang indefinitely, even if you use the same table in succession. This happens with any method call, not just get_partition_keys_objects.
To Reproduce
Steps to reproduce the behavior:
from hive_metastore_client import HiveMetastoreClient

tables = [
    "my_table",
    "my_table",
]

with HiveMetastoreClient("my_url") as hive_client:
    for table in tables:
        print(table)
        print(hive_client.get_partition_keys_objects("default", table))
Expected behavior
Each call should succeed in a timely manner.
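A possible workaround until the root cause is found (an untested sketch): open a fresh client per call, in case the underlying thrift transport does not survive being reused after the first response.

from hive_metastore_client import HiveMetastoreClient

for table in tables:
    with HiveMetastoreClient("my_url") as hive_client:
        print(hive_client.get_partition_keys_objects("default", table))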
Hi guys, we here at CVC Corp have hit a limitation in Hive cataloging regarding Delta data.
This is an example of what the cataloging model for Delta data should look like:
CREATE EXTERNAL TABLE table_teste(
    tabela STRING,
    data_update STRING,
    count BIGINT)
STORED BY 'io.delta.hive.DeltaStorageHandler'
LOCATION 's3://bucket-name/example/table_teste/';
Our motivation for using Delta is that we use Databricks and, in our benchmarks, Delta has better performance.
We also centralized all metadata in a Hive cluster for integration with Databricks.
For any questions, I will be in contact with Lucas on LinkedIn.
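Since STORED BY in Hive maps to the storage_handler table parameter in the metastore (an assumption based on Hive's META_TABLE_STORAGE constant, not validated against Delta), a client-side sketch might look like:

table = TableBuilder(
    table_name="table_teste",
    db_name="default",
    storage_descriptor=storage_descriptor,  # columns as in the DDL above
    parameters={
        "EXTERNAL": "TRUE",
        # assumed equivalent of STORED BY 'io.delta.hive.DeltaStorageHandler'
        "storage_handler": "io.delta.hive.DeltaStorageHandler",
    },
    table_type="EXTERNAL_TABLE",
).build()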
The first thing the examples and documentation tell you to do is:
from hive_metastore_client import HiveMetastoreClient
That does not actually work. This does:
from hive_metastore_client.hive_mestastore_client import HiveMetastoreClient
Note the extra s in the second import path: hive_mestastore_client.
That was confusing for a few minutes.