
presto-chart's Issues

Cannot create s3-backed hive metastore

Hello,

Objective

I want to deploy this chart to an on-prem Kubernetes cluster with a catalog that uses MinIO for both the Hive metastore and the backend storage. For reference, see:
https://blog.minio.io/building-an-on-premise-ml-ecosystem-with-minio-powered-by-presto-weka-r-and-s3select-feature-fefbbaa87054

Reason to think it should work

I have managed to use the prestosql/presto docker image to achieve this successfully on a dev machine (just docker, no k8s). This proof of concept used 1 minio container, 1 presto container, and 1 jupyter python container to connect to presto and push/pull data.

To get it working, I had to sort out networking and create the right catalog file. The catalog file I'm using is:

# lake.catalog
connector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=s3://presto/
hive.allow-drop-table=true
hive.s3.aws-access-key=<USER>
hive.s3.aws-secret-key=<PASSWORD>
hive.s3.endpoint=<URL>
hive.s3.path-style-access=true
hive.s3.ssl.enabled=false
hive.s3select-pushdown.enabled=true
hive.storage-format=parquet
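
The catalog file above keeps its credentials and endpoint as placeholders, so they have to be filled in before Presto reads it. A minimal sketch of one way to do that in Python, assuming the placeholder names shown above; the `render_catalog` helper and the sample values are hypothetical and are not part of Presto or the chart:

```python
import re

# The catalog file from above, with its <NAME> placeholders left in place.
CATALOG_TEMPLATE = """\
connector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=s3://presto/
hive.allow-drop-table=true
hive.s3.aws-access-key=<USER>
hive.s3.aws-secret-key=<PASSWORD>
hive.s3.endpoint=<URL>
hive.s3.path-style-access=true
hive.s3.ssl.enabled=false
hive.s3select-pushdown.enabled=true
hive.storage-format=parquet
"""


def render_catalog(template, values):
    """Replace each <NAME> placeholder with values[NAME]; fail if one is missing."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder <{name}>")
        return values[name]

    return re.sub(r"<([A-Z_]+)>", substitute, template)


# Hypothetical sample values, for illustration only.
rendered = render_catalog(
    CATALOG_TEMPLATE,
    {
        "USER": "example-access-key",
        "PASSWORD": "example-secret-key",
        "URL": "http://minio:9000",
    },
)
print(rendered)
```

Failing on a missing value, rather than leaving the placeholder in the file, keeps a half-rendered catalog from reaching the server.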

How far I've gotten with wiwdata/presto-chart

I have a jupyter notebook up in the cluster. I have minio up in the cluster.

  1. I am able to bring up a wiwdata presto cluster with defaults.
  2. I am able to interact with Presto using the Python notebook. I am also able to interact with the MinIO server using the Python notebook. So networking is posing no issues.

Where I am stuck

I cannot bring up a Presto cluster with a working catalog ConfigMap. This is what my ConfigMap looks like:

---
# Source: presto/templates/configmap-catalog.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: presto-catalog
  labels:
    app: presto
    chart: presto-1
    release: presto-1579859901
data:
  hive.properties: |
    connector.name=hive-hadoop2
    hive.metastore=file
    hive.metastore.catalog.dir=s3://presto/
    hive.allow-drop-table=true
    hive.s3.aws-access-key=<USER>
    hive.s3.aws-secret-key=<PASSWORD>
    hive.s3.endpoint="minio-service.default.svc.cluster.local:9000"
    hive.s3.path-style-access=true
    hive.s3.ssl.enabled=false
    hive.s3select-pushdown.enabled=true
    hive.storage-format=parquet
---
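
One difference from the dev catalog is that `hive.s3.endpoint` is wrapped in double quotes here. In a Java-style properties file, quotes are normally kept as part of the value rather than stripped. Whether that matters for this crash is not established. A minimal sketch of a check for such values, assuming simple `key=value` lines; `parse_properties` and `quoted_values` are hypothetical helper names:

```python
# The hive.properties body from the ConfigMap above.
HIVE_PROPERTIES = """\
connector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=s3://presto/
hive.allow-drop-table=true
hive.s3.aws-access-key=<USER>
hive.s3.aws-secret-key=<PASSWORD>
hive.s3.endpoint="minio-service.default.svc.cluster.local:9000"
hive.s3.path-style-access=true
hive.s3.ssl.enabled=false
hive.s3select-pushdown.enabled=true
hive.storage-format=parquet
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def quoted_values(props):
    """Return the keys whose value starts and ends with a double quote."""
    return [
        key
        for key, value in props.items()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"'
    ]


print(quoted_values(parse_properties(HIVE_PROPERTIES)))
```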

**All other values in values.yaml are unchanged.**

Unfortunately, the cluster keeps entering a crashloop.

riaz@k3s-dev:~/presto-chart$ sudo kubectl get all
NAME                                                 READY   STATUS             RESTARTS   AGE
pod/kubernetes-cockpit-tlnsw                         1/1     Running            0          3h54m
pod/minio-69c5c44c7c-74dkh                           1/1     Running            0          136m
pod/presto-1579859901-worker-845cd7cb9c-2tkzz        0/1     CrashLoopBackOff   3          3m45s
pod/presto-1579859901-worker-845cd7cb9c-hh295        0/1     CrashLoopBackOff   3          3m45s
pod/presto-1579859901-coordinator-7df8fc5c45-m699k   0/1     CrashLoopBackOff   3          3m45s

NAME                                       DESIRED   CURRENT   READY   AGE
replicationcontroller/kubernetes-cockpit   1         1         1       3h54m

NAME                         TYPE           CLUSTER-IP      EXTERNAL-IP                                PORT(S)    AGE
service/kubernetes           ClusterIP      10.43.0.1       <none>                                     443/TCP    13h
service/workbench            ExternalName   <none>          proxy-public.workbench.svc.cluster.local   80/TCP     13h
service/kubernetes-cockpit   ClusterIP      10.43.32.92     <none>                                     443/TCP    3h54m
service/minio-service        ClusterIP      10.43.102.159   <none>                                     9000/TCP   136m
service/presto-1579859901    ClusterIP      10.43.214.17    <none>                                     80/TCP     3m45s

NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/minio                           1/1     1            1           136m
deployment.apps/presto-1579859901-worker        0/2     2            0           3m45s
deployment.apps/presto-1579859901-coordinator   0/1     1            0           3m45s

NAME                                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/minio-69c5c44c7c                           1         1         1       136m
replicaset.apps/presto-1579859901-worker-845cd7cb9c        2         2         0       3m45s
replicaset.apps/presto-1579859901-coordinator-7df8fc5c45   1         1         0       3m45s

I will post the logs in the next message so that they don't clog up this one, but the salient error (I think) is this -

2020-01-24T11:13:07.350Z	ERROR	main	com.facebook.presto.server.PrestoServer	Unable to create injector, see the following errors:

1) Explicit bindings are required and com.facebook.presto.hive.authentication.HdfsAuthentication is not explicitly bound.
  while locating com.facebook.presto.hive.authentication.HdfsAuthentication
    for the 3rd parameter of com.facebook.presto.hive.HdfsEnvironment.<init>(HdfsEnvironment.java:50)
  at com.facebook.presto.hive.HiveClientModule.configure(HiveClientModule.java:68)

2) Explicit bindings are required and com.facebook.presto.hive.s3.S3ConfigurationUpdater is not explicitly bound.
  while locating com.facebook.presto.hive.s3.S3ConfigurationUpdater
    for the 2nd parameter of com.facebook.presto.hive.HdfsConfigurationUpdater.<init>(HdfsConfigurationUpdater.java:77)
  at com.facebook.presto.hive.HiveClientModule.configure(HiveClientModule.java:66)

3) Error: Could not coerce value 'parquet' to com.facebook.presto.hive.HiveStorageFormat (property 'hive.storage-format') in order to call [public com.facebook.presto.hive.HiveClientConfig com.facebook.presto.hive.HiveClientConfig.setHiveStorageFormat(com.facebook.presto.hive.HiveStorageFormat)]

4) Configuration property 'hive.s3.aws-access-key' was not used
  at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:233)

5) Configuration property 'hive.s3.aws-secret-key' was not used
  at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:233)

6) Configuration property 'hive.s3.endpoint' was not used
  at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:233)

7) Configuration property 'hive.s3.path-style-access' was not used
  at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:233)

8) Configuration property 'hive.s3.ssl.enabled' was not used
  at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:233)

9) Configuration property 'hive.s3select-pushdown.enabled' was not used
  at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:233)

10) Configuration property 'hive.storage-format' was not used
  at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:233)

10 errors


This suggests to me that this docker container has been built with a version of presto that doesn't support the s3-backed hive metastore.

Is this correct? If so, could I build an updated one?
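
The error list above can be condensed to make the pattern easier to see. A minimal sketch in Python that pulls the unused property names and the unbound classes out of the log text; the regular expressions are assumptions based only on the message wording shown above, and `summarize` is a hypothetical helper:

```python
import re

# Header lines copied from the log above (stack-trace lines and error 3 omitted).
LOG_EXCERPT = """\
1) Explicit bindings are required and com.facebook.presto.hive.authentication.HdfsAuthentication is not explicitly bound.
2) Explicit bindings are required and com.facebook.presto.hive.s3.S3ConfigurationUpdater is not explicitly bound.
4) Configuration property 'hive.s3.aws-access-key' was not used
5) Configuration property 'hive.s3.aws-secret-key' was not used
6) Configuration property 'hive.s3.endpoint' was not used
7) Configuration property 'hive.s3.path-style-access' was not used
8) Configuration property 'hive.s3.ssl.enabled' was not used
9) Configuration property 'hive.s3select-pushdown.enabled' was not used
10) Configuration property 'hive.storage-format' was not used
"""

UNUSED = re.compile(r"Configuration property '([^']+)' was not used")
UNBOUND = re.compile(r"Explicit bindings are required and (\S+) is not explicitly bound")


def summarize(log):
    """Collect unused property names and unbound class names from the log text."""
    return {
        "unused_properties": UNUSED.findall(log),
        "unbound_classes": UNBOUND.findall(log),
    }


summary = summarize(LOG_EXCERPT)
for kind, names in summary.items():
    print(kind)
    for name in names:
        print("  " + name)
```

Every class named in the log lives under the `com.facebook.presto` package, and six of the seven unused properties are `hive.s3.*` settings.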
