acryldata / datahub-helm

Repository of helm charts for deploying DataHub on a Kubernetes cluster

License: Apache License 2.0

datahub kubernetes helm hacktoberfest hacktoberfest2021

datahub-helm's Introduction

DataHub Kubernetes Helm Charts


Introduction

This repo provides the Kubernetes Helm charts for deploying DataHub and its dependencies (Elasticsearch, optionally Neo4j, MySQL, and Kafka) on a Kubernetes cluster.

Setup

  1. Set up a Kubernetes cluster
  2. Install the following tools:
    • kubectl to manage Kubernetes resources
    • helm to deploy the resources based on Helm charts. Note that only Helm 3 is supported.

Components

DataHub consists of four main components: GMS, MAE Consumer (optional), MCE Consumer (optional), and Frontend. The Kubernetes deployment for each component is defined as a subchart under the main DataHub Helm chart.

The main components are powered by four external dependencies:

  • Kafka
  • Local DB (MySQL, Postgres, MariaDB)
  • Search Index (Elasticsearch)
  • Graph Index (Supports either Elasticsearch or Neo4j)

The dependencies must be deployed before deploying DataHub. A separate chart is provided for deploying the dependencies with example configuration. They can also be deployed separately on-prem or consumed as managed services.

Quickstart

Assuming the kubectl context points to the correct Kubernetes cluster, first create Kubernetes secrets that contain the MySQL and Neo4j passwords.

kubectl create secret generic mysql-secrets --from-literal=mysql-root-password=datahub --from-literal=mysql-password=datahub
kubectl create secret generic neo4j-secrets --from-literal=neo4j-password=datahub --from-literal=NEO4J_AUTH=neo4j/datahub

The above commands set the passwords to "datahub" as an example. Change them to passwords of your choice.

Add the DataHub Helm repo by running the following:

helm repo add datahub https://helm.datahubproject.io/

Then, deploy the dependencies by running the following:

helm install prerequisites datahub/datahub-prerequisites

Note that the above uses the default configuration defined here. You can change any of the configuration values and deploy by running the following command:

helm install prerequisites datahub/datahub-prerequisites --values <<path-to-values-file>>
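For example, a minimal override file might look like the sketch below, which runs a single-node Elasticsearch for local testing (the keys mirror the Elasticsearch settings exposed by the prerequisites chart; adjust to your environment):

# example-prerequisites-values.yaml (illustrative only)
elasticsearch:
  enabled: true
  replicas: 1              # a single Elasticsearch node is enough for local testing
  minimumMasterNodes: 1
  antiAffinity: "soft"     # allow all Elasticsearch pods to land on one Kubernetes node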

Run kubectl get pods to check whether all the pods for the dependencies are running. You should get a result similar to below.

NAME                                               READY   STATUS      RESTARTS   AGE
elasticsearch-master-0                             1/1     Running     0          62m
prerequisites-cp-schema-registry-cf79bfccf-kvjtv   2/2     Running     1          63m
prerequisites-kafka-0                              1/1     Running     2          62m
prerequisites-mysql-0                              1/1     Running     1          62m
prerequisites-neo4j-0                              1/1     Running     0          52m
prerequisites-zookeeper-0                          1/1     Running     0          62m

Next, deploy DataHub by running the following:

helm install datahub datahub/datahub

Values in values.yaml have been preset to point to the dependencies deployed using the prerequisites chart with the release name "prerequisites". If you deployed the Helm chart using a different release name, update the quickstart-values.yaml file accordingly before installing.
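The dependency addresses follow the Helm release name. As a hedged sketch, with a release called my-deps instead of prerequisites, values along these lines would need updating (my-deps is a placeholder; confirm the exact key paths against quickstart-values.yaml):

global:
  kafka:
    bootstrap:
      server: "my-deps-kafka:9092"          # was prerequisites-kafka:9092
    zookeeper:
      server: "my-deps-zookeeper:2181"      # was prerequisites-zookeeper:2181
  sql:
    datasource:
      hostForMysqlClient: "my-deps-mysql"   # was prerequisites-mysql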

Run kubectl get pods to check whether all the datahub pods are running. You should get a result similar to below.

NAME                                               READY   STATUS      RESTARTS   AGE
datahub-acryl-datahub-actions-58b676f77c-c6pfx     1/1     Running     0          4m2s
datahub-datahub-frontend-84c58df9f7-5bgwx          1/1     Running     0          4m2s
datahub-datahub-gms-58b676f77c-c6pfx               1/1     Running     0          4m2s
datahub-datahub-mae-consumer-7b98bf65d-tjbwx       1/1     Running     0          4m3s
datahub-datahub-mce-consumer-8c57d8587-vjv9m       1/1     Running     0          4m2s
datahub-elasticsearch-setup-job-8dz6b              0/1     Completed   0          4m50s
datahub-kafka-setup-job-6blcj                      0/1     Completed   0          4m40s
datahub-mysql-setup-job-b57kc                      0/1     Completed   0          4m7s
elasticsearch-master-0                             1/1     Running     0          97m
prerequisites-cp-schema-registry-cf79bfccf-kvjtv   2/2     Running     1          99m
prerequisites-kafka-0                              1/1     Running     2          97m
prerequisites-mysql-0                              1/1     Running     1          97m
prerequisites-neo4j-0                              1/1     Running     0          88m
prerequisites-zookeeper-0                          1/1     Running     0          97m

You can run the following to expose the frontend locally. Note that you can find the pod name using the command above. In this case, the datahub-frontend pod name was datahub-datahub-frontend-84c58df9f7-5bgwx.

kubectl port-forward <datahub-frontend pod name> 9002:9002

You should be able to access the frontend via http://localhost:9002.

Once you confirm that the pods are running well, you can set up ingress for datahub-frontend to expose port 9002 to the public.
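As a rough sketch, the ingress can be enabled through the frontend subchart values like the following (the hostname and ingress class are placeholders for your environment):

datahub-frontend:
  enabled: true
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx    # assumes an nginx ingress controller is installed
    hosts:
      - host: datahub.example.com           # placeholder hostname
        paths: ["/"]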

Default Credentials

There are a few keys and credentials created as part of the deployment using randomized values. They can be overridden using various configuration parameters.

Also consider changing the default credentials used by any of the underlying data stores pulled in by the companion helm chart for the prerequisites. Refer to the upstream helm charts or point to your own managed data stores for these components.

DataHub Login

For controlling the default admin password, see the following configuration.

Encryption Key

Used by the Play framework and GMS to encrypt secrets at the application level, this can be configured here.

Token Signing Key

Used to sign tokens for authentication, see configuration here.
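As a hedged sketch, the encryption key can be pointed at a pre-created Kubernetes Secret using the secretRef/secretKey pattern that appears elsewhere in the chart values; the token signing key follows a similar pattern. The key names below are assumptions, so confirm them against the chart's values.yaml.

global:
  datahub:
    encryptionKey:
      secretRef: "my-encryption-secret"   # assumed: name of a pre-created Kubernetes Secret
      secretKey: "encryption_key"         # assumed: key inside that Secret holding the value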

Contributing

We welcome contributions from the community. Please refer to our Contributing Guidelines for more details.

Community

Join our Slack workspace for discussions and important announcements.


datahub-helm's Issues

datahub-frontend creates two Classic loadbalancers instead of ALB

Describe the bug
I'm trying to use the latest Helm chart (v0.2.89) to deploy DataHub into EKS.
When I add the ingress section (a step from https://datahubproject.io/docs/deploy/aws) and apply the change, two Classic Load Balancers are created - one with a 9002 listener and a second with an 8080 listener.

To Reproduce

  1. Set enabled: true in the ingress section of datahub-frontend Helm chart, in values.yaml file.
  2. Fill in certificate-arn and host
  3. Apply with helm upgrade --install datahub datahub/datahub --values values.yaml
  4. Go to AWS Console --> EC2 --> Load Balancers
  5. See two Classic Loadbalancers

Expected behavior
One ALB is created, with TCP 9002 and TCP 8080 listeners along with the certificate.

Screenshots
All pods are running correctly:
Screenshot 2022-08-17 at 16 18 51

Additional context
There's a document, https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/ingress/ingress_class/, stating that kubernetes.io/ingress.class is deprecated. I'm not sure if this points you in the right direction.

On Adding OIDC Config , Helm chart errors out

How to reproduce:

datahub-frontend:
...
  extraEnvs:
    - name: AUTH_OIDC_ENABLED
      value: true
    - name: AUTH_OIDC_CLIENT_ID
      value: "<change-me>"
    - name: AUTH_OIDC_CLIENT_SECRET
      value: "<change-me>"
    - name: AUTH_OIDC_DISCOVERY_URI
      value: "<change-me>"
    - name: AUTH_OIDC_BASE_URL
      value: "<change-me>"
helm install datahub datahub/datahub --values ./charts/datahub/values.yaml -n datahub

Error: Deployment in version "v1" cannot be handled as a Deployment: v1.Deployment.Spec: v1.DeploymentSpec.Template: v1.PodTemplateSpec.Spec: v1.PodSpec.Containers: []v1.Container: v1.Container.Env: []v1.EnvVar: v1.EnvVar.Value: ReadString: expects " or n, but found t, error found in #10 byte of ...|,"value":true},{"nam|..., bigger context ...|geEvent_v1"},{"name":"AUTH_OIDC_ENABLED","value":true},{"name":"AUTH_OIDC_CLIENT_ID","value":"f8bcbc|...

I believe the issue is that toYaml loses the double quotes.

https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/subcharts/datahub-frontend/templates/deployment.yaml#L141
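A common workaround (a sketch, not a fix confirmed in this thread) is to quote boolean-looking values so that the rendered container env var stays a string, since Kubernetes EnvVar values must be strings:

datahub-frontend:
  extraEnvs:
    - name: AUTH_OIDC_ENABLED
      value: "true"    # quoted so Kubernetes receives a string and the EnvVar unmarshal error goes away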

Latest chart is failing to install

Describe the bug
I am getting the following error on chart 0.2.132
(template: datahub/templates/datahub-auth-secrets.yml:5:7: executing "datahub/templates/datahub-auth-secrets.yml" at <.enabled>: can't evaluate field enabled in type bool)

Unable to install the helm charts

Error I get when I try and install the helm chart:

Error: This command needs 1 argument: chart name

Helm version :

Client: &version.Version{SemVer:"v2.17.0", GitCommit:"a690bad98af45b015bd3da1a41f6218b1a451dbe", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.17.0", GitCommit:"a690bad98af45b015bd3da1a41f6218b1a451dbe", GitTreeState:"clean"}

Support JAAS frontend authentication configuration

Hi here,

Is your feature request related to a problem? Please describe.
It would be nice if configuring the JAAS authentication method for the frontend, as described in DataHub frontend authentication, could be natively supported by the Helm charts.

Describe the solution you'd like
Ideally the content of jaas.conf, and maybe of the /datahub-frontend/conf/user.props file it refers to, could be configured by passing the content as a value (property) of the frontend subchart.

Describe alternatives you've considered
If there is a way to achieve this without modifying the existing chart please let me know ;)

Thanks!

Allow [white|black]listObjectNames in jmx configuration

Is your feature request related to a problem? Please describe.

For convenience and economic reasons (we pay a fee for each additional custom metric), we would like to configure whitelistObjectNames and blacklistObjectNames as described in the JMX exporter configuration.

Describe the solution you'd like
It should be possible to define the two lists from the values files together with the lowercaseOutputName and lowercaseOutputLabelNames ones

datahub-kafka-setup-job OOMKilled

The memory resource limits for this job seem too low:

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 300m
    memory: 256Mi

The job is getting OOMKilled:

NAME              PF  IMAGE                                 READY   STATE       INIT     RESTARTS  PROBES(L:R)   CPU  MEM  CPU/R:L  MEM/R:L  %CPU/R  %CPU/L  %MEM/R  %MEM/L  PORTS  AGE
kafka-setup-job   ●   linkedin/datahub-kafka-setup:v0.9.5   false   OOMKilled   false           0  off:off         0    0  300:500  256:512       0       0       0       0         3m58s

image

Created topic PlatformEvent_v1.
Created topic MetadataChangeEvent_v4.
Completed updating config for topic _schemas.
Created topic MetadataChangeLog_Versioned_v1.
Created topic MetadataChangeLog_Timeseries_v1.
./kafka-setup.sh: line 85:    23 Killed                  kafka-topics.sh --create --if-not-exists --command-config $CONNECTION_PROPERTIES_PATH --bootstrap-server $KAFKA_BOOTSTRAP_SERVER --partitions $PARTITIONS --replication-factor $REPLICATION_FACTOR --topic $METADATA_AUDIT_EVENT_NAME
./kafka-setup.sh: line 85:    25 Killed                  kafka-topics.sh --create --if-not-exists --command-config $CONNECTION_PROPERTIES_PATH --bootstrap-server $KAFKA_BOOTSTRAP_SERVER --partitions $PARTITIONS --replication-factor $REPLICATION_FACTOR --topic $FAILED_METADATA_CHANGE_EVENT_NAME
./kafka-setup.sh: line 85:    28 Killed                  kafka-topics.sh --create --if-not-exists --command-config $CONNECTION_PROPERTIES_PATH --bootstrap-server $KAFKA_BOOTSTRAP_SERVER --partitions $PARTITIONS --replication-factor $REPLICATION_FACTOR --topic $METADATA_CHANGE_PROPOSAL_TOPIC
./kafka-setup.sh: line 85:    29 Killed                  kafka-topics.sh --create --if-not-exists --command-config $CONNECTION_PROPERTIES_PATH --bootstrap-server $KAFKA_BOOTSTRAP_SERVER --partitions $PARTITIONS --replication-factor $REPLICATION_FACTOR --topic $FAILED_METADATA_CHANGE_PROPOSAL_TOPIC
./kafka-setup.sh: line 85:    31 Killed                  kafka-topics.sh --create --if-not-exists --command-config $CONNECTION_PROPERTIES_PATH --bootstrap-server $KAFKA_BOOTSTRAP_SERVER --partitions $PARTITIONS --replication-factor $REPLICATION_FACTOR --topic $DATAHUB_USAGE_EVENT_NAME
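One way to work around this would be to raise the job's memory limit through the chart values. A sketch, assuming the datahub chart exposes a kafkaSetupJob.resources block (verify the key path in values.yaml):

kafkaSetupJob:
  resources:
    limits:
      cpu: 500m
      memory: 1024Mi    # raised from 512Mi to avoid the OOMKill
    requests:
      cpu: 300m
      memory: 256Mi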

Mysql setup job fails if require_secure_transport is set to ON on mysql server

We are using RDS Aurora for the MySQL backend. Our server parameter group has require_secure_transport set to ON. The MySQL setup job fails with the following error:

atul.atri@C02FD3A3MD6M iac-datahub-db % kubectl logs  datahub-mysql-setup-job-9d7fl -n datahub
2022/08/30 12:51:12 Waiting for: tcp://<redacted>:3306
2022/08/30 12:51:12 Connected to tcp://<redacted>:3306
-- create datahub database
CREATE DATABASE IF NOT EXISTS datahub CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
USE datahub;

-- create metadata aspect table
create table if not exists metadata_aspect_v2 (
  urn                           varchar(500) not null,
  aspect                        varchar(200) not null,
  version                       bigint(20) not null,
  metadata                      longtext not null,
  systemmetadata                longtext,
  createdon                     datetime(6) not null,
  createdby                     varchar(255) not null,
  createdfor                    varchar(255),
  constraint pk_metadata_aspect_v2 primary key (urn,aspect,version)
);

-- create default records for datahub user if not exists
DROP TABLE if exists temp_metadata_aspect_v2;
CREATE TABLE temp_metadata_aspect_v2 LIKE metadata_aspect_v2;
INSERT INTO temp_metadata_aspect_v2 (urn, aspect, version, metadata, createdon, createdby) VALUES(
  'urn:li:corpuser:datahub',
  'corpUserInfo',
  0,
  '{"displayName":"Data Hub","active":true,"fullName":"Data Hub","email":"[email protected]"}',
  now(),
  'urn:li:corpuser:__datahub_system'
), (
  'urn:li:corpuser:datahub',
  'corpUserEditableInfo',
  0,
  '{"skills":[],"teams":[],"pictureLink":"https://raw.githubusercontent.com/datahub-project/datahub/master/datahub-web-react/src/images/default_avatar.png"}',
  now(),
  'urn:li:corpuser:__datahub_system'
);
-- only add default records if metadata_aspect is empty
INSERT INTO metadata_aspect_v2
SELECT * FROM temp_metadata_aspect_v2
WHERE NOT EXISTS (SELECT * from metadata_aspect_v2);
DROP TABLE temp_metadata_aspect_v2;

-- create metadata index table
CREATE TABLE IF NOT EXISTS metadata_index (
 `id` BIGINT NOT NULL AUTO_INCREMENT,
 `urn` VARCHAR(200) NOT NULL,
 `aspect` VARCHAR(150) NOT NULL,
 `path` VARCHAR(150) NOT NULL,
 `longVal` BIGINT,
 `stringVal` VARCHAR(200),
 `doubleVal` DOUBLE,
 CONSTRAINT id_pk PRIMARY KEY (id),
 INDEX longIndex (`urn`,`aspect`,`path`,`longVal`),
 INDEX stringIndex (`urn`,`aspect`,`path`,`stringVal`),
 INDEX doubleIndex (`urn`,`aspect`,`path`,`doubleVal`)
);
ERROR 3159 (HY000): Connections using insecure transport are prohibited while --require_secure_transport=ON.
2022/08/30 12:51:12 Command exited with error: exit status 1

The MySQL setup job was successful after I set require_secure_transport to OFF.

datahub-upgrade fails on v0.8.14

Hello,

I am facing an issue while trying to upgrade DataHub from v0.8.6 to v0.8.14. When I run the datahub upgrade, I get the following error:
Executing Step 2/7: GMSQualificationStep...
java.io.FileNotFoundException: http://datahub-gms:8080/config
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1890)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
    at com.linkedin.datahub.upgrade.nocode.GMSQualificationStep.lambda$executable$0(GMSQualificationStep.java:63)
    at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeStepInternal(DefaultUpgradeManager.java:97)
    at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:57)
    at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:39)
    at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.execute(DefaultUpgradeManager.java:30)
    at com.linkedin.datahub.upgrade.UpgradeCli.run(UpgradeCli.java:44)
    at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:813)
    at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:797)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:324)
    at org.springframework.boot.builder.SpringApplicationBuilder.run(SpringApplicationBuilder.java:139)
    at com.linkedin.datahub.upgrade.UpgradeCliApplication.main(UpgradeCliApplication.java:13)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
    at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51)
ERROR: Cannot connect to GMSat host datahub-gms port 8080. Make sure GMS is on the latest version and is running at that host before starting the migration.

The service is there and it listens on port 8080. I am also getting 404 when I try to curl this endpoint: http://datahub-gms:8080/config

But the service is there.


If I run the upgrade job on the old version of GMS (0.8.6), I get the following error:
:: Spring Boot :: (v2.1.4.RELEASE)
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
SLF4J: Defaulting to no-operation MDCAdapter implementation.
SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for further details.
Sep 29, 2021 7:28:17 AM org.neo4j.driver.internal.logging.JULogger info
INFO: Direct driver instance 583015088 created for server address localhost:7687
Starting upgrade with id NoCodeDataMigration...
Executing Step 1/7: RemoveAspectV2TableStep...
Completed Step 1/7: RemoveAspectV2TableStep successfully.
Executing Step 2/7: GMSQualificationStep...
Completed Step 2/7: GMSQualificationStep successfully.
Executing Step 3/7: UpgradeQualificationStep...
-- V1 table does not exist
Failed to qualify upgrade candidate. Aborting the upgrade...
Step with id UpgradeQualificationStep requested an abort of the in-progress update. Aborting the upgrade...
Upgrade NoCodeDataMigration completed with result ABORTED. Exiting...

I am not really sure if it is a bug or if I am missing anything. Any ideas here?

datahub-gms and datahub-mae consumer don't properly start when GRAPH_SERVICE_IMPL is set to elasticsearch

Describe the bug
Datahub-gms and datahub-mae consumer do not recognise GRAPH_SERVICE_IMPL env variable
To Reproduce
Start the services with the parameter GRAPH_SERVICE_IMPL=elasticsearch. It fails because it can't connect to Neo4j.
Expected behavior
It should just use elasticsearch.

Additional context
I managed to make it work by adjusting the default entrypoint for datahub-gms and datahub-mae-consumer. This is how it looks:

#!/bin/sh

# Add default URI (http) scheme if needed
if ! echo $NEO4J_HOST | grep -q "://" ; then
    NEO4J_HOST="http://$NEO4J_HOST"
fi

if [[ -z $ELASTICSEARCH_USERNAME ]]; then
    ELASTICSEARCH_HOST_URL=$ELASTICSEARCH_HOST
else
  if [[ -z $ELASTICSEARCH_AUTH_HEADER ]]; then
    ELASTICSEARCH_HOST_URL=$ELASTICSEARCH_USERNAME:$ELASTICSEARCH_PASSWORD@$ELASTICSEARCH_HOST
  else
    ELASTICSEARCH_HOST_URL=$ELASTICSEARCH_HOST
  fi
fi

# Add default header if needed
if [[ -z $ELASTICSEARCH_AUTH_HEADER ]]; then
    ELASTICSEARCH_AUTH_HEADER="Accept: */*"
fi

if [[ $ELASTICSEARCH_USE_SSL == true ]]; then
    ELASTICSEARCH_PROTOCOL=https
else
    ELASTICSEARCH_PROTOCOL=http
fi

WAIT_FOR_NEO4J=""

if [[ $GRAPH_SERVICE_IMPL == neo4j ]]; then
  WAIT_FOR_NEO4J=" -wait $NEO4J_HOST "
fi

OTEL_AGENT=""
if [[ $ENABLE_OTEL == true ]]; then
  OTEL_AGENT="-javaagent:opentelemetry-javaagent-all.jar "
fi

PROMETHEUS_AGENT=""
if [[ $ENABLE_PROMETHEUS == true ]]; then
  PROMETHEUS_AGENT="-javaagent:jmx_prometheus_javaagent.jar=4318:/datahub/datahub-mae-consumer/scripts/prometheus-config.yaml "
fi

dockerize \
  -wait tcp://$(echo $KAFKA_BOOTSTRAP_SERVER | sed 's/,/ -wait tcp:\/\//g') \
  -wait $ELASTICSEARCH_PROTOCOL://$ELASTICSEARCH_HOST_URL:$ELASTICSEARCH_PORT -wait-http-header "$ELASTICSEARCH_AUTH_HEADER" \
  $WAIT_FOR_NEO4J \
  -timeout 240s \
  java $JAVA_OPTS $JMX_OPTS $OTEL_AGENT $PROMETHEUS_AGENT -jar /datahub/datahub-mae-consumer/bin/mae-consumer-job.jar

Error when adding helm repository

Describe the bug

Datahub Helm Repository cannot be added.

To Reproduce
Steps to reproduce the behavior:

  1. Open a terminal
  2. Run
helm repo add datahub https://helm.datahubproject.io/
  1. See error
Error: looks like "https://helm.datahubproject.io/" is not a valid chart repository or cannot be reached: Get "https://helm.datahubproject.io/index.yaml": dial tcp: lookup helm.datahubproject.io on [::1]:53: read udp [::1]:35408->[::1]:53: read: connection refused

Expected behavior

Helm chart repository correctly added.

Additional context

The error above is observed under Helm v3.9.0. If we run the helm repo add datahub https://helm.datahubproject.io/ command using Helm v3.1.0, we obtain the following error message:

Error: error converting YAML to JSON: yaml: line 29: could not find expected ':'
helm.go:75: [debug] error converting YAML to JSON: yaml: line 29: could not find expected ':'

How can I configure datahub with kafka tls bootstrap-servers without ssl-configuration

Hello
Is it possible to run DataHub with Kafka configured in TLS authentication mode without SSL configuration?
I use AWS MSK with the TLS authentication type, unauthenticated access enabled, and private endpoints:
kafka:
  bootstrap:
    server: "b-1.xxxxxx.kafka.us-east-1.amazonaws.com:9094"
  zookeeper:
    server: "z-1.xxxxxx.kafka.us-east-1.amazonaws.com:2182"
springKafkaConfigurationOverrides:
  security.protocol: SSL

It's OK if I connect to Kafka from a client machine via kafka-topics. For example, "kafka-topics.sh --bootstrap-server b-1.xxxxxx.kafka.us-east-1.amazonaws.com:9094 --command-config=client.properties --list" with security.protocol=SSL in client.properties returns a list of topics.

But in AWS EKS I got an error in kafka-setup-job:
[main] ERROR io.confluent.admin.utils.cli.KafkaReadyCommand - Error while running kafka-ready.
org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient
at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:407)
at org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:65)
at io.confluent.admin.utils.ClusterStatus.isKafkaReady(ClusterStatus.java:138)
at io.confluent.admin.utils.cli.KafkaReadyCommand.main(KafkaReadyCommand.java:150)
Caused by: org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException: Failed to load SSL keystore of type JKS
at org.apache.kafka.common.network.SslChannelBuilder.configure(SslChannelBuilder.java:73)
at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146)
at org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67)
at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99)
at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:382)
... 3 more
Caused by: org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException: Failed to load SSL keystore of type JKS
at org.apache.kafka.common.security.ssl.SslEngineBuilder.createSSLContext(SslEngineBuilder.java:160)
at org.apache.kafka.common.security.ssl.SslEngineBuilder.(SslEngineBuilder.java:102)
at org.apache.kafka.common.security.ssl.SslFactory.configure(SslFactory.java:93)
at org.apache.kafka.common.network.SslChannelBuilder.configure(SslChannelBuilder.java:71)
... 7 more
Caused by: org.apache.kafka.common.KafkaException: Failed to load SSL keystore of type JKS
at org.apache.kafka.common.security.ssl.SslEngineBuilder$SecurityStore.load(SslEngineBuilder.java:289)
at org.apache.kafka.common.security.ssl.SslEngineBuilder.createSSLContext(SslEngineBuilder.java:142)
... 10 more
Caused by: java.io.IOException: Is a directory
at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at java.base/sun.nio.ch.FileDispatcherImpl.read(FileDispatcherImpl.java:48)
at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:245)
at java.base/sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:223)
at java.base/sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
at java.base/sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
at java.base/sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:271)
at java.base/java.security.DigestInputStream.read(DigestInputStream.java:125)
at java.base/java.io.DataInputStream.readInt(DataInputStream.java:392)
at java.base/sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:662)
at java.base/sun.security.util.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:222)
at java.base/java.security.KeyStore.load(KeyStore.java:1479)
at org.apache.kafka.common.security.ssl.SslEngineBuilder$SecurityStore.load(SslEngineBuilder.java:286)
... 11 more
Exception in thread "main" kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:258)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:254)
at kafka.zookeeper.ZooKeeperClient.(ZooKeeperClient.scala:112)
at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1826)
at kafka.admin.TopicCommand$ZookeeperTopicService$.apply(TopicCommand.scala:280)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:53)
at kafka.admin.TopicCommand.main(TopicCommand.scala)

In the kafka-setup-job, kafka-setup.sh contains the following lines:
if [[ $KAFKA_PROPERTIES_SECURITY_PROTOCOL == "SSL" ]]; then
    echo "ssl.keystore.location=$KAFKA_PROPERTIES_SSL_KEYSTORE_LOCATION" >> $CONNECTION_PROPERTIES_PATH
    echo "ssl.keystore.password=$KAFKA_PROPERTIES_SSL_KEYSTORE_PASSWORD" >> $CONNECTION_PROPERTIES_PATH
    echo "ssl.key.password=$KAFKA_PROPERTIES_SSL_KEY_PASSWORD" >> $CONNECTION_PROPERTIES_PATH
    echo "ssl.truststore.location=$KAFKA_PROPERTIES_SSL_TRUSTSTORE_LOCATION" >> $CONNECTION_PROPERTIES_PATH
    echo "ssl.truststore.password=$KAFKA_PROPERTIES_SSL_TRUSTSTORE_PASSWORD" >> $CONNECTION_PROPERTIES_PATH
    echo "ssl.endpoint.identification.algorithm=$KAFKA_PROPERTIES_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM" >> $CONNECTION_PROPERTIES_PATH
fi
Does it mean that DataHub can't work without full SSL configuration?

Independent mysql DB.

Hello.
I want to use an independent MySQL instance from my RDS instead of the containerized MySQL from the DataHub installation.
I think this is possible because we have these params:
mysqlSetupJob.enabled = false|true
global.sql.datasource.hostForMysqlClient = "myRDS sql link"
global.sql.datasource.url = "jdbc:mysql: myRDS sql link"

I feel like I am missing something.
Which params could help me to deploy DataHub with an independent DB?
Thank you.
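Building on the parameters named above, a hedged sketch of values for pointing DataHub at an external MySQL (the RDS endpoint is a placeholder and the exact key paths should be checked against the chart's values.yaml); the bundled MySQL in the prerequisites chart would be disabled separately with mysql.enabled: false:

mysqlSetupJob:
  enabled: true    # keep the setup job so the schema is created on the external database
global:
  sql:
    datasource:
      host: "my-rds-endpoint.rds.amazonaws.com:3306"          # placeholder RDS endpoint
      hostForMysqlClient: "my-rds-endpoint.rds.amazonaws.com"
      url: "jdbc:mysql://my-rds-endpoint.rds.amazonaws.com:3306/datahub"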

Editing user.props on k8s

Hi guys,

Which approach are you using to edit user.props to change the default password deployed on k8s?
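One approach (a sketch, assuming the datahub-frontend subchart exposes extraVolumes and extraVolumeMounts) is to keep user.props in a Secret and mount it over the default path mentioned elsewhere on this page:

datahub-frontend:
  extraVolumes:
    - name: user-props
      secret:
        secretName: datahub-user-props      # hypothetical Secret containing a user.props key
  extraVolumeMounts:
    - name: user-props
      mountPath: /datahub-frontend/conf/user.props
      subPath: user.props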

Update dependencies

Describe the bug
The policy/v1beta1 API version of PodDisruptionBudget is no longer served as of v1.25.
So the PodDisruptionBudget resource is not working for the Elasticsearch cluster.
A newer Elasticsearch Helm chart version should fix this.

To Reproduce
Steps to reproduce the behavior:

  1. Install a new k3s / k8s version
  2. Try to install the prerequisites

Unable to install datahub helm chart

Running this command: helm install datahub datahub/datahub returns the following error:

Error: failed pre-install: warning: Hook pre-install datahub/templates/kafka-setup-job.yml failed: Job in version "v1" cannot be handled as a Job: v1.Job.Spec: v1.JobSpec.Template: v1.PodTemplateSpec.Spec: v1.PodSpec.Containers: []v1.Container: v1.Container.Env: []v1.EnvVar: v1.EnvVar.v1.EnvVar.Value: ReadString: expects " or n, but found 3, error found in #10 byte of ...|,"value":3},{"name":|..., bigger context ...|isites-kafka:9092"},{"name":"PARTITIONS","value":3},{"name":"REPLICATION_FACTOR","value":3}],"image"|...

All the prerequisites get installed fine.
image

Kafka sasl.jaas.config cannot be passed as a secret

Hi here,

We use Kafka together with SASL and the ScramLoginModule, so one of the properties that we end up passing to the containers is:

global:
  springKafkaConfigurationOverrides:
    security.protocol: SASL_SSL
    sasl.mechanism: SCRAM-SHA-256
    sasl.jaas.config: org.apache.kafka.common.security.scram.ScramLoginModule required serviceName="datahub" username="user" password="passwd";

Clearly the sasl.jaas.config value would be better passed as a secret. Do you think it makes sense to support this use case?

Thanks!
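If the chart's credentialsAndCertsSecrets mechanism applies here (an assumption worth verifying against values.yaml), the override could be mapped to a key in a pre-created Secret instead of being written inline, along these lines:

global:
  credentialsAndCertsSecrets:
    name: datahub-kafka-secrets             # hypothetical Secret name
    secureEnv:
      sasl.jaas.config: sasl_jaas_config    # hypothetical mapping of the property to a key in the Secret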

Don't impose to create secret

Hello,
You shouldn't force users to create secrets before installing the Helm package. It goes against infrastructure-as-code best practices, where you define all your values in the Helm charts.
For example, we use HashiCorp Vault to pull our credentials directly into our values.yaml files before installing the chart.

Forcing us to create a secret file containing our secrets goes against our policies.

existingSecret: mysql-secrets

existingPasswordSecret: neo4j-secrets

Please remove those lines, as they cannot be overridden.

Image Tag not correctly set for DataHub-Ingestion-Cron

Describe the bug
Helm Templating Error when enabling datahub-ingestion-cron-job(s).

To Reproduce
Steps to reproduce the behaviour:
Install DataHub using the helm-chart with a cron-job enabled along with the prerequisites. When running helm upgrade --install .. it fails with:

Error: template: datahub/charts/datahub-ingestion-cron/templates/cron.yaml:38:109: executing "datahub/charts/datahub-ingestion-cron/templates/cron.yaml" at <.Values.image.tag>: nil pointer evaluating interface {}.image

It seems that the image-tag is not set properly by helm.

Expected behavior
The command should run successfully without errors.

Additional context
DataHub Cron-Job version: v0.9.5
Helm-Chart version: 0.2.129

I get the same error for earlier versions of the helm-chart until v 0.2.120.
Version 0.2.119 is the most recent version of the helm-chart which doesn't produce this error.

Support the Rancher Charts

Is your feature request related to a problem? Please describe.
We use Rancher to deploy DataHub.
Rancher has Rancher Charts, which are an extension of Helm charts.
Using Rancher Charts, we can edit (overwrite) values.yaml from the UI.

Describe the solution you'd like
To support Rancher Charts, we need to add questions.yaml and app-readme.md.
I created these files for the datahub chart; please refer to Additional context.
I will create them for the prerequisites chart as well.

Describe alternatives you've considered
n/a

Additional context
This is a screenshot of the Rancher UI.
Users can set the values when launching or upgrading the App.
Also, it is possible to specify a private registry (like registry.local) for the Image Registry in an air-gapped environment.

image

sample of questions.yaml:

---
categories:
  - datahub
  - datacatalog
questions:
  - variable: datahub-gms.enabled
    default: true
    type: boolean
    label: Enable GMS
    group: "datahub-gms"
    description: "Enable GMS"
    show_subquestion_if: true
    subquestions:
      - variable: datahub-gms.image.registry
        required: true
        default: "docker.io"
        type: string
        label: datahub-gms Image Registry
        group: "datahub-gms"
        description: "Image registry for datahub-gms"
      - variable: datahub-gms.image.repository
        required: true
        default: "linkedin/datahub-gms"
        type: string
        label: datahub-gms Image Repository
        group: "datahub-gms"
        description: "Image repository for datahub-gms"
      - variable: datahub-gms.image.tag
        required: true
        default: "v0.8.38"
        type: string
        label: datahub-gms Image Tag
        group: "datahub-gms"
        description: "Image tag for datahub-gms"
  - variable: datahub-frontend.enabled
    default: true
    type: boolean
    label: Enable Frontend
    group: "datahub-frontend"
    description: "Enable Frontend"
    show_subquestion_if: true
    subquestions:
      - variable: datahub-frontend.image.registry
        required: true
        default: "docker.io"
        type: string
        label: datahub-frontend Image Registry
        group: "datahub-frontend"
        description: "datahub-frontend Image Registry"
      - variable: datahub-frontend.ingress.enabled
        default: false
        type: boolean
        label: datahub-frontend Ingress
        group: "datahub-frontend"
        description: "Set up ingress to expose react front-end"
  - variable: acryl-datahub-actions.enabled
    default: true
    type: boolean
    label: Enable Actions
    group: "acryl-datahub-actions"
    description: "Enable Actions"
    show_subquestion_if: true
    subquestions:
      - variable: acryl-datahub-actions.image.registry
        required: true
        default: "docker.io"
        type: string
        label: acryl-datahub-actions Image Registry
        group: "acryl-datahub-actions"
        description: "acryl-datahub-actions Image Registry"
      - variable: acryl-datahub-actions.image.repository
        required: true
        default: "public.ecr.aws/datahub/acryl-datahub-actions"
        type: string
        label: acryl-datahub-actions Image Repository
        group: "acryl-datahub-actions"
        description: "Image repository for acryl-datahub-actions"
      - variable: acryl-datahub-actions.image.tag
        required: true
        default: "v0.0.1-beta.13"
        type: string
        label: acryl-datahub-actions Image Tag
        group: "acryl-datahub-actions"
        description: "Image tag for acryl-datahub-actions"
...

Elastic security/auth issue

Hi

Trying to bring up an instance using Elasticsearch as the graph service, but getting the following exception in the GMS pod.

Caused by: org.elasticsearch.ElasticsearchStatusException: method [HEAD], host [http://datahub-elastic-es-http:9200], URI [/graph_service_v1?ignore_throttled=false&ignore_unavailable=false&expand_wildcards=open%2Cclosed&allow_no_indices=false], status line [HTTP/1.1 401 Unauthorized]

My global values for Elasticsearch are below; adding the auth section was enough to get the Elasticsearch setup job to work.

global:
      graph_service_impl: elasticsearch

      elasticsearch:
        host: "datahub-elastic-es-http"
        port: "9200"
        auth:
          username: elastic
          password:
            secretRef: datahub-elastic-es-elastic-user
            secretKey: elastic

Any thoughts?
Thanks a lot

Jonny

datahub-postgres-setup 0.9.6.1 image not found

Describe the bug

Warning  Failed     11s   kubelet            Failed to pull image "acryldata/datahub-postgres-setup:v0.9.6.1": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/acryldata/datahub-postgres-setup:v0.9.6.1": failed to resolve reference "docker.io/acryldata/datahub-postgres-setup:v0.9.6.1": docker.io/acryldata/datahub-postgres-setup:v0.9.6.1: not found

To Reproduce
Steps to reproduce the behavior:
Change the version to v0.9.6.1;
terraform apply works as expected, but the datahub-postgres-setup job fails to pull the image.

Expected behavior
terraform apply works as expected, datahub-postgres-setup job pulls image and runs normally

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

Is kafka optional

Hello, Is it possible to deploy DataHub without Kafka? If my understanding is correct, Kafka is used for the ingestion process and audit events. For ingestion, I will use HTTP, and I don't need usage metrics. The problem is that the Helm chart for datahub-frontend only allows providing SSL configuration for ES when datahub_analytics_enabled is set to true. Does that mean that without Kafka ES is also unused, or is it a bug?

Error: UPGRADE FAILED: post-upgrade hooks failed: timed out waiting for the condition

Not able to upgrade to the latest version.
Trying to upgrade to the latest version leads to the following error:

client.go:607: [debug] datahub-datahub-upgrade-job: Jobs active: 1, jobs failed: 3, jobs succeeded: 0
upgrade.go:431: [debug] warning: Upgrade "datahub" failed: post-upgrade hooks failed: timed out waiting for the condition
Error: UPGRADE FAILED: post-upgrade hooks failed: timed out waiting for the condition
helm.go:84: [debug] post-upgrade hooks failed: timed out waiting for the condition
UPGRADE FAILED
main.newUpgradeCmd.func2
	helm.sh/helm/v3/cmd/helm/upgrade.go:200
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/[email protected]/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/[email protected]/command.go:974
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/[email protected]/command.go:902
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
	runtime/proc.go:255
runtime.goexit
	runtime/asm_amd64.s:1581
make: *** [Makefile:174: deploy-helm] Error 1

To Reproduce
Steps to reproduce the behavior:

  1. Trying to Upgrade the chart from 0.2.109 to 0.2.130
  2. helm upgrade the chart

Expected behavior
Helm upgraded to latest version of datahub

Allow users to configure NodePort as the service type for Frontend and GMS

I'd like to be able to configure the services that are created by the Frontend and GMS to be of type NodePort. For example, here is how the MySQL chart does it and here and here is how the Airflow chart does it. I think the MySQL one is more user-friendly, but please let me know what you think and I'm happy to open a PR.

Describe alternatives you've considered
kubectl port-forward service/prerequisites-mysql 3306:3306 works, but it would be beneficial to do it directly through the Kubernetes deployment.

passing sasl.mechanism and sasl.jaas.config to values not working

Describe the bug
I am passing sasl.mechanism and sasl.jaas.config values to springKafkaConfigurationOverrides, but they are not being picked up by Kafka.

To Reproduce
Add values in main values.yaml under
springKafkaConfigurationOverrides:

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

Kafka prerequisite has hardcoded release-name

Describe the bug
Deployment fails when deploying prerequisites with a release name other than prerequisites, because it is hardcoded in the values-file:
https://github.com/acryldata/datahub-helm/blob/datahub-prerequisites-0.0.12/charts/prerequisites/values.yaml#L70

cp-helm-charts:
  # Schema registry is under the community license
  cp-schema-registry:
    enabled: true
    kafka:
      bootstrapServers: "prerequisites-kafka:9092"  # <<release-name>>-kafka:9092

The schema registry fails with the error:

Couldn't resolve server prerequisites-kafka:9092 from bootstrap.servers as DNS resolution failed for prerequisites-kafka

To Reproduce
Steps to reproduce the behavior:

  • Deploy the prerequisites Helm chart with a release name other than prerequisites.
  • Watch the schema registry log.

Expected behavior
The chart should by default use the Helm release name, instead of it being hardcoded.
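Until that is fixed, the value can be overridden explicitly to match the chosen release name, for example (my-deps is a placeholder release name):

cp-helm-charts:
  cp-schema-registry:
    enabled: true
    kafka:
      bootstrapServers: "my-deps-kafka:9092"   # match <release-name>-kafka:9092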

springKafkaConfigurationOverrides props are not parsed properly

Describe the bug

Error: UPGRADE FAILED: pre-upgrade hooks failed: warning: Hook pre-upgrade datahub/templates/kafka-setup-job.yml failed: Job in version "v1" cannot be handled as a Job: v1.Job.Spec: v1.JobSpec.Template: v1.PodTemplateSpec.Spec: v1.PodSpec.Containers: []v1.Container: v1.Container.Env: []v1.EnvVar: v1.EnvVar.Value: ReadString: expects " or n, but found 1, error found in #10 byte of ...|,"value":100}],"imag|..., bigger context ...|ame":"KAFKA_PROPERTIES_MAX_POLL_RECORDS","value":100}],"image":"linkedin/datahub-kafka-setup:v0.8.33|...

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.
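The error suggests that a numeric override is rendered as a number rather than a string. Quoting the value in values.yaml is a plausible workaround (a sketch; the property name is taken from the error message above):

global:
  springKafkaConfigurationOverrides:
    max.poll.records: "100"    # quoted so the rendered KAFKA_PROPERTIES_MAX_POLL_RECORDS env var is a string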

Helm upgrade fails when using hostAliases in crons

Describe the bug
When trying to specify hostAliases in crons, I cannot upgrade. Am I doing something incorrect or is there a bug in the way the file is being interpreted? Following the documentation found here: https://github.com/acryldata/datahub-helm/tree/master/charts/datahub/subcharts/datahub-ingestion-cron

datahub-ingestion-cron:
  enabled: true
  image:
    repository: acryldata/datahub-ingestion
    tag: "v0.9.2"
  crons:
    my-cron:
      hostAliases:
        - ip: "10.50.1.200"
          hostnames:
            - "example.com"
helm upgrade --install datahub datahub/datahub --values datahub-values.yaml
Error: UPGRADE FAILED: template: datahub/charts/datahub-ingestion-cron/templates/cron.yaml:30:27: executing "datahub/charts/datahub-ingestion-cron/templates/cron.yaml" at <include "common.tplvalues.render" (dict "value" .hostAliases "context" $)>: error calling include: template: no template "common.tplvalues.render" associated with template "gotpl"

Expected behavior
Expect upgrade to work

A short description of the bug

Describe the bug

datahub-mysql-setup job still runs even when values.yml mysql.enabled=False and postgresql.enabled=True.

To Reproduce
Steps to reproduce the behavior:
Use a values.yml to override mysql.enabled

Expected behavior
Use postgresql instead of mysql as backend database.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

Error: secret "datahub-encryption-secrets" not found

Hello.
If I kill the existing GMS pod, the next pod cannot start because the generated secret "datahub-encryption-secrets" no longer exists. I get this error:
Error: secret "datahub-encryption-secrets" not found
This is a temporary secret.
It looks like I need a full redeployment of the app to get a new encryption secret.
I also saw the option "global.datahub.encryptionKey.secretRef",
but I can't understand its requirements and format.
Could you explain how to use this option correctly to avoid issues with temporarily generated secrets?
Thank you.

Elasticsearch single node config

Thank you for these excellent Helm charts! I noticed that, by default, the prerequisites values.yaml spins up 3 Elasticsearch nodes and was wondering if it would be OK to run this as a single instance for local testing. I was reading through the exposed Elasticsearch chart values here and I'm not sure I get how the cluster, master, data and client fields from here are meant to work, since they don't seem documented in the upstream elasticsearch chart. I'm no Helm expert, though.

I was able to start DataHub with Elasticsearch (with graph_service_impl: elasticsearch) in single instance mode in a single node Kind cluster using this config:

elasticsearch:
  enabled: true
  replicas: 1
  minimumMasterNodes: 1
  antiAffinity: "soft"

But I'm quite new to DataHub and I need to figure out if this setup works as expected. For now, the pods seem healthy and I can access the UI.

How can I activate the Impact analysis feature?

I would like to enable the Impact Analysis feature; is it supported by the Helm chart?
My GMS config is as follows; supportsImpactAnalysis is false:

{
  "models": {},
  "versions": {
    "linkedin/datahub": {
      "version": "v0.8.35",
      "commit": "bb341f740cf70adc2dd67fa6b6eb174c629c3164"
    }
  },
  "managedIngestion": {
    "defaultCliVersion": "0.8.35",
    "enabled": true
  },
  "statefulIngestionCapable": true,
  "supportsImpactAnalysis": false,
  "telemetry": {
    "enabledCli": true,
    "enabledIngestion": false
  },
  "datasetUrnNameCasing": false,
  "retention": "true",
  "noCode": "true"
}

A short description of the bug

Describe the bug
UnknownTopicOrPartitionException when running kafka-setup-job

To Reproduce
Steps to reproduce the behavior:
Install datahub using the helm chart along with the prerequisites chart. When running the kafka-configs.sh script, it fails with:

java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: 
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022)
        at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:180)
        at kafka.admin.ConfigCommand$.getResourceConfig(ConfigCommand.scala:577)
        at kafka.admin.ConfigCommand$.alterConfig(ConfigCommand.scala:347)
        at kafka.admin.ConfigCommand$.processCommand(ConfigCommand.scala:327)
        at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:98)
        at kafka.admin.ConfigCommand.main(ConfigCommand.scala)
Caused by: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: 

None of the topics created in the previous steps with kafka-topics.sh appear to fail, so I'm not exactly sure what this command isn't finding.

Expected behavior
The command should run successfully.

Additional context
I'm running this on GKE, Kubernetes v1.24.5.
Datahub chart version: 0.2.111
Prerequisites chart version: 0.0.10

how to login via ldap configured in jaas.conf

Hi here,

I could not find any reference on the process to log in via LDAP configured in jaas.conf.

These are the things I've tried:

I removed the user.props file from the frontend image (v0.2.87) and added the jaas.conf file.

jaas.conf

WHZ-Authentication {
    com.sun.security.auth.module.LdapLoginModule required
    userProvider="ldap://{ldap_host}:{ldap_port}"
    authIdentity="cn=${0},ou=members,o=identitymaster"
    java.naming.security.authentication="simple"
    debug="true"
};

Then there was a problem: any ID and password I entered could log in.

I want to allow access only to IDs with LDAP privileges.

Any help would be appreciated.

Thanks

DataHub frontend Ingress - HTTP 1.0 client does not support chunked response

Is your feature request related to a problem? Please describe.
Behind the scenes we have a VMWare NSX-T virtual network which is not under our direct control. We enabled the ingress for the datahub-frontend through the Helm chart in the following way:

    datahub-frontend:
      enabled: true
      # Set up ingress to expose react front-end
      ingress:
        enabled: true
        hosts:
          - host: datahub.my-dev.online.net
            paths: ["/"]

but we are receiving the following exception messages in the datahub-frontend logs:
play.core.server.common.ServerResultException: HTTP 1.0 client does not support chunked response

The site can be reached only through the LoadBalancer in our case.

Describe the solution you'd like
We would like to be able to define the used HTTP version via the Helm chart.

Or is there another way to create a workaround for our issue, please?

Thank you very much in advance!

Regards Tomas.

Error when installing ALB ingress for datahub frontend

Describe the bug
I'm following the guide to setting up DataHub on AWS
https://datahubproject.io/docs/deploy/aws

I've gotten to the stage where I need to update values.yaml in order to attach the ALB ingress to the frontend.

It looks like this -

datahub-frontend:
  enabled: true
  image:
    repository: linkedin/datahub-frontend-react
    tag: "latest"
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: instance
      alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:ap-southeast-2:601628467906:certificate/xxxxxx
      alb.ingress.kubernetes.io/inbound-cidrs: 0.0.0.0/0
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
      alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
    hosts:
      - host: xxxx
        redirectPaths:
          - path: /*
            name: ssl-redirect
            port: use-annotation
        paths:
          - /*

When I run the upgrade, however, I get the following error:

Error: UPGRADE FAILED: error validating "": error validating data: ValidationError(Ingress.spec.rules[0].http.paths[0].backend.service.port.number): invalid type for io.k8s.api.networking.v1.ServiceBackendPort.number: got "string", expected "integer"

Expected behavior
Successful Upgrade

Additional context
The ALB seems to be there, and the site is reachable when using port forwarding for local access.

loadBalancerIP attr is missed in service.yaml template

Describe the bug
I need to specify loadBalancerIP, but it's impossible since it's missing from the Helm template.

Expected behavior
Something like

spec:
  {{- if .Values.service.loadBalancerIP }}
  loadBalancerIP: "{{ .Values.service.loadBalancerIP }}"
  {{- end }}

in service.yaml

Using special characters in pod annotations values

It is not possible to add special characters to pod annotation values.

For example, I need to add this line for Filebeat according to the official docs:

 co.elastic.logs/multiline.pattern: '^\['

And I get an error even if I change single to double quotes:

Error: Deployment in version "v1" cannot be handled as a Deployment: json: cannot unmarshal number into Go struct field ObjectMeta.spec.template.metadata.annotations of type string
helm.go:84: [debug] Deployment in version "v1" cannot be handled as a Deployment: json: cannot unmarshal number into Go struct field ObjectMeta.spec.template.metadata.annotations of type string

It would be great if you change pod annotations template to something like this:

annotations:
  {{- range $key, $value := .Values.podAnnotations }}
  {{ $key }}: {{ $value | quote }}
  {{- end }}

ES indexes are missing on v0.8.14

Describe the bug
I am trying to deploy DataHub v0.8.14. I had a previous version, but I dropped the database and recreated Elasticsearch. I am then deploying and using the init jobs for MySQL, Kafka, and Elasticsearch. The jobs do not fail. However, when I try to open DataHub, I get a white screen and this error on GMS:
09:15:09.704 [qtp544724190-9] ERROR c.l.metadata.dao.search.ESSearchDAO - Search query failed:Elasticsearch exception [type=index_not_found_exception, reason=no such index [datajobdocument]]
09:21:30.808 [qtp544724190-16] ERROR c.l.metadata.dao.search.ESSearchDAO - Search query failed:Elasticsearch exception [type=index_not_found_exception, reason=no such index [corpuserinfodocument]]
09:21:30.875 [qtp544724190-9] ERROR c.l.metadata.dao.search.ESSearchDAO - Search query failed:Elasticsearch exception [type=index_not_found_exception, reason=no such index [chartdocument]]
09:21:30.875 [qtp544724190-14] ERROR c.l.metadata.dao.search.ESSearchDAO - Search query failed:Elasticsearch exception [type=index_not_found_exception, reason=no such index [dataflowdocument]]
09:21:30.875 [qtp544724190-13] ERROR c.l.metadata.dao.search.ESSearchDAO - Search query failed:Elasticsearch exception [type=index_not_found_exception, reason=no such index [dashboarddocument]]
09:28:35.177 [qtp544724190-14] ERROR c.l.metadata.dao.search.ESSearchDAO - Search query failed:Elasticsearch exception [type=index_not_found_exception, reason=no such index [datasetdocument]]

I also tried creating the missing indexes manually, but then I am getting the following error:
13:00:08.844 [qtp544724190-15] ERROR c.l.metadata.dao.search.ESSearchDAO - Search query failed:Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]

To Reproduce
I just deploy datahub and this is what happens with ES.

Expected behavior
Functional ES search

Am I missing something?

Manage elasticsearch is not reachable with a port

I ran into an issue with the Helm chart, where it defaults to appending :9200 (the port) to the Elasticsearch URI, although AWS Elasticsearch doesn't work that way (it's reached at just https://elastic-host.domain.com without the port). This causes the elasticsearch init job to fail:

2021/07/22 14:38:55 Waiting for: https://myuser:[email protected]:9200

I got past this step by setting it to yaml null, ~:

global:
    elasticsearch:
        port: ~

However, gms does not heed the port variable, and continues to use :9200.

Is there an env var I can use to force GMS onto another port?

Another heads up is that the same happened when not using the default mysql port, but there I could just change it back to default.

Edit: Solved by using port 443 (TLS) instead of 9200. I'll see if I can contribute a note to the docs somewhere.
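For reference, a sketch of the global Elasticsearch values that correspond to this workaround (the useSSL key is an assumption; check the chart values):

global:
  elasticsearch:
    host: "elastic-host.domain.com"    # placeholder AWS-managed Elasticsearch endpoint
    port: "443"                        # TLS port instead of the default 9200
    useSSL: "true"                     # assumed key for enabling HTTPS to Elasticsearch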

Ability to add custom labels to datahub sub-charts

Is your feature request related to a problem? Please describe.
While deploying DataHub in Kubernetes with the datahub-helm repo, I came across a situation where I could not add custom labels to the Ingress configuration for the datahub-frontend and datahub-gms charts. I use special labels on the Ingress configuration to identify the type of traffic routing required for my application.

Describe the solution you'd like
Define an ingress.customLabels: {} object in the values.yaml file for datahub-frontend and datahub-gms, and override it from outside like:

datahub-frontend:
  enabled: true
  ingress:
    enabled: true
    customLabels:
      label-name1: value1
      label-name2: value2

Describe alternatives you've considered
The only alternative I had was to fork the frontend and GMS charts to be able to add the custom labels I needed.

Failed to Create Datahub-frontend Ingress

Helm upgrade failed with the following error:
ValidationError(Ingress.spec.rules[0].http): missing required field "paths" in io.k8s.api.networking.v1.HTTPIngressRuleValue

Following are the values:

    global:
      graph_service_impl: elasticsearch
    datahub-frontend:
      ingress:
        enabled: true
        annotations:
          kubernetes.io/ingress.class: nginx
        tls:
        - hosts:
          - datahub.cloud.com
          secretName: tls-wildcard
        rules:
        - host: datahub.cloud.com
          http:
            paths:
              - path: /
                pathType: ImplementationSpecific
                backend:
                  service:
                    name: frontend
                    port:
                      number: 80
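For comparison, the ingress values used elsewhere on this page take a hosts list rather than raw Kubernetes rules; a sketch of the equivalent configuration (assuming the chart renders hosts and tls into the Ingress spec):

datahub-frontend:
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
    hosts:
      - host: datahub.cloud.com
        paths: ["/"]
    tls:
      - hosts:
          - datahub.cloud.com
        secretName: tls-wildcard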

Helm dependency is broken

Describe the bug

helm dependency update ./prerequisites
generates an error:

Error: can't get a valid version for repositories mysql, kafka. Try changing the version constraint in Chart.yaml

To Reproduce

  • Checkout master branch of this repo
  • Run helm dependency update ./prerequisites

Expected behavior
Get all the dependencies
