stackabletech / druid-operator
An Operator for Apache Druid for the Stackable Data Platform
License: Other
https://druid.apache.org/docs/latest/operations/dynamic-config-provider.html
This can possibly be used to directly get mounted env vars into the config file, for things like zookeeper discovery. If it works we should use it.
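A hedged sketch of what this could look like in `runtime.properties`, based on the linked docs. The environment-variable provider is documented for secrets such as the metadata storage password; whether the same mechanism works for ZooKeeper discovery is exactly the open question of this ticket, so the second property is an assumption to verify:

```properties
# Documented use: pull a secret from an env var instead of hardcoding it
druid.metadata.storage.connector.password={"type":"environment","variable":"METADATA_STORAGE_PASSWORD"}

# Open question (assumption): feed a mounted env var into ZooKeeper discovery the same way
druid.zk.service.host={"type":"environment","variable":"ZOOKEEPER_HOSTS"}
```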
As a user of druid services I want to be able to use HDFS for deep storage.
Using local storage for deep storage only works if the storage location is reachable by all relevant druid services (middle-manager, historical server).
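For reference, a sketch of the Druid properties the operator would have to render for HDFS deep storage (property names taken from the Druid configuration reference; the storage directory is illustrative):

```properties
druid.extensions.loadList=["druid-hdfs-storage"]
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments
# Additionally, core-site.xml/hdfs-site.xml from the HDFS cluster must be on the classpath
```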
I've noticed this problem here and have already reported it upstream; I wanted to document the problem in our repo as well.
TLDR: If the s3 extension is loaded but not actually used (local deep storage, loading data locally) errors appear that a connection to s3 cannot be made.
We could possibly add a switch to decide whether the extension should be loaded or not. If proper credentials are provided as well, it works.
There is however a feature where you load s3 data from a bucket with credentials that you just provide in the Web UI. To allow this ad-hoc s3 access, the extension must be loaded at all times. If we instead provide a switch in the CRD whether it should be loaded or not, that's not possible anymore.
Each process is a separate pod, and some should run together on the same node (Historical & MiddleManager, Overlord & Coordinator); that can be modelled with pod affinity.
Useful info in the official druid-operator: https://github.com/druid-io/druid-operator
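A hedged sketch of the pod affinity that could co-locate Historicals with MiddleManagers. The label names (`app.kubernetes.io/component` values) are assumptions, not necessarily the labels our operator actually sets:

```yaml
# On the Historical pod template: prefer nodes already running a MiddleManager
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: druid
              app.kubernetes.io/component: middlemanager
          topologyKey: kubernetes.io/hostname
```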
When implementing support for S3 in the operators out there we took some shortcuts, like ignoring the accessStyle and tls settings for now. We need to honor them and not just silently ignore them.
What needs to be done is described in stackabletech/issues#226
We would like to emit Kubernetes events for all errors, please see the epic for details stackabletech/issues#158
The documentation (in the docs/ directory) needs to contain a section on the versions of the products we explicitly support. Please see the epic stackabletech/issues#139 for details
Currently this operator watches resources in all namespaces.
I'd like this to be configurable so I can specify which namespace to watch.
This should be a clap argument (which can then be provided on the command line or in an env var) called --watch-namespace.
It is okay to only take a single namespace for now.
See stackabletech/issues#162 for the overarching epic
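A minimal sketch of the intended lookup order (explicit CLI flag wins over the environment). In the operator this would be a clap argument; the `WATCH_NAMESPACE` variable name is an assumption, and this stdlib-only version just illustrates the precedence clap would provide:

```rust
use std::env;

/// Resolve the namespace to watch: an explicit `--watch-namespace` argument
/// takes precedence over an environment variable; `None` means "watch all
/// namespaces" (the current behavior).
fn watch_namespace(args: &[String], env_value: Option<String>) -> Option<String> {
    args.iter()
        .position(|a| a.as_str() == "--watch-namespace")
        .and_then(|i| args.get(i + 1).cloned())
        .or(env_value)
}

fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    let ns = watch_namespace(&args, env::var("WATCH_NAMESPACE").ok());
    println!("watching namespace: {:?}", ns);
}
```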
The annotation prometheus.io/scrape: "true" should be added to role services.
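A hedged sketch of a role service carrying the annotation; the service name and metrics port are illustrative, not the operator's actual values:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: druid-broker          # illustrative role service name
  annotations:
    prometheus.io/scrape: "true"
spec:
  ports:
    - name: metrics
      port: 9090              # illustrative metrics port
```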
As a user of services deployed by this operator I'd like to know how to discover its connection details.
It's done when
NOTE: This ticket is part of an epic and autocreated for all our operators. It might not apply to this operator in particular, in that case please comment and close
stackabletech/documentation#86
Acceptance Criteria
We could use the /status/health endpoint or others (/druid/broker/v1/readiness for the broker and historical) to implement readiness probes.
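A hedged sketch of the resulting probe on a broker container; 8082 is the Druid default broker port and may differ in our images:

```yaml
readinessProbe:
  httpGet:
    path: /druid/broker/v1/readiness
    port: 8082
  periodSeconds: 10
# Other process types could fall back to the general endpoint:
#   path: /status/health
```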
To clean up config.rs we should check all the properties that are set per process and remove the ones that have a default entirely (for now). The ones that are left over should be moved to the product config with a sensible default. At the end, the file should be removed entirely.
We do not map anything into config properties in the resource for now, until we know which properties we should map. Overriding properties is still possible by using configOverrides.
Configuration Reference (states the defaults for each property): https://druid.apache.org/docs/latest/configuration/index.html
As a user I want to have the druid integration tests moved to the operator repository and for the stability to be improved (e.g. currently the python tests start before the druid components are reachable).
is this label really necessary?
Implementation ticket for #168
The Druid operator should be extended to be able to deploy ingestion specs from definitions provided in CRDs.
Ingestion specs can be defined by the user via a customresource, which is watched by a controller in the Druid operator that then provisions these specs.
The CRD will contain at least the following:
These objects will initially be considered read-only, so changes to them will not be propagated to Druid by the controller.
The initial implementation will not be a perfect ingestion task management solution, but rather a first attempt to offer something useful to our users.
The user needs to decide themselves what the appropriate failure behavior is for the spec they provide to the operator, whether duplicate data might be created by retrying etc.
The defined failure options should offer simple solutions for all scenarios:
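A purely illustrative sketch of what such a resource could look like. The kind, field names, and failure options are assumptions for discussion, not a finished design:

```yaml
apiVersion: druid.stackable.tech/v1alpha1
kind: DruidIngestionJob        # hypothetical kind
metadata:
  name: nytaxi-ingestion
spec:
  clusterRef: druid            # which DruidCluster to submit the spec to
  onFailure: Abort             # e.g. Abort | Retry (retrying may duplicate data)
  ingestionSpec: |             # verbatim JSON as produced by the Druid Web UI
    { "type": "index_parallel", ... }
```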
net2 crate has been deprecated; use socket2 instead
Details | |
---|---|
Status | unmaintained |
Package | net2 |
Version | 0.2.37 |
URL | deprecrated/net2-rs@3350e38 |
Date | 2020-05-01 |
The net2 crate has been deprecated and users are encouraged to consider socket2 instead.
See advisory page for additional details.
Originally posted by sbernauer April 5, 2022
In our Docaton we had to use something like the following, as we were not able to specify s3.pathStyleAccess: true in the CRD.
We need to add that attribute to avoid using custom Druid settings.
```yaml
apiVersion: druid.stackable.tech/v1alpha1
kind: DruidCluster
metadata:
  name: druid-nytaxidata
spec:
  version: 0.22.1
  zookeeperConfigMapName: simple-druid-znode
  metadataStorageDatabase:
    dbType: postgresql
    connString: jdbc:postgresql://postgresql-druid/druid
    host: postgresql-druid
    port: 5432
    user: druid
    password: druid
  s3:
    endpoint: http://minio:9000
    credentialsSecret: druid-s3-credentials
  deepStorage:
    storageType: s3
    bucket: nytaxidata
    baseKey: storage
  brokers:
    configOverrides:
      runtime.properties:
        druid.s3.enablePathStyleAccess: "true" ### <<< HERE
    roleGroups:
      default:
        selector:
          matchLabels:
            kubernetes.io/os: linux
        config: {}
        replicas: 1
```
I'd like to have a CustomResource/ConfigMap that I can use to define all my Datasources/Ingestions in Druid.
We decided to just take the verbatim JSON as it is generated by the Druid Web UI and have this in a CustomResource.
I thought about having just a ConfigMap instead of a CR, but that opens the question of how to reference the Druid cluster in question.
It could be the other way around: A "config map selector" in Druid that selects all ConfigMaps that should be added but that also doesn't seem very agile. Either way this needs to be authorized later.
Metrics can either be written to a log file or posted to an HTTP endpoint. To get that to work with Prometheus, we should use Druid Exporter, deploy it with druid and use that for metrics.
Once we run on docker that should be easy to package together so I'm blocking this issue on the docker issue.
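A hedged sketch of the wiring: Druid posts metrics to the exporter over HTTP, and Prometheus scrapes the exporter. The emitter properties are from the Druid docs; the exporter service name, port, and path are assumptions to check against the druid-exporter documentation:

```properties
# In runtime.properties of each Druid process:
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://druid-exporter:8080/druid
```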
I'd like v0.22.0 packaged, together with the Stackable startup script.
This is the directory structure that I'm working with right now:
```
druid-0.22.0/
└── apache-druid-0.22.0
    ├── bin
    ├── conf
    ├── lib
    ├── stackable/
    │   └── run-druid
    └── ...
```
The coordinator supports running the overlord itself, which is how we do it at the moment.
It might be useful to some to have it run in a separate process/pod.
Supposedly, separating them makes Druid more resilient.
This allows for more flexibility and means we don't have to release a new operator for a new upstream version.
This should only be done once updated with templating from stackabletech/operator-templating#55.
build.rs
The k8s healthcheck probes make HTTP requests which get blocked by Druid if Authorization is enabled.
We need a technical user to make these probe requests, in case authentication and authorization are enabled. Otherwise the endpoints cannot be queried.
Druid supports an authentication and authorization chain with multiple authenticators/authorizers. We can add a second mini-authenticator for just a single health-check user, or maybe reuse the existing basic-auth one on top of LDAP. We can then use this user to do our health checks. The user should be created automatically, with generated credentials.
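A hedged sketch of such a chain: a small basic-auth authenticator for the health-check user in front of LDAP. The names and the exact property set are assumptions to be checked against the druid-basic-security extension docs:

```properties
# Assumed chain: dedicated health-check authenticator first, then LDAP
druid.auth.authenticatorChain=["healthcheck", "ldap"]
druid.auth.authenticator.healthcheck.type=basic
druid.auth.authenticator.healthcheck.credentialsValidator.type=metadata
druid.auth.authenticator.healthcheck.authorizerName=healthcheck
# The operator would create the user and its generated credentials automatically
```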
failure is officially deprecated/unmaintained
Details | |
---|---|
Status | unmaintained |
Package | failure |
Version | 0.1.8 |
URL | rust-lang-deprecated/failure#347 |
Date | 2020-05-02 |
The failure crate is officially end-of-life: it has been marked as deprecated by the former maintainer, who has announced that there will be no updates or maintenance work on it going forward.
The following are some suggested actively developed alternatives to switch to:
See advisory page for additional details.
Please see the epic stackabletech/issues#129 for details
See Druid documentation
https://druid.apache.org/docs/latest/ingestion/native-batch.html#s3-input-source
Acceptance Criteria
Potential segfault in the time crate
Details | |
---|---|
Package | time |
Version | 0.1.44 |
URL | time-rs/time#293 |
Date | 2020-11-18 |
Patched versions | >=0.2.23 |
Unaffected versions | =0.2.0,=0.2.1,=0.2.2,=0.2.3,=0.2.4,=0.2.5,=0.2.6 |
Unix-like operating systems may segfault due to dereferencing a dangling pointer in specific circumstances. This requires an environment variable to be set in a different thread than the affected functions. This may occur without the user's knowledge, notably in a third-party library.
The affected functions from time 0.2.7 through 0.2.22 are:
time::UtcOffset::local_offset_at
time::UtcOffset::try_local_offset_at
time::UtcOffset::current_local_offset
time::UtcOffset::try_current_local_offset
time::OffsetDateTime::now_local
time::OffsetDateTime::try_now_local
The affected functions in time 0.1 (all versions) are:
at
at_utc
now
Non-Unix targets (including Windows and wasm) are unaffected.
Pending a proper fix, the internal method that determines the local offset has been modified to always return None on the affected operating systems. This has the effect of returning an Err on the try_* methods and UTC on the non-try_* methods.
Users and library authors with time in their dependency tree should perform cargo update, which will pull in the updated, unaffected code.
Users of time 0.1 do not have a patch and should upgrade to an unaffected version: time 0.2.23 or greater or the 0.3 series.
No workarounds are known.
See advisory page for additional details.
These Acceptance Criteria need to be met:
This depends on stackabletech/docker-images#6 which provides initial Docker images but might require further changes to the images.
This is the same as we did for ZooKeeper in stackabletech/zookeeper-operator#466 but with a new structure according to stackabletech/issues#293.
```yaml
apiVersion: druid.stackable.tech/v1alpha1
kind: DruidCluster
metadata:
  name: druid
spec:
  version: 24.0.0-stackable0.1.0
  commonConfig:
    tls:
      # client-server encryption (only server requires a trusted certificate)
      serverSecretClass: String # defaults to "tls"
      # server-server encryption
      internalSecretClass: String # defaults to "tls"
    # This should be a Vector. Can be a vector of Strings but preferably an extra struct containing at least a
    # String to reference the operator-rs AuthenticationClass (plus optional settings if required)
    authentication:
      # mTLS (client and server require a trusted certificate)
      - authenticationClass: druid-tls-authentication-class # String
    authorization:
      opa:
        configMapName: druid-opa
    # all other top level configuration should be under shared-/global-/cluster-config as well
    zookeeperConfigMapName: simple-druid-znode
    metadataStorageDatabase:
      dbType: postgresql
      connString: jdbc:postgresql://druid-postgresql/druid
      host: druid-postgresql
      port: 5432
      user: druid
      password: druid
    deepStorage: ...
```
This is done when
- the relevant operator-rs structs (SecretClass, AuthenticationClass) are used
- everything except version or image and stopped is moved to commonConfig
- commonConfig.authorization references the OPA config map (see next)
As an admin I'd like all requests to Druid resources optionally be authorized via OpenPolicyAgent (OPA).
To support this in Druid we need to implement the Authorizer interface.
I'm happy to split this up into multiple tickets instead of one big one.
I can help with that, or whoever picks it up is welcome to do so.
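An illustrative Rego sketch of such a policy. The package name and the input document shape (action, resource type, user groups) are assumptions about what the Druid Authorizer would forward to OPA, not a defined interface:

```rego
package druid.authz

default allow = false

# Analysts get READ access to datasources (input shape assumed)
allow {
    input.action == "READ"
    input.resource.type == "DATASOURCE"
    input.user.groups[_] == "analyst"
}
```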
There are currently three directories with multiple files that together (as far as I can tell) make up a single example.
I'd like each file in the examples folder to be useful on its own which means it should include everything that's needed including a comment at the top detailing what's happening in this file for a new user.
This makes it easy for people to get started by issuing a command like kubectl apply -f https://github.com/stackabletech/.....
I'm fine with keeping three directories but I'm also fine with getting rid of the directories.
But each example should contain the ZK cluster definition as well as a Znode and use said Znode configmap instead of the ZooKeeper wide configmap
Data race when sending and receiving after closing a oneshot channel
Details | |
---|---|
Package | tokio |
Version | 0.1.22 |
URL | tokio-rs/tokio#4225 |
Date | 2021-11-16 |
Patched versions | >=1.8.4, <1.9.0,>=1.13.1 |
Unaffected versions | <0.1.14 |
If a tokio::sync::oneshot channel is closed (via the oneshot::Receiver::close method), a data race may occur if the oneshot::Sender::send method is called while the corresponding oneshot::Receiver is awaited or calling try_recv.
When these methods are called concurrently on a closed channel, the two halves
of the channel can concurrently access a shared memory location, resulting in a
data race. This has been observed to cause memory corruption.
Note that the race only occurs when both halves of the channel are used after the Receiver half has called close. Code where close is not used, or where the Receiver is not awaited and try_recv is not called after calling close, is not affected.
See tokio#4225 for more details.
See advisory page for additional details.
The indexer is experimental, and the Druid distribution doesn't provide a nice working example to easily add it right away. So this is spun out into a separate issue now and deferred to later.
Indexer: https://druid.apache.org/docs/0.22.0/design/indexer.html
As a user of the Druid operator I want to refer to an existing S3 connection as specified in stackabletech/documentation#177
This is done when
See stackabletech/operator-rs#377 which is required for this
Part of this epic stackabletech/issues#241
- Update usage.adoc with product specific information and link to the common shared resources concept
Relevant part of the code: https://github.com/stackabletech/druid-operator/blob/129c5e9769f513c9a9f318392f6b508b2a1f2a81/rust/operator-binary/src/config.rs
The JVM settings are already in there; they should be factored out somewhere.
Documentation is missing for Monitoring.
Please see other operators (e.g. ZooKeeper) for the snippet to copy.
Potential segfault in localtime_r invocations
Details | |
---|---|
Package | chrono |
Version | 0.4.19 |
URL | chronotope/chrono#499 |
Date | 2020-11-10 |
Unix-like operating systems may segfault due to dereferencing a dangling pointer in specific circumstances. This requires an environment variable to be set in a different thread than the affected functions. This may occur without the user's knowledge, notably in a third-party library.
No workarounds are known.
See advisory page for additional details.
xml-rs is Unmaintained
Details | |
---|---|
Status | unmaintained |
Package | xml-rs |
Version | 0.8.4 |
URL | https://github.com/netvl/xml-rs/issues |
Date | 2022-01-26 |
xml-rs is an XML parser that has open issues around parsing, including integer overflows / panics that may or may not be an issue with untrusted data.
Together with its unmaintained status, xml-rs may or may not be suited to parse untrusted data.
See advisory page for additional details.
Implement initial Druid Operator for all Server-/Process Types (https://druid.apache.org/docs/latest/design/processes.html) (ACs: )
Acceptance Criteria
Operator can start/stop/restart a Druid Cluster
Druid configs can be applied and updated
Monitoring is integrated
all Process types are supported (Coordinator, Overlord, Broker, Historical, MiddleManager and Peons, Indexer (Optional), Router (Optional))
all Server types are supported (Master, Query, Data)
support Maturity Level 1 (Is there more todo than in AC 1?)
tbd
As a user of druid services I want to use the HDFS config-map to reference the HDFS endpoint for druid deep storage. Instead of:
```yaml
deepStorage:
  hdfs:
    configMapName: production
    storageDirectory: /data
```
I want to use the hdfs config map and the properties contained in the "hdfs-site.xml" key therein.
This is done when
- deepStorage is a complex enum that can be extended to support more options later

ansi_term is Unmaintained
Details | |
---|---|
Status | unmaintained |
Package | ansi_term |
Version | 0.12.1 |
URL | ogham/rust-ansi-term#72 |
Date | 2021-08-18 |
The maintainer has advised that this crate is deprecated and will not receive any maintenance.
The crate does not seem to have many dependencies and may or may not be OK to use as-is.
The last release seems to have been three years ago.
The below list has not been vetted in any way and may or may not contain alternatives.
See advisory page for additional details.
As a user I'd like to use my existing LDAP/AD credentials to log into Druid. This was already done in e.g. NiFi or Trino. This can be especially helpful for writing tests.
The LDAP support should be integrated in the structure from PR #6 which must be finished first.
```yaml
apiVersion: druid.stackable.tech/v1alpha1
kind: DruidCluster
metadata:
  name: druid
spec:
  version: 24.0.0-stackable0.1.0
  clusterConfig:
    tls:
      serverSecretClass: String # defaults to "tls"
      internalSecretClass: String # defaults to "tls"
    authentication:
      - authenticationClass: druid-tls-authentication-class # String
      - authenticationClass: druid-ldap-authentication-class # String
```
This is done when
This depends on the reference architecture developed in stackabletech/issues#170
Currently our operators will not act on information removed from the CR in some/most/all cases.
One example:
The HBase operator has three roles (master, regionServer, restServer). If I create an HBase CR with a restServer component and then remove it later (entirely, not setting replicas to 0), our operator will not clean up the STS that belongs to this role.
Use the ClusterResource struct from operator-rs to manage Kubernetes resources belonging to a cluster object. An example of its usage can be found in the Superset Operator: https://github.com/stackabletech/superset-operator/blob/main/rust/operator-binary/src/superset_controller.rs#L241
ZNode and S3Connection resources are not deleted because the operator cannot know if they are stale or not. This requires operator-rs (0.25 at the moment).
NOTE: This is part of an epic (stackabletech/issues#203) and might not apply to this operator. If that is the case please comment on this issue and just close it. This issue was created as part of a special bulk creation operation.
This issue provides visibility into Renovate updates and their statuses.
Authorize access to Druid by simple OPA RegoRules
Acceptance Criteria
As an analyst I want to log in to Druid having READ or WRITE access
https://druid.apache.org/docs/latest/operations/security-overview.html
https://druid.apache.org/docs/latest/operations/security-user-auth.html
Acceptance Criteria: