banzaicloud / thanos-operator
Kubernetes operator for deploying Thanos
License: Apache License 2.0
Can we get a new Docker tag/version that incorporates the Query Frontend?
Thanks!
Is your feature request related to a problem? Please describe.
Describe the solution you'd like to see
Describe alternatives you've considered
Additional context
Describe the bug
Operated resources are created without a serviceAccount specified, causing them to use the namespace's default service account. In many environments with restrictive pod security policies, service accounts are created with the least privilege necessary to instantiate resources.
Steps to reproduce the issue:
Create any object-kind in thanos-operator that prompts the generation of a deployment, and observe that the resulting resource runs under the 'default' service account instead of the service account installed with the helm chart.
Expected behavior
The thanos-operator would utilize the service account generated by the helm chart, or have the ability to specify the service account to be used when creating operated resources.
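A hypothetical override, assuming the operator exposed the pod-level serviceAccountName through its workload overrides (field names are illustrative, not confirmed):

```yaml
spec:
  storeGateway:
    workloadOverrides:
      # hypothetical field: reuse the SA installed by the helm chart
      serviceAccountName: thanos-operator
```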
Screenshots
ns/monitoring pod/thanos-operator-6cf7b55df6-jjv6v sa/thanos-operator psp/readwritefs state/Running
ns/monitoring pod/thanos-objstore-bucket-546478d96c-xqzbq sa/default psp/restricted state/PendingCreateContainerConfigError
ns/monitoring pod/thanos-objstore-compactor-5ffd7b764b-9tjjt sa/default psp/restricted state/PendingCreateContainerConfigError
Additional context
Utilizing helm-chart version 0.1.0 / operator version banzaicloud/thanos-operator:0.1.0
Currently the manager does not produce any messages on errors or resource creation.
It would be very nice to have something like audit logs of the manager's ongoing processes/errors, plus the ability to configure verbosity.
Is your feature request related to a problem? Please describe.
The thanos-operator is distributed by helm chart. While the operator's Deployment is simple enough to recreate, there are lengthy, relatively non-customizable CRD and ClusterRole declarations in this repository that helm non-users have to keep up with.
Describe the solution you'd like to see
What do you think of publishing a jsonnet library for the operator's resources, similar to https://github.com/prometheus-operator/kube-prometheus?
This would allow the operator to be easily consumed by jsonnet-based deployment systems, such as https://tanka.dev.
Describe alternatives you've considered
Additional context
If you're interested in this idea, I'm happy to try to PR it.
Describe the bug
The generated name seems to tack on the remote name:
Warning FailedCreate 14m (x730 over 8d) statefulset-controller create Pod thanos-sausw2-perf-compute-remote-sausw2-perf-compute-rule-0 in StatefulSet thanos-sausw2-perf-compute-remote-sausw2-perf-compute-rule failed error: Pod "thanos-sausw2-perf-compute-remote-sausw2-perf-compute-rule-0" is invalid: metadata.labels: Invalid value: "thanos-sausw2-perf-compute-remote-sausw2-perf-compute-rule-768cbbd9b8": must be no more than 63 characters
This makes it too long. It seems you're combining these three labels to form the name:
Labels:
app.kubernetes.io/managed-by=thanos-sausw2-perf-general
app.kubernetes.io/name=rule
monitoring.banzaicloud.io/storeendpoint=remote-sausw2-perf-general
Steps to reproduce the issue:
Normal Thanos Operator deployment
Expected behavior
No repeated segments making the name unnecessarily long; make the metadata label name generation more graceful.
When you manually change or delete resources created by the operator, nothing happens.
It would be very cool to at least have resources recreated based on CRD state; even better would be for the operator to overwrite any manual changes.
The Helm chart for the operator creates a ClusterRole and ClusterRoleBinding that give very broad cluster-wide access, including the ability to read all Secrets, manipulate all Deployments, and so on. This is upsetting our security folks, who want to decrease attack surfaces wherever possible using the principle of least privilege. I wonder if all this access is really needed, or whether we could at least get away with a regular Role and RoleBinding inside the Helm release namespace (when the operator only manipulates Thanos components in that namespace), or be able to specify the namespace(s) in which the operator is allowed to act.
I do not have an in-depth understanding of the exact K8s permissions the Thanos operator needs for all its actions, but I think it should be possible to limit it to managing workloads in Thanos-related namespaces.
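A hypothetical namespace-scoped sketch of what this might look like (the rules are abridged and assumed, not the operator's actual requirements):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: thanos-operator
  namespace: monitoring  # the Helm release namespace
rules:
  # workloads the operator manages (assumed subset)
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  # supporting resources (assumed subset)
  - apiGroups: [""]
    resources: ["services", "configmaps", "secrets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```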
Is your feature request related to a problem? Please describe.
When you set up many store-gateways in one cluster, you are not able to set custom nodeSelectors/resource quotas for a particular store (like we can for the compactor, for example).
Describe the solution you'd like to see
Add nodeSelector and resources fields to the StoreEndpoint spec, like we have for the compactor and ObjectStore.
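A hypothetical shape for the proposed fields (names assumed by analogy with the compactor):

```yaml
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: StoreEndpoint
metadata:
  name: store-a
spec:
  thanos: thanos
  # proposed fields (hypothetical):
  nodeSelector:
    disktype: ssd
  resources:
    requests:
      memory: 2Gi
    limits:
      memory: 4Gi
```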
Describe alternatives you've considered
Manually editing stores after the operator has created them, but for a large number of stores this is not feasible.
Additional context
I can provide a pull request, if you're OK with it.
Is your feature request related to a problem? Please describe.
I want to be able to annotate the ingress to let cert-manager and nginx handle the certs etc...
Describe the solution you'd like to see
I'd love to have the ability to do:
HTTPIngress:
  host: query.example.com
  path: /
  annotations:
    "kubernetes.io/ingress.class": nginx
    "cert-manager.io/cluster-issuer": "letsencrypt-prod"
    "kubernetes.io/tls-acme": "true"
Is your feature request related to a problem? Please describe.
To handle very large TSDB buckets, allow sharding with hashmod
Describe the solution you'd like to see
Specifying a replica count for stores would automatically generate a bucket relabel config that splits blocks with hashmod:
spec:
  containers:
    - args:
        - store
        - |
          --selector.relabel-config=
          - action: hashmod
            source_labels: ["__block_id"]
            target_label: shard
            modulus: 3
          - action: keep
            source_labels: ["shard"]
            regex: 0
Describe alternatives you've considered
Label or time range sharding is also an option, but IMO this setup would be easier to scale.
Additional context
For example, we have bucket storage with 10k individual TSDB blocks coming from a dozen Prometheus instances. Rather than trying to manually shard based on external labels, we can divide the work reasonably evenly among a number of replicas based on a consistent hash of the block ID.
See the Relabelling docs: https://thanos.io/tip/thanos/sharding.md/#relabelling
Is your feature request related to a problem? Please describe.
Since the image repository is hardcoded here, we cannot deploy this on an air-gapped cluster:
https://github.com/banzaicloud/thanos-operator/blob/master/pkg/sdk/api/v1alpha1/thanos_types.go#L23
Describe the solution you'd like to see
Make the image path (and maybe the image tag) configurable.
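A hypothetical way this could surface in the CR (field names and registry are illustrative, not the operator's actual API):

```yaml
spec:
  query:
    # hypothetical fields: point at a private mirror for air-gapped clusters
    image:
      repository: registry.example.internal/thanos/thanos
      tag: v0.15.0
```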
Describe alternatives you've considered
There is no real alternative for this.
Additional context
Describe the bug
The controller will assume that any ServiceMonitor resource whose name matches the Thanos resource is owned by the controller. This causes a ServiceMonitor with the name foo-compactor to be deleted if there is a Thanos resource in the same namespace with the name foo.
Steps to reproduce the issue:
Create a Thanos resource with the name foo with a compactor configuration. The important thing is that it does not have any monitor configuration for the compactor. Create a ServiceMonitor in the same namespace with the name foo-compactor. Wait until the controller reconciles the Thanos resource and deletes the ServiceMonitor.
Expected behavior
The service monitor created outside of the controller should never be modified or deleted by the controller as it is not the owner of the resource.
Screenshots
N/A
Additional context
I think the reason this occurs is this logic: it assumes that if there is no service monitor configuration, it should delete any service monitor with the given name, even if the operator did not create it.
thanos-operator/pkg/resources/compactor/servicemonitor.go
Lines 53 to 56 in 1bd4c3c
Describe the bug
So I'm following this: https://banzaicloud.com/docs/one-eye/thanos-operator/quickstarts/multicluster/ on a multi-namespace cluster
I would expect to see on the peer side 3 pods (or at least 3 containers):
Query, Rule and StoreAPI gateway to collect metrics from object storage
However, only one pod with one container is set up, for the query deployment.
Using this with the Prometheus operator and Prometheus sidecars, the data is indeed uploaded to object storage, but I can only query the data of the Prometheus (that is shipped from the sidecar).
Steps to reproduce the issue:
Really just copy these YAMLs https://banzaicloud.com/docs/one-eye/thanos-operator/quickstarts/multicluster/
Expected behavior
Have a complete set-up of Query, Ruler and StoreAPI gateway
Is your feature request related to a problem? Please describe.
When setting a custom data store in thanos.spec.storeGateway.containerOverrides.volumeMounts, the objectstore configmap gets overwritten.
This kind of destroys the usability of having ObjectStore.
Describe the solution you'd like to see
Add a bool merge under thanos.spec.storeGateway.containerOverrides.volumeMounts, or make merge/patch the default.
Example
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: Thanos
metadata:
  name: thanos-sample
spec:
  queryDiscovery: true
  query: {}
  storeGateway:
    containerOverrides:
      volumeMounts:
        - name: thanos-data
          mountPath: ./data
          merge: true
        - mountPath: /etc/config/
          name: objectstore-secret
          readOnly: true
    workloadOverrides:
      volumes:
        - name: thanos-data
          emptyDir: {}
        - name: objectstore-secret
          secret:
            defaultMode: 420
            secretName: thanos-objstore-config
Describe alternatives you've considered
Either make merge/patch the default, or make it opt-in or opt-out.
Additional context
I need to define a location to store my volume data, because my PSP rules say I'm not allowed to write data to the root dir.
Describe the bug
When I configure a StoreEndpoint like this:
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: StoreEndpoint
metadata:
  name: storeendpoint
spec:
  thanos: thanos
  url: https://url.to.my.external.querier:443
  config:
    mountFrom:
      secretKeyRef:
        name: thanos-storage-config
        key: config
I get this error:
fetching store info from https://url.to.my.external.querier:443: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = "transport: Error while dialing dial tcp: address https://url.to.my.external.querier:443: too many colons in address"
Steps to reproduce the issue:
N/A
Expected behavior
I should have my querier correctly connected
Screenshots
N/A
Additional context
Installed from the helm chart
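A likely user-side workaround, assuming the operator passes url verbatim to Thanos's --store flag (which expects host:port, not a URL with a scheme):

```yaml
spec:
  thanos: thanos
  url: url.to.my.external.querier:443  # no https:// scheme
```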
BaseTypes support common overwritable attributes: https://github.com/banzaicloud/operator-tools/blob/master/pkg/types/base_types.go
Describe the bug
If you try to set up the following config:
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: Thanos
metadata:
  name: thanos
spec:
  clusterDomain: platform-rke-test-env
  enableRecreateWorkloadOnImmutableFieldChange: true
  query:
    metaOverrides:
      replicas: "3"
you still get only one query pod.
Steps to reproduce the issue:
Set:
query:
  metaOverrides:
    replicas: "3"
Expected behavior
You should be able to control the number of instances for query and queryFrontend.
Additional context
With query there is some strange behaviour: every time I deploy query, even without
query:
  metaOverrides:
    replicas: "3"
I get two instances of query until one of them passes the readiness probe, at which point the second one is marked as "removing".
And if I set replicas to 2 or 3, I get three instances of query on startup, but then two of them are removed.
The same thing happens with queryFrontend.
Describe the bug
In the latest release (0.3.3) the option to create grafana datasources for a query was added. This also introduced a nil pointer error when a Thanos instance is created without a query configured.
Steps to reproduce the issue:
Create a Thanos object without a query object configured. Wait for the controller to reconcile the Thanos object and panic due to invalid memory reference.
Expected behavior
Thanos objects without a Query object should be able to be reconciled without the operator crashing.
Screenshots
Additional context
The problem is caused by this code not checking whether the query object is nil.
Describe the bug
Thanos operator is used for deploying the Thanos components (store gateway, query, query frontend).
The Thanos store container fills up the ephemeral disk, which is only 50GB.
I tried configuring a PVC for the store container, but the operator does not accept the additional PVC volume and gives the error below.
thanos> Reconciler error failed to reconcile resource: failed to create resource: creating resource failed: Deployment.apps "sandbox-us-west-2-thanos-cluster-store" is invalid: spec.template.spec.volumes[0].persistentVolumeClaim: Forbidden: may not specify more than 1 volume type (name: sandbox-us-west-2-thanos-cluster-store, namespace: monitoring, apiVersion: apps/v1, kind: Deployment, name: sandbox-us-west-2-thanos-cluster-store, namespace: monitoring, apiVersion: apps/v1, kind: Deployment) name: sandbox-us-west-2, namespace: monitoring
Here is my Thanos store gateway config input given to Thanos CR.
storeGateway:
  indexCacheSize: 250MB
  deploymentOverrides:
    spec:
      template:
        spec:
          containers:
This issue does not happen in other clusters where the ephemeral disk is 100GB. I tried setting indexCacheSize to 10GB, and it still does not take effect.
Has anyone encountered this issue?
How do we configure the Thanos store gateway to use a PVC, or configure the object download limit on the ephemeral disk?
Thanos Operator details:
Thanos Version: thanos:v0.19.0-rc.0
Thanos Operator Version: 0.3.3
Object Storage Provider: S3
Describe the bug
When I try to set the compactor retention, for example to 30d, the following error occurs:
E1028 11:06:51.065884 807 reflector.go:178] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:125: Failed to list *v1alpha1.ObjectStore: v1alpha1.ObjectStoreList.Items: []v1alpha1.ObjectStore: v1alpha1.ObjectStore.Spec: v1alpha1.ObjectStoreSpec.Compactor: v1alpha1.Compactor.RetentionResolution5m: RetentionResolution1h: unmarshalerDecoder: time: unknown unit d in duration 30d, error found in #10 byte of ...|n1h":"30d","retentio|..., bigger context ...|or-iam-eks-sbx-eu"}},"retentionResolution1h":"30d","retentionResolution5m":"30d","retentionResolutio|... 28/10/2020 13:07:39
Steps to reproduce the issue:
Set any of retentionResolution1h/retentionResolution5m/retentionResolutionraw to 30d.
Expected behavior
It should accept any unit (hours, days, years).
Additional context
It works with hours (h).
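The fields are likely parsed with Go's time.ParseDuration, which supports h but not d; a workaround sketch expressing days in hours (30d = 720h; field capitalization assumed from the issue):

```yaml
compactor:
  # workaround: Go duration strings accept "h" but reject "d"
  retentionResolutionRaw: "720h"
  retentionResolution5m: "720h"
  retentionResolution1h: "720h"
```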
Is your feature request related to a problem? Please describe.
Thanos compactor uses close to no resources except when actually compacting data. However, AFAIK, with thanos-operator as it stands we have to deploy the compactor as a Deployment, and hence set CPU/RAM requests/limits sized for when the compactor is doing work, which can be quite a waste of resources.
Describe the solution you'd like to see
Allow the ability to deploy Thanos compactor as a CronJob instead of a Deployment.
Describe alternatives you've considered
Wasting resources when the compactor is not actually compacting anything.
Additional context
None.
Is your feature request related to a problem? Please describe.
Currently we are trying to deploy the compactor to a specific set of nodes with extra disk space. We tried adding workloadOverrides with nodeSelector and tolerations, but they do not seem to be picked up. It's possible we are just using the overrides incorrectly, since the documentation on how to use them is limited.
compactor:
  workloadOverrides:
    tolerations:
      - effect: NoSchedule
        key: dedicated
        value: prometheus
    nodeSelector:
      dedicated: prometheus
Describe the solution you'd like to see
Either have specific fields to add these to the components (compactor and others) or allow the overrides to pull them in.
Describe alternatives you've considered
Patching the deployment after the operator creates it.
Describe the bug
The config to be able to override container configuration for the different parts of thanos is missing.
Steps to reproduce the issue:
Install 0.1.1 helm chart
helm upgrade -i thanos-operator --namespace monitor banzaicloud-stable/thanos-operator --version 0.1.1
Apply a Thanos CR that looks something like this:
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: Thanos
metadata:
  name: thanos-sample
spec:
  queryDiscovery: true
  query: {}
  rule:
    containerOverrides:
      volumeMounts:
        - name: thanos-data
          mountPath: ./data
        - mountPath: /etc/config/
          name: objectstore-secret
          readOnly: true
    workloadOverrides:
      volumes:
        - name: thanos-data
          emptyDir: {}
        - name: objectstore-secret
          secret:
            defaultMode: 420
            secretName: thanos-objstore-config
  storeGateway:
    containerOverrides:
      volumeMounts:
        - name: thanos-data
          mountPath: ./data
        - mountPath: /etc/config/
          name: objectstore-secret
          readOnly: true
    workloadOverrides:
      volumes:
        - name: thanos-data
          emptyDir: {}
        - name: objectstore-secret
          secret:
            defaultMode: 420
            secretName: thanos-objstore-config
Upgrade to helm chart 0.2.1
helm upgrade thanos-operator --namespace monitor banzaicloud-stable/thanos-operator --version 0.2.1
k apply -f https://raw.githubusercontent.com/banzaicloud/thanos-operator/chart/thanos-operator/0.2.1/charts/thanos-operator/crds/monitoring.banzaicloud.io_objectstores.yaml
k apply -f https://raw.githubusercontent.com/banzaicloud/thanos-operator/chart/thanos-operator/0.2.1/charts/thanos-operator/crds/monitoring.banzaicloud.io_receivers.yaml
k apply -f https://raw.githubusercontent.com/banzaicloud/thanos-operator/chart/thanos-operator/0.2.1/charts/thanos-operator/crds/monitoring.banzaicloud.io_storeendpoints.yaml
k apply -f https://raw.githubusercontent.com/banzaicloud/thanos-operator/chart/thanos-operator/0.2.1/charts/thanos-operator/crds/monitoring.banzaicloud.io_thanos.yaml
k apply -f https://raw.githubusercontent.com/banzaicloud/thanos-operator/chart/thanos-operator/0.2.1/charts/thanos-operator/crds/monitoring.banzaicloud.io_thanosendpoints.yaml
k apply -f https://raw.githubusercontent.com/banzaicloud/thanos-operator/chart/thanos-operator/0.2.1/charts/thanos-operator/crds/monitoring.banzaicloud.io_thanospeers.yaml
Expected behavior
Everything keeps on working like in 0.1.1
Screenshots
Additional context
Instead I get the following CrashLoopBackOff, because the pod is not able to write to disk due to a mutating webhook that applies:
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - NET_RAW
  readOnlyRootFilesystem: true
Because of this I get a similar error in both statefulsets.apps thanos-sample-storeendpoint-receiver-rule and deployment thanos-sample-storeendpoint-receiver-store:
➜ k logs thanos-sample-storeendpoint-receiver-store-868b54d476-6qznc
level=info ts=2021-04-13T08:31:15.0960554Z caller=main.go:152 msg="Tracing will be disabled"
level=info ts=2021-04-13T08:31:15.0962121Z caller=factory.go:46 msg="loading bucket configuration"
level=info ts=2021-04-13T08:31:15.1869422Z caller=inmemory.go:172 msg="created in-memory index cache" maxItemSizeBytes=131072000 maxSizeBytes=262144000 maxItems=maxInt
level=error ts=2021-04-13T08:31:15.1872223Z caller=main.go:186 err="mkdir data: read-only file system\nmeta fetcher\nmain.runStore\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/store.go:280\nmain.registerStore.func1\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/store.go:119\nmain.main\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/main.go:184\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373\npreparing store command failed\nmain.main\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/main.go:186\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373"
Describe the bug
When trying to set:
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: Thanos
metadata:
  name: thanos
  namespace: monitoring
spec:
  query:
    queryTimeout: 120s
    storeResponseTimeout: 120s
I get the following error from the thanos-query pod:
Error parsing commandline arguments: not a valid duration string: "&Duration{Duration:2m0s,}"
It seems the string value passed to thanos query is not right?
Steps to reproduce the issue:
Apply the above yaml for Thanos.
Expected behavior
Expected the query and store response timeouts to be set to 120s.
Screenshots
N/A
Additional context
Please let me know if you need additional information!
Describe the bug
No store endpoints registered in query after single cluster setup on version 0.1.1
Steps to reproduce the issue:
sample Configuration from here https://banzaicloud.com/blog/thanos-operator/#single-cluster-deployment
some secret here
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: ObjectStore
metadata:
  name: objectstore-sample
spec:
  config:
    mountFrom:
      secretKeyRef:
        name: some-secret
        key: object-store.yaml
  compactor: {}
---
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: StoreEndpoint
metadata:
  name: storeendpoint-sample
spec:
  thanos: thanos-sample
  config:
    mountFrom:
      secretKeyRef:
        name: some-secret
        key: object-store.yaml
  selector: {}
---
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: Thanos
metadata:
  name: thanos-sample
spec:
  queryDiscovery: true
  clusterDomain: some-custom-name
  query: {}
  queryFrontend: {}
  storeGateway: {}
Expected behavior
After the store-gateway is created, it should automatically appear in the query's "stores".
Describe the bug
I followed this blog post to setup the thanos observer/observee clusters.
When I apply following resource it creates wrong discovery domains:
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: Thanos
metadata:
  name: query-master
spec:
  query: {}
  queryDiscovery: true
Created result:
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - args:
            - query
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
            - --log.level=info
            - --store=dnssrvnoa+_grpc._tcp.thanos-apps-query.monitoring.cluster.local.cluster.local
            - --store=dnssrvnoa+_grpc._tcp.thanos-monitoring-query.monitoring.cluster.local.cluster.local
            - --store=dnssrvnoa+_grpc._tcp.thanos-data-query.monitoring.cluster.local.cluster.local
          image: quay.io/thanos/thanos:v0.15.0
Note the duplicated suffix: cluster.local.cluster.local
Steps to reproduce the issue:
helm install thanos-operator --namespace monitor banzaicloud-stable/thanos-operator --set manageCrds=false
Installed chart version: 0.1.1
kubectl get deploy thanos-query-master-query -o yaml
observe the wrong discovery domains
Expected behavior
This part:
- --store=dnssrvnoa+_grpc._tcp.thanos-apps-query.monitoring.cluster.local.cluster.local
- --store=dnssrvnoa+_grpc._tcp.thanos-monitoring-query.monitoring.cluster.local.cluster.local
- --store=dnssrvnoa+_grpc._tcp.thanos-data-query.monitoring.cluster.local.cluster.local
Should looks like this:
- --store=dnssrvnoa+_grpc._tcp.thanos-apps-query.monitoring.cluster.local
- --store=dnssrvnoa+_grpc._tcp.thanos-monitoring-query.monitoring.cluster.local
- --store=dnssrvnoa+_grpc._tcp.thanos-data-query.monitoring.cluster.local
Describe the bug
When I try to set up the following configuration:
queryFrontend:
  queryRangeMaxRetriesPerRequest: 2
  queryRangeMaxQueryParallelism: 14
  queryRangeSplit: "6h"
  queryRangeResponseCacheMaxFreshness: "5m"
the operator does not add the following args to the deployment:
--query-range.max-retries-per-request
--query-range.max-query-parallelism
Steps to reproduce the issue:
Add the following to the configuration:
queryRangeMaxRetriesPerRequest:
queryRangeMaxQueryParallelism:
Expected behavior
--query-range.max-retries-per-request
--query-range.max-query-parallelism
should be added to deployment args
Additional context
Maybe related to the field type, since other (string-based) options work just fine.
Is your feature request related to a problem? Please describe.
We'd like to set the following flags on the thanos store component:
--experimental.enable-index-cache-postings-compression
--selector.relabel-config
/ --selector.relabel-config-file
Describe the solution you'd like to see
We could add these options to Thanos.spec.storeGateway.
Describe alternatives you've considered
While no substitute for spec fields, #71 could be a nice escape hatch allowing users to set arbitrary thanos (store) flags before operator support is added.
Additional context
I'm happy to send PRs for this, but it'd be nice to get feedback on #71 first.
Is your feature request related to a problem? Please describe.
When using the prometheus operator, one can configure Prometheus to only look at ServiceMonitors that match a particular label. With the thanos-operator, the serviceMonitor fields are only booleans, and I do not see any way to add labels to them to match what prometheus operator would expect.
Describe the solution you'd like to see
A new field serviceMonitorLabels, or something to that effect, that would add additional labels to the ServiceMonitor resource.
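A hypothetical shape for the proposed field (the nesting is assumed; the current boolean field's location may differ):

```yaml
spec:
  query:
    metrics:
      serviceMonitor: true
      # hypothetical field from this proposal:
      serviceMonitorLabels:
        release: prometheus
```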
Describe alternatives you've considered
Additional context
Describe the bug
Setting Thanos.spec.storeGateway.timeRanges, as documented in https://github.com/banzaicloud/thanos-operator/blob/master/docs/types/thanos_types.md, has no effect.
Steps to reproduce the issue:
Upload the following manifests:
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: Thanos
metadata:
  name: thanos
spec:
  storeGateway:
    timeRanges:
      - maxTime: -24h
---
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: StoreEndpoint
metadata:
  name: bucket
spec:
  config:
    mountFrom:
      secretKeyRef:
        key: EXAMPLE
        name: EXAMPLE
  thanos: thanos
Observe that the resultant store pods do not have a --max-time flag
Expected behavior
I'd expect the resultant store pods' Pod.spec.containers.args to include a --max-time=-24h argument.
Screenshots
Additional context
It's possible I've misunderstood the operator code, but it looks like this option does nothing, looking at this setArgs function. We either need to set the flags with reflection on a thanos struct tag, if this is not too awkward to do with the nested struct, or write code to handle this field.
Is your feature request related to a problem? Please describe.
The Thanos operator is spawning bucket web with the wrong args for Thanos 0.13+:
containers:
  - args:
      - bucket
      - web
      - --log.level=info
      - --http-address=0.0.0.0:10902
      - --objstore.config-file=/etc/config/object-store.yaml
      - --refresh=1800s
      - --timeout=300s
    image: quay.io/thanos/thanos:v0.13.0
    imagePullPolicy: Always
    name: bucket
    ports:
      - containerPort: 10902
        name: http
        protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
Describe the solution you'd like to see
Change the deployment arguments to the correct form, i.e. tools bucket web instead of the current bucket web.
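A sketch of the corrected args, since Thanos 0.13 moved the bucket commands under the tools subcommand:

```yaml
containers:
  - args:
      - tools
      - bucket
      - web
      - --log.level=info
      - --http-address=0.0.0.0:10902
      - --objstore.config-file=/etc/config/object-store.yaml
```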
Additional context
I got a PR coming up after I have tested it.
Is your feature request related to a problem? Please describe.
We use memcached as an external cache for thanos-store, for both the index and chunk caches. See --index-cache.config and --store.caching-bucket.config in https://thanos.io/tip/components/store.md.
Describe the solution you'd like to see
Ideally, there would be configurable fields under Thanos.spec.storeGateway or StoreEndpoint.spec that translate into the YAML snippets that Thanos uses as external cache configs. These could be passed as string literals into flags, or mounted as files from ConfigMaps or Secrets.
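For reference, this is the kind of snippet Thanos expects for a memcached index cache (the address is illustrative):

```yaml
type: MEMCACHED
config:
  addresses:
    - memcached.monitoring.svc.cluster.local:11211  # illustrative service address
  max_idle_connections: 100
```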
Describe alternatives you've considered
Alternatively / in addition, we could add an "extra args" setting to allow users to pass arbitrary flags to thanos store (and other thanos components). It looks like this could either be done in this repo or by adding an "Args" field to ContainerBase in https://github.com/banzaicloud/operator-tools.
Additional context
I'm happy to send a PR, and I'm of course interested in what the maintainers think about the problem, and its proposed solutions.
Describe the bug
Query Frontend supports the --query-frontend.log-queries-longer-than flag to log queries running longer than some duration.
The flag is wrongly hardcoded in the thanos operator: an underscore (_) is in place of a hyphen (-).
https://github.com/banzaicloud/thanos-operator/blob/master/pkg/sdk/api/v1alpha1/thanos_types.go#L141
Expected behavior
--query-frontend.log-queries-longer-than is the flag expected by the Thanos query frontend.
Describe the bug
Resources for receiverGroups are currently not removed when a receiverGroup is removed from the CR. This is because receiverGroups is an array, and some extra logic would be required to detect and delete resources for groups that are no longer present in the configuration.
Steps to reproduce the issue:
Create a receiver with two groups, like in the example:
config/samples/monitoring_v1alpha1_receiver.yaml
Expected behavior
The resources that belong to the group are removed.
Additional context
A potential solution would be to use the component reconciler from operator-tools with the receiver as the parent object.
Describe the solution you'd like to see
Thanos store supports memcached for index caches. It would be useful to manage the memcached instance via the operator.
Describe alternatives you've considered
Running a separate memcached service.
Describe the bug
I see version 0.1.1 of this operator is tagged and available on Dockerhub. The helm chart currently pins the operator's image tag to the Chart.yaml's appVersion, and this can't be overridden.
Would it be possible to release a helm chart for v0.1.1?
Steps to reproduce the issue:
Expected behavior
Screenshots
Additional context
I'd like to try the new query frontend support, which from the dates of various releases looks like it might be available in v0.1.1.
We're currently consuming the helm chart via tanka's helm chart ingestion feature, so I can jsonnet-patch the image tag in the meantime as a workaround.
Is your feature request related to a problem? Please describe.
I do not want to create an ingress for querier, as there's a local grafana instance that will access it in-cluster.
Describe the solution you'd like to see
A bool variable in the CRD spec that disables ingress creation.
Describe alternatives you've considered
N/A
Additional context
N/A
I see the thanos-operator supports the Thanos receiver. How do I deploy it?
#33
Thank you
Describe the bug
It looks to me like this commit downgrades the ingress to v1beta1, but does not undo the field name changes from this commit. This results in the following error when applying:
Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: ValidationError(Ingress.spec.rules[0].http.paths[0].backend): unknown field "service" in io.k8s.api.networking.v1beta1.IngressBackend
Steps to reproduce the issue:
I am applying the helm chart via Terraform:
resource "helm_release" "thanos-operator" {
  repository = "https://kubernetes-charts.banzaicloud.com"
  chart      = "thanos-operator"
  name       = "thanos"
  namespace  = "my-namespace"
  values     = [file("${path.module}/data/values.yaml")]
}
Setting ingress.enabled = true in values.yaml should trigger this bug.
Expected behavior
The ingress should be created.
Describe the bug
The stateless components query and query-frontend have hard-coded replica counts that can't be overridden in deploymentOverrides.
Steps to reproduce the issue:
Create a Thanos instance with spec.queryFrontend.deploymentOverrides.replicas set to 2, and observe that the resultant deployment created by the operator has only one replica.
You can see the hardcoding here: https://github.com/craigfurman/thanos-operator/blob/query-frontend-service-type-configurability/pkg/resources/query_frontend/deployment.go#L40
Expected behavior
Query or QueryFrontend deployments have replica counts equal to the value of the relevant deploymentOverrides.replicas.
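A sketch of the override that should be honoured (field path as described in this issue):

```yaml
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: Thanos
metadata:
  name: thanos
spec:
  queryFrontend:
    deploymentOverrides:
      replicas: 2  # currently ignored due to the hardcoded value
```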
Screenshots
Additional context
We are experimenting with Thanos and the Thanos operator in our AWS EKS environment.
We are having an issue with pod annotations for kube2iam; we're not really sure how to set them up using the Thanos config files.
We are using the following configuration, but the annotations do not appear on the pods and we can see access-denied errors in the logs (attached):
apiVersion: monitoring.banzaicloud.io/v1alpha1
kind: ObjectStore
metadata:
  name: objectstore-ice-2
spec:
  config:
    mountFrom:
      secretKeyRef:
        name: thanos
        key: object-store.yaml
  bucketWeb:
    label: cluster
  compactor:
    workloadMetaOverrides:
      annotations:
        iam.amazonaws.com/role: k8s-thanos-metrics
Is your feature request related to a problem? Please describe.
I want to be able to configure anti-affinity, for example so that pods are not collocated on the same nodes.
Describe the solution you'd like to see
Add affinity/antiAffinity to the supported types.
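A sketch of what this could look like, assuming the standard Kubernetes affinity type were exposed under the workload overrides (the field placement is hypothetical):

```yaml
query:
  workloadOverrides:
    # hypothetical field exposing core/v1 Affinity:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/name: query
            topologyKey: kubernetes.io/hostname
```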
Is your feature request related to a problem? Please describe.
Thanos 0.15.0 now includes a Query Frontend component. The operator should be able to deploy this.
Is your feature request related to a problem? Please describe.
Describe the solution you'd like to see
Support for receive components.
Describe alternatives you've considered
DIY thanos statefulset installs
Additional context