
stackabletech / commons-operator


Operator for common objects of the Stackable Data Platform

License: Other

Rust 36.31% Makefile 14.84% Smarty 3.29% Dockerfile 1.73% Python 18.58% Shell 3.13% Jinja 11.69% Starlark 2.66% Nix 7.76%

commons-operator's People

Contributors

adwk67, dependabot[bot], fhennig, lfrancke, maltesander, nicklarsennz, nightkr, razvan, renovate-bot, renovate[bot], sbernauer, soenkeliebau, stackable-bot


Forkers

brandwatchltd

commons-operator's Issues

Lower memory usage of commons-operator

Currently the memory required by the commons-operator scales with the number of Pods in a K8s cluster.
This can lead to more or less unbounded memory usage.

We need to investigate how we can limit this.
One way to limit it is to leverage the work that will be done in stackabletech/issues#188.
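One option, assumed here rather than settled by the issue, is to restrict the Pod watch with a label selector so that the API server only sends Pods that opt in to restarts; Pods without the label then never reach the controller's cache. The sketch below models only the equality-based selector semantics in plain Rust. The label key `example.stackable.tech/restarter-enabled` and the `PodMeta` type are hypothetical, and a real watch would send the selector to the API server instead of filtering client-side as done here.

```rust
use std::collections::BTreeMap;

/// Hypothetical opt-in label; the real key (if any) is not defined by the issue.
const OPT_IN_LABEL: &str = "example.stackable.tech/restarter-enabled";

/// Minimal stand-in for the Pod metadata a watcher would receive.
struct PodMeta {
    name: String,
    labels: BTreeMap<String, String>,
}

/// Equality-based label selector matching: every `key=value` pair in the
/// selector must be present on the object (pairs are ANDed, as in a
/// `labelSelector=k1=v1,k2=v2` query).
fn matches_selector(labels: &BTreeMap<String, String>, selector: &[(&str, &str)]) -> bool {
    selector
        .iter()
        .all(|(key, value)| labels.get(*key).map(String::as_str) == Some(*value))
}

/// Names of the Pods that would remain in the cache if the watch were
/// restricted to Pods carrying the opt-in label.
fn cached_pod_names(pods: &[PodMeta]) -> Vec<&str> {
    let selector = [(OPT_IN_LABEL, "true")];
    pods.iter()
        .filter(|p| matches_selector(&p.labels, &selector))
        .map(|p| p.name.as_str())
        .collect()
}
```

With this approach, cache size follows the number of opted-in Pods rather than every Pod in the cluster.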

RUSTSEC-2020-0071: Potential segfault in the time crate

Potential segfault in the time crate

Details
Package time
Version 0.1.43
URL time-rs/time#293
Date 2020-11-18
Patched versions >=0.2.23
Unaffected versions =0.2.0,=0.2.1,=0.2.2,=0.2.3,=0.2.4,=0.2.5,=0.2.6

Impact

Unix-like operating systems may segfault due to dereferencing a dangling pointer in specific circumstances. This requires an environment variable to be set in a different thread than the affected functions. This may occur without the user's knowledge, notably in a third-party library.

The affected functions from time 0.2.7 through 0.2.22 are:

  • time::UtcOffset::local_offset_at
  • time::UtcOffset::try_local_offset_at
  • time::UtcOffset::current_local_offset
  • time::UtcOffset::try_current_local_offset
  • time::OffsetDateTime::now_local
  • time::OffsetDateTime::try_now_local

The affected functions in time 0.1 (all versions) are:

  • at
  • at_utc
  • now

Non-Unix targets (including Windows and wasm) are unaffected.

Patches

Pending a proper fix, the internal method that determines the local offset has been modified to always return None on the affected operating systems. This has the effect of returning an Err on the try_* methods and UTC on the non-try_* methods.
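The fallback described above can be sketched as follows. The function names mirror the advisory, but these are local stand-ins that return an offset in seconds; they are not the `time` crate's API, which works with `UtcOffset`.

```rust
/// Error returned when the local UTC offset cannot be determined safely.
#[derive(Debug, PartialEq)]
struct IndeterminateOffset;

/// Stand-in for the patched internal helper: on affected Unix-like targets it
/// no longer queries the C library and always reports "unknown".
fn local_offset_seconds() -> Option<i32> {
    None
}

/// `try_*`-style call: the missing offset surfaces as an error.
fn try_current_local_offset() -> Result<i32, IndeterminateOffset> {
    local_offset_seconds().ok_or(IndeterminateOffset)
}

/// Non-`try_*` call: silently falls back to UTC (an offset of zero seconds).
fn current_local_offset() -> i32 {
    local_offset_seconds().unwrap_or(0)
}
```

Callers of the non-`try_*` variant therefore keep working but may report UTC times where local times were expected.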

Users and library authors with time in their dependency tree should run cargo update, which will pull in the updated, unaffected code.

Users of time 0.1 do not have a patch and should upgrade to an unaffected version: time 0.2.23 or greater or the 0.3 series.
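The affected ranges in this advisory can be written down as a small check. In practice `cargo audit` performs this lookup against `Cargo.lock`; the sketch below only spells out the ranges, and treating versions the advisory does not mention (such as 0.0.x) as unaffected is an assumption.

```rust
/// Parses a plain `major.minor.patch` version (no pre-release or build suffix).
fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// RUSTSEC-2020-0071 ranges: every 0.1.x release and 0.2.7 through 0.2.22
/// are affected; 0.2.0-0.2.6, 0.2.23 and later, and 0.3.x are not.
/// Returns None if the version string cannot be parsed.
fn time_is_affected(version: &str) -> Option<bool> {
    let affected = match parse_version(version)? {
        (0, 1, _) => true,
        (0, 2, patch) => (7..=22).contains(&patch),
        _ => false, // assumption for versions the advisory does not list
    };
    Some(affected)
}
```

For the version reported here, `time_is_affected("0.1.43")` yields `Some(true)`, which is why an upgrade rather than `cargo update` is required.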

Workarounds

No workarounds are known.

References

time-rs/time#293

See the advisory page for additional details.

The restart controller is not allowed to evict pods on GKE (and potentially other K8s flavors) due to missing permissions in the ClusterRole

Affected version

0.2.0

Current and expected behavior

Expected:
When a pod reaches the expiration date specified by the annotation '', the restart controller should restart it.

Current:
The pod is not restarted, and the restart controller logs the following error in its log:

Failed to reconcile object controller.name="pod.restarter.commons.stackable.tech" error=reconciler for object Pod.v1./simple-nifi-node-default-0.default failed error.sources=[failed to evict Pod, ApiError: pods "simple-nifi-node-default-0" is forbidden: User "system:serviceaccount:default:commons-operator-serviceaccount" cannot create resource "pods/eviction" in API group "" in the namespace "default": Forbidden (ErrorResponse { status: "Failure", message: "pods \"simple-nifi-node-default-0\" is forbidden: User \"system:serviceaccount:default:commons-operator-serviceaccount\" cannot create resource \"pods/eviction\" in API group \"\" in the namespace \"default\"", reason: "Forbidden", code: 403 }), pods "simple-nifi-node-default-0" is forbidden: User "system:serviceaccount:default:commons-operator-serviceaccount" cannot create resource "pods/eviction" in API group "" in the namespace "default": Forbidden]

Possible solution

The restart controller needs to be granted create permission on 'pods/eviction' objects.
To achieve this, the following rule should be added to the ClusterRole for this deployment:

  - apiGroups:
      - ""
    resources:
      - pods/eviction
    verbs:
      - create
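Why the existing ClusterRole is not enough can be illustrated with a simplified model of RBAC rule matching: a subresource such as `pods/eviction` is matched as the combined string, so a rule that grants verbs on `pods` alone does not cover evictions. The sketch ignores `resourceNames` and the `*/subresource` wildcard form, and the "before" rule in its test is hypothetical rather than copied from the operator's actual ClusterRole.

```rust
/// Simplified model of a Kubernetes RBAC PolicyRule.
struct PolicyRule {
    api_groups: Vec<&'static str>,
    resources: Vec<&'static str>,
    verbs: Vec<&'static str>,
}

/// A rule field matches if it lists the requested value or the `*` wildcard.
fn field_matches(values: &[&str], requested: &str) -> bool {
    values.iter().any(|v| *v == "*" || *v == requested)
}

/// Returns true if any rule grants `verb` on `resource` in `api_group`.
/// `resource` is the combined form, e.g. "pods/eviction" for a subresource,
/// so a rule that only lists "pods" does not match it.
fn allows(rules: &[PolicyRule], api_group: &str, resource: &str, verb: &str) -> bool {
    rules.iter().any(|r| {
        field_matches(&r.api_groups, api_group)
            && field_matches(&r.resources, resource)
            && field_matches(&r.verbs, verb)
    })
}
```

Adding the rule proposed above flips `allows(&rules, "", "pods/eviction", "create")` from `false` to `true`, which corresponds to the 403 in the log disappearing.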

Additional context

No response

Environment

No response

Would you like to work on fixing this bug?

yes

RUSTSEC-2020-0159: Potential segfault in `localtime_r` invocations

Potential segfault in localtime_r invocations

Details
Package chrono
Version 0.4.19
URL chronotope/chrono#499
Date 2020-11-10

Impact

Unix-like operating systems may segfault due to dereferencing a dangling pointer in specific circumstances. This requires an environment variable to be set in a different thread than the affected functions. This may occur without the user's knowledge, notably in a third-party library.

Workarounds

No workarounds are known.

References

See the advisory page for additional details.

StatefulSet restarter always restarts replica 0 immediately after initial rollout

Current behaviour

Newly created restarter-enabled StatefulSets have their replica 0 restarted after the initial rollout is complete.

Expected Behaviour

The initial rollout should be completed "normally" with no extra restarts.

Why does this happen?

There is a race condition between Kubernetes' StatefulSet controller creating the first replica Pod and commons' StatefulSet restart controller adding the restart trigger labels. If the restarter loses the race then the first replica is created without the metadata, triggering a restart once it is added.

What can we do about it?

Add a mutating webhook (see the spike) that adds the relevant metadata. The webhook must not replace the existing controller, since webhook delivery is not reliable.

However, a webhook requires extra infrastructure that we do not currently have, namely:

  1. We must implement the K8s webhook HTTPS API (https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/)
  2. We must generate a TLS certificate and write it into the MutatingWebhookConfiguration (MWC)
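For the first item, a mutating webhook answers an AdmissionReview with a JSON Patch (RFC 6902); in `admission.k8s.io/v1` the patch is sent base64-encoded with `patchType: JSONPatch`. Annotation keys contain `/`, which must be escaped as `~1` inside a JSON Pointer path (RFC 6901). The sketch below only builds the patch body; the function names are hypothetical, and it assumes the Pod template already has an `annotations` object, because an `add` into a missing parent object is rejected.

```rust
/// Escapes one JSON Pointer reference token (RFC 6901): `~` becomes `~0` and
/// `/` becomes `~1`. `~` must be escaped first so the `~1` is not re-escaped.
fn escape_pointer_token(token: &str) -> String {
    token.replace('~', "~0").replace('/', "~1")
}

/// Minimal JSON string literal encoder (quotes, backslashes, control chars).
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Builds a JSON Patch that adds each annotation to a StatefulSet's Pod template.
fn pod_template_annotation_patch(annotations: &[(&str, &str)]) -> String {
    let ops: Vec<String> = annotations
        .iter()
        .map(|(key, value)| {
            let path = format!(
                "/spec/template/metadata/annotations/{}",
                escape_pointer_token(key)
            );
            format!(
                r#"{{"op":"add","path":{},"value":{}}}"#,
                json_string(&path),
                json_string(value)
            )
        })
        .collect();
    format!("[{}]", ops.join(","))
}
```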

Definition of done

  • The webhook certificate is managed (provisioned and renewed) by the commons operator
  • The webhook should apply the same podTemplate annotations as the controller currently does
    • The initial STS rollout does not cause a restart (STS.metadata.generation should stay 1)
  • The controller must still apply the metadata if the webhook is disabled and/or fails (at the cost of still doing the extra restart in this case)
  • The webhook must fail open (failurePolicy: Ignore)
  • A kuttl test should verify the above (maybe minus the failurePolicy)
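The generation criterion above can be illustrated with a toy model, assuming (as the API server does for StatefulSets) that `metadata.generation` is incremented only when the spec changes. The types and functions are hypothetical. The model shows why injecting the annotations at admission keeps the generation at 1, while patching them in afterwards always bumps it, and rolls replica 0 again whenever that Pod already exists.

```rust
use std::collections::BTreeMap;

/// Toy model of the parts of a StatefulSet relevant to the race.
struct StatefulSet {
    generation: u64,
    template_annotations: BTreeMap<String, String>,
}

impl StatefulSet {
    /// Mimics the API server: the generation is bumped only when the spec
    /// (here, the Pod template annotations) actually changes.
    fn set_template_annotations(&mut self, desired: &BTreeMap<String, String>) {
        if &self.template_annotations != desired {
            self.template_annotations = desired.clone();
            self.generation += 1;
        }
    }
}

/// Webhook path: the annotations are injected before the object is persisted,
/// so the StatefulSet starts at generation 1 with its final template.
fn create_with_webhook(desired: &BTreeMap<String, String>) -> StatefulSet {
    StatefulSet { generation: 1, template_annotations: desired.clone() }
}

/// Controller-only path: the object is created without the annotations and
/// patched afterwards, which bumps the generation to 2.
fn create_then_reconcile(desired: &BTreeMap<String, String>) -> StatefulSet {
    let mut sts = StatefulSet { generation: 1, template_annotations: BTreeMap::new() };
    sts.set_template_annotations(desired);
    sts
}
```

Keeping the controller as a fallback is safe in this model: a later reconcile with unchanged annotations is a no-op and does not bump the generation.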
Original ticket

The StatefulSet of a Superset cluster is restarted immediately after its creation. This restart should not be necessary and should be prevented.
$ kubectl describe statefulset simple-superset-node-default
...
Events:
  Type    Reason            Age                    From                    Message
  ----    ------            ----                   ----                    -------
  Normal  SuccessfulDelete  3m31s                  statefulset-controller  delete Pod simple-superset-node-default-0 in StatefulSet simple-superset-node-default successful
  Normal  SuccessfulCreate  2m59s (x2 over 3m31s)  statefulset-controller  create Pod simple-superset-node-default-0 in StatefulSet simple-superset-node-default successful

After the restart, the Superset pods are annotated as follows:

annotations:
  configmap.restarter.stackable.tech/simple-superset-node-default: cf60300e-0c45-4ee2-b60c-de53b0084182/21998
  secret.restarter.stackable.tech/simple-superset-credentials: e0e4b781-46f9-44f4-80c8-a0876b91ed8b/16909

This could be an indication that the restart controller of the commons-operator is involved.
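The annotation shape shown above, `<kind>.restarter.stackable.tech/<name>: <uid>/<resourceVersion>`, is inferred from this output rather than taken from the operator's source; the value appears to be the referenced object's UID followed by its resourceVersion. A sketch of building such annotations from the referenced ConfigMaps and Secrets follows; `ObjectRef` and `restarter_annotations` are hypothetical names. Because the value embeds the resourceVersion, any change to a referenced object changes the Pod template and makes the StatefulSet roll its Pods.

```rust
use std::collections::BTreeMap;

/// A ConfigMap or Secret referenced by the Pod template (hypothetical type).
struct ObjectRef<'a> {
    name: &'a str,
    uid: &'a str,
    resource_version: &'a str,
}

/// Builds `<kind>.restarter.stackable.tech/<name>` -> `<uid>/<resourceVersion>`
/// annotations for all referenced ConfigMaps and Secrets.
fn restarter_annotations<'a>(
    configmaps: &'a [ObjectRef<'a>],
    secrets: &'a [ObjectRef<'a>],
) -> BTreeMap<String, String> {
    let mut out = BTreeMap::new();
    for (kind, refs) in [("configmap", configmaps), ("secret", secrets)] {
        for r in refs {
            out.insert(
                format!("{kind}.restarter.stackable.tech/{}", r.name),
                format!("{}/{}", r.uid, r.resource_version),
            );
        }
    }
    out
}
```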

The commons-operator is busy while the StatefulSet is restarted:

2022-09-16T10:02:50.228971Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:50.236144Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:50.239371Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="statefulset.restarter.commons.stackable.tech" object=StatefulSet.v1.apps/simple-superset-node-default.default
2022-09-16T10:02:50.255219Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="statefulset.restarter.commons.stackable.tech" object=StatefulSet.v1.apps/simple-superset-node-default.default
2022-09-16T10:02:50.258511Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:50.266146Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:50.274647Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="statefulset.restarter.commons.stackable.tech" object=StatefulSet.v1.apps/simple-superset-node-default.default
2022-09-16T10:02:51.433621Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:51.449009Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="statefulset.restarter.commons.stackable.tech" object=StatefulSet.v1.apps/simple-superset-node-default.default

Bump operator-rs to 0.27.1

Update operator-rs to 0.27.1.

  • operator-rs bumped
  • operator has no dependency on kube or serde_yaml (use stackabletech/operator-rs#450 instead)
  • Use Fragment wherever possible
  • The orphaned resources mechanism is used
  • Use the parser for K8s Quantity instead of parsing memory values manually

Product image selection will be tracked later on by stackabletech/issues#305, but should be fairly easy compared to the changes in this issue.

Add kuttl tests

Currently, when running ./scripts/run_tests.sh, beku gives an error:

No such file or directory: 'tests/test-definition.yaml'

We should add some relevant tests for the commons-operator. The restarter was mentioned as one candidate.

RUSTSEC-2021-0139: ansi_term is Unmaintained

ansi_term is Unmaintained

Details
Status unmaintained
Package ansi_term
Version 0.12.1
URL ogham/rust-ansi-term#72
Date 2021-08-18

The maintainer has advised that this crate is deprecated and will not
receive any maintenance.

The crate does not seem to have many dependencies and may or may not be OK to use as-is.

The last release appears to have been three years ago.

Possible Alternative(s)

The list below has not been vetted in any way and may or may not contain alternatives:

See the advisory page for additional details.

RUSTSEC-2024-0013: Memory corruption, denial of service, and arbitrary code execution in libgit2

Memory corruption, denial of service, and arbitrary code execution in libgit2

Details
Package libgit2-sys
Version 0.15.2+1.6.4
URL rust-lang/git2-rs#1017
Date 2024-02-06
Patched versions >=0.16.2

The libgit2 project fixed three security issues in the 1.7.2 release. These issues are:

  • The git_revparse_single function may enter an infinite loop on specially crafted input, potentially causing a denial of service. This function is exposed in the git2 crate via the Repository::revparse_single method.
  • The git_index_add function may cause heap corruption and possibly lead to arbitrary code execution. This function is exposed in the git2 crate via the Index::add method.
  • The smart transport negotiation may experience an out-of-bounds read when a remote server did not advertise capabilities.

The libgit2-sys crate bundles libgit2, or optionally links to a system libgit2 library. In either case, versions of the libgit2 library older than 1.7.2 are vulnerable. The 0.16.2 release of libgit2-sys bundles the fixed version 1.7.2 and requires a system libgit2 version of at least 1.7.2.

It is recommended that all users upgrade.

See the advisory page for additional details.

RUSTSEC-2022-0048: xml-rs is Unmaintained

xml-rs is Unmaintained

Details
Status unmaintained
Package xml-rs
Version 0.8.4
URL https://github.com/netvl/xml-rs/issues
Date 2022-01-26

xml-rs is an XML parser that has open issues around parsing, including integer
overflows and panics, which may or may not be a problem with untrusted data.

Given these open issues and its unmaintained status, xml-rs
may or may not be suitable for parsing untrusted data.

Alternatives

See the advisory page for additional details.

Annotate pods with node address

stackabletech/hdfs-operator#175 currently uses an init job that reads the `Node` object to get the external IP (and other operators, such as Kafka, will run into the same problem later on). This embeds some annoying policy into the product container (such as which address to prefer), and requires the product Pod to have excessive privileges (a ClusterRoleBinding is needed to access Node objects).

I'd like to propose that we add a controller to the commons operator that copies this information into annotations on the Pod object, which the product container can then read through a projected environment variable.

The existing Pod.status.hostIP field is not sufficient, because:

  • This field typically reports the internal IP rather than the external one
  • Depending on cluster policy we'd like to be able to use hostnames here in the future (such as for Kerberos)
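The address-selection policy that currently lives in the product container could move into the controller as an explicit preference list over the Node's `status.addresses`, whose types are `Hostname`, `InternalIP`, `ExternalIP`, `InternalDNS`, and `ExternalDNS`. The sketch below is a hypothetical helper for that choice only; writing the result into a Pod annotation is left out.

```rust
/// Address types a Kubernetes Node reports in `status.addresses`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum NodeAddressType {
    Hostname,
    InternalIP,
    ExternalIP,
    InternalDNS,
    ExternalDNS,
}

/// One entry of a Node's `status.addresses` (hypothetical mirror type).
struct NodeAddress {
    kind: NodeAddressType,
    address: String,
}

/// Returns the address whose type comes first in `preference`, or None if the
/// Node reports none of the preferred types.
fn preferred_address<'a>(
    addresses: &'a [NodeAddress],
    preference: &[NodeAddressType],
) -> Option<&'a str> {
    preference.iter().find_map(|wanted| {
        addresses
            .iter()
            .find(|a| a.kind == *wanted)
            .map(|a| a.address.as_str())
    })
}
```

A cluster-level setting could then choose between, for example, `[ExternalIP, InternalIP]` today and `[Hostname]` later (such as for Kerberos), without changing the product image.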
