
agent's People

Contributors

dependabot[bot], lfrancke, maltesander, nightkr, pipern, siegfriedweber, soenkeliebau

agent's Issues

Ability to update systemd unit files

Implement the ability to check whether a systemd unit file needs an update based on upstream (orchestrator) changes.
Change the unit file based on these changes.
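
One way to sketch the "needs an update" check is to compare the unit file on disk with the content rendered from the orchestrator's current state and rewrite only when they differ. The function names below are hypothetical and not taken from the agent's code base.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true if the unit file is missing or differs from `desired`.
/// (Hypothetical helper; the real agent may compare structured data instead.)
fn unit_needs_update(path: &Path, desired: &str) -> io::Result<bool> {
    match fs::read_to_string(path) {
        Ok(current) => Ok(current != desired),
        // A missing file always needs to be written.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(true),
        Err(e) => Err(e),
    }
}

/// Writes the unit file only if its content changed.
/// Returns true if the file was written.
fn write_unit_if_changed(path: &Path, desired: &str) -> io::Result<bool> {
    if unit_needs_update(path, desired)? {
        fs::write(path, desired)?;
        return Ok(true);
    }
    Ok(false)
}
```

The boolean return value could be used by the caller to decide whether systemd has to reload its configuration afterwards.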

Agent should check on startup if important directories are readable/writable

When the data directory is not writeable the Agent will exit with a weird panic from the bowels of Krustlet because the plugins dir cannot be created. To avoid this we should check if the data directory exists and is writeable so we can provide a nicer error.

The same might be nice for the log dir and maybe others? Not sure where it makes sense.
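
A minimal sketch of such a startup check, assuming it is acceptable to probe the directory by creating and removing a small marker file. The function name and the probe file name are made up for illustration.

```rust
use std::fs::{self, OpenOptions};
use std::path::Path;

/// Checks that `dir` exists, is a directory, and that a file can be created in it.
/// Returns a human-readable error message otherwise.
fn check_dir_writable(dir: &Path) -> Result<(), String> {
    let meta = fs::metadata(dir)
        .map_err(|e| format!("cannot access directory {}: {}", dir.display(), e))?;
    if !meta.is_dir() {
        return Err(format!("{} is not a directory", dir.display()));
    }
    // Probe by actually creating a file: permission bits alone can be
    // misleading (ACLs, read-only mounts).
    let probe = dir.join(format!(".agent-write-probe-{}", std::process::id()));
    OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(&probe)
        .map_err(|e| format!("directory {} is not writable: {}", dir.display(), e))?;
    fs::remove_file(&probe)
        .map_err(|e| format!("could not remove probe file {}: {}", probe.display(), e))?;
    Ok(())
}
```

Calling this for the data directory (and possibly the log directory) before handing control to Krustlet would allow the Agent to exit with the returned message instead of a panic.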

Ability to perform daemon-reload

Whenever a unit file has changed, systemd needs to reload its configuration from the unit files. This should be triggered transparently after any relevant change.
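
As a rough sketch, the reload could be triggered by shelling out to `systemctl daemon-reload`. This is an assumption made for illustration; the agent may well talk to systemd over D-Bus instead. The function names are hypothetical.

```rust
use std::io;
use std::process::Command;

/// Builds the command that asks systemd to re-read its unit files.
fn daemon_reload_command() -> Command {
    let mut cmd = Command::new("systemctl");
    cmd.arg("daemon-reload");
    cmd
}

/// Runs daemon-reload only if at least one unit file was changed.
/// Returns true if a reload was performed.
fn reload_if_changed(units_changed: bool) -> io::Result<bool> {
    if !units_changed {
        return Ok(false);
    }
    let status = daemon_reload_command().status()?;
    if status.success() {
        Ok(true)
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("systemctl daemon-reload failed: {}", status),
        ))
    }
}
```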

Process Management using systemd

One of the core components of the Agent will be to manage processes.
We decided to use systemd as the tool doing the actual supervision as it's available on all our target systems (Debian, Ubuntu, RHEL).

The Agent needs a way to manage systemd units and this ticket is about implementing this functionality.

Every time a process starts it'll need configuration data on disk which needs to be materialized from configuration in the Operator/Orchestrator resources.
To make debugging easier we'd like to have a new uniquely named directory in a configurable location (e.g. /var/run/stackable/...) which has all the files a process needs.
An open question is whether restarts due to failure reuse an existing configuration directory or not.

This is the epic for this task and these are the dependencies:

  • #3 Create a data structure (object/resource) to describe a systemd unit
  • #4 Investigate existing crates that deal with systemd and see if we can use one or more of these for the functionality we need
  • #8 Investigate whether we can put our unit files in a separate location from the standard unit files
  • #5 Create systemd unit files
  • #6 Update systemd unit files
  • #7 Delete systemd unit files
  • #9 Reload systemd (daemon-reload)
  • #10 Receive and process notifications from systemd about process state changes (e.g. a process dying or restarting) - this will probably happen via D-Bus
  • #11 Trigger a start/stop/restart of a unit
  • #12 Materialize the required configuration files/data, keytabs, certificates etc.

Open Issues/questions:

  • Log file target (stdout/stderr)
  • Restart/failure behavior
  • Environment variable
  • Start on boot behavior

Document mapping of pod data structure to systemd service files

We need to define which pod properties we initially want to support and how we want to map those to our systemd unit files.

I'll just start with my initial ideas in this ticket; if we find out that it becomes too unwieldy we'll move the discussion somewhere else.

Once we have agreed, I'll pull the result out into an ADR.

The following shows the fields I've looked at so far and where in the systemd file I'd map them to, but I am sure that list is far from complete.

pod:
  metadata:
    name: -> name
  spec:
    containers: <we currently only allow one container per pod>
      image: <used by virtual kubelet for downloading package>
      env: Environment=...
      command: ExecStart
      args: ExecStart
      name: <unused, taken from meta.name>
      volumeMounts: <used by virtual kubelet to set up machine>
      workingDir: WorkingDirectory
    initcontainers: <we could either implement these as extra services linked by a Before= statement, or build ExecStartPre commands from them, not sure which makes more sense>
    restartPolicy: Restart
    terminationGracePeriodSeconds: TimeoutStopSec  <systemd also offers TimeoutStartSec but I could not find a matching pod field, should we reuse this one for both?>
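
To make the proposed mapping more concrete, here is a minimal sketch that renders the [Service] section from a simplified pod. The struct and function names are hypothetical, only the fields listed above are covered, and ExecStart is built by joining command and args with spaces without any quoting, which a real implementation would need to handle.

```rust
/// Simplified stand-in for the single container of a pod (hypothetical type).
struct ContainerSketch {
    command: Vec<String>,
    args: Vec<String>,
    env: Vec<(String, String)>,
    working_dir: Option<String>,
}

/// Simplified stand-in for the pod fields discussed above (hypothetical type).
struct PodSketch {
    name: String,
    container: ContainerSketch,
    restart_policy: String,
    termination_grace_period_seconds: Option<u64>,
}

/// metadata.name -> unit file name.
fn unit_file_name(pod: &PodSketch) -> String {
    format!("{}.service", pod.name)
}

/// Kubernetes restartPolicy -> systemd Restart= value.
fn map_restart_policy(policy: &str) -> &'static str {
    match policy {
        "Always" => "always",
        "OnFailure" => "on-failure",
        _ => "no", // "Never" and anything unknown
    }
}

/// Renders the [Service] section according to the mapping above.
fn render_service_section(pod: &PodSketch) -> String {
    let c = &pod.container;
    let mut lines = vec!["[Service]".to_string()];
    for (key, value) in &c.env {
        lines.push(format!("Environment=\"{}={}\"", key, value));
    }
    // command and args both end up in ExecStart (no quoting in this sketch).
    let exec: Vec<&str> = c.command.iter().chain(c.args.iter()).map(String::as_str).collect();
    lines.push(format!("ExecStart={}", exec.join(" ")));
    if let Some(dir) = &c.working_dir {
        lines.push(format!("WorkingDirectory={}", dir));
    }
    lines.push(format!("Restart={}", map_restart_policy(&pod.restart_policy)));
    if let Some(secs) = pod.termination_grace_period_seconds {
        lines.push(format!("TimeoutStopSec={}", secs));
    }
    lines.join("\n") + "\n"
}
```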

There are a few assumptions in there that might be worth discussing. We currently restrict pods to only have one container, as the idea was to just create multiple pods if you need more. Do we want to change that and allow multiple containers? What would be the benefits and downsides?

For init containers, I am unsure how to treat these. I don't think it is relevant at this stage, but it might be worth a quick look to make sure we don't burn any bridges. There are two options. We can implement these as one-shot services that are required before our main service starts. That way systemd should run them once, before trying to start our main service (need to investigate the full implications of this).
Alternatively we can create ExecStartPre commands from these fields, which systemd would run once, before starting the main service.

Both have pros and cons, I guess. Does anyone have a preference off the top of their head?

Also, we should probably at least take the user to run this as from the PodSecurityContext, but that opens up an entire can of worms that I am not sure we are ready to deal with just yet. Thoughts?

Add ability to restrict possible mount paths

Pods can specify absolute paths into which config maps are mounted (ZooKeeper needs this, for example).
It would be good to be able to restrict this so that only specific directories can be targeted.
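
A minimal sketch of such a restriction, assuming the allowed directories come from a (hypothetical) agent configuration option. The check is purely lexical: it rejects relative paths and `..` components but does not resolve symlinks.

```rust
use std::path::{Component, Path, PathBuf};

/// Returns true if `target` is an absolute path below one of `allowed_roots`.
/// (Hypothetical helper; symlinks are not resolved in this sketch.)
fn is_mount_path_allowed(target: &Path, allowed_roots: &[PathBuf]) -> bool {
    if !target.is_absolute() {
        return false;
    }
    // Reject ".." so that e.g. /etc/zookeeper/../shadow cannot escape a root.
    if target.components().any(|c| matches!(c, Component::ParentDir)) {
        return false;
    }
    // Path::starts_with compares whole components, so /etc/zookeeper-evil
    // is not considered to be inside /etc/zookeeper.
    allowed_roots.iter().any(|root| target.starts_with(root))
}
```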

Refactor usage of Krustlet config to allow using command line parameters to configure agent

Currently all command line parameters are passed to the Krustlet config code as well, which causes an exception if any parameters are present that Krustlet does not understand. Conversely, any command line parameters which the agent doesn't understand cause an exception in the agent.

The combination of these two means that it is currently not possible to pass anything on the command line (except maybe version or help), because for any parameter one of the two components will fail.

This should be fixed by manually constructing a Krustlet config object instead of using the methods that parse the command line, but this requires some boilerplate and re-implementation of default values.

Ability to replace path variables in generated config

When downloading configmaps to local config files the agent needs to replace the following values:

  • package path where the package has been downloaded to
  • config path where the config files will reside
  • log directory, where the application should place its logs
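
A minimal sketch of the replacement step, assuming the placeholders are written as `{{packageroot}}`, `{{configroot}}` and `{{logroot}}` (the form used in the environment variable example of a later issue). The struct and function names are hypothetical.

```rust
/// The three directories the agent knows about for a given package (hypothetical type).
struct PathRoots {
    package_root: String,
    config_root: String,
    log_root: String,
}

/// Replaces the three path placeholders in `input` with their actual values.
fn replace_path_variables(input: &str, roots: &PathRoots) -> String {
    input
        .replace("{{packageroot}}", &roots.package_root)
        .replace("{{configroot}}", &roots.config_root)
        .replace("{{logroot}}", &roots.log_root)
}
```

Because the function works on plain strings, the same call could be applied to config file contents and to the values of environment variables.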

Make generation of systemd unit files deterministic

Currently the structure of the generated systemd files is not determined by any specific set of rules, which means that the ordering of entries may change if, for example, the order in which environment variables are specified in the pod changes.

We need to come up with a set of rules to apply during generation of these files to ensure that only actual changes to the unit file trigger a rewrite.

Something along the lines of:

  • write known sections in this order (Unit, Service, Install)
  • write rest of sections in alphabetical order
  • sort lines within sections alphabetically
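
The rules above could be sketched roughly as follows. The function name and the in-memory representation (section name mapped to a list of lines) are assumptions for illustration. Note that sorting lines alphabetically is only safe for directives whose order does not matter; for directives such as multiple ExecStartPre= lines systemd does respect the order, so a real implementation would probably need exceptions.

```rust
use std::collections::BTreeMap;

/// Sections that are always written first, in this order.
const KNOWN_SECTIONS: [&str; 3] = ["Unit", "Service", "Install"];

/// Renders a unit file so that equal content always yields identical text.
fn render_deterministic(sections: &BTreeMap<String, Vec<String>>) -> String {
    // BTreeMap keys are already in alphabetical order.
    let mut names: Vec<&String> = sections.keys().collect();
    // Stable sort: known sections move to the front in their fixed order,
    // all other sections keep their alphabetical order.
    names.sort_by_key(|n| {
        KNOWN_SECTIONS
            .iter()
            .position(|k| *k == n.as_str())
            .unwrap_or(KNOWN_SECTIONS.len())
    });

    let mut out = String::new();
    for name in names {
        let mut lines = sections[name].clone();
        lines.sort(); // sort lines within the section alphabetically
        out.push_str(&format!("[{}]\n", name));
        for line in lines {
            out.push_str(&line);
            out.push('\n');
        }
        out.push('\n');
    }
    out
}
```

With this in place, comparing the rendered text against the file on disk would only report a difference when the content actually changed.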

Implement systemd unit monitoring

Receive and process notifications from systemd about process state changes (e.g. a process dying or restarting) - this will probably happen via D-Bus

Build and deploy nightly packages

We want to build and deploy binary packages in RPM & DEB format at least every day, potentially as part of every build.

We'll target at least CentOS/RHEL 7 for now, potentially 8.
For Debian, I'm unsure which versions make sense, as I'm not too familiar with it.

Support running of services as application users

Currently all our managed processes (e.g. ZooKeeper) are started as root.
We'd also like to support running all our services as non-root.

We'd like to let the user choose the username the services should run as.
So the CRDs need to be extended (but that's for operator specific issues).

We will follow what systemk does, which means that the Agent will need to look in each Pod for the desired username.

This is the field where the name can be found: pod.securityContext.windowsOptions.runAsUserName
Note: There is also a pod.securityContext.runAsUser field but that only takes an integer which is not enough for us.

  • The agent needs to be extended to read the aforementioned property
  • The username then needs to be propagated to the systemd unit
  • Optional/Bonus: If the user does not exist it can be created automatically
    • If this is implemented it should be an optional feature, the Agent should have a configuration option disabling auto-creation of users
  • When the agent creates directories for the services (via Volumes) they need to be owned by the same user
    • This might also be made configurable later but that's for another issue

Note: This is not about running the Agent itself as non-root!

Implement graceful handling of agent restart

Once we have implemented systemd for running services in #5, we need to look at identifying agent restarts and handling them gracefully.

Basically, the test upon startup needs to be:

  • check if config directory for current pod version exists
  • if yes, assume service should be running and check state
  • if not: start service with current config
  • Remove current Krustlet behavior to evict (=delete) pods upon shutdown
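
The startup test described in the first three points could be sketched as a small decision function. The enum and function names are hypothetical; checking the actual service state and starting the service are left out.

```rust
use std::path::Path;

/// What the agent should do for a pod after it has (re)started (hypothetical type).
#[derive(Debug, PartialEq)]
enum StartupAction {
    /// Config directory exists: assume the service should be running and check its state.
    CheckServiceState,
    /// No config directory for this pod version: start the service with the current config.
    StartService,
}

/// Decides the action based on whether the config directory for the
/// current pod version exists.
fn startup_action(config_dir: &Path) -> StartupAction {
    if config_dir.is_dir() {
        StartupAction::CheckServiceState
    } else {
        StartupAction::StartService
    }
}
```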

Crash if no tag is provided for the container image

The agent crashes if no tag is provided for the container image.

Given the following kafka.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: kafka
spec:
  containers:
  - name: kafka
    image: kafka
  tolerations:
  - key: "kubernetes.io/arch"
    operator: "Equal"
    value: "stackable-linux"

When the pod is deployed:

$ kubectl apply -f kafka.yaml

Then the agent crashes with the following log output:

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/provider/repository/package.rs:42:35
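
The panic message suggests an `unwrap()` on a missing value. A minimal sketch of how the image reference could be parsed with an explicit error instead is shown below. The type and function names are hypothetical and do not reflect the actual code in package.rs.

```rust
/// Name and version parsed from an image reference (hypothetical type).
#[derive(Debug, PartialEq)]
struct PackageRef {
    name: String,
    version: String,
}

/// Parses "<name>:<version>" and returns an error instead of panicking
/// when the tag is missing.
fn parse_image(image: &str) -> Result<PackageRef, String> {
    // Only look for the tag separator after the last '/', so that a
    // registry port such as "host:5000/kafka" is not mistaken for a tag.
    let last_segment_start = image.rfind('/').map(|i| i + 1).unwrap_or(0);
    match image[last_segment_start..].rfind(':') {
        Some(rel) => {
            let idx = last_segment_start + rel;
            let (name, tag) = (&image[..idx], &image[idx + 1..]);
            if name.is_empty() || tag.is_empty() {
                return Err(format!("image '{}' has an empty name or tag", image));
            }
            Ok(PackageRef {
                name: name.to_string(),
                version: tag.to_string(),
            })
        }
        None => Err(format!(
            "image '{}' has no tag; expected <name>:<version>",
            image
        )),
    }
}
```

With the pod from the example above, `parse_image("kafka")` would return an error that the agent could report on the pod status rather than crashing.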

Add functionality to tail logs via kubectl (or similar tools)

It would be brilliant to support looking at the logs of our services via kubectl logs.

The code to send logs is present and could be used; I have successfully sent "test" messages to kubectl logs. However, some extra work is needed to read the actual logs from the systemd journal.

I have found two crates that seem to support this functionality, but both have issues if I am not mistaken:

I believe this is a medium-low hanging fruit if we can find a crate that offers this functionality, as most of the supporting code could be reused from Krustlet.

Add a README

To explain what this project is all about and how it can be used

Write action log to support technical audits

As a system administrator, I'd like to know what happened on a Stackable-managed node.

An action log containing the relevant actions performed by the agent could help maintain a comprehensible state of the nodes.

  • downloaded version x.y.z of software abc
  • extracted software abc version x.y.z into folder foo
  • applied default security configuration
  • started node as part of a cluster
    ...
  • downloaded security patch x.y.z+1 of software abc
  • removed node from running cluster
  • stopped node
  • started rolling update of node 3 of cluster
  • added node to running cluster
  • rolling update finished successfully
    ...
  • downloaded new major version x+1.0.0 of software abc
  • applied configuration migration
  • started stop-the-world cluster update
    ...

Agent doesn't replace variables in environment variables

The agent replaces three paths in config files that are downloaded from config maps:

  • configroot
  • logroot
  • packageroot

These replacements should also happen in environment variables that are added to pods (see below for an example), but this is not currently working.

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2020-12-14T12:48:19Z"
  generateName: spark-cluster-worker-
spec:
  containers:
  - command:
    - spark-3.0.1-bin-hadoop2.7/sbin/start-slave.sh
    - spark://bawa-virtualbox:7077
    env:
    - name: SPARK_CONF_DIR
      value: '{{configroot}}/conf'
    - name: SPARK_NO_DAEMONIZE
      value: "true"
    image: spark:3.0.1
    imagePullPolicy: IfNotPresent
    name: spark-3-0-1
