cloudfoundry / eirini

Pluggable container orchestration for Cloud Foundry, and a Kubernetes backend

License: Apache License 2.0


eirini's Introduction

Eirini



What is Eirini?

Eirini is a thin layer of abstraction on top of Kubernetes that allows Cloud Foundry to deploy applications as Pods on a Kubernetes cluster. Eirini uses the Diego abstractions of Long Running Processes (LRPs) and Tasks to capture Cloud Foundry's notion of long running processes and one-off tasks.

Deployment instructions are available at: cloudfoundry-incubator/eirini-release.

Components

Eirini Overview Diagram


Eirini is composed of:

  • api: The main component; it provides the REST API used by the Cloud Controller and is responsible for starting LRPs and tasks.

  • event-reporter: A Kubernetes reconciler that watches for LRP instance crashes and reports them to the Cloud Controller.

  • instance-index-env-injector: A Kubernetes webhook that inserts the CF_INSTANCE_INDEX environment variable into every LRP instance (pod).

  • task-reporter: A Kubernetes reconciler that reports the outcome of tasks to the Cloud Controller and deletes the underlying Kubernetes Jobs after a configurable TTL has elapsed.

  • eirini-controller: A Kubernetes reconciler that acts on create/delete/update operations on Eirini's own Custom Resource Definitions (CRDs). This is still experimental.
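
Since each LRP instance is a StatefulSet pod, the instance index the webhook injects corresponds to the pod name's ordinal suffix. A minimal stdlib sketch of that derivation (the helper name is hypothetical, not Eirini's actual code):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// instanceIndex extracts the StatefulSet ordinal from a pod name such as
// "dora-test-9912c1e844-0". Hypothetical helper; the webhook does the
// equivalent before injecting CF_INSTANCE_INDEX into the pod's environment.
func instanceIndex(podName string) (int, error) {
	i := strings.LastIndex(podName, "-")
	if i < 0 {
		return 0, fmt.Errorf("no ordinal suffix in %q", podName)
	}
	return strconv.Atoi(podName[i+1:])
}

func main() {
	idx, err := instanceIndex("dora-test-9912c1e844-0")
	if err != nil {
		panic(err)
	}
	fmt.Printf("CF_INSTANCE_INDEX=%d\n", idx)
}
```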

CI Pipelines

We use Concourse. Our pipelines can be found here.

Contributing

Please read CONTRIBUTING.md for details.

Have a question or feedback? Reach out to us!

We can be found in our Slack channel #eirini-dev in the Cloud Foundry workspace. Please hit us up with any questions you may have or to share your experience with Eirini!


eirini's Issues

Failed to create Statefulset : guid already exists Error

Description

We deployed cf-for-k8s version 0.1.0 and performed scalability tests. We pushed source-code-based apps and reached 800 apps. During the course of the tests we saw the errors below.

{"timestamp":"2020-04-24T14:46:28.275329914Z","level":"error","source":"handler","message":"handler.desire-app.bifrost-failed","data":{"error":"failed to desire: failed to create statefulset: statefulsets.apps \"app-2-6-light-2-space-2-dedc6afb16\" already exists","guid":"22875703-d484-48c0-8e35-e9e50f30ca72-25926a2f-476d-4650-b904-a7e45134bdb0","session":"38768"}}
{"timestamp":"2020-04-24T14:46:28.469625717Z","level":"error","source":"handler","message":"handler.desire-app.bifrost-failed","data":{"error":"failed to desire: failed to create statefulset: statefulsets.apps \"app-2-1-light-21-space-2-3ab4ad2579\" already exists","guid":"5d94793f-de02-4294-98e0-fd97db282f87-d6e2c77c-c911-4fb7-9d64-9fb330fba0f4","session":"38769"}}
{"timestamp":"2020-04-24T14:46:28.708660473Z","level":"error","source":"handler","message":"handler.desire-app.bifrost-failed","data":{"error":"failed to desire: failed to create statefulset: statefulsets.apps \"app-2-5-light-15-space-2-85ef1faa94\" already exists","guid":"68558507-37cc-4cc6-81ee-de4585b9120e-d46e4837-f455-47e0-aa66-4a065e47e9e3","session":"38770"}}

We are sure that we didn't push two apps with the same name.

Steps to reproduce

Deploy cf-for-k8s and push more than 500 source-code-based apps.

Disk metrics are not working

Description

With metrics-server deployed, I was able to deploy an application to KubeCF and view metrics. I used a simple test app (https://github.com/nwmac/cf-metrics-test) for testing - this consumes CPU, Disk and Memory, increasing over time.

Disk metrics are broken: despite the test app writing 10 MB files, the reported disk usage never exceeded 64 KB.

Steps to reproduce

Deploy the above app on an Eirini deployment. Inspect reported disk usage with cf app NAME.

What was expected to happen

Disk usage to be increasing over time

What actually happened

It doesn't

Suggested fix (optional)

Additional information (optional)

Originally reported by @nwmac on a different channel.

Unable to reliably distinguish applications staged with Eirini from Docker-image apps

Description

While pushing an application with cf and Eirini, there is no way to tell whether the running app was pushed from a Docker image or staged with Eirini.

Steps to reproduce

Push a standard cf app, and an app with cf push test-docker --docker-image viovanov/node-env-tiny

What was expected to happen

Have a clear way to distinguish an app pushed from a Docker image from one that was staged.

What actually happened

The only difference between the two applications is the image.

Suggested fix (optional)

For example, an annotation or a label in the pod which makes it easy to figure out what the source was. The only thing I can think of now is checking the image source being used.

Additional information (optional)

It's hard for EiriniX extensions to figure out where applications are coming from, which makes solutions that address this overkill.

Set timeouts for requests to Kubernetes API

Description

Currently it appears Eirini does not set a timeout when making calls to the Kubernetes API. In situations of network issues or load, requests may be left open for hours or more.

Steps to reproduce

Have Eirini send a request to a Kubernetes API endpoint that does not respond.

What was expected to happen

The request should eventually time out.

What actually happened

The request stays open forever.

Additional information (optional)

kubernetes/kubernetes#17074

eirini app pod names only give a hint on cf spaces

Description

When I push an app, the resulting pod name seems to consist of <cfappname>-<cfspace>-<id>-<instanceNo>, e.g. "dora-test-9912c1e844-0".
If you have multiple organizations, each with a space named e.g. "test", it is confusing to have just the space as part of the pod name.

Steps to reproduce

  • Create two Organizations, e.g. "qa" and "qa1" and create a space "test" in each of them
  • push an app to each of the targets - e.g. dora1 to qa/test, and dora to qa1/test :)
  • find resulting pod names to be, e.g.
> kubectl get pods -n eirini
NAME                      READY     STATUS    RESTARTS   AGE
dora-test-9912c1e844-0    1/1       Running   0          1d
dora1-test-1f0dcb591f-0   1/1       Running   0          31m

What was expected to happen

  • The pod name to include "cf-org" and "cf-space"

What actually happened

Only "cf-space" was included, and space names can be identical across organizations; "cf-org", whose name is unique, was left out. It's confusing, because "cfappname" only has to be unique within a space anyway.

Suggested fix (optional)

In order to be helpful (like human readable) the pod name should include "cf-org" as well, so consist of <cfappname>-<cforg>-<cfspace>-<id>-<instanceNo>.
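
The suggested scheme is straightforward to sketch; a real implementation would also need Kubernetes' name sanitization and length limits. Illustrative helper only:

```go
package main

import (
	"fmt"
	"strings"
)

// podName builds the proposed <cfappname>-<cforg>-<cfspace>-<id>-<instanceNo>
// name. Illustrative only; real code must also sanitize the parts and
// truncate to Kubernetes' name-length limits.
func podName(app, org, space, id string, instance int) string {
	return strings.ToLower(fmt.Sprintf("%s-%s-%s-%s-%d", app, org, space, id, instance))
}

func main() {
	fmt.Println(podName("dora", "qa", "test", "9912c1e844", 0))
}
```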

[request] Support to create a K8s Deployment instead of Statefulset resource

Request

When cf is used with Eirini and the command cf push is executed, a k8s StatefulSet resource is created.

While this resource addresses a specific use case needed for master/slave topology, it is not the best k8s resource for deploying a microservice application that only needs to be stateless (see: https://stackoverflow.com/questions/41583672/kubernetes-deployments-vs-statefulsets).

Would it be possible to offer an option or config parameter that we could pass between CAPI and Eirini in order to specify the type of resource to be created?

Docker images that do not specify a numeric USER fail with CreateContainerConfigError

Description

Docker staging does not inspect images for a USER directive. When a Docker image does not specify a USER or specifies a USER as a string, the desired LRP will have a CreateContainerConfigError with events showing Error: container has runAsNonRoot and image will run as root or Error: container has runAsNonRoot and image has non-numeric user (<some-user-string>), cannot verify user is non-root.

This is because the PodSecurityContext specifies RunAsNonRoot. Images that don't specify a USER will try to default to UID 0 (root). Images that specify a USER string don't meet the requirements for RunAsNonRoot - Kubernetes requires a non-zero numeric user. (https://kubernetes.io/docs/concepts/policy/pod-security-policy/#users-and-groups)

Steps to reproduce

  1. Create a cf-for-k8s cluster.
  2. Auth/login with the CF CLI
  3. cf push hello-ruby -o awittrock/helloworld-ruby

What was expected to happen

Container starts successfully.

What actually happened

Error: container has runAsNonRoot and image will run as root

Suggested fix (optional)

Docker staging inspects image and adds user metadata to payload. cloud_controller_ng appends this data to the desired LRP request for use in the PodSecurityContext. The PodSecurityContext respects the user provided by cloud_controller_ng if it is specified as a numeric user. The PodSecurityContext provides a default value for requests that do not specify a user (empty string) or provide a non-numeric user.
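
The runAsNonRoot check effectively requires a non-zero numeric UID, so the suggested fix boils down to validating the image's USER and substituting a default otherwise. A stdlib sketch (the helper and the fallback UID are hypothetical, not actual CAPI/Eirini behaviour):

```go
package main

import (
	"fmt"
	"strconv"
)

const defaultUID = 2000 // hypothetical fallback for images without a usable USER

// resolveUID returns a UID that satisfies runAsNonRoot: a non-zero numeric
// USER is respected; everything else (empty, "root", "0") falls back to
// the default.
func resolveUID(user string) int64 {
	uid, err := strconv.ParseInt(user, 10, 64)
	if err != nil || uid == 0 {
		return defaultUID
	}
	return uid
}

func main() {
	for _, u := range []string{"", "root", "0", "1000"} {
		fmt.Printf("USER %q -> runAsUser %d\n", u, resolveUID(u))
	}
}
```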

Additional information (optional)

We've done an exploration and test drove the implementation. We will submit a PR once we have done the necessary work on cloud_controller_ng.

Include `app_guid` and `process_type` on app StatefulSet/Pods

tl;dr

We would like for app_guid and process_type to be included on the StatefulSet/Pod templates for Eirini apps to have deterministic and stable labels to select off of. E.g.

labels:
  guid: <capi-process-guid>
  process_type: <capi-process-type>           <--------------- NEW!
  app_guid: <capi-app-guid>                   <--------------- NEW!
  rootfs-version: ""
  source_type: APP
  version: <capi-process-version>

Background

The CF Networking Program is working on enhancing the routing tier for CF on Kubernetes (see: #72). We're currently working on fetching v3 Routes and v3 Destinations from CAPI and converting these into "CF Route" CRDs. These will be used to create Istio resources and Services to power an Istio/Envoy based routing tier for Eirini apps. See this design document for more information.

Problem

Route Destinations in Cloud Controller are currently a combination of route_guid, app_guid, and process_type. CAPI Process GUIDs are not stable over time (for example a rolling app deployment changes them) so this mapping allows for the Destination object to remain stable.

Since we currently only have a Process GUID on the StatefulSet/Pod for an app, we'd have to do additional lookups on CAPI for the Process GUID of a given app/process type, and update our Services/Istio resources whenever an app's process GUID changes. Both are doable, but this puts unnecessary load on the system when we could just have long-lived resources that match the stable app GUID and process type for the workload. This has the added benefit of giving other teams more stable identifiers to key off of.

Proposal

We enhance CAPI to send the app_guid and process_type along when it desires an app on Eirini and update Eirini to include this information as labels on the StatefulSet/Pod template.
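
With those labels in place, the routing tier could select workloads with one stable selector per destination. Sketched with a plain string builder (client-go's labels.Set renders the same key=value,... form):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// selectorFor renders a label selector such as "app_guid=...,process_type=web"
// using the proposed labels, which stay stable across rolling deployments,
// unlike the process GUID.
func selectorFor(appGUID, processType string) string {
	set := map[string]string{
		"app_guid":     appGUID,
		"process_type": processType,
	}
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic ordering
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+set[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(selectorFor("150866b1-bf6a-45b7-abe8-d861cc26ae17", "web"))
}
```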

Thanks!
cc @ndhanushkodi @rosenhouse


Misc

Example StatefulSet (current)

Including this truncated StatefulSet to give folks an idea of what is on it currently.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    application_id: 150866b1-bf6a-45b7-abe8-d861cc26ae17
    application_name: zdt-test
    ...
    process_guid: 3dc6e687-0e73-44b0-8785-01cba02034f1-403850e1-3e17-4ae3-9868-ed832697b201
    ...
    space_name: s
    version: 403850e1-3e17-4ae3-9868-ed832697b201
  creationTimestamp: 2019-09-27T16:54:01Z
  generation: 1
  labels:
    guid: 3dc6e687-0e73-44b0-8785-01cba02034f1
    rootfs-version: ""
    source_type: APP
    version: 403850e1-3e17-4ae3-9868-ed832697b201
  name: zdt-test-s-b480caa13e
  ...
  resourceVersion: "8724193"
  ...
  uid: 680aee46-e147-11e9-bbfa-42010a000b0f
spec:
  podManagementPolicy: Parallel
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      guid: 3dc6e687-0e73-44b0-8785-01cba02034f1
      source_type: APP
      version: 403850e1-3e17-4ae3-9868-ed832697b201
  serviceName: ""
  template:
    metadata:
      annotations:
        application_id: 150866b1-bf6a-45b7-abe8-d861cc26ae17
        process_guid: 3dc6e687-0e73-44b0-8785-01cba02034f1-403850e1-3e17-4ae3-9868-ed832697b201
      creationTimestamp: null
      labels:
        guid: 3dc6e687-0e73-44b0-8785-01cba02034f1
        rootfs-version: ""
        source_type: APP
        version: 403850e1-3e17-4ae3-9868-ed832697b201
    spec:
      automountServiceAccountToken: false
      ...

cf cli error when multiple requests to modify app happen together

Description

If multiple requests try to modify the same app concurrently, you can get this error back from the cf cli:

Runner error: Operation cannot be fulfilled on statefulsets.apps "uptimer-app-8a203fc5-f28d-4237-a334-e609-5439e8b96f": the object has been modified; please apply your changes to the latest version and try again

We saw this when trying to delete an app, and we were able to reproduce it by scaling an app multiple times in a short window.

Steps to reproduce

cf scale -i 2 APPNAME &
cf scale -i 1 APPNAME &
cf scale -i 2 APPNAME &
cf scale -i 1 APPNAME &

What was expected to happen

No errors occur, the last request to be made is the state of the application.

What actually happened

If the last request fails, the CC data is updated with the desired number of instances, but it takes a couple of minutes for the app to actually converge on that number.
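
Kubernetes rejects such stale writes with an optimistic-concurrency conflict, and the standard remedy is a read-modify-write retry loop; client-go ships this as retry.RetryOnConflict. A dependency-free sketch of the same pattern:

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("the object has been modified")

// retryOnConflict re-runs fn, which should re-read the latest object version
// and re-apply the change, until it succeeds, fails with a non-conflict
// error, or attempts are exhausted.
func retryOnConflict(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil || !errors.Is(err, errConflict) {
			return err
		}
	}
	return err
}

func main() {
	tries := 0
	err := retryOnConflict(5, func() error {
		tries++
		if tries < 3 {
			return errConflict // simulate two stale writes
		}
		return nil
	})
	fmt.Println(tries, err)
}
```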

Compatibility with app system env variables

Description

The VCAP_APPLICATION env var does not maintain compatibility with traditional Cloud Foundry, and alternative environment variables such as CF_INSTANCE_INDEX, documented here: https://docs.cloudfoundry.org/devguide/deploy-apps/environment-variable.html, are not provided.

This hinders migrations of workloads from traditional Cloud Foundry to cf-for-k8s or kubecf, and has a specific impact on one of the Rails applications I'm responsible for. Specifically, in my case, VCAP_APPLICATION.instance_index is relied on to run migrations from a single instance. In addition, because CF_INSTANCE_INDEX is also not available, there is no straightforward alternative to the instance_index key in the VCAP_APPLICATION env var.

Other common tools that use these variables:

Steps to reproduce

Push the following test app to cf-for-k8s

var http = require('http');
http.createServer(function (request, response) {
  response.writeHead(200, {'Content-Type': 'text/plain'});
  response.end(`${JSON.stringify(process.env)}`);
}).listen(process.env.PORT);

Run the following curl command and grep to look for CF_* environment variables:
curl my-test-app.example.com | jq . | grep CF_

and run a similar curl command and grep to look for the VCAP_APPLICATION variable

curl my-test-app.example.com | jq . | grep VCAP_APPLICATION

What was expected to happen

We'd expect compatibility with CF environment variables supported on traditional cloud foundry as described here under the app specific system variables section for running apps:
https://docs.cloudfoundry.org/devguide/deploy-apps/environment-variable.html#app-system-env

and also compatibility with the VCAP_APPLICATION environment variable detailed here:
https://docs.cloudfoundry.org/devguide/deploy-apps/environment-variable.html#VCAP-APPLICATION

What actually happened

We only see the following limited set of CF_* environment variables and also a limited set under the VCAP_APPLICATION environment variable:

"CF_INSTANCE_INTERNAL_IP": "10.24.5.87",
"CF_INSTANCE_IP": "10.24.5.87",
"CF_INSTANCE_ADDR": "0.0.0.0:8080",
"CF_INSTANCE_PORT": "8080",
"CF_INSTANCE_PORTS": "[{\"external\":8080,\"internal\":8080}]",
 "VCAP_APPLICATION": "{\"cf_api\":\"my-test-app.example.com\",\"limits\":{\"fds\":16384,\"mem\":1024,\"disk\":1024},\"application_name\":\"test-env\",\"application_uris\":[\"my-test-app.example.com\"],\"name\":\"test-env\",\"space_name\":\"test\",\"space_id\":\"629ef6c5-4a40-401a-ab42-30d09b32d80b\",\"organization_id\":\"b95bc8ac-38b6-4cc8-bf72-8a6aea7fd4bb\",\"organization_name\":\"system\",\"uris\":[\"my-test-app.example.com\"],\"process_id\":\"3975af88-1740-4143-bf40-f6dd2fb6d599\",\"process_type\":\"web\",\"application_id\":\"3975af88-1740-4143-bf40-f6dd2fb6d599\",\"version\":\"dd410472-62f2-47db-868e-498bb6343159\",\"application_version\":\"dd410472-62f2-47db-868e-498bb6343159\"}"

Suggested fix (optional)

Maintain compatibility with traditional cloud foundry on VMs.

Panic in eirini-metrics

Description

Trying to bump kubecf to 1.8.0, and I'm getting a panic from eirini-metrics.
I must be missing something, but I can't tell what.

Steps to reproduce

Ran using missing config files.

What was expected to happen

Proper error with a useful error message, no panic.

What actually happened

It panics.

Suggested fix (optional)

[Fill in]

Additional information (optional)

W0904 19:50:16.381331       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
W0904 19:50:16.382538       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
panic: open : no such file or directory
goroutine 1 [running]:
code.cloudfoundry.org/eirini/cmd.ExitIfError(...)
    code.cloudfoundry.org/eirini@/cmd/commons.go:44
main.main()
    code.cloudfoundry.org/eirini@/cmd/metrics-collector/main.go:45 +0x487
stream closed

Intention of having statefulsets for applications deployed using eirini

Description

We deployed Eirini on a GKE cluster and see that Eirini creates StatefulSets for applications, but they don't have any volumes attached.
What is the intent of creating StatefulSets? In our environment, a 250-node GKE cluster, we are not able to deploy beyond 14K StatefulSets (i.e. the cf apps which Eirini creates), for which we created a Kubernetes issue: kubernetes/kubernetes#78682

IMHO, we could use Deployments instead, which even support rolling updates.

Please configure GITBOT

Pivotal provides the Gitbot service to synchronize issues and pull requests made against public GitHub repos with Pivotal Tracker projects.

If you do not want to use Pivotal Tracker to manage this GitHub repo, you do not need to take any action.

If you are a Pivotal employee, you can configure Gitbot to sync your GitHub repo to your Pivotal Tracker project with a pull request.

Steps:

  • Fork this repo: cfgitbot-config (an ask+cf@ ticket is the fastest way to get read access if you get a 404)
  • Add the Toolsmiths-Bots team to have admin access to your repo
  • Add the cf-gitbot ([email protected]) user to have owner access to your Pivotal Tracker project
  • Add your new repo and/or project to the config-production.yml file
  • Submit a PR, which will get auto-merged if you've done it right

If you are not a pivotal employee, you can request that [email protected] set up the integration for you.

You might also be interested in configuring GitHub's Service Hook for Tracker on your repo so you can link your commits to Tracker stories. You can do this yourself by following the directions at:

https://www.pivotaltracker.com/blog/guide-githubs-service-hook-tracker/

If there are any questions, please reach out to [email protected].

Make CC certs configuration optional

Description
In cf-for-k8s we'd like to let Istio handle container-to-container communications, but Eirini has hard requirements that the following (non-exhaustive) are specified:

  1. cc_cert_path
  2. cc_key_path
  3. cc_ca_path

Suggested fix (optional)

  • Make cert/key/ca configuration optional.
  • Allow communication over HTTP.

There may be other certificate configurations that would be desirable to be optional, but this is the specific one I noticed when trying to trim down some explicit certificate configuration.

Allow plaintext http communication

Description

OPI doesn't support plaintext clients. In cf-for-k8s, we have mTLS enabled everywhere via Istio & Envoy. When requests pass from the Envoy ingress to Eirini, they must be decrypted twice: once at the Envoy layer and again by the OPI server.

Steps to reproduce

Try to configure OPI without TLS certificates.

What was expected to happen

It successfully starts a plaintext http server.

What actually happened

It fails because there are no TLS certificates

Suggested fix (optional)

Run a plaintext http server when TLS certificates are not provided, perhaps with an additional "serve_plaintext" configuration that defaults to false.
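
The suggested fix maps directly onto Go's stdlib HTTP server: use ListenAndServeTLS when certs are configured, and fall back to plaintext only when explicitly allowed. A sketch under the issue's assumptions (the servePlaintext flag mirrors the proposed "serve_plaintext" option; this is not OPI's actual code):

```go
package main

import (
	"fmt"
	"net/http"
)

// serve starts a TLS server when certificate paths are configured, and falls
// back to a plaintext HTTP server only when servePlaintext is enabled.
func serve(addr, certPath, keyPath string, servePlaintext bool) error {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	if certPath != "" && keyPath != "" {
		return http.ListenAndServeTLS(addr, certPath, keyPath, mux)
	}
	if !servePlaintext {
		// Proposed default: refuse to start without TLS unless opted in.
		return fmt.Errorf("no TLS certificates configured and serve_plaintext is disabled")
	}
	return http.ListenAndServe(addr, mux)
}

func main() {
	// With no certs and plaintext disabled, startup fails loudly.
	fmt.Println(serve(":8085", "", "", false))
}
```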

unsafe_allow_automount_service_account_token is not applied to task pods

Description

Setting the unsafe_allow_automount_service_account_token option for eirini does not apply it to task Pods and instead always sets it to false. It should follow the same behaviour as for apps (see here and here)

Steps to reproduce

  1. Deploy eirini with unsafe_allow_automount_service_account_token: true
  2. Push an app and run a task

What was expected to happen

The task Pod doesn't have automountServiceAccountToken set to false

What actually happened

The task Pod has automountServiceAccountToken: false

Additional information (optional)

Original issue

Managing the app-registry-secret reference passed into scheduled workloads

Description

We recently tried to implement secret rotation with the immutable secret pattern through kapp managed "versioned secrets" (documented here). Ultimately, the app-registry-secret reference we passed into Eirini was not updated on the app workload statefulSets on redeployment of Eirini. Do y'all have any opinions about how we should manage that secret and potentially insight into how Eirini monitors references to secrets and configMaps in the workloads it schedules?

Steps to reproduce

Deploy cf-for-k8s with a Google service account granted GCS storage admin (the permissions required to use GCR as an app registry) configured as the app-registry-secret passed to Eirini, then update it to 'app-registry-secret-ver-1' with kapp's versioned annotation. (Note that, to avoid the use of kapp in this reproduction flow, we could just directly update the Eirini config to point at a new secret reference explicitly.)

What was expected to happen

Eirini updated its reference to the app-registry-secret and all references to that secret in workloads it had scheduled.

What actually happened

Only the secret reference in Eirini was updated, so new app workloads got the correct reference, but references in existing workloads remained stale.

Potential fix (optional)

In a comment on a related cf-for-k8s issue, we considered the exclusion of the imagePullSecret from the statefulSet definition of scheduled workloads, instead relying on the inheritance of a serviceAccount imagePullSecret in the namespace. This assumes any namespace into which we schedule workloads has configured its default serviceAccount appropriately.

Thanks,
Andrew

App container metrics have incorrect/hardcoded values

Description

We have observed that Eirini is incorrectly reporting disk and memory quotas for apps. It looks like it is always using a hardcoded/default value of 10 bytes, which is certainly not the case.

Steps to reproduce

  1. Push an app
  2. cf tail that-app

What was expected to happen

2019-07-26T15:46:43.48-0700 [that-app/0] GAUGE cpu:1.000000 percentage disk:42000000.000000 bytes disk_quota:<REAL_DISK_QUOTA> bytes memory:24690688.000000 bytes memory_quota:<REAL_MEMORY_QUOTA> bytes

What actually happened

2019-07-26T15:46:43.48-0700 [that-app/0] GAUGE cpu:1.000000 percentage disk:42000000.000000 bytes disk_quota:10.000000 bytes memory:24690688.000000 bytes memory_quota:10.000000 bytes

Thanks,
K + @jspawar

OPI needs cluster-wide access to resources

Description

The OPI service shouldn't require any cluster-wide permissions.

Steps to reproduce

Using a serviceaccount with access to the eirini namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: eirini-role
  namespace: {{ .Values.eirini.opi.namespace }}
rules:
...
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - create
  - update
  - delete
  - list
...

What was expected to happen

OPI should work.

What actually happened

Got an error.

Suggested fix (optional)

Only work with StatefulSets in the eirini namespace.
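
The scoping difference is visible even at the REST level: a namespaced list uses a different API path than a cluster-scope list, and only the former can be authorized by the Role above. A small illustrative sketch (paths follow the Kubernetes API conventions):

```go
package main

import "fmt"

// statefulSetListPath returns the API path a list request hits. Listing
// inside a namespace only needs a Role/RoleBinding; the cluster-scope form
// requires a ClusterRole, which is what the error below is about.
func statefulSetListPath(namespace string) string {
	if namespace == "" {
		return "/apis/apps/v1/statefulsets" // cluster scope
	}
	return fmt.Sprintf("/apis/apps/v1/namespaces/%s/statefulsets", namespace)
}

func main() {
	fmt.Println(statefulSetListPath(""))
	fmt.Println(statefulSetListPath("eirini"))
}
```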

Additional information (optional)

{"timestamp":"2020-09-04T22:26:15.145143385Z","level":"error","source":"handler","message":"handler.list-apps.bifrost-failed","data":{"error":"failed to list desired LRPs: failed to list statefulsets: statefulsets.apps is forbidden: User \"system:serviceaccount:kubecf:opi\" cannot list resource \"statefulsets\" in API group \"apps\" at the cluster scope","session":"57"}}

Interview partners for research about communication in GitHub projects wanted

Hi. My name is Verena Ebert, and I am a PhD student at the University of Stuttgart in Germany.
A few months ago, I examined 90 GitHub projects to see what communication channels they use and how they write about them in their written documents, for example the README or wiki. If you are interested in the previous results, you can find them here:
https://arxiv.org/abs/2205.01440
Your project was one of these 90 projects and, therefore, I am interested in further details about your communication setup.

To gather more data about your communication setup, I kindly ask one of the maintainers to do an interview with me. The interview will be about 30-35 minutes long and via Skype, WebEx or any other provider you prefer. The interviews should be done in November 2022, if possible.

In this interview, I would like to ask some questions about the reasons behind the channels, to understand the thoughts of the maintainers in addition to the written information.

The long-term goal of my PhD is to understand how communication works in GitHub projects and what a good set of communication channels and information for other maintainers and developers looks like. One possible outcome is a COMMUNICATION.md with instructions and tips about which channels could be useful and how these channels should be introduced to other project participants. Of course, if you are interested, I will keep you up to date about any further results in my research.

If you are interested in doing an interview, please respond here or contact me via email ([email protected]). We will then make an appointment for the interview at a time and date that suits you.

If you agree, I would like to record the interview and transcribe the spoken texts into anonymized written texts. In this case, I will send you the transcript for corrections afterwards. Only if you agree, the transcripts or parts of it would be part of a publication.

Whitelisted annotations

Description

We'd like to add Prometheus scrape annotations to pods in k8s based on CAPI annotations in order to identity and scrape apps with metrics endpoints.

Suggestion

Map whitelisted/white-patterns Cloud Controller annotations sets directly to k8s annotations.

Our use case whitelist:

  • prometheus.io/scrape
  • prometheus.io/port
  • prometheus.io/path

This will allow app devs to set an annotation on their app and, if allowed, it will get set on the k8s pod. These annotations are used by Prometheus-compatible scrapers for service discovery in k8s clusters.
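
The proposal amounts to copying only allowlisted CAPI annotations onto the pod. A stdlib sketch of that filter (hypothetical helper, not Eirini's implementation):

```go
package main

import "fmt"

// allowed is the allowlist of CAPI annotations that may propagate to the
// k8s pod (the set proposed in this issue).
var allowed = map[string]bool{
	"prometheus.io/scrape": true,
	"prometheus.io/port":   true,
	"prometheus.io/path":   true,
}

// filterAnnotations keeps only allowlisted CAPI annotations. Hypothetical
// helper sketching the proposal.
func filterAnnotations(capi map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range capi {
		if allowed[k] {
			out[k] = v
		}
	}
	return out
}

func main() {
	fmt.Println(filterAnnotations(map[string]string{
		"prometheus.io/scrape": "true",
		"prometheus.io/port":   "9102",
		"internal/secret":      "do-not-copy",
	}))
}
```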

Additional information (optional)

https://pivotal.slack.com/archives/CPPEXT81M/p1574270166358800

Pod and Service labels consistency

It would be nice to have the same labels for Pods and Services, to be able to reuse the same selector to find the Services and Pods corresponding to an app GUID. Right now Pods have cloudfoundry.org/app_guid and cloudfoundry.org/guid labels, while Services have cloudfoundry.org/app.

AFAIK, having the same label is a good practice in Kubernetes; it eases operators' experience and lets tools that work with a Kubernetes cluster group objects under the same entity. https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/

cc @ndhanushkodi

Eirini component emits app container metrics too frequently

Description

Based on cf tail output (from the log-cache CLI plugin), the Eirini component emits app container metrics every 200ms:

$ cf tail dora
   2019-06-17T07:20:46.19-0700 [dora/0] GAUGE cpu:1.000000 percentage disk:42000000.000000 bytes disk_quota:10.000000 bytes memory:25088000.000000 bytes memory_quota:10.000000 bytes
   2019-06-17T07:20:46.39-0700 [dora/0] GAUGE cpu:1.000000 percentage disk:42000000.000000 bytes disk_quota:10.000000 bytes memory:25088000.000000 bytes memory_quota:10.000000 bytes
   2019-06-17T07:20:46.59-0700 [dora/0] GAUGE cpu:1.000000 percentage disk:42000000.000000 bytes disk_quota:10.000000 bytes memory:25088000.000000 bytes memory_quota:10.000000 bytes
   2019-06-17T07:20:46.79-0700 [dora/0] GAUGE cpu:1.000000 percentage disk:42000000.000000 bytes disk_quota:10.000000 bytes memory:25088000.000000 bytes memory_quota:10.000000 bytes
   2019-06-17T07:20:46.99-0700 [dora/0] GAUGE cpu:1.000000 percentage disk:42000000.000000 bytes disk_quota:10.000000 bytes memory:25088000.000000 bytes memory_quota:10.000000 bytes

This frequency of metric emission is inefficient, as the metric readings do not change that fast, and at scale may put significant unnecessary load on the Loggregator system. App developers consuming the unified stream of loggregator messages for the app will also observe that these messages overwhelm other types of loggregator messages (such as app stdout and stderr).

For comparison, the Diego cell rep defaults to emitting these metrics every 15s (https://github.com/cloudfoundry/diego-release/blob/v2.32.0/jobs/rep/spec#L136-L138), although platform operators may configure it.

Steps to reproduce

cf push <APP>
cf install-plugin -r CF-Community "log-cache"
cf tail <APP> -f

What was expected to happen

The app developer should see container metrics roughly every 15s per app instance.

What actually happened

The app developer sees container metrics roughly every 200ms per app instance.

/cc @jenspinney

[Request] Delay deletion of k8s jobs for cf tasks until logs can be tailed

Description

When a cf task is run in cf-for-k8s, a corresponding k8s Job is created.
To get the logs from this Job into the log stream, the fluentd sidecar picks up the log file of the new container spun up to run it. After the task completes, Eirini immediately deletes the Job and its logs. This is unfortunate: for very short tasks the container and its logs are gone before fluentd can tail the log file, so those logs are lost.

Suggested fix

Is there a way we could implement a mandatory time-to-live for containers we need logs from? Waiting even 30 seconds would be a huge help. There is a concept like this in k8s we could lean on, but it's still in alpha: https://kubernetes.io/docs/concepts/workloads/controllers/job/#ttl-mechanism-for-finished-jobs

We might have to make this configurable as well, because there may still be cases where a short-running task produces a ton of logs. Allowing an operator to extend or override the TTL would be useful in such cases.
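For illustration, the alpha feature linked above sets a TTL directly on the Job spec (requires the `TTLAfterFinished` feature gate on the cluster; names here are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-cf-task   # hypothetical task Job
spec:
  # Keep the finished Job (and its pod's logs) around for 30s so that
  # fluentd has a chance to tail the container's log file.
  ttlSecondsAfterFinished: 30
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: task
        image: example/task-image   # placeholder
```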

Steps to reproduce

You can find the reproduction steps and original issue here: cloudfoundry/cf-k8s-logging#35

Memory Consumption goes high for eirini

Description

We ran scalability tests on Eirini, using the Diego stress-test framework to push applications with max-in-flight set to 50 (concurrent pushes). We reached 14,000 applications, after which we hit a StatefulSet issue, possibly caused by Kube Controller Manager throttling.

(screenshot: Eirini memory usage during the load test)

During the load test, Eirini's memory usage kept climbing until it reached 6GB, as shown in the image above.

Steps to reproduce

Deploy 14000 applications

Additional information (optional)

Is profiling enabled in Eirini, so that we can dig into this?

desiring LRPs with empty lists of ports causes OPI panics

Description

Originally reported as cloudfoundry/cf-for-k8s#287: pushing the spring-music sample app causes OPI to panic when desiring the "task" process. In the issue, @cselzo investigated and found that this was due to the app having two processes that don't open any ports. Neither of those processes is intended to be externally accessible, so we think this should be a valid LRP to desire.

Steps to reproduce

Push the spring-music sample app using the v7 CLI and cf-for-k8s. The v7 CLI uses CAPI's /v3/apps/:guid/restart endpoint, making the failure easier to trace as it bubbles up errors when talking to OPI.

What was expected to happen

Successfully create a StatefulSet with no open ports.

What actually happened

 {"timestamp":"2020-07-22T14:38:13.213363528Z","level":"debug","source":"handler","message":"handler.desire-app.requested","data":{"app_guid":"e9ae3554-fb1c-40cb-b310-464124ad2a95"
 ,"guid":"dc3121ef-0c16-433e-be4e-12ee667cc24c-f2ad251f-541c-45d0-a728-075157e2ea0e","session":"51798","version":"f2ad251f-541c-45d0-a728-075157e2ea0e"}}
 2020/07/22 14:38:13 http: panic serving 127.0.0.1:35244: runtime error: index out of range [0] with length 0
 goroutine 119659 [running]:
 net/http.(*conn).serve.func1(0xc000352c80)
     net/http/server.go:1767 +0x139
 panic(0x181c8e0, 0xc00076d980)
     runtime/panic.go:679 +0x1b2
 code.cloudfoundry.org/eirini/bifrost.(*OPIConverter).ConvertLRP(0xc00048a080, 0xc00074eb70, 0x24, 0xc00074eba0, 0x24, 0xc00003e690, 0x49, 0xc0002c1990, 0x4, 0xc00074ec00, ...)
     code.cloudfoundry.org/eirini@/bifrost/convert.go:50 +0x1120
 code.cloudfoundry.org/eirini/bifrost.(*LRP).Transfer(0xc0005b3d10, 0x1b9eae0, 0xc0008b8180, 0xc00074eb70, 0x24, 0xc00074eba0, 0x24, 0xc00003e690, 0x49, 0xc0002c1990, ...)
     code.cloudfoundry.org/eirini@/bifrost/lrp.go:36 +0xc0
 code.cloudfoundry.org/eirini/handler.(*App).Desire(0xc00000d380, 0x1b98da0, 0xc00047a620, 0xc0000dd200, 0xc000594960, 0x1, 0x3)
     code.cloudfoundry.org/eirini@/handler/app_handler.go:45 +0x55b
 github.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc0003484e0, 0x1b98da0, 0xc00047a620, 0xc0000dd200)
     github.com/julienschmidt/[email protected]/router.go:387 +0xa3d
 net/http.serverHandler.ServeHTTP(0xc0005b4ee0, 0x1b98da0, 0xc00047a620, 0xc0000dd200)
     net/http/server.go:2802 +0xa4
 net/http.(*conn).serve(0xc000352c80, 0x1b9eae0, 0xc00041dac0)
     net/http/server.go:1890 +0x875
 created by net/http.(*Server).Serve
     net/http/server.go:2928 +0x384
 2020/07/22 14:38:13 http: panic serving 127.0.0.1:35246: runtime error: index out of range [0] with length 0
 goroutine 119664 [running]:
 net/http.(*conn).serve.func1(0xc0002921e0)
     net/http/server.go:1767 +0x139
 panic(0x181c8e0, 0xc0005bc560)
     runtime/panic.go:679 +0x1b2
 code.cloudfoundry.org/eirini/bifrost.(*OPIConverter).ConvertLRP(0xc00048a080, 0xc00074f200, 0x24, 0xc00074f230, 0x24, 0xc00003edc0, 0x49, 0xc0007b4378, 0x4, 0xc00074f260, ...)
     code.cloudfoundry.org/eirini@/bifrost/convert.go:50 +0x1120
 code.cloudfoundry.org/eirini/bifrost.(*LRP).Transfer(0xc0005b3d10, 0x1b9eae0, 0xc0008b8480, 0xc00074f200, 0x24, 0xc00074f230, 0x24, 0xc00003edc0, 0x49, 0xc0007b4378, ...)
     code.cloudfoundry.org/eirini@/bifrost/lrp.go:36 +0xc0
 code.cloudfoundry.org/eirini/handler.(*App).Desire(0xc00000d380, 0x1b98da0, 0xc00047ab60, 0xc00095f200, 0xc000595a40, 0x1, 0x3)
     code.cloudfoundry.org/eirini@/handler/app_handler.go:45 +0x55b
 github.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc0003484e0, 0x1b98da0, 0xc00047ab60, 0xc00095f200)
     github.com/julienschmidt/[email protected]/router.go:387 +0xa3d
 net/http.serverHandler.ServeHTTP(0xc0005b4ee0, 0x1b98da0, 0xc00047ab60, 0xc00095f200)
     net/http/server.go:2802 +0xa4
 net/http.(*conn).serve(0xc0002921e0, 0x1b9eae0, 0xc0008b83c0)
     net/http/server.go:1890 +0x875
 created by net/http.(*Server).Serve
     net/http/server.go:2928 +0x384

Org/space/app name annotations

Description

The Log Egress (Loggregator) team would like to send logs from apps to many downstream consumers, most of which have historically depended on app metadata to help filter and annotate those logs. Some of those are native to the platform, like Log Cache, but others are offsite, such as Splunk, Papertrail, or AppDynamics. To make this happen, annotations on the StatefulSet/Pods with the name of the application, org and space would be extremely useful.

While it is possible for this information to become out of date using the cf rename-* commands, the current implementation in CF continues to send the original name, so it would be fine for Eirini not to update them.

Suggested Fix

Given that the cloudfoundry.org/application_name and space_name annotations already seem to exist (https://github.com/cloudfoundry-incubator/eirini/blob/ca5617920508538c055485a30362b4240b2d35ee/k8s/statefulset.go#L28), we suggest adding a cloudfoundry.org/org_name or organization_name annotation.
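As a sketch, the resulting StatefulSet metadata might then carry all three annotations (the org_name key being the new one proposed here; values are placeholders):

```yaml
metadata:
  annotations:
    cloudfoundry.org/application_name: my-app
    cloudfoundry.org/space_name: my-space
    cloudfoundry.org/org_name: my-org   # proposed addition
```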

Eirini can only run inside of the k8s cluster it is deploying to

Description

Eirini assumes it is running inside of the k8s cluster it is deploying to because of how it constructs its k8s client. The current k8s client configuration is expecting certain properties to be set that are applied to all pods in a k8s cluster; however, this prevents Eirini from being deployed outside of the cluster.

Our specific motivation for this is so that we can deploy Eirini via BOSH.
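A common client-go pattern supports both cases: try the in-cluster config first, then fall back to an explicit kubeconfig path. A sketch, assuming client-go (the kubeconfig path is illustrative):

```go
package main

import (
	"log"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// buildConfig prefers the in-cluster service account configuration and
// falls back to an explicit kubeconfig path (e.g. one rendered by BOSH).
func buildConfig(kubeconfigPath string) (*rest.Config, error) {
	if cfg, err := rest.InClusterConfig(); err == nil {
		return cfg, nil
	}
	return clientcmd.BuildConfigFromFlags("", kubeconfigPath)
}

func main() {
	cfg, err := buildConfig("/var/vcap/jobs/eirini/config/kubeconfig") // hypothetical path
	if err != nil {
		log.Fatal(err)
	}
	if _, err := kubernetes.NewForConfig(cfg); err != nil {
		log.Fatal(err)
	}
}
```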

Eirini does not work with K8s 1.13

Description

Eirini does not work with K8s 1.13

Steps to reproduce

  1. Go to your favorite cloud provider
  2. Provision a K8s cluster with version 1.13 or higher
  3. Deploy Eirini on it

What was expected to happen

It should work

What actually happened

It doesn't

Eirini reports CPU in incorrect units

Description

Eirini is not converting from kubectl top cpu units to cf app cpu units correctly.

  • kubectl top displays cpu used in milliunits (1000m = 1 core).
  • cf app displays cpu in percentage (100% = 1 core).
  • Eirini is reporting number of cores, rounded to the nearest integer (1.0 = 1 core).

This causes cf app to display, for example, 2.0% instead of 195%.

Steps to reproduce

$ cf push dora

$ kubectl -n eirini exec dora-s-491a8c664e-0 -- /home/vcap/app/stress --cpu 2

$ kubectl -n eirini top pod
NAME                  CPU(cores)   MEMORY(bytes)
dora-s-491a8c664e-0   1948m        25Mi

$ cf app dora
...
     state     since                  cpu    memory        disk          details
#0   running   2019-06-15T08:01:24Z   2.0%   25.1M of 1G   40.1M of 1G

$ cf tail dora -n1
Retrieving logs for app dora in org o / space s as admin...

   2019-06-17T09:04:25.79-0700 [dora/0] GAUGE cpu:2.000000 percentage disk:42000000.000000 bytes disk_quota:10.000000 bytes memory:26431488.000000 bytes memory_quota:10.000000 bytes

What was expected to happen

$ cf app dora
...
     state     since                  cpu    memory        disk          details
#0   running   2019-06-15T08:01:24Z   194.8%   25.1M of 1G   40.1M of 1G

What actually happened

$ cf app dora
...
     state     since                  cpu    memory        disk          details
#0   running   2019-06-15T08:01:24Z   2.0%   25.1M of 1G   40.1M of 1G

ServiceAccount per App?

As we scope out more of the routing and networking stuff, one thing that has come up is how to do app identity.

CF Diego has instance identity certs, which are different for each app instance.

These support route integrity and may also be used by applications to mTLS auth to system components like CredHub, and also for mTLS to other apps.

In the Eirini/K8s world, it isn't clear that we need identity per instance or pod. But it sure would be nice to have identity per application, in the form of a dedicated service account for each CF App (or potentially per process type?).

That identity could be used for authN for mTLS between apps and also for route integrity from the routing tier to applications, e.g. using Istio or something else that can mTLS to sidecars (e.g. #72).

What do y'all think?

cc @tcdowney @ndhanushkodi

Add ability to List/Get a Task

Description

The CF API <--> Eirini tasks sync loop doesn't work right now (see cloudfoundry/cf-for-k8s#412) because the process is unable to list all tasks that Eirini knows about and is unable to get a single task back from Eirini.

Steps to reproduce

This manifests via the Tasks Sync loop not being able to reconcile tasks. The user-facing reproduction steps are described in cloudfoundry/cf-for-k8s#412. Essentially if a task completion callback doesn't successfully reach the CF API the corresponding task will be stuck in the RUNNING state forever.

Suggested fix

Implement the List/Get Task endpoints; then the CF API team can implement the corresponding changes in the Cloud Controller's OPI Task Client.

Have org/space/app name annotations on app pods, not just stateful sets

Description

As a continuation of #83, it would be useful to have the cloudfoundry.org/application_name, space_name, and org_name annotations on the individual app pods, not just the statefulset. (For context, the Kubernetes filter we use in our Fluentd logging stack only pulls labels/annotations for the pod emitting the log.)

Steps to reproduce

Create an application using Eirini, and compare the annotations on its StatefulSet and the resulting pods.

What was expected to happen

The following annotations to be present on the pods:

  • cloudfoundry.org/application_name
  • cloudfoundry.org/space_name
  • cloudfoundry.org/org_name

What actually happened

None of the annotations are present on the pods.

On large clusters, eirini-controller, eirini-events, and eirini-task-reporter get OOMKilled

Description

On the CAPI team we've been running scale tests with the goal of validating that cf-for-k8s can run up to 2000 app instances. When large numbers of apps are running on the cluster, several Eirini components are repeatedly OOMKilled.

Steps to reproduce

  1. deploy cf-for-k8s with the following parameters
    • GKE cluster with 100+ nodes
    • Eirini components scaled up to 10 replicas
  2. deploy 2000 apps (10 per space)

What was expected to happen

Either the cluster would work... or it would fail in a clear way

What actually happened

Push succeeds, apps appear to be running as stateful sets
(Routing to pushed apps appears not to be working but we don't think that's related to Eirini, still tracking down the cause and will report back)

However, further deploys with kapp cannot complete successfully, because Eirini components with queues are repeatedly OOMKilled and then enter CrashLoopBackOff status. It's a little tricky to tell this has happened from e.g. k9s because they spend most of their time in CrashLoopBackOff and only briefly show OOMKilled status, so you need to be inspecting them directly for a minute or two, or logging your cluster events somewhere.

Additionally, there are minimal logs from the affected components.

Suggested fix (optional)

Provide guidance on increasing memory limits for clusters running 2000 app instances

Ideally, though, the queueing components would at least emit warnings when their queues get unreasonably long.

Additional information (optional)

We used the following script to generate load

#!/usr/bin/env bash

set -euo pipefail

: "${NUMBER_OF_APPS:?}"

# don't change this without also changing scale_suite_test.go
# must be power of 10 (1, 100, 1000, etc)
APPS_PER_SPACE=10
CF_API="https://api.scale-testing.k8s.capi.land"

function login() {
    cf api --skip-ssl-validation "$CF_API"
    CF_USERNAME=admin CF_PASSWORD=$(yq -r '.cf_admin_password' "./cf-values.yml") cf auth
}

function prepare_cf_foundation() {
    cf enable-feature-flag diego_docker
    cf update-quota default -r 3000 -m 3000G
}

function deploy_apps() {
    org_name_prefix="scale-tests"
    space_name_prefix="scale-tests"

    # we subtract 1 here because `seq` is inclusive on both sides
    number_of_org_spaces="$((NUMBER_OF_APPS / APPS_PER_SPACE - 1))"
    number_of_apps_per_org_space="$((NUMBER_OF_APPS / number_of_org_spaces - 1))"

    for n in $(seq 0 ${number_of_org_spaces})
    do
      org_name="${org_name_prefix}-${n}"
      space_name="${space_name_prefix}-${n}"
      cf create-org "${org_name}"
      cf create-space -o "${org_name}" "${space_name}"
      cf target -o "${org_name}" -s "${space_name}"

      for i in $(seq 0 ${number_of_apps_per_org_space})
      do
        name="bin-$((n * APPS_PER_SPACE + i))"
        echo $name
        cf push $name -m 128M -k 256M -i 2 -p ~/workspace/cf-acceptance-tests/assets/catnip -b paketo-buildpacks/go &
        # let's give CF time to push an app, sometimes it uses the next org/space if
        # don't give enough time
        sleep 5
      done
      wait
    done
}

function main() {
    curl -vvv --retry 300 -k "$CF_API"

    login
    prepare_cf_foundation
    deploy_apps
}

main

Crash events aren't appearing

Description

We modified the catnip sample app so that when we call it at ENDPOINT/sigterm/KILL the handler calls os.Exit(0). We don't see any crash events.

Steps to reproduce

  1. Made these changes in cf-for-k8s:
diff --git a/build/eirini/eirini-values.yml b/build/eirini/eirini-values.yml
index a3a529a..ab5175e 100644
--- a/build/eirini/eirini-values.yml
+++ b/build/eirini/eirini-values.yml
@@ -21,7 +21,15 @@ opi:
       caPath: "tls.ca"
 
   events:
-    enable: false
+    enable: true
+    tls:
+      capiClient:
+        secretName: "eirini-internal-tls-certs"
+        keyPath: "tls.key"
+        certPath: "tls.crt"
+      capi:
+        secretName: "eirini-internal-tls-certs"
+        caPath: "tls.ca"
 
   logs:
     enable: false
diff --git a/config/_ytt_lib/eirini/rendered.yml b/config/_ytt_lib/eirini/rendered.yml
index 5ddb993..b80542b 100644
--- a/config/_ytt_lib/eirini/rendered.yml
+++ b/config/_ytt_lib/eirini/rendered.yml
@@ -97,6 +97,41 @@ spec:
   - secret
   - downwardAPI
 ---
+apiVersion: policy/v1beta1
+kind: PodSecurityPolicy
+metadata:
+  annotations:
+    seccomp.security.alpha.kubernetes.io/allowedProfileNames: runtime/default
+    seccomp.security.alpha.kubernetes.io/defaultProfileName: runtime/default
+  name: eirini-events
+spec:
+  allowPrivilegeEscalation: false
+  fsGroup:
+    ranges:
+    - max: 65535
+      min: 1
+    rule: MustRunAs
+  hostIPC: false
+  hostNetwork: false
+  hostPID: false
+  privileged: false
+  readOnlyRootFilesystem: false
+  requiredDropCapabilities:
+  - ALL
+  runAsUser:
+    rule: MustRunAsNonRoot
+  seLinux:
+    rule: RunAsAny
+  supplementalGroups:
+    ranges:
+    - max: 65535
+      min: 1
+    rule: MustRunAs
+  volumes:
+  - configMap
+  - secret
+  - projected
+---
 apiVersion: v1
 kind: ServiceAccount
 metadata:
@@ -105,6 +140,12 @@ metadata:
 ---
 apiVersion: v1
 kind: ServiceAccount
+metadata:
+  name: eirini-events
+  namespace: cf-system
+---
+apiVersion: v1
+kind: ServiceAccount
 metadata:
   name: opi
   namespace: cf-system
@@ -251,6 +292,41 @@ rules:
   - use
 ---
 apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: eirini-events
+  namespace: cf-workloads
+rules:
+- apiGroups:
+  - ""
+  resources:
+  - pods
+  verbs:
+  - list
+  - watch
+- apiGroups:
+  - ""
+  resources:
+  - events
+  verbs:
+  - list
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: eirini-events-psp
+  namespace: cf-system
+rules:
+- apiGroups:
+  - policy
+  resourceNames:
+  - eirini-events
+  resources:
+  - podsecuritypolicies
+  verbs:
+  - use
+---
+apiVersion: rbac.authorization.k8s.io/v1
 kind: RoleBinding
 metadata:
   name: cf-workloads-app-rolebinding
@@ -298,6 +374,34 @@ subjects:
   name: eirini-secret-smuggler
   namespace: cf-system
 ---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: eirini-events
+  namespace: cf-workloads
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: Role
+  name: eirini-events
+subjects:
+- kind: ServiceAccount
+  name: eirini-events
+  namespace: cf-system
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: eirini-events-psp
+  namespace: cf-system
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: Role
+  name: eirini-events-psp
+subjects:
+- kind: ServiceAccount
+  name: eirini-events
+  namespace: cf-system
+---
 apiVersion: v1
 kind: Service
 metadata:
@@ -384,3 +488,61 @@ spec:
               - key: tls.ca
                 path: eirini.ca
               name: eirini-internal-tls-certs
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  annotations:
+    kbld.k14s.io/images: |
+      - Metas: null
+        URL: index.docker.io/eirini/event-reporter@sha256:a1c6d5dfe8961856d09a6f32169a2162e8b9b3e2b492c980183aa1fd8d064129
+  name: eirini-events
+  namespace: cf-system
+spec:
+  selector:
+    matchLabels:
+      name: eirini-events
+  template:
+    metadata:
+      labels:
+        name: eirini-events
+    spec:
+      containers:
+      - image: index.docker.io/eirini/event-reporter@sha256:a1c6d5dfe8961856d09a6f32169a2162e8b9b3e2b492c980183aa1fd8d064129
+        imagePullPolicy: Always
+        name: event-reporter
+        resources:
+          requests:
+            cpu: 15m
+            memory: 15Mi
+        volumeMounts:
+        - mountPath: /etc/eirini/config
+          name: config-map-volume
+        - mountPath: /etc/eirini/secrets
+          name: cf-secrets
+      dnsPolicy: ClusterFirst
+      securityContext:
+        runAsNonRoot: true
+      serviceAccountName: eirini-events
+      volumes:
+      - configMap:
+          items:
+          - key: events.yml
+            path: events.yml
+          name: eirini
+        name: config-map-volume
+      - name: cf-secrets
+        projected:
+          sources:
+          - secret:
+              items:
+              - key: tls.crt
+                path: cc.crt
+              - key: tls.key
+                path: cc.key
+              name: eirini-internal-tls-certs
+          - secret:
+              items:
+              - key: tls.ca
+                path: cc.ca
+              name: eirini-internal-tls-certs

and in cf-acceptance-tests:

diff --git a/assets/catnip/signal/signal.go b/assets/catnip/signal/signal.go
index 5fc05bf4..d0d611aa 100644
--- a/assets/catnip/signal/signal.go
+++ b/assets/catnip/signal/signal.go
@@ -1,11 +1,23 @@
 package signal
 
 import (
+	"fmt"
+	"io"
 	"net/http"
 	"os"
 )
 
 func KillHandler(res http.ResponseWriter, req *http.Request) {
-	currentProcess, _ := os.FindProcess(os.Getpid())
+	pid := os.Getpid()
+	fmt.Fprintf(os.Stdout, "About to kill process %d\n", pid)
+	io.WriteString(res, fmt.Sprintf("About to kill process %d\n", pid))
+	currentProcess, _ := os.FindProcess(pid)
 	currentProcess.Kill()
+	fmt.Fprintf(os.Stdout, "Did kill process %d\n", pid)
+	io.WriteString(res, fmt.Sprintf("Did kill process %d\n", pid))
+	fmt.Fprintf(os.Stdout, "About to exit process %d\n", pid)
+	io.WriteString(res, "About to exit...")
+	os.Exit(0)
+	fmt.Fprintf(os.Stdout, "Did exit process %d\n", pid)
+	io.WriteString(res, "Did exit")
 }
  2. Ran kapp deploy -a cf <(ytt -f config -f $values_file) -y

  3. z catnip

  4. cf push catnip

  5. curl catnip.DOMAIN/sigterm/KILL

  6. cf events catnip

What was expected to happen

I was expecting to see some events that contained the string
audit.apps.process.crash

What actually happened

Getting events for app catnip in org org / space space as admin...

time                          event                      actor             description
2020-06-25T16:32:37.00-0700   audit.app.droplet.create   [email protected]
2020-06-25T16:31:56.00-0700   audit.app.update           [email protected]   state: STARTED
2020-06-25T16:31:56.00-0700   audit.app.build.create     [email protected]
2020-06-25T16:31:56.00-0700   audit.app.update           [email protected]   state: STOPPED
2020-06-25T16:31:49.00-0700   audit.app.upload-bits      [email protected]
2020-06-25T16:31:48.00-0700   audit.app.update           [email protected]   disk_quota: 1024, instances: 1, memory: 1024
2020-06-25T16:29:51.00-0700   audit.app.droplet.create   [email protected]
2020-06-25T16:29:08.00-0700   audit.app.update           [email protected]   state: STARTED
2020-06-25T16:29:08.00-0700   audit.app.build.create     [email protected]
2020-06-25T16:29:07.00-0700   audit.app.update           [email protected]   state: STOPPED
2020-06-25T16:29:01.00-0700   audit.app.upload-bits      [email protected]
2020-06-25T16:28:58.00-0700   audit.app.update           [email protected]   disk_quota: 1024, instances: 1, memory: 1024
2020-06-25T16:18:40.00-0700   audit.app.droplet.create   [email protected]
2020-06-25T16:18:03.00-0700   audit.app.update           [email protected]   state: STARTED
2020-06-25T16:18:03.00-0700   audit.app.build.create     [email protected]
2020-06-25T16:17:56.00-0700   audit.app.upload-bits      [email protected]
2020-06-25T16:17:53.00-0700   audit.app.map-route        [email protected]
2020-06-25T16:17:53.00-0700   audit.app.create           [email protected]   instances: 1, state: STOPPED, environment_json: [PRIVATE DATA HIDDEN]

Additional information (optional)

The catnip app without os.Exit(1) doesn't seem to actually exit.

Routing enhancements

We're from the Networking team and are investigating how we might make Routing work better for Eirini. Things such as:

  • support route services
  • support TCP routing
  • support per-instance routing
  • integrate with Istio or other service mesh tech

We want to check some assumptions we have and ask for guidance about where we might make enhancements.

Things we think are true:

  • Eirini Route Emitter today discovers route info from annotations on statefulsets/pods that look like:

    '[{"hostname":"dora.apps.lagunabeach.cf-app.com","port":8080},{"hostname":"proxy-and-dora.apps.lagunabeach.cf-app.com","port":8080}]'
    
  • this JSON doesn't currently carry data about route services, TCP routing, and other stuff that is present in the Diego DesiredLRP Routes field

Is this correct?

Options we're considering:

  1. Enhance Cloud Controller and/or Eirini to put the full Diego Routes data in the annotations
  2. Enhance Cloud Controller and/or Eirini to put Routing info (e.g. CAPI Route and RouteMapping) into K8s API as Custom Resources
  3. ???

Do you have opinions about which of these options we should pursue? Any pointers about where we might start if we wanted to open a pull-request?

cc @rosenhouse

Making automountServiceAccountToken property configurable

Description

While upgrading Istio to 1.6, we discovered that the Istio sidecars depend on service account tokens being mounted in environments where third-party JWT tokens aren't supported (like kind). Currently this value is hard-coded to false. We were hoping this could be made configurable, so that in environments like kind, cf-for-k8s has the option to set it to true.

Suggested fix (optional)

There were some ideas discussed in this thread

Another suggestion was to separate the service account used by the apps from the service account used by the "statefulset desirer" here. If there are two service accounts, both would need the automountServiceAccountToken property to be configurable.
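For reference, the field in question on the pod spec — a sketch of what a configurable default might render to (the service account name is illustrative):

```yaml
spec:
  template:
    spec:
      # Currently hard-coded to false in Eirini; the request is to make
      # this value operator-configurable (e.g. true on kind clusters).
      automountServiceAccountToken: true
      serviceAccountName: eirini-app   # illustrative name
```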

cc my pair @jenspinney
