
remote-controller's Introduction

Lagoon Remote Controller

This project comprises the controllers responsible for handling the Kubernetes and OpenShift build, deploy, and removal of environments for Lagoon. It also handles Lagoon tasks triggered via the Lagoon UI, as well as more advanced tasks that Lagoon can leverage.

Install

See lagoon-charts

Usage

The controllers can start with or without message queue support.

With MQ

This is the preferred way to install the controllers: they read messages from dedicated queues that are sent from Lagoon. The received message contains everything the LagoonBuild or LagoonTask spec needs to start doing the work in the destination cluster.

Without MQ

This is handy for testing scenarios: a K3D or KinD cluster can be started locally and the controllers installed into it. A user can then craft a LagoonBuild or LagoonTask spec manually and apply it to the local cluster. There is currently no documentation for how to do this; we may release more information after some more testing has been done.

remote-controller's People

Contributors

bomoko, cdchris12, cgoodwin90, rocketeerbkw, schnitzel, seanhamlin, shreddedbacon, smlx, tobybellwood


remote-controller's Issues

possibly out of sync messages

There have been some reports of the final message potentially being out of sync: the completion message is sent before the last running message, so the status gets reset to running while retaining all the conditions of being complete.
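
A minimal sketch of a guard against that, assuming build status values like Running, Complete, Failed, and Cancelled (the names here are illustrative): a terminal status simply refuses to be overwritten by a late running update.

// Sketch only: refuse a transition from a terminal status back to Running so
// a late "running" message cannot reset a build that already completed.
package status

var terminal = map[string]bool{
	"Complete":  true,
	"Failed":    true,
	"Cancelled": true,
}

// allowTransition reports whether an incoming status may replace the current one.
func allowTransition(current, incoming string) bool {
	if terminal[current] && incoming == "Running" {
		return false
	}
	return true
}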

Alert when multiple Builds fail at the same time

Single failed builds are not very informative to alert on, since code issues can also cause failed deployments.
However, we could implement logic that recognises when all builds started in the last 15 minutes have failed, which points to an infrastructure issue rather than an individual environment issue.
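
A rough sketch of that check, with a hypothetical BuildResult type standing in for whatever record the controller keeps about recent builds:

// Sketch only: report a likely infrastructure problem when every build
// started within the window has failed. BuildResult is a hypothetical type.
package alerting

import "time"

type BuildResult struct {
	StartedAt time.Time
	Failed    bool
}

// allRecentBuildsFailed returns true only if at least one build started in the
// window and every one of them failed.
func allRecentBuildsFailed(builds []BuildResult, window time.Duration, now time.Time) bool {
	started := 0
	for _, b := range builds {
		if now.Sub(b.StartedAt) > window {
			continue
		}
		started++
		if !b.Failed {
			return false
		}
	}
	return started > 0
}

Called with a 15 minute window, this only fires when at least one build started in the window and none of them succeeded.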

build priority cancellations

Build cancellation should take build priorities into consideration.

If a high priority build is pending and a lower priority build is triggered, the controller should not cancel the higher priority build.

Write tests

- start a KinD
- start a local registry
- start a local mariadb
- start a local rabbitmq
- start a local mq listener (need to build this)
- install dbaas-operator
- install this operator
- deploy an example lagoon project using the latest amazeeio/kubectl-build-deploy-dind:x image

Check if namespace is being deleted

If a namespace is being deleted, don't try to create the build for this environment; fail the build instead (it won't work anyway, so failing it straight away has the same result).

On the failure, we can inject a log entry with some potential debugging information.
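
A minimal sketch of that check, assuming a controller-runtime client; the helper name is illustrative:

// Sketch only: a terminating namespace has a non-zero deletionTimestamp, so a
// build created in it would never run and can be failed immediately.
package builds

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func namespaceTerminating(ctx context.Context, c client.Client, name string) (bool, error) {
	ns := &corev1.Namespace{}
	if err := c.Get(ctx, types.NamespacedName{Name: name}, ns); err != nil {
		return false, err
	}
	return !ns.DeletionTimestamp.IsZero(), nil
}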

Refactor build monitor to allow incremental progress updates to be sent to logs exchange

uselagoon/lagoon#2862 implements logs2s3, and solves part of the problem of uselagoon/lagoon#2835

To fully solve it, the build monitor needs to be able to periodically send updates, or send updates whenever the lagoon.sh/buildStep label transitions between steps of the build

This gist provides a rough idea of how the buildStep patching will work https://gist.github.com/shreddedbacon/959a21a33cd2d95db5b1de1dc155047d
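
A small sketch of the label comparison that could drive those incremental updates; everything apart from the lagoon.sh/buildStep key is illustrative:

// Sketch only: detect a lagoon.sh/buildStep transition between two versions of
// the build pod, e.g. inside an update event handler or predicate.
package monitor

import corev1 "k8s.io/api/core/v1"

const buildStepLabel = "lagoon.sh/buildStep"

// buildStepChanged returns the new step and true when the label moved to a new value.
func buildStepChanged(oldPod, newPod *corev1.Pod) (string, bool) {
	oldStep := oldPod.Labels[buildStepLabel]
	newStep := newPod.Labels[buildStepLabel]
	if newStep != "" && newStep != oldStep {
		return newStep, true
	}
	return "", false
}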

Handle active/standby entirely in Lagoon

It would be good to have one less external system to maintain. Initial thoughts are to deprecate dioscuri (https://github.com/amazeeio/dioscuri) and handle the route migration entirely inside the task image

Initially, it would simply involve having the controller inspect the ingress migration payload to extract the source and destination namespaces, and then add the source namespace's lagoon-deployer service account to the destination namespace's lagoon-deployer-admin rolebinding. This would grant permission to make the required changes in both namespaces, and could be handled here (https://github.com/uselagoon/remote-controller/blob/main/handlers/misctask_handler.go#L126)

All the existing code from dioscuri for handling the migrations can be bundled into the activestandby task image (https://github.com/uselagoon/lagoon/tree/main/taskimages/activestandby)
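
A hedged sketch of the rolebinding change described above, using the lagoon-deployer service account and lagoon-deployer-admin rolebinding names from this issue; this is not the project's actual implementation:

// Sketch only: add the source namespace's lagoon-deployer service account to
// the destination namespace's lagoon-deployer-admin rolebinding.
package activestandby

import (
	"context"

	rbacv1 "k8s.io/api/rbac/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func grantCrossNamespaceDeployer(ctx context.Context, c client.Client, sourceNS, destNS string) error {
	rb := &rbacv1.RoleBinding{}
	if err := c.Get(ctx, types.NamespacedName{Namespace: destNS, Name: "lagoon-deployer-admin"}, rb); err != nil {
		return err
	}
	subject := rbacv1.Subject{Kind: "ServiceAccount", Name: "lagoon-deployer", Namespace: sourceNS}
	for _, s := range rb.Subjects {
		if s == subject {
			return nil // already granted
		}
	}
	rb.Subjects = append(rb.Subjects, subject)
	return c.Update(ctx, rb)
}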

Error on build repeat

I re-ran the Lagoon test suite after a build failure and saw this error:

2020-09-16T12:37:31.164Z	ERROR	controller-runtime.controller	Reconciler error	{"controller": "lagoonbuild", "request": "ci-features-control-k8s-lagoon-type-override/lagoon-build-2g7rk", "error": "pods \"lagoon-build-2g7rk\" already exists"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
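
One way to tolerate this, sketched below under the assumption that an existing pod with the same name is simply the pod from the earlier attempt, is to treat an AlreadyExists error from pod creation as a no-op rather than a reconcile error:

// Sketch only: an existing pod with the same name is the build pod from a
// previous attempt, so don't surface it as a reconcile error.
package builds

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func createBuildPod(ctx context.Context, c client.Client, pod *corev1.Pod) (ctrl.Result, error) {
	if err := c.Create(ctx, pod); err != nil {
		if apierrors.IsAlreadyExists(err) {
			// the pod is already there from an earlier run; nothing to do
			return ctrl.Result{}, nil
		}
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}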

how to abort a build

I ran into an issue where I wanted to abort a build (the build was stuck waiting for a deployment with a CrashLoopBackOff pod to finish, which takes a long time).

There was already a new build scheduled from Lagoon (the dashboard showed it as new).

I then tried:

  1. deleting the build pod: nothing happened
  2. deleting the LagoonBuild object: also nothing happened
  3. only after I restarted the whole lagoon-build-deploy controller was the new build picked up

Did I do something wrong? Can we make the controller mark the build as failed after the build pod is deleted?
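
A sketch of one possible behaviour, reusing the lagoon.sh/buildStatus label visible on LagoonBuild objects later in this document (the status values are assumptions): if the build pod for a build that still claims to be running has disappeared, mark the build as failed so queued builds can proceed.

// Sketch only: if the build pod is gone but the LagoonBuild still claims to be
// running, flip its status label to Failed.
package builds

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const buildStatusLabel = "lagoon.sh/buildStatus"

func failBuildIfPodGone(ctx context.Context, c client.Client, build client.Object) error {
	pod := &corev1.Pod{}
	err := c.Get(ctx, types.NamespacedName{Namespace: build.GetNamespace(), Name: build.GetName()}, pod)
	if !apierrors.IsNotFound(err) {
		return err // pod still exists (err is nil) or a real error occurred
	}
	labels := build.GetLabels()
	if labels[buildStatusLabel] == "Running" {
		labels[buildStatusLabel] = "Failed"
		build.SetLabels(labels)
		return c.Update(ctx, build)
	}
	return nil
}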

AKS: during decommissioning of a Lagoon environment, delete all PVCs before the namespace

Azure AKS has a strange bug: when you delete a namespace it tries to delete the PVCs in it, but unfortunately a secret that is used to provision the PVCs is deleted before the PVCs themselves, so the PVC provisioner cannot remove them (slow clap).

Therefore, if we are on AKS and we're decommissioning a Lagoon environment, we need to: remove all Deployments/StatefulSets/ReplicaSets/DaemonSets (which removes all Pods), then remove the PVCs, wait for all PVCs to be removed, and then remove the namespace.
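
A hedged sketch of that ordering with a controller-runtime client; the poll interval and timeout are placeholders:

// Sketch only: delete workloads, then PVCs, wait until the PVCs are gone, and
// only then delete the namespace.
package decommission

import (
	"context"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func teardownEnvironment(ctx context.Context, c client.Client, ns string) error {
	inNS := client.InNamespace(ns)
	for _, obj := range []client.Object{
		&appsv1.Deployment{}, &appsv1.StatefulSet{}, &appsv1.DaemonSet{}, &appsv1.ReplicaSet{},
	} {
		if err := c.DeleteAllOf(ctx, obj, inNS); err != nil {
			return err
		}
	}
	if err := c.DeleteAllOf(ctx, &corev1.PersistentVolumeClaim{}, inNS); err != nil {
		return err
	}
	// block until every PVC in the namespace has actually been removed
	if err := wait.PollImmediate(5*time.Second, 10*time.Minute, func() (bool, error) {
		pvcs := &corev1.PersistentVolumeClaimList{}
		if err := c.List(ctx, pvcs, inNS); err != nil {
			return false, err
		}
		return len(pvcs.Items) == 0, nil
	}); err != nil {
		return err
	}
	return c.Delete(ctx, &corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: ns}})
}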

OpenShift task creation fails with `an empty namespace may not be set during creation`

I merged #25 and released it under amazeeio/lagoon-builddeploy:v0.1.14-pr25, but we now have an issue when creating a task via the controller on an OpenShift cluster:

  • the controller is running in lagoon-build-deploy on ch1
  • it deploys to drupal-example-1, also on ch1
2021-06-28T20:19:02.575Z	INFO	controllers.LagoonTask	Checking task pod for: lagoon-task-10692-ngpw1d	{"lagoontask": "drupal-example-1-master/lagoon-task-10692-ngpw1d"}
2021-06-28T20:19:02.587Z	INFO	controllers.LagoonTask	Creating task pod for: lagoon-task-10692-ngpw1d	{"lagoontask": "drupal-example-1-master/lagoon-task-10692-ngpw1d"}
2021-06-28T20:19:02.587Z	INFO	controllers.LagoonTask	Unable to create task pod for project drupal-example-1, environment master: an empty namespace may not be set during creation	{"lagoontask": "drupal-example-1-master/lagoon-task-10692-ngpw1d"}
2021-06-28T20:19:02.587Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "lagoontask", "request": "drupal-example-1-master/lagoon-task-10692-ngpw1d"}

Controller injects wrong harbor credentials into container

With the recent fix to append Harbor credentials instead of replacing them (#56),
there is now a possibility that the controller injects the wrong credentials into the build container:
https://github.com/amazeeio/lagoon-kbd/blob/12419865e6ffc971d73befcc2b251420303b781f/controllers/lagoonbuild_controller.go#L451-L462
Basically, the code above loads the registries defined in the lagoon-internal-registry-secret secret and injects them into the build container. If there are multiple registries defined in that secret (as can now happen), it's possible that the wrong registry is injected.

We should probably filter for the Harbor registry actually defined for this controller and only inject that one into the build pods.
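
A sketch of that filtering: decode the secret's .dockerconfigjson payload and keep only the entry for the Harbor host this controller is configured for (the function and type names are illustrative):

// Sketch only: reduce a multi-registry dockerconfigjson to just the entry for
// the Harbor host this controller is configured for.
package registry

import (
	"encoding/json"
	"fmt"
)

type dockerConfig struct {
	Auths map[string]json.RawMessage `json:"auths"`
}

func filterRegistryAuth(dockerConfigJSON []byte, harborHost string) ([]byte, error) {
	cfg := dockerConfig{}
	if err := json.Unmarshal(dockerConfigJSON, &cfg); err != nil {
		return nil, err
	}
	auth, ok := cfg.Auths[harborHost]
	if !ok {
		return nil, fmt.Errorf("no auth for %s in lagoon-internal-registry-secret", harborHost)
	}
	return json.Marshal(dockerConfig{Auths: map[string]json.RawMessage{harborHost: auth}})
}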

Support OpenShift 3.11

Add support for OpenShift 3.11.

Need to be able to create ProjectRequests and handle the differences between the OpenShift build template and the Kubernetes build template variables, as there are some differences in route/ingress naming and a few other bits.

Support for multiple controllers in a cluster

Being able to run multiple controllers, so that each can talk to a different Lagoon, is required.

To support this, controllers should add labels/annotations and OwnerReferences to the objects they create and watch. This would allow multiple controllers to run without builds or tasks from one controller being sent to a different Lagoon.

In addition to this, a namespace prefix should be supported that gets added to all environments when they are created.
Having namespace prefixes allows two or more controllers to create namespaces without having to worry about clashing names. This already sort of exists in the controller, but would need to be refined and exposed as a flag/variable definition.
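
A small sketch of the identity-label idea, reusing the lagoon.sh/controller label already visible on objects elsewhere in this document:

// Sketch only: stamp created objects with this controller's identity and skip
// anything that carries a different identity.
package identity

import "sigs.k8s.io/controller-runtime/pkg/client"

const controllerLabel = "lagoon.sh/controller"

func labelForController(obj client.Object, controllerName string) {
	labels := obj.GetLabels()
	if labels == nil {
		labels = map[string]string{}
	}
	labels[controllerLabel] = controllerName
	obj.SetLabels(labels)
}

func ownedByController(obj client.Object, controllerName string) bool {
	return obj.GetLabels()[controllerLabel] == controllerName
}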

Handle AMQP not being available on startup

If RabbitMQ/AMQP is not available when the operator starts, it won't attempt to re-connect.

This isn't a problem if the broker disappears after the operator is already running, as it will try to reconnect based on the defined reconnect interval.

We need to build in a way to retry the initial connection, the same as when the broker disconnects during operation.
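
A generic sketch of such a startup retry loop; the connect function, interval, and attempt count are placeholders for whatever the message handler actually uses:

// Sketch only: retry the initial broker connection instead of giving up when
// RabbitMQ is not yet available.
package mq

import (
	"fmt"
	"log"
	"time"
)

func connectWithRetry(connect func() error, interval time.Duration, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = connect(); err == nil {
			return nil
		}
		log.Printf("broker unavailable (%v), retrying in %s", err, interval)
		time.Sleep(interval)
	}
	return fmt.Errorf("could not connect to broker after %d attempts: %w", attempts, err)
}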

Fail build on any errors produced by the controller

Currently, if a step in provisioning the build by the controller fails, it can get stuck in a state where it keeps retrying.

Instead of returning an error and retrying, the build should fail and report back to Lagoon that it has failed.

Cancellation changes

If a cancellation is triggered on an already complete job that hadn't updated the API, try to collect the state from the build resource and send that back; otherwise, fall back to the cancelled state.

controller SIGSEGVs on misc task

image running: amazeeio/lagoon-builddeploy:v0.1.6

error message:

2021-01-05T21:40:31.839Z	INFO	controller-runtime.metrics	metrics server is starting to listen	{"addr": "127.0.0.1:8080"}
2021-01-05T21:40:31.840Z	INFO	setup	starting messaging handler
2021-01-05T21:40:31.841Z	INFO	setup	starting controllers
2021-01-05T21:40:31.841Z	INFO	setup	starting manager
2021-01-05T21:40:31.841Z	INFO	controller-runtime.manager	starting metrics server	{"path": "/metrics"}
2021-01-05T21:40:32.104Z	INFO	handlers.LagoonTasks	Listening for lagoon-tasks:test8.amazee.io:builddeploy
2021-01-05T21:40:32.104Z	INFO	handlers.LagoonTasks	Listening for lagoon-tasks:test8.amazee.io:remove
2021-01-05T21:40:32.104Z	INFO	handlers.LagoonTasks	Listening for lagoon-tasks:test8.amazee.io:jobs
2021-01-05T21:40:32.104Z	INFO	handlers.LagoonTasks	Listening for lagoon-tasks:test8.amazee.io:misc
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x124dd80]

goroutine 258 [running]:
github.com/amazeeio/lagoon-kbd/handlers.(*Messaging).Consumer.func6(0x7fba39f67a68, 0xc0000c05d8)
	/workspace/handlers/message_queue.go:286 +0x5e0
github.com/cheshir/go-mq.(*worker).Run(0xc0002301c0, 0xc0000b4300)
	/go/pkg/mod/github.com/shreddedbacon/[email protected]/consumer.go:107 +0x156
created by github.com/cheshir/go-mq.(*consumer).consume
	/go/pkg/mod/github.com/shreddedbacon/[email protected]/consumer.go:62 +0x7a

message in the queue:

{
  "misc": {
    "miscResource": "eyJhcGlWZXJzaW9uIjoiYmFja3VwLmFwcHVpby5jaC92MWFscGhhMSIsImtpbmQiOiJSZXN0b3JlIiwibWV0YWRhdGEiOnsibmFtZSI6InJlc3RvcmUtNjgxM2RmNCJ9LCJzcGVjIjp7InNuYXBzaG90IjoiNjgxM2RmNDc0YWNiZTUwMjI1ZTk1ZDFhMTMxMDU0YTVhMmIzMTM2YzVmNTIyMGU2ZTM5YzE2ODczODdiOTFjZiIsInJlc3RvcmVNZXRob2QiOnsiczMiOnt9fSwiYmFja2VuZCI6eyJzMyI6eyJidWNrZXQiOiJiYWFzLWRydXBhbC1leGFtcGxlLXRlc3Q4In0sInJlcG9QYXNzd29yZFNlY3JldFJlZiI6eyJrZXkiOiJyZXBvLXB3IiwibmFtZSI6ImJhYXMtcmVwby1wdyJ9fX19"
  },
  "key": "kubernetes:restic:backup:restore",
  "environment": {
    "name": "master",
    "openshiftProjectName": "drupal-example-test8-master"
  },
  "project": {
    "name": "drupal-example-test8"
  },
  "advancedTask": {}
}

decoded misc.miscResource:

{
  "apiVersion": "backup.appuio.ch/v1alpha1",
  "kind": "Restore",
  "metadata": {
    "name": "restore-6813df4"
  },
  "spec": {
    "snapshot": "6813df474acbe50225e95d1a131054a5a2b3136c5f5220e6e39c1687387b91cf",
    "restoreMethod": {
      "s3": {}
    },
    "backend": {
      "s3": {
        "bucket": "baas-drupal-example-test8"
      },
      "repoPasswordSecretRef": {
        "key": "repo-pw",
        "name": "baas-repo-pw"
      }
    }
  }
}

Manually cancelled builds show wrong message

When manually cancelling a build, the controller cancels the build properly, but another process then comes in and changes the cancelled log message to one indicating it was cancelled by a newer build, which is not the case.

========================================
Build cancelled
========================================
This build was cancelled as a newer build was triggered.

Don't use camelcase for flags

I just realised the backup retention/default flags use camelCase and not kebab-case like other flags. I should have picked this up before I merged it but here we are.

	flag.StringVar(&backupDefaultSchedule, "backupDefaultSchedule", "M H(22-2) * * *",
		"The default backup schedule for all projects on this cluster.")
	flag.IntVar(&backupDefaultMonthlyRetention, "backupDefaultMonthlyRetention", 1,
		"The number of monthly backups k8up should retain after a prune operation.")
	flag.IntVar(&backupDefaultWeeklyRetention, "backupDefaultWeeklyRetention", 6,
		"The number of weekly backups k8up should retain after a prune operation.")
	flag.IntVar(&backupDefaultDailyRetention, "backupDefaultDailyRetention", 7,
		"The number of daily backups k8up should retain after a prune operation.")
	flag.IntVar(&backupDefaultHourlyRetention, "backupDefaultHourlyRetention", 0,
		"The number of hourly backups k8up should retain after a prune operation.")

Deploy fails immediately on ImagePullBackOff

I had a deploy fail almost immediately when the kubectl-build-deploy-dind image went into ImagePullBackOff. Here's what happened:

  1. Started several deploys, got success back from Lagoon API.
  2. Lagoon build pods appeared.
  3. One build pod went into ImagePullBackOff (the others started running).
  4. The ImagePullBackOff build pod disappeared.
  5. Deploy shown as failed in Lagoon dashboard:

[Screenshot from 2021-05-25 16-25-09: the deploy shown as failed in the Lagoon dashboard]

I would have expected the pod to eventually start running instead of failing the deploy. The other builds started at the same time ran fine, so the image pull error may have just been a transient network issue?

I ran deploy on this environment a second time and it ran through fine.

Deleting an env with no namespace does not delete in lagoon

When deleting an environment in Lagoon whose namespace has already been deleted in Kubernetes, the controller does not respond to Lagoon appropriately to let it know the namespace doesn't exist.

This causes Lagoon to not actually delete the environment.
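
A hedged sketch of the missing branch: when the namespace is already gone, treat the removal as successful and still reply to Lagoon. sendRemoveConfirmation is a hypothetical stand-in for whatever sends the reply message:

// Sketch only: an already-deleted namespace should be reported back to Lagoon
// as a completed removal, not silently ignored.
package remove

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// sendRemoveConfirmation is hypothetical; it stands in for the reply to Lagoon.
func sendRemoveConfirmation(project, environment string) error { return nil }

func removeEnvironment(ctx context.Context, c client.Client, project, environment, namespace string) error {
	ns := &corev1.Namespace{}
	err := c.Get(ctx, types.NamespacedName{Name: namespace}, ns)
	if apierrors.IsNotFound(err) {
		// nothing left in the cluster; still confirm the removal to Lagoon
		return sendRemoveConfirmation(project, environment)
	}
	if err != nil {
		return err
	}
	if err := c.Delete(ctx, ns); err != nil {
		return err
	}
	return sendRemoveConfirmation(project, environment)
}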

Openshift 4 Permission error when patching build pods

Error from server (Forbidden): pods "lagoon-build-h46jdp" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "pcap-dedicated-admins": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "splunkforwarder": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]

The controller should start the build pod using a different service account, preferably the one that owns the token being mounted into the pod

Metrics

It would be good to expose metrics for Prometheus.
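
A minimal sketch of what that could look like using controller-runtime's metrics registry, which already backs the /metrics endpoint seen in the controller logs above; the metric names are illustrative:

// Sketch only: a pair of counters registered with controller-runtime's
// Prometheus registry so they show up on the existing /metrics endpoint.
package monitoring

import (
	"github.com/prometheus/client_golang/prometheus"
	crmetrics "sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
	buildsStarted = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "lagoon_builds_started_total",
		Help: "Number of Lagoon builds started by this controller.",
	})
	buildsFailed = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "lagoon_builds_failed_total",
		Help: "Number of Lagoon builds that ended in a failed state.",
	})
)

func init() {
	crmetrics.Registry.MustRegister(buildsStarted, buildsFailed)
}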

Rootless builds

While working on uselagoon/lagoon#2481 I noticed that the build pods are running as root. I just wanted to start a discussion about the best way to update them to run as a non-root user.

Do you see any issues with running as a different user? And where would be the best place to define a securityContext on the build pod? Hard-coding it in lagoon-kbd would be one way, but is that the best way to go?
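
One possible direction, sketched here with placeholder UID/GID values rather than anything agreed on, is to set a pod-level securityContext on the build pod:

// Sketch only: run the build pod as a fixed non-root user. The IDs are
// placeholders; the right values depend on the build image.
package builds

import corev1 "k8s.io/api/core/v1"

func setNonRootSecurityContext(pod *corev1.Pod) {
	uid := int64(1000)
	gid := int64(1000)
	nonRoot := true
	pod.Spec.SecurityContext = &corev1.PodSecurityContext{
		RunAsUser:    &uid,
		RunAsGroup:   &gid,
		RunAsNonRoot: &nonRoot,
		FSGroup:      &gid,
	}
}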

check if pull secret actually includes robot account for given harbor during `CreateOrRefreshRobot`

While testing some more edge cases I found another possible issue; addressing it would make the whole system more resilient against edge cases.

Assuming:

  • There was a successful deployment happening where the robot account was added to the secret lagoon-internal-registry-secret
  • The robot account in harbor is NOT deactivated, deleted or expired
  • Something (a bot, a human) removes the robot account credentials from the secret lagoon-internal-registry-secret (this is the edge case, but it is possible that this happens)

Then, during CreateOrRefreshRobot(), the code does not realise that the robot account is missing from lagoon-internal-registry-secret and just continues, causing the deployment to fail.

So my suggestion is that we should check whether the lagoon-internal-registry-secret secret contains a robot account for the current Harbor and, if not, force-recreate the robot account.
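
A sketch of that check: decode the secret's .dockerconfigjson and verify there is a non-empty auth entry for the current Harbor host; if this returned false, the robot account would be force-recreated:

// Sketch only: does lagoon-internal-registry-secret actually contain
// credentials for the given Harbor host?
package harborcheck

import (
	"encoding/json"

	corev1 "k8s.io/api/core/v1"
)

type registryAuth struct {
	Auth string `json:"auth"`
}

type registryConfig struct {
	Auths map[string]registryAuth `json:"auths"`
}

func secretHasRobotFor(secret *corev1.Secret, harborHost string) bool {
	cfg := registryConfig{}
	if err := json.Unmarshal(secret.Data[corev1.DockerConfigJsonKey], &cfg); err != nil {
		return false
	}
	entry, ok := cfg.Auths[harborHost]
	return ok && entry.Auth != ""
}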

`pullrequest.number` disappears, causing issues to deploy

I'm currently testing lagoon-master with lagoon-kbd against a Kubernetes cluster and found the following issue:

Lagoon-master creates a RabbitMQ message like this:

{
  "metadata": {
    "name": "lagoon-build-qzzyhp",
    "namespace": "lagoon"
  },
  "spec": {
    "build": {
      "type": "pullrequest",
      "image": {},
      "ci": "false"
    },
    "branch": {
      "name": "pr-78"
    },
    "pullrequest": {
      "headBranch": "pr-testing",
      "headSha": "14627d2b6061ed1a1a1b8a692963145271a9a5d8",
      "baseBranch": "master",
      "baseSha": "229ec3e57e220114e1c6ab60dde07a8ea12b873b",
      "title": "Update system.site.yml - update",
      "number": "78"
    },
    "project": {
      "name": "drupal-example-test1-controller",
      "gitUrl": "[email protected]:amazeeio/drupal-example.git",
      "uiLink": "https://ui-lagoon-master.ch.amazee.io/projects/drupal-example-test1-controller/drupal-example-test1-controller-pr-78/deployments/lagoon-build-qzzyhp",
      "environment": "pr-78",
      "environmentType": "development",
      "productionEnvironment": "master2",
      "standbyEnvironment": "master",
      "subfolder": "",
      "routerPattern": "pr-78.drupal-example-test1-controller.test1.amazee.io",
      "deployTarget": "test1.amazee.io",
      "projectSecret": "7d6c71a7f475e53b96142e1ce7b141e80f8de59365e17b964a6085fd4bd96f73",
      "registry": "registry.lagoon.svc:5000",
      "monitoring": {
        "contact": "",
        "statuspageID": "null"
      },
      "variables": {
        "project": "W3sibmFtZSI6IklOVEVSTkFMX1JFR0lTVFJZX1VSTCIsInZhbHVlIjoiaHR0cHM6Ly9oYXJib3ItbmdpbngtbGFnb29uLW1hc3Rlci5jaC5hbWF6ZWUuaW8iLCJzY29wZSI6ImludGVybmFsX2NvbnRhaW5lcl9yZWdpc3RyeSJ9LHsibmFtZSI6IklOVEVSTkFMX1JFR0lTVFJZX1VTRVJOQU1FIiwidmFsdWUiOiJyb2JvdCRkcnVwYWwtZXhhbXBsZS10ZXN0MS1jb250cm9sbGVyLTE2MDc5MTEyNTMiLCJzY29wZSI6ImludGVybmFsX2NvbnRhaW5lcl9yZWdpc3RyeSJ9LHsibmFtZSI6IklOVEVSTkFMX1JFR0lTVFJZX1BBU1NXT1JEIiwidmFsdWUiOiJleUpoYkdjaU9pSlNVekkxTmlJc0luUjVjQ0k2SWtwWFZDSjkuZXlKbGVIQWlPakUyTlRFeE1URXlOVE1zSW1saGRDSTZNVFl3TnpreE1USTFNeXdpYVhOeklqb2lhR0Z5WW05eUxYUnZhMlZ1TFdSbFptRjFiSFJKYzNOMVpYSWlMQ0pwWkNJNk16SXhOVGM1TENKd2FXUWlPakV4TkRNc0ltRmpZMlZ6Y3lJNlczc2lVbVZ6YjNWeVkyVWlPaUl2Y0hKdmFtVmpkQzh4TVRRekwzSmxjRzl6YVhSdmNua2lMQ0pCWTNScGIyNGlPaUp3ZFhOb0lpd2lSV1ptWldOMElqb2lJbjFkZlEucG43c3VDTlBUYTl3bmhuY1J1cVozZEFIQ09ka3RTWXpQdmo5TFlzbHhFWm1hd25uMWFwSU9VZGdGVHZ0YVk1cDRwbGZTQlM0WlZndm81U0tHWEJIOWprSVFpWjFZVkx5dnZPOU43YWROVi1sakNTaXNaSDNGaGlmZHRQSUkySkxtUTV6QldOOUhSUjAzcmZKMkdCeDVndkd0Uy1aUkVsSFlFRUpaeWdTVXRvMGk3MzFPWmhmSGVlRTE1UW83cEh2Q2hxWkhhTXJsOS0wamZGN01RTTltNGN3MGNpWllUa0NhMkxFQkt6aWVvT1hnN3IweVlSd2ZFeGlGTWJ2bjBJZTV3YXFmZEU5UEZTR2FYWTRmcFJIX0Q3dlZFLXdBN3FmMGYyWktIS2s0cGY5dlpQd1JXLWRQU0t1RGl0blhycC1qZ3VzR1R6UEhTczFEdDlLZDhxb1dZYTNZbHVVRnVCcndUeDJndFdVb2g1NnRibjlHMGloTG5ybUwyZUNUUzRBRzNWbW1kUWRhUGN6WTZFejhGdVh6SW9UVHhmSUxjTlNHWGljM00yN1hGX2RON2d0OWpQaTVfWEJMekNWYVRQNHZ6TDN4eVByVHV5dWNseWFVR0JkNl9GelRIeWRwNXlJczVRS2czREhCSm9ielB4ZzNCZEJqbTNXNmd5blh2bmNadWU5UmVqMmdHb3lNa0xPT21QS2VWcnhXNm1uWmptd2l4blR1cE5vVzA2Vktqek9ZRWRHenkwVmJKQnBjVk03Y2xCQ1BYb2x0TElGQ3hsRWVib3lzV1NYVGk4dk0zeXR4WEFfenlkeG5uNDVET3RVQkxBdENtUlFWSXJuYUpMNDFpSGtwZ0NtVU9yWW1zNFlkSWRhMVAzbXRhR3hSc0pEckI4RzVkZkFnelUiLCJzY29wZSI6ImludGVybmFsX2NvbnRhaW5lcl9yZWdpc3RyeSJ9XQ==",
        "environment": "W10="
      }
    }
  }
}

Note the spec.pullrequest.number in there.

But then the LagoonBuild object is created by lagoon-build-deploy and it disappears:

kind: LagoonBuild
apiVersion: lagoon.amazee.io/v1alpha1
metadata:
  creationTimestamp: '2020-12-24T15:11:26Z'
  finalizers:
    - finalizer.lagoonbuild.lagoon.amazee.io/v1alpha1
  generation: 2
  labels:
    lagoon.sh/buildStatus: Failed
    lagoon.sh/controller: lagoon
  managedFields:
    - apiVersion: lagoon.amazee.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:finalizers':
            .: {}
            'v:"finalizer.lagoonbuild.lagoon.amazee.io/v1alpha1"': {}
          'f:labels':
            .: {}
            'f:lagoon.sh/buildStatus': {}
            'f:lagoon.sh/controller': {}
        'f:spec':
          .: {}
          'f:branch':
            .: {}
            'f:name': {}
          'f:build':
            .: {}
            'f:ci': {}
            'f:type': {}
          'f:gitReference': {}
          'f:project':
            .: {}
            'f:deployTarget': {}
            'f:environment': {}
            'f:environmentType': {}
            'f:gitUrl': {}
            'f:key': {}
            'f:monitoring':
              .: {}
              'f:statuspageID': {}
            'f:name': {}
            'f:productionEnvironment': {}
            'f:projectSecret': {}
            'f:registry': {}
            'f:routerPattern': {}
            'f:standbyEnvironment': {}
            'f:uiLink': {}
            'f:variables':
              .: {}
              'f:environment': {}
              'f:project': {}
          'f:promote': {}
          'f:pullrequest':
            .: {}
            'f:baseBranch': {}
            'f:baseSha': {}
            'f:headBranch': {}
            'f:headSha': {}
            'f:title': {}
        'f:status':
          .: {}
          'f:conditions': {}
      manager: manager
      operation: Update
      time: '2020-12-24T15:11:44Z'
  name: lagoon-build-qzzyhp
  namespace: drupal-example-test1-controller-pr-78
  resourceVersion: '132895455'
  selfLink: >-
    /apis/lagoon.amazee.io/v1alpha1/namespaces/drupal-example-test1-controller-pr-78/lagoonbuilds/lagoon-build-qzzyhp
  uid: f969b554-bed7-4fa2-9c9f-70881db54e80
spec:
  branch:
    name: pr-78
  build:
    ci: 'false'
    type: pullrequest
  gitReference: ''
  project:
    deployTarget: test1.amazee.io
    environment: pr-78
    environmentType: development
    gitUrl: '[email protected]:amazeeio/drupal-example.git'
    monitoring:
      statuspageID: 'null'
    name: drupal-example-test1-controller
    productionEnvironment: master2
    projectSecret: 7d6c71a7f475e53b96142e1ce7b141e80f8de59365e17b964a6085fd4bd96f73
    registry: 'registry.lagoon.svc:5000'
    routerPattern: pr-78.drupal-example-test1-controller.test1.amazee.io
    standbyEnvironment: master
    uiLink: >-
      https://ui-lagoon-master.ch.amazee.io/projects/drupal-example-test1-controller/drupal-example-test1-controller-pr-78/deployments/lagoon-build-qzzyhp
    variables:
      environment: W10=
      project: >-
        W3sibmFtZSI6IklOVEVSTkFMX1JFR0lTVFJZX1VSTCIsInZhbHVlIjoiaHR0cHM6Ly9oYXJib3ItbmdpbngtbGFnb29uLW1hc3Rlci5jaC5hbWF6ZWUuaW8iLCJzY29wZSI6ImludGVybmFsX2NvbnRhaW5lcl9yZWdpc3RyeSJ9LHsibmFtZSI6IklOVEVSTkFMX1JFR0lTVFJZX1VTRVJOQU1FIiwidmFsdWUiOiJyb2JvdCRkcnVwYWwtZXhhbXBsZS10ZXN0MS1jb250cm9sbGVyLTE2MDc5MTEyNTMiLCJzY29wZSI6ImludGVybmFsX2NvbnRhaW5lcl9yZWdpc3RyeSJ9LHsibmFtZSI6IklOVEVSTkFMX1JFR0lTVFJZX1BBU1NXT1JEIiwidmFsdWUiOiJleUpoYkdjaU9pSlNVekkxTmlJc0luUjVjQ0k2SWtwWFZDSjkuZXlKbGVIQWlPakUyTlRFeE1URXlOVE1zSW1saGRDSTZNVFl3TnpreE1USTFNeXdpYVhOeklqb2lhR0Z5WW05eUxYUnZhMlZ1TFdSbFptRjFiSFJKYzNOMVpYSWlMQ0pwWkNJNk16SXhOVGM1TENKd2FXUWlPakV4TkRNc0ltRmpZMlZ6Y3lJNlczc2lVbVZ6YjNWeVkyVWlPaUl2Y0hKdmFtVmpkQzh4TVRRekwzSmxjRzl6YVhSdmNua2lMQ0pCWTNScGIyNGlPaUp3ZFhOb0lpd2lSV1ptWldOMElqb2lJbjFkZlEucG43c3VDTlBUYTl3bmhuY1J1cVozZEFIQ09ka3RTWXpQdmo5TFlzbHhFWm1hd25uMWFwSU9VZGdGVHZ0YVk1cDRwbGZTQlM0WlZndm81U0tHWEJIOWprSVFpWjFZVkx5dnZPOU43YWROVi1sakNTaXNaSDNGaGlmZHRQSUkySkxtUTV6QldOOUhSUjAzcmZKMkdCeDVndkd0Uy1aUkVsSFlFRUpaeWdTVXRvMGk3MzFPWmhmSGVlRTE1UW83cEh2Q2hxWkhhTXJsOS0wamZGN01RTTltNGN3MGNpWllUa0NhMkxFQkt6aWVvT1hnN3IweVlSd2ZFeGlGTWJ2bjBJZTV3YXFmZEU5UEZTR2FYWTRmcFJIX0Q3dlZFLXdBN3FmMGYyWktIS2s0cGY5dlpQd1JXLWRQU0t1RGl0blhycC1qZ3VzR1R6UEhTczFEdDlLZDhxb1dZYTNZbHVVRnVCcndUeDJndFdVb2g1NnRibjlHMGloTG5ybUwyZUNUUzRBRzNWbW1kUWRhUGN6WTZFejhGdVh6SW9UVHhmSUxjTlNHWGljM00yN1hGX2RON2d0OWpQaTVfWEJMekNWYVRQNHZ6TDN4eVByVHV5dWNseWFVR0JkNl9GelRIeWRwNXlJczVRS2czREhCSm9ielB4ZzNCZEJqbTNXNmd5blh2bmNadWU5UmVqMmdHb3lNa0xPT21QS2VWcnhXNm1uWmptd2l4blR1cE5vVzA2Vktqek9ZRWRHenkwVmJKQnBjVk03Y2xCQ1BYb2x0TElGQ3hsRWVib3lzV1NYVGk4dk0zeXR4WEFfenlkeG5uNDVET3RVQkxBdENtUlFWSXJuYUpMNDFpSGtwZ0NtVU9yWW1zNFlkSWRhMVAzbXRhR3hSc0pEckI4RzVkZkFnelUiLCJzY29wZSI6ImludGVybmFsX2NvbnRhaW5lcl9yZWdpc3RyeSJ9XQ==
  promote: {}
  pullrequest:
    baseBranch: master
    baseSha: 229ec3e57e220114e1c6ab60dde07a8ea12b873b
    headBranch: pr-testing
    headSha: 14627d2b6061ed1a1a1b8a692963145271a9a5d8
    title: Update system.site.yml - update

Note that spec.pullrequest.number is now missing.

The pod that is created to deploy the PR then looks like this:

kind: Pod
apiVersion: v1
metadata:
  name: lagoon-build-qzzyhp
  namespace: drupal-example-test1-controller-pr-78
  selfLink: >-
    /api/v1/namespaces/drupal-example-test1-controller-pr-78/pods/lagoon-build-qzzyhp
  uid: 09b491ed-f686-4f37-b87a-52c9983527e6
  resourceVersion: '132895453'
  creationTimestamp: '2020-12-24T15:11:27Z'
  labels:
    lagoon.sh/buildName: lagoon-build-qzzyhp
    lagoon.sh/controller: lagoon
    lagoon.sh/jobType: build
  annotations:
    kubernetes.io/psp: eks.privileged
  ownerReferences:
    - apiVersion: lagoon.amazee.io/v1alpha1
      kind: LagoonBuild
      name: lagoon-build-qzzyhp
      uid: f969b554-bed7-4fa2-9c9f-70881db54e80
spec:
  volumes:
    - name: lagoon-deployer-token-r5b7f
      secret:
        secretName: lagoon-deployer-token-r5b7f
        defaultMode: 420
    - name: lagoon-sshkey
      secret:
        secretName: lagoon-sshkey
        defaultMode: 420
    - name: default-token-mgnb7
      secret:
        secretName: default-token-mgnb7
        defaultMode: 420
  containers:
    - name: lagoon-build
      image: 'imagecache.amazeeio.cloud/amazeeio/kubectl-build-deploy-dind:v1.13.1'
      env:
        - name: SOURCE_REPOSITORY
          value: '[email protected]:amazeeio/drupal-example.git'
        - name: GIT_REF
        - name: SUBFOLDER
        - name: BRANCH
          value: pr-78
        - name: PROJECT
          value: drupal-example-test1-controller
        - name: ENVIRONMENT_TYPE
          value: development
        - name: ACTIVE_ENVIRONMENT
          value: master2
        - name: STANDBY_ENVIRONMENT
          value: master
        - name: PROJECT_SECRET
          value: 7d6c71a7f475e53b96142e1ce7b141e80f8de59365e17b964a6085fd4bd96f73
        - name: MONITORING_ALERTCONTACT
        - name: BUILD_TYPE
          value: pullrequest
        - name: ENVIRONMENT
          value: pr-78
        - name: KUBERNETES
          value: test1.amazee.io
        - name: REGISTRY
          value: 'registry.lagoon.svc:5000'
        - name: ROUTER_URL
          value: pr-78.drupal-example-test1-controller.test1.amazee.io
        - name: CI
          value: 'false'
        - name: PR_HEAD_BRANCH
          value: pr-testing
        - name: PR_HEAD_SHA
          value: 14627d2b6061ed1a1a1b8a692963145271a9a5d8
        - name: PR_BASE_BRANCH
          value: master
        - name: PR_BASE_SHA
          value: 229ec3e57e220114e1c6ab60dde07a8ea12b873b
        - name: PR_TITLE
          value: Update system.site.yml - update
        - name: PR_NUMBER
          value: "\0"
        - name: LAGOON_PROJECT_VARIABLES
          value: >-
            [{"name":"INTERNAL_REGISTRY_URL","value":"https://harbor-nginx-lagoon-master.ch.amazee.io","scope":"internal_container_registry"},{"name":"INTERNAL_REGISTRY_USERNAME","value":"robot$drupal-example-test1-controller-1607911253","scope":"internal_container_registry"},{"name":"INTERNAL_REGISTRY_PASSWORD","value":"eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NTExMTEyNTMsImlhdCI6MTYwNzkxMTI1MywiaXNzIjoiaGFyYm9yLXRva2VuLWRlZmF1bHRJc3N1ZXIiLCJpZCI6MzIxNTc5LCJwaWQiOjExNDMsImFjY2VzcyI6W3siUmVzb3VyY2UiOiIvcHJvamVjdC8xMTQzL3JlcG9zaXRvcnkiLCJBY3Rpb24iOiJwdXNoIiwiRWZmZWN0IjoiIn1dfQ.pn7suCNPTa9wnhncRuqZ3dAHCOdktSYzPvj9LYslxEZmawnn1apIOUdgFTvtaY5p4plfSBS4ZVgvo5SKGXBH9jkIQiZ1YVLyvvO9N7adNV-ljCSisZH3FhifdtPII2JLmQ5zBWN9HRR03rfJ2GBx5gvGtS-ZRElHYEEJZygSUto0i731OZhfHeeE15Qo7pHvChqZHaMrl9-0jfF7MQM9m4cw0ciZYTkCa2LEBKzieoOXg7r0yYRwfExiFMbvn0Ie5waqfdE9PFSGaXY4fpRH_D7vVE-wA7qf0f2ZKHKk4pf9vZPwRW-dPSKuDitnXrp-jgusGTzPHSs1Dt9Kd8qoWYa3YluUFuBrwTx2gtWUoh56tbn9G0ihLnrmL2eCTS4AG3VmmdQdaPczY6Ez8FuXzIoTTxfILcNSGXic3M27XF_dN7gt9jPi5_XBLzCVaTP4vzL3xyPrTuyuclyaUGBd6_FzTHydp5yIs5QKg3DHBJobzPxg3BdBjm3W6gynXvncZue9Rej2gGoyMkLOOmPKeVrxW6mnZjmwixnTupNoW06VKjzOYEdGzy0VbJBpcVM7clBCPXoltLIFCxlEeboysWSXTi8vM3ytxXA_zydxnn45DOtUBLAtCmRQVIrnaJL41iHkpgCmUOrYms4YdIda1P3mtaGxRsJDrB8G5dfAgzU","scope":"internal_container_registry"}]
        - name: MONITORING_STATUSPAGEID
          value: 'null'
      resources: {}
      volumeMounts:
        - name: lagoon-deployer-token-r5b7f
          readOnly: true
          mountPath: /var/run/secrets/lagoon/deployer
        - name: lagoon-sshkey
          readOnly: true
          mountPath: /var/run/secrets/lagoon/ssh
        - name: default-token-mgnb7
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: Always
  restartPolicy: Never
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  serviceAccountName: default
  serviceAccount: default
  nodeName: ip-10-200-123-50.eu-central-1.compute.internal
  securityContext: {}
  schedulerName: default-scheduler
  tolerations:
    - key: lagoon/build
      operator: Exists
      effect: NoSchedule
    - key: lagoon/build
      operator: Exists
      effect: PreferNoSchedule
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  priorityClassName: lagoon-priority-production
  priority: 1000000
  enableServiceLinks: true
status:
  phase: Failed
  conditions:
    - type: Initialized
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2020-12-24T15:11:27Z'
    - type: Ready
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2020-12-24T15:11:27Z'
      reason: ContainersNotReady
      message: 'containers with unready status: [lagoon-build]'
    - type: ContainersReady
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2020-12-24T15:11:27Z'
      reason: ContainersNotReady
      message: 'containers with unready status: [lagoon-build]'
    - type: PodScheduled
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2020-12-24T15:11:27Z'
  hostIP: 10.200.123.50
  podIP: 10.200.87.81
  podIPs:
    - ip: 10.200.87.81
  startTime: '2020-12-24T15:11:27Z'
  containerStatuses:
    - name: lagoon-build
      state:
        terminated:
          exitCode: 128
          reason: ContainerCannotRun
          message: >-
            OCI runtime create failed: container_linux.go:370: starting
            container process caused: process_linux.go:459: container init
            caused: setenv: invalid argument: unknown
          startedAt: '2020-12-24T15:11:40Z'
          finishedAt: '2020-12-24T15:11:40Z'
          containerID: >-
            docker://6b760f4c4ac254f15a031bbc94a420a953e503ea1e391ec5338494127b13c2de
      lastState: {}
      ready: false
      restartCount: 0
      image: 'imagecache.amazeeio.cloud/amazeeio/kubectl-build-deploy-dind:v1.13.1'
      imageID: >-
        docker-pullable://imagecache.amazeeio.cloud/amazeeio/kubectl-build-deploy-dind@sha256:3fdb6515cb513971adebbb2a1fb87cae860cdd47b34a0fd0f58f9d34f2e5cd86
      containerID: >-
        docker://6b760f4c4ac254f15a031bbc94a420a953e503ea1e391ec5338494127b13c2de
      started: false
  qosClass: BestEffort

with

        - name: PR_NUMBER
          value: "\0"

which then causes kubernetes to fail:

Error: failed to start container "lagoon-build": Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: setenv: invalid argument: unknown

While I think this is actually a Kubernetes bug (you should be able to create a pod with \0 as an env variable value), it's also a bug that the pull request number somehow gets swallowed by the controller.

I found the code that takes the message and creates the LagoonBuild object:
https://github.com/amazeeio/lagoon-kbd/blob/main/handlers/message_queue.go#L95-L106 and I don't see anything wrong with it.

The only piece that I could imagine is that the value in the RabbitMQ message is a string, while the Lagoon type for LagoonBuild defines it as a number:
https://github.com/amazeeio/lagoon-kbd/blob/86935f85c0516ee50deff1839000f7567f6b1b53/api/v1alpha1/lagoonbuild_types.go#L142
Could that cause some issues?
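
If that mismatch is the cause, one defensive option (a sketch, not the project's actual fix) is to make the field tolerant of both encodings with a custom unmarshaller:

// Sketch only: accept the pull request number as either a JSON string ("78")
// or a JSON number (78) so the value is never silently dropped.
package buildspec

import "encoding/json"

type PRNumber string

func (n *PRNumber) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		*n = PRNumber(s)
		return nil
	}
	var num json.Number
	if err := json.Unmarshal(data, &num); err != nil {
		return err
	}
	*n = PRNumber(num.String())
	return nil
}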

v0.1.5 container just fails and doesn't start at all

v0.1.5rc4 seems to work:

❯ docker run amazeeio/lagoon-builddeploy:v0.1.5rc4
2020-12-23T02:00:59.699Z	ERROR	controller-runtime.client.config	unable to get kubeconfig	{"error": "could not locate a kubeconfig"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/client/config.GetConfigOrDie
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/client/config/config.go:146
main.main
	/workspace/main.go:130
runtime.main
	/usr/local/go/src/runtime/proc.go:203

while v0.1.5 just silently errors:

❯ docker run amazeeio/lagoon-builddeploy:v0.1.5

recreate lagoon-internal-registry-secret secret if it does not exist

While debugging and trying to find a workaround for #57 I found that if:

  1. the secret lagoon-internal-registry-secret does not exist, and
  2. the robot account in harbor is still valid (not deleted, not expired yet),

then the controller does not recreate the secret and the build has no credentials to use.

I think we should always upsert the secret with the robot account, but I guess this might not be possible, as we cannot get the token from harbor anymore after the robot account was created.
So maybe we need to check if the lagoon-internal-registry-secret does not exist and, if so, forcefully recreate the robot account?

Proxy support for builds

In some cases the remote could be behind a proxy, so it would be handy to be able to pass the following variables into build pods so they can utilise the proxy (see the sketch after this list):

  • HTTP_PROXY / http_proxy
  • HTTPS_PROXY / https_proxy
  • NO_PROXY
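
A sketch of passing those through to the build container, assuming the values come from the controller's own configuration:

// Sketch only: copy the controller's proxy settings into the build container's
// environment so builds behind a proxy can reach the outside world.
package builds

import corev1 "k8s.io/api/core/v1"

func appendProxyEnv(container *corev1.Container, httpProxy, httpsProxy, noProxy string) {
	add := func(name, value string) {
		if value != "" {
			container.Env = append(container.Env, corev1.EnvVar{Name: name, Value: value})
		}
	}
	add("HTTP_PROXY", httpProxy)
	add("http_proxy", httpProxy)
	add("HTTPS_PROXY", httpsProxy)
	add("https_proxy", httpsProxy)
	add("NO_PROXY", noProxy)
}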

duplicated restore object name

The Kubernetes restore object's name is created using the first 7 characters of the backup id.
I.e. if the backup id is 8888888xxxxxxxxxxxxxxxxx, the restore object name will be restore-8888888.
If the restore fails for any reason, the restore object is not deleted, and this prevents a new restore from being created (because the object already exists).
It's probably best to randomise the name, or add a random string to it, to prevent these failures.
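
A sketch of the suggested naming: keep the short backup id for readability and append a random suffix so retries never collide with a leftover object:

// Sketch only: restore-<short backup id>-<random suffix> avoids clashing with
// a failed restore object left over from a previous attempt.
package restores

import (
	"fmt"

	utilrand "k8s.io/apimachinery/pkg/util/rand"
)

func restoreName(backupID string) string {
	short := backupID
	if len(short) > 7 {
		short = short[:7]
	}
	return fmt.Sprintf("restore-%s-%s", short, utilrand.String(5))
}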

Define OC/KUBECTL DIND images in Controller

Instead of defining the OC/KUBECTL DIND image overrides in the API / Webhooks2Tasks images, it also makes sense to define the overrides in the controllers themselves.

This would mean, though, that all controllers would need to be updated with the image override to prevent different controllers from deploying different DIND images.

build pods not cleaned up

The build-deploy controller currently doesn't clean up lagoon build pods, which leads to completed build pods hanging around for a long time in project namespaces.

Ideally there would be a policy to clean up build pods - maybe keep only the last 3? Or maybe the last n failed and successful?

This doesn't need to be configurable right now, just the functionality with some default parameters.
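
A sketch of such a policy, using the lagoon.sh/jobType=build label seen on build pods earlier in this document and a caller-supplied retention count:

// Sketch only: keep the newest `keep` completed build pods in a namespace and
// delete the rest.
package cleanup

import (
	"context"
	"sort"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func cleanUpBuildPods(ctx context.Context, c client.Client, ns string, keep int) error {
	pods := &corev1.PodList{}
	if err := c.List(ctx, pods, client.InNamespace(ns),
		client.MatchingLabels{"lagoon.sh/jobType": "build"}); err != nil {
		return err
	}
	var finished []corev1.Pod
	for _, p := range pods.Items {
		if p.Status.Phase == corev1.PodSucceeded || p.Status.Phase == corev1.PodFailed {
			finished = append(finished, p)
		}
	}
	// newest first, then delete everything past the retention count
	sort.Slice(finished, func(i, j int) bool {
		return finished[i].CreationTimestamp.After(finished[j].CreationTimestamp.Time)
	})
	for i := keep; i < len(finished); i++ {
		pod := finished[i]
		if err := c.Delete(ctx, &pod); err != nil && !apierrors.IsNotFound(err) {
			return err
		}
	}
	return nil
}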
