uselagoon / lagoon-charts
A collection of Helm charts for Lagoon and associated services.
License: Apache License 2.0
Currently scaling won't work since there's only a single RWO PVC associated with the deployment.
So I think we need to find a better chart release process than we have today.
As it stands, with check-version-increment set to its default of true (see: 2331cbd), each PR needs to bump the chart version to the next version. I think this is not really nice because:
Because of all of this we added check-version-increment: false, as this then just lints the charts and does not enforce a version bump. The version bump itself is then enforced inside the release workflow:
helm/chart-testing-action first checks whether there are changed charts and only then starts to lint (see https://github.com/helm/chart-testing-action/blob/master/ct.sh#L39-L44). While this works in a PR, during a branch build there are no changes (ct list-changed returns nothing), so the linter never actually does anything; we removed it again.
helm/chart-releaser-action also fails if we try to release a version that already exists, and will not overwrite any releases. While this is not great DX (it would be much better if it told you exactly why it failed), it's better than nothing.
We also tried changing from a push to the main branch to releasing on creation of a new tag. Unfortunately this doesn't work either, as helm/chart-releaser-action is meant to run on a branch: it first checks whether there are changes that are not yet tagged, and only then packages the charts and creates a new tag by itself. Since we had just created the tag manually, it decides not to do anything 🤦 and never releases anything.
So my plan was to have this:
Overall I think that process could work; the only sad thing is the failed release builds in step 2.
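As a reference point, the PR-side lint step with the version check disabled could be driven by a ct config along these lines (a sketch; the chart directory layout and target branch are assumptions):

```yaml
# ct.yaml — chart-testing config sketch (chart-dirs path is an assumption)
# Lint charts on PRs without forcing a Chart.yaml version bump per PR;
# the bump is enforced later, in the release workflow instead.
check-version-increment: false
chart-dirs:
  - charts
target-branch: main
```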
It's not much use as a subchart and will simplify the architecture.
In future once uselagoon/lagoon#1820 is closed, it can be moved into lagoon-remote.
TODO:
The tag on the openshift haproxy router input is a prefix, not the full tag name. So the filter/match blocks need .** added.
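Since the tag is only a prefix, the match patterns have to glob the remainder; a minimal fluentd sketch (the tag name and label here are assumptions):

```
# fluentd sketch — `.**` matches all tags under the router tag prefix
<match router.openshift.**>
  @type relabel
  @label @ROUTER
</match>
```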
Need to know about stuck queues etc.
In k8s we no longer need to push to the OpenShift registry (https://github.com/amazeeio/lagoon/blob/main/images/docker-host/update-push-images.sh), so we can remove it from the deployment chart: https://github.com/uselagoon/lagoon-charts/blob/main/charts/lagoon-remote/templates/docker-host.deployment.yaml#L46
This will solve the following issue in the k8s clusters.
go-crond: failed cronjob: cmd:/lagoon/cronjob.sh "/update-push-images.sh" out:2020-11-10T15:15:00Z CRONJOB: /update-push-images.sh
++ cat /var/run/secrets/kubernetes.io/serviceaccount/namespace
+ NAMESPACE=lagoon
+ docker -H localhost info
+ docker login -u=serviceaccount --password-stdin registry.lagoon.svc:5000
Error response from daemon: Get http://registry.lagoon.svc:5000/v2/: dial tcp: lookup registry.lagoon.svc on 172.20.0.10:53: no such host
err:exit status 1 time:166.646926m
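Concretely this would mean dropping the update-push-images entry from the docker-host cron configuration; roughly (the env var layout and the remaining entries are assumptions based on the linked template):

```yaml
# docker-host.deployment.yaml sketch — CRONJOBS entry to drop on k8s
# (the remaining entries and their schedules are assumptions)
- name: CRONJOBS
  value: |
    22 1 * * * /lagoon/cronjob.sh "/prune-images.sh"
    # removed: */15 * * * * /lagoon/cronjob.sh "/update-push-images.sh"
```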
This has some changes in the CRDs, so our templates need some minor tweaks: https://github.com/banzaicloud/logging-operator/releases/tag/3.6.0
Anything sensitive should be in a secret (see e.g. #101).
Some of the unparseable logs from haproxy are the pre-SNI connection logs, which should be discarded. There are some other low-level haproxy entries that aren't required either.
Almost nothing needs to be running as root.
Now that Lagoon 1.9.0 has been released, it adds a controllerhandler service to lagoon-core: https://github.com/amazeeio/lagoon/tree/master/services/controllerhandler
https://github.com/amazeeio/lagoon/blob/master/node-packages/commons/src/api.ts#L76 has a default API_HOST of api:3000 - this will need to be customised with lagoon-core-api
We'll have to find the replacement helm repos for the charts we currently access from the stable/* repos - currently docker-registry and stable/nfs-server-provisioner in some situations.
API has one already, other services need the same thing.
For docker-host monitoring we should take the following metrics into consideration:
Currently we only run features-kubernetes in the lagoon-test test suite.
We should also run:
nginx
drupal
active-standby-kubernetes
The current test-suite workflow uses an unauthenticated registry for expediency.
This workflow needs to be updated to use harbor as the registry, as we do in production.
I'm thinking about how we could install lagoon-core with the MariaDBConsumer of https://github.com/amazeeio/dbaas-operator. Running databases in pods is not the best setup (not scalable, not HA, etc.), so it would be much nicer to have a managed database for it.
It's going to be a bit tricky to do this, though. After some research I found Helm's lookup function, which we could use to look up the credentials inside a MariaDBConsumer object, but then we need to create the MariaDBConsumer before all other resources. Helm pre-install hooks could do this, but they don't really have a "wait until the credentials are filled" mechanism, so this might cause some issues.
Therefore I created amazeeio/dbaas-operator#23, which would have dbaas create a secret that we could then mount in the api and keycloak services (and in my understanding k8s would not try to start the pod until the secret exists).
Of course, in an initial phase we can also make it all manual: just pass the host, user, db, and password of the mariadb credentials in the helm values file. Happy to have this as a first solution.
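If we go the secret route from amazeeio/dbaas-operator#23, the api pod could consume it directly, which gives us the "don't start until the secret exists" behaviour for free; a sketch (the secret name and key names are assumptions):

```yaml
# deployment sketch — consume the dbaas-created secret directly
# (secret name "lagoon-api-db" and key "host" are assumptions)
env:
  - name: API_DB_HOST
    valueFrom:
      secretKeyRef:
        name: lagoon-api-db
        key: host
```

Using secretKeyRef rather than Helm's lookup also keeps `helm template` working, since lookup returns an empty map without a live cluster connection.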
As per uselagoon/lagoon#2139
Needs:
On our openshift clusters we have an end-to-end test which tests the creation of a new project, deploys a pod, and removes the namespace again. This runs every 15 minutes and creates 24*4 = 96 namespaces every day, and therefore 96 indexes for the container logs:
container-logs-e2e-1594921999-4f5dk34jlba7lmvmjnc2hslnum_bi-_-unknown_environmenttype-_-2020.07
container-logs-e2e-1595660042-m45wplgchbg5jgioiabnaf36ya_bi-_-unknown_environmenttype-_-2020.07
container-logs-e2e-1595105636-vggrvpciebffvefnrowsuquz3q_bi-_-unknown_environmenttype-_-2020.07
The same goes for the ansible servicebroker, which creates new namespaces as well:
container-logs-lagoon-dbaas-mariadb-apb-prov-nqr5n_bi-_-unknown_environmenttype-_-2020.07
container-logs-lagoon-dbaas-mariadb-apb-prov-gsdxw_bi-_-unknown_environmenttype-_-2020.07
container-logs-lagoon-dbaas-mariadb-apb-depr-gsdxw_bi-_-unknown_environmenttype-_-2020.07
in lagoon we captured them here:
https://github.com/amazeeio/lagoon/blob/8128a4db750a61fcda62e6d5206be6c4865506e9/services/logs2logs-db/pipeline/router-application-logs.conf#L68-L72
https://github.com/amazeeio/lagoon/blob/8128a4db750a61fcda62e6d5206be6c4865506e9/services/logs-forwarder/.lagoon.single.yml#L194-L211
Can we implement something similar in lagoon-logging again? It keeps our elasticsearch cluster small.
I'm aware that this is maybe just a special thing for amazee.io-related clusters and might not be something that would be good for every lagoon installation; maybe we can do something similar to the logging-operator's excludeNamespaces also for the logs-dispatcher?
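A sketch of what that could look like on the logs-dispatcher side, using a fluentd grep filter to drop the short-lived namespaces (the tag, key path, and namespace patterns are assumptions):

```
# logs-dispatcher fluentd sketch — drop e2e / servicebroker namespaces
# before they reach elasticsearch (patterns are assumptions)
<filter kubernetes.**>
  @type grep
  <exclude>
    key $.kubernetes.namespace_name
    pattern /^(e2e-|lagoon-dbaas-mariadb-apb)/
  </exclude>
</filter>
```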
We shouldn't need to override any images with lagoon-core in:
Instead we can use main once the changes are merged into main upstream, and then eventually remove the custom tags and bump the chart app-version once the relevant changes make it into a tagged release.
The lagoon-remote chart needs a README with a brief overview and usage instructions.
#88 and #89 need to be reverted since these variables are actually needed.
#88 causes this error in the test suite:
+++ docker tag ci-features-control-k8s-lagoon-type-override-node registry.lagoon.svc:5000/ci-features-control-k8s/lagoon-type-override/node:latest
+++ echo 'docker push registry.lagoon.svc:5000/ci-features-control-k8s/lagoon-type-override/node:latest'
++ '[' -f /kubectl-build-deploy/lagoon/push ']'
++ parallel --retries 4
The push refers to repository [registry.lagoon.svc:5000/ci-features-control-k8s/lagoon-type-override/node]
Get https://registry.lagoon.svc:5000/v2/: dial tcp: lookup registry.lagoon.svc on 10.96.0.10:53: no such host
#89 causes this error in the test suite:
TASK [LAGOON TYPE OVERRIDE CONTROL-K8S - POST api deployEnvironmentBranch with target git branch lagoon-type-override and project ci-features-control-k8s (no sha) to http://lagoon-core-api:80/graphql] ***
Tuesday 29 September 2020 07:07:10 +0000 (0:00:06.310) 0:00:16.877 *****
ok: [localhost] => {
"msg": "api response: {u'data': {u'deployEnvironmentBranch': u'Error: request to http://api:3000/graphql failed, reason: getaddrinfo ENOTFOUND api api:3000'}}"
}
In general I've been pretty careful about only adding the environment variables which are strictly required, so any variable removal from the charts will generally require a corresponding change in Lagoon itself. If anything, there will be environment variables missing from the charts 🙂
For these specific variables (REGISTRY, API_HOST), they are accessed in the common node packages here:
https://github.com/amazeeio/lagoon/blob/main/node-packages/commons/src/api.ts#L76
https://github.com/amazeeio/lagoon/blob/main/node-packages/commons/src/tasks.ts#L92
And then imported by various services like this:
https://github.com/amazeeio/lagoon/blob/main/services/api/src/gitlab-sync/groups.ts#L3
Only once the lagoon charts go 1.0.
Every deployment/statefulset should have some kind of test.
In addition we should test optional features like:
Relies on helm/helm#7763
The chart's git repository is here, but there's no helm repo (opendistro-for-elasticsearch/opendistro-build#294).
The new release should have all the required functionality - no need for a custom image.
Images are pulled from Docker Hub, and the new rate-limiting means we need to authenticate or risk getting rate limited.
The sidecar shouldn't be a top-level value.
I tried to deploy lagoon-logging to an openshift cluster and it seems to be unable to match the actual haproxy logs; there are quite a few errors:
2020-07-27 19:20:56 +0000 [warn]: #0 [in_router_openshift] failed to parse message data="<142>Jul 27 19:20:56 haproxy[45]: 45.32.195.186:53120 [27/Jul/2020:19:20:56.785] fe_sni~ be_edge_http:spiriva-com-master:nginx/pod:nginx-4-x7p5h:nginx:10.1.6.217:8080 0/0/1/1/+2 401 +251 - - ---- 14/5/1/1/0 0/0 {nginx-spiriva-com-master.lagoon.bi.amazee.io|Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/98 Safari/537.4 (StatusCake)} \"GET / HTTP/1.1\""
2020-07-27 19:20:57 +0000 [warn]: #0 [in_router_openshift] failed to parse message data="<142>Jul 27 19:20:57 haproxy[45]: 18.194.77.136:13943 [27/Jul/2020:19:20:57.183] fe_sni~ be_edge_http:xxx-bi-hubnext-com-master:xxxx.bi-hubnext.com/pod:node-7-njj6h:node:10.1.3.231:3000 0/0/2/429/+431 200 +308 - - ---- 14/5/1/1/0 0/0 {xxx.bi-hubnext.com|} \"GET /api/v1/validate/email?opu=FR&[email protected] HTTP/1.1\""
2020-07-27 19:21:06 +0000 [warn]: #0 [in_router_openshift] failed to parse message data="<142>Jul 27 19:21:06 haproxy[45]: 107.170.227.24:43122 [27/Jul/2020:19:21:06.608] fe_sni~ be_edge_http:drupal-sandbox-tcs-master:nginx/pod:nginx-4-pbkjd:nginx:10.1.9.238:8080 0/0/1/1/+2 401 +251 - - ---- 6/1/1/1/0 0/0 {nginx-drupal-sandbox-tcs-master.lagoon.bi.amazee.io|Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/98 Safari/537.4 (StatusCake)} \"GET / HTTP/1.1\""
looks like the regex might be wrong?
Most pods probably don't need them.
After some discussion this is going to work better with a looser coupling.
This service is a workaround for uselagoon/lagoon#2133. Once that issue is fixed, the service name should be defined via an environment variable and the hard-coded keycloak service removed.
We currently have a lot of complex test logic in https://github.com/uselagoon/lagoon-charts/blob/main/.github/workflows/lint-test.yaml. It would be nice to move that out to a Makefile or similar so it's easy to run locally.
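A minimal sketch of such a Makefile (the target names and exact ct/helm invocations are assumptions, mirroring steps from lint-test.yaml):

```makefile
# Makefile sketch — run the workflow's main steps locally
# (targets and flags are assumptions, not the workflow's exact commands)
.PHONY: lint install-core

lint:
	ct lint --config ct.yaml

install-core:
	helm dependency build charts/lagoon-core
	helm upgrade --install --create-namespace --wait \
		--namespace lagoon-core lagoon-core charts/lagoon-core
```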
There are currently a lot of wide and overlapping permissions that possibly could be tightened up.
e.g. currently lagoon-core has a serviceaccount for the api and the broker.
It should be possible to use jsonpath or go-template output instead, and cut a few steps out of the test-suite workflow.
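For example, instead of grep/awk over kubectl's table output, the workflow could ask for exactly the field it needs (the namespace, labels, and secret names here are assumptions):

```shell
# jsonpath sketch — first matching pod name, no table parsing
kubectl -n lagoon-core get pods -l app.kubernetes.io/name=lagoon-core-api \
  -o jsonpath='{.items[0].metadata.name}'

# go-template sketch — decode a secret value directly
kubectl -n lagoon-core get secret lagoon-core-keycloak \
  -o go-template='{{index .data "KEYCLOAK_ADMIN_PASSWORD" | base64decode}}'
```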