uselagoon / lagoon-charts
A collection of Helm charts for Lagoon and associated services.
License: Apache License 2.0
Currently scaling won't work since there's only a single RWO PVC associated with the deployment.
So I think we need to find a better chart release process than we have today.
As it stands, with check-version-increment set to its default of true (see: 2331cbd), each PR needs to bump the chart version to the next version. I think this is not really nice because:
Because of all of this we added check-version-increment: false, as this then just lints the charts and does not enforce a version bump. The version bump itself is then enforced inside the release workflow:
helm/chart-testing-action first checks whether there are changed charts and only then starts to lint (see https://github.com/helm/chart-testing-action/blob/master/ct.sh#L39-L44). While this works in a PR, during a branch build there are no changes (ct list-changed returns nothing), so the linter never actually does anything; we removed it again.
helm/chart-releaser-action also fails if we try to release a version that already exists, and will not overwrite any releases. While this is not great DX (it would be much better if it told you exactly why it failed), it's better than nothing.
We also tried changing from a push to the main branch to releasing on creation of a new tag. Unfortunately this doesn't work either, as helm/chart-releaser-action is meant to run on a branch: it first checks whether there are changes that are not yet tagged, and only then packages the charts and creates a new tag by itself. Since we had just created the tag manually, it decides not to do anything 🤦 and never releases anything.
So my plan was to have this:
Overall I think that process could work; the only sad thing is the failed release builds in step 2.
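As a reference point, the PR-side lint step with the version check disabled could be driven by a ct config along these lines (a sketch; the chart directory layout and target branch are assumptions):

```yaml
# ct.yaml — chart-testing config sketch (chart-dirs path is an assumption)
# Lint charts on PRs without forcing a Chart.yaml version bump per PR;
# the bump is enforced later, in the release workflow instead.
check-version-increment: false
chart-dirs:
  - charts
target-branch: main
```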
It's not much use as a subchart and will simplify the architecture.
In future once uselagoon/lagoon#1820 is closed, it can be moved into lagoon-remote.
TODO:
The tag on the openshift haproxy router input is a prefix, not the full tag name. So the filter/match blocks need .** added.
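Since the tag is only a prefix, the match patterns have to glob the remainder; a minimal fluentd sketch (the tag name and label here are assumptions):

```
# fluentd sketch — `.**` matches all tags under the router tag prefix
<match router.openshift.**>
  @type relabel
  @label @ROUTER
</match>
```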
Need to know about stuck queues etc.
In k8s we no longer need to push to the OpenShift registry (https://github.com/amazeeio/lagoon/blob/main/images/docker-host/update-push-images.sh), so we can remove it from the deployment chart: https://github.com/uselagoon/lagoon-charts/blob/main/charts/lagoon-remote/templates/docker-host.deployment.yaml#L46
This will solve the following issue in the k8s clusters.
go-crond: failed cronjob: cmd:/lagoon/cronjob.sh "/update-push-images.sh" out:2020-11-10T15:15:00Z CRONJOB: /update-push-images.sh
++ cat /var/run/secrets/kubernetes.io/serviceaccount/namespace
+ NAMESPACE=lagoon
+ docker -H localhost info
+ docker login -u=serviceaccount --password-stdin registry.lagoon.svc:5000
Error response from daemon: Get http://registry.lagoon.svc:5000/v2/: dial tcp: lookup registry.lagoon.svc on 172.20.0.10:53: no such host
err:exit status 1 time:166.646926m
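Concretely this would mean dropping the update-push-images entry from the docker-host cron configuration; roughly (the env var layout and the remaining entries are assumptions based on the linked template):

```yaml
# docker-host.deployment.yaml sketch — CRONJOBS entry to drop on k8s
# (the remaining entries and their schedules are assumptions)
- name: CRONJOBS
  value: |
    22 1 * * * /lagoon/cronjob.sh "/prune-images.sh"
    # removed: */15 * * * * /lagoon/cronjob.sh "/update-push-images.sh"
```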
This has some changes in the CRDs, so our templates need some minor tweaks: https://github.com/banzaicloud/logging-operator/releases/tag/3.6.0
Anything sensitive should be in a secret (see e.g. #101).
Some of the unparseable logs from haproxy are the pre-SNI connection logs, which should be discarded. There are some other low-level haproxy entries that aren't required either.
Almost nothing needs to be running as root.
Now that Lagoon 1.9.0 has been released, it adds a controllerhandler service to lagoon-core: https://github.com/amazeeio/lagoon/tree/master/services/controllerhandler
https://github.com/amazeeio/lagoon/blob/master/node-packages/commons/src/api.ts#L76 has a default API_HOST of api:3000 - this will need to be customised with lagoon-core-api
We'll have to find the replacement helm repos for the charts we currently access from the stable/* repos - currently docker-registry and stable/nfs-server-provisioner in some situations.
API has one already, other services need the same thing.
For docker-host monitoring we should take the following metrics into consideration:
Currently we only run features-kubernetes in the lagoon-test test suite.
We should also run:
nginx
drupal
active-standby-kubernetes
The current test-suite workflow uses an unauthenticated registry for expediency.
This workflow needs to be updated to use harbor as the registry, as we do in production.
I'm thinking about how we could install lagoon-core with the MariaDBConsumer of https://github.com/amazeeio/dbaas-operator. Running databases in pods is not the best setup (not scalable, not HA, etc.), so it would be much nicer to have a managed database for it.
It's going to be a bit tricky to do this, though. After some research I found Helm's lookup function, which we could use to look up the credentials inside a MariaDBConsumer object, but then we need to create the MariaDBConsumer before all other resources. Helm pre-install hooks could do this, but they don't really have a "wait until the credentials are filled" mechanism, so this might cause some issues.
Therefore I created amazeeio/dbaas-operator#23, which would have dbaas create a secret that we could then mount in the api and keycloak services (and in my understanding k8s would not try to start the pod until the secret exists).
Of course, in an initial phase we can also make it all manual: just pass the host, user, db, and password of the mariadb credentials in the helm values file. Happy to have this as a first solution.
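If we go the secret route from amazeeio/dbaas-operator#23, the api pod could consume it directly, which gives us the "don't start until the secret exists" behaviour for free; a sketch (the secret name and key names are assumptions):

```yaml
# deployment sketch — consume the dbaas-created secret directly
# (secret name "lagoon-api-db" and key "host" are assumptions)
env:
  - name: API_DB_HOST
    valueFrom:
      secretKeyRef:
        name: lagoon-api-db
        key: host
```

Using secretKeyRef rather than Helm's lookup also keeps `helm template` working, since lookup returns an empty map without a live cluster connection.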
As per uselagoon/lagoon#2139
Needs:
On our openshift clusters we have an end-to-end test which tests the creation of a new project, deploys a pod, and removes the namespace again. This runs every 15 minutes and creates 24*4 = 96 namespaces every day, and therefore 96 indexes for the container logs:
container-logs-e2e-1594921999-4f5dk34jlba7lmvmjnc2hslnum_bi-_-unknown_environmenttype-_-2020.07
container-logs-e2e-1595660042-m45wplgchbg5jgioiabnaf36ya_bi-_-unknown_environmenttype-_-2020.07
container-logs-e2e-1595105636-vggrvpciebffvefnrowsuquz3q_bi-_-unknown_environmenttype-_-2020.07
The same goes for the ansible servicebroker, which creates new namespaces as well:
container-logs-lagoon-dbaas-mariadb-apb-prov-nqr5n_bi-_-unknown_environmenttype-_-2020.07
container-logs-lagoon-dbaas-mariadb-apb-prov-gsdxw_bi-_-unknown_environmenttype-_-2020.07
container-logs-lagoon-dbaas-mariadb-apb-depr-gsdxw_bi-_-unknown_environmenttype-_-2020.07
in lagoon we captured them here:
https://github.com/amazeeio/lagoon/blob/8128a4db750a61fcda62e6d5206be6c4865506e9/services/logs2logs-db/pipeline/router-application-logs.conf#L68-L72
https://github.com/amazeeio/lagoon/blob/8128a4db750a61fcda62e6d5206be6c4865506e9/services/logs-forwarder/.lagoon.single.yml#L194-L211
Can we implement something similar in lagoon-logging again? It keeps our elasticsearch cluster small.
I'm aware that this is maybe just a special thing for amazee.io-related clusters and might not be something that would be good for every lagoon installation; maybe we can do something similar to the logging-operator's excludeNamespaces also for the logs-dispatcher?
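A sketch of what that could look like on the logs-dispatcher side, using a fluentd grep filter to drop the short-lived namespaces (the tag, key path, and namespace patterns are assumptions):

```
# logs-dispatcher fluentd sketch — drop e2e / servicebroker namespaces
# before they reach elasticsearch (patterns are assumptions)
<filter kubernetes.**>
  @type grep
  <exclude>
    key $.kubernetes.namespace_name
    pattern /^(e2e-|lagoon-dbaas-mariadb-apb)/
  </exclude>
</filter>
```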
We shouldn't need to override any images with lagoon-core in:
Instead we can use main once the changes are merged into main upstream, and then eventually remove the custom tags and bump the chart app-version once the relevant changes make it into a tagged release.
The lagoon-remote chart needs a README with a brief overview and usage instructions.
#88 and #89 need to be reverted since these variables are actually needed.
#88 causes this error in the test suite:
+++ docker tag ci-features-control-k8s-lagoon-type-override-node registry.lagoon.svc:5000/ci-features-control-k8s/lagoon-type-override/node:latest
+++ echo 'docker push registry.lagoon.svc:5000/ci-features-control-k8s/lagoon-type-override/node:latest'
++ '[' -f /kubectl-build-deploy/lagoon/push ']'
++ parallel --retries 4
The push refers to repository [registry.lagoon.svc:5000/ci-features-control-k8s/lagoon-type-override/node]
Get https://registry.lagoon.svc:5000/v2/: dial tcp: lookup registry.lagoon.svc on 10.96.0.10:53: no such host
#89 causes this error in the test suite:
TASK [LAGOON TYPE OVERRIDE CONTROL-K8S - POST api deployEnvironmentBranch with target git branch lagoon-type-override and project ci-features-control-k8s (no sha) to http://lagoon-core-api:80/graphql] ***
Tuesday 29 September 2020 07:07:10 +0000 (0:00:06.310) 0:00:16.877 *****
ok: [localhost] => {
"msg": "api response: {u'data': {u'deployEnvironmentBranch': u'Error: request to http://api:3000/graphql failed, reason: getaddrinfo ENOTFOUND api api:3000'}}"
}
In general I've been pretty careful about only adding the environment variables which are strictly required, so any variable removal from the charts will generally require a corresponding change in Lagoon itself. If anything, there will be environment variables missing from the charts 🙂
For these specific variables (REGISTRY, API_HOST), they are accessed in the common node packages here:
https://github.com/amazeeio/lagoon/blob/main/node-packages/commons/src/api.ts#L76
https://github.com/amazeeio/lagoon/blob/main/node-packages/commons/src/tasks.ts#L92
And then imported by various services like this:
https://github.com/amazeeio/lagoon/blob/main/services/api/src/gitlab-sync/groups.ts#L3
Only once the lagoon charts go 1.0.
Every deployment/statefulset should have some kind of test.
In addition we should test optional features like:
Relies on helm/helm#7763
The chart's git repository is here, but there's no helm repo (opendistro-for-elasticsearch/opendistro-build#294).
The new release should have all the required functionality - no need for a custom image.
Images are pulled from Docker Hub, and the new rate-limiting means we need to authenticate or risk getting rate limited.
The sidecar shouldn't be a top-level value.
I tried to deploy lagoon-logging to an openshift cluster and it seems to be unable to match the actual haproxy logs; there are quite a few errors:
2020-07-27 19:20:56 +0000 [warn]: #0 [in_router_openshift] failed to parse message data="<142>Jul 27 19:20:56 haproxy[45]: 45.32.195.186:53120 [27/Jul/2020:19:20:56.785] fe_sni~ be_edge_http:spiriva-com-master:nginx/pod:nginx-4-x7p5h:nginx:10.1.6.217:8080 0/0/1/1/+2 401 +251 - - ---- 14/5/1/1/0 0/0 {nginx-spiriva-com-master.lagoon.bi.amazee.io|Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/98 Safari/537.4 (StatusCake)} \"GET / HTTP/1.1\""
2020-07-27 19:20:57 +0000 [warn]: #0 [in_router_openshift] failed to parse message data="<142>Jul 27 19:20:57 haproxy[45]: 18.194.77.136:13943 [27/Jul/2020:19:20:57.183] fe_sni~ be_edge_http:xxx-bi-hubnext-com-master:xxxx.bi-hubnext.com/pod:node-7-njj6h:node:10.1.3.231:3000 0/0/2/429/+431 200 +308 - - ---- 14/5/1/1/0 0/0 {xxx.bi-hubnext.com|} \"GET /api/v1/validate/email?opu=FR&[email protected] HTTP/1.1\""
2020-07-27 19:21:06 +0000 [warn]: #0 [in_router_openshift] failed to parse message data="<142>Jul 27 19:21:06 haproxy[45]: 107.170.227.24:43122 [27/Jul/2020:19:21:06.608] fe_sni~ be_edge_http:drupal-sandbox-tcs-master:nginx/pod:nginx-4-pbkjd:nginx:10.1.9.238:8080 0/0/1/1/+2 401 +251 - - ---- 6/1/1/1/0 0/0 {nginx-drupal-sandbox-tcs-master.lagoon.bi.amazee.io|Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/98 Safari/537.4 (StatusCake)} \"GET / HTTP/1.1\""
looks like the regex might be wrong?
Most pods probably don't need them.
After some discussion this is going to work better with a looser coupling.
This service is a workaround for uselagoon/lagoon#2133. Once that issue is fixed, the service name should be defined via an environment variable and the hard-coded keycloak service removed.
We currently have a lot of complex test logic in https://github.com/uselagoon/lagoon-charts/blob/main/.github/workflows/lint-test.yaml. It would be nice to move that out to a Makefile or similar so it's easy to run locally.
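A minimal sketch of such a Makefile (the target names and exact ct/helm invocations are assumptions, mirroring steps from lint-test.yaml):

```makefile
# Makefile sketch — run the workflow's main steps locally
# (targets and flags are assumptions, not the workflow's exact commands)
.PHONY: lint install-core

lint:
	ct lint --config ct.yaml

install-core:
	helm dependency build charts/lagoon-core
	helm upgrade --install --create-namespace --wait \
		--namespace lagoon-core lagoon-core charts/lagoon-core
```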
There are currently a lot of wide and overlapping permissions that possibly could be tightened up.
e.g. currently lagoon-core has a serviceaccount for the api and the broker.
It should be possible to use jsonpath or go-template output instead, and cut a few steps out of the test-suite workflow.
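For example, instead of grep/awk over kubectl's table output, the workflow could ask for exactly the field it needs (the namespace, labels, and secret names here are assumptions):

```shell
# jsonpath sketch — first matching pod name, no table parsing
kubectl -n lagoon-core get pods -l app.kubernetes.io/name=lagoon-core-api \
  -o jsonpath='{.items[0].metadata.name}'

# go-template sketch — decode a secret value directly
kubectl -n lagoon-core get secret lagoon-core-keycloak \
  -o go-template='{{index .data "KEYCLOAK_ADMIN_PASSWORD" | base64decode}}'
```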