radix-flux's Introduction

Repository structure

The Git repository contains the following top directories:

  • The clusters directory contains the Flux configuration for each cluster.
  • The components directory contains the base configuration of all components deployed to the clusters.

├── clusters
│   │
│   ├── c2-production
│   │   ├── (flux-system)
│   │   ├── overlay
│   │   ├── healthChecks.yaml
│   │   ├── kustomization.yaml
│   │   └── postBuild.yaml
│   │
│   ├── development
│   │   ├── (flux-system)
│   │   ├── overlay
│   │   ├── healthChecks.yaml
│   │   ├── kustomization.yaml
│   │   └── postBuild.yaml
│   │
│   ├── monitoring
│   │   ├── (flux-system)
│   │   ├── overlay
│   │   └── kustomization.yaml
│   │
│   ├── playground 
│   │   ├── (flux-system)
│   │   ├── overlay
│   │   ├── healthChecks.yaml
│   │   ├── kustomization.yaml
│   │   └── postBuild.yaml
│   │
│   └── production
│       ├── (flux-system)
│       ├── overlay
│       ├── healthChecks.yaml
│       ├── kustomization.yaml
│       └── postBuild.yaml
│
└── components
    ├── flux
    ├── third-party
    └── radix-platform

Clusters

Flux system

The flux-system directory in each cluster directory under clusters is created and managed by Flux.

Overlay

In Radix we want separate configurations per cluster. To achieve this we use Flux overlays, which override the configuration defined in the components directory. The overlay directory has the same structure as the components directory but contains only files for the resources to be overridden. These files must then be listed in the kustomization.yaml file in the cluster environment directory.

For example, radix-operator uses cluster-specific configuration, which requires overriding the Helm release. To do that, the Kustomization in the overlay patches the radix-operator HelmRelease and sets activeClusterName: ${ACTIVE_CLUSTER}. Flux substitutes the variable with the value of the matching key in postBuild.yaml.

# file: clusters/development/overlay/radix-platform/radix-operator/radix-operator.yaml

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: radix-operator
  namespace: flux-system
spec:
  patches:
    - patch: |-
        apiVersion: helm.toolkit.fluxcd.io/v2
        kind: HelmRelease
        metadata:
          name: radix-operator
          namespace: default
        spec:
          values:
            activeClusterName: ${ACTIVE_CLUSTER} # Set in clusters/development/postBuild.yaml

# file: clusters/development/postBuild.yaml

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  postBuild:
    substitute:
      ACTIVE_CLUSTER: cluster-1

The radix-operator kustomization file needs to be included in the kustomization.yaml file.

# file: clusters/development/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ./overlay/radix-platform/radix-operator/radix-operator.yaml

We patch the flux-system Kustomization with configuration that is specific to the cluster environment. To make it clear which parts of the configuration are changed, each patched field has its own file. For example, postBuild.yaml patches the postBuild spec of the flux-system Kustomization, and healthChecks.yaml patches the healthChecks spec.

# file: clusters/development/postBuild.yaml

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  postBuild:
    substitute:
      RADIX_ZONE: dev # dev | playground | prod
      RADIX_ENVIRONMENT: dev # dev | prod
      radix_acr_repo_url: radixdev.azurecr.io
    substituteFrom:
      - kind: ConfigMap
        name: radix-flux-config

# file: clusters/development/healthChecks.yaml

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: velero
      namespace: velero

Automatic image updates

Flux v2 can scan container registries for newly pushed versions of container images and update the Flux configuration repository so that the components deployed to the cluster are upgraded. Setting up automatic image updates requires three resources.

ImageRepository

The ImageRepository defines the container registry where Flux should look for new versions of container images. If the container registry is private and requires authentication, a secret can be defined which contains credentials to access it.

# file: components/radix-platform/radix-operator/imageRepo.yaml

apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageRepository
metadata:
  name: radix-operator
  namespace: flux-system
spec:
  image: radixdev.azurecr.io/radix-operator
  interval: 1m0s
  secretRef:
    name: radix-docker

ImagePolicy

The ImagePolicy resource specifies how Flux identifies the latest container image among the tags scanned from the ImageRepository. The policy spec specifies whether the latest image is found by a SemVer range or by alphabetical or numerical sorting. If the image tag contains a timestamp, the timestamp can be filtered and extracted using the filterTags spec.

# file: components/radix-platform/radix-operator/imagePolicy.yaml

apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
  name: radix-operator
  namespace: flux-system
spec:
  filterTags:
    pattern: ^master-[a-f0-9]+-(?P<ts>[0-9]+)
    extract: $ts
  imageRepositoryRef:
    name: radix-operator
  policy:
    numerical:
      order: asc
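
For comparison, a policy that selects by SemVer range rather than by an extracted timestamp could look like the sketch below. The resource name example-component and the range are hypothetical and not taken from this repository.

```yaml
# Hypothetical example, not a file in this repository.
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
  name: example-component        # hypothetical name
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: example-component      # hypothetical ImageRepository
  policy:
    semver:
      # Select the highest tag that is a valid semantic version in this range.
      range: ">=1.0.0 <2.0.0"
```

With a SemVer policy, tags that do not parse as semantic versions are ignored, so a filterTags pattern is usually not needed.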

ImageUpdateAutomation

The ImageUpdateAutomation resource specifies which Git repository and branch Flux writes image updates to. The Git repository should be the Flux configuration repository. The resource can be configured to commit directly to an existing branch or to push to a new branch. A GitHub workflow can be used to automatically create a pull request with the updated versions. The commit message can be customized to include information about the image updates. When Flux scans the ImageRepository and finds a tag that is newer than the one in the Flux configuration repository, it uses the ImageUpdateAutomation to commit the change to a branch in the repository.

# file: components/flux/imageUpdateAutomation.yaml

apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
  name: radix-dev-acr-auto-update
  namespace: flux-system
spec:
  interval: 1m0s
  sourceRef:
    kind: GitRepository
    name: flux-system
  git:
    checkout:
      ref:
        branch: master
    commit:
      author:
        email: [email protected]
        name: FluxBot
      messageTemplate: '{{range .Updated.Images}}{{println .}}{{end}}'
    push:
      branch: flux-image-updates
  update:
    path: ${FLUX_CONFIG_PATH}
    strategy: Setters
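
A GitHub workflow that opens the pull request could be sketched as follows. This is an assumption-laden illustration, not the workflow used in this repository: the file name, workflow name, and pull request title and body are hypothetical, and it assumes GitHub Actions is allowed to create pull requests in the repository settings. The branch names match the ImageUpdateAutomation above.

```yaml
# file: .github/workflows/flux-image-updates-pr.yaml (hypothetical)

name: flux-image-updates-pr
on:
  push:
    branches:
      - flux-image-updates   # branch Flux pushes image updates to
permissions:
  contents: read
  pull-requests: write
jobs:
  pull-request:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Create pull request if none is open
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Count open pull requests from the update branch into master.
          open_prs=$(gh pr list --head flux-image-updates --base master --state open --json number --jq 'length')
          if [ "$open_prs" -eq 0 ]; then
            gh pr create --base master --head flux-image-updates \
              --title "Flux image updates" \
              --body "Automated image updates committed by Flux."
          fi
```

Later pushes from Flux to the same branch update the open pull request, so the workflow only creates a new one when none exists.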

The update.path field of the ImageUpdateAutomation specifies the directory that Flux scans regularly for values to change. Restricting the path prevents Flux from updating the overlay in all cluster environment configurations. The ${FLUX_CONFIG_PATH} variable is defined in the postBuild.yaml file for the cluster. Flux identifies the values to change by looking for an "image policy marker", which contains the namespace and name of the ImagePolicy. The marker also specifies whether the value is the image name or the tag.

  • {"$imagepolicy": "namespace:radix-operator"} resolves to radixdev.azurecr.io/radix-operator:master-a5e880b9-1634484632
  • {"$imagepolicy": "namespace:radix-operator:name"} resolves to radixdev.azurecr.io/radix-operator
  • {"$imagepolicy": "namespace:radix-operator:tag"} resolves to master-a5e880b9-1634484632

# file: clusters/development/overlay/radix-platform/radix-operator/radix-operator.yaml

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  postBuild:
    substitute:
      RADIX_OPERATOR_TAG: master-a5e880b9-1634484632 # {"$imagepolicy": "flux-system:radix-operator:tag"}
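
The :name and :tag marker variants can also sit directly on Helm values. The sketch below is illustrative only: the chart reference and the image.repository and image.tag value keys are assumptions about a typical chart, not the actual radix-operator values.

```yaml
# Hypothetical HelmRelease; chart source and value keys are illustrative.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: radix-operator
  namespace: default
spec:
  interval: 5m
  chart:
    spec:
      chart: radix-operator
      sourceRef:
        kind: HelmRepository
        name: radix-operator       # hypothetical HelmRepository
        namespace: flux-system
  values:
    image:
      # Flux rewrites only the value on the line that carries the marker.
      repository: radixdev.azurecr.io/radix-operator # {"$imagepolicy": "flux-system:radix-operator:name"}
      tag: master-a5e880b9-1634484632 # {"$imagepolicy": "flux-system:radix-operator:tag"}
```

Flux only rewrites markers in files located under the directory given by update.path.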

kustomization.yaml

In each of the cluster environment directories and component sub-directories, there is a kustomization.yaml file which specifies which resources should be deployed.

Components

All components are defined in the components directory with their base configuration, which is the configuration common to all cluster environments, such as the Helm repository and namespace. The Helm releases of the components are also defined here, but only with the values common to all cluster environments; the remaining values are set by the overlay.

In each component directory there is also a kustomization.yaml file which defines the resources in that directory which are to be deployed. This enables specifying only the path to the directory in the kustomization.yaml file in the cluster environment directory, rather than specifying each file in the directory separately. The kustomization.yaml file acts like an index.
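
As a sketch of the index idea, a component directory and the cluster environment that includes it might look like this. The file list and the relative path are assumptions for illustration; the actual contents of the directories may differ.

```yaml
# file: components/radix-platform/radix-operator/kustomization.yaml (illustrative)

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Every file in this directory that should be deployed is listed once here.
  - imageRepo.yaml
  - imagePolicy.yaml
---
# file: clusters/development/kustomization.yaml (illustrative)

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Referencing the directory pulls in everything its kustomization.yaml lists.
  - ../../components/radix-platform/radix-operator
```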

Want to contribute? Read our contributing guidelines


Security notification

radix-flux's Issues

Waiting for Blob-CSI Driver Update to Fix Truncated Data Issue with Blobfuse v2.1.1 or Higher

A fix for this issue is released in Blobfuse v2.1.1 or higher. However, this version of Blobfuse will only be available with a newer version of the Blob-CSI driver. We are currently waiting for this new release.

We are currently running:
blob-csi: 1.23.2
blobfuse2: 2.1.0

links:
blob-csi-driver blobfuse2 version: https://github.com/kubernetes-sigs/blob-csi-driver/blob/c53028ea6b024c4d1c1fea435edbbee3a83e9af3/deploy/csi-blob-node.yaml#L61
fix for truncated files: Azure/azure-storage-fuse#1142

Set wildcard records in prod and C2

  • Add wildcard records. Cluster-specific, active-cluster and app alias
  • Delete existing custom CNAMES, e.g. console.radix.equinor.com
  • Patch radix-flux and remove external-dns
  • Remove external-dns HelmRelease
  • Remove existing external-dns managed records
  • Remove unused external-dns manifests in radix-flux

Running containers as root user should be avoided [Security]

Containers shouldn't run as root users in your Kubernetes cluster. Running a process as the root user inside a container runs it as root on the host. If there's a compromise, an attacker has root in the container, and any misconfigurations become easier to exploit.

  • Fix the security context for Tekton init container
  • Fix or make exception for Velero

Manual remediation:

  1. From the Unhealthy resources tab, select the cluster. Defender for Cloud lists the relevant pods.
  2. For these pods, ensure the runAsUser property is set to a non-zero value or set property runAsNonRoot=true.
  3. After making your changes, redeploy the pod with the updated rule.
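
Step 2 corresponds to a securityContext along these lines. The pod name, image and user ID are placeholders, not values from this repository.

```yaml
# Hypothetical pod; only the securityContext fields matter here.
apiVersion: v1
kind: Pod
metadata:
  name: example-component
spec:
  securityContext:
    runAsNonRoot: true   # kubelet refuses to start containers that would run as UID 0
    runAsUser: 1000      # any non-zero UID the image supports
  containers:
    - name: example-component
      image: example.azurecr.io/example-component:1.0.0
```

For Helm-managed components the same fields are normally set through chart values, whose key names vary per chart.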

Make 3rd-party and Radix components run on ARM nodes

Strategy to decide:

  • nodeSelector (simple)
  • affinity / anti-affinity
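
The two strategies could be sketched as below, using the well-known kubernetes.io/arch node label. The Deployment name and image are hypothetical, and the image is assumed to be built for arm64.

```yaml
# Hypothetical Deployment showing the nodeSelector strategy.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-component
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-component
  template:
    metadata:
      labels:
        app: example-component
    spec:
      # Simple strategy: schedule only on ARM nodes.
      nodeSelector:
        kubernetes.io/arch: arm64
      # Affinity alternative (more flexible, more verbose):
      # affinity:
      #   nodeAffinity:
      #     requiredDuringSchedulingIgnoredDuringExecution:
      #       nodeSelectorTerms:
      #         - matchExpressions:
      #             - key: kubernetes.io/arch
      #               operator: In
      #               values: ["arm64"]
      containers:
        - name: example-component
          image: example.azurecr.io/example-component:1.0.0
```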

Components

  • azure-service-operator
  • blob-csi-driver
  • cert-manager
  • external-secrets-operator
  • grafana
  • ingress-nginx
  • keda
  • kube-prometheus-stack
  • kubernetes-replicator
  • kured
  • (radix)prometheus-guard DEV #2169
  • (radix)prometheus-guard All envs
  • equinor/radix-acr-cleanup#83
  • equinor/radix-cicd-canary#205
  • radix-cluster-cleanup
  • radix-cost-allocation
  • radix-operator
  • radix-vulnerability-scanner
  • tekton-pipelines
  • velero
  • workload-identity-webhook

Upgrade Flux to v2.3.0 in all environments

API changes
https://github.com/fluxcd/flux2/releases/tag/v2.3.0

Features and improvements:
https://fluxcd.io/blog/2024/05/flux-v2.3.0/

Cool feature:
flux reconcile helmrelease <release> --reset
flux reconcile helmrelease <release> --force

Installing or upgrading Flux
To upgrade the APIs, make sure the new Custom Resource Definitions and controllers are deployed, and then change the manifests in Git:

  1. Set apiVersion: helm.toolkit.fluxcd.io/v2beta2 in the YAML files that contain HelmRelease definitions.
  2. Set apiVersion: notification.toolkit.fluxcd.io/v1beta3 in the YAML files that contain Alert and Provider definitions.
  3. Set apiVersion: image.toolkit.fluxcd.io/v1beta2 in the YAML files that contain ImageRepository, ImagePolicy and ImageUpdateAutomation definitions.
  4. Commit, push and reconcile the API version changes.

Bumping the API versions in manifests can be done gradually. It is advised not to delay this procedure, as the deprecated versions will be removed after 6 months.

Change Velero to run containers as non-root user

Running containers as root user should be avoided
Containers shouldn't run as root users in your Kubernetes cluster. Running a process as the root user inside a container runs it as root on the host. If there's a compromise, an attacker has root in the container, and any misconfigurations become easier to exploit.
#velero

Update velero helm 3.2.0 -> 4.0.1

#################################################################################

BREAKING: The config values passed contained no longer accepted
options. See the messages below for more details.
To verify your updated config is accepted, you can use
the helm template command.

#################################################################################

ERROR: Please make .configuration.backupStorageLocation from map to slice

ERROR: Please make .configuration.volumeSnapshotLocation from map to slice

REMOVED: .configuration.provider has been removed, instead each backupStorageLocation and volumeSnapshotLocation has a provider configured

warning: Upgrade "velero" failed: post-upgrade hooks failed: unable to build kubernetes object for post-upgrade hook velero/templates/backupstoragelocation.yaml: error validating "": error validating data: [ValidationError(BackupStorageLocation.spec.objectStorage): missing required field "bucket" in io.velero.v1.BackupStorageLocation.spec.objectStorage, ValidationError(BackupStorageLocation.spec): missing required field "provider" in io.velero.v1.BackupStorageLocation.spec]
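
Going by the error messages above, the values for the 4.x chart would take roughly the following shape. This is a sketch: the location names, provider value and bucket name are placeholders, not the values used in Radix.

```yaml
# Hypothetical Helm values for the velero chart after the map-to-slice change.
configuration:
  backupStorageLocation:
    # Was a single map in 3.x; now a list with a provider per entry.
    - name: default
      provider: azure          # placeholder provider
      bucket: velero-backups   # placeholder; "bucket" is a required field
  volumeSnapshotLocation:
    - name: default
      provider: azure          # replaces the removed .configuration.provider
```

As the chart output suggests, the helm template command can be used to check that the updated values are accepted before the upgrade is attempted again.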

Upgrade Flux to newer version (From 0.32 to v0.40.2)

Note that v0.40.0 contained breaking changes
https://github.com/fluxcd/flux2/releases/tag/v0.40.0

The autologin flags (--aws-autologin-for-ecr, --gcp-autologin-for-gcr and --azure-autologin-for-acr) have been deprecated to bring the Image API closer to the Source API, where cloud provider contextual login is configured at object level with .spec.provider. Usage of these flags will result in a logged error. Please update all the ImageRepository manifests that require contextual login with the new field .spec.provider and the appropriate cloud provider value: aws, gcp, or azure. Refer to the docs for more details and examples.
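
Applied to the radix-operator ImageRepository, the change could look like the sketch below. It assumes the image.toolkit.fluxcd.io/v1beta2 API and that the repository should switch from a secretRef to Azure contextual login; whether Radix relies on contextual login at all is not stated here.

```yaml
# Sketch only; assumes contextual login replaces the secretRef.
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: radix-operator
  namespace: flux-system
spec:
  image: radixdev.azurecr.io/radix-operator
  interval: 1m0s
  provider: azure   # replaces the --azure-autologin-for-acr controller flag
```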

Configure Immutable (read-only) root filesystem for 3rd party components

Containers should run with a read-only root file system in your Kubernetes cluster. An immutable file system protects containers from run-time changes, such as malicious binaries being added to PATH.
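
A minimal sketch of the setting is shown below. The pod name, image and the writable /tmp mount are assumptions; which paths a given component needs to write to has to be checked per component.

```yaml
# Hypothetical pod with a read-only root file system.
apiVersion: v1
kind: Pod
metadata:
  name: example-component
spec:
  containers:
    - name: example-component
      image: example.azurecr.io/example-component:1.0.0
      securityContext:
        readOnlyRootFilesystem: true
      volumeMounts:
        # Paths the process must write to are mounted as separate volumes.
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir: {}
```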

Components with issues

Improve move custom ingress action performance

example on slow action run: move custom ingress run 23
example on unreachable cluster: move custom ingress run 20

  • Check if monitor addon is enabled before enabling
  • Exit action if cluster is unreachable
  • Exit action if kubelogin failed
  • Fix "Waiting for AAD role to propagate"

Perm:

  • 2023-05-11T12:32:09.0652279Z Message: The client 'dd4dd75c-6e56-4c2b-9404-e76d2c29c67f' with object id 'dd4dd75c-6e56-4c2b-9404-e76d2c29c67f' does not have authorization or an ABAC condition not fulfilled to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/**_/resourceGroups/clusters/providers/Microsoft.ContainerService/managedClusters/weekly-19/providers/Microsoft.Authorization/roleAssignments/c25cdbcc-5afc-4a57-899f-01b1de71f50e' or the scope is invalid. If access was recently granted, please refresh your credentials.
