
redkubes / otomi-core


Application Platform for Linode Kubernetes Engine (and any other conformant K8s)

Home Page: https://otomi.io

License: Apache License 2.0

Shell 4.13% Smarty 20.25% Dockerfile 0.62% Open Policy Agent 4.05% Mustache 52.94% Python 0.42% TypeScript 16.80% JavaScript 0.73% Makefile 0.06%
kubernetes paas developer-selfservice devops self-hosted gitops platform-engineering

otomi-core's Introduction



Application Platform for Linode Kubernetes Engine

APL Console (screenshot)

Getting started

Helm

To install APL, make sure you have a running Kubernetes cluster that meets the following requirements:

  • Version 1.27, 1.28 or 1.29
  • A node pool with at least 8 vCPU and 16GB+ RAM (more resources might be required based on the activated capabilities)
  • Calico CNI installed (or any other CNI that supports K8s network policies)
  • A default storage class configured
  • When using the custom provider, make sure the K8s LoadBalancer Service created by APL can obtain an external IP (using a cloud load balancer or MetalLB)
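For the custom provider on bare metal, a minimal MetalLB sketch could look like the following (recent MetalLB versions; the address range below is a placeholder for your own):

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: apl-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: apl-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - apl-pool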

Note

The transition from Otomi to APL is still in progress. Installing APL will use the latest Otomi release (v2.11.5).

Tip

Install APL with DNS to unlock its full potential. See otomi.io for more info.

Add the Helm repository:

helm repo add apl https://linode.github.io/apl-core/
helm repo update

and then install the Helm chart:

helm install apl apl/otomi \
--set cluster.name=$CLUSTERNAME \
--set cluster.provider=$PROVIDER # use 'linode' for LKE or 'custom' for any other cloud/infrastructure

When the installer job is completed, follow the activation steps.
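The same settings can also be kept in a values file instead of --set flags (a minimal sketch; the cluster name is a placeholder):

# values.yaml -- equivalent to the --set flags above
cluster:
  name: my-cluster   # your cluster name
  provider: linode   # 'linode' for LKE or 'custom' for any other cloud/infrastructure

and installed with: helm install apl apl/otomi -f values.yaml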

Integrations

Core Applications (that are always installed):

  • Istio: The service mesh framework with end-to-end transit encryption
  • Argo CD: Declarative Continuous Deployment
  • Keycloak: Identity and access management for modern applications and services
  • Cert Manager: Bring your own wildcard certificate or request one from Let's Encrypt
  • Nginx Ingress Controller: Ingress controller for Kubernetes
  • External DNS: Synchronize exposed ingresses with DNS providers
  • Tekton Pipeline: K8s-style resources for declaring CI/CD pipelines
  • Tekton Triggers: Trigger pipelines from event payloads
  • Tekton dashboard: Web-based UI for Tekton Pipelines and Tekton Triggers
  • Gitea: Self-hosted Git service
  • CloudNativePG: Open source operator designed to manage PostgreSQL workloads

Optional Applications (that you can activate to compose your ideal platform):

  • Velero: Back up and restore your Kubernetes cluster resources and persistent volumes
  • Knative: Deploy and manage serverless workloads
  • Drone: Continuous integration platform built on Docker
  • Prometheus: Collecting container application metrics
  • Grafana: Visualize metrics, logs, and traces from multiple sources
  • Grafana Loki: Collecting container application logs
  • Harbor: Container image registry with role-based access control, image scanning, and image signing
  • Kyverno: Kubernetes native policy management
  • Jaeger: End-to-end distributed tracing and monitoring for complex distributed systems
  • Kiali: Observe Istio service mesh relations and connections
  • Minio: High performance Object Storage compatible with Amazon S3 cloud storage service
  • Trivy: Kubernetes-native security toolkit
  • Falco: Cloud Native Runtime Security
  • Grafana Tempo: High-scale distributed tracing backend
  • OpenTelemetry: Instrument, generate, collect, and export telemetry data to help you analyze your software’s performance and behavior
  • Paketo build packs: Cloud Native Buildpack implementations for popular programming languages
  • Kaniko: Build container images from a Dockerfile

Documentation

Check out otomi.io for more detailed documentation.

License

APL is licensed under the Apache 2.0 License.

otomi-core's People

Contributors

0-sv, ani1357, bartusz01, ben10k, caslubbers, dennisvankekem, dependabot[bot], diabhey, dunky13, eldermatt, ferruhcihan, githubcdr, j-zimnowoda, k7o, k8sbee, leiarenee, martijncalker, merll, mojtabaimani, morriz, oshah97, panpan0000, rawc0der, renovate-bot, renovate[bot], srodenhuis, staticvoid255, tre7roja


otomi-core's Issues

chart resource checkup & tuning

Most charts have preconfigured sane resource specifications, but we need to find out which don't have sane values, and which ones have none.

Team services will get a LimitRange as a fallback, but we really want to tune all of our own workloads.
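A minimal sketch of such a fallback LimitRange (the values and the team namespace name are illustrative, not the tuned defaults this issue asks for):

apiVersion: v1
kind: LimitRange
metadata:
  name: team-default-limits
  namespace: team-demo   # hypothetical team namespace
spec:
  limits:
    - type: Container
      default:           # limits applied when a container sets none
        cpu: 200m
        memory: 256Mi
      defaultRequest:    # requests applied when a container sets none
        cpu: 100m
        memory: 128Mi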

Add AAD Pod Identity (Azure) and Kube2IAM (AWS)

Both NS and BCT have asked us to give k8s applications access to specific cloud resources (databases) based on role-based access. To support this feature we need to integrate/support:

  • AAD Pod Identity for Azure
  • Kube2IAM for AWS

Team namespace

  • ingress
  • network policies
  • opa policies
  • istio authorization policies
  • knative services (if container given)

Team namespace: knative services

System input:

  • docker image info
  • https domain
  • certArn?

System output:

  • running app on domain
  • cert generated by cert-manager? (if not certArn)
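A minimal sketch of the Knative Service these inputs would map to (the namespace and image are placeholders; the https domain and certArn inputs would drive the domain/cert wiring):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
  namespace: team-demo                          # hypothetical team namespace
spec:
  template:
    spec:
      containers:
        - image: registry.example.com/hello:1.0.0   # the "docker image info" input
          ports:
            - containerPort: 8080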

Enhance OPA policies

We want our opa policies to also limit access to the following resources:

  • istio stuff
  • ResourceQuota
  • ?
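A hypothetical Gatekeeper-style sketch of the ResourceQuota restriction (assuming OPA runs as an admission controller; the template name and message are made up):

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sdenyresourcequotaedits
spec:
  crd:
    spec:
      names:
        kind: K8sDenyResourceQuotaEdits
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdenyresourcequotaedits
        # deny any admission request touching ResourceQuota objects
        violation[{"msg": msg}] {
          input.review.kind.kind == "ResourceQuota"
          msg := "teams may not modify ResourceQuota objects"
        }

A Constraint of kind K8sDenyResourceQuotaEdits scoped to the team namespaces would then activate it.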

Azure Monitor in Grafana

User story

As an Azure user, I want to use Azure Monitor, so I can see Azure related metrics and logs.


Acceptance criteria

  • View datasource in Grafana and see that Azure Monitor is there
  • View Dashboards in Grafana and see that the following dashboards are there:
    • azure monitor
    • azure appgw
    • azure mariadb
    • azure redis

Tasks


    • Add datasource and make configurable
    • Import dashboards into stack and make configurable
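A sketch of the datasource half via Grafana's file-based provisioning (the credential placeholders are assumptions; grafana-azure-monitor-datasource is the plugin's type id):

apiVersion: 1
datasources:
  - name: Azure Monitor
    type: grafana-azure-monitor-datasource
    jsonData:
      cloudName: azuremonitor
      subscriptionId: <subscription-id>
      tenantId: <tenant-id>
      clientId: <client-id>
    secureJsonData:
      clientSecret: <client-secret>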

Implement dashboards for admins

Admin:

Landing on service dashboard

Top menu with items:

  1. Team list:
  • list of teams, with create/edit/delete leading to

1.1 Team:

  • name
  • password (used for multitenancy proxies, should become generated)
  • oidc details
  • base domain
  • list of services, create/edit/delete leading to

1.1.1 Team services:

  • name (used for url creation)
  • service toggle:
    • k8s svc (predeployed k8s service):
      • name
      • port
    • docker image:
      • location:
      • pull secret
      • semver to deploy automatically
  • domain
  • certArn

Customer CRD

Should reflect what a customer has chosen:

  • one or more projects (results in namespaces)
  • shared services (in the {custId}-shared namespace)
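A hypothetical sketch of what such a CRD instance could look like (group, version, and field names are made up, since the CRD was not yet designed in this issue):

apiVersion: otomi.io/v1alpha1
kind: Customer
metadata:
  name: acme
spec:
  projects:          # each project results in a namespace
    - frontend
    - backend
  sharedServices:    # deployed into the acme-shared namespace
    - redis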

App CRD

Results:

App CRD which results in:

  1. Autoscaled knative service from docker image / git repo
  2. Stateful service connection from Service Catalog (redis, mongo, mysql)

TODO:

  1. Research & choose: OAM (with dapr & rudr) or CNAB?
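A hypothetical sketch of an App instance covering both results (all names are made up; the OAM/CNAB question above would decide the real shape):

apiVersion: otomi.io/v1alpha1
kind: App
metadata:
  name: my-app
spec:
  source:
    image: registry.example.com/my-app:1.0.0   # results in an autoscaled knative service
  dependencies:
    - type: redis                              # stateful service from the Service Catalog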

Additional otomi-stack tests

Pipeline tests for dev cluster only:

  1. lint (already there)
  2. diff

Add additionalConfigs target in prometheus-operator for prom-blackbox-exporter
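A sketch of such a scrape config via the prometheus-operator chart values (this is the standard blackbox relabeling pattern; the exporter service name and probe target are assumptions):

prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: blackbox
        metrics_path: /probe
        params:
          module: [http_2xx]
        static_configs:
          - targets:
              - https://otomi.io                       # endpoint to probe
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: prom-blackbox-exporter:9115   # assumed exporter service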

Team dashboards

Features for team dashboards:

  • Not 'Service Index', but 'Team [$TEAM_NAME] dashboard' (when dashboard is for a specific team)
  • Not 'Service Index', but Cluster Admin Dashboard (when dashboard is for admins with full cluster scope)
  • Otomi Stack logo on page

Other:
Grafana only showing some dashboards relevant to the team (no cluster resources like nodes, no k8s resources)

The create-gke-cluster script has a dependency on a directory that I mounted from the Otomi-values repo

➜  otomi-stack git:(master) ✗ ./bin/create-gke-cluster.sh                                                                                                  (⎈ gks_otomi-cloud_europe-west4_otomi-gke-dev:default)
bin/env.sh: line 5: cd: /Users/jehoszafatzimnowoda/workspace/otomi/otomi-stack/env/env: No such file or directory
ERROR: The value of CLOUD env must be one of the following: bin charts helmfile.d helmfile.tpl k8s test tests tools values
bin/env.sh: line 21: /Users/jehoszafatzimnowoda/workspace/otomi/otomi-stack/env/env/google/dev.sh: No such file or directory
WARNING: From 1.14, legacy Stackdriver GKE logging is deprecated. Thus, flag `--enable-cloud-logging` is also deprecated. Please use `--enable-stackdriver-kubernetes` instead, to migrate to new Stackdriver Kubernetes Engine monitoring and logging. For more details, please read: https://cloud.google.com/monitoring/kubernetes-engine/migration.
WARNING: From 1.14, legacy Stackdriver GKE monitoring is deprecated. Thus, flag `--enable-cloud-monitoring` is also deprecated. Please use `--enable-stackdriver-kubernetes` instead, to migrate to new Stackdriver Kubernetes Engine monitoring and logging. For more details, please read: https://cloud.google.com/monitoring/kubernetes-engine/migration.
ERROR: (gcloud.container.clusters.create) could not parse resource []
ERROR: (gcloud.container.clusters.get-credentials) argument --region: expected one argument
Usage: gcloud container clusters get-credentials NAME [optional flags]
  optional flags may be  --help | --internal-ip | --region | --zone

For detailed information on this command and its flags, run:
  gcloud container clusters get-credentials --help

otomi-stack-api can be deployed via helm chart

The Pod configuration:

  1. have an initContainer that:
  • pulls the git repo with otomi-stack and stores it in an EmptyDir volume
  • prepares .kube and stores it in an EmptyDir volume
  2. The otomi-stack-api container should:
  • mount the otomi-stack volume
  • mount the kube volume
  • use env from a ConfigMap

The ConfigMap:

  • PORT
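A minimal sketch of that Pod configuration (image names and mount paths are assumptions; the .kube preparation step is omitted):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: otomi-stack-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otomi-stack-api
  template:
    metadata:
      labels:
        app: otomi-stack-api
    spec:
      initContainers:
        - name: clone
          image: alpine/git                 # assumed clone image
          args: [clone, --depth=1, "https://github.com/redkubes/otomi-stack.git", /otomi-stack]
          volumeMounts:
            - name: otomi-stack
              mountPath: /otomi-stack
      containers:
        - name: api
          image: otomi/api                  # hypothetical image
          envFrom:
            - configMapRef:
                name: otomi-stack-api       # carries PORT
          volumeMounts:
            - name: otomi-stack
              mountPath: /otomi-stack
            - name: kube
              mountPath: /home/app/.kube    # the prepared .kube config
      volumes:
        - name: otomi-stack
          emptyDir: {}
        - name: kube
          emptyDir: {}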

A service by default is not exposed to the public domain.

EXAMPLE

Add isExposed field:

teamConfig:
  teams:
    otomi:
      name: otomi
      services:
        - name: hello
          isPublic: false # does not need oauth2 sso
          isExposed: false # Service is not going to be exposed
          domain: custom.doma.in
          hasCert: true 
        - name: hello2
          isPublic: false # does not need oauth2 sso
          isExposed: true # service is going to be exposed 
          domain: custom.doma.in
          hasCert: true 

Adding this feature involves changes in the following files:

  • make adding the service to nginx-ingress conditional in charts/team-ns/templates/nginx-ingress.yaml
  • make adding the hosts field for the VirtualService conditional in charts/team-ns/templates/istio-virtualservices.yaml
  • make adding the host to the ingressgateway conditional in charts/team-ns/templates/istio-gateway.yaml
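A sketch of the conditional in one of those templates (assuming the services are iterated from the team values; the field paths are illustrative):

# charts/team-ns/templates/istio-virtualservices.yaml (sketch)
{{- range $service := .Values.services }}
{{- if $service.isExposed }}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: {{ $service.name }}
spec:
  hosts:
    - {{ $service.domain }}
  # ... gateway and route config ...
{{- end }}
{{- end }}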

Moreover, a customer values migration will be needed for already exposed services!

Need to talk to @Morriz about the above changes.

Upgrade istio

Upgrade istio with:

  1. SDS for Azure (Azure needs a feature flag to be able to use secret projection)
  2. full mTLS: figure out what is currently not working correctly with mTLS
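For the mTLS part, a minimal sketch of enforcing strict mesh-wide mTLS on recent Istio versions (the older Istio version in use at the time exposed a different MeshPolicy API):

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # the root namespace makes this mesh-wide
spec:
  mtls:
    mode: STRICT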

network policies in charts/team-ns

We need to limit the following for team namespaces:

  • no cross namespace traffic except to "shared" namespace
  • disable all egress by default, allowing a more specific per-app egress rule to be deployed later that permits egress to the targets specified in a service config's egressTargets (see the sketch below)
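A minimal sketch of both rules (the team namespace name and shared namespace label are assumptions; the kubernetes.io/metadata.name label is set automatically on recent K8s versions):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-ns-except-shared
  namespace: team-demo              # hypothetical team namespace
spec:
  podSelector: {}
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector: {}           # allow same-namespace traffic
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: shared
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: team-demo
spec:
  podSelector: {}
  policyTypes: [Egress]             # no egress rules listed: all egress denied

Note that the egress deny also blocks DNS unless an explicit allow rule for kube-dns is layered on top, alongside the per-service egressTargets rules.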

Unable to redeploy Drone

Reason:

Events:
  Type     Reason              Age   From                                         Message
  ----     ------              ----  ----                                         -------
  Normal   Scheduled           3m    default-scheduler                            Successfully assigned team-admin/drone-server-75bd9df5bc-bkwwq to aks-agentpool1-36062263-vmss000004
  Warning  FailedAttachVolume  3m    attachdetach-controller                      Multi-Attach error for volume "pvc-44c5170c-0995-495a-81eb-98f550c56da9" Volume is already used by pod(s) drone-server-78bfd656cc-sg4qx
  Warning  FailedMount         57s   kubelet, aks-agentpool1-36062263-vmss000004  Unable to mount volumes for pod "drone-server-75bd9df5bc-bkwwq_team-admin(648cf00b-92f0-4bd2-bcac-7bb2f8409b52)": timeout expired waiting for volumes to attach or mount for pod "team-admin"/"drone-server-75bd9df5bc-bkwwq". list of unmounted volumes=[data]. list of unattached volumes=[data default-token-g7jdk istio-envoy istio-certs]

Workaround:

Delete old drone ReplicaSet:

k -n team-admin delete rs drone-<old-rs>

Containers in external-dns and cert-manager-cainjector pods do not start

Describe the bug
After deploying otomi-stack on a GKE cluster:

➜  otomi-stack git:(master) ✗ ki get po                                                                                                                                                                                  (⎈ gke_otomi-cloud_europe-west4_otomi-gke-dev:default)
NAME                                             READY   STATUS             RESTARTS   AGE
cert-manager-795c889b5d-dxjlp                    1/1     Running            0          15m
cert-manager-cainjector-84565c968b-tvssk         0/2     CrashLoopBackOff   7          15m
external-dns-54687bdf76-gvs2x                    0/2     CrashLoopBackOff   6          15m
nginx-ingress-controller-55cd9d6867-j9fck        1/1     Running            0          15m
nginx-ingress-default-backend-67bfdcffcc-7jr8k   1/1     Running            0          15m
➜  otomi-stack git:(master) ✗ ki logs external-dns-54687bdf76-ngssv -c external-dns                                                                                                                                      (⎈ gke_otomi-cloud_europe-west4_otomi-gke-dev:default)
time="2020-02-26T13:28:04Z" level=info msg="config: {Master: KubeConfig: RequestTimeout:30s IstioIngressGatewayServices:[istio-system/istio-ingressgateway] Sources:[ingress] Namespace: AnnotationFilter: FQDNTemplate: CombineFQDNAndAnnotation:false IgnoreHostnameAnnotation:false Compatibility: PublishInternal:false PublishHostIP:false ConnectorSourceServer:localhost:8080 Provider:google GoogleProject:otomi-cloud DomainFilter:[otomi.cloud] ExcludeDomains:[] ZoneIDFilter:[otomi] AlibabaCloudConfigFile:/etc/kubernetes/alibaba-cloud.json AlibabaCloudZoneType: AWSZoneType: AWSZoneTagFilter:[] AWSAssumeRole: AWSBatchChangeSize:1000 AWSBatchChangeInterval:1s AWSEvaluateTargetHealth:true AWSAPIRetries:3 AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: CloudflareProxied:false CloudflareZonesPerPage:50 RcodezeroTXTEncrypt:false InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true InfobloxView: InfobloxMaxResults:0 DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 OCIConfigFile:/etc/kubernetes/oci.yaml InMemoryZones:[] PDNSServer:http://localhost:8081 PDNSAPIKey: PDNSTLSEnabled:false TLSCA: TLSClientCert: TLSClientCertKey: Policy:upsert-only Registry:txt TXTOwnerID:default TXTPrefix: Interval:1m0s Once:false DryRun:false LogFormat:text MetricsAddress::7979 LogLevel:info TXTCacheInterval:0s ExoscaleEndpoint:https://api.exoscale.ch/dns ExoscaleAPIKey: ExoscaleAPISecret: CRDSourceAPIVersion:externaldns.k8s.io/v1alpha1 CRDSourceKind:DNSEndpoint ServiceTypeFilter:[] CFAPIEndpoint: CFUsername: CFPassword: RFC2136Host: RFC2136Port:0 RFC2136Zone: RFC2136Insecure:false RFC2136TSIGKeyName: RFC2136TSIGSecret: RFC2136TSIGSecretAlg: RFC2136TAXFR:false NS1Endpoint: NS1IgnoreSSL:false TransIPAccountName: TransIPPrivateKeyFile:}"
time="2020-02-26T13:28:04Z" level=info msg="Created Kubernetes client https://10.64.0.1:443"
time="2020-02-26T13:29:04Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"
➜  otomi-stack git:(master) ✗ ki logs cert-manager-cainjector-84565c968b-scnt7 -c cert-manager                                                                                                                           (⎈ gke_otomi-cloud_europe-west4_otomi-gke-dev:default)
I0226 13:32:14.717843       1 start.go:82] starting ca-injector v0.12.0 (revision 0e384f5d0)
E0226 13:32:14.719970       1 manager.go:238] cert-manager/controller-runtime/manager "msg"="Failed to get API Group-Resources" "error"="Get https://10.64.0.1:443/api?timeout=32s: dial tcp 10.64.0.1:443: connect: connection refused"
F0226 13:32:14.719999       1 start.go:118] error creating manager: Get https://10.64.0.1:443/api?timeout=32s: dial tcp 10.64.0.1:443: connect: connection refused

Additional context
Killing pods does not help

Drone leaves unterminated pods

➜  istio-operator k -n drone-pipelines get po --show-labels                                                                                                                                                 (⎈ aks-elemenz-ota-admin:default)
NAME                         READY   STATUS        RESTARTS   AGE    LABELS
drone-1eff6bd9z1lz3q3bur1v   2/7     Terminating   4          16d    io.drone.build.event=push,io.drone.build.number=7,io.drone.name=drone-1eff6bd9z1lz3q3bur1v,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-4amhypaz59acbew602xn   2/7     Terminating   4          16d    io.drone.build.event=push,io.drone.build.number=6,io.drone.name=drone-4amhypaz59acbew602xn,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-4ozf776086btrsy16oj3   2/6     Terminating   3          16d    io.drone.build.event=push,io.drone.build.number=4,io.drone.name=drone-4ozf776086btrsy16oj3,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-9kv76nkffsywtlvx6t2k   2/6     Terminating   3          16d    io.drone.build.event=push,io.drone.build.number=3,io.drone.name=drone-9kv76nkffsywtlvx6t2k,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-b8oys2tdykp77vuq1she   2/7     Terminating   4          5d5h   io.drone.build.event=push,io.drone.build.number=25,io.drone.name=drone-b8oys2tdykp77vuq1she,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-csmtayx4oi13u380qtni   2/6     Terminating   3          16d    io.drone.build.event=push,io.drone.build.number=2,io.drone.name=drone-csmtayx4oi13u380qtni,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-omtr6s7w48angst7h5g3   2/5     Terminating   2          19d    io.drone.build.event=push,io.drone.build.number=3,io.drone.name=drone-omtr6s7w48angst7h5g3,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-oz79r874kkw26z3k0qem   2/7     Terminating   4          9d     io.drone.build.event=push,io.drone.build.number=14,io.drone.name=drone-oz79r874kkw26z3k0qem,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-uab7pw1virhpaf6e1t1f   2/7     Terminating   4          16d    io.drone.build.event=push,io.drone.build.number=8,io.drone.name=drone-uab7pw1virhpaf6e1t1f,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true

Drone pods stuck in terminating state

Describe the bug

➜  otomi-stack git:(master) ✗ kap                                                                                                                                                     (⎈ otomi-aks-dev-admin:default)
NAMESPACE           NAME                                                      READY   STATUS        RESTARTS   AGE
drone-pipelines     drone-12ny47i784r9wmssru66                                4/7     Terminating   2          8d
drone-pipelines     drone-cn0falbrifj4kanetbk1                                3/7     Terminating   2          8d
drone-pipelines     drone-hlj1pvowss6eyfd1ha5t                                4/7     Terminating   2          8d
drone-pipelines     drone-kql4jpfqxnvb8s2cxz3q                                3/7     Terminating   2          8d
drone-pipelines     drone-o3uwwjm5z0009g8gxcsy                                4/7     Terminating   2          8d
drone-pipelines     drone-ov3c9pc8ynykr4atoi4p                                4/7     Terminating   2          8d
drone-pipelines     drone-ytz1e3ow1w00guko80k5                                3/7     Terminating   2          8d

create post-deploy script

Create a post-deploy script to restart some pods (preferably when conditions match, like when a team is added):

  • istio-pilot.istio-system: done in drone pipeline
  • loki-0.monitoring (investigate if it's needed)

Execution of the hfp command sometimes fails with an error

Describe the bug
Execution of the hfp command sometimes fails with an error:

Get https://34.90.25.94/api/v1/namespaces/tillerless/secrets?labelSelector=NAME%3Dweave-scope%2COWNER%3DTILLER: error executing access token command "/Users/jehoszafatzimnowoda/google-cloud-sdk/bin/gcloud config config-helper --format=json": err=fork/exec /Users/jehoszafatzimnowoda/google-cloud-sdk/bin/gcloud: no such file or directory output= stderr=

It might be related to an old token stored in ~/.kube/config, since I am able to fix it by calling the kap command.

To Reproduce
It usually happens if I have not used the command for about an hour.

Expected behavior
It always works :)

Add online form to admin dashboard to submit support tickets

When a customer uses otomi stack, they will always get support. We can provide a form (automatically configured with the correct customer information, like customer, support level, cluster name, et cetera) that can be used to submit tickets to us.

Analysis

  • Find out connectivity with Zoho Desk

The prometheus-operator-prometheus-node-exporter resource limits change with each deployment

monitoring, prometheus-operator-prometheus-node-exporter, DaemonSet (apps) has changed:
  # Source: prometheus-operator/charts/prometheus-node-exporter/templates/daemonset.yaml
  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: prometheus-operator-prometheus-node-exporter
    namespace: monitoring
    labels:     
      app: prometheus-node-exporter
      heritage: Helm
      release: prometheus-operator
      chart: prometheus-node-exporter-1.8.1
      jobLabel: node-exporter
  spec:
    selector:
      matchLabels:
        app: prometheus-node-exporter
        release: prometheus-operator
    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 1
    template:
      metadata:
        labels:         
          app: prometheus-node-exporter
          heritage: Helm
          release: prometheus-operator
          chart: prometheus-node-exporter-1.8.1
          jobLabel: node-exporter
      spec:
        serviceAccountName: prometheus-operator-prometheus-node-exporter
        securityContext:
          runAsNonRoot: true
          runAsUser: 65534
        containers:
          - name: node-exporter
            image: "quay.io/prometheus/node-exporter:v0.18.1"
            imagePullPolicy: IfNotPresent
            args:
              - --path.procfs=/host/proc
              - --path.sysfs=/host/sys
              - --web.listen-address=0.0.0.0:9100
              - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
              - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
            ports:
              - name: metrics
                containerPort: 9100
                protocol: TCP
            livenessProbe:
              httpGet:
                path: /
                port: 9100
            readinessProbe:
              httpGet:
                path: /
                port: 9100
            resources:
-             limits:
-               cpu: 200m
-               memory: 50Mi
-             requests:
-               cpu: 100m
-               memory: 30Mi
+             {}
            volumeMounts:
              - name: proc
                mountPath: /host/proc
                readOnly:  true
              - name: sys
                mountPath: /host/sys
                readOnly: true
        hostNetwork: true
        hostPID: true
        tolerations:
          - effect: NoSchedule
            operator: Exists
        volumes:
          - name: proc
            hostPath:
              path: /proc
          - name: sys
            hostPath:
              path: /sys
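A sketch of pinning the values in the prometheus-operator chart so the limits stop flapping (these are the values from the removed block; prometheus-node-exporter is the subchart key):

prometheus-node-exporter:
  resources:
    limits:
      cpu: 200m
      memory: 50Mi
    requests:
      cpu: 100m
      memory: 30Mi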

A customer can upgrade otomi-stack

Prerequisites:

  • A customer has its own repo that contains only values
  • values has appVersion
  • FluxCD with Redis backend (link)

Upgrade scenario:

  • an otomi-stack version is released as a docker image
  • the scanner in otomi-api detects it and patches drone with the new STACK_VERSION env var (a Redis client listens, pub-sub, for new image deployments detected by FluxCD)
  • the pipeline is changed to use the new STACK_VERSION, and the next values commit triggers the pipeline to deploy the new stack (potentially including an otomi-api released as a new docker image)
  • drone pulls a new otomi-stack image
  • drone upgrades otomi-api
  • otomi-api sees that there is an appVersion mismatch and performs a values upgrade
  • drone deploys new stack after values upgrade

Create Demo for Mediawiki application (with Knative)

For a (potential) customer, I would like to give a demo for the following use case:

  • they have a private Gitlab CI instance
  • they manage their own images (for MediaWiki instances) based on a MediaWiki base image
  • We demonstrate the deployment of a new app. This results in a Knative deployment and a URL to access the app is provided.
  • We demonstrate how to update (add plug-ins and PHP extensions) the image and redeploy the updated image
  • Requires a MySQL instance

Add cluster auto scaler to be used on AKS (Azure)

It seems the cluster auto scaling feature for AKS (still in preview) is not working. We need to make sure the cluster auto scaler add-on can be used instead.

Also test the auto-scaling feature on both AKS and EKS

Slack tuning

Now we get too many Slack alerts, which we can resolve:

CRIT:

  • Kubelet down: do we need to monitor this? Is it only an active rule? Can we disable this?

NON CRIT:

  • KubeQuotaExceeded: the limit can be removed once we have tuned OPA to not allow teams to edit their quota (see #19)
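One way to mute it until then is a null route in the Alertmanager config, sketched here (receiver names are placeholders):

route:
  receiver: slack            # assumed default receiver
  routes:
    - match:
        alertname: KubeQuotaExceeded
      receiver: 'null'       # silently dropped
receivers:
  - name: slack
  - name: 'null'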


Add / support Promitor (Azure)

In our AKS setup we utilise an App Gateway in combination with a WAF feature. BCT has asked if logs from the WAF can be shown in a Grafana dashboard.

Maybe https://promitor.io/ can be a solution. Can we do a small test to see if we can get relevant logs/metrics out of Azure into a Grafana dashboard? We would like to have a single pane of glass for all metrics/logs.

Cannot redeploy drone

Events:
  Type     Reason              Age    From                                         Message
  ----     ------              ----   ----                                         -------
  Normal   Scheduled           3m24s  default-scheduler                            Successfully assigned team-admin/drone-server-78bfd656cc-bdl7f to aks-agentpool1-23650041-vmss000004
  Warning  FailedAttachVolume  3m24s  attachdetach-controller                      Multi-Attach error for volume "pvc-c4ce7cc3-25c4-4129-b8bd-66e3a674c0bc" Volume is already used by pod(s) drone-server-5cdf564c56-vtl8p
  Warning  FailedMount         81s    kubelet, aks-agentpool1-23650041-vmss000004  Unable to mount volumes for pod "drone-server-78bfd656cc-bdl7f_team-admin(491871ef-fab4-4bcd-9925-7b323ae764d3)": timeout expired waiting for volumes to attach or mount for pod "team-admin"/"drone-server-78bfd656cc-bdl7f". list of unmounted volumes=[data]. list of unattached volumes=[data default-token-gw7cn istio-envoy istio-certs]

Template to deploy WordPress container with MySQL database

Add template to template repository to create a WordPress setup with:

  1. a WordPress image (multiple versions)
  2. a MySQL database (with persistent storage)
  3. persistent file storage (multiple tiers):
  • fast (SSD)
  • regular (HDD)
  4. a backup plan (file and db)
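A sketch of the two storage tiers as StorageClasses (Azure disk shown as an assumed example; other clouds use different provisioners and parameters):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Premium_LRS    # SSD tier
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regular
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Standard_LRS   # HDD tier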
