frank-at-suse / vsphere_ha_autoscale_cluster

Terraform plan for creating an HA, autoscaled multi-node RKE2 cluster on VMware vSphere

License: Mozilla Public License 2.0


RKE2 Cluster with Autoscaling & API Server HA

Reason for Being

This Terraform plan creates a multi-node RKE2 cluster on vSphere with machine pool autoscaling via the upstream Kubernetes Cluster Autoscaler and API server HA via a kube-vip DaemonSet manifest. Both are common asks, and they bring some "cloud-provider-like" behavior to a cluster running in the comfort of your own datacenter.

Environment Prerequisites

  • Functional Rancher Management Server with vSphere Cloud Credential

  • vCenter >= 7.x and credentials with appropriate permissions (see https://github.com/rancher/barn/blob/main/Walkthroughs/vSphere/Permissions/README.md)

  • Virtual Machine Hardware Compatibility at Version >= 15

  • Create the following in the files/ directory:

    NAME                    PURPOSE
    .rancher-api-url        URL for Rancher Management Server
    .rancher-bearer-token   API bearer token generated via Rancher UI
    .ssh-public-key         SSH public key for additional OS user

  • Because this plan uses BGP to load-balance the Kubernetes control plane, a BGP-capable router is required. For lab, dev, or test use, a small single-CPU Linux VM running the BIRD v2 daemon (sudo apt install bird2) with the following configuration is sufficient:

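protocol bgp kubevip {
        description "kube-vip for Cluster CP";
        # Router-side address and AS number
        local <router eth interface IP address> as 64513;
        # Accept dynamic BGP sessions from any peer in the Control Plane subnet
        neighbor range <network prefix of Control Plane subnet> as <AS value configured in kube-vip manifest>;
        graceful restart;
        ipv4 {
                # Learn the routes advertised by kube-vip, advertise nothing back
                import filter {accept;};
                export filter {reject;};
        };
        # Name prefix for the dynamically spawned sessions
        dynamic name "kubeVIP";
}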
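
The files/ entries listed in the prerequisites could be consumed along the following lines. This is a minimal sketch, not necessarily how this plan wires them up: the local names are hypothetical, and it assumes the rancher2 provider is configured with its api_url and token_key arguments.

```hcl
terraform {
  required_providers {
    rancher2 = {
      source = "rancher/rancher2"
    }
  }
}

locals {
  # Hypothetical local names. trimspace() drops the trailing newline
  # that most editors append to single-line files.
  rancher_api_url      = trimspace(file("${path.module}/files/.rancher-api-url"))
  rancher_bearer_token = trimspace(file("${path.module}/files/.rancher-bearer-token"))
  ssh_public_key       = trimspace(file("${path.module}/files/.ssh-public-key"))
}

# Assumption: the bearer token generated in the Rancher UI is passed as token_key.
provider "rancher2" {
  api_url   = local.rancher_api_url
  token_key = local.rancher_bearer_token
}
```

Keeping these values in files rather than in variables keeps the token out of terraform.tfvars, but the files should still be excluded from version control.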

Caveats

The cluster_autoscaler.tf plan includes the following values in ExtraArgs:

skip-nodes-with-local-storage: false
skip-nodes-with-system-pods: false

These values are set to make the autoscaler's behavior easier to demonstrate. Use them with caution in production or any other environment you care about: they allow the autoscaler to remove nodes that run pods with local storage or kube-system pods, which can cause data loss or workload instability.
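
One way to keep the demo-friendly settings out of production is to derive both flags from a single switch. The sketch below is an illustration only: the variable, local, and output names are hypothetical and do not appear in cluster_autoscaler.tf.

```hcl
# Hypothetical toggle: true reproduces the demo values shown above,
# false restores the upstream Cluster Autoscaler defaults (both flags true).
variable "autoscaler_demo_mode" {
  description = "Allow scale-down of nodes running local-storage or kube-system pods."
  type        = bool
  default     = false
}

locals {
  autoscaler_extra_args = {
    "skip-nodes-with-local-storage" = !var.autoscaler_demo_mode
    "skip-nodes-with-system-pods"   = !var.autoscaler_demo_mode
  }
}

# Exposed so the effective values can be checked with `terraform output`.
output "autoscaler_extra_args" {
  value = local.autoscaler_extra_args
}
```

With autoscaler_demo_mode = true, both entries evaluate to false, matching the values listed above.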


The lifecycle block in cluster.tf is somewhat fragile:

lifecycle {
  ignore_changes = [
    rke_config[0].machine_pools[1].quantity
  ]
}

Machine pool indices are assigned in lexicographic order of pool name, starting from [0]. The "worker" pool is machine_pools[1] and the "ctl_plane" pool is machine_pools[0] for no other reason than that "worker" sorts after "ctl_plane". Consequently, if the "ctl_plane" pool were renamed to something like "x_ctl_plane", the wrong machine pool would occupy the machine_pools[1] index and the quantity of the wrong pool would be ignored. To prevent this, basic variable validation forces MachinePool names to begin with ctl-plane and worker, respectively; otherwise the following error is thrown:

Err: MachinePool names must begin with 'ctl-plane' for Control Plane Node Pool & 'worker' for Autoscaling Worker Node Pool.
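
The ordering problem and the guard against it can be sketched as follows. The variable shape and all names here are assumptions made for illustration; the plan's actual variable may look different. startswith() requires Terraform 1.3 or later.

```hcl
# Hypothetical variable holding the two pool names.
variable "machine_pool_names" {
  type = object({
    ctl_plane = string
    worker    = string
  })
  default = {
    ctl_plane = "ctl-plane"
    worker    = "worker"
  }

  validation {
    condition = (
      startswith(var.machine_pool_names.ctl_plane, "ctl-plane") &&
      startswith(var.machine_pool_names.worker, "worker")
    )
    error_message = "MachinePool names must begin with 'ctl-plane' for Control Plane Node Pool & 'worker' for Autoscaling Worker Node Pool."
  }
}

locals {
  # sort() orders strings lexicographically, mirroring the index
  # assignment described above.
  pool_order = sort(values(var.machine_pool_names))
}

# Index the autoscaled pool would occupy; ignore_changes expects 1.
output "autoscaled_pool_index" {
  value = index(local.pool_order, var.machine_pool_names.worker)
}
```

With the defaults, the output is 1. Without the validation block, a control plane name such as "x_ctl_plane" would sort after "worker" and move the autoscaled pool to index 0.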

To Run

terraform apply

Node pool min/max values are annotations that can be adjusted with the rancher_env.autoscale_annotations variable. Changing these values on a live cluster will not trigger a redeploy. Any node in the autoscaled pool that is selected for scale-down or deletion has a taint applied, which is visible in the Rancher UI.
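
For orientation, the min/max annotations might be shaped like this. This is a sketch under stated assumptions: rancher_env is reduced to the one attribute named above, the annotation keys are the ones documented for the upstream Cluster Autoscaler's Rancher provider rather than copied from this plan, and the sizes are placeholders.

```hcl
variable "rancher_env" {
  # Assumption: the real object likely carries more attributes than this.
  type = object({
    autoscale_annotations = map(string)
  })
  default = {
    autoscale_annotations = {
      # Annotation values must be strings; sizes are placeholders.
      "cluster.provisioning.cattle.io/autoscaler-min-size" = "1"
      "cluster.provisioning.cattle.io/autoscaler-max-size" = "5"
    }
  }

  # Hypothetical sanity check: reject a minimum larger than the maximum.
  validation {
    condition = (
      tonumber(var.rancher_env.autoscale_annotations["cluster.provisioning.cattle.io/autoscaler-min-size"]) <=
      tonumber(var.rancher_env.autoscale_annotations["cluster.provisioning.cattle.io/autoscaler-max-size"])
    )
    error_message = "Autoscaler min size must not exceed max size."
  }
}
```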

Tested Versions

SOFTWARE                     VERSION          DOCS
K8s Cluster Autoscaler       1.26.2           https://github.com/kubernetes/autoscaler/tree/master/charts/cluster-autoscaler#readme
kube-vip                     0.6.2            https://kube-vip.io/docs/
Rancher Server               2.7.6            https://rancher.com/docs/rancher/v2.6/en/overview
Rancher Terraform Provider   3.1.1            https://registry.terraform.io/providers/rancher/rancher2/latest/docs
RKE2                         1.26.8+rke2r1    https://docs.rke2.io
Terraform                    1.4.6            https://www.terraform.io/docs
vSphere                      8.0.1.00300      https://docs.vmware.com/en/VMware-vSphere/index.html
