rancher-sandbox / cluster-api-provider-harvester

A Cluster API Infrastructure Provider for Harvester

License: Apache License 2.0

Topics: cluster-api, harvester, infrastructure, kubernetes, kubevirt, provider

cluster-api-provider-harvester's Introduction

cluster-api-provider-harvester

This project began as a Hack Week 23 project and is still in a very early phase. Please do not use it in production.

What is Cluster API Provider Harvester (CAPHV)?

The Cluster API brings declarative, Kubernetes-style APIs to cluster creation, configuration and management.

Cluster API Provider Harvester is a Cluster API Infrastructure Provider for provisioning Kubernetes clusters on Harvester.

At this stage, the provider has been tested in a single environment, with Harvester v1.2.0, using two Control Plane/Bootstrap providers: Kubeadm and RKE2.

The templates folder contains examples of such configurations.

Getting Started

Cluster API Provider Harvester is compliant with the clusterctl contract, which means clusterctl simplifies its deployment to the CAPI management cluster. In this Getting Started guide, we will use the Harvester provider together with the RKE2 provider (also called CAPRKE2).

Management Cluster

In order to use this provider, you need a management cluster and your current KUBECONFIG context set to talk to it. If you do not have a cluster available, you can create a kind cluster. These are the steps to do that:

  1. Ensure kind is installed (https://kind.sigs.k8s.io/docs/user/quick-start/#installation)
  2. Create a special kind configuration file if you intend to use the Docker infrastructure provider:
cat > kind-cluster-with-extramounts.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: capi-test
nodes:
- role: control-plane
  extraMounts:
    - hostPath: /var/run/docker.sock
      containerPath: /var/run/docker.sock
EOF
  3. Run the following command to create a local kind cluster:
kind create cluster --config kind-cluster-with-extramounts.yaml
  4. Check your newly created kind cluster:
kubectl cluster-info

You should get a result similar to this:

Kubernetes control plane is running at https://127.0.0.1:40819
CoreDNS is running at https://127.0.0.1:40819/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Setting up clusterctl

Before the Harvester provider can be installed with clusterctl, you need to tell clusterctl where to find it: which repository it lives in and which type of provider it is. This is done by creating or modifying the file $HOME/.cluster-api/clusterctl.yaml and adding the following content to it:

providers:
  - name: "harvester"
    url: "https://github.com/rancher-sandbox/cluster-api-provider-harvester/releases/latest/components.yaml"
    type: "InfrastructureProvider"

Now the Harvester and RKE2 providers can be installed with the clusterctl command. Because our manifests use the ClusterResourceSet feature gate of Cluster API, the environment variable EXP_CLUSTER_RESOURCE_SET must be set to true before running the clusterctl init command.

export EXP_CLUSTER_RESOURCE_SET=true

$ clusterctl init --infrastructure harvester --control-plane rke2 --bootstrap rke2
Fetching providers
Installing cert-manager Version="v1.14.5"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v1.7.2" TargetNamespace="capi-system"
Installing Provider="bootstrap-rke2" Version="v0.3.0" TargetNamespace="rke2-bootstrap-system"
Installing Provider="control-plane-rke2" Version="v0.3.0" TargetNamespace="rke2-control-plane-system"
Installing Provider="infrastructure-harvester" Version="v0.1.2" TargetNamespace="caphv-system"

Your management cluster has been initialized successfully!

You can now create your first workload cluster by running the following:

  clusterctl generate cluster [name] --kubernetes-version [version] | kubectl apply -f -

Create a workload cluster

Now you can test the provider by generating some YAML and applying it to the kind cluster created above. YAML templates can be found in the ./templates directory; here we will use the RKE2 examples. Please be aware that the file cluster-template-rke2-dhcp.yaml is a template with placeholders: it cannot be applied directly to the cluster. You first need to generate a valid YAML file. To do that, set the following environment variables:

export CLUSTER_NAME=test-rk # Name of the cluster that will be created.
export HARVESTER_ENDPOINT=x.x.x.x # IP address of the Harvester cluster.
export NAMESPACE=example-rk # Namespace where the cluster will be created.
export KUBERNETES_VERSION=v1.26.6 # Kubernetes version.
export SSH_KEYPAIR=<public-key-name> # Must exist in Harvester prior to applying the manifest.
export VM_IMAGE_NAME=default/jammy-server-cloudimg-amd64.img # Must have the format <NAMESPACE>/<NAME> for an image that exists on Harvester.
export CONTROL_PLANE_MACHINE_COUNT=3
export WORKER_MACHINE_COUNT=2
export VM_DISK_SIZE=40Gi # Desired disk size.
export RANCHER_TURTLES_LABEL='' # Used if you rely on the Rancher CAPI Extension (Turtles) to import the cluster automatically.
export VM_NETWORK=default/untagged # Change according to the VM networks available on your Harvester cluster.
export HARVESTER_KUBECONFIG_B64=XXXYYY # Harvester's full kubeconfig, encoded in Base64. You can use: cat kubeconfig.yaml | base64
export CLOUD_CONFIG_KUBECONFIG_B64=ZZZZAAA # Kubeconfig generated for the Cloud Provider: https://docs.harvesterhci.io/v1.3/rancher/cloud-provider#deploying-to-the-rke2-custom-cluster-experimental

NOTE: The content of CLOUD_CONFIG_KUBECONFIG_B64 should be the kubeconfig generated by the script available here, encoded in Base64.
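
As a hedged example (the kubeconfig file names below are placeholders), the two Base64 variables could be produced like this with GNU coreutils (on macOS, plain base64 < file already produces a single line):

# Full Harvester kubeconfig, Base64-encoded on a single line.
export HARVESTER_KUBECONFIG_B64=$(base64 -w0 harvester-kubeconfig.yaml)
# Kubeconfig generated by the Harvester cloud-provider script, Base64-encoded.
export CLOUD_CONFIG_KUBECONFIG_B64=$(base64 -w0 cloud-provider-kubeconfig.yaml)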

Now, we can generate the YAML using the following command:

clusterctl generate yaml --from https://github.com/rancher-sandbox/cluster-api-provider-harvester/blob/main/templates/cluster-template-rke2.yaml > harvester-rke2-clusterctl.yaml

After examining the resulting YAML file, you can apply it to the management cluster:

kubectl apply -f harvester-rke2-clusterctl.yaml

You should see the following output:

namespace/example-rk created
cluster.cluster.x-k8s.io/test-rk created
harvestercluster.infrastructure.cluster.x-k8s.io/test-rk-hv created
secret/hv-identity-secret created
rke2controlplane.controlplane.cluster.x-k8s.io/test-rk-control-plane created
rke2configtemplate.bootstrap.cluster.x-k8s.io/test-rk-worker created
machinedeployment.cluster.x-k8s.io/test-rk-workers created
harvestermachinetemplate.infrastructure.cluster.x-k8s.io/test-rk-wk-machine created
harvestermachinetemplate.infrastructure.cluster.x-k8s.io/test-rk-cp-machine created
clusterresourceset.addons.cluster.x-k8s.io/crs-harvester-ccm created
clusterresourceset.addons.cluster.x-k8s.io/crs-harvester-csi created
clusterresourceset.addons.cluster.x-k8s.io/crs-calico-chart-config created
configmap/cloud-controller-manager-addon created
configmap/harvester-csi-driver-addon created
configmap/calico-helm-config created

Checking the workload cluster:

After a while, you should be able to check the state of the workload cluster using clusterctl:

clusterctl describe cluster -n example-rk test-rk

and once the cluster is provisioned, it should look similar to the following:

NAME                                                     READY  SEVERITY  REASON  SINCE  MESSAGE
Cluster/test-rk                                          True                     7h35m
├─ClusterInfrastructure - HarvesterCluster/test-rk-hv
├─ControlPlane - RKE2ControlPlane/test-rk-control-plane  True                     7h35m
│ └─3 Machines...                                        True                     7h45m  See test-rk-control-plane-dmrg5, test-rk-control-plane-jkdrb, ...
└─Workers
  └─MachineDeployment/test-rk-workers                    True                     7h46m
    └─2 Machines...                                      True                     7h46m  See test-rk-workers-jwjdg-sz7qk, test-rk-workers-jwjdg-vxgbx
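
To inspect the workload cluster directly, you can also fetch its kubeconfig with clusterctl (the output file name below is just an example):

clusterctl get kubeconfig test-rk -n example-rk > test-rk.kubeconfig
kubectl --kubeconfig test-rk.kubeconfig get nodes -o wide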

cluster-api-provider-harvester's People

Contributors

belgaied2, dependabot[bot], patricklaabs, richardcase


cluster-api-provider-harvester's Issues

The NetworkName is static and cannot be changed

First, thank you for this CAPI provider for Harvester.

As you can see here, the NetworkName for new VMs is hardcoded to vlan1, which makes it impossible to use a different network name.

At least this should be mentioned in the README as a requirement. 😃

I am also not really sure whether additional networks are used at all if someone extends that list.

documentation: Adding more documentations for users new to harvester and how to properly set up harvester

Describe the solution you'd like:
Since I am fairly new to Harvester, I had some trouble getting it up and running in the first place.

Why do you want this feature:
As a user, I'd like quick, easy-to-read documentation about how to set up a brand-new Harvester instance.

Things like:

  • Adding the SSH Key
  • Getting the Kubeconfig of the Harvester Cluster
  • Creating the needed Network

etc.

Anything else you would like to add:
If this is considered useful, I'd like to add this kind of documentation to this provider.
I guess it could help new folks - like me - get up and running a lot faster and start using this awesome provider 😄

nil pointer exception occurs when a Harvester Kubeconfig has current-context != cluster.name

What happened:
Whenever the identity secret used to authenticate against Harvester has a current-context value that does not exist among the name keys of the clusters section, the CAPHV controller fails with a nil pointer error.

What did you expect to happen:
I expected the controller to continue reconciling normally without crash-looping.

How to reproduce it:
In the cluster manifests, provide an identity secret whose current-context differs from cluster.name, and apply the manifests to the management cluster.

Anything else you would like to add:
The problem happens at this line in the code:

configCluster := config.Clusters[config.CurrentContext]

The server is looked up in the clusters section of the kubeconfig using the current-context value, which might differ from the cluster.name that actually matches it.

More generally, this whole section of the code seems to lack checks against nil values, which might cause the crash loop.
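
For illustration, here is a minimal, hypothetical kubeconfig shape that reproduces the mismatch (all names are made up): the current-context value is a context name, while the lookup is done against the cluster name.

# Hypothetical reproducer: current-context is "admin@harvester", but the
# clusters entry is named "harvester", so config.Clusters["admin@harvester"]
# returns nil.
cat > broken-harvester-kubeconfig.yaml <<EOF
apiVersion: v1
kind: Config
current-context: admin@harvester
contexts:
- name: admin@harvester
  context:
    cluster: harvester
    user: admin
clusters:
- name: harvester
  cluster:
    server: https://10.0.0.1:6443
users:
- name: admin
  user:
    token: REDACTED
EOF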

Environment:

  • CAPHV v0.1.4

Bug: Harvester-Cloud Provider - failed to get instance metadata

What happened:
When deploying a cluster with 3 control plane and 2 worker nodes on Harvester, the harvester-cloud-provider (version v0.2.0)
reports the following errors:

I0609 10:04:39.804475       1 node_controller.go:415] Initializing node test-rk-workers-qrm4l-j6hkz with cloud provider

E0609 10:04:39.805118       1 node_controller.go:229] error syncing 'test-rk-workers-qrm4l-j6hkz': failed to get instance metadata for node test-rk-workers-qrm4l-j6hkz: an empty namespace may not be set when a resource name is provided, requeuing

W0609 10:05:10.554930       1 reflector.go:347] github.com/rancher/lasso/pkg/cache/cache.go:145: watch of *v1.VirtualMachineInstance ended with: an error on the server ("unable to decode an event from the watch stream: stream error: stream ID 261; INTERNAL_ERROR; received from peer") has prevented the request from succeeding

What did you expect to happen:

How to reproduce it:

Anything else you would like to add:

I also checked the kubeconfig provided as 'cloud-config' for the Harvester cloud provider.
It is correct and is probably not the cause of this issue.

Environment:

  • rke provider version: 1.1.2
  • rke controlplane & bootstrap provider: 0.2.2
  • OS (e.g. from /etc/os-release):

New cluster using Talos is not progressing beyond Machines in Provisioning stage.

What happened:

The cluster is not coming up: the Harvester load balancer is not created and the machines never leave the Provisioning state.
The machines are provisioned in Harvester and get IPs from my network. I can attach a console to them, though since it is Talos, there is not much to see there.

Screenshot of the console of one of the Talos control plane VMs ("Screenshot 2024-06-06 232557", not reproduced here).

caph-provider logs:

 ERROR   failed to patch HarvesterMachine        {"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine", "HarvesterMachine": {"name":"capi-mgmt-p-01-zzmph","namespace":"cluster-capi-mgmt-p-01"}, "namespace": "cluster-capi-mgmt-p-01", "name": "capi-mgmt-p-01-zzmph", "reconcileID": "7ec120a6-8a1e-40b1-98dd-3597ce44ca1c", "machine": "cluster-capi-mgmt-p-01/capi-mgmt-p-01-7shhp", "cluster": "cluster-capi-mgmt-p-01/capi-mgmt-p-01", "error": "HarvesterMachine.infrastructure.cluster.x-k8s.io \"capi-mgmt-p-01-zzmph\" is invalid: ready: Required value", "errorCauses": [{"error": "HarvesterMachine.infrastructure.cluster.x-k8s.io \"capi-mgmt-p-01-zzmph\" is invalid: ready: Required value"}]}
github.com/rancher-sandbox/cluster-api-provider-harvester/controllers.(*HarvesterMachineReconciler).Reconcile.func1
        /workspace/controllers/harvestermachine_controller.go:121
github.com/rancher-sandbox/cluster-api-provider-harvester/controllers.(*HarvesterMachineReconciler).Reconcile
        /workspace/controllers/harvestermachine_controller.go:198
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:314
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226
2024-06-06T19:58:10Z    ERROR   Reconciler error        {"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine", "HarvesterMachine": {"name":"capi-mgmt-p-01-zzmph","namespace":"cluster-capi-mgmt-p-01"}, "namespace": "cluster-capi-mgmt-p-01", "name": "capi-mgmt-p-01-zzmph", "reconcileID": "7ec120a6-8a1e-40b1-98dd-3597ce44ca1c", "error": "HarvesterMachine.infrastructure.cluster.x-k8s.io \"capi-mgmt-p-01-zzmph\" is invalid: ready: Required value", "errorCauses": [{"error": "HarvesterMachine.infrastructure.cluster.x-k8s.io \"capi-mgmt-p-01-zzmph\" is invalid: ready: Required value"}]}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226
These two log entries keep repeating:
 2024-06-06T19:58:10Z    INFO    Reconciling HarvesterMachine ...        {"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine", "HarvesterMachine": {"name":"capi-mgmt-p-01-zzmph","namespace":"cluster-capi-mgmt-p-01"}, "namespace": "cluster-capi-mgmt-p-01", "name": "capi-mgmt-p-01-zzmph", "reconcileID": "dc815768-5306-42cc-91c0-be802d85bc82"}
2024-06-06T19:58:10Z    INFO    Waiting for ProviderID to be set on Node resource in Workload Cluster ...       {"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine", "HarvesterMachine": {"name":"capi-mgmt-p-01-zzmph","namespace":"cluster-capi-mgmt-p-01"}, "namespace": "cluster-capi-mgmt-p-01", "name": "capi-mgmt-p-01-zzmph", "reconcileID": "dc815768-5306-42cc-91c0-be802d85bc82", "machine": "cluster-capi-mgmt-p-01/capi-mgmt-p-01-7shhp", "cluster": "cluster-capi-mgmt-p-01/capi-mgmt-p-01"}

capt-controller-manager logs:

I0606 19:58:08.737945       1 taloscontrolplane_controller.go:176] "controllers/TalosControlPlane: successfully updated control plane status" namespace="cluster-capi-mgmt-p-01" talosControlPlane="capi-mgmt-p-01" cluster="capi-mgmt-p-01"
I0606 19:58:08.739615       1 controller.go:327] "Warning: Reconciler returned both a non-zero result and a non-nil error. The result will always be ignored if the error is non-nil and the non-nil error causes reqeueuing with exponential backoff. For more details, see: https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/reconcile#Reconciler" controller="taloscontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="TalosControlPlane" TalosControlPlane="cluster-capi-mgmt-p-01/capi-mgmt-p-01" namespace="cluster-capi-mgmt-p-01" name="capi-mgmt-p-01" reconcileID="b0b79408-8a41-43df-91ef-07fe7d36fa7c"
E0606 19:58:08.739746       1 controller.go:329] "Reconciler error" err="at least one machine should be provided" controller="taloscontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="TalosControlPlane" TalosControlPlane="cluster-capi-mgmt-p-01/capi-mgmt-p-01" namespace="cluster-capi-mgmt-p-01" name="capi-mgmt-p-01" reconcileID="b0b79408-8a41-43df-91ef-07fe7d36fa7c"
I0606 19:58:08.749008       1 taloscontrolplane_controller.go:189] "reconcile TalosControlPlane" controller="taloscontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="TalosControlPlane" TalosControlPlane="cluster-capi-mgmt-p-01/capi-mgmt-p-01" namespace="cluster-capi-mgmt-p-01" name="capi-mgmt-p-01" reconcileID="c37dc309-f8fb-42c7-a375-5faceb9019b9" cluster="capi-mgmt-p-01"
I0606 19:58:09.190175       1 scale.go:33] "controllers/TalosControlPlane: scaling up control plane" Desired=3 Existing=1
I0606 19:58:09.213294       1 taloscontrolplane_controller.go:152] "controllers/TalosControlPlane: attempting to set control plane status"
I0606 19:58:09.220900       1 taloscontrolplane_controller.go:564] "controllers/TalosControlPlane: failed to get kubeconfig for the cluster" error="failed to create cluster accessor: error creating client for remote cluster \"cluster-capi-mgmt-p-01/capi-mgmt-p-01\": error getting rest mapping: failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Get \"https://10.0.0.113:6443/api/v1?timeout=10s\": tls: failed to verify certificate: x509: certificate is valid for 10.0.0.3, 127.0.0.1, ::1, 10.0.0.5, 10.53.0.1, not 10.0.0.113"

cabpt-talos-bootstrap logs (I don't know if this is relevant):

I0606 19:58:09.206570       1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-npzm4: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.224117       1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-npzm4: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.243118       1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-npzm4: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.280372       1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-npzm4: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.341804       1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-df9f2: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.352557       1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-df9f2: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.439369       1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-df9f2: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.480714       1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-df9f2: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.539945       1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-df9f2: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.548156       1 secrets.go:174] "controllers/TalosConfig: handling bootstrap data for " owner="capi-mgmt-p-01-n48cx"
I0606 19:58:09.717884       1 secrets.go:174] "controllers/TalosConfig: handling bootstrap data for " owner="capi-mgmt-p-01-n48cx"
I0606 19:58:09.720944       1 secrets.go:174] "controllers/TalosConfig: handling bootstrap data for " owner="capi-mgmt-p-01-7shhp"
I0606 19:58:09.756344       1 talosconfig_controller.go:223] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-npzm4/owner-name=capi-mgmt-p-01-n48cx: ignoring an already ready config"
I0606 19:58:09.765995       1 secrets.go:243] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-npzm4/owner-name=capi-mgmt-p-01-n48cx: updating talosconfig" endpoints=null secret="capi-mgmt-p-01-talosconfig"

What did you expect to happen:
I expected the CAPHV provider to create the LB and proceed with creating the cluster.

How to reproduce it:

I added the providers for Talos (bootstrap and control plane) and, of course, the Harvester provider.

I added 4 files plus the Harvester secret with the following configuration:

cluster.yaml:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: capi-mgmt-p-01
  namespace: cluster-capi-mgmt-p-01
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - 172.16.0.0/20
    services:
      cidrBlocks:
        - 172.16.16.0/20
    serviceDomain: cluster.local
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
    kind: TalosControlPlane
    name: capi-mgmt-p-01
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
    kind: HarvesterCluster
    name: capi-mgmt-p-01

harvester-cluster.yaml:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: HarvesterCluster
metadata:
  name: capi-mgmt-p-01
  namespace: cluster-capi-mgmt-p-01
spec:
  targetNamespace: cluster-capi-mgmt-p-01
  loadBalancerConfig:
    ipamType: pool
    ipPoolRef: k8s-api
  server: https://10.0.0.3
  identitySecret: 
    name: trollit-harvester-secret
    namespace: cluster-capi-mgmt-p-01

harvester-machinetemplate.yaml:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: HarvesterMachineTemplate
metadata:
  name: capi-mgmt-p-01
  namespace: cluster-capi-mgmt-p-01
spec:
  template: 
    spec:
      cpu: 2
      memory: 8Gi
      sshUser: ubuntu
      sshKeyPair: default/david
      networks:
      -  cluster-capi-mgmt-p-01/capi-mgmt-network
      volumes:
      - volumeType: image 
        imageName: harvester-public/talos-1.7.4-metalqemu
        volumeSize: 50Gi
        bootOrder: 0

controlplane.yaml:

apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: TalosControlPlane
metadata:
  name: capi-mgmt-p-01
  namespace: cluster-capi-mgmt-p-01
spec:
  version: "v1.30.0"
  replicas: 3
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
    kind: HarvesterMachineTemplate
    name: capi-mgmt-p-01
  controlPlaneConfig:
    controlplane:
      generateType: controlplane
      talosVersion: v1.7.4
      configPatches:
        - op: add
          path: /cluster/network
          value:
            cni:
              name: none

        - op: add
          path: /cluster/proxy
          value:
            disabled: true

        - op: add
          path: /cluster/network/podSubnets
          value:
            - 172.16.0.0/20

        - op: add
          path: /cluster/network/serviceSubnets
          value:
            - 172.16.16.0/20

        - op: add
          path: /machine/kubelet/extraArgs
          value:
            cloud-provider: external

        - op: add
          path: /machine/kubelet/nodeIP
          value:
            validSubnets:
              - 10.0.0.0/24

        - op: add
          path: /cluster/discovery
          value:
            enabled: false

        - op: add
          path: /machine/features/kubePrism
          value:
            enabled: true

        - op: add
          path: /cluster/apiServer/certSANs
          value:
            - 127.0.0.1

        - op: add
          path: /cluster/apiServer/extraArgs
          value:
            anonymous-auth: true

Anything else you would like to add:

I have tried to switch the load balancer config from dhcp to ipPoolRef with a pre-configured IP pool; this also did not work. I think it is related to the LB never being provisioned in the first place.


Environment:

  • talos controlplane provider version: 0.5.5
  • talos bootstrap provider version: 0.6.4
  • harvester cluster api provider: 0.1.2
  • harvester version installed on my HP server: 1.3.0
  • OS (e.g. from /etc/os-release):

When creating a LB SVC placeholder or a Harvester LB, CAPHV prefixes the name with the namespace without checking its compliance with DNS-1035

What happened:
I tried to create a CAPI cluster in a namespace beginning with a number: 2ax-ns. This namespace should not be prefixed to a Service name, because it does not satisfy the RFC 1035 naming rules. When it is, the LB placeholder fails to be created, which means the first step towards InfrastructureReady never completes.

What did you expect to happen:
I expected CAPHV to only generate RFC 1035-compliant names, eliminating the risk of failures related to naming.

How to reproduce it:
Define any CAPI cluster in a namespace beginning with a numeric character.

Anything else you would like to add:
During name generation for Harvester objects (LBs, Services, VMs, etc.), CAPHV should make sure the resulting name satisfies the following constraints (an illustrative check follows the list):

  • contain at most 63 characters
  • contain only lowercase alphanumeric characters or '-'
  • start with an alphabetic character
  • end with an alphanumeric character
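
A minimal shell sketch of such a validation (this only illustrates the RFC 1035 label rules, it is not the provider's actual implementation, and the sample name is hypothetical):

# Check a candidate name against the RFC 1035 label constraints.
re='^[a-z]([-a-z0-9]*[a-z0-9])?$'
name="2ax-ns-test-rk-lb"   # hypothetical generated name
if [[ ${#name} -le 63 && "$name" =~ $re ]]; then
  echo "valid RFC 1035 label: $name"
else
  echo "invalid RFC 1035 label: $name"
fi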

Environment:

  • CAPHV v0.1.4

Bug: Waiting for ProviderID to be set

What happened:
When provisioning a cluster to Harvester - following the guide - the nodes (1x control plane, 2x worker) get created, and so does the LoadBalancer instance.

The CAPHV-Controller keeps telling me:

2024-06-07T08:34:33Z    INFO    Reconciling HarvesterMachine ...    {"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine", "HarvesterMachine": {"name":"test-rk-workers-jhfpk-hcsbw","namespace":"example-rk"}, "namespace": "example-rk", "name": "test-rk-workers-jhfpk-hcsbw", "reconcileID": "722c0c53-b6c4-4ce2-9d9e-928a83056ddd"}

The control plane has become ready, but the workers have not.
The control plane does have a ProviderID when viewing the harvestercluster resource.
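
For reference, a quick way to see which nodes already have a ProviderID in the workload cluster (the kubeconfig path below is just an example):

kubectl --kubeconfig test-rk.kubeconfig get nodes \
  -o custom-columns=NAME:.metadata.name,PROVIDERID:.spec.providerID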

What did you expect to happen:

I expected a ProviderID to be set on the workers so that my cluster becomes ready.

How to reproduce it:

Anything else you would like to add:

I also tried restarting the controller pods on my local kind cluster, but that didn't help.

I wonder whether this still comes from the Tigera Operator deployment.
Even though the pod

tigera-operator     tigera-operator-7975bb4546-jqm6f                          ●      1/1       Running

is running and healthy, it seems to throw some "errors":

{"level":"info","ts":1717749713.219541,"logger":"controller_apiserver","msg":"Reconciling APIServer","Request.Namespace":"","Request.Name":"default"}

{"level":"info","ts":1717749713.2196012,"logger":"controller_apiserver","msg":"APIServer config not found","Request.Namespace":"","Request.Name":"default"}

Environment:

  • rke provider version: v0.1.2
  • OS (e.g. from /etc/os-release): macOS m1
  • Harvester Version: 1.2.0
  • cluster-api-provider-rke2-bootstrap: v0.3.0
  • cluster-api-provider-rke2-controlplane: v0.3.0

clusterctl init with harvester InfrastructureProvider causes crashloop in caphv-controller pod

What happened:
When installing Cluster API resources onto the bootstrap kind cluster, the resulting caphv-controller pod crash-loops without showing any particular error.

What did you expect to happen:
Expected to see a running cluster-api infrastructure in the bootstrap KinD cluster as shown in the README.

How to reproduce it:
Follow the README as described.

Anything else you would like to add:
I have tried kind latest and 1.26.3, and also k3d. The vSphere provider works fine.

Environment:

  • rke provider version: default / v0.2.7
  • OS (e.g. from /etc/os-release):
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

** Logs of caphv-controller **

2024-03-28T17:25:55Z	INFO	controller-runtime.metrics	Metrics server is starting to listen	{"addr": "127.0.0.1:8080"}
2024-03-28T17:25:55Z	INFO	setup	starting manager
2024-03-28T17:25:55Z	INFO	Starting server	{"kind": "health probe", "addr": "[::]:8081"}
2024-03-28T17:25:55Z	INFO	starting server	{"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
I0328 17:25:55.701404       1 leaderelection.go:245] attempting to acquire leader lease caphv-system/1e1658d6.cluster.x-k8s.io...
I0328 17:25:55.705868       1 leaderelection.go:255] successfully acquired lease caphv-system/1e1658d6.cluster.x-k8s.io
2024-03-28T17:25:55Z	DEBUG	events	caphv-controller-manager-645bdf8d77-fsvdj_e867e880-4d85-4bc9-8b7e-3bcaabe67677 became leader	{"type": "Normal", "object": {"kind":"Lease","namespace":"caphv-system","name":"1e1658d6.cluster.x-k8s.io","uid":"49415ac9-6a7e-4ce9-9bc4-ae7bd4d2a294","apiVersion":"coordination.k8s.io/v1","resourceVersion":"1172"}, "reason": "LeaderElection"}
2024-03-28T17:25:55Z	INFO	Starting EventSource	{"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine", "source": "kind source: *v1alpha1.HarvesterMachine"}
2024-03-28T17:25:55Z	INFO	Starting EventSource	{"controller": "harvestercluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterCluster", "source": "kind source: *v1alpha1.HarvesterCluster"}
2024-03-28T17:25:55Z	INFO	Starting EventSource	{"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine", "source": "kind source: *v1beta1.Machine"}
2024-03-28T17:25:55Z	INFO	Starting EventSource	{"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine", "source": "kind source: *v1beta1.Cluster"}
2024-03-28T17:25:55Z	INFO	Starting Controller	{"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine"}
2024-03-28T17:25:55Z	INFO	Starting EventSource	{"controller": "harvestercluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterCluster", "source": "kind source: *v1.Secret"}
2024-03-28T17:25:55Z	INFO	Starting Controller	{"controller": "harvestercluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterCluster"}
2024-03-28T17:25:55Z	INFO	Starting workers	{"controller": "harvestercluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterCluster", "worker count": 1}
2024-03-28T17:25:55Z	INFO	Starting workers	{"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine", "worker count": 1}
2024-03-28T17:27:09Z	INFO	Stopping and waiting for non leader election runnables
2024-03-28T17:27:09Z	INFO	shutting down server	{"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
2024-03-28T17:27:09Z	INFO	Stopping and waiting for leader election runnables
2024-03-28T17:27:09Z	INFO	Shutdown signal received, waiting for all workers to finish	{"controller": "harvestercluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterCluster"}
2024-03-28T17:27:09Z	INFO	Shutdown signal received, waiting for all workers to finish	{"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine"}
2024-03-28T17:27:09Z	INFO	All workers finished	{"controller": "harvestercluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterCluster"}
2024-03-28T17:27:09Z	INFO	All workers finished	{"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine"}
2024-03-28T17:27:09Z	INFO	Stopping and waiting for caches
2024-03-28T17:27:09Z	INFO	Stopping and waiting for webhooks
2024-03-28T17:27:09Z	INFO	Wait completed, proceeding to shutdown the manager

Handling of cloud-config for the Harvester Cloud Provider is unnecessarily complex and undocumented

What happened:
The Harvester Cloud Provider needs a Secret to be set in the Workload cluster in order to set the ProviderID of the nodes that will be needed by Cluster API to continue provisioning.

First, the cloud-config is not documented.
Second, the templates do not implement it in the same way.

  • The RKE2 template uses a placeholder that is complex for the final user to generate, as it has non-trivial formatting.
  • The Kubeadm template reuses the main Harvester kubeconfig (whose privileges are too broad).

What did you expect to happen:
I expected the template to work in a simpler way. For instance, ${CLOUD_CONFIG_SECRET} should become ${CLOUD_CONFIG_KUBECONFIG_B64} and contain only the data.cloud-config content. That way the user would only need to provide a Base64 string without worrying about formatting.

How to reproduce it:
Any formatting issue in ${CLOUD_CONFIG_SECRET} causes provisioning to fail.
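
For context, a rough sketch of what the simpler flow could look like; the Secret name, namespace, and file name below are assumptions and should be verified against the Harvester cloud provider documentation:

# Hedged sketch only: create the cloud-config Secret in the workload cluster
# from the cloud-provider kubeconfig (Secret name/namespace and file name are
# examples, not confirmed by this project).
kubectl --kubeconfig test-rk.kubeconfig -n kube-system \
  create secret generic cloud-config \
  --from-file=cloud-config=cloud-provider-kubeconfig.yaml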

The way of referencing existing resources in harvester should work equally.

The SSH key pair needs to be in the desired VM namespace.

On the other hand, the image used can specify a namespace together with the image name, in the form NAMESPACE/IMAGENAME:

export VM_IMAGE_NAME=default/jammy-server-cloudimg-amd64.img # Should have the format <NAMESPACE>/<NAME> for an image that exists on Harvester

Bug: Manifest generation uses a wrong kubeconfig for the cloud-config

What happened:
When running the clusterctl generate cluster --from https://github.com/rancher-sandbox/cluster-api-provider-harvester/blob/v0.1.2/templates/cluster-template-rke2.yaml -n ${CLUSTER_NAMESPACE} ${CLUSTER_NAME} > harvester-rke2-clusterctl.yaml the generated Manifest has some weird and wrong kubeconfig for the cloud-config, which will be placed in here:
https://github.com/rancher-sandbox/cluster-api-provider-harvester/blob/main/templates/cluster-template-rke2.yaml#L568

One might have to manually overwrite the string with the desired Base64 string of the Harvester kubeconfig.

What did you expect to happen:

I expected the Base64-encoded kubeconfig to be pasted in correctly.

How to reproduce it:

  • Create a kind cluster
  • clusterctl init
  • run clusterctl generate cluster --from https://github.com/rancher-sandbox/cluster-api-provider-harvester/blob/v0.1.2/templates/cluster-template-rke2.yaml -n ${CLUSTER_NAMESPACE} ${CLUSTER_NAME} > harvester-rke2-clusterctl.yaml

Anything else you would like to add:
I did not find the cause of this issue ad hoc, but I'll dig into it later.

Environment:

  • rke provider version:
  • OS (e.g. from /etc/os-release):

Feature: Improving clusterctl generate command to use already defined env-vars.

Describe the solution you'd like:
Improve the command from

clusterctl generate cluster --from https://github.com/rancher-sandbox/cluster-api-provider-harvester/blob/v0.1.2/templates/cluster-template-rke2.yaml -n example-rk test-rk > harvester-rke2-clusterctl.yaml

to

clusterctl generate cluster --from https://github.com/rancher-sandbox/cluster-api-provider-harvester/blob/v0.1.2/templates/cluster-template-rke2.yaml -n ${CLUSTER_NAMESPACE} ${CLUSTER_NAME} > harvester-rke2-clusterctl.yaml

Why do you want this feature:

This is part of streamlining the getting-started experience for users and removing redundancies.

Anything else you would like to add:

Updating the README.md

Describe the solution you'd like:
Update the current README.md in the root folder to make it easier for a new user to get up and running with the Cluster API Provider for Harvester and RKE2.

Why do you want this feature:

Anything else you would like to add:

  • The link in the README.md to the samples folder does not work, since the folder has been renamed from 'samples' to 'templates'.
  • Update the manifest generation instructions (that link is also broken).
  • Update the required environment variables.

Bug: Tigera-Operator pod is stuck in CrashLoopBackOff

What happened:
When I try to provision a Kubernetes cluster - as described in the README.md - my control plane becomes ready and I am able to connect to the new cluster with the kubeconfig.

Some pods start, but the important one, tigera-operator, keeps restarting:

2024/06/06 10:11:56 [INFO] Version: v1.29.0
2024/06/06 10:11:56 [INFO] Go Version: go1.18.9b7
2024/06/06 10:11:56 [INFO] Go OS/Arch: linux/amd64
2024/06/06 10:11:56 [ERROR] Get "https://10.43.0.1:443/api?timeout=32s": dial tcp 10.43.0.1:443: connect: network is unreachable
clusterctl describe cluster test-rk -n example-rk
NAME                                                                     READY  SEVERITY  REASON  SINCE  MESSAGE
Cluster/test-rk                                                          True                     81m
├─ClusterInfrastructure - HarvesterCluster/test-rk-hv
└─ControlPlane - RKE2ControlPlane/test-rk-control-plane                  True                     81m
  └─Machine/test-rk-control-plane-wvg4g                                  True                     81m
    └─MachineInfrastructure - HarvesterMachine/test-rk-cp-machine-sjtck

What did you expect to happen:

How to reproduce it:

Anything else you would like to add:
I am currently trying to deploy a cluster with a single control plane on Harvester, but the error also occurs when I try to deploy 1-2 control planes with 1 worker node.

Environment:

  • rke provider version: 0.1.2
  • OS (e.g. from /etc/os-release): macOS
  • Harvester Version: 1.2.0
  • rke2-bootstrap: v0.3.0
  • rke2-controlplane: v0.3.0

Use variable names for the VM counts as described in Cluster API

Please use the same environment variable names as all other providers (or at least name them as clusterctl does):

export CONTROLPLANE_REPLICAS=3
export WORKER_REPLICAS=2

clusterctl itself uses clusterctl generate cluster --control-plane-machine-count --worker-machine-count,
and the variables are named as shown here: https://github.com/kubernetes-sigs/cluster-api/blob/f335f132714a49adfcd815512c5125d8891f29d6/docs/book/src/clusterctl/provider-contract.md?plain=1#L327-L328
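
For reference, with the standard variable names the counts can also be set directly through clusterctl flags (a sketch, assuming the template consumes the standard variables as the Getting Started section above does):

clusterctl generate cluster test-rk -n example-rk \
  --from https://github.com/rancher-sandbox/cluster-api-provider-harvester/blob/main/templates/cluster-template-rke2.yaml \
  --control-plane-machine-count=3 \
  --worker-machine-count=2 > harvester-rke2-clusterctl.yaml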

Updating Manifests to use the latest Longhorn Images

Describe the solution you'd like:
As of CAPHV version 0.1.2, with the RKE2 control plane & bootstrap provider version 0.2.7 and Harvester 1.3.0,
I ran into some issues while claiming and provisioning storage with Longhorn in my attached CAPI clusters.

I did some research and simply used the newest versions I could find.

Here are a few:

longhornio/csi-resizer:v0.5.1-lh1 => longhornio/csi-resizer:v1.10.1
longhornio/csi-provisioner:v1.6.0-lh1 => longhornio/csi-provisioner:v3.6.4
longhornio/csi-attacher:v2.2.1-lh1 => longhornio/csi-attacher:v4.5.1

longhornio/csi-node-driver-registrar:v1.2.0-lh1 => longhornio/csi-node-driver-registrar:v2.9.2
rancher/harvester-csi-driver:v0.1.6

Why do you want this feature:

With the versions we currently provide, I was not able to create any working PVC on the attached Kubernetes Cluster.
Even the auto-attaching was not working for Longhorn.

With the newest versions, I was able to do it.

Anything else you would like to add:

Feature/Enhancement: ClusterResourceSet Rollout-Strategy

Describe the solution you'd like:
Using Cluster API's ClusterResourceSets, we can use the rollout strategy 'Reconcile'.

The documentation can be found here:
https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set#update-from-applyonce-to-reconcile

Why do you want this feature:
Currently, the CAPHV Cluster API provider only ships ClusterResourceSets with the rollout strategy strategy: ApplyOnce.

Like here:

---
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: crs-harvester-ccm
  namespace: ${NAMESPACE}
spec:
  clusterSelector:
    matchLabels:
      ccm: external
  resources:
  - kind: ConfigMap
    name: cloud-controller-manager-addon
  strategy: ApplyOnce

When a user or admin deletes resources that a ClusterResourceSet has applied, they are not reconciled back to the cluster under the ApplyOnce option.

Instead, we should move to strategy: Reconcile.

By doing this, we add a bit more robustness to the ClusterResourceSets and ensure their availability on the attached clusters.
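
As a hedged example, an existing ClusterResourceSet could be switched to the Reconcile strategy in place like this (the resource name comes from the example above; the namespace is just an example):

kubectl patch clusterresourceset crs-harvester-ccm -n example-rk \
  --type merge -p '{"spec":{"strategy":"Reconcile"}}'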

Anything else you would like to add:
