
kubernetes-sigs / cluster-api-provider-cloudstack


A Kubernetes Cluster API Provider implementation for Apache CloudStack.

Home Page: https://cluster-api-cloudstack.sigs.k8s.io/

License: Apache License 2.0

Dockerfile 0.04% Makefile 2.99% Go 94.82% Shell 2.15% HTML 0.01%
cloudstack iaas kubernetes

cluster-api-provider-cloudstack's Introduction

Powered by Apache CloudStack



What is the Cluster API Provider CloudStack

The Cluster API brings declarative, Kubernetes-style APIs to cluster creation, configuration and management.

The API itself is shared across multiple cloud providers allowing for true Apache CloudStack hybrid deployments of Kubernetes. It is built atop the lessons learned from previous cluster managers such as kops and kubicorn.

Launching a Kubernetes cluster on Apache CloudStack

Check out the Getting Started Guide to create your first Kubernetes cluster on Apache CloudStack using Cluster API.

Features

  • Native Kubernetes manifests and API
  • Choice of Linux distribution (as long as a current cloud-init is available). Tested on Ubuntu, CentOS, Rocky Linux, and RHEL
  • Support for single and multi-node control plane clusters
  • Deploy clusters on Isolated and Shared Networks
  • cloud-init based node bootstrapping

Compatibility with Cluster API and Kubernetes Versions

This provider's versions are able to install and manage the following versions of Kubernetes:

Kubernetes Version v1.22 v1.23 v1.24 v1.25 v1.26 v1.27
CloudStack Provider (v0.4)

Compatibility with Apache CloudStack Versions

This provider's versions are able to work on the following versions of Apache CloudStack:

CloudStack Version 4.14 4.15 4.16 4.17 4.18 4.19
CloudStack Provider (v0.4)

Operating system images

Note: Cluster API Provider CloudStack relies on a few prerequisites which have to be already installed in the used operating system images, e.g. a container runtime, kubelet, kubeadm, etc. Reference images can be found in kubernetes-sigs/image-builder.

Prebuilt images can be found below:

Hypervisor Kubernetes Version Rocky Linux 8 Ubuntu 20.04
KVM v1.22 qcow2, md5 qcow2, md5
v1.23 qcow2, md5 qcow2, md5
v1.24 qcow2, md5 qcow2, md5
v1.25 qcow2, md5 qcow2, md5
v1.26 qcow2, md5 qcow2, md5
v1.27 qcow2, md5 qcow2, md5
VMware v1.22 ova, md5 ova, md5
v1.23 ova, md5 ova, md5
v1.24 ova, md5 ova, md5
v1.25 ova, md5 ova, md5
v1.26 ova, md5 ova, md5
v1.27 ova, md5 ova, md5
XenServer v1.22 vhd, md5 vhd, md5
v1.23 vhd, md5 vhd, md5
v1.24 vhd, md5 vhd, md5
v1.25 vhd, md5 vhd, md5
v1.26 vhd, md5 vhd, md5
v1.27 vhd, md5 vhd, md5

Getting involved and contributing

Are you interested in contributing to cluster-api-provider-cloudstack? We, the maintainers and community, would love your suggestions, contributions, and help! The maintainers can also be contacted at any time to learn more about how to get involved.

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

GitHub issues

Bugs

If you think you have found a bug please follow the instructions below.

  • Please spend a small amount of time giving due diligence to the issue tracker. Your issue might be a duplicate.
  • Get the logs from the cluster controllers. Please paste them into your issue.
  • Open a new issue.
  • Remember that users might be searching for your issue in the future, so please give it a meaningful title to help others.
  • Feel free to reach out to the Cluster API community on the Kubernetes Slack.

Tracking new features

We also use the issue tracker to track features. If you have an idea for a feature, or think you can help Cluster API Provider CloudStack become even more awesome, follow the steps below.

  • Open a new issue.
  • Remember that users might be searching for your issue in the future, so please give it a meaningful title to help others.
  • Clearly define the use case, using concrete examples.
  • Some of our larger features will require some design. If you would like to include a technical design for your feature, please include it in the issue.
  • After the new feature is well understood, and the design agreed upon, we can start coding the feature. We would love for you to code it. So please open up a WIP (work in progress) pull request, and happy coding.

Our Contributors

Thank you to all contributors and a special thanks to our current maintainers & reviewers:

Maintainers: @rohityadavcloud, @davidjumani, @jweite-amazon
Reviewers: @rohityadavcloud, @davidjumani, @jweite-amazon

All the CAPC contributors:

cluster-api-provider-cloudstack's People

Contributors

amazon-auto, chrisdoherty4, davidjumani, dependabot[bot], g-gaston, hrak, jweite-amazon, k8s-ci-robot, kiranchavala, maxdrib, mrbobbytables, mrog, pearl1594, pmotyka, rejoshed, rohityadavcloud, tatlat, vdombrovski, vignesh-goutham, vishesh92, wanyufe, weizhouapache, wongni, xmudrii


cluster-api-provider-cloudstack's Issues

Allow Project-Level Resource Creation

/kind feature

Describe the solution you'd like
Currently, the CAPI provider creates cluster resources at the account level but has no provision to create them within the context of a Project. In theory, one may be able to specify the project's account ID, but these accounts are hidden by the listProject API (hardcoded in CloudStack), so the CAPI provider fails to find the account. This effectively makes it impossible to use the CAPI provider in CloudStack deployments that use Projects for multi-team "tenant" access.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-cloudstack version: v0.4.8
  • Kubernetes version: (use kubectl version): v1.26.1
  • OS (e.g. from /etc/os-release): Rocky Linux 9.1

Requeue instead of logging reconciler errors when owners not set on CloudStackMachine yet

/kind bug

What steps did you take and what happened:

During reconciliation of a cluster, in several places GetParent gets called (which calls GetOwnerOfKind). When the CloudStackMachines do not have an owner yet, this results in a whole bunch of reconciler errors being logged like so:

E0726 12:18:22.767591 1 controller.go:329] "msg"="Reconciler error" "error"="couldn't find owner of kind Machine in namespace default" "CloudStackMachine"={"name":"hrak-cluster-control-plane-nrr5m","namespace":"default"} "controller"="cloudstackmachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="CloudStackMachine" "name"="hrak-cluster-control-plane-nrr5m" "namespace"="default" "reconcileID"="04d91e20-21a8-4b8b-b04e-c57b4c01067f"

Maybe this can be adapted so that it requeues this step in the same way that RequeueIfCloudStackClusterNotReady does, but by checking the owner refs, to reduce the amount of 'bad news' in the logs.
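A minimal, generic sketch of that idea, assuming a controller-runtime style reconciler (the function and its signature are illustrative, not the actual CAPC code): when the Machine owner reference is absent, return a requeue result with a nil error rather than an error.

package controllers

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"
)

// hasOwnerOfKind reports whether any owner reference on obj matches kind.
func hasOwnerOfKind(obj metav1.Object, kind string) bool {
	for _, ref := range obj.GetOwnerReferences() {
		if ref.Kind == kind {
			return true
		}
	}
	return false
}

// reconcileMachine illustrates the idea: if the CAPI Machine owner is not set
// yet, requeue after a short delay and return a nil error so nothing is
// reported as a reconciler failure.
func reconcileMachine(ctx context.Context, csMachine metav1.Object) (ctrl.Result, error) {
	if !hasOwnerOfKind(csMachine, "Machine") {
		return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
	}
	// ... normal reconciliation continues here ...
	return ctrl.Result{}, nil
}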

What did you expect to happen:

No errors being logged

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-cloudstack version: HEAD
  • Kubernetes version: (use kubectl version): 1.27.3
  • OS (e.g. from /etc/os-release):

Free IP check fails when using a domain admin account

/kind bug

What steps did you take and what happened:

When creating clusters using a domain admin account that attaches VMs to a shared network in the ROOT domain, the domain admin account cannot see free IP addresses (it can only see IPs associated with VMs in its subdomain). This leads to the free IP check failing even though the VM could be created successfully.

What did you expect to happen:

The free IP check succeeds.

Anything else you would like to add:

Shared network free IP check.

Environment:

Allow user-data compression to be disabled to support ignition

/kind feature

Describe the solution you'd like

CAPC currently compresses user data for the VMs it creates. Since Ignition does not support compression, it is currently impossible to use Ignition-based bootstrapping of nodes (e.g. using Flatcar Linux).

Offer a boolean flag to disable compression of user data (base64 encoding only).
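A minimal sketch of what such a flag could control, assuming the user data is currently gzip-compressed before being base64-encoded (the function and the flag are hypothetical, not the actual CAPC API):

package cloud

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
)

// encodeUserData base64-encodes user data, optionally gzip-compressing it
// first. Ignition requires the uncompressed form, so a spec flag (hypothetical
// here) could select the base64-only path.
func encodeUserData(userData []byte, compress bool) (string, error) {
	if !compress {
		return base64.StdEncoding.EncodeToString(userData), nil
	}
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	if _, err := gz.Write(userData); err != nil {
		return "", err
	}
	if err := gz.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}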

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-cloudstack version: 0.4.8 with FeatureFlag KubeadmBootstrapFormatIgnition=true
  • Kubernetes version: (use kubectl version): 1.24.10
  • OS (e.g. from /etc/os-release): flatcar-stable-3374.2.3

E2E Testing Improvement

/kind feature

Describe the solution you'd like
I would like the E2E tests to check that all custom resources and all ACS constructs created are cleaned up.

Recently we encountered an issue where a failure domain was deleted successfully without cleaning up its machines. A check for dangling resources would have caught this before the merge.

Question/discussion: Running cloudstack in a pod, in a kubernetes cluster

Goal

To switch to Apache CloudStack as my main management interface for my XCP-ng servers, and to use the Cluster API CloudStack provider to deploy Kubernetes clusters in my XCP-ng environment.

Steps taken

  • I have succeeded at creating a container that runs CloudStack.
  • I am able to log in and configure my two XCP-ng servers.

Status

I have not fully succeeded at configuring CloudStack with my XCP-ng servers. Granted, I am new and may be configuring things incorrectly.

Question/discussion

Is it even possible to run CloudStack in a pod within a Kubernetes cluster, or will this never work due to the low-level network access CloudStack needs?

If I recall correctly, while trying to get things working I gave the pod unrestricted access and enabled the bridge module on XCP-ng, which broke the XCP-ng installation. I think the bridge module was the problem, so as long as Open vSwitch is acceptable, the CloudStack-in-a-pod idea should still work.

(I'm asking here because I figure the folks involved in this provider may have tried this configuration.)

Work around

I guess I could stand up a VM in XCP-ng and make that the CloudStack server. Once it is configured and working, the CloudStack web interface would be all that's needed from that point on. However, I try to stick to GitOps as much as possible and would prefer to use a container, passing in any needed configuration via environment variables.

ClusterAPI Provider does not translate existing CS network name into Kubernetes safe name

/kind bug

What steps did you take and what happened:
Created a cluster with an existing network, following the Getting Started Guide.
The network name referenced is called "M_Play".
Installation failed with the following message.
E0718 09:00:43.305057 1 controller.go:317] controller/cloudstackfailuredomain "msg"="Reconciler error" "error"="creating isolated network CRD: CloudStackIsolatedNetwork.infrastructure.cluster.x-k8s.io \"capc-cluster-m_play\" is invalid: metadata.name: Invalid value: \"capc-cluster-m_play\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')" "name"="e146065e3fcde3e28804d45cb6e45297" "namespace"="capi-joschi" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="CloudStackFailureDomain"

What did you expect to happen:
I expected cluster-api-provider-cloudstack to translate names from CloudStack into Kubernetes-safe names, e.g. by replacing underscores with hyphens.
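A minimal sketch of such a translation, assuming a simple RFC 1123 sanitizer would be acceptable (the function is illustrative, not part of CAPC today):

package cloud

import (
	"regexp"
	"strings"
)

var invalidNameChars = regexp.MustCompile(`[^a-z0-9.-]+`)

// toK8sName converts a CloudStack resource name such as "M_Play" into an
// RFC 1123 subdomain-compatible name ("m-play"): lowercase it, replace
// underscores and other invalid characters with hyphens, and trim
// leading/trailing separators.
func toK8sName(name string) string {
	n := strings.ToLower(name)
	n = invalidNameChars.ReplaceAllString(n, "-")
	return strings.Trim(n, "-.")
}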

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-cloudstack version: 1.4.4
  • Kubernetes version: (use kubectl version): 1.24.14
  • OS (e.g. from /etc/os-release): rockylinux (prebuilt)

Reconcile a CAPC cluster to a CKS cluster

/kind feature

Describe the solution you'd like

CAPC reconciles various resources in CloudStack; we want CAPC to also reconcile the notion of a CAPC Kubernetes cluster to a CKS (unmanaged) cluster.

The CKS cluster resource in CloudStack captures the notion of a Kubernetes cluster in CloudStack, and a recent improvement made it possible to represent unmanaged Kubernetes clusters via apache/cloudstack#7515, with documentation at apache/cloudstack-documentation#315 and API support via the go-sdk change apache/cloudstack-go#59. PR #250 proposed this but could not satisfy review from other stakeholders. With this issue, alternative options and approaches can be discussed, or the previous PR reimagined.

As a CAPC user I would want the following:

  • Ability to see my CAPC cluster show up as an unmanaged CKS cluster in CloudStack (API and UI)
  • Ability to see usage of my CAPC cluster via the CKS cluster (CPU, memory, etc.)
  • As MachineSets are added or removed, the related instances are added or removed in the CKS 'unmanaged' cluster
  • In large environments, especially typical CI/CD use cases, if I lose or forget my CAPC admin cluster I should be able to manually delete the CAPC cluster resources (namely VMs and networks)
  • I should be able to do this with a user account type (note: a CloudStack role has a list of allowed/disallowed APIs; however, the account type controls the ability to do certain things, such as acquiring a public IP of choice in a given shared/isolated/VPC network, which is typically needed for a CAPC cluster)
  • Prefer not to use a tool other than CAPC for reconciling a CAPC cluster to a CloudStack Kubernetes Service (CKS) cluster

Brief design doc: (to be expanded by @vishesh92)

Removing FailureDomain upgrade stuck on etcd

/kind bug

What steps did you take and what happened:
[A clear and concise description of what the bug is.]
Created a 3-1-1 cluster with EKS-A split across multiple failure domains, then removed the second failure domain. The cluster became stuck in status EtcdRollingUpdateInProgress with message "Rolling 1 replicas with outdated spec (0 replicas up to date)". All VMs were showing as Ready, but the cluster was showing as Not Ready in clusterctl describe cluster.

The etcdadm controller is showing logs like

1.6613603260138805e+09	INFO	controllers.EtcdadmCluster.eksa-drib-d30788f-etcd	Rolling out Etcd machines	{"needRollout": ["eksa-drib-d30788f-etcd-2q5k8"]}
1.6613603260512962e+09	INFO	controllers.EtcdadmCluster.eksa-drib-d30788f-etcd	following machines owned by this etcd cluster:
1.6613603260523336e+09	ERROR	controller.etcdadmcluster	Reconciler error	{"reconciler group": "etcdcluster.cluster.x-k8s.io", "reconciler kind": "EtcdadmCluster", "name": "eksa-drib-d30788f-etcd", "namespace": "eksa-system", "error": "etcd endpoint port is not open"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2

I'm not sure if this is an issue with CAPC, EKS-A, or etcdadm. This issue is also being observed in the EKS-A e2e tests.

What did you expect to happen:
Etcd clusters to proceed with their rolling upgrade instead of getting stuck.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cloudstack dev environment
  • Cluster-api-provider-cloudstack version: v0.4.7-rc3
  • Kubernetes version: (use kubectl version): 1.21
  • OS (e.g. from /etc/os-release): rhel8

Add Sub-Domain User E2E & Improve Sub-Domain Testing Overall

Exactly as the title suggests. We need to add the use of domain and account in a sub-domain account to our E2E tests.

Also, add more unit tests for domain resolution, e.g. just the base ROOT domain.

Enable semi-integration tests in the pipeline via configurable parameters, as in the E2E tests.

Remove Affinity Group dependency on owner ref

/kind bug

The CloudStack machine spec accepts affinity groups in two ways.

  1. Managed: we can just say pro or anti, and CAPC will create an affinity group accordingly for each node group, e.g. for KCP, etcd, and each worker node group.
  2. Explicit: provide a list of already created affinity group IDs.

In the managed way, CAPC gets the owner ref of the CAPI machine and uses the owner name to identify and generate a unique name per node group. While this works in most situations, the owner reference of CAPI machines is not a solid or strict API contract for identifying the node role or node group. For instance, etcd machines can be owned by the etcdadm controller, but when that controller chooses to delete old machines, it simply removes its ownership, at which point the CAPI Cluster owns those old etcd machines.

An intermediary API responsible for tracking node roles along with affinity or anti-affinity status should be established, and affinity group names should be generated from it. This API should derive the affinity details at KCP/Cluster creation time.
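For illustration, a sketch of why the owner-ref-based scheme is fragile (the naming format here is hypothetical; the real CAPC naming differs): the generated name depends on whichever controller currently owns the machine, so the name changes when ownership moves.

package cloud

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// affinityGroupName derives a name from the machine's current owner. If the
// owner changes (e.g. from the etcdadm controller to the CAPI Cluster), the
// derived name changes too, which is the problem described above.
func affinityGroupName(clusterName, affinityType string, owner metav1.OwnerReference) string {
	return fmt.Sprintf("%s-%s-%s-%s", clusterName, affinityType, owner.Kind, owner.Name)
}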

Additional ObjectMeta field should be removed from CloudStackMachineTemplate

The CloudStackMachineTemplate API for v1beta2 has an additional ObjectMeta in one of its child structs (CloudStackMachineTemplateResource). This field is not required and is not used anywhere. While this is predominantly non-intrusive, kube-apiserver 1.25 enabled server-side validation as the feature matured to beta. When marshalling a struct from this API, the output includes a nil creationTimestamp as part of that additional ObjectMeta field, and the generated YAML fails server-side validation.

API field to be removed -

ObjectMeta metav1.ObjectMeta `json:"metadata,omitempty"`

Allow network offering to be configurable

/kind feature

Describe the solution you'd like

The network offering is currently hardcoded as DefaultIsolatedNetworkOfferingWithSourceNatService, but in some setups the network offerings have been customized and DefaultIsolatedNetworkOfferingWithSourceNatService is not available.
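A minimal sketch of the requested behaviour, assuming an optional offering name could be carried in the spec (the spec field and function are hypothetical):

package cloud

// defaultIsolatedNetworkOffering mirrors the value currently hardcoded in CAPC.
const defaultIsolatedNetworkOffering = "DefaultIsolatedNetworkOfferingWithSourceNatService"

// resolveNetworkOffering returns the offering name from the spec when one is
// set, otherwise the current hardcoded default.
func resolveNetworkOffering(specOffering string) string {
	if specOffering != "" {
		return specOffering
	}
	return defaultIsolatedNetworkOffering
}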

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-cloudstack version: 1.4.8
  • Kubernetes version: (use kubectl version): 1.24.9
  • OS (e.g. from /etc/os-release):

Upgrade to CAPI v1.3

Summary

CAPC v0.4 uses CAPI v1.2. To provide users with at least 1 CAPC release using CAPI 1.3, I'm proposing we do a v0.5 release that uses CAPI 1.3 before upgrading to CAPI 1.4.

/kind feature

CloudStackClusterTemplate API and CRD request to use with clusterclass implementation

/kind feature

Describe the solution you'd like
I would like to use the Cluster API ClusterClass feature to manage our cluster topology easily with Cluster API CloudStack deployments. I checked the default configuration detailed at https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/write-clusterclass.html#basic-clusterclass, but I noticed that we don't have a CRD or API definition for CloudStackClusterTemplate. When I deploy a sample ClusterClass I get the error below:

error: resource mapping not found for name: "capc-sharedkv" namespace: "" from "capc-sharedkv-cc.yaml": no matches for kind "CloudStackClusterTemplate" in version "infrastructure.cluster.x-k8s.io/v1beta2"

Do we plan to include this in an upcoming release?

Anything else you would like to add:
My clusterclass definition is below.

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: capc-sharedkv-cc-1.0
spec:
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: capc-sharedkv-cp # Done
      namespace: default
    machineInfrastructure:
      ref:
        kind: CloudStackMachineTemplate
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        name: capc-sharedkv-cp-v1-26-6 # Done
        namespace: default
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: CloudStackClusterTemplate
      name: capc-sharedkv # TODO
      namespace: default
  workers:
    machineDeployments:
    - class: default-worker
      template:
        bootstrap:
          ref:
            apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
            kind: KubeadmConfigTemplate
            name: capc-sharedkv-md-v1-26-6 # Done
            namespace: default
        infrastructure:
          ref:
            apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
            kind: CloudStackMachineTemplate
            name: capc-sharedkv-md-v1-26-6 # Done
            namespace: default

Environment:

  • Cluster-api-provider-cloudstack version: v0.4.8
  • Kubernetes version: (use kubectl version): 1.26.6
  • OS (e.g. from /etc/os-release): debian11

"post-cluster-api-provider-cloudstack-push-images" job failed for 201 days

/kind bug

What steps did you take and what happened:
[A clear and concise description of what the bug is.]

kubernetes-sigs/cluster-api#8784

The log of latest run
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/post-cluster-api-provider-cloudstack-push-images/1664572298696331264

What did you expect to happen:

The job should succeed.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

This is because the Docker container does not have gcc installed.

Environment:

  • Cluster-api-provider-cloudstack version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

Remove Affinity Group Ref from CloudStackMachine Spec

/kind feature

The CloudStack machine spec in the v1beta2 API has an AffinityGroupRef field. This is a dead field: it is not used anywhere in the code and has no functionality. Affinity groups can be used either via the pro/anti affinity field or by providing group IDs that get attached to the VM at creation time. This dead field should be removed from the API.

Extra ObjectMeta struct included in CloudStackMachineTemplateResource

/kind bug

What steps did you take and what happened:
Created a kind cluster and installed CAPC on it. Attempted to serialize a CloudStackMachineTemplate resource and then apply the result to the cluster. This returned the following error:

"error: error validating \"STDIN\": error validating data: unknown object type \"nil\" in CloudStackMachineTemplate.spec.template.metadata.creationTimestamp; if you choose to ignore these errors, turn validation off with --validate=false\n"

It's basically causing the issue described in kubernetes/kubernetes#67610

What did you expect to happen:
I expected to be able to serialize the Go object to bytes and then apply it with kubectl without an extra ObjectMeta field to worry about. This ObjectMeta does not appear to be used anywhere in CAPC and should be removed. I should not have to set the creationTimestamp field either, since, as the comment says:

	// CreationTimestamp is a timestamp representing the server time when this object was
	// created. It is not guaranteed to be set in happens-before order across separate operations.
	// Clients may not set this value. It is represented in RFC3339 form and is in UTC.

Anything else you would like to add:
This ObjectMeta does not appear to be used anywhere in CAPC and should be removed. However, this would require a CAPC API version bump.
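A small standalone reproduction of the underlying marshalling behaviour (the struct here mimics the shape of CloudStackMachineTemplateResource but is not the real API type): omitempty does not apply to struct-typed fields, so the zero-valued ObjectMeta is always emitted and its CreationTimestamp marshals as null.

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

// templateResource mimics a template resource carrying an extra ObjectMeta.
type templateResource struct {
	ObjectMeta metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec       map[string]string `json:"spec,omitempty"`
}

func main() {
	out, _ := yaml.Marshal(templateResource{Spec: map[string]string{"offering": "Large"}})
	// Prints "metadata:\n  creationTimestamp: null\n..." -- the null timestamp
	// that trips server-side validation as described above.
	fmt.Print(string(out))
}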

Environment:

  • Cluster-api-provider-cloudstack version: main
  • Kubernetes version: (use kubectl version): 1.21
  • OS (e.g. from /etc/os-release):

Multiple workers for controllers

/kind feature

Describe the solution you'd like
[A clear and concise description of what you want to happen.]
The CAPC provider uses controller-runtime for all of its controllers. When each controller is set up with a manager, CAPC should configure controller-runtime to use multiple concurrent workers per controller. This could be customizable via input arguments as well.

Supporting multiple concurrent workers might require code changes to stop multiple workers from accessing the same resource. CAPI has a clean implementation of caching and locking that could be used as inspiration.

This will help when CAPC is managing multiple workload clusters, and especially when the environment has a lot of operations going on, resulting in reconcile tasks being queued up faster than a single worker can serve them.
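controller-runtime already exposes this knob via controller.Options; here is a minimal, generic sketch (using a stand-in reconciler and resource type rather than the CAPC ones) of wiring a concurrency value, which could come from a command-line flag:

package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"
)

// demoReconciler is a stand-in; the interesting part is the WithOptions call.
type demoReconciler struct{}

func (r *demoReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	return ctrl.Result{}, nil
}

// setupWithManager registers the controller with the given number of
// concurrent workers.
func setupWithManager(mgr ctrl.Manager, concurrency int) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&corev1.ConfigMap{}).
		WithOptions(controller.Options{MaxConcurrentReconciles: concurrency}).
		Complete(&demoReconciler{})
}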

Stop injecting Ginkgo into controllers during test runs

Summary

Ginkgo is used for most, possibly all, tests in CAPC. The test setup tries to be smart when exercising controllers by injecting code into the controller sources to catch panics in their separate goroutines. When uncaught, Ginkgo will recognize there was a panic but won't necessarily 'handle' it.

Injecting test code into sources before running tests is a poor practice that we should remove. Panic issues are typically easy to identify even if Ginkgo doesn't handle them explicitly.

v1beta3 API

A few of the features and bugs we are tracking require API changes that are sizable enough and non-additive to warrant a new API version. This issue will track all tasks required to introduce the new API version.

Tasks

Set Owner Reference to endpoint secrets

/kind feature

Describe the solution you'd like
[A clear and concise description of what you want to happen.]

When an endpoint secret is dedicated to a single cluster (by putting a cluster-name label on the secret), I'd like CAPC to set the secret's owner reference to the target cluster if it is not already owned by a different cluster. The reason I want this feature is so that the clusterctl move command moves the secret along with the cluster automatically.
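A minimal sketch of the requested behaviour, assuming the standard CAPI cluster-name label and controller-runtime's controllerutil helper (the function itself is illustrative, not existing CAPC code):

package controllers

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// clusterNameLabel is the standard CAPI label used to dedicate a secret to a
// cluster.
const clusterNameLabel = "cluster.x-k8s.io/cluster-name"

// adoptEndpointSecret sets the Cluster as owner of the endpoint secret so that
// "clusterctl move" carries the secret along, unless it is already owned.
func adoptEndpointSecret(cluster *clusterv1.Cluster, secret *corev1.Secret, scheme *runtime.Scheme) error {
	if secret.Labels[clusterNameLabel] != cluster.Name {
		return nil // not dedicated to this cluster
	}
	if len(secret.OwnerReferences) > 0 {
		return nil // already owned; leave it alone
	}
	return controllerutil.SetOwnerReference(cluster, secret, scheme)
}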

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-cloudstack version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

Failure to build go mod depending on CAPC

/kind bug

What steps did you take and what happened:
When a consumer that depends on CAPC v0.4.8-rc2+ attempts to build, it fails with the errors below

/Users/dribinm/go/go1.18/bin/go build -ldflags "-s -w -buildid='' -extldflags -static" -o bin/manager ./manager
# sigs.k8s.io/cluster-api-provider-cloudstack/api/v1beta2
../../../go/pkg/mod/sigs.k8s.io/[email protected]/api/v1beta2/cloudstackcluster_webhook.go:43:27: cannot use &CloudStackCluster{} (value of type *CloudStackCluster) as type admission.Defaulter in variable declaration:
	*CloudStackCluster does not implement admission.Defaulter (missing DeepCopyObject method)
../../../go/pkg/mod/sigs.k8s.io/[email protected]/api/v1beta2/cloudstackcluster_webhook.go:53:27: cannot use &CloudStackCluster{} (value of type *CloudStackCluster) as type admission.Validator in variable declaration:
	*CloudStackCluster does not implement admission.Validator (missing DeepCopyObject method)
../../../go/pkg/mod/sigs.k8s.io/[email protected]/api/v1beta2/cloudstackmachine_webhook.go:43:27: cannot use &CloudStackMachine{} (value of type *CloudStackMachine) as type admission.Defaulter in variable declaration:
	*CloudStackMachine does not implement admission.Defaulter (missing DeepCopyObject method)
../../../go/pkg/mod/sigs.k8s.io/[email protected]/api/v1beta2/cloudstackmachine_webhook.go:53:27: cannot use &CloudStackMachine{} (value of type *CloudStackMachine) as type admission.Validator in variable declaration:
	*CloudStackMachine does not implement admission.Validator (missing DeepCopyObject method)
../../../go/pkg/mod/sigs.k8s.io/[email protected]/api/v1beta2/cloudstackmachinetemplate_webhook.go:43:27: cannot use &CloudStackMachineTemplate{} (value of type *CloudStackMachineTemplate) as type admission.Defaulter in variable declaration:
	*CloudStackMachineTemplate does not implement admission.Defaulter (missing DeepCopyObject method)
../../../go/pkg/mod/sigs.k8s.io/[email protected]/api/v1beta2/cloudstackmachinetemplate_webhook.go:53:27: cannot use &CloudStackMachineTemplate{} (value of type *CloudStackMachineTemplate) as type admission.Validator in variable declaration:
	*CloudStackMachineTemplate does not implement admission.Validator (missing DeepCopyObject method)
../../../go/pkg/mod/sigs.k8s.io/[email protected]/api/v1beta2/cloudstackaffinitygroup_types.go:75:25: cannot use &CloudStackAffinityGroup{} (value of type *CloudStackAffinityGroup) as type runtime.Object in argument to SchemeBuilder.Register:
	*CloudStackAffinityGroup does not implement runtime.Object (missing DeepCopyObject method)
../../../go/pkg/mod/sigs.k8s.io/[email protected]/api/v1beta2/cloudstackaffinitygroup_types.go:75:53: cannot use &CloudStackAffinityGroupList{} (value of type *CloudStackAffinityGroupList) as type runtime.Object in argument to SchemeBuilder.Register:
	*CloudStackAffinityGroupList does not implement runtime.Object (missing DeepCopyObject method)
../../../go/pkg/mod/sigs.k8s.io/[email protected]/api/v1beta2/cloudstackcluster_types.go:76:25: cannot use &CloudStackCluster{} (value of type *CloudStackCluster) as type runtime.Object in argument to SchemeBuilder.Register:
	*CloudStackCluster does not implement runtime.Object (missing DeepCopyObject method)
../../../go/pkg/mod/sigs.k8s.io/[email protected]/api/v1beta2/cloudstackcluster_types.go:76:47: cannot use &CloudStackClusterList{} (value of type *CloudStackClusterList) as type runtime.Object in argument to SchemeBuilder.Register:
	*CloudStackClusterList does not implement runtime.Object (missing DeepCopyObject method)
../../../go/pkg/mod/sigs.k8s.io/[email protected]/api/v1beta2/cloudstackcluster_types.go:76:47: too many errors
make[2]: *** [eks-a-cluster-controller] Error 1
make[1]: *** [create-cluster-controller-binary-linux-amd64] Error 2
make: *** [create-cluster-controller-binaries-local] Error 2

What did you expect to happen:
I expected the build to succeed. It seems that consumers of the Go module need the generated deepcopy files to be present in the repo in order to take a dependency on it and build successfully.

Anything else you would like to add:
v0.4.8-rc1 succeeds when used as a dependency in another go mod. Comparing v0.4.8-rc1...v0.4.8-rc2, it appears the main difference is the removal of api/v1beta2/zz_generated.deepcopy.go

Environment:

  • Cluster-api-provider-cloudstack version: v0.4.8-rc2+
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release): Mac

Inconsistent JSON tags

In certain structs, the JSON tag casing is not consistent. E.g. CloudStackMachineSpec.AffinityGroupIDs is tagged affinitygroupids, while CloudStackMachineSpec.ZoneID is tagged zoneID.

There are others, and it would be great to standardise the tags on a single convention prior to an initial release so that there won't be any compatibility issues down the line.
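For illustration only (these are not the real CAPC types), the difference between the current mix and one possible convention (lowerCamelCase with ID suffixes preserved):

package api

// inconsistentSpec mixes all-lowercase and camelCase tags, as described above.
type inconsistentSpec struct {
	AffinityGroupIDs []string `json:"affinitygroupids,omitempty"`
	ZoneID           string   `json:"zoneID,omitempty"`
}

// consistentSpec applies a single convention to every field.
type consistentSpec struct {
	AffinityGroupIDs []string `json:"affinityGroupIDs,omitempty"`
	ZoneID           string   `json:"zoneID,omitempty"`
}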

Orphaned CloudStack VMs present in slow CloudStack environments

/kind bug

What steps did you take and what happened:
In slow CloudStack environments, we have observed the following race condition:

  1. User requests scale up of resources
  2. CAPI machine is created
  3. CloudStackMachine is created
  4. CAPC begins creating a VM
  5. Before VM creation finishes, MachineHealthCheck triggers and CAPI machine is removed
  6. Since VM creation did not complete in the CAPC machine controller, Spec.InstanceID is not set on the CloudStackMachine resource, and hence when the CloudStackMachine is being ReconcileDeleted, the call to DestroyVMInstance is skipped. This results in a CloudStack VM and a Kubernetes node existing on the cluster without the corresponding CAPI/CAPC Machine resources to manage them.

What did you expect to happen:
I expected there not to be any rogue VMs and a 1:1 mapping between CloudStackMachines and CloudStack VMs.

Anything else you would like to add:
The resolution is to modify the VM deletion logic: in cloudstackmachine_controller, if Spec.InstanceID is not set, look it up from the VM name by calling listVirtualMachines, then proceed to call DestroyVM in all situations.
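A minimal sketch of that deletion flow, with the CloudStack calls hidden behind a hypothetical interface (the method names are illustrative wrappers, not the cloudstack-go SDK signatures):

package controllers

import (
	"context"
	"fmt"
)

// vmClient abstracts the two CloudStack calls the fix needs.
type vmClient interface {
	FindVMIDByName(ctx context.Context, name string) (string, error) // e.g. backed by listVirtualMachines
	DestroyVM(ctx context.Context, id string) error
}

// reconcileDelete resolves the instance ID from the VM name when
// Spec.InstanceID was never recorded (the race described above), instead of
// skipping destruction, and then always destroys the VM.
func reconcileDelete(ctx context.Context, client vmClient, instanceID, vmName string) error {
	if instanceID == "" {
		id, err := client.FindVMIDByName(ctx, vmName)
		if err != nil {
			return fmt.Errorf("looking up VM %q by name: %w", vmName, err)
		}
		instanceID = id
	}
	return client.DestroyVM(ctx, instanceID)
}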

Environment:

  • Cluster-api-provider-cloudstack version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

Support CloudStack normal user account

/kind feature

Describe the solution you'd like

While this is documented at https://cluster-api-cloudstack.sigs.k8s.io/topics/cloudstack-permissions, I would prefer to be able to use CAPC as a normal user account without any issues for the supported network models, such as shared network, VPC, and isolated network.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-cloudstack version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

clusterctl upgrade plan considering pre-release version 0.4.9rc3

/kind bug

What steps did you take and what happened:

When trying to upgrade the management cluster, running clusterctl upgrade plan throws an error about a missing metadata.yaml in release v0.4.9-rc3.

$ clusterctl upgrade plan
Checking new release availability...
Error: failed to read "metadata.yaml" from the repository for provider "infrastructure-cloudstack": failed to download files from GitHub release v0.4.9-rc3: failed to get file "metadata.yaml" from "v0.4.9-rc3" release

According to the docs, pre-release provider versions should be ignored by clusterctl upgrade plan:

clusterctl upgrade plan does not display pre-release versions by default. For example, if a provider has releases v0.7.0-alpha.0 and v0.6.6 available, the latest release available for upgrade will be v0.6.6.

What did you expect to happen:

v0.4.9-rc3 to be ignored by clusterctl upgrade plan.

[Propose] create release-0.4 branch

/kind feature

Describe the solution you'd like

I suggest creating a new branch for v0.4.X releases, which will:

  • stay on cluster-api v1.2.X
  • accept bug fixes only
  • support older k8s versions
  • All PRs should pass unit tests.
  • e2e tests are skipped due to #242

The branch name can be "release-0.4" (same as cluster-api) or "stable-0.4". It will be used for 0.4.X releases, for example 0.4.10.

The main branch will then be the development branch, which will

  • upgrade to latest dependencies, for example cluster-api v1.4.2
  • support latest k8s versions
  • accept bug fixes (cherry-picked or merged forward from the release-0.4 branch)
  • improvements and new features
  • All PRs should pass unit tests and e2e tests.

Once v0.5.0 is released, create a new branch "release-0.5"

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-cloudstack version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

Enable configuration of timeout for unreachable VM's

Currently there is a hardcoded 5-minute timeout for VMs between the time a VM comes up in CloudStack and when it should join the cluster, as defined in #123. To make this timeout workable for other environments, we should make it configurable via an environment variable plumbed through to this controller.
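A minimal sketch of reading such a setting, assuming an environment variable (the variable name is hypothetical) with the current 5-minute value as the fallback:

package controllers

import (
	"os"
	"time"
)

// unreachableVMTimeout returns the configured timeout, or the current
// 5-minute default when the variable is unset or unparsable.
func unreachableVMTimeout() time.Duration {
	if v := os.Getenv("CAPC_VM_UNREACHABLE_TIMEOUT"); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return 5 * time.Minute
}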

capi-provider-cloudstack-presubmit-unit-test often fails with envtest timeout

We have been observing an issue recently where the prow test capi-provider-cloudstack-presubmit-unit-test nondeterministically fails with the following error

 Unexpected error:
      <*fmt.wrapError | 0xc00045efe0>: {
          msg: "unable to start control plane itself: failed to start the controlplane. retried 5 times: timeout waiting for process kube-apiserver to start successfully (it may have failed to start, or stopped unexpectedly before becoming ready)",
          err: <*fmt.wrapError | 0xc00045efc0>{
              msg: "failed to start the controlplane. retried 5 times: timeout waiting for process kube-apiserver to start successfully (it may have failed to start, or stopped unexpectedly before becoming ready)",
              err: <*errors.errorString | 0xc000316430>{
                  s: "timeout waiting for process kube-apiserver to start successfully (it may have failed to start, or stopped unexpectedly before becoming ready)",
              },
          },
      }
      unable to start control plane itself: failed to start the controlplane. retried 5 times: timeout waiting for process kube-apiserver to start successfully (it may have failed to start, or stopped unexpectedly before becoming ready)

like in https://beta.prow.model-rocket.aws.dev/view/s3/prowpresubmitsdataclusterstack-prowbucket7c73355c-a8z3fg9jrzai/pr-logs/pull/aws_cluster-api-provider-cloudstack/95/capi-provider-cloudstack-presubmit-unit-test/1526601349179904000.

This began after we upgraded to Ginkgo v2. I found an issue describing a similar problem where increasing the timeout may help, as discussed in kubernetes-sigs/kubebuilder#628.
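If raising the timeout is the chosen fix, envtest exposes it directly on the test environment (a sketch; the two-minute value is arbitrary), and the same knob can also be set via the KUBEBUILDER_CONTROLPLANE_START_TIMEOUT environment variable:

package controllers

import (
	"time"

	"sigs.k8s.io/controller-runtime/pkg/envtest"
)

// newTestEnv builds an envtest environment with a longer control-plane start
// timeout than the default.
func newTestEnv() *envtest.Environment {
	return &envtest.Environment{
		ControlPlaneStartTimeout: 2 * time.Minute,
	}
}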

CloudStackMachine could not be deleted when using an invalid serviceOffering

/kind bug

What steps did you take and what happened:
[A clear and concise description of what the bug is.]
E2E testing detected a bug: when an invalid serviceOffering is configured to create a CloudStackMachine, that CloudStackMachine cannot be deleted due to a missing instanceID.

What did you expect to happen:
The CloudStackMachine should be deleted successfully without blocking cluster deletion.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
This is for admin documentation purposes; a PR has been created to address this: #201

Environment:

  • Cluster-api-provider-cloudstack version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

Refactor tests to improve development cycles efficiency

/kind feature

Summary

When developers enter the typical development cycle they can experience challenges. This issue outlines some of those challenges with the hope of creating smaller issues to address the concerns.

Development cycle challenges

The use of Ginkgo in unit tests has created hard to follow test logic

When reading the test source developers can find themselves jumping between stacked BeforeEach statements to identify mocking expectations that aren't necessarily obvious in individual tests.

It would be helpful if unit tests were more self-contained (using few or no BeforeEach statements spread vertically across the test sources) and/or relied solely on native testing capabilities in the language.

The Makefile test depends on generation and linting

Depending on generation and linting in the test recipe inflates the test cycle. When tests are compiled it will become obvious mocks aren't defined correctly (IDEs may catch the issues earlier). Linting is its own recipe and can be removed from test (again, IDEs may highlight linting issues early and the CI will catch them). We may want a recipe that still runs the various generation and linting recipes; however, it's important that developers can iterate quickly, and the existing setup creates a poor experience.

These concerns are orthogonal to the CI. The CI should run checks to ensure generated sources are up to date and lint the code.

Running individual tests can be challenging

Go developers are usually accustomed to go test. However, go test doesn't work on several of our packages, as we've munged integration tests that leverage envtest together with unit tests. Consequently, unit tests, which are ideally invokable with just go test, can't be run independently. Furthermore, the test recipe doesn't offer a filter for running tests in isolation, making it difficult to focus on problem areas.

The code structure makes adding new tests challenging

The code architecture bundles significant logic into algorithms, particularly in the /pkg/cloud package. In some instances, this may be appropriate, but in others it seems we've exported methods for better testability (code smell). The code could benefit from some restructuring to make it more testable and transitively easier to work with.

Machine controller panics with logging bug

/kind bug

The CloudStack machine controller panics when trying to generate an affinity group name. Affinity group names are generated from the object's owner ref. If the object is owned by an entity other than KCP, Etcdadm, or MachineSet, the affinity group name generation fails. This causes the controller to panic in the logging call.

CAPC does not resolve service offering with zone as additional filtering parameter

/kind bug

What steps did you take and what happened:
The following error is generated when multiple CloudStack zones with the same service offering name exist.

E0104 21:43:15.451029 1 controller.go:317] controller/cloudstackmachine "msg"="Reconciler error" "error"="1 error occurred:\n\t* expected 1 Service Offering with name 4 vCPU 8 GiB RAM, but got 2\n\n" "name"="<>" "namespace"="<>" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="CloudStackMachine"

What did you expect to happen:

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-cloudstack version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

CAPI v1.5.0-beta.0 has been released and is ready for testing

CAPI v1.5.0-beta.0 has been released and is ready for testing.
Looking forward to your feedback before CAPI 1.5.0 release, the 25th July 2023!

For quick reference

Following are the planned dates for the upcoming releases

Release Expected Date
v1.5.0-beta.x Tuesday 5th July 2023
release-1.5 branch created (Begin [Code Freeze]) Tuesday 11th July 2023
v1.5.0-rc.0 released Tuesday 11th July 2023
release-1.5 jobs created Tuesday 11th July 2023
v1.5.0-rc.x released Tuesday 18th July 2023
v1.5.0 released Tuesday 25th July 2023

Add Windows Nodes Support

/kind feature

Windows nodes are supported by Kubernetes, and image-builder can even create Kubernetes-enabled Windows images.

Anything else you would like to add:
All of the mainstream Kubernetes cluster managers (Tanzu, OpenShift, and Rancher) have added Windows support, but the simplicity of Cluster API and CloudStack would really help the poor souls having to deploy Windows nodes in their clusters.

Port-forwarding and firewall rules are not getting deleted when the capc cluster is destroyed.

/kind bug

What steps did you take and what happened:

  1. Launch a CAPC cluster in a specific network:

export CLOUDSTACK_NETWORK_NAME=k8snetwork
export CLUSTER_ENDPOINT_IP=10.0.54.100

  2. The CAPC cluster is created in CloudStack.

  3. Delete the CAPC cluster:

kubectl delete -f capc-cluster-spec.yaml

What did you expect to happen:

Expected the port-forwarding and firewall rules on the public IP to be deleted when the cluster is destroyed.


Possibility to add a secondary NIC to machines

/kind feature

Describe the solution you'd like
Currently, it's not possible to have CAPC create machines with multiple network interfaces; the only network-related parameter is the following:

export CLOUDSTACK_NETWORK_NAME=XYZ

Sometimes, however, it's useful to create an infrastructure connected to multiple networks (e.g. a storage network for a custom CSI driver, or for implementing load balancers).

It would be nice to be able to add another network (that already exists inside CloudStack) to all machines created as part of the CAPC cluster.

Anything else you would like to add:
N/A

Environment:

  • Cluster-api-provider-cloudstack version: 0.4.4
  • Kubernetes version: (use kubectl version): v1.23.3
  • OS (e.g. from /etc/os-release): Ubuntu 20.04 LTS
