
cluster-api-provider-kubevirt's Introduction


Kubernetes Cluster API Provider Kubevirt

Kubernetes-native declarative infrastructure for Kubevirt.

What is the Cluster API Provider Kubevirt?

The Cluster API brings declarative Kubernetes-style APIs to cluster creation, configuration and management. The API itself is shared across multiple cloud providers allowing for true Kubevirt hybrid deployments of Kubernetes.

Quick Start

Check out our Cluster API Quick Start to create your first Kubernetes cluster.

Getting Involved and Contributing

Are you interested in contributing to cluster-api-provider-kubevirt? We, the maintainers and the community, would love your suggestions, support and contributions! The maintainers of the project can be contacted anytime to learn about how to get involved.

In the interest of getting more people involved, we have issues marked as good-first-issue. These issues have a smaller scope but are very helpful for getting acquainted with the codebase. For more, see the issue tracker. If you're unsure where to start, feel free to reach out to discuss.

See also: Our own contributor guide and the Kubernetes community page.

We also encourage ALL active community participants to act as if they are maintainers, even if you don't have 'official' written permissions. This is a community effort and we are here to serve the Kubernetes community. If you have an active interest and you want to get involved, you have real power!

Office hours

Other ways to communicate with the maintainers

Please check in with us in the #cluster-api-kubevirt Slack channel. You can also join our mailing list.

GitHub Issues

Bugs

If you think you have found a bug, please follow the instructions below.

  • Please spend a small amount of time doing due diligence on the issue tracker; your issue might be a duplicate.
  • Get the logs from the custom controllers and please paste them in the issue.
  • Open a bug report.
  • Remember that users might search for the issue in the future, so please give it a meaningful title to help others.
  • Feel free to reach out to the community on slack.

Tracking new features

We also use the issue tracker to track features. If you have a feature idea that could make Cluster API Provider Kubevirt even more awesome, then follow these steps.

  • Open a feature request
  • Remember users might be searching for the issue in the future, so please make sure to give it a meaningful title to help others.
  • Clearly define the use case with concrete examples. Example: type this and cluster-api-provider-kubevirt does that.
  • Some of our larger features will require some design. If you would like to include a technical design for your feature, please go ahead.
  • After the new feature is well understood and the design is agreed upon, we can start coding the feature. We would love for you to code it, so please open a WIP (work in progress) PR, and happy coding!

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

cluster-api-provider-kubevirt's People

Contributors

0xfelix, 1abhisheksarkar, aamoyel, aaseem, agradouski, barthv, brianmcarey, bryan-cox, bzub, cchengleo, changjjjjjjj, davidvossel, dgovinndaraju12, feitnomore, fengye87, hiromu-a5a, jmhbnz, k8s-ci-robot, kewang-equinix, mrbobbytables, nirarg, nunnatsa, orenc1, pjaton, pranshusrivastava, prometherion, qinqon, rgolangh, rmohr, wangxin311

cluster-api-provider-kubevirt's Issues

Update README document

We should update the README with instructions on how to get started with this repo, instead of the default template doc.

Workload cluster VM is not able to resolve DNS entry

/kind bug

We observed that the workload cluster is not able to pull Docker images; after looking into the issue, it seems the VM is not able to resolve DNS.

We're using the following qcow2 image for the KubeVirt VM, and the Kubernetes version is 1.21.0:

NODE_VM_IMAGE_TEMPLATE=quay.io/kubevirtci/fedora-kubeadm:35

[root@kvcluster01-control-plane-8bgh9 capk]# ping www.google.com
ping: www.google.com: Name or service not known
[root@kvcluster01-control-plane-8bgh9 capk]# cat /etc/resolv.conf
# Generated by NetworkManager
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10

I ran tcpdump on the cni0 interface of the infra Kubernetes cluster node, and I didn't see any DNS packets received there.

If I add the Google DNS server 8.8.8.8 to the /etc/resolv.conf file, then I am able to resolve DNS entries.

[root@kvcluster01-control-plane-8bgh9 capk]# echo "nameserver 8.8.8.8" >> /etc/resolv.conf
[root@kvcluster01-control-plane-8bgh9 capk]# ping www.google.com
PING www.google.com (142.250.65.228) 56(84) bytes of data.
64 bytes from lga25s73-in-f4.1e100.net (142.250.65.228): icmp_seq=1 ttl=114 time=2.30 ms
64 bytes from lga25s73-in-f4.1e100.net (142.250.65.228): icmp_seq=2 ttl=114 time=2.36 ms
^C
--- www.google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 2.297/2.330/2.364/0.033 ms

I also tried the same image quay.io/kubevirtci/fedora-kubeadm:35 with native KubeVirt, and everything works without adding 8.8.8.8.

[root@fedora capk]# cat /etc/resolv.conf
# Generated by NetworkManager
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
[root@fedora capk]# ping www.google.com
PING www.google.com (142.251.40.132) 56(84) bytes of data.
64 bytes from lga25s80-in-f4.1e100.net (142.251.40.132): icmp_seq=1 ttl=113 time=2.45 ms
64 bytes from lga25s80-in-f4.1e100.net (142.251.40.132): icmp_seq=2 ttl=112 time=2.41 ms

Here is my KubeVirt VM YAML, FYI:

root@ny5-infra04:~/kubevirt# cat vm.yaml
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstance
metadata:
  name: fedora
  labels:
    vmi: http
spec:
  terminationGracePeriodSeconds: 30
  domain:
    resources:
      requests:
        memory: 1024M
    devices:
      disks:
      - name: containerdisk
        disk:
          bus: virtio
      - name: emptydisk
        disk:
          bus: virtio
      - disk:
          bus: virtio
        name: cloudinitdisk
  volumes:
  - name: containerdisk
    containerDisk:
      image:  quay.io/kubevirtci/fedora-kubeadm:35
  - name: emptydisk
    emptyDisk:
      capacity: "2Gi"
  - name: cloudinitdisk
    cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: capk
        chpasswd: { expire: False }

I am not sure what's different between the way KubeVirt creates the VM and the way Cluster API creates it, but it's surprising to me that the VM works with native KubeVirt but not with Cluster API KubeVirt.

Thanks
Xin

Add clusterkubevirtadm utility to create ServiceAccount for infrastructure resources creation

Create a new Cobra CLI utility.
Name: clusterkubevirtadm
It will create a ServiceAccount in the infrastructure cluster to be used for all resource creation.
It will also create a Role for this ServiceAccount, including the minimal permissions required for resource creation.
It will output a credentials file (kubeconfig) for using the new service account.

This utility will allow the infrastructure admin to create dedicated credentials for each tenant cluster creation.
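As a rough illustration of the shape such a utility could take, here is a minimal sketch of a Cobra entry point; the command name, flags, and behavior shown are assumptions for discussion, not a final interface.

package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

func main() {
	var namespace string

	// Hypothetical subcommand: create the ServiceAccount, Role and RoleBinding
	// in the infrastructure cluster, then print a kubeconfig for the new account.
	credentials := &cobra.Command{
		Use:   "credentials",
		Short: "Create a ServiceAccount, Role and kubeconfig for tenant cluster creation",
		RunE: func(cmd *cobra.Command, args []string) error {
			fmt.Printf("would create credentials in namespace %q and print a kubeconfig\n", namespace)
			return nil
		},
	}
	credentials.Flags().StringVarP(&namespace, "namespace", "n", "default", "namespace in the infrastructure cluster")

	root := &cobra.Command{Use: "clusterkubevirtadm"}
	root.AddCommand(credentials)

	if err := root.Execute(); err != nil {
		os.Exit(1)
	}
}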

Add the weekly meeting schedule in the `office hours` section

Currently the README.md doesn't mention the weekly meetings for Kubevirt as we can see here: https://github.com/kubernetes/community/blob/master/sig-cluster-lifecycle/README.md#cluster-api-provider-kubevirt
So, it would be better if we could add the weekly meetings in the office-hours section.

Environment:

  • Cluster-api version:
  • Cluster-api-provider-kubevirt version:
  • Kubernetes version: (use kubectl version):
  • KubeVirt version:
  • OS (e.g. from /etc/os-release):

/kind documentation
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/labels?q=area for the list of labels]

📖 Make changes to the README file

Currently the README file just follows a basic K8s template. It would be better to update it with a one-line description of CAP-Kubevirt, some additional info related to contribution guidelines, and a few other things.
Similar to the one in https://github.com/kubernetes-sigs/cluster-api-provider-gcp
Environment:

  • Cluster-api version:
  • Cluster-api-provider-kubevirt version:
  • Kubernetes version: (use kubectl version):
  • KubeVirt version:
  • OS (e.g. from /etc/os-release):

/kind documentation
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/labels?q=area for the list of labels]

[e2e tests] Cover creating tenant clusters in external infra clusters in the integration tests

What steps did you take and what happened:
[A clear and concise description on how to REPRODUCE the bug.]

We are not testing the creation of capk tenant clusters in external clusters.

What did you expect to happen:

Let's add an integration test which creates a kubeconfig with a service account token and corresponding RBAC rules and then uses this kubeconfig in the capk controllers to create the clusters.
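As a sketch of the kubeconfig-building step such a test would need (assuming the test already has the infra cluster's API server URL, CA bundle, and a ServiceAccount token in hand; names below are placeholders):

package e2e

import (
	"k8s.io/client-go/tools/clientcmd"
	clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
)

// kubeconfigForServiceAccount renders a kubeconfig that authenticates with the
// given ServiceAccount token, suitable for handing to the capk controllers as
// the external infra cluster credentials.
func kubeconfigForServiceAccount(server, namespace, token string, caData []byte) ([]byte, error) {
	cfg := clientcmdapi.NewConfig()
	cfg.Clusters["infra"] = &clientcmdapi.Cluster{
		Server:                   server,
		CertificateAuthorityData: caData,
	}
	cfg.AuthInfos["capk-sa"] = &clientcmdapi.AuthInfo{Token: token}
	cfg.Contexts["capk"] = &clientcmdapi.Context{
		Cluster:   "infra",
		AuthInfo:  "capk-sa",
		Namespace: namespace,
	}
	cfg.CurrentContext = "capk"
	return clientcmd.Write(*cfg)
}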

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api version:
  • Cluster-api-provider-kubevirt version:
  • Kubernetes version: (use kubectl version):
  • KubeVirt version:
  • OS (e.g. from /etc/os-release):

/kind enhancement
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/labels?q=area for the list of labels]

ssh validation code is specific for cloud init

Hi,

As part of KubevirtMachine Reconcile, there is ssh validation for:

  1. VM is booted
  2. Bootstrap is completed

This validation assumes the following:

  1. The VM supports the cloud-init format (for example, Fedora)
  2. When the bootstrap is done, the file /run/cluster-api/bootstrap-success.complete is created on the VM file system

Those assumptions are not true for Openshift cluster scenario:

  1. The Nodes run RHCOS
  2. RHCOS uses Ignition, not cloud-init
  3. Users and SSH keys are added in a different format in RHCOS
  4. The /run/cluster-api/bootstrap-success.complete file isn't created when the bootstrap is done

I can think of two possible solutions:

  1. Change the validation to use a more generic mechanism, not SSH (does KubeVirt provide something like this?)
  2. Add another parameter to KubevirtMachine that indicates which OS is used, and have different validation for each OS

CAPK addition of `capk` user to Cloud-Init user-data conflicts with user(s) specified in Kubeadm configs

What steps did you take and what happened:

Kubeadm's KubeadmControlPlane.spec.kubeadmConfigSpec and KubeadmConfigTemplate.spec.template.spec expose the capability for the end-user to add custom users to pass to the Cloud-Init user data that will be generated for control-plane and worker nodes, respectively.

For example:

  1. Create a cluster, specifying a user to add to control-plane nodes in the KubeadmControlPlane spec:

    kind: KubeadmControlPlane
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    metadata:
      name: pja-control-plane
    spec:
      (...)
      kubeadmConfigSpec:
        (...)
        users:
        - name: pja
          lockPassword: false
          sshAuthorizedKeys:
            - XXXXXXXXXXXXX
  2. Introspect the user-data generated for one of the control-plane nodes and notice the invalid duplicate users fields:

    $ kubectl get secrets pja-control-plane-<hash>-userdata -o jsonpath='{.data.userdata}' | base64 -d
    (...)
    runcmd:
      - 'kubeadm init --config /run/kubeadm/kubeadm.yaml  && echo success > /run/cluster-api/bootstrap-success.complete'
    users:
      - name: pja
        lock_passwd: false
        ssh_authorized_keys:
          - XXXXXXXXXXXXX
    users:
      - name: capk
        gecos: CAPK User
        sudo: ALL=(ALL) NOPASSWD:ALL
        groups: users, admin
        ssh_authorized_keys:
          - XXXXXXXXXXXXX

Looking at the source code, this is due to string concatenation in kubevirtmachine_controller.go that doesn't introspect the user-data YAML for a pre-existing users field.
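A minimal sketch of what merging into a single users list could look like, round-tripping the YAML instead of concatenating strings; this uses sigs.k8s.io/yaml for illustration and is not the controller's actual code, and a real fix would also need to preserve the leading #cloud-config header, which a plain re-marshal drops.

package main

import (
	"fmt"

	"sigs.k8s.io/yaml"
)

// mergeCapkUser appends the capk user to an existing `users` list (creating
// the list if absent) so the rendered user-data has a single `users` key.
func mergeCapkUser(userData []byte, capkUser map[string]interface{}) ([]byte, error) {
	doc := map[string]interface{}{}
	if err := yaml.Unmarshal(userData, &doc); err != nil {
		return nil, err
	}

	users, _ := doc["users"].([]interface{})
	doc["users"] = append(users, capkUser)

	return yaml.Marshal(doc)
}

func main() {
	in := []byte("users:\n- name: pja\n  lock_passwd: false\n")
	out, err := mergeCapkUser(in, map[string]interface{}{"name": "capk", "gecos": "CAPK User"})
	fmt.Println(string(out), err)
}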

What did you expect to happen:

The two users should be under a single users field:

(...)
runcmd:
  - 'kubeadm init --config /run/kubeadm/kubeadm.yaml  && echo success > /run/cluster-api/bootstrap-success.complete'
users:
  - name: pja
    lock_passwd: false
    ssh_authorized_keys:
      - XXXXXXXXXXXXX
  - name: capk
    gecos: CAPK User
    sudo: ALL=(ALL) NOPASSWD:ALL
    groups: users, admin
    ssh_authorized_keys:
      - XXXXXXXXXXXXX

Anything else you would like to add:

Note that, at least in my environment, cloud-init appears to be lenient and creates both users. However, the YAML spec expects unique keys, so this should be addressed.

The fix should take into account that this can result in two users with the same name. In my opinion, it would be fine to reserve the capk user for CAPK use and thus have the entry added by CAPK override a user-defined capk user, but this might need to be discussed.

Environment:

  • Cluster-api version: v1beta1
  • Cluster-api-provider-kubevirt version: v1alpha1
  • Kubernetes version: (use kubectl version): v1.20.15
  • KubeVirt version: v1
  • OS (e.g. from /etc/os-release): CentOS 7

/kind bug

create ginkgo/gomega e2e test suite

We need a ginkgo/gomega test suite to replace the hack/functest.sh e2e test.

The functest.sh script was only meant as a stopgap that allowed us to get some quick e2e test feedback for our CI lanes. We need to port that bash script to Go code. This will give us the ability to express more advanced e2e tests in the future.
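A minimal skeleton of such a suite, assuming ginkgo v2 and gomega; the suite and spec names are placeholders.

package e2e_test

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestE2E(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "capk e2e suite")
}

var _ = Describe("tenant cluster lifecycle", func() {
	It("creates a workload cluster and waits for it to become ready", func() {
		// The ported functest.sh steps would live here: apply the generated
		// cluster manifests, then poll until the control plane is reachable.
		Expect(true).To(BeTrue())
	})
})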

Two deprecated, insecure functions are in use in the ssh package

The x509.IsEncryptedPEMBlock and x509.DecryptPEMBlock functions used in pkg/ssh/utils.go have been deprecated since Go 1.16 because they shouldn't be used: legacy PEM encryption as specified in RFC 1423 is insecure by design. Since it does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext.
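A minimal sketch of parsing the private key without the deprecated helpers, using golang.org/x/crypto/ssh; the exact shape of the code in pkg/ssh/utils.go may differ, this only illustrates the direction.

package sshutil

import (
	"errors"
	"fmt"

	"golang.org/x/crypto/ssh"
)

// signerFromPEM parses an SSH private key, supporting passphrase-protected
// keys via the modern OpenSSH format instead of the insecure RFC 1423 PEM
// decryption path.
func signerFromPEM(keyPEM, passphrase []byte) (ssh.Signer, error) {
	signer, err := ssh.ParsePrivateKey(keyPEM)
	if err == nil {
		return signer, nil
	}

	var missing *ssh.PassphraseMissingError
	if errors.As(err, &missing) && len(passphrase) > 0 {
		return ssh.ParsePrivateKeyWithPassphrase(keyPEM, passphrase)
	}
	return nil, fmt.Errorf("parsing private key: %w", err)
}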

/kind bug

Documentation - goals, use cases, design decisions

For the sake of potential users and contributors, it would be good to have a statement on the project goals, prioritized and dismissed use cases as well as major design decisions that were made by the core team.

It will make it much easier for those who consider contributing to the project to decide whether it's going to match their needs.

clusterctl generate depends on `cluster-template.yaml` being published in the releases

What steps did you take and what happened:

❯ clusterctl generate cluster test -v=5
Using configuration File="/home/jbpratt/.cluster-api/clusterctl.yaml"
Fetching File="cluster-template.yaml" Provider="kubevirt" Type="InfrastructureProvider" Version="v0.1.1"
Error: failed to read "cluster-template.yaml" from provider's repository "infrastructure-kubevirt": failed to download files from GitHub release v0.1.1: failed to get file "cluster-template.yaml" from "v0.1.1" release
sigs.k8s.io/cluster-api/cmd/clusterctl/client/repository.(*templateClient).Get
	sigs.k8s.io/cluster-api/cmd/clusterctl/client/repository/template_client.go:94
sigs.k8s.io/cluster-api/cmd/clusterctl/client.(*clusterctlClient).getTemplateFromRepository
	sigs.k8s.io/cluster-api/cmd/clusterctl/client/config.go:339
sigs.k8s.io/cluster-api/cmd/clusterctl/client.(*clusterctlClient).GetClusterTemplate
	sigs.k8s.io/cluster-api/cmd/clusterctl/client/config.go:256
sigs.k8s.io/cluster-api/cmd/clusterctl/cmd.runGenerateClusterTemplate
	sigs.k8s.io/cluster-api/cmd/clusterctl/cmd/generate_cluster.go:180
sigs.k8s.io/cluster-api/cmd/clusterctl/cmd.glob..func6
	sigs.k8s.io/cluster-api/cmd/clusterctl/cmd/generate_cluster.go:93
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/[email protected]/command.go:872
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/[email protected]/command.go:990
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/[email protected]/command.go:918
sigs.k8s.io/cluster-api/cmd/clusterctl/cmd.Execute
	sigs.k8s.io/cluster-api/cmd/clusterctl/cmd/root.go:99
main.main
	sigs.k8s.io/cluster-api/cmd/clusterctl/main.go:26
runtime.main
	runtime/proc.go:250
runtime.goexit
	runtime/asm_amd64.s:1571

What did you expect to happen:

Success in finding the template and generating the cluster from it.

Anything else you would like to add:

I assume all we need to do is add the templates located here into the release done here, similar to the AWS provider

/kind bug

Tenant cluster loadbalancer service is hardcoded to ClusterIP type

With ClusterIP type a tenant cluster will only be accessible within the same cluster, and cannot be exposed to external workloads.

Example load balancer service:

kvcluster-lb      ClusterIP   10.108.169.113   <none>        6443/TCP    4m56s

Currently, there's no way to configure the loadbalancer service, as the spec is hardcoded.
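A minimal sketch of what a configurable load-balancer Service could look like, assuming a hypothetical serviceType field plumbed through the KubevirtCluster spec; this is not the controller's current code, and the selector shown (which also pins the cluster name) is illustrative.

package loadbalancer

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// lbService builds the tenant API server Service; serviceType would come from
// a (hypothetical) KubevirtCluster field and defaults to today's behaviour.
func lbService(clusterName, namespace string, serviceType corev1.ServiceType) *corev1.Service {
	if serviceType == "" {
		serviceType = corev1.ServiceTypeClusterIP
	}
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      clusterName + "-lb",
			Namespace: namespace,
			Labels:    map[string]string{"cluster.x-k8s.io/cluster-name": clusterName},
		},
		Spec: corev1.ServiceSpec{
			Type: serviceType,
			Ports: []corev1.ServicePort{{
				Port:       6443,
				TargetPort: intstr.FromInt(6443),
			}},
			Selector: map[string]string{
				"cluster.x-k8s.io/cluster-name": clusterName,
				"cluster.x-k8s.io/role":         "control-plane",
			},
		},
	}
}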

Publish arm64 images for the capk-manager

When downloading the current release and installing it (https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/releases/download/v0.1.0-rc.0/infrastructure-components.yaml), the manager container goes into CrashLoopBackOff with exec /manager: exec format error on an ARM64 cluster. It seems that only the amd64 image is available on Quay (quay.io/capk/capk-manager-amd64). It looks like there is already automation in place for this, but I can't find the resulting artifacts, and the manifest is set to pull the amd64 image. kubevirt/kubevirt#3558

ALL_ARCH = amd64 arm arm64

## --------------------------------------
## Docker — All ARCH
## --------------------------------------

  Normal   Scheduled    57s                default-scheduler  Successfully assigned capk-system/capk-controller-manager-8b94f79db-gf94l to majora
  Normal   Pulled       52s                kubelet            Successfully pulled image "quay.io/capk/capk-manager-amd64:v0.1.0-rc.0" in 3.585999603s
  Normal   Pulling      51s                kubelet            Pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0"
  Normal   Pulled       48s                kubelet            Successfully pulled image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" in 3.028312891s
  Normal   Created      48s                kubelet            Created container kube-rbac-proxy
  Normal   Started      48s                kubelet            Started container kube-rbac-proxy
  Normal   Pulled       47s                kubelet            Successfully pulled image "quay.io/capk/capk-manager-amd64:v0.1.0-rc.0" in 447.556578ms
  Normal   Pulling      26s (x3 over 55s)  kubelet            Pulling image "quay.io/capk/capk-manager-amd64:v0.1.0-rc.0"
  Normal   Created      26s (x3 over 51s)  kubelet            Created container manager
  Normal   Pulled       26s                kubelet            Successfully pulled image "quay.io/capk/capk-manager-amd64:v0.1.0-rc.0" in 448.153154ms
  Normal   Started      25s (x3 over 51s)  kubelet            Started container manager
  Warning  BackOff      17s (x7 over 46s)  kubelet            Back-off restarting failed container

Honor namespace for Kubevirt VMs based on VM spec

Right now, the namespace for the external KubeVirt VMs is inferred by introspecting the kubeconfig data for the external config.

infracluster.go

	namespace := "default"
 	namespaceBytes, ok := infraKubeconfigSecret.Data["namespace"]
 	if ok {
 		namespace = string(namespaceBytes)
 		namespace = strings.TrimSpace(namespace)
 	}

The issue we're trying to solve here is how to express the location of the underlying KubeVirt VM from the KubeVirtMachine's namespace.

One way to do this is via #11, namely by using ObjectMetadata of VMITemplate.

We will need to rework the logic of determining the Kubevirt VM namespace based on VMI's ObjectMetadata once #11 has landed.

Typo in README.md

The README.md file has a simple typo at line 48. A go-through of the whole .md file is advised in order to correct any other typos.

Base images for cluster nodes

From CAPK community meeting discussion on Feb 1 (https://docs.google.com/document/d/1ZAnRLCKOVbDqrsrYis2OR0aZIAlqp576gCJVCkMNiHM), we need to find (or build and publish) a cluster node image, in order to truly unlock CAPK for developers. Also, it is a blocker for CAPK quick start guide.

Discussion points:

Add KubevirtMachine and VirtualMachine metadata to KubevirtMachineTemplate spec

Hi team,

Thank you for all the work on this new provider. This is great to see it land.

Following on #11, the spec of the KubevirtMachineTemplate used to specify KubeVirt resources seems quite deep and, with the understanding that this would be a backward-incompatible change, I would like to propose a flatter structure.

Currently, the structure is as follow:

KubevirtMachineTemplate
├── ...
└── spec
    └── template
        └── spec
            ├── bootstrapped
            ├── providerID
            └── vmSpec  (= VirtualMachineInstance.spec)
                ├── accessCredentials
                ├── affinity
                └── ...

Assuming #11 is implemented and the VirtualMachine.spec is used instead of the VirtualMachineInstance.spec, can we update the structure to something like the following:

KubevirtMachineTemplate
├── ...
└── spec
    ├── bootstrapped
    ├── providerID
    └── template
        ├── metadata
        └── spec  (= VirtualMachine.spec)
            ├── dataVolumeTemplates
            ├── runStrategy
            ├── running
            └── template
                ├── metadata
                └── spec

Here are my thoughts on which I based this proposal:

  • Because the KubevirtMachineTemplate name already describes clearly that this is a template, the first spec.template path might be redundant, even more so as there are no other fields specified at these levels.
  • bootstrapped and providerID are, as far as I understand, properties of the machine resource, so having them directly under the spec of the KubevirtMachineTemplate seems appropriate.
  • The machine is provided by a KubeVirt VirtualMachine, so we need its template, which feels appropriate as a spec.template field.
  • As one might want to specify labels and annotations for the VirtualMachine, spec.template should have a metadata field.
  • The virtualMachine spec can then be under spec.template.spec

Any thoughts, opinions or concerns on adapting this structure in such a way?

Create a SECURITY_CONTACTS file.

As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.

The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".

Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)

Thanks so much, let me know if you have any questions.

(This issue was generated from a tool, apologies for any weirdness.)

[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md

capk manager causes a segmentation fault when cluster is created with the kubevirtci tool

What steps did you take and what happened:
The cluster was created and the tests run using these steps:
https://docs.google.com/document/d/1kPLk1KfGvAX4vUV5DOpvekPGr4wQF6Iait4M-YBq1tM/edit

[A clear and concise description on how to REPRODUCE the bug.]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x142579e]

goroutine 471 [running]:
sigs.k8s.io/cluster-api-provider-kubevirt/controllers.(*KubevirtMachineReconciler).deleteKubevirtBootstrapSecret(0x19c2068, 0xc000359e50, {0x19d2b90, 0xc0000914a0}, {0xc00042ada0, 0x1e})
/workspace/controllers/kubevirtmachine_controller.go:579 +0x5e
sigs.k8s.io/cluster-api-provider-kubevirt/controllers.(*KubevirtMachineReconciler).reconcileDelete(0xc0006d6b80, 0xc000359e50)
/workspace/controllers/kubevirtmachine_controller.go:375 +0x185
sigs.k8s.io/cluster-api-provider-kubevirt/controllers.(*KubevirtMachineReconciler).Reconcile(0xc0006d6b80, {0x19a57f8, 0xc000a2c690}, {{{0xc000acb6c0, 0x16b4be0}, {0xc000047620, 0x30}}})
/workspace/controllers/kubevirtmachine_controller.go:161 +0xb1c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc0000b38c0, {0x19a57f8, 0xc000a2c660}, {{{0xc000acb6c0, 0x16b4be0}, {0xc000047620, 0x413834}}})

What did you expect to happen:

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:
Local

  • Cluster-api version:
  • Cluster-api-provider-kubevirt version:
  • Kubernetes version: (use kubectl version):
  • KubeVirt version:
  • OS (e.g. from /etc/os-release):

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/labels?q=area for the list of labels]

Add quickstart example in README

I didn't see any example of how to use the KubeVirt Cluster API provider. Can we have a step-by-step deployment example/guide, so people who are new to it can get started from there? Thanks.

VM deletion looks for resource in wrong namespace

What steps did you take and what happened:

When deleting a cluster that was spawned in an external infrastructure cluster, the pkg/kubevirt/machine.go Machine#Delete method looks for the VirtualMachine to delete in the correct cluster, but in a namespace with the same name as the KubeVirtMachine resource's namespace in the management cluster:

namespacedName := types.NamespacedName{Namespace: m.machineContext.KubevirtMachine.Namespace, Name: ...

Unless the same namespace name is used, the VM fails to be found and is not deleted as expected.

What did you expect to happen:

The logic should look in the correct namespace for the VM, which is already provided when creating the struct in the NewMachine function.

Anything else you would like to add:

The pkg/kubevirt/machine.go Machine#Delete method also currently silences any other error that might prevent the VM from being found. In this case, the vm struct will most likely not be properly populated, and the following call to delete it will fail with an unclear error. Instead, we should fail directly when the Get of the VM fails with an error other than not-found.
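A minimal sketch of the suggested shape of the fix, using an illustrative Machine struct (the real field names in pkg/kubevirt/machine.go may differ); the point is to use the namespace handed to NewMachine and to stop swallowing non not-found errors.

package kubevirt

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	kubevirtv1 "kubevirt.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Machine is an illustrative stand-in for the real struct; namespace is the
// infra-cluster namespace captured in NewMachine, not the KubevirtMachine's.
type Machine struct {
	client    client.Client
	name      string
	namespace string
}

func (m *Machine) Delete(ctx context.Context) error {
	vm := &kubevirtv1.VirtualMachine{}
	key := types.NamespacedName{Namespace: m.namespace, Name: m.name}
	if err := m.client.Get(ctx, key, vm); err != nil {
		if apierrors.IsNotFound(err) {
			return nil // already gone, nothing to delete
		}
		return fmt.Errorf("fetching VM before deletion: %w", err) // surface the real error
	}
	return m.client.Delete(ctx, vm)
}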

Environment:

  • Cluster-api version: v1beta1
  • Cluster-api-provider-kubevirt version: v1alpha1
  • Kubernetes version: (use kubectl version): v1.20.15
  • KubeVirt version: v1
  • OS (e.g. from /etc/os-release): CentOS 7

/kind bug

CAPK depends on KubeVirt running on the management cluster

Today, the CAPK provider will fail to initialize if the management cluster does not have the KubeVirt CRDs deployed.

Here's an example (note CAPK provider is failing to initialize):

NAMESPACE                           NAME                                                             READY   STATUS             RESTARTS   AGE
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-5c7c767585-mxfjv       1/1     Running            0          34m
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-76c769d87c-kjbdp   1/1     Running            0          34m
capi-system                         capi-controller-manager-66fc7b7785-lznb9                         1/1     Running            0          34m
capk-system                         capk-controller-manager-59d88f64fb-fk46z                         1/2     CrashLoopBackOff   7          34m

CAPK manager log shows the following error:

E0222 22:34:33.654495       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachine\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachine"}

Full log from CAPK manager:

I0222 22:34:10.170500       1 request.go:665] Waited for 1.041200068s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/coordination.k8s.io/v1?timeout=32s
I0222 22:34:10.673334       1 logr.go:249] controller-runtime/metrics "msg"="Metrics server is starting to listen"  "addr"="127.0.0.1:8080"
I0222 22:34:10.674078       1 logr.go:249] controller-runtime/builder "msg"="skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called"  "GVK"={"Group":"infrastructure.cluster.x-k8s.io","Version":"v1alpha1","Kind":"KubevirtMachineTemplate"}
I0222 22:34:10.674140       1 logr.go:249] controller-runtime/builder "msg"="Registering a validating webhook"  "GVK"={"Group":"infrastructure.cluster.x-k8s.io","Version":"v1alpha1","Kind":"KubevirtMachineTemplate"} "path"="/validate-infrastructure-cluster-x-k8s-io-v1alpha1-kubevirtmachinetemplate"
I0222 22:34:10.674259       1 server.go:146] controller-runtime/webhook "msg"="Registering webhook" "path"="/validate-infrastructure-cluster-x-k8s-io-v1alpha1-kubevirtmachinetemplate" 
I0222 22:34:10.674373       1 logr.go:249] setup "msg"="starting manager"  
I0222 22:34:10.674461       1 server.go:214] controller-runtime/webhook/webhooks "msg"="Starting webhook server"  
I0222 22:34:10.674583       1 internal.go:362]  "msg"="Starting server" "addr"={"IP":"127.0.0.1","Port":8080,"Zone":""} "kind"="metrics" "path"="/metrics" 
I0222 22:34:10.674611       1 leaderelection.go:248] attempting to acquire leader lease capk-system/controller-leader-election-capk...
I0222 22:34:10.674700       1 internal.go:362]  "msg"="Starting server" "addr"={"IP":"::","Port":9440,"Zone":""} "kind"="health probe" 
I0222 22:34:10.674734       1 logr.go:249] controller-runtime/certwatcher "msg"="Updated current TLS certificate"  
I0222 22:34:10.674863       1 logr.go:249] controller-runtime/webhook "msg"="Serving webhook server"  "host"="" "port"=9443
I0222 22:34:10.674929       1 logr.go:249] controller-runtime/certwatcher "msg"="Starting certificate watcher"  
I0222 22:34:29.050812       1 leaderelection.go:258] successfully acquired lease capk-system/controller-leader-election-capk
I0222 22:34:29.051055       1 controller.go:178] controller/kubevirtmachine "msg"="Starting EventSource" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtMachine" "source"="kind source: *v1alpha1.KubevirtMachine"
I0222 22:34:29.051056       1 controller.go:178] controller/kubevirtcluster "msg"="Starting EventSource" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster" "source"="kind source: *v1alpha1.KubevirtCluster"
I0222 22:34:29.051080       1 controller.go:178] controller/kubevirtmachine "msg"="Starting EventSource" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtMachine" "source"="kind source: *v1beta1.Machine"
I0222 22:34:29.051108       1 controller.go:178] controller/kubevirtcluster "msg"="Starting EventSource" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster" "source"="kind source: *v1beta1.Cluster"
I0222 22:34:29.051137       1 controller.go:186] controller/kubevirtcluster "msg"="Starting Controller" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster" 
I0222 22:34:29.051117       1 controller.go:178] controller/kubevirtmachine "msg"="Starting EventSource" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtMachine" "source"="kind source: *v1alpha1.KubevirtCluster"
I0222 22:34:29.051195       1 controller.go:178] controller/kubevirtmachine "msg"="Starting EventSource" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtMachine" "source"="kind source: *v1.VirtualMachineInstance"
I0222 22:34:29.051222       1 controller.go:178] controller/kubevirtmachine "msg"="Starting EventSource" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtMachine" "source"="kind source: *v1.VirtualMachine"
I0222 22:34:29.051245       1 controller.go:178] controller/kubevirtmachine "msg"="Starting EventSource" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtMachine" "source"="kind source: *v1beta1.Cluster"
I0222 22:34:29.051279       1 controller.go:186] controller/kubevirtmachine "msg"="Starting Controller" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtMachine" 
I0222 22:34:30.102053       1 request.go:665] Waited for 1.04679704s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/cluster.x-k8s.io/v1alpha4?timeout=32s
E0222 22:34:30.605112       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachineInstance\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachineInstance"}
I0222 22:34:30.605140       1 controller.go:220] controller/kubevirtcluster "msg"="Starting workers" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster" "worker count"=1
E0222 22:34:33.654495       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachine\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachine"}
I0222 22:34:41.656424       1 request.go:665] Waited for 1.046721061s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/storage.k8s.io/v1?timeout=32s
E0222 22:34:42.158825       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachineInstance\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachineInstance"}
E0222 22:34:45.208681       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachine\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachine"}
I0222 22:34:51.706753       1 request.go:665] Waited for 1.09658386s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/clusterctl.cluster.x-k8s.io/v1alpha3?timeout=32s
E0222 22:34:52.158035       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachineInstance\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachineInstance"}
E0222 22:34:55.208798       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachine\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachine"}
I0222 22:35:01.756577       1 request.go:665] Waited for 1.147322771s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/certificates.k8s.io/v1?timeout=32s
E0222 22:35:02.158878       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachineInstance\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachineInstance"}
E0222 22:35:05.208588       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachine\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachine"}
I0222 22:35:11.757088       1 request.go:665] Waited for 1.1477616s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/networking.k8s.io/v1beta1?timeout=32s
E0222 22:35:12.158267       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachineInstance\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachineInstance"}
E0222 22:35:15.208665       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachine\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachine"}
I0222 22:35:21.806808       1 request.go:665] Waited for 1.197688109s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/storage.k8s.io/v1beta1?timeout=32s
E0222 22:35:22.159396       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachineInstance\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachineInstance"}
E0222 22:35:25.209345       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachine\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachine"}
I0222 22:35:31.856282       1 request.go:665] Waited for 1.247766642s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/cluster.x-k8s.io/v1alpha4?timeout=32s
E0222 22:35:32.158802       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachineInstance\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachineInstance"}
E0222 22:35:35.208210       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachine\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachine"}
I0222 22:35:41.906657       1 request.go:665] Waited for 1.297384124s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/bootstrap.cluster.x-k8s.io/v1beta1?timeout=32s
E0222 22:35:42.158276       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachineInstance\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachineInstance"}
E0222 22:35:45.208500       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachine\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachine"}
I0222 22:35:51.907172       1 request.go:665] Waited for 1.297277575s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/authorization.k8s.io/v1?timeout=32s
E0222 22:35:52.159635       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachineInstance\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachineInstance"}
E0222 22:35:55.209096       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachine\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachine"}
I0222 22:36:01.956369       1 request.go:665] Waited for 1.347632595s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/scheduling.k8s.io/v1beta1?timeout=32s
E0222 22:36:02.158553       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachineInstance\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachineInstance"}
E0222 22:36:05.208705       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachine\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachine"}
I0222 22:36:12.006399       1 request.go:665] Waited for 1.397125003s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/coordination.k8s.io/v1?timeout=32s
E0222 22:36:12.158084       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachineInstance\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachineInstance"}
E0222 22:36:15.208906       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachine\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachine"}
I0222 22:36:22.056959       1 request.go:665] Waited for 1.447452524s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/policy/v1beta1?timeout=32s
E0222 22:36:22.159369       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachineInstance\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachineInstance"}
E0222 22:36:25.208501       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachine\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachine"}
I0222 22:36:32.105701       1 request.go:665] Waited for 1.49679813s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/authentication.k8s.io/v1beta1?timeout=32s
E0222 22:36:32.158523       1 logr.go:265] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"VirtualMachineInstance\" in version \"kubevirt.io/v1\""  "kind"={"Group":"kubevirt.io","Kind":"VirtualMachineInstance"}
E0222 22:36:33.756698       1 controller.go:203] controller/kubevirtmachine "msg"="Could not wait for Cache to sync" "error"="failed to wait for kubevirtmachine caches to sync: timed out waiting for cache to be synced" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtMachine" 
I0222 22:36:33.756772       1 logr.go:249]  "msg"="Stopping and waiting for non leader election runnables"  
I0222 22:36:33.756789       1 logr.go:249]  "msg"="Stopping and waiting for leader election runnables"  
I0222 22:36:33.756815       1 controller.go:240] controller/kubevirtcluster "msg"="Shutdown signal received, waiting for all workers to finish" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster" 
I0222 22:36:33.756849       1 controller.go:242] controller/kubevirtcluster "msg"="All workers finished" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster" 
I0222 22:36:33.756862       1 logr.go:249]  "msg"="Stopping and waiting for caches"  
I0222 22:36:33.756940       1 logr.go:249]  "msg"="Stopping and waiting for webhooks"  
I0222 22:36:33.756982       1 logr.go:249] controller-runtime/webhook "msg"="shutting down webhook server"  
I0222 22:36:33.758960       1 logr.go:249]  "msg"="Wait completed, proceeding to shutdown the manager"  
E0222 22:36:33.759092       1 logr.go:265] setup "msg"="problem running manager" "error"="failed to wait for kubevirtmachine caches to sync: timed out waiting for cache to be synced"  

Investigate the possibility of eliminating this dependency on KubeVirt in the management cluster. This is relevant for the case where management and infra clusters are decoupled; then, arguably, we should impose the KubeVirt deployment requirement only on the infra cluster.
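One possible direction, sketched below, is to probe the management cluster for the kubevirt.io/v1 group before wiring up the VirtualMachine/VirtualMachineInstance watches; where such a check would live in the manager setup (and whether to skip, delay, or retry the watches) is an open design question.

package setup

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

// kubevirtServed reports whether the kubevirt.io/v1 API is served by the
// cluster behind cfg, so the KubeVirt watches can be set up conditionally.
func kubevirtServed(cfg *rest.Config) (bool, error) {
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return false, err
	}
	if _, err := dc.ServerResourcesForGroupVersion("kubevirt.io/v1"); err != nil {
		if apierrors.IsNotFound(err) {
			return false, nil // CRDs not installed on this cluster
		}
		return false, err
	}
	return true, nil
}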

tenant cluster nodes never recover after KubeVirt VM restarts

/kind bug

If a KubeVirt VM restarts which is hosting a tenant cluster node, that node never comes back online within the tenant cluster.

Steps to reproduce.

  1. Create a tenant cluster with capk
  2. restart a tenant cluster VM virtctl restart <vm>
  3. That node never rejoins the tenant cluster despite the VM recovering

At step 3, KubeVirt will bring the VM back online, but internally that VM will never rejoin the tenant cluster for a few reasons.

  1. The VM in our examples uses the ephemeral containerDisk volume type, which is not persistent across restarts. This means that the node would have to re-join the kubeadm cluster, but it can't because the token from the initial bootstrap has expired.
  2. The VM's IP address changes after a restart, but the hostname remains the same. It's unclear whether this inconsistency is tolerated when re-joining the cluster after a restart.

Here's the error that occurs on VM boot after restart with the kubeadm bootstrapper.
error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "uavkdm"

The tenant cluster says that the node is NotReady

# kubectl get nodes
NAME                            STATUS     ROLES                  AGE   VERSION
kvcluster-control-plane-gsxzx   Ready      control-plane,master   41m   v1.21.0
kvcluster-md-0-k5srs            NotReady   <none>                 39m   v1.21.0

However, both the Machine and KubeVirtMachine objects indicate that the Node is still available, which seems wrong.

$ ./kubevirtci kubectl get machine
selecting docker as container runtime
NAME                              CLUSTER     NODENAME                        PROVIDERID                                 PHASE     AGE   VERSION
kvcluster-control-plane-9hlg7     kvcluster   kvcluster-control-plane-gsxzx   kubevirt://kvcluster-control-plane-gsxzx   Running   70m   v1.21.0
kvcluster-md-0-59885c486b-w5wfx   kvcluster   kvcluster-md-0-k5srs            kubevirt://kvcluster-md-0-k5srs            Running   70m   v1.21.0

machine status

status:
  addresses:
  - address: kvcluster-md-0-k5srs
    type: Hostname
  - address: 10.244.140.77
    type: InternalIP
  - address: 10.244.140.77
    type: ExternalIP
  - address: kvcluster-md-0-k5srs
    type: InternalDNS
  bootstrapReady: true
  conditions:
  - lastTransitionTime: "2022-01-14T19:23:33Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2022-01-14T19:21:18Z"
    status: "True"
    type: BootstrapReady
  - lastTransitionTime: "2022-01-14T19:23:33Z"
    status: "True"
    type: InfrastructureReady
  - lastTransitionTime: "2022-01-14T19:57:36Z"
    message: 'Node condition MemoryPressure is Unknown. Node condition DiskPressure
      is Unknown. Node condition PIDPressure is Unknown. Node condition Ready is Unknown. '
    reason: NodeConditionsFailed
    status: Unknown
    type: NodeHealthy
  infrastructureReady: true
  lastUpdated: "2022-01-14T19:23:33Z"
  nodeInfo:
    architecture: amd64
    bootID: 697101ac-583b-44e1-8bf4-3535f70bbefa
    containerRuntimeVersion: cri-o://1.21.3
    kernelVersion: 5.14.10-300.fc35.x86_64
    kubeProxyVersion: v1.21.0
    kubeletVersion: v1.21.0
    machineID: 3f37466c56de5676a274894afc91998f
    operatingSystem: linux
    osImage: Fedora Linux 35 (Cloud Edition)
    systemUUID: 3f37466c-56de-5676-a274-894afc91998f
  nodeRef:
    apiVersion: v1
    kind: Node
    name: kvcluster-md-0-k5srs
    uid: 9f70c294-e77e-4539-aa86-e8d6fbd92279
  observedGeneration: 3
  phase: Running

and kubevirt machine status

status:
  addresses:
  - address: kvcluster-md-0-k5srs
    type: Hostname
  - address: 10.244.140.77
    type: InternalIP
  - address: 10.244.140.77
    type: ExternalIP
  - address: kvcluster-md-0-k5srs
    type: InternalDNS
  conditions:
  - lastTransitionTime: "2022-01-14T19:23:33Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2022-01-14T19:23:33Z"
    status: "True"
    type: BootstrapExecSucceeded
  - lastTransitionTime: "2022-01-14T19:23:02Z"
    status: "True"
    type: VMProvisioned
  nodeupdated: true
  ready: true

Add scripts to turn on presubmit checks

What steps did you take and what happened:
Currently CAPK is missing presubmit checks; these would be a great addition to the project. Adding a few scripts for that purpose.

Anything else you would like to add:
Before proceeding, we need to make sure we have the required support in the test-infra repo.

Environment:

  • Cluster-api version:
  • Cluster-api-provider-kubevirt version:
  • Kubernetes version: (use kubectl version):
  • KubeVirt version:
  • OS (e.g. from /etc/os-release):

/kind feature
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/labels?q=area for the list of labels]

Consider switching to kubevirt.io/api from kubevirt.io/client-go

/kind enhancement

What happened:

kubevirt.io/client-go pulls in tons of dependencies and makes it hard to integrate with other Kubernetes-related Go packages.

What you expected to happen:

Starting with 0.48 we will have tagged releases of just our APIs at https://github.com/kubevirt/api and https://github.com/kubevirt/containerized-data-importer-api which can be used like described at https://github.com/kubevirt/kubevirt/blob/f4ff1cabebfd98f1f7c72bc1b1c5ab15fd6b3457/staging/src/kubevirt.io/api/README.md.

This pulls in drastically fewer dependencies and will make dependency resolution in the future much simpler for you.
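For reference, the consumption side would roughly look like the sketch below, importing only the API types and registering them with the controller-runtime scheme (assuming the usual AddToScheme helper exposed by the API package):

package apiconsumer

import (
	"k8s.io/apimachinery/pkg/runtime"
	kubevirtv1 "kubevirt.io/api/core/v1"
)

// buildScheme registers the KubeVirt types from the slim kubevirt.io/api
// module instead of pulling in all of kubevirt.io/client-go.
func buildScheme() (*runtime.Scheme, error) {
	scheme := runtime.NewScheme()
	if err := kubevirtv1.AddToScheme(scheme); err != nil {
		return nil, err
	}
	return scheme, nil
}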

KubeVirtMachine uses VMI instead of VM

Hey,

First off, It's really great to see this provider land! Excellent work!

One thing stuck out to me while I was looking through this that would be good to talk through. Right now the KubeVirtMachine API uses a VMI spec rather than a VM spec, and that VMI spec is later used to create a persistent VM. By starting with a VMI spec rather than a VM spec, we lose the ability to exercise some of the persistent storage workflows provided by the VM spec's DataVolumeTemplates.

How would you all feel about transitioning the KubeVirtMachine api to reference a VM spec instead of a VMI spec? That would allow us to describe both the VMI and how the storage is provided for the VMI using DataVolumeTemplates.

To leverage the VM spec, there's a trick we'd need to implement to make this work correctly. The names of the DataVolumes referenced in the VM spec's DataVolumeTemplate section would need to have something unique appended to them for each unique VM created from the KubeVirtMachine.

sound possible?

Machine::IsBootstrapped will wait indefinitely if "kubeadm init" fails

The command run by cloud-init is "kubeadm init --config /run/kubeadm/kubeadm.yaml && echo success > /run/cluster-api/bootstrap-success.complete", and Machine::IsBootstrapped waits for the existence of /run/cluster-api/bootstrap-success.complete.
If kubeadm init exits with an error, the cluster creation will be stuck.
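One possible mitigation, sketched below, is to also write a failure sentinel from cloud-init and teach the reconciler to check for it; the sentinel path and the executor interface are assumptions standing in for whatever SSH helper the controller already uses.

package bootstrap

import "strings"

// The runcmd would be extended roughly like this (illustrative):
//   kubeadm init --config /run/kubeadm/kubeadm.yaml \
//     && echo success > /run/cluster-api/bootstrap-success.complete \
//     || echo failure > /run/cluster-api/bootstrap-failure.complete

// commandExecutor stands in for the SSH command helper used by IsBootstrapped.
type commandExecutor interface {
	ExecuteCommand(command string) (string, error)
}

const bootstrapFailureFile = "/run/cluster-api/bootstrap-failure.complete" // hypothetical path

// IsBootstrapFailed lets the reconciler surface a terminal failure condition
// instead of waiting indefinitely for the success sentinel.
func IsBootstrapFailed(exec commandExecutor) bool {
	out, err := exec.ExecuteCommand("cat " + bootstrapFailureFile)
	return err == nil && strings.TrimSpace(out) == "failure"
}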

kubevirt provider initialization fails due to versioning issue in released metadata.yaml

What steps did you take and what happened:
Tried to initialize kubevirt provider.

cat ~/.cluster-api/clusterctl.yaml
providers:
  - name: "kubevirt"
    url: "https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/releases/v0.1.0/infrastructure-components.yaml"
    type: "InfrastructureProvider"
clusterctl init --infrastructure kubevirt
Fetching providers
Error: invalid provider metadata: version v0.1.0 for the provider capk-system/infrastructure-kubevirt does not match any release series

What did you expect to happen:
kubevirt provider should have initialized successfully

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
Updating the major version from 1 to 0 in the released metadata.yaml file and running the initialization with local files fixes the problem.

 wget -q https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/releases/download/v0.1.0/metadata.yaml;cat metadata.yaml
# maps release series of major.minor to cluster-api contract version
# the contract version may change between minor or major versions, but *not*
# between patch versions.
#
# update this file only when a new major or minor version is released
apiVersion: clusterctl.cluster.x-k8s.io/v1alpha3
kind: Metadata
releaseSeries:
  - major: 1
    minor: 1
    contract: v1beta1

Environment:

  • Cluster-api version: clusterctl version
    clusterctl version: &version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.4", GitCommit:"1c3a1526f101d4b07d2eec757fe75e8701cf6212", GitTreeState:"clean", BuildDate:"2022-06-03T17:11:09Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}

  • Cluster-api-provider-kubevirt version: v0.1.0

  • Kubernetes version: (use kubectl version): v1.24.3(Management cluster)

  • KubeVirt version: v0.54.0

  • OS (e.g. from /etc/os-release): Ubuntu 20.04

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/labels?q=area for the list of labels]

Authorization token for ignition server does not refresh

When a new VM is provisioned, it pulls its authorization token from the user-data-{NAMESPACE}-{HASH}-userdata secret. The creation of this secret happens in the kubevirtmachine controller; the source of the data contained in the secret is the user-data-{NAMESPACE}-{HASH} secret, which is created by the nodepool controller in HyperShift and referenced in the Machine object under the field spec.bootstrap.dataSecretName.
The nodepool controller rotates the authorization token every 24h (inside the user-data-{NAMESPACE}-{HASH} secret), but there is no mechanism for rotating the user-data-{NAMESPACE}-{HASH}-userdata secret, which means new VMs cannot pull their ignition file. The only way to work around it is to delete the user-data-{NAMESPACE}-{HASH}-userdata secret.

/kind bug

Add clusterctl init intergration

In order to be able to install CAPK on a management cluster with "clusterctl init --infrastructure kubevirt", we need to add a CI workflow that creates a release with two files, metadata.yaml and infrastructure-components.yaml, which contain the YAML that needs to be deployed into the cluster.

`clusterctl` describe shows stale information

What steps did you take and what happened:

While testing #54, I noticed the status posted by clusterctl describe has stale information even though the tenant cluster was successfully formed:

$ cluster-api/bin/clusterctl --kubeconfig kubeconfig.yaml describe cluster workload-cluster -n test-dda2dda

NAME                                                                 READY  SEVERITY  REASON                       SINCE  MESSAGE
/workload-cluster                                                    True                                          57m
├─ClusterInfrastructure - KubevirtCluster/workload-cluster           True                                          60m
├─ControlPlane - KubeadmControlPlane/workload-cluster-control-plane  True                                          57m
│ └─Machine/workload-cluster-control-plane-9wn55                     True                                          57m
└─Workers
  └─MachineDeployment/workload-cluster-md-0                          False  Warning   WaitingForAvailableMachines  60m    Minimum availability requires 1 replicas, current 0 available
    └─Machine/workload-cluster-md-0-776c5bf974-x2c62                 True                                          56m 

What did you expect to happen:
There should be no warning.

Anything else you would like to add:
N/A

Environment:

  • Cluster-api version: v0.4.5 == 65f95147d52abface1778dd426ad948999cbac7b
  • Cluster-api-provider-kubevirt version: 353694f61daecb3e4c4ea78d13477a2fc209a4a5
  • Kubernetes version: (use kubectl version):
    Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:15:20Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
  • KubeVirt version: v0.38.0
  • OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

/kind bug

Service created for Kubernetes API access may add another cluster as an endpoint

/kind bug

When multiple workload clusters are created on the same management cluster, the service created for workload cluster API access may add another cluster's control plane as an endpoint, because an overly broad selector is used.

For example, I have two clusters, kvcluster01 and kvcluster02, created on the same management cluster:

root@ny5-infra04:~# kubectl get svc
NAME                                 TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)          AGE
kvcluster01-lb                       LoadBalancer   10.109.112.46    10.223.31.231   6443:30255/TCP   77m
kvcluster02-lb                       LoadBalancer   10.109.43.60     10.223.31.235   6443:31612/TCP   66m
Name:                     kvcluster01-lb
Namespace:                default
Labels:                   cluster.x-k8s.io/cluster-name=kvcluster01
Annotations:              <none>
Selector:                 cluster.x-k8s.io/role=control-plane.  <<<
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.109.112.46
IPs:                      10.109.112.46
LoadBalancer Ingress:     10.223.31.231
Port:                     <unset>  6443/TCP
TargetPort:               6443/TCP
NodePort:                 <unset>  30255/TCP
Endpoints:                10.244.1.135:6443,10.244.2.13:6443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

Name:                     kvcluster02-lb
Namespace:                default
Labels:                   cluster.x-k8s.io/cluster-name=kvcluster02
Annotations:              <none>
Selector:                 cluster.x-k8s.io/role=control-plane.  <<<<
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.109.43.60
IPs:                      10.109.43.60
LoadBalancer Ingress:     10.223.31.235
Port:                     <unset>  6443/TCP
TargetPort:               6443/TCP
NodePort:                 <unset>  31612/TCP
Endpoints:                10.244.1.135:6443,10.244.2.13:6443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

Both clusters' control-plane nodes have the same label cluster.x-k8s.io/role=control-plane, so this is not enough to identify each cluster.

root@ny5-infra04:~# kubectl get vmi  --show-labels
NAME                              AGE    PHASE     IP             NODENAME      READY   LABELS
kvcluster01-control-plane-c4tjz   81m    Running   10.244.2.13    ny5-infra06   True    cluster.x-k8s.io/role=control-plane,kubevirt.io/nodeName=ny5-infra06,kubevirt.io/vm=kvcluster01-control-plane-c4tjz,name=kvcluster01-control-plane-c4tjz
kvcluster02-control-plane-7zzsc   69m    Running   10.244.1.135   ny5-infra05   True    cluster.x-k8s.io/role=control-plane,kubevirt.io/nodeName=ny5-infra05,kubevirt.io/vm=kvcluster02-control-plane-7zzsc,name=kvcluster02-control-plane-7zzsc

From the endpoint list you can see that each service adds both clusters' control planes as endpoints because of this selector.

root@ny5-infra04:~# kubectl get ep
NAME                                 ENDPOINTS                            AGE
kvcluster01-lb                       10.244.1.135:6443,10.244.2.13:6443   78m
kvcluster02-lb                       10.244.1.135:6443,10.244.2.13:6443   66m

kubectl commands may fail because requests may be load-balanced to a different cluster.

root@ny5-infra04:~# KUBECONFIG=kubeconfig.yaml kubectl get node
NAME                              STATUS   ROLES                  AGE   VERSION
kvcluster01-control-plane-c4tjz   Ready    control-plane,master   76m   v1.21.0
kvcluster01-md-0-zfsvc            Ready    <none>                 75m   v1.21.0
root@ny5-infra04:~# KUBECONFIG=kubeconfig.yaml kubectl get node
Unable to connect to the server: x509: certificate is valid for 10.96.0.1, 10.244.1.135, 10.109.43.60, not 10.223.31.231
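
One way to fix this, suggested by the output above, is to scope the load-balancer Service selector to the individual cluster instead of only to the control-plane role. A hypothetical sketch, assuming the control-plane VMIs also carry the cluster.x-k8s.io/cluster-name label (the Service and KubevirtCluster already do):

package main

import "fmt"

// controlPlaneServiceSelector builds a per-cluster selector for the API
// server load-balancer Service. Selecting only on
// cluster.x-k8s.io/role=control-plane matches every cluster's control-plane
// VMIs; adding the cluster-name label restricts the endpoints to one cluster.
func controlPlaneServiceSelector(clusterName string) map[string]string {
    return map[string]string{
        "cluster.x-k8s.io/role":         "control-plane",
        "cluster.x-k8s.io/cluster-name": clusterName,
    }
}

func main() {
    fmt.Println(controlPlaneServiceSelector("kvcluster01"))
}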

CAPK should look for `infraClusterSecret` in the `KubevirtCluster` resource namespace by default

What steps did you take and what happened:

Request the creation of a cluster on an external infrastructure, providing the kubeconfig in a secret in the same namespace where the cluster spec resources are created, and omitting the Namespace field in KubevirtCluster.spec.infraClusterSecretRef.

E.g. with a KubevirtCluster like the following one:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: KubevirtCluster
metadata:
    name: pja-multi
spec:
    infraClusterSecretRef:
        apiVersion: v1
        kind: Secret
        name: external-infra-kubeconfig

CAPK controller-manager will fail to reconcile the creation of the cluster and report in the log that it cannot find the external-infra-kubeconfig.

What did you expect to happen:

The Namespace field is optional in all the resource references used by CAPI. If the namespace is not specified, then the default behavior is to look for the referenced resource in the same namespace as the referring one. We should have the same logic here.

Anything else you would like to add:

This is due to the logic in /pkg/infracluster/infracluster.go #GenerateInfraClusterClient, which does not check whether the Namespace field is set. When it is not set, it should fall back to looking in the provided ownerNamespace.
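
A minimal sketch of the fallback described above (not the current GenerateInfraClusterClient code):

package infracluster // illustrative sketch only

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/types"
)

// resolveInfraSecretKey falls back to the namespace of the referring
// KubevirtCluster (ownerNamespace) when the secret reference omits Namespace.
func resolveInfraSecretKey(ref *corev1.ObjectReference, ownerNamespace string) types.NamespacedName {
    namespace := ref.Namespace
    if namespace == "" {
        namespace = ownerNamespace
    }
    return types.NamespacedName{Namespace: namespace, Name: ref.Name}
}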

Environment:

  • Cluster-api version: v1beta1
  • Cluster-api-provider-kubevirt version: v1alpha1
  • Kubernetes version: (use kubectl version): v1.20.15
  • KubeVirt version: v1
  • OS (e.g. from /etc/os-release): CentOS 7

/kind bug

Kubevirt cluster and machine controllers shouldn't report patch errors when resources already deleted

What steps did you take and what happened:

  1. create a cluster and wait for it to be created

  2. delete the cluster

  3. Looking at the logs of the capk-controller-manager-<hash>-<hash> manager container, the following errors might be reported (unless the deletion of the resources takes longer; this shows up fairly consistently for me):

    (...)
    E0720 21:36:17.931136       1 controller.go:317] controller/kubevirtmachine "msg"="Reconciler error" "error"="failed to patch KubevirtMachine: kubevirtmachines.infrastructure.cluster.x-k8s.io \"pja-multi-control-plane-qnrhm\" not found" "name"="pja-multi-control-plane-qnrhm" "namespace"="capk" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtMachine"
    (...)
    E0720 21:36:23.236566       1 controller.go:317] controller/kubevirtcluster "msg"="Reconciler error" "error"="kubevirtclusters.infrastructure.cluster.x-k8s.io \"pja-multi\" not found" "name"="pja-multi" "namespace"="capk" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster"
    

Looking at the code, this is due to calls to patch the resources that might occur after the resources have been deleted, while the error handling doesn't differentiate between NotFound and other errors.

  • For KubevirtCluster, this is due to the deferred call to the patch function that is called even when the reconciliation result is the deletion of the resource.
  • For KubevirtMachine, this is due to the call to the patch function at the end of the reconcileDelete.

What did you expect to happen:

Calling the patches is still valid, as the deletion might actually take longer to be done by K8s, but in the case where the latter was quick and the resources are no longer found, either the not-found errors should be silenced, or they should be reported at a lower level (info) with an indication that this is an expected case.
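
For illustration, a small helper along these lines would silence only the NotFound case while keeping every other patch error visible; this is a sketch, not the project's code:

package controllers // illustrative sketch only

import (
    apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// ignoreNotFound drops NotFound errors returned by a patch that runs after
// the KubevirtCluster/KubevirtMachine has already been deleted; all other
// errors are still surfaced to the reconciler.
func ignoreNotFound(err error) error {
    if apierrors.IsNotFound(err) {
        return nil
    }
    return err
}

The deferred patch call and the patch at the end of reconcileDelete would then only report errors for which ignoreNotFound(err) != nil.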

Environment:

  • Cluster-api version: v1beta1
  • Cluster-api-provider-kubevirt version: v1alpha1
  • Kubernetes version: (use kubectl version): v1.20.15
  • KubeVirt version: v1
  • OS (e.g. from /etc/os-release): CentOS 7

/kind bug

KubevirtCluster not "ready" and cluster stuck in provisioning for a long time

What steps did you take and what happened:
I have a 3-server management cluster (Ubuntu 20.04 with microk8s 1.21/stable),
with CAPI 1.2.0, Kubevirt 0.5.5 and the latest release of CAPK, i.e. v0.1.0.
Then I create a new cluster with the following arguments to clusterctl:

export IMAGE_REPO=k8s.gcr.io
export NODE_VM_IMAGE_TEMPLATE=quay.io/capk/ubuntu-2004-container-disk:v1.22.0
export CRI_PATH=/var/run/containerd/containerd.sock
clusterctl generate cluster ccstest01 --kubernetes-version v1.22.0 --control-plane-machine-count=5 --worker-machine-count=5 --target-namespace ccs-test --from templates/cluster-template.yaml

After deploying, the cluster stays in the Provisioning state:
$ kubectl get clusters -n ccs-test
NAME PHASE AGE VERSION
ccstest01 Provisioning 5m8s

and Kubevirtcluster has no Ready status:

$ kubectl get -n ccs-test kubevirtcluster -o yaml
apiVersion: v1
items:
- apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
  kind: KubevirtCluster
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"infrastructure.cluster.x-k8s.io/v1alpha1","kind":"KubevirtCluster","metadata":{"annotations":{},"name":"ccstest01","namespace":"ccs-test"},"spec":{"controlPlaneServiceTemplate":{"spec":{"type":"ClusterIP"}}}}
    creationTimestamp: "2022-08-10T00:39:05Z"
    finalizers:
    - kubevirtcluster.infrastructure.cluster.x-k8s.io
    generation: 2
    labels:
      cluster.x-k8s.io/cluster-name: ccstest01
    name: ccstest01
    namespace: ccs-test
    ownerReferences:
    - apiVersion: cluster.x-k8s.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Cluster
      name: ccstest01
      uid: ea09b2b9-1787-4408-97c6-193c5c96afa3
    resourceVersion: "3751834"
    selfLink: /apis/infrastructure.cluster.x-k8s.io/v1alpha1/namespaces/ccs-test/kubevirtclusters/ccstest01
    uid: 362816a7-03e3-4735-bf33-730213f35dec
  spec:
    controlPlaneEndpoint:
      host: 10.152.183.226
      port: 6443
    controlPlaneServiceTemplate:
      spec:
        type: ClusterIP
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

In the capk-controller logs I see the following lines repeated:

E0810 00:39:26.463301 1 controller.go:317] controller/kubevirtcluster "msg"="Reconciler error" "error"="failed to persist ssh keys to secret: failed to set owner reference for secret: Operation cannot be fulfilled on secrets "ccstest01-ssh-keys": the object has been modified; please apply your changes to the latest version and try again" "name"="ccstest01" "namespace"="ccs-test" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster"
E0810 00:39:46.964172 1 kubevirtcluster_controller.go:118] controller/kubevirtcluster/ccs-test/ccstest01 "msg"="failed to patch KubevirtCluster" "error"="KubevirtCluster.infrastructure.cluster.x-k8s.io "ccstest01" is invalid: status.ready: Required value" "name"="ccstest01" "namespace"="ccs-test" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster"
E0810 00:39:46.964218 1 controller.go:317] controller/kubevirtcluster "msg"="Reconciler error" "error"="failed to persist ssh keys to secret: failed to set owner reference for secret: Operation cannot be fulfilled on secrets "ccstest01-ssh-keys": the object has been modified; please apply your changes to the latest version and try again" "name"="ccstest01" "namespace"="ccs-test" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster"
E0810 00:40:27.950044 1 kubevirtcluster_controller.go:118] controller/kubevirtcluster/ccs-test/ccstest01 "msg"="failed to patch KubevirtCluster" "error"="KubevirtCluster.infrastructure.cluster.x-k8s.io "ccstest01" is invalid: status.ready: Required value" "name"="ccstest01" "namespace"="ccs-test" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster"
E0810 00:40:27.950130 1 controller.go:317] controller/kubevirtcluster "msg"="Reconciler error" "error"="failed to persist ssh keys to secret: failed to set owner reference for secret: Operation cannot be fulfilled on secrets "ccstest01-ssh-keys": the object has been modified; please apply your changes to the latest version and try again" "name"="ccstest01" "namespace"="ccs-test" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster"
E0810 00:41:49.904838 1 kubevirtcluster_controller.go:118] controller/kubevirtcluster/ccs-test/ccstest01 "msg"="failed to patch KubevirtCluster" "error"="KubevirtCluster.infrastructure.cluster.x-k8s.io "ccstest01" is invalid: status.ready: Required value" "name"="ccstest01" "namespace"="ccs-test" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster"
E0810 00:41:49.904890 1 controller.go:317] controller/kubevirtcluster "msg"="Reconciler error" "error"="failed to persist ssh keys to secret: failed to set owner reference for secret: Operation cannot be fulfilled on secrets "ccstest01-ssh-keys": the object has been modified; please apply your changes to the latest version and try again" "name"="ccstest01" "namespace"="ccs-test" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster"
E0810 00:44:33.769811 1 kubevirtcluster_controller.go:118] controller/kubevirtcluster/ccs-test/ccstest01 "msg"="failed to patch KubevirtCluster" "error"="KubevirtCluster.infrastructure.cluster.x-k8s.io "ccstest01" is invalid: status.ready: Required value" "name"="ccstest01" "namespace"="ccs-test" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster"
E0810 00:44:33.769852 1 controller.go:317] controller/kubevirtcluster "msg"="Reconciler error" "error"="failed to persist ssh keys to secret: failed to set owner reference for secret: Operation cannot be fulfilled on secrets "ccstest01-ssh-keys": the object has been modified; please apply your changes to the latest version and try again" "name"="ccstest01" "namespace"="ccs-test" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster"

I noticed that the cluster remains in this state for an extended period (in multiple runs this wait ranged from 45 minutes to 3+ hours), and then all of a sudden the cluster state changes and the subsequent bootstrapping operations start.

What did you expect to happen:
I expect the cluster to become ready quickly, as it should take at most a couple of minutes to set up the infrastructure.

Anything else you would like to add:
Since the state was getting resolved randomly (i.e. setting the ownership on the ssh keys secret would suddenly work and the cluster was all good), I suspected some kind of race condition, and sure enough, with the following change my clusters were provisioned in seconds:

diff --git a/pkg/ssh/cluster_node_ssh_keys.go b/pkg/ssh/cluster_node_ssh_keys.go
index d3b6d33..80d5892 100644
--- a/pkg/ssh/cluster_node_ssh_keys.go
+++ b/pkg/ssh/cluster_node_ssh_keys.go
@@ -17,6 +17,8 @@ limitations under the License.
 package ssh
 
 import (
+	"time"
+
 	"github.com/pkg/errors"
 	corev1 "k8s.io/api/core/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 
@@ -85,6 +87,9 @@ func (c *ClusterNodeSshKeys) PersistKeysToSecret() (*corev1.Secret, error) {
 		return nil, errors.Wrapf(err, "failed to create ssh keys secret for cluster")
 	}
 
+	// test if sleeping helps
+	time.Sleep( 10 *  time.Millisecond)
+
 	// set owner reference for secret
 	mutateFn := func() (err error) {
 		newSecret.SetOwnerReferences(clusterutil.EnsureOwnerRef(

$ kubectl get cluster -A
NAMESPACE NAME PHASE AGE VERSION
ccs-test ccstest01 Provisioned 8s
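
Rather than sleeping, a more robust sketch would retry the owner-reference update on conflict with a fresh read, using client-go's standard retry helper (the function below is a placeholder, not the real CAPK code):

package ssh // illustrative sketch only

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/client-go/util/retry"
    clusterutil "sigs.k8s.io/cluster-api/util"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// setSecretOwnerRef re-reads the secret and retries the update on conflict
// instead of sleeping, so a concurrent write by another controller does not
// fail the whole reconcile.
func setSecretOwnerRef(ctx context.Context, c client.Client, key types.NamespacedName, ref metav1.OwnerReference) error {
    return retry.RetryOnConflict(retry.DefaultRetry, func() error {
        var secret corev1.Secret
        if err := c.Get(ctx, key, &secret); err != nil {
            return err
        }
        secret.OwnerReferences = clusterutil.EnsureOwnerRef(secret.OwnerReferences, ref)
        return c.Update(ctx, &secret)
    })
}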

Environment:

  • Cluster-api version: 1.2.0
  • Cluster-api-provider-kubevirt version: 0.1.0
  • Kubernetes version: (use kubectl version): v1.21.14
  • KubeVirt version: 0.5.5
  • OS (e.g. from /etc/os-release):
    NAME="Ubuntu"
    VERSION="20.04.4 LTS (Focal Fossa)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 20.04.4 LTS"
    VERSION_ID="20.04"

/kind bug

Failing to clusterctl init with kubevirt provider

What steps did you take and what happened:
I'm getting this error from clusterctl when trying to run clusterctl init with Kubevirt as a provider
clusterctl init --infrastructure kubevirt
Fetching providers
Error: failed to get provider components for the "kubevirt" provider: failed to get repository client for the InfrastructureProvider with name kubevirt: error creating the GitHub repository client: failed to get GitHub latest version: failed to find releases tagged with a valid semantic version number

What did you expect to happen:

Anything else you would like to add:

Environment:

  • Cluster-api version: 1.1.15
  • Cluster-api-provider-kubevirt version: latest
  • Kubernetes version: (use kubectl version): 1.22.10
  • KubeVirt version: 0.54.0
  • OS (e.g. from /etc/os-release):centos

/kind bug

cluster teardown hangs

/kind bug

A kubevirt provider cluster should be able to be torn down by deleting the top-level Cluster object; however, that doesn't work today.

The issue is that the KubeVirtMachine objects will never have their finalizer removed during the cascading deletion if the top-level Cluster object gets deleted before the KubeVirtMachines can be processed.

The result is that the KubeVirtMachine objects block indefinitely trying to get the non-existent Cluster object as a precondition to processing the deletion and removing the finalizer. This blocks the cluster from being torn down entirely.
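
A hedged sketch of how the deletion path could be unblocked: if the machine is being deleted and its owning Cluster is already gone, remove the finalizer instead of waiting for the Cluster. The finalizer constant and the function itself are placeholders, not the real CAPK code:

package controllers // illustrative sketch only; names are assumptions

import (
    "context"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/types"
    clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

const machineFinalizer = "kubevirtmachine.infrastructure.cluster.x-k8s.io" // assumed value

// reconcileOrphanedDelete drops the finalizer from a KubevirtMachine-like
// object whose owning Cluster no longer exists, so the cascading teardown
// cannot deadlock on a missing Cluster.
func reconcileOrphanedDelete(ctx context.Context, c client.Client, obj client.Object, clusterName string) (ctrl.Result, error) {
    var cluster clusterv1.Cluster
    err := c.Get(ctx, types.NamespacedName{Namespace: obj.GetNamespace(), Name: clusterName}, &cluster)
    if apierrors.IsNotFound(err) && !obj.GetDeletionTimestamp().IsZero() {
        controllerutil.RemoveFinalizer(obj, machineFinalizer)
        return ctrl.Result{}, c.Update(ctx, obj)
    }
    // Cluster still exists (or lookup failed for another reason): let the
    // normal deletion flow handle it.
    return ctrl.Result{}, err
}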

Tenant namespace should fallback to current context namespace if external kubeconfig is specified.

What steps did you take and what happened:

Discussed with @agradouski:

The namespace where to spawn a tenant cluster should be resolved as follow:

  1. the namespace specified by the enduser in a node's VirtualMachine template; i.e. as KubevirtMachineTemplate.spec.template.spec.virtualMachineTemplate.metadata.namespace
  2. If this is not set and an infraClusterSecretRef is set for this cluster:
    1. If a namespace is directly set as a value in the secret, use it.
    2. Otherwise, fall back to the namespace of the context currently selected in the kubeconfig set in the secret.
    3. If the latter is not set either, fall back to the default namespace.
  3. Otherwise, use the "current" namespace where the cluster specification has been created.

The current logic skips 2.ii; i.e. it ignores the namespace, if any, set in the kubeconfig file for the current context, and directly falls back to default.

What did you expect to happen:

If the end user did not specify a namespace in the virtual-machine template, and they reference a secret with a kubeconfig to connect to the infrastructure, but the secret does not directly specify a namespace, then the namespace of the context currently selected in the kubeconfig should be used, if it is set.
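
For reference, step 2.ii can be implemented with client-go's clientcmd, which resolves the current context's namespace and only then falls back to default. A minimal sketch (not the current CAPK code):

package infracluster // illustrative sketch only

import (
    "k8s.io/client-go/tools/clientcmd"
)

// namespaceFromKubeconfig returns the namespace of the current context in the
// given kubeconfig bytes; clientcmd itself falls back to "default" when the
// context does not set one.
func namespaceFromKubeconfig(kubeconfig []byte) (string, error) {
    cfg, err := clientcmd.NewClientConfigFromBytes(kubeconfig)
    if err != nil {
        return "", err
    }
    ns, _, err := cfg.Namespace()
    return ns, err
}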

Environment:

  • Cluster-api version: v1beta1
  • Cluster-api-provider-kubevirt version: v1alpha1
  • Kubernetes version: (use kubectl version): v1.20.15
  • KubeVirt version: v1
  • OS (e.g. from /etc/os-release): CentOS 7

/kind bug

Tenant cluster VMs running on external infra are not tracked after creation

This feature allows running tenant cluster VMs on external infrastructure (external infra meaning any cluster that is not the CAPI/CAPK management cluster itself): #32

One outstanding issue with tenant clusters deployed on external infra is that CAPI/CAPK management plane does not track underlying VMs for tenant clusters after creation. Meaning, if an underlying VM of a tenant cluster failed completely, or restarted with a different IP, the management plane would not be able to recover the node.

This problem is resolved for tenant clusters running on the management cluster with this PR (issue #70): #73

But we currently lack the resolution for tenant clusters running on external infra.

Please investigate.

CAPK node's userdata secrets are missing label

What steps did you take and what happened:

When creating a new cluster, the userdata secrets created by CAPK to extend the Kubeadm secrets with the capk user have no labels.

What did you expect to happen:

The secret should inherit the labels from the secrets generated by Kubeadm.
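
A minimal sketch of the expected behavior, copying the labels from the Kubeadm-generated secret onto the CAPK userdata secret (the function name and call site are assumptions):

package controllers // illustrative sketch only

import corev1 "k8s.io/api/core/v1"

// inheritLabels copies the labels of the Kubeadm-generated bootstrap secret
// onto the CAPK userdata secret without overwriting labels already present.
func inheritLabels(src, dst *corev1.Secret) {
    if dst.Labels == nil {
        dst.Labels = map[string]string{}
    }
    for k, v := range src.Labels {
        if _, ok := dst.Labels[k]; !ok {
            dst.Labels[k] = v
        }
    }
}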

Environment:

  • Cluster-api version: v1beta1
  • Cluster-api-provider-kubevirt version: v1alpha1
  • Kubernetes version: (use kubectl version): v1.20.15
  • KubeVirt version: v1
  • OS (e.g. from /etc/os-release): CentOS 7

/kind bug

Support for external infrastructure clusters

Today kubevirt provider can only create tenant cluster VMs on the same cluster where CAPI/CAPK management plane is running, in other words, infrastructure cluster is the same as the management cluster.

As an admin, I want to be able to use one Kubernetes cluster to run the CAPI/CAPK management plane and another cluster (or multiple clusters) to run tenant clusters, meaning that in the long run we will want to split the management cluster from the infrastructure cluster.

Expectations:

  • Admins can add a secret to the management cluster, containing kubeconfig for an external cluster.
  • Template will contain a reference to the secret.
  • Underlying infrastructure (VMs) for tenant clusters will be created in that external cluster.

Issues connecting with SSH to some newer OS releases (for instance fedora 33+)

What steps did you take and what happened:

@davidvossel discovered that on recent Fedora the golang ssh client can't connect anymore by default. A fix for that was merged in Go a few days ago.

A rebuild with a new enough golang version should solve this at some point.

What did you expect to happen:

Anything else you would like to add:

Environment:

  • Cluster-api version:
  • Cluster-api-provider-kubevirt version:
  • Kubernetes version: (use kubectl version):
  • KubeVirt version:
  • OS (e.g. from /etc/os-release):

/kind bug

Latest version of clusterctl is not compatible with v1alpha1 version of kubevirt provider

With PR #54 kubevirt provider API versions were set to v1alpha1.

With that, the latest version of clusterctl is no longer compatible with kubevirt provider codebase:

Error: current version of clusterctl is only compatible with v1beta1 providers, detected v1alpha4 for provider infrastructure-kubevirt

Note: from a code perspective, the kubevirt provider is still up to date and compatible with the latest cluster-api, and in our CI implementation we don't observe issues, since we don't use clusterctl for manifest generation but instead apply cluster manifests directly (Cluster, KubeadmControlPlane, MachineDeployment, KubevirtCluster, KubevirtMachineTemplate).

But it might present challenges, for example for our quickstart guide; in the meantime, workarounds are either pinning to an older version of clusterctl or providing instructions for building from source.

Node update with provider ID should not block KubevirtMachine status ready

What happened:
Currently the node is updated with the provider ID in the kubevirtMachine reconciler

if providerID, err = externalMachine.SetProviderID(workloadClusterClient); err != nil {

It causes the following behavior: as long as the node is not created, the KubevirtMachine state isn't changed to READY.

What did you expect to happen:
The Node update should not block the KubevirtMachine Ready status

Anything else you would like to add:
In the OpenShift case, it causes a deadlock, because of the following conditions:

  1. Node can be created only if the relevant certificate is approved
  2. Certificate can be approved only if the relevant KubevirtMachine state is READY
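
One possible shape for the fix, sketched with a stand-in type because the real KubevirtMachine fields are not reproduced here: mark the machine ready first, and treat a failed provider-ID update only as a reason to requeue.

package controllers // illustrative sketch only; field and helper names are assumptions

import (
    "time"

    ctrl "sigs.k8s.io/controller-runtime"
)

// KubevirtMachineLike stands in for the real KubevirtMachine fields used here.
type KubevirtMachineLike struct {
    Ready      bool
    ProviderID *string
}

// updateProviderIDNonBlocking marks the machine ready and treats a
// not-yet-registered Node only as a reason to requeue, breaking the
// certificate-approval deadlock described above.
func updateProviderIDNonBlocking(m *KubevirtMachineLike, setProviderID func() (string, error)) (ctrl.Result, error) {
    m.Ready = true // readiness is no longer gated on the Node update

    providerID, err := setProviderID()
    if err != nil {
        // The Node may not exist yet (e.g. its CSR is not approved); retry later.
        return ctrl.Result{RequeueAfter: 20 * time.Second}, nil
    }
    m.ProviderID = &providerID
    return ctrl.Result{}, nil
}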

CAPK should not require a custom `capk` user

What steps did you take and what happened:

The current CAPK logic requires an SSH key pair to be created, and it modifies the cloud-init user-data generated by Kubeadm to add a custom user, capk, with the public SSH key, authorizing it to SSH to each node. This is required so that CAPK can check that each node bootstrapped correctly.

What did you expect to happen:

Probing for the node health should be deferred to K8s, such that the readiness is reported on the VM(I). This could use one of the handlers provided by KubeVirt.

Environment:

  • Cluster-api version: v1beta1
  • Cluster-api-provider-kubevirt version: v1alpha1
  • Kubernetes version: (use kubectl version): v1.20.15
  • KubeVirt version: v1
  • OS (e.g. from /etc/os-release): CentOS 7

/kind bug
