
cluster-api-provider-libvirt's Introduction

OpenShift cluster-api-provider-libvirt

This repository hosts an implementation of a provider for libvirt for the OpenShift machine-api.

This provider runs as a machine-controller deployed by the machine-api-operator.

To allow the actuator to connect to the libvirt daemon running on the host machine:

Libvirt needs to be configured to accept TCP connections as described in the installer documentation.

You can verify that you can connect through your host's private IP with:

virsh -c qemu+tcp://host_private_ip/system

Video demo

https://youtu.be/urvXXfdfzVc

cluster-api-provider-libvirt's People

Contributors

abhinavdahiya, andymcc, bison, cfergeau, damdo, dbenoit17, derekwaynecarr, dobbymoodge, enxebre, fedosin, frobware, ingvagabund, juneezee, locriandev, michaelgugino, openshift-bot, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, paulfantom, prashanth684, praveenkumar, racheljpg, rphillips, smarterclayton, spangenberg, vikaschoudhary16, wking, xkwangcn, zeenix


cluster-api-provider-libvirt's Issues

Redeploy of the machine-controller will reset cidrOffset

We increase cidrOffset each time the controller creates a new machine and adds the record to the libvirt network, but a controller redeploy resets the cidrOffset counter, so the controller will create DNS records for new machines with IPs that were already given to other machines.

I am not sure how we can fix it, but I think we will need to check the correlation between running machines and DHCP leases; that would also help avoid MAC duplication.

Creation of volume for Ignition fails if it already exists

I wanted to test my custom built cluster-api-provider-libvirt image for developing/debugging it. After successfully deploying it, I tried to test by:

oc scale --replicas 0 machinesets/test1-xlz4j-worker-0
sleep 4
oc scale --replicas 1 machinesets/test1-xlz4j-worker-0

But the worker machine failed to come up. Looking at oc logs deployments/clusterapi-manager-controllers --container machine-controller, I found the source of the problem:

I0307 16:44:51.587462       1 ignition.go:152] Creating Ignition temporary file
I0307 16:44:51.588472       1 client.go:426] Check if "test1-xlz4j-worker-0-r5d85" volume exists
I0307 16:44:51.589005       1 client.go:450] Deleting volume test1-xlz4j-worker-0-r5d85
E0307 16:44:51.591374       1 actuator.go:106] Machine error: error creating domain Error creating libvirt volume for Ignition test1-xlz4j-worker-0-r5d85.ignition: virError(Code=90, Domain=18, Message='storage volume 'test1-xlz4j-worker-0-r5d85.ignition' exists already')
E0307 16:44:51.591454       1 actuator.go:50] test1-xlz4j-worker-0-r5d85: error creating libvirt machine: error creating domain Error creating libvirt volume for Ignition test1-xlz4j-worker-0-r5d85.ignition: virError(Code=90, Domain=18, Message='storage volume 'test1-xlz4j-worker-0-r5d85.ignition' exists already')

I'd be happy to try to provide a PR to fix this, but to be able to do that I first need to find out what exactly the issue is here. Is it that the Ignition volume should be deleted after its use, or that the operator should just re-use the volume if it already exists? I'll hopefully find the answer by digging through the code, but if someone already knows, that would be very helpful.

Scaling machinesets has no effect

oc scale --replicas=3 machineset test1-nf857-worker-0 -n openshift-machine-api

[core@test1-nf857-master-0 ~]$ oc get machinesets -n openshift-machine-api
NAME DESIRED CURRENT READY AVAILABLE AGE
test1-nf857-worker-0 3 3 1 1 24h

logs from:
oc logs machine-api-controllers-c59d8448c-644ns -n openshift-machine-api -c controller-manager

I0725 05:48:35.725479 1 controller.go:331] Creating machine 1 of 1, ( spec.replicas(2) > currentMachineCount(1) )
I0725 05:48:35.890195 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0725 05:48:35.956498 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0725 05:48:35.956819 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0725 05:48:55.052917 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0725 05:48:55.053447 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0725 05:48:56.723417 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0725 05:49:30.928201 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0725 05:49:31.044226 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0725 05:49:32.162709 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster

e2e-libvirt CI job broken

The e2e-libvirt CI job is currently not working for this provider. The reason is that the job configuration assumes the Dockerfiles for the images it needs are present in this repository.

So we either need to instruct the job to look for the Dockerfiles in the installer repository, or duplicate the Dockerfiles here. The former would be the better approach, but we need to figure out how to do it; the latter is very easy (famous last words) but not ideal at all, since we would then have to keep the copies in sync manually.

Update Dockerfile and probably related yaml file to support mainframe/s390x platform

In the following PR, guestfish is used to inject ignition for mainframe/s390x platform.
#174

So we need to update the Dockerfile so as to get the guestfish tool included in the container image of cluster-api-provider-libvirt.

Since guestfish operates directly on the OS image file located on the host machine, but cluster-api-provider-libvirt runs inside a pod/container, the YAML file used to deploy the deployment/pod needs to be updated accordingly so that the volume containing the OS image file is mounted into the pod/container. If that approach is difficult or unacceptable, we will need to figure out some other solution.

Pre-bake testing image

Let's pre-install packages and pre-pull docker images so that master/worker node bootstrapping finishes ASAP:

  • install kubelet, kubectl, kubeadm, docker
  • kubeadm config images pull
  • etc.

Many errors are not visible in container logs

Listing the two that I found:

  1. /usr/bin/kvm-spice: No such file or directory:
    This error occurs when the qemu binary on the host is not named kvm-spice. When this happens, the instance volume gets created but the instance domain does not. This error is visible only when the binary is run in non-containerized, stand-alone mode.

  2. Error: error creating machine error creating domain: Can't retrieve network name tectonic
    This happens when the networkInterfaceName in the machine YAML does not exist in
    virsh net-list

libvirt-actuator is broken

Hi, I was trying to follow https://github.com/openshift/cluster-api-provider-libvirt/tree/master/cmd/libvirt-actuator to create a libvirt instance. I failed at the step ./bin/libvirt-actuator create -m examples/machine.yaml -c examples/cluster.yaml

#  ./bin/libvirt-actuator create -m examples/machine.yaml -c examples/cluster.yaml
ERROR: logging before flag.Parse: I1029 05:21:21.536376   29201 actuator.go:86] Creating machine "worker-example" for cluster "tb-asg-35".
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xee6683]

goroutine 1 [running]:
github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine.ProviderConfigMachine(0x0, 0x0, 0xc00031c688, 0xc000000000, 0x12451f5, 0x23)
	/root/go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator.go:236 +0x53
github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine.clientForMachine(0x0, 0x0, 0xc00031c580, 0x2, 0x2, 0xc000307d80)
	/root/go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator.go:352 +0x4b
github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine.(*Actuator).Create(0xc000307d80, 0xc0002df040, 0xc00031c580, 0x0, 0x0)
	/root/go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator.go:89 +0x135
main.init.0.func1(0xc0002c5400, 0xc000307a80, 0x0, 0x4, 0x0, 0x0)
	/root/go/src/github.com/openshift/cluster-api-provider-libvirt/cmd/libvirt-actuator/main.go:61 +0x224
github.com/openshift/cluster-api-provider-libvirt/vendor/github.com/spf13/cobra.(*Command).execute(0xc0002c5400, 0xc000307a40, 0x4, 0x4, 0xc0002c5400, 0xc000307a40)
	/root/go/src/github.com/openshift/cluster-api-provider-libvirt/vendor/github.com/spf13/cobra/command.go:762 +0x473
github.com/openshift/cluster-api-provider-libvirt/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x1fece60, 0xc0002e06e0, 0x136cb60, 0xc00000e018)
	/root/go/src/github.com/openshift/cluster-api-provider-libvirt/vendor/github.com/spf13/cobra/command.go:852 +0x2fd
github.com/openshift/cluster-api-provider-libvirt/vendor/github.com/spf13/cobra.(*Command).Execute(0x1fece60, 0xc00000e018, 0xc0000cff88)
	/root/go/src/github.com/openshift/cluster-api-provider-libvirt/vendor/github.com/spf13/cobra/command.go:800 +0x2b
main.main()
	/root/go/src/github.com/openshift/cluster-api-provider-libvirt/cmd/libvirt-actuator/main.go:143 +0x5c

System Info

[root@dhcp-66-145-109 cluster-api-provider-libvirt]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)
[root@dhcp-66-145-109 cluster-api-provider-libvirt]# docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:23:03 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:25:29 2018
  OS/Arch:          linux/amd64
  Experimental:     false


libvirt-libs-3.9.0-14.el7_5.8.x86_64
libvirt-devel-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-3.9.0-14.el7_5.8.x86_64
libvirt-3.9.0-14.el7_5.8.x86_64

go version go1.11.1 linux/amd64

bin/libvirt-actuator was built from the crd branch. Please let me know what more I should provide here.
Thank you.

libvirt pool path is wrong when it is not default

Even though I use a custom path for the storagePool (/home/jooho/KVM), the actuator tries to use the wrong path.

E0112 03:54:03.202553       1 actuator.go:107] Machine error: error creating volume Can't retrieve volume /var/lib/libvirt/images/tbd-base
E0112 03:54:03.203245       1 actuator.go:51] tbd/tbd-worker-0-nhzvf: error creating libvirt machine: error creating volume Can't retrieve volume /var/lib/libvirt/images/tbd-base
E0112 03:54:04.259706       1 actuator.go:107] Machine error: error creating volume Can't retrieve volume /var/lib/libvirt/images/tbd-base
E0112 03:54:04.259734       1 actuator.go:51] tbd/tbd-worker-0-d554w: error creating libvirt machine: error creating volume Can't retrieve volume /var/lib/libvirt/images/tbd-base

The worker storage files are generated under /home/jooho/KVM, but the actuator fails to retrieve them because it tries to load them under /var/lib/libvirt/images.

Expected result

It should search the path where the image files are actually generated.

Make generate is broken

go install  -ldflags '-extldflags "-static"' sigs.k8s.io/cluster-api-provider-aws/vendor/github.com/golang/mock/mockgen
can't load package: package sigs.k8s.io/cluster-api-provider-aws/vendor/github.com/golang/mock/mockgen: cannot find package "sigs.k8s.io/cluster-api-provider-aws/vendor/github.com/golang/mock/mockgen" in any of:
        /usr/lib/golang/src/sigs.k8s.io/cluster-api-provider-aws/vendor/github.com/golang/mock/mockgen (from $GOROOT)
        /home/alukiano/go/src/sigs.k8s.io/cluster-api-provider-aws/vendor/github.com/golang/mock/mockgen (from $GOPATH)
make: *** [Makefile:46: gencode] Error 1

The mockgen folder does not exist under master:

ll ~/go/src/sigs.k8s.io/cluster-api-provider-aws/vendor/github.com/golang/mock
total 24
Mar 17 14:01 AUTHORS
Mar 17 14:01 CONTRIBUTORS
Mar 17 14:01 gomock
Mar 17 14:01 LICENSE

Also, it is not good practice to reference tools from an external repository.

/usr/bin/kvm-spice: No such file or directory

domainDef.Devices.Emulator should not be hardcoded to "/usr/bin/kvm-spice" in cloud/libvirt/actuators/machine/utils/domain.go.
For example, on my dev machine (Fedora 28) it should have been "/usr/bin/qemu-system-x86_64".

For now, the workaround on Fedora 28 is:
ln -s /usr/bin/qemu-system-x86_64 /usr/bin/kvm-spice

Need to update go mod dependencies for k8s and machine-api packages

As of today, cluster-api-provider-libvirt references very old versions of the k8s and machine-api packages, leading to bugs like https://bugzilla.redhat.com/show_bug.cgi?id=1831780. Moreover, the latest machine-api now does health checking [1], which means a change like [2] is needed here too, but that cannot be done unless the k8s packages are updated, which in turn means updating the machine-api packages as well. Without this update, the machine controller keeps restarting: it is shot down by the machine-api-operator because it cannot report readiness and liveness.

[1] openshift/machine-api-operator#602
[2] openshift/cluster-api-provider-azure#139

Error 'no kind "LibvirtMachineProviderConfig" is registered for version "libvirtproviderconfig/v1alpha1"'

I built and started bin/manager from the crd branch, then tried to create example/cluster.yaml and example/machine.yaml. The manager console keeps reporting this error:

E1029 05:34:12.452090   18408 controller.go:159] Error checking existence of machine instance for machine object worker-example; tb-asg-35/worker-example: error creating libvirt client: error getting machineProviderConfig from spec: decoding failure: no kind "LibvirtMachineProviderConfig" is registered for version "libvirtproviderconfig/v1alpha1"
E1029 05:34:13.452512   18408 actuator.go:47] tb-asg-35/worker-example: error creating libvirt client: error getting machineProviderConfig from spec: decoding failure: no kind "LibvirtMachineProviderConfig" is registered for version "libvirtproviderconfig/v1alpha1"

At this point the machine cannot be deleted either.

E1029 05:41:35.454821   18408 actuator.go:47] tb-asg-35/worker-example: error creating libvirt client: error getting machineProviderConfig from spec: decoding failure: no kind "LibvirtMachineProviderConfig" is registered for version "libvirtproviderconfig/v1alpha1"
E1029 05:41:35.454855   18408 controller.go:138] Error deleting machine object worker-example; tb-asg-35/worker-example: error creating libvirt client: error getting machineProviderConfig from spec: decoding failure: no kind "LibvirtMachineProviderConfig" is registered for version "libvirtproviderconfig/v1alpha1"

System Info

[root@dhcp-66-145-109 cluster-api-provider-libvirt]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)
[root@dhcp-66-145-109 cluster-api-provider-libvirt]# docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:23:03 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:25:29 2018
  OS/Arch:          linux/amd64
  Experimental:     false


libvirt-libs-3.9.0-14.el7_5.8.x86_64
libvirt-devel-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-3.9.0-14.el7_5.8.x86_64
libvirt-3.9.0-14.el7_5.8.x86_64

go version go1.11.1 linux/amd64

minikube version: v0.30.0

Since we don't have a valid OCP 4.0 build, I explored these features on minikube. I'm not sure if I misconfigured something; please let me know what more needs to be provided.
Thank you.

Fix unit test for controller

With #249 the actuator tests fail as shown below; this issue tracks that so we can fix it later. For now the priority is updating to the latest machine-api and the vendoring work.

$ make test
 go test -race -cover ./cmd/... ./pkg/cloud/...
?   	github.com/openshift/cluster-api-provider-libvirt/cmd/manager	[no test files]
E0131 15:06:10.408695   14100 actuator.go:107] Machine error: error getting machineProviderConfig from spec: no Value in ProviderConfig
E0131 15:06:10.411129   14100 actuator.go:107] Machine error: error creating libvirt client: error creating libvirt client
E0131 15:06:10.412790   14100 actuator.go:107] Machine error: error creating volume error
E0131 15:06:10.412874   14100 actuator.go:51] libvirt-actuator-testing-machine: error creating libvirt machine: error creating volume error
E0131 15:06:10.414281   14100 actuator.go:107] Machine error: error creating domain error
E0131 15:06:10.414341   14100 actuator.go:51] libvirt-actuator-testing-machine: error creating libvirt machine: error creating domain error
E0131 15:06:10.415850   14100 actuator.go:107] Machine error: error looking up libvirt machine error
E0131 15:06:10.415920   14100 actuator.go:51] libvirt-actuator-testing-machine: error creating libvirt machine: error looking up libvirt machine error
E0131 15:06:10.420175   14100 actuator.go:107] Machine error: error getting machineProviderConfig from spec: no Value in ProviderConfig
E0131 15:06:10.421666   14100 actuator.go:107] Machine error: error creating libvirt client: error creating libvirt client
E0131 15:06:10.423274   14100 actuator.go:107] Machine error: error checking for domain existence: error
E0131 15:06:10.426872   14100 actuator.go:107] Machine error: error deleting "libvirt-actuator-testing-machine" domain error
E0131 15:06:10.429808   14100 actuator.go:107] Machine error: error deleting "libvirt-actuator-testing-machine" volume error
--- FAIL: TestMachineEvents (0.03s)
    --- FAIL: TestMachineEvents/Create_machine_event_failed_(invalid_configuration) (0.00s)
        controller.go:137: missing call(s) to *mock.MockClient.Close() /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:193
        controller.go:137: missing call(s) to *mock.MockClient.GetDHCPLeasesByNetwork(is anything) /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:198
        controller.go:137: aborting test due to missing call(s)
    --- FAIL: TestMachineEvents/Create_machine_event_failed_(error_creating_libvirt_client) (0.00s)
        controller.go:137: missing call(s) to *mock.MockClient.GetDHCPLeasesByNetwork(is anything) /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:198
        controller.go:137: missing call(s) to *mock.MockClient.Close() /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:193
        controller.go:137: aborting test due to missing call(s)
    --- FAIL: TestMachineEvents/Delete_machine_event_failed_(invalid_configuration) (0.00s)
        controller.go:137: missing call(s) to *mock.MockClient.GetDHCPLeasesByNetwork(is anything) /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:198
        controller.go:137: missing call(s) to *mock.MockClient.Close() /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:193
        controller.go:137: aborting test due to missing call(s)
    --- FAIL: TestMachineEvents/Delete_machine_event_failed_(error_creating_libvirt_client) (0.00s)
        controller.go:137: missing call(s) to *mock.MockClient.Close() /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:193
        controller.go:137: missing call(s) to *mock.MockClient.GetDHCPLeasesByNetwork(is anything) /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:198
        controller.go:137: aborting test due to missing call(s)
    --- FAIL: TestMachineEvents/Delete_machine_event_failed_(error_getting_domain) (0.00s)
        controller.go:137: missing call(s) to *mock.MockClient.GetDHCPLeasesByNetwork(is anything) /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:198
        controller.go:137: aborting test due to missing call(s)
    --- FAIL: TestMachineEvents/Delete_machine_event_failed_(error_deleting_domain) (0.00s)
        controller.go:137: missing call(s) to *mock.MockClient.GetDHCPLeasesByNetwork(is anything) /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:198
        controller.go:137: aborting test due to missing call(s)
    --- FAIL: TestMachineEvents/Delete_machine_event_failed_(error_deleting_volume) (0.00s)
        controller.go:137: missing call(s) to *mock.MockClient.GetDHCPLeasesByNetwork(is anything) /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:198
        controller.go:137: aborting test due to missing call(s)
    --- FAIL: TestMachineEvents/Delete_machine_event_succeeds (0.00s)
        controller.go:137: missing call(s) to *mock.MockClient.GetDHCPLeasesByNetwork(is anything) /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:198
        controller.go:137: aborting test due to missing call(s) 

Console route is not accessible on libvirt

After standing up a libvirt cluster using

ostree 47.191
Installer b4f5ceb6bfde8d3dc0e29f708e0494488ea37ee0

I can get the console route:

console-openshift-console.apps.dev1.rlk.home

but I get "can't find the server at ..." from my browser. Same works for AWS.

RFE: Support for hugepages (2MB, 1GB, etc.) on libvirt

Hi,

On my hypervisors (RHEL), I have most of the memory dedicated to hugepages (the default 2MB hugepages).
This allows me to fit more VMs into the reserved memory and makes their TLBs smaller (needed when you have 64GB or more of memory).

Here's an example:

<domain type='kvm' id='3'>
  <name>sat6</name>
[...]
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
[...]

(The above is for the default 2MB hugepages; for 1GB hugepages the syntax is slightly different.)

I've done some research, and it appears that Terraform has support for hugepages, but I wouldn't know where to change it in the OpenShift installer.
Do you have any idea where that might be?
Any hint would be welcome.

Older versions of libvirt don't show the status for machine resources

I have two different Linux machines (f28 and f29). When I create a cluster on f28, which uses libvirt-4.1.0-5.fc28.x86_64, the machine resource doesn't show a status.

$ oc get machine test1-nm7bw-master-0 -n openshift-machine-api -oyaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  creationTimestamp: 2019-03-08T05:10:21Z
  finalizers:
  - machine.machine.openshift.io
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: test1-nm7bw
    machine.openshift.io/cluster-api-machine-role: master
    machine.openshift.io/cluster-api-machine-type: master
  name: test1-nm7bw-master-0
  namespace: openshift-machine-api
  resourceVersion: "5619"
  selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/test1-nm7bw-master-0
  uid: 7956cf54-4160-11e9-955e-664f163f5f0f
spec:
  metadata:
    creationTimestamp: null
    labels:
      node-role.kubernetes.io/worker: ""
  providerSpec:
    value:
      apiVersion: libvirtproviderconfig.k8s.io/v1alpha1
      autostart: false
      cloudInit: null
      domainMemory: 4096
      domainVcpu: 2
      ignKey: ""
      ignition:
        userDataSecret: master-user-data
      kind: LibvirtMachineProviderConfig
      networkInterfaceAddress: 192.168.126.0/24
      networkInterfaceHostname: ""
      networkInterfaceName: test1-nm7bw
      networkUUID: ""
      uri: qemu+tcp://libvirt.default/system
      volume:
        baseVolumeID: /var/lib/libvirt/images/test1-nm7bw-base
        poolName: default
        volumeName: ""
  versions:
    kubelet: ""

$ oc adm release info --commits | grep libvirt
  libvirt-machine-controllers                   https://github.com/openshift/cluster-api-provider-libvirt                  a06e49585f2cd716ae642c40701c67f17b747553

But when I use the f29 machine, which has libvirt-4.7.0-1.fc29.x86_64, it does show the status. I want to make sure this doesn't cause any problem when changing the machine resource before the start.

$ oc get machines test1-jn8nk-master-0 -o yaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  creationTimestamp: 2019-03-08T04:54:38Z
  finalizers:
  - machine.machine.openshift.io
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: test1-jn8nk
    machine.openshift.io/cluster-api-machine-role: master
    machine.openshift.io/cluster-api-machine-type: master
  name: test1-jn8nk-master-0
  namespace: openshift-machine-api
  resourceVersion: "13139"
  selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/test1-jn8nk-master-0
  uid: 46b67558-415e-11e9-934e-664f163f5f0f
spec:
  metadata:
    creationTimestamp: null
    labels:
      node-role.kubernetes.io/worker: ""
  providerSpec:
    value:
      apiVersion: libvirtproviderconfig.k8s.io/v1alpha1
      autostart: false
      cloudInit: null
      domainMemory: 4096
      domainVcpu: 2
      ignKey: ""
      ignition:
        userDataSecret: master-user-data
      kind: LibvirtMachineProviderConfig
      networkInterfaceAddress: 192.168.126.0/24
      networkInterfaceHostname: ""
      networkInterfaceName: test1-jn8nk
      networkUUID: ""
      uri: qemu+tcp://libvirt.default/system
      volume:
        baseVolumeID: /var/lib/libvirt/images/test1-jn8nk-base
        poolName: default
        volumeName: ""
  versions:
    kubelet: ""
status:
  addresses:
  - address: 192.168.126.11
    type: InternalIP
  lastUpdated: 2019-03-08T05:03:47Z
  nodeRef:
    kind: Node
    name: test1-jn8nk-master-0
    uid: 25f46151-415d-11e9-8504-52fdfc072182
  providerStatus:
    apiVersion: libvirtproviderconfig.openshift.io/v1beta1
    conditions: null
    instanceID: 7b44a02e-d881-4c67-ab26-68668b8ed5c6
    instanceState: Running
    kind: LibvirtMachineProviderStatus

$ oc adm release info --commits | grep libvirt
  libvirt-machine-controllers                   https://github.com/openshift/cluster-api-provider-libvirt                  a06e49585f2cd716ae642c40701c67f17b747553

Link in README leads to 404

The link here:

Before running the installer, make sure you set libvirt to use the host private IP URI above: https://github.com/openshift/installer/blob/master/examples/tectonic.libvirt.yaml#L14

leads to 404.

Support customization of worker's vcpu/memory

The master is already customizable through TF_... env variables, but the worker doesn't support any kind of customization during installation.

Request:
Make vcpu and memory customizable for workers during install.

client.CreateVolume doesn't check the returned error of waitForSuccess

Please take a look at line 428 in pkg/cloud/libvirt/client/client.go:

waitForSuccess("error refreshing pool for volume", func() error {

The returned error of waitForSuccess isn't checked, so I think it should be as below:

	err = waitForSuccess("error refreshing pool for volume", func() error {
		return client.pool.Refresh(0)
	})
	if err != nil {
		return fmt.Errorf("can't find storage pool '%s'", client.poolName)
	}

Race condition between libvirt and libvirt actuator to get and update VM interfaces

It is possible that the domain is already defined and running but still does not have any interfaces of kind libvirt.DOMAIN_INTERFACE_ADDRESSES_SRC_LEASE. As a result, after the create and update of the machine, the machine still lacks any interface information under the status section, which prevents linking the machine to the relevant node.

A possible solution is simply to return an error if the machine does not have any interfaces at the update stage; in that case, the machine controller will re-queue the machine.

Be more descriptive about libvirt configuration

Is it possible to add this provider to an existing CAPI deployment on vanilla K8s?

I have an existing cluster-api deployment using vanilla k8s (it includes a Docker provider as well as a vSphere provider), and I've been trying to figure out whether it's possible to add this provider to the existing framework. I've been picking through the code in this repo as well as the operator repo here: https://github.com/openshift/machine-api-operator

There seem to be some gaps in the documentation, though, when it comes to pulling everything together. I'd be happy to submit PRs if I could come up with a working formula, but so far I seem to be missing some pieces. Is this a use case the project team would be interested in helping with and supporting?

cluster-api-provider-libvirt seems built on very wrong premises :(

README.md says to modify libvirtd.conf:

listen_tls = 0
listen_tcp = 1
auth_tcp="none"
tcp_port = "16509"

and the libvirtd systemd service:

/usr/sbin/libvirtd -l

This is roughly equivalent to asking people to configure remote password-less root access to the host!

If you can connect to qemu:///system, you can for example create a storage pool with:

<pool type='dir'>
  <name>hack</name>
  <source>
  </source>
  <target>
    <path>/etc</path>
    <permissions>
      <mode>0755</mode>
      <owner>0</owner>
      <group>0</group>
      <label>system_u:object_r:etc_t:s0</label>
    </permissions>
  </target>
</pool>

and then read/write any file in /etc (for example /etc/shadow).
We could restrict this libvirtd access to connections from the cluster, but I don't think we want anything running in the cluster to be able to escape into the host.
We could use qemu+ssh:// with SSH keys, or qemu+tls:// with client certificates, and protect these secrets from most of the cluster, but I'm not familiar at all with OpenShift security, so I don't know whether that is acceptable.
