
cluster-api-provider-libvirt's Introduction

OpenShift cluster-api-provider-libvirt

This repository hosts an implementation of a provider for libvirt for the OpenShift machine-api.

This provider runs as a machine-controller deployed by the machine-api-operator.

To allow the actuator to connect to the libvirt daemon running on the host machine:

Libvirt needs to be configured to accept TCP connections as described in the installer documentation.

You can verify that you can connect through your host's private IP with:

virsh -c qemu+tcp://host_private_ip/system

Video demo

https://youtu.be/urvXXfdfzVc

cluster-api-provider-libvirt's People

Contributors

abhinavdahiya, andymcc, bison, cfergeau, damdo, dbenoit17, derekwaynecarr, dobbymoodge, enxebre, fedosin, frobware, ingvagabund, juneezee, locriandev, michaelgugino, openshift-bot, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, paulfantom, prashanth684, praveenkumar, racheljpg, rphillips, smarterclayton, spangenberg, vikaschoudhary16, wking, xkwangcn, zeenix


cluster-api-provider-libvirt's Issues

Redeploy of the machine-controller will reset cidrOffset

We increase cidrOffset each time the controller creates a new machine and adds the record to the libvirt network, but a controller redeploy resets the cidrOffset counter, so the controller will create DNS records for new machines with IPs that were already given to other machines.

I am not sure how we can fix it, but I think we will need to check the correlation between running machines and DHCP leases; that would also help avoid MAC duplication.

Creation of volume for Ignition fails if it already exists

I wanted to test my custom built cluster-api-provider-libvirt image for developing/debugging it. After successfully deploying it, I tried to test by:

oc scale --replicas 0 machinesets/test1-xlz4j-worker-0
sleep 4
oc scale --replicas 1 machinesets/test1-xlz4j-worker-0

But the worker machine failed to come up. Looking at oc logs deployments/clusterapi-manager-controllers --container machine-controller, I found the source of the problem:

I0307 16:44:51.587462       1 ignition.go:152] Creating Ignition temporary file
I0307 16:44:51.588472       1 client.go:426] Check if "test1-xlz4j-worker-0-r5d85" volume exists
I0307 16:44:51.589005       1 client.go:450] Deleting volume test1-xlz4j-worker-0-r5d85
E0307 16:44:51.591374       1 actuator.go:106] Machine error: error creating domain Error creating libvirt volume for Ignition test1-xlz4j-worker-0-r5d85.ignition: virError(Code=90, Domain=18, Message='storage volume 'test1-xlz4j-worker-0-r5d85.ignition' exists already')
E0307 16:44:51.591454       1 actuator.go:50] test1-xlz4j-worker-0-r5d85: error creating libvirt machine: error creating domain Error creating libvirt volume for Ignition test1-xlz4j-worker-0-r5d85.ignition: virError(Code=90, Domain=18, Message='storage volume 'test1-xlz4j-worker-0-r5d85.ignition' exists already')

I'd be happy to try to provide a PR to fix this, but to be able to do that I first need to find out what exactly the issue is here. Is it that the Ignition volume should be deleted after its use, or that the operator should just re-use the volume if it already exists? I'll hopefully find the answer by digging through the code, but if someone already knows, that would be very helpful.

Scaling machinesets has no effect

oc scale --replicas=3 machineset test1-nf857-worker-0 -n openshift-machine-api

[core@test1-nf857-master-0 ~]$ oc get machinesets -n openshift-machine-api
NAME DESIRED CURRENT READY AVAILABLE AGE
test1-nf857-worker-0 3 3 1 1 24h

logs from:
oc logs machine-api-controllers-c59d8448c-644ns -n openshift-machine-api -c controller-manager

I0725 05:48:35.725479 1 controller.go:331] Creating machine 1 of 1, ( spec.replicas(2) > currentMachineCount(1) )
I0725 05:48:35.890195 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0725 05:48:35.956498 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0725 05:48:35.956819 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0725 05:48:55.052917 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0725 05:48:55.053447 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0725 05:48:56.723417 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0725 05:49:30.928201 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0725 05:49:31.044226 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0725 05:49:32.162709 1 controller.go:298] MachineSet "test1-nf857-worker-0" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster

e2e-libvirt CI job broken

The e2e-libvirt CI job is currently not working for this provider. The reason is that the job configuration assumes the Dockerfiles for the images it needs are present in this repository.

So we either need to instruct the job to look for the Dockerfiles in the installer repository, or duplicate the Dockerfiles here. The former would be the better approach, but we need to figure out how to do it; the latter is very easy (famous last words) but not ideal at all, since we would then have to keep the copies in sync manually.

Update Dockerfile and probably related yaml file to support mainframe/s390x platform

In the following PR, guestfish is used to inject ignition for mainframe/s390x platform.
#174

So we need to update the Dockerfile so as to get the guestfish tool included in the container image of cluster-api-provider-libvirt.

Since guestfish operates directly on the OS image file located on the host machine, but cluster-api-provider-libvirt runs inside a pod/container, the YAML file used to deploy the deployment/pod needs to be updated accordingly so that the volume containing the OS image file is mounted into the pod/container. If that approach is difficult or unacceptable, we will need to figure out some other solution.

Pre-bake testing image

Let's pre-install packages and pre-pull docker images so that master/worker node bootstrapping finishes ASAP:

  • install kubelet, kubectl, kubeadm, docker
  • kubeadm config images pull
  • etc.

Many errors are not visible in container logs

Listing the two that I found:

  1. /usr/bin/kvm-spice: No such file or directory:
    This error occurs when the qemu binary on the host is not named kvm-spice. When this happens, the instance volume gets created but the instance domain does not. This error is visible only when the binary is run in non-containerized, stand-alone mode.

  2. Error: error creating machine error creating domain: Can't retrieve network name tectonic
    This happens when the networkInterfaceName in the machine YAML does not exist in
    virsh net-list

libvirt-actuator is broken

Hi, I was trying to follow https://github.com/openshift/cluster-api-provider-libvirt/tree/master/cmd/libvirt-actuator to create a libvirt instance. I failed at the step ./bin/libvirt-actuator create -m examples/machine.yaml -c examples/cluster.yaml

#  ./bin/libvirt-actuator create -m examples/machine.yaml -c examples/cluster.yaml
ERROR: logging before flag.Parse: I1029 05:21:21.536376   29201 actuator.go:86] Creating machine "worker-example" for cluster "tb-asg-35".
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xee6683]

goroutine 1 [running]:
github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine.ProviderConfigMachine(0x0, 0x0, 0xc00031c688, 0xc000000000, 0x12451f5, 0x23)
	/root/go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator.go:236 +0x53
github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine.clientForMachine(0x0, 0x0, 0xc00031c580, 0x2, 0x2, 0xc000307d80)
	/root/go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator.go:352 +0x4b
github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine.(*Actuator).Create(0xc000307d80, 0xc0002df040, 0xc00031c580, 0x0, 0x0)
	/root/go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator.go:89 +0x135
main.init.0.func1(0xc0002c5400, 0xc000307a80, 0x0, 0x4, 0x0, 0x0)
	/root/go/src/github.com/openshift/cluster-api-provider-libvirt/cmd/libvirt-actuator/main.go:61 +0x224
github.com/openshift/cluster-api-provider-libvirt/vendor/github.com/spf13/cobra.(*Command).execute(0xc0002c5400, 0xc000307a40, 0x4, 0x4, 0xc0002c5400, 0xc000307a40)
	/root/go/src/github.com/openshift/cluster-api-provider-libvirt/vendor/github.com/spf13/cobra/command.go:762 +0x473
github.com/openshift/cluster-api-provider-libvirt/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x1fece60, 0xc0002e06e0, 0x136cb60, 0xc00000e018)
	/root/go/src/github.com/openshift/cluster-api-provider-libvirt/vendor/github.com/spf13/cobra/command.go:852 +0x2fd
github.com/openshift/cluster-api-provider-libvirt/vendor/github.com/spf13/cobra.(*Command).Execute(0x1fece60, 0xc00000e018, 0xc0000cff88)
	/root/go/src/github.com/openshift/cluster-api-provider-libvirt/vendor/github.com/spf13/cobra/command.go:800 +0x2b
main.main()
	/root/go/src/github.com/openshift/cluster-api-provider-libvirt/cmd/libvirt-actuator/main.go:143 +0x5c

System Info

[root@dhcp-66-145-109 cluster-api-provider-libvirt]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)
[root@dhcp-66-145-109 cluster-api-provider-libvirt]# docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:23:03 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:25:29 2018
  OS/Arch:          linux/amd64
  Experimental:     false


libvirt-libs-3.9.0-14.el7_5.8.x86_64
libvirt-devel-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-3.9.0-14.el7_5.8.x86_64
libvirt-3.9.0-14.el7_5.8.x86_64

go version go1.11.1 linux/amd64

bin/libvirt-actuator was built from the crd branch. Please let me know what more I should provide here.
Thank you.

libvirt pool path is wrong when it is not default

Even though I use a custom path for the storagePool (/home/jooho/KVM), the actuator tries to use the wrong path.

E0112 03:54:03.202553       1 actuator.go:107] Machine error: error creating volume Can't retrieve volume /var/lib/libvirt/images/tbd-base
E0112 03:54:03.203245       1 actuator.go:51] tbd/tbd-worker-0-nhzvf: error creating libvirt machine: error creating volume Can't retrieve volume /var/lib/libvirt/images/tbd-base
E0112 03:54:04.259706       1 actuator.go:107] Machine error: error creating volume Can't retrieve volume /var/lib/libvirt/images/tbd-base
E0112 03:54:04.259734       1 actuator.go:51] tbd/tbd-worker-0-d554w: error creating libvirt machine: error creating volume Can't retrieve volume /var/lib/libvirt/images/tbd-base

The worker storage files are generated under /home/jooho/KVM, but the actuator fails to retrieve them because it tries to load them under /var/lib/libvirt/images.

Expected result

It should search the path where the image files are actually generated.

Make generate is broken

go install  -ldflags '-extldflags "-static"' sigs.k8s.io/cluster-api-provider-aws/vendor/github.com/golang/mock/mockgen
can't load package: package sigs.k8s.io/cluster-api-provider-aws/vendor/github.com/golang/mock/mockgen: cannot find package "sigs.k8s.io/cluster-api-provider-aws/vendor/github.com/golang/mock/mockgen" in any of:
        /usr/lib/golang/src/sigs.k8s.io/cluster-api-provider-aws/vendor/github.com/golang/mock/mockgen (from $GOROOT)
        /home/alukiano/go/src/sigs.k8s.io/cluster-api-provider-aws/vendor/github.com/golang/mock/mockgen (from $GOPATH)
make: *** [Makefile:46: gencode] Error 1

The mockgen folder does not exist under master:

ll ~/go/src/sigs.k8s.io/cluster-api-provider-aws/vendor/github.com/golang/mock
total 24
Mar 17 14:01 AUTHORS
Mar 17 14:01 CONTRIBUTORS
Mar 17 14:01 gomock
Mar 17 14:01 LICENSE

Also, it is not good practice to reference tools from an external repository.

/usr/bin/kvm-spice: No such file or directory

domainDef.Devices.Emulator should not be hardcoded to "/usr/bin/kvm-spice" in cloud/libvirt/actuators/machine/utils/domain.go.
For example, on my dev machine (Fedora 28) it should have been "/usr/bin/qemu-system-x86_64".

For now, the workaround on Fedora 28 is:
ln -s /usr/bin/qemu-system-x86_64 /usr/bin/kvm-spice

Need to update go mod dependencies for k8s and machine-api packages

As of today, cluster-api-provider-libvirt references very old versions of the k8s and machine-api packages, leading to bugs like https://bugzilla.redhat.com/show_bug.cgi?id=1831780. Moreover, the latest machine-api now does health checking [1], which means a change like [2] is needed here too, but that cannot be done unless the k8s packages are updated, which in turn means updating the machine-api packages as well. Without this update, the machine controller keeps restarting: it is shot down by the machine-api-operator because it cannot report readiness and liveness.

[1] openshift/machine-api-operator#602
[2] openshift/cluster-api-provider-azure#139

Error 'no kind "LibvirtMachineProviderConfig" is registered for version "libvirtproviderconfig/v1alpha1"'

I built and started bin/manager from the crd branch, then tried to create example/cluster.yaml and example/machine.yaml. The manager console keeps reporting this error:

E1029 05:34:12.452090   18408 controller.go:159] Error checking existence of machine instance for machine object worker-example; tb-asg-35/worker-example: error creating libvirt client: error getting machineProviderConfig from spec: decoding failure: no kind "LibvirtMachineProviderConfig" is registered for version "libvirtproviderconfig/v1alpha1"
E1029 05:34:13.452512   18408 actuator.go:47] tb-asg-35/worker-example: error creating libvirt client: error getting machineProviderConfig from spec: decoding failure: no kind "LibvirtMachineProviderConfig" is registered for version "libvirtproviderconfig/v1alpha1"

At this point the machine cannot be deleted either.

E1029 05:41:35.454821   18408 actuator.go:47] tb-asg-35/worker-example: error creating libvirt client: error getting machineProviderConfig from spec: decoding failure: no kind "LibvirtMachineProviderConfig" is registered for version "libvirtproviderconfig/v1alpha1"
E1029 05:41:35.454855   18408 controller.go:138] Error deleting machine object worker-example; tb-asg-35/worker-example: error creating libvirt client: error getting machineProviderConfig from spec: decoding failure: no kind "LibvirtMachineProviderConfig" is registered for version "libvirtproviderconfig/v1alpha1"

System Info

[root@dhcp-66-145-109 cluster-api-provider-libvirt]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)
[root@dhcp-66-145-109 cluster-api-provider-libvirt]# docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:23:03 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:25:29 2018
  OS/Arch:          linux/amd64
  Experimental:     false


libvirt-libs-3.9.0-14.el7_5.8.x86_64
libvirt-devel-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-3.9.0-14.el7_5.8.x86_64
libvirt-3.9.0-14.el7_5.8.x86_64

go version go1.11.1 linux/amd64

minikube version: v0.30.0

Since we don't have a valid OCP 4.0 build, I explored these features on minikube. I'm not sure if I misconfigured something; please let me know what more needs to be provided.
Thank you.

Fix unit test for controller

With #249 the actuator tests fail as shown below; this issue tracks that so we can fix it later. For now the priority is updating to the latest machine-api and the vendoring work.

$ make test
 go test -race -cover ./cmd/... ./pkg/cloud/...
?   	github.com/openshift/cluster-api-provider-libvirt/cmd/manager	[no test files]
E0131 15:06:10.408695   14100 actuator.go:107] Machine error: error getting machineProviderConfig from spec: no Value in ProviderConfig
E0131 15:06:10.411129   14100 actuator.go:107] Machine error: error creating libvirt client: error creating libvirt client
E0131 15:06:10.412790   14100 actuator.go:107] Machine error: error creating volume error
E0131 15:06:10.412874   14100 actuator.go:51] libvirt-actuator-testing-machine: error creating libvirt machine: error creating volume error
E0131 15:06:10.414281   14100 actuator.go:107] Machine error: error creating domain error
E0131 15:06:10.414341   14100 actuator.go:51] libvirt-actuator-testing-machine: error creating libvirt machine: error creating domain error
E0131 15:06:10.415850   14100 actuator.go:107] Machine error: error looking up libvirt machine error
E0131 15:06:10.415920   14100 actuator.go:51] libvirt-actuator-testing-machine: error creating libvirt machine: error looking up libvirt machine error
E0131 15:06:10.420175   14100 actuator.go:107] Machine error: error getting machineProviderConfig from spec: no Value in ProviderConfig
E0131 15:06:10.421666   14100 actuator.go:107] Machine error: error creating libvirt client: error creating libvirt client
E0131 15:06:10.423274   14100 actuator.go:107] Machine error: error checking for domain existence: error
E0131 15:06:10.426872   14100 actuator.go:107] Machine error: error deleting "libvirt-actuator-testing-machine" domain error
E0131 15:06:10.429808   14100 actuator.go:107] Machine error: error deleting "libvirt-actuator-testing-machine" volume error
--- FAIL: TestMachineEvents (0.03s)
    --- FAIL: TestMachineEvents/Create_machine_event_failed_(invalid_configuration) (0.00s)
        controller.go:137: missing call(s) to *mock.MockClient.Close() /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:193
        controller.go:137: missing call(s) to *mock.MockClient.GetDHCPLeasesByNetwork(is anything) /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:198
        controller.go:137: aborting test due to missing call(s)
    --- FAIL: TestMachineEvents/Create_machine_event_failed_(error_creating_libvirt_client) (0.00s)
        controller.go:137: missing call(s) to *mock.MockClient.GetDHCPLeasesByNetwork(is anything) /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:198
        controller.go:137: missing call(s) to *mock.MockClient.Close() /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:193
        controller.go:137: aborting test due to missing call(s)
    --- FAIL: TestMachineEvents/Delete_machine_event_failed_(invalid_configuration) (0.00s)
        controller.go:137: missing call(s) to *mock.MockClient.GetDHCPLeasesByNetwork(is anything) /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:198
        controller.go:137: missing call(s) to *mock.MockClient.Close() /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:193
        controller.go:137: aborting test due to missing call(s)
    --- FAIL: TestMachineEvents/Delete_machine_event_failed_(error_creating_libvirt_client) (0.00s)
        controller.go:137: missing call(s) to *mock.MockClient.Close() /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:193
        controller.go:137: missing call(s) to *mock.MockClient.GetDHCPLeasesByNetwork(is anything) /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:198
        controller.go:137: aborting test due to missing call(s)
    --- FAIL: TestMachineEvents/Delete_machine_event_failed_(error_getting_domain) (0.00s)
        controller.go:137: missing call(s) to *mock.MockClient.GetDHCPLeasesByNetwork(is anything) /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:198
        controller.go:137: aborting test due to missing call(s)
    --- FAIL: TestMachineEvents/Delete_machine_event_failed_(error_deleting_domain) (0.00s)
        controller.go:137: missing call(s) to *mock.MockClient.GetDHCPLeasesByNetwork(is anything) /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:198
        controller.go:137: aborting test due to missing call(s)
    --- FAIL: TestMachineEvents/Delete_machine_event_failed_(error_deleting_volume) (0.00s)
        controller.go:137: missing call(s) to *mock.MockClient.GetDHCPLeasesByNetwork(is anything) /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:198
        controller.go:137: aborting test due to missing call(s)
    --- FAIL: TestMachineEvents/Delete_machine_event_succeeds (0.00s)
        controller.go:137: missing call(s) to *mock.MockClient.GetDHCPLeasesByNetwork(is anything) /go/src/github.com/openshift/cluster-api-provider-libvirt/pkg/cloud/libvirt/actuators/machine/actuator_test.go:198
        controller.go:137: aborting test due to missing call(s) 

Console route is not accessible on libvirt

After standing up a libvirt cluster using

ostree 47.191
Installer b4f5ceb6bfde8d3dc0e29f708e0494488ea37ee0

I can get the console route:

console-openshift-console.apps.dev1.rlk.home

but I get "can't find the server at ..." from my browser. Same works for AWS.

RFE: Support for hugepages (2MB, 1GB, etc.) on libvirt

Hi,

On my hypervisors (RHEL), I have most of the memory dedicated to hugepages (the default 2MB hugepages).
This allows me to fit more VMs into the reserved memory and makes their TLBs smaller (needed when you have 64GB or more of memory).

Here's an example:

<domain type='kvm' id='3'>
  <name>sat6</name>
[...]
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
[...]

(The above is for the default 2MB hugepages; for 1GB hugepages the syntax is slightly different.)

I've done some research, and it appears that Terraform has support for hugepages, but I wouldn't know where to change it in the OpenShift installer.
Do you have any idea where that might be?
Any hint would be welcome.

Older versions of libvirt don't show the status for machine resources

I have two different Linux machines (f28 and f29). When I create a cluster on f28, which uses libvirt-4.1.0-5.fc28.x86_64, the machine resource doesn't show a status.

$ oc get machine test1-nm7bw-master-0 -n openshift-machine-api -oyaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  creationTimestamp: 2019-03-08T05:10:21Z
  finalizers:
  - machine.machine.openshift.io
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: test1-nm7bw
    machine.openshift.io/cluster-api-machine-role: master
    machine.openshift.io/cluster-api-machine-type: master
  name: test1-nm7bw-master-0
  namespace: openshift-machine-api
  resourceVersion: "5619"
  selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/test1-nm7bw-master-0
  uid: 7956cf54-4160-11e9-955e-664f163f5f0f
spec:
  metadata:
    creationTimestamp: null
    labels:
      node-role.kubernetes.io/worker: ""
  providerSpec:
    value:
      apiVersion: libvirtproviderconfig.k8s.io/v1alpha1
      autostart: false
      cloudInit: null
      domainMemory: 4096
      domainVcpu: 2
      ignKey: ""
      ignition:
        userDataSecret: master-user-data
      kind: LibvirtMachineProviderConfig
      networkInterfaceAddress: 192.168.126.0/24
      networkInterfaceHostname: ""
      networkInterfaceName: test1-nm7bw
      networkUUID: ""
      uri: qemu+tcp://libvirt.default/system
      volume:
        baseVolumeID: /var/lib/libvirt/images/test1-nm7bw-base
        poolName: default
        volumeName: ""
  versions:
    kubelet: ""

$ oc adm release info --commits | grep libvirt
  libvirt-machine-controllers                   https://github.com/openshift/cluster-api-provider-libvirt                  a06e49585f2cd716ae642c40701c67f17b747553

But when I use the f29 machine, which has libvirt-4.7.0-1.fc29.x86_64, it does show the status. I want to make sure this doesn't cause any problem when changing the machine resource before the start.

$ oc get machines test1-jn8nk-master-0 -o yaml
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  creationTimestamp: 2019-03-08T04:54:38Z
  finalizers:
  - machine.machine.openshift.io
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: test1-jn8nk
    machine.openshift.io/cluster-api-machine-role: master
    machine.openshift.io/cluster-api-machine-type: master
  name: test1-jn8nk-master-0
  namespace: openshift-machine-api
  resourceVersion: "13139"
  selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/test1-jn8nk-master-0
  uid: 46b67558-415e-11e9-934e-664f163f5f0f
spec:
  metadata:
    creationTimestamp: null
    labels:
      node-role.kubernetes.io/worker: ""
  providerSpec:
    value:
      apiVersion: libvirtproviderconfig.k8s.io/v1alpha1
      autostart: false
      cloudInit: null
      domainMemory: 4096
      domainVcpu: 2
      ignKey: ""
      ignition:
        userDataSecret: master-user-data
      kind: LibvirtMachineProviderConfig
      networkInterfaceAddress: 192.168.126.0/24
      networkInterfaceHostname: ""
      networkInterfaceName: test1-jn8nk
      networkUUID: ""
      uri: qemu+tcp://libvirt.default/system
      volume:
        baseVolumeID: /var/lib/libvirt/images/test1-jn8nk-base
        poolName: default
        volumeName: ""
  versions:
    kubelet: ""
status:
  addresses:
  - address: 192.168.126.11
    type: InternalIP
  lastUpdated: 2019-03-08T05:03:47Z
  nodeRef:
    kind: Node
    name: test1-jn8nk-master-0
    uid: 25f46151-415d-11e9-8504-52fdfc072182
  providerStatus:
    apiVersion: libvirtproviderconfig.openshift.io/v1beta1
    conditions: null
    instanceID: 7b44a02e-d881-4c67-ab26-68668b8ed5c6
    instanceState: Running
    kind: LibvirtMachineProviderStatus

$ oc adm release info --commits | grep libvirt
  libvirt-machine-controllers                   https://github.com/openshift/cluster-api-provider-libvirt                  a06e49585f2cd716ae642c40701c67f17b747553

Link in README leads to 404

The link here:

Before running the installer, make sure you set libvirt to use the host private IP URI above: https://github.com/openshift/installer/blob/master/examples/tectonic.libvirt.yaml#L14

leads to 404.

Support customization of worker's vcpu/memory

The master is already customizable through TF_... env variables, but the worker doesn't support any kind of customization during installation.

Request:
Make vcpu and memory customizable for workers during install.

client.CreateVolume doesn't check the returned error of waitForSuccess

Please take a look at line 428 in pkg/cloud/libvirt/client/client.go:

waitForSuccess("error refreshing pool for volume", func() error {

The returned error of waitForSuccess isn't checked, so I think it should be as below:

	err = waitForSuccess("error refreshing pool for volume", func() error {
		return client.pool.Refresh(0)
	})
	if err != nil {
		return fmt.Errorf("can't find storage pool '%s'", client.poolName)
	}

Race condition between libvirt and libvirt actuator to get and update VM interfaces

It is possible that the domain is already defined and running but still does not have any interfaces of kind libvirt.DOMAIN_INTERFACE_ADDRESSES_SRC_LEASE. As a result, after the create and update of the machine, the machine still lacks any interface information under the status section, which prevents linking the machine to the relevant node.

A possible solution is simply to return an error if the machine does not have any interfaces at the update stage; in that case, the machine controller will re-queue the machine.

Be more descriptive about libvirt configuration

Is it possible to add this provider to an existing CAPI deployment on vanilla K8s?

I have an existing cluster-api deployment using vanilla k8s (it includes a Docker provider as well as a vSphere provider), and I've been trying to figure out whether it's possible to add this provider to the existing framework. I've been picking through the code in this repo as well as the operator repo here: https://github.com/openshift/machine-api-operator

There seem to be some gaps in the documentation, though, when it comes to pulling everything together. I'd be happy to submit PRs if I could come up with a working formula, but so far I seem to be missing some pieces. Is this a use case the project team would be interested in helping with and supporting?

cluster-api-provider-libvirt seems built on very wrong premises :(

README.md says to modify libvirtd.conf:

listen_tls = 0
listen_tcp = 1
auth_tcp="none"
tcp_port = "16509"

and the libvirtd systemd service:

/usr/sbin/libvirtd -l

This is roughly equivalent to asking people to configure remote password-less root access to the host!

If you can connect to qemu:///system, you can for example create a storage pool with:

<pool type='dir'>
  <name>hack</name>
  <source>
  </source>
  <target>
    <path>/etc</path>
    <permissions>
      <mode>0755</mode>
      <owner>0</owner>
      <group>0</group>
      <label>system_u:object_r:etc_t:s0</label>
    </permissions>
  </target>
</pool>

and then read/write any file in /etc (for example /etc/shadow).
We could restrict this libvirtd access to connections from the cluster, but I don't think we want anything running in the cluster to be able to escape into the host.
We could use qemu+ssh:// with SSH keys, or qemu+tls:// with client certificates, and protect these secrets from most of the cluster, but I'm not familiar at all with OpenShift security, so I don't know whether that is acceptable.
