answear / kube-vcloud-flexvolume

VMware Cloud Director flexVolume driver for Kubernetes

Python 81.04% Shell 2.11% Dockerfile 0.56% Makefile 2.15% Go 14.15%
kubernetes flexvolume-driver vcloud vcloud-director persistent-storage

kube-vcloud-flexvolume

VMware vCloud Director flexVolume driver for Kubernetes.

Status

We have successfully run this driver on a production Kubernetes cluster for over half a year without any data loss. The current stable version is 2.5.4.

Version 2.4.0 introduces an external vcloud-provisioner for easier provisioning of Persistent Volumes. The provisioner is deployed inside the Kubernetes cluster as a Pod controlled by a Deployment.

WARNING: Versions prior to 2.2.1rc1 have a problem with unstable disk paths which, under some circumstances, could cause data loss. After upgrading from an affected version, make sure the udev rules have been properly converted to the new format using this script.

Caveats

  • Due to how vCloud works, if you want to attach/detach disks to/from the same VM simultaneously, you should implement a global lock inside the attach/detach commands. Check the feature/etcd branch for an experimental implementation using the etcd key-value store. (Merged via pull request #5.)

  • When a Kubernetes node is marked unschedulable (with kubectl drain), the operationExecutor on the new node calls AttachVolume before DetachVolume is called on the old one. We periodically poll the volume to find out whether it is still attached, but vCloud deletes the relation before the asynchronous detach:disk task has finished. As a result, the driver sometimes throws the exception "Could not attach volume '%s' to node '%s'" and the kubelet process retries the attempt.

  • Using a busType:busSubType combination other than SCSI:VirtualSCSI can lead to unexpected behavior. For example, you can attach more than one disk of the default type (SCSI:lsilogic), but only the first one will be detected by the Linux kernel.

  • When something goes wrong during disk attachment and the driver throws an exception, the udev rules required for restoring symlinks after a reboot might not be generated. This can result in behaviour similar to the one described in this issue. The code tries to minimize the chances of this happening. If the problem occurs, please file a bug report.
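The first caveat asks for a global lock around attach/detach, and the README points to an etcd-based implementation for that. As a much weaker, host-local alternative, the sketch below serializes callers with an exclusive fcntl.flock on a lock file. It only helps if every attach/detach for a given VM runs on that VM's own node (plausible with --enable-controller-attach-detach=false, but an assumption here). `node_lock` and the lock path are hypothetical, not part of the driver.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def node_lock(path="/run/lock/vcloud-flexvolume.lock"):
    """Hold an exclusive advisory lock on `path` for the duration of the block.

    Hypothetical helper: the default path is an assumption. flock only
    serializes processes on the same host; it is not a cluster-wide lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

An attach or detach command would then wrap its vCloud call and task wait in `with node_lock(): ...`, so two concurrent invocations on the same node run one after the other.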

Description

vcloud-flexvolume provides a storage driver based on vCloud's Independent Disk feature. An Independent Disk provides persistent disk storage that can be attached to instances running in a vCloud Director environment.

You can read more about Independent Disks here.

Installation

  • Make sure kubelet is running with --enable-controller-attach-detach=false
  • Create the directory /usr/libexec/kubernetes/kubelet-plugins/volume/exec/answear.com~vcloud
  • Install wrapper scripts/vcloud as /usr/libexec/kubernetes/kubelet-plugins/volume/exec/answear.com~vcloud/vcloud
  • Create the directory /opt/vcloud-flexvolume/etc
  • Install the configuration file config/config.yaml.example as /opt/vcloud-flexvolume/etc/config.yaml and set its parameters.

Install packages:

  • python3
  • python3-pip
  • python3-setuptools
  • python3-wheel
  • python3-flufl.enum
  • python3-lxml
  • python3-yaml
  • python3-pygments
  • python3-pyudev

Install the driver itself:

git checkout 2.5.4
python3 setup.py build
sudo python3 setup.py install

or

pip3 install --process-dependency-links git+https://github.com/answear/[email protected]
  • Restart the kubelet process.

Create a Kubernetes Pod, for example:

cat examples/nginx.yaml | kubectl apply -f -

The driver will create an independent disk named "testdisk" with a size of 1Gi under the storage profile "T1". The volume will also be mounted at /data inside the container.

Upgrading

  • Install the newest driver version using git or pip.
  • Apply any changes in example config file to your local copy.

Options

The following options are required:

  • volumeName - Name of the independent disk volume.
  • size - Size to allocate for the new independent disk volume. Accepts any value in human-readable format (e.g. 100Mi, 1Gi).

The following options are optional:

  • busType - Disk bus type expressed as a string. One of: 5 - IDE, 6 - SCSI (default), 20 - SATA.
  • busSubType - Disk bus subtype expressed as a string. One of: "" (busType=5), buslogic (busType=6), lsilogic (busType=6), lsilogicsas (busType=6), VirtualSCSI (busType=6), vmware.sata.ahci (busType=20).
  • storage - Name of the storage pool.
  • mountOptions - Additional comma-separated options passed to mount. (e.g. noatime, relatime, nobarrier)
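To make the option rules concrete, here is a validation sketch. It checks the required keys, the busType/busSubType pairs from the list above, and accepts both `mountOptions` and the older lowercase `mountoptions` spelling used before 2.4.0. `parse_size`, `validate_options` and the returned `sizeBytes` key are hypothetical, and the Kubernetes-style suffix set (k/M/G/T, Ki/Mi/Gi/Ti) is an assumption; the driver's own parser may accept a different set.

```python
import re

# Bus subtypes allowed for each busType, as listed in the Options section.
BUS_SUBTYPES = {
    "5": {""},                                                     # IDE
    "6": {"buslogic", "lsilogic", "lsilogicsas", "VirtualSCSI"},   # SCSI
    "20": {"vmware.sata.ahci"},                                    # SATA
}

# Assumed suffixes: decimal (k, M, G, T) and binary (Ki, Mi, Gi, Ti).
_UNITS = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}


def parse_size(value: str) -> int:
    """Convert a human-readable size such as '100Mi' or '1Gi' to bytes."""
    m = re.fullmatch(r"\s*(\d+)\s*(Ki|Mi|Gi|Ti|k|M|G|T)?\s*", value)
    if not m:
        raise ValueError(f"invalid size: {value!r}")
    number, unit = m.groups()
    return int(number) * _UNITS[unit or ""]


def validate_options(opts: dict) -> dict:
    """Check required keys and the bus pairing; return normalized options."""
    for key in ("volumeName", "size"):
        if key not in opts:
            raise ValueError(f"missing required option: {key}")
    # busType may arrive as an int (6) or a string ("6"); SCSI is the default.
    bus_type = str(opts.get("busType", "6"))
    if bus_type not in BUS_SUBTYPES:
        raise ValueError(f"unsupported busType: {bus_type}")
    bus_sub_type = opts.get("busSubType")
    if bus_sub_type is not None and bus_sub_type not in BUS_SUBTYPES[bus_type]:
        raise ValueError(f"busSubType {bus_sub_type!r} is not valid for busType {bus_type}")
    # Accept both spellings of the mount options key.
    mount_options = opts.get("mountOptions", opts.get("mountoptions", ""))
    return {
        "volumeName": opts["volumeName"],
        "sizeBytes": parse_size(opts["size"]),
        "busType": bus_type,
        "busSubType": bus_sub_type,
        "storage": opts.get("storage"),
        "mountOptions": [o.strip() for o in mount_options.split(",") if o.strip()],
    }
```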

Driver invocation

NOTE: Versions prior to 2.4.0 use "mountoptions" (lowercase). For backwards compatibility, and to support StorageClass.MountOptions in the provisioner, both spellings are accepted.

  • Init:
>>> vcloud-flexvolume init
<<< {"status": "Success", "capabilities": {"attach": true}}
  • Volume is attached:
>>> vcloud-flexvolume isattached '{"kubernetes.io/fsType":"ext4","kubernetes.io/pvOrVolumeName":"testdisk","kubernetes.io/readwrite":"rw","mountOptions":"relatime,nobarrier","size":"1Gi","storage":"T1","busType":6,"busSubType":"VirtualSCSI","volumeName":"testdisk"}' nodename
<<< {"status": "Success", "attached": false}
  • Attach:
>>> vcloud-flexvolume attach '{"kubernetes.io/fsType":"ext4","kubernetes.io/pvOrVolumeName":"testdisk","kubernetes.io/readwrite":"rw","mountOptions":"relatime,nobarrier","size":"1Gi","storage":"T1","busType":6,"busSubType":"VirtualSCSI","volumeName":"testdisk"}' nodename
<<< {"status": "Success", "device": "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0-part1"}

The driver detects (using udev events) the name under which the device was registered by the Linux kernel and automatically creates the symlink /dev/block/<URN> pointing to ../<device>. The URN is a unique volume ID generated by vCloud Director.
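The symlink layout described in the paragraph above can be sketched as follows. The link is created under a temporary name and then renamed over the final path, so a stale link is replaced atomically. `make_urn_symlink` is a hypothetical helper illustrating the layout, not the driver's own code, and the temporary-name-plus-rename step is this sketch's design choice.

```python
import os


def make_urn_symlink(block_dir: str, urn: str, device_name: str) -> str:
    """Create <block_dir>/<urn> -> ../<device_name>, replacing any stale link.

    Hypothetical helper mirroring the /dev/block/<URN> -> ../<device> layout.
    """
    link_path = os.path.join(block_dir, urn)
    target = os.path.join("..", device_name)  # relative target, e.g. ../sdd
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.unlink(tmp_path)
    os.symlink(target, tmp_path)
    # rename(2) replaces the old link itself, so readers never see it missing.
    os.replace(tmp_path, link_path)
    return link_path
```

On a node this would be called as something like `make_urn_symlink("/dev/block", urn, "sdd")` once udev reports the new device.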

  • Wait for attach:
>>> vcloud-flexvolume waitforattach /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0-part1 '{"kubernetes.io/fsType":"ext4","kubernetes.io/pvOrVolumeName":"testdisk","kubernetes.io/readwrite":"rw","mountOptions":"relatime,nobarrier","size":"1Gi","storage":"T1","busType":6,"busSubType":"VirtualSCSI","volumeName":"testdisk"}'
<<< {"status": "Success", "device": "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0-part1"}
  • Mount device:
>>> vcloud-flexvolume mountdevice /mnt/testdisk /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0-part1 '{"kubernetes.io/fsType":"ext4","kubernetes.io/pvOrVolumeName":"testdisk","kubernetes.io/readwrite":"rw","mountOptions":"relatime,nobarrier","size":"1Gi","storage":"T1","busType":6,"busSubType":"VirtualSCSI","volumeName":"testdisk"}'
<<< {"status": "Success"}
  • Unmount device:
>>> vcloud-flexvolume unmountdevice /mnt/testdisk
<<< {"status": "Success"}
  • Detach:
>>> vcloud-flexvolume detach testdisk nodename
<<< {"status": "Success"}
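Each call above is a separate process invocation: the first argument names the operation, the rest are its parameters, and the reply is a single JSON object. A minimal dispatcher showing that shape, assuming only `init` is implemented; `run`, `main` and `HANDLERS` are hypothetical names. Replying {"status": "Not supported"} for unimplemented calls follows the FlexVolume convention, and the non-zero exit on failure matches the "exit status 1" kubelet reports in the issues further down.

```python
import json
import sys


def handle_init(args):
    # Same reply as the init example above: the driver supports attach/detach.
    return {"status": "Success", "capabilities": {"attach": True}}


# Hypothetical dispatch table; a full driver would also register isattached,
# attach, waitforattach, mountdevice, unmountdevice and detach.
HANDLERS = {"init": handle_init}


def run(argv):
    """Dispatch one FlexVolume call and return the JSON-serializable reply."""
    if not argv:
        return {"status": "Failure", "message": "no command given"}
    handler = HANDLERS.get(argv[0])
    if handler is None:
        # FlexVolume convention for calls a driver does not implement.
        return {"status": "Not supported"}
    try:
        return handler(argv[1:])
    except Exception as exc:
        return {"status": "Failure", "message": str(exc)}


def main():
    reply = run(sys.argv[1:])
    print(json.dumps(reply))
    sys.exit(0 if reply["status"] in ("Success", "Not supported") else 1)
```

A wrapper script such as scripts/vcloud would call `main()` as its entry point.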

TODO

  • Write some tests.
  • Functions in flexvolume/mount.py should raise exceptions just like the ones in attach.py.
  • Reuse vCloud API session token between invocations.
  • Validate input JSON with JSON Schema.

Credits

  • elFarto - for forking and for improvements to the Disk.find_disk and Disk.get_disks methods

Contributors

dzolnierz

kube-vcloud-flexvolume's Issues

Make timeouts configurable

In the driver there are places where we wait for code to finish until a timeout expires. Editing the code to increase these timeouts is problematic. For example:

# Make sure task is completed
task = Client.ctx.client.get_task_monitor().wait_for_status(
    task=is_disk_attached,
    timeout=60,
    poll_frequency=2,
    fail_on_statuses=None,
    expected_target_statuses=[
        TaskStatus.SUCCESS, TaskStatus.ABORTED, TaskStatus.ERROR,
        TaskStatus.CANCELED
    ],
    callback=None)

Possible solution: add a new timeout option to the config file.
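A minimal sketch of that proposal, assuming the YAML config has already been parsed into a dict (e.g. with yaml.safe_load, since python3-yaml is already a dependency). The `timeouts` section and its `task_timeout`/`poll_frequency` keys are hypothetical names, not existing options in config.yaml.example. `wait_until` is a generic polling loop of the same shape as the driver's "poll until attached/detached" checks; it is not pyvcloud's task monitor.

```python
import time

# Defaults match the values hard-coded in the wait_for_status call above.
DEFAULT_TIMEOUTS = {"task_timeout": 60, "poll_frequency": 2}


def load_timeouts(config: dict) -> dict:
    """Merge an optional 'timeouts' section of the parsed config over defaults.

    The 'timeouts' key and its field names are hypothetical.
    """
    section = config.get("timeouts") or {}
    merged = dict(DEFAULT_TIMEOUTS)
    for key in DEFAULT_TIMEOUTS:
        if key in section:
            value = float(section[key])
            if value <= 0:
                raise ValueError(f"timeouts.{key} must be positive")
            merged[key] = value
    return merged


def wait_until(predicate, timeout, poll_frequency, clock=time.monotonic, sleep=time.sleep):
    """Call predicate() every poll_frequency seconds until it returns a truthy
    value; raise TimeoutError once timeout seconds have passed."""
    deadline = clock() + timeout
    while True:
        result = predicate()
        if result:
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"condition not met within {timeout} s")
        sleep(min(poll_frequency, remaining))
```

With such a section, the call above could pass `timeout=timeouts["task_timeout"]` and `poll_frequency=timeouts["poll_frequency"]` instead of the literals 60 and 2.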

Timeout is_disk_connected = wait_for_connected_disk(600)

Greetings!

I have a problem in attach.py with the timeout in wait_for_connected_disk. It looks like the monitor created by pyudev's Monitor.from_netlink does not pick up the disk before the timeout expires here:

context = pyudev.Context()
monitor = pyudev.Monitor.from_netlink(context)
monitor.filter_by(subsystem='block', device_type='disk')

Could you also show me sample output of print(result) here:

result = []
for device in iter(partial(monitor.poll, timeout), None):
    if device.action == 'add':
        result = [device.device_node, 'connected']
        break
    elif device.action == 'remove':
        result = [device.device_node, 'disconnected']
        break
return result

I tried increasing the timeout to 1000 s, but it still does not work; I'm not really sure how pyudev.Monitor works.
The rest works as intended.
I can see the app create the disk and attach/detach it to/from the VM. I can also see the disk via fdisk -l:

Disk /dev/sdd: 1073 MB, 1073741824 bytes, 2097152 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

But I can't continue the script, because it reaches the timeout in wait_for_connected_disk.
Maybe this method needs to be modernized?
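The loop quoted in this issue relies on Python's two-argument iter(callable, sentinel): it keeps calling monitor.poll(timeout) and stops at the first None, which pyudev's Monitor.poll returns when the timeout expires without an event. The sketch below reproduces that control flow with hypothetical FakeMonitor/FakeDevice stand-ins so it runs without udev; `wait_for_disk_event` is a variant that takes the monitor as a parameter, not the driver's function. Note that other actions (e.g. 'change') are skipped rather than ending the wait. One thing worth checking for the reported hang: a netlink monitor only sees events emitted after it starts listening, so an 'add' event that fires before that point is never observed and the loop waits for the full timeout.

```python
from functools import partial


class FakeDevice:
    """Hypothetical stand-in for pyudev.Device with just the fields used."""
    def __init__(self, action, device_node):
        self.action = action
        self.device_node = device_node


class FakeMonitor:
    """Hypothetical stand-in for pyudev.Monitor: replays events, then times out."""
    def __init__(self, events):
        self._events = list(events)

    def poll(self, timeout=None):
        # A real monitor blocks up to `timeout` seconds; here an empty queue
        # means the timeout expired, so return None.
        return self._events.pop(0) if self._events else None


def wait_for_disk_event(monitor, timeout):
    result = []
    # iter(f, None) calls f() repeatedly and stops at the first None.
    for device in iter(partial(monitor.poll, timeout), None):
        if device.action == 'add':
            result = [device.device_node, 'connected']
            break
        elif device.action == 'remove':
            result = [device.device_node, 'disconnected']
            break
    return result
```

An empty list as the result therefore means "timed out with no add/remove event", which matches the symptom described in the issue.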

kube-vcloud CSI plugin?

Hello! Are there currently any plans to migrate kube-vcloud-flexvolume to the CSI plugin model, which seems to be the recommended way of deploying storage plugins/drivers in current Kubernetes versions?

And thanks a lot for working on this project, it's an important feature to have :)

invalid character 'D' looking for beginning of value

While trying to launch a pod using the provided nginx.yaml (https://github.com/sysoperator/kube-vcloud-flexvolume/blob/master/examples/nginx.yaml) from your repository, the following error occurred in the kubelet service log.

While attaching the disk, the following error occurs:
Operation for ""flexvolume-sysoperator.pl/vcloud/testdisk33"" failed. No retries permitted until 2018-11-13 11:39:19.84658885 +0000 UTC m=+7370.033281972 (durationBeforeRetry 2m2s). Error: "AttachVolume.Attach failed for volume "testdisk33" (UniqueName: "flexvolume-sysoperator.pl/vcloud/testdisk33") from node "k8s-env34-cert-new-o101devorg-1238778" : invalid character 'D' looking for beginning of value"

I have attached the detailed response of the attach method call:
attachcallresponsethroughyaml.txt
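"invalid character 'D' looking for beginning of value" is a Go encoding/json syntax error, so kubelet apparently received driver output that began with 'D' rather than '{', for example a log line printed before the JSON reply. That diagnosis is an inference from the error text only. A hedged way to guard against it is to keep every diagnostic off the process output entirely (kubelet may capture stderr together with stdout, so stderr is not a safe place either) and emit exactly one JSON document. `reply` and `is_valid_reply` are hypothetical helpers, not driver code.

```python
import json
import logging
import sys

log = logging.getLogger("vcloud-flexvolume")
# Keep diagnostics off stdout/stderr, since kubelet may read both as the reply.
# A real driver could attach logging.FileHandler(<absolute path>) here instead.
log.addHandler(logging.NullHandler())
log.propagate = False


def reply(payload, out=None):
    """Write exactly one JSON document (plus newline) and nothing else."""
    out = out if out is not None else sys.stdout
    out.write(json.dumps(payload) + "\n")
    out.flush()


def is_valid_reply(text):
    """Rough stand-in for the kubelet check: the whole output must parse as JSON."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True
```

For instance, output such as "Disk attached" followed by the JSON reply fails this check, while the bare JSON reply passes.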

Permission error when creating the log file

FlexVolume: driver call failed: executable: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/answear.com~vcloud/vcloud,
args: [attach {"busSubType":"VirtualSCSI","busType":"6","kubernetes.io/fsType":"ext4","kubernetes.io/pvOrVolumeName":"appktestdisk",
"kubernetes.io/readwrite":"rw","mountOptions":"relatime,nobarrier","size":"1Gi","storage":"Platinum","volumeName":"appktestdisk"}
xxxxxxxxxxxxxVMNAMExxxxxxxxxxxxxxxxxxx],
error: exit status 1,
output: "{"status": "Failure", "message": "Error on line 33 in file
/usr/local/lib/python3.6/site-packages/kube_vcloud_flexvolume-2.4.3-py3.6.egg/flexvolume/attach.py (PermissionError):
[Errno 13] Permission denied: '/pyvcloud.log'"}\nException: '_thread._local' object has no attribute 'client'\n\n"

The log file cannot be created: a permission error is raised for every location I tried, for example:

~/pyvcloud.log
/home/admin/pyvcloud.log
/var/log/pyvcloud.log
etc.
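The path in the error, '/pyvcloud.log', looks like a relative file name resolved against kubelet's working directory '/'; that reading is an assumption. One hedged workaround is to choose an absolute, writable log path up front by actually trying to open each candidate, falling back to the system temp directory, and pass that path to whatever creates the log (for pyvcloud, the Client's log_file argument, if the installed version exposes it). `pick_log_file` and the candidate directories are hypothetical.

```python
import os
import tempfile

# Hypothetical candidate directories, tried in order.
DEFAULT_LOG_DIRS = ("/var/log/vcloud-flexvolume", "/opt/vcloud-flexvolume/log")


def pick_log_file(name="pyvcloud.log", dirs=DEFAULT_LOG_DIRS):
    """Return an absolute path in the first directory where the file can be opened."""
    for directory in dirs:
        path = os.path.join(os.path.abspath(directory), name)
        try:
            os.makedirs(directory, exist_ok=True)
            # Opening for append is the real test: it also catches denials
            # that a plain permission-bit check would miss.
            with open(path, "a"):
                pass
        except OSError:
            continue
        return path
    # Last resort: the system temp directory (usually /tmp).
    return os.path.join(tempfile.gettempdir(), name)
```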

After a node restart, symlinks in /dev/block do not exist

The driver ends up in this piece of code:

else:
    import inspect
    raise Exception(
            ("Fatal error on line %d. This should never happen") % (inspect.currentframe().f_lineno)
    )

and the pods are never started again after the node boots.

Investigate random failures in src/flexvolume/attach.py:158

# Make sure task is completed
task = Client.ctx.client.get_task_monitor().wait_for_status(
    task=is_disk_attached,
    timeout=60,
    poll_frequency=2,
    fail_on_statuses=None,
    expected_target_statuses=[
        TaskStatus.SUCCESS, TaskStatus.ABORTED, TaskStatus.ERROR,
        TaskStatus.CANCELED
    ],
    callback=None)
# Sometimes task "fails" with error:
# majorErrorCode=500 and message=Unable to perform this action.
assert task.get('status') == TaskStatus.SUCCESS.value
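The assertion above fails whenever vCloud reports the task as failed with majorErrorCode=500. If that failure is transient (the issue does not establish this), one option is to retry the whole attach-and-wait step a bounded number of times with a delay. The generic helper below is a sketch; `retry_call` and its parameters are hypothetical. Retrying is only safe if the wrapped step first checks whether the disk is already attached. Also note that assert statements are skipped under `python -O`, so raising a dedicated exception instead of asserting would be sturdier.

```python
import time


def retry_call(fn, attempts=3, delay=5.0, retry_on=(AssertionError,), sleep=time.sleep):
    """Call fn(); on an exception listed in retry_on, wait and try again.

    Re-raises the last exception once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(delay * attempt)  # linear backoff between attempts
```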
