linux-nvme / nvme-stas
NVMe STorage Appliance Services
License: Apache License 2.0
Seeing the following errors from stafd when the SFSS controller is reset on the host.
Jan 4 08:28:43 rhel-storage-09 kernel: nvme nvme0: queue_size 128 > ctrl sqsize 31, clamping down
Jan 4 08:28:43 rhel-storage-09 kernel: nvme0: Unknown(0x21), Invalid Field in Command (sct 0x0 / sc 0x2) DNR
Jan 4 08:28:43 rhel-storage-09 stafd[1124]: (tcp, 172.18.210.70, 8009, nqn.1988-11.com.dell:SFSS:1:20230103150601e8, enp10s0f0np0) | nvme0 - Registration error. Result:0x0000, Status:0x4002 - Invalid Field in Command: A reserved coded value or an unsupported value in a defined field.
To reproduce:
- generate I/O to all the nvme devices
- Issue resets to all the nvme controllers - example command below:
echo 1 > /sys/devices/virtual/nvme-fabrics/ctl/nvme0/reset_controller
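The single-controller command above can be generalized into a loop. A minimal sketch, assuming fabrics controllers appear under /sys/devices/virtual/nvme-fabrics/ctl/; the directory is parameterized so the loop can be exercised outside a real host, and the helper name is made up:

```shell
# Reset every NVMe-oF controller found under a given control directory.
# On a real host ctl_dir would be /sys/devices/virtual/nvme-fabrics/ctl.
reset_all_controllers() {
    ctl_dir="$1"
    for f in "$ctl_dir"/nvme*/reset_controller; do
        [ -e "$f" ] || continue    # glob matched nothing: no controllers
        echo 1 > "$f"              # trigger the controller reset
    done
}
```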
I'm seeing the following issue with make rpm on v2.1.1 when building on Fedora 36.
Is this a known problem?
[20/87] gcc -o subprojects/libnvme/src/libnvme.so.1.2.0 subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_cleanup.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_fabrics.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_filters.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_ioctl.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_linux.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_log.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_tree.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_util.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_json.c.o -Wl,--as-needed -Wl,--no-undefined -shared -fPIC -Wl,--start-group -Wl,-soname,libnvme.so.1 -Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -Wl,-dT,/home/jmeneghi/repos/nvme-stas/.package_note-nvme-stas-2.1.2-1.fc36.x86_64.ld -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection subprojects/libnvme/ccan/libccan.a -Wl,--version-script=/home/jmeneghi/repos/nvme-stas/subprojects/libnvme/src/libnvme.map /usr/lib64/libjson-c.so /usr/lib64/libssl.so /usr/lib64/libcrypto.so -Wl,--end-group
FAILED: subprojects/libnvme/src/libnvme.so.1.2.0
gcc -o subprojects/libnvme/src/libnvme.so.1.2.0 subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_cleanup.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_fabrics.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_filters.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_ioctl.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_linux.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_log.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_tree.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_util.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_json.c.o -Wl,--as-needed -Wl,--no-undefined -shared -fPIC -Wl,--start-group -Wl,-soname,libnvme.so.1 -Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -Wl,-dT,/home/jmeneghi/repos/nvme-stas/.package_note-nvme-stas-2.1.2-1.fc36.x86_64.ld -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection subprojects/libnvme/ccan/libccan.a -Wl,--version-script=/home/jmeneghi/repos/nvme-stas/subprojects/libnvme/src/libnvme.map /usr/lib64/libjson-c.so /usr/lib64/libssl.so /usr/lib64/libcrypto.so -Wl,--end-group
/usr/bin/ld: cannot open linker script file /home/jmeneghi/repos/nvme-stas/.package_note-nvme-stas-2.1.2-1.fc36.x86_64.ld: No such file or directory
collect2: error: ld returned 1 exit status
[21/87] gcc -Isubprojects/libnvme/src/libnvme-mi.so.1.2.0.p -Isubprojects/libnvme/src -I../subprojects/libnvme/src -Isubprojects/libnvme -I../subprojects/libnvme -Isubprojects/libnvme/ccan -I../subprojects/libnvme/ccan -Isubprojects/libnvme/internal -I../subprojects/libnvme/internal -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O0 -fomit-frame-pointer -D_GNU_SOURCE -include internal/config.h -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fPIC -MD -MQ subprojects/libnvme/src/libnvme-mi.so.1.2.0.p/nvme_mi.c.o -MF subprojects/libnvme/src/libnvme-mi.so.1.2.0.p/nvme_mi.c.o.d -o subprojects/libnvme/src/libnvme-mi.so.1.2.0.p/nvme_mi.c.o -c ../subprojects/libnvme/src/nvme/mi.c
[22/87] gcc -Isubprojects/libnvme/test/main-test.p -Isubprojects/libnvme/test -I../subprojects/libnvme/test -Isubprojects/libnvme -I../subprojects/libnvme -Isubprojects/libnvme/ccan -I../subprojects/libnvme/ccan -Isubprojects/libnvme/src -I../subprojects/libnvme/src -Isubprojects/libnvme/internal -I../subprojects/libnvme/internal -I/usr/include/json-c -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O0 -fomit-frame-pointer -D_GNU_SOURCE -include internal/config.h -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -MD -MQ subprojects/libnvme/test/main-test.p/test.c.o -MF subprojects/libnvme/test/main-test.p/test.c.o.d -o subprojects/libnvme/test/main-test.p/test.c.o -c ../subprojects/libnvme/test/test.c
[23/87] gcc -Isubprojects/libnvme/src/libnvme-mi-test.so.p -Isubprojects/libnvme/src -I../subprojects/libnvme/src -Isubprojects/libnvme -I../subprojects/libnvme -Isubprojects/libnvme/ccan -I../subprojects/libnvme/ccan -Isubprojects/libnvme/internal -I../subprojects/libnvme/internal -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O0 -fomit-frame-pointer -D_GNU_SOURCE -include internal/config.h -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fPIC -MD -MQ subprojects/libnvme/src/libnvme-mi-test.so.p/nvme_mi.c.o -MF subprojects/libnvme/src/libnvme-mi-test.so.p/nvme_mi.c.o.d -o subprojects/libnvme/src/libnvme-mi-test.so.p/nvme_mi.c.o -c ../subprojects/libnvme/src/nvme/mi.c
ninja: build stopped: subcommand failed.
error: Bad exit status from /var/tmp/rpm-tmp.AX6nLw (%build)
RPM build errors:
Bad exit status from /var/tmp/rpm-tmp.AX6nLw (%build)
make: *** [Makefile:98: .build/rpmbuild] Error 1
fedora-vm2:nvme-stas(branch_v2.1.1) > hostnamectl
Static hostname: fedora-vm2
Icon name: computer-vm
Chassis: vm
Machine ID: 5c079f3479774fdbac04330b59e0e7a7
Boot ID: 2f2d3e3fa116409e8406ab48e36d9657
Virtualization: kvm
Operating System: Fedora Linux 36 (Server Edition)
CPE OS Name: cpe:/o:fedoraproject:fedora:36
Kernel: Linux 6.1.7-100.fc36.x86_64
Architecture: x86-64
Hardware Vendor: QEMU
Hardware Model: Standard PC (i440FX + PIIX, 1996)
On a scaled-up SLES15 SP5 MU host (roughly 450 namespaces from 80 subsystems with 2 NVMe/TCP controllers each), the following stacd errors show up in /var/log/messages during I/O testing with faults:
stacd[26003]: Udev._process_udev_event() - Error while polling fd: 3 [90414]
stacd[26003]: Udev._process_udev_event() - Error while polling fd: 3 [90394]
stacd[26003]: Udev._process_udev_event() - Error while polling fd: 3 [90580]
...
This manifests as dropped connections, failed paths, I/O errors, etc. on the SP5 host.
Config details below:
# uname -r
5.14.21-150500.55.7-default
# rpm -qa | grep nvme
libnvme-devel-1.4+27.g5ae1c39-150500.4.3.1.26528.1.PTF.1212598.x86_64
nvme-stas-2.2.2-150500.3.6.1.x86_64
nvme-cli-bash-completion-2.4+24.ga1ee2099-150500.4.3.1.26528.1.PTF.1212598.noarch
libnvme1-1.4+27.g5ae1c39-150500.4.3.1.26528.1.PTF.1212598.x86_64
nvme-cli-zsh-completion-2.4+24.ga1ee2099-150500.4.3.1.26528.1.PTF.1212598.noarch
python3-libnvme-1.4+27.g5ae1c39-150500.4.3.1.26528.1.PTF.1212598.x86_64
nvme-cli-2.4+24.ga1ee2099-150500.4.3.1.26528.1.PTF.1212598.x86_64
libnvme-mi1-1.4+27.g5ae1c39-150500.4.3.1.26528.1.PTF.1212598.x86_64
One test case fails when running inside a minimal Debian chroot:
==================================== 9/14 ====================================
test: Test Avahi
start time: 14:03:39
duration: 0.11s
result: exit status 1
command: MALLOC_PERTURB_=0 PYTHONPATH=/<<PKGBUILDDIR>>/obj-x86_64-linux-gnu:/<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/subprojects/libnvme /usr/bin/python3 /<<PKGBUILDDIR>>/test/test-avahi.py
----------------------------------- stderr -----------------------------------
E
======================================================================
ERROR: test_new (__main__.Test.test_new)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/<<PKGBUILDDIR>>/test/test-avahi.py", line 17, in test_new
srv = avahi.Avahi(sysbus, lambda: "ok")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/staslib/avahi.py", line 121, in __init__
self._sysbus.connection.signal_subscribe(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dasbus/connection.py", line 169, in connection
self._connection = self._get_connection()
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dasbus/connection.py", line 327, in _get_connection
return self._provider.get_system_bus_connection()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dasbus/connection.py", line 58, in get_system_bus_connection
return Gio.bus_get_sync(
^^^^^^^^^^^^^^^^^
gi.repository.GLib.GError: g-io-error-quark: Could not connect: No such file or directory (1)
----------------------------------------------------------------------
Ran 1 test in 0.002s
FAILED (errors=1)
==============================================================================
Installed packages inside the chroot: debhelper-compat dh-python docbook-xml docbook-xsl iproute2 libglib2.0-dev-bin meson pyflakes3 pylint python3-dasbus python3-gi python3-lxml python3-nvme python3-pyfakefs python3-pyudev python3-systemd python3:any xsltproc
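The traceback shows the test requiring a D-Bus system bus, which a minimal chroot does not have. One way to make such a test tolerate that environment is to skip it when the system bus socket is absent. This is only a sketch of the idea, not the nvme-stas test code; the socket path check is an assumption about a typical D-Bus setup:

```python
import os
import unittest

def system_bus_available() -> bool:
    # In a minimal chroot the system bus socket usually does not exist.
    return os.path.exists('/run/dbus/system_bus_socket')

class AvahiTest(unittest.TestCase):
    @unittest.skipUnless(system_bus_available(), 'no D-Bus system bus')
    def test_new(self):
        pass  # the real test would construct avahi.Avahi(sysbus, ...)
```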
Hello,
(as discussed in #406, I'm filing another issue for the sake of completeness)
In Ubuntu, we have an openstack infrastructure that we can use for testing things in VMs (this is different from the infrastructure where autopkgtests run). When running the test suite in a VM deployed by this openstack, there are multiple udev test cases failing.
What is particular about these machines is the presence of a VXLAN network interface with a link-local IPv6 address set (it is also a member of a bridge, but I don't think this is relevant). I am not exactly sure what this type of interface does, but I managed to reproduce the failure after running the following steps:
ip link add type vxlan id 1234 dstport 0
ip link set vxlan0 up
I also reproduced the failures with dummy interfaces, but I'm not sure if that's a real use case.
ip link add type dummy
ip link set dummy0 up
======================================================================
FAIL: test__cid_matches_tid (__main__.Test.test__cid_matches_tid)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test-udev.py", line 520, in test__cid_matches_tid
self.assertEqual(
AssertionError: True != False : Test Case 1 failed
----------------------------------------------------------------------
Ran 6 tests in 0.025s
FAILED (failures=1)
Thanks,
Olivier
There is a grammar mistake in NEWS.md and stasadm.xml: "allows to do". It should be "allows one to do" or "allows doing".
lintian complains about the following:
W: nvme-stas: bad-whatis-entry [usr/share/man/man5/org.nvmexpress.stac.5.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man5/org.nvmexpress.stac.debug.5.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man5/org.nvmexpress.staf.5.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man5/org.nvmexpress.staf.debug.5.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man8/stacd.service.8.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man8/stafd.service.8.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man8/stas-config.target.8.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man8/stas-config@.service.8.gz]
Explanation:
A manual page should start with a NAME section, which lists the program name and a brief description. The NAME section is used to generate a database that can be queried by commands like apropos and whatis. You are seeing this tag because lexgrog was unable to parse the NAME section.
Manual pages for multiple programs, functions, or files should list each separated by a comma and a space, followed by - and a common description.
Listed items may not contain any spaces. A manual page for a two-level command such as fs listacl must look like fs_listacl so the list is read correctly.
Please refer to the lexgrog(1) manual page, the groff_man(7) manual page, and the groff_mdoc(7) manual page for details.
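For reference, a NAME section that lexgrog can parse looks like the following groff fragment (the description text here is illustrative, not the actual nvme-stas man-page content):

```
.SH NAME
stacd.service \- service unit file for the STorage Appliance Connector daemon
```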
$ PYTHONPATH=nvme-stas/.build:nvme-stas/.build/subprojects/libnvme python3 nvme-stas/test/test-nvme_options.py
..E.
======================================================================
ERROR: test_fabrics_empty_file (__main__.Test)
----------------------------------------------------------------------
Traceback (most recent call last):
File "nvme-stas/test/test-nvme_options.py", line 34, in test_fabrics_empty_file
nvme_options = stas.NvmeOptions()
File "nvme-stas/.build/staslib/stas.py", line 475, in __init__
options = [option.split('=')[0].strip() for option in f.readlines()[0].rstrip('\n').split(',')]
IndexError: list index out of range
----------------------------------------------------------------------
Ran 4 tests in 0.254s
FAILED (errors=1)
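The traceback shows f.readlines()[0] being indexed on an empty file. A defensive sketch of the parse that tolerates that case (the helper name is hypothetical, and this is not the upstream fix):

```python
def parse_nvme_options(text: str) -> list:
    """Parse 'opt1=val,opt2=val' option text, tolerating an empty file."""
    lines = text.splitlines()
    if not lines or not lines[0]:
        return []  # empty file: no options, rather than IndexError
    return [option.split('=')[0].strip() for option in lines[0].split(',')]
```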
Is nvme-stas really dependent on libnvme, or only on python3-libnvme? I am asking because my packaging build fails due to
meson.build:libnvme_dep = dependency('libnvme', fallback : ['libnvme', 'libnvme_dep'])
and I don't see any direct dependency on the c library.
The package lint check says
[ 4s] nvme-stas.x86_64: W: dbus-policy-allow-receive <allow receive_sender="org.nvmexpress.stac"/> /usr/share/dbus-1/system.d/org.nvmexpress.stac.conf
[ 4s] nvme-stas.x86_64: W: dbus-policy-allow-receive <allow receive_sender="org.nvmexpress.stac"/> /usr/share/dbus-1/system.d/org.nvmexpress.stac.conf
[ 4s] nvme-stas.x86_64: W: dbus-policy-allow-receive <allow receive_sender="org.nvmexpress.staf"/> /usr/share/dbus-1/system.d/org.nvmexpress.staf.conf
[ 4s] nvme-stas.x86_64: W: dbus-policy-allow-receive <allow receive_sender="org.nvmexpress.staf"/> /usr/share/dbus-1/system.d/org.nvmexpress.staf.conf
[ 4s] allow receive_* is normally not needed as that is the default.
Usually distributions place the D-Bus configuration under ${datadir}/dbus-1/system.d, but nvme-stas seems to hardcode the placement to /etc:
dbus_conf_dir = join_paths(etcdir, 'dbus-1', 'system.d')
Would it be possible to make this more packaging friendly?
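A minimal sketch of the kind of change being requested (assuming a standard meson option setup; this is not a tested patch):

```meson
# Use the distribution's datadir instead of hardcoding /etc.
datadir = join_paths(get_option('prefix'), get_option('datadir'))
dbus_conf_dir = join_paths(datadir, 'dbus-1', 'system.d')
```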
Lines 31 to 35 in 3b8e8d9
One more issue found during packaging: as this is a pure Python project with no architecture-specific code, we aim for a single package for all architectures (i.e. noarch in the rpm world). However, the meson project places staslib in an architecture-specific directory (e.g. /usr/lib64) and also ignores any supplied libdir meson argument.
Setting pure: true in the code snippet above seems to do the trick, although I'm not sure what else it would break.
Also, per https://mesonbuild.com/Python-3-module.html the python3 meson module is deprecated.
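With the modern meson python module (the deprecated python3 module's replacement), a noarch-friendly install might look like the following sketch; the source-list variable name is hypothetical:

```meson
python_mod = import('python')
python3 = python_mod.find_installation('python3')
# pure: true installs into the platform-independent site-packages directory
python3.install_sources(staslib_sources, subdir: 'staslib', pure: true)
```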
stafd_1 | Connecting to the system bus.
stafd_1 | Connecting to the system bus.
stafd_1 | Avahi._configure_browsers() - stypes_to_rm = []
stafd_1 | Avahi._configure_browsers() - stypes_to_add = ['_nvme-disc._tcp']
stafd_1 | Publishing an object at /org/nvmexpress/staf.
stafd_1 | Registering a service name org.nvmexpress.staf.
stafd_1 | avahi-daemon service available, zeroconf supported.
stafd_1 | Avahi._configure_browsers() - stypes_to_rm = []
stafd_1 | Avahi._configure_browsers() - stypes_to_add = []
stacd_1 | Stac._audit_connections() - tids = [(pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88), (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88), (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0C3TC88), (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0C9TC88)]
stacd_1 | Publishing an object at /org/nvmexpress/stac.
stacd_1 | Connecting to the system bus.
stacd_1 | Connecting to the system bus.
stacd_1 | Registering a service name org.nvmexpress.stac.
stacd_1 | Stac._connect_to_staf() - Connected to staf
stacd_1 | Controller._try_to_connect() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) Found existing control device: nvme0
nvme-stas_stacd_1 exited with code 139
stafd_1 | Service._config_ctrls()
stafd_1 | NameResolver.resolve_ctrl_async() - resolving '172.20.165.201'
stafd_1 | NameResolver.resolve_ctrl_async() - resolving '172.21.165.201'
stafd_1 | NameResolver.resolve_ctrl_async() - resolved '172.20.165.201' -> 172.20.165.201
stafd_1 | NameResolver.resolve_ctrl_async() - resolved '172.21.165.201' -> 172.21.165.201
stafd_1 | Staf._config_ctrls_finish() - configured_ctrl_list = [{'transport': 'tcp', 'traddr': '172.20.165.201', 'trsvcid': '2023', 'subsysnqn': 'nqn.2014-08.org.nvmexpress.discovery'}, {'transport': 'tcp', 'traddr': '172.21.165.201', 'trsvcid': '3023', 'subsysnqn': 'nqn.2014-08.org.nvmexpress.discovery'}]
stafd_1 | Staf._config_ctrls_finish() - discovered_ctrl_list = []
stafd_1 | Staf._config_ctrls_finish() - referral_ctrl_list = []
stafd_1 | Staf._config_ctrls_finish() - controllers_to_add = [(tcp, 172.20.165.201, 2023, nqn.2014-08.org.nvmexpress.discovery), (tcp, 172.21.165.201, 3023, nqn.2014-08.org.nvmexpress.discovery)]
stafd_1 | Staf._config_ctrls_finish() - controllers_to_del = []
stafd_1 | Controller._try_to_connect() - (tcp, 172.20.165.201, 2023, nqn.2014-08.org.nvmexpress.discovery) Connecting to nvme control with cfg={'hdr_digest': False, 'data_digest': False, 'keep_alive_tmo': 30}
nvme-stas_stafd_1 exited with code 139
Hello,
I am trying to address various autopkgtest failures in Ubuntu.
Currently, legacy test G6 (from test-udev.py) is consistently failing in our test infrastructure. When running the test locally (i.e., meson test -C build), it succeeds if I only have one IPv6 address, but it fails if two addresses are configured on a specific interface:
======================================================================
FAIL: test__cid_matches_tid (test-udev.Test.test__cid_matches_tid)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/autopkgtest.2Tar77/autopkgtest_tmp/test/test-udev.py", line 693, in test__cid_matches_tid
self.assertEqual(
AssertionError: False != True : Legacy Test Case G6 failed
I added some debug logs before the failing call to self.assertEqual:
match is False
ipv6_addrs is ['2001:xxx:1xx0:8xx7::axc:cxx2', 'fe80::18b8:409b:9be:4db7']
get_ipaddress_obj is 2001:xxx:1xx0:8xx7::axc:cxx2
tid = (tcp, FE80::aaaa:BBBB:cccc:dddd, 8009, hello, 2001:xxx:1xx0:8xx7::axc:cxx2)
cid_legacy = {'transport': 'tcp', 'traddr': 'FE80::aaaa:BBBB:cccc:dddd', 'trsvcid': '8009', 'subsysnqn': 'hello', 'host-traddr': '', 'host-iface': 'tun0', 'src-addr': '', 'host-nqn': ''}
ifaces = {'lo': {4: [IPv4Address('127.0.0.1')], 6: [IPv6Address('::1')]},
'wallgarden0': {4: [IPv4Address('172.16.90.1')], 6: []},
'mpqemubr0': {4: [IPv4Address('10.164.167.1')], 6: []},
'lxdbr0': {4: [IPv4Address('172.16.82.1')], 6: [IPv6Address('fe80::216:3eff:fe0d:f967')]},
'wg0': {4: [IPv4Address('10.8.3.10')], 6: [IPv6Address('fe80::db90:ce2:8e5f:670b')]},
'dock0': {4: [IPv4Address('192.168.80.13')], 6: [IPv6Address('fe80::4a2a:e3ff:fe5b:d32f')]},
'tun0': {4: [IPv4Address('10.172.194.130')], 6: [IPv6Address('2001:xxx:1xx0:8xx7::axc:cxx2'), IPv6Address('fe80::18b8:409b:9be:4db7')]}}
_cid_matches_tid = True
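One observation from the dump above (not necessarily the bug): the addresses appear in mixed upper/lower case (FE80::aaaa:BBBB:cccc:dddd), which is a reminder that IPv6 comparisons should be done on ipaddress objects rather than raw strings. This is an illustrative snippet, not the nvme-stas matching code:

```python
import ipaddress

# ipaddress normalizes case (and zero-compression), so object comparison
# succeeds where naive string comparison would not.
a = ipaddress.ip_address('FE80::AAAA:BBBB:CCCC:DDDD')
b = ipaddress.ip_address('fe80::aaaa:bbbb:cccc:dddd')

assert a == b                                  # objects compare equal
assert 'FE80::AAAA:BBBB:CCCC:DDDD' != str(b)   # raw strings do not
```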
I am not sure what the test exactly does. Is this an expected failure? I'm running on an Ubuntu 23.10 host with udev version 253.5.
Thanks,
Olivier
Run via:
git clean -ffdx && make rpm
which fails as follows (note that meson dist produces a .tar.xz while the spec looks for a .tar.gz):
...
Created /home/glimcb/Dev/cto/nvme-stas/.build/meson-dist/nvme-stas-1.0.tar.xz
rpmbuild -ba .build/nvme-stas.spec
error: Bad source: .build/meson-dist/nvme-stas-%{version_no_tilde}.tar.gz: No such file or directory
make: *** [Makefile:68: rpm] Error 1
Libnvme/nvme-cli can use a JSON config file (schema available at https://github.com/linux-nvme/libnvme/blob/master/doc/config-schema.json) via the -J option of the respective connect-all and connect commands. This is especially useful for large-scale systems with several NVMe objects, where one needs to apply specific settings to individual subsystems/ports during the respective connect-all/connect in a single go, e.g.:
nvme connect-all -J /etc/nvme/config.json
nvme connect -J /etc/nvme/config.json -n <subsys_nqn> -t -a <target_IP>
But it turns out nvme-stas neither handles this file nor provides an option to do so. It looks like this needs to be addressed in nvme-stas itself, so that it can process the JSON config file the same way libnvme/nvme-cli already do.
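For illustration, a minimal file of the kind in question might look like the sketch below. The field names follow the libnvme config schema (hosts containing subsystems containing ports), but every value here is made up:

```json
[
  {
    "hostnqn": "nqn.2014-08.org.nvmexpress:uuid:11111111-2222-3333-4444-555555555555",
    "subsystems": [
      {
        "nqn": "nqn.2016-06.io.example:subsys1",
        "ports": [
          { "transport": "tcp", "traddr": "192.168.1.10", "trsvcid": "4420" }
        ]
      }
    ]
  }
]
```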
I am working on packaging nvme-stas for Debian/Ubuntu (see https://bugs.debian.org/1032650). Debian/Ubuntu has autopkgtest for running tests against the installed binary package. Can you add documentation on how to run the test cases against the installed nvme-stas?
When a storage array needs to reboot for a software upgrade and this takes more than 10 minutes, nvme-stas will drop all connections.
This is a known issue when connecting manually using "nvme connect", and it can be avoided by adding "-l -1" to retry indefinitely.
With nvme-stas, however, connections are established automatically, so nvme-stas should make the retry indefinite as well.
Starting nvme-stas_stacd_1 ... done
Starting nvme-stas_stafd_1 ... done
Attaching to nvme-stas_stacd_1, nvme-stas_stafd_1
stafd_1 | Cannot determine which NVMe options the kernel supports
stafd_1 | Kernel does not appear to support all the options needed to run this program. Consider updating to a later kernel version.
stafd_1 | Connecting to the system bus.
stafd_1 | Connecting to the system bus.
stacd_1 | Cannot determine which NVMe options the kernel supports
stacd_1 | Kernel does not appear to support all the options needed to run this program. Consider updating to a later kernel version.
stacd_1 | Connecting to the system bus.
stacd_1 | Connecting to the system bus.
stafd_1 | avahi-daemon not available, operating w/o mDNS discovery.
stafd_1 | Unable to save last known config: [Errno 2] No such file or directory: '/run/stafd/last-known-config.pickle'
stacd_1 | Unable to save last known config: [Errno 2] No such file or directory: '/run/stacd/last-known-config.pickle'
I finally managed to find some time to take a closer look at the nvme-stas codebase and would like to point out a couple of suggestions from a usability point of view:
gdbus-codegen can build a convenient ready-to-use client API. Related to the previous point, the benefit of having clearly defined signatures is direct access to structure members and native data types. This can be somewhat achieved by running sta[cf]d.py --idl, but it's quite heavy on a build system. See also https://dbus.freedesktop.org/doc/dbus-api-design.html and https://dbus.freedesktop.org/doc/dbus-specification.html#container-types
Currently, any user can enable the trace feature:
gdbus call -y -d org.nvmexpress.stac -o /org/nvmexpress/stac -m org.freedesktop.DBus.Properties.Set org.nvmexpress.stac.debug tron '<true>'
This also enables debugging output in the syslog/journal and so on. It should not be possible for regular users to enable or disable this. The services should check at the D-Bus level whether the caller has UID 0 and only allow the property change in that case.
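A sketch of the policy being asked for. On a real bus the caller's UID would come from the standard org.freedesktop.DBus GetConnectionUnixUser method; here the lookup is injected so the logic is testable without a bus, and all class and method names are illustrative, not nvme-stas code:

```python
class DebugProperties:
    """Gate the 'tron' debug property behind a root check."""

    def __init__(self, get_caller_uid):
        # get_caller_uid: callable(sender) -> numeric UID of the D-Bus caller
        self._get_caller_uid = get_caller_uid
        self.tron = False

    def set_tron(self, sender: str, value: bool) -> bool:
        """Apply the change only for root callers; return True on success."""
        if self._get_caller_uid(sender) != 0:
            return False  # reject non-root callers
        self.tron = value
        return True
```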
Would it be possible to add an initial tag, e.g. v1.0 or v1.0-rc0? It would make packaging simpler.
Noticed a change in behavior from SLES15 SP4's nvme-stas-1.1.9-150400.3.9.3 to SP5's nvme-stas-2.2.2-150500.3.6.1 in terms of PDC handling on receipt of mDNS goodbye packet.
SLES15 SP5 Config:
# uname -r
5.14.21-150500.53-default
# rpm -qa|grep nvme
libnvme-devel-1.4+18.g932f9c37e05a-150500.4.3.1.x86_64
nvme-cli-2.4+17.gf4cfca93998a-150500.4.3.1.x86_64
libnvme-mi1-1.4+18.g932f9c37e05a-150500.4.3.1.x86_64
python3-libnvme-1.4+18.g932f9c37e05a-150500.4.3.1.x86_64
nvme-cli-bash-completion-2.4+17.gf4cfca93998a-150500.4.3.1.noarch
libnvme1-1.4+18.g932f9c37e05a-150500.4.3.1.x86_64
nvme-stas-2.2.2-150500.3.6.1.x86_64
nvme-cli-zsh-completion-2.4+17.gf4cfca93998a-150500.4.3.1.noarch
Whenever an NVMe/TCP link is down, 'stafctl ls' would remove the respective PDC entries from the staf cache in SP4's nvme-stas-1.1.9-150400.3.9.3, but that is not the case with SP5's nvme-stas-2.2.2-150500.3.6.1.
In the presence of mDNS goodbye packets:
Step 3 above is where the nvme-stas behavior has changed from SP4 to SP5: nvme-stas no longer disconnects the PDC on receipt of the mDNS goodbye packet.
So is this change in behavior intentional? What necessitated it?
On Fedora33 when building rpm like this:
rpmbuild --nodeps --build-in-place -ba .build/nvme-stas.spec
I get this error during the %install stage:
Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.hQKIyt
+ umask 022
+ cd /root
+ '[' .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64 '!=' / ']'
+ rm -rf .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64
++ dirname .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64
+ mkdir -p .build/rpm-pkg/BUILDROOT
+ mkdir .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64
+ DESTDIR=.build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64
+ /usr/bin/meson install -C noarch-redhat-linux-gnu --no-rebuild
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/mesonbuild/mesonmain.py", line 228, in run
return options.run_func(options)
File "/usr/lib/python3.9/site-packages/mesonbuild/minstall.py", line 720, in run
installer.do_install(datafilename)
File "/usr/lib/python3.9/site-packages/mesonbuild/minstall.py", line 511, in do_install
self.install_subdirs(d, dm, destdir, fullprefix) # Must be first, because it needs to delete the old subtree.
File "/usr/lib/python3.9/site-packages/mesonbuild/minstall.py", line 540, in install_subdirs
self.do_copydir(d, i.path, full_dst_dir, i.exclude, i.install_mode, dm)
File "/usr/lib/python3.9/site-packages/mesonbuild/minstall.py", line 445, in do_copydir
raise ValueError(f'dst_dir must be absolute, got {dst_dir}')
ValueError: dst_dir must be absolute, got .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64/etc/stas
Installing subdir /root/etc/stas to .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64/etc/stas
error: Bad exit status from /var/tmp/rpm-tmp.hQKIyt (%install)
RPM build errors:
Bad exit status from /var/tmp/rpm-tmp.hQKIyt (%install)
make: *** [Makefile:68: rpm] Error 1
looks like this should be absolute?
+ DESTDIR=.build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64
reading https://docs.fedoraproject.org/en-US/packaging-guidelines/Meson ...
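A plausible workaround, sketched below: normalize DESTDIR to an absolute path before invoking meson install, since meson's installer rejects relative destination dirs ("dst_dir must be absolute"). The helper name is made up:

```shell
# Turn a possibly-relative DESTDIR into an absolute one before running
# `meson install`.
make_absolute() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;               # already absolute
        *)  printf '%s/%s\n' "$(pwd)" "$1" ;;   # prefix the current directory
    esac
}

# Example (paths from the log above):
# DESTDIR="$(make_absolute .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64)" \
#     /usr/bin/meson install -C noarch-redhat-linux-gnu --no-rebuild
```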
on
# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.5 (Ootpa)
# uname -r
5.16.7-1.el8.elrepo.x86_64
I see
# /usr/bin/python3 -u /usr/sbin/stafd --syslog
Traceback (most recent call last):
File "/usr/lib64/python3.6/site-packages/staslib/stas.py", line 299, in __init__
options = [ option.split('=')[0].strip() for option in f.readlines()[0].rstrip('\n').split(',') ]
OSError: [Errno 22] Invalid argument
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/sbin/stafd", line 91, in <module>
from staslib import stas, avahi # pylint: disable=wrong-import-position
File "/usr/lib64/python3.6/site-packages/staslib/stas.py", line 329, in <module>
NVME_OPTIONS = NvmeOptions()
File "/usr/lib64/python3.6/site-packages/staslib/stas.py", line 303, in __init__
LOG.warning('Cannot determine which NVMe options the kernel supports')
AttributeError: 'NoneType' object has no attribute 'warning'
Root cause is here - import ordering...
see https://github.com/linux-nvme/nvme-stas/blob/main/stafd.py#L91
from staslib import stas, avahi # pylint: disable=wrong-import-position
which calls https://github.com/linux-nvme/nvme-stas/blob/main/staslib/stas.py#L329
NVME_OPTIONS = NvmeOptions()
which can call LOG, but LOG is defined only later, here: https://github.com/linux-nvme/nvme-stas/blob/main/stafd.py#L106
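The hazard can be sketched in isolation (illustrative, not the nvme-stas fix): module-level code that logs at import time crashes if the logger is configured only after the import, so the logger must either be created before the import or accessed lazily:

```python
import logging

LOG = None  # the daemon configures this only after importing staslib

def get_log() -> logging.Logger:
    # Lazy accessor: fall back to a default logger instead of crashing
    # with AttributeError when LOG has not been set yet.
    return LOG if LOG is not None else logging.getLogger('stas')

# Safe even before the daemon has set LOG:
get_log().warning('Cannot determine which NVMe options the kernel supports')
```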
nvme-stas is not trimming the trsvcid and traddr fields. In nvme-cli we have a trim function in place for this:
https://github.com/linux-nvme/nvme-cli/blob/3ebf5ff8a70c85dff8cd7a5c470d4f3bb55134fe/fabrics.c#L115
Not trimming results in nvme-stas not being able to connect to targets that make use of the padding the specification allows:
stacd[19046]: (tcp, 192.168.1.1 , 4420 , nqn.XXX:subsystem.mdns_vs_1_tcpnvme_sub_2, eth0) IP address is not valid
@martin-belanger is this expected?
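A minimal sketch of the kind of trimming being requested, in Python terms (nvme-cli's helper is in C; this hypothetical function is illustrative, not nvme-stas code):

```python
def trim_cid_fields(cid: dict) -> dict:
    """Strip surrounding whitespace (spec-allowed padding) from string
    fields such as traddr and trsvcid, leaving other values untouched."""
    return {key: value.strip() if isinstance(value, str) else value
            for key, value in cid.items()}
```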
stafd_1 | Connecting to the system bus.
stafd_1 | Connecting to the system bus.
stacd_1 | Connecting to the system bus.
stacd_1 | Connecting to the system bus.
stafd_1 | avahi-daemon service available, zeroconf supported.
stacd_1 | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0 - Connection established!
stacd_1 | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1 - Connection established!
stacd_1 | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0 - Disconnect initiated
stacd_1 | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1 - Disconnect initiated
stacd_1 | nvme0: failed to disconnect, error 2
stacd_1 | nvme1: failed to disconnect, error 2
more verbose:
stafd_1 | Connecting to the system bus.
stafd_1 | Connecting to the system bus.
stafd_1 | Avahi._configure_browsers() - stypes_to_rm = []
stafd_1 | Avahi._configure_browsers() - stypes_to_add = ['_nvme-disc._tcp']
stafd_1 | Publishing an object at /org/nvmexpress/staf.
stafd_1 | Registering a service name org.nvmexpress.staf.
stafd_1 | avahi-daemon service available, zeroconf supported.
stafd_1 | Avahi._configure_browsers() - stypes_to_rm = []
stafd_1 | Avahi._configure_browsers() - stypes_to_add = []
stacd_1 | Stac._audit_connections() - tids = [(pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88), (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88)]
stacd_1 | Publishing an object at /org/nvmexpress/stac.
stacd_1 | Connecting to the system bus.
stacd_1 | Connecting to the system bus.
stacd_1 | Registering a service name org.nvmexpress.stac.
stacd_1 | Stac._connect_to_staf() - Connected to staf
stacd_1 | Controller._try_to_connect() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) Found existing control device: nvme0
stacd_1 | lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme0
stacd_1 | Controller._try_to_connect() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) Found existing control device: nvme1
stacd_1 | lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme1
stacd_1 | lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme1
stacd_1 | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0 - Connection established!
stacd_1 | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1 - Connection established!
stafd_1 | Service._config_ctrls()
stafd_1 | Staf._config_ctrls_finish() - configured_ctrl_list = []
stafd_1 | Staf._config_ctrls_finish() - discovered_ctrl_list = []
stafd_1 | Staf._config_ctrls_finish() - referral_ctrl_list = []
stafd_1 | Staf._config_ctrls_finish() - controllers_to_add = []
stafd_1 | Staf._config_ctrls_finish() - controllers_to_del = []
stacd_1 | Service._config_ctrls()
stacd_1 | Stac._config_ctrls_finish() - configured_ctrl_list = []
stacd_1 | Stac._config_ctrls_finish() - discovered_ctrl_list = []
stacd_1 | Stac._config_ctrls_finish() - controllers_to_add = []
stacd_1 | Stac._config_ctrls_finish() - controllers_to_del = [(pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88), (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88)]
stacd_1 | Controller.disconnect() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0
stacd_1 | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0 - Disconnect initiated
stacd_1 | Controller.disconnect() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1
stacd_1 | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1 - Disconnect initiated
stacd_1 | nvme0: failed to disconnect, error 2
stacd_1 | nvme1: failed to disconnect, error 2
stacd_1 | Controller._on_disconn_success() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0
stacd_1 | Controller._on_disconn_success() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1
stacd_1 | Service.remove_controller()
stacd_1 | Service._remove_ctrl_from_dict() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0
stacd_1 | Controller.kill() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88)
stacd_1 | Controller._release_resources() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88)
stacd_1 | Service.remove_controller()
stacd_1 | Service._remove_ctrl_from_dict() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1
stacd_1 | Controller.kill() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88)
stacd_1 | Controller._release_resources() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88)
stacd_1 | Service._config_ctrls()
stacd_1 | Stac._config_ctrls_finish() - configured_ctrl_list = []
stacd_1 | Stac._config_ctrls_finish() - discovered_ctrl_list = []
stacd_1 | Stac._config_ctrls_finish() - controllers_to_add = []
stacd_1 | Stac._config_ctrls_finish() - controllers_to_del = []
We're facing a distribution-packaging issue where we cannot afford to provide unique /etc/nvme/hostnqn or /etc/nvme/hostid files, for various reasons (e.g. a generic pre-built rootfs image). This is typically not a problem for nvme-cli and libnvme-based tools, since a stable hostnqn is autogenerated as a fallback. The hostid is often missing as well, but until now that has not really been a problem either. However, nvme-stas demands that those files exist unless hostnqn or hostid are specified in sys.conf.
Lines 351 to 360 in 45c1985
Line 9 in cf18029
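For completeness, the sys.conf route looks roughly like this. This is a sketch only: the [Host] section and the nqn/id key names are my reading of the nvme-stas documentation, and the values are placeholders:

```ini
# /etc/stas/sys.conf (sketch) - supply hostnqn/hostid inline so that
# /etc/nvme/hostnqn and /etc/nvme/hostid need not exist on the image.
[Host]
nqn=nqn.2014-08.org.nvmexpress:uuid:ffffffff-ffff-ffff-ffff-ffffffffffff
id=ffffffff-ffff-ffff-ffff-ffffffffffff
```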
A minor issue that I came across when packaging -rc5 in Fedora: the python3-libnvme dependency, in terms of a pkg-config module, is not provided by the upstream libnvme tarball and is not (typically) present in Fedora/RHEL either. Using the 2.0-rc5 tarball, which has the following snippet (later changed by commit 8d691bd), I still get a meson failure even though the fallback argument is specified; meson seems to ignore it:
#libnvme_dep = dependency('python3-libnvme', fallback : ['libnvme', 'libnvme_dep'], version : '>= 1.2')
libnvme_dep = dependency('python3-libnvme', fallback : ['libnvme', 'libnvme_dep'])
Found pkg-config: /usr/bin/pkg-config (1.8.0)
Found CMake: /usr/bin/cmake (3.24.1)
Run-time dependency python3-libnvme found: NO (tried pkgconfig and cmake)
Looking for a fallback subproject for the dependency python3-libnvme
test/meson.build:15:4: ERROR: Automatic wrap-based subproject downloading is disabled
With the above-mentioned change on git master, I still get a non-fatal meson failure:
Found pkg-config: /usr/bin/pkg-config (1.8.0)
Found CMake: /usr/bin/cmake (3.24.1)
Run-time dependency python3-libnvme found: NO (tried pkgconfig and cmake)
Looking for a fallback subproject for the dependency python3-libnvme
Automatic wrap-based subproject downloading is disabled
Subproject libnvme is buildable: NO (disabling)
Dependency python3-libnvme from subproject libnvme found: NO (subproject failed to configure)
I've changed python3-libnvme to plain libnvme in our builds in the meantime, though I'm not sure what your original intention was. But again, this is just a very minor glitch.
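For what it's worth, the workaround I'm using looks roughly like this (a sketch; whether the subproject's libnvme_dep fallback also covers the Python bindings is an open question):

```meson
# Sketch of the workaround: query the plain 'libnvme' pkg-config module,
# keeping the subproject fallback for builds where wrap downloads are allowed.
libnvme_dep = dependency('libnvme', fallback : ['libnvme', 'libnvme_dep'])
```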
The interface to create connections is the pseudo-device "/dev/nvme-fabrics". This is a blocking interface where only one connection can be made at a time. When multiple processes or threads try to make connections at the same time, they simply block on "/dev/nvme-fabrics" until the previous connection request completes. A successful connection usually takes only a few milliseconds to complete. However, it takes the kernel about 3 seconds to return from an unsuccessful connection (this is probably a fixed internal kernel timeout value). This means that all processes or threads blocked on "/dev/nvme-fabrics" can remain in a blocked state for several seconds when several connections fail.
Let's say 10 TCP connections are being requested and are all pending on "/dev/nvme-fabrics". And let's say that all connection requests except for one are going to fail due to a momentary network issue. Finally, let's say that the only connection to succeed is the last one that the kernel will attempt. In this case, it will take 27 seconds (3 sec * 9 connections) before the kernel attempts to make the last connection request.
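The arithmetic above can be sketched as follows. This is not nvme-stas code; the 3-second timeout per failed attempt is the approximate kernel behavior described above, and successful attempts are assumed to take negligible time:

```python
# Connect attempts serialize on /dev/nvme-fabrics, so a pending connect
# waits behind every earlier attempt. Assume ~3 s per failed attempt and
# ~0 s per successful one.
KERNEL_CONNECT_TIMEOUT_S = 3

def queue_wait_s(outcomes_ahead):
    """Seconds a connect waits behind the given earlier attempts.
    outcomes_ahead: list of booleans, True = that attempt succeeds."""
    return sum(0 if ok else KERNEL_CONNECT_TIMEOUT_S for ok in outcomes_ahead)

# 10 requests queued; the first 9 fail, the last one succeeds:
print(queue_wait_s([False] * 9))  # -> 27
```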
While connect operations are pending, let's say that we want to delete the connection (nvme disconnect --nqn [NQN]) that is to be attempted last by the kernel. This command has no idea that a connect is currently pending. In fact, the command will check that there is no connection for the requested NQN and will simply return. Unfortunately, a few seconds later the pending connect will finally get executed by the kernel and the connection will be established.
We have seen this situation with nvme-stas, especially during network outages where nvme-stas tries to delete connections while the connect operation is pending on "/dev/nvme-fabrics". We end up with connections being made that should not exist.
We need to change the disconnect code in nvme-stas to take into account potentially incomplete connect operations.
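One possible shape for such a fix is a disconnect that re-checks for a late-arriving controller. This is a sketch only: find_ctrl and do_disconnect are hypothetical callbacks, and the re-check delay is an assumption, not a value taken from nvme-stas:

```python
import time

def safe_disconnect(nqn, find_ctrl, do_disconnect, recheck_delay=5.0):
    """Disconnect the controller for `nqn`, tolerating a connect that is
    still queued on /dev/nvme-fabrics (hypothetical callback names)."""
    ctrl = find_ctrl(nqn)
    if ctrl is not None:
        do_disconnect(ctrl)
        return True
    # No controller yet: a connect may still be pending in the kernel.
    # Re-check after the kernel has had time to finish queued attempts.
    time.sleep(recheck_delay)
    ctrl = find_ctrl(nqn)
    if ctrl is not None:
        do_disconnect(ctrl)
        return True
    return False
```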
The upstream project says:
Warning
netifaces needs a new maintainer. al45tair is no longer able to maintain it or make new releases due to work commitments.
https://github.com/al45tair/netifaces
Are you really sure you want to depend on this code base?