linux-nvme / nvme-stas
NVMe STorage Appliance Services
License: Apache License 2.0
Seeing the following errors from stafd when the SFSS controller is reset on the host.
Jan 4 08:28:43 rhel-storage-09 kernel: nvme nvme0: queue_size 128 > ctrl sqsize 31, clamping down
Jan 4 08:28:43 rhel-storage-09 kernel: nvme0: Unknown(0x21), Invalid Field in Command (sct 0x0 / sc 0x2) DNR
Jan 4 08:28:43 rhel-storage-09 stafd[1124]: (tcp, 172.18.210.70, 8009, nqn.1988-11.com.dell:SFSS:1:20230103150601e8, enp10s0f0np0) | nvme0 - Registration error. Result:0x0000, Status:0x4002 - Invalid Field in Command: A reserved coded value or an unsupported value in a defined field.
To reproduce:
- generate I/O to all the nvme devices
- Issue resets to all the nvme controllers - example command below:
echo 1 > /sys/devices/virtual/nvme-fabrics/ctl/nvme0/reset_controller
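The single-controller command above can be generalized into a loop. A minimal sketch, assuming fabrics controllers appear under /sys/devices/virtual/nvme-fabrics/ctl/; the directory is parameterized so the loop can be exercised outside a real host, and the helper name is made up:

```shell
# Reset every NVMe-oF controller found under a given control directory.
# On a real host ctl_dir would be /sys/devices/virtual/nvme-fabrics/ctl.
reset_all_controllers() {
    ctl_dir="$1"
    for f in "$ctl_dir"/nvme*/reset_controller; do
        [ -e "$f" ] || continue    # glob matched nothing: no controllers
        echo 1 > "$f"              # trigger the controller reset
    done
}
```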
I'm seeing the following issue with make rpm on v2.1.1 when building on Fedora 36.
Is this a known problem?
[20/87] gcc -o subprojects/libnvme/src/libnvme.so.1.2.0 subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_cleanup.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_fabrics.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_filters.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_ioctl.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_linux.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_log.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_tree.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_util.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_json.c.o -Wl,--as-needed -Wl,--no-undefined -shared -fPIC -Wl,--start-group -Wl,-soname,libnvme.so.1 -Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -Wl,-dT,/home/jmeneghi/repos/nvme-stas/.package_note-nvme-stas-2.1.2-1.fc36.x86_64.ld -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection subprojects/libnvme/ccan/libccan.a -Wl,--version-script=/home/jmeneghi/repos/nvme-stas/subprojects/libnvme/src/libnvme.map /usr/lib64/libjson-c.so /usr/lib64/libssl.so /usr/lib64/libcrypto.so -Wl,--end-group
FAILED: subprojects/libnvme/src/libnvme.so.1.2.0
gcc -o subprojects/libnvme/src/libnvme.so.1.2.0 subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_cleanup.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_fabrics.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_filters.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_ioctl.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_linux.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_log.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_tree.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_util.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_json.c.o -Wl,--as-needed -Wl,--no-undefined -shared -fPIC -Wl,--start-group -Wl,-soname,libnvme.so.1 -Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -Wl,-dT,/home/jmeneghi/repos/nvme-stas/.package_note-nvme-stas-2.1.2-1.fc36.x86_64.ld -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection subprojects/libnvme/ccan/libccan.a -Wl,--version-script=/home/jmeneghi/repos/nvme-stas/subprojects/libnvme/src/libnvme.map /usr/lib64/libjson-c.so /usr/lib64/libssl.so /usr/lib64/libcrypto.so -Wl,--end-group
/usr/bin/ld: cannot open linker script file /home/jmeneghi/repos/nvme-stas/.package_note-nvme-stas-2.1.2-1.fc36.x86_64.ld: No such file or directory
collect2: error: ld returned 1 exit status
[21/87] gcc -Isubprojects/libnvme/src/libnvme-mi.so.1.2.0.p -Isubprojects/libnvme/src -I../subprojects/libnvme/src -Isubprojects/libnvme -I../subprojects/libnvme -Isubprojects/libnvme/ccan -I../subprojects/libnvme/ccan -Isubprojects/libnvme/internal -I../subprojects/libnvme/internal -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O0 -fomit-frame-pointer -D_GNU_SOURCE -include internal/config.h -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fPIC -MD -MQ subprojects/libnvme/src/libnvme-mi.so.1.2.0.p/nvme_mi.c.o -MF subprojects/libnvme/src/libnvme-mi.so.1.2.0.p/nvme_mi.c.o.d -o subprojects/libnvme/src/libnvme-mi.so.1.2.0.p/nvme_mi.c.o -c ../subprojects/libnvme/src/nvme/mi.c
[22/87] gcc -Isubprojects/libnvme/test/main-test.p -Isubprojects/libnvme/test -I../subprojects/libnvme/test -Isubprojects/libnvme -I../subprojects/libnvme -Isubprojects/libnvme/ccan -I../subprojects/libnvme/ccan -Isubprojects/libnvme/src -I../subprojects/libnvme/src -Isubprojects/libnvme/internal -I../subprojects/libnvme/internal -I/usr/include/json-c -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O0 -fomit-frame-pointer -D_GNU_SOURCE -include internal/config.h -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -MD -MQ subprojects/libnvme/test/main-test.p/test.c.o -MF subprojects/libnvme/test/main-test.p/test.c.o.d -o subprojects/libnvme/test/main-test.p/test.c.o -c ../subprojects/libnvme/test/test.c
[23/87] gcc -Isubprojects/libnvme/src/libnvme-mi-test.so.p -Isubprojects/libnvme/src -I../subprojects/libnvme/src -Isubprojects/libnvme -I../subprojects/libnvme -Isubprojects/libnvme/ccan -I../subprojects/libnvme/ccan -Isubprojects/libnvme/internal -I../subprojects/libnvme/internal -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O0 -fomit-frame-pointer -D_GNU_SOURCE -include internal/config.h -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fPIC -MD -MQ subprojects/libnvme/src/libnvme-mi-test.so.p/nvme_mi.c.o -MF subprojects/libnvme/src/libnvme-mi-test.so.p/nvme_mi.c.o.d -o subprojects/libnvme/src/libnvme-mi-test.so.p/nvme_mi.c.o -c ../subprojects/libnvme/src/nvme/mi.c
ninja: build stopped: subcommand failed.
error: Bad exit status from /var/tmp/rpm-tmp.AX6nLw (%build)
RPM build errors:
Bad exit status from /var/tmp/rpm-tmp.AX6nLw (%build)
make: *** [Makefile:98: .build/rpmbuild] Error 1
fedora-vm2:nvme-stas(branch_v2.1.1) > hostnamectl
Static hostname: fedora-vm2
Icon name: computer-vm
Chassis: vm
Machine ID: 5c079f3479774fdbac04330b59e0e7a7
Boot ID: 2f2d3e3fa116409e8406ab48e36d9657
Virtualization: kvm
Operating System: Fedora Linux 36 (Server Edition)
CPE OS Name: cpe:/o:fedoraproject:fedora:36
Kernel: Linux 6.1.7-100.fc36.x86_64
Architecture: x86-64
Hardware Vendor: QEMU
Hardware Model: Standard PC (i440FX + PIIX, 1996)
On a scaled-up SLES15 SP5 MU host (roughly 450 namespaces from 80 subsystems with 2 NVMe/TCP controllers each), the following stacd errors show up in /var/log/messages during I/O testing with faults:
stacd[26003]: Udev._process_udev_event() - Error while polling fd: 3 [90414]
stacd[26003]: Udev._process_udev_event() - Error while polling fd: 3 [90394]
stacd[26003]: Udev._process_udev_event() - Error while polling fd: 3 [90580]
...
This manifests as dropped connections, failed paths, I/O errors, etc. on the SP5 host.
Config details below:
# uname -r
5.14.21-150500.55.7-default
# rpm -qa | grep nvme
libnvme-devel-1.4+27.g5ae1c39-150500.4.3.1.26528.1.PTF.1212598.x86_64
nvme-stas-2.2.2-150500.3.6.1.x86_64
nvme-cli-bash-completion-2.4+24.ga1ee2099-150500.4.3.1.26528.1.PTF.1212598.noarch
libnvme1-1.4+27.g5ae1c39-150500.4.3.1.26528.1.PTF.1212598.x86_64
nvme-cli-zsh-completion-2.4+24.ga1ee2099-150500.4.3.1.26528.1.PTF.1212598.noarch
python3-libnvme-1.4+27.g5ae1c39-150500.4.3.1.26528.1.PTF.1212598.x86_64
nvme-cli-2.4+24.ga1ee2099-150500.4.3.1.26528.1.PTF.1212598.x86_64
libnvme-mi1-1.4+27.g5ae1c39-150500.4.3.1.26528.1.PTF.1212598.x86_64
One test case fails when running inside a minimal Debian chroot:
==================================== 9/14 ====================================
test: Test Avahi
start time: 14:03:39
duration: 0.11s
result: exit status 1
command: MALLOC_PERTURB_=0 PYTHONPATH=/<<PKGBUILDDIR>>/obj-x86_64-linux-gnu:/<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/subprojects/libnvme /usr/bin/python3 /<<PKGBUILDDIR>>/test/test-avahi.py
----------------------------------- stderr -----------------------------------
E
======================================================================
ERROR: test_new (__main__.Test.test_new)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/<<PKGBUILDDIR>>/test/test-avahi.py", line 17, in test_new
srv = avahi.Avahi(sysbus, lambda: "ok")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/staslib/avahi.py", line 121, in __init__
self._sysbus.connection.signal_subscribe(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dasbus/connection.py", line 169, in connection
self._connection = self._get_connection()
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dasbus/connection.py", line 327, in _get_connection
return self._provider.get_system_bus_connection()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/dasbus/connection.py", line 58, in get_system_bus_connection
return Gio.bus_get_sync(
^^^^^^^^^^^^^^^^^
gi.repository.GLib.GError: g-io-error-quark: Could not connect: No such file or directory (1)
----------------------------------------------------------------------
Ran 1 test in 0.002s
FAILED (errors=1)
==============================================================================
Installed packages inside the chroot: debhelper-compat dh-python docbook-xml docbook-xsl iproute2 libglib2.0-dev-bin meson pyflakes3 pylint python3-dasbus python3-gi python3-lxml python3-nvme python3-pyfakefs python3-pyudev python3-systemd python3:any xsltproc
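The traceback shows the test requiring a D-Bus system bus, which a minimal chroot does not have. One way to make such a test tolerate that environment is to skip it when the system bus socket is absent. This is only a sketch of the idea, not the nvme-stas test code; the socket path check is an assumption about a typical D-Bus setup:

```python
import os
import unittest

def system_bus_available() -> bool:
    # In a minimal chroot the system bus socket usually does not exist.
    return os.path.exists('/run/dbus/system_bus_socket')

class AvahiTest(unittest.TestCase):
    @unittest.skipUnless(system_bus_available(), 'no D-Bus system bus')
    def test_new(self):
        pass  # the real test would construct avahi.Avahi(sysbus, ...)
```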
Hello,
(as discussed in #406, I'm filing another issue for the sake of completeness)
In Ubuntu, we have an openstack infrastructure that we can use for testing things in VMs (this is different from the infrastructure where autopkgtests run). When running the test suite in a VM deployed by this openstack, there are multiple udev test cases failing.
What is particular about these machines is the presence of a VXLAN network interface with a link-local IPv6 address set (it is also a member of a bridge, but I don't think this is relevant). I am not exactly sure what this type of interface does, but I managed to reproduce the failure after running the following steps:
ip link add type vxlan id 1234 dstport 0
ip link set vxlan0 up
I also reproduced the failures with dummy interfaces, but I'm not sure if that's a real use case.
ip link add type dummy
ip link set dummy0 up
======================================================================
FAIL: test__cid_matches_tid (__main__.Test.test__cid_matches_tid)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test-udev.py", line 520, in test__cid_matches_tid
self.assertEqual(
AssertionError: True != False : Test Case 1 failed
----------------------------------------------------------------------
Ran 6 tests in 0.025s
FAILED (failures=1)
Thanks,
Olivier
There is a grammar mistake in NEWS.md and stasadm.xml: "allows to do". It should be "allows one to do" or "allows doing".
lintian complains about the following:
W: nvme-stas: bad-whatis-entry [usr/share/man/man5/org.nvmexpress.stac.5.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man5/org.nvmexpress.stac.debug.5.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man5/org.nvmexpress.staf.5.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man5/org.nvmexpress.staf.debug.5.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man8/stacd.service.8.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man8/stafd.service.8.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man8/stas-config.target.8.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man8/stas-config@.service.8.gz]
Explanation:
A manual page should start with a NAME section, which lists the program name and a brief description. The NAME section is used to generate a database that can be queried by commands like apropos and whatis. You are seeing this tag because lexgrog was unable to parse the NAME section.
Manual pages for multiple programs, functions, or files should list each separated by a comma and a space, followed by - and a common description.
Listed items may not contain any spaces. A manual page for a two-level command such as fs listacl must look like fs_listacl so the list is read correctly.
Please refer to the lexgrog(1) manual page, the groff_man(7) manual page, and the groff_mdoc(7) manual page for details.
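For reference, a NAME section that lexgrog can parse looks like the following groff fragment (the description text here is illustrative, not the actual nvme-stas man-page content):

```
.SH NAME
stacd.service \- service unit file for the STorage Appliance Connector daemon
```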
$ PYTHONPATH=nvme-stas/.build:nvme-stas/.build/subprojects/libnvme python3 nvme-stas/test/test-nvme_options.py
..E.
======================================================================
ERROR: test_fabrics_empty_file (__main__.Test)
----------------------------------------------------------------------
Traceback (most recent call last):
File "nvme-stas/test/test-nvme_options.py", line 34, in test_fabrics_empty_file
nvme_options = stas.NvmeOptions()
File "nvme-stas/.build/staslib/stas.py", line 475, in __init__
options = [option.split('=')[0].strip() for option in f.readlines()[0].rstrip('\n').split(',')]
IndexError: list index out of range
----------------------------------------------------------------------
Ran 4 tests in 0.254s
FAILED (errors=1)
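The traceback shows f.readlines()[0] being indexed on an empty file. A defensive sketch of the parse that tolerates that case (the helper name is hypothetical, and this is not the upstream fix):

```python
def parse_nvme_options(text: str) -> list:
    """Parse 'opt1=val,opt2=val' option text, tolerating an empty file."""
    lines = text.splitlines()
    if not lines or not lines[0]:
        return []  # empty file: no options, rather than IndexError
    return [option.split('=')[0].strip() for option in lines[0].split(',')]
```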
Is nvme-stas really dependent on libnvme, or only on python3-libnvme? I am asking because my packaging build fails due to
meson.build:libnvme_dep = dependency('libnvme', fallback : ['libnvme', 'libnvme_dep'])
and I don't see any direct dependency on the c library.
The package lint check says
[ 4s] nvme-stas.x86_64: W: dbus-policy-allow-receive <allow receive_sender="org.nvmexpress.stac"/> /usr/share/dbus-1/system.d/org.nvmexpress.stac.conf
[ 4s] nvme-stas.x86_64: W: dbus-policy-allow-receive <allow receive_sender="org.nvmexpress.stac"/> /usr/share/dbus-1/system.d/org.nvmexpress.stac.conf
[ 4s] nvme-stas.x86_64: W: dbus-policy-allow-receive <allow receive_sender="org.nvmexpress.staf"/> /usr/share/dbus-1/system.d/org.nvmexpress.staf.conf
[ 4s] nvme-stas.x86_64: W: dbus-policy-allow-receive <allow receive_sender="org.nvmexpress.staf"/> /usr/share/dbus-1/system.d/org.nvmexpress.staf.conf
[ 4s] allow receive_* is normally not needed as that is the default.
Usually distributions place the D-Bus configuration under ${datadir}/dbus-1/system.d, but nvme-stas seems to hardcode the placement to /etc:
dbus_conf_dir = join_paths(etcdir, 'dbus-1', 'system.d')
Would it be possible to make this more packaging friendly?
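A minimal sketch of the kind of change being requested (assuming a standard meson option setup; this is not a tested patch):

```meson
# Use the distribution's datadir instead of hardcoding /etc.
datadir = join_paths(get_option('prefix'), get_option('datadir'))
dbus_conf_dir = join_paths(datadir, 'dbus-1', 'system.d')
```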
Lines 31 to 35 in 3b8e8d9
One more issue found during packaging: as this is a pure Python project with no architecture-specific code, we aim for a single package for all architectures (i.e. noarch in the rpm world). However, the meson project places staslib in an architecture-specific directory (e.g. /usr/lib64) and also ignores any supplied libdir meson argument.
Setting pure: true in the code snippet above seems to do the trick, although I'm not sure what else it would break.
Also, per https://mesonbuild.com/Python-3-module.html the python3 meson module is deprecated.
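With the modern meson python module (the deprecated python3 module's replacement), a noarch-friendly install might look like the following sketch; the source-list variable name is hypothetical:

```meson
python_mod = import('python')
python3 = python_mod.find_installation('python3')
# pure: true installs into the platform-independent site-packages directory
python3.install_sources(staslib_sources, subdir: 'staslib', pure: true)
```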
stafd_1 | Connecting to the system bus.
stafd_1 | Connecting to the system bus.
stafd_1 | Avahi._configure_browsers() - stypes_to_rm = []
stafd_1 | Avahi._configure_browsers() - stypes_to_add = ['_nvme-disc._tcp']
stafd_1 | Publishing an object at /org/nvmexpress/staf.
stafd_1 | Registering a service name org.nvmexpress.staf.
stafd_1 | avahi-daemon service available, zeroconf supported.
stafd_1 | Avahi._configure_browsers() - stypes_to_rm = []
stafd_1 | Avahi._configure_browsers() - stypes_to_add = []
stacd_1 | Stac._audit_connections() - tids = [(pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88), (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88), (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0C3TC88), (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0C9TC88)]
stacd_1 | Publishing an object at /org/nvmexpress/stac.
stacd_1 | Connecting to the system bus.
stacd_1 | Connecting to the system bus.
stacd_1 | Registering a service name org.nvmexpress.stac.
stacd_1 | Stac._connect_to_staf() - Connected to staf
stacd_1 | Controller._try_to_connect() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) Found existing control device: nvme0
nvme-stas_stacd_1 exited with code 139
stafd_1 | Service._config_ctrls()
stafd_1 | NameResolver.resolve_ctrl_async() - resolving '172.20.165.201'
stafd_1 | NameResolver.resolve_ctrl_async() - resolving '172.21.165.201'
stafd_1 | NameResolver.resolve_ctrl_async() - resolved '172.20.165.201' -> 172.20.165.201
stafd_1 | NameResolver.resolve_ctrl_async() - resolved '172.21.165.201' -> 172.21.165.201
stafd_1 | Staf._config_ctrls_finish() - configured_ctrl_list = [{'transport': 'tcp', 'traddr': '172.20.165.201', 'trsvcid': '2023', 'subsysnqn': 'nqn.2014-08.org.nvmexpress.discovery'}, {'transport': 'tcp', 'traddr': '172.21.165.201', 'trsvcid': '3023', 'subsysnqn': 'nqn.2014-08.org.nvmexpress.discovery'}]
stafd_1 | Staf._config_ctrls_finish() - discovered_ctrl_list = []
stafd_1 | Staf._config_ctrls_finish() - referral_ctrl_list = []
stafd_1 | Staf._config_ctrls_finish() - controllers_to_add = [(tcp, 172.20.165.201, 2023, nqn.2014-08.org.nvmexpress.discovery), (tcp, 172.21.165.201, 3023, nqn.2014-08.org.nvmexpress.discovery)]
stafd_1 | Staf._config_ctrls_finish() - controllers_to_del = []
stafd_1 | Controller._try_to_connect() - (tcp, 172.20.165.201, 2023, nqn.2014-08.org.nvmexpress.discovery) Connecting to nvme control with cfg={'hdr_digest': False, 'data_digest': False, 'keep_alive_tmo': 30}
nvme-stas_stafd_1 exited with code 139
Hello,
I am trying to address various autopkgtest failures in Ubuntu.
Currently, legacy test G6 (from test-udev.py) is consistently failing in our test infrastructure. When running the test locally (i.e., meson test -C build), it succeeds if I only have one IPv6 address, but it fails if two addresses are configured on a specific interface:
======================================================================
FAIL: test__cid_matches_tid (test-udev.Test.test__cid_matches_tid)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/autopkgtest.2Tar77/autopkgtest_tmp/test/test-udev.py", line 693, in test__cid_matches_tid
self.assertEqual(
AssertionError: False != True : Legacy Test Case G6 failed
I added some debug logs before the failing call to self.assertEqual:
match is False
ipv6_addrs is ['2001:xxx:1xx0:8xx7::axc:cxx2', 'fe80::18b8:409b:9be:4db7']
get_ipaddress_obj is 2001:xxx:1xx0:8xx7::axc:cxx2
tid = (tcp, FE80::aaaa:BBBB:cccc:dddd, 8009, hello, 2001:xxx:1xx0:8xx7::axc:cxx2)
cid_legacy = {'transport': 'tcp', 'traddr': 'FE80::aaaa:BBBB:cccc:dddd', 'trsvcid': '8009', 'subsysnqn': 'hello', 'host-traddr': '', 'host-iface': 'tun0', 'src-addr': '', 'host-nqn': ''}
ifaces = {'lo': {4: [IPv4Address('127.0.0.1')], 6: [IPv6Address('::1')]},
'wallgarden0': {4: [IPv4Address('172.16.90.1')], 6: []},
'mpqemubr0': {4: [IPv4Address('10.164.167.1')], 6: []},
'lxdbr0': {4: [IPv4Address('172.16.82.1')], 6: [IPv6Address('fe80::216:3eff:fe0d:f967')]},
'wg0': {4: [IPv4Address('10.8.3.10')], 6: [IPv6Address('fe80::db90:ce2:8e5f:670b')]},
'dock0': {4: [IPv4Address('192.168.80.13')], 6: [IPv6Address('fe80::4a2a:e3ff:fe5b:d32f')]},
'tun0': {4: [IPv4Address('10.172.194.130')], 6: [IPv6Address('2001:xxx:1xx0:8xx7::axc:cxx2'), IPv6Address('fe80::18b8:409b:9be:4db7')]}}
_cid_matches_tid = True
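One observation from the dump above (not necessarily the bug): the addresses appear in mixed upper/lower case (FE80::aaaa:BBBB:cccc:dddd), which is a reminder that IPv6 comparisons should be done on ipaddress objects rather than raw strings. This is an illustrative snippet, not the nvme-stas matching code:

```python
import ipaddress

# ipaddress normalizes case (and zero-compression), so object comparison
# succeeds where naive string comparison would not.
a = ipaddress.ip_address('FE80::AAAA:BBBB:CCCC:DDDD')
b = ipaddress.ip_address('fe80::aaaa:bbbb:cccc:dddd')

assert a == b                                  # objects compare equal
assert 'FE80::AAAA:BBBB:CCCC:DDDD' != str(b)   # raw strings do not
```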
I am not sure what the test exactly does. Is this an expected failure? I'm running on an Ubuntu 23.10 host with udev version 253.5.
Thanks,
Olivier
Run via:
git clean -ffdx && make rpm
which fails as follows (note that meson dist produces a .tar.xz while the spec looks for a .tar.gz):
...
Created /home/glimcb/Dev/cto/nvme-stas/.build/meson-dist/nvme-stas-1.0.tar.xz
rpmbuild -ba .build/nvme-stas.spec
error: Bad source: .build/meson-dist/nvme-stas-%{version_no_tilde}.tar.gz: No such file or directory
make: *** [Makefile:68: rpm] Error 1
Libnvme/nvme-cli can use a JSON config file (schema available at https://github.com/linux-nvme/libnvme/blob/master/doc/config-schema.json) via the -J option of the respective connect-all and connect commands. This is especially useful for large-scale systems with several NVMe objects, where one needs to apply specific settings to individual subsystems/ports during the respective connect-all/connect in a single go, e.g.:
nvme connect-all -J /etc/nvme/config.json
nvme connect -J /etc/nvme/config.json -n <subsys_nqn> -t -a <target_IP>
But it turns out nvme-stas neither handles this file nor provides an option to do so. It looks like this needs to be addressed in nvme-stas itself, so that it can process the JSON config file the same way libnvme/nvme-cli already do.
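For illustration, a minimal file of the kind in question might look like the sketch below. The field names follow the libnvme config schema (hosts containing subsystems containing ports), but every value here is made up:

```json
[
  {
    "hostnqn": "nqn.2014-08.org.nvmexpress:uuid:11111111-2222-3333-4444-555555555555",
    "subsystems": [
      {
        "nqn": "nqn.2016-06.io.example:subsys1",
        "ports": [
          { "transport": "tcp", "traddr": "192.168.1.10", "trsvcid": "4420" }
        ]
      }
    ]
  }
]
```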
I am working on packaging nvme-stas for Debian/Ubuntu (see https://bugs.debian.org/1032650). Debian/Ubuntu has autopkgtest for running tests against the installed binary package. Can you add documentation on how to run the test cases against the installed nvme-stas?
When a storage array needs to reboot for a software upgrade and this takes more than 10 minutes, nvme-stas will drop all connections.
This is a known issue when connecting manually using "nvme connect", and it can be avoided by adding "-l -1" to retry indefinitely.
With nvme-stas, however, connections are established automatically, so nvme-stas should make the retry indefinite as well.
Starting nvme-stas_stacd_1 ... done
Starting nvme-stas_stafd_1 ... done
Attaching to nvme-stas_stacd_1, nvme-stas_stafd_1
stafd_1 | Cannot determine which NVMe options the kernel supports
stafd_1 | Kernel does not appear to support all the options needed to run this program. Consider updating to a later kernel version.
stafd_1 | Connecting to the system bus.
stafd_1 | Connecting to the system bus.
stacd_1 | Cannot determine which NVMe options the kernel supports
stacd_1 | Kernel does not appear to support all the options needed to run this program. Consider updating to a later kernel version.
stacd_1 | Connecting to the system bus.
stacd_1 | Connecting to the system bus.
stafd_1 | avahi-daemon not available, operating w/o mDNS discovery.
stafd_1 | Unable to save last known config: [Errno 2] No such file or directory: '/run/stafd/last-known-config.pickle'
stacd_1 | Unable to save last known config: [Errno 2] No such file or directory: '/run/stacd/last-known-config.pickle'
I finally managed to find some time to take a closer look at the nvme-stas codebase and would like to point out a couple of suggestions from a usability point of view:
gdbus-codegen can build a convenient ready-to-use client API. Related to the previous point, the benefit of having clearly defined signatures is direct access to structure members and native data types. This can be somewhat achieved by running sta[cf]d.py --idl, but it's quite heavy on a build system. See also https://dbus.freedesktop.org/doc/dbus-api-design.html and https://dbus.freedesktop.org/doc/dbus-specification.html#container-types
Currently, any user can enable the trace feature:
gdbus call -y -d org.nvmexpress.stac -o /org/nvmexpress/stac -m org.freedesktop.DBus.Properties.Set org.nvmexpress.stac.debug tron '<true>'
This also enables debugging output in the syslog/journal and so on. It should not be possible for regular users to enable or disable this. The services should check at the D-Bus level whether the caller has UID 0 and only allow the property change in that case.
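A sketch of the policy being asked for. On a real bus the caller's UID would come from the standard org.freedesktop.DBus GetConnectionUnixUser method; here the lookup is injected so the logic is testable without a bus, and all class and method names are illustrative, not nvme-stas code:

```python
class DebugProperties:
    """Gate the 'tron' debug property behind a root check."""

    def __init__(self, get_caller_uid):
        # get_caller_uid: callable(sender) -> numeric UID of the D-Bus caller
        self._get_caller_uid = get_caller_uid
        self.tron = False

    def set_tron(self, sender: str, value: bool) -> bool:
        """Apply the change only for root callers; return True on success."""
        if self._get_caller_uid(sender) != 0:
            return False  # reject non-root callers
        self.tron = value
        return True
```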
Would it be possible to add an initial tag, e.g. v1.0 or v1.0-rc0? It would make packaging simpler.
Noticed a change in behavior from SLES15 SP4's nvme-stas-1.1.9-150400.3.9.3 to SP5's nvme-stas-2.2.2-150500.3.6.1 in terms of PDC handling on receipt of mDNS goodbye packet.
SLES15 SP5 Config:
# uname -r
5.14.21-150500.53-default
# rpm -qa|grep nvme
libnvme-devel-1.4+18.g932f9c37e05a-150500.4.3.1.x86_64
nvme-cli-2.4+17.gf4cfca93998a-150500.4.3.1.x86_64
libnvme-mi1-1.4+18.g932f9c37e05a-150500.4.3.1.x86_64
python3-libnvme-1.4+18.g932f9c37e05a-150500.4.3.1.x86_64
nvme-cli-bash-completion-2.4+17.gf4cfca93998a-150500.4.3.1.noarch
libnvme1-1.4+18.g932f9c37e05a-150500.4.3.1.x86_64
nvme-stas-2.2.2-150500.3.6.1.x86_64
nvme-cli-zsh-completion-2.4+17.gf4cfca93998a-150500.4.3.1.noarch
Whenever an NVMe/TCP link is down, 'stafctl ls' would remove the respective PDC entries from the staf cache in SP4's nvme-stas-1.1.9-150400.3.9.3, but that is not the case with SP5's nvme-stas-2.2.2-150500.3.6.1.
In the presence of mDNS goodbye packets:
Step 3 above is where the nvme-stas behavior has changed from SP4 to SP5: nvme-stas no longer disconnects the PDC on receipt of the mDNS goodbye packet.
So is this change in behavior intentional? What necessitated it?
On Fedora33 when building rpm like this:
rpmbuild --nodeps --build-in-place -ba .build/nvme-stas.spec
I get this error during the %install stage:
Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.hQKIyt
+ umask 022
+ cd /root
+ '[' .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64 '!=' / ']'
+ rm -rf .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64
++ dirname .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64
+ mkdir -p .build/rpm-pkg/BUILDROOT
+ mkdir .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64
+ DESTDIR=.build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64
+ /usr/bin/meson install -C noarch-redhat-linux-gnu --no-rebuild
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/mesonbuild/mesonmain.py", line 228, in run
return options.run_func(options)
File "/usr/lib/python3.9/site-packages/mesonbuild/minstall.py", line 720, in run
installer.do_install(datafilename)
File "/usr/lib/python3.9/site-packages/mesonbuild/minstall.py", line 511, in do_install
self.install_subdirs(d, dm, destdir, fullprefix) # Must be first, because it needs to delete the old subtree.
File "/usr/lib/python3.9/site-packages/mesonbuild/minstall.py", line 540, in install_subdirs
self.do_copydir(d, i.path, full_dst_dir, i.exclude, i.install_mode, dm)
File "/usr/lib/python3.9/site-packages/mesonbuild/minstall.py", line 445, in do_copydir
raise ValueError(f'dst_dir must be absolute, got {dst_dir}')
ValueError: dst_dir must be absolute, got .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64/etc/stas
Installing subdir /root/etc/stas to .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64/etc/stas
error: Bad exit status from /var/tmp/rpm-tmp.hQKIyt (%install)
RPM build errors:
Bad exit status from /var/tmp/rpm-tmp.hQKIyt (%install)
make: *** [Makefile:68: rpm] Error 1
looks like this should be absolute?
+ DESTDIR=.build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64
reading https://docs.fedoraproject.org/en-US/packaging-guidelines/Meson ...
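A plausible workaround, sketched below: normalize DESTDIR to an absolute path before invoking meson install, since meson's installer rejects relative destination dirs ("dst_dir must be absolute"). The helper name is made up:

```shell
# Turn a possibly-relative DESTDIR into an absolute one before running
# `meson install`.
make_absolute() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;               # already absolute
        *)  printf '%s/%s\n' "$(pwd)" "$1" ;;   # prefix the current directory
    esac
}

# Example (paths from the log above):
# DESTDIR="$(make_absolute .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64)" \
#     /usr/bin/meson install -C noarch-redhat-linux-gnu --no-rebuild
```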
on
# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.5 (Ootpa)
# uname -r
5.16.7-1.el8.elrepo.x86_64
I see
# /usr/bin/python3 -u /usr/sbin/stafd --syslog
Traceback (most recent call last):
File "/usr/lib64/python3.6/site-packages/staslib/stas.py", line 299, in __init__
options = [ option.split('=')[0].strip() for option in f.readlines()[0].rstrip('\n').split(',') ]
OSError: [Errno 22] Invalid argument
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/sbin/stafd", line 91, in <module>
from staslib import stas, avahi # pylint: disable=wrong-import-position
File "/usr/lib64/python3.6/site-packages/staslib/stas.py", line 329, in <module>
NVME_OPTIONS = NvmeOptions()
File "/usr/lib64/python3.6/site-packages/staslib/stas.py", line 303, in __init__
LOG.warning('Cannot determine which NVMe options the kernel supports')
AttributeError: 'NoneType' object has no attribute 'warning'
Root cause is here - import ordering...
see https://github.com/linux-nvme/nvme-stas/blob/main/stafd.py#L91
from staslib import stas, avahi # pylint: disable=wrong-import-position
which calls https://github.com/linux-nvme/nvme-stas/blob/main/staslib/stas.py#L329
NVME_OPTIONS = NvmeOptions()
which can call LOG, but LOG is defined only later, here: https://github.com/linux-nvme/nvme-stas/blob/main/stafd.py#L106
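The hazard can be sketched in isolation (illustrative, not the nvme-stas fix): module-level code that logs at import time crashes if the logger is configured only after the import, so the logger must either be created before the import or accessed lazily:

```python
import logging

LOG = None  # the daemon configures this only after importing staslib

def get_log() -> logging.Logger:
    # Lazy accessor: fall back to a default logger instead of crashing
    # with AttributeError when LOG has not been set yet.
    return LOG if LOG is not None else logging.getLogger('stas')

# Safe even before the daemon has set LOG:
get_log().warning('Cannot determine which NVMe options the kernel supports')
```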
nvme-stas is not trimming the trsvcid and traddr fields. In nvme-cli we have a trim function in place for this:
https://github.com/linux-nvme/nvme-cli/blob/3ebf5ff8a70c85dff8cd7a5c470d4f3bb55134fe/fabrics.c#L115
Not trimming results in nvme-stas not being able to connect to targets that make use of the padding the specification allows:
stacd[19046]: (tcp, 192.168.1.1 , 4420 , nqn.XXX:subsystem.mdns_vs_1_tcpnvme_sub_2, eth0) IP address is not valid
@martin-belanger is this expected?
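A minimal sketch of the kind of trimming being requested, in Python terms (nvme-cli's helper is in C; this hypothetical function is illustrative, not nvme-stas code):

```python
def trim_cid_fields(cid: dict) -> dict:
    """Strip surrounding whitespace (spec-allowed padding) from string
    fields such as traddr and trsvcid, leaving other values untouched."""
    return {key: value.strip() if isinstance(value, str) else value
            for key, value in cid.items()}
```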
stafd_1 | Connecting to the system bus.
stafd_1 | Connecting to the system bus.
stacd_1 | Connecting to the system bus.
stacd_1 | Connecting to the system bus.
stafd_1 | avahi-daemon service available, zeroconf supported.
stacd_1 | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0 - Connection established!
stacd_1 | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1 - Connection established!
stacd_1 | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0 - Disconnect initiated
stacd_1 | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1 - Disconnect initiated
stacd_1 | nvme0: failed to disconnect, error 2
stacd_1 | nvme1: failed to disconnect, error 2
more verbose:
stafd_1 | Connecting to the system bus.
stafd_1 | Connecting to the system bus.
stafd_1 | Avahi._configure_browsers() - stypes_to_rm = []
stafd_1 | Avahi._configure_browsers() - stypes_to_add = ['_nvme-disc._tcp']
stafd_1 | Publishing an object at /org/nvmexpress/staf.
stafd_1 | Registering a service name org.nvmexpress.staf.
stafd_1 | avahi-daemon service available, zeroconf supported.
stafd_1 | Avahi._configure_browsers() - stypes_to_rm = []
stafd_1 | Avahi._configure_browsers() - stypes_to_add = []
stacd_1 | Stac._audit_connections() - tids = [(pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88), (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88)]
stacd_1 | Publishing an object at /org/nvmexpress/stac.
stacd_1 | Connecting to the system bus.
stacd_1 | Connecting to the system bus.
stacd_1 | Registering a service name org.nvmexpress.stac.
stacd_1 | Stac._connect_to_staf() - Connected to staf
stacd_1 | Controller._try_to_connect() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) Found existing control device: nvme0
stacd_1 | lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme0
stacd_1 | Controller._try_to_connect() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) Found existing control device: nvme1
stacd_1 | lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme1
stacd_1 | lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme1
stacd_1 | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0 - Connection established!
stacd_1 | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1 - Connection established!
stafd_1 | Service._config_ctrls()
stafd_1 | Staf._config_ctrls_finish() - configured_ctrl_list = []
stafd_1 | Staf._config_ctrls_finish() - discovered_ctrl_list = []
stafd_1 | Staf._config_ctrls_finish() - referral_ctrl_list = []
stafd_1 | Staf._config_ctrls_finish() - controllers_to_add = []
stafd_1 | Staf._config_ctrls_finish() - controllers_to_del = []
stacd_1 | Service._config_ctrls()
stacd_1 | Stac._config_ctrls_finish() - configured_ctrl_list = []
stacd_1 | Stac._config_ctrls_finish() - discovered_ctrl_list = []
stacd_1 | Stac._config_ctrls_finish() - controllers_to_add = []
stacd_1 | Stac._config_ctrls_finish() - controllers_to_del = [(pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88), (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88)]
stacd_1 | Controller.disconnect() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0
stacd_1 | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0 - Disconnect initiated
stacd_1 | Controller.disconnect() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1
stacd_1 | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1 - Disconnect initiated
stacd_1 | nvme0: failed to disconnect, error 2
stacd_1 | nvme1: failed to disconnect, error 2
stacd_1 | Controller._on_disconn_success() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0
stacd_1 | Controller._on_disconn_success() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1
stacd_1 | Service.remove_controller()
stacd_1 | Service._remove_ctrl_from_dict() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0
stacd_1 | Controller.kill() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88)
stacd_1 | Controller._release_resources() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88)
stacd_1 | Service.remove_controller()
stacd_1 | Service._remove_ctrl_from_dict() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1
stacd_1 | Controller.kill() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88)
stacd_1 | Controller._release_resources() - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88)
stacd_1 | Service._config_ctrls()
stacd_1 | Stac._config_ctrls_finish() - configured_ctrl_list = []
stacd_1 | Stac._config_ctrls_finish() - discovered_ctrl_list = []
stacd_1 | Stac._config_ctrls_finish() - controllers_to_add = []
stacd_1 | Stac._config_ctrls_finish() - controllers_to_del = []
We're facing a distribution-packaging issue where we cannot afford to provide unique /etc/nvme/hostnqn or /etc/nvme/hostid files, for various reasons (e.g. a generic pre-built rootfs image). This is typically not a problem for nvme-cli and libnvme-based tools, since a stable hostnqn is autogenerated as a fallback. The hostid is often missing as well, but until now that has not really been a problem either. However, nvme-stas demands that those files exist unless hostnqn or hostid are specified in sys.conf.
Lines 351 to 360 in 45c1985
Line 9 in cf18029
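For completeness, the sys.conf route looks roughly like this. This is a sketch only: the [Host] section and the nqn/id key names are my reading of the nvme-stas documentation, and the values are placeholders:

```ini
# /etc/stas/sys.conf (sketch) - supply hostnqn/hostid inline so that
# /etc/nvme/hostnqn and /etc/nvme/hostid need not exist on the image.
[Host]
nqn=nqn.2014-08.org.nvmexpress:uuid:ffffffff-ffff-ffff-ffff-ffffffffffff
id=ffffffff-ffff-ffff-ffff-ffffffffffff
```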
A minor issue that I came across when packaging -rc5 in Fedora: the python3-libnvme dependency, in terms of a pkg-config module, is not provided by the upstream libnvme tarball and is not (typically) present in Fedora/RHEL either. Using the 2.0-rc5 tarball, which has the following snippet (later changed by commit 8d691bd), I still get a meson failure even though the fallback argument is specified; meson seems to ignore it:
#libnvme_dep = dependency('python3-libnvme', fallback : ['libnvme', 'libnvme_dep'], version : '>= 1.2')
libnvme_dep = dependency('python3-libnvme', fallback : ['libnvme', 'libnvme_dep'])
Found pkg-config: /usr/bin/pkg-config (1.8.0)
Found CMake: /usr/bin/cmake (3.24.1)
Run-time dependency python3-libnvme found: NO (tried pkgconfig and cmake)
Looking for a fallback subproject for the dependency python3-libnvme
test/meson.build:15:4: ERROR: Automatic wrap-based subproject downloading is disabled
With the above-mentioned change on git master, I still get a non-fatal meson failure:
Found pkg-config: /usr/bin/pkg-config (1.8.0)
Found CMake: /usr/bin/cmake (3.24.1)
Run-time dependency python3-libnvme found: NO (tried pkgconfig and cmake)
Looking for a fallback subproject for the dependency python3-libnvme
Automatic wrap-based subproject downloading is disabled
Subproject libnvme is buildable: NO (disabling)
Dependency python3-libnvme from subproject libnvme found: NO (subproject failed to configure)
I've changed python3-libnvme to plain libnvme in our builds in the meantime, though I'm not sure what your original intention was. But again, this is just a very minor glitch.
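For what it's worth, the workaround I'm using looks roughly like this (a sketch; whether the subproject's libnvme_dep fallback also covers the Python bindings is an open question):

```meson
# Sketch of the workaround: query the plain 'libnvme' pkg-config module,
# keeping the subproject fallback for builds where wrap downloads are allowed.
libnvme_dep = dependency('libnvme', fallback : ['libnvme', 'libnvme_dep'])
```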
The interface to create connections is the pseudo-device "/dev/nvme-fabrics". This is a blocking interface where only one connection can be made at a time. When multiple processes or threads try to make connections at the same time, they simply block on "/dev/nvme-fabrics" until the previous connection request completes. A successful connection usually takes only a few milliseconds to complete. However, it takes the kernel about 3 seconds to return from an unsuccessful connection (this is probably a fixed internal kernel timeout value). This means that all processes or threads blocked on "/dev/nvme-fabrics" can remain in a blocked state for several seconds when several connections fail.
Let's say 10 TCP connections are being requested and are all pending on "/dev/nvme-fabrics". And let's say that all connection requests except for one are going to fail due to a momentary network issue. Finally, let's say that the only connection to succeed is the last one that the kernel will attempt. In this case, it will take 27 seconds (3 sec * 9 connections) before the kernel attempts to make the last connection request.
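The arithmetic above can be sketched as follows. This is not nvme-stas code; the 3-second timeout per failed attempt is the approximate kernel behavior described above, and successful attempts are assumed to take negligible time:

```python
# Connect attempts serialize on /dev/nvme-fabrics, so a pending connect
# waits behind every earlier attempt. Assume ~3 s per failed attempt and
# ~0 s per successful one.
KERNEL_CONNECT_TIMEOUT_S = 3

def queue_wait_s(outcomes_ahead):
    """Seconds a connect waits behind the given earlier attempts.
    outcomes_ahead: list of booleans, True = that attempt succeeds."""
    return sum(0 if ok else KERNEL_CONNECT_TIMEOUT_S for ok in outcomes_ahead)

# 10 requests queued; the first 9 fail, the last one succeeds:
print(queue_wait_s([False] * 9))  # -> 27
```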
While connect operations are pending, let's say that we want to delete the connection (nvme disconnect --nqn [NQN]) that is to be attempted last by the kernel. This command has no idea that a connect is currently pending. In fact, the command will check that there is no connection for the requested NQN and will simply return. Unfortunately, a few seconds later the pending connect will finally get executed by the kernel and the connection will be established.
We have seen this situation with nvme-stas, especially during network outages where nvme-stas tries to delete connections while the connect operation is pending on "/dev/nvme-fabrics". We end up with connections being made that should not exist.
We need to change the disconnect code in nvme-stas to take into account potentially incomplete connect operations.
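One possible shape for such a fix is a disconnect that re-checks for a late-arriving controller. This is a sketch only: find_ctrl and do_disconnect are hypothetical callbacks, and the re-check delay is an assumption, not a value taken from nvme-stas:

```python
import time

def safe_disconnect(nqn, find_ctrl, do_disconnect, recheck_delay=5.0):
    """Disconnect the controller for `nqn`, tolerating a connect that is
    still queued on /dev/nvme-fabrics (hypothetical callback names)."""
    ctrl = find_ctrl(nqn)
    if ctrl is not None:
        do_disconnect(ctrl)
        return True
    # No controller yet: a connect may still be pending in the kernel.
    # Re-check after the kernel has had time to finish queued attempts.
    time.sleep(recheck_delay)
    ctrl = find_ctrl(nqn)
    if ctrl is not None:
        do_disconnect(ctrl)
        return True
    return False
```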
The upstream project says:
Warning
netifaces needs a new maintainer. al45tair is no longer able to maintain it or make new releases due to work commitments.
https://github.com/al45tair/netifaces
Are you really sure you want to depend on this code base?