
nvme-stas's Introduction

STorage Appliance Services (STAS)


What does nvme-stas provide?

  • A Central Discovery Controller (CDC) client for Linux
  • Asynchronous Event Notifications (AEN) handling
  • Automated NVMe subsystem connection controls
  • Error handling and reporting
  • Automatic (zeroconf) and Manual configuration

Overview

STAS is composed of two services, STAF and STAC, running on the Host computer.

STAF - STorage Appliance Finder. The tasks performed by STAF include:

  • Register with the Avahi daemon for service type _nvme-disc._tcp. This allows STAF to locate Central or Direct Discovery Controllers (CDC, DDC) with zero-touch provisioning (ZTP). STAF also allows users to manually enter CDCs and DDCs in a configuration file (/etc/stas/stafd.conf) when users prefer not to use ZTP.
  • Connect to discovered or configured CDCs or DDCs.
  • Retrieve the list of storage subsystems using the "get log page" command.
  • Maintain a cache of the discovered storage subsystems.
  • Provide a D-Bus interface where 3rd party applications can retrieve the data about the Discovery Controller connections (e.g. log pages).

STAC - STorage Appliance Connector. The tasks performed by STAC include:

  • Read the list of storage subsystems from STAF over D-Bus.
  • Similar to STAF, STAC can also read a list of storage subsystems to connect to from a configuration file.
  • Set up the I/O controller connections.
  • Provide a D-Bus interface where 3rd party applications can retrieve data about the I/O controller connections.

Definitions

  • CDC - Central Discovery Controller
  • DDC - Direct Discovery Controller
  • DC - Discovery Controller (either a CDC or a DDC)
  • ZTP - Zero-touch provisioning
  • AEN - Asynchronous Event Notification

Design

stafd and stacd use the GLib main loop. The GLib Python module provides several low-level building blocks needed by stafd and stacd, such as timers and signal handlers. In addition, many Python modules, such as dasbus and pyudev, "play nice" with GLib.

stafd connects to the avahi-daemon, which it uses to detect Central Discovery Controllers (CDC) and Direct Discovery Controllers (DDC). When Discovery Controllers (DC) are found with Avahi's help, stafd uses libnvme to set up persistent connections and retrieve the discovery log pages.
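
As a rough illustration of this event-driven design (a minimal sketch, not code from stafd/stacd):

from gi.repository import GLib

loop = GLib.MainLoop()

def on_timer():
    # Periodic housekeeping (e.g. retrying failed connections) would go here.
    return GLib.SOURCE_CONTINUE   # keep the timer armed

GLib.timeout_add_seconds(60, on_timer)   # invoke on_timer() every 60 seconds
loop.run()   # dispatches timers, D-Bus messages, udev events, etc.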

Daemonization

stafd and stacd are managed as systemd services. The following operations are supported (here showing only stafd, but the same operations apply to stacd):

  • systemctl start stafd. Start daemon.
  • systemctl stop stafd. Stop daemon. The SIGTERM signal is used to tell the daemon to stop.
  • systemctl restart stafd. Effectively a stop + start.
  • systemctl reload stafd. Reload configuration. This is done in real time without restarting the daemon. The SIGHUP signal is used to tell the daemon to reload its configuration file.
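
For illustration, here is how a GLib-based daemon can wire those signals into its main loop (a minimal sketch assuming the GLib/PyGObject design described above, not the actual stafd code):

import signal
from gi.repository import GLib

loop = GLib.MainLoop()

def on_sigterm():                 # systemctl stop -> SIGTERM
    loop.quit()                   # leave the main loop and terminate cleanly
    return GLib.SOURCE_REMOVE

def on_sighup():                  # systemctl reload -> SIGHUP
    print('re-reading configuration')   # a real daemon re-parses its .conf file here
    return GLib.SOURCE_CONTINUE

GLib.unix_signal_add(GLib.PRIORITY_HIGH, signal.SIGTERM, on_sigterm)
GLib.unix_signal_add(GLib.PRIORITY_HIGH, signal.SIGHUP, on_sighup)
loop.run()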

Configuration

As stated before, stafd can automatically locate discovery controllers with the help of Avahi and connect to them, and stacd can automatically set up the I/O connections to discovered storage subsystems. However, stafd and stacd can also operate in a non-automatic mode based on manually entered configuration. In other words, discovery controllers and/or storage subsystems can be entered manually, which gives users more flexibility. The configuration for each daemon is found in /etc/stas/stafd.conf and /etc/stas/stacd.conf respectively. The configuration files also provide additional parameters, such as log-level attributes, used mainly for debugging.

The following configuration files are defined:

/etc/stas/sys.conf - Consumer: stafd + stacd. Contains system-wide (i.e. host) configuration such as the Host NQN, the Host ID, and the Host Symbolic Name. Changes to this file can be made manually or with the help of the stasadm utility.

For example, stasadm hostnqn -f /etc/nvme/hostnqn writes the Host NQN to the file /etc/nvme/hostnqn, but also adds an entry to /etc/stas/sys.conf to indicate where the Host NQN has been saved.

This gives nvme-stas the flexibility of defining its own Host parameters or using the same parameters defined by libnvme and nvme-cli.

/etc/stas/stafd.conf - Consumer: stafd. Contains configuration specific to stafd. Discovery controllers can be manually added or excluded in this file.

/etc/stas/stacd.conf - Consumer: stacd. Contains configuration specific to stacd. I/O controllers can be manually added or excluded in this file.

D-Bus interface

The interface to stafd and stacd is D-Bus. This allows other programs, such as stafctl and stacctl, to communicate with the daemons. This also gives third parties the ability to write their own applications that interact with stafd and stacd. For example, someone could write a GUI that displays the discovery controllers as well as all the discovery log pages in a "pretty" window. The next table provides info about the two D-Bus interfaces.

Component D-Bus address
stafd org.nvmexpress.staf
stacd org.nvmexpress.stac
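
For example, a third-party Python application could query stafd with dasbus roughly as follows (the service name and object path match those shown in the stafd logs quoted later in this document; the method name is hypothetical):

from dasbus.connection import SystemMessageBus

bus = SystemMessageBus()
# stafd publishes an object at /org/nvmexpress/staf under the name org.nvmexpress.staf.
stafd = bus.get_proxy('org.nvmexpress.staf', '/org/nvmexpress/staf')
print(stafd.ListControllers())   # hypothetical method; consult the installed IDL for the real API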

Companion programs: stafctl and stacctl

stafctl and stacctl are utilities that allow users to interact with stafd and stacd respectively. This is a model used by several programs, such as systemctl with systemd.

At a minimum, these utilities provide debug tools, but they could also provide some configuration capabilities (TBD).

Packages

stafd and stacd as well as their companion programs stafctl and stacctl are released together in a package called "nvme-stas", for STorage Appliance Services (e.g. stas-1.0.0-1.x86_64.rpm or stas_1.0.0_amd64.deb).

Dependencies

stafd/stacd require Linux kernel 5.14 or later.

The following packages must be installed to use stafd/stacd:

Debian packages (tested on Ubuntu 20.04):

sudo apt-get install -y python3-pyudev python3-systemd python3-gi
sudo apt-get install -y python3-dasbus # Ubuntu 22.04
OR:
sudo pip3 install dasbus # Ubuntu 20.04

RPM packages (tested on Fedora 34..35 and SLES15):

sudo dnf install -y python3-dasbus python3-pyudev python3-systemd python3-gobject

STAF - STorage Appliance Finder

Component Description
/usr/sbin/stafd A daemon that finds (discovers) NVMe storage appliances.
/usr/bin/stafctl A companion shell utility for stafd.
/etc/stas/stafd.conf Configuration file

stafd configuration file

The configuration file is named /etc/stas/stafd.conf. This file contains configuration parameters for the stafd daemon. One of the things you may want to configure is the IP address of the discovery controller(s) you want stafd to connect to. The configuration file contains a description of all the parameters that can be configured.
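
For instance, a manually configured discovery controller entry might look like the following (an illustrative sketch; the comments in the installed stafd.conf describe the authoritative syntax and key names):

[Controllers]
controller = transport=tcp;traddr=192.168.1.20;trsvcid=8009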

Service discovery with Avahi

stafd can automatically find and set up connections to Discovery Controllers. To do this, stafd registers with Avahi, the mDNS/DNS-SD (Service Discovery) daemon. Discovery Controllers that advertise themselves with service type _nvme-disc._tcp will be recognized by Avahi, which will inform stafd.

Not receiving mDNS packets?

If stafd is not detecting any discovery controllers through Avahi, it could simply be that the mDNS packets are being suppressed by your firewall. If you know for a fact that the discovery controllers are advertising themselves with mDNS packets, make sure that the Avahi daemon is receiving them as follows:

avahi-browse -t -r _nvme-disc._tcp

If you're not seeing anything, then check whether your firewall allows mDNS packets.
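
On systems running firewalld, for example, mDNS can be allowed with:

sudo firewall-cmd --permanent --add-service=mdns
sudo firewall-cmd --reload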

Why is Avahi failing to discover services on some interfaces?

Linux limits the number of multicast group memberships that a host can belong to. The default is 20. For Avahi to monitor mDNS (multicast DNS) packets on all interfaces, the host computer must be able to register one multicast group per interface. These interfaces can be physical or logical. For example, configuring 10 VLANs on a physical interface increases the total number of interfaces by 10. If the total number of interfaces is greater than the limit of 20, then Avahi won't be able to monitor all interfaces.

The limit can be changed by configuring the sysctl variable igmp_max_memberships, which is defined in the kernel documentation.
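
For example, the limit can be inspected and raised with sysctl (the value 40 below is just an example):

sysctl net.ipv4.igmp_max_memberships                 # show the current limit (default 20)
sudo sysctl -w net.ipv4.igmp_max_memberships=40      # raise it until the next reboot

To make the change persistent across reboots, drop the setting into a file under /etc/sysctl.d/ (the file name is arbitrary):

echo 'net.ipv4.igmp_max_memberships = 40' | sudo tee /etc/sysctl.d/90-igmp.conf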

STAC - STorage Appliance Connector

File name Description
/usr/sbin/stacd A daemon that connects to NVMe storage appliances.
/usr/bin/stacctl A companion shell utility for stacd.
/etc/stas/stacd.conf Configuration file

stacd configuration file

The configuration file is named /etc/stas/stacd.conf. In this file you can configure storage appliances that stacd will connect to. By default, stacd uses information (log pages) collected from stafd to connect to storage appliances. However, you can also manually enter IP addresses of storage appliances in this file.
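
Similar to stafd.conf, a manually configured I/O controller entry might look like this (an illustrative sketch; the key names are assumptions, and the comments in the installed stacd.conf describe the authoritative syntax):

[Controllers]
controller = transport=tcp;traddr=192.168.1.30;trsvcid=4420;subsysnqn=nqn.1988-11.com.example:subsys1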

System configuration

A host must be provided with a Host NQN and a Host ID. nvme-stas will not run without these two mandatory configuration parameters. To follow in the footsteps of nvme-cli and libnvme, nvme-stas reads the Host NQN and ID from the same two files those tools use by default:

  1. /etc/nvme/hostnqn
  2. /etc/nvme/hostid

Using the same configuration files will ensure consistency between nvme-stas, nvme-cli, and libnvme. On the other hand, nvme-stas can operate with a different Host NQN and/or ID. In that case, one can specify them in /etc/stas/sys.conf.

A new optional configuration parameter introduced in TP8010, the Host Symbolic Name, can also be specified in /etc/stas/sys.conf. The schema/documentation for /etc/stas/sys.conf can be found in /etc/stas/sys.conf.doc.
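
Putting it all together, an illustrative /etc/stas/sys.conf might look like this (a sketch based on the parameters described above; consult /etc/stas/sys.conf.doc for the authoritative schema):

[Host]
nqn=file:///etc/nvme/hostnqn
id=file:///etc/nvme/hostid
symname=my-host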

Build, install, unit tests

STAS uses the meson build system. Since STAS is a Python project, there is no code to build. However, the code needs to be installed using meson. Unit tests can also be run with meson.

Using meson

Invoke meson to configure the project:

meson setup .build

The command meson setup .build need only be called once. This analyzes the project and the host computer to determine if all the necessary tools and dependencies are available. The result is saved to the directory named .build.

To compile the code:

meson compile -C .build

To install / uninstall the code:

meson install -C .build    # Wrapper for ninja install -C .build
ninja uninstall -C .build  # Unfortunately there's no meson wrapper

To run the unit tests:

meson test -C .build

For more information about testing, please refer to: TESTING.md

Alternate approach using Good-ole make

Recognizing that many people are not familiar with meson, we're providing a second way to install the code using the more familiar configure script combined with make.

./configure
make

This performs the same operations as the meson approach described above. The configure script is automatically invoked (using default configuration parameters) when running make by itself.

make command Description
make Build the code. This command automatically invokes the ./configure script (using default configuration parameters) if the project is not already configured.
make install Install the code. Requires root privileges (you will be asked to enter your password).
make uninstall Uninstall the code. Requires root privileges (you will be asked to enter your password).
make test Run the unit tests.
make clean Clean build artifacts, but do not remove meson's configuration. That is, the configuration in .build is preserved.
make purge Remove all build artifacts including the .build directory.
make update-subprojects Bring subprojects like libnvme up to date.
make black Verify that the source code complies with the black coding style.

Containerization

Use published image (optional)

docker pull ghcr.io/linux-nvme/nvme-stas:main

Build your own image (optional)

docker-compose up --build

Run the services using docker-compose like this:

docker-compose up

Run the companion programs stafctl and stacctl like this:

docker-compose exec stafd stafctl ls
docker-compose exec stafd stafctl status

docker-compose exec stacd stacctl ls
docker-compose exec stacd stacctl status

Dependencies: dbus, avahi.

Generating the documentation

nvme-stas uses the following programs to generate the documentation. These can be installed as shown in the "dependencies" section below.

  • xsltproc - Used to convert DocBook XML notation to "man pages" and "html pages".
  • gdbus-codegen - Used to convert D-Bus IDL to DocBook XML notation.

Dependencies

The following packages must be installed to generate the documentation:

Debian packages (tested on Ubuntu 20.04, 22.04):

sudo apt-get install -y docbook-xml docbook-xsl xsltproc libglib2.0-dev

RPM packages (tested on Fedora 34..37 and SLES15):

sudo dnf install -y docbook-style-xsl libxslt glib2-devel

Configuring and building the man and html pages

By default, the documentation is not built. You need to run configure as follows to tell meson that you want to build the documentation. You may need to first purge any previous configuration.

make purge
./configure -Dman=true -Dhtml=true
make

Generating RPM and/or DEB packages

make rpm
make deb

nvme-stas's People

Contributors

bdrung, dependabot[bot], glimchb, igaw, keithbusch, martin-belanger, martin-gpy, ogayot


nvme-stas's Issues

D-Bus configuration files allow receive_*

The package lint check says

[    4s] nvme-stas.x86_64: W: dbus-policy-allow-receive <allow receive_sender="org.nvmexpress.stac"/> /usr/share/dbus-1/system.d/org.nvmexpress.stac.conf
[    4s] nvme-stas.x86_64: W: dbus-policy-allow-receive <allow receive_sender="org.nvmexpress.stac"/> /usr/share/dbus-1/system.d/org.nvmexpress.stac.conf
[    4s] nvme-stas.x86_64: W: dbus-policy-allow-receive <allow receive_sender="org.nvmexpress.staf"/> /usr/share/dbus-1/system.d/org.nvmexpress.staf.conf
[    4s] nvme-stas.x86_64: W: dbus-policy-allow-receive <allow receive_sender="org.nvmexpress.staf"/> /usr/share/dbus-1/system.d/org.nvmexpress.staf.conf
[    4s] allow receive_* is normally not needed as that is the default.

Add tag for packaging

Would it be possible to add an initial tag, e.g. v1.0 or v1.0-rc0? It would make the packaging simpler.

meson: Dependency python3-libnvme not found

libnvme_dep = dependency('python3-libnvme', fallback: ['libnvme', 'libnvme_dep'], version : '>= 1.2', required: false) # Only required to run the tests

A minor issue that I came across when packaging -rc5 in Fedora: the python3-libnvme dependency, in terms of a pkg-config module, is not provided by the upstream libnvme tarball and is not (typically) present in Fedora/RHEL either. Using the 2.0-rc5 tarball, which has the following snippet (later changed by commit 8d691bd), I still get a meson failure even though the fallback argument is specified, which meson seemingly ignores:

    #libnvme_dep = dependency('python3-libnvme', fallback : ['libnvme', 'libnvme_dep'], version : '>= 1.2')
    libnvme_dep = dependency('python3-libnvme', fallback : ['libnvme', 'libnvme_dep'])
Found pkg-config: /usr/bin/pkg-config (1.8.0)
Found CMake: /usr/bin/cmake (3.24.1)
Run-time dependency python3-libnvme found: NO (tried pkgconfig and cmake)
Looking for a fallback subproject for the dependency python3-libnvme

test/meson.build:15:4: ERROR: Automatic wrap-based subproject downloading is disabled

With the above mentioned change on git master I still get a non-fatal meson failure:

Found pkg-config: /usr/bin/pkg-config (1.8.0)
Found CMake: /usr/bin/cmake (3.24.1)
Run-time dependency python3-libnvme found: NO (tried pkgconfig and cmake)
Looking for a fallback subproject for the dependency python3-libnvme
Automatic wrap-based subproject downloading is disabled
Subproject  libnvme is buildable: NO (disabling)
Dependency python3-libnvme from subproject libnvme found: NO (subproject failed to configure)

I've changed python3-libnvme in our builds to plain libnvme in the meantime; not sure what your original intention was, though. But again, this is just a very minor glitch.

make rpm fails on fedora-36

I'm seeing the following issue with make rpm on v2.1.1 when building on Fedora 36.

Is this a known problem?

[20/87] gcc  -o subprojects/libnvme/src/libnvme.so.1.2.0 subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_cleanup.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_fabrics.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_filters.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_ioctl.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_linux.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_log.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_tree.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_util.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_json.c.o -Wl,--as-needed -Wl,--no-undefined -shared -fPIC -Wl,--start-group -Wl,-soname,libnvme.so.1 -Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -Wl,-dT,/home/jmeneghi/repos/nvme-stas/.package_note-nvme-stas-2.1.2-1.fc36.x86_64.ld -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection subprojects/libnvme/ccan/libccan.a -Wl,--version-script=/home/jmeneghi/repos/nvme-stas/subprojects/libnvme/src/libnvme.map /usr/lib64/libjson-c.so /usr/lib64/libssl.so /usr/lib64/libcrypto.so -Wl,--end-group
FAILED: subprojects/libnvme/src/libnvme.so.1.2.0 
gcc  -o subprojects/libnvme/src/libnvme.so.1.2.0 subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_cleanup.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_fabrics.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_filters.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_ioctl.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_linux.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_log.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_tree.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_util.c.o subprojects/libnvme/src/libnvme.so.1.2.0.p/nvme_json.c.o -Wl,--as-needed -Wl,--no-undefined -shared -fPIC -Wl,--start-group -Wl,-soname,libnvme.so.1 -Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -Wl,-dT,/home/jmeneghi/repos/nvme-stas/.package_note-nvme-stas-2.1.2-1.fc36.x86_64.ld -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection subprojects/libnvme/ccan/libccan.a -Wl,--version-script=/home/jmeneghi/repos/nvme-stas/subprojects/libnvme/src/libnvme.map /usr/lib64/libjson-c.so /usr/lib64/libssl.so /usr/lib64/libcrypto.so -Wl,--end-group
/usr/bin/ld: cannot open linker script file /home/jmeneghi/repos/nvme-stas/.package_note-nvme-stas-2.1.2-1.fc36.x86_64.ld: No such file or directory
collect2: error: ld returned 1 exit status
[21/87] gcc -Isubprojects/libnvme/src/libnvme-mi.so.1.2.0.p -Isubprojects/libnvme/src -I../subprojects/libnvme/src -Isubprojects/libnvme -I../subprojects/libnvme -Isubprojects/libnvme/ccan -I../subprojects/libnvme/ccan -Isubprojects/libnvme/internal -I../subprojects/libnvme/internal -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O0 -fomit-frame-pointer -D_GNU_SOURCE -include internal/config.h -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fPIC -MD -MQ subprojects/libnvme/src/libnvme-mi.so.1.2.0.p/nvme_mi.c.o -MF subprojects/libnvme/src/libnvme-mi.so.1.2.0.p/nvme_mi.c.o.d -o subprojects/libnvme/src/libnvme-mi.so.1.2.0.p/nvme_mi.c.o -c ../subprojects/libnvme/src/nvme/mi.c
[22/87] gcc -Isubprojects/libnvme/test/main-test.p -Isubprojects/libnvme/test -I../subprojects/libnvme/test -Isubprojects/libnvme -I../subprojects/libnvme -Isubprojects/libnvme/ccan -I../subprojects/libnvme/ccan -Isubprojects/libnvme/src -I../subprojects/libnvme/src -Isubprojects/libnvme/internal -I../subprojects/libnvme/internal -I/usr/include/json-c -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O0 -fomit-frame-pointer -D_GNU_SOURCE -include internal/config.h -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -MD -MQ subprojects/libnvme/test/main-test.p/test.c.o -MF subprojects/libnvme/test/main-test.p/test.c.o.d -o subprojects/libnvme/test/main-test.p/test.c.o -c ../subprojects/libnvme/test/test.c
[23/87] gcc -Isubprojects/libnvme/src/libnvme-mi-test.so.p -Isubprojects/libnvme/src -I../subprojects/libnvme/src -Isubprojects/libnvme -I../subprojects/libnvme -Isubprojects/libnvme/ccan -I../subprojects/libnvme/ccan -Isubprojects/libnvme/internal -I../subprojects/libnvme/internal -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O0 -fomit-frame-pointer -D_GNU_SOURCE -include internal/config.h -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fPIC -MD -MQ subprojects/libnvme/src/libnvme-mi-test.so.p/nvme_mi.c.o -MF subprojects/libnvme/src/libnvme-mi-test.so.p/nvme_mi.c.o.d -o subprojects/libnvme/src/libnvme-mi-test.so.p/nvme_mi.c.o -c ../subprojects/libnvme/src/nvme/mi.c
ninja: build stopped: subcommand failed.
error: Bad exit status from /var/tmp/rpm-tmp.AX6nLw (%build)

RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.AX6nLw (%build)
make: *** [Makefile:98: .build/rpmbuild] Error 1
fedora-vm2:nvme-stas(branch_v2.1.1) > hostnamectl
 Static hostname: fedora-vm2
       Icon name: computer-vm
         Chassis: vm
      Machine ID: 5c079f3479774fdbac04330b59e0e7a7
         Boot ID: 2f2d3e3fa116409e8406ab48e36d9657
  Virtualization: kvm
Operating System: Fedora Linux 36 (Server Edition) 
     CPE OS Name: cpe:/o:fedoraproject:fedora:36
          Kernel: Linux 6.1.7-100.fc36.x86_64
    Architecture: x86-64
 Hardware Vendor: QEMU
  Hardware Model: Standard PC _i440FX + PIIX, 1996_

nvme-stas needs additional config options to apply in-band auth settings

Libnvme/nvme-cli can use a config JSON file (schema available at https://github.com/linux-nvme/libnvme/blob/master/doc/config-schema.json) using the -J option for the respective connect-all & connect commands. This is especially useful for large-scale systems with several NVMe objects where one needs to apply specific settings to individual subsystems/ports during the respective connect-all/connect in a single go. For example:

nvme connect-all -J /etc/nvme/config.json
nvme connect -J /etc/nvme/config.json -n <subsys_nqn> -t -a <target_IP>

But it turns out nvme-stas doesn't handle this, nor does it provide an option to do so. It looks like this needs to be addressed in nvme-stas itself so that it can process the config JSON file, similar to how libnvme/nvme-cli already do.

Create/Delete operations get executed in the wrong order

The interface to create connections is the pseudo-device "/dev/nvme-fabrics". This is a blocking interface where only one connection can be made at a time. When multiple processes or threads try to make connections at the same time they simply block on "/dev/nvme-fabrics" until the previous connection request completes. A successful connection usually takes only a few milliseconds to complete. However it takes the kernel about 3 seconds to return from an unsuccessful connection (this is probably a fixed internal kernel timeout value). This means that all processes or threads blocked on "/dev/nvme-fabrics" can remain in a blocked state for several seconds when several connections fail.

Let's say 10 TCP connections are being requested and are all pending on "/dev/nvme-fabrics". And let's say that all connection requests except for 1 are going to fail due to a momentary network issue. Finally, let's say that the only connection to succeed is the last one that the kernel will attempt. In this case, it will take 27 seconds (3 sec * 9 connections) before the kernel attempts to make the last connection request.

While connect operations are pending to complete, let's say that we want to delete the connection (nvme disconnect --nqn [NQN]) that is to be attempted last by the kernel. This command has no idea that a connect is currently pending. In fact, the command will check that there is no connection for the requested NQN and will simply return. Unfortunately, a few seconds later the pending connect will finally get executed by the kernel and the connection will be established.

We have seen this situation with nvme-stas especially during network outages where nvme-stas tries to delete connections while the connect operation is pending on "/dev/nvme-fabrics". We end up with connections that should not exist being made.

We need to change the disconnect code in nvme-stas to take into account potentially incomplete connect operations.

Udev test issues with esoteric network interfaces

Hello,

(as discussed in #406, I'm filing another issue for the sake of completeness)

In Ubuntu, we have an openstack infrastructure that we can use for testing things in VMs (this is different from the infrastructure where autopkgtests run). When running the test-suite in a VM deployed by this openstack, there are multiple udev test-cases failing:

  • Test Case 1
  • Test Case 8
  • Test Case 11
  • Test Case 12
  • Legacy Test Case D6
  • Legacy Test Case F6

What is particular in these machines is the presence of a VXLAN network interface having a link-local IPv6 set (it is also a member of a bridge but I don't think this is relevant). I am not exactly sure what this type of interface does but I managed to reproduce the failure after running the following steps:

ip link add type vxlan id 1234 dstport 0
ip link set vxlan0 up 

I also reproduced the failures with dummy interfaces but I'm not sure if that's a real use-case.

ip link add type dummy
ip link set dummy0 up
======================================================================
FAIL: test__cid_matches_tid (__main__.Test.test__cid_matches_tid)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test-udev.py", line 520, in test__cid_matches_tid
    self.assertEqual(
AssertionError: True != False : Test Case 1 failed

----------------------------------------------------------------------
Ran 6 tests in 0.025s

FAILED (failures=1)

Thanks,
Olivier

nvme-stas does not disconnect PDC on receipt of mDNS goodbye packet

Noticed a change in behavior from SLES15 SP4's nvme-stas-1.1.9-150400.3.9.3 to SP5's nvme-stas-2.2.2-150500.3.6.1 in terms of PDC handling on receipt of mDNS goodbye packet.

SLES15 SP5 Config:

# uname -r
5.14.21-150500.53-default

# rpm -qa|grep nvme
libnvme-devel-1.4+18.g932f9c37e05a-150500.4.3.1.x86_64
nvme-cli-2.4+17.gf4cfca93998a-150500.4.3.1.x86_64
libnvme-mi1-1.4+18.g932f9c37e05a-150500.4.3.1.x86_64
python3-libnvme-1.4+18.g932f9c37e05a-150500.4.3.1.x86_64
nvme-cli-bash-completion-2.4+17.gf4cfca93998a-150500.4.3.1.noarch
libnvme1-1.4+18.g932f9c37e05a-150500.4.3.1.x86_64
nvme-stas-2.2.2-150500.3.6.1.x86_64
nvme-cli-zsh-completion-2.4+17.gf4cfca93998a-150500.4.3.1.noarch

Whenever an NVMe/TCP link is down, 'stafctl ls' would remove the respective PDC entries in the staf cache in SP4's nvme-stas-1.1.9-150400.3.9.3, but that is not the case with SP5's nvme-stas-2.2.2-150500.3.6.1.

In the presence of mDNS goodbye packets:

  1. The NVMe/TCP link down will cause an immediate cache expiration in avahi.
  2. The host will start reconnecting to the PDC and any I/O controllers that were disconnected as the result of the link down for 10 minutes (i.e. the default ctrl_loss_tmo).
  3. nvme-stas will disconnect from the existing PDC upon receipt of the goodbye packet and immediate cache expiration. All PDC reconnects will stop. But IO controller reconnects will continue.
  4. After the link comes back online, an mDNS announcement is issued and the host should connect back to the PDC and to all the IO subsystems returned in the DLPE of the PDC.

Step 3 above is where the nvme-stas behavior has changed from SP4 to SP5 where nvme-stas no longer disconnects the PDC on receipt of the mDNS goodbye packet.

So is this change in behavior intentional? What necessitated it?

Unable to save last known config: [Errno 2] No such file or directory: '/run/stafd/last-known-config.pickle'

Starting nvme-stas_stacd_1 ... done
Starting nvme-stas_stafd_1 ... done
Attaching to nvme-stas_stacd_1, nvme-stas_stafd_1
stafd_1  | Cannot determine which NVMe options the kernel supports
stafd_1  | Kernel does not appear to support all the options needed to run this program. Consider updating to a later kernel version.
stafd_1  | Connecting to the system bus.
stafd_1  | Connecting to the system bus.
stacd_1  | Cannot determine which NVMe options the kernel supports
stacd_1  | Kernel does not appear to support all the options needed to run this program. Consider updating to a later kernel version.
stacd_1  | Connecting to the system bus.
stacd_1  | Connecting to the system bus.
stafd_1  | avahi-daemon not available, operating w/o mDNS discovery.
stafd_1  | Unable to save last known config: [Errno 2] No such file or directory: '/run/stafd/last-known-config.pickle'
stacd_1  | Unable to save last known config: [Errno 2] No such file or directory: '/run/stacd/last-known-config.pickle'

Dependency on libnvme instead of python3-libnvme?

Is nvme-stas really depending on libnvme or only on python3-libnvme? I am asking because my packaging build fails due to

meson.build:libnvme_dep = dependency('libnvme', fallback : ['libnvme', 'libnvme_dep'])

and I don't see any direct dependency on the c library.

nvme-stas discovers pcie nvme devices?

stafd_1  | Connecting to the system bus.
stafd_1  | Connecting to the system bus.
stafd_1  | Avahi._configure_browsers()        - stypes_to_rm  = []
stafd_1  | Avahi._configure_browsers()        - stypes_to_add = ['_nvme-disc._tcp']
stafd_1  | Publishing an object at /org/nvmexpress/staf.
stafd_1  | Registering a service name org.nvmexpress.staf.
stafd_1  | avahi-daemon service available, zeroconf supported.
stafd_1  | Avahi._configure_browsers()        - stypes_to_rm  = []
stafd_1  | Avahi._configure_browsers()        - stypes_to_add = []
stacd_1  | Stac._audit_connections()          - tids = [(pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88), (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88), (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0C3TC88), (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0C9TC88)]
stacd_1  | Publishing an object at /org/nvmexpress/stac.
stacd_1  | Connecting to the system bus.
stacd_1  | Connecting to the system bus.
stacd_1  | Registering a service name org.nvmexpress.stac.
stacd_1  | Stac._connect_to_staf()            - Connected to staf
stacd_1  | Controller._try_to_connect()       - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) Found existing control device: nvme0
nvme-stas_stacd_1 exited with code 139
stafd_1  | Service._config_ctrls()
stafd_1  | NameResolver.resolve_ctrl_async()  - resolving '172.20.165.201'
stafd_1  | NameResolver.resolve_ctrl_async()  - resolving '172.21.165.201'
stafd_1  | NameResolver.resolve_ctrl_async()  - resolved '172.20.165.201' -> 172.20.165.201
stafd_1  | NameResolver.resolve_ctrl_async()  - resolved '172.21.165.201' -> 172.21.165.201
stafd_1  | Staf._config_ctrls_finish()        - configured_ctrl_list = [{'transport': 'tcp', 'traddr': '172.20.165.201', 'trsvcid': '2023', 'subsysnqn': 'nqn.2014-08.org.nvmexpress.discovery'}, {'transport': 'tcp', 'traddr': '172.21.165.201', 'trsvcid': '3023', 'subsysnqn': 'nqn.2014-08.org.nvmexpress.discovery'}]
stafd_1  | Staf._config_ctrls_finish()        - discovered_ctrl_list = []
stafd_1  | Staf._config_ctrls_finish()        - referral_ctrl_list   = []
stafd_1  | Staf._config_ctrls_finish()        - controllers_to_add   = [(tcp, 172.20.165.201, 2023, nqn.2014-08.org.nvmexpress.discovery), (tcp, 172.21.165.201, 3023, nqn.2014-08.org.nvmexpress.discovery)]
stafd_1  | Staf._config_ctrls_finish()        - controllers_to_del   = []
stafd_1  | Controller._try_to_connect()       - (tcp, 172.20.165.201, 2023, nqn.2014-08.org.nvmexpress.discovery) Connecting to nvme control with cfg={'hdr_digest': False, 'data_digest': False, 'keep_alive_tmo': 30}
nvme-stas_stafd_1 exited with code 139

nvme0 - Registration error. Result:0x0000, Status:0x4002 - Invalid Field in Command

Seeing the following errors from stafd when the SFSS controller is reset on the host.

Jan 4 08:28:43 rhel-storage-09 kernel: nvme nvme0: queue_size 128 > ctrl sqsize 31, clamping down
Jan 4 08:28:43 rhel-storage-09 kernel: nvme0: Unknown(0x21), Invalid Field in Command (sct 0x0 / sc 0x2) DNR
Jan 4 08:28:43 rhel-storage-09 stafd[1124]: (tcp, 172.18.210.70, 8009, nqn.1988-11.com.dell:SFSS:1:20230103150601e8, enp10s0f0np0) | nvme0 - Registration error. Result:0x0000, Status:0x4002 - Invalid Field in Command: A reserved coded value or an unsupported value in a defined field.

To reproduce:

  1. generate I/O to all the nvme devices
  2. Issue resets to all the nvme controllers - example command below:

echo 1 > /sys/devices/virtual/nvme-fabrics/ctl/nvme0/reset_controller

IndexError: list index out of range

$ PYTHONPATH=nvme-stas/.build:nvme-stas/.build/subprojects/libnvme python3 nvme-stas/test/test-nvme_options.py
..E.
======================================================================
ERROR: test_fabrics_empty_file (__main__.Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "nvme-stas/test/test-nvme_options.py", line 34, in test_fabrics_empty_file
    nvme_options = stas.NvmeOptions()
  File "nvme-stas/.build/staslib/stas.py", line 475, in __init__
    options = [option.split('=')[0].strip() for option in f.readlines()[0].rstrip('\n').split(',')]
IndexError: list index out of range

----------------------------------------------------------------------
Ran 4 tests in 0.254s

FAILED (errors=1)
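
The traceback shows readlines()[0] being evaluated on an empty file. A minimal sketch of the kind of guard that avoids the crash (an assumed fix, not the actual patch):

with open('/dev/nvme-fabrics') as f:   # the file that NvmeOptions parses
    lines = f.readlines()

# readlines() returns [] for an empty file, so guard before indexing [0].
options = [option.split('=')[0].strip() for option in lines[0].rstrip('\n').split(',')] if lines else []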

Do not allow to enable tracing for non root users

Currently, any user can enable the trace feature:

gdbus call -y -d org.nvmexpress.stac -o /org/nvmexpress/stac -m org.freedesktop.DBus.Properties.Set org.nvmexpress.stac.debug tron '<true>'

This also enables debugging information in the syslog/journal and so on. It should not be possible for regular users to enable or disable tracing. The services should check at the D-Bus level whether the caller has UID 0 and only allow the change of the property if that is the case, as sketched below.
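
A rough sketch of such a check for a GLib/Gio-based service follows (the helper name and wiring are illustrative; only the org.freedesktop.DBus.GetConnectionUnixUser call is a standard bus-daemon API):

from gi.repository import Gio, GLib

def caller_is_root(connection: Gio.DBusConnection, sender: str) -> bool:
    # Ask the bus daemon for the Unix UID behind the caller's unique bus name.
    reply = connection.call_sync(
        'org.freedesktop.DBus', '/org/freedesktop/DBus',
        'org.freedesktop.DBus', 'GetConnectionUnixUser',
        GLib.Variant('(s)', (sender,)),
        GLib.VariantType('(u)'),
        Gio.DBusCallFlags.NONE, -1, None)
    return reply.unpack()[0] == 0   # permit the property change only for UID 0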

D-Bus configuration files placement

Usually distributions place the D-Bus configuration under ${datadir}/dbus-1/system.d, but nvme-stas seems to hard-code the placement to /etc:

dbus_conf_dir = join_paths(etcdir, 'dbus-1', 'system.d')

Would it be possible to make this more packaging friendly?

Packaging: place staslib in arch-independent path

python3.install_sources(
    files_to_install,
    pure: false,
    subdir: 'staslib',
)

One more issue found during packaging - as this is a pure Python project with no architecture-specific code, we aim for a single package for all architectures (i.e. noarch in the rpm world). However, the meson project places staslib in an architecture-specific directory (e.g. /usr/lib64) and also ignores any supplied libdir meson argument.

Setting pure: true in the code snippet above seems to do the trick, although I'm not sure what else it would break.

Also, per https://mesonbuild.com/Python-3-module.html the python3 meson module is deprecated.

ValueError: dst_dir must be absolute

On Fedora33 when building rpm like this:

rpmbuild --nodeps --build-in-place -ba .build/nvme-stas.spec

I get this error during %install stage :

Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.hQKIyt
+ umask 022
+ cd /root
+ '[' .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64 '!=' / ']'
+ rm -rf .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64
++ dirname .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64
+ mkdir -p .build/rpm-pkg/BUILDROOT
+ mkdir .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64
+ DESTDIR=.build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64
+ /usr/bin/meson install -C noarch-redhat-linux-gnu --no-rebuild
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/mesonbuild/mesonmain.py", line 228, in run
    return options.run_func(options)
  File "/usr/lib/python3.9/site-packages/mesonbuild/minstall.py", line 720, in run
    installer.do_install(datafilename)
  File "/usr/lib/python3.9/site-packages/mesonbuild/minstall.py", line 511, in do_install
    self.install_subdirs(d, dm, destdir, fullprefix) # Must be first, because it needs to delete the old subtree.
  File "/usr/lib/python3.9/site-packages/mesonbuild/minstall.py", line 540, in install_subdirs
    self.do_copydir(d, i.path, full_dst_dir, i.exclude, i.install_mode, dm)
  File "/usr/lib/python3.9/site-packages/mesonbuild/minstall.py", line 445, in do_copydir
    raise ValueError(f'dst_dir must be absolute, got {dst_dir}')
ValueError: dst_dir must be absolute, got .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64/etc/stas
Installing subdir /root/etc/stas to .build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64/etc/stas
error: Bad exit status from /var/tmp/rpm-tmp.hQKIyt (%install)


RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.hQKIyt (%install)
make: *** [Makefile:68: rpm] Error 1

looks like this should be absolute?

+ DESTDIR=.build/rpm-pkg/BUILDROOT/nvme-stas-1.0-1.fc33.x86_64

reading https://docs.fedoraproject.org/en-US/packaging-guidelines/Meson ...

Scaling issues with nvme-stas

On a scaled-up (roughly 450 namespaces from 80 subsystems with 2 NVMe/TCP controllers each) SLES15 SP5 MU host, one ends up seeing the following stacd errors in /var/log/messages during I/O testing with faults:

stacd[26003]: Udev._process_udev_event()         - Error while polling fd: 3 [90414]
stacd[26003]: Udev._process_udev_event()         - Error while polling fd: 3 [90394]
stacd[26003]: Udev._process_udev_event()         - Error while polling fd: 3 [90580]
...

This manifests as dropped connections, failed paths, I/O errors, etc. on the SP5 host.

Config details below:

# uname -r
5.14.21-150500.55.7-default
# rpm -qa | grep nvme
libnvme-devel-1.4+27.g5ae1c39-150500.4.3.1.26528.1.PTF.1212598.x86_64
nvme-stas-2.2.2-150500.3.6.1.x86_64
nvme-cli-bash-completion-2.4+24.ga1ee2099-150500.4.3.1.26528.1.PTF.1212598.noarch
libnvme1-1.4+27.g5ae1c39-150500.4.3.1.26528.1.PTF.1212598.x86_64
nvme-cli-zsh-completion-2.4+24.ga1ee2099-150500.4.3.1.26528.1.PTF.1212598.noarch
python3-libnvme-1.4+27.g5ae1c39-150500.4.3.1.26528.1.PTF.1212598.x86_64
nvme-cli-2.4+24.ga1ee2099-150500.4.3.1.26528.1.PTF.1212598.x86_64
libnvme-mi1-1.4+27.g5ae1c39-150500.4.3.1.26528.1.PTF.1212598.x86_64

Test Udev (legacy test G6) fails (when interface has multiple IPv6 addresses)

Hello,

I am trying to address various autopkgtest failures in Ubuntu.
Currently, legacy test G6 (from test-udev.py) is consistently failing in our test infrastructure. When running the test locally (i.e., meson test -C build), it succeeds if I only have one IPv6 address, but it fails if two addresses are configured on a specific interface.

======================================================================
FAIL: test__cid_matches_tid (test-udev.Test.test__cid_matches_tid)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/autopkgtest.2Tar77/autopkgtest_tmp/test/test-udev.py", line 693, in test__cid_matches_tid
    self.assertEqual(
AssertionError: False != True : Legacy Test Case G6 failed

I added some debug logs before the failing call to self.assertEqual:

match is False
ipv6_addrs is ['2001:xxx:1xx0:8xx7::axc:cxx2', 'fe80::18b8:409b:9be:4db7']
get_ipaddress_obj is 2001:xxx:1xx0:8xx7::axc:cxx2
tid = (tcp, FE80::aaaa:BBBB:cccc:dddd, 8009, hello, 2001:xxx:1xx0:8xx7::axc:cxx2)
cid_legacy = {'transport': 'tcp', 'traddr': 'FE80::aaaa:BBBB:cccc:dddd', 'trsvcid': '8009', 'subsysnqn': 'hello', 'host-traddr': '', 'host-iface': 'tun0', 'src-addr': '', 'host-nqn': ''}
ifaces = {'lo': {4: [IPv4Address('127.0.0.1')], 6: [IPv6Address('::1')]},
    'wallgarden0': {4: [IPv4Address('172.16.90.1')], 6: []},
    'mpqemubr0': {4: [IPv4Address('10.164.167.1')], 6: []},
    'lxdbr0': {4: [IPv4Address('172.16.82.1')], 6: [IPv6Address('fe80::216:3eff:fe0d:f967')]},
    'wg0': {4: [IPv4Address('10.8.3.10')], 6: [IPv6Address('fe80::db90:ce2:8e5f:670b')]},
    'dock0': {4: [IPv4Address('192.168.80.13')], 6: [IPv6Address('fe80::4a2a:e3ff:fe5b:d32f')]},
    'tun0': {4: [IPv4Address('10.172.194.130')], 6: [IPv6Address('2001:xxx:1xx0:8xx7::axc:cxx2'), IPv6Address('fe80::18b8:409b:9be:4db7')]}}
_cid_matches_tid = True

I am not sure what the test exactly does. Is this an expected failure? I'm running on an Ubuntu 23.10 host with udev version 253.5.

Thanks,
Olivier

AttributeError: 'NoneType' object has no attribute 'warning'

on

# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.5 (Ootpa)

# uname -r
5.16.7-1.el8.elrepo.x86_64

I see

# /usr/bin/python3 -u /usr/sbin/stafd --syslog
Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/staslib/stas.py", line 299, in __init__
    options = [ option.split('=')[0].strip() for option in f.readlines()[0].rstrip('\n').split(',') ]
OSError: [Errno 22] Invalid argument

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/sbin/stafd", line 91, in <module>
    from staslib import stas, avahi # pylint: disable=wrong-import-position
  File "/usr/lib64/python3.6/site-packages/staslib/stas.py", line 329, in <module>
    NVME_OPTIONS = NvmeOptions()
  File "/usr/lib64/python3.6/site-packages/staslib/stas.py", line 303, in __init__
    LOG.warning('Cannot determine which NVMe options the kernel supports')
AttributeError: 'NoneType' object has no attribute 'warning'

Root cause is here - import ordering...

see https://github.com/linux-nvme/nvme-stas/blob/main/stafd.py#L91

from staslib import stas, avahi # pylint: disable=wrong-import-position

which calls https://github.com/linux-nvme/nvme-stas/blob/main/staslib/stas.py#L329

NVME_OPTIONS = NvmeOptions()

which can call LOG, but LOG is defined only later, here: https://github.com/linux-nvme/nvme-stas/blob/main/stafd.py#L106

Bad whatis entries for several man pages

lintian complains about following:

W: nvme-stas: bad-whatis-entry [usr/share/man/man5/org.nvmexpress.stac.5.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man5/org.nvmexpress.stac.debug.5.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man5/org.nvmexpress.staf.5.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man5/org.nvmexpress.staf.debug.5.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man8/stacd.service.8.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man8/stafd.service.8.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man8/stas-config.target.8.gz]
W: nvme-stas: bad-whatis-entry [usr/share/man/man8/stas-config@.service.8.gz]

Explanation:

A manual page should start with a NAME section, which lists the program name and a brief description. The NAME section is used to generate a database that can be queried by commands like apropos and whatis. You are seeing this tag because lexgrog was unable to parse the NAME section.

Manual pages for multiple programs, functions, or files should list each separated by a comma and a space, followed by - and a common description.

Listed items may not contain any spaces. A manual page for a two-level command such as fs listacl must look like fs_listacl so the list is read correctly.

Please refer to the lexgrog(1) manual page, the groff_man(7) manual page, and the groff_mdoc(7) manual page for details.

test_new: gi.repository.GLib.GError: g-io-error-quark: Could not connect: No such file or directory (1)

One test case fails when running inside a minimal Debian chroot:

==================================== 9/14 ====================================
test:         Test Avahi
start time:   14:03:39
duration:     0.11s
result:       exit status 1
command:      MALLOC_PERTURB_=0 PYTHONPATH=/<<PKGBUILDDIR>>/obj-x86_64-linux-gnu:/<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/subprojects/libnvme /usr/bin/python3 /<<PKGBUILDDIR>>/test/test-avahi.py
----------------------------------- stderr -----------------------------------
E
======================================================================
ERROR: test_new (__main__.Test.test_new)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/<<PKGBUILDDIR>>/test/test-avahi.py", line 17, in test_new
    srv = avahi.Avahi(sysbus, lambda: "ok")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/staslib/avahi.py", line 121, in __init__
    self._sysbus.connection.signal_subscribe(
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/dasbus/connection.py", line 169, in connection
    self._connection = self._get_connection()
                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/dasbus/connection.py", line 327, in _get_connection
    return self._provider.get_system_bus_connection()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/dasbus/connection.py", line 58, in get_system_bus_connection
    return Gio.bus_get_sync(
           ^^^^^^^^^^^^^^^^^
gi.repository.GLib.GError: g-io-error-quark: Could not connect: No such file or directory (1)

----------------------------------------------------------------------
Ran 1 test in 0.002s

FAILED (errors=1)
==============================================================================

Installed packages inside the chroot: debhelper-compat dh-python docbook-xml docbook-xsl iproute2 libglib2.0-dev-bin meson pyflakes3 pylint python3-dasbus python3-gi python3-lxml python3-nvme python3-pyfakefs python3-pyudev python3-systemd python3:any xsltproc

D-Bus interfaces suggestions

I finally managed to find some time to take a closer look at the nvme-stas codebase and would like to point out couple of suggestions from usability point of view:

  • Most of the method calls return a simple string containing some JSON structure, judging by the argument naming. This is quite unfortunate from the consumer's point of view, as consumers need to take additional steps to parse the JSON structure, e.g. in C apps. Moreover, I was unable to find any schema or definition of the returned structure, nor any API stability guarantees. D-Bus offers a strong data type system, among other objective principles, including complex container types and a general Variant type for even more flexibility. Unless the returned JSON structure is free-form and not clearly defined, I would strongly suggest coming up with a concrete return signature.
  • The D-Bus interface XML files should be provided as external files so that binding generators like gdbus-codegen can build a convenient ready-to-use client API. Related to the previous point, the benefit of having clearly defined signatures is direct access to structure members and the native data types provided. This can be somewhat done by running sta[cf]d.py --idl, but it's quite heavy on a build system.
  • And related to that point, it's much easier to provide annotations and documentation within the interface XML files...

See also https://dbus.freedesktop.org/doc/dbus-api-design.html and https://dbus.freedesktop.org/doc/dbus-specification.html#container-types
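
As an illustration of the first two points, an external interface XML file with a typed return signature could look like this (hypothetical; not the actual nvme-stas IDL):

<interface name="org.nvmexpress.staf">
  <!-- a(ssss): an array of (transport, traddr, trsvcid, subsysnqn) structs
       instead of an opaque JSON string -->
  <method name="ListControllers">
    <arg name="controllers" direction="out" type="a(ssss)"/>
  </method>
</interface>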

`make rpm` fails on `version_no_tilde` undefined in the spec file

run by

git clean -ffdx  && make rpm

see

...
Created /home/glimcb/Dev/cto/nvme-stas/.build/meson-dist/nvme-stas-1.0.tar.xz
rpmbuild -ba .build/nvme-stas.spec
error: Bad source: .build/meson-dist/nvme-stas-%{version_no_tilde}.tar.gz: No such file or directory
make: *** [Makefile:68: rpm] Error 1

Use auto-generated hostnqn as a fallback

We're facing a distribution-packaging-specific issue where we cannot afford to provide unique /etc/nvme/hostnqn or /etc/nvme/hostid files, for various reasons (e.g. a generic pre-built rootfs image). This is typically not a problem for nvme-cli and libnvme-based tools, as a stable hostnqn is auto-generated as a fallback. Not so much for hostid, which is often missing, but that was not really a problem either.

However nvme-stas demands those files to exist unless hostnqn or hostid are specified in sys.conf.

nvme-stas/staslib/stas.py, lines 351 to 360 (at 45c1985):

def hostnqn(self):
    '''@brief return the host NQN
    @return: Host NQN
    @raise: Host NQN is mandatory. The program will terminate if a
            Host NQN cannot be determined.
    '''
    try:
        value = self.__get_value('Host', 'nqn', '/etc/nvme/hostnqn')
    except FileNotFoundError as ex:
        sys.exit('Error reading mandatory Host NQN (see stasadm --help): %s', ex)
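
A sketch of the requested fallback behavior (assumed, not current nvme-stas code; the stasadm hostid sub-command is presumed to mirror the hostnqn one documented earlier in this README):

import os
import subprocess

for subcmd, path in (('hostnqn', '/etc/nvme/hostnqn'), ('hostid', '/etc/nvme/hostid')):
    if not os.path.exists(path):
        # Generate the missing file instead of terminating.
        subprocess.run(['stasadm', subcmd, '-f', path], check=True)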

nvme: failed to disconnect, error 2

@martin-belanger is this expected?

stafd_1  | Connecting to the system bus.
stafd_1  | Connecting to the system bus.
stacd_1  | Connecting to the system bus.
stacd_1  | Connecting to the system bus.
stafd_1  | avahi-daemon service available, zeroconf supported.
stacd_1  | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0 - Connection established!
stacd_1  | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1 - Connection established!
stacd_1  | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0 - Disconnect initiated
stacd_1  | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1 - Disconnect initiated
stacd_1  | nvme0: failed to disconnect, error 2
stacd_1  | nvme1: failed to disconnect, error 2

more verbose:

stafd_1  | Connecting to the system bus.
stafd_1  | Connecting to the system bus.
stafd_1  | Avahi._configure_browsers()        - stypes_to_rm  = []
stafd_1  | Avahi._configure_browsers()        - stypes_to_add = ['_nvme-disc._tcp']
stafd_1  | Publishing an object at /org/nvmexpress/staf.
stafd_1  | Registering a service name org.nvmexpress.staf.
stafd_1  | avahi-daemon service available, zeroconf supported.
stafd_1  | Avahi._configure_browsers()        - stypes_to_rm  = []
stafd_1  | Avahi._configure_browsers()        - stypes_to_add = []
stacd_1  | Stac._audit_connections()          - tids = [(pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88), (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88)]
stacd_1  | Publishing an object at /org/nvmexpress/stac.
stacd_1  | Connecting to the system bus.
stacd_1  | Connecting to the system bus.
stacd_1  | Registering a service name org.nvmexpress.stac.
stacd_1  | Stac._connect_to_staf()            - Connected to staf
stacd_1  | Controller._try_to_connect()       - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) Found existing control device: nvme0
stacd_1  | lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme0
stacd_1  | Controller._try_to_connect()       - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) Found existing control device: nvme1
stacd_1  | lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme1
stacd_1  | lookup subsystem /sys/class/nvme-subsystem/nvme-subsys1/nvme1
stacd_1  | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0 - Connection established!
stacd_1  | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1 - Connection established!
stafd_1  | Service._config_ctrls()
stafd_1  | Staf._config_ctrls_finish()        - configured_ctrl_list = []
stafd_1  | Staf._config_ctrls_finish()        - discovered_ctrl_list = []
stafd_1  | Staf._config_ctrls_finish()        - referral_ctrl_list   = []
stafd_1  | Staf._config_ctrls_finish()        - controllers_to_add   = []
stafd_1  | Staf._config_ctrls_finish()        - controllers_to_del   = []
stacd_1  | Service._config_ctrls()
stacd_1  | Stac._config_ctrls_finish()        - configured_ctrl_list = []
stacd_1  | Stac._config_ctrls_finish()        - discovered_ctrl_list = []
stacd_1  | Stac._config_ctrls_finish()        - controllers_to_add   = []
stacd_1  | Stac._config_ctrls_finish()        - controllers_to_del   = [(pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88), (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88)]
stacd_1  | Controller.disconnect()            - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0
stacd_1  | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0 - Disconnect initiated
stacd_1  | Controller.disconnect()            - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1
stacd_1  | (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1 - Disconnect initiated
stacd_1  | nvme0: failed to disconnect, error 2
stacd_1  | nvme1: failed to disconnect, error 2
stacd_1  | Controller._on_disconn_success()   - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0
stacd_1  | Controller._on_disconn_success()   - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1
stacd_1  | Service.remove_controller()
stacd_1  | Service._remove_ctrl_from_dict()   - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88) | nvme0
stacd_1  | Controller.kill()                  - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88)
stacd_1  | Controller._release_resources()    - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0CETC88)
stacd_1  | Service.remove_controller()
stacd_1  | Service._remove_ctrl_from_dict()   - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88) | nvme1
stacd_1  | Controller.kill()                  - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88)
stacd_1  | Controller._release_resources()    - (pcie, , , nqn.2019-10.com.kioxia:KCM6XVUL1T60:71T0A0BGTC88)
stacd_1  | Service._config_ctrls()
stacd_1  | Stac._config_ctrls_finish()        - configured_ctrl_list = []
stacd_1  | Stac._config_ctrls_finish()        - discovered_ctrl_list = []
stacd_1  | Stac._config_ctrls_finish()        - controllers_to_add   = []
stacd_1  | Stac._config_ctrls_finish()        - controllers_to_del   = []

Grammar mistake "allows to do"

There is a grammar mistake in NEWS.md and stasadm.xml: "allows to do". It should be "allows one to do" or "allows doing".
