We have started a discussion on this issue on Discord.
The issue happens on version 0.3.0, and I've confirmed it on CentOS 7 and Ubuntu 20.
Reproduction steps:
Set up the nvmf app, create two devices, and add only one of them to the subsystem:
./build/bin/nvmf_tgt &
./scripts/rpc.py nvmf_create_transport -t TCP -u 16384 -m 8 -c 8192
./scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001 -d SPDK_Controller1
./scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -a 127.0.0.1 -s 4420
./scripts/rpc.py bdev_null_create Null0 8589934592 4096
./scripts/rpc.py bdev_null_create Null1 8589934592 4096
./scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Null0
Run the following code (note the sleep):
#include <libxnvme.h>
#include <libxnvme_pp.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	struct xnvme_opts opts1 = xnvme_opts_default();
	struct xnvme_opts opts2 = xnvme_opts_default();
	struct xnvme_dev *dev1;
	struct xnvme_dev *dev2;

	opts1.nsid = 1;
	opts2.nsid = 2;

	dev1 = xnvme_dev_open("127.0.0.1:4420", &opts1);

	sleep(20); /* add the second namespace via RPC while this sleeps */

	dev2 = xnvme_dev_open("127.0.0.1:4420", &opts2); /* fails; see logs below */

	xnvme_dev_close(dev1);
	xnvme_dev_close(dev2);

	return 0;
}
While the program is sleeping, add the second namespace:
sudo ./scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Null1
Here are the logs:
# DBG:xnvme_be.c:xnvme_be_factory-667: INFO: obtained backend instance be: 'spdk'
# DBG:xnvme_be_spdk_dev.c:_spdk_env_init-230: INFO: SPDK NVMe PCIe Driver registration -- BEGIN
# DBG:xnvme_be_spdk_dev.c:_spdk_env_init-235: INFO: skipping, already registered.
# DBG:xnvme_be_spdk_dev.c:_spdk_env_init-237: INFO: SPDK NVMe PCIe Driver registration -- END
[2022-06-09 13:20:24.558367] Starting SPDK v21.10 git sha1 4e4f11ff7 / DPDK 21.08.0 initialization...
[2022-06-09 13:20:24.558980] [ DPDK EAL parameters: [2022-06-09 13:20:24.559200] spdk [2022-06-09 13:20:24.559442] --no-shconf [2022-06-09 13:20:24.559667] -c 0x1 [2022-06-09 13:20:24.559891] --log-level=lib.eal:6 [2022-06-09 13:20:24.560124] --log-level=lib.cryptodev:5 [2022-06-09 13:20:24.560329] --log-level=user1:6 [2022-06-09 13:20:24.560526] --iova-mode=pa [2022-06-09 13:20:24.560754] --base-virtaddr=0x200000000000 [2022-06-09 13:20:24.560959] --match-allocations [2022-06-09 13:20:24.561153] --file-prefix=spdk_pid105448 [2022-06-09 13:20:24.561391] ]
EAL: No available 1048576 kB hugepages reported
TELEMETRY: No legacy callbacks, legacy socket not created
# DBG:xnvme_be_spdk_dev.c:_spdk_env_init-260: INFO: spdk_env_is_initialized: 1
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-798: ############################
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-799: INFO: trtype: PCIe, # 1 of 4
EAL: Cannot find device (0127:00:00.1)
EAL: Failed to attach device on primary process
[2022-06-09 13:20:24.702748] nvme.c: 838:nvme_probe_internal: *ERROR*: NVMe ctrlr scan failed
[2022-06-09 13:20:24.702767] nvme.c: 915:spdk_nvme_probe: *ERROR*: Create probe context failed
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-810: FAILED: probe a:0, e:-1, i:0
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-798: ############################
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-799: INFO: trtype: TCP, # 2 of 4
trid:
trstring: 'TCP'
trtype: 0x3
adrfam: 0x1
traddr: '127.0.0.1'
trsvcid: '4420'
subnqn: 'nqn.2016-06.io.spdk:cnode1'
priority: 0x0
[2022-06-09 13:20:25.111446] nvme_ctrlr.c: 704:nvme_ctrlr_set_intel_support_log_pages: *WARNING*: [nqn.2016-06.io.spdk:cnode1] Intel log pages not supported on Intel drive!
# DBG:xnvme_be_spdk_dev.c:attach_cb-453: INFO: nsid: 1
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-798: ############################
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-799: INFO: trtype: RDMA, # 3 of 4
[2022-06-09 13:20:25.111555] nvme.c: 830:nvme_probe_internal: *ERROR*: NVMe trtype 1 not available
[2022-06-09 13:20:25.111570] nvme.c: 915:spdk_nvme_probe: *ERROR*: Create probe context failed
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-810: FAILED: probe a:1, e:-1, i:0
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-798: ############################
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-799: INFO: trtype: FC, # 4 of 4
# DBG:xnvme_be_spdk_dev.c:_xnvme_be_spdk_ident_to_trid-322: FAILED: unsupported trtype: FC
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-804: SKIP/FAILED: ident_to_trid()
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-840: INFO: open() : OK
# DBG:xnvme_be_spdk_admin.c:xnvme_be_spdk_sync_cmd_admin-66: FAILED: xnvme_cmd_ctx_cpl_status()
# DBG:xnvme_be.c:xnvme_be_dev_idfy-475: INFO: !xnvme_adm_idfy_ctrlr_csi(CSI_ZONED)
# DBG:xnvme_be.c:xnvme_be_dev_idfy-499: INFO: no positive response to idfy(ZNS)
# DBG:xnvme_be_spdk_admin.c:xnvme_be_spdk_sync_cmd_admin-66: FAILED: xnvme_cmd_ctx_cpl_status()
# DBG:xnvme_be.c:xnvme_be_dev_idfy-510: INFO: !xnvme_adm_idfy_ctrlr_csi(CSI_FS)
# DBG:xnvme_be.c:xnvme_be_dev_idfy-538: INFO: no positive response to idfy(FS)
# DBG:xnvme_be_spdk_admin.c:xnvme_be_spdk_sync_cmd_admin-66: FAILED: xnvme_cmd_ctx_cpl_status()
# DBG:xnvme_be.c:xnvme_be_dev_idfy-546: INFO: not csi-specific id-NVM
# DBG:xnvme_be.c:xnvme_be_dev_idfy-547: INFO: falling back to NVM assumption
# DBG:xnvme_be.c:xnvme_be_factory-672: INFO: obtained device handle
# DBG:xnvme_be.c:xnvme_be_factory-667: INFO: obtained backend instance be: 'spdk'
# DBG:xnvme_be_spdk_dev.c:_spdk_env_init-224: INFO: already initialized
# DBG:xnvme_be_spdk_dev.c:_spdk_env_init-260: INFO: spdk_env_is_initialized: 1
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-759: INFO: found dev->ident.uri: '127.0.0.1:4420' via cref_lookup()
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-770: FAILED: !spdk_nvme_ns_is_active(nsid:0x2)
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-798: ############################
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-799: INFO: trtype: PCIe, # 1 of 4
EAL: Cannot find device (0127:00:00.1)
EAL: Failed to attach device on primary process
[2022-06-09 13:20:45.228927] nvme.c: 838:nvme_probe_internal: *ERROR*: NVMe ctrlr scan failed
[2022-06-09 13:20:45.228948] nvme.c: 915:spdk_nvme_probe: *ERROR*: Create probe context failed
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-810: FAILED: probe a:0, e:-1, i:0
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-798: ############################
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-799: INFO: trtype: TCP, # 2 of 4
trid:
trstring: 'TCP'
trtype: 0x3
adrfam: 0x1
traddr: '127.0.0.1'
trsvcid: '4420'
subnqn: 'nqn.2016-06.io.spdk:cnode1'
priority: 0x0
# DBG:xnvme_be_spdk_dev.c:attach_cb-453: INFO: nsid: 2
# DBG:xnvme_be_spdk_dev.c:attach_cb-462: FAILED: !spdk_nvme_ns_is_active(opts->nsid:0x2)
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-810: FAILED: probe a:0, e:0, i:0
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-798: ############################
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-799: INFO: trtype: RDMA, # 3 of 4
[2022-06-09 13:20:45.399107] nvme.c: 830:nvme_probe_internal: *ERROR*: NVMe trtype 1 not available
[2022-06-09 13:20:45.399157] nvme.c: 915:spdk_nvme_probe: *ERROR*: Create probe context failed
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-810: FAILED: probe a:0, e:-1, i:0
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-798: ############################
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-799: INFO: trtype: FC, # 4 of 4
# DBG:xnvme_be_spdk_dev.c:_xnvme_be_spdk_ident_to_trid-322: FAILED: unsupported trtype: FC
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-804: SKIP/FAILED: ident_to_trid()
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-786: FAILED: max attempts exceeded
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_dev_open-852: FAILED: xnvme_be_spdk_state_init()
# DBG:xnvme_be.c:xnvme_be_factory-687: INFO: skipping backend due to err: -6
# DBG:xnvme_be.c:xnvme_be_factory-667: INFO: obtained backend instance be: 'linux'
# DBG:xnvme_be_linux_dev.c:xnvme_be_linux_dev_open-51: INFO: open() : opts->oflags: 0x4, flags: 0x2, opts->create_mode: 0x180
# DBG:xnvme_be_linux_dev.c:xnvme_be_linux_dev_open-56: FAILED: open(uri: '127.0.0.1:4420') : state->fd: '-1', errno: 2
# DBG:xnvme_be_linux_dev.c:xnvme_be_linux_dev_open-63: FAILED: open(uri: '127.0.0.1:4420') : state->fd: '-1', errno: 2
# DBG:xnvme_be.c:xnvme_be_factory-687: INFO: skipping backend due to err: -2
# DBG:xnvme_be.c:xnvme_be_factory-639: INFO: skipping be: 'fbsd'; !enabled
# DBG:xnvme_be.c:xnvme_be_factory-667: INFO: obtained backend instance be: 'posix'
# DBG:xnvme_be_posix_dev.c:xnvme_be_posix_dev_open-46: INFO: open() : opts->oflags: 0x4, flags: 0x2, opts->create_mode: 0x180
# DBG:xnvme_be_posix_dev.c:xnvme_be_posix_dev_open-51: FAILED: open(uri: '127.0.0.1:4420'), state->fd: '-1', errno: 2
# DBG:xnvme_be.c:xnvme_be_factory-687: INFO: skipping backend due to err: -2
# DBG:xnvme_be.c:xnvme_be_factory-639: INFO: skipping be: 'windows'; !enabled
# DBG:xnvme_be.c:xnvme_be_factory-692: FAILED: no backend for uri: '127.0.0.1:4420'
# DBG:xnvme_dev.c:xnvme_dev_open-158: FAILED: failed opening uri: 127.0.0.1:4420
# DBG:xnvme_be_spdk_dev.c:_cref_deref-109: INFO: refcount: 0 => detaching
If you run the same code, but with both namespaces already added beforehand, you get:
# DBG:xnvme_be.c:xnvme_be_factory-667: INFO: obtained backend instance be: 'spdk'
# DBG:xnvme_be_spdk_dev.c:_spdk_env_init-230: INFO: SPDK NVMe PCIe Driver registration -- BEGIN
# DBG:xnvme_be_spdk_dev.c:_spdk_env_init-235: INFO: skipping, already registered.
# DBG:xnvme_be_spdk_dev.c:_spdk_env_init-237: INFO: SPDK NVMe PCIe Driver registration -- END
[2022-06-09 13:22:59.612356] Starting SPDK v21.10 git sha1 4e4f11ff7 / DPDK 21.08.0 initialization...
[2022-06-09 13:22:59.612949] [ DPDK EAL parameters: [2022-06-09 13:22:59.613180] spdk [2022-06-09 13:22:59.613416] --no-shconf [2022-06-09 13:22:59.613651] -c 0x1 [2022-06-09 13:22:59.613850] --log-level=lib.eal:6 [2022-06-09 13:22:59.614100] --log-level=lib.cryptodev:5 [2022-06-09 13:22:59.614353] --log-level=user1:6 [2022-06-09 13:22:59.614645] --iova-mode=pa [2022-06-09 13:22:59.614844] --base-virtaddr=0x200000000000 [2022-06-09 13:22:59.615067] --match-allocations [2022-06-09 13:22:59.615477] --file-prefix=spdk_pid105572 [2022-06-09 13:22:59.615888] ]
EAL: No available 1048576 kB hugepages reported
TELEMETRY: No legacy callbacks, legacy socket not created
# DBG:xnvme_be_spdk_dev.c:_spdk_env_init-260: INFO: spdk_env_is_initialized: 1
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-798: ############################
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-799: INFO: trtype: PCIe, # 1 of 4
EAL: Cannot find device (0127:00:00.1)
EAL: Failed to attach device on primary process
[2022-06-09 13:22:59.759757] nvme.c: 838:nvme_probe_internal: *ERROR*: NVMe ctrlr scan failed
[2022-06-09 13:22:59.759778] nvme.c: 915:spdk_nvme_probe: *ERROR*: Create probe context failed
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-810: FAILED: probe a:0, e:-1, i:0
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-798: ############################
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-799: INFO: trtype: TCP, # 2 of 4
trid:
trstring: 'TCP'
trtype: 0x3
adrfam: 0x1
traddr: '127.0.0.1'
trsvcid: '4420'
subnqn: 'nqn.2016-06.io.spdk:cnode1'
priority: 0x0
[2022-06-09 13:23:00.195438] nvme_ctrlr.c: 704:nvme_ctrlr_set_intel_support_log_pages: *WARNING*: [nqn.2016-06.io.spdk:cnode1] Intel log pages not supported on Intel drive!
# DBG:xnvme_be_spdk_dev.c:attach_cb-453: INFO: nsid: 1
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-798: ############################
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-799: INFO: trtype: RDMA, # 3 of 4
[2022-06-09 13:23:00.195547] nvme.c: 830:nvme_probe_internal: *ERROR*: NVMe trtype 1 not available
[2022-06-09 13:23:00.195564] nvme.c: 915:spdk_nvme_probe: *ERROR*: Create probe context failed
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-810: FAILED: probe a:1, e:-1, i:0
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-798: ############################
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-799: INFO: trtype: FC, # 4 of 4
# DBG:xnvme_be_spdk_dev.c:_xnvme_be_spdk_ident_to_trid-322: FAILED: unsupported trtype: FC
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-804: SKIP/FAILED: ident_to_trid()
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-840: INFO: open() : OK
# DBG:xnvme_be_spdk_admin.c:xnvme_be_spdk_sync_cmd_admin-66: FAILED: xnvme_cmd_ctx_cpl_status()
# DBG:xnvme_be.c:xnvme_be_dev_idfy-475: INFO: !xnvme_adm_idfy_ctrlr_csi(CSI_ZONED)
# DBG:xnvme_be.c:xnvme_be_dev_idfy-499: INFO: no positive response to idfy(ZNS)
# DBG:xnvme_be_spdk_admin.c:xnvme_be_spdk_sync_cmd_admin-66: FAILED: xnvme_cmd_ctx_cpl_status()
# DBG:xnvme_be.c:xnvme_be_dev_idfy-510: INFO: !xnvme_adm_idfy_ctrlr_csi(CSI_FS)
# DBG:xnvme_be.c:xnvme_be_dev_idfy-538: INFO: no positive response to idfy(FS)
# DBG:xnvme_be_spdk_admin.c:xnvme_be_spdk_sync_cmd_admin-66: FAILED: xnvme_cmd_ctx_cpl_status()
# DBG:xnvme_be.c:xnvme_be_dev_idfy-546: INFO: not csi-specific id-NVM
# DBG:xnvme_be.c:xnvme_be_dev_idfy-547: INFO: falling back to NVM assumption
# DBG:xnvme_be.c:xnvme_be_factory-672: INFO: obtained device handle
# DBG:xnvme_be.c:xnvme_be_factory-667: INFO: obtained backend instance be: 'spdk'
# DBG:xnvme_be_spdk_dev.c:_spdk_env_init-224: INFO: already initialized
# DBG:xnvme_be_spdk_dev.c:_spdk_env_init-260: INFO: spdk_env_is_initialized: 1
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-759: INFO: found dev->ident.uri: '127.0.0.1:4420' via cref_lookup()
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-779: INFO: re-using previously attached controller
# DBG:xnvme_be_spdk_dev.c:xnvme_be_spdk_state_init-840: INFO: open() : OK
# DBG:xnvme_be_spdk_admin.c:xnvme_be_spdk_sync_cmd_admin-66: FAILED: xnvme_cmd_ctx_cpl_status()
# DBG:xnvme_be.c:xnvme_be_dev_idfy-475: INFO: !xnvme_adm_idfy_ctrlr_csi(CSI_ZONED)
# DBG:xnvme_be.c:xnvme_be_dev_idfy-499: INFO: no positive response to idfy(ZNS)
# DBG:xnvme_be_spdk_admin.c:xnvme_be_spdk_sync_cmd_admin-66: FAILED: xnvme_cmd_ctx_cpl_status()
# DBG:xnvme_be.c:xnvme_be_dev_idfy-510: INFO: !xnvme_adm_idfy_ctrlr_csi(CSI_FS)
# DBG:xnvme_be.c:xnvme_be_dev_idfy-538: INFO: no positive response to idfy(FS)
# DBG:xnvme_be_spdk_admin.c:xnvme_be_spdk_sync_cmd_admin-66: FAILED: xnvme_cmd_ctx_cpl_status()
# DBG:xnvme_be.c:xnvme_be_dev_idfy-546: INFO: not csi-specific id-NVM
# DBG:xnvme_be.c:xnvme_be_dev_idfy-547: INFO: falling back to NVM assumption
# DBG:xnvme_be.c:xnvme_be_factory-672: INFO: obtained device handle
# DBG:xnvme_be_spdk_dev.c:_cref_deref-109: INFO: refcount: 0 => detaching
It seems this issue is caused by xNVMe not handling a certain asynchronous event sent by the target (SPDK_NVME_ASYNC_EVENT_NS_ATTR_CHANGED).
Here is the code in SPDK that should handle it:
void
nvme_ctrlr_process_async_event(struct spdk_nvme_ctrlr *ctrlr,
			       const struct spdk_nvme_cpl *cpl)
{
	union spdk_nvme_async_event_completion event;
	struct spdk_nvme_ctrlr_process *active_proc;
	int rc;

	event.raw = cpl->cdw0;

	if ((event.bits.async_event_type == SPDK_NVME_ASYNC_EVENT_TYPE_NOTICE) &&
	    (event.bits.async_event_info == SPDK_NVME_ASYNC_EVENT_NS_ATTR_CHANGED)) {
		nvme_ctrlr_clear_changed_ns_log(ctrlr);

		rc = nvme_ctrlr_identify_active_ns(ctrlr);
		if (rc) {
			return;
		}
		nvme_ctrlr_update_namespaces(ctrlr);
		nvme_io_msg_ctrlr_update(ctrlr);
	}

	if ((event.bits.async_event_type == SPDK_NVME_ASYNC_EVENT_TYPE_NOTICE) &&
	    (event.bits.async_event_info == SPDK_NVME_ASYNC_EVENT_ANA_CHANGE)) {
		if (!ctrlr->opts.disable_read_ana_log_page) {
			rc = nvme_ctrlr_update_ana_log_page(ctrlr);
			if (rc) {
				return;
			}
			nvme_ctrlr_parse_ana_log_page(ctrlr, nvme_ctrlr_update_ns_ana_states,
						      ctrlr);
		}
	}

	active_proc = nvme_ctrlr_get_current_process(ctrlr);
	if (active_proc && active_proc->aer_cb_fn) {
		active_proc->aer_cb_fn(active_proc->aer_cb_arg, cpl);
	}
}
It is defined in nvme_ctrlr.c.
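As far as I understand, SPDK only consumes these asynchronous events while admin completions are being polled, so one possible direction (just a sketch, not a tested fix; the variable names and surrounding integration are assumptions on my part) would be for xNVMe to poll admin completions on a re-used controller before checking whether the requested namespace is active:

```c
/* Sketch only -- assumes 'ctrlr' is the previously attached controller
 * obtained via cref_lookup(), and 'nsid' is opts->nsid. The functions
 * below are public SPDK APIs from spdk/nvme.h. */
struct spdk_nvme_ns *ns;

/* Processing admin completions lets SPDK consume any pending AENs,
 * including NS_ATTR_CHANGED, which triggers the handler shown above. */
spdk_nvme_ctrlr_process_admin_completions(ctrlr);

ns = spdk_nvme_ctrlr_get_ns(ctrlr, nsid);
if (!ns || !spdk_nvme_ns_is_active(ns)) {
	/* namespace still not active */
}
```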
As a workaround on our end (probably not a good one), in xnvme_be_spdk_state_init, when probing fails on the max attempt we call these functions:
nvme_ctrlr_clear_changed_ns_log(ctrlr);
nvme_ctrlr_identify_active_ns(ctrlr);
nvme_ctrlr_update_namespaces(ctrlr);
nvme_io_msg_ctrlr_update(ctrlr);
Then we run the probe again.
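In outline, the workaround looks like this (pseudocode; the retry structure and variable names are assumptions, and these nvme_ctrlr_* functions are SPDK-internal, declared in nvme_internal.h rather than in the public API):

```c
/* Pseudocode: inside xnvme_be_spdk_state_init(), after the probe loop
 * has failed on the last attempt */
if (probe_failed && attempt == max_attempts) {
	nvme_ctrlr_clear_changed_ns_log(ctrlr);	/* consume the changed-NS log page */
	nvme_ctrlr_identify_active_ns(ctrlr);	/* re-read the active namespace list */
	nvme_ctrlr_update_namespaces(ctrlr);	/* refresh cached namespace data */
	nvme_io_msg_ctrlr_update(ctrlr);	/* propagate the update */
	/* ...then run the probe loop once more */
}
```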
@safl, what do you think our approach to solving this should be?