
Comments (26)

poettering commented on May 24, 2024

I am pretty sure this has little to do with systemd. The mix of ata hotplug, out-of-tree zfs mess and nfs really makes me shudder.

Your processes are probably hanging because of one of these things but that has nothing to do with systemd, and everything with those techs.

Hence this isn't really actionable for us upstream.

If you can reproduce this without the out-of-tree stuff/ata hotplug/nfs I might be more interested, but as it stands I'd really like to see proof that this isn't caused by that stack before I invest any time in it.


poettering commented on May 24, 2024

So, have you straced the offending processes?

What makes you think systemd is at fault here?

Do you have fuse/cuse in the mix? Closed source crap (nvidia drivers, …)? Weird storage?


poettering commented on May 24, 2024

Anything in dmesg?


Zesko commented on May 24, 2024

So, have you straced the offending processes?
What makes you think systemd is at fault here?

  • If I disable KVM/QEMU (NAS.service), the PC shuts down quickly and every process stops smoothly in less than 1 sec, no timeout.

  • If I enable NAS.service, the timeout problem occurs intermittently, in roughly 20% of shutdowns: accounts-daemon.service reaches its timeout, which is then automatically extended by up to 4 times, then by 2 times.

  1. I thought the process order was messed up, like a deadlock during shutdown, so I tried to create a new drop-in config for NetworkManager at
    /etc/systemd/system/NetworkManager.service.d/99-fix-shutdown-order.conf
[Unit]
Before=accounts-daemon.service
  2. The timeout issue of accounts-daemon.service is probably fixed by this, but the next service, polkit.service, still has the same timeout issue. I tried adding it to the config:
[Unit]
Before=accounts-daemon.service polkit.service
  3. The timeout issue of polkit.service is then gone, but the next service, systemd-logind, or some other random service, still has this issue... I do not know why. (A way to verify the resulting ordering is sketched right after this list.)
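
For reference, one way to check that such a drop-in was picked up and to inspect the resulting ordering (a minimal sketch using the unit names above; note that stop order is the reverse of start order):

# Show the unit file plus any drop-ins that systemd actually loaded
systemctl cat NetworkManager.service

# Show the ordering edges that result from the drop-in
systemctl show -p Before,After NetworkManager.service
systemctl list-dependencies --before NetworkManager.service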

Do you have fuse/cuse in the mix? Closed source crap (nvidia drivers, …)? Weird storage?

No, I have never used closed-source crap with systemd. Only AMD and 4x SSDs.

$ systemctl list-units --type=service --state active
  UNIT                                                                                      LOAD   ACTIVE SUB     DESCRIPTION                                                                  
  accounts-daemon.service                                                                   loaded active running Accounts Service
  cups.service                                                                              loaded active running CUPS Scheduler
  dbus-broker.service                                                                       loaded active running D-Bus System Message Bus
  dracut-shutdown.service                                                                   loaded active exited  Restore /run/initramfs on shutdown
  ftpd.service                                                                              loaded active running FTPD Daemon
  grub-btrfsd.service                                                                       loaded active running Regenerate grub-btrfs.cfg
  kmod-static-nodes.service                                                                 loaded active exited  Create List of Static Device Nodes
  libvirtd.service                                                                          loaded active running libvirt legacy monolithic daemon
  lvm2-monitor.service                                                                      loaded active exited  Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
  NAS.service                                                                               loaded active exited  [NAS.service] Starting NAS and mounting NAS via NFS
  NetworkManager.service                                                                    loaded active running Network Manager
  polkit.service                                                                            loaded active running Authorization Manager
  portmaster.service                                                                        loaded active running Portmaster by Safing
  power-profiles-daemon.service                                                             loaded active running Power Profiles daemon
  rtkit-daemon.service                                                                      loaded active running RealtimeKit Scheduling Policy Service
  sddm.service                                                                              loaded active running Simple Desktop Display Manager
  systemd-fsck@dev-disk-by\x2duuid-36f91f60\x2d1e40\x2d4b8b\x2db9a8\x2db3e6cb406e1b.service loaded active exited  File System Check on /dev/disk/by-uuid/36f91f60-1e40-4b8b-b9a8-b3e6cb406e1b
  systemd-fsck@dev-disk-by\x2duuid-8837\x2d6B04.service                                     loaded active exited  File System Check on /dev/disk/by-uuid/8837-6B04
  systemd-journal-flush.service                                                             loaded active exited  Flush Journal to Persistent Storage
  systemd-journald.service                                                                  loaded active running Journal Service
  systemd-logind.service                                                                    loaded active running User Login Management
  systemd-machined.service                                                                  loaded active running Virtual Machine and Container Registration Service
  systemd-modules-load.service                                                              loaded active exited  Load Kernel Modules
  systemd-random-seed.service                                                               loaded active exited  Load/Save OS Random Seed
  systemd-remount-fs.service                                                                loaded active exited  Remount Root and Kernel File Systems
  systemd-sysctl.service                                                                    loaded active exited  Apply Kernel Variables
  systemd-timesyncd.service                                                                 loaded active running Network Time Synchronization
  systemd-tmpfiles-setup-dev-early.service                                                  loaded active exited  Create Static Device Nodes in /dev gracefully
  systemd-tmpfiles-setup-dev.service                                                        loaded active exited  Create Static Device Nodes in /dev
  systemd-tmpfiles-setup.service                                                            loaded active exited  Create Volatile Files and Directories
  systemd-udev-trigger.service                                                              loaded active exited  Coldplug All udev Devices
  systemd-udevd.service                                                                     loaded active running Rule-based Manager for Device Events and Files
  systemd-update-utmp.service                                                               loaded active exited  Record System Boot/Shutdown in UTMP
  systemd-user-sessions.service                                                             loaded active exited  Permit User Sessions
  systemd-vconsole-setup.service                                                            loaded active exited  Virtual Console Setup
  udisks2.service                                                                           loaded active running Disk Manager
  upower.service                                                                            loaded active running Daemon for power management
  user-runtime-dir@1000.service                                                             loaded active exited  User Runtime Directory /run/user/1000
  user@1000.service                                                                         loaded active running User Manager for UID 1000
  virtlogd.service                                                                          loaded active running libvirt logging daemon

Anything in dmesg?

journalctl --dmesg --no-pager -e -b -4 --output short-delta > dmesg.log

dmesg.log


Zesko commented on May 24, 2024

I would compare two different boot logs: A good log vs. a bad log.

  • The boot log without the accounts-daemon.service timeout:
    good.log
>>> [   41.038318 <    0.000725 >] zesko systemd[1]: accounts-daemon.service: Deactivated successfully. <<<
[   41.038438 <    0.000120 >] zesko systemd[1]: Stopped Accounts Service.
...
...
[   45.112228 <    0.000648 >] zesko systemd[1]: NAS.service: Deactivated successfully.
[   45.112394 <    0.000166 >] zesko systemd[1]: Stopped [NAS.service] Starting NAS and mounting NAS via NFS.
[   45.113324 <    0.000930 >] zesko systemd[1]: Stopped target Network.
[   45.113958 <    0.000634 >] zesko systemd[1]: Stopped target libvirt guests shutdown target.
[   45.114662 <    0.000704 >] zesko systemd[1]: Stopping Network Manager...
[   45.114694 <    0.000032 >] zesko NetworkManager[904]: <info>  [1713786503.9041] caught SIGTERM, shutting down normally.
[   45.115094 <    0.000400 >] zesko NetworkManager[904]: <info>  [1713786503.9045] device (ethernet): bridge port enp5s0 was detached
[   45.115263 <    0.000169 >] zesko NetworkManager[904]: <info>  [1713786503.9047] dhcp4 (ethernet): canceled DHCP transaction
[   45.115303 <    0.000040 >] zesko NetworkManager[904]: <info>  [1713786503.9047] dhcp4 (ethernet): activation: beginning transaction (timeout in 45 seconds)
[   45.115332 <    0.000029 >] zesko NetworkManager[904]: <info>  [1713786503.9047] dhcp4 (ethernet): state changed no lease
[   45.115381 <    0.000049 >] zesko NetworkManager[904]: <info>  [1713786503.9048] manager: NetworkManager state is now CONNECTED_SITE
[   45.137427 <    0.022046 >] zesko NetworkManager[904]: <info>  [1713786503.9268] exiting (success)
[   45.180311 <    0.042884 >] zesko systemd[1]: NetworkManager.service: Deactivated successfully.
[   45.180459 <    0.000148 >] zesko systemd[1]: Stopped Network Manager.
[   45.182136 <    0.001677 >] zesko portmaster-start[903]: [pmstart] 2024/04/22 11:48:23 got terminated signal (ignoring), waiting for child to exit...
[   45.182136 <    0.000000 >] zesko portmaster-start[903]:  <INTERRUPT>
[   45.182367 <    0.000231 >] zesko portmaster-start[903]: 240422 13:48:23.971 e/run/main:085 ▶ WARN 039 main: program was interrupted, shutting down.
[   45.182391 <    0.000024 >] zesko systemd[1]: Stopping Portmaster by Safing...
[   45.283330 <    0.100939 >] zesko portmaster-start[903]: 240422 13:48:24.072 dules/stop:057 ▶ WARN 041 modules: starting shutdown...
[   45.488757 <    0.205427 >] zesko kernel: ata13: SATA link down (SStatus 0 SControl 300)
[   45.488798 <    0.000041 >] zesko kernel: ata14: SATA link down (SStatus 0 SControl 300)
[   45.520568 <    0.031770 >] zesko kernel: ata20: SATA link down (SStatus 0 SControl 300)
[   45.520607 <    0.000039 >] zesko kernel: ata16: SATA link down (SStatus 0 SControl 300)
[   45.675233 <    0.154626 >] zesko kernel: ata19: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   45.675258 <    0.000025 >] zesko kernel: ata15: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   45.675827 <    0.000569 >] zesko kernel: ata19.00: ATA-11: SanDisk SDSSDH3 1T00, 415020RL, max UDMA/133
[   45.675860 <    0.000033 >] zesko kernel: ata15.00: ATA-11: SanDisk SDSSDH3 1T00, 415020RL, max UDMA/133
[   45.680425 <    0.004565 >] zesko kernel: ata19.00: 1953525168 sectors, multi 1: LBA48 NCQ (depth 32), AA
[   45.680433 <    0.000008 >] zesko kernel: ata15.00: 1953525168 sectors, multi 1: LBA48 NCQ (depth 32), AA
[   45.724311 <    0.043878 >] zesko kernel: ata15.00: Features: Dev-Sleep
[   45.724712 <    0.000401 >] zesko kernel: ata19.00: Features: Dev-Sleep
[   45.781845 <    0.057133 >] zesko kernel: ata15.00: configured for UDMA/133
[   45.782751 <    0.000906 >] zesko kernel: ata19.00: configured for UDMA/133
[   45.797326 <    0.014575 >] zesko kernel: scsi 4:0:0:0: Direct-Access     ATA      SanDisk SDSSDH3  20RL PQ: 0 ANSI: 5
[   45.797691 <    0.000365 >] zesko kernel: sd 4:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[   45.797701 <    0.000010 >] zesko kernel: sd 4:0:0:0: [sda] Write Protect is off
[   45.797704 <    0.000003 >] zesko kernel: sd 4:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   45.797717 <    0.000013 >] zesko kernel: sd 4:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   45.797735 <    0.000018 >] zesko kernel: sd 4:0:0:0: [sda] Preferred minimum I/O size 512 bytes
[   45.797779 <    0.000044 >] zesko kernel: scsi 8:0:0:0: Direct-Access     ATA      SanDisk SDSSDH3  20RL PQ: 0 ANSI: 5
[   45.798146 <    0.000367 >] zesko kernel: sd 8:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[   45.798162 <    0.000016 >] zesko kernel: sd 8:0:0:0: [sdb] Write Protect is off
[   45.798166 <    0.000004 >] zesko kernel: sd 8:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[   45.798185 <    0.000019 >] zesko kernel: sd 8:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   45.798210 <    0.000025 >] zesko kernel: sd 8:0:0:0: [sdb] Preferred minimum I/O size 512 bytes
[   45.800372 <    0.002162 >] zesko kernel:  sdb: sdb1 sdb9
[   45.800537 <    0.000165 >] zesko kernel: sd 8:0:0:0: [sdb] Attached SCSI removable disk
[   45.806924 <    0.006387 >] zesko kernel:  sda: sda1 sda9
[   45.807018 <    0.000094 >] zesko kernel: sd 4:0:0:0: [sda] Attached SCSI removable disk
  • The boot log with the accounts-daemon.service timeout issue:
    bad.log
[ 4056.162077 <    0.000337 >] zesko systemd[1]: NAS.service: Deactivated successfully.
[ 4056.162231 <    0.000154 >] zesko systemd[1]: Stopped [NAS.service] Starting NAS and mounting NAS via NFS.
[ 4056.163590 <    0.001359 >] zesko systemd[1]: Stopped target Network.
[ 4056.165372 <    0.001782 >] zesko systemd[1]: Stopped target libvirt guests shutdown target.
[ 4056.128397 <    0.000000 >] zesko kernel: ata14: SATA link down (SStatus 0 SControl 300)
[ 4056.128498 <    0.000101 >] zesko kernel: ata13: SATA link down (SStatus 0 SControl 300)
[ 4056.168415 <    0.039917 >] zesko kernel: ata20: SATA link down (SStatus 0 SControl 300)
[ 4056.168511 <    0.000096 >] zesko kernel: ata16: SATA link down (SStatus 0 SControl 300)
[ 4056.323520 <    0.155009 >] zesko kernel: ata19: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 4056.323548 <    0.000028 >] zesko kernel: ata15: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 4056.324218 <    0.000670 >] zesko kernel: ata15.00: ATA-11: SanDisk SDSSDH3 1T00, 415020RL, max UDMA/133
[ 4056.324227 <    0.000009 >] zesko kernel: ata19.00: ATA-11: SanDisk SDSSDH3 1T00, 415020RL, max UDMA/133
[ 4056.328792 <    0.004565 >] zesko kernel: ata15.00: 1953525168 sectors, multi 1: LBA48 NCQ (depth 32), AA
[ 4056.328802 <    0.000010 >] zesko kernel: ata19.00: 1953525168 sectors, multi 1: LBA48 NCQ (depth 32), AA
[ 4056.372831 <    0.044029 >] zesko kernel: ata15.00: Features: Dev-Sleep
[ 4056.373361 <    0.000530 >] zesko kernel: ata19.00: Features: Dev-Sleep
[ 4056.430581 <    0.057220 >] zesko kernel: ata15.00: configured for UDMA/133
[ 4056.446067 <    0.015486 >] zesko kernel: scsi 4:0:0:0: Direct-Access     ATA      SanDisk SDSSDH3  20RL PQ: 0 ANSI: 5
[ 4056.446471 <    0.000404 >] zesko kernel: sd 4:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[ 4056.446487 <    0.000016 >] zesko kernel: sd 4:0:0:0: [sda] Write Protect is off
[ 4056.446491 <    0.000004 >] zesko kernel: sd 4:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 4056.446511 <    0.000020 >] zesko kernel: sd 4:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 4056.446536 <    0.000025 >] zesko kernel: sd 4:0:0:0: [sda] Preferred minimum I/O size 512 bytes
[ 4056.448514 <    0.001978 >] zesko kernel:  sda: sda1 sda9
[ 4056.448664 <    0.000150 >] zesko kernel: sd 4:0:0:0: [sda] Attached SCSI removable disk
[ 4056.487969 <    0.039305 >] zesko kernel: ata19.00: configured for UDMA/133
[ 4056.502938 <    0.014969 >] zesko kernel: scsi 8:0:0:0: Direct-Access     ATA      SanDisk SDSSDH3  20RL PQ: 0 ANSI: 5
[ 4056.503204 <    0.000266 >] zesko kernel: sd 8:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[ 4056.503213 <    0.000009 >] zesko kernel: sd 8:0:0:0: [sdb] Write Protect is off
[ 4056.503215 <    0.000002 >] zesko kernel: sd 8:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 4056.503227 <    0.000012 >] zesko kernel: sd 8:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 4056.503243 <    0.000016 >] zesko kernel: sd 8:0:0:0: [sdb] Preferred minimum I/O size 512 bytes
[ 4056.505591 <    0.002348 >] zesko kernel:  sdb: sdb1 sdb9
[ 4056.505736 <    0.000145 >] zesko kernel: sd 8:0:0:0: [sdb] Attached SCSI removable disk

>>> [ 4058.285995 <    1.780259 >] zesko systemd[1]: accounts-daemon.service: Processes still around after SIGKILL. Ignoring. <<<
[ 4061.535276 <    3.249281 >] zesko systemd[1]: accounts-daemon.service: State 'final-sigterm' timed out. Killing.
[ 4061.535505 <    0.000229 >] zesko systemd[1]: accounts-daemon.service: Killing process 1984 (accounts-daemon) with signal SIGKILL.
[ 4064.785099 <    3.249594 >] zesko systemd[1]: accounts-daemon.service: Processes still around after final SIGKILL. Entering failed mode.
[ 4064.785521 <    0.000422 >] zesko systemd[1]: accounts-daemon.service: Failed with result 'timeout'.
[ 4064.785569 <    0.000048 >] zesko systemd[1]: accounts-daemon.service: Unit process 1984 (accounts-daemon) remains running after unit stopped.
[ 4064.785600 <    0.000031 >] zesko systemd[1]: Stopped Accounts Service.

I think the timeout issue of accounts-daemon.service occurs only after libvirt-guests-shutdown.target is stopped (when NAS.service unmounts the other external storage), never before it.

But I am not sure why accounts-daemon.service would be affected by the KVM shutdown. KVM has nothing to do with accounts-daemon on my system.


poettering commented on May 24, 2024

Get a backtrace for the hanging process. Enable the debug shell on Alt-F9 for that, then run pstack on the hanging process to see what it is hanging on.
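
A minimal sketch of that workflow, assuming pstack (or gdb) is installed and accounts-daemon is the process that hangs:

# Enable a root shell on tty9 (reachable with Ctrl-Alt-F9), then reproduce the hanging shutdown
systemctl enable debug-shell.service

# From the debug shell, while the shutdown is stuck:
pstack "$(pidof accounts-daemon)"            # userspace backtrace
cat /proc/"$(pidof accounts-daemon)"/stack   # kernel-side stack, useful for D-state hangs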


Zesko commented on May 24, 2024

@poettering

Just some info:

I tried downgrading systemd from version 255.4 to 254.6 and checked shutdown and reboot more than 10 times without any trick (no Before=/After= ordering changes); the timeout problem did not occur. This is exactly what I expected 2 or 3 months ago.

This suggests the problem affects the latest version 255.4, or some version newer than 254.6.

I will test some versions between 254.6 and 255.4 when I have time.
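
A sketch of how such a downgrade can be done on Arch, assuming the older packages are still in the pacman cache (the exact file names here are hypothetical; the Arch Linux Archive is an alternative source):

ls /var/cache/pacman/pkg/systemd-25*
pacman -U /var/cache/pacman/pkg/systemd-254.6-*-x86_64.pkg.tar.zst \
          /var/cache/pacman/pkg/systemd-libs-254.6-*-x86_64.pkg.tar.zst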


Zesko commented on May 24, 2024

I can reproduce the problem on versions 255.1 through 255.5, but not on 254.6, the last good version.

The question is which change between 254.6 and 255.1 causes the shutdown problem.


arvidjaar commented on May 24, 2024

Processes still around after final SIGKILL.

If a process cannot be killed, it implies that it is stuck in uninterruptible sleep inside the kernel. Your logs contain a lot of SATA messages which look like devices dropping off SATA ports and reappearing. That may be related.

I do not see how systemd can be the root cause here. You need to find out why the process(es) could not be killed.
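
One way to narrow that down, assuming the hang can be observed from a debug shell: list processes in uninterruptible sleep (state D) together with the kernel function they are blocked in.

ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /D/'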


Zesko commented on May 24, 2024

Your logs contain a lot of SATA messages which look like devices drop off SATA ports and reappear. It may be related.

That makes sense. The QEMU/KVM guest (libvirtd) accesses two external physical SATA disks directly, rather than my host. My system (the host) cannot access these SATA ports while the guest owns them, because of a kernel restriction:

  • My host and my KVM guest are not allowed to access these SATA disks at the same time.

At shutdown, KVM (libvirtd) disconnects those SATA disks; their ports are then released back to my host, which re-connects the disks during shutdown. Could this cause the problem? 🤔

However, stopping services like accounts-daemon.service should have nothing to do with re-connecting SATA ports: those disks are used only by KVM, contain no part of my system, and use a filesystem (ZFS) that the host does not support.
Mounting the SATA disks is not configured at all in my host's fstab.


Zesko commented on May 24, 2024

Can't systemd version 255+ ignore disks that re-connect after KVM stops (and drops them) during shutdown?

Does systemd v254 ignore that?


poettering commented on May 24, 2024

Uh, did I get this right: are you unplugging disks that are mounted/referenced by running binaries? And then you are surprised that things go south?


Zesko commented on May 24, 2024

The two external disks' ports are invisible (they do not appear in lsblk -f) on my system / host while the disks are used and mounted by my VM guest on the same machine. Only my VM guest can see and access these disks, according to the KVM device config.

At shutdown, my VM guest automatically unplugs the external disks; they then become visible to my system / host and are detected by the kernel, as seen in the log above.

Which binaries? I'm just using libvirt (KVM/QEMU), kernel 6.8.7, systemd v255.5 and a dracut initramfs (mkinitcpio has the same problem) from the Arch repo.


Zesko commented on May 24, 2024

Interesting: I tried removing two command lines (the automatic mounting & unmounting) from my NAS.service unit:

...
### Mount the remote disks via NFS on my network when booting.
ExecStartPost=/usr/bin/bash -c "mount -t nfs -o vers=4,soft [${NAS_IP}]:/zfs /NAS || echo 'NAS is running, but NFS mounting failed' | systemd-cat -p err -t 'NAS.service'"
...
### Unmount the remote disks at shutdown or reboot. Does this cause the problem?
ExecStop=-umount /NAS
...

Then the timeout problem did not occur across about 8 reboots and shutdowns, or maybe that was luck.


Zesko commented on May 24, 2024

Thanks, interesting.
I tested:

  • Without ZFS, with XFS plus NFS in my NAS.service: yes, I can reproduce the problem.
  • mount/unmount of btrfs on a local disk in NAS.service, without NFS: no problem.

It is now obvious that only NFS causes the shutdown problem. However, systemd v254.6 had no issue with NFS, or maybe that was luck.

I have a question: What happens if I use mount/umount via Samba instead of NFS in NAS.service?

I will close this issue. Thanks!


poettering commented on May 24, 2024

Are you running binaries off that NFS file system? Maybe your network gets shut down before NFS is unmounted?


Zesko commented on May 24, 2024

No, I have never enabled nfsv4-server.service on my system; the host is only a client. Only the KVM guest acts as the NFS server.


Zesko commented on May 24, 2024

I checked the nfs-utils package from the Arch repo; it ships these NFS-related units:

nfs-utils /usr/lib/systemd/system-generators/nfs-server-generator
nfs-utils /usr/lib/systemd/system-generators/rpc-pipefs-generator
nfs-utils /usr/lib/systemd/system/
nfs-utils /usr/lib/systemd/system/auth-rpcgss-module.service
nfs-utils /usr/lib/systemd/system/fsidd.service
nfs-utils /usr/lib/systemd/system/nfs-blkmap.service
nfs-utils /usr/lib/systemd/system/nfs-client.target
nfs-utils /usr/lib/systemd/system/nfs-idmapd.service
nfs-utils /usr/lib/systemd/system/nfs-mountd.service
nfs-utils /usr/lib/systemd/system/nfs-server.service
nfs-utils /usr/lib/systemd/system/nfs-utils.service
nfs-utils /usr/lib/systemd/system/nfsdcld.service
nfs-utils /usr/lib/systemd/system/nfsv4-exportd.service
nfs-utils /usr/lib/systemd/system/nfsv4-server.service
nfs-utils /usr/lib/systemd/system/proc-fs-nfsd.mount
nfs-utils /usr/lib/systemd/system/rpc-gssd.service
nfs-utils /usr/lib/systemd/system/rpc-statd-notify.service
nfs-utils /usr/lib/systemd/system/rpc-statd.service
nfs-utils /usr/lib/systemd/system/rpc_pipefs.target
nfs-utils /usr/lib/systemd/system/var-lib-nfs-rpc_pipefs.mount

Which of these services would you like to know about?

I checked systemctl cat nfs-client.target; the output:

[Unit]
Description=NFS client services
Before=remote-fs-pre.target
Wants=remote-fs-pre.target

# Note: we don't "Wants=rpc-statd.service" as "mount.nfs" will arrange to
# start that on demand if needed.
Wants=rpc-statd-notify.service

# GSS services dependencies and ordering
Wants=auth-rpcgss-module.service
After=rpc-gssd.service rpc-svcgssd.service gssproxy.service

[Install]
WantedBy=multi-user.target
WantedBy=remote-fs.target

You can re-open this issue if you want.


Zesko commented on May 24, 2024

Are you running binaries off that NFS file system?

Today I checked pstree -p, but it doesn't show any running NFS binary while the client has the remote disks mounted via NFS.
I did notice that the KDE System Monitor UI shows two NFS entries under the kthreadd list.

Someone explained:

kthreadd is not a process started by systemd; it and its children are kernel threads running in kernel address space, started by the kernel.

Here is a screenshot:

  • NFSv4 callback
  • R-nfsio

Screenshot_20240427_085102


I have a question: What happens if I use mount/umount via Samba instead of NFS in NAS.service?

I tested Samba in NAS.service; there is no issue. I can confirm that only NFS is related to the problem.

I will check different NFS options, for example soft vs. hard.


Zesko commented on May 24, 2024

I tried 4 different NFS options: async, sync, soft and hard; the problem still persists on systemd v255, but not on systemd v254.
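
For reference, the variants tested correspond to mount lines like the following (a sketch based on the ExecStartPost= line quoted earlier; per nfs(5), soft gives up after its retries, while hard retries indefinitely and can leave I/O blocked if the server disappears):

mount -t nfs -o vers=4,soft  [${NAS_IP}]:/zfs /NAS
mount -t nfs -o vers=4,hard  [${NAS_IP}]:/zfs /NAS
mount -t nfs -o vers=4,async [${NAS_IP}]:/zfs /NAS
mount -t nfs -o vers=4,sync  [${NAS_IP}]:/zfs /NAS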

Never mind, I will switch from NFS to SFTP. Samba has some limitations.

I think this issue should be re-opened, as other systemd users will face the same problem when mounting/umounting NFS.

You can close this issue if you do not want to support annoying NFS.


poettering commented on May 24, 2024

This is not what I meant. My question was whether you run any binaries that are stored on the network storage. If you run programs from NFS and then kill the NFS backing them, the programs will necessarily hang.


Zesko commented on May 24, 2024

As far as I know, I only ever used mount and umount as programs with the NFS option, nothing more.


poettering commented on May 24, 2024

Still not what I meant:

Did you run any executables off the NFS file systems? I.e., are there program binaries on those NFS shares that you ran?


Zesko commented on May 24, 2024

No, I never manually executed any binaries off the NFS share. Or I could be blind to it and not know where hidden executable files are running from.

Do you have some tool to detect whether any program is running from the NFS mount?
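
(One way to check this, assuming the share is mounted at /NAS as in the unit above; both commands list processes holding files open on that filesystem:)

fuser -vm /NAS
# or, with lsof:
lsof +f -- /NAS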


Zesko commented on May 24, 2024

Wait, maybe unmounting NFS requires the network to still be up during shutdown, so the NFS connection can be terminated cleanly?
A theoretical problem: if the KVM/QEMU network goes offline too quickly, NFS loses the connection and hangs.

I am trying to add sleep 3 between the NFS umount and the KVM/QEMU guest shutdown in NAS.service to see if the problem goes away.
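
An alternative to sleeping would be explicit ordering: since units are stopped in the reverse of their start order, a [Unit] section along these lines should keep the network (and libvirtd) up until the ExecStop= commands of NAS.service have finished. This is only a sketch; the libvirtd.service dependency is an assumption based on the setup described here:

[Unit]
Wants=network-online.target
After=network-online.target network.target libvirtd.service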


Zesko commented on May 24, 2024

I changed NAS.service:

...
ExecStop=-umount /NAS
ExecStop=/usr/bin/virsh shutdown ${NAS_KVM_DOMAIN}

to

...
ExecStop=umount /NAS
ExecStop=sleep 1
ExecStop=/usr/bin/virsh shutdown ${NAS_KVM_DOMAIN}
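
For context, a lightly annotated version of that stanza (the comments are editorial; per systemd.service(5), multiple ExecStop= lines run one after another, and dropping the leading "-" means a failing command is treated as an error rather than silently ignored):

[Service]
# Unmount the NFS share while the guest and its network are still up
ExecStop=umount /NAS
# Give the unmount a moment to settle
ExecStop=sleep 1
# Only then shut down the NAS guest
ExecStop=/usr/bin/virsh shutdown ${NAS_KVM_DOMAIN}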

After more than 8 shutdowns and reboots, the problem no longer occurred (or maybe that was luck).

systemd and NFS are not the problem; it was my mistake in writing the faulty NAS.service. Sorry!

