

dbnicholson commented on May 28, 2024

I cobbled together something that demonstrates the issue. It doesn't actually cause errors here, since I couldn't get the system to boot with /var initially read-only.

$ cat /usr/local/bin/var-overlay-setup 
#!/bin/sh -e
mkdir -p /run/var-overlay-upper /run/var-overlay-work
mount -t overlay -o lowerdir=/var,upperdir=/run/var-overlay-upper,workdir=/run/var-overlay-work var-overlay /var
$ cat /etc/systemd/system/var-overlay.service 
[Unit]
DefaultDependencies=no
Before=shutdown.target local-fs.target
Conflicts=shutdown.target
RequiresMountsFor=/var

[Service]
Type=oneshot
ExecStart=/usr/local/bin/var-overlay-setup
RemainAfterExit=yes

[Install]
WantedBy=local-fs.target

Here's the journal logs:

Thu 2024-04-18 22:43:24 MDT endless init.scope[1]: Mounted var.mount - /var.
Thu 2024-04-18 22:43:24 MDT endless init.scope[1]: systemd-pstore.service - Platform Persistent Storage Archival was skipped because of an unmet condition check (ConditionDirectoryNotEmpty=/sys/fs/pstore).
Thu 2024-04-18 22:43:24 MDT endless init.scope[1]: Starting var-overlay.service...
Thu 2024-04-18 22:43:24 MDT endless init.scope[1]: Starting systemd-journal-flush.service - Flush Journal to Persistent Storage...
Thu 2024-04-18 22:43:24 MDT endless init.scope[1]: Starting systemd-random-seed.service - Load/Save OS Random Seed...
Thu 2024-04-18 22:43:24 MDT endless systemd-journald.service[328]: Time spent on flushing to /var/log/journal/f6a1c4895ff144eb8e2b5866c4ce1498 is 47.023ms for 913 entries.
Thu 2024-04-18 22:43:24 MDT endless systemd-journald.service[328]: System Journal (/var/log/journal/f6a1c4895ff144eb8e2b5866c4ce1498) is 43.8M, max 45.2M, 1.4M free.
Thu 2024-04-18 22:43:24 MDT endless systemd-journald.service[328]: Received client request to flush runtime journal.
Thu 2024-04-18 22:43:24 MDT endless kernel: overlayfs: "xino" feature enabled using 2 upper inode bits.
Thu 2024-04-18 22:43:24 MDT endless systemd-journald.service[328]: /var/log/journal/f6a1c4895ff144eb8e2b5866c4ce1498/system.journal: Journal file uses a different sequence number ID, rotating.
Thu 2024-04-18 22:43:24 MDT endless systemd-journald.service[328]: Rotating system journal.
Thu 2024-04-18 22:43:24 MDT endless audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=var-overlay comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Thu 2024-04-18 22:43:24 MDT endless audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=systemd-random-seed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Thu 2024-04-18 22:43:24 MDT endless audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=systemd-boot-update comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Thu 2024-04-18 22:43:24 MDT endless audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=plymouth-read-write comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Thu 2024-04-18 22:43:24 MDT endless audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=systemd-binfmt comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Thu 2024-04-18 22:43:24 MDT endless audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=systemd-udev-trigger comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Thu 2024-04-18 22:43:24 MDT endless init.scope[1]: Finished var-overlay.service.
Thu 2024-04-18 22:43:24 MDT endless init.scope[1]: Reached target local-fs.target - Local File Systems.

Notice how systemd-journal-flush.service and systemd-random-seed.service start before the var-overlay.service unit completes. You can see from the output that the journal files are being rotated essentially concurrently with overlayfs being initialized. Are these modifications being recorded in the overlay upper or lower directory? I don't know, but if these units were ordered after local-fs.target, they would still start at essentially the same time, just after this mount post-processing had finished.

from systemd.

yuwata commented on May 28, 2024

Note that networkd should handle such a case gracefully, as it touches files under /var only after systemd-networkd-persistent-storage.service has started. Ah, but in this case it's an overlay... Hm.


yuwata commented on May 28, 2024

If the lower directory /var is not a mount point, creating var.mount with Type=overlay and the required options should work, though I have not tested it. The upper directory can be created by setting RuntimeDirectory= (with RuntimeDirectoryPreserve=yes) in the mount unit, or by another service.
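A rough sketch of such a var.mount, reusing the upper and work directory paths from the var-overlay-setup script above. Like the suggestion itself it is untested, and it assumes /var is a plain directory on the root file system with no fstab entry and no other var.mount. The Description= text and WantedBy=local-fs.target are assumptions; if your systemd version does not honor RuntimeDirectory= in a .mount unit, the two directories would have to come from a separate service ordered Before=var.mount.

```ini
# /etc/systemd/system/var.mount  (untested sketch)
# The unit name must match Where=: /var -> var.mount.
[Unit]
Description=Writable overlay on /var

[Mount]
What=var-overlay
Where=/var
Type=overlay
# The lower layer is whatever is already at /var; upper and work dirs live
# on the /run tmpfs, so all writes are discarded at reboot.
Options=lowerdir=/var,upperdir=/run/var-overlay-upper,workdir=/run/var-overlay-work
# Ask systemd to create /run/var-overlay-upper and /run/var-overlay-work
# before mounting, and keep them if the unit is stopped.
RuntimeDirectory=var-overlay-upper var-overlay-work
RuntimeDirectoryPreserve=yes

[Install]
WantedBy=local-fs.target
```

Because this is a real .mount unit, services using StateDirectory= or RequiresMountsFor=/var pick up their ordering on it automatically, which is the point of doing it this way rather than with a .service.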


yuwata commented on May 28, 2024

If the underlying /var is also a partition, then how about mounting /var with a service unit and setting Before=var.mount?
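One reading of this suggestion: a service mounts the real /var partition, and var.mount, turned into the overlay as in the previous comment, is stacked on top of it. The untested sketch below assumes the /var entry has been removed from /etc/fstab so fstab-generator no longer produces its own var.mount. The unit name var-lower.service, the partition label "var", and the /usr/bin/mount path are hypothetical and will differ per system; the sketch also skips the fsck that an fstab entry would normally arrange.

```ini
# /etc/systemd/system/var-lower.service  (hypothetical name, untested sketch)
[Unit]
Description=Mount the underlying /var partition
DefaultDependencies=no
# Wait for the (hypothetical) block device labelled "var".
Requires=dev-disk-by\x2dlabel-var.device
After=dev-disk-by\x2dlabel-var.device local-fs-pre.target
# Ordered before the overlay mount, so anything that is After=var.mount
# (RequiresMountsFor=/var, StateDirectory=, ...) is also after this.
Before=var.mount local-fs.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/mount /dev/disk/by-label/var /var

[Install]
# Enabling this unit makes the overlay mount require it.
RequiredBy=var.mount
```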


poettering commented on May 28, 2024

> Several systemd units such as systemd-rfkill.service that persist data to /var fail if the overlayfs hasn't been mounted since the underlying mount is read only. These units are ordered correctly with respect to the mount they care about, but they all have DefaultDependencies=no and do not have After=local-fs.target. While this is technically correct, I think it would make more sense if they had After=local-fs.target since that's the defined point where local filesystem mounting has completed.

This is actually very carefully done, to minimize dependencies in early boot.

Note that these units are also ordered after systemd-remount-fs.service btw, which is supposed to be the point where / becomes writable if it was originally mounted read-only and is supposed to be writable.

It appears to me you could simply do your overlayfs replacement before systemd-remount-fs.service and things should mostly work.


dbnicholson commented on May 28, 2024

> Several systemd units such as systemd-rfkill.service that persist data to /var fail if the overlayfs hasn't been mounted since the underlying mount is read only. These units are ordered correctly with respect to the mount they care about, but they all have DefaultDependencies=no and do not have After=local-fs.target. While this is technically correct, I think it would make more sense if they had After=local-fs.target since that's the defined point where local filesystem mounting has completed.

> This is actually very carefully done, to minimize dependencies in early boot.

Sure, I figured as much. However, these are the only services I've ever seen that write to real filesystems that aren't ordered after local-fs.target. Every service that doesn't have DefaultDependencies=no already gets that automatically, and every other service I've seen that sets DefaultDependencies=no and writes to a real filesystem orders itself after local-fs.target.

I guess what it comes down to is that my desired semantics are that if your unit writes to real filesystems then it should order itself after local-fs.target. To me that's the purpose of the target - local filesystems are ready to be used. That's the only way to reliably do this type of mount stacking since there can only be one mount unit per path.

> Note that these units are also ordered after systemd-remount-fs.service btw, which is supposed to be the point where / becomes writable if it was originally mounted read-only and is supposed to be writable.

> It appears to me you could simply do your overlayfs replacement before systemd-remount-fs.service and things should mostly work.

I don't think I would want to order before systemd-remount-fs.service since systemd-remount-fs would then be applying mount options to the overlayfs mount.

The most reliable thing would be to take over all the mount units that I care about with a generator. @yuwata's suggestion to order before var.mount (for example) would also work similarly. What I don't want to have to do, though, is reimplement the logic that the fstab and other generators have for creating the normal mount units. In a service unit, I could use systemctl show or similar to find the mount parameters that the generators determined. In a generator, though, I'd have to reimplement that logic, since generators execute in parallel and I couldn't read the units that another generator produced.


poettering commented on May 28, 2024

> I guess what it comes down to is that my desired semantics are that if your unit writes to real filesystems then it should order itself after local-fs.target. To me that's the purpose of the target - local filesystems are ready to be used. That's the only way to reliably do this type of mount stacking since there can only be one mount unit per path.

Sorry, but I vehemently disagree with this. The thing is that some services such as journald, timesyncd, networkd, coredump are so fundamental that they should not be delayed longer than necessary, and I am sorry, there's really no reason to wait for /home to be mounted to just allow journald to do its thing...

So no, we are certainly not going to move all early boot stuff behind local-fs.target, if we know precisely what it needs. And we do know for those services what they need.

> I don't think I would want to order before systemd-remount-fs.service since systemd-remount-fs would then be applying mount options to the overlayfs mount.

Why wouldn't that be fine? Also it only does that if you actually list your rootfs on /etc/fstab. Why would you do that?

Alternatively, just list the five services explicitly with ordering deps on the service that establishes overlayfs for you?
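A minimal drop-in along those lines, assuming the var-overlay.service unit from the original report. Before= on var-overlay.service has the same effect as adding After=var-overlay.service to each listed unit, so the stock units stay untouched. The selection below is illustrative (units named in this thread, not a complete list) and the drop-in file name is hypothetical.

```ini
# /etc/systemd/system/var-overlay.service.d/order-early-writers.conf
# (hypothetical file name; illustrative, not exhaustive)
[Unit]
# Repeated Before= lines accumulate. Before= only orders, it does not
# pull these units into the boot transaction.
Before=systemd-journal-flush.service systemd-random-seed.service
Before=systemd-rfkill.service systemd-timesyncd.service
Before=systemd-networkd-persistent-storage.service
```

Run `systemctl daemon-reload` after adding the drop-in so it takes effect.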


yuwata commented on May 28, 2024

So, I checked: services such as systemd-rfkill.service have StateDirectory=, so they actually do wait for /var to be mounted.
The problem here is that you implement the overlay mount with a .service unit rather than a .mount unit, so PID1 does not automatically add dependencies on the overlay mount to the services that use StateDirectory=.
As I said, please mount the overlayfs through a .mount unit. Then everything should work as expected.


dbnicholson commented on May 28, 2024

> So no, we are certainly not going to move all early boot stuff behind local-fs.target, if we know precisely what it needs. And we do know for those services what they need.

Fair enough.

> I don't think I would want to order before systemd-remount-fs.service since systemd-remount-fs would then be applying mount options to the overlayfs mount.

> Why wouldn't that be fine? Also it only does that if you actually list your rootfs on /etc/fstab. Why would you do that?

In my narrow use case it doesn't. In general, though, I don't know what mount options are in fstab. I don't want to apply a mount option to overlayfs that's only valid for ext4, for instance.

> Alternatively, just list the five services explicitly with ordering deps on the service that establishes overlayfs for you?

This is what I originally did, but it proved problematic.

  • I can't order [email protected] (or any other templated unit) since that apparently gets interpreted as [email protected]. I looked through the documentation, but it doesn't appear that there's any way to order against all instances of a template unit. Is that a bug or just how it works?

  • Maintaining the list of units to order before is bound to go stale. It's also not 5 units. It was 9 units in v254, but that's more than doubled now:

    $ git grep -l -e StateDirectory -e CacheDirectory -e LogsDirectory -e var.mount -e 'RequiresMountsFor=.*/var' units | xargs grep -l -e DefaultDependencies=no
    units/[email protected]
    units/[email protected]
    units/systemd-journal-flush.service
    units/systemd-networkd-persistent-storage.service
    units/systemd-pcrlock-file-system.service.in
    units/systemd-pcrlock-firmware-code.service.in
    units/systemd-pcrlock-firmware-config.service.in
    units/systemd-pcrlock-machine-id.service.in
    units/systemd-pcrlock-make-policy.service.in
    units/systemd-pcrlock-secureboot-authority.service.in
    units/systemd-pcrlock-secureboot-policy.service.in
    units/[email protected]
    units/systemd-pstore.service.in
    units/systemd-rfkill.service.in
    units/systemd-rfkill.socket
    units/systemd-timesyncd.service.in
    units/systemd-tpm2-setup.service.in
    units/systemd-update-utmp-runlevel.service.in
    units/systemd-update-utmp.service.in
    

    It also sucks to have to go analyze all of those units (and any others on the system) and determine exactly what their filesystem requirements are.
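One way to at least keep such a list from going stale is to regenerate it from the installed unit files rather than a source checkout. The helper below is hypothetical (find_var_writers is not part of systemd) and applies the same text filter as the git grep above. It defaults to /usr/lib/systemd/system, which some distributions place elsewhere, and it ignores drop-ins. Being a pure text match, it cannot tell what a unit actually writes, so it narrows the search rather than replacing the per-unit analysis.

```shell
#!/bin/sh
# find_var_writers DIR: hypothetical helper, not part of systemd.
# Prints the names of unit files in DIR that set DefaultDependencies=no and
# also mention StateDirectory, CacheDirectory, LogsDirectory, var.mount, or
# RequiresMountsFor= on a /var path (the git grep filter, on installed units).
find_var_writers() {
    dir=${1:-/usr/lib/systemd/system}
    for unit in "$dir"/*; do
        # Skip drop-in directories (foo.service.d/) and an unmatched glob.
        [ -f "$unit" ] || continue
        grep -q 'DefaultDependencies=no' "$unit" || continue
        if grep -q -E 'StateDirectory|CacheDirectory|LogsDirectory|var\.mount|RequiresMountsFor=.*/var' "$unit"; then
            # Print only the unit name, without the directory part.
            printf '%s\n' "${unit##*/}"
        fi
    done
}
```

For example, `find_var_writers /etc/systemd/system` checks locally added units as well as the vendor ones.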

Anyways, it does seem that having Before=var.mount and handling the /var mount manually works. I wouldn't want to do that in general, but for this narrow use case it's not bad. Feel free to close this issue.
