Occasional build failure (debos issue, closed)

Thread974 commented on August 15, 2024
Occasional build failure

from debos.

Comments (7)

d4s commented on August 15, 2024

Some other logs from @Thread974, collected on a different host:

2017/11/16 09:54:54 Formatting partition 6 |    ID        SIZE  PATH
2017/11/16 09:54:54 Formatting partition 6 |     1     3.81GiB  /dev/disk/by-id/virtio-fakedisk-0-part6
2017/11/16 09:54:54 Formatting partition 6 | 
2017/11/16 09:54:54 Formatting partition 7 | ERROR: cannot stat '/dev/disk/by-id/virtio-fakedisk-0-part7': No such file or directory
2017/11/16 09:54:54 Formatting partition 7 | ERROR: open ctree failed
2017/11/16 09:54:54 Formatting partition 7 | btrfs-progs v4.9.1
2017/11/16 09:54:54 Formatting partition 7 | See http://btrfs.wiki.kernel.org for more information.
2017/11/16 09:54:54 Formatting partition 7 | 
2017/11/16 09:54:54 Action `image-partition` failed at stage Run, error: exit status 1
2017/11/16 10:45:48 Formatting partition 7 |     1     2.07GiB  /dev/disk/by-id/virtio-fakedisk-0-part7
2017/11/16 10:45:48 Formatting partition 7 | 
2017/11/16 10:45:49 Action image-partition failed at stage Run, error: Failed to get uuid: exit status 2
2017/11/16 10:45:49 >>> Starting a debug shell
root@i7:~/Documents/bos0009/apertis-image-recipes# ../go/src/github.com/go-debos/debos/debos -t architecture:$ARCH --debug-shell apertis-image-$ARCH.yaml
2017/11/17 07:39:46 Ignored link: /etc/ld.so.conf.d//etc/ld.so.conf.d/x86_64-linux-gnu_EGL.conf
2017/11/17 07:39:46 Ignored link: /etc/ld.so.conf.d//etc/ld.so.conf.d/x86_64-linux-gnu_GL.conf
[    1.072426] systemd-fstab-generator[147]: Failed to create mount unit file /run/systemd/generator/home-fredo-Documents-bos0009-apertis\x2dimage\x2drecipes.mount, as it already exists. Duplicate entry in /etc/fstab?
[    1.483118] Error: Driver 'pcspkr' is already registered, aborting...
Running /debos --artifactdir /home/fredo/Documents/bos0009/apertis-image-recipes --template-var architecture:"amd64" /home/fredo/Documents/bos0009/apertis-image-recipes/apertis-image-amd64.yaml --debug-shell --shell /bin/bash --internal-image /dev/disk/by-id/virtio-fakedisk-0
2017/11/17 06:39:49 ==== Unpack ospack_17.12-amd64-minimal_00000000.0.tar.gz ====
2017/11/17 06:39:51 ==== image-partition ====
2017/11/17 06:39:52 Formatting partition 1 | mkfs.fat 4.0 (2016-05-06)
2017/11/17 06:39:52 Formatting partition 2 | mkfs.fat: warning - lowercase labels might not work properly with DOS or Windows
2017/11/17 06:39:52 Formatting partition 2 | mkfs.fat 4.0 (2016-05-06)
2017/11/17 06:39:52 Formatting partition 4 | mkfs.fat: warning - lowercase labels might not work properly with DOS or Windows
2017/11/17 06:39:52 Formatting partition 4 | mkfs.fat 4.0 (2016-05-06)
2017/11/17 06:39:52 Formatting partition 5 | mkfs.fat: warning - lowercase labels might not work properly with DOS or Windows
2017/11/17 06:39:52 Formatting partition 5 | mkfs.vfat: unable to open /dev/disk/by-id/virtio-fakedisk-0-part5: No such file or directory
2017/11/17 06:39:52 Formatting partition 5 | mkfs.fat 4.0 (2016-05-06)
2017/11/17 06:39:52 Action `image-partition` failed at stage Run, error: exit status 1
2017/11/17 06:39:52 >>> Starting a debug shell
bash-4.4# exit
Powering off.
[   74.266920] reboot: Power down

stappersg commented on August 15, 2024

In #20, @d4s wrote:

But to be honest, I hit #50 once during the last 2 months and can't reproduce it again.

How often do others encounter this issue? Can they reproduce it?
(Does this issue need to remain open?)

d4s commented on August 15, 2024

Unfortunately, we don't have enough information to either close or debug this :-(

Thread974 commented on August 15, 2024

I haven't tried for a while, but I can give it another try and let you know if it still happens.

Thread974 commented on August 15, 2024

It occurred several times this week while building armhf images. I notice it when building both ospacks and images, rather than when building only images.

sjoerdsimons commented on August 15, 2024

With some recent improvements to fakemachine that speed up the 9pfs handling, this actually tends to happen more often, which made it reproducible enough for some more analysis and testing.

The problem is that debos currently assumes that, after partitioning, we can use udev settle to wait for the device node to be created. Unfortunately, while you would expect adding a partition to cause just a single uevent adding the new device, reality isn't that simple.

What actually comes from the kernel is:

  • change uevent on all old partitions
  • remove uevent on all old partitions
  • change uevent on the full block device (e.g. /dev/vda)
  • add uevent on all old partitions
  • change uevent on all old partitions
  • add uevent for the new partitions
  • remove uevent on all partitions (new and old)
  • change uevent on the full block device
  • add uevent on all partitions (new and old)
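
The sequence above can be replayed in a small model to see where the new partition's device node disappears. This is only an illustrative sketch: the `uevent` type, the target labels ("old", "new", "all", "disk") and the helper names are hypothetical and not part of debos or udev, and the model assumes udev creates or removes the node as soon as it handles each add or remove event.

```go
package main

import "fmt"

// uevent is a hypothetical, simplified record of one kernel uevent.
type uevent struct {
	action string // "add", "remove" or "change"
	target string // "old", "new", "all" (partitions) or "disk" (whole device)
}

// observedSequence mirrors the list above, in the order it was reported.
var observedSequence = []uevent{
	{"change", "old"},
	{"remove", "old"},
	{"change", "disk"},
	{"add", "old"},
	{"change", "old"},
	{"add", "new"},
	{"remove", "all"},
	{"change", "disk"},
	{"add", "all"},
}

// newNodePresence returns, for every event, whether the device node of the
// new partition exists once that event has been handled.
func newNodePresence(events []uevent) []bool {
	present := false
	out := make([]bool, 0, len(events))
	for _, ev := range events {
		affectsNew := ev.target == "new" || ev.target == "all"
		switch {
		case affectsNew && ev.action == "add":
			present = true
		case affectsNew && ev.action == "remove":
			present = false
		}
		out = append(out, present)
	}
	return out
}

// disappearances counts how often the node vanishes after having existed.
func disappearances(presence []bool) int {
	count := 0
	for i := 1; i < len(presence); i++ {
		if presence[i-1] && !presence[i] {
			count++
		}
	}
	return count
}

func main() {
	presence := newNodePresence(observedSequence)
	for i, ev := range observedSequence {
		fmt.Printf("%d: %-6s %-4s -> new node present: %v\n", i+1, ev.action, ev.target, presence[i])
	}
	fmt.Println("times the new node vanished:", disappearances(presence))
}
```

In this model the node exists after the first add on the new partition, vanishes during the second remove round, and only comes back with the final add; anything that touches the device path in between fails.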

So when the kernel is asked to recheck the partition table, it seemingly drops all current partition devices and re-creates them. In some artificial testing in a fakemachine, udevmonitor also shows a gap of about 16 milliseconds between the first add event on the new partition and the second round of remove/add events.

So the reason for this race is that there is a small window in which the device files are removed and recreated by udev. If that happens just as a filesystem is being created or mounted, you get the failures above.

sjoerdsimons commented on August 15, 2024

After a bit of peering at the kernel code and not understanding why we're getting multiple sequences of events, I did a bit more debugging. It turns out udev is trying to be too helpful: it will trigger a rescan of the partition table iff the block device itself triggers an inotify IN_CLOSE_WRITE and udev can get an exclusive lock on the device (e.g. no partitions mounted). When using parted this isn't useful, as parted already tells the kernel about the new partition (triggering an add uevent); the rescan just adds confusion, as seen above.

Testing with flock shows that holding a lock indeed prevents the rescan, and only a single add uevent for the new partition occurs, which is what we'd want in order to clear this up properly.
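
A minimal sketch of the locking idea in Go, assuming a Linux host: hold an exclusive flock(2) on the device while the partitioning step runs, so that a second, non-blocking attempt to lock it (standing in here for udev's own attempt) fails. The helper names `withExclusiveLock`, `lockIsFree` and `lockDemo` are hypothetical, a temporary file stands in for the block device, and this is not necessarily how debos itself implements the fix.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// withExclusiveLock opens path, takes an exclusive flock(2) on it, runs fn
// and releases the lock once fn has returned.
func withExclusiveLock(path string, fn func() error) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		return fmt.Errorf("flock %s: %w", path, err)
	}
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)

	return fn()
}

// lockIsFree reports whether an exclusive lock could be taken right now,
// without blocking. The probe lock is dropped again when the file is closed.
func lockIsFree(path string) (bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()

	err = syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB)
	if err == syscall.EWOULDBLOCK {
		return false, nil // somebody else holds the lock
	}
	if err != nil {
		return false, err
	}
	return true, nil
}

// lockDemo uses a temporary file as a stand-in for the image device and
// reports whether the lock was free during and after the locked section.
func lockDemo() (during, after bool, err error) {
	tmp, err := os.CreateTemp("", "fakedisk-*")
	if err != nil {
		return false, false, err
	}
	path := tmp.Name()
	tmp.Close()
	defer os.Remove(path)

	err = withExclusiveLock(path, func() error {
		// The partitioning command would run here.
		var probeErr error
		during, probeErr = lockIsFree(path)
		return probeErr
	})
	if err != nil {
		return false, false, err
	}
	after, err = lockIsFree(path)
	return during, after, err
}

func main() {
	during, after, err := lockDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, "demo failed:", err)
		os.Exit(1)
	}
	fmt.Println("lock free while partitioning:", during)
	fmt.Println("lock free afterwards:", after)
}
```

The lock is advisory, so it only helps because the other party (udev, per the analysis above) also tries to take it before rescanning.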
