Comments (26)
Added http://ci.ros2.org/job/ci_turtlebot-demo_linux-aarch64/102/ to the list of builds above.
Edit: er.. http://ci.ros2.org/job/ci_linux-aarch64/548/
from ci.
This is a sort of time bomb / hot potato for whichever buildfarmer's touch to the Dockerfile re-triggers this issue.
moby/moby#1171 looks very promising. I might get a chance to circle back to this during my buildfarmercop reign.
from ci.
moby/moby#27384 is perhaps the root cause of this. Docker 1.12.x on ARM64 is compiled with Go 1.7, and Go has a bug that's tickled by Docker.
The good news is that we're looking at Go 1.9 successfully being available as a binary release for ARM64 from the Go project, and in some reasonable timeframe after that there should be a fresh binary for ARM64 of Docker that can be easily consumed.
In the meantime, the best strategy is to reduce the number of layers you use. Various Docker optimization strategies shrink the Dockerfile, which eliminates layers and build steps and usually improves performance as well. Looking at
https://github.com/ros2/ci/blob/master/linux_docker_resources/Dockerfile
in particular, successive RUN commands can usually be merged, e.g.
RUN foo
RUN bar
is almost always equivalent for your purposes to
RUN foo && bar
and that wipes out a layer. You might look at this (vintage) writeup
https://blog.tutum.co/2014/10/22/how-to-optimize-your-dockerfile/
that's pretty useful. Newest Docker has some more tricks up its sleeve, so don't go completely overboard on optimizing, but there are small changes that should help stability.
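To make the effect of merging concrete, here is a minimal sketch that counts RUN steps in a Dockerfile before and after merging. The helper name count_run_steps and the two sample Dockerfiles are hypothetical and only for illustration; they are not taken from the ros2/ci Dockerfile. The count covers RUN lines only and assumes upper-case instructions, so it is a rough proxy for layer depth rather than an exact figure.

```shell
#!/bin/sh
# Count RUN instructions in a Dockerfile read from stdin.
# Assumes instructions are written in upper case at the start of a line.
count_run_steps() {
    grep -c -E '^[[:space:]]*RUN[[:space:]]'
}

# Hypothetical Dockerfile with two separate RUN steps.
before='FROM ubuntu:xenial
RUN apt-get update
RUN apt-get install -y python3-dev
'

# The same work merged into a single RUN step.
after='FROM ubuntu:xenial
RUN apt-get update && apt-get install -y python3-dev
'

printf '%s' "$before" | count_run_steps   # prints 2
printf '%s' "$after" | count_run_steps    # prints 1
```

Pointing the same function at a real file (count_run_steps < Dockerfile) gives a quick check of how close a Dockerfile is to the depth at which the builds start failing.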
from ci.
My understanding is that this has been resolved by upgrading the version of Docker on the arm64 machines to one that has better file system stability.
from ci.
> My understanding is that this has been resolved by upgrading the version of Docker on the arm64 machines to one that has better file system stability.
We haven't upgraded this Jenkins instance yet. We have what was previously referred to as the ROS 2 buildfarm, which lives at http://ci.ros2.org and provides continuous integration and build archiving for ROS 2 omnibus builds. In the past couple of weeks we have also created a ROS 2 buildfarm at http://build.ros2.org, based more closely on the ROS buildfarm (build.ros.org), to build the Xenial debs for the beta2 release of ROS 2.
The Docker deb that I built last weekend has been deployed to the package buildfarm at http://build.ros2.org. Depending on the CI load today, it might be a good day to upgrade Docker on the CI farm and see if that does indeed resolve this.
from ci.
There's an updated version of Docker 1.12.x also available through a PPA that should fix this particular issue. @nuclearsandwich - if you have not yet resolved this issue, I'd like to help you work through testing it.
from ci.
@vielmetti it's been a while since I've looked at this since I've been pulled away to other matters.
It looks like we're still running 1.12.6 on the aarch64 host. I've got the 17.06-ce deb I built for the buildfarm, which we could also use to test whether the issue is resolved.
I think @wjwwood has the buildfarm shift currently but I don't know if anyone has bandwidth to update docker and test without the workaround leading up to beta 3.
from ci.
> I think @wjwwood has the buildfarm shift currently but I don't know if anyone has bandwidth to update docker and test without the workaround leading up to beta 3.
I don't at the moment. But if someone can show it fixes the issue I can work on upgrading the machines in the background (less work than testing it out I think).
from ci.
FWIW, I pushed a branch to https://github.com/ros2/ci called failing-docker that restores the problematic Docker code we had before. I kicked off a build using that branch here: http://ci.ros2.org/job/ci_turtlebot-demo_linux-aarch64/96/console . Early indications are that it does not show the problem we previously had, but I haven't spent time investigating why.
from ci.
I think that the problem is a matter of number of layers and not about a specific command. Given that we reduced the total number of layers in the Dockerfile, the referenced job will not prove if it's fixed or not (very close though given that it's now 49 layers deep and the job crashes as soon as we reach 50 layers).
I increased the number of layers on that branch and ran ci with it. It fails with the expected error so that branch can be used for testing.
Edit: actually it failed at step 46 and not 50 o_O, but it does exhibit the error message described in the original issue:
error creating aufs mount to /var/lib/docker/aufs/mnt/2a5e9a9926e8feb37fc35aedba7e40299e347f493a1e73da38255e9cb1376f2c: invalid argument
from ci.
@mikaelarguedas Ah, I didn't realize that we had reduced the layers. Thanks for the update.
from ci.
The instructions for installing this newly fixed version of Docker 1.12 are as follows:
A build with option #1 suggested in comment #2 is now available in a PPA (Many thanks to mwhudson!) and is ready for test. To install:
$ sudo add-apt-repository ppa:mwhudson/devirt
$ sudo apt-get update
Then
apt-get upgrade
or
apt-get install docker.io containerd runc
should get the rebuilt versions.
The related bug report for reference is https://bugs.launchpad.net/bugs/1702979
from ci.
@nuclearsandwich do you think that you will have spare cycles in the near future to try a more recent docker on the packet machines? (I don't remember if a newer kernel was needed as well or if newer docker could be enough)
from ci.
Two things of note -
The newest Docker installs very easily with the instructions to use get.docker.com or test.docker.com. Uninstall the old docker.io first and the script will grab keys and install the latest docker-ce package on Arm.
Also, it's much easier to get Arm base images these days that do what you want, because there's support for multi-arch images in the main docker library (i.e. you can say "FROM ubuntu" and it will do the right thing).
from ci.
> Also, it's much easier to get Arm base images these days that do what you want, because there's support for multi-arch images in the main docker library. (ie. you can say "FROM ubuntu" and it will do the right thing).
whoa do you have a link for more info on this? cc @tfoote @ruffsl
from ci.
> do you think that you will have spare cycles in the near future to try a more recent docker on the packet machines? (I don't remember if a newer kernel was needed as well or if newer docker could be enough)
To answer this question, yes. I can give this a shot Monday or this afternoon. Since we're not wrangling the CI hosts with puppet we can use the script from get.docker.com or the 17.06 deb I built for the buildfarm hosts.
from ci.
My notes on multiarch are here
and the Works on Arm newsletter covered them today
A good read is this from Phil Estes
https://integratedcode.us/2017/09/13/dockerhub-official-images-go-multi-platform/
from ci.
Updated the CI host to 17.07 with the get.docker.com script @vielmetti linked.
Running 3 builds
CI Linux ARM64
CI 🐢🤖
CI TURTLEBOT (failing docker branch)
mikael/failing-docker
All three have made it past the Dockerfile building stage so this looks like it does the thing. I think maybe we let the nightlies run over the weekend and if nothing bad happens probably update the other linux hosts in order to keep the same version of docker everywhere.
/cc @sloretz as build farmer.
from ci.
@nuclearsandwich Can you please retrigger the turtlebot job to use the mikael/failing-docker branch? The failing-docker branch was not failing last time we tried and has not been deleted since. @clalancette FYI
from ci.
and 💥
23:00:01 Step 46/59 : RUN (apt-get update || true) && apt-get install --no-install-recommends -y python3-dev
23:00:02 error creating aufs mount to /var/lib/docker/aufs/mnt/d7c3a82c1f164dbc8143945e6c0cb559932d36e796204081e59b36456e4f65e6: invalid argument
It seems like aufs is still part of the problem as it's still the default driver. Since we're on Ubuntu Xenial is there anything we need to do to try again with the overlay2 storage driver?
from ci.
We've been waiting to upgrade the OS before trying overlay2, but it looks like it's not too hard to enable: https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/#configure-docker-with-the-overlay-or-overlay2-storage-driver . Maybe the puppet formula had an option too.
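For reference, a minimal sketch of what enabling overlay2 could look like, assuming a Docker version that reads /etc/docker/daemon.json, a systemd-managed daemon, and a kernel recent enough for overlay2 (4.0 or later). The helper name write_overlay2_config and the scratch file path are hypothetical. The sketch only writes the config to a scratch file; the commands that touch the host are left as comments because switching drivers hides images and containers created under the old driver until you switch back.

```shell
#!/bin/sh
# Write a minimal daemon.json that selects the overlay2 storage driver.
# $1 is the destination path.
write_overlay2_config() {
    cat > "$1" <<'EOF'
{
  "storage-driver": "overlay2"
}
EOF
}

# Write to a scratch file first so it can be inspected before installing.
config_file="${TMPDIR:-/tmp}/daemon.json.example"
write_overlay2_config "$config_file"
cat "$config_file"

# To apply on a host (requires root; merge by hand if a daemon.json
# already exists rather than overwriting it):
#   sudo cp "$config_file" /etc/docker/daemon.json
#   sudo systemctl restart docker
#   docker info --format '{{.Driver}}'   # check which driver is active
```

If the driver reported by docker info is still aufs after the restart, the daemon is probably being started with an explicit storage driver flag that overrides the file.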
from ci.
> whoa do you have a link for more info on this? cc @tfoote @ruffsl
I touch on this in my previous docker for arm announcement:
https://discourse.ros.org/t/announcing-ros-docker-images-for-arm-and-debian/2467
Specifically, the related ticket for this new functionality is here:
docker-library/official-images#2289
from ci.
> I touch on this in my previous docker for arm announcement:
Oh nice. I must have missed it with my head down. Sorry for the spurious ping.
from ci.
If you're looking to test a Docker installation to see if it will crash when the file system gets too deep, I give you
https://gist.github.com/anonymous/bdafb8e961f55b2533fee8fa5221d186
If you are running an unpatched apt-get install docker.io on Ubuntu 16.04, this will fail at about layer 40.
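The gist itself is not reproduced here. As a separate, minimal sketch of the same idea, the following generates a Dockerfile with an arbitrary number of RUN steps so the build can be pushed past the depth at which the aufs mount error appears. The function name generate_deep_dockerfile, the ubuntu:16.04 base image, the depth of 60, and the layer-depth-test tag are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Print a Dockerfile with the requested number of RUN steps to stdout.
# Each step writes a small file so that it produces a real filesystem layer.
generate_deep_dockerfile() {
    depth="$1"
    echo "FROM ubuntu:16.04"
    i=1
    while [ "$i" -le "$depth" ]; do
        echo "RUN echo layer $i > /layer_$i"
        i=$((i + 1))
    done
}

# Generate the Dockerfile in an empty scratch directory so the build
# context stays small.
workdir="$(mktemp -d)"
generate_deep_dockerfile 60 > "$workdir/Dockerfile"

# To run the actual test (requires a working Docker daemon):
#   docker build -t layer-depth-test "$workdir"
```

On an affected installation the build would be expected to stop partway through with the aufs mount error quoted earlier in this thread; on a fixed one it should run through all steps.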
from ci.
This is still marked as "open", but we should be OK now with anything resembling a modern Docker version.
from ci.
> This is still marked as "open", but we should be OK now with anything resembling a modern Docker version.
Very good point @vielmetti. In fact we just updated to 18.09.5 last week. Thanks!
from ci.