
Use docker cache across runs (conjob issue, closed)

ScottG489 commented on August 28, 2024
Use docker cache across runs


Comments (5)

ScottG489 commented on August 28, 2024

See these links:
Persistence of Inner Docker Images
Persistence of Inner Container Images using Docker Volumes
Persistence of Inner Container Images using Bind Mounts

However, note this warning:

A given host directory mounted into a system container's /var/lib/docker must
only be mounted on a single system container at any given time. This is a
restriction imposed by the inner Docker daemon, which does not allow its image
cache to be shared concurrently among multiple daemon instances. Sysbox will
check for violations of this rule and report an appropriate error during system
container creation.

This would essentially mean we could run only one build at a time, which isn't really acceptable. However, I think there are at least some partial solutions.

The documentation above lays out how to create a volume for a container and mount it as the Docker cache directory (/var/lib/docker). We could therefore create one volume per job, and every subsequent run of that job would have access to that volume and its cache. However, this would restrict conjob to running only one instance of a particular job at a time. This may not be ideal, but it is acceptable.
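As a rough illustration of the per-job volume idea, the sketch below builds the `docker run` arguments for a job container that runs under Sysbox (`--runtime=sysbox-runc`) with a named volume mounted on /var/lib/docker. The `conjob-cache-` prefix and the helper names are hypothetical, and it assumes job names only contain characters that are valid in Docker volume names.

```python
from __future__ import annotations

# Sketch: a per-job Docker cache volume mounted as the inner daemon's data dir.
# The "conjob-cache-" prefix and both helper names are hypothetical.
CACHE_DIR = "/var/lib/docker"


def cache_volume_name(job_name: str) -> str:
    """Derive one volume name per job (assumes job_name is a valid volume-name suffix)."""
    return f"conjob-cache-{job_name}"


def docker_run_args(job_name: str, image: str) -> list[str]:
    """Build `docker run` arguments that start the job under the Sysbox runtime
    with its per-job volume mounted on the inner daemon's cache directory."""
    return [
        "docker", "run", "--rm",
        "--runtime=sysbox-runc",
        "--mount", f"type=volume,source={cache_volume_name(job_name)},target={CACHE_DIR}",
        image,
    ]


print(" ".join(docker_run_args("build-app", "example/job-image")))
```

Because the volume name is derived only from the job name, every run of the same job lands on the same cache, which is exactly why concurrent runs of one job collide with the single-mount rule quoted above.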

I think this would work similarly to how secret volumes work. For the Docker cache, it would look something like this:

  1. Request to run a job
    1. If a job with the same name is already running, reject the request
    2. If the job isn't running, continue
  2. Check whether a volume with a name corresponding to the job name exists
    1. If it exists, mount it on the job's Docker cache directory (/var/lib/docker)
    2. If it doesn't exist, create it, then continue as in 2.1
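The steps above can be modeled with a small in-memory sketch. It is not conjob's implementation: the class and exception names are hypothetical, the two sets stand in for queries to the Docker daemon, and the returned dict is just a description of the mount to request.

```python
from __future__ import annotations


class JobAlreadyRunning(Exception):
    """Raised when a run is requested for a job that already has a run in progress."""


class CacheVolumeScheduler:
    """Hypothetical in-memory model of the proposed request flow."""

    def __init__(self) -> None:
        self.running: set[str] = set()  # names of jobs with a run in progress
        self.volumes: set[str] = set()  # stand-in for existing Docker volumes

    def start(self, job_name: str) -> dict[str, str]:
        # Step 1: reject the request if a run of the same job is in progress.
        if job_name in self.running:
            raise JobAlreadyRunning(job_name)
        # Step 2: look up the volume named after the job, creating it if missing.
        volume = f"conjob-cache-{job_name}"
        if volume not in self.volumes:
            self.volumes.add(volume)
        self.running.add(job_name)
        # Step 2.1: mount the volume on the job's Docker cache directory.
        return {"source": volume, "target": "/var/lib/docker"}

    def finish(self, job_name: str) -> None:
        # Once the run ends, the next request for this job may reuse the volume.
        self.running.discard(job_name)
```

A second `start("build-app")` before `finish("build-app")` raises `JobAlreadyRunning`; after `finish`, the same volume is handed out again, so the cache carries over between serial runs.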

I believe this is doable and a pretty clean solution.

One question remains, though: what are the implications of allowing all instances of the same job to have access to the same cache? In particular, what are the security concerns?


ScottG489 commented on August 28, 2024

Another thought after revisiting the previous comment: I think we could avoid restricting jobs to a single instance if we don't promise that a cache will always be available, and instead treat it as a performance perk when possible. After all, that's really what the Docker build cache is anyway. When you build something for the first time you don't have a cache, so I don't think users should expect one.

Of course, users may expect that once they run one build, subsequent builds will be cached, but what about caches for builds running in parallel? How does Docker itself handle cache layers for parallel builds? Does a cached layer become available as soon as it finishes within a build, or only once the entire build finishes? The same question applies to consuming a cached layer. When two runs of the same job overlap, I see two options:

  1. Only one build uses the cache volume; all others build normally without a cache
  2. Another volume is created if the existing one is already in use

(1) is fairly straightforward and may be a good place to start. (2) brings a lot more into play. How do we proceed once these jobs have finished and another one starts? Which volume will be used? I imagine we'd want the most recent one, but should we then delete the other? Image caches from older runs may still be useful in the future, so we likely won't want to delete it. I also think it would be best to preserve the impression that all caches from all previous runs are in place unless they were removed explicitly. It would be confusing for caches to go missing all of a sudden just because two jobs ran at the same time.
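To make the trade-off between the two options concrete, here is a hypothetical pool of cache volumes for one job. With `allow_extra_volumes=False` it behaves like option (1): a run that finds the volume busy gets `None` and builds without a cache. With `True` it behaves like option (2): a new volume is created. In both cases older volumes are kept rather than deleted, and the most recently released free volume is preferred, on the assumption that it holds the freshest cache. All names here are illustrative, not conjob's.

```python
from __future__ import annotations

import itertools
from typing import Optional


class CachePool:
    """Hypothetical set of cache volumes belonging to a single job."""

    def __init__(self, job_name: str, allow_extra_volumes: bool) -> None:
        self.job_name = job_name
        self.allow_extra = allow_extra_volumes
        self.in_use: set[str] = set()
        self.last_released: dict[str, int] = {}  # volume -> logical release time
        self._clock = itertools.count()
        self._ids = itertools.count(1)

    def acquire(self) -> Optional[str]:
        free = [v for v in self.last_released if v not in self.in_use]
        if free:
            # Prefer the most recently released volume (likely the newest cache).
            volume = max(free, key=self.last_released.__getitem__)
        elif not self.last_released or self.allow_extra:
            # First run ever, or option (2): create another volume.
            volume = f"conjob-cache-{self.job_name}-{next(self._ids)}"
            self.last_released[volume] = -1  # never released yet
        else:
            return None  # option (1): this run builds without a cache
        self.in_use.add(volume)
        return volume

    def release(self, volume: str) -> None:
        # Keep the volume around instead of deleting it; just mark it free.
        self.in_use.discard(volume)
        self.last_released[volume] = next(self._clock)
```

Note that picking the most recent volume still leaves the other volume's layers unused by the next run, which is the gap the merging idea below tries to close.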

I wonder whether caches could somehow be merged. This seems like the ideal solution, but it could be complex. My idea is that once two given volumes are unmounted and no longer used by any container, their caches could be merged together, if that's even possible. The next job would then have access to everything, as if the two previous concurrent jobs had run serially.
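One reason merging seems plausible in principle is that image layers are content-addressed: a layer's key is a digest of its content, so two caches can only disagree about a key if one of them is corrupt. The sketch below models a cache as a plain `digest -> bytes` map and merges two of them by union. This is a conceptual model only; the inner daemon's real /var/lib/docker also holds storage-driver directories and image and build-cache metadata, and Docker does not document a supported way to merge two such directories.

```python
from __future__ import annotations

import hashlib


def digest(content: bytes) -> str:
    """Content address of a layer, in the 'sha256:<hex>' form Docker uses for digests."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def merge_caches(a: dict[str, bytes], b: dict[str, bytes]) -> dict[str, bytes]:
    """Union two content-addressed layer stores (a simplified model, not /var/lib/docker)."""
    merged = dict(a)
    for key, content in b.items():
        # Keys are derived from content, so a shared key must hold identical bytes.
        if key in merged and merged[key] != content:
            raise ValueError(f"conflicting content for {key}")
        merged[key] = content
    return merged
```

In this model, two runs that share a base layer and each add one layer of their own merge into a store with three layers, as if they had run serially.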

That said, it will always be the case that a build running in parallel with another won't have the cache from that other run. I think that is fine, since the build cache is really just a performance improvement, and with Docker no one should assume a dependence on an "ordered" cache anyway. If users want that, they should disable parallel builds (disabling parallel builds isn't supported yet).

It's also worth considering what effect all of this will have on the usual expectation that completely different Docker images share the layers they have in common. Could this throw off someone's workflow? I don't have any insight here yet.


ScottG489 commented on August 28, 2024

Another point worth emphasizing: the reason we can't mount the same build cache into every job's container is that

a given host directory mounted into a system container's /var/lib/docker must only be mounted on a single system container at any given time

as mentioned in an earlier comment.


ScottG489 commented on August 28, 2024

The approach we ended up taking was simply to mount a volume dedicated to each job. We don't check whether the volume is already mounted elsewhere. Sysbox's docs mention that

Sysbox will check for violations of this rule and report an appropriate error during system container creation.

From what I can tell, these errors are reported in the inner dockerd's logs, which won't be visible in the job output (depending on how the user has set up their job). The job then fails without further output.

This is not ideal; the user should get a meaningful error rather than just the 400. Before the job runs, we should check that the volume is not already in use and report this back to the user. A follow-up issue will be made for this work.
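A pre-run check could look roughly like the sketch below. It assumes the list of running containers has already been fetched, with each entry trimmed to the `Name` and `Mounts` fields of `docker inspect` output; one way to narrow that list might be `docker ps --filter volume=<name>`. The function names and the error wording are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def containers_using_volume(containers: list[dict], volume: str) -> list[str]:
    """Return names of containers that have the given named volume mounted.

    Each entry is assumed to look like a trimmed `docker inspect` result:
    {"Name": "/c", "Mounts": [{"Type": "volume", "Name": ..., "Destination": ...}]}
    """
    return [
        c["Name"]
        for c in containers
        if any(
            m.get("Type") == "volume" and m.get("Name") == volume
            for m in c.get("Mounts", [])
        )
    ]


def preflight_error(containers: list[dict], volume: str) -> Optional[str]:
    """Return a user-facing message if the cache volume is busy, else None."""
    users = containers_using_volume(containers, volume)
    if users:
        return (
            f"Docker cache volume '{volume}' is already mounted by {', '.join(users)}; "
            "Sysbox allows it on only one system container at a time."
        )
    return None
```

A check-then-run sequence like this is not atomic: two requests arriving together could both pass the check, so it improves the error message in the common case rather than replacing Sysbox's own enforcement.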


ScottG489 commented on August 28, 2024

Closed by 8323344.

Follow-up work can be found in linked issue #41.
