Comments (8)
If I understand it correctly, the idea is that a layer making up 50% of the total image size should be started first, so that the other layers can all be pulled in parallel with the big one and the total time is about the time needed to pull the big layer alone. Currently, we may instead spend time copying 10 smaller layers and start the big layer only after most of that is done.
That only makes a difference for moderately unbalanced images, where the largest layer is probably > 1/6 of the total image size, but not something like 99%.
I think it’s an interesting optimization worth exploring. We can’t/shouldn’t do that for c/storage, and due to compatibility we’d need an opt-in anyway, but that’s not too bad.
We might want to think about the UI impact: e.g., should we list the progress bars in the original image order, to show the user what’s going on? Currently we create each progress bar only when we start pulling that layer, in order. That might end up being the most complex part of the feature.
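One possible answer to the UI question is to create a line for every layer up front, in manifest order, and only update its status as copies start and finish. The sketch below is a plain-text stand-in for that idea; it does not use c/image's real progress-bar code, and `layerStatus`, `renderProgress` and the digests are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// layerStatus is a hypothetical per-layer state for a progress display.
type layerStatus int

const (
	waiting layerStatus = iota
	copying
	done
)

func (s layerStatus) String() string {
	switch s {
	case copying:
		return "copying"
	case done:
		return "done"
	default:
		return "waiting"
	}
}

// renderProgress prints one line per layer in manifest order, so starting
// layers out of order changes their status but never reshuffles the display.
func renderProgress(digests []string, status []layerStatus) string {
	var b strings.Builder
	for i, d := range digests {
		fmt.Fprintf(&b, "layer %d %s: %s\n", i+1, d, status[i])
	}
	return b.String()
}

func main() {
	digests := []string{"sha256:aaa", "sha256:bbb", "sha256:ccc"} // placeholder digests
	status := []layerStatus{waiting, waiting, waiting}
	// Largest-first scheduling started layer 3 before layers 1 and 2.
	status[2] = copying
	fmt.Print(renderProgress(digests, status))
}
```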
Thanks for reaching out.
Can you elaborate on why the order matters?
As for pulls: the order must be preserved as the layers must be applied to the local storage in the exact order.
Sufficient imbalance is common in many containerized Python applications, for example, which are often built on Ubuntu, so the base image layer is much larger than the application layers (all the way up to the NVIDIA CUDA images with their astoundingly heavy 3.5... GB base images). The catch is that such unbalanced images are likely to be effectively pre-sorted already, because the big base layer comes first, so size-sorting might not make much of a difference in practice.
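The pre-sorted case can be expressed as a simple check: if layers are started in manifest order on a fixed number of parallel slots, a largest layer sitting within the first few positions already starts at time zero, so size-sorting cannot start it any earlier. A sketch, with a hypothetical helper name and an assumed parallelism of 6:

```go
package main

import "fmt"

// largestStartsImmediately reports whether, when layers are started in
// manifest order on `workers` parallel slots, the largest layer is among the
// first `workers` layers and therefore already starts at time zero.
func largestStartsImmediately(sizes []int64, workers int) bool {
	if len(sizes) == 0 {
		return false
	}
	largest := 0
	for i, s := range sizes {
		if s > sizes[largest] {
			largest = i
		}
	}
	return largest < workers
}

func main() {
	// Hypothetical base-first image: one large base layer plus small app layers (MB).
	fmt.Println(largestStartsImmediately([]int64{3000, 120, 40, 5}, 6)) // true
}
```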
On the other hand, the copies have to be forked anyway, and changing the order in which they are started adds no extra overhead. So unless gathering and sorting the layer sizes, or accessing server-side layers "out of order", has some noticeable overhead, this new method should always outperform the current one, however small the gain (and the gains should be doubled, because they should also be achievable during the push phase). I suspect the main reason this has not already been done is the way the legacy system that skopeo inherited from operates. The docker pull command, however, has a very different use case: running the container after the pull is complete, rather than immediately pushing it somewhere else.
The way c/storage is set up, pulls must create layers from base to the last child, in order (they have parent links).
Now, whether that’s a 100% hard requirement, where we just can’t create the child before the parent, or more of an implementation choice depends on the graph driver (it’s 100% hard for device-mapper snapshots, and it might be a choice for overlay, but I’m not quite sure). Even if it were purely an implementation choice, changing it would be a pretty large implementation effort (we would need a concept of an extracted diff that is not yet a layer, a mechanism to turn that into a layer quickly, and a cleanup mechanism to delete that extracted diff on unexpected aborts).
For direct registry-to-registry copies, this should be quite easy to do; the progress UI is the hardest part, the rest is just mechanical work. (But note that such copies are not pulls+pushes with a disk intermediary; they are direct streaming copies, so there are no “double” gains.)
For pushes, I think it’s the same as registry-to-registry copies, but there’s a small chance I’m missing something.
A friendly reminder that this issue had no activity for 30 days.
You would also take up more temporary space, as the blobs would exist on disk for a longer period of time. Currently, once a blob is completely downloaded, that layer is applied to storage and the blob is removed.
But if this is a minor change, I think we should do it.
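The temporary-space cost can be estimated with a simple model, assumed here for illustration: a blob occupies disk space from the moment its download completes until it is applied and removed, and layers are applied strictly in manifest order. `peakStagedSize` is a hypothetical helper, and the sizes are made up.

```go
package main

import "fmt"

// peakStagedSize models temporary disk use during a pull: a downloaded blob is
// kept until every layer before it has been applied, then it is applied and
// deleted. It returns the largest total size of blobs on disk at any point.
func peakStagedSize(sizes []int64, completionOrder []int) int64 {
	ready := make([]bool, len(sizes))
	next := 0
	var staged, peak int64
	for _, idx := range completionOrder {
		ready[idx] = true
		staged += sizes[idx]
		if staged > peak {
			peak = staged
		}
		// Apply and delete every blob whose parents are all applied.
		for next < len(sizes) && ready[next] {
			staged -= sizes[next]
			next++
		}
	}
	return peak
}

func main() {
	sizes := []int64{100, 2000, 300, 50} // hypothetical sizes in MB, manifest order
	fmt.Println(peakStagedSize(sizes, []int{0, 1, 2, 3})) // 2000: downloads finish in order
	fmt.Println(peakStagedSize(sizes, []int{0, 2, 3, 1})) // 2350: large layer finishes last
}
```

In this example, letting the large second layer finish last raises peak temporary usage from 2000 to 2350 MB, because the two layers after it must wait on disk until it is applied.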
A friendly reminder that this issue had no activity for 30 days.
Moving to c/image; this would be transparent to Skopeo itself.
Related Issues (20)
- isManifestUnknownError fails against Harbor registries, breaking sigstore signature upload
- Blob reuse decisions do not take into account manifest support
- Cannot copy buildkit cache images
- Support for structured logging (using `log/slog`)
- proposal: Support append images into docker archive
- Make a new release
- Docker client code can no longer talk to the latest version of the docker daemon 25.0.0
- Allow empty OCI configs for artifacts
- policy.json overwrite not honouring $XDG_CONFIG_HOME
- Podman cannot pull image from local registry
- copy.Options.EnsureCompressionVariantsExist doesn’t detect existing variants with zstd:chunked
- support multiple sigstore keys
- How can I copy from a tar file stream
- "slices" module only in go 1.21
- Cannot pull sigstore signed image with podman
- Error inspecting local manifest-lists
- platform.WantedPlatforms is noisy on macOS
- Incorrect syntax highlighting in containers-transports.5
- Why do we get the whole image when inspect with docker daemon?
- Support sigstore BYO PKI verification