opencontainers / runtime-spec
OCI Runtime Specification
Home Page: http://www.opencontainers.org
License: Apache License 2.0
I think "version" should be reserved for future expansion and/or metadata extensions by the environment, not used for the manifest version the JSON document complies with. That should be specified with a key like Chrome's manifest_version.
We have no examples of how a bind mount from the host works; we need to fix this.
The API has to let users express the notion of priority among containers. Consider the case where a web server runs alongside a logging container. The web server is more important than the logging side-car container. When there is a resource crunch, the user doesn't mind killing one of the lower-priority containers, which is logging in this case. Ideally, if all containers were run with limits, this might not be necessary. In reality though, setting limits is hard, and users tend to over-provision resources, which leads to poor resource utilization.
Since this notion of priority can be expressed in different ways, I propose letting users handle cgroup management and only expose the following knobs:
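As an illustration only, such knobs might surface in the config like this; the priority and oomScoreAdj field names (and cgroupsPath) are hypothetical and not part of any spec:

```json
{
    "linux": {
        "cgroupsPath": "/containers/web-frontend",
        "priority": 100,
        "oomScoreAdj": -500
    }
}
```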
We should move the initial spec.go and spec_linux.go to this repository. This allows validation tools to be built to check and/or generate documentation along with configuration. It also allows anyone writing OCF tools in Go to have the types in one repository that they can import, confident that they are correct and follow the spec.
The current docs say:
If the container is compliant with multiple versions, it SHOULD advertise the most recent known version to be supported.
That's going to make backward compatibility with old tools hard. I'd rather take major and minor numbers from semantic versioning, so a v1.3 config would be compatible with a v1.3 launcher, a v1.4 launcher, etc., but not with a v1.2 launcher or a v2.0 launcher. That gives you some granularity for specifying which of several compatible features you need (e.g. UID-mapping for #10 was only added in v1.3, but v1.3 launchers can still handle v1.0 containers).
One limitation to the sem-ver approach is that feature addition needs to be serialized. If you have two orthogonal extensions, and A lands in v1.3 while B lands in v1.4, implementations that want to support feature B also need to support feature A. For comparison, Notmuch uses feature tags to mark supported optional features (which is what sem-ver's minor releases are for).
So depending on how flexible you want to make life for spec implementors, I'd recommend choosing either sem-ver (simple, some cross-compatibility, serialized features) or feature tags (more complicated, lots of flexibility, parallel features). Having a single integer that spec authors are supposed to bump (the current recommendation here) gives you essentially zero cross-compatibility without building the list of changing features into the implementation itself.
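The sem-ver compatibility rule sketched here can be expressed in a few lines of Go; the function and parameter names are illustrative, not from the spec:

```go
package main

import "fmt"

// compatible reports whether a config at version cMaj.cMin can be
// handled by a launcher at version lMaj.lMin under the sem-ver rule
// proposed above: the major versions must match, and the launcher's
// minor version must be at least the config's.
func compatible(cMaj, cMin, lMaj, lMin int) bool {
	return cMaj == lMaj && lMin >= cMin
}

func main() {
	fmt.Println(compatible(1, 3, 1, 3)) // same version: compatible
	fmt.Println(compatible(1, 3, 1, 4)) // newer launcher, same major: compatible
	fmt.Println(compatible(1, 3, 1, 2)) // older launcher: not compatible
	fmt.Println(compatible(1, 3, 2, 0)) // different major: not compatible
}
```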
In accordance with this style change: https://github.com/opencontainers/specs#markdown-style
I can do this right before we finish the first draft and most of the PRs are merged.
This issue has been created to continue the discussion in #65
The container will listen on the specified network ports at runtime for the purpose of interconnecting; as a result, I think the exposed ports are important for the container runtime. Is it necessary to add a PORTS field to the spec? Maybe it is appropriate to add a Port struct in runtime_config.go.
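For discussion, such a Port struct in runtime_config.go might look like the sketch below; every field name here is a suggestion, not something the spec defines:

```go
package main

import "fmt"

// Port is one guess at what an exposed-ports entry could look like.
type Port struct {
	Proto         string // "tcp" or "udp"
	ContainerPort int    // port the process listens on inside the container
	HostPort      int    // optional host port to map to (0 = unmapped)
}

func main() {
	ports := []Port{
		{Proto: "tcp", ContainerPort: 80, HostPort: 8080},
		{Proto: "udp", ContainerPort: 53},
	}
	for _, p := range ports {
		fmt.Printf("%s %d->%d\n", p.Proto, p.HostPort, p.ContainerPort)
	}
}
```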
Currently the spec states that the arch property in config.json should conform with the Go language spec. However, the Go spec only covers the following architectures: amd64, 386 and arm.
Since one of the goals of the Open Container Project is to support multiple architectures, and given the fact that, at least for ARM [1], it is very common to have different versions and different extensions depending on the SoC, the current list is insufficient.
Common architectures include armv5, armv6 and armv7a. The list could be further expanded by adding mips and powerpc. Another dimension that should be taken into consideration is whether or not the architecture features a floating-point unit, which will require architecture specifiers such as armv7ahf, armv6hf, mipsel, etc.
A good source of conventions can be found in the source code of the Open Embedded project [2]. Doing a quick grep for available architecture specifiers (AVAILTUNES) yields the following list:
aarch64, aarch64_be, arm1136jfs, arm920t, arm926ejs, arm9tdmi, armv4, armv4b, armv4t, armv4tb,
armv5, armv5b, armv5b-vfp, armv5e, armv5eb, armv5eb-vfp, armv5ehfb-vfp, armv5ehf-vfp, armv5e-vfp, armv5hfb-vfp,
armv5hf-vfp, armv5t, armv5tb, armv5tb-vfp, armv5te, armv5teb, armv5teb-vfp, armv5tehfb-vfp, armv5tehf-vfp, armv5te-vfp,
armv5thfb-vfp, armv5thf-vfp, armv5t-vfp, armv5-vfp, armv6, armv6b, armv6b-novfp, armv6hf, armv6hfb, armv6-novfp,
armv6t, armv6tb, armv6tb-novfp, armv6thf, armv6thfb, armv6t-novfp, armv7a, armv7ab, armv7ab-neon, armv7ahf,
armv7ahfb, armv7ahfb-neon, armv7ahf-neon, armv7ahf-neon-vfpv4, armv7a-neon, armv7at, armv7atb, armv7atb-neon, armv7athf, armv7athfb,
armv7athfb-neon, armv7athf-neon, armv7athf-neon-vfpv4, armv7at-neon, c3, core2-32, core2-64, core2-64-x32, corei7-32, corei7-64,
corei7-64-x32, cortexa15, cortexa15hf, cortexa15hf-neon, cortexa15-neon, cortexa15t, cortexa15thf, cortexa15thf-neon, cortexa15t-neon, cortexa5,
cortexa5hf, cortexa5hf-neon, cortexa5-neon, cortexa5t, cortexa5thf, cortexa5thf-neon, cortexa5t-neon, cortexa7, cortexa7hf, cortexa7hf-neon,
cortexa7hf-neon-vfpv4, cortexa7-neon, cortexa7t, cortexa7thf, cortexa7thf-neon, cortexa7thf-neon-vfpv4, cortexa7t-neon, cortexa8, cortexa8hf, cortexa8hf-neon,
cortexa8-neon, cortexa8t, cortexa8thf, cortexa8thf-neon, cortexa8t-neon, cortexa9, cortexa9hf, cortexa9hf-neon, cortexa9-neon, cortexa9t,
cortexa9thf, cortexa9thf-neon, cortexa9t-neon, cortexm1, cortexm3, cortexr4, ep9312, i586, i586-nlp-32, iwmmxt,
mips, mips32, mips32el, mips32el-nf, mips32-nf, mips32r2, mips32r2el, mips64, mips64el, mips64el-n32,
mips64el-nf, mips64el-nf-n32, mips64-n32, mips64-nf, mips64-nf-n32, mipsel, mipsel-nf, mips-nf, powerpc, powerpc64,
powerpc-nf, ppc476, ppc603e, ppc64e5500, ppc64e6500, ppc64p5, ppc64p6, ppc64p7, ppc7400, ppce300c2,
ppce300c3, ppce300c3-nf, ppce500, ppce500mc, ppce500v2, ppce5500, ppce6500, ppcp5, ppcp6, ppcp7,
sh3, sh3eb, sh4, sh4a, sh4aeb, sh4eb, strongarm, x86, x86-64, x86-64-x32,
xscale, xscale-be
The above list is not exhaustive of the variations each platform can have, but it should be a good start of available architectures. Something to note is that the Open Embedded project uses x86-64 instead of amd64. Currently all the built images in the Docker registry have their Arch field set to amd64, so it might be a sensible exception.
[1] https://en.wikipedia.org/wiki/List_of_instruction_sets#ARM
[2] https://github.com/openembedded/oe-core/tree/master/meta/conf/machine/include
We just tagged v0.1.1 yesterday, but I think the plan had been to make the opening tag v0.1.0 (which matches SemVer's first FAQ entry). In any event, #173 went with v0.1.0. It would be nice if the tag and the version.go constant matched each other ;), so I'd suggest bumping both to v0.1.2 to avoid any ambiguity.
We should wait until the first "draft" is done so we don't rewrite it all the time :)
Basic operation:
@stevvooe and I caught up in person about our digest discussion and the need to serialize file-system metadata. If you want to read my attempt it is found here: #5 (comment)
Problem: a rootfs for a container bundle sitting on-disk may not reflect the exact intended state of the bundle when it was copied to its current location. Possible causes might include: running on filesystems with varying levels of metadata support (nfs w/o xattrs), accidental property changes (chown -R), or purposeful changes (xattrs added to enforce local policies).
Obviously the files' contents will be identical, so that isn't a concern.
Solution: If we hope to create a stable digest of the bundle in the face of these likely scenarios, we should store the intended filesystem metadata in a file itself. This can be done in a variety of ways, and this issue is a place to discuss pros/cons. As a piece of prior art, @vbatts has implemented https://github.com/vbatts/tar-split, and we have the Linux package managers with tools to verify and restore filesystem metadata from a database, e.g. rpm -a --setperms and rpm -V.
Like #135 began to fix, this needs to track our movement from a Go-compiler-constrained schema to something like json-schema. At that point, we can have the provided *.go files (if at all) as a reference for valid structs for the schema.
From what I understood from the specs, a container targets a single platform. If so, wouldn't it be better to allow for different configurations depending on the platform, all packaged into one container?
In config.md:
We should clarify the readonlyRootfs flag. My interpretation is that when true, writes to that filesystem from within the container would result in copy-on-write, and when false they would actually modify the underlying filesystem. Is that an accurate understanding of the intent?
We need to standardize on how to store/retrieve the state of a running container. runc today allows one to specify an id during invocation and then stores the state of that container under /var/run/ocf/<id>.
I think we need to agree on a directory and the mechanism if there are any alternate suggestions/ideas.
Moving opencontainers/runc#2 here:
As discussed, we like the idea of having 2 different kinds of configuration in the manifest: 1) os-specific config (for example: create a new pid namespace, or set cgroup foo to bar, or drop CAP_SYSADMIN), and 2) os-independent config (for example: execute /bin/bash, or set environment DEBUG to 1).
Currently these 2 different kinds of configs are mixed in the manifest. We should segment them more clearly, to make more visible the tradeoff between control and portability, and to allow for new sections to be created for other OSes - Windows, Solaris, FreeBSD etc.
A config.json may specify a string "username" on Linux as the user to execute the process as. The trouble is we need to know what uid/gid this maps to inside of the container. Sadly this requires hacks:
- parse /etc/passwd in the container's filesystem (if it exists!)
- run getent passwd inside of the filesystem

This spec likely should make a recommendation on what needs to be done here and in what order.
golint, go tests, etc.:
- run go test ./... in CI
- run golint in CI

Collecting update requirements and use-cases for the working group.
We should be able to update a running container. There are a few simple updates like resources for vertical scaling of containers (memory, cpu) that should be allowed. Adding a volume is another possible use-case.
Do we split the config into immutable and mutable parts? The updates can be done out of band, but we would like the config to reflect reality of the running state.
Hopefully, we can collect more requirements and concerns before handing it over to a working group.
We have key=value pairs that are written to files under the /proc/sys directory in Linux, which are exposed as system properties in libcontainer. Does Windows have any such equivalent? I would imagine that registry settings might fit in there. If that is the case, then we could move system properties to the portable portion of the spec.
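For illustration, such key=value pairs could be expressed as a map in the portable config; the sysctl field name here is an assumption, not something the spec defines today:

```json
{
    "linux": {
        "sysctl": {
            "net.ipv4.ip_forward": "1",
            "kernel.msgmax": "65536"
        }
    }
}
```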
Add spec for developers - what they should implement for security. Example for Linux.
Goals:
Right now it is hard to find a list of features: you have to read source code, descriptions, and man pages of other software to get shards of information.
Hello.. attached are some comments / thoughts on the crypto aspect. I think the overall goal should probably be secure, flexible, avoid really opinionated decisions, but also be decisive enough to avoid implementation ambiguity.
The most sensible way forward on this part of the standard, I think, is to make this optional until it has been reviewed and is mostly supported. It might make sense here to use digests as a stepping stone to digital signatures.
-## 3 Cryptographic signatures
-NOTE: I know this is sounding crazy, but it just might work! The main problem is that this is very slow. Every file in the container's root filesystem must be read. It is, however, very flexible and quite portable. Some things to decide here:
-* How carefully do we specify the digest file? (bad, but accurate name)
Well that's easy... I think: just make the container digest manifest recommended but optional, like the .asc digital sig in Rocket for the time being; that way it doesn't break anything existing.
like...
container.img (container file / archive format for distribution)
container.sig (digital signature)
container.sha256 (sha digest)
There are a few options overall when speaking of signatures, hashes, etc., I'd think:
(1) Do Nothing
(2) Digest: SHA2 / SHA3 sum of container file (only provides very basic integrity , but low barrier to implementation)
(3) HMAC: (not really useful in this context , so this is probably out)
(4) Digital Signature: (this is the gpg style sigs like in Ubuntu PPA, Rkt, etc. Integrity + Authentication + Non-Repudiation)
I'd say at minimum you need SHA2/SHA3 digest of the container file, and probably very important to get to a digital signature in the standard sooner than later. But Digest could be a step along the way to Signatures... Anyway the digital signature should support open standards either way, which brings me to...
You've got two or three options here, I'd think, for the sigs. Basically you can put them in CMS (aka PKCS#7, RFC 5652) format, or in OpenPGP format (RFC 4880). Both formats are equally annoying.
I noticed TUF got mentioned as well, so I included that... Here is the link to the standard for TUF...
https://github.com/theupdateframework/tuf/blob/develop/docs/tuf-spec.txt
The CMS format is ASN.1 while OpenPGP is ASCII armor. The former is used in S/MIME, the latter in OpenPGP. Regarding TUF, I like how it uses JSON and I love how it's in Python. However, if you look at the TUF standard you can see (A) the project doesn't seem to be quite production-ready yet, though it is promising, and (B) the TUF standard does not have the million implementations that the former two have.
But I would definitely say pick one digital signature format, rather than allow users to implement all three. That'd be a mistake ... I think for digital signatures anyway, there's just so much that can go wrong already, best to keep it concise and clear. Honestly I'd vote for OpenPGP here.
So either way I suppose .... you're gonna have to use an external lib or tool in either case. Probably better just to make it like SHA256 or SHA512 and/or OpenPGP.
-To ensure that containers can be reliably transferred between implementations and machines, we define a flexible hashing and signature system that can be used to verify the unpacked content. The generation of signatures is separated into three different steps, known as "digest", “sign” and “verify”.
K... If you verify (digital signature and/or authenticated encryption) the packed content, you also verify the unpacked content. They are the same as far as I'm aware, at least for a given instant in time.
Meaning: If an attacker flips any one bit in the compressed container image, the uncompressed container will fail validation just the same as if the attacker flips any one bit in the uncompressed filesystem. So there only really needs to be one validation here I'd think, at the container level , but I certainly could be missing something ...
Unless we are talking about protection of data-at-rest or files-in-a-container-on-my-laptop-from-day-to-day
Okay, first, let's separate the two mentions above about digest, sign, verify... I like how they are separated into steps; I actually support that.
So first we separate what are Digests/Hashes from Digital Signatures.. Then we can separate them again based on whether they are at the Container Level (single container file) or at the internal filesystem level (within the container uncompressed filesystem).
I'd suggest going for the low hanging fruit and define these in regards to the container first, in light of the idea that a bit flip in either the container or the filesystem is one and the same . That's the really important part.
Adding filesystem-level hashes is probably a good idea, but might be a bit much at this point. I can already (maybe?) see issues arising with OverlayFS etc. I wonder whether, if you've got multiple layers, there might be a bit of interpretation about when to calculate the changes, how to perform the merge, whiteout files, etc., in regard to sigs and digests.
On top of that .... as much as I like the digest-of-all-files idea personally (having hashes/digests and/or digital sigs of all files in the container filesystem)... this seems like it's almost a re-implementation of something like Tripwire.
-The purpose of the "digest" step is to create a stable summary of the content, invariant to irrelevant changes yet strong enough to avoid tampering. The algorithm for the digest is defined by an executable file, named "digest", directly in the container directory. If such a file is present, it can be run with the container path as the first argument:
Well , running a digest/hash AND/OR digital signature on a compressed container image is the sort of critical juncture. Meaning: If I get owned... it's gonna be when I download and run a container image from a host somewhere in the Russian Business Network autonomous system , heh.
Not when I start the same container for the second time in a day.
Anyway, the point at which I first run the container... that's (I think) where malicious code is most likely to enter my system, my container network, etc -- when I grab a container image someone made from a repository.
If I download that image and it does no harm, and then a week later I create a file on the filesystem... I don't see how that's the same risk as when I first obtain the container archive... So again, I like the idea of having a digest of the files in the container... but...
There are just a lot of questions with maintaining a list of hashes within the container's filesystem... I suppose it wouldn't hurt to make it optional. Anyway, with a digest/signature of the packed container, it's pretty straightforward to unpack and verify, and it's also straightforward for me to repack and/or sign if I want to share the image.
Maintaining the digest manifest of some or all container files is not a bad idea at all, but it's going to be slow in the event of a large filesystem, especially considering we can't use SHA1 anymore, so we have to use SHA2 or SHA3, and that could be painful with really large files...
Another issue I'm wondering about is at what point we overwrite the 'digest manifest', to say "Yes, I modified some file like /etc/blah.conf, and I want to update its SHA2 digest AND its digital signature"... It just seems like it could create a lot of headache for users, but not really offer much in terms of security... The standard would have to define the conditions under which to overwrite the digest manifest with user approval...
Anyway, so I suppose what I'd say is that I'd be opposed to a digest of the entire container filesystem on a file-by-file basis... it might make sense to include a digest of certain key directories and binaries, but even still , the concern I'd have there is "How do you present to the user such that they are warned about filesystem changes , but they can also accept or reject them as line-items?" That's really what integrity would mean here, and it'd be way granular and in the weeds.
As opposed to just 'container matches' or 'container does not match' like how OpenSSH works with signatures.
I mean, maintaining user-friendliness is not the responsibility of this standard, but it just seems like it'd be creating a nightmare scenario up the stack in terms of repackaging a container back into an image, or warning a user about filesystem changes one by one, etc. I think if a file-level digest is included, it's got to be optional or else really restricted by default. But I just think it's not any additional security, not in the context of what a digest does... and that's provide integrity in the event of malicious 'bit-flip' type attacks...
The best bet in my mind is probably a layered approach... perhaps requiring a Digest/Hash (mandatory mostly) of the final container image.. but auto-generated and super idiot proof ... and then ALSO making an OpenPGP Digital Signature of the final container image optional but recommended.
I think that approach gives a strong compromise between implementation headache, usability, simplicity, etc.
-$ $CONTAINER_PATH/digest $CONTAINER_PATH
-The nature of this executable is not important other than that it should run on a variety of systems with minimal dependencies. Typically, this can be a bourne shell script. The output of the script is left to the implementation but it is recommended that the output adhere to the following properties:
-* The script itself should be included in the output in some way to avoid tampering
This is a good idea and important, but gotta remember that a digest will only ensure integrity in relation to some external trust anchor, which might even be malicious. We really need digital signatures to really prevent tampering... because otherwise how do we know we haven't gotten a fake SHA sum?
Really we need to start with digest, but make digital signatures recommended but not required... this way we can achieve authentication + integrity.
Sure, this makes sense as far as making a Tripwire-like feature for containers, which I think is cool, and I wouldn't complain personally if it was in there... But I sure wouldn't want to implement this, nor would I want to wait for my slow container to load because it's running filesystem checksums that are probably redundant... I'm just not 100% sure filesystem-level digests are necessary if you add a SHA2/SHA3 digest and digital sigs of the container archive file...
-* The output must be stable
I think this might be a challenge... heh... but I suppose if you limit the directories we are adding to the digest, this could work...
Again, I don't see what benefit this approach has over simply just calculating a digest and digital sig over a squashfs compressed container... People probably are not going to be distributing 'loose-files' (aka non-tgz) uncompressed containers for the most part I'd think, so the security threat inherent in distribution is the main problem...
Data-at-rest can be secured using all the existing solutions already available , ala Tripwire, Snort, DM-Crypt, TrueCrypt, etc.
This all looks good below besides just the main objection... I definitely agree with the use of GPG / OpenPGP. X509 might not be an appropriate choice unless we all want to ride on top of the crappy CA infrastructure. I definitely do not want to do that , heh. . X509 kind of brings with it the rot of OpenSSL I'd fear.
I'd just say use GPG / OpenPGP keys, that's a tested and proven solution without the bloat of OpenSSL.
-To sign the digest, we pipe the output to a cryptography tool. We can demonstrate the concept with gpg:
-$ $CONTAINER_PATH/digest $CONTAINER_PATH | gpg --sign --detach-sign --armor > $CONTAINER_PATH/signatures/gpg/signature.asc
-Following from the gpg example:
-$ $CONTAINER_PATH/digest $CONTAINER_PATH | gpg --verify $CONTAINER_PATH/signatures/gpg/signature.asc -
The rest of this all looks good to me; thanks for reading my comments
Hello,
As I mentioned in another issue, I am packaging opencontainers/specs for Debian.
We are going to upload a recent commit soon as golang-github-opencontainers-specs 0.0~git20150829.0.e9cb564-1, but we face the issue that runc cannot build with the tip of opencontainers/specs.
Newer versions of Docker also depend on runc, so this has a great impact for us. The fact that runc needs Docker to build puts us in a circular dependencies situation, versioning would help us find stable points where everything builds together.
How far are we from this repository being versioned? This would help us a great deal with the packaging.
There is a nascent concept of security profiles that needs to be tackled. I think we should remove it from the spec as it is a little too ill-defined at the moment.
The big idea is that a system can have a collection of "high-level" security profile options that a user can apply to their container. For example:
These profiles would map to low-level details like seccomp filters, selinux profiles, apparmor profiles, etc. The challenge for the spec is to ensure that we define the "merge" operation from the restrictions that an image defines for itself and what the policy it is going to run under defines.
When I read the Linux example about additional filesystems, I immediately wonder if it's really useful and advisable to mix the well-known fstab format with field names. This way the "mounts" configuration object becomes harder to read for human beings, or at least for me.
I would prefer a configuration that looks more like fstab itself:
"mounts": [
"proc /proc proc",
"tmpfs /dev tmpfs nosuid,strictatime,mode=755,size=65536k",
"devpts /dev/pts devpts nosuid,strictatime,mode=755,size=65536k",
"tmpfs /dev/shm shm nosuid,noexec,nodev,mode=1777,size=65536k"
]
This form of filesystem setup is well known and hasn't changed for years. Defining additional mounts could be an option and might be mixed with the object structure you propose. But completely ignoring the form of the old fstab line format looks to me like a bug.
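To show that the two forms carry the same information, here is a sketch that parses one of the fstab-style strings above into the object form; the struct and function names are illustrative, not taken from the spec:

```go
package main

import (
	"fmt"
	"strings"
)

// Mount mirrors the object form of a "mounts" entry.
type Mount struct {
	Source      string
	Destination string
	Type        string
	Options     []string
}

// parseFstabLine converts an fstab-style "source destination type options"
// string into the object form.
func parseFstabLine(line string) (Mount, error) {
	f := strings.Fields(line)
	if len(f) < 3 {
		return Mount{}, fmt.Errorf("fstab line needs at least source, destination, type: %q", line)
	}
	m := Mount{Source: f[0], Destination: f[1], Type: f[2]}
	if len(f) > 3 {
		m.Options = strings.Split(f[3], ",")
	}
	return m, nil
}

func main() {
	m, err := parseFstabLine("tmpfs /dev tmpfs nosuid,strictatime,mode=755,size=65536k")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s on %s (%s) options=%v\n", m.Source, m.Destination, m.Type, m.Options)
}
```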
The spec should have testing tools for validating the config and testing that an implementation provides the expected environment for the container process. This will involve launching a container process from a bundle that will introspect to make sure that the runtime set it up correctly.
In my development of the OCT project, I find that there are some required config items in the spec. For example, if we want to run a Linux container, the config should at least contain the mount config for the proc filesystem; the other mounted filesystems are not strictly necessary:
"mounts": [
{
"type": "proc",
"source": "proc",
"destination": "/proc",
"options": ""
}
]
So, I am considering whether we should provide a minimum config example, or add some explanation noting which entries are necessary.
I didn't see any way to know the byte order (endian) of the container image other than inspecting binaries in the image. Byte order info should be in the config file so that a system can determine compatibility with only the config, and not need to download the entire container image when testing.
Maybe this is a shortcoming of using GOARCH as the set of values for platform:arch.
Right now the spec says you need to specify an OS-relevant user id for the process to exec on behalf of. Many people don't think about this low-level primitive and rely on user databases like /etc/passwd. In order to support a user saying apache in the open container configuration we would need to do hacks on Linux:
- parse /etc/passwd in the container's filesystem (if it exists!)
- run getent passwd inside of the filesystem

This spec likely should make a recommendation on what needs to be done here and in what order if we are to support a "username".
This issue replaces #10 and is being refiled since we made a decision to be more explicit and simple for the initial draft milestone.
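A minimal sketch of the first hack, resolving a username by parsing passwd-format data from the container's filesystem; the helper name is illustrative, and it deliberately ignores the NSS sources that getent would consult:

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// lookupUser resolves a username to uid/gid by scanning passwd-format
// lines (name:passwd:uid:gid:...), as a runtime would have to do
// against the container's own /etc/passwd.
func lookupUser(passwd, name string) (uid, gid int, err error) {
	sc := bufio.NewScanner(strings.NewReader(passwd))
	for sc.Scan() {
		f := strings.Split(sc.Text(), ":")
		if len(f) >= 4 && f[0] == name {
			uid, _ = strconv.Atoi(f[2])
			gid, _ = strconv.Atoi(f[3])
			return uid, gid, nil
		}
	}
	return 0, 0, fmt.Errorf("user %q not found", name)
}

func main() {
	passwd := "root:x:0:0:root:/root:/bin/bash\napache:x:48:48::/usr/share/httpd:/sbin/nologin\n"
	uid, gid, _ := lookupUser(passwd, "apache")
	fmt.Println(uid, gid) // 48 48
}
```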
Should there be a recommendation or a requirement in the spec that the container's own cgroups are mounted to a particular location read-only for introspection? This is useful for use cases like JVM tuning without having to resort to environment variables.
Should joining an existing mount namespace (joining-to-mnt-ns) change the existing mount info, or should the mounts in the spec be ignored?
"mounts": {
"data": {
"type": "bind",
"source": "/data",
"options": ["rw"]
}
...
}
"linux": {
"namespaces" : [
{
"type": "mount",
"path": "/proc/1234/ns/pid"
}
...
]
...
}
Open up new discussion from #56
Rootfs mount propagation deserves a new spec to set rootfs's mount propagation to slave, private, or shared.
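A sketch of what such an option could look like in the config; the rootfsPropagation field name is an assumption for illustration, not something the spec defines:

```json
{
    "linux": {
        "rootfsPropagation": "slave"
    }
}
```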
The spec allows specifying namespace fds via paths and that feature allows one container to share the namespaces of another container. Another feature to make that more useful is allowing a container to join the cgroups of another container using its pid. This will make it possible for higher level tooling to add nsinit/nsenter like functionality.
@crosbymichael @philips WDYT?
As of right now, it doesn't appear that the spec contains any notion of dependencies/layers; that is, something akin to the base image (FROM) in Docker, or dependencies in ACI. Is this something that will be added to the spec eventually?
Linux applications rely on a number of devices and filesystems. Let's define a default set; what do people think of this set lifted from the appc OS-SPEC?
The following devices and filesystems MUST be made available in each application's filesystem
Path | Type | Notes |
---|---|---|
/proc | procfs | |
/sys | sysfs | |
/dev/null | device | |
/dev/zero | device | |
/dev/full | device | |
/dev/random | device | |
/dev/urandom | device | |
/dev/tty | device | |
/dev/console | device | |
/dev/pts | devpts | |
/dev/ptmx | device | Bind-mount or symlink of /dev/pts/ptmx |
/dev/shm | tmpfs |
Propagation modes determine how mount and umount events propagate between a mount namespace and its parent. The 'shared' and 'slave' propagation modes are critical to implementing use-cases where a container performs a mount that should be visible to other containers.
Currently the configuration spec lacks any way to specify the propagation mode of a mnt namespace relative to the host's mount namespace, or any indication of what the default propagation mode is. Perhaps this should be an option you can specify in the 'namespaces' config section.
I would like to discuss ability to attach application-specific meta data to container images. Here are a couple of examples that come to mind:
For example, the app container spec addresses this use-case by introducing labels:
labels (list of objects, optional) used during image discovery and dependency resolution. The listed objects must have two key-value pairs: name is restricted to the AC Identifier formatting and value is an arbitrary string. Label names must be unique within the list, and (to avoid confusion with the image's name) cannot be "name". Several well-known labels are defined:
version when combined with "name", this SHOULD be unique for every build of an app (on a given "os"/"arch" combination).
os, arch can together be considered to describe the syscall ABI this image requires. arch is meaningful only if os is provided. If one or both values are not provided, the image is assumed to be OS- and/or architecture-independent. Currently supported combinations are listed in the types.ValidOSArch variable, which can be updated by an implementation that supports other combinations. The combinations whitelisted by default are (in format os/arch): linux/amd64, linux/i386, freebsd/amd64, freebsd/i386, freebsd/arm, darwin/x86_64, darwin/i386. See the Operating System spec for the environment apps can expect to run in given a known os label.
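Rendered as JSON, the ACI labels described above look roughly like this (the values are illustrative):

```json
"labels": [
    { "name": "version", "value": "1.0.0" },
    { "name": "os", "value": "linux" },
    { "name": "arch", "value": "amd64" }
]
```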
After trying to use labels in practice, I can say that ACI's labels have the following limitations:
The useful part about labels is namespacing, which helps to identify the purpose of meta data and avoid collisions.
It would be very helpful to include the mechanism of adding meta data to the open container images that addresses some of the limitations listed above.
Hello!
I am packaging opencontainers/specs for Debian and I would like to know who the copyright owner is.
It would be a good idea to mention it somewhere.
I noticed that Michael Crosby [email protected] was the one to commit the license.
See: 84707b0
Is he the copyright owner?
//cc @crosbymichael
Thanks,
Capturing requirements for lifecycle hooks discussed during the summit.
Lifecycle hooks were agreed to be a useful feature, and support for a few basic hooks is expected to be in place. Specifically, PreStart, PreStop, and PostStop hooks will be introduced to start with. Supporting hooks might require separating the lifetime of namespaces from that of the processes in the container.
Hooks will be binaries that will be exec'ed. The Spec and current State of the container will be provided as arguments to the hook (via fds?). Are the hooks executed synchronously?
This issue is meant to define the spec for hooks and to discuss the hooks that are necessary to begin with.
Now that the initial draft of the OCI specs has been released, I want to get an oci-bundle to study and transform it to another container format.
However, I found that there is no complete oci-bundle example to use, and I had to cobble together an oci-bundle directory by scanning the opencontainers/specs project.
So I think a complete oci-bundle example is needed. An oci-bundle example would accompany the OCI specs, and its aim is the convenience of users.
A complete oci-bundle example is something like this:
https://github.com/huawei-openlab/oci2aci/tree/master/example/oci-bundle
With #167, the merge commit itself failed validation, so the master build failed: https://travis-ci.org/opencontainers/specs/jobs/79590679
The command "$HOME/gopath/bin/golint ./..." exited with 0.
0.33s$ go run .tools/validate.go -range ${TRAVIS_COMMIT_RANGE}
* 2d9842b Merge pull request #167 from vbatts/validate-dco ... FAIL
- does not have a valid DCO
* 8b55acf .tools: repo validation tool ... PASS
1 issues to fix
exit status 1
The command "go run .tools/validate.go -range ${TRAVIS_COMMIT_RANGE}" exited with 1.
This spins off a more tightly-scoped version of #114. The oom_score_adj is more of a host-side and/or multi-container-orchestration issue, and less of a bundle issue, which means it probably should be in runtime.json if/once #88 lands. Possible approaches for setting this include:
(1) having the runtime write /proc/<pid>/oom_score_adj, which would need a config-side setting;
(2) having a hook write /proc/<pid>/oom_score_adj, which would require a hook with sufficient permissions for the write.
A number of attributes where you could use (2) currently have explicit, (1)-style configs or are handled via hooks (e.g. setting up networking and creating cgroups and namespaces). I'd guess the balance involves “how easy is it to handle without (1)?” and “how frequently will folks be tweaking this attribute?”, with high-cost or high-frequency attributes being handled via (1). So which way do we think makes the most sense for this particular setting?
Is it easy to handle via (2) or (3)? It seems like (2) would be easy assuming sufficient hook permissions, but (3) is probably too annoying to be worth the trouble.
How frequently do we expect folks will use this? I can't weigh in here, since I haven't set this. And I expect most runtime managers that set this will be doing it automatically, so in that case it's a wash between (1) and (2) for difficulty.
If those assumptions are correct, then I think we should go with (2), since that is the least work on the spec/runtime-implementation side. If nobody chimes in with anti-(2) thoughts in the next few days, I'll merge opencontainers/runc#160 locally and see whether I can get it working ;).
The current version of the specification proposes a signature system based on
a verifiable executable, allowing agility in the calculation of cryptographic
content digests. A more stable approach would be to define a specific
algorithm for walking the container directory tree and calculating a digest.
We need to compare and contrast these approaches and identify one that can
meet the requirements.
The goal of this issue is to identify the full benefits of this approach and
to decide on the level of flexibility we should provide in the specification. Such a
calculation would involve content in the container root, including the
filesystem and configuration.
Let's review the features we get from digesting a container:
We need to consider the following properties of any approach to achieve these goals:
We can take the above to define specific requirements for the digest:
The specification currently proposes the following approach, which gives
containers a common "script" location through which to provide a digest. It is
included here for reference.
The purpose of the "digest" step is to create a stable summary of the
content, invariant to irrelevant changes yet strong enough to resist tampering.
The algorithm for the digest is defined by an executable file, named “digest”,
directly in the container directory. If such a file is present, it can be run
with the container path as the first argument:
$ $CONTAINER_PATH/digest $CONTAINER_PATH
The nature of this executable is not important other than that it should run
on a variety of systems with minimal dependencies. Typically, this can be a
Bourne shell script. The output of the script is left to the implementation,
but it is recommended that the output adhere to the following properties:
The following is a naive example:
#!/usr/bin/env bash
set -e

# Emit content for building a hash of the container filesystem.
content() {
	root=$1
	if [ -z "$root" ]; then
		echo "must specify root" 1>&2
		exit 1
	fi

	cd "$root"

	# Emit each file's content hash and path, in a stable order.
	find . -type f -not -path './signatures/*' -exec shasum -a 256 {} \; | sort

	# Emit the script itself to prevent tampering.
	cat "$scriptpath"
}

scriptpath=$(cd "$(dirname "$0")" && pwd -P)/$(basename "$0")

content "$1" | shasum -a 256
The above is still pretty naive. It does not include permissions and users and
other important aspects. This is just a demo. Part of the specification
process would be producing a rock-solid, standard version of this script. It
can be updated at any time and containers can use different versions depending
on the use case.
Let's use this issue to decide the following:
At this early stage the specs say that "The goal of a Standard Container is to encapsulate a software component and all its dependencies in a format that is self-describing and portable."
My interpretation of this passage is that there will be no inter-container dependencies (for Standard Containers). With this issue I would like to ask for an explanation of the current state and bring the matter up for discussion starting with my personal opinion.
I think that container dependencies are a valuable feature that can significantly increase resource efficiency across all processes/states of the container lifecycle. They should be taken into account from the earliest stage possible.
The docs don't reflect things we discussed in-person:
@crosbymichael volunteered for this one I think.
The github organization should be renamed "Open Container Initiative" since:
"We originally named this effort “Open Container Project,” or OCP. Given the potential for confusion with the awesome Open Compute Project, we have renamed this the Open Container Initiative, or OCI."
http://blog.docker.com/2015/07/open-container-format-progress-report/
Link to settings for maintainer convenience: https://github.com/organizations/opencontainers/settings/profile
For supporting things like hooks and updates, we need a way to define the runtime state of the container. This can be used to pass information to hooks so that they know the runtime state of the container, and also to dynamically make changes to resource allocations by running an update command for the runtime and having it reconcile the requested changes against the container's current state.
For the hooks use case, the state should contain information such as:
For supporting updates, the state should contain information such as:
Some of this information would be duplicated from the initial container's config, so it may be worth looking into embedding the original config into the state structure.