
atomic-system-containers's People

Contributors

ashcrow, baude, clcollins, dav1x, edsantiago, gbraad, giuseppe, jasonbrooks, jlebon, langemar, rhatdan, rochaporto, runcom, strigazi, tomastomecek, vinzenz, vrutkovs, yuqi-zhang


atomic-system-containers's Issues

Add jasonbrooks

I'd like to propose we add @jasonbrooks to this repo. He maintains a few of the containers here, and takes and fixes issues as well.

kubeadm & weave: No networks found in /etc/cni/net.d

Hi,

I'm trying to set up k8s 1.8.x on a Fedora 27 host with the kubeadm system container:

setenforce 0
atomic install --system --system-package=no --name kubelet jasonbrooks/kubeadm
systemctl daemon-reload
systemd-tmpfiles --create /etc/tmpfiles.d/kubelet.conf
systemctl enable kubelet
kubeadm init --skip-preflight-checks

export kubever=$(kubectl version | base64 | tr -d '\n')
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"

After that I'm seeing the following error messages:

Nov 20 11:47:43 fedora-atomic-host runc[3522]: W1120 10:47:43.656415    3532 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Nov 20 11:47:43 fedora-atomic-host runc[3522]: E1120 10:47:43.656627    3532 kubelet.go:2095] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Nov 20 11:47:48 fedora-atomic-host runc[3522]: W1120 10:47:48.658418    3532 cni.go:196] Unable to update cni config: No networks found in /etc/cni/net.d
Nov 20 11:47:48 fedora-atomic-host runc[3522]: E1120 10:47:48.658588    3532 kubelet.go:2095] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Nov 20 11:47:48 fedora-atomic-host dockerd-current[1294]: 2017-11-20 10:47:48.904677 W | etcdserver: apply entries took too long [10.1646ms for 1 entries]
Nov 20 11:47:48 fedora-atomic-host dockerd-current[1294]: 2017-11-20 10:47:48.904714 W | etcdserver: avoid queries with large range/delete range! 

kubectl shows that "po/kube-controller-manager-fedora-atomic-host" is stuck in the "ContainerCreating" state.

Does anybody have an idea what's wrong?
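
For context: the kubelet logs this warning until a network plugin writes a config file into /etc/cni/net.d, which the weave add-on only does once its pods are actually running. Purely as an illustration of what the kubelet is looking for (hypothetical values, not weave's actual file), a minimal CNI config in that directory looks like:

{
    "cniVersion": "0.3.0",
    "name": "examplenet",
    "type": "bridge",
    "bridge": "cni0",
    "ipam": {
        "type": "host-local",
        "subnet": "10.22.0.0/16"
    }
}

So the errors above most likely mean the weave pods never got far enough to write their config, which matches the pod being stuck in ContainerCreating.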

Runtime spec changed, config.json files are not usable anymore by runc

I tried to use the ovirt-guest-agent-centos system container (thanks @giuseppe). Install works fine (I followed https://www.projectatomic.io/blog/2016/09/intro-to-system-containers/).
But the agent fails to run. When I run it by hand to find out why, runc gives a "json: cannot unmarshal array into Go struct field Process.capabilities of type specs.LinuxCapabilities" error.

Per https://github.com/opencontainers/runtime-spec/blob/master/config.md, the runtime spec changed and capabilities now have to look like:

"capabilities": {
        "bounding": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
        ],
        "effective": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
        ],
        "inheritable": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
        ],
        "permitted": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
        ],
        "ambient": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
        ]
},
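
For comparison, the older spec (which the generated config.json files still follow) expressed capabilities as a single flat array; this is the array that runc can no longer unmarshal into the new LinuxCapabilities struct:

"capabilities": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
],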

Tested on CentOS Atomic host:

bash-4.2# atomic host status
State: idle
Deployments:
● centos-atomic-host:centos-atomic-host/7/x86_64/standard
                Version: 7.1708 (2017-09-15 15:32:30)
                 Commit: 33b4f0442242a06096ffeffadcd9655905a41fbd11f36cd6f33ee0d974fdb2a8
           GPGSignature: 1 signature
                         Signature made Fri 15 Sep 2017 19:17:39 CEST using RSA key ID F17E745691BA8335
                         Good signature from "CentOS Atomic SIG <[email protected]>"

  centos-atomic-host:centos-atomic-host/7/x86_64/standard
                Version: 7.1705.1 (2017-06-20 18:46:11)
                 Commit: 550d13d3e1fda491afab6d368adc13e307512ea51734055d83f0cc4d9049e91d
           GPGSignature: 1 signature
                         Signature made Tue 20 Jun 2017 20:47:45 CEST using RSA key ID F17E745691BA8335
                         Good signature from "CentOS Atomic SIG <[email protected]>"
bash-4.2# cat /etc/systemd/system/ovirt-guest-agent.service 
[Unit]
Description=oVirt Guest Agent Container

[Service]
ExecStart=/bin/runc --systemd-cgroup run 'ovirt-guest-agent'
ExecStop=/bin/runc --systemd-cgroup kill 'ovirt-guest-agent'
Restart=on-failure
WorkingDirectory=/sysroot/ostree/deploy/centos-atomic-host/var/lib/containers/atomic/ovirt-guest-agent.0

[Install]
WantedBy=multi-user.target


bash-4.2# cd /sysroot/ostree/deploy/centos-atomic-host/var/lib/containers/atomic/ovirt-guest-agent.0
bash-4.2# /bin/runc --systemd-cgroup run 'ovirt-guest-agent'                                                                                                                                                                                  
json: cannot unmarshal array into Go struct field Process.capabilities of type specs.LinuxCapabilities
bash-4.2# runc --version
runc version 1.0.0-rc3
commit: c1e53b52d62435f0114b98117c5957d182cdd67f-dirty
spec: 1.0.0-rc5

atomic uninstall flannel stops docker service

Running atomic uninstall flannel stops the docker service. The expected behavior is that docker is restarted after the uninstall completes.

$ sudo atomic install --system --name=flannel gscrivano/flannel
Extracting to /var/lib/containers/atomic/flannel.0
systemctl daemon-reload
systemd-tmpfiles --create /etc/tmpfiles.d/flannel.conf
systemctl enable flannel
$ sudo systemctl start flannel
$ sudo systemctl status flannel
● flannel.service - Flanneld overlay address etcd agent
Loaded: loaded (/etc/systemd/system/flannel.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2017-01-09 16:55:04 UTC; 3s ago
Process: 16527 ExecStartPost=/usr/bin/sh -c while test ! -s /run/flannel/docker; do sleep 0.1; done (code=exited, status=0/SUCCESS)
Main PID: 16526 (runc)
CGroup: /system.slice/flannel.service
└─16526 /bin/runc run flannel

Jan 09 16:55:04 cloud-test-8.localdomain systemd[1]: Starting Flanneld overlay address etcd agent...
Jan 09 16:55:04 cloud-test-8.localdomain runc[16526]: I0109 16:55:04.725491 16542 main.go:275] Installing si...lers
Jan 09 16:55:04 cloud-test-8.localdomain runc[16526]: I0109 16:55:04.725742 16542 main.go:130] Determining I...face
Jan 09 16:55:04 cloud-test-8.localdomain runc[16526]: I0109 16:55:04.727324 16542 main.go:188] Using 192.168...face
Jan 09 16:55:04 cloud-test-8.localdomain runc[16526]: I0109 16:55:04.727348 16542 main.go:189] Using 192.168...oint
Jan 09 16:55:04 cloud-test-8.localdomain runc[16526]: I0109 16:55:04.730836 16542 etcd.go:129] Found lease (...sing
Jan 09 16:55:04 cloud-test-8.localdomain runc[16526]: I0109 16:55:04.731628 16542 etcd.go:84] Subnet lease a...0/24
Jan 09 16:55:04 cloud-test-8.localdomain runc[16526]: I0109 16:55:04.734812 16542 udp.go:222] Watching for n...ases
Jan 09 16:55:04 cloud-test-8.localdomain systemd[1]: Started Flanneld overlay address etcd agent.
Hint: Some lines were ellipsized, use -l to show in full.
$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2017-01-09 16:54:39 UTC; 33s ago
Docs: http://docs.docker.com
Main PID: 16392 (docker-current)
CGroup: /system.slice/docker.service
└─16392 /usr/bin/docker-current daemon --exec-opt native.cgroupdriver=systemd --selinux-enabled --log-d...

Jan 09 16:54:39 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:39.364785435Z" level=inf...t."
Jan 09 16:54:39 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:39.364836940Z" level=inf...e."
Jan 09 16:54:39 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:39.364851837Z" level=inf...on"
Jan 09 16:54:39 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:39.364868484Z" level=inf...0.3
Jan 09 16:54:39 cloud-test-8.localdomain systemd[1]: Started Docker Application Container Engine.
Jan 09 16:54:39 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:39.372550057Z" level=inf...ck"
Jan 09 16:54:50 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:50.510647173Z" level=err...el"
Jan 09 16:54:50 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:50.531907643Z" level=err...el"
Jan 09 16:54:50 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:50.574968422Z" level=inf...0}"
Jan 09 16:54:59 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:59.318968814Z" level=err...el"
Warning: docker.service changed on disk. Run 'systemctl daemon-reload' to reload units.
Hint: Some lines were ellipsized, use -l to show in full.
$ sudo systemctl daemon-reload
$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/docker.service.d
└─flannel.conf
Active: active (running) since Mon 2017-01-09 16:54:39 UTC; 42s ago
Docs: http://docs.docker.com
Main PID: 16392 (docker-current)
CGroup: /system.slice/docker.service
└─16392 /usr/bin/docker-current daemon --exec-opt native.cgroupdriver=systemd --selinux-enabled --log-d...

Jan 09 16:54:39 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:39.364785435Z" level=inf...t."
Jan 09 16:54:39 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:39.364836940Z" level=inf...e."
Jan 09 16:54:39 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:39.364851837Z" level=inf...on"
Jan 09 16:54:39 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:39.364868484Z" level=inf...0.3
Jan 09 16:54:39 cloud-test-8.localdomain systemd[1]: Started Docker Application Container Engine.
Jan 09 16:54:39 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:39.372550057Z" level=inf...ck"
Jan 09 16:54:50 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:50.510647173Z" level=err...el"
Jan 09 16:54:50 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:50.531907643Z" level=err...el"
Jan 09 16:54:50 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:50.574968422Z" level=inf...0}"
Jan 09 16:54:59 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:59.318968814Z" level=err...el"
Hint: Some lines were ellipsized, use -l to show in full.
$ sudo atomic uninstall flannel
systemctl stop flannel
systemctl disable flannel
systemd-tmpfiles --remove /etc/tmpfiles.d/flannel.conf
$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Mon 2017-01-09 16:55:29 UTC; 3s ago
Docs: http://docs.docker.com
Main PID: 16392 (code=exited, status=0/SUCCESS)

Jan 09 16:54:39 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:39.364868484Z" level=inf...0.3
Jan 09 16:54:39 cloud-test-8.localdomain systemd[1]: Started Docker Application Container Engine.
Jan 09 16:54:39 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:39.372550057Z" level=inf...ck"
Jan 09 16:54:50 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:50.510647173Z" level=err...el"
Jan 09 16:54:50 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:50.531907643Z" level=err...el"
Jan 09 16:54:50 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:50.574968422Z" level=inf...0}"
Jan 09 16:54:59 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:54:59.318968814Z" level=err...el"
Jan 09 16:55:29 cloud-test-8.localdomain systemd[1]: Stopping Docker Application Container Engine...
Jan 09 16:55:29 cloud-test-8.localdomain docker-current[16392]: time="2017-01-09T16:55:29.578279483Z" level=inf...d'"
Jan 09 16:55:29 cloud-test-8.localdomain systemd[1]: Stopped Docker Application Container Engine.
Hint: Some lines were ellipsized, use -l to show in full.
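
The status output above shows why docker is coupled to flannel at all: install drops a drop-in at /etc/systemd/system/docker.service.d/flannel.conf and uninstall removes it. Assuming the drop-in follows the usual flannel/docker integration, its content is roughly:

# /etc/systemd/system/docker.service.d/flannel.conf (assumed content)
[Service]
EnvironmentFile=-/run/flannel/docker

Since removing a drop-in only takes effect after a reload, a plausible fix (a sketch, not the current behavior) would be for the uninstall path to run systemctl daemon-reload followed by systemctl try-restart docker, instead of leaving docker stopped.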

flannel container not working w/ vxlan backend

I'm able to get the flannel system container working w/ udp, but not with the vxlan backend. Perhaps we need something added to the config.json.template for this?

Here's what I'm doing:

node 1:

# atomic install --system gscrivano/etcd
# atomic install --system gscrivano/flannel
# systemctl start etcd
# runc exec etcd etcdctl set /atomic.io/network/config '{"Network":"10.40.0.0/16", "SubnetLen": 24, "Backend": { "Type": "vxlan" } }'
# systemctl start flannel

node 2:

# atomic install --system --set FLANNELD_ETCD_ENDPOINTS=http://10.10.171.92:2379 gscrivano/flannel
# systemctl start flannel

Then I try to ping the flannel0 interface of node 1 from node 2 (or vice versa), and the ping fails. With the udp backend, it works.
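
Two things worth checking (generic vxlan diagnostics, not specific to this container): the vxlan kernel module must be loadable on both nodes, and flannel's vxlan backend carries traffic over UDP port 8472, which must be open between the nodes:

# verify the vxlan module loads on each node
modprobe vxlan && lsmod | grep vxlan

# once flannel is up, the vxlan backend should have created a flannel.1 device
ip -d link show flannel.1

Note also that with the vxlan backend the overlay device is flannel.1 rather than flannel0, so pinging flannel0 would fail even on a healthy setup.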

cc @giuseppe

kube-dns

I am using the Fedora-Atomic-25 image for deploying Kubernetes through OpenStack Magnum. It doesn't run kube-dns by default, which, as far as I know, is by now a standard feature of any Kubernetes cluster.

I noticed that Fedora-Atomic-25 ships Kubernetes 1.5, which is a little old compared to recent releases.

How do I run kube-dns, then? Is there a newer image that runs Kubernetes 1.7+?

Services fail because the service script lacks the executable bit

The following scripts lack the x-bit (mode 644):

  • /var/lib/containers/atomic/kube-scheduler/rootfs/usr/bin/kube-scheduler-docker.sh
  • /var/lib/containers/atomic/kube-controller-manager/rootfs/usr/bin/kube-controller-manager-docker.sh
  • /var/lib/containers/atomic/kubelet/rootfs/usr/bin/kubelet-docker.sh
  • /var/lib/containers/atomic/kube-proxy/rootfs/usr/bin/kube-proxy-docker.sh
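
Until the images are rebuilt with the correct mode, a workaround (a sketch based on the paths listed above) is to set the bit on each checked-out rootfs:

for f in kube-scheduler kube-controller-manager kubelet kube-proxy; do
    chmod +x /var/lib/containers/atomic/$f/rootfs/usr/bin/$f-docker.sh
done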

Images

Images on master:

sudo atomic images list --filter type=ostree

   REPOSITORY                                                     TAG      IMAGE ID       CREATED            VIRTUAL SIZE   TYPE
   registry.fedoraproject.org/f27/kubernetes-proxy                latest   68406693c322   2017-12-10 17:30   237.16 MB      ostree
   registry.fedoraproject.org/f27/kubernetes-kubelet              latest   7c77c5914213   2017-12-10 17:30   241.94 MB      ostree
>  registry.fedoraproject.org/f27/kubernetes-controller-manager   latest   887559c1573c   2017-12-10 17:28   205.33 MB      ostree
>  registry.fedoraproject.org/f27/etcd                            latest   a370ba01d9c3   2017-12-09 11:20   110.78 MB      ostree

Images on the node:

sudo atomic images list --filter type=ostree

   REPOSITORY                                          TAG      IMAGE ID       CREATED            VIRTUAL SIZE   TYPE
>  registry.fedoraproject.org/f27/kubernetes-proxy     latest   68406693c322   2017-12-10 17:26   237.16 MB      ostree
>  registry.fedoraproject.org/f27/kubernetes-kubelet   latest   7c77c5914213   2017-12-10 17:25   241.94 MB      ostree

Lifestyle -> lifecycle

* [systemd](https://github.com/systemd/systemd) manages the lifestyle of the container.

- * [systemd](https://github.com/systemd/systemd) manages the lifestyle of the container.
+ * [systemd](https://github.com/systemd/systemd) manages the lifecycle of the container.

kubeadm init ends with modprobe error

I'm trying to set up a Kubernetes cluster on Fedora Atomic. I tried Fedora Atomic 27, 28, and 29, but I am facing the same error. Running kubeadm init ends with the following error:

failed to parse kernel config: unable to load kernel module "configs": output - "modprobe: FATAL: Module configs not found in directory /lib/modules/4.19.5-200.fc28.x86_64\n", err - exit status

I have tried installing kernel-headers and kernel-devel and regenerating dracut, but nothing solved the issue. Do you have any idea how to solve it? And, more importantly, does anyone know what is causing the problem and what I am doing wrong?

Is this caused by the operating system, or rather by misconfigured Kubernetes? I just tried to follow the Atomic docs.
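
For what it's worth, this message comes from kubeadm's preflight system verification, which tries to load the configs kernel module in order to read the kernel config; Atomic hosts don't ship that module. Assuming a kubeadm release recent enough to support it (1.9+), that single check can be skipped rather than disabling preflight entirely:

kubeadm init --ignore-preflight-errors=SystemVerification

(Older releases only offer the blunter --skip-preflight-checks used elsewhere in this repo.)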

kubelet system container can't access the host's /dev/disk

My OpenStack environment has no metadata service, so I can't get VM metadata that way.
When I start the kubelet, I get the following error:
1 openstack_instances.go:39] openstack.Instances() called
3 09:31:34.882593 1 openstack_instances.go:46] Claiming to support Instances
3 09:31:34.890697 1 metadata.go:151] Attempting to fetch metadata from http://169.254.169.254/openstack/2012-08-10/meta_data.json
r setting the external host value: failed to get NodeName from "openstack" cloud provider: unexpected status code when reading metadata from http://169.254.169.254/openstack/2012-08-10/meta_

The reason is:
the Kubernetes OpenStack cloud provider can't read the VM metadata from /dev/disk (the config drive), so it falls back to the metadata service.

Fix:
Bind-mount /dev/disk into the container for the kubelet service.
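
A sketch of what that fix could look like, assuming the entry is added to the mounts array of the kubelet container's config.json.template in the same OCI format as the existing bind mounts:

{
    "destination": "/dev/disk",
    "type": "bind",
    "source": "/dev/disk",
    "options": ["rbind", "ro"]
}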

flannel service shows 'failed' after stopping

For the flannel system container, the service is working fine, but shows 'failed' after running systemctl stop flannel despite there being no visible error:

● flannel.service - Flanneld overlay address etcd agent
Loaded: loaded (/etc/systemd/system/flannel.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2017-01-18 17:49:51 UTC; 2s ago
Process: 13634 ExecStopPost=/bin/rm /etc/systemd/system/docker.service.d/flannel.conf (code=exited, status=0/SUCCESS)
Process: 13625 ExecStop=/bin/runc kill flannel (code=exited, status=0/SUCCESS)
Process: 12781 ExecStartPost=/usr/bin/sh -c while test \! -s /run/flannel/docker; do sleep 0.1; done (code=exited, status=0/SUCCESS)
Process: 12780 ExecStart=/bin/runc run flannel (code=exited, status=143)
Main PID: 12780 (code=exited, status=143)

The service itself can be restarted with systemctl start flannel with no errors. I tried handling SIGTERM and it made no difference. This is reproducible on both F25 and RHEL Atomic Host.
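
Exit status 143 is 128 + 15, i.e. the runc main process was terminated by SIGTERM, which systemd counts as a failure by default. A minimal sketch of a fix, using a stock systemd directive in flannel.service, is to declare 143 a clean exit:

[Service]
SuccessExitStatus=143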

kube-proxy fails to expose NodePort because of r/o filesystem

The containerised kube-proxy fails to expose services with NodePort because it cannot lock /run/xtables.lock (open /run/xtables.lock: read-only file system).

Version used

sudo atomic images list
...
>  registry.fedoraproject.org/f27/kubernetes-proxy     latest   68406693c322   2017-12-10 17:26   237.16 MB      ostree

Service definition

Given the following yaml:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.8
        ports:
        - containerPort: 80

---
kind: Service
apiVersion: v1
metadata:
  name: my-nginx
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80

kubectl

kubectl get pods

NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment-3718365652-1ht6q   1/1       Running   1          1h
nginx-deployment-3718365652-9g2jq   1/1       Running   1          1h
kubectl get service my-nginx

NAME       TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
my-nginx   NodePort   10.254.106.83   <none>        80:32315/TCP   1h

Expected behaviour

  • curl http://172.20.61.51:32315 should return the nginx page.

Observed behaviour

The port is not exposed.

sudo netstat -tulpen | grep proxy

tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN      994        25072      797/kube-proxy
tcp6       0      0 :::10256                :::*                    LISTEN      994        24025      797/kube-proxy

Although I can connect to the ports of the container:

curl http://127.0.0.1:10249/

404 page not found

journalctl -xe -u kube-proxy.service returns the following errors:

Failed to start in resource-only container "/kube-proxy": mkdir /sys/fs/cgroup/cpuset/kube-proxy: read-only file system
...
Failed to execute iptables-restore: failed to open iptables lock /run/xtables.lock: open /run/xtables.lock: read-only file system

# the last line is repeated every 30 seconds
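
Both failures point at read-only mounts inside the kube-proxy container (/sys/fs/cgroup and /run). A sketch of one possible fix, assuming the container's config.json.template uses the same OCI mount format as the other system containers, is to bind-mount the lock file read-write:

{
    "destination": "/run/xtables.lock",
    "type": "bind",
    "source": "/run/xtables.lock",
    "options": ["rbind", "rw"]
}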

Repository Usage Guidance?

Are these packages published somewhere? Are they the source for the images hosted at registry.fedoraproject.org?

I'd like to be able to use the very latest available kubelet package (preferably even beta releases, but I haven't looked into building a custom RPM yet). I suppose if nothing else I can build my own kubernetes-kubelet from here, push it to a registry, and install it via atomic install.

Issues running kubernetes on Fedora Atomic Host

All steps are done according to http://www.projectatomic.io/docs/gettingstarted/ and http://www.projectatomic.io/blog/2017/11/migrating-kubernetes-on-fedora-atomic-host-27/

  1. kubernetes-apiserver

Initial values:

[root@atomic-node1 easyrsa3]# atomic host status
State: idle
Deployments:
● fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.25 (2017-12-10 18:40:57)
                    Commit: a2b80278eea897eb1fec7d008b18ef74941ff5a54f86b447a2f4da0451c4291a
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4

  fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.16 (2017-11-28 23:08:35)
                    Commit: 86727cdbc928b7f7dd0e32f62d3b973a8395d61e0ff751cfea7cc0bc5222142f
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4

Generate server certificates:

[root@atomic-node1 ~]# curl -L -O https://storage.googleapis.com/kubernetes-release/easy-rsa/easy-rsa.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 43444  100 43444    0     0  43444      0  0:00:01 --:--:--  0:00:01 86714
[root@atomic-node1 ~]# tar xzf easy-rsa.tar.gz
[root@atomic-node1 ~]# cd easy-rsa-master/easyrsa3
[root@atomic-node1 easyrsa3]# ./easyrsa init-pki

init-pki complete; you may now create a CA or requests.
Your newly created PKI dir is: /root/easy-rsa-master/easyrsa3/pki

[root@atomic-node1 easyrsa3]# MASTER_IP=10.0.1.4
[root@atomic-node1 easyrsa3]#
[root@atomic-node1 easyrsa3]# ./easyrsa --batch "--req-cn=${MASTER_IP}@`date +%s`" build-ca nopass
Generating a 2048 bit RSA private key
...............................................................................+++
..+++
writing new private key to '/root/easy-rsa-master/easyrsa3/pki/private/ca.key'
-----
[root@atomic-node1 easyrsa3]# ./easyrsa --subject-alt-name="IP:${MASTER_IP}" build-server-full server nopass
Generating a 2048 bit RSA private key
...........................+++
................................................................................................................................................+++
writing new private key to '/root/easy-rsa-master/easyrsa3/pki/private/server.key'
-----
Using configuration from /root/easy-rsa-master/easyrsa3/openssl-1.0.cnf
Can't open /root/easy-rsa-master/easyrsa3/pki/index.txt.attr for reading, No such file or directory
139822177810240:error:02001002:system library:fopen:No such file or directory:crypto/bio/bss_file.c:74:fopen('/root/easy-rsa-master/easyrsa3/pki/index.txt.attr','r')
139822177810240:error:2006D080:BIO routines:BIO_new_file:no such file:crypto/bio/bss_file.c:81:
Check that the request matches the signature
Signature ok
The Subject's Distinguished Name is as follows
commonName            :ASN.1 12:'server'
Certificate is to be certified until Dec 18 07:17:26 2027 GMT (3650 days)

Write out database with 1 new entries
Data Base Updated

Install kubernetes-apiserver:

[root@atomic-node1 easyrsa3]# atomic install --system --system-package=no --name kube-apiserver registry.fedoraproject.org/f27/kubernetes-apiserver

Note: Switching from the 'docker' backend to the 'ostree' backend based on the 'atomic.type' label in the image.  You can use --storage to override this behaviour.

Getting image source signatures
Skipping fetch of repeat blob sha256:04331e646521ddb577d113f3c103aef620cc4451641452c347864298669f8572
Copying blob sha256:5393e05eae68f3221008c7aa9e7677721304f64bca97c10dcf54b8a0fb7efa55
 109.75 MB / 109.75 MB [====================================================] 2s
Copying blob sha256:a53a9d8d02712e289074cf465b2d7dbe4d5620269cffa78215da2625871a7ba4
 52.08 MB / 52.08 MB [======================================================] 1s
Copying config sha256:de8dfe370000134bd1154351a6a97807195be57cdceff27c3e83b1607c8fab66
 2.98 KB / 2.98 KB [========================================================] 0s
Writing manifest to image destination
Storing signatures
Extracting to /var/lib/containers/atomic/kube-apiserver.0
Created file /etc/kubernetes/apiserver
Created file /etc/kubernetes/config
Created file /usr/local/bin/kubectl
systemctl daemon-reload
systemd-tmpfiles --create /etc/tmpfiles.d/kube-apiserver.conf
systemctl enable kube-apiserver

Note the permissions of /etc/kubernetes

[root@atomic-node1 easyrsa3]# ls -al /etc | grep kube
drwx------.  2 root root       37 Dec 20 08:21 kubernetes

Copy certificates to /etc/kubernetes/certs

[root@atomic-node1 easyrsa3]# mkdir /etc/kubernetes/certs
[root@atomic-node1 easyrsa3]# for i in {pki/ca.crt,pki/issued/server.crt,pki/private/server.key}; do cp $i /etc/kubernetes/certs; done
[root@atomic-node1 easyrsa3]# chown -R kube:kube /etc/kubernetes/certs

Add KUBE_API_ARGS to /etc/kubernetes/apiserver

KUBE_API_ARGS="--tls-cert-file=/etc/kubernetes/certs/server.crt --tls-private-key-file=/etc/kubernetes/certs/server.key --client-ca-file=/etc/kubernetes/certs/ca.crt --service-account-key-file=/etc/kubernetes/certs/server.crt"

Run kube-apiserver

[root@atomic-node1 easyrsa3]# systemctl start kube-apiserver

See it fail

[root@atomic-node1 easyrsa3]# systemctl status kube-apiserver
● kube-apiserver.service - kubernetes-apiserver
   Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2017-12-20 08:28:44 CET; 56s ago
  Process: 1777 ExecStart=/bin/runc --systemd-cgroup run kube-apiserver (code=exited, status=1/FAILURE)
 Main PID: 1777 (code=exited, status=1/FAILURE)
      CPU: 37ms

Dec 20 08:28:43 atomic-node1.local systemd[1]: kube-apiserver.service: Failed with result 'exit-code'.
Dec 20 08:28:44 atomic-node1.local systemd[1]: kube-apiserver.service: Service hold-off time over, scheduling restart.
Dec 20 08:28:44 atomic-node1.local systemd[1]: Stopped kubernetes-apiserver.
Dec 20 08:28:44 atomic-node1.local systemd[1]: kube-apiserver.service: Start request repeated too quickly.
Dec 20 08:28:44 atomic-node1.local systemd[1]: Failed to start kubernetes-apiserver.
Dec 20 08:28:44 atomic-node1.local systemd[1]: kube-apiserver.service: Unit entered failed state.
Dec 20 08:28:44 atomic-node1.local systemd[1]: kube-apiserver.service: Failed with result 'exit-code'.

Check journalctl

Dec 20 08:28:43 atomic-node1.local runc[1777]: container_linux.go:274: starting container process caused "exec: \"/usr/bin/kube-apiserver-docker.sh\": permission denied"
Dec 20 08:28:43 atomic-node1.local systemd[1]: kube-apiserver.service: Main process exited, code=exited, status=1/FAILURE
Dec 20 08:28:43 atomic-node1.local systemd[1]: kube-apiserver.service: Unit entered failed state.
Dec 20 08:28:43 atomic-node1.local audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kube-apiserver comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal
Dec 20 08:28:43 atomic-node1.local systemd[1]: kube-apiserver.service: Failed with result 'exit-code'.
Dec 20 08:28:44 atomic-node1.local systemd[1]: kube-apiserver.service: Service hold-off time over, scheduling restart.
Dec 20 08:28:44 atomic-node1.local audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kube-apiserver comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? termina
Dec 20 08:28:44 atomic-node1.local audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kube-apiserver comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal
Dec 20 08:28:44 atomic-node1.local systemd[1]: Stopped kubernetes-apiserver.

Fix permissions of /usr/bin/kube-apiserver-docker.sh in container rootfs

[root@atomic-node1 easyrsa3]# chmod +x /var/lib/containers/atomic/kube-apiserver/rootfs/usr/bin/kube-apiserver-docker.sh

Run kube-apiserver

[root@atomic-node1 easyrsa3]# systemctl start kube-apiserver

See it fail again, and check journalctl

Dec 20 08:33:14 atomic-node1.local runc[1924]: I1220 07:33:14.009558       1 server.go:112] Version: v1.7.3
Dec 20 08:33:14 atomic-node1.local runc[1924]: W1220 07:33:14.009940       1 authentication.go:368] AnonymousAuth is not allowed with the AllowAll authorizer.  Resetting AnonymousAuth to false. You should use a different authorizer
Dec 20 08:33:14 atomic-node1.local runc[1924]: unable to load server certificate: open /etc/kubernetes/certs/server.crt: permission denied
Dec 20 08:33:14 atomic-node1.local systemd[1]: kube-apiserver.service: Main process exited, code=exited, status=1/FAILURE
Dec 20 08:33:14 atomic-node1.local systemd[1]: kube-apiserver.service: Unit entered failed state.
Dec 20 08:33:14 atomic-node1.local audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kube-apiserver comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal
Dec 20 08:33:14 atomic-node1.local systemd[1]: kube-apiserver.service: Failed with result 'exit-code'.
Dec 20 08:33:14 atomic-node1.local systemd[1]: kube-apiserver.service: Service hold-off time over, scheduling restart.
Dec 20 08:33:14 atomic-node1.local systemd[1]: Stopped kubernetes-apiserver.

Fix permissions of /etc/kubernetes

[root@atomic-node1 easyrsa3]# chmod +x /etc/kubernetes

Run kube-apiserver and see it fail

[root@atomic-node1 easyrsa3]# systemctl start kube-apiserver
Dec 20 08:34:46 atomic-node1.local runc[2049]: I1220 07:34:46.266909       1 server.go:112] Version: v1.7.3
Dec 20 08:34:46 atomic-node1.local runc[2049]: W1220 07:34:46.267281       1 authentication.go:368] AnonymousAuth is not allowed with the AllowAll authorizer.  Resetting AnonymousAuth to false. You should use a different authorizer
Dec 20 08:34:46 atomic-node1.local runc[2049]: unable to load server certificate: open /etc/kubernetes/certs/server.crt: permission denied
Dec 20 08:34:46 atomic-node1.local systemd[1]: kube-apiserver.service: Main process exited, code=exited, status=1/FAILURE
Dec 20 08:34:46 atomic-node1.local systemd[1]: kube-apiserver.service: Unit entered failed state.
Dec 20 08:34:46 atomic-node1.local audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kube-apiserver comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal
Dec 20 08:34:46 atomic-node1.local systemd[1]: kube-apiserver.service: Failed with result 'exit-code'.
Dec 20 08:34:46 atomic-node1.local systemd[1]: kube-apiserver.service: Service hold-off time over, scheduling restart.
Dec 20 08:34:46 atomic-node1.local systemd[1]: Stopped kubernetes-apiserver.

Fix uid and gid in /var/lib/containers/atomic/kube-apiserver/config.json

        "user": {
            "uid": 996,
            "gid": 994
        },

Run kube-apiserver and see it work

[root@atomic-node1 easyrsa3]# systemctl start kube-apiserver
[root@atomic-node1 easyrsa3]# systemctl status kube-apiserver
● kube-apiserver.service - kubernetes-apiserver
   Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2017-12-20 08:37:17 CET; 58s ago
 Main PID: 2200 (runc)
    Tasks: 8 (limit: 4915)
   Memory: 6.1M
      CPU: 32ms
   CGroup: /system.slice/kube-apiserver.service
           └─2200 /bin/runc --systemd-cgroup run kube-apiserver

Change permissions of /etc/kubernetes

[root@atomic-node1 easyrsa3]# chmod -x /etc/kubernetes

Restart kube-apiserver and check journalctl

Dec 20 08:39:40 atomic-node1.local runc[2462]: I1220 07:39:40.765857       1 server.go:112] Version: v1.7.3
Dec 20 08:39:40 atomic-node1.local runc[2462]: W1220 07:39:40.766524       1 authentication.go:368] AnonymousAuth is not allowed with the AllowAll authorizer.  Resetting AnonymousAuth to false. You should use a different authorizer
Dec 20 08:39:40 atomic-node1.local runc[2462]: unable to load server certificate: open /etc/kubernetes/certs/server.crt: permission denied
Dec 20 08:39:40 atomic-node1.local systemd[1]: kube-apiserver.service: Main process exited, code=exited, status=1/FAILURE
Dec 20 08:39:40 atomic-node1.local systemd[1]: kube-apiserver.service: Unit entered failed state.
Dec 20 08:39:40 atomic-node1.local audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=kube-apiserver comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal
Dec 20 08:39:40 atomic-node1.local systemd[1]: kube-apiserver.service: Failed with result 'exit-code'.
Dec 20 08:39:41 atomic-node1.local systemd[1]: kube-apiserver.service: Service hold-off time over, scheduling restart.
Dec 20 08:39:41 atomic-node1.local systemd[1]: Stopped kubernetes-apiserver.

Conclusions:

  1. /usr/bin/kube-apiserver-docker.sh inside the kube-apiserver container must have +x permissions (this applies to all other Kubernetes components as well)
  2. /etc/kubernetes on the host machine must have +x permissions
    2.1 After a reboot, /etc/kubernetes permissions should not be reset to 700

A combined sketch of these workarounds follows below.
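
Pulling the permission fixes together (a sketch of the workarounds applied above, per component, not a supported procedure):

# make the entrypoint script executable (repeat for each installed component)
chmod +x /var/lib/containers/atomic/kube-apiserver/rootfs/usr/bin/kube-apiserver-docker.sh

# let the kube user traverse /etc/kubernetes and own the certs
chmod +x /etc/kubernetes
chown -R kube:kube /etc/kubernetes/certs

The uid/gid edit in /var/lib/containers/atomic/kube-apiserver/config.json shown above still has to be applied by hand as well.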

Issues regarding system containers not using runc

Some considerations came up from the azure container, which uses systemd but not runc. If we run this container from the atomic CLI, there are a few issues I noticed:

  1. We generate a default config.json for a container image that is missing a config.json.template. For containers that don't use runc, should we somehow detect that and not generate the file?

  2. In follow-up to 1, I recall that we used the file at /run/runc/container/state.json for some things before; would that be affected?

  3. In the info file, we still have:
    "EXEC_START": "/bin/runc run 'azure'"
    "EXEC_STOP": "/bin/runc kill 'azure'"
    which doesn't match the service file

Perhaps I'm misunderstanding how these containers are meant to be run and we shouldn't be using the atomic CLI at all. What do you think, @giuseppe?

atomic uninstall kubelet does not remove flannel, etc.

To reproduce:

  1. install kubelet/kubeadm as system containers per Jbrooks' instructions
  2. kubeadm init
  3. add flannel as the pod network
  4. atomic uninstall kubelet

Result: kubelet system containers go away, but the kubernetes containers running in docker are still there and still running (flannel, kube-proxy, apiserver, etc.). Further, these will cause issues if the user attempts to re-install containerized kube on the same system. Restarting docker gets rid of them.

This is not a major issue, obviously. However, it would be great if there were some way to give users a clean uninstall experience here -- particularly because, if we can manage it, it makes system containers clearly superior to other methods of installing/removing kubeadm.

kubernetes-apiserver needs a writable directory for certificates

If the user doesn't specify any certificates, the apiserver will try to create those in /var/run/kubernetes or in the directory that is specified with --cert-dir. Currently, all dirs mounted in the apiserver container are read-only.

So, we need to mount /var/run/kubernetes as rw.
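
A sketch of the change, again assuming the standard OCI mount format used by the other system containers' config.json.template files:

{
    "destination": "/var/run/kubernetes",
    "type": "bind",
    "source": "/var/run/kubernetes",
    "options": ["rbind", "rw"]
}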
