
Comments (7)

brandond commented on June 21, 2024

According to the containerd docs at https://github.com/containerd/containerd/blob/release/1.7/docs/hosts.md, all the host fields are valid at the root level:

For each registry host namespace directory in your registry config_path you may include a hosts.toml configuration file. The following root level toml fields apply to the registry host namespace:

This is what k3s generates:

root@systemd-node-1:/# cat /var/lib/rancher/k3s/agent/etc/containerd/certs.d/172-17-0-7.sslip.io/hosts.toml
# File generated by k3s. DO NOT EDIT.

server = "https://172-17-0-7.sslip.io/v2"
capabilities = ["pull", "resolve", "push"]

ca = ["/usr/local/share/ca-certificates/registry.crt"]

However, containerd fails to load that:

time="2024-04-01T22:11:02.070675417Z" level=error msg="failed to decode hosts.toml" error="invalid `host` tree"

Apparently it goes looking for at least one host section; if it can't find one it fails to use the hosts.toml file entirely, despite the presence of valid config at the root level.

As a workaround, we can generate an empty host section; the following works properly:

root@systemd-node-1:/# cat /var/lib/rancher/k3s/agent/etc/containerd/certs.d/172-17-0-7.sslip.io/hosts.toml
# File generated by k3s. DO NOT EDIT.

server = "https://172-17-0-7.sslip.io/v2"
capabilities = ["pull", "resolve", "push"]

ca = ["/usr/local/share/ca-certificates/registry.crt"]

[host]
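
To spot files affected by this, a rough check can look for a host table header in the generated text. The following is a minimal sketch, not containerd's actual decoding logic: the function name `has_host_table` is hypothetical, and the regular expression only approximates the observed behavior by matching `[host]` or `[host."<url>"]` headers line by line.

```python
import re

# Matches a TOML table header of the form [host] or [host."<url>"],
# optionally followed by a trailing comment.
HOST_TABLE = re.compile(r'^\s*\[host(\..+)?\]\s*(#.*)?$')


def has_host_table(hosts_toml: str) -> bool:
    """Return True if the text contains at least one [host...] table header.

    Assumption: this line-based check stands in for containerd's real
    TOML decoding, which is what produced the "invalid `host` tree" error.
    """
    return any(HOST_TABLE.match(line) for line in hosts_toml.splitlines())
```

Run against the two listings above, the first file (root-level fields only) would be reported as lacking a host table, and the second (with the trailing `[host]`) would pass.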

I can address this in the next release. In the meantime, if you do not currently specify a port in your registry namespace, you should be able to work around the issue with something like this in your registries.yaml:

mirrors:
 172-17-0-7.sslip.io:
   endpoint:
     - https://172-17-0-7.sslip.io:443
configs:
 "172-17-0-7.sslip.io:443":
   tls:
     ca_file: /usr/local/share/ca-certificates/registry.crt

Note the use of a port in the endpoint; this forces k3s to generate a host entry in the hosts.toml.
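
Since the workaround depends on the endpoint carrying an explicit port, it can help to check existing endpoints before relying on it. This is a small sketch using Python's standard `urllib.parse`; the helper name `endpoint_has_port` is hypothetical, and it says nothing about how k3s itself decides when to emit a host entry.

```python
from urllib.parse import urlsplit


def endpoint_has_port(endpoint: str) -> bool:
    """Return True if a mirror endpoint string includes an explicit port."""
    # urlsplit only fills in the network location when a scheme or a
    # leading "//" is present, so bare hostnames get a "//" prefix first.
    if "://" not in endpoint:
        endpoint = "//" + endpoint
    return urlsplit(endpoint).port is not None
```

For example, `endpoint_has_port("https://172-17-0-7.sslip.io:443")` is true, while the same URL without `:443` is not.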

from k3s.

brandond commented on June 21, 2024

Can you confirm that you are not using a custom containerd config template? Can you provide the output of find /var/lib/rancher/k3s/agent/etc/containerd/ -type f -print -exec cat {} \; along with containerd.log showing the failed pull?


intrand commented on June 21, 2024

Can you confirm that you are not using a custom containerd config template? Can you provide the output of find /var/lib/rancher/k3s/agent/etc/containerd/ -type f -print -exec cat {} \; along with containerd.log showing the failed pull?

I have not touched the template at all. I also inspected the containerd toml and compared everything that seemed relevant to a backup from an earlier version and everything was identical.

I do not have the containerd log anymore. Are you unable to reproduce this behavior in 1.29.3+k3s1? 🤔 If absolutely need be I can destroy my cluster and build from scratch, but that should be the last resort.

EDIT: the cluster is up and running on 1.29.2+k3s1 with traffic going to/from. It's disruptive for me to test this on the same metal. I can try on another machine, but so can anyone :) It would be nice to see if anyone else can reproduce this.


brandond commented on June 21, 2024


intrand commented on June 21, 2024

Thank you very much for going through the work to reproduce this, @brandond!


brandond commented on June 21, 2024

Using 172-17-0-7.sslip.io as an example registry, the two possible workarounds are:

  1. If your registry namespace does not currently include a port, configure a mirror endpoint with a port:
    mirrors:
      172-17-0-7.sslip.io:
        endpoint:
          - https://172-17-0-7.sslip.io:443
    configs:
      "172-17-0-7.sslip.io:443":
        tls:
          ca_file: /usr/local/share/ca-certificates/registry.crt
  2. Manually drop the CA certificate into the registry namespace's configuration directory, and make it immutable so that k3s does not remove it when restarting:
    mkdir -p /var/lib/rancher/k3s/agent/etc/containerd/certs.d/172-17-0-7.sslip.io/
    cp /usr/local/share/ca-certificates/registry.crt /var/lib/rancher/k3s/agent/etc/containerd/certs.d/172-17-0-7.sslip.io/ca.crt
    chattr +i /var/lib/rancher/k3s/agent/etc/containerd/certs.d/172-17-0-7.sslip.io/ca.crt
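
To adapt the second workaround to another registry name, the three commands can be generated from the namespace and the CA path. This is only a sketch: `ca_dropin_commands` is a hypothetical helper, it assumes the registry namespace has no port (so the directory name equals the registry name as in the example above), and it only builds the command strings without running them.

```python
import shlex
from pathlib import PurePosixPath

# Directory used in the example above for per-registry containerd config.
CERTS_D = PurePosixPath("/var/lib/rancher/k3s/agent/etc/containerd/certs.d")


def ca_dropin_commands(registry: str, ca_source: str) -> list[str]:
    """Build the mkdir/cp/chattr commands for one registry namespace."""
    target_dir = CERTS_D / registry
    target = target_dir / "ca.crt"
    return [
        f"mkdir -p {shlex.quote(str(target_dir))}",
        f"cp {shlex.quote(ca_source)} {shlex.quote(str(target))}",
        # chattr +i marks the file immutable so it survives a k3s restart.
        f"chattr +i {shlex.quote(str(target))}",
    ]
```

Printing the result of `ca_dropin_commands("172-17-0-7.sslip.io", "/usr/local/share/ca-certificates/registry.crt")` gives commands equivalent to the ones listed above, which can be reviewed before being run as root.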


aganesh-suse commented on June 21, 2024

Validated on master branch with version v1.29.4-rc1+k3s1

Environment Details

Infrastructure

  • Cloud
  • Hosted

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"

$ uname -m
x86_64

Cluster Configuration:

HA: 3 servers / 1 agent

Config.yaml:

token: xxxx
cluster-init: true
write-kubeconfig-mode: "0644"
node-external-ip: 1.1.1.1
node-label:
- k3s-upgrade=server

registries.yaml:

 $ sudo cat /etc/rancher/k3s/registries.yaml
mirrors:
  pvt-registry.com:
    endpoint:
      - pvt-registry.com
  docker.io:
    endpoint:
      - pvt-registry.com      
  k8s.gcr.io:
    endpoint:
      - pvt-registry.com      
configs:
  pvt-registry.com:
    auth:
      username: xxxx
      password: xxxx
    tls:
      ca_file: /home/user/ca.pem

test-image.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: pvt-reg-test
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pvt-reg-test
  namespace: pvt-reg-test
spec:
  selector:
    matchLabels:
      k8s-app: nginx-app-clusterip
  replicas: 2
  template:
    metadata:
      labels:
        k8s-app: nginx-app-clusterip
    spec:
      containers:
      - name: nginx
        image: pvt-registry.com/nginx:latest
        ports:
        - containerPort: 8080

Testing Steps

  1. Copy config.yaml and registries.yaml
$ sudo mkdir -p /etc/rancher/k3s
$ sudo cp config.yaml /etc/rancher/k3s
$ sudo cp registries.yaml /etc/rancher/k3s
  2. Install k3s
$ curl -sfL https://get.k3s.io | sudo INSTALL_K3S_VERSION='v1.29.4-rc1+k3s1' sh -s - server
  3. Verify Cluster Status:
$ kubectl get nodes -o wide
$ kubectl get pods -A
  4. Push an image onto the private registry and try to deploy a pod with said image.
    The image should get pulled and pod should come up without any tls certificate errors.
$ kubectl apply -f test-image.yaml
$ kubectl get pods -n pvt-reg-test
$ kubectl describe pod/pvt-reg-test-abcd -n pvt-reg-test
  5. Check the hosts.toml files for host section
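
The last step can also be scripted instead of inspecting each file by hand. The sketch below walks a certs.d directory and lists the registry namespaces whose hosts.toml has no host table header. The function name is hypothetical, the check is a plain prefix match rather than a real TOML parse, and reading the real directory typically requires root.

```python
from pathlib import Path


def hosts_files_missing_host_table(certs_dir: str) -> list[str]:
    """Return registry namespace names whose hosts.toml lacks a [host...] table."""
    missing = []
    for hosts_file in sorted(Path(certs_dir).glob("*/hosts.toml")):
        lines = [line.strip() for line in hosts_file.read_text().splitlines()]
        # Accept either the bare [host] table or a [host."<url>"] sub-table.
        if not any(l.startswith("[host]") or l.startswith("[host.") for l in lines):
            missing.append(hosts_file.parent.name)
    return missing


if __name__ == "__main__":
    # Path taken from the listings in this thread.
    print(hosts_files_missing_host_table(
        "/var/lib/rancher/k3s/agent/etc/containerd/certs.d"))
```

On a fixed release the printed list would be expected to be empty, since every generated file ends with at least an empty host section.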

Replication Results:

  • k3s version used for replication:
$ k3s -v
k3s version v1.29.3+k3s1 (8aecc26b)
go version go1.21.8
$ kubectl get pods -A
kube-system      coredns-6799fbcd5-p7pkw                   1/1     Running            0          4m38s
kube-system      helm-install-traefik-9v8gb                0/1     Completed          1          4m38s
kube-system      helm-install-traefik-crd-5n2cw            0/1     Completed          0          4m38s
kube-system      local-path-provisioner-6c86858495-gps56   1/1     Running            0          4m38s
kube-system      metrics-server-54fd9b65b-mtzk5            1/1     Running            0          4m38s
kube-system      svclb-traefik-44e43501-4kkng              2/2     Running            0          3m26s
kube-system      svclb-traefik-44e43501-hd2qx              2/2     Running            0          4m16s
kube-system      svclb-traefik-44e43501-rx2pt              2/2     Running            0          2m37s
kube-system      svclb-traefik-44e43501-smtfd              2/2     Running            0          4m16s
kube-system      traefik-f4564c4f4-2t2l8                   1/1     Running            0          4m17s
pvt-reg-test     pvt-reg-test-64bc967f8b-6j8jk             0/1     ImagePullBackOff   0          28s
pvt-reg-test     pvt-reg-test-64bc967f8b-sgxg9             0/1     ErrImagePull       0          28s

Pod Events:

Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  7m39s                   default-scheduler  Successfully assigned pvt-reg-test/pvt-reg-test-64bc967f8b-sgxg9 to ip-172-31-16-132
  Normal   Pulling    6m6s (x4 over 7m38s)    kubelet            Pulling image "pvt-registry.com/nginx:latest"
  Warning  Failed     6m6s (x4 over 7m38s)    kubelet            Failed to pull image "pvt-registry.com/nginx:latest": failed to pull and unpack image "pvt-registry.com/nginx:latest": failed to resolve reference "pvt-registry.com/nginx:latest": failed to do request: Head "https://pvt-registry.com/v2/nginx/manifests/latest": tls: failed to verify certificate: x509: certificate signed by unknown authority
  Warning  Failed     6m6s (x4 over 7m38s)    kubelet            Error: ErrImagePull
  Warning  Failed     5m54s (x6 over 7m38s)   kubelet            Error: ImagePullBackOff
  Normal   BackOff    2m27s (x21 over 7m38s)  kubelet            Back-off pulling image "pvt-registry.com/nginx:latest"

Validation Results:

  • k3s version used for validation:
$ k3s -v
k3s version v1.29.4-rc1+k3s1 (d973fadb)
go version go1.21.9
$ kubectl get pods -A
NAMESPACE        NAME                                      READY   STATUS            RESTARTS   AGE
kube-system      coredns-6799fbcd5-ccwrw                   1/1     Running           0          4m42s
kube-system      helm-install-traefik-667w4                0/1     Completed         1          4m43s
kube-system      helm-install-traefik-crd-2nq47            0/1     Completed         0          4m43s
kube-system      local-path-provisioner-6c86858495-dvwzt   1/1     Running           0          4m42s
kube-system      metrics-server-54fd9b65b-nkzds            1/1     Running           0          4m42s
kube-system      svclb-traefik-045f5f22-9cdff              2/2     Running           0          4m27s
kube-system      svclb-traefik-045f5f22-dnvkt              2/2     Running           0          4m27s
kube-system      svclb-traefik-045f5f22-jwx2j              2/2     Running           0          3m27s
kube-system      svclb-traefik-045f5f22-rmx7m              2/2     Running           0          2m37s
kube-system      traefik-7d5f6474df-26pw8                  1/1     Running           0          4m27s
pvt-reg-test     pvt-reg-test-66cb57586c-7ckvp             1/1     Running           0          28s
pvt-reg-test     pvt-reg-test-66cb57586c-f88jb             1/1     Running           0          28s

Check the hosts.toml for host section:

 $ sudo cat /var/lib/rancher/k3s/agent/etc/containerd/certs.d/pvt-registry.com/hosts.toml 
# File generated by k3s. DO NOT EDIT.

server = "https://pvt-registry.com/v2"
capabilities = ["pull", "resolve", "push"]

ca = ["/home/ubuntu/ca.pem"]


[host]

 $ sudo cat /var/lib/rancher/k3s/agent/etc/containerd/certs.d/docker.io/hosts.toml 
# File generated by k3s. DO NOT EDIT.

server = "https://registry-1.docker.io/v2"
capabilities = ["pull", "resolve", "push"]


[host]
[host."https://pvt-registry.com/v2"]
  capabilities = ["pull", "resolve"]
  ca = ["/home/ubuntu/ca.pem"]

 $ sudo cat /var/lib/rancher/k3s/agent/etc/containerd/certs.d/k8s.gcr.io/hosts.toml 
# File generated by k3s. DO NOT EDIT.

server = "https://k8s.gcr.io/v2"
capabilities = ["pull", "resolve", "push"]


[host]
[host."https://pvt-registry.com/v2"]
  capabilities = ["pull", "resolve"]
  ca = ["/home/ubuntu/ca.pem"]

