
Comments (15)

jasonbrooks avatar jasonbrooks commented on September 28, 2024 2

I'm looking into this

neuhalje avatar neuhalje commented on September 28, 2024 1

@jasonbrooks Aligning installation and configuration with other projects is a good idea.

I will close the issue because, with the updated containers, it is very likely a layer 8 problem on my side. Thank you for looking into this!

ashcrow avatar ashcrow commented on September 28, 2024

Thanks for the report @neuhalje! It looks like this is due to the latest version not being available in Fedora, since /run should be mounted in from the system: https://github.com/projectatomic/atomic-system-containers/blob/master/kubernetes-proxy/config.json.template#L324-L334
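
For reference, the template mounts /run from the host with an OCI bind-mount entry roughly of this shape (an illustrative sketch of the entry's shape, not the exact template contents):

{
    "type": "bind",
    "source": "/run",
    "destination": "/run",
    "options": ["rbind", "rw"]
}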

@jasonbrooks can you push through an update?

ashcrow avatar ashcrow commented on September 28, 2024

Related: https://pagure.io/releng/issue/7217

neuhalje avatar neuhalje commented on September 28, 2024

Update

I upgraded the system & containers:

  • I cannot access the service from any other system (no reply to the SYN packets).
  • I can access the service from my (single) node.
  • The message has changed slightly, but the service still logs an error (Failed to start in resource-only container "/kube-proxy": mkdir /sys/fs/cgroup/cpuset/kube-proxy: read-only file system).

Status

Installed Versions

sudo atomic images list | grep proxy
>  registry.fedoraproject.org/f27/kubernetes-proxy     latest   4660f3d3b9a3   2018-01-13 10:13   262.53 MB      ostree

Log

journalctl -xe -u kube-proxy.service
...
-- Unit kube-proxy.service has finished starting up.
--
-- The start-up result is done.
Jan 13 10:11:54 node-1.[redacted] runc[772]: 2018-01-13 10:11:54.524258 I | proto: duplicate proto type registered: google.protobuf.Any
Jan 13 10:11:54 node-1.[redacted] runc[772]: 2018-01-13 10:11:54.537896 I | proto: duplicate proto type registered: google.protobuf.Duration
Jan 13 10:11:54 node-1.[redacted] runc[772]: 2018-01-13 10:11:54.538171 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Jan 13 10:11:54 node-1.[redacted] runc[772]: W0113 10:11:54.934872       1 server.go:190] WARNING: all flags other than --config, --write-config-to, and --cleanup-iptables are deprecated. Please begin using a config file ASAP.
Jan 13 10:11:55 node-1.[redacted] runc[772]: I0113 10:11:55.133819       1 server.go:478] Using iptables Proxier.
Jan 13 10:11:55 node-1.[redacted] runc[772]: W0113 10:11:55.155968       1 proxier.go:488] clusterCIDR not specified, unable to distinguish between internal and external traffic
Jan 13 10:11:55 node-1.[redacted] runc[772]: I0113 10:11:55.156343       1 server.go:513] Tearing down userspace rules.
Jan 13 10:11:55 node-1.[redacted] runc[772]: W0113 10:11:55.475478       1 server.go:628] Failed to start in resource-only container "/kube-proxy": mkdir /sys/fs/cgroup/cpuset/kube-proxy: read-only file system
Jan 13 10:11:55 node-1.[redacted] runc[772]: I0113 10:11:55.476775       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
Jan 13 10:11:55 node-1.[redacted] runc[772]: I0113 10:11:55.476989       1 conntrack.go:52] Setting nf_conntrack_max to 131072
Jan 13 10:11:55 node-1.[redacted] runc[772]: I0113 10:11:55.477159       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
Jan 13 10:11:55 node-1.[redacted] runc[772]: I0113 10:11:55.477307       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
Jan 13 10:11:55 node-1.[redacted] runc[772]: I0113 10:11:55.478839       1 config.go:202] Starting service config controller
Jan 13 10:11:55 node-1.[redacted] runc[772]: I0113 10:11:55.478880       1 config.go:102] Starting endpoints config controller
Jan 13 10:11:55 node-1.[redacted] runc[772]: I0113 10:11:55.524651       1 controller_utils.go:994] Waiting for caches to sync for service config controller

Nodes

 kubectl get nodes
NAME                               STATUS    ROLES     AGE       VERSION
node-1.[redacted]                  Ready     <none>    33d       v1.7.3

node-1 has the ip address 172.20.61.51.

Services

The service is running:

 kubectl get service
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.254.0.1     <none>        443/TCP        33d
my-nginx     NodePort    10.254.17.99   <none>        80:30849/TCP   3h

 kubectl describe service my-nginx

Name:                     my-nginx
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 app=nginx
Type:                     NodePort
IP:                       10.254.17.99
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  30849/TCP
Endpoints:                172.17.0.2:80,172.17.0.3:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

 kubectl get pods
NAME                                READY     STATUS    RESTARTS   AGE
nginx-deployment2-540558622-9zxwt   1/1       Running   1          3h
nginx-deployment2-540558622-jzjv0   1/1       Running   1          3h

Analysis

Log

Compared to the old output, the first message is still logged, but "Failed to execute iptables-restore: failed to open iptables lock /run/xtables.lock: open /run/xtables.lock: read-only file system" is no longer logged:

Failed to start in resource-only container "/kube-proxy": mkdir /sys/fs/cgroup/cpuset/kube-proxy: read-only file system
...
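
One quick check is to inspect the rendered OCI config of the installed system container for the /sys/fs/cgroup mount options (the path below assumes the default atomic system-container checkout location and the container name kube-proxy):

# on the host node-1
sudo grep -B 2 -A 4 'sys/fs/cgroup' /var/lib/containers/atomic/kube-proxy.0/config.json
# an "ro" in the options of that mount entry matches the mkdir failure above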

Accessing the service from the node works

On the node, the service can be accessed (curl http://172.20.61.51:30849 succeeds).

Accessing the service from other systems does not work

From my laptop, the service cannot be accessed (curl http://172.20.61.51:30849 hangs).

tcpdump shows that my host gets no reply for the initial SYN packet:

# on the host node-1
sudo tcpdump -nn port 30849
...
13:55:15.300884 IP 172.20.10.50.54187 > 172.20.61.51.30849: Flags [S], seq 1814234020, win 65535, options [mss 1460,nop,wscale 5,nop,nop,TS val 1060856956 ecr 0,sackOK,eol], length 0
13:55:16.303943 IP 172.20.10.50.54187 > 172.20.61.51.30849: Flags [S], seq 1814234020, win 65535, options [mss 1460,nop,wscale 5,nop,nop,TS val 1060857956 ecr 0,sackOK,eol], length 0
...
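
To check whether those SYNs ever reach the kube-proxy rules, the nat-table packet counters can be watched while curl retries (a diagnostic sketch; the chain name matches the rules dumped below):

# on the host node-1
sudo iptables -t nat -L KUBE-NODEPORTS -n -v
# rising pkts/bytes counters mean the SYNs reach the chain; if they stay flat,
# something earlier (a host firewall, rp_filter) is dropping the packets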

Firewall

iptables has rules for the service:

# on the host node-1
sudo iptables -n -L -t nat
....
Chain KUBE-NODEPORTS (1 references)
target     prot opt source               destination
KUBE-MARK-MASQ  tcp  --  0.0.0.0/0            0.0.0.0/0            /* default/my-nginx: */ tcp dpt:32474
KUBE-SVC-BEPXDJBUHFCSYIC3  tcp  --  0.0.0.0/0            0.0.0.0/0            /* default/my-nginx: */ tcp dpt:32474

...

Chain KUBE-SEP-BLX3X6UTIG6UGCA2 (1 references)
target     prot opt source               destination
KUBE-MARK-MASQ  all  --  172.17.0.5           0.0.0.0/0            /* default/my-nginx: */
DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            /* default/my-nginx: */ tcp to:172.17.0.5:80

...


Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
KUBE-MARK-MASQ  tcp  -- !172.17.0.0/16        10.254.0.1           /* default/kubernetes:https cluster IP */ tcp dpt:443
KUBE-SVC-NPX46M4PTMTKRN6Y  tcp  --  0.0.0.0/0            10.254.0.1           /* default/kubernetes:https cluster IP */ tcp dpt:443
KUBE-MARK-MASQ  tcp  -- !172.17.0.0/16        10.254.168.195       /* default/my-nginx: cluster IP */ tcp dpt:80
KUBE-SVC-BEPXDJBUHFCSYIC3  tcp  --  0.0.0.0/0            10.254.168.195       /* default/my-nginx: cluster IP */ tcp dpt:80
KUBE-MARK-MASQ  tcp  -- !172.17.0.0/16        10.254.210.142       /* ingress-nginx/default-http-backend: cluster IP */ tcp dpt:80
KUBE-SVC-J4PGGZ6AUXZWNA2B  tcp  --  0.0.0.0/0            10.254.210.142       /* ingress-nginx/default-http-backend: cluster IP */ tcp dpt:80
KUBE-NODEPORTS  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL

Chain KUBE-SVC-BEPXDJBUHFCSYIC3 (2 references)
target     prot opt source               destination
KUBE-SEP-BLX3X6UTIG6UGCA2  all  --  0.0.0.0/0            0.0.0.0/0            /* default/my-nginx: */ statistic mode random probability 0.50000000000
KUBE-SEP-J5WBW7HEOGAHN6ZG  all  --  0.0.0.0/0            0.0.0.0/0            /* default/my-nginx: */

...
# Outbound

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
MASQUERADE  all  --  172.17.0.0/16        0.0.0.0/0
KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */

...

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
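
Given the clusterCIDR warning in the log, two host-level settings are worth ruling out for iptables-mode nodeports (a hedged checklist, not a confirmed diagnosis):

# on the host node-1
sysctl net.ipv4.ip_forward                 # must be 1 for DNATed nodeport traffic
sysctl net.bridge.bridge-nf-call-iptables  # usually 1 on kubernetes nodes
sudo iptables -L FORWARD -n | head -n 3    # a DROP policy here can eat forwarded traffic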

neuhalje avatar neuhalje commented on September 28, 2024

@ashcrow Is this a bug or a setup problem on my side?

ashcrow avatar ashcrow commented on September 28, 2024

@jasonbrooks ^^

jasonbrooks avatar jasonbrooks commented on September 28, 2024

@neuhalje It might be a setup problem on your side. I'm testing this on a three-node cluster with the system containers installed; the nodeport is exposed on each of my nodes, and I'm able to curl the nginx server.

I am getting the /sys/fs/cgroup/cpuset/kube-proxy: read-only file system error as well. The system containers for the openshift origin node (https://github.com/openshift/origin/blob/release-3.7/images/node/system-container/config.json.template), which cover the kubelet and the proxy components, bind /sys read-write. We could take that approach, or we could change our read-only bind of /sys/fs/cgroup to read-write.
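
For illustration, the second option would amount to a mount entry along these lines in the proxy's config.json.template (a sketch of the shape, not an actual diff):

{
    "destination": "/sys/fs/cgroup",
    "type": "bind",
    "source": "/sys/fs/cgroup",
    "options": ["rbind", "rw"]
}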

A wider issue is that we need to update and refine our suggested kubernetes setup process. I've always used https://github.com/kubernetes/contrib/tree/master/ansible, but those scripts have been deprecated in favor of a different ansible-based approach that doesn't use these system containers at all.

I think it might make sense to try to work out upstream kube master and node roles that work with https://github.com/openshift/openshift-ansible.

deuscapturus avatar deuscapturus commented on September 28, 2024

I've hit this same issue.

I'm able to connect to the tutor-proxy nodePort locally, but not remotely. I'm running the latest available version of the kube-proxy system container from registry.fedoraproject.org/f27/kubernetes-proxy.

kube-proxy output:

Feb 13 18:40:45 ip-10-107-20-177.us-west-2.compute.internal systemd[1]: Started kubernetes-proxy.
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: 2018-02-13 18:40:46.089456 I | proto: duplicate proto type registered: google.protobuf.Any
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: 2018-02-13 18:40:46.089550 I | proto: duplicate proto type registered: google.protobuf.Duration
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: 2018-02-13 18:40:46.089570 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: W0213 18:40:46.133750       1 server.go:190] WARNING: all flags other than --config, --write-config-to, and --cleanup-iptables are deprecated. Please begin using a config file ASAP.
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: I0213 18:40:46.140159       1 server.go:478] Using iptables Proxier.
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: W0213 18:40:46.145704       1 proxier.go:488] clusterCIDR not specified, unable to distinguish between internal and external traffic
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: I0213 18:40:46.146164       1 server.go:513] Tearing down userspace rules.
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: W0213 18:40:46.156672       1 server.go:628] Failed to start in resource-only container "/kube-proxy": mkdir /sys/fs/cgroup/cpuset/kube-proxy: read-only file system
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: I0213 18:40:46.157028       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: I0213 18:40:46.157116       1 conntrack.go:52] Setting nf_conntrack_max to 131072
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: I0213 18:40:46.157264       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: I0213 18:40:46.157307       1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: I0213 18:40:46.157646       1 config.go:202] Starting service config controller
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: I0213 18:40:46.157662       1 controller_utils.go:994] Waiting for caches to sync for service config controller
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: I0213 18:40:46.157701       1 config.go:102] Starting endpoints config controller
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: I0213 18:40:46.157708       1 controller_utils.go:994] Waiting for caches to sync for endpoints config controller
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: I0213 18:40:46.257846       1 controller_utils.go:1001] Caches are synced for endpoints config controller
Feb 13 18:40:46 ip-10-107-20-177.us-west-2.compute.internal runc[6281]: I0213 18:40:46.257870       1 controller_utils.go:1001] Caches are synced for service config controller

ashcrow avatar ashcrow commented on September 28, 2024

Reopening. @jasonbrooks can you reproduce?

deuscapturus avatar deuscapturus commented on September 28, 2024

It has been stated that this issue will be resolved by 2d50826.

But I doubt that the above fix applies to the kubernetes-proxy system container; it looks like it only applies to the kubelet container.

jasonbrooks avatar jasonbrooks commented on September 28, 2024

@deuscapturus Right, I'm going to test adding a similar fix in the kube-proxy container.

jasonbrooks avatar jasonbrooks commented on September 28, 2024

@deuscapturus So, I tested the change, and it got rid of the error, but I'm able to access my nodeport from a separate system with or without the change.

I can try to reproduce what you're seeing. Do you have a test manifest or something I can try?

deuscapturus avatar deuscapturus commented on September 28, 2024

My problem is somewhere in iptables. I'm able to connect to my service externally on the nodePort when I change kube-proxy to --proxy-mode=userspace.
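
(For anyone reproducing this: the proxier mode is a plain kube-proxy flag, so the workaround looks roughly like the line below. The --master value is a placeholder for your apiserver, and how the flag is threaded through the system container's configuration may differ.)

kube-proxy --master=http://127.0.0.1:8080 --proxy-mode=userspace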

@jasonbrooks as your test suggests, the ro filesystem error/warning is an entirely different issue. Would you prefer a new issue, or should I change the title of this one?

jasonbrooks avatar jasonbrooks commented on September 28, 2024

@deuscapturus we can keep this issue. I'm curious whether you'll still have this issue if you install and run the proxy from the rpm; the following command will do it. I'm including a download of the specific package because the current latest kube in f27 is 1.9.1, but a system container with that version hasn't been released yet.

atomic uninstall kube-proxy && curl -O https://kojipkgs.fedoraproject.org//packages/kubernetes/1.7.3/1.fc27/x86_64/kubernetes-node-1.7.3-1.fc27.x86_64.rpm && rpm-ostree install kubernetes-node-1.7.3-1.fc27.x86_64.rpm -r
