Comments (11)

brandond commented on July 20, 2024

As the docs say:
https://docs.k3s.io/networking/networking-services#deploying-an-external-cloud-controller-manager

K3s provides an embedded Cloud Controller Manager (CCM) stub that does the following:

Sets node InternalIP and ExternalIP address fields based on the --node-ip and --node-external-ip flags.

If you disable the built-in cloud-controller, K3s no longer has a native integration point to set the external IPs. This is instead handled by whatever infrastructure-provider-specific cloud controller you deploy. Since those are not integrated into K3s's embedded flannel, you'll need to manually set any additional annotations necessary to inform Flannel about those IPs.
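
As a rough illustration of what "manually set" could look like, the sketch below builds a JSON merge patch that adds Flannel's public-IP overwrite annotations to a Node object. The annotation keys `flannel.alpha.coreos.com/public-ip-overwrite` and `flannel.alpha.coreos.com/public-ipv6-overwrite` are the ones Flannel's kube subnet manager reads. The helper name `flannel_ip_patch` and the sample address are assumptions made for illustration, not part of K3s or Flannel.

```python
import ipaddress
import json

# Annotation keys read by Flannel's kube subnet manager.
PUBLIC_IP_OVERWRITE = "flannel.alpha.coreos.com/public-ip-overwrite"
PUBLIC_IPV6_OVERWRITE = "flannel.alpha.coreos.com/public-ipv6-overwrite"


def flannel_ip_patch(external_ips):
    """Build a merge patch for a Node's annotations (hypothetical helper).

    The first IPv4 and the first IPv6 address found are used; further
    addresses of the same family are ignored.
    """
    annotations = {}
    for raw in external_ips:
        ip = ipaddress.ip_address(raw)  # raises ValueError on malformed input
        key = PUBLIC_IP_OVERWRITE if ip.version == 4 else PUBLIC_IPV6_OVERWRITE
        annotations.setdefault(key, str(ip))
    return {"metadata": {"annotations": annotations}}


if __name__ == "__main__":
    # 203.0.113.10 is a documentation address, not a real node IP.
    print(json.dumps(flannel_ip_patch(["203.0.113.10"]), indent=2))
```

The resulting JSON could, for example, be applied with `kubectl patch node <name> --type merge -p '<json>'`. As discussed later in the thread, the annotation would have to be in place before flannel starts on that node.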

from k3s.

brandond commented on July 20, 2024

This disallows several use-cases on cloud hosted deployments (e.g. on AWS EC2 hosts as in our case) where some agent nodes are located on user-premises and/or cross-cloud setups.

To be clear, you're trying to use this to manage a hybrid deployment? You have some nodes that you want to be managed by the AWS CCM, and have the node and flannel use the external IPs set by that CCM, while other nodes are managed by the K3s CCM, and have the node and flannel use the external IPs set by the --node-external-ip flag?

This sort of thing isn't really allowed for by the cloud provider model; it is generally expected that all nodes in the cluster will be managed by the same CCM. It would take some additional work to make K3s set the flannel external IP annotations based on the node external IP provided by another CCM, and ensure that flannel starts up AFTER that CCM has already had a chance to initialize the node.

I don't even know how the AWS CCM will handle the presence of non-AWS nodes in the cluster.

brandond commented on July 20, 2024

cc @manuelbuil I think this would require:

  1. Ensure that flannel doesn't start until after the cloud-provider uninitialized taint has been removed, same as the netpol controller
    I don't think this will break anything? Might make startup a small bit slower I guess.
  2. Move the flannel annotation setters out of the main agent startup, into the flannel setup code.
    This is probably good anyway because there's no point in setting these if flannel is disabled...
  3. Set the flannel external-ip-overwrite annotation values based on the node external IPs, instead of the CLI flag value.
    Honestly I'm not even sure why we need to set these if the external IPs are set properly; what is the point of these again? Were we just doing this because flannel was getting started before the external IPs were set by the embedded cloud provider?
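
The three steps above can be sketched as plain decision logic over a Node object (represented here as a dict shaped like the Kubernetes API's JSON). This is only a sketch under stated assumptions, not K3s code: the function names are hypothetical, `node.cloudprovider.kubernetes.io/uninitialized` is the taint that a cloud controller manager removes once it has initialized the node, and the annotation keys are the ones Flannel reads.

```python
import ipaddress

# Taint present on a node until a cloud controller manager initializes it.
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"
PUBLIC_IP_OVERWRITE = "flannel.alpha.coreos.com/public-ip-overwrite"
PUBLIC_IPV6_OVERWRITE = "flannel.alpha.coreos.com/public-ipv6-overwrite"


def cloud_provider_initialized(node):
    """Step 1: the node is initialized once the taint is gone."""
    taints = node.get("spec", {}).get("taints") or []
    return all(t.get("key") != UNINITIALIZED_TAINT for t in taints)


def annotations_from_external_ips(node):
    """Step 3: derive annotations from node ExternalIPs, not CLI flags."""
    annotations = {}
    for addr in node.get("status", {}).get("addresses") or []:
        if addr.get("type") != "ExternalIP":
            continue
        ip = ipaddress.ip_address(addr["address"])
        key = PUBLIC_IP_OVERWRITE if ip.version == 4 else PUBLIC_IPV6_OVERWRITE
        annotations.setdefault(key, str(ip))  # first address per family wins
    return annotations


def plan_flannel_start(node, flannel_enabled=True):
    """Return (action, annotations); action is 'skip', 'wait' or 'start'."""
    if not flannel_enabled:
        return "skip", {}  # step 2: nothing to set when flannel is disabled
    if not cloud_provider_initialized(node):
        return "wait", {}  # step 1: hold flannel back until the CCM is done
    return "start", annotations_from_external_ips(node)
```

A caller would presumably re-evaluate `plan_flannel_start` whenever the Node object changes, and start flannel only once it returns `"start"`.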

ludost commented on July 20, 2024

Thanks for looking at this seriously. It would help us significantly, as described below:

To be clear, you're trying to use this to manage a hybrid deployment? You have some nodes that you want to be managed by the AWS CCM, and have the node and flannel use the external IPs set by that CCM, while other nodes are managed by the K3s CCM, and have the node and flannel use the external IPs set by the --node-external-ip flag?

Actually our setup is somewhat simpler: we have a cluster on AWS, to which we want to add remote nodes (k3s agents) that run on local workstations. Basically, we would like to "federate" the cluster to local workstations. This works well enough, using bootstrap tokens, tightly controlled Flannel configuration (which is where the current issue pops up), etc. We're even quite a few steps towards running the local k3s agent in a rootless setup.

Just for completeness: our aws-cloud-controller-manager is configured to not do any network configuration inside the cluster:
--allocate-node-cidrs=false --configure-cloud-routes=false

It would take some additional work to make K3s set the flannel external IP annotations based on the node external IP provided by another CCM, and ensure that flannel starts up AFTER that CCM has already had a chance to initialize the node.

I'm not familiar enough with the peculiarities, but setting the IP annotations based on the CLI arguments given during startup seems independent of whether there is an external CCM or not? Or am I completely missing the point here?

2 Move the flannel annotation setters out of the main agent startup, into the flannel setup code.
This is probably good anyway because there's no point in setting these if flannel is disabled...

This seems like a good change; these annotations are very Flannel-specific by nature.

For the short term we've solved this issue by manually deploying Flannel as a CNI-plugin DaemonSet, as described at https://github.com/flannel-io/flannel. This ensures the node is started before Flannel initializes. However, passing the correct external IP address to that setup is also non-trivial, especially in a rootless configuration. Just having these CLI arguments working would simplify our setup significantly.

brandond commented on July 20, 2024

We're even quite a few steps towards running the local k3s agent in a rootless setup.

How exactly are you accomplishing that? All the CNI-related stuff seemed pretty broken last time I tried to get it working rootless.

passing the correct External-IP address to that setup is also non-trivial

Even if K3s did set the annotations for you when you're not using our cloud-provider, you'll still need to properly set the external IPs for each node, right?

ludost commented on July 20, 2024

How exactly are you accomplishing that? All the CNI-related stuff seemed pretty broken last time I tried to get it working rootless.

Yes, it's a bit of a mess. Basically that's the next issue to tackle: in the current rootless client options, k3s hardcodes the list of copy-up dirs, which is missing /opt/cni as an entry. This means that the kube-flannel based CNI plugin can't create that folder, so we moved the cni-bin-dir to /run/opt/cni/bin. But as you know, that requires another set of annoying configuration changes, on both the containerd and kubelet side.
(But let's handle one issue at a time. :)

Even if K3s did set the annotations for you when you're not using our cloud-provider, you'll still need to properly set the external IPs for each node, right?

In effect, the external IPs are a given of the host you're running K3s on. In the case of the AWS servers, they are part of the EC2 setup and can be obtained from inside the host. Similarly, on the remote k3s agents' hosts, we can treat the public IP addresses as given. Effectively we tell K3s what the external IP is, expecting K3s to just pass this along to Flannel (and Flannel to pass it along to WireGuard).

An alternative would be to manually (outside K3s) set up a WireGuard network and tell Flannel to use that pre-configured network directly. But I really like the dynamic setup of WireGuard that Flannel provides out of the box.

manuelbuil commented on July 20, 2024

I don't think this will break anything? Might make startup a small bit slower I guess.

I don't think so either

3. Were we just doing this because flannel was getting started before the external IPs were set by the embedded cloud provider?

Yes, very likely. The public-ip-overwrite annotation needs to be there before flanneld starts; otherwise it will pick the node-ip as the public-ip.
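
To make the ordering problem concrete, here is a toy model of the behaviour described above. It is not Flannel's actual code: `public_ip_at_startup` is a hypothetical function that only mirrors the statement that the annotation is read once at startup, with the node IP as the fallback.

```python
PUBLIC_IP_OVERWRITE = "flannel.alpha.coreos.com/public-ip-overwrite"


def public_ip_at_startup(annotations, node_ip):
    """Toy model: use the overwrite annotation if present, else the node IP.

    The value is decided once, from the annotations that exist at the moment
    flanneld starts; annotations added afterwards are not considered here.
    """
    return annotations.get(PUBLIC_IP_OVERWRITE) or node_ip


# Annotation missing when flanneld starts: the node IP is picked.
early = public_ip_at_startup({}, "10.0.0.5")

# Annotation already present when flanneld starts: the external IP is picked.
late = public_ip_at_startup({PUBLIC_IP_OVERWRITE: "198.51.100.7"}, "10.0.0.5")
```

In this model `early` ends up as the internal address and `late` as the external one, which is why the annotation has to be written before flannel is started.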

manuelbuil commented on July 20, 2024

Having reread my issue #6177, I can confirm that we decided to set them as part of the cloud provider so that they are ready before flanneld is started.
