siderolabs / talos
Talos Linux is a modern Linux distribution built for Kubernetes.
Home Page: https://www.talos.dev
License: Mozilla Public License 2.0
A new feature in kubeadm v1.11.0-beta.1 requires cut: https://github.com/kubernetes/kubernetes/blame/e85b81bbee098a7ec75cc894a8785867ec586798/pkg/util/ipvs/kernelcheck_linux.go#L64
[ERROR RequiredIPVSKernelModulesAvailable]: error getting installed ipvs required kernel modules: executable file not found in $PATH()
We should become "Certified Kubernetes" by following the CNCF guide.
An automated process to create AMIs is needed for releases.
The CRI-O logs contain many entries like the following:
time="2018-04-03 04:07:18.909401344Z" level=info msg="Received container exit code: -1, message: exec failed: container_linux.go:348: starting container process caused "open /proc/self/fd: no such file or directory"
Instead of generating /etc/os-release at build time, generate it at runtime. We set a version variable at build time.
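Something like the following could work: a rough sketch, assuming a package-level Version variable injected via -ldflags (the variable name, file layout, and os-release fields below are illustrative, not the project's actual code).

```go
package main

import (
	"fmt"
	"os"
)

// Version is injected at build time, e.g.:
//   go build -ldflags "-X main.Version=v0.1.0"
var Version = "unknown"

// writeOSRelease renders /etc/os-release at runtime instead of baking it
// into the rootfs at build time.
func writeOSRelease() error {
	contents := fmt.Sprintf(`NAME="Talos"
ID=talos
VERSION_ID=%s
PRETTY_NAME="Talos (%s)"
`, Version, Version)

	return os.WriteFile("/etc/os-release", []byte(contents), 0o644)
}

func main() {
	if err := writeOSRelease(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```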
The network config causes pods to come up with an IP from the 10.88.0.0/16 CIDR block. This breaks DNS resolution in pods.
I haven't been able to find much documentation around it, but it seems we should not create a network configuration and leave it up to the network plugin to handle that.
Seeing the following in the CRI-O logs:
time="2018-03-22 04:26:51.229159744Z" level=warning msg="hooks path: "/usr/share/containers/oci/hooks.d" does not exist"
time="2018-03-22 04:26:51.229230416Z" level=warning msg="hooks path: "/etc/containers/oci/hooks.d" does not exist"
time="2018-03-22 04:26:51.229386000Z" level=error msg="error updating cni config: Missing CNI default network"
time="2018-03-22 04:26:51.229418768Z" level=error msg="Missing CNI default network"
ERROR: logging before flag.Parse: W0322 04:26:51.312486 1291 hostport_manager.go:68] The binary conntrack is not installed, this can cause failures in network connection cleanup.
time="2018-03-22 04:26:51.330119568Z" level=error msg="watcher.Add("/usr/share/containers/oci/hooks.d") failed: no such file or directory"
We should address all of these.
We should provide info from /proc/<pid>/stat (and perhaps other places in procfs) that would give users detailed info on the running processes.
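A rough sketch of what reading /proc/<pid>/stat could look like; the struct and the handful of fields selected here are illustrative, and field positions follow proc(5).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// ProcessStat holds a few of the fields from /proc/<pid>/stat.
type ProcessStat struct {
	PID   string
	Comm  string
	State string
	// Additional fields could be added as needed.
}

func readStat(pid int) (*ProcessStat, error) {
	b, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", pid))
	if err != nil {
		return nil, err
	}
	s := string(b)

	// The comm field is wrapped in parentheses and may contain spaces,
	// so split around it rather than naively on whitespace.
	open := strings.Index(s, "(")
	closeIdx := strings.LastIndex(s, ")")
	if open < 0 || closeIdx < open {
		return nil, fmt.Errorf("unexpected stat format")
	}
	rest := strings.Fields(s[closeIdx+1:])
	if len(rest) == 0 {
		return nil, fmt.Errorf("unexpected stat format")
	}

	return &ProcessStat{
		PID:   strings.TrimSpace(s[:open]),
		Comm:  s[open+1 : closeIdx],
		State: rest[0],
	}, nil
}

func main() {
	st, err := readStat(1)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", st)
}
```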
The user data fields should be verified. Clear and early errors in the initramfs relating to the user data would provide a better development and user experience.
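A sketch of the kind of early validation that could run in the initramfs; the struct and required fields here are hypothetical stand-ins for whatever the real user data definition requires.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// UserData is a hypothetical subset of the user data; the real struct and
// required fields would come from the project's user data definition.
type UserData struct {
	Token       string
	Nameservers []string
}

// Validate returns a single error describing everything that is missing, so
// the user sees all problems at once instead of one per boot attempt.
func (u *UserData) Validate() error {
	var errs []error
	if u.Token == "" {
		errs = append(errs, errors.New("token is required"))
	}
	if len(u.Nameservers) == 0 {
		errs = append(errs, errors.New("at least one nameserver is required"))
	}
	return errors.Join(errs...)
}

func main() {
	ud := &UserData{}
	if err := ud.Validate(); err != nil {
		fmt.Fprintf(os.Stderr, "invalid user data: %v\n", err)
		os.Exit(1)
	}
}
```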
Currently we use text/template to render MasterConfiguration and NodeConfiguration YAML files that are then fed into kubeadm. This is suboptimal since we are duplicating work done in kubeadm. We should instead support specifying the files in their raw format:
kubernetes:
kubeadm: |
kind: NodeConfiguration
apiVersion: kubeadm.k8s.io/v1alpha1
token: abcd.1234567898765432
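A sketch of the consuming side under this proposal: the kubeadm configuration stays an opaque string in the user data and is written out untouched for kubeadm to read. The field names mirror the example above; the file paths are illustrative.

```go
package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// Kubernetes is a hypothetical user data section; the kubeadm configuration
// is kept as an opaque string rather than re-modeled with text/template.
type Kubernetes struct {
	Kubeadm string `yaml:"kubeadm"`
}

type UserData struct {
	Kubernetes Kubernetes `yaml:"kubernetes"`
}

func main() {
	raw, err := os.ReadFile("/run/userdata.yaml") // path is illustrative
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	var ud UserData
	if err := yaml.Unmarshal(raw, &ud); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Hand the file to kubeadm untouched, e.g. `kubeadm init --config /etc/kubeadm.yaml`.
	if err := os.WriteFile("/etc/kubeadm.yaml", []byte(ud.Kubernetes.Kubeadm), 0o600); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```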
The following warning is showing up in the kubelet logs. We should install conntrack.
W0322 04:14:51.391333 1339 hostport_manager.go:68] The binary conntrack is not installed, this can
cause failures in network connection cleanup.
Instead of a RESTful API, I would prefer we use gRPC for a number of reasons. Using protocol buffers, we can generate the client and server API that can be used by init and a CLI.
The init logic needs to be robust and as simple as possible. Currently, the gRPC service and init are one and the same. We should pull the gRPC server code out into a dedicated service. This would ensure that if something goes bad within the gRPC service, we don't take out the whole node with it in the case of a bug that could trigger a kernel panic. I propose we call it osd to complement osctl.
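A minimal sketch of what a standalone osd process could look like: a gRPC server on a local socket, supervised by init rather than running inside it. The socket path and the service registered below are assumptions; the real API would come from the project's protobuf definitions.

```go
package main

import (
	"log"
	"net"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	const socket = "/run/osd.sock" // illustrative path

	_ = os.Remove(socket)
	lis, err := net.Listen("unix", socket)
	if err != nil {
		log.Fatalf("listen: %v", err)
	}

	srv := grpc.NewServer()
	// A real osd would register services generated from the project's .proto
	// files here; the standard health service stands in for them in this sketch.
	healthpb.RegisterHealthServer(srv, health.NewServer())

	// If this process crashes, only osd dies; init stays up and can restart it.
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("serve: %v", err)
	}
}
```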
See buildkit.
Implement upgrades.
Creating images is relatively simple now due to #114, and it opens up the idea of encouraging users to build images with sensitive files baked in. The user data file is one example. Right now, the cluster depends on an external source of truth that varies with the platform the cluster is running on.
For example, in AWS, the user data is sourced from http://169.254.169.254/latest/user-data. The problem with this is that any pod in the cluster can reach this endpoint, which is a huge security risk considering what is in it.
On bare metal, there is the added burden of maintaining an HTTP server for the purpose of serving the user data. In this example, you could secure the endpoint, but you still have to get the credentials onto the nodes somehow.
This proposal leverages the existing RoT features we have by further extending the responsibility of such a node. Nodes that serve as roots of trust are trusted by their very definition, and therefore make perfect candidates for hosting a user data service.
The workflow would be something like the following:

1. Generate the RoT credentials.
2. Build the images with the RoT credentials baked in.
3. Master nodes use the RoT credentials to securely serve the user data files stored on them.
4. Worker nodes use the RoT credentials to make a request to a master node for the user data.

This method simplifies things operationally and improves security in the process. It does move some of the complexity into the image building requirement, but the security and day-to-day operational value it brings outweighs the complexity it adds.
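A sketch of the worker side of step 4, assuming the RoT credentials are an x509 client certificate/key pair and that a master node exposes a hypothetical HTTPS userdata endpoint; the paths and URL are illustrative.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"io"
	"net/http"
	"os"
)

// fetchUserData requests the node's user data from a master node, presenting
// the RoT client certificate over mutual TLS.
func fetchUserData(masterURL, certFile, keyFile, caFile string) ([]byte, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, err
	}
	caPEM, err := os.ReadFile(caFile)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				Certificates: []tls.Certificate{cert},
				RootCAs:      pool,
			},
		},
	}

	resp, err := client.Get(masterURL + "/userdata")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	data, err := fetchUserData("https://master-0:50001", "/var/rot/client.crt", "/var/rot/client.key", "/var/rot/ca.crt")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	os.Stdout.Write(data)
}
```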
The MasterConfiguration should have kubernetesVersion set to the same version in the build.
The way in which we are handling stdout and stderr in the process manager is blocking the process itself. The problem is that io.Pipe blocks until something reads and writes from it. This means that processes are blocked until a user accesses the process logs via the API.
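One way around the blocking behavior, sketched below: attach the process's stdout/stderr to a per-process log file instead of an io.Pipe, and have the API read from that file on demand, so the process never stalls waiting for a log reader. The file paths are illustrative.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// run starts a process with stdout/stderr attached to a log file rather than
// an io.Pipe, so the process never blocks when no one is reading its logs.
func run(logPath string, name string, args ...string) error {
	logFile, err := os.OpenFile(logPath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return err
	}
	defer logFile.Close()

	cmd := exec.Command(name, args...)
	cmd.Stdout = logFile
	cmd.Stderr = logFile

	return cmd.Run()
}

func main() {
	// The API can later stream /var/log/echo.log to the user at its own pace.
	if err := run("/var/log/echo.log", "echo", "hello"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```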
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test 1/1 Running 0 3m
$ kubectl exec -it test -- sh
error: Internal error occurred: error executing command in container: open /dev/ptmx: no such file or directory
Since #59 was merged, the following shows up in the kubelet logs:
E0502 13:37:38.488869 1386 remote_runtime.go:209] StartContainer "c9d4b9a2cf132d74f3d3d5ca3b36763f88fab8e7dcaf8d37bfbb84ba6cc7371e" from runtime service failed: rpc error: code = Unknown desc = failed to start container "c9d4b9a2cf132d74f3d3d5ca3b36763f88fab8e7dcaf8d37bfbb84ba6cc7371e": Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:258: applying cgroup configuration for process caused \\\"failed to set memory.kmem.limit_in_bytes, because either tasks have already joined this cgroup or it has children\\\"\"\n"
This breaks kubeadm init.
[ 20.240345] cgroup: kubelet (1209) created nested cgroup for controller "memory" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.
[ 20.242831] cgroup: "memory" requires setting use_hierarchy to 1 on the root
The tasks outlined in the official HA guide can be automated. I think Option 2 is the best route to go, at least to start with. We can always change this later, or even add support for both. So that leaves us with having to:

1. Distribute the PKI directory from master0, generated by kubeadm init, to the other master nodes.
2. Load balance the API servers across the master nodes.
3. Add master nodes to the load balancer.
4. Update kube-proxy.
For 1, we can add an RPC call. We can use either a push or pull model. In the pull model, master nodes, excluding master0, would request the PKI directory upon initialization. This would require us to verify that requesters of the PKI directory are authorized to do so. The alternative is to push the PKI directory to other master nodes when they come up. This would require that a well-known set of machines participate as master nodes. I think the push model is better, as we can tie it into the notion of a RoT we have established already.
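If we go with the push model, the sending side might look something like the sketch below: pack the kubeadm-generated PKI directory into a tar archive that can be shipped over whatever RPC we add. The transport itself is left out; /etc/kubernetes/pki is kubeadm's default location.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// packPKI walks the PKI directory and returns it as a tar archive that could
// be pushed to the other master nodes over an RPC.
func packPKI(dir string) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)

	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		rel, err := filepath.Rel(dir, path)
		if err != nil {
			return err
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		hdr, err := tar.FileInfoHeader(info, "")
		if err != nil {
			return err
		}
		hdr.Name = rel
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		_, err = io.Copy(tw, f)
		return err
	})
	if err != nil {
		return nil, err
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	archive, err := packPKI("/etc/kubernetes/pki")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("packed %d bytes of PKI material\n", len(archive))
}
```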
For 2, we can deploy an ingress controller that runs only on the master nodes. This model offers a generic solution that can handle cloud and bare-metal deployments. The only issue then becomes the management of the DNS record pointing to the master nodes. How should we add/remove healthy/unhealthy nodes to/from the list?
For 3, this ties in to 2. Whatever we decide in 2, we would need to be sure to add the node to the load balancer.
For 4, we can use the admin.conf on master0 to update kube-proxy.
Instead of discovering the platform, we should require explicit flags from a user.
The ability to debug kernel-level issues is important. We should provide the contents of /proc/kmsg via the API.
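A sketch of how the contents could be surfaced: /proc/kmsg is a stream that blocks until new messages arrive, so a reader goroutine copies lines out as they appear. How the lines are then exposed over the API is left out.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// streamKmsg reads kernel messages from /proc/kmsg and sends each line on
// the returned channel. Reading /proc/kmsg requires root and blocks until
// new messages are written, which suits a long-lived streaming API.
func streamKmsg() (<-chan string, error) {
	f, err := os.Open("/proc/kmsg")
	if err != nil {
		return nil, err
	}

	out := make(chan string)
	go func() {
		defer f.Close()
		defer close(out)
		scanner := bufio.NewScanner(f)
		for scanner.Scan() {
			out <- scanner.Text()
		}
	}()
	return out, nil
}

func main() {
	lines, err := streamKmsg()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for line := range lines {
		fmt.Println(line)
	}
}
```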
kubectl port-forward etcd-master-0 2379:2379 -n kube-system
Forwarding from 127.0.0.1:2379 -> 2379
curl localhost:2379/v2
curl: (52) Empty reply from server
kubectl port-forward etcd-master-0 2379:2379 -n kube-system
Forwarding from 127.0.0.1:2379 -> 2379
Handling connection for 2379
E0402 21:26:31.026140 78292 portforward.go:331] an error occurred forwarding 2379 -> 2379: error forwarding port 2379 to pod 67354788607ba8b9caad4f69c7a4256581b46655281b6fd182ec0fd21ef0591c, uid : exit status 1: 2018/04/03 04:26:29 socat[11867] E getaddrinfo("localhost", "NULL", {1,2,1,6}, {}): Name or service not known
A CLI would be useful for things like:
and other features as this project grows. It should probably be gRPC based after #19 is implemented.
Using either the standard library or something like logrus, we should provide more robust logging.
Since #64 was merged, we now enforce security without the possibility of opting out. CSRs need to be automated for nodes as they come up. One way to achieve this would be to define a CustomResourceDefinition (or more than one) along with a controller that would automate the CSR workflow.
There are a couple of ways we can implement this:
Option one has security risks and does not align with our goals.
EDIT: Option one was chosen, and option two will be implemented in time.
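For reference, a rough sketch of the approval step such a controller would perform with a recent client-go; everything around it (identifying which CSRs belong to bootstrapping nodes, the CRD/controller plumbing) is elided, and the reason/message strings are illustrative.

```go
package main

import (
	"context"
	"log"

	certsv1 "k8s.io/api/certificates/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// approveCSR appends an Approved condition and submits it via the approval
// subresource. Deciding *whether* a CSR should be approved is the hard part
// and is intentionally left out of this sketch.
func approveCSR(ctx context.Context, client kubernetes.Interface, csr *certsv1.CertificateSigningRequest) error {
	csr.Status.Conditions = append(csr.Status.Conditions, certsv1.CertificateSigningRequestCondition{
		Type:    certsv1.CertificateApproved,
		Status:  corev1.ConditionTrue,
		Reason:  "AutoApproved",
		Message: "approved by the node bootstrap controller",
	})
	_, err := client.CertificatesV1().CertificateSigningRequests().UpdateApproval(ctx, csr.Name, csr, metav1.UpdateOptions{})
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()
	csrs, err := client.CertificatesV1().CertificateSigningRequests().List(ctx, metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	// A real controller would watch CSRs and verify the requester before approving.
	for i := range csrs.Items {
		if err := approveCSR(ctx, client, &csrs.Items[i]); err != nil {
			log.Printf("approve %s: %v", csrs.Items[i].Name, err)
		}
	}
}
```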
In the build we should docker save the control plane images, and perhaps the essential addons, and then docker load them in init.
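A sketch of the two halves driven from Go with os/exec; the image names and tarball path are illustrative.

```go
package main

import (
	"log"
	"os/exec"
)

// saveImages runs at build time: export the control plane images to a tarball
// that gets baked into the image.
func saveImages(tarball string, images ...string) error {
	args := append([]string{"save", "-o", tarball}, images...)
	return exec.Command("docker", args...).Run()
}

// loadImages runs in init at boot: import the tarball so the kubelet never
// has to pull the control plane images over the network.
func loadImages(tarball string) error {
	return exec.Command("docker", "load", "-i", tarball).Run()
}

func main() {
	if err := saveImages("/tmp/control-plane.tar", "k8s.gcr.io/kube-apiserver:v1.11.0"); err != nil {
		log.Fatal(err)
	}
	if err := loadImages("/tmp/control-plane.tar"); err != nil {
		log.Fatal(err)
	}
}
```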
I0705 03:19:00.863140 1170 kernel_validator.go:96] Validating kernel config
[WARNING Service-Kubelet]: no supported init system detected, skipping checking for services
[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh] or no builtin kernel ipvs support: map[ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{} ip_vs:{}]
you can solve this problem with following methods:
1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support
When the kubelet or docker daemon dies, we should restart it.
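A sketch of a simple supervision loop for this; the restart delay and the command lines are illustrative.

```go
package main

import (
	"log"
	"os/exec"
	"time"
)

// supervise runs a command and restarts it whenever it exits, with a small
// delay to avoid a tight crash loop.
func supervise(name string, args ...string) {
	for {
		cmd := exec.Command(name, args...)
		if err := cmd.Run(); err != nil {
			log.Printf("%s exited: %v", name, err)
		} else {
			log.Printf("%s exited cleanly", name)
		}
		time.Sleep(5 * time.Second)
	}
}

func main() {
	go supervise("dockerd")
	supervise("kubelet", "--config", "/etc/kubelet.yaml")
}
```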
Users will need a way to get the logs of the processes running on the host.
Right now we require that a user specify nameservers in the user data. We should discover the DHCP-assigned nameservers and add them in addition to what is in the user data.
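A sketch of the merge itself, assuming the DHCP-assigned nameservers have already been obtained (for example, parsed out of the DHCP lease) and only need to be combined with the user data entries without duplicates.

```go
package main

import "fmt"

// mergeNameservers combines the user data nameservers with the DHCP-assigned
// ones, preserving order and dropping duplicates. User data entries keep
// priority; DHCP entries are appended after them.
func mergeNameservers(fromUserData, fromDHCP []string) []string {
	seen := make(map[string]bool)
	var merged []string
	for _, list := range [][]string{fromUserData, fromDHCP} {
		for _, ns := range list {
			if !seen[ns] {
				seen[ns] = true
				merged = append(merged, ns)
			}
		}
	}
	return merged
}

func main() {
	fmt.Println(mergeNameservers([]string{"1.1.1.1", "8.8.8.8"}, []string{"192.168.1.1", "8.8.8.8"}))
}
```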