rancher / rke2

License: Apache License 2.0

Dockerfile 1.04% Makefile 0.94% Go 64.87% Shell 18.26% Python 1.31% PowerShell 11.81% Ruby 0.99% HTML 0.78%

rke2's Issues

CPU usage 100%

Hi! I am using

curl https://raw.githubusercontent.com/rancher/rke2/master/install.sh | INSTALL_RKE2_VERSION=v0.0.1-alpha.5 sh -

command to start a single-node cluster on an Ubuntu 20.04 VM (1 CPU, 2 GB RAM).

When the canal Helm chart is deployed, CPU usage is 100% and the logs are full of

Jul 03 13:32:49 rke2-1 systemd-udevd[3692]: calico_tmp_A: Failed to get link config: No such device
Jul 03 13:32:49 rke2-1 networkd-dispatcher[697]: ERROR:Unknown interface index 150 seen even after reload
Jul 03 13:32:49 rke2-1 networkd-dispatcher[697]: WARNING:Unknown index 151 seen, reloading interface list
Jul 03 13:32:49 rke2-1 systemd-udevd[3698]: calico_tmp_B: Failed to get link config: No such device
Jul 03 13:32:49 rke2-1 networkd-dispatcher[697]: ERROR:Unknown interface index 151 seen even after reload
Jul 03 13:32:49 rke2-1 networkd-dispatcher[697]: WARNING:Unknown index 151 seen, reloading interface list
Jul 03 13:32:49 rke2-1 systemd-networkd[569]: calico_tmp_B: Could not find device, waiting for device initialization: No such device
Jul 03 13:32:49 rke2-1 systemd-networkd[569]: calico_tmp_A: Could not find device, waiting for device initialization: No such device
Jul 03 13:32:49 rke2-1 networkd-dispatcher[697]: ERROR:Unknown interface index 151 seen even after reload
Jul 03 13:32:49 rke2-1 networkd-dispatcher[697]: WARNING:Unknown index 150 seen, reloading interface list
Jul 03 13:32:49 rke2-1 systemd-networkd[569]: calico_tmp_B: Could not find device, waiting for device initialization: No such device
Jul 03 13:32:49 rke2-1 systemd-networkd[569]: calico_tmp_A: Could not find device, waiting for device initialization: No such device
Jul 03 13:32:49 rke2-1 networkd-dispatcher[697]: ERROR:Unknown interface index 150 seen even after reload
Jul 03 13:32:49 rke2-1 networkd-dispatcher[697]: WARNING:Unknown index 152 seen, reloading interface list
Jul 03 13:32:49 rke2-1 systemd-networkd[569]: calico_tmp_B: Could not find device, waiting for device initialization: No such device
Jul 03 13:32:49 rke2-1 systemd-networkd[569]: calico_tmp_A: Could not find device, waiting for device initialization: No such device
Jul 03 13:32:49 rke2-1 networkd-dispatcher[697]: ERROR:Unknown interface index 152 seen even after reload
Jul 03 13:32:49 rke2-1 networkd-dispatcher[697]: WARNING:Unknown index 153 seen, reloading interface list
Jul 03 13:32:49 rke2-1 networkd-dispatcher[697]: ERROR:Unknown interface index 153 seen even after reload
Jul 03 13:32:49 rke2-1 networkd-dispatcher[697]: WARNING:Unknown index 153 seen, reloading interface list
Jul 03 13:32:49 rke2-1 systemd-udevd[3692]: calico_tmp_A: Failed to get link config: No such device
Jul 03 13:32:49 rke2-1 networkd-dispatcher[697]: ERROR:Unknown interface index 153 seen even after reload
Jul 03 13:32:49 rke2-1 networkd-dispatcher[697]: WARNING:Unknown index 152 seen, reloading interface list
Jul 03 13:32:50 rke2-1 systemd-udevd[3698]: calico_tmp_B: Failed to get link config: No such device

In the calico-node container's log I see

2020-07-03T13:43:51.219280859Z stdout F 2020-07-03 13:43:51.217 [INFO][25660] int_dataplane.go 1258: Applying XDP actions did not succeed, disabling XDP error=failed to resync: cannot find XDP object "/usr/lib/calico/bpf/filter.o"
2020-07-03T13:43:51.311974047Z stdout F 2020-07-03 13:43:51.298 [INFO][25660] int_dataplane.go 778: Linux interface addrs changed. addrs=set.mapSet{} ifaceName="calico_tmp_B"
2020-07-03T13:43:51.312013473Z stdout F 2020-07-03 13:43:51.300 [INFO][25660] int_dataplane.go 778: Linux interface addrs changed. addrs=set.mapSet{} ifaceName="calico_tmp_A"
2020-07-03T13:43:51.325926965Z stdout F 2020-07-03 13:43:51.325 [WARNING][25660] int_dataplane.go 981: failed to wipe the XDP state error=cannot find XDP object "/usr/lib/calico/bpf/filter.o" try=0
2020-07-03T13:43:51.510189347Z stdout F 2020-07-03 13:43:51.509 [WARNING][25660] int_dataplane.go 981: failed to wipe the XDP state error=cannot find XDP object "/usr/lib/calico/bpf/filter.o" try=1
2020-07-03T13:43:51.724108398Z stdout F 2020-07-03 13:43:51.718 [WARNING][25660] int_dataplane.go 981: failed to wipe the XDP state error=cannot find XDP object "/usr/lib/calico/bpf/filter.o" try=2
2020-07-03T13:43:51.927056232Z stdout F 2020-07-03 13:43:51.913 [WARNING][25660] int_dataplane.go 981: failed to wipe the XDP state error=cannot find XDP object "/usr/lib/calico/bpf/filter.o" try=3
2020-07-03T13:43:52.102314442Z stdout F 2020-07-03 13:43:52.097 [WARNING][25660] int_dataplane.go 981: failed to wipe the XDP state error=cannot find XDP object "/usr/lib/calico/bpf/filter.o" try=4

Running kubectl -n kube-system set env ds/canal -c calico-node FELIX_XDPENABLED=false and rebooting fixes the problem. It looks like the ranchertest/calico:v3.13.3 Docker image is missing the /usr/lib/calico/bpf/ directory:

docker run --rm ranchertest/calico:v3.13.3 ls -l /usr/lib/calico
ls: cannot access /usr/lib/calico: No such file or directory

Helm Chart for metrics-server

Migrate images from ranchertest to rancher docker hub repo

We need to migrate from ranchertest over to our rancher docker hub repo.

At build time scan for non-fips algorithms (or panic)

FIPS-140 does not permit some algorithms to be used. For example, MD5 may not be allowed.

We should determine a solution that either parses and scans through the Go code and alerts on an invalid algorithm, or halts the build process and panics when an invalid algorithm is detected.

This may just involve updating our shim or something to that effect, as we cannot touch the GoBoring library.
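A minimal sketch of the "scan and halt" option, assuming a plain text search over Go sources is good enough for a first pass. The package list in NON_FIPS_PKGS is an assumption (only MD5 is named above), and scan_non_fips is a hypothetical helper name, not part of the rke2 build.

```shell
# Packages treated as non-FIPS for this sketch. ASSUMPTION: only MD5 is named
# in the issue; rc4 and des are added here for illustration.
NON_FIPS_PKGS='crypto/md5|crypto/rc4|crypto/des'

# Scan .go files under a directory for quoted references to the packages above.
# Prints matching lines and returns 1 if any are found, 0 otherwise.
scan_non_fips() {
  dir=${1:-.}
  # -r recurse, -n show line numbers, -E extended regex, only *.go files.
  if grep -rnE --include='*.go' "\"(${NON_FIPS_PKGS})\"" "$dir"; then
    echo "non-FIPS algorithm import detected" >&2
    return 1
  fi
  return 0
}
```

A build step could call scan_non_fips . and stop on a non-zero exit status. A text search like this cannot see algorithms pulled in through dependencies, so it would complement rather than replace a shim-level check.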

CIS Benchmark

We will need to create a unique set of benchmarks for RKE2.
The hardening guide effort is tracked in #84

Issue from k3s-io/k3s#1504

Certified Image Pipeline

Epic covering building the Drone pipeline and images, including the following:

GoBoring compilation for FIPS compliance
UBI7 base image
Vulnerability scanning via Trivy
Support for multiple architectures
Leveraging multi-stage builds with common build image base
Template project for creating new certified image pipelines

Note: Rancher Federal team to take this and STIG these images. Then, via their own private repo/pipeline publish the STIG'ed images.

Related K3s issue: k3s-io/k3s#1503

Helm chart for Canal CNI

Autodetect binaries and bind-mount if needed

Autodetect binaries; if they are not available in the container, bind-mount to the host with chroot.

Rationale:
UBI8 seems to be missing many common binaries needed to get a Kubernetes cluster up and running. This is our workaround for this.
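The detection half of this could be sketched as below. missing_binaries is a hypothetical helper, and which binaries are actually required is not stated in the issue, so the names passed to it are up to the caller.

```shell
# Print the name of every requested binary that cannot be found on PATH.
# Anything printed is a candidate for the bind-mount/chroot fallback.
missing_binaries() {
  for bin in "$@"; do
    command -v "$bin" >/dev/null 2>&1 || echo "$bin"
  done
}
```

For example, missing_binaries mount iptables would list whichever of those two the container image lacks (the two names are only an illustration).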

Install rke2 using commit id is broken

Version:
rke2 v0.0.1-alpha.7

Describe the bug:
Install rke2 using a commit ID:

INSTALL_RKE2_COMMIT=<commit-id> ./install.sh

# INSTALL_RKE2_COMMIT=4ccaa37d20b38e7d95a1ccd577894d4689b36a84 ./install.sh
[INFO]  using commit 4ccaa37d20b38e7d95a1ccd577894d4689b36a84 as release
[INFO]  downloading hash https://storage.googleapis.com/rke2-ci-builds/rke2-4ccaa37d20b38e7d95a1ccd577894d4689b36a84.sha256sum
#
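To narrow down whether the artifact for a commit was ever published, one could probe the checksum URL directly. This is a sketch: the URL pattern is copied from the install output above, while ci_checksum_url and ci_build_exists are hypothetical helper names.

```shell
# Build the checksum URL that install.sh reports for a given commit.
ci_checksum_url() {
  echo "https://storage.googleapis.com/rke2-ci-builds/rke2-$1.sha256sum"
}

# Return success if the checksum object is reachable.
# -f fail on HTTP errors, -sS quiet but show errors, -I send a HEAD request.
ci_build_exists() {
  curl -fsSI "$(ci_checksum_url "$1")" >/dev/null
}
```

Running ci_build_exists 4ccaa37d20b38e7d95a1ccd577894d4689b36a84 || echo "no CI build for this commit" would separate a missing artifact from a bug in the install script itself.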

Create Helm chart for Nginx

Effort to get a helm chart for the nginx controller.
Nginx will not be FIPS-compiled. After much research this is a large effort and not feasible for MVP release.

Rancher Integration: Upgrades in Rancher Phase 2 - Config of nodes

Depends on #42 completion.

Add support for updating config of the nodes.

Helm Chart for CoreDNS

Consider CoreDNS autoscaler as well.

Helm chart for push proxy (support for Rancher monitoring v2)

Test New Kubelet argument --protect-kernel-defaults

A new argument has been added to kubelet that needs to be set to true to comply with CIS 1.5 requirements. The work to accomplish this was done in #87.

To test

grep protect /var/lib/rancher/rke2//logs/kubelet.log

This command should return a string with the argument and it being set to false.
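For a scripted check, the value can be pulled out of the log rather than read by eye. This sketch assumes the flag appears in the log as --protect-kernel-defaults=VALUE, with or without quotes around the value; the exact log format is an assumption, and protect_kernel_defaults_value is a hypothetical helper.

```shell
# Read a kubelet log on stdin and print the last value seen for
# --protect-kernel-defaults ("true" or "false"). Prints nothing if absent.
protect_kernel_defaults_value() {
  grep -oE -- '--protect-kernel-defaults="?(true|false)"?' \
    | tail -n 1 \
    | sed -E 's/.*=//; s/"//g'
}
```

It would be fed the same log file that the grep command above reads, and its output compared with the value expected for the mode under test.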

RKE2 Fails on Execution Looking for ETCD User in Non CIS Mode

Version:
rke2 version v0.0.1-alpha.6

Node OS:
Ubuntu 20.04

Issue:

When INSTALL_RKE2_CIS_MODE is set to true, installation is successful.
When INSTALL_RKE2_CIS_MODE is not passed to the install script, as below, installation fails with the message "unknown user etcd":
INSTALL_RKE2_VERSION=v0.0.1-alpha.6 ./install.sh

Jul 07 23:07:16 ip-172-31-15-215 systemd[1]: Failed to start Rancher Kubernetes Engine v2.
Jul 07 23:07:21 ip-172-31-15-215 systemd[1]: rke2.service: Scheduled restart job, restart counter is at 6.
Jul 07 23:07:21 ip-172-31-15-215 systemd[1]: Stopped Rancher Kubernetes Engine v2.
Jul 07 23:07:21 ip-172-31-15-215 systemd[1]: Starting Rancher Kubernetes Engine v2...
Jul 07 23:07:21 ip-172-31-15-215 rke2[2299]: time="2020-07-07T23:07:21Z" level=warning msg="not running in CIS 1.5 mode"
Jul 07 23:07:21 ip-172-31-15-215 rke2[2299]: time="2020-07-07T23:07:21Z" level=info msg="Starting rke2 v0.0.1-alpha.6 (HEAD)"
Jul 07 23:07:21 ip-172-31-15-215 rke2[2299]: time="2020-07-07T23:07:21Z" level=fatal msg="starting kubernetes: preparing server: start cluster and https: user: unknown user etcd"

Support for etcd snapshot and restore

Add support for RKE2 snapshot, backup, and restore ~~via Rancher~~ via CLI.

Must also take automatic snapshots periodically.
On by default
Controlled via config args
~~Supports S3 (need this capability similar to RKE1)~~ Not needed for 1.0
Restore functionality is folded into rke2 --cluster-reset: you restore by triggering a cluster reset with a restore path argument specified

RPM Support for RHEL7/CentOS 7

We should be building/publishing RPM's to https://rpm.rancher.io for consumption by EL systems.

TLS FIPS Support

Support for FIPS TLS

Rancher Integration: Recognize RKE2 cluster on import

Much like with K3s, add support for recognizing an imported RKE2 cluster.

Validating embedded etcd

RKE2 works only with the embedded etcd driver defined in the k3s repo (https://github.com/rancher/k3s/blob/master/pkg/etcd/etcd.go). The etcd driver is responsible for the following:

Registering the driver
Starting the controller (responsible for deleting nodes and updating etcd node information)
Starting the etcd node with a specific configuration; if the node is joining a cluster, it adds the etcd node as a member

Join etcd cluster:

To join the cluster you just need to start a new rke2 server and it will join the cluster automatically.

Remove etcd member

To remove an etcd member, remove the node from the cluster using kubectl:

kubectl delete node <node-name>

Reset cluster

In case of quorum loss, you can reset the cluster with the same data on the server by passing --cluster-reset to rke2. After it resets the cluster, remove the --cluster-reset flag and restart rke2.

Basic Test Scenarios

Test 1 (Join a new node)

start with 3 master nodes
add a new rke2 server node to cluster

expected

The new node should join the cluster and an etcd member should be added

verify

You can exec to any etcd pod running in kube-system and verify using the following command:

etcdctl --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --endpoints https://127.0.0.1:2379 --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt member list
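To turn the member list into a pass/fail check, the output can be counted. This sketch assumes etcdctl's default comma-separated output, where the second field is the member status (for example "started"); count_started_members is a hypothetical helper.

```shell
# Read `etcdctl member list` output on stdin and print how many members
# report the status "started". ASSUMPTION: default ", "-separated format
# with the status in the second field.
count_started_members() {
  awk -F', ' '$2 == "started" { n++ } END { print n + 0 }'
}
```

For Test 1, piping the member list through count_started_members should print 4 once the new server has joined; for Test 2 it should drop to 2.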

Test 2 (Remove a node)

start with 3 master nodes
remove a node using kubectl delete node

expected

The node should be removed from the k8s cluster and from the etcd cluster membership

verify

You can verify by exec-ing into any of the remaining etcd pods and running the following command to list the members:

etcdctl --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --endpoints https://127.0.0.1:2379 --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt member list

Test 3 (Reset cluster)

start 3 master nodes
shut down two master nodes
cluster should lose quorum
restart the first node with --cluster-reset
remove the failed nodes from the k8s cluster

Expected

cluster should come back up again

Test 4 (Restore functionality)

start 3 master nodes
shut down two nodes
cluster should lose quorum
start the two failed nodes

Expected

cluster should restore quorum and come back up again

Error while running kubectl while using install script method of installation.

Version:
Rke2 v0.0.1-alpha.4

Describe the issue:

1. rke2 kubectl errors with the message below instead of "command not found":
rke2 kubectl get nodes --kubeconfig=/etc/rancher/rke2/rke2.yaml
WARN[0000] not running in CIS 1.5 mode
No help topic for 'kubectl'

2. After installing kubectl, the default kubeconfig path needs to be passed explicitly.

kubectl get nodes --kubeconfig=/etc/rancher/rke2/rke2.yaml
NAME STATUS ROLES AGE VERSION
ip-172-31-13-222 Ready etcd,master 9m9s v1.18.4
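As a workaround until a default is wired up, kubectl's standard KUBECONFIG environment variable can carry the path, so the flag does not have to be repeated on every call. The path is the one used in the command above.

```shell
# Point kubectl at the rke2-generated kubeconfig for the current shell session.
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
```

After this, kubectl get nodes in the same shell reads that file without --kubeconfig.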

Need for node-external-ip flag for rke2

As of rke2 version v0.0.1-alpha.6, rke2 does not have a node-external-ip flag.
Ingress creation fails while importing rke2 into Rancher.

UNdisable default docker seccomp

Just going to pull directly from the tweet that brought it up
https://twitter.com/uncontainer/status/1194987600185974786?s=20

By default, #Kubernetes disables the Docker default Seccomp profile that jessfraz worked so hard on.

Several K8s cloud providers don’t override that setting, making their containers completely insecure by default, requiring pod level config.

Stress Testing RKE2

We should do some load/stress testing of RKE2 as there may be a performance impact due to implemented crypto. It would be a good idea to do the following:

Test with upstream tests (similar to what Hussein has done in the past with K3s)
Test with Rancher with many clusters (harsher test, similar to what Dan has done in the past)

While testing consider that we may have a performance impact compared to k3s tests we have done. We should test when we have an alpha available and try to complete this within a couple of weeks well before beta release.

RHEL8 Support

This is a high level task that encompasses the work required to support RHEL8. It expands on work done as a part of #2

Autodetect binaries, if they are not available in the container bind-mount to host w/ chroot.

[Documentation] CIS Hardening Guide

Separated out from original issue #1

CIS Hardening is complete. Next step here is to create a guide with details.

--cluster-reset not mentioned in help

Version:
(all)
latest validated on version v0.0.1-alpha.6

Issue:
There is no mention of --cluster-reset when using rke2 server --help

RKE2 SELinux Support

This tracks any work that may be required to get SELinux support into RKE2.

When second node is added console gets flooded

Version:
rke2 v0.0.1-alpha.4
Ubuntu 20.04

Issue:
The console gets flooded with the messages below. The node is added successfully.

To reproduce:

Install node1 rke2 server
Join node2 passing server and token

Additional info:

ERRO[1700] Failed to connect to proxy                    error="dial tcp 172.31.33.122:9345: connect: connection refused"
ERRO[1700] Remotedialer proxy error                      error="dial tcp 172.31.33.122:9345: connect: connection refused"
INFO[1705] Connecting to proxy                           url="wss://172.31.33.122:9345/v1-rke2/connect"
ERRO[1705] Failed to connect to proxy                    error="dial tcp 172.31.33.122:9345: connect: connection refused"
ERRO[1705] Remotedialer proxy error                      error="dial tcp 172.31.33.122:9345: connect: connection refused"
INFO[1710] Connecting to proxy                           url="wss://172.31.33.122:9345/v1-rke2/connect"

nginx-ingress-controller service is in pending state

Version:

rke2 v0.0.1-alpha.4

Issue:
nginx-ingress-controller service is in pending state. Since we don't have servicelb, it should not be expected to be of type LoadBalancer.

kube-system   nginx-ingress-controller        LoadBalancer   10.43.5.161     <pending>     80:30782/TCP,443:30488/TCP   4h32m

Rancher Integration: Upgrades in Rancher

Standalone upgrade controller integration and integration with Rancher.
Base off of rancher/system-upgrade-controller framework.

Config of nodes: #43 (to be done later)

Config file support for RKE2

Work should start on k3s first, then port this into RKE2. k3s-io/k3s#1505

Add support for flat config file which specifies flags to run binary with. Please reference the K3s issue for details. This issue is for tracking the work to port this over to RKE2

Helm chart for cloud-controller-manager

We might close this later in favor of #142. We decided to provide flags to the user that allow them to set up cloud providers.
However, it's possible we may need this later, such as for the VMware external CCM.

We'll need to have this sorted out before we make a call on if this issue needs to be closed or not.

Support for other linux distros and versions

RKE2 version:
v0.0.1-alpha.4

Node OS:
Ubuntu 18.04

Describe the issue:
rke2 installation fails on Ubuntu 18.04.

Logs:

Using the rke2 binary, I now see this error: rke2: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by rke2)
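A quick pre-flight check for this condition might look like the sketch below. The required version 2.28 comes from the error message; version_ge and host_glibc_version are hypothetical helpers, and sort -V assumes GNU coreutils.

```shell
# Return success when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the host glibc version. getconf prints e.g. "glibc 2.27".
host_glibc_version() {
  getconf GNU_LIBC_VERSION | awk '{ print $2 }'
}
```

An install script could run version_ge "$(host_glibc_version)" 2.28 || echo "glibc is older than 2.28" before placing the binary, which would turn the loader error into a clear message.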

Logs are flooded with TLS handshake error messages.

RKE2 version:
v0.0.1-alpha.4

Describe the issue:

Run rke2 from the binary using the command
rke2 server 2>&1 &
Notice the logs being flooded with the TLS handshake error with the IP of the load balancer.

I0629 22:43:19.270728    1995 log.go:181] http: TLS handshake error from 172.31.7.168:48004: EOF
I0629 22:43:19.813270    1995 log.go:181] http: TLS handshake error from 172.31.7.168:46580: EOF
I0629 22:43:20.260827    1995 log.go:181] http: TLS handshake error from 172.31.7.168:60068: EOF
I0629 22:43:20.265857    1995 log.go:181] http: TLS handshake error from 172.31.7.168:41756: EOF
I0629 22:43:20.572638    1995 log.go:181] http: TLS handshake error from 172.31.7.168:63126: EOF
I0629 22:43:20.868685    1995 log.go:181] http: TLS handshake error from 172.31.7.168:48299: EOF
I0629 22:43:21.174996    1995 log.go:181] http: TLS handshake error from 172.31.7.168:28564: EOF
I0629 22:43:22.825398    1995 log.go:181] http: TLS handshake error from 172.31.7.168:24861: EOF

A health care use case: feedback on RKE

Hello!

I'm excited about the future of RKE, though the current version does not yet fit into our use case.
I work at a French state hospital called APHP, for "Assistance Publique - Hôpitaux de Paris".

We are a little lightweight on system administration and development resources, so the ease of use of RKE was a great fit for us. Support for air-gapped installations, which is also really important for us, is there, so that's good too. We run behind an HTTP proxy for anything that goes outside, and our base systems run CentOS, FYI.

The current project is about making environments available for remote computation such as JupyterHub with strict confidentiality requirements.

We have users with different rights to a big data warehouse, so they must not step onto each other's permissions and access data they're not allowed to.

So we determined that Kubernetes orchestration mechanisms were great for resource management and fast to spin up and down as well but the isolation between users in their own pods is insufficient.

We have seen that the Kubernetes ecosystem is currently evolving towards better security, mainly with efforts driven by Red Hat, Google and Intel/OpenStack.

Red Hat is pushing unprivileged containers with podman, buildah, etc.
Google is pushing gVisor.
Intel/OpenStack is pushing Kata Containers.

And where's RKE in all this? Well RKE depends on Docker so it can't use gVisor, Kata Containers or any other custom runtime such as cri-o.

So here's me saying that I'd really love if RKE2 could support non-Docker deployments while keeping the ease of use!

Thanks a lot for the awesome work!

By the way, I wish we could fund you in some way, but the process that leads to such a thing is complicated; if I have a working prototype, it'll be easier for me to justify it to the people who can do it. Again, it's not your responsibility to drive us to a working prototype, but know that with my rather limited knowledge of Golang (I'm a fast learner), I'm happy to help in any way I can.

And by the way x2, I'm currently experimenting with k3s with multi master HA deployments with dqlite but it's not quite there yet. I could also get Kata Containers running with k3s so that's good!

Leo

Customize aws profile in provider.

A few issues found during the initial install of rke2:

The AWS profile is hardcoded; I had to manually adjust the tf files to set the correct profile.

provider "aws" {
  region  = "us-east-2"
  profile = "rancher-eng"
}

With Terraform 0.12.26, a warning is shown for the interpolation syntax below. Terraform 0.11 and earlier accept this syntax.

Warning: Interpolation-only expressions are deprecated

  on main.tf line 238, in resource "aws_lb_target_group_attachment" "rke2-nlb-attachement":
 238:   target_group_arn = "${aws_lb_target_group.rke2-master-nlb-tg.arn}"
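In Terraform 0.12 the warning goes away when the expression is written without the surrounding "${ }". A rough sketch of an automated rewrite follows; strip_interpolation_only is a hypothetical helper, and it only touches strings that consist of exactly one interpolation with no nested quotes or braces.

```shell
# Read HCL on stdin and rewrite interpolation-only strings such as
# "${aws_lb_target_group.x.arn}" to the bare expression.
# Strings that mix text and interpolation are left unchanged.
strip_interpolation_only() {
  sed -E 's/"\$\{([^}"]+)\}"/\1/g'
}
```

Running strip_interpolation_only < main.tf and reviewing the diff would show line 238 become target_group_arn = aws_lb_target_group.rke2-master-nlb-tg.arn.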

The build process fails while generating the kubeconfig; this could be due to reuse of an IP for the EC2 instance.

Are you sure you want to continue connecting (yes/no/[fingerprint])? 
null_resource.get-kubeconfig (local-exec): Host key verification failed.

Second master node not able to join the master node

Version:
Rke v0.0.1-alpha.4

Describe the bug:
Install first node

INSTALL_RKE2_VERSION=v0.0.1-alpha.4 ./install.sh

Join second master. The node is available but not joined to the master.

INSTALL_RKE2_VERSION=v0.0.1-alpha.4 INSTALL_RKE2_EXEC='server' RKE2_URL='MasterIP:9345' RKE2_TOKEN='<TOKEN>' ./install.sh

Logs:

Jun 30 23:56:22 ip-172-31-1-120 rke2[448714]: time="2020-06-30T23:56:22Z" level=info msg="Shutting down /v1, Kind=Node workers"
Jun 30 23:56:22 ip-172-31-1-120 rke2[448714]: time="2020-06-30T23:56:22Z" level=info msg="Shutting down /v1, Kind=Secret workers"
Jun 30 23:56:22 ip-172-31-1-120 rke2[448714]: time="2020-06-30T23:56:22Z" level=fatal msg="server stopped: http: Server closed"

Integrate Helm charts for addons to rke2

As per discussion with @ibuildthecloud:

rke2 will integrate Helm charts as CR manifests in the manifest directory. However, since rke2 uses a different supervisor port, the helm controller will not be able to download the charts, so the following changes will be added:

helm controller will have an added spec to the crd called chartContent
rke2 build process will download the tgz charts and add it to the yaml manifest as helmChart CRs
helm controller will create configmap as the chartContent
klipper helm job will mount this configmap as a chart tgz and it will decode it
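The build-time step in that list could be sketched roughly as follows. Base64 encoding is an assumption (the text only says the content is decoded later), as are the kube-system namespace and the render_helmchart helper name; helm.cattle.io/v1 is the API group of the existing HelmChart CRD.

```shell
# Emit a HelmChart manifest that embeds a chart tarball inline.
# $1: chart name, $2: path to the chart .tgz
render_helmchart() {
  name=$1
  tgz=$2
  # ASSUMPTION: the tarball is embedded as a single-line base64 string.
  content=$(base64 < "$tgz" | tr -d '\n')
  cat <<EOF
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: ${name}
  namespace: kube-system
spec:
  chartContent: ${content}
EOF
}
```

The build could redirect the output of render_helmchart into a YAML file in the manifest directory, so no chart download is needed at runtime.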

Go FIPS Support

Review Go's use of BoringCrypto. Determine what needs to be done to get a FIPS-compliant Go build going.

Support for Centos and RHEL

Node OS: Centos, RHEL

Issue:
RPMs are not available, as mentioned in issue #49.

Additional info:

rpm -i https://rpm.rancher.io/rke2-selinux-0.1.1-rc1.el7.noarch.rpm
curl: (22) The requested URL returned error: 404 Not Found
error: skipping https://rpm.rancher.io/rke2-selinux-0.1.1-rc1.el7.noarch.rpm - transfer failed

Test image pull secrets

Based on 6/10/20 call with Rancher Federal team, there was some concern that image pull secrets do not work with containerd. We believe this is not the case but proposed to have QA briefly check this area to verify.

Test INSTALL_RKE2_CIS_MODE Functionality in Install Script

Functionality introduced in PR #58 needs to be tested and verified.

Verify the etcd user has been created:

grep etcd /etc/passwd

Verify kernel parameters have been updated, run the commands below:

sysctl vm.panic_on_oom
sysctl kernel.panic
sysctl kernel.panic_on_oops
sysctl kernel.keys.root_maxbytes

Expected values:

vm.panic_on_oom=0
kernel.panic=10
kernel.panic_on_oops=1
kernel.keys.root_maxbytes=25000000
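The four checks can be run in one pass. This is a sketch: the expected values are the ones listed above, while expected_sysctls, read_sysctl and verify_sysctls are hypothetical helper names.

```shell
# Expected kernel parameters, as listed in this issue.
expected_sysctls() {
  cat <<'EOF'
vm.panic_on_oom=0
kernel.panic=10
kernel.panic_on_oops=1
kernel.keys.root_maxbytes=25000000
EOF
}

# Read the live value of one parameter (kept separate so it can be stubbed).
read_sysctl() {
  sysctl -n "$1" 2>/dev/null
}

# Read key=value lines on stdin, compare each with the live value,
# print one OK/FAIL line per key, and return 1 if any key differs.
verify_sysctls() {
  rc=0
  while IFS='=' read -r key want; do
    got=$(read_sysctl "$key")
    if [ "$got" = "$want" ]; then
      echo "OK   $key=$got"
    else
      echo "FAIL $key: want $want, got ${got:-<unset>}"
      rc=1
    fi
  done
  return $rc
}
```

On a node installed with INSTALL_RKE2_CIS_MODE=true, expected_sysctls | verify_sysctls should print four OK lines and exit with status 0.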

Verify secrets-encryption Flag Set to True

PR #32 sets the flag "secrets-encryption" to true by default and passes it down to k3s. The same tests that verify this flag in k3s can be used for rke2.

rke2-uninstall.sh does not remove etcd user

Version:
version v0.0.1-alpha.6

Issue:
etcd user persists after running rke2-uninstall.sh, thus failing re-install of rke2 on the same node.

rke2 -v
-bash: /usr/local/bin/rke2: No such file or directory
cat /etc/passwd|grep etcd
etcd:x:997:997:ETCD Service User:/var/lib/rancher/rke2:/usr/sbin/nologin
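Until the uninstall script handles this, a manual cleanup could look like the sketch below. Both helper names are hypothetical, and whether rke2-uninstall.sh should remove the user at all is a design decision this issue leaves open.

```shell
# Return success if an exact user name exists in a passwd-format file
# (defaults to /etc/passwd). Matches the whole name, not a substring.
user_in_passwd() {
  awk -F: -v u="$1" '$1 == u { found = 1 } END { exit !found }' "${2:-/etc/passwd}"
}

# Hypothetical cleanup step: delete a leftover etcd user. Must run as root.
remove_leftover_etcd_user() {
  if user_in_passwd etcd; then
    userdel etcd
  fi
}
```

Matching on the first passwd field avoids the false positives that grep etcd can produce for unrelated accounts whose names or comments contain "etcd".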

Support for Rancher Logging v2

Make sure static pods go into the default logging v2 and also the supervisor process log.
Basically we need to ensure all logs can get into Rancher log v2.

RPM Installer

Based on recent discussions with the Rancher Federal team and Will, a full RPM installer is a must for MVP.

If a supplemental RPM is needed such as for SELinux policy this is okay. Best case one RPM does everything (is this possible?)

Is waiting on internal eio issue #36 to be completed.

Test and Verify Etcd Runs as Etcd User

The functionality introduced in PR #56 needs to be validated. This work is in conjunction with the install script for adding CIS mode.

This can be done by:

./rke2 --profile=cis-1.5 server

If it runs, it thinks it succeeded.

To get the etcd process, run the command below.

ps aux | grep etcd

Check the pod manifest for a security context section that has the etcd user ID and group ID. Those IDs can be cross-referenced against the output of cat /etc/passwd | grep etcd. To see the manifest:

cat /var/lib/rancher/rke2/agent/pod-manifests/etcd.yaml
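The comparison between /etc/passwd and the manifest can be scripted. This sketch assumes the manifest expresses the IDs with the standard Kubernetes securityContext fields runAsUser and runAsGroup; the issue does not show the manifest, so that is an assumption, and all three helper names are hypothetical.

```shell
# Print "uid gid" for a user from a passwd-format file.
passwd_ids() {
  awk -F: -v u="$1" '$1 == u { print $3, $4 }' "$2"
}

# Print "runAsUser runAsGroup" using the first occurrence of each field
# in a manifest. ASSUMPTION: the manifest uses these two field names.
manifest_ids() {
  awk '$1 == "runAsUser:" && u == "" { u = $2 }
       $1 == "runAsGroup:" && g == "" { g = $2 }
       END { print u, g }' "$1"
}

# Return success when the etcd user's uid/gid match the manifest.
# $1: passwd file, $2: pod manifest
etcd_ids_match() {
  ids=$(passwd_ids etcd "$1")
  [ -n "$ids" ] && [ "$ids" = "$(manifest_ids "$2")" ]
}
```

Calling etcd_ids_match /etc/passwd /var/lib/rancher/rke2/agent/pod-manifests/etcd.yaml gives a pass/fail result for this part of the test.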

Helm Chart for Kube Proxy

Support Install Feature of using commit id

Installation using commit id fails at download

INSTALL_RKE2_COMMIT=1dd8d99d86daac97b2cf2a060288c86e0059e7b6 ./install.sh 
[INFO]  using commit 1dd8d99d86daac97b2cf2a060288c86e0059e7b6 as release
[INFO]  downloading hash https://storage.googleapis.com/rke2-ci-builds/rke2-1dd8d99d86daac97b2cf2a060288c86e0059e7b6.sha256sum
root@ip-172-31-4-195:~#
