rancher / rke2-docs

License: Apache License 2.0

JavaScript 31.50% CSS 17.22% Shell 21.91% Python 29.38%

rke2-docs's Issues

CNCF url is incorrect

Hey there,

I've just seen that the URL to the RKE2 CNCF card in the Introduction ("It is a fully conformant Kubernetes distribution ...") is incorrect (Error 404):

https://github.com/rancher/rke2-docs/blob/main/docs/introduction.md?plain=1#L11

You may want to use e.g. https://landscape.cncf.io/?view-mode=card&item=platform--certified-kubernetes-distribution--rke-government as a replacement.

favicon not loading although present in the repo

Just noticed today while using the site that the favicon isn't loading into the page despite being present in the assets repository.

Document default server config values

Many of the settings in server_config.md have no documented default value.

Would be amazing if this could be improved! I would be happy to help, but I don't know where to look to identify the default values.

Need to document steps for manual CA certificate generation / rotation

We need to document the steps to manually pre-generate CA certs + keys, or to extend the expiration on existing certs. Ideally we would also output the CA hash in a format compatible with the --token arg ("K10" + SHA256 sum of cert bytes).

We don't currently have any documentation on how to do this for either k3s or rke2, but there are some good starting points at:

One important note is that the hash we are using is a SHA256 sum of the server's certificate, NOT its public key. This means that any changes to the cert, including extending its expiration, will change the hash and require passing a new --token to joining clients if they are to trust it.
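
A minimal Python sketch of that hash format, taken only from the description above ("K10" + SHA256 sum of cert bytes). The helper name `ca_token_prefix` is hypothetical, the hex encoding of the digest is an assumption, and the byte strings stand in for the contents of a real CA certificate file:

```python
import hashlib


def ca_token_prefix(cert_bytes: bytes) -> str:
    """Return "K10" followed by the hex SHA256 digest of the certificate bytes.

    Assumption: the digest is hex-encoded and computed over the bytes as given.
    """
    return "K10" + hashlib.sha256(cert_bytes).hexdigest()


# Stand-in byte strings; a real run would read the CA certificate file instead.
original = b"example certificate bytes"
reissued = b"example certificate bytes with a later expiration"

print(ca_token_prefix(original))
# Any change to the certificate bytes produces a different value,
# which is why a reissued cert needs a new --token on joining clients.
print(ca_token_prefix(original) != ca_token_prefix(reissued))
```

Because the digest covers the whole certificate rather than the public key, the second line printed is `True` even though the two inputs differ only in their trailing bytes.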

gz#13930
gz#15095
gz#16112

How to disable rke2 snapshot controller charts

Since Kubernetes version 1.25 there are 3 new charts for the snapshot-controller. Our docs have not been updated and still list only canal, coredns, ingress and metrics-server; it would be nice to include these charts as well.

Describe config.yaml syntax more prominently

A significant number of the different configuration options are described in their respective documents only in terms of command-line argument passed to the rke2 binary, significantly burying the lede on being able to configure the server using the same keys and values in config.yaml.

Given that the quickstart install directions lead you to a configuration where changing the server's command line arguments is rather not straightforward, I'd wager that the config.yaml method is probably far more useful to most people following those directions, and information on how to do this needs to be a lot more prominent than it currently is, buried between several other less-relevant sections.

If nothing else, the section describing the equivalence between command-line arguments and config.yaml keys should be made a lot more prominent and discoverable. But I'd even argue that the config.yaml syntax should probably be the primary way of describing the configuration options themselves, with the command-line syntax only secondary.

For instance, take this paragraph:

To enable Multus, pass multus as the first value to the --cni flag, followed by the name of the plugin you want to use alongside Multus (or none if you will provide your own default plugin). Note that multus must always be in the first position of the list. For example, to use Multus with canal as the default plugin you could specify --cni=multus,canal or --cni=multus --cni=canal.

IMO this would probably be more useful if paragraphs like that were rewritten something along the lines of:

To enable Multus, add multus as the first list entry in the cni config key, followed by the name of the plugin you want to use alongside Multus (or none if you will provide your own default plugin). Note that multus must always be in the first position of the list. For example, to use Multus with canal as the default plugin you could specify:
# /etc/rancher/rke2/config.yaml
---
cni:
- multus
- canal
This can also be specified with command-line arguments, i.e. --cni=multus,canal or --cni=multus --cni=canal.
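
The equivalence between the two syntaxes can be sketched in a few lines of Python. This is an illustration only, not how RKE2 parses its arguments: `args_to_config` and `LIST_KEYS` are hypothetical names, and only `cni` is treated as list-valued here.

```python
from typing import Dict, List, Union

ConfigValue = Union[str, bool, List[str]]

# Hypothetical: flags that this sketch treats as lists in config.yaml.
LIST_KEYS = {"cni"}


def args_to_config(args: List[str]) -> Dict[str, ConfigValue]:
    """Map --key=value arguments onto config.yaml style keys.

    For list-valued keys, repeated flags and comma-separated values
    are merged into one list, preserving order.
    """
    config: Dict[str, ConfigValue] = {}
    for arg in args:
        if not arg.startswith("--"):
            raise ValueError(f"expected a --flag, got {arg!r}")
        key, sep, value = arg[2:].partition("=")
        if not sep:
            config[key] = True  # bare flag without a value
        elif key in LIST_KEYS:
            config.setdefault(key, []).extend(value.split(","))
        else:
            config[key] = value
    return config


print(args_to_config(["--cni=multus,canal"]))
print(args_to_config(["--cni=multus", "--cni=canal"]))
```

Both calls produce the same mapping, with `multus` first in the `cni` list, which mirrors the config.yaml example above.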

That paragraph above is actually specifically the one I just spent two days stumped on before finally figuring it out.

Just to describe my own experience here:
I had RKE2 set up and installed from the quickstart guide. I knew that I needed Multus for my use-case, and was able to find the Using Multus section, and therefore "knew" that I somehow needed to get --cni=multus,canal passed as a command-line argument... and then spent the next two days going down dead ends, reinstalling RKE2 half a dozen times, even just reading the RKE2 source code (e.g. to see if there was some undocumented option that would get the https://get.rke2.io installer script to append --cni=multus,canal to the ExecStart line in the rke2-server.service systemd unit file it created). It wasn't until near the end of day 2 combing through the docs and reading every bit on every page that I finally found what I needed with the config.yaml file.

STIG default umask of 077 and CIS-1.23 prevents etcd from starting

Something to consider for the "Known Issues" or the "Hardening Guide": as discovered in rancher/rke2#4313, when running with the cis-1.23 profile in a STIG'd RHEL environment, you must set the umask to 022 before running any rke2 binary command (rke2 server, rke2 server --cluster-reset, etc.). It is also suggested to set the umask to 022 in the systemd unit file.
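
For wrapper scripts that launch such commands, one way to apply the umask is sketched below in Python. The helper name `run_with_umask` is hypothetical, and the demonstration runs a harmless `sh -c umask` instead of an rke2 command, so it only shows that the child process inherits the requested mask:

```python
import os
import subprocess
from typing import List


def run_with_umask(cmd: List[str], mask: int = 0o022) -> subprocess.CompletedProcess:
    """Run cmd with the given umask, then restore the caller's umask."""
    previous = os.umask(mask)  # os.umask sets the new mask and returns the old one
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=True)
    finally:
        os.umask(previous)


# Demonstration: the child shell reports the umask it inherited.
result = run_with_umask(["sh", "-c", "umask"])
print(result.stdout.strip())
```

Restoring the previous mask in `finally` keeps the stricter STIG default in force for everything else the calling script does.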

Markdown formatting on Matrix Compatibility link on Requirements page

https://docs.rke2.io/install/requirements

#174 was the original request

It looks like we got [] and () mixed up

Created #197 to fix

Update CIS information with CIS-1.7 and new Generic CIS profile

Starting with October releases, we will be in full compliance with the cis-1.7 benchmark. In order to ease future compliance with upcoming cis profiles, we also introduced a generic cis profile flag, so future users will not have to deal with deprecating cis-1.23 -> cis-1.7 flag changes. CIS profile means CIS for whatever benchmark is relevant for that minor release.

All of this needs to be communicated in the docs.

Also:

CIS Benchmark	Applicable RKE2 Minors	Profile Flag
1.5	1.15-1.18	`cis-1.5`
1.6	1.19-1.22	`cis-1.6`
1.23	1.23	`cis-1.23`
1.24	1.24	`cis-1.23`
1.7	1.25-1.28	`cis-1.23`, `cis`
1.8	1.29	`cis`
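
The table can also be read as a lookup from RKE2 minor version to accepted profile flags. The Python sketch below only restates the rows above; the names `CIS_PROFILES` and `profile_flags` are hypothetical and nothing here is taken from the RKE2 code base:

```python
from typing import List, Tuple

# (first minor, last minor, accepted profile flags), copied from the table above.
CIS_PROFILES: List[Tuple[int, int, List[str]]] = [
    (15, 18, ["cis-1.5"]),
    (19, 22, ["cis-1.6"]),
    (23, 23, ["cis-1.23"]),
    (24, 24, ["cis-1.23"]),
    (25, 28, ["cis-1.23", "cis"]),
    (29, 29, ["cis"]),
]


def profile_flags(minor: int) -> List[str]:
    """Return the profile flags the table lists for RKE2 1.<minor>."""
    for first, last, flags in CIS_PROFILES:
        if first <= minor <= last:
            return flags
    raise KeyError(f"no CIS profile listed for 1.{minor}")


print(profile_flags(27))
print(profile_flags(29))
```

Minors outside the table raise `KeyError` rather than guessing, since the issue does not say which benchmark later releases will use.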

Review RKE2 cert rotation codeblock for correctness

The doc page here covers using systemctl stop rke2.

Should a specific service name be expected? For example systemctl stop rke2-server, such as in the link here.

Update docs to provide firewalling advice/guidance

Summary

RKE2 docs suggest disabling firewalld (location: here). While this is valid advice, I have come across many users who think RKE2 in general is incompatible with firewalld because of this section alone.
RKE2 is intended to be a more security-focused k8s distro. As such, and given that Canal is the default CNI, I think some guidance should be provided on mitigating risks, either by pointing users to a CNI that is compatible with firewalld (cilium with eBPF?) or by offering some custom iptables rules as a starting point (not my preferred solution).

Potential solutions

If my understanding is correct, cilium with eBPF enabled on RKE2 should be compatible with firewalld, in which case it could be listed as a mitigation.
I also know that, while not supported, iptables rules can be set manually and should work. Again, I know it's not supported, but some guidance is needed. This way, if Canal is still a user's preferred CNI, they can be informed that manual rules can work (at their own risk).

Lost change in rke2/docs during transfer concerning FIPS support

In rancher/rke2#3405 "Update fips_support.md" a change was made regarding FIPS support, in effect stating that the original FIPS certification is marked as historical and adding the following clarifying statement:

However due to changes introduced by SP 800-56A Rev3, this validation is now historical. A re-validation effort is currently underway to return this module to active FIPS 140-2 status.

This was not copied over in #7 "Sync with rke2/docs" to this repo.

It is not clear - at least when looking at rancher/rke2#3405 - if rke2 is now FIPS 140-2 compliant or not.
According to https://csrc.nist.gov/projects/cryptographic-module-validation-program/certificate/3836 - its status is marked as historical.

Is there a re-validation effort ongoing? And if it is, is there some rough timeline available?

Problematic Markdown formatting in Linux uninstall page

In the linux uninstall doc page, there are multiple Markdown formatting issues.

Please update the documentation on HelmChart resource usage and available variables in RKE2.

Client feedback from Slack:

Please update the documentation on HelmChart resource usage and available variables in RKE2. https://docs.rke2.io/helm
E.g. spec.repoCA is not listed in fields table but it is listed in https://docs.k3s.io/helm
There are many other variables in helm-controller repository https://github.com/k3s-io/helm-controller/blob/345c53c9b2b6711d8ba3b4495aef6f810abd52fb/pkg/apis/helm.cattle.io/v1/types.go#L38. Please include them in documentation.

Update Packaged Components Information

We need to update the RKE2 docs to have parity with all relevant information in https://docs.k3s.io/installation/packaged-components.

Allow for sqlite installs in k3s

When installing a single server, we should allow users to NOT pass cluster-init and not set any datastore endpoint, so that kine with sqlite is used.

The ordering of the steps in the CIS hardening guide is a bit confusing

I ran into these issues myself while trying to follow the guide.

At the very end of the guide, it is said that the config.yaml file needs to be created before installing RKE2. Technically you can install RKE2, but you shouldn't launch it until config.yaml has been set up.
The host-level requirements imply that you should have already installed RKE2 in order to copy the sysctl file to /etc/sysctl.d.
The Kubernetes Runtime Requirements steps also appear before the config.yaml steps and use kubectl which implies that the server should be running.

I think a better ordering for the steps would be:

1. Install RKE2
2. Configure host requirements
3. Set up config.yaml
4. Start & enable RKE2
5. Configure Kubernetes runtime requirements

Add additional info on where to find `rke2-killall.sh` and `rke2-uninstall.sh`

Users with non-standard mount configurations (read-only or btrfs-specific mounts) may have their uninstall and killall scripts under /opt/rke2 or another location if overriding INSTALL_RKE2_TAR_PREFIX. Additional information should be included to point the user in the right direction for finding their scripts.
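
Until that is documented, a search over likely prefixes can locate the scripts. The Python sketch below is an assumption-laden illustration: the helper names are hypothetical, the `bin` subdirectory and the `/usr/local` and `/usr` fallbacks are assumed locations rather than documented ones, and `INSTALL_RKE2_TAR_PREFIX` is only consulted if it happens to be set in the current environment:

```python
import os
from pathlib import Path
from typing import Dict, Iterable, List, Optional

SCRIPTS = ("rke2-killall.sh", "rke2-uninstall.sh")


def candidate_prefixes() -> List[Path]:
    """Prefixes to search; an INSTALL_RKE2_TAR_PREFIX override is checked first."""
    prefixes: List[Path] = []
    override = os.environ.get("INSTALL_RKE2_TAR_PREFIX")
    if override:
        prefixes.append(Path(override))
    # Assumed fallback locations; adjust for the system at hand.
    prefixes += [Path("/usr/local"), Path("/opt/rke2"), Path("/usr")]
    return prefixes


def find_scripts(prefixes: Iterable[Path]) -> Dict[str, Optional[Path]]:
    """Map each script name to the first match under <prefix>/bin, or None."""
    prefix_list = list(prefixes)  # allow the prefixes to be scanned once per script
    found: Dict[str, Optional[Path]] = {}
    for name in SCRIPTS:
        matches = (p / "bin" / name for p in prefix_list)
        found[name] = next((m for m in matches if m.is_file()), None)
    return found


print(find_scripts(candidate_prefixes()))
```

On a machine without RKE2 installed, both entries are simply `None`.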

Closing #37 and #33 in favor of this overarching issue.

[Tracking - RKE2] Update docs around network-manager fixed version to avoid reboots

Update docs for RKE2
Check and update, if applicable, relevant Rancher Manager docs

Ref: https://jira.suse.com/browse/SURE-4419

Add backlink to rancher.com to header

Per feedback from the web team, we should have a link back to https://www.rancher.com for branding/SEO purposes similar to what Rancher Manager has done. The link's label should be "Rancher Home".

Update the RKE2 cluster reset procedure to include the backup and delete of db files.

When running the reset as defined here:
https://docs.rke2.io/backup_restore#cluster-reset

As part of the CLI output, you are instructed to back up and delete some db files, then rerun the command without the flag and rejoin nodes.

Would be nice to have this step documented.

No storage requirement detailed on Docs

Hi,

Currently the Hardware requirements page only references CPU and Memory as a hardware requirement.

Recently, while deploying Rancher on RKE2, we were in a resource-restricted environment: the volumes assigned to the nodes were 10GB in size and were thick provisioned by the underlying hypervisor. While deploying Rancher using a private registry and the air-gapped Helm method, the install halted when the systems ran out of disk space.

Currently there is no reference to disc size for Rancher, only reference to the technologies that are recommended.

https://ranchermanager.docs.rancher.com/pages-for-subheaders/installation-requirements#disks

While we see more deployments in restricted infrastructures as well as at edge deployments, resources can often be constrained.

Can we please add some details on minimum and recommended sizes of volumes/discs to assist with these more resource-constrained deployments?

Document required S3 bucket policy

It's currently unclear what S3 bucket policy is required for the S3 support for etcd snapshots to work. Would be good if this was documented with an example!

Troubleshooting guidance for etcd

Unsure if this is the correct area, however, it would be nice to have an official public reference for RKE2 etcdctl commands (via crictl or kubectl) like those detailed here.

As an example the RKE steps are detailed in the Rancher documentation.

This could be a wider series of troubleshooting doc updates, but etcd has probably the highest interest.

Invalid Calico URL on RKE 2 Architecture page

I reviewed the links/URLs at https://docs.rke2.io/. Most links were validated. One URL for the Calico About page was not valid (https://docs.tigera.io/calico/3.25/about/about-calico). This URL is referenced twice on the RKE2 page (https://docs.rke2.io/architecture#:~:text=Canal%20(Calico%20%26%20Flannel)%2C%20Cilium%20or%20Calico)

[Epic] Reach Parity with K3s docs

The following commits to K3s documentation need to be implemented in RKE2 docs:

k3s-io/docs@2819b88
k3s-io/docs@2afd3dc
k3s-io/docs@b3f57ee
k3s-io/docs@95fbe79
k3s-io/docs@b5ddfcf#diff-af1b4835b31f4c6876a4d5cfcab62499095b476638b4ffd846677f5a4d8749b0 (section on AddOn naming)

Additionally:

Creation of diagram similar to https://docs.k3s.io/img/k3s-architecture-ha-embedded.svg for HA RKE2

Need clear documentation for upgrading hardened setup v1.24 to v1.25

Due to the drop of PSPs in v1.25, there needs to be clear documentation around how to perform an upgrade to this minor.

For rke2 installs to vmware, note that open-vm-tools must be installed to prevent cni issues

Note that when rke2 is installed on a system running on vSphere, open-vm-tools must be installed to prevent intermittent errors relating to CNI traffic as well as local inter-process communication (IPC); without it, the CNI setup shows intermittent TCP errors. This is not exclusive to rke2: k3s should be affected similarly, like any other userland application on Linux.

Not sure where we would want to document this; guidance appreciated.

Remove experimental from the NVIDIA operator guidelines

Since our GPU support is no longer experimental, we should remove that verbiage from the docs to reflect that. Mainly at this point here.

Expand server/agent config reference to include full config.yaml spec on consolidated page

Reading through the server configuration reference and agent configuration reference, I can see the various CLI command flags, but not a consolidated view of the config.yaml files.

This was brought up in #56 and partially fixed in #57 on individual pages, but I think that spec should be consolidated to the server configuration reference and agent configuration reference as well. In short, take the work of #57 and put directly into those pages.

This would make it easier for someone who is configuring RKE2 from a config.yaml file to know which properties can be adjusted and see the syntax (digging deeper into the settings for each one could be done on individual pages).

Update Requirements documentation for OS compatibility

When choosing what operating system to use with RKE2, you will find yourself on this page: https://docs.rke2.io/install/requirements

The matrix of operating systems is out of date and inaccurate compared to the SUSE support matrix: https://www.suse.com/suse-rke2/support-matrix/all-supported-versions/rke2-v1-29/

I think it might be better to simply link to the SUSE document instead, providing a single source of truth instead of maintaining identical lists in two places. The other option being of course to simply keep this page up to date.

Document the non-zero exit code that is displayed after RKE2 processes

Some of our customers and users have expressed interest in having the RKE2 exit behavior better documented. The exit code is non-zero, which some have argued is confusing and unclear. The ask here is for this behavior to be explained in an accessible and public area within our RKE2 documentation.

Local-storage is not included on RKE2

According to the docs (https://docs.rke2.io/helm?_highlight=addons#automatically-deploying-manifests-and-helm-charts), "You will find AddOns for packaged components such as CoreDNS, Local-Storage, Nginx-Ingress, etc.", but Local-Storage is not included.

Update RKE2 documentation about Windows

From rancher/rancher-docs#660:

@vincebrannon

Summary: https://docs.rke2.io/install/quickstart#windows-agent-worker-node-installation

It still refers to this feature as experimental and references an old version of RKE2:

Windows Support is currently Experimental as of v1.21.3+rke2r1. Windows Support requires choosing Calico as the CNI for the RKE2 cluster.

We need to review this section and at least update the versions mentioned to keep it up to date.

Add section in Networking that dynamic IPs are not supported

Add a message in the Networking section that dynamic IPs are not supported.

Document antivirus restrictions to avoid performance impacts

Is your feature request related to a problem? Please describe.
Antivirus programs, if not given a list of files/directories or processes to ignore, will rescan after every change, which can consume a lot of resources if those processes/files change very frequently.

Describe the solution you'd like
Documentation listing the:

Files/directories
Processes
that would harm RKE2 or container performance if an antivirus were to scan them regularly.

Describe alternatives you've considered
Well, finding out by myself, either by listing files and making assumptions, or by pushing no exceptions and seeing whether performance suffers.

Additional context
I'm using RHEL here, and this might be distribution-dependent.

Clearer upgrade documentation, variation for agent nodes

Is your feature request related to a problem? Please describe.
When following the upgrade instructions the command given to upgrade via installation script doesn't mention that you need to specify INSTALL_RKE2_TYPE=agent to prevent it from installing the latest server. This makes it easy to accidentally perform a partial upgrade on agent nodes, in addition to installing needless server components.
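
For anyone scripting upgrades across a fleet, the point is that the node type should always be passed explicitly. A minimal Python sketch follows; the helper name `upgrade_command` is hypothetical, and the `curl -sfL ... | sh -` pipeline shape is an assumption about how the installer script at https://get.rke2.io is usually invoked:

```python
import shlex

INSTALLER_URL = "https://get.rke2.io"  # installer script URL


def upgrade_command(node_type: str) -> str:
    """Build the installer pipeline for a node, pinning INSTALL_RKE2_TYPE explicitly."""
    if node_type not in ("server", "agent"):
        raise ValueError(f"unknown node type: {node_type!r}")
    env = f"INSTALL_RKE2_TYPE={shlex.quote(node_type)}"
    # Assumed invocation shape: fetch the script and pipe it to sh.
    return f"curl -sfL {INSTALLER_URL} | {env} sh -"


print(upgrade_command("agent"))
```

Rejecting anything other than `server` or `agent` means a typo fails loudly instead of silently falling back to whatever the installer treats as the default.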

Describe the solution you'd like
A mention on what variation to perform to upgrade agent nodes should be added under https://github.com/rancher/rke2/blob/master/docs/upgrade/basic_upgrade.md#upgrade-rke2-using-the-installation-script

Describe alternatives you've considered
Some manner of auto detection of node type when upgrading existing installs would be nice, but I won't assume that to be easy so documentation seems best.

Update system-upgrade-controller version in RKE2 Automated Upgrades doc

Summary

The RKE2 automated upgrades document here advises installing v0.9.1 of the system-upgrade-controller:
https://docs.rke2.io/upgrade/automated_upgrade

The latest version is v0.13.1. Can we get that updated?

rke2 logo duplicated and not changing in firefox on main page.

Hello!
I noticed that in Firefox the rke2 logo doesn't follow the color-scheme change and instead displays both logos all the time. The version of Firefox in these screenshots is 115.4.0esr, and I'm able to reproduce it on 119.0.1 and 120.0. Please note that in the following images both uMatrix and uBlock are disabled.

Hi there,
The chart links are mixed up in the release notes for the Kubernetes versions. As an example, take version 1.26:
The page https://docs.rke2.io/release-notes/v1.26.X has some wrong hyperlinks. For instance, for v1.26.10, the chart link under the ingress column redirects to COREDNS v1.10.1, and the same happens with the rke2-coredns column, which points to metrics-server, and so on.

kubeapi server arg incorrect

In https://github.com/rancher/rke2-docs/edit/main/docs/security/pod_security_standards.md it states that:

If you want to override the default pod security standard configuration file, you can pass pod-security-admission-config-file: to the RKE2 config file.

This is incorrect; the argument per the Kubernetes documentation is admission-control-config-file.

Consolidate Etcd Backup pages

We currently have 2 pages describing how to use the etcd snapshot command, at https://docs.rke2.io/backup_restore and https://docs.rke2.io/reference/subcommands#etcd-snapshot. These should be consolidated into a single page.
