rancher / rke2-docs
License: Apache License 2.0
Hey there,
I've just noticed that the URL to the RKE2 CNCF card in the Introduction ("It is a fully conformant Kubernetes distribution ...") is incorrect (Error 404):
https://github.com/rancher/rke2-docs/blob/main/docs/introduction.md?plain=1#L11
You may want to use e.g. https://landscape.cncf.io/?view-mode=card&item=platform--certified-kubernetes-distribution--rke-government as a replacement.
Just noticed today while using the site that the favicon isn't loading into the page despite being present in the assets repository.
Many of the settings in server_config.md have no documented default value:
Would be amazing if this could be improved! I would be happy to help, but I don't know where to look to identify the default values.
We need to document the steps to manually pre-generate CA certs + keys, or to update existing certs to extend their expiration. Ideally we would also output the CA hash in a format compatible with the --token arg ("K10" + SHA256 sum of cert bytes).
We don't currently have any documentation on how to do this for either k3s or rke2, but there are some good starting points at:
One important note is that the hash we are using is a SHA256 sum of the server's certificate, NOT its public key. This means that any change to the cert, including extending its expiration, will change the hash and require passing a new --token to joining clients if they are to trust it.
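The hash format described above ("K10" + SHA256 sum of cert bytes) can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes the bytes to hash are the raw contents of the certificate file as it sits on disk. Exactly which bytes RKE2 hashes should be confirmed against the RKE2/k3s source before relying on this.

```python
import hashlib

# Prefix taken from the issue text above; everything else here is illustrative.
TOKEN_HASH_PREFIX = "K10"


def ca_hash_for_token(cert_bytes: bytes) -> str:
    """Return "K10" followed by the hex SHA256 digest of the given bytes.

    Assumption: cert_bytes is the raw content of the certificate file.
    The digest covers the whole certificate, not just its public key,
    so re-issuing the cert (e.g. with a later expiration) changes it.
    """
    return TOKEN_HASH_PREFIX + hashlib.sha256(cert_bytes).hexdigest()
```

To use it, read the certificate file in binary mode and pass its contents to `ca_hash_for_token`. Because the digest is taken over the certificate rather than the key, two certs issued from the same key pair will still produce different values.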
gz#13930
gz#15095
gz#16112
Since Kubernetes version 1.25 there are 3 new charts for the snapshot-controller. Our docs have not been updated and still list only canal, coredns, ingress, and metrics-server; it would be nice to include these charts as well.
A significant number of the different configuration options are described in their respective documents only in terms of command-line arguments passed to the rke2 binary, significantly burying the lede on being able to configure the server using the same keys and values in config.yaml.
Given that the quickstart install directions lead you to a configuration where changing the server's command-line arguments is not straightforward, I'd wager that the config.yaml method is probably far more useful to most people following those directions, and information on how to do this needs to be a lot more prominent than it currently is, buried between several other less-relevant sections.
If nothing else, the section describing the equivalence between command-line arguments and config.yaml keys should be made a lot more prominent and discoverable. But I'd even argue that the config.yaml syntax should probably be the primary way of describing the configuration options themselves, with the command-line syntax only secondary.
For instance, take this paragraph:
To enable Multus, pass multus as the first value to the --cni flag, followed by the name of the plugin you want to use alongside Multus (or none if you will provide your own default plugin). Note that multus must always be in the first position of the list. For example, to use Multus with canal as the default plugin you could specify --cni=multus,canal or --cni=multus --cni=canal.
IMO this would probably be more useful if paragraphs like that were rewritten something along the lines of:
To enable Multus, add multus as the first list entry in the cni config key, followed by the name of the plugin you want to use alongside Multus (or none if you will provide your own default plugin). Note that multus must always be in the first position of the list. For example, to use Multus with canal as the default plugin you could specify:

    # /etc/rancher/rke2/config.yaml
    ---
    cni:
      - multus
      - canal

This can also be specified with command-line arguments, i.e. --cni=multus,canal or --cni=multus --cni=canal.
That paragraph above is actually specifically the one I just spent two days stumped on before finally figuring it out.
Just to describe my own experience here:
I had RKE2 set up and installed from the quickstart guide. I knew that I needed Multus for my use-case, and was able to find the Using Multus section, and therefore "knew" that I somehow needed to get --cni=multus,canal passed as a command-line argument... and then spent the next two days going down dead ends, reinstalling RKE2 half a dozen times, even just reading the RKE2 source code (e.g. to see if there was some undocumented option that would get the https://get.rke2.io installer script to append --cni=multus,canal to the ExecStart line in the rke2-server.service systemd unit file it created). It wasn't until near the end of day 2 combing through the docs and reading every bit on every page that I finally found what I needed with the config.yaml file.
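To make the equivalence between command-line flags and config.yaml keys concrete, here is a rough Python sketch that turns flags such as --cni=multus,canal into the matching config.yaml text. The helper names are hypothetical and nothing here is part of RKE2. It assumes that repeated or comma-separated flags map to YAML lists (as the --cni example in this issue shows), which may not hold for every option, and it does no YAML quoting or escaping.

```python
def flags_to_config(args: list[str]) -> dict[str, object]:
    """Convert "--key=value" style flags into a config.yaml-shaped dict.

    Assumptions (illustrative only): comma-separated or repeated flags
    become lists; a bare "--key" becomes a boolean true.
    """
    config: dict[str, object] = {}
    for arg in args:
        if not arg.startswith("--"):
            raise ValueError(f"not a long flag: {arg!r}")
        key, sep, raw = arg[2:].partition("=")
        if not sep:
            config[key] = True  # bare flag, treated as a boolean switch
            continue
        values = raw.split(",")
        if key in config:
            # Repeated flag: merge into a single list, preserving order.
            existing = config[key]
            merged = existing if isinstance(existing, list) else [existing]
            config[key] = merged + values
        else:
            config[key] = values if len(values) > 1 else values[0]
    return config


def render_yaml(config: dict[str, object]) -> str:
    """Render the dict as minimal YAML (no quoting or escaping)."""
    lines: list[str] = []
    for key, value in config.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        elif isinstance(value, bool):
            lines.append(f"{key}: {'true' if value else 'false'}")
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"
```

Both spellings from the Multus paragraph, `["--cni=multus,canal"]` and `["--cni=multus", "--cni=canal"]`, produce the same `cni` list, which is the point the docs could state up front.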
Something to consider for the "Known Issues" or the "Hardening Guide": as discovered in rancher/rke2#4313, if running with the cis-1.23 profile in a STIG'd RHEL environment, you must set the umask to 022 before running any rke2 binary commands (rke2 server, rke2 server --cluster-reset, etc.). It is also suggested to set the umask to 022 in the systemd unit file.
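For anyone who launches rke2 from automation rather than an interactive shell, one way to apply the umask is to set it on the child process only. The wrapper below is a hypothetical sketch, not something RKE2 ships. It assumes Python 3.9 or newer (for the `umask` parameter of `subprocess.run`) and a POSIX system.

```python
import subprocess

# Value taken from the issue text above (umask 022).
REQUIRED_UMASK = 0o022


def run_with_umask(command: list[str], umask: int = REQUIRED_UMASK) -> subprocess.CompletedProcess:
    """Run a command with the given umask applied to the child process only.

    The parent's umask is left untouched; output is captured as text.
    """
    return subprocess.run(
        command,
        umask=umask,          # available since Python 3.9, POSIX only
        capture_output=True,
        text=True,
        check=False,
    )
```

A call such as `run_with_umask(["rke2", "server", "--cluster-reset"])` would then run the command from the issue with the expected umask regardless of what the calling environment had set.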
https://docs.rke2.io/install/requirements
#174 was the original request
It looks like we got [] and () mixed up
Created #197 to fix
Starting with October releases, we will be in full compliance with the cis-1.7 benchmark. In order to ease future compliance with upcoming cis profiles, we also introduced a generic cis profile flag, so future users will not have to deal with deprecating cis-1.23 -> cis-1.7 flag changes. The cis profile means CIS for whatever benchmark is relevant for that minor release.
All of this needs to be communicated in the docs.
Also:
CIS Benchmark | Applicable RKE2 Minors | Profile Flag |
---|---|---|
1.5 | 1.15-1.18 | cis-1.5 |
1.6 | 1.19-1.22 | cis-1.6 |
1.23 | 1.23 | cis-1.23 |
1.24 | 1.24 | cis-1.23 |
1.7 | 1.25-1.28 | cis-1.23, cis |
1.8 | 1.29 | cis |
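The table above can also be read as a simple lookup from minor version to accepted profile flags. The sketch below only transcribes the table rows as given in this issue; the function names are hypothetical and the data should be checked against the released docs.

```python
# Rows transcribed from the table above:
# (CIS benchmark, first minor, last minor, accepted profile flags)
CIS_PROFILES = [
    ("1.5", 15, 18, ["cis-1.5"]),
    ("1.6", 19, 22, ["cis-1.6"]),
    ("1.23", 23, 23, ["cis-1.23"]),
    ("1.24", 24, 24, ["cis-1.23"]),
    ("1.7", 25, 28, ["cis-1.23", "cis"]),
    ("1.8", 29, 29, ["cis"]),
]


def _minor(version: str) -> int:
    """Extract the minor number from strings like "1.26" or "v1.26.10+rke2r1"."""
    return int(version.lstrip("v").split(".")[1])


def cis_benchmark(version: str) -> str:
    """Return the CIS benchmark the table lists for this minor version."""
    minor = _minor(version)
    for benchmark, first, last, _flags in CIS_PROFILES:
        if first <= minor <= last:
            return benchmark
    raise KeyError(f"no CIS benchmark listed for {version}")


def profile_flags(version: str) -> list[str]:
    """Return the profile flag values the table lists for this minor version."""
    minor = _minor(version)
    for _benchmark, first, last, flags in CIS_PROFILES:
        if first <= minor <= last:
            return list(flags)
    raise KeyError(f"no profile flag listed for {version}")
```

Read this way, the generic `cis` value is the only flag that carries over from the 1.25-1.28 row to the 1.29 row, which is the motivation given above for introducing it.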
The doc page here covers using systemctl stop rke2.
Should a specific service name be expected? For example systemctl stop rke2-server, such as in the link here.
RKE2 docs suggest disabling firewalld (location: here). While this is valid advice, I have come across many users who think RKE2 in general is incompatible with firewalld because of this section alone.
RKE2 is intended to be a more security-focused k8s distro. Given that Canal is the default CNI, I think some guidance should be provided on mitigating risks, either by pointing users to a CNI that is compatible with firewalld (cilium with eBPF?) or with some custom iptables rules as a starting point (not my preferred solution).
If my understanding is correct, cilium with eBPF enabled on RKE2 should be compatible with firewalld, in which case that could be listed as a mitigation.
I also know that, while not supported, iptables rules can be set manually and should work. Again, I know it's not supported, but some guidance is needed. This way, if Canal is still the preferred CNI of a user, they can be informed that manual rules can work (at their own risk).
In rancher/rke2#3405 "Update fips_support.md" a change was made regarding FIPS support, in effect stating that the original FIPS certification is marked as historical and adding the following clarifying statement:
However due to changes introduced by SP 800-56A Rev3, this validation is now historical. A re-validation effort is currently underway to return this module to active FIPS 140-2 status.
This was not copied over in #7 "Sync with rke2/docs" to this repo.
It is not clear - at least when looking at rancher/rke2#3405 - if rke2 is now FIPS 140-2 compliant or not.
According to https://csrc.nist.gov/projects/cryptographic-module-validation-program/certificate/3836 - its status is marked as historical.
Is there a re-validation effort ongoing? And if it is, is there some rough timeline available?
In the linux uninstall doc page, there are multiple Markdown formatting issues.
Client feedback from Slack:
Please update the documentation on HelmChart resource usage and available variables in RKE2. https://docs.rke2.io/helm
E.g. spec.repoCA is not listed in fields table but it is listed in https://docs.k3s.io/helm
There are many other variables in helm-controller repository https://github.com/k3s-io/helm-controller/blob/345c53c9b2b6711d8ba3b4495aef6f810abd52fb/pkg/apis/helm.cattle.io/v1/types.go#L38. Please include them in documentation.
We need to update the RKE2 docs to have parity with all relevant information in https://docs.k3s.io/installation/packaged-components.
When installing a single server, we should allow NOT passing cluster-init and not having any datastore endpoint, so that we use kine with sqlite.
I ran into these issues myself while trying to follow the guide.
The config.yaml file needs to be created before installing RKE2. Technically you can install RKE2, but you shouldn't launch it until config.yaml has been set up.
/etc/sysctl.d
config.yaml steps and use kubectl, which implies that the server should be running.
I think a better ordering for the steps would be:
config.yaml
Users with non-standard mount configurations (read-only or btrfs specific mounts) may have their uninstall and killall scripts under /opt/rke2 or another location if overriding INSTALL_RKE2_TAR_PREFIX. Additional information should be included to point the user in the right direction for finding their scripts.
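As a starting point for such guidance, the search could look roughly like the Python sketch below. The helper names are hypothetical. It assumes the scripts are named rke2-uninstall.sh and rke2-killall.sh, that they sit either directly under a prefix or in its bin subdirectory, and that /usr/local is the default prefix. Those assumptions should be verified against the install script.

```python
import os
from pathlib import Path

# Assumed script names; verify against the RKE2 install script.
SCRIPT_NAMES = ("rke2-uninstall.sh", "rke2-killall.sh")


def candidate_prefixes(env=os.environ) -> list[str]:
    """Return prefixes to search, honoring INSTALL_RKE2_TAR_PREFIX if set."""
    prefixes = ["/usr/local", "/opt/rke2"]  # /usr/local is an assumed default
    override = env.get("INSTALL_RKE2_TAR_PREFIX")
    if override:
        prefixes.insert(0, override)  # look in the overridden location first
    return prefixes


def find_rke2_scripts(prefixes: list[str]) -> list[Path]:
    """Return existing uninstall/killall scripts under the given prefixes."""
    found: list[Path] = []
    for prefix in prefixes:
        for subdir in ("", "bin"):
            for name in SCRIPT_NAMES:
                candidate = Path(prefix) / subdir / name
                if candidate.is_file():
                    found.append(candidate)
    return found
```

Calling `find_rke2_scripts(candidate_prefixes())` on a node returns whichever of the two scripts exist, with any INSTALL_RKE2_TAR_PREFIX override checked first.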
Per feedback from the web team, we should have a link back to https://www.rancher.com for branding/SEO purposes similar to what Rancher Manager has done. The link's label should be "Rancher Home".
When running the reset as defined here:
https://docs.rke2.io/backup_restore#cluster-reset
As part of the CLI output, you are instructed to back up and delete some db files, then rerun the command without the flag and rejoin nodes.
Would be nice to have this step documented.
Hi,
Currently the Hardware requirements page only references CPU and Memory as a hardware requirement.
Recently, while deploying Rancher on RKE2, we were in a resource-restricted environment: the volumes assigned to the nodes were 10GB in size and were thick provisioned by the underlying hypervisor. While deploying Rancher using a private registry and the air-gapped Helm method, the install halted when the systems ran out of disk space.
Currently there is no reference to disk size for Rancher, only to the technologies that are recommended.
https://ranchermanager.docs.rancher.com/pages-for-subheaders/installation-requirements#disks
As we see more deployments in restricted infrastructures as well as at the edge, resources can often be constrained.
Can we please add some details on minimum and recommended sizes of volumes/disks to assist with these more resource-constrained deployments?
It's currently unclear what S3 bucket policy is required for the S3 support for etcd snapshots to work. Would be good if this was documented with an example!
Unsure if this is the correct area, however, it would be nice to have an official public reference for RKE2 etcdctl commands (via crictl or kubectl) like those detailed here.
As an example the RKE steps are detailed in the Rancher documentation.
This could be a wider series of troubleshooting doc updates, but etcd has probably the highest interest.
I reviewed the links/URLs at https://docs.rke2.io/. Most links were validated. One URL for the Calico About page was not valid (https://docs.tigera.io/calico/3.25/about/about-calico). This URL is referenced twice on the RKE2 page (https://docs.rke2.io/architecture#:~:text=Canal%20(Calico%20%26%20Flannel)%2C%20Cilium%20or%20Calico).
The following commits to K3s documentation need to be implemented in RKE2 docs:
Additionally:
Due to the drop of PSPs in v1.25, there needs to be clear documentation around how to perform an upgrade to this minor.
Note that when RKE2 is installed on a system running on vSphere, open-vm-tools must be installed to prevent intermittent TCP errors affecting CNI traffic as well as local inter-process communication. This is not exclusive to RKE2: k3s should be affected similarly, like any other userland application on Linux.
Not sure where we would want to document this; guidance appreciated.
Since our GPU support is no longer experimental, we should remove that verbiage from the docs to reflect that. Mainly at this point here.
Reading through the server configuration reference and agent configuration reference, I can see the various CLI command flags, but not a consolidated view of the config.yaml files.
This was brought up in #56 and partially fixed in #57 on individual pages, but I think that spec should be consolidated to the server configuration reference and agent configuration reference as well. In short, take the work of #57 and put directly into those pages.
This would make it easier for someone who is configuring RKE2 from a config.yaml file to know which properties can be adjusted and see the syntax (digging deeper into the settings for each one could be done on individual pages).
When choosing what operating system to use with RKE2, you will find yourself on this page: https://docs.rke2.io/install/requirements
The matrix of operating systems is out of date and inaccurate compared to the SUSE support matrix: https://www.suse.com/suse-rke2/support-matrix/all-supported-versions/rke2-v1-29/
I think it might be better to simply link to the SUSE document instead, providing a single source of truth instead of maintaining identical lists in two places. The other option being of course to simply keep this page up to date.
Some of our customers and users have expressed interest in having the RKE2 exit behavior better documented. The exit code is non-zero, which some have argued is confusing and unclear. The ask here is for this behavior to be explained in an accessible and public area within our RKE2 documentation.
According to the docs (https://docs.rke2.io/helm?_highlight=addons#automatically-deploying-manifests-and-helm-charts), you will find AddOns for packaged components such as CoreDNS, Local-Storage, Nginx-Ingress, etc., but Local-Storage is not included.
From rancher/rancher-docs#660:
Summary: https://docs.rke2.io/install/quickstart#windows-agent-worker-node-installation
It still refers to this feature as experimental and references an old version of RKE2:
Windows Support is currently Experimental as of v1.21.3+rke2r1. Windows Support requires choosing Calico as the CNI for the RKE2 cluster.
We need to review this section and at least update the versions mentioned to keep it up to date.
Add a message in the Networking section that dynamic IPs are not supported.
Is your feature request related to a problem? Please describe.
Antivirus programs, if not given a list of files, directories, or processes to ignore, will scan after every change, which can consume a lot of resources if those processes or files change very frequently.
Describe the solution you'd like
Documentation listing:
Describe alternatives you've considered
Well, finding out by myself, either by listing files and making assumptions, or by pushing no exceptions and seeing if performance suffers.
Additional context
I'm using RHEL here, and this might be distribution-dependent.
Is your feature request related to a problem? Please describe.
When following the upgrade instructions the command given to upgrade via installation script doesn't mention that you need to specify INSTALL_RKE2_TYPE=agent to prevent it from installing the latest server. This makes it easy to accidentally perform a partial upgrade on agent nodes, in addition to installing needless server components.
Describe the solution you'd like
A mention on what variation to perform to upgrade agent nodes should be added under https://github.com/rancher/rke2/blob/master/docs/upgrade/basic_upgrade.md#upgrade-rke2-using-the-installation-script
Describe alternatives you've considered
Some manner of auto detection of node type when upgrading existing installs would be nice, but I won't assume that to be easy so documentation seems best.
The RKE2 automated upgrades document here advises installing v0.9.1 of the system-upgrade-controller:
https://docs.rke2.io/upgrade/automated_upgrade
The latest version is v0.13.1. Can we get that updated?
Hello!
I noticed that in Firefox the RKE2 logo doesn't appear to conform to the color change and instead just displays both logos all the time. The version of Firefox in these screenshots is 115.4.0esr and I'm able to reproduce it on 119.0.1 and 120.0. Please note that in the following images both uMatrix and uBlock are disabled.
RKE2 should have no space between RKE and 2. See the sub-heading "Why two names?". Verify that all pages are consistent with the product name.
https://docs.rke2.io/
The K3s docs cover the ability to disable specific system components: https://docs.k3s.io/installation/disable-flags
This functionality is available in RKE2 and provides parity with RKE1, which offers role separation on nodes, so it should be documented.
Hi there,
The chart links are mixed up in the release notes for the Kubernetes versions. As an example,
Version 1.26
The page https://docs.rke2.io/release-notes/v1.26.X has some wrong hyperlinks. For instance, for v1.26.10, under the ingress column, the chart links to CoreDNS v1.10.1; likewise, the rke2-coredns column points to metrics-server, and so on.
In https://github.com/rancher/rke2-docs/edit/main/docs/security/pod_security_standards.md it states that:
If you want to override the default pod security standard configuration file, you can pass pod-security-admission-config-file: to the RKE2 config file.
This is incorrect; per the Kubernetes documentation, the argument is admission-control-config-file.
We currently have 2 pages describing how to use the etcd snapshot command, at https://docs.rke2.io/backup_restore and https://docs.rke2.io/reference/subcommands#etcd-snapshot. These should be consolidated into a single page.