rancher / quickstart Goto Github PK

Shell 7.51% HCL 81.33% Ruby 7.98% Makefile 1.20% Go 1.98%

quickstart's Introduction

Quickstart examples for the Rancher by SUSE product portfolio

Quickly stand up an HA-style installation of Rancher by SUSE products on your infrastructure provider of choice.

Intended for experimentation/evaluation ONLY.

You will be responsible for any and all infrastructure costs incurred by these resources. As a result, this repository minimizes costs by standing up the minimum required resources for a given provider. Use Vagrant to run Rancher locally and avoid cloud costs.

Rancher Management Server quickstart

Rancher Management Server Quickstarts are provided for:

Cloud quickstart

You will be responsible for any and all infrastructure costs incurred by these resources.

Each quickstart will install Rancher on a single-node K3s cluster, then will provision another single-node RKE2 workload cluster using a Custom cluster in Rancher. This setup provides easy access to the core Rancher functionality while establishing a foundation that can be easily expanded to a full HA Rancher server.

Local quickstart

A local quickstart is provided in the form of Vagrant configuration.

The Vagrant quickstart does not currently follow Rancher best practices for installing a Rancher management server. Use this configuration only to evaluate the features of Rancher. See cloud provider quickstarts for an HA foundation according to Rancher installation best practices.

NeuVector quickstart

NeuVector Quickstarts are provided for:

Amazon Web Services for NeuVector (aws)

You will be responsible for any and all infrastructure costs incurred by these resources.

Each quickstart will install NeuVector on a single-node RKE2 cluster. Optionally, a Rancher Management Server can be deployed as well. This setup provides easy access to the NeuVector Rancher functionality while establishing a foundation that can be easily expanded to a full HA NeuVector installation.

Requirements - Vagrant (local)

Vagrant
VirtualBox
6GB unused RAM

Using Vagrant quickstarts

See /vagrant for details on usage and settings.

Requirements - Cloud

Terraform >=1.0.0
Credentials for the cloud provider used for the quickstart

Using cloud quickstarts

To begin with any quickstart, perform the following steps:

Clone or download this repository to a local folder
Choose a cloud provider and navigate into the provider's folder
Copy or rename terraform.tfvars.example to terraform.tfvars and fill in all required variables
Run terraform init
Run terraform apply

When provisioning has finished, terraform will output the URL to connect to the Rancher server. Two sets of Kubernetes configurations will also be generated:

kube_config_server.yaml contains credentials to access the cluster supporting the Rancher server
kube_config_workload.yaml contains credentials to access the provisioned workload cluster

For more details on each cloud provider, refer to the documentation in their respective folders.

Remove

When you're finished exploring the Rancher server, use terraform to tear down all resources in the quickstart.

NOTE: Any resources not provisioned by the quickstart are not guaranteed to be destroyed when tearing down the quickstart. Make sure you tear down any resources you provisioned manually before running the destroy command.

Run terraform destroy -auto-approve to remove all resources without prompting for confirmation.

quickstart's People

Contributors

Stargazers

Watchers

Forkers

ngocngv vskiev oats87 superseb totallygreg sharops uzh-bf gridl therapybox phouverneyuff lunaticzorr dnoland1 jlaird todd-maynard mixedupbits kenfdev ferhimedamine quickserver anypm haroldsphinx vitta-health openfrontier kellygriffin xjplke joni-lover tylerd79 realdragon70 ansikko arush0 250lth amherrington13 drycriling1108 jatinsuri doriftoshoes vinhnguyen116 srinivas7143 slamkalukas thuvh chris-hill1990 stephaneerard arashkaffamanesh dwoods tecdevel taodawn mincloud1501 eumel8 orpere anirvanr bbelkacem jinkyoungjeong bmccann36 dddiaz magictank lepffm dscottwg zopz joshuakapellen mocofound nicktgr15 ivuk stefan25 exoscale tlvu uwej711 petit007 fenikkusu nikkelma ishau-o dragonstuff c0deaddict zparnold mlevitt oip-core-mirrors fgather giacomo92 ascdso2020 hanhongyuan bernddoser muram akshaya835 sharp99 estkae hungtran84 bashofmann mbookham7 techgeek03 wilsonge orquestracd belgaied2 bdeak franciscorabet bizmate ashwilliams asingh64 vmathpal chrislevi kwsorensen yankcrime leodotcloud chrisurwin

quickstart's Issues

Self assigned SSL Cert ist "not trusted"

When using it on Digital Ocean I get the info that the SSL Cert is not trusted and I can not use it in my gitlab deployment. Is there a workaround for that (not assigning the port 443 in the userdata_server etc. ) ?

Vagrant quickstart not working

Running vagrant up results in the server-01 VM being created, and started. It pulls the rancher/rancher:latest but then goes into a never ending loop with a TLS error (log snippet blow). It never progresses any further and so never creates any of the node VMs.

System Info
OS: Windows 10
VirtualBox v5.2.26
RAM: 16GB
CPU: Intel Core i7-7500U

Logs

server-01: 7c3eae4ec924: Pull complete
server-01: 821c0eae95fd: Pull complete
server-01: Digest: sha256:1943e9b7d802992d3c61184af7ae2ca5d414c15152bc40ec995e71e28cb80844
server-01: Status: Downloaded newer image for rancher/rancher:latest
server-01: 29075c0ebebb8707bc79171a3aa77956aa9be95259d3a41c8e1c7f152ae3083a
server-01: + true
server-01: + wget -T 5 -c https://localhost/ping
server-01: Connecting to localhost (127.0.0.1:443)
server-01: wget: got bad TLS record (len:0) while expecting handshake record
server-01: wget: error getting response: Connection reset by peer
server-01: + sleep 5
server-01: + true
server-01: + wget -T 5 -c https://localhost/ping
server-01: Connecting to localhost (127.0.0.1:443)
server-01: wget: got bad TLS record (len:0) while expecting handshake record
server-01: wget: error getting response: Connection reset by peer
server-01: + sleep 5
server-01: + true
server-01: + wget -T 5 -c https://localhost/ping
server-01: Connecting to localhost (127.0.0.1:443)
server-01: wget: got bad TLS record (len:0) while expecting handshake record
server-01: wget: error getting response: Connection reset by peer

Rancher server IP

There is a different IP http://169.254.169.254/ and wrong API path, I don't know correct one so no PR.

https://github.com/rancher/quickstart/blob/master/aws/files/userdata_server#L53

not start node


2018-12-28T12:45:35.153968168Z + docker start kubelet
2018-12-28T12:45:35.211766946Z Error response from daemon: {"message":"No such container: kubelet"}
2018-12-28T12:45:35.211784138Z Error: failed to start containers: kubelet2018-12-28T12:45:35.213946409Z + sleep 2
--

Update rancher2 terraform provider to 1.8.3

Current quickstarts have been tested on and locked to version 1.7.3 of the rancher2 terraform provider. Use the latest terraform provider by updating to version 1.8.3, as suggested by @rawmind0.

vagrant quickstart eats too much CPU

I just wanted to quickly jump into Rancher but my first observation is a little bit worrying: the vagrant quickstart with 3 nodes eats way too much cpu for doing nothing. On the host the virtualbox machines are idling on about 25%, rancher gui shows me values from 35% to 54% for CPU - after a fresh installation, no payload on that cluster!

On the same host an idling k3os vm will never go above 10%, mostly between 3% and 7%.

I know Virtualbox is not the target for production environments, but has this CPU burning to be expected? Looks more like a bug to me.

Thanks for your attention!

* aws_security_group.rancher_sg_allowall: 1 error(s) occurred:

Hi!

Currently getting the following when using the quickstart to deploy to AWS.

aws_security_group.rancher_sg_allowall: Error authorizing security group ingress rules: InvalidPermission.Malformed: Unsupported IP protocol "-1" - supported: [tcp, udp, icmp]
status code: 400, request id: GUID

Remove requirement to provide SSH key

Requiring users to provide an SSH key can raise the barrier to entry and add complexity when dealing with passwords or file paths. Use the TLS private key resource in terraform instead to remove this requirement.

Problems to pull images from gitlab.com with t2.medium

Hi, I have installed Rancher2 using aws and t2.medium but I am having a problem to pull images from gitlab.com. It seems to be a random problem so I think it could be a bandwidth limit for t2.medium
instances. Any idea/recommendation?

AWS Quickstart has an invalid Security group definition.

quickstart/aws/main.tf

Lines 98 to 114 in 3b4d1c6

    
           resource "aws_security_group" "rancher_sg_allowall" { 
        
             name = "${var.prefix}-allowall" 
        
             ingress { 
        
               from_port   = "0" 
        
               to_port     = "0" 
        
               protocol    = "-1" 
        
               cidr_blocks = ["0.0.0.0/0"] 
        
             } 
        
             egress { 
        
               from_port   = "0" 
        
               to_port     = "0" 
        
               protocol    = "-1" 
        
               cidr_blocks = ["0.0.0.0/0"] 
        
             } 
        
           }

A protocol_id of "-1" is not valid for security groups in EC2-Classic.
Egress are not valid for EC2-Classic security groups.

UI is not accessible https://172.22.101.101

I am not able to access the rancher UI at the URL. I can ping the ip address but I cannot connect to the UI.

Update RKE terraform provider to 1.0.0

Current quickstarts have been tested on and locked to version 0.14.1 of the rke terraform provider. Use instead the first v1 release of this provider, 1.0.0.

rke_cluster.rancher_cluste creation failing because of etcd

I am testing against Azure.... with Terraform v0.12.24
created an SP for Rancher and modified tfvars accordingly.
also installed RKE for my platform

terraform providers
.
├── provider.azurerm ~> 2.0.0
└── module.rancher_common
    ├── provider.helm ~> 1.0
    ├── provider.kubernetes 1.10.0
    ├── provider.local ~> 1.4
    ├── provider.rancher2.admin ~> 1.7
    ├── provider.rancher2.bootstrap ~> 1.7
    └── provider.rke 0.14.1

as per the module folder

> ls .\.terraform\plugins\windows_amd64\
\azure\.terraform\plugins\windows_amd64

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
------       2020-04-17   9:20 PM       51515904 terraform-provider-kubernetes_v1.10.0_x4.exe
------       2020-04-17   9:20 PM       49622528 terraform-provider-helm_v1.1.1_x4.exe
------       2020-04-17   9:21 PM      124372992 terraform-provider-azurerm_v2.0.0_x5.exe
------       2020-04-17   9:22 PM       31506432 terraform-provider-rancher2_v1.8.3_x4.exe
------       2020-04-17   9:22 PM       22242304 terraform-provider-local_v1.4.0_x4.exe
------       2020-04-17   9:42 PM            483 lock.json
------       2019-08-21   4:24 AM       48482816 terraform-provider-rke_v0.14.1.exe

terraform init gives the green light.
terraform apply starts but fails on the creation of the rks_cluster with this error:

module.rancher_common.rke_cluster.rancher_cluster: Creating...

Error:
Cluster must have at least one etcd plane host: please specify one or more etcd in cluster config

Just after creating that first server:

azurerm_virtual_network.rancher-quickstart: Creating...
azurerm_public_ip.rancher-server-pip: Creating...
azurerm_public_ip.quickstart-node-pip: Creating...
azurerm_public_ip.rancher-server-pip: Creation complete after 1s [id=/subscriptions/x-104d-45a1-a02e-x/resourceGroups/quickstart-rancher-quickstart-rg/providers/Microsoft.Network/publicIPAddresses/rancher-server-pip]
azurerm_public_ip.quickstart-node-pip: Creation complete after 1s [id=/subscriptions/x-104d-45a1-a02e-x/resourceGroups/quickstart-rancher-quickstart-rg/providers/Microsoft.Network/publicIPAddresses/quickstart-node-pip]
azurerm_virtual_network.rancher-quickstart: Still creating... [10s elapsed]
azurerm_virtual_network.rancher-quickstart: Creation complete after 11s [id=/subscriptions/x-104d-45a1-a02e-x/resourceGroups/quickstart-rancher-quickstart-rg/providers/Microsoft.Network/virtualNetworks/quickstart-network]
azurerm_subnet.rancher-quickstart-internal: Creating...
azurerm_subnet.rancher-quickstart-internal: Creation complete after 1s [id=/subscriptions/x-104d-45a1-a02e-x/resourceGroups/quickstart-rancher-quickstart-rg/providers/Microsoft.Network/virtualNetworks/quickstart-network/subnets/rancher-quickstart-internal]
azurerm_network_interface.rancher-server-interface: Creating...
azurerm_network_interface.quickstart-node-interface: Creating...
azurerm_network_interface.quickstart-node-interface: Creation complete after 1s [id=/subscriptions/x-104d-45a1-a02e-x/resourceGroups/quickstart-rancher-quickstart-rg/providers/Microsoft.Network/networkInterfaces/quickstart-node-interface]
azurerm_network_interface.rancher-server-interface: Creation complete after 2s [id=/subscriptions/x-104d-45a1-a02e-x/resourceGroups/quickstart-rancher-quickstart-rg/providers/Microsoft.Network/networkInterfaces/rancher-quickstart-interface]
azurerm_linux_virtual_machine.rancher_server: Creating...
azurerm_linux_virtual_machine.rancher_server: Still creating... [10s elapsed]
azurerm_linux_virtual_machine.rancher_server: Provisioning with 'remote-exec'...
azurerm_linux_virtual_machine.rancher_server (remote-exec): Connecting to remote host via SSH...
azurerm_linux_virtual_machine.rancher_server (remote-exec):   Host: x
azurerm_linux_virtual_machine.rancher_server (remote-exec):   User: ubuntu
azurerm_linux_virtual_machine.rancher_server (remote-exec):   Password: false
azurerm_linux_virtual_machine.rancher_server (remote-exec):   Private key: true
azurerm_linux_virtual_machine.rancher_server (remote-exec):   Certificate: false
azurerm_linux_virtual_machine.rancher_server (remote-exec):   SSH Agent: false
azurerm_linux_virtual_machine.rancher_server (remote-exec):   Checking Host Key: false
azurerm_linux_virtual_machine.rancher_server (remote-exec): Connected!
azurerm_linux_virtual_machine.rancher_server (remote-exec): status: done
azurerm_linux_virtual_machine.rancher_server: Creation complete after 2m42s [id=/subscriptions/x-104d-45a1-a02e-x/resourceGroups/quickstart-rancher-quickstart-rg/providers/Microsoft.Compute/virtualMachines/quickstart-rancher-server]
module.rancher_common.rke_cluster.rancher_cluster: Creating...

Error:
Cluster must have at least one etcd plane host: please specify one or more etcd in cluster config

============= RKE outputs ==============

[info] Tearing down Kubernetes cluster
[info] [dialer] Setup tunnel for host [x]
[warning] Failed to set up SSH tunneling for host [x]: Can't initiate NewClient: protocol not available
[warning] Removing host [x] from node lists
[info] Initiating Kubernetes cluster
[info] [dialer] Setup tunnel for host [x]
[warning] Failed to set up SSH tunneling for host [x]: Can't initiate NewClient: protocol not available
[warning] Removing host [x] from node lists
[warning] [state] can't fetch legacy cluster state from Kubernetes
[info] [certificates] Generating CA kubernetes certificates
[info] [certificates] Generating Kubernetes API server aggregation layer requestheader client CA certificates
[info] [certificates] Generating Kube Proxy certificates
[info] [certificates] Generating Kubernetes API server proxy client certificates
[info] [certificates] Generating Kube Controller certificates
[info] [certificates] Generating Kube Scheduler certificates
[info] [certificates] Generating Node certificate
[info] [certificates] Generating admin certificates and kubeconfig
[info] [certificates] Generating Kubernetes API server certificates
[info] Successfully Deployed state file at [\azure\terraform-provider-rke-252231970/cluster.rkestate]  
[info] Building Kubernetes cluster

========================================


  on ..\rancher-common\rke.tf line 4, in resource "rke_cluster" "rancher_cluster":
   4: resource "rke_cluster" "rancher_cluster" {

This is pretty much out of the box... what is missing on that new server that is not taken care of by the creation code?

A provider named "rke" could not be found in the Terraform Registry.

Provider "rke" not available for installation.

A provider named "rke" could not be found in the Terraform Registry.

This may result from mistyping the provider name, or the given provider may
be a third-party provider that cannot be installed automatically.

In the latter case, the plugin must be installed manually by locating and
downloading a suitable distribution package and placing the plugin's executable
file in the following directory:
terraform.d/plugins/linux_amd64

Terraform detects necessary plugins by inspecting the configuration and state.
To view the provider versions requested by each module, run
"terraform providers".

Error: no provider exists with the given name

Terraform v0.12.23
+ provider.archive v1.3.0
+ provider.aws v2.51.0
+ provider.digitalocean v1.14.0

AWS quickstart not working

Hello, i am trying to run Runcher terraform quickstart.
After preparing variables, we got an error:

Unspoorted block type
on main.tf line 251, in data "template_file" "userdata_server":
251: vars {
Blocks of type "vars" are not expected here. Did you mean to define argument "vars"? If so, use the equals sign to assign it a value.

Same for line 263

Terraform version: v0.12.0

Can you investigate please?
Thank you.

Vagrant environment never completely started

Hi,

I'm testing using vagrant it get stuck in step below, never completes. Please tell me how to resolve this. Thanks.

    server-01: + true
    server-01: + wget -T 5 -c https://localhost/ping
    server-01: Connecting to localhost (127.0.0.1:443)
    server-01: wget: TLS error from peer (alert code 40): handshake failure
    server-01: wget: error getting response: Connection reset by peer
    server-01: + sleep 5
    server-01: + true
    server-01: + wget -T 5 -c https://localhost/ping
    server-01: Connecting to localhost (127.0.0.1:443)
    server-01: wget: TLS error from peer (alert code 40): handshake failure
    server-01: wget: error getting response: Connection reset by peer
    server-01: + sleep 5
    server-01: + true
    server-01: + wget -T 5 -c https://localhost/ping
    server-01: Connecting to localhost (127.0.0.1:443)
    server-01: wget: TLS error from peer (alert code 40): handshake failure
    server-01: wget: error getting response: Connection reset by peer
    server-01: + sleep 5
    server-01: + true
    server-01: + wget -T 5 -c https://localhost/ping
    server-01: Connecting to localhost (127.0.0.1:443)
    server-01: wget: TLS error from peer (alert code 40): handshake failure
    server-01: wget: error getting response: Connection reset by peer

AWS quick start cannot be easily completed as written

The AWS quick start guide indicates that one needs to have terraform-provider-rke installed as a pre-requisite.

What it does not say is that it requires v0.14.1 of that repo, and that by installing according to the instructions from that repo's README (which says to name the executable terraform-provider-rke), Terraform will think the version is v0.0.0.

Either the quick start guide or the README for terraform-provider-rke should be updated to clarify this issue.

See this Stack Overflow Q&A.

docker configuration in server-01 fails

Using the Vagrant platform behind a corporate proxy, the following error occurs during 'vagrant up':

==> server-01: Machine booted and ready!
==> server-01: Setting hostname...
==> server-01: Configuring and enabling network interfaces...
==> server-01: Configuring proxy for Docker...
The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!

systemctl restart docker || service docker restart || /etc/init.d/docker restart

Stdout from the command:

Stderr from the command:

bash: line 4: systemctl: command not found
bash: line 4: service: command not found
bash: line 4: /etc/init.d/docker: No such file or directory

Why not use RancherOS on DigitalOcean instead of Ubuntu 16.04?

GCP workload clusters only expose nodes' private IP

As in #91 the gcp workload cluster is also only exposing the nodes' private IP.

We need to implement a similar approach to #93.

Refactor cross-module provider organization

According to terraform module provider best practices, providers should only appear in root modules, be passed explicitly into child modules, and should specify provider requirements in a terraform block.

AWS local cluster node experiences disk pressure

In early startup stages, the local cluster's node experiences disk pressure, and may continue intermittently throughout the lifecycle of the cluster.

Install Rancher version 2.4.3

Rancher has released version 2.4.3 and has marked this as the latest stable release - change default variable values to install this version.

Vagrant quickstart runs out of memory & the API server fails

I'm following the Rancher Quickstart guides at https://github.com/rancher/quickstart & https://rancher.com/docs/rancher/v2.x/en/quick-start-guide/deployment/quickstart-vagrant/ . Following those instructions does not result in a working Rancher cluster.

It appears that the API starts but becomes unresponsive. Please see the screenshot.

Steps to reproduce:

Clone the repo:

$ git clone https://github.com/rancher/quickstart ~/Vagrant/rancher-2.0/
Cloning into '/Users/stefanl/Vagrant/rancher-2.0'...
remote: Enumerating objects: 26, done.
remote: Counting objects: 100% (26/26), done.
remote: Compressing objects: 100% (24/24), done.
remote: Total 98 (delta 4), reused 13 (delta 2), pack-reused 72
Unpacking objects: 100% (98/98), done.

Start the cluster without any modifications:

$ cd ~/Vagrant/rancher-2.0/vagrant/
$ vagrant up
Config: {"default_password"=>"admin", "version"=>"latest", "kubernetes_version"=>"v1.11.2-rancher1-1", "ROS_version"=>"1.4.0", "server"=>{"cpus"=>1, "memory"=>1500}, "node"=>{"count"=>1, "cpus"=>1, "memory"=>1500}, "ip"=>{"master"=>"172.22.101.100", "server"=>"172.22.101.101", "node"=>"172.22.101.111"}, "linked_clones"=>true, "net"=>{"private_nic_type"=>"82545EM", "network_type"=>"private_network"}}

Bringing machine 'server-01' up with 'virtualbox' provider...
Bringing machine 'node-01' up with 'virtualbox' provider...
......
    node-01: Status: Downloaded newer image for rancher/rancher-agent:v2.1.0
    node-01: c819502471345d044db3a3ba482a9a53ce8b9e5729c664151a7d83d3d3825e35
$

Log into the server at https://172.22.101.101/login, which redirects us to https://172.22.101.101/g/clusters.
Wait about 10 minutes.
Notice the cluster named 'quickstart' is in the State of 'Unavailable' and reports the following error, even after 10 minutes:

Failed to communicate with API server: Get https://172.22.101.111:6443/api/v1/componentstatuses?timeout=30s: waiting for cluster agent to connect

See the following screenshot:

System details:

Mac Version: 10.13.6 (High Sierra)
Vagrant version: Vagrant 2.1.2
Virtualbox version: 5.2.18
Rancher v2.1.0

aws permission error

I'm trying on AWS and I got this error

Error: Error applying plan:
1 error(s) occurred:
* aws_security_group.rancher_sg_allowall: 1 error(s) occurred:
* aws_security_group.rancher_sg_allowall: Error revoking default egress rule for Security Group (sg-06351aed6c6e2d460): UnauthorizedOperation: You are not authorized to perform this operation.

I guess the policy for the user needs to be updated... Can you share the policy json for which this will work correctly?

Node status is stuck on Unknown

Trying to run the quick start vagrant version the machines are created and there is no error in scripts
accessing the Admin UI on https://172.22.101.101 works and show me the cluster
However, the node never get connected to the K8S cluster
and is stuck on status unknown

getting the following error in UI
This cluster is currently Provisioning; areas that interact directly with it will not be available until the API is ready.

[[network] Host [10.0.2.15] is not able to connect to the following ports: [nc: unrecognized option: address, BusyBox v1.27.2 (2018-06-06 09:08:44 UTC) multi-call binary., , Usage: nc [OPTIONS] HOST PORT - connect, nc [OPTIONS] -l -p PORT [HOST] [PORT] - listen, , -e PROG Run PROG after connect (must be last), -l Listen mode, for inbound connects, -lk With -e, provides persistent server, -p PORT Local port, -s ADDR Local address, -w SEC Timeout for connects and final net reads, -i SEC Delay interval for lines sent, -n Don't do DNS resolution, -u UDP mode, -v Verbose, -o FILE Hex dump traffic, -z Zero-I/O mode (scanning), --address:2379]. Please check network policies and firewall rules]

Side note: can not access the machine which the master is running on tried to copy ssh public key but with little success

Support for AWS as cloud provider

I was wondering why the resulting cluster from aws isn't marked as using Amazon as cloud provider. A few questions I had for resolution.

Is this planned
do I just need to make changes to the script and add this functionality
do I extend main.tf to use rancher2 provider and mark resulting cluster from script as using Amazon as cloud provider

Provider "rke" not available for installation

Initializing provider plugins...
- Checking for available provider plugins...

Provider "rke" not available for installation.

A provider named "rke" could not be found in the Terraform Registry.

This may result from mistyping the provider name, or the given provider may
be a third-party provider that cannot be installed automatically.

In the latter case, the plugin must be installed manually by locating and
downloading a suitable distribution package and placing the plugin's executable
file in the following directory:
    terraform.d/plugins/darwin_amd64

Terraform detects necessary plugins by inspecting the configuration and state.
To view the provider versions requested by each module, run
"terraform providers".


Error: no provider exists with the given name

Provider installed on both ~/ terraform.d/plugins/darwin_amd64 and ~/ terraform.d/plugins/darwin_amd64, named both terraform-provider-rke and terraform-provider-rke_v0.14.1

aws provisioning failing reading the variables

HI,
When I try to follow the quickstart tutorial with aws provisioning, the terraform variables and files are not read by the command"terraform init":

➜  aws git:(master) ✗ terraform init                           
There are some problems with the configuration, described below.

The Terraform configuration must be valid before initialization so that
Terraform can determine which modules and providers need to be installed.

Error: Error parsing quickstart/aws/main.tf: At 3:16: Unknown token: 3:16 IDENT var.aws_access_key

If I replace the variables in the main.tf I'm able to get stuck in the next step, that is read the template file, follow the error:

➜  aws git:(master) ✗ terraform init -var-file=terraform.tfvars
There are some problems with the configuration, described below.

The Terraform configuration must be valid before initialization so that
Terraform can determine which modules and providers need to be installed.

Error: Error parsing quickstart/aws/main.tf: At 124:20: Unknown token: 124:20 IDENT data.template_file.userdata_server.rendered

Not sure if is something related to my mac.

Vagrant VMs are unusable after restart

Restarting Vagrant VMs or using "vagrant halt" to shutdown a VM can subsequently cause the VM to become unusable. The cause of this is that the default NAT interface which is automatically provisioned by the virtualbox driver, is never brought back on. #35 fixes this.

[AWS] Configure the provider credentials with a profile

The default ~/.aws/credentials file is often configured to store credentials (access/secret keys) for AWS CLI and SDK use. With that, to simplify the aws quickstart, these can be leveraged in the provider opposed to configuring these manually in a .tfvars file.

https://www.terraform.io/docs/providers/aws/index.html#shared-credentials-file

Example:

 provider "aws" {
  region = var.aws_region
}

Source for chrisurwin/RancherOS vagrant image

Hello,

I am unable to download the chrisurwin/RancherOS vagrant image used due to corporate policy restricting access to Vagrant Cloud for security reasons.

I've searched Google / GitHub, but haven't been able to track down the source code of the base image, so that I can built it myself. E.g. The base Vagrantfile or packer files etc.

Can anyone point me to the sources?

Thanks

No HA for RancherServer for AWS quickstart terraform template

Dear Team,

We are trying to spin a production grade Rancher using the given AWS kickstart guide. But no HA for RancherServer for AWS quickstart terraform template as shown in the snippet below. But we do see that the RancherAgent supports HA configuration.

Does this template can be enhanced to support HA for Rancherserver?

resource "aws_instance" "rancherserver" {
  ami             = "${data.aws_ami.ubuntu.id}"
  instance_type   = "${var.type}"
  key_name        = "${var.ssh_key_name}"
  security_groups = ["${aws_security_group.rancher_sg_allowall.name}"]
  user_data       = "${data.template_cloudinit_config.rancherserver-cloudinit.rendered}"

  tags {
    Name = "${var.prefix}-rancherserver"
  }
}

Add terraform tests

Validation of code changes currently occurs manually. Add integration tests to enable continuous integration and further automation. These tests should, create, smoke test, and destroy all terraform-based quickstarts.

rancher2 v1.8+ is not compatible with aws quickstart

terraform apply --auto-approve
module.rancher_common.data.helm_repository.rancher_stable: Refreshing state...
module.rancher_common.data.helm_repository.rancher_latest: Refreshing state...
module.rancher_common.data.helm_repository.jetstack: Refreshing state...
data.aws_ami.ubuntu: Refreshing state...

Error: Timeout, Rancher is not ready: <nil>

  on ../rancher-common/provider.tf line 44, in provider "rancher2":
  44: provider "rancher2" {

To fix, change rancher-common/provider.tf for rancher 2, in 2 places:

provider "rancher2" {
  version = "1.7.3"

Tested on darwin
Terraform v0.12.24

Vagrant quickstart kubernetes version cannot be empty

If the vagrant quickstart Kubernetes version is left empty, you end up with errors like:

2020/01/14 03:37:17 [ERROR] getSvcOptions: error finding system image for  resource name may not be empty
2020/01/14 03:37:17 [ERROR] getK8sServiceOptions: k8sVersion  [resource name may not be empty]

This is solved by editing/saving the cluster (you don't have to change anything). Likely a change due to the KDM integration for 2.3+

Unfreeze Rancher version to install stable releases

The version of Rancher installed by the quickstart should follow the stable release channel. As suggested by @chrisurwin, instead of locking the version to a default in the variables, allow specifying a version of Rancher but use the most recent chart from rancher-stable (https://releases.rancher.com/server-charts/stable).

Azure and GCP quick start failes after creating first VM

Attempting to follow the quickstart information presented here
https://rancher.com/docs/rancher/v2.x/en/quick-start-guide/deployment/microsoft-azure-qs/

for deploying to an azure environment or GCP

running terraform apply --auto-approve proceeds to create a resource group, IPs, and a VM but process halts with the following.

azurerm_linux_virtual_machine.rancher_server: Creation complete after 1m49s [id=/subscriptions/2458118e-7ade-4a51-aa5e-c4bcaed7dc97/resourceGroups/quickstart-rancher-quickstart/providers/Microsoft.Compute/virtualMachines/quickstart-rancher-server]
module.rancher_common.rke_cluster.rancher_cluster: Creating...

Error:
Cluster must have at least one etcd plane host: please specify one or more etcd in cluster config

Was able to ssh into the VM created, no docker images or failed containers present.

The RKE Outputs were the following

============= RKE outputs ==============

[info] Tearing down Kubernetes cluster
[info] [dialer] Setup tunnel for host [40.71.88.240]
[warning] Failed to set up SSH tunneling for host [40.71.88.240]: Can't initiate NewClient: protocol not available
[warning] Removing host [40.71.88.240] from node lists
[info] Initiating Kubernetes cluster
[info] [dialer] Setup tunnel for host [40.71.88.240]
[warning] Failed to set up SSH tunneling for host [40.71.88.240]: Can't initiate NewClient: protocol not available
[warning] Removing host [40.71.88.240] from node lists
[warning] [state] can't fetch legacy cluster state from Kubernetes
[info] [certificates] Generating CA kubernetes certificates
[info] [certificates] Generating Kubernetes API server aggregation layer requestheader client CA certificates
[info] [certificates] Generating Kubernetes API server proxy client certificates
[info] [certificates] Generating Kubernetes API server certificates
[info] [certificates] Generating Service account token key
[info] [certificates] Generating Kube Scheduler certificates
[info] [certificates] Generating Node certificate
[info] [certificates] Generating admin certificates and kubeconfig
[info] [certificates] Generating Kube Controller certificates
[info] [certificates] Generating Kube Proxy certificates
[info] Successfully Deployed state file at [C:\Projects\NTTData\Rancher\QuickStart\azure\terraform-provider-rke-223987894/cluster.rkestate]
[info] Building Kubernetes cluster

========================================

on ..\rancher-common\rke.tf line 4, in resource "rke_cluster" "rancher_cluster":
4: resource "rke_cluster" "rancher_cluster" {

vagrant behind corporate proxy

I'm unable to run rancher via vagrant behind a corporate proxy. I've installed the vagrant proxy plugin and configured that, but it seems rancher's docker isn't able to retrieve an image.

What needs to be done to configure quickstart vagrant for corporate proxy?

Unable to find image 'rancher/rancher:latest' locally
server-01: docker: Error response from daemon: Get https://registry-1.docker.io/v2/: dial tcp 52.206.40.44:443: getsockopt: connection refused.
server-01: See 'docker run --help'.
server-01: + true
server-01: + wget -T 5 -c https://localhost/ping
server-01: Connecting to localhost (127.0.0.1:443)
server-01: wget: can't connect to remote host (127.0.0.1): Connection refused
server-01: + sleep 5
server-01: + true

terraform plan error - aws module

Hi, says cannot find the path specified.

Files - userdata_agent | userdata_server

main.tf
terraform.tfvars

Error: Timeout, Rancher is not ready: <nil>

Getting this issue when applying terraform quickstar for DO.

Any thoughts?

➜  do git:(master) ✗ terraform apply --auto-approve
digitalocean_ssh_key.quickstart_ssh_key: Refreshing state... [id=27022084]
digitalocean_droplet.rancher_server: Refreshing state... [id=187249495]
module.rancher_common.data.helm_repository.rancher_stable: Refreshing state...
module.rancher_common.data.helm_repository.rancher_latest: Refreshing state...
module.rancher_common.data.helm_repository.jetstack: Refreshing state...

Error: Timeout, Rancher is not ready: <nil>

  on ../rancher-common/provider.tf line 44, in provider "rancher2":
  44: provider "rancher2" {

Add k3s-based Rancher install

Create a k3s cluster instead of an RKE cluster for the Rancher server, and use an external database (in the form of a container, possibly) to show the config needed to use an external datastore with k3s.

worked fine with 1 node, changed to 3 nodes in config.yml, it seems to be stuck at provisioning

I changed the config.yml to have 3 nodes. vagrant up provisions 4 servers.

https://172.22.101.101/g/clusters
shows the cluster in provisioning status, but it seems to be stuck.

I vagrant ssh into one of the nodes, how do I check what is wrong?

AWS gets stuck at Docker install

The Terraform tool created everything but it never comes up. The system logs from the server ec2 instance are just this:

  130.444276] cloud-init[1445]: + export curlimage=appropriate/curl
[  130.447937] cloud-init[1445]: + curlimage=appropriate/curl
[  130.450622] cloud-init[1445]: + export jqimage=stedolan/jq
[  130.453002] cloud-init[1445]: + jqimage=stedolan/jq
[  130.456635] cloud-init[1445]: ++ command -v curl
[  130.458945] cloud-init[1445]: + '[' /usr/bin/curl ']'
[  130.464133] cloud-init[1445]: + curl -sL https://releases.rancher.com/install-docker/18.09.sh
[  130.469959] cloud-init[1445]: + sh

It stays there for a long time. I waited 15 minutes and it is still there. Will check in the morning when I get in.

Azure workload clusters only expose nodes' private IP

The custom command to register the workload nodes with the example clusters only exposes the nodes' private IP by default, limiting the possibilities for Ingress. Modify all docker commands with needed --address and --internal-address arguments to expose both public and private addresses.

Vagrant Virtualbox DNS problems

Hi,

when I'm starting up my rancher quickstart using vagrant, I get the following errors during startup:

    server-01: Error response from daemon: Get https://registry-1.docker.io/v2/appropriate/curl/manifests/latest: Get https://auth.docker.io/token?scope=repository%3Aappropriate%2Fcurl%3Apull&service=registry.docker.io: dial tcp: lookup auth.docker.io on 10.0.16.1:53: read udp 10.0.16.15:33107->10.0.16.1:53: i/o timeout

I tested it on ubuntu bionic and on macos mojave, Virtualbox 6.x, maybe it's because they have their own dns resolver running on the system which isn't bound to the virtualbox interfaces (just guessing, wasn't able to debug it yet).

I found the following line in the Vagrant file:

quickstart/vagrant/Vagrantfile

Line 20 in cf814dd

v.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]

("--natdnshostresolver1", "on")
After removing that line, everything worked very well. Is there a specific reason for this line or would it be possible to remove it?

Specify disk space in AWS

Is there a way to specify the size of the disk for AWS in the terraform variables? I keep getting "Kubelet has disk pressure" error message.

Rancher - AWS, RKE gives default backend - 404 error

We installed Kubernetes with RKE in our AWS environment as per the link - https://rancher.com/docs/rancher/v2.x/en/installation/ha/

All the steps worked mostly fine and the nodes are healthy in the AWS NLB. I do not see any issue with any pods. But when we hit the NLB url --> https://nlburl.amazonaws.com it gives an error/message as "default backend - 404". Same error comes up when I type within each of the nodes when i type localhost. Version and other cmd outputs shown below.

We are expecting the NLB url should give us the rancher UI for managing the kubernetes cluster.
Thoughts or inputs on if this is expected behaviour & how to debug and fix the issue and get the rancher UI loaded?

ubuntu@xxx:/tmp$ ./rke -v
rke version v0.1.14

ubuntu@xxx:/tmp$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:39:04Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}

ubuntu@xxx:/tmp$ kubectl --kubeconfig /tmp/kube_config_cluster.yml get ingress -n cattle-system -o wide
NAME HOSTS ADDRESS PORTS AGE
rancher rancher.mydomain.com 1.2.3.4,5.6.7.8,9.0.1.2 80, 443 19h

ubuntu@xxx:/tmp$ kubectl --kubeconfig /tmp/kube_config_cluster.yml get nodes
NAME STATUS ROLES AGE VERSION
1.2.3.4 Ready controlplane,etcd,worker 21h v1.11.5
5.6.7.8 Ready controlplane,etcd,worker 21h v1.11.5
9.0.1.2 Ready controlplane,etcd,worker 21h v1.11.5

ubuntu@xxx:/tmp$ kubectl --kubeconfig /tmp/kube_config_cluster.yml describe ingress -n cattle-system
Name: rancher
Namespace: cattle-system
Address: 1.2.3.4,5.6.7.8,9.0.1.2
Default backend: default-http-backend:80 ()
TLS:
tls-rancher-ingress terminates rancher.mydomain.com
Rules:
Host Path Backends

rancher.mydomain.com
rancher:80 ()
Annotations:
certmanager.k8s.io/issuer: rancher
field.cattle.io/publicEndpoints: [{"addresses":["1.2.3.4","5.6.7.8","9.0.1.2"],"port":443,"protocol":"HTTPS","serviceName":"cattle-system:rancher","ingressName":"cattle-system:rancher","hostname":"rancher.mydomain.com","allNodes":false}]
nginx.ingress.kubernetes.io/proxy-connect-timeout: 30
nginx.ingress.kubernetes.io/proxy-read-timeout: 1800
nginx.ingress.kubernetes.io/proxy-send-timeout: 1800
Events:

[email protected]:/tmp$ curl localhost
default backend - 404

AWS workload cluster only exposes node's private IP

The custom command to register the workload node with the example cluster only exposes the node's private IP by default, limiting the possibilities for Ingress. Modify the docker command with needed --address and --internal-address arguments to expose both public and private addresses.

Create container to run terraform quickstarts

Terraform-based quickstarts require the terraform binary and terraform-provider-rke to be installed where the quickstart will run.

As an alternative, create a container that contains all required tools for running the quickstart, only requiring configuration to be passed in via volumes or arguments. This container should include the terraform binary, provider binaries (official and community) and should require an argument to choose which infrastructure provider folder to deploy.

	resource "aws_security_group" "rancher_sg_allowall" {
	name = "${var.prefix}-allowall"

	ingress {
	from_port = "0"
	to_port = "0"
	protocol = "-1"
	cidr_blocks = ["0.0.0.0/0"]
	}

	egress {
	from_port = "0"
	to_port = "0"
	protocol = "-1"
	cidr_blocks = ["0.0.0.0/0"]
	}
	}