dwmkerr / terraform-aws-openshift Goto Github PK

Create infrastructure with Terraform and AWS, install OpenShift. Party!

Home Page: http://www.dwmkerr.com/get-up-and-running-with-openshift-on-aws

License: MIT License

HCL 66.63% Shell 24.30% Makefile 9.07%

terraform-aws-openshift's Introduction

terraform-aws-openshift

This project shows you how to set up OpenShift on AWS using Terraform. This the companion project to my article Get up and running with OpenShift on AWS.

I am also adding some 'recipes' which you can use to mix in more advanced features:

Recipe - Splunk

Index

Overview
Prerequisites
Creating the Cluster
Installing OpenShift
Accessing and Managing OpenShift
Connecting to the Docker Registry
Additional Configuration
Choosing the OpenShift Version
Destroying the Cluster
Makefile Commands
Pricing
Recipes
- Splunk
Troubleshooting
Developer Guide
- CI
- Linting
References

Overview

Terraform is used to create infrastructure as shown:

Once the infrastructure is set up an inventory of the system is dynamically created, which is used to install the OpenShift Origin platform on the hosts.

Prerequisites

You need:

Terraform (0.12 or greater) - brew update && brew install terraform
An AWS account, configured with the cli locally -

if [[ "$unamestr" == 'Linux' ]]; then
        dnf install -y awscli || yum install -y awscli
elif [[ "$unamestr" == 'FreeBSD' ]]; then
        brew install -y awscli
fi

Creating the Cluster

Create the infrastructure first:

# Make sure ssh agent is on, you'll need it later.
eval `ssh-agent -s`

# Create the infrastructure.
make infrastructure

You will be asked for a region to deploy in, use us-east-1 or your preferred region. You can configure the nuances of how the cluster is created in the main.tf file. Once created, you will see a message like:

$ make infrastructure
var.region
  Region to deploy the cluster into

  Enter a value: ap-southeast-1

...

Apply complete! Resources: 20 added, 0 changed, 0 destroyed.

That's it! The infrastructure is ready and you can install OpenShift. Leave about five minutes for everything to start up fully.

Installing OpenShift

To install OpenShift on the cluster, just run:

make openshift

You will be asked to accept the host key of the bastion server (this is so that the install script can be copied onto the cluster and run), just type yes and hit enter to continue.

It can take up to 30 minutes to deploy. If this fails with an ansible not found error, just run it again.

Once the setup is complete, just run:

make browse-openshift

To open a browser to admin console, use the following credentials to login:

Username: admin
Password: 123

Accessing and Managing OpenShift

There are a few ways to access and manage the OpenShift Cluster.

OpenShift Web Console

You can log into the OpenShift console by hitting the console webpage:

make browse-openshift

# the above is really just an alias for this!
open $(terraform output master-url)

The url will be something like https://a.b.c.d.xip.io:8443.

The Master Node

The master node has the OpenShift client installed and is authenticated as a cluster administrator. If you SSH onto the master node via the bastion, then you can use the OpenShift client and have full access to all projects:

$ make ssh-master # or if you prefer: ssh -t -A ec2-user@$(terraform output bastion-public_ip) ssh master.openshift.local
$ oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-d9734    1/1       Running   0          2h
registry-console-1-cm8zw   1/1       Running   0          2h
router-1-stq3d             1/1       Running   0          2h

Notice that the default project is in use and the core infrastructure components (router etc) are available.

You can also use the oadm tool to perform administrative operations:

$ oadm new-project test
Created project test

The OpenShift Client

From the OpenShift Web Console 'about' page, you can install the oc client, which gives command-line access. Once the client is installed, you can login and administer the cluster via your local machine's shell:

oc login $(terraform output master-url)

Note that you won't be able to run OpenShift administrative commands. To administer, you'll need to SSH onto the master node. Use the same credentials (admin/123) when logging through the commandline.

Connecting to the Docker Registry

The OpenShift cluster contains a Docker Registry by default. You can connect to the Docker Registry, to push and pull images directly, by following the steps below.

First, make sure you are connected to the cluster with The OpenShift Client:

oc login $(terraform output master-url)

Now check the address of the Docker Registry. Your Docker Registry url is just your master url with docker-registry-default. at the beginning:

% echo $(terraform output master-url)
https://54.85.76.73.xip.io:8443

In the example above, my registry url is https://docker-registry-default.54.85.76.73.xip.io. You can also get this url by running oc get routes -n default on the master node.

You will need to add this registry to the list of untrusted registries. The documentation for how to do this here https://docs.docker.com/registry/insecure/. On a Mac, the easiest way to do this is open the Docker Preferences, go to 'Daemon' and add the address to the list of insecure regsitries:

Finally you can log in. Your Docker Registry username is your OpenShift username (admin by default) and your password is your short-lived OpenShift login token, which you can get with oc whoami -t:

% docker login docker-registry-default.54.85.76.73.xip.io -u admin -p `oc whoami -t`
Login Succeeded

You are now logged into the registry. You can also use the registry web interface, which in the example above is at: https://registry-console-default.54.85.76.73.xip.io

Persistent Volumes

The cluster is set up with support for dynamic provisioning of AWS EBS volumes. This means that persistent volumes are supported. By default, when a user creates a PVC, an EBS volume will automatically be set up to fulfil the claim.

More details are available at:

No additional should be required for the operator to set up the cluster.

Note that dynamically provisioned EBS volumes will not be destroyed when running terrform destroy. The will have to be destroyed manuallly when bringing down the cluster.

Additional Configuration

The easiest way to configure is to change the settings in the ./inventory.template.cfg file, based on settings in the OpenShift Origin - Advanced Installation guide.

When you run make openshift, all that happens is the inventory.template.cfg is turned copied to inventory.cfg, with the correct IP addresses loaded from terraform for each node. Then the inventory is copied to the master and the setup script runs. You can see the details in the makefile.

Choosing the OpenShift Version

Currently, OKD 3.11 is installed.

To change the version, you can attempt to update the version identifier in this line of the ./install-from-bastion.sh script:

git clone -b release-3.11 https://github.com/openshift/openshift-ansible

However, this may not work if the version you change to requires a different setup. To allow people to install earlier versions, stable branches are available. Available versions are listed here.

Version	Status	Branch
3.11	Tested successfull	`release/okd-3.11`
3.10	Tested successfully	`release/okd-3.10`
3.9	Tested successfully	`release/ocp-3.9`
3.8	Untested
3.7	Untested
3.6	Tested successfully	`release/openshift-3.6`
3.5	Tested successfully	`release/openshift-3.5`

Destroying the Cluster

Bring everything down with:

terraform destroy

Resources which are dynamically provisioned by Kubernetes will not automatically be destroyed. This means that if you want to clean up the entire cluster, you must manually delete all of the EBS Volumes which have been provisioned to serve Persistent Volume Claims.

Makefile Commands

There are some commands in the makefile which make common operations a little easier:

Command	Description
`make infrastructure`	Runs the terraform commands to build the infra.
`make openshift`	Installs OpenShift on the infrastructure.
`make browse-openshift`	Opens the OpenShift console in the browser.
`make ssh-bastion`	SSH to the bastion node.
`make ssh-master`	SSH to the master node.
`make ssh-node1`	SSH to node 1.
`make ssh-node2`	SSH to node 2.
`make sample`	Creates a simple sample project.
`make lint`	Lints the terraform code.

Pricing

You'll be paying for:

1 x m4.xlarge instance
2 x t2.large instances

Recipes

Your installation can be extended with recipes.

Splunk

You can quickly add Splunk to your setup using the Splunk recipe:

To integrate with splunk, merge the recipes/splunk branch then run make splunk after creating the infrastructure and installing OpenShift:

git merge recipes/splunk
make infracture
make openshift
make splunk

There is a full guide at:

http://www.dwmkerr.com/integrating-openshift-and-splunk-for-logging/

You can quickly rip out container details from the log files with this filter:

source="/var/log/containers/counter-1-*"  | rex field=source "\/var\/log\/containers\/(?<pod>[a-zA-Z0-9-]*)_(?<namespace>[a-zA-Z0-9]*)_(?<container>[a-zA-Z0-9]*)-(?<conatinerid>[a-zA-Z0-9_]*)" | table time, host, namespace, pod, container, log

Troubleshooting

Image pull back off, Failed to pull image, unsupported schema version 2

Ugh, stupid OpenShift docker version vs registry version issue. There's a workaround. First, ssh onto the master:

$ ssh -A ec2-user@$(terraform output bastion-public_ip)

$ ssh master.openshift.local

Now elevate priviledges, enable v2 of of the registry schema and restart:

sudo su
oc set env dc/docker-registry -n default REGISTRY_MIDDLEWARE_REPOSITORY_OPENSHIFT_ACCEPTSCHEMA2=true
systemctl restart origin-master.service

You should now be able to deploy. More info here.

OpenShift Setup Issues

TASK [openshift_manage_node : Wait for Node Registration] **********************
FAILED - RETRYING: Wait for Node Registration (50 retries left).

fatal: [node2.openshift.local -> master.openshift.local]: FAILED! => {"attempts": 50, "changed": false, "failed": true, "results": {"cmd": "/bin/oc get node node2.openshift.local -o json -n default", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): nodes \"node2.openshift.local\" not found\n", "stdout": ""}, "state": "list"}
        to retry, use: --limit @/home/ec2-user/openshift-ansible/playbooks/byo/config.retry

This issue appears to be due to a bug in the kubernetes / aws cloud provider configuration, which is documented here:

#40

At this stage if the AWS generated hostnames for OpenShift nodes are specified in the inventory, then this problem should disappear. If internal DNS names are used (e.g. node1.openshift.internal) then this issue will occur.

Unable to restart service origin-master-api

Failure summary:


  1. Hosts:    ip-10-0-1-129.ec2.internal
     Play:     Configure masters
     Task:     restart master api
     Message:  Unable to restart service origin-master-api: Job for origin-master-api.service failed because the control process exited with error code. See "systemctl status origin-master-api.service" and "journalctl -xe" for details.

Developer Guide

This section is intended for those who want to update or modify the code.

CI

CircleCI 2 is used to run builds. You can run a CircleCI build locally with:

make circleci

Currently, this build will lint the code (no tests are run).

Linting

tflint is used to lint the code on the CI server. You can lint the code locally with:

make lint

References

https://www.udemy.com/openshift-enterprise-installation-and-configuration - The basic structure of the network is based on this course.
https://blog.openshift.com/openshift-container-platform-reference-architecture-implementation-guides/ - Detailed guide on high available solutions, including production grade AWS setup.
https://access.redhat.com/sites/default/files/attachments/ocp-on-gce-3.pdf - Some useful info on using the bastion for installation.
http://dustymabe.com/2016/12/07/installing-an-openshift-origin-cluster-on-fedora-25-atomic-host-part-1/ - Great guide on cluster setup.
Deploying OpenShift Container Platform 3.5 on AWS

terraform-aws-openshift's People

Contributors

Stargazers

Watchers

Forkers

xguitian xacaxulu colebrumley sufyaan-kazi ngerasimatos mtoddw greenorchid tuxpiper globallogicpractices jayunit100 ap1kenobi azipory gokulchandrap jpoley yukuansong boreal321 daniyalj rburton04 rberlind arctiqteam pinksummit tjmai slaterx jasonnic srve4 rtellamalla patricklucas rei-systems chrisbsmith crazy450 synctree mtbvang ykhadilkar donaldsimpson praveennadipi miams-reisys svarlamov zhrum maximmold rcdelacruz ronaldkonjer adithya-sn vgoldin ironicbadger alistairfay abelnieva edwardjrp stujb oeeckhoutte wagoodman victorrikhotso markjacksonfishing chainsaw-contino rayl15 opsnow-tools ryanswart vineetreynolds owen76512911 ramdhakne nhoomtham bilalyasar macie-kassh swapdisk judewin jetzlstorfer cdjg35 mrafieee collinjlesko jfe726 sathishmr hferentschik automatedit arashkaffamanesh socialskyinc kulpree b43646 pittd zoobab viniciuseduardo akniffe1 sdscgithub dirtyonekanobi octopent endajf aymenabdelwahed fossabot ntman4real smanojsrinivasan dennyjoe kevinvriens paychex peterhack drakun teralabpolska hakimo003 satishkori hayaten0415 conradf7 nadoby alkis-hexa

terraform-aws-openshift's Issues

Think of a good way to deal w/ undestroyed clusters.

I'm working on making an ephemeral infrastructure out of this - and to do that - it seems like some manual deletion needs to happen:


* module.openshift.aws_vpc.openshift: 1 error(s) occurred:

* aws_vpc.openshift: Error creating VPC: VpcLimitExceeded: The maximum number of VPCs has been reached.
        status code: 400, request id: 9eefbbe2-1609-4ae9-a059-4fecdbdf4d6e
* module.openshift.aws_iam_policy.openshift-policy-forward-logs: 1 error(s) occurred:

* aws_iam_policy.openshift-policy-forward-logs: Error creating IAM policy openshift-instance-forward-logs: EntityAlreadyExists: A policy called openshift-instance-forward-logs already exists. Duplicate names are not allowed.
        status code: 409, request id: 70c8c246-b28c-11e7-a4db-f701a4b20913
* module.openshift.aws_iam_role.openshift-instance-role: 1 error(s) occurred:

* aws_iam_role.openshift-instance-role: Error creating IAM Role openshift-instance-role: EntityAlreadyExists: Role with name openshift-instance-role already exists.
        status code: 409, request id: 70c9857f-b28c-11e7-9db0-3532c5b1a3f0
* module.openshift.aws_key_pair.keypair: 1 error(s) occurred:

* aws_key_pair.keypair: Error import KeyPair: InvalidKeyPair.Duplicate: The keypair 'openshift' already exists.
        status code: 400, request id: 41bfcbea-5147-4585-a75b-cbaa8deac27a

It seems like we can't necessarily always rely on terraform destroy, because that command expects that terraform at least ran once successfully, locally, in order to know what needs to be destroyed.
Maybe we can make completely new resource names everytime ?

Would be nice if there was a concept of AWS namespaces we could use for cleaner global deletion.

feat: create issue template

It would be useful to have an issue template allowing people to specify the openshift version, and do some sanity checks (e.g. re-running make openshift).

Support Internal DNS Hostnames for OpenShift Nodes

If we set the ec2 instance hostnames then we'll get better logs and info when ssh-ing (we're nearly there, with internal DNS setup, adding the hostname is the last part)

Use a templated sshconfig using SSH ProxyJump instead?

Hi,

I just make a simple sshconfig file from a template which uses the ProxyJump feature of SSH:

https://wiki.gentoo.org/wiki/SSH_jump_host

The hardcoded sshconfig file looks like this:

$ cat sshconfig
Host *
    StrictHostKeyChecking no
    UserKnownHostsFile=/dev/null
    LogLevel QUIET

Host bastion
    Hostname 100.24.1.3
    User ec2-user
    IdentityFile /home/centos/.ssh/id_rsa
    ForwardAgent yes

Host master
    Hostname master.openshift.local
    ProxyJump bastion
    User ec2-user

Host node1
    Hostname node1.openshift.local
    ProxyJump bastion
    User ec2-user

Host node2
    Hostname node2.openshift.local
    ProxyJump bastion
    User ec2-user

To ssh to the master, bastion, node1, node2:

$ ssh -F sshconfig master
$ ssh -F sshconfig bastion
$ ssh -F sshconfig node1
$ ssh -F sshconfig node2

To what I can figure out, the "ForwardAgent yes" seems to do the job to add automatically the key to the ssh-agent, which I found fragile right now.

The 2 items to template are the Hostname and the location of the SSH key.

What do you think?

Can I make a PR to template that dynamically and replace parts of the makefile?

No OpenShift version available, please ensure your systems are fully registered and have access to appropriate yum repositories

I changed the deployment_type=openshift-enterprise
I manually logged into the master,node1, and node2 and registered with redhat and attached a pool ID with openshift subscriptions.

Is there a manual way for me to verify I have the subscription I need? Here are the ones associated with the pool ID I used:

Red Hat OpenShift Container Platform Broker/Master Infrastructure

Red Hat OpenShift Enterprise Infrastructure
Red Hat OpenShift Container Platform
Red Hat OpenShift Enterprise Client Tools

Any other ideas on what my issue may be?

Thanks!

Note - I was able to successfully run this with deployment type origin

Incorporate Latest Terraform Interpolation Syntax

The changes below will clean up the handling of tags:

hashicorp/terraform#14516 (comment)

This will be in v0.12

Pause OpenShift Installation until instances are ready

A common source of errors is to run:

make openshift

While the EC2 instances are still in an 'initilialsing' state. If we poll until the instances report as ready, it makes the installation more seamless and automation friendly. Likely needed for #38.

Broker API

Does the broker api not get created/started during the initialization process on the master? I get a 403 when trying to call the endpoint /broker/rest/api. Is there a command to start up the broker or can the script be modified to include that service?

Where is the openshift.pem key stored?

openshift_hostname is replaced by Removed

I'm doing a clean install on AWS following the steps in the README. make infrastructure succeeds. make openshift fails with the following error:

Failure summary:


  1. Hosts:    ip-REDACTED.us-west-2.compute.internal
     Play:     Verify Requirements
     Task:     Run variable sanity checks
     Message:  last_checked_host: ip-REDACTEDus-west-2.compute.internal, last_checked_var: openshift_master_manage_htpasswd;Found removed variables: openshift_hostname is replaced by Removed: See documentation; 
make: *** [openshift] Error 2

I just cloned the master branch of this repo today:

git describe
v3.10-6-g4f3dd43

Any idea what is causing this?

okd release-3.11

How to support 3.11? Just changing version in the install-from-bastion.sh is not enough. Anything else to change?

Install to an existing VPC

Thank you for this. I am new to openshift/k8, and still pretty new to AWS. Using terraform to bootstrap this is so helpful.

I have a need to create the openshift cluster in a pre-existing vpc and subnet.
I am wondering what the best method is to bypass the vpc creation and pass the vpc and subnet ids as a variable

Public DNS is being re-written

When a valid public DNS such as xip.io is used, it is rewritten. E.g:

openshifting.com:8443

Becomes:

https://ec2-174-129-157-151.compute-1.amazonaws.com:8443

Would be good if we can use the same DNS consistently (is this just config in the template file?)

Multiple OpenShift Environments in one AWS-Account

Hi,
I try to setup an OpenShift Environment for Training purposes: cluster installation, generating project etc.

Thankfully I found the terraform-aws-openshift project :)

The idea I generated with Dave is just to copy the code in different folders and change the region as well as the cluster_name and cluster_id in the main.tf

So far so good, if you now want to make a new infrastructure ('make infrastructure') the following errors occured:

`3 error(s) occurred:

module.openshift.aws_iam_role.openshift-instance-role: 1 error(s) occurred:
aws_iam_role.openshift-instance-role: Error creating IAM Role openshift-instance-role: EntityAlreadyExists: Role with name openshift-instance-role already exists.
status code: 409, request id: d0a10ff1-56d1-11e8-8d7f-6372f8cf09fc
module.openshift.aws_iam_policy.openshift-policy-forward-logs: 1 error(s) occurred:
aws_iam_policy.openshift-policy-forward-logs: Error creating IAM policy openshift-instance-forward-logs: EntityAlreadyExists: A policy called openshift-instance-forward-logs already exists. Duplicate names are not allowed.
status code: 409, request id: d09e5161-56d1-11e8-963f-6d117c496f53
module.openshift.aws_iam_user.openshift-aws-user: 1 error(s) occurred:
aws_iam_user.openshift-aws-user: Error creating IAM User openshift-aws-user: EntityAlreadyExists: User with name openshift-aws-user already exists.
status code: 409, request id: d09e50c9-56d1-11e8-8d7f-6372f8cf09fc`

I bolded the interesting passages which says that the IAM role, policy and user already exist.

Does anyone has an idea or an efficient way how to deal with that?
Is it possible to reuse those IAM roles, policies and users?
Or should I rename them in the modules?

Thanks a lot and best regards!

Failed to pull image "<ip.xip.io/project/name:version": rpc error: x509: certificate signed by unknown authority

While create an app through oc create specifying yaml file which having images path is getting fail and getting below exception , how to resolve this.

Also docker gets stuck during restart. nothing comes up , whats the perfect way.

Failed to pull image "docker-registry-default.ip.xip.io/project/name": rpc error: code = Unknown desc = Get https:/docker-registry-default.ip.xip.io/project/name : x509: certificate signed by unknown authority

replacing self-signed cert to letsencrypt

Hi, have you tried using letsencrypt? If yes, pls help what's the best way to set it up on master node.

Add README for deleting instance profile .

When re-creating, you need to do this:

 aws iam delete-instance-profile --instance-profile-name openshift-instance-profile

Because I think the instance-profiles otherwise cause a terraform error.

Maybe add it to the shell script or something?

Note: otherwise this is a hard error to fix, bc instance-profiles dont show up in the AWS console :)

feat: OKD 3.12

AWS / Kubernetes: Internal DNS is not supported

Note

This may be fixed with the latest version (3.9 at the time of writing) but needs to be tested.

Details

When we use the AWS Cloud Provider (which is required for Persistent Volumes (see #33)), we lose the ability to name our nodes, e.g:

[masters]
master.openshift.local openshift_hostname=master.openshift.local

# host group for etcd
[etcd]
master.openshift.local openshift_hostname=master.openshift.local

# host group for nodes, includes region info
[nodes]
master.openshift.local openshift_hostname=master.openshift.local openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_schedulable=true
node1.openshift.local openshift_hostname=node1.openshift.local openshift_node_labels="{'region': 'primary', 'zone': 'east'}"
node2.openshift.local openshift_hostname=node2.openshift.local openshift_node_labels="{'region': 'primary', 'zone': 'west'}"

Becomes:

[masters]
ip-10-0-1-31.ec2.internal openshift_hostname=ip-10-0-1-31.ec2.internal

# host group for etcd
[etcd]
ip-10-0-1-31.ec2.internal openshift_hostname=ip-10-0-1-31.ec2.internal

# host group for nodes, includes region info
[nodes]
ip-10-0-1-31.ec2.internal openshift_hostname=ip-10-0-1-31.ec2.internal openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_schedulable=true
ip-10-0-1-91.ec2.internal openshift_hostname=ip-10-0-1-91.ec2.internal openshift_node_labels="{'region': 'primary', 'zone': 'east'}"
ip-10-0-1-91.ec2.internal openshift_hostname=ip-10-0-1-91.ec2.internal openshift_node_labels="{'region': 'primary', 'zone': 'west'}"

This does not cause any functional problems, but is frustrating for users as it makes it hard to identify nodes.

The root cause seems to be:

kubernetes/kubernetes#11543

The following issue is also related:

openshift/openshift-ansible#5692

Openshift (Docker) Registry

Hi Dave,

it's me Rheza, from Jenius.
Will be good if we have docker registry in this setup. We can specify it easily with S3 as storage in ansible inventory.

Consider Azure Support

A few colleagues have mentioned it would be useful, worth considering.

var.region is requested when running terraform destroy

Looks like sometimes, if create fails, destroy winds up prompting you ... about where the cluster should be "deployed into".

[root@shared-dev terraform-aws-openshift]# terraform destroy
var.region
  Region to deploy the cluster into

  Enter a value:

This is fixable by adding the region= thingy to terraform.tfvars.

Not sure if there is another way to serialize this file out earlier in the apply process ? Or if there is something specific about our terraform setup that is causing it to ask that question (using Terraform v0.10.7).

Apologies if this is more a terraform then terraform-aws-openshift issue :)

Move Splunk into Master

The splunk code is becoming stale in a PR. It should be moved to master and just toggled on/off in code.

feat: OKD 3.13

Add Support for NVMe EC2 Instances (m5, c5)

As OpenShift 3.9 is now fully compatible with ec2 m5 instances - those can be used in the configuration, after this proposed setup-master.sh and setup-node-sh update:

Replace

cat <<EOF > /etc/sysconfig/docker-storage-setup
DEVS=/dev/xvdf
VG=docker-vg
EOF

with

DEVICE_NAME=xvdf
lsblk | grep -q "xvdf"
if [ $? -ne 0 ]; then
   DEVICE_NAME=nvme1n1
fi   
cat <<EOF > /etc/sysconfig/docker-storage-setup
DEVS=/dev/$DEVICE_NAME
VG=docker-vg
EOF

Incorrect Security Group Specification forces EC2 instance recreation

Using security_groups instead of vpc_security_group_ids means that we need to recreate all EC2 instances unnecessarily when reapplying the infrastructure. See #27 for details:

#27 (comment)

Use Elastic IPs for the various instances

If Elastic IPs are used for each instance, the whole cluster can be stopped and restarted without any issues. This allows to create the cluster once and only start it when needed.

Atm, a restart of the cluster leads new IPs getting assigned to the instances, making it hard to ssh into the machine and even worse breaking the public route of the master.

Since aws_eip does not have a public DNS attribute (see hashicorp/terraform-provider-aws#1149) and since the DNS attribute of the instance does not get updated once the Elastic IP is assigned, one needs for now to remove the use of public DNS names of the nodes. This does not have any impact on the functionality of the cluster though.

Stopping and starting the EC2 instances (master and nodes)

Hi,

This is more a question than an issue.

Can I shutdown/start the EC2 instances without any bad surprise? What are the consequences I should be aware of? Of course, I could try on my own but I've already got some projects on my OpenShift installation.

I am asking because of following concern: OpenShift installation done by this project generates monthly costs of around 500$. There are periods of time where we're NOT using the installation and, to save some money, we would like to stop the EC2 instances.

Is that recommendable? Many thanks and cheers,

christian

Support usage as a Terraform Module

Basically I'd like to use this as part of the infrastructure I manage with Terraform. The idiomatic way of doing this is to use it as a module e.g.

module "openshift" {
  source = "github.com/dwmkerr/terraform-aws-openshift//modules/openshift"
}

However there are a few problems with this currently:

The ansible provisioning stuff is run through a Makefile, not immediately usable when this repo is used as a module. Ideally for use as a terraform module it would be managed via Terraform either by userdata bootstrapping or local execution through a TF provision block.
Admin creds, cluster size, node tags, and many other aspects are hard coded and not configurable without editing the TF code or ansible config file directly. This necessitates maintaining a separate fork for each environment I want to use this TF setup for.

I'm interested in doing some work to support usage of this repository as a module but would like to know if this is worthwhile in the maintainer's view and start a general discussion. The necessary changes would be fairly large and change a lot of the fundamentals of how this setup is implemented.

Support easily customisable DNS

It would be nice to support integrating a domain name, e.g. if I've bought something like test.com, where can I update the code to make sure test.com is used? See also #6.

Would like to get this working with OpenShift Origin 3.7

When I try to use release-3.7 or release-3.7.0day by specifying them in the git clone command inside install-from-bastion.sh, I end up getting error at the end of the Ansible run:

TASK [template_service_broker : Reconcile with RBAC file] **********************
fatal: [master.openshift.local]: FAILED! => {"changed": true, "cmd": "oc process -f "/tmp/tsb-ansible-keZijh/rbac-template.yaml" | oc auth reconcile -f -", "delta": "0:00:00.285904", "end": "2017-11-29 12:45:42.125009", "failed": true, "rc": 1, "start": "2017-11-29 12:45:41.839105", "stderr": "Error: unknown shorthand flag: 'f' in -f\n\n\nUsage:\n oc auth [options]\n\nAvailable Commands:\n can-i Check whether an action is allowed\n\nUse "oc --help" for more information about a given command.\nUse "oc options" for a list of global command-line options (applies to all commands).", "stderr_lines": ["Error: unknown shorthand flag: 'f' in -f", "", "", "Usage:", " oc auth [options]", "", "Available Commands:", " can-i Check whether an action is allowed", "", "Use "oc --help" for more information about a given command.", "Use "oc options" for a list of global command-line options (applies to all commands)."], "stdout": "", "stdout_lines": []}
to retry, use: --limit @/home/ec2-user/openshift-ansible/playbooks/byo/config.retry

This seems related to openshift/openshift-ansible#6086.

I tried fixing the commit to 56b529e (which someone on that ticket said fixed the problem) by running git checkout 56b529e after the git clone command, but I got the same error.

Can anyone suggest a workaround to get this working with OpenShift Origin 3.7? The problem is not with Terraform itself, but with the openshift-ansible code.

Make default instance size smaller to reduce costs

The default instance sizing is production grade, but expensive for those who just want to try out OpenShift. Suggest making the default size smaller, with clear instructions on how to make it larger for prod

Can this support version 3.7?

Use the requirements.txt instead of hardcoding the version of ansible?

In the following section of install-from-bastion.sh, you specify by hand which version of ansible to use, which is different by version of openshift-ansible:

# Get the OpenShift 3.10 installer.
pip install -I ansible==2.6.5
git clone -b release-3.10 https://github.com/openshift/openshift-ansible

# Get the OpenShift 3.9 installer.
# pip install -I ansible==2.4.3.0
# git clone -b release-3.9 https://github.com/openshift/openshift-ansible

# Get the OpenShift 3.7 installer.
# pip install -Iv ansible==2.4.1.0
# git clone -b release-3.7 https://github.com/openshift/openshift-ansible

# Get the OpenShift 3.6 installer.
# pip install -Iv ansible==2.3.0.0
# git clone -b release-3.6 https://github.com/openshift/openshift-ansible

Would not it be better to use at least the requirements.txt shipped with openshift-ansible? And even further, in a venv.

deploy with openshift release-3.6 not found playbooks/deploy_cluster.yml

in openshift git repo:
https://github.com/openshift/openshift-ansible

in branche 3.6 it hase no playbooks/deploy_cluster.yml

and install_from_bastion.sh

# Run the playbook.
ANSIBLE_HOST_KEY_CHECKING=False /usr/local/bin/ansible-playbook -i ./inventory.cfg ./openshift-ansible/playbooks/prerequisites.yml
ANSIBLE_HOST_KEY_CHECKING=False /usr/local/bin/ansible-playbook -i ./inventory.cfg ./openshift-ansible/playbooks/deploy_cluster.yml

bring me the error.

i saw in readme v3.6 is tested

'make ssh-[master|node1|node2]' does no longer work

Hi,
For some reason I currently do not understand, I'm no longer able to directly SSH access the master resp. the nodes. make ssh-master does no longer work (it used to). ssh master.openshift.local from the bastion does not work neither. However my OpenShift installation works perfectly.
I see that the EC2 instances are using the openshift key pair name. However I do not own this key (has this key been stored somewhere on the bastion?). How could I now access the master resp. the nodes? Is there a documented or non-documented to do it?
Thanks and cheers,
christian

Creating Loadbalancer fails because of multiple tagged security groups

Hi,

When I'm creating a service as type of LoadBalancer I'm getting following error. It gives the same error for both worker node. I couldn't figure out what is multiple tagged security groups. Anyone having same issue or fix for it?

Error creating load balancer (will retry): failed to ensure load balancer for service database/primeapps: Multiple tagged security groups found for instance i-0f5333888818a6f50; ensure only the k8s security group is tagged

Solution should be similar to this: coreos/tectonic-installer#243

Support for OKD 3.10, first run fails, second run works

Some minor changes needs to be done to provide support for OKD 3.10, the main changes in inventory.template.cfg are:

openshift_release=v3.10

# Changed for OpenShift 3.10 (filename not needed)
# https://bugzilla.redhat.com/show_bug.cgi?id=1565447

# openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]

openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]

# Define node groups
openshift_node_groups=[{'name': 'node-config-master', 'labels': ['node-role.kubernetes.io/master=true']}, {'name': 'node-config-infra', 'labels': ['node-role.kubernetes.io/infra=true']}, {'name': 'node-config-compute', 'labels': ['node-role.kubernetes.io/compute=true']}]

# host group for nodes, includes region info
[nodes]
${master_hostname} openshift_hostname=${master_hostname} openshift_node_group_name='node-config-master' openshift_schedulable=true
${node1_hostname} openshift_hostname=${node1_hostname} openshift_node_group_name='node-config-compute'
${node2_hostname} openshift_hostname=${node2_hostname} openshift_node_group_name='node-config-compute'

and in install-from-bastion.sh set the branch to release-3.10:

git clone -b release-3.10 https://github.com/openshift/openshift-ansible

But after the first run the following failure summary is shown, but the second run succeeds:

TASK [openshift_storage_glusterfs : load kernel modules] ***********************
fatal: [ip-10-0-1-154.eu-central-1.compute.internal]: FAILED! => {"changed": false, "msg": "Unable to restart service systemd-modules-load.service: Job for systemd-modules-load.service failed because the control process exited with error code. See \"systemctl status systemd-modules-load.service\" and \"journalctl -xe\" for details.\n"}
fatal: [ip-10-0-1-29.eu-central-1.compute.internal]: FAILED! => {"changed": false, "msg": "Unable to restart service systemd-modules-load.service: Job for systemd-modules-load.service failed because the control process exited with error code. See \"systemctl status systemd-modules-load.service\" and \"journalctl -xe\" for details.\n"}
fatal: [ip-10-0-1-123.eu-central-1.compute.internal]: FAILED! => {"changed": false, "msg": "Unable to restart service systemd-modules-load.service: Job for systemd-modules-load.service failed because the control process exited with error code. See \"systemctl status systemd-modules-load.service\" and \"journalctl -xe\" for details.\n"}

RUNNING HANDLER [openshift_node : reload systemd units] ************************
	to retry, use: --limit @/home/ec2-user/openshift-ansible/playbooks/deploy_cluster.retry

PLAY RECAP *********************************************************************
ip-10-0-1-123.eu-central-1.compute.internal : ok=103  changed=51   unreachable=0    failed=1
ip-10-0-1-154.eu-central-1.compute.internal : ok=128  changed=51   unreachable=0    failed=1
ip-10-0-1-29.eu-central-1.compute.internal : ok=103  changed=51   unreachable=0    failed=1
localhost                  : ok=12   changed=0    unreachable=0    failed=0


INSTALLER STATUS ***************************************************************
Initialization              : Complete (0:00:17)
Health Check                : Complete (0:00:38)
Node Bootstrap Preparation  : In Progress (0:02:18)
	This phase can be restarted by running: playbooks/openshift-node/bootstrap.yml

Failure summary:

  1. Hosts:    ip-10-0-1-123.eu-central-1.compute.internal, ip-10-0-1-154.eu-central-1.compute.internal, ip-10-0-1-29.eu-central-1.compute.internal
     Play:     Configure nodes
     Task:     load kernel modules
     Message:  Unable to restart service systemd-modules-load.service: Job for systemd-modules-load.service failed because the control process exited with error code. See "systemctl status systemd-modules-load.service" and "journalctl -xe" for details.

make: *** [openshift] Error 2

Could one confirm this behaviour on his / her side?

It seems this issue #40 was already reported:

feat: support CentOs 7.5

See the chain on #70

Support Persistent Volumes

Two options:

a. AWS EFS. Easy, but only available in a few regions
b. AWS EBS. Harder, single AZ only, but global.

Probably go for the first one to start with

Could not open to access the console in the browser

Hi Dave,
Able to install openshift and could verify that pods are running but could not access the
console as the installation did not promt the console URL at the end.

further trying this, following error comes -
[[email protected] terraform-aws-openshift]$ make browse-openshift
open $(terraform output master-url)
Couldn't get a file descriptor referring to the console
[[email protected] terraform-aws-openshift]$

Could you please let us know how to access the console in the browser?

Add htpasswd_auth Authentication

We should create an admin user by default, see:

https://docs.openshift.com/enterprise/3.0/install_config/install/advanced_install.html

module.openshift.aws_iam_instance_profile.openshift-instance-profile: "roles": [DEPRECATED] Use `role` instead. Only a single role can be passed to an IAM Instance Profile

Hi Dave,
Can you please check the below error.

Error : Error applying plan:

1 error(s) occurred:

module.openshift.aws_iam_instance_profile.openshift-instance-profile: 1 error(s) occurred:
aws_iam_instance_profile.openshift-instance-profile: Error creating IAM instance profile openshift-instance-profile: EntityAlreadyExists: Instance Profile openshift-instance-profile already exists.
status code: 409, request id: 36715f10-5cf2-11e7-b146-e900a1ed359e

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

Add a router

It Looks like no routers are being configured in these installations ?

Secrets are not working for each node

It seems there might be an issue with the way the nodes are created. When setting up a new cluster with the scripts, the second node runs successfully with pull secrets, but the first doesn't...

Unable to log in to the Web console with default password

Hi,

I am unable to log in to the OKD web console using the username: admin and the password: 123.

Add CONTRIBUTORS.md

Would be a good to add an auto-generated CONTRIBUTOR list.

Update to openshift 1.5

Openshift 1.5 is now avaliable.

Would it be possible to add AWS ALB and make it route to LoadBalancer services?

I was able to deploy some pods and services, but found that I had to expose my service (type LoadBalancer) with an OpenShift Route in order to access it from the internet even though it was assigned a public IP. When running the same pods and services on GKE and ACS, I did not have to create a route. I believe that the provisioning of the k8s clusters in those managed k8s services (which I did with Terraform) probably creates some sort of load balancer.

I was wondering if your Terraform code could be extended to add an AWS Application Load Balancer (ALB) and associated listeners, rules, and target groups and then configure them to route to public IPs of k8s LoadBalancer services created in the OpenShift cluster. Provisioning them could be done with Terraform's aws_alb resource (https://www.terraform.io/docs/providers/aws/d/lb.html), but I'm not sure how one would make the ALB actually talk to the services.

The first run of the install script fails with ansible-playbook: command not found

Run the install script, the first time you'll see:

 Running setup.py install for PyYAML
    Running command /usr/bin/python2.7 -c "import setuptools, tokenize;__file__='/tmp/pip-build-nxCpVT/PyYAML/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-dMThSr-record/install-record.txt --single-version-externally-managed --compile
  Running setup.py install for pycrypto
    Running command /usr/bin/python2.7 -c "import setuptools, tokenize;__file__='/tmp/pip-build-nxCpVT/pycrypto/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-j7w1_Q-record/install-record.txt --single-version-externally-managed --compile
  Running setup.py install for ansible
    Running command /usr/bin/python2.7 -c "import setuptools, tokenize;__file__='/tmp/pip-build-nxCpVT/ansible/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-gItayt-record/install-record.txt --single-version-externally-managed --compile
Successfully installed MarkupSafe-0.23 PyYAML-3.12 ansible-2.2.0.0 appdirs-1.4.0 cffi-1.9.1 cryptography-1.7.2 enum34-1.1.6 idna-2.2 ipaddress-1.0.18 jinja2-2.9.5 packaging-16.8 paramiko-2.1.1 pyasn1-0.1.9 pycparser-2.17 pycrypto-2.6.1 pyparsing-2.1.10 setuptools-34.1.0 six-1.10.0
Cleaning up...
Cloning into 'openshift-ansible'...
bash: line 10: ansible-playbook: command not found

Run it again - no problems. Annoying.