terraform-google-modules / terraform-google-kubernetes-engine Goto Github PK

View Code? Open in Web Editor NEW

1.1K 46.0 1.1K 4.93 MB

Configures opinionated GKE clusters

Home Page: https://registry.terraform.io/modules/terraform-google-modules/kubernetes-engine/google

License: Apache License 2.0

Makefile 0.32% HCL 86.07% Python 1.86% Shell 2.16% Ruby 4.48% Smarty 0.03% Go 5.08%

cft-terraform compute containers

terraform-google-kubernetes-engine's People

Stargazers

Watchers

Forkers

stratospire pratikmallya dirakx1 sravankumarch jberlinsky sunggun-yu crayfishx rohit1675 trotttrotttrott danisla miles- precocity-labs freexploit egnyte jaxxstorm worldsibu devlware ketchoop ogreface nstogner inkaabi nicollette graidelak brandonjbjelland gkocur edward2a awolde imajnashn emalloy injectives yisusfyg nixsticks globallogicpractices arvindpai foobaar lantier dkozlov danieldeleo thefirstofthe300 tfmenard hangnt1001 haineko3 seanmichaeljoy marienor gabrielrossanezi tmc babypoint justinafgraham marko7460 mencywoo mauna-ai isaac88 ipv1337 deviavir gonzoflip ctrox joseret santoshkatageri neowaylabs zouhairboukarkour laymer daviddl9 dhirenshumsher alexkonkin pradeeppandeyy dmyerscough bohdanyurov-gl nephosolutions vivekthakker chrislovecnm ivankorn subatomichero google38438 ianpackard 38438-38438-org viglesiasce dev25 chuckwilliams10 paulpalamarchuk kevensen sambab4u sureshinuguru kadirtaskiran jbennett7 jkantihub richardmcsong sylvioneto bbenzikry jkeith39 atlaspilotpuppy robeferre-leroy ingwarr alphab18 julianvmodesto codeinstall aparajita111 asproul scn2016 vsheffer ghaliba3

terraform-google-kubernetes-engine's Issues

No self link, apply is failing

We're following almost exactly the readme spec and terraform plan works fine, but when we run apply we get this error: * module.gke.google_container_cluster.primary: Resource 'data.google_compute_subnetwork.gke_subnetwork' not found for variable 'data.google_compute_subnetwork.gke_subnetwork.self_link'

Is this a versioning problem on our end maybe? We've tried going through the other issues and the readme but have struggled to find the source of our problem. for the provider we have:

provider "google-beta" {
  project     = "project-name"
  region      = "region-name"
}

And in main.tf we define the module with that specified provider:

module "gke" {
  providers {
    google = "google-beta"
  }
...

Any help is much appreciated, thanks.

Documentation generation broken

As of 0f780cc make generate_docs fails with

2019/03/22 10:23:51 At 17:1: expected: IDENT | STRING got: LBRACE

Add support for "guest_accelerator" in container_node_pool.node_config

https://www.terraform.io/docs/providers/google/r/container_node_pool.html#node_config
https://www.terraform.io/docs/providers/google/r/container_cluster.html#guest_accelerator

Zones should be optional

Explicitly choosing zones should be optional for regional clusters.

Create a service account for nodes if one isn't provided.

We need a holistic solution here which permanently removes the dependency on the default service account. Including:

Adding a top-level variable of service_account which accepts three values:
a. the email of a custom Service Account,
b. default-compute (the default compute service account), or
c. create - automatically creates a service account for use

This top-level service account will be default for all node pools which don't explicitly provided.

These flags can optionally be implemented incrementally.

Autoscaling cannot be disabled

It appears that users must either specify min_node_count and max_node_count or have them default to 1 and 100; the autoscaling block is always created. Is this by design, or might we in future be able to specify a static node count and disable autoscaling?

Default GKE version failing

The default version for GKE master (1.10.6-gke.2) has been deprecated.

Currently running with the default version throws:

* google_container_cluster.primary: googleapi: Error 400: Master version "1.10.6-gke.2" is unsupported., badRequest

Suggest `network_policy` be enabled by default

https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/master/autogen/variables.tf#L106

Suggest enabling it by default on newly created clusters by default. Recommended by the CIS GCP Benchmark to be enabled. See: https://www.cisecurity.org/benchmark/google_cloud_computing_platform/

Pros: Allows for support of NetworkPolicy objects if they are applied without having to modify the cluster.
Cons: The slight overhead of Calico agents and Typha in the cluster if NetworkPolicy is unused.

Make node_pool_labels, node_pool_tags, and node_pool_taints all optional

Desired flow:

Defaults are available for the entire variable.
I can override the variable for one node pool without having to set a value for the whole variable.

Enable easier shared VPC usage

Ran into this issue with creating a GKE cluster.

`1 error(s) occurred:

module.prod-gke-cluster.google_container_cluster.primary: 1 error(s) occurred:
google_container_cluster.primary: googleapi: Error 404: Not found: GAIA email lookup., notFound`

TF Debug output in attached file
debug.log

module "dev-gke-cluster" {
  source = "github.com/terraform-google-modules/terraform-google-kubernetes-engine"
  name = "gke-dev"
  kubernetes_version = "latest"
  project_id = "${google_project.dev-project.project_id}"
  region = "${var.region}"
  network = "${google_compute_network.dev-network.name}"
  subnetwork = "${google_compute_subnetwork.dev-app-subnet.name}"
  network_project_id = "${google_compute_shared_vpc_host_project.shared_vpc.project}"
  ip_range_pods = "${google_compute_subnetwork.dev-app-subnet.secondary_ip_range.0.range_name}"
  ip_range_services = "${google_compute_subnetwork.dev-app-subnet.secondary_ip_range.1.range_name}"
  regional = true
  horizontal_pod_autoscaling = true
  network_policy = true 
  master_authorized_networks_config = [{
    cidr_blocks = [{ cidr_block = "0.0.0.0/0" display_name = "all" }]  
  }]
  node_pools = [
    {
      name = "default-node-pool"
      machine_type    = "n1-standard-2"
      min_count = 1
      max_count = 3
      disk_size_gb = 100
      disk_type = "pd-standard"
      image_type = "COS"
      auto_repair = true
      auto_upgrade = true
    },
  ]
}

Add support for cluster labels

https://www.terraform.io/docs/providers/google/r/container_cluster.html#resource_labels

Setup gcloud credentials when running interactively within docker

Pushing this out to a separate issue so that we can get #20 merged.

When using make docker_run, we should source the test config so that CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE is set in the shell environment.

diff --git i/Makefile w/Makefile
index 6e16919..73ff8bf 100644
--- i/Makefile
+++ w/Makefile
@@ -119,7 +119,7 @@ docker_run:
        docker run --rm -it \
                -v $(CURDIR):/cftk/workdir \
                ${DOCKER_IMAGE_KITCHEN_TERRAFORM}:${DOCKER_TAG_KITCHEN_TERRAFORM} \
-               /bin/bash
+               /bin/bash --rcfile ${TEST_CONFIG_FILE_LOCATION}

 .PHONY: docker_create
 docker_create: docker_build_terraform docker_build_kitchen_terraform

Alternately, we can set the environment variables within the tests, as such:

ENV['CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE'] = File.expand_path(
  File.join("../..", credentials_path),
  __FILE__)

Though this feels somewhat brittle.

Error: "node_config.0.taint" - module doesn't work with 2.0.0 google provider

Error: module.demo-euw1.google_container_node_pool.zonal_pools: "node_config.0.taint": [REMOVED] This field is in beta. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.

Error: module.test-euw1.google_container_node_pool.zonal_pools: "node_config.0.taint": [REMOVED] This field is in beta. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.

Deprecation warnings from 1.20.0 google provider became errors for 2.0.0, as expected. To fix this we might need to change the way providers are defined inside the module, right? Any quick fix for such thing?

Can not use dynamic Service Account

I am trying to use this module, based on the provided examples, but can't seem to get it to work. It used to be fine few days ago, but not anymore.

Here is the error I get:

Warning: module.gke-cluster.google_container_cluster.primary: "region": [DEPRECATED] This field is in beta and will be removed from this provider. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.


Warning: module.gke-cluster.google_container_node_pool.pools: "node_config.0.taint": [DEPRECATED] This field is in beta and will be removed from this provider. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.



Warning: module.gke-cluster.google_container_node_pool.pools: "region": [DEPRECATED] This field is in beta and will be removed from this provider. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.



Warning: module.gke-cluster.google_container_node_pool.zonal_pools: "node_config.0.taint": [DEPRECATED] This field is in beta and will be removed from this provider. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.



Warning: module.project.google_project.project: "app_engine": [DEPRECATED] Use the google_app_engine_application resource instead.



Error: module.gke-cluster.google_container_node_pool.pools: node_config.0.tags: should be a list



Error: module.gke-cluster.google_container_node_pool.pools: node_config.0.taint: should be a list



Error: module.gke-cluster.google_container_node_pool.zonal_pools: node_config.0.tags: should be a list



Error: module.gke-cluster.google_container_node_pool.zonal_pools: node_config.0.taint: should be a list

And here is the terraform used:

  source                     = "github.com/terraform-google-modules/terraform-google-kubernetes-engine"
  project_id                 = "${local.project_id}"
  name                       = "${local.gke_cluster_name}"
  network                    = "${local.network_name}"
  subnetwork                 = "${local.subnetwork_name}"
  region                     = "${var.default_region}"
  zones                      = "${var.default_zones}"
  ip_range_pods              = "${var.default_region}-gke-01-pods"
  ip_range_services          = "${var.default_region}-gke-01-services"
  http_load_balancing        = true
  horizontal_pod_autoscaling = true
  kubernetes_dashboard       = true
  network_policy             = true
  kubernetes_version         = "1.10.6-gke.6"



  node_pools = [
    {
      name            = "default-node-pool"
      machine_type    = "${var.node_pool_machine_type}"
      min_count       = 1
      max_count       = 10
      disk_size_gb    = 100
      disk_type       = "pd-standard"
      image_type      = "COS"
      auto_repair     = true
      auto_upgrade    = true
      service_account = "${module.project.service_account_name}"
    },
  ]

  node_pools_labels = {
    all = {}

    default-node-pool = {
      default-node-pool = "true"
    }
  }

  node_pools_taints = {
    all = []

    default-node-pool = [
      {
        key    = "default-node-pool"
        value  = "true"
        effect = "PREFER_NO_SCHEDULE"
      },
    ]
  }

  node_pools_tags = {
    all = []

    default-node-pool = [
      "default-node-pool",
    ]
  }
}

Double converge required for network policy to work

As discussed in #68 (comment), running kitchen converge twice is currently necessary to enable network policy/Calico.

This is an upstream issue in the API itself.

Forced recreating node_pool at any plan

Once a simple zonal cluster with a node_pool is correctly created, if I run again a terraform apply without any changes, terraform want to destroy and recreate cluster and node_pool.

This is my configuration:

module "kubernetes-cluster" {
  source  = "terraform-google-modules/kubernetes-engine/google"
  version = "0.4.0"
  project_id         = "${var.project_id}"
  name               = "internal-cluster"
  regional           = false
  region             = "${var.region}"
  zones              = ["${var.zone}"]
  network            = "${var.network_name}"
  subnetwork         = "${var.network_name}-subnet-01"
  ip_range_pods      = "${var.network_name}-pod-secondary-range"
  ip_range_services  = "${var.network_name}-services-secondary-range"
  kubernetes_version = "${var.kubernetes_version}"
  node_version       = "${var.kubernetes_version}"
  remove_default_node_pool = true

  providers = {
    google = "google-beta"
  }

  node_pools = [
    {
      name            = "forge-pool"
      machine_type    = "n1-standard-2"
      min_count       = 1
      max_count       = 3
      disk_size_gb    = 100
      disk_type       = "pd-standard"
      image_type      = "COS"
      auto_repair     = true
      auto_upgrade    = false
      service_account = "gke-monitoring@${var.project_id}.iam.gserviceaccount.com"
    },
  ]

  node_pools_labels = {
    all = {}

    forge-pool = {
      scope = "forge"
    }
  }

  node_pools_taints = {
    all = []

    forge-pool = []
  }

  node_pools_tags = {
    all = []

    forge-pool = []
  }
}

As you probably note (presence of remove_default_node_pool in cluster config) I applied patch at #15 and, after that the problem is a little bit mitigated and terraform want to destroy and recreate only the node_pool. This is the output of a terraform plan

Terraform will perform the following actions:

-/+ module.kubernetes-cluster.google_container_node_pool.zonal_pools (new resource required)
      id:                                              "europe-west3-b/internal-cluster/forge-pool" => <computed> (forces new resource)
      autoscaling.#:                                   "1" => "1"
      autoscaling.0.max_node_count:                    "3" => "3"
      autoscaling.0.min_node_count:                    "1" => "1"
      cluster:                                         "internal-cluster" => "internal-cluster"
      initial_node_count:                              "1" => "1"
      instance_group_urls.#:                           "1" => <computed>
      management.#:                                    "1" => "1"
      management.0.auto_repair:                        "true" => "true"
      management.0.auto_upgrade:                       "false" => "false"
      max_pods_per_node:                               "110" => <computed>
      name:                                            "forge-pool" => "forge-pool"
      name_prefix:                                     "" => <computed>
      node_config.#:                                   "1" => "1"
      node_config.0.disk_size_gb:                      "100" => "100"
      node_config.0.disk_type:                         "pd-standard" => "pd-standard"
      node_config.0.guest_accelerator.#:               "0" => <computed>
      node_config.0.image_type:                        "COS" => "COS"
      node_config.0.labels.%:                          "3" => "3"
      node_config.0.labels.cluster_name:               "internal-cluster" => "internal-cluster"
      node_config.0.labels.node_pool:                  "forge-pool" => "forge-pool"
      node_config.0.labels.scope:                      "forge" => "forge"
      node_config.0.local_ssd_count:                   "0" => <computed>
      node_config.0.machine_type:                      "n1-standard-2" => "n1-standard-2"
      node_config.0.metadata.%:                        "1" => "0" (forces new resource)
      node_config.0.metadata.disable-legacy-endpoints: "true" => "" (forces new resource)
      node_config.0.oauth_scopes.#:                    "1" => "1"
      node_config.0.oauth_scopes.1733087937:           "https://www.googleapis.com/auth/cloud-platform" => "https://www.googleapis.com/auth/cloud-platform"
      node_config.0.preemptible:                       "false" => "false"
      node_config.0.service_account:                   "[email protected]" => "[email protected]"
      node_config.0.tags.#:                            "2" => "2"
      node_config.0.tags.0:                            "gke-internal-cluster" => "gke-internal-cluster"
      node_config.0.tags.1:                            "gke-internal-cluster-forge-pool" => "gke-internal-cluster-forge-pool"
      node_count:                                      "1" => <computed>
      project:                                         "xxx-infrastructure" => "xxx-infrastructure"
      version:                                         "1.12.5-gke.5" => "1.12.5-gke.5"
      zone:                                            "europe-west3-b" => "europe-west3-b"


Plan: 1 to add, 0 to change, 1 to destroy.

Is this could be related to this hashicorp/terraform-provider-google#2115?

Any help will be appreciated.

horizontal_pod_autoscaling should be enabled by default

You have horizontal_pod_autoscaling disabled by default. In Terraform (https://www.terraform.io/docs/providers/google/r/container_cluster.html) it's enabled by default though. Why have the opposite? It seems to me that generic Terraform modules should not change the default behavior of the original resources they are wrappers for, unless there's a good reason.

Update examples to include example of setting service account

I was setting up a cluster with this module using the examples as a reference and kept running into issues about the service account not existing (I set up the project with the project factory).

As noted in the README¹ the node pools should specify the service account. I got tripped up on this since it wasn't in the examples.

It could be helpful to add a note in the examples or in the examples' README.

Looks like issue #23 would also resolve this.

Footnotes:

pardon the blame view, needed to link to the line in the readme

Add configuration flag for `pod_security_policy_config`

https://www.terraform.io/docs/providers/google/r/container_cluster.html#enabled-2
But disabled by the default variable(s).

This would allow cluster operators to enable this feature to support policies that prevent pods from being able to mount the host filesystem, run in the host network and/or process namespace, and run as a privileged user.

See: https://www.qwiklabs.com/focuses/5158?parent=catalog for more bg info if desired.

Change node pool resources to use for_each

In the circumstance that a node pool must be replaced and workload transitioned with zero downtime, having all node pools defined in a list and launched with a single google_container_node_pool resource makes it difficult to carry out.

For example, if two node pools are defined and node pool 0 must be replaced. It seems like the only safe approach is to temporarily create a third node pool, transition the workload from node pool 0 to the new node pool, change or replace node pool 0, transition the workload back to node pool 0, and finally destroy the temporary node pool so you're back to two. Which is fine, but it's probably more work than necessary.

Another consideration is if there are many node pools defined and you need to destroy node pool 0 completely. There is no way to do this without affecting node pool 1 and any other subsequently defined node pools.

This adaptability seems to be a limitation of this module. But also perhaps the use of count in resources in general. It might be better if users had control over their node pools as independent resources. However, leveraging count and a list of node pools is likely the only way to make any GKE module flexible enough for broad adoption given the current limitations of Terraform.

I'm opening this issue to see if there is a better way. Or if we can come up with a way to improve conditions.

wait-for-cluster.sh throw error jq: command not found

I've been testing this module for a PoC, but it seems that a script in deployment process assume the presence of a binary which is not present anymore in COS.

The complete error :

module.gke.module.cluster.null_resource.wait_for_regional_cluster: Error running command '/home/ponky/perso/test-gke-module/.terraform/modules/25f7ceaf47579277166eb8da7c1678a8/terraform-google-modules-terraform-google-kubernetes-engine-79542bc/scripts/wait-for-cluster.sh test-module-240713 gke-test-0f2bc90b': exit status 127.
Output: Waiting for cluster gke-test-0f2bc90b in project test-module-240713 to reconcile...
/home/ponky/perso/test-gke-module/.terraform/modules/25f7ceaf47579277166eb8da7c1678a8/terraform-google-modules-terraform-google-kubernetes-engine-79542bc/scripts/wait-for-cluster.sh: line 25: jq: command not found

Terraform version

Terraform v0.11.13
+ provider.google v2.3.0
+ provider.google-beta v2.3.0
+ provider.kubernetes v1.6.2
+ provider.null v2.1.2
+ provider.random v2.1.2

Configuration file

provider.tf

provider "random" {
  version = "~> 2.1"
}

provider "google" {
  version     = "~> 2.3"
  credentials = "<CREDENTIAL_PATH>"
  region      = "europe-west1"
}

provider "google-beta" {
  version     = "~> 2.3"
  credentials = "<CREDENTIAL_PATH>"
  region      = "europe-west1"
}

main.tf

resource "random_id" "gke_id" {
  byte_length = 4
  prefix      = "gke-test-"
}

module "gke" {
  source            = "terraform-google-modules/kubernetes-engine/google"
  version           = "2.0.1"
  project_id        = "<PROJECT_ID>"
  name              = "${random_id.gke_id.hex}"
  region            = "europe-west1"
  network           = "default"
  subnetwork        = "default"
  ip_range_pods     = "p1"
  ip_range_services = "p2"
}

Expected Behavior

Terraform creates plan successfully without error.

Actual Behavior

terraform apply errors with

module.gke.module.cluster.null_resource.wait_for_regional_cluster: Error running command '/home/ponky/perso/test-gke-module/.terraform/modules/25f7ceaf47579277166eb8da7c1678a8/terraform-google-modules-terraform-google-kubernetes-engine-79542bc/scripts/wait-for-cluster.sh test-module-240713 gke-test-0f2bc90b': exit status 127.
Output: Waiting for cluster gke-test-0f2bc90b in project test-module-240713 to reconcile...
/home/ponky/perso/test-gke-module/.terraform/modules/25f7ceaf47579277166eb8da7c1678a8/terraform-google-modules-terraform-google-kubernetes-engine-79542bc/scripts/wait-for-cluster.sh: line 25: jq: command not found

Steps to Reproduce

terraform plan -out=test-plan && terraform apply test-plan

"region" is deprecated in google_container_cluster

$ terraform --version
Terraform v0.11.11
+ provider.google v1.19.1

You guys require the region setting in your module:

$ terraform plan
Error: module "gke": missing required argument "region"

But:

Warning: module.gke.google_container_cluster.primary: "region": [DEPRECATED] This field is in beta and will be removed from this provider. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.

Private cluster timeout

Private cluster fails with a timeout when posting the config map (when network_policy is set to true:

1 error(s) occurred:

* module.gke.kubernetes_config_map.ip-masq-agent: 1 error(s) occurred:

* kubernetes_config_map.ip-masq-agent: Post https://192.168.134.2/api/v1/namespaces/kube-system/configmaps: dial tcp 192.168.134.2:443: i/o timeout

I think this might be linked to the missing feature for VPC Transitive peering that prevents an on-prem network from reaching the GKE master when it's private.

Relax kubernetes version

Discovered when verifying #20.

Running kitchen converge deploy-service-local:

     * google_container_cluster.primary: googleapi: Error 400: Master version "1.10.6-gke.2" is unsupported., badRequest

Per https://cloud.google.com/kubernetes-engine/versioning-and-upgrades#specifying_cluster_version we can use a looser version constraint. Starting with 1.10.6 should give us a bit more flexibility and will require fewer changes to this module.

Add Support for `cluster_autoscaling`

This is the node auto-provisioning feature that is supported by the cluster_autoscaling field.

Importing existing kube clusters seems to be forbidden

importing clusters seems to fail with this error:

terraform import module.prod_cluster.google_container_cluster.primary $PROJECT/$REGION/$CLUSTER

Error: Provider "kubernetes" depends on non-var "local.cluster_endpoint". Providers for import can currently
only depend on variables or must be hardcoded. You can stop import
from loading configurations by specifying `-config=""`.

This is a huge problem because it breaks the terraform import functionality

Support for Istio

This module doesn't seem to expose https://www.terraform.io/docs/providers/google/r/container_cluster.html#istio_config and it should.

Document requirements for running test-kitchen tests

We need to document the requirements/dependencies for running tests with test-kitchen.

For reference, this is my current set of fixtures:

locals {
  project_name = "thebo-gkefixture"
  project_id   = "${local.project_name}-${random_id.project-suffix.hex}"
}

resource "random_id" "project-suffix" {
  byte_length = 2
}

resource "google_project" "main" {
  name            = "${local.project_name}"
  project_id      = "${local.project_id}"
  folder_id       = "${var.folder_id}"
  billing_account = "${var.billing_account}"
}

resource "google_project_services" "main" {
  project  = "${google_project.main.project_id}"
  services = [
    "compute.googleapis.com",
    "bigquery-json.googleapis.com",
    "container.googleapis.com",
    "containerregistry.googleapis.com",
    "oslogin.googleapis.com",
    "pubsub.googleapis.com",
    "storage-api.googleapis.com",
  ]
}

module "network" {
    source = "github.com/terraform-google-modules/terraform-google-network"

    project_id      = "${google_project.main.project_id}"
    network_name    = "vpc-01"
    //shared_vpc_host = "true"

    subnets = [
        {
            subnet_name   = "us-east4-01"
            subnet_ip     = "10.20.16.0/20"
            subnet_region = "us-east4"
        },
    ]

    secondary_ranges = {
        "us-east4-01" = [
            {
                range_name    = "us-east4-01-gke-01-pod"
                ip_cidr_range = "172.18.16.0/20"
            },
            {
                range_name    = "us-east4-01-gke-01-service"
                ip_cidr_range = "172.18.32.0/20"
            },
            {
                range_name    = "us-east4-01-gke-02-pod"
                ip_cidr_range = "172.18.48.0/20"
            },
            {
                range_name    = "us-east4-01-gke-02-service"
                ip_cidr_range = "172.18.64.0/20"
            },
        ]
    }
}

Use a different extension for template files

The autogen/*.tf templates cause hcl parsing issues with terraform fmt and other tools. Maybe switch the extension to .tf.tmpl or something.

Support for private clusters

Use TravisCI to run checks

It should be really straightforward to setup a TravisCI integration so tests run on every PR and give a status check; this would greatly help external contributors (who may not be familiar with testing) to refine their PR's until tests Pass and then finally ask for review.

Note that TravisCi integration is free for Public repos on Github.com

can't provision cluster with shared vpc

terraform apply with error

1 error(s) occurred:

module.gke.google_container_cluster.primary: 1 error(s) occurred:

google_container_cluster.primary: googleapi: Error 400: The user does not have access to service account "[email protected]". Ask a project owner to grant you the iam.serviceAccountUser role on the service account., badRequest

The user what ?

sevices account [email protected] owner project

my config with share-vpc

module "gke" {
source = "./modules/tf-moduel-k8s-2.0.0"
project_id = "${var.project}"
name = "${local.cluster_type}-cluster${var.cluster_name_suffix}"
region = "${var.region}"
network = "${var.network}"
network_project_id = "${var.network_project_id}"
subnetwork = "${var.subnetwork}"
ip_range_pods = "${var.ip_range_pods}"
ip_range_services = "${var.ip_range_services}"
service_account = "${var.compute_engine_service_account}"
}

Tks

Add configuration flag for `enable_binary_authorization`

https://www.terraform.io/docs/providers/google/r/container_cluster.html#enable_binary_authorization

Suggest plumbing the flag for it with the default as false. It allows for enabling the BinAuthZ Admission controller for being able to set a whitelist policy for approved container registry paths and also enforce image signing if desired. Note that can safely be set to be true if desired as the GCP project's default BinAuthZ is allow all/permissive.

Allow cluster service account to be overridden without specifying in `var.node_pools`

When a service project is created with terraform-google-project-factory, it does not have a default compute instance, which breaks the default usage thereof. Specifying it as part of var.node_pools does not work in all circumstances, as the tags & taints do not get read in properly.

Examples link in module registry does not work

In the main README, the following text is displayed in the module registry on terraform.io: "There are multiple examples included in the examples folder but simple usage is as follows".

The link for the examples (https://registry.terraform.io/modules/terraform-google-modules/kubernetes-engine/google/examples/) returns a 404 if you click on it.

Suggest enabling metadata-concealment by default

Strongly suggest that the metadata-concealment proxy be enabled to protect cluster privilege escalation attacks (without this control in place, any pod has the ability to use the instance metadata API to obtain the kubelet's credentials which provides a path to gain access to all cluster secrets).

https://www.terraform.io/docs/providers/google/r/container_cluster.html#node_metadata
e.g.

workload_metadata_config {
  node_metadata = "SECURE"
}

See: https://cloud.google.com/kubernetes-engine/docs/how-to/protecting-cluster-metadata#concealment https://www.4armed.com/blog/kubeletmein-kubelet-hacking-tool/ and https://www.qwiklabs.com/focuses/5158?parent=catalog for more background info on why this control is so important.

Cannot update a node pool count

Tried to create a private cluster with 1 node pool with 3 nodes initially, and a min = 1, max = 3 --> this worked.
Tried to update the node pool with 30 nodes initially, and a min=1, max=300 --> This fails with the following error:

Workaround: comment out the GKE cluster module call, terraform apply, uncomment the GKE cluster module call.

can't provision a cluster with less than than 2 "zones"

$ terraform --version
Terraform v0.11.11
+ provider.google v1.19.1

With code as pasted in the bottom-most section of this ticket, which seems valid as per your docs and examples, I'm getting the following error at terraform plan:

Error: Error running plan: 1 error(s) occurred:

* module.gke.local.cluster_type_output_zonal_zones: local.cluster_type_output_zonal_zones: Resource 'google_container_cluster.zonal_primary' does not have attribute 'additional_zones' for variable 'google_container_cluster.zonal_primary.*.additional_zones'

Why is user forced to specify more than 1 zone? This is supposed to be a generic module after all.

variable "default-scopes" {
  type = "list"

  default = [
    "https://www.googleapis.com/auth/monitoring",
    "https://www.googleapis.com/auth/devstorage.read_only",
    "https://www.googleapis.com/auth/logging.write",
    "https://www.googleapis.com/auth/service.management.readonly",
    "https://www.googleapis.com/auth/servicecontrol",
    "https://www.googleapis.com/auth/trace.append",
  ]
}

module "gke" {
  source                     = "github.com/terraform-google-modules/terraform-google-kubernetes-engine?ref=master"
  ip_range_pods              = ""                 #TODO
  ip_range_services          = ""                 #TODO
  name                       = "cluster-you-name-it"
  network                    = "vpc-you-name-it"
  project_id                 = "project-you-name-it"
  region                     = "europe-west1"
  subnetwork                 = "vpc-sub-you-name-it"
  zones                      = ["europe-west1-c"]
  monitoring_service         = "monitoring.googleapis.com/kubernetes"
  logging_service            = "logging.googleapis.com/kubernetes"
  maintenance_start_time     = "04:00"
  kubernetes_version         = "1.11.3-gke.18"
  horizontal_pod_autoscaling = true
  regional                   = false

  node_pools = [
    {
      name               = "core"
      machine_type       = "n1-standard-2"
      oauth_scopes       = "${var.default-scopes}"
      min_count          = 1
      max_count          = 20
      auto_repair        = true
      auto_upgrade       = false
      initial_node_count = 20
    },
    {
      name               = "cc"
      machine_type       = "custom-6-23040"
      oauth_scopes       = "${var.default-scopes}"
      min_count          = 0
      max_count          = 20
      auto_repair        = true
      auto_upgrade       = false
      initial_node_count = 20
      preemptible        = true
      node_version       = "1.10.9-gke.7"
    },
  ]

  node_pools_labels = {
    all  = {}
    core = {}
    cc   = {}
  }

  node_pools_tags = {
    all  = []
    core = []
    cc   = []
  }

  node_pools_taints = {
    all  = []
    core = []
    cc   = []
  }
}

Terraform Google provider >= 2.4 throws error: "node_pool": conflicts with remove_default_node_pool

I'm just trying out this module for a PoC and ran into some difficulties following the v2.0.0 release. I created a cluster OK using with v1.0.1 but just upgraded and am now getting this error when running a plan.

I've stripped back my config and still see the same issue with the main example config in the
README.md

Error: module.gke.google_container_cluster.primary: "node_pool": conflicts with remove_default_node_pool

Error: module.gke.google_container_cluster.primary: "remove_default_node_pool": conflicts with node_pool

I'm not setting remove_default_node_pool option but have tried explicitly setting both true and false and get the same error.

Terraform Version

Terraform v0.11.13
+ provider.google v2.5.1
+ provider.google-beta v2.5.1
+ provider.kubernetes v1.6.2
+ provider.null v2.1.1
+ provider.random v2.1.1

Affected Resource(s)

module.gke.google_container_cluster.primary

Terraform Configuration Files

provider "google" {
  project     = "<PROJECT ID>"
  region      = "us-central1"
  zone        = "us-central1-a"
}
provider "google-beta" {
  project     = "<PROJECT ID>"
  region      = "us-central1"
  zone        = "us-central1-a"
}
module "gke" {

  source                     = "terraform-google-modules/kubernetes-engine/google"
  project_id                 = "<PROJECT ID>"
  name                       = "gke-test-1"
  region                     = "us-central1"
  zones                      = ["us-central1-a", "us-central1-b", "us-central1-f"]
  network                    = "vpc-01"
  subnetwork                 = "us-central1-01"
  ip_range_pods              = "us-central1-01-gke-01-pods"
  ip_range_services          = "us-central1-01-gke-01-services"
  http_load_balancing        = false
  horizontal_pod_autoscaling = true
  kubernetes_dashboard       = true
  network_policy             = true
  remove_default_node_pool   = true

  node_pools = [
    {
      name               = "default-node-pool"
      machine_type       = "n1-standard-2"
      min_count          = 1
      max_count          = 100
      disk_size_gb       = 100
      disk_type          = "pd-standard"
      image_type         = "COS"
      auto_repair        = true
      auto_upgrade       = true
      service_account    = "project-service-account@<PROJECT ID>.iam.gserviceaccount.com"
      preemptible        = false
      initial_node_count = 80
    },
  ]

  node_pools_oauth_scopes = {
    all = []

    default-node-pool = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]
  }

  node_pools_labels = {
    all = {}

    default-node-pool = {
      default-node-pool = "true"
    }
  }

  node_pools_metadata = {
    all = {}

    default-node-pool = {
      node-pool-metadata-custom-value = "my-node-pool"
    }
  }

  node_pools_taints = {
    all = []

    default-node-pool = [
      {
        key    = "default-node-pool"
        value  = "true"
        effect = "PREFER_NO_SCHEDULE"
      },
    ]
  }

  node_pools_tags = {
    all = []

    default-node-pool = [
      "default-node-pool",
    ]
  }
}

Expected Behavior

Terraform creates plan successfully.

Actual Behavior

terraform plan errors with

Warning: module.gke.data.google_container_engine_versions.region: "region": [DEPRECATED] Use location instead



Warning: module.gke.data.google_container_engine_versions.zone: "zone": [DEPRECATED] Use location instead



Warning: module.gke.google_container_cluster.primary: "region": [DEPRECATED] Use location instead



Warning: module.gke.google_container_node_pool.pools: "region": [DEPRECATED] use location instead



Error: module.gke.google_container_cluster.primary: "node_pool": conflicts with remove_default_node_pool



Error: module.gke.google_container_cluster.primary: "remove_default_node_pool": conflicts with node_pool

Steps to Reproduce

terraform plan

docker_build_terraform make target missing

https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blame/master/README.md#L251:

make docker_build_terraform

$ make docker_build_terraform
make: *** No rule to make target 'docker_build_terraform'.  Stop.`

A missing target in the Makefile or a doc error?

Support for Master Auth

They seem to not be supported in this module... will they be in the future?

Outputs break after failed create

* module.gke-dev-cluster.local.cluster_type_output_zonal_zones: local.cluster_type_output_zonal_zones: concat: unexpected type list in list of type string in:

${concat(slice(var.zones,1,length(var.zones)), list(list()))}

Remove default node pools and manage them explicitly

The default node pools cause trouble with managing the cluster. It seems like a better architecture to manage the node pools explicitly. Does it make sense to use the remove_default_node_pool parameter for the google_container_cluster and add just manage the node pools explicitly?

ref: hashicorp/terraform-provider-google#475 (comment)

stub_domains test failed

As of current master (3f7527e when writing this), with a following test/fixtures/shared/terraform.tfvars:

project_id="redacted-project-name"
credentials_path_relative="../../../credentials.json"
region="europe-west1"
zones=["europe-west1-c"]
compute_engine_service_account="[email protected]"

make docker_build_kitchen_terraform, make docker_run, kitchen create and kitchen converge passed fine.

kitchen verify passed fine for deploy_service, node_pool, shared_vpc, simple_regional and simple_zonal. It failed at stub_domains as follows:

Verifying stub_domains

Profile: stub_domain
Version: (not specified)
Target:  local://

  ×  gcloud: Google Compute Engine GKE configuration (1 failed)
     ✔  Command: `gcloud --project=redacted-project-name container clusters --zone=europe-west1 describe stub-domains-cluster-zwwa --format=json` exit_status should eq 0
     ✔  Command: `gcloud --project=redacted-project-name container clusters --zone=europe-west1 describe stub-domains-cluster-zwwa --format=json` stderr should eq ""
     ✔  Command: `gcloud --project=redacted-project-name container clusters --zone=europe-west1 describe stub-domains-cluster-zwwa --format=json` cluster is running
     ×  Command: `gcloud --project=redacted-project-name container clusters --zone=europe-west1 describe stub-domains-cluster-zwwa --format=json` cluster has the expected addon settings
     
     expected: {"horizontalPodAutoscaling"=>{}, "httpLoadBalancing"=>{}, "kubernetesDashboard"=>{"disabled"=>true}, "networkPolicyConfig"=>{}}
          got: {"horizontalPodAutoscaling"=>{}, "httpLoadBalancing"=>{}, "kubernetesDashboard"=>{"disabled"=>true}, "networkPolicyConfig"=>{"disabled"=>true}}
     
     (compared using ==)
     
     Diff:
     @@ -1,5 +1,5 @@
      "horizontalPodAutoscaling" => {},
      "httpLoadBalancing" => {},
      "kubernetesDashboard" => {"disabled"=>true},
     -"networkPolicyConfig" => {},
     +"networkPolicyConfig" => {"disabled"=>true},

  ✔  kubectl: Kubernetes configuration
     ✔  kubernetes configmap kube-dns is created by Terraform
     ✔  kubernetes configmap kube-dns reflects the stub_domains configuration
     ✔  kubernetes configmap ipmasq is created by Terraform
     ✔  kubernetes configmap ipmasq is configured properly


Profile Summary: 1 successful control, 1 control failure, 0 controls skipped
Test Summary: 7 successful, 1 failure, 0 skipped
>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: 1 actions failed.
>>>>>>     Verify failed on instance <stub-domains-local>.  Please see .kitchen/logs/stub-domains-local.log for more details
>>>>>> ----------------------
>>>>>> Please see .kitchen/logs/kitchen.log for more details
>>>>>> Also try running `kitchen diagnose --all` for configuration

I have the .kitchen/logs/kitchen.log and kitchen diagnose --all output copied, so let me know if you need that.

Support for disabling basic auth & client certificate

There isn't currently a way to disable basic authentication and the creation of a client certificate at cluster creation time. It looks like both of these can be accomplished by setting a master_auth config.

https://www.terraform.io/docs/providers/google/r/container_cluster.html#master_auth

Improve documentation of node_pools variable

Right now the available settings for node_pools are not clearly documented - we should have clear docs on each available param.

Fails with dynamic service account variable

This config fails:

  node_pools = [
    {
      name            = "pool-01"
      machine_type    = "n1-standard-1"
      min_count       = 2
      max_count       = 2
      disk_size_gb    = 30
      disk_type       = "pd-standard"
      image_type      = "COS"
      auto_repair     = false
      auto_upgrade    = false
      service_account = "${google_service_account.cluster_nodes.email}"
    },
  ]

While this config works:

  node_pools = [
    {
      name            = "pool-01"
      machine_type    = "n1-standard-1"
      min_count       = 2
      max_count       = 2
      disk_size_gb    = 30
      disk_type       = "pd-standard"
      image_type      = "COS"
      auto_repair     = false
      auto_upgrade    = false
      service_account = "${google_service_account.cluster_nodes.email}"
    },
  ]

Networks and Subnetworks are updated everytime

Everytime the module is executed, it try to change the network and subnetworks internal links. Effectively this causes nothing, but the internal Google Api only writes the pattern: projects/project-id/global/networks/network-name

~ module.kubernetes.google_container_cluster.primary
      network:    "projects/my-project/global/networks/my-network" => "https://www.googleapis.com/compute/v1/projects/my-project/global/networks/my-network"
      subnetwork: "projects/my-project/regions/southamerica-east1/subnetworks/my-subnet" => "https://www.googleapis.com/compute/v1/projects/my-project/regions/southamerica-east1/subnetworks/my-subnet"

I believe this commit may be the cause. This change was made by some issue? I couldn't find a related one

Support for preemptible nodes

Is it in the roadmap for support?
I think this can solve without minor issues:

node_config {
    preemptible    = "${lookup(var.node_pools[count.index], "preemptible", false)}"
}

Make testing procedure more clear

The documentation for running the test suite is a bit unclear.

terraform-google-modules / terraform-google-kubernetes-engine Goto Github PK

terraform-google-kubernetes-engine's People

Stargazers

Watchers

Forkers

terraform-google-kubernetes-engine's Issues

Terraform version

Configuration file

Expected Behavior

Actual Behavior

Steps to Reproduce

Terraform Version

Affected Resource(s)

Terraform Configuration Files

Expected Behavior

Actual Behavior

Steps to Reproduce

Recommend Projects

Recommend Topics

Recommend Org