
terraform-google-modules / terraform-google-kubernetes-engine

1.1K stars · 46 watchers · 1.1K forks · 4.93 MB

Configures opinionated GKE clusters

Home Page: https://registry.terraform.io/modules/terraform-google-modules/kubernetes-engine/google

License: Apache License 2.0

Languages: Makefile 0.32%, HCL 86.07%, Python 1.86%, Shell 2.16%, Ruby 4.48%, Smarty 0.03%, Go 5.08%
Topics: cft-terraform, compute, containers


terraform-google-kubernetes-engine's Issues

No self link, apply is failing

We're following almost exactly the readme spec and terraform plan works fine, but when we run apply we get this error: * module.gke.google_container_cluster.primary: Resource 'data.google_compute_subnetwork.gke_subnetwork' not found for variable 'data.google_compute_subnetwork.gke_subnetwork.self_link'

Is this a versioning problem on our end maybe? We've tried going through the other issues and the readme but have struggled to find the source of our problem. For the provider we have:

provider "google-beta" {
  project     = "project-name"
  region      = "region-name"
}

And in main.tf we define the module with that specified provider:

module "gke" {
  providers {
    google = "google-beta"
  }
...

Any help is much appreciated, thanks.
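For reference, a minimal sketch of how the beta provider is usually wired into the module under Terraform 0.11; note that `providers` is assigned a map with `=` rather than written as a block (the module source and version here are placeholders):

provider "google-beta" {
  project = "project-name"
  region  = "region-name"
}

module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google"
  version = "~> 1.0"

  # Terraform 0.11: providers takes a map assignment, not a nested block.
  providers = {
    google = "google-beta"
  }

  # ... remaining module inputs ...
}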

Create a service account for nodes if one isn't provided.

We need a holistic solution here which permanently removes the dependency on the default service account. Including:

  1. Adding a top-level variable of service_account which accepts three values:
    a. the email of a custom Service Account,
    b. default-compute (the default compute service account), or
    c. create - automatically creates a service account for use

This top-level service account will be the default for all node pools that don't explicitly provide one. A sketch of the plumbing follows below.

These flags can optionally be implemented incrementally.
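A rough sketch of how the proposed variable could be plumbed, assuming Terraform 0.11 syntax; the names and the omission of the default-compute branch are illustrative, not the module's actual implementation:

variable "service_account" {
  description = "Service account for node pools: an SA email, default-compute, or create"
  default     = "create"
}

# Only created when the caller asks for a dedicated service account.
resource "google_service_account" "nodes" {
  count        = "${var.service_account == "create" ? 1 : 0}"
  account_id   = "gke-nodes"
  display_name = "GKE node service account"
}

locals {
  # Fall back to the created SA's email when "create" was requested.
  node_service_account = "${var.service_account == "create" ? join("", google_service_account.nodes.*.email) : var.service_account}"
}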

Autoscaling cannot be disabled

It appears that users must either specify min_node_count and max_node_count or have them default to 1 and 100; the autoscaling block is always created. Is this by design, or might we in future be able to specify a static node count and disable autoscaling?
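For comparison, this is roughly what a fixed-size pool looks like when written directly against the provider, i.e. a static node_count and no autoscaling block; a sketch of the desired behaviour, not something the module currently exposes:

resource "google_container_node_pool" "static" {
  name       = "static-pool"
  cluster    = "${google_container_cluster.primary.name}"
  node_count = 3

  # No autoscaling block: the pool keeps a fixed size.

  node_config {
    machine_type = "n1-standard-2"
  }
}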

Default GKE version failing

The default version for GKE master (1.10.6-gke.2) has been deprecated.

Currently running with the default version throws:

* google_container_cluster.primary: googleapi: Error 400: Master version "1.10.6-gke.2" is unsupported., badRequest
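Until the default is bumped, pinning a currently supported version in the module call works around the failure; the version string below is only an example:

module "gke" {
  source             = "terraform-google-modules/kubernetes-engine/google"
  # ... other required inputs ...
  kubernetes_version = "1.11.6-gke.2"  # any master version still supported by GKE
  node_version       = "1.11.6-gke.2"
}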

Suggest `network_policy` be enabled by default

https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/master/autogen/variables.tf#L106

Suggest enabling it by default on newly created clusters. The CIS GCP Benchmark recommends that it be enabled. See: https://www.cisecurity.org/benchmark/google_cloud_computing_platform/

Pros: Allows for support of NetworkPolicy objects if they are applied without having to modify the cluster.
Cons: The slight overhead of Calico agents and Typha in the cluster if NetworkPolicy is unused.
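The change itself would just be a default flip on the existing variable, sketched here:

variable "network_policy" {
  description = "Enable network policy addon (Calico)"
  default     = true  # proposed: enabled by default, per the CIS GCP Benchmark
}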

Enable easier shared VPC usage

Ran into this issue with creating a GKE cluster.

1 error(s) occurred:

* module.prod-gke-cluster.google_container_cluster.primary: 1 error(s) occurred:
* google_container_cluster.primary: googleapi: Error 404: Not found: GAIA email lookup., notFound

TF Debug output in attached file
debug.log

module "dev-gke-cluster" {
  source = "github.com/terraform-google-modules/terraform-google-kubernetes-engine"
  name = "gke-dev"
  kubernetes_version = "latest"
  project_id = "${google_project.dev-project.project_id}"
  region = "${var.region}"
  network = "${google_compute_network.dev-network.name}"
  subnetwork = "${google_compute_subnetwork.dev-app-subnet.name}"
  network_project_id = "${google_compute_shared_vpc_host_project.shared_vpc.project}"
  ip_range_pods = "${google_compute_subnetwork.dev-app-subnet.secondary_ip_range.0.range_name}"
  ip_range_services = "${google_compute_subnetwork.dev-app-subnet.secondary_ip_range.1.range_name}"
  regional = true
  horizontal_pod_autoscaling = true
  network_policy = true 
  master_authorized_networks_config = [{
    cidr_blocks = [{ cidr_block = "0.0.0.0/0", display_name = "all" }]
  }]
  node_pools = [
    {
      name = "default-node-pool"
      machine_type    = "n1-standard-2"
      min_count = 1
      max_count = 3
      disk_size_gb = 100
      disk_type = "pd-standard"
      image_type = "COS"
      auto_repair = true
      auto_upgrade = true
    },
  ]
}

Set up gcloud credentials when running interactively within Docker

Pushing this out to a separate issue so that we can get #20 merged.

When using make docker_run, we should source the test config so that CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE is set in the shell environment.

diff --git i/Makefile w/Makefile
index 6e16919..73ff8bf 100644
--- i/Makefile
+++ w/Makefile
@@ -119,7 +119,7 @@ docker_run:
        docker run --rm -it \
                -v $(CURDIR):/cftk/workdir \
                ${DOCKER_IMAGE_KITCHEN_TERRAFORM}:${DOCKER_TAG_KITCHEN_TERRAFORM} \
-               /bin/bash
+               /bin/bash --rcfile ${TEST_CONFIG_FILE_LOCATION}

 .PHONY: docker_create
 docker_create: docker_build_terraform docker_build_kitchen_terraform

Alternately, we can set the environment variables within the tests, as such:

ENV['CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE'] = File.expand_path(
  File.join("../..", credentials_path),
  __FILE__)

Though this feels somewhat brittle.

Error: "node_config.0.taint" - module doesn't work with 2.0.0 google provider

Error: module.demo-euw1.google_container_node_pool.zonal_pools: "node_config.0.taint": [REMOVED] This field is in beta. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.

Error: module.test-euw1.google_container_node_pool.zonal_pools: "node_config.0.taint": [REMOVED] This field is in beta. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.

Deprecation warnings from the 1.20.0 google provider became errors in 2.0.0, as expected. To fix this we might need to change the way providers are defined inside the module, right? Is there any quick fix for this?
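A short-term workaround, assuming the 2.0.0 upgrade can be deferred, is to pin the google provider below 2.0 until the module moves its node pool resources to google-beta:

provider "google" {
  version = "~> 1.20"  # stay on 1.x until the module supports provider 2.0
  project = "my-project"
  region  = "europe-west1"
}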

Cannot use dynamic Service Account

I am trying to use this module, based on the provided examples, but can't seem to get it to work. It used to work fine a few days ago, but not anymore.

Here is the error I get:

Warning: module.gke-cluster.google_container_cluster.primary: "region": [DEPRECATED] This field is in beta and will be removed from this provider. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.


Warning: module.gke-cluster.google_container_node_pool.pools: "node_config.0.taint": [DEPRECATED] This field is in beta and will be removed from this provider. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.



Warning: module.gke-cluster.google_container_node_pool.pools: "region": [DEPRECATED] This field is in beta and will be removed from this provider. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.



Warning: module.gke-cluster.google_container_node_pool.zonal_pools: "node_config.0.taint": [DEPRECATED] This field is in beta and will be removed from this provider. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.



Warning: module.project.google_project.project: "app_engine": [DEPRECATED] Use the google_app_engine_application resource instead.



Error: module.gke-cluster.google_container_node_pool.pools: node_config.0.tags: should be a list



Error: module.gke-cluster.google_container_node_pool.pools: node_config.0.taint: should be a list



Error: module.gke-cluster.google_container_node_pool.zonal_pools: node_config.0.tags: should be a list



Error: module.gke-cluster.google_container_node_pool.zonal_pools: node_config.0.taint: should be a list

And here is the Terraform used:

module "gke-cluster" {
  source                     = "github.com/terraform-google-modules/terraform-google-kubernetes-engine"
  project_id                 = "${local.project_id}"
  name                       = "${local.gke_cluster_name}"
  network                    = "${local.network_name}"
  subnetwork                 = "${local.subnetwork_name}"
  region                     = "${var.default_region}"
  zones                      = "${var.default_zones}"
  ip_range_pods              = "${var.default_region}-gke-01-pods"
  ip_range_services          = "${var.default_region}-gke-01-services"
  http_load_balancing        = true
  horizontal_pod_autoscaling = true
  kubernetes_dashboard       = true
  network_policy             = true
  kubernetes_version         = "1.10.6-gke.6"



  node_pools = [
    {
      name            = "default-node-pool"
      machine_type    = "${var.node_pool_machine_type}"
      min_count       = 1
      max_count       = 10
      disk_size_gb    = 100
      disk_type       = "pd-standard"
      image_type      = "COS"
      auto_repair     = true
      auto_upgrade    = true
      service_account = "${module.project.service_account_name}"
    },
  ]

  node_pools_labels = {
    all = {}

    default-node-pool = {
      default-node-pool = "true"
    }
  }

  node_pools_taints = {
    all = []

    default-node-pool = [
      {
        key    = "default-node-pool"
        value  = "true"
        effect = "PREFER_NO_SCHEDULE"
      },
    ]
  }

  node_pools_tags = {
    all = []

    default-node-pool = [
      "default-node-pool",
    ]
  }
}

Forced recreation of node_pool on every plan

Once a simple zonal cluster with a node_pool has been created correctly, if I run terraform apply again without any changes, Terraform wants to destroy and recreate the cluster and the node_pool.

This is my configuration:

module "kubernetes-cluster" {
  source  = "terraform-google-modules/kubernetes-engine/google"
  version = "0.4.0"
  project_id         = "${var.project_id}"
  name               = "internal-cluster"
  regional           = false
  region             = "${var.region}"
  zones              = ["${var.zone}"]
  network            = "${var.network_name}"
  subnetwork         = "${var.network_name}-subnet-01"
  ip_range_pods      = "${var.network_name}-pod-secondary-range"
  ip_range_services  = "${var.network_name}-services-secondary-range"
  kubernetes_version = "${var.kubernetes_version}"
  node_version       = "${var.kubernetes_version}"
  remove_default_node_pool = true

  providers = {
    google = "google-beta"
  }

  node_pools = [
    {
      name            = "forge-pool"
      machine_type    = "n1-standard-2"
      min_count       = 1
      max_count       = 3
      disk_size_gb    = 100
      disk_type       = "pd-standard"
      image_type      = "COS"
      auto_repair     = true
      auto_upgrade    = false
      service_account = "gke-monitoring@${var.project_id}.iam.gserviceaccount.com"
    },
  ]

  node_pools_labels = {
    all = {}

    forge-pool = {
      scope = "forge"
    }
  }

  node_pools_taints = {
    all = []

    forge-pool = []
  }

  node_pools_tags = {
    all = []

    forge-pool = []
  }
}

As you have probably noticed (the presence of remove_default_node_pool in the cluster config), I applied the patch from #15; after that the problem is somewhat mitigated and Terraform only wants to destroy and recreate the node_pool. This is the output of terraform plan:

Terraform will perform the following actions:

-/+ module.kubernetes-cluster.google_container_node_pool.zonal_pools (new resource required)
      id:                                              "europe-west3-b/internal-cluster/forge-pool" => <computed> (forces new resource)
      autoscaling.#:                                   "1" => "1"
      autoscaling.0.max_node_count:                    "3" => "3"
      autoscaling.0.min_node_count:                    "1" => "1"
      cluster:                                         "internal-cluster" => "internal-cluster"
      initial_node_count:                              "1" => "1"
      instance_group_urls.#:                           "1" => <computed>
      management.#:                                    "1" => "1"
      management.0.auto_repair:                        "true" => "true"
      management.0.auto_upgrade:                       "false" => "false"
      max_pods_per_node:                               "110" => <computed>
      name:                                            "forge-pool" => "forge-pool"
      name_prefix:                                     "" => <computed>
      node_config.#:                                   "1" => "1"
      node_config.0.disk_size_gb:                      "100" => "100"
      node_config.0.disk_type:                         "pd-standard" => "pd-standard"
      node_config.0.guest_accelerator.#:               "0" => <computed>
      node_config.0.image_type:                        "COS" => "COS"
      node_config.0.labels.%:                          "3" => "3"
      node_config.0.labels.cluster_name:               "internal-cluster" => "internal-cluster"
      node_config.0.labels.node_pool:                  "forge-pool" => "forge-pool"
      node_config.0.labels.scope:                      "forge" => "forge"
      node_config.0.local_ssd_count:                   "0" => <computed>
      node_config.0.machine_type:                      "n1-standard-2" => "n1-standard-2"
      node_config.0.metadata.%:                        "1" => "0" (forces new resource)
      node_config.0.metadata.disable-legacy-endpoints: "true" => "" (forces new resource)
      node_config.0.oauth_scopes.#:                    "1" => "1"
      node_config.0.oauth_scopes.1733087937:           "https://www.googleapis.com/auth/cloud-platform" => "https://www.googleapis.com/auth/cloud-platform"
      node_config.0.preemptible:                       "false" => "false"
      node_config.0.service_account:                   "[email protected]" => "[email protected]"
      node_config.0.tags.#:                            "2" => "2"
      node_config.0.tags.0:                            "gke-internal-cluster" => "gke-internal-cluster"
      node_config.0.tags.1:                            "gke-internal-cluster-forge-pool" => "gke-internal-cluster-forge-pool"
      node_count:                                      "1" => <computed>
      project:                                         "xxx-infrastructure" => "xxx-infrastructure"
      version:                                         "1.12.5-gke.5" => "1.12.5-gke.5"
      zone:                                            "europe-west3-b" => "europe-west3-b"


Plan: 1 to add, 0 to change, 1 to destroy.

Could this be related to hashicorp/terraform-provider-google#2115?

Any help will be appreciated.

Update examples to include example of setting service account

I was setting up a cluster with this module using the examples as a reference and kept running into issues about the service account not existing (I set up the project with the project factory).

As noted in the README[1], the node pools should specify the service account. I got tripped up on this since it wasn't in the examples.

It could be helpful to add a note in the examples or in the examples' README.

Looks like issue #23 would also resolve this.

Footnotes:

  1. pardon the blame view, needed to link to the line in the readme

Change node pool resources to use for_each

In the circumstance that a node pool must be replaced and workload transitioned with zero downtime, having all node pools defined in a list and launched with a single google_container_node_pool resource makes it difficult to carry out.

For example, suppose two node pools are defined and node pool 0 must be replaced. It seems like the only safe approach is to temporarily create a third node pool, transition the workload from node pool 0 to the new node pool, change or replace node pool 0, transition the workload back to node pool 0, and finally destroy the temporary node pool so you're back to two. Which is fine, but it's probably more work than necessary.

Another consideration is if there are many node pools defined and you need to destroy node pool 0 completely. There is no way to do this without affecting node pool 1 and any other subsequently defined node pools.

This limited adaptability seems to be a shortcoming of this module, but perhaps also of the use of count in resources in general. It might be better if users had control over their node pools as independent resources. However, leveraging count and a list of node pools is likely the only way to make any GKE module flexible enough for broad adoption given the current limitations of Terraform.

I'm opening this issue to see if there is a better way. Or if we can come up with a way to improve conditions.
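As an illustration only, a for_each version (requires Terraform 0.12.6 or later) could key the pools by name so that one pool can be changed or destroyed without renumbering the others:

variable "node_pools" {
  type = map(object({
    machine_type = string
    min_count    = number
    max_count    = number
  }))
}

resource "google_container_node_pool" "pools" {
  for_each = var.node_pools

  name    = each.key
  cluster = google_container_cluster.primary.name

  autoscaling {
    min_node_count = each.value.min_count
    max_node_count = each.value.max_count
  }

  node_config {
    machine_type = each.value.machine_type
  }
}

Removing a key from the map would then destroy only that pool's resource instance.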

wait-for-cluster.sh throws error jq: command not found

I've been testing this module for a PoC, but it seems that a script in the deployment process assumes the presence of a binary which is no longer present in COS.

The complete error :

module.gke.module.cluster.null_resource.wait_for_regional_cluster: Error running command '/home/ponky/perso/test-gke-module/.terraform/modules/25f7ceaf47579277166eb8da7c1678a8/terraform-google-modules-terraform-google-kubernetes-engine-79542bc/scripts/wait-for-cluster.sh test-module-240713 gke-test-0f2bc90b': exit status 127.
Output: Waiting for cluster gke-test-0f2bc90b in project test-module-240713 to reconcile...
/home/ponky/perso/test-gke-module/.terraform/modules/25f7ceaf47579277166eb8da7c1678a8/terraform-google-modules-terraform-google-kubernetes-engine-79542bc/scripts/wait-for-cluster.sh: line 25: jq: command not found

Terraform version

Terraform v0.11.13
+ provider.google v2.3.0
+ provider.google-beta v2.3.0
+ provider.kubernetes v1.6.2
+ provider.null v2.1.2
+ provider.random v2.1.2

Configuration file

provider.tf

provider "random" {
  version = "~> 2.1"
}

provider "google" {
  version     = "~> 2.3"
  credentials = "<CREDENTIAL_PATH>"
  region      = "europe-west1"
}

provider "google-beta" {
  version     = "~> 2.3"
  credentials = "<CREDENTIAL_PATH>"
  region      = "europe-west1"
}

main.tf

resource "random_id" "gke_id" {
  byte_length = 4
  prefix      = "gke-test-"
}

module "gke" {
  source            = "terraform-google-modules/kubernetes-engine/google"
  version           = "2.0.1"
  project_id        = "<PROJECT_ID>"
  name              = "${random_id.gke_id.hex}"
  region            = "europe-west1"
  network           = "default"
  subnetwork        = "default"
  ip_range_pods     = "p1"
  ip_range_services = "p2"
}

Expected Behavior

Terraform creates plan successfully without error.

Actual Behavior

terraform apply errors with

module.gke.module.cluster.null_resource.wait_for_regional_cluster: Error running command '/home/ponky/perso/test-gke-module/.terraform/modules/25f7ceaf47579277166eb8da7c1678a8/terraform-google-modules-terraform-google-kubernetes-engine-79542bc/scripts/wait-for-cluster.sh test-module-240713 gke-test-0f2bc90b': exit status 127.
Output: Waiting for cluster gke-test-0f2bc90b in project test-module-240713 to reconcile...
/home/ponky/perso/test-gke-module/.terraform/modules/25f7ceaf47579277166eb8da7c1678a8/terraform-google-modules-terraform-google-kubernetes-engine-79542bc/scripts/wait-for-cluster.sh: line 25: jq: command not found

Steps to Reproduce

terraform plan -out=test-plan && terraform apply test-plan

"region" is deprecated in google_container_cluster

$ terraform --version
Terraform v0.11.11
+ provider.google v1.19.1

You guys require the region setting in your module:

$ terraform plan
Error: module "gke": missing required argument "region"

But:

Warning: module.gke.google_container_cluster.primary: "region": [DEPRECATED] This field is in beta and will be removed from this provider. Use it in the the google-beta provider instead. See https://terraform.io/docs/providers/google/provider_versions.html for more details.

Private cluster timeout

Private cluster fails with a timeout when posting the config map (when network_policy is set to true):

1 error(s) occurred:

* module.gke.kubernetes_config_map.ip-masq-agent: 1 error(s) occurred:

* kubernetes_config_map.ip-masq-agent: Post https://192.168.134.2/api/v1/namespaces/kube-system/configmaps: dial tcp 192.168.134.2:443: i/o timeout

I think this might be linked to the missing VPC transitive peering feature, which prevents an on-prem network from reaching the GKE master when it is private.
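One mitigation, assuming Terraform runs from outside the peered networks, is to keep the public master endpoint enabled and restrict it with master authorized networks; a sketch against the provider, not module-specific:

resource "google_container_cluster" "primary" {
  # ...

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false           # keep the public endpoint reachable
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  master_authorized_networks_config {
    cidr_blocks {
      cidr_block   = "203.0.113.0/24"  # example: the network Terraform runs from
      display_name = "terraform-runner"
    }
  }
}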

Importing existing kube clusters seems to be forbidden

Importing clusters seems to fail with this error:

terraform import module.prod_cluster.google_container_cluster.primary $PROJECT/$REGION/$CLUSTER

Error: Provider "kubernetes" depends on non-var "local.cluster_endpoint". Providers for import can currently
only depend on variables or must be hardcoded. You can stop import
from loading configurations by specifying `-config=""`.

This is a huge problem because it breaks the terraform import functionality.

Document requirements for running test-kitchen tests

We need to document the requirements/dependencies for running tests with test-kitchen.

For reference, this is my current set of fixtures:

locals {
  project_name = "thebo-gkefixture"
  project_id   = "${local.project_name}-${random_id.project-suffix.hex}"
}

resource "random_id" "project-suffix" {
  byte_length = 2
}

resource "google_project" "main" {
  name            = "${local.project_name}"
  project_id      = "${local.project_id}"
  folder_id       = "${var.folder_id}"
  billing_account = "${var.billing_account}"
}

resource "google_project_services" "main" {
  project  = "${google_project.main.project_id}"
  services = [
    "compute.googleapis.com",
    "bigquery-json.googleapis.com",
    "container.googleapis.com",
    "containerregistry.googleapis.com",
    "oslogin.googleapis.com",
    "pubsub.googleapis.com",
    "storage-api.googleapis.com",
  ]
}

module "network" {
    source = "github.com/terraform-google-modules/terraform-google-network"

    project_id      = "${google_project.main.project_id}"
    network_name    = "vpc-01"
    //shared_vpc_host = "true"

    subnets = [
        {
            subnet_name   = "us-east4-01"
            subnet_ip     = "10.20.16.0/20"
            subnet_region = "us-east4"
        },
    ]

    secondary_ranges = {
        "us-east4-01" = [
            {
                range_name    = "us-east4-01-gke-01-pod"
                ip_cidr_range = "172.18.16.0/20"
            },
            {
                range_name    = "us-east4-01-gke-01-service"
                ip_cidr_range = "172.18.32.0/20"
            },
            {
                range_name    = "us-east4-01-gke-02-pod"
                ip_cidr_range = "172.18.48.0/20"
            },
            {
                range_name    = "us-east4-01-gke-02-service"
                ip_cidr_range = "172.18.64.0/20"
            },
        ]
    }
}

Use TravisCI to run checks

It should be really straightforward to set up a TravisCI integration so tests run on every PR and give a status check; this would greatly help external contributors (who may not be familiar with testing) refine their PRs until tests pass and then finally ask for review.

Note that TravisCI integration is free for public repos on GitHub.com.

can't provision cluster with shared vpc

terraform apply fails with the error:

1 error(s) occurred:

  • module.gke.google_container_cluster.primary: 1 error(s) occurred:

  • google_container_cluster.primary: googleapi: Error 400: The user does not have access to service account "[email protected]". Ask a project owner to grant you the iam.serviceAccountUser role on the service account., badRequest

Which user? The service account [email protected] is an owner of the project.

My config with shared VPC:

module "gke" {
source = "./modules/tf-moduel-k8s-2.0.0"
project_id = "${var.project}"
name = "${local.cluster_type}-cluster${var.cluster_name_suffix}"
region = "${var.region}"
network = "${var.network}"
network_project_id = "${var.network_project_id}"
subnetwork = "${var.subnetwork}"
ip_range_pods = "${var.ip_range_pods}"
ip_range_services = "${var.ip_range_services}"
service_account = "${var.compute_engine_service_account}"
}

Thanks.

Add configuration flag for `enable_binary_authorization`

https://www.terraform.io/docs/providers/google/r/container_cluster.html#enable_binary_authorization

Suggest plumbing the flag through with a default of false. It allows enabling the BinAuthZ admission controller, which can enforce a whitelist policy for approved container registry paths and also enforce image signing if desired. Note that it can safely be set to true if desired, as a GCP project's default BinAuthZ policy is allow-all/permissive.
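Plumbing this could look roughly like the sketch below; the variable name simply mirrors the provider argument and the default keeps existing clusters unchanged:

variable "enable_binary_authorization" {
  description = "Enable the Binary Authorization admission controller"
  default     = false
}

resource "google_container_cluster" "primary" {
  # ...
  enable_binary_authorization = "${var.enable_binary_authorization}"
}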

Suggest enabling metadata-concealment by default

Strongly suggest that the metadata-concealment proxy be enabled to protect against cluster privilege escalation attacks (without this control in place, any pod can use the instance metadata API to obtain the kubelet's credentials, which provides a path to gaining access to all cluster secrets).

https://www.terraform.io/docs/providers/google/r/container_cluster.html#node_metadata
e.g.

workload_metadata_config {
  node_metadata = "SECURE"
}

See: https://cloud.google.com/kubernetes-engine/docs/how-to/protecting-cluster-metadata#concealment https://www.4armed.com/blog/kubeletmein-kubelet-hacking-tool/ and https://www.qwiklabs.com/focuses/5158?parent=catalog for more background info on why this control is so important.
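A sketch of how the module could expose this; the node_metadata variable name is an assumption rather than the module's current interface, and workload_metadata_config requires the google-beta provider:

variable "node_metadata" {
  description = "How to expose node metadata to workloads: SECURE, EXPOSE, or GKE_METADATA_SERVER"
  default     = "SECURE"
}

resource "google_container_node_pool" "pools" {
  # ...

  node_config {
    workload_metadata_config {
      node_metadata = "${var.node_metadata}"
    }
  }
}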

Cannot update a node pool count

Tried to create a private cluster with 1 node pool with 3 nodes initially, and min = 1, max = 3 --> this worked.
Tried to update the node pool to 30 nodes initially, with min = 1, max = 300 --> this fails with the error shown in the attached screenshot.

Workaround: comment out the GKE cluster module call, terraform apply, uncomment the GKE cluster module call.

can't provision a cluster with fewer than 2 "zones"

$ terraform --version
Terraform v0.11.11
+ provider.google v1.19.1

With code as pasted in the bottom-most section of this ticket, which seems valid as per your docs and examples, I'm getting the following error at terraform plan:

Error: Error running plan: 1 error(s) occurred:

* module.gke.local.cluster_type_output_zonal_zones: local.cluster_type_output_zonal_zones: Resource 'google_container_cluster.zonal_primary' does not have attribute 'additional_zones' for variable 'google_container_cluster.zonal_primary.*.additional_zones'

Why is the user forced to specify more than 1 zone? This is supposed to be a generic module, after all.

variable "default-scopes" {
  type = "list"

  default = [
    "https://www.googleapis.com/auth/monitoring",
    "https://www.googleapis.com/auth/devstorage.read_only",
    "https://www.googleapis.com/auth/logging.write",
    "https://www.googleapis.com/auth/service.management.readonly",
    "https://www.googleapis.com/auth/servicecontrol",
    "https://www.googleapis.com/auth/trace.append",
  ]
}

module "gke" {
  source                     = "github.com/terraform-google-modules/terraform-google-kubernetes-engine?ref=master"
  ip_range_pods              = ""                 #TODO
  ip_range_services          = ""                 #TODO
  name                       = "cluster-you-name-it"
  network                    = "vpc-you-name-it"
  project_id                 = "project-you-name-it"
  region                     = "europe-west1"
  subnetwork                 = "vpc-sub-you-name-it"
  zones                      = ["europe-west1-c"]
  monitoring_service         = "monitoring.googleapis.com/kubernetes"
  logging_service            = "logging.googleapis.com/kubernetes"
  maintenance_start_time     = "04:00"
  kubernetes_version         = "1.11.3-gke.18"
  horizontal_pod_autoscaling = true
  regional                   = false

  node_pools = [
    {
      name               = "core"
      machine_type       = "n1-standard-2"
      oauth_scopes       = "${var.default-scopes}"
      min_count          = 1
      max_count          = 20
      auto_repair        = true
      auto_upgrade       = false
      initial_node_count = 20
    },
    {
      name               = "cc"
      machine_type       = "custom-6-23040"
      oauth_scopes       = "${var.default-scopes}"
      min_count          = 0
      max_count          = 20
      auto_repair        = true
      auto_upgrade       = false
      initial_node_count = 20
      preemptible        = true
      node_version       = "1.10.9-gke.7"
    },
  ]

  node_pools_labels = {
    all  = {}
    core = {}
    cc   = {}
  }

  node_pools_tags = {
    all  = []
    core = []
    cc   = []
  }

  node_pools_taints = {
    all  = []
    core = []
    cc   = []
  }
}

Terraform Google provider >= 2.4 throws error: "node_pool": conflicts with remove_default_node_pool

I'm just trying out this module for a PoC and ran into some difficulties following the v2.0.0 release. I created a cluster OK using v1.0.1, but just upgraded and am now getting this error when running a plan.

I've stripped back my config and still see the same issue with the main example config in the
README.md

Error: module.gke.google_container_cluster.primary: "node_pool": conflicts with remove_default_node_pool

Error: module.gke.google_container_cluster.primary: "remove_default_node_pool": conflicts with node_pool

I'm not setting the remove_default_node_pool option, but have tried explicitly setting it to both true and false and get the same error.

Terraform Version

Terraform v0.11.13
+ provider.google v2.5.1
+ provider.google-beta v2.5.1
+ provider.kubernetes v1.6.2
+ provider.null v2.1.1
+ provider.random v2.1.1

Affected Resource(s)

  • module.gke.google_container_cluster.primary

Terraform Configuration Files

provider "google" {
  project     = "<PROJECT ID>"
  region      = "us-central1"
  zone        = "us-central1-a"
}
provider "google-beta" {
  project     = "<PROJECT ID>"
  region      = "us-central1"
  zone        = "us-central1-a"
}
module "gke" {

  source                     = "terraform-google-modules/kubernetes-engine/google"
  project_id                 = "<PROJECT ID>"
  name                       = "gke-test-1"
  region                     = "us-central1"
  zones                      = ["us-central1-a", "us-central1-b", "us-central1-f"]
  network                    = "vpc-01"
  subnetwork                 = "us-central1-01"
  ip_range_pods              = "us-central1-01-gke-01-pods"
  ip_range_services          = "us-central1-01-gke-01-services"
  http_load_balancing        = false
  horizontal_pod_autoscaling = true
  kubernetes_dashboard       = true
  network_policy             = true
  remove_default_node_pool   = true

  node_pools = [
    {
      name               = "default-node-pool"
      machine_type       = "n1-standard-2"
      min_count          = 1
      max_count          = 100
      disk_size_gb       = 100
      disk_type          = "pd-standard"
      image_type         = "COS"
      auto_repair        = true
      auto_upgrade       = true
      service_account    = "project-service-account@<PROJECT ID>.iam.gserviceaccount.com"
      preemptible        = false
      initial_node_count = 80
    },
  ]

  node_pools_oauth_scopes = {
    all = []

    default-node-pool = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]
  }

  node_pools_labels = {
    all = {}

    default-node-pool = {
      default-node-pool = "true"
    }
  }

  node_pools_metadata = {
    all = {}

    default-node-pool = {
      node-pool-metadata-custom-value = "my-node-pool"
    }
  }

  node_pools_taints = {
    all = []

    default-node-pool = [
      {
        key    = "default-node-pool"
        value  = "true"
        effect = "PREFER_NO_SCHEDULE"
      },
    ]
  }

  node_pools_tags = {
    all = []

    default-node-pool = [
      "default-node-pool",
    ]
  }
}

Expected Behavior

Terraform creates plan successfully.

Actual Behavior

terraform plan errors with

Warning: module.gke.data.google_container_engine_versions.region: "region": [DEPRECATED] Use location instead



Warning: module.gke.data.google_container_engine_versions.zone: "zone": [DEPRECATED] Use location instead



Warning: module.gke.google_container_cluster.primary: "region": [DEPRECATED] Use location instead



Warning: module.gke.google_container_node_pool.pools: "region": [DEPRECATED] use location instead



Error: module.gke.google_container_cluster.primary: "node_pool": conflicts with remove_default_node_pool



Error: module.gke.google_container_cluster.primary: "remove_default_node_pool": conflicts with node_pool

Steps to Reproduce

  1. terraform plan

Outputs break after failed create

* module.gke-dev-cluster.local.cluster_type_output_zonal_zones: local.cluster_type_output_zonal_zones: concat: unexpected type list in list of type string in:

${concat(slice(var.zones,1,length(var.zones)), list(list()))}

stub_domains test failed

As of current master (3f7527e at the time of writing), with the following test/fixtures/shared/terraform.tfvars:

project_id="redacted-project-name"
credentials_path_relative="../../../credentials.json"
region="europe-west1"
zones=["europe-west1-c"]
compute_engine_service_account="[email protected]"

make docker_build_kitchen_terraform, make docker_run, kitchen create and kitchen converge passed fine.

kitchen verify passed fine for deploy_service, node_pool, shared_vpc, simple_regional and simple_zonal. It failed at stub_domains as follows:

Verifying stub_domains

Profile: stub_domain
Version: (not specified)
Target:  local://

  ×  gcloud: Google Compute Engine GKE configuration (1 failed)
     ✔  Command: `gcloud --project=redacted-project-name container clusters --zone=europe-west1 describe stub-domains-cluster-zwwa --format=json` exit_status should eq 0
     ✔  Command: `gcloud --project=redacted-project-name container clusters --zone=europe-west1 describe stub-domains-cluster-zwwa --format=json` stderr should eq ""
     ✔  Command: `gcloud --project=redacted-project-name container clusters --zone=europe-west1 describe stub-domains-cluster-zwwa --format=json` cluster is running
     ×  Command: `gcloud --project=redacted-project-name container clusters --zone=europe-west1 describe stub-domains-cluster-zwwa --format=json` cluster has the expected addon settings
     
     expected: {"horizontalPodAutoscaling"=>{}, "httpLoadBalancing"=>{}, "kubernetesDashboard"=>{"disabled"=>true}, "networkPolicyConfig"=>{}}
          got: {"horizontalPodAutoscaling"=>{}, "httpLoadBalancing"=>{}, "kubernetesDashboard"=>{"disabled"=>true}, "networkPolicyConfig"=>{"disabled"=>true}}
     
     (compared using ==)
     
     Diff:
     @@ -1,5 +1,5 @@
      "horizontalPodAutoscaling" => {},
      "httpLoadBalancing" => {},
      "kubernetesDashboard" => {"disabled"=>true},
     -"networkPolicyConfig" => {},
     +"networkPolicyConfig" => {"disabled"=>true},

  ✔  kubectl: Kubernetes configuration
     ✔  kubernetes configmap kube-dns is created by Terraform
     ✔  kubernetes configmap kube-dns reflects the stub_domains configuration
     ✔  kubernetes configmap ipmasq is created by Terraform
     ✔  kubernetes configmap ipmasq is configured properly


Profile Summary: 1 successful control, 1 control failure, 0 controls skipped
Test Summary: 7 successful, 1 failure, 0 skipped
>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: 1 actions failed.
>>>>>>     Verify failed on instance <stub-domains-local>.  Please see .kitchen/logs/stub-domains-local.log for more details
>>>>>> ----------------------
>>>>>> Please see .kitchen/logs/kitchen.log for more details
>>>>>> Also try running `kitchen diagnose --all` for configuration

I have the .kitchen/logs/kitchen.log and kitchen diagnose --all output copied, so let me know if you need that.

Fails with dynamic service account variable

This config fails:

  node_pools = [
    {
      name            = "pool-01"
      machine_type    = "n1-standard-1"
      min_count       = 2
      max_count       = 2
      disk_size_gb    = 30
      disk_type       = "pd-standard"
      image_type      = "COS"
      auto_repair     = false
      auto_upgrade    = false
      service_account = "${google_service_account.cluster_nodes.email}"
    },
  ]

While this config, with the service account email given as a literal string, works:

  node_pools = [
    {
      name            = "pool-01"
      machine_type    = "n1-standard-1"
      min_count       = 2
      max_count       = 2
      disk_size_gb    = 30
      disk_type       = "pd-standard"
      image_type      = "COS"
      auto_repair     = false
      auto_upgrade    = false
      service_account = "${google_service_account.cluster_nodes.email}"
    },
  ]

Networks and Subnetworks are updated every time

Every time the module is executed, it tries to change the network and subnetwork internal links. Effectively this changes nothing, but the internal Google API only stores the pattern: projects/project-id/global/networks/network-name

~ module.kubernetes.google_container_cluster.primary
      network:    "projects/my-project/global/networks/my-network" => "https://www.googleapis.com/compute/v1/projects/my-project/global/networks/my-network"
      subnetwork: "projects/my-project/regions/southamerica-east1/subnetworks/my-subnet" => "https://www.googleapis.com/compute/v1/projects/my-project/regions/southamerica-east1/subnetworks/my-subnet"

I believe this commit may be the cause. Was this change made to address some issue? I couldn't find a related one.

Support for preemptible nodes

Is support for this on the roadmap?
I think this can be solved without much trouble:

node_config {
    preemptible    = "${lookup(var.node_pools[count.index], "preemptible", false)}"
}
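On the caller's side, usage would then just be an extra key in the pool definition, along the lines of:

node_pools = [
  {
    name         = "preemptible-pool"
    machine_type = "n1-standard-2"
    min_count    = 1
    max_count    = 10
    preemptible  = true
  },
]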
