How To Provision Kubernetes Clusters In GCP With Terraform

Introduction

In this article, you will learn how to create clusters on Google Kubernetes Engine (GKE) in GCP with the gcloud command-line interface and with Terraform. By the end of the tutorial, you will be able to provision a cluster with a single command.

GKE is a managed Kubernetes service provided by the Google Cloud Platform (GCP), meaning that GCP is responsible for maintaining the cluster’s control plane (master node). You still have the option to manage the worker nodes manually, but this should only be done under certain circumstances.

Set up the GCP account

Before you start using the gcloud CLI and Terraform, you have to install the Google Cloud SDK to authenticate your requests to GCP from your Linux command line. Follow the steps below:

    1. First, make sure you have apt-transport-https installed:
      $ sudo apt-get install apt-transport-https ca-certificates gnupg
    2. Add the Cloud SDK distribution URI as a package source:
      $ echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
    3. Import the Google Cloud public key:
      $ curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
    4. Update and install the Cloud SDK:
      $ sudo apt-get update && sudo apt-get install google-cloud-sdk
    5. Initialize the gcloud CLI by executing:
      $ gcloud init

      Next, you will see output similar to this:

      Welcome! This command will take you through the configuration of gcloud.
      Your current configuration has been set to: [default]
      You can skip diagnostics next time by using the following flag:
      gcloud init --skip-diagnostics
      Network diagnostic detects and fixes local network connection issues.
      Checking network connection...done.
      Reachability Check passed.
      Network diagnostic passed (1/1 checks passed).
                  
      You must log in to continue. Would you like to log in (Y/n)?  Y
                  
      Go to the following link in your browser:
                  
      https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=42554940551.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fappengine.admin+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcompute+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Faccounts.reauth&state=kNFWsd8bN9gf4RngMjCKFd4TFE9cQa&prompt=consent&access_type=offline&code_challenge=0RfyIIefeOofzMGS36f35Z78buzDKZQoGnTbhPndf_A&code_challenge_method=S256
          

      Open the URL in your browser, copy the verification code that appears, and paste it back into the terminal to complete the login.

    6. Enable the Compute Engine API:
      $ gcloud services enable compute.googleapis.com
    7. Enable the Kubernetes Engine API:
      $ gcloud services enable container.googleapis.com
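
Optionally, you can also set a default project and compute region for gcloud, so you do not have to pass them to every command. Replace the placeholder values with your own:

$ gcloud config set project your_project_id
$ gcloud config set compute/region europe-west4
$ gcloud config list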

Provisioning Kubernetes Cluster Via gcloud CLI

Now that you can freely send requests to GCP, you can go ahead and create your cluster. In this example, we will create a cluster whose nodes run Ubuntu with Docker as the container runtime.

 

PROJECT=your_project
REGION=your_region

gcloud beta container --project "$PROJECT" clusters create "k8s" --region "$REGION" --no-enable-basic-auth --cluster-version "1.21.4-gke.1801" --release-channel "rapid" --machine-type "e2-medium" --image-type "UBUNTU" --disk-type "pd-standard" --disk-size "100" --metadata disable-legacy-endpoints=true --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" --max-pods-per-node "110" --num-nodes "1" --logging=SYSTEM,WORKLOAD --monitoring=SYSTEM --enable-ip-alias --network "projects/$PROJECT/global/networks/default" --subnetwork "projects/$PROJECT/regions/$REGION/subnetworks/default" --no-enable-intra-node-visibility --default-max-pods-per-node "110" --enable-autoscaling --min-nodes "0" --max-nodes "2" --no-enable-master-authorized-networks --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver --enable-autoupgrade --enable-autorepair --max-surge-upgrade 1 --max-unavailable-upgrade 0 --enable-shielded-nodes

That’s it!

GCP will now provision your K8s cluster consisting of:

  1. Kubernetes version: 1.21.4-gke.1801 on the rapid channel.
  2. Machine type: e2-medium (2 vCPUs, 4 GB RAM).
  3. Image type: Ubuntu.
  4. Disk type: pd-standard (the standard persistent disk type used for the node boot disks).
  5. Disk size: 100 GB.
  6. Max pods per node: 110 (default setting).
  7. Horizontal pod autoscaling and cluster autoscaling (worker nodes) enabled.
  8. Auto-upgrade and auto-repair enabled.

The cluster will initially be created with 3 nodes, one in each availability zone (for high availability), and under heavy load it will scale up to 6 worker nodes in total (2 nodes per availability zone x 3 zones = 6 nodes). The number of zones depends on how many zones exist within the region; each region has at least 3 availability zones, but you can use 4 if you wish and the region offers them.
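
Once the cluster is ready, you can fetch its credentials and check that the nodes are up, reusing the PROJECT and REGION variables from above:

$ gcloud container clusters get-credentials k8s --region "$REGION" --project "$PROJECT"
$ kubectl get nodes -o wide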

Provision A Kubernetes Cluster With Terraform

For starters, what is Terraform?
Terraform is an open-source infrastructure provisioning (Infrastructure as Code) tool developed and maintained by HashiCorp. With Terraform, you describe your infrastructure as building blocks written in HCL (HashiCorp Configuration Language). The main building blocks are:

  1. Provider. This block tells Terraform which cloud provider to use (GCP, AWS, etc.).
  2. Modules. Reusable collections of resources that can be configured through input variables.
  3. Resources. The individual infrastructure objects to create, such as networks, firewall rules, or clusters.
  4. Output. Values that Terraform prints once provisioning is done, such as the cluster name.

Under the hood, Terraform takes these blocks and makes the appropriate API calls to the cloud provider to provision the infrastructure they describe.
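
To make this concrete, below is a minimal, stand-alone sketch of how these blocks look in HCL. The names and values are purely illustrative; the real configuration for our cluster follows later in this article.

# Provider: tells Terraform which cloud to talk to
provider "google" {
  project = "your_project_id"
  region  = "europe-west4"
}

# Resource: a single piece of infrastructure
resource "google_compute_network" "example" {
  name                    = "example-network"
  auto_create_subnetworks = false
}

# Module: a reusable collection of resources (shown commented out, with an illustrative local path)
# module "example" {
#   source = "./modules/example"
# }

# Output: a value printed once provisioning is done
output "example_network_name" {
  value = google_compute_network.example.name
}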

Enough with the theory now, let’s jump to some actual code!

Create a folder and 3 files in it. These files will be:

  • variables.tf – to define the cluster parameters.
  • main.tf – to store the cluster provisioning blocks.
  • outputs.tf – to define which variables should be printed to the screen.
$ mkdir gke-terraform; cd gke-terraform; touch variables.tf main.tf outputs.tf

The main.tf contents will be:

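# Configure the Google Cloud provider with the project and region to use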
provider "google" {
  project     = var.project_id
  region      = var.region
}

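# Fetch authentication details for the new cluster and write a local kubeconfig file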
module "gke_auth" {
  source = "terraform-google-modules/kubernetes-engine/google//modules/auth"
  depends_on   = [module.gke]
  project_id   = var.project_id
  location     = module.gke.location
  cluster_name = module.gke.name
}
resource "local_file" "kubeconfig" {
  content  = module.gke_auth.kubeconfig_raw
  filename = "kubeconfig-${var.env_name}-${timestamp()}"
}

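# Create a dedicated VPC network and subnet, with secondary IP ranges for pods and services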
module "gcp-network" {
  source       = "terraform-google-modules/network/google"
  project_id   = var.project_id
  network_name = "${var.network}-${var.env_name}"
  subnets = [
    {
      subnet_name   = "${var.subnetwork}-${var.env_name}"
      subnet_ip     = "10.10.0.0/16"
      subnet_region = var.region
    },
  ]
  secondary_ranges = {
    "${var.subnetwork}-${var.env_name}" = [
      {
        range_name    = var.ip_range_pods_name
        ip_cidr_range = "10.20.0.0/16"
      },
      {
        range_name    = var.ip_range_services_name
        ip_cidr_range = "10.30.0.0/16"
      },
    ]
  }
}

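# Provision the regional GKE cluster itself (private-cluster submodule) with a single node pool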
module "gke" {
  source                      = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  project_id                  = var.project_id
  name                        = "${var.cluster_name}-${var.env_name}"
  regional                    = true
  region                      = var.region
  zones                       = var.zones
  network                     = module.gcp-network.network_name
  subnetwork                  = module.gcp-network.subnets_names[0]
  ip_range_pods               = var.ip_range_pods_name
  ip_range_services           = var.ip_range_services_name
  http_load_balancing         = true
  horizontal_pod_autoscaling  = true
  network_policy              = true
  remove_default_node_pool    = true
  release_channel             = "RAPID"
  kubernetes_version          = "latest"
  node_pools = [
    {
      name                  = "regional-pool"
      preemptible           = false
      machine_type          = "e2-custom-4-4096"
      image_type            = "UBUNTU"
      disk_type             = "pd-balanced"
      disk_size_gb          = 30
      local_ssd_count       = 0
      tags                  = "gke-node"
      min_count             = 1
      max_count             = 1
      max_surge             = 2
      max_unavailable       = 1
      autoscaling           = true
      auto_upgrade          = true
      auto_repair           = true
      node_metadata         = "GKE_METADATA_SERVER"
    },
  ]
  node_pools_oauth_scopes = {
    all = []

    regional-pool = [
      "https://www.googleapis.com/auth/cloud-platform",
      "https://www.googleapis.com/auth/ndev.clouddns.readwrite",
      "https://www.googleapis.com/auth/servicecontrol",
      "https://www.googleapis.com/auth/service.management.readonly",
    ]
  }

  node_pools_labels = {
    all = {}

    regional-pool = {
      default-pool = true
    }
  }

  node_pools_tags = {
    all = []

    regional-pool = [
      "gke-node", "${var.project_id}-gke"
    ]
  }
}
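
# Firewall rule that allows inbound SSH (TCP port 22) to the nodes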
resource "google_compute_firewall" "ssh-rule" {
  depends_on   = [module.gke]
  name = "ssh"
  network  = "${var.network}-${var.env_name}"
  project  = "${var.project_id}"
  allow {
    protocol = "tcp"
    ports = ["22"]
  }
  source_ranges = ["0.0.0.0/0"]
}
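
# Add the public SSH key to the project metadata (e.g. for root access with Ansible)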
resource "google_compute_project_metadata" "ansible_ssh_key" {
  project = var.project_id
  metadata = {
    ssh-keys = "${var.ssh_user}:${file(var.key_pairs["root_public_key"])}"
  }
}

Pay attention to the last 2 resource blocks:

  • google_compute_firewall will create an SSH firewall rule, enabling you to SSH into each node.
  • google_compute_project_metadata will add the SSH public key whose location is defined in the root_public_key variable to the project metadata. Feel free to delete that resource if you don’t want root access to your nodes.

Now the variables.tf file:

variable "project_id" {
  description = "your_project_id"
  default= "your_project_id"
}

variable "region" {
  description = "The region to host the cluster in"
  default     = "europe-west4"
}

variable "zones" {
  description = "The region to host the cluster in"
  default     = ["europe-west4-a","europe-west4-b","europe-west4-c"]
}

variable "cluster_name" {
  description = "The name for the GKE cluster"
  default     = "kubedemo"
}

variable "env_name" {
  description = "The environment for the GKE cluster"
  default     = "prod"
}
variable "network" {
  description = "The VPC network created to host the cluster in"
  default     = "gke-network"
}

variable "subnetwork" {
  description = "The subnetwork created to host the cluster in"
  default     = "gke-subnet"
}

variable "subnetwork_ipv4_cidr_range" {
  description = "The subnetwork ip cidr block range."
  default     = "10.20.0.0/14"
}

variable "ip_range_pods_name" {
  description = "The secondary ip range to use for pods"
  default     = "ip-range-pods"
}

variable "pod_ipv4_cidr_range" {
  description = "The cidr ip range to use for pods"
  default     = "10.24.0.0/14"
}

variable "ip_range_services_name" {
  description = "The secondary ip range name to use for services"
  default     = "ip-range-services"
}
variable "services_ipv4_cidr_range" {
  description = "The cidr ip range to use for services"
  default     = "10.28.0.0/20"
}

variable "ssh_user" {
  description = "The user that Ansible will use"
  default     = "root"
}

variable "key_pairs" {
  type = map(string)
  default = {
    root_public_key  = "keys/root_id_ed25519.pub",
    root_private_key = "keys/root_id_ed25519"
  }
}

Again, you can delete the last 2 variables here. But if you wish to keep them, in order to add SSH capability to your nodes and enable root access to them with SSH keys (for instance, to use Ansible), you can create the necessary keys by executing:

$ mkdir -p keys; ssh-keygen -t ed25519 -f ./keys/root_id_ed25519
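
Also note that instead of editing the defaults in variables.tf, you can override any of these variables in a terraform.tfvars file, which Terraform loads automatically. A small sketch with placeholder values:

project_id   = "your_project_id"
region       = "europe-west4"
env_name     = "staging"
cluster_name = "kubedemo"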

Lastly, the outputs.tf file will be:

output "project_id" {
  value       = var.project_id
  description = "GCloud Project ID"
}

output "region" {
  value       = var.region
  description = "GCloud Region"
}

output "zones" {
  value       = var.zones
  description = "Node Zones"
}

output "cluster_name" {
  description = "Cluster name"
  value       = module.gke.name
}

Now that you are finished with the code part, a few steps need to be followed.

  1. Execute the command
    $ terraform init

    This will initialize Terraform and download the provider plugins and modules needed to execute the code for the given cloud provider.

  2. Now execute:
    $ terraform plan

    This will perform a dry run and show you a detailed summary of the infrastructure that is going to be provisioned.

  3. If you are confident with the result above, you can now execute:
    $ terraform apply

    Now wait until the cluster is up and running; it takes roughly 10 minutes for all resources to become available.
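
    When apply finishes, the local_file resource in main.tf will have written a kubeconfig file into the working directory (named kubeconfig-<env_name>-<timestamp>). Assuming you kept the default env_name of prod, you can point kubectl at it like this:

    $ kubectl --kubeconfig "$(ls kubeconfig-prod-* | head -n 1)" get nodes -o wide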

Congrats, you now have a working K8s cluster provisioned with Terraform in GKE!

You can now deploy apps, or simply play around with your cluster. For example, you can create a namespace if it doesn’t exist:

NS_NAME=some-name
echo -e "apiVersion: v1\nkind: Namespace\nmetadata:\n  name: ${NS_NAME}" | kubectl apply -f -
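
As a quick smoke test, you can then deploy something small into that namespace, for example an nginx Deployment exposed through a LoadBalancer Service (the names here are just examples):

$ kubectl -n "${NS_NAME}" create deployment nginx --image=nginx
$ kubectl -n "${NS_NAME}" expose deployment nginx --port=80 --type=LoadBalancer
$ kubectl -n "${NS_NAME}" get pods,svc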

You can find all the project’s source code on GitHub.

External resources for further reading:

  1. Terraform documentation: https://www.terraform.io/docs/index.html
  2. Google Cloud Platform documentation: https://cloud.google.com/docs
  3. Kubernetes documentation: https://kubernetes.io/docs/home/
  4. HashiCorp guide on using Terraform with GCP: https://learn.hashicorp.com/collections/terraform/gcp-get-started
  5. GCP GitHub repository for Terraform modules: https://github.com/GoogleCloudPlatform/terraform-google-modules