How To Provision Kubernetes Clusters In GCP With Terraform
Introduction
In this article, you will learn how to create clusters on Google Kubernetes Engine (GKE) in GCP with the gcloud command-line interface and with Terraform. By the end of the tutorial, you will be able to create a cluster automatically with a single command.
GKE is a managed Kubernetes service provided by Google Cloud Platform (GCP), meaning that GCP is responsible for maintaining the cluster's control plane (master) node. You can optionally manage the worker nodes yourself, but this should only be done under specific circumstances.
Set up the GCP account
Before you start using the gcloud CLI and Terraform, you have to install the Google Cloud SDK so that your requests to GCP can be authenticated from your Linux command line. Follow the steps below:
- First, make sure you have apt-transport-https installed:
$ sudo apt-get install apt-transport-https ca-certificates gnupg
- Add the Cloud SDK distribution URI as a package source:
$ echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
- Import the Google Cloud public key:
$ curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
- Update and install the Cloud SDK:
$ sudo apt-get update && sudo apt-get install google-cloud-sdk
- Initialize gcloud by executing:
$ gcloud init
Next, you will see output very similar to this:
Welcome! This command will take you through the configuration of gcloud.

Your current configuration has been set to: [default]

You can skip diagnostics next time by using the following flag:
  gcloud init --skip-diagnostics

Network diagnostic detects and fixes local network connection issues.
Checking network connection...done.
Reachability Check passed.
Network diagnostic passed (1/1 checks passed).

You must log in to continue. Would you like to log in (Y/n)? Y

Go to the following link in your browser:

    https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=42554940551.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fappengine.admin+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcompute+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Faccounts.reauth&state=kNFWsd8bN9gf4RngMjCKFd4TFE9cQa&prompt=consent&access_type=offline&code_challenge=0RfyIIefeOofzMGS36f35Z78buzDKZQoGnTbhPndf_A&code_challenge_method=S256
Open the link in your browser and log in; a verification code will appear. Copy the code, paste it back into the terminal, and you will be logged in.
- Enable Compute Engine:
$ gcloud services enable compute.googleapis.com
- Enable the Kubernetes Engine API:
$ gcloud services enable container.googleapis.com
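Before moving on, you can optionally confirm that authentication succeeded and that the APIs are now active; these are standard gcloud checks:
# Show the authenticated account and the active project
$ gcloud auth list
$ gcloud config list project
# List the APIs enabled for the project
$ gcloud services list --enabled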
Provisioning A Kubernetes Cluster Via The gcloud CLI
Now that you can freely send requests to GCP, you can go ahead and create your cluster. In this example, we will create a cluster whose nodes run the Ubuntu image with the Docker runtime.
PROJECT=your_project
REGION=your_region
gcloud beta container --project "$PROJECT" clusters create "k8s" \
  --region "$REGION" \
  --no-enable-basic-auth \
  --cluster-version "1.21.4-gke.1801" \
  --release-channel "rapid" \
  --machine-type "e2-medium" \
  --image-type "UBUNTU" \
  --disk-type "pd-standard" \
  --disk-size "100" \
  --metadata disable-legacy-endpoints=true \
  --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
  --max-pods-per-node "110" \
  --num-nodes "1" \
  --logging=SYSTEM,WORKLOAD \
  --monitoring=SYSTEM \
  --enable-ip-alias \
  --network "projects/$PROJECT/global/networks/default" \
  --subnetwork "projects/$PROJECT/regions/$REGION/subnetworks/default" \
  --no-enable-intra-node-visibility \
  --default-max-pods-per-node "110" \
  --enable-autoscaling \
  --min-nodes "0" \
  --max-nodes "2" \
  --no-enable-master-authorized-networks \
  --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
  --enable-autoupgrade \
  --enable-autorepair \
  --max-surge-upgrade 1 \
  --max-unavailable-upgrade 0 \
  --enable-shielded-nodes
That’s it!
GCP will now provision your K8s cluster consisting of:
- Kubernetes version: 1.21.4-gke.1801 on the rapid channel.
- Machine type: e2-medium (2 vCPUs, 4 GB RAM).
- Image type: Ubuntu.
- Disk type: pd-standard (a standard persistent disk used as each node's boot disk).
- Disk size: 100 GB.
- Max pods per node: 110 (Default setting).
- Horizontal Pod Autoscaling (pods) and cluster autoscaling (worker nodes) enabled.
- Auto-upgrade and auto-repair enabled.
The cluster will initially be created with 3 nodes, one in each availability zone (for high availability), and under heavy load it will scale up to 6 worker nodes (2 nodes per zone x 3 zones = 6 nodes in total). The number of zones depends on the region: each region has at least 3 availability zones, and some offer 4, which you can use if you wish.
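Once the cluster shows up as RUNNING, you can fetch its credentials and check the worker nodes from your terminal. A quick sketch, assuming the same $PROJECT and $REGION variables set above and a local kubectl installation:
# Confirm the cluster status
$ gcloud container clusters list --project "$PROJECT"
# Merge the cluster credentials into your local kubeconfig
$ gcloud container clusters get-credentials k8s --region "$REGION" --project "$PROJECT"
# The worker nodes should report a Ready status
$ kubectl get nodes -o wide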
Provision A Kubernetes Cluster With Terraform
For starters, what is Terraform?
Terraform is an open-source infrastructure provisioning (Infrastructure as Code) tool developed and maintained by HashiCorp. With Terraform, you describe your infrastructure as building blocks written in HCL (HashiCorp Configuration Language). The main building blocks are:
- Providers. These tell Terraform which cloud provider to talk to (GCP, AWS, etc.).
- Modules. Reusable, shareable collections of resources.
- Resources. The individual infrastructure objects to create, such as networks, clusters, and firewall rules.
- Outputs. Values that Terraform prints once provisioning is complete.
Under the hood, Terraform translates these blocks into the appropriate API calls to the cloud provider and provisions the infrastructure they describe.
Enough with the theory now, let’s jump to some actual code!
Create a folder and 3 files in it. These files will be:
- variables.tf – to define the cluster parameters.
- main.tf – to store the cluster provisioning blocks.
- outputs.tf – to define which variables should be printed to the screen.
$ mkdir gke-terraform; cd gke-terraform; touch variables.tf main.tf outputs.tf
The main.tf contents will be:
provider "google" {
project = var.project_id
region = var.region
}
module "gke_auth" {
source = "terraform-google-modules/kubernetes-engine/google//modules/auth"
depends_on = [module.gke]
project_id = var.project_id
location = module.gke.location
cluster_name = module.gke.name
}
resource "local_file" "kubeconfig" {
content = module.gke_auth.kubeconfig_raw
filename = "kubeconfig-${var.env_name}-${timestamp()}"
}
module "gcp-network" {
source = "terraform-google-modules/network/google"
project_id = var.project_id
network_name = "${var.network}-${var.env_name}"
subnets = [
{
subnet_name = "${var.subnetwork}-${var.env_name}"
subnet_ip = "10.10.0.0/16"
subnet_region = var.region
},
]
secondary_ranges = {
"${var.subnetwork}-${var.env_name}" = [
{
range_name = var.ip_range_pods_name
ip_cidr_range = "10.20.0.0/16"
},
{
range_name = var.ip_range_services_name
ip_cidr_range = "10.30.0.0/16"
},
]
}
}
module "gke" {
source = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
project_id = var.project_id
name = "${var.cluster_name}-${var.env_name}"
regional = true
region = var.region
zones = var.zones
network = module.gcp-network.network_name
subnetwork = module.gcp-network.subnets_names[0]
ip_range_pods = var.ip_range_pods_name
ip_range_services = var.ip_range_services_name
http_load_balancing = true
horizontal_pod_autoscaling = true
network_policy = true
remove_default_node_pool = true
release_channel = "RAPID"
kubernetes_version = "latest"
node_pools = [
{
name = "regional-pool"
preeptible = false
machine_type = "e2-custom-4-4096"
image_type = "UBUNTU"
disk_type = "pd-balanced"
disk_size_gb = 30
local_ssd_count = 0
tags = "gke-node"
min_count = 1
max_count = 1
max_surge = 2
max_unavailable = 1
autoscaling = true
auto_upgrade = true
auto_repair = true
node_metadata = "GKE_METADATA_SERVER"
},
]
node_pools_oauth_scopes = {
all = []
regional-pool = [
"https://www.googleapis.com/auth/cloud-platform",
"https://www.googleapis.com/auth/ndev.clouddns.readwrite",
"https://www.googleapis.com/auth/servicecontrol",
"https://www.googleapis.com/auth/service.management.readonly",
]
}
node_pools_labels = {
all = {}
default-node-pool = {
default-pool = true
}
}
node_pools_tags = {
all = []
default-pool = [
"gke-node", "${var.project_id}-gke"
]
}
}
resource "google_compute_firewall" "ssh-rule" {
depends_on = [module.gke]
name = "ssh"
network = "${var.network}-${var.env_name}"
project = "${var.project_id}"
allow {
protocol = "tcp"
ports = ["22"]
}
source_ranges = ["0.0.0.0/0"]
}
resource "google_compute_project_metadata" "ansible_ssh_key" {
project = var.project_id
metadata = {
ssh-keys = "${var.ssh_user}:${file(var.key_pairs["root_public_key"])}"
}
}
Pay attention to the last two resource blocks:
- google_compute_firewall – creates a firewall rule that lets you SSH into each node.
- google_compute_project_metadata – adds the SSH public key whose location is defined in the root_public_key variable to the project metadata. Feel free to delete this resource if you don't want root access to your nodes.
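If you keep both resources, you can later SSH into a worker node with the key pair generated in a later step (see the ssh-keygen command below). A rough sketch, assuming your nodes end up with external IP addresses; <NODE_EXTERNAL_IP> is a placeholder:
# Find a node's external IP
$ gcloud compute instances list
# Log in as root using the generated private key
$ ssh -i keys/root_id_ed25519 root@<NODE_EXTERNAL_IP>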
Now the variables.tf file:
variable "project_id" {
description = "your_project_id"
default= "your_project_id"
}
variable "region" {
description = "The region to host the cluster in"
default = "europe-west4"
}
variable "zones" {
description = "The region to host the cluster in"
default = ["europe-west4-a","europe-west4-b","europe-west4-c"]
}
variable "cluster_name" {
description = "The name for the GKE cluster"
default = "kubedemo"
}
variable "env_name" {
description = "The environment for the GKE cluster"
default = "prod"
}
variable "network" {
description = "The VPC network created to host the cluster in"
default = "gke-network"
}
variable "subnetwork" {
description = "The subnetwork created to host the cluster in"
default = "gke-subnet"
}
variable "subnetwork_ipv4_cidr_range" {
description = "The subnetwork ip cidr block range."
default = "10.20.0.0/14"
}
variable "ip_range_pods_name" {
description = "The secondary ip range to use for pods"
default = "ip-range-pods"
}
variable "pod_ipv4_cidr_range" {
description = "The cidr ip range to use for pods"
default = "10.24.0.0/14"
}
variable "ip_range_services_name" {
description = "The secondary ip range name to use for services"
default = "ip-range-services"
}
variable "services_ipv4_cidr_range" {
description = "The cidr ip range to use for services"
default = "10.28.0.0/20"
}
variable "ssh_user" {
description = "The user that Ansible will use"
default = "root"
}
variable "key_pairs" {
type = map
default = {
root_public_key = "keys/root_id_ed25519.pub",
root_private_key = "keys/root_id_ed25519"
}
}
Again, you can delete the last two variables here. If you want to keep them so that you can SSH into your nodes as root (for instance, to run Ansible against them), create the necessary key pair by executing:
$ mkdir -p keys; ssh-keygen -t ed25519 -f ./keys/root_id_ed25519
Lastly, the outputs.tf file will be:
output "project_id" {
value = var.project_id
description = "GCloud Project ID"
}
output "region" {
value = var.region
description = "GCloud Region"
}
output "zones" {
value = var.zones
description = "Node Zones"
}
output "cluster_name" {
description = "Cluster name"
value = module.gke.name
}
Now that the code is in place, a few more steps are needed.
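As you edit these files, you can keep them tidy and catch mistakes early; terraform fmt works right away, while terraform validate needs the project to be initialized first (the first step below):
# Normalize HCL formatting in place
$ terraform fmt
# After terraform init, check for syntax and consistency errors
$ terraform validate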
- Execute the command
$ terraform init
This will initialize Terraform and download the provider plugins and modules needed to execute the code for the given cloud provider.
- Now execute:
$ terraform plan
This will perform a dry run and show a detailed summary of the infrastructure that is going to be provisioned.
- If you are confident with the result above, you can now execute:
$ terraform apply
Now wait until the cluster is up and running; it takes roughly 10 minutes for all resources to become available.
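When the apply completes, the local_file resource writes a kubeconfig file named kubeconfig-<env_name>-<timestamp> to the working directory, and the values from outputs.tf are printed. A minimal check, assuming the default env_name of prod:
# Re-print the Terraform outputs at any time
$ terraform output
# Point kubectl at the generated kubeconfig and list the nodes
$ export KUBECONFIG=$(ls kubeconfig-prod-* | tail -n 1)
$ kubectl get nodes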
Congrats, you now have a working K8s cluster provisioned with Terraform in GKE!
You can now deploy apps, or simply play around with your cluster. For example, you can create a namespace if it doesn’t exist:
NS_NAME=some_name
echo -e "apiVersion: v1\nkind: Namespace\nmetadata:\n name: ${NS_NAME}" | kubectl create ns -
You can find all the project’s source code on GitHub.
External resources for further reading:
- Terraform documentation: https://www.terraform.io/docs/index.html
- Google Cloud Platform documentation: https://cloud.google.com/docs
- Kubernetes documentation: https://kubernetes.io/docs/home/
- HashiCorp guide on using Terraform with GCP: https://learn.hashicorp.com/collections/terraform/gcp-get-started
- GCP GitHub repository for Terraform modules: https://github.com/GoogleCloudPlatform/terraform-google-modules