🛠️ A gentle introduction to GKE private cluster deployment

25.2.2023 6-minute read

tutorial • GCP

🧭 Study how to deploy GKE private cluster using terraform and expose an echo server
🔗 Repo: gke-basic-cluster-deployment
🔗 Article posted on medium for Google Cloud also

📧 Found an error or have a question? write to me

📢 Intro
🚀 Deploy
🏗️ Architecture
- Terraform
- GKE
- Network
🥪 Tips and Takeaways
🔗 Resources

📢 Intro

Kubernetes (k8s), although don’t require my introduction, is the most famous and widely adopted container manager in the world. Hosting by yourself is undoubtedly a very advanced and expert topic, so the major of companies choose a provider that provides it as managed service.

There are multiple famous k8s hosting services (e.g. GKE, EKS, AKS, Okteto), but is no doubt that one of the leading is Google Kubernetes Engine (GKE), a product provided by Google Cloud Platform (GCP).

So can we simply open GKE, start a cluster and we are ready to go? Well… yes but no, it maybe could work until something further structured and production ready is required. So then the GKE settings and tweaks start to emerge and require to be addressed. Take for example the official terraform-google-kubernetes-engine terraform deploy: there are a lot of parameters that can really change how GKE will be deployed and employed.

So, simple things first: we will don’t go through all the possible parameters but we will, here, see a basic deployment of GKE with a description of the main settings that a deployment should address.
For better understanding - and code reusage - we deploy the system using terraform, this will give us the prospect to easily explain all the components with the unique clarity that belong to the code.

🚀 Deploy

Prerequisites
- terraform
- kubectl
- gcloud
- tfswitch - optional
Download the repo gke-basic-cluster-deployment
Enable the following GCP APIs using the GCP console [1]
- Open GCP console -> APIs & Services -> enable APIs and services
- Enable:
  - Compute Engine API
  - Kubernetes Engine API

Compile the input variables:

    $ cp iac/variables.tfvars.example iac/variables.tfvars
    $ vi iac/variables.tfvars # replace with your data

Deploy the cluster:

Replace\set the variables with your data

    # configure gcloud to the desired project
    $ gcloud config set project $PROJECT_ID
  
    # configure terraform
    $ cd iac
    $ tfswitch
    $ terraform init
  
    # deploy the GKE pre-requisites
    $ terraform plan -out out.plan -var-file="./variables.tfvars" -var  deploy_cluster=false
    $ terraform apply out.plan
  
    # deploy GKE - can take more than 20 minutes
    $ terraform plan -out out.plan -var-file="./variables.tfvars" -var deploy_cluster=true
    $ terraform apply out.plan

Deploy the services into k8s

Replace\set the variables with your data

    # set kubectl context
    $ gcloud container clusters get-credentials gkedeploy-cluster --zone $PROJECT_REGION --project $PROJECT_ID
  
    # create common resources
    $ kubectl apply -f k8s/common
  
    # deploy the server
    $ kubectl apply -f k8s/gechoserver/
  
    # wait that the ADDRESS will be displayed - can take more than 10 minutes
    $ kubectl -n dev get ingress -o wide
    NAME          CLASS    HOSTS   ADDRESS          PORTS   AGE
    gechoserver   <none>   *       34.120.114.207   80      67s
  
    # query the server from internet - can take  more than 10 minutes
    # replace "34.120.114.207" with your address:
    $ curl -XPOST -v <http://34.120.114.207/> -d "foo=bar"
  
    # ~ Congratulation, your server on GKE is up and running! ~

Destroy the cluster:

    $ cd iac
    $ terraform destroy -auto-approve -var-file="./variables.tfvars" -var deploy_cluster=true

🏗️ Architecture

Terraform

The k8s cluster provider is GKE from Google Cloud Platform (GCP)
The terraform state is stored only locally (e.g. no backend on GCS)

GKE

Deployed as private cluster, so it depends only on internal IPs
Deployed in VPC-native mode:
- The traffic to a specific pod will be routed directly to the correct node thanks to the container native LB
- We deploy the Ingress back-end as ClusterIP instead of NodePort to leverage the container native load balancing
The terraform variable deploy_cluster steers the cluster creation, set to false to create only the networks components
Use a Service Account named gkedeploy-sa

Network

No NAT will be deployed
- Therefore the system cannot pull images from public container ◦ registries like Docker Hub, read more under Tips and Takeaways
VPC doesn’t create a subnet for each zone
Two subnetworks are provided:
- gkedeploy-subnet: with the range 10.10.0.0/24 is the subnetwork where GKE nodes will be deployed
  - Instances within this network can access Google APIs and services by using Private Google Access
- gkedeploy-lb-proxy-only-subnet: with the range 10.14.0.0/23 is a proxy only subnet and is required by GCP to reserve a range of IPs used to deploy the Load Balancers
VPC-Native cluster alias IP range could be checked under “Console -> VPC network details -> secondary IPv4 ranges”
- Under that field we find both the cluster_ipv4_cidr_block (for pods - 10.11.0.0/21) and services_ipv4_cidr_block (for services - 10.12.0.0/21) values,
GKE hosted (by Google) master’s nodes will use the 10.13.0.0/28 range, see master_ipv4_cidr_block parameter

🥪 Tips and Takeaways

Every component and setting described here is reported on the terraform code, with also more insights, read the code to grasp those concepts
We deploy GKE after other resources (this is why we had 2 terraform plans) because otherwise, sometimes the GKE deployment remains infinitely stuck during the health check process, and terraform returns the error Error: Error waiting for creating GKE cluster: All cluster resources were brought up, but [...]
- Maybe the error is due from SA not yet deployed/up&running, so try to deploy firstly all the resources using var deploy_cluster=false and then deploy the cluster using terraform with var deploy_cluster=true
To pull docker images without adding the NAT component we have two choices (memo: we have Private Google Access enabled):
Enable the Artifact Registry service on your GCP project and upload/mirror the desired images
Chose the images to use from the public Google Container Registry that has a Google allowed IP (e.g. the k8s/gechoserver deployment)
By default, you cannot reach GCE vm using -tunnel-through-iap because the firewalls block that connection
- We add fw-iap firewall rule to terraform in order to use this GCP functionality, named IAP for TCP forwarding
[1] We can write Terraform code to enable the GCP APIs, but is opinionated that we should not
On terraform, under the GKE section, why master_ipv4_cidr_block is required?
- because the k8s master(s) are managed by Google and a peering connection will be created from the Google network with the GKE network
- due to this connection, Google needs to know a free IP range used to assign IPs to the master’s components

When deploying a k8s Service object, pay attention when defining UDP/TCP ports: wrong usages fail silently

Example:

Start 2 pods (A and B), declare ClusterIP on port 80 for TCP connection

Run the following code:

# ~ With TCP:
# Client A:
$ nc -l -p 8080
          
# Client B:
$ nc network-multitool 8080
hello in TCP
          
# Client A:
hello in TCP # <-- msg received
          
# ~ With UDP:
# Client A:
$ nc -l -u -p 8081
          
# Client B:
$ nc -u network-multitool 8081
hello in UDP
          
# Client A:
<nothing>

🔗 Resources

[1] Can I automatically enable APIs when using GCP cloud with terraform? - so
[2] Best managed kubernetes platform - reddit
Learn Terraform - Provision a GKE Cluster - gh
Official GCP Terraform provider - doc
GKE Ingress for HTTP(S) Load Balancing - doc
Network overview - doc
VPC-native clusters - doc
DNS on GKE: Everything you need to know - medium
A trip with Google Global Load Balancers - medium