🛠️ A gentle introduction to GKE private cluster deployment

🧭  Learn how to deploy a GKE private cluster using Terraform and expose an echo server
🔗 Repo: gke-basic-cluster-deployment
🔗 Article also posted on Medium for Google Cloud

📧 Found an error or have a question? Write to me



📢 Intro

Kubernetes (k8s) hardly needs an introduction: it is the most famous and widely adopted container orchestrator in the world. Hosting it yourself is undoubtedly an advanced, expert-level task, so the majority of companies choose a provider that offers it as a managed service.

There are multiple well-known k8s hosting services (e.g. GKE, EKS, AKS, Okteto), and there is no doubt that one of the leading ones is Google Kubernetes Engine (GKE), a product of Google Cloud Platform (GCP).

So can we simply open GKE, start a cluster, and be ready to go? Well… yes and no: that may work until something more structured and production-ready is required. At that point the GKE settings and tweaks start to emerge and need to be addressed. Take for example the official terraform-google-kubernetes-engine Terraform module: it exposes a lot of parameters that can really change how GKE is deployed and used.

So, simple things first: we won't go through all the possible parameters. Instead, we will look at a basic GKE deployment and describe the main settings that a deployment should address.
For better understanding, and for code reuse, we deploy the system using Terraform, which gives us the chance to explain all the components with the clarity that only code provides.

🚀 Deploy

  • Prerequisites

  • Download the repo gke-basic-cluster-deployment

  • Enable the following GCP APIs using the GCP console [1]

    • Open GCP console -> APIs & Services -> enable APIs and services
    • Enable:
      • Compute Engine API
      • Kubernetes Engine API
  • Fill in the input variables:

        $ cp iac/variables.tfvars.example iac/variables.tfvars
        $ vi iac/variables.tfvars # replace with your data
    
  • Deploy the cluster:

    • Replace/set the variables with your data
        # configure gcloud to the desired project
        $ gcloud config set project $PROJECT_ID
      
        # configure terraform
        $ cd iac
        $ tfswitch
        $ terraform init
      
        # deploy the GKE pre-requisites
        $ terraform plan -out out.plan -var-file="./variables.tfvars" -var deploy_cluster=false
        $ terraform apply out.plan
      
        # deploy GKE - can take more than 20 minutes
        $ terraform plan -out out.plan -var-file="./variables.tfvars" -var deploy_cluster=true
        $ terraform apply out.plan
      
    
  • Deploy the services into k8s

    • Replace/set the variables with your data
        # set kubectl context
        $ gcloud container clusters get-credentials gkedeploy-cluster --zone $PROJECT_REGION --project $PROJECT_ID
      
        # create common resources
        $ kubectl apply -f k8s/common
      
        # deploy the server
        $ kubectl apply -f k8s/gechoserver/
      
        # wait until the ADDRESS is displayed - can take more than 10 minutes
        $ kubectl -n dev get ingress -o wide
        NAME          CLASS    HOSTS   ADDRESS          PORTS   AGE
        gechoserver   <none>   *       34.120.114.207   80      67s
      
        # query the server from the internet - can take more than 10 minutes
        # replace "34.120.114.207" with your address:
        $ curl -XPOST -v http://34.120.114.207/ -d "foo=bar"
      
        # ~ Congratulations, your server on GKE is up and running! ~
      
    
  • Destroy the cluster:

        $ cd iac
        $ terraform destroy -auto-approve -var-file="./variables.tfvars" -var deploy_cluster=true
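
The wait for the ingress ADDRESS in the deploy steps above can also be scripted. The following is only a minimal sketch and is not part of the repo: the function name wait_for_ingress_ip and its retry defaults are assumptions, while the namespace dev and the ingress name gechoserver are the ones used above. It polls kubectl until the Ingress reports a load balancer IP and prints it.

```shell
#!/bin/sh
# Hypothetical helper (not in the repo): poll an Ingress until it gets an IP.
# Arguments: namespace, ingress name, max attempts (default 60),
#            seconds between attempts (default 15).
wait_for_ingress_ip() {
  namespace=$1
  name=$2
  attempts=${3:-60}
  delay=${4:-15}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # jsonpath prints an empty string until the load balancer IP is assigned
    ip=$(kubectl -n "$namespace" get ingress "$name" \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for ingress $namespace/$name" >&2
  return 1
}

# Usage (commented out so that sourcing this file has no side effects):
#   ip=$(wait_for_ingress_ip dev gechoserver) && \
#     curl -XPOST -v "http://$ip/" -d "foo=bar"
```

Even after the address is printed, the load balancer may need several more minutes before it answers, so the first curl calls can still fail.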
      
    

🏗️ Architecture

Terraform

  • The k8s cluster provider is GKE from Google Cloud Platform (GCP)
  • The terraform state is stored only locally (e.g. no backend on GCS)

GKE

  • Deployed as a private cluster, so the nodes have only internal IPs
  • Deployed in VPC-native mode
  • The Terraform variable deploy_cluster controls the cluster creation; set it to false to create only the network components
  • Uses a Service Account named gkedeploy-sa
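
Once the cluster is up, the settings listed above can be read back with gcloud. This is a hedged sketch rather than something taken from the repo: the function name show_cluster_network_settings is hypothetical, the location is passed to --zone as in the deploy steps, and the field names are those of the GKE Cluster resource, so they may differ if your gcloud or API version reports these settings elsewhere.

```shell
#!/bin/sh
# Hypothetical helper (not in the repo): print the private-cluster and
# VPC-native settings of a cluster as tab-separated values.
# Arguments: cluster name, location, project id.
show_cluster_network_settings() {
  gcloud container clusters describe "$1" \
    --zone "$2" \
    --project "$3" \
    --format="value(privateClusterConfig.enablePrivateNodes,privateClusterConfig.masterIpv4CidrBlock,ipAllocationPolicy.useIpAliases)"
}

# Usage (commented out so that sourcing this file has no side effects):
#   show_cluster_network_settings gkedeploy-cluster "$PROJECT_REGION" "$PROJECT_ID"
```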

Network

  • No NAT will be deployed
    • Therefore the system cannot pull images from public container registries like Docker Hub; read more under Tips and Takeaways
  • The VPC does not automatically create a subnet for each region (custom mode)
  • Two subnetworks are provided:
    • gkedeploy-subnet: range 10.10.0.0/24, the subnetwork where the GKE nodes are deployed
    • gkedeploy-lb-proxy-only-subnet: range 10.14.0.0/23, a proxy-only subnet required by GCP to reserve a range of IPs used to deploy the Load Balancers
  • The VPC-native cluster alias IP ranges can be checked under “Console -> VPC network details -> secondary IPv4 ranges”
    • Under that field we find both the cluster_ipv4_cidr_block (for pods - 10.11.0.0/21) and the services_ipv4_cidr_block (for services - 10.12.0.0/21) values
  • The GKE master nodes, hosted by Google, use the 10.13.0.0/28 range; see the master_ipv4_cidr_block parameter
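
When changing any of the ranges above, it is easy to introduce an overlap by mistake. As a quick sanity check, here is a minimal sketch in plain sh that tests every pair of CIDR blocks. The function names are hypothetical, and the script assumes each block is written with its network address (as all five above are).

```shell
#!/bin/sh
# Convert a dotted IPv4 address to an integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# First and last address of a CIDR block, as integers.
# Assumes the address part is the network address of the block.
cidr_first() { ip_to_int "${1%/*}"; }
cidr_last() {
  first=$(cidr_first "$1")
  prefix=${1#*/}
  echo $(( first + (1 << (32 - prefix)) - 1 ))
}

# Succeeds (exit 0) when the two blocks share at least one address.
cidrs_overlap() {
  a_first=$(cidr_first "$1"); a_last=$(cidr_last "$1")
  b_first=$(cidr_first "$2"); b_last=$(cidr_last "$2")
  [ "$a_first" -le "$b_last" ] && [ "$b_first" -le "$a_last" ]
}

# Compare every pair once; returns 1 if any pair overlaps.
check_ranges() {
  status=0
  while [ "$#" -gt 1 ]; do
    a=$1
    shift
    for b in "$@"; do
      if cidrs_overlap "$a" "$b"; then
        echo "overlap: $a $b"
        status=1
      fi
    done
  done
  return "$status"
}

# nodes, pods, services, masters, proxy-only subnet
if check_ranges 10.10.0.0/24 10.11.0.0/21 10.12.0.0/21 10.13.0.0/28 10.14.0.0/23; then
  echo "no overlapping ranges"
fi
```

For the five ranges listed in this section the check finds no overlap.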

🥪 Tips and Takeaways

  • Every component and setting described here appears in the Terraform code, along with further insights; read the code to grasp those concepts
  • We deploy GKE after the other resources (this is why we ran two terraform plans) because otherwise the GKE deployment sometimes remains stuck indefinitely during the health check process, and terraform returns the error Error: Error waiting for creating GKE cluster: All cluster resources were brought up, but [...]
    • The error may be caused by the SA not being deployed and running yet, so first deploy all the other resources with deploy_cluster=false, then deploy the cluster with deploy_cluster=true
  • To pull Docker images without adding the NAT component we have two choices (note: Private Google Access is enabled):
    • Enable the Artifact Registry service on your GCP project and upload/mirror the desired images
    • Choose images from the public Google Container Registry, which has a Google-allowed IP (e.g. the k8s/gechoserver deployment)
  • By default, you cannot reach GCE VMs using --tunnel-through-iap because the firewall blocks that connection
    • We add the fw-iap firewall rule in Terraform in order to use this GCP feature, named IAP for TCP forwarding
  • [1] We can write Terraform code to enable the GCP APIs, but the opinion is that we should not
  • In Terraform, under the GKE section, why is master_ipv4_cidr_block required?
    • Because the k8s master(s) are managed by Google, and a peering connection is created between the Google network and the GKE network
    • Because of this connection, Google needs a free IP range from which to assign IPs to the master’s components
  • When deploying a k8s Service object, pay attention when defining UDP/TCP ports: wrong usage fails silently
    • Example:
      • Start 2 pods (A and B), declare a ClusterIP Service on port 80 for TCP connections

      • Run the following commands:

        # ~ With TCP:
        # Client A:
        $ nc -l -p 8080
                  
        # Client B:
        $ nc network-multitool 8080
        hello in TCP
                  
        # Client A:
        hello in TCP # <-- msg received
                  
        # ~ With UDP:
        # Client A:
        $ nc -l -u -p 8081
                  
        # Client B:
        $ nc -u network-multitool 8081
        hello in UDP
                  
        # Client A:
        <nothing>
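
A Service port defaults to TCP when no protocol is given, so UDP traffic has to be declared explicitly. The sketch below prints a ClusterIP Service manifest with one TCP and one UDP port. It is an illustration under assumptions, not the repo's manifest: the function name render_service and the selector label app are hypothetical, while the name network-multitool and the ports 8080/8081 are taken from the nc example above.

```shell
#!/bin/sh
# Hypothetical helper (not in the repo): print a ClusterIP Service manifest
# that declares the protocol of each port explicitly.
# Arguments: service name, TCP port, UDP port.
render_service() {
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: $1
spec:
  type: ClusterIP
  selector:
    app: $1   # assumed pod label
  ports:
    - name: tcp-$2
      protocol: TCP
      port: $2
      targetPort: $2
    - name: udp-$3
      protocol: UDP
      port: $3
      targetPort: $3
EOF
}

# Print the manifest; pipe it to "kubectl apply -n dev -f -" to create it.
render_service network-multitool 8080 8081
```

Each port gets a name because Kubernetes requires named ports when a Service exposes more than one.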
                  
        

🔗 Resources

  • [1] Can I automatically enable APIs when using GCP cloud with terraform? - so
  • [2] Best managed kubernetes platform - reddit
  • Learn Terraform - Provision a GKE Cluster - gh
  • Official GCP Terraform provider - doc
  • GKE Ingress for HTTP(S) Load Balancing - doc
  • Network overview - doc
  • VPC-native clusters - doc
  • DNS on GKE: Everything you need to know - medium
  • A trip with Google Global Load Balancers - medium