Configure a Kubernetes cluster on GCP
Objectives
The goal of this tutorial is to set up and configure a Kubernetes cluster on GCP so that users can install kdb Insights Enterprise.
Terraform artifacts
If you have a full commercial license, kdb Insights Enterprise provides default Terraform modules which are delivered as a TGZ. These modules are available through the KX Downloads Portal.
You will need to download the artifact and extract it.
Prerequisites
For this tutorial you will need:
A Google Cloud account.
A Google Cloud user with admin privileges.
A Google Cloud project with the following APIs enabled:
Cloud Resource Manager API
Compute Engine API
Kubernetes Engine API
Cloud Filestore API
Sufficient Quotas to deploy the cluster.
A client machine with Google Cloud SDK.
A client machine with Docker.
Note
On Linux, additional steps are required to manage Docker as a non-root user.
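For example, on most Linux distributions this typically means adding your user to the docker group; a minimal sketch (the exact steps may vary by distribution):
# Add the current user to the docker group and re-evaluate group membership
sudo usermod -aG docker "$USER"
newgrp docker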
Environment Setup
To extract the artifact, execute the following:
tar xzvf kxi-terraform-*.tgz
The above command will create the kxi-terraform directory. The commands below are executed within this directory and thus use relative paths.
To change to this directory execute the following:
cd kxi-terraform
The deployment process is performed within a Docker container which includes all tools needed by the provided scripts. A Dockerfile is provided in the config directory that can be used to build the Docker image. The image should be named kxi-terraform and can be built using the command below:
docker build -t kxi-terraform:latest ./config
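To confirm the image was built, you can list it:
docker image ls kxi-terraform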
Service Account Setup
The Terraform scripts require a Service Account with appropriate permissions, which are defined in the config/kxi-gcp-tf-policy.txt file. The service account should already exist.
Note
The below commands should be run by a user with admin privileges.
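The commands below use the PROJECT, SERVICE_ACCOUNT and SERVICE_ACCOUNT_EMAIL variables. As an illustration, they could be set as follows, and the service account created if it does not already exist (the project and account name are example values only):
# Example values only - substitute your own project and service account name
export PROJECT="myproject"
export SERVICE_ACCOUNT="kxi-terraform-sa"
export SERVICE_ACCOUNT_EMAIL="${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com"
# Create the service account if it does not already exist
gcloud iam service-accounts create "${SERVICE_ACCOUNT}" --project="${PROJECT}" --display-name="kxi-terraform"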
Create a JSON key file for the service account:
gcloud iam service-accounts keys create "${SERVICE_ACCOUNT}.json" --iam-account="${SERVICE_ACCOUNT_EMAIL}" --no-user-output-enabled
where:
- SERVICE_ACCOUNT is the name of an existing service account
- SERVICE_ACCOUNT_EMAIL is the email address of an existing service account
The command will create the JSON file in the base directory. You will need this filename later when updating the configuration file.
Grant roles to service account:
while IFS= read -r role
do
gcloud projects add-iam-policy-binding "${PROJECT}" --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" --role="${role}" --condition=None --no-user-output-enabled
done < config/kxi-gcp-tf-policy.txt
where:
- PROJECT is the GCP project used for deployment
- SERVICE_ACCOUNT_EMAIL is the email address of the service account
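To verify the bindings, you can list the roles granted to the service account; a minimal sketch using a standard gcloud filter:
gcloud projects get-iam-policy "${PROJECT}" \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
  --format="table(bindings.role)"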
Configuration
The Terraform scripts are driven by environment variables, which configure how the Kubernetes cluster is deployed. These variables are populated by running the configure.sh script as follows:
./scripts/configure.sh
Select GCP, and enter your project name and credentials file name.
Select Cloud Provider
Choose:
AWS
Azure
> GCP
Set GCP Project
> myproject
Set GCP Credentials JSON filename (should exist on the current directory)
> credentials.json
Select the Region to deploy into:
Select Region
asia-northeast2
asia-northeast3
asia-south1
asia-south2
asia-southeast1
asia-southeast2
australia-southeast1
australia-southeast2
europe-central2
europe-north1
europe-southwest1
europe-west1
europe-west10
europe-west12
europe-west2
europe-west3
europe-west4
europe-west6
europe-west8
europe-west9
me-central1
me-central2
me-west1
northamerica-northeast1
northamerica-northeast2
northamerica-south1
southamerica-east1
southamerica-west1
us-central1
us-east1
us-east4
us-east5
us-south1
us-west1
us-west2
us-west3
us-west4
Select the Architecture Profile:
Select Architecture Profile
Choose:
> HA
Performance
Cost-Optimised
Enter how much capacity you require for rook-ceph. If you press Enter, the default of 100Gi is used.
Set how much capacity you require for rook-ceph, press Enter to use the default of 100Gi
Please note this will be the usable storage with replication
> Enter rook-ceph disk space (default: 100)
Enter the environment name, which acts as an identifier for all resources:
Set the environment name (up to 8 characters and can only contain lowercase letters and numbers)
> insights
Enter IPs/Subnets in CIDR notation to allow access to the Bastion Host and VPN:
Set Network CIDR that will be allowed VPN access as well as SSH access to the bastion host
For unrestricted access, set this to 0.0.0.0/0
> 0.0.0.0/0
Enter IPs/Subnets in CIDR notation to allow HTTP/HTTPS access to the cluster's ingress.
Set Network CIDR that will be allowed HTTPS access
For unrestricted access please set this to 0.0.0.0/0
> 0.0.0.0/0
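If you prefer to restrict access to your own public IP instead of 0.0.0.0/0, one way to obtain it in CIDR notation is shown below (ifconfig.me is used purely as an example lookup service):
# Print your current public IP as a /32 CIDR
echo "$(curl -s https://ifconfig.me)/32"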
SSL certificate configuration
Choose method for managing SSL certificates
----------------------------------------------
Cert-Manager HTTP Validation: Issues Let's Encrypt Certificates, fully automated but requires unrestricted HTTP access to the cluster.
Cert-Manager DNS Validation: Issues Let's Encrypt Certificates, requires access to the DNS Zone and additional configuration for automation.
Existing Certificates: Requires the SSL certificate to be stored in a Kubernetes Secret in the same namespace where Insights is deployed.
Choose:
> Cert-Manager HTTP Validation
Cert-Manager DNS Validation
Existing Certificates
Custom Tags
The config/default_tags.json file includes the tags that will be applied to all resources. You can add your own tags in this file to customize your environment.
Note
Only hyphens (-), underscores (_), lowercase characters, and numbers are allowed. Keys must start with a lowercase character. International characters are allowed.
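As an illustration, assuming the file is a flat JSON map of key/value pairs, a customized config/default_tags.json might look like this (the keys and values are hypothetical and must follow the character rules above):
{
  "owner": "data_platform",
  "costcentre": "cc1234"
}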
Deployment
To deploy the cluster and apply configuration, execute the following:
Linux/macOS:
./scripts/deploy-cluster.sh
Windows:
.\scripts\deploy-cluster.bat
Note
A pre-deployment check will be performed before proceeding further. If the check fails, the script will exit immediately to avoid deployment failures. You should resolve all issues before executing the command again.
This script will execute a series of Terraform and custom commands and may take some time to run. If the command fails at any point due to network issues or timeouts, you can execute it again until it completes without errors. If the error is related to the Cloud Provider account (for example, quota limits), you should resolve it first before executing the command again.
If any variable in the configuration file needs to be changed, the cluster should be destroyed first and then re-deployed.
For easier searching and filtering, the created resources are named/tagged using the gcp-${ENV} prefix. For example, if the ENV is set to demo, all resource names/tags include the gcp-demo prefix.
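For example, to list the Compute Engine instances created for an environment named demo, you could filter on this prefix (the filter expression is illustrative):
gcloud compute instances list --project="${PROJECT}" --filter="name~'^gcp-demo'"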
Cluster Access
To access the cluster, execute the following:
Linux/macOS:
./scripts/manage-cluster.sh
Windows:
.\scripts\manage-cluster.bat
The above command will start a shell session in a Docker container, generate a kubeconfig entry and connect to the VPN. Once the command completes, you will be able to manage the cluster via helm/kubectl.
Note
The kxi-terraform directory on the host is mounted in the container at /terraform. Files and directories created under the /terraform directory while using this container are persisted even after the container is stopped.
Note
If other users require access to the cluster, they will need to download and extract the artifact, build the Docker image, and copy the kxi-terraform.env file as well as the terraform/gcp/client.ovpn file (generated during deployment) to the same paths in their own extracted artifact directory. Once these two files are copied, the above script can be used to access the cluster.
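As an illustration, assuming another user has extracted the artifact to ~/kxi-terraform, the two files could be copied as follows:
cp kxi-terraform.env ~/kxi-terraform/
cp terraform/gcp/client.ovpn ~/kxi-terraform/terraform/gcp/client.ovpn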
Below you can find kubectl commands to retrieve information about the installed components.
List Kubernetes Worker Nodes
kubectl get nodes
List Kubernetes namespaces
kubectl get namespaces
List cert-manager pods running on cert-manager namespace
kubectl get pods --namespace=cert-manager
List nginx ingress controller pod running on ingress-nginx namespace
kubectl get pods --namespace=ingress-nginx
List rook-ceph pods running on rook-ceph namespace
kubectl get pods --namespace=rook-ceph
Environment Destroy
Before you destroy the environment, make sure you don't have any active shell sessions in the Docker container. You can close the session by executing the following:
exit
To destroy the cluster, execute the following:
Linux/macOS:
./scripts/destroy-cluster.sh
Windows:
.\scripts\destroy-cluster.bat
If the command fails at any point due to network issues or timeouts, you can execute it again until it completes without errors.
Note
In some cases, the command may fail due to the VPN being unavailable or GCP resources not being cleaned up properly. To resolve this, delete the terraform/gcp/client.ovpn file and execute the command again.
Note
Even after the cluster is destroyed, disks created dynamically by the application may still be present and incur additional costs. You should review the GCE disks to verify whether the data is still needed.
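For example, you can list the remaining disks in the project and delete any that are no longer needed (note that dynamically provisioned disks may not follow the gcp-${ENV} naming convention):
gcloud compute disks list --project="${PROJECT}"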
Advanced Configuration
It is possible to further configure your cluster by editing the newly generated kxi-terraform.env file in the current directory. These edits should be made prior to running the deploy-cluster.sh script. The list of variables which can be edited is given below:
Environment Variable | Details | Default Value | Possible Values |
---|---|---|---|
TF_VAR_enable_metrics | Enables forwarding of container metrics to Cloud-Native monitoring tools | false | true / false |
TF_VAR_enable_logging | Enables forwarding of container logs to Cloud-Native logging tools | false | true / false |
TF_VAR_default_node_type | Node type for default node pool | Depends on profile | VM Instance Type |
TF_VAR_rook_ceph_pool_node_type | Node type for Rook-Ceph node pool (when configured) | Depends on profile | VM Instance Type |
TF_VAR_letsencrypt_account | If you intend to use cert-manager to issue certificates, provide a valid email address here to receive notifications related to certificate expiration | root@emaildomain.com | email address |
TF_VAR_bastion_whitelist_ips | The list of IPs/Subnets in CIDR notation that are allowed VPN/SSH access to the bastion host. | N/A | IP CIDRs |
TF_VAR_insights_whitelist_ips | The list of IPs/Subnets in CIDR notation that are allowed HTTP/HTTPS access to the VPC | N/A | IP CIDRs |
TF_VAR_letsencrypt_enable_http_validation | Enables issuing of Let's Encrypt certificates using cert-manager HTTP validation. This is disabled by default to allow only pre-existing certificates. | false | true / false |
TF_VAR_rook_ceph_storage_size | Size of usable data provided by rook-ceph. | 100Gi | XXXGi |
TF_VAR_enable_cert_manager | Deploy Cert Manager | true | true / false |
TF_VAR_enable_ingress_nginx | Deploy Ingress NGINX | true | true / false |
TF_VAR_enable_filestore_csi_driver | Deploy Filestore CSI Driver | true | true / false |
TF_VAR_enable_sharedfiles_storage_class | Create storage class for shared files | true | true / false |
TF_VAR_rook_ceph_mds_resources_memory_limit | The default resource limit is 8Gi. You can override this to change the resource limit of the metadataServer of rook-ceph. NOTE: The MDS Cache uses 50%, so with the default setting, the MDS Cache is set to 4Gi. | 8Gi | XXGi |
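For example, to enable metrics forwarding and increase the rook-ceph storage size, you could set the corresponding variables in kxi-terraform.env before running deploy-cluster.sh; a minimal sketch, assuming the file uses simple KEY=value assignments (check the generated file for its exact format):
TF_VAR_enable_metrics="true"
TF_VAR_rook_ceph_storage_size="200Gi"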