Configure a Kubernetes cluster on Azure

Objectives

The goal of this tutorial is to set up and configure a Kubernetes cluster on Azure so that users can install kdb Insights Enterprise.

Terraform artifacts

If you have a full commercial license, kdb Insights Enterprise provides default Terraform modules, delivered as a TGZ archive through the KX Downloads Portal.

You will need to download the artifact and extract it.

Prerequisites

For this tutorial you will need:

An Azure account.

An Azure Service Principal.

Sufficient quotas to deploy the cluster.

A client machine with the Azure CLI installed.

A client machine with Docker installed.
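You can check the tool prerequisites before continuing. The following is a minimal sketch that uses a hypothetical require_tools shell function, which is not part of the kxi-terraform scripts. It only checks that az and docker are on the PATH. Running docker info afterwards also shows whether Docker can be used without sudo, which is the subject of the Linux note that follows.

```shell
# Hypothetical helper (not part of kxi-terraform): report any required
# tools missing from PATH; returns non-zero if at least one is missing.
require_tools() {
  local missing tool
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example on the client machine:
#   require_tools az docker && az version && docker info
```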

Note

When running the scripts from a bastion host, ensure that outbound access is open on ports 1174 and 443, or allow all outbound traffic with a 0.0.0.0/0 security group rule.

Note

On Linux, additional steps are required to manage Docker as a non-root user.

Environment Setup

To extract the artifact, execute the following:

tar xzvf kxi-terraform-*.tgz

This command creates the kxi-terraform directory. All subsequent commands are run from within this directory and use relative paths.

To change to this directory, execute the following:

cd kxi-terraform

The deployment process is performed within a Docker container that includes all tools needed by the provided scripts.

The Docker image is built using the provided Dockerfile located in the config directory.

The image name is versioned based on the value specified in the version.txt file at the root of the kxi-terraform directory.

To build the Docker image using the correct version tag, execute one of the following scripts:

Linux:

./scripts/build-image.sh

Windows:

.\scripts\build-image.bat
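The following sketch shows how a version tag can be derived from version.txt. It reads the file and prints an image reference. The function image_ref is hypothetical, and the default image name kxi-terraform is an assumption. The name and tag format that scripts/build-image.sh actually uses are authoritative.

```shell
# Hypothetical sketch: print a versioned image reference from version.txt.
# The default image name "kxi-terraform" is an assumption; the real name is
# whatever scripts/build-image.sh uses.
image_ref() {
  local dir name version
  dir=${1:-.}
  name=${2:-kxi-terraform}
  # First line of version.txt with all whitespace (including CR) removed.
  version=$(head -n 1 "$dir/version.txt" | tr -d '[:space:]')
  if [ -z "$version" ]; then
    echo "could not read a version from $dir/version.txt" >&2
    return 1
  fi
  printf '%s:%s\n' "$name" "$version"
}

# Example, from the kxi-terraform directory after building:
#   docker image ls "$(image_ref)"
```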

Service Principal Setup

The Terraform scripts require an existing Service Principal with the permissions defined in the config/kxi-azure-tf-policy.json file.

Note

The following commands should be run by a user with admin privileges.

Update config/kxi-azure-tf-policy.json, replacing the following placeholders:

<role-name> with your desired role name
<subscription-id> with your Azure Subscription ID
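You can script the placeholder substitution with sed. The following is a sketch that uses a hypothetical fill_policy function. It prints the updated policy to stdout so you can review it before you overwrite the file. It assumes the role name contains none of the characters |, & or \, because sed treats them specially.

```shell
# Hypothetical helper: replace the <role-name> and <subscription-id>
# placeholders and print the result to stdout. Assumes neither value
# contains '|', '&' or '\', which sed would treat specially.
fill_policy() {
  local template role_name subscription_id
  template=$1
  role_name=$2
  subscription_id=$3
  sed -e "s|<role-name>|${role_name}|g" \
      -e "s|<subscription-id>|${subscription_id}|g" \
      "$template"
}

# Example ("kxi-terraform-role" is an illustrative name):
#   fill_policy config/kxi-azure-tf-policy.json kxi-terraform-role "$SUBSCRIPTION_ID" \
#     > policy.tmp && mv policy.tmp config/kxi-azure-tf-policy.json
```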

Create the role:

az role definition create --role-definition config/kxi-azure-tf-policy.json

Note

The role only needs to be created once and can then be reused.

Assign the role to the Service Principal:

az role assignment create --assignee "${CLIENT_ID}" --role "${ROLE_NAME}" --subscription "${SUBSCRIPTION_ID}"

where:

CLIENT_ID is the Application (client) ID of an existing Service Principal
ROLE_NAME is the role name created in the previous step
SUBSCRIPTION_ID is the Azure Subscription ID
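A mistyped ID only surfaces as an error from the Azure CLI, so it can help to validate the variables first. The following sketch uses hypothetical is_guid and check_ids functions. It assumes CLIENT_ID and SUBSCRIPTION_ID are already set in your shell, and that both are GUIDs in the usual 8-4-4-4-12 hexadecimal form.

```shell
# Hypothetical check: a GUID is 36 characters, 8-4-4-4-12 hex digits.
is_guid() {
  [ "${#1}" -eq 36 ] || return 1
  printf '%s\n' "$1" |
    LC_ALL=C grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Hypothetical pre-flight check on the variables used by az role assignment.
check_ids() {
  is_guid "${CLIENT_ID:-}" ||
    { echo "CLIENT_ID is not a valid GUID" >&2; return 1; }
  is_guid "${SUBSCRIPTION_ID:-}" ||
    { echo "SUBSCRIPTION_ID is not a valid GUID" >&2; return 1; }
}

# Example:
#   check_ids && az role assignment create --assignee "${CLIENT_ID}" \
#     --role "${ROLE_NAME}" --subscription "${SUBSCRIPTION_ID}"
```

After the assignment, az role assignment list --assignee "${CLIENT_ID}" --role "${ROLE_NAME}" --output table lists the assignment so you can confirm it exists.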

Configuration

The Terraform scripts are driven by environment variables that configure how the Kubernetes cluster is deployed. Populate these variables by running the configure.sh script:

./scripts/configure.sh

Select Azure and enter your credentials:

Select Cloud Provider
Choose:
  AWS
> Azure
  GCP

Set Azure Client ID
> a7c7dd92-c0a2-48fd-8ceb-ab134fa41939

Set Azure Client Secret
> ••••••••••••••••••••••••••••••••••••••••

Set Azure Subscription ID
> 5b07c795-8e5f-4979-aa44-c9bed5b513c5

Set Azure Tenant ID
> c004d551-3955-4f08-9eca-49867395bb69

Select the Region to deploy into:

Select Region
  centralindia
  centralus
  centralusstage
  centraluseuap
  eastasia
  eastasiastage
  eastus
  eastusstage
  eastus2
  eastus2stage
  eastus2euap
  eastusstg
  europe
  france
  francecentral
  francesouth
  germany
  germanynorth
  germanywestcentral
  india
  israel
  israelcentral
  italy
  italynorth
  japan
  japaneast
  japanwest
  jioindiacentral
  jioindiawest
  korea
  koreacentral
  koreasouth
  mexicocentral
  newzealand
  newzealandnorth
  northcentralus

Select the Architecture Profile:

Select Architecture Profile
Choose:
> HA
  Performance
  Cost-Optimised

If you are using the Performance or HA profile, select the storage type to use for rook-ceph:

Performance uses rook-ceph storage type of managed by default. Press Enter to use this or select another storage type:
Choose:
> managed
  premium2-disk

If you are using Cost-Optimised the following is displayed:

Cost-Optimised uses rook-ceph storage type of managed. If you wish to change this please refer to the docs.

Enter the capacity you require for rook-ceph. Press Enter to accept the default of 100Gi.

Set how much capacity you require for rook-ceph, press Enter to use the default of 100Gi
Please note this is will be the usable storage with replication
> Enter rook-ceph disk space (default: 100)
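The prompt takes a plain number. The corresponding TF_VAR_rook_ceph_storage_size setting, listed under Advanced Configuration, uses the XXXGi form. The following sketch shows a hypothetical normalize_capacity function that maps the prompt input to that form and applies the default of 100.

```shell
# Hypothetical helper: convert the number entered at the rook-ceph prompt
# (default 100) into the XXXGi form used by TF_VAR_rook_ceph_storage_size.
normalize_capacity() {
  local input
  input=${1:-100}
  input=${input%Gi}   # accept either "250" or "250Gi"
  case $input in
    ''|0*|*[!0123456789]*)
      echo "invalid capacity: '${1:-}'" >&2
      return 1
      ;;
  esac
  printf '%sGi\n' "$input"
}
```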

Enter an environment name, which acts as an identifier for all resources.

Set environment name (Up to 8 character, can only contain lowercase letters and numbers)
> insights
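To check a candidate name against the prompt's rule before you run configure.sh, a short pattern test is enough. valid_env_name is a hypothetical helper, and the rule it encodes is taken from the prompt text above.

```shell
# Hypothetical check mirroring the prompt's rule: 1 to 8 characters,
# lowercase letters and digits only. Characters are listed explicitly so
# the test does not depend on locale-specific ranges.
valid_env_name() {
  case $1 in
    ''|*[!0123456789abcdefghijklmnopqrstuvwxyz]*) return 1 ;;
  esac
  [ "${#1}" -le 8 ]
}
```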

Enter the IPs/subnets, in CIDR notation, that are allowed access to the bastion host and VPN:

Set Network CIDR that will be allowed VPN access as well as SSH access to the bastion host
For convenience, this is pre-populated with your public IP address (using command: curl -s ipinfo.io/ip).
To specify multiple CIDRs, use a comma-separated list (for example, 192.1.1.1/32,192.1.1.2/32). Do not include quotation marks around the input.
For unrestricted access, set to 0.0.0.0/0. Ensure your network team allows such access.
> 0.0.0.0/0

Enter the IPs/subnets, in CIDR notation, that are allowed HTTP/HTTPS access to the cluster's ingress:

Set Network CIDR that will be allowed HTTPS access
For convenience, this is pre-populated with your public IP address (using command: curl -s ipinfo.io/ip).
To specify multiple CIDRs, use a comma-separated list (for example, 192.1.1.1/32,192.1.1.2/32). Do not include quotation marks around the input.
For unrestricted access, set to 0.0.0.0/0. Ensure your network team allows such access.
> 0.0.0.0/0
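Both prompts accept the same comma-separated CIDR syntax. The following sketch uses hypothetical valid_cidr and valid_cidr_list functions to check a value before you paste it in. It supports IPv4 only. It rejects quotation marks, as the prompt instructs, and, as an assumption of this sketch, it also rejects spaces.

```shell
# Hypothetical validators for the CIDR prompts (IPv4 only). A list is
# comma-separated with no spaces or quotation marks, e.g.
# 192.1.1.1/32,192.1.1.2/32
valid_cidr() {
  local cidr ip prefix octet old_ifs
  cidr=$1
  # Shape: four dot-separated numbers, a slash, then the prefix length.
  printf '%s\n' "$cidr" |
    LC_ALL=C grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$' || return 1
  ip=${cidr%/*}
  prefix=${cidr#*/}
  [ "$prefix" -le 32 ] 2>/dev/null || return 1
  # Split the address on dots and range-check each octet.
  old_ifs=$IFS
  IFS=.
  set -- $ip
  IFS=$old_ifs
  for octet in "$@"; do
    [ "$octet" -le 255 ] || return 1
  done
}

valid_cidr_list() {
  local rest item
  rest=$1
  while :; do
    item=${rest%%,*}              # first entry
    valid_cidr "$item" || return 1
    [ "$rest" = "$item" ] && return 0
    rest=${rest#*,}               # drop it and the comma
  done
}

# Example: validate the pre-populated public IP as a single-host CIDR.
#   valid_cidr_list "$(curl -s ipinfo.io/ip)/32" && echo ok
```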

SSL Certificate Configuration

Select how SSL certificates are managed:

Choose method for managing SSL certificates
----------------------------------------------
Existing Certificates: Requires the SSL certificate to be stored on a Kubernetes Secret on the same namespace where Insights is deployed.
Cert-Manager HTTP Validation: Issues Let's Encrypt Certificates; fully automated but requires unrestricted HTTP access to the cluster.
Choose:
> Existing Certificates
  Cert-Manager HTTP Validation

Custom Tags

The config/default_tags.json file includes the tags that will be applied to all resources. You can add your own tags in this file to customize your environment.

Deployment

To deploy the cluster and apply configuration, execute the following:

Linux:

./scripts/deploy-cluster.sh

Windows:

.\scripts\deploy-cluster.bat

Note

A pre-deployment check runs before deployment begins. If the check fails, the script exits immediately to avoid a failed deployment. Resolve all reported issues before running the command again.

This script runs a series of Terraform and custom commands and may take some time to complete. If the command fails because of network issues or timeouts, run it again until it completes without errors. If the error relates to the cloud provider account (for example, quota limits), resolve it before running the command again.
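You can automate re-running after transient failures with a small retry loop. The following is a sketch with a hypothetical retry function, and the attempt count and delay in the example are arbitrary. Use it only for network issues or timeouts, because an account or quota error fails on every attempt.

```shell
# Hypothetical wrapper: re-run a command until it succeeds, up to a maximum
# number of attempts, pausing between tries. Intended for transient
# network errors/timeouts only.
retry() {
  local attempts delay n
  attempts=$1
  delay=$2
  n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Example (attempt count and delay are arbitrary):
#   retry 3 60 ./scripts/deploy-cluster.sh
```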

If any variable in the configuration file needs to be changed, the cluster should be destroyed first and then re-deployed.

For easier searching and filtering, the created resources are named/tagged using the azure-${ENV} prefix. For example, if the ENV is set to demo, all resource names/tags include the azure-demo prefix.
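With the Azure CLI, you can filter resources on this prefix with a JMESPath query. The following sketch only builds the query string, using a hypothetical prefix_query helper. It assumes the prefix appears in the resource name; some resources may carry the prefix only in their tags and would not match.

```shell
# Hypothetical helper: build a JMESPath query matching resources whose
# names contain the azure-<ENV> prefix.
prefix_query() {
  printf "[?contains(name, 'azure-%s')].{name:name, type:type, resourceGroup:resourceGroup}\n" "$1"
}

# Example (requires a logged-in Azure CLI session):
#   az resource list --query "$(prefix_query demo)" --output table
```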

Cluster Access

To access the cluster, execute the following:

Linux:

./scripts/manage-cluster.sh

Windows:

.\scripts\manage-cluster.bat

This command starts a shell session in a Docker container, generates a kubeconfig entry, and connects to the VPN. Once it completes, you can manage the cluster with helm and kubectl.

Note

The kxi-terraform directory on the host is mounted in the container at /terraform. Files and directories created under /terraform persist after the container is stopped.

Note

If other users require access to the cluster, they need to download and extract the artifact and build the Docker image. They also need copies of the kxi-terraform.env file and the terraform/azure/client.ovpn file (generated during deployment), placed at the same paths in their own extracted artifact directory. Once these two files are in place, they can use the script above to access the cluster.
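The following sketch scripts the copy step with a hypothetical copy_access_files helper. It assumes both directories are extracted kxi-terraform artifacts. Treat these files as sensitive. The .ovpn file grants VPN access, and kxi-terraform.env likely contains the credentials entered during configuration.

```shell
# Hypothetical helper: copy the two files another user needs from an
# existing kxi-terraform directory into their own extracted copy, keeping
# the same relative paths.
copy_access_files() {
  local src dst f
  src=$1
  dst=$2
  for f in kxi-terraform.env terraform/azure/client.ovpn; do
    if [ ! -f "$src/$f" ]; then
      echo "missing: $src/$f" >&2
      return 1
    fi
    mkdir -p "$dst/$(dirname "$f")"
    cp "$src/$f" "$dst/$f"
  done
}

# Example:
#   copy_access_files /path/to/owner/kxi-terraform ./kxi-terraform
```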

The following kubectl commands retrieve information about the installed components.

List Kubernetes worker nodes

kubectl get nodes

List Kubernetes namespaces

kubectl get namespaces

List cert-manager pods running in the cert-manager namespace

kubectl get pods --namespace=cert-manager

List the NGINX ingress controller pods running in the ingress-nginx namespace

kubectl get pods --namespace=ingress-nginx

List rook-ceph pods running in the rook-ceph namespace

kubectl get pods --namespace=rook-ceph
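You can run the three namespace checks in one pass with a loop. show_component_pods is a hypothetical helper. Setting KUBECTL lets you substitute another command, such as echo, for a dry run.

```shell
# Hypothetical helper: list pods in each component namespace (defaults to
# cert-manager, ingress-nginx and rook-ceph). Set KUBECTL to substitute
# another command, such as echo for a dry run.
show_component_pods() {
  local kubectl ns
  kubectl=${KUBECTL:-kubectl}
  [ "$#" -gt 0 ] || set -- cert-manager ingress-nginx rook-ceph
  for ns in "$@"; do
    echo "== $ns =="
    "$kubectl" get pods --namespace="$ns" || return 1
  done
}

# To list pods outside the Running phase across all namespaces (completed
# job pods, phase Succeeded, also appear and are not necessarily a problem):
#   kubectl get pods --all-namespaces --field-selector=status.phase!=Running
```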

Environment Destroy

Before you destroy the environment, make sure there are no active shell sessions in the Docker container. To close a session, execute the following:

exit

To destroy the cluster, execute the following:

Linux:

./scripts/destroy-cluster.sh

Windows:

.\scripts\destroy-cluster.bat

If the command fails because of network issues or timeouts, run it again until it completes without errors.

Note

In some cases, the command may fail because the VPN is unavailable or Azure resources were not cleaned up properly. To resolve this, delete the terraform/azure/client.ovpn file and run the command again.

Note

Even after the cluster is destroyed, disks created dynamically by the application may still be present and incur additional costs. Review the Azure Disks in your subscription to determine whether the data is still needed.
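One way to find candidate leftover disks is to list managed disks that are not attached to any VM. The following sketch wraps the Azure CLI az disk list command in a hypothetical list_unattached_disks function. An unattached disk was not necessarily created by this cluster, so review each result before you delete anything.

```shell
# Hypothetical helper around the Azure CLI "az disk list" command: show
# managed disks that are not attached to any VM. Set AZ to substitute
# another command (for example echo) for a dry run.
list_unattached_disks() {
  local az
  az=${AZ:-az}
  "$az" disk list \
    --query "[?diskState=='Unattached'].{name:name, resourceGroup:resourceGroup, sizeGb:diskSizeGb}" \
    --output table
}

# Review each result, then delete disks you no longer need, for example:
#   az disk delete --name <disk-name> --resource-group <resource-group>
```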

Advanced Configuration

You can further configure the cluster by editing the newly generated kxi-terraform.env file in the current directory. Make these edits before running the deploy-cluster.sh script. The variables that can be edited are listed below:

Environment Variable	Details	Default Value	Possible Values
TF_VAR_enable_metrics	Enables forwarding of container metrics to Cloud-Native monitoring tools	false	true / false
TF_VAR_enable_logging	Enables forwarding of container logs to Cloud-Native monitoring tools	false	true / false
TF_VAR_default_node_type	Node type for default node pool	Depends on profile	VM Instance Type
TF_VAR_letsencrypt_account	Email address for the Let's Encrypt account used by cert-manager. Provide a valid address if you wish to receive notifications about certificate expiration.	root@emaildomain.com	email address
TF_VAR_bastion_whitelist_ips	The list of IPs/Subnets in CIDR notation that are allowed VPN/SSH access to the bastion host.	N/A	IP CIDRs
TF_VAR_insights_whitelist_ips	The list of IPs/Subnets in CIDR notation that are allowed HTTP/HTTPS access to the cluster's ingress	N/A	IP CIDRs
TF_VAR_letsencrypt_enable_http_validation	Enables issuing of Let's Encrypt certificates using cert-manager HTTP validation. This is disabled by default to allow only pre-existing certificates.	false	true / false
TF_VAR_rook_ceph_storage_size	Usable storage capacity provided by rook-ceph (after replication).	100Gi	XXXGi
TF_VAR_enable_cert_manager	Deploy Cert Manager	true	true / false
TF_VAR_enable_ingress_nginx	Deploy Ingress NGINX	true	true / false
TF_VAR_enable_sharedfiles_storage_class	Create storage class for shared files	true	true / false
TF_VAR_rook_ceph_mds_resources_memory_limit	Memory limit for the rook-ceph metadata server (MDS). Override the 8Gi default to change it. NOTE: The MDS cache is set to 50% of this limit, so with the default setting the MDS cache is 4Gi.	8Gi	XXGi
TF_VAR_rook_ceph_storage_type	The storage type to be used for rook-ceph.	managed	managed / premium2-disk
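Some values in kxi-terraform.env, such as the CIDR lists, are not valid shell syntax, so sourcing the file is not a reliable way to inspect it. The following sketch reads a single setting with grep, using a hypothetical env_value helper. It assumes each setting is on its own NAME=value line, as in the whitelist example in the next section. If a name appears more than once, this sketch reports the last occurrence.

```shell
# Hypothetical helper: print the value of one setting from kxi-terraform.env
# (or another file) without sourcing it. Reports the last occurrence.
env_value() {
  local name file
  name=$1
  file=${2:-kxi-terraform.env}
  grep -E "^${name}=" "$file" | tail -n 1 | cut -d= -f2-
}

# Example:
#   env_value TF_VAR_rook_ceph_storage_size
```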

Update whitelisted CIDRs

To modify the whitelisted CIDRs for HTTPS or SSH access, update the following variables in the kxi-terraform.env file:

# List of IPs or Subnets that will be allowed VPN access as well as SSH access
# to the bastion host for troubleshooting VPN issues.
TF_VAR_bastion_whitelist_ips=["192.168.0.1/32", "192.168.0.2/32"]

# List of IPs or Subnets that will be allowed HTTPS access
TF_VAR_insights_whitelist_ips=["192.168.0.1/32", "192.168.0.2/32"]
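If you already have the comma-separated form used by the configure.sh prompts, a hypothetical to_tf_list helper can convert it into the list syntax shown above. This is a sketch: it drops spaces and empty entries but does not validate the CIDRs themselves.

```shell
# Hypothetical helper: convert "a/32,b/32" into ["a/32", "b/32"].
to_tf_list() {
  local rest item out
  out=
  rest=$(printf '%s' "$1" | tr -d ' ')
  while [ -n "$rest" ]; do
    item=${rest%%,*}                       # first entry
    if [ -n "$item" ]; then
      out="${out:+$out, }\"$item\""        # append with ", " separator
    fi
    if [ "$rest" = "$item" ]; then rest=; else rest=${rest#*,}; fi
  done
  printf '[%s]\n' "$out"
}

# Example:
#   echo "TF_VAR_insights_whitelist_ips=$(to_tf_list 192.168.0.1/32,192.168.0.2/32)"
```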

Once you have updated these with the correct CIDRs, run the deploy script:

Linux:

./scripts/deploy-cluster.sh

Windows:

.\scripts\deploy-cluster.bat

Note

You can specify up to three CIDRs, as this is the default limit imposed by the maximum number of allowed network rules. To use more than three, you must request a quota increase from your cloud provider for the relevant subscription.
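The following sketches a quick pre-deploy check against this limit. cidr_count and check_cidr_limit are hypothetical helpers that count the entries in a list value such as ["192.168.0.1/32", "192.168.0.2/32"]. The limit of three is the default stated in the note above.

```shell
# Hypothetical helper: count entries in a list value like ["a", "b"].
cidr_count() {
  local value rest item count
  value=${1#\[}
  value=${value%\]}
  value=$(printf '%s' "$value" | tr -d '[:space:]')
  count=0
  rest=$value
  while [ -n "$rest" ]; do
    item=${rest%%,*}
    if [ -n "$item" ]; then count=$((count + 1)); fi
    if [ "$rest" = "$item" ]; then rest=; else rest=${rest#*,}; fi
  done
  echo "$count"
}

# Hypothetical check: fail if a setting lists more CIDRs than the limit.
check_cidr_limit() {
  local name value limit n
  name=$1
  value=$2
  limit=${3:-3}
  n=$(cidr_count "$value")
  if [ "$n" -gt "$limit" ]; then
    echo "$name has $n CIDRs; the default limit is $limit" >&2
    return 1
  fi
}

# Example:
#   check_cidr_limit TF_VAR_insights_whitelist_ips \
#     "$(grep -E '^TF_VAR_insights_whitelist_ips=' kxi-terraform.env | cut -d= -f2-)"
```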