Configure a Kubernetes cluster on AWS
Overview of Deployment Scripts
This page describes the Terraform-based deployment scripts used to configure and provision a Kubernetes (EKS) cluster on AWS for installing kdb Insights Enterprise. These scripts automate the setup of essential infrastructure components, including the VPC (Virtual Private Cloud), bastion host, security groups, node groups, and associated services. They support both new and existing VPC deployments and offer flexibility through environment variable configuration and architectural profiles. The scripts are bundled in the kxi-terraform package and are designed to run in a pre-configured Docker container for consistency and ease of use across environments.
Objectives
The goal of this tutorial is to set up and configure a Kubernetes cluster on AWS so that you can install kdb Insights Enterprise.
Terraform artifacts
If you have a full commercial license, kdb Insights Enterprise provides default Terraform modules which are delivered as a TGZ. These modules are available through the KX Downloads Portal.
You need to download the artifact and extract it.
Prerequisites
For this tutorial you need:
- An AWS account.
- An AWS user with access keys.
- Sufficient quotas to deploy the cluster (an example quota check is shown after this list).
- A client machine with AWS CLI.
- A client machine with Docker.
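For example, you can check a relevant quota with the AWS CLI. This is a minimal sketch, assuming the On-Demand Standard instance vCPU quota (code L-1216C47A) is the one you need and us-east-1 is the target region:
```script
# Check the On-Demand Standard instance vCPU quota in your target region
# (quota code L-1216C47A = "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances")
aws service-quotas get-service-quota \
  --service-code ec2 \
  --quota-code L-1216C47A \
  --region us-east-1
```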
Note
When running the scripts from a bastion host, ensure ports 1174 and 443 are open for outbound access, or enable full outbound access with a 0.0.0.0/0 security group rule.
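If you need to open outbound access selectively, a hedged sketch using the AWS CLI might look like the following; the security group ID is a placeholder, and you should adapt ports and CIDRs to your environment:
```script
# Hypothetical example: allow outbound HTTPS (443) from the bastion's security group
# (sg-0123456789abcdef0 is a placeholder; repeat with FromPort/ToPort 1174 if required)
aws ec2 authorize-security-group-egress \
  --group-id sg-0123456789abcdef0 \
  --ip-permissions 'IpProtocol=tcp,FromPort=443,ToPort=443,IpRanges=[{CidrIp=0.0.0.0/0}]'
```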
Note
On Linux, additional steps are required to manage Docker as a non-root user.
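The usual post-install steps, as documented by Docker, are along these lines:
```script
# Typical post-install steps to run Docker as a non-root user on Linux
sudo groupadd docker              # create the docker group if it does not already exist
sudo usermod -aG docker $USER     # add your user to the docker group
newgrp docker                     # pick up the new group membership in the current shell
```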
Prerequisites for existing VPC
A VPC (Virtual Private Cloud) with the following:
- Minimum of 2 public subnets with outbound access allowed.
- Minimum of 2 private subnets.
- The public subnet Network ACL must allow HTTP (80) and HTTPS (443) from the CIDRs that need access to Insights.
- A bastion host to be used to deploy the Terraform code and Insights.
Note
These scripts also support deployment to an existing VPC (Virtual Private Cloud). If you already have a VPC, you must have access to the associated account to retrieve the necessary VPC details. Additionally, ensure that your environment meets the prerequisites outlined in the following section before proceeding with deployment to an existing VPC.
Environment Setup
- To extract the artifact, execute the following:
tar xzvf kxi-terraform-*.tgz
This creates the kxi-terraform directory. The commands below are executed within this directory and thus use relative paths.
- To change to this directory, execute the following:
cd kxi-terraform
The deployment process is performed within a Docker container which includes all tools needed by the provided scripts. A Dockerfile is provided in the config directory that can be used to build the Docker image. The image must be named kxi-terraform and can be built using the below command:
```script
docker build -t kxi-terraform:latest ./config
```
User Setup
The following Terraform scripts require an existing user with the appropriate permissions, which are defined in the config/kxi-aws-tf-policy.json file.
- Create the policy:
aws iam create-policy --policy-name "${POLICY_NAME}" --policy-document file://config/kxi-aws-tf-policy.json
where POLICY_NAME is your desired policy name.
Note
The policy only needs to be created once; it can then be reused.
- Assign the policy to the user:
aws iam attach-user-policy --policy-arn "${USER_POLICY_ARN}" --user-name "${USER}"
where:
- USER_POLICY_ARN is the ARN of the policy created in the previous step (one way to look it up is sketched after this list)
- USER is the username of an existing user
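If you did not note the ARN when creating the policy, one way to look it up with the AWS CLI is sketched below, assuming the policy name is stored in POLICY_NAME:
```script
# Look up the ARN of the policy created above and store it for the attach step
USER_POLICY_ARN=$(aws iam list-policies --scope Local \
  --query "Policies[?PolicyName=='${POLICY_NAME}'].Arn" --output text)
echo "${USER_POLICY_ARN}"
```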
Configuration
The Terraform scripts are driven by environment variables, which configure how the Kubernetes cluster is deployed. These variables are populated by running the configure.sh script as follows:
```script
./scripts/configure.sh
```
- Select AWS and enter your credentials:
Select Cloud Provider Choose: > AWS Azure GCP
Set AWS Access Key ID > ••••••••••••••••••••
Set AWS Secret Access Key > ••••••••••••••••••••••••••••••••••••••••
- Select the Region to deploy into:
Select Region Choose: > af-south-1 ap-east-1 ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-5 ap-southeast-7 ca-central-1 ca-west-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 il-central-1 me-central-1 me-south-1 mx-central-1 sa-east-1 us-east-1 us-east-2 us-west-1 us-west-2 cn-north-1 cn-northwest-1
- Select the Architecture Profile:
Select Architecture Profile Choose: > HA Performance Cost-Optimised
- Select whether you are deploying to an existing VPC or wish to create one:
Are you using an existing VPC or wish to create one? Choose: > New VPC Existing VPC
If you choose Existing VPC, you are asked the following questions; if you select New VPC, skip ahead to the next step.
Please enter the vpc id of the existing vpc: > vpc-0490ed4841d8f58cf
Please enter private subnet IDs (comma-separated, no quotes): > subnet-07884998863f2d554, subnet-0e7903b8757f1025
Please enter the Network ACL that is allocated to the public subnets in the existing VPC: > acl-0c0a778b0b58d53f5
Please enter the security group ID which is attached to the bastion host you are deploying from: > sg-077098aea8747407e
- If you are using either the Performance or HA profile, you must enter the storage type to use for rook-ceph.
Performance uses rook-ceph storage type of gp3 by default. Press **Enter** to use this or select another storage type: Choose: > gp3 io2
- If you are using Cost-Optimised, the following is displayed:
Cost-Optimised uses rook-ceph storage type of gp3. If you wish to change this please refer to the docs.
- Enter how much capacity you require for rook-ceph; pressing Enter uses the default of 100Gi.
Set how much capacity you require for rook-ceph, press Enter to use the default of 100Gi. Please note this will be the usable storage with replication. > Enter rook-ceph disk space (default: 100)
- Enter the environment name, which acts as an identifier for all resources.
Set environment name (Up to 8 characters, can only contain lowercase letters and numbers) > insights
Note
When you are deploying to an existing VPC, the following step is not required.
- Enter IPs/Subnets in CIDR notation to allow access to the Bastion Host and VPN.
Set Network CIDR that will be allowed VPN access as well as SSH access to the bastion host. For convenience, this is pre-populated with your public IP address (using command: curl -s ipinfo.io/ip). To specify multiple CIDRs, use a comma-separated list (for example, 192.1.1.1/32,192.1.1.2/32). Do not include quotation marks around the input. For unrestricted access, set to 0.0.0.0/0. Ensure your network team allows such access. > 0.0.0.0/0
- Enter IPs/Subnets in CIDR notation to allow HTTP/HTTPS access to the cluster's ingress.
Set Network CIDR that will be allowed HTTPS access. For convenience, this is pre-populated with your public IP address (using command: curl -s ipinfo.io/ip). To specify multiple CIDRs, use a comma-separated list (for example, 192.1.1.1/32,192.1.1.2/32). Do not include quotation marks around the input. For unrestricted access, set to 0.0.0.0/0. Ensure your network team allows such access. > 0.0.0.0/0
- Choose the SSL certificate configuration:
Choose method for managing SSL certificates
Existing Certificates: Requires the SSL certificate to be stored in a Kubernetes Secret in the same namespace where Insights is deployed.
Cert-Manager HTTP Validation: Issues Let's Encrypt certificates; fully automated but requires unrestricted HTTP access to the cluster.
Choose: > Existing Certificates Cert-Manager HTTP Validation
Custom Tags
The config/default_tags.json file includes the tags that are applied to all resources. You can add your own tags in this file to customize your environment.
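As an illustration, assuming the file is a flat JSON object of tag key/value pairs, you could append tags with jq; the tag names below are hypothetical:
```script
# Hypothetical example: append custom tags with jq
# (assumes config/default_tags.json is a flat JSON object of tag key/value pairs)
jq '. + {"Owner": "platform-team", "CostCentre": "1234"}' config/default_tags.json > tags.tmp \
  && mv tags.tmp config/default_tags.json
```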
Deployment
To deploy the cluster and apply configuration, execute the following:
./scripts/deploy-cluster.sh
.\scripts\deploy-cluster.bat
Note
A pre-deployment check is performed before proceeding further. If the check fails, the script exits immediately to avoid deployment failures. You must resolve all issues before executing the command again.
This script executes a series of Terraform and custom commands and may take some time to run. If the command fails at any point due to network issues or timeouts, you can execute it again until it completes without errors. If the error is related to the Cloud Provider account, for example account limits, you must resolve them first before executing the command again.
If any variable in the configuration file needs to be changed, the cluster must be destroyed first and then re-deployed.
For easier searching and filtering, the created resources are named/tagged using the aws-${ENV} prefix. For example, if ENV is set to demo, all resource names/tags include the aws-demo prefix. An exception is the EKS Node Group EC2 instances, as they use the Node Group name ("default"). If you have deployed multiple clusters, you can use the Cluster tag on the EC2 Instances dashboard to distinguish between them.
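As an illustration, assuming ENV=demo and that the Cluster tag value follows the same aws-demo naming, you could list the cluster's instances from the CLI:
```script
# List the EC2 instances belonging to this cluster (tag value assumed to be aws-demo)
aws ec2 describe-instances \
  --filters "Name=tag:Cluster,Values=aws-demo" \
  --query "Reservations[].Instances[].InstanceId" \
  --output text
```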
Cluster Access
To access the cluster, execute the following:
./scripts/manage-cluster.sh
.\scripts\manage-cluster.bat
This command starts a shell session on a Docker container, generates a kubeconfig entry and connects to the VPN. Once the command completes, you can manage the cluster via helm/kubectl.
Note
The kxi-terraform directory on the host is mounted on the container at /terraform. Files and directories created under the /terraform directory while using this container persist even after the container is stopped.
Note
If other users require access to the cluster, they need to download and extract the artifact, build the Docker image and copy the kxi-terraform.env file as well as the terraform/aws/client.ovpn file (generated during deployment) to the same paths in their own extracted artifact directory. Once these two files are copied, the above script can be used to access the cluster.
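For example, the copy might look like this; the destination path is a placeholder for the other user's extracted artifact directory:
```script
# Copy the two required files into another user's extracted artifact directory
cp kxi-terraform.env /path/to/their/kxi-terraform/kxi-terraform.env
cp terraform/aws/client.ovpn /path/to/their/kxi-terraform/terraform/aws/client.ovpn
```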
The following kubectl commands can be used to retrieve information about the installed components.
- List Kubernetes Worker Nodes:
kubectl get nodes
- List Kubernetes namespaces:
kubectl get namespaces
- List cert-manager pods running in the cert-manager namespace:
kubectl get pods --namespace=cert-manager
- List the NGINX ingress controller pod running in the ingress-nginx namespace:
kubectl get pods --namespace=ingress-nginx
- List rook-ceph pods running in the rook-ceph namespace:
kubectl get pods --namespace=rook-ceph
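As an optional additional check, assuming the standard rook-ceph CephCluster custom resource is installed, you can confirm the Ceph cluster reports a healthy state:
```script
# Confirm the Ceph cluster health/phase reported by the rook-ceph operator
kubectl get cephcluster --namespace=rook-ceph
```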
Environment Destroy
Before you destroy the environment, make sure you don't have any active shell sessions on the Docker container. You can close the session by executing the following:
```script
exit
```
To destroy the cluster, execute the following:
./scripts/destroy-cluster.sh
.\scripts\destroy-cluster.bat
If the command fails at any point due to network issues or timeouts, you can execute it again until it completes without errors.
Note
In some cases, the command may fail due to the VPN being unavailable or AWS resources not being cleaned up properly. To resolve this, delete the terraform/aws/client.ovpn file and execute the command again.
Note
Even after the cluster is destroyed, the disks created dynamically by the application may still be present and incur additional costs. To filter these disks on the EBS dashboard, add the kubernetes.io/cluster/aws-${ENV} tag as a filter.
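As an illustration, assuming ENV=demo, you can also list any leftover volumes from the CLI:
```script
# List EBS volumes created dynamically by the cluster (assuming ENV=demo)
aws ec2 describe-volumes \
  --filters "Name=tag-key,Values=kubernetes.io/cluster/aws-demo" \
  --query "Volumes[].{ID:VolumeId,State:State,SizeGiB:Size}" \
  --output table
```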
Advanced Configuration
You can further configure your cluster by editing the newly generated kxi-terraform.env file in the current directory. These edits must be made prior to running the deploy-cluster.sh script. The variables that can be edited are listed below, and an example override is shown after the table:
Environment Variable | Details | Default Value | Possible Values |
---|---|---|---|
TF_VAR_enable_metrics | Enables forwarding of container metrics to AWS CloudWatch | false | true / false |
TF_VAR_enable_logging | Enables forwarding of container logs to AWS CloudWatch | false | true / false |
TF_VAR_default_node_type | Node type for default node pool | Depends on profile | EC2 Instance Type |
TF_VAR_letsencrypt_account | If you intend to use cert-manager to issue certificates, then you need to provide a valid email address if you wish to receive notifications related to certificate expiration | root@emaildomain.com | email address |
TF_VAR_bastion_whitelist_ips | The list of IPs/Subnets in CIDR notation that are allowed VPN/SSH access to the bastion host. | N/A | IP CIDRs |
TF_VAR_insights_whitelist_ips | The list of IPs/Subnets in CIDR notation that are allowed HTTP/HTTPS access to the VPC | N/A | IP CIDRs |
TF_VAR_letsencrypt_enable_http_validation | Enables issuing of Let's Encrypt certificates using cert-manager HTTP validation. This is disabled by default to allow only pre-existing certificates. | false | true / false |
TF_VAR_rook_ceph_storage_size | Size of usable data provided by rook-ceph. | 100Gi | XXXGi |
TF_VAR_enable_cert_manager | Deploy Cert Manager | true | true / false |
TF_VAR_enable_ingress_nginx | Deploy Ingress NGINX | true | true / false |
TF_VAR_enable_cluster_autoscaler | Deploy AWS Cluster Autoscaler | true | true / false |
TF_VAR_enable_ebs_csi_driver | Deploy EBS CSI Driver | true | true / false |
TF_VAR_enable_efs_csi_driver | Deploy EFS CSI Driver | true | true / false |
TF_VAR_rook_ceph_mds_resources_memory_limit | The default resource limit is 8Gi. You can override this to change the resource limit of the metadataServer of rook-ceph. NOTE: The MDS Cache uses 50%, so with the default setting, the MDS Cache is set to 4Gi. | 8Gi | XXGi |
TF_VAR_rook_ceph_storage_type | The storage type to be used for rook-ceph. | gp3 | gp3 / io2 |
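For example, to forward container metrics and logs to CloudWatch and increase the rook-ceph storage size, you could set the following in kxi-terraform.env before deployment:
```script
# Example kxi-terraform.env overrides, set before running deploy-cluster.sh
TF_VAR_enable_metrics=true
TF_VAR_enable_logging=true
TF_VAR_rook_ceph_storage_size=200Gi
```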
Update whitelisted CIDRs
To modify the whitelisted CIDRs for HTTPS or SSH access, update the following variables in the kxi-terraform.env file:
```hcl
# List of IPs or Subnets that will be allowed VPN access as well as SSH access
# to the bastion host for troubleshooting VPN issues.
TF_VAR_bastion_whitelist_ips=["192.168.0.1/32", "192.168.0.2/32"]
# List of IPs or Subnets that will be allowed HTTPS access
TF_VAR_insights_whitelist_ips=["192.168.0.1/32", "192.168.0.2/32"]
```
Once you have updated these with the correct CIDRs, run the deploy script:
./scripts/deploy-cluster.sh
.\scripts\deploy-cluster.bat
Note
You can specify up to three CIDRs, as this is the default limit imposed by the maximum number of allowed NACL rules. To use more than three, you must request a quota increase from AWS for the relevant account.
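A hedged sketch of requesting such an increase with the AWS CLI is shown below; the quota code must be looked up first and the desired value is illustrative:
```script
# Find the "Rules per network ACL" quota code for the VPC service
aws service-quotas list-service-quotas --service-code vpc \
  --query "Quotas[?contains(QuotaName, 'network ACL')].{Name:QuotaName,Code:QuotaCode,Value:Value}" \
  --output table
# Then request an increase using the code returned above (placeholder shown here)
aws service-quotas request-service-quota-increase \
  --service-code vpc --quota-code <QUOTA_CODE_FROM_ABOVE> --desired-value 40
```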
Existing VPC notes
If you are deploying to an existing VPC, ensure that the public subnets in use do not restrict traffic over HTTP (80) and HTTPS (443) from the sources you wish to access Insights from.
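As a hypothetical illustration, an inbound HTTPS allow rule could be added to the public subnets' Network ACL as follows; the ACL ID, rule number, and CIDR are placeholders, and you would add a matching rule for HTTP (80) plus whatever outbound rules your setup requires:
```script
# Hypothetical example: allow inbound HTTPS on the public subnets' network ACL
aws ec2 create-network-acl-entry \
  --network-acl-id acl-0123456789abcdef0 \
  --ingress \
  --rule-number 110 \
  --protocol tcp \
  --port-range From=443,To=443 \
  --cidr-block 203.0.113.0/24 \
  --rule-action allow
```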