Configure a Kubernetes cluster on AWS
Objectives
The goal of this tutorial is to set up and configure a Kubernetes cluster on AWS so that users can install kdb Insights Enterprise.
Terraform artifacts
If you have a full commercial license, KX Insights Terraform modules are delivered as a TGZ available via Nexus.
You will need to download the artifact and extract it.
Prerequisites
For this tutorial you will need:
An AWS account.
An AWS user with access keys.
Sufficient quotas to deploy the cluster.
A client machine with AWS CLI.
A client machine with Docker.
Note
On Linux, additional steps are required to manage Docker as a non-root user.
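The non-root setup is the standard post-install procedure from Docker's documentation; a sketch is shown below. The group-modification commands require sudo and a fresh login before they take effect, so they are left commented; the last lines only verify your current membership.

```shell
# Standard Docker post-install steps (require sudo and a re-login):
# sudo groupadd docker
# sudo usermod -aG docker "$USER"

# Verify whether the docker group is active for the current session.
if id -nG | grep -qw docker; then
  DOCKER_GROUP_STATUS="active"
else
  DOCKER_GROUP_STATUS="not active yet"
fi
echo "docker group: ${DOCKER_GROUP_STATUS}"
```

If the group is not active yet, log out and back in (or run `newgrp docker`) after applying the commented commands.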
Environment Setup
To extract the artifact, execute the following:
tar xzvf kxi-terraform-*.tgz
The above command will create the kxi-terraform directory. The commands below are executed within this directory and thus use relative paths.
To change to this directory execute the following:
cd kxi-terraform
The deployment process is performed within a Docker container which includes all tools needed by the provided scripts. A Dockerfile is provided in the config directory that can be used to build the Docker image. The image should be named kxi-terraform and can be built with the command below:
docker build -t kxi-terraform:latest ./config
User Setup
The Terraform scripts require a user with appropriate permissions which are defined in the config/kxi-aws-tf-policy.json file. The user should already exist.
Note
The below commands should be run by a user with admin privileges.
Create policy:
aws iam create-policy --policy-name "${POLICY_NAME}" --policy-document file://config/kxi-aws-tf-policy.json
Note
The policy only needs to be created once and then it can be reused.
where:
- POLICY_NAME is your desired policy name
Assign policy to user:
aws iam attach-user-policy --policy-arn "${USER_POLICY_ARN}" --user-name "${USER}"
where:
- USER_POLICY_ARN is the ARN of the policy created in the previous step
- USER is the username of an existing user
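The two IAM commands above can be chained: `aws iam create-policy` returns the new policy's ARN, which is what the attach step needs. The sketch below derives the ARN from the standard IAM ARN layout using hypothetical example values; in practice you would capture it from the create-policy output, or look up your account ID with `aws sts get-caller-identity --query Account --output text`.

```shell
# Hypothetical example values; substitute your own.
ACCOUNT_ID="123456789012"           # your AWS account ID
POLICY_NAME="kxi-terraform-policy"  # your chosen policy name
USER="kxi-deployer"                 # an existing IAM user

# IAM customer-managed policy ARNs follow this layout:
USER_POLICY_ARN="arn:aws:iam::${ACCOUNT_ID}:policy/${POLICY_NAME}"
echo "${USER_POLICY_ARN}"

# With credentials configured, the two steps from above would be:
# aws iam create-policy --policy-name "${POLICY_NAME}" \
#   --policy-document file://config/kxi-aws-tf-policy.json
# aws iam attach-user-policy --policy-arn "${USER_POLICY_ARN}" \
#   --user-name "${USER}"
```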
Configuration
The Terraform scripts are driven by environment variables which configure how the Kubernetes cluster will be deployed. These variables are stored in the kxi-terraform.env file in the base directory.
Copy environment file to base directory
Linux:
cp config/kxi-terraform-aws.env kxi-terraform.env
Windows:
copy config\kxi-terraform-aws.env kxi-terraform.env
Update the kxi-terraform.env file and populate the following variables:
- AWS_ACCESS_KEY_ID : Access Key ID of the user
- AWS_SECRET_ACCESS_KEY : Secret Access Key of the user
- AWS_DEFAULT_REGION : Region to deploy the cluster. Make sure you update this to your desired region.
- TF_VAR_region : Region to deploy the cluster. Must have the same value as AWS_DEFAULT_REGION.
- ENV : Unique identifier for all resources. Change it if you want to repeat the process and create an additional cluster. The variable can only contain lowercase letters and numbers.
- TF_VAR_enable_metrics : Enables forwarding of container metrics to AWS CloudWatch. Disabled by default; set to true to enable.
- TF_VAR_enable_logging : Enables forwarding of container logs to AWS CloudWatch. Disabled by default; set to true to enable.
- TF_VAR_letsencrypt_account : Email account for Let's Encrypt registration and notifications. If you intend to use cert-manager to issue certificates, provide a valid email address to receive notifications about certificate expiration.
- TF_VAR_bastion_whitelist_ips : List of IPs/subnets in CIDR notation that are allowed VPN/SSH access to the bastion host.
- TF_VAR_insights_whitelist_ips : List of IPs/subnets in CIDR notation that are allowed HTTP/HTTPS access to the VPC.
- TF_VAR_letsencrypt_enable_http_validation : Enables issuing Let's Encrypt certificates using cert-manager HTTP validation. Disabled by default so that only pre-existing certificates are used.
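Two of the constraints above are easy to check before deploying: ENV may contain only lowercase letters and numbers, and TF_VAR_region must match AWS_DEFAULT_REGION. The sketch below validates a throwaway file with hypothetical values (demo.env stands in for kxi-terraform.env, assuming shell-style export assignments):

```shell
# Write a sample environment file with hypothetical values.
cat > demo.env <<'EOF'
export AWS_ACCESS_KEY_ID=AKIAEXAMPLE
export AWS_SECRET_ACCESS_KEY=examplesecret
export AWS_DEFAULT_REGION=eu-west-1
export TF_VAR_region=eu-west-1
export ENV=demo42
EOF

. ./demo.env

# ENV may contain only lowercase letters and numbers.
echo "${ENV}" | grep -Eq '^[a-z0-9]+$' && echo "ENV format ok"

# TF_VAR_region must match AWS_DEFAULT_REGION.
[ "${TF_VAR_region}" = "${AWS_DEFAULT_REGION}" ] && echo "region consistent"
```

Running the same two checks against your real kxi-terraform.env before deployment can save a failed Terraform run.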
Kubernetes Nodes Setup
Depending on the requirements, the following three options are available:
Default node pool and Dedicated node pool for rook-ceph - Local SSDs for rook-ceph (default option)
- TF_VAR_default_node_type : Instance type for Kubernetes nodes. The default value is sufficient in most cases.
- TF_VAR_enable_rook_ceph_node_pool : Enables the dedicated node pool for rook-ceph.
- TF_VAR_rook_ceph_pool_node_type : Instance type for the dedicated rook-ceph node pool. The node type should support Local SSDs. The default value is sufficient in most cases.
Default node pool - Local SSDs for rook-ceph
- TF_VAR_default_node_type : Instance type for Kubernetes nodes. The node type should support Local SSDs. The default value is sufficient in most cases.
Autoscaling Consideration
The default node type uses local SSDs to provide the best possible performance. This allows the cluster to scale up, but blocks scale-down operations when utilisation is low, since the cluster autoscaler cannot remove nodes running pods that use local storage. Therefore, additional costs may be incurred.
Default node pool - Cloud Storage for rook-ceph
- TF_VAR_default_node_type : Instance type for Kubernetes nodes. The default value is sufficient in most cases.
- TF_VAR_rook_ceph_storage_type : Storage Class available in Kubernetes. This variable should not be changed in most cases.
- TF_VAR_rook_ceph_storage_size : Size of usable data provided by rook-ceph.
Note
The last two variables should be set only when using a node type that supports Cloud Storage.
Deployment
To deploy the cluster and apply configuration, execute the following:
Linux:
./scripts/deploy-cluster.sh
Windows:
.\scripts\deploy-cluster.bat
Note
A pre-deployment check will be performed before proceeding further. If the check fails, the script will exit immediately to avoid deployment failures. You should resolve all issues before executing the command again.
This script executes a series of Terraform and custom commands and may take some time to run. If it fails at any point due to network issues or timeouts, you can execute it again until it completes without errors. If the error is related to the Cloud Provider account (e.g. limits), resolve it first before executing the command again.
If any variable in the configuration file needs to be changed, the cluster should be destroyed first and then re-deployed.
For easier searching and filtering, the created resources are named/tagged using the aws-${ENV} prefix. For example, if ENV is set to demo, all resource names/tags include the aws-demo prefix. An exception is the EKS Node Group EC2 instances, as they use the Node Group name ("default"). If you have deployed multiple clusters, you can use the Cluster tag on the EC2 Instances Dashboard to distinguish them.
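The same filtering can be done from the AWS CLI. The sketch below builds a tag filter from ENV, assuming the Cluster tag carries the aws-${ENV} value described above; the actual query is commented out since it needs AWS credentials.

```shell
# Build an EC2 tag filter for the cluster's resources.
ENV=demo   # hypothetical value; use your own ENV
FILTER="Name=tag:Cluster,Values=aws-${ENV}"
echo "${FILTER}"   # Name=tag:Cluster,Values=aws-demo

# With credentials configured, this would list the matching instance IDs:
# aws ec2 describe-instances --filters "${FILTER}" \
#   --query 'Reservations[].Instances[].InstanceId' --output text
```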
Cluster Access
To access the cluster, execute the following:
Linux:
./scripts/manage-cluster.sh
Windows:
.\scripts\manage-cluster.bat
The above command will start a shell session on a Docker container, generate a kubeconfig entry and connect to the VPN. Once the command completes, you will be able to manage the cluster via helm/kubectl.
Note
The kxi-terraform directory on the host is mounted in the container at /terraform. Files and directories created under /terraform persist even after the container is stopped.
Note
If other users require access to the cluster, they need to download and extract the artifact, build the Docker container, and copy both the kxi-terraform.env file and the terraform/aws/client.ovpn file (generated during deployment) to the same paths in their own extracted artifact directory. Once these two files are in place, the above script can be used to access the cluster.
Below you can find kubectl commands to retrieve information about the installed components.
List Kubernetes Worker Nodes
kubectl get nodes
List Kubernetes namespaces
kubectl get namespaces
List cert-manager pods running on cert-manager namespace
kubectl get pods --namespace=cert-manager
List nginx ingress controller pod running on ingress-nginx namespace
kubectl get pods --namespace=ingress-nginx
List rook-ceph pods running on rook-ceph namespace
kubectl get pods --namespace=rook-ceph
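The per-namespace pod listings above can be combined into one loop. This is a convenience sketch; the kubectl line is commented out since it needs an active session from the cluster access script.

```shell
# Loop over the platform namespaces checked above.
for ns in cert-manager ingress-nginx rook-ceph; do
  echo "namespace: ${ns}"
  # kubectl get pods --namespace="${ns}"
done
```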
Environment Destroy
Before you destroy the environment, make sure you don't have any active shell sessions on the Docker container. You can close the session by executing the following:
exit
To destroy the cluster, execute the following:
Linux:
./scripts/destroy-cluster.sh
Windows:
.\scripts\destroy-cluster.bat
If the command fails at any point due to network issues or timeouts, you can execute it again until it completes without errors.
Note
In some cases, the command may fail because the VPN is unavailable or AWS resources were not cleaned up properly. To resolve this, delete the terraform/aws/client.ovpn file and execute the command again.
Note
Even after the cluster is destroyed, the disks created dynamically by the application may still be present and incur additional costs. To filter these disks on the EBS dashboard, add a filter on the kubernetes.io/cluster/aws-${ENV} tag.
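The leftover-disk check can also be scripted. The sketch below builds the tag key from ENV; the actual volume query is commented out since it needs AWS credentials.

```shell
# Tag key used by Kubernetes-provisioned EBS volumes for this cluster.
ENV=demo   # hypothetical value; use your own ENV
TAG_KEY="kubernetes.io/cluster/aws-${ENV}"
echo "${TAG_KEY}"   # kubernetes.io/cluster/aws-demo

# With credentials configured, this would list any leftover volume IDs:
# aws ec2 describe-volumes --filters "Name=tag-key,Values=${TAG_KEY}" \
#   --query 'Volumes[].VolumeId' --output text
```

Any volumes returned can then be reviewed and deleted to avoid further charges.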