Keycloak and PostgreSQL Configuration
This page explains how Keycloak and PostgreSQL are deployed and configured within kdb Insights Enterprise.
Keycloak is an open-source identity and access management platform used by kdb Insights Enterprise to provide features such as:
- User authentication and authorization
- Service account authentication and authorization
- Role management
- Single sign-on (SSO) and identity brokering
Keycloak is deployed using the Codecentric Helm chart, which uses images from Codecentric’s public image catalog.
In kdb Insights Enterprise, Keycloak is backed by PostgreSQL. To support high-availability (HA) deployments and improve configurability, PostgreSQL is managed by CloudNativePG (CNPG), a Kubernetes operator that covers the full lifecycle of a highly available PostgreSQL database cluster with a primary/standby architecture, using native streaming replication.
Install scenarios
The install scenarios are as follows:
| Operation | Database Behavior |
|---|---|
| New Install | A new empty CNPG database cluster is created and initialized. |
| Upgrade | The data and roles from the existing PostgreSQL database are automatically migrated into the new CNPG cluster to preserve all application data. The old PostgreSQL volume is retained in case a rollback is required. |
| Rollback | When rolling back to a version prior to 1.17, the rollback reuses the existing PersistentVolumeClaim (PVC) from the previous PostgreSQL installation to restore the original database state. Changes made to the database after the upgrade are lost. |
Keycloak Configuration
Configuration of Keycloak can be managed through the values file.
Example configuration snippet:

    global:
      keycloak:
        auth:
          existingSecret: kxi-keycloak
        guiClientSecret: guiClientSecret
        operatorClientSecret: operatorClientSecret
    keycloak:
      importUsers: true
      initClient:
        clientId: test-client
        clientSecret: test-secret
        enabled: true
      initUser:
        auth: test-password
        name: test-user
        enabled: true
      replicas: 3
      resources:
        requests:
          cpu: 80m
          memory: 128Mi

CNPG Configuration
Configuration of both the CNPG database and the CNPG operator can be managed through the values file.
The following configuration snippet shows the current defaults:

    cnpg-database:
      image: ghcr.io/cloudnative-pg/postgresql:17.6-202511030807-standard-bullseye
      instances: 3
      resources:
        limits:
          cpu: 2000m
          memory: 400Mi
        requests:
          cpu: 50m
          memory: 100Mi
      storage: 8Gi
      max-slot-wal-keep-size-mb: 1024
    cnpg-operator:
      private-registry:
        enabled: false
        host: registry-local.aws-red.kxi-dev.kx.com
        pull-secret: kxi-registry-pull-secret
      version: 0.25.0

Configuration changes
You can adjust the above fields based on your environment and deployment requirements.
For example, you can change the number of replicas by changing the following:

    keycloak:
      replicas: <Value>
    cnpg-database:
      instances: <Value>

Troubleshooting
CNPG pods may become unhealthy when a former primary or a lagging replica requires WAL segments that have already been recycled. This can happen, for example, after lowering `max_slot_wal_keep_size` below the amount of WAL retained for an inactive replication slot.
PostgreSQL's `max_slot_wal_keep_size` controls how much WAL a replication slot may retain at checkpoint time. If the WAL retained for a slot exceeds this limit, the standby using that slot may no longer be able to continue replication.
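To see whether a slot is approaching or past this limit, PostgreSQL reports a `wal_status` value for each slot in its `pg_replication_slots` view. The following is a minimal sketch, not part of the product tooling: `slot_risk` is a hypothetical helper that maps each documented `wal_status` value to a short description, and the query in the comment is an assumption about how you might feed it.

```shell
#!/bin/sh
# slot_risk WAL_STATUS
# Hypothetical helper: describes a wal_status value read from
# the pg_replication_slots view.
slot_risk() {
  case "$1" in
    reserved)   echo "ok: required WAL is within max_wal_size" ;;
    extended)   echo "ok: WAL is retained beyond max_wal_size" ;;
    unreserved) echo "at-risk: required WAL may be removed at the next checkpoint" ;;
    lost)       echo "broken: required WAL has been removed" ;;
    *)          echo "unknown" ;;
  esac
}

# Possible input source (assumption, requires access to the cluster):
#   kubectl exec -n <namespace> <cnpg-pod> -- psql -U postgres -d postgres -tAc \
#     "SELECT slot_name, wal_status FROM pg_replication_slots;"
```

A slot reported as `lost` matches the failure described above: the standby that depends on it cannot resume replication from its existing data.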
Recovery depends on whether CNPG has already promoted another instance.
One way to find which instance is the current primary is to execute the following commands:
- `kubectl get pods -n <namespace> -l cnpg.io/cluster=cnpg-database -o wide` lists all CNPG pods and their IP addresses.
- `kubectl get endpointslice -n <namespace> -l kubernetes.io/service-name=cnpg-database-rw` returns the IP address of the current read/write endpoint, which identifies the primary instance.
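The two lookups above can be combined by matching the read/write endpoint IP against the pod IPs. The sketch below assumes you have reduced the pod list to `name ip` pairs; `find_primary` is a hypothetical helper name, and the `jsonpath` expressions in the comments are one possible way to produce its input, not a documented procedure.

```shell
#!/bin/sh
# find_primary PRIMARY_IP
# Reads "pod-name pod-ip" pairs on stdin and prints the name of the pod
# whose IP equals PRIMARY_IP. Prints nothing if no pod matches.
find_primary() {
  awk -v ip="$1" '$2 == ip { print $1 }'
}

# Possible usage against a live cluster (assumption, requires kubectl access):
#   primary_ip=$(kubectl get endpointslice -n <namespace> \
#     -l kubernetes.io/service-name=cnpg-database-rw \
#     -o jsonpath='{.items[0].endpoints[0].addresses[0]}')
#   kubectl get pods -n <namespace> -l cnpg.io/cluster=cnpg-database \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.podIP}{"\n"}{end}' \
#     | find_primary "$primary_ip"
```

If the helper prints nothing, no pod currently backs the read/write endpoint, which suggests that no primary is available.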
Based on these details, different actions are required:
- If another instance has been promoted, treat the broken pod as a former primary. Because it failed due to missing WAL segments, its PVC cannot be reused. Delete the PVC for the former primary and let CNPG recreate the instance as a fresh replica. After the new CNPG replica is added, you may also need to delete the old failed pod.
- If no new primary exists, do not delete the old primary's PVC immediately, because it may hold the only up-to-date copy of the data. First determine whether any standby is promotable. If a standby can be promoted, promote or fail over to it using CNPG-supported operations, then rebuild the old primary by deleting its PVC.

    Steps include:

    - Identify all the pods:

            kubectl get pods -n <namespace> -l cnpg.io/cluster=cnpg-database -o wide

    - Check which instances can answer SQL:

            for pod in <cnpg-pod-1> <cnpg-pod-2> <cnpg-pod-3-etc>; do
              echo "=== $pod ==="
              kubectl exec -n <namespace> "$pod" -- \
                psql -U postgres -d postgres -tAc \
                "SELECT pg_is_in_recovery(), pg_last_wal_replay_lsn();" 2>/dev/null || echo "cannot connect"
            done

        In the output, `f | ...` means primary and `t | ...` means replica.

    - Request promotion by patching the CNPG cluster status:

            kubectl patch cluster cnpg-database -n <namespace> --type merge --subresource=status -p '{"status":{"targetPrimary":"cnpg-database-2"}}'

        Replace `cnpg-database-2` and `<namespace>` with the name of the healthy pod to promote and its namespace.

    - Verify the new primary (the query should return `f`):

            kubectl exec -n <namespace> cnpg-database-2 -- \
              psql -U postgres -d postgres -tAc "SELECT pg_is_in_recovery();"

        - Only delete the old primary PVC after confirming another instance is the current writable primary.

- If no standby is promotable, recover from backup.
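The SQL check in the steps above prints one line per pod, which has to be read as primary, replica, or unreachable. As a minimal sketch of that interpretation, the hypothetical helper `classify_role` below takes one output line and applies the same rule: a leading `f` means primary, a leading `t` means replica, and anything else (including the `cannot connect` fallback) is treated as unknown.

```shell
#!/bin/sh
# classify_role LINE
# Hypothetical helper: interprets one line of output from
#   SELECT pg_is_in_recovery(), pg_last_wal_replay_lsn();
# as produced by psql with the -tA options.
classify_role() {
  case "$1" in
    f*) echo "primary" ;;   # pg_is_in_recovery() is false
    t*) echo "replica" ;;   # pg_is_in_recovery() is true
    *)  echo "unknown" ;;   # empty output or "cannot connect"
  esac
}

classify_role "t|0/3000060"
```

Only an instance classified as `primary` should be relied on as the writable copy before any PVC is deleted.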
For more details, refer to the CNPG troubleshooting guide.