Keycloak and PostgreSQL Configuration

This page explains how Keycloak and PostgreSQL are deployed and configured within kdb Insights Enterprise.

Keycloak is an open-source identity and access management platform used by kdb Insights Enterprise to provide features such as:

User authentication and authorization
Service account authentication and authorization
Role management
Single sign-on (SSO) and identity brokering

Keycloak is deployed using the Codecentric Helm chart, which uses images from Codecentric’s public image catalog.

In kdb Insights Enterprise, Keycloak is backed by PostgreSQL. To support high-availability (HA) deployments and improve configurability, CloudNativePG (CNPG) is used. CloudNativePG is a Kubernetes operator that covers the full lifecycle of a highly available PostgreSQL database cluster with a primary/standby architecture, using native streaming replication.

Install scenarios

The install scenarios are as follows:

Operation	Database Behavior
New Install	A new empty CNPG database cluster is created and initialized.
Upgrade	The data and roles from the existing PostgreSQL database are automatically migrated into the new CNPG cluster to preserve all application data. The old PostgreSQL volume is retained in case a rollback is required.
Rollback	When rolling back to a version prior to 1.17, the existing PersistentVolumeClaim (PVC) from the previous PostgreSQL installation is reused to restore the original database state. Changes made to the database by the upgraded system are lost.

Keycloak Configuration

Configuration of Keycloak can be managed through the values file.

Example configuration snippet:

global:
  keycloak:
    auth:
      existingSecret: kxi-keycloak
    guiClientSecret: guiClientSecret
    operatorClientSecret: operatorClientSecret

keycloak:
  importUsers: true
  initClient:
    clientId: test-client
    clientSecret: test-secret
    enabled: true
  initUser:
    auth: test-password
    name: test-user
    enabled: true
  replicas: 3
  resources:
    requests:
      cpu: 80m
      memory: 128Mi

CNPG Configuration

Configuration of both the CNPG database and the CNPG operator can be managed through the values file.

The following configuration snippet shows the current defaults:

cnpg-database:
  image: ghcr.io/cloudnative-pg/postgresql:17.6-202511030807-standard-bullseye
  instances: 3
  resources:
    limits:
      cpu: 2000m
      memory: 400Mi
    requests:
      cpu: 50m
      memory: 100Mi
  storage: 8Gi
  max-slot-wal-keep-size-mb: 1024
cnpg-operator:
  private-registry:
    enabled: false
    host: registry-local.aws-red.kxi-dev.kx.com
    pull-secret: kxi-registry-pull-secret
  version: 0.25.0

Configuration changes

You can adjust the above fields based on your environment and deployment requirements.

For example, to change the number of Keycloak replicas and CNPG database instances, set the following values:

keycloak:
  replicas: <Value>
cnpg-database:
  instances: <Value>
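
After changing the instance count, one way to confirm the rollout is to compare the number of ready instances reported by the CNPG Cluster resource with the value you set. The following is a minimal sketch rather than a supported tool: the function name cnpg_instances_ready is hypothetical, the cluster name cnpg-database is taken from the commands in the Troubleshooting section, and it assumes the Cluster status exposes a readyInstances field.

```shell
# Hypothetical helper: succeeds when the CNPG cluster reports the expected
# number of ready instances. Assumes the cluster is named cnpg-database.
cnpg_instances_ready() {
  namespace="$1"
  expected="$2"
  ready="$(kubectl get cluster cnpg-database -n "$namespace" \
    -o jsonpath='{.status.readyInstances}')" || ready=""
  echo "ready instances: ${ready:-0} (expected: $expected)"
  [ "${ready:-0}" -eq "$expected" ]
}

# Usage: cnpg_instances_ready <namespace> 3
```

The function returns a non-zero exit status while instances are still starting, so it can be retried until the rollout settles.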

Troubleshooting

CNPG pods may become unhealthy when a former primary or lagging replica requires WAL segments that have already been recycled, for example after max_slot_wal_keep_size is lowered below the amount of WAL retained for an inactive replication slot. The PostgreSQL max_slot_wal_keep_size parameter limits how much WAL a replication slot may retain at checkpoint time. If a slot exceeds that limit, the standby using it may no longer be able to continue replication.
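
To check whether a replication slot has already lost the WAL it needs, PostgreSQL 13 and later report a wal_status for each slot in the pg_replication_slots view. A value of lost means the required WAL has been removed and the slot can no longer be used. The sketch below is illustrative only: the function names are hypothetical, and it assumes psql access as the postgres user inside the pod, as in the other commands on this page.

```shell
# Hypothetical helper: print "slot_name|active|wal_status" for every
# replication slot on the given instance (normally the current primary).
slot_wal_status() {
  namespace="$1"
  pod="$2"
  kubectl exec -n "$namespace" "$pod" -- \
    psql -U postgres -d postgres -tAc \
    "SELECT slot_name, active, wal_status FROM pg_replication_slots;"
}

# Hypothetical helper: read slot_wal_status output on stdin and print the
# names of slots whose required WAL has already been removed.
lost_slots() {
  awk -F'|' '$3 == "lost" { print $1 }'
}

# Usage: slot_wal_status <namespace> <primary-pod> | lost_slots
```

An empty result suggests that no slot has lost WAL yet, so the unhealthy pod may have a different cause.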

Recovery depends on whether CNPG has already promoted another instance.

To find which instance is the current primary, run the following commands:

List all CNPG pods and their IP addresses: kubectl get pods -n <namespace> -l cnpg.io/cluster=cnpg-database -o wide
Get the IP address of the current read/write endpoint, which identifies the primary instance: kubectl get endpointslice -n <namespace> -l kubernetes.io/service-name=cnpg-database-rw
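
The two lookups can be combined so that the endpoint address is matched against the pod list automatically. This is a sketch under stated assumptions: the function names are hypothetical, and it assumes the read/write EndpointSlice holds a single address for the primary.

```shell
# Hypothetical helper: read "pod-name<TAB>pod-ip" lines on stdin and print
# the name of the pod whose IP matches the address given as $1.
match_primary_pod() {
  awk -F'\t' -v ip="$1" '$2 == ip { print $1 }'
}

# Hypothetical helper: look up the read/write endpoint address and resolve
# it to a pod name. Assumes the EndpointSlice holds a single address.
find_primary_pod() {
  namespace="$1"
  rw_ip="$(kubectl get endpointslice -n "$namespace" \
    -l kubernetes.io/service-name=cnpg-database-rw \
    -o jsonpath='{.items[0].endpoints[0].addresses[0]}')" || rw_ip=""
  # Without an endpoint address there is no current primary to report.
  [ -n "$rw_ip" ] || return 1
  kubectl get pods -n "$namespace" -l cnpg.io/cluster=cnpg-database \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.podIP}{"\n"}{end}' \
    | match_primary_pod "$rw_ip"
}

# Usage: find_primary_pod <namespace>
```

If the function prints nothing or fails, the read/write service has no backing pod, which suggests that no instance is currently acting as primary.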

Based on these details, different actions are required:

If another instance has been promoted, the broken pod should be treated as a former primary. Because it failed due to missing WAL segments, its PVC cannot be reused. Delete the PVC for the former primary and let CNPG recreate the instance as a fresh replica. After a new CNPG replica is added, you may also need to delete the old failed pod.
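
The cleanup of a former primary can be sketched as a small guarded function. Treat it as an illustration, not a supported procedure: the function name is hypothetical, and it assumes that the PVC has the same name as the instance pod, which you should confirm with kubectl get pvc -n <namespace> before deleting anything.

```shell
# Hypothetical helper: discard the storage of a failed former primary so
# that CNPG can recreate it as a fresh replica.
# Assumption: the PVC has the same name as the instance pod.
rebuild_former_primary() {
  namespace="$1"
  failed="$2"
  current_primary="$3"
  if [ "$failed" = "$current_primary" ]; then
    echo "refusing to delete $failed: it is the current primary" >&2
    return 1
  fi
  # The PVC stays in Terminating while the pod still mounts it, so do not
  # wait for the PVC deletion before removing the pod.
  kubectl delete pvc "$failed" -n "$namespace" --wait=false
  kubectl delete pod "$failed" -n "$namespace"
}

# Usage: rebuild_former_primary <namespace> <failed-pod> <current-primary-pod>
```

The guard only compares names, so the current primary must first be identified from the read/write endpoint as described above.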

If no new primary exists, do not delete the old primary PVC immediately, because it may contain the only up-to-date copy of the data. First determine whether any standby is promotable. If a standby can be promoted, fail over to it using CNPG-supported operations, then rebuild the old primary by deleting its PVC.

Steps include:

Identify all the pods:

kubectl get pods -n <namespace> -l cnpg.io/cluster=cnpg-database -o wide

Check which instances can answer SQL:

for pod in <cnpg-pod-1> <cnpg-pod-2> <cnpg-pod-3-etc>; do
  echo "=== $pod ==="
  kubectl exec -n <namespace> "$pod" -- \
    psql -U postgres -d postgres -tAc \
    "SELECT pg_is_in_recovery(), pg_last_wal_replay_lsn();" 2>/dev/null || echo "cannot connect"
done

f|...  means the instance is the primary
t|...  means the instance is a replica
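
With the -tA options, psql prints the two columns separated by a | character and without padding, so the first field can be mapped to a role mechanically. A minimal sketch, assuming that output format; the function name instance_role is hypothetical:

```shell
# Hypothetical helper: read one "pg_is_in_recovery|replay_lsn" line, as
# printed by psql -tA, and print the role of the instance.
instance_role() {
  IFS='|' read -r in_recovery _lsn || true
  case "$in_recovery" in
    f) echo "primary" ;;
    t) echo "replica" ;;
    *) echo "unknown" ;;
  esac
}

# Usage, inside the loop above:
#   kubectl exec -n <namespace> "$pod" -- \
#     psql -U postgres -d postgres -tAc \
#     "SELECT pg_is_in_recovery(), pg_last_wal_replay_lsn();" | instance_role
```

Any other output, including an empty line from an instance that cannot be reached, is reported as unknown.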

Request promotion by patching the CNPG cluster status:

kubectl patch cluster cnpg-database -n <namespace> --type merge --subresource=status -p '{"status":{"targetPrimary":"cnpg-database-2"}}'

Replace cnpg-database-2 with the name of the healthy instance to promote, and <namespace> with the namespace of the deployment.

Verify the new primary:

kubectl exec -n <namespace> cnpg-database-2 -- \
  psql -U postgres -d postgres -tAc "SELECT pg_is_in_recovery();"
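
The query returns f once the instance has left recovery and is the writable primary. Because promotion is not instantaneous, the check may need to be repeated. The following sketch polls until that happens; the function name and the default of 30 attempts at 5-second intervals are assumptions chosen for illustration.

```shell
# Hypothetical helper: poll an instance until it reports that it is no
# longer in recovery, that is, until it is the writable primary.
wait_for_primary() {
  namespace="$1"
  pod="$2"
  attempts="${3:-30}"   # assumed default, adjust as needed
  delay="${4:-5}"       # seconds between attempts, assumed default
  i=0
  while [ "$i" -lt "$attempts" ]; do
    state="$(kubectl exec -n "$namespace" "$pod" -- \
      psql -U postgres -d postgres -tAc "SELECT pg_is_in_recovery();" \
      2>/dev/null)" || state=""
    if [ "$state" = "f" ]; then
      echo "$pod is the primary"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$pod did not become primary after $attempts attempts" >&2
  return 1
}

# Usage: wait_for_primary <namespace> cnpg-database-2
```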

Only delete the old primary PVC after confirming another instance is the current writable primary.

If no standby is promotable, recover from backup.

For more details, refer to the CNPG troubleshooting guide.