Backup and Restore
The kdb Insights CLI allows you to back up data stored in kdb Insights Enterprise, hosted on any cloud provider, to an Azure blob container using K8up.
The following data repositories are backed up as part of this process:
- Historical database (HDB)
- Intraday database (IDB)
- Packages repository
- Postgres database used by Keycloak
A backup can only target Azure
A backup can be taken from kdb Insights Enterprise hosted on any cloud provider, but it can currently only be stored in Azure.
Prerequisites
Before taking a backup, the following prerequisites need to be in place:
- kdb Insights CLI is installed
- You have a running instance of kdb Insights Enterprise
- You have direct Kubernetes cluster access
Backup
Initialization
- Tear down the following:
  - All databases deployed in the UI and any assemblies managed outside of the UI.
  - Any pipelines that are ingesting data into these databases and assemblies.
Tear down but do not clean up
From the UI, make sure you do not check the clean up resources option, as that also deletes all data.
- Publishers, or feedhandlers, that use an RT SDK to send data to any of the databases or assemblies can remain running. This only holds if their local storage is large enough for all messages published while the database is offline.
Publisher warnings
The publishers emit warning messages about the dropped connection if you keep them running.
- Annotate each ReadWriteOnce PVC with k8up.io/backup=false. These volumes are not relevant to the backup and can cause issues if they are included. You can do this as follows:
```sh
# Collect the names of all ReadWriteOnce PVCs in the insights namespace
kubectl get pvc -n insights -o json \
  | jq '.items | map(select(.spec.accessModes[] | contains("ReadWriteOnce")) | .metadata.name)' \
  | jq -r '.[]' > /tmp/rwovolumes

# Exclude each of them from the K8up backup
while read -r rwovolume; do
  kubectl annotate pvc "$rwovolume" k8up.io/backup=false -n insights
done < /tmp/rwovolumes
```
Adding this annotation to each ReadWriteOnce PVC is essential for a successful backup and restore
If the annotation is not added, ReadWriteOnce volumes are backed up too. This creates a large number of unnecessary snapshots for volumes that do not need to be restored. With certain K8up versions, a large number of snapshots can also corrupt its repository.
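You can confirm that the annotation step covered every volume by listing the ReadWriteOnce PVCs that still lack k8up.io/backup=false. The following is a minimal sketch. It assumes kubectl points at the target cluster and the resources live in the insights namespace. The function name find_unannotated_rwo_pvcs is hypothetical, and it uses kubectl's JSONPath output with an escaped annotation key instead of jq.

```sh
#!/bin/sh
# Hypothetical helper: print ReadWriteOnce PVCs in a namespace that are not yet
# annotated with k8up.io/backup=false (an empty result means all are excluded).
find_unannotated_rwo_pvcs() {
  ns="${1:-insights}"
  tab="$(printf '\t')"
  # One line per PVC: name <TAB> access modes <TAB> value of k8up.io/backup (may be empty)
  kubectl get pvc -n "$ns" -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.accessModes}{"\t"}{.metadata.annotations.k8up\.io/backup}{"\n"}{end}' |
    while IFS="$tab" read -r name modes backup; do
      case "$modes" in
        # Only ReadWriteOnce volumes need the opt-out annotation
        *ReadWriteOnce*) [ "$backup" = "false" ] || echo "$name" ;;
      esac
    done
}
```

For example, running find_unannotated_rwo_pvcs insights after the annotation loop should print nothing.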
- The kdb Insights CLI needs the Azure storage account properties to be initialized:
  - Run the backup init command as follows:
```sh
kxi backup init -n <NAMESPACE>
```
where <NAMESPACE> is the namespace where the resources you wish to back up are located.
  - You are prompted for the following storage account details:
```
Please enter Azure storage account name: <ACCOUNT>
Please enter Azure storage account access key: <ACCESS_KEY>
Please enter custom Restic repo password: <PASSWORD>
Repeat for confirmation: <PASSWORD>
```
  - You are asked to specify the target object store type. AZURE is currently the only available option, so any other value you enter is ignored.
```sh
Determining cloud provider...
Cloud provider: AZURE
Please enter target object store type AZURE/GCP/AWS [AZURE]: AZURE
```
  - When the initialization is complete, the following messages are displayed:
```
Secret created: <ACCOUNT>
Secret created: <ACCESS_KEY>
Postgres pod annotation successful: insights-postgresql-0
```
- The initialization only needs to be done before the first backup is taken or when the credentials change.
Changing credentials
If you change the credentials, remove the backup-repo and azure-blob-creds secrets from the target Kubernetes namespace before you run the initialization again.
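Here is a minimal sketch of a credential change. It assumes kubectl access to the target namespace. The function name reset_backup_credentials is hypothetical. It deletes the two secrets named above and then reruns kxi backup init, which prompts for the new values interactively.

```sh
#!/bin/sh
# Hypothetical helper: remove the secrets created by a previous `kxi backup init`
# and run the initialization again with the new credentials.
reset_backup_credentials() {
  ns="${1:?usage: reset_backup_credentials <namespace>}"
  # --ignore-not-found makes the delete a no-op if a secret is already gone
  kubectl delete secret backup-repo azure-blob-creds -n "$ns" --ignore-not-found &&
    kxi backup init -n "$ns"
}
```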
Start the backup
To start the backup, take the following steps:
- Run the backup set-backup command as follows:
```sh
kxi backup set-backup -n <NAMESPACE>
```
- You are prompted for the following details:
  - JOB_NAME: used to identify the backup job when checking the status
  - CONTAINER_NAME: the blob container the backup is written to. Ensure this container has been created before the backup starts.
```
Please enter backup job name: <JOB_NAME>
Please enter backup job blob container name: <CONTAINER_NAME>
```
Note
As part of the initialization step you defined the storage account details, but as part of the backup step you can choose which blob container to use.
- When the backup has been started, the following messages are displayed:
```
Configure and start a backup
K8up Backup CRD creation done: backupjobname
```
Do not abort a backup
Once a backup has started, we recommend that you do not abort it, as the Azure blob container would be left in an unknown state.
- Check the backup status. K8up backups are Kubernetes custom resources, so you can use the get verb to list basic information, just as you would for a pod:
```sh
kubectl get backups --namespace <NAMESPACE>
```
The K8up operator schedules a backup pod using the backup job name you picked above. You can find detailed information in its logs.
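You can read those logs without looking up the pod name by hand using something like the following sketch. It assumes, as described above, that the pod name contains the job name. show_k8up_job_logs is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: print the logs of every pod whose name contains the given
# K8up job name (for example the JOB_NAME entered for `kxi backup set-backup`).
show_k8up_job_logs() {
  ns="$1"
  job="$2"
  # `-o name` prints one "pod/<name>" entry per line; keep the ones matching the job
  for pod in $(kubectl get pods -n "$ns" -o name | grep -F -- "$job"); do
    echo "== $pod"
    kubectl logs -n "$ns" "$pod"
  done
}
```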
- When the backup is complete, it appears in the backup snapshots list. We recommend that you check that your Azure blob container contains the following:
```
/data
/index
/keys
/snapshots
config
```
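These entries follow the standard layout of a Restic repository. You might export the container's top-level entry names, one per line, from the Azure portal or any storage tool you prefer. A small check such as the hypothetical check_restic_repo_layout function can then confirm that they are all present. This is a sketch that only inspects names, not contents.

```sh
#!/bin/sh
# Hypothetical helper: read top-level entry names (one per line) on stdin and
# report whether the entries expected in the backup container are all present.
check_restic_repo_layout() {
  # Normalise "/data", "data/" and "data" to the same form
  listing="$(sed 's#^/##; s#/$##')"
  missing=""
  for entry in data index keys snapshots config; do
    printf '%s\n' "$listing" | grep -qx -- "$entry" || missing="$missing $entry"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}
```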
Snapshots
To list the completed backups in a specific blob container, call the kxi backup snapshots command. It shows the snapshot ID, which you need to reference as part of the restore. It also shows the time the backup completed and the path that was backed up.
- Run the snapshots command:
```sh
kxi backup snapshots -n <NAMESPACE>
```
- Enter the blob container name:
```
Please enter backup job blob container name: <CONTAINER_NAME>
```
- A list of the backups that have completed in this blob container is returned, as shown in the example below:
```
Check and list created snapshots
Pod creation done: k8up-snapshot-list-pod
Reading logs
ID        Time                 Host      Tags  Paths
----------------------------------------------------------------------------------
46aba94e  2023-06-22 09:34:12  insights        /data/insights-packages-pvc
ab82c3a1  2023-06-22 09:34:15  insights        /data/assembly-hdb
ab82c3a1  2023-06-22 09:34:15  insights        /data/assembly-idb
59d90c13  2023-06-22 09:34:24  insights        /insights-postgresql.sql
1999a27a  2023-06-23 10:48:32  insights        /data/insights-packages-pvc
aba6d2ae  2023-06-23 10:48:37  insights        /data/assembly-hdb
1ba0e0db  2023-06-23 10:48:47  insights        /data/assembly-idb
3aff729d  2023-06-23 10:49:10  insights        /insights-postgresql.sql
----------------------------------------------------------------------------------
8 snapshots
Deleting pod
Pod deletion successful: k8up-snapshot-list-pod
```
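When restoring, you usually need the ID of the most recent snapshot for each path. Suppose you have saved the output of kxi backup snapshots to a file. The following sketch then picks that ID for a given path. The function name latest_snapshot_id is hypothetical. It relies on the column layout in the example above (ID, date, time, host, path), which may differ if tags are present.

```sh
#!/bin/sh
# Hypothetical helper: read `kxi backup snapshots` output on stdin and print the
# ID of the newest snapshot whose path column equals $1.
latest_snapshot_id() {
  awk -v p="$1" '
    # Snapshot rows start with an 8-character hex ID and end with the path
    length($1) == 8 && $1 ~ /^[0-9a-f]+$/ && $NF == p {
      ts = $2 " " $3                  # "YYYY-MM-DD hh:mm:ss" sorts correctly as text
      if (ts > best) { best = ts; id = $1 }
    }
    END { if (id != "") print id }
  '
}
```

Against the example listing above, latest_snapshot_id /data/assembly-hdb < snapshots.txt would select aba6d2ae.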
Restore
Currently restoration is not available as part of the kdb Insights CLI, but it can be done via a K8up Restore CRD.
HDB, IDB and packages
To restore the HDB, IDB and packages repository follow the steps below:
- When restoring the HDB or IDB, tear down the following:
  - All databases deployed in the UI and any assemblies managed outside of the UI.
  - Any pipelines that are ingesting data into these databases and assemblies.
Tear down but do not clean up
From the UI, make sure you do not check the clean up resources option, as that deletes all the resources.
- When restoring the HDB or IDB, publishers, or feedhandlers, that use an RT language interface to send data to kdb Insights Enterprise can remain running. Their local storage must be large enough for all messages published while the database is offline.
- Prepare the target volumes. The target system is either the original system or a freshly created one.
  - We recommend that you define an exact copy of the database/assembly definition on the target system, so that all underlying objects are provisioned.
  - Stop the target database/assembly, but do not clear it.
  - Clean the HDB and IDB volumes manually before the restoration, using the following commands:
```sh
kubectl exec -n insights <ASSEMBLYNAME>-sm-0 -- bash -c "rm -rf /data/db/idb/*"
kubectl exec -n insights <ASSEMBLYNAME>-sm-0 -- bash -c "rm -rf /data/db/hdb/*"
```
Replace <ASSEMBLYNAME> with the name of your database or assembly.
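You can wrap the two cleanup commands in a small function that refuses to run without a namespace and assembly name. This guards against an empty or mistyped variable in a destructive command. This is a sketch. clear_assembly_volumes is a hypothetical name, and it assumes the <ASSEMBLYNAME>-sm-0 pod naming used in the commands above.

```sh
#!/bin/sh
# Hypothetical helper: empty the IDB and HDB directories of a database/assembly
# before restoring into them. Destructive: double-check the name.
clear_assembly_volumes() {
  ns="$1"
  assembly="$2"
  if [ -z "$ns" ] || [ -z "$assembly" ]; then
    echo "usage: clear_assembly_volumes <namespace> <assembly-name>" >&2
    return 1
  fi
  for db in idb hdb; do
    # The glob is quoted so it is expanded by bash inside the pod, not locally
    kubectl exec -n "$ns" "${assembly}-sm-0" -- bash -c "rm -rf /data/db/${db}/*" || return 1
  done
}
```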
- Save a YAML file for each repository being restored, with the following content:
```yaml
apiVersion: k8up.io/v1
kind: Restore
metadata:
  name: <NAME>
  namespace: <NAMESPACE>
spec:
  podSecurityContext:
    fsGroup: 65532
    fsGroupChangePolicy: OnRootMismatch
  restoreMethod:
    folder:
      claimName: <CLAIM_NAME>
  snapshot: <SNAPSHOT_ID>
  backend:
    repoPasswordSecretRef:
      name: backup-repo
      key: password
    azure:
      container: <CONTAINER_NAME>
      accountNameSecretRef:
        name: azure-blob-creds
        key: username
      accountKeySecretRef:
        name: azure-blob-creds
        key: password
```
- Update the YAML files as follows:
  - NAME: Restore CRD name
  - NAMESPACE: name of the namespace where kdb Insights Enterprise is deployed
  - CLAIM_NAME: target PersistentVolumeClaim
  - SNAPSHOT_ID: the appropriate snapshot ID collected from the kxi backup snapshots command, or the Restic snapshot list
  - CONTAINER_NAME: the blob container the backup was written to
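If you restore several repositories, generating one Restore manifest per volume can avoid copy-and-paste mistakes. The sketch below prints the Restore manifest described above, with the placeholders filled in from its arguments. render_restore_crd and its argument order are hypothetical. The container argument assumes you restore from the same blob container that was used for the backup.

```sh
#!/bin/sh
# Hypothetical helper: print a K8up Restore manifest for one PVC and snapshot.
# Usage: render_restore_crd <name> <namespace> <claim-name> <snapshot-id> <container>
render_restore_crd() {
  if [ "$#" -ne 5 ]; then
    echo "usage: render_restore_crd <name> <namespace> <claim-name> <snapshot-id> <container>" >&2
    return 1
  fi
  cat <<EOF
apiVersion: k8up.io/v1
kind: Restore
metadata:
  name: $1
  namespace: $2
spec:
  podSecurityContext:
    fsGroup: 65532
    fsGroupChangePolicy: OnRootMismatch
  restoreMethod:
    folder:
      claimName: $3
  snapshot: $4
  backend:
    repoPasswordSecretRef:
      name: backup-repo
      key: password
    azure:
      container: $5
      accountNameSecretRef:
        name: azure-blob-creds
        key: username
      accountKeySecretRef:
        name: azure-blob-creds
        key: password
EOF
}
```

You can pipe the output to kubectl apply -f - to apply the manifest without saving a file.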
- Apply the files using the following command:
```sh
kubectl apply -f <your_file>.yaml
```
- Check the restore status. K8up restores are Kubernetes custom resources, so you can use the get verb to list basic information, just as you would for a pod:
```sh
kubectl get restores --namespace <NAMESPACE>
```
The K8up operator schedules a restore pod named after the Restore name you picked above. You can find detailed information in its logs.
- When the restore jobs are complete, start the restored assemblies/databases, pipelines and publishers you might have stopped.
- Run a simple query to verify the restored data. You can use any of the available querying methods, including the UI and REST.
Postgres database used by Keycloak
To restore the Postgres database, follow the steps below.
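- Install restic on your local machine and export the repository credentials. restic reads the repository password from RESTIC_PASSWORD and the Azure credentials from AZURE_ACCOUNT_NAME and AZURE_ACCOUNT_KEY:
```sh
export RESTIC_PASSWORD=<resticRepoPassword>
export AZURE_ACCOUNT_NAME=<azureStorageAccountName>
export AZURE_ACCOUNT_KEY=<azureStorageAccountAccessKey>
sudo apt-get install restic
sudo restic self-update
```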
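Restic takes the repository password from RESTIC_PASSWORD and the Azure credentials from AZURE_ACCOUNT_NAME and AZURE_ACCOUNT_KEY. A quick check like the hypothetical check_restic_env function below can catch a missing variable before the restore runs. It only tests that the variables are non-empty, not that the credentials are valid.

```sh
#!/bin/sh
# Hypothetical helper: fail if any variable restic needs for an Azure repository is unset or empty.
check_restic_env() {
  missing=0
  for var in RESTIC_PASSWORD AZURE_ACCOUNT_NAME AZURE_ACCOUNT_KEY; do
    # Indirect lookup of the variable named by $var
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "$var is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```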
- Set the number of replicas for the Keycloak statefulset to 0 to prevent modifications to the database while it is being restored. KEYCLOAK_STATEFULSET holds the name of the Keycloak statefulset:
```sh
kubectl scale statefulsets $KEYCLOAK_STATEFULSET --replicas=0 -n <NAMESPACE>
```
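If you do not know the name of the Keycloak statefulset, you could look it up before scaling. This sketch assumes the statefulset name contains keycloak. That is an assumption about your deployment, not something this guide states. scale_keycloak is a hypothetical name, and the same function can be used to scale back to 1 at the end of the restore.

```sh
#!/bin/sh
# Hypothetical helper: scale the statefulset whose name contains "keycloak".
# Usage: scale_keycloak <namespace> <replicas>
scale_keycloak() {
  ns="$1"
  replicas="$2"
  # `-o name` prints entries such as statefulset.apps/<name>
  sts="$(kubectl get statefulsets -n "$ns" -o name | grep -F keycloak | head -n 1)"
  if [ -z "$sts" ]; then
    echo "no keycloak statefulset found in namespace $ns" >&2
    return 1
  fi
  kubectl scale "$sts" --replicas="$replicas" -n "$ns"
}
```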
- Restore the backup file to your local machine, copy it into the PostgreSQL primary pod, and connect to the pod:
```sh
restic -r <OBJ_STORE_TYPE>:<CONTAINER_NAME>:/ restore <SNAPSHOT_ID> --target /tmp/
kubectl cp /tmp/insights-postgresql.sql <NAMESPACE>/insights-postgresql-0:/opt/insights-postgresql.sql
```
where:
  - <CONTAINER_NAME> is the Azure blob container name in the storage account
  - <OBJ_STORE_TYPE> is the object store type; currently only azure is supported
  - <NAMESPACE> is the namespace where the Postgres pod runs
  - <SNAPSHOT_ID> is the appropriate snapshot ID collected from the kxi backup snapshots command, or the Restic snapshot list
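You can combine the restore and copy commands into a function that stops if the restore fails. This is a sketch that assumes an Azure repository and restic credentials already exported as restic expects. fetch_postgres_dump is a hypothetical name. The destination /opt/insights-postgresql.sql is also an assumption, chosen so that the dump is not overwritten when /opt/init.sql is written in the next step.

```sh
#!/bin/sh
# Hypothetical helper: restore the Postgres dump from the restic repository to
# /tmp and copy it into the PostgreSQL primary pod.
# Usage: fetch_postgres_dump <container-name> <snapshot-id> <namespace>
fetch_postgres_dump() {
  container="$1"
  snapshot="$2"
  ns="$3"
  # restic recreates the snapshot path under --target, so the dump lands in /tmp/insights-postgresql.sql
  restic -r "azure:${container}:/" restore "$snapshot" --target /tmp/ &&
    kubectl cp /tmp/insights-postgresql.sql "${ns}/insights-postgresql-0:/opt/insights-postgresql.sql"
}
```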
- Inside the pod, drop and recreate the existing database:
```sh
cat <<EOF > /opt/init.sql
drop database $POSTGRES_DB;
create database $POSTGRES_DB;
create user $POSTGRES_USER;
alter role $POSTGRES_USER with password '$POSTGRES_PASSWORD';
grant all privileges on database $POSTGRES_DB to $POSTGRES_USER;
alter database $POSTGRES_DB owner to $POSTGRES_USER;
EOF

# This command prompts for a password.
# The password for the 'postgres' user can be viewed in the environment variable POSTGRESQL_POSTGRES_PASSWORD.
psql -U postgres < /opt/init.sql
```
- Restore the backup, replacing <backup file> with the name of the SQL file you copied into /opt:
```sh
# This command prompts for a password.
# The password for the 'postgres' user can be viewed in the environment variable POSTGRESQL_POSTGRES_PASSWORD.
psql -U postgres $POSTGRES_DB < /opt/<backup file>
```
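Both psql commands prompt for the postgres password. Inside the pod, you could instead pass the value of POSTGRESQL_POSTGRES_PASSWORD through PGPASSWORD, the environment variable libpq reads, and run both steps in one call. This is a sketch. restore_keycloak_db is a hypothetical name. It assumes POSTGRES_DB and POSTGRESQL_POSTGRES_PASSWORD are set in the pod environment, as the commands above expect.

```sh
#!/bin/sh
# Hypothetical helper, run inside the PostgreSQL pod: recreate the Keycloak
# database with init.sql, then load the dump into it.
# Usage: restore_keycloak_db <dump-file> [init-sql-file]
restore_keycloak_db() (
  dump="$1"
  init_sql="${2:-/opt/init.sql}"
  # libpq reads PGPASSWORD, so psql does not prompt; the subshell body keeps it local
  PGPASSWORD="$POSTGRESQL_POSTGRES_PASSWORD"
  export PGPASSWORD
  # The dump is loaded only if the first psql call exits 0; psql still exits 0 when
  # individual statements fail, so read its output for errors
  psql -U postgres < "$init_sql" &&
    psql -U postgres "$POSTGRES_DB" < "$dump"
)
```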
- Detach from the pod using CTRL+P, CTRL+Q.
- Scale the number of Keycloak replicas back to 1:
```sh
kubectl scale statefulsets $KEYCLOAK_STATEFULSET --replicas=1 -n <NAMESPACE>
```