Data persistence and migration
KX Insights Storage Manager (SM) subscribes to data from an ingestion stream and persists it to disk at scheduled intervals within a day, and at the end of day. (These processes are referred to as end-of-interval and end-of-day writedown respectively.) The writedown process organizes the data on disk in an appropriate, fault-tolerant manner, dividing time into days and days into periods.
Storage Manager also integrates with
- KX Insights Data Access and its Data Access processes (DAPs)
- custom DAPs (i.e. RDBs and HDBs in a traditional kdb+ application) of an existing custom application
for management of the data lifecycle.
Storage Manager also migrates data between pre-defined tiers of on-disk storage, such as from SSD to spinning disk to object storage. This migration occurs as the last step of the daily persistence operations.
The data lifecycle
In the initial phase of the data lifecycle, Storage Manager moves data from an in-memory store (e.g. a Data Access Process, or an RDB) to on-disk storage. SM moves the data without interrupting the running processes. When the writedown is complete, it signals those processes that it is safe to purge the corresponding data from memory. (More on this below.)
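The persist-then-signal ordering described above can be sketched as follows. This is a toy model, not SM's interface: the `InMemoryStore` and `Writer` classes, the sequence numbers, and the `on_writedown_complete` callback are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a DAP or RDB holding recent rows in memory."""
    rows: list = field(default_factory=list)  # each row is (sequence_number, payload)

    def on_writedown_complete(self, up_to_seq: int) -> None:
        # Purge only the rows the writer has confirmed are on disk.
        self.rows = [r for r in self.rows if r[0] > up_to_seq]


@dataclass
class Writer:
    """Hypothetical stand-in for SM: persists rows, then notifies subscribers."""
    disk: list = field(default_factory=list)
    subscribers: list = field(default_factory=list)

    def writedown(self, snapshot: list) -> None:
        if not snapshot:
            return
        self.disk.extend(snapshot)        # 1. persist first
        last_seq = snapshot[-1][0]
        for store in self.subscribers:    # 2. only then signal the purge
            store.on_writedown_complete(last_seq)


store = InMemoryStore(rows=[(1, "a"), (2, "b"), (3, "c")])
sm = Writer(subscribers=[store])

snapshot = list(store.rows[:2])   # rows 1 and 2 are selected for writedown
store.rows.append((4, "d"))       # ingestion continues while the write runs
sm.writedown(snapshot)

print(store.rows)  # rows 3 and 4 remain in memory
print(sm.disk)     # rows 1 and 2 are on disk
```

The in-memory store never decides on its own what to drop; it purges only up to the point the writer reports, so rows that arrived during the writedown stay queryable.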
SM treats each writedown event as a general migration of data between storage tiers. Each DAP exposes the data from a certain mount, which corresponds to one or more storage tiers. The details of mounts and storage tiers are specified in the assembly configuration YAML file.
The first step of moving data from memory to disk involves a Storage Manager concept called the Interval Database, or IDB. This is an on-disk datastore that is written to at a prescribed, configurable interval within a single day. Its purpose is to relieve memory pressure by allowing more frequent pushes to disk, rather than waiting until an end-of-day writedown. A portion of the in-memory data is moved to the IDB at a regular interval throughout the day. This interval is commonly one hour, but should be determined by the expected data volumes of the application, and the resource allocation of its hardware.
For example, if data volumes are very high, and machine memory is limited, the capacity of an in-memory datastore is probably limited; frequent writes to the IDB are required to avoid memory starvation and its impact on system operations. On the other hand, if data volume is low enough that the RDB processes can hold an entire day’s data, the system can be configured without an IDB mount (but with an IDB storage tier), in which case the RDB purge will be triggered by SM only after the end-of-day writedown. (For a general storage tier migration, SM only notifies the processes which use the mounts affected by the migration.)
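As a rough illustration of this trade-off, the arithmetic below estimates the longest interval that keeps buffered data within a memory budget. The function name, the `headroom` parameter, and the example figures are assumptions made for this sketch; they are not SM settings.

```python
def max_interval_seconds(memory_budget_bytes: int,
                         ingest_bytes_per_sec: float,
                         headroom: float = 0.5) -> float:
    """Longest gap between writedowns that keeps buffered data within budget.

    headroom is the fraction of memory held back for queries and spikes
    (an illustrative assumption).
    """
    if ingest_bytes_per_sec <= 0:
        raise ValueError("ingest rate must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable = memory_budget_bytes * (1 - headroom)
    return usable / ingest_bytes_per_sec


GiB = 1024 ** 3
MiB = 1024 ** 2

# 64 GiB of memory, 8 MiB/s sustained ingest, half the memory kept as headroom.
interval = max_interval_seconds(64 * GiB, 8 * MiB)
print(f"{interval / 3600:.2f} hours")  # prints "1.14 hours"
```

With these figures an hourly interval fits comfortably, whereas holding a full day in memory (86,400 seconds at 8 MiB/s, roughly 675 GiB) would not.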
The IDB populates individual partitions on disk at the specified interval throughout the day. At the end-of-day point, Storage Manager persists all IDB partitions to the HDB, and the intraday interaction between data in memory and the IDB restarts for the following day.
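The daily cycle can be pictured with a toy model. The `TinyStore` class below is hypothetical and ignores everything about SM's real on-disk layout; it also assumes that any rows still in memory are flushed at end of day, which the text above does not state.

```python
class TinyStore:
    """Toy model of the memory -> IDB -> HDB flow; not SM's actual data layout."""

    def __init__(self):
        self.memory = []    # rows not yet written down
        self.idb = {}       # interval number -> rows
        self.hdb = {}       # date string -> rows
        self.interval = 0

    def ingest(self, row):
        self.memory.append(row)

    def end_of_interval(self):
        # Move the in-memory rows into a new IDB partition and free the memory.
        self.idb[self.interval] = self.memory
        self.memory = []
        self.interval += 1

    def end_of_day(self, date):
        # Assumption: flush whatever is still in memory before rolling the day.
        self.end_of_interval()
        # Persist every IDB partition, in interval order, into one HDB partition.
        self.hdb[date] = [row for i in sorted(self.idb) for row in self.idb[i]]
        # Restart the intraday cycle for the following day.
        self.idb = {}
        self.interval = 0


s = TinyStore()
s.ingest("t1"); s.ingest("t2"); s.end_of_interval()
s.ingest("t3"); s.end_of_interval()
s.ingest("t4")
s.end_of_day("2024.01.02")

print(s.hdb)  # {'2024.01.02': ['t1', 't2', 't3', 't4']}
print(s.idb)  # {}
```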
Storage Manager comprises four processes.
These components operate in an integrated manner, and are not built for individual use.
- One process coordinates the writedown operations and exposes the front-end interface through which other microservices communicate with SM.
- One process performs the end-of-interval operation, which persists a portion of an in-memory datastore to disk in IDB partitions.
- One process performs the end-of-day operation, which persists the entirety of the on-disk IDB data to HDB partitions.
- One process migrates data between HDB storage tiers, which are distinct portions of the on-disk database spread across various storage volumes. Such volumes are commonly of different storage types, ranging from high-performance storage (for the most recent, business-critical data) to slower, cheaper storage for less frequently accessed data, possibly in a compressed format.
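One way to picture tier migration is as a planning step that compares where each date partition lives with where a policy says it should live. The sketch below assumes a purely age-based policy; the tier names, the thresholds, and the functions `tier_for` and `plan_migrations` are hypothetical and do not reflect SM's configuration keys.

```python
from datetime import date

# Hypothetical tiers, ordered fastest to slowest.
# Each holds partitions up to max_age_days old; None marks the final tier.
TIERS = [
    ("ssd", 7),
    ("hdd", 90),
    ("object", None),
]


def tier_for(partition_date: date, today: date) -> str:
    """Pick the first tier whose age limit covers this partition."""
    age = (today - partition_date).days
    for name, max_age in TIERS:
        if max_age is None or age <= max_age:
            return name
    raise ValueError("no tier configured for this age")


def plan_migrations(current: dict, today: date) -> list:
    """Return (partition_date, from_tier, to_tier) for each partition that should move."""
    moves = []
    for d, tier in sorted(current.items()):
        target = tier_for(d, today)
        if target != tier:
            moves.append((d, tier, target))
    return moves


today = date(2024, 6, 30)
current = {
    date(2024, 6, 29): "ssd",   # 1 day old: stays on ssd
    date(2024, 6, 1): "ssd",    # 29 days old: moves to hdd
    date(2024, 1, 1): "hdd",    # 181 days old: moves to object storage
}
moves = plan_migrations(current, today)
for move in moves:
    print(move)
```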
To achieve instant reloading of the HDB and IDB, and full recoverability from any writedown failure, SM creates a loadable kdb+ database in which the table directories are symbolic links to versioned physical table data. If an existing kdb+ database is detected in SM’s configured first HDB-tier directory on the first run, SM adds to it all the symbolic links it needs for managing writedown.
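The general technique of publishing versioned data through symbolic links can be sketched as below. This shows the common POSIX pattern (create a temporary link, then rename it over the live one), not SM's actual implementation; the directory names and the `publish_version` helper are hypothetical.

```python
import os
import tempfile


def publish_version(db_root: str, table: str, version_dir: str) -> None:
    """Repoint db_root/table at version_dir in one step (POSIX semantics assumed)."""
    link = os.path.join(db_root, table)
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)               # clear debris from an interrupted run
    os.symlink(version_dir, tmp_link, target_is_directory=True)
    os.replace(tmp_link, link)            # rename swaps the link atomically


root = tempfile.mkdtemp()
v1 = os.path.join(root, "data", "trade.v1")
v2 = os.path.join(root, "data", "trade.v2")
db = os.path.join(root, "hdb")
for path in (v1, v2, db):
    os.makedirs(path)

publish_version(db, "trade", v1)   # first writedown
publish_version(db, "trade", v2)   # next writedown replaces it
current = os.path.realpath(os.path.join(db, "trade"))
print(current)
```

Readers of the database see either the old version or the new one, never a partially written table, and a failed writedown leaves the previous link untouched.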
SM writes down all tables it receives from the message stream, as configured in the assembly schema.
Currently, each partitioned table must have a timestamp column (configured as prtnCol in the schema) which is used to determine the target partition date for end-of-day writedown.
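The idea of routing rows to date partitions by a timestamp column can be illustrated as follows. The `split_by_partition` function and the column name `time` are assumptions for this sketch; only the `prtnCol` setting itself comes from the text above.

```python
from collections import defaultdict
from datetime import datetime


def split_by_partition(rows: list, prtn_col: str) -> dict:
    """Group rows by the calendar date of their partition column."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[prtn_col].date()].append(row)
    return dict(parts)


rows = [
    {"time": datetime(2024, 6, 29, 23, 59, 59), "sym": "A"},
    {"time": datetime(2024, 6, 30, 0, 0, 1), "sym": "B"},
    {"time": datetime(2024, 6, 30, 12, 0, 0), "sym": "C"},
]
parts = split_by_partition(rows, "time")
counts = {d.isoformat(): len(r) for d, r in parts.items()}
print(counts)  # {'2024-06-29': 1, '2024-06-30': 2}
```

A row's target partition depends on its own timestamp rather than on when it arrived, which is what allows late data to be placed in an earlier day's partition.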
Storage Manager supports migrating historical data to object storage (as the last HDB storage tier) and making it accessible as a segment of a regular loadable HDB. Note that all date partitions written to object storage are treated as immutable: any late table data targeting a partition already migrated to object storage is discarded (with a warning message in DBM logs).
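The immutability rule can be sketched as a filter applied to late data before writedown. The function name, the logger name, and the message wording below are hypothetical; the sketch only mirrors the behavior described above (discard and warn).

```python
import logging
from datetime import date

log = logging.getLogger("writedown-sketch")


def drop_rows_for_immutable(parts: dict, immutable_dates: set) -> dict:
    """Keep only partitions that may still be written; warn about the rest."""
    kept = {}
    for d, rows in parts.items():
        if d in immutable_dates:
            log.warning("discarding %d late rows for immutable partition %s",
                        len(rows), d)
        else:
            kept[d] = rows
    return kept


late_parts = {
    date(2024, 1, 2): ["r1", "r2"],   # this date already lives in object storage
    date(2024, 6, 30): ["r3"],        # this date is still on a writable tier
}
in_object_storage = {date(2024, 1, 2)}
kept = drop_rows_for_immutable(late_parts, in_object_storage)
print(kept)
```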
Storage Manager is most appropriate where you need
- a writedown process resilient against failures
- a parallelized writedown process for greater efficiency in persistence operations
- to ensure writedown operations do not impact other system processes and their functions, e.g. serving queries
- to move data from on-disk storage to object storage in the cloud
- to run a tiered database and seamlessly move data through the tiers
- to compress certain tiers of your database
- Data-agnostic in design and implementation
- Commits ingested data to disk in an organized, fault-tolerant manner that is resilient against failure at any point during writedown operations and efficient for querying
- Manages its own checkpoint state
- Supports transparent periodic data reorganization processing
- Notifies processes affected by data reorganization
- Supports various types of storage (memory, NVMe, SSD, etc.)
- Provides parallelism of maintenance operations where possible, to reduce elapsed time
- Supports late data (data arriving on a day different from that with which it is associated)
- Supports standard kdb+ table types (basic, splayed, partitioned)
- Supports tiered storage with configurable migration and compression policies
- Supports starting off from a pre-existing standard kdb+ partitioned database (which gets converted into SM format)
- Supports migrating historical data to object storage
- Support for dynamic schema changes with zero downtime
- Dynamic scalability
- Export of on-disk data to a standard kdb+ partitioned database