Each storage tier is described by its locality, segmentation format, and rollover configuration. Storage tiers are used to migrate data over time from fast, expensive storage to slower, less-expensive storage. Depending on your use case, this configuration can be tuned either to keep more data in memory for faster query performance, or to keep more data on disk to reduce costs.
Refer to tier configuration for the full list of configuration options.
Tiers differ in performance and cost. As data ages, its value decreases and it can be transitioned to slower, less-expensive storage. Tiers for a database are split into a single stream tier, a single ordinal tier, and one or more date tiers. A tier's type is determined by the configuration of its mount.
    sm:
      tiers:
        - name: rdb
          mount: rdb
        - name: idb
          mount: idb
          schedule:
            freq: 0D00:10:00 # every 10 minutes
        - name: hdb
          mount: hdb
          schedule:
            freq: 1D00:00:00 # every day
            snap: 01:35:00 # at 1:35 AM
          retain:
            time: 2 days
A stream tier is responsible for in-memory data that is received between write-down events. This tier is used to hold the most recent data available in memory for the fastest possible query results. This tier is typically referred to as the real-time database or RDB. This tier has no persisted data, so it must have enough memory capacity to hold data for the interval between writes to disk. This interval has a default of 10 minutes. There can only be one stream tier in a tier configuration and it must always be the first tier.
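As a rough sizing sketch for the memory requirement described above (the symbols and figures here are illustrative assumptions, not values from the product): if data arrives at an average rate r, the interval between writes to disk is Δt, and k is a headroom factor for bursts and in-memory overhead, the stream tier needs approximately the following amount of memory.

```latex
M \approx k \cdot r \cdot \Delta t
\qquad\text{e.g.}\qquad
1.5 \times 20\ \mathrm{MB/s} \times 600\ \mathrm{s} = 18{,}000\ \mathrm{MB} = 18\ \mathrm{GB}
```

The 600 s figure corresponds to the default 10-minute interval; the ingest rate and headroom factor are hypothetical and should be measured for your own workload, since the in-memory size of data can differ from its size on the wire.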
A stream tier is configured by using a stream mount with no partitioning scheme.
    mounts:
      rdb:
        type: stream
        partition: none
    elements:
      sm:
        tiers:
          - name: rdb
            mount: rdb
An ordinal tier is responsible for data written to disk after migrating out of the stream tier. This tier holds data that has been flushed from the stream tier and written to disk periodically throughout the day. As a result, it is typically referred to as the intra-day database or IDB. Data in this tier is stored on disk in a partitioned table that is sequenced in order of the intervals written throughout the day. There can only be one ordinal tier in a tier configuration and it must always follow the stream tier.
An ordinal tier is configured using a local mount with an ordinal partitioning scheme. The frequency for how often data is flushed from the stream tier and written to disk here is configurable, and has a default of 10 minutes.
    mounts:
      rdb:
        type: stream
        partition: none
      idb:
        type: local
        partition: ordinal
    sm:
      tiers:
        - name: rdb
          mount: rdb
        - name: idb
          mount: idb
          schedule:
            freq: 0D00:10:00 # interval every 10 minutes
A tier configuration can have one or more date tiers following an ordinal tier. Date tiers hold historical data partitioned by date, and you can choose to have one or many of them. Date tiers can be configured to hold data for a set duration before either migrating it to the next tier, or discarding it. This is based on your desired configuration and data retention requirements. Data is migrated exactly once per day from the ordinal tier to the first date tier. A snap value can be set to configure what time of day the migration happens relative to midnight.
A date tier is configured by using a local mount with a date partitioning scheme. Multiple date tiers can be added sequentially with slower, less-expensive disks as data ages.
Date tiers in object storage
For data that is very old and/or infrequently queried, a date object storage tier can be added. This is configured using a normal date mount, with extra tier configuration to point to the object storage bucket. See object storage tier below for details.
    mounts:
      rdb:
        type: stream
        partition: none
        baseURI: none
      idb:
        type: local
        partition: ordinal
        baseURI: file:///data/db/idb
      hdb:
        type: local
        partition: date
        baseURI: file:///data/db/hdb
      hdb2:
        type: local
        partition: date
        baseURI: file:///data/db/hdb2
    elements:
      sm:
        source: stream
        tiers:
          - name: rdb
            mount: rdb
          - name: idb
            mount: idb
            schedule:
              freq: 0D00:10:00 # every 10 minutes
          - name: hdb
            mount: hdb
            schedule:
              freq: 1D00:00:00 # every day
              snap: 01:35:00 # at 1:35 AM
            retain:
              time: 2 days
          - name: hdb2
            mount: hdb2
            retain:
              time: 2 weeks
Object storage tier
For data that needs to be preserved on very inexpensive storage, it can be migrated to an object storage tier. Object storage tiers offer the benefit of being endlessly scalable and highly available. An object storage tier is a final destination for data in the tier lifecycle.
Object storage tiers vs mounting an object storage database
An object storage tier is a read-write location and is configured differently than a read-only database. See querying object storage for an example of how to query an existing object storage database.
An object storage tier is a date tier with extra details about the object storage location. To configure an object storage tier, add a store field with the remote data location. To improve query performance, a cache index called an inventory file can be configured.
In the example below, data flows from the normal hdb disk to the odb object storage tier after being on disk for 2 weeks. Both the hdb and the odb tier use the same hdb mount, but the odb tier overrides the store location to point to an AWS S3 location.
    mounts:
      rdb:
        type: stream
        partition: none
        baseURI: none
      idb:
        type: local
        partition: ordinal
        baseURI: file:///data/db/idb
      hdb:
        type: local
        partition: date
        baseURI: file:///data/db/hdb
    elements:
      sm:
        source: stream
        tiers:
          - name: rdb
            mount: rdb
          - name: idb
            mount: idb
          - name: hdb
            mount: hdb
            retain:
              time: 2 weeks
          - name: odb
            mount: hdb
            store: s3://kxi-example-data/db
            inventory:
              enabled: true
              location: inventory/test-db-inventory.tgz
Data must persist to disk before migrating to object storage
Before data can be migrated to an object storage tier, it must be persisted in a date tier for at least one full day. This implies that the minimum retention period for the HDB tier is 1 day.
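As a minimal sketch of the constraint above, the tiers below keep data on the hdb tier for one day before it moves to the odb tier. The mounts, tier names, and bucket are reused from the earlier example on this page. The duration string `1 day` is an assumption modeled on the `2 days` and `2 weeks` values used elsewhere here, so confirm the accepted syntax in the tier configuration reference.

```yaml
elements:
  sm:
    source: stream
    tiers:
      - name: rdb
        mount: rdb
      - name: idb
        mount: idb
      - name: hdb
        mount: hdb
        retain:
          time: 1 day # assumed syntax; shortest retention that satisfies the one-day rule
      - name: odb
        mount: hdb # same date mount as the hdb tier
        store: s3://kxi-example-data/db # overrides the storage location
```

A longer retention on the hdb tier keeps recent history on local disk for faster queries, at the cost of more disk usage.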
To learn more about configuring object storage tiers, see the guide on object storage configuration.