Storage tiers
A tier configuration describes the locality, segmentation format, and rollover configuration of each storage tier. Storage tiers are used to migrate data over time from fast, expensive storage to slower, less-expensive storage. Depending on your use case, this configuration can be tuned to keep more data in memory for faster query performance, or to keep more data on disk to reduce costs.
Tier configuration
Refer to tier configuration for the full list of configuration options.
Tier types
Tiers differ in performance and cost. As data ages, its value decreases and it can be transitioned to slower, less-expensive storage. Tiers for a database are split into a single stream tier, a single ordinal tier, and one or more date tiers. A tier's type is determined by the configuration of its mount.
sm:
  tiers:
    - name: rdb
      mount: rdb
    - name: idb
      mount: idb
      schedule:
        freq: 0D00:10:00 # every 10 minutes
    - name: hdb
      mount: hdb
      schedule:
        freq: 1D00:00:00 # every day
        snap: 01:35:00 # at 1:35 AM
      retain:
        time: 2 days
Stream tier
A stream tier is responsible for in-memory data that is received between write-down events. This tier is used to hold the most recent data available in memory for the fastest possible query results. This tier is typically referred to as the real-time database or RDB. This tier has no persisted data, so it must have enough memory capacity to hold data for the interval between writes to disk. This interval has a default of 10 minutes. There can only be one stream tier in a tier configuration and it must always be the first tier.
A stream tier is configured by using a stream mount with no partitioning scheme.
mounts:
  rdb:
    type: stream
    partition: none
elements:
  sm:
    tiers:
      - name: rdb
        mount: rdb
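As a rough aid for sizing the stream tier, the sketch below estimates how much memory one write-down interval of data might occupy. Only the 10-minute default interval comes from the text above; the ingest rate, average row size, and headroom factor are hypothetical inputs chosen for illustration, and the function name is not part of any product API.

```python
from datetime import timedelta


def stream_tier_bytes(rows_per_second: float,
                      bytes_per_row: float,
                      interval: timedelta = timedelta(minutes=10),
                      headroom: float = 2.0) -> int:
    """Estimate the memory needed to hold one write-down interval of data.

    Every input except the 10-minute default interval is an assumption;
    measure your own ingest rate and row size before relying on the result.
    """
    raw = rows_per_second * bytes_per_row * interval.total_seconds()
    # The headroom factor (assumed) leaves room for bursts and query work.
    return int(raw * headroom)


# Hypothetical workload: 50,000 rows per second at 64 bytes per row.
estimate = stream_tier_bytes(rows_per_second=50_000, bytes_per_row=64)
print(f"{estimate / 2**30:.2f} GiB")
```

A longer interval between writes to disk increases this estimate proportionally, which is the trade-off to weigh when changing the write-down frequency.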
Ordinal tier
An ordinal tier is responsible for data written to disk after migrating out of the stream tier. This tier holds data that has been written periodically throughout the day after being flushed from the stream tier. As a result, it is typically referred to as the intra-day database or IDB. Data in this tier is stored on disk in a partitioned table that is sequenced in order of the intervals written throughout the day. There can only be one ordinal tier in a tier configuration and it must always follow the stream tier.
An ordinal tier is configured using a local mount with an ordinal partitioning scheme. How often data is flushed from the stream tier and written to disk in this tier is configurable, with a default of 10 minutes.
mounts:
  rdb:
    type: stream
    partition: none
  idb:
    type: local
    partition: ordinal
sm:
  tiers:
    - name: rdb
      mount: rdb
    - name: idb
      mount: idb
      schedule:
        freq: 0D00:10:00 # interval every 10 minutes
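To make the effect of the freq value concrete, the sketch below parses the nDhh:mm:ss form used in these examples and counts how many write-down intervals fit in a day. The helper names are hypothetical, and the parser handles only the form shown here; the product may accept other formats.

```python
import re
from datetime import timedelta

# Matches the nDhh:mm:ss form used by the freq values above, e.g. 0D00:10:00.
_FREQ = re.compile(r"^(\d+)D(\d{2}):(\d{2}):(\d{2})$")


def parse_freq(text: str) -> timedelta:
    """Convert a freq string such as 0D00:10:00 into a timedelta."""
    match = _FREQ.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised freq value: {text!r}")
    days, hours, minutes, seconds = (int(g) for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def writes_per_day(freq: str) -> int:
    """Number of whole write-down intervals that fit in 24 hours."""
    interval = parse_freq(freq)
    if interval <= timedelta(0):
        raise ValueError("freq must be a positive duration")
    return timedelta(days=1) // interval


print(writes_per_day("0D00:10:00"))  # 144
```

With the default of 10 minutes, the stream tier is flushed 144 times per day; a shorter interval lowers the memory needed by the stream tier at the cost of more frequent writes.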
Date tier
A tier configuration can have one or more date tiers following the ordinal tier. Date tiers hold historical data partitioned by date. Each date tier can be configured to hold data for a set duration before either migrating it to the next tier or discarding it, depending on your data retention requirements. Data is migrated exactly once per day from the ordinal tier to the first date tier. A snap value can be set to configure what time of day the migration happens relative to midnight.
A date tier is configured by using a local mount with a date partitioning scheme. Multiple date tiers can be added sequentially with slower, less-expensive disks as data ages.
Date tiers in object storage
For data that is very old and/or infrequently queried, a date object storage tier can be added. This is configured using a normal date mount, with extra tier configuration to point to the object storage bucket. See object storage tier below for details.
mounts:
  rdb:
    type: stream
    partition: none
    baseURI: none
  idb:
    type: local
    partition: ordinal
    baseURI: file:///data/db/idb
  hdb:
    type: local
    partition: date
    baseURI: file:///data/db/hdb
  hdb2:
    type: local
    partition: date
    baseURI: file:///data/db/hdb2
elements:
  sm:
    source: stream
    tiers:
      - name: rdb
        mount: rdb
      - name: idb
        mount: idb
        schedule:
          freq: 0D00:10:00 # every 10 minutes
      - name: hdb
        mount: hdb
        schedule:
          freq: 1D00:00:00 # every day
          snap: 01:35:00 # at 1:35 AM
        retain:
          time: 2 days
      - name: hdb2
        mount: hdb2
        retain:
          time: 2 weeks
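The ordering rules described in this section (one stream tier first, one ordinal tier second, then one or more date tiers, with each tier's type taken from its mount) can be sketched as a small check. This is an illustrative helper with hypothetical names, not the validation the product itself performs; it assumes the configuration has already been loaded into plain dictionaries.

```python
def tier_kind(mount: dict) -> str:
    """Classify a tier by its mount's type and partitioning scheme."""
    kinds = {
        ("stream", "none"): "stream",
        ("local", "ordinal"): "ordinal",
        ("local", "date"): "date",
    }
    return kinds.get((mount.get("type"), mount.get("partition")), "unknown")


def check_tier_order(mounts: dict, tiers: list) -> list:
    """Return a list of problems; an empty list means the order looks valid."""
    kinds = [tier_kind(mounts.get(t["mount"], {})) for t in tiers]
    problems = []
    if "unknown" in kinds:
        problems.append("a tier refers to a missing or unrecognised mount")
    if kinds[:1] != ["stream"]:
        problems.append("the first tier must be the stream tier")
    if kinds[1:2] != ["ordinal"]:
        problems.append("the second tier must be the ordinal tier")
    if len(kinds) < 3 or any(k != "date" for k in kinds[2:]):
        problems.append("expected one or more date tiers after the ordinal tier")
    return problems


# Mirrors the mounts and tier order of the example above.
mounts = {
    "rdb": {"type": "stream", "partition": "none"},
    "idb": {"type": "local", "partition": "ordinal"},
    "hdb": {"type": "local", "partition": "date"},
    "hdb2": {"type": "local", "partition": "date"},
}
tiers = [{"name": n, "mount": n} for n in ("rdb", "idb", "hdb", "hdb2")]
print(check_tier_order(mounts, tiers))  # []
```

Swapping the first two tiers in the list would report that the stream tier is not first and the ordinal tier is not second.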
Object storage tier
Data that needs to be preserved on very inexpensive storage can be migrated to an object storage tier. Object storage tiers offer the benefit of being endlessly scalable and highly available. An object storage tier is a final destination for data in the tier lifecycle.
Object storage tiers vs mounting an object storage database
An object storage tier is a read-write location and is configured differently than a read-only database. See querying object storage for an example of how to query an existing object storage database.
An object storage tier is a date tier with extra details about the object storage location. To configure an object storage tier, add a store field with the remote data location. To improve query performance, a cache index called an inventory file can be configured.
In the example below, data flows from the normal hdb disk to the odb object storage tier after being on disk for 2 weeks. Both the hdb and the odb tier use the same hdb mount, but the odb tier overrides the store location to point to an AWS S3 location.
mounts:
  rdb:
    type: stream
    partition: none
    baseURI: none
  idb:
    type: local
    partition: ordinal
    baseURI: file:///data/db/idb
  hdb:
    type: local
    partition: date
    baseURI: file:///data/db/hdb
elements:
  sm:
    source: stream
    tiers:
      - name: rdb
        mount: rdb
      - name: idb
        mount: idb
      - name: hdb
        mount: hdb
        retain:
          time: 2 weeks
      - name: odb
        mount: hdb
        store: s3://kxi-example-data/db
        inventory:
          enabled: true
          location: inventory/test-db-inventory.tgz
Data must persist to disk before migrating to object storage
Before data can be migrated to an object storage tier, it must be persisted in a date tier for at least one full day. This implies that the minimum retention period for the HDB tier is 1 day.
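The one-day minimum can be checked before deploying a configuration. The sketch below is a hypothetical helper, not part of the product: it assumes tiers are loaded as dictionaries, that an object storage tier is any tier with a store field, that the tier immediately before it is the date tier in question, and that retain times use the "N days" or "N weeks" form seen in the examples. A tier without a retain setting is skipped, since its default is not stated here.

```python
from datetime import timedelta

# Only the units that appear in the examples are supported (assumption).
_UNITS = {"day": timedelta(days=1), "week": timedelta(weeks=1)}


def parse_retain(text: str) -> timedelta:
    """Convert a retain time such as '2 days' or '2 weeks' to a timedelta."""
    count, unit = text.split()
    return int(count) * _UNITS[unit.rstrip("s")]


def object_tier_problems(tiers: list) -> list:
    """Flag date tiers that retain data for under a day before object storage."""
    problems = []
    for prev, tier in zip(tiers, tiers[1:]):
        if "store" not in tier:
            continue  # not an object storage tier
        retain = prev.get("retain", {}).get("time")
        if retain is not None and parse_retain(retain) < timedelta(days=1):
            problems.append(
                f"tier {prev['name']!r} must retain data for at least 1 day "
                f"before it migrates to {tier['name']!r}"
            )
    return problems


tiers = [
    {"name": "hdb", "mount": "hdb", "retain": {"time": "2 weeks"}},
    {"name": "odb", "mount": "hdb", "store": "s3://kxi-example-data/db"},
]
print(object_tier_problems(tiers))  # []
```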
To learn more about configuring object storage tiers, see the guide on object storage configuration.