Storage Manager configuration
The Storage Manager (SM) takes its configuration from the Assembly Configuration file specified by the KXI_ASSEMBLY_FILE environment variable.
SM expects the following sections to be specified in the assembly:

key | purpose |
---|---|
name | short name for this assembly |
tables | schemas for the tables operated upon within the assembly (dictionary) |
mounts | mount points for stored data (dictionary) |
bus | configuration of the message bus used for coordination between elements (dictionary) |
elements.sm | SM configuration (dictionary), including source and tiers |
URI schemes
mounts[X].baseURI and elements.sm.tiers[N].store accept URIs. These may currently use the file:// or s3:// URI schemes. Other schemes may be supported in the future.
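The following sketch shows how a tool might check which scheme a baseURI or store value uses. It assumes each value is a plain URI string. The helper names uri_scheme and is_object_storage are hypothetical and not part of SM. The example paths are illustrative.

```python
from urllib.parse import urlparse

# The schemes listed in this section; this sketch rejects anything else.
SUPPORTED_SCHEMES = {"file", "s3"}


def uri_scheme(uri: str) -> str:
    """Return the scheme of a baseURI/store value, rejecting unsupported ones."""
    scheme = urlparse(uri).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported URI scheme {scheme!r} in {uri!r}")
    return scheme


def is_object_storage(uri: str) -> bool:
    """True for object-storage URIs (only s3:// among the supported schemes)."""
    return uri_scheme(uri) == "s3"


print(uri_scheme("file:///data/db"))                  # file
print(is_object_storage("s3://kxi-example-data/db"))  # True
```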
SM configuration
Configuration options for SM go in the sm entry of elements:
key | required | purpose | value & default |
---|---|---|---|
source | yes | name of the bus entry | |
tiers | yes | storage tiers | list |
enforceSchema | | whether to enforce table schemas when persisting (incurs a performance penalty; intended for debugging) | boolean, false |
disableREST | | whether to disable the REST interface, leaving only q IPC support | boolean, false |
disableDiscovery | | whether to disable registration with discovery | boolean, false |
chunkSize | | chunk size used for writing tables | integer, 500000 |
sortLimitGB | | memory limit when sorting splayed tables or partitions on disk, in GB | integer, 10 |
waitTm | | time to wait between connection attempts, in milliseconds | integer, 250 |
eodPeachLevel | | level at which EOD uses peach to parallelize HDB table processing | list: part, table, in any combination |
reloadTimeout | | maximum time SM waits for clients to reload | timespan, 1 hour |
See the deployment example for an example configuration.
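The next sketch shows how the defaults in the table combine with a user-supplied elements.sm dictionary. It assumes the config has already been loaded into a Python dict. It also assumes a timespan can be represented as a timedelta. The function effective_sm_config and the bus entry name "south" are hypothetical. eodPeachLevel is left out because the table does not give a single default for it.

```python
from datetime import timedelta

# Defaults copied from the table above (eodPeachLevel omitted: no single default stated).
SM_DEFAULTS = {
    "enforceSchema": False,
    "disableREST": False,
    "disableDiscovery": False,
    "chunkSize": 500000,
    "sortLimitGB": 10,
    "waitTm": 250,                        # milliseconds
    "reloadTimeout": timedelta(hours=1),  # timespan
}
REQUIRED_KEYS = ("source", "tiers")


def effective_sm_config(sm: dict) -> dict:
    """Merge an elements.sm dictionary over the documented defaults."""
    missing = [key for key in REQUIRED_KEYS if key not in sm]
    if missing:
        raise ValueError(f"elements.sm is missing required keys: {missing}")
    # User-supplied values win over defaults.
    return {**SM_DEFAULTS, **sm}


cfg = effective_sm_config({"source": "south", "tiers": [], "chunkSize": 100000})
print(cfg["chunkSize"], cfg["waitTm"])  # 100000 250
```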
Tiers
Tiers describe the locality, segmentation format, and rollover configuration of each storage tier.
A storage tier has the following structure:
key | required | purpose | value & default |
---|---|---|---|
name | yes | name of the tier | |
mount | yes | corresponding mounts entry, which determines locality and segmentation format, and also the location at which data in the tier may be accessed | |
store | | where the tier will physically store data | see below |
inventory | | object storage inventory file location | see below |
schedule | | policy for when rollovers should be considered | see below |
retain | | policy for how much data should be stored in this tier before it is rolled over into the next tier | see below |
compression | | policy for compression of data | see below |
store
URI describing where this tier will physically store data. If not specified, it defaults to <baseURI>/data of the corresponding mount. For mounts of type local with partition: ordinal, this default is enforced even if store is specified. Within a single mount, only one tier may omit an explicit store. If store is specified explicitly, it must be outside the mount's baseURI.

schedule
If present, this dictionary contains the following keys:

freq
: HH:MM:SS. Used by the ordinal partition mount (IDB) to specify the length of the interval in each ordinal partition. Default 00:10:00.

snap
: HH:MM:SS. Used by the date partition mount (HDB) to specify when to move data from the ordinal partition mount to the date partition mount. Default 00:00:00.

For example, a snap value of 00:01:00 works as follows:
- Late data belonging to the previous date partition that arrives between 00:00 and 00:01 is saved to that partition.
- Late data belonging to the previous date partition that arrives after 00:01:00 is written at the next snap.
- Data received between 00:00 and 00:01 that belongs to the current date partition is also saved at this time.
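The following sketch illustrates the freq and snap rules. It computes which IDB interval a timestamp falls into and when late data would next be written to the date partition. It assumes IDB intervals are aligned to midnight. It also assumes that data arriving exactly at the snap time is written at that snap. Neither detail is stated above. The helpers parse_hms, interval_start and next_snap are hypothetical, and parse_hms handles only plain HH:MM:SS values.

```python
from datetime import datetime, timedelta


def parse_hms(value: str) -> timedelta:
    """Parse an HH:MM:SS string such as "00:10:00"."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def interval_start(ts: datetime, freq: str = "00:10:00") -> datetime:
    """Start of the ordinal (IDB) interval containing ts (assumes midnight alignment)."""
    step = parse_hms(freq)
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # Whole number of intervals elapsed since midnight, times the interval length.
    return midnight + ((ts - midnight) // step) * step


def next_snap(arrival: datetime, snap: str = "00:00:00") -> datetime:
    """First snap at or after arrival, i.e. when date-partition data is written."""
    candidate = arrival.replace(hour=0, minute=0, second=0, microsecond=0) + parse_hms(snap)
    if candidate < arrival:
        # Today's snap has already passed, so wait for tomorrow's.
        candidate += timedelta(days=1)
    return candidate


print(interval_start(datetime(2024, 1, 1, 12, 34, 56)))       # 2024-01-01 12:30:00
print(next_snap(datetime(2024, 1, 2, 0, 0, 30), "00:01:00"))  # 2024-01-02 00:01:00
print(next_snap(datetime(2024, 1, 2, 0, 2), "00:01:00"))      # 2024-01-03 00:01:00
```

The last two calls mirror the snap example: data arriving at 00:00:30 is written at 00:01 the same day, while data arriving at 00:02 waits for the following day's snap.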
retain
This dictionary may contain one or more of the following keys:

time
: A timespan consisting of a number followed by a unit (Years, Months, Weeks, Days, Hours, or Minutes), for example 2 Years. Data that has been stored for this length of time is rolled over.

sizePct
: A size as a percentage of the total storage of the corresponding mount, specified as a number from 1 to 100.

If multiple keys are set, they are interpreted in an inclusive-OR fashion: data is rolled over when any of the conditions is met.
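The following sketch shows how a retain.time string could be parsed and combined with sizePct using inclusive OR. Several details are assumptions, not documented SM behavior:
- Data age is measured from a timestamp attached to the data.
- Years and Months use calendar arithmetic, with the day clamped to the end of the month.
- The size condition triggers when usage reaches the threshold.

retention_cutoff, months_before and should_roll_over are hypothetical helpers.

```python
import calendar
from datetime import datetime, timedelta

# Units with a fixed length map directly onto timedelta keyword arguments.
FIXED_UNITS = {"Weeks": "weeks", "Days": "days", "Hours": "hours", "Minutes": "minutes"}


def months_before(dt: datetime, months: int) -> datetime:
    """Move dt back by a number of months, clamping the day (e.g. Mar 31 -> Feb 29)."""
    total = dt.year * 12 + (dt.month - 1) - months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return dt.replace(year=year, month=month_index + 1, day=min(dt.day, last_day))


def retention_cutoff(now: datetime, spec: str) -> datetime:
    """Timestamp at or before which data is old enough to roll over, e.g. spec="2 Years"."""
    count_text, unit = spec.split()
    count = int(count_text)
    if unit == "Years":
        return months_before(now, 12 * count)
    if unit == "Months":
        return months_before(now, count)
    if unit in FIXED_UNITS:
        return now - timedelta(**{FIXED_UNITS[unit]: count})
    raise ValueError(f"unknown retain.time unit {unit!r}")


def should_roll_over(data_time: datetime, now: datetime, used_pct: float, retain: dict) -> bool:
    """Inclusive OR: roll over when any configured retain condition holds."""
    too_old = "time" in retain and data_time <= retention_cutoff(now, retain["time"])
    too_big = "sizePct" in retain and used_pct >= retain["sizePct"]
    return too_old or too_big


print(retention_cutoff(datetime(2024, 3, 31), "1 Months"))  # 2024-02-29 00:00:00
```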
A mount partitioned as ordinal, or of type stream, cannot be used with a storage tier that has a retain policy.

compression
If present, this dictionary contains the following keys:

algorithm
: Compression algorithm: one of none, qipc, gzip, snappy, or lz4hc.

block
: Block size.

level
: Compression level.
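The following sketch turns a compression policy into a (block, algorithm, level) triple. The listed algorithm names appear in the same order as the kdb+ file-compression codes 0 to 4, so the sketch assumes they map onto those codes. That mapping is not stated in this document. The sketch also treats a missing algorithm as none and requires block and level whenever compression is enabled. compression_triple is a hypothetical helper, and the values 17 and 6 are illustrative only.

```python
# Assumption: names map, in the order listed above, to kdb+ compression codes 0-4.
ALGORITHM_CODES = {"none": 0, "qipc": 1, "gzip": 2, "snappy": 3, "lz4hc": 4}


def compression_triple(policy: dict, mount: dict):
    """Return (block, algorithm code, level), or None when no compression applies."""
    # The compression policy only applies to local, date-partitioned mounts.
    if mount.get("type") != "local" or mount.get("partition") != "date":
        return None
    algorithm = policy.get("algorithm", "none")  # defaulting to none is an assumption
    if algorithm not in ALGORITHM_CODES:
        raise ValueError(f"unsupported compression algorithm {algorithm!r}")
    if algorithm == "none":
        return None
    # Requiring block and level here is a simplification for this sketch.
    return (policy["block"], ALGORITHM_CODES[algorithm], policy["level"])


hdb_mount = {"type": "local", "partition": "date"}
print(compression_triple({"algorithm": "gzip", "block": 17, "level": 6}, hdb_mount))  # (17, 2, 6)
```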
The compression policy currently applies only to tiers associated with a mount of type: local and partition: date.

inventory
If present, this dictionary contains the following keys:

enabled
: true or false, to enable inventory files (default false). If true, location must also be provided.

location
: Location, relative to the root of the bucket or storage, to which the inventory is written.

Inventory applies only when the tier's store is an object storage URI.
An example configuration, which produces s3://kxi-example-data/inventory/inventory.tgz, is:

```yaml
name: hdb-s3
mount: hdb
store: s3://kxi-example-data/db
inventory:
  enabled: true
  location: inventory/inventory.tgz
```
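The following sketch shows how an inventory location resolves against the root of the store's bucket. It assumes s3:// is the only object storage scheme, which follows the schemes listed earlier. inventory_uri is a hypothetical helper, not an SM function.

```python
from urllib.parse import urlparse


def inventory_uri(store: str, inventory: dict):
    """Full URI of an enabled inventory file, or None when inventory is disabled."""
    if not inventory.get("enabled", False):
        return None
    if "location" not in inventory:
        raise ValueError("inventory.enabled is true but no location is given")
    parts = urlparse(store)
    if parts.scheme != "s3":
        raise ValueError("inventory only applies to object storage stores")
    # location is relative to the bucket root, not to the store path.
    return f"{parts.scheme}://{parts.netloc}/{inventory['location'].lstrip('/')}"


print(inventory_uri("s3://kxi-example-data/db",
                    {"enabled": True, "location": "inventory/inventory.tgz"}))
# s3://kxi-example-data/inventory/inventory.tgz
```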
Tiers can be categorized by their locality and segmentation format, which determine their characteristics and governing rules:
Stream based tier
The stream-based tier represents the in-memory data received between write-down events. It is implicit and need not be specified.
Local-ordinal based tier
There must always be one tier that corresponds to a mount of type local with partition ordinal. However, its configuration can be omitted, in which case the frequency defaults to 10 minutes.
Local-date based tier
There can be one or more tiers that correspond to a mount of type local with partition date. When only one such tier is used, its configuration can be omitted, in which case the snap time defaults to midnight, the frequency to 1 day, and retention to infinite.
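The following sketch checks a tiers list against the store and retain rules above. It covers:
- the default store of <baseURI>/data, which is enforced for local ordinal mounts;
- at most one tier without store per mount;
- an explicit store that must lie outside baseURI;
- no retain on stream or ordinal mounts.

It assumes mounts are plain dictionaries with type, partition and baseURI keys. Those field names are modeled on the wording above, not taken from a documented schema. resolve_store and validate_tiers are hypothetical helpers. The baseURI test is a plain string-prefix comparison, not full URI normalization.

```python
def default_store(mount: dict) -> str:
    return mount["baseURI"].rstrip("/") + "/data"


def is_local_ordinal(mount: dict) -> bool:
    return mount.get("type") == "local" and mount.get("partition") == "ordinal"


def resolve_store(tier: dict, mounts: dict) -> str:
    """Effective store: <baseURI>/data unless overridden (never for local ordinal)."""
    mount = mounts[tier["mount"]]
    if is_local_ordinal(mount):
        return default_store(mount)  # enforced even if store is given
    return tier.get("store", default_store(mount))


def validate_tiers(tiers: list, mounts: dict) -> list:
    """Return rule violations; an empty list means no rule above was broken."""
    errors = []
    mounts_without_store = set()
    for tier in tiers:
        mount = mounts[tier["mount"]]
        if "store" not in tier:
            if tier["mount"] in mounts_without_store:
                errors.append(f"{tier['name']}: second tier on {tier['mount']} without store")
            mounts_without_store.add(tier["mount"])
        elif not is_local_ordinal(mount):
            base = mount["baseURI"].rstrip("/") + "/"
            if (tier["store"].rstrip("/") + "/").startswith(base):
                errors.append(f"{tier['name']}: store must be outside {mount['baseURI']}")
        if "retain" in tier and (mount.get("type") == "stream" or mount.get("partition") == "ordinal"):
            errors.append(f"{tier['name']}: retain is not allowed on stream or ordinal mounts")
    return errors


# Hypothetical mounts and tiers loosely following the full example later on this page.
mounts = {
    "rdb": {"type": "stream", "baseURI": "file:///data/rdb"},
    "idb": {"type": "local", "partition": "ordinal", "baseURI": "file:///data/idb"},
    "hdb": {"type": "local", "partition": "date", "baseURI": "file:///data/hdb"},
}
tiers = [
    {"name": "streaming", "mount": "rdb"},
    {"name": "interval", "mount": "idb"},
    {"name": "recent", "mount": "hdb", "retain": {"time": "7 Days"}},
    {"name": "s3", "mount": "hdb", "store": "s3://kxi-sm-example/db"},
]
print(validate_tiers(tiers, mounts))  # []
```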
Using Reliable Transport
Register the Storage Manager (SM) with a Reliable Transport (RT)-compatible message bus to receive table updates and to publish the _prtnEnd and _reload signals.
See the deployment example for the configuration, schema, and code to use a tickerplant.
Object Storage Inventory files
The Storage Manager can write inventory files at end of day, or produce them on startup if none exist. The inventory files are used to speed up subsequent reloads of the Storage Manager and Data Access (DA) processes.
To configure the SM to produce these files, set inventory along with store in the tier configuration. See the tiers section above for layout information.
The DA may be configured to use the inventory by setting the KX_OBJSTR_INVENTORY_FILE environment variable to the inventory path, relative to the root of the bucket.
A full configuration of the DA and the SM would look like:
```yaml
sm:
  tiers:
    - name: streaming
      mount: rdb
    - name: interval
      mount: idb
      schedule:
        freq: 01:00:00
        snap: 00:00:00
    - name: recent
      mount: hdb
      schedule:
        freq: 1D00:00:00
        snap: 00:00:00
      retain:
        time: 7 Days
    - name: s3
      mount: hdb
      store: s3://kxi-sm-example/db
      inventory:
        enabled: true
        location: inventory/inventory.tgz
dap:
  instances:
    da:
      env:
        - name: KX_OBJSTR_INVENTORY_FILE
          value: "inventory/inventory.tgz"
```
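A small consistency check can confirm that the DA's KX_OBJSTR_INVENTORY_FILE points at an inventory the SM actually writes. The sketch below mirrors the YAML above as Python dictionaries. Parsing YAML is left out because the usual parser, PyYAML, is not in the standard library. The helper names are hypothetical. The check compares location strings only and ignores which bucket each tier uses.

```python
def enabled_inventory_locations(sm: dict) -> set:
    """Inventory locations of all SM tiers that have inventory enabled."""
    return {
        tier["inventory"]["location"]
        for tier in sm.get("tiers", [])
        if tier.get("inventory", {}).get("enabled", False)
    }


def da_inventory_file(da: dict):
    """Value of KX_OBJSTR_INVENTORY_FILE in a DA env list, or None if unset."""
    for var in da.get("env", []):
        if var.get("name") == "KX_OBJSTR_INVENTORY_FILE":
            return var.get("value")
    return None


def inventory_wiring_ok(sm: dict, da: dict) -> bool:
    """True if the DA sets no inventory file, or one that some SM tier writes."""
    path = da_inventory_file(da)
    return path is None or path in enabled_inventory_locations(sm)


sm = {"tiers": [
    {"name": "recent", "mount": "hdb"},
    {"name": "s3", "mount": "hdb", "store": "s3://kxi-sm-example/db",
     "inventory": {"enabled": True, "location": "inventory/inventory.tgz"}},
]}
da = {"env": [{"name": "KX_OBJSTR_INVENTORY_FILE", "value": "inventory/inventory.tgz"}]}
print(inventory_wiring_ok(sm, da))  # True
```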