Batch ingest
Batch ingest allows you to backfill static data directly into the historical tier of an existing database. Because the data is written straight to disk rather than passing through the in-memory tiers, this is a useful option for reducing memory footprint when importing large amounts of static data. Batch ingest works by replacing the partitions of a given partitioned table with a static copy provided to the database, and is best suited to replacing empty partitions with backfilled data.
Data replacement
Batch ingest replaces existing partitions in the HDB with the new version; any data already in those partitions is overwritten.
Initial import
Batch ingest is similar to an initial import, except that it is intended for an existing database rather than an empty one.
Data organization
For batch ingestion, data must be located in the staging directory in the HDB root. This location is pointed to by the baseURI of the HDB mount. The top-level directory within staging is considered the session name for the ingestion. The content within the directory should be a partitioned database containing only the tables related to the ingestion. Below is an example schema definition and the corresponding directory layout.
Example schema definition
tables:
  trace:
    description: Manufacturing trace data
    type: partitioned
    blockSize: 10000
    prtnCol: updateTS
    sortColsOrd: [sensorID]
    sortColsDisk: [sensorID]
    columns:
      - name: sensorID
        description: Sensor Identifier
        type: int
        attrMem: grouped
        attrOrd: parted
        attrDisk: parted
      - name: readTS
        description: Reading timestamp
        type: timestamp
      - name: captureTS
        description: Capture timestamp
        type: timestamp
      - name: valFloat
        description: Sensor value
        type: float
      - name: qual
        description: Reading quality
        type: byte
      - name: alarm
        description: Enumerated alarm flag
        type: byte
      - name: updateTS
        description: Ingestion timestamp
        type: timestamp
/data/db/hdb/staging/backfill
├── 2023.01.01
│   └── trace
│       ├── alarm
│       ├── qual
│       ├── readTS
│       ├── sensorID
│       ├── updateTS
│       └── valFloat
├── 2023.01.03
│   └── trace
│       ├── alarm
│       ├── qual
│       ├── readTS
│       ├── sensorID
│       ├── updateTS
│       └── valFloat
└── sym
In this scenario, the trace table partitions for the dates 2023.01.01 and 2023.01.03 will be overwritten with the content of these directories when a session is started for the backfill directory.
Running a batch ingest
Batch ingest sessions in kdb Insights Enterprise are managed by the Stream Processor. To perform a batch ingest, use either .qsp.write.toDatabase or the UI writer, as sketched below.
Use a batch source
Batch ingest currently supports writing only from a batch data source, such as Amazon S3, Azure Blob Storage, Google Cloud Storage, or static files.
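As a minimal sketch, the q pipeline below reads a static CSV file of trace records, decodes it against the example schema, and writes it to the database. The file path, the mydb database name, and the use of the directWrite option are illustrative assumptions; consult the .qsp.write.toDatabase documentation for the exact options supported in your deployment.

/ Minimal sketch, not a definitive configuration: the file path, the
/ mydb database name, and the directWrite option are assumptions.
/ Empty table giving the expected column names and types for CSV decoding.
schema:([] sensorID:`int$(); readTS:`timestamp$(); captureTS:`timestamp$();
  valFloat:`float$(); qual:`byte$(); alarm:`byte$(); updateTS:`timestamp$())

.qsp.run
  .qsp.read.fromFile["/data/backfill/trace.csv"]   / batch source: a static file
  .qsp.decode.csv[schema]                          / parse rows into typed columns
  .qsp.write.toDatabase[`trace; `mydb;             / target table and database
    .qsp.use ``directWrite!(::;1b)]                / write straight to the historical tier

Because a direct write lands data straight in the historical tier, memory usage stays flat regardless of the volume being backfilled.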
Cleaning up a batch ingest
If a batch ingestion session completes successfully, the session directory is automatically cleared. If the ingestion fails, the session directory is left on disk for the user to clean up. An error is reported to the client that triggered the ingest, which can help in debugging the failure before another attempt.
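If manual cleanup is needed, removing the failed session directory from the staging area is sufficient. The helper below is a hedged sketch in q, assuming the process has direct filesystem access to the HDB root; rmrf is a hypothetical name, and the path is the example session directory from above.

/ Hedged sketch: recursively delete a failed session directory, assuming
/ direct filesystem access to the HDB staging area. rmrf is hypothetical.
rmrf:{[path]
  if[()~k:key path; :()];                      / path does not exist: nothing to do
  if[11h=type k; .z.s each ` sv/: path,/:k];   / directory: delete contents first
  hdel path }                                  / remove the file or emptied directory

rmrf `:/data/db/hdb/staging/backfill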