Batch ingest allows you to backfill static data directly into the historical tier of an existing database. This is a useful option for reducing memory footprint when importing large amounts of static data. Batch ingest works by replacing partitions of a given partitioned table with a static copy provided to the database; each affected partition in the HDB is replaced with the new version, so this is best used for filling empty partitions with backfilled data.

Batch ingest is similar to an initial import, except that it is intended for an existing database rather than an empty one.
For batch ingestion, data must be located in the staging directory in the HDB root. This location is pointed to by the `baseURI` of the HDB mount. The name of the top-level directory within staging is used as the session name for the ingestion. The content of that directory should be a partitioned database containing only the tables related to the ingestion. An example schema definition and directory layout are shown below.
Example schema definition:

```yaml
tables:
  trace:
    description: Manufacturing trace data
    type: partitioned
    blockSize: 10000
    prtnCol: updateTS
    sortColsOrd: [sensorID]
    sortColsDisk: [sensorID]
    columns:
      - name: sensorID
        description: Sensor Identifier
        type: int
        attrMem: grouped
        attrOrd: parted
        attrDisk: parted
      - name: readTS
        description: Reading timestamp
        type: timestamp
      - name: captureTS
        description: Capture timestamp
        type: timestamp
      - name: valFloat
        description: Sensor value
        type: float
      - name: qual
        description: Reading quality
        type: byte
      - name: alarm
        description: Enumerated alarm flag
        type: byte
      - name: updateTS
        description: Ingestion timestamp
        type: timestamp
```
Example directory layout:

```
/data/db/hdb/staging/backfill
├── 2023.01.01
│   └── trace
│       ├── alarm
│       ├── captureTS
│       ├── qual
│       ├── readTS
│       ├── sensorID
│       ├── updateTS
│       └── valFloat
├── 2023.01.03
│   └── trace
│       ├── alarm
│       ├── captureTS
│       ├── qual
│       ├── readTS
│       ├── sensorID
│       ├── updateTS
│       └── valFloat
└── sym
```
In this scenario, the table `trace` for the dates `2023.01.01` and `2023.01.03` will be overwritten with the content of these directories when a session is started for the session name `backfill`.
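The staged content is an ordinary partitioned kdb+ database, so it can be produced with standard q tooling. Below is a minimal, hypothetical sketch that writes one date partition of `trace` into the `backfill` session directory with `.Q.dpft`; the paths, row count, and generated values are placeholders, not part of any batch ingest API.

```q
/ Hypothetical sketch: stage one date partition of `trace for ingestion.
/ The session directory name (backfill) becomes the ingestion session name.
dir:`:/data/db/hdb/staging/backfill;
n:10000;                       / placeholder row count
ts:.z.p+til n;                 / placeholder timestamps
trace:([]
  sensorID:n?100i;             / placeholder sensor identifiers
  readTS:ts;
  captureTS:ts;
  valFloat:n?100f;
  qual:`byte$n?16;             / placeholder quality flags
  alarm:`byte$n?2;             / placeholder alarm flags
  updateTS:ts);
/ Write the table splayed to dir/2023.01.01/trace: .Q.dpft sorts by the
/ given field, applies the parted attribute, and enumerates any symbol
/ columns against dir/sym.
.Q.dpft[dir;2023.01.01;`sensorID;`trace];
```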
## Running a batch ingest
**Use a batch source**

Batch ingest currently supports only writing with a batch data source, such as reading from Amazon S3, Azure Blob Storage, Google Cloud Storage, or static files.
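For example, with static files as the source, a CSV extract could be loaded and staged in the same way. This is a hypothetical sketch: the file path is illustrative, and the parse string `"ippfxxp"` follows the example schema (int, timestamp, timestamp, float, byte, byte, timestamp).

```q
/ Hypothetical sketch: load a static CSV extract of trace data and stage it.
/ The file path and layout are illustrative; the byte columns (qual, alarm)
/ are expected as hex digits in the file.
trace:("ippfxxp";enlist csv) 0: `:/tmp/extracts/trace-2023.01.03.csv;
.Q.dpft[`:/data/db/hdb/staging/backfill;2023.01.03;`sensorID;`trace];
```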
## Cleaning up a batch ingest
If a batch ingestion session completes successfully, the session directory is automatically cleared. If the ingestion fails, the session directory is left on disk for the user to clean up manually. An error is reported to the client that triggered the ingest, which can be used to debug the failure before attempting the ingest again.
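As a sketch of that manual cleanup, a leftover session directory can be removed recursively from q using the common recursive-listing idiom below. The path is the example session directory from above; this helper is an assumption for illustration, not part of the batch ingest API.

```q
/ Hypothetical cleanup sketch for a failed session.
/ diR lists a directory tree recursively; deleting the paths in descending
/ order removes files before the directories that contain them.
diR:{$[11h=type d:key x;raze x,.z.s each ` sv' x,'d;d]};
hdel each desc diR `:/data/db/hdb/staging/backfill;
```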