Ingest Historic OneTick US Comp Data
This page describes how the otdataloader pipeline ingests historic OneTick US Comp data from an AWS S3 bucket into kdb Insights Enterprise.
Prerequisites
The otdataloader pipeline ingests OneTick data from an AWS S3 bucket managed by KX and OneTick. Confirm that your environment can access this bucket before running the pipeline.
The bucket has the following structure:
$ aws s3 ls s3://${BUCKET}/${PREFIX}/
PRE hdb/
PRE status/
2026-06-11 17:23:03 1314676 sym
The bucket contains the following directories and files:
- `status/` directory: contains text files that signal when a day of data is ready for ingestion by the `otdataloader` pipeline.
  - Files follow the naming convention `finished_YYYY_MM_DD.txt`, where `YYYY_MM_DD` represents the year, month, and day of the data ready to ingest.
  - The `otdataloader` pipeline monitors this directory and checks for new data every 20 minutes.
- `hdb/` directory: contains OneTick US Comp data in kdb date-partitioned format. The `otdataloader` pipeline ingests this data when triggered by files in the `status/` directory.
- `sym` file: contains all sym enumerations for the data in the `hdb/` directory. The `otdataloader` pipeline copies this file alongside the date directory when triggered by files in the `status/` directory.
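To see which days are already flagged as ready, the `status/` listing can be filtered by the naming convention above. The following is a minimal sketch, not part of the pipeline: `ready_dates` is a hypothetical helper, and printing the dates in dotted `YYYY.MM.DD` form is an assumption made to match the `START_DATE` format used on this page.

```shell
# Hypothetical helper (not shipped with the otdataloader pipeline).
# Reads `aws s3 ls` output on stdin and prints one YYYY.MM.DD date
# for every file named finished_YYYY_MM_DD.txt.
ready_dates() {
  # The fourth field of an `aws s3 ls` object line is the file name.
  awk '{print $4}' | while IFS= read -r name; do
    case "$name" in
      finished_[0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9].txt)
        d=${name#finished_}   # strip the "finished_" prefix
        d=${d%.txt}           # strip the ".txt" suffix
        printf '%s\n' "$d" | tr '_' '.'
        ;;
    esac
  done
}

# Example usage (requires access to the bucket):
#   aws s3 ls "s3://${BUCKET}/${PREFIX}/status/" | ready_dates
```

Lines that do not match the convention, such as `PRE` directory entries or unrelated files, are skipped.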
Pipeline options
Several environment variables control the behavior of the otdataloader pipeline. Set these at runtime using the CLI. For details, see Inject environment variables.
| Environment Variable | Required | Purpose | Notes |
|---|---|---|---|
| `OT_DATA_LOADER_S3_URI` | yes | Path to the bucket containing OneTick data, which is managed by KX and OneTick. | Format: `s3://<BUCKET_NAME>/<PREFIX>/`. Use the full URI described in the Prerequisites section. |
| `OT_DATA_LOADER_REGION` | yes | AWS region of the bucket in `OT_DATA_LOADER_S3_URI`. | |
| `AWS_ACCESS_KEY_ID` | yes | Access key ID that allows the `otdataloader` pipeline to access `OT_DATA_LOADER_S3_URI`. | |
| `AWS_SECRET_ACCESS_KEY` | yes | Secret access key that allows the `otdataloader` pipeline to access `OT_DATA_LOADER_S3_URI`. | |
| `AWS_SESSION_TOKEN` | no | Session token that allows the `otdataloader` pipeline to access `OT_DATA_LOADER_S3_URI`. | May not be required when using long-lived `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` credentials. |
| `START_DATE` | no | The date from which the pipeline begins ingesting partitions from `OT_DATA_LOADER_S3_URI`. | Format: `YYYY.MM.DD`. |
| `DATE_LIMIT` | no | Limits how many dates can be ingested in parallel. | Defaults to 1. |
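Because a malformed value is only discovered after deployment, it can help to check the two documented formats locally first. The sketch below is an assumption-laden illustration: `valid_s3_uri` and `valid_start_date` are hypothetical helpers, not part of the `kxi` CLI, and they check only the shape of the value (including the trailing slash shown in the documented URI format), not whether the bucket or date actually exists.

```shell
# Hypothetical pre-flight checks; these are not kxi CLI features.

# Succeeds when the value looks like s3://<BUCKET_NAME>/<PREFIX>/
valid_s3_uri() {
  case "$1" in
    s3://?*/?*/) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds when the value looks like YYYY.MM.DD (digits only, dot-separated).
valid_start_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage:
#   valid_s3_uri "$OT_DATA_LOADER_S3_URI" || echo "unexpected S3 URI format" >&2
#   valid_start_date "$START_DATE"        || echo "unexpected START_DATE format" >&2
```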
Example
The following commands set every option and then start the otdataloader pipeline:
# Set the environment variables used by the pipeline
OT_DATA_LOADER_S3_URI='s3://<BUCKET_NAME>/<PREFIX>/'
OT_DATA_LOADER_REGION='<INSERT_REGION>'
START_DATE='<INSERT_START_DATE_FOR_SP_INGESTION_FROM_S3>'
DATE_LIMIT=2
AWS_ACCESS_KEY_ID='<INSERT_AWS_ACCESS_KEY_ID>'
AWS_SECRET_ACCESS_KEY='<INSERT_AWS_SECRET_ACCESS_KEY>'
AWS_SESSION_TOKEN='<INSERT_AWS_SESSION_TOKEN>'
# Note: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN` can be set in one step with `eval $(aws configure export-credentials --format env)`
# Deploy the SP pipeline, passing each variable with its own --env flag
kxi pm deploy fsi-app-ot-uscomp --pipeline otdataloader \
  --env "otdataloader:OT_DATA_LOADER_S3_URI=$OT_DATA_LOADER_S3_URI" \
  --env "otdataloader:OT_DATA_LOADER_REGION=$OT_DATA_LOADER_REGION" \
  --env "otdataloader:AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID" \
  --env "otdataloader:AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY" \
  --env "otdataloader:AWS_SESSION_TOKEN=$AWS_SESSION_TOKEN" \
  --env "otdataloader:START_DATE=$START_DATE" \
  --env "otdataloader:DATE_LIMIT=$DATE_LIMIT"
Omit any --env flags for variables you do not need.
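One way to avoid passing empty values is to generate the flags only for variables that are actually set. The following is a rough sketch under that assumption; `build_env_flags` is a hypothetical helper, not a `kxi` feature, and it simply prints the flags in the same `--env otdataloader:NAME=value` form used in the deploy command above.

```shell
# Hypothetical helper: print one "--env otdataloader:NAME=value" line
# for each pipeline variable that is set and non-empty.
build_env_flags() {
  for var in OT_DATA_LOADER_S3_URI OT_DATA_LOADER_REGION \
             AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN \
             START_DATE DATE_LIMIT; do
    # Look up the value of the variable whose name is stored in $var.
    eval "value=\${$var:-}"
    if [ -n "$value" ]; then
      printf '%s %s\n' '--env' "otdataloader:$var=$value"
    fi
  done
}

# Example usage: review the flags before adding them to the deploy command.
#   build_env_flags
```

Printing the flags first makes it easy to confirm that optional variables such as `AWS_SESSION_TOKEN`, `START_DATE`, and `DATE_LIMIT` are left out when they are unset.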
Next steps
- If you have not yet deployed the Accelerator, follow the Quickstart Guide.
- To ingest real-time data, see the Realtime Pipeline documentation.
- Review the release notes for the latest updates.