Ingest Historic OneTick US Comp Data

This page describes how the otdataloader pipeline ingests historic OneTick US Comp data from an AWS S3 bucket into kdb Insights Enterprise.

Prerequisites

The otdataloader pipeline ingests OneTick data from an AWS S3 bucket managed by KX and OneTick. Confirm that your environment can access this bucket before running the pipeline.

The bucket has the following structure:

$ aws s3 ls s3://${BUCKET}/${PREFIX}/
                           PRE hdb/
                           PRE status/
2026-06-11 17:23:03    1314676 sym

The bucket contains the following directories and files:

- status/ directory: contains text files that signal when a day of data is ready for ingestion by the otdataloader pipeline.
  - Files follow the naming convention finished_YYYY_MM_DD.txt, where YYYY_MM_DD is the year, month, and day of the data that is ready to ingest.
  - The otdataloader pipeline monitors this directory and checks for new data every 20 minutes.
- hdb/ directory: contains OneTick US Comp data in kdb date-partitioned format. The otdataloader pipeline ingests this data when triggered by files in the status/ directory.
- sym file: contains all sym enumerations for the data in the hdb/ directory. The otdataloader pipeline copies this file alongside the date directory when triggered by files in the status/ directory.
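As a small illustration of the naming convention above, the following sketch converts a status file name into a dotted date, the same YYYY.MM.DD form that START_DATE uses. The function name `status_to_date` is hypothetical and is not part of the otdataloader pipeline. The sketch also assumes that the hdb/ partitions use the usual kdb+ YYYY.MM.DD directory naming, which this page does not state explicitly.

```shell
# Hypothetical helper, for illustration only.
# Maps "finished_YYYY_MM_DD.txt" (with or without a leading path) to "YYYY.MM.DD".
status_to_date() {
  std_name=${1##*/}                # strip any leading path such as "status/"
  std_name=${std_name#finished_}   # drop the "finished_" prefix
  std_name=${std_name%.txt}        # drop the ".txt" extension
  printf '%s\n' "$std_name" | tr '_' '.'
}

status_to_date "status/finished_2026_06_11.txt"   # prints 2026.06.11
```

The helper only reformats the name. It does not check that the result is a valid calendar date or that the corresponding partition exists in the bucket.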

Pipeline options

Several environment variables control the behavior of the otdataloader pipeline. Set these at runtime using the CLI. For details, see Inject environment variables.

Environment Variable	Required	Purpose	Notes
`OT_DATA_LOADER_S3_URI`	yes	Path to the bucket, managed by KX and OneTick, that contains the OneTick data.	Format: `s3://<BUCKET_NAME>/<PREFIX>/`. Use the full URI described in the Prerequisites section.
`OT_DATA_LOADER_REGION`	yes	AWS region of the bucket in `OT_DATA_LOADER_S3_URI`.
`AWS_ACCESS_KEY_ID`	yes	Access key ID that allows the `otdataloader` pipeline to access `OT_DATA_LOADER_S3_URI`.
`AWS_SECRET_ACCESS_KEY`	yes	Secret access key that allows the `otdataloader` pipeline to access `OT_DATA_LOADER_S3_URI`.
`AWS_SESSION_TOKEN`	no	Session token that allows the `otdataloader` pipeline to access `OT_DATA_LOADER_S3_URI`.	May not be required when using long-lived `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` credentials.
`START_DATE`	no	Date from which the pipeline begins ingesting partitions from `OT_DATA_LOADER_S3_URI`. If not set, ingestion starts at the earliest available date.	Format: `YYYY.MM.DD`.
`DATE_LIMIT`	no	Limits how many dates can be ingested in parallel. If not set, the pipeline attempts to ingest all dates in parallel.
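Because a missing or badly formatted variable only shows up once the pipeline is running, it can help to check the values locally first. The following is a minimal sketch of such a check, based only on the table above. The function name `validate_ot_env` is hypothetical and is not part of the kxi CLI or the pipeline, and treating `DATE_LIMIT` as a positive integer is an assumption drawn from the example value rather than a documented rule.

```shell
# Hypothetical pre-flight check; not part of the kxi CLI or the pipeline.
# Returns 0 when the variables look usable, 1 otherwise.
validate_ot_env() {
  ot_status=0

  # Variables marked "yes" in the Required column.
  for ot_var in OT_DATA_LOADER_S3_URI OT_DATA_LOADER_REGION \
                AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    eval "ot_value=\${$ot_var:-}"
    if [ -z "$ot_value" ]; then
      echo "missing required variable: $ot_var" >&2
      ot_status=1
    fi
  done

  # The URI is documented as s3://<BUCKET_NAME>/<PREFIX>/.
  case "${OT_DATA_LOADER_S3_URI:-}" in
    ''|s3://?*) ;;   # an empty value was already reported above
    *) echo "OT_DATA_LOADER_S3_URI should start with s3://" >&2
       ot_status=1 ;;
  esac

  # START_DATE is optional, but must be YYYY.MM.DD when present.
  if [ -n "${START_DATE:-}" ]; then
    case "$START_DATE" in
      [0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]) ;;
      *) echo "START_DATE should use the format YYYY.MM.DD" >&2
         ot_status=1 ;;
    esac
  fi

  # DATE_LIMIT is optional; a positive integer is assumed here.
  if [ -n "${DATE_LIMIT:-}" ]; then
    case "$DATE_LIMIT" in
      *[!0-9]*|0) echo "DATE_LIMIT should be a positive integer" >&2
                  ot_status=1 ;;
    esac
  fi

  return "$ot_status"
}
```

After setting the variables, run `validate_ot_env && echo "environment looks complete"`. The check covers shape only: it does not verify that the credentials are valid or that the bucket is reachable.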

Example

The following commands set the environment variables and then start the otdataloader pipeline with all options set:

# First, set the environment variables used by the pipeline
OT_DATA_LOADER_S3_URI='s3://<BUCKET_NAME>/<PREFIX>/'
OT_DATA_LOADER_REGION=<INSERT_REGION>
START_DATE=<INSERT_START_DATE_FOR_SP_INGESTION_FROM_S3>
DATE_LIMIT=2
AWS_ACCESS_KEY_ID=<INSERT_AWS_ACCESS_KEY_ID>
AWS_SECRET_ACCESS_KEY=<INSERT_AWS_SECRET_ACCESS_KEY>
AWS_SESSION_TOKEN=<INSERT_AWS_SESSION_TOKEN>
# Note: Variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN` can be quickly set using the command `eval $(aws configure export-credentials --format env)`

# Command to deploy the SP pipeline
kxi pm deploy fsi-app-ot-uscomp --pipeline otdataloader \
  --env otdataloader:OT_DATA_LOADER_S3_URI="$OT_DATA_LOADER_S3_URI" \
  --env otdataloader:OT_DATA_LOADER_REGION="$OT_DATA_LOADER_REGION" \
  --env otdataloader:AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
  --env otdataloader:AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
  --env otdataloader:AWS_SESSION_TOKEN="$AWS_SESSION_TOKEN" \
  --env otdataloader:START_DATE="$START_DATE" \
  --env otdataloader:DATE_LIMIT="$DATE_LIMIT"

Omit any --env flags for variables you do not need.
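One way to omit unused flags without editing the command by hand is to build the argument list from whichever variables are set. The sketch below is illustrative only: the function name `otdataloader_deploy_cmd` is hypothetical, and the function prints the arguments for review instead of running them. It uses only the `kxi pm deploy` command and `--env` flags shown in the example above.

```shell
# Hypothetical dry-run helper: prints the deploy command, one argument per line,
# adding an --env flag only for variables that are set and non-empty.
otdataloader_deploy_cmd() {
  set -- kxi pm deploy fsi-app-ot-uscomp --pipeline otdataloader

  for ot_var in OT_DATA_LOADER_S3_URI OT_DATA_LOADER_REGION \
                AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN \
                START_DATE DATE_LIMIT; do
    eval "ot_value=\${$ot_var:-}"
    if [ -n "$ot_value" ]; then
      set -- "$@" --env "otdataloader:${ot_var}=${ot_value}"
    fi
  done

  # Print only. To deploy for real, replace this line with: "$@"
  printf '%s\n' "$@"
}
```

Calling `otdataloader_deploy_cmd` with `AWS_SESSION_TOKEN`, `START_DATE`, and `DATE_LIMIT` unset prints the command with only the four required `--env` flags. The output includes credential values, so avoid running it where terminal output is logged.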

Next steps

If you have not yet deployed the Accelerator, follow the Quickstart Guide.
To ingest real-time data, see the Realtime Pipeline documentation.
Review the release notes for the latest updates.