Import Data into KDB-X DB Service

This page introduces the data import capabilities of KDB-X DB Service. It explains the available ingestion methods, how import jobs are processed, and how to choose the appropriate approach for batch, inline, or database-based ingestion workflows.

The KDB-X DB Service supports several data ingest methods.

  • File import, via API: batch ingestion from delimited text (CSV), Parquet, or q binary. Use this for backfills or scheduled imports from files.

  • kdb database import, via API: ingests an existing kdb HDB. Use this for migrating partitioned historical data from an existing kdb database, or when you have complex transform needs on large datasets. This is the most performant import method, as it performs the least processing of the data.

  • JSON data import, via API: inserts rows directly via JSON request. Use this when inserting small volumes of data, integrating with applications, or bootstrapping a dataset. Do not use it for large batches of data or high-frequency streaming imports; kdb database import and streaming ingest are more performant.

  • Streaming ingest, via Reliable Transport (RT): high-performance streaming ingest for continuous real-time data feeds.

A detailed reference, including examples, for all APIs is available in the API spec. Sample notebooks demonstrating API usage are bundled with the DB Service repo on GitHub.

Timezones

All timestamps in the KDB-X DB Service are treated as UTC. Timestamps can, however, be queried and converted to other timezones.
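
As a purely client-side illustration (standard Python only; no DB Service API involved), a UTC timestamp returned by a query can be converted for display in another timezone:

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A timestamp as stored by the DB Service: UTC.
ts_utc = datetime(2026, 1, 21, 10, 0, 0, tzinfo=timezone.utc)

# Convert for display in another timezone; the stored value is unchanged.
ts_nyc = ts_utc.astimezone(ZoneInfo("America/New_York"))
print(ts_nyc)  # 2026-01-21 05:00:00-05:00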

Job control

All import requests are asynchronous and return a jobId. The jobId can be used to check the status of an import job, or to cancel it.
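
As a hedged sketch of that workflow (the status path and response fields below are assumptions; consult the API spec for the real endpoint):

import time
import requests

BASE = "http://localhost:8080"

# Submit an import; the response carries a jobId.
resp = requests.post(f"{BASE}/api/v0/imports/files",
                     json={"table": "fxquote", "path": "fxquote.csv.gz"})
job_id = resp.json()["jobId"]

# Poll until the job reaches a terminal state. NOTE: the status path and
# the "state" field are hypothetical; use those documented in the API spec.
while True:
    status = requests.get(f"{BASE}/api/v0/imports/{job_id}").json()
    if status.get("state") in ("completed", "failed", "cancelled"):
        break
    time.sleep(1)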

File import

The file import endpoint is POST /api/v0/imports/files. The API supports delimited text (CSV), Parquet, and q binary (as a single table in a file). Files are processed as follows:

  1. Data is read and parsed from the specified files in ./data/imports. Import paths support globbing (a globbed import is sketched after the paragraphs below):
    • * matches within a path; for example, January/USD*.csv
    • ** matches across subdirectories; for example, forex/**/quote*.csv
  2. Transformations specified in the postparse dictionary are applied (delimited text only). Postparse applies basic q expressions to columns, and can:
    • Modify values of existing columns; for example, to handle a tricky timestamp format
    • Create entirely new derived columns
    • Reference values from one or more other columns
    • Include the filename as a value in the expression
  3. Columns specified for include are carried forward.

Data is read in chunks, and processed data is written to an intermediate location. Once all specified files are processed, the intermediate data is efficiently merged into the database.

If a target table does not exist, it can be created by setting the createTable flag on API imports. For production use, it is recommended to create tables explicitly via the API to maintain full control over table configuration.

For date partitions that already exist, the mode setting controls how new data is handled for that partition: either merge or overwrite.
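
As a brief illustrative sketch combining globbing with these settings (the glob path and field values are hypothetical; the client call mirrors the example below):

# Illustrative only: import every quote CSV under forex/, merging into
# any date partitions that already exist, and creating the table if needed.
session.import_files(
    table="fxquote",
    path="forex/**/quote*.csv",  # ** matches across subdirectories
    mode="merge",                # merge into existing date partitions
    createTable=True,
)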

Delimited text (CSV) import options

Several optional parameters are available to control parsing of delimited text:

  • delimiter: character that separates fields in the file. Default is a comma ","
  • decimal: character that is the decimal separator. Default is a period "."
  • header: list of strings that specify (or override) column names for the file.
  • headerRowIndex: index (from 0) at which to start reading the header. -1 indicates no header; values greater than 0 skip initial rows (for example, comments).
  • types: data types string, in Tok format (uppercase); for example, "DPSFF" for date, timestamp, symbol, float, and float columns.

Example file import

An example CSV import that uses multiple features at once:

// q
session.importFiles([
  table:"fxquote";
  path:"fxquote.csv.gz";
  delimiter:",";
  decimal:".";
  header:("trddate";"ts";"sym";"bid";"ask");
  types:"DPSFF";
  postparse:([sprd:"(data`ask)-data`bid";
              bid:"data`bid";
              ask:"data`ask"]);
  include:("trddate";"ts";"sym";"bid";"ask";"sprd");
  createTable:1b]);

# Python
session.import_files(
  table="fxquote",
  path="fxquote.csv.gz",
  delimiter=",",
  decimal=".",
  header=["trddate","ts","sym","bid","ask"],
  types="DPSFF",
  postparse={"sprd":"(data`ask)-data`bid",
          "bid":"data`bid",
          "ask":"data`ask"},
  include=["trddate","ts","sym","bid","ask","sprd"],
  createTable=True);
curl -s -X POST "http://localhost:8080/api/v1/imports/files" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{"table":"fxquote",
      "path":"fxquote.csv.gz",
      "delimiter":",",
      "decimal":".",
      "header":["trddate","ts","sym","bid","ask"],
      "types":"DPSFF",
      "postparse":{"sprd":"(data`ask)-data`bid",
                  "bid":"data`bid",
                  "ask":"data`ask"},
      "include":["trddate","ts","sym","bid","ask","sprd"],
      "createTable":true}'

kdb database import

The kdb database import endpoint is POST /api/v0/imports/kdb. The API allows you to import an existing partitioned kdb HDB, or splayed kdb tables. This is particularly useful for migrating historical databases from another kdb+ system.

As the most efficient method for merging data into the KDB-X DB Service, it is also useful for large batch imports, or for batch data that requires more complex transformations than the file import API supports. In these cases, data can be persisted to a kdb HDB as the final step of an extract/transform pipeline, and that HDB can then be loaded using the kdb database API.

Partitioned HDBs must be partitioned by date on a timestamp column.
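
For illustration, the final step of an extract/transform pipeline might persist an in-memory table as a date-partitioned HDB, sketched here via PyKX (using PyKX is an assumption; any q process can do the same, and the table contents, path, and date are hypothetical):

import pykx as kx

# Build (or receive from a transform step) an in-memory q table.
kx.q['fxquote'] = kx.q('([] ts:2026.01.21D10:00:00 2026.01.21D10:00:01; sym:`EURUSD`GBPUSD; bid:1.16 1.34; ask:1.161 1.341)')

# Persist as the 2026.01.21 date partition of an HDB at ./fxquote-hdb,
# enumerating symbols and applying the parted attribute on sym.
kx.q('.Q.dpft[`:fxquote-hdb; 2026.01.21; `sym; `fxquote]')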

Example kdb database import

Below is an example kdb HDB import:

// q
session.importKDB([table:"fxquote";path:"fxquote-hdb";mode:"overwrite"])

# Python
session.import_database(table="fxquote", path="fxquote-hdb", mode="overwrite")
curl -s -X POST "http://localhost:8080/api/v0/imports/kdb" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
      "table": "fxquote",
      "path": "fxquote-hdb",
      "mode": "merge"}'

JSON data import

JSON import vs. Streaming

Importing data via the JSON API is not efficient for large batch loads (use file or kdb database import instead), or for streaming real-time data (use streaming ingest instead). It is best suited to small, infrequent updates, or to convenient ad-hoc manual imports.

The JSON import endpoint is POST /api/v0/imports/data. The API allows you to import JSON data directly into the database. Data can be sent in two different formats:

  • Rows as lists, matching the table schema:
    ["2026-01-21", "2026-01-21T10:00:00.000", "EURUSD", 901.2, 901.3],
    ["2026-01-21", "2026-01-21T10:00:00.000", "EURUSD", 901.2, 901.3]
    
  • Objects with explicitly named fields:
    {"instrumentid": 77, "sym": "USDBRL", "category": "EM", "decimals": 4, "pipdecimals": 4},
    {"instrumentid": 78, "sym": "USDKRW", "category": "EM", "decimals": 2, "pipdecimals": 2}
    

As with CSV import, additional options are available to help resolve column names and data types:

  • columnNames: specifies column names for row payloads.
  • types: data types string, in cast format (lowercase); for example, "dpsff" for date, timestamp, symbol, float, and float columns.
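
As a small hedged sketch (whether types combines with columnNames exactly like this is an assumption; see the API spec), string values in a row payload could be cast explicitly:

# Illustrative only: values in the row payload are cast per the types string.
session.import_data(
    table="fxquote",
    data=[["2026-01-21", "2026-01-21T10:00:00.000", "EURUSD", "901.2", "901.3"]],
    columnNames=["trddate", "ts", "sym", "bid", "ask"],
    types="dpsff",  # d=date, p=timestamp, s=symbol, f=float
    insert_as="rows",
)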

Example JSON data import

Below are example JSON data imports, using both the row and object formats:

// q
// Rows payload imported to the 'fxquote' table. columnNames are required for row inserts.
data:(("2026-01-21";"2026-01-21T10:00:00.000";"EURUSD";901.2;901.3);("2026-01-21";"2026-01-21T10:00:00.000";"EURUSD";901.2;901.3));
session.importData[`table`data`columnNames!(`fxquote;data;(`trddate`ts`sym`bid`ask))]

// Objects payload imported to 'instruments' table
data:((`instrumentid`sym`category`decimals`pipdecimals)!(77;"USDBRL";"EM";4;4);(`instrumentid`sym`category`decimals`pipdecimals)!(78;"USDKRW";"EM";2;2));
session.importData[`table`data!(`instruments;data)]

# Python
# Rows payload imported to the 'fxquote' table. columnNames are required for row inserts.
session.import_data(
    table="fxquote",
    data=[
        ["2026-01-21", "2026-01-21T10:00:00.000", "EURUSD", 901.2, 901.3],
        ["2026-01-21", "2026-01-21T10:00:00.000", "EURUSD", 901.2, 901.3],
    ],
    columnNames=["trddate", "ts", "sym", "bid", "ask"],
    insert_as="rows",
)

# Objects payload imported to 'instruments' table
session.import_data(
    table="instruments",
    data=[
        {"instrumentid": 77, "sym": "USDBRL", "category": "EM", "decimals": 4, "pipdecimals": 4},
        {"instrumentid": 78, "sym": "USDKRW", "category": "EM", "decimals": 2, "pipdecimals": 2},
    ],
    insert_as="objects",
)

# curl
# Rows payload imported to the 'fxquote' table. columnNames are required for row inserts.
curl -s -X POST "http://localhost:8080/api/v0/imports/data" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{
    "table": "fxquote",
    "data": [
    ["2026-01-21", "2026-01-21T10:00:00.000", "EURUSD", 901.2, 901.3],
    ["2026-01-21", "2026-01-21T10:00:00.000", "EURUSD", 901.2, 901.3]
    ],
    "columnNames": ["trddate", "ts", "sym", "bid", "ask"],
    "insert_as": "rows"
}'

# Objects payload imported to 'instruments' table
curl -s -X POST "http://localhost:8080/api/v1/imports/data" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-d '{
    "table": "instruments",
    "data": [
    {"instrumentid": 77, "sym": "USDBRL", "category": "EM", "decimals": 4, "pipdecimals": 4},
    {"instrumentid": 78, "sym": "USDKRW", "category": "EM", "decimals": 2, "pipdecimals": 2}
    ],
    "insert_as": "objects"
}'

Streaming ingest

Data can be streamed into the database using Reliable Transport (RT), a high-performance data ingest system. The current single-node version of the DB Service incorporates a single-node deployment of RT. Updates should be streamed to the data topic, by default on port 5002.

Example data feed

An example Python feed is included with the DB Service, reproduced below.

from rtpy import rt_helper
from datetime import datetime, timezone
from random import random, choice
import time
import os

print("Starting feed...")

# Instruments, each with a starting mid price and decimal precision.
symlist = {'EURUSD':[1.16,4],'GBPUSD':[1.34,4],'USDJPY':[158,2]}

# Connect to RT using the config file alongside this script.
cfg_path = 'file://' + os.path.dirname(os.path.realpath(__file__)) + '/rtconfig.json'
params = rt_helper.RTParams(config_url=cfg_path, console_log_level='error')
h, status_code = rt_helper.start(params)

# Publish a random-walk quote for each symbol, once per second.
while True:
    for sym in symlist:
        mid,dec = symlist[sym]
        newmid = round(mid + choice([-1,1]) * random()/10**(dec), dec)
        spread = round(random()/10**(dec-1),dec)
        now = datetime.now(timezone.utc)
        price = [{'trddate':now.date(),
                  'ts':now,
                  'sym': sym,
                  'bid': round(newmid-spread/2,dec+1),
                  'ask': round(newmid+spread/2,dec+1)}]
        symlist[sym][0] = newmid
        rt_helper.insert(h, 'fxquote', price)
    time.sleep(1)

Along with its rtconfig.json:

{
  "name": "dbs-fxpub",
  "useSslRt": false,
  "topics": { "insert": "data" },
  "insert": { "insert": [":localhost:5002"] }
}

RT interfaces are available for C/C++, Java, q, C#, and Python. Most production feed solutions use the C/C++ or Java SDKs.

Next steps