Databases in KDB-X Python
This page explains what a database is in KDB-X Python and how databases are created and managed.
What's a KDB-X Python database?
In KDB-X Python, the term database refers to a KDB-X database, which can hold a set of splayed and partitioned tables.
Splayed Database
A splayed KDB-X database consists of a single table stored on disk, with each column saved as a separate file rather than the whole table being saved in a single file. Medium-sized tables (fewer than 100 million rows) with many columns are good candidates for splaying, particularly when queries often access only a small subset of the columns.
quotes
├── .d
├── price
├── sym
└── time
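The layout above can be imitated with a short standard-library sketch. It is only an illustration of the one-file-per-column idea: the helper names `write_splayed` and `read_columns` are hypothetical, and JSON stands in for the binary format that KDB-X actually writes. The `.d` file is used here, as in a real splayed table, to record the column order.

```python
import json
import tempfile
from pathlib import Path


def write_splayed(root: Path, name: str, columns: dict[str, list]) -> Path:
    """Write one file per column under root/name (JSON stand-in, not the KDB-X format)."""
    table_dir = root / name
    table_dir.mkdir(parents=True)
    # The .d file records the column names in table order.
    (table_dir / ".d").write_text(json.dumps(list(columns)))
    for col, values in columns.items():
        (table_dir / col).write_text(json.dumps(values))
    return table_dir


def read_columns(table_dir: Path, wanted: list[str]) -> dict[str, list]:
    """Read only the requested column files; the other columns are never opened."""
    return {col: json.loads((table_dir / col).read_text()) for col in wanted}


with tempfile.TemporaryDirectory() as tmp:
    quotes = write_splayed(
        Path(tmp),
        "quotes",
        {"time": ["09:30:00", "09:30:01"], "sym": ["AAPL", "MSFT"], "price": [1.5, 2.25]},
    )
    files = sorted(p.name for p in quotes.iterdir())
    order = json.loads((quotes / ".d").read_text())
    prices = read_columns(quotes, ["price"])
```

Because `read_columns` touches only the files it is asked for, a query on `price` alone leaves the `sym` and `time` files unread, which is the access pattern that makes splaying attractive for wide tables.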
More information on splayed databases
The splayed database format used by KDB-X Python has been used in production environments for decades. As a result, a significant amount of information is available on the creation and use of these databases.
Partitioned Database
A partitioned KDB-X database consists of one or more tables saved on disk, with each table split across separate folders called partitions. Partitions are most often based on a temporal field within the dataset, such as date or month. Each table within the database must follow the same partition structure.
A database containing two tables (trades and quotes) partitioned by date is laid out as follows, where price, sym, and time in the quotes folder are the columns of that table:
db
├── 2020.10.04
│   ├── quotes
│   │   ├── .d
│   │   ├── price
│   │   ├── sym
│   │   └── time
│   └── trades
│       ├── .d
│       ├── price
│       ├── sym
│       ├── time
│       └── vol
├── 2020.10.06
│   ├── quotes
..
└── sym
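As a rough illustration of this structure, the sketch below builds the directory paths that a date-partitioned database would contain. The function name `partition_dirs` is hypothetical, and the code only computes paths; it does not create a database. It assumes that every table is present in every partition, in line with the requirement that all tables share the same partition structure.

```python
from datetime import date
from pathlib import PurePosixPath


def partition_dirs(db_root: str, partitions: list[date], tables: list[str]) -> list[str]:
    """Return the table directories expected under each date partition."""
    root = PurePosixPath(db_root)
    return [
        # Partition folders are named after the date, formatted as YYYY.MM.DD.
        str(root / day.strftime("%Y.%m.%d") / table)
        for day in sorted(partitions)
        for table in sorted(tables)
    ]


dirs = partition_dirs(
    "db",
    [date(2020, 10, 6), date(2020, 10, 4)],
    ["trades", "quotes"],
)
```

Each returned directory would itself be a splayed table, holding a `.d` file and one file per column.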
More information on partitioned databases
The partitioned database format used by KDB-X Python has been used in production environments for decades, including in many of the world's best-performing tier-1 investment banks. As a result, a significant amount of information is available on the creation, maintenance, and querying of these databases.
How to use databases in KDB-X Python
Creating and managing databases is crucial for handling large amounts of data. The pykx.DB module makes these tasks easier, more Pythonic, and more user-friendly.
The KDB-X Python database API supports the following operations:
| Operation | Description |
|---|---|
| Generate | Learn how to generate a new historical database using data from Python/q and expand it over time. |
| Load | Learn how to load existing databases and fix some common issues with databases. |
| Manage | Copy columns, change column datatypes or names, apply functions to columns, delete columns from a table, rename tables, and backfill data. |
Check out a full breakdown of the database API.
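The generate and manage operations might look roughly like the sketch below. It assumes that PyKX is installed and licensed, and that `kx.DB`, `create`, `rename_column`, and `list_columns` behave as in recent PyKX releases; treat the exact signatures as assumptions and confirm them against the API reference for your version. The wrapper function `demo_database_workflow`, the path `/tmp/pykx_demo_db`, and the sample data are hypothetical.

```python
def demo_database_workflow(db_path: str = "/tmp/pykx_demo_db"):
    """Create a single-partition database, rename a column, and list the columns."""
    # Imported inside the function so that defining it does not require PyKX.
    import pykx as kx

    # Illustrative in-memory table; the column names follow the trades example above.
    trades = kx.Table(data={
        "sym": ["AAPL", "MSFT", "AAPL"],
        "price": [189.5, 410.25, 190.0],
        "vol": [100, 250, 75],
    })

    db = kx.DB(path=db_path)
    # Generate: write the table into the 2020.10.04 date partition (assumed signature).
    db.create(trades, "trades", kx.q("2020.10.04"))
    # Manage: rename a column across the partitions of the table.
    db.rename_column("trades", "vol", "volume")
    return db.list_columns("trades")
```

Calling `demo_database_workflow()` writes to disk, so point `db_path` at a scratch location before trying it.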