
Table

This page explains the concepts of tables (splayed and partitioned), columnar storage, and memory mapping in KDB.AI.

In KDB.AI, tables are the main structures for storing and organizing data. They come in two types, each suited to different use cases and performance requirements.

Table types

Below is an overview of the two table types supported in KDB.AI, splayed and partitioned:

Splayed
  Description: Stores each column as a separate file, improving performance for large datasets.
  Use cases: Best for medium-sized tables (up to 100 million rows) where queries often access a subset of columns.
  Additional information: Requires more disk space and file management, but offers faster read times for specific columns.

Partitioned
  Description: A splayed table that is further divided into partitions based on a column with a special type such as date, month, or year.
  Use cases: Suitable for very large datasets (over 100 million rows), or for data that grows over time where queries can be limited to specific partitions.
  Additional information: Enhances performance by reducing the amount of data scanned during queries.

Splayed tables

Splayed tables store each column in a separate file. This structure allows for more efficient queries, especially when only a subset of columns is needed. Splayed tables are particularly useful for medium-sized datasets where the overhead of managing multiple files is outweighed by the performance benefits.

Partitioned tables

Partitioned tables take the concept of splayed tables further by dividing the data into partitions based on a specific column, such as date. Partitioning allows for even more efficient queries by limiting the data scanned to relevant partitions. Partitioned tables are ideal for very large datasets or datasets that grow over time, such as time-series data.

Examples in q:

// Example of creating a splayed table for trade data
// (column definitions are separated by semicolons)
trades: ([]
    date: `date$();
    time: `time$();
    sym: `symbol$();
    price: `real$();
    size: `int$()
    cond: `char$()
    )

// Insert sample data; the literals 150.25e (real) and 1000i (int)
// are typed to match the column types
`trades insert (2024.11.01; 10:03:54.347; `AAPL; 150.25e; 1000i; "N")
`trades insert (2024.11.01; 10:04:05.827; `GOOG; 2750.50e; 500i; "B")

// Save the table as splayed; symbol columns must be enumerated first,
// here with .Q.en against the current directory
`:trades/ set .Q.en[`:.;] trades

// Example of creating a partitioned table for trade data
trades: ([]
    date: `date$();
    time: `time$();
    sym: `symbol$();
    price: `real$();
    size: `int$();
    cond: `char$()
    )

// Insert sample data
`trades insert (2024.11.01; 10:03:54.347; `AAPL; 150.25e; 1000i; "N")
`trades insert (2024.11.01; 10:04:05.827; `GOOG; 2750.50e; 500i; "B")

// Save the table under a date partition of the database root db.
// The date is implied by the partition directory, so the date column
// is dropped before saving, and symbol columns are enumerated.
`:db/2024.11.01/trades/ set .Q.en[`:db;] delete date from trades
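
Once a partitioned database is on disk, loading its root directory makes the partitions queryable, and a query that filters on the partition column scans only the matching partitions. A minimal sketch, assuming the partitions were written under a database root directory named db:

// Load the database root; partitioned tables become available by name
\l db

// Only the 2024.11.01 partition directory is scanned
select from trades where date=2024.11.01, sym=`AAPL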

Columnar storage

Columnar storage is a method of storing tables by column rather than by row. This approach significantly improves the performance of read-heavy operations, such as analytical queries, because it allows for more efficient data compression and faster access to the relevant columns. In KDB.AI, columnar storage is particularly beneficial for large datasets where only a few columns are queried at a time.

Benefits of columnar storage:

  • Improved Query Performance: By storing data in columns, queries that access only a subset of columns can be executed more quickly.
  • Efficient Compression: Columnar storage allows for better data compression, reducing storage requirements and improving I/O performance.
  • Optimized for Analytical Workloads: Ideal for scenarios where read operations are more frequent than write operations, such as data analytics and reporting.
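
The column-subset benefit is visible directly in q: when a splayed table is queried from disk, only the column files named in the query are read. A minimal sketch, assuming the trades table was saved to `:trades/ as in the example above:

// Reads only the sym, price, and size column files;
// date, time, and cond are never touched
select sym, price from get `:trades/ where size > 500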

Memory mapping

Memory mapping is a technique used to access data stored on disk as if it were in memory. This method allows KDB.AI to handle large datasets efficiently by mapping files directly into the virtual memory space of a process. Memory mapping reduces the overhead of copying data between disk and memory, leading to faster data retrieval and improved performance.

Benefits of memory mapping:

  • Faster Data Access: By mapping files directly into memory, the need for data copying is eliminated, resulting in quicker data access.
  • Efficient Use of Resources: Memory mapping allows the system to use disk storage as if it were RAM, effectively increasing the available memory for large datasets.
  • Reduced I/O Overhead: Direct access to the file data reduces the number of I/O operations, enhancing overall system performance.
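
In q, reading a splayed table from disk with get does not copy the data into the heap; the column files are mapped into virtual memory and pages are loaded on demand as columns are accessed. A minimal sketch, assuming the trades table was saved to `:trades/ as in the example above:

// Column files are mapped, not read into RAM
t: get `:trades/

// Heap usage (.Q.w[]`used) stays small until column data is actually accessed
.Q.w[]

// Data is paged in only as the query reads it
select from t where sym=`AAPL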

Next steps