Choosing the Right File System for kdb+: A Case Study with KX Nano¶
The performance of a kdb+ system is critically dependent on the throughput and latency of its underlying storage. In a Linux environment, the file system is the foundational layer that enables data management on a given storage partition.
This paper presents a comparative performance analysis of various file systems using the KX Nano benchmarking utility. The evaluation was conducted across two distinct test environments with different operating systems and storage hardware (sequential read bandwidth of 6500 vs. 14000 MB/s; random read IOPS of 700K vs. 2500K).
Summary¶
No single file system demonstrated superior performance across all tested metrics; the optimal choice depends on the primary workload characteristics and the specific operations you need to accelerate. Furthermore, the host operating system (e.g., Red Hat Enterprise Linux vs. Ubuntu) constrains the set of available and supported file systems.
Our key recommendations are as follows:
- For write-intensive workloads where data ingestion rate is the primary driver, XFS is the recommended file system.
    - XFS consistently demonstrated the highest write throughput, particularly under concurrent write scenarios. For instance, a kdb+ `set` operation on a large float vector (31 million elements) executed 5.5x faster on XFS than on ext4 and nearly 50x faster than on ZFS.
    - This superior write performance translates to significant speedups in other I/O-heavy operations. Parallel disk sorting was 3.4x faster, and applying the `p#` (parted) attribute was 6.6x faster on XFS compared to ext4. Consequently, workloads like end-of-day (EOD) data processing will achieve the best performance with XFS.
- For read-intensive workloads where query latency is paramount, the choice is nuanced:
    - On Red Hat Enterprise Linux 9, ext4 holds a slight advantage for queries dominated by sequential reads. For random reads, its performance was comparable to XFS.
    - On Ubuntu, ZFS excelled in random read scenarios. However, this performance advantage diminished significantly if the requested data was already available in the operating system's page cache.
- kdb+ also supports tiering. For tiered data architectures (e.g., hot, mid, and cold tiers), a hybrid approach is advisable (see the sketch after this list):
    - Hot tier: data is frequently queried and often resides in the page cache. For this tier, a read-optimized file system like ext4 or XFS is effective.
    - Mid/cold tier: data is queried less often, so reads are more likely to come directly from storage. In this scenario, ZFS's strong random read performance from storage provides a distinct advantage.
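One common way to realize such a hybrid layout is a segmented database, where a `par.txt` file in the HDB root lists the directories that hold the date partitions. A minimal sketch, assuming hypothetical mount points (an XFS or ext4 device for recent data, a ZFS pool for older data):

```
/fast_xfs/hdb/hot
/zfs_pool/hdb/cold
```

Each line names a directory containing date partitions, so aging data out of the hot tier is simply a matter of moving a date's partition directory to the cold segment.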
Disclaimer: These guidelines are specific to the tested hardware and workloads. We strongly encourage readers to perform their own benchmarks that reflect their specific application profiles. To facilitate this, the benchmarking suite used in this study is included with the KX Nano codebase, available on GitHub.
Details¶
All benchmarks were executed in September 2025 using kdb+ 4.1 (2025.04.28) and KX Nano 6.4.1. Each kdb+ process was configured to use 8 worker threads (`-s 8`).
We used the default vector lengths of KX Nano, listed below; an illustrative write-test sketch follows the list.
* small: 63k
* medium: 127k
* large: 31m
* huge: 1000m
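As an illustration only (this is not the actual KX Nano test code), a single write test boils down to generating a vector of the given length and timing how long `set` takes to persist it. The path below is hypothetical, and the process would be started with `q -s 8` to match the benchmark configuration:

```q
n:31000000                        / "large" vector: 31 million elements
v:n?100f                          / random float vector, roughly 248 MB
\t `:/mnt/test/floatlarge set v   / milliseconds taken to persist the vector
```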
Test 1: Red Hat Enterprise Linux 9 with Intel NVMe SSD (PCIe 4.0)¶
This first test configuration utilized an Intel NVMe SSD on a server running Red Hat Enterprise Linux (RHEL) 9.3. In line with RHEL 9's officially supported file systems, the comparison was limited to ext4 and XFS.
Test Setup¶
Component | Specification |
---|---|
Storage | * Type: 3.84 TB Intel SSD D7-P5510 * Interface: PCIe 4.0 x4, NVMe * Sequential R/W: 6500 MB/s / 3400 MB/s * Random Read: 700K IOPS (4K) * Latency: Random Read: 82 µs (4K), Sequential Read / Write: 10 µs / 13 µs (4K) |
CPU | Intel(R) Xeon(R) 6747P (2 sockets, 48 cores per socket, 2 threads per core) |
Memory | 502GiB, DDR5 @ 6400 MT/s |
OS | RHEL 9.3 (kernel 5.14.0-362.8.1.el9_3.x86_64) |
The values presented in the result tables represent throughput in MB/s, where higher figures indicate better performance. The "Ratio" column quantifies the performance of XFS relative to ext4 (e.g., a value of 2 indicates XFS was twice as fast).
Write¶
We split the write results into two tables. The first table contains the "high-impact" tests and should be given more weight. These tests relate to EOD (write, sort, applying attributes) and EOI (append) work that is often the bottleneck of ingestion.
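For reference, the EOD work these tests model looks roughly like the following in q. The paths and table are hypothetical, and production systems typically use dbmaint.q or similar tooling for the sort and attribute steps:

```q
n:31000000
t:([] sym:n?`3; price:n?100f)                                 / sample trade-like table
`:/mnt/xfs/hdb/2025.09.01/trade/ set .Q.en[`:/mnt/xfs/hdb] t  / EOD write: splay with enumerated syms
`sym xasc `:/mnt/xfs/hdb/2025.09.01/trade/                    / sort the splayed table on disk
@[`:/mnt/xfs/hdb/2025.09.01/trade/;`sym;`p#]                  / apply the parted attribute to sym
```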
Single kdb+ process:¶
| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | ratio |
|---|---|---|---|---|
| read mem write disk | add attribute | 261 | 233 | 1.12 |
| read write disk | disk sort | 106 | 97 | 1.09 |
| write disk | open append mid float, sync once | 1030 | 877 | 1.18 |
| | open append mid sym, sync once | 924 | 853 | 1.08 |
| | write float large | 2098 | 1304 | 1.61 |
| | write int huge | 3367 | 2170 | 1.55 |
| | write int medium | 1309 | 729 | 1.80 |
| | write int small | 474 | 380 | 1.25 |
| | write sym large | 913 | 862 | 1.06 |
| | GEOMETRIC MEAN | 779 | 608 | 1.28 |
| | MAX RATIO | 3367 | 2170 | 1.80 |
Observation: XFS is almost always faster than ext4. In critical tests, the advantage is almost 30% on average, with a maximum difference of 80%.
The performance of the less critical write operations is shown below. The Linux `sync` command synchronizes cached data to permanent storage. This data includes modified superblocks, modified inodes, delayed reads and writes, and others. EOD and EOI solutions often use sync operations to improve resiliency by ensuring data is persisted to storage and not held temporarily in caches. The sync operation is typically much faster than the `set` command because Linux writes dirty pages back in the background, so little data usually remains to flush (compare the speed of `write float large` and `sync float large`).
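A minimal way to observe this difference from q, assuming a hypothetical path (`system "sync"` simply shells out to the Linux sync command):

```q
v:31000000?100f
\t `:/mnt/test/floatlarge set v   / "write float large": most bytes land in the page cache first
\t system "sync"                  / "sync float large": flush the remaining dirty pages to the device
```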
| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | ratio |
|---|---|---|---|---|
| write disk | append small, sync once | 750 | 482 | 1.55 |
| | append tiny, sync once | 554 | 366 | 1.51 |
| | open append small, sync once | 931 | 813 | 1.14 |
| | open append tiny, sync once | 253 | 211 | 1.20 |
| | open replace tiny, sync once | 139 | 96 | 1.45 |
| | sync column after parted attribute | 177163 | 30758180 | 0.01 |
| | sync float large | 143720 | 145828 | 0.99 |
| | sync int huge | 78917 | 77110 | 1.02 |
| | sync int medium | 40977 | 40090 | 1.02 |
| | sync int small | 6730 | 6244 | 1.08 |
| | sync sym large | 211642 | 180587 | 1.17 |
| | sync table after sort | 54142270 | 59348750 | 0.91 |
| | GEOMETRIC MEAN | 14500 | 19311 | 0.75 |
| | MAX RATIO | 54142270 | 59348750 | 1.55 |
64 kdb+ processes:¶
| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | ratio |
|---|---|---|---|---|
| read mem write disk | add attribute | 12834 | 1948 | 6.59 |
| read write disk | disk sort | 3059 | 914 | 3.35 |
| write disk | open append mid float, sync once | 1933 | 1378 | 1.40 |
| | open append mid sym, sync once | 2309 | 2065 | 1.12 |
| | write float large | 63140 | 10817 | 5.84 |
| | write int huge | 2449 | 2665 | 0.92 |
| | write int medium | 40098 | 6453 | 6.21 |
| | write int small | 17609 | 4931 | 3.57 |
| | write sym large | 57674 | 14279 | 4.04 |
| | GEOMETRIC MEAN | 10110 | 3434 | 2.94 |
| | MAX RATIO | 63140 | 14279 | 6.59 |
Observation: The results show that XFS consistently and significantly outperformed ext4 in write-intensive operations. In critical ingestion and EOD tasks, write throughput on XFS was on average 3 times higher. This advantage peaked in specific operations, such as applying the `p#` attribute, where XFS was a remarkable 6.6x faster than ext4.
The performance of the less critical write operations is shown below.
| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | ratio |
|---|---|---|---|---|
| write disk | append small, sync once | 1657 | 1955 | 0.85 |
| | append tiny, sync once | 2429 | 2083 | 1.17 |
| | open append small, sync once | 1384 | 1432 | 0.97 |
| | open append tiny, sync once | 2407 | 1361 | 1.77 |
| | open replace tiny, sync once | 531 | 1035 | 0.51 |
| | sync column after parted attribute | 132748 | 197769600 | 0.00 |
| | sync float large | 91966 | 93039 | 0.99 |
| | sync int huge | 216302 | 217269 | 1.00 |
| | sync int medium | 177815 | 127043 | 1.40 |
| | sync int small | 112098 | 105078 | 1.07 |
| | sync sym large | 137169 | 140796 | 0.97 |
| | sync table after sort | 161152200 | 423427500 | 0.38 |
| | GEOMETRIC MEAN | 37714 | 73810 | 0.51 |
| | MAX RATIO | 161152200 | 423427500 | 1.77 |
There were two minor test cases where ext4 was faster. The first, `replace tiny`, involves overwriting a very small vector. This is a very fast operation anyway, so the discrepancy is negligible, and the operation is not representative of typical, performance-critical kdb+ workloads. The second, `sync column after parted attribute`/`sync table after sort`, also showed ext4 ahead. However, the absolute time difference was minimal, making its impact on overall application performance insignificant in most practical scenarios.
Read¶
We divide the read tests into two categories, depending on the source of the data: disk vs. page cache (memory).
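To make the two categories concrete, the sketch below maps a hypothetical splayed table and touches it with random and then sequential reads. The first pass is served from storage; a rerun is served from the page cache (clearing the cache between runs requires root, e.g. `sync; echo 3 > /proc/sys/vm/drop_caches`):

```q
t:get `:/mnt/xfs/hdb/2025.09.01/trade/   / map the splayed table; columns are memory-mapped
\t t[`price] 1000000?count t             / random reads: fault in pages at arbitrary offsets
\t sum t`price                           / sequential read: scan the whole column front to back
```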
Single kdb+ process:¶
| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | ratio |
|---|---|---|---|---|
| read disk | mmap, random read 1M | 592 | 600 | 0.99 |
| | mmap, random read 4k | 19 | 19 | 1.01 |
| | mmap, random read 64k | 198 | 196 | 1.01 |
| | random read 1M | 601 | 557 | 1.08 |
| | random read 4k | 20 | 19 | 1.06 |
| | random read 64k | 204 | 188 | 1.09 |
| | sequential read binary | 697 | 698 | 1.00 |
| read disk write mem | sequential read float large | 2012 | 799 | 2.52 |
| | sequential read int huge | 2004 | 961 | 2.09 |
| | sequential read int medium | 658 | 645 | 1.02 |
| | sequential read int small | 295 | 246 | 1.20 |
| | GEOMETRIC MEAN | 315 | 261 | 1.21 |
| | MAX RATIO | 2012 | 961 | 2.52 |
Observation: XFS reads data from disk sequentially faster than ext4. Apart from this, the differences are negligible.
| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | ratio |
|---|---|---|---|---|
| read mem | mmap, random read 1M | 2519 | 2465 | 1.02 |
| | mmap, random read 4k | 264 | 249 | 1.06 |
| | mmap, random read 64k | 1713 | 1755 | 0.98 |
| | random read 1M | 3021 | 3024 | 1.00 |
| | random read 4k | 1245 | 1255 | 0.99 |
| | random read 64k | 3003 | 3011 | 1.00 |
| read mem write mem | sequential read binary | 2544 | 2520 | 1.01 |
| | sequential reread float large | 14883 | 15071 | 0.99 |
| | sequential reread int huge | 33797 | 33874 | 1.00 |
| | sequential reread int medium | 7801 | 8176 | 0.95 |
| | sequential reread int small | 2148 | 2047 | 1.05 |
| | GEOMETRIC MEAN | 3123 | 3112 | 1.00 |
| | MAX RATIO | 33797 | 33874 | 1.06 |
Observation: There is no performance difference between XFS and ext4 with a single kdb+ reader if the data is coming from page cache.
64 kdb+ processes:¶
| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | ratio |
|---|---|---|---|---|
| read disk | mmap, random read 1M | 2812 | 2819 | 1.00 |
| | mmap, random read 4k | 515 | 544 | 0.95 |
| | mmap, random read 64k | 1068 | 1075 | 0.99 |
| | random read 1M | 2779 | 2784 | 1.00 |
| | random read 4k | 543 | 546 | 0.99 |
| | random read 64k | 1065 | 1070 | 1.00 |
| | sequential read binary | 100438 | 5067 | 19.82 |
| read disk write mem | sequential read float large | 2124 | 3292 | 0.65 |
| | sequential read int huge | 3180 | 3300 | 0.96 |
| | sequential read int medium | 2164 | 5910 | 0.37 |
| | sequential read int small | 1456 | 6923 | 0.21 |
| | GEOMETRIC MEAN | 2181 | 2207 | 0.99 |
| | MAX RATIO | 100438 | 6923 | 19.82 |
Observation: Despite XFS's edge with a single reader, ext4 outperforms XFS in sequential reads when multiple kdb+ processes read different data in parallel. This scenario is common in a pool of HDBs, where multiple concurrent queries with non-selective filters result in numerous parallel sequential reads from disk.
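A query of roughly this shape (table and columns are hypothetical) illustrates the pattern: the `size>0` filter removes almost nothing, so the sym, size, and price columns of every date in the range are scanned sequentially from storage by each worker:

```q
select vwap:size wavg price by sym from trade where date within 2025.08.01 2025.08.31, size>0
```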
| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | ratio |
|---|---|---|---|---|
| read mem | mmap, random read 1M | 26736 | 41860 | 0.64 |
| | mmap, random read 4k | 3370 | 5458 | 0.62 |
| | mmap, random read 64k | 13300 | 22897 | 0.58 |
| | random read 1M | 124141 | 127773 | 0.97 |
| | random read 4k | 92490 | 92293 | 1.00 |
| | random read 64k | 156725 | 162621 | 0.96 |
| read mem write mem | sequential read binary | 27670 | 24988 | 1.11 |
| | sequential reread float large | 1022969 | 1073493 | 0.95 |
| | sequential reread int huge | 1358929 | 1360467 | 1.00 |
| | sequential reread int medium | 544712 | 581002 | 0.94 |
| | sequential reread int small | 124434 | 124221 | 1.00 |
| | GEOMETRIC MEAN | 94900 | 109235 | 0.87 |
| | MAX RATIO | 1358929 | 1360467 | 1.11 |
Observation: ext4 outperforms XFS in random reads, most notably the memory-mapped variants, when the data comes from the page cache.
Test 2: Ubuntu with Samsung NVMe SSD (PCIe 5.0)¶
Test setup¶
Component | Specification |
---|---|
Storage | * Type: 3.84 TB SAMSUNG MZWLO3T8HCLS-00A07 * Interface: PCIe 5.0 x4 * Sequential R/W: 14000 MB/s / 6000 MB/s * Random Read: 2500K IOPS (4K) * Latency: Random Read: 215 µs (4K Blocks), Sequential Read /Write: 436 µs / 1350 µs (4K) |
CPU | AMD EPYC 9575F (Turin), 2 sockets, 64 cores per socket, 2 threads per core, 256 MB L3 cache, SMT off |
Memory | 2.2 TB, DDR5@6400 MT/s (12 channels per socket) |
OS | Ubuntu 24.04.3 LTS (kernel: 6.8.0-83-generic) |
The values presented in the result tables are ratios of XFS throughput to the given file system's throughput, so values greater than 1 mean XFS was faster (e.g., a value of 2 indicates XFS was twice as fast).
Write¶
Single kdb+ process:¶
| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
|---|---|---|---|---|---|
| read mem write disk | add attribute | 1.1 | 1.1 | 1.0 | 1.0 |
| read write disk | disk sort | 1.0 | 1.1 | 1.0 | 1.0 |
| write disk | open append mid float, sync once | 1.7 | 1.7 | 2.0 | 1.0 |
| | open append mid sym, sync once | 1.2 | 1.1 | 1.2 | 1.0 |
| | write float large | 2.7 | 1.8 | 2.5 | 2.3 |
| | write int huge | 2.9 | 1.9 | 2.6 | 2.2 |
| | write int medium | 2.6 | 1.8 | 1.8 | 1.3 |
| | write int small | 0.6 | 0.6 | 0.6 | 1.0 |
| | write sym large | 1.0 | 0.9 | 0.9 | 1.7 |
| | GEOMETRIC MEAN | 1.5 | 1.2 | 1.4 | 1.3 |
| | MAX RATIO | 2.9 | 1.9 | 2.6 | 2.3 |
Observation: XFS outperforms all other file systems when a single kdb+ process writes the data. The only notable weakness for XFS was in writing small files.
The performance of the less critical write operations is shown below.
| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
|---|---|---|---|---|---|
| write disk | append small, sync once | 2.2 | 2.0 | 1.4 | 2.2 |
| | append tiny, sync once | 1.4 | 1.2 | 1.0 | 1.7 |
| | open append small, sync once | 1.5 | 1.7 | 2.0 | 0.5 |
| | open append tiny, sync once | 0.9 | 0.7 | 0.8 | 1.6 |
| | open replace tiny, sync once | 2.1 | 2.1 | 1.2 | 2.8 |
| | sync float large | 1.2 | 1.4 | 1.2 | 1.6 |
| | sync int huge | 1.1 | 1.2 | 1.7 | 0.3 |
| | sync int medium | 1.5 | 1.4 | 1.3 | 2.0 |
| | sync int small | 0.8 | 1.8 | 0.9 | 0.9 |
| | sync sym large | 1.3 | 1.5 | 1.3 | 4.5 |
| | sync table after sort | 1.0 | 1.0 | 3.1 | 6.1 |
| | GEOMETRIC MEAN | 1.3 | 1.4 | 1.3 | 1.6 |
| | MAX RATIO | 2.2 | 2.1 | 3.1 | 6.1 |
64 kdb+ processes:¶
| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
|---|---|---|---|---|---|
| read mem write disk | add attribute | 2.9 | 2.9 | 2.9 | 1.8 |
| read write disk | disk sort | 2.1 | 2.2 | 2.0 | 1.7 |
| write disk | open append mid float, sync once | 1.1 | 1.1 | 2.7 | 1.0 |
| | open append mid sym, sync once | 1.1 | 1.1 | 2.1 | 1.7 |
| | write float large | 3.0 | 2.9 | 40.3 | 26.8 |
| | write int huge | 1.0 | 2.0 | 4.1 | 1.5 |
| | write int medium | 3.2 | 2.9 | 47.1 | 3.8 |
| | write int small | 1.4 | 5.3 | 16.4 | 2.1 |
| | write sym large | 1.2 | 1.1 | 9.7 | 9.5 |
| | GEOMETRIC MEAN | 1.7 | 2.1 | 7.0 | 2.9 |
| | MAX RATIO | 3.2 | 5.3 | 47.1 | 26.8 |
Observation: XFS significantly outperforms all other file systems, sometimes by a wide margin: for example, persisting a large float vector (the `set` operation) is almost 27 times faster on XFS than on ZFS.
| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
|---|---|---|---|---|---|
| write disk | append small, sync once | 1.4 | 1.5 | 3.8 | 2.5 |
| | append tiny, sync once | 0.8 | 1.3 | 7.3 | 1.2 |
| | open append small, sync once | 1.0 | 1.0 | 3.4 | 0.8 |
| | open append tiny, sync once | 0.8 | 1.5 | 11.5 | 1.2 |
| | open replace tiny, sync once | 3.2 | 24.0 | 4.8 | 6.2 |
| | sync float large | 1.0 | 1.0 | 0.6 | 0.5 |
| | sync int huge | 1.0 | 1.3 | 0.5 | 0.0 |
| | sync int medium | 1.0 | 0.7 | 3.0 | 1.3 |
| | sync int small | 0.8 | 1.2 | 3.0 | 1.1 |
| | sync sym large | 1.0 | 0.9 | 0.6 | 1.2 |
| | sync table after sort | 0.6 | 44.7 | 11.3 | 11.1 |
| | GEOMETRIC MEAN | 1.0 | 2.1 | 2.8 | 1.1 |
| | MAX RATIO | 3.2 | 44.7 | 11.5 | 11.1 |
Read¶
Single kdb+ process:¶
| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
|---|---|---|---|---|---|
| read disk | mmap, random read 1M | 1.1 | 4.4 | 1.1 | 2.1 |
| | mmap, random read 4k | 1.0 | 1.1 | 1.0 | 0.9 |
| | mmap, random read 64k | 0.9 | 6.4 | 1.0 | 0.8 |
| | random read 1M | 1.0 | 4.4 | 1.1 | 2.3 |
| | random read 4k | 1.0 | 1.2 | 1.1 | 0.8 |
| | random read 64k | 0.9 | 6.7 | 1.0 | 0.8 |
| | sequential read binary | 2.1 | 6.4 | 2.0 | 2.2 |
| read disk write mem | sequential read float large | 1.1 | 0.8 | 1.3 | 3.3 |
| | sequential read int huge | 1.2 | 0.8 | 1.2 | 4.7 |
| | sequential read int medium | 1.2 | 1.7 | 1.2 | 5.8 |
| | sequential read int small | 0.7 | 0.9 | 1.3 | 1.6 |
| | GEOMETRIC MEAN | 1.1 | 2.2 | 1.2 | 1.8 |
| | MAX RATIO | 2.1 | 6.7 | 2.0 | 5.8 |
Observation: XFS excels in reading from disk if there is a single kdb+ reader.
| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
|---|---|---|---|---|---|
| read mem | mmap, random read 1M | 1.2 | 1.3 | 1.4 | 1.4 |
| | mmap, random read 4k | 1.1 | 1.1 | 1.0 | 1.0 |
| | mmap, random read 64k | 1.1 | 1.1 | 1.1 | 1.1 |
| | random read 1M | 1.0 | 1.2 | 1.1 | 1.2 |
| | random read 4k | 1.0 | 1.1 | 1.1 | 0.9 |
| | random read 64k | 1.0 | 1.1 | 1.2 | 1.1 |
| read mem write mem | sequential read binary | 0.9 | 0.9 | 0.9 | 1.1 |
| | sequential reread float large | 1.7 | 2.1 | 2.5 | 2.5 |
| | sequential reread int huge | 2.0 | 2.2 | 1.9 | 2.3 |
| | sequential reread int medium | 1.6 | 2.7 | 1.5 | 2.4 |
| | sequential reread int small | 0.7 | 0.7 | 0.6 | 1.1 |
| | GEOMETRIC MEAN | 1.1 | 1.3 | 1.2 | 1.4 |
| | MAX RATIO | 2.0 | 2.7 | 2.5 | 2.5 |
Observation: XFS excels in reading from page cache if there is a single kdb+ reader.
64 kdb+ processes:¶
| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
|---|---|---|---|---|---|
| read disk | mmap, random read 1M | 1.0 | 2.3 | 0.9 | 0.4 |
| | mmap, random read 4k | 1.0 | 1.1 | 0.9 | 0.7 |
| | mmap, random read 64k | 0.8 | 1.5 | 0.7 | 0.2 |
| | random read 1M | 1.0 | 2.2 | 0.9 | 0.5 |
| | random read 4k | 1.0 | 1.0 | 0.9 | 0.5 |
| | random read 64k | 0.8 | 1.5 | 0.8 | 0.2 |
| | sequential read binary | 9.1 | 8.6 | 9.0 | 6.5 |
| read disk write mem | sequential read float large | 0.8 | 0.7 | 0.8 | 0.5 |
| | sequential read int huge | 0.9 | 0.8 | 0.9 | 0.6 |
| | sequential read int medium | 1.0 | 0.8 | 0.9 | 1.1 |
| | sequential read int small | 1.2 | 0.5 | 2.0 | 1.1 |
| | GEOMETRIC MEAN | 1.1 | 1.3 | 1.2 | 0.6 |
| | MAX RATIO | 9.1 | 8.6 | 9.0 | 6.5 |
Observation: ZFS excels in reading from disk when many kdb+ processes (an HDB pool) read data in parallel. The only exception is the binary read (`read1`), but this is not considered a typical query pattern in a production kdb+ environment.
| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
|---|---|---|---|---|---|
| read mem | mmap, random read 1M | 1.1 | 1.1 | 1.1 | 1.2 |
| | mmap, random read 4k | 1.0 | 1.0 | 1.0 | 2.2 |
| | mmap, random read 64k | 1.0 | 1.0 | 1.0 | 2.1 |
| | random read 1M | 1.1 | 1.1 | 1.1 | 1.1 |
| | random read 4k | 1.0 | 1.0 | 1.0 | 0.9 |
| | random read 64k | 1.1 | 1.1 | 1.1 | 1.1 |
| read mem write mem | sequential read binary | 1.0 | 1.0 | 1.0 | 1.1 |
| | sequential reread float large | 1.9 | 1.9 | 2.3 | 5.4 |
| | sequential reread int huge | 1.8 | 1.8 | 2.2 | 2.1 |
| | sequential reread int medium | 2.0 | 2.2 | 2.0 | 8.3 |
| | sequential reread int small | 1.1 | 1.1 | 6.8 | 1.8 |
| | GEOMETRIC MEAN | 1.2 | 1.2 | 1.5 | 1.9 |
| | MAX RATIO | 2.0 | 2.2 | 6.8 | 8.3 |
Observation: The performance advantage of ZFS vanishes entirely when data is served from the page cache.