
Choosing the Right File System for kdb+: A Case Study with KX Nano

The performance of a kdb+ system is critically dependent on the throughput and latency of its underlying storage. In a Linux environment, the file system is the foundational layer that enables data management on a given storage partition.

This paper presents a comparative performance analysis of various file systems using the KX Nano benchmarking utility. The evaluation was conducted across two distinct test environments with different operating systems, storage bandwidth (6500 vs 14000 MB/s), and random-read IOPS (700K vs 2500K).

Summary

No single file system demonstrated superior performance across all tested metrics; the optimal choice depends on the primary workload characteristics and on the specific operations you need to accelerate. Furthermore, the host operating system (e.g., Red Hat Enterprise Linux vs. Ubuntu) constrains the set of available and supported file systems.

Our key recommendations are as follows:

  • For write-intensive workloads where data ingestion rate is the primary driver, XFS is the recommended file system.

    • XFS consistently demonstrated the highest write throughput, particularly under concurrent write scenarios. For instance, a kdb+ set operation on a large float vector (31 million elements) executed 5.5x faster on XFS than on ext4 and nearly 50x faster than on ZFS.
    • This superior write performance translates to significant speedups in other I/O-heavy operations. Parallel disk sorting was 3.4x faster, and applying the p# (parted) attribute was 6.6x faster on XFS compared to ext4. Consequently, workloads like end-of-day (EOD) data processing will achieve the best performance with XFS.
  • For read-intensive workloads where query latency is paramount, the choice is nuanced:

    • On Red Hat Enterprise Linux 9, ext4 holds a slight advantage for queries dominated by sequential reads. For random reads, its performance was comparable to XFS.
    • On Ubuntu, ZFS excelled in random read scenarios. However, this performance advantage diminished significantly if the requested data was already available in the operating system's page cache.

kdb+ also supports storage tiering. For tiered data architectures (e.g., hot, mid, and cold tiers), a hybrid approach is advisable:

  • Hot tier: Data is frequently queried and often resides in the page cache. For this tier, a read-optimized file system like ext4 or XFS is effective.
  • Mid/Cold Tier: Data is queried less often, meaning reads are more likely to come directly from storage. In this scenario, ZFS's strong random read performance from storage provides a distinct advantage.
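
In kdb+, such tiering is typically implemented as a segmented database: a par.txt file in the HDB root lists the segment directories, and each segment can sit on a different mount and therefore a different file system. A minimal sketch, with mount points that are purely illustrative assumptions:

```
/mnt/hot/db
/mnt/cold/db
```

Here /mnt/hot/db could be an XFS or ext4 volume holding recent, frequently queried partitions, while /mnt/cold/db could be a ZFS volume for older partitions that are mostly read straight from storage; kdb+ presents all segments as a single database at query time.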

Disclaimer: These guidelines are specific to the tested hardware and workloads. We strongly encourage readers to perform their own benchmarks that reflect their specific application profiles. To facilitate this, the benchmarking suite used in this study is included with the KX Nano codebase, available on GitHub.

Details

All benchmarks were executed in September 2025 using kdb+ 4.1 (2025.04.28) and KX Nano 6.4.1. Each kdb+ process was configured to use 8 worker threads (-s 8).

We used the default vector lengths of KX Nano, which are:

  * small: 63k
  * medium: 127k
  * large: 31m
  * huge: 1000m
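
For orientation, the core write tests boil down to persisting a freshly generated vector with set and timing it. A minimal sketch of what a test like write float large measures, assuming a mount point of /mnt/test (this is not the exact KX Nano code; the full harness is available on GitHub):

```q
n:31000000                       / "large" vector length
v:n?100f                         / random float vector, 8 bytes per element (~248 MB)
\t `:/mnt/test/floatlarge set v  / elapsed ms; MB/s is derived from the data size and elapsed time
```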

Test 1: Red Hat Enterprise Linux 9 with Intel NVMe SSD (PCIe 4.0)

This first test configuration utilized an Intel NVMe SSD on a server running Red Hat Enterprise Linux (RHEL) 9.3. In line with the file systems officially supported on RHEL 9, the comparison was limited to ext4 and XFS.

Test Setup

| Component | Specification |
| --- | --- |
| Storage | Type: 3.84 TB Intel SSD D7-P5510; Interface: PCIe 4.0 x4, NVMe; Sequential R/W: 6500 MB/s / 3400 MB/s; Random read: 700K IOPS (4K); Latency: random read 82 µs (4K), sequential read/write 10 µs / 13 µs (4K) |
| CPU | Intel(R) Xeon(R) 6747P (2 sockets, 48 cores per socket, 2 threads per core) |
| Memory | 502 GiB, DDR5 @ 6400 MT/s |
| OS | RHEL 9.3 (kernel 5.14.0-362.8.1.el9_3.x86_64) |

The values presented in the result tables represent throughput in MB/s, where higher figures indicate better performance. The "Ratio" column quantifies the performance of XFS relative to ext4 (e.g., a value of 2 indicates XFS was twice as fast).

Write

We split the write results into two tables. The first table contains the "high-impact" tests and should be given more weight. These tests relate to EOD work (write, sort, applying an attribute) and EOI appends, which are often the bottleneck of ingestion.
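
For context, these high-impact tests correspond to standard q idioms used in EOD/EOI processing. A rough, illustrative sketch (the paths, date and toy table are assumptions, not the KX Nano code itself):

```q
trade:([]sym:`a`b`c;price:3?100f)                / toy in-memory table
.Q.dpft[`:/mnt/test/db;2025.09.01;`sym;`trade]   / EOD write: splay, sort by sym, apply p#
/ standalone equivalents of the individual tests:
`sym xasc `:/mnt/test/db/2025.09.01/trade/       / sort a splayed table on disk ("disk sort")
@[`:/mnt/test/db/2025.09.01/trade/;`sym;`p#]     / set the parted attribute ("add attribute")
`:/mnt/test/db/2025.09.01/trade/ upsert .Q.en[`:/mnt/test/db] ([]sym:`a`b;price:2?100f)  / EOI-style append
```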

Single kdb+ process:

| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | Ratio |
| --- | --- | --- | --- | --- |
| read mem write disk | add attribute | 261 | 233 | 1.12 |
| read write disk | disk sort | 106 | 97 | 1.09 |
| write disk | open append mid float, sync once | 1030 | 877 | 1.18 |
| | open append mid sym, sync once | 924 | 853 | 1.08 |
| | write float large | 2098 | 1304 | 1.61 |
| | write int huge | 3367 | 2170 | 1.55 |
| | write int medium | 1309 | 729 | 1.80 |
| | write int small | 474 | 380 | 1.25 |
| | write sym large | 913 | 862 | 1.06 |
| GEOMETRIC MEAN | | 779 | 608 | 1.28 |
| MAX RATIO | | 3367 | 2170 | 1.80 |

Observation: XFS is almost always faster than ext4. In these high-impact tests its advantage is almost 30% on average, with a maximum difference of 80%.

The performance of the less critical write operations is below. The Linux sync command synchronizes cached data to permanent storage. This data includes modified superblocks, modified inodes, delayed reads and writes, and more. EOD and EOI solutions often use sync operations to improve resiliency by ensuring data is persisted to storage rather than held temporarily in caches. The sync step typically reports far higher throughput than the set command because Linux has already been writing the data back in the background (compare the speeds of write float large and sync float large).
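
In q this is usually just a shell-out after the writes complete; a minimal sketch (KX Nano's exact mechanism may differ, and the path is an assumption):

```q
v:1000000?100f                 / some data to persist
`:/mnt/test/syncdemo set v     / set returns once the data is in the page cache
system"sync"                   / ask the kernel to flush dirty pages to the storage device
```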

| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | Ratio |
| --- | --- | --- | --- | --- |
| write disk | append small, sync once | 750 | 482 | 1.55 |
| | append tiny, sync once | 554 | 366 | 1.51 |
| | open append small, sync once | 931 | 813 | 1.14 |
| | open append tiny, sync once | 253 | 211 | 1.20 |
| | open replace tiny, sync once | 139 | 96 | 1.45 |
| | sync column after parted attribute | 177163 | 30758180 | 0.01 |
| | sync float large | 143720 | 145828 | 0.99 |
| | sync int huge | 78917 | 77110 | 1.02 |
| | sync int medium | 40977 | 40090 | 1.02 |
| | sync int small | 6730 | 6244 | 1.08 |
| | sync sym large | 211642 | 180587 | 1.17 |
| | sync table after sort | 54142270 | 59348750 | 0.91 |
| GEOMETRIC MEAN | | 14500 | 19311 | 0.75 |
| MAX RATIO | | 54142270 | 59348750 | 1.55 |

64 kdb+ processes:

| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | Ratio |
| --- | --- | --- | --- | --- |
| read mem write disk | add attribute | 12834 | 1948 | 6.59 |
| read write disk | disk sort | 3059 | 914 | 3.35 |
| write disk | open append mid float, sync once | 1933 | 1378 | 1.40 |
| | open append mid sym, sync once | 2309 | 2065 | 1.12 |
| | write float large | 63140 | 10817 | 5.84 |
| | write int huge | 2449 | 2665 | 0.92 |
| | write int medium | 40098 | 6453 | 6.21 |
| | write int small | 17609 | 4931 | 3.57 |
| | write sym large | 57674 | 14279 | 4.04 |
| GEOMETRIC MEAN | | 10110 | 3434 | 2.94 |
| MAX RATIO | | 63140 | 14279 | 6.59 |

Observation: The results show that XFS consistently and significantly outperformed ext4 in write-intensive operations. In critical ingestion and EOD tasks, write throughput on XFS was on average 3 times higher. This advantage peaked in specific operations, such as applying the p# attribute, where XFS was a remarkable 6.6x faster than ext4.

The performance of the less critical write operations is below.

| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | Ratio |
| --- | --- | --- | --- | --- |
| write disk | append small, sync once | 1657 | 1955 | 0.85 |
| | append tiny, sync once | 2429 | 2083 | 1.17 |
| | open append small, sync once | 1384 | 1432 | 0.97 |
| | open append tiny, sync once | 2407 | 1361 | 1.77 |
| | open replace tiny, sync once | 531 | 1035 | 0.51 |
| | sync column after parted attribute | 132748 | 197769600 | 0.00 |
| | sync float large | 91966 | 93039 | 0.99 |
| | sync int huge | 216302 | 217269 | 1.00 |
| | sync int medium | 177815 | 127043 | 1.40 |
| | sync int small | 112098 | 105078 | 1.07 |
| | sync sym large | 137169 | 140796 | 0.97 |
| | sync table after sort | 161152200 | 423427500 | 0.38 |
| GEOMETRIC MEAN | | 37714 | 73810 | 0.51 |
| MAX RATIO | | 161152200 | 423427500 | 1.77 |

There were two minor test cases where ext4 was faster. The first, "replace tiny", involves overwriting a very small vector. This is a very fast operation anyway, so the discrepancy is negligible, and the operation is not representative of typical performance-critical kdb+ workloads. The second, "sync column after parted attribute/sort", also showed ext4 ahead. However, the absolute time difference was minimal, making its impact on overall application performance insignificant in most practical scenarios.

Read

We divide read tests into two categories depending on the source of the data, disk vs page cache (memory).
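
To ensure a "disk" test actually hits the device rather than the page cache, benchmarks typically drop the cache between runs. One common way to do this on Linux (requires root; shown here as q shell-outs purely for illustration, and not necessarily what KX Nano does internally):

```q
system"sync"                                        / flush dirty pages first
system"echo 3 | sudo tee /proc/sys/vm/drop_caches"  / drop the page cache (plus dentries and inodes)
```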

Single kdb+ process:

| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | Ratio |
| --- | --- | --- | --- | --- |
| read disk | mmap, random read 1M | 592 | 600 | 0.99 |
| | mmap, random read 4k | 19 | 19 | 1.01 |
| | mmap, random read 64k | 198 | 196 | 1.01 |
| | random read 1M | 601 | 557 | 1.08 |
| | random read 4k | 20 | 19 | 1.06 |
| | random read 64k | 204 | 188 | 1.09 |
| | sequential read binary | 697 | 698 | 1.00 |
| read disk write mem | sequential read float large | 2012 | 799 | 2.52 |
| | sequential read int huge | 2004 | 961 | 2.09 |
| | sequential read int medium | 658 | 645 | 1.02 |
| | sequential read int small | 295 | 246 | 1.20 |
| GEOMETRIC MEAN | | 315 | 261 | 1.21 |
| MAX RATIO | | 2012 | 961 | 2.52 |

Observation: XFS reads data sequentially from disk faster than ext4. Apart from this, the differences are negligible.

| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | Ratio |
| --- | --- | --- | --- | --- |
| read mem | mmap, random read 1M | 2519 | 2465 | 1.02 |
| | mmap, random read 4k | 264 | 249 | 1.06 |
| | mmap, random read 64k | 1713 | 1755 | 0.98 |
| | random read 1M | 3021 | 3024 | 1.00 |
| | random read 4k | 1245 | 1255 | 0.99 |
| | random read 64k | 3003 | 3011 | 1.00 |
| read mem write mem | sequential read binary | 2544 | 2520 | 1.01 |
| | sequential reread float large | 14883 | 15071 | 0.99 |
| | sequential reread int huge | 33797 | 33874 | 1.00 |
| | sequential reread int medium | 7801 | 8176 | 0.95 |
| | sequential reread int small | 2148 | 2047 | 1.05 |
| GEOMETRIC MEAN | | 3123 | 3112 | 1.00 |
| MAX RATIO | | 33797 | 33874 | 1.06 |

Observation: There is no performance difference between XFS and ext4 with a single kdb+ reader if the data is coming from page cache.

64 kdb+ processes:

| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | Ratio |
| --- | --- | --- | --- | --- |
| read disk | mmap, random read 1M | 2812 | 2819 | 1.00 |
| | mmap, random read 4k | 515 | 544 | 0.95 |
| | mmap, random read 64k | 1068 | 1075 | 0.99 |
| | random read 1M | 2779 | 2784 | 1.00 |
| | random read 4k | 543 | 546 | 0.99 |
| | random read 64k | 1065 | 1070 | 1.00 |
| | sequential read binary | 100438 | 5067 | 19.82 |
| read disk write mem | sequential read float large | 2124 | 3292 | 0.65 |
| | sequential read int huge | 3180 | 3300 | 0.96 |
| | sequential read int medium | 2164 | 5910 | 0.37 |
| | sequential read int small | 1456 | 6923 | 0.21 |
| GEOMETRIC MEAN | | 2181 | 2207 | 0.99 |
| MAX RATIO | | 100438 | 6923 | 19.82 |

Observation: Despite XFS's edge with a single reader, ext4 outperforms XFS at sequential reads when multiple kdb+ processes read different data in parallel. This scenario is common in an HDB pool, where multiple concurrent queries with non-selective filters result in numerous parallel sequential reads from disk.
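
As a hypothetical illustration of the two access patterns, assuming a date-partitioned trade table whose sym column carries the p# attribute:

```q
/ non-selective filter: the scanned columns are read sequentially from disk in full
select from trade where date=2025.09.01, price>100f
/ selective filter on the parted column: p# limits the reads to one contiguous sym block
select from trade where date=2025.09.01, sym=`AAPL
```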

| Test Type | Test | XFS (MB/s) | ext4 (MB/s) | Ratio |
| --- | --- | --- | --- | --- |
| read mem | mmap, random read 1M | 26736 | 41860 | 0.64 |
| | mmap, random read 4k | 3370 | 5458 | 0.62 |
| | mmap, random read 64k | 13300 | 22897 | 0.58 |
| | random read 1M | 124141 | 127773 | 0.97 |
| | random read 4k | 92490 | 92293 | 1.00 |
| | random read 64k | 156725 | 162621 | 0.96 |
| read mem write mem | sequential read binary | 27670 | 24988 | 1.11 |
| | sequential reread float large | 1022969 | 1073493 | 0.95 |
| | sequential reread int huge | 1358929 | 1360467 | 1.00 |
| | sequential reread int medium | 544712 | 581002 | 0.94 |
| | sequential reread int small | 124434 | 124221 | 1.00 |
| GEOMETRIC MEAN | | 94900 | 109235 | 0.87 |
| MAX RATIO | | 1358929 | 1360467 | 1.11 |

Observation: ext4 outperforms XFS in random reads if the data is coming from page cache.

Test 2: Ubuntu with Samsung NVMe SSD (PCIe 5.0)

Test Setup

| Component | Specification |
| --- | --- |
| Storage | Type: 3.84 TB SAMSUNG MZWLO3T8HCLS-00A07; Interface: PCIe 5.0 x4; Sequential R/W: 14000 MB/s / 6000 MB/s; Random read: 2500K IOPS (4K); Latency: random read 215 µs (4K), sequential read/write 436 µs / 1350 µs (4K) |
| CPU | AMD EPYC 9575F (Turin), 2 sockets, 64 cores per socket, 2 threads per core, 256 MB L3 cache, SMT off |
| Memory | 2.2 TB, DDR5 @ 6400 MT/s (12 channels per socket) |
| OS | Ubuntu 24.04.3 LTS (kernel 6.8.0-83-generic) |

The values presented in the result tables are ratios of XFS throughput to the throughput of the given file system (e.g., a value of 2 indicates XFS was twice as fast, while a value below 1 means the other file system was faster).

Write

Single kdb+ process:

| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
| --- | --- | --- | --- | --- | --- |
| read mem write disk | add attribute | 1.1 | 1.1 | 1.0 | 1.0 |
| read write disk | disk sort | 1.0 | 1.1 | 1.0 | 1.0 |
| write disk | open append mid float, sync once | 1.7 | 1.7 | 2.0 | 1.0 |
| | open append mid sym, sync once | 1.2 | 1.1 | 1.2 | 1.0 |
| | write float large | 2.7 | 1.8 | 2.5 | 2.3 |
| | write int huge | 2.9 | 1.9 | 2.6 | 2.2 |
| | write int medium | 2.6 | 1.8 | 1.8 | 1.3 |
| | write int small | 0.6 | 0.6 | 0.6 | 1.0 |
| | write sym large | 1.0 | 0.9 | 0.9 | 1.7 |
| GEOMETRIC MEAN | | 1.5 | 1.2 | 1.4 | 1.3 |
| MAX RATIO | | 2.9 | 1.9 | 2.6 | 2.3 |

Observation: XFS outperforms all other file systems when a single kdb+ process writes the data. The only notable weakness of XFS was in writing small files.

The performance of the less critical write operations is below.

| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
| --- | --- | --- | --- | --- | --- |
| write disk | append small, sync once | 2.2 | 2.0 | 1.4 | 2.2 |
| | append tiny, sync once | 1.4 | 1.2 | 1.0 | 1.7 |
| | open append small, sync once | 1.5 | 1.7 | 2.0 | 0.5 |
| | open append tiny, sync once | 0.9 | 0.7 | 0.8 | 1.6 |
| | open replace tiny, sync once | 2.1 | 2.1 | 1.2 | 2.8 |
| | sync float large | 1.2 | 1.4 | 1.2 | 1.6 |
| | sync int huge | 1.1 | 1.2 | 1.7 | 0.3 |
| | sync int medium | 1.5 | 1.4 | 1.3 | 2.0 |
| | sync int small | 0.8 | 1.8 | 0.9 | 0.9 |
| | sync sym large | 1.3 | 1.5 | 1.3 | 4.5 |
| | sync table after sort | 1.0 | 1.0 | 3.1 | 6.1 |
| GEOMETRIC MEAN | | 1.3 | 1.4 | 1.3 | 1.6 |
| MAX RATIO | | 2.2 | 2.1 | 3.1 | 6.1 |

64 kdb+ processes:

| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
| --- | --- | --- | --- | --- | --- |
| read mem write disk | add attribute | 2.9 | 2.9 | 2.9 | 1.8 |
| read write disk | disk sort | 2.1 | 2.2 | 2.0 | 1.7 |
| write disk | open append mid float, sync once | 1.1 | 1.1 | 2.7 | 1.0 |
| | open append mid sym, sync once | 1.1 | 1.1 | 2.1 | 1.7 |
| | write float large | 3.0 | 2.9 | 40.3 | 26.8 |
| | write int huge | 1.0 | 2.0 | 4.1 | 1.5 |
| | write int medium | 3.2 | 2.9 | 47.1 | 3.8 |
| | write int small | 1.4 | 5.3 | 16.4 | 2.1 |
| | write sym large | 1.2 | 1.1 | 9.7 | 9.5 |
| GEOMETRIC MEAN | | 1.7 | 2.1 | 7.0 | 2.9 |
| MAX RATIO | | 3.2 | 5.3 | 47.1 | 26.8 |

Observation: XFS significantly outperforms all other file systems. Its margin can be dramatic: for example, persisting a large float vector (the set operation) is almost 27 times faster on XFS than on ZFS.

| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
| --- | --- | --- | --- | --- | --- |
| write disk | append small, sync once | 1.4 | 1.5 | 3.8 | 2.5 |
| | append tiny, sync once | 0.8 | 1.3 | 7.3 | 1.2 |
| | open append small, sync once | 1.0 | 1.0 | 3.4 | 0.8 |
| | open append tiny, sync once | 0.8 | 1.5 | 11.5 | 1.2 |
| | open replace tiny, sync once | 3.2 | 24.0 | 4.8 | 6.2 |
| | sync float large | 1.0 | 1.0 | 0.6 | 0.5 |
| | sync int huge | 1.0 | 1.3 | 0.5 | 0.0 |
| | sync int medium | 1.0 | 0.7 | 3.0 | 1.3 |
| | sync int small | 0.8 | 1.2 | 3.0 | 1.1 |
| | sync sym large | 1.0 | 0.9 | 0.6 | 1.2 |
| | sync table after sort | 0.6 | 44.7 | 11.3 | 11.1 |
| GEOMETRIC MEAN | | 1.0 | 2.1 | 2.8 | 1.1 |
| MAX RATIO | | 3.2 | 44.7 | 11.5 | 11.1 |

Read

Single kdb+ process:

| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
| --- | --- | --- | --- | --- | --- |
| read disk | mmap, random read 1M | 1.1 | 4.4 | 1.1 | 2.1 |
| | mmap, random read 4k | 1.0 | 1.1 | 1.0 | 0.9 |
| | mmap, random read 64k | 0.9 | 6.4 | 1.0 | 0.8 |
| | random read 1M | 1.0 | 4.4 | 1.1 | 2.3 |
| | random read 4k | 1.0 | 1.2 | 1.1 | 0.8 |
| | random read 64k | 0.9 | 6.7 | 1.0 | 0.8 |
| | sequential read binary | 2.1 | 6.4 | 2.0 | 2.2 |
| read disk write mem | sequential read float large | 1.1 | 0.8 | 1.3 | 3.3 |
| | sequential read int huge | 1.2 | 0.8 | 1.2 | 4.7 |
| | sequential read int medium | 1.2 | 1.7 | 1.2 | 5.8 |
| | sequential read int small | 0.7 | 0.9 | 1.3 | 1.6 |
| GEOMETRIC MEAN | | 1.1 | 2.2 | 1.2 | 1.8 |
| MAX RATIO | | 2.1 | 6.7 | 2.0 | 5.8 |

Observation: XFS excels in reading from disk if there is a single kdb+ reader.

| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
| --- | --- | --- | --- | --- | --- |
| read mem | mmap, random read 1M | 1.2 | 1.3 | 1.4 | 1.4 |
| | mmap, random read 4k | 1.1 | 1.1 | 1.0 | 1.0 |
| | mmap, random read 64k | 1.1 | 1.1 | 1.1 | 1.1 |
| | random read 1M | 1.0 | 1.2 | 1.1 | 1.2 |
| | random read 4k | 1.0 | 1.1 | 1.1 | 0.9 |
| | random read 64k | 1.0 | 1.1 | 1.2 | 1.1 |
| read mem write mem | sequential read binary | 0.9 | 0.9 | 0.9 | 1.1 |
| | sequential reread float large | 1.7 | 2.1 | 2.5 | 2.5 |
| | sequential reread int huge | 2.0 | 2.2 | 1.9 | 2.3 |
| | sequential reread int medium | 1.6 | 2.7 | 1.5 | 2.4 |
| | sequential reread int small | 0.7 | 0.7 | 0.6 | 1.1 |
| GEOMETRIC MEAN | | 1.1 | 1.3 | 1.2 | 1.4 |
| MAX RATIO | | 2.0 | 2.7 | 2.5 | 2.5 |

Observation: XFS excels in reading from page cache if there is a single kdb+ reader.

64 kdb+ processes:

| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
| --- | --- | --- | --- | --- | --- |
| read disk | mmap, random read 1M | 1.0 | 2.3 | 0.9 | 0.4 |
| | mmap, random read 4k | 1.0 | 1.1 | 0.9 | 0.7 |
| | mmap, random read 64k | 0.8 | 1.5 | 0.7 | 0.2 |
| | random read 1M | 1.0 | 2.2 | 0.9 | 0.5 |
| | random read 4k | 1.0 | 1.0 | 0.9 | 0.5 |
| | random read 64k | 0.8 | 1.5 | 0.8 | 0.2 |
| | sequential read binary | 9.1 | 8.6 | 9.0 | 6.5 |
| read disk write mem | sequential read float large | 0.8 | 0.7 | 0.8 | 0.5 |
| | sequential read int huge | 0.9 | 0.8 | 0.9 | 0.6 |
| | sequential read int medium | 1.0 | 0.8 | 0.9 | 1.1 |
| | sequential read int small | 1.2 | 0.5 | 2.0 | 1.1 |
| GEOMETRIC MEAN | | 1.1 | 1.3 | 1.2 | 0.6 |
| MAX RATIO | | 9.1 | 8.6 | 9.0 | 6.5 |

Observation: ZFS excels in reading from disk when many kdb+ processes (an HDB pool) read the data in parallel. The only exception is the binary read (read1), but this is not a typical query pattern in a production kdb+ environment.
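
For reference, read1 returns the raw bytes of a file in a single sequential pass, whereas normal column access deserializes or maps the stored q object; the paths below are illustrative only:

```q
b:read1`:/mnt/test/db/2025.09.01/trade/price   / raw bytes of the column file (byte vector)
v:get`:/mnt/test/db/2025.09.01/trade/price     / the same file read back as a q float vector
```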

| Test Type | Test | ext4 | Btrfs | F2FS | ZFS |
| --- | --- | --- | --- | --- | --- |
| read mem | mmap, random read 1M | 1.1 | 1.1 | 1.1 | 1.2 |
| | mmap, random read 4k | 1.0 | 1.0 | 1.0 | 2.2 |
| | mmap, random read 64k | 1.0 | 1.0 | 1.0 | 2.1 |
| | random read 1M | 1.1 | 1.1 | 1.1 | 1.1 |
| | random read 4k | 1.0 | 1.0 | 1.0 | 0.9 |
| | random read 64k | 1.1 | 1.1 | 1.1 | 1.1 |
| read mem write mem | sequential read binary | 1.0 | 1.0 | 1.0 | 1.1 |
| | sequential reread float large | 1.9 | 1.9 | 2.3 | 5.4 |
| | sequential reread int huge | 1.8 | 1.8 | 2.2 | 2.1 |
| | sequential reread int medium | 2.0 | 2.2 | 2.0 | 8.3 |
| | sequential reread int small | 1.1 | 1.1 | 6.8 | 1.8 |
| GEOMETRIC MEAN | | 1.2 | 1.2 | 1.5 | 1.9 |
| MAX RATIO | | 2.0 | 2.2 | 6.8 | 8.3 |

Observation: The performance advantage of ZFS vanishes entirely when data is served from the page cache.